This is all very simple. Still, it might be useful. In today's class, students asked for intuition about why, when regressing y on x, measurement error in x biases the coefficient estimate, but measurement error in y does not.
I gave a simple explanation as follows.
– We start with the model y_i = a + b*x_i + e_i. If y is measured with error, we observe y*_i = y_i + eta_i, and regressing y* on x we can write y*_i = a + b*x_i + e_i + eta_i. As long as eta is independent of x and e, the two errors can be combined into a single error term, so the slope estimate is still unbiased.
– If there is measurement error in x, two things happen, both of which attenuate b, that is, pull the estimated regression coefficient toward zero. First, the noise spreads out the x values without changing y, which flattens the slope of y on x. Second, adding noise to x partially reshuffles the data, weakening the apparent relationship between x and y.
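The attenuation can also be checked numerically against the classical formula: with independent noise of variance sigma_u^2 added to x, the slope shrinks by the factor sigma_x^2 / (sigma_x^2 + sigma_u^2). Here is a minimal check in R (the variable names and parameter values are my own choices for illustration, not anything from the class code):

```r
# check classical attenuation: slope shrinks by var(x) / (var(x) + var(noise))
set.seed(123)
n <- 1e5
a <- 0.2; b <- 0.5            # true intercept and slope
x <- rnorm(n, 0, 1)           # true predictor, sd 1
y <- a + b*x + rnorm(n, 0, 0.5)
x_obs <- x + rnorm(n, 0, 1)   # measurement error with sd 1
fit <- lm(y ~ x_obs)
coef(fit)["x_obs"]            # roughly b * 1/(1 + 1) = 0.25, half the true slope
```

With equal variances for the true x and the noise, the attenuation factor is 1/2, so the fitted slope lands near 0.25 rather than the true 0.5.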
But that's all words (and some math). It's easier and clearer to do a live simulation, which I did on the fly during class.
Here is the R code:
# simulation for measurement error
library("arm")
set.seed(123)

The resulting plot is at the top of this post.
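Only the first few lines of the code survive above. For reference, here is a self-contained sketch of this kind of simulation in base R (my own reconstruction; the parameter values, panel labels, and plotting choices are assumptions, not necessarily what was used in class). It fits and plots the regression under four conditions, with small dots, red regression lines, labeled panels, and a common axis range:

```r
# measurement-error simulation: four regressions plotted on a common scale
set.seed(123)
n <- 1000
a <- 0.2; b <- 0.5                 # assumed true intercept and slope
x <- rnorm(n, 0, 1)
y <- a + b*x + rnorm(n, 0, 0.5)
x_err <- x + rnorm(n, 0, 1)        # measurement error in x
y_err <- y + rnorm(n, 0, 1)        # measurement error in y

panels <- list("No error"      = list(x = x,     y = y),
               "Error in x"    = list(x = x_err, y = y),
               "Error in y"    = list(x = x,     y = y_err),
               "Error in both" = list(x = x_err, y = y_err))

par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
lims <- range(x, x_err, y, y_err)  # common axis range for all four graphs
for (lab in names(panels)) {
  d <- panels[[lab]]
  plot(d$x, d$y, pch = 20, cex = 0.3, xlim = lims, ylim = lims,
       xlab = "x", ylab = "y", main = lab)
  abline(lm(d$y ~ d$x), col = "red")  # fitted regression line in red
}
```

In the two panels with error in x, the red line is visibly flatter; in the error-in-y panel, the cloud is noisier but the slope is essentially unchanged.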
I like this simulation for three reasons:
1. You can look at the graph and see how the slope changes with measurement error in x but not in y.
2. This exercise shows the benefits of clear graphics, including little things like making the dots small, adding the regression lines in red, labeling the individual plots, and using a common axis range for all four graphs.
3. It was fast! I did it live in class, and it's an example of how students, or anyone, can answer this sort of statistical question directly, with a lot more confidence and understanding than would come from a textbook and some formulas.
P.S. As Eric Loken and I discuss in this 2017 article, everything gets more complicated if you condition on "statistical significance."
P.P.S. Yes, I know my R code is ugly. Think of this as an inspiration: even if, like me, you’re a sloppy coder, you can still code up these examples for teaching and learning.