Adam Zelizer writes:
I saw the post on your blog about the underpowered coronavirus experiment and wondered if you had seen this paper, "Counter-Stereotype Messaging and Partisan Cues: Moving the Vaccine Needle in a Polarized America." The paper, by a strong team of economists and political scientists, finds that a pro-vaccine message from President Trump had a large positive effect on vaccine uptake.
They find that the messaging (delivered through YouTube ads) had a large positive effect on the number of vaccines administered at the county level, more than 100 additional vaccinations per treated county, but only after they changed the specification from the one pre-registered in their PAP. The p-value for the main revised specification is only 0.097, from a one-tailed test, and the effect size from the revised specification is ten times larger than the estimate from the pre-registered model. The pre-registered model estimates that the Trump ad increased the number of vaccines administered in the average treated county by about 10; the revised specification, the one reported in the body of the paper and in the abstract, estimates an additional 103 vaccinations. So moving from the PAP specification to the paper's specification not only improves the precision but also dramatically increases the estimated treatment effect. A good example of a suppression effect.
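To spell out what a suppression effect looks like, here's a toy simulation, with generic made-up numbers rather than the paper's data or model: a covariate that is positively correlated with the treatment but pushes the outcome in the other direction will shrink the estimated treatment effect when it is left out of the regression, and adding it back in makes the estimate jump.

```python
# Toy illustration of a suppression effect (generic simulated data, not the
# paper's): a covariate positively correlated with the treatment but negatively
# related to the outcome shrinks the treatment estimate when it is left out.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
treat = rng.binomial(1, 0.5, n)                      # treatment indicator
suppressor = treat + rng.normal(0, 1, n)             # correlated with treatment
y = 1.0 * treat - 0.8 * suppressor + rng.normal(0, 1, n)

def ols(y, columns):
    X = np.column_stack([np.ones(n)] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

short = ols(y, [treat])                # suppressor omitted
long = ols(y, [treat, suppressor])     # suppressor included

print("treatment coef, suppressor omitted: ", round(short[1], 2))   # about 0.2
print("treatment coef, suppressor included:", round(long[1], 2))    # about 1.0
```

In the toy example the short regression understates the true effect by a factor of five; whether the paper's revised specification is fixing a distortion like that or creating a new one is exactly what's at issue.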
They explain their logic for using the revised specification, but it smells like a garden of forking paths.
Here is an excerpt from the article:
My reply: Regarding the forking paths, the solution is to fit all reasonable specifications using a hierarchical model, or at least to do a multiverse analysis. There is no reason to think the effect of this treatment should be zero, and if you really care about effect size, you'll want to avoid obvious sources of bias such as model selection.
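Here's a minimal sketch of the "at least" option, a multiverse-style loop over specification choices. The data and the menu of choices are invented for illustration, not taken from the paper:

```python
# Sketch of a multiverse analysis: fit the treatment-effect regression under
# every reasonable combination of specification choices and look at the whole
# distribution of estimates, rather than reporting a single preferred one.
# The data and the menu of choices here are made up for illustration.
from itertools import product
import numpy as np

rng = np.random.default_rng(1)
n = 1000
treat = rng.binomial(1, 0.5, n)                  # county assigned to the ad campaign
age = rng.normal(50, 10, n)                      # hypothetical county-level covariates
urban = rng.binomial(1, 0.4, n)
vax = 10 * treat + 0.5 * age + 20 * urban + rng.normal(0, 30, n)  # vaccinations

def treatment_coef(y, covariates):
    """OLS coefficient on the treatment indicator."""
    X = np.column_stack([np.ones(n), treat] + covariates)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

covariate_sets = {"none": [], "age": [age], "age + urban": [age, urban]}
outcomes = {"raw count": vax, "log count": np.log(vax - vax.min() + 1)}

for (cov_name, covs), (out_name, y) in product(covariate_sets.items(), outcomes.items()):
    print(f"{out_name:9s} | covariates: {cov_name:11s} -> {treatment_coef(y, covs):6.2f}")
```

The hierarchical-model version of this would partially pool the specification-level estimates rather than just listing them.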
The part above about one-sided tests reflects a common misconception in social science. As I keep saying until my lips bleed, the effect is never zero. It will be larger in some settings and smaller in others, sometimes positive and sometimes negative. From the researchers' perspective, the point of the hypothesis test is to offer convincing evidence that the treatment really does have a positive average effect. Fine, but that is addressed directly by estimation: the uncertainty interval tells you what the data can tell you.
What's weird is doing a one-tailed test and then saying that a p-value of 0.1 (or 0.2, if you use the standard two-sided test) is OK because of the "low signal-to-noise ratio." A low signal-to-noise ratio just means there is a lot of uncertainty in the conclusions. And that's fine! It's OK to have lots of uncertainty. You can still recommend the policy in the face of that uncertainty; policymakers have to do something, after all. To me, the one-sided testing and the p-value thresholding miss the point, in that they are attempts to squeeze an expression of near-certainty out of data that don't admit such an interpretation.
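To make the uncertainty concrete: under a simple normal approximation, which is my assumption here rather than a calculation from the paper, a one-tailed p-value of 0.097 puts the estimate about 1.3 standard errors from zero, so the reported 103 additional vaccinations comes with an approximate 95% interval that easily includes zero:

```python
# Back-of-envelope: what does a one-tailed p = 0.097 imply about uncertainty?
# Assumes a simple normal approximation; the paper's actual model may differ.
from statistics import NormalDist

p_one_tailed = 0.097
estimate = 103  # additional vaccinations per treated county, as reported

z = NormalDist().inv_cdf(1 - p_one_tailed)    # about 1.3 standard errors from zero
se = estimate / z                              # implied standard error, about 79
lo, hi = estimate - 1.96 * se, estimate + 1.96 * se

print(f"z = {z:.2f}, implied s.e. = {se:.0f}")
print(f"approximate 95% interval: ({lo:.0f}, {hi:.0f})")   # about (-52, 258)
```

An interval running from roughly -50 to +260 vaccinations per county is a perfectly reasonable thing to report; it just doesn't support a claim of near-certainty.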
P.S. I'm not writing this sort of post out of any animosity toward the authors or the research topic. I'm writing about problems with these methods because I care. Policy matters. I don't think it's good for policy when researchers use statistical methods that lead to overconfidence or an inappropriate impression of certainty or near-certainty. The goal of a statistical analysis is not to attain statistical significance or to otherwise reach some threshold of success. We want to learn what we can from data and models, and also to figure out what we don't know.