Image by Ludomił Sawicki on Unsplash

How can we look at the same dataset and come to wildly different conclusions?

Recently, a study came out where 73 research teams independently analyzed the same data, all trying to test the same hypothesis. Seventy-one of the teams came up with numerical results across a total of 1,253 models. Across these 1,253 different ways of looking at the data, about 58% showed no effect, 17% showed a positive effect, and 25% showed a negative effect. But that’s not even the oddest part. 

The oddest part is that despite a heroic attempt to do so, the study authors failed to explain why the different research teams reached such different conclusions:

“More than 95% of the total variance in numerical results remains unexplained even after qualitative coding of all identifiable decisions in each team’s workflow. This reveals a universe of uncertainty that remains hidden when considering a single study in isolation.”

The hypothesis they were trying to test was whether greater immigration reduces support for social policies (such as government-provided healthcare).

The study included data from 31 countries at up to five time periods each.

If all of the countries and time points were independent data points, that would give a sample size of at most 5 × 31 = 155.

However, the data points from one country at different points in time are highly correlated.

So, in practice, this might be equivalent to more like 75 (independent) data points.
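To give a rough sense of where a number like 75 could come from, here is a small sketch of my own (not from the paper) using a standard effective-sample-size adjustment for autocorrelated observations; the autocorrelation value of 0.35 is purely hypothetical, chosen only to illustrate the arithmetic.

```python
# Rough illustration (not from the paper): how within-country correlation
# shrinks the effective sample size. Assumes a simple AR(1)-style
# autocorrelation rho between repeated observations of the same country.

def effective_sample_size(n: int, rho: float) -> float:
    """Approximate effective sample size for autocorrelated observations."""
    return n * (1 - rho) / (1 + rho)

n_raw = 31 * 5   # 31 countries x up to 5 time periods = 155 country-period points
rho = 0.35       # hypothetical within-country autocorrelation (made up for illustration)

print(round(effective_sample_size(n_raw, rho)))  # ~75
```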

If the effective sample size were equivalent to only about n = 75, the 95% confidence interval on a correlation could be pretty wide (e.g., roughly ±0.20), suggesting that false negatives would be very common (unless the relationship in question is pretty strong).
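As a back-of-the-envelope check (again my own sketch, not from the paper), here is the half-width of a 95% confidence interval around a correlation of zero with n = 75, computed via the Fisher z-transform:

```python
# Half-width of a 95% CI around r = 0 at an effective sample size of 75,
# using the Fisher z-transform for correlations.
import math

n_eff = 75
z_crit = 1.96                          # 95% two-sided normal critical value
se_z = 1 / math.sqrt(n_eff - 3)        # standard error on the Fisher z scale
half_width = math.tanh(z_crit * se_z)  # back-transform to the correlation scale

print(round(half_width, 2))  # ~0.23, i.e. an interval of roughly +/- 0.2
```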

In view of this, I think it’s possible that this study was doomed from the start.

Why? Unless we’d expect a reasonably strong effect, maybe there just wasn’t enough data to answer the question at hand.

Some of the teams may have come to this conclusion as well. Of the 73 research teams involved in the study, one of them conducted preliminary measurement scaling tests and concluded that the hypothesis could not be reliably tested. Another team’s preregistered models failed to converge, and so that team also had no numerical results. Some teams reached more than one conclusion, and across the 89 team conclusions reached, 12 of those conclusions (13.5%) were that the hypothesis was not testable with the data provided.

Then, of course, there is the possibility of confounding variables – other factors that could be linked to both immigration and support for social policies that make the true relationship seem bigger or smaller than it really is.

Here’s a link to the paper if you’re interested: “Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty.”

A big thanks to Cameron Colby Thomson for pointing me to this study!


This piece was first written on November 30, 2022, and first appeared on this site on September 29, 2023.


  
