I first published this post on the Clearer Thinking blog on December 19, 2022, and first cross-posted it to this site on January 21, 2023.
You have probably heard the phrase “replication crisis.” It refers to the grim fact that, in a number of fields of science, when researchers attempt to replicate previously published studies, they fairly often don’t get the same results. The magnitude of the problem depends on the field, but in psychology, it seems that something like 40% of studies in top journals don’t replicate. We’ve been tackling this crisis with our new Transparent Replications project, and this post explains one of our key ideas.
Replication failures are sometimes simply due to bad luck, but more often, they are caused by p-hacking – the use of fishy statistical techniques that lead to statistically significant (but misleading or erroneous) results. As big a problem as p-hacking is, there is another substantial problem in science that gets talked about much less. Although certain subtypes of this problem have been named previously, to my knowledge, the problem itself has no name, so I’m giving it one: “Importance Hacking.”
Academics want to publish in the top journals in their field. To understand Importance Hacking, let’s consider a (slightly oversimplified) list of the three most commonly-discussed ways to get a paper published in top psychology journals:
- Conduct valuable research – make a genuinely interesting or important discovery, or add something valuable to the state of scientific knowledge. This is, of course, what just about everyone wants to do, but it’s very, very hard!
- Commit fraud – for instance, by making up your data. Thankfully, very few people are willing to do this because it’s so unethical. So this is by far the least used approach.
- p-hack – use fishy statistics, HARKing (i.e., hypothesizing after the results are known), selective reporting, hidden researcher degrees of freedom, and the like, in order to get a p<0.05 result that is actually just a false positive. This is a major problem and the focus of the replication crisis. Of course, false positives can also come about without fault, due to bad luck.
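To make that mechanism concrete, here is a minimal, purely illustrative sketch of my own (not from the original post), assuming Python with NumPy and SciPy: it simulates studies in which there is no real effect at all, but the researcher measures ten outcomes and reports whichever one yields the smallest p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def p_hacked_study(n_per_group=50, n_outcomes=10):
    """Simulate one two-group study that measures several outcomes,
    none of which has any true effect (both groups come from the same
    distribution), and report only the smallest p-value."""
    p_values = []
    for _ in range(n_outcomes):
        group_a = rng.normal(0, 1, n_per_group)
        group_b = rng.normal(0, 1, n_per_group)  # same population: no real effect
        _, p = stats.ttest_ind(group_a, group_b)
        p_values.append(p)
    return min(p_values)  # cherry-pick the best-looking outcome

n_studies = 2000
false_positives = sum(p_hacked_study() < 0.05 for _ in range(n_studies))
print("Nominal false-positive rate: 5%")
print(f"Rate when cherry-picking among 10 outcomes: {false_positives / n_studies:.0%}")
# Expected to be roughly 40% (i.e., 1 - 0.95**10), even though no real effects exist
```

With ten chances at a nominally 5% test, roughly 40% of these null studies produce at least one “significant” result, which is the kind of inflation p-hacking exploits.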
But here is a fourth way to get a paper published in a top journal: Importance Hacking.
4. Importance Hack – get a result that is actually not interesting, not important, and not valuable, but write about it in such a way that reviewers are convinced it is interesting, important, and/or valuable, so that it gets published.
For research to be valuable to society (and, in an ideal world, publishable in top journals), it must be true AND interesting (or important, useful, etc.). Researchers sometimes p-hack their results to skirt around the “true” criterion (by generating interesting false positives). Importance Hacking, by contrast, is a method for skirting the “interesting” criterion.
Importance Hacking is related to concepts like hype and overselling, though hype and overselling are far more general. Importance Hacking refers specifically to a phenomenon whereby research with little to no value gets published in top journals due to strategies that lead reviewers to misinterpret the work. Hype and overselling, meanwhile, appear in many forms at many stages of research (including to make genuinely valuable research appear even more valuable).
One way to understand importance hacking is by comparing it to p-hacking. P-hacking refers to a set of bad research practices that enable researchers to publish non-existent effects. In other words, p-hacking misleads paper reviewers into thinking that non-existent effects are real. Importance Hacking, on the other hand, encompasses a different set of bad research practices: those that lead paper reviewers to believe that real (i.e., existent) results that have little to no value actually have substantial value.
This diagram illustrates how I think Importance Hacking interferes with the pipeline of producing valuable research:
There are a number of subtypes of Importance Hacking based on the method used to make a result appear interesting/important/valuable when it’s not. Here is how I subdivide them:
Types of Importance Hacking
1. Hacking Conclusions: make it seem like you showed some interesting thing X but actually show something else (X′) which sounds similar to X but is much less interesting/important. In these cases, researchers do not truly find what they imply they have found. This phenomenon is also closely connected with validity issues.
- Example 1: showing X is true in a simple video game but claiming that X is true in real life.
- Example 2: showing A and B are correlated and claiming that A causes B (when really A and B are probably both caused by some third factor C, which makes the finding much less interesting; see the simulation sketch after these examples).
- Example 3: a researcher claims to be measuring “aggression” and couches all conclusions in those terms, but is actually measuring the milliliters of hot sauce a person puts in someone else’s food. The result about aggression is valid only insofar as hot sauce is truly a valid measure of aggression.
- Example 4: some types of hacking conclusions would fall under the terms “overclaiming” or “overgeneralizing”; Tal Yarkoni has a relevant paper called The Generalizability Crisis.
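To make Example 2 concrete, here is a small, purely illustrative simulation of my own (not from the original post), assuming Python with NumPy and arbitrary effect sizes: A and B have no causal link at all, yet they correlate strongly because both are driven by a common cause C, and the correlation largely disappears once C is controlled for.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# C is a common cause of both A and B (e.g., an unmeasured third factor)
c = rng.normal(0, 1, n)

# A and B each depend on C plus independent noise; neither causes the other
a = 0.8 * c + rng.normal(0, 0.6, n)
b = 0.8 * c + rng.normal(0, 0.6, n)

print(f"corr(A, B) = {np.corrcoef(a, b)[0, 1]:.2f}")  # strong (~0.6) despite no causal link

# Crude way to "control for" C: correlate the residuals of A and B after regressing out C
resid_a = a - np.polyval(np.polyfit(c, a, 1), c)
resid_b = b - np.polyval(np.polyfit(c, b, 1), c)
print(f"corr(A, B) controlling for C = {np.corrcoef(resid_a, resid_b)[0, 1]:.2f}")  # ~0.0
```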
2. Hacking Novelty: refer to something in a way that makes it seem more novel or unintuitive than it is. Perhaps the result is already well known or is merely what just about everyone’s common sense would already tell them is true. In these cases, researchers really do find what they claim to have found, but what they found is not novel (despite them making it seem so). Hacking Novelty is also connected to the “Jingle-jangle” fallacy – where people can be led to believe two identical concepts are different because they have different names (or, more subtly, because they are operationalized somewhat differently).
- Example 1: showing something that is already well-known but giving it a new name that leads people to think it is something new. The concept of “grit” has received this criticism; some people claim it could turn out to be just another word for conscientiousness (or already known facets of conscientiousness) – though this question does not yet seem to be settled (different sides of this debate can be found in these papers: 1, 2, 3 and 4).
- Example 2: showing that A and B are correlated, which seems surprising given how the constructs are named, when in fact, if you dug into how A and B were actually measured, it would be obvious that they would correlate (see the sketch after these examples).
- Example 3: showing a common-sense result that almost everyone already would predict but making it seem like it’s not obvious (e.g., by giving it a fancy scientific name).
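To illustrate Example 2 above, here is a hypothetical sketch of my own (made-up items, assuming Python with NumPy) in which two differently named scales correlate substantially simply because they share several of the same items, even though nothing real ties the two constructs together.

```python
import numpy as np

rng = np.random.default_rng(7)
n_people = 5_000

# Ten completely independent survey items per person (no shared underlying trait)
items = rng.normal(0, 1, (n_people, 10))

# Scale "A" and scale "B" carry different construct names but share items 0-3;
# their remaining items (4-6 vs. 7-9) are entirely unrelated to each other.
scale_a = items[:, [0, 1, 2, 3, 4, 5, 6]].sum(axis=1)
scale_b = items[:, [0, 1, 2, 3, 7, 8, 9]].sum(axis=1)

print(f"corr(A, B) = {np.corrcoef(scale_a, scale_b)[0, 1]:.2f}")
# ~0.57 here -- a "surprising" correlation that is guaranteed by the item overlap
```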
3. Hacking Usefulness: make a result seem useful or relevant to some important outcome when in fact, it’s useless and irrelevant. In these cases, researchers find what they claim to have found, but what they find is not useful (despite them making it sound useful).
- Example: focusing on statistical significance when the effect size is so small that the result is useless. Clinicians often distinguish between “statistical significance” and “clinical significance” to highlight the pitfalls of ignoring effect sizes when considering the importance of a finding.
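As a minimal illustration of that example (my own sketch with made-up numbers, assuming Python with NumPy and SciPy): with a large enough sample, even a trivially small group difference becomes highly statistically significant, while the effect size makes clear it is of little practical use.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000  # very large sample per group

# A real but tiny group difference: 0.02 standard deviations
control = rng.normal(0.00, 1, n)
treatment = rng.normal(0.02, 1, n)

_, p_value = stats.ttest_ind(treatment, control)
cohens_d = (treatment.mean() - control.mean()) / np.sqrt(
    (treatment.var(ddof=1) + control.var(ddof=1)) / 2
)

print(f"p-value   = {p_value:.2g}")   # easily "significant" (far below 0.05)
print(f"Cohen's d = {cohens_d:.3f}")  # ~0.02 -- negligible in practical terms
```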
4. Hacking Beauty: make a result seem clean and beautiful when in fact, it’s messy or hard to interpret. In these cases, researchers focus on certain details or results and tell a story around those, but they could have focused on other details or results that would have made the story less pretty, less clear-cut, or harder to make sense of. This is related to Giner-Sorolla’s 2012 paper Science or art: How aesthetic standards grease the way through the publication bottleneck but undermine science. Hacking Beauty sometimes reduces to selective reporting of some kind (i.e., selective reporting of measures, analyses, or studies), or at least to selective focus on certain findings and not others. This becomes more difficult with pre-registration: if you have to report the results of your planned analyses, there is less room to make them look pretty (you could just say they’re pretty, but that seems like overclaiming).
- Example: emphasizing the parts of the result that tell a clean story while not including (or burying somewhere in the paper) the parts that contradict that story.
Science faces multiple challenges. Over the past decade, the replication crisis and the subsequent open science movement have greatly increased awareness of p-hacking, and measures to reduce it have begun to be put in place. Importance Hacking is another substantial problem, but it has received far less attention.
If a pipe is leaking from two holes and its pressure is kept fixed, then repairing one hole will make the other leak faster. Similarly, as practices that curb p-hacking become commonplace, Importance Hacking may become more common, so long as the career pressure to publish in top journals doesn’t let up.
It’s time to start the conversation about how Importance Hacking can be addressed.
If you’re interested in learning more about Importance Hacking, you can listen to psychology professor Alexa Tullett and me discussing it on the Clearer Thinking podcast (there, I refer to it as “Importance Laundering,” but I now think “Importance Hacking” is a better name), or to me talking about it on the Two Psychologists Four Beers podcast. We also discuss my new project, Transparent Replications, which conducts rapid replications of recently published psychology papers in top journals in an effort to shift incentives and create more reliable, replicable research. If you enjoyed this article, you may be interested in checking out our replication reports and learning more about the project.
Did you like this article? If so, you may like to explore the ClearerThinking Podcast, where I have fun, in-depth conversations with brilliant people about ideas that matter. Click here to see a full list of episodes.
While it’s not great, I’m not really seeing how this is a major threat to science. It doesn’t seem like these techniques are beyond the capability of peer review to manage (i.e., my sense is that importance hacking hasn’t meant the genuinely important papers aren’t most likely to get into the best journals…though it’s always very noisy), and even if/when they are, I don’t see how they corrupt the scientific record (it’s akin to lawyers on both sides phrasing their arguments in the most persuasive terms rather than the most objective ones).
What’s troubling about p-hacking is that it creates a false/misleading scientific record. The problem here seems to be, at worst, that we might not promote the people with the most important results but that’s largely a matter of luck anyway.
Even your examples don’t really seem that bad. Maybe grit is really just conscientiousness (I suspect it is). But maybe it’s not and that possibility makes it interesting and even if not it’s still good work. Yah, sometimes a significant result isn’t actually meaningful in the real world but if the scientists reading the paper can’t tell the difference the problem isn’t importance hacking since they’ll have the same issue interpreting their own data.
Besides, I’m not seeing the alternative. If you try to say that scientists shouldn’t make their work sound as interesting/important/etc. as possible, how could you possibly create an enforceable rule? Whoever was willing to push the line the most would have an advantage.
At least if everyone tries to make their work sound as important as possible then it’s all competing on a fairly level playing field.
Besides, I don’t see how it can both be true that this importance hacking is tricking the reviewers into giving priority to the wrong papers, and yet you can see through it well enough to notice it’s happening so much that it threatens science.
We can discover p-hacking that the reviewers couldn’t notice because of replication failures and by looking at statistical properties of many articles. The only way to see this problem is to observe the same information the reviewers saw and second guess them. So how can you distinguish between this and the theory that they do see through it and it’s just that there is a fair bit of disagreement about what’s important/publishable?
I really became aware of Importance Hacking as I worked on replicating papers. When replicating, we rebuild the study from scratch in all its details, engaging with the exact wording of every question, the precise presentation of materials, and so on – a much closer investigation than reviewers typically conduct. It is far easier to spot Importance Hacking when replicating a paper than when reviewing one.
Hi Peter – I claim these do very often get by peer review (that’s precisely why people engage in this behavior – if it didn’t get through peer review, there would be little point). Additionally, I believe this behavior does distort scientific knowledge, because on a cursory reading of these papers (or, worse still, a reading of just the abstract) they often seem to show something they didn’t show (if they didn’t seem to show something they didn’t show, it would be obvious that they showed something of little to no value). In that regard, it is much like p-hacking (which results in false positives), because it leads the reader to a false impression about what’s true.
Too true – this leads to some hyperinflated claims in science, but there are at least some checks and balances on “making up facts.” However, it is truly rampant in the social sciences, and especially economics, where the payoff to creating an ideologically driven “fact” through spurious statistics is very high.