The pressure on scientists is arguably stronger than ever: in an environment of ever-growing competition, researchers are expected to publish constantly and deliver breakthrough after breakthrough, sometimes at an inhuman pace. This environment, often called “publish or perish”, is starting to take its toll on the quality of science, as more and more researchers publish incomplete or misleading findings – with some authors even falsifying their data.
But weeding out bad science is a complicated and delicate process, and the best tool we have – peer review – is flawed, or at the very least imperfect. So researchers have developed a new way of spotting fraudulent papers: the “obfuscation index”.
“We believe the underlying idea behind obfuscation is to muddle the truth,” said one of the team, David Markowitz. “Scientists faking data know that they are committing a misconduct and do not want to get caught. Therefore, one strategy to evade this may be to obscure parts of the paper. We suggest that language can be one of many variables to differentiate between fraudulent and genuine science.”
In other words, when scientists want to mask something, they reach for big words – scientific jargon – hoping to confuse readers (or simply bore them) enough that the shortcomings are overlooked. To test this idea, Jeff Hancock, a professor of communication at Stanford, and graduate student David Markowitz searched PubMed, a database of life sciences journals, for papers retracted between 1973 and 2013. They found 253 papers retracted for fraud over those four decades and compared them with unretracted papers from the same period to see what stood out.
They found that fraudulent papers use significantly more jargon than regular ones:
“Fraudulent papers had about 60 more jargon-like words per paper compared to unretracted papers,” Markowitz said. “This is a non-trivial amount.”
Of course, using more complicated language doesn’t by itself mean there is something wrong with the research, but in the future a computerized algorithm could flag such papers for additional scrutiny. The approach is still in its early stages, though.
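The researchers haven’t published a scoring tool, but the basic idea – count jargon-like words in a paper and flag outliers for a closer look – can be sketched in a few lines. In the toy Python sketch below, the jargon word list, the per-1,000-word rate, and the threshold are purely illustrative assumptions, not the authors’ actual obfuscation index:

import re

# Hypothetical mini word list; the actual study relied on established
# linguistic word categories, not a hand-picked sample like this one.
JARGON = {"utilization", "methodological", "paradigm", "heterogeneity",
          "operationalize", "instantiation", "multifactorial"}

def jargon_rate(text: str) -> float:
    """Return jargon-like words per 1,000 words of text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in JARGON)
    return 1000 * hits / len(words)

def flag_for_review(text: str, threshold: float = 5.0) -> bool:
    """Flag a paper whose jargon rate exceeds an (arbitrary) threshold."""
    return jargon_rate(text) > threshold

A real system would compare each paper against field-specific baselines rather than a fixed threshold, since some disciplines are simply more jargon-heavy than others – which is part of why Hancock cautions that the error rate is still far too high.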
“Science fraud is of increasing concern in academia, and automatic tools for identifying fraud might be useful,” Hancock said. “But much more research is needed before considering this kind of approach. Obviously, there is a very high error rate that would need to be improved, but also science is based on trust, and introducing a ‘fraud detection’ tool into the publication process might undermine that trust.”