Several online forums have developed in recent years to foster open discussion of peer-reviewed scientific publications; PubPeer and PeerJ are two of them. The integrity of data is central to these discussions, the assumption being that openly discussing problems with data will help to correct the scientific record.
But is this assumption justified?
Paul S. Brookes, a researcher at the University of Rochester in the United States, wanted to find out. He created a blog as a platform where people could submit questionable data, along with the corresponding publications, to be published and discussed in an open forum.
From July to December 2012 he received 274 tips by email regarding data integrity problems in published journal articles, mostly in the life sciences. The cases were documented in blog posts along with background information and illustrations. In January 2013 legal threats forced him to stop the project, but researchers continued to submit papers with documented problems anonymously – 223 in total. These cases remained in Paul Brookes' private collection and could not be released due to the legal circumstances.
However, with documented cases of data problems in nearly 500 scientific studies, he found himself in possession of an interesting dataset and analysed it to answer the question:
With the escalating adoption of social media techniques by science activists, …
He published his findings in the journal PeerJ under the title "Internet publicity of data problems in the bioscience literature correlates with enhanced corrective action". A summary is illustrated below.
Public discussion of data integrity – does it make a difference?
At the time of analysis Paul S. Brookes had 497 papers with documented problems in data integrity: 274 published on his blog and 223 that reached him after he stopped publication. Two datasets of comparable size. In both datasets the number of problematic panels per paper was between 2 and 3. It has been shown previously that the impact factor of a journal is connected to the willingness to retract or correct a paper. However, the two datasets did not differ in the average 5-year impact factor of the journals the papers were published in.
The rate at which corrections or retractions were made upon identification of a data problem, however, varied greatly between the cases discussed in public and the dataset that remained private. Publicly discussed papers with data problems were retracted 6.5-fold more often than the non-discussed papers, and the rate of correction was 7.7-fold higher.
“Combined, 23% of the publicly discussed papers were subjected to some type of corrective action, versus 3.1% of the private non-discussed papers. This overall 7-fold difference in levels of corrective action suggests a large impact of online public discussion.”
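The "7-fold" figure quoted above follows directly from the two reported rates; a minimal sketch of the arithmetic, using only the percentages stated in the article:

```python
# Corrective-action rates as reported in the article
public_rate = 0.23    # publicly discussed papers with some corrective action
private_rate = 0.031  # private, non-discussed papers with some corrective action

# Fold difference between the two groups
fold = public_rate / private_rate
print(f"{fold:.1f}-fold")  # prints "7.4-fold", i.e. roughly 7-fold
```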
Does public discussion of data problems make a difference at the laboratory level?
The number of lab groups represented in the two datasets was similar: 75 in the public and 62 in the private dataset. Some had one problematic paper, others multiple; the average – for both sets – was between 3 and 4 problematic papers per laboratory group.
The striking difference, however, was the response of the lab groups in the form of corrective action. In the public set 28 lab groups took corrective action on at least one paper, while in the private set only 6 lab groups did!
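Expressed as shares of the lab groups in each dataset, the gap is just as clear; a quick check using the counts reported above:

```python
# Lab-group counts as reported in the article
public_labs, public_acted = 75, 28    # public dataset
private_labs, private_acted = 62, 6   # private dataset

# Fraction of lab groups that took corrective action in each set
print(f"public:  {public_acted / public_labs:.0%}")   # prints "public:  37%"
print(f"private: {private_acted / private_labs:.0%}") # prints "private: 10%"
```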
“This suggests that corrective actions in the private set took place on a more individualized basis, with more clustering of corrective actions in the public set perhaps being a direct consequence of greater publicity.”
And what are the conclusions of the study? Paul Brookes summarizes in an interview with PeerJ:
“That internet publicity of data problems in published papers is associated with a greater level of corrective action taken on those papers, compared to papers for which there was no public discussion of their alleged shortcomings. Although it is tempting to speculate that this is a cause-and-effect relationship, there are a number of caveats to the study, in the form of external factors that could have accounted for some of the difference between the two groups of papers. However, the difference was so large (7-fold), that the most likely explanation is the publicity itself.”