I need your data, your code, and your DOI

I love open science. Since you are reading a scientific blog, I believe it is likely that you also support many of open science ideas. Indeed, easy access to publications, code, and research data makes research easier to reuse, while also ensuring transparency of the process and better quality control. Unfortunately the academic community is extremely conservative and it just takes forever for new standards to become commonplace.

The push for change in scientific practice comes from many directions.

  • Many funding agencies now require that all publications funded by them are publicly accessible. The upcoming Plan S would go further and only allow open access publications for all public funded research.
  • Frequently when submitting a grant proposal these days one also must include a data management plan [1].
  • The glossy journals in our field tighten their data publication requirements (see Nature and Science).
  • At the same time there are multiple grassroots initiatives for setting up open access community-run journals: SciPost [2] and Quantum.

Also as individual researchers we can do a lot. For example, our group routinely publishes the source code and data for our projects. Recently Gary Steele and I proposed to our department that every group pledges to publish at least the processed data with every single publication. This is miles away from the long-term vision of publishing FAIR data, but it is a step in the right direction that does not cost too much effort and that we can do right now. We were extremely pleased when our colleagues agreed with our reasoning and accepted the proposal.

The policy changes and initiatives help improve the practice, but policy changes are slow and grassroots initiatives require extra work and might require convincing skeptically minded colleagues. Interestingly I realized that there is another way to promote open science, which doesn't have any of those drawbacks. Instead it is awesome from all points of view:

  • It does not require any effort on your side.
  • It has an immediate effect.
  • It helps researchers to do better what they are doing anyway.

Almost too good to be true, isn't it? I am talking about one situation where every researcher is in a position of power: reviewing papers. The job of a reviewer is to ensure that the paper is correct, and that it meets a quality standard. As soon as the manuscript is even a bit complex, one cannot assert its correctness without examining the data and the code that are used in it. Likewise, if the data and the code comprise a significant part of the research output, the manuscript quality is directly improved if the code and the data is published as well.

Therefore I have decided that a part of my job as a reviewer is to to ensure that the code and the data is available for review as soon as it is sufficiently nontrivial. I have requested the code and the data on several occasions, following this request with a suggestion to also publish the code and the data.

I was pleasantly surprised with the outcome. Firstly, nobody wants to argue against a reasonable request by a referee. Secondly, often the authors are happy to share their work results and do a really decent job. Finally, on more than one occasion already requesting the data was enough for the authors to find a minor error in their manuscript and fix it. In the current system where publishing this supplementary information does not bring any benefit, the authors are seldom motivated to make their code understandable and data accessible. Once a review requests the data and the code, the situation changes: now whether the paper gets published also depends on the result of this additional evaluation.

So from now on, whenever I review a manuscript, in addition to any other topics relevant to the review, I am going to write the following [3]:

The obtained data as well as the code used in its generation and analysis constitute a significant part of the research output. Therefore in order to establish its correctness I request that the authors submit both for review. Additionally, for the readers to be able to perform the same validation I request that the authors upload the data and the code to an established data repository (e.g. Zenodo/figshare/datadryad) or as supplementary material for this submission.

I hope you join me and do the same [4].

[1] One has to note that the data management plans are mostly overlooked during the review.
[2] Full disclosure: I'm a member of SciPost editorial college.
[3] Obviously, I'll adjust this bit if the paper doesn't have code or data to speak of.
[4] Consider that bit of text public domain and use it as you see fit.