Should you share your data or not?
“Most researchers already endorse data sharing in principle. If you are asked by a reviewer or an editor to share the data behind your article, for example, most researchers would be completely willing to do so and would see it as a necessary part of the peer review process. Going further, I think most researchers would expect to share their data, or would expect another researcher to share theirs, on reasonable request after publication.
So, in principle, data sharing is already part of the scholarly communication ecosystem. The problem is that most of it happens behind the curtain, so to speak.
It isn’t done in publicly available repositories. It’s done through individual communication between researchers.
There are quite a few reasons why this behind-the-scenes data sharing is problematic.
The first is (1) preservation. Email addresses expire. People become difficult to contact. People move universities. But even if a researcher can still be contacted, most of this data actually lives on private hard drives, which we all know can be quite fragile. Try finding that hard drive 10 years later and hoping it still works, or finding the data after it has been moved to a different drive. Some of this data just lives on a postdoc’s laptop. For preservation to work, the data needs a stable home, and that means a publicly available repository.
Another problem with behind-the-curtain data sharing is that it (2) doesn’t allow for the full economic impact of research. What I mean is that data gets reused only to the extent that an individual researcher, or perhaps a small collaboration, can reuse it. But data can be reused in many ways, and no single researcher can decide or even envision all the ways it might be reused. A good example is the Human Genome Project, which has had a $965 billion impact on the US economy. That impact comes not just through research, but also through tools developed by private industry as well as by researchers.
Behind-the-scenes data sharing also (3) doesn’t work when there is an urgent need to share data. We’ve seen such cases recently with, for example, the Zika virus and the Ebola outbreak. In these cases, we often see data shared right at the beginning, and scientists are very good at sharing not only sequence data but also epidemiological data. However, although sharing is strong at the start of an outbreak, we often see a data gap open up once the outbreak fades from media attention. What’s really needed is a more consistent approach to sharing data in such cases.
(4) Some problems in research are simply too large for any one individual or collaboration. In such cases, we need a global view of the data rather than the view of an individual or a small research group. Cancer is a really good example. To beat it, we’re going to need the power that comes from looking at all of the data. There are just too many types of cancer, and too many mutations within each type (thousands of them), so we need the scale that only shared, open data can provide.
Benefits of sharing data
Sharing data can also benefit researchers in many ways. For example, several studies across different fields now show a citation advantage for articles whose data are shared and linked to them. This makes sense: the repository provides an extra link to your article, so both your data and your article become more discoverable.
Sharing your data also allows other scientists to test, validate, and reuse it. This makes your work more credible, because it makes it more reproducible.
In addition to the citation advantage, research has shown that collaborative studies tend to have higher impact, and data sharing helps attract collaborations.”
—————————————————————————————————————
Source: Amye Kenall (2018): The importance of sharing data. Nature Masterclasses course. Available at: https://masterclasses.nature.com/courses/28/videos/718 Retrieved on 18th February 2019