Apr 6

Can Open Research Data Prevent Errors, Pain, and Suffering?

Science Education, Science, Accessibility, Open Data, Open Access, Policy, R&D Management, Research, Research Data, Peer Review, Transparency

By Dennis D. McDonald

An interview on NPR's Morning Edition on April 6 with Richard Harris about his new book Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions highlights problems that sometimes arise when poorly conducted or hastily published research leads to pain, suffering, and wasted money.

Some will point to this as a reason to reduce NIH's biomedical research funding. It's also worth noting that a partial solution is already being put in place in many areas of scientific research: the move to make the underlying data of scientific research more open and accessible to the public and to other researchers. I've written about some of these efforts here.

Making research data more open and accessible doesn't guarantee that data-generating work will be verified or disproven through further analysis or replication. Pressure on researchers to do something "original" or "creative" may outweigh any willingness to redo previous work. Funding agencies may themselves prefer to fund new or original work. Or they may be reluctant to fund directly the potentially costly efforts needed to guarantee accessibility of data generated by previous research.

Nevertheless, when decisions are being made about investing in expensive drug research -- or in any research program that involves lives or the spending of vast sums of money -- evaluating the sources of previously published findings being used to support, say, an NIH grant application just makes good "due-diligence" sense.

There are several reasons why previous research and analysis may be difficult to replicate:

The original research or analysis was fraudulently conducted and the data tampered with.
There was no data tampering but mistakes or unconscious bias inadvertently crept into the previous research.
The conditions surrounding the original research cannot be replicated perfectly, thus creating a challenge to reproducing the exact findings in the same way.

In other words, just making the data from previous research available will not by itself be sufficient to overcome all the problems listed above. Equipment, methods, conditions, context, and other factors all influence research findings, and they are difficult to document in a brief peer-reviewed journal article -- or in raw data files.

One possible but extreme idea is to make all points in the research cycle more open, accessible, and transparent, not just the data that "comes out the other end." As nice as this may sound in theory, the practical (and cost) implications of doing so are extensive.

While I personally support making well-documented research data more open and accessible, and making that part of the normal "cost of doing business" for scientists and other researchers, the reality is that many other changes also need to take place before open research data and access are universal. For example, not all researchers are ready to share their data. While research funders may increasingly make data accessibility a condition of funding, there are still a lot of details to work out concerning how this will be accomplished and sustained over time.

One intermediate solution is to encourage and accept the use of social media and social networks as a normal part of communication among researchers. The idea is not to use such media as a data exchange mechanism but as a way to encourage relationships, trust, and collaboration, especially among researchers working in different institutions or fields. This is already happening among younger researchers and is encouraged to varying degrees by traditional institutions such as professional associations.

Another, longer-term solution is to continue emphasizing traditional and rigorous research methods in scientific training. Re-analyzing existing data does not free the researcher from understanding traditional concepts of experimental design, even if new "big data" and "data science" techniques are being used in the analysis. After all, it always pays to understand the sources of error and variation in one's data!