More Perspectives on Sharing Large Open Research Data Sets: Physics
“We’re probably going to see an increasing number of reports like Genetic Drivers of Immune Response to Cancer Discovered Through ‘Big Data’ Analysis where access to and analysis of a large body of previously collected data leads to significant findings.”
One area where data sharing is already active is in physics, as described by the University of Notre Dame’s Thomas McCauley in his presentation HEP Data for Everyone: CERN open data and the ATLAS and CMS experiments.
CERN launched its Open Data Portal for data from its Large Hadron Collider experiments in 2014. McCauley’s presentation puts this into historical context.
What I found most interesting about McCauley’s presentation is how CERN’s open data policies and practices are intertwined with CERN’s dual mission of research and education. The topics of CERN’s research, the data it generates, and the communities its programs serve are all complex. There are, as I like to say, “many moving parts.”
McCauley provides a nuanced view of open data in the CERN context that I believe helps make sense of a very complex situation. Slide number 5 describes four “levels of access to data”:
- Level 1: data directly related to publications
- Level 2: simplified data formats suitable for education and outreach
- Level 3: “analysis-level” reconstructed data, simulation, and software
- Level 4: raw data and associated software
The rest of the presentation is devoted to the tools and approaches for making different types of data available for different uses to different groups. It makes for instructive reading even if your open data program doesn’t reach the volume or complexity of CERN data.
Based on my own reading of the presentation I had some reactions that might be useful when applying these ideas to other programs.
It’s clear from McCauley’s presentation that what has evolved at CERN is significantly more than just “throwing data over the fence and hoping people will analyze it.” Serious “wrapper” services are required to make the data useful in different ways to different users, and it’s clear that time, attention, and money have been devoted to creating and sustaining those services. As I’ve noted in other open data contexts, addressing the “who pays for what” cost issues head-on is a must-have part of your strategy; this seems to have been done at CERN.
It’s also likely that what looks like a well-organized open data program now actually evolved not smoothly but in fits and starts. That’s not a criticism but the reality of what happens when you enter into a program where both the research and the methods for sharing data are evolving. You have to experiment to see what works. You also have to be ready for surprises – and failures. This is especially true when new methods and approaches for making data available and analyzable are being introduced (e.g., see Informatica Unveils Hourly-Priced AWS Data Management Tools).
Having a governance framework also helps. I don’t just mean a data governance framework that defines and maintains the quality and currency of data and metadata, but an approach to overall data program governance that is empowered to orchestrate, coordinate, and, where appropriate, require action. Such program governance will only be effective if it is closely aligned with the program itself, not just its “open data” goals.
Related reading:
- The Commerce Data Advisory Council's 2nd Meeting: Storytelling, Staff Recruiting, and Complex Processes
- Data Program Governance and the Success of Shared Digital Services
- Developing a Basic Model for Data Analytics Project Selection
- Introduction to PLANNING AND MANAGING BIG DATA PROJECTS: SELECTED ARTICLES
- Learning From General Electric’s Big Data Challenges
- Managing Data-Intensive Programs and Projects: Selected Articles
- Problems and Opportunities with Big Data: a Project Management Perspective
- Should Clinical Trial Data Sharing Be a Precondition for Refereed Journal Article Acceptance?
- Some Perspectives on Sharing Large Open Research Data Sets
- The Tip of the Spear: Connecting Big Data Project Management with Enterprise Data Strategy
- What Kind of Management Structure Is Needed to Govern a Data Analytics Program?
Copyright (c) 2016 by Dennis D. McDonald, Ph.D. An independent consultant located in Alexandria Virginia, Dennis’ interests include project, program, and data management; market assessment, digital strategy, and program planning; change management; and, technology adoption. Clients have included HHS CMS, U.S. Dept. of Veterans Affairs, National Academy of Engineering, the World Bank, and the U.S. Environmental Protection Agency. His professional web site is here: http://www.ddmcd.com. Follow Dennis on LinkedIn, Twitter, and Google+. Reach him by email at firstname.lastname@example.org.