Dec 16

Looking Beyond Open Data Availability to Managing Open Data Value

Developing Countries, Metrics, Open Data, Project Management

In “The global ‘open’ pulse from the 2014 Open Data Index,” Jason Hare reviewed the recent Open Knowledge report on annual trends in open data around the world. Jason’s headline is:

Overall the level of “open” is down to 11% from 15% a year ago

I won’t quibble with the “crowdsourced” and “snowball sample” methodology used to develop this annual Index. I’ve been involved in large-scale sampling and trend analysis efforts myself, and the rules, definitions, and weighting scheme used by Open Knowledge appear to be well thought through.

An important question, it seems to me, is why we aren’t seeing greater and more rapid adoption of open data practices when the number and variety of initiatives worldwide seem to be increasing. Do the Open Knowledge numbers reflect a global “slowdown” or pullback in open data efforts? Or do they reflect the very real challenge of conducting longitudinal studies in an environment where conditions are still evolving?

Part of the answer may be hinted at in Christian Kreutz’s excellent reality check, “7 lessons learnt in five years of citizen participation & civic innovation.” Kreutz reviews the very real challenges open data initiatives face once they get underway. His points include: they take time to implement; they require the participation of people who might not be accustomed to working together; it’s easy to overemphasize the importance of technology; significant resources are needed to make the initiatives sustainable; and “impact” is difficult to guarantee, or even to measure.

My comment on his article was:

Excellent discussion of the real world. Policy and theory are great but only usefulness and impact lead to real sustainability.

While it is important to measure how open data availability is trending, the really important question is whether or not open data initiatives are having positive impacts.

Impacts, unfortunately, are not easy to pin down:

The impacts may or may not be related to the goals of the government program that generates the open data.
The impacts may be difficult to measure.
The impacts may occur over a long period of time, making them difficult to track.

One hoped-for impact of many open data programs is that open data will be used to develop products, services, and jobs. This is certainly one of the goals of the U.S. Government’s NOAA in its big data partnership program, where the potential commercial value of government-supplied data is explicitly being incorporated into open data planning.

The business value of open data is also an important planning consideration for programs in developing countries. In “New surveys reveal dynamism, challenges of open data-driven businesses in developing countries,” Prasanna Lal Das and Alla Morrison reported on a recent World Bank research effort that surveyed “data intensive” startup businesses around the world about their business activities and investment requirements. The surveys found that government-supplied open data was still in the “early stages” of adoption as a key element in developing commercial products and services. Reasons included a lack of awareness of open data availability, a failure to distinguish between “open data” and “big data” when reporting, the difficulty of obtaining and processing data from a variety of sources, and a lack of investment capital to support development efforts.

My own view (see “Don’t Just Make Data Open, Make Open Data Useful!”) is that open data initiatives need to be planned from the start around the usefulness of the data to the various user groups being targeted. Those user groups, both end users and intermediaries, need to be included in planning from the outset. Also, some sort of “total open data lifecycle cost” should be estimated for every data set proposed for delivery through an open data program. For example, if the commercial value of derivative products is part of the planning process, the costs of data cleaning and standardization need to be considered, and so should the presence or absence of a workforce or audience with an appropriate level of data literacy.

The idea of taking a “holistic” view of the open data planning process is not new. Commercial publishers, software developers, and data-intensive startups such as those surveyed by the World Bank have all had to juggle many different resources, from requirements definition through delivery and consumption. I even remember, back when I was managing the design and conduct of statistical surveys about professional information products, having to resist pressure to add “nice to know” questions to survey questionnaires because of their downstream impact on follow-up, data editing, analysis, and reporting costs.

Because open data program managers need to continually monitor and manage data usefulness, usage impacts, and commercial viability, tracking open data availability should never be treated as an adequate proxy for whether open data programs are doing us any good.