OMB to release data inventories

The White House’s Office of Management and Budget (OMB) has agreed to make federal data inventories available in JSON or CSV format on a quarterly basis starting on February 28 of this year. As described by the Sunlight Foundation, this positive resolution of a longstanding Freedom of Information request has the potential to significantly improve federal program transparency.

Soon it will be easier to see what data resources are being maintained by individual federal agencies. Based on that information, the public should be better able to ask serious questions about the program activities described by the inventoried data assets and about the programs’ effectiveness.

Certainly this is a good thing. I’ll take openness over “secrecy” any day, especially when it comes to how my tax dollars are being spent.

More than just data are needed

Still, knowing that data exist, which is what the inventories will tell us, is not the same as accessing and interpreting the data. Even assuming the public eventually gains access to the inventoried data, we’ll still need contextual information about the programs described by the data and measurement of the impacts these programs have.

Traditionally, impact measures have been harder to come by than raw data describing operations. Plus, individual agencies have already been active in publishing their data or metadata; see discussions of data.gov, EPA, NOAA, CMS, and USAID. Hopefully the different data access processes will be coordinated.

Why data release takes time

There are several reasons why government agencies are slow to release data to the public. Some reasons are good (for example, the agency must first remove personally identifiable information), some are bad (for example, someone is trying to hide something), and some fall somewhere in between (for example, the agency is overhauling critical systems and lacks the resources needed to make requested data public right now).

There are also several reasons for releasing data to the public. These involve basic transparency (for example, letting people know how their tax dollars are spent), creation of additional channels for communicating agencies’ efforts to target audiences, and stimulation of private sector innovation in new, unusual, or even profitable ways.

Needed: clean data and metadata

All such efforts require the release of clean, high-quality data and metadata. Making people reliant on data containing errors, even when data are released in the name of “transparency,” can cause problems down the road.

Data management professionals know this. In an ideal world, data will only be released to the public when the data and associated metadata are clean, consistent, and standardized, and when information and analytical resources (such as documented APIs) are available to support contextually meaningful analysis, visualization, and interpretation.

In the real world, of course, immediate perfection may not be possible. If you’ve ever been involved in a database creation or conversion process, you know that error detection and correction are ongoing processes, especially when data are moved between systems or when manual processing is involved somewhere in the data management lifecycle. Mistakes can happen. Sometimes these mistakes don’t get noticed until after the data are released and closely scrutinized. I won’t be surprised if errors or inconsistencies are detected in the data inventories and the data sets they describe after public release begins. I would also expect that “errors” will be picked up by those with an ax to grind.

So be it. The sooner that errors are caught and corrected the better. Plus, as pointed out by the Sunlight Foundation, making data inventories available leads to discussions of priorities for further data release and analysis.
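To make the idea of ongoing error detection a little more concrete, here is a minimal Python sketch of the kind of routine check that could be run against an inventory file each time it is published. It is only an illustration: the field names (title, description, publisher, modified) and the sample records are assumptions made up for this example, not the actual structure of the OMB inventories.

```python
import json

# Hypothetical required fields; a real inventory schema may differ.
REQUIRED_FIELDS = ("title", "description", "publisher", "modified")


def find_problems(records):
    """Return (index, message) pairs for records that look incomplete or duplicated."""
    problems = []
    seen_titles = set()
    for i, record in enumerate(records):
        # Flag required fields that are absent or blank.
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if value is None or not str(value).strip():
                problems.append((i, f"missing {field}"))
        # Flag titles that repeat an earlier record, ignoring case.
        title = str(record.get("title") or "").strip().lower()
        if title:
            if title in seen_titles:
                problems.append((i, "duplicate title"))
            seen_titles.add(title)
    return problems


# Made-up sample data for illustration only.
SAMPLE = json.loads("""
[
  {"title": "Grant Awards", "description": "Awards by year",
   "publisher": "Agency A", "modified": "2015-01-15"},
  {"title": "grant awards", "description": "",
   "publisher": "Agency A", "modified": "2015-01-20"},
  {"title": "Facility Inspections", "description": "Inspection results",
   "modified": "2015-02-01"}
]
""")

for index, message in find_problems(SAMPLE):
    print(f"record {index}: {message}")
```

A check like this catches only the mechanical problems, such as blank descriptions or repeated titles. Whether the described data are accurate or meaningful still takes people who understand the programs behind them.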

Needed: a coherent strategy

Still, part of me would like to see data released as a matter of course, as the result of a coherent strategy that aligns agency goals with real-world strategies tied to performance metrics. This goes back to my belief that “open data” programs shouldn’t just be “bolted on” as an afterthought to government websites but should be integrated with the programs that generate the data. Commercial services such as Socrata’s GovStat and OpenGov provide a tool-based approach to providing such transparency, but governments still need to do the necessary planning around how to measure progress against public goals.

One thing that’s definitely needed is leadership that knows how to relate the metrics describing what the agency does (for example, operating costs) to how well the agency performs.


Copyright © 2015 by Dennis D. McDonald, Ph.D. Dennis is a management consultant based in Alexandria, Virginia. His experience includes consulting company ownership and management, database publishing and data transformation, managing the integration of large systems, corporate technology strategy, social media adoption, statistical research, and IT cost analysis. Clients have included the U.S. Department of Veterans Affairs, the U.S. Environmental Protection Agency, the National Academy of Engineering, and the National Library of Medicine. He has worked as a project manager, analyst, and researcher throughout the U.S. and in Europe, Egypt, and China. His web site is located at www.ddmcd.com and his email address is ddmcd@yahoo.com. On Twitter he is @ddmcd.