Data Standardization Scores and Changing the DATA Act
My post Data Standards and Data Dictionaries Need Data Governance discussed the need for governance when moving an organization (or organizations) toward the adoption of data standards as is the case with the DATA Act.
To manage this effectively you need to know where you are coming from and where you are going. To help illustrate, I’ve created a simple scale for understanding and organizing the costs and benefits associated with improved data standardization:
Figure 1. Data Standardization Scale
At the left side, rated at a maximum of 10, is “1 Standard to Rule Them All.” For a given set of entities, processes, or transactions, this means adopting and using a single standard describing data and metadata contents and formats. If multiple systems are involved, they must all adopt this single standard and adjust processes and procedures accordingly. Even when the benefits of adopting a single standard outweigh the costs, you need to understand how to manage or “govern” these costs as you go through the adoption process.
The middle state is “Interoperability.” This means that multiple systems and data standards are able to communicate and share information accurately and efficiently, even when the same thing is described differently by different systems. This may require a translation approach that is applied retroactively or in real time when system or data standard boundaries are encountered. In any modern organization there will be multiple systems and processes that need to share data, and this is especially true for Federal agencies. How such interoperability is handled, even when it is an intermediate condition on the way to total standardization, will impact efficiency, effectiveness, and service quality. An added consideration might be reliance on multi-client, cloud-based, or commercial systems where data standards are intended for use by multiple organizations.
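To make the “translation at the boundary” idea concrete, here is a minimal sketch of what such a crosswalk might look like in code. Everything in it is invented for illustration: the field names, the status codes, and the target schema are hypothetical, not drawn from any actual Federal system.

```python
# Hypothetical example: translating a record from one system's local field
# names and code values into a shared target schema at a system boundary.
# All names and codes below are invented for illustration.

# Crosswalk from System A's status codes to the shared standard's values
STATUS_CROSSWALK = {"A": "active", "I": "inactive", "P": "pending"}

# Mapping from System A's field names to the shared schema's field names
FIELD_MAP = {"vend_nm": "vendor_name", "amt": "obligation_amount", "stat": "status"}

def translate_record(record: dict) -> dict:
    """Rename fields and recode values so a record conforms to the shared schema."""
    out = {FIELD_MAP.get(key, key): value for key, value in record.items()}
    if "status" in out:
        out["status"] = STATUS_CROSSWALK.get(out["status"], "unknown")
    return out

print(translate_record({"vend_nm": "Acme Corp", "amt": 1250.00, "stat": "A"}))
# {'vendor_name': 'Acme Corp', 'obligation_amount': 1250.0, 'status': 'active'}
```

In a real interoperability effort these mappings multiply quickly, one per system pair or per system and shared standard, which is exactly why the governance question of who owns and maintains the crosswalks matters.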
At the right side we see what I have called “Chaos.” This implies no data standardization across systems. Unified reporting here can be difficult or expensive when management attempts to create a single view across operations. Data exchange is expensive and error-prone if special or custom translation methods must be built to accommodate it.
When real money is involved a chaotic data standards situation can lead to significant — and expensive — errors. Risk associated with these errors will be at least partly driven by the use being made of the data, e.g., are the data feeding into benefits calculation or check-writing? Or are data feeding a high level analysis across multiple data types that is examining general demographic or regional trends, not characteristics of individuals or institutions? In such situations modern data analysis and mapping software might help you by speeding up the process of understanding how data in different systems are organized. This might be especially useful if you are combining data standardization and system consolidation efforts.
Standardization and transparency
What got me thinking about the above and the need to assess both “as is” and “to be” states was a series of articles about OMB-requested changes to the DATA Act currently before Congress, one by Hudson Hollister of the Data Transparency Coalition, the other by Jim Harper of the Cato Institute. In both cases OMB and the Obama Administration are taken to task for having apparently asked Congress to weaken provisions of the DATA Act that require data standardization around Federal spending. (My own Senator Warner, a co-sponsor of the DATA Act, has stated his opposition to OMB’s suggested changes to the DATA Act.)
These articles are pretty clear that pulling back from financial data standardization reduces government transparency, which seems opposed to the “open government” policies adopted by the Obama Administration.
Standards and transparency: related but not identical
While I’m not privy to the politics going on “behind the scenes” I do think it’s useful to consider how data standards and transparency are related.
Data can be standardized but not transparent
You can have data that are completely standardized across multiple systems yet, for one reason or another, they are not made available to the public. Perhaps the reason is purely political — i.e., an agency or legislator wishes not to air “dirty linen” in public. Perhaps the reason is associated with national security — e.g., don’t tell your adversaries how much you are spending on Defense System X.
Data can be transparent but not standardized
On the other hand, you can have non-standardized data — even “chaotic” data — that can be quickly and cheaply made available piecemeal online as data files are generated. Making government reports available as difficult-to-extract .pdf documents can be a case in point. Yes, the published files might not be totally standardized and might even use different coding standards to refer to the same entity; this can be maddening when the same organization is named or referred to in different ways in data files that describe different activities or transactions.
Imagine life without standardized geocoding or life without zip codes and you get the picture of what operators of different financial systems within the Federal government deal with every day. (Note: be sure to check out PDF Liberation for information about efforts to extract useful structure from such documents.)
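The “same organization, different spellings” problem described above is something data users routinely patch over in code. A minimal sketch, assuming some invented variant spellings and a deliberately crude set of normalization rules, might look like this:

```python
# Hypothetical sketch: collapsing variant spellings of an organization name
# into a single matching key. The names and suffix list are invented for
# illustration; real entity resolution is far more involved.
import re

# Common corporate suffixes to strip before matching (illustrative, not exhaustive)
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc"}

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop corporate suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(token for token in tokens if token not in SUFFIXES)

# Three spellings of the same (fictional) entity collapse to one key:
variants = ["ACME Corp.", "Acme Corporation", "acme, inc."]
print({normalize_name(v) for v in variants})  # {'acme'}
```

The point of the sketch is the cost it represents: every data consumer who needs to join non-standardized files ends up rebuilding logic like this, which is precisely the duplicated effort that shared identifiers and data standards eliminate.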
One can make the argument that it is better to have non-standardized data available to the public than no data at all, especially given the availability of cloud-based data publishing tools and the growing number of organizations expressing an interest in developing commercial products — and jobs — on top of public data. Again, you can still score a “10” on the above “data standardization scale” and still rate close to “0” in transparency.
Back to governance
I keep coming back to the need for a governance structure when it comes to developing and implementing data standards. We’ve already seen that a lack of real governance and control can lead to wheel-spinning and delay; see A Project Manager’s Perspective on the GAO’s Federal Data Transparency Report for an example of what happens when authority and funding aren’t adequate.
I was very encouraged by what I heard at the December Data Transparency Coalition breakfast where those most likely to be responsible for implementing the DATA Act discussed high level planning. While I was impressed with what I heard I also expressed some concern, based on my own experience managing data projects, that the following needed to be addressed:
- Will the DATA Act provide sufficient resources to ensure that staff, technology, and industry collaboration are available and effectively planned and managed?
- Will appropriate governance mechanisms be in place to ensure that what the DATA Act requires is accompanied by appropriate authority and accountability mechanisms?
- If a choice has to be made between (1) making non-standardized or even “dirty” data accessible and usable in the short term and (2) waiting longer until data and metadata standards are developed, agreed upon, and implemented, how will such decisions be made?
Based on this recent news about OMB’s recommended changes to the DATA Act, it has become less clear how these concerns will be addressed.
The basic ideas behind the DATA Act’s focus on financial data standardization make such eminently good sense that efforts to weaken such standardization should be carefully and openly assessed. Fundamentally, data standardization, if managed well, can reduce costs, improve data manageability, reduce errors, and improve communication. Implementing data standards can also improve how data transparency efforts are supported — as long as the people who operate the underlying systems want to be more transparent.
Copyright © 2014 by Dennis D. McDonald