Data Standards and Data Dictionaries Need Data Governance
I had the good fortune recently to spend an hour on the phone with Peter Benson, founder and CEO of ECCMA, the Electronic Commerce and Code Management Association.
ECCMA, among other things, is the project leader for the standards programs ISO 22745 (open technical dictionaries and their application to the exchange of characteristic data) and ISO 8000 (information and data quality).
I wanted to talk with Peter about the “total cost of standardization,” which I’ve written about previously. Given the likely impact of the coming U.S. DATA Act on access to federal spending data, and the continued development of federal “open data” programs, I wanted to pick the brain of someone for whom data and metadata standardization is a “meat and potatoes” business.
The problem of secrecy
At the center of Peter’s concern about data standardization is secrecy. ECCMA promotes industry data standardization through open access data dictionaries that are built and maintained collaboratively, an approach that depends on individuals and organizations cooperating and sharing information as they develop and maintain those dictionaries. The goal is use and reuse of dictionary contents, which saves cost and time by avoiding duplication. Organizational or industry secrecy aimed at short-term political or financial gain runs directly counter to these objectives.
Too much concern about costs?
Peter also believes that too much concern over the “costs of standardization” could be misguided, given the increasing availability of cloud-based technologies that lower the cost of storage and access. Technology also facilitates necessary data operations such as data mapping and data analytics, both of which are needed in some form in any effort to create technical data dictionaries that describe term meanings, equivalences, and relationships.
Needed: Data Governance
One challenge, Peter believes, is connected with implementing appropriate data governance practices. This is not something that the IT department should attempt on its own. Decisions related to concept definitions, concept relationships, syntax, and data usage in business and industry must meaningfully involve business stakeholders.
I’m currently reviewing the wealth of information ECCMA makes available on its website, and I agree with Peter about a lot of things, especially the importance of data governance and the need to involve both business and IT in decisions about “data standards.” How data definitions and relationships are established and governed inside and outside an organization requires what I call a collaborative project management approach, where planning, flexibility, sharing, and trust are critical to success.
When thinking about “data governance,” it’s important to consider how we define “data standardization.” The shareable and collaboratively built open access data dictionary concept promoted by ECCMA originated from the needs of manufacturers, supply chain managers, EDI specialists, and NATO governments. It has proven to be a pragmatic approach to organizing how concepts, terms, and their relationships are defined and used. People using different information systems and languages need to communicate and share information about the same product or concept, and they need to do so without necessarily using the same language or terminology. A data dictionary supports this because the relationships between concepts and the different terms used for them can be made explicit, shareable, and reusable, and can serve as the basis for on-demand translation.
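To make the dictionary idea a little more concrete, here is a minimal Python sketch, assuming a simple in-memory model in which each concept has an identifier and a definition, terms in any language point to that concept, and relationships between concepts are stored explicitly. The class and method names (`DataDictionary`, `add_term`, `translate`), the concept identifiers, and the example terms are all hypothetical; this is not ECCMA's dictionary format or the ISO 22745 data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept: an identifier, a definition, and its terms per language."""
    concept_id: str  # hypothetical identifier, not a real ECCMA or ISO 22745 code
    definition: str
    terms: dict[str, list[str]] = field(default_factory=dict)  # language -> terms


class DataDictionary:
    """Hypothetical in-memory dictionary linking terms to shared concepts."""

    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        # (language, lowercased term) -> concept_id, so any synonym finds its concept
        self._index: dict[tuple[str, str], str] = {}
        # Explicit concept-to-concept relationships: (subject_id, relation, object_id)
        self.relations: list[tuple[str, str, str]] = []

    def add_concept(self, concept_id: str, definition: str) -> None:
        self.concepts[concept_id] = Concept(concept_id, definition)

    def add_term(self, concept_id: str, language: str, term: str) -> None:
        # The first term added for a language is treated as the preferred term
        self.concepts[concept_id].terms.setdefault(language, []).append(term)
        self._index[(language, term.lower())] = concept_id

    def relate(self, subject_id: str, relation: str, object_id: str) -> None:
        self.relations.append((subject_id, relation, object_id))

    def related(self, concept_id: str, relation: str) -> list[str]:
        return [o for s, r, o in self.relations if s == concept_id and r == relation]

    def lookup(self, language: str, term: str) -> str | None:
        return self._index.get((language, term.lower()))

    def translate(self, term: str, source_lang: str, target_lang: str) -> str | None:
        # Translation goes term -> concept -> term, never directly term -> term
        concept_id = self.lookup(source_lang, term)
        if concept_id is None:
            return None
        targets = self.concepts[concept_id].terms.get(target_lang, [])
        return targets[0] if targets else None


def build_example() -> DataDictionary:
    dd = DataDictionary()
    dd.add_concept("C001", "Appliance that keeps food cold")
    dd.add_concept("C002", "Household appliance")
    dd.add_term("C001", "en", "refrigerator")
    dd.add_term("C001", "en", "fridge")  # synonym mapped to the same concept
    dd.add_term("C001", "fr", "réfrigérateur")
    dd.add_term("C001", "de", "Kühlschrank")
    dd.add_term("C002", "en", "household appliance")
    dd.relate("C001", "is_a", "C002")
    return dd


dd = build_example()
print(dd.translate("fridge", "en", "de"))  # Kühlschrank
print(dd.related("C001", "is_a"))  # ['C002']
```

Because every term points to a concept rather than to other terms, adding a new language means attaching terms to existing concepts, not building a new pairwise translation table for every language combination.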
Data governance and automation
I am also currently researching the extent to which the development of such data dictionaries can be automated, given the constant rise in electronic data volume and accessibility. “Data governance” will be important in situations such as DATA Act implementation, where financial data from different industries, conceptual models, and systems will need to be managed and inconsistencies resolved. Another factor driving the need for improved data governance is the current Administration’s push for government agencies to open up federal data. Making more data available will probably reveal significant overlaps in data management and collection across many agencies, and this will generate a demand for more standards, more sharing of resources, and improved data governance.
The AllWeb — the Internet of Things
Finally, the “Internet of Things,” which I’m starting to refer to as the “AllWeb,” is upon us. This will generate large volumes of online-accessible data as everything from toasters and refrigerators to sump pumps, thermostats, workout results, and medical records goes online. Making sense of such varied volumes of data will require a variety of approaches, including automated approaches relying on algorithms and semantic technologies. Many of the decisions related to large data volumes will also benefit from the formalism introduced by organized data governance programs. Planning how to do that should already be underway.
Copyright (c) 2014 by Dennis D. McDonald. For more about Data Program Management go here. Contact Dennis by email at firstname.lastname@example.org or by phone at 703-402-7382. Check out his curated Managing Data collection on Google+.