Data Standards and Data Dictionaries Need Data Governance
Data standards are important. Just as important are the processes through which data standards are developed and governed.
I once had the good fortune to speak with Peter Benson, founder and CEO of ECCMA, the Electronic Commerce and Code Management Association.
Among other things, ECCMA is the project leader for the standards programs ISO 22745 (open technical dictionaries and their application to the exchange of characteristic data) and ISO 8000 (information and data quality).
I wanted to talk with Peter about the “total cost of standardization” which I’ve written about here. Given the likely impact of legislation such as the U.S. DATA Act on access to federal spending data, and the continued development of federal “open data” programs, I wanted to “pick the brain” of someone for whom data and metadata standardization is a “meat and potatoes” business.
The problem of secrecy
At the center of Peter’s concern about data standardization is secrecy. ECCMA promotes industry data standardization through open access data dictionaries that are built and maintained collaboratively, with individuals and organizations cooperating and sharing information to develop and maintain them. The goal is use and reuse of dictionary contents, which saves cost and time by avoiding duplication. Organizational or industry secrecy in pursuit of short-term political or financial gain would fly in the face of such objectives.
Too much concern about costs?
Peter also believes that too much concern over the “costs of standardization” could be misguided given increasing availability of cloud-based technologies that lower the cost of storage and access. Technology also facilitates necessary data operations such as data mapping and data analytics, both of which are needed in some form in any effort at creating technical data dictionaries describing term meanings, equivalences, and relationships.
Needed: Data Governance
One challenge, Peter believes, is connected with implementing appropriate data governance practices. This is not something that the IT department should attempt on its own. Decisions related to concept definitions, concept relationships, syntax, and data usage in business and industry must meaningfully involve business stakeholders.
I’m currently reviewing the wealth of information ECCMA makes available on its website, and I agree with Peter about a lot of things, especially the importance of data governance and the need to involve both business and IT in decisions about “data standards.” Deciding how data definitions and relationships are defined and governed inside and outside an organization requires what I call a collaborative project management approach, where planning, flexibility, sharing, and trust are critical to success.
When thinking about “data governance,” how we define “data standardization” matters. The shareable and collaboratively built open access data dictionary concept promoted by ECCMA originated from the needs of manufacturers, supply chain managers, EDI specialists, and NATO governments. It has proven to be a pragmatic approach to organizing how concepts, terms, and their relationships are defined and used. People using different information systems and languages need to communicate and share information about the same product or concept, and they need to do so without necessarily using the same language or terminology. The data dictionary supports this because relationships between concepts and different terms can be made explicit, shareable, and reusable, and can serve as the basis for on-demand translation.
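To make the idea concrete, here is a minimal sketch of a concept-centered dictionary. It is not ECCMA's actual data model or exchange format; the class names, the identifier "C-0001", and the sample terms are hypothetical and chosen only for illustration. Each concept carries one definition plus the terms used for it in different languages or systems, so translation becomes a lookup through the shared concept.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Concept:
    """One concept and the terms different communities use for it."""
    concept_id: str                 # hypothetical identifier, not a real registry code
    definition: str
    terms: Dict[str, str] = field(default_factory=dict)  # context -> term


class OpenDictionary:
    """Maps terms in any context (language or system) to a shared concept."""

    def __init__(self) -> None:
        self._concepts: Dict[str, Concept] = {}
        # (context, lowercased term) -> concept_id
        self._index: Dict[Tuple[str, str], str] = {}

    def add(self, concept: Concept) -> None:
        self._concepts[concept.concept_id] = concept
        for context, term in concept.terms.items():
            self._index[(context, term.lower())] = concept.concept_id

    def translate(self, term: str, source: str, target: str) -> Optional[str]:
        """Return the target-context term for the same concept, or None."""
        concept_id = self._index.get((source, term.lower()))
        if concept_id is None:
            return None
        return self._concepts[concept_id].terms.get(target)


dictionary = OpenDictionary()
dictionary.add(Concept(
    concept_id="C-0001",
    definition="A rolling-element bearing that uses balls between the races.",
    terms={"en": "ball bearing", "fr": "roulement à billes", "de": "Kugellager"},
))

print(dictionary.translate("ball bearing", "en", "fr"))
```

Because every term points at a concept rather than at another term, adding a new language or system means adding one entry per concept instead of one mapping per pair of vocabularies.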
Data governance and automation
I am also currently researching the extent to which the development of such data dictionaries can be automated, given the constant rise in electronic data volume and accessibility. “Data governance” will be important in situations such as DATA Act implementation, where financial data from different industries, conceptual models, and systems will need to be managed and inconsistencies resolved. Another factor driving the need for improved data governance is the movement by government agencies to open up Federal data, which the current Administration is pushing. Making more data available will probably reveal significant overlaps in data management and collection across many agencies, and this will generate demand for more standards, more sharing of resources, and improved data governance.
The Internet of Things
Finally, the “Internet of Things” is upon us. It generates large volumes of electronic data as everything from toasters and refrigerators to sump pumps, thermostats, workout results, and medical records goes online. Making sense of such large and varied volumes of data requires a variety of approaches, including automated approaches that rely on algorithms and semantic technologies.
Where to start?
Many of the decisions related to large or complex data volumes will benefit from the structure and discipline introduced by organized data program governance initiatives that explicitly connect business objectives to how data are governed.
In “Improving Data Program Management: Where to Start?” I suggest the following topics to address when developing your own data program governance strategy:
- Data inventory. Do you know what data you have?
- Data provenance and process ownership. Who is responsible for how your data are used, regardless of where the data originate?
- Metadata repository. What are the terms, concepts, and process connections related to your data that you need to document and control?
- Data governance and stewardship. How do you efficiently manage your data as your organization and its data requirements continue to evolve?
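One way to start on these four topics is to keep a single record per data asset and check it for gaps. The sketch below is illustrative only; the field names, the sample assets, and the `governance_gaps` helper are hypothetical and not taken from any particular governance tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataAsset:
    name: str                             # data inventory: what data do you have?
    origin: str                           # provenance: where the data originate
    process_owner: Optional[str] = None   # who is responsible for how the data are used
    steward: Optional[str] = None         # who manages the data as requirements evolve
    glossary_terms: List[str] = field(default_factory=list)  # links to the metadata repository


def governance_gaps(asset: DataAsset) -> List[str]:
    """List the governance questions this asset record cannot yet answer."""
    gaps = []
    if not asset.process_owner:
        gaps.append("no process owner")
    if not asset.steward:
        gaps.append("no data steward")
    if not asset.glossary_terms:
        gaps.append("no documented terms")
    return gaps


# Hypothetical sample inventory
inventory = [
    DataAsset("vendor_payments", origin="accounts payable system",
              process_owner="Finance", steward="J. Doe",
              glossary_terms=["vendor", "obligation"]),
    DataAsset("grant_awards", origin="spreadsheet export"),
]

for asset in inventory:
    print(asset.name, governance_gaps(asset))
```

Even a simple check like this makes it visible which assets have no named owner, steward, or documented terms before larger tooling is chosen.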