How Much Data Governance Is Enough?
Note: this article is currently in draft form and is being released for comment prior to completion. If you have comments to share, please send them to me at firstname.lastname@example.org.
What is data governance?
Definitions of Data Governance vary. Here's mine:
- Data Governance is the orchestration and management of all the systems, processes, and technologies that contribute to and maintain the quality, reliability, and usability of an organization's data and metadata.
Why improve data governance?
In Do Organizations Need Data Governance as a Service (DGaaS)? I described three "data application categories" that data governance systems, processes, and technologies must support:
- Using data and metadata to help understand what was (e.g., by providing historical context for the data provided to management)
- Using data and metadata to help understand what is (e.g., using available data from inside and outside the organization to gain a better understanding of what’s happening now)
- Using data and metadata to help understand what will happen (e.g., using predictive models to help compare and contrast different options and scenarios)
Important questions to ask when designing an improved data governance program include the following:
- Even if we focus initially on only one of these application categories, what constitutes enough data governance?
- How can we make sure, right from the start, that data governance practices associated with our data, and the metadata associated with our data, are both efficient and effective?
No ocean boiling!
On the one hand, you don't want to set out to "boil the ocean" and become so engaged with gaining support for and initiating so many strategic changes to data-related people, processes, and technology that you lose management's support along the way. Initial overreaching is especially risky when the organization's data literacy is scarce while, at the same time, legacy systems and processes need to evolve as the organization pursues digital transformation and modernization. You need to pick initial targets for improved data governance carefully.
One implication of this "no ocean boiling" rule: it may not be wise to start out by attempting to inventory and model all the organization's data. Instead, focus on the data and metadata immediately involved with solving a critical problem or issue, and then evolve from there.
Beware low hanging fruit
On the other hand, you don't want to overemphasize "low hanging fruit" and short term deliverables. Doing so runs the risk of "underwhelming" your target audiences or management with the analytics you deliver. (You don't want to hear, "This is all you got?")
This might be an issue in organizations where readily accessible pools of data already exist but don't clearly relate to important challenges or problems. Using currently available data and readily available visualization tools, you can quickly develop analytical prototypes, proofs of concept, and dashboards. Software vendors understandably promote the ease of use of their analytical products, but these may not be the best place to start if they don't focus on problems management really worries about.
You may find, for example, when you examine your data sources, that your problems extend beyond typical issues such as missing data or values caused by minor data collection or transformation errors. Your data quality and consistency problems may be more serious than you initially expected. Addressing them might require more time and resources than the quick reports and graphics you've promised and can deliver from available sources.
Either way, making better use of your data requires a planning effort and appropriate management. I've addressed data governance scoping -- deciding what's in and what's out in a data governance effort -- in the two-part series A Framework for Defining the Scope of Data Governance Strategy Projects (Part 1, Part 2). There I suggested that having a well-defined problem, application, or question was probably the key ingredient in setting reasonable bounds around an initial effort to improve data analytics through better data governance. Proper scoping, especially early on, is essential.
The topic of where to start is also important. Some suggest that hiring a chief data officer or data scientist at the outset is a good idea. An example of this approach is described by Anirban Das in Reasons to hire a Director of Data Science before hiring your first data scientist.
The situation will drive whether it is more important to hire a data scientist with business qualifications or a data-savvy business manager to head the effort. Either way, it is important to realize that data governance in an organization is not just a technical exercise but one that requires the participation of both management and the IT department.
Whichever approach you take you need to balance both tactical and strategic concerns. An overarching goal is to align data analysis and governance efforts with the needs of the organization given where it is now and where it is going. You need to prioritize when and how you address a range of technical as well as non-technical concerns.
Because of the way data permeates and flows through an organization, you need to know where to "draw the line," especially when starting out.
It might not be a good idea to purchase a data governance software tool before you figure out how -- and why -- you need the tool. (For a list of software tools related to data governance see the table of contents for the research report Global Data Governance Software Market Size, Status and Forecast 2022. Another potentially useful list is 30 top master data management products.)
Data governance "facets"
Which areas need to be addressed when establishing a data governance program that supports improved data analytics? Consider the following ten facets of data governance identified by Mathematica in its white paper Holistic Data Governance: A Framework for Competitive Advantage:
The individual "facets" identified in the above illustration will be familiar to any consultant who has ever been engaged with "strategic alignment" or IT strategy projects in large organizations. These are traditional areas that planners, strategists, managers, and consultants need to address in the design of any initiative to support improved data analytics or digital transformation.
Whatever data dependent problem or decision you set out to address with improved data analytics, each of the above areas needs to be considered, even when you are eager to provide an early deliverable or prototype that shows management what better data analytics can do.
But which facets should you address initially when management is breathing down your neck to get a data analytics program underway? How do you design a project to deliver both useful analytics quickly and a sustainable, expandable data governance process?
Now or later?
Assume that you are developing a document that describes your plan. Such a document should provide:
- A description of the specific tasks and initiatives needed to deliver, as quickly as possible, a useful analytical product to management as a proof of value.
- A description of how the work associated with these early deliverables will serve as the foundation for an enterprise-wide data governance and analysis program.
- Documented plans that describe how the processes associated with the first two items will be managed, communicated, and evaluated (i.e., who does what and when).
The standard categories in the Mathematica list provide an excellent starting point for addressing all three of these requirements. For each facet, the planner needs to address (1) what needs to be done now, (2) how this relates to an enterprise-wide effort, and (3) how these efforts will all be managed.
Necessary -- but insufficient
For some organizations the above may not be enough. The Mathematica facet list, after all, is fundamentally a standard categorization of what needs to be done with any serious tech-reliant initiative, not just those that focus on data or analytics. Taking a comprehensive view of how data can be exploited may also require organizational and technical capabilities that are new or unfamiliar to the organization (e.g., to overcome shortcomings in staff and management data literacy). Some resistance may arise, for example, when data and metadata standardization require changes in how current systems, processes, and data-related communication or semantics are managed. ("You want us to do what with our data?")
Differences are bound to exist in how even basic data are described by different functions or departments, differences that need to be addressed when taking an enterprise view of data. Different departments may have different ways of referencing customer addresses, for example, differences that ripple through the databases and applications that these departments rely on for daily operational support. At the international level, different countries and cultures may have different family and housing structures that need to be addressed when comparing sales and demographic data.
Start with existing processes?
It's one thing to focus data analysis on making existing processes and systems more efficient. People are likely to understand why certain changes need to be made to improve traditional efficiency metrics such as throughput per resource unit or cost per transaction. Such metrics can be understood and justified in the context of currently understood processes and technologies. This is one reason why it may make sense to focus initial data governance improvements on optimizing current, well-understood processes, starting with how currently available data are analyzed and presented.
Focus on the future?
It's quite another thing to sell management on what you hope will come out of better data analysis efforts, especially if needed management and governance efforts are complex or expensive.
Such uncertainty will always be a challenge, especially when data literacy is at a premium in the organization. This situation is similar to the challenges associated with justifying R&D expenditures involving uncertain outcomes.
(Uncertainty regarding data and analytics is addressed in Risk and Uncertainty in Prioritizing Health Data Analysis Initiatives and in Risk, Uncertainty, and Managing Big Data Projects.)
A data governance program should evolve, supporting initial data analytics initiatives while also serving as a foundation for future, more comprehensive data governance operations.
In support of this evolutionary approach, the following are examples of governance-related questions and issues to address when planning initial data analytics efforts. These are associated -- loosely -- with the "facets" mentioned above. Addressing them will help ensure that initiatives associated with improved data analytics are managed efficiently and with an eye to creating a foundation for future growth. (Another "short list" of planning questions is here: Improving Data Program Management: Where to Start?)
The following list shows, for each planning area, the purpose, tactical (short-term) considerations, and strategic (long-term) considerations.
Purpose: Make sure planned data governance efforts directly address problems or issues of importance to the organization.
Tactical: Focus initially on one important problem, the data needed to describe and address it, and analytical deliverables that are clearly linked to the problem.
Strategic: Make sure that how the organization is changing is considered in growing or expanding data governance efforts. As organizational goals and objectives change, data governance efforts should evolve as well.
Purpose: Understand the technologies associated with managing data and metadata and how they are organized and interact.
Tactical: What systems and applications are directly associated with the initial problem to be addressed? What do we need to know about how these operate and interact? In the short term, how do we manage them in order to deliver useful analytics quickly?
Strategic: In the longer term, what do we need to know about how the organization's technical architecture is changing as the organization as a whole engages in digital transformation efforts? For example, as more systems and data are moved to the cloud, how will data governance be impacted?
3. Business Case
Purpose: Define the relationship between improved data governance efforts and their quantitative and qualitative impacts on the organization and how it accomplishes its goals and objectives.
Tactical: Define both effectiveness and efficiency measures based on how better data and analytics address the initially selected application or problem.
Strategic: Develop and implement processes and procedures for engaging with both technical and business staff as data analytics and data governance expand to address more problems and application areas. (This has organizational implications.)
Purpose: Understand how the people, processes, and technologies interact in applying data analytics to help solve corporate problems.
Tactical: Focus initially on dependencies that are controllable or static, e.g., by minimizing the number or complexity of "moving data targets" when starting out.
Strategic: Acknowledge that it will never be possible to control or predict all the process or system dependencies that impact how data are governed and analyzed. This interdependency should influence the structure of the governance processes introduced and how they relate to ongoing management.
Purpose: Develop and implement the management initiatives required to build and sustain an ongoing data governance effort.
Tactical: Emphasize a "light touch" involving minimal bureaucracy, ceremony, and documentation. If appropriate, adopt an agile project management approach. Focus on constant communication and feedback as deliverables are developed, tested, and evaluated.
Strategic: Document as you go what is learned about managing the initial project and how this may need to evolve as the scope of data analysis efforts increases over time. Carefully consider how more complex efforts will be managed and how this "fits in" with existing management and oversight practices.
Purpose: Define, track, and deliver the metrics that describe the costs, benefits, and effectiveness of improved data analytics and their supporting data governance program.
Tactical: Do not initiate any efforts without establishing defined metrics for tracking both costs and business and operational effectiveness.
Strategic: As with Management above, consider how ongoing cost and effectiveness measures of expanded data governance and analytics efforts will be tracked. Be prepared to document the changes that may be required in corporate oversight given the possible need to address cross-functional and cross-departmental data exchange. Such efforts may include consideration of how to overcome a lack of standardization in how data and metadata are managed, exchanged, and communicated.
Purpose: Secure sufficient human time and talent to manage not only data analytics work and technology but also management support to ensure efficiency, effectiveness, and sustainability.
Purpose: Right from the start, make sure staff and management all understand that
Purpose: Identify the business processes that will be impacted by improvements in how data are analyzed and used in the organization.
Purpose: Identify what tools are needed to initiate and manage improved data and metadata governance efforts.
Tactical: Try to support initial data governance efforts using available software tools.
Strategic: Seriously consider that, as the number and complexity of data analytics applications and required data governance efforts increases, it may be useful to implement dedicated and flexible tools to support semantic analysis, data stewardship, metadata management, and collaboration.
Purpose: Identify who in the organization can articulate a vision for where the organization is going and secure the support and collaboration of those individuals.
Tactical: Engage with at least one person capable of expressing -- and understanding -- a vision for what improved data analytics can accomplish.
Strategic: Make sure that people are involved in ongoing data governance,