How to measure your data maturity?

Richard Benjamins    27 November, 2018
Big Data and Artificial Intelligence (AI) have become very popular these days, and many organizations have started their data journey to become more data-driven and make automated, intelligent decisions. But a data journey is complex, with several intermediate stages. While it is relatively clear what the stages are and what kind of activities they comprise (illustrated in Figure 1), it is less clear how to assess the overall data maturity of an organization with respect to its goal of fueling Analytics and AI.

Figure 1 The phases of a typical data journey towards becoming a data-driven organization
 
Indeed, measuring the data maturity of organizations is a multi-dimensional activity, covering a wide range of areas. In this article, we will provide an overview of those dimensions and how to measure progress on each of them. Figure 2 shows the dimensions, which we explain below using examples of what it means to be less or more mature.
 
 
Figure 2 The dimensions of measuring organizational data maturity
 
 

IT, platform & tools

Anyone who wants to do something with data and AI needs a platform where data is stored and accessed. Early-stage, immature organizations will likely start with whatever platform is at hand, either in the Cloud or on-premise, with no particular strategy. Mature organizations will have a clear strategy for how to support all facets needed for Analytics and AI. The strategy will encompass whether systems will run on-premise, in the Cloud or using a hybrid approach. It will describe the reference architecture for the big data software stack, APIs for accessing data in secure ways, etc. It will also cover the analytics, data visualization and data quality tools available to users across the organization. Mature organizations will have automated most of the processes needed to run the platforms and tools on a daily basis, with minimal manual intervention. Finally, mature companies have a clear budget assigned to this, along with a data roadmap of new functionalities and new data sources to include.

Data protection

Data protection refers to the privacy and security of the organization’s data. Data protection can also be viewed as part of Data Governance, but due to its importance, it is often considered separately. With the new European General Data Protection Regulation (GDPR), it is clear to many organizations what it means to protect the privacy of customer data. For most organizations, however, it is still a major challenge to comply with all aspects of the GDPR. Because the GDPR has set the bar high, we can say that organizations that are fully GDPR compliant are mature on the data protection dimension. Data-mature organizations, in addition, use privacy-enhancing technologies such as encryption, anonymization and pseudonymization, and differential privacy to reduce the risk of revealing personal information. With respect to security, apart from the technological solutions for secure data storage, transfer, access, and publishing, mature organizations also have a clear policy on who has access to what types of data, with special attention given to people with administrator rights who might be able to access all data and all encryption and hashing keys.

Data governance & management

This dimension measures how well data is managed as an asset. Almost all organizations that started their data journey some time ago will recognize that one of the biggest problems is getting access to quality data and understanding what all the data fields mean. Managing data as an asset includes aspects such as having an up-to-date inventory of all data sources, a data dictionary, and a master-data-management solution with data quality and lineage. But it is also about processes, ownership, and stewardship. Data sources typically have an owner who is responsible for the data generation, either as a consequence of an operation (e.g. payment data generated by POS devices) or through explicit data collection. A data steward takes care of the data on a daily basis in terms of availability, quality, updates, etc. Organizations that take data seriously tend to set up a “data management office” that functions as a centre of excellence to advise the different stakeholders in the organization. More advanced organizations manage not only their data, but also their analytical models throughout their lifecycle. They will also consider external data, either procured or available as Open Data, to increase the value potential. And the most mature organizations have a clear policy on Open Data, stating how Open Data should be managed when used (licence, liability, updates, etc.), and when, under what circumstances and under what licence private data can be published as Open Data.

Organization

The organization dimension refers to how the data professionals are organized in the company. Is there a separate organization, for example one led by a Chief Data Officer? How powerful is this position in terms of reporting distance from the CEO (-1, -2, -3)? Or are the data professionals split between several organizations such as IT, Marketing and Finance? What is the function of the data team? Is it a centre of excellence or is it operational, running all data operations of the company on a daily basis? And how well are the data professionals connected to the different businesses? Is there a company-wide “data board”, where data leaders and business leaders share, discuss and make decisions to align business and data priorities? Is there an initiative to “democratize” the data beyond the data professionals to the business people? How is the next layer of people involved in creating value from data?

People

The people dimension is all about how organizations go about acquiring and retaining the skills and profiles required for the data journey towards AI and Analytics. Are data roles treated as just one of many profiles, or is there a special focus reflecting their scarcity in the market? If hiring is hard, are there programs for training and upskilling the workforce? How refined are the profile definitions? The definitions should recognize the different essential profiles, including data scientists (analytics and Machine Learning), data engineers (data pre-processing and cleansing), data architects (architectural design of platforms), data “translators” (translating insights into business relevance), and AI engineers.

Business

The final dimension, which is enabled by all the other dimensions, is the business dimension, where the real value creation takes place. Mature organizations have a comprehensive data strategy where they lay out their plans and objectives for the six dimensions discussed in this article. There is also a clear vision of how much needs to be invested in each of the dimensions to achieve the goals. A data-mature organization also has a clear view of what use cases are possible and what the expected benefits are. Moreover, such organizations measure the economic impact of use cases and report it in a consistent manner at the company level, so that there is a clear understanding of the value generated by the data investments. This is essential for continuing to invest in data. Finally, the most data-mature organizations are, apart from applying data and AI internally to optimize their business, looking at new business opportunities. These could be based on insights generated from company data that are of value to other sectors and industries. For example, mobility data generated from mobile antennas, always in an anonymous and aggregated way, and combined with external data, has value for the traffic management, retail and tourism sectors. But a new business opportunity could also be based on partnerships with companies from other sectors to combine data and generate differential insights. Data and AI can also be used for Social Good, that is, to pursue social objectives such as the Sustainable Development Goals of the UN.

How to execute a data maturity assessment?

A common way to perform a data maturity assessment is to translate each dimension into a set of questions with predefined answers ranging from 1 to 5, where 1 represents little maturity and 5 maximal maturity. This gives a questionnaire of fewer than 100 questions, which should still be manageable. The questionnaire can be completed through interviews or as a self-assessment, possibly with a session afterwards where the self-assessed answers are challenged and the scores adapted. The resulting scores on each question are then aggregated per dimension, and finally into an overall data-maturity score. If done properly and avoiding tendencies to “look good”, this is a powerful tool to manage the data maturity of organizations: it embodies a data-driven way to manage the data journey. It allows organizations to set objectives, track progress over time, prioritize data investments, and compare or benchmark different units, especially in multi-national corporations.
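The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how scores are combined, so the sketch uses a plain average of the questions within each dimension and an unweighted average of the dimensions for the overall score. The function names and the example answers are hypothetical.

```python
from typing import Dict, List

MIN_SCORE, MAX_SCORE = 1, 5  # 1 = little maturity, 5 = maximal maturity


def dimension_scores(answers: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the question scores within each dimension (assumed aggregation)."""
    scores = {}
    for dimension, values in answers.items():
        if not values:
            raise ValueError(f"No answers for dimension: {dimension}")
        for value in values:
            if not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"Score {value} out of range in: {dimension}")
        scores[dimension] = sum(values) / len(values)
    return scores


def overall_score(per_dimension: Dict[str, float]) -> float:
    """Unweighted mean of the dimension scores (an assumption; weights are possible)."""
    return sum(per_dimension.values()) / len(per_dimension)


# Hypothetical answers: three questions per dimension, invented for illustration.
example_answers = {
    "IT, platform & tools": [3, 4, 2],
    "Data protection": [4, 4, 5],
    "Data governance & management": [2, 3, 2],
    "Organization": [3, 3, 3],
    "People": [2, 2, 3],
    "Business": [3, 4, 3],
}

per_dimension = dimension_scores(example_answers)
for name, score in per_dimension.items():
    print(f"{name}: {score:.2f}")
print(f"Overall: {overall_score(per_dimension):.2f}")
```

Averaging per dimension first means a dimension with many questions does not dominate the overall score. An organization that considers some dimensions more important than others could replace the unweighted mean with a weighted one.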
