My previous article introduced guardrails and explained why they matter. This is the second post in the series. It focuses on the challenges of writing guardrails for data analytics when Data Mesh is the architectural construct, and on how those challenges can be approached.
📈 Data Analytics Guardrails
Data analytics guardrails are rules, procedures, or controls put in place to help ensure that data analytics processes and systems are used safely and ethically. They are designed to prevent data misuse or the unintentional disclosure of sensitive information, and to help ensure that analytics results are accurate and reliable. Data analytics guardrails can include data governance policies, data privacy regulations, and quality assurance processes. They can also include technical controls, such as data encryption or access controls, that protect against unauthorized access and data breaches. Overall, data analytics guardrails are an important part of any organization’s data analytics strategy, because they help to ensure the safety, security, and integrity of the data and the reliability of the results of the data analytics process.
🚧 Challenges in Writing Guardrails
The challenge is to “write” guardrails that are easy to interpret and socialise, feasible to implement, machine-readable, and verifiable by machines or systems.
But the first hurdle is to agree on what “good” looks like with all the stakeholders, including the team that is expected to live within the boundaries.
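To make the "machine-readable and assurable by machines" requirement concrete, here is a minimal Python sketch of a guardrail that pairs the agreed human-readable statement with an automated check. Everything in it is an assumption for illustration: the `Guardrail` class, the identifiers `SEC-001` and `GOV-001`, and the attribute names `encrypted_at_rest` and `owner` are hypothetical and do not come from any particular tool or standard.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping


@dataclass(frozen=True)
class Guardrail:
    """A guardrail as a human-readable statement plus a machine check."""
    guardrail_id: str   # hypothetical identifier scheme
    statement: str      # the wording agreed with stakeholders
    check: Callable[[Mapping[str, Any]], bool]  # True means compliant


def evaluate(guardrails: List[Guardrail], resource: Mapping[str, Any]) -> List[str]:
    """Return the ids of the guardrails that the resource violates."""
    return [g.guardrail_id for g in guardrails if not g.check(resource)]


# Illustrative guardrails only; real ones come out of stakeholder agreement.
GUARDRAILS = [
    Guardrail(
        "SEC-001",
        "Data at rest must be encrypted.",
        lambda r: r.get("encrypted_at_rest") is True,
    ),
    Guardrail(
        "GOV-001",
        "Every data product must have a named owner.",
        lambda r: bool(r.get("owner")),
    ),
]

# A hypothetical description of a data product, e.g. read from a catalogue.
product = {"name": "customer_orders", "encrypted_at_rest": True, "owner": ""}

violations = evaluate(GUARDRAILS, product)
print(violations)  # prints ['GOV-001'] because the owner is empty
```

Keeping the statement and the check side by side means the text people socialise and the rule a system enforces cannot drift apart unnoticed.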
⚒️ Breaking the Problem into Manageable Chunks
The first question that comes to mind is where to begin writing the guardrails. The most effective method is to start from the top: componentise the entire architecture canvas across the Enterprise Architecture into domains, then divide each domain into sub-domains. One of the sub-domains listed below, within the Data domain, is Data Analytics.
A sub-domain, e.g. Data Analytics, can be further modularised by creating independent solution building blocks. We can then understand the scope of these blocks, e.g. operational vs analytical plane, and define finer guardrails for each solution building block within these planes, inside that finite scope and context.
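The top-down breakdown described above can be sketched as a small Python data model. This is only an illustration under stated assumptions: the class names, the guardrail texts, the plane assigned to each block, and the idea that a building block inherits the guardrails of its sub-domain and domain are all hypothetical choices, not something the reference architecture prescribes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Plane(Enum):
    OPERATIONAL = "operational"
    ANALYTICAL = "analytical"


@dataclass
class BuildingBlock:
    name: str
    plane: Plane
    guardrails: List[str] = field(default_factory=list)


@dataclass
class SubDomain:
    name: str
    guardrails: List[str] = field(default_factory=list)
    blocks: List[BuildingBlock] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    guardrails: List[str] = field(default_factory=list)
    sub_domains: List[SubDomain] = field(default_factory=list)


def applicable_guardrails(domain: Domain, sub_domain: str, block: str) -> List[str]:
    """Collect guardrails top-down: domain, then sub-domain, then block.

    Inheritance from the parent levels is an assumption of this sketch.
    """
    for sd in domain.sub_domains:
        if sd.name != sub_domain:
            continue
        for bb in sd.blocks:
            if bb.name == block:
                return domain.guardrails + sd.guardrails + bb.guardrails
    raise KeyError(f"{sub_domain}/{block} not found in domain {domain.name}")


# Illustrative content only; plane assignments and texts are made up.
data_domain = Domain(
    name="Data",
    guardrails=["Classify all data assets."],
    sub_domains=[
        SubDomain(
            name="Data Analytics",
            guardrails=["Publish analytical data as Data Products."],
            blocks=[
                BuildingBlock("Data Integration", Plane.OPERATIONAL,
                              ["Use approved integration patterns."]),
                BuildingBlock("Data Quality", Plane.ANALYTICAL,
                              ["Run integrity checks before publishing."]),
            ],
        )
    ],
)

scoped = applicable_guardrails(data_domain, "Data Analytics", "Data Quality")
for statement in scoped:
    print(statement)
```

Resolving guardrails per building block keeps each rule attached to the narrowest scope where it applies, while the broader rules still reach every block beneath them.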
The figure above on Data Analytics Reference Architecture shows how the Operational Plane and the Analytical Plane are converging in a Domain-Driven Design and Data Mesh architecture. Data Product Management, Data Security and Privacy, Data Quality and others overlap significantly between the operational and analytical planes. Data Integration also overlaps, but it is still maturing as a self-serve, evolutionary architecture and may be maintained by IT rather than the business.
⚛️ Identifying Fundamental Building Block – Data Products
The Data Analytics Reference Architecture and the Solution Building Blocks shown above use Data Mesh as their architectural construct. Data Mesh helps create the data infrastructure within the Data Analytics sub-domain that connects the Operational Plane and the Analytical Plane.
The reference architecture above provides the blueprint for manageable and reusable technical infrastructure, following higher-level principles from the IT guardrails such as separation of concerns. The Data Mesh construct, in turn, provides the governance layer and data architecture needed to create Data Products using the principles of Domain-Driven Design, making them manageable within a bounded context yet interoperable across domains, sub-domains and bounded contexts.
Data Products, accessed through Data Analytics tools or on demand through the Data Marketplace, are the tangible output delivered to the personas consuming information. As a result, the Data Product can serve as the fundamental building block within the Data Analytics sub-domain.
The advanced analytical plane in the illustration above does not detail the AI / ML aspect, which will be discussed in future blogs.
🌏 Fundamental Building Block within the Data Ecosystem
The responsibility of the architecture is to ensure the quality of the output. If the quality of the fundamental building blocks (the Data Products) and of the solution building blocks (the technical components) is well maintained, high-quality output that delivers the use cases can be assured.
Each Data Product should ensure that it fits well into the data ecosystem and complies with different dimensions of data governance, data architecture and data management frameworks.
| Area | Description | Data Product Context |
|---|---|---|
| Data Architecture | Defines the blueprint for organising and managing data assets. | Lifecycle management of Data Products. |
| Data Modelling & Design | The process of discovering, analysing, representing, and communicating data requirements and designs. | Deals with how Data Products are internally structured and connected to the use cases. This area brings the discipline of ensuring that Data Products are reused, data is shared, and redundant copies of data are not created outside the domain where the data belongs. |
| Data Storage & Operations | The design, implementation, and support of stored data. Operations provide support throughout the data lifecycle, from planning to disposal of data. | The Information Lifecycle Management of the data stored within the Data Products. |
| Data Security and Privacy | The protection of data from unauthorized access or use. | The measures and practices implemented to prevent unauthorized access to or disclosure of Data Products. It also covers the rights of individuals to control how their personal information within the Data Products is collected, used, and shared. |
| Data Integration & Interoperability | The movement, consolidation, and translation of data between data stores, applications, and organisations. Integration: messaging, formats, etc. Interoperability: semantics, understanding, and interworking. | Helps in the consolidation, convergence and connection of the Data Products. |
| Document & Content Management | Storage, management, and access to digital content, such as documents, images, videos, and audio files. | Unstructured data managed within the Data Products. The architecture managing Data Lake vs. Data Warehouse use cases for documents, images, videos and audio files. |
| Reference & Master Data | Ongoing reconciliation, maintenance, and sharing of core enterprise data to enable consistent access to a single version of truth. | Storing hierarchies to map levels, codes to harmonise data from various systems, lookups to map descriptions, enumerations and any domain/sub-domain specific patterns within or across Data Products. |
| BI & Advanced Analytics | Planning, implementation, and control processes and technology to manage decision support data, analytics, and reporting. | The data structure design within Data Products (e.g. dimensional approach, denormalised) and how the data/information is consumed within reports, dashboards and advanced analytics. |
| Metadata Management | High-quality, integrated metadata on definitions, data models, data flows, and other information critical to understanding data throughout its lifecycle. | This discipline aims to link Data Products to the Enterprise Metadata Management Service. Domain boundaries are subject to change as a result of modifications to business processes or upcoming Domain Distillations. The domain, sub-domain, bounded context and data product boundaries should therefore not be materialised/physicalised but mapped logically within the metadata repository. |
| Data Quality Management | The planning and implementation of quality management techniques and systems to measure, assess, and improve the fitness of data. | Checking Data Integrity within the Data Products in the Data Analytics sub-domain. Data Integrity is a subset of overall Data Quality Management. The Data Integrity check within Data Analytics should be integrated into the organisation’s Enterprise Data Quality Service. |
When a Data Product complies with the guidelines established by each of the dimensions above, it can be shared and reused, and it is expected to deliver the benefits specified for its use cases.
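As a rough illustration of that compliance gate, the Python sketch below treats the ten areas from the table as a checklist and only marks a Data Product as ready to share when every area has been assessed as compliant. The function names and the shape of the `assessment` mapping are hypothetical. How each area is actually assessed is out of scope here and simply assumed to yield a pass or fail result.

```python
from typing import List, Mapping

# The ten areas from the table above.
DIMENSIONS = (
    "Data Architecture",
    "Data Modelling & Design",
    "Data Storage & Operations",
    "Data Security and Privacy",
    "Data Integration & Interoperability",
    "Document & Content Management",
    "Reference & Master Data",
    "BI & Advanced Analytics",
    "Metadata Management",
    "Data Quality Management",
)


def non_compliant_dimensions(assessment: Mapping[str, bool]) -> List[str]:
    """List the areas that failed or were never assessed.

    A missing entry counts as non-compliant, which is a deliberately
    conservative assumption of this sketch.
    """
    return [d for d in DIMENSIONS if not assessment.get(d, False)]


def ready_to_share(assessment: Mapping[str, bool]) -> bool:
    """A Data Product is shareable only when every area is compliant."""
    return not non_compliant_dimensions(assessment)


# Hypothetical assessment: everything passes except metadata, not yet assessed.
assessment = {d: True for d in DIMENSIONS if d != "Metadata Management"}

gaps = non_compliant_dimensions(assessment)
print(gaps)                        # prints ['Metadata Management']
print(ready_to_share(assessment))  # prints False
```

Treating an unassessed area as a failure keeps a Data Product out of the marketplace until every dimension has been looked at explicitly.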
📏 Maturity Levels of Guardrails and Principles
The third write-up in the series will focus on the maturity levels for data analytics guardrails.