Mastering Metadata Management: An initial step

#data

What is Metadata?
Metadata is the unsung hero of the digital world, quietly working behind the scenes to organize, describe, and provide context for your data. In the realm of information management, metadata is the key to unlocking the full potential of your enterprise’s data assets. But what exactly is metadata?

Metadata, simply put, is data about data. It’s the information that describes the characteristics and attributes of your data, making it easier to find, understand, and utilize. Metadata has various aspects, such as:

Descriptive Metadata: This type of metadata provides information about the content, such as titles, keywords, authors, and summaries. It’s like the cover of a book, giving you a quick glimpse of what’s inside.
Technical Metadata: This includes details like file formats, data size, creation dates, and version history. It helps ensure data integrity and compatibility.
Administrative Metadata: This category covers ownership, access rights, and usage permissions. It helps in managing data security and compliance.
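To make the three categories concrete, here is a minimal sketch of how they could be captured for a single dataset. All class names, field names, and example values are assumptions chosen for illustration; they are not taken from any particular metadata tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DescriptiveMetadata:
    # What the data is about: the "cover of the book".
    title: str
    summary: str
    authors: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


@dataclass
class TechnicalMetadata:
    # How the data is stored and versioned.
    file_format: str
    size_bytes: int
    created_on: date
    version: str


@dataclass
class AdministrativeMetadata:
    # Who owns the data and who may use it.
    owner: str
    access_roles: list[str] = field(default_factory=list)


@dataclass
class DatasetMetadata:
    descriptive: DescriptiveMetadata
    technical: TechnicalMetadata
    administrative: AdministrativeMetadata

    def can_read(self, role: str) -> bool:
        """Return True if the given role is listed in the access roles."""
        return role in self.administrative.access_roles


# Hypothetical example record.
employee_export = DatasetMetadata(
    descriptive=DescriptiveMetadata(
        title="Employee export",
        summary="Monthly snapshot of active employees.",
        authors=["HR analytics team"],
        keywords=["hr", "employees"],
    ),
    technical=TechnicalMetadata(
        file_format="parquet",
        size_bytes=1_048_576,
        created_on=date(2024, 1, 31),
        version="1.2.0",
    ),
    administrative=AdministrativeMetadata(
        owner="HR department",
        access_roles=["hr-analyst", "hr-manager"],
    ),
)
```

Keeping the three categories in separate structures makes it easier to apply different rules to each, for example stricter access control on the administrative part.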
For effective management of data within an organization, this information is invaluable. However, creating, updating, and managing it requires significant effort. In the next section, we will look at one of the challenges you can expect when implementing metadata management within your organization.

The challenge of metadata span
What does a single definition mean? One term can have multiple interpretations, especially across the boundaries between divisions of a company. From the perspective of the HR department, an asset often refers to an employee, whereas from the perspective of the IT department, an asset can refer to the ERP system. The definitions of these assets vary widely. This definition span is a challenge that is most common for descriptive metadata.

A common solution is to seek consensus on a single definition. This results in long discussions that often produce a suboptimal description. Such definitions are hard to fully grasp, and they tend to lose their value quickly because reality keeps changing. Are these definitions updated when the market changes? Do they feel natural within the team you are working in? In addition, from the perspective of security and governance, not all metadata should have the same constraints.

Solution
Software engineering has been troubled by a similar issue, both in definitions and in business logic. Applying a similar solution could relieve some of the pain of this problem.

Domain-Driven Design (DDD) offers a framework that helps to solve this problem. Among other things, it suggests using a Ubiquitous Language: the common language used within a domain, a set of definitions and descriptions shared by all participants in that domain. Eric Evans describes the “Ubiquitous Language” in his book Domain-Driven Design.

In our context, it boils down to limiting the span of your metadata to the domain in which it is used. This has the following benefits:

Consensus on definition is easier to reach
There is a strong relationship between the metadata and daily business
The sense of ownership of the language is much stronger
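As a rough illustration of limiting definitions to a domain, the sketch below keeps one glossary per domain, so the same term can carry a different meaning in HR and in IT. The class name DomainGlossary and the example definitions are assumptions made for this example, not part of DDD itself or of any tool.

```python
class DomainGlossary:
    """Definitions that are only valid inside a single domain."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self._terms: dict[str, str] = {}

    def define(self, term: str, definition: str) -> None:
        # Terms are stored in lower case so lookups are case-insensitive.
        self._terms[term.lower()] = definition

    def lookup(self, term: str) -> str:
        try:
            return self._terms[term.lower()]
        except KeyError:
            raise KeyError(
                f"'{term}' is not defined in the {self.domain} domain"
            ) from None


# Each domain owns its own language; "asset" is defined twice on purpose.
hr = DomainGlossary("HR")
hr.define("asset", "An employee of the company.")

it = DomainGlossary("IT")
it.define("asset", "A system managed by IT, such as the ERP.")

print(hr.lookup("asset"))
print(it.lookup("asset"))
```

Because a lookup always goes through a specific domain, nobody has to negotiate a company-wide meaning of "asset"; each team only has to agree on its own definition.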
Implementing all DDD principles can be rather intrusive and takes a significant amount of time. To remain agile, a single use case could be a good moment to see whether the method fits the organization. Use the following steps to guide you through the process.

Gather Domain Experts

During the use case, work closely with its domain experts. Domain experts could be stakeholders, data analysts, or business analysts. Make sure the end users are also included as domain experts. This step is mentioned specifically to highlight the need to identify stakeholders early in the process.

Establish Metadata Repositories

There are plenty of options available for creating new and shiny repositories. However, buy-in from the business and other stakeholders matters most. If a repository is already available and in use by business stakeholders and analysts, it is the obvious choice. After stakeholder buy-in, the most important feature of a repository is how well it supports keeping the information up to date and relevant.

If nothing is available, a shared folder on network-attached storage (NAS) containing documents based on a good template is a fine start. For templates, I would recommend the Google Data Cards project. These cards are used to inform users of what the data products contain and how they should be used. If the methodology appears to be a good fit, upgrading from a NAS to a supported product would be a good next step. Useful tools and their pros and cons will be discussed in a later blog.
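A document-based repository can start very small. The sketch below renders a plain-text card from a fixed template. The field names are hypothetical and only loosely inspired by the idea of a data card; they are not the official Google Data Cards template.

```python
from string import Template

# Hypothetical card layout; adapt the fields to what your domain experts need.
CARD_TEMPLATE = Template(
    "Data card: $name\n"
    "Domain: $domain\n"
    "Owner: $owner\n"
    "Summary: $summary\n"
    "Intended use: $intended_use\n"
    "Known limitations: $limitations\n"
    "Last reviewed: $last_reviewed\n"
)


def render_card(fields: dict[str, str]) -> str:
    """Fill in the template; a missing field raises KeyError."""
    return CARD_TEMPLATE.substitute(fields)


example_fields = {
    "name": "Employee export",
    "domain": "HR",
    "owner": "HR department",
    "summary": "Monthly snapshot of active employees.",
    "intended_use": "Headcount reporting.",
    "limitations": "Excludes contractors.",
    "last_reviewed": "2024-01-31",
}

card_text = render_card(example_fields)
print(card_text)
```

Failing loudly on a missing field is a deliberate choice in this sketch: an incomplete card is caught when it is written rather than when someone relies on it.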

Create a Ubiquitous Language

Work with domain experts to create a shared vocabulary, a ubiquitous language, that bridges the gap between technical and non-technical teams. A great method to do so is event storming. During a storming session, the team walks through all sources, transformations, and data outlets and identifies the key properties of the data. During this step, take time to start filling the metadata repository with the sources and outlets that come up.

Necessary procedures: Checkup and Registration

Two procedures are necessary to get a return on investment from the metadata repository: checkup and maintenance of the repository, and registration of all new sources. Registering new sources in the repository is the responsibility of everyone who works with the data. The most natural moment to register new sources is during the conceptualisation and initial exploration of a new use case. A convenient and worthwhile exercise to ensure the metadata is up to date is to hold a fire drill. Simulate a data outage, see how effectively the outage is communicated throughout the company, and use the post-drill evaluation to identify possible misunderstandings between engineering, stakeholders, data owners, and stewards. This fire drill is also a good method to determine whether a data product is ready for production, but more on this in a follow-up blog.
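Both procedures can be supported by a simple automated check. The sketch below assumes the repository can be exported as a mapping from source name to the date of its last review, and that the sources actually in use are known. The function names, the 90-day threshold, and the example data are all assumptions for illustration.

```python
from datetime import date, timedelta


def unregistered_sources(in_use: set[str], registered: dict[str, date]) -> set[str]:
    """Registration check: sources that are used but missing from the repository."""
    return in_use - set(registered)


def stale_entries(
    registered: dict[str, date], today: date, max_age_days: int = 90
) -> list[str]:
    """Checkup: entries whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in registered.items() if reviewed < cutoff)


# Hypothetical repository export: source name -> date of last review.
registered = {
    "hr.employees": date(2024, 1, 10),
    "it.erp_orders": date(2024, 5, 2),
}
# Hypothetical list of sources found in running pipelines.
in_use = {"hr.employees", "it.erp_orders", "sales.leads"}

missing = unregistered_sources(in_use, registered)
stale = stale_entries(registered, today=date(2024, 6, 1))
print("Not registered:", sorted(missing))
print("Needs review:", stale)
```

A check like this does not replace the fire drill, but it gives the team a short list to discuss during the regular checkup.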

These steps could help you set up and maintain a healthy metadata repository.

Conclusion
In summary, metadata is a silent hero in managing data effectively. This blog clarified what metadata is and its vital role in organizing and contextualizing data. We introduced the Google Data Card project as a useful resource for understanding metadata’s organization.

However, metadata management is not without its challenges, particularly the issue of varied definitions. To overcome this, Domain-Driven Design provides a framework for creating a shared language specific to your domain, making consensus easier and improving the connection between metadata and daily operations.

Ready to take action? Here’s how:

Collaborate with domain experts.
Set up metadata repositories.
Develop a shared vocabulary using techniques like event storming.
Set up procedures to ensure the metadata is updated regularly.
Consider conducting fire drills to evaluate communication and readiness for production.
Stay tuned for more insights and tools on metadata management. If you have questions or need further guidance, please reach out at ruud.cools@the-experts.nl. Your journey to mastering metadata management starts here, and we’re here to help every step of the way.

the/experts. Blog
