Most market leaders and experts consider different usage perspectives and characteristics for the Data Catalog, which makes any attempt at a single definition tricky. Nevertheless, when we look at their respective definitions, we notice the emergence of two types of recurring functionalities.
The first concerns the collection of metadata associated with the use of data in a specific context (business, solution, technological, …), a use that does not depend on any form of implementation. The second concerns the collection of metadata associated with the use of data in its implementation context (operational data storage system, digital application, transactional system, analytical system, …).
Metadata for an intended use by the operational staff
Over the last 10 years, digital technologies have multiplied, whether hardware (smartphones, fiber, …) or software (operating systems, CRM, …). Organizations are increasingly adopting these new technologies, which in turn multiplies the data they hold.
Among the technological innovations on the market that exploit information about the use of data, that is, metadata, the Data Catalog sets out to link the metadata arising from a technical context (e.g. use through the implementation of operational, decision-making, analytical systems, AI, data integration, files, etc.) with the metadata arising from a business context (e.g. use through a requirement, discourse, law, regulatory text, etc.).
We also notice that common terms (e.g. customer, username, etc.), used across the professions, solutions, and technologies of the company, give rise to misunderstandings and miscommunication.
In fact, the notion of a “Catalog” has long been familiar to people who handle data tied to a business process or to the vocabulary used in the company. This data was not necessarily linked to a system. Rather, it was associated with an initiative (or intended collaboration) covering a scope of analysis or one or more business areas of the organization.
The purpose of such a catalog, which served as a glossary, was to facilitate communication between all stakeholders in an initiative.
For this reason, the collection of different types of metadata associated with intended use in these contexts is necessary.
We can talk in particular about descriptive metadata, which makes it possible to:
● Discover data for new people entering the organization,
● Identify data for specific analysis and study needs,
● Understand the meaning of each data item in a table.
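To make the idea of descriptive metadata more concrete, here is a minimal sketch in Python of a business glossary that can be searched by a newcomer. The `GlossaryTerm` structure, the `find_terms` helper, and the sample definitions are all hypothetical and only illustrate the principle; they do not come from any particular Data Catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """Descriptive metadata for one business term (hypothetical structure)."""
    name: str
    definition: str
    business_area: str
    keywords: list[str] = field(default_factory=list)


def find_terms(glossary: list[GlossaryTerm], query: str) -> list[GlossaryTerm]:
    """Return the terms whose name, definition, or keywords mention the query."""
    q = query.lower()
    return [
        term for term in glossary
        if q in term.name.lower()
        or q in term.definition.lower()
        or any(q in keyword.lower() for keyword in term.keywords)
    ]


# Illustrative entries only; the definitions are invented for the example.
glossary = [
    GlossaryTerm(
        name="customer",
        definition="A person or company that has signed at least one contract.",
        business_area="sales",
        keywords=["client", "account"],
    ),
    GlossaryTerm(
        name="username",
        definition="The identifier a person uses to sign in to a digital application.",
        business_area="IT",
        keywords=["login"],
    ),
]

for term in find_terms(glossary, "client"):
    print(f"{term.name} ({term.business_area}): {term.definition}")
```

A search on a synonym such as "client" leads back to the agreed term "customer" and its definition, which is the kind of shared understanding a glossary is meant to provide.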
Metadata for one or more implementations
The notion of data is frequently associated with its implementation in an application or classification specific to one or more areas of the organization (e.g. categories and types of products, customers …).
This data is described by concepts, terms, and definitions, but also by structural, syntactic, and semantic characteristics, as well as by the authorized values specific to its use in different implementations.
This description of the company’s data could be captured in an inventory of terms and implementation features, that is, metadata associated with a specific initiative. Such an inventory was often referred to as a data dictionary.
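As an illustration, a data dictionary entry can be sketched as a small record that carries a definition (semantic), a data type (structural), a format pattern (syntactic), and a set of authorized values. The `DictionaryEntry` class, the `is_valid` helper, and the sample values below are assumptions made for this example, not a standard layout.

```python
import re
from dataclasses import dataclass
from typing import Any, FrozenSet, Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """One entry of a data dictionary (hypothetical structure)."""
    term: str
    definition: str                                # semantic characteristic
    data_type: type                                # structural characteristic
    pattern: Optional[str] = None                  # syntactic characteristic (regex)
    allowed_values: Optional[FrozenSet[Any]] = None  # authorized values


def is_valid(entry: DictionaryEntry, value: Any) -> bool:
    """Check a value against the type, pattern, and authorized values of an entry."""
    if not isinstance(value, entry.data_type):
        return False
    if entry.pattern is not None:
        # A pattern only makes sense for text values.
        if not isinstance(value, str) or re.fullmatch(entry.pattern, value) is None:
            return False
    if entry.allowed_values is not None and value not in entry.allowed_values:
        return False
    return True


# Illustrative entry; the category names are invented for the example.
product_category = DictionaryEntry(
    term="product_category",
    definition="The family a product belongs to in the sales catalog.",
    data_type=str,
    pattern=r"[a-z_]+",
    allowed_values=frozenset({"hardware", "software"}),
)

print(is_valid(product_category, "hardware"))  # accepted
print(is_valid(product_category, "services"))  # rejected: not an authorized value
```

Keeping the rules next to the definition means each implementation can check its values against the same description instead of maintaining its own copy.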
In this context, we can talk about technical metadata, administrative metadata, structural metadata, and many others that allow multiple uses of metadata:
● Understand the relationships between families of data in the same domain,
● Visualize the logical structure of multi-domain data models,
● Know the types and versions of information in IS tools,
● Keep historical information of stored data.
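The last two uses, knowing types and versions and keeping historical information, can be sketched as a small version history for one column. The `ColumnVersion` and `ColumnHistory` names, the system and table names, and the dates are hypothetical and serve only to show how technical metadata might be kept over time.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ColumnVersion:
    """Technical metadata for one column at a point in time (hypothetical)."""
    system: str
    table: str
    column: str
    data_type: str
    valid_from: date


@dataclass
class ColumnHistory:
    """Keeps every recorded version of a column, ordered by start date."""
    versions: list[ColumnVersion] = field(default_factory=list)

    def record(self, version: ColumnVersion) -> None:
        self.versions.append(version)
        self.versions.sort(key=lambda v: v.valid_from)

    def as_of(self, day: date) -> Optional[ColumnVersion]:
        """Return the version in force on the given day, or None if none existed yet."""
        current = None
        for version in self.versions:
            if version.valid_from <= day:
                current = version
        return current


# Illustrative history; system, table, and dates are invented for the example.
history = ColumnHistory()
history.record(ColumnVersion("crm", "customers", "username", "VARCHAR(50)", date(2020, 1, 1)))
history.record(ColumnVersion("crm", "customers", "username", "VARCHAR(100)", date(2022, 6, 1)))

snapshot = history.as_of(date(2021, 3, 15))
print(snapshot.data_type if snapshot else "no version recorded yet")
```

Because older versions are never overwritten, the same structure answers both "what is the type today?" and "what was it when this data was stored?".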