Metadata: Enabling Data Sharing
Metadata is descriptive information about data, such as its source, location, owner, and field names. With advanced analytics, the definition of what constitutes data expands greatly. In addition to databases, your data includes archives of photos and video, diagnostic test results, sensor readings, log files, documents, spreadsheets, and more.
Given this expanded definition, data can be found almost anywhere, and its owners may be distributed throughout your company. A metadata project will help you understand what data is available to be inventoried and shared with the rest of the organization. A logical first step is to take an inventory of datasets and collect standard information about them in an electronic catalog. In the example below, all of the fields are searchable and the record provides users with enough details to search, retrieve, and evaluate the dataset.
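As a rough illustration of a searchable catalog, the Python sketch below stores two records and matches a search term against every field. The records, field names, paths, and the `search_catalog` helper are all hypothetical; they stand in for whatever catalog tool your company adopts.

```python
# Hypothetical catalog records; field names, values, and paths are illustrative only.
CATALOG = [
    {
        "title": "Line 3 vibration sensor readings",
        "source": "plant floor sensors",
        "location": "/data/example/sensors/line3/",
        "owner": "Manufacturing Engineering",
        "format": "CSV",
        "fields": ["timestamp", "sensor_id", "amplitude"],
    },
    {
        "title": "Customer support call logs",
        "source": "call center system",
        "location": "/data/example/support/calls/",
        "owner": "Customer Care",
        "format": "JSON",
        "fields": ["call_id", "started_at", "duration_s"],
    },
]


def search_catalog(catalog, term):
    """Return the records in which any field contains the term (case-insensitive)."""
    needle = term.lower()
    matches = []
    for record in catalog:
        for value in record.values():
            # Join list-valued fields (such as field names) into one string.
            text = " ".join(value) if isinstance(value, list) else str(value)
            if needle in text.lower():
                matches.append(record)
                break  # one hit is enough; avoid adding the record twice
    return matches


for hit in search_catalog(CATALOG, "customer care"):
    print(hit["title"], "-", hit["location"])
```

Because every field is searched, a user can find a dataset by owner, format, or even a column name without knowing its title in advance.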
Objective
Your objective is to lead the development of a metadata management strategy to facilitate the sharing of datasets, create new opportunities for collaboration, and reduce redundancy in data collection. Typically, there are three stages in metadata content development: Data Catalog, Data Definitions, and User Annotations.
Key Insights
- A systematic approach to collecting and storing information about datasets helps ensure data becomes more accessible for reuse, and delivers ongoing value as an asset.
- A metadata catalog reduces time spent searching for data and allows more time for data preparation, visualization and analysis.
Collecting Metadata
In your data catalog, the metadata (descriptive details) will enable others to discover and view the data. These six types of questions can guide you on what to include in the catalog design.
- Who created the data?
- Who owns the data?
- Who will maintain it?
- Who is using the data?
- What is the purpose of the data?
- What is the content of the data?
- What is the security level?
- When was the data created?
- When was it last updated?
- Is there a date when the data becomes invalid?
- Where is the data stored?
- Where did the data come from?
- Why was the data created?
- How is the data formatted?
- How many databases store this data?
- How can users gain access to the data?
Organizing Metadata
Here is an example of a metadata schema for a data catalog, which shows how the descriptive information can be organized.
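One possible way to organize these details is to group catalog fields by the question they answer (who, what, when, where, why, how). The Python sketch below is a minimal example under that assumption; the `CatalogEntry` class, its field names, and the sample values are hypothetical and not a prescribed standard.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    # Who
    creator: str
    owner: str
    maintainer: str
    # What
    purpose: str
    content_summary: str
    security_level: str
    # When
    created_on: date
    last_updated: date
    # Where
    storage_location: str
    origin: str
    # Why
    reason_created: str
    # How
    data_format: str
    access_instructions: str
    # Optional details
    users: List[str] = field(default_factory=list)
    storage_count: int = 1          # how many databases store this data
    valid_until: Optional[date] = None  # date after which the data is invalid

    def is_expired(self, today: date) -> bool:
        """True when the dataset has a validity date that has already passed."""
        return self.valid_until is not None and today > self.valid_until


# Sample values are made up for illustration.
entry = CatalogEntry(
    creator="Plant Operations",
    owner="Manufacturing Engineering",
    maintainer="Data Platform Team",
    purpose="Monitor equipment health",
    content_summary="Vibration amplitude per sensor per second",
    security_level="internal",
    created_on=date(2020, 1, 15),
    last_updated=date(2024, 6, 30),
    storage_location="/data/example/sensors/line3/",
    origin="plant floor sensors",
    reason_created="Predictive maintenance pilot",
    data_format="CSV",
    access_instructions="Request read access from the owner",
    users=["Reliability Analytics"],
    valid_until=date(2030, 12, 31),
)

print(asdict(entry)["owner"])
print(entry.is_expired(date(2025, 1, 1)))
```

Required fields have no defaults, so an entry cannot be created without them, while details that are often unknown at first (users, expiry date) stay optional. `asdict(entry)` turns the record into a plain dictionary for storage or search.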
Quick Tips
- Start with a proof-of-principle project that uses a small, manageable set of data sources to explore and test metadata approaches and technical solutions. You’ll need to test and re-test to find a system, process, and storage solution that meets your company’s needs.
- Use a cross-functional team to lead metadata planning and management. The team members’ expertise will help ensure the data descriptions are valid and organizationally correct, which facilitates data sharing and consistency.
- Select a metadata storage solution that can be accessed by most people in the company and uses a format that will last over time. As you identify options, consider mature formats supported by several vendors, or open source alternatives.
Conclusion
Metadata collection and management can give you significant insights into the variety and types of datasets available within your organization. You can also save time and resources in the long run. Your teams benefit by sharing reliable datasets and avoiding duplication.
Once your solution is in place, it will be easier for people to design and manage new analytic models that generate actionable insights to solve business challenges.