Dataset Metadata

Without metadata, a catalog of published datasets could not exist. Metadata provides the essential context that makes datasets discoverable, understandable, and usable. Most open data portals include built-in tools for creating and managing metadata when you publish or update a dataset. Some systems even update metadata automatically when a dataset is edited. Each dataset you publish should include as many of the following metadata elements as apply. They fall into two primary categories: Basic Elements and Advanced Elements.

Basic Metadata Elements

These are the core descriptors that help users find, evaluate, and understand your dataset. Many appear directly on catalog navigation pages or in search results.

  • Title (or Name): A clear, human-readable name for the dataset. Use plain English and enough detail to support search and discovery. Avoid acronyms or overly technical language.

  • Description: A concise but informative summary that helps users quickly determine whether the dataset is relevant to their needs.

  • Category (or Theme): The primary subject area of the dataset, typically chosen from a predefined list. Some portals allow only one category per dataset, while others support multiple.

  • Keywords (or Tags): Single words or short phrases that describe the dataset. Include terms that technical and non-technical users might search for. Keywords also support discovery through recommendation engines.

  • Modification Date: The date on which the dataset was most recently changed.

  • Contact Information: The name and email address of the dataset publisher or responsible party, so users can follow up with questions or feedback.

  • License: Information about how the dataset may be used. Most open data is released under public domain or open licenses, though some datasets may carry specific reuse restrictions.
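As a rough illustration, the basic elements above could be captured in a small record and checked for gaps before publishing. This is only a sketch: the class name, field names, and sample values are hypothetical and do not correspond to any particular portal's schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BasicMetadata:
    """Hypothetical record holding the basic elements listed above."""
    title: str
    description: str
    categories: list[str]   # some portals allow only one category
    keywords: list[str]
    modified: date
    contact_name: str
    contact_email: str
    license: str


def missing_elements(record: BasicMetadata) -> list[str]:
    """Return the names of elements that are blank strings or empty lists."""
    missing = []
    for name, value in vars(record).items():
        if isinstance(value, str) and not value.strip():
            missing.append(name)
        elif isinstance(value, list) and not value:
            missing.append(name)
    return missing


# Made-up sample values, for illustration only.
record = BasicMetadata(
    title="Street Tree Inventory",
    description="Location and species of every tree maintained by the city.",
    categories=["Environment"],
    keywords=["trees", "parks", "urban forest"],
    modified=date(2024, 1, 15),
    contact_name="Open Data Team",
    contact_email="opendata@example.org",
    license="",  # left blank on purpose to show the check
)

print(missing_elements(record))  # prints ['license']
```

A check like this only catches empty fields. Whether a title is clear or a description is informative still needs a human reviewer.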

Advanced Metadata Elements

These elements enhance interoperability and support automated systems that integrate data from multiple sources. While they may not appear in user-facing catalogs, they are critical for enabling third-party tools and platforms to access and interpret your data.

  • Frequency: Describes how often the dataset is updated, using plain language such as “Daily,” “Monthly,” “Annually,” or “As needed.” This is especially useful for developers setting up data integrations or automated processes.

  • Temporal Coverage: Indicates the time period covered by the dataset. This may be a general timeframe (e.g., 2015–2023) or the exact start and end dates reflected in the data.

  • Spatial Coverage: Defines the geographic scope of the dataset. This could be a city, county, state, or other defined area. For geospatial datasets, this may also include bounding coordinates or shapes, though that level of detail is less common.
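The advanced elements above lend themselves to machine-readable values. The sketch below shows one way an integration might store them and run two simple checks. The dictionary keys, the sample place name, and the coordinates are all hypothetical, and the bounding box order (west, south, east, north) is an assumption rather than a rule every portal follows.

```python
from datetime import date

# Hypothetical advanced-metadata record; keys and values are illustrative.
advanced = {
    "frequency": "Monthly",
    "temporal_coverage": {"start": date(2015, 1, 1), "end": date(2023, 12, 31)},
    "spatial_coverage": {
        "name": "Example County",
        # Assumed order: (west, south, east, north) in decimal degrees.
        "bbox": (-122.6, 37.2, -121.9, 37.9),
    },
}


def covers_date(meta: dict, day: date) -> bool:
    """True if the given day falls inside the dataset's temporal coverage."""
    period = meta["temporal_coverage"]
    return period["start"] <= day <= period["end"]


def bbox_is_valid(bbox: tuple) -> bool:
    """Basic sanity check on a bounding box.

    Simplification: boxes that cross the antimeridian (west > east)
    are rejected here.
    """
    west, south, east, north = bbox
    return -180 <= west <= east <= 180 and -90 <= south <= north <= 90


print(covers_date(advanced, date(2020, 6, 1)))            # prints True
print(covers_date(advanced, date(2024, 1, 1)))            # prints False
print(bbox_is_valid(advanced["spatial_coverage"]["bbox"]))  # prints True
```

Storing temporal coverage as real dates rather than free text such as "2015–2023" lets automated tools filter datasets by time period without parsing prose.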
