Access to datasets is critical to many of today’s efforts across all sectors and industries, whether in scientific research, business analysis, or public policy. In the scientific community and at various levels of the public sector, reproducibility and transparency are essential for progress, so data sharing is vital. For example, in the United States a recent policy requires free and equal access to the results of all federally funded research, including data and statistical information along with publications.
To make content with this level of statistical detail easier to discover, and to better surface this information from across the web, Google is now making it easier to search for datasets. You can click on any of the three top results (see below) to go to the dataset page, or you can explore further by clicking “More datasets”. Here’s an example:
When users search for datasets in Google Search, they find a dedicated section that highlights pages with dataset descriptions. They can explore many more datasets by clicking “More datasets” and going to Dataset Search.
Powered by Dataset Search technology
Dataset Search, a dedicated search engine for datasets, powers this feature and indexes more than 45 million datasets from more than 13,000 websites. The datasets cover many disciplines and topics, including government, scientific, and commercial datasets. Dataset Search shows users essential metadata about datasets and previews of the data where available. Users can then follow the links to the data repositories that host the datasets.
Dataset Search primarily indexes dataset pages on the web that contain schema.org structured data. Schema.org metadata allows web page authors to describe the semantics of the page: the entities on the page and their properties. For dataset pages, the schema.org metadata describes key elements of datasets, such as their description, license, temporal and spatial coverage, and available download formats. In addition to aggregating this metadata and making it easily accessible, Dataset Search normalizes and reconciles the metadata that comes directly from web pages.
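To make the metadata elements above concrete, here is a minimal sketch in Python that builds a schema.org Dataset description and wraps it as JSON-LD, one common way to embed schema.org metadata in a page. The property names (`description`, `license`, `temporalCoverage`, `spatialCoverage`, `distribution`, `encodingFormat`, `contentUrl`) come from the schema.org vocabulary; the helper functions, the dataset, and every URL are hypothetical and only illustrate the shape of the markup.

```python
import json


def build_dataset_jsonld(name, description, license_url, temporal_coverage,
                         spatial_coverage, downloads):
    """Return a schema.org Dataset description as a plain dict.

    `downloads` is a list of (encoding_format, content_url) pairs.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        # ISO 8601 interval, e.g. "2015-01-01/2020-12-31"
        "temporalCoverage": temporal_coverage,
        "spatialCoverage": {"@type": "Place", "name": spatial_coverage},
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": fmt,
                "contentUrl": url,
            }
            for fmt, url in downloads
        ],
    }


def to_script_tag(metadata):
    """Wrap the metadata in the script element used to embed JSON-LD in HTML."""
    payload = json.dumps(metadata, indent=2)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical dataset: every value below is made up for illustration.
example = build_dataset_jsonld(
    name="Example river temperature readings",
    description="Daily water temperature at three hypothetical stations.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    temporal_coverage="2015-01-01/2020-12-31",
    spatial_coverage="Example River Basin",
    downloads=[("text/csv", "https://example.org/data/river-temps.csv")],
)
print(to_script_tag(example))
```

The printed element would go in the HTML of the page that describes the dataset. Which properties a given search engine actually reads is not specified here, so treat the field selection as an assumption.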
If you’re a dataset author or provider and want your datasets to be found by others in Search, be sure to publish your dataset in a way that makes it discoverable and specify how others can reuse the data. Specifically, make sure that the web page describing the dataset has machine-readable metadata. The easiest way to ensure this is to publish your dataset to an established dataset repository. Some repositories cater to specific research communities, while others are “generalist” (figshare.com, zenodo.org, datadryad.org, kaggle.com, etc.). These repositories automatically include metadata on dataset pages for each dataset, making it easy for search engines to discover and include them in specialized results sections, as in the figure above.
As data sharing continues to grow and evolve, we will keep working to make datasets as easy to find, access, and use as any other type of information on the web.
Acknowledgments
We are very grateful to the many Googlers who contributed to the development and launch of this feature, including: Rachel Zax, Damian Biollo, Shiyu Chen, Jonathan Drake, Sunil Vemuri, Stephen Tseou, Amit Bapat, Will Leszczuk, Marc Najork, Sergei Vassilvitskii, Bruno Possas and Corinna Cortes.