Alasdair J G Gray

Connecting the dots in the World's data Dataset Descriptions Meeting

On Monday 16 May, several interested individuals (including myself) from ELIXIR, bioCADDIE, Bioschemas and the W3C HCLS Community Profile met with representatives from Google involved in the activity on describing datasets.

Finding datasets, and understanding their content, is a challenging task for humans and currently not possible to automate. is an initiative from the major web search engines to help with the discovery of web resource.

There are multiple parallel activities in the life sciences community working on developing ways to publish metadata about datasets. This is due to the wide variety of use cases that dataset descriptions need to satisfy, including data discovery, data citation, and provenance tracking. bioCADDIE has worked on an extensive analysis of  use cases mapping data models from existing efforts. We agreed this could be used as a basis to improve the existing dataset type and come up with a new Bioschemas dataset specification.

The outcomes of the meeting with Google was to focus on the find-ability of datasets with an emphasis on data citation. For data citation the following key properties are important (as stated in the FORCE 11 data citation principles).

Data citation Image from

The next steps will be to develop some pilot projects to both publish and use dataset descriptions for discovery and citation.

ELIXIR-UK members are heavily involved in and will be involved in these efforts as part of the ELIXIR interoperability platform activities in collaboration with NIH and ELIXIR-EBI.

My thanks go to Rafael Jimenez for his help with the preparation of this post which will also appear in the ELIXIR-UK newsletter.

    About Me


    I'm an Associate Professor in Computer Science at Heriot-Watt University. My research focuses on linking datasets. Read more