Alasdair J G Gray

Connecting the dots in the World's data

BioHackathon 2020 (Virtual)

This year, due to the pandemic, the European BioHackathon went virtual. Despite the online-only nature of the event, it was well attended and had more topics than ever (41), covering both the usual ELIXIR topics and some dedicated to the COVID response. Bioschemas was again well represented across several hacking projects. This year the focus was on exploiting Bioschemas markup for different communities.

The main focus of the work was to show that the IDPcentral registry could be populated using Bioschemas markup. Protein markup was scraped from three intrinsically disordered protein resources: DisProt, MobiDB, and ProteinEnsemble. The scraped data was merged and converted into the existing IDPcentral registry model. We also generated a knowledge graph over which some initial analysis queries were run, e.g. identifying which proteins are found in more than one source. The work can be found in this notebook. We plan to continue this work and extend it beyond a sample of pages from each site.
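As a rough illustration of the "found in more than one source" query, here is a minimal Python sketch. It is not the code from the notebook: it assumes the scraped markup has already been reduced to (source, protein identifier) pairs that share a common identifier scheme, and both the function name and the sample identifiers are made up for the example.

```python
from collections import defaultdict


def proteins_in_multiple_sources(records):
    """Return {protein_id: sorted list of sources} for proteins seen in 2+ sources.

    `records` is an iterable of (source_name, protein_id) pairs. The pairing
    by a shared identifier is an assumption made for this sketch.
    """
    sources_by_protein = defaultdict(set)
    for source, protein_id in records:
        # A set ignores repeated sightings of a protein within one source.
        sources_by_protein[protein_id].add(source)
    return {
        protein_id: sorted(sources)
        for protein_id, sources in sources_by_protein.items()
        if len(sources) > 1
    }


# Placeholder records; the identifiers are invented, not real scraped data.
sample_records = [
    ("DisProt", "EXAMPLE-1"),
    ("MobiDB", "EXAMPLE-1"),
    ("MobiDB", "EXAMPLE-2"),
    ("ProteinEnsemble", "EXAMPLE-3"),
    ("DisProt", "EXAMPLE-3"),
    ("DisProt", "EXAMPLE-1"),
]

shared = proteins_in_multiple_sources(sample_records)
for protein_id, sources in sorted(shared.items()):
    print(protein_id, sources)
```

Over a real knowledge graph the same question would more naturally be asked as a grouped query; the dictionary of sets here is only a stand-in for that grouping step.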

Within the plant community, we formalised the mapping between MIAPPE (the minimum information model for plant phenotyping experiments) and the Bioschemas Study profile. This will be tested in the coming weeks on the PIPPA site for phenotype experiments.

Egon Willighagen used BMUSE to scrape MolecularEntity markup from MassBank compound pages. The table below reports the statistics on the data that was scraped.

Count       Description
1,628,333   Total number of triples
15,762      InChIs
10,627      Monoisotopic molecular weights
7,508       Molecular formula
1           Different license(s)
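Statistics of this kind can be derived from the scraped triples with a simple tally. The sketch below is only an assumption about how such counts might be produced, not the method actually used: it treats the data as (subject, predicate, object) tuples and reports both the number of triples and the number of distinct values per predicate, since the table does not say which of the two each row reports. The predicate names and all sample values are placeholders.

```python
from collections import Counter, defaultdict


def summarise_triples(triples):
    """Tally a collection of (subject, predicate, object) tuples."""
    triples = list(triples)
    # Number of triples that use each predicate.
    per_predicate = Counter(predicate for _, predicate, _ in triples)
    # Distinct object values seen for each predicate.
    distinct_objects = defaultdict(set)
    for _, predicate, obj in triples:
        distinct_objects[predicate].add(obj)
    return {
        "total": len(triples),
        "triples_per_predicate": dict(per_predicate),
        "distinct_values_per_predicate": {
            predicate: len(values) for predicate, values in distinct_objects.items()
        },
    }


# Placeholder triples; none of these values come from MassBank.
sample_triples = [
    ("ex:compound1", "schema:inChI", "InChI=placeholder-1"),
    ("ex:compound1", "schema:license", "ex:licenceA"),
    ("ex:compound2", "schema:inChI", "InChI=placeholder-2"),
    ("ex:compound2", "schema:license", "ex:licenceA"),
]

stats = summarise_triples(sample_triples)
print(stats["total"])
print(stats["distinct_values_per_predicate"])
```

Reporting distinct values matters for rows such as the license count, where many triples can point at the same single value.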

The slides below were presented at the final reporting session by the various projects that involved Bioschemas.
