Wikidata can provide references for each and every statement that it stores. This paper builds on Seyed’s earlier work on creating topical subsets over Wikidata. Here we focus on the provision of references for statements in six topical subsets of Wikidata and report summary statistics on their usage.
SPaR.txt is the result of a recent pilot project conducted with my colleague Ioannis Konstas jointly with Bimal Kumar and Richard Watson from Northumbria University. We are investigating the use of NLP techniques together with Knowledge Graphs to extract requirements from the UK’s building regulations. SPaR.txt, developed by our RA Ruben Kruiper, allows terms to be extracted with little training data. This was demonstrated on the Scottish building regulations, since they are openly available in a machine-processable format. It provides the first stepping stone towards automating the extraction of requirements from regulatory texts.
This paper is the result of a collaboration with Ilaria Tiddi, an Assistant Professor at the VU Amsterdam, which came about after one of her visits to Edinburgh. Ilaria has been involved in the creation of the Collaboration Databank (CoDa), which contains summaries of social science experiments on collaboration. We explored whether it would be possible to use a nanopublication representation, with rich provenance information, to detect contradictory results and suggest potential causes for the contradiction.
The main topic that I focused on at last year’s virtual BioHackathon was using Bioschemas markup scraped from web pages about proteins known to be disordered, and using this data to populate a registry (IDPcentral). We’ve refined the process, and the markup, quite a bit since the hackathon, resulting in one notebook that transforms the scraped data files into a consolidated knowledge graph and another that runs some simple analysis queries, including the HCLS Dataset Description metadata statistics queries.
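To give a flavour of the consolidation step: the sketch below merges scraped JSON-LD-style records into a single entity map keyed by `@id`, so duplicate scrapes of the same page collapse into one node. This is a minimal illustration only; the actual notebooks use proper RDF tooling, and the URLs and property names here are made up for the example.

```python
from collections import defaultdict

def consolidate(records):
    """Merge scraped JSON-LD-style records into one entity map keyed by @id,
    unioning property values so duplicate scrapes collapse into one node."""
    graph = defaultdict(dict)
    for rec in records:
        node = graph[rec["@id"]]
        for key, value in rec.items():
            if key == "@id":
                continue
            existing = node.setdefault(key, [])
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v not in existing:  # skip values already seen for this property
                    existing.append(v)
    return dict(graph)

# Two scrapes of the same (hypothetical) protein page, with overlapping markup.
scraped = [
    {"@id": "https://example.org/protein/P1", "name": "Protein 1",
     "identifier": "P1"},
    {"@id": "https://example.org/protein/P1", "name": "Protein 1",
     "sameAs": "https://www.uniprot.org/uniprot/P1"},
]
kg = consolidate(scraped)
print(len(kg))  # → 1 (one consolidated entity)
print(sorted(kg["https://example.org/protein/P1"]))  # → ['identifier', 'name', 'sameAs']
```

The value-union approach means the merge is order-independent, which matters when the same page is scraped more than once.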
Wikidata is an amazing source of data. However, its query service is limited to relatively straightforward queries due to fair-usage timeouts, and its size makes it impractical for most people to use locally. To evaluate his PhD work on references, Seyed will need to process complex queries, so we need a mechanism to construct representative subsets of the whole. This paper explores our initial ideas on creating topical subsets from Wikidata.
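The basic idea of a topical subset can be sketched as a filter over the items in a dump: keep only those whose "instance of" (P31) values fall in a seed set of topic classes. This is a toy illustration against a simplified in-memory version of the Wikidata JSON item shape, not our actual subsetting pipeline; the seed class and item ids are chosen for the example.

```python
# Hypothetical seed: keep items that are instances of (P31) these classes.
TOPIC_CLASSES = {"Q8054"}  # Q8054 = protein

def statement_values(item, prop):
    """Yield entity ids used as values of `prop` in a Wikidata-style item."""
    for claim in item.get("claims", {}).get(prop, []):
        yield claim["mainsnak"]["datavalue"]["value"]["id"]

def topical_subset(items, topic_classes):
    """Keep only items whose P31 values intersect the topic seed set."""
    return [item for item in items
            if topic_classes & set(statement_values(item, "P31"))]

# A toy two-item "dump" in (simplified) Wikidata JSON shape.
dump = [
    {"id": "Q123", "claims": {"P31": [
        {"mainsnak": {"datavalue": {"value": {"id": "Q8054"}}}}]}},
    {"id": "Q456", "claims": {"P31": [
        {"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}]}},
]
subset = topical_subset(dump, TOPIC_CLASSES)
print([item["id"] for item in subset])  # → ['Q123']
```

A real subset would also need to follow links from the seed items (referenced entities, property definitions, references) so the extracted slice stays self-contained, which is where the interesting design questions lie.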
One of the topics that my PhD student Seyed and I were involved with at last year’s virtual BioHackathon was subsetting knowledge graphs and Wikidata. Our hacking group, led by Jose Labra Gayo, has now written up our findings, covering use cases and existing tooling for subsetting.
This year, due to the pandemic, the European BioHackathon went virtual. Despite the online-only nature of the event, it was well attended, and there were more topics than ever – 41 – covering both the usual ELIXIR themes and some dedicated to the COVID response. Bioschemas was again well represented across different hacking projects. This year the focus was on exploiting Bioschemas markup for different communities.
In this paper we use data about offshore oil platforms to identify items that have the potential for reuse or recycling. We do this by integrating their historic design data and maintenance logs based on the ISO 15926 standard.
Today my PhD student Ahmad Alsadeeqi successfully defended his PhD thesis “Systematically Corrupting Data to Evaluate Record Linkage”, receiving minor corrections. Congratulations!
I once again attended the European BioHackathon which took place in November outside of Paris. It was another intense week with 150 developers from across Europe (and beyond) working together on 34 topics.
subscribe via RSS