Alasdair J G Gray

Connecting the dots in the World's data

First steps with Jupyter Notebooks

9 Oct 2018

At the 2nd Workshop on Enabling Open Semantic Sciences (SemSci2018), co-located with ISWC2018, I presented the following paper (Gray, 2018). The slides are at the end of this post.

Title: Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources

Abstract: In recent years there has been a reproducibility crisis in science. Computational notebooks, such as Jupyter, have been touted as one solution to this problem. However, when executing analyses over live SPARQL endpoints, we get different answers depending upon when the analysis in the notebook was executed. In this paper, we identify some of the issues discovered in trying to develop a reproducible analysis over a collection of biomedical data sources and suggest some best practice to overcome these issues.

The paper covers my first attempt at using a computational notebook to publish a data analysis so that it can be reproduced. It raises more questions than it answers, and the same was true of the discussion at the workshop.

One of the great things about the paper is that you can launch the notebook without installing any software, using the Binder link for the executable version below. You can then rerun the entire notebook and check whether you get the same results that I did when I ran the analysis over the various datasets.
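Because the notebook queries live SPARQL endpoints, a rerun may legitimately return a different answer. One way to make such a comparison concrete is to record, alongside each query, when it was run and a fingerprint of what came back. The Python sketch below is a hypothetical illustration, not code from the paper's notebook. It assumes the endpoint follows the SPARQL 1.1 Protocol and returns SPARQL JSON results. The function names (`run_sparql`, `canonical_rows`, `provenance_record`) are invented for this example.

```python
import hashlib
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def run_sparql(endpoint: str, query: str) -> dict:
    """Send a SELECT query via the SPARQL 1.1 Protocol (HTTP GET) and
    return the parsed SPARQL JSON results. Assumes a standard endpoint."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def canonical_rows(results: dict) -> list:
    """Flatten SPARQL JSON bindings into sorted rows so that the same answer
    always hashes identically, whatever order the endpoint returned it in."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # An unbound variable is recorded as None rather than being dropped.
        rows.append([binding.get(v, {}).get("value") for v in variables])
    # Sort on a JSON string so rows containing None still compare cleanly.
    return sorted(rows, key=lambda row: json.dumps(row))


def provenance_record(endpoint: str, query: str, results: dict) -> dict:
    """Record when a query was run and a fingerprint of its answer."""
    rows = canonical_rows(results)
    digest = hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()
    return {
        "endpoint": endpoint,
        "query": query,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "row_count": len(rows),
        "result_sha256": digest,
    }
```

Hashing the sorted rows, rather than the raw response, means a mere reordering of the same answer is not flagged as a change, while any added, removed, or altered binding is. Storing these records with the notebook would let a reader see whether a difference comes from the data having changed since the original run.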

Slides: Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources (Alasdair Gray)

Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources
Gray, Alasdair J G
In Enabling Open Semantic Science, Monterey, California, USA, 2018
Executable version: https://mybinder.org/v2/gh/AlasdairGray/SemSci2018/master?filepath=SemSci2018%20Publication.ipynb

```
@inproceedings{Gray2018:jupyter:SemSci2018,
  author = {Gray, Alasdair J G},
  title = {Using a Jupyter Notebook to perform a reproducible scientific analysis over semantic web sources},
  booktitle = {Enabling Open Semantic Science},
  year = {2018},
  month = oct,
  address = {Monterey, California, USA},
  note = {Executable version: https://mybinder.org/v2/gh/AlasdairGray/SemSci2018/master?filepath=SemSci2018%20Publication.ipynb},
  url = {http://ceur-ws.org/Vol-2184/paper-02/paper-02.html}
}
```

About Me

I'm an Associate Professor in Computer Science at Heriot-Watt University. My research focuses on linking datasets.
