
FAIR Data Management for Projects: FAIR at the First Mile

FAIRDOM members Carole Goble, Wolfgang Müller & Frederik Coppens ran an ELIXIR workshop at the ELIXIR All Hands meeting in Lisbon, 17 – 20 June 2019. Its purpose was to:

* Share information on ongoing projects and (strategic) plans within ELIXIR touching on ‘FAIR at the First Mile’ (local data management).

* Bring together Nodes, Platforms and Communities, along with other organisations (like FAIRDOM), to share and propose best practices, platforms and joint work for FAIR data management for projects supported by the Nodes.

* Discuss plans for ELIXIR CONVERGE (ELIXIR’s INFRADEV3 proposal), the EOSC-Life Plant A+ pilot and the EIP Strategic Plan Task 2 ‘Interoperability with a purpose’.

After an introduction by Frederik Coppens, deputy Head of Node (HoN) of Belgium, describing ELIXIR-CONVERGE, Pinar Alper from Luxembourg described their human data setup, which is tailored to meeting complex data protection needs. Inge Jonassen, HoN of Norway, described the FAIRDOM/NeLS setup run by ELIXIR-NO.

After a talk on FAIRDOM for first-mile data management by Wolfgang Müller, describing some approaches to be used in the EOSC-Life pilot Plant A+, the workshop closed with a session about Interoperability Platform Strategic Plan Task 2, interoperability for ELIXIR Communities and Nodes.

Ghent FAIRDOM workshop, 11-12 June 2019

Frederik Coppens of VIB invited institutional data managers from the Belgian Flanders region, as well as the FAIRDOM team and users, to Ghent to discuss FAIRDOM data management and possible future collaboration.

The workshop started out with a presentation session in which the participating institutions presented their planned use case scenarios.

Katy Wolstencroft from the University of Leiden presented the early history of FAIRDOM and the use cases that shaped FAIRDOM and SEEK. After a discussion of the needs of the visiting institutions, Fatemeh Zamanzad Ghavidel from Bergen (Digital Life, ELIXIR Norway) presented their use of FAIRDOM together with NeLS, the Norwegian e-Infrastructure for the Life Sciences. Then Wolfgang Müller of FAIRDOM/LiSyM/de.NBI described a variety of use cases for FAIRDOM in projects.

Then Stuart Owen presented the FAIRDOM solution for a concrete scenario provided by Frederik Coppens. This scenario involved running Galaxy from within FAIRDOM, the use of sample types in SEEK, and the extraction of metadata. Wolfgang Müller presented Jupyter Notebooks running from within SEEK, as well as the use of OpenRefine for creating bespoke data transformation solutions.

The day finished with breakout discussions about a variety of topics from development to community management.

On the second day, pathways to collaboration were discussed in more depth.

Digital Life Workshop, Bergen, Norway, 8th May

At the start of this week Natalie Stanford, Olga Krebs, and Ron Henkel, from the FAIRDOM Team, were in Norway providing basic and advanced training for Digital Life Norway.

The agenda for the day included an introduction to FAIRDOM as a whole, followed by a short introduction to RightField, then parallel sessions for basic and advanced FAIRDOMHub training. The training goals for the day were:

Basic Training

  • What is the ISA structure?
  • How to create and interlink ISA elements
  • Browsing and downloading information/data from FAIRDOMHub
  • Uploading/registering your own assets in FAIRDOMHub
  • Linking assets to the ISA
  • Registering publications and linking them to other assets.
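The ISA structure covered in the basic training is a three-level hierarchy: an Investigation holds Studies, each Study holds Assays, and assets (data files, SOPs, models, publications) are linked to the assays. A minimal sketch of that hierarchy in Python; all function and field names here are illustrative, not the actual SEEK data model:

```python
# Sketch of the ISA (Investigation-Study-Assay) hierarchy used by
# SEEK/FAIRDOMHub, modelled as plain nested dictionaries for illustration.

def make_investigation(title):
    """Top level: the overall research question."""
    return {"title": title, "studies": []}

def add_study(investigation, title):
    """Middle level: one experimental unit within the investigation."""
    study = {"title": title, "assays": []}
    investigation["studies"].append(study)
    return study

def add_assay(study, title):
    """Bottom level: a single experiment or analysis; assets hang off here."""
    assay = {"title": title, "assets": []}
    study["assays"].append(assay)
    return assay

def link_asset(assay, asset_type, name):
    """Register an asset (data file, SOP, model, publication) to an assay."""
    assay["assets"].append({"type": asset_type, "name": name})

# Example (invented names): a small feeding-trial investigation.
inv = make_investigation("Effect of diet on salmon lipid metabolism")
study = add_study(inv, "Feeding trial, plant-based diet")
assay = add_assay(study, "Liver transcriptomics")
link_asset(assay, "data_file", "liver_rnaseq_counts.csv")
link_asset(assay, "sop", "RNA_extraction_protocol.pdf")

print(len(inv["studies"][0]["assays"][0]["assets"]))  # prints 2
```

The point of the structure is exactly this interlinking: once a data file hangs off an assay, its protocol and wider research context come with it for free.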

Advanced Training

  • Model management in FAIRDOMHub
    • Creating a modelling assay
    • Reproducing modelling experiments registered in FAIRDOMHub
    • Creating SED-ML
    • Creating COMBINE archives
    • Archiving your own modelling work
    • Tracking model evolution
  • Publishing using FAIRDOMHub
    • Why is traditional supplementary data not enough?
    • What is the ISA structure?
    • Setting up a full ISA structure with attached data and models
    • Making the ISA structure publicly available
    • Snapshotting the ISA and associated data and models
    • Assigning DOIs for linking to within publications.
  • Samples in FAIRDOMHub
    • Learning about the new samples framework
    • Using existing Sample Types within the FAIRDOMHub
    • Using Metadata sheets to describe samples
    • Generating your own samples using forms and templates in FAIRDOMHub
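A COMBINE archive, as used in the advanced modelling session, is at heart a zip file bundling a model, its simulation description (SED-ML) and a manifest.xml that declares the format of each entry. A minimal sketch using only the Python standard library; the file names and placeholder contents are invented for illustration, and a real archive would contain actual SBML and SED-ML documents:

```python
import io
import zipfile

def make_manifest(entries):
    """Build the manifest.xml body; entries is a list of (location, format URI)."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">',
             '  <content location="." format="http://identifiers.org/combine.specifications/omex"/>']
    for location, fmt in entries:
        lines.append(f'  <content location="{location}" format="{fmt}"/>')
    lines.append('</omexManifest>')
    return "\n".join(lines)

entries = [
    ("model.xml", "http://identifiers.org/combine.specifications/sbml"),
    ("simulation.sedml", "http://identifiers.org/combine.specifications/sed-ml"),
]

# Assemble the archive in memory; a real one would be saved as e.g. archive.omex.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.xml", make_manifest(entries))
    zf.writestr("model.xml", "<sbml/>")          # placeholder model
    zf.writestr("simulation.sedml", "<sedML/>")  # placeholder SED-ML

with zipfile.ZipFile(buf) as zf:
    print(sorted(zf.namelist()))  # ['manifest.xml', 'model.xml', 'simulation.sedml']
```

Dedicated tooling exists for this, but the sketch shows why the format suits archiving modelling work: one self-describing file captures model, simulation setup and provenance together.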



The workshop was a great success, and the tutorials were popular with all 20 participants. It was particularly rewarding to see how well everyone worked together to learn and to solve our example data management conundrum at Happy Salmon Co.

The researchers attending the workshop provided feedback on the technical implementation of both modelling and samples handling in the FAIRDOMHub. As a result we hope to start some user experience improvements over the coming months.

We were well looked after by our Norwegian hosts, and all of us look forward to working more with Digital Life Norway.

Systems Biology Developer’s Foundry Workshop

More than 4 years after organising the first such workshop, FAIRDOM, in collaboration with de.NBI, ran the 3rd Systems Biology Developer’s Foundry Workshop.

Foundry Workshops are an innovative, experimental workshop format based on a mixture of a demo session and scientific speed dating (“show-and-tell sessions”). In the show-and-tell sessions, people show what they have achieved. Most conference demos run many in parallel and in a very short time, forcing viewers to choose which demos to see at first sight. In contrast, in show-and-tell sessions a small number of people each present to everyone participating, in a series of short talks.
The first day is devoted to a number of sessions in which three presenters each present their work, in parallel, to a third of the audience for 10 minutes. After the first round, another third of the audience visits them to receive the same presentation, and finally the rest of the audience. In this way the whole audience receives every talk, but because each group is small and changes, each presentation can be tailored to its listeners, emphasising the content most interesting to that group.
This process is repeated with other presenters until all participants have seen all other participants’ work. After these sessions, people tend to know each other quite well, which fuels the subsequent discussions and makes them more interesting and effective.

In the 2016 edition of the Foundry Workshop, 16 scientists from Switzerland, Slovenia, the Netherlands, Germany and the UK gave and received talks on topics as diverse as the usability of data management systems, modelling tools for systems biologists, and solving concrete systems medicine problems using the workflow engine KNIME and the statistical literate programming language RMarkdown. While this variety of topics may appear too wide for real discussions, there was much common ground among the participants: we all work in environments where we try to combine complex tools into powerful, yet simple services for a demanding public, scientists. After an initial introduction of the participants, the assignment of talks to sessions, and the sequence of show-and-tell sessions, we spontaneously agreed to hold a session about Snakemake, Nextflow, Galaxy, Taverna and KNIME. People with knowledge of the respective systems prepared material in a 30-minute break and then gave short presentations followed by lively discussions. Then we had to break off for dinner.
In the evening, we continued the discussions at Trattoria I Siciliani in Frankfurt, preceded by an involuntary, but fun, traversal of the Frankfurt Christmas market and a crossing of the Main on a footbridge, enjoying the Frankfurt skyline.

On the second day, we started again with two workflow system reports and then dived into Bérénice Batut’s introduction to Docker, which (with discussions) took up the whole rest of the day. Docker is a system for simplifying the use of containers that currently has enormous traction and an extremely lively community. The participants’ previous Docker exposure varied widely, which was not a drawback but spawned fruitful discussions. Each slide was discussed, leading to new insights for the participants.
In our view, this workshop created the following values for its participants:
(i) knowing more people, outside previous collaborations, who can help with concrete challenges; (ii) adding new, relevant tools to one’s own toolkit; (iii) clearer insight into the use and misuse of Docker; and finally (iv) a better view of whom to integrate with, and at which level.
Running such workshops is challenging, and fun. The challenge lies in creating a make-it-up-as-we-go atmosphere while still keeping an eye on the goals of the workshop. Whether it works depends strongly on the participants, who have to be open-minded and interested. We wish to thank the participants, who very quickly bonded and acted as a team.
During the preparation of the workshop, someone who did not participate expressed interest in running a Foundry Workshop with this structure on another, related topic in bioinformatics. We look forward to this development and will support it where we can.

Sponsors of the workshop
Wolfgang Müller is grateful for the base funding he receives at HITS gGmbH, a private not-for-profit research institute. This base funding, which enables running the group, also covered some of the costs incurred in this workshop.
de.NBI, funded by the German BMBF, is the German Network for Bioinformatics Infrastructure. The de.NBI-SysBio node closely collaborates with FAIRDOM, as HITS is a member of FAIRDOM. Service, training, dissemination and outreach are part of de.NBI’s mission.
FAIRDOM is a transnational project funded by BBSRC, BMBF, SystemsX and NWO that aims at FAIR management of data, operations and models. We collaborate with many other groups and scientists interested in FAIR data, operations and models. Service, training, dissemination and outreach are part of FAIRDOM’s mission.

Featured Image
The image of Frankfurt Skyline featured in this article is Frankfurt by Barnyz. The original can be found on Flickr, and it is used in accordance with the Creative Commons licence.

Data and model management needs for a knowledge base of salmon physiology: The Digital Salmon

By Tina Graceline and Jon Olav Vik, Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences.

Systems biology for salmon farming is the topic of the Digital Salmon, a FAIRDOM partner and active user of the FAIRDOMHub. Our use case was highlighted at the first FAIRDOM user meeting in Barcelona, 15 Sept 2016. The Digital Salmon currently has two projects, DigiSal and GenoSysFat, which comprise a model-driven, tightly integrated theoretical-experimental study of mechanistic interactions among genetic and feed factors. From a data management perspective, we have a lot of data and models that can potentially be linked via common languages: genes code for enzymes, which catalyze biochemical reactions, which transform molecules whose concentrations we can manipulate, measure and model.

Atlantic salmon farming generates approximately 6 billion euro every year and is projected to generate 20 billion euro in 2050. To support this growth, many challenges must be addressed. Salmon farming in the future must navigate conflicting demands of sustainability, shifting feed prices, disease, climate change, and product quality. The industry needs to develop a flexible, integrated basis of knowledge for rapid response to new challenges. Project DigiSal lays the foundations for a Digital Salmon: an ensemble of mathematical descriptions of salmon physiology, combining mathematics, high-dimensional data analysis, computer science and measurement technology along with genomics and experimental biology. We chose to begin with the challenges associated with novel feedstuffs.

Salmon are carnivores but today aquaculture provides more than half their fat and protein from plants, challenging the metabolic system and affecting fish health and nutritional value of salmon meat. The effects of the novel feed ingredients on the salmon body are complex and involve many organs. The newly sequenced salmon genome and related resources will enable a tightly integrated theoretical-experimental study of mechanistic interactions among genetic and feed factors. This brings us to systems biology: understanding the living body as a set of components that both affect each other and depend on each other. By combining experiment and modelling we aim to deliver a predictive understanding of a whole range of possible diets, much more efficiently than by traditional feeding trials alone.

In late 2015, the Digital Salmon became a FAIRDOM partner. We have been using the FAIRDOMHub, an online instance of the SEEK software, and it has proved very useful in contextualizing our research assets—data, operating procedures, and models—in an investigation-study-assay structure, adapted to our research. We have saved much time and explanation and avoided many misunderstandings by being able to point coworkers to a data file that automatically links to experimental protocols and the wider research motivation.

Members of the FAIRDOM team have provided training, help, and advice on planning how we should structure and manage our data within the FAIRDOMHub. We are currently seeking a dedicated biosemantician to improve the semantic interoperability of our data and ultimately our ability to query over related data. If this sounds like an attractive challenge to you, please contact project leader Jon Olav Vik.

The project also faces difficult design decisions in managing large, non-public data. The raw data amount to a few TB per year, and we would prefer to catalogue rather than store these data in the FAIRDOMHub, perhaps linking to our homebrew, lightweight LIMS system which keeps track of our biological samples. We are also eagerly anticipating FAIRDOM development in interfacing SEEK with Git version control of analysis reports and computer code, and programmatic access to data and models on the FAIRDOMHub.
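Programmatic access of the kind we are anticipating might look roughly like the following. This is a sketch only: SEEK exposes a JSON API, but the endpoint path, the payload and the field names below are hand-made examples in the JSON:API style, not a real FAIRDOMHub response.

```python
import json
import urllib.request

def data_file_request(base_url, file_id):
    """Build (but do not send) a metadata request for one data file."""
    return urllib.request.Request(
        f"{base_url}/data_files/{file_id}",
        headers={"Accept": "application/json"},
    )

req = data_file_request("https://fairdomhub.org", 101)

# Hand-made example payload in JSON:API style, standing in for a server reply.
example_payload = json.dumps({
    "data": {
        "id": "101",
        "type": "data_files",
        "attributes": {"title": "liver_rnaseq_counts.csv"},
    }
})

record = json.loads(example_payload)["data"]
print(record["type"], record["attributes"]["title"])
```

The request object is built but never sent, so the sketch stays self-contained; the appeal for us is that such an API would let our LIMS and analysis scripts talk to the catalogue directly.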

Overall, FAIRDOM software and expertise have greatly improved our ability to implement data and model management protocols across our project. We see this being hugely valuable as our project matures. We’re looking forward to growing our relationship with the FAIRDOM team, in particular our contact persons Natalie Stanford and Stuart Owen.

DigiSal is funded by the Research Council of Norway grant 248792 as part of its Digital Life initiative. It is hosted by the Norwegian University of Life Sciences, with partners at universities in Trondheim, Bergen, Tromsø, Wageningen and Stirling, the Institute for Marine Research, and the industry companies AquaGen and EWOS. We also collaborate closely with the Foods of Norway centre for research-driven innovation.

Our first user meeting

Here at FAIRDOM we celebrated another great year of providing data and model management support to researchers by running our first user meeting. It formed part of the satellite meetings for the ICSB, and it was held at the Barcelona Biomedical Research Park: a beautiful venue that backs straight onto the beach. On the morning of the meeting the sun was out, the skies were clear, and the weather reports promised 30 degrees by lunch time. Despite the lure of a perfect summer’s day on the beach, we arrived at the meeting to find over 40 people eager to learn about, and discuss, FAIRDOM data and model management. Brilliant.


The beach just outside the Biomedical Research Park

We had keynote discussions which looked at FAIRDOM and what we offer to assist researchers in data management (Carole Goble); how to encourage researchers to share and manage their data (Gareth Knight and Yannick Wurm); and how to publish citable research (Jacky Snoep).

Our attendees getting ready to start the day.

In addition to this we had short presentations from large consortia projects that are championing the use of data management, using FAIRDOM, within their projects. During the afternoon we had breakout sessions on training (Olga Krebs and Bernd Rinn) and technical requirements (Stuart Owen), and discussions on incentivising data and model management (Natalie Stanford).

Some key take home messages from the meeting

1. For science to progress we need to build on previous knowledge; however, much of this knowledge – in the form of data – is disappearing rapidly over the long term.


Scientists losing data at a rapid rate (2013)

2. Implementing data and model management is a global issue, demonstrated by the breadth of countries represented among the user meeting attendees.


Research countries of registered attendees at the FAIRDOM user meeting

3. Writable APIs are the way forward for allowing integration of data and model management resources.

4. The sign of a successful meeting is when three quarters of the attendees insist on joining you for dinner.


Some of the attendees and the FAIRDOM team enjoying talks over dinner

5. It’s good to catch up on the presentations even if you couldn’t attend:

Gareth Knight
London School of Hygiene & Tropical Medicine
Rewarding open science practice in funded research (coming soon)

Yannick Wurm
Queen Mary University London, UK
A major transition in social evolution & some data tidbits

Carole Goble
The University of Manchester, UK
FAIRDOM introduction

Jacky Snoep
Stellenbosch University, SA
Publishing FAIR studies with citable research assets: a case study. Modelling glucose metabolism in malaria patients

Sergey Lashin
Institute of Cytology and Genetics SB RAS, Russia
Siberian SEEK (coming soon)

Tomasz Zielinski
University of Edinburgh, UK
Evaluation of SEEK and OpenBIS for data management on a centre-wide level

Jon Olav Vik
Norwegian University of Life Sciences, Norway
Data and model management needs for a knowledge base of salmon physiology. Use Case: The Digital Salmon

Martin Böhm
DKFZ, Heidelberg, Germany
PAL’s Experience: IMOMESIC

Matthias König
Charité – Universitätsmedizin Berlin, Germany
Management of data, models, analyses and code for reproducible computational research: Present and Future

Maksim Zakhartsev
Hohenheim University, Stuttgart, Germany
MOSES, ZucAt and ExtremoPharm (coming soon)

Review of de.NBI-SysBio Workshop ‘Kinetics on the move’

On the occasion of the 10th Anniversary of SABIO-RK (Biochemical Reaction Kinetics Database), the Scientific Database and Visualization (SDBV) group of HITS hosted the Workshop on Data for Computational Modelling in Heidelberg (30-31 May 2016). It was organized as a training course for experimentalists and modelers by the Data Management Node (NBI-SysBio) of de.NBI and supported by the FAIRDOM project and HITS.

In the keynote talk Ursula Kummer gave an overview of SABIO-RK and Modelling Biological Systems. The following talks and hands-on sessions focused on the publication, curation, retrieval, and usage of kinetic data from the reaction kinetics database SABIO-RK and on the use of data in modeling. The second day was dedicated to data management, including best practice in data and model storage and re-usability, introducing the SEEK system and incorporated tools. The talks and hands-on sessions were enriched by questions, lively discussions and feedback from the approximately twenty workshop participants.

Written by Maja Rey of de.NBI and SABIO-RK

Samples Club 2

Our second Samples Club was held in Manchester, just before Easter, on March 21st-23rd. The meeting was jointly hosted by FAIRDOM and ELIXIR-UK, as part of ELIXIR-EXCELERATE. Participants with an interest in sample metadata and modelling got together to discuss a range of topics related to a sample-centric interoperability framework driven by use cases. A broad range of European and national projects were represented, including (but not limited to) BioSamples, BBMRI, CORBEL, ISBE, SynBioChem, FAIRDOM, RD-Connect, ISA, the BD2K centres, CEDAR and BioCADDIE, and Ocean Sampling Day (OSD).

A number of working groups were formed in order to continue work on key topics that emerged from the meeting. These included:

Working Group 1

Sample meta-data profiles. Based on a faceted projection of a sample’s ‘primary type’, develop recommendations for a standardised set of associated attribute profiles based on minimum information standards (e.g. MIABIS, MIGS, MIMARKS).

Working Group 2

Tools for sample meta-data enrichment. Defining tools (Zooma, SORTA, RightField) for ontological enrichment of metadata, from legacy data to the point of capture.

Working Group 3

Sample interoperability framework. Proposal of a best practice framework for modelling sample-sample interactions, exploring the benefits of process and procedure chains.

Working Group 4

Tools for sample meta-data management. Harmonising data interoperability between tools (such as SEEK4Science, BioSamples, MOLGENIS, ISA) for sample metadata management.

Samples Club has been established as an open group that aims to develop an extensible and interoperable framework for the representation of sample meta-data, as a recommendation for use across the European Life Science infrastructures, products and services. It’s a club that anyone can join. If you have an interest, particularly a samples-related use case, please get involved.

Getting a handle on the future of life-science data

“Before data-stewarding practices can be expertly developed we need to understand the nature of the data to begin with, and how it is expected to change in the coming years.”

The idea of the “data deluge” has been looming over the life sciences for the last 10 years. Advancing technologies are steadily improving the speed and quality with which vast quantities of large datasets can be collected. Far from being something to fear, if handled correctly, this ramping up of data collection can really drive the speed of discovery in systems research.

Handling this change is a key challenge for the life sciences, and requires forward thinking to develop the right platforms and techniques for collecting, annotating and storing data during its lifecycle. There are five characteristics that should be understood for this process [1]:

Volume – data size – the amount of data that can be produced in a given amount of time varies across methods and tends to increase year on year. The highest volume of data tends to come directly from machines in a raw format, and often reduces during post-processing to a final form. Understanding these changes ensures appropriate storage, preservation, and access solutions can be used. It is also important to understand how much raw data to preserve.

Velocity – speed of change – how fast the data is replaced. With improving technologies, some types of dataset can become obsolete quickly. Knowing a timescale for obsolescence allows appropriate management of stored datasets over the long term, dictating if and when to scrap the old.

Variety – different forms of data – data can be collected using different methods and technologies, usually chosen to capture the most valid data for a given study. Understanding the range of these techniques, and how the final data is used and, over the long term, re-used and/or repurposed, is very important for valid downstream use.

Veracity – uncertainty of data – describes how messy the data is, and therefore how much it can be trusted for study. High-throughput data often has to compromise on qualities which may be useful (e.g. full quantification for metabolomics data).

Value – how useful is it for the investigation at hand?

As a community, we held a joint meeting in March 2014, bringing together biomedical sciences research infrastructures (BMS RIs) covering genomics, proteomics, imaging, metabolomics, and clinical data. In a two-day meeting we brainstormed current data characteristics, and those predicted for the future, with teams of experts for all data types. As a result we produced a report that both infrastructures and researchers can use as a basis for developing data management plans. The findings are still useful two years on.


[1] Bernard Marr (2015) Big Data. John Wiley & Sons, 1st edition.

Written by Natalie Stanford of FAIRDOM, and edited by Steffi Suhr from the BioMedBridges Infrastructure and EBI.

Reproducible and Citable Data and Models Workshop

Last week the FAIRDOM team were in the beautiful seaside town of Warnemünde, Germany, holding their first Reproducible and Citable Data and Models Workshop. The workshop ran for three days, September 14th-16th, and was supported by EraSysAPP. It featured: talks from field experts in reproducible and citable data and models, introducing participants to the state of the art; hands-on sessions on how to structure, store, snapshot, and publish data and models to Zenodo and ORCiD; what Research Objects and COMBINE Archives are and how to create them; and how to use software such as SEEK to improve the reproducibility of published models.

We had participants from the UK, Germany, Norway and Italy, with both academia and industry represented. The atmosphere was relaxed and productive, with participants citing it as highly useful for their research.

You can access the experts’ talks, which established the state of the art for participants:

Jo McEntyre (EMBL-EBI, UK), Citing data in research articles: principles, implementation, challenges – and the benefits of changing our ways.

Tom Ingraham (F1000, UK), F1000 Research: publishing data and code openly.

Carole Goble (Director of FAIRDOM, University of Manchester), Reproducible and citable data and models; Licensing, citation and sustainability.

Mihai Glont (EMBL-EBI, UK), Capturing the context: one small(ish) step for modellers, one giant leap for mankind.

Paolo Manghi (CRS, Italy), Enabling better science: results and vision of the OpenAIRE infrastructure and RDA Data Publishing.

Dagmar Waltemath (University of Rostock), Reproducibility of model-based results: standards, infrastructure, and recognition.

Jacky Snoep (Stellenbosch University, South Africa), Reproducible model construction, validation, and simulation. (to be provided shortly).

Wolfgang Müller (HITS), Making your data good enough for sharing.

The tutors were:

Finn Bacall (University of Manchester)

Martin Golebiewski (HITS gGmbH)

Natalie Stanford (University of Manchester)

Stuart Owen (University of Manchester)

Martin Scharm (University of Rostock)

Martin Peters (University of Rostock)


Picture credit to TMZ, University of Rostock.