Repositories and commons are ways of storing and managing data and models. They allow data and models to be made publicly available for search and reuse. Repositories used in systems biology tend to be data-type specific, with repositories available for storing genome data (e.g. GEO), transcriptome data (e.g. RNA-Seq Atlas), proteome data (e.g. PRIDE), metabolome data (e.g. MetaboLights), and models (e.g. BioModels), to name a few. Data-type specific repositories make the data accessible to a strong core audience within that field of research. For systems biology projects, however, they offer a fragmented solution: when the datasets generated to populate a model are stored in separate repositories, even highly comprehensive metadata is not enough to retain the relational links between the datasets and the model. Commons were developed to address this issue.

A commons is a centralised public resource that allows the aggregation of diverse content. In systems biology, commons allow data and models to be linked to, stored, and managed using contextual relationships (e.g. all data and models comprising healthy liver tissue from patient X), rather than by data type. The most popular commons platform in systems biology is SEEK, which uses the Investigation (project context), Study (unit of research), Assay (analytical measurement) format, known as ISA, to organise data and models hierarchically. SEEK forms part of the FAIRDOM platform, a collection of software and aggregated data that supports the entire data management pipeline, from data collection through to publication. The groups of data and models that comprise an investigation can be downloaded as complete packages known as Research Objects. Research Objects help researchers exchange research outcomes while retaining their semantic integrity.
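The hierarchical organisation described above can be illustrated with a minimal Python sketch. This is not SEEK's data model or API: the class names, fields, and file names below are hypothetical and chosen only to show how an Investigation, its Studies, and their Assays keep datasets and models of different types linked by context rather than separated by data type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """A data file or model (hypothetical structure, not SEEK's schema)."""
    name: str
    kind: str  # the data type a type-specific repository would index by


@dataclass
class Assay:
    """An analytical measurement or modelling step and the assets it produced."""
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    """A unit of research grouping related assays."""
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that ties all studies together."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_assets(self) -> List[Asset]:
        # Walk the hierarchy so every asset stays reachable from its context.
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
        ]


# Illustrative content only; the file names are made up.
investigation = Investigation(
    title="Healthy liver tissue, patient X",
    studies=[
        Study(
            title="Baseline metabolism",
            assays=[
                Assay("RNA-Seq", [Asset("liver_rnaseq.csv", "transcriptome")]),
                Assay("Mass spectrometry", [Asset("liver_proteome.csv", "proteome")]),
                Assay("Kinetic model", [Asset("liver_model.xml", "model")]),
            ],
        )
    ],
)

# One context, several data types: the links a type-specific repository would lose.
kinds = sorted({asset.kind for asset in investigation.all_assets()})
print(kinds)
```

In this sketch, downloading "the investigation" amounts to collecting everything returned by `all_assets()` together with the hierarchy itself, which is loosely analogous to packaging an investigation as a Research Object.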

Repositories and commons should be where published data and models are made publicly available for search, download, and reuse. However, it has been reported that 20-30% of researchers are unwilling to share their data or models publicly, even after publication in a journal. This points to a significant barrier in the sharing culture of systems biology that needs to be overcome.


Resource                   Description
Array Express              Archive of functional genomics data.
BiGG                       A biochemical, genetic and genomic knowledge base for generating large-scale metabolic reconstructions.
BioCyc                     Collection of databases for different cellular functions.
BioModels                  For storing SBML models.
BioUML                     Platform for analysing 'omics data using computational biology tools.
BRENDA                     The Comprehensive Enzyme Information System.
CellML Model repository    For storing CellML models.
ENA                        European Nucleotide Archive: a comprehensive record of the world's nucleotide sequences.
GenBank                    National Institutes of Health genetic sequence database.
GEO                        Gene Expression Omnibus: repository for functional genomics data.
insilicoDB                 Subscription-based modelling solution.
JWS Online                 For storing SBML models; also provides an online simulation environment.
KEGG                       Database resource for understanding high-level functions and utilities of the biological system.
MetaCrop                   Summary of information relating to metabolic pathways in crop plants.
ModelDB                    Yale resource for model sharing; no specified formats, with flexibility for storing parameter sets.
Open Source Brain          For collaborative development of brain models.
wwPDB                      Worldwide Protein Data Bank: holds information on 3D structures of proteins.
PubMed                     For biomedical literature.
SABIO-RK                   A curated database containing information about biochemical reactions, including kinetic rate equations, parameters, and experimental conditions.
SEEK                       Commons resource for storing research assets in ISA format.
STRING                     For known and predicted protein-protein interactions.
TCGA                       The Cancer Genome Atlas.
UniProt                    For protein sequence and functional information.
Virtual Cell               Internet-based modelling software for mathematical models in general.