Data Repositories and Commons
Repositories and commons are ways of storing and managing data and models. They allow data and models to be made available publicly, for search and reuse.
Repositories used in systems biology tend to be data-type specific, with repositories available for genome data (e.g. GEO), transcriptome data (e.g. RNA-Seq Atlas), proteome data (e.g. PRIDE), metabolome data (e.g. MetaboLights), and models (e.g. BioModels), to name a few. Data-type specific repositories make data accessible to a strong core audience within that field of research. For systems biology projects, however, they offer a fragmented solution: when the datasets generated to populate a model are stored in separate repositories, even highly comprehensive metadata is not enough to retain the relational links between the datasets and the model. Commons were developed to address this issue.
A commons is a centralised public resource that allows the aggregation of diverse content. In systems biology, commons allow data and models to be linked to, stored, and managed using contextual relationships (e.g. all data and models relating to healthy liver tissue from patient X), rather than by data type. The most popular commons platform in systems biology is SEEK, which uses the Investigation (project context), Study (unit of research), Assay (analytical measurement) format, known as ISA, to organise data and models hierarchically. SEEK forms part of the FAIRDOM platform, a collection of software and aggregated data that supports the entire data management pipeline, from data collection through to publication. The groups of data and models that comprise an investigation can be downloaded as complete packages known as Research Objects. Research Objects help researchers exchange research outcomes while retaining their semantic integrity.
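The Investigation, Study, Assay hierarchy described above can be illustrated with a minimal sketch. This is not the SEEK data model or API; the class names, the `kind` labels, and the example file names are all hypothetical, chosen only to show how grouping by context keeps datasets of different types linked to the model they support.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """A data file or model; `kind` is a hypothetical data-type label."""
    name: str
    kind: str  # e.g. "proteome", "metabolome", "model"


@dataclass
class Assay:
    """An analytical measurement and the assets attached to it."""
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    """A unit of research made up of assays."""
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The project context that ties studies together."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_assets(self) -> List[Asset]:
        """Collect every asset in the hierarchy, regardless of data type."""
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
        ]


# Hypothetical example: one context, several data types kept together.
investigation = Investigation(
    title="Healthy liver tissue, patient X",
    studies=[
        Study(
            title="Baseline characterisation",
            assays=[
                Assay("Proteomics run", [Asset("proteins.csv", "proteome")]),
                Assay("Metabolomics run", [Asset("metabolites.csv", "metabolome")]),
                Assay("Model fitting", [Asset("liver_model.xml", "model")]),
            ],
        )
    ],
)

kinds = sorted(asset.kind for asset in investigation.all_assets())
print(kinds)
```

In a data-type specific arrangement, these three assets would sit in three separate repositories. Here a single traversal from the investigation recovers all of them together, which is the relational link that a commons is intended to preserve.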
Repositories and commons should be where published data and models are made publicly available for search, download, and reuse. However, it has been noted that 20-30% of researchers are unwilling to share their data or models publicly, even after publication in a journal. This points to a significant barrier in the sharing culture of systems biology that needs to be overcome.