This checklist walks you through the fundamental questions you should ask yourself when constructing a data and model management plan for your grant proposal.
This checklist incorporates the Data Management Plan requirements for H2020 grants (PDF available here: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf).
If you are working directly with FAIRDOM:
- Please identify the persons responsible for the DMP in your project (names).
- Please identify a bench scientist and a modeler who will take part in PALs meetings. PALs meetings discuss data management needs, requirements and solutions with non-PIs.
- State the purpose of the data collection/generation.
- Explain how it relates to the objectives of the project.
- Consider what data will be collected or created as part of the study (RAW data).
- Consider what data will be produced by processing the RAW data (Secondary, processed data).
- Specify whether any existing data is being re-used.
- Specify the origin of the data.
- Specify the types and formats you plan to use for the data generated/collected (raw, processed, published).
- Consider what data will be published as the result of your study (Published data).
Volume and Life Cycle of the Data
If you are using FAIRDOM, we will look after the data that your project retains and potentially exchanges. We can also help with local storage for data held temporarily before processing.
For RAW data, please consider the following:
- How much RAW data do you think will be produced (estimates per month, per year, and for the full project duration)?
- Will all of the RAW data be kept for the duration of the study or will the RAW data be deleted once it is processed?
- For large-scale RAW data (e.g. images, sequences), have you planned the local storage capacity needed for processing?
- Do you require help to organise a suitable local management system for RAW data?
- Do you have policies that govern the management and usage of RAW data?
- How long will RAW data be kept? Will there be a long-term archive?
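
The volume questions above come down to simple arithmetic. A minimal sketch in Python, assuming a steady production rate; the function name and the figures in the example (files per month, average file size, project length) are hypothetical and should be replaced with your own estimates:

```python
def estimate_raw_volume_gb(files_per_month, avg_file_size_gb, project_months):
    """Estimate RAW data volume in GB per month, per year, and for the full project.

    Assumes data is produced at a constant rate and nothing is deleted.
    """
    per_month = files_per_month * avg_file_size_gb
    return {
        "per_month": per_month,
        "per_year": per_month * 12,
        "full_project": per_month * project_months,
    }


# Hypothetical example: 40 image stacks per month, 2.5 GB each, 36-month project.
estimate = estimate_raw_volume_gb(files_per_month=40, avg_file_size_gb=2.5, project_months=36)
for period, gigabytes in estimate.items():
    print(f"{period}: {gigabytes:.0f} GB")
```

If RAW data is deleted once it has been processed, the local storage actually needed may be much smaller than the full-project figure.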
For Secondary and Published data, please consider the following:
- What data processing is foreseen in the project?
- How much processed data will be produced and stored (estimates per month, per year, and for the full project)?
- How much of this data will be published (estimates per month, per year, and for the full project)?
- Does your institution, or the project funders, have policies governing the access and usage of processed data?
Additional considerations for personally sensitive data (e.g. medical data)
- When looking at the data flow through the project, define what data is:
  - aggregated (typically safe to share, if names cannot be recovered)
  - anonymized (name cannot be recovered from the data)
  - pseudonymized (name can be recovered by some)
  - non-anonymized (name linked to data)
- Determine which organisational boundaries have to be traversed by which data.
- Confirm with your *local* data protection officer and ethics commission that the data can be shared with your partners along the flow described, at the anonymisation levels described. Why local? Laws can change across surprising boundaries; in Germany, for example, universities and other public organisations are subject to a different data protection law than enterprises. Why seek advice? In some cases you may be required to be able to recover the link between names and data, e.g. to enable study participants to *leave* a study.
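
To illustrate the pseudonymized level and why the name-data relation may need to stay recoverable, here is a minimal sketch in Python. The function names, the `name` field and the `P-` prefix are all hypothetical, and whether an approach like this is acceptable in your setting is exactly the question to put to your local data protection officer:

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifying field with a random pseudonym.

    Returns (pseudonymised_records, key_table). The key table maps each
    pseudonym back to the original identifier; it should be stored separately
    and only be accessible to those allowed to re-identify participants.
    """
    key_table = {}       # pseudonym -> original identifier
    by_identifier = {}   # original identifier -> pseudonym
    pseudonymised = []
    for record in records:
        identifier = record[id_field]
        if identifier not in by_identifier:
            pseudonym = "P-" + secrets.token_hex(4)
            while pseudonym in key_table:  # regenerate on the rare collision
                pseudonym = "P-" + secrets.token_hex(4)
            by_identifier[identifier] = pseudonym
            key_table[pseudonym] = identifier
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["pseudonym"] = by_identifier[identifier]
        pseudonymised.append(cleaned)
    return pseudonymised, key_table


def withdraw(pseudonymised, key_table, identifier):
    """Drop all records of a participant who leaves the study."""
    to_remove = {p for p, name in key_table.items() if name == identifier}
    return [r for r in pseudonymised if r["pseudonym"] not in to_remove]


# Hypothetical records; only the holder of key_table can link them to names.
records = [
    {"name": "Participant A", "value": 1},
    {"name": "Participant B", "value": 2},
    {"name": "Participant A", "value": 3},
]
shared, key_table = pseudonymise(records)
remaining = withdraw(shared, key_table, "Participant A")
```

Without the key table, a withdrawal request like the one in the last line could not be honoured, which is one reason full anonymisation is not always the right choice.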
Making Data Findable (documentation and metadata management)
- What documentation and metadata will accompany the data to assist its discoverability (details on methodology, definitions, procedures, SOPs, vocabularies, units, dependencies, etc.)?
- What information is needed for the data to be read and interpreted in the future?
- What naming conventions will be used?
- How will you approach versioning your data?
- How will you capture / create this documentation and metadata?
- How do you ensure the completeness of the captured data?
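
Naming conventions, versioning and completeness checks are easier to keep up when they are automated. The following Python sketch shows one possible approach; the required field list, the file name pattern and all example values are assumptions made for illustration, not FAIRDOM or H2020 requirements:

```python
from datetime import date

# Hypothetical required fields; adapt to your own metadata standard or checklist.
REQUIRED_FIELDS = ("title", "creator", "method", "units", "version", "date_created")


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def build_filename(project, sample, created, version, extension):
    """Build a file name of the form <project>_<YYYYMMDD>_<sample>_v<NN>.<ext>."""
    return f"{project}_{created:%Y%m%d}_{sample}_v{version:02d}.{extension}"


record = {
    "title": "Growth curve, replicate 1",
    "creator": "",         # left empty on purpose
    "method": "SOP-012",   # hypothetical SOP identifier
    "units": "g/L",
    "version": 1,
}
print(missing_metadata(record))  # → ['creator', 'date_created']
print(build_filename("projx", "s001", date(2024, 3, 5), 1, "csv"))  # → projx_20240305_s001_v01.csv
```

Running a check like `missing_metadata` at upload time catches incomplete records while the person who produced the data can still fill in the gaps.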
Making Data Accessible
- Specify which data will be made openly available, taking into consideration the following:
  - What ethics and legal compliance issues do you have, if any? Do you need consent for data preservation and sharing? Do you have to protect certain data? Is any data sensitive?
  - Do you think you might have Intellectual Property Rights issues? Have you considered ownership of the data, licensing, and restrictions on use?
  - Do you think you will need to embargo any data?
Making Data Interoperable
- What standards (metadata vocabularies, formats, checklists) or methodologies will you use?
- How do you address data and model quality? What validation steps do you foresee?
- Will you use standardised vocabulary for all data types to allow inter-disciplinary interoperability?
- Where you cannot use a standardised vocabulary for all types of data, can you map to more commonly used ontologies?
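
Mapping local labels to ontology terms can start as a simple lookup table that also reports what is still unmapped. A minimal Python sketch, in which the function name, the labels and the `EX:` identifiers are placeholders rather than real ontology terms:

```python
# Hypothetical mapping from local labels to ontology term identifiers.
# The "EX:" identifiers are placeholders, not real ontology terms.
TERM_MAP = {
    "glc": "EX:0001",
    "glucose": "EX:0001",
    "temp": "EX:0002",
}


def map_terms(labels, term_map=TERM_MAP):
    """Split labels into those with a known ontology term and those without."""
    mapped, unmapped = {}, []
    for label in labels:
        key = label.strip().lower()  # match case-insensitively
        if key in term_map:
            mapped[label] = term_map[key]
        else:
            unmapped.append(label)
    return mapped, unmapped


mapped, unmapped = map_terms(["Glc", "temp", "strain_id"])
print(mapped)    # → {'Glc': 'EX:0001', 'temp': 'EX:0002'}
print(unmapped)  # → ['strain_id']
```

The unmapped list gives a concrete to-do list of terms that still need a home in a shared vocabulary.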
Making Data Re-usable
- How will you licence your data to permit the widest re-use possible?
- When will the data be made available for re-use? Does this include an embargo period? If yes, please explain why.
- Which data will be available for re-use during and after the project? For data that is not re-usable, please explain why.
- What are your data quality assurance processes?
- How long do you expect your data to remain re-usable?
- What are the cost estimates of making your data FAIR?
- What is the perceived long-term value of making your data FAIR?
- What provisions will you have in place for data recovery, secure storage, and transfer of sensitive data (if required)?
- Do you have any national/funder/sectorial/departmental procedures for data management?