Apply metadata best practices throughout your research project to increase the accessibility and usability of your research data for you, your research team, and future users.
Metadata is data about data. Metadata describes and gives context to your data and research project. Metadata is key for research data access and re-use. Metadata makes it possible for others to understand your data, from top-level descriptive metadata such as title, creator(s), and date created, down to variable-level metadata.
Metadata standards or schemas are made up of a set of authorized elements that describe your data. Many disciplines, and some data repositories, have their own metadata standards. Use the directories below to find the metadata standard that best describes your research data:
1. Make a data dictionary before or simultaneously with data production. If you work with several people on a project, or similar experiments or measurements are done regularly in your research group, it is a good idea to develop a data dictionary for the collected data, using a controlled vocabulary to fill it in. A data dictionary can describe general information about your overall study, containing metadata fields such as “Study Title”, “Study description”, “Experimental Factors”, “Study Design” etc. It can also describe individual observations and measurements, containing metadata fields as column names, such as “Date”, “Length”, “Datafile name” etc.
Start the data dictionary while you are developing the project or the data management plan.
Spreadsheet tools such as Excel and Google Sheets are a simple way to create a data dictionary.
Examples:
2. Add all information needed to understand and reproduce your experiments as metadata. Metadata fields could be dose, time, date, frequency, measurement unit, geographical coordinates, unexpected events, parameter settings, name and version of the software used, etc. Include references to the protocols used and to the raw or processed data files. The context of the data generation should be richly annotated to maximize its reusability: mention any particularities or limitations of the data that other users should be aware of. Ensure that all variable names are explained or self-explanatory (i.e., defined in the research field’s controlled vocabulary). Clearly specify the version of the archived and/or reused data.
3. Use controlled vocabulary and data validation.
Use controlled vocabulary and data validation as much as you can to avoid mistakes such as typos, misspellings, and inconsistent synonyms.
4. Use standard metadata and ontologies.
Use standard metadata schemas and ontologies as much as possible, so that your data can be reused and different experiments can be easily compared.
5. Do not include calculations or graphs in the metadata sheet.
A data dictionary should only contain metadata and/or raw data. For calculations and graphs, make a copy of the spreadsheet and work in the copy.
6. Do not use colour coding as (meta)data, and do not combine multiple variables in one cell.
More information on creating a data dictionary or metadata checklist: https://rdm.elixir-belgium.org/metadata_in_practice
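The tips above can be illustrated with a minimal sketch of a data dictionary kept as plain data and used to validate a data sheet against a controlled vocabulary. The column names "Date", "Length", and "Datafile name" come from the examples in tip 1. Everything else is an assumption made for illustration: the "Species" column, its allowed values, the sample rows, and the helper name `validate_rows`.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column of the data sheet.
# "allowed" holds the controlled vocabulary for a column, or None if any value is accepted.
DATA_DICTIONARY = {
    "Date": {"description": "Date of measurement (YYYY-MM-DD)", "unit": None, "allowed": None},
    "Species": {
        "description": "Species of the measured specimen (illustrative column)",
        "unit": None,
        "allowed": {"Mus musculus", "Rattus norvegicus"},
    },
    "Length": {"description": "Body length", "unit": "mm", "allowed": None},
    "Datafile name": {"description": "Name of the raw data file", "unit": None, "allowed": None},
}


def validate_rows(rows, dictionary):
    """Return a list of (row_number, column, problem) tuples for a data sheet."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            if column not in dictionary:
                problems.append((row_number, column, "column not in data dictionary"))
                continue
            allowed = dictionary[column]["allowed"]
            if allowed is not None and value not in allowed:
                problems.append(
                    (row_number, column, f"value {value!r} not in controlled vocabulary")
                )
    return problems


# Made-up sample sheet; the second row contains a deliberate typo in "Species".
SAMPLE_SHEET = """Date,Species,Length,Datafile name
2022-04-01,Mus musculus,92.5,run01.csv
2022-04-02,Mus muscullus,90.1,run02.csv
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_SHEET)))
problems = validate_rows(rows, DATA_DICTIONARY)
for problem in problems:
    print(problem)
```

Each column holds one variable and the unit lives in the dictionary rather than in the cell, which keeps the sheet machine-readable. In a spreadsheet, the same check is usually done with the built-in data validation (drop-down list) feature.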
Keep file names short and descriptive, and agree on consistent naming conventions with your team and follow them. Here are some general guidelines and examples:
DO: SSHRC_Proposal_2022-04-01_v02.docx
DON'T: finaldraft1 or finalfinaldraft3
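A naming convention is easier to follow when it can be checked automatically. The sketch below assumes a convention inferred from the DO example above: descriptive parts joined by underscores, then a date as YYYY-MM-DD, then a two-digit version, then a file extension. The pattern and the function name `follows_convention` are illustrative assumptions, not a prescribed standard; adapt them to whatever your team agrees on.

```python
import re

# Assumed convention: Part_Part_YYYY-MM-DD_vNN.ext
FILENAME_PATTERN = re.compile(
    r"[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*"  # one or more descriptive parts, underscore-separated
    r"_\d{4}-\d{2}-\d{2}"              # date as YYYY-MM-DD (format only, not calendar validity)
    r"_v\d{2}"                         # two-digit version, e.g. v02
    r"\.[A-Za-z0-9]+"                  # file extension
)


def follows_convention(filename: str) -> bool:
    """Return True if the whole file name matches the assumed convention."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


for name in ["SSHRC_Proposal_2022-04-01_v02.docx", "finaldraft1", "finalfinaldraft3"]:
    print(name, follows_convention(name))
```

The check covers format only: a name such as `Notes_2022-13-40_v01.txt` would pass even though the date is impossible, so a stricter version would also parse the date.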
In research data management practice, documentation refers to one or more documents included with your data that describe the details of the data and how it was generated. Examples of data documentation include:
A readme file provides information about a dataset and is intended to help ensure that the data can be correctly interpreted, by yourself at a later date or by others when sharing or publishing data. Readme files are usually formatted as plain text files to prolong their lifespan and ensure accessibility. There are no standards for readme files, but they should include:
Find more information on ReadMe files in the Guide to writing "readme" style metadata by the Research Data Management Service Group at Cornell University.