
Research Data Management

Documentation Best Practices

Documentation and Metadata Best Practices

Apply metadata best practices throughout your research project to make your research data more accessible and usable for you, your research team, and future users.

What is metadata?

Metadata is data about data. It describes and gives context to your data and research project, and it is key to research data access and re-use. Metadata makes it possible for others to understand your data, ranging from top-level descriptive metadata such as title, creator(s), and date created, to variable-level metadata such as variable names, definitions, and units.

Metadata standards, or schemas, are made up of a set of authorized elements that describe your data. Many disciplines, as well as some data repositories, have their own metadata standards. Use the directories below to find the metadata standard that best describes your research data:

Metadata Tips

1. Create a data dictionary before or during data production. If you work with several people on a project, or if your research group regularly performs similar experiments or measurements, it is a good idea to develop a data dictionary for the collected data and to fill it in using a controlled vocabulary. A data dictionary can describe general information about your overall study, with metadata fields such as “Study Title”, “Study description”, “Experimental Factors”, and “Study Design”. It can also describe individual observations and measurements, with metadata fields used as column names, such as “Date”, “Length”, and “Datafile name”.

Start the data dictionary while you are developing the project or the data management plan. 

Excel and Google Sheets offer a simple way to create a data dictionary.
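A data dictionary kept in a spreadsheet can also be produced by a script, which helps when the same dictionary is reused across experiments. The sketch below is a minimal illustration using Python's standard csv module; the column names ("variable", "description", "unit", "type", "allowed_values") and the example variables are hypothetical, not a required standard. The resulting CSV file opens in Excel or Google Sheets.

```python
import csv
import io

# Hypothetical data dictionary: one row per variable (column) in the dataset.
# The field names are illustrative; adapt them to your project or discipline.
DATA_DICTIONARY = [
    {"variable": "date", "description": "Date of measurement (YYYY-MM-DD)",
     "unit": "", "type": "date", "allowed_values": ""},
    {"variable": "length", "description": "Leaf length",
     "unit": "cm", "type": "float", "allowed_values": ""},
    {"variable": "treatment", "description": "Experimental treatment group",
     "unit": "", "type": "category", "allowed_values": "control;drought"},
]

def write_data_dictionary(rows, fileobj):
    """Write the dictionary as CSV: a header row, then one row per variable."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Write to an in-memory buffer here; for a real file use
# open("data_dictionary.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
```

Keeping the dictionary as plain CSV, rather than only in a spreadsheet's native format, also makes it easy to archive alongside the data.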


2. Add all information needed to understand and reproduce your experiments as metadata. Metadata fields could include dose, time, date, frequency, measurement unit, geographical coordinates, unexpected events, parameter settings, and the name and version of the software used. Include references to the protocols used and to the raw or processed data files. Annotate the context of data generation richly to maximize reusability: mention any particularities or limitations of the data that other users should be aware of. Ensure that all variable names are explained or self-explanatory (i.e., defined in the research field’s controlled vocabulary). Clearly specify the version of the archived and/or reused data.
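One way to capture this context consistently is to save it as a small machine-readable record next to each run's data files. The sketch below assumes Python and JSON; the function name experiment_metadata, its field names, and the example values (dose, file names, the note) are all hypothetical. It records the software version automatically with platform.python_version().

```python
import json
import platform
from datetime import date

def experiment_metadata(parameters, protocol, data_files, notes=""):
    """Bundle the context needed to understand and reproduce one run.

    All field names are illustrative; align them with your data dictionary.
    """
    return {
        "date": date.today().isoformat(),    # YYYY-MM-DD
        "parameters": parameters,            # e.g. dose, time, frequency, settings
        "protocol": protocol,                # reference to the protocol used
        "data_files": data_files,            # raw or processed files from this run
        "software": {                        # name and version of software used
            "name": "Python",
            "version": platform.python_version(),
        },
        "notes": notes,                      # unexpected events, limitations
    }

# Hypothetical example values
meta = experiment_metadata(
    parameters={"dose_mg": 5, "duration_min": 30, "frequency_hz": 50},
    protocol="PRJ_Protocol_2022-04-01_v02.pdf",
    data_files=["PRJ_RawLength_2022-04-01_v01.csv"],
    notes="Power interruption at 14:05; run restarted",
)
print(json.dumps(meta, indent=2))
```

To keep the record with the data, write it with json.dump(meta, f, indent=2) to a file named after the run.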

3. Use controlled vocabulary and data validation.
Use controlled vocabulary and data validation as much as you can to avoid mistakes such as typos, misspellings, and inconsistent synonyms.
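In a spreadsheet, data-validation drop-down lists serve this purpose. The same check can be run on an exported column with a few lines of code. This is a minimal sketch in Python; the "treatment" column and its allowed terms are hypothetical.

```python
# Hypothetical controlled vocabulary for a "treatment" column
ALLOWED_TREATMENTS = {"control", "drought", "flood"}

def find_invalid(values, allowed):
    """Return (row_index, value) pairs whose value is not an allowed term."""
    return [(i, v) for i, v in enumerate(values) if v not in allowed]

column = ["control", "drougth", "flood", "Control"]
for row, value in find_invalid(column, ALLOWED_TREATMENTS):
    print(f"row {row}: {value!r} is not in the controlled vocabulary")
```

The check is deliberately case-sensitive: "Control" and "control" would otherwise end up as two different categories in an analysis.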

4. Use standard metadata schemas and ontologies.
Use standard metadata schemas and ontologies as much as possible, so that your data can be reused and different experiments can be easily compared.
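As an illustration, the sketch below checks a descriptive record against the 15 element names of the general-purpose Dublin Core Metadata Element Set. Dublin Core is chosen here only as a familiar example; a disciplinary standard from the directories may suit your data better. The record values are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set (DCMES)
DUBLIN_CORE = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

def nonstandard_fields(record):
    """Return the record's keys that are not Dublin Core element names."""
    return sorted(key for key in record if key not in DUBLIN_CORE)

# Hypothetical descriptive record
record = {
    "title": "Leaf length under drought stress",
    "creator": "Doe, J.",
    "date": "2022-04-01",
    "author_email": "j.doe@example.org",   # not a Dublin Core element
}
print(nonstandard_fields(record))
```

Fields flagged this way can be mapped to a standard element (contact details, for example, might belong elsewhere in a repository's schema) so that records from different experiments stay comparable.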

5. Do not include calculations or graphs in the metadata sheet.
A data dictionary should contain only metadata and/or raw data. For calculations and graphs, work in a copy of the spreadsheet.

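6. Do not use colour coding as (meta)data, and do not combine multiple variables in one cell. Colour is not machine-readable and is lost when a spreadsheet is exported to a format such as CSV; record that information in its own column instead.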
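A common case of combined variables is a value and its unit typed into one cell, such as "12.5 cm". The minimal Python sketch below, with a hypothetical function name and example cells, splits such cells into a numeric value and a unit so that each can be stored in its own column.

```python
import re

# Matches a number (optionally negative or decimal) followed by a unit of letters
CELL_PATTERN = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")

def split_value_unit(cell):
    """Split a combined 'value unit' cell into (value, unit)."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"cannot parse {cell!r}")
    return float(match.group(1)), match.group(2)

# Hypothetical messy cells
cells = ["12.5 cm", "3mm", "0.8 m"]
split = [split_value_unit(c) for c in cells]
print(split)
```

Once the units are in a separate column, it is straightforward to convert everything to one unit and document that unit in the data dictionary.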

More information on creating a data dictionary or metadata checklist: https://rdm.elixir-belgium.org/metadata_in_practice  

File Naming & Versioning


Keep file names short and descriptive, and agree on and follow consistent conventions with your team. Here are some general guidelines and examples:

  • Agree upon a file naming convention early with your team when planning data management
  • Use a short, unique, and descriptive identifier, such as an acronym of your project name or your grant number. This will make your files easy to find.
    • Add a key term summarizing the content of the file to the file name, such as GrantProposal, Questionnaire, etc.
    • Don't repeat file name information from the folder above: 
      • DO: Survey >> Results OR Survey >> ConsentForms
      • DON'T: Survey >> SurveyResults OR Survey >> SurveyConsentForms
  • Dates: Always use the YYYYMMDD or YYYY-MM-DD format for dates. This format is easy to read and lets computer systems sort files in chronological order
  • Use _ (underscores), - (hyphens), and/or CamelCase to separate the elements of a file name, and avoid special characters, because different computer systems handle them differently
  • Where appropriate, you may also wish to include researcher/author initials or location information in the file name
  • Keep track of versions either by updating the date and time or by using a numbering system such as v01, or v01-01 ... v01-03 ... v03-02 to track file versions within different stages of the project.
    • Use leading zeros (v01 rather than v1) so that a computer sorts the versions in the correct order
  • Try to keep file hierarchies shallow
    • no more than 4 levels deep
    • try to limit the number of files to around 10 files per folder
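A team that generates file names in scripts can build them from the agreed elements instead of typing them by hand. The sketch below assumes Python and a hypothetical convention of Project_Content_YYYY-MM-DD_vNN.ext; with the values shown it reproduces this guide's DO example. It also shows why leading zeros matter: file names are sorted character by character, so "v10" sorts before "v2" unless the numbers are padded.

```python
from datetime import date

def build_file_name(project, content, day, version, ext):
    """Compose Project_Content_YYYY-MM-DD_vNN.ext (hypothetical convention)."""
    # isoformat() gives YYYY-MM-DD; :02d pads the version with a leading zero
    return f"{project}_{content}_{day.isoformat()}_v{version:02d}.{ext}"

name = build_file_name("SSHRC", "Proposal", date(2022, 4, 1), 2, "docx")
print(name)

# Character-by-character sorting, as most file browsers do by default
without_zeros = sorted(["v1", "v10", "v2"])    # v10 lands before v2
with_zeros = sorted(["v01", "v10", "v02"])     # correct order
iso_dates = sorted(["2022-11-05", "2022-04-01", "2021-12-31"])  # chronological
print(without_zeros, with_zeros, iso_dates)
```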

Examples

DO: SSHRC_Proposal_2022-04-01_v02.docx

DON'T: finaldraft1 or finalfinaldraft3
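Once a convention is agreed, a regular expression can flag files that do not follow it. This is a minimal Python sketch for the same hypothetical Project_Content_YYYY-MM-DD_vNN.ext pattern; adjust the pattern to whatever your team actually agrees on.

```python
import re

# Hypothetical team convention: Project_Content_YYYY-MM-DD_vNN.ext
NAME_PATTERN = re.compile(
    r"^[A-Za-z0-9]+"          # project acronym or grant number
    r"_[A-Za-z0-9]+"          # content key term, CamelCase allowed (e.g. GrantProposal)
    r"_\d{4}-\d{2}-\d{2}"     # date as YYYY-MM-DD
    r"_v\d{2}"                # two-digit version with leading zero, e.g. v02
    r"\.[A-Za-z0-9]+$"        # file extension
)

def follows_convention(file_name):
    """Return True if the file name matches the agreed pattern."""
    return NAME_PATTERN.match(file_name) is not None

for candidate in ["SSHRC_Proposal_2022-04-01_v02.docx", "finaldraft1",
                  "SSHRC_Proposal_04-01-2022_v2.docx"]:
    print(candidate, follows_convention(candidate))
```

The pattern checks the shape of the date, not whether it is a real calendar date. To audit a folder, apply follows_convention to each name returned by pathlib.Path("your_folder").iterdir().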


Data Documentation

What is documentation? 

In research data management practice, documentation refers to one or more documents included with your data that describe the details of the data and how they were generated. Examples of data documentation include: 

  • Data dictionary or codebook;
  • Logbooks, lab notebooks or lab protocols;
  • Methodological information (how you collected your data);
  • Analytical and procedural information;
  • Information about the structure of the dataset;
  • How raw data have been processed into other forms of data;
  • Parameters and instrument settings for image acquisition, measurements, models or other techniques; 
  • Labels and definitions of variables; 
  • User guides;
  • Explanatory comments in code or model scripts; 
  • File properties added to a data file;
  • A README file for the dataset. 

ReadMe Files


A readme file provides information about a dataset and is intended to help ensure that the data can be correctly interpreted, by yourself at a later date or by others when sharing or publishing data. ReadMe files are usually plain text files, which prolongs their lifespan and ensures accessibility. There is no single standard for readme files, but a readme should include:

  • Data and file overview: for each file, its name, a short description of the data it contains, and the date it was created
  • Licenses or restrictions placed on the data
  • Methodological information, including a description of the methods used for data collection/generation and processing
  • Data-specific information for each dataset or file (as appropriate), including:
    • Variable list, including full names and definitions of column headings for tabular data
    • Units of measurement
    • Definitions for codes or symbols used to record missing data
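A plain-text readme skeleton with these sections can be generated from information you already hold, then edited by hand. The sketch below assumes Python; the function name, section headings, and example values (file name, license, method, missing-data code) are hypothetical, and it is not the Cornell template.

```python
def readme_skeleton(title, files, license_text, methods, variables):
    """Return plain-text README content with the sections listed above.

    files: list of (file_name, description, created) tuples
    variables: list of (column, full_name, unit, missing_code) tuples
    """
    lines = [title, "=" * len(title), "", "DATA AND FILE OVERVIEW"]
    for name, description, created in files:
        lines.append(f"- {name}: {description} (created {created})")
    lines += ["", "LICENSE / RESTRICTIONS", license_text,
              "", "METHODOLOGICAL INFORMATION", methods,
              "", "DATA-SPECIFIC INFORMATION"]
    for column, full_name, unit, missing in variables:
        lines.append(f"- {column}: {full_name}; unit: {unit}; missing: {missing}")
    return "\n".join(lines) + "\n"

# Hypothetical example values
text = readme_skeleton(
    "Leaf growth dataset",
    files=[("PRJ_Length_2022-04-01_v01.csv", "Leaf length measurements", "2022-04-01")],
    license_text="CC BY 4.0",
    methods="Leaves measured weekly with digital calipers.",
    variables=[("length", "Leaf length", "cm", "NA")],
)
print(text)
```

Save the result with open("README.txt", "w", encoding="utf-8") so that it stays a plain text file.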

Find more information on ReadMe files in the Guide to writing "readme" style metadata by the Research Data Management Service Group at Cornell University.
