Research Data Management

Metadata & Documentation

Documenting Your Research

Documentation is an important part of data management. Your data is only useful to yourself and others if you have adequately described your dataset and documented your processes. This includes describing when, why and how the data was collected or generated, what the variables mean, how it was analyzed and how the final dataset was created.

Documentation is best started at the beginning of your research and maintained throughout the project to ensure accuracy and thoroughness.

ReadMe Files

A readme file provides information about a dataset and is intended to help ensure that the data can be correctly interpreted, whether by yourself at a later date or by others when you share or publish the data. Readme files are usually saved as plain text files to prolong their lifespan and ensure accessibility. There is no single standard for readme files, but they should include:

Data and file overview: for each file name, a short description of the data the file contains and the date the file was created
Licenses or restrictions placed on the data
Methodological information, including a description of the methods used for data collection/generation and processing
Data-specific information for each dataset or file (as appropriate), including:
- Variable list, including full names and definitions of column headings for tabular data
- Units of measurement
- Definitions for codes or symbols used to record missing data
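As a minimal sketch, the elements listed above can be assembled into a plain-text readme with a short Python script. The function names `build_readme` and `write_readme`, the section headings, and the sample values are hypothetical placeholders rather than a standard template. Adapt them to your project or to one of the templates listed below.

```python
from pathlib import Path

def build_readme(title, files, license_text, methods, variables):
    """Return readme text covering the recommended elements (hypothetical layout)."""
    lines = [title, "=" * len(title), ""]

    # Data and file overview: one entry per file with description and creation date
    lines.append("DATA AND FILE OVERVIEW")
    for name, (description, created) in files.items():
        lines.append(f"- {name}: {description} (created {created})")
    lines.append("")

    # Licenses or restrictions placed on the data
    lines += ["LICENSE / RESTRICTIONS", license_text, ""]

    # Methodological information
    lines += ["METHODS", methods, ""]

    # Data-specific information: definition, units, and missing-data code per variable
    lines.append("DATA-SPECIFIC INFORMATION")
    for var, (definition, units, missing_code) in variables.items():
        lines.append(f"- {var}: {definition}; units: {units}; missing data coded as {missing_code}")
    return "\n".join(lines) + "\n"

def write_readme(path, **sections):
    # Save as plain text with UTF-8 encoding so the file stays readable long term
    Path(path).write_text(build_readme(**sections), encoding="utf-8")

# Hypothetical example values
print(build_readme(
    title="Example Dataset",
    files={"survey_results.csv": ("Cleaned survey responses", "2022-04-01")},
    license_text="CC BY 4.0",
    methods="Online survey; responses cleaned in Python.",
    variables={"age": ("Age of respondent", "years", "-99")},
))
```

Calling `write_readme("README.txt", title=..., files=..., ...)` with the same arguments saves the text as a `.txt` file.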

Find more information on ReadMe files in the Guide to writing "readme" style metadata by the Research Data Management Service Group at Cornell University.

README File Generator
This README file generator is maintained by the Federated Research Data Repository (https://www.frdr-dfdr.ca/). This tool allows researchers and data depositors to identify and populate relevant information to describe and contextualize datasets. Exported as plain text (.txt), the README file is suitable to document data for FRDR or other repositories.
Readme Template
Created by Cornell University's Research Data Management Service Group
Guide to writing "readme" style metadata
Created by Cornell University's Research Data Management Service Group
Creating a README for your dataset
Created by Doug Brigham, UBC

Codebooks and Data Dictionaries

Codebooks and data dictionaries are two forms of structured documentation used to define variables. They are related in function but differ in form, focus, and approach.

Codebooks

A codebook is a document commonly included with datasets in the social and behavioral sciences to help users understand the contents and structure of those datasets. Codebooks typically begin with front matter such as the study title, the names of the principal investigators, and an introduction to the data. They may also include methodological information if it is not documented elsewhere. However, the main content of a codebook is detailed definitions and descriptions of the variables in the dataset.

Codebooks are commonly included with studies where lengthy questionnaires, surveys, or similar instruments are used and result in large numbers of variables, often named with opaque alphanumeric codes. For each coded variable, a codebook offers the question text, what the data values mean (e.g. 1 = good, 2 = fair, etc., also called value labels), and sometimes additional information such as summary statistics or notes and comments about that variable.
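To illustrate how value labels work, the following minimal Python sketch applies a codebook-style mapping to coded responses. The variable name `Q12`, its question text, the labels, and the missing-data code `-9` are hypothetical examples that do not come from any particular codebook.

```python
# Hypothetical codebook entry for one coded survey variable
CODEBOOK = {
    "Q12": {
        "question": "How would you rate your overall health?",
        "value_labels": {1: "good", 2: "fair", 3: "poor", -9: "missing"},
    }
}

def label_values(variable, codes, codebook=CODEBOOK):
    """Translate raw numeric codes into their documented value labels."""
    labels = codebook[variable]["value_labels"]
    # Flag codes the codebook does not define instead of silently dropping them
    return [labels.get(code, f"UNDOCUMENTED({code})") for code in codes]

print(CODEBOOK["Q12"]["question"])
print(label_values("Q12", [1, 2, -9, 7]))
# ['good', 'fair', 'missing', 'UNDOCUMENTED(7)']
```

Flagging undocumented codes is a design choice that exposes gaps in the codebook itself, which is exactly what a codebook is meant to prevent.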

Data Dictionaries

Data dictionaries are, in contrast, typically in tabular/spreadsheet form. A typical data dictionary might contain columns for variable name (exactly as it appears in the dataset), a more descriptive human-readable variable name, unit of measurement, allowed values, a definition of the variable, and additional explanation, comments, or notes for each variable. Data dictionaries are not exclusively intended for quantitative empirical data, but they are more suited for that purpose than codebooks, since they foreground the units and allowed/expected values of variables.
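A data dictionary with the columns described above can be kept as a CSV file alongside the data. This minimal Python sketch uses the standard `csv` module. The column headings, the variable entries, and the check that flags dataset columns missing from the dictionary are illustrative assumptions rather than a required format.

```python
import csv
import io

# Column headings mirroring the fields described above (names are illustrative)
FIELDS = ["variable_name", "label", "units", "allowed_values", "definition", "notes"]

# Hypothetical entries; variable_name must match the dataset's column headings exactly
ENTRIES = [
    {"variable_name": "temp_c", "label": "Water temperature", "units": "degrees Celsius",
     "allowed_values": "-2 to 40", "definition": "Temperature at 1 m depth", "notes": ""},
    {"variable_name": "site_id", "label": "Sampling site", "units": "",
     "allowed_values": "S01-S12", "definition": "Site identifier", "notes": "See site map"},
]

def write_dictionary(handle, entries):
    """Write the data dictionary as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)

def undocumented_columns(data_header, entries):
    """Return dataset columns that have no data dictionary entry."""
    documented = {e["variable_name"] for e in entries}
    return [col for col in data_header if col not in documented]

buffer = io.StringIO()
write_dictionary(buffer, ENTRIES)
print(buffer.getvalue().splitlines()[0])
# variable_name,label,units,allowed_values,definition,notes
print(undocumented_columns(["site_id", "temp_c", "ph"], ENTRIES))
# ['ph']
```

To save to disk, pass a handle from `open("data_dictionary.csv", "w", newline="", encoding="utf-8")`. The file name here is hypothetical.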

If either of these forms of documentation is suitable for your study and dataset(s), it is good practice to create and maintain it and to include it with your data when you share it. Codebooks and data dictionaries are crucial documentation when a research project has variables that are difficult to understand or need explanation.

How to Make a Data Dictionary
A resource from Open Science Framework on data dictionaries.
What is a Codebook?
A resource on codebooks from ICPSR.
Creating a Codebook in SPSS (Kent State)
An example of how to create a codebook using the popular statistics software SPSS.

File Naming & Versioning


Keep file names short and descriptive, and agree with your team on consistent conventions that everyone follows. Here are some general guidelines and examples:

Agree upon a file naming convention early with your team when planning data management
Use a short, unique, and descriptive identifier such as an acronym of your project name or your grant number. This will make your files easy to find.
- Add a key term summarizing the content of the file to the file name, such as GrantProposal, Questionnaire, etc.
- Don't repeat file name information from the folder above:
  - DO: Survey >> Results OR Survey >> ConsentForms
  - DON'T: Survey >> SurveyResults OR Survey >> SurveyConsentForms
Dates: Always use the YYYYMMDD or YYYY-MM-DD format for dates. This format is easy to read and lets computer systems sort files in chronological order.
Use _ (underscores), - (hyphens), and/or CamelCase to separate words, and avoid special characters, as different computer systems handle them differently.
Where appropriate you may also wish to include researcher/author initials or location information in the file name
Keep track of versions either by updating the date and time or by using a numbering system such as v01, or v01-01 ... v01-03 ... v03-02 to track file versions within different stages of the project.
- Use leading zeros (e.g. v01 rather than v1) so that a computer sorts the versions in chronological order
Try to keep file hierarchies shallow:
- no more than 4 levels deep
- try to limit each folder to around 10 files
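The reason for YYYY-MM-DD dates and leading zeros shows up when file names are sorted as plain text, as many scripts and command-line listings do. This minimal Python sketch uses hypothetical file names. Python's `sorted()` compares strings character by character, which matches plain text sorting but not every file browser, since some apply a "natural" number-aware sort.

```python
# Hypothetical version names with and without leading zeros
unpadded = ["Report_v1.docx", "Report_v10.docx", "Report_v2.docx"]
padded = ["Report_v01.docx", "Report_v10.docx", "Report_v02.docx"]

# Plain text sorting compares character by character, so "v10" lands before "v2"
print(sorted(unpadded))  # ['Report_v1.docx', 'Report_v10.docx', 'Report_v2.docx']
print(sorted(padded))    # ['Report_v01.docx', 'Report_v02.docx', 'Report_v10.docx']

# Dates: YYYY-MM-DD sorts chronologically; DD-MM-YYYY does not
iso_dates = ["2022-11-05", "2021-12-30", "2022-04-01"]
dmy_dates = ["05-11-2022", "30-12-2021", "01-04-2022"]
print(sorted(iso_dates))  # ['2021-12-30', '2022-04-01', '2022-11-05']
print(sorted(dmy_dates))  # ['01-04-2022', '05-11-2022', '30-12-2021']
```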

Examples

DO: SSHRC_Proposal_2022-04-01_v02.docx

DON'T: finaldraft1 or finalfinaldraft3
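A convention like the DO example can also be generated or checked with a short script. This minimal Python sketch assumes a hypothetical pattern of Identifier_KeyTerm_YYYY-MM-DD_vNN.extension. The function names and the regular expression are illustrative, so adjust them to whatever convention your team agrees on.

```python
import re
from datetime import date

# Hypothetical convention: Identifier_KeyTerm_YYYY-MM-DD_vNN.ext
PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<term>[A-Za-z0-9]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)

def make_name(project, term, day, version, ext):
    """Build a file name with an ISO date and a two-digit, zero-padded version."""
    return f"{project}_{term}_{day.isoformat()}_v{version:02d}.{ext}"

def check_name(name):
    """Return True if the name follows the convention and its date is a real date."""
    match = PATTERN.match(name)
    if not match:
        return False
    try:
        date.fromisoformat(match["date"])
    except ValueError:
        return False
    return True

print(make_name("SSHRC", "Proposal", date(2022, 4, 1), 2, "docx"))
# SSHRC_Proposal_2022-04-01_v02.docx
print(check_name("SSHRC_Proposal_2022-04-01_v02.docx"))  # True
print(check_name("finalfinaldraft3.docx"))               # False
```

The date check catches names that match the pattern but contain an impossible date, such as a month of 13.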

Resources

DataverseNO: Prepare your data - File naming and organization
Guidelines on File naming and organization from DataverseNO, a national, generic repository for open research data, owned and operated by UiT The Arctic University of Norway.
Organising (UK Data Service)
Guide for how best to keep track of your data files. Includes sample screenshot of a well organized file structure.
Organize (UBC Library)
RDM planning and organization
Recommended formats
File formats recommended by the UK Data Service
Preferred File Formats
File format recommendations from University of Washington Libraries

Sources

What is documentation?
Guidance on documentation from KU Leuven
General Documentation
Guidance on documentation from University of Virginia Library, created by Laura Hjerpe.
Guide to writing "readme" style metadata
Guidance on writing ReadMe files from Cornell Data Services.