Research Data Management

Documentation and Metadata Best Practices

Metadata best practices should be applied and followed throughout your research project to increase the accessibility and usability of your research data for you, your research team, and future users.

What is metadata?

Metadata is data about data. It describes and gives context to your data and research project, and it is key to research data access and re-use. Metadata makes it possible for others to understand your data at every level, from top-level descriptive metadata such as title, creator(s), and date created, down to variable-level metadata.

There are various types of metadata:

  • Citation metadata: basic information such as author, title, etc.
  • Domain-specific metadata: discipline- or format-specific metadata (e.g. life sciences, geospatial)
  • File-level metadata: descriptive information about file content, relationships, etc.
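As a rough illustration of how these three types can sit together in one record, here is a minimal Python sketch. Every field name and value in it is a hypothetical placeholder rather than part of any particular metadata standard; a real project would take its element names from the standard it has chosen.

```python
import json

# Hypothetical record: all keys and values are illustrative placeholders.
record = {
    # Citation metadata: basic information such as author and title
    "citation": {
        "title": "Example Survey Dataset",
        "creators": ["Researcher, A."],
        "date_created": "2022-04-01",
    },
    # Domain-specific metadata: fields that depend on the discipline or format
    "domain_specific": {"discipline": "geospatial"},
    # File-level metadata: descriptions of file content, down to the variables
    "files": [
        {
            "name": "Results.csv",
            "description": "Cleaned survey responses",
            "variables": [
                {"name": "age", "definition": "Age of respondent", "unit": "years"}
            ],
        }
    ],
}

print(json.dumps(record, indent=2))
```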

Metadata standards or schemas are made up of a set of authorized elements that describe your data. Many disciplines, and some data repositories, have their own metadata standards. Use the directories below to find the metadata standard that best describes your research data:

Organization and Naming

File Naming and Versioning

Keep file names short and descriptive, and agree on and follow consistent conventions with your team. Here are some general guidelines and examples:

  • Agree upon a file naming convention early with your team when planning data management
  • Use a short, unique, and descriptive identifier such as an acronym of your project name or your grant number. This will make your files easy to find.
    • Add a key term summarizing the content of the file to the file name, such as GrantProposal, Questionnaire, etc.
    • Don't repeat file name information from the folder above: 
      • DO: Survey >> Results OR Survey >> ConsentForms
      • DON'T: Survey >> SurveyResults OR Survey >> SurveyConsentForms
  • Dates: Always use the YYYYMMDD or YYYY-MM-DD format for dates. This format is easy to read and lets systems sort files in chronological order
  • Use _ (underscores), - (hyphens), and/or CamelCase to delimit words, and avoid special characters, as different computer systems handle them differently
  • Where appropriate you may also wish to include researcher/author initials or location information in the file name
  • Keep track of versions by updating either the date and time or a version number, such as v01 or v01-01 ... v01-03 ... v03-02, to track file versions within different stages of the project.
    • Use leading zeros so that a computer sorts the versions in the correct order
  • Try to keep file hierarchies shallow
    • Use no more than 4 levels of folders
    • Try to limit each folder to around 10 files


DO: SSHRC_Proposal_2022-04-01_v02.docx

DON'T: finaldraft1 or finalfinaldraft3
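The guidelines above can be sketched as a small Python helper that assembles a file name from its parts. The function name `build_file_name` and its parameters are hypothetical, and the rule it enforces (letters, digits, and hyphens only within each part) is one reasonable reading of "avoid special characters", not a fixed requirement.

```python
import re
from datetime import date


def build_file_name(project: str, term: str, created: date, version: int, ext: str) -> str:
    """Join project ID, key term, YYYY-MM-DD date, and zero-padded version with underscores."""
    for part in (project, term):
        # Assumption: allow only letters, digits, and hyphens inside each part.
        if not re.fullmatch(r"[A-Za-z0-9-]+", part):
            raise ValueError(f"avoid spaces and special characters: {part!r}")
    return f"{project}_{term}_{created.isoformat()}_v{version:02d}.{ext}"


name = build_file_name("SSHRC", "Proposal", date(2022, 4, 1), 2, "docx")
print(name)  # SSHRC_Proposal_2022-04-01_v02.docx

# Leading zeros keep a plain alphabetical sort in version order.
padded = sorted(["v10", "v02", "v01"])
unpadded = sorted(["v10", "v2", "v1"])
print(padded)    # ['v01', 'v02', 'v10']
print(unpadded)  # ['v1', 'v10', 'v2']
```

The second half shows why leading zeros matter: without them, v10 sorts ahead of v2.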


File Formats

Scholars Portal Dataverse can accept any file format, but to ensure the long-term accessibility and reusability of research data, widely used and supported non-proprietary file formats are preferred.

Data Type               Preferred Format
Audio                   Uncompressed and lossless WAV or AIFF, FLAC, MP3
Container File          .tar, .zip; container files are automatically unpacked by Dataverse
Image                   Uncompressed TIFF; acceptable formats: PNG, JPG
Text                    Unicode text (.txt), Comma/Tab Separated Values (.csv)
Video                   MPEG-4 (.mp4)
Array data              netCDF (.nc)
Statistical analysis    Spreadsheet (.csv, .tsv, .tab, .ods), SPSS (.por, .sav), STATA (.dta)
Geospatial              ESRI (.shp, .shx, .dbf, .prj, .sbx, .sbn), GeoTiff (.tif, .tfw)
Markup language         XML, HTML, .css, .xslt, .js, .es
Documentation           PDF/A

This is not an exhaustive list. Please contact the RDM Librarian for more information on preferred formats.
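If you want to check a folder of files against the list before depositing, a lookup like the following Python sketch may help. It covers only the rows where the list above spells out file extensions; the name `preferred_types` is hypothetical, and an empty result means only that the extension is not in this partial lookup, not that the format is unacceptable.

```python
from pathlib import Path

# Only rows of the list above that name explicit extensions; not exhaustive.
PREFERRED_EXTENSIONS = {
    "Container File": {".tar", ".zip"},
    "Text": {".txt", ".csv"},
    "Video": {".mp4"},
    "Array data": {".nc"},
    "Statistical analysis": {".csv", ".tsv", ".tab", ".ods", ".por", ".sav", ".dta"},
    "Geospatial": {".shp", ".shx", ".dbf", ".prj", ".sbx", ".sbn", ".tif", ".tfw"},
}


def preferred_types(file_name: str) -> list[str]:
    """Return the data types whose listed extensions include this file's extension."""
    ext = Path(file_name).suffix.lower()
    return [dtype for dtype, exts in PREFERRED_EXTENSIONS.items() if ext in exts]


csv_types = preferred_types("Results_2022-04-01_v01.csv")
xlsx_types = preferred_types("Results_2022-04-01_v01.xlsx")
print(csv_types)   # ['Text', 'Statistical analysis']
print(xlsx_types)  # []
```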


Additional Metadata

ReadMe Files

In addition to the set metadata fields, ReadMe files provide a way for you to further document your research data. ReadMe files are usually formatted as text files to prolong their lifespan and ensure accessibility. There is no single standard for ReadMe files, but they should include:

  • A data and file overview: for each file, its name, a short description of the data it contains, and when it was created
  • Licenses or restrictions placed on the data
  • Methodological information, including a description of the methods used for data collection/generation and processing
  • Data-specific information for each dataset or file (as appropriate), including:
    • Variable list, including full names and definitions of column headings for tabular data
    • Units of measurement
    • Definitions for codes or symbols used to record missing data
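One way to keep ReadMe files consistent across a project is to generate them from a small template. The Python sketch below assembles a plain-text ReadMe covering the items listed above. The function `render_readme`, the section headings, and all sample values are hypothetical; adapt them to your own project and repository.

```python
def render_readme(title, files, license_text, methods, variables, missing_codes):
    """Build plain-text ReadMe content from the elements listed above."""
    lines = [title, "", "FILE OVERVIEW"]
    for name, description, created in files:
        lines.append(f"- {name}: {description} (created {created})")
    lines += ["", "LICENSES AND RESTRICTIONS", license_text]
    lines += ["", "METHODS", methods]
    lines += ["", "VARIABLES"]
    for name, definition, unit in variables:
        lines.append(f"- {name}: {definition} [unit: {unit}]")
    lines += ["", "MISSING DATA CODES"]
    for code, meaning in missing_codes.items():
        lines.append(f"- {code}: {meaning}")
    return "\n".join(lines) + "\n"


# All values below are placeholders for illustration only.
readme_text = render_readme(
    title="Example Dataset README",
    files=[("Results.csv", "Cleaned survey responses", "2022-04-01")],
    license_text="State the license or access restrictions here.",
    methods="Describe data collection/generation and processing here.",
    variables=[("age", "Age of respondent", "years")],
    missing_codes={"-99": "Not answered"},
)
print(readme_text)
```

Saving the result as a UTF-8 text file (for example README.txt) keeps it readable without special software.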

Find more information on ReadMe files in the Guide to writing "readme" style metadata by the Research Data Management Service Group at Cornell University.


