
Research Data Management

Data Anonymization

What is data anonymization?

Anonymization involves permanently removing personal identifiers so that data cannot be attributed to an identifiable individual.

Pseudonymization, or de-identification, involves replacing personal identifiers with alternate identifiers so that data cannot readily be attributed to an identifiable individual, while the ability to re-identify the data is maintained.
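
The distinction can be illustrated with a minimal Python sketch. It assumes a small list of records in which a hypothetical `name` field is the only direct identifier; the function names and the `P-` code format are illustrative assumptions, not features of any particular anonymization tool. Real datasets usually also contain indirect identifiers, so dropping one field is rarely enough for anonymization on its own.

```python
import secrets


def pseudonymize(records, id_field="name"):
    """Replace a direct identifier with a random linking code.

    Returns the coded records plus a master list (code -> identity).
    Whoever holds the master list can still re-identify the data.
    """
    master_list = {}   # linking code -> original identifier
    code_for = {}      # original identifier -> linking code
    coded_records = []
    for record in records:
        identity = record[id_field]
        if identity not in code_for:
            code = "P-" + secrets.token_hex(4)
            while code in master_list:  # regenerate on the rare collision
                code = "P-" + secrets.token_hex(4)
            code_for[identity] = code
            master_list[code] = identity
        # Copy every field except the direct identifier, then add the code.
        coded = {k: v for k, v in record.items() if k != id_field}
        coded["participant_code"] = code_for[identity]
        coded_records.append(coded)
    return coded_records, master_list


def anonymize(records, id_field="name"):
    """Drop the direct identifier and keep no key, so the link is not retained."""
    return [{k: v for k, v in r.items() if k != id_field} for r in records]
```

In the pseudonymized case, the master list would be stored separately from the coded records; in the anonymized case, no such list exists.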


Data Anonymization Software

Assessing Risk

Risk levels: Low Risk, Medium Risk, High Risk, Extreme Risk
Definition
  Low Risk:
    Publicly available data where there is no reasonable expectation of privacy, regardless of sensitivity or identifiability.
    Data collected with no information that could reasonably identify individuals or groups.
    Data contains no confidential, private, or sensitive information.
    Data subjects are not vulnerable in the context of the research and would not be harmed if a breach were to occur.
  Medium Risk:
    All identifiers collected have been stripped so that data has no information that could reasonably identify individuals or groups.
    Data may contain information originally collected as confidential, private or sensitive.
    Data subjects are not vulnerable in the context of the research and would not be harmed if a breach were to occur.
  High Risk:
    Identifiers remain and/or (re)-identification is possible or probable.
    Data contains confidential, private or sensitive information.
    Data subjects may be vulnerable in the context of the research and may be harmed if a breach were to occur.
  Extreme Risk:
    Data acquired through an agreement (formal or informal) with a custodian, barring further use or retention.
    Identifiers remain and/or (re)-identification is possible or probable.
    Data contains confidential, private or sensitive information.
    Data subjects are vulnerable in the context of the research and would be harmed if a breach were to occur.
Informed Consent
  Low Risk:
    Notification that data will be made available for future use.
  Medium Risk:
    Notification that data will be made available for future use.
    Option to opt out of deposit should be considered.
  High Risk:
    Notification that data may be made available for future use.
    Request for permission to share and/or deposit data clearly included in consent form or process.
    If possible, provide options regarding areas of future research.
  Extreme Risk:
    Confidentiality will be maintained for as long as the data exist.
    Data will not be shared beyond the research team.
Data Collection
  Low Risk:
    Publicly available data may be found online or in public archives, or be collected through naturalistic observation.
    Researchers do not know the identities of research participants/data subjects.
    Methods should not involve direct interaction with research participants. These typically involve surveys, questionnaires and observational research.
    No direct or indirect identifiers are collected.
  Medium Risk:
    Researchers may know the identities of research participants/data subjects and may have promised confidentiality through informed consent.
    Methods for data collection are wide-ranging and may involve direct interaction with research participants.
    Direct and/or indirect identifiers may be collected.
    The majority of human research will fall into this category.
  High Risk:
    Researchers may know the identities of research participants/data subjects and may have promised confidentiality through informed consent.
    Methods for data collection are wide-ranging and may involve direct interaction with research participants.
    Direct identifiers may or may not be collected, but indirect identifiers collected may be sufficient to render participants identifiable.
  Extreme Risk:
    Researchers may know the identities of research participants/data subjects and will have promised confidentiality through informed consent.
    Methods for data collection are wide-ranging and may involve direct interaction with research participants.
    Direct identifiers may or may not be collected, but indirect identifiers collected may be sufficient to render participants identifiable.
Analysis / Management
  Low Risk:
    No restrictions in the analysis of data for publicly available data.
    Data analysis should adhere to the REB-approved protocol and informed consent document/script.
  Medium Risk:
    Direct identifiers should be replaced as soon as possible with a linking code (e.g., pseudonym, alphanumeric code) and separated physically and/or electronically from the master list. Consent forms should be stored separately from research data.
    Only members of the research team should have access to identifiable data.
  High Risk:
    Direct identifiers should be replaced as soon as possible with a linking code, and separated physically and/or electronically from the master list. Consent forms or notes with identifiers should be stored separately from research data.
    Indirect identifiers should be coded, if possible.
    Data should not be accessed/analyzed in a public space where others could see data on a device or by other means.
  Extreme Risk:
    Direct identifiers shall be replaced as soon as possible with a linking code, and separated physically and/or electronically from the master list. Consent forms or notes with identifiers shall be stored separately from research data.
    Indirect identifiers shall be coded, if possible.
    Data shall only be accessed by members of the research team, as described in the approved protocol, and access/analysis shall only occur in a secure environment.
Active Storage and Security
  Low Risk:
    All storage devices, file sharing, and cloud services are allowed, including both public and institutional cloud services.
    Data should be backed up in a way that is consistent with the risk level associated with these data.
  Medium Risk:
    Identifiable data should be stored on password-protected devices, in appropriate secure locations. If data need to be accessible through the internet, they should be encrypted.
    Public cloud services should not be used, unless no other options exist. If they are used, files and access should be password-protected and encrypted.
    Private cloud services, as supported by the research institution and/or assessed as being secure, may be used.
    Data should be backed up in a way that is consistent with the risk level associated with these data.
  High Risk:
    All data should be stored on password-protected encrypted devices, in appropriate secure locations. If data need to be accessible through the internet, they should be encrypted.
    Public cloud services are strictly prohibited.
    Private cloud services, as supported by the research institution and/or assessed as being secure, may be used if approved by the REB.
    Data should be backed up in a way that is consistent with the risk level associated with these data.
  Extreme Risk:
    All data shall be stored on a centralized, stand-alone computer/site that is both password protected and encrypted, in appropriate secure locations.
    Data should be backed up in a way that is consistent with the risk level associated with these data.
Sharing
  Low Risk:
    Can be shared via email and all cloud services including public cloud services.
  Medium Risk:
    Encrypted and password-protected files can be shared via email and institution-approved cloud services or collaboration sites.
  High Risk:
    Restricted data shall only be shared with other members of the research team, as specified in the approved protocol. Files shall be encrypted and password protected.
  Extreme Risk:
    Data restricted to a centralized, stand-alone computer/site that is password protected and encrypted.
    Files should not be copied or shared.
    Access to data shall be restricted to authorized individuals explicitly identified in the REB protocol and should involve the smallest number of individuals possible.
Deposit and Access
  Low Risk:
    Data should be deposited with unrestricted access within a reasonable timeframe, taking into account publication of original papers.
    Secondary data use does not require REB approval.
  Medium Risk:
    Data from participants/data subjects who opt out should be separated from data to be deposited.
    De-identified data should be deposited with unrestricted access within a responsible timeframe, taking into account publication of original papers, the need to replicate research, and the need to ensure an appropriate shelf-life for reuse of the data.
    Secondary use of de-identified data currently requires REB approval.
  High Risk:
    Data from participants/data subjects who opt out should be separated from data to be deposited.
    De-identified data should be deposited with restricted access to be evaluated by the data custodian. Data may be separated into sets depending on potential uses that participants have agreed to through informed consent (e.g., use for this study only, only for studies in this subject area, or for any use).
    Secondary data use requires REB approval.
  Extreme Risk:
    Data should not be deposited anywhere, beyond the direct storage and access needs of the research team.
Retention and Destruction
  Low Risk:
    Data may be retained indefinitely for discovery, access, and archival purposes.
  Medium Risk:
    Data may be retained indefinitely for discovery, access, and archival purposes.
  High Risk:
    Data may be retained indefinitely for discovery, access, and archival purposes in accordance with the REB-approved protocol.
  Extreme Risk:
    Data must be destroyed at the earliest opportunity, in accordance with the REB-approved protocol.

Source: Sensitive Data Expert Group. (2020). Sensitive Data Toolkit for Researchers Part 2: Human Participant Research Data Risk Matrix. https://doi.org/10.5281/zenodo.4088954 Creative Commons Attribution Non Commercial 4.0 International
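
Two practices named in the matrix above, coding indirect identifiers and separating data from participants who opt out of deposit, can be sketched in Python as follows. This is a rough illustration only: the field names (`age`, `postal_code`, `deposit_opt_out`), the ten-year age bands, and the choice to keep only the first three characters of a postal code are all assumptions made for the example, and the appropriate level of generalization depends on the dataset and the approved protocol.

```python
def age_band(age, width=10):
    """Generalize an exact age into a band, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def code_indirect_identifiers(record):
    """Return a copy of the record with indirect identifiers coarsened.

    Assumes hypothetical 'age' and 'postal_code' fields.
    """
    coded = dict(record)
    coded["age"] = age_band(record["age"])
    # Keep only the first three characters of the postal code.
    coded["postal_code"] = record["postal_code"].replace(" ", "")[:3].upper()
    return coded


def split_for_deposit(records):
    """Separate records of participants who opted out of deposit.

    Assumes a hypothetical boolean 'deposit_opt_out' field; records
    without the field are treated as not having opted out.
    """
    to_deposit = [r for r in records if not r.get("deposit_opt_out", False)]
    withheld = [r for r in records if r.get("deposit_opt_out", False)]
    return to_deposit, withheld
```

For example, `code_indirect_identifiers({"age": 37, "postal_code": "r3b 2e9"})` returns a record with the age coded as `"30-39"` and the postal code as `"R3B"`.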

Informed Consent for Data Sharing


Ensure that data sharing and future reuse are included in participant consent and information letters and in ethics applications. Please contact the UWinnipeg Ethics Program Officer for more information.

Sample Language to Deposit and Share Data

Including RDM Language in Grant Proposals

Granting agencies are increasingly investing in projects that demonstrate data management planning in their proposals. The following is a generic example; the more specific detail you can provide, the better.

Sample Language:

"Our team is committed to best practices in Research Data Management and strictly adheres to the ethics and research policies of the University of Winnipeg regarding the management of our research data. To store the data we collect, we will use the secure storage options and security measures most appropriate for the sensitivity level of the data. After the project is concluded, the research outputs from this project will be shared via a general research data repository such as Scholars Portal Dataverse, or a domain-specific repository such as ________, which will further broaden the reach of the research outputs of this project."

Informed Consent Language

Participant consent and information materials should include future use of data. Participant consent is required in order for data to be shared and used beyond the scope of the immediate project. The following are examples of language for participants to consent to the reuse of their quantitative or qualitative data:

Quantitative: "By participating in this research, I hereby consent to my de-identified information being used for research purposes beyond this immediate project."

Qualitative: "De-identified transcripts and/or summaries of interviews will be deposited in the University's Research Data Repository. To gain access, researchers will have to obtain approval from the principal investigator and/or organization. Transcripts will be redacted or summarized so that researchers will gain a general idea of what was said but may not have access to the exact words."

Resources

Recommended Informed Consent Language for Data Sharing (ICPSR): This guide covers language to avoid, model language, known concerns with recommended alternatives, and conventional language used in the past.

Anonymization Webinars and Training

Webinars

Resources
