In response to a growing need in the neuroscience community for concrete guidance concerning ethically sound and pragmatically feasible open data-sharing, the CONP has created an ‘Ethics Toolkit’, currently comprised of:

1. The CONP Consent Guide
2. The CONP Privacy and De-identification Guide

Together, these documents are meant to help researchers identify key elements in the design of their projects that are often required for the open sharing of neuroscience data, such as model consent language and approaches to de-identification.

This guidance is the product of extended discussions and careful drafting by the CONP Ethics and Governance Committee that considers both Canadian and international ethical frameworks and research practice. The best way to cite these resources is with their associated Zenodo DOI:

We encourage you to share feedback on these materials by commenting on their source GitHub repository or by contacting the CONP at info@conp.ca.

What follows is a live rendering of the latest versions of the toolkit documents, as maintained on GitHub and Zenodo.

The CONP Consent Toolkit (live from GitHub)

CONP Consent Guide (v1.0.4)

Unless otherwise noted, this work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

This consent guide is the product of extended discussions and careful drafting that considers both Canadian and international ethical frameworks and research practice. It provides general guidance to Canadian neuroscience researchers who wish to share de-identified data in an unrestricted manner with the scientific community and the public at large. The guide is a model for Canadian open-access neuroscience initiatives to adopt in developing their governance standards and it also describes the minimum permissions required for researchers to submit data to the Canadian Open Neuroscience Platform (CONP) Portal’s Community Server (see below).

CONP Portal users can presently choose among different methods of hosting their data, including third-party storage provided by the OSF or Zenodo, or storage native to the CONP’s technical infrastructure, known as the ‘Community Server’. Greater flexibility still in data hosting location is available through the combined use of the DataLad distributed data-management system and the GitHub open software repository to host the dataset metadata.

These distinctions in data storage location have stewardship implications:

For data that are accessible through the Portal but stored on non-CONP servers, it is sufficient for contributors to abide by their local legal and biomedical research ethics requirements and to respect the data stewardship regulations of the selected host repository. In this case, adherence to the CONP data stewardship guidance is recommended but not required.
However, for data natively hosted on its Community Server, for which the CONP therefore takes on the role of primary data steward, the CONP requires adherence to its data governance standards, which are described below.

In adapting this consent guide for your own research needs, we encourage you to mention to your research ethics board that your consent materials are modeled after those created by the CONP Ethics and Governance Committee, whose members include internationally recognized experts in research ethics, neuroethics, data governance, and law: Trudo Lemmens (Chair), Alexander Bernier (Manager), Ann Cavoukian, John Clarkson, Jason Karamchandani, Roland Nadler, Dylan Roskams-Edris, and Walter Stewart. Further feedback was graciously provided by members of the CONP community, in particular Patrick Bermudez, Marcel Farrés Franch, Jessica Royer, Robert Zatorre, and past Committee Members Bartha Maria Knoppers, Lindsay Green Noble, and Judy Illes.

This document provides general, ‘best-practice’ guidance for open data-sharing, which must be adhered to in the specific case of data-sharing through the CONP’s Community Server. The CONP’s Community Server accepts datasets in two situations:

Participants have prospectively consented to the sharing of their de-identified data via an open data-sharing repository.
Participants have not consented to the sharing of their data on the CONP Portal, but the consent given meets the core consent elements for retrospective consent.

Each of these situations is explained below.

Part 1: Core consent elements (prospective consent)

Researchers and/or clinicians should prospectively seek consent to the open sharing of de-identified datasets. Depending on the circumstances, consent should be sought from either the prospective participant directly (first-person consent) or from their legally authorized representative (substitute consent). First-person consent is appropriate for adults with legal capacity. Substitute consent is for situations where the prospective participant lacks legal capacity either due to age (i.e., minors) or due to an inability to understand the nature and consequences of research participation (i.e., limited cognitive ability). In these circumstances, the prospective participant should be involved in the consent process to the fullest extent of their ability to do so.

In addition to the required consent elements common to research with human participants, prospective consent forms should include the following core elements to enable broad, open sharing via the CONP Portal:

Table 1: Core consent elements for contributing datasets to the CONP Portal.

	To contribute datasets to the CONP Portal Community Server, participants should consent to:
Data generation	Generation of participant data for research purposes
De-identification (coding, anonymization, and synthetic data generation)	De-identification of their data, which may consist of a combination of the following processes: Coding – the removal of direct identifiers (e.g., name; health insurance number) and their replacement with a code whose key is kept in a secure location; Anonymization – the permanent removal of direct identifiers without the conservation of any keys; Data synthesis – the creation of a dataset with statistical properties similar to those of the original dataset but without a one-to-one match for every variable
CONP Portal Community Server	Sharing of de-identified data via the CONP Portal Community Server, an open-access platform that researchers the world over may access
Commercial use	Use of de-identified data for commercial purposes
Data withdrawal	Not possible to withdraw data that has already been shared
Re-identification	Low risk that the participant could be re-identified in the future

If any of the elements in Table 1 have not been included in the informed consent documents, datasets should not be uploaded to the CONP Portal Community Server (though it might still be possible to share them through the CONP Portal by one of its other data-storage methods). In such circumstances, we recommend you examine the ‘retrospective filter’ below. If doubts remain or you have questions about the suitability of these consent elements for your research, please consult with your local research ethics board.

Note: Researchers who want to repurpose the above guidance to establish the terms of data deposit for a different open science platform should replace the phrase “CONP Portal” in the above template with the name of the concerned open science platform.

We recognize that some research projects may choose to have a tiered consent model, whereby participants consent to only certain data types being publicly shared via the CONP Portal or other open data repository. In such circumstances, the essential consent elements are only required with respect to those data types. It is the researcher’s responsibility to ensure that data segmentation is appropriately managed so that only those data types for which consent to open data sharing has been given are shared.
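The data segmentation described above can be thought of as a simple filter over the dataset. The following is a minimal sketch only: the record layout, the data-type labels, and the function name `shareable_records` are hypothetical and are not part of any CONP tooling.

```python
from typing import Dict, List, Set


def shareable_records(
    records: List[Dict[str, str]],
    consented_types: Dict[str, Set[str]],
) -> List[Dict[str, str]]:
    """Keep only records whose data type the participant agreed to share openly.

    records: one dict per data item, with the hypothetical keys
        'participant' (a coded ID) and 'data_type' (e.g., 'mri').
    consented_types: coded participant ID -> data types covered by that
        participant's consent to open sharing.
    """
    kept = []
    for record in records:
        # Participants with no consent entry are treated as having consented to nothing.
        allowed = consented_types.get(record["participant"], set())
        if record["data_type"] in allowed:
            kept.append(record)
    return kept


# Hypothetical example: P001 consented to open sharing of MRI data only.
records = [
    {"participant": "P001", "data_type": "mri"},
    {"participant": "P001", "data_type": "questionnaire"},
    {"participant": "P002", "data_type": "mri"},
]
consent = {"P001": {"mri"}, "P002": set()}
open_subset = shareable_records(records, consent)
```

Defaulting to an empty set for participants without a consent entry means that a missing record leads to exclusion rather than accidental sharing.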

See Annex 1 below for suggested clauses to include in the informed consent documentation.

Part 2: Retrospective consent filter

Use the filter below to determine if the consent already obtained from participants will permit you to include the data on the CONP Portal Community Server.

Step 1: Answer the following questions.

Have your participants consented to:	Yes	No
Deposit of de-identified data in open-access databases?
Commercial use?

Step 2: If all your responses in step 1 were “yes”, your data can be submitted for inclusion on the CONP Portal Community Server. If any of the responses were “no”, answer the following questions.

	Yes	No
Have your participants consented to be re-contacted?
Is it feasible (i.e., not onerous to the degree of jeopardizing your research) to re-contact and consent your participants for inclusion on the CONP Portal Community Server?

Step 3: If both answers in step 2 were “yes”, re-contact your participants and obtain their consent using the example consent clauses in Annex 1. Otherwise, answer the following question.

(Note: Only those participants who were re-contacted and gave appropriate consent can be included in the dataset hosted on the CONP Portal Community Server. If you have re-contacted your participants and they have withheld consent, you are unlikely to obtain a waiver.)

	Yes	No
Are you able to apply to your research ethics board for a waiver, i.e., an authorization to alter the initial consent parameters?

Step 4: If the answer in step 3 was “yes”, apply to your research ethics board for a waiver. If the answer was “no”, your data cannot be included on the CONP Portal Community Server.
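For readers who find a decision procedure easier to follow as code, the four steps above can be sketched as a single function. This is an illustrative reading of the filter, not an official CONP tool: the names `Outcome` and `retrospective_filter` and the parameter names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    SUBMIT = "submit the data for inclusion on the Community Server"
    RECONTACT = "re-contact participants and obtain consent (see Annex 1)"
    APPLY_FOR_WAIVER = "apply to the research ethics board for a waiver"
    CANNOT_INCLUDE = "data cannot be included on the Community Server"


def retrospective_filter(
    open_access_deposit: bool,
    commercial_use: bool,
    recontact_consented: bool,
    recontact_feasible: bool,
    waiver_possible: bool,
) -> Outcome:
    """Walk through steps 1 to 4 of the retrospective consent filter."""
    # Steps 1-2: both core permissions were already obtained.
    if open_access_deposit and commercial_use:
        return Outcome.SUBMIT
    # Steps 2-3: participants agreed to re-contact and re-contact is feasible.
    if recontact_consented and recontact_feasible:
        return Outcome.RECONTACT
    # Steps 3-4: fall back on a waiver request, if one can be made.
    if waiver_possible:
        return Outcome.APPLY_FOR_WAIVER
    return Outcome.CANNOT_INCLUDE


# Hypothetical case: commercial use was never mentioned, but re-contact is possible.
decision = retrospective_filter(
    open_access_deposit=True,
    commercial_use=False,
    recontact_consented=True,
    recontact_feasible=True,
    waiver_possible=False,
)
```

The sketch does not model the note in step 3: participants who are re-contacted and withhold consent must be excluded, and a waiver is then unlikely to be granted.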

Annex 1: Example consent clauses for the essential consent elements.

The example consent clauses are marked with CC0 1.0 Universal
	Essential elements	Example of consent clause language:	Alternative consent clause [1]:
Data generation	The generation of participant data for research purposes	You are being asked to consent to [include relevant procedure, e.g., imaging] procedure(s) that will generate data about your [include relevant data types, e.g., cognitive performance] for research purposes. [If questionnaires are also included…] You will also be asked questions about [include description of the types of questions, e.g., personal biographical information, health status, etc.]	N/A
De-identification (coding, anonymization, and synthetic data generation)	Removal of direct and indirect identifiers, the restructuring of data, and the generation of synthetic data for open sharing	To protect your privacy and to facilitate the broadest possible, public sharing of the data that you and other participants contribute, your name, date of birth and other directly identifying information will not be shared. Some data, [e.g., name; date of birth; occupation; etc.], will be replaced by a code. Other data that we believe may allow others to determine who you are, such as your [include relevant feature, e.g., facial features] will be irreversibly changed through a family of processes called anonymization so that you cannot be identified. Other information, such as [include relevant data, e.g., age group; location], will be pooled together with those of other participants for high-level summaries.	Your privacy is very important to us, and we will take appropriate measures to protect it. We will not disclose any information about you like your name, your date of birth, your address, or your contact information to unauthorized persons. All personal identifying information will be replaced with a unique code. Before researchers have access to your data, we will de-identify it. This means we take out names, dates of birth [include others, if any] and other personal details. Your participation in this research project and any information obtained within this research project that can identify you will remain confidential, except as required by law [circumstances in which confidential information can be released by law should be explained to the participant].
CONP Portal Community Server	Sharing of de-identified data via the CONP Portal, an open-access platform that researchers the world over may access	After having gone through the described de-identification process, a dataset made of your de-identified data and those of other participants will be uploaded to [specify the online platform or give a list of potential options if known, e.g., Zenodo, LORIS, etc.]. The link to the dataset will then be included on the Canadian Open Neuroscience Platform (CONP) Portal to be openly shared with the scientific community. As an Open Science platform, CONP aims to make scientific data as easily accessible as possible to the scientific community while respecting your privacy. CONP is publicly available to anyone with internet access. Researchers from around the world will use the CONP Portal to access the de-identified data from this research study and other studies. This maximizes the potential for this project’s research data to be used for additional research that may lead to important scientific discoveries.	With your consent, we will share de-identified information with other researchers from around the world who would use it to improve patient care or advance scientific knowledge, for clinical and/or general research purposes. Your information will be shared with others through open-access databases, such as the Canadian Open Neuroscience Platform (CONP). CONP is publicly available to anyone with internet access. General information, such as age, race, or sex, or your de-identified [specify data type, e.g., MRI images] may be shared in these types of databases.
Commercial use	Use of de-identified data for commercial purposes	It is possible that future research conducted using datasets that include both your de-identified data and those of others will lead to the development of commercial products. These may include new therapeutics, diagnostic tests, or even software programs. In such cases, no part of the revenue generated from their development or sale will be shared.	Some of the research done with the information stored in the databases may one day lead to the development of software, tests, drugs, or other commercial products. If this happens, you will not receive any of the profits.
Data withdrawal	Not possible to withdraw data that has already been shared	The choice to participate is always yours. You may withdraw from the study at any time. If you withdraw, no additional data will be collected from you. We will continue to use any data already collected unless you tell us otherwise. Once data have been de-identified and shared via CONP, they cannot be deleted. Your privacy is important to us, and your data will continue to benefit from the privacy protections we use for all participants.	You are free to withdraw from the project at any stage. If you withdraw before testing and data is collected, we will not continue. If you withdraw after testing and data is collected, we will use any information already collected unless you tell us not to. However, if your data has already been shared it may not be possible to retrieve or remove all your data. You can withdraw your data at any time by contacting [name of relevant person] free of charge at [information]. Data sent to other researchers around the world cannot be withdrawn if already used or published.
Re-identification	Low risk that the participant could be re-identified in the future	Your privacy will be protected through advanced de-identification techniques. Despite this, there is always a small risk that your shared data may one day enable someone to identify you. For example, someone could use information in your clinical record to match you in a dataset. This is difficult to do because of the de-identification measures. Extensive sharing of personal information through social media or genealogical websites may also make it easier to re-identify you. With technological advances, however, it may become less difficult. Unable to foresee such advances, we use today’s best protections. In the remote chance that you are re-identified, you or your relatives may suffer a loss of privacy or potentially be subject to discrimination.	Some participants worry about being identified as someone taking part in the project. The chance of this happening is extremely small [consider including an estimation of such risk, such as “i.e., less than 1%”], and we will do everything we can to prevent this from happening. Much like fingerprints, it is possible to identify someone if certain pieces of information are put together. While we use very strict data security measures to protect your privacy, there is always a small risk that your data may lead to you being re-identified. As technology advances, there may be new ways of linking data back to you that we cannot foresee today. Like other medical information, this may one day affect your insurability or your employment.

Modified from the Global Alliance for Genomics and Health’s model consent clauses.

The CONP Privacy and De-identification Toolkit (live from GitHub)

CONP Privacy and De-identification Guide (v1.0.4) [1]

Unless otherwise noted, this work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

The fundamental goal of protecting participant privacy is to prevent the identification of an individual’s data by others. In keeping with relevant guidance from Canadian privacy authorities, data must present a low or very low likelihood of individual re-identification before open release, roughly analogous to a maximum threshold of 9% risk of re-identification. Open access datasets should never contain individually identifying information such as names, health card numbers, or social insurance numbers.
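This guide does not prescribe a method for estimating re-identification risk. One common approach, assumed here purely for illustration, is to group records by their indirect identifiers and take the reciprocal of the smallest group size as the worst-case risk. In the sketch below, the column names, the sample records, and the function `max_reidentification_risk` are hypothetical; only the 0.09 threshold comes from the text above.

```python
from collections import Counter
from typing import Dict, List, Sequence

RISK_THRESHOLD = 0.09  # maximum risk of re-identification quoted in the text above


def max_reidentification_risk(
    records: List[Dict[str, str]],
    quasi_identifiers: Sequence[str],
) -> float:
    """Return 1 / (size of the smallest group sharing the same indirect identifiers)."""
    if not records:
        raise ValueError("no records to assess")
    # Count how many records share each combination of indirect identifier values.
    groups = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return 1.0 / min(groups.values())


# Hypothetical dataset: two groups of 12 and 15 participants.
records = (
    [{"age_group": "20-29", "sex": "F"}] * 12
    + [{"age_group": "30-39", "sex": "M"}] * 15
)
risk = max_reidentification_risk(records, ["age_group", "sex"])
meets_threshold = risk <= RISK_THRESHOLD
```

Under this assumed measure, a single participant with a unique combination of indirect identifiers pushes the worst-case risk to 1.0, which is why rare values are usually grouped into broader categories before release.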

Preparing your data: de-identification techniques

The CONP Portal brings together diverse datasets that contain many different types of information in several modalities and formats, e.g., structural and functional MRI, EEG, and behavioural results. The guidance below seeks to help researchers prepare neuroscience data for deposit in Canadian open-access data repositories that accept de-identified data, including the CONP Portal Community Server.

This guide is not comprehensive and may need to be tailored to your data. Researchers bear the responsibility to ensure that tools are applied properly and that information is de-identified before sharing.
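As a concrete illustration of coding as defined in the consent guide (direct identifiers replaced with a code whose key is kept in a secure location), consider the minimal sketch below. The field names and the function `code_records` are hypothetical; a real project would also need to address indirect identifiers and to store the key securely and separately from the shared data.

```python
import secrets
from typing import Dict, List, Tuple

# Hypothetical names of the fields that hold direct identifiers.
DIRECT_IDENTIFIERS = ("name", "health_insurance_number")


def code_records(
    records: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], Dict[str, Dict[str, str]]]:
    """Replace direct identifiers with a random code.

    Returns the coded records and the key (code -> removed identifiers).
    """
    coded = []
    key = {}
    for record in records:
        code = secrets.token_hex(4)
        while code in key:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(4)
        # The key holds the removed identifiers and must never be shared.
        key[code] = {f: record[f] for f in DIRECT_IDENTIFIERS if f in record}
        stripped = {f: v for f, v in record.items() if f not in DIRECT_IDENTIFIERS}
        stripped["participant_code"] = code
        coded.append(stripped)
    return coded, key


# Hypothetical records for illustration only.
records = [
    {"name": "Jane Doe", "health_insurance_number": "X123", "age_group": "30-39"},
    {"name": "John Roe", "health_insurance_number": "Y456", "age_group": "40-49"},
]
coded, key = code_records(records)
```

Only `coded` would be a candidate for sharing. Destroying the key rather than storing it corresponds to what the consent guide calls anonymization, provided that indirect identifiers have also been dealt with.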

General tools

Tool name	Description	Link(s)
Portage Network’s De-Identification Guidance	Guidance about de-identification for many data types and research uses, including neuroimaging and medical data.	https://doi.org/10.5281/zenodo.4270551
NITRC’s De-identification Toolbox	Java application that removes identifying information from neuroimaging datasets.	https://www.nitrc.org/projects/de-identification/ Research paper describing the tool
OpenAIRE’s Amnesia	Java application that removes identifying information from delimited text files.	https://amnesia.openaire.eu/index.html

Image headers

Tool name	Description	Link
pydicom’s deid	Removes information from image headers (customizable).	https://pydicom.github.io/deid/
dicomanon (for MATLAB)	Removes confidential medical information from the DICOM file file_in and creates a new file file_out with the modified values. Image data and other attributes are unmodified.	https://www.mathworks.com/help/images/ref/dicomanon.html

Facial and dental features

Tool name	Description	Link
FMRIB Software Library’s Brain Extraction Tool (BET)	Removes non-brain tissue from whole-head images.	https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/BET
AFNI’s 3dSkullStrip	Extracts brain tissue from MRI T1-weighted images.	https://afni.nimh.nih.gov/pub/dist/doc/program_help/3dSkullStrip.html
Laboratory for Computational Neuroimaging’s FreeSurfer Software Suite	Comprehensive software suite that includes tools for skull stripping.	https://surfer.nmr.mgh.harvard.edu/
Peer Herholz’s BIDSonym	Gathers T1w images from a BIDS dataset and applies one of the following de-identification algorithms: mri_deface, PyDeface, Quickshear, or mridefacer.	https://github.com/PeerHerholz/BIDSonym

See also: https://community.imagingqa.com/docs

Synthetic data

Synthetic data are data that have been generated from either “real” data or models and that approximate the statistical properties of the original data. While not completely free of re-identification risks, synthetic data are increasingly popular for machine-learning applications.

Tool name	Description	Link
SYLLS’ synthpop package for R	Creates synthetic versions of data.	https://cran.r-project.org/web/packages/synthpop/index.html Research paper describing the tool

Articles

Vaden, Kenneth I., Mulugeta Gebregziabher, Dyslexia Data Consortium, and Mark A. Eckert. “Fully Synthetic Neuroimaging Data for Replication and Exploration.” NeuroImage 223 (December 1, 2020): 117284. https://doi.org/10.1016/j.neuroimage.2020.117284.

El Emam, Khaled, Lucy Mosquera, and Jason Bass. “Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation.” Journal of Medical Internet Research 22, no. 11 (November 16, 2020): e23139. https://doi.org/10.2196/23139.

Deciding between open, registered, and controlled access

Access model	Description	Identifiability of data
Open	Accessible with minimal restrictions or verifications.	Fully de-identified data (both direct and indirect identifiers removed) and/or aggregate data.
Registered	Accessible only to users who have an account that has identified them as a bona fide researcher.	De-identified data and/or aggregate data where inferences may be made about indirectly identifying individual records.
Controlled	Accessible only upon review of a data access application, which includes prior approval by a research ethics board.	Individual-level data with direct identifiers removed or replaced by a code.

Articles

Dyke, Stephanie O. M., Mikael Linden, Ilkka Lappalainen, Jordi Rambla De Argila, Knox Carey, David Lloyd, J. Dylan Spalding, et al. “Registered Access: Authorizing Data Access.” European Journal of Human Genetics 26, no. 12 (December 2018): 1721–31. https://doi.org/10.1038/s41431-018-0219-y.

Dyke, Stephanie O. M., Emily Kirby, Mahsa Shabani, Adrian Thorogood, Kazuto Kato, and Bartha M. Knoppers. “Registered Access: A ‘Triple-A’ Approach.” European Journal of Human Genetics 24, no. 12 (December 2016): 1676–80. https://doi.org/10.1038/ejhg.2016.115.

Additional resources

Basic concepts

Chapter 5: Privacy and Confidentiality of the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans https://ethics.gc.ca/eng/tcps2-eptc2_2018_chapter5-chapitre5.html.

Sensitive Data Expert Group. “Sensitive Data Toolkit for Researchers Part 1: Glossary of Terms for Sensitive Data Used for Research Purposes,” September 30, 2020. https://doi.org/10.5281/zenodo.4088946.

Sensitive Data Expert Group. “Sensitive Data Toolkit for Researchers Part 2: Human Participant Research Data Risk Matrix,” October 1, 2020. https://doi.org/10.5281/zenodo.4088954.

Canadian Institutes of Health Research. “CIHR Best Practices for Protecting Privacy in Health Research,” September 15, 2005. https://cihr-irsc.gc.ca/e/documents/et_pbp_nov05_sept2005_e.pdf.

Beauvais, Michael J.S., Bartha Maria Knoppers, and Judy Illes. “A Marathon, Not a Sprint – Neuroimaging, Open Science and Ethics.” NeuroImage 236 (August 1, 2021): 118041. https://doi.org/10.1016/j.neuroimage.2021.118041.

Data management plans

Morissette, Erica, Lina Harper, Isabella Peters, Felicity Tayler, and Stefanie Haustein. “Data Management Plan Template: Open Science Workflows,” April 9, 2021. https://doi.org/10.5281/zenodo.4701021.

Strauss, Ted. “Data Management Plan Template: Neuroimaging in the Neurosciences,” April 9, 2021. https://doi.org/10.5281/zenodo.4673558.

Creation of open access datasets

Tremblay-Mercier, Jennifer, Cécile Madjar, Samir Das, Alexa Pichet Binette, Stephanie O. M. Dyke, Pierre Étienne, Marie-Elyse Lafaille-Magnan, et al. “Open Science Datasets from PREVENT-AD, a Longitudinal Cohort of Pre-Symptomatic Alzheimer’s Disease.” BioRxiv, November 30, 2020, 2020.03.04.976670. https://doi.org/10.1101/2020.03.04.976670.

Tremblay-Mercier, Jennifer, Cécile Madjar, Samir Das, Stephanie O. M. Dyke, Pierre Étienne, Marie-Elyse Lafaille-Magnan, Pierre Bellec, et al. “Creation of an Open Science Dataset from PREVENT-AD, a Longitudinal Cohort Study of Pre-Symptomatic Alzheimer’s Disease.” BioRxiv, March 5, 2020, 2020.03.04.976670. https://doi.org/10.1101/2020.03.04.976670.

Data governance

Eke, Damian, Amy Bernard, Jan G. Bjaalie, Ricardo Chavarriaga, Takashi Hanakawa, Anthony Hannan, Sean Hill, et al. “International Data Governance for Neuroscience.” PsyArXiv, June 1, 2021. https://doi.org/10.31234/osf.io/esz9b.

[1] Developed by the Ethics and Governance Committee of the Canadian Open Neuroscience Platform. Members: Bartha Maria Knoppers (Chair), Michael Beauvais (Manager), Ann Cavoukian, John Clarkson, Lindsay Green-Noble, Judy Illes, Jason Karamchandani, Roland Nadler, Dylan Roskams-Edris, and Walter Stewart.