Ethics Toolkit

In response to a growing need in the neuroscience community for concrete guidance concerning ethically sound and pragmatically feasible open data-sharing, the CONP has created an ‘Ethics Toolkit’, which currently comprises the CONP Consent Toolkit and the CONP Privacy and De-identification Toolkit.

Together, these documents are meant to help researchers identify key elements in the design of their projects that are often required for the open sharing of neuroscience data, such as model consent language and approaches to de-identification.

This guidance is the product of extended discussions and careful drafting by the CONP Ethics and Governance Committee, and it considers both Canadian and international ethical frameworks and research practice. The best way to cite these resources is with their associated Zenodo DOIs.

We encourage you to share feedback on these materials by commenting on their source GitHub repository or by contacting the CONP at info@conp.ca.

What follows is a live rendering of the latest versions of the toolkit documents, as maintained on GitHub and Zenodo.

The CONP Consent Toolkit (live from GitHub)

CONP Consent Toolkit

This consent toolkit is the product of extended discussions and careful drafting that considers both Canadian and international ethical frameworks and research practice.

In adapting this consent toolkit for your own research needs, we encourage you to mention to your research ethics board that your consent materials closely follow the templates created by the CONP Ethics and Governance Committee. The Committee’s members include internationally recognized experts in research ethics, neuroethics, data governance, and law. The Committee’s members are Bartha Maria Knoppers (Chair), Michael Beauvais (Manager), Ann Cavoukian, John Clarkson, Lindsay Green-Noble, Judy Illes, Jason Karamchandani, Roland Nadler, Dylan Roskams-Edris, and Walter Stewart.

Further feedback was graciously provided by members of the CONP community, in particular Patrick Bermudez, Marcel Farrés Franch, Jessica Royer, and Robert Zatorre.

Unless otherwise noted, this work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

This document assists researchers in contributing data sets to the Portal of the Canadian Open Neuroscience Platform (CONP).

The CONP accepts data sets in two situations:

  1. Participants have prospectively consented to the sharing of their de-identified data through the CONP Portal.

  2. Participants have not consented to the sharing of their data on the CONP Portal, but the consent given meets the core consent elements for retrospective consent.

Each of these situations will be explained.

Part 1. Core consent elements (prospective consent)

Researchers and/or clinicians should seek consent to the sharing of de-identified data sets through the CONP Portal prospectively. Depending on the circumstances, consent should be sought from either the prospective participant directly (first-person consent) or from their legally authorized representative (substitute consent). First-person consent is appropriate for adults with legal capacity. Substitute consent is for situations where the prospective participant lacks legal capacity either due to age (i.e., minors) or due to an inability to understand the nature and consequences of research participation (i.e., limited cognitive ability). In these circumstances, the prospective participant should be involved in the consent process to the fullest extent of their ability to do so.

In addition to the required consent elements common to research with human participants, prospective consent forms should include the following core elements to enable broad, open sharing via the CONP Portal:

To contribute data sets to the CONP Portal, participants should consent to:
| Core element | Description |
|---|---|
| Data generation | Generation of participant data for research purposes |
| De-identification (coding, anonymization, and synthetic data generation) | De-identification of their data, which may consist of a combination of the following processes: coding (the removal of direct identifiers, e.g., name or health insurance number, and their replacement with a code whose key is kept in a secure location); anonymization (the permanent removal of direct identifiers without the conservation of any keys); data synthesis (the creation of a data set with statistical properties similar to those of the original data set but without a one-to-one match for every variable) |
| CONP Portal | Sharing of de-identified data via the CONP Portal, an open-access platform that researchers the world over may access |
| Commercial use | Use of de-identified data for commercial purposes |
| Data withdrawal | Not possible to withdraw data that has already been shared |
| Re-identification | Low risk that the participant could be re-identified in the future |

Table 1. Core consent elements for contributing data sets to the CONP Portal.
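As a rough illustration of the coding process described in the table above, the following Python sketch replaces direct identifiers with a random code and keeps the key in a separate mapping. The field names (`name`, `health_insurance_number`, `participant_code`) and the function `code_records` are hypothetical and not part of any CONP tool; in practice the key would be stored in a secure location, apart from the shared data.

```python
import secrets

# Hypothetical direct identifiers; adapt to your own data dictionary.
DIRECT_IDENTIFIERS = ("name", "health_insurance_number")


def code_records(records, identifiers=DIRECT_IDENTIFIERS):
    """Return (coded_records, key); key maps each code to the removed identifiers."""
    coded, key = [], {}
    for record in records:
        code = secrets.token_hex(8)  # random code, not derived from the identifiers
        while code in key:  # regenerate in the unlikely event of a collision
            code = secrets.token_hex(8)
        # The key holds the removed identifiers and must be stored securely.
        key[code] = {f: record[f] for f in identifiers if f in record}
        # The shared record keeps every other field plus the code.
        shared = {f: v for f, v in record.items() if f not in identifiers}
        shared["participant_code"] = code
        coded.append(shared)
    return coded, key


participants = [
    {"name": "A. Example", "health_insurance_number": "0000", "age_group": "30-39"},
]
coded, key = code_records(participants)
```

Under the definitions above, permanently discarding `key` would correspond to anonymization rather than coding, although removing direct identifiers alone does not address indirect identifiers.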

If any of the elements in Table 1 have not been included in the informed consent documents, data sets should not be uploaded to the CONP Portal. In such circumstances, we recommend you examine the retrospective filter below. If doubts still remain, or you have questions about the suitability of these consent elements for your research, please consult with your local research ethics board.
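The check against Table 1 can be sketched in Python as follows. This is only an illustration: the element labels and the function names `missing_core_elements` and `may_upload` are hypothetical, and the sketch does not replace a review of the consent documents themselves.

```python
# Hypothetical labels for the six core consent elements of Table 1.
CORE_CONSENT_ELEMENTS = (
    "data_generation",
    "de_identification",
    "conp_portal_sharing",
    "commercial_use",
    "data_withdrawal",
    "re_identification",
)


def missing_core_elements(consented_elements):
    """Return the Table 1 elements not covered by the consent documents, in table order."""
    covered = set(consented_elements)
    return [e for e in CORE_CONSENT_ELEMENTS if e not in covered]


def may_upload(consented_elements):
    """True only when every core element is covered."""
    return not missing_core_elements(consented_elements)


gaps = missing_core_elements(["data_generation", "de_identification"])
```

A non-empty result signals that the data set should not be uploaded as is, and that the retrospective filter is the next place to look.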

We recognize that some research projects may choose to have a tiered consent model, whereby participants consent to only certain data types being publicly shared via the CONP Portal or other open data repository. In such circumstances, the essential consent elements are only required with respect to those data types. It is the researcher’s responsibility to ensure that data segmentation is appropriately managed so that only those data types for which consent to open data sharing has been given are shared.
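One possible way to manage data segmentation under a tiered consent model is sketched below in Python. The record layout (`participant_code`, `data_type`, `path`) and the function `shareable_files` are assumptions made for illustration only.

```python
def shareable_files(files, consented_types):
    """Keep only files whose data type the participant agreed to share openly."""
    return [
        f for f in files
        if f["data_type"] in consented_types.get(f["participant_code"], set())
    ]


files = [
    {"participant_code": "P01", "data_type": "mri", "path": "P01_T1w.nii.gz"},
    {"participant_code": "P01", "data_type": "questionnaire", "path": "P01_q.tsv"},
    {"participant_code": "P02", "data_type": "mri", "path": "P02_T1w.nii.gz"},
]
# P02 has no recorded consent to open sharing, so none of their files are selected.
consented_types = {"P01": {"mri"}}
selected = shareable_files(files, consented_types)
```

Defaulting to an empty set for participants without a recorded choice means that data are excluded unless consent is explicitly on file.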

See Annex 1 for suggested clauses to include in the informed consent documentation.

Part 2. Retrospective consent filter

Use the filter below to determine if the consent already obtained from participants will permit you to include the data on the CONP Portal.

Step 1: Answer the following questions.

| Have your participants consented to: | Yes | No |
|---|---|---|
| Deposit of de-identified data in open-access databases? | | |
| Commercial use? | | |

Step 2: If all your responses in step 1 were “yes”, your data can be submitted for inclusion on the CONP Portal. If any of the responses were “no”, answer the following questions.

| | Yes | No |
|---|---|---|
| Have your participants consented to be re-contacted? | | |
| Is it feasible (i.e., not so onerous as to jeopardize your research) to re-contact your participants and obtain their consent for inclusion on the CONP Portal? | | |

Step 3: If both answers in step 2 were “yes”, re-contact your participants and obtain their consent using the example consent clauses in Annex 1. Otherwise, answer the following question.

NB Only those participants who were re-contacted and gave appropriate consent can be included in the data set shared on the CONP Portal. If you have re-contacted your participants and they have withheld consent, you are unlikely to obtain a waiver.

| | Yes | No |
|---|---|---|
| Are you able to apply to your research ethics board for a waiver, i.e., an authorization to alter the initial consent parameters? | | |

Step 4: If the answer in step 3 was “yes”, apply to your research ethics board for a waiver. If the answer was “no”, your data cannot be included on the CONP Portal.
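The four steps above can be summarized as a small decision function. The Python sketch below is illustrative only; the function name `retrospective_filter`, its parameters, and the returned labels are hypothetical.

```python
def retrospective_filter(open_access_deposit, commercial_use,
                         recontact_consented=False, recontact_feasible=False,
                         waiver_possible=False):
    """Return the next action suggested by the retrospective consent filter."""
    # Steps 1 and 2: both step 1 answers must be "yes" to submit directly.
    if open_access_deposit and commercial_use:
        return "submit"
    # Steps 2 and 3: re-contact only if it was consented to and is feasible.
    if recontact_consented and recontact_feasible:
        return "recontact"
    # Steps 3 and 4: otherwise a waiver from the research ethics board is the last option.
    if waiver_possible:
        return "apply_for_waiver"
    return "cannot_include"


next_action = retrospective_filter(True, False, recontact_consented=True,
                                   recontact_feasible=True)
```

The function does not model what happens after re-contact: participants who withhold consent must be left out of the shared data set, and a waiver is unlikely to be granted for them.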

| Version | Date (YYYY-MM-DD) | Modification summary |
|---|---|---|
| 1.0.1 | 2021-09-30 | Small typos |
| 1.0 | 2021-07-28 | Initial release |

Annex 1. Example consent clauses for the essential consent elements.

The example consent clauses are marked with CC0 1.0 Universal
Each essential element below is followed by its description, an example of consent clause language, and an alternative consent clause [1].

Data generation: The generation of participant data for research purposes

Example of consent clause language:

You are being asked to consent to [include relevant procedure, e.g., imaging] procedure(s) that will generate data about your [include relevant data types, e.g., cognitive performance] for research purposes.

[If questionnaires are also included…]

You will also be asked questions about [include description of the types of questions, e.g., personal biographical information, health status, etc.]

Alternative consent clause:

N/A

De-identification (coding, anonymization, and synthetic data generation): Removal of direct and indirect identifiers, the restructuring of data, and the generation of synthetic data for open sharing

Example of consent clause language:

To protect your privacy and to facilitate the broadest possible, public sharing of the data that you and other participants contribute, your name, date of birth and other directly identifying information will not be shared.

Some data, [e.g., name; date of birth; occupation; etc.], will be replaced by a code.

Other data that we believe may allow others to determine who you are, such as your [include relevant feature, e.g., facial features] will be irreversibly changed through a family of processes called anonymization so that you cannot be identified.

Other information, such as [include relevant data, e.g., age group; location], will be pooled together with those of other participants for high-level summaries.

Alternative consent clause:

Your privacy is very important to us, and we will take appropriate measures to protect it. We will not disclose any information about you like your name, your date of birth, your address, or your contact information to unauthorized persons. All personal identifying information will be replaced with a unique code.

Before researchers have access to your data, we will de-identify it. This means we take out names, dates of birth [include others, if any] and other personal details.

Your participation in this research project and any information obtained within this research project that can identify you will remain confidential, except as required by law [circumstances in which confidential information can be released by law should be explained to the participant].

CONP Portal: Sharing of de-identified data via the CONP Portal, an open-access platform that researchers the world over may access

Example of consent clause language:

After having gone through the described de-identification process, a data set made of your de-identified data and those of other participants will be uploaded to [specify the online platform or give a list of potential options if known, e.g., Zenodo, LORIS, etc.]. The link to the data set will then be included on the Canadian Open Neuroscience Platform (CONP) Portal to be openly shared with the scientific community.

As an Open Science platform, CONP aims to make scientific data as easily accessible as possible to the scientific community while respecting your privacy.

CONP is publicly available to anyone with internet access. Researchers from around the world will use the CONP Portal to access the de-identified data from this research study and other studies. This maximizes the potential for this project’s research data to be used for additional research that may lead to important scientific discoveries.

Alternative consent clause:

With your consent, we will share de-identified information with other researchers from around the world who would use it to improve patient care or advance scientific knowledge, for clinical and/or general research purposes. Your information will be shared with others through open-access databases, such as the Canadian Open Neuroscience Platform (CONP). CONP is publicly available to anyone with internet access. General information, such as age, race, or sex, or your de-identified [specify data type, e.g., MRI images] may be shared in these types of databases.

Commercial use: Use of de-identified data for commercial purposes

Example of consent clause language:

It is possible that future research conducted using data sets that include both your de-identified data and those of others will lead to the development of commercial products. These may include new therapeutics, diagnostic tests, or even software programs. In such cases, no part of the revenue generated from their development or sale will be shared.

Alternative consent clause:

Some of the research done with the information stored in the databases may one day lead to the development of software, tests, drugs, or other commercial products. If this happens, you will not receive any of the profits.

Data withdrawal: Not possible to withdraw data that has already been shared

Example of consent clause language:

The choice to participate is always yours. You may withdraw from the study at any time. If you withdraw, no additional data will be collected from you. We will continue to use any data already collected unless you tell us otherwise.

Once data have been de-identified and shared via CONP, they cannot be deleted. Your privacy is important to us and your data will continue to benefit from the privacy protections we use for all participants.

Alternative consent clause:

You are free to withdraw from the project at any stage. If you withdraw before testing and data is collected, we will not continue. If you withdraw after testing and data is collected, we will use any information already collected unless you tell us not to. However, if your data has already been shared it may not be possible to retrieve or remove all your data.

You can withdraw your data at any time by contacting [name of relevant person] free of charge at [information]. Data sent to other researchers around the world cannot be withdrawn if already used or published.

Re-identification: Low risk that the participant could be re-identified in the future

Example of consent clause language:

Your privacy will be protected through advanced de-identification techniques. Despite this, there is always a small risk that your shared data may one day enable someone to identify you. For example, someone could use information in your clinical record to match you in a data set. This is difficult to do because of the de-identification measures. Extensive sharing of personal information through social media or genealogical websites may also make it easier to re-identify you. With technological advances, however, it may become less difficult. Unable to foresee such advances, we use today’s best protections. In the remote chance that you are re-identified, you or your relatives may suffer a loss of privacy or potentially be subject to discrimination.

Alternative consent clause:

Some participants worry about being identified as someone taking part in the project. The chance of this happening is extremely small [consider including an estimation of such risk, such as “i.e., less than 1%”], and we will do everything we can to prevent this from happening.

Much like fingerprints, it is possible to identify someone if certain pieces of information are put together. While we use very strict data security measures to protect your privacy, there is always a small risk that your data may lead to you being re-identified. As technology advances, there may be new ways of linking data back to you that we cannot foresee today. Like other medical information, this may one day affect your insurability or your employment.

  1. Modified from the Global Alliance for Genomics and Health’s model consent clauses.

The CONP Privacy and De-identification Toolkit (live from GitHub)

Canadian Open Neuroscience Platform: Privacy and De-identification Toolkit [1]

Unless otherwise noted, this work is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

The fundamental goal of protecting participant privacy is to prevent the identification of their data by others. In keeping with relevant guidance from Canadian privacy authorities, data must present a low or very low likelihood of individual re-identification before open release, roughly analogous to a maximum threshold of 9% risk of re-identification. Open access data sets should never contain individually identifying information such as names, health card numbers, and social insurance numbers.
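To make the threshold more concrete, the Python sketch below estimates a worst-case re-identification risk as one divided by the size of the smallest group of records that share the same indirectly identifying values. This is one common way to reason about risk and is an assumption of this example, not a method prescribed by the toolkit; the field names and function names are hypothetical.

```python
from collections import Counter


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group sharing the same values."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())


def meets_threshold(records, quasi_identifiers, threshold=0.09):
    """True when the worst-case risk does not exceed the threshold."""
    return max_reidentification_risk(records, quasi_identifiers) <= threshold


# Twelve records that are indistinguishable on the chosen fields.
records = [{"age_group": "30-39", "sex": "F"}] * 12
risk = max_reidentification_risk(records, ["age_group", "sex"])
ok = meets_threshold(records, ["age_group", "sex"])
```

Under this measure, a single participant with a unique combination of values raises the worst-case risk to 1, which is why rare values are often grouped into broader categories before release.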

Preparing your data: de-identification techniques

The CONP Portal brings together diverse data sets that contain many different types of information in several formats, e.g., structural and functional MRI, EEG, behavioural results.

This guide is not comprehensive and may need to be tailored to your data. Researchers bear the responsibility to ensure that tools are applied properly and that information is de-identified before sharing.

General tools

| Tool name | Description | Link(s) |
|---|---|---|
| Portage Network’s De-Identification Guidance | Guidance about de-identification for many data types and research uses, including neuroimaging and medical data. | https://doi.org/10.5281/zenodo.4270551 |
| NITRC’s De-identification Toolbox | Java application that removes identifying information from neuroimaging datasets. | https://www.nitrc.org/projects/de-identification/ ; Research paper describing the tool |
| OpenAIRE’s Amnesia | Java application that removes identifying information from delimited text files. | https://amnesia.openaire.eu/index.html |

Image headers

| Tool name | Description | Link |
|---|---|---|
| pydicom’s deid | Removes information from image headers (customizable). | https://pydicom.github.io/deid/ |
| dicomanon (for MATLAB) | Removes confidential medical information from the DICOM file file_in and creates a new file file_out with the modified values. Image data and other attributes are unmodified. | https://www.mathworks.com/help/images/ref/dicomanon.html |
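The idea behind header de-identification can be sketched without any imaging library. The Python example below uses a plain dictionary to stand in for an image header and blanks a few identifying fields; it does not use pydicom, deid, or dicomanon, and the function `scrub_header` is hypothetical. The field names follow standard DICOM keywords, but the list is deliberately incomplete, since real headers contain many more potentially identifying fields.

```python
# Hypothetical, incomplete list of identifying fields, named after standard DICOM keywords.
IDENTIFYING_FIELDS = ("PatientName", "PatientID", "PatientBirthDate", "PatientAddress")


def scrub_header(header, fields=IDENTIFYING_FIELDS):
    """Return a copy of the header with identifying fields blanked; other fields are unchanged."""
    return {k: ("" if k in fields else v) for k, v in header.items()}


header = {"PatientName": "Example^A", "PatientBirthDate": "19800101", "Modality": "MR"}
clean = scrub_header(header)
```

Returning a copy rather than modifying the input mirrors the behaviour described for dicomanon, which writes a new file and leaves the image data untouched.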

Facial and dental features

| Tool name | Description | Link |
|---|---|---|
| FMRIB Software Library’s Brain Extraction Tool (BET) | Removes non-brain tissue from whole-head images. | https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/BET |
| AFNI’s 3dSkullStrip | Extracts brain tissue from MRI T1-weighted images. | https://afni.nimh.nih.gov/pub/dist/doc/program_help/3dSkullStrip.html |
| Laboratory for Computational Neuroimaging’s FreeSurfer Software Suite | Comprehensive software suite that includes tools for skull stripping. | https://surfer.nmr.mgh.harvard.edu/ |
| Peer Herholz’s BIDSonym | Gathers T1w images from a BIDS dataset and applies a selected de-identification algorithm. | https://github.com/PeerHerholz/BIDSonym |

Cf also: https://community.imagingqa.com/docs

Synthetic data

Synthetic data are data that have been generated from either “real” data or models and that possess the same statistical properties as the original data. While not completely free of re-identification risks, synthetic data are increasingly popular for machine-learning applications.

| Tool name | Description | Link(s) |
|---|---|---|
| SYLLS’ synthpop package for R | Creates synthetic versions of data. | https://cran.r-project.org/web/packages/synthpop/index.html ; Research paper describing the tool |
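As a very simplified illustration of synthetic data generation, the Python sketch below fits a normal distribution to a single numeric variable and draws new values from it. This is not how synthpop works: the normality assumption, the example values, and the function `synthesize` are all assumptions of this example, and the sketch ignores relationships between variables that a dedicated tool would model.

```python
import random
import statistics


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to the originals."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample standard deviation; needs at least two values
    rng = random.Random(seed)  # seeded so that the output is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up measurements for illustration only.
original = [24.1, 25.3, 22.8, 26.0, 23.5, 24.9, 25.7, 23.2]
synthetic = synthesize(original, n=1000)
```

The synthetic values have a similar mean and spread to the originals, yet no synthetic value corresponds to a particular participant's record.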

Articles

Vaden, Kenneth I., Mulugeta Gebregziabher, Dyslexia Data Consortium, and Mark A. Eckert. “Fully Synthetic Neuroimaging Data for Replication and Exploration.” NeuroImage 223 (December 1, 2020): 117284. https://doi.org/10.1016/j.neuroimage.2020.117284.

El Emam, Khaled, Lucy Mosquera, and Jason Bass. “Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation.” Journal of Medical Internet Research 22, no. 11 (November 16, 2020): e23139. https://doi.org/10.2196/23139.

Deciding between open, registered, and controlled access

| Access model | Description | Identifiability of data |
|---|---|---|
| Open | Accessible with minimal restrictions or verifications. | Fully de-identified data (both direct and indirect identifiers removed) and/or aggregate data. |
| Registered | Accessible only to users who have an account that has identified them as a bona fide researcher. | De-identified data and/or aggregate data where inferences may be made about indirectly identifying individual records. |
| Controlled | Accessible only upon review of a data access application, which includes prior approval by a research ethics board. | Individual-level data with direct identifiers removed or replaced by a code. |

Articles

Dyke, Stephanie O. M., Mikael Linden, Ilkka Lappalainen, Jordi Rambla De Argila, Knox Carey, David Lloyd, J. Dylan Spalding, et al. “Registered Access: Authorizing Data Access.” European Journal of Human Genetics 26, no. 12 (December 2018): 1721–31. https://doi.org/10.1038/s41431-018-0219-y.

Dyke, Stephanie O. M., Emily Kirby, Mahsa Shabani, Adrian Thorogood, Kazuto Kato, and Bartha M. Knoppers. “Registered Access: A ‘Triple-A’ Approach.” European Journal of Human Genetics 24, no. 12 (December 2016): 1676–80. https://doi.org/10.1038/ejhg.2016.115.

Additional resources

Basic concepts

Chapter 5: Privacy and Confidentiality of the Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans https://ethics.gc.ca/eng/tcps2-eptc2_2018_chapter5-chapitre5.html.

Sensitive Data Expert Group. “Sensitive Data Toolkit for Researchers Part 1: Glossary of Terms for Sensitive Data Used for Research Purposes,” September 30, 2020. https://doi.org/10.5281/zenodo.4088946.

Sensitive Data Expert Group. “Sensitive Data Toolkit for Researchers Part 2: Human Participant Research Data Risk Matrix,” October 1, 2020. https://doi.org/10.5281/zenodo.4088954.

Canadian Institutes of Health Research. “CIHR Best Practices for Protecting Privacy in Health Research,” September 15, 2005. https://cihr-irsc.gc.ca/e/documents/et_pbp_nov05_sept2005_e.pdf.

Beauvais, Michael J.S., Bartha Maria Knoppers, and Judy Illes. “A Marathon, Not a Sprint – Neuroimaging, Open Science and Ethics.” NeuroImage 236 (August 1, 2021): 118041. https://doi.org/10.1016/j.neuroimage.2021.118041.

Data management plans

Morissette, Erica, Lina Harper, Isabella Peters, Felicity Tayler, and Stefanie Haustein. “Data Management Plan Template: Open Science Workflows,” April 9, 2021. https://doi.org/10.5281/zenodo.4701021.

Strauss, Ted. “Data Management Plan Template: Neuroimaging in the Neurosciences,” April 9, 2021. https://doi.org/10.5281/zenodo.4673558.

Creation of open access data sets

Tremblay-Mercier, Jennifer, Cécile Madjar, Samir Das, Alexa Pichet Binette, Stephanie O. M. Dyke, Pierre Étienne, Marie-Elyse Lafaille-Magnan, et al. “Open Science Datasets from PREVENT-AD, a Longitudinal Cohort of Pre-Symptomatic Alzheimer’s Disease.” BioRxiv, November 30, 2020, 2020.03.04.976670. https://doi.org/10.1101/2020.03.04.976670.

Tremblay-Mercier, Jennifer, Cécile Madjar, Samir Das, Stephanie O. M. Dyke, Pierre Étienne, Marie-Elyse Lafaille-Magnan, Pierre Bellec, et al. “Creation of an Open Science Dataset from PREVENT-AD, a Longitudinal Cohort Study of Pre-Symptomatic Alzheimer’s Disease.” BioRxiv, March 5, 2020, 2020.03.04.976670. https://doi.org/10.1101/2020.03.04.976670.

Data governance

Eke, Damian, Amy Bernard, Jan G. Bjaalie, Ricardo Chavarriaga, Takashi Hanakawa, Anthony Hannan, Sean Hill, et al. “International Data Governance for Neuroscience.” PsyArXiv, June 1, 2021. https://doi.org/10.31234/osf.io/esz9b.

| Version | Date (YYYY-MM-DD) | Modification summary |
|---|---|---|
| 1.0 | 2021-07-28 | Initial release |

[1] Developed by the Ethics and Governance Committee of the Canadian Open Neuroscience Platform. Members: Bartha Maria Knoppers (Chair), Michael Beauvais (Manager), Ann Cavoukian, John Clarkson, Lindsay Green-Noble, Judy Illes, Jason Karamchandani, Roland Nadler, Dylan Roskams-Edris, and Walter Stewart.