Journal:Towards a risk catalog for data management plans

From LIMSWiki
Full article title: Towards a risk catalog for data management plans
Journal: International Journal of Digital Curation
Author(s): Weng, Franziska; Thoben, Stella
Author affiliation(s): Kiel University
Primary contact: Email: franziskaweng at web dot de
Year published: 2020
Volume and issue: 15(1)
Page(s): 18
DOI: 10.2218/ijdc.v15i1.697
ISSN: 1746-8256
Distribution license: Creative Commons Attribution 4.0 International
Website: http://www.ijdc.net/article/view/697
Download: http://www.ijdc.net/article/view/697/614 (PDF)

Abstract

Although data management and its careful planning are not new topics, there is little published research on risk mitigation in data management plans (DMPs). We consider it a problem that DMPs do not include a structured approach for the identification or mitigation of risks, because such an approach would instill confidence and trust in the data and its stewards, and foster the successful conduct of data-generating projects, which often are funded research projects. In this paper, we present a lightweight approach for identifying general risks in DMPs. We introduce an initial version of a generic risk catalog for funded research and similar projects. By analyzing a selection of 13 DMPs for projects from multiple disciplines published in the Research Ideas and Outcomes (RIO) journal, we demonstrate that our approach is applicable to DMPs and transferable to multiple institutional constellations. As a result, the effort for integrating risk management in data management planning can be reduced.

Keywords: data management plan, data management, risk management, risk assessment, information security

Introduction

University of New Mexico's William Michener describes a data management plan (DMP) as "a document that describes how you will treat your data during a project and what happens with the data after the project ends.”[1] The Digital Curation Centre's (DCC) Martin Donnelly notes that DMPs “serve to mitigate risks and help instill confidence and trust in the data and its stewards.”[2] Sarah Jones, also of the DCC, adds that “planning for the effective creation, management, and sharing of your data enables you to get the most out of your research.”[3] As such, the creation of a DMP should not only happen for obtaining a grant but also for successfully conducting the proposed project.

According to ISO 31000[4], a risk is “an effect of uncertainty on objectives.” Data management plans should help to decrease effects of uncertainty on project objectives. We consider it a problem that neither DMPs nor funders’ DMP evaluation schemes include a structured approach for the identification or mitigation of risks, since such an approach would foster the successful conduct of data-generating projects, which often are funded research projects. We believe our approach will help funders evaluate risks of proposed projects and hence the risks of their investment options.

Data management maturity models like the Data Management Maturity (DMM) model[5] or the Enterprise Information Management (EIM) maturity model[6] are primarily designed for enterprises and may not be feasible for higher education institutions (HEIs). A rigid model for HEIs to coordinate support of data management and sharing across a diverse range of actors and processes to deliver the necessary technological and human infrastructures “cannot be prescribed since individual organizations and cultures occupy a spectrum of differences.”[7] Also, there is a potential conflict between organizational demands and scientific freedom. The Charter of Fundamental Rights of the E.U. contains scientific freedom as a constitutional right, and researchers may view the imposition of specific data management processes as a restriction of their scientific freedom. On an even more international level, the UNESCO recommends that “each Member State should institute procedures adapted to its needs for ensuring that, in the performance of research and development, scientific researchers respect public accountability while at the same time enjoying the degree of autonomy appropriate to their task and to the advancement of science and technology.”[8]

We consider it important that researchers commit themselves to data management practices such as those of ISO 31000. However, ISO 31000 defines the risk management process as a feedback loop to be conducted in organizations.[4] Projects tend to have a much more limited scope with regard to funding and duration than organizations. Therefore, we regard the ISO 31000 risk management process as too time-consuming and of limited suitability for funded research and similar projects.

In this paper, we propose a lightweight approach for the identification of general risks in DMPs. We introduce an initial version of a generic risk catalog for funded research and similar projects. By analyzing a selection of 13 DMPs for projects from multiple disciplines published in the Research Ideas and Outcomes (RIO) journal[9][10][11][12][13][14][15][16][17][18][19][20][21], we demonstrate that our approach is applicable and transferable to multiple institutional constellations. As a result, the effort for integrating risk management in data management planning can be reduced.

Related work

Jones et al. developed a guide for HEIs “to help institutions understand the key aims and issues associated with planning and implementing research data management (RDM) services.”[7] In this guide, the authors mention data management risks for HEIs. They note that while the upfront costs for cheap storage of active data “may be only a fraction of those quoted by central services, the risks of data loss and security breaches are significantly higher, potentially leading to far greater costs in the long term.”[7] Additionally, there are “potential legal risks from using third-party services.”[7] However, data selection counters the risks of “reputational damage from exposing dirty, confidential, or undocumented data that has been retained long after the researchers who created it have left.”[7]

The OSCRP working group developed the OSCRP (Open Science Cyber Risk Profile), which “is designed to help principal investigators (PI) and their supporting information technology (IT) professionals assess cybersecurity risks related to open science projects.”[22] The OSCRP working group proposes that principal investigators examine risks, consequences, and avenues of attack for each mission-critical science asset on an inventory list, where assets include devices, systems, data, personnel, workflows, and other kinds of resources.[22] We regard this as a very detailed alternative to our approach, although it would need to be extended to cover the FAIR Guiding Principles[23] and long-term preservation.

In 2014, Ferreira et al.[24] “propose an analysis process for eScience projects using a data management plan and ISO 31000 in order to create a risk management plan that can complement the data management plan.” The authors describe an analytical process for creating a risk management plan and “present the previous process’ validation, based on the MetaGen-FRAME project.”[24] Within this validation Ferreira et al. also identify a project’s task-specific risks, e.g., “R6: Loss of metadata, denying the representation of the output information to the user via Taverna.”[24] This risk is tailored to the use of Taverna and hence may not be relevant for the majority of funded research and similar projects. There may be projects for which analyzing specific risks for all resources may be crucial. However, a detailed risk analysis may require a considerable amount of work.

Methods

We propose a lightweight approach that can serve as a starting point to include risk management in research data management planning. It doesn’t preclude detailed approaches like OSCRP[22] or ISO 31000.[4] Instead, we propose an approach which tries to reduce and maybe avoid the burden of a full risk management process such as that of ISO 31000. Our approach is based on a pre-tailored and extensible general risk catalog (Table 1) to lessen the effort required for risk management. We derived part of this risk catalog from 29 interviews with researchers from multiple disciplines[a], which we conducted as part of project SynFo: Creating synergies on the operational level of research data management.[25] One goal of project SynFo was the development of a transferable approach to improve research data management in multiple organizational constellations. In generalized content from the interviews, we identified risks entailed by interfaces of information, e.g., between researchers and data subjects or between researchers and external service providers. For the development of our approach, we also consulted the catalogs for threats and measures from the supplement of the “IT-Grundschutz” catalogs[26] by the German Federal Office for Information Security (BSI), the FAIR Guiding Principles[23], and the report and action plan from the European Commission expert group on FAIR data.[27]

Table 1. General risk catalog
Risk category | Risk [CODE] | Possible risk source
--- | --- | ---
Legal | Penalty for conducting unreported notifiable practices [RLEGU] | Physical sample collection
Legal | Penalty for unpermitted usage of external data [RLEGE] | Processing external data
Legal | Penalty for unpermitted usage of personal data [RLEGP] | Processing personal data
Legal | Penalty for conducting inadequate data protection practices [RLEGD] | Using an external service provider for processing personal data
Privacy | Loss of confidentiality through sending data to an unintended recipient [RPRIR] | Correspondence
Privacy | Loss of confidentiality through interception or eavesdropping of information [RPRII] | Online data transmission
Privacy | Loss of confidentiality through loss or theft of portable storage media or devices [RPRIS] | Portable storage media or devices
Privacy | Loss of confidentiality through careless data handling by an external party [RPRIE] | Sharing data with an external party without publication purposes
Technical | Unavailability through data corruption [RTECC] | Data processing
Technical | Unavailability through data loss [RTECL] | Data storage
Science | Poor knowledge discovery or reusability for stakeholders cannot find the data [RSCIF] | Searchable information not planned
Science | Poor knowledge discovery or reusability for stakeholders cannot access the data [RSCIA] | Sharing location not planned
Science | Poor knowledge discovery or reusability for stakeholders cannot integrate the data [RSCII] | File format not planned
Science | Poor knowledge discovery or reusability for stakeholders cannot reuse the data [RSCIR] | Licensing and context information not planned
Preservation | Unsustainability in the long-term through unavailability or discontinuity of financial support [RPREU] | Preservation location not planned

Our risk identification includes risks, their possible risk sources, mitigation approaches, and consequences. By analyzing occurrences and mitigations of risks from our catalog within a selection of 13 DMPs from multiple disciplines[b], published in the RIO journal, we demonstrate that our lightweight approach is applicable to DMPs and transferable to multiple institutional constellations. We evaluate the occurrences of the 15 risks in our catalog by identifying possible risk sources in each of the selected DMPs and analyze the risk mitigations in accordance with what the authors wrote.
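Read this way, the catalog in Table 1 acts as a lookup from possible risk sources to risks. The following Python sketch illustrates that reading under stated assumptions: the `Risk` class, the `identify_risks` helper, and the exact-match comparison of risk sources are hypothetical illustrations rather than part of the approach described here, and only five of the 15 catalog entries are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    code: str      # risk code from Table 1
    category: str  # risk category from Table 1
    name: str      # risk name from Table 1
    source: str    # possible risk source from Table 1


# Subset of Table 1 (illustrative; the full catalog has 15 entries).
CATALOG = [
    Risk("RLEGP", "Legal",
         "Penalty for unpermitted usage of personal data",
         "Processing personal data"),
    Risk("RPRII", "Privacy",
         "Loss of confidentiality through interception or eavesdropping of information",
         "Online data transmission"),
    Risk("RPRIS", "Privacy",
         "Loss of confidentiality through loss or theft of portable storage media or devices",
         "Portable storage media or devices"),
    Risk("RTECC", "Technical",
         "Unavailability through data corruption",
         "Data processing"),
    Risk("RTECL", "Technical",
         "Unavailability through data loss",
         "Data storage"),
]


def identify_risks(sources_in_dmp):
    """Return the catalog risks whose possible risk source was found in a DMP."""
    wanted = {s.strip().lower() for s in sources_in_dmp}
    return [risk for risk in CATALOG if risk.source.lower() in wanted]


# Hypothetical DMP that mentions storing data and transmitting it online.
found = identify_risks(["Data storage", "Online data transmission"])
print([risk.code for risk in found])
```

In practice, spotting a risk source in a DMP requires human interpretation of the text; the exact string match above only stands in for that step.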

Risks

Legal risks

A breach of a regulation like the General Data Protection Regulation (GDPR) or the Nagoya Protocol can result in high fines. At worst, compliance breaches can lead to reputational damages, legal disputes, and enormous cost.

Penalty for conducting unreported notifiable practices [RLEGU]

Research may include reportable research practices like the collection of physical samples regulated by the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization, which was transposed into E.U. law by Regulation (EU) No 511/2014. Under this regulation, there is a reporting obligation if the research on genetic resources is financially supported (Art. 7, Sec. 1) and if the research is in the final stage of development of a product that is based on the utilization of genetic resources (Art. 7, Sec. 2).[28] Article 11 says that “Member States shall lay down the rules on penalties applicable to infringements of Articles 4 and 7 and shall take all the measures necessary to ensure that they are applied.”[28] The Nagoya Protocol “and EU documents themselves give no guidance on penalties, each country has the liberty to determine these.”[29] Consequences may be fines of up to EUR 810,000 or even imprisonment.[29] To avoid penalties, the parties should comply strictly with the rules. The Convention on Biological Diversity publishes a detailed list of parties to the Nagoya Protocol.[c]

Penalty for unpermitted usage of external data [RLEGE]

In many countries, data by themselves do not have inherent legal protection. License contracts can reach various agreements concerning terms of use. Free licenses make (data) objects available for utilization to everyone, but usage can be restricted or conditioned. Creative Commons (CC) licenses and the GNU General Public License (GPL), which is specialized for free software, are widely used. Nonetheless, using CC licenses can lead to conflicting rights of third parties. Publicity, personality, and privacy rights “not held by the licensor are not affected and may still affect your desired use of a licensed work.”[30] “If there are any third parties who may have publicity, privacy, or personality rights that apply, those rights are not affected by your application of a CC license, and a reuser must seek permission for relevant uses.”[30] This example holds for pictures of persons. Also, the GNU GPL license imposes transitive obligations, e.g., “derivative programs must also be subject to the same initial GPL conditions of ability to copy, modify, or redistribute.”[31] To mitigate the risk of unpermitted usage of external data, it is recommended to abide by the license terms. In general, an overview about the data and the related licenses can be developed in the DMP or within the framework of a data policy.

Penalty for unpermitted usage of personal data [RLEGP]

In the E.U., the GDPR governs the processing of personal data. Articles 6 and 7 of the GDPR regulate the lawfulness of processing and the conditions of consent. On an international level, the European Commission can conduct an assessment to “ensure that the level of data protection in a third country or international organization is essentially equivalent to the one established by the E.U. legislation.”[32] Canada (commercial organizations), Israel, Switzerland, Japan, and the U.S. (limited to the Privacy Shield Framework) offer an adequate level of data protection.[33] To avoid penalties, it is recommended to receive written consent forms from data subjects, including information about the purpose and procedures of data processing.

Penalty for conducting inadequate data protection practices [RLEGD]

Article 5 of the GDPR enumerates principles related to processing of personal data: lawfulness, fairness, and transparency; purpose limitation; data minimization; accuracy; storage limitation; integrity and confidentiality; and accountability. According to Article 45 of the GDPR, “A transfer of personal data to a third country or an international organization may take place where the Commission has decided that the third country, a territory or one or more specified sectors within that third country, or the international organization in question ensures an adequate level of protection. Such a transfer shall not require any specific authorization.”[34] Countries without an adequacy decision, which are not classified as safe third countries, can guarantee protection in other ways, for example by using appropriate safeguards (Art. 46, GDPR) or binding corporate rules (Art. 47, GDPR). To avoid penalties, it is recommended to abide by the applicable laws. In case of doubt, researchers can contact the appropriate (data protection) authorities.

Privacy risks

A loss of confidentiality can have adverse effects, such as financial effects, on an organization.[26] These effects may also apply to a researcher, who additionally may want to keep research data confidential before scientific output is published, so that the research data will not be subject to theft of work.

Loss of confidentiality through sending data to an unintended recipient [RPRIR]

Correspondence has the intrinsic potential that a researcher transmits data to an unintended recipient. This may happen accidentally or as the result of a fraudulent attack like social engineering and leads to loss of confidentiality. “Social engineering is a method used to gain unauthorized access to information or IT systems by social action.”[26] Researchers should take extra care when sending confidential information and be aware of fraudulent attacks.

Loss of confidentiality through interception or eavesdropping of information [RPRII]

In the supplement of the IT-Grundschutz catalogs, the BSI specifies the threats of interception or eavesdropping of information, which entail the risk of loss of confidentiality.[26] “Since data is sent using unforeseeable routes and nodes on the internet, the sent data should only be transmitted in an encrypted form, as far as possible.”[26]

Loss of confidentiality through loss or theft of portable storage media or devices [RPRIS]

“Portable terminal devices and mobile data media in particular can be lost easily”[26] or even be stolen. “Whenever possible, mobile data media such as USB sticks and laptops should always be encrypted completely even if they are only occasionally used for confidential information.”[26]

Loss of confidentiality through careless data handling by an external party [RPRIE]

We regard the event that researchers share data with an external party without the purpose of publication as entailing the risk of loss of confidentiality. The external party may handle confidential data carelessly. “It can frequently be observed that there are a number of organizational or technical security procedures available in organizations, but they are then undermined through careless handling of the specifications and the technology.”[26] We recommend that researchers who share their research data always grant specific usage rights in written form to the external party or check whether the external party applies appropriate security measures.

Technical risks

Data can lose their integrity or be lost[26], leading to the major risk of unavailability of data. Unavailability of the correct data through silent corruption can lead to usage of incorrect data and hence to the production of incorrect results. If data are unavailable, either the project may fail or researchers need to repeat their data collection and the project will be behind schedule.

Unavailability through data corruption [RTECC]

“The integrity of information may be impaired due to different causes, e.g., manipulations, errors caused by people, incorrect use of applications, malfunctions of software, or transmission errors.”[26] “If only accidental changes need to be detected, then checksum procedures (e.g., cyclic redundancy checks) or error-correcting codes can be used.”[26] Nonetheless, there may be other scenarios where these verification techniques are insufficient.
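As a minimal sketch of the checksum idea quoted above, the following Python example compares data from before and after a transfer. The CRC-32 check corresponds to the cyclic redundancy check named in the quotation; the additional SHA-256 comparison is our own assumption about what a stronger check against deliberate manipulation might look like, not something prescribed by the IT-Grundschutz catalogs. The sample bytes and function names are hypothetical.

```python
import hashlib
import zlib


def crc32_of(data: bytes) -> int:
    """Cyclic redundancy check; suited to detecting accidental changes."""
    return zlib.crc32(data) & 0xFFFFFFFF


def sha256_of(data: bytes) -> str:
    """Cryptographic hash (assumption: added here for deliberate changes)."""
    return hashlib.sha256(data).hexdigest()


def unchanged(before: bytes, after: bytes) -> bool:
    """True only if both checksums agree for the two copies."""
    return (crc32_of(before) == crc32_of(after)
            and sha256_of(before) == sha256_of(after))


# Hypothetical data set before and after transmission.
original = b"sample_id,value\n1,0.42\n2,0.57\n"
received_ok = bytes(original)
received_bad = original.replace(b"0.57", b"0.58")  # simulated transmission error

print(unchanged(original, received_ok))
print(unchanged(original, received_bad))
```

A matching checksum only shows that two copies agree; it cannot show that the original data were correct in the first place, which is one scenario where such verification is insufficient.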

Unavailability through data loss [RTECL]

Data may “be lost when devices or data media are damaged, lost, or stolen,”[26] hence becoming unavailable. Approaches to mitigate irretrievable losses of data are for example regular backups[26] or keeping copies in multiple storage locations.[35]

Science risks

Consequences of poor discoverability and reusability of data are that researchers may unnecessarily repeat work and that scientific outputs derived from the data may fail to be comprehensible, reproducible, or traceable. Problems with reproducibility and replication “can cause permanent damage to the credibility of science,”[36] which is why we refer to this category as "science risks."

Poor knowledge discovery or reusability for stakeholders cannot find [RSCIF], access [RSCIA], integrate [RSCII], or reuse [RSCIR] the data

Making data findable, accessible, interoperable, and reusable (FAIR) to human and computational stakeholders is a best practice approach described in the FAIR Guiding Principles of Wilkinson et al.[23] Therefore, we include the risks that stakeholders cannot find, access, integrate, or reuse data in our risk catalog. Authors of DMPs can mitigate these risks as described by Wilkinson et al.[23] We abbreviated the risk names under this risk category using the term “poor knowledge discovery or reusability,” but refer to all the FAIR Guiding Principles for more information.[23]

Preservation risk

If data are not suitably preserved, scientific outputs derived from them may fail to be comprehensible, reproducible, or traceable in the long run. Data should be stored in a trusted and sustainable digital repository.[27]

Unsustainability in the long-term through unavailability or discontinuity of financial support [RPREU]

A digital preservation location has the intrinsic technical risk that data become unavailable through data loss or corruption. However, preservation locations also entail the risk of becoming unavailable when their funding ends. For example, Canhos states that discontinuity of financial support is a threat to Brazil’s Virtual Herbarium and its data sources.[9] Authors of DMPs should consider these risks when selecting a preservation location. They can mitigate the risk that data are not preserved long-term by reviewing the external preservation location’s longevity, certificates, and funding. We also suggest that attention is paid to possible migration and exit strategies like exporting and handing over data to a national data archive. This may particularly be important when the preservation location is not external.

Evaluation

When applying the risk catalog (Table 1) to the sample of 13 DMPs, we distinguish between risk occurrences themselves and risk occurrences with at least one mitigation, as shown in Table 2.

Table 2. Risk occurrences (+) and risk occurrences with at least one mitigation (-) in the sample of 13 DMPs
DMP RLEGU RLEGE RLEGP RLEGD RPRIR RPRII RPRIS RPRIE RTECC RTECL RSCIF RSCIA RSCII RSCIR RPREU
Canhos, 2017[9] - + - - - + - +
Fey and Anderson, 2016[10] + + + - - - - - -
Fisher and Nading, 2016[11] + + + + + + - - - - + -
Gatto, 2017[12] + + - - - + - -
McWhorter et al., 2016[13] - - + - - - -
Neylon, 2017[14] + + + + + + - - - - - -
Nichols and Stolze, 2016[15] - - - - - - -
Pannell, 2016[16] + + + - + - - - - - -
Traynor, 2017[17] - + + + + + + - + - - - -
Wael, 2017[18] + + + + + + - + - + - + - +
White, 2016[19] - + - + - + - +
Woolfrey, 2017[20] - + - + - - - -
Xu et al., 2016[21] + + - - - + - -

Because risk sources and mitigations were not always explicitly mentioned in the 13 sample DMPs, we needed to make interpretations. Appendix A shows our interpretation notes. According to these interpretations, we then found the mitigations, as shown in Appendix B.

Evaluation results

Each of the 15 risks of our catalog occurred in at least two of the selected 13 DMPs. Table 3 summarizes our evaluation results.

Table 3. Summary of risk evaluation results
Risk from catalog | % of risk occurrences mitigated | Most often used mitigation strategy (in # of DMPs)
--- | --- | ---
Unavailability through data loss [RTECL] | 100.00 | Backup (8)
Poor knowledge discovery or reusability for stakeholders cannot access the data [RSCIA] | 100.00 | Specific repository (7)
Poor knowledge discovery or reusability for stakeholders cannot reuse the data [RSCIR] | 92.31 | Specific license (9)
Unsustainability in the long-term through unavailability or discontinuity of financial support [RPREU] | 76.92 | Specific file formats (4), Specific data archive (4)
Poor knowledge discovery or reusability for stakeholders cannot integrate the data [RSCII] | 69.23 | Specific file formats (8)
Poor knowledge discovery or reusability for stakeholders cannot find the data [RSCIF] | 61.54 | Metadata (2)
Loss of confidentiality through careless data handling by an external party [RPRIE] | 40.00 | Agreement for IP rights (1), Secure external infrastructure (1)
Penalty for unpermitted usage of external data [RLEGE] | 37.50 | Respect usage permissions of external data (2)
Penalty for unpermitted usage of personal data [RLEGP] | 25.00 | Signed consent forms (1)
Unavailability through data corruption [RTECC] | 15.38 | Compare data from before and after transmission (1), Data quality control (1)
Loss of confidentiality through interception or eavesdropping of information [RPRII] | 0.00 |
Penalty for conducting inadequate data protection practices [RLEGD] | 0.00 |
Loss of confidentiality through sending data to an unintended recipient [RPRIR] | 0.00 |
Loss of confidentiality through loss or theft of portable storage media or devices [RPRIS] | 0.00 |
Penalty for conducting unreported notifiable practices [RLEGU] | 0.00 |

Within the small sample of 13 DMPs, we found 34 distinct strategies to mitigate 10 of the 15 risks of our proposed catalog. Hence, for five of the 15 risks from our catalog, the authors did not describe any mitigation in the corresponding DMP. These five risks are legal and privacy risks, and they have possible consequences like loss of reputation or project failure through theft of work. The authors of the selected DMPs overall attach highest importance to mitigating data unavailability through data loss; making data findable, accessible, interoperable, and reusable; and ensuring their long-term digital preservation. We found that all the authors of the 13 DMPs managed to mitigate two risks from our catalog: unavailability through data loss (RTECL) and poor knowledge discovery or reusability for stakeholders cannot access the data (RSCIA).

Conclusion

Since we identified each risk of our catalog in at least two of the selected DMPs, we conclude that our risk catalog is applicable to DMPs from multiple areas of research. In the selected DMPs, we overall find 53 of 125 (42.4%) risk occurrences not mitigated and hence see the necessity of DMP quality improvement through risk identification and mitigation planning in the data management planning phase.
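The share of unmitigated risk occurrences can be re-derived by tallying the marks in Table 2. The Python sketch below copies each DMP's row of marks as a string ("+" for a risk occurrence without mitigation, "-" for an occurrence with at least one mitigation) and counts them; the variable and function names are our own, and the column positions of the marks are deliberately ignored because only the totals matter here.

```python
# Rows of Table 2: "+" = risk occurrence, "-" = occurrence with mitigation.
TABLE_2_MARKS = {
    "Canhos, 2017": "- + - - - + - +",
    "Fey and Anderson, 2016": "+ + + - - - - - -",
    "Fisher and Nading, 2016": "+ + + + + + - - - - + -",
    "Gatto, 2017": "+ + - - - + - -",
    "McWhorter et al., 2016": "- - + - - - -",
    "Neylon, 2017": "+ + + + + + - - - - - -",
    "Nichols and Stolze, 2016": "- - - - - - -",
    "Pannell, 2016": "+ + + - + - - - - - -",
    "Traynor, 2017": "- + + + + + + - + - - - -",
    "Wael, 2017": "+ + + + + + - + - + - + - +",
    "White, 2016": "- + - + - + - +",
    "Woolfrey, 2017": "- + - + - - - -",
    "Xu et al., 2016": "+ + - - - + - -",
}


def tally(rows):
    """Return (unmitigated occurrences, all occurrences) over all DMPs."""
    unmitigated = sum(marks.split().count("+") for marks in rows.values())
    total = sum(len(marks.split()) for marks in rows.values())
    return unmitigated, total


unmitigated, total = tally(TABLE_2_MARKS)
share = 100 * unmitigated / total
print(f"{unmitigated} of {total} ({share:.1f}%) risk occurrences not mitigated")
```

The tally reproduces the figures stated above: 53 unmitigated occurrences out of 125, or 42.4%.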

We consider our approach useful for identifying general risks in DMPs. We propose that after filling out a funder’s DMP template, authors of DMPs refer to a risk catalog to identify possible risk sources and hence risks. Next, the authors should add mitigations to their DMP in the corresponding paragraph, if their DMP does not already contain one. For example, in a DMP’s paragraph in which authors write about the usage of external hard disks, they should add a sentence indicating that these external hard disks will be encrypted to mitigate the risk of loss of confidentiality through loss or theft of storage media, if their DMP does not yet contain any measures mitigating this risk.
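The review step described above, checking identified risks against the mitigations a DMP already contains, could be sketched as follows. The `SUGGESTED` mapping paraphrases mitigations mentioned in the risk descriptions of this article; the `missing_mitigations` helper and the example DMP are hypothetical and serve only to illustrate the external hard disk example.

```python
# Mitigations paraphrased from the risk descriptions, keyed by risk code.
SUGGESTED = {
    "RPRII": "Transmit data in encrypted form",
    "RPRIS": "Encrypt portable storage media and devices",
    "RTECC": "Use checksums to detect accidental changes",
    "RTECL": "Keep regular backups or copies in multiple storage locations",
}


def missing_mitigations(identified, planned):
    """Map each identified risk without a planned mitigation to a suggestion."""
    return {
        code: SUGGESTED.get(code, "No suggestion recorded")
        for code in identified
        if code not in planned
    }


# Hypothetical DMP: external hard disks are used, but only backups are planned.
identified = ["RPRIS", "RTECL"]
planned = {"RTECL": "Backup"}

todo = missing_mitigations(identified, planned)
print(todo)
```

Here the DMP author would add a sentence on encrypting the external hard disks, since RPRIS is the only identified risk without a mitigation.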

The risk catalog may also be useful to funders, since it makes it possible for them to evaluate basic investment risks of proposed projects.

Note that many of the legal assertions in this article hold within the E.U. Applicability to non-E.U. countries may vary.

We think further research on suitable risk management approaches concerning the data management of funded research and similar projects needs to be conducted.

Appendices

Appendix A: Interpretation notes

Canhos states that discontinuity of financial support is a threat to Brazil’s Virtual Herbarium and its data sources.[9] We interpret this as the risk that data are not preserved long-term.

In Fey and Anderson's DMP[10], we interpret the choice of .txt and .csv formats as a choice of open file formats for interoperability, and we interpret the use of metadata standards as mitigating the risk that data are not findable.

In Fisher and Nading’s DMP[11], we find only geospatial metadata mentioned. Neither documentation nor license information is mentioned in Fisher and Nading’s DMP.[11]

In Gatto’s DMP[12], we assume data are enriched or combined so that a license of the resulting data set should be derived from the source data licenses in Gatto’s DMP. We evaluate Gatto’s DMP according to the FAIR Guiding Principles for research software, as proposed by Jiménez et al.[37]

Concerning the DMP of McWhorter et al.[13], we assume that data do not have protection requirements and no risk of interception because data are made freely available for public use.

Neylon’s DMP[14] does not include collecting signed forms of consent from interviewees. Neylon also decides not to make all data anonymous and accessible.[14] Finally, the DMP does not include using file formats that are interoperable or allow re-use.

Nichols and Stolze’s DMP[15] describes migration of data from old storage media to an .xlsx format and their subsequent publication.

Concerning Pannell’s DMP[16], we think that it would be adequate to inform the responsible authority about the planned research project. Pannell does not address rights of use of external data. We interpret Pannell's term "filterable,"[16] in the context of metadata documentation, as meaning "findable."

In Traynor’s DMP[17], it is not clear if personal data are anonymized before they are uploaded in the infrastructure of an external service provider. Traynor’s DMP also contains no decisions for metadata capture or a specific long-term preservation location.[17]

Wael’s DMP includes a plan to hire a consultant to do technical planning and system setup.[18] The DMP does not include making data interoperable. We think that utilizing academic contacts to inform the research community that the data exist, as stated by Wael[18], is not the same as making data findable.

According to White’s DMP, the project members will develop data and software “in the open,”[19] which we interpret as making data accessible. In his DMP, White does not mention long-term preservation.[19] We regard the metadata capture and user-focused documentation stated in White’s DMP[19] as making data reusable.

In Woolfrey’s DMP[20], metadata are captured for re-usability. Woolfrey’s DMP does not include making data findable.

Xu et al.[21] do not explicitly state in which states or countries they plan to collect physical samples. We make the interpretation that physical samples are registered with a persistent identifier, as described by Xu et al.[21], to make their metadata findable. Xu et al. write that for re-use and distribution, IEDA (Interdisciplinary Earth Data Alliance) would have a persistent identifier assigned to the data sets.[21]

Appendix B: Risk mitigations

Table 4. Risk mitigations
DMP | Risk mitigations
--- | ---
Canhos, 2017[9] | Aligned licensing of all data (RLEGE), Backup (RTECL), Maintain blog and social media account (RSCIF), Publicly accessible server (RSCIA), Specific file formats (RSCIR), Specific license (RSCIR), Metadata or citation of external data (RSCIR), Standard data model (RSCIR)
Fey and Anderson, 2016[10] | Backup (RTECL), Provide rights of use (RSCIR), Specific data archive (RPREU, RSCIA), Metadata (RSCIF), Metadata standard (RSCIF), Publicly accessible server (RSCIA), Specific file formats (RSCII)
Fisher and Nading, 2016[11] | Backup (RTECL), Specific data archive (RPREU, RSCIA), Listing in national discipline-specific wiki (RSCIF), Listing on funder's website (RSCIF), Specific file formats (RSCII)
Gatto, 2017[12] | Multiple storage locations (RTECL), Specific license (RSCIR), Collaborative software development (RPREU, RSCIF), Specific repository (RSCIA), Documentation (RSCIR), Metadata or citation of external data (RSCIR)
McWhorter et al., 2016[13] | Data are freely available for public use (RSCIR), Data quality control (RTECC), Backup (RTECL), Multiple preservation locations (RTECL, RPREU), Specific data archive (RPREU), Publicly accessible server (RSCIA), Specific file formats (RSCII), Metadata (RSCIR)
Neylon, 2017[14] | Multiple storage locations (RTECL), Specific license (RSCIR), Multiple preservation locations (RPREU), Specific file formats (RPREU, RSCII), Persistent identifier (RSCIF), Specific repository (RSCIA), Documentation (RSCIR)
Nichols and Stolze, 2016[15] | Compare data from before and after transmission (RTECC), Backup (RTECL), Multiple storage locations (RTECL), Specific license (RSCIR), Repository guarantees long-term availability (RPREU), Publish data descriptor in open-access journal (RSCIF, RSCIR), Specific repository (RSCIA), Specific file formats (RSCII), Documentation (RSCIR), Metadata (RSCIR), Persistent identifier (RSCIR)
Pannell, 2016[16] | Secure external service infrastructure (RPRIE), Replicas in external service infrastructure (RTECL), Specific license (RSCIR), Preservation at institution’s library (RPREU), Specific file formats (RPREU, RSCII), Specific repository (RSCIA), Metadata (RSCIF), Persistent identifier (RSCIR)
Traynor, 2017[17] | Signed consent forms (RLEGP), Multiple storage locations (RTECL), Backup (RTECL), Specific license (RSCIR), Specific file formats (RPREU), Specific repository (RSCIA), Specific file formats (RSCII), Documentation (RSCIR)
Wael, 2017[18] | Multiple storage locations (RTECL), Agreement for IP rights (RPRIE), Specific license (RSCIR), Publicly accessible server (RSCIA), Anonymization (RSCIR), Documentation (RSCIR), Specific file formats (RSCIR)
White, 2016[19] | Respect usage permissions of external data (RLEGE), Backup (RTECL), Specific license (RSCIR), Specific repository (RSCIA), Metadata (RSCIR), Documentation (RSCIR)
Woolfrey, 2017[20] | Respect usage permissions of external data (RLEGE), Backup (RTECL), Data are freely available for public use (RSCIR), Specific data archive (RPREU, RSCIA), Specific file formats (RSCII), Documentation (RSCIR), Metadata (RSCIR), Metadata standard (RSCIR)
Xu et al., 2016[21] Multiple storage locations (RTECL), Specific license (RSCIR), Specific file formats (RPREU), Specific repository (RPREU, RSCIA), Persistent identifier for physical samples (RSCIF), Documentation (RSCIR), Metadata (RSCIR), Persistent identifier (RSCIR), Standardized vocabulary (RSCIR)
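
Each row of Table 4 pairs a mitigation with one or more risk codes in parentheses, so the table can be tallied mechanically. The following is a minimal sketch of one way to do so, not a procedure used by the authors. The function names `parse_row` and `count_codes` and the variable `ROWS` are hypothetical, and the two sample rows are copied from the White and the Fisher and Nading entries above.

```python
import re
from collections import Counter

# Matches one table entry of the form "Mitigation name (CODE1, CODE2)".
# Entries may be separated by "," or ";" (both separators occur in the table).
ENTRY = re.compile(r"\s*([^();,][^()]*?)\s*\(([^)]*)\)\s*[,;]?")

# Two rows copied from Table 4 (hypothetical variable name).
ROWS = {
    "White, 2016": (
        "Respect usage permissions of external data (RLEGE), Backup (RTECL), "
        "Specific license (RSCIR), Specific repository (RSCIA), "
        "Metadata (RSCIR), Documentation (RSCIR)"
    ),
    "Fisher and Nading, 2016": (
        "Backup (RTECL), Specific data archive (RPREU, RSCIA), "
        "Listing in national discipline-specific wiki (RSCIF), "
        "Listing on funders website (RSCIF), Specific file formats (RSCII)"
    ),
}


def parse_row(row):
    """Split a row into (mitigation, [risk codes]) pairs."""
    pairs = []
    for name, codes in ENTRY.findall(row):
        pairs.append((name.strip(), [c.strip() for c in codes.split(",")]))
    return pairs


def count_codes(rows):
    """Count how often each risk code is addressed across all rows."""
    counts = Counter()
    for row in rows.values():
        for _name, codes in parse_row(row):
            counts.update(codes)
    return counts


if __name__ == "__main__":
    for code, n in sorted(count_codes(ROWS).items()):
        print(code, n)
```

A mitigation listed with two codes, such as "Specific data archive (RPREU, RSCIA)", is counted once for each code, which matches how the table assigns one mitigation to several risks.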

Footnotes

  1. Geo sciences (12), biology (5), humanities (5), social and behavioral sciences (4), computer science, systems engineering and electrical engineering (2), and medicine (1)
  2. Biology (4), geo sciences (4), social and behavioral sciences (3), computer science, systems engineering and electrical engineering (1), and humanities (1)
  3. Parties to the Nagoya Protocol

Acknowledgements

We thank Peter Wullinger and Klaus Stein for their constructive feedback.

Funding

This work was funded by the German Federal Ministry of Education and Research (BMBF) under the funding ID 16FDM024.

References

  1. Michener, W.K. (2015). "Ten Simple Rules for Creating a Good Data Management Plan". PLoS Computational Biology 11 (10): e1004525. doi:10.1371/journal.pcbi.1004525. 
  2. Donnelly, M. (2012). "Chapter 5: Data management plans and planning". In Pryor, G. Managing Research Data. Facet. pp. 83–104. doi:10.29085/9781856048910.006. ISBN 9781856048910. 
  3. Jones, S. (2011). "How to Develop a Data Management and Sharing Plan". Digital Curation Centre. https://www.dcc.ac.uk/guidance/how-guides/develop-data-plan. Retrieved 19 November 2019. 
  4. "ISO 31000:2018 Risk management — Guidelines". International Organization for Standardization. February 2018. https://www.iso.org/standard/65694.html. 
  5. "Data Management Maturity (DMM)". Information System Audit and Control Association, Inc. 2019. https://cmmiinstitute.com/data-management-maturity. Retrieved 22 November 2019. 
  6. Newman, D.; Logan, D. (23 December 2008). "Overview: Gartner Introduces the EIM Maturity Model". Gartner. https://www.gartner.com/en/documents/846312/overview-gartner-introduces-the-eim-maturity-model. 
  7. Jones, S.; Pryor, G.; Whyte, A. (25 March 2013). "How to Develop RDM Services - A guide for HEIs". Digital Curation Centre. https://www.dcc.ac.uk/guidance/how-guides/how-develop-rdm-services. Retrieved 19 November 2019. 
  8. UNESCO (2017). "Records of the General Conference, 39th session, Paris, 30 October-14 November 2017, v. 1: Resolutions". p. 116. https://unesdoc.unesco.org/ark:/48223/pf0000260889.page=116. Retrieved 30 November 2019. 
  9. Canhos, D.A.L. (2017). "Data Management Plan: Brazil's Virtual Herbarium". RIO 3: e14675. doi:10.3897/rio.3.e14675. 
  10. Fey, J.; Anderson, S. (2016). "Boulder Creek Critical Zone Observatory Data Management Plan". RIO 2: e9419. doi:10.3897/rio.2.e9419. 
  11. Fisher, J.; Nading, A.M. (2016). "A Political Ecology of Value: A Cohort-Based Ethnography of the Environmental Turn in Nicaraguan Urban Social Policy". RIO 2: e8720. doi:10.3897/rio.2.e8720. 
  12. Gatto, L. (2017). "Data Management Plan for a Biotechnology and Biological Sciences Research Council (BBSRC) Tools and Resources Development Fund (TRDF) Grant". RIO 3: e11624. doi:10.3897/rio.3.e11624. 
  13. McWhorter, J.; Wright, D.; Thomas, J. (2016). "Coastal Data Information Program (CDIP)". RIO 2: e8827. doi:10.3897/rio.2.e8827. 
  14. Neylon, C. (2017). "Data Management Plan: IDRC Data Sharing Pilot Project". RIO 3: e14672. doi:10.3897/rio.3.e14672. 
  15. Nichols, H.; Stolze, S. (2016). "Migration of legacy data to new media formats for long-time storage and maximum visibility: Modern pollen data from the Canadian Arctic (1972/1973)". RIO 2: e10269. doi:10.3897/rio.2.e10269. 
  16. Pannell, J.L. (2016). "Data Management Plan for PhD Thesis "Climatic Limitation of Alien Weeds in New Zealand: Enhancing Species Distribution Models with Field Data"". RIO 2: e10600. doi:10.3897/rio.2.e10600. 
  17. Traynor, C. (2017). "Data Management Plan: Empowering Indigenous Peoples and Knowledge Systems Related to Climate Change and Intellectual Property Rights". RIO 3: e15111. doi:10.3897/rio.3.e15111. 
  18. Wael, R. (2017). "Data Management Plan: HarassMap". RIO 3: e15133. doi:10.3897/rio.3.e15133. 
  19. White, E.P. (2016). "Data Management Plan for Moore Investigator in Data Driven Discovery Grant". RIO 2: e10708. doi:10.3897/rio.2.e10708. 
  20. Woolfrey, L. (2017). "Data Management Plan: Opening access to economic data to prevent tobacco related diseases in Africa". RIO 3: e14837. doi:10.3897/rio.3.e14837. 
  21. Xu, H.; Ishida, M.; Wang, M. (2016). "A Data Management Plan for Effects of particle size on physical and chemical properties of mine wastes". RIO 2: e11065. doi:10.3897/rio.2.e11065. 
  22. Peisert, S.; Welch, V.; Adams, A. et al. (2017). "Open Science Cyber Risk Profile (OSCRP)". IUScholar Works. http://hdl.handle.net/2022/21259. Retrieved 19 November 2019. 
  23. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175. 
  24. Ferreira, F.; Coimbra, M.E.; Bairrão, R. et al. (2014). "Data Management in Metagenomics: A Risk Management Approach". International Journal of Digital Curation 9 (1): 41–56. doi:10.2218/ijdc.v9i1.299. 
  25. University Computing Centre (Rechenzentrum) (2019). "SynFo - Creating synergies on the operational level of research data management". Kiel University. https://www.rz.uni-kiel.de/en/projects/synfo-creating-synergies-on-the-operational-level-of-research-data-management. 
  26. German Federal Office for Information Security (22 December 2016). "IT-Grundschutz-catalogues 15th version - 2015 (Draft)". Archived from the original on 28 January 2020. https://web.archive.org/web/20200128211607/https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Grundschutz/International/GSK_15_EL_EN_Draft.html. Retrieved 19 November 2019. 
  27. Collins, S.; Genova, F.; Harrower, N. (26 November 2018). "Turning FAIR into reality". European Commission. doi:10.2777/1524. https://op.europa.eu/en/publication-detail/-/publication/7769a148-f1f6-11e8-9982-01aa75ed71a1/language-en. 
  28. "Regulation (EU) No 511/2014 of the European Parliament and of the Council of 16 April 2014 on compliance measures for users from the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization in the Union Text with EEA relevance". EUR-Lex. European Union. 16 April 2014. http://data.europa.eu/eli/reg/2014/511/oj. Retrieved 01 December 2019. 
  29. "Implementation of Nagoya Protocol: A comparison between The Netherlands, Belgium and Germany". vo.eu. 18 June 2018. https://publications.vo.eu/implementation-of-nagoya-protocol/. Retrieved 30 November 2019. 
  30. "Frequently Asked Questions". Creative Commons. https://creativecommons.org/faq/. Retrieved 08 December 2019. 
  31. Lipinski, T.A. (2012). Librarian's Legal Companion for Licensing Information Resources and Legal Services. Neal-Schuman Publishers. p. 312. ISBN 9781555706104. 
  32. Article 29 Data Protection Working Party (6 February 2018). "Working document on Adequacy Referential (wp254rev.01)". European Commission. https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=614108. Retrieved 01 December 2019. 
  33. European Commission (14 January 2019). "Adequacy decisions". European Commission. https://ec.europa.eu/info/law/law-topic/data-protection/international-dimension-data-protection/adequacy-decisions_en. Retrieved 01 December 2019. 
  34. "Consolidated text: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance)". EUR-Lex. European Union. 27 April 2016. http://data.europa.eu/eli/reg/2016/679/2016-05-04. Retrieved 30 November 2019. 
  35. Reich, V.; Rosenthal, D.S.H. (2009). "LOCKSS (Lots of Copies Keep Stuff Safe)". New Review of Academic Librarianship 6 (1): 155–61. doi:10.1080/13614530009516806. 
  36. Peng, R. (2015). "The reproducibility crisis in science: A statistical counterattack". Significance 12 (3): 30–32. doi:10.1111/j.1740-9713.2015.00827.x. 
  37. Jiménez, R.C.; Kuzak, M.; Alhamdoosh, M. et al. (2017). "Four simple recommendations to encourage best practices in research software [version 1; peer review: 3 approved]". F1000Research 6: 876. doi:10.12688/f1000research.11407.1. 

Notes

This presentation is faithful to the original, with only a few minor changes. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; this version lists them in order of appearance, by design.