Difference between revisions of "User:Shawndouglas/sandbox/sublevel1"

From LIMSWiki
Line 7: Line 7:


==Sandbox begins below==
==Sandbox begins below==
{{Infobox journal article
{{Infobox journal article
|name        =  
|name        =  
Line 12: Line 13:
|alt          = <!-- Alternative text for images -->
|alt          = <!-- Alternative text for images -->
|caption      =  
|caption      =  
|title_full  = A data quality strategy to enable FAIR, programmatic access across large,<br />diverse data collections for high performance data analysis
|title_full  = How could the ethical management of health data in the medical field inform police use of DNA?
|journal      = ''Informatics''
|journal      = ''Frontiers in Public Health''
|authors      = Evans, Ben; Druken, Kelsey; Wang, Jingbo; Yang, Rui; Richards, Clare; Wyborn, Lesley
|authors      = Krikorian, Gaelle; Vailly, Joëlle
|affiliations = Australian National University
|affiliations = Institut de recherche interdisciplinaire sur les enjeux sociaux (IRIS)
|contact      = Email: Jingbo dot Wang at anu dot edu dot au
|contact      = Email: gaelle.krikorian@gmail.com
|editors      = Ge, Mouzhi; Dohnal, Vlastislav
|editors      = Lefèvre, Thomas
|pub_year    = 2017
|pub_year    = 2018
|vol_iss      = '''4'''(4)
|vol_iss      = '''6'''
|pages        = 45
|pages        = 154
|doi          = [https://doi.org/10.3390/informatics4040045 10.3390/informatics4040045]
|doi          = [https://doi.org/10.3389/fpubh.2018.00154 10.3389/fpubh.2018.00154]
|issn        = 2227-9709
|issn        = 2296-2565
|license      = [http://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International]
|license      = [http://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International]
|website      = [http://www.mdpi.com/2227-9709/4/4/45/htm http://www.mdpi.com/2227-9709/4/4/45/htm]
|website      = [https://www.frontiersin.org/articles/10.3389/fpubh.2018.00154/full https://www.frontiersin.org/articles/10.3389/fpubh.2018.00154/full]
|download    = [http://www.mdpi.com/2227-9709/4/4/45/pdf http://www.mdpi.com/2227-9709/4/4/45/pdf] (PDF)
|download    = [https://www.frontiersin.org/articles/10.3389/fpubh.2018.00154/pdf https://www.frontiersin.org/articles/10.3389/fpubh.2018.00154/pdf] (PDF)
}}
}}
{{ombox
{{ombox
Line 32: Line 33:
| text      = This article should not be considered complete until this message box has been removed. This is a work in progress.
| text      = This article should not be considered complete until this message box has been removed. This is a work in progress.
}}
}}
==Abstract==
To ensure seamless, programmatic access to data for high-performance computing (HPC) and [[Data analysis|analysis]] across multiple research domains, it is vital to have a methodology for standardization of both data and services. At the Australian National Computational Infrastructure (NCI) we have developed a data quality strategy (DQS) that currently provides processes for: (1) consistency of data structures needed for a high-performance data (HPD) platform; (2) [[quality control]] (QC) through compliance with recognized community standards; (3) benchmarking cases of operational performance tests; and (4) [[quality assurance]] (QA) of data through demonstrated functionality and performance across common platforms, tools, and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across different subject domains, and demonstrated the ease with which modern programmatic methods can be used to access the data, either ''in situ'' or via web services, and for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high-performance environments, the DQS is also used to identify the need for any extensions to the relevant international standards for interoperability and/or programmatic access.
'''Keywords''': data quality, quality control, quality assurance, benchmarks, performance, data management policy, netCDF, high-performance computing, HPC, fair data
==Introduction==
==Introduction==
The National Computational Infrastructure (NCI) manages one of Australia’s largest and most diverse repositories (10+ petabytes) of research data collections spanning datasets from climate, coasts, oceans, and geophysics through to astronomy, [[bioinformatics]], and the social sciences.<ref name="WangLarge14">{{cite journal |title=Large-Scale Data Collection Metadata Management at the National Computation Infrastructure |journal=Proceedings from the American Geophysical Union, Fall Meeting 2014 |author=Wang, J.; Evans, B.J.K.; Bastrakova, I. et al. |pages=IN14B-07 |year=2014}}</ref> Within these domains, data can be of different types such as gridded, ungridded (i.e., line surveys, point clouds), and raster image types, as well as having diverse coordinate reference projections and resolutions. NCI has been following the Force 11 FAIR data principles to make data findable, accessible, interoperable, and reusable.<ref name="F11FAIR">{{cite web |url=https://www.force11.org/group/fairgroup/fairprinciples |title=The FAIR Data Principles |publisher=Force11 |accessdate=23 August 2017}}</ref> These principles provide guidelines for a research data repository to enable data-intensive science, and help researchers answer questions such as whether the scientific quality of the data can be trusted and whether the data is usable by their software platforms and tools.
Various events paved the way for the production of ethical norms regulating biomedical practices, from the Nuremberg Code (1947)—produced by the international trial of Nazi regime leaders and collaborators—and the Declaration of Helsinki by the World Medical Association (1964) to the invention of the term “bioethics” by American biologist Van Rensselaer Potter.<ref name="PotterBio70">{{cite journal |title=Bioethics, the science of survival |journal=Perspectives in Biology and Medicine |author=Potter, V.R. |volume=14 |issue=1 |pages=127–53 |year=1970 |doi=10.1353/pbm.1970.0015}}</ref> The ethics of biomedicine has given rise to various controversies—particularly in the fields of newborn screening<ref name="VaillyTheBirth13">{{cite book |title=The Birth of a Genetics Policy: Social Issues of Newborn Screening |author=Vailly, J. |publisher=Routledge |pages=240 |year=2013 |isbn=9781472422729}}</ref>, prenatal screening<ref name="IsambertÉthique80">{{cite journal |title=Éthique et génétique: De l'utopie eugénique au contrôle des malformations congénitales |journal=Revue française de sociologie |author=Isambert, F.A. |volume=21 |issue=3 |pages=331–54 |year=1980 |doi=10.2307/3320930}}</ref>, and cloning<ref name="PulmanLesEnjeux05">{{cite journal |title=Les enjeux du clonage |journal=Revue française de sociologie |author=Pulman, B. |volume=46 |issue=3 |pages=413–42 |year=2005 |doi=10.3917/rfs.463.0413}}</ref>—resulting in the institutionalization of ethical questions in the biomedical world of genetics. In 1994, France passed legislation (commonly known as the “bioethics laws”) to regulate medical practices in genetics.
The medical community has also organized itself in order to manage ethical issues relating to its decisions, with a view to handling “practices with many strong uncertainties” and enabling clinical judgments and decisions to be made not by individual practitioners but rather by multidisciplinary groups drawing on different modes of judgment and forms of expertise.<ref name="BourretDécision08">{{cite journal |title=Décision et jugement médicaux en situation de forte incertitude : l’exemple de deux pratiques cliniques à l’épreuve de la génétique |journal=Sciences sociales et santé |author=Bourret, P.; Rabeharisoa, V. |volume=26 |issue=1 |pages=128 |year=2008 |doi=10.3917/sss.261.0033}}</ref> Thus, the biomedical approach to genetics has been characterized by various debates and the existence of public controversies.
 
To ensure broader reuse of the data and enable transdisciplinary integration across multiple domains, as well as enabling programmatic access, a dataset must be usable and of value to a broad range of users from different communities.<ref name="EvansExtend16">{{cite journal |title=Extending the Common Framework for Earth Observation Data to other Disciplinary Data and Programmatic Access |journal=Proceedings from the American Geophysical Union, Fall General Assembly 2016 |author=Evans, B.J.K.; Wyborn, L.A.; Druken, K.A. et al. |pages=IN22A-05 |year=2016}}</ref> Therefore, a set of standards and "best practices" for ensuring the quality of scientific data products is a critical component in the life cycle of data management. We undertake both QC through compliance with recognized community standards (e.g., checking that file headers comply with the relevant community conventions) and QA of data through demonstrated functionality and performance across common platforms, tools, and services (e.g., verifying that the data work with designated software and libraries).
 
The Earth Science Information Partners (ESIP) Information Quality Cluster (IQC) has been established to collect such standards and best practices, and then to assist data producers in implementing them and users in taking advantage of them.<ref name="RamapriyanEnsuring17">{{cite journal |title=Ensuring and Improving Information Quality for Earth Science Data and Products |journal=D-Lib Magazine |author=Ramapriyan, H.; Peng, G.; Moroni, D.; Shie, C.-L. |volume=23 |issue=7/8 |year=2017 |doi=10.1045/july2017-ramapriyan}}</ref> ESIP considers four different aspects of [[information]] quality in close relation to different stages of data products in their four-stage life cycle<ref name="RamapriyanEnsuring17" />: (1) define, develop, and validate; (2) produce, access, and deliver; (3) maintain, preserve, and disseminate; and (4) enable use, provide support, and service.
 
Science teams or data producers are responsible for managing data quality during the first two stages, while data publishers are responsible for the latter two stages. As NCI is both a digital repository, which manages the storage and distribution of reference data for a range of users, as well as the provider of high-end compute and data analysis platforms, the data quality processes are focused on the latter two stages. A check on the scientific correctness is considered to be part of the first two stages and is not included in the definition of "data quality" that is described in this paper.
 
==NCI's data quality strategy (DQS)==
NCI developed a DQS to establish a level of assurance, and hence confidence, for our user community and key stakeholders as an integral part of service provision.<ref name="AtkinTotal05">{{cite book |chapter=Chapter 8: Service Specifications, Service Level Agreements and Performance |title=Total Facilities Management |author=Atkin, B.; Brooks, A. |publisher=Wiley |isbn=9781405127905}}</ref> It is also a step on the pathway to meet the technical requirements of a trusted digital repository, such as the CoreTrustSeal certification.<ref name="CTSData">{{cite web |url=https://www.coretrustseal.org/why-certification/requirements/ |title=Data Repositories Requirements |publisher=CoreTrustSeal |accessdate=24 October 2017}}</ref> As meeting these requirements involves the systematic application of agreed policies and procedures, our DQS provides a suite of guidelines, recommendations, and processes for: (1) consistency of data structures suitable for the underlying high-performance data (HPD) platform; (2) QC through compliance with recognized community standards; (3) benchmarking performance using operational test cases; and (4) QA through demonstrated functionality and benchmarking across common platforms, tools, and services.
 
NCI’s DQS was developed iteratively: first, through a review of other approaches to managing data QC and data QA (e.g., Ramapriyan ''et al.''<ref name="RamapriyanEnsuring17" /> and Stall<ref name="StallAGU16">{{cite web |url=https://www.scidatacon.org/2016/sessions/100/ |title=AGU's Data Management Maturity Model |work=Auditing of Trustworthy Data Repositories |author=Stall, S.; Downs, R.R.; Kempler, S.J. |publisher=SciDataCon 2016 |date=2016}}</ref>) to establish the DQS methodology, and second, by applying this methodology to selected use cases at NCI that captured existing and emerging requirements, particularly those relating to HPC.
 
Our approach is consistent with the American Geophysical Union (AGU) Data Management Maturity (DMM)<sup>SM</sup> model<ref name="StallAGU16" /><ref name="StallTheAmerican16">{{cite journal |title=The American Geophysical Union Data Management Maturity Program |journal=Proceedings from the eResearch Australasia Conference 2016 |author=Stall, S.; Hanson, B.; Wyborn, L. |pages=72 |year=2016 |url=https://eresearchau.files.wordpress.com/2016/03/eresau2016_paper_72.pdf}}</ref>, which was developed in partnership with the Capability Maturity Model Integration (CMMI) Institute and adapted from their DMM<sup>SM</sup><ref name="CMMIDataMan">{{cite web |url=https://cmmiinstitute.com/store/data-management-maturity-(dmm) |title=Data Management Maturity (DMM) |publisher=CMMI Institute LLC}}</ref> model for applications in the Earth and space sciences. The AGU DMM<sup>SM</sup> model aims to provide guidance on how to improve data quality and consistency and facilitate reuse in the data life cycle. It enables both producers of data and repositories that store data to ensure that datasets are "fit-for-purpose," repeatable, and trustworthy. The Data Quality Process Areas in the AGU DMM<sup>SM</sup> model define a collaborative approach for receiving, assessing, cleansing, and curating data to ensure "fitness" for intended use in the scientific community.
 
After several iterations, the NCI DQS was established as part of the formal data publishing process and is applied throughout the cycle from submission of data to the NCI repository through to its final publication. The approach is also being adopted by the data producers who now engage with the process from the preparation stage, prior to ingestion onto the NCI data platform. Early consultation and feedback have greatly improved both the quality of the data and the timeliness of publication. To improve the efficiency further, one of our major data suppliers is including our DQS requirements in their data generation processes to ensure data quality is considered earlier in data production.
 
The technical requirements and implementation of our DQS will be described as four major but related data components: structure, QC, benchmarking, and QA.
 
===Data structure===
NCI's research data collections are particularly focused on enabling programmatic access, required by: (1) NCI core services such as the NCI supercomputer and NCI cloud-based capabilities; (2) community virtual [[Laboratory|laboratories]] and virtual research environments; (3) those that require remote access through established scientific standards-based protocols that use data services; and, (4) increasingly, by international data federations. To enable these different types of programmatic access, datasets must be registered in the central NCI catalogue<ref name="NCIDataPortal">{{cite web |url=https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/home |title=NCI Data Portal |publisher=National Computational Infrastructure}}</ref>, which records their location for access both on the filesystems and via data services.
 
This requires the data to be well-organized and compliant with uniform, professionally managed standards and consistent community conventions wherever possible. For example, the climate community Coupled Model Intercomparison Project (CMIP) experiments use the Data Reference Syntax (DRS)<ref name="TaylorCMIP12">{{cite web |url=https://pcmdi.llnl.gov/mips/cmip5/docs/cmip5_data_reference_syntax.pdf |format=PDF |title=CMIP5 Data Reference Syntax (DRS) and Controlled Vocabularies |author=Taylor, K.E.; Balaji, V.; Hankin, S. et al. |publisher=Program for Climate Model Diagnosis & Intercomparison |date=13 June 2012}}</ref>, whilst the National Aeronautics and Space Administration (NASA) recommends a specific naming convention for Landsat satellite image products.<ref name="USGSLandsat">{{cite web |url=https://landsat.usgs.gov/what-are-naming-conventions-landsat-scene-identifiers |title=What are the naming conventions for Landsat scene identifiers? |publisher=U.S. Geological Survey |accessdate=23 August 2017}}</ref> The NCI data collection catalogue manages the details of each dataset through a uniform application of ISO 19115:2003<ref name="ISO19115">{{cite web |url=https://www.iso.org/standard/53798.html |title=ISO 19115-1:2014 Geographic information -- Metadata -- Part 1: Fundamentals |publisher=International Organization for Standardization |date=April 2014 |accessdate=25 May 2016}}</ref>, an international schema used for describing geographic information and services. Essentially, each catalogue entry points to the location of the data within the NCI data infrastructure. The catalogue entries also point to the service endpoints, such as a standard data download point and a data subsetting interface, as well as the Open Geospatial Consortium (OGC) Web Map Service (WMS) and Web Coverage Service (WCS). NCI can publish data through several different servers, and as such the specific endpoint for each of these service capabilities is listed.
 
NCI has developed a catalogue and directory policy, which provides guidelines for the organization of datasets within the concepts of data collections and data sub-collections and includes a comprehensive definition for each hierarchical layer. The definitions are:
 
* A ''data collection'' is the highest in the hierarchy of data groupings at NCI. It is comprised of either an exclusive grouping of data subcollections, or it is a tiered structure with an exclusive grouping of lower tiered data collections, where the lowest tier data collection will only contain data subcollections.
 
* A ''data subcollection'' is an exclusive grouping of datasets (i.e., each dataset belongs to only one subcollection) where the constituent datasets are tightly managed. Responsibility for the underlying management of its constituent datasets must lie within a single organization. A data subcollection constitutes a strong connection between the component datasets, and is organized coherently around a single scientific element (e.g., model, instrument). A subcollection must have compatible licenses such that constituent datasets do not need different access arrangements.
 
* A ''dataset'' is a compilation of data that constitutes a programmable data unit that has been collected and organized using a self-contained process. For this purpose it must have a named data owner, a single license, one set of semantics, ontologies, and vocabularies, and a single data format and internal data convention. A dataset must include its version.
 
* A ''dataset granule'' is used for some scientific domains that require a finer level of granularity (e.g., in satellite Earth Observation datasets). A granule refers to the smallest aggregation of data that can be independently described, inventoried, and retrieved as defined by NASA.<ref name="NASAGlossary">{{cite web |url=https://earthdata.nasa.gov/user-resources/glossary#ed-glossary-g |title=Granule |work=EarthData Glossary |accessdate=23 August 2017}}</ref> Dataset granules have their own metadata and support values associated with the additional attributes defined by parent datasets.
 
In addition we use the term "data category" to identify common contents/themes across all levels of the hierarchy.
 
* A ''data category'' allows a broad spectrum of options to encode relationships between data. A data category can be anything that weakly relates datasets, with the primary way of discovering the groupings within the data by key terms (e.g., keywords, attributes, vocabularies, ontologies). Datasets are not exclusive to a single category.
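
The hierarchy defined above can be sketched as a small set of Python data classes. This is only an illustrative model under stated assumptions, not NCI's catalogue implementation: the class names, the <code>registry</code> dictionary used to enforce exclusivity, the single-license check (a simplification of "compatible licenses"), and the placeholder owner, organization, license, and version values are all hypothetical. The collection, sub-collection, dataset, and granule names are taken from the CMIP 5 example in Figure 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Granule:
    name: str  # smallest independently described unit, e.g. one variable


@dataclass
class Dataset:
    name: str
    owner: str          # a dataset has a named data owner
    license: str        # ... a single license
    data_format: str    # ... a single data format
    version: str        # ... and must include its version
    categories: Set[str] = field(default_factory=set)  # categories are non-exclusive
    granules: List[Granule] = field(default_factory=list)


@dataclass
class SubCollection:
    name: str
    organization: str   # the single organization responsible for management
    datasets: List[Dataset] = field(default_factory=list)

    def add(self, dataset: Dataset, registry: Dict[str, str]) -> None:
        # Exclusivity: a dataset may belong to only one subcollection.
        # Keying the registry on the dataset name is a simplification.
        if dataset.name in registry:
            raise ValueError(f"{dataset.name} already belongs to {registry[dataset.name]}")
        registry[dataset.name] = self.name
        self.datasets.append(dataset)

    def has_single_license(self) -> bool:
        # Simplified stand-in for "compatible licenses": all licenses identical.
        return len({d.license for d in self.datasets}) <= 1


@dataclass
class Collection:
    name: str
    subcollections: List[SubCollection] = field(default_factory=list)


registry: Dict[str, str] = {}
pi_control = Dataset(
    "piControl",
    owner="example-owner",        # hypothetical placeholder values
    license="example-license",
    data_format="netCDF",
    version="v1",
    categories={"climate"},
    granules=[Granule("pr")],
)
access10 = SubCollection("ACCESS-1.0", organization="example-organization")
access10.add(pi_control, registry)
cmip5 = Collection("CMIP5", subcollections=[access10])

print(cmip5.subcollections[0].datasets[0].granules[0].name)  # prints "pr"
```

Adding <code>pi_control</code> to a second sub-collection with the same registry raises a <code>ValueError</code>, which mirrors the exclusivity rule, whereas the <code>categories</code> set can be shared freely between datasets.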
 
====Organization of data within the data structure====
NCI has organized data collections according to this hierarchical structure both on the filesystem and within our catalogue system. Figure 1 shows how these datasets are organized. Figure 2 uses the CMIP 5 data collection as an example of the hierarchical directory structure.
 
 
[[File:Fig1 Evans Informatics2017 4-4.png|700px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="700px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 1.''' Illustration of the different levels of metadata and community standards used for each</blockquote>
|-
|}
|}
 
 
[[File:Fig2 Evans Informatics2017 4-4.jpg|550px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="550px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 2.''' Example schematic of the National Computational Infrastructure (NCI)’s data organizational structure using the Coupled Model Intercomparison Project (CMIP) 5 collection. The CMIP 5 collection housed at NCI includes three sub-collections from The Commonwealth Scientific and Industrial Research Organisation (CSIRO) and Australian Bureau of Meteorology (BOM): (1) the ACCESS-1.0 model, (2) ACCESS-1.3 model, and (3) Mk 3.6.0 model. Each sub-collection then contains a number of datasets, such as “piControl” (pre-industrial control experiment), which then contains numerous granules (e.g., precipitation, “pr”). A complete description of the range of CMIP5 contents can be found at: https://pcmdi.llnl.gov/mips/cmip5/experiment_design.html.</blockquote>
|-
|}
|}


===Data QC===
In the judicial sphere, the situation is very different. Since the end of the 1990s, developments in biomedical research have led to genetic data being used in police work and legal proceedings. Today, [[forensic science]] is omnipresent in investigations, not just in complex criminal cases but also routinely in cases of “minor” or “mass” delinquency. Genetics, which certainly receives the most media coverage among the techniques involved<ref name="BrewerMedia09">{{cite journal |title=Media Use and Public Perceptions of DNA Evidence |journal=Science Communication |author=Brewer, P.R.; Ley, B.L. |volume=32 |issue=1 |pages=93–117 |year=2009 |doi=10.1177/1075547009340343}}</ref>, has taken on considerable importance.<ref name="WilliamsGenetic08">{{cite book |title=Genetic Policing: The Uses of DNA in Police Investigations |author=Williams, R.; Johnson, P. |publisher=Willan |pages=208 |year=2008 |isbn=9781843922049}}</ref> However, although very similar techniques are used in biomedicine and police work (DNA amplification, [[sequencing]], etc.), the forms of collective management surrounding them are very different, as well as the ethico-legal frameworks and their evolution, as this text will demonstrate.
Data QC measures are intended to ensure that all datasets hosted at NCI adhere, wherever possible, to existing community standards for metadata and data. For Network Common Data Form (netCDF) (and Hierarchical Data Format v5 (HDF5)-based) file formats, these include the Climate and Forecast (CF) Convention<ref name="LLNLCFConv">{{cite web |url=http://cfconventions.org/ |title=CF Conventions and Metadata |publisher=Lawrence Livermore National Laboratory |accessdate=23 August 2017}}</ref> and the Attribute Convention for Data Discovery<ref name="ESIPAttri">{{cite web |url=http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_(ACDD) |title=Attribute Convention for Data Discovery 1-3 |publisher=Federation of Earth Science Information Partners |accessdate=23 August 2017}}</ref> (see Table 1).


{|
'''Keywords''': DNA, police, ethics, genetic technologies, criminal investigations
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="3"|'''Table 1.''' The NCI Quality Control (QC) mandatory requirements. A full list of the Attribute Convention for Data Discovery (ACDD) metadata requirements used by NCI is provided in Appendix A.
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Convention/Standard
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|NCI Requirements
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Further Information
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|CF
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Mandatory CF criteria, e.g., no “errors” result from any of the recommended compliance checkers
  | style="background-color:white; padding-left:10px; padding-right:10px;"|[http://cfconventions.org http://cfconventions.org]
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|ACDD (Modified version)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Required attributes are included within each file: 1. title, 2. summary, 3. source, 4. date_created
  | style="background-color:white; padding-left:10px; padding-right:10px;"|[http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-3 http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery_1-3]
|-
|}
|}


====Climate and forecast (CF) convention====
==Nature of the information and genetic data produced in the police sphere==
NCI requires that all geospatial datasets meet the minimum mandatory CF convention metadata criteria at the time of publication, and, where scientifically applicable, we require they meet the relevant recommended CF criteria. These requirements are detailed in the latest CF convention document provided on their website.<ref name="LLNLCFConv" />


The CF convention is the primary community standard for netCDF data, which was originally developed by the climate community and is now being adapted for other domains, e.g., marine and geosciences. It defines metadata requirements for information on each variable contained within the file as well as spatial and temporal properties of the data, so that contents are fully “self-described.” In other words, no additional companion files or external sources are required to describe how to read or utilize the data contents within the file. The metadata requirements also provide important guidelines on how to structure spatial data. This includes recommendations on the order of dimensions, the handling of gridded and non-gridded (time series, point and trajectory) data, coordinate reference system descriptions, standardized units, and cell measures (i.e., information relating to the size, shape, or location of grid cells). CF requires that all metadata information be equally readable and understandable by humans and software, which has the benefit of allowing software tools to easily display and dynamically perform associated operations.
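
As a rough illustration of what “self-described” variable-level metadata means in practice, the following Python sketch flags variables that lack descriptive attributes. It is a deliberately simplified stand-in and not one of the recommended CF compliance checkers: the function name, the choice to test only for <code>units</code> and for either <code>standard_name</code> or <code>long_name</code>, and the plain-dictionary representation of a file's variables are assumptions made for this example.

```python
def missing_variable_metadata(variables):
    """Return {variable_name: [missing items]} for variables lacking descriptive attributes.

    `variables` maps each variable name to a dict of its attributes.
    This checks only a small, assumed subset of what CF describes.
    """
    problems = {}
    for name, attrs in variables.items():
        missing = []
        if "units" not in attrs:
            missing.append("units")
        if "standard_name" not in attrs and "long_name" not in attrs:
            missing.append("standard_name or long_name")
        if missing:
            problems[name] = missing
    return problems


# Hypothetical variable metadata, as it might be read from a netCDF file.
variables = {
    "pr": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"},
    "tas": {"long_name": "Near-Surface Air Temperature"},  # no units given
}

print(missing_variable_metadata(variables))  # prints {'tas': ['units']}
```

A real QC pass would rely on a full compliance checker, since CF also covers dimension ordering, coordinate reference systems, and cell measures, none of which this sketch inspects.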
In police work in France, data produced by DNA are currently compiled and used in two different ways: first, to create files on individuals in the FNAEG or ''Fichier national automatisé des empreintes génétiques'' (national automated DNA database) and, second, in order to obtain [[information]] about perpetrators of crimes (their appearance, their origin, their kinship links to other individuals).


====Attribute Convention for Data Discovery (ACDD)====
Police use of DNA has been allowed in France since the 1998 law providing for the creation of the FNAEG. A DNA profile corresponds to a “specific individual alphanumeric combination”<ref name="CabalRapport01">{{cite book |title=Rapport sur la valeur scientifique de l'utilisation des empreintes génétiques dans le domaine judiciaire |author=Cabal, C.; Le Déaut, J.-Y.; Revol, H. |publisher=Assemblée nationale |year=2001 |isbn=2111150177}}</ref> that is the numerical encoding of analysis of DNA segments. This profile is the result of analysis of DNA fragments using genetic markers. This analysis can be carried out on a minute amount of genetic material (saliva, blood, sperm, hair, contact, etc.). It identifies the presence of sequences specific to an individual that differentiate them from any other person (with the exception of an identical twin) but that are not supposed to provide any phenotypical information (about appearance, geographical origin, or diseases).{{efn|The Order of 10 August 2015 increased the number of markers analyzed to 21; policemen and analysis laboratories had three years to comply with this new requirement.}} Such profiles therefore make individuals “identifiable in their uniqueness.”<ref name="BonniolL'ADN14">{{cite journal |title=L’ADN au service d’une nouvelle quête des ancêtres? |journal=Civilisations |author=Bonniol, J.-L.; Darlu, P. |volume=63 |pages=201–19 |year=2014 |doi=10.4000/civilisations.3747}}</ref> During investigations, DNA is collected from suspects or unidentified stains left on crime scenes or people and the results of this analysis are entered into the database. Identification through the FNAEG was originally restricted to a limited number of crimes—those of a sexual nature, as part of the law relating to the prevention and punishment of sexual crimes and the protection of minors. 
This remit has progressively been extended to include the vast majority of crimes and offences{{efn|Act n°98-468 of 17 June 1998 relative to the punishment of sexual crimes and the protection of minors introduced article 706-54 into the Code of Criminal Procedure making provision for the creation of an automated national database to centralize the DNA profiles of persons convicted of offences of a sexual nature. The remit of the database was then extended on several occasions. In 2001, it included serious crimes against persons. In 2003, the law on internal security extended it to persons convicted of or implicated in crimes and offences against persons or property.}}, leading to the routine use of DNA in investigations.{{efn|Collecting DNA samples in investigations is now the rule. An ''ad hoc'' body of staff has been trained over the past 15 years that almost systematically processes crime scenes.}} As a result of this evolution, there has been a substantial increase in the number of persons with files in the FNAEG, more than three million as of late 2015.{{efn|This figure was provided to the French Parliament by the Ministry of the Interior following a question by parliamentarian Sergio Coronado (member of the “Ecologist” parliamentary group) (http://questions.assemblee-nationale.fr/q14/14-79728QE.htm).}}
The ACDD is another common standard for netCDF data that complements the CF convention requirements.<ref name="ESIPAttri" /> The ACDD primarily governs metadata information written at the file-level (i.e., netCDF global attributes), while the CF convention pertains mainly to variable-level metadata and structure information. Therefore, when combined these two standards help to fully describe both the higher-level metadata relevant to the entire file (e.g., dataset title, custodian, date created, etc.) and the lower-level information about each individual variable or dimension (e.g., name, units, bounds, fill values, etc.). ACDD also provides the ability to link to even higher levels such as the dataset parent and grandparent ISO 19115 metadata entries.


NCI has applied this convention, along with CF, as summarized in Table 1 as part of our data QC. As the ACDD has no “required” fields in its current specification, NCI has applied a modified version that requires all published datasets meet the minimum of four required ACDD catalogue metadata fields at the time of publication. These are “title,” “summary,” “source,” and “date_created” and have been ranked as “required” to aid with NCI’s data services and data discovery. A complete list of ACDD metadata attributes and NCI requirements is available in Appendix A.
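
A check for the four attributes that NCI ranks as required could look like the minimal Python sketch below. The function name and the dictionary of global attributes are hypothetical; in practice the dictionary might be built from a file's global attributes (for example with <code>ncattrs()</code> and <code>getncattr()</code> in the netCDF4 Python package), and treating an empty or whitespace-only value as missing is an assumption of this example rather than a stated NCI rule.

```python
# The four ACDD global attributes that NCI's modified convention ranks as required.
REQUIRED_ACDD_ATTRIBUTES = ("title", "summary", "source", "date_created")


def missing_required_attributes(global_attrs):
    """Return the required attributes that are absent or empty in `global_attrs`."""
    return [
        key
        for key in REQUIRED_ACDD_ATTRIBUTES
        if not str(global_attrs.get(key, "")).strip()
    ]


# Hypothetical file-level (global) attributes for one netCDF file.
global_attrs = {
    "title": "Example precipitation dataset",
    "summary": "Illustrative record used to demonstrate the check.",
    "source": "example model output",
}

print(missing_required_attributes(global_attrs))  # prints ['date_created']
```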
New techniques have also emerged in recent years. It is now possible to obtain indications about an individual's physical appearance based on a sample of his/her DNA<ref name="KayserImproving11">{{cite journal |title=Improving human forensics through advances in genetics, genomics and molecular biology |journal=Nature Reviews Genetics |author=Kayser, M.; de Knijff, P. |volume=12 |issue=3 |pages=179–92 |year=2011 |doi=10.1038/nrg2952 |pmid=21331090}}</ref><ref name="KayserForensic15">{{cite journal |title=Forensic DNA Phenotyping: Predicting human appearance from crime scene material for investigative purposes |journal=Forensic Science International Genetics |author=Kayser, M. |volume=18 |pages=33–48 |year=2015 |doi=10.1016/j.fsigen.2015.02.003 |pmid=25716572}}</ref>: the analyses in question provide statistical information on eye, hair, and skin color, etc. These techniques are more exploratory and aim not to match DNA with an identity by comparison but to determine the characteristics of the perpetrator of a crime. These data result from [[Data analysis|analysis]] of several dozen DNA markers that, unlike the FNAEG's data, are selected deliberately so that they can provide information about a person's physical appearance. They are therefore aimed at “generating a suspect”<ref name="M'charekBeyond13">{{cite journal |title=Beyond Fact or Fiction: On the Materiality of Race in Practice |journal=Cultural Anthropology |author=M'charek, A. |volume=28 |issue=3 |pages=420–42 |year=2013 |doi=10.1111/cuan.12012}}</ref>, but because the information about this person's features is incomplete (e.g., a person with blue eyes, fair skin, light brown hair, and of European “bio-geographical” ancestry), they define “target populations of interest” to guide police investigations.<ref name="CaliebePredictive18">{{cite journal |title=Predictive values in Forensic DNA Phenotyping are not necessarily prevalence-dependent |journal=FSI Genetics |author=Caliebe, A.; Krawczak, M.; Kayser, M. 
|volume=33 |pages=e7–e8 |year=2018 |doi=10.1016/j.fsigen.2017.11.006}}</ref> Several private and public laboratories in France now produce what professionals often refer to as “DNA photofits”; it is estimated that several dozen such analyses have been carried out since 2014 as part of investigations.


===Benchmarking methodology===
==How is this framed legally, politically, and ethically?==
Any reference datasets made available on NCI must be well organized and accessible in a form suitable for the known class of users. Datasets also need to be more broadly available to other users from different domains, with the expectation that the collection will continue to have long-term and enduring value not just to the research community but also to others (e.g., government, general public, industry). To ensure that these expectations are clearly understood across the range of use-cases and environments, NCI has adopted a benchmarking methodology as part of their DQS process. Benchmarks register their functionality and performance, which helps to define expectations around data accessibility and provide an effective, defined measure of usability.
The legal framework surrounding how the police and justice system use DNA analysis was devised to follow the creation of the FNAEG. For this reason, and in order to defuse fears and criticisms, the law only allows analyses using “non-coding” DNA so as to meet the initial objective of allowing identification without providing information about individuals. French law only allows the police to use DNA for identification purposes “within the framework of investigative measures or the preparation of a case during a judicial proceeding,”{{efn|Art. 16.11 of the Civil Code}} in cases of missing persons{{efn|Art. 26, Domestic Security Guidance and Planning Act n° 95-73 of 21 January 1995}}, or, more recently, in the context of familial searches to allow “searches for persons directly related to [an] unknown person” who has left a stain at a crime scene (i.e., without determining phenotype).{{efn|This possibility was written into law in 2016 in article 796-56-1-1 of Act n° 2016-731 of 3 June 2016 strengthening provisions for the fight against organized crime, terrorism, and their financing, and improving the efficiency and guarantees of the criminal procedure.}}


To substantiate this, NCI works with both the data producers and the users to establish benchmarks for specific areas, which are then included as part of the registry of data QA measures. These tests are then verified by both NCI and by wider community representatives to ensure that the benchmark is appropriate for the requested access. The benchmark methodology also provides a way to systematically consider how current users will be affected when considering any future developments or evolution in technology, standards, or reorganization of data. The benchmark cases then substantiate the original intention, and they can be reviewed against any subsequent changes. For example, benchmark cases that were previously specified to use data in a particular format may have been updated to use an alternative, more acceptable format that is better for use in high-performance environments or improves accessibility across multiple domains. The original benchmark cases can then be re-evaluated against both the functionality and performance required to assess how to make such a transformation. Further, if there are any upgrades or changes to the production services, the benchmark cases are used to perform prerelease tests on the data servers before implementing the changes into production.
Concerning the so-called “DNA Photofit” technique, in June 2014, France's highest court, the Court of Cassation, ruled admissible an expert report charged with providing “all useful elements relating to the suspect's visible morphological characteristics” based on stains collected after a rape in an investigation into a series of sexual assaults in Lyon between October 2012 and January 2014. The Court of Cassation's authorization of this practice in DNA analysis was the first in France. For judges and prosecutors, there is now a legal precedent allowing them to authorize “DNA Photofits” when they consider this could help an investigation.


The benchmarks consist of explicit current examples using tools, libraries, services, packages, software, and processes that are executed at NCI. These benchmarks explore the required access and identify supporting standards that are critical to the utility of the service, whether access be through the filesystem or by API protocols provided by NCI data services. Where benchmarks are shown to be beyond the capability of the current data service, the benchmark case will be recorded for future application.
In legal terms, the emergence of new technical possibilities and their practical use creates conflicting and parallel regimes. On the one hand, “DNA Photofits” do not correspond to the legal frameworks devised in the 1990s. The technique does not provide identification, per se, but is rather an “assistance to the investigation,” as it uses coding DNA. On the other hand, as science evolves, the law is falling out of step with the technical and scientific reality. New knowledge shows that some of the markers used by the FNAEG may in fact allow further information to be obtained about people regarding their predisposition to certain diseases, their genetic pathologies, and their “ethnic origin” (by continent or sub-continent).{{efn|For example, according to a study by the Telethon Institute of Genetics and Medicine, D2S1388, one of the markers used by the FNAEG, plays a determining role in the transmission of pseudohyperkalaemia, a rare genetic disease.<ref name="CarellaASecond04">{{cite journal |title=A second locus mapping to 2q35-36 for familial pseudohyperkalaemia |journal=European Journal of Human Genetics |author=Carella, M.; d'Adamo, A.P.; Grootenboer-Mignot, S. et al. |volume=12 |issue=12 |pages=1073–6 |year=2004 |doi=10.1038/sj.ejhg.5201280}}</ref> In 2011, a publication by Chinese researchers highlighted the association between marker D21S11-28.2 and coronary heart disease.<ref name="HuiNovel11">{{cite journal |title=Novel association analysis between 9 short tandem repeat loci polymorphisms and coronary heart disease based on a cross-validation design |journal=Atherosclerosis |author=Hui, L.; Jing, Y.; Rui, M.; Weijian, Y. 
|volume=218 |issue=1 |pages=151–5 |year=2011 |doi=10.1016/j.atherosclerosis.2011.05.024 |pmid=21703622}}</ref> A team of Portuguese researchers<ref name="PereiraPop11">{{cite journal |title=PopAffiliator: online calculator for individual affiliation to a major population group based on 17 autosomal short tandem repeat genotype profile |journal=International Journal of Legal Medicine |author=Pereira, L.; Alshamali, F.; Andreassen, R. et al. |volume=125 |issue=5 |pages=629–36 |year=2011 |doi=10.1007/s00414-010-0472-2 |pmid=20552217}}</ref> has developed an online calculator capable of correlating certain markers used in the FNAEG's DNA samples with individual affiliation to population groups (Sub-Saharan Africa, Eurasia, East Asia, North Africa, Near East, North America, South America, and Central America).}} Moreover, whereas at the FNAEG's inception it was considered unacceptable for the police to use medical information, certain professionals in the police or justice system now recognize that this information (whether genetic or not) can be useful in investigations (providing information about wanted persons' need for medication or healthcare, or about their physical appearance, etc.). Although there are no changes in the legal framework on this matter, the idea is spreading and the red line is, to some extent, and for some of the professionals, fading.


Furthermore, the results of the testing of each benchmark are reviewed with the data producer in light of any issues raised. This may require action by the user to revise the access pattern and/or by the data producer to modify the data to ensure that the reliability of NCI’s production service is not compromised. Alternatively, NCI may be able to provide a temporary separate service to accommodate some aspects of the usage pattern. For example, the data might be released via a modified server that can address shortcomings of a specific benchmark case but would not be applicable generally. This may be a short-term measure until a better server solution is found, or it may address current local issues on either the data or client application side.
It is thus obvious that police uses of DNA data providing information about individuals' characteristics raise novel politico-ethical issues.<ref name="M'charekSilent08">{{cite journal |title=Silent witness, articulate collective: DNA evidence and the inference of visible traits |journal=Bioethics |author=M'charek, A. |volume=22 |issue=9 |pages=519-28 |year=2008 |doi=10.1111/j.1467-8519.2008.00699.x |pmid=18959734}}</ref><ref name="MacLeanForensic14">{{cite journal |title=Forensic DNA phenotyping in criminal investigations and criminal courts: Assessing and mitigating the dilemmas inherent in the science |journal=Recent Advances in DNA and Gene Sequences |author=MacLean, C.E.; Lamparello, A. |volume=8 |issue=2 |pages=104-12 |year=2014 |pmid=25687339}}</ref> In particular, they bring into play the issue of what constitutes private data<ref name="ToomApproaching16">{{cite journal |title=Approaching ethical, legal and social issues of emerging forensic DNA phenotyping (FDP) technologies comprehensively: Reply to 'Forensic DNA phenotyping: Predicting human appearance from crime scene material for investigative purposes' by Manfred Kayser |journal=Forensic Science International Genetics |author=Toom, V.; Wienroth, M.; M'charek, A. et al. |volume=22 |pages=e1–e4 |year=2016 |doi=10.1016/j.fsigen.2016.01.010 |pmid=26832996}}</ref>—for certain geneticists, where “DNA Photofits” are concerned, externally visible characteristics do not fall into this category because they are visible.<ref name="KayserForensic15" /> Generally, as stated by some professionals during interviews, the question is “to know until where to go. And where to stop.” Regarding the FNAEG and French law, in a case heard in June 2017, the European Court of Human Rights (ECHR) ruled that “interference with the applicant's right to respect for his private life had been disproportionate.”{{efn|Case of Aycaguer V. 
France, 22 June 2017, 8806/12, ECHR, Court (Fifth Section)}} The ECHR judgment ruled against France and underscored that French law regarding DNA data storage should be differentiated “according to the nature and seriousness of the offence committed.”{{efn|See legal summary, available at [https://goo.gl/FcyuUM https://hudoc.echr.coe.int/eng#{%22itemid%22:[%22002-11703%22]} }}


===Data QA===
In Germany, a contradictory dialogue between experts took place regarding Forensic DNA Phenotyping revealing public and political debate on the matter.<ref name="BuchananForensic18">{{cite journal |title=Forensic DNA phenotyping legislation cannot be based on “Ideal FDP”—A response to Caliebe, Krawczak and Kayser (2017) |journal=FSI Genetics |author=Buchanan, N.; Staubach, F.; Wienroth, M. et al. |volume=34 |pages=e13–e14 |year=2018 |doi=10.1016/j.fsigen.2018.01.009}}</ref> In France, despite the stakes involved and the spread of new usages of DNA techniques, no public debate has emerged in recent years concerning new uses of DNA in police work. In 2008, a private analysis [[laboratory]] offering indicative geo-genetic tests (''tests d'origine géo-génétique'' or TOGG) providing information about individuals' origin based on their DNA sparked a media debate that complicated the issue<ref name="VaillyThePolitics17">{{cite journal |title=The politics of suspects’ geo-genetic origin in France: The conditions, expression, and effects of problematisation |journal=BioSocieties |author=Vailly, J. |volume=12 |issue=1 |pages=66–88 |year=2017 |doi=10.1057/s41292-016-0028-x}}</ref>; however, the controversy soon died down. A few years later, Ministry of Justice instructions to judges and prosecutors discouraged the use of this technique, with no further debate. Since then, although the Court of Cassation's 2014 decision opened up the possibility of using an unprecedented practice, this has not generated any public debate or controversy.  
To ensure that the data is usable across a range of use-cases and environments, the QA approach uses benchmarks for testing data located on the local filesystem, as well as remotely via the data service endpoints. The QA process is designed to verify that software and libraries used are functioning properly with the most commonly used tools in the community.


The following is a list of data services that are available under NCI’s Unidata Thematic Real-time Environmental Distributed Data Services (THREDDS):
“DNA Photofits” have received some media coverage{{efn|A search conducted on the press database Europresse for the period 2010 to 2018 brought up around 70 pieces published mentioning the terms “DNA Photofits” or “Genetic photofits”.}}, but this has mainly been to underscore the technical process involved, echoing the fiction conveyed by television series that have made the use of genetic techniques in criminal investigations seem commonplace and particularly efficient. Our sociological fieldwork has revealed, however, that there was organized debate among judges and prosecutors between 2013 and 2014. At the time, the investigating judge who had for the first time ordered the analysis of the suspect's visible morphological characteristics referred the case to the examining chamber himself, to obtain a verdict on whether the expert report he had requested was legal. Although the examining chamber approved the report, the public prosecutor brought the issue before the Court of Cassation—the highest legal authority in France—in order to ensure the final nature of the decision. The Court of Cassation ruled that a judge could have recourse to such analyses. Following this verdict, several bodies consulted by the Ministry of Justice{{efn|These bodies were the Commission nationale consultative des droits de l'homme (CNCDH – National consultative committee on human rights) and the approval committee for people authorized to conduct identification procedures using DNA profiles in the context of legal proceedings or the extrajudicial procedure for identifying deceased persons.}} provided opinions underscoring the need for this technique to be written into and regulated by the law. This has not been implemented to date. 
After being authorized for several years under a temporary protocol, familial searches allowing “genetic proximity testing”<ref name="PrainsackGenetic10">{{cite book |chapter=Chapter 2: Key issues in DNA profiling and databasing: Implications for governance |title=Genetic Suspects: Global Governance of Forensic DNA Profiling and Databasing |author=Prainsack, B. |editor=Hindmarsh, R.; Prainsack, B. |publisher=Cambridge University Press |pages=15–39 |year=2010 |isbn=9780521519434}}</ref> were written into law in 2016. However, the Court of Cassation's judgment on DNA analysis to provide “all useful elements relating to a suspect's visible morphological characteristics” has not been brought up for parliamentary debate to be included in the law. There has been no political management of the question at the state level, nor has the issue been included in the general debate organized by the National Consultative Council of Ethics (Comité Consultatif National d'Ethique) in 2018 regarding the revision of laws on bio-ethics.


* Open-source Project for a Network Data Access Protocol (OPeNDAP): a protocol enabling data access and subsetting through the web;
==Conclusion==
* NetCDF Subset Service (NCSS): web service for subsetting files that can be read by the netCDF java library;
The use of these new technological and scientific techniques plays a significant role in guiding how we engage with the world<ref name="WilliamsSocial17">{{cite journal |title=Social and ethical aspects of forensic genetics: A critical review |journal=Forensic Science Review |author=Williams, R.; Wienroth, M. |volume=29 |issue=2 |pages=145–69 |year=2017 |pmid=28691916}}</ref>, just as it redefines the production of identity translated into information<ref name="AasTheBody06">{{cite journal |title=‘The body does not lie’: Identity, risk and trust in technoculture |journal=Crime, Media, Culture: An International Journal |author=Aas, K.F. |volume=2 |issue=2 |pages=143-158 |year=2006 |doi=10.1177/1741659006065401}}</ref> and structures the way sensitive information about individuals is used and circulated. Despite these stakes, and the initial caution that surrounded the creation of the national automated DNA database, it has not gone hand-in-hand with collective political and ethical debate. This raises questions about the conditions for the existence or for the absence of political controversies that call for further sociological investigations about the framing of the issue and the social and political logic at play.
* WMS: OGC web service for requesting raster images of data;
* WCS: OGC web service for requesting data in some output format;
* Godiva2 Data Viewer: tool for simple visualization of data; and
* HTTP File Download: for direct downloading of data.


The data is tested through each of the required services as part of the QA process, with the basic usability functionality tests applied to each service as shown in Table 2. Should an issue be discovered during these functionality tests, the issue is investigated further. This may lead to additional modifications of the data so as to pass the functionality or performance requirements, and in doing so requires further communication with the data producer to ensure that such changes are acceptable and can be corrected in any future data production process. More detailed functionality can also be recorded for scientific use around the data. Such tests tend to be specific for the data use-case but follow the same methodology as that described here.
As the uses of these techniques are developing in police practices, this absence of collective management of the issue leaves professionals to rely on forms of local arbitration. Our fieldwork has shown that they are aware that these practices raise issues and therefore devise ethical frameworks for their own use of DNA. As a consequence, in this field, as is the case in others, ethical issues are addressed in a fragmented manner as endogenous ethical frameworks are “cobbled together” by professionals as a function of their practices and needs. Each institution, laboratory, and in some cases each individual, is crafting a frame and a perimeter of limits to what can be done according to their understanding and appreciation of the legal setting, the practical utility of actions, and the ethical constraints perceived.


 
The ECHR's recent ruling against France regarding the FNAEG may force lawmakers to reach a verdict on this issue, thereby triggering what seems like necessary public debate on forensic use of DNA. The new possibilities provided by genetic technologies point to the need for promoting dialogue among the various professionals using this technology in police work (forensic teams and geneticists working with them, police investigators, private laboratories, prosecutors, judges, etc.), but also with healthcare professionals—who already have experience of the institutionalized management of ethical considerations relating to their practices in genetics—and, more broadly, in society as a whole.
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="2"|'''Table 2.''' Description of basic accessibility and functionality tests that are applied for commonly used tools as part of NCI’s QA tests
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Test
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Measures of Success
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|netCDF C-Library
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Using the <tt><nowiki>ncdump -h <file></nowiki></tt> function from command line, the file is readable and displays the file header information about the file dimensions, variables, and metadata.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|GDAL
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Using the <tt><nowiki>gdalinfo <file></nowiki></tt> function from command line, the file is readable and displays the file header information about the file dimensions, variables, and metadata.<br />Using the <tt><nowiki>gdalinfo NETCDF:<file>:<subdataset></nowiki></tt> function from command line, the subdatasets are readable and corresponding metadata for each subdataset is displayed.<br />The <tt>Open</tt> and <tt>GetMetadata</tt> functions return non-empty values that correspond to the netCDF file contents.<br />The <tt>GetProjection</tt> function (of the appropriate file or subdataset) returns a non-empty result corresponding to the data coordinate reference system information.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|NCO (NetCDF Operators)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Using the <tt><nowiki>ncks -m <file></nowiki></tt> function from command line, the file is readable and displays file metadata.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|CDO (Climate Data Operators)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Using the <tt><nowiki>cdo sinfon <file></nowiki></tt> function from command line, the file is readable and displays information on the included variables, grids, and coordinates.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Ferret
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Using <tt><nowiki>SET DATA “<file>”</nowiki></tt> followed by <tt>SHOW DATA</tt> displays information on file contents.<br /> Using <tt><nowiki>SET DATA “<file>”</nowiki></tt> followed by <tt><nowiki>SHADE <variable></nowiki></tt> (or another plotting command) produces a plot of the requested data.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Thredds Data Server
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Dataset index catalog page loads without timeout and within reasonable time expectations (<10 s).
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Thredds Data Service Endpoints
  | style="background-color:white; padding-left:10px; padding-right:10px;"|'''HTTP Download''': File download commences when selecting the HTTPServer option from the THREDDS catalog page for the file.<br />'''OPeNDAP''': When selecting OPeNDAP from the THREDDS catalog page for the file, the OPeNDAP Dataset Access Form page loads without error. From the OPeNDAP Dataset Access Form page, a data subset is returned in ASCII format after selecting data and clicking the Get ASCII option at the top of the page.<br />'''Godiva2''': When selecting the Godiva2 viewer option from the THREDDS catalog page for the file, the viewer displays the file contents.<br />'''WMS''': When selecting the WMS option from the THREDDS catalog page for the file, the web browser displays the GetCapabilities information in XML format. After constructing a GetMap request, the web browser displays the corresponding map.<br />'''WCS''': When selecting the WCS option from the THREDDS catalog page for the file, the web browser displays the GetCapabilities information in XML format. After constructing a GetCoverage request, file download of coverage commences.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Panoply
  | style="background-color:white; padding-left:10px; padding-right:10px;"|From the File → Open menu, the file can be opened. File contents and metadata displayed.<br />Using Create Plot for a selected variable, data is displayed correctly in new plot window.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|QGIS
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Using the Add WMS/WMTS menu option, QGIS can request GetCapabilities and/or GetMap operations, and the layer is visible.<br />The ncWMS GetCapabilities URL accepts and adds the NCI THREDDS Server, the request displays the available layers to select from, and a selected layer displays according to user expectations.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|NASA Web WorldWind
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The ncWMS GetCapabilities URL accepts and adds the NCI THREDDS Server, the request displays the available layers to select from, and a selected layer displays according to user expectations.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|PYTHON cdms2
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The file can be opened by the <tt>Open</tt> function.<br />File metadata is displayed using <tt>Attributes</tt> function.<br />File data contents are displayed when using <tt>Variables</tt> function.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|PYTHON netCDF4
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The file can be opened by the <tt>Dataset</tt> function.<br />File metadata is displayed using <tt>ncattrs</tt> object.<br />File data contents are displayed using <tt>variables</tt> (and/or <tt>groups</tt>) objects.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|PYTHON h5py
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The netCDF file can be opened by the <tt>File</tt> function.<br />The metadata and variables are displayed by the <tt>keys</tt> and <tt>attrs</tt> objects.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|ParaView
  | style="background-color:white; padding-left:10px; padding-right:10px;"|From the File → Open menu, the file can be opened and displayed as a layer in the Pipeline Browser. Enabling layer visibility results in data displaying in the Layout window.
|-
|}
|}
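The WMS rows of Table 2 mention constructing GetCapabilities and GetMap requests. The sketch below shows one way such request URLs could be assembled from the standard WMS 1.3.0 query parameters. The endpoint URL, layer name, and bounding box are hypothetical placeholders, not actual NCI values.

```python
from urllib.parse import urlencode


def wms_url(endpoint, request, **params):
    """Build a WMS 1.3.0 request URL from key-value query parameters."""
    query = {"service": "WMS", "version": "1.3.0", "request": request}
    query.update(params)
    return endpoint + "?" + urlencode(query)


# Hypothetical THREDDS WMS endpoint for a single file.
endpoint = "https://thredds.example.org/thredds/wms/path/to/file.nc"

capabilities_url = wms_url(endpoint, "GetCapabilities")

# In WMS 1.3.0, EPSG:4326 bounding boxes are ordered
# min latitude, min longitude, max latitude, max longitude.
map_url = wms_url(
    endpoint,
    "GetMap",
    layers="variable_name",  # hypothetical layer name
    styles="",
    crs="EPSG:4326",
    bbox="-45,110,-10,155",
    width=512,
    height=512,
    format="image/png",
)

print(capabilities_url)
print(map_url)
```

Pasting the resulting URLs into a browser corresponds to the manual test described in the table: the first should return the capabilities document in XML, the second a rendered map image.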
 
==Examples of tests and reports undertaken on NCI datasets prior to publication==
===Metadata QC checker reports===
To assess the CF and ACDD compliance, NCI runs a QC checker prior to data publication and works with the data producer to rectify problems. The NCI checker is based on the U.S. Integrated Ocean Observing System (IOOS) Compliance Checker<ref name="IOOSCompliance">{{cite web |url=https://github.com/ioos/compliance-checker |title=ioos/compliance-checker |publisher=GitHub |accessdate=22 November 2017}}</ref> but has been modified to include additional checks relevant to NCI’s data services as well as the modified ACDD convention. Appendix B shows an example QC checker report (Figure A1) with metadata that is 100% compliant with NCI’s requirements. In practice, the process usually needs to be run several times: the datasets are checked, feedback is given, and the check is re-run, with a timestamp recorded for each version to keep a record of metadata update provenance. The reports are shared with the data producers with comments and additional feedback provided in the “high/medium/low-priority suggestions” section at the end of the report, depending on the potential impact of non-compliance.
 
Due to the large number of data files that can be involved, NCI’s QC checker has been modified to enable parallelization so that multiple processes can be run simultaneously, thus increasing performance of the checking process. For instance, it takes less than a minute to check hundreds of files, and about 10 minutes for tens of thousands. For the largest datasets, the QC checker can typically run on more than one million files at a time.
 
The QC checker also helps to find corrupted or temporary files, which can be easily overlooked or not detected by the data producers, especially during a batch production process.
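The sketch below illustrates the general idea of running a per-file check over many files concurrently while flagging leftover temporary files and files whose header is not recognizable as netCDF. It is not NCI's checker: the thread pool, the temporary-file suffixes, and the function names are assumptions made for illustration. The header test relies only on the published netCDF and HDF5 file signatures, and it covers the common case where the signature sits at the start of the file.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# First four bytes of classic, 64-bit offset, and CDF-5 netCDF files,
# plus the start of the HDF5 signature used by netCDF-4 files.
NETCDF_MAGIC = (b"CDF\x01", b"CDF\x02", b"CDF\x05", b"\x89HDF")
TEMP_SUFFIXES = (".tmp", ".part")  # assumed naming of leftover files


def check_file(path):
    """Classify one file as 'temporary', 'bad header', or 'ok'."""
    if os.path.basename(path).endswith(TEMP_SUFFIXES):
        return path, "temporary"
    with open(path, "rb") as handle:
        header = handle.read(4)
    status = "ok" if header in NETCDF_MAGIC else "bad header"
    return path, status


def check_files(paths, workers=4):
    """Run check_file over many paths concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_file, paths))


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as folder:
    samples = {"good.nc": b"CDF\x01data", "empty.nc": b"", "run.nc.tmp": b"CDF\x01"}
    paths = []
    for name, content in samples.items():
        path = os.path.join(folder, name)
        with open(path, "wb") as handle:
            handle.write(content)
        paths.append(path)
    for path, status in check_files(paths).items():
        print(os.path.basename(path), status)
```

A zero-byte file produced by an interrupted batch job fails the header test, which is the kind of problem the paragraph above says is easily overlooked.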
 
===Functionality test QA reports===
Appendix B provides an example report (Figure A2) of the QA results from checking three data files when accessed directly on the filesystem and their service endpoints for access via THREDDS. The functionality test shows that the variable structure within the data of two files (2 GB and 4 GB) is too large to load the files into several commonly used data viewers, such as ncview (v2.1.1) and Panoply (v4.5.1), and they have similar issues on opening files through the service endpoints. In this case, our advice for mitigation is to reduce the requested size of the image by using a lower resolution or to work ''in situ'' with this particular data file, as recorded in the comments of Figure A2, sections b and c.
 
===Benchmarking use cases===
In the benchmark tests, several popular tools and APIs are run to evaluate their elapsed time on accessing data either residing on the local filesystem or being accessed via data services. The test files in the example NCI functionality QA test report (Figure A2) are used in the benchmark tests, and their data structures are listed in Table 3. We access the 2D variable in each file, which is recorded at (lat, lon), chunked at (128, 128), and deflated at level 2.
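To give a sense of scale for these benchmark files, the short calculation below estimates how many (128, 128) chunks make up the 2D variable in each file, using the lat and lon sizes listed in Table 3. It is a back-of-the-envelope sketch only; the function name is hypothetical, and chunk count is just one of several factors that affect elapsed read time.

```python
import math


def chunk_count(shape, chunks):
    """Number of chunks needed to tile an array, counting partial edge chunks."""
    return math.prod(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))


# (lat, lon) sizes as listed in Table 3.
files = {
    "File 1": (4182, 5717),
    "File 2": (41882, 59501),
    "File 3": (34761, 40954),
}

for name, shape in files.items():
    print(name, chunk_count(shape, (128, 128)))
```

Reading the whole variable means decompressing every one of these chunks, whereas a small spatial subset touches only the chunks it overlaps.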
 
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="5"|'''Table 3.''' Data structure of the sample files used in the benchmark tests
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="2"|Attributes
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|File 1
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|File 2
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|File 3
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|lon (double)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Size
  | style="background-color:white; padding-left:10px; padding-right:10px;"|5717
  | style="background-color:white; padding-left:10px; padding-right:10px;"|59501
  | style="background-color:white; padding-left:10px; padding-right:10px;"|40954
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Chunksize
  | style="background-color:white; padding-left:10px; padding-right:10px;"|128
  | style="background-color:white; padding-left:10px; padding-right:10px;"|128
  | style="background-color:white; padding-left:10px; padding-right:10px;"|128
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|lat (double)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Size
  | style="background-color:white; padding-left:10px; padding-right:10px;"|4182
  | style="background-color:white; padding-left:10px; padding-right:10px;"|41882
  | style="background-color:white; padding-left:10px; padding-right:10px;"|34761
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Chunksize
  | style="background-color:white; padding-left:10px; padding-right:10px;"|128
  | style="background-color:white; padding-left:10px; padding-right:10px;"|128
  | style="background-color:white; padding-left:10px; padding-right:10px;"|128
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="3"|Variable (float)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Name
  | style="background-color:white; padding-left:10px; padding-right:10px;"|grav_ir_anomaly
  | style="background-color:white; padding-left:10px; padding-right:10px;"|mag_tmi_rtp_anomaly
  | style="background-color:white; padding-left:10px; padding-right:10px;"|rad_air_dose_rate
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Size
  | style="background-color:white; padding-left:10px; padding-right:10px;"|(4182,5717)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|(41882,59501)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|(34761,40954)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Chunksize
  | style="background-color:white; padding-left:10px; padding-right:10px;"|(128,128)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|(128,128)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|(128,128)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Deflate Level
  | style="background-color:white; padding-left:10px; padding-right:10px;"|
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Format
  | style="background-color:white; padding-left:10px; padding-right:10px;"|
  | style="background-color:white; padding-left:10px; padding-right:10px;"|netCDF-4 classic model
  | style="background-color:white; padding-left:10px; padding-right:10px;"|netCDF-4 classic model
  | style="background-color:white; padding-left:10px; padding-right:10px;"|netCDF-4 classic model
|-
|}
|}
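
To give a sense of scale for the structures in Table 3, the following sketch derives the number of values, the uncompressed size, and the number of (128,128) chunks for each variable. The shapes and chunk size are taken from Table 3; the 4-byte item size assumes the netCDF "float" type is a 32-bit float, and the function and variable names are illustrative.

```python
import math

FLOAT_BYTES = 4      # assumption: netCDF "float" is a 32-bit (4-byte) value
CHUNK = (128, 128)   # chunk shape from Table 3

# (lat, lon) variable shapes from Table 3
SHAPES = {
    "File 1": (4182, 5717),
    "File 2": (41882, 59501),
    "File 3": (34761, 40954),
}


def layout(shape, chunk=CHUNK, itemsize=FLOAT_BYTES):
    """Return value count, uncompressed byte size and chunk count for a 2D variable."""
    n_values = shape[0] * shape[1]
    # Edge chunks are partially filled, so round up along each dimension.
    n_chunks = math.ceil(shape[0] / chunk[0]) * math.ceil(shape[1] / chunk[1])
    return {"values": n_values, "raw_bytes": n_values * itemsize, "chunks": n_chunks}


for name, shape in SHAPES.items():
    info = layout(shape)
    print(f"{name}: {info['values']:,} values, "
          f"{info['raw_bytes'] / 1e9:.2f} GB uncompressed, "
          f"{info['chunks']:,} chunks")
```

Under these assumptions, the File 2 variable holds about 2.5 billion values (close to 10 GB before deflation) spread over roughly 150,000 chunks, which helps explain why tools that try to load or convert the whole variable at once struggle.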
 
The elapsed times of the benchmark tests are listed in Table 4. Utilities such as ncdump and h5dump dump the contents of netCDF files into an ASCII representation. They are frequently used in the functionality test of the QA report to fetch the metadata of the netCDF files. In the performance benchmarking tests, we measure the elapsed time to dump the whole variable as human-readable ASCII text. This performance depends on the internal data organization (contiguous or chunked storage, deflation, shuffling, etc.) and involves numerous type conversion operations. Such conversions can incur a heavy overhead during the dump process, so dumping a large file can take a very long time.
 
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="100%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="5"|'''Table 4.''' Benchmark results (in sec.)
|-
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Program/Service
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Test
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|File 1
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|File 2
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|File 3
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="2"|NetCDF Utilities
  | style="background-color:white; padding-left:10px; padding-right:10px;"|ncdump
  | style="background-color:white; padding-left:10px; padding-right:10px;"|8.630
  | style="background-color:white; padding-left:10px; padding-right:10px;"|5584.414
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3246.879
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|h5dump
  | style="background-color:white; padding-left:10px; padding-right:10px;"|40.547
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3546.999
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2373.483
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="3"|Python (2.7.x) netCDF APIs
  | style="background-color:white; padding-left:10px; padding-right:10px;"|netCDF4-python (1.2.7)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.445
  | style="background-color:white; padding-left:10px; padding-right:10px;"|48.603
  | style="background-color:white; padding-left:10px; padding-right:10px;"|29.160
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|GDAL-python (1.11.1)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.421
  | style="background-color:white; padding-left:10px; padding-right:10px;"|42.654
  | style="background-color:white; padding-left:10px; padding-right:10px;"|25.538
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|h5py (v2.6.0)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|0.356
  | style="background-color:white; padding-left:10px; padding-right:10px;"|40.105
  | style="background-color:white; padding-left:10px; padding-right:10px;"|23.826
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" rowspan="3"|THREDDS Data Server (TDS)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|netCDF4-python (1.2.7)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3.087
  | style="background-color:white; padding-left:10px; padding-right:10px;"|282.797
  | style="background-color:white; padding-left:10px; padding-right:10px;"|185.358
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|OPeNDAP (TDS v4.6.6)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|3.038
  | style="background-color:white; padding-left:10px; padding-right:10px;"|277.21
  | style="background-color:white; padding-left:10px; padding-right:10px;"|194.85
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|netCDF Subset Service (TDS v4.6.6)
  | style="background-color:white; padding-left:10px; padding-right:10px;"|2.833
  | style="background-color:white; padding-left:10px; padding-right:10px;"|248.194
  | style="background-color:white; padding-left:10px; padding-right:10px;"|158.236
|-
|}
|}
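
The timings in Table 4 are easier to compare as ratios. The sketch below copies the elapsed times from Table 4 and computes how many times slower one access method is than another; the dictionary keys and the <code>slowdown</code> helper are illustrative names, not part of any tool used in the tests.

```python
# Elapsed times in seconds, copied from Table 4 as (File 1, File 2, File 3).
TIMES = {
    "ncdump": (8.630, 5584.414, 3246.879),
    "h5dump": (40.547, 3546.999, 2373.483),
    "netCDF4-python (local)": (0.445, 48.603, 29.160),
    "GDAL-python (local)": (0.421, 42.654, 25.538),
    "h5py (local)": (0.356, 40.105, 23.826),
    "netCDF4-python (TDS)": (3.087, 282.797, 185.358),
    "OPeNDAP (TDS)": (3.038, 277.21, 194.85),
    "NCSS (TDS)": (2.833, 248.194, 158.236),
}


def slowdown(slow, fast):
    """Return per-file ratios of elapsed time (slow method / fast method)."""
    return [s / f for s, f in zip(TIMES[slow], TIMES[fast])]


for slow, fast in [
    ("netCDF4-python (TDS)", "netCDF4-python (local)"),
    ("ncdump", "h5py (local)"),
]:
    ratios = ", ".join(f"{r:.1f}x" for r in slowdown(slow, fast))
    print(f"{slow} vs {fast}: {ratios}")
```

For these three files, access through the THREDDS server with netCDF4-python is between about 5.8 and 6.9 times slower than local access with the same library, while ncdump is between one and two orders of magnitude slower than h5py.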
 
Table 4 shows an extreme case in which a file complies with the standard QC checks and is well formatted, yet performs poorly with the standard suite of tools: for a file size of 2 GB or 4 GB (Files 2 and 3 in Table 4), dumping a variable with ncdump or h5dump takes between roughly 40 minutes and more than 1.5 hours.

To evaluate the performance of programmatic methods on netCDF files, we use netCDF4-python, Geospatial Data Abstraction Library (GDAL)-python, and h5py to access the target files from the Lustre filesystems. Our tests show that all three APIs fetch the whole variable in far less time than the dump tools, because they avoid the overheads of data conversion and transport. The tests also show that h5py gives the best performance. Since netCDF-4 is essentially a profile of the HDF5 format, both netCDF4-python and GDAL-python eventually invoke the HDF5 library to access the data.

NetCDF4-python can also access data from the THREDDS server (which is tested for performance on our high-speed internal network), but it takes roughly six times longer to access the data via the data service than to access the same volume of data on our Lustre filesystem. All three access methods tested against our THREDDS server take a similar time. By default, netCDF4-python and THREDDS have a request size limit of 500 MB, so it is necessary to divide the fetching process into several individual requests if the target dataset is larger than 500 MB. NCSS, on the other hand, has a much larger size limit per request, so fewer requests are needed with NCSS than with either netCDF4-python or THREDDS.
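
Dividing a fetch into several requests can be planned ahead of time. The following is a minimal sketch that splits a 2D variable into slabs of whole rows so that each request stays under a size limit. It assumes that the 500 MB limit means 500 × 10<sup>6</sup> bytes of uncompressed 4-byte values; the function name and this interpretation of the limit are assumptions, not documented behavior of THREDDS or netCDF4-python.

```python
def row_slabs(shape, itemsize=4, limit_bytes=500 * 1000**2):
    """Yield (start, stop) row ranges whose uncompressed size stays within limit_bytes."""
    n_rows, n_cols = shape
    row_bytes = n_cols * itemsize
    rows_per_request = limit_bytes // row_bytes
    if rows_per_request < 1:
        raise ValueError("a single row exceeds the request limit")
    for start in range(0, n_rows, rows_per_request):
        yield start, min(start + rows_per_request, n_rows)


# Variable shape of File 2 from Table 3: (lat, lon) = (41882, 59501)
slabs = list(row_slabs((41882, 59501)))
print(len(slabs), slabs[0], slabs[-1])
```

Each <code>(start, stop)</code> pair could then be used as a slice, for example <code>variable[start:stop, :]</code> on a variable opened with netCDF4-python, with the partial results concatenated on the client side.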
 
===Results sharing===
All QC/QA reports and benchmarks are shared with the data producers. In the future we plan to make these reports available to the wider community, as the information provides consumers with evidence on how the data is functioning and how it has performed with different software and libraries. It also provides guidance on how to best use the data and enables the consumer to determine if they are using data, or a tool to access the data, that has not been tested before. This information is also used in data training to demonstrate the application of data standards in both data organization and data preparation, and how to use the data with a range of software.
 
==Discussion==
The NCI DQS has been applied to climate and weather, earth observation, geoscience, and astronomy data, with the QC and QA tests adapted to meet the relevant community standards and protocols for each domain. The examples provided in this paper have shown how the knowledge and experience on data standards for netCDF files and conventions (such as CF and ACDD, initially developed within the climate community) are applicable to other scientific domains. For example, in the geophysics domain, there is a growing need to enable access to much larger data volumes, over larger spatial areas and/or enable aggregation of data from multiple individual geophysical surveys. To do this, in consultation with the geophysics and HDF communities, the principles of the CF convention from the climate community and the ACDD from the Earth science community were translated into a proposed new geophysics convention that improves programmatic access and interoperability across different geophysical data types, such as seismic, gravity, magnetotelluric, and radiometric.<ref name="WangImprov17">{{cite journal |title=Improving Seismic Data Accessibility and Performance Using HDF Containers |journal=Proceedings from the AGU 2017 Fall Meeting |author=Wang, J.; Yang, R.; Evans, B.J.K. |pages=IN42B-04 |year=2017}}</ref> We also applied our benchmarking strategy to the geophysics domain, initially using the domain-popular ObsPy library<ref name="MegiesObsPy">{{cite web |url=https://github.com/obspy/obspy/wiki |title=obspy/obspy |author=Megies, T. |publisher=GitHub |accessdate=06 November 2017}}</ref> and SPECFEM3D code<ref name="CIG_SPEC">{{cite web |url=https://geodynamics.org/cig/software/specfem3d/ |title=SPECFEM3D Cartesian |author=Computational Infrastructure for Geodynamics |publisher=University of California Davis |accessdate=06 November 2017}}</ref>, to demonstrate how different organizations of the data (in terms of chunking size and compression) affect performance, comparing new data formats such as PH5<ref name="IRIS_PH5">{{cite web |url=https://www.passcal.nmt.edu/content/ph5-what-it |title=PH5: What is it? |publisher=IRIS PASSCAL Instrument Center |accessdate=18 October 2017}}</ref> and ASDF<ref name="KrischerAnAdapt16">{{cite journal |title=An Adaptable Seismic Data Format |journal=Geophysical Journal International |author=Krischer, L.; Smith, J.; Lei, W. et al. |volume=207 |issue=2 |pages=1003–11 |year=2016 |doi=10.1093/gji/ggw319}}</ref> with traditional formats such as the Society of Exploration Geophysicists-Y Data Exchange Format (SEG-Y), the Standard for the Exchange of Earthquake Data Format (SEED), Seismic Analysis Code (SAC), etc.
 
==Conclusions==
We have developed a DQS as a key component of our vision to provide a trustworthy, transdisciplinary, high-performance data platform which enables researchers to share, use, reuse, and repurpose the data collections in high-end computational and data-intensive environments. The implementation of the DQS assures users that the data has been properly quality checked and complies with the relevant community standards. The functionality check in the QA process lists suitable software and libraries so that users can check whether the data is usable within their platform. Applying the DQS provides a standard way to (1) assess completeness and consistency of data across multiple datasets and collections; (2) evaluate the suitability of the data for transdisciplinary use; (3) enable standardized programmatic access; and (4) avoid the negative impacts of poor data and a poor user experience.
 
The NCI DQS identifies issues with the data and metadata at the time of data ingestion onto the NCI data platform, thus allowing corrections to be undertaken prior to publication. Applying the DQS means that scientists spend less time reformatting and wrangling the data to make it suitable for use by their applications and workflows, especially if their applications can read standardized interfaces. Future work will focus on broader adoption of data from additional domains and data types, as well as improving use of controlled vocabularies for individual data attributes as a means of more efficiently indexing the data.


==Acknowledgements==
The authors wish to acknowledge funding from the Australian Government Department of Education, through the National Collaborative Research Infrastructure Strategy (NCRIS), and the Education Investment Fund (EIF) Super Science Initiatives through the NCI and Research Data Services (RDS) projects. We also wish to acknowledge the organizational partners and data managers involved in data management at NCI, particularly Geoscience Australia, the Bureau of Meteorology, CSIRO, and the Australian National University.


===Author contributions===
B.E. and K.D. conceived and designed the NCI DQS. K.D. developed the code for the QC/QA checker. K.D. and J.W. ran the QC and QA tests and generated the reports. R.Y. ran the benchmark tests. J.W., K.D., R.Y. and L.W. wrote the initial paper. B.E., C.R. and L.W. reviewed and improved key sections of the paper, particularly for the broader activities of QA and its application.
 
===Conflicts of interest===
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; and in the decision to publish the results.
 
==Appendix==
===Appendix A===
NCI NetCDF Metadata Guide based on the Attribute Convention for Dataset Discovery (ACDD v1.3)
 
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="80%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="2"|'''Table A1.''' A subgroup of attributes from the ACDD metadata specification<ref name="ESIPAttri" />, with the priority level of each attribute categorized as “Required,” “Recommended,” or “Suggested.” For some attributes the priority level has been modified to better align with NCI’s data hosting services (e.g., NCI classifies “source” as “Required” while it is only “Recommended” by the ACDD guidelines).
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="2"|REQUIRED
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Global Attribute
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Description
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|title
  | style="background-color:white; padding-left:10px; padding-right:10px;"|A short phrase or sentence describing the dataset. In many discovery systems, the title will be displayed in the results list from a search, and therefore it should be human-readable and reasonable to display in a list of such names. This attribute is also recommended by the NetCDF Users Guide and the CF conventions.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|summary
  | style="background-color:white; padding-left:10px; padding-right:10px;"|A paragraph describing the dataset, analogous to an abstract for a paper.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|source
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The method of production of the original data. If it was model-generated, source should name the model and its version. If it is observational, source should characterize it. This attribute is defined in the CF Conventions. Examples: "temperature from CTD #1234"; "world model v.0.1".
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|date_created
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The date on which this version of the data was created. (Modification of values implies a new version, hence this would be assigned the date of the most recent values modification.) Metadata changes are not considered when assigning the date_created. The ISO 8601:2004 extended date format is recommended, as described in the Attribute Content Guidance section.
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="2"|RECOMMENDED
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Global Attribute
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Description
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Conventions
  | style="background-color:white; padding-left:10px; padding-right:10px;"|A comma-separated list of the conventions that are followed by the dataset. For files that follow this version of ACDD, include the string ‘ACDD-1.3’. (This attribute is described in the netCDF Users Guide.)
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|metadata_link
  | style="background-color:white; padding-left:10px; padding-right:10px;"|A URL that gives the location of more complete metadata. A persistent URL is recommended for this attribute.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|history
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Provides an [[audit trail]] for modifications to the original data. This attribute is also in the netCDF Users Guide: "This is a character array with a line for each invocation of a program that has modified the dataset. Well-behaved generic netCDF applications should append a line containing: date, time of day, user name, program name and command arguments." To include a more complete description you can append a reference to an ISO Lineage entity; see NOAA EDM ISO Lineage guidance.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|license
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Provide the URL to a standard or specific license, enter “Freely Distributed” or “None”, or describe any restrictions to data access and distribution in free text.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|doi
  | style="background-color:white; padding-left:10px; padding-right:10px;"|To be used if a DOI exists.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|product_version
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Version identifier of the data file or product as assigned by the data creator. For example, a new algorithm or methodology could result in a new product_version.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|processing_level
  | style="background-color:white; padding-left:10px; padding-right:10px;"|A textual description of the processing (or QC) level of the data.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|institution
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The name of the institution principally responsible for originating this data. This attribute is recommended by the CF convention.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|project
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The name of the project(s) principally responsible for originating this data. Multiple projects can be separated by commas, as described under Attribute Content Guidance. Examples: "PATMOS-X" and "Extended Continental Shelf Project".
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|instrument
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Name of the contributing instrument(s) or sensor(s) used to create this data set or product. Indicate controlled vocabulary used in instrument_vocabulary.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|platform
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Name of the platform(s) that supported the sensor data used to create this data set or product. Platforms can be of any type, including satellite, ship, station, aircraft or other. Indicate controlled vocabulary used in platform_vocabulary.
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;" colspan="2"|SUGGESTED
|-
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Global Attribute
  ! style="background-color:#e2e2e2; padding-left:10px; padding-right:10px;"|Description
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|id
  | style="background-color:white; padding-left:10px; padding-right:10px;"|An identifier for the data set, provided by and unique within its naming authority. The combination of the “naming authority” and the “id” should be globally unique, but the id can be globally unique by itself also. IDs can be URLs, URNs, DOIs, meaningful text strings, a local key, or any other unique string of characters. The id should not include white space characters.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|date_modified
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The date on which the data was last modified. Note that this applies just to the data, not the metadata. The ISO 8601:2004 extended date format is recommended, as described in the Attribute Content Guidance section.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|date_created
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The date on which this version of the data was created. (Modification of values implies a new version, hence this would be assigned the date of the most recent values modification.) Metadata changes are not considered when assigning the date_created. The ISO 8601:2004 extended date format is recommended, as described in the Attribute Content Guidance section.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|date_issued
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The date on which this data (including all modifications) was formally issued (i.e., made available to a wider audience). Note that this applies just to the data, not the metadata. The ISO 8601:2004 extended date format is recommended, as described in the Attribute Content Guidance section.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|references
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Published or web-based references that describe the data or methods used to produce it. Recommend URIs (such as a URL or DOI) for papers or other references. This attribute is defined in the CF conventions.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|keywords
  | style="background-color:white; padding-left:10px; padding-right:10px;"|A comma-separated list of key words and/or phrases. Keywords may be common words or phrases, terms from a controlled vocabulary (GCMD is often used), or URIs for terms from a controlled vocabulary (see also “keywords_vocabulary” attribute).
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|standard_name_vocabulary
  | style="background-color:white; padding-left:10px; padding-right:10px;"|The name and version of the controlled vocabulary from which variable standard names are taken. (Values for any standard_name attribute must come from the CF Standard Names vocabulary for the data file or product to comply with CF.) Example: "CF Standard Name Table v27".
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_lat_min
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes a simple lower latitude limit; may be part of a 2- or 3-dimensional bounding region. Geospatial_lat_min specifies the southernmost latitude covered by the dataset.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_lat_max
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes a simple upper latitude limit; may be part of a 2- or 3-dimensional bounding region. Geospatial_lat_max specifies the northernmost latitude covered by the dataset.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_lon_min
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lon_min specifies the westernmost longitude covered by the dataset. See also geospatial_lon_max.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_lon_max
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes a simple longitude limit; may be part of a 2- or 3-dimensional bounding region. geospatial_lon_max specifies the easternmost longitude covered by the dataset. Cases where geospatial_lon_min is greater than geospatial_lon_max indicate the bounding box extends from geospatial_lon_max, through the longitude range discontinuity meridian (either the antimeridian for −180:180 values, or Prime Meridian for 0:360 values), to geospatial_lon_min; for example, geospatial_lon_min = 170 and geospatial_lon_max = −175 incorporates 15 degrees of longitude (ranges 170 to 180 and −180 to −175).
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_vertical_min
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes the numerically smaller vertical limit; may be part of a 2- or 3-dimensional bounding region. See geospatial_vertical_positive and geospatial_vertical_units.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_vertical_max
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes the numerically larger vertical limit; may be part of a 2- or 3-dimensional bounding region. See geospatial_vertical_positive and geospatial_vertical_units.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_vertical_positive
  | style="background-color:white; padding-left:10px; padding-right:10px;"|One of "up" or "down." If up, vertical values are interpreted as "altitude," with negative values corresponding to below the reference datum (e.g., under water). If down, vertical values are interpreted as "depth," positive values correspond to below the reference datum. Note that if geospatial_vertical_positive is down ("depth" orientation), the geospatial_vertical_min attribute specifies the data’s vertical location furthest from the earth’s center, and the geospatial_vertical_max attribute specifies the location closest to the earth’s center.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|geospatial_bounds
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes the data’s 2D or 3D geospatial extent in OGC’s Well-Known Text (WKT) Geometry format (reference the OGC Simple Feature Access (SFA) specification). The meaning and order of values for each point’s coordinates depends on the coordinate reference system (CRS). The ACDD default is 2D geometry in the EPSG:4326 coordinate reference system. The default may be overridden with geospatial_bounds_crs and geospatial_bounds_vertical_crs (see those attributes). EPSG:4326 coordinate values are latitude (decimal degrees_north) and longitude (decimal degrees_east), in that order. Longitude values in the default case are limited to the [−180, 180) range. Example: "POLYGON ((40.26 -111.29, 41.26 -111.29, 41.26 -110.29, 40.26 -110.29, 40.26 -111.29))".
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|time_coverage_start
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes the time of the first data point in the data set. Use the ISO 8601:2004 date format, preferably the extended format as recommended in the Attribute Content Guidance section.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|time_coverage_end
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes the time of the last data point in the data set. Use ISO 8601:2004 date format, preferably the extended format as recommended in the Attribute Content Guidance section.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|time_coverage_duration
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes the duration of the data set. Use ISO 8601:2004 duration format, preferably the extended format as recommended in the Attribute Content Guidance section.
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|time_coverage_resolution
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Describes the targeted time period between each value in the data set. Use ISO 8601:2004 duration format, preferably the extended format as recommended in the Attribute Content Guidance section.
|-
|}
|}


===Appendix B===
===Funding===
Examples of NCI’s Quality Control (QC) and Quality Assurance (QA) reporting
This research was financed by the National Research Agency (ANR) in France (Project FITEGE, contract: ANR-14-CE29-0014).


[[File:FigA1 Evans Informatics2017 4-4.png|700px]]
===Conflict of interest statement===
{{clear}}
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="700px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure A1.''' An example of NCI’s QC compliance report, which is shared with data producers and used to ensure that the dataset metadata meets the minimum requirements for a netCDF collection. In this particular example collection, 30 files were successfully scanned (zero skipped) and all elements of the QC process passed. In cases where elements are not fully compliant, the high/medium/low priority suggestions section at the end of the report is used to explain the nature of the errors found and list possible means for modification.</blockquote>
|-
|}
|}


[[File:FigA2 Evans Informatics2017 4-4.png|700px]]
==Footnotes==
{{clear}}
{{reflist|group=lower-alpha}}
[[File:FigA2b Evans Informatics2017 4-4.png|700px]]
{{clear}}
[[File:FigA2c Evans Informatics2017 4-4.png|700px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="700px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure A2.''' An example of NCI functionality QA test report. (a) The first section of the report provides a short summary of results and whether the data is considered functional with all the tested tools, and lists the details of the files that were used for the assessment, including the properties of the files, such as size, variable shape, chunk size, and compression (deflate) level. (b) The second section provides the results for the functionality tests performed on the data, directly on the filesystem. (c) The third section provides the results of the functionality tests using the data served through NCI’s THREDDS services.</blockquote>
|-  
|}
|}


==References==
Line 542: Line 87:


==Notes==
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Several URLs from the original were dead, and more current URLs were substituted.
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Footnotes were originally numbered but have been converted to lowercase alpha for this version. The link in footnote j had to be applied to Google Shortener because the HUDOC uses invalid characters in their URLs, and this wiki's footnote system breaks when the original URL is used.


<!--Place all category tags here-->
[[Category:LIMSwiki journal articles (added in 2018)‎]]
[[Category:LIMSwiki journal articles (all)‎]]
[[Category:LIMSwiki journal articles on data quality]]
[[Category:LIMSwiki journal articles on forensic science]]
[[Category:LIMSwiki journal articles on informatics‎‎]]
[[Category:LIMSwiki journal articles on health informatics‎‎]]

Revision as of 19:17, 20 August 2018

Sandbox begins below

Full article title: How could the ethical management of health data in the medical field inform police use of DNA?
Journal: Frontiers in Public Health
Author(s): Krikorian, Gaelle; Vailly, Joëlle
Author affiliation(s): Institut de recherche interdisciplinaire sur les enjeux sociaux (IRIS)
Primary contact: gaelle.krikorian@gmail.com (email)
Editors: Lefèvre, Thomas
Year published: 2018
Volume and issue: 6
Page(s): 154
DOI: 10.3389/fpubh.2018.00154
ISSN: 2296-2565
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.frontiersin.org/articles/10.3389/fpubh.2018.00154/full
Download: https://www.frontiersin.org/articles/10.3389/fpubh.2018.00154/pdf (PDF)

Introduction

Various events paved the way for the production of ethical norms regulating biomedical practices, from the Nuremberg Code (1947)—produced by the international trial of Nazi regime leaders and collaborators—and the Declaration of Helsinki by the World Medical Association (1964) to the invention of the term “bioethics” by American biologist Van Rensselaer Potter.[1] The ethics of biomedicine has given rise to various controversies—particularly in the fields of newborn screening[2], prenatal screening[3], and cloning[4]—resulting in the institutionalization of ethical questions in the biomedical world of genetics. In 1994, France passed legislation (commonly known as the “bioethics laws”) to regulate medical practices in genetics. The medical community has also organized itself in order to manage ethical issues relating to its decisions, with a view to handling “practices with many strong uncertainties” and enabling clinical judgments and decisions to be made not by individual practitioners but rather by multidisciplinary groups drawing on different modes of judgment and forms of expertise.[5] Thus, the biomedical approach to genetics has been characterized by various debates and the existence of public controversies.

In the judicial sphere, the situation is very different. Since the end of the 1990s, developments in biomedical research have led to genetic data being used in police work and legal proceedings. Today, forensic science is omnipresent in investigations, not just in complex criminal cases but also routinely in cases of “minor” or “mass” delinquency. Genetics, which certainly receives the most media coverage among the techniques involved[6], has taken on considerable importance.[7] However, although very similar techniques are used in biomedicine and police work (DNA amplification, sequencing, etc.), the forms of collective management surrounding them are very different, as are the ethico-legal frameworks and their evolution, as this text will demonstrate.

Keywords: DNA, police, ethics, genetic technologies, criminal investigations

Nature of the information and genetic data produced in the police sphere

In police work in France, data produced by DNA are currently compiled and used in two different ways: first, to create files on individuals in the FNAEG or Fichier national automatisé des empreintes génétiques (national automated DNA database) and, second, in order to obtain information about perpetrators of crimes (their appearance, their origin, their kinship links to other individuals).

Police use of DNA has been allowed in France since the 1998 law providing for the creation of the FNAEG. A DNA profile corresponds to a “specific individual alphanumeric combination”[8] that is the numerical encoding of analysis of DNA segments. This profile is the result of analysis of DNA fragments using genetic markers. This analysis can be carried out on a minute amount of genetic material (saliva, blood, sperm, hair, contact, etc.). It identifies the presence of sequences specific to an individual that differentiate them from any other person (with the exception of an identical twin) but that are not supposed to provide any phenotypical information (about appearance, geographical origin, or diseases).[a] Such profiles therefore make individuals “identifiable in their uniqueness.”[9] During investigations, DNA is collected from suspects or from unidentified stains left at crime scenes or on people, and the results of this analysis are entered into the database. Identification through the FNAEG was originally restricted to a limited number of crimes, namely those of a sexual nature, as part of the law relating to the prevention and punishment of sexual crimes and the protection of minors. This remit has progressively been extended to include the vast majority of crimes and offences[b], leading to the routine use of DNA in investigations.[c] As a result of this evolution, there has been a substantial increase in the number of persons with files in the FNAEG, more than three million as of late 2015.[d]

New techniques have also emerged in recent years. It is now possible to obtain indications about an individual's physical appearance based on a sample of his/her DNA[10][11]: the analyses in question provide statistical information on eye, hair, and skin color, etc. These techniques are more exploratory and aim not to match DNA with an identity by comparison but to determine the characteristics of the perpetrator of a crime. These data result from analysis of several dozen DNA markers that, unlike the FNAEG's data, are selected deliberately so that they can provide information about a person's physical appearance. They are therefore aimed at “generating a suspect”[12] but because the information about this person's features is incomplete (e.g., a person with blue eyes, fair skin, light brown hair, and of European “bio-geographical” ancestry), they define “target populations of interest” to guide police investigations.[13] Several private and public laboratories in France now produce what professionals often refer to as “DNA photofits”; it is estimated that several dozen such analyses have been carried out since 2014 as part of investigations.

How is this framed legally, politically, and ethically?

The legal framework surrounding how the police and justice system use DNA analysis was devised to follow the creation of the FNAEG. For this reason, and in order to defuse fears and criticisms, the law only allows analyses using “non-coding” DNA so as to meet the initial objective of allowing identification without providing information about individuals. French law only allows the police to use DNA for identification purposes “within the framework of investigative measures or the preparation of a case during a judicial proceeding,”[e] in cases of missing persons[f], or, more recently, in the context of familial searches to allow “searches for persons directly related to [an] unknown person” who has left a stain at a crime scene (i.e., without determining phenotype).[g]

Concerning the so-called “DNA Photofit” technique, in June 2014, France's highest court, the Court of Cassation, ruled admissible an expert report charged with providing “all useful elements relating to the suspect's visible morphological characteristics” based on stains collected after a rape in an investigation into a series of sexual assaults in Lyon between October 2012 and January 2014. The Court of Cassation's authorization of this practice in DNA analysis was the first in France. For judges and prosecutors, there is now a legal precedent allowing them to authorize “DNA Photofits” when they consider this could help an investigation.

In legal terms, the emergence of new technical possibilities and their practical use creates conflicting and parallel regimes. On the one hand, “DNA Photofits” do not correspond to the legal frameworks devised in the 1990s. They do not provide identification, per se, but are rather an “assistance to the investigation,” as they use coding DNA. On the other hand, as science evolves, the law is falling out of step with the technical and scientific reality. New knowledge shows that some of the markers used by the FNAEG may in fact allow further information to be obtained about people regarding their predisposition to certain diseases, their genetic pathologies, and their “ethnic origin” (by continent or sub-continent).[h] Moreover, whereas at the FNAEG's inception it was considered unacceptable for the police to use medical information, certain professionals in the police or justice system now recognize that this information (whether genetic or not) can be useful in investigations (providing information about wanted persons' need for medication or healthcare, or about their physical appearance, etc.). Although there are no changes in the legal framework on this matter, the idea is spreading and the red line is, to some extent, and for some of the professionals, fading.

It is thus obvious that police uses of DNA data providing information about individuals' characteristics raise novel politico-ethical issues.[17][18] In particular, they bring into play the issue of what constitutes private data[19]: for certain geneticists, where “DNA Photofits” are concerned, externally visible characteristics do not fall into this category because they are visible.[11] Generally, as stated by some professionals during interviews, the question is “to know until where to go. And where to stop.” Regarding the FNAEG and French law, in a case heard in June 2017, the European Court of Human Rights (ECHR) ruled that “interference with the applicant's right to respect for his private life had been disproportionate.”[i] The ECHR judgment ruled against France and underscored that French law regarding DNA data storage should be differentiated “according to the nature and seriousness of the offence committed.”[j]

In Germany, a contradictory dialogue between experts took place regarding Forensic DNA Phenotyping revealing public and political debate on the matter.[20] In France, despite the stakes involved and the spread of new usages of DNA techniques, no public debate has emerged in recent years concerning new uses of DNA in police work. In 2008, a private analysis laboratory offering indicative geo-genetic tests (tests d'origine géo-génétique or TOGG) providing information about individuals' origin based on their DNA sparked a media debate that complicated the issue[21]; however, the controversy soon died down. A few years later, Ministry of Justice instructions to judges and prosecutors discouraged the use of this technique, with no further debate. Since then, although the Court of Cassation's 2014 decision opened up the possibility of using an unprecedented practice, this has not generated any public debate or controversy.

“DNA Photofits” have received some media coverage[k], but this has mainly been to underscore the technical process involved, echoing the fiction conveyed by television series that have made the use of genetic techniques in criminal investigations seem commonplace and particularly efficient. Our sociological fieldwork has revealed, however, that there was organized debate among judges and prosecutors between 2013 and 2014. At the time, the investigating judge who had for the first time ordered the analysis of the suspect's visible morphological characteristics referred the case to the examining chamber himself, to obtain a verdict on whether the expert report he had requested was legal. Although the examining chamber approved the report, the public prosecutor brought the issue before the Court of Cassation—the highest legal authority in France—in order to ensure the final nature of the decision. The Court of Cassation ruled that a judge could have recourse to such analyses. Following this verdict, several bodies consulted by the Ministry of Justice[l] provided opinions underscoring the need for this technique to be written into and regulated by the law. This has not been implemented to date. After being authorized for several years under a temporary protocol, familial searches allowing “genetic proximity testing”[22] were written into law in 2016. However, the Court of Cassation's judgment on DNA analysis to provide “all useful elements relating to a suspect's visible morphological characteristics” has not been brought up for parliamentary debate to be included in the law. There has been no political management of the question at the state level, nor has the issue been included in the general debate organized by the National Consultative Council of Ethics (Comité Consultatif National d'Ethique) in 2018 regarding the revision of laws on bio-ethics.

Conclusion

The use of these new technological and scientific techniques plays a significant role in guiding how we engage with the world[23], just as it redefines the production of identity translated into information[24] and structures the way sensitive information about individuals is used and circulated. Despite these stakes, and the initial caution that surrounded the creation of the national automated DNA database, this use has not gone hand-in-hand with collective political and ethical debate. This raises questions about the conditions for the existence or absence of political controversies, which call for further sociological investigation into the framing of the issue and the social and political logic at play.

As the uses of these techniques develop in police practices, this absence of collective management of the issue leaves professionals to rely on forms of local arbitration. Our fieldwork has shown that they are aware that these practices raise issues and therefore devise ethical frameworks for their own use of DNA. As a consequence, in this field, as is the case in others, ethical issues are addressed in a fragmented manner, as endogenous ethical frameworks are “cobbled together” by professionals as a function of their practices and needs. Each institution, laboratory, and in some cases each individual is crafting a frame and a perimeter of limits to what can be done according to their understanding and appreciation of the legal setting, the practical utility of actions, and the ethical constraints perceived.

The ECHR's recent ruling against France regarding the FNAEG may force lawmakers to reach a verdict on this issue, thereby triggering what seems like necessary public debate on forensic use of DNA. The new possibilities provided by genetic technologies point to the need for promoting dialogue among the various professionals using this technology in police work (forensic teams and geneticists working with them, police investigators, private laboratories, prosecutors, judges, etc.), but also with healthcare professionals—who already have experience of the institutionalized management of ethical considerations relating to their practices in genetics—and, more broadly, in society as a whole.

Acknowledgements

The authors are grateful to Lucy Garnier for translating this article from French.

Author contributions

GK is the main contributor. JV is the head of the research programme and collaborated on the writing of the article.

Funding

This research was financed by the National Research Agency (ANR) in France (Project FITEGE, contract: ANR-14-CE29-0014).

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

  a. The Order of 10 August 2015 increased the number of markers analyzed to 21; policemen and analysis laboratories had three years to comply with this new requirement.
  b. Act n°98-468 of 17 June 1998 relative to the punishment of sexual crimes and the protection of minors introduced article 706-54 into the Code of Criminal Procedure making provision for the creation of an automated national database to centralize the DNA profiles of persons convicted of offences of a sexual nature. The remit of the database was then extended on several occasions. In 2001, it included serious crimes against persons. In 2003, the law on internal security extended it to persons convicted of or implicated in crimes and offences against persons or property.
  c. Collecting DNA samples in investigations is now the rule. An ad hoc body of staff has been trained over the past 15 years that almost systematically processes crime scenes.
  d. This figure was provided to the French Parliament by the Ministry of the Interior following a question by parliamentarian Sergio Coronado (member of the “Ecologist” parliamentary group) (http://questions.assemblee-nationale.fr/q14/14-79728QE.htm).
  e. Art. 16.11 of the Civil Code
  f. Art. 26, Domestic Security Guidance and Planning Act n° 95-73 of 21 January 1995
  g. This possibility was written into law in 2016 in article 796-56-1-1 of Act n° 2016-731 of 3 June 2016 strengthening provisions for the fight against organized crime, terrorism, and their financing, and improving the efficiency and guarantees of the criminal procedure.
  h. For example, according to a study by the Telethon Institute of Genetics and Medicine, D2S1388, one of the markers used by the FNAEG, plays a determining role in the transmission of pseudohyperkalaemia, a rare genetic disease.[14] In 2011, a publication by Chinese researchers highlighted the association between marker D21S11-28.2 and coronary heart disease.[15] A team of Portuguese researchers[16] has developed an online calculator capable of correlating certain markers used in the FNAEG's DNA samples with individual affiliation to population groups (Sub-Saharan Africa, Eurasia, East Asia, North Africa, Near East, North America, South America, and Central America).
  i. Case of Aycaguer v. France, 22 June 2017, 8806/12, ECHR, Court (Fifth Section)
  j. See legal summary, available at https://hudoc.echr.coe.int/eng#{%22itemid%22:[%22002-11703%22}
  k. A search conducted on the press database Europresse for the period 2010 to 2018 brought up around 70 pieces published mentioning the terms “DNA Photofits” or “Genetic photofits”.
  l. These bodies were the Commission nationale consultative des droits de l'homme (CNCDH – National consultative committee on human rights) and the approval committee for people authorized to conduct identification procedures using DNA profiles in the context of legal proceedings or the extrajudicial procedure for identifying deceased persons.

References

  1. Potter, V.R. (1970). "Bioethics, the science of survival". Perspectives in Biology and Medicine 14 (1): 127–53. doi:10.1353/pbm.1970.0015. 
  2. Vailly, J. (2013). The Birth of a Genetics Policy: Social Issues of Newborn Screening. Routledge. pp. 240. ISBN 9781472422729. 
  3. Isambert, F.A. (1980). "Éthique et génétique: De l'utopie eugénique au contrôle des malformations congénitales". Revue française de sociologie 21 (3): 331–54. doi:10.2307/3320930. 
  4. Pulman, B. (2005). "Les enjeux du clonage". Revue française de sociologie 46 (3): 413–42. doi:10.3917/rfs.463.0413. 
  5. Bourret, P.; Rabeharisoa, V. (2008). "Décision et jugement médicaux en situation de forte incertitude : l’exemple de deux pratiques cliniques à l’épreuve de la génétique". Sciences sociales et santé 26 (1): 128. doi:10.3917/sss.261.0033. 
  6. Brewer, P.R.; Ley, B.L. (2009). "Media Use and Public Perceptions of DNA Evidence". Science Communication 32 (1): 93–117. doi:10.1177/1075547009340343. 
  7. Williams, R.; Johnson, P. (2008). Genetic Policing: The Uses of DNA in Police Investigations. Willan. pp. 208. ISBN 9781843922049. 
  8. Cabal, C.; Le Déaut, J.-Y.; Revol, H. (2001). Rapport sur la valeur scientifique de l'utilisation des empreintes génétiques dans le domaine judiciaire. Assemblée nationale. ISBN 2111150177. 
  9. Bonniol, J.-L.; Darlu, P. (2014). "L’ADN au service d’une nouvelle quête des ancêtres?". Civilisations 63: 201–19. doi:10.4000/civilisations.3747. 
  10. Kayser, M.; de Knijff, P. (2011). "Improving human forensics through advances in genetics, genomics and molecular biology". Nature Reviews Genetics 12 (3): 179–92. doi:10.1038/nrg2952. PMID 21331090. 
  11. Kayser, M. (2015). "Forensic DNA Phenotyping: Predicting human appearance from crime scene material for investigative purposes". Forensic Science International Genetics 18: 33–48. doi:10.1016/j.fsigen.2015.02.003. PMID 25716572. 
  12. M'charek, A. (2013). "Beyond Fact or Fiction: On the Materiality of Race in Practice". Cultural Anthropology 28 (3): 420–42. doi:10.1111/cuan.12012. 
  13. Caliebe, A.; Krawczak, M.; Kayser, M. (2018). "Predictive values in Forensic DNA Phenotyping are not necessarily prevalence-dependent". FSI Genetics 33: e7–e8. doi:10.1016/j.fsigen.2017.11.006. 
  14. Carella, M.; d'Adamo, A.P.; Grootenboer-Mignot, S. et al. (2004). "A second locus mapping to 2q35-36 for familial pseudohyperkalaemia". European Journal of Human Genetics 12 (12): 1073–6. doi:10.1038/sj.ejhg.5201280. 
  15. Hui, L.; Jing, Y.; Rui, M.; Weijian, Y. (2011). "Novel association analysis between 9 short tandem repeat loci polymorphisms and coronary heart disease based on a cross-validation design". Atherosclerosis 218 (1): 151–5. doi:10.1016/j.atherosclerosis.2011.05.024. PMID 21703622. 
  16. Pereira, L.; Alshamali, F.; Andreassen, R. et al. (2011). "PopAffiliator: online calculator for individual affiliation to a major population group based on 17 autosomal short tandem repeat genotype profile". International Journal of Legal Medicine 125 (5): 629–36. doi:10.1007/s00414-010-0472-2. PMID 20552217. 
  17. M'charek, A. (2008). "Silent witness, articulate collective: DNA evidence and the inference of visible traits". Bioethics 22 (9): 519–28. doi:10.1111/j.1467-8519.2008.00699.x. PMID 18959734. 
  18. MacLean, C.E.; Lamparello, A. (2014). "Forensic DNA phenotyping in criminal investigations and criminal courts: Assessing and mitigating the dilemmas inherent in the science". Recent Advances in DNA and Gene Sequences 8 (2): 104–12. PMID 25687339. 
  19. Toom, V.; Wienroth, M.; M'charek, A. et al. (2016). "Approaching ethical, legal and social issues of emerging forensic DNA phenotyping (FDP) technologies comprehensively: Reply to 'Forensic DNA phenotyping: Predicting human appearance from crime scene material for investigative purposes' by Manfred Kayser". Forensic Science International Genetics 22: e1–e4. doi:10.1016/j.fsigen.2016.01.010. PMID 26832996. 
  20. Buchanan, N.; Staubach, F.; Wienroth, M. et al. (2018). "Forensic DNA phenotyping legislation cannot be based on “Ideal FDP”—A response to Caliebe, Krawczak and Kayser (2017)". FSI Genetics 34: e13–e14. doi:10.1016/j.fsigen.2018.01.009. 
  21. Vailly, J. (2017). "The politics of suspects’ geo-genetic origin in France: The conditions, expression, and effects of problematisation". BioSocieties 12 (1): 66–88. doi:10.1057/s41292-016-0028-x. 
  22. Prainsack, B. (2010). "Chapter 2: Key issues in DNA profiling and databasing: Implications for governance". In Hindmarsh, R.; Prainsack, B.. Genetic Suspects: Global Governance of Forensic DNA Profiling and Databasing. Cambridge University Press. pp. 15–39. ISBN 9780521519434. 
  23. Williams, R.; Wienroth, M. (2017). "Social and ethical aspects of forensic genetics: A critical review". Forensic Science Review 29 (2): 145–69. PMID 28691916. 
  24. Aas, K.F. (2006). "‘The body does not lie’: Identity, risk and trust in technoculture". Crime, Media, Culture: An International Journal 2 (2): 143–58. doi:10.1177/1741659006065401. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. Footnotes were originally numbered but have been converted to lowercase alpha for this version. The link in footnote j had to be applied to Google Shortener because the HUDOC uses invalid characters in their URLs, and this wiki's footnote system breaks when the original URL is used.