Journal:Data management: New tools, new organization, and new skills in a French research institute

From LIMSWiki
Revision as of 17:10, 3 October 2017 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title: Data management: New tools, new organization, and new skills in a French research institute
Journal: LIBER Quarterly
Author(s): Martin, Caroline; Cadiou, Colette; Jannès-Ober, Emmanuelle
Author affiliation(s): National Research Institute of Science and Technology for Environment and Agriculture
Primary contact: Email: caroline dot martin at agrenium dot fr
Year published: 2017
Volume and issue: 27(1)
Page(s): 73–88
DOI: 10.18352/lq.10196
ISSN: 2213-056X
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.liberquarterly.eu/articles/10.18352/lq.10196/
Download: https://www.liberquarterly.eu/articles/10.18352/lq.10196/galley/10691/download/ (PDF)

Abstract

In the context of e-science and open access, the visibility and impact of scientific results and data have become important for spreading information to users and to society in general. The objective of this general trend in the economy is to feed the innovation process and create economic value. At our institute, the French National Research Institute of Science and Technology for Environment and Agriculture (Irstea), the department in charge of scientific and technical information, with the help of other professionals (scientists, IT professionals, ethics advisors, etc.), has recently developed services suited to researchers' data management needs, in response to European recommendations for open data. This has required a review of the workflows between databases and has raised organizational questions about skills, occupations, and departments in the institute. In fact, data management involves all professionals and researchers assessing their workflows together.

Keywords: data management, datasets, databases, data publication, skills, services, e-science, open data

Introduction

Irstea, the National Research Institute of Science and Technology for Environment and Agriculture, is a public research institute under the joint supervision of the Ministry of Research and the Ministry of Agriculture in France. Irstea has built a multidisciplinary and systemic approach to three domains: water, environmental technologies and territories, which today form the basis of its strength and originality. Ensuring that scientific results are taken up by practitioners is a very important mission of the institute, which aims to be a link between practitioners and scientists and to provide a collaborative space dedicated to the co-construction of knowledge.

With exponential growth and massive data production in all areas of science, the management and recovery of data have become, in the digital age, a crucial issue in technical, scientific and economic policy. These developments affect the production and use of scientific information, and consequently the practices of information professionals, at several levels:

  • Technical: for the increasingly rapid evolution of software tools, IT infrastructures and practices; and
  • Organizational and behavioral: new approaches leading to new modes of production and development of scientific and technical production.

This requires a significant adaptation of the skills and missions of scientific and technical information (STI) professionals.[1] They are expected to respond more closely to the different requirements for data processing, reporting and dissemination.[2] These new missions are a natural continuation of the usual support activities (training and local support) that librarians provide to the scientific teams.[3]

At Irstea, STI professionals have developed support services for research projects, which include an involvement in the processes of management and enhancement of scientific data. After the presentation of the external and internal context of Irstea, we present the needs and questions of researchers around the data life cycle.

The new services proposed at the moment are:

  • Developing guidelines on data management, archiving and diffusion;
  • Managing a quality process on data management (ISO 9001 quality certification);
  • Holding training sessions and seminars in the institute to share this knowledge and to highlight new skills and new transversal working methods. In particular, this involves a management approach that supports staff through future changes in the skills of librarians in a research institution, by building a training plan and implementing an operational STI organization able to answer future challenges in scientific research;
  • Developing services such as DOI creation and “IrsteaData” (a catalogue of datasets), an important new part of our data publication system based on close collaboration between librarians, scientists, technicians, IT professionals, lawyers, etc.;
  • Working on our vocabularies: as we have opened our institutional repository (CemOA), we are mapping our reference vocabularies to major vocabularies for all our research products (we are currently mapping our Irstea thesaurus to the Agrovoc and Gemet thesauri, and we also plan to use Orcid and Geonames).
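As an illustration of the vocabulary mapping mentioned in the last item, the following is a minimal sketch in Python and not a description of Irstea's actual tooling. It assumes a local thesaurus in which each concept carries optional links to external vocabularies, and it reports which dataset keywords can be translated into a chosen target vocabulary. The class `Concept`, the functions `build_index` and `map_keywords`, and all identifiers in the sample data are hypothetical placeholders, not real Agrovoc or Gemet entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Concept:
    """A local thesaurus concept with links to external vocabularies (hypothetical model)."""
    label: str
    # Maps a vocabulary name (e.g. "agrovoc") to an external concept identifier.
    matches: Dict[str, str] = field(default_factory=dict)


def build_index(concepts: List[Concept]) -> Dict[str, Concept]:
    """Index concepts by lower-cased label for case-insensitive lookup."""
    return {c.label.lower(): c for c in concepts}


def map_keywords(
    keywords: List[str], index: Dict[str, Concept], vocabulary: str
) -> Tuple[Dict[str, str], List[str]]:
    """Split keywords into (mapped, unmapped) for one target vocabulary."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for kw in keywords:
        concept = index.get(kw.strip().lower())
        if concept is not None and vocabulary in concept.matches:
            mapped[kw] = concept.matches[vocabulary]
        else:
            # Either the keyword is not in the local thesaurus,
            # or it has no equivalent yet in the target vocabulary.
            unmapped.append(kw)
    return mapped, unmapped


# Hypothetical sample data: the identifiers are placeholders, not real URIs.
THESAURUS = [
    Concept("avalanche", {"agrovoc": "agrovoc:PLACEHOLDER_1", "gemet": "gemet:PLACEHOLDER_1"}),
    Concept("flood", {"agrovoc": "agrovoc:PLACEHOLDER_2"}),
]

if __name__ == "__main__":
    index = build_index(THESAURUS)
    mapped, unmapped = map_keywords(["Avalanche", "flood", "robotics"], index, "gemet")
    print("mapped:", mapped)
    print("still to map:", unmapped)
```

In this sketch the unmapped list is the useful output for a mapping project: it shows which local terms still need an equivalent in the target vocabulary before datasets and publications can share the same descriptors.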

These new services have brought the different workflows into question, especially for the deposit process of data publication and the management of the interoperability between the databases of the institute. Establishing the link between data, publications and other scientific or technical productions (reports, etc.) related to the same research project has meant adopting another point of view about the management of the information system and involving the whole staff in a transversal project about their skills and roles in the scientific process.

We will see how STI professionals are now able to play a pivotal role in the management of data in relation to the other actors involved. STI services will be presented within their place in the data management systems. Finally, we will highlight the leading role of STI professionals in a process of evolution of the interoperability between the various resources of the scientific information system of Irstea.

Data management and scientific needs

Data background at Irstea: Typology and institutional policy

The external environment, including open science, open data and the European Horizon 2020 (H2020) programme, requires researchers to open and document their data.

The data at Irstea are characterized by their heterogeneity at all levels of description, such as:

  • their typology: data from laboratory or field measurements, surveys of territorial actors (sociology, economics, agronomy, management science, etc.), photos, videos, UAV images, GIS data, texts;
  • their field: biodiversity, ecology, natural hazards (avalanches, floods), robotics, pollution and water chemistry, refrigeration engineering, economy, sociology, etc.;
  • their nature: public data (e.g., www.avalanches.fr), confidential data (from companies or for patenting), data subject to privacy rules, licensed data (geographical data), weather/climatic data, social enquiries, agricultural and economic statistics (Eurostat, Agreste, INSEE), whether reused or produced by researchers;
  • the data management modes of organization: who does what, how, when and with whom? There are many questions, with many answers that depend on the type of players involved (researchers, data managers, IT experts, STI professionals, etc.).

Irstea has a policy for data management and a quality process framework. Nevertheless, data management at Irstea still requires further development in order to become completely operational.

The needs and initiatives of researchers

Several surveys of researchers were conducted (for example, one on the use of data papers), but the lack of responses did not allow for an in-depth analysis of needs. An internal seminar was organized in February 2015 to identify who works with data and to collect their needs in terms of management, storage, archiving and dissemination of data and datasets. During this seminar, several workshops were held on data management issues (metadata, storage and archiving conditions, terms of dissemination). The seminar was attended by about sixty people (64 participants) from the information and communication technology domain, as well as researchers from a wide range of scientific fields (hydrology, biodiversity, remote sensing).

  • Significant support needs, by data life cycle stage:

Data production and processing
    Expressed needs:
    – Deployment of tools to describe data and datasets;
    – Choice and use of standards to document datasets, and choice of formats. The granularity of the data documentation is a significant question for researchers because it determines the data quality.
    Impact on organization and actors: Managing data upstream, from the outset of data creation, calls research methods into question and touches on individual and/or collective habits of use. This stage has a strong impact on the following stages.

Data preservation
    Expressed needs: More IT support to make medium- and long-term storage available; the question of storage space is raised.
    Impact on organization and actors: Data preservation must be handled within a perspective of strategic and cultural changes in the organization regarding the storage and conservation of data under agreement. The choice of archiving formats is important.

Access, dissemination and re-use
    Expressed needs: The thorny issue of access is inseparable from the issue of security and traceability. Legal support on the conditions of access to data, as well as on the conditions for re-use of data, is unanimously requested.
    Impact on organization and actors: The question is how to make data available, and under which strategy, in a legal context that is hard for the researcher to understand. The researcher remains dubious, even suspicious, of the conditions for the dissemination of his/her data in an open science context.

  • The bottom-up initiatives:
Researchers organize themselves to manage and share their data.[4] In the last few years, research teams in the humanities have taken the initiative, as in the case of the “Sygade” database (a system for managing and archiving interview data), the fruit of joint work among researchers, IT experts and librarians. Other teams have organized themselves to manage, exploit and deploy data in different fields (hydrology, biodiversity) and on various media (measurement data, photo collections, etc.). In this context, it should be noted that the support of STI professionals in choosing standards, archiving formats and tools is often appreciated.

Scientific and technical information department (STI): A legitimacy to build

STI position on new themes of data management

Within French research organizations, scientific and technical information (STI) covers access to, and management and processing of, scientific information. This mainly means managing digital resources (subscriptions to major publishers), but it also includes services such as reporting, cataloguing and advocacy of researchers' scientific and technical production, bibliometrics, scientometrics, and publishing activities. It is an inclusive activity that occurs at different stages of scientific production. We can distinguish two types of professionals: librarians, who work in French university libraries, and STI professionals, who work in the research institutes (CNRS, INRA). Their professional status and activities differ. STI professionals have developed more support activities for data management than librarians, who are dedicated to the management of electronic resources and libraries.

Irstea, as a research institute, has an STI service consisting of 20 people (24 persons equal to 22 full-time employees) working on a broad range of activities (Figure 1). Faced with technological changes and developments in the practices of researchers, STI at Irstea carried out a foresight study on the evolution of STI services entitled “Which STI mission statement for research in 2030?”. This study, conducted in 2013, identified new activities and missions around the data issue, as well as new services to propose to researchers. The timeline of the events leading to and following this study is given in Figure 2.



Figure 1. Scientific and technical information activities are present all along the research process


Figure 2. Timeline of the events

Beyond the study itself, this work has given first researchers, and later top management, a better understanding of the entire scope covered by the STI staff in the institute. It has also highlighted the cross-cutting nature of data management and the need for cooperation between services, especially with the IT services, which are often considered naturally legitimate on data issues ranging from technical aspects to process management.

A new path supported by the management of the institute

The reorganization in 2011 positioned the STI function in the research and innovation department, very close to science strategy and also responsible for scientific coordination. Moreover, the wish of Irstea's leaders to obtain ISO 9001 certification for the institute led to a quality approach towards data management. All Irstea scientific, business and support processes were identified and documented. The “data management” quality process is now steered by the head of the STI department; this legitimized the position of STI staff at the center of the coordination of these activities, alongside computer scientists, database administrators and IT staff.[5] This quality approach has helped encourage librarians to get involved in the process, because it helps make a strategic plan for them.[6] A cross-cutting “research data management” team was put in place, representing all stakeholders of the data management process. This team, led by the STI, organized a seminar open to all, focused on six objectives:

  • Identifying scientific “data” advisors;
  • Helping scientific teams to integrate all the operations necessary for operational data management, and identifying support staff to contact for each step;
  • Identifying crossover skills in the data management issues;
  • Building a model of the data management plan;
  • Communicating the plan and training to researchers and stakeholders; and
  • Having communication tools for centralizing information.

Currently the team includes IT experts and the institute's patenting and licensing, advocacy, legal and quality experts, as well as scientists for each project.

The development of a range of appropriate services

Out of the six activities concerning the data life cycle, four were identified as belonging to the natural STI services. The table below shows the type of service offered by the STI for each of these four activities, identified in the various stages of the cycle (Figure 3).



Figure 3. Based on “the research data lifecycle” (http://data-archive.ac.uk/create-anage/life-cycle), the scientific and technical information department positions its activity at each step of the data lifecycle and provides support for the research process concerning data management.

References

  1. American Library Association (2014). "The State of America's Libraries: A Report from the American Library Association" (PDF). American Libraries. pp. 79. http://www.ala.org/news/sites/ala.org.news/files/content/2014-State-of-Americas-Libraries-Report.pdf. Retrieved 18 November 2016. 
  2. MacMillan, D. (2014). "Data Sharing and Discovery: What Librarians Need to Know". The Journal of Academic Librarianship 40 (5): 541–549. doi:10.1016/j.acalib.2014.06.011. 
  3. Schmidt, B.; Shearer, K. (June 2016). "Librarians' Competencies Profile for Research Data Management" (PDF). pp. 7. https://www.coar-repositories.org/files/Competencies-for-RDM_June-2016.pdf. Retrieved 18 November 2016. 
  4. JISC (June 2012). "Researchers of Tomorrow: The Research Behaviour of Generation Y Doctoral Students" (PDF). pp. 85. Archived from the original on 14 June 2014. https://www.webarchive.org.uk/wayback/archive/20140614205429/http://www.jisc.ac.uk/media/documents/publications/reports/2012/Researchers-of-Tomorrow.pdf. Retrieved 18 November 2016. 
  5. Grudzień, Ł.; Hamrol, A. (2016). "Information quality in design process documentation of quality management systems". International Journal of Information Management 36 (4): 599–606. doi:10.1016/j.ijinfomgt.2016.03.011. 
  6. Saunders, L. (2015). "Academic Libraries' Strategic Plans: Top Trends and Under-Recognized Areas". The Journal of Academic Librarianship 41 (3): 285–291. doi:10.1016/j.acalib.2015.03.011. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version — by design — lists them in order of appearance.