Journal:Development and governance of FAIR thresholds for a data federation

From LIMSWiki
Full article title: Development and governance of FAIR thresholds for a data federation
Journal: Data Science Journal
Author(s): Wong, Megan; Levett, Kerry; Lee, Ashlin; Box, Paul; Simons, Bruce; David, Rakesh; MacLeod, Andrew; Taylor, Nicolas; Schneider, Derek; Thompson, Helen
Author affiliation(s): Federation University, Australian Research Data Commons, Commonwealth Scientific and Industrial Research Organisation, University of Adelaide, The University of Western Australia, University of New England
Primary contact (email): mr dot wong at federation dot edu dot au
Year published: 2022
Volume and issue: 21(1)
Article #: 13
DOI: 10.5334/dsj-2022-013
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International


Abstract

The FAIR (findable, accessible, interoperable, and re-usable) principles and practice recommendations provide high-level guidance and recommendations that are not research-domain specific in nature. There remains a gap in practice at the data provider and domain scientist level, demonstrating how the FAIR principles can be applied beyond a set of generalist guidelines to meet the needs of a specific domain community.

We present our insights from developing FAIR thresholds in a domain-specific context for self-governance by a community (in this case, agricultural research). "Minimum thresholds" for FAIR data are required to align expectations for data delivered from providers’ distributed data stores through a community-governed federation (the Agricultural Research Federation, AgReFed).

Data providers were supported to make data holdings more FAIR. There was a range of different FAIR starting points, organizational goals, and end user needs, solutions, and capabilities. This informed the distilling of a set of FAIR criteria ranging from "Minimum thresholds" to "Stretch targets." These were operationalized through consensus into a framework for governance and implementation by the agricultural research domain community.

Improving the FAIR maturity of data took resourcing and incentive to do so, highlighting the challenge for data federations to generate value whilst reducing costs of participation. Our experience showed a role for supporting collective advocacy, relationship brokering, tailored support, and low-bar tooling access, particularly across the areas of data structure, access, and semantics that were challenging to domain researchers. Active democratic participation supported by a governance framework like AgReFed’s will ensure participants have a say in how federations can deliver individual and collective benefits for members.

Keywords: agriculture, AgReFed, FAIR data, community, governance, RM-ODP, data federation

Context and contribution

The agriculture data landscape is complex, comprising a range of data types, standards, repositories, stakeholder needs, and commercial interests, creating data silos and potential "lock-ins" for consumers.[1][2] There is an urgent need to work toward clear, ethical, efficient agricultural data sharing practices[3][4] that incorporate improvements to discoverability, accessibility, interoperability, and quality of data across the value chain.[5][6][7] A priority stakeholder question across the agri-tech sector is "how do we create systems whereby people feel confident in entering and sharing data, and in turn how do we create systems to govern data for the benefit of all?"[2]

Agricultural data stakeholders span the public and private sector, including farmers, traders, researchers, universities, consultants, and consumers. Their varied needs around data type, trustworthiness, timeliness, availability, and accuracy shape the many data capture, storage, delivery, and value-add products emerging across the public and private sector.[1][8] Data providers require confidence in data infrastructure governance before they share their data, in turn requiring ethics of ownership, access, and control. Strong value propositions are also key, as they help grow participation via a "network effect," further increasing the value of the infrastructure.[2][7][9]

Offerings of the many data infrastructures vary and may include a means for:

  • depositing data for persistence, citation, publisher, and funding requirements[10];
  • increasing collaborative opportunities;
  • enhancing regulatory compliance;
  • improving on-farm operations;
  • leveraging standardization, quality assurance, and quality control pipelines and specialist analysis capacity[11][12];
  • running simulations through virtual research environments (VREs)[13];
  • performing cross-domain data integration[14]; and
  • linking data and models to knowledge products and decision support tooling.[15]

If the goal is to make data trusted, discoverable, and re-usable across the sector[16][17], then a single platform is unlikely to meet all (public, private, commercial) needs.[2][16] Sector concerns include, among others, vendor lock-in and a tendency towards stifling innovation.[2] As such, a grand challenge is how data can be discovered and made interoperable among so many different databases and infrastructures. One solution is a decentralized federated approach in which there is no single master data repository or registry[11] but rather a network of independent databases and infrastructures that deliver data through a shared platform using standard transfer protocols via application programming interfaces (APIs). The data remain with providers, as can access controls.

Preferring a single front-end source of data, as found in data federation, is not novel, and many of the FAIR (findable, accessible, interoperable, and re-usable) principles[18] underpin data federations’ functions. Some examples include the Earth System Grid Federation[19], materials science data discovery[20], and OneGeology.[21] In the case of agriculture, there is the Ag Data Commons[22], the proposed U.K. Food Data Trust[16], AgINFRA[23], and CGIAR Platform for Big Data In Agriculture.[24] Many of these data federation initiatives specify standards for the description and exchange of data, focusing on a particular data type of provider and/or providing a central intermediate space to standardize data. However, we believe agriculture requires a different approach given the diversity of data stores, one that addresses the ways data is structured, described, and delivered; differences in organizational and research requirements and norms; and economic, trust, and intellectual property concerns connected to agricultural data in general.

Since 2018, we have piloted a community-governed federation approach via the Agricultural Research Federation (AgReFed).[25] Participants provisioned their data holdings from their own choice of data repository aligned to their organization's capabilities and requirements of their research field. Concurrently, they aligned with collective expectations for FAIR data. This required developing acceptable levels of FAIR data to be implemented and governed by AgReFed participants. Current practices adopt FAIR as a high-level set of guiding principles[18] or a set of generalist practice recommendations.[26] This case study addressed this gap in an agricultural-specific implementation of FAIR in practice. As part of this study, we co-developed FAIR threshold criteria for participants to deliver data through a federation, and, through a consensus process, we integrated these FAIR thresholds into a framework for ongoing governance by a research domain community, for generating individual and collective benefit and growth of a data federation.

Use cases

The datasets of the pilot included point observations, as well as spatial, temporal, on-ground, sensor, and remote sensed data. The data described plants (yield, crop rotation, metabolomic, proteomic, hyperspectral), soil, and climatic variables from across Australia (Table 1).

Table 1. The data providers and their data products. An * indicates both data provider and user of the dataset or collection.

| In-text abbreviation | Data product name | Data product type | Data provider to AgReFed |
|---|---|---|---|
| SH (Soil Health) | Corangamite Soil Health Monitoring Program Data[27] | Dataset and service | Federation University, Centre for eResearch and Digital Innovation (CeRDI) |
| SMN-1 (Soil Moisture Network 1) | Southern Farming Systems Moisture Probe Network Data[28] | Dataset | Federation University, CeRDI |
| WT (Wheat Trials) | Waite Permanent Rotation Trial[29] | Dataset | * University of Adelaide, School of Agriculture, Food and Wine |
| NS (NatSoils) | CSIRO National Soil Site Database[30] | Dataset and service | Commonwealth Scientific and Industrial Research Organisation (CSIRO) |
| SLG (Soil and Landscape Grid) | Soil and Landscape Grid National Soil Attribute Maps - Available Water Capacity (3" resolution) - Release 1[31] | Data product (maps), collection, and service | CSIRO |
| FT (Frost Trials) | UWA/DPIRD Frost Nursery Trial 2018[32] | Dataset and collection | * University of Western Australia (UWA) and Department of Primary Industries and Regional Development (DPIRD) |
| SMN-2 (Soil Moisture Network 2) | SensorNets - SMART Farms Soil Moisture Network[33] | Dataset | * University of New England (UNE) |

The data providers defined a set of research use cases for the data in Table 1[34], identifying the current and anticipated data users and their ideal user experience. We then identified the requirements of the AgReFed platform, the data and metadata needed to deliver the use cases, and the FAIR principles that supported these requirements. These requirements are:

  • Allow the datasets and the services delivering the data to be discovered through metadata. Ideally the ability to discover should be persistent and through multiple avenues (Findable Q1, Q2, and Q3, and Accessible Q4 and Q7; Table 2).
  • Support appropriate data reuse and access controlled from the providers’ infrastructure through licensing, data access controls, and attribution (Accessible Q5 and Q6, and Reusable Q12 and Q14; Table 2).
  • Allow the data to be queried on user-defined parameters, including temporal and spatial properties, what is being measured (e.g., "wheat," "water"), the observed property being measured, the result, the procedure used to obtain the result, and the units of measurement (Interoperable Q9 and Q10, and Accessible Q6; Table 2).
  • Allow a subset of the data to be visualized through the platform and downloaded in a useable format (e.g., .csv). This requires a web service interface (Accessible Q6 and Interoperable Q8 and Q9; Table 2).
  • Allow the combining of data from different datasets (Interoperable Q8 and Q9; Table 2). This requires the ability to map terms in the data to external vocabularies and semantics (e.g., replacing local descriptive terms with published controlled vocabulary concepts, such as "m" or "meter" with "") (Interoperable Q10; Table 2).
  • Allow locality to be interoperable between datasets (e.g., latitude and longitude with coordinate reference system) (Interoperable Q9 and Q10; Table 2).
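
The query requirement in the list above can be pictured with a small sketch. It assumes a simplified observation record loosely modelled on the properties the list names (what is measured, observed property, procedure, result, unit, time, and location with a coordinate reference system). The field names, sample values, and the `query` helper are hypothetical illustrations, not AgReFed's platform API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    """One measurement; all field names are illustrative assumptions."""
    feature: str            # what is being measured, e.g. "soil" or "wheat"
    observed_property: str  # the property observed on that feature
    procedure: str          # how the result was obtained
    result: float
    unit: str
    observed_on: date
    lat: float              # decimal degrees
    lon: float              # decimal degrees
    crs: str = "EPSG:4326"  # coordinate reference system stated explicitly


def query(
    observations: Iterable[Observation],
    observed_property: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
    bbox: Optional[Tuple[float, float, float, float]] = None,
) -> List[Observation]:
    """Filter by property, inclusive date range, and a bounding box
    given as (min_lon, min_lat, max_lon, max_lat)."""
    selected = []
    for obs in observations:
        if observed_property is not None and obs.observed_property != observed_property:
            continue
        if start is not None and obs.observed_on < start:
            continue
        if end is not None and obs.observed_on > end:
            continue
        if bbox is not None:
            min_lon, min_lat, max_lon, max_lat = bbox
            if not (min_lon <= obs.lon <= max_lon and min_lat <= obs.lat <= max_lat):
                continue
        selected.append(obs)
    return selected


# Made-up sample records, for illustration only.
SAMPLE = [
    Observation("soil", "volumetric water content", "capacitance probe",
                0.23, "m3/m3", date(2019, 3, 1), -37.9, 143.8),
    Observation("soil", "volumetric water content", "capacitance probe",
                0.31, "m3/m3", date(2019, 7, 1), -30.5, 151.6),
    Observation("wheat", "grain yield", "plot harvest",
                2.4, "t/ha", date(2018, 12, 10), -34.9, 138.6),
]

hits = query(
    SAMPLE,
    observed_property="volumetric water content",
    start=date(2019, 1, 1),
    end=date(2019, 12, 31),
    bbox=(140.0, -40.0, 150.0, -35.0),
)
```

Filtering of this kind only works across datasets when each provider exposes the same properties in a structured form, which is why the requirements point to Interoperable Q9 and Q10.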

Table 2. AgReFed Version 1 FAIR thresholds for participation.[25] Light grey indicates the AgReFed minimum acceptable requirements ("Minimum thresholds") and dark grey the ideal ("Stretch targets"). The start-status and end-status indicate the progression of FAIR maturity. Data products are SH = Soil Health; SMN-1 = Soil Moisture Network 1; WT = Wheat Trials; NS = NatSoils; SLG = Soil and Landscape Grid; FT = Frost Trials; SMN-2 = Soil Moisture Network 2.

| Requirement and details | Start-status | End-status |
|---|---|---|
| Q1. The data product has been assigned (an) identifier(s). | | |
| No identifier | FT | |
| Local identifier | | |
| Web address (URL) | SH, SMN-1 | |
| Globally unique, citable, and persistent identifier (e.g., DOI, PURL, or Handle) | WT, NS, SLG, SMN-2 | SH, SMN-1, WT, NS, SLG, FT, SMN-2 |
| Q2. The data product identifier is included in all metadata records/files describing the data. | | |
| No | SH, SMN-1, FT, SMN-2 | |
| Q3. The data product is described by a metadata record. | | |
| Not described | SH, SMN-1, FT | |
| Brief title and description | SMN-2 | |
| Brief title, description, and other fields | WT, NS | |
| Comprehensively¹ in a formal metadata schema | SLG | SH, SMN-1, WT, NS, SLG, FT, SMN-2 |
| Q4. The data product is described by a metadata record that is indexed in a searchable registry or repository. | | |
| Not indexed | SH, SMN-1, FT | |
| Local institutional repository | | |
| Domain-specific repository | | |
| Generalist public repository | | |
| Discoverable through several places (i.e., other registries, Research Data Australia, Google Data Search) | WT, NS, SLG, SMN-2 | SH, SMN-1, FT, WT, NS, SLG, SMN-2 |
| Q5. How accessible is the data? The access method(s) must be explicitly stated in the metadata record, e.g., if any authentication is needed, or there are any restrictions to access. | | |
| Not accessible | SH, SMN-1 | |
| Access to metadata only | | |
| Through unspecified access conditions, e.g., "contact the data custodian to discuss access" | NS, FT, SMN-2 | SMN-2 |
| Embargoed access after a specified date; or a de-identified version of the data is publicly accessible | | |
| Fully accessible public, or to persons who meet and follow explicitly stated conditions and processes, e.g., ethics approval for sensitive data | WT, SLG | SH, SMN-1, NS, FT, WT, SLG |
| Q6. Data are available for reuse via a standardized communication protocol, such as file download over https, or a web service. | | |
| No access to data | SH, SMN-1, FT | |
| By individual arrangement | SMN-2 | SMN-2 |
| File download online | WT, SLG (partial) | |
| Non-standard web service (e.g., OpenAPI/Swagger/informal API) | | WT, FT |
| Standard web service API (e.g., OGC) | NS, SLG (partial) | SH, SMN-1, NS, SLG (full) |
| Q7. The repository/registry agrees to maintain the persistence of the metadata record, even if the data product is no longer available. | | |
| No, or not applicable if no metadata record | SH, SMN-1, FT | |
| Unsure | WT | |
| Yes | NS, SLG, SMN-2 | SH, SMN-1, NS, SLG, FT, SMN-2, WT |
| Q8. The data products are available in (an) open (file) format(s). | | |
| Data are mostly available only in a proprietary format | WT, FT | |
| Data are available in an open format | SH, SMN-1 | |
| Data are available in an open, documented, widely used standard format (e.g., NetCDF, CSV, JSON, XML) | NS, SLG, SMN-2 | SH, SMN-1, WT, NS, SLG, FT, SMN-2 |
| Q9. The data is machine-readable.² | | |
| The data are unstructured | SMN-1, WT, FT | |
| The data are structured and machine-readable (e.g., CSV, JSON, XML, RDF, database files) | SH, NS, SLG, SMN-2 | SH, SMN-1, WT, NS, SLG, FT, SMN-2 |
| Q10. The data are semantically interoperable, because they use standard, accessible ontologies and/or vocabularies to describe the data elements/variables. | | |
| Data elements are not described (i.e., fields or objects are labelled with codes or not at all) | SMN-2 | |
| Data elements are described (so that a human user can correctly interpret the data), but no standards have been used in the description | SH, SMN-1, WT, FT | SMN-2 |
| Recognized standards have been used in the description of data elements, but no published vocabularies with resolvable URIs | NS, SLG | SLG, FT |
| Published vocabularies using resolvable global identifiers linking to explanations are used, so that the data can be read and understood by machines as well as humans | | SH, SMN-1, NS, WT |
| Q11. The relationships to other data and resources (e.g., related datasets, services, publications, grants, etc.) are described in the metadata or data, to provide context around the data. | | |
| There are no links to other metadata or data | SH, SMN-1, FT, SMN-2 | SMN-2 |
| The metadata record includes URI links to related metadata, data, and definitions | WT, NS | NS |
| Qualified links to other resources are recorded in a machine-readable format, e.g., a linked data format such as RDF | SLG | SH, SMN-1, WT, SLG, FT |
| Q12. Machine-readable data licenses are assigned to each data product and are stated in the metadata record. | | |
| No license applied | SH, SMN-1, FT, SMN-2 | FT (standard license but not in metadata record) |
| Non-standard license applied, with a machine-readable license/license deed URL | WT | |
| Standard license applied, without a machine-readable license deed URL | | |
| Standard license applied, with a machine-readable license/license deed URL | NS, SLG | SH, SMN-1, WT, NS, SLG, SMN-2 |
| Q13. The provenance of the data product is described in the metadata, i.e., project objectives, data generation/collection (including from external sources) and processing workflows. | | |
| None recorded | SH, FT, SMN-2 | FT |
| Partially recorded | SMN-1, WT | SMN-2 |
| Comprehensively recorded in a text format (e.g., TXT or PDF) | NS, SLG | WT, NS, SLG |
| Comprehensively recorded in a machine-readable format (e.g., in metadata record’s schema or PROV, or in RDF, JSON, NetCDF, or XML) | | SH, SMN-1 |
| Q14. The preferred citation for the data product is provided in metadata record. | | |
| No | SH, FT, SMN-1 | |
| Citation but with no persistent identifiers | | |
| Citation with persistent identifiers | WT, NS, SLG, SMN-2 | SH, SMN-1, WT, NS, SLG, FT, SMN-2 |

¹ Indicates the minimum metadata requirement for data collections and services.[25]

² "Machine-readable" defined in terms of both syntax and structure, that is, as the representation of data products in a standard computer language that is structured in a way that is interpretable by machines.

Data collection and service records need to be discoverable through the federation’s platform.[35] AgReFed currently harvests from Research Data Australia.[34] Therefore, it is an additional requirement that minimum metadata is entered into or harvestable by Research Data Australia.[25]

Developing and testing the FAIR thresholds

The FAIR thresholds for AgReFed participation were co-developed by the participating research data experts and data providers. Baseline assessments of the "FAIRness" of providers’ data ("Start-status" in Table 2) were made using the Australian Research Data Commons (ARDC) FAIR data self-assessment tool.[36] The manual ARDC self-assessment tool articulates various levels of FAIR maturity (or "FAIRness") of data and metadata from "not at all discoverable" or "machine understandable," through to "fully understandable by both humans and machines." It also serves as an education resource for providers working to improve the FAIR maturity of their holdings.

Following this baseline assessment, providers determined where improvements could be made to move their data products along the FAIR continuum. Solutions were identified that met their own organization's goals and capabilities, as well as their end users’ needs. These were combined with requirements in the use cases to identify "Minimum thresholds" of data maturity required to support key platform functionality for data and metadata discovery, access and reuse through AgReFed. "Stretch targets" were also defined to communicate to the agricultural research community the level of data maturity that enables maximal data integration and (re)use (see the shading in Table 2).
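
The start-status and end-status columns of Table 2 can be read as positions on an ordered scale, which makes progress along the FAIR continuum easy to compute. The sketch below does this for Q1 only, using the Q1 levels and statuses listed in Table 2. The `MINIMUM_LEVEL` value is a placeholder assumption, because the shading that marks the actual "Minimum threshold" is not reproduced in the text here, and the helper names are hypothetical.

```python
# Ordered Q1 levels, least to most mature, as listed in Table 2.
Q1_LEVELS = [
    "No identifier",
    "Local identifier",
    "Web address (URL)",
    "Globally unique, citable, and persistent identifier",
]

# Q1 start-status per data product, taken from Table 2.
Q1_START = {
    "FT": "No identifier",
    "SH": "Web address (URL)",
    "SMN-1": "Web address (URL)",
    "WT": "Globally unique, citable, and persistent identifier",
    "NS": "Globally unique, citable, and persistent identifier",
    "SLG": "Globally unique, citable, and persistent identifier",
    "SMN-2": "Globally unique, citable, and persistent identifier",
}

# Q1 end-status: every product reached the top level (Table 2).
Q1_END = {product: Q1_LEVELS[-1] for product in Q1_START}

# ASSUMPTION: placeholder threshold for illustration only; the real
# "Minimum threshold" for Q1 is set in AgReFed policy, not here.
MINIMUM_LEVEL = "Web address (URL)"


def level_index(level: str) -> int:
    """Position of a level on the ordered scale (0 = least mature)."""
    return Q1_LEVELS.index(level)


def meets_minimum(level: str, minimum: str = MINIMUM_LEVEL) -> bool:
    """True when a level is at or above the chosen minimum."""
    return level_index(level) >= level_index(minimum)


def progress(start: dict, end: dict) -> dict:
    """Number of levels each product moved between start and end."""
    return {p: level_index(end[p]) - level_index(start[p]) for p in start}


steps = progress(Q1_START, Q1_END)
below_at_start = sorted(p for p, lvl in Q1_START.items() if not meets_minimum(lvl))
```

Under the placeholder threshold, only the Frost Trials product starts below the minimum, and it moves three levels by the end-status. The same structure would extend to the other questions by adding their level lists.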

As well as the addition of "Minimum thresholds" and "Stretch targets," the content of the ARDC FAIR tool was modified somewhat to assist with ease of interpretation (Table 2). Changes made in response to user feedback included:

  • Examples of some possible information and technology solutions were worked into the questions and answers.
  • The concept of "comprehensive" metadata was clearly specified for both data collection and service records.[25]
  • "Preferred citation" in the metadata was added as an AgReFed requirement (Q14).
  • The openness of the file format was separated from the machine readability of the data (Q8). The term "Machine-readable" was defined in terms of both syntax and structure, that is, as the representation of data products in a standard computer language structured in a way that is interpretable by machines (Q9).
  • A challenge for data providers was that their data and metadata were not only individual datasets contained in a single file but multiple collections, derivations (e.g., maps) and data service endpoints. Hence the ARDC FAIR assessment was refocused from the word "data" to "data product," being the data collection or product that is provided to users, along with any associated metadata or services required for its delivery. For simplicity of a manual assessment, our Q1 – 4, 7, 10, 12 and 13 are focused on assessment of the metadata and 5, 6, 8, 9 and 10 on the data. It is acknowledged that data and metadata can be assessed for FAIRness independently[26], and the feasibility of assessing this way for AgReFed’s purposes should be evaluated in the future.

The ARDC and the Centre for eResearch and Digital Innovation (CeRDI) supported the data providers to improve the level of FAIR maturity of their data across a twelve-month period (2018–2019). The baseline assessments, progress to the final states, and the information and technology solutions used at those states are available as supplementary data.[37] Some notable experiences informed the AgReFed FAIR "Minimum thresholds" to "Stretch targets" (Table 2). These included:

  • Providers’ exemplar data products each had different FAIR starting points (see "Start-status," Table 2).
  • Improvements to metadata records to meet AgReFed "Minimum threshold" requirements were possible with organizational library and ARDC support (Findability Q1 to Q4), so "Minimum thresholds" were set high for Q1 to Q3.
  • Access requirements and licensing varied. These were accommodated across the thresholds of Q5 and Q12.
  • Data format and structure (Interoperability Q8 and Q9) and data access method (Accessibility Q6) varied between providers, as did FAIR solutions. The solutions depended on the data types and on the organizational/research group aspirations, skills, and IT support available. Examples of acceptable solutions were therefore given for Interoperability Q8 and Q9, and a range from "Minimum thresholds" to "Stretch targets" was highlighted for Accessibility Q6. In one provider example, a data service provider converted sensor data from web-viewable-only HTML to Observations and Measurements (O&M) structured data in a machine-readable format (JSON), delivered through the SensorThings API via FROST-Server. In another, agronomic researchers converted data in Microsoft Excel tables to PostgreSQL and MySQL databases with partial O&M design patterns; these delivered JSON and CSV by Swagger PostgREST API and OpenAPI.
  • The semantic interoperability (Q10) of the data products was initially highly variable. However, no data providers utilized vocabularies that were FAIR[38] or near to FAIR. This was a "Stretch target" for providers, reflected in the sliding scale from the "Minimum threshold" (Q10). Providers described data with the URIs of external machine-readable vocabularies from within their database headers or lookup tables. These were expressed through the API endpoints. Challenges included finding and selecting vocabularies (including evaluating their authority and persistence) and the need to create new vocabularies[39] and therefore upskill.
  • Provenance was recorded in different formats, reflected in "Minimum threshold" to "Stretch target" (Reusable Q13). Improvements were inconsistent and further work is needed defining "comprehensive" content.
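
The practice of describing data elements with vocabulary URIs held in a lookup table can be sketched as follows. This is a minimal illustration and not the providers' implementation: the column names, the `https://example.org/...` URIs, and the `annotate` helper are placeholders, and a real deployment would point to published, resolvable vocabulary concepts.

```python
import csv
import io
import json

# Hypothetical lookup table: local column header -> vocabulary URIs.
# The example.org URIs are placeholders, not real vocabulary concepts.
LOOKUP = {
    "soil_moist": {
        "observedProperty": "https://example.org/vocab/property/soil-moisture",
        "unit": "https://example.org/vocab/unit/percent",
    },
    "depth_m": {
        "observedProperty": "https://example.org/vocab/property/depth",
        "unit": "https://example.org/vocab/unit/metre",
    },
}

# Made-up input resembling a spreadsheet export.
RAW_CSV = """site,timestamp,soil_moist,depth_m
A1,2019-03-01T00:00:00Z,23.5,0.3
"""


def annotate(csv_text: str, lookup: dict) -> list:
    """Turn each mapped cell into one observation record whose
    property and unit are given as URIs rather than local labels."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, terms in lookup.items():
            if column not in row:
                continue  # column absent from this file
            records.append({
                "site": row["site"],
                "time": row["timestamp"],
                "observedProperty": terms["observedProperty"],
                "unit": terms["unit"],
                "result": float(row[column]),
            })
    return records


records = annotate(RAW_CSV, LOOKUP)
print(json.dumps(records, indent=2))
```

Keeping the mapping in a lookup table, rather than renaming the source columns, lets a provider leave existing files untouched while the API output carries the URIs.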

FAIR threshold governance and implementation

The FAIR thresholds were presented to the AgReFed Council and approved through consensus (see AgReFed Council Terms of Reference)[40] for integration into AgReFed’s Membership and Technical Policy.[34][40] An in-depth discussion of AgReFed’s architecture is not the focus of this practice paper and is reported elsewhere.[25] However, we provide an overview in the context of how founding members implemented the governance around the FAIR thresholds.

AgReFed’s operation and design follow a federated architecture (Figure 1).[25] It draws on a Service-Orientated Architecture Reference Model design for Open Distributed Processing (RM-ODP)[41], with the addition of a unique "Social Architecture" viewpoint to structure social aspects of the system, such as governance. AgReFed’s Social Architecture adopts a democratic cooperative governance approach. It is led by its members to meet shared goals of self-governance, trust through active participation, and self-determination.[42][43] Governance, roles, and responsibilities are defined in the social (i.e., membership, financial, and strategic) and technical policies[34][40] that determine operation of AgReFed, including the implementation and governance of FAIR thresholds.


Fig. 1 The FAIR alignment process within the AgReFed architecture. FAIR thresholds are part of the alignment process for organizations to participate in the federation. They are integrated into governance and technical policies, as well as roles and responsibilities.[25]

The recommended process is that applications for membership to AgReFed are assessed by a Federation Data Steward.[25][40] They assess if the provider/provider community meets the membership policy, including whether the thresholds are met. The Technical Committee works with the Federation Data Steward, as well as potentially the Data Standards and Vocabularies Steward or delegated expert advisors/groups, to ensure the partners’ solutions align with and are integrated into the technical policy.

At this point, the provider has demonstrated alignment with collective expectations for FAIR data.[25] They nominate a Data Provider Collection Custodian and a member to Council and (optionally) to the Technical Committee. Their data and metadata are made discoverable/harvestable to AgReFed, and they officially become AgReFed members. All members have equal participation and decision rights. In this way, the community participates in the governance of the FAIR threshold settings, including how they are maintained and implemented.

Reflections and next steps

The manual AgReFed FAIR thresholds assessment, with practical examples and definitions, was useful for helping data providers conduct meaningful assessments of their data across the full continuum of data maturity. It was also useful for developing and implementing work plans. However, to enable transparency, repeatability, and scalability of assessments across the agricultural domain, some improvements could be made. Where data, metadata, and services are machine-actionable, automated assessment could be used to support scalability and repeatability.[44] In contrast, manual assessment is still required for less mature data and metadata, or where a more nuanced interpretation is required, e.g., content to enhance re-usability. In the current platform phase[45] we plan to improve repeatability of the FAIR threshold assessment by integrating a hybrid (semi-automated) approach.[46] To improve transparency and repeatability, the evidence required for both manual and machine assessment will need to be specified and may include, as examples, screenshots and automated assessment outputs.
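
One way to picture a hybrid approach is to run automated checks where the metadata is machine-actionable and to queue everything else for a steward. The sketch below assumes a metadata record held as a plain dictionary. The field names, the two prefix-based heuristics, and the routing rule are hypothetical and do not describe AgReFed's planned tooling.

```python
from typing import Callable, Dict, Optional

# A check returns True/False, or None when it cannot decide automatically.
Check = Callable[[dict], Optional[bool]]


def has_persistent_identifier(record: dict) -> Optional[bool]:
    """Rough heuristic: does the identifier look like a DOI or Handle?"""
    identifier = record.get("identifier")
    if identifier is None:
        return None  # nothing to inspect, so leave it to a person
    return identifier.startswith(("https://doi.org/", "doi:", "hdl:"))


def has_licence_url(record: dict) -> Optional[bool]:
    """Rough heuristic: is a licence given as an https URL?"""
    licence = record.get("licence_url")
    if licence is None:
        return None
    return licence.startswith("https://")


# ASSUMPTION: which questions are automated is an illustrative choice.
AUTOMATED_CHECKS: Dict[str, Check] = {
    "Q1": has_persistent_identifier,
    "Q12": has_licence_url,
}
MANUAL_ONLY = ["Q13"]  # e.g. judging whether provenance is "comprehensive"


def assess(record: dict) -> Dict[str, str]:
    """Return 'pass', 'fail', or 'manual review' for each question."""
    outcome = {}
    for question, check in AUTOMATED_CHECKS.items():
        verdict = check(record)
        if verdict is None:
            outcome[question] = "manual review"
        else:
            outcome[question] = "pass" if verdict else "fail"
    for question in MANUAL_ONLY:
        outcome[question] = "manual review"
    return outcome


# Made-up record for illustration; the DOI is not a real one.
example = {"identifier": "https://doi.org/10.0000/example", "licence_url": None}
result = assess(example)
```

Recording which route produced each verdict is one simple way to capture the evidence that the text says will need to be specified for both manual and machine assessment.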

Our experience highlighted that the expertise of a Federation Data Steward will be essential for assisting partners with FAIR threshold assessments. As the federation grows, the assessments will encompass more standards and technology solutions used by different communities (e.g., FAIRsharing).[47] Whether various solutions align with or should be integrated into AgReFed technical policy will need to be evaluated by the steward in consultation with the Technical Committee. Keeping up-to-date with current developments such as the FAIR Data Maturity Model[26] will ensure the relevance and currency of the policy and thresholds. A dedicated Federation Standards and Vocabulary Steward[25] would be valuable here for brokering conversations with expert domain communities and working groups that can advise or make delegated decisions.

Here, we focused on defining FAIR data thresholds. However, we recognize that repositories that the data is served from should be "FAIR data-enabling" as a critical component of the broader "FAIR data ecosystem."[44][48] There are various ways of assessing or accrediting repositories relating to areas of security risk management, organizational and physical infrastructure, and digital object management.[49] As a preliminary trial, we included a "pass" or "fail" assessment of several CoreTrustSeal requirements deemed necessary for persistent delivery of trusted agricultural data whilst not being onerous and disincentivizing participation.[25] Our early experience showed that research scientists and even data managers had difficulty knowing if their repositories complied. Furthermore, there were challenges knowing what to assess if the data products were served from multiple repositories. AgReFed could play a role helping providers assess and choose repositories that meet community expectations. We look forward to learning about the solutions of other domains here.

Our experience was that the starting point for FAIRness of pilot participants’ data varied, as did their priorities, capacities, and solutions for improving. To ensure these viewpoints were encapsulated, setting the FAIR thresholds and their governance and implementation was done through consensus with providers. It is envisaged that this active participation will help ensure the settings are realistic and promote trust and self-determination, giving providers incentive to participate. The thresholds aimed to strike a balance between the realities and priorities of providers so as not to disincentivize participation whilst also aiming to inspire, support, and educate for fully FAIR data while meeting end-user needs.

Improving the FAIRness of data took resourcing, and as such, we recognize value propositions are required for providers to have confidence in participation. Benefits to founding partners included being an exemplar of FAIR best practice at the institutional level, making access and re-use easier for end-users, and being able to combine data types for research insights (see AgReFed's use case stories).[35] Providers benefited from metadata guidance through education resources, library, and licensing support. Expert assistance, including from providers’ organizational IT, was required for data structuring, access through APIs, and finding, selecting, creating, and applying vocabularies. In one case (SMN-2), institutional IT resourcing for data service work was a challenge but raised the prioritization of upgrades now being worked on. Data federations can support collective advocacy, relationship brokering, and tailored support across these areas.

The provision, assembling, and demonstration of tooling resources for data providers’ various needs, priorities and capabilities would also lower the cost of delivering FAIR data, thereby incentivizing federation participation. Examples across the data management cycle include data management plans, data collection tools[50], data deposition tools[51], and example protocol/reference implementations.[52] This is a focus of AgReFed’s next phase. Virtual research environments with example workflows are also being integrated. Furthermore, the federation can continue to align/encourage membership with intermediates or broker platforms that offer value in specific fields of research, including in data standardization.

The current phase has focused on research institutes. Expanding participation to cooperatives, research development corporations, industry, and farmers—as envisaged by members[53]—will require incentivization. The governance structure of AgReFed enables the community to make policy adjustments to support this. For example, alternative funding models may be leveraged, such as user-pays for certain services and data in the competitive space. Stakeholders can bring assets aside from data to the table to help meet the varied needs of participants. Recognizing this, membership was recently expanded to providers of tooling, infrastructure, and other resources. Active participation through the federation will help ensure individual and collective benefits are delivered across the agricultural research sector, including through FAIR and trusted data.


Acknowledgements

We gratefully acknowledge the assistance from Catherine Brady and Melanie Barlow (ARDC) for metadata and services support; the data contributions of Dr. Ben Biddulph (Department of Primary Industries and Regional Development, WA), Southern Farming Systems, and Corangamite Catchment Management Authority; technical development by Andrew MacLeod, Scott Limmer, and Heath Gillett (Federation University); Linda Gregory (CSIRO, National Soil Data and Information); Daniel Watkins (University of New England); vocabulary support by Simon Cox (CSIRO, Environmental Informatics); and policy work and manuscript feedback from Dr. Joel Epstein. Thank you to all those providing review of AgReFed documents cited herein, including Dr. Andrew Treloar (ARDC), Prof. Harvey Millar (The University of Western Australia), Peter Wilson (CSIRO, National Soil Data and Information), Assoc. Prof. Peter Dahlhaus and Jude Channon (Federation University), Prof. Matthew Gilliham (University of Adelaide), Dr. Bettina Berger (University of Adelaide), Dr. Kay Steel (Federation University), and Dr. Rachelle Hergenhan (University of New England).

Author contributions

The methodology and framework were developed by P. Box, derived from previous work and experiences, with contributions to development and implementation of the framework by B. Simons, K. Levett, A. MacLeod, and M. Wong. Testing and feedback were provided by R. David, D. Schneider, and N. Taylor. Contributions to writing social and/or technical policies were made by A. Lee, P. Box, H. Thompson, B. Simons, A. MacLeod, and M. Wong. M. Wong led article writing, with significant writing contributions from K. Levett and A. Lee. All authors contributed to writing, including revising critically for intellectual input. P. Box refined methodological scope and significance in early drafts.


Funding

This research was supported by the Australian Research Data Commons (ARDC) Agriculture Research Data Cloud project (DC063) and ARDC Discovery Activities (TD018). The ARDC is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).

Data accessibility

Datasets informing this research are available by open licence permitting unrestricted access unless otherwise stated. The FAIR assessments are available online.

Competing interests

The authors have no competing interests to declare.


References

  1. Kenney, Martin; Serhan, Hiam; Trystram, Gilles (2020). "Digitization and Platforms in Agriculture: Organizations, Power Asymmetry, and Collective Action Solutions" (in en). SSRN Electronic Journal. doi:10.2139/ssrn.3638547. ISSN 1556-5068. 
  2. Ingram, Julie; Maye, Damian; Bailye, Clive; Barnes, Andrew; Bear, Christopher; Bell, Matthew; Cutress, David; Davies, Lynfa et al. (1 March 2022). "What are the priority research questions for digital agriculture?" (in en). Land Use Policy 114: 105962. doi:10.1016/j.landusepol.2021.105962. 
  3. Jakku, Emma; Taylor, Bruce; Fleming, Aysha; Mason, Claire; Fielke, Simon; Sounness, Chris; Thorburn, Peter (1 December 2019). "“If they don’t tell us what they do with it, why would we trust them?” Trust, transparency and benefit-sharing in Smart Farming" (in en). NJAS: Wageningen Journal of Life Sciences 90-91 (1): 1–13. doi:10.1016/j.njas.2018.11.002. ISSN 1573-5214. 
  4. Wiseman, L.; Sanderson, J. (2018). "Legal and trust issues in Australian agriculture". Proceedings of the 40th Annual Conference Australian Society of Sugar Cane Technologists. 
  5. Barry, S.; Darnell, R.; Grundy, M. et al. (2017). "Precision to Decision – Current and future state of agricultural data for digital agriculture in Australia" (PDF). CSIRO and Cotton Research and Development Corporation. Retrieved 11 November 2021. 
  6. Perrett, E.; Heath, R.; Laurie, A. et al. (2017). "Accelerating precision agriculture to decision agriculture – Analysis of the economic benefit and strategies for delivery of digital agriculture in Australia" (PDF). Australian Farm Institute and Cotton Research and Development Corporation. Retrieved 11 November 2021. 
  7. Box, Paul; Reeson, Andrew; Sanderson, Todd (2017). Cultivating Trust: Towards an Australian Agricultural Data Market. doi:10.4225/08/5A05E92767CA4. 
  8. Allemang, D.; Teegarden, B. (August 2016). "A Global Data Ecosystem for Agriculture and Food" (PDF). Global Open Data for Agriculture and Nutrition. Retrieved 17 December 2021. 
  9. Chiles, Robert M.; Broad, Garrett; Gagnon, Mark; Negowetti, Nicole; Glenna, Leland; Griffin, Megan A. M.; Tami-Barrera, Lina; Baker, Siena et al. (1 December 2021). "Democratizing ownership and participation in the 4th Industrial Revolution: challenges and opportunities in cellular agriculture" (in en). Agriculture and Human Values 38 (4): 943–961. doi:10.1007/s10460-021-10237-7. ISSN 0889-048X. PMC 8383920. PMID 34456466. 
  10. DataCite Association (2022). "Repository Finder". DataCite Association. Retrieved 2 March 2022. 
  11. Harper, Lisa; Campbell, Jacqueline; Cannon, Ethalinda K S; Jung, Sook; Poelchau, Monica; Walls, Ramona; Andorf, Carson; Arnaud, Elizabeth et al. (1 January 2018). "AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture" (in en). Database 2018. doi:10.1093/database/bay088. ISSN 1758-0463. PMC 6146126. PMID 30239679. 
  12. Wicquart, Jérémy; Gudka, Mishal; Obura, David; Logan, Murray; Staub, Francis; Souter, David; Planes, Serge (1 May 2022). "A workflow to integrate ecological monitoring data from different sources" (in en). Ecological Informatics 68: 101543. doi:10.1016/j.ecoinf.2021.101543. 
  13. Knapen, M. J. Rob; Lokers, Rob M.; Candela, Leonardo; Janssen, Sander (2020), Athanasiadis, Ioannis N.; Frysinger, Steven P.; Schimak, Gerald et al., eds., "AGINFRA PLUS: Running Crop Simulations on the D4Science Distributed e-Infrastructure" (in en), Environmental Software Systems. Data Science in Action (Cham: Springer International Publishing) 554: 81–89, doi:10.1007/978-3-030-39815-6_8, ISBN 978-3-030-39814-9, Retrieved 10 June 2022 
  14. Kruseman, Gideon; Bairagi, Subir; Komarek, Adam M.; Molero Milan, Anabel; Nedumaran, Swamikannu; Petsakos, Athanasios; Prager, Steven; Yigezu, Yigezu A. (1 March 2020). "CGIAR modeling approaches for resource‐constrained scenarios: II. Models for analyzing socioeconomic factors to improve policy recommendations" (in en). Crop Science 60 (2): 568–581. doi:10.1002/csc2.20114. ISSN 0011-183X. 
  15. Antle, John M; Basso, Bruno; Conant, Richard T; Godfray, H Charles J; Jones, James W; Herrero, Mario; Howitt, Richard E; Keating, Brian A et al. (1 July 2017). "Towards a new generation of agricultural system data, models and knowledge products: Design and improvement" (in en). Agricultural Systems 155: 255–268. doi:10.1016/j.agsy.2016.10.002. PMC 5485644. PMID 28701817. 
  16. Brewer, Stephen; Pearson, Simon; Godsiff, Philip; Maull, Roger (23 March 2021) (in en). Food Data Trust: A framework for information sharing. doi:10.5281/zenodo.4575565. 
  17. Ernst & Young (March 2019). "Agricultural Innovation – A National Approach to Grow Australia’s Future" (PDF). Ernst & Young. Retrieved 24 March 2022. 
  18. Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem et al. (15 March 2016). "The FAIR Guiding Principles for scientific data management and stewardship" (in en). Scientific Data 3 (1): 160018. doi:10.1038/sdata.2016.18. ISSN 2052-4463. PMC 4792175. PMID 26978244. 
  19. Petrie, Ruth; Denvil, Sébastien; Ames, Sasha; Levavasseur, Guillaume; Fiore, Sandro; Allen, Chris; Antonio, Fabrizio; Berger, Katharina et al. (29 January 2021). "Coordinating an operational data distribution network for CMIP6 data" (in en). Geoscientific Model Development 14 (1): 629–644. doi:10.5194/gmd-14-629-2021. ISSN 1991-9603. 
  20. Plante, Raymond L.; Becker, Chandler A.; Medina-Smith, Andrea; Brady, Kevin; Dima, Alden; Long, Benjamin; Bartolo, Laura M.; Warren, James A. et al. (13 April 2021). "Implementing a Registry Federation for Materials Science Data Discovery" (in en). Data Science Journal 20: 15. doi:10.5334/dsj-2021-015. ISSN 1683-1470. PMC 8596377. PMID 34795758. 
  21. OneGeology (2020). "OneGeology". https://www.on Retrieved 24 March 2022. 
  22. U.S. Department of Agriculture. "Ag Data Commons". National Agriculture Library. U.S. Department of Agriculture. Retrieved 24 March 2022. 
  23. Drakos, Andreas; Protonotarios, Vassilis; Manouselis, Nikos (15 June 2015). "agINFRA: a research data hub for agriculture, food and the environment" (in en). F1000Research 4: 127. doi:10.12688/f1000research.6349.2. ISSN 2046-1402. PMC 4544381. PMID 26339472. 
  24. CGIAR (2021). "Platform for Big Data in Agriculture". CGIAR. Retrieved 15 November 2021. 
  25. Box, Paul; Levett, Kerry; Simons, Bruce; Wong, Megan (2019). Guidelines for the development of a Data Stewardship and Governance Framework for the Agricultural Research Federation (AgReFed). doi:10.25919/5CF179BA35DB9. 
  26. Bahim, Christophe; Casorrán-Amilburu, Carlos; Dekkers, Makx; Herczog, Edit; Loozen, Nicolas; Repanas, Konstantinos; Russell, Keith; Stall, Shelley (27 October 2020). "The FAIR Data Maturity Model: An Approach to Harmonise FAIR Assessments" (in en). Data Science Journal 19: 41. doi:10.5334/dsj-2020-041. ISSN 1683-1470. 
  27. Miller, L. (20 December 2018), "Corangamite soil health monitoring program data", CeRDI Datasets (Centre for eResearch and Digital Innovation), doi:10.25955/5c1c6b8f4d8d2 
  28. Miller, L.; Midwood, J. (2019), "Southern Farming Systems Moisture Probe Network Data", CeRDI Datasets (Centre for eResearch and Digital Innovation), doi:10.25955/5cdcff6168a76 
  29. Sanderman, Jonathan; David, Rakesh; Moore, Andrew; Keith, Heather; Farquharson, Ryan (2015), "Waite Permanent Rotation Trial", CSIRO Data Access Portal (CSIRO), doi:10.4225/08/55e5165ec0d29 
  30. CSIRO (2020), "CSIRO National Soil Site Database", CSIRO Data Access Portal (CSIRO), doi:10.25919/5eeb2a56eac12 
  31. Viscarra Rossel, Raphael; Chen, Charlie; Grundy, Mike; Searle, Ross; Clifford, David; Odgers, Nathan; Holmes, Karen; Griffin, Ted et al. (2014), "Soil and Landscape Grid National Soil Attribute Maps - Available Water Capacity (3" resolution) - Release 1", CSIRO Data Access Portal (CSIRO), doi:10.4225/08/546ed604add8a 
  32. Leske, Brenton (2019). UWA/DPIRD Frost Nursery Trial 2018. Nicolas Taylor, Ben Biddulph, Harvey Millar, Department Of Primary Industries And Regional Development. doi:10.26182/5CEDF001186F3. 
  33. Moore, Darren; Gaire, Raj (2018), Precision Agriculture Research Group, "SensorNets - SMART Farms Soil Moisture Network", Research UNE (University of New England, Australia), doi:10.4226/95/5b10d5ca18aef 
  34. MacLeod, Andrew; Wong, Megan; Gregory, Linda; Schneider, Derek; Williams, Andrew; Castleden, Ian; Simons, Bruce; Levett, Kerry et al. (13 May 2020) (in en). The Agricultural Research Federation (AgReFed) Technical and Information Policy Suite. doi:10.5281/ZENODO.3993784. 
  35. Agricultural Research Federation (2021). "AgReFed: Making the most of agricultural data for research". Agricultural Research Federation. Retrieved 11 November 2021. 
  36. Schweitzer, Martin; Levett, Kerry; Russell, Keith; White, Andrew; Unsworth, Kathryn (17 June 2021), "au-research/FAIR-Data-Assessment-Tool: Release v1.0", Zenodo (CERN), doi:10.5281/zenodo.4971127 
  37. Levett, Kerry; Wong, Megan; MacLeod, Andrew (12 May 2022), "Testing of AgReFed FAIR data Minimum Thresholds and Stretch Targets", Zenodo (CERN), doi:10.5281/zenodo.6541413 
  38. Cox, Simon J. D.; Gonzalez-Beltran, Alejandra N.; Magagna, Barbara; Marinescu, Maria-Cristina (16 June 2021). Markel, Scott. ed. "Ten simple rules for making a vocabulary FAIR" (in en). PLOS Computational Biology 17 (6): e1009041. doi:10.1371/journal.pcbi.1009041. ISSN 1553-7358. PMC 8238180. PMID 34133421. 
  39. Cox, Simon; Gregory, Linda (2020), "RDF representation of ASLS soil profile classification", CSIRO Data Access Portal (CSIRO), doi:10.25919/5f42f324b2ef8 
  40. Wong, Megan; Box, Paul; Epstein, Joel; Lee, Ashlin; Thompson, Helen; Levett, Kerry; Channon, Judy; Wilson, Peter et al. (15 June 2021) (in en). Agricultural Research Federation (AgReFed) Steering Policies, Roles and Responsibilities. doi:10.5281/ZENODO.5205273. 
  41. International Organization for Standardization (December 2009). "ISO/IEC 10746-3:2009 Information technology — Open distributed processing — Reference model: Architecture — Part 3". International Organization for Standardization. 
  42. Buchanan, James M. (1 February 1965). "An Economic Theory of Clubs". Economica 32 (125): 1. doi:10.2307/2552442. 
  43. "Data Cooperatives". Building the New Economy. Alex Pentland, Alexander Lipton, Thomas Hardjono (0 ed.). PubPub. 30 April 2020. doi:10.21428/ba67f642.0499afe0. 
  44. Devaraju, Anusuriya; Mokrane, Mustapha; Cepinskas, Linas; Huber, Robert; Herterich, Patricia; de Vries, Jerry; Akerman, Vesa; L’Hours, Hervé et al. (3 February 2021). "From Conceptualization to Implementation: FAIR Assessment of Research Data Objects" (in en). Data Science Journal 20: 4. doi:10.5334/dsj-2021-004. ISSN 1683-1470. 
  45. Australian Research Data Commons (8 December 2020). "AgReFed: A platform for the transformation of agricultural research". Australian Research Data Commons. doi: 
  46. Gehlen, Karsten Peters-von; Höck, Heinke; Fast, Andrej; Heydebreck, Daniel; Lammert, Andrea; Thiemann, Hannes (24 March 2022). "Recommendations for Discipline-Specific FAIRness Evaluation Derived from Applying an Ensemble of Evaluation Tools" (in en). Data Science Journal 21: 7. doi:10.5334/dsj-2022-007. ISSN 1683-1470. 
  47. the FAIRsharing Community; Sansone, Susanna-Assunta; McQuilton, Peter; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Izzo, Massimiliano; Lister, Allyson L.; Thurston, Milo (1 April 2019). "FAIRsharing as a community approach to standards, repositories and policies" (in en). Nature Biotechnology 37 (4): 358–367. doi:10.1038/s41587-019-0080-8. ISSN 1087-0156. PMC 6785156. PMID 30940948. 
  48. European Commission Directorate General for Research and Innovation; Collins, S.; Genova, F. et al. (2018). Turning FAIR into reality: Final report and action plan from the European Commission expert group on FAIR data. European Commission. doi:10.2777/1524. 
  49. Lin, Dawei; Crabtree, Jonathan; Dillo, Ingrid; Downs, Robert R.; Edmunds, Rorie; Giaretta, David; De Giusti, Marisa; L’Hours, Hervé et al. (1 December 2020). "The TRUST Principles for digital repositories" (in en). Scientific Data 7 (1): 144. doi:10.1038/s41597-020-0486-7. ISSN 2052-4463. PMC 7224370. PMID 32409645. 
  50. Devare, Medha; Aubert, Céline; Benites Alfaro, Omar Eduardo; Perez Masias, Ivan Omar; Laporte, Marie-Angélique (11 October 2021). "AgroFIMS: A Tool to Enable Digital Collection of Standards-Compliant FAIR Data". Frontiers in Sustainable Food Systems 5: 726646. doi:10.3389/fsufs.2021.726646. ISSN 2571-581X. 
  51. Shaw, Felix; Etuk, Anthony; Minotto, Alice; Gonzalez-Beltran, Alejandra; Johnson, David; Rocca-Serra, Philippe; Laporte, Marie-Angélique; Arnaud, Elizabeth et al. (2 June 2020). "COPO: a metadata platform for brokering FAIR data in the life sciences" (in en). F1000Research 9: 495. doi:10.12688/f1000research.23889.1. ISSN 2046-1402. 
  52. Dutch Techcentre for Life Sciences. "FAIR Data Point". Dutch Techcentre for Life Sciences. Retrieved 24 March 2022. 
  53. Box, Paul; Epstein, Joel; Wong, Megan; Thompson, Helen; Lee, Ashlin; Levett, Kerry; Hergenhan, Rachelle; Dahlhaus, Peter et al. (29 August 2019). "White Paper for the enactment phase of the Agricultural Research Federation (AgReFed)" (in en). Zenodo. doi:10.5281/zenodo.3706374. 


Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases, important information was missing from the references, and that information was added. The original references are in alphabetical order; this version places them in order of appearance, by design.