Journal:Data and metadata brokering – Theory and practice from the BCube Project

From LIMSWiki
Full article title Data and metadata brokering – Theory and practice from the BCube Project
Journal Data Science Journal
Author(s) Khalsa, Siri Jodha Singh
Author affiliation(s) University of Colorado
Primary contact Email: sjsk at nsidc dot org
Year published 2017
Volume and issue 16(1)
Page(s) 1
DOI 10.5334/dsj-2017-001
ISSN 1683-1470
Distribution license Creative Commons Attribution 4.0 International


EarthCube is a U.S. National Science Foundation initiative that aims to create a cyberinfrastructure (CI) for all the geosciences. An initial set of "building blocks" was funded to develop potential components of that CI. The Brokering Building Block (BCube) created a brokering framework to demonstrate cross-disciplinary data access based on a set of use cases developed by scientists from the domains of hydrology, oceanography, polar science and climate/weather. While some successes were achieved, considerable challenges were encountered. We present a synopsis of the processes and outcomes of the BCube experiment.

Keywords: interoperability, brokering, middleware, EarthCube, cross-domain, socio-technical

Genesis and objectives of EarthCube

In 2011 the U.S. National Science Foundation initiated EarthCube, a joint effort of NSF’s Office of Cyberinfrastructure (OCI), whose interest was in computational and data-rich science and engineering, and the Geosciences Directorate (GEO), whose interest was in understanding and forecasting the behavior of a complex and evolving Earth system. The goal in creating EarthCube was to create a sustainable, community-based and open cyberinfrastructure for all researchers and educators across the geosciences.

The NSF recognized there was no infrastructure that could manage and provide access to all geosciences data in an open, transparent and inclusive manner, and that progress in geosciences would be increasingly reliant on interdisciplinary activities. Therefore, a system that enabled the sharing, interoperability and re-use of data needed to be created.

Similar efforts to provide the infrastructure needed to support scientific research and innovation are underway in other countries, most notably in the European Union, guided by the European Strategy Forum on Research Infrastructures (ESFRI), and in Australia under the National Collaborative Research Infrastructure Strategy (NCRIS). The goal of these efforts is to provide scientists, policy makers and the public with computing resources, analytic tools and educational material, all within an open, interconnected and collaborative environment.

The nature of infrastructure development

The building of infrastructure is as much a social endeavor as a technical one. Bowker et al.[1] emphasized that information infrastructures comprise more than the data, tools and networks that make up their technical elements; they also involve the people, practices, and institutions that lead to the creation, adoption and evolution of the underlying technology. The NSF realized that a cyberinfrastructure, to be successful, must have substantial involvement of the target community through all phases of its development, from inception to deployment. In fact, studies have shown that infrastructure evolves from independent and isolated efforts and that there is no clear point at which “deployment” is complete.[2] The fundamental challenge was the heterogeneity of the scientific disciplines and technologies that needed to work together to accomplish this goal, and the necessity of getting all stakeholders to cooperate in its development. A compounding factor is that while technology evolves rapidly, people’s habits, work practices, cultural attitudes towards data sharing, and willingness to use others' data all evolve more slowly. How the relationship of people to the infrastructure evolves determines whether it succeeds or fails.

A significant element of NSF’s strategy for building EarthCube was to make it a collective effort of geoscientists and technologists from the start, in hopes of ensuring that what was developed did indeed serve the needs of geoscientists and would in fact find widespread uptake. A series of community events and end-user workshops spanning the geoscience disciplines were undertaken with the dual goals of gathering requirements for EarthCube and building a community of geoscientists willing to engage with and take ownership of the EarthCube process.

NSF began issuing small awards to explore concepts for EarthCube. These were followed by the funding of an initial set of "building blocks" meant to demonstrate potential components of EarthCube. The Brokering Building Block (BCube) was one of these awards. BCube sought both to solve real problems of interoperability that geoscientists face in carrying out research and to study the social aspects of technology adoption.

The challenge of cross-disciplinary interoperability

Interoperability has many facets and can be viewed from either the perspective of systems or people. Systems are interoperable when they can exchange information without having to know the details of each other's internal workings. Likewise, people view systems or data as interoperable when they don’t have to learn the intricacies of each in order to use them. When systems are interoperable, users of those systems should have uniform access and receive harmonized services and data from them. This is the vision of EarthCube. Delivering on that vision can be considered the "grand challenge" of information technology as applied to the geosciences.

Achieving interoperability across the geosciences is so challenging because the many scientific fields that make up the geosciences all have their own methods, standards and conventions for managing and sharing data. The sophistication of the information technologies that have been adopted in each community, the degree of standardization on data exchange formats and vocabularies, the amount of centralization in data cataloguing, and the openness to sharing data all vary greatly.

The methods of achieving interoperability across distributed systems can be categorized as shown in Table 1.

Table 1. Methods for achieving interoperability
Method | Requirements | Benefits
Adherence to common standards | Uniformity in system configuration | De facto interoperability
Gateways and translators | Installation and maintenance of custom or third-party software | Can adapt to new or changing protocols and standards
Brokers as infrastructure, third-party mediation | Creation and maintenance of brokering framework with custom adapters | Provides two-way translations between disparate systems and removes burdens of interoperability from data provider

Since disciplines will always use different standards for encoding, accessing and describing data, the first option is not a realistic one for the geosciences. The second method is currently in wide use within the geosciences. GBIF (Edwards, Lane and Nielsen, 2000), for example, harvests metadata from multiple external systems, which serve them through different protocols and use different schemas, and then maps the metadata to a common standard. Systems such as ERDDAP (Simons and Mendelssohn, 2012; Delaney, Alessandrini and Greidanus, 2016) act as servers that access disparate datasets and serve them through a common interface. What BCube explored was the possibility that a broker, mediating the interactions between many systems serving data and many systems requesting data, could be established as a shared service, i.e. as infrastructure, without being tied to any particular repository or user portal.
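The third method in Table 1 can be illustrated with a minimal sketch. It is not the BCube framework itself: the `CommonRecord` model, the two adapters, the provider field names (`siteName`, `tags`, `dataset_title`, `subjects`) and the in-memory "providers" are all hypothetical, chosen only to show how a broker with per-provider adapters lets a client issue one query without knowing each provider's native schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CommonRecord:
    """Hypothetical common metadata model exposed by the broker."""
    title: str
    keywords: List[str]
    source: str


# An adapter translates one provider's native record into the common model.
Adapter = Callable[[dict], CommonRecord]
# A fetcher returns a provider's native records (a stand-in for a remote call).
Fetcher = Callable[[], List[dict]]


def hydrology_adapter(native: dict) -> CommonRecord:
    # Assumed native schema: 'siteName' plus a comma-separated 'tags' string.
    return CommonRecord(
        title=native["siteName"],
        keywords=[t.strip().lower() for t in native["tags"].split(",")],
        source="hydrology",
    )


def ocean_adapter(native: dict) -> CommonRecord:
    # Assumed native schema: 'dataset_title' plus a 'subjects' list.
    return CommonRecord(
        title=native["dataset_title"],
        keywords=[s.lower() for s in native["subjects"]],
        source="oceanography",
    )


class Broker:
    """Mediates between many providers and many clients via adapters."""

    def __init__(self) -> None:
        self._providers: Dict[str, Tuple[Fetcher, Adapter]] = {}

    def register(self, name: str, fetch: Fetcher, adapter: Adapter) -> None:
        # The provider changes nothing on its side; the broker holds the adapter.
        self._providers[name] = (fetch, adapter)

    def search(self, keyword: str) -> List[CommonRecord]:
        # Query every registered provider and return harmonized records.
        keyword = keyword.lower()
        results: List[CommonRecord] = []
        for fetch, adapter in self._providers.values():
            for native in fetch():
                record = adapter(native)
                if keyword in record.keywords:
                    results.append(record)
        return results


broker = Broker()
broker.register(
    "hydro",
    lambda: [{"siteName": "Boulder Creek gauge", "tags": "Streamflow, Snowmelt"}],
    hydrology_adapter,
)
broker.register(
    "ocean",
    lambda: [{"dataset_title": "Arctic buoy temperatures", "subjects": ["SST", "Snowmelt"]}],
    ocean_adapter,
)
hits = broker.search("snowmelt")
for hit in hits:
    print(hit.source, "-", hit.title)
```

In this sketch, supporting a new discipline means writing one more adapter and registering it with the broker; neither the existing providers nor the clients need to change, which is the sense in which the burden of interoperability is removed from the data provider.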


  1. Bowker, G.C.; Baker, K.; Millerand, F.; Ribes, D. (2010). "Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment". In Hunsinger, J.; Klastrup, L.; Allen, M. International Handbook of Internet Research. Springer Netherlands. pp. 97–117. ISBN 9781402097898. 
  2. Star, S.L.; Ruhleder, K. (1996). "Steps toward an ecology of infrastructure: Design and access for large information spaces". Information Systems Research 7 (1): 111–134. doi:10.1287/isre.7.1.111. 


This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version — by design — lists them in order of appearance. Footnotes have been changed from numbers to letters as citations are currently using numbers.