Journal:Data and metadata brokering – Theory and practice from the BCube Project

Full article title Data and metadata brokering – Theory and practice from the BCube Project
Journal Data Science Journal
Author(s) Khalsa, Siri Jodha Singh
Author affiliation(s) University of Colorado
Primary contact Email: sjsk at nsidc dot org
Year published 2017
Volume and issue 16(1)
Page(s) 1
DOI 10.5334/dsj-2017-001
ISSN 1683-1470
Distribution license Creative Commons Attribution 4.0 International


EarthCube is a U.S. National Science Foundation initiative that aims to create a cyberinfrastructure (CI) for all the geosciences. An initial set of "building blocks" was funded to develop potential components of that CI. The Brokering Building Block (BCube) created a brokering framework to demonstrate cross-disciplinary data access based on a set of use cases developed by scientists from the domains of hydrology, oceanography, polar science and climate/weather. While some successes were achieved, considerable challenges were encountered. We present a synopsis of the processes and outcomes of the BCube experiment.

Keywords: interoperability, brokering, middleware, EarthCube, cross-domain, socio-technical

Genesis and objectives of EarthCube

In 2011 the U.S. National Science Foundation initiated EarthCube, a joint effort of NSF’s Office of Cyberinfrastructure (OCI), whose interest was in computational and data-rich science and engineering, and the Geosciences Directorate (GEO), whose interest was in understanding and forecasting the behavior of a complex and evolving Earth system. The goal of EarthCube was to create a sustainable, community-based and open cyberinfrastructure for all researchers and educators across the geosciences.

The NSF recognized there was no infrastructure that could manage and provide access to all geosciences data in an open, transparent and inclusive manner, and that progress in geosciences would be increasingly reliant on interdisciplinary activities. Therefore, a system that enabled the sharing, interoperability and re-use of data needed to be created.

Similar efforts to provide the infrastructure needed to support scientific research and innovation are underway in other countries, most notably in the European Union, guided by the European Strategy Forum on Research Infrastructures (ESFRI), and in Australia under the National Collaborative Research Infrastructure Strategy (NCRIS). The goal of these efforts is to provide scientists, policy makers and the public with computing resources, analytic tools and educational material, all within an open, interconnected and collaborative environment.

The nature of infrastructure development

The building of infrastructure is as much a social endeavor as a technical one. Bowker et al.[1] emphasized that information infrastructures are more than the data, tools and networks that make up their technical elements; they also involve the people, practices, and institutions that lead to the creation, adoption and evolution of the underlying technology. The NSF realized that a cyberinfrastructure, to be successful, must have substantial involvement of the target community through all phases of its development, from inception to deployment. In fact, studies have shown that infrastructure evolves from independent and isolated efforts and that there is no clear point where "deployment" is complete.[2] The fundamental challenge was the heterogeneity of scientific disciplines and technologies that needed to cooperate to accomplish this goal, and the necessity of getting all stakeholders to cooperate in its development. A compounding factor is that while technology evolves rapidly, people’s habits, work practices, cultural attitudes towards data sharing, and willingness to use others' data all evolve more slowly. How the relationship of people to the infrastructure evolves determines whether it succeeds or fails.

A significant element of NSF’s strategy for building EarthCube was to make it a collective effort of geoscientists and technologists from the start, in hopes of ensuring that what was developed did indeed serve the needs of geoscientists and would in fact find widespread uptake. A series of community events and end-user workshops spanning the geoscience disciplines were undertaken with the dual goals of gathering requirements for EarthCube and building a community of geoscientists willing to engage with and take ownership of the EarthCube process.

NSF began issuing small awards to explore concepts for EarthCube. These were followed by the funding of an initial set of "building blocks" meant to demonstrate potential components of EarthCube. The Brokering Building Block (BCube) was one of these awards. BCube sought both to solve real problems of interoperability that geoscientists face in carrying out research and to study the social aspects of technology adoption.

The challenge of cross-disciplinary interoperability

Interoperability has many facets and can be viewed from either the perspective of systems or people. Systems are interoperable when they can exchange information without having to know the details of each other's internal workings. Likewise, people view systems or data as interoperable when they don’t have to learn the intricacies of each in order to use them. When systems are interoperable, users of those systems should have uniform access and receive harmonized services and data from them. This is the vision of EarthCube. Delivering on that vision can be considered the "grand challenge" of information technology as applied to the geosciences.

Achieving interoperability across the geosciences is so challenging because the many scientific fields that make up the geosciences all have their own methods, standards and conventions for managing and sharing data. The sophistication of the information technologies that have been adopted in each community, the degree of standardization on data exchange formats and vocabularies, the amount of centralization in data cataloguing, and the openness to sharing data all vary greatly.

The methods of achieving interoperability across distributed systems can be categorized as shown in Table 1.

Table 1. Methods for achieving interoperability
Method | Requirements | Benefits
Adherence to common standards | Uniformity in system configuration | De facto interoperability
Gateways and translators | Installation and maintenance of custom or third-party software | Can adapt to new or changing protocols and standards
Brokers as infrastructure, third-party mediation | Creation and maintenance of brokering framework with custom adapters | Provides two-way translations between disparate systems and removes burdens of interoperability from data provider

Since disciplines will always use different standards for encoding, accessing and describing data, the first option is not a realistic one for the geosciences. The second method is currently in wide use within the geosciences. GBIF[3], for example, harvests metadata from multiple external systems and then maps the metadata, which are served through different protocols and use different schemas, to a common standard. Systems such as ERDDAP[4][5] act as servers accessing disparate datasets and serving them through a common interface. What BCube explored was the possibility that a broker, mediating the interactions between many systems serving data and many systems requesting data, could be established as a shared service, i.e., as infrastructure, without being tied to any particular repository or user portal.
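The harvest-and-map step used by gateways and translators can be illustrated with a small sketch. It is not drawn from GBIF, ERDDAP or BCube; the common model (`CommonRecord`), the schema names and the field names in `CROSSWALKS` are all hypothetical and serve only to show how records described in different schemas might be mapped to one common standard.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical common metadata model that all sources are mapped to."""
    title: str
    creator: str
    keywords: Tuple[str, ...]


# Hypothetical crosswalks: for each source schema, the source field that
# supplies each field of the common model.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "schema_a": {"title": "dc_title", "creator": "dc_creator", "keywords": "dc_subject"},
    "schema_b": {"title": "name", "creator": "originator", "keywords": "themes"},
}


def to_common(schema: str, record: dict) -> CommonRecord:
    """Map one harvested record from its source schema to the common model."""
    crosswalk = CROSSWALKS[schema]  # raises KeyError for an unknown schema
    keywords = record.get(crosswalk["keywords"], [])
    if isinstance(keywords, str):
        # Assume some sources deliver keywords as one comma-separated string.
        keywords = [k.strip() for k in keywords.split(",") if k.strip()]
    return CommonRecord(
        title=record.get(crosswalk["title"], ""),
        creator=record.get(crosswalk["creator"], ""),
        keywords=tuple(keywords),
    )
```

In this sketch, supporting a new source schema means adding one entry to `CROSSWALKS`; the burden of maintaining that mapping falls on whoever operates the harvesting system, which corresponds to the "installation and maintenance" requirement listed for gateways and translators in Table 1.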

Edwards et al.[6] show that technical infrastructures such as electrical grids and railroads evolve in stages, and the final stage is "a process of consolidation characterized by gateways that allow dissimilar systems to be linked into networks." Brokering is such a gateway, applied in the context of information systems. While brokering technologies such as CORBA (Common Object Request Broker Architecture) have been in existence since the 1990s, their application typically requires participants in a network to install software packages that enable interfacing through a common protocol. Conformance to uniform standards is clearly a barrier in cross-disciplinary contexts since each community tends to develop its own conventions for storing, describing and accessing data.

The BCube brokering framework

The BCube project advanced a brokering framework by addressing the social, technical and organizational aspects of cyberinfrastructure development. It sought to identify best practices in both technical and cultural contexts by engaging scientists with the evolving cyberinfrastructure to achieve effective cross-disciplinary collaborations. The engagement included a number of different communities in guiding and testing the development, with the aim of involving geoscientists at a deep level in the entire process.

BCube adapted a brokering framework that had been developed for the EuroGEOSS project[7] and subsequently deployed in the Global Earth Observation System of Systems (GEOSS). Called the Discovery and Access Broker, or DAB[8], it has successfully brokered millions of data records from dozens of data sources. Guided by the recommendations laid out in the Brokering Roadmap[9], BCube sought to demonstrate how brokering could enhance cross-disciplinary data discovery and access by having scientists from different fields create real-world science scenarios that required the use of data from diverse sources.
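The idea of third-party mediation can be sketched in a few lines. This is an illustrative toy, not the DAB's design or API: the `Broker` and adapter classes, the record field names and the in-memory record lists (standing in for remote catalogs) are all assumptions made for the example. It shows one adapter per data source translating that source's conventions into a common result shape, so that a client issues a single query and the data providers change nothing on their side.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SourceAdapter(ABC):
    """Translates between the broker's common model and one data source."""

    @abstractmethod
    def search(self, keyword: str) -> List[dict]:
        """Return matching records as dicts with 'title' and 'url' keys."""


class CatalogAAdapter(SourceAdapter):
    """Adapter for a hypothetical source whose records use 'Title' and 'Link'."""

    def __init__(self, records: List[dict]) -> None:
        self._records = records  # stand-in for calls to a remote service

    def search(self, keyword: str) -> List[dict]:
        return [
            {"title": r["Title"], "url": r["Link"]}
            for r in self._records
            if keyword.lower() in r["Title"].lower()
        ]


class CatalogBAdapter(SourceAdapter):
    """Adapter for a hypothetical source whose records use 'name' and 'href'."""

    def __init__(self, records: List[dict]) -> None:
        self._records = records  # stand-in for calls to a remote service

    def search(self, keyword: str) -> List[dict]:
        return [
            {"title": r["name"], "url": r["href"]}
            for r in self._records
            if keyword.lower() in r["name"].lower()
        ]


class Broker:
    """Fans one discovery query out to every registered source."""

    def __init__(self) -> None:
        self._adapters: Dict[str, SourceAdapter] = {}

    def register(self, source_name: str, adapter: SourceAdapter) -> None:
        self._adapters[source_name] = adapter

    def discover(self, keyword: str) -> List[dict]:
        results = []
        for source_name, adapter in self._adapters.items():
            for hit in adapter.search(keyword):
                # Tag each harmonized record with the source it came from.
                results.append({"source": source_name, **hit})
        return results
```

A client would register adapters once and then call `broker.discover("snow")` to receive records from every source in the same shape. A production broker would also need to handle access, protocol translation, errors and scale, none of which this sketch attempts.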


References

  1. Bowker, G.C.; Baker, K.; Millerand, F.; Ribes, D. (2010). "Toward Information Infrastructure Studies: Ways of Knowing in a Networked Environment". In Hunsinger, J.; Klastrup, L.; Allen, M. International Handbook of Internet Research. Springer Netherlands. pp. 97–117. ISBN 9781402097898.
  2. Star, S.L.; Ruhleder, K. (1996). "Steps toward an ecology of infrastructure: Design and access for large information spaces". Information Systems Research 7 (1): 111–134. doi:10.1287/isre.7.1.111.
  3. Edwards, J.L.; Lane, M.A.; Nielsen, E.S. (2000). "Interoperability of biodiversity databases: Biodiversity information on every desktop". Science 289 (5488): 2312–2314. doi:10.1126/science.289.5488.2312. PMID 11009409.
  4. Simons, R.A.; Mendelssohn, R. (2012). "ERDDAP - A Brokering Data Server for Gridded and Tabular Datasets". American Geophysical Union, Fall Meeting 2012 2012: IN21B-1473.
  5. Delaney, C.; Alessandrini, A.; Greidanus, H. (2016). "Using message brokering and data mediation on earth science data to enhance global maritime situational awareness". IOP Conference Series: Earth and Environmental Science 34: 012005. doi:10.1088/1755-1315/34/1/012005.
  6. Edwards, P.N.; Jackson, S.J.; Bowker, G.C. et al. (2012). "Understanding infrastructure: Dynamics, tensions, and design". Deep Blue.
  7. Vaccari, L.; Craglia, M.; Fugazza, C. et al. (2012). "Integrative Research: The EuroGEOSS Experience". IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 5 (6): 1603–1611. doi:10.1109/JSTARS.2012.2190382.
  8. Nativi, S.; Craglia, M.; Pearlman, J. (2013). "Earth Science Infrastructures Interoperability: The Brokering Approach". IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (3): 1118–1129. doi:10.1109/JSTARS.2013.2243113.
  9. Khalsa, S.J.; Pearlman, J.; Nativi, S. et al. (2013). "Brokering for EarthCube Communities: A Road Map" (PDF). National Snow and Ice Data Center. doi:10.7265/N59C6VBC.


Notes

This presentation is faithful to the original, with only a few minor changes to formatting. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version, by design, lists them in order of appearance. Footnotes have been changed from numbers to letters as citations are currently using numbers.