Full article title CÆLIS: Software for assimilation, management, and processing data of an atmospheric measurement network
Journal Geoscientific Instrumentation, Methods and Data Systems
Author(s) Fuertes, David; Toledano, Carlos; González, Ramiro; Berjón, Alberto; Torres, Benjamin; Cachorro, Victoria E.; de Frutos, Ángel M.
Author affiliation(s) University of Valladolid, GRASP SAS
Primary contact Email: david at goa dot uva dot es
Year published 2018
Volume and issue 7(1)
Page(s) 67–81
DOI 10.5194/gi-7-67-2018
ISSN 2193-0864
Distribution license Creative Commons Attribution 4.0 International
Website https://www.geosci-instrum-method-data-syst.net/7/67/2018/
Download https://www.geosci-instrum-method-data-syst.net/7/67/2018/gi-7-67-2018.pdf (PDF)

Abstract

Given the importance of atmospheric aerosols, the number of instruments and measurement networks which focus on their characterization is growing. Many challenges arise from the standardization of protocols, the monitoring of instrument status to evaluate network data quality, and the manipulation and distribution of large volumes of data (raw and processed). CÆLIS is a software system which aims to simplify the management of a network, providing the scientific community a new tool for monitoring instruments, processing data in real time, and working with the data. Since 2008, CÆLIS has been successfully applied to the photometer calibration facility managed by the University of Valladolid, Spain, under the framework of the Aerosol Robotic Network (AERONET). Thanks to the use of advanced tools, this facility has been able to analyze a growing number of stations and data in real time, which greatly benefits network management and data quality control. This work describes the system architecture of CÆLIS and gives some examples of applications and data processing.

Introduction

Atmospheric aerosols are defined as solid or liquid particles suspended in the atmosphere. Many studies have shown the importance of aerosols, which play an important role in global energy balance and human activities. Among their direct impacts, aerosol particles produce radiative forcing in the atmosphere, provide nutrients for oceans, and affect human health. Aerosols generally produce a cooling effect, although an aerosol can also locally warm up the atmosphere depending on its type, height above the surface, and timescale under consideration. Indirectly, they change the chemical composition of clouds and therefore their radiative properties, lifetime, and precipitation. Improving knowledge about the distribution and composition of aerosols is one of the emerging challenges highlighted by the last IPCC report[1], where it is shown that they have the largest uncertainty for the estimates and interpretations of the Earth’s changing energy budget.

Both ground-based and orbital instruments have been used to monitor aerosol properties, and they can be combined to exploit their synergies. For example, satellites offer high spatial coverage and resolution, while standardized ground-based networks have the benefit of high accuracy. A common exercise is to validate satellite data against ground-based networks.

One of these ground-based networks is the Aerosol Robotic Network (AERONET).[2] Led by NASA (National Aeronautics and Space Administration; http://aeronet.gsfc.nasa.gov) and PHOTONS (PHOtométrie pour le Traitement Opérationnel de Normalisation Satellitaire; http://loaphotons.univ-lille1.fr/), AERONET is built as a federation of sub-networks with highly standardized procedures: instrument, calibration, processing, and data distribution. It was created in the 1990s with the objective of global monitoring of aerosol optical properties from the ground, as well as validating satellite retrievals of aerosols. The standard instrument used by the network is the Cimel318 photometer. It is an automatic filter radiometer with a two-axis robot and nine spectral channels covering the spectral range of 340 to 1640 nm. It collects direct solar and lunar measurements, sky radiances in the almucantar and principal planes, and hybrid geometrical configurations. Once the data are validated through instrument status and cloud screening, aerosol optical depth (AOD) can be obtained as a direct product at the nine wavelengths. Using inversion algorithms[3][4], many other parameters can be retrieved, such as size distribution, complex refractive index, portion of spherical particles, and single-scattering albedo.

The Group of Atmospheric Optics at Valladolid University (GOA), Spain, is devoted to the analysis of atmospheric components by optical methods, mainly using remote sensing techniques such as spectral radiometry and lidar. Since 2006, one of the main tasks of the group has been the management of an AERONET calibration facility, which is now, together with the University of Lille, France, and the Spanish Meteorological Agency, part of the so-called Aerosol Remote Sensing central facility of the Aerosols, Clouds, and Trace gases Research Infrastructure (ACTRIS). Since 2016, ACTRIS has been included in the road map of the European Strategy Forum for Research Infrastructures (ESFRI). The GOA calibration facility is in charge of the calibration and site monitoring of about 50 AERONET sites in Europe, North Africa, and Central America.

AERONET standards call for annual instrument calibration, maintenance, and weekly checks on the observation data. The calibration process takes about two to three months and includes post-field calibration for sun, moon, and sky channels; maintenance of the instrumentation; and pre-field calibration for the next measurement period. To avoid gaps in the data sets during calibration periods, an instrument is frequently swapped out for a freshly calibrated one. The network management keeps track of where each instrument is located, what its exact configuration and calibration coefficients are, and how many days remain until the next calibration is needed. During the regular deployment period, the instrument has to be checked regularly to guarantee data quality. A routine maintenance protocol is performed by the site manager, but the network is ultimately responsible for data quality. Routine maintenance helps reduce instrument failures and data errors, but even with the best daily protocol, instrumentation problems may occur. Data monitoring at the calibration center helps identify instrument issues early. However, such work cannot be accomplished manually in near-real time (NRT) for a large number of sites.

In this context, it was necessary for the calibration facility at GOA to implement an automatic mechanism (in addition to the standard mechanism of AERONET) to help manage the network and facilitate the weekly data checks needed to guarantee the quality of the data. The motivation of the CÆLIS system is to fulfill these two requirements. The system has to store all data, metadata, and ancillary data (assimilated from other sources). On the one hand, this supports the management, maintenance, and calibration of the network. On the other hand, it allows the raw data to be processed in NRT with different algorithms, providing network managers, site managers, and ultimately the scientific community with a powerful and modern tool to analyze data produced at the observation sites. This work presents the fundamentals of the CÆLIS system, under development since 2008, with respect to both the scientific background and the information technology employed. There was no predecessor software at Valladolid, and these tasks were done manually before CÆLIS was developed. The other two AERONET calibration centers, at NASA and the University of Lille, have their own tools. Some ideas implemented in CÆLIS are inspired by these tools.

General architecture

CÆLIS has been designed to run on a server which, connected to the internet, allows for external communication via a web interface. The software contains a “daemon” (a background process that offers a service) which is responsible for selecting and launching tasks. These tasks, explained later, are responsible for downloading new data whenever available and processing them. Each task reads the required input information from the database and writes its output there. Some tasks use direct internet access to retrieve data, e.g., downloading ancillary data from an FTP server. All information downloaded and treated by CÆLIS tasks is stored in the database. This allows subsequent actions to retrieve all the information they require from the database (quick extraction).

External users (organized by role with various privilege levels) can connect through the web interface to see which tasks are being executed and explore the results of finished tasks. All actions required by the system administrators can easily be done through the web interface. Network management is also performed through the web interface, which allows, for example, the installation of an instrument at a measurement station to be registered. The system uses this same information when data from the instruments reach the server: CÆLIS compares the received information (instrument number, parameters, location, dates, etc.) with reference registers stored in the database (installation periods, configuration parameters, etc.) to determine whether the instrument is working properly and using the correct configuration.
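The comparison between received data and the reference registers can be illustrated with a minimal Python sketch. It is not the CÆLIS implementation: the `Installation` record, the `find_installation` helper, the instrument numbers, and the site name are all hypothetical, and only the instrument number, the site, and the installation period are checked.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Installation:
    """Hypothetical reference register: one instrument at one site for a period."""
    instrument_number: int
    site: str
    start: date
    end: Optional[date]  # None means the installation is still open


def find_installation(registers, instrument_number, site, measured_on):
    """Return the register that matches the received data, or None if none does."""
    for reg in registers:
        in_period = reg.start <= measured_on and (
            reg.end is None or measured_on <= reg.end
        )
        if (
            reg.instrument_number == instrument_number
            and reg.site == site
            and in_period
        ):
            return reg
    return None


# Hypothetical registers, as a network manager might enter them.
REGISTERS = [
    Installation(101, "ExampleSite", date(2017, 1, 10), date(2017, 12, 20)),
    Installation(205, "ExampleSite", date(2017, 12, 21), None),
]

if __name__ == "__main__":
    # Data from instrument 101 dated after it was swapped out match no register.
    match = find_installation(REGISTERS, 101, "ExampleSite", date(2018, 1, 5))
    print("accepted" if match else "rejected: no matching installation")
```

A file that matches no register would be flagged rather than processed. A fuller check would also compare the configuration parameters mentioned above.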

External systems, such as measurement stations, can also connect to the server and submit data. Because submission goes through the web interface, it can use port 80 (standard HTTP), which avoids many problems arising from the security rules of the measurement stations and hosting institutions (some of which are in military areas).

The current system manages 120 users and 80 stations. Each station can send thousands of aerosol observations every year, and the system is constantly growing. A benchmark has been applied to confirm that the current architecture can support a network 100 times bigger, so the database can grow safely in the future.

As shown in Fig. 1, CÆLIS is composed of a database, a processing module, and a web interface. These modules can be deployed independently, even on different computers. The users and the stations interact with the system through the web interface. The database stores the raw data and metadata, as well as the retrieved products, ancillary data, user information, etc. The NRT processing module is composed of the system daemon and a set of processing routines that extract information from the database, calculate products, and store them in the database. The web interface is the platform designed to manage the system, to manage the network, and to provide visual access to the data and metadata, with tables, plots, searching capability, etc. Each of these elements will be explained in detail in the next sections.



Figure 1. Diagram of CÆLIS architecture. Arrows indicate where the action is initiated (data flow is always bidirectional).

Database model

Databases are one of the main concepts developed in the 1980s in computer science. Many different approaches in terms of technology and data models have been developed with varying success. There are many types of databases, classified based on their characteristics. A database management system (DBMS) is software that provides an interface to a database and offers the user advanced features such as concurrency management or a query language. The choice of database type and of specific DBMS software is one of the main design decisions because all further development will be affected by it.

Relational databases are a traditional and well-known model, and they have been successfully applied to many different fields. With relational databases, the information is organized in tables or relations which represent entity types.[5] A good database modeler is able to identify the relevant entity types together with the information that describes them. The tables or relations are composed of columns, which hold the attributes that describe the entity type, and rows, which represent individual entities identified by a unique key (one or more attributes whose values cannot be repeated in different rows). The tables are linked, creating a relational model. The keystone of a database is good design, which needs to take into account the information targeted for modeling as well as the way in which the data is going to be accessed (to optimize performance). Complex models with many groups of entities need to be planned in advance by creating an entity–relationship diagram. This diagram then guides the final implementation of the database, which can be a direct translation of the diagram, requiring only some implementation decisions about the balance between data redundancy and performance.

The main elements of the entity–relationship diagram of the CÆLIS database are shown in Fig. 2. The central entity is the photometer, which produces raw data. The photometer, with a given hardware configuration and calibration coefficients, is installed at one site of the network. The ancillary data for the site (e.g., meteorological data, ozone column, and surface reflectance) also need to be stored. Finally, the measurement stations are supported by institutions, which can also own other instruments.



Figure 2. Entity–relationship diagram for CÆLIS (extract of the main elements). Entities have been divided into three logic blocks.

In many cases, each of these elements represents a group of entities. For instance, calibration coefficients include the extraterrestrial signal for the different solar spectral channels, radiance calibration coefficients for sky channels, coefficients for temperature correction of the signals, instrument field of view, etc. Another example is the hardware, which includes the different parts (sensor head, robot, collimator, control box, etc.), the spectral filters with their corresponding filter responses, and others.

The lower part of the diagram is closely related to network management, with an inventory of all hardware parts identified by their serial numbers and linked to the institution that owns them. The upper part is related to raw data production; its organization is optimized for data extraction (to create products) and is consistent with the physical meaning and relevance of the quantities. The installations are entered manually by the network managers so that any data file submitted to the system from a measurement station can be validated.

Other tables contain ancillary information that is needed to process data, such as the list of stations (including coordinates), global climatologies for certain atmospheric components (ozone, nitrogen dioxide, etc.), solar and lunar extraterrestrial irradiance spectra, and spectral absorption coefficients for several species (ozone, NO2, water vapor, etc.).

Many different DBMSs can be used to implement such a model: OracleDB, SQLite, PostgreSQL, etc. CÆLIS is based on a MySQL database. MySQL is widely used by many different communities; as a result, the software is very robust, complete, stable, and well documented, and it can run on many different architectures.
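To make the relational ideas above concrete, the following Python sketch builds a tiny subset of such a model. It is only an illustration under stated assumptions: the table and column names are hypothetical and are not taken from the CÆLIS schema, and SQLite (through Python's standard `sqlite3` module) is swapped in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical, heavily reduced schema: institutions own photometers and
# support sites; an installation links one photometer to one site for a period.
SCHEMA = """
CREATE TABLE institution (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE site (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    latitude       REAL NOT NULL,
    longitude      REAL NOT NULL,
    institution_id INTEGER NOT NULL REFERENCES institution(id)
);
CREATE TABLE photometer (
    number         INTEGER PRIMARY KEY,
    institution_id INTEGER NOT NULL REFERENCES institution(id)
);
CREATE TABLE installation (
    id                INTEGER PRIMARY KEY,
    photometer_number INTEGER NOT NULL REFERENCES photometer(number),
    site_id           INTEGER NOT NULL REFERENCES site(id),
    start_date        TEXT NOT NULL,
    end_date          TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces links only on request
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO institution (id, name) VALUES (1, 'Example Institute')")
    conn.execute(
        "INSERT INTO site (id, name, latitude, longitude, institution_id) "
        "VALUES (1, 'ExampleSite', 40.0, -4.0, 1)"
    )
    conn.execute("INSERT INTO photometer (number, institution_id) VALUES (101, 1)")
    conn.execute(
        "INSERT INTO installation (photometer_number, site_id, start_date) "
        "VALUES (101, 1, '2017-01-10')"
    )
    conn.commit()
    return conn


def sites_for_photometer(conn, number):
    """Follow the installation links to list where a photometer has been installed."""
    rows = conn.execute(
        "SELECT site.name FROM installation "
        "JOIN site ON site.id = installation.site_id "
        "WHERE installation.photometer_number = ? "
        "ORDER BY installation.start_date",
        (number,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(sites_for_photometer(build_demo_db(), 101))
```

The primary keys play the role of the unique keys described earlier, and the `REFERENCES` clauses are the links between tables; with foreign keys enabled, an installation that points to an unknown photometer is rejected by the database itself.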

The entity–relationship diagram for CÆLIS, illustrated using the model defined by Chen[5], is shown in Fig. 2. This diagram shows the fundamental part of the database, called layer 0. On top of that, direct products—obtained with the combination of raw data, calibration coefficients, and ancillary data—are stored. This represents “layer 1” products, physical quantities with their corresponding units and estimated uncertainties (derived from the calibration uncertainties). In our case, these products are basically aerosol optical depth, water vapor content, sky radiances, and degree of linear polarization of the sky light. On top of layer 1, there are more sophisticated products, like those derived from inversion algorithms, as well as any flags or “alarms” that are produced to help in NRT data quality control.

Layer 2 products use and combine previous layer quantities to retrieve other parameters, but no longer go down to the raw data. For instance, the inversion codes by Dubovik and King[3] and Nakajima et al.[6] use spectral aerosol optical depth and sky radiances to retrieve aerosol particle size distribution, refractive indices, single-scattering albedo, etc. More advanced products that combine photometer data with other aerosol data (e.g., lidar) also belong to this group, named “layer 2” products. A clear example is the GRASP algorithm[7] (http://www.grasp-open.com/), which is able to digest data from different sensors (satellite and ground-based, active or passive) to provide a wide set of aerosol and surface parameters. The system architecture as described here is shown in Fig. 3.



Figure 3. Different logic data layers. Each layer is based on the information of the previous layer.
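As an illustration of how a layer 1 product combines the three kinds of layer 0 information, the sketch below derives an aerosol optical depth from a raw direct-sun signal (raw data), an extraterrestrial signal (calibration coefficient), and Rayleigh and gas optical depths (ancillary data). It assumes a simplified Beer–Lambert–Bouguer law, V = V0 / d² · exp(−m·τ), with a single air mass m for all atmospheric components; the function names and the numbers are hypothetical, and this is not the AERONET or CÆLIS processing code.

```python
import math


def total_optical_depth(signal, v0, air_mass, sun_earth_distance_au=1.0):
    """Invert V = V0 / d**2 * exp(-m * tau) for the total optical depth tau.

    signal: raw direct-sun signal V (instrument counts)
    v0: extraterrestrial signal V0 from calibration (same units as signal)
    air_mass: relative optical air mass m along the path to the sun
    sun_earth_distance_au: Sun-Earth distance d in astronomical units
    """
    if signal <= 0 or v0 <= 0 or air_mass <= 0:
        raise ValueError("signal, v0 and air_mass must be positive")
    return (math.log(v0 / sun_earth_distance_au**2) - math.log(signal)) / air_mass


def aerosol_optical_depth(signal, v0, air_mass, rayleigh_od, gas_od=0.0,
                          sun_earth_distance_au=1.0):
    """Subtract the Rayleigh and gas contributions from the total optical depth."""
    tau = total_optical_depth(signal, v0, air_mass, sun_earth_distance_au)
    return tau - rayleigh_od - gas_od


if __name__ == "__main__":
    # Made-up values for a single channel, for illustration only.
    v0, air_mass = 20000.0, 2.0
    signal = v0 * math.exp(-air_mass * 0.35)  # synthetic measurement
    print(aerosol_optical_depth(signal, v0, air_mass, rayleigh_od=0.10, gas_od=0.05))
```

A layer 2 product, by contrast, would take outputs such as this one as its inputs and would not read the raw signal again.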

Processing chain and near-real-time module

The CÆLIS system provides many different data products. To provide each product, some input data has to be processed in a specific way. This is what we call a “task.” The job is divided into a set of simple tasks. The system works as a state machine: one task cannot start until the previous one is finished, regardless of whether the second task depends on the previous one. When many tasks work sequentially to achieve a common objective, they form a chain of tasks. The daemon running on the server is responsible for coordinating the different tasks, as will be explained in the next section.
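The strictly sequential behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not the CÆLIS daemon: the `TaskChainRunner` class, the three example tasks, and the dictionary standing in for the database are all hypothetical.

```python
from collections import deque


class TaskChainRunner:
    """Runs queued tasks strictly one at a time, in submission order."""

    def __init__(self):
        self._queue = deque()
        self.state = "idle"
        self.log = []  # names of finished tasks, in completion order

    def submit(self, name, func):
        """Queue a task; func receives the shared store (a stand-in for the database)."""
        self._queue.append((name, func))

    def run_pending(self, store):
        """Run every queued task; the next one starts only after the previous returns."""
        while self._queue:
            name, func = self._queue.popleft()
            self.state = "running:" + name
            func(store)
            self.log.append(name)
        self.state = "idle"


# Three hypothetical tasks forming a chain: each reads what the previous one wrote.
def download_raw(store):
    store["raw"] = [3, 1, 2]  # placeholder values, not real measurements


def sort_raw(store):
    store["sorted"] = sorted(store["raw"])


def summarize(store):
    store["count"] = len(store["sorted"])


if __name__ == "__main__":
    runner = TaskChainRunner()
    for task in (download_raw, sort_raw, summarize):
        runner.submit(task.__name__, task)
    store = {}
    runner.run_pending(store)
    print(runner.log, store)
```

Because only one task runs at a time, a task can rely on everything written by earlier tasks already being in the store, at the cost of never running independent tasks in parallel.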

References

  1. Intergovernmental Panel on Climate Change (2014). Climate Change 2013 – The Physical Science Basis: Working Group I Contribution to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press. doi:10.1017/CBO9781107415324. ISBN 9781107415324. 
  2. Holben, B.N.; Eck, T.F.; Slutsker, I. et al. (1998). "AERONET—A Federated Instrument Network and Data Archive for Aerosol Characterization". Remote Sensing of Environment 66 (1): 1–16. doi:10.1016/S0034-4257(98)00031-5. 
  3. Dubovik, O.; King, M.D. (2000). "A flexible inversion algorithm for retrieval of aerosol optical properties from Sun and sky radiance measurements". Journal of Geophysical Research: Atmospheres 105 (D16): 20673–20696. doi:10.1029/2000JD900282. 
  4. Dubovik, O.; Sinyuk, A.; Lapyonok, T. et al. (2006). "Application of spheroid models to account for aerosol particle nonsphericity in remote sensing of desert dust". Journal of Geophysical Research: Atmospheres 111 (D11). doi:10.1029/2005JD006619. 
  5. Chen, P.P.-S. (1976). "The entity-relationship model—toward a unified view of data". ACM Transactions on Database Systems 1 (1): 9–36. doi:10.1145/320434.320440. 
  6. Nakajima, T.; Tonna, G.; Rao, R. et al. (1996). "Use of sky brightness measurements from ground for remote sensing of particulate polydispersions". Applied Optics 35 (15): 2672–2686. doi:10.1364/AO.35.002672. PMID 21085415. 
  7. Dubovik, O.; Lapyonok, T.; Litvinov, P. et al. (19 September 2014). "GRASP: A versatile algorithm for characterizing the atmosphere". SPIE Newsroom. SPIE. doi:10.1117/2.1201408.005558. http://spie.org/newsroom/5558-grasp-a-versatile-algorithm-for-characterizing-the-atmosphere?ArticleID=x109993. 

Notes

This presentation is faithful to the original, with only a few minor changes. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version, by design, lists them in order of appearance.