Journal:GeoFIS: An open-source decision support tool for precision agriculture data

Full article title	GeoFIS: An open-source decision support tool for precision agriculture data
Journal	Agriculture
Author(s)	Leroux, Corentin; Jones, Hazaël; Pichon, Léo; Guillaume, Serge; Lamour, Julien; Taylor, James; Naud, Olivier; Crestey, Thomas; Lablee, Jean-Luc; Tisseyre, Bruno
Author affiliation(s)	University of Montpellier, SMAG, Compagnie Fruitière
Primary contact	Email: cleroux at smag-group dot com
Year published	2018
Volume and issue	8(6)
Page(s)	73
DOI	10.3390/agriculture8060073
ISSN	2077-0472
Distribution license	Creative Commons Attribution 4.0 International
Website	http://www.mdpi.com/2077-0472/8/6/73/htm
Download	http://www.mdpi.com/2077-0472/8/6/73/pdf (PDF)

Abstract

The world we live in is an increasingly spatial and temporal data-rich environment, and the agriculture industry is no exception. However, data need to be processed in order first to obtain information and then to make informed management decisions. The concepts of "precision agriculture" and "smart agriculture" can and will only be fully effective when methods and tools are available to practitioners to support this transformation. An open-source program called GeoFIS has been designed with this objective. It was designed to cover the whole process, from spatial data to spatial information and decision support. The purpose of this paper is to evaluate the abilities of GeoFIS, along with its embedded algorithms, to address the main features required by farmers, advisors, or spatial analysts when dealing with precision agriculture data. Three case studies are investigated in the paper: (i) mapping of the spatial variability in the data, (ii) evaluation and cross-comparison of the opportunity for site-specific management in multiple fields, and (iii) delineation of within-field zones for variable-rate applications when these are considered opportune. These case studies were applied to three contrasting crop types: banana, wheat, and grapes. These were chosen to highlight the diversity of applications and data characteristics that might be handled with GeoFIS. For each case study, up-to-date algorithms arising from research studies and implemented in GeoFIS were used to process these precision agriculture data. Areas for future development and possible relations with existing geographic information systems (GIS) software are also discussed.

Keywords: decision making, GeoFIS, geostatistics, open-source software, precision agriculture, spatial analysis

Introduction

Within-field variability is now a widely accepted and reported phenomenon in the precision agriculture community.^[1]^[2] To ensure that this variability is considered and accounted for, geolocated data are collected intensively within fields. They come from sensors mounted on agricultural machinery, satellites, aerial platforms, static stations, or humans, among others.^[3]^[4]^[5] Spatial data have particular characteristics that deserve careful consideration during analysis. First, their spatial resolution (density) matters, as it determines the capacity to identify short- and long-scale spatial variability.^[6]^[7] Spatial records are also often associated with a high level of noise arising from multiple sources, such as plant-to-plant variability, sensor accuracy, or the conditions of data acquisition.^[8] Images regularly distribute data on a grid of pixels. Apart from images, however, many spatial observations collected in agriculture are irregular and do not follow a fixed pattern within the fields.^[9] This is a major concern because many image processing algorithms cannot be applied directly to such irregular data.

To benefit from this increasing flow of data, users should be provided with software or tools that allow them to:

visualize the data they have collected (simple or low-level functions),
process these data (advanced or high-level functions), and
incorporate the knowledge they have on these data into the data processing.

It is acknowledged that basic visualization tools (e.g., data import, georeferencing, data display) are available in many open-source platforms. These include general platforms (e.g., Quantum Geographic Information System (QGIS), gvSIG, Google Earth, Whitebox Geospatial Analysis Tools) and more specific ones,^[10]^[11] including platforms not dedicated to agricultural applications. Such functionalities are clearly of major importance for handling spatial data. However, these visualization functions are not sufficient for making informed management decisions. Users need more advanced, high-level functions so that they can turn raw spatial data into information and decision layers. The procedures most commonly required in the precision agriculture domain are functions such as:

filtering, to ensure the quality of the datasets^[12]^[13],
interpolation, to provide a continuous mapping of the property of interest^[14]^[15]^[16],
zoning, to define within-field zones for site-specific management^[17]^[18], or
aggregation so that multiple layers of information can be combined.^[19]^[20]

To foster the adoption of such tools, all the aforementioned functions have to be specifically dedicated to processing agricultural data from potentially very different production systems. This is an important consideration, as these data come with a great deal of associated knowledge that has to be taken into account when processing them. More specifically, significant local expertise to support decision making might be available, as users (e.g., farmers, advisors, and/or technicians) have normally been scouting the fields throughout the growing season.^[21]^[22]^[23] Site-specific management also requires agricultural machinery with specific characteristics that have to be considered in these processing functions. This ensures that planned differential management is consistent with the practical and operational limitations of the machinery, e.g., working width, lag time, and application speed.^[24]^[25]

From a general perspective, only a few dedicated software programs are available to explicitly process precision agriculture data and incorporate expert knowledge into the process, and very few of them are open-source. Some freeware and shareware tools have been developed and proposed by the precision agriculture community, but these generally focus on specific processing tasks or on a particular type of data.

For example, the Vesper program^[26], developed by the University of Sydney, provides users with a graphical interface to spatially interpolate their data. It offers fairly advanced functions (e.g., local punctual and block kriging), yet users only end up with a continuous map of their data without much further practical information. The Yield Editor software from the United States Department of Agriculture^[13]^[27] deals effectively with the filtering of within-field yield datasets, which are known to contain many defective observations^[28]. However, it does not perform interpolation or other high-level functions. Another interesting example is a QGIS plugin developed to process spatial data of vine shoot diameter from the mounted sensor Physiocap (E.RE.C.A, Vaulx-en-Velin, France). This tool mainly incorporates functions to filter these highly noisy datasets. Agronomists have proposed other platforms to give farmers access to crop models, but these are very specific in terms of crop, data, and use.^[29]

An open-source platform that takes raw data through to a decision point is not yet available to the precision agriculture community.

The aim of this paper is to present the GeoFIS software (https://www.geofis.org/), developed by a joint team from IRSTEA, INRA, and Montpellier SupAgro in France.^[30] The goal of this platform is to provide users with up-to-date and reliable algorithms to process their precision agriculture data and incorporate expert knowledge from the fields. GeoFIS has been developed mainly for academic and research purposes, i.e., for investigators and students wishing to process their data. To a lesser extent, it also targets agronomists and advisors with a sufficient background in spatial analysis. This interface-based platform has two objectives. The first is to support users who do not necessarily have programming skills. The second is to show that high-level functions can be introduced in a GIS and integrated within precision agriculture programs. The first section introduces this open-source tool, along with its architecture, design, interface, and main processing functions. Three case studies on various crops then evaluate how well this software addresses most of the issues the agricultural sector faces when processing spatial data. The last section highlights the needs for future development to promote precision agriculture adoption and the possibility of creating connections with existing GIS software programs.

The GeoFIS software

Aim of the GeoFIS project

GeoFIS has been designed to facilitate the movement from spatial data to spatial information, and to spatial decision making. It is an open-source program that proposes a simple and easy-to-use interface to build decision support systems (DSS) from spatial data.^[30] While its development has been inspired by agri-environmental applications, the framework itself is open and accessible to applications in other domains. It is designed to be adaptable to different usages and for different end users, mostly for academic and research applications, for student and teaching applications, and, to a lesser extent, for GIS-skilled agronomists and advisors.

GeoFIS deviates from other GIS software, e.g., QGIS, in the sense that specific tools have been implemented to answer the main expectations of agricultural professionals when it comes to processing precision agriculture data. These will be presented later on. It is acknowledged that multiple other open-source spatial programs (e.g., QGIS) or languages (e.g., R and Python) are available to process spatial and temporal data. However, these open-source tools do not have specific functions dedicated to the processing of precision agriculture data (as listed in the introduction section) and usually require users to have skills in programming. This is a major limiting factor for the practical use of spatial modelling in agriculture. Another strength of GeoFIS is that attention has been paid to the incorporation of expert knowledge into data analysis. This is not available in other related spatial processing tools. Agricultural professionals have significant local expert knowledge on their production system that needs to be taken into account. By incorporating this qualitative expert knowledge, the quality of the processing should be improved and the adoption of precision agriculture technologies should be enhanced.

Architecture and design of GeoFIS

In the proposed GeoFIS architecture, all the open-source toolboxes and libraries have been selected for their ability to handle spatial data and to incorporate expert knowledge (Figure 1). Statistical and geostatistical functions dedicated to precision agriculture data (see next subsection) are implemented in R (https://www.r-project.org). Outside these specific functions, spatial data are handled through two open-source libraries, i.e., Geotools (http://www.geotools.org) and CGAL (Computational Geometry Algorithms Library, https://www.cgal.org). Geotools is used because its Java implementation allows the design of user-friendly interfaces. CGAL was chosen for its ability to provide very efficient and reliable geometric algorithms, as its functions are developed in C++. Finally, the incorporation of expert knowledge is made possible with FisPro (https://www.fispro.org), a system that uses fuzzy sets for conceptual modeling.^[30]

Figure 1: The GeoFIS architecture^[30]. CGAL, Computational Geometry Algorithms Library; DSS, Decision Support Systems; GIS, Geographic Information System; 1D, One dimension

GeoFIS is available in four languages (French, English, Spanish, and Portuguese). The interface is designed with the objective of man-machine cooperation, i.e., facilitating the relationships between data, learning algorithms, and expert knowledge. Documentation, scientific papers, and video tutorials are available to help users understand the implemented functions and to facilitate the adoption of the GeoFIS software (https://www.geofis.org/). Users are notified when a new version of the software is available.

Functionalities implemented in GeoFIS

GeoFIS contains a series of low- and high-level, non-spatial and spatial functionalities to interrogate spatial data. The general functionalities are introduced here and then expanded upon in the case studies in the following section. Figure 2 shows the generic flow required in precision agriculture, from raw data processing to decision making, with the GeoFIS functionalities indicated at each stage.

In agricultural systems, data are available in different formats (points, polygons, rasters) and at different scales. Data quality also varies, with some sensors being inherently noisy and others less so. Different data potentially need different approaches to (i) data validation and clean-up (quality control), (ii) data display (visualization), and, when necessary, (iii) interpolation. These steps transform data into information layers.

Within GeoFIS, data can be easily imported (Step 0) and displayed both as a map (in its geographical space) and as a histogram (in its attribute space). This allows the user to "expertly" identify global outliers in both the geographical and attribute spaces and to remove any erroneous data (Step 1). Interpolation is possible using inverse distance weighting (for small data sets) and via punctual kriging with a global variogram for larger data sets (>100 points). The kriging method includes the ability to plot the experimental variogram and specify a theoretical variogram, which is then passed to the kriging function. Interpolated outputs can be displayed directly as rasters within GeoFIS (Step 2).

Figure 2: Generic flow of data in precision agriculture with main processing steps from raw data processing to decision-making.
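The following Python sketch gives a rough illustration of the inverse distance weighting option. It predicts values at target locations as distance-weighted means of the observations. Several choices here are made for illustration only and are not taken from GeoFIS: the power parameter of 2, the use of all observations (no search neighbourhood), and the function name `idw`.

```python
import numpy as np

def idw(xy_obs, z_obs, xy_new, power=2.0):
    """Inverse distance weighting: each prediction is a weighted mean of all
    observations, with weights proportional to 1 / distance**power."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    # Distances between every target and every observation: (n_new, n_obs)
    d = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    preds = np.empty(len(xy_new))
    for i, row in enumerate(d):
        exact = row == 0.0
        if exact.any():
            # A target located on an observation takes that observed value
            preds[i] = z_obs[exact][0]
        else:
            w = 1.0 / row ** power
            preds[i] = np.sum(w * z_obs) / np.sum(w)
    return preds

# Hypothetical observations at the corners of a 10 m x 10 m square
obs_xy = [(0, 0), (10, 0), (0, 10), (10, 10)]
obs_z = [70.0, 80.0, 72.0, 90.0]
print(idw(obs_xy, obs_z, [(5, 5), (0, 0), (2, 1)]))
```

The centre of the square is equidistant from the four corners, so its prediction is the plain mean of the four values (78.0). Larger `power` values give more weight to the nearest observations and produce a more local surface.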

"Precision agriculture" or "smart agriculture" is only effective when effective decisions are made. End users can transform these information layers into decision layers to improve the management of their fields. To address this, three main functionalities for management (practical) applications have been incorporated within GeoFIS. Firstly, practitioners are provided with a method to delineate within-field homogeneous zones (Step 3.1). Zoning is important for precision agriculture data, as the identified zones (i) facilitate spatial data visualization and interpretation and (ii) provide a spatial resolution that is practical and effective for many differential field operations. GeoFIS uses a segmentation algorithm to "zone" data layers.^[18] The segmentation algorithm operates on either irregular or gridded (interpolated) data to generate potential management zones.
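The segmentation algorithm used by GeoFIS is described elsewhere^[18] and is not reproduced here. The sketch below is a simplified, hypothetical illustration of the general idea of region merging. It rests on several assumptions that are not the published GeoFIS algorithm:

- the data are already interpolated onto a regular grid;
- adjacency is 4-neighbour;
- at each step, the two adjacent zones whose fusion least increases the within-zone sum of squares are merged (a Ward-type criterion).

```python
import numpy as np

def region_merge(grid, n_zones):
    """Greedy region merging on a regular grid with 4-neighbour adjacency.
    Every cell starts as its own zone; at each step the two adjacent zones
    whose fusion adds the least within-zone sum of squares (Ward criterion)
    are merged, until n_zones zones remain."""
    grid = np.asarray(grid, dtype=float)
    labels = np.arange(grid.size).reshape(grid.shape)
    # Per-zone statistics: label -> (cell count, sum of values)
    stats = {int(k): (1, float(v)) for k, v in zip(labels.ravel(), grid.ravel())}
    while len(stats) > n_zones:
        # Collect pairs of distinct zones that touch horizontally or vertically
        pairs = set()
        for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
            for u, v in zip(a.ravel(), b.ravel()):
                if u != v:
                    pairs.add((int(min(u, v)), int(max(u, v))))
        best, best_cost = None, np.inf
        for u, v in sorted(pairs):
            nu, su = stats[u]
            nv, sv = stats[v]
            # Increase in within-zone sum of squares if u and v are merged
            cost = nu * nv / (nu + nv) * (su / nu - sv / nv) ** 2
            if cost < best_cost:
                best, best_cost = (u, v), cost
        u, v = best
        labels[labels == v] = u
        stats[u] = (stats[u][0] + stats[v][0], stats[u][1] + stats[v][1])
        del stats[v]
    return labels

# Hypothetical gridded field with a low-value and a high-value half
field = np.array([[1, 1, 5, 5],
                  [1, 1, 5, 5],
                  [1, 2, 6, 5]])
print(region_merge(field, 2))
```

On this toy field, the two zones recovered are the low-value left half and the high-value right half. The isolated 2 and 6 are absorbed into the neighbouring zone whose mean is closest.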

Secondly, data/information collection tends to focus on production issues, but there is no restriction on its use: it can equally support strategic and tactical decision making. The technical opportunity index (TOI)^[31], which is implemented in GeoFIS, is a case in point. The TOI uses production data to assess a field's suitability for site-specific management, given machinery constraints and the observed production variation (Step 3.2). The algorithm processes the within-field data with a mathematical morphological filter based on erosion and dilation.^[31] This filter allows end users to account for the passes of the agricultural machinery in the field, and especially the minimum area (kernel) within which it can operate reliably. As the algorithm requires the data to be organized on a regular grid, interpolation (Step 2) may be required as a pre-processing step.
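The erosion/dilation step can be illustrated on a classified grid. In the sketch below, the boolean grid marks cells belonging to one class (e.g., high yield). The square kernel of k × k cells stands in for the minimum area the machinery can manage reliably. Both are hypothetical. Erosion followed by dilation (a morphological opening) removes patches smaller than the kernel. This is only the filtering step: the TOI itself^[31] involves further calculations that are not reproduced here.

```python
import numpy as np

def _window_reduce(mask, k, reducer, init):
    """Combine every k x k neighbourhood (k odd) of a boolean grid with
    reducer (AND for erosion, OR for dilation); cells outside the grid are False."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.full(mask.shape, init, dtype=bool)
    for di in range(k):
        for dj in range(k):
            out = reducer(out, padded[di:di + mask.shape[0], dj:dj + mask.shape[1]])
    return out

def erode(mask, k):
    # A cell survives only if its whole k x k neighbourhood is True
    return _window_reduce(mask, k, np.logical_and, True)

def dilate(mask, k):
    # A cell is switched on if any cell in its k x k neighbourhood is True
    return _window_reduce(mask, k, np.logical_or, False)

def opening(mask, k):
    """Erosion then dilation: removes patches too small to hold a k x k
    kernel and restores the extent of the patches that remain."""
    return dilate(erode(mask, k), k)

# Hypothetical 8 x 8 grid of a "high yield" class
mask = np.zeros((8, 8), dtype=bool)
mask[1:5, 1:5] = True   # a 4 x 4 patch, large enough for a 3 x 3 kernel
mask[6, 6] = True       # an isolated cell, too small to manage
kept = opening(mask, 3)
print(int(kept.sum()), "of", int(mask.sum()), "cells remain manageable")
```

The 4 × 4 patch survives intact while the isolated cell disappears. Here, treating cells outside the grid as False is one possible assumption among others; it means that patches touching the field edge are eroded from that side as well.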

Finally, in the majority of cases, practical agronomic decisions are multi-variate in nature. Decision support therefore requires dedicated data fusion methods to merge multiple information layers into a single decision layer (Step 3.3). For instance, when available, historical yield data (high spatial resolution point information), as-applied historical fertilizer maps (polygon data), recent point soil testing (low spatial resolution point data), and early season satellite imagery (high resolution raster) should collectively feed into a decision on mid-season spatial fertilizer inputs, i.e., a prescription fertilizer map (normally a polygon layer). In the previous example, the prescription fertilization map (the decision layer) is based on a set of inputs (information layers) that are all related through expert rules. An example of a possible expert rule could be that if, on a given location in space, the observed yield is high and the soil fertilizer level is low, then it might be relevant to apply more fertilizer inputs. Within GeoFIS, the goal of the data aggregation process is to implement the expert rules so that the final spatial decision layer (that answers the question "how much fertilizer input should be applied at this particular place at this particular time?") can be obtained. Expert rules are implemented one at a time, as each rule leads to a practical agronomic decision.

Data aggregation in GeoFIS is a two-step process. First, each information layer is transformed into an expert layer, i.e., the numerical agronomic values in each information layer are transformed into degree values (from 0 to 1) according to the expert rule to be implemented. The transformation from an information layer to an expert layer is done using a fuzzy set-based function.^[32] Second, all the expert layers are combined using an aggregation operator to respect the expert rules. Two aggregation operators are currently implemented in GeoFIS. The first is the Weighted Arithmetic Mean (WAM), which attributes a weight to each information source, e.g., the yield information layer may be given twice as much weight as the soil fertilizer level layer. The second is the Ordered Weighted Average (OWA)^[33], where the weighting is slightly more complex. At a given location, the degree values associated with each layer involved in the expert rule are ordered. The weights are then assigned according to position in this ordering rather than to specific layers. This operator is of interest as it enables the implementation of logical operations, such as:

"OR," where the expert rule applies as soon as the highest degree associated with the layers is high, and
"AND," where the expert rule applies only when the lowest degree associated with the layers is high, i.e., when all the degrees are high.
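A short sketch shows how OWA reproduces these logical operations. OWA weights apply to ranks rather than to layers. Putting all the weight on the highest-ranked degree therefore yields the maximum ("OR"). Putting it all on the lowest-ranked degree yields the minimum ("AND"). Intermediate weight vectors give compromise behaviours. The function name and the degree values are hypothetical.

```python
import numpy as np

def owa(degrees, weights):
    """Ordered weighted average. degrees has one row per expert layer and one
    column per location; at each location the degrees are sorted in
    decreasing order and the weights are applied by rank, not by layer."""
    ranked = np.sort(np.asarray(degrees, dtype=float), axis=0)[::-1]
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), ranked, axes=1)

# Hypothetical degrees for two expert layers at three locations
degrees = np.array([[0.9, 0.2, 0.8],
                    [0.1, 0.3, 0.7]])
owa_or = owa(degrees, [1, 0])       # all weight on the highest degree: max ("OR")
owa_and = owa(degrees, [0, 1])      # all weight on the lowest degree: min ("AND")
owa_mid = owa(degrees, [0.5, 0.5])  # equal weights: plain mean
print(owa_or, owa_and, owa_mid)
```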

The result of the aggregation process is a single decision layer. The uniqueness of the GeoFIS approach is in its ability to incorporate the expert knowledge developed by farmers and advisors on the data and their fields directly into the data fusion process. The implemented data aggregation methods require the data to be collocated, either on irregular or regular grids.
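The sketch below puts the two steps together for the fertilizer rule described earlier ("high yield and low soil fertilizer level"). It first converts each information layer into an expert layer with simple linear (ramp) membership functions. It then aggregates the layers with a WAM that gives yield twice the weight of soil fertility, as in the example above. The ramp shape, the breakpoints, the units, and the layer values are all hypothetical. GeoFIS relies on fuzzy set-based functions^[32], which may differ from these ramps.

```python
import numpy as np

def ramp_up(x, a, b):
    """Degree to which x is 'high': 0 at or below a, 1 at or above b, linear in between."""
    return np.clip((np.asarray(x, dtype=float) - a) / (b - a), 0.0, 1.0)

def ramp_down(x, a, b):
    """Degree to which x is 'low': the complement of ramp_up."""
    return 1.0 - ramp_up(x, a, b)

def wam(layers, weights):
    """Weighted arithmetic mean of collocated expert layers (degrees in [0, 1])."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), np.asarray(layers, dtype=float), axes=1)

# Hypothetical information layers at three collocated locations
yield_t_ha = np.array([6.0, 8.0, 10.0])
soil_n_mg_kg = np.array([40.0, 20.0, 10.0])

# Step 1: information layers -> expert layers (hypothetical breakpoints)
high_yield = ramp_up(yield_t_ha, 6.0, 10.0)     # 0.0, 0.5, 1.0
low_soil = ramp_down(soil_n_mg_kg, 10.0, 40.0)  # 0.0, ~0.667, 1.0

# Step 2: aggregate, giving yield twice the weight of soil fertility
decision = wam([high_yield, low_soil], weights=[2, 1])
print(decision.round(3))
```

The result (0.0, about 0.556, 1.0) is the degree to which the rule applies at each location. This decision layer could then be mapped to a fertilizer rate.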

Case studies

The previous section introduced the GeoFIS framework, including the functionalities implemented and how they could be adapted to the individual needs of each end user (who will have their own unique management constraints). The following subsections provide more detailed illustrations of the main processing steps in the context of precision agriculture applications. More specifically, the three cases deal with typical tasks that advisors and farmers may face in their daily work:

the mapping of spatial data (Steps 0, 1 and 2),
the evaluation and cross-comparison of the opportunity for site-specific management in their fields (Step 3.2), and
the delineation of within-field zones for variable-rate applications where zoning is considered opportune (Steps 3.1 and 3.3).

Steps 0 to 2 will be illustrated with medium spatial resolution manual measurements performed over a banana field to map plant vigor. High-resolution yield data across several wheat fields will be used to illustrate the value of Step 3.2 in ranking fields from the most to the least suitable for site-specific management. Steps 3.1 and 3.3 will be applied to a precision viticulture example aimed at defining zones for differential irrigation management. The overall objective is to demonstrate that GeoFIS can address the main issues of data processing in precision agriculture. The three case studies are performed on different crops (banana, wheat, and grapes), each exhibiting unique characteristics. They will therefore also demonstrate the applicability and genericity of this open-source software.

Case study 1

Rationale and description

Mapping the spatial organization in the data—An example of the vegetative response of an asynchronous plant, the banana

Variography and mapping are two very important processing steps in the precision agriculture domain. The former helps evaluate the spatial structure in the data by quantifying the proportions of (i) spatially structured variability, or large-scale variations, and (ii) spatially unstructured variability, or small-scale variations, within the field. The latter is mainly used to display the observed spatial variability correctly and to facilitate decision making.

In this case study, GeoFIS was used to investigate and map the spatial variability in the pseudostem (trunk) circumference of banana crops. The proposed analysis was carried out on this crop for two major reasons. First of all, the spatial variability in the agronomic properties of banana crops has been poorly reported in the literature.^[34] Secondly, this crop is known to be asynchronous in its production cycle, which means that spatial analyses are to be handled differently from what is commonly done in annual crops, e.g., wheat, canola, or perennial ones, e.g., grapes.^[34] The proposed analysis (i) estimates the proportion of spatially-structured variability in pseudostem circumferences, i.e., the proportion of variance that is mainly due to spatially-structured environmental properties^[15]; (ii) determines the proportion of spatially unstructured variability that is due to non-spatially structured phenomena, e.g., the inter-plant variability, plant competition, replanting, and measurement accuracy among others; and (iii) maps the overall within-field variability of trunk circumference in the plantation.

The plot under study is situated in a commercial banana plantation in Njombe, Cameroon (WGS84: N: 4.612, E: 9.639), in its fifteenth flowering cycle. Pseudostem circumference measurements were only taken on plants whose vegetative growth had ceased, i.e., plants that were either flowering or at a later phenological stage. A total of 551 measurements were taken with a tape measure at 1 m height and georeferenced with a trail-type hand-held GPS (Table 1). The analysis in GeoFIS consisted of the following steps:

(i) the dataset was imported into GeoFIS (Step 0),
(ii) pseudostem circumference values were filtered to ensure the quality of the dataset (Step 1), and
(iii) variograms were fitted to the filtered dataset, and interpolation was performed by kriging with a local neighborhood onto a 1 × 1 m grid (Step 2).

Table 1. Description of the plot under investigation
Surface (ha)	Total Number of Plant Observations	Number of Plants that Have Reached at Least the Flowering Stage	Trunk Circumference, Mean (cm)	Trunk Circumference, Variance (cm²)
0.85	1287	551	74.7	69.7

Application in GeoFIS

The global distribution of the data was filtered within GeoFIS (Figure 3). Users select the attribute to be filtered at the top of the window. Below the histogram, users can adjust two thresholds that bound the two tails of the distribution, either by typing specific values or by moving a slider. Observations outside these thresholds are then removed from the dataset. In this dataset, the user judged two low values to lie outside the normal distribution (Figure 3), and the lower threshold allowed these non-compliant values to be eliminated.

Figure 3: Filtering of the pseudostem circumference values based on distribution of response in the attribute space
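The global filter described above amounts to keeping only the observations that fall between two user-chosen thresholds. The minimal sketch below also returns the boolean mask, so that coordinates can be filtered consistently with the attribute values. The values and thresholds are hypothetical.

```python
import numpy as np

def trim_tails(values, lower, upper):
    """Global (attribute-space) filter: keep observations whose value lies in
    [lower, upper]. The boolean mask is returned so that coordinates and other
    attributes can be filtered consistently."""
    v = np.asarray(values, dtype=float)
    keep = (v >= lower) & (v <= upper)
    return v[keep], keep

# Hypothetical circumferences (cm) with two implausibly low values
circ = np.array([74.0, 81.5, 12.0, 69.0, 77.2, 15.5, 88.1])
xy = np.arange(14, dtype=float).reshape(7, 2)   # hypothetical coordinates
kept, keep = trim_tails(circ, lower=40.0, upper=120.0)
xy_kept = xy[keep]
print(len(kept), "of", len(circ), "observations kept")
```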

The spatial structure of the data can then be evaluated by plotting an experimental variogram, here using the within-field pseudostem circumferences. The number of lags and the maximum lag distance can be set in the left-hand corner of the window to make sure that the variogram is relevant. The interface (Figure 4) enables the user to specify and fit a theoretical variogram model to the experimental variogram. A theoretical variogram is automatically fitted, after which users can interactively change the values of the variogram parameters, i.e., nugget, partial sill, and range to improve the fit. The quality of the fit can be assessed with the root mean square error (RMSE) value that is detailed in the top right-hand corner of the interface. The theoretical model can then be saved and used later to perform interpolation by kriging.

Figure 4: Screenshot from GeoFIS illustrating the calculation of the experimental variogram and the fitting of a theoretical variogram model to the within-field pseudostem circumference spatial data
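The variogram workflow in this interface can be sketched as follows: an experimental variogram computed from a chosen number of lags and maximum lag distance, a fitted nugget/partial sill/range model, and the RMSE as goodness of fit. The sketch assumes:

- the classical Matheron estimator;
- a spherical model;
- a simple fitting strategy: grid search over the range, with least squares for the nugget and partial sill.

The model family and fitting method used by GeoFIS are not stated here, so these are illustrative choices only.

```python
import numpy as np

def experimental_variogram(xy, z, n_lags, max_lag):
    """Classical (Matheron) estimator: for each distance bin, half the mean
    squared difference between all pairs of observations in that bin."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)          # every pair counted once
    h = np.linalg.norm(xy[i] - xy[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    edges = np.linspace(0.0, max_lag, n_lags + 1)
    lags, gammas = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (h > lo) & (h <= hi)
        if in_bin.any():                          # skip empty bins
            lags.append(h[in_bin].mean())
            gammas.append(0.5 * sq[in_bin].mean())
    return np.array(lags), np.array(gammas)

def spherical(h, c0, c1, a):
    """Spherical model for h > 0: nugget c0, partial sill c1, range a."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)

def fit_spherical(lags, gammas, ranges):
    """For each candidate range, c0 and c1 enter the model linearly and are
    solved by least squares; the range giving the lowest RMSE is kept.
    No non-negativity constraint is imposed on c0 and c1."""
    lags = np.asarray(lags, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    best = None
    for a in ranges:
        r = np.minimum(lags / a, 1.0)
        X = np.column_stack([np.ones_like(lags), 1.5 * r - 0.5 * r ** 3])
        coef, *_ = np.linalg.lstsq(X, gammas, rcond=None)
        rmse = float(np.sqrt(np.mean((X @ coef - gammas) ** 2)))
        if best is None or rmse < best[3]:
            best = (float(coef[0]), float(coef[1]), float(a), rmse)
    return best

# Usage on hypothetical semivariances generated from a known model
lags = np.linspace(2.0, 40.0, 12)
gammas = spherical(lags, 30.0, 40.0, 20.0)
c0, c1, a, rmse = fit_spherical(lags, gammas, ranges=np.arange(5.0, 41.0, 1.0))
print(f"nugget={c0:.2f} partial sill={c1:.2f} range={a:.0f} RMSE={rmse:.4f}")
```

The semivariances were generated from a spherical model with nugget 30, partial sill 40, and range 20 m, so the fit recovers those values with an RMSE close to zero. On real data, the RMSE would be positive, and the fitted parameters could then be refined interactively, as in the GeoFIS interface.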

Results and discussion

The spatial locations of the measurements are displayed in Figure 5, which clearly shows that the observations are irregularly spaced within the plot. This is simply because not all the banana plants had reached the flowering stage (only 551 out of the 1287 plants had). In the plot under study, the pseudostem circumference exhibits fairly strong spatial autocorrelation, with a ratio of autocorrelated variance close to 55% (Table 2). This finding suggests that spatially structured environmental properties, e.g., soil physical and chemical characteristics, are likely to exert a relatively strong influence on the pseudostem circumference of the banana plants. Determining the factors affecting pseudostem circumference is beyond the scope of this study; further analyses of, e.g., soil and plant records might help to answer this question.

Figure 5: Spatial measurements of pseudostem circumference divided into five quantile classes within the plot under study

Table 2. Spatial statistics of pseudostem circumference in the plot under investigation
Nugget Variance, C₀ (cm²)	Partial-Sill Variance, C₁ (cm²)	Sill Variance, C₀ + C₁ (cm²)	Ratio of Autocorrelated Variance, C₁/(C₀ + C₁)
35.2	43.4	78.6	55.2%
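The ratio reported in the last column of Table 2 follows directly from the nugget and partial sill:

```latex
\frac{C_1}{C_0 + C_1} = \frac{43.4}{35.2 + 43.4} = \frac{43.4}{78.6} \approx 0.552 = 55.2\,\%
```

Its complement, C₀/(C₀ + C₁) = 35.2/78.6 ≈ 44.8%, is the share of spatially unstructured variability discussed below.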

Table 2 also shows that the proportion of spatially unstructured variability (C₀) is not negligible. In this case study, it can mainly be explained by (i) inherent plant-to-plant variability, which might be exacerbated by competition among neighbors, and (ii) measurement accuracy, which might be affected by Global Navigation Satellite System (GNSS) positioning errors or operator errors.

Figure 6 provides a surface (map) of the within-field pseudostem circumference after interpolation (ordinary kriging). This smooths the data in Figure 5 using information on spatial variability contained in the same data.

Circumferences appear to be much lower (less than 70 cm) in the northeastern and southern portions of the plot. The larger pseudostems, with circumferences exceeding 87 cm, are mainly found in the northern part of the field. Some local effects are also visible on the map, e.g., small sites of low circumference surrounded by high pseudostem circumferences. These might be explained by several phenomena with a localized effect on plants, such as pest damage or replanting.

It is worth recalling that this final map is not a map of the circumferences of all pseudostems. Rather, it is a map of potential circumference at flowering, as not all the banana plants had reached the flowering stage. This map is an alternative representation of the information displayed in Figure 5 and provides predictions for plants that were not measured in the original survey. Like Figure 5, this map may be very useful for locating sampling sites for further soil and/or plant analyses and for better characterizing the within-field pseudostem circumference variability. Its advantage over the raw data plot (Figure 5) is that the main patterns in the field are easier for the human eye to interpret.

Figure 6: Kriged map of the potential pseudostem circumference within the field under study. The map represents a potential rather than an exhaustive analysis of plants because not all the plants have reached the flowering stage.
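The following sketch shows what ordinary kriging computes: it solves the kriging system for each target location with a spherical variogram. It is a simplified illustration under stated assumptions. It uses all observations (a global neighbourhood), whereas the map above was produced with a local neighbourhood. The variogram parameters and data are hypothetical. The same system also yields the ordinary kriging variance at each target, which measures prediction uncertainty.

```python
import numpy as np

def spherical_gamma(h, c0, c1, a):
    """Spherical semivariance with nugget c0, partial sill c1 and range a;
    gamma(0) = 0 by definition, so the nugget only applies for h > 0."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return np.where(h > 0, c0 + c1 * (1.5 * r - 0.5 * r ** 3), 0.0)

def ordinary_kriging(xy_obs, z_obs, xy_new, c0, c1, a):
    """Punctual ordinary kriging using all observations (global neighbourhood).
    Returns predictions and kriging variances at the target locations."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    n = len(z_obs)
    # Kriging matrix: semivariances between observations, bordered by ones
    # (Lagrange multiplier) so that the weights sum to 1 (unbiasedness)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_gamma(
        np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2), c0, c1, a)
    A[n, n] = 0.0
    preds, variances = [], []
    for x0 in xy_new:
        g0 = spherical_gamma(np.linalg.norm(xy_obs - x0, axis=1), c0, c1, a)
        sol = np.linalg.solve(A, np.append(g0, 1.0))
        weights, mu = sol[:n], sol[n]
        preds.append(weights @ z_obs)
        variances.append(weights @ g0 + mu)   # ordinary kriging variance
    return np.array(preds), np.array(variances)

# Hypothetical observations and variogram parameters (not those of Table 2)
obs_xy = [(0, 0), (10, 0), (0, 10), (10, 10)]
obs_z = [70.0, 80.0, 72.0, 90.0]
pred, var = ordinary_kriging(obs_xy, obs_z, [(5, 5), (2, 1)], c0=5.0, c1=20.0, a=15.0)
print(pred.round(2), var.round(2))
```

At the centre of the square, symmetry gives each corner a weight of 0.25, so the prediction is the mean (78.0). At an observed location, the predictor returns the observed value with zero variance, because γ(0) = 0.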

GeoFIS proved to be a relevant tool for modeling the spatial variability in the banana pseudostem circumference data and for continuous mapping of this property of interest. However, a few limitations are worth discussing.

Firstly, although the filtering interface is user-friendly, it only provides global filtering of the data: only the tails of the distribution can be trimmed. Spatial data may exhibit local as well as global outliers. This was not a problem here, but a function for removing local outliers (inliers) would be a useful addition, as such values, when present, affect the quality of interpolation.

Secondly, GeoFIS does not yet allow the fitting of nested variogram models, which was a potential issue in this case study. In Figure 4, it could be argued that there is a short-range spatial structure within the first 10 meters and a second, longer-range structure from 10 to 30 meters. Nested spatial structures are not common but do occur in agricultural data.

Thirdly, regarding the continuous mapping of the data, GeoFIS only provides a kriged map of the property of interest: the mean estimates are given, but the error (kriging variance) associated with these estimates is not. This limits the assessment of mapping accuracy and the interpretation of uncertainty in further analyses using the interpolated data.