Full article title A new numerical method for processing longitudinal data: Clinical applications
Journal Epidemiology Biostatistics and Public Health
Author(s) Stura, Ilaria; Perracchione, Emma; Migliaretti, Giuseppe; Cavallo, Franco
Author affiliation(s) Università di Torino, Università di Padova
Primary contact Email: Ilaria dot stura at unito dot it
Year published 2018
Volume and issue 15(2)
Page(s) e12881
DOI 10.2427/12881
ISSN 2282-0930
Distribution license Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Website https://ebph.it/index.php/ebph/article/view/12881
Download https://ebph.it/article/view/12881/11630 (PDF)

Abstract

Background: Processing longitudinal data is a computational issue that arises in many applications, such as aircraft design, medicine, optimal control, and weather forecasting. Given some longitudinal data, i.e., scattered measurements, the aim is to approximate the parameters involved in the dynamics of the considered process. For this problem, a large variety of well-known methods have already been developed.

Results: Here, we propose an alternative approach to be used as an effective and accurate tool for parameter fitting and for the prediction of individual trajectories from sparse longitudinal data. In particular, our mixed model, which uses radial basis functions (RBFs) combined with stochastic optimization algorithms (SOMs), is presented here and tested on clinical data. We also carry out comparisons with other methods that are widely used in this framework.

Conclusion: The main advantages of the proposed method are its flexibility with respect to the datasets, meaning that it is also effective for truly irregularly distributed data, and its ability to extract reliable information on the evolution of the dynamics.

Keywords: statistical method, radial basis function, stochastic optimization algorithm, longitudinal data

Introduction

Longitudinal data are often the object of study in many fields, e.g., sociology, meteorology, and medicine. In medicine, repeated measurements are used to monitor patients’ behaviors and to adjust therapies accordingly. However, many problems occur when these data are analyzed. Each time series could have a different number of observations, and the observations may not be equally spaced. In addition, the sampling period could vary from patient to patient, and measurement errors and missing data often occur. Since common methods such as linear regression usually fail in these cases, recent research is directed towards more robust statistical methods. For instance, longitudinal data are commonly analyzed using parametric models such as Bayesian ones[1], as well as functional data analysis (FDA).[2][3] In both cases, many data are required in order to model the behavior of the studied variable(s). These methods, in fact, try to find an "average curve" using all the data, including truncated series and observations with missing information.

However, in clinical applications an estimate of the future dynamics of a single series, given few previous values, could be needed; think, for instance, of tumor volumes during a treatment, the height or weight of children during growth, and the concentration of some substance in the body. Each patient is different and could have different growth behavior and different growth parameters, so an "average curve" may not be sufficient. An important piece of information could be, for example, the possible future development of the subject, given his/her previous growth and the clinical background (e.g., treatments). These predictions could be compared with the real dynamics in order to see whether the response of the patient to the treatment is stable (the parameters do not vary in the future) or not (the parameters change).

The aim of our work is to propose a numerical tool that can provide information on the future dynamics given few follow-up data. Thus, we first model longitudinal data via mathematical models that are widely used in population dynamics. On the one hand, we aim at validating such a model by approximating the parameters involved in the dynamics. On the other hand, we are also interested in giving reliable information on the future dynamics of the curves.

In order to achieve our goal, we propose a numerical tool based on optimization methods coupled with interpolation techniques. Specifically, we approximate the parameters involved in the dynamics by means of stochastic optimization algorithms (SOMs).[4][5][6][7] Moreover, for each data series, we improve the performance of the optimization tools by means of radial basis function (RBF) interpolation; see Fasshauer and McCourt[8] for a general overview and Cavoretto et al.[9][10] for particular instances on the topic and applications. In the interpolation process, we also take into account the critical computational issue of carrying out stable computations. For this reason, and since data are subject to noise, we adopt a kind of Tikhonov regularization.[11]

The method, named RBF-SOM, is tested here on two different datasets:

  • height measurements of children with a diagnosis of growth hormone deficiency (GHD) during treatment, and
  • prostate-specific antigen (PSA) values of prostatectomized patients with a recurrence of prostate cancer.

In the next section of this paper, the RBF-SOM technique is described. Afterwards, the two datasets used for the validation are presented. The "Results" section is devoted to the numerical results and it is divided into two subsections: in the first one, all the data of each series are considered in order to reconstruct the curves and approximate the parameters, while, in the second one, only a few initial data of each series are used to predict the curve behavior. The last two sections offer a discussion and conclusions.

Methods

This section describes the method used to fit a given data series and to approximate the parameters involved in the dynamics.

Given several scattered measurements f0, ..., fn sampled at different times t0, ..., tn, the basic idea of the RBF-SOM proposed here is to consider a theoretical function f, depending on the time t and on several parameters λ = (λ1, ..., λp), and to approximate these parameters in order to obtain reliable information on the biological or physical phenomenon.


In the proposed examples, we use, as theoretical growth curve f, the so-called Gompertzian function:

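$$f(t; \lambda) = \lambda_2 \exp\left( \ln\left( \frac{f_0}{\lambda_2} \right) e^{-\lambda_1 (t - t_0)} \right),$$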

where f0 is the measurement at time t0 (i.e., the first measurement), λ1 is the growth rate, and λ2 is the carrying capacity, i.e., the maximum value that can be asymptotically achieved by f.
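
As a rough illustration of the roles of f0, λ1, and λ2, the following minimal Python sketch evaluates a Gompertzian curve. It assumes one common parameterization, f(t) = λ2 exp(ln(f0/λ2) exp(-λ1 (t - t0))), chosen because it passes through (t0, f0) and tends to λ2 for large t; the function name `gompertz`, its argument names, and the numerical values are hypothetical and are not taken from the article.

```python
import math


def gompertz(t, f0, t0, growth_rate, capacity):
    """Gompertzian curve through (t0, f0) with asymptote `capacity`.

    Assumed parameterization (an illustration, not necessarily the article's):
        f(t) = capacity * exp(log(f0 / capacity) * exp(-growth_rate * (t - t0)))
    Requires f0 > 0 and capacity > 0.
    """
    decay = math.exp(-growth_rate * (t - t0))
    return capacity * math.exp(math.log(f0 / capacity) * decay)


if __name__ == "__main__":
    # Hypothetical values: start at 100, saturate at 170, growth rate 0.3.
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, gompertz(t, f0=100.0, t0=0.0, growth_rate=0.3, capacity=170.0))
```

With this form, a larger growth rate brings the curve to its carrying capacity sooner, while the carrying capacity fixes the plateau.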

The Gompertzian function is characterized by a fast-growing initial period and by a progressive slowdown, reaching a carrying capacity after a certain time. This curve, depending on the values of the parameters, is able to model a variety of types of growth, from human growth to that of cancer cells; see [12-16] for details. For this reason, we will use the same function for both datasets in Section 5. Moreover, its form is particularly suitable for this study because the parameters cannot be estimated via simple methods such as least squares approximation.

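Trivially, the parameters are approximated by finding

$$\lambda^{*} = \arg\min_{\lambda} \sum_{i} \left( f(t_i; \lambda) - f_i \right)^2,$$

where fi denotes the measurement taken at time ti.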
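
To make the minimization target concrete, the sketch below writes a possible misfit as a Python function. Both the sum-of-squared-residuals form and the Gompertzian parameterization are assumptions made for illustration, and the names `gompertz` and `sum_squared_residuals` are hypothetical; the first data point is taken as (t0, f0).

```python
import math


def gompertz(t, f0, t0, growth_rate, capacity):
    # Assumed Gompertzian form through (t0, f0) with asymptote `capacity`.
    return capacity * math.exp(
        math.log(f0 / capacity) * math.exp(-growth_rate * (t - t0))
    )


def sum_squared_residuals(params, times, values):
    """Misfit between the model curve and the measurements.

    params = (growth_rate, capacity); the first sample fixes (t0, f0).
    The squared-error form is an assumption made for this sketch.
    """
    growth_rate, capacity = params
    t0, f0 = times[0], values[0]
    return sum(
        (gompertz(t, f0, t0, growth_rate, capacity) - y) ** 2
        for t, y in zip(times, values)
    )
```

Any optimizer that only needs function values, such as the stochastic methods discussed in this section, could be applied to a function of this kind.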

Note that we need optimization methods that can be used when f is nonlinear, as in the cases considered here. In particular, we direct our research towards stochastic methods, which have been designed by considering analogies with natural phenomena. The most popular are evolution strategies and genetic algorithms, both based on competition among individuals. In contrast, other methods proposed in recent decades mainly focus on cooperation. Among them, particle swarm optimization (PSO), cuckoo search (CS), and ant colony optimization are widely used techniques based on the mutual interaction and exchange of information between individuals. Here we consider PSO and CS, briefly described in what follows.

PSO was first introduced by Kennedy (a social psychologist) and Eberhart (an electrical engineer)[4] and was further developed by other researchers.[6][7][12] To describe it, let us consider a group of particles, or birds, represented as points in space. First, we need to model their way of flying. Then, since the target of the birds is the place with the maximum availability of food, which here corresponds to the minimum of the objective function, following the flock leads to that minimum.

The main objective is to simulate the trajectories of the individual birds by considering their selfish behavior (the ability of a bird to fly randomly away from the flock) and their social behavior (the ability of a bird to stay in the group). With these simple considerations, it is possible to simulate how a group of birds moves, also taking into account that particles avoid collisions.

To explain how the minimum of the objective function can be found when the latter is interpreted as food, let us first suppose that a bird discovers some food. Then, the other birds have two alternatives: leave the flock and reach the food (selfish behavior) or stay in the flock (social behavior).

If a good trade-off between the two behaviors is allowed, then the flock can reach the minimum. Indeed, if a bird can move towards some food, then other birds can change their directions towards the same place. Acting in this way, the flock gradually changes its direction until the best place, i.e., the minimum, is reached.
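
The flock picture above can be sketched in a few lines of Python. This is a minimal, generic PSO under stated assumptions, not the implementation used in the article: the function name `particle_swarm`, the inertia and acceleration coefficients, the box bounds, and the toy objective in the usage note are all illustrative choices. Each velocity update combines a pull towards the particle's own best position (individual behavior) with a pull towards the best position found by the flock (social behavior).

```python
import random


def particle_swarm(objective, bounds, n_particles=30, n_iter=200,
                   inertia=0.7, c_self=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over the box `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # best point seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    flock_pos, flock_val = best_pos[g][:], best_val[g]  # best point of the flock

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_self * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (flock_pos[d] - pos[i][d]))
                # Move, then keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < flock_val:
                    flock_val, flock_pos = val, pos[i][:]
    return flock_pos, flock_val
```

For instance, `particle_swarm(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])` searches for the minimum of a simple quadratic bowl; in the setting of this section the objective would instead measure the misfit between the model curve and the data.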

As for CS, it was developed by Yang and Deb[13], and it simulates the behavior of the cuckoo, a bird that does not incubate its own eggs but tries to lay them in the nests of other species. The problem with this conduct is that, in some cases, the egg is removed by the nest’s owner. The cuckoo therefore searches for a nest in which its egg can be "confused" with the others. In this algorithm, the minimum of the function is thus the nest in which the most cuckoos can lay their eggs without being discovered.

As with PSO, the user needs to provide a set of possible initial solutions, which are usually randomly initialized. Indeed, if the initial solutions are chosen so that they are feasible, the stochastic methods do not fall into local minima, and thus the methods are not truly sensitive to the initial conditions. The main difference with respect to the PSO approach is that, at each iteration, a fraction of the nests, namely those far from the minimum, is abandoned and new ones are built close to the minimum.
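
The nest-abandonment idea can be sketched as follows. This is a simplified variant written for illustration, not the algorithm of Yang and Deb nor the code used in the article: heavy-tailed Cauchy steps stand in for Lévy flights, abandoned nests are rebuilt by Gaussian perturbation of the current best nest, and the gradually shrinking step size (`shrink`) is an added convenience. The function name `cuckoo_search` and all default parameter values are hypothetical.

```python
import math
import random


def cuckoo_search(objective, bounds, n_nests=25, n_iter=300,
                  abandon_frac=0.25, step=0.1, shrink=0.97, seed=0):
    """Minimize `objective` over the box `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)

    def clip(point):
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(point, bounds)]

    widths = [hi - lo for lo, hi in bounds]
    nests = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_nests)]
    costs = [objective(nest) for nest in nests]
    n_abandon = int(abandon_frac * n_nests)
    scale = step

    for _ in range(n_iter):
        # Each cuckoo lays one egg: a heavy-tailed random step from its nest.
        for i in range(n_nests):
            egg = clip([x + scale * w * math.tan(math.pi * (rng.random() - 0.5))
                        for x, w in zip(nests[i], widths)])
            egg_cost = objective(egg)
            j = rng.randrange(n_nests)   # the egg is dropped in a random nest
            if egg_cost < costs[j]:      # it survives only if it beats the host
                nests[j], costs[j] = egg, egg_cost

        # Abandon the worst nests and rebuild them close to the current best.
        order = sorted(range(n_nests), key=costs.__getitem__)
        best = nests[order[0]]
        for i in order[n_nests - n_abandon:]:
            nests[i] = clip([x + rng.gauss(0.0, scale * w)
                             for x, w in zip(best, widths)])
            costs[i] = objective(nests[i])
        scale *= shrink  # assumption of this sketch: steps shrink over time

    i_best = min(range(n_nests), key=costs.__getitem__)
    return nests[i_best], costs[i_best]
```

Because the best nest is never among those abandoned, the best cost found so far cannot get worse from one iteration to the next.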

Note that both the PSO and CS approaches can be used to minimize the target function, but unfortunately the cardinality of the samples in concrete applications is very small. Thus, in order to improve the performance of the optimization methods, we first reconstruct the growth curves by means of an RBF-based interpolation scheme; see (Cavoretto, De Rossi, and Perracchione 2017; Fasshauer and McCourt 2015; Wendland 2005). In doing so, we also take into account the instability problems arising in applications. An example of RBF reconstruction can be seen in Fig. 1a-b (big coloured dots).
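
A minimal sketch of such a reconstruction is given below, under several assumptions: a Gaussian kernel with a fixed shape parameter, a Tikhonov-style term `reg` added to the diagonal of the kernel matrix, and a small hand-written Gaussian elimination in place of a linear algebra library. The names `rbf_fit`, `rbf_eval`, and `solve_linear_system`, as well as the sample data in the usage note, are hypothetical and not taken from the article.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rbf_fit(times, values, shape=1.0, reg=0.0):
    """Coefficients of a Gaussian RBF expansion fitted to (times, values).

    With reg=0 the expansion interpolates the data; reg>0 adds a
    Tikhonov-style diagonal term that trades exactness for stability.
    """
    kernel = [[math.exp(-(shape * (ti - tj)) ** 2) + (reg if i == j else 0.0)
               for j, tj in enumerate(times)]
              for i, ti in enumerate(times)]
    return solve_linear_system(kernel, values)


def rbf_eval(t, times, coeffs, shape=1.0):
    """Evaluate the fitted RBF expansion at time t."""
    return sum(c * math.exp(-(shape * (t - tj)) ** 2)
               for c, tj in zip(coeffs, times))
```

As a usage example, `coeffs = rbf_fit([0.0, 1.0, 2.5, 4.0, 6.0], [1.0, 1.8, 2.9, 3.4, 3.6], reg=0.1)` followed by `[rbf_eval(0.5 * k, [0.0, 1.0, 2.5, 4.0, 6.0], coeffs) for k in range(13)]` produces a denser set of reconstructed values between the first and last measurement, which could then be passed to the optimizer in place of the few original samples.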


References

  1. Rao, C.R. (1987). "Prediction of Future Observations in Growth Curve Models". Statistical Science 2 (4): 434–47. doi:10.1214/ss/1177013119. 
  2. Ji, H.; Müller, H.-G. (2017). "Optimal designs for longitudinal and functional data". Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79 (3): 859–76. doi:10.1111/rssb.12192.
  3. Ramsay, J.; Silverman, B.W. (2005). Functional Data Analysis. Springer-Verlag. pp. 428. ISBN 9780387400808. 
  4. Kennedy, J.; Eberhart, R. (1995). "Particle swarm optimization". Proceedings of ICNN'95 - International Conference on Neural Networks 4: 1942–8. doi:10.1109/ICNN.1995.488968.
  5. Parsopoulos, K.; Vrahatis, M. (2002). "Particle swarm optimization method for constrained optimization problems". In Sincák, P.; Kvasnicka, V.; Vascák, J.; Pospíchal, J.. Intelligent Technologies: from Theory to Applications. Frontiers in Artificial Intelligence and Applications. 76. IOS Press. pp. 214–20. ISBN 9781586032562. 
  6. Pedersen, M.E.H.; Chipperfield, A.J. (2010). "Simplifying Particle Swarm Optimization". Applied Soft Computing 10 (2): 618–28. doi:10.1016/j.asoc.2009.08.029.
  7. Shi, Y.; Eberhart, R. (1998). "A modified particle swarm optimizer". 1998 IEEE International Conference on Evolutionary Computation Proceedings: 69–73. doi:10.1109/ICEC.1998.699146.
  8. Fasshauer, G.; McCourt, M. (2015). Kernel-based Approximation Methods using MATLAB. Interdisciplinary Mathematical Sciences. 19. World Scientific. pp. 536. doi:10.1142/9335. ISBN 9789814630139. 
  9. Cavoretto, R.; De Rossi, A.; Perracchione, E. (2018). "Optimal Selection of Local Approximants in RBF-PU Interpolation". Journal of Scientific Computing 74 (1): 1–22. doi:10.1007/s10915-017-0418-7. 
  10. Cavoretto, R.; De Rossi, A.; Qiao, H. (2018). "Topology analysis of global and local RBF transformations for image registration". Mathematics and Computers in Simulation 147 (5): 52–72. doi:10.1016/j.matcom.2017.10.010. 
  11. Cancelliere, R.; Gai, M.; Gallinari, P.; Rubini, L. (2015). "OCReP: An Optimally Conditioned Regularization for pseudoinversion based neural training". Neural Networks 71 (11): 76–87. doi:10.1016/j.neunet.2015.07.015. 
  12. Qasem, S.N.; Shamsuddin, S.M. (2011). "Radial basis function network based on time variant multi-objective particle swarm optimization for medical diseases diagnosis". Applied Soft Computing 11 (1): 1427–38. doi:10.1016/j.asoc.2010.04.014. 
  13. Yang, X.-S.; Deb, S. (2009). "Cuckoo Search via Lévy flights". Proceedings from the 2009 World Congress on Nature & Biologically Inspired Computing: 210–4. doi:10.1109/NABIC.2009.5393690.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar. We also added PMCID and DOI when they were missing from the original reference. The original article's inline citations are not in numerical order (after citation 11); due to the nature of this wiki, citations are numbered in order automatically, and therefore the numbering differs from the original after citation 11. No other modifications were made in accordance with the "no derivatives" portion of the distribution license.