Gowda PH^{1,*}, Oommen T^{2}, Misra D^{3}, Schwartz RC^{4}, Howell TA^{4} and Wagle P^{5}  
^{1}USDAARS Grazinglands Research Laboratory, El Reno, OK 73036, USA  
^{2}Department of Geological and Mining Engineering and Sciences, Michigan Technological University, 1400 Townsend Drive, Houghton, MI, USA  
^{3}Department of Mining and Geological Engineering, University of Alaska Fairbanks, P.O. Box 755800, Fairbanks, AK, USA  
^{4}USDAARS Conservation and Production Research Laboratory, P.O. Drawer 10, Bushland, TX, USA  
^{5}Department of Microbiology and Plant Biology, and Center for Spatial Analysis, University of Oklahoma, Norman, OK 73019, USA  
Corresponding Author :  Gowda PH USDAARS Grazinglands Research Laboratory El Reno, OK 73036, USA Tel: 4052625291 EMail: [email protected] 
Received: December 01, 2015; Accepted: December 04, 2015; Published: December 07, 2015  
Citation:Gowda PH, Oommen T, Misra D, Schwartz RC, Howell TA, et al. (2015) Retrieving Leaf Area Index from Remotely Sensed Data Using Advanced Statistical Approaches. J Remote Sensing & GIS 4:156. doi:10.4172/24694134.1000156  
Copyright: © 2015 Gowda PH, et al. This is an openaccess article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.  
Related article at Pubmed, Scholar Google 
Visit for more related articles at Journal of Remote Sensing & GIS
Mapping and monitoring leaf area index (LAI) is critical to model surface energy balance, evapotranspiration, and vegetation productivity. Remote sensing helps in rapid collection of LAI on individual fields over large areas, in a time and costeffective manner using empirical regression between LAI and spectral vegetation indices (SVI). However, these relationships may be ineffective when sunsurface sensor geometry, background reflectance and atmosphereinduced variations on canopy reflectance are larger than variations in the canopy itself. This requires development of superior and regionspecific LAISVI models. In recent years, statistical learning methods such as support vector machines (SVM) and relevant vector machines (RVM) have been successful over the ordinary least square (OLS) regression models for complex processes. The objective of this study is to develop and compare OLS, SVM, and RVMbased reflectance models to estimate LAI for major summer crops in the Texas High Plains. The LAI was measured in 47 randomly selected commercial fields in Moore and Ochiltree counties. Data collection was made to coincide with Landsat 5 satellite overpasses on the study area. Numerous derivations of SVIs were examined for estimating LAI using OLS, SVM, and RVM models. Analysis of the results indicated that the SVILAI models based on the ratio of TM bands 4 and 3, and normalized difference vegetation index (NDVI) are most sensitive to LAI. The R2 values for selected models varied from 0.79 to 0.96 with the SVM model producing the best results. However, accuracy of reported LAI models needs further evaluation that accounts for infield spatial variability in the LAI for wider applicability.
Keywords 
Semiarid; Ogallala aquifer region; ET modeling; Texas high plains; Statistical learning method 
Introduction 
Leaf area index (LAI) is a measure of foliage density that plays a major role in photosynthesis, groundwatersurface water interactions through evapotranspiration (ET) [1], atmospheric gas exchange [2], nutrient uptake [3], and crop productivity [4]. Obviously, it is one of the sensitive input parameters to plant growth, atmospheric circulation [5,6], energy balance [7], terrestrial ecology [8], global climate change [9], and water quality simulation models [10] at field to landscape to global scales. Accurate estimates of LAI are also useful in estimating soil water content from microwave remote sensing data by subtracting the effects of crop water content on reflectance [11]. 
Traditional insitu techniques to measure LAI involve destructive sampling of leaves and are time intensive. However, numerous indirect insitu methods have been developed to measure LAI including the scanner method [12], electronic leaf area meter (DeltaT Devices Ltd, Cambridge1, UK) [13] and LAI2000, Plant Canopy Analyzer [14]. Although these insitu techniques can be accurate, they are not practical to monitor LAI spatially and temporally over large geographic areas. An alternative approach is to employ satellite remote sensing techniques to estimate LAI. However, the use of satellite data can be a practical and alternative to insitu measurements, provided spectral vegetation indices (SVI) based LAI models are available. 
In the last three decades, numerous SVIs have been developed to estimate LAI from Landsat Thematic Mapper (TM) data for corn and soybean [4] and from Enhanced Thematic Mapper Plus (ETM+) data for mixed crops mainly sugarcane [15] and for corn and soybean [16] crops. These studies and others have shown that there is a strong contrasting relationship between spectral reflectance measurements of canopy cover in red and infrared wavelengths. Consequently, a simple ratio (SR), normalized difference vegetation index (NDVI), and soil adjusted vegetation index (SAVI) [17] are some of the commonly used SVIs to estimate LAI. However, these indices are not sensitive at LAI values greater than 3.0 m^{2} m^{2} [11]. The normalized difference wtaer index (NDWI; [11]) uses normalized difference between NIR and shortwave infrared (SWIR) reflectance and the green index (GI; [18]) uses green in place of red reflectance. This green index appears to remain sensitive for LAI values between 3.0 and 6.0 [16]. 
Most of LAISVI statistical relationships reported in the literature is based on ordinary least square (OLS) regressions. In recent years, numerous artificial neural networks (ANN), support vector machines (SVM), and relevant vector machines (RVM) based models have been developed to estimate biophysical characteristics of both agricultural and forest canopies [19]. These models have proved to be superior over the OLS regression models [20]. The remote sensing of LAI often suffers due to its coarse spatial resolution resulting in complex observation with large noise. A key advantage of the advanced statistical algorithms such as SVM and RVM over OLS is the possibility of using a loss function for handling noise in the data together with the ability to obtain a sparse solution to the regression problem. A detailed review on application of SVM in remote sensing can be found in the literature [21]. Therefore, the applicability of these advanced statistical models for LAI prediction could be valuable in approximating the complex relationship and to deal with the noise in the data. 
While satellite remote sensing based SVIs have been used for mapping and monitoring LAI, the strengths and transferability of empirical LAISVI relationships to other regions may potentially be affected by exogenous factors such as sunsurface sensor geometry, background reflectance, cultural practices and atmosphereinduced variations on canopy reflectance [2225]. There have been few tests to compensate for exogenous effects on LAISVI relationships and results are mixed. Further, most studies in the past considered one vegetation type. Moreover, comparisons across studies have been limited [22]. 
Moreover, canopy cover reflectance is a multiple function of many variables that are different over time and space. Therefore, a single SVIbased LAI model for one region may not be a viable option for application to different regions. Overall, spectrally based LAI models for agricultural crops in semiarid regions are scarce. Further, there are a few OLSbased models that have been developed and validated for the Texas High Plains and but there are no SVMbased LAI models that exist for this region. Development of regionspecific LAI models will minimize errors in estimating LAI for use as input in the operational remote sensingbased evapotranspiration mapping programs [26], agronomic [4], ecological [8] and climatic [9]. The main objective of this study is to develop and compare a set of OLS, SVM, and RVMbased reflectance models to estimate LAI for major summer crops in the Texas High Plains. 
Materials and Methods 
Study area 
This study was conducted with LAI data collected from 47 randomly selected irrigated fields (23 in Ochiltree County and 24 in Moore County) in the Texas High Plains underlain by the Ogallala aquifer, which is being depleted by irrigated agriculture. Corn, sorghum, cotton and soybean are the major summer crops in both Ochiltree and Moore counties. Annual average precipitation is about 481 and 562 mm for Moore and Ochiltree counties, respectively. Crop water needs are supplemented with groundwater from the underlying Ogallala aquifer. Most of the croplands in both Moore and Ochiltree counties have nearly level to gently sloping fields with clay loam soils of the Sherm series (fine, mixed, superactive, mesic Torrertic Paleustolls) [27]. 
Field data collection 
Two Landsat TM scenes used in this study were acquired on June 27, 2005 for Ochiltree County and the other on July 4, 2005 for Moore County, for developing LAI models. Development of LAI models consists of three steps: 1) groundtruth (field) data collection, 2) atmospheric correction of Landsat TM imagery for deriving surface reflectance values for groundtruth location, and 3) development of SVILAI statistical relationships using OLS and SVM and/or RVM techniques. On the day of the Landsat 5 satellite overpass, groundtruth data were collected from 47 randomly selected commercial fields in the study area. Groundtruth data included geographic coordinates using a handheld Global Positioning System (GARMIN GPSMAP 76, Garmin Ltd), plant type and density, width of plant rows and extraction of one representative plant for LAI measurement in the laboratory. The LAI was measured using the electronic leaf area meter (DeltaT Devices Ltd, Cambridge, UK; [13]). 
Each pixel in the Landsat 5 image has a spatial resolution of 30 m after resampling. Ground truth pixel locations on the image were identified using the GPS coordinates. For model development, mean reflectance data from 9pixels (groundtruth pixel and surrounding 8 pixels) were used, as it was difficult to precisely identify the ground truth location in the field, without proper reflectors installed to validate the GPS coordinates with those of the satellite pixels. Atmospherically corrected surface reflectance (ρ_{SUR}) for each TM band was used to develop SVIs. As a first step to compute ρ_{SUR}, the digital numbers in each TM band image except thermal band 6 were converted into spectral radiance (L_{b}), using the equation: L_{b}=Gain x DN+Bias, where Gain and Bias were extracted from the image header files from the satellite data provider. The topoftheatmosphere reflectance (ρ_{TOA}) was calculated for each pixel in the image using the procedure outlined in [28]. In this procedure, the ρ_{TOA} for each pixel was calculated by dividing spectral radiance by the incoming energy (radiance) in the same shortwave band. The incoming irradiance is a function of mean solar exoatmospheric irradiance, solar incidence angle, and square of the relative earthtosun distance. The ρ_{SUR} was computed after applying atmospheric interference corrections for shortwave absorption and scattering using narrow band transmittance calibrated for each band with MODTRAN Version 4, a radioactive transfer model [29]. 
The LAISVI relationships were evaluated using OLS, SVM, and RVM techniques with measured LAI as the independent variable. A set of SVIs evaluated to develop LAISVI statistical relationships included difference indices, sum indices, product indices, ratio indices, and normalized difference indices including NDWI [11]. In addition, LAISAVI was evaluated with L value varied from .05 to 1 at 0.05 intervals. The OLS models used to evaluate each of the SVIs were linear, exponential, power, and quadratic. Finally, the best model in each category was identified and reported for the study area. 
Regression using support vector machine (SVM) 
Regression using SVM is referred to as the Support Vector Regression (SVR). Here we provide a brief overview of the theoretical concepts of SVR. A detailed depiction of SVR is beyond the scope of this paper and can be obtained from [3033]. SVM or RVM involves three steps in the regression processes. These are: (1) a random subset of the entire dataset is used as training data to develop the model. The model is developed, just like any other regression model, with some dependent variables that predict an independent variable such as the LAI; (2) a hyperweighted convolution process where a kernel function is used weigh the data to reside within a specified goal (ε ) of a regression boundary (see e.g., [30]) and a bias function is added to the data that falls outside of this goal in order to account for the deviation from the specified goal; and (3) once the training model is developed, it is applied to the rest of the data to test and assess how well the model confers or justifies the broader information. 
In machine learning, the data used for developing the model is referred to as the training data. Suppose we have training data where represent the predictor variables and represents observed LAI. The goal in SVR is to find a function f (x) that has the most ε deviation from the observed LAI , for all the training data, and at the same time, is as flat as possible. Mathematically, 
where denotes the dot product in X. Flatness in equation 1 means a small value ofw, and it can be obtained by minimizing the Euclidean norm, i.e., Thus, the SVR problem can be formulated as shown in equation 2. 
minimize 
subject to 
However, in some cases having a f unction f (x) that is flat with errors less than ε is not feasible. To deal with these infeasible situations a constant C and slack variables are introduced which leads to the formulation (equation 3) as stated in [30]. 
minimize 
subject to 
where C is the prespecified term that controls the magnitude of penalty associated with errors outside the error margin, and are slack variables representing upper and lower constraints on the output system. The constant C>0 determines the tradeoff between the flatness of the function and the amount to which deviations larger than ε are tolerated [34]. 
Lagrange multipliers that employ the KharushKuhnTucker (KKT) method are then used to solve the optimization problem in equation 3. The KKT method converts the inequality constraint into an equation of the form h(x)=0 by adding or subtracting slack variables and then solving the corresponding equality constrained quadratic optimization problem. Solution of the optimization problem results in a dual pair variable Lagrangian L_{d} (α_{i}, α_{i} *), one for each of the training patterns. The pairs that result in nonzero αi or α_{i} * are termed as the support vectors. During the training phase of the SVR model the support vectors are used to define the hyperplane (regression line) and to constrain the width of the optimal margin. Any training data that is outside the optimal margin, does not contribute to the definition of the regression line. 
Often in complex nonlinear problems the original input space (predictor variable) is nonlinearly related to the predicted variable (LAI). This limits a linear formulation of the problem as shown in equation 3. In SVR, this limitation is overcome by mapping the input space onto some higher dimensional space (feature space) using a nonlinear mapping function (kernel function). Commonly used kernel functions include the linear, polynomial, Gaussian radial basis, and sigmoid kernel functions. It was demonstrated that a linear kernel is a special case of the Gaussian radial basis kernel and that the sigmoid kernel behaves like a Gaussian radial basis kernel for certain parameters [35]. Therefore, in this study we use a Gaussian radial basis kernel function (equation 4). 
In any SVR problem, when we use the Gaussian radial basis function as the kernel function, we have three parameters to optimize during training: they are the Gaussian radial basis function parameter γ , magnitude of penalty term C, and the width/deviation of the error marginε . To avoid overfitting, the SVR model parameters were optimized using kfold cross validation [36]. In kfold cross validation, for each of the k splits, we use (k1) folds for training and the remaining one fold for testing. The advantage of kfold cross validation is that all the examples in the data set are eventually used for both training and testing. Finally, the optimal model parameter is determined by evaluating which model has the best generalizability. It was demonstrated that a k=10 or k=5 fold cross validation is appropriate to evaluate nonlinear models [36]. Therefore, in this study, the SVR model was optimized using a 5 fold crossvalidation. 
Regression using relevance vector machine (RVM) 
RVM, introduced by [33], provides a regression method in a Bayesian framework. RVM is a special case of a sparse linear model, where the kernel function Ö centered at the different training points forms the basis function given by: 
(5) 
where the output is a linearlyweighted sum of N, generally nonlinear and fixed, basis functions [33]. 
For the RVM learning, we use a set of input vectors along with corresponding set of targets which in this case is a real value (or LAI). The RVM utilizes the kernel function given in eq. 5 to learn the relationship between the input vector and the targets. By assuming the targets are independent, the likelihood of the dataset can be written as 
where t, w, and Ö is the N × (N+1) training matrix. Further, the solution of Eq. 6 is constrained to avoid overfitting to the noise by defining an explicit prior probability distribution over them. 
A popular choice of prior probability distribution for the RVM is the zeromean Gaussian prior distribution with a distinct hyperparameter, α_{i}, for each weight, 
The optimal parameters of the RVM model are then derived by minimizing the penalized negative loglikelihood, 
where and is a diagonal matrix with nonzero elements given by the vector of hyperparameters. The outcome of this optimization is that many elements of α go to infinity such that will have only a few nonzero weights that will be considered as relevant vector. Similar to SVM, the RVM model was also optimized using a 5 fold crossvalidation. 
Results and Discussion 
Ground truth LAI measurements were made from 16 commercial fields planted with corn, 12 with cotton, 11 with sorghum, and 8 with soybean. Measured LAI values varied from 0.09 to 6.21 with a standard deviation of 1.91 m^{2} m^{2}. Table 1 presents range, mean, and standard deviation of measured LAIs for major summer crops in the study area. The large variability in the measured LAI could be attributed to various stages of crop growth in different fields. Table 2 presents a best LAISVI model in each of the linear, quadratic, exponential, and power, SVM and RVM categories and associated R^{2}, adjusted R^{2}, root mean square error (RMSE), and F statistic for major summer crops in the Texas High Plains. The optimal R^{2} values with SVM and RVM modes were obtained using the NDVI43 as input. 
Figure 1ad illustrates the best statistically significant linear, quadratic, exponential, and power relationships between LAI and SVIs (see Table 2 for the models used). All four SVIs contain TM band 4 (NIR) as one of the variables indicating that the NIR is more sensitive to LAI. Their R^{2} values varied from 0.79 to 0.84. Among the four OLS models, the model that used NDVI (NDVI43) i.e., normalized difference of TM bands 4 and 3 was found to be relatively more sensitive to lower LAI values than that with ratio indices. The LAISVI relationship that uses the ratio of TM bands 4 and 7 (R47) gave the highest R^{2} value (0.79) with relatively lower RMSE (0.90) and high F statistic (160.77) in the linear category (Figure 1a). Figure 1b shows the best statistically significant quadratic relationship between LAI and SVIs. The relationship pattern was similar to linear model; however, the performance statistics were slightly improved with relatively high R^{2} values (0.80) and low RMSE values (0.89). Figure 1cd shows the best exponential and power relationships between LAI and SVIs, respectively. Both relationships use NDVI (NDVI43) as a variable and gave best performance statistics with R^{2} value of 0.81 for exponential and 0.84 for power relationships. The RMSE values (0.500.55) were much lower than that for linear and quadratic LAISVI relationships. The highest LAI estimated from this study was 6.21 m^{2}. However, saturation in the LAISVI relationships was not observed. This is consistent with the observation made by [15] for mixed land use data. 
The LAISAVI relationships (not reported here) were not better than the LAINDVI relationship in linear, quadratic, exponential or power forms of regression models, and LAI values were not sensitive to L value. However, it provided relatively comparable results with R^{2} value of 0.76 for linear, 0.78 for quadratic, 0.80 for exponential, and 0.82 for power models. These results are consistent with the L value of 0.16 reported in [37]. The reduced sensitivity of L may be due to the fact that there was not much variation in the soil background as clay loam soils of the Sherm series occupy nearly all of the cropland in both Moore and Ochiltree counties. 
Figure 2ab illustrates the best LAISVI relationships with SVM and RVM techniques, respectively. The best models were associated with the NDVI (NDVI43) in both categories. The LAI model with SVM was significantly better than OLS and RVM models with an R^{2} value of 0.96 and a RMSE value of 0.42. The RVM model was slightly comparable to the exponential model with a slightly better R^{2} (0.88) and a slightly poor RMSE (0.65). 
Five of the six reported OLS models (Table 2) use either SR (R43=TM band 4/TM band 3) or NDVI. The remaining linear model use the ratio of TM band 3 (R) or 7 (MIR). This indicates that the R and NIR bandwidths are sensitive to the LAI and is consistent with those observed by previous studies [4,16,17]. The difference in reflectance of R and NIR from plants is attributed to the chlorophyll pigments in plant leaf absorbing energy in the red band of the electromagnetic spectrum resulting in relatively low red reflectance and transmittance while lignin in plant cell walls causing scattering of nearinfrared energy resulting in relatively high nearinfrared reflectance and transmittance [38]. Based upon the RMSE and F statistic, exponential and power models (Table 2) were found to the best for estimating LAI. However, based on the normalized difference (NDVI) between the TM bands 4 and 3, SVM and RVM were found to be the best models for estimating LAI. The power model that uses NDVI43 accounted for 84% of the variability in the measured LAI data with a relatively small RMSE of 0.50. 
Conclusion 
Leaf Area Index (LAI) is important for spatially distributed modeling of surface energy balance, evapotranspiration and vegetation productivity. The Landsat 5 Thematic Mapper (TM)based spectral vegetation indices (SVIs) were evaluated using the ordinary least square (OLS), support vector machine (SVM), and Relevant Vector Machine (RVM) regression techniques for their ability to estimate LAI for major crops in the Texas High Plains. The R^{2} values for the selected models varied from 0.79 to 0.96 with the SVM model providing the least RMSE, followed by the power model. Both the SVM and the power model used NDVI as input variable and resulted in producing the best regression with fieldmeasured LAI. Analysis of the results indicated that SR and NDVI were sensitive to LAI. Overall, the remote sensing approach is promising for the rapid collection of LAI information on individual fields over large areas in the Northern High Plains of Texas in a time and costeffective manner. These are preliminary modeling analysis and we are pursuing further evaluation of the models to establish the utility of either the OLS or the machine learning algorithm such as SVM and RVM. 
Acknowledgments 
We extend our appreciation to Mr. Timothy Trimble and Mr. Scott Strawn, both County Extension Agents, Texas AgriLife Extension, Dumas, Texas and Perryton, Texas, respectively, for their logistical support to conduct field campaigns in Moore and Ochiltree counties. 
References 

Table 1  Table 2 
Figure 1  Figure 2 
Make the best use of Scientific Research and information from our 700 + peer reviewed, Open Access Journals