Cullen CA^{1*}, Kashuk S^{2}, Suhili R^{2}, Khanbilvardi R^{2} and Temimi M^{2,3}  
^{1}Department of Earth and Atmospheric Science, Graduate Center, City University of New York, USA  
^{2}Department of Earth and Atmospheric Sciences and NOAA-CREST Center, The City College of New York, USA  
^{3}Water Center, Masdar Institute of Science and Technology, New York, USA  
*Corresponding Author :  Cullen CA Department of Earth and Atmospheric Science Graduate Center City University of New York, USA Tel: 2124919118 Email: [email protected] 
Received date: Feb 22, 2016; Accepted date: Mar 03, 2016; Published date: Mar 8, 2016  
Citation: Cullen CA, Kashuk S, Suhili R, Khanbilvardi R, Temimi M (2016) A Multistage Technique to Minimize Overestimations of Slope Susceptibility at Large Spatial Scales. J Remote Sensing & GIS 5:159. doi:10.4172/24694134.1000159  
Copyright: © 2016 Cullen CA, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Rainfall-induced landslides are among the most frequent natural hazards on sloping terrain. They lead to significant economic losses and fatalities worldwide. Most factors inducing shallow landslides are local and can only be mapped with high levels of uncertainty at larger scales. This work presents an attempt to determine slope instability in two stages. In the first stage, buffer and threshold techniques are used to downscale large areas and minimize slope uncertainties at local scales. In the second stage, logistic regression is used to determine susceptibility at large scales. ASTER GDEM V2 is used for topographical characterization of slope and for buffer analysis. Four static parameters (slope angle, soil type, land cover and elevation) are examined for 230 shallow rainfall-induced landslides listed in a comprehensive landslide inventory for the continental United States. A delimiting buffer of 5, 25 or 50 km is created around each landslide event, facilitating the statistical analysis of slope thresholds. Slope angle thresholds at the pixel point and at the 50th, 75th, 95th, 99th and maximum percentiles are compared to one another and tested for best fit in a logistic regression environment. It is determined that values lower than the 75th percentile threshold misrepresent susceptible slope angles by not including slopes higher than 35°. The best range of slope angles and the best regression fit are achieved with the 99th percentile slope angle threshold. The resulting logistic regression model predicts the highest number of cases correctly, with 97.2% accuracy. The logistic regression model is carried over to ArcGIS, where all variables are processed based on their corresponding coefficients. A regional landslide probability map for the continental United States is created and analyzed against the available landslide records and their spatial distributions. 
It is expected that future inclusion of dynamic parameters like precipitation and other proxies like soil moisture into the model will further improve accuracy.
Keywords 
Shallow landslides; Slope instability; Threshold analysis; Logistic regression; Regional analysis; GIS; Remote sensing 
Introduction 
Rainfall-induced landslides are among the most frequent natural hazards on sloping terrain. They usually result in great economic losses and fatalities worldwide. At least 32,322 deaths were reported worldwide between 2004 and 2010 [1], and in the United States alone landslides cause $1-2 billion in damages and more than 25 fatalities on average each year [2]. Understanding, mapping, modeling and preventing the aftermath of these devastating events represent an important scientific and operational endeavor [3]. 
The term “landslide” describes the downward and outward movement of slope-forming materials that include rock, earth, and debris or a combination of these [4]. Although landslides are considered to depend on the complex interaction of several static and dynamic factors [5-7], slope angle has great influence on the susceptibility of a slope to sliding. Increased slope angle usually correlates with increased likelihood of failure even if the material distribution on the slope is uniform and isotropic [5]. Undeniably, many other parameters are essential to the analysis of landslide risk. For example, changes in land use and land cover such as deforestation, forest logging, road construction, cultivation and fire on steep slopes can have a significant effect on landslide activity [8]. In addition, forest vegetation, especially tree roots, helps stabilize hillslopes by reinforcing soil shear strength. Root reinforcement is particularly important on slopes where roots can extend into joints and fractures in bedrock or into a weathered transitional layer between the soil and bedrock [9,10]. 
Furthermore, soil properties such as particle size and pore distribution of the soil matrix influence slope instability. These properties influence the soil’s water-holding capacity and the rate at which water moves through the soil. Coarse soils are known to hold less water under unsaturated conditions than finer soils [11]. Rainfall intensity and duration affect the soil’s saturation level. Hence, hydraulic characteristics and matrix suction properties of soil are crucial in the study of rainfall-triggered shallow landslides [12]. In general, soil types and their associated geotechnical, mechanical, physical and hydrological properties are essential for the assessment of landslide hazards [8]. 
Various studies that list landslides, define areas of susceptibility and attempt to forecast landslides have shed some light on the conditions and mechanisms that influence slope instability [12-18]. Nonetheless, the reliability of all proposed methodologies is dependent on the availability of adequate temporal and spatial surface data in addition to adequate reporting [19]. At local scales, deterministic methods are considered to be most reliable because they are founded on geotechnical properties [20]. Nevertheless, deterministic methods are inadequate for the study of landslides at large scales as geotechnical and hydrological conditions vary persistently from location to location [20]. 
Statistically based models are preferred at large scales as they are known to have a good degree of reliability in correlating instability parameters with the past distribution of landslides [14]. Logistic regression, for example, is one of the most common statistical methods used for landslide assessments [12,17,21-25]. Logistic regression is used to find the best-fitting relationship between multiple independent variables and a dependent variable, and it does not require normally distributed landslide conditioning parameters. It is a multivariable analysis technique in which the dependent variable is binary rather than continuous and the result is a probability between 0 and 1 [22]. The advantage of logistic methods over linear regression analysis and discriminant analysis for the study of landslides is that the dependent variable takes only two values: an event not happening or happening (0 or 1) [26,27]. 
The advancement of remote sensing techniques offers a better opportunity to analyze landslide risk at large scales; however, great discrepancies arise when monitoring landslides at high spatial resolution over a large domain. Inventories usually depend on information retrieved from newspapers, online news, and government agencies, where heterogeneous reporting is unavoidable. In many instances, catalogs lack precise spatial and temporal information, making it hard to identify the precise conditions involved in the development of landslide events. In addition, studies have shown that susceptible slope angle is misrepresented at large scales. Kirschbaum [6], for example, emphasizes that slope angle values in Hong’s [5] global model are undervalued at around 21° because values are averaged over a large area. Similarly, in a global landslide hotspot study, Nadim [26] places susceptible slope angle between 8° and 32°. Defining a better technique than simple averaging can help reduce slope underestimations at large scales and be very helpful for the analysis of landslide risks. 
This work proposes to address the scale dilemma by utilizing a descriptive landslide inventory in addition to buffer and threshold techniques that help minimize susceptibility overestimation at large spatial scales. Specifically, the proposed blended technique first reduces the area of study in order to delineate areas of high risk, to which another approach appropriate for large-scale assessment is then applied. A multistage approach is thus proposed here to bridge the gap between different appraisal scales and reduce slope misrepresentations. The buffer and threshold techniques are applied in the spatial context of the continental United States utilizing the best available rainfall-triggered landslide inventory, which represents most of the dominant conditions of landslide-prone localities. Subsequently, logistic analysis is used to determine landslide probability at the regional scale. 
Methods 
This work presents a multistage technique that bridges the gap between landslide mapping at large and local scales. Based on a descriptive (spatial and temporal) landslide record, buffers are used to condense the area of study to the most likely area of slope susceptibility. Subsequently, various percentile thresholds for each static parameter are tested in a logistic regression model to determine the best fit. The model is validated by randomly dividing the data in a 70-30% split and by partitioning the data for cross-validation. A confusion matrix provides details about the performance of the model. The best-fitting model is then represented in a landslide probability map for the continental United States. 
Data collection 
Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model (GDEM) 1° by 1° tiles were merged into a single DEM utilizing ArcGIS’s Mosaic To New Raster function. Soil type was obtained from the Harmonized World Soil Database (HWSD) Version 1.2. This dataset combines existing regional and national updates of soil information from around the world and incorporates them into the Food and Agriculture Organization of the United Nations (FAO-UNESCO) Soil Map of the World at a 1 km resolution. Land cover was retrieved from the FAO Global Land Cover-SHARE database at 1 km^{2} resolution. This dataset integrates local and global land cover information: local information is derived from datasets such as Africover and Corine LC, and global data is derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Continuous Fields VCF2010 [28,29]. 
Buffer analysis 
Landslide inventory: Developing an approach for local and regional monitoring of landslides is possible only when a large and comprehensive record of landslide events is available; this requirement represents the main limitation of this work. Obtaining event data or a consolidated landslide inventory at large scales is extremely challenging due to heterogeneous reporting and data availability, even for a country such as the United States. The United States Geological Survey (USGS) is currently compiling a listing of global and local events [30], but uniform reporting is not available yet. In addition, events listed in the State geological surveys in many instances lack precise spatial and temporal information. 
To this day, the most uniform and comprehensive landslide inventory found by the authors is being developed at the National Aeronautics and Space Administration (NASA) and is explained in Kirschbaum [6]. The inventory is a systematic landslide catalog that lists around 1,600 landslides globally and 270 for the United States for the years 2003, 2007, 2008 and 2009. The inventory summarizes rainfall-triggered landslides and debris flows reported in newspapers, online news, and government agencies. Landslide events are reported with an accuracy of 24 hours, and in the case of multiple landslides occurring during one rainfall event, the first landslide is designated as the event time. 
This particular inventory stands out from other listings because two qualitative indices were designated to represent locality and size uncertainties that are otherwise left undefined in other inventories. Index 1, confidence radius, represents general location accuracy, and Index 2, size radius, differentiates small from larger events as well as minor events from catastrophic events. Both indices range on a scale between 0 and 5, where 5 represents the most accurate location and the biggest event respectively, as seen in Table 1 [6]. 
In this study, size radius and confidence radius are adopted. Size radius is incorporated as a measure of landslide size and confidence radius as a measure of the extent of uncertainty. Only confidence radii of 5, 25 and 50 km are considered, as they represent the exact or near-exact location, a location known to the extent of a city, and the nearby coordinates of a village, respectively. Confidence radii greater than 50 km are described in the inventory as events occurring somewhere within a country or large region; the uncertainty area is then too large, and these events are therefore excluded from the analysis. The resulting 230 landslide events in the U.S. are distributed between longitudes 60° W and 130° W and latitudes 30° N and 60° N, resulting in a suitable representation of the locations and characteristics that are known to be prone to landslides in the continental U.S. as per Radbruch-Hall et al. [28]. Buffers equivalent to the extent of the confidence radius are created around each landslide event, as seen in Figure 1. 
This process reduces the area of study to that of the buffer, and it is therefore reasonable to assume that the buffered area includes all possible places in which the event might have occurred. By this means, it is possible to statistically analyze the characteristics of the terrain that could have led to the rain-triggered landslide. Buffer extraction from the original dataset is carried out by an iterative algorithm that matches the spatial coordinates of each event to the coordinates in the dataset. Subsections corresponding to each buffered area are then extracted from each dataset, and pixel values for each area are converted into ASCII files. Each file corresponds to one buffer or one event, resulting in 230 files for each dataset type. 
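The buffer extraction step can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the `(lat, lon, value)` pixel layout, the function names, the toy grid and the use of great-circle distance to decide buffer membership are all assumptions made for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def extract_buffer(pixels, event_lat, event_lon, radius_km):
    """Return the values of pixels whose centre lies within radius_km of the event.

    pixels is an iterable of (lat, lon, value) tuples (hypothetical layout).
    """
    return [value for lat, lon, value in pixels
            if haversine_km(event_lat, event_lon, lat, lon) <= radius_km]

# Toy raster: a 0.1-degree grid around a hypothetical event at (40.0, -105.0).
pixels = [(40.0 + 0.1 * i, -105.0 + 0.1 * j, float(abs(i) + abs(j)))
          for i in range(-10, 11) for j in range(-10, 11)]

# One extraction per confidence radius considered in the study (5, 25, 50 km).
buffer_5 = extract_buffer(pixels, 40.0, -105.0, 5)
buffer_25 = extract_buffer(pixels, 40.0, -105.0, 25)
buffer_50 = extract_buffer(pixels, 40.0, -105.0, 50)
```

Each returned list plays the role of one ASCII file of pixel values for one event; on a real raster the grid spacing would be far finer than in this toy example.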
Threshold sensitivity analysis 
Slope: Rainfall-induced shallow landslides occur as relatively shallow (0.3-2 m) failure surfaces parallel to the slope on landslide-prone slopes [31,32]. In the case of rainfall-induced landslides, slope angle is the underlying factor in downslope movement once gravity forces acting parallel to the slope have exceeded friction and cohesion forces. It is certainly possible that some events happened on less steep slopes, as gravity alone does not determine downward movement; nevertheless, the likelihood of failure is higher as slope angle increases. This work is developed under this premise. 
In this work, slope angle values for all 230 landslide events in the continental U.S. are derived utilizing the mosaicked DEM. Pixel values within each buffer are analyzed statistically: they are sorted in ascending order and partitioned into percentiles, which serve as thresholds. Values are ranked from lowest to highest, so that the lowest scores fall in the 1^{st} percentile and the highest in the 99^{th} percentile. The percentile represents the value below which a given percentage of the observations lie [33]. For example, if a slope value is in the 99^{th} percentile, it is higher than 99% of the other slope values. 
Percentiles are then used as thresholds in each buffer zone: values lying below the specific percentile are considered stable, and values lying above the percentile are considered unstable. Thresholds T_{point} (pixel at the reported point), T_{50}, T_{75}, T_{95} and T_{99} (the 50th, 75th, 95th and 99th percentiles) and T_{100} (maximum) are tested for all buffers. This technique leads to the assessment of the slope percentiles that result in underestimations and overestimations. Buffers for 3 landslide events and their corresponding T_{99} threshold can be seen in Figure 2. Three different events with buffers of 5, 25 and 50 km and their corresponding slope, elevation, land cover and soil type are represented. Percentile threshold T_{99} is highlighted in red in each histogram as well as in each buffer. 
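As a rough illustration of the threshold step, the sketch below computes the tested thresholds for one buffer and separates stable from unstable pixels. The nearest-rank percentile definition, the function names and the toy slope values are assumptions; the source does not state which percentile convention was used.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    if q <= 0:
        return ordered[0]
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]

def slope_thresholds(buffer_slopes, point_slope):
    """Thresholds tested for one buffer: pixel point, four percentiles and the maximum."""
    return {
        "T_point": point_slope,
        "T_50": percentile(buffer_slopes, 50),
        "T_75": percentile(buffer_slopes, 75),
        "T_95": percentile(buffer_slopes, 95),
        "T_99": percentile(buffer_slopes, 99),
        "T_100": max(buffer_slopes),
    }

def unstable_pixels(buffer_slopes, threshold):
    """Values lying above the threshold are treated as unstable."""
    return [s for s in buffer_slopes if s > threshold]

# Toy buffer: slope angles 1..100 (illustrative values, not real terrain).
slopes = list(range(1, 101))
thresholds = slope_thresholds(slopes, point_slope=12)
unstable = unstable_pixels(slopes, thresholds["T_99"])
```

With this toy buffer the T_{50} threshold leaves half of the pixels classified as unstable, while T_{99} keeps only the steepest one, which shows how strongly the choice of percentile controls the extent of the area flagged as susceptible.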
Elevation: Altitude values corresponding to the T_{99} percentile threshold for all 230 landslide buffers are selected. As with the slope buffers, 230 extractions from the DEM are converted into ASCII files. The mean, the standard deviation and other statistical moments are examined. 
Land cover: Land cover classes are represented by numerical values in each dataset; these values are extracted from each buffer and then converted into ASCII files. Because land cover classes are categorical, no statistic besides the mode is tested. The mode corresponding to the T_{99} percentile threshold for each buffer is selected as the prominent land cover value within the buffer. 
Soil type: The HWSD lists 36 different soil types and their corresponding physical-chemical properties. Texture, soil drainage, available water storage capacity and soil phase, among many other characteristics, are described in the database for each soil. Classes found within each buffer are examined, and the mode corresponding to the T_{99} percentile threshold range is selected as the representative value for each buffer, as shown in Figure 2. 
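For the categorical layers, one possible reading of "the mode corresponding to the T_{99} threshold" is the most frequent class among the pixels whose slope reaches that threshold. The sketch below follows that reading, which is an assumption, as are the function name, the co-registered pixel lists and the toy class codes.

```python
from collections import Counter

def modal_class(classes, slopes, slope_threshold):
    """Most frequent class among pixels whose slope is at or above the threshold.

    classes and slopes are co-registered lists (same pixel order); returns None
    when no pixel reaches the threshold. Ties go to the class seen first.
    """
    selected = [c for c, s in zip(classes, slopes) if s >= slope_threshold]
    if not selected:
        return None
    return Counter(selected).most_common(1)[0][0]

# Toy buffer: slope angles and hypothetical land cover codes for seven pixels.
slopes = [2, 5, 30, 41, 38, 44, 12]
land_cover = [1, 1, 4, 4, 7, 4, 1]
representative = modal_class(land_cover, slopes, slope_threshold=35)
```

The same helper could be applied to soil type codes, since both layers are treated as categorical in the text.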
Threshold values for each file are calculated. A complete flow chart for the analysis framework is illustrated in Figure 3. 
Logistic Regression (LR) model 
The LR method is based on the generalized linear model, in which the probability of a landslide (Pl) can be expressed as: 
$$P_{l} = \frac{1}{1 + e^{-Z}} \qquad (1)$$
where Pl is the probability of a landslide event. It is expressed in a dichotomous way as 0 or 1 by means of a classification cutoff value of 0.5, which sets the estimated Pl to 0 for Pl<0.5 and to 1 for Pl>=0.5. The logit Z is assumed to contain the independent variables on which the landslide event may depend. The Z term is expressed in linear form as: 
$$Z = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \dots + \beta_{n}X_{n} \qquad (2)$$
where β_{0} represents the intercept of the model, β_{1}, β_{2}, …, β_{n} the partial regression coefficients, and X_{1}, X_{2}, …, X_{n} the independent variables. 
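To make the two expressions concrete, the following sketch evaluates the linear term Z and the standard logistic form Pl = 1/(1 + e^{-Z}) for a single location, then applies the 0.5 cutoff. The coefficient values and the single slope predictor are hypothetical and are not taken from Table 2.

```python
import math

def landslide_probability(betas, xs):
    """Logistic probability for one location.

    betas[0] is the intercept; betas[1:] pair with the predictor values in xs.
    """
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, cutoff=0.5):
    """Dichotomize the probability: 1 (event) for p >= cutoff, otherwise 0."""
    return 1 if p >= cutoff else 0

# Hypothetical coefficients: intercept and one slope-angle coefficient.
betas = [-4.0, 0.1]
p_flat = landslide_probability(betas, [5.0])    # gentle slope, Z = -3.5
p_steep = landslide_probability(betas, [60.0])  # steep slope, Z = 2.0
labels = (classify(p_flat), classify(p_steep))
```

With these illustrative coefficients the gentle slope falls well below the cutoff and the steep slope above it, so `labels` distinguishes the two cases.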
In addition to the 230 rainfall-induced shallow landslide events, 230 random points that do not overlap with actual events are used to represent the absence of landslides as areas of “no-event yet”. Buffers and thresholds are not applied to random points because, statistically, these points are equally likely to represent an event or a no-event. For random points, the pixel value itself is therefore selected as the representative value. 
The regression model calculations are performed using SPSS [34] statistical software. Various models are examined utilizing all threshold percentiles, and the best-fitting threshold is selected from among them. The likelihood ratio for each variable is evaluated, and a variable is removed when its contribution to the model is minimal. The contribution is deemed minimal if the observed significance level is greater than the probability for remaining in the model. In this study, that value is set at the 0.05 level of significance. 
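The way event and no-event points enter a logistic fit can be sketched with a one-predictor model estimated by plain gradient ascent on the log-likelihood. This is only an illustration and not the SPSS procedure used in the study; the slope values, learning rate and number of epochs are invented for the example.

```python
import math
import statistics

def fit_logistic(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit P = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            grad0 += y - p
            grad1 += (y - p) * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1

# Hypothetical slope angles: threshold values for events, pixel values for random points.
event_slopes = [38.0, 42.0, 35.0, 47.0, 29.0, 41.0]
random_slopes = [3.0, 8.0, 1.0, 12.0, 31.0, 5.0]
slopes = event_slopes + random_slopes
labels = [1] * len(event_slopes) + [0] * len(random_slopes)

# Standardize the predictor so the fixed learning rate behaves well.
mean, sd = statistics.mean(slopes), statistics.pstdev(slopes)
z_scores = [(s - mean) / sd for s in slopes]

b0, b1 = fit_logistic(z_scores, labels)
predicted = [1 if 1.0 / (1.0 + math.exp(-(b0 + b1 * z))) >= 0.5 else 0 for z in z_scores]
n_correct = sum(int(p == y) for p, y in zip(predicted, labels))
```

A positive fitted `b1` indicates that steeper slopes raise the estimated probability; statistical packages typically use a Newton-type solver instead of this simple update, but the fitted model has the same form.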
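The removal criterion can be sketched as a likelihood-ratio test between a model with and without a given variable. The sketch assumes the variable contributes a single parameter (one degree of freedom), for which the chi-square tail probability has a closed form; the log-likelihood values are hypothetical and the function name is not from SPSS.

```python
import math

def likelihood_ratio_test(loglik_full, loglik_reduced):
    """LR statistic and p-value for dropping ONE parameter (1 degree of freedom).

    For a chi-square variable with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    lr = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

ALPHA = 0.05  # significance level for remaining in the model, as in the text

# Hypothetical log-likelihoods: dropping this variable barely changes the fit.
lr_weak, p_weak = likelihood_ratio_test(-50.0, -50.4)
remove_weak = p_weak > ALPHA

# Hypothetical log-likelihoods: dropping this variable degrades the fit strongly.
lr_strong, p_strong = likelihood_ratio_test(-50.0, -60.0)
remove_strong = p_strong > ALPHA
```

A categorical predictor coded with several indicator columns would need a chi-square distribution with more degrees of freedom, which this one-parameter sketch does not cover.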
Results and Discussion 
Buffers and thresholds were designed to present a feasible approach to address the misrepresentation of slope angle when monitoring landslide activity at high spatial resolution over a large domain. Improper identification of parameters, particularly for slope, often results in a misrepresentation of areas at risk. The development of this approach is only possible due to the availability of a comprehensive record of landslide events that represents the dominant characteristics of landslide-prone areas in the continental United States. A more extensive landslide record with the same characteristics is not available at present, but applying the buffer and threshold techniques to more data points can help minimize overestimation of susceptible areas at the large scale. 
This work assumes that landslide risk is greater as slope angle increases. As the landslide inventory does not list the slope angle of the event because the locality is an estimate, slope values in each buffer area are tested. Comparison of slope percentile thresholds demonstrates that values below the T_{75} percentile threshold misrepresent areas of susceptibility by not including slope angle values higher than 35°, in agreement with previous studies [6,26]. Values below this threshold range between 0° and 3°, which could result in susceptibility overestimations. In addition, this threshold does not account for higher slopes present in the area, possibly resulting in a misrepresentation of reality, as it is well known that landslides occur over a wider range of slope angles [7,15,35-37]. 
This same comparison demonstrates that values above the T_{95} threshold percentage encompass a wider range of slope values, but it is not clear whether these thresholds include overestimations such as the inclusion of outliers. Therefore, the T_{99} threshold is investigated. Nevertheless, it is important to consider that threshold percentages above T_{95} could potentially represent better susceptible slope angle values, for this reason, each threshold is examined in a logistic regression analysis. Distribution for T_{point}, T_{95}, and T_{99} thresholds can be seen in Figure 4. 
Each threshold percentile is further tested in a logistic regression model. It is determined that threshold T_{99} is the most suitable value because it produces the most representative range of slope values and yields the best-fitting model. In addition, results are consistent with local slope instability studies around the world [7,15,17,35-38]. Moreover, this model shows the strongest relationship between the predictors and the prediction, explaining the highest amount of variation in the dependent variable, at 94.3%. This slope threshold is a conservative assessment that neither underestimates nor overestimates slope angle susceptibility. 
The performance of each model describes how well each variable describes the phenomenon, as seen in Figure 5. The likelihood ratio for each variable is evaluated, and a variable is removed when its contribution is minimal. Contribution is deemed minimal if the observed significance level is greater than the probability for remaining in the model. In this study, elevation’s contribution to the tested models was deemed insignificant; therefore, elevation is excluded at this point from any further analysis. 
The independent variables in logistic regression can be characterized as useful predictors if the classification accuracy rate is substantially higher than the accuracy attainable by chance alone. SPSS calculates this chance accuracy criterion as the first step by not including any variables in the model. As a result, the accuracy rate computed for chance is 50.9% and the accuracy rate computed for the model is 97.2%. This demonstrates that the variables included in the model significantly enhance the outcome. Table 2 shows the model’s coefficients for each variable that is found to be significant. Slope and land cover are significant predictors with p-values <0.01, while soil type is a less significant predictor with p-value <0.001. Slope’s significance as a predictor in the model emphasizes the importance of its proper initial representation. 
As validation, the data were divided randomly in a 70-30% ratio into “model obtaining” and “validation” subsets, respectively. Furthermore, the data were partitioned into 20% subsets for cross-validation. Five rounds of cross-validation were performed using different partitions. Validation results, represented by the average of the five rounds, indicate that this model predicts the highest number of cases correctly, at 97.2% accuracy. 
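The two validation schemes can be sketched as simple index partitions. The sample count of 460 corresponds to the 230 events plus 230 random points described in the text; the seeds, function names and the interleaved fold assignment are assumptions for illustration.

```python
import random

def split_70_30(items, seed=0):
    """Randomly divide items into a 70% model-obtaining subset and a 30% validation subset."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

def five_folds(items, seed=0):
    """Partition items into five disjoint subsets of roughly 20% each."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::5] for i in range(5)]

sample_ids = list(range(460))  # 230 landslide events + 230 random points
train_ids, validation_ids = split_70_30(sample_ids)

folds = five_folds(sample_ids)
# One cross-validation round per fold: hold that fold out and fit on the other four.
rounds = [(sorted(x for j, f in enumerate(folds) if j != i for x in f), sorted(fold))
          for i, fold in enumerate(folds)]
```

Averaging the accuracy obtained on the held-out fold of each of the five rounds gives the cross-validated figure described in the paragraph above.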
A confusion matrix provides details about the performance of the model. The ability of the model to correctly identify the events is represented by True Positive. Events that are not correctly identified are represented by False Negative, and overpredictions are represented by False Positive [33] (Table 3). 
It is important to emphasize that this study only investigates the relationship of some static variables to landslide events. Rainfall, the triggering factor for the landslides in this study, is not incorporated. It is assumed that incorporating this factor, in addition to other proxies like soil moisture, into the logistic regression model will result in a higher accuracy rate. 
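The entries of such a confusion matrix can be counted directly from observed and predicted labels, as in this short sketch. The label vectors are toy values and do not reproduce Table 3.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = event, 0 = no event)."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),  # events correctly identified
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),  # events missed
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),  # overpredictions
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),  # no-events correctly identified
    }

def accuracy(cm):
    """Share of all cases that were classified correctly."""
    total = cm["TP"] + cm["FN"] + cm["FP"] + cm["TN"]
    return (cm["TP"] + cm["TN"]) / total

# Toy labels for eight locations.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
acc = accuracy(cm)
```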
Landslide probability mapping 
The logistic regression model is carried over to ArcGIS 10.2. All variables are processed based on their corresponding coefficients. It is important to reiterate that only static variables are used in this map and better resolution information 
The resulting map in Figure 6 is classified into 2 categories based on a cutoff value of 0.5: a) Not Probable (0-0.50); b) Probable (0.501-1). 
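The mapping step can be sketched outside a GIS as a cell-by-cell evaluation of the fitted model on co-registered grids, followed by the two-class cutoff. All coefficients below are hypothetical, and treating land cover and soil type as per-class coefficient lookups is an assumption about how the categorical variables were coded.

```python
import math

# Hypothetical coefficients (not the values in Table 2).
BETA_0 = -6.0
BETA_SLOPE = 0.15
LANDCOVER_COEF = {1: 0.0, 2: 0.8}   # per-class contribution, class 1 as reference
SOIL_COEF = {10: 0.0, 11: 0.5}      # per-class contribution, class 10 as reference

def probability_grid(slope, landcover, soil):
    """Evaluate the logistic model on three co-registered grids (lists of rows)."""
    grid = []
    for slope_row, lc_row, soil_row in zip(slope, landcover, soil):
        row = []
        for s, lc, st in zip(slope_row, lc_row, soil_row):
            z = BETA_0 + BETA_SLOPE * s + LANDCOVER_COEF[lc] + SOIL_COEF[st]
            row.append(1.0 / (1.0 + math.exp(-z)))
        grid.append(row)
    return grid

def classify_grid(prob, cutoff=0.5):
    """Two map classes: 'Probable' above the cutoff, 'Not Probable' otherwise."""
    return [["Probable" if p > cutoff else "Not Probable" for p in row] for row in prob]

# Toy 2 x 2 rasters: slope angle in degrees, land cover codes, soil codes.
slope = [[5, 45], [20, 40]]
landcover = [[1, 2], [1, 2]]
soil = [[10, 11], [10, 10]]

prob = probability_grid(slope, landcover, soil)
classes = classify_grid(prob)
```

In a GIS the same arithmetic would typically be expressed as a raster calculation over the input layers, with the cutoff applied as a reclassification.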
Conclusions 
Landslide studies at large scales are limited by uncertainties. At present, no system exists that can simultaneously address both regional and local scales. This work proposes utilizing buffer and threshold techniques to minimize uncertainty at the local scale so that further analysis can be done at a larger scale. Various threshold percentiles corresponding to 230 shallow landslides in the continental United States are tested in a logistic regression analysis. Findings are as follows: 
Buffer analysis is efficient at narrowing large areas to more manageable scales. This, of course, depends on the original availability of a wellconstructed landslide inventory that provides information on the event’s locality. 
Slope threshold percentile techniques confirm that slope susceptibility is misrepresented when performing analysis at large scales. Most slope values for thresholds lower than T_{75} do not include slopes higher than 35°; this results in the overestimation of susceptible areas and a misrepresentation of reality, as landslide-prone slopes have a greater range. 
It is determined that the threshold percentage T_{99} is a conservative assessment that includes a wider range of slope angles and successfully excludes outliers. 
A regional logistic regression model demonstrates that utilizing the threshold percentage T_{99} to model slope instability at large scales results in an accuracy rate of 97.2%. 
The likelihood ratio for all variables is evaluated in the logistic model; elevation’s contribution was deemed insignificant, and elevation was therefore excluded from the model. 
Applying the buffer and threshold techniques to more data points can help minimize overestimation of susceptible areas at the large scale. 
Reducing uncertainties at the local level improves large-scale modeling accuracy. 
It is important to note that, given the restrictions of physical or in situ data, this study depends on the existence of a comprehensive landslide inventory and reasonably scaled surface data. Better resolution information and other static parameters can be tested in logistic regression analysis, but awareness of the limitations imposed by the large scale is imperative, as some data may be too general and not sufficiently detailed for the mapping scale. In addition, although the focus of this study is rain-triggered shallow landslides, neither rain nor antecedent soil moisture information has been implemented in this work. It is assumed that future implementation of these unaccounted-for variables and the addition of more detailed soil information (in the continental U.S.) will help describe susceptibility conditions dynamically, thereby enhancing this platform. Moreover, it is possible for this approach to be applied to other regions, as all the data used in the present analysis are available globally. The approach should be regional, leading up to a global scale, in order to minimize overgeneralizations. 
Acknowledgements 
This publication was made possible by the National Oceanic and Atmospheric Administration, Office of Education Educational Partnership Program award NA11SEC4810004. Its contents are solely the responsibility of the award recipient and do not necessarily represent the official views of the US Department of Commerce, National Oceanic and Atmospheric Administration. 
References 

Table 1  Table 2  Table 3 
Figure 1  Figure 2  Figure 3 
Figure 4  Figure 5  Figure 6 