Quantitative Measurement of Oligodendroglioma Histological Features in Predicting the 1p/19q Co-Deletion Status

Oligodendrogliomas are characterized by 1p/19q co-deletion, which generally correlates with subjectively assessed morphologic features of the tumour nuclei, such as the circularity ratio and texture. As cost control becomes more of an issue in medicine, upfront reflex molecular diagnostic testing for lesions such as 1p/19q co-deletion may not be appropriate. This paper aims to develop a rigorous, unbiased digital image segmentation algorithm and statistical models that accurately predict the likelihood of 1p/19q co-deletion from morphology and texture, which would greatly improve cost-effectiveness in clinical trials. Haematoxylin and eosin stained slides from 28 gliomas comprise the study cohort. Selected areas with high tumour cell density were digitally analysed with a high-throughput image segmentation algorithm that automatically delineates the boundaries of the cell nuclei. We then extracted morphologic and texture features from the segmentation results and used them in a lasso-logistic regression model to relate these features to 1p/19q co-deletion status. For comparison, we also evaluated the prediction performance of PAM (Prediction Analysis of Microarrays) and RPA (Recursive Partitioning Analysis). We find that the circularity ratio of the cells, the variance of cell area for each patient, and some texture features are associated with 1p/19q co-deletion status, and the best leave-one-out cross-validation false prediction rate is 7.14% (lasso-logistic regression). Moreover, survival analysis shows that two morphologic features and one texture feature significantly influence patients' survival time.

Citation: Yang Y, Xing F, Yang L (2016) Quantitative Measurement of Oligodendrogliomas Histologica Features in Predicting the 1p/19q Co-Deletion Status. J Biom Biostat 7: 283.
doi:10.4172/2155-6180.1000283 J Biom Biostat ISSN: 2155-6180 JBMBS, an open access journal, Volume 7 • Issue 2 • 1000283

Introduction
Brain tumours are among the most frequent cancers worldwide. An oligodendroglioma is a slow-growing brain tumour that is believed to originate from the oligodendrocytes of the brain or from a glial precursor cell. Oligodendrogliomas occur primarily in adults (9.4% of all primary brain and central nervous system tumours) but are also found in children (4% of all primary brain tumours). The average age at diagnosis is 35 years [1]. These tumours are frequently located in the frontal, temporal or parietal lobes and cause seizures in a relatively high percentage of patients. Many oligodendrogliomas contain small specks of calcium (calcifications) and can easily […]. Based on both FISH and LOH findings, a previous study [2] suggests that 1p/19q co-deletion in pure oligodendroglial tumours can be considered a diagnostic rather than a prognostic marker. The 1p/19q genetic status has been examined in malignant gliomas since Cairncross et al. described the clinical implications of 1p/19q co-deletion in patients with anaplastic oligodendroglioma [3][4][5]. Oligodendrogliomas are characterized by 1p/19q co-deletion. Because oligodendrogliomas are less aggressive than their grade-matched astrocytic counterparts, differentiating between an astrocytoma and an oligodendroglioma is a key component of surgical neuropathology. Unfortunately, this discrimination suffers from high inter-observer variability.
Furthermore, as cost control becomes more of an issue in medicine, upfront reflex molecular diagnostic testing for lesions such as 1p/19q co-deletion may not be appropriate. An accurate, unbiased way to predict the likelihood of 1p/19q co-deletion would greatly improve cost-effectiveness by reserving this expensive test for cases in which co-deletion is reasonably possible. The recognition of molecular subsets among oligodendrogliomas has raised the question of whether distinct mutations in associated genes may serve as prognostic markers (Figure 1). A growing body of research has focused on the relationship between morphological features and oligodendrogliomas [6][7][8][9]. In glioblastomas (GBMs), partial oligodendroglial morphological features were detected more frequently in tumours with 1p loss. Ueki's study indicates that morphological features do necessarily follow the genetic profile [10,11]. It is therefore desirable to predict 1p/19q co-deletion status from morphologic features such as the circularity ratio. Scheie's study found a strong association between phenotype and genotype in oligodendroglial tumours, yet even when all significant variables were accounted for, perfect (100%) prediction of 1p/19q status could not be obtained [12]. A more accurate way to predict 1p/19q co-deletion status from morphologic features is therefore needed. Image texture features, a set of metrics computed in image processing to quantify the perceived texture of an image, also play an important role in image classification [13,14] and in prediction. However, to our knowledge, no study has documented a detailed method for mining the correlation between texture features and 1p/19q co-deletion status, let alone for combining morphologic and texture features to predict it.
Our first contribution is a methodology that associates nuclear morphology, extracted from tissue slide images, with 1p/19q co-deletion status. Haematoxylin and eosin stained slides from 28 gliomas comprised the study cohort. Selected areas that had high tumour cell density (i.e., minimal non-neoplastic tissue contamination) were digitally analysed with a high-throughput image segmentation algorithm to automatically delineate the boundaries of the cell nuclei (typically hundreds in one image patch). Representative morphological features, such as the mean and variability of nuclear area, shape, circularity ratio and perimeter, together with texture features, were computed from the segmentation results. Our second contribution is three statistical models that predict the likelihood of 1p/19q co-deletion from these morphological and texture features. Stepwise selection with the lasso-logistic regression model identified significant image markers (variability of circularity ratio and two texture features), on which the lasso model was built to predict 1p/19q status for new cases. For comparison, PAM (Prediction Analysis of Microarrays) and RPA (Recursive Partitioning Analysis) were also used to select significant image markers and were trained and tested. Finally, leave-one-out cross-validation was used to compare the prediction performance (false prediction rate) of the lasso model with PAM and RPA. Our proposed models significantly improve the accuracy of predicting 1p/19q status.
Furthermore, recent prospective randomized clinical trials have validated associations between combined 1p/19q co-deletion and prolonged overall survival of patients treated with radiation therapy with or without chemotherapy [3,4,15]. As with the prediction of 1p/19q co-deletion status, work relating patients' survival time and hazard rate to morphologic and texture features extracted from haematoxylin and eosin stained glioma slides is lacking in the related research. In this paper, we also revisit a previous analysis: the difference in patients' survival time by 1p/19q co-deletion status. Moreover, we built a Cox proportional hazards model with the extracted morphologic and texture features as covariates; two morphological features and one texture feature were selected by forward stepwise selection. We consider this our third contribution.

Materials and Methods
Cell segmentation
A seed-controlled repulsive level set method [16,17] is applied to segment brain tumour cells in histopathology images. Since the number and positions of the cells are not known a priori, separating touching cells from each other is challenging. To this end, we employed a robust single-pass voting algorithm to accurately locate cell geometric centres, which are defined as seeds in this work. For each pixel $(x,y)$ in image $I(x,y)$, the algorithm defines a cone-shaped voting area $A$ with vertex at $(x,y)$ and votes in the direction of the negative gradient, weighted by the gradient magnitude $\lVert\nabla I(x,y)\rVert$. To update the voting map $V(x,y)$, which has the same dimensions as $I(x,y)$, a Gaussian kernel $\kappa(u,v;\mu,\Sigma)$ is incorporated into the voting procedure [16]:

$$V(u,v) \leftarrow V(u,v) + \lVert\nabla I(x,y)\rVert\,\kappa(u,v;\mu,\Sigma), \qquad \forall\,(u,v) \in A(x,y) \qquad (1)$$

where $A$ is the cone-shaped region defined by the radial range $(r_{\min}, r_{\max})$ and an angular range $\Delta$, $\mu$ denotes the mean of the Gaussian kernel, and $\Sigma = \sigma^2 \mathbf{I}_2$ (with $\mathbf{I}_2$ the $2\times 2$ identity) represents the covariance matrix. The Gaussian kernel in (1) weights each vote according to the distance between the voting location and the vertex: locations closer to the cell centre receive higher weights. As a consequence, the kernel concentrates votes toward the cell centre, and central pixels finally obtain higher voting values than those near cell boundaries. After thresholding the voting map to select these central pixels as seed candidates, mean shift [18] clusters the candidates to produce the final seeds. Since the candidates lie in the central region of each cell, the final seeds are detected at the geometric centres.
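The voting and seed-detection steps can be illustrated with the short NumPy sketch below. It is a simplified reading of the procedure under stated assumptions, not the implementation of [16]: nuclei are assumed darker than the background (so the negative gradient points inward), the Gaussian mean μ is placed mid-way along the cone axis, the kernel is left unnormalized, and the defaults for r_min, r_max, the cone half-angle, σ, the gradient and vote thresholds and the mean-shift bandwidth are hypothetical values.

```python
import numpy as np


def single_pass_voting(image, r_min=3, r_max=8, half_angle=np.pi / 6,
                       sigma=2.0, grad_frac=0.1):
    """Accumulate a voting map V with the same shape as `image`.

    Every pixel whose gradient magnitude exceeds grad_frac * max votes into a
    cone (radii r_min..r_max, +/- half_angle) along the negative gradient.
    Each vote is the gradient magnitude times an unnormalized isotropic
    Gaussian (covariance sigma^2 * identity) centred mid-way along the cone.
    """
    img = image.astype(float)
    gy, gx = np.gradient(img)                  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    votes = np.zeros_like(img)
    h, w = img.shape
    r_mid = 0.5 * (r_min + r_max)
    dy, dx = np.mgrid[-r_max:r_max + 1, -r_max:r_max + 1]  # candidate offsets
    dist = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx)
    for y, x in zip(*np.nonzero(mag > grad_frac * mag.max())):
        theta = np.arctan2(-gy[y, x], -gx[y, x])          # negative-gradient direction
        dtheta = np.angle(np.exp(1j * (ang - theta)))      # wrapped to (-pi, pi]
        cone = (dist >= r_min) & (dist <= r_max) & (np.abs(dtheta) <= half_angle)
        mu_y, mu_x = r_mid * np.sin(theta), r_mid * np.cos(theta)  # kernel mean
        kernel = np.exp(-((dy - mu_y) ** 2 + (dx - mu_x) ** 2) / (2 * sigma ** 2))
        vy, vx = y + dy[cone], x + dx[cone]
        keep = (vy >= 0) & (vy < h) & (vx >= 0) & (vx < w)
        np.add.at(votes, (vy[keep], vx[keep]), mag[y, x] * kernel[cone][keep])
    return votes


def mean_shift_seeds(votes, rel_thresh=0.5, bandwidth=3.0, n_iter=30):
    """Threshold the voting map, then cluster candidates with Gaussian mean shift."""
    pts = np.argwhere(votes >= rel_thresh * votes.max()).astype(float)
    modes = pts.copy()
    for _ in range(n_iter):
        for k in range(len(modes)):
            wts = np.exp(-((pts - modes[k]) ** 2).sum(axis=1) / (2 * bandwidth ** 2))
            modes[k] = wts @ pts / wts.sum()
    seeds = []                                 # merge modes that converged together
    for m in modes:
        if all(np.linalg.norm(m - s) > bandwidth for s in seeds):
            seeds.append(m)
    return np.array(seeds)                     # (row, col) seed coordinates
```

For example, on a synthetic image with two dark discs on a bright background, `mean_shift_seeds(single_pass_voting(img))` should return one seed near each disc centre.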
With the detected seeds as warm initialization, a repulsive level set model [17] is employed to extract cell boundaries. Based on an interactive scheme, the repulsive level set not only uses the competition of evolving contours to determine the membership of each pixel, but also applies repulsion to prevent adjacent contours from overlapping. Let $C_i$ ($i = 1, \dots, N$) denote the contours evolving toward the boundaries of $N$ cells in image $I$. The level set energy function for cell segmentation with the interactive scheme can be expressed as follows:

$$E = \lambda_0 \sum_{i=1}^{N} \int_{\mathrm{in}(C_i)} \big(I(x,y) - c_i\big)^2 \,dx\,dy + \lambda_1 \int_{\Omega_b} \big(I(x,y) - c_b\big)^2 \,dx\,dy + \eta \sum_{i=1}^{N} \int_0^1 g\big(\lvert \nabla I(C_i(q)) \rvert\big)\,\lvert C_i'(q) \rvert \,dq + \omega \sum_{i=1}^{N} \sum_{j=i+1}^{N} \int_{A_i \cap A_j} dx\,dy \qquad (2)$$

where $A_i$ ($i = 1, \dots, N$) denotes the region enclosed by contour $C_i$ and $\Omega_b$ represents the background. The operator $\mathrm{in}(\cdot)$ denotes the region inside a cell contour. $c_i$ and $c_b$ are the mean intensities of the cell region and the background region, respectively. $\lambda_0$, $\lambda_1$, $\eta$ and $\omega$ are the weights of the cell region, background region, cell boundary and repulsion terms, respectively. Function $g$ is chosen as a sigmoid function in the implementation, with $\alpha$ representing the slope of the output curve and $\beta$ representing the window size:

$$g(\xi) = \frac{1}{1 + \exp\!\left(-\dfrac{\xi - \beta}{\alpha}\right)} \qquad (3)$$

Cell segmentation is achieved by minimizing (2) within the level set framework. Using the Euler-Lagrange equation, (2) can be solved iteratively with gradient descent [17]. Because of the last (repulsion) term in (2), touching cells are automatically and efficiently separated from each other (Figure 2).
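To make the role of each term in the segmentation energy concrete, the sketch below evaluates a pixel-wise discretization of it for a given set of binary cell masks; it does not perform the level set evolution itself. The discretization is an assumption: integrals become pixel sums, c_i and c_b are the region means, the boundary integral is approximated by summing g over each mask's boundary pixels, and the default weights and sigmoid parameters (a negative α so that strong edges are cheap) are hypothetical.

```python
import numpy as np


def sigmoid_edge(xi, alpha, beta):
    """g(xi) = 1 / (1 + exp(-(xi - beta) / alpha)); alpha < 0 makes g fall on strong edges."""
    return 1.0 / (1.0 + np.exp(-(xi - beta) / alpha))


def boundary(mask):
    """Pixels of `mask` with at least one 4-neighbour outside the mask."""
    p = np.pad(mask, 1)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior


def discrete_energy(image, masks, lam0=1.0, lam1=1.0, eta=1.0, omega=1.0,
                    alpha=-0.1, beta=0.25):
    """Region + boundary + repulsion energy of candidate cell masks (discretized)."""
    image = image.astype(float)
    masks = [np.asarray(m, dtype=bool) for m in masks]
    background = ~np.logical_or.reduce(masks)
    g = sigmoid_edge(np.hypot(*np.gradient(image)), alpha, beta)
    e_cells = sum(((image[m] - image[m].mean()) ** 2).sum() for m in masks)
    e_bg = ((image[background] - image[background].mean()) ** 2).sum()
    e_edge = sum(g[boundary(m)].sum() for m in masks)
    e_rep = sum(np.count_nonzero(masks[i] & masks[j])          # overlap area A_i ∩ A_j
                for i in range(len(masks)) for j in range(i + 1, len(masks)))
    return lam0 * e_cells + lam1 * e_bg + eta * e_edge + omega * e_rep
```

Two masks that each cover the same pair of cells pay the repulsion penalty ω|A_i ∩ A_j| and count every boundary twice, so they score worse than one mask per cell.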
Haematoxylin and eosin stained slide images of 28 gliomas, provided by the Norton Brain Tumour Centre, comprise the study cohort; from each pathology image, a specific region was cropped for analysis. Each slide corresponds to a particular patient whose 1p/19q status is known. Selected areas that had high tumour cell density (i.e., minimal non-neoplastic tissue contamination) were digitally analysed with the high-throughput image segmentation algorithm described above to automatically delineate the boundaries of the cell nuclei (typically hundreds in one image patch).

Feature extraction
From the segmentation results, we extract geometric features, including nuclear area, perimeter, circularity index and the ratio between the major and minor axes. Because cells within each image differ, each feature yields a distribution of values over the nuclei in an image [19]; we summarize each geometric feature by its mean and standard deviation. In addition, a robust texture feature, the texton [20], is computed because of its strong discriminative power for classification [21]. In total, we extract 30 features: 8 geometric features (the mean and standard deviation of the 4 measurements) and a texton histogram with 22 textons. Texton-based features are widely used texture features and have been successful in image segmentation [22], recognition [23] and classification [24]. Textons are defined as prototype filter response vectors, which are calculated by applying a filter bank to the images. To handle variation in cell size, intensity and shape, our filter bank consists of 48 filters: 36 elongated filters at 6 orientations, 3 scales and 2 phases; 8 centre-surround difference-of-Gaussian filters; and 4 low-pass Gaussian filters. Each pixel is therefore transformed into a 48-dimensional vector. Because the filter windows overlap, the responses are highly redundant, so we cluster them to form a compact representation. In the implementation, we use K-means clustering with K empirically set to 22. Each K-means centre (texton) encodes a characteristic local pattern, so that similar pixels, together with their neighbourhoods, are mapped to the same class. Assigning each pixel to its nearest texton and counting the assignments yields a texton histogram, which is used as a texture feature for classification.
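A compact NumPy sketch of the 30-dimensional feature vector (8 geometric summaries plus a 22-bin texton histogram) follows. Several details are assumptions rather than the paper's method: circularity is taken as 4πA/P², the perimeter is crudely estimated by counting boundary pixels, the axis ratio comes from the eigenvalues of the pixel-coordinate covariance, and the 8-filter bank (Gaussian, gradient and difference-of-Gaussian responses at two scales) is a hypothetical stand-in for the 48-filter bank described above. K-means is a plain Lloyd iteration.

```python
import numpy as np


def nucleus_geometry(mask):
    """Area, perimeter, circularity and major/minor axis ratio of one nucleus mask."""
    area = mask.sum()
    p = np.pad(mask, 1)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = np.count_nonzero(mask & ~interior)   # boundary-pixel count (crude)
    circularity = 4.0 * np.pi * area / perimeter ** 2  # assumed definition
    ys, xs = np.nonzero(mask)
    minor_var, major_var = np.linalg.eigvalsh(np.cov(np.vstack([xs, ys])))
    return np.array([area, perimeter, circularity, np.sqrt(major_var / minor_var)])


def geometric_features(labels):
    """Mean and standard deviation over nuclei of the 4 measurements -> 8 values."""
    ids = [k for k in np.unique(labels) if k != 0]
    per_nucleus = np.array([nucleus_geometry(labels == k) for k in ids], dtype=float)
    return np.concatenate([per_nucleus.mean(axis=0), per_nucleus.std(axis=0)])


def blur(img, sigma):
    """Separable Gaussian low-pass filter with 'same'-size output."""
    r = int(3 * sigma) + 1
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    img = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, img, k, mode="same")


def filter_responses(img, sigmas=(1.0, 2.0)):
    """Per-pixel responses of a small hypothetical filter bank -> (n_pixels, 8)."""
    img = img.astype(float)
    feats = []
    for s in sigmas:
        low = blur(img, s)
        gy, gx = np.gradient(low)
        feats += [low, gx, gy, low - blur(img, 2 * s)]  # low-pass, 2 edges, DoG
    return np.stack(feats, axis=-1).reshape(-1, len(feats))


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd iterations; returns the k cluster centres (the textons)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        assign = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return centers


def texton_histogram(img, centers):
    """Normalized histogram of nearest-texton assignments over all pixels."""
    x = filter_responses(img)
    assign = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    hist = np.bincount(assign, minlength=len(centers)).astype(float)
    return hist / hist.sum()


def feature_vector(img, labels, centers):
    """8 geometric summaries followed by the texton histogram (30 values for K = 22)."""
    return np.concatenate([geometric_features(labels), texton_histogram(img, centers)])
```

In practice the textons would typically be learned once from pixels pooled across training images and the same `centers` reused for every image, so that histogram bins are comparable between patients.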

Lasso-logistic regression
In traditional statistical analysis, ordinary least squares (OLS) is used to obtain unbiased estimators, which are often unsatisfactory in terms of prediction accuracy and interpretability. To improve prediction accuracy, we can accept a little bias in exchange for lower variance by adding penalty terms; to ease interpretation, we would like to identify a smaller subset of features that exhibit the strongest effects. The lasso (least absolute shrinkage and selection operator), originally proposed for linear regression models, has become a popular model selection and shrinkage estimation method. It shrinks some coefficients and sets others to 0, and hence tries to retain the good properties of both subset selection and ridge regression; it achieves this sparsity through an $\ell_1$ penalty [25,26]. The lasso estimate $(\hat\alpha, \hat\beta)$ is defined as

$$(\hat\alpha, \hat\beta) = \arg\min_{\alpha,\beta} \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t \qquad (4)$$

where $t$ is a tuning parameter, $(y_i, x_{ij})$, $i = 1, 2, \dots, N$, is the data set, and $\beta = (\beta_1, \beta_2, \dots, \beta_p)^T$. When the response variable is binary ($y_i \in \{0,1\}$), the linear logistic regression model is often used; the lasso-logistic estimate replaces the squared-error loss in (4) with the negative log-likelihood of the logistic model:

$$(\hat\alpha, \hat\beta) = \arg\min_{\alpha,\beta} -\sum_{i=1}^{N} \Big[ y_i \Big(\alpha + \sum_{j=1}^{p}\beta_j x_{ij}\Big) - \log\Big(1 + e^{\alpha + \sum_{j=1}^{p}\beta_j x_{ij}}\Big) \Big] \quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t \qquad (5)$$

In 2008, Friedman and colleagues released the R package glmnet, which computes lasso solutions, including the lasso-logistic regression model for two-class classification. Its algorithms use cyclical coordinate descent computed along a regularization path, handle large problems and deal efficiently with sparse features [27,28].
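glmnet solves the lasso-logistic problem with cyclical coordinate descent [27,28]. As a self-contained illustration, the sketch below instead uses proximal gradient descent (ISTA) on the equivalent penalized form, minimizing the average negative log-likelihood plus λΣ|β_j|, where a larger λ corresponds to a smaller t in the constrained formulation. The function names, the internal standardization and the fixed iteration count are assumptions; this is not glmnet's algorithm.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink toward zero, exact zeros inside [-t, t]."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_logistic(X, y, lam, n_iter=5000):
    """Minimize mean logistic negative log-likelihood + lam * sum_j |beta_j| by ISTA.

    Features are standardized internally; the intercept alpha is not penalized.
    Returns (alpha, beta) on the standardized scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    n, p = Z.shape
    alpha, beta = 0.0, np.zeros(p)
    # Step 1/L, with L = (n + ||Z||_2^2) / (4n) bounding the gradient's Lipschitz constant.
    step = 4.0 * n / (n + np.linalg.norm(Z, 2) ** 2)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(alpha + Z @ beta)))
        resid = prob - y                  # gradient of the NLL w.r.t. the linear predictor
        alpha -= step * resid.mean()
        beta = soft_threshold(beta - step * (Z.T @ resid) / n, step * lam)
    return alpha, beta
```

With λ large enough every β_j is exactly zero; lowering λ lets the most informative features enter first, which is the behaviour exploited for feature selection.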
In our case, we want to predict 1p/19q co-deletion status more accurately using a subset of representative features from all the extracted features. Because the lasso produces sparse solutions, the lasso-logistic regression model is well suited to this problem. Using glmnet, we selected a 3-feature classifier from the 30 features by stepwise selection: variability of circularity ratio, texton 9 and texton 12.

PAM (prediction analysis of microarrays)
PAM (Prediction Analysis of Microarrays) is a statistical technique for class prediction from gene expression data using nearest shrunken centroids, described by Tibshirani, Hastie, Narasimhan and Chu [26]. The method of nearest shrunken centroids identifies the subsets of genes that best characterize each class. Briefly, each class centroid is shrunk toward the overall centroid after standardizing by the within-class standard deviation of each gene. This standardization gives higher weight to genes whose expression is stable within samples of the same class; such standardization is inherent in other common statistical methods such as linear discriminant analysis. The technique is general and can be applied to many other classification problems, including image classification. PAM is available as an R package [29].
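The nearest-shrunken-centroid rule can be sketched in NumPy as below, following the published formulation: d_ik = (x̄_ik − x̄_i) / (m_k (s_i + s_0)) with m_k = √(1/n_k − 1/n) and s_0 the median of the s_i, soft-thresholded by Δ (`delta`), and classification by the smallest standardized distance minus 2 log π_k. It is an illustrative sketch, not the R package's code; the function names and the dictionary layout are hypothetical.

```python
import numpy as np


def fit_shrunken_centroids(X, y, delta):
    """Nearest shrunken centroids with shrinkage threshold delta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    overall = X.mean(axis=0)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    counts = np.array([(y == c).sum() for c in classes])
    # Pooled within-class standard deviation s_i of each feature, plus s_0 = median(s_i).
    within = sum(((X[y == c] - centroids[k]) ** 2).sum(axis=0)
                 for k, c in enumerate(classes))
    s = np.sqrt(within / (n - len(classes)))
    scale = s + np.median(s)
    m = np.sqrt(1.0 / counts - 1.0 / n)
    d = (centroids - overall) / (m[:, None] * scale)
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)  # soft threshold
    return {
        "classes": classes,
        "centroids": overall + m[:, None] * scale * d_shrunk,
        "scale": scale,
        "priors": counts / n,
        "d_shrunk": d_shrunk,
    }


def predict_shrunken_centroids(model, X_new):
    """Assign each row to the class with the smallest discriminant score."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    diff = (X_new[:, None, :] - model["centroids"][None]) / model["scale"]
    scores = (diff ** 2).sum(axis=-1) - 2.0 * np.log(model["priors"])
    return model["classes"][scores.argmin(axis=1)]
```

Features whose shrunken differences are zero for every class no longer affect classification, which is how PAM discards features as Δ grows.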

RPA (recursive partitioning analysis)
Recursive partitioning methods have become popular and are widely used for non-parametric regression and classification in many fields. In particular, random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years.
Classification and regression trees are a simple non-parametric regression approach. Their main characteristic is that the feature space, i.e. the space spanned by all predictor variables, is recursively partitioned into a set of rectangular areas. The partition is created such that observations with similar response values are grouped together. After the partition is completed, a constant value of the response variable is predicted within each area [30].
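A minimal classification tree in plain Python illustrates the recursive partitioning into rectangles. The Gini impurity criterion, the exhaustive threshold search and the stopping rules (`max_depth`, `min_size`, stop when no split lowers impurity) are assumptions made for the sketch; rpart's defaults and cost-complexity pruning are not reproduced.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Exhaustive search for the (feature, threshold) minimizing weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Recursively split the feature space into axis-aligned rectangles."""
    majority = Counter(y).most_common(1)[0][0]
    split = best_split(X, y) if depth < max_depth and len(y) >= min_size else None
    if split is None or split[0] >= gini(y):   # no impurity reduction: make a leaf
        return majority
    _, j, t = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    return (j, t,
            grow_tree([r for r, _ in left], [v for _, v in left], depth + 1, max_depth, min_size),
            grow_tree([r for r, _ in right], [v for _, v in right], depth + 1, max_depth, min_size))


def predict_tree(node, row):
    """Follow the splits down to a leaf and return its constant prediction."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node
```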
The R package rpart implements this method for classification [31]. For comparison with the lasso model, we use PAM and RPA. We selected a 6-feature classifier from the 30 features: they are mean of …

Results
We applied the three statistical classification models to the morphological features computed from the segmentation results and the texture features computed from the cropped haematoxylin and eosin stained glioma images, and obtained the following feature selection results.
As expected, only a subset of the 30 features is highly significantly correlated with 1p/19q co-deletion (Table 1).
Table 1 shows that variability of circularity ratio, texton 9 and texton 12 are significant in all three models, and may therefore play the most important role in predicting 1p/19q co-deletion.
Variability of circularity ratio, texton 3 and texton 6 appear in two of the three statistical models and may contribute secondarily to 1p/19q co-deletion prediction. Figure 3a shows the stepwise feature selection result for the lasso-logistic model: variability of circularity ratio, texton 9 and texton 12 are chosen. Figure 3b shows the cross-validated model deviance for different values of the penalty parameter. The left vertical line marks the penalty parameter that minimizes the model deviance, while the right vertical line marks the largest penalty parameter whose deviance lies within one standard error of that minimum. In practice, we prefer the one-standard-error criterion because it accounts for the variability of the cross-validated deviance and yields a more parsimonious model [27] (Figure 3). Figure 4a shows the feature selection result for PAM, and Figure 4b shows the performance of each selected feature in predicting 1p/19q co-deletion status (Figure 4).
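The two vertical lines in Figure 3b correspond to what cv.glmnet reports as lambda.min and lambda.1se. Given the mean and standard error of the cross-validated deviance at each penalty value, both can be computed as in this plain-Python sketch; the function name and argument layout are hypothetical.

```python
def one_se_lambda(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se).

    lambda_min minimizes the mean cross-validated deviance; lambda_1se is the
    largest penalty whose mean deviance is within one standard error of that
    minimum, i.e. the sparsest model that is statistically close to the best.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    bound = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, m in zip(lambdas, cv_mean) if m <= bound)
    return lambdas[i_min], lambda_1se
```

For example, with penalties [1.0, 0.5, 0.25, 0.1, 0.05], mean deviances [1.30, 1.10, 0.99, 0.95, 0.97] and a standard error of 0.05 at every penalty, the minimum is at 0.1 and the one-standard-error choice is 0.25, a sparser model.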

Leave-one-out cross-validation
Cross-validation, sometimes called rotation estimation, is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set) and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, multiple rounds of cross-validation are performed using different partitions, and the validation results are averaged over the rounds [32].
Leave-one-out cross-validation (LOOCV) is widely used when the sample size is limited. As the name suggests, it uses a single observation from the original sample as the validation data and the remaining observations as the training data. This is repeated so that each observation in the sample is used once as the validation data. LOOCV is computationally expensive because the model must be retrained once for every observation.
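The leave-one-out loop itself is short; the plain-Python sketch below computes the false prediction rate for any pair of `fit`/`predict` functions. The nearest-centroid classifier is only a hypothetical stand-in for lasso, PAM or RPA.

```python
def loocv_error_rate(X, y, fit, predict):
    """Fraction of observations misclassified when each is held out once."""
    errors = 0
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # train without observation i
        errors += predict(model, X[i]) != y[i]
    return errors / len(y)


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in classifier: the mean feature vector of each class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_nearest_centroid(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
```

With 28 patients, each misclassified case adds 1/28 ≈ 3.57% to the false prediction rate.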
Because data were available for only 28 patients, LOOCV was used to compare the prediction performance of lasso, PAM and RPA.
The false prediction rates are 7.14%, 17.86% and 10.71% for lasso, PAM and RPA, respectively, corresponding to 2, 5 and 3 of the 28 patients being misclassified.
Compared with PAM and RPA, the lasso-logistic regression model achieved the lowest false prediction rate (7.14%), while PAM, which includes the most features, performed worst. RPA, however, performed best among patients with 1p/19q co-deletion (false prediction rate of 0). In summary, variability of nuclear area appears less important than the other six features in Table 1, while mean of circularity ratio, variability of circularity ratio, texton 3 and texton 6 appear more important when the patient has 1p loss than when not.

Survival analysis
To study this further, we conducted a survival analysis relating patients' survival time to the 30 morphologic and texture features. Patients with 1p loss had significantly better survival (p = 0.022), with a median survival time three times longer than that of patients without 1p loss. This suggests that 1p/19q co-deletion status is a critical factor influencing survival time for brain tumour patients. Since we have already related the morphologic and texture features to 1p/19q co-deletion status, it is appropriate to construct a Cox proportional hazards model with the extracted morphologic and texture features as covariates. Three significant features (mean of circularity ratio, variability of circularity ratio and texton 3) were selected by forward stepwise selection (Figure 5).
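The text does not say which test produced the survival comparison p-value. The two-group log-rank test is a common choice for comparing survival between groups, so the plain-Python sketch below shows that test purely as an assumed example; it uses the standard hypergeometric variance and the chi-square distribution with one degree of freedom, whose upper tail is erfc(√(χ²/2)).

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    events[i] is 1 for an observed death and 0 for censoring; groups[i] is 0 or 1.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = list(zip(times, events, groups))
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({tt for tt, e, _ in data if e}):
        at_risk = [g for tt, _, g in data if tt >= t]
        n, n1 = len(at_risk), sum(1 for g in at_risk if g == 1)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n          # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # P(chi-square_1 > chi2)
    return chi2, p_value
```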
Mean of circularity ratio (p = 0.0048), variability of circularity ratio (p = 0.0049) and texton 3 (p = 0.0207) have parameter estimates of 0.0968, 0.0121 and 0.024, respectively. In a Cox model a coefficient β acts multiplicatively on the hazard: with the other covariates held fixed, each unit increase in mean of circularity ratio, variability of circularity ratio or texton 3 multiplies the hazard rate by exp(β) ≈ 1.102, 1.012 and 1.024, respectively, i.e. raises it by roughly 10.2%, 1.2% and 2.4%.
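The hazard-ratio reading of a Cox coefficient follows directly from the proportional hazards form, written here in generic notation with baseline hazard h_0(t) and covariate vector x:

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\Big(\sum_{k} \beta_k x_k\Big)
\quad\Longrightarrow\quad
\frac{h(t \mid x_j + 1,\ \mathbf{x}_{-j})}{h(t \mid x_j,\ \mathbf{x}_{-j})}
= \frac{h_0(t)\,e^{\beta_j (x_j + 1)}\,e^{\sum_{k \ne j} \beta_k x_k}}
       {h_0(t)\,e^{\beta_j x_j}\,e^{\sum_{k \ne j} \beta_k x_k}}
= e^{\beta_j}
```

The baseline hazard cancels, so the ratio depends neither on t nor on the values of the other covariates.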

Discussion and Conclusion
The major aim of the study is to investigate the correlation between nuclear morphology, nuclear texture features and 1p/19q co-deletion status based on our proposed segmentation methodology. On a cohort of 28 patients, the best of the three statistical models achieved about 93% prediction accuracy, which is consistent with the long-standing observation that nuclear morphology (variability of circularity ratio) is highly correlated with 1p/19q co-deletion status, and shows that some texture features (texton 9 and texton 12) are as well. The lasso-logistic regression model produced the best prediction results using these discovered image markers. Furthermore, we conducted survival analyses relating patients' survival time to these image markers and to 1p/19q co-deletion status, respectively. We detected a significant difference in survival time between patients with and without 1p loss, and used a Cox model to select the image markers associated with survival time. Mean of circularity ratio, variability of circularity ratio and texton 3 were selected, indicating that they influence survival time. However, our study has two limitations. First, the choice of morphologic features is subjective, and we may have overlooked other influential morphologic features. Second, we analysed a cropped region of each pathology image instead of the whole image, so some critical nuclei may have been missed. In future work, additional morphologic features are being explored to further enhance prediction accuracy. Once the segmentation algorithm is sufficiently optimized, a validation cohort from the Norton Brain Tumour Centre will be analysed. Additionally, instead of computing features from selected regions, we plan to apply the automatic image analysis to whole-slide scanned pathology images.