A 3S Multi-level Thresholding Technique for Intracranial Segmentation from Brain MRI Images

Image segmentation is an important task involved in different areas from image processing to image analysis. One of the simplest methods for image segmentation is thresholding. However, many thresholding methods are based on a bi-level thresholding procedure. These methods can be extended to form multi-level thresholding, but they become computationally expensive because a large number of iterations would be required for computing the optimum threshold values. In order to overcome this disadvantage, a new method based on a Shrinking Search Space (3S) algorithm is proposed in this paper. The method is applied on statistical bi-level thresholding approaches including Entropy, Cross-entropy, Covariance, and Divergent Based Thresholding (DBT), to achieve multi-level thresholding and used for intracranial segmentation from brain MRI images. The paper demonstrates that the impact of the proposed 3S technique on the DBT method is more significant than the other bi-level thresholding approaches. Comparing the results of using the proposed approach against those of the Fuzzy C-Means (FCM) clustering method demonstrates a better segmentation performance by improving the similarity index from 0.58 in FCM to 0.68 in the 3S method. Also, this method has a lower computation complexity of around 0.37s with respect to 157s processing time in FCM. In addition, the FCM approach does not always guarantee the convergence, whilst the 3S technique always converges to the optimum result.


Introduction
An important component of Computer Aided Detection (CAD) systems based on Magnetic Resonance Imaging (MRI) is segmentation. There are two ways for segmentation that are manual and automatic segmentation. Automatic segmentation is categorized into a number of groups: classical methods such as thresholding, region growing, and edge based methods; soft computing methods such as neural networks and fuzzy clustering methods; statistical methods such as Expectation-Maximization (EM) and Markov Random Fields (MRF) methods; model-based methods, and rule-based methods.
Thresholding is one of the most widely used approaches in image segmentation, and also one of the simplest approaches. Image thresholding is the process of classifying image gray values into two or more levels. This method has been used on T1-weighted MRI images to isolate brain tissues such as skull, Gray Matter (GM), White Matter (WM), and Cerebro Spinal Fluid (CSF).
Most of the existing thresholding methods are bi-level, which use two levels to categorize the image into background and object segments. However, MRI images have many different parts which make these methods non-applicable. Thus, the loss of information from the image may occur and the diagnosis system may mislead physicians in their clinical task. Therefore, an optimum multi-level thresholding algorithm is required to find each thresholding level to ensure that all important information from MRI images are retained. Fuzzy C-means (FCM) is an early multi-level thresholding method. In this method, a cluster is assigned to each tissue class in MRI segmentation. The data in each cluster should be similar based on a similarity criterion. This similarity criterion may be a geometrical distance such as Euclidian distance between each data and a center point which is representative of each cluster. This type of clustering is known as distance-based clustering. Thus, the goal of clustering is giving labels to each data, where each label identifies a cluster. Because of the popularity of the FCM method, we use it as a benchmark to evaluate the effectiveness of the 3S multilevel thresholding method proposed in this paper.
In this work, first a region growing method is applied on the brain MRI Images to remove the skull region. To fully segment the skull, some morphological operations are used. Then, the proposed 3S multithresholding approach is applied on the resulting image to segment different brain tissues.
The paper is organized as follows. Section 2 discusses the background work on thresholding. Section 3 explains some of the existing multi-thresholding methods which are basically based on bilevel methods, and also describes the FCM method. Section 4 presents the proposed 3S method. Section 5 illustrates the experimental results and associated discussion. Section 6 provides the concluding remarks.

Background
Since a great number of multi-level thresholding techniques are derived from bi-level ones, we first review bi-level methods, and then illustrate how to use these methods in a multi-level thresholding scheme.
thresholding methods are categorized into six groups including: histogram shape information, measurement space clustering, histogram entropy information, image attribute information, spatial information, and local characteristic.
In this paper, we have evaluated different approaches in the category of histogram entropy class. These methods include Entropy method [2], Covariance method [3], Divergent Based Thresholding method (DBT) [4], and Cross-Entropy method [5], which are investigated here.
Let L be gray levels [1,2,…,L] in a given image, with the probability distributions denoted by p i . Now, suppose that the pixels are divided into two classes, C 0 and C 1 (Background and Object or vice versa), where C 0 denotes pixels with levels [1,2,…,T], and C 1 denotes pixels with levels [T+1,…,L], where T is the required threshold value. Suppose that the zero order cumulative moment or in another word, the probability of occurrence of classes C 0 and C 1 are w 0 and w 1 , respectively. Similarly, the first order cumulative moment or the mean values of classes C 0 and C 1 are given by µ 0 and µ 1 , respectively. Also, the mean value of the entire image is given by µ T . In the same way, the second order cumulative moments or the variance values of classes C 0 and C 1 are given by µ 0 and µ 1 , respectively.
After this brief introduction to notations, we start to discuss some bi-level thresholding techniques.
The Covariance (Otsu) method: In this method, the covariance between two classes C 0 and C 1 are maximized. This value is equal to: The optimum threshold T is calculated by maximizing the covariance of the two classes. An exhaustive search optimization process is needed to optimize the function [3].
The Entropy (Kapur) method: In this method, two probability distributions are derived from gray level probability distributions. One is defined for discrete values 1 to T, and the other for values T+1 to L. The total entropy φ(f) is the sum of the entropies associated with each distribution. That is, Where, H t and H T are the entropies of class C 0 and whole image, respectively; thus, the entropy of class C 1 is H T -H t .
It is required to obtain the maximum information between the object and the background distributions in the image. The discrete value T which maximizes φ(f) is chosen as the threshold value [2].
The Cross Entropy (Al-Attas) method: This method obtains the optimum threshold that minimizes the minimum cross entropy using gamma distribution. The fidelity function of threshold selection which minimizes the cross entropy of the image is found to be, Where 0 ( ) The DBT (Chowdhury) method: In this method, the total average information for discriminating class C 0 versus class C 1 , and discriminating information for class C 1 versus class C 0 can be measured by the logarithm of the likelihood ratios. Therefore, the total average information for discriminating class C 0 from class C 1 is the divergence function, Where w 0 and w 1 are the probability of occurrence classes C 0 and C 1 , respectively, and t is required to minimize this function to get a threshold T, which distinguishes the object from the background [4].

Existing multi-level thresholding methods
Statistical methods: All of the methods that we discussed are bilevel thresholding methods. These methods are useful for some limited applications that have only one object in a background, for example text or finger print segmentation. But if we want to use thresholding to segment many objects in an image, such as a medical image, these methods do not fulfill our segmentation requirements. Thus, it is sensible to extend these methods to multi-level thresholdings. Most of the efforts in this issue have been done in Otsu's method [6]. But, these techniques can be also used in other approaches, for example covariance or DBT methods.
Let M be the class number of thresholded gray levels that we want to use in our application. Therefore, we need M-1 threshold values that should be designated in an optimum way. For each class C i , i = 0,1,…,M -1, as we mentioned before, we define a cost function such as Cost-Function(C 0 ,C 1 ,…,C M-1 ) and we try to find the optimum M classes such that this function is maximized or minimized according to the method of bi-level thresholding and the nature of the image histogram. This needs an exhaustive search over all combinations of M-1 threshold values starting from zero gray-level to the maximum gray-level value in the image.
So far, many approaches have been proposed to overcome the complexity limitation. A hybrid optimal estimation algorithm was proposed in reference [7] for solving the multi-level thresholding problem. In this method, the distribution of image intensities are approximated by using a mixture Gaussian model with parameters computed by a PSO (Particle Swarm Optimization) and EM (Expectation Maximization) algorithms iteratively. They confirmed that the hybrid PSO and EM algorithms can solve the multi-level thresholding problem quickly, with high quality of thresholded output images for complicated images. However, the complexity is still very high. The author in [8] showed that a recursive algorithm can greatly reduce the computational complexity of determining a multi-level threshold by accessing a look-up table, when compared to the conventional Otsu method. In this method, first a lookup table is constructed including all zero and first order moments of probabilities that will be used later in the computations. Making this look up table is not simple and also suffers from the problem of needing too much time in growing the class number M in an image. In [9], a Two-Stage Multilevel thresholding Otsu method (TSMO) was used. In this method, the whole range of gray levels are subdivided into M z groups, and in each group the mean value is calculated as a representative of that group, and then the exhaustive Otsu multi-level thresholding search is applied to the set of subgroups. This is the first stage of multi-level thresholding Otsu method. In the second stage, it is tried to find the best threshold values in each group. This procedure reduces the computational load very quickly with increasing the number of classes M. This method reduces complexity, but still needs an exhaustive search over regions. This is caused by searching on only certain groups of pixels which made this process to fail for finding the optimum thresholds too.
Fuzzy C-Means: Fuzzy C-Means (FCM) uses the same idea as C-Means (CM), except that in this approach each data can belong to more than one cluster with a membership degree. This method, which first developed by Dunn (1973) and improved by Bezdek [10], now is used in many pattern recognition applications extensively. Similar to CM, FCM is based on minimization of an objective function as follows: In this equation, n is the number of the data samples, x i is the i th data point, c j is the center point of the jth cluster, m i j u is the membership degree of the i th data point to the j th cluster, and m which is called the fuzziness degree is any real number greater than 1. Therefore, in this case each data point could belong to more than one cluster with a membership degree. This membership function could have values in the range [0,1]. In this range, zero value means no membership, one value means full membership, and values between zero and one represent partial membership. The FCM algorithm is carried out in an iterative manner. Similar to CM, we start with some initial center points (in this case initial memberships m i j u are possible too), then we compute new memberships and center values according to the following: Therefore, the algorithm is implemented in the following steps: • Place k points in the data space as the representatives of each cluster.
• Calculate the center points and membership functions according to (6) and (7).
• Repeat steps 2 and 3, until the objective function doesn't change anymore, or the iterations exceed a predefined value.
In contrast to CM, this algorithm almost always converges to the optimum solution. However, it still has sensitivity to the number of predefined clusters, and initial values of the center points or membership functions. This dependency causes occasional non-convergence of the algorithm, where it get stuck in local minima points.
In this paper, we introduce another new multi-level thresholding method based on a Shrinking Search Space (3S) scheme, which is implemented on different statistical bi-level thresholding methods including the Entropy method, the Covariance method, the Cross Entropy method, and the DBT method. Also, we have compared this technique with FCM as a benchmark to find out its segmentation quality and computation complexity.

Proposed 3S multi-thresholding method
The proposed method uses a bi-level thresholding technique as the core of the 3S multi-thresholding technique. The flowchart diagram description in (Figure 1) shows the procedure of the proposed method as well as the pre-processing steps in this research work.
Based on the flowchart in (Figure 1), the Region of Interest (ROI) is found first, which is a rectangular region that our image fits in. Then, the IC extraction is performed via a region growing method followed by some morphological operations. Finally, to achieve the multilevel thresholding, a bi-level thresholding based on the mentioned statistical methods (Entropy, Cross Entropy, Covariance, and DBT) are performed. After implementing one bi-level thresholding method and finding the best threshold, the two most optimum classes C 0 (higher gray levels class) and C 1 (lower gray levels class) are distinguished. The class C 0 is saved and left as an optimum class, and then the algorithm continues with lower gray values in class C 1 to treat it as the original image. As a result, the first iteration gives the first optimum class in the high range of gray values. Then the same bi-level thresholding method is applied on the remained class C 1 , to subdivide it again into two new classes C 1 (higher gray levels class) and C 2 (lower gray levels class). Therefore, the second iteration fixes the second class C 1 . This procedure is repeated on C 2 and the subsequent results, until it reaches the gray value zero in the histogram.
As illustrated, finding the optimum class of higher gray values in all iterations is necessary. However, this is not the only case for every thresholding technique. In thresholding techniques where a cost function has to be maximized, this movement from high-to-low gray values is used, but for the thresholding techniques which are dealing with minimizing a cost function, a movement from low-to-high gray values is done, i.e. the most optimum class that is nearest to lower gray values is found first, then the portion of histogram below this value is thrown out, and then the procedure continues with the remained portion. Algorithm 1 shows the pseudo-code of this technique. In this procedure, the histogram portion of the image is stored in a variable (processed_hist) for finding the best threshold value, and then this variable is used in the search scheme. In each stage, after finding the threshold value, the part of this variable with indexes higher than a threshold value is thrown out, and the remained portion is stored in the same variable (processed_hist). This procedure continues to reach to the first point of the histogram.

Experimental Results and Discussions
We have tested our algorithm on ten 3D T1-weighted MRI images of Brain Web simulated brain from MIDAS database with different noise levels and acquisition times, to segment brain tissues. This algorithm has been implemented on a COMPAQ PC Pentium4 (1.6 GHz) with 512MB RAM.
The above mentioned algorithms have been tested on these MRI images. Figure 2a shows an axial brain MRI image after preprocessing, which is just a simple contrast stretching method. The graph in Figure  2b shows the histogram of the gray level image.  Figure 2a, which includes the skull of the brain. As it was stated earlier, to make a complete IC mask, it is needed to do some morphological operations (dilation and erosion) on this result to fill up the gaps and remove thin bridges. Figure 3b shows the resulting mask. Figure 3d is the result of applying this mask on the input image shown in Figure 3c. Then, the four different statistical bi-level thresholding methods are combined with the 3S technique and applied on the image displayed in Figure 3d. Also the FCM method is applied on this image.
The histograms of the five thresholded images based on the above methods on the input in Figure 3d, are shown in Figure 4a-e. From these histograms, it can be seen that entropy method distinguishes so many threshold levels, so it cannot be used for segmentation. The three next methods have produced a lower number of thresholds but in different positions. Also, the 3-cluster FCM has recognized three main peaks in the histogram. The difference between the DBT method and the FCM method is that, DBT has considered thresholds for higher gray values. This is useful because higher gray values have lower probabilities, and according to information theory, these values have higher importance and they have to be considered as different objects [11].   In Figure 6, the results of the four stated thresholded images besides the FCM method have been compared. Figure 6a explains the result of applying the 3S technique on the Entropy method. This figure shows no acceptable separation between gray values. Figure 6b indicates the result of applying the hybrid of 3S and Cross Entropy on the original image. It shows that this method has removed a lot of information from the original image. Applying the 3S algorithm on Covariance methods in Figure 6c has given better results than the two last methods, while it can be seen that the DBT method in Figure 6d has created the highest contrast between different brain tissues in the MR image; therefore, it outperforms among the others. We have also applied the FCM method on the input image as a benchmark to compare our best hybrid method (3S+DBT) in the four mentioned methods with the result of FCM in Figure 6e. Comparing visually Figure 6d and 6e indicates a little bit better segmentation quality for the 3S+DBT than the FCM method.
To further explain the capabilities of applying the 3S technique in combination with different statistical thresholding methods and the FCM clustering method, three other images of the brain MRI have been used in this study. In a colored format, Figure 7b-e explains the results of using the 3S technique on an axial brain MRI image shown in Figure 7a, which includes the tissues of eyes. The FCM method has been also applied on this image, which results in the segmented image of Figure 7f. Each segment is shown with a different color. Obviously, it can be seen that the DBT method has produced good results with respect to the other three statistical and FCM methods. In this case, the FCM method has not converged to the optimum solution. Therefore, the FCM method has the occasional problem of getting stuck in local minima, although it usually outperforms the CM algorithm.
The segmentation quality of the proposed 3S+DBT method has been compared against that of the FCM method in terms of their similarity index measure with respect to manually segmented images in the BrainWeb database. The similarity index between the two measurements, S ( [0,1] S ∈ ), is defined as the ratio of twice their common area to the sum of their individual areas, as follows [12]: This measurement, in our experiment, shows an average SI of 0.68 for the implemented 3S+DBT method against 0.51 for the FCM method for the implemented slices in this work. This proves that the proposed method has performed better than the FCM method.
In addition to the priority of the mixture of 3S and DBT over other methods in terms of the segmentation quality, another comparison is made between the processing times of these four hybrid methods. It can be seen that the mixture of 3S and DBT has averagely achieved a little higher processing time (0.37s) compared to the mixture of 3S and Cross Entropy (0.31s) and the mixture of 3S and Covariance (0.28s) methods, but it is much better than the mixture of 3S and Entropy (1.45s) method, and also too much better than the FCM method (157s).
To generalize the comparison between the 3S hybrid techniques and FCM, these methods have been tested on a sagittal MRI image of the BrainWeb database in the next two figures. In Figure 8 and 9, the 3S+DBT method, as the best choice of the statistical thresholding methods, is compared with the FCM method which is a commonly used clustering algorithm, on one sagittal brain MR image as well, without applying region growing algorithm for non-brain tissue segmentation task. This comparison proves again better segmentation of the 3S+DBT method with respect to the FCM method, and also much lower processing time of the 3S method (0.78s) than that of the FCM method (423.99s). (Figure 8a-c) show the segmented images of the original image shown in Figure 8a based on the 3S+DBT method, in a binary format, while, (Figure 9a-c) show the same results from the FCM method.
According to the result in Figure 7f, the convergence of the FCM method is not always guaranteed too. This problem was inspected again on another axial brain MRI image by applying the two thresholding methods of FCM, and 3S+DBT. Figure 10a shows the original axial image, Figure 10b) shows the result of the FCM segmentation, and (Figure 10c) shows the result of the hybrid of 3S and DBT method. Comparing Figure 10 b and 10c, we see that the convergence of the FCM method on this image is not helpful, furthermore its processing time is substantially high (595s) compared to processing time of the 3S method (0.9s).

Conclusion
We have used a new multi-level thresholding technique based on a Shrinking Search Space (3S) technique on brain MRI Images. The advantages of this technique with respect to other multi-level thresholding algorithms and the FCM method are: 1) the quality of segmentation compared to other methods is high; 2) the search complexity is very low compared to the exhaustive searches in other algorithms; thus, the computation time is much lower than that of the FCM method; 3) in existing algorithms, the number of thresholds must be determined earlier but the 3S algorithm automatically finds the optimum threshold number, such that it covers all objects with different gray levels. Therefore, the 3S technique inhibits from producing unnecessary thresholds; i.e. it removes the redundancy in the number of thresholds that exists in older methods.   Based on the abovementioned results, the 3S+DBT method has better capability for segmentation of brain MRI images than other three methods. This method is constructed based on the maximum likelihood function, which is basic in Bayesian algorithm for many pattern recognition techniques. There are occasional misclassifications by the FCM method due to its occasional non-converging nature. Thus, the 3S method could be a substitute for the FCM method in many segmentation applications in terms of both quality and computation complexity.