<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "http://dtd.nlm.nih.gov/publishing/2.3/journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="nlm-ta">J Comput Sci Syst Biol</journal-id>
			<journal-id journal-id-type="publisher-id">opg</journal-id>						
			<journal-title>Journal of Computer Science &amp; Systems Biology</journal-title>			 
			<issn pub-type="epub">0974-7230</issn>
			<publisher>
				<publisher-name>OMICS Publishing Group</publisher-name>
				<publisher-loc>India, USA</publisher-loc>
			</publisher>
		</journal-meta>
		<article-meta>		
			<article-id pub-id-type="doi">10.4172/jcsb.1000028</article-id>		
			<article-id pub-id-type="publisher-id">000063</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
				<subj-group subj-group-type="Discipline">
					<subject>Biochemistry</subject>
				</subj-group>
				<subj-group subj-group-type="System Taxonomy">
					<subject>Proteomics</subject>
					<subject>Bioinformatics</subject>
					<subject>Genomics</subject>
					<subject>Transcriptomics</subject>
					<subject>Biomarkers</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>L<sub>1</sub> Least Square for Cancer Diagnosis using Gene Expression Data</article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<name>
						<surname>Hang</surname>
						<given-names>Xiyi</given-names>
					</name>		
					<xref ref-type="aff" rid="a1">1</xref>									
				</contrib>	
				<contrib contrib-type="author">
					<name>
						<surname>Wu</surname>
						<given-names>Fang-Xiang</given-names>
					</name>	
					<xref ref-type="aff" rid="a2">2</xref>	
					<xref ref-type="aff" rid="a3">3</xref>										
				</contrib>							
			</contrib-group>
			<aff id="a1"><label>1</label>Department of Electrical and Computer Engineering, California State University, Northridge, CA 91330, USA</aff>		
			<aff id="a2"><label>2</label>Department of Mechanical Engineering</aff>	
			<aff id="a3"><label>3</label>Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A9, Canada</aff>	
			<author-notes>
				<corresp id="cor1">&ast; To whom correspondence should be addressed: Fang-Xiang Wu, Divsion of Biomedical Engineering University of Saskatchewan, Saskatoon, Saskatchewan, S7N 5A9, Canada; E-mail: <email>xhang@csun.edu</email>, <email>faw341@mail.usask.ca</email></corresp>
			</author-notes>
			<pub-date pub-type="collection">
			     <month>04</month>
				 <year>2009</year>
			</pub-date>
			<pub-date pub-type="epub">
				<day>27</day>
				<month>04</month>
				<year>2009</year>
			</pub-date>			
			<volume>2</volume>
			<issue>2</issue>
			<fpage>167</fpage>
			<lpage>173</lpage>
			<history>
			<date date-type="received">
			     <day>19</day>
				 <month>03</month>
				 <year>2009</year>
			</date>
			<date date-type="accepted">
			      <day>27</day>
				  <month>04</month>
				  <year>2009</year>
			</date>
			</history>
			<permissions>			
			<copyright-statement><bold>Copyright:</bold> &copy; 2009 Hang X, et al.</copyright-statement>
			<copyright-year>2009</copyright-year>
			<license license-type="open access">
			<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p>
			 </license>
			 </permissions>					
			<abstract>
				<p>The performance of most methods for cancer diagnosis using gene expression data depends greatly on careful model selection. Least square for classification needs no model selection. However, one major drawback prevents its successful application to microarray data classification: lack of robustness to outliers. In this paper we cast linear regression as a constrained l<sub>1</sub>-norm minimization problem to greatly alleviate its sensitivity to outliers, hence the name l<sub>1</sub> least square. Numerical experiments show that l<sub>1</sub> least square can match the best performance achieved by support vector machines (SVMs) with careful model selection.</p>
			</abstract>	
			<kwd-group>
				<kwd>l<sub>1</sub>-norm minimization</kwd>
				<kwd>least square regression</kwd>
				<kwd>classification</kwd>
				<kwd>cancer</kwd>	
				<kwd>gene expression data</kwd>		
				<kwd>support vector machine</kwd>					
			</kwd-group>	
			<custom-meta-wrap>
				<custom-meta>
					<meta-name>citation</meta-name>
					<meta-value>Hang X, Wu FX (2009) L<sub>1</sub> Least Square for Cancer Diagnosis using Gene Expression Data. J Comput Sci Syst Biol 2: 167-173. doi:<ext-link ext-link-type="doi" xlink:href="10.4172/jcsb.1000028">10.4172/jcsb.1000028</ext-link></meta-value>
				</custom-meta>
			</custom-meta-wrap>
		</article-meta>
	</front>
	<body>
		<sec>
			<title>Introduction</title>
				<p>With its high-throughput capability of simultaneously measuring the relative expression levels of tens of thousands of genes, DNA microarray technology has the potential to provide more accurate and objective cancer diagnosis than the traditional histopathological approach. Its success, however, depends greatly on the supervised learning algorithm chosen to classify the gene expression data.</p>
				<p>Many well-established methods are available for gene expression profile classification. According to <xref ref-type="bibr" rid="r15">Lee et al. (2005)</xref>, they fall into four categories: (1) classical methods, such as Fisher&rsquo;s linear discriminant analysis, logistic regression, K-nearest neighbor, and generalized partial least squares; (2) classification trees and aggregation methods, such as CART, random forest, bagging, and boosting; (3) machine learning methods, such as neural networks and support vector machines (SVMs); and (4) generalized methods, such as flexible discriminant analysis, mixture discriminant analysis, and the shrunken centroid method. The performance of many of these methods, however, relies on a careful choice of model parameters, which is made through a model selection procedure such as cross validation. For SVMs, for example, the model parameters include the kernel parameters and the penalty parameter C. A recent controversy over the relative performance of SVM and random forest illustrates the importance of model selection: <xref ref-type="bibr" rid="r8">Diaz-Uriarte and Alvarez de Andres (2006)</xref> concluded that random forest outperforms SVM, whereas <xref ref-type="bibr" rid="r21">Statnikov et al. (2008)</xref> reached the opposite conclusion. The main difference between the two studies is that model selection was carefully designed in the latter but not in the former. The episode also suggests that model selection may be an obstacle to the wider application of SVMs in gene expression profile classification. Because classification performance is generally a nonconvex function of the model parameters, finding optimal parameters by model selection is usually difficult.</p>
<p>Least square for classification, on the other hand, needs no model selection. Consider a general classification problem with N classes, and build a linear model for each class k:</p>

<disp-formula><label>(1)</label><tex-math>y_k = w_k^{T}x + w_{k0}, \quad k = 1, 2, \ldots, N</tex-math></disp-formula>
<p>The N equations can be grouped into</p>
<disp-formula><label>(2)</label><tex-math>y = W\tilde{x}</tex-math></disp-formula>
<p>where y = [y<sub>1</sub>, y<sub>2</sub>, ..., y<sub>N</sub>]<sup>T</sup>, W is the matrix whose kth row is [w<sup>T</sup><sub>k</sub>, w<sub>k0</sub>], and x&#732; = [x<sup>T</sup>, 1]<sup>T</sup>. Consider a training dataset {(x<sub>i</sub>, t<sub>i</sub>), i = 1, 2, ..., n}, where t<sub>i</sub> is the 1-of-N binary coding vector of the label of the ith sample x<sub>i</sub>, i.e., a vector containing zeros everywhere except for a 1 in the kth position if x<sub>i</sub> belongs to category k. Denote by X the feature matrix whose ith row is [x<sup>T</sup><sub>i</sub>, 1], and by T the target matrix whose ith row is t<sup>T</sup><sub>i</sub>. The linear regression model in (2) can be fitted simultaneously to each column of T, and the solution has the form</p>
<disp-formula><label>(3)</label><tex-math>\hat{W}^{T} = (X^{T}X)^{-1}X^{T}T</tex-math></disp-formula>
<p>A new sample x is then classified in two steps:</p>
<list list-type="bullet">
	<list-item>
		<p>calculate the fitted output y&#770; = W&#770;[x<sup>T</sup>, 1]<sup>T</sup> (an N-dimensional vector);</p>
	</list-item>
	<list-item>
		<p>assign the label argmax<sub>k</sub> y&#770;(k), k = 1, 2, ..., N.</p>
	</list-item>
</list>
<p>More details can be found in the literature (<xref ref-type="bibr" rid="r1">Bishop, 2006</xref>; <xref ref-type="bibr" rid="r13">Hastie et al., 2001</xref>).</p>
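<p>The least-square classifier in (2) and (3), together with the argmax rule, can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical, labels are assumed to be 0-based integers, and <monospace>np.linalg.lstsq</monospace> is used instead of forming (X<sup>T</sup>X)<sup>-1</sup> explicitly. It agrees with the closed form (X<sup>T</sup>X)<sup>-1</sup>X<sup>T</sup>T when X<sup>T</sup>X is invertible and returns the minimum-norm least-squares solution otherwise, which is the usual situation for microarray data.</p>
<preformat preformat-type="code">
```python
import numpy as np


def fit_least_squares(X, labels, n_classes):
    """Least-square classifier: regress 1-of-N targets T on rows [x_i^T, 1]."""
    n = X.shape[0]
    Xa = np.hstack([X, np.ones((n, 1))])        # feature matrix with bias column
    T = np.zeros((n, n_classes))
    T[np.arange(n), labels] = 1.0               # 1-of-N coding of the labels
    # lstsq equals (Xa^T Xa)^{-1} Xa^T T when Xa^T Xa is invertible and gives the
    # minimum-norm least-squares solution otherwise (e.g. n much smaller than d).
    B, *_ = np.linalg.lstsq(Xa, T, rcond=None)  # shape (d + 1, N)
    return B.T                                  # W: one row [w_k^T, w_k0] per class


def predict_least_squares(W, X):
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    y_hat = Xa @ W.T                            # fitted outputs, one column per class
    return np.argmax(y_hat, axis=1)             # label = argmax_k y_hat(k)


# Toy usage: three well-separated 2-D clusters (synthetic data, labels 0..2).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat([0, 1, 2], 20)
X = centers[labels] + rng.normal(scale=0.3, size=(60, 2))
W = fit_least_squares(X, labels, 3)
pred = predict_least_squares(W, X)
print("training accuracy:", np.mean(pred == labels))
```
</preformat>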
<p>The above approach, however, is very sensitive to outliers, especially for multicategory classification (N &ge; 3). Furthermore, when least square for classification is applied to gene expression data, these problems can become more severe because of the curse of dimensionality: each sample contains a very large number of genes.</p>
<p>Inspired by recent progress in sparse signal recovery via l<sub>1</sub>-norm minimization (<xref ref-type="bibr" rid="r2">Cand&egrave;s et al., 2006</xref>; <xref ref-type="bibr" rid="r3">Cand&egrave;s and Tao, 2006</xref>; <xref ref-type="bibr" rid="r9">Donoho, 2006</xref>), we propose a new approach that overcomes the major drawback of least square for classification by casting the linear regression problem as a constrained l<sub>1</sub>-norm minimization problem. The resulting sparse solution is much less sensitive to both outliers and the curse of dimensionality. In addition, multicategory classification is realized via the one-versus-rest (OVR) and one-versus-one (OVO) approaches, which decompose the original multicategory problem into a series of binary problems. The new method is validated by comparing its cancer diagnosis performance with that of SVMs.</p>
		</sec>	
		<sec sec-type="methods">
			<title>Methods</title>
			<sec>
				<title>Binary L<sub>1</sub> Least Square</title>
					<p>Consider a training dataset {(x<sub>i</sub>, y<sub>i</sub>); i = 1, ..., n}, x<sub>i</sub> &#8712; R<sup>d</sup>, y<sub>i</sub> &#8712; {-1, +1}, where x<sub>i</sub> represents the ith sample, a d-dimensional column vector of gene expression values with d the number of genes, and y<sub>i</sub> is the label of the ith sample. The two classes are described by the linear model</p>
					<disp-formula><label>(4)</label><tex-math>y = [x^{T}, 1]\,w</tex-math></disp-formula>
					<p>for any sample x. Applying the linear model to the training dataset, we have</p>
					<disp-formula><label>(5)</label><tex-math>y_i = [x_i^{T}, 1]\,w, \quad i = 1, 2, \ldots, n</tex-math></disp-formula>
					<p>The n equations can be grouped into</p>
					<disp-formula><label>(6)</label><tex-math>y = Xw</tex-math></disp-formula>
					<p>where y = [y<sub>1</sub>, y<sub>2</sub>, ..., y<sub>n</sub>]<sup>T</sup>, and X is an n &times; (d + 1) matrix whose ith row is [x<sup>T</sup><sub>i</sub>, 1]. Since the number of samples is much smaller than the number of genes, i.e., n &lt;&lt; d, the system in (6) is underdetermined and, when X has full row rank, has infinitely many solutions. A single solution is selected by casting the original problem as the following constrained l<sub>1</sub>-norm minimization problem</p>
					<disp-formula><label>(7)</label><tex-math>\min_{w} \|w\|_1 \quad \text{subject to} \quad Xw = y</tex-math></disp-formula>
					<p>The above formulation is inspired by recent progress in compressed sensing (<xref ref-type="bibr" rid="r2">Cand&egrave;s et al., 2006</xref>; <xref ref-type="bibr" rid="r3">Cand&egrave;s and Tao, 2006</xref>; <xref ref-type="bibr" rid="r9">Donoho, 2006</xref>) and basis pursuit (<xref ref-type="bibr" rid="r6">Chen et al., 2001</xref>).</p>
					<p>Several solvers are available for the optimization problem in (7), such as MOSEK (Andersen, 2002), PDCO-CHOL (<xref ref-type="bibr" rid="r19">Saunders, 2002</xref>), PDCO-LSQR (<xref ref-type="bibr" rid="r19">Saunders, 2002</xref>), and l<sub>1</sub>-magic (<xref ref-type="bibr" rid="r4">Cand&egrave;s and Romberg, 2006</xref>), all of which are interior-point methods. In this study we choose the solver SPGL1 (<xref ref-type="bibr" rid="r11">Friedlander and Van den Berg, 2008</xref>) for its efficiency on large-scale problems. Unlike the interior-point solvers, SPGL1 solves the optimization problem by converting it into a root-finding problem. See <xref ref-type="bibr" rid="r28">Van den Berg and Friedlander (2008)</xref> for details on the theory of SPGL1.</p>
					<p> Denote by w&#770; the solution to (7). Then for any sample x, the label can be simply assigned as <italic>sign</italic>([x<sup>T</sup> ,1]w&#770;).</p>
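					<p>The binary classifier can be sketched as follows. The paper solves (7) with SPGL1 in MATLAB; this Python sketch instead uses iteratively reweighted least squares (IRLS), a different and simpler technique that approximates the l<sub>1</sub> minimizer while keeping Xw = y satisfied at every step. The helper names, the smoothing schedule for eps, and the synthetic data are assumptions made for illustration.</p>
					<preformat preformat-type="code">
```python
import numpy as np


def l1_least_square(X, y, n_eps=9, n_inner=50):
    """Approximate solution of min ||w||_1 s.t. X w = y, problem (7).

    IRLS stand-in for SPGL1: each step solves
        min sum_j w_j^2 / q_j  s.t.  X w = y,   q_j = sqrt(w_j^2 + eps),
    whose closed form is w = Q X^T (X Q X^T)^{-1} y.  Shrinking eps drives the
    weighted objective towards ||w||_1.  Assumes X has full row rank.
    """
    w = np.linalg.pinv(X) @ y                  # minimum l2-norm feasible start
    eps = 1.0
    for _ in range(n_eps):
        for _ in range(n_inner):
            q = np.sqrt(w ** 2 + eps)          # diagonal of Q
            z = np.linalg.solve((X * q) @ X.T, y)
            w = q * (X.T @ z)                  # stays feasible: X w = y
        eps *= 0.1
    return w


def augment(G):
    """Append the constant 1 to every sample, giving rows [x^T, 1]."""
    return np.hstack([G, np.ones((G.shape[0], 1))])


def predict_binary(w, G):
    return np.sign(augment(G) @ w)             # label = sign([x^T, 1] w)


# Synthetic example: 30 samples, 200 "genes", labels in {-1, +1}.
rng = np.random.default_rng(1)
genes = rng.random((30, 200))
y = np.where(genes[:, 0] + genes[:, 1] > 1.0, 1.0, -1.0)
Xa = augment(genes)
w = l1_least_square(Xa, y)
print("l1 norm:", np.abs(w).sum(),
      "vs minimum-l2-norm solution:", np.abs(np.linalg.pinv(Xa) @ y).sum())
```
					</preformat>
					<p>Because the constraint Xw = y holds exactly, every training sample is classified correctly by construction; only held-out accuracy, such as cross-validated accuracy, says anything about generalization.</p>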
			</sec>
			<sec>
				<title>Multicategory L<sub>1</sub> Least Square: OVR</title>
					<p>Consider a multicategory training dataset {(x<sub>i</sub>, y<sub>i</sub>); i = 1, ..., n}, x<sub>i</sub> &#8712; R<sup>d</sup>, y<sub>i</sub> &#8712; {1, 2, ..., N}, where N is the number of categories. The OVR approach constructs, for each class, a binary classifier that separates it from the remaining classes. The N linear models are defined as</p>
					<disp-formula><label>(8)</label><tex-math>D_k(x) = [x^{T}, 1]\,w_k, \quad k = 1, 2, \ldots, N</tex-math></disp-formula>
					<p>For category k, we relabel the samples belonging to k as +1 and all other samples as -1, and apply the linear model to the training dataset</p>
					<disp-formula><label>(9)</label><tex-math>y_k = Xw_k, \quad k = 1, 2, \ldots, N</tex-math></disp-formula>
					<p>where y<sub>k</sub> is the resulting label vector, whose entries are +1 or -1. As in the binary case, the N underdetermined systems are solved through the following N constrained l<sub>1</sub>-norm minimization problems</p>
					<disp-formula><label>(10)</label><tex-math>\min_{w_k} \|w_k\|_1 \quad \text{subject to} \quad Xw_k = y_k, \quad k = 1, 2, \ldots, N</tex-math></disp-formula>
					<p>Denote by w&#770;<sub>k</sub> the solution to (10). Then for any sample x, the label is determined by</p>
					<disp-formula><label>(11)</label><tex-math>\arg\max_{k = 1, 2, \ldots, N} D_k(x), \qquad D_k(x) = [x^{T}, 1]\,\hat{w}_k</tex-math></disp-formula>
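					<p>A sketch of the OVR scheme in (8) to (11), again using a NumPy IRLS approximation as a stand-in for SPGL1 (the solver is repeated so that the block runs on its own). Class labels are assumed to be 0, 1, ..., N-1, and all function names and data are hypothetical.</p>
					<preformat preformat-type="code">
```python
import numpy as np


def l1_ls(X, y, n_eps=9, n_inner=50):
    """IRLS approximation of min ||w||_1 s.t. X w = y (stand-in for SPGL1)."""
    w, eps = np.linalg.pinv(X) @ y, 1.0
    for _ in range(n_eps):
        for _ in range(n_inner):
            q = np.sqrt(w ** 2 + eps)
            w = q * (X.T @ np.linalg.solve((X * q) @ X.T, y))
        eps *= 0.1
    return w


def fit_ovr(Xa, labels, n_classes):
    """Problem (10): one l1 least-square model per class, class k -> +1, rest -> -1."""
    return np.column_stack(
        [l1_ls(Xa, np.where(labels == k, 1.0, -1.0)) for k in range(n_classes)]
    )


def predict_ovr(W, Xa):
    """Rule (11): pick the class with the largest D_k(x) = [x^T, 1] w_k."""
    return np.argmax(Xa @ W, axis=1)


# Synthetic example: 45 samples of 150 "genes", 3 classes labelled 0, 1, 2.
rng = np.random.default_rng(2)
genes = rng.random((45, 150))
labels = np.repeat([0, 1, 2], 15)
Xa = np.hstack([genes, np.ones((45, 1))])      # rows [x_i^T, 1]
W = fit_ovr(Xa, labels, 3)
print(W.shape)                                 # (151, 3): one coefficient column per class
```
					</preformat>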
			</sec>
			<sec>
				<title>Multicategory L<sub>1</sub> Least Square: OVO</title>					
					<p>In the OVO approach, a binary classifier is constructed for each pair of classes. The linear model for class i against class j is given by</p>
					<disp-formula><label>(12)</label><tex-math>D_{i,j}(x) = [x^{T}, 1]\,w_{i,j}</tex-math></disp-formula>
					<p>Relabeling the samples of category i as +1 and those of category j as -1, and applying the linear model to these samples, gives</p>
					<disp-formula><label>(13)</label><tex-math>y_{i,j} = X_{i,j}\,w_{i,j}</tex-math></disp-formula>
					<p>where y<sub>i,j</sub> is a vector whose entries are +1 or -1, and X<sub>i,j</sub> is the matrix whose rows are [x<sup>T</sup><sub>k</sub>, 1] for the samples x<sub>k</sub> belonging to category i or j. This underdetermined system is solved by</p>
					<disp-formula><label>(14)</label><tex-math>\min_{w_{i,j}} \|w_{i,j}\|_1 \quad \text{subject to} \quad X_{i,j}\,w_{i,j} = y_{i,j}</tex-math></disp-formula>
					<p>Since D<sub>j,i</sub> = -D<sub>i,j</sub>, only the pairs with i &lt; j need to be trained, so the number of classifiers is the number of pairs of classes, N(N-1)/2, compared with N in the OVR approach.</p>
					<p>Denote by w&#770;<sub>i,j</sub> the solution to (14). For any sample x, we calculate</p>
					<disp-formula><label>(15)</label><tex-math>D_i(x) = \sum_{j = 1,\, j \neq i}^{N} \operatorname{sign}\big(D_{i,j}(x)\big)</tex-math></disp-formula>
					<p>with D<sub>i,j</sub>(x) = [x<sup>T</sup>, 1]w&#770;<sub>i,j</sub> for i &lt; j and D<sub>j,i</sub>(x) = -D<sub>i,j</sub>(x). The label of x is determined by</p>
					<disp-formula><label>(16)</label><tex-math>\arg\max_{i = 1, 2, \ldots, N} D_i(x)</tex-math></disp-formula>
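					<p>The OVO training and voting steps in (12) to (16) can be sketched the same way. As before, IRLS stands in for SPGL1, labels are assumed to be 0, ..., N-1, and the function names and data are hypothetical. Only the N(N-1)/2 pairs with i smaller than j are trained, and D<sub>j,i</sub> = -D<sub>i,j</sub> is used when voting.</p>
					<preformat preformat-type="code">
```python
import numpy as np


def l1_ls(X, y, n_eps=9, n_inner=50):
    """IRLS approximation of min ||w||_1 s.t. X w = y (stand-in for SPGL1)."""
    w, eps = np.linalg.pinv(X) @ y, 1.0
    for _ in range(n_eps):
        for _ in range(n_inner):
            q = np.sqrt(w ** 2 + eps)
            w = q * (X.T @ np.linalg.solve((X * q) @ X.T, y))
        eps *= 0.1
    return w


def fit_ovo(Xa, labels, n_classes):
    """Problem (14) for every pair i, j with i smaller than j: class i -> +1, class j -> -1."""
    models = {}
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            mask = (labels == i) | (labels == j)      # rows of X_{i,j}
            y_ij = np.where(labels[mask] == i, 1.0, -1.0)
            models[(i, j)] = l1_ls(Xa[mask], y_ij)
    return models


def predict_ovo(models, Xa, n_classes):
    """Voting rule (15)-(16), using D_{j,i}(x) = -D_{i,j}(x)."""
    votes = np.zeros((Xa.shape[0], n_classes))
    for (i, j), w in models.items():
        s = np.sign(Xa @ w)                           # sign(D_{i,j}(x))
        votes[:, i] += s
        votes[:, j] -= s                              # sign(D_{j,i}(x)) = -s
    return np.argmax(votes, axis=1)


# Synthetic example: 40 samples of 150 "genes", 4 classes labelled 0..3.
rng = np.random.default_rng(3)
genes = rng.random((40, 150))
labels = np.repeat([0, 1, 2, 3], 10)
Xa = np.hstack([genes, np.ones((40, 1))])
models = fit_ovo(Xa, labels, 4)
print(len(models))                                    # 6 = N(N-1)/2 classifiers
```
					</preformat>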
			</sec>
			<sec>
				<title>Numerical Experiment</title>
					<p>The numerical experiment is designed to validate the cancer diagnosis performance of l<sub>1</sub> least square on gene expression data. The performance metric is the classification accuracy obtained by 10-fold stratified cross validation. The new method is implemented in MATLAB R14. The results are compared with binary SVM (<xref ref-type="bibr" rid="r29">Vapnik, 1998</xref>) and several popular multicategory SVMs: OVR-SVM (<xref ref-type="bibr" rid="r14">Kressel, 1999</xref>), OVO-SVM (<xref ref-type="bibr" rid="r14">Kressel, 1999</xref>), DAGSVM (<xref ref-type="bibr" rid="r17">Platt et al., 2000</xref>), the method of Weston and Watkins (WW) (<xref ref-type="bibr" rid="r30">Weston and Watkins, 1999</xref>), and the method of <xref ref-type="bibr" rid="r7">Crammer and Singer (CS) (2000)</xref>.</p>
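					<p>The evaluation protocol, 10-fold stratified cross validation, can be sketched as below. The paper does not state how samples were assigned to folds, so the round-robin assignment within each shuffled class, the pooled accuracy, and the hypothetical <monospace>fit</monospace>/<monospace>predict</monospace> callables are assumptions; any classifier can be plugged in through the two callables.</p>
					<preformat preformat-type="code">
```python
import numpy as np


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample to a fold so that class proportions are preserved.

    Samples of each class are shuffled and dealt round-robin over the folds;
    the round-robin continues across classes so fold sizes also stay balanced.
    """
    rng = np.random.default_rng(seed)
    fold = np.empty(len(labels), dtype=int)
    offset = 0
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        fold[idx] = (offset + np.arange(len(idx))) % n_folds
        offset += len(idx)
    return fold


def cv_accuracy(X, labels, fit, predict, n_folds=10, seed=0):
    """Pooled accuracy: correct predictions over all held-out samples."""
    fold = stratified_folds(labels, n_folds, seed)
    correct = 0
    for f in range(n_folds):
        test = fold == f
        model = fit(X[~test], labels[~test])
        correct += int(np.sum(predict(model, X[test]) == labels[test]))
    return correct / len(labels)


def majority_fit(X_train, y_train):
    return np.bincount(y_train).argmax()        # most frequent training label


def majority_predict(model, X_test):
    return np.full(len(X_test), model)


# Hypothetical imbalanced data: 77 samples, 58 of class 0 and 19 of class 1.
labels = np.array([0] * 58 + [1] * 19)
X = np.zeros((77, 5))
print(cv_accuracy(X, labels, majority_fit, majority_predict))   # 58/77, about 0.753
```
					</preformat>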
				<p>The SVM results are obtained with GEMS (Gene Expression Model Selector), a software package with a graphical user interface for classifying gene expression data. It is freely available at <ext-link ext-link-type="uri" xlink:href="www.gems-system.org">http://www.gems-system.org/</ext-link>. <xref ref-type="bibr" rid="r23">Statnikov et al. (2005)</xref> used GEMS in their comprehensive study of the performance of multiple classifiers for cancer diagnosis from gene expression data. For model selection, polynomial kernels of order p &#8712; {1, 2, 3} and penalty parameters C &#8712; {10<sup>-3+0.5n</sup>, n = 0, 1, &hellip;, 6}, i.e., seven values from 10<sup>-3</sup> to 1, are searched.</p>
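				<p>The size of the SVM parameter grid follows from simple arithmetic: three kernel orders times seven values of C gives 21 candidate settings, with C running from 10<sup>-3</sup> to 10<sup>0</sup> = 1 in steps of half a decade. The short snippet below only enumerates this grid; it makes no assumption about how GEMS evaluates each setting.</p>
				<preformat preformat-type="code">
```python
import itertools

# Model-selection grid for the polynomial-kernel SVMs described above:
# kernel order p in {1, 2, 3} and penalty C = 10^(-3 + 0.5 n), n = 0, ..., 6.
orders = [1, 2, 3]
C_values = [10.0 ** (-3 + 0.5 * n) for n in range(7)]
grid = list(itertools.product(orders, C_values))

print(len(grid))                        # 21 (p, C) settings
print(min(C_values), max(C_values))     # smallest C is about 0.001, largest is 1.0
```
				</preformat>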
<p>Six of the eleven datasets used by <xref ref-type="bibr" rid="r23">Statnikov et al. (2005)</xref> are used in the experiment. They are available on the GEMS website in both GEMS and MATLAB (.mat) formats. For easy comparison and reference, we adopt the dataset names used by <xref ref-type="bibr" rid="r23">Statnikov et al. (2005)</xref>. The six datasets are summarized below.</p>
					<list id="L1" list-type="bullet">
						<list-item>
							<p>DLBCL (<xref ref-type="bibr" rid="r20">Shipp et al., 2002</xref>): This binary dataset comes from a gene expression study of two lymphomas: diffuse large B-cell lymphomas and follicular lymphomas. There are 77 samples, each containing 5469 genes.</p>
						</list-item>
						<list-item>
							<p>Prostate_Tumor (<xref ref-type="bibr" rid="r24">Singh et al., 2002</xref>): This binary dataset contains gene expression data of prostate tumor and normal tissues. There are 102 samples, each containing 10509 genes.</p>
						</list-item>
						<list-item>
							<p>9_Tumors (<xref ref-type="bibr" rid="r22">Staunton et al., 2001</xref>): The dataset comes from a study of 9 human tumor types: NSCLC, colon, breast, ovary, leukaemia, renal, melanoma, prostate, and CNS. There are 60 samples, each containing 5726 genes.</p>
						</list-item>
						<list-item>
							<p>11_Tumors (<xref ref-type="bibr" rid="r25">Su et al., 2001</xref>): The dataset includes 174 samples of gene expression data from 11 human tumor types: ovary, bladder/ureter, breast, colorectal, gastroesophagus, kidney, liver, prostate, pancreas, lung adeno, and lung squamous. The number of genes is 12533.</p> 
						</list-item>
						<list-item>
							<p>Brain_Tumor1 (<xref ref-type="bibr" rid="r18">Pomeroy et al., 2002</xref>): The dataset comes from a study of 5 human brain tumor types: medulloblastoma, malignant glioma, AT/RT, normal cerebellum, and PNET, and includes 90 samples. Each sample has 5920 genes.</p>
						</list-item>
						<list-item>
							<p>Brain_Tumor2 (<xref ref-type="bibr" rid="r16">Nutt et al., 2003</xref>): There are 4 types of malignant glioma in this dataset: classic glioblastomas, classic anaplastic oligodendrogliomas, non-classic glioblastomas, and non-classic anaplastic oligodendrogliomas. The dataset has 50 samples, and the number of genes is 10367.</p>							
						</list-item>
					</list>
					<p>All the datasets are normalized by rescaling the gene expression values to be between 0 and 1.</p>
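					<p>A minimal sketch of the rescaling step, assuming it is applied per gene (column). The text does not say whether the minimum and maximum are taken per gene or over the whole dataset, nor whether they are computed before or inside cross validation; the function name is hypothetical.</p>
					<preformat preformat-type="code">
```python
import numpy as np


def rescale_per_gene(X):
    """Min-max rescale each gene (column) of X to the interval [0, 1].

    Genes with constant expression are mapped to 0 to avoid dividing by zero.
    """
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    return (X - lo) / span


X = np.array([[2.0, 10.0, 5.0],
              [4.0, 30.0, 5.0],
              [3.0, 20.0, 5.0]])
R = rescale_per_gene(X)
# columns become [0, 1, 0.5], [0, 1, 0.5] and [0, 0, 0]
print(R)
```
					</preformat>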
					<p>Two methods are used in this experiment to study the impact of gene selection on classification performance: the Kruskal-Wallis non-parametric one-way ANOVA (KW) (<xref ref-type="bibr" rid="r12">Gibbons, 2003</xref>), and the ratio of between-class to within-class sums of squares (BW) (<xref ref-type="bibr" rid="r10">Dudoit et al., 2002</xref>).</p>
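					<p>The two gene-ranking criteria can be sketched in NumPy. For each gene, BW divides the between-class sum of squares (class size times the squared deviation of the class mean from the overall mean, summed over classes) by the within-class sum of squares (squared deviations of the samples from their class means). KW is the Kruskal-Wallis statistic H = 12/(n(n + 1)) &sum;<sub>k</sub> R<sub>k</sub><sup>2</sup>/n<sub>k</sub> - 3(n + 1), where R<sub>k</sub> is the rank sum and n<sub>k</sub> the size of class k. The sketch omits the tie correction for H and ranks genes by decreasing score; the function names and synthetic data are hypothetical.</p>
					<preformat preformat-type="code">
```python
import numpy as np


def bw_ratio(X, labels):
    """Between-class / within-class sum of squares for every gene (column)."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        between += len(Xk) * (mk - overall) ** 2
        within += ((Xk - mk) ** 2).sum(axis=0)
    return between / within


def kruskal_wallis_h(X, labels):
    """Kruskal-Wallis H statistic per gene, without the correction for ties."""
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0) + 1.0   # ranks 1..n within each gene
    total = np.zeros(X.shape[1])
    for k in np.unique(labels):
        in_k = labels == k
        total += ranks[in_k].sum(axis=0) ** 2 / in_k.sum()
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def top_genes(scores, M):
    """Indices of the M genes with the largest scores."""
    return np.argsort(-scores)[:M]


# Synthetic check: gene 0 separates the two classes perfectly, the rest are noise.
rng = np.random.default_rng(4)
labels = np.repeat([0, 1], 20)
X = rng.random((40, 50))
X[:, 0] = np.r_[0.4 * rng.random(20), 0.6 + 0.4 * rng.random(20)]
print(top_genes(bw_ratio(X, labels), 3), top_genes(kruskal_wallis_h(X, labels), 3))
```
					</preformat>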
			</sec>
		</sec>
		<sec>
			<title>Results</title>
				<sec>
					<title>Classification without Gene Selection</title>
						<p><xref ref-type="table" rid="t1">Table 1</xref> shows the classification accuracies obtained by 10-fold stratified cross validation for both l<sub>1</sub> least square and SVMs. The SVM results differ slightly from those reported by <xref ref-type="bibr" rid="r23">Statnikov et al. (2005)</xref>, where these datasets are also used. A possible explanation is that the samples are distributed among the cross-validation folds differently in our study than in <xref ref-type="bibr" rid="r23">Statnikov et al. (2005)</xref>.</p>

						<p>For the binary datasets Prostate_Tumor and DLBCL, the performance of l<sub>1</sub> least square is slightly below that of SVMs. Note that the SVM results are obtained by careful model selection using cross validation, whereas our method needs no model selection and is fully automatic. In addition, as with SVMs, when applied to binary datasets the multicategory l<sub>1</sub> least square classifiers reduce to the binary classifier for both the OVO and OVR approaches.</p>
<p>When applied to multicategory datasets, OVR-l<sub>1</sub> least square can closely match the best performance achieved by SVMs. For both SVM and l<sub>1</sub> least square, the OVO approach performs much worse than the OVR approach on the 9_Tumors dataset.</p>
				</sec>
				<sec>
					<title>Classification with Gene Selection</title>
						<p><xref ref-type="table" rid="t2">Table 2</xref> shows the best performance achieved by OVR-l<sub>1</sub> least square and SVMs when the gene selection methods KW and BW are used. Both l<sub>1</sub> least square and SVMs perform slightly better than without gene selection (<xref ref-type="table" rid="t1">Table 1</xref>): the improvement ranges from 0 to 9% for SVMs and only from 0 to 3.48% for OVR-l<sub>1</sub> least square. Again, the performance of OVR-l<sub>1</sub> least square is comparable to that of SVMs.</p>
				</sec>
		</sec>
		<sec>
			<title>Discussion</title>
			<p>The success of l<sub>1</sub> least square may lie in the sparse model coefficient vector obtained from l<sub>1</sub>-norm minimization. <xref ref-type="fig" rid="g1">Figure 1</xref> shows the coefficient vector w obtained by l<sub>1</sub> least square for the binary dataset DLBCL. Its sparsity suggests that the genes with larger absolute coefficients may play more important roles in classification. As a result, the classification does not depend equally on all genes, and depends little on those with very small absolute coefficients. Sparsity has the potential to greatly alleviate the curse of dimensionality and to increase robustness to outliers.</p>
			<p>Another implication of sparsity is that the genes with larger absolute coefficients may correspond to biological markers. Hence, sparsity could also be used for gene selection. We conducted a small experiment to test this possibility. The binary dataset DLBCL is used to fit the l<sub>1</sub> least square model, and gene selection is done by choosing the M genes with the M largest absolute coefficients. Binary SVM is then used to classify the gene-selected data, and the results are compared with those obtained using the KW and BW methods for gene selection. <xref ref-type="fig" rid="g2">Figure 2</xref> shows the performance of the three gene selection methods for M = 10, 20, 30, 40, and 50. The new method clearly outperforms both the KW and BW methods when a small number of genes is selected.</p>
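			<p>The gene selection rule used in this experiment, keeping the M genes with the largest absolute coefficients, is a one-line operation once w is available. In this sketch the last entry of w is assumed to be the intercept (the coefficient of the constant 1 in [x<sup>T</sup>, 1]) and is excluded; the coefficient vector is a made-up example and the function names are hypothetical.</p>
			<preformat preformat-type="code">
```python
import numpy as np


def select_genes_by_coefficient(w, M):
    """Return the indices of the M genes with the largest |w_j|.

    The last entry of w is taken to be the intercept (coefficient of the
    constant 1 in [x^T, 1]) and is excluded because it is not a gene.
    """
    return np.argsort(-np.abs(w[:-1]), kind="stable")[:M]


def n_active_genes(w, tol=1e-6):
    """Number of gene coefficients whose magnitude exceeds tol * max |w_j|."""
    g = np.abs(w[:-1])
    return int(np.sum(g > tol * g.max()))


# Hypothetical coefficient vector for 5 genes plus an intercept.
w = np.array([0.0, -3.0, 0.5, 2.0, 0.0, 7.0])
print(select_genes_by_coefficient(w, 2))   # [1 3]
print(n_active_genes(w))                   # 3
```
			</preformat>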
				<fig id="g1">
					<label>Figure 1</label>
					<caption>
						<title>The sparse coefficient vector w obtained by l<sub>1</sub> least square for the binary dataset DLBCL.</title>
					</caption>
					<graphic xlink:href="JCSB-02-167-g001.tif"/>
				</fig>
				<fig id="g2">
					<label>Figure 2</label>
					<caption>
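						<title>Classification performance of binary SVM on the DLBCL dataset with genes selected by l<sub>1</sub> least square, KW, and BW, for M = 10, 20, 30, 40, and 50 selected genes.</title>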
					</caption>
					<graphic xlink:href="JCSB-02-167-g002.tif"/>
				</fig>
				<p>The above gene selection approach is similar in spirit to the lasso (<xref ref-type="bibr" rid="r26">Tibshirani, 1996</xref>), which is formulated as</p>
				<disp-formula><label>(17)</label><tex-math>\min_{w} \|Xw - y\|_2^2 \quad \text{subject to} \quad \|w\|_1 \le t</tex-math></disp-formula>
				<p>where X, w, and y are defined as for binary l<sub>1</sub> least square in (4) to (6), and t is the model parameter of the lasso. The lasso can also be used for classification by replacing (7) with (17) in the binary case, and (10) with</p> 
				<disp-formula><label>(18)</label><tex-math>\min_{w_k} \|Xw_k - y_k\|_2^2 \quad \text{subject to} \quad \|w_k\|_1 \le t_k</tex-math></disp-formula>
				<p>for the multicategory OVR approach, and (14) with</p>
				<disp-formula><label>(19)</label><tex-math>\min_{w_{i,j}} \|X_{i,j}w_{i,j} - y_{i,j}\|_2^2 \quad \text{subject to} \quad \|w_{i,j}\|_1 \le t_{i,j}</tex-math></disp-formula>
				<p>for the multicategory OVO approach.</p>
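				<p>For comparison, the constrained lasso in (17) can be solved by projected gradient descent, alternating a gradient step on ||Xw - y||<sub>2</sub><sup>2</sup> with a Euclidean projection onto the l<sub>1</sub> ball of radius t (computed by the standard sort-and-threshold method). This is an illustrative NumPy sketch on synthetic data, not a method from the paper; the function names, the step size 1/L, and the iteration count are assumptions. Unlike (7), the result depends on t, which would have to be tuned.</p>
				<preformat preformat-type="code">
```python
import numpy as np


def project_l1_ball(v, t):
    """Euclidean projection of v onto {w : ||w||_1 at most t}, for t > 0."""
    if t >= np.abs(v).sum():
        return v.copy()                        # already inside the ball
    u = np.sort(np.abs(v))[::-1]               # magnitudes, largest first
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > css - t)[0][-1]  # last index meeting the condition
    theta = (css[rho] - t) / (rho + 1.0)       # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def lasso_constrained(X, y, t, n_iter=3000):
    """Projected gradient for (17): min ||X w - y||_2^2 s.t. ||w||_1 at most t."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)
        w = project_l1_ball(w - grad / L, t)
    return w


rng = np.random.default_rng(5)
X = np.hstack([rng.random((30, 100)), np.ones((30, 1))])   # rows [x_i^T, 1]
y = np.where(rng.random(30) > 0.5, 1.0, -1.0)
for t in (0.5, 2.0, 8.0):
    w = lasso_constrained(X, y, t)
    print(t, round(np.abs(w).sum(), 3), round(float(np.sum((X @ w - y) ** 2)), 3))
```
				</preformat>
				<p>Applying the same function to (X, y<sub>k</sub>) or (X<sub>i,j</sub>, y<sub>i,j</sub>) gives the multicategory variants (18) and (19).</p>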
				<p>Similarly, l<sub>1</sub> least square regression can be replaced with the Dantzig selector (<xref ref-type="bibr" rid="r5">Cand&egrave;s and Tao, 2007</xref>), which for binary classification reads</p>
				<disp-formula><label>(20)</label><tex-math>\min_{w} \|w\|_1 \quad \text{subject to} \quad \|X^{T}(y - Xw)\|_\infty \le (1 + t^{-1})\sqrt{2\log d}\,\sigma</tex-math></disp-formula>
				<p>where t is a model parameter and &sigma; is the noise standard deviation. The Dantzig selector for multicategory classification can be defined similarly.</p>
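				<p>Problem (20) is a linear program, so a generic LP solver can be used. With the standard split w = u - v, where u and v are nonnegative, and writing &#955; = (1 + t<sup>-1</sup>)&radic;(2 log d)&sigma;, it can be restated as follows (1 denotes the all-ones vector). At the optimum u and v have disjoint supports, so 1<sup>T</sup>(u + v) = ||w||<sub>1</sub>; the same split turns (7) into an LP with the equality constraint X(u - v) = y.</p>
				<disp-formula><tex-math>\min_{u,\,v} \; \mathbf{1}^{T}(u + v) \quad \text{subject to} \quad X^{T}X(u - v) - X^{T}y \le \lambda\mathbf{1}, \quad X^{T}y - X^{T}X(u - v) \le \lambda\mathbf{1}, \quad u \ge 0, \; v \ge 0</tex-math></disp-formula>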
				<p>Both the lasso and the Dantzig selector, however, still require their model parameters to be chosen by a model selection procedure such as cross validation.</p>
		</sec>
		<sec>
			<title>Conclusion</title>
					<p>In this paper, we have described a specialized regression method for cancer diagnosis using gene expression data. The new approach, called l<sub>1</sub> least square, casts linear regression as a constrained l<sub>1</sub>-norm minimization problem to overcome the major drawback of least square for classification: lack of robustness to outliers. In addition to the binary classifier, multicategory l<sub>1</sub> least square classifiers based on the OVO and OVR approaches are proposed.</p>
			<p>The numerical experiment shows that OVR-l<sub>1</sub> least square can match the best performance achieved by SVMs with careful model selection. The main advantage of l<sub>1</sub> least square over other methods, including SVMs, is that it needs no model selection, so a classifier based on l<sub>1</sub> least square is fully automatic. l<sub>1</sub> least square also has the potential to be used for gene selection.</p>
<p>The l<sub>1</sub> least square classifier may become a promising tool for automatic cancer diagnosis by consistently distinguishing gene expression profile classes. Genes with large absolute regression coefficients may serve as candidate biomarkers for further investigation.</p>			
			</sec>
	</body>
	<back>	
<ref-list>
			<title>References</title>
				<ref id="r1">
				<citation citation-type="book">
							<person-group>
							<name>
								<surname>Bishop</surname>
								<given-names>CM</given-names>
							</name>														
							</person-group>
							<year>2006</year>
							<source>Pattern Recognition and Machine Learning</source>							
							<publisher-name>Springer</publisher-name>
							<publisher-loc>New York</publisher-loc> 						
			</citation>
			</ref>
			<ref id="r2">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Cand&egrave;s</surname>
							<given-names>E</given-names>
						</name>
						<name>
							<surname>Romberg</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Tao</surname>
							<given-names>T</given-names>
						</name>
						</person-group>
						<year>2006</year>
						<article-title>Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information</article-title>
						<source>IEEE Trans. on Information Theory</source>
						<volume>52</volume>
						<fpage>489</fpage>	
						<lpage>509</lpage>					
			</citation>
			</ref>
			<ref id="r3">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Cand&egrave;s</surname>
							<given-names>E</given-names>
						</name>
						<name>
							<surname>Tao</surname>
							<given-names>T</given-names>
						</name>
						</person-group>
						<year>2006</year>
						<article-title>Near optimal signal recovery from random projections: Universal encoding strategies?</article-title>
						<source>IEEE Trans. on Information Theory</source>
						<volume>52</volume>
						<fpage>5406</fpage>
						<lpage>5425</lpage>
			</citation>
			</ref>
			<ref id="r4">
				<citation citation-type="web">
						<person-group>
						<name>
							<surname>Cand&egrave;s</surname>
							<given-names>E</given-names>
						</name>
						<name>
							<surname>Romberg</surname>
							<given-names>J</given-names>
						</name>
						</person-group>
						<year>2006</year>
						<article-title>l1-magic: A Collection of MATLAB Routines for Solving the Convex Optimization Programs Central to Compressive Sampling</article-title>
 <comment>[Online] Available: 
    <ext-link ext-link-type="uri" xlink:href="www.acm.caltech.edu/l1magic/">www.acm.caltech.edu/l1magic/</ext-link></comment>								
			</citation>
			</ref>
			<ref id="r5">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Cand&egrave;s</surname>
							<given-names>E</given-names>
						</name>
						<name>
							<surname>Tao</surname>
							<given-names>T</given-names>
						</name>
						</person-group>
						<year>2007</year>
						<article-title>The Dantzig selector: Statistical estimation when p is much larger than n</article-title>
						<source>Ann Statist</source>
						<volume>35</volume>
						<fpage>2313</fpage>
						<lpage>2351</lpage>
			</citation>
			</ref>
			<ref id="r6">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Chen</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Donoho</surname>
							<given-names>D</given-names>
						</name>
						<name>
							<surname>Saunders</surname>
							<given-names>M</given-names>
						</name>
						</person-group>
						<year>2001</year>
						<article-title>Atomic decomposition by basis pursuit</article-title>
						<source>SIAM Rev</source>
						<volume>43</volume>
						<fpage>129</fpage>
						<lpage>159</lpage>
			</citation>
			</ref>
			<ref id="r7">
				<citation citation-type="confproc">
						<person-group>
						<name>
							<surname>Crammer</surname>
							<given-names>K</given-names>
						</name>
						<name>
							<surname>Singer</surname>
							<given-names>Y</given-names>
						</name>
						</person-group>
						<year>2000</year>
						<article-title>On the learnability and design of output codes for multiclass problems</article-title>
						<conf-name>Proceedings of the Thirteenth Annual Conference on Computational Learning Theory</conf-name>
 						<publisher-name>Stanford University, Palo Alto, CA</publisher-name>  				
			</citation>
			</ref>
			<ref id="r8">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Diaz-Uriarte</surname>
							<given-names>R</given-names>
						</name>
						<name>
							<surname>Alvarez de Andres</surname>
							<given-names>S</given-names>
						</name>
						</person-group>
						<year>2006</year>
						<article-title>Gene selection and classification of microarray data using random forest</article-title>
						<source>BMC Bioinformatics</source>
						<volume>7</volume>
						<fpage>3</fpage>
			</citation>
			</ref>
			<ref id="r9">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Donoho</surname>
							<given-names>D</given-names>
						</name>			
						</person-group>
						<year>2006</year>
						<article-title>Compressed sensing</article-title>
						<source>IEEE Trans on Information Theory</source>
						<volume>52</volume>
						<fpage>1289</fpage>
						<lpage>1306</lpage>						
			</citation>
			</ref>
			<ref id="r10">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Dudoit</surname>
							<given-names>S</given-names>
						</name>
						<name>
							<surname>Fridlyand</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Speed</surname>
							<given-names>TP</given-names>
						</name>
						</person-group>
						<year>2002</year>
						<article-title>Comparison of discrimination methods for the classification of tumors using gene expression data</article-title>
						<source>J Am Stat Assoc</source>
						<volume>97</volume>
						<fpage>77</fpage>
						<lpage>87</lpage>
			</citation>
			</ref>
			<ref id="r11">
				<citation citation-type="web">
						<person-group>
						<name>
							<surname>Friedlander</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Van den Berg</surname>
							<given-names>E</given-names>
						</name>
						</person-group>
						<year>2008</year>
						<article-title>SPGL1: A solver for large-scale sparse reconstruction</article-title>
						  <comment>[Online] Available: 
    <ext-link ext-link-type="uri" xlink:href="http://www.cs.ubc.ca/labs/scl/spgl1/">http://www.cs.ubc.ca/labs/scl/spgl1/</ext-link>
	</comment>
			</citation>
			</ref>
			<ref id="r12">
				<citation citation-type="book">
						<person-group>
						<name>
							<surname>Gibbons</surname>
							<given-names>JD</given-names>
						</name>
						</person-group>
						<year>2003</year>
						<source>Nonparametric Statistical Inference</source>
						<edition>4th edition</edition>
						<publisher-name>CRC</publisher-name>
			</citation>
			</ref>
			<ref id="r13">
				<citation citation-type="book">
						<person-group>
						<name>
							<surname>Hastie</surname>
							<given-names>T</given-names>
						</name>
						<name>
							<surname>Tibshirani</surname>
							<given-names>R</given-names>
						</name>
						<name>
							<surname>Friedman</surname>
							<given-names>J</given-names>
						</name>
						</person-group>
						<year>2001</year>
						<source>The Elements of Statistical Learning</source>
						<publisher-loc>New York</publisher-loc>
						<publisher-name>Springer</publisher-name>
			</citation>				
			</ref>
			<ref id="r14">
				<citation citation-type="book">
						<person-group>
						<name>
							<surname>Kressel</surname>
							<given-names>U</given-names>
						</name>
						</person-group>
						<year>1999</year>
						<article-title>Pairwise classification and support vector machines</article-title>						
						<source>Advances in Kernel Methods: Support Vector Learning (Chapter 15)</source>
						<publisher-loc>Cambridge, MA</publisher-loc>
						<publisher-name>MIT Press</publisher-name>
			</citation>
			</ref>
			<ref id="r15">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Lee</surname>
							<given-names>JW</given-names>
						</name>
						<name>
							<surname>Lee</surname>
							<given-names>JB</given-names>
						</name>
						<name>
							<surname>Park</surname>
							<given-names>M</given-names>
						</name>		
						<name>
							<surname>Song</surname>
							<given-names>SH</given-names>
						</name>
						</person-group>
						<year>2005</year>
						<article-title>An extensive comparison of recent classification tools applied to microarray data</article-title>
						<source>Computational Statistics &amp; Data Analysis</source>
						<volume>48</volume>
						<fpage>869</fpage>
						<lpage>885</lpage>
			</citation>
			</ref>
			<ref id="r16">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Nutt</surname>
							<given-names>CL</given-names>
						</name>
						<name>
							<surname>Mani</surname>
							<given-names>DR</given-names>
						</name>		
						<name>
							<surname>Betensky</surname>
							<given-names>RA</given-names>
						</name>
						<name>
							<surname>Tamayo</surname>
							<given-names>P</given-names>
						</name>		
						<name>
							<surname>Cairncross</surname>
							<given-names>JG</given-names>
						</name>	<etal/>	
						</person-group>
						<year>2003</year>
						<article-title>Gene expression-based classification of malignant gliomas correlates better with survival than histological classification</article-title>
						<source>Cancer Res</source>
						<volume>63</volume>
						<fpage>1602</fpage>
						<lpage>1607</lpage>
			</citation>
			</ref>
			<ref id="r17">
				<citation citation-type="book">
						<person-group>
						<name>
							<surname>Platt</surname>
							<given-names>JC</given-names>
						</name>
						<name>
							<surname>Cristianini</surname>
							<given-names>N</given-names>
						</name>		
						<name>
							<surname>Shawe-Taylor</surname>
							<given-names>J</given-names>
						</name>				
						</person-group>
						<year>2000</year>
						<article-title>Large margin DAGs for multiclass classification</article-title>
						<source>Advances in Neural Information Processing Systems 12</source>
						<publisher-name>MIT Press</publisher-name>
			</citation>
			</ref>
			<ref id="r18">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Pomeroy</surname>
							<given-names>SL</given-names>
						</name>
						<name>
							<surname>Tamayo</surname>
							<given-names>P</given-names>
						</name>		
						<name>
							<surname>Gaasenbeek</surname>
							<given-names>M</given-names>
						</name>
						<name>
							<surname>Sturla</surname>
							<given-names>LM</given-names>
						</name>		
						<name>
							<surname>Angelo</surname>
							<given-names>M</given-names>
						</name>	<etal/>	
						</person-group>
						<year>2002</year>
						<article-title>Prediction of central nervous system embryonal tumour outcome based on gene expression</article-title>
						<source>Nature</source>
						<volume>415</volume>
						<fpage>436</fpage>
						<lpage>442</lpage>						
			</citation>
			</ref>
			<ref id="r19">
				<citation citation-type="web">
						<person-group>
						<name>
							<surname>Saunders</surname>
							<given-names>M</given-names>
						</name>					
						</person-group>
						<year>2002</year>
						<article-title>PDCO: Primal-Dual Interior Method for Convex Objectives</article-title>
						<comment>[Online] Available: 
    <ext-link ext-link-type="uri" xlink:href="http://www.stanford.edu/group/SOL/software/pdco.html">http://www.stanford.edu/group/SOL/software/pdco.html</ext-link>
	</comment>
			</citation>
			</ref>	
			<ref id="r20">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Shipp</surname>
							<given-names>MA</given-names>
						</name>
						<name>
							<surname>Ross</surname>
							<given-names>KN</given-names>
						</name>		
						<name>
							<surname>Tamayo</surname>
							<given-names>P</given-names>
						</name>
						<name>
							<surname>Weng</surname>
							<given-names>AP</given-names>
						</name>		
						<name>
							<surname>Kutok</surname>
							<given-names>JL</given-names>
						</name>	<etal/>	
						</person-group>
						<year>2002</year>
						<article-title>Diffuse large B-cell lymphoma outcome prediction by gene expression profiling and supervised machine learning</article-title>
						<source>Nat Med</source>
						<volume>8</volume>
						<fpage>68</fpage>
						<lpage>74</lpage>
			</citation>
			</ref>	
			<ref id="r21">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Statnikov</surname>
							<given-names>A</given-names>
						</name>
						<name>
							<surname>Wang</surname>
							<given-names>L</given-names>
						</name>		
						<name>
							<surname>Aliferis</surname>
							<given-names>CF</given-names>
						</name>						
						</person-group>
						<year>2008</year>
						<article-title>A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification</article-title>
						<source>BMC Bioinformatics</source>
						<volume>9</volume>
						<fpage>319</fpage>
			</citation>
			</ref>	
			<ref id="r22">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Staunton</surname>
							<given-names>JE</given-names>
						</name>
						<name>
							<surname>Slonim</surname>
							<given-names>DK</given-names>
						</name>	
						<name>
							<surname>Coller</surname>
							<given-names>HA</given-names>
						</name>
						<name>
							<surname>Tamayo</surname>
							<given-names>P</given-names>
						</name>	
						<name>
							<surname>Angelo</surname>
							<given-names>MJ</given-names>
						</name>		<etal/>				
						</person-group>
						<year>2001</year>
						<article-title>Chemosensitivity prediction by transcriptional profiling</article-title>
						<source>Proc Natl Acad Sci USA</source>	
						<volume>98</volume>
						<fpage>10787</fpage>
						<lpage>10792</lpage>				
			</citation>
			</ref>	
			<ref id="r23">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Statnikov</surname>
							<given-names>A</given-names>
						</name>
						<name>
							<surname>Aliferis</surname>
							<given-names>CF</given-names>
						</name>		
						<name>
							<surname>Tsamardinos</surname>
							<given-names>I</given-names>
						</name>
						<name>
							<surname>Hardin</surname>
							<given-names>D</given-names>
						</name>			
						<name>
							<surname>Levy</surname>
							<given-names>S</given-names>
						</name> 
						</person-group>
						<year>2005</year>
						<article-title>A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis</article-title>
						<source>Bioinformatics</source>
						<volume>21</volume>
						<fpage>631</fpage>
						<lpage>643</lpage>
			</citation>
			</ref>	
			<ref id="r24">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Singh</surname>
							<given-names>D</given-names>
						</name>
						<name>
							<surname>Febbo</surname>
							<given-names>PG</given-names>
						</name>	
						<name>
							<surname>Ross</surname>
							<given-names>K</given-names>
						</name>
						<name>
							<surname>Jackson</surname>
							<given-names>DG</given-names>
						</name>
						<name>
							<surname>Manola</surname>
							<given-names>J</given-names>
						</name><etal/>
						</person-group>
						<year>2002</year>
						<article-title>Gene expression correlates of clinical prostate cancer behavior</article-title>
						<source>Cancer Cell</source>
						<volume>1</volume>
						<fpage>203</fpage>
						<lpage>209</lpage>
			</citation>
			</ref>	
			<ref id="r25">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Su</surname>
							<given-names>AI</given-names>
						</name>
						<name>
							<surname>Welsh</surname>
							<given-names>JB</given-names>
						</name>	
						<name>
							<surname>Sapinoso</surname>
							<given-names>LM</given-names>
						</name>
						<name>
							<surname>Kern</surname>
							<given-names>SG</given-names>
						</name>
						<name>
							<surname>Dimitrov</surname>
							<given-names>P</given-names>
						</name><etal/>
						</person-group>
						<year>2001</year>
						<article-title>Molecular classification of human carcinomas by use of gene expression signatures</article-title>
						<source>Cancer Res</source>	
						<volume>61</volume>					
						<fpage>7388</fpage>
						<lpage>7393</lpage>
			</citation>
			</ref>		
			<ref id="r26">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Tibshirani</surname>
							<given-names>R</given-names>
						</name>
						</person-group>
						<year>1996</year>
						<article-title>Regression shrinkage and selection via the lasso</article-title>
						<source>J Roy Statist Soc Ser B</source>
						<volume>58</volume>
						<fpage>267</fpage>
						<lpage>288</lpage>
			</citation>
			</ref>			
			<ref id="r27">
				<citation citation-type="web">				
						<source>The MOSEK Optimization Tools Version 2.5. User&rsquo;s Manual and Reference 2002</source>
						<comment>[Online] Available: 
    <ext-link ext-link-type="uri" xlink:href="http://www.mosek.com">www.mosek.com</ext-link>
	</comment>
			</citation>
			</ref>				
			<ref id="r28">
				<citation citation-type="journal">
						<person-group>
						<name>
							<surname>Van den Berg</surname>
							<given-names>E</given-names>
						</name>
						<name>
							<surname>Friedlander</surname>
							<given-names>M</given-names>
						</name>
						</person-group>
						<year>2008</year>
						<article-title>Probing the Pareto frontier for basis pursuit solutions</article-title>
						<source>Technical Report, Department of Computer Science, University of British Columbia</source>
			</citation>
			</ref>	
			<ref id="r29">
				<citation citation-type="book">
						<person-group>
						<name>
							<surname>Vapnik</surname>
							<given-names>VN</given-names>
						</name>
						</person-group>
						<year>1998</year>
						<source>Statistical Learning Theory</source>
						<publisher-loc>New York</publisher-loc>
						<publisher-name>Wiley</publisher-name>
			</citation>
			</ref>	
			<ref id="r30">
				<citation citation-type="confproc">
						<person-group>
						<name>
							<surname>Weston</surname>
							<given-names>J</given-names>
						</name>
						<name>
							<surname>Watkins</surname>
							<given-names>C</given-names>
						</name>	
						</person-group>
						<year>1999</year>
						<article-title>Support vector machines for multi-class pattern recognition</article-title>
						<conf-name>Proceedings of the Seventh European Symposium on Artificial Neural Networks (ESANN 99)</conf-name>
						<conf-loc>Bruges</conf-loc>
						<conf-date>April 21-23</conf-date>
			</citation>
			</ref>				
</ref-list> 
		</back>	
		<floats-wrap>
		<table-wrap position="float" id="t1">
	<label>Table 1.</label>
  			<caption>
  				<title>Performance without gene selection.</title>			
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>	  	
         <tr>          	
			<th align="left" colspan="2">Methods</th>
            <th align="left">Prostate Tumor</th>
            <th align="left">DLBCL</th>
			<th align="left">9 Tumors</th>	
			<th align="left">11 Tumors</th>
            <th align="left">Brain Tumor1</th>
			<th align="left">Brain Tumor2</th>													
         </tr>		
      </thead>
      <tbody>        
         <tr>
           <td rowspan="6">SVM</td>
		   <td>Binary</td>
		   <td><bold>93.27%</bold></td>
		   <td><bold>97.32%</bold></td>
		   <td>N/A</td>
		   <td>N/A</td>
		   <td>N/A</td>
		   <td>N/A</td>		  	   													
         </tr>
		 <tr>
		 	<td>OVR</td>
			<td><bold>93.27%</bold></td>
			<td><bold>97.32%</bold></td>
			<td>67.06%</td>
			<td>94.99%</td>
			<td><bold>90%</bold></td>
			<td>75.5%</td>
		</tr>
		<tr>
			<td>OVO</td>
			<td><bold>93.27%</bold></td>
			<td><bold>97.32%</bold></td>
			<td>54.63%</td>
			<td>90.22%</td>
			<td><bold>90%</bold></td>
			<td>73.83%</td>
		</tr>
		<tr>
			<td>DAGSVM</td>
			<td><bold>93.27%</bold></td>
			<td><bold>97.32%</bold></td>
			<td>54.63%</td>
			<td>90.22%</td>
			<td><bold>90%</bold></td>
			<td>73.83%</td>
		</tr>
		<tr>
			<td>WW</td>
			<td><bold>93.27%</bold></td>
			<td><bold>97.32%</bold></td>
			<td>68.17%</td>
			<td>94.31%</td>
			<td><bold>90%</bold></td>
			<td><bold>77.17%</bold></td>
		</tr>
		<tr>
			<td>CS</td>
			<td><bold>93.27%</bold></td>
			<td><bold>97.32%</bold></td>
			<td>68.17%</td>
			<td>94.31%</td>
			<td><bold>90%</bold></td>
			<td>75.5%</td>	
		</tr>
		<tr>
			<td rowspan="3">l<sub>1</sub>LRC</td>
			<td>Binary</td>
			<td>91.36%</td>
			<td>96.07%</td>
			<td>N/A</td>
			<td>N/A</td>
			<td>N/A</td>
			<td>N/A</td>
		</tr>
		<tr>
			<td>OVR</td>
			<td>91.36%</td>
			<td>96.07%</td>
			<td><bold>72.21%</bold></td>
			<td><bold>96.63%</bold></td>
			<td><bold>90%</bold></td>
			<td>76.67%</td>
		</tr>
		<tr>
			<td>OVO</td>
			<td>91.36%</td>
			<td>96.07%</td>
			<td>55.33%</td>
			<td>91.93%</td>
			<td><bold>90%</bold></td>
			<td>77.00%</td>
		</tr>		
		 </tbody>
		 </table>
		 </table-wrap>
		 <table-wrap position="float" id="t2">
	<label>Table 2.</label>
  			<caption>
  				<title>Performance with gene selection.</title>			
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>	  	
         <tr>          	
			<th align="left" colspan="2">Methods</th>
            <th align="left">Prostate Tumor</th>
            <th align="left">DLBCL</th>
			<th align="left">9 Tumors</th>	
			<th align="left">11 Tumors</th>
            <th align="left">Brain Tumor1</th>
			<th align="left">Brain Tumor2</th>													
         </tr>		
      </thead>
      <tbody>        
         <tr>
           <td rowspan="3">SVM</td>
		   <td>Accuracy</td>
		   <td><bold>94.27%</bold></td>
		   <td><bold>98.75%</bold></td>
		   <td>72.89%</td>
		   <td><bold>96.66%</bold></td>
		   <td><bold>90%</bold></td>
		   <td><bold>82.83%</bold></td>		  	   													
         </tr>
		 <tr>
		 	<td>Variant</td>
			<td>OVO</td>
			<td>OVO</td>
			<td>CS</td>
			<td>OVR</td>
			<td>WW</td>
			<td>OVR</td>
		</tr>
		<tr>
			<td>GS</td>
			<td>KW 1000</td>
			<td>KW 500</td>
			<td>BW 3000</td>
			<td>KW 1000</td>
			<td>NG</td>
			<td>KW 500</td>
		</tr>		
		<tr>
			<td rowspan="3">OVR l<sub>1</sub>LRC</td>
			<td>Accuracy</td>
			<td>94.18%</td>
			<td><bold>98.75%</bold></td>
			<td><bold>75.69%</bold></td>
			<td><bold>96.66%</bold></td>
			<td><bold>90%</bold></td>
			<td>78.33%</td>
		</tr>
		<tr>
			<td>GS</td>
			<td>BW 3050</td>
			<td>BW 500</td>
			<td>KW 1060</td>
			<td>KW 2000</td>
			<td>NG</td>
			<td>BW 9000</td>
		</tr>	
		 </tbody>
		 </table>
		 </table-wrap>
		 </floats-wrap>
</article>
