Author(s): Chuang LY, Huang HC, Lin MC, Yang CH
Abstract Share this page
Abstract BACKGROUND: Regions with abundant GC nucleotides, a high CpG number, and a length greater than 200 bp in a genome are often referred to as CpG islands. These islands are usually located in the 5' end of genes. Recently, several algorithms for the prediction of CpG islands have been proposed. METHODOLOGY/PRINCIPAL FINDINGS: We propose here a new method called CPSORL to predict CpG islands, which consists of a complement particle swarm optimization algorithm combined with reinforcement learning to predict CpG islands more reliably. Several CpG island prediction tools equipped with the sliding window technique have been developed previously. However, the quality of the results seems to rely too much on the choices that are made for the window sizes, and thus these methods leave room for improvement. CONCLUSIONS/SIGNIFICANCE: Experimental results indicate that CPSORL provides results of a higher sensitivity and a higher correlation coefficient in all selected experimental contigs than the other methods it was compared to (CpGIS, CpGcluster, CpGProd and CpGPlot). A higher number of CpG islands were identified in chromosomes 21 and 22 of the human genome than with the other methods from the literature. CPSORL also achieved the highest coverage rate (3.4\%). CPSORL is an application for identifying promoter and TSS regions associated with CpG islands in entire human genomic. When compared to CpGcluster, the islands predicted by CPSORL covered a larger region in the TSS (12.2\%) and promoter (26.1\%) region. If Alu sequences are considered, the islands predicted by CPSORL (Alu) covered a larger TSS (40.5\%) and promoter (67.8\%) region than CpGIS. Furthermore, CPSORL was used to verify that the average methylation density was 5.33\% for CpG islands in the entire human genome.
This article was published in PLoS One
and referenced in Journal of Proteomics & Bioinformatics