Author(s): Yu ZG, Anh V, Lau KS
Abstract: This paper introduces the notion of a measure representation of DNA sequences. Spectral analysis and multifractal analysis are then performed on the measure representations of a large number of complete genomes. The main aim of the paper is to discuss the multifractal property of the measure representation and the classification of bacteria. From the measure representations, the values of the D(q) spectra, and the related C(q) curves, it is concluded that these complete genomes are not random sequences. In fact, the spectral analyses indicate that the measure representations, considered as time series, exhibit strong long-range correlation. Here the long-range correlation is among the K-strings in dictionary ordering, which differs from the base-pair correlations studied by other authors. For substrings of length K=8, the D(q) spectra of all organisms studied are multifractal-like and sufficiently smooth for the C(q) curves to be meaningful. As K decreases, the multifractality lessens. The C(q) curves of all bacteria resemble a classical phase transition at a critical point, but the "analogous" phase transitions of the chromosomes of non-bacterial organisms are different: apart from chromosome 1 of C. elegans, they exhibit the shape of a double-peaked specific heat function. Bacterial genomes are classified by assigning to each sequence a point in the two-dimensional space (D(-1),D1) and in the three-dimensional space (D(-1),D1,D(-2)). Bacteria that are phylogenetically close also tend to lie close together in the spaces (D(-1),D1) and (D(-1),D1,D(-2)).
This article was published in Phys Rev E Stat Nonlin Soft Matter Phys and referenced in Journal of Computer Science & Systems Biology.