Author(s): Collins JR, Stephens RM, Gold B, Long B, Dean M,
Abstract Share this page
Abstract The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STR) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements.
This article was published in Genomics
and referenced in Journal of Computer Science & Systems Biology