Past, Present and Future of Molecular Technology Applications for the Epidemiology of Bacterial Diseases

Betsy J. Bricker

doi:10.4172/2155-9872.S10-001

ISSN: 2155-9872

Journal of Analytical & Bioanalytical Techniques


National Animal Disease Center - National Centers for Animal Health, Agricultural Research Service, Department of Agriculture, PO Box 70, 1920 Dayton Rd, Ames, IA 50010 USA
Corresponding Author: Betsy Bricker, National Animal Disease Center - National Centers for Animal Health, Agricultural Research Service, Department of Agriculture, PO Box 70, 1920 Dayton Rd, Ames, IA 50010 USA; Tel: 1-515-337-7310; Fax: 1-515-337-6256; E-mail: bbricker@ars.usda.gov
Received November 04, 2011; Accepted November 16, 2011; Published November 19, 2011
Citation: Bricker BJ (2011) Past, Present and Future of Molecular Technology Applications for the Epidemiology of Bacterial Diseases. J Anal Bioanal Tech S10:001. doi: 10.4172/2155-9872.S10-001
Copyright: © 2011 Bricker BJ. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

The evolution of molecular technologies has had a major impact on many fields of research, including epidemiology. At the core of molecular epidemiology is the need for highly specific typing of disease agents: to confirm the trace-back of disease to its origin, to monitor the spread of disease-causing strains, to study the population dynamics of disease strains, to distinguish endemic/enzootic from epidemic/epizootic infections, to detect the presence of multiple strains in a population and/or an individual, to identify the modes of transmission of the disease agent from host to host, and to address other epidemiological questions or issues. Molecular subtyping has generally been found to be better than most traditional phenotypic subtyping methods because it is usually more discriminating and less influenced by the organisms’ responses to environmental cues. A large number of molecular techniques have been adapted to epidemiological problems, and different techniques are needed for different aspects of an investigation. This review examines the most commonly used techniques for typing and/or characterizing bacteria for epidemiological purposes. It includes a historical perspective to help explain why certain techniques may be preferred over others, as well as predictions of the directions epidemiologists may take in applying molecular technologies to their work.

Keywords
Molecular epidemiology; Bacterial strain typing; Molecular probes; PFGE; RAPD-PCR; AFLP; MLVA; MLST; Whole-genome sequencing
Abbreviations
AFLP/fAFLP: Amplified Fragment Length Polymorphism/fluorescent-AFLP; AGE: Agarose Gel Electrophoresis; AP-PCR: Arbitrary Primed-PCR; bp: base pair; CE: Capillary Electrophoresis; CGH: Comparative Genomic Hybridization; ERIC-PCR: Enterobacterial Repetitive Intergenic Consensus sequence-PCR; in/dels: insertions and/or deletions; IS-typing: Insertion Sequence element Typing; kDa: kilodalton; MIRU-PCR: Mycobacterial Interspersed Repetitive Units-PCR; MLST: Multi-Locus Sequence Typing; MLVA: Multi-Locus VNTR Analysis; MRSA: Methicillin-Resistant Staphylococcus aureus; mu-RT-PCR: Multiplex Real-Time PCR; nt: nucleotide; PCR: Polymerase Chain Reaction; PFGE: Pulsed-Field Gel Electrophoresis; RAPD-PCR: Random Amplified Polymorphic DNA-PCR; REP-PCR: Repetitive Extragenic Palindromic-PCR; RFLP: Restriction Fragment Length Polymorphism; RISA: Ribosomal Intergenic Spacer Analysis; rRNA: ribosomal RNA; RT-PCR: Real-Time PCR; SBT: Sequence-Based Typing; SNP: Single Nucleotide Polymorphism; SNR: Single Nucleotide Repeat Sequencing; Spoligotyping: Spacer Oligonucleotide Typing; STRs: Short tandem repeats; VNTR: Variable Number Tandem Repeats; WGS: Whole-Genome Sequencing; WHO: World Health Organization
Introduction
Whether it is from the perspective of a global pandemic or a localized incident, bacterial diseases place a burden on plant, animal and human life. The occurrence of disease in any of these three has a detrimental effect on human social and economic activities. Some of the direct costs arise from medical expenses, lost wages and productivity, long-term disability, and premature death. In the case of animal disease, economic distress can result from decreased animal reproduction, loss of weight or failure to grow, loss of markets, disposal of carcasses, possible quarantine, decontamination expenses, veterinary expenses, and possible zoonotic transmission; in the case of plants, there can be loss of income from decreased or lost production yield, legal action, and/or loss of personal food or product supply.
In 2004, four of the world’s ten most significant infectious diseases (for humans) were bacterial: tuberculosis, with 7.8 million new cases and 1.7 million deaths; pertussis, with 18.4 million new cases and 300,000 deaths; meningitis, with 0.7 million new cases and 200,000 deaths; and tetanus, with 300,000 new cases and 163,000 deaths [1,2]. In addition, diarrheal diseases (bacterial, viral and parasitic) ranked fourth in importance, causing an estimated 5 billion episodes per year, of which 1.5 billion occurred in children under five years of age [3]. Cholera is a major cause of diarrheal disease, with an estimated 3-5 million cases per year resulting in 100,000 to 200,000 deaths [4,5]. Exact numbers cannot be determined, since many cases of diarrheal disease are unreported or unspecified. Disturbingly, cholera cases have been increasing steadily: over the past three years, the World Health Organization reported 190,130 (2008), 221,226 (2009) and 317,534 (2010) confirmed cases [6]. Foodborne bacterial diseases are another major contributor to bacterial infections worldwide. It is estimated that 90% of the 6.5 to 33 million confirmed cases of foodborne disease per year are caused by pathogenic bacteria, resulting in about 9000 deaths [7]. Six pathogens, Campylobacter jejuni, Clostridium perfringens, Escherichia coli strain O157:H7, Listeria monocytogenes, Salmonella and Staphylococcus aureus, are estimated to cost the US economy $6.5 to $34.9 billion per year, of which $2.9 to $6.7 billion are spent on foodborne disease [8].
Epidemiological investigations are critical for successful control programs to decrease or eliminate a disease. The objectives of these inquiries are to identify the source of disease, the means of transmission, the scale of distribution, the epidemic and pandemic potential (or extent), any asymptomatic carriers or reservoirs, and other factors associated with the spread of the disease. To accomplish this, there must be a means of characterizing the specific strain of the disease agent that is responsible, so that the past, present and future dissemination of the causative strain can be tracked.
Most of the early discoveries in the field of molecular biology were made with bacteria or phages, because the prokaryotic genome is much smaller and simpler than its eukaryotic counterpart. When new techniques were developed, they often required only minor modifications to be applied to other bacteria. As molecular technology has expanded exponentially over the last 3-4 decades, so has its application to many fields of study, including epidemiology. Epidemiology is a complex field that interrogates the transmission, dissemination, and population dynamics of pathogens, and their host interactions, on a local and global scale. The issues addressed by epidemiology are so broad and diverse that a large array of technical strategies is needed to address them all. This review examines some of the most commonly used molecular DNA-based techniques and how they can be applied to various aspects of epidemiological investigation. To understand how and why certain methods are preferred, it is important to include a historical perspective. It is also necessary to address bacterial diseases in a worldwide context, mindful that not all parts of the world have the same access to the available methodologies. Despite the obvious benefits of using the latest and best technology, the cost of equipment, supplies, and a pool of skilled workers may put these technologies out of reach for many of the countries that need them the most [2].
Common molecular technologies applied to the epidemiological study of bacterial diseases

At the heart of any epidemiological investigation is the need not only to identify the disease agent, but also to characterize it on the basis of its unique features. Discriminating genetically unrelated lineages into separate groups (subtypes) enables the tracking and study of individual pathogen populations. Historically, this was done by observing distinguishing traits, such as physical appearance and colony morphology; biochemical and biological properties; nutritional and physical requirements; metabolic processes and waste products; susceptibilities to antibiotics, toxins or phages; virulence; and/or antigenic properties (serotype). The problems with characterizing a pathogen on the basis of phenotype are that the expression of many of these traits can be influenced by environmental growth conditions, and that easily discernible, discriminating traits may be very limited in number [9,10]. Classifying strains on the basis of nucleic acid composition has the advantage that these characteristics (i.e., the genotype) are relatively stable and are the ultimate basis of discernible phenotypic traits. Therefore, provided that the technology is appropriate for the type of data needed, molecular DNA-based typing of bacterial pathogens is considered the most definitive approach [11,12]. Table 1 lists some of the most common technologies used by epidemiologists and the fundamental approach on which each technique is based. The following sections describe in greater detail some of the most common molecular methods being developed and used for epidemiological investigations, along with related examples.
Nucleic Acid Hybridization
Restriction fragment length polymorphism and Southern blot analysis
In the early days of molecular typing for epidemiology, most methods involved some form of nucleic acid hybridization and detection of the degree of nucleic acid homology between a probe and its target. One widely used technique involves digestion of purified DNA with restriction enzymes followed by size separation, usually by agarose gel electrophoresis (AGE) [171]. This technique, known as restriction fragment length polymorphism (RFLP) analysis, reveals insertions and deletions in the genomic or test DNA, as well as nucleotide differences within the recognition sequence of the chosen restriction enzyme. While this works well for showing polymorphisms in viruses, bacteriophages and plasmids [52,61,171,172], the fragment profile of an entire bacterial genome is usually too complex to evaluate directly. For genome-level analysis, it is usually necessary to use a labeled probe that hybridizes with one or a few fragments of the size-fractionated genomic DNA, followed by an appropriate method for detecting the probe [23,26,82,173,174]. A variety of DNA targets and elements have proven very useful for strain typing by this method, including unique strain-specific or polymorphic genes [20,22], multicopy palindromic units [175,176], multicopy mobile genetic elements (e.g., insertion sequence elements and transposons) [25,26,173], small (2-bp to 25-bp) variable number tandem repeats (VNTRs) [13,14], and multicopy ribosomal RNA (rRNA) regions [15,16].
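The RFLP principle can be illustrated with a toy in silico digest. The sketch below is a minimal illustration, not a typing tool: it assumes a linear sequence, an exact, palindromic recognition site (EcoRI, G^AATTC, chosen here only as a familiar example) and two hypothetical alleles that differ by a single base within one site. Losing that site merges two fragments into one, which is the kind of difference RFLP analysis detects.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths (largest first) from cutting a linear sequence.

    Assumes `site` is an exact, palindromic recognition sequence, so scanning
    the top strand finds every site; `cut_offset` is the cut position within
    the site on the top strand (1 for EcoRI, G^AATTC).
    """
    seq = seq.upper()
    # The lookahead lets overlapping occurrences of the site be found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?=({site}))", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])), reverse=True)


# Hypothetical alleles: allele_b carries a C->A change that destroys the second site.
allele_a = "AAAA" + "GAATTC" + "C" * 10 + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "C" * 10 + "GAATTA" + "TTTT"

print(digest(allele_a, "GAATTC", 1))  # [16, 9, 5]
print(digest(allele_b, "GAATTC", 1))  # [25, 5]
```

With a whole bacterial genome, the same calculation would produce hundreds or thousands of fragments, which is why a probe is needed to visualize only the fragments of interest.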
Dot blots and nucleic acid arrays
DNA/DNA hybridization arrays are another method for typing bacterial strains. Instead of using size-fractionated DNA as the discriminating factor, as in RFLP analysis, this technology uses selected reference DNA fragments immobilized in a predetermined array. DNA from the strain of interest is labeled and then allowed to hybridize with the immobilized array to reveal which loci the test strain shares with the array. The arrayed DNA can consist of synthetic oligonucleotides, PCR-amplified products, or DNA loci cloned into phages, plasmids or cosmids [38,177,178]. The array can encompass a complete genome (e.g., whole-genome tiling arrays [179,180]), open reading frames [181] or selected discriminating loci [43,182-184]. A mixed genome microarray (MGM) contains a set of loci from multiple genomes and is particularly useful for determining phylogenetic relationships without the bias that can occur with comparative genomic hybridization based on a single genome [185].
Arrays can be spotted onto many different support surfaces [186], including nylon membranes [182], plastics [187], beads [44,47], and glass [188,189]. On macroarrays, the reference DNA is arrayed over a large area, often immobilized on a nylon or charged membrane. These arrays are easier to prepare in the average laboratory, but they require a correspondingly large amount of test DNA/RNA (several micrograms) [186]. The use of radioactivity to label the test DNA/RNA can significantly lower the amount needed without compromising the sensitivity [186,190]. Macroarrays with a large number of loci spotted onto the membrane are laborious to manually prepare, but there are handheld devices available that can spot up to 96 or 384 samples simultaneously from a microtiter plate [191] (e.g. http://www.vp scientific.com).
Although more expensive to produce, microarrays spotted onto glass slides have become popular because thousands of loci, up to and including whole bacterial genomes, can be interrogated in one assay [41,177]. For epidemiological strain typing, comparative genomic hybridization (CGH) microarray analysis provides a large amount of data about the relative degree of strain homology across the entire genome [180,192,193]. Specifically, microarrays composed of the complete set of open reading frames from the reference strain have been shown to have considerable epidemiological value [194,195].
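One common way to use CGH data for strain comparison is to reduce the normalized hybridization signal for each open reading frame to a present/absent (or divergent) call, and then compare those calls between strains. The sketch below assumes intensities that have already been normalized, a hypothetical log2 ratio cutoff of -1 and a simple matching coefficient; published CGH studies use more careful statistics for these calls, and all intensities shown are invented.

```python
import math

# Hypothetical cutoff: an ORF whose test/reference signal ratio falls below
# 2**-1 (half the reference signal) is called absent or highly divergent.
LOG2_CUTOFF = -1.0


def call_orfs(test: dict[str, float], ref: dict[str, float]) -> dict[str, bool]:
    """Return {orf: present?} from normalized test and reference intensities."""
    return {orf: math.log2(test[orf] / ref[orf]) >= LOG2_CUTOFF for orf in ref}


def matching_similarity(a: dict[str, bool], b: dict[str, bool]) -> float:
    """Fraction of shared ORFs that received the same present/absent call."""
    shared = a.keys() & b.keys()
    return sum(a[orf] == b[orf] for orf in shared) / len(shared)


# Hypothetical normalized intensities for four ORFs of the reference strain.
reference = {"orf1": 1000, "orf2": 1000, "orf3": 1000, "orf4": 1000}
strain_x = {"orf1": 950, "orf2": 1100, "orf3": 100, "orf4": 900}
strain_y = {"orf1": 1020, "orf2": 200, "orf3": 120, "orf4": 980}

calls_x = call_orfs(strain_x, reference)  # orf3 called absent
calls_y = call_orfs(strain_y, reference)  # orf2 and orf3 called absent
print(matching_similarity(calls_x, calls_y))  # 0.75
```

Pairwise similarities like this one can then feed a clustering method to group strains by their shared gene content.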
Direct DNA analyses
Methods that characterize purified bacterial DNA directly for the purpose of strain typing have been available for decades. The advantages of this approach are that the techniques are typically low cost, do not depend on preparing labeled probes, and do not rely on expensive enzymes for DNA amplification. The protocols for direct DNA analyses tend to be universal; that is, they can be applied to any bacterium with very little modification and very little prior knowledge about the organism. The two most commonly used methods, plasmid analysis and Pulsed-Field Gel Electrophoresis (PFGE), do require substantial amounts of test DNA, so the organism must be culturable to provide enough pure culture for DNA extraction. The third category of direct DNA characterization methods is DNA sequencing; methods based on this approach are generally more expensive and require greater skill to perform.
Plasmid analysis
It has been known since the 1960’s that certain virulence and antibiotic resistance traits can be transferred between bacteria by plasmids [196-199]. Because plasmids are much smaller than bacterial chromosomes, they were more amenable to the methods then available for bacterial typing. However, in the 1960’s and early 1970’s, molecular analysis of plasmids consisted mainly of determining their size by centrifugal sedimentation through a cesium chloride or sucrose density gradient [200,201]. In 1976, Meyers et al. [202] demonstrated that plasmid DNA could be size fractionated by its mobility through an agarose gel in an electric field, an easy and inexpensive technique to perform. By the late 1970’s and into the 1980’s, plasmid analysis was becoming an increasingly common technique for epidemiology [52,203-207]. Plasmid analysis was applied in two ways: first, by characterizing the number and size(s) of plasmids in the test strains [57,203,205,208-210]; and later, by restriction enzyme-generated RFLP profiles [52,54,56,61,204]. Because many antibiotic resistance genes are carried on plasmids, it is not surprising that plasmid analysis was used in epidemiology predominantly to track the spread of antibiotic resistance in nosocomial infections or the emergence of antibiotic-resistant strains [55-57,203-205,208,211].
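Plasmid sizes in this kind of analysis are read from the gel by comparison with size standards run alongside the samples. Over a limited range, the logarithm of DNA size falls roughly linearly with migration distance, so a band can be sized by interpolating on a semi-log scale between the two flanking standards. The sketch below uses invented marker positions, and it assumes the standards have the same topology as the samples, since supercoiled plasmid DNA migrates differently from linear DNA of the same size.

```python
import math


def estimate_size_kb(distance_mm: float, ladder: list[tuple[float, float]]) -> float:
    """Estimate fragment size by semi-log interpolation between flanking markers.

    ladder: (migration distance in mm, size in kb) pairs for the size standards.
    Over a limited range, log10(size) falls roughly linearly with distance.
    """
    points = sorted(ladder)  # shortest migration distance (largest DNA) first
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the size standards")


# Hypothetical standards: (distance in mm, size in kb).
ladder = [(10.0, 100.0), (20.0, 10.0), (30.0, 1.0)]
print(round(estimate_size_kb(15.0, ladder), 1))  # 31.6
```

Bands outside the range of the standards raise an error rather than being extrapolated, because the semi-log relationship is least reliable at the extremes of a gel.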
Plasmid analysis has also been used for other types of epidemiological studies, including: traceback of Salmonella to chocolate [212] and marijuana [51]; epidemiology of Edwardsiella in catfish [213,214]; traceback of bacteremia in hospital patients over a one-year period to enteral nutrient solutions contaminated with Enterobacter cloacae and Klebsiella pneumoniae [215]; a retrospective phylogenetic analysis of 324 clinical enterobacterial isolates [216]; evidence of zoonotic transmission of Escherichia coli (VTEC) serotype O118 from cattle [217]; and as part of an epidemiological study of a Vibrio metschnikovii outbreak in children in Peru [218]. Although these reports generally indicate the usefulness of plasmid typing, the method has one major limitation: as pointed out by Mulligan [219] and others, many pathogenic bacteria do not contain plasmids.
Pulsed-Field gel electrophoresis (PFGE)
In 1984, Schwartz and Cantor [220] took a different approach, although they still made use of the RFLP concept. They used macrorestriction enzymes, which recognize rare target sequences in the genome and cut it into only a few very large fragments that can be evaluated directly for polymorphisms without the need for specific probes. The difficulty that Schwartz and Cantor overcame was how to separate such very large DNA fragments by size, which is not possible with standard gel electrophoresis methods. They devised a gel system that used alternating pulses of electric current applied at angles (usually 120°) relative to the top of the agarose gel. The time it takes a DNA fragment to reorient towards the anode is size dependent, so the fragments zigzag through the gel at a rate that depends on their size. Pulsed-Field Gel Electrophoresis (PFGE) detects genomic rearrangements, large insertions and deletions (in/dels), and sequence mutations within the restriction enzyme recognition sites. These types of genomic changes accumulate at a relatively steady rate that is characteristic of each organism. Selection of the macrorestriction enzyme is critical to success, as the optimum digest profile contains between 12 and 25 fragments. Although there are a few guidelines for enzyme selection, such as consideration of the G+C content of the genome, the best enzymes must be chosen empirically or, if the complete genome sequence is available, in silico. A review article by Goering [67] lists the best restriction enzymes to use with each of 31 types of bacteria. Because PFGE is highly discriminating and relatively inexpensive, it has been the “gold standard” for molecular strain typing [221-223], and it is probably the most widely used method for molecular epidemiological studies at the present time [67].
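When a complete genome sequence is available, in silico enzyme selection amounts to counting each candidate enzyme's recognition sites and keeping enzymes whose predicted fragment count falls in the 12-25 range. The sketch below assumes a circular chromosome, exact palindromic sites and a tiny invented sequence; the enzyme list is illustrative only, and the sketch ignores factors such as DNA methylation that can block cutting in practice.

```python
# Candidate rare-cutting enzymes and their (palindromic) recognition sites.
# This short list is illustrative only, not a recommendation.
CANDIDATE_ENZYMES = {
    "NotI": "GCGGCCGC",
    "AscI": "GGCGCGCC",
    "SmaI": "CCCGGG",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
}


def count_sites_circular(genome: str, site: str) -> int:
    """Count recognition sites in a circular chromosome.

    Palindromic sites read the same on both strands, so scanning one strand is
    enough. The start of the sequence is appended so that sites spanning the
    origin are counted. A circle cut at n >= 1 sites yields n fragments.
    """
    seq = genome.upper()
    wrapped = seq + seq[: len(site) - 1]
    return sum(1 for i in range(len(seq)) if wrapped.startswith(site, i))


def suitable_enzymes(genome: str, low: int = 12, high: int = 25) -> tuple[dict[str, int], list[str]]:
    """Return fragment counts per enzyme and the enzymes giving low..high fragments."""
    counts = {name: count_sites_circular(genome, site) for name, site in CANDIDATE_ENZYMES.items()}
    return counts, [name for name, n in counts.items() if low <= n <= high]


# Hypothetical toy "chromosome" with 15 XbaI sites and 3 NotI sites.
spacer = "ACGT" * 50
toy_genome = (spacer + "TCTAGA") * 15 + (spacer + "GCGGCCGC") * 3
counts, chosen = suitable_enzymes(toy_genome)
print(counts)  # {'NotI': 3, 'AscI': 0, 'SmaI': 0, 'XbaI': 15, 'SpeI': 0}
print(chosen)  # ['XbaI']
```

A real screen would also check that the predicted fragments are spread over a size range the gel can resolve, not just that their number is right.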
PFGE typing has been applied to most of the important infectious bacteria. The CDC, in collaboration with a number of public health laboratories, has set up a network called PulseNet, which is dedicated to epidemiology through PFGE typing of six major foodborne disease agents [224,225]. Its purpose is to promote early detection of, and rapid response to, outbreaks in the US, so that they can be contained quickly with as little damage as possible [226]. The collaboration, which began in 1996 as FoodNet [227], has a complete set of standardized PFGE protocols (accessible at: http://www.cdc.gov/pulsenet/protocols.htm) for each disease agent it tracks [226,228,229]. Currently, PulseNet monitors six pathogens: Campylobacter jejuni, Escherichia coli strain O157:H7, Listeria monocytogenes, Salmonella, Shigella, and Yersinia pestis. Following its early success in tracking foodborne outbreaks, PulseNet has served as a model for an International PulseNet [230] and a Latin American PulseNet.
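Standardized protocols matter because PFGE results are compared as band patterns. One widely used way to score the similarity of two patterns is the Dice coefficient, which counts bands shared within a position tolerance. The sketch below is a simplified, greedy version of that comparison; the 1.5% tolerance and the band sizes are assumptions for illustration, and it is not the algorithm of any particular software package.

```python
def count_matching_bands(a: list[float], b: list[float], tol: float = 0.015) -> int:
    """Pair bands one-to-one when their sizes differ by at most `tol` (a fraction).

    Uses a simple greedy pass over both profiles sorted by size.
    """
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def dice_similarity(a: list[float], b: list[float], tol: float = 0.015) -> float:
    """Dice coefficient: 2 x shared bands / total bands in both profiles."""
    return 2 * count_matching_bands(a, b, tol) / (len(a) + len(b))


# Hypothetical PFGE profiles (band sizes in kb) for two isolates.
isolate_1 = [600, 450, 320, 240, 150, 90]
isolate_2 = [602, 450, 300, 240, 150, 90]
print(count_matching_bands(isolate_1, isolate_2))       # 5
print(round(dice_similarity(isolate_1, isolate_2), 3))  # 0.833
```

The 600-kb and 602-kb bands fall within tolerance and are counted as the same band, while the 320-kb and 300-kb bands do not, so the two isolates share five of their six bands.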
DNA sequencing – the old-fashioned way
Although the new generations of DNA sequencing systems rely on in vitro amplification methods, the early sequencing protocols were applied directly to isolated DNA. Because each sequencing reaction could read only short (<1000-bp) stretches of DNA, the DNA was fragmented, and the pieces were cloned into bacteriophage or plasmid vectors and propagated in host bacteria, usually a non-pathogenic laboratory strain of E. coli, to produce adequate amounts of test DNA. Two methods of DNA sequencing were commonly used prior to 1985. Maxam-Gilbert sequencing, first described in 1977 [231], is based on base-specific chemical modification followed by cleavage of the DNA molecule. Around the same time, Sanger published his method based on chain termination of primer extension by the random incorporation of A, C, G, or T nucleotide analogues into the growing DNA strand; the analogues terminated any further addition of nucleotides [232,233]. Initially the Maxam-Gilbert method was preferred, but with improved analogue chemistries, the Sanger method became more popular. Both methods, however, involve complex procedures and initially required radioactive labeling. The technical skill required and the need to clone the DNA fragment of interest (which took days or even weeks) made this technology generally unpopular for routine molecular epidemiology. One exception was a special modification of the Sanger method for sequence analysis of ribosomal RNA (rRNA) [145]. Because rRNA is the most abundant form of RNA and its size is limited to that of the rRNA operon, it can be sequenced directly, using reverse transcriptase to extend the primer on the RNA template. This method has mainly been applied to identification, discrimination, and phylogenetic analyses at the species level, but there are a few published reports of its use for epidemiology and subtyping as well [139-144].
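The logic of Sanger chain termination can be shown with an idealized simulation. It assumes that every possible termination position is represented in the lane of the dideoxynucleotide analogue incorporated there (in a real reaction, a low proportion of analogue to normal nucleotide makes termination occur at random positions), ignores the primer, and uses an invented six-base template. Reading the bands from shortest to longest across the four lanes recovers the sequence of the newly synthesized strand, which is complementary to the template.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesized_strand(template_5to3: str) -> str:
    """New strand made by primer extension: antiparallel and complementary,
    i.e. the reverse complement of the template (primer length ignored)."""
    return template_5to3.upper().translate(COMPLEMENT)[::-1]


def termination_lanes(new_strand: str) -> dict[str, list[int]]:
    """Lengths of chain-terminated products in each analogue lane.

    A product of length k ends at the k-th base of the new strand, where the
    matching dideoxynucleotide analogue was incorporated.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for k, base in enumerate(new_strand, start=1):
        lanes[base].append(k)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest across the four lanes."""
    bands = sorted((k, base) for base, lengths in lanes.items() for k in lengths)
    return "".join(base for _, base in bands)


lanes = termination_lanes(synthesized_strand("GGATCA"))
print(lanes)            # {'A': [3], 'C': [5, 6], 'G': [2], 'T': [1, 4]}
print(read_gel(lanes))  # TGATCC
```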
Molecular typing methods involving amplification of probe or target DNA
One drawback of the technologies in practice before 1988 was the requirement for relatively large quantities of DNA or RNA for analysis. For slow-growing bacteria such as Mycobacterium, and for non-culturable organisms, obtaining sufficient target DNA was either time consuming or not possible at all. Everything changed with the introduction of the Polymerase Chain Reaction (PCR) by Kary Mullis in 1986 [234]. This technique enzymatically amplifies DNA in vitro between two custom-designed primers, which are synthesized to hybridize at, and thereby define, the exact boundaries of the DNA sequence to be amplified. The technique was quickly improved by incorporating a thermostable polymerase into the amplification protocol [235], which allowed the process to be automated. At that point, molecular biology and genetics were forever changed.
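How a primer pair defines the boundaries of an amplicon can be sketched as a simple in silico PCR. The sketch assumes exact primer matches on a linear template and uses invented sequences; real primer annealing tolerates some mismatches. The second function shows the idealized doubling per cycle that makes amplification from minute amounts of DNA possible; in practice, efficiency is below 100% and amplification plateaus in later cycles.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 5000) -> list[str]:
    """Return products bounded by exact matches to a primer pair.

    Both primers are written 5'->3'. The forward primer's sequence occurs on
    the top strand; the reverse primer anneals to the top strand, so its site
    there is its reverse complement.
    """
    t = template.upper()
    fwd = fwd.upper()
    rev_site = reverse_complement(rev)
    starts = [i for i in range(len(t)) if t.startswith(fwd, i)]
    ends = [i + len(rev_site) for i in range(len(t)) if t.startswith(rev_site, i)]
    return [
        t[s:e]
        for s in starts
        for e in ends
        if len(fwd) + len(rev_site) <= e - s <= max_len
    ]


def ideal_copies(initial_copies: int, cycles: int) -> int:
    """Copy number if every molecule were duplicated in every cycle."""
    return initial_copies * 2 ** cycles


# Hypothetical template and primers.
template = "TTTT" + "ACGTAC" + "G" * 8 + "GTCCAA" + "TTTT"
print(in_silico_pcr(template, fwd="ACGTAC", rev="TTGGAC"))  # one 20-bp product
print(ideal_copies(1, 30))  # 1073741824
```

Thirty cycles of perfect doubling turn a single template molecule into about a billion copies, which is why PCR removed the need to grow large cultures for DNA extraction.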
The field of molecular epidemiology was likewise affected by the introduction of PCR amplification. New methods for molecular subtyping of bacterial strains were quickly developed and have been evolving ever since. The ability to amplify minute amounts of DNA has been exploited in many ways, including: as a way to synthesize highly specific probes [236]; as a means for identifying genomic rearrangements within or between repeated DNA sequences [106,165,237]; and as a way to directly amplify loci containing genetic mutations from genomic DNA [103,104,122]. Most new molecular techniques that are being reported involve some form of enzymatic amplification of the DNA or RNA.
PCR-RFLP
During the late 1980’s and throughout the 1990’s, PCR amplification made it much easier to obtain relatively large quantities of specific DNA. PCR typically requires some known DNA sequence from which to design the primers. The earliest PCR-based strain typing techniques focused on examining polymorphisms in one, or a few, genes or loci, because DNA sequencing was still fairly low throughput, technically difficult and expensive at the time. An example of an early PCR-based technique is PCR-RFLP, in which a gene or locus is amplified, the amplicon is treated with a restriction enzyme, and the resulting fragments are separated by size. Polymorphisms arising from genomic rearrangements (insertions, deletions, recombination, etc.) within the locus, or from nucleotide changes in the restriction enzyme recognition sites, can easily be detected by agarose gel electrophoresis (AGE) [78-81,238]. The level of discrimination can be increased by adding more loci and by repeating the assay with different restriction enzymes [78,239]. The main drawback of this approach is that in highly conserved genomes, these limited sequence targets may not contain enough DNA polymorphisms to distinguish alleles.
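The gain in discrimination from adding restriction enzymes (or loci) can be quantified. One widely used measure is the Hunter-Gaston discriminatory index, a form of Simpson's index of diversity: the probability that two isolates drawn at random from the test population fall into different types. The sketch below applies it to invented PCR-RFLP patterns; combining the two enzymes splits groups of isolates that a single enzyme cannot separate.

```python
from collections import Counter


def discriminatory_index(types: list[str]) -> float:
    """Hunter-Gaston discriminatory index (Simpson's index of diversity).

    The probability that two isolates drawn at random without replacement
    from the test population are assigned to different types.
    """
    n = len(types)
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical PCR-RFLP patterns for eight isolates with two restriction enzymes.
enzyme_1 = ["a1", "a1", "a1", "a1", "a2", "a2", "a3", "a3"]
enzyme_2 = ["b1", "b2", "b1", "b2", "b1", "b1", "b1", "b2"]
combined = [f"{p}/{q}" for p, q in zip(enzyme_1, enzyme_2)]

print(round(discriminatory_index(enzyme_1), 3))  # 0.714
print(round(discriminatory_index(combined), 3))  # 0.893
```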
Multiplex PCR (mu-PCR)
Multiplex PCR is a variation of PCR amplification that combines multiple amplification reactions in a single tube by pooling primer pairs directed at separate targets or at separate regions of the same target. Combining primer pairs can be used to produce multiple fragments simultaneously, to generate alternative products (alleles) that vary among different targets, or both [85]. As an example, the AMOS-PCR assay [82,240], developed for the zoonotic pathogen Brucella, employs a set of eight primers forming ten primer pairings to differentiate the four major species of Brucella and the two Brucella abortus vaccine strains, S19 and RB51. Species and/or strain discrimination is based on the production of amplicons that differ in number and size depending on the species and/or strain of the target DNA. The sizes and numbers of the amplified products are detected on an agarose gel or by fluorescent tags and capillary electrophoresis. Other multiplex PCR assays that have been used for molecular epidemiology include a 7-plex assay for Mycoplasma [45], a 16-locus assay for Salmonella [83], an assay for 13 pneumococcal serotypes [84], and an assay for discriminating among the four most common clone types of community-acquired MRSA [241]. Multiplex PCR reactions have several advantages: they are faster to set up, they use fewer reagents than testing each primer pair separately, and they allow higher throughput for both the thermocycler and the detection method. The biggest drawback of multiplex PCR is that it is difficult to match a large group of primers perfectly for optimal performance.
The conditions used for PCR, including the annealing temperature, MgCl2 concentration, dNTP concentration, pH, elongation time, primer concentration, and polymerase, must be methodically optimized, and the parameters of a multiplex assay are usually a compromise among the optimal conditions for each primer pair with its specific target [85]. As a result, multiplex PCR reactions tend to be more fastidious than most standard PCR assays: they are less tolerant of variation in the reaction mix, of variables introduced with the target DNA, and of inhibitors.
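A first, bench-free check on this compromise is to compare the melting temperatures (Tm) of all primers planned for a multiplex. The sketch below uses the Wallace rule, a crude approximation suitable only for short oligonucleotides (nearest-neighbor models are more accurate). The 5 °C spread limit and the "lowest Tm minus 5 °C" annealing temperature are rules of thumb assumed for illustration, and the primers are invented. The check says nothing about primer dimers, MgCl2 or the other variables listed above, which still require empirical optimization.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def check_multiplex(primers: dict[str, str], max_spread: int = 5) -> tuple[dict[str, int], int, int, bool]:
    """Return per-primer Tm, the Tm spread, a compromise annealing temperature,
    and whether the spread is within `max_spread` degrees."""
    tms = {name: wallace_tm(seq) for name, seq in primers.items()}
    spread = max(tms.values()) - min(tms.values())
    annealing = min(tms.values()) - 5  # rule of thumb: about 5 deg C below the lowest Tm
    return tms, spread, annealing, spread <= max_spread


# Hypothetical primers (not from any published assay).
primers = {"p1": "AGCTAGCTAGCTAGCTAGCT", "p2": "AAAAATTTTTAAAAAGGGGG"}
tms, spread, annealing, compatible = check_multiplex(primers)
print(tms, spread, annealing, compatible)  # {'p1': 60, 'p2': 50} 10 45 False
```

Here the 10 °C gap flags the pair as a poor multiplex match: an annealing temperature low enough for p2 would let p1 bind less stringently.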
Real-time PCR (RT-PCR) and multiplex real-time PCR (mu-RT-PCR)
Real-time PCR is another adaptation of the standard PCR reaction. The basic premise of this method is the use of a fluorescent probe, or of a dye that binds to double-stranded DNA, to detect amplicons as they are synthesized in real time, indicating the presence or absence of the targeted sequence. It can also be used to estimate how many copies of the target were present in the original sample. Although not quite as widely used as standard multiplex PCR, multiplex RT-PCR has been adapted for many epidemiological tests, including the subtyping of Staphylococcus aureus [88], Streptococcus pneumoniae [89], Brucella [91], and Mycobacterium [92]. One stumbling block in applying RT-PCR to epidemiology and bacterial subtyping has been the difficulty of developing multiplex assays, since the results are gathered optically without differentiation by product size. This problem has been addressed by a number of techniques, usually involving some form of melt-curve analysis. The primary use of multiplex RT-PCR, however, continues to be the identification of different bacteria rather than the fine strain differentiation needed for epidemiology. An example was published by Fukushima and colleagues [90], who developed a multiplex RT-PCR assay to identify 23 different pathogens from food. They demonstrated its utility in 33 of 35 foodborne disease outbreaks, and the results were reproducible even when different laboratories and equipment were used for the analyses. In another example, Cheng and colleagues [87] developed a method for multiplex RT-PCR identification of different bacteria by targeting the 16S rRNA and using melt profiles to determine the level of sequence variation relative to a reference strain.
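The estimate of starting copy number is usually made with a standard curve: the threshold cycle (Ct) of serially diluted standards of known copy number is plotted against log10(copies), and unknowns are read off the fitted line. The sketch below assumes an invented, noise-free dilution series; the slope of the line also gives the amplification efficiency.

```python
import math


def fit_standard_curve(copies: list[float], cts: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x, mean_y = sum(xs) / len(xs), sum(cts) / len(cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """1.0 means perfect doubling per cycle (slope of about -3.32)."""
    return 10 ** (-1 / slope) - 1


def estimate_copies(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve for an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series of a quantified standard.
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_cts = [31.0, 27.5, 24.0, 20.5, 17.0]

slope, intercept = fit_standard_curve(standards, standard_cts)
print(round(slope, 2), round(intercept, 2))             # -3.5 38.0
print(round(amplification_efficiency(slope), 3))       # 0.931
print(round(estimate_copies(25.75, slope, intercept))) # 3162
```

Because fewer starting copies need more cycles to cross the threshold, the slope is negative; a slope of -3.5 corresponds to about 93% efficiency rather than perfect doubling.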
The biggest advantage offered by RT-PCR is speed. It isn’t necessary to wait until all the cycles have been completed to have an indication of what the final result will be. Nor is there a need for a separate detection step since the amplification and detection occur simultaneously. Results may be available in minutes instead of hours or days. In the event of a major disease outbreak or bioterrorist act, speed is critical for starting a rapid response to control the situation as quickly as possible.
Amplification with short random primers
As previously mentioned, when PCR was first described in 1986, most DNA sequence analysis was done manually, with radioactive or fluorescent labels for detection and polyacrylamide gel electrophoresis for size separation of the analogue-terminated sequencing fragments. The first bacterial genome sequence would not be completed for nearly a decade, and the available DNA sequence data were limited. In 1983, a publication by Feinberg and Vogelstein [242] introduced the idea of using random hexamer oligonucleotides as primers to promote DNA polymerase synthesis from complementary DNA strands without sequence-specific primers. Random priming has been particularly helpful when the sequence of the strand to be copied is not known.
In 1990, two independent laboratories adapted this concept to PCR amplification and published their findings in the same issue of Nucleic Acids Research. The two methods (Arbitrary Primed-PCR, or AP-PCR [99]; and Random Amplified Polymorphic DNA-PCR, or RAPD-PCR [100]) were based on the same principle. The target DNA is PCR amplified with a single, randomly chosen oligonucleotide primer under low stringency, to promote binding to multiple sites on the target. Products are amplified from the loci that happen to have primer binding sites in the correct orientation and at an appropriate distance for amplification. Because the two protocols differed mainly in primer length, annealing temperature and a few minor parameters, the terms RAPD-PCR and AP-PCR are often used interchangeably. This method of strain typing has several advantages: no prior DNA sequence information is needed; the assay is simple to set up and perform; the same primer and conditions can be used to test different bacteria; and it can be performed at relatively low cost with only a thermocycler and an agarose gel electrophoresis unit [97]. The results can also reveal phylogenetic relationships, which makes the technique useful for global studies as well as for localized outbreaks.
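The dependence of RAPD/AP-PCR products on the orientation and spacing of primer sites, and on annealing stringency, can be sketched in silico. Here low stringency is modeled, very crudely, as tolerating a fixed number of mismatches between the primer and its binding site; the 10-mer primer, the template and the 3,000-bp size limit are invented for illustration. Real annealing depends on the reaction conditions, which is why results can vary between runs and laboratories.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_sites(template: str, site: str, max_mismatches: int) -> list[int]:
    """Start positions where `site` matches the template with few mismatches."""
    k = len(site)
    return [
        i
        for i in range(len(template) - k + 1)
        if sum(x != y for x, y in zip(template[i : i + k], site)) <= max_mismatches
    ]


def rapd_products(template: str, primer: str, max_mismatches: int = 1, max_len: int = 3000) -> list[int]:
    """Sizes of products a single arbitrary primer could amplify.

    A product needs a site where the primer's sequence occurs on the top strand
    (the primer anneals to the bottom strand and extends rightward) and, further
    downstream, a site where its reverse complement occurs (the primer anneals
    to the top strand and extends leftward).
    """
    t, p = template.upper(), primer.upper()
    forward = binding_sites(t, p, max_mismatches)
    reverse = binding_sites(t, reverse_complement(p), max_mismatches)
    sizes = {r + len(p) - f for f in forward for r in reverse if r >= f + len(p)}
    return sorted(s for s in sizes if s <= max_len)


# Hypothetical 10-mer primer and template; `mutant` has one mismatch in the second site.
primer = "GGACTCCAGT"
template = "T" * 20 + primer + "T" * 100 + reverse_complement(primer) + "T" * 20
mutant = "T" * 20 + primer + "T" * 100 + "T" + reverse_complement(primer)[1:] + "T" * 20

print(rapd_products(template, primer, max_mismatches=0))       # [120]
print(120 in rapd_products(mutant, primer, max_mismatches=1))  # True
print(rapd_products(mutant, primer, max_mismatches=0))         # []
```

The mutant template yields the 120-bp band only when a mismatch is tolerated, mirroring how a small change in stringency can make a band appear or disappear.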
However, the decreased stringency comes at the cost of reproducibility. RAPD and AP-PCR rely on the potentially tenuous pairing of a short primer with homologous regions of the genome, as allowed by the annealing conditions. The primer must bind to the target DNA in the proper orientation, and within an appropriate distance, for a product to be amplified. Minor changes in reaction conditions (e.g., temperature, osmolarity, pH, time interval, template concentration and purity) can significantly affect where the primer binds, and consequently which regions of the genome are amplified [243-246]. It can be difficult, if not impossible, to maintain exactly the same conditions across different personnel, equipment, and laboratories. Furthermore, depending on the G+C content of the target genome, some primer sequences perform better than others, so no single primer is ideal for all bacteria. As with most molecular typing methods, the power of discrimination can be increased by repeating the assay with additional primers. Despite these drawbacks, the technique has been frequently used for epidemiological investigations, often in conjunction with other typing methods [95,146,237,247-251].
PCR involving small, dispersed DNA repeats
To increase stringency while maintaining a broad genome context, researchers have exploited the small, dispersed DNA repeats found in many bacteria. There are many different types of these repeat units, distinguished by size, dispersal pattern, the presence or absence of palindromes, direct or inverted repeats with spacers, and other features. Some repeat types are associated only with a particular group of bacteria. Assays have been developed for several classes of repeats and employed in epidemiological studies, including: Repetitive Extragenic Palindromic PCR (REP-PCR) [113]; Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) [252,253]; BOX repeats [114,254,255]; Enterobacterial Repetitive Intergenic Consensus sequence PCR (ERIC-PCR) [113,256]; Mycobacterial Interspersed Repetitive Units (MIRU-PCR) [108,115]; and Vibrio cholerae Repeats PCR (VCR-PCR) [116]. These PCR-based assays use primers derived from the most conserved region of the designated repeat sequence. The intervening sequences between or within repeat units are amplified if the primer binding sites are appropriately spaced and oriented. Unrelated and distantly related strains are differentiated by the sizes of their respective intervening regions.
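Because repeat-based PCR strains are compared by the sizes of their amplified intervening regions, a fingerprint can be reduced to a list of band sizes. The sketch below is an illustration, not the algorithm of any commercial system: it scores two such lists with the Dice coefficient, a similarity measure commonly used for band-based fingerprints. The 2% sizing tolerance, the greedy band pairing, and the band sizes in the example are assumptions.

```python
# Band-based fingerprint comparison; tolerance and pairing rule are assumptions.
def shared_bands(a: list[float], b: list[float], tol: float = 0.02) -> int:
    """Count bands that pair up between fingerprints `a` and `b`.

    Two bands pair if their sizes differ by at most `tol` (relative to the
    larger band). Each band pairs at most once (greedy walk over sorted sizes).
    """
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def dice_similarity(a: list[float], b: list[float], tol: float = 0.02) -> float:
    """Dice coefficient: 2 * shared bands / total bands in both fingerprints."""
    if not a and not b:
        return 1.0
    return 2 * shared_bands(a, b, tol) / (len(a) + len(b))


strain_1 = [300, 520, 810, 1200]    # hypothetical band sizes (bp)
strain_2 = [305, 520, 1190, 1500]
print(dice_similarity(strain_1, strain_2))  # 0.75
```

The tolerance matters because gel or chip sizing is never exact; too tight a window splits bands that are really the same, and too loose a window merges distinct ones.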
Repetitive sequence-based PCR reactions are more robust and reproducible than RAPD-based assays [257]. REP-PCR technology has been developed into an automated system using Lab Chip microfluidics (DiversiLab System, bioMérieux, France) [258,259]. Packaged as an integrated system that includes equipment, reagents, detection, and analysis software, this technology is becoming popular with many clinical laboratories worldwide [260]. The system is being used to type many bacterial and fungal pathogens, especially drug-resistant strains [259].
Typing by small dispersed repeats has been applied extensively to field isolates [111,116,261]. REP-PCR has been used to type and study a highly diverse group of pathogens, including MRSA [258], Clostridium difficile [260], Acinetobacter baumannii [262], and vancomycin-resistant enterococci [109]. ERIC-PCR has been used to study uropathogenic Escherichia coli strains [263], Haemophilus parasuis [107], Helicobacter pylori [110], Vibrio parahaemolyticus [264], Burkholderia cepacia [265], and many others. In most studies, the assay was performed along with other molecular typing techniques so that the combined data would give as detailed and clear a picture as possible.
Amplified fragment length polymorphism (AFLP)
Another novel strategy to increase PCR primer stringency, and concomitantly improve assay reproducibility, is a variation of the RFLP concept known as Amplified Fragment Length Polymorphism (AFLP). The genomic DNA is digested with two restriction enzymes, one that cuts infrequently and one that cuts often. The two enzymes are selected to leave incompatible cohesive ends, which prevents the digested fragments from ligating to one another while two unique adapters are attached, one to each type of cohesive end. Specificity of the PCR reaction is achieved by designing a forward and a reverse primer, each matching its respective adapter sequence and synthesized with 2 or 3 extra nucleotides added to the 3' end. The primer that anneals to the end of the fragment containing the infrequent cutting site is labeled, so that only amplicons containing the less common restriction site are detected in a large pool consisting mostly of fragments with the frequent restriction site at both ends. The number of detectable fragments is further limited by the need for the 2-3 extra nucleotides on the 3' end of the primer(s) to match the complementary sequence in the fragment. As a result, only a fraction of the available fragment pool is amplified (theoretically 1/16 of the fragments if 2 nt are added to the primer end, and 1/64 if 3 nt are added, assuming a random and equal distribution of nucleotides in the sequence). AFLP was originally designed for complex eukaryotic genomes, and the size of the pool of detectable fragments can be tuned by the number of extra bases added to the 3' end of the primer and by adding the extra nucleotides to one versus both primers (for details, see [103-105]).
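The AFLP selection logic can be sketched in Python on a known sequence. This is a simplified model under stated assumptions: EcoRI (GAATTC) and MseI (TTAA) stand in as a typical rare/frequent cutter pair, fragment length is counted as the bases between the two recognition sites, adapters are not modeled, and the selective bases are read immediately inside the rare-cutter end. The function names are hypothetical, and `expected_fraction` simply reproduces the 1/16 and 1/64 arithmetic given above.

```python
# Simplified in-silico AFLP; function names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(genome: str, site: str) -> list[int]:
    """Start positions of every occurrence of a recognition sequence."""
    return [i for i in range(len(genome) - len(site) + 1) if genome.startswith(site, i)]


def aflp_fragments(genome: str, rare: str = "GAATTC", frequent: str = "TTAA",
                   selective: str = "AC") -> list[int]:
    """Lengths of rare/frequent fragments that the selective primer would amplify.

    Length is the number of bases between the two recognition sites.
    Frequent/frequent fragments (unlabeled) and rare/rare fragments are
    skipped to keep the sketch short.
    """
    sites = sorted([(p, len(rare), "rare") for p in find_sites(genome, rare)] +
                   [(p, len(frequent), "frequent") for p in find_sites(genome, frequent)])
    k = len(selective)
    kept = []
    for (p1, len1, kind1), (p2, _, kind2) in zip(sites, sites[1:]):
        start, end = p1 + len1, p2
        if end - start < k or {kind1, kind2} != {"rare", "frequent"}:
            continue
        if kind1 == "rare":
            flank = genome[start:start + k]        # read inward from a left-hand rare site
        else:
            flank = revcomp(genome[end - k:end])   # read inward from a right-hand rare site
        if flank == selective:
            kept.append(end - start)
    return kept


def expected_fraction(n_selective: int) -> float:
    """Share of fragments expected to match n random selective bases (1/4**n)."""
    return 1 / 4 ** n_selective


print(aflp_fragments("GAATTCACGGGGTTAAGGGTTAA"))   # [6]
print(aflp_fragments("TTAAGGGTGAATTC"))            # [4]
print(expected_fraction(2), expected_fraction(3))  # 0.0625 0.015625
```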
Initially, the protocol incorporated a radioactive tag on the primer for detection of the targeted fragments, but it has since been modified to use fluorescence detection (fAFLP) [266,267]. Although elegant, this technology has not been used to its full potential, partly because of cost, and partly because, at the time of its inception, whole prokaryotic genome sequences were beginning to accumulate in accessible databases. This made it possible to identify polymorphic loci and to design very specific, highly discriminating typing assays in silico.
Variable number tandem repeat (VNTR) and multi-locus VNTR analysis (MLVA)
For decades, eukaryotic genotyping has been based on the higher mutation rates associated with short tandem repeats (STRs), also called variable number tandem repeats (VNTRs) or microsatellites. VNTRs have elevated mutation rates, primarily due to slipped-strand mispairing by the polymerase during replication or repair [268]. This is especially pronounced when the repeat unit is short and present in many copies. For some bacteria, the number of repeats at a locus can change so quickly that even closely related strains differ in the number of repeats present. The accelerated mutation rate and resulting sequence variability make VNTRs the most discriminating loci in many bacteria [120,123,269]. VNTR analysis is especially valuable for bacteria that have highly conserved genomes. The power of discrimination can be increased significantly by analyzing additional loci that contain tandem repeats, an approach termed Multi-Locus VNTR Analysis, or MLVA. The multi-locus approach is highly effective for several important bacterial pathogens with conserved genomes, including Brucella species [119,270], Yersinia pestis [271,272], Bacillus anthracis [123,271], Bordetella pertussis [273], Francisella tularensis [274], Mycobacterium avium subspecies paratuberculosis [275], and others. MLVA is currently a very popular technique for bacterial typing [124]. In addition to being exceptionally discriminating and highly reproducible, it has the simplicity of a PCR assay, and detection can be carried out with limited funds by substituting high-percentage agarose or acrylamide gel electrophoresis for costly fluorescent primers and capillary electrophoresis equipment [271,276,277].
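MLVA alleles are usually reported as repeat copy numbers inferred from amplicon size. A minimal sketch, assuming each locus is described by a hypothetical flanking length (the amplicon size without the repeat array) and a repeat unit length; the locus names, lengths, and sizes below are invented for illustration.

```python
# Hypothetical MLVA loci: name -> (flanking bp outside the repeat array, repeat unit bp).
LOCI = {"VNTR-A": (150, 8), "VNTR-B": (210, 12)}


def repeat_count(amplicon_bp: float, flank_bp: int, unit_bp: int) -> int:
    """Infer repeat copy number; rounding absorbs small sizing errors."""
    return round((amplicon_bp - flank_bp) / unit_bp)


def mlva_profile(sizes: dict[str, float]) -> dict[str, int]:
    """Convert measured amplicon sizes into a repeat-count profile."""
    return {name: repeat_count(sizes[name], *LOCI[name]) for name in LOCI}


def loci_differing(p: dict[str, int], q: dict[str, int]) -> int:
    """Number of loci at which two MLVA profiles differ."""
    return sum(p[name] != q[name] for name in p)


iso_1 = mlva_profile({"VNTR-A": 198, "VNTR-B": 282})
iso_2 = mlva_profile({"VNTR-A": 207, "VNTR-B": 282})
print(iso_1, iso_2, loci_differing(iso_1, iso_2))
# {'VNTR-A': 6, 'VNTR-B': 6} {'VNTR-A': 7, 'VNTR-B': 6} 1
```

Working in repeat counts rather than raw sizes is what lets profiles obtained on inexpensive gels and on capillary systems be compared, provided each platform is calibrated for the same loci.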
Single nucleotide polymorphisms (SNPs), multi-locus sequence typing (MLST), and in/dels
The most common sequence variations found among closely related strains are single nucleotide polymorphisms (SNPs). These changes may be intragenic or extragenic, and may have a significant effect (e.g., creation of a stop codon) or no apparent effect at all (e.g., a synonymous codon change). The alternative nucleotides found at a specific sequence position are considered alleles. Although up to four alleles are possible at any single position, most often there are only two, since transitions (a purine replacing a purine, or a pyrimidine replacing a pyrimidine) are much more common than transversions (exchanges between a purine and a pyrimidine). Small insertions and deletions (in/dels or InDels) are also found, although they are usually much less common than SNPs, since an in/del of 1 or 2 nucleotides within a coding region causes a frameshift that may result in a nonfunctional gene product.
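The distinctions drawn here, between transitions and transversions and between frame-preserving and frameshifting in/dels, reduce to simple rules. A small Python sketch; the function names are hypothetical.

```python
# Simple SNP and in/del classification rules; function names are hypothetical.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def indel_shifts_frame(indel_length: int) -> bool:
    """In a coding region, an in/del shifts the reading frame unless its
    length is a multiple of three."""
    return indel_length % 3 != 0


print(substitution_type("A", "G"), substitution_type("C", "A"))  # transition transversion
print(indel_shifts_frame(1), indel_shifts_frame(3))              # True False
```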
Whole-genome resequencing and comparative genome hybridization have revealed a considerable number of SNPs. Fortunately, SNP data are clear, uncomplicated, and easily archived [278]. Databanks for human SNPs are the largest and most carefully curated, with 18 million reported SNPs as of 2009 [278-280]. Stable SNPs have been used to identify and differentiate many types of organisms, including human individuals (forensic identification, ancestral information, phenotypic information, and genealogical applications), plant species and varieties, endangered animals, parasites, viruses, pathogenic fungi, and microbial strains [133,281]. A web page maintained by the Human Genome Project (supported by the U.S. Department of Energy Genome Programs) lists some of the unusual ways that genotyping has been applied to solve crimes and mysteries and to address questions of migration patterns and population biology (http://www.ornl.gov/sci/techresources/Human_Genome/elsi/forensics.shtml).
The power of discrimination of a single SNP is not high, since there are at most four possible alleles. However, that power is substantially increased by analyzing multiple SNP loci [282,283]. With the amount of sequence and resequence data available in public databanks, SNPs for many organisms can be found in silico, and a number of software programs have been developed to simplify the process [284-289]. For bacteria, Maiden [133,137] developed a technique based on sequencing seven to eleven housekeeping genes from the targeted organism to identify conserved SNPs. The technique, called Multi-Locus Sequence Typing (MLST), has since been broadened to include other types of loci, including virulence genes [126], antibiotic resistance genes [128], and genes associated with serotype [130].
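In MLST, each distinct sequence at a locus receives an allele number, and the combination of allele numbers across loci defines a sequence type (ST). The sketch below numbers alleles and STs locally, in order of first appearance; this is an assumption for illustration, since real schemes assign numbers through curated central databases. Locus names and sequences are hypothetical.

```python
# Local allele and ST numbering for illustration; real MLST schemes use curated databases.
def assign_sequence_types(
        isolates: dict[str, dict[str, str]]) -> dict[str, tuple[tuple[int, ...], int]]:
    """Map each isolate to (allelic profile, sequence type).

    Any sequence difference at a locus, even a single SNP, yields a new
    allele number; each new combination of allele numbers yields a new ST.
    Numbers are assigned in order of first appearance.
    """
    allele_numbers: dict[str, dict[str, int]] = {}
    st_numbers: dict[tuple[int, ...], int] = {}
    result = {}
    for isolate, loci in isolates.items():
        profile = []
        for locus in sorted(loci):
            known = allele_numbers.setdefault(locus, {})
            seq = loci[locus]
            if seq not in known:
                known[seq] = len(known) + 1
            profile.append(known[seq])
        key = tuple(profile)
        if key not in st_numbers:
            st_numbers[key] = len(st_numbers) + 1
        result[isolate] = (key, st_numbers[key])
    return result


types = assign_sequence_types({
    "iso1": {"locA": "ACGT", "locB": "TTGA"},   # hypothetical loci and sequences
    "iso2": {"locA": "ACGT", "locB": "TTGA"},
    "iso3": {"locA": "ACGA", "locB": "TTGA"},
})
print(types)
# {'iso1': ((1, 1), 1), 'iso2': ((1, 1), 1), 'iso3': ((2, 1), 2)}
```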
One of the major challenges for SNP typing is choosing a method for polymorphism detection. Many techniques have been developed for the detection of SNPs (for reviews, see [290-292]). Direct sequencing is the most reliable and comprehensive, but until recently it was too costly and labor-intensive for routine analysis. As already mentioned, PCR-RFLP is an easy and inexpensive method if the SNP happens to fall within a restriction enzyme recognition sequence. The amplified target is digested with a restriction enzyme whose recognition sequence spans the polymorphic nucleotide, and the digest fragments are sized to determine whether the recognition site is intact or disrupted.
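The PCR-RFLP readout can be sketched as an in-silico digest: find the recognition site in the amplicon, cut at a fixed offset, and report fragment sizes. The example assumes an EcoRI site (G^AATTC, cut after the first base) and invented amplicon sequences; only the top strand is scanned, which suffices for palindromic sites such as this one.

```python
# In-silico PCR-RFLP; sequences are invented and the enzyme is an example.
def digest(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Fragment sizes after cutting at every occurrence of `site`.

    `cut_offset` is where the top strand is cut within the site,
    e.g. 1 for EcoRI (G^AATTC).
    """
    cuts = [i + cut_offset for i in range(len(amplicon) - len(site) + 1)
            if amplicon.startswith(site, i)]
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


allele_1 = "A" * 10 + "GAATTC" + "A" * 14   # site intact
allele_2 = "A" * 10 + "GAGTTC" + "A" * 14   # A->G at the SNP disrupts the site
print(digest(allele_1, "GAATTC", 1))  # [11, 19]
print(digest(allele_2, "GAATTC", 1))  # [30]
```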
A different approach to SNP detection is allele-specific primer extension based on 3' mismatch PCR. If the 3'-terminal base of the PCR primer is mismatched with the template DNA, no product is amplified. This analysis can be performed in real time by using dyes or probes to monitor the accumulation (or lack thereof) of amplified product [293-298]. An increasingly prevalent method for multiplexed real-time detection of MLST polymorphisms is melting curve analysis, which compares the temperature-induced melting of alleles within targets of 50-500 bp. Melting is monitored with dyes that fluoresce only when bound to double-stranded DNA; as the DNA is heated, the duplex melts and the fluorescence is lost. High resolution melt (HRM) analysis takes a large number of fluorescence readings on each sample during melting, documenting the changes in fluorescence that accompany small temperature increments. This technique is reported to have excellent resolution for SNPs [299-301]. A major advantage of this approach is its low cost, estimated at $20 per isolate compared with $100 per isolate for direct sequencing [299].
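Melt curves are commonly summarized by the melting peak, the temperature at which the negative derivative of fluorescence, -dF/dT, is largest. The Python sketch below builds synthetic sigmoid curves (an assumption standing in for instrument data) and locates the peak with central differences. HRM software typically also compares the shapes of normalized curves, which this sketch does not attempt.

```python
import math


def synthetic_melt(tm: float, temps: list[float], width: float = 0.8) -> list[float]:
    """Hypothetical normalized melt curve: fluorescence falls sigmoidally around `tm`."""
    return [1 / (1 + math.exp((t - tm) / width)) for t in temps]


def melting_peak(temps: list[float], fluorescence: list[float]) -> float:
    """Temperature at which -dF/dT is largest (central differences)."""
    best_t, best_rate = temps[1], float("-inf")
    for i in range(1, len(temps) - 1):
        rate = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if rate > best_rate:
            best_t, best_rate = temps[i], rate
    return best_t


temps = [70 + 0.1 * i for i in range(201)]   # 70-90 deg C in 0.1 deg C steps
peak_1 = melting_peak(temps, synthetic_melt(81.3, temps))
peak_2 = melting_peak(temps, synthetic_melt(81.8, temps))
print(round(peak_1, 1), round(peak_2, 1))    # 81.3 81.8
```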
Primer extension with fluorescently tagged, chain-terminating deoxynucleotide analogues, followed by capillary electrophoresis, is another fairly easy technique that can be readily multiplexed. The polymerase adds a single chain-terminating nucleotide to the primer at the site of the SNP, according to the test strain's allele at that locus. Each of the four chain-terminating dideoxynucleotides is tagged with a different fluorescent color. The assay is multiplexed by using a primer of different length for each locus, extending the primers with the fluorescently tagged ddNTPs over numerous cycles, and then separating and detecting the extended products by fluorescent capillary electrophoresis. The result is a ladder-like array of fragments, with each locus fluorescing the color of the allele present in the test strain. One multiplex analysis of 44 SNP markers has been developed for human identification [283]. This strategy is available in commercial kits from a number of companies (e.g., iPlex by Sequenom; MegaBACE SNuPe for SNP genotyping by GE Healthcare; and SNaPshot by Life Technologies-Applied Biosystems), increasing its utility. Research is progressing toward increasingly large multiplexed SNP typing assays; one array-based method allowed the development of a 124-plex SNP genotyping assay [282].
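The multiplexed single-base extension readout can be modeled as a list of (size, color) peaks. The sketch assumes each extended product is exactly one nucleotide longer than its primer and uses placeholder dye names, since dye chemistries differ between commercial kits; locus names and primer lengths are hypothetical. In practice, dye labels also affect electrophoretic mobility, so primers are usually designed several nucleotides apart rather than one.

```python
# Placeholder dye names; real kits use their own dye chemistries.
DYE = {"A": "green", "C": "yellow", "G": "blue", "T": "red"}


def extension_ladder(loci: list[tuple[str, int, str]]) -> list[tuple[int, str, str]]:
    """Simulate a multiplexed single-base extension readout.

    Each locus is (name, primer length in nt, base added by the polymerase).
    Every primer is extended by exactly one dye-labeled ddNTP, so the
    product is primer length + 1; peaks are returned in size order.
    """
    sizes = [length + 1 for _, length, _ in loci]
    if len(set(sizes)) != len(sizes):
        raise ValueError("each locus needs a distinct primer length")
    return sorted((length + 1, DYE[base], name) for name, length, base in loci)


# Hypothetical three-locus panel for one test strain.
print(extension_ladder([("snp1", 30, "G"), ("snp2", 22, "T"), ("snp3", 38, "A")]))
# [(23, 'red', 'snp2'), (31, 'blue', 'snp1'), (39, 'green', 'snp3')]
```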
Ligase chain reaction (LCR) is an elegant strategy that works well for SNP detection [302]. Similar in principle to some PCR detection methods, it uses two adjacent oligonucleotides (oligos) designed so that Oligo-1 has one of the SNP alleles as its terminal 3' base, while the 5' end of Oligo-2 is juxtaposed to it. When the two oligos hybridize with the test DNA, Oligo-1 is ligated to Oligo-2 if the test DNA allele matches the oligo's allele. However, if the 3' allele of Oligo-1 is mismatched with the test DNA's allele, the two oligos are not positioned correctly for the ligase to join them. As in PCR, the oligos and target DNA are processed through repeated temperature cycles suitable for denaturation, hybridization, and ligation. Thermostable ligases are commercially available [303]. LCR can be multiplexed by designing the ligated products to be a different size for each locus and separating them by gel or capillary electrophoresis.
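The LCR decision rule can be sketched as a string comparison. In this simplified model (an assumption, not a thermodynamic simulation) both oligos are written in the same sense as the target sequence, Oligo-1 ends at the SNP position, and ligation is scored only if both oligos, including Oligo-1's 3'-terminal allele base, match perfectly. Names and sequences are hypothetical.

```python
from typing import Optional


def lcr_product(target: str, snp_pos: int, oligo1: str, oligo2: str) -> Optional[int]:
    """Length of the ligated Oligo-1 + Oligo-2 product, or None if no ligation.

    Simplified model: both oligos are written in the same sense as `target`
    (so they anneal to its complementary strand). Oligo-1 ends at `snp_pos`
    with the allele-specific base; Oligo-2 starts at `snp_pos + 1`.
    """
    start1 = snp_pos - len(oligo1) + 1
    if start1 < 0:
        return None
    matches_1 = target[start1:snp_pos + 1] == oligo1
    matches_2 = target[snp_pos + 1:snp_pos + 1 + len(oligo2)] == oligo2
    return len(oligo1) + len(oligo2) if matches_1 and matches_2 else None


test_dna = "TTGACCGTAGCATGCA"                       # allele T at index 7
print(lcr_product(test_dna, 7, "ACCGT", "AGCAT"))  # 10 (allele matches)
print(lcr_product(test_dna, 7, "ACCGC", "AGCAT"))  # None (3' mismatch)
```

Giving each locus a different combined oligo length is what allows several such reactions to be read side by side on one gel or capillary run.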
One advantage of multiplex SNP typing over MLVA is that SNP assays target a much smaller region of DNA; therefore, SNP analysis has a higher success rate with degraded DNA than VNTR analysis. The disadvantage is that SNP analysis requires about four times as many loci to match the discrimination level of MLVA [290]. Some previous estimates have suggested a minimum of 40 loci per assay [283]. However, with the numerous multiplex detection methods available and SNP data from whole bacterial genome projects, this need not be a major hurdle [282].
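Comparisons of how many loci are needed to reach a given level of discrimination are often quantified with the Hunter-Gaston discriminatory index, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where N is the number of isolates and n_j the number assigned to type j. The index is a standard choice but is introduced here for illustration; the allele data below are invented. The sketch shows how combining two biallelic SNPs raises D over a single SNP.

```python
from collections import Counter


def discriminatory_index(types: list) -> float:
    """Hunter-Gaston discriminatory index (Simpson's index of diversity).

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), where N isolates fall
    into types with n_j members each.
    """
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    return 1 - sum(c * (c - 1) for c in Counter(types).values()) / (n * (n - 1))


# Hypothetical alleles of six isolates at two biallelic SNP loci.
snp_1 = ["A", "A", "A", "G", "G", "G"]
snp_2 = ["C", "T", "C", "T", "C", "T"]
print(round(discriminatory_index(snp_1), 3))                    # 0.6
print(round(discriminatory_index(list(zip(snp_1, snp_2))), 3))  # 0.867
```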
High-throughput DNA sequencing and whole genome sequencing (WGS)
For many years, DNA sequencing was performed by Sanger's chain-termination method, with the truncated fragments size-separated on manually poured, extra-long polyacrylamide gels and detected with radioactive or luminescent tags recorded onto photographic film. Producing quality sequence required skill and patience. Automation, PCR (and other types of enzymatic DNA amplification), and the availability of fluorescently tagged dideoxynucleotides have resulted in DNA sequencing systems that are easier and faster to use, with higher throughput. Greater coverage per run is also contributing to the rapidly increasing amount of sequence data deposited into the three major international DNA databanks: GenBank, EMBL-Bank, and DDBJ. The rapid expansion of sequence data has fueled many of the molecular technologies previously mentioned.
The first genome to be completely sequenced was that of Haemophilus influenzae Rd in 1995 [304], soon followed by the much larger genome of Escherichia coli K12 in 1997 [305]. For the 10-year anniversary of this momentous achievement, a review by Binnewies and colleagues offers a detailed and interesting perspective on this period [306]. Recently, new strategies for DNA sequencing have led to new generations of sequencing methods and equipment [75,76,244-246]. These new technologies have dramatically lowered the cost of DNA sequencing while increasing throughput (for a summary of costs associated with genomic sequencing from September 2001 until July 2011 [the most recent data currently available], see: Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing; Available at: http://www.genome.gov/sequencingcosts. Accessed: 28 September 2011). These technical improvements have triggered a rapid rise in the number of completely sequenced genomes, which currently stands at approximately 1500 complete prokaryotic genomes (Complete Microbial Genomes at http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi; accessed September 2011). As a result of the increased accessibility of whole-genome sequencing, researchers have deposited in GenBank and other databases the complete resequence data for multiple strains of many pathogenic bacterial species. Concurrent with the expansion of high-throughput sequencing platforms has been the application of whole-genome sequencing (WGS) to molecular epidemiology [69,70,72-74,129]. For epidemiological studies, scientists are beginning to sequence whole genomes from individual bacterial colonies, and even single bacteria [69,73], isolated directly from disease outbreaks.
The sequences are then compared with designated reference genomes or with sequences already available in GenBank (http://www.ncbi.nlm.nih.gov/), EMBL-Bank, and DDBJ, as well as in numerous specialized DNA and project databases [74,243,307,308].
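Comparisons with a reference genome often come down to counting SNP differences between aligned sequences. A minimal sketch, assuming the genomes have already been aligned to a common reference (for example, as consensus sequences from read mapping) so that positions correspond; gaps and ambiguous bases are skipped. Names and sequences are hypothetical.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """SNP differences between two aligned sequences, skipping gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    bases = set("ACGT")
    return sum(x != y for x, y in zip(a, b) if x in bases and y in bases)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every pair of named, aligned genomes."""
    return {(p, q): snp_distance(aligned[p], aligned[q])
            for p, q in combinations(sorted(aligned), 2)}


genomes = {                      # hypothetical aligned sequences
    "ref":  "ACGTACGTAC",
    "isoA": "ACGTACGTAC",
    "isoB": "ACGAACGTAC",
    "isoC": "ACGAACNTTC",
}
for pair, d in distance_matrix(genomes).items():
    print(pair, d)
```

Small pairwise distances among outbreak isolates, relative to background strains, are the kind of signal WGS-based investigations look for, although the threshold for "closely related" depends on the organism.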
As a matter of practicality, WGS produces far more data than is needed for routine epidemiological investigations. Therefore, the information gleaned from WGS projects is being used as the basis for customizing some of the existing technologies. Comparisons across the whole genomes of multiple related bacterial strains are revealing deeper phylogenetic relationships than previously possible. As a result, better and more informative assays are being designed, including SNP arrays for Clostridium difficile [267], Mycobacterium leprae [74], Bacillus anthracis [266,307], Francisella tularensis [129], and Yersinia pestis [243]; and MLVA panels for many pathogenic bacteria, including Acinetobacter baumannii [309], Staphylococcus aureus [310], Legionella pneumophila [311], Salmonella enterica [312], and Brucella species [119,270,313].
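One way WGS data can guide the design of a smaller SNP panel is to choose positions that together distinguish every pair of genomes that differ at all. The sketch below uses a greedy heuristic, which is an assumption for illustration: it does not guarantee the smallest panel and ignores the phylogenetic criteria that real assay design usually weighs. Genome names and sequences are hypothetical.

```python
from itertools import combinations

BASES = set("ACGT")


def pairs_split(aligned: dict[str, str], pos: int) -> set[tuple[str, str]]:
    """Genome pairs carrying different unambiguous bases at `pos`."""
    names = sorted(aligned)
    return {(p, q) for p, q in combinations(names, 2)
            if aligned[p][pos] in BASES and aligned[q][pos] in BASES
            and aligned[p][pos] != aligned[q][pos]}


def greedy_snp_panel(aligned: dict[str, str]) -> list[int]:
    """Pick positions until every distinguishable pair is distinguished.

    Greedy heuristic: always take the position that separates the most pairs
    not yet separated (ties go to the lower position). Not guaranteed minimal.
    """
    length = len(next(iter(aligned.values())))
    candidates = {pos: pairs_split(aligned, pos) for pos in range(length)}
    candidates = {pos: s for pos, s in candidates.items() if s}
    target = set().union(*candidates.values())
    panel, covered = [], set()
    while covered != target:
        best = max(candidates, key=lambda pos: (len(candidates[pos] - covered), -pos))
        panel.append(best)
        covered |= candidates[best]
    return panel


print(greedy_snp_panel({"g1": "AAAA", "g2": "AGAA", "g3": "AGTA", "g4": "AATA"}))  # [1, 2]
```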
Combined technologies
Many of the individual typing technologies described in this review have been combined to provide more data, better discrimination, and/or easier methods, especially since the introduction of PCR. Only a very small number of the many published combinations are listed in Table 1. Spoligotyping is worth noting, as this technique has been particularly valuable for the epidemiology of pathogenic Mycobacterium species, especially the Mycobacterium tuberculosis complex strains responsible for human disease [165,166,168-170,314]. Mycobacteria have multiple discrete genomic regions, each containing dozens of small direct repeats. The repeats (about 35 bp in size) bracket unique (non-repeated) sequence “spacers” of about the same size (35 to 40 bp), in an arrangement sometimes referred to as Clustered Regularly Interspaced Short Palindromic Repeats, or CRISPRs. Spoligotyping (spacer oligonucleotide typing) is a combination of direct repeat PCR and macroarray (membrane) hybridization. A CRISPR database, along with programs to find and compare CRISPR elements, has been developed for genome comparisons [315-317]. While spoligotyping is best known for its application to tuberculosis epidemiology, the technique has recently been applied to Corynebacterium diphtheriae [318,319]. CRISPR typing has also been performed on isolates of Yersinia pestis [320] and Salmonella [321], and other bacteria have been shown to have CRISPR elements.
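Spoligotype results are recorded as the presence or absence of each spacer and, for the M. tuberculosis complex, are often condensed into a 15-digit octal code from the 43 spacers of the standard membrane. The details of that convention come from outside this review and are stated here as an assumption: spacers 1 to 42 are read in groups of three, each group becoming one octal digit, and spacer 43 is appended as a final 0 or 1.

```python
def spoligo_octal(pattern: str) -> str:
    """Convert a 43-spacer presence/absence string to the 15-digit octal code.

    Assumed convention: spacers 1-42 are read in 14 groups of three binary
    digits, each group becoming one octal digit; spacer 43 is appended as a
    final 0 or 1.
    """
    if len(pattern) != 43 or set(pattern) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    digits = [str(int(pattern[i:i + 3], 2)) for i in range(0, 42, 3)]
    return "".join(digits) + pattern[42]


print(spoligo_octal("1" * 43))          # 777777777777771
print(spoligo_octal("101" + "0" * 40))  # 500000000000000
```

The octal form is simply a compact, lossless rewrite of the binary pattern, which makes spoligotypes easy to store and match in databases.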
The future of molecular epidemiology
The number of bacteria that have been studied by molecular epidemiological methods is remarkable, considering how new some of the technology is. We are currently in another boom of expansion, with the rapid development of new generations of high-throughput DNA sequencing strategies [68,77,322]. These new systems have reduced the amount of starting DNA/RNA required for analysis to minute, easily procured quantities, in some cases a single genome copy [73,323].
There is no doubt that molecular strain typing will continue to grow in importance and that newer technologies will evolve and be applied to epidemiological investigations. Whole-genome sequencing (WGS) is currently among the fastest evolving technologies. The information derived from WGS has provided a much clearer picture of the phylogenetics and transmission mechanisms of many pathogens, including Mycobacterium leprae [74], Vibrio cholerae [69], Salmonella enterica [72], Staphylococcus aureus [70], and a Shiga toxin-producing Escherichia coli with an unexpected serotype, O104:H4 [73].
At this point in time, molecular technology, as applied to diagnostics and epidemiology, is predominantly in the design and development phase, as indicated by publication content. Implementation of molecular assays in field and clinical laboratory repertoires has occurred to a small degree, but has generally been surprisingly slow, considering the advantages these methods can offer. One concern expressed by many is quality control assurance once the tests are put into practice. The question has been raised, and in some cases studied, for some of the most common methods and techniques, including RAPD-PCR [324,325], REP-PCR and ERIC-PCR [325,326], PFGE [325,327], DNA extraction and PCR [328], microarray production and use [329-331], multi-locus VNTR analysis (MLVA) [332], and DNA sequencing [333,334]. Clearly, consensus quality control standards for molecular tests will need to be established soon, to reassure the public that the test results are credible.
One of the greatest challenges in implementing molecular techniques to track diseases, trace their origins, improve disease surveillance, and control further spread is the high cost of the equipment and reagents needed to perform them. Yet many of the regions where infectious diseases are endemic or enzootic are in developing countries that simply do not have the economic resources to provide these tests. While some of the recent technologies are beyond the reach of many countries, many low- to medium-cost technologies are still available, and several have been mentioned in this review. RAPD-PCR and PFGE have repeatedly been shown to be sufficiently discriminatory for tracking many pathogens. Undoubtedly, inexpensive classical typing tools such as serotyping, culturing, some forms of biotyping, and phage typing will continue to be included in the epidemiologist's toolbox. Meanwhile, as the newer technologies become more commonly used, the cost per test will likely decrease, as history has shown.
As more of the molecular technologies become available, the best practice will likely be a combination of techniques that can simultaneously corroborate other test data and answer specific questions within an epidemiological investigation. This trend is already apparent in the literature, where many studies incorporate multiple complementary techniques. Because bacteria are so variable in their lifestyles, no single test will be optimal for all pathogens. Techniques that perform best for typing genetically dynamic pathogens typically cannot adequately differentiate highly clonal, genetically conserved bacteria. Conversely, bacteria that undergo rapid and sustained evolution may not retain hypermutagenic markers long enough to be of use.
The future innovations in molecular technology are certain to have direct application to molecular epidemiological studies. Now, with the amount of information that can be gleaned from whole genome sequencing analysis for the epidemiological study of bacterial diseases, the future of molecular epidemiology looks very bright, indeed.
Author Disclosure Statement
The author has no competing financial interests. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the United States Department of Agriculture.