Received Date: September 22, 2015; Accepted Date: October 05, 2015; Published Date: October 12, 2015
Citation: Edlind T, Liu Y (2015) Development and Evaluation of a Commercial Sequence-Based Strain Typing Service for Listeria monocytogenes. J Microb Biochem Technol 7:351-362. doi:10.4172/1948-5948.1000238
Copyright: © 2015 Edlind T, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Listeria monocytogenes is an important foodborne pathogen and, relatedly, a persistent contaminant in many food processing facilities. Strain typing is critical to detecting and investigating outbreaks, and can be used to track down sources of contamination. The typing systems in current use have limitations ranging from low resolution and data portability to high cost and technical complexity. The aim of this study was to develop an outsourcing option for L. monocytogenes strain typing that addresses these limitations. The NCBI Genomes database representing 109 strains was screened for highly informative, tandem repeat-containing loci. The most promising, termed LmMT1 (0.8-1 kbp) and LmMT2 (0.7-0.8 kbp), exhibited complex patterns of polymorphism (insertions/deletions and single nucleotide polymorphisms), diversity indices of 0.99 (LmMT1) and 0.98 (LmMT2), and were present in all L. monocytogenes and from one (LmMT1) to four (LmMT2) additional Listeria species represented in NCBI databases. Using a distinct set of strains, Miya et al. (J. Microbiol. Methods, 2012, 90:285-291) previously reported diversity indices of 0.95 and 0.91 for more limited regions (0.3-0.5 kbp) of these same two loci. Phylogenetic analysis of LmMT1 and LmMT2 sequences revealed distinct clusters corresponding to serotype (4b, 1/2a, and 4a complexes) and evolutionary lineage (I, II, and III/IV). Comparisons to PFGE, the current gold standard, suggest that LmMT1 typing is comparably discriminatory. Importantly, strains from four outbreaks formed corresponding clusters, although those from a 2002 outbreak were resolved into related environmental and food/human isolates, a result that challenges their epidemiological connection. In the laboratory, LmMT1 and LmMT2 typing proved to be robust, generating high quality sequence from colonies submitted as non-hazardous heat-inactivated suspensions.
Analysis of LmMT1 sequences from 62 diverse strains demonstrated overall agreement with single nucleotide polymorphism-based typing.
Listeria; Foodborne outbreak; Genotype; Phylogenetic tree; PFGE; MLVA
PFGE: Pulsed-Field Gel Electrophoresis; MLVA: Multilocus Variable Number of Tandem Repeats Analysis; MLST: Multilocus Sequence Typing; WGS: Whole Genome Sequencing; MLGT: Multilocus Genotyping; Indels: Insertions and Deletions; SNPs: Single Nucleotide Polymorphisms
Listeria monocytogenes is an environmentally ubiquitous bacterium capable of contaminating various foods, particularly dairy products and processed meats, but also smoked fish and raw fruits and vegetables [1,2]. Contamination is aided by its ability to grow at low temperatures and its tolerance to freezing, drying, and heat; long term persistence in food processing facilities has been well documented [3-6]. Upon ingestion, L. monocytogenes can cause severe infection, particularly in immunocompromised individuals, the elderly, pregnant women, and neonates . Its virulence derives from its abilities to survive and grow within host phagocytes and aggressively spread through tissues, mediated by internalin proteins on its surface and induction of host actin polymerization [8,9]. Although most cases of food-borne listeriosis are sporadic, there have been multiple outbreaks in recent years, including one in 2011 responsible for 147 infections and 33 deaths across 28 states that was traced to contaminated cantaloupe (www.cdc.gov/listeria/outbreaks).
Tracking down the source of food-borne listeriosis is a challenging task that begins with laboratory culture (enrichment followed by plating) and species identification (to distinguish L. monocytogenes from its non-pathogenic sister species). This may be followed by conventional serotyping, although its limitations including low resolution (serotypes 4b, 1/2a, and 1/2b predominate) and high cost have largely led to its replacement with molecular typing systems. Among the latter, the gold standard is pulsed-field gel electrophoresis (PFGE) comparing fragment lengths following restriction enzyme digestion of chromosomal DNA . Its primary advantage is high resolution, but its limitations include lengthy turnaround time, technical complexity, high cost, and the inherently low portability of length data, although the latter has been addressed through strict standardization and pattern analysis algorithms as implemented in the PulseNet system . The limitations of serotyping and PFGE have encouraged the development of numerous alternative L. monocytogenes typing methods [12-14]. The most promising and practical of these is multilocus variable number of tandem repeats analysis (MLVA), first adapted to L. monocytogenes by Murphy et al.  and subsequently by other laboratories [16-21]. MLVA typically employs multiplex PCR and capillary electrophoresis to assess length variation due to insertions and deletions (indels) in 9 or so loci that contain tandem repeats, which are typically polymorphic due to slippage between repeat units during DNA replication .
The inherently higher portability of DNA sequence-based methods, facilitating day-to-day and lab-to-lab comparisons, is exploited by methods including multilocus sequence typing (MLST) analyzing 7 loci , and the related multi-virulence-locus sequence typing (MVLST) analyzing 6 loci . These methods analyze single nucleotide polymorphisms (SNPs) within relatively conserved loci, and hence are more useful for analyzing evolutionary trends than for epidemiology. Furthermore, their multilocus requirement adds to their complexity and cost. Two additional methods representing variations on sequence analysis include multilocus genotyping (MLGT), which employs >100 probes and flow cytometry to detect the presence or absence of previously documented SNPs , and a microarray which detects the presence or absence of >100 genes previously shown to vary among strains . Development of both of these methods hinged on genomics; i.e., the availability in recent years of whole genome sequences (WGS) for large numbers of diverse L. monocytogenes strains. Indeed, WGS analysis itself is being explored as a direct approach to L. monocytogenes outbreak detection and investigation [32-35]. The WGS technologies commonly employed generate large numbers of short reads which are subsequently analyzed for SNPs, analogous to MLST and MVLST but providing considerably higher resolution. Unfortunately, short reads preclude the unambiguous sequence assembly of some tandem repeat regions. A general concern involves the substantial investments in equipment, reagents, and trained personnel required to implement MLGT, microarrays, and WGS as routine typing tools.
A more practical alternative to the multilocus or WGS approaches described above would be sequence-based typing of one (or two) highly polymorphic loci. Current representatives of this approach include Staphylococcus aureus spa typing , Streptococcus pyogenes emm typing , Neisseria meningitidis porA and fetA typing , and Campylobacter jejuni flaA typing . These loci were selected largely based on their previously characterized roles as immunodominant antigens or virulence factors in these four pathogens. Identifying highly informative loci for sequence-based typing of other bacterial pathogens might benefit from a less biased approach. For example, Miya et al.  identified several promising loci for sequence-based typing of L. monocytogenes based on MLVA data generated in their lab.
The approach used here to identify promising loci for sequence-based typing exploits the current availability of numerous genome sequences and, analogous to MLVA, focuses on tandem repeats as major contributors to polymorphism. In addition to indels, it was anticipated that these tandem repeat loci, consistent with their length polymorphism, would also exhibit higher rates of SNPs within flanking regions and within the repeats themselves.
The goal was to develop and evaluate a cost-effective but uncompromising outsourcing option for L. monocytogenes strain typing that could be used by food industries to monitor and track down sources of contamination, and by public health and clinical labs for surveillance and outbreak detection and investigation. In addition to identifying promising loci, it was critical to develop a simple and safe procedure for sample submission, robust approaches to sequencing these samples, and a format for typing results that emphasizes interpretability and utility.
Tandem repeats in complete genome sequences were identified using the Tandem Repeats Database (TRDB; https://tandem.bu.edu). BLASTN searches were conducted on the NCBI website (www.ncbi.nlm.nih.gov) against the NCBI Genomes database, and as needed against the WGS and Nucleotide collection databases. Downloaded sequences were trimmed to common termini, and aligned with clustalw2 (www.ebi.ac.uk/Tools/msa/clustalw2) in PHYLIP format using default parameters. Alignments were analyzed using dnapars (DNA parsimony), and dendrograms constructed using drawgram, both from the PHYLIP package (http://evolution.genetics.washington.edu/phylip.html). Simpson’s index of diversity was calculated using the formula D = 1 - Σ n(n-1) / [N(N-1)], summed over all alleles, where n=number of strains with a given allele and N=number of (epidemiologically unrelated) strains. PCR-based serogroups were determined by BLASTN searches for serogroup-specific genes . For strains lacking experimentally determined PFGE profiles but having complete genome sequences, profiles were modelled in silico (http://insilico.ehu.es/digest/index.php?mo=Listeria).
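As an illustration of the diversity calculation, the following minimal Python sketch computes Simpson's index from a list of per-allele strain counts. The function name and both example distributions are assumptions made for illustration; the per-allele counts behind the reported indices are not given in the text. The two hypothetical splits of 70 strains into 53 alleles bracket the possible values: the most even split gives roughly 0.99, and the most skewed split gives roughly 0.94.

```python
def simpson_diversity(allele_counts):
    """Simpson's index of diversity: D = 1 - sum(n*(n-1)) / (N*(N-1)).

    allele_counts: number of (epidemiologically unrelated) strains
    carrying each allele, one entry per allele.
    """
    total = sum(allele_counts)  # N
    if total < 2:
        raise ValueError("at least two strains are required")
    same_allele_pairs = sum(n * (n - 1) for n in allele_counts)
    return 1 - same_allele_pairs / (total * (total - 1))


# Hypothetical distributions of 70 strains into 53 alleles (not from the study).
most_even = [2] * 17 + [1] * 36    # 17 alleles shared by 2 strains each
most_skewed = [18] + [1] * 52      # one allele shared by 18 strains

print(round(simpson_diversity(most_even), 2))
print(round(simpson_diversity(most_skewed), 2))
```

Because the index depends on how strains are distributed among alleles, and not only on the number of alleles, the allele count alone does not determine D.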
Strains and culture conditions
A diverse set of 60 L. monocytogenes strains from the NRRL collection, including representatives of each lineage and serogroup, were obtained from T. Ward (USDA-ARS, Peoria, IL). Strains 1875 and 1877 are laboratory-generated deletion derivatives of F2365 , and ATCC strain 19115 was obtained from J. Brewster (USDA-ARS, Wyndmoor, PA). Strains were cultured on brain heart infusion agar at 37°C. As per website guidelines (www.microbitype.com/submissionguidelines), isolated colonies were suspended in provided buffer-containing tubes, bacteria were heat inactivated by incubating at 100°C for 15 min, and tubes were transported to MicrobiType (Plymouth Meeting, PA) by overnight courier at ambient temperature.
PCR and sequence analysis
Proprietary primers for PCR and sequencing were designed based on conserved regions identified by clustalw2 analyses of LmMT1 and LmMT2 loci, and synthesized by IDT (Coralville, Iowa). DNA was purified from heat-inactivated lysates, amplified with Taq polymerase, and subjected to Sanger dideoxynucleotide sequencing using proprietary modifications of standard methods. Chromatograms were visually scanned, and sequences edited as needed. All new LmMT1 sequences generated in this study have been submitted to GenBank with accession numbers KT626013-KT626044.
Identification of candidate typing loci LmMT1 and LmMT2
Tandem repeats were identified within the complete L. monocytogenes genome sequences for serotype 4b strain J1776 (GenBank accession CP006598) and serotype 1/2a strain EGD (accession HG421741) using TRDB (https://tandem.bu.edu). The repeats plus 500 nucleotides upstream and downstream flanking sequence were used as queries in BLASTN searches of the NCBI Genomes (Organism: Listeria monocytogenes) database which, as of March 2015, included 109 strains. Each repeat-containing region was evaluated for degree of polymorphism, presence in all or nearly all strains, and total length <1000 nucleotides to permit coverage by a single sequencing reaction. Two promising candidates were identified that satisfied these criteria.
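The three screening criteria can be sketched as a simple filter. This is only an illustrative Python sketch: the `Locus` record, the 95% presence threshold standing in for "all or nearly all strains", the use of allele count as the measure of polymorphism, and all example values are assumptions, not details taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str
    alleles: int            # distinct sequences observed among the genomes
    strains_with_hit: int   # genomes in which BLASTN found the region
    max_length: int         # longest observed region, in nucleotides


def screen(loci, total_strains, min_presence=0.95, max_len=1000):
    """Keep loci present in nearly all strains and shorter than max_len,
    ranked from most to least polymorphic (by allele count)."""
    kept = [
        locus for locus in loci
        if locus.strains_with_hit / total_strains >= min_presence
        and locus.max_length < max_len
    ]
    return sorted(kept, key=lambda locus: locus.alleles, reverse=True)


# Made-up candidates; names and numbers are placeholders.
candidates = [
    Locus("locus_a", alleles=50, strains_with_hit=109, max_length=950),
    Locus("locus_b", alleles=60, strains_with_hit=109, max_length=1500),  # too long
    Locus("locus_c", alleles=5, strains_with_hit=109, max_length=400),
    Locus("locus_d", alleles=40, strains_with_hit=60, max_length=700),    # often absent
]
ranked = screen(candidates, total_strains=109)
print([locus.name for locus in ranked])
```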
The first candidate, LmMT1, involves a 0.8-1 kbp region encoding a putative internalin (gene lmo1136 in strain EGD) with Pro-Val-Asp repeats (25 to 107 residues total). Surveys of the MLVA literature revealed that this CCGGTAGAT repeat was included in several validated MLVA assays, where its length polymorphism yielded the highest diversity index of all repeats; specifically, 0.87 to 0.93 [15,17-20]. Furthermore, LmMT1 includes the 0.3 kbp TR1 region analyzed by Miya et al.  that yielded a diversity index of 0.95 in sequence-based typing of 70 strains, mostly from Japan.
The second candidate, LmMT2, involves a 0.7-0.8 kbp region encoding Asp-Ala repeats (13 to 40 residues) within a putative peptidoglycan binding protein (gene lmo1799 in EGD). This GATGCR repeat was also included in several validated MLVA assays, where its length polymorphism yielded a diversity index of 0.80 to 0.88 [16,18-21]. Similarly, LmMT2 includes the 0.5 kbp TR2 which yielded a diversity index of 0.91 in the Miya et al.  studies noted above. Interestingly, this same gene encodes a second polymorphic Asp-Ala repeat which, however, was a less promising typing target due to its excessive length in many strains.
Clustalw2 alignments of LmMT1 and LmMT2 sequences from representative strains are shown in Figure 1. As expected, polymorphisms in these loci are mediated primarily by indels involving the tandem repeats (Figure 1A and 1B). However, there are also SNPs within the repeat regions, reflecting redundancy in the genetic code. Alignments of full-length LmMT1 (Figure 1C) and LmMT2 (not shown) sequences from serotype 4b and 1/2a strains reveal substantial polymorphism extending beyond the repeats to both flanking sequences, primarily SNPs but also additional indels. Thus, analysis of this region by length polymorphism alone, as is the case with MLVA, would be much less informative than sequence analysis that weighs both indels and SNPs.
Figure 1: ClustalW2 alignments of LmMT1 and LmMT2 sequences from representative L. monocytogenes strains illustrating the complex pattern of indel and SNP-based polymorphism. (A) Tandem repeat regions of serotype 4b strains L312 and F2365. (B) Tandem repeat regions of serotype 1/2a strains EGD and LS743. (C) Full-length LmMT1 sequences for serotype 4b strain J1816 and serotype 1/2a strain F6854. The tandem repeat region is underlined.
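The difference between length-based and sequence-based comparison can be made concrete with a short Python sketch that tallies SNPs and indel events in a gapped pairwise alignment. The function name is an assumption, and the sequences are toy constructs built from the CCGGTAGAT repeat unit, not real LmMT1 alleles. The second pair differs by a synonymous SNP only (CCG to CCA, both proline), so a length measurement would report the two as identical.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """Count SNP columns and indel events in a gapped pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indel_events = 0
    in_gap = False
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == gap or base_b == gap:
            if not in_gap:          # a contiguous run of gaps is one event
                indel_events += 1
            in_gap = True
        else:
            in_gap = False
            if base_a != base_b:
                snps += 1
    length_diff = abs(len(seq_a.replace(gap, "")) - len(seq_b.replace(gap, "")))
    return {"snps": snps, "indel_events": indel_events, "length_diff": length_diff}


# Toy pair 1: one repeat unit deleted, plus one SNP (GTA -> GTG) in the last unit.
pair1 = compare_aligned("CCGGTAGATCCGGTAGATCCGGTAGAT",
                        "CCGGTAGAT---------CCGGTGGAT")
# Toy pair 2: equal length, one SNP; invisible to length-only analysis.
pair2 = compare_aligned("CCGGTAGATCCAGTAGAT",
                        "CCGGTAGATCCGGTAGAT")
print(pair1)
print(pair2)
```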
All full-length LmMT1 and LmMT2 sequences from strains within the NCBI Genomes database were downloaded, aligned with clustalw2, and subjected to DNA parsimony analysis, weighting both indels and SNPs. For LmMT1, a total of 53 alleles (i.e., unique sequences) were resolved from analysis of the 109 strains in the NCBI Genomes database, as shown in the dendrogram (Figure 2). Note, however, that 37 of these 109 strains are annotated as “retail deli” isolates (e.g., R8-7914), collected from specific sites at multiple times over a 1 year period, as recently described . These 37 R8 isolates define only 6 total LmMT1 alleles, and 26 of the 37 define only 2 alleles, consistent with their characterization as “persistent” and the predominance of certain PFGE profiles . In the dendrogram, strains that share the same LmMT1 allele are represented by one strain (except for outbreak strains; see below), and the replicate strains (or their number in the case of R8 strains) are shown in parentheses. Based on their NCBI annotations and literature searches, it was estimated that 70 of the 109 L. monocytogenes strains in the NCBI Genomes database have no known epidemiological connection (for each retail deli isolate allele, one representative was included). The distribution of these 70 strains into the 53 alleles yielded a Simpson’s index of diversity of 0.99.
Figure 2: Dendrogram of LmMT1 sequences from all (as of March 2015) L. monocytogenes strains represented in the NCBI Genomes database. Serogroup 4b complex (lineage I), red; serogroup 1/2a complex (lineage II), blue; serogroup 4a complex (lineage III), green. PCR-based serotypes , non-bold. Replicates (strains with identical sequence as indicated strain, or their numbers in the case of retail deli isolates) of non-outbreak strains are shown in parentheses.
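Since an allele is defined here as a unique sequence, allele assignment amounts to grouping strains by identical sequence. The Python sketch below is a hedged illustration of that bookkeeping, mirroring the dendrogram convention of one representative strain with its replicates listed alongside. The function name, strain names, and sequences are placeholders, and it assumes sequences have already been trimmed to common termini.

```python
from collections import defaultdict


def group_alleles(seq_by_strain):
    """Group strains sharing an identical sequence.

    Returns {representative_strain: [replicate strains]}, where the
    representative is the first strain encountered with that sequence.
    """
    groups = defaultdict(list)
    for strain, seq in seq_by_strain.items():
        groups[seq.upper()].append(strain)
    return {members[0]: members[1:] for members in groups.values()}


# Placeholder strains and toy sequences (not real LmMT1 data).
toy = {
    "strain_1": "CCGGTAGATCCGGTAGAT",
    "strain_2": "CCGGTAGATCCGGTGGAT",
    "strain_3": "ccggtagatccggtagat",   # same allele as strain_1, different case
    "strain_4": "CCGGTAGAT",
}
alleles = group_alleles(toy)
for representative, replicates in alleles.items():
    print(representative, f"({', '.join(replicates)})" if replicates else "")
```

The per-allele counts (`1 + len(replicates)`) are the `n` values that enter Simpson's index of diversity.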
For LmMT2, phylogenetic analysis (Figure 3) resolved 38 alleles from the full-length sequences for 69 strains represented in the NCBI Genomes database. (The reduced number – 69 strains compared to 109 for LmMT1 – is largely due to reduced representation of R8 strains.) Again, based on their annotations and literature searches it was estimated that 59 of these strains are epidemiologically unrelated. This value combined with 38 alleles yielded a Simpson’s index of diversity of 0.98. Consistent with this slightly lower value, several sets of strains that were resolved with LmMT1 (Figure 2) were clustered with LmMT2 (Figure 3); notably, the two pairs of “NE US 2002 outbreak” strains and the two “Denmark 2002/1996” strains. The “Ontario 2008” and “Multistate 2000” outbreak strains clustered with both LmMT1 and LmMT2. (Complete LmMT2 sequence was available for only one of the “IL 1994” outbreak strains.)
Figure 3: Dendrogram of LmMT2 sequences from all (as of March 2015) L. monocytogenes strains represented in the NCBI Genomes database. Serogroup 4b complex (lineage I), red; serogroup 1/2a complex (lineage II), blue; serotypes 4a/4c (lineages III/IV), green. PCR-based serotypes , non-bold. Replicates (strains with identical sequence as indicated strain) of non-outbreak strains are shown in parentheses.
Although their intended use is for epidemiology rather than evolutionary analysis, it is noteworthy that both LmMT1 and LmMT2 dendrograms divide the strains into three distinct groups representing lineages I (serotype 4b complex, including 4b, 4d, and 4e), II (serotype 1/2a complex, including 1/2a, 1/2c, 3a, 3c), and III/IV (serotypes 4a, 4c), as previously described [30,43].
LmMT1 typing of epidemiologically related strains with comparison to PFGE
In light of its higher diversity index and ability to resolve closely related strains as noted above, LmMT1 was selected for further analysis. Below it is compared to PFGE with respect to four well characterized outbreaks:
(1) Identical LmMT1 sequences, and unique relative to all other strains in NCBI Genomes and WGS databases, were obtained for four serotype 1/2a strains: two from a 1988 outbreak (F6900, F6854) and two from a 2000 outbreak (J0161, J2818) traced to the same Texas meat processor . These strains are also indistinguishable by PFGE , and nearly identical based on whole genome SNP analysis . In addition to confirming their epidemiological connection, this lack of variation over 12 years is remarkable, and demonstrates the stability of LmMT1 typing.
(2) The two serotype 1/2b strains (R2-502 and R2-503) from the 1994 Illinois outbreak traced to chocolate milk  have identical, and unique, LmMT1 sequences. All strains associated with this outbreak analyzed by PFGE also had PFGE profiles that were indistinguishable or nearly so .
(3) The two serotype 1/2a strains (08-5578, 08-5923) from the 2008 Ontario outbreak traced to ready-to-eat meat have identical, unique LmMT1 sequences. By PFGE, these two strains are nearly indistinguishable, differing by a single AscI band which was shown by genome sequencing to be due to prophage insertion, a recognized source of PFGE profile variation in otherwise related strains .
(4) Four serotype 4b strains associated with the 2002 outbreak in northeastern U.S. traced to a Pennsylvania poultry processor  are indistinguishable by PFGE and MVLST , and also have identical LmMT2 sequences (Figure 3). They are, however, resolved by LmMT1 (Figure 2) into two closely related pairs - human/food strains J1776/J1926 and processing plant/environment strains J1816/J1817 - differing by a 9 bp indel within the repeat region (Figure 4). This split is supported by analysis of additional indel-based polymorphic loci (Figure 4). Furthermore, although both pairs have unique LmMT1 sequences relative to all other NCBI Genomes and WGS strains, the J1816/J1817 pair differs only by two SNPs from recent retail deli isolates such as R8-5726 (Figure 2). These data cast doubt on the presumed connection of environmental strains J1816 and J1817 to the 2002 outbreak .
Figure 4: LmMT1 and three additional polymorphic loci resolve the 2002 outbreak isolate pair J1776 (human) and J1926 (food) from the J1816 (food processing plant) and J1817 (environment) isolate pair. Nucleotide locations in the J1776 genome (GenBank accession NC_021839) are indicated. Polymorphisms were identified by BLASTN searches of the NCBI Genomes database with the J1776 genome (in 100 kbp increments), and screening for indels.
Comparisons between LmMT1 and PFGE for epidemiologically unrelated strains
Comparisons between LmMT1 (Figure 2) and available (experimental or in silico) PFGE typing results placed epidemiologically unrelated strains into one of three groups:
(1) Strains that are indistinguishable by both LmMT1 and PFGE include serotype 4b strains F2365 and LL195; serotype 1/2c or 3c strains SLCC2372, SLCC2479, and R2-561; and serotype 4a strains L99, M7, and HCC23.
(2) Strains that are indistinguishable by LmMT1 but resolved by PFGE include serotype 1/2a strains EGD and 10403S; and serotype 1/2b strains SLCC2755, SLCC2482, and N1-011A.
(3) Strains that are resolved by LmMT1 but indistinguishable by PFGE include serotype 4b strains L312 and Clip80459, and serotype 1/2a strains La111 and N53-1.
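The three-way grouping above can be reproduced for any two typing methods by comparing type labels pair by pair. The following Python sketch shows one way to do this; the function name, strain names, and type labels are all placeholders and do not correspond to the strains listed above.

```python
from itertools import combinations


def concordance(types_a, types_b):
    """Classify strain pairs by whether two typing methods separate them.

    types_a, types_b: {strain: type label} for methods A and B.
    Pairs resolved by both methods are not reported.
    """
    groups = {"same_by_both": [], "same_by_a_only": [], "same_by_b_only": []}
    shared = sorted(set(types_a) & set(types_b))  # strains typed by both methods
    for s, t in combinations(shared, 2):
        same_a = types_a[s] == types_a[t]
        same_b = types_b[s] == types_b[t]
        if same_a and same_b:
            groups["same_by_both"].append((s, t))
        elif same_a:
            groups["same_by_a_only"].append((s, t))
        elif same_b:
            groups["same_by_b_only"].append((s, t))
    return groups


# Placeholder data: method A stands in for LmMT1 alleles, method B for PFGE profiles.
lmmt1 = {"s1": "a1", "s2": "a1", "s3": "a2", "s4": "a2", "s5": "a3", "s6": "a4"}
pfge = {"s1": "p1", "s2": "p1", "s3": "p2", "s4": "p3", "s5": "p4", "s6": "p4"}
result = concordance(lmmt1, pfge)
print(result)
```

Here `same_by_a_only` corresponds to group (2) (indistinguishable by LmMT1, resolved by PFGE) and `same_by_b_only` to group (3).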
Typing of non-monocytogenes Listeria species
While human pathogenesis is limited to L. monocytogenes, Listeria ivanovii is a pathogen of ruminants; additional species are non-pathogenic but often co-isolate with, and must be differentiated from, L. monocytogenes. Thus, it would be advantageous if LmMT1 or LmMT2 typing clearly distinguished between these species. Eleven strains of four non-monocytogenes Listeria species are represented in the NCBI Genomes database (as of March 2015). Among these, the LmMT1 locus is present in only Listeria innocua, where it resolves all three strains (Figure 5A). The LmMT2 locus, on the other hand, is present in all 11 non-monocytogenes strains, 10 of which have complete sequences and are resolved into 8 alleles (Figure 5B). For both LmMT1 and LmMT2, the non-monocytogenes strains are clearly resolved from L. monocytogenes lineages I, II, and III/IV strains.
Figure 5: Dendrograms of LmMT1 (A) and LmMT2 (B) sequences from all non-monocytogenes Listeria species (grey bars) in NCBI Genomes database (as of March 2015). Strain names followed by: i, L. innocua; v, L. ivanovii; w, L. welshimeri; s, L. seeligeri. Representative L. monocytogenes strains from lineages I (red bar), II (blue bar), and III/IV (green bar) are included for comparison.
In addition to identifying promising loci, the development of a practical outsourcing option for strain typing relies on a simple and safe procedure for sample submission and robust approaches to sequencing these samples. Non-thermophilic and non-spore-forming bacteria are effectively inactivated by heating to 100°C, and in this form do not require expensive and cumbersome biohazard packaging and shipping. To test the compatibility of heat inactivation with LmMT1 and LmMT2 typing, isolated colonies from L. monocytogenes strains representing lineages I (serotypes 1/2b and 4b), II (1/2a and 1/2c), and III (4a) were prepared, submitted, and analyzed as described in Materials and Methods. The resulting chromatograms and sequences (representative results in Figure 6) were of uniformly high quality for both LmMT1 and LmMT2.
Figure 6: Representative chromatogram sections (3’-ends of tandem repeat) of (A) LmMT1 and (B) LmMT2 sequences from L. monocytogenes strain ATCC 19115 submitted as heat-inactivated single colony suspension. (C) ClustalW2 alignment of LmMT1 sequences generated from heat-inactivated colonies of representative L. monocytogenes strains. Tandem repeat regions, underlined.
Using this protocol, LmMT1 analysis was extended to an additional 56 L. monocytogenes strains and 1 L. innocua strain from the USDA-ARS collection, and the resulting dendrogram is shown in Figure 7. Of the 62 total strains, 34 defined 32 new alleles (i.e., unique LmMT1 sequence). The remaining 28 had LmMT1 sequences identical to 12 NCBI database strains (underlined in Figure 7). These include: strain F2365 and its laboratory-generated derivatives 1875 and 1877 which matched database F2365, strain ScottA which matched database Scott A, strain 33419 (derived from J0161) which matched database F6854 (epidemiologically related to J0161 as discussed above), and strain 33233 (derived from H7858) which matched database H7858. All of these matches were expected, and demonstrate that routine passaging does not alter LmMT1 sequence. The 59 epidemiologically unrelated strains (excluding L. innocua, 1875, and 1877) defined 44 LmMT1 alleles, and yielded a Simpson’s diversity index=0.98.
Figure 7: Dendrogram of LmMT1 sequences generated from heat-inactivated colonies of 62 strains in the USDA-ARS collection. Serogroup 4b complex (lineage I), red; serogroup 1/2a complex (lineage II), blue; serotypes 4a/4c/atypical 4b (lineages III/IV), green. PCR-based serotypes , non-bold; MLGT type , parentheses; identical match to NCBI Genomes or WGS database strain, underlined.
As observed above with NCBI strains, the USDA-ARS strains formed lineage-specific clusters that correspond fully to serogroup (with the exception of two atypical 4b strains in lineage III/IV). PFGE data were not available for these strains, but comparisons of LmMT1 typing to MLGT, which analyzes >100 SNPs using a flow cytometry-based approach , demonstrated overall agreement. Specifically, all MLGT-defined lineage I strains (i.e., 1.1 to 1.9) are also LmMT1 lineage I (Figure 4). Similarly, all MLGT-defined lineage II strains (2.1 to 2.9) are also LmMT1 lineage II, and all MLGT-defined lineages III/IV strains (3.15 to 4.4) are also LmMT1 lineages III/IV (including the two atypical serotype 4b strains). Considering strain clusters with identical LmMT1 sequence, 5 of 6 similarly include strains with identical MLGT type (e.g., the three serotype 1/2b strains with MLGT type 1.29). MLGT does, however, resolve a number of strains within several LmMT1 clusters (e.g., three serotype 1/2b strains with MLGT types 1.26, 1.28, and 1.49). Conversely, two MLGT 1.18 strains and 5 of 7 MLGT 2.10 strains were resolved by LmMT1 (Figure 7) due to a combination of indels and SNPs (Figure 8).
The goal of this study was to develop and validate a commercial sequence-based strain typing service for the foodborne pathogen L. monocytogenes that meets the following requirements: (1) resolution comparable to PFGE, the current gold standard; (2) clustering consistent with epidemiological relatedness, evolutionary lineage, and serotype; (3) data portability to facilitate day-to-day and lab-to-lab comparisons; (4) readily interpretable results that reference public domain databases; (5) simple and safe sample submission; (6) turnaround time of 2 to 3 days; and (7) affordable cost. To meet this goal, a genomics-based approach was employed which led to the identification of LmMT1 and LmMT2. These tandem repeat-containing loci were present in all L. monocytogenes genomes, and could be amplified and sequenced by robust, inexpensive methods that are compatible with samples submitted as heat-inactivated colonies. Both demonstrate complex patterns of strain-dependent indels and SNPs that can be readily compared, using standard bioinformatics tools, to publicly available NCBI sequence databases currently representing >300 total strains. Importantly, LmMT1 sequence analysis was demonstrated here to provide strain resolution comparable to PFGE, and hence sufficient for outbreak detection and investigation in the public health sector and tracking down contamination in the food processing sector. Furthermore, LmMT1 and LmMT2 include the more limited TR1 and TR2 regions, respectively, previously shown by Miya et al.  using a distinct set of strains to exceed the resolution obtained by virulence gene-based MLST. Finally, the PCR-based technology behind LmMT1 and LmMT2 typing provides the potential to be used with unpurified and mixed samples, while technologies such as PFGE and whole genome sequencing rely on pure cultures. This is highly relevant in light of the trend toward culture-independent diagnostic methods .
Two approaches are currently used to summarize and share typing data. Dendrograms (e.g., Figure 2) are the most informative, as they can reveal epidemiological connections between isolates while also providing evolutionary perspective. A second, space-saving approach is to apply a type designation in a nomenclature format that, unfortunately, is often cryptic and uninformative (i.e., a new type is arbitrarily assigned the next highest number). We propose a combination of these two approaches for sequence-based single locus typing systems such as LmMT1, where identical matches to NCBI database strains can be readily determined by BLASTN search. For example, USDA-ARS strains 33868 and 33873 are designated type LmMT1:N1-017 since they share their LmMT1 sequence with database strain N1-017 (Figure 7). (If there are multiple strains with that sequence in the database, as is the case for N1-017, the first one deposited is given priority.) On the other hand, strain 57066 (and 32 additional USDA-ARS strains) had a unique LmMT1 sequence (deposited in NCBI with accession numbers KT626013-KT626044), and hereafter strains with matching sequence will be designated type LmMT1:57066. These designations are intrinsically informative, since the strain name (e.g., N1-017) can be used to search sequence, literature, and internet databases. Of course, the utility of this type designation system would be enhanced by consistent and comprehensive annotation of NCBI sequence files.
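The proposed naming rule can be sketched in a few lines of Python. This is an illustrative sketch only: exact string equality stands in for an identical-match BLASTN search, the database is assumed to be a list ordered by deposit date, and the function name, the sequences, and the strain `later_deposit` are placeholders. The strain names N1-017, 33868, 33873, and 57066 are taken from the examples in the text.

```python
def designate(query_strain, query_seq, database, locus="LmMT1"):
    """Return a type designation of the form '<locus>:<strain>'.

    database: list of (strain, sequence) tuples in order of deposit, so the
    first match is the earliest deposited strain with that sequence.
    A sequence with no match is added and named after the query strain.
    """
    for strain, seq in database:
        if seq == query_seq:
            return f"{locus}:{strain}"
    database.append((query_strain, query_seq))
    return f"{locus}:{query_strain}"


# Toy database; sequences are placeholders, not real LmMT1 alleles.
db = [
    ("N1-017", "CCGGTAGATCCGGTAGAT"),
    ("later_deposit", "CCGGTAGATCCGGTAGAT"),  # same sequence, deposited later
]
types = {
    "33868": designate("33868", "CCGGTAGATCCGGTAGAT", db),
    "33873": designate("33873", "CCGGTAGATCCGGTAGAT", db),
    "57066": designate("57066", "CCGGTAGATCCGGTGGAT", db),
}
print(types)
```

Once a previously unseen sequence has been added, any later strain with the same sequence receives the same designation, which is what makes the label searchable.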
The development and validation of microbial typing systems has traditionally relied on strain collections that were sufficiently large and diverse [16,18,20,31]. However, access to these collections can be limited, and their laboratory analysis is costly, time consuming, and prone to experimental error. In contrast, the development and validation of LmMT1 and LmMT2 relied on publicly available genome sequences, whose number and diversity have expanded considerably in recent years. Indeed, one justification for this expansion in genome sequencing has been its potential application to strain typing. Of course, genome sequences, analyzed exclusively for SNPs, can and have been applied directly to investigations of foodborne outbreaks [35,49-54]. These applications were primarily retrospective, however, and this limitation will likely remain since the true costs for genome sequencing (equipment, reagents, and personnel) remain high. As demonstrated here, conventional, inexpensive sequence analysis of a single, maximally informative genomic locus such as LmMT1 that combines indels and SNPs can provide strain resolution comparable to PFGE, which has proven to be more than adequate in nearly all cases of outbreak detection and investigation.