Molecular population genetic analysis of emerged bacterial pathogens: selected insights.

Research in bacterial population genetics has increased in the last 10 years. Population genetic theory, tools, and related strategies have been used to investigate bacterial pathogens that have contributed to recent episodes of temporal variation in disease frequency and severity. A common theme demonstrated by these analyses is that distinct bacterial clones are responsible for disease outbreaks and increases in infection frequency. Many of these clones are characterized by unique combinations of virulence genes or alleles of virulence genes. Because substantial interclonal variance exists in relative virulence, molecular population genetic studies have led to the concept that the unit of bacterial pathogenicity is the clone or cell line. Continued new insights into host-parasite interactions at the molecular level will be achieved by combining clonal analysis of bacterial pathogens with large-scale comparative sequencing of virulence genes.

To avert the threat of resurgent and new microbial diseases, it is critical to gain insight into the molecular mechanisms contributing to temporal variation in disease frequency and severity. Although comprehensive, unambiguous understanding of the host and parasite factors mediating these processes is not available for any infectious agent, population genetic research in the last 10 years has provided noteworthy new information about the bacterial side of the equation. This review will summarize the insights accrued from population genetic analysis of bacteria responsible for disease outbreaks or increases in infection frequency and severity. One of the primary themes emerging from this research is that distinct bacterial clones have been responsible for several infection outbreaks (Table 1). Moreover, the distinct clones are frequently characterized by unique combinations of virulence genes or alleles of virulence genes. These observations have important implications for our understanding of infectious diseases and the public health measures required to reduce their detrimental and potentially devastating effect on society.

Population Genetics and Clonal Analysis of Bacterial Pathogens: Basic Concepts
Population genetic study of bacterial pathogens arose largely as an offshoot of research designed to address questions of longstanding interest to students of the molecular evolutionary processes in higher eukaryotic organisms. Bacteria were an attractive group of experimental organisms because of their phenotypic diversity, short generation times, haploid chromosomal genomes, and accessory genetic elements. Hence, bacterial population genetic research was originated by population geneticists interested in bacteria, rather than bacterial geneticists or medical microbiologists interested in population genetics (1,2). In spite of its important implications for how the field has developed over the last decade, this ontogeny will not be discussed in detail here. However, the reader should recognize that bacterial population genetics is a discipline separate and distinct from the study of the molecular epidemiology of infectious agents. The research tools, methods of data analysis, and general thought processes are very different from the typological thinking used by investigators of disease outbreaks or microbial pathogenesis (3)(4)(5)(6).
Early work on the clonal nature of bacterial pathogens was conducted largely with Escherichia coli, through a framework supplied by serotyping of one or a few polymorphic surface antigens (7,8). Only a few of the many possible O and H antigen serotypes were frequently associated with outbreaks of infantile diarrhea in the United Kingdom and other countries, which suggested that isolates expressing these traits had special virulence properties (7)(8)(9). Because serotype analysis of relatively few surface structures does not provide robust data for estimating overall levels of chromosomal diversity and relationships among strains, the primary research tool used to examine the population genetics of emerging bacterial pathogens has been multilocus enzyme electrophoresis (10,11). This technique indexes allelic variation in sets of randomly selected structural genes located on the chromosome and provides a basis for estimating overall levels of genotypic variation in populations, i.e., the sample of bacteria chosen for analysis. The key concept underlying use of starch gel-based protein electrophoresis in population genetics is that electromorphs (mobility variants) of an enzyme can be directly equated with alleles of the corresponding structural gene. Electromorph profiles over a sample of different enzymes therefore correspond to multilocus enzyme genotypes and are frequently referred to as electrophoretic types or ETs. The proteins analyzed are usually metabolic enzymes expressed by virtually all isolates of a species under the growth conditions used. The allelic variation detected is unaffected by environmental factors such as culture conditions, laboratory storage, anatomic site of recovery, or specific clinical disease. Allelic variation in these metabolic enzymes is selectively neutral, or nearly so, which means that convergence to the same allele through adaptive evolution is unlikely (4,12,13,14).
As a consequence, this approach to the study of bacterial and other microbial pathogens provides a convenient strategy for indexing overall levels of chromosomal diversity in the sample and for inferring genetic relationships among strains. Because chromosomal divergence indexed by multilocus enzyme electrophoretic data correlates strongly with that indexed by DNA-DNA hybridization studies (15)(16)(17)(18), several cryptic species have been recognized after population genetic analyses employing starch gel electrophoresis first revealed their existence (16,19,20,21). In most bacterial species, the number of allelic variants is large, and it is unlikely that recombinational processes would, by chance, frequently generate strains with the identical electromorph profile. Hence, organisms with the same electromorph profile are generally thought to be similar by descent, rather than by convergence through lateral gene flow.
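The bookkeeping behind electrophoretic types can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any cited study: the isolate names and electromorph numbers are invented, and the per-locus statistic is the unbiased gene diversity h = (1 - Σx_i²)·n/(n - 1), which is commonly used with multilocus enzyme data but is assumed here rather than taken from the text.

```python
from collections import Counter, defaultdict

# Hypothetical data: each isolate is scored for its electromorph (allele)
# at three enzyme loci. Names and numbers are invented for illustration.
profiles = {
    "iso1": (1, 2, 1),
    "iso2": (1, 2, 1),
    "iso3": (1, 3, 1),
    "iso4": (2, 3, 4),
    "iso5": (2, 3, 4),
}

def assign_ets(profiles):
    """Group isolates with identical multilocus profiles into one ET."""
    groups = defaultdict(list)
    for isolate, profile in profiles.items():
        groups[profile].append(isolate)
    # Label ETs in sorted profile order so the numbering is reproducible.
    return {
        f"ET{i}": (profile, sorted(members))
        for i, (profile, members) in enumerate(sorted(groups.items()), start=1)
    }

def locus_diversity(alleles):
    """Unbiased gene diversity at one locus: (1 - sum(x_i^2)) * n / (n - 1)."""
    n = len(alleles)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(alleles).values()]
    return (1.0 - sum(x * x for x in freqs)) * n / (n - 1)

def mean_diversity(profile_list):
    """Average the per-locus diversity over all loci."""
    loci = list(zip(*profile_list))
    return sum(locus_diversity(locus) for locus in loci) / len(loci)

ets = assign_ets(profiles)
et_profiles = [profile for profile, _ in ets.values()]
H = mean_diversity(et_profiles)  # diversity among ETs, one entry per ET

for name, (profile, members) in ets.items():
    print(name, profile, members)
print(f"mean genetic diversity among ETs: {H:.3f}")
```

With this toy input the five isolates collapse into three ETs, and the mean diversity among ETs is 0.667. Computing diversity over ETs rather than over isolates keeps a heavily sampled clone from dominating the estimate; either choice is defensible, depending on the question being asked.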
Recently, convenient, rapid, and relatively inexpensive large-scale DNA sequencing techniques have also been adopted by several laboratories. Large-scale automated DNA sequencing has been used to rapidly and unambiguously identify a causative infectious agent and confirm or refute the identity of isolates recovered from temporally linked patients thought to be involved in a disease outbreak. In addition, sequence-based studies have been employed to define the nature and extent of allelic variation in toxin and other virulence factor genes and to rapidly identify mutations associated with antimicrobial agent resistance (22).
Unless noted, data on the population genetic analysis of emerging bacterial pathogens summarized in this article were generated by multilocus enzyme electrophoresis, sometimes performed in concert with automated DNA sequencing.

Representative Insights
Brazilian Purpuric Fever
Brazilian purpuric fever (BPF), a serious invasive disease of children, was first characterized in 1984 after an outbreak in Promissao, Sao Paulo State, Brazil. Children with BPF have acute onset of fever and usually die within 48 h with disseminated purpura, vascular collapse, and hypotensive shock (23). BPF is caused by Haemophilus influenzae biogroup aegyptius, an organism associated with sporadic or epidemic conjunctivitis (24). Multilocus enzyme electrophoresis and other molecular techniques have demonstrated that isolates recovered from BPF patients represent a distinct clone (25).
As a first step toward identifying the evolutionary origin of this pathogenic H. influenzae biogroup aegyptius clone, chromosomal variation and genetic relationships were indexed among 17 biogroup aegyptius isolates and 2,209 encapsulated H. influenzae strains recovered worldwide (26). Biogroup aegyptius isolates form three distinct evolutionary lineages of the species H. influenzae. Isolates of the case clone are only very distantly related to other isolates classified as biogroup aegyptius; that is, the case clone is no more related to other biogroup aegyptius isolates than are (for example) two H. influenzae isolates selected at random from the species. The BPF case clone was genetically allied with H. influenzae isolates expressing serotype c polysaccharide capsule, a result that explains an earlier observation (27) that BPF isolates, like serotype c strains, produce type 2 IgA1 protease, whereas other isolates of biogroup aegyptius express type 1 IgA1 protease. Thus, the population genetic evidence showed that biogroup aegyptius is polyphyletic and that the BPF organism is a genetically distinct clone unrelated to other isolates with the phenotypic criteria of biogroup aegyptius.
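The comparison made above, a focal clone's relatedness to its phenotypic group set against the relatedness of randomly chosen pairs, can be illustrated with a simple distance calculation. The sketch below is hypothetical throughout: the five profiles and their labels are invented, and the distance used (the proportion of loci at which two profiles differ) is an assumed, commonly used measure rather than the one reported in reference 26.

```python
from itertools import combinations

def mismatch_distance(a, b):
    """Proportion of loci at which two profiles carry different electromorphs."""
    if len(a) != len(b):
        raise ValueError("profiles must be scored at the same loci")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical ETs scored at five enzyme loci (invented values).
ets = {
    "case_clone": (1, 1, 2, 3, 1),
    "aegyptius_A": (2, 3, 1, 1, 1),
    "aegyptius_B": (2, 3, 1, 2, 4),
    "serotype_c": (1, 1, 2, 3, 2),
    "other_1": (3, 2, 1, 1, 3),
}

def mean_distance_to(focal, others, ets):
    """Mean distance from one focal ET to a list of other ETs."""
    return sum(mismatch_distance(ets[focal], ets[o]) for o in others) / len(others)

def mean_pairwise(ets):
    """Mean distance over every pair of ETs, a stand-in for 'two strains at random'."""
    pairs = list(combinations(ets.values(), 2))
    return sum(mismatch_distance(a, b) for a, b in pairs) / len(pairs)

to_biogroup = mean_distance_to("case_clone", ["aegyptius_A", "aegyptius_B"], ets)
to_serotype_c = mismatch_distance(ets["case_clone"], ets["serotype_c"])
overall = mean_pairwise(ets)

print(f"case clone vs. other biogroup members: {to_biogroup:.2f}")
print(f"case clone vs. serotype c ET:          {to_serotype_c:.2f}")
print(f"mean over all pairs:                   {overall:.2f}")
```

In this toy data set the focal ET sits at a mean distance of 0.90 from the other members of its phenotypic group, no closer than the 0.78 mean over all pairs, while differing from the serotype c ET at only one locus in five (0.20). That is the qualitative pattern described in the text, reproduced with made-up numbers.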
The genetic diversity in the sample of all biogroup aegyptius strains was approximately equal to that recorded for entire species of certain pathogenic bacteria (16,17). Therefore, the effective population size of aegyptius must be large; however, this interpretation is difficult to reconcile with the observation that strains in the biogroup are rare pathogens associated only with human disease. A possible explanation for the relatively extensive genetic diversity among biogroup aegyptius strains is that they represent cell lineages spawned from a much larger base population of diverse nonpathogenic precursor clones. According to this hypothesis, acquisition or loss of one or more genes (or, perhaps, a shift in ecological niche) may produce a pathogenic form with the characteristic viscerotropism for human conjunctivae.
Although population genetic analysis did not provide a simple reason for the BPF outbreak, the demonstration that the causative clone of biogroup aegyptius was highly differentiated from other phenotypically similar organisms provided an explanation for the unique infection manifestations and the unique group of characters associated with the clone (28)(29)(30). Moreover, population genetic analyses demonstrated distinct medical correlates to isolates classified as biogroup aegyptius. The results of numerous subsequent studies have confirmed that, as a population, H. influenzae biogroup aegyptius strains vary in their behavior, as one would expect of a genetically diverse set of organisms.

Escherichia coli O157:H7
Strains of E. coli expressing serotype O157:H7 were recognized in the early 1980s as important causes of hemorrhagic colitis and hemolytic uremic syndrome in North America (31). Disease usually occurs after consumption of contaminated beef or other food. Several large outbreaks have occurred, and more than 60 case clusters have been reported in the United States (32). Although several E. coli reference laboratories rarely identified organisms expressing this serotype before the early 1980s, reporting of these isolates has since increased dramatically. Because of the medical and economic importance of these E. coli strains, considerable effort has been directed toward elucidating genetic relationships among them and between them and other members of the species; as a result, extensive information is now available about clonal relationships among these important bacteria (33)(34)(35).
The observation that O157:H7 strains synthesize one or more Shiga-like toxins and lack the ability to rapidly ferment sorbitol initially suggested that strains of this serotype had shared a recent common ancestor. To directly test this idea, multilocus enzyme electrophoresis was used to assess genetic relatedness of 100 strains of E. coli serotypes recovered from patients with hemor-were identified, cluster analysis found that O157:H7 isolates are closely related organisms. The results were interpreted to mean that O157:H7 organisms recovered from epidemiologically unassociated North American outbreaks belong to a single geographically widespread pathogenic clone with specific virulence properties (33). Subsequent analysis of O157:H7 strains by pulsed-field gel electrophoresis has supported this idea (36).
To delineate clonal relationships among O157:H7 organisms and other E. coli strains that cause hemorrhagic colitis and infantile diarrhea, 1,300 isolates representing 16 serotypes from patients with these diseases were studied by multilocus enzyme electrophoresis and probing for genes encoding Shiga-like toxins (34). The O157:H7 clone was closely related to a clone of O55:H7 strains that has a long history of worldwide association with outbreaks of infantile diarrhea (34). The data strongly suggested that the O157:H7 and O55:H7 clones have recently radiated from a common ancestral cell. The O157:H7 clone arose from an O55:H7-like ancestor, perhaps through horizontal transfer and recombination events adding Shiga-like toxin genes and adhesion genes to an E. coli genome preadapted for causing diarrheal disease (34,35). If, as the multilocus enzyme electrophoretic data indicate, O157:H7 and O55:H7 organisms have shared a recent common ancestor, it is likely that the close genetic affiliation would be reflected at the nucleotide level. To test this notion, the gene (eae) (34) encoding intimin, a protein involved in bacterial attachment to enterocytes and subsequent effacement of the microvilli, was sequenced from representative isolates of these two serotypes. The resulting sequence data were consistent with the hypothesis that O157:H7 and O55:H7 organisms share a close genetic affinity and thereby provide a plausible explanation for the observation that these bacteria cause similar attaching and effacing lesions in cells grown in culture (38) and in animal models (39). Because conventional serotyping of E. coli does not provide a reliable basis for analyzing population structure and can be grossly misleading as to genetic relationships among isolates (40)(41)(42), many important medical correlates of the population structure will not be recognized and understood fully until E. coli isolates are sorted out along clonal lineages.

Staphylococcus aureus Toxic Shock Syndrome
Toxic shock syndrome (TSS) was described in 1978 (43) as a severe acute illness (characterized by high fever, erythematous rash, hypotension or shock, multiorgan involvement, and desquamation of the skin) of young children associated with infection with Staphylococcus aureus. Two years later, it was recognized that TSS is a geographically widespread disease affecting mainly young, healthy, menstruating women, especially those using certain high absorbency tampons (44). Most vaginal isolates of S. aureus recovered from patients with TSS produce a chromosomally encoded toxin, designated as toxic shock syndrome toxin-1 (TSST-1) (45). Evidence implicating TSST-1 as a major virulence factor in the pathogenesis of TSS has accumulated (46). Almost all strains recovered from patients with menstrual TSS, which account for approximately 90% of TSS cases, synthesize TSST-1, whereas only 50%-60% of isolates from cases of nonmenstrual TSS and 5%-25% of strains causing other diseases produce this protein.
Several questions of importance to both medical bacteriology and evolutionary genetics were addressed in a study of 315 TSST-1-producing strains of S. aureus (47). It was discovered that the organisms responsible for most cases of TSS with a female urogenital focus are members of a single distinctive clone (designated as ET 41), a result that explains the observation that isolates recovered from patients with TSS share many traits (48,49). The investigation also showed that TSST-1 is expressed by isolates of a great variety of clones representing virtually the full breadth of genotypic diversity in the species as a whole. In addition, isolates of ET 41 represented 24% of a sample of TSST-1-producing strains recovered before 1978, which meant that the tst gene encoding the toxin neither evolved nor was acquired recently by this species. The failure to recover isolates of ET 41 from non-human hosts effectively eliminated the likelihood that animals are important in the transmission of this clone.
Twenty-eight percent of isolates of S. aureus cultured from the introitus, vagina, or cervix of unassociated healthy carriers or women with non-TSS urogenital symptoms were ET 41 or closely allied clones; no other single multilocus enzyme genotype accounted for more than 12% of normal vaginal isolates. These observations led to the hypothesis that isolates of ET 41 are more readily able to colonize the human vagina and, hence, are widely dispersed in an ecological niche of great consequence in TSS. Under this "adapted clone" hypothesis, isolates of ET 41 are responsible for most vaginal cases because this clone has a special affinity for the cervicovaginal milieu, perhaps (but not necessarily) as a consequence of variation in regulation of toxin or other virulence gene expression. In summary, data derived from clonal analysis of TSST-1-producing S. aureus are consistent with the notion that the "bloom" in TSS cases happened because of a change in the character of catamenial products (perhaps associated with decreasing levels of anti-TSST-1 antibody in human populations), not because of a new S. aureus strain.
Two additional points are noteworthy regarding population genetic analysis of S. aureus strains producing TSST-1. First, if the gene encoding TSST-1 were evolutionarily old, allelic variants differing in nucleotide and, perhaps, amino acid sequence would exist in natural populations. This prediction was borne out by the identification of a variant of TSST-1 associated with goat, sheep, and occasionally bovine mastitis that is encoded by a gene which differs from the "human" form by 14 nucleotides, resulting in 9 amino acid changes. The variant toxin retains mitogenic activity for mouse splenocytes but differs significantly in other functions ascribed to TSST-1, including ability to induce a TSS-like disease in rabbits (50). Second, if the rapid increase in TSS cases were caused by a change in host character rather than by the rapid spread of a single, new, hypervirulent clone, subclonal heterogeneity would be present among isolates classified as ET 41. Examination of RFLP patterns for the gene (coa) encoding coagulase has shown at least three distinct subclones of ET 41 (51).

Neisseria meningitidis
Extensive work in the last 10 years has examined the molecular population genetics of Neisseria meningitidis, predominantly by clonal analysis, and more recently by DNA sequencing of putative virulence genes. This work suggests that temporal variation in disease frequency and severity is usually associated with clonal replacement much like influenza epidemics are driven by antigenic shift.

Serogroup B ET-5 Complex Organisms
Caugant et al. (52) demonstrated that an epidemic of serogroup B meningococcal disease that began in the 1970s in Norway and subsequently spread through much of Europe was caused by a group of 22 very closely related clones, designated as the ET-5 complex, that have no close genetic relationship to other clone groups. Clones of this complex were traced intercontinentally to Chile and South Africa, where they also caused contemporary outbreaks of invasive disease. Clonal analysis also showed that a severe epidemic of meningococcal disease in Cuba (characterized by a high attack rate and incidence of septicemia) was due to ET-5 complex organisms. The recovery of these same bacteria from outbreaks in Miami, Florida, in 1980 and 1981, strongly suggested that Cuban refugees imported the clones to Miami. Members of the ET-5 complex have seldom been recognized as important pathogens in the United States. However, ET-5 complex organisms were responsible for a recent increase in meningococcal disease rates in Washington and Oregon (53). In addition, a serogroup B epidemic in greater Sao Paulo, Brazil, was also caused by ET-5 complex members (54).
Recently, clonal analysis has been used to study serogroup B meningococcal isolates that caused invasive disease in The Netherlands between 1958 and 1986 (55). Significant temporal variation in the clonal composition of meningococcal populations was identified. Recent disease episodes were caused predominantly by isolates of three clonal lineages (designated I, III, and VI) that were not represented in samples collected before 1975. In addition, an epidemic in 1966-1967 and a hyperendemic disease wave in 1972 were caused mainly by two closely related clones (ET-11 and ET-17) expressing serotype 2b protein. Strong statistical deviation in the sex ratio was recorded for disease caused by clones of two lineages. Clones of lineage V were cultured far more frequently from female than from male patients, whereas clones of lineage IX were recovered from disease in male patients approximately four times more often than average. The causes of these differences are unknown but warrant further investigation.
For most bacterial pathogens, few data are available regarding the frequency with which distinctive clones are recovered from asymptomatic persons. Caugant et al. (56) studied the clonal composition of meningococcal isolates cultured from the nasopharynx of healthy carriers in Norway and discovered that the clones (ET-5 complex and ET-37 complex) causing 80% of disease episodes were represented by only 7% and 9%, respectively, of carrier isolates. This same study demonstrated that the clones most commonly represented among carrier isolates (19%) have never been recovered from patients with invasive meningococcal infection. The data reinforce the concept that bacterial clones vary dramatically in virulence potential.
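The size of the contrast between disease and carriage can be made explicit with a short calculation. The sketch below uses only the percentages quoted above (80% of disease episodes; 7% and 9% of carrier isolates). The two indices it reports, a simple representation ratio and an odds ratio, are illustrative choices made here and are not statistics reported by Caugant et al. Because the patient and carrier samples were drawn separately, both should be read as crude indicators.

```python
def representation_ratio(disease_share, carrier_share):
    """How many times more common a clone group is among cases than among carriers."""
    return disease_share / carrier_share

def odds_ratio(disease_share, carrier_share):
    """Odds of belonging to the clone group among cases relative to carriers."""
    disease_odds = disease_share / (1.0 - disease_share)
    carrier_odds = carrier_share / (1.0 - carrier_share)
    return disease_odds / carrier_odds

# Figures quoted in the text: the ET-5 and ET-37 complexes together caused
# 80% of disease episodes but made up 7% + 9% of carrier isolates.
disease_share = 0.80
carrier_share = 0.07 + 0.09

ratio = representation_ratio(disease_share, carrier_share)
odds = odds_ratio(disease_share, carrier_share)

print(f"representation ratio: {ratio:.1f}")
print(f"odds ratio:           {odds:.1f}")
```

The two complexes are about fivefold overrepresented among disease isolates, which corresponds to an odds ratio of about 21.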

Serogroup C Disease
An increase of invasive disease due to serogroup C N. meningitidis strains has been reported in several countries in recent years (57)(58)(59)(60). Study of 121 isolates recovered from patients in Greater Sao Paulo, Brazil, between 1976 and 1990 identified a striking increase in isolates assigned to ET 11 complex (58). The percentage of invasive disease episodes caused by complex 11 organisms increased from 8% in 1988 to 66% in 1990. Outbreaks of serogroup C meningococci have also been recently reported from distinct regions of the United States (59) and Canada (57,60). Analysis of organisms collected from 13 U.S. outbreaks identified five distinct multilocus enzyme types, all very closely allied in overall chromosomal relatedness (59). Moreover, strains causing 4 of these 13 outbreaks were identical in multilocus enzyme type (designated ET-15) to organisms responsible for outbreaks in eastern Canada (60). Canadian investigators have reported (60) that ET-15 organisms had a significantly higher case-fatality ratio than other invasive meningococcal disease isolates, which may be due to a lower herd immunity to the newly emerged clone.

Serogroup A Disease
Unlike other serogroups of Neisseria meningitidis, which are usually associated with endemic disease, isolates expressing serogroup A capsular polysaccharide are unusual in that they may cause large epidemics. For example, serogroup A organisms have been responsible for epidemics of invasive disease in Africa, China, Iran, Greece, Finland, Brazil, and Nepal (61). Major epidemics every 5-10 years in the Sahel region of sub-Saharan Africa have led to the description of a "meningitis belt" and to detailed studies by clonal analysis of the molecular epidemiology of serogroup A organisms responsible for these and other outbreaks (62,63).
A group led by M. Achtman assembled 423 serogroup A meningococcal isolates, recovered primarily from invasive episodes, and representing organisms responsible for 23 epidemics or outbreaks between 1915 and 1983. Thirty-four distinctive clones were assigned to four complexes representing groups of related clonal genotypes (61). Most epidemics were caused by a single clone, and the same clone often was responsible for concurrent epidemics in contiguous countries. For example, serogroup A clone I-1 caused a pandemic that began in North Africa and certain Mediterranean countries in 1967 and spread throughout West Africa in the subsequent 2 years; clone III-1 has been responsible for disease outbreaks in Finland, Brazil, Nepal, and China.
Recently Achtman's group has extensively characterized more than 300 serogroup A isolates from patients or carriers in one epidemic in The Gambia in 1982-1983 and in 1984-1985 after an immunization program at the end of 1983. Analysis of a representative subgroup of 64 isolates showed that all were assigned to clone IV-1 (64,65). Isolates of this clone were examined for subclonal variation with SDS-PAGE profiling, LPS profiling, and genomic restriction endonuclease profiling, and rare variants were detected. Two cell-surface antigens (class 5 outer membrane protein and pili) were unusually variable, and the hypothesis was formulated that variation in the class 5 OMP occurs as a consequence of recombinational events affecting the translational reading frame. The role, if any, of this subclonal microheterogeneity in serogroup A meningococcal epidemics is being assessed. Clonal analysis has provided a framework that is being exploited to rationally select strains for further characterization by molecular and serologic techniques that may provide insight into the forces driving a bacterial epidemic (66)(67)(68).
Clonal analysis also has demonstrated that serogroup A isolates are a restricted phylogenetic subpopulation of the species N. meningitidis (69). This result may mean that the genotype bestowing the epidemic phenotype has arisen a single time and that it has not been successfully transferred horizontally to unrelated phylogenetic lineages of the species.
Moore et al. (70) employed clonal analysis to document the intercontinental spread of an epidemic group A meningococcal clone complex by Muslim pilgrims (hajis) in 1987. Apparently this clone was carried from South Asia (Nepal and/or India) to Mecca, Saudi Arabia, where it was disseminated in epidemic form to other hajis and to indigenous Saudis. The report of invasive serogroup A meningococcal disease in other Gulf nations and among hajis returning to the United States, Europe (France and the United Kingdom), and Africa (Ethiopia, Sudan, and Chad) and the recovery of isolates of the same clone complex (designated ET III-1) from persons in these diverse geographic localities strongly suggested that an unusually virulent organism had been rapidly dispersed intercontinentally. Spread of clone III-1 from Mecca to France by hajis was independently confirmed by Riou et al. (71). This meningococcal clone also caused recent episodes of invasive disease in Sweden (72) and Kenya (73).

Streptococcus pyogenes Invasive Disease
Severe invasive infections caused by S. pyogenes have been reported with increased frequency in recent years in the United States (74,75), Europe (76)(77)(78), and elsewhere (79,80). These include both soft tissue infections, such as cellulitis, and deeper infections, including osteomyelitis, necrotizing fasciitis, and sepsis, many of which have occurred in previously healthy persons. The observation that many patients have multiorgan failure and other signs and symptoms mimicking staphylococcal toxic shock syndrome led to the characterization of a streptococcal "toxic-shock-like syndrome" (TSLS) (81). Most S. pyogenes isolates recovered from such patients produce one or more pyrogenic exotoxins with significant amino acid sequence homology and functional similarity with several enterotoxins synthesized by S. aureus (82).
To determine the genetic diversity and clonal relationships among S. pyogenes isolates recovered from patients with TSLS or other invasive diseases in the United States, 108 organisms were studied by multilocus enzyme electrophoresis and analyzed for exotoxin A, B, and C synthesis (82). The analysis showed that 33 distinctive clones were present among isolates comprising the sample, but nearly half the disease episodes, including more than two-thirds of the cases of TSLS, were caused by strains of two related clones, designated ET-1 and ET-2 (82). The production of pyrogenic exotoxin A (scarlet fever toxin, which is bacteriophage-encoded), either alone or in combination with other pyrogenic exotoxins, was associated with recovery from patients with TSLS. This association was present with isolates of the same clone, as well as those of distantly related phylogenetic lineages. The data were interpreted as strong circumstantial evidence that scarlet fever toxin A itself, or, possibly, the product of a gene tightly linked to it, is a factor in the pathogenesis of TSLS.
Because an increase in disease caused by strains expressing the M1 serotype protein had also been observed in England, Sweden, Norway, Germany, other European countries, and elsewhere, we sought to determine if strains recovered from these diverse localities were genetically allied. Chromosomal diversity and relationships among 126 M1 strains from 13 countries on five continents were analyzed by multilocus enzyme electrophoresis and restriction fragment profiling by pulsed-field gel electrophoresis (83). All isolates were also examined for the speA gene by PCR, and to increase the possibility of identifying interstrain variation, strain subsets were examined by automated DNA sequencing for allelic polymorphism in genes encoding M protein (emm), streptococcal pyrogenic exotoxin A (speA), streptokinase (ska), pyrogenic exotoxin B (speB), and C5a peptidase (scp). Seven distinct emm1 alleles were identified that would express M proteins differing at one or more amino acids in the N-terminus variable region. Although substantial levels of genetic diversity existed among M1 organisms, most invasive episodes were caused by two subclones marked by distinctive multilocus enzyme electrophoretic profile and PFGE restriction fragment length polymorphism (RFLP) types. One of these subclones (ET 1/RFLP pattern 1a) has the speA gene, and was recovered worldwide. Identity of speA, emm1, speB, and ska alleles in virtually all isolates of ET 1/RFLP type 1a means that these organisms have shared a common ancestor, and that global dispersion of this M1 subclone has occurred very recently. The occurrence of the same emm and ska allele in strains that are well-differentiated in overall chromosomal character demonstrated that horizontal transfer and recombination play a fundamental role in diversifying natural populations of S. pyogenes.
The population genetic framework constructed for S. pyogenes has been exploited to rationally choose strains for comparative molecular characterization of the gene (speA) encoding scarlet fever toxin (84,85). An analysis by Nelson et al. (84) identified four alleles of speA in natural populations, one of which (speA1) occurs in many distinct clonal lineages and is, therefore, probably evolutionarily old. The presence of identical exotoxin A structural genes in diverse phylogenetic lineages means that the gene has been horizontally distributed among clones, presumably by bacteriophage-mediated transfer. Two other alleles (speA2 and speA3), characterized solely by single nucleotide changes resulting in single amino acid substitutions, were each identified in single clones (ET 1 and ET 2) that together have caused most TSLS episodes. The restriction of speA2 and speA3 to single clonal lineages can be interpreted as evidence that these two alleles are evolutionarily younger than speA1. A fourth allele (speA4) also is present in a single phylogenetic lineage and is 9% divergent from the other three toxin alleles. The absence of synonymous (silent) nucleotide changes in speA2 and speA3 is unusual and suggests that the allelic variation is not selectively neutral, which implies that the toxins are not functionally equivalent. Moreover, the mutations occur in a segment consisting of five amino acids that are highly conserved in the aligned sequences of staphylococcal enterotoxin A (SEA), staphylococcal enterotoxin B (SEB), SEC1, SEC3, SED, SEE, and streptococcal pyrogenic exotoxin C (86). The segment of SPE A containing these variations is immediately adjacent to a region containing cysteine residues involved in the formation of a disulfide loop believed to be required for mitogenicity of SPE A and other bacterial superantigens.
Population genetic analysis then suggests that there are functional correlates of the allelic variation and that the alleles have been subject to natural selection. Recent studies have shown that the ability of the SPEA2 and SPEA3 variants to stimulate human peripheral blood mononuclear cells exceeds that of SPEA1 (87).
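The distinction between synonymous (silent) and amino acid-altering nucleotide changes that underlies this argument can be made concrete with a short sketch. It compares two aligned, in-frame coding sequences codon by codon under the standard genetic code. The sequences are invented for illustration and are not speA alleles, and the per-codon comparison is a simplification rather than a formal estimate of substitution rates.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(ref, alt):
    """Return (synonymous, nonsynonymous) lists of (codon_number, ref_codon, alt_codon)."""
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    synonymous, nonsynonymous = [], []
    for i in range(0, len(ref), 3):
        c1, c2 = ref[i:i + 3], alt[i:i + 3]
        if c1 == c2:
            continue
        change = (i // 3 + 1, c1, c2)  # codon numbering starts at 1
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous.append(change)
        else:
            nonsynonymous.append(change)
    return synonymous, nonsynonymous


# Hypothetical reference and variant sequences (illustration only).
syn, nonsyn = classify_substitutions("ATGGGTAAACTG", "ATGAGTAAACTA")
print("synonymous:", syn)
print("nonsynonymous:", nonsyn)
```

In this toy pair, the change in codon 2 replaces glycine with serine, whereas the change in codon 4 leaves leucine unchanged. Alleles that differ only by changes of the first kind, as reported for speA2 and speA3, are the pattern the text treats as unusual.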

Mycobacterium tuberculosis
Perhaps no bacterial infection in recent years has generated as much interest nationally as resurgent tuberculosis (TB) (88). Largely because of the success of public health strategies, the incidence of TB in the United States had declined steadily since the early 1950s, and the disease was thought to be eradicable by the end of the first decade of the 21st century (89). However, the yearly decline in TB incidence ended in 1984, and after a plateau of several years, incidence has risen from 1988 through the present. An estimated 63,000 excess cases occurred through 1993 (90). The HIV/AIDS epidemic, immigration from countries with high TB prevalence, and outbreaks in correctional institutions, nursing homes, shelters for the homeless, and other congregate settings have contributed to the resurgence (88). On a global scale, one-third of the world's population is infected with this pathogen, and 8 million new TB cases occur each year. Moreover, nearly 3 million people die annually of TB, making it the leading cause of death due to an infectious agent worldwide (88). Hence, there is a need to understand the nature and extent of molecular variation in this pathogen.
Although the population genetics of M. tuberculosis has not been examined by multilocus enzyme electrophoresis, a recent study (91) analyzed DNA sequence diversity in eight loci (192,875 nucleotides) from unassociated isolates recovered in North America and Europe. The data showed almost a complete absence of coding sequence nucleotide variation. To rule out the possibility that restricted geographic sampling biased the data set, 350-bp fragments of genes encoding the beta subunit of RNA polymerase (rpoB), a 65-kilodalton heat shock protein (hsp65), the A subunit of DNA gyrase (gyrA), an enzyme involved in aromatic amino acid biosynthesis (aroA), RecA protein (recA), and a 1435-bp region of the gene (katG) encoding a catalase-peroxidase enzyme important in isoniazid resistance (92)(93)(94) and host-parasite interactions (95) were sequenced from one randomly selected isolate from each of seven countries with well-differentiated human populations (Switzerland, Turkey, Algeria, Somalia, Papua New Guinea, Vietnam, and Tibet). A virtual lack of nucleotide variation was also found in these seven isolates, and sequencing of several genes from many additional TB isolates has reinforced the concept of extremely restricted structural gene polymorphism. The paucity of sequence variation was surprising for several reasons. First, paleopathologic evidence suggests that humans had TB as early as 3700 BC in Egypt and 2500-1500 BC in Europe, and also in pre-Columbian North and South America (96). Moreover, M. tuberculosis DNA recovered from lung lesions in a 1,000-year-old Peruvian mummy confirmed that the disease existed in the pre-Columbian New World (97). Second, as noted above, there is a very large global pool of infected persons (88), and third, considerable chromosomal restriction fragment length polymorphism has been identified by probing with mobile elements such as IS6110 (98,99). Based on a population genetic interpretation of the data, it was posited that M. tuberculosis may be only 15,000 to 20,000 years old, an age that dates speciation and global dissemination to roughly the same time as paleomigration into the New World. The time frame is also consistent with speculation (100) that the agent of human TB arose from the very closely related cattle pathogen M. bovis by host specialization occurring since the domestication of this animal some 8,000-10,000 years ago. Recent large-scale DNA sequencing results are also consistent with an interpretation that M. tuberculosis and M. bovis have shared a recent common ancestor.
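The reasoning that links restricted sequence variation to a young species age can be illustrated with a strict molecular clock, under which two lineages separated for t years differ at d = 2μt sites per nucleotide, so that t = d / (2μ). The sketch below computes average pairwise diversity from aligned sequences and applies that relationship. The review does not state the substitution rate or the counts used in the cited study, so the sequences, the divergence, and the rate shown here are placeholders chosen only to make the arithmetic visible, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


def years_since_common_ancestor(divergence_per_site, rate_per_site_per_year):
    """Strict-clock estimate t = d / (2 * rate); both lineages accumulate changes."""
    return divergence_per_site / (2.0 * rate_per_site_per_year)


# Placeholder alignment: three 10-bp sequences, one of which differs at one site.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
print(pairwise_diversity(toy))

# Placeholder divergence and rate, NOT values from the cited study.
print(years_since_common_ancestor(1e-4, 1e-8))
```

Because the estimate scales linearly with the observed divergence, a near absence of substitutions across many kilobases implies a recent common ancestor for any fixed rate, which is the qualitative point of the argument above.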
These molecular population genetic findings have considerable implications for M. tuberculosis pathobiology research. First, the virtual absence of naturally occurring nucleotide substitutions greatly increases the likelihood that missense mutations identified in genes associated with resistance to antimicrobial agents actually confer resistance rather than simply acting as convenient surrogate markers of resistance (93,94,101,102). Second, restricted allelic diversity means that it is probable that only nominal amino acid variation will occur in proteins of potential immunoprophylaxis, diagnostic, or virulence interest.

Clones W and Son of W
Commensurate with the rise of TB cases in the United States was an increase in the number of organisms resistant to one or more anti-TB medications (103). This trend has been viewed with great concern by public health authorities and clinicians, in part because no new first-line anti-TB agents have been introduced in several decades. Certain communities have contributed disproportionately to the documented increase in resistant organisms, the most notable being New York City (104), which has accounted for up to 60% of all drug-resistant M. tuberculosis reported nationally in some surveys. Although strains with several antimicrobial agent susceptibility patterns have been identified, approximately 300 organisms are invariably resistant to isoniazid, streptomycin, rifampin, and ethambutol, and variably resistant to ethionamide, kanamycin, capreomycin, and ciprofloxacin (105,106). Early reports based on IS6110 restriction fragment length polymorphism typing (99), other molecular techniques (98,107), and classic epidemiologic investigations suggested that many of these organisms were clonally related. More recent analysis using IS6110 typing, several other molecular typing strategies, and automated DNA sequencing to identify the exact nucleotide changes responsible for resistance to isoniazid, rifampin, and streptomycin has unambiguously demonstrated the existence of two abundant, closely related subclones (arbitrarily named W and W1, son of W) that have clearly shared a recent common origin (105). Multidrug resistance in these strains is due to sequential accumulation of amino acid substitutions conferring resistance to each drug alone, rather than a single-step molecular event, such as acquisition of a multidrug-resistance-conferring plasmid. Progeny of these two subclones have now spread well beyond the New York City borders. The organisms have been isolated from patients in other New York communities (108), Atlanta, Miami, Denver, Las Vegas, and Paris, France (109).
Thus far, all patients documented to have infection caused by W or W1 organisms can readily be epidemiologically connected with New York City, that is, secondary, tertiary, or quaternary spread has not yet sufficiently obscured this important epidemiologic thread. Dissemination of these difficult to treat W and W1 organisms throughout New York City and other cities demonstrates the devastating consequences of clonal origin and spread of a bacterial pathogen. Because some persons now infected latently with W and W1 will later experience reactivation disease, dissemination of W and W1 has adverse implications for TB control in the 21st century.

Penicillin-Resistant Streptococcus pneumoniae
The Gram-positive bacterial pathogen S. pneumoniae is a major cause of illness and death worldwide (110,112). In the United States, the organism is responsible for more than 500,000 cases of pneumonia, 55,000 episodes of bacteremia, 6,000 cases of meningitis, and 40,000 deaths each year (113). Until relatively recently, antibiotic resistance in S. pneumoniae was rare, but it is now a global public health problem (114)(115)(116).
Resistance to penicillin in many organisms is due to the expression of altered high-molecular-weight penicillin-binding proteins (PBPs) that have reduced antibiotic affinity (117). Alterations in four (1A, 2X, 2A, and 2B) of the five high-molecular-weight PBPs expressed by isolates of the species have been identified in resistant patient isolates. Research has shown that two processes have contributed to the rise of these organisms. First, many distinct susceptible strains are independently evolving to the resistance phenotype. Acquisition of penicillin-binding protein gene segments from foreign donors, such as oral streptococci, is apparently a primary driving force (118). At the molecular level, the result is generation of mosaic genes, and thereby molecularly remodelled PBPs with decreased affinity for penicillin (119)(120)(121)(122)(123). Evidence shows that the Hex recombinational pathway (124) participates. Second, once a distinct drug-resistant cell has been generated, progeny can be transmitted locally and over intercontinental distances by person-to-person spread (125)(126)(127)(128)(129)(130). Multilocus enzyme electrophoresis has been applied to analyze genetic relationships among penicillin-resistant strains of S. pneumoniae from global sources and to infer patterns of epidemiologic spread of these resistant organisms (125)(126)(127)(128)(129)(130)(131).
The importance of rapid local clonal spread of antibiotic-resistant S. pneumoniae is illustrated by events in Iceland. Monitoring of antibiotic resistance patterns of pneumococci in Iceland showed no detectable penicillin-resistant organisms from 1983 to 1988. The first penicillin-resistant strain was recovered in December 1988 (132). The frequency of penicillin-resistant organisms rose sharply over the next 3 years, from 2.3% of all isolates to 17% in the first quarter of 1992 (132). Almost 70% of the resistant isolates expressed serogroup 6 capsule polysaccharide and were also resistant to tetracycline, chloramphenicol, erythromycin, and trimethoprim-sulfamethoxazole. To test the hypothesis that these Icelandic isolates were clonally related, Soares et al. (128) examined 57 organisms for serotype, PBP pattern, pulsed-field chromosomal restriction endonuclease digestion pattern, and multilocus enzyme electrophoretic genotype. All isolates were serotype 6B and had closely similar or identical patterns for each of the molecular markers examined. Surprisingly, the Icelandic organisms were indistinguishable from a subgroup of multiresistant serotype 6B pneumococci that occurs with high incidence in Spain. The authors concluded that the Spanish clone was imported to Iceland and noted that in recent years a favored vacation locality for Icelandic families with young children had been Spain. The factors responsible for the precipitous spread of the clone in Iceland are largely unknown. The frequency of use of beta-lactam antibiotics in Iceland and Sweden (a country where resistant pneumococci are rare) in 1989 was similar (133), and low compared to other industrialized countries (134). However, Iceland has a very high use of antimicrobial agents such as trimethoprim-sulfamethoxazole, metronidazole, and tetracycline (133,134), so it is conceivable that use of these agents selected for the multiresistant clone.
A second possible factor contributing to the clonal spread of the organisms is that 57% of Iceland's population of about 250,000 live in Reykjavik and its suburbs, where most of these strains have been recovered. Moreover, almost 80% of Icelandic children 2 to 6 years of age in Reykjavik attend day-care centers. Together these factors may have provided a unique set of circumstances for introduction and rapid spread of the multiresistant clone.
Investigating the molecular population genetics of pneumococci has led to the realization that horizontal transfer and recombinational processes are also serving to generate variation in capsule type and immunoglobulin A1 (IgA1) protease gene alleles (135,136). Coffey et al. (135) analyzed European resistant strains expressing serotype 9 or 19 through a combined approach employing clonal analysis and RFLP profiling of genes encoding PBP1A, PBP2B, and PBP2X. Analysis of a resistant isolate synthesizing serotype 19 capsule showed that it was identical in overall chromosomal character to a clone of organisms resistant to multiple antibiotics which expressed serotype 23F, a result that was interpreted as evidence that horizontal transfer of capsular biosynthesis genes had occurred. More recently, Lomholt (136) has shown that recombinational processes contribute to allelic variation in the gene (iga) encoding IgA1 protease.

Methicillin-Resistant Staphylococcus aureus
Very soon after methicillin entered clinical use in the 1950s, strains of S. aureus resistant to this antimicrobial agent were reported in the United Kingdom (137). Within a few years, hospital outbreaks caused by methicillin-resistant S. aureus (MRSA) occurred in Europe. MRSA were recognized as an important hospital infection control problem in the United States in the mid-1970s, and these organisms have now achieved global distribution (138).

Intrinsic methicillin resistance is due to the expression of an altered penicillin-binding protein (PBP) termed PBP 2a (139) that is encoded by the chromosomal mec gene (140,141). Evidence has been presented that mec originated as a consequence of a recombinational event fusing about 300 bp of a staphylococcal beta-lactamase gene and a segment of a gene encoding a PBP from an unknown donor bacterium, perhaps E. coli (142).
Multilocus enzyme electrophoresis and other molecular population genetic techniques were used to determine the extent of mec distribution among phylogenetic lineages of the species and genetic relationships among MRSA strains circulating in various geographic regions at different times (143,144). The mec gene is harbored by many divergent phylogenetic lineages representing a large portion of the breadth of chromosomal diversity in the species S. aureus. On the basis of additional evidence, it was proposed that multiple episodes of horizontal transfer and recombination have contributed to the spread of the mec resistance determinant in natural populations. The identification of a single multilocus enzyme genotype among MRSA organisms recovered in the United Kingdom, Denmark, Switzerland, Egypt, and Uganda, soon after the widespread introduction of methicillin into clinical use in the 1960s, meant that MRSA isolates recovered from those localities at that time were progeny of a single ancestral cell that had probably acquired the mec determinant recently. The multilocus enzyme electrophoretic data demonstrating association of mec with highly divergent members of the S. aureus species effectively ruled out the idea (145) that all extant MRSA are lineal descendants of a single clone, and that mec was acquired just once by methicillin-sensitive clones of this pathogen.
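The inference drawn here rests on two simple operations on multilocus enzyme data: grouping isolates that share an identical allele profile into one electrophoretic type (ET), and asking how dissimilar the ETs that carry a marker such as mec are from one another. The following sketch illustrates both steps under stated assumptions. The isolate names, allele profiles, and marker calls are hypothetical, and distance is taken simply as the proportion of loci with mismatched alleles.

```python
from collections import defaultdict


def group_by_et(isolates):
    """Map each distinct allele profile (ET) to the names of isolates that share it."""
    groups = defaultdict(list)
    for name, profile, _has_marker in isolates:
        groups[tuple(profile)].append(name)
    return dict(groups)


def ets_carrying_marker(isolates):
    """Set of allele profiles found in at least one marker-positive isolate."""
    return {tuple(profile) for _name, profile, has_marker in isolates if has_marker}


def profile_distance(p, q):
    """Proportion of loci at which two profiles carry different alleles."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q)) / len(p)


# Hypothetical isolates: (name, alleles at four enzyme loci, marker present?).
isolates = [
    ("iso_A", (1, 1, 2, 1), True),
    ("iso_B", (1, 1, 2, 1), True),
    ("iso_C", (3, 2, 1, 4), True),
    ("iso_D", (1, 1, 2, 2), False),
]

print(group_by_et(isolates))
positive = sorted(ets_carrying_marker(isolates))
print(profile_distance(positive[0], positive[1]))
```

In this toy data set, iso_A and iso_B share one ET and would be read as progeny of one clone, whereas the marker also occurs in an ET that differs at every locus. A pattern of that kind in real data is what argues against a single acquisition of the marker followed by purely clonal descent.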

Borrelia Species Associated with Lyme Disease
Molecular population genetic strategies have also been used to delineate accurate phylogenetic relationships among emerging organisms. For example, Boerlin et al. (146) studied 50 isolates classified as Borrelia burgdorferi by multilocus enzyme electrophoresis and identified three distinct genetic clusters that were well differentiated from one another in overall chromosomal character. The investigators proposed that each cluster represented a genospecies, and this idea was subsequently supported by DNA-DNA reassociation studies (147), 16S rRNA gene sequencing (148), genomic fingerprinting by arbitrarily primed polymerase chain reaction (149), and other techniques (Valsangiacomo C, Balmelli T, Piffaretti J-C, pers. comm.). Moreover, different genospecies of B. burgdorferi have been associated with distinct clinical manifestations of Lyme borreliosis (150)(151)(152).
Although population genetics has only been applied to the study of pathogenic bacteria for approximately a decade, considerable insight has been gained into the molecular mechanisms of temporal variation in disease frequency and severity, host adaptation of clonal lineages, and the relationship of disease severity and naturally occurring bacterial clones. The work cited in this review represents only a small part of the contribution of molecular population genetic investigations to an understanding of temporal variation in disease frequency and severity, microbial pathogenicity, and evolution of virulence genes. For example, contributions have also been made from studies of Vibrio cholerae (153)(154)(155), encapsulated H. influenzae (156), L. monocytogenes (157)(158), Salmonella spp. (159)(160)(161), and other pathogens (162).
Changes in human behavior, simple processes of microbial evolution, and increasing resistance to antimicrobial agents will continue to supply mankind with new infectious disease challenges and, therefore, motivation for molecular population genetics studies. The genomes of two bacterial pathogens have now been sequenced (163)(164), and it is likely that the genomes of most major human and veterinary viral and bacterial pathogens will be sequenced in their entirety in the next decade. Hence, the trend toward molecular dissection of microbial populations by large-scale DNA sequencing will accelerate. DNA sequence-based and conventional molecular population genetic studies are cost-effective and should be encouraged in the fight to limit the detrimental impact of infectious agents on human, animal, and plant health.