Volume 19, Number 8—August 2013

Whole Genome Sequencing of an Unusual Serotype of Shiga Toxin–producing Escherichia coli

Tim Dallman, Lisa Cross, Chloe Bishop, Neil Perry, Bente Olesen, Kathie A. Grant, and Claire Jenkins
Author affiliations: Health Protection Agency, London, UK (T. Dallman, L. Cross, C. Bishop, N. Perry, K.A. Grant, C. Jenkins); Hillerød Hospital, Hillerød, Denmark (B. Olesen)



Shiga toxin–producing Escherichia coli serotype O117:K1:H7 is a cause of persistent diarrhea in travelers to tropical locations. Whole genome sequencing identified genetic mechanisms involved in the pathoadaptive phenotype. Sequencing also identified toxin and putative adherence genes flanked by sequences indicating horizontal gene transfer from Shigella dysenteriae and Salmonella spp., respectively.

There are >400 serotypes of Shiga toxin–producing Escherichia coli (STEC), and >100 of these are known to be associated with severe disease in humans (1). STEC are defined by the presence of 1 or both phage-encoded Shiga toxin genes stx1 and stx2. However, those serotypes associated with more severe disease generally harbor additional virulence genes, such as eae (intimin), which is encoded on the locus of enterocyte effacement, or virulence regulation genes, such as aggR, which is located on the aggregative adherence plasmid. Both of these genes mediate attachment of the bacteria to the host gut mucosa (2). The stx1 gene is also found in Shigella dysenteriae serotype 1.

A range of molecular typing methods show that the shigellae belong within the Escherichia coli species (3). Peng et al. (4) described an evolutionary path of Shigella spp. from E. coli involving gene acquisition (virulence plasmid and pathogenicity islands) and gene loss (pathoadaptivity). Gene loss, or loss of gene function, may result from changes to bacterial biosynthesis pathways driven by the abundance of resources in the host or because the genes may encode proteins adverse to bacterial virulence.

Olesen et al. (5) described a strain of STEC serotype O117:K1:H7 found in travelers from Denmark who returned from tropical locations. The strain was unusual because it was negative for the production of lysine decarboxylase and β-galactosidase (ortho-nitrophenol test) and positive only for stx1.

Since 2004, 19 isolates of STEC O117:K1:H7 have been submitted to the Gastrointestinal Bacteria Reference Unit at the Health Protection Agency in London, UK, from frontline diagnostic microbiology laboratories in England and Wales for confirmation of identification and typing (Table). All isolates were originally misidentified by the submitting laboratory as Shigella sonnei or Shigella spp., probably because of the unusual biochemical phenotype exhibited by this strain. The purpose of this study was to use whole genome sequencing to investigate the evolutionary origins, putative virulence genes, and pathoadaptive mechanisms of this unusual STEC serotype.

The Study

DNA from 5 isolates (151/06, 371/08, 290/10, 754/10, and 229/11) was prepared for sequencing by using the Nextera sample preparation method and sequenced with a standard 2 × 151 base protocol on a MiSeq instrument (Illumina, San Diego, CA, USA) (6). Sequences were analyzed as described (7). In brief, Velvet version 1.1.04 was used to produce an average of 489 contigs with an average N50 length of 38722. Illumina reads were mapped to the reference strain (GenBank accession no. CU928145) by using Bowtie2 2.0.0 β-5, and a variant call format file was created from each of the binary alignment maps. These files were further parsed to extract only single nucleotide polymorphism (SNP) positions that were of high quality in all genomes.
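The N50 statistic reported for the assemblies can be illustrated with a short sketch. This is not the authors' pipeline; the function name `n50` and the example contig lengths are hypothetical and chosen only to show the calculation.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bases (not data from the study).
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why it is commonly reported alongside the contig count.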


Figure. Maximum-likelihood dendrogram for 5 strains of Shiga toxin–producing Escherichia coli serotype O117 in the Gastrointestinal Bacteria Reference Unit (Health Protection Agency, London, UK) archive (boldface), 32 other E. coli genomes, and 4 Shigella spp. genomes. E. fergusonii was used as an outgroup. Scale bar indicates nucleotide substitutions per site.

Concatenated SNPs generated against the reference strain 55989 were used to produce a maximum-likelihood phylogeny of 5 strains in the Gastrointestinal Bacteria Reference Unit archive and 36 other publicly available E. coli genomes and Shigella spp. (Figure). Despite temporal and spatial diversity of the 5 sequenced isolates, they clustered on the same branch, but they were distant from other publicly available sequences of STEC strains.
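The step of keeping only positions that are of high quality in all genomes and concatenating them into an alignment might look roughly like the sketch below. It is an illustration under stated assumptions, not the method used in the study: the input layout (a dictionary of per-strain base calls with a quality score), the function name `concatenate_core_snps`, and the quality threshold of 30 are all hypothetical. The reference can be included as one of the strains so that positions differing only from the reference are retained.

```python
def concatenate_core_snps(calls, min_qual=30):
    """Build a concatenated SNP alignment.

    calls: {strain: {position: (base, quality)}}
    Returns (kept_positions, {strain: concatenated_bases}).
    min_qual is an assumed threshold, not a value from the article.
    """
    if not calls:
        raise ValueError("no genomes supplied")
    strains = sorted(calls)
    # Positions that have a base call in every genome.
    shared = set.intersection(*(set(calls[s]) for s in strains))
    kept = []
    for pos in sorted(shared):
        # Drop the position if any genome has a low-quality call there.
        if any(calls[s][pos][1] < min_qual for s in strains):
            continue
        # Keep only positions that vary among the genomes.
        if len({calls[s][pos][0] for s in strains}) > 1:
            kept.append(pos)
    alignment = {s: "".join(calls[s][p][0] for p in kept) for s in strains}
    return kept, alignment


# Hypothetical toy calls (not data from the study).
toy_calls = {
    "ref":  {10: ("A", 60), 25: ("C", 60), 40: ("G", 60), 55: ("T", 60)},
    "iso1": {10: ("G", 50), 25: ("C", 45), 40: ("T", 12), 55: ("T", 40)},
    "iso2": {10: ("A", 55), 25: ("T", 38), 40: ("T", 50)},
}
positions, aln = concatenate_core_snps(toy_calls)
print(positions, aln)
```

The resulting per-strain strings have equal length, so they can be written out as an alignment and passed to whichever maximum-likelihood tree-building program is preferred.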

A phylogenetic tree based on a diverse range of E. coli showed that the 5 strains of STEC O117 have 130 polymorphic positions, and the closest 2 strains (229/11 and 754/10) are 26 SNPs apart (Table; Figure). For comparison, in the same analysis of a diverse range of E. coli, the genome sequences of EDL933 and Sakai, 2 well-described strains of STEC O157, are ≈35 SNPs apart. The multilocus sequence type ST504 was assigned in accordance with the E. coli multilocus sequence type databases at the Environment Research Institute, University College Cork (Cork, Ireland).
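Counts such as the number of polymorphic positions and the SNP distance between the closest pair of strains can be derived from a concatenated SNP alignment. The following is a minimal sketch assuming equal-length sequences keyed by strain name; the function names and the toy sequences are hypothetical and do not reproduce the study's data.

```python
from itertools import combinations


def polymorphic_sites(alignment):
    """Count alignment columns in which more than one base is observed."""
    return sum(len(set(column)) > 1 for column in zip(*alignment.values()))


def pairwise_snp_distances(alignment):
    """Return {(strain_a, strain_b): number of differing positions}."""
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("sequences must have equal length")
    distances = {}
    for a, b in combinations(sorted(alignment), 2):
        distances[(a, b)] = sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return distances


def closest_pair(distances):
    """Return ((strain_a, strain_b), distance) for the smallest distance."""
    return min(distances.items(), key=lambda item: item[1])


# Hypothetical toy alignment (not data from the study).
toy_alignment = {"s1": "ACGTA", "s2": "ACGTT", "s3": "TCGAA"}
dists = pairwise_snp_distances(toy_alignment)
print(polymorphic_sites(toy_alignment), dists, closest_pair(dists))
```

Comparing the closest-pair distance within a group against the distance between two well-characterized strains, as the text does with EDL933 and Sakai, gives a rough sense of scale for how diverse the group is.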


Alignment of the genome of strain 229/11 with STEC O157 (EDL933) and Shigella dysenteriae serotype 1 (Sd197) indicated gene acquisition, loss, and rearrangement in 229/11. The stx1 gene is adjacent to the yjhS gene in 229/11 and Sd197, and in 229/11 this fragment is flanked by phage-like sequences that are closely related to Stx2-converting phage sequences but not to other Stx1-converting phages. This unusual gene arrangement was described by Sato et al. (8). In Sd197, this region is flanked by integrases and insertion sequences. Other open reading frames homologous to those of Shigella spp. in stx-flanking regions in E. coli have been described, and it is likely that E. coli and the shigellae have exchanged stx many times in their evolutionary past but that only certain strains, such as 229/11, have the appropriate genomic background to retain and stably express Stx (9).

Strain 229/11 also contains a 10-kb pathogenicity island (PAI) that harbors the ratA, sivI, and sivH genes and shares homology with PAI CS54 found in Salmonella spp. (10) and a PAI found in avian pathogenic E. coli (11). The sivH gene has been described as similar to the intimin gene (10). SivH may facilitate attachment to the host gut mucosa and could explain the long persistence of STEC O117:K1:H7 in infected patients (5). In vitro inactivation of sivH in S. enterica serovar Typhimurium resulted in a reduced ability to colonize Peyer’s patches (10). In S. enterica serovar Typhimurium, CS54 is 25 kb and encodes shdA, ratA, ratB, sivI, and sivH, whereas in S. enterica subsp. II, S. bongori serotypes, and 229/11, ratB and shdA are absent (10).

Cadaverine has an inhibitory effect on enterotoxin activity by preventing full expression of the virulent phenotype, and it has been suggested that there is evolutionary pressure to mutate or delete the cadA gene (12). This gene is missing from S. flexneri (Sf301) and S. boydii (Sb227) because of inversion-associated deletions, and in Sd197 and S. sonnei (Ss046) it is inactivated by a frameshift mutation and an insertion sequence, respectively (12). In 229/11, loss of cadA (lysine decarboxylase) activity is caused by repositioning of the cadA activator gene, cadC, upstream of the cadA gene and a 90-bp deletion at the 5′ end of cadC. The cadA gene and truncated cadC gene are separated by a large fragment of DNA inserted into the cadC gene. This fragment contains several open reading frames, including genes encoding aerobactin siderophore biosynthesis proteins.

Lactose fermentation is a biochemical property commonly used for distinguishing Shigella spp. from E. coli because shigellae are non- or late-lactose fermenters. In Sd197 and Ss046 (late lactose–fermenting strains), the key gene, lacZ (encoding β-D-galactosidase), is intact, although lacY (encoding galactoside permease) is a pseudogene (12). As in Sf301 and Sb227, lacZ and lacY are deleted in strain 229/11. The lack of a functional lac operon has been associated with pathogenicity mechanisms in S. enterica (13).

E. coli as a species contains a large diversity of adaptive paths. This diversity is the result of a highly dynamic genome, with a constant and frequent flux of insertions and deletions (3). Pathogenicity in STEC O117:K1:H7 is most likely multifactorial and results from a novel combination of lack of cadA and lacZ expression and the presence of stx1 and the intimin-like sivH genes, demonstrating pathoadaptivity and horizontal gene transfer.

Dr. Dallman is lead bioinformatician in the Gastrointestinal Bacterial Reference Unit at the Health Protection Agency in London, UK. His primary research interest is application of whole genome sequencing of enteric pathogens to aid public health investigations.



We thank the Health Protection Agency Next-Generation Sequencing Implementation Group for support and Flemming Scheutz for helpful discussions.

This study was supported by the Health Protection Agency Strategic Research and Development Fund (grant no. 108061).



  1. Bergey’s manual of systematic bacteriology. The Proteobacteria, 2nd ed. Garrity GM, Brenner DJ, Krieg NR, Staley JT, editors. New York: Springer; 2005.
  2. Kaper JB, Nataro JP, Mobley HL. Pathogenic Escherichia coli. Nat Rev Microbiol. 2004;2:123–40.
  3. Kaas RS, Friis C, Ussery DW, Aarestrup FM. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes. BMC Genomics. 2012;13:577.
  4. Peng J, Yang J, Jin Q. The molecular evolutionary history of Shigella spp. and enteroinvasive Escherichia coli. Infect Genet Evol. 2009;9:147–52.
  5. Olesen B, Jensen C, Olsen K, Fussing V, Gerner-Smidt P, Scheutz F. VTEC O117:K1:H7. A new clonal group of E. coli associated with persistent diarrhoea in Danish travellers. Scand J Infect Dis. 2005;37:288–94.
  6. Köser CU, Holden MT, Ellington MJ, Cartwright EJ, Brown NM, Ogilvy-Stuart AL, et al. Rapid whole-genome sequencing for investigation of a neonatal MRSA outbreak. N Engl J Med. 2012;366:2267–75.
  7. Dallman T, Smith GP, O'Brien B, Chattaway MA, Finlay D, Grant KA, et al. Characterization of a verocytotoxin-producing enteroaggregative Escherichia coli serogroup O111:H21 strain associated with a household outbreak in Northern Ireland. J Clin Microbiol. 2012;50:4116–9.
  8. Sato T, Shimizu T, Watarai M, Kobayashi M, Kano S, Hamabata T, et al. Genome analysis of a novel Shiga toxin 1 (Stx1)–converting phage which is closely related to Stx2-converting phages but not to other Stx1-converting phages. J Bacteriol. 2003;185:3966–71.
  9. Escobar-Páramo P, Clermont O, Blanc-Potard AB, Bui H, Le Bouguénec C, Denamur E. A specific genetic background is required for acquisition and expression of virulence factors in Escherichia coli. Mol Biol Evol. 2004;21:1085–94.
  10. Kingsley RA, Humphries AD, Weening EH, De Zoete MR, Winter S, Papaconstantinopoulou A, et al. Molecular and phenotypic analysis of the CS54 island of Salmonella enterica serotype Typhimurium: identification of intestinal colonization and persistence determinants. Infect Immun. 2003;71:629–40.
  11. Schouler C, Koffmann F, Amory C, Leroy-Sétrin S, Moulin-Schouleur M. Genomic subtraction for the identification of putative new virulence factors of an avian pathogenic Escherichia coli strain of O2 serogroup. Microbiology. 2004;150:2973–84.
  12. Yang F, Yang J, Zhang X, Chen L, Jiang Y, Yan Y, et al. Genome dynamics and diversity of Shigella species, the etiologic agents of bacillary dysentery. Nucleic Acids Res. 2005;33:6445–58.
  13. Bliven KA, Maurelli AT. Antivirulence genes: insights into pathogen evolution through gene loss. Infect Immun. 2012;80:4061–70.




Cite This Article

DOI: 10.3201/eid1908.130016




Address for correspondence: Claire Jenkins, Microbiology Services, Health Protection Agency, 61 Colindale Ave, London NW9 5HT, UK


Page created: July 11, 2013