Clusters of Drug-Resistant Mycobacterium tuberculosis Detected by Whole-Genome Sequence Analysis of Nationwide Sample, Thailand, 2014–2017

Multidrug-resistant tuberculosis (MDR TB), pre-extensively drug-resistant tuberculosis (pre-XDR TB), and extensively drug-resistant tuberculosis (XDR TB) complicate disease control. We analyzed whole-genome sequence data for 579 phenotypically drug-resistant M. tuberculosis isolates (28% of available MDR/pre-XDR and all culturable XDR TB isolates collected in Thailand during 2014–2017). Most isolates were from lineage 2 (n = 482; 83.2%). Cluster analysis revealed that 281/579 isolates (48.5%) formed 89 clusters, including 205 MDR TB, 46 pre-XDR TB, 19 XDR TB, and 11 poly–drug-resistant TB isolates based on genotypic drug resistance. Members of most clusters had the same subset of drug resistance-associated mutations, supporting potential primary resistance in MDR TB (n = 176/205; 85.9%), pre-XDR TB (n = 29/46; 63.0%), and XDR TB (n = 14/19; 73.7%). Thirteen major clades were significantly associated with geography (p<0.001). Clusters of clonal origin contribute greatly to the high prevalence of drug-resistant TB in Thailand.


genome. A recent WGS analysis revealed possible clonal transmission of 4 MDR TB isolates in Kanchanaburi
Province (6). However, the extent of MDR TB and XDR TB clusters across Thailand is unknown. Our aim was to investigate the clustering patterns and risk factors of possible MDR TB, pre-XDR TB, and XDR TB transmission clusters across Thailand using WGS data.

Study Population and Setting
During 2014-2017, a total of 2,071 M. tuberculosis culture-confirmed MDR TB, pre-XDR TB, and XDR TB cases were listed in the laboratory records of the National Tuberculosis Reference Laboratory (NTRL; Ministry of Public Health) and Siriraj Hospital, Mahidol University, Thailand. These 2 laboratories cover 230 hospitals handling most DR TB cases in Thailand (Appendix 1 Tables 1, 2, https://wwwnc.cdc. gov/EID/article/27/3/20-4364-App1.xlsx) (7). We randomly selected 547 M. tuberculosis isolates from MDR TB and pre-XDR TB cases across 6 regions and 71 of 77 provinces nationally. We also included all retrievable (n = 32) XDR TB isolates (Appendix 1  Table 3). For 11 cases, we used pairs of isolates collected at different times as internal controls for SNP distances. In each control pair, we included the isolate with the most mutations associated with drug resistance or the chronologically earlier isolate in the studied population (n = 579). We retrieved demographic data from laboratory records. The study protocol was approved by the Center for Ethics in Human Research, Khon Kaen University (approval no. HE601249).

Whole-Genome Sequence Analysis
We used multiple loops of M. tuberculosis colonies for genomic DNA extraction (with the cetyl-trimethylammonium bromide-sodium chloride method) (9). WGS data for 590 M. tuberculosis isolates were produced by NovogeneAIT (https://en.novogene.com) using the HiSeq (Illumina, https://www.illumina. com) platform generating 150-bp paired-end reads. We checked the quality of sequence reads using FastQC version 0.11.7 (10). We mapped high-quality reads from each isolate onto the H37Rv reference genome (GenBank accession no. NC_000962.3) using BWA-MEM version 0.7.12 (Li H, unpub. data, https:// arxiv.org/abs/1303.3997). The average depth of sequencing coverage was high (341.01 ± 61.98). We used SAMtools version 0.1.19 (11) and GATK version 3.4.0 (12) to call SNPs and insertion/deletions. We filtered variants on the basis of a minimum coverage depth of 10-fold and Q20 minimum base-call quality score, and the intersection set of GATK and SAMtools variants was retained. We used the online tool TB-Profiler version 2.8.6 (13,14) to infer drug resistance and M. tuberculosis lineage membership on the basis of SNPs from the WGS data. The WGS data are available in the ENA Sequence Read Archive (https://www. ebi.ac.uk/ena/browser/home) (accession nos. PRJ-NA598981 and PRJNA613706).

Phylogenetic Analysis
We constructed a phylogenetic tree based on 26,541 high-confidence SNPs among 590 isolates using the maximum-likelihood method with the selected general time-reversible with gamma-distribution model, implemented within MEGA version 10.1 (15). We excluded the 130 SNPs known to be associated with DR TB found in this study to ensure that they would not affect the phylogenetic analysis. We inferred a bootstrap consensus tree from 1,000 replicates. We produced the phylogenetic tree image using iTOL (16).

Data Analysis
Isolates forming monophyletic groups in which many or all pairs differed by <25 SNPs were placed in the same clade. Clusters included isolates differing by 0-11 SNPs. We regarded members of a single cluster as possibly descended from a single clone through recent transmission. Less-recently transmitted isolates within a clade differed by 12-25 SNPs. We calculated the clustering percentage as (no. clustering isolates/ total no. isolates) × 100. We differentiated isolates with acquired DR TB from possible primary DR TB (MDR TB, pre-XDR TB, and XDR TB) isolates on the basis of acquisition of additional resistance-associated mutations, especially those associated with resistance to fluoroquinolones, kanamycin, or capreomycin, drugs that are used for DR TB classification. For clusters containing isolates with different types of DR TB (such as MDR TB and XDR TB), we used the acquisition of additional drug-resistance SNPs and co-ancestral relationships to differentiate between 2 patterns of acquired resistance: chronological (ancestral strain had fewer mutations, lesser type of DR, or both) or nonchronological (ancestral strain had more mutations, stronger type of DR, or both). Although XDR TB and pre-XDR TB could be considered as subsets of MDR TB, we have treated all 3 as separate categories in our analyses.
We analyzed all data using R statistical software version 3.6.1 (https://www.r-project.org) and considered p values <0.05 to be statistically significant. We analyzed associations between clades/clusters and geography using χ 2 tests and visualized them with the R package vcd version 1.4-8. We calculated odds ratios (ORs) with 95% CIs using the R package epiR version 1.0-4. We tested factors associated with clustering isolates using the Student t-test (numerical data), χ 2 test, or Fisher exact test (categorical data), when applicable. We constructed graphs using the R package ggplot2 version 3.2.1 and built phylo-maps using the package phytools version 0.7-20.

Factors Associated with Possible DR TB Transmission Clusters
TB patients from whom clustering isolates were obtained had an average age of ≈42 years. Isolates falling within clusters were significantly associated with geographic regions (p = 0.001; Appendix 2 Figure 3, panel B). Patients with TB who lived in western provinces had a higher risk of being within possible DR TB transmission clusters than those elsewhere (OR 2.44, 95% CI 1.53-3.89; Table 2). Lineage 2.2.1 (versus other lineages) was associated with a higher risk of possible DR TB transmission clusters (OR 3.59, 95% CI 2.42-5.32). Lineage 1 had the lowest risk of being represented in DR TB transmission clusters (OR 0.03, 95% CI 0.01-0.11). Clustering isolates had drug-resistance mutations such as katG S315T, rpoB S450L, and embB G406D (Table 2).

Discussion
MDR TB and XDR TB are serious global problems, but few studies have focused on their transmission at a nationwide resolution. Thailand has a high burden of MDR TB and increasing numbers of MDR TB cases (1). We sourced 579 DR TB isolates across 71 provinces during 2014-2017. Nearly half of these were in possible transmission clusters, mostly involving M. tuberculosis lineage 2.2.1. A total of 89 clusters, most distributed among 13 major clades, contributed to multiclonal MDR TB outbreaks associated with specific regions in Thailand. Bangkok, Kanchanaburi, and Chonburi were the provinces with the highest proportions of MDR TB, pre-XDR TB, and XDR TB clusters (i.e., groups of isolates differing by <11 SNPs). We used 2 criteria to select SNP cutoff values. First, the <11 SNP difference cutoff for a cluster was derived directly from the maximum number of differences between the 11 paired isolates used as an internal control. Second, we used an SNP cutoff concordant with, or more stringent than, those in previous studies (17)(18)(19)(20). Our 11-SNP cutoff was proportionally 0.0004 of the 26,541 SNPs in our total set. This proportion was concordant with that in a previous study (21), and more stringent than those in other studies (18,20). A <12-SNP cutoff has been previously proposed as the upper boundary for possible cluster transmission events (2).
Phylogenetic analysis identified 13 major clades, each associated with a particular region(s). Pairwise  SNP differences between isolates within clades ranged from <11 to ≈25, suggesting a range of divergence times from a common ancestor (Appendix 2 Figure 2). On the basis of the transmission time estimates (0.5 SNP/genome/year) for M. tuberculosis (2), some of these major clades might have begun to circulate in Thailand ≈20-40 years ago, others more recently. Isolates differing by 12-25 SNPs nevertheless often shared geographic links. For example, 17 of 21 (81%) isolates in clade 7 (Figure 3, panel H), which had pairwise differences indicating a relatively nonrecent common ancestor, were located within neighboring provinces of southern Thailand. Clades 1, 6, 11, and 13 each consisted of isolates differing at very few SNPs, giving us confidence that these were likely examples of recent transmission. Nonetheless, isolates in clade 6 often occurred in different provinces.
The largest and most recent clade was clade 13 (Figure 3, panel N), comprising 62 cases (46 MDR TB, 11 pre-XDR TB, and 5 XDR TB based on phenotypic DST) found in the western region, especially in Kanchanaburi. This finding suggests that clones of pre-XDR TB and XDR TB may emerge from recent  MDR TB ancestors. We confirmed a previous report (22) that there was a large MDR TB outbreak in Kanchanaburi. In addition, clade 13 is sister to clade 12, which consists of strains that spread in both central (especially Bangkok) and northeast Thailand and contain less recently transmitted strains. Therefore, the MDR TB outbreak clade in Kanchanaburi was derived from a less recently transmitted clade elsewhere in Thailand.
We identified 89 clusters (isolates in each differing by <11 SNPs) of DR TB in Thailand. The clustered isolates showed a strong association with geographic region. The largest cluster (C89), within clade 13 in Kanchanaburi, comprised 34 isolates (27 MDR TB and 7 pre-XDR TB based on phenotypic DST). In South Africa, WGS analysis of a large XDR TB cohort (>400 cases) from a single province showed that only 30% of participants had clear epidemiologic links (person-person or hospital link): 70% of transmission events may have resulted from casual contact between persons not known to one another (23). Another study in South Africa showed that 19% of XDR TB patients discharged from the hospital caused secondary XDR TB cases in the community (24). Here, we found 9 clusters of pre-XDR TB (the largest with 7 isolates) and 10 clusters (the largest with 4 isolates) of XDR TB in Thailand (Appendix 1 Table 9; Appendix 2 Figure 4).
To reflect the extent of the DR TB outbreak in Thailand, we calculated the proportion of isolates falling into the 89 DR TB clusters (Table 1). In some clusters, isolates exhibited different types of DR TB associated with chronology, revealing the progression of DR mutations in the phylogeny, moving from the ancestor toward the tips of the tree (Appendix 1 Table 8). Based on mutation-acquisition analysis within this phylogeny, we saw examples of possible primary resistance in 85.9% of MDR TB, 63.0% of pre-XDR TB, and 73.7% of XDR TB cases. Eight clusters included isolates with different types of DR and more resistance-associated mutations in the ancestral strain than in its descendants. This situation might be explained by different durations of the latency stage occurring after transmission events leading to the emergence of less troublesome DR TB cases (such as MDR TB) later than the more troublesome cases (such as XDR TB) (25). Because not all cases from the possible transmission chain could be included, undetected primary resistance might exist. Data from all DR TB cases in the community and information on treatment history and known exposure are needed to accurately and completely estimate the extent of primary DR TB. The proportion of primary DR TB cases could be higher because we reported numbers of MDR TB cases excluding pre-XDR TB and XDR TB (each of which was reported as a separate subset). In addition, some index cases might not have been included in the selected population.
Previously reported factors contributing to MDR TB transmission include illicit drug usage (26); delayed TB diagnosis and being >45 years of age (18); and being single, having low income, suffering frequent stress and other diseases, and lacking medical insurance (27). Lineage 2 predominated in previous studies of transmission of MDR TB (18,26,28). We found that infection with lineage 2.2.1 is the strongest predictor (3.6-fold) of DR TB clusters, whereas infection with lineage 1 had the lowest risk. Living in the western region of Thailand increased the risk of being in DR TB clusters by 2.4-fold. The western region, being close to the border with Myanmar, differs from other regions of the country in terms of both ethnicity and economic development. These differences might explain the increased risk there (29). Previously, clustering isolates were more likely to have mutations of rpoB S450L (18,30), katG S315T, or the inhA promoter (31). We also found a pattern of drug resistance-associated mutations (katG S315T, rpoB S450L, embB G406D, rpsL K43R, ethA 639-640del, and folC S150G) in clusters.
The DR TB situation in Thailand is a major concern and requires urgent implementation of control measures such as active case finding to disrupt the transmission chain and targeted intervention and contact tracing in hotspot regions. The mortality rate and cost of treatment of XDR TB is very high (32); therefore, these DR types should be the priority for intervention. The large size of some clusters might reflect their high transmissibility (33); thus, tracking clade 13 at Kanchanaburi should be a priority. Besides the 13 major clades, several small clusters of DR TB were found in many provinces. The potential for expansion of these small clusters is unknown. Here, we also identified the hotspot provinces to help prioritize locations for intervention.
Globally, few studies at the nationwide scale have used WGS analysis of MDR TB, pre-XDR TB, and XDR TB (26,30,(34)(35)(36). Older studies have used blunt genotyping tools (e.g., IS6110 restriction fragment length polymorphism, spoligotyping, and mycobacterial interspersed repetitive unit-variable-number tandem-repeat) with limited or convenient sample sizes. DR TB studies using WGS in Saudi Arabia and Portugal have revealed transmission clusters of MDR TB; however, they had small samples and provided limited data on epidemiologic links (36,37). Extrapolating from our findings, primary-resistant TB strains may be the main contributors to the current global problem of high MDR TB and XDR TB prevalence.
The primary limitations of our study were that it was retrospective rather than prospective, lacked socioeconomic data for analysis and lacked fine-scale data of epidemiologic links: possible transmission clusters were presumed only from the genetic distances among isolates and each patient's hospital and province of residence. In addition, an accurate estimation of the exact time of the possible transmission cannot be made: clusters originating years ago may be continuing to spread. We also lacked information about treatment and exposure history and of the complete population to identify all index cases to differentiate between primary and acquired DR TB. In addition, the prevalence and clustering of MDR TB, pre-XDR TB, and XDR TB isolates in some provinces might be underestimated because of the low coverage of DST for the first-line drugs among TB cases (1).
In conclusion, we have demonstrated the usefulness of WGS for DR TB epidemiology. We have shown that close to half of MDR TB, pre-XDR TB, and XDR TB cases in Thailand might be caused by transmission clusters. Two thirds of pre-XDR TB and three quarters of MDR TB and XDR TB clustering isolates were possible examples of primary resistance. These results indicate that the emergence of MDR TB, pre-XDR TB, and XDR TB cases in Thailand might be from a narrow base of ancestral strains. The high prevalence of MDR/XDR TB in Thailand might be the result of multiclonal outbreaks. People living in the western region of Thailand had a 2.4-fold increased risk of DR TB clusters, and lineage 2.2.1 conferred a 3.6-fold increased risk of forming DR TB clusters relative to other lineages.