Disclaimer: Early release articles are not considered as final versions. Any changes will be reflected in the online version in the month the article is officially released.

Volume 32, Number 6—June 2026

Research

Characteristics of Plausible Source Cases Responsible for Recent Mycobacterium tuberculosis Transmission, United States, 2018–2022

Author affiliation: Centers for Disease Control and Prevention, Atlanta, Georgia, USA (S. Kammerer, K. Raz, J. Wortham, S. Talarico); Emory University, Atlanta (D. Flanagan); California Department of Public Health, Richmond, California, USA (T. Shaw)

Abstract

Tuberculosis (TB) outbreaks in the United States can cause substantial illness. Using surveillance and genotyping data, we applied a plausible source–case algorithm to identify TB cases reported during 2018–2020 responsible for secondary cases attributed to recent Mycobacterium tuberculosis transmission during 2020–2022. We used mixed models and a machine learning workflow to assess sociodemographic, clinical, and social risk factors associated with plausible sources. In mixed models, sputum smear positivity, cavitary disease, race/ethnicity other than non-Hispanic White or non-Hispanic Asian, age <65 years, US birth, and homelessness were associated with plausible sources. An adaptive boosting model achieved an area under the receiver operating characteristic curve of 0.81 on test data. Transmission was heterogeneous; 8.1% of sources linked to 3–15 secondary cases accounted for 24.9% of transmission events. Focusing case management and contact investigations on cases with the characteristics we identified could reduce M. tuberculosis transmission and improve TB prevention.

Tuberculosis (TB) is the leading infectious cause of death worldwide (1). The World Health Organization estimated that >10 million persons had TB develop during 2023 (1). TB incidence in the United States is low at <3 cases/100,000 population (≈9,000 annual cases) reported in recent years (2). Most US TB cases reflect reactivation of past Mycobacterium tuberculosis infection rather than recent transmission within US borders, and most cases are diagnosed among non–US-born persons (2,3). However, outbreaks resulting from M. tuberculosis transmission within the United States continue to occur, disproportionately affect US-born persons, and can cause substantial illness (4–10). Therefore, targeted interventions to prevent M. tuberculosis transmission and TB outbreaks are crucial.

Public health interventions to prevent reactivation of latent TB differ from interventions designed to prevent M. tuberculosis transmission (11). Whereas preventing reactivation requires diagnosing and treating asymptomatic latent TB among persons with epidemiologic risk factors (e.g., birth outside the United States) (3,12), preventing M. tuberculosis transmission requires intensive case management to promptly initiate treatment to cure TB and identify, fully evaluate, and treat close contacts of TB case-patients (13). Clinically distinguishing between TB resulting from reactivation versus recent M. tuberculosis transmission often is impossible; thus, estimates of the percentage of cases attributable to recent transmission are needed to inform prioritization of resources and activities for TB prevention (14–20). With this aim, the Centers for Disease Control and Prevention (CDC) developed and validated a plausible source–case algorithm to estimate the percentage of TB cases attributable to recent transmission within US borders over a 2-year period (17,18). On the basis of that algorithm, ≈10%–15% of TB cases nationwide were attributed to recent M. tuberculosis transmission, but local estimates varied widely (17,21). During 2020–2021, a total of 1,400 cases were attributed to recent transmission in the 50 US states and the District of Columbia (21). Recent literature also suggests heterogeneity in transmission at the individual case level; some analyses have proposed that transmission associated with a few TB cases accounts for a disproportionate share of secondary cases (15,16,22–25) and that transmission associated with those cases is more likely to result in outbreaks (15,24).

Focused interventions to prevent transmission, such as intensive case management and contact investigations for cases most likely to generate secondary cases, could reduce TB illnesses. Nonetheless, few analyses have sought to describe the characteristics of cases presumed to have transmitted M. tuberculosis to secondary cases (17,18). We used national molecular and case surveillance data to characterize the sociodemographic, clinical, and social risk factors associated with plausible TB source cases in the United States.

Materials and Methods

Data Sources

We obtained patient and case characteristics from the National Tuberculosis Surveillance System for incident TB cases reported to CDC during 2018–2022; demographic, clinical, and social risk factors are reported for each case (21). We included cases from all 50 US states and the District of Columbia.

For community-level characteristics, we used the CDC/Agency for Toxic Substances and Disease Registry’s Social Vulnerability Index (SVI) based on the US Census Bureau’s 2016–2020 American Community Survey (26). The SVI provides a relative vulnerability ranking for each US county by using 16 social factors grouped into 4 themes (i.e., socioeconomic status [SES], household characteristics, racial and ethnic minority status, and housing type and transportation) and an overall ranking. Rankings are values from 0 to 1, where higher values indicate higher social vulnerability.

We used genotyping data generated from whole-genome sequencing methods that assign a whole-genome multilocus sequence type (wgMLSType). M. tuberculosis isolate sequencing has been routinely available for >96% of culture-confirmed TB cases reported since 2018. The wgMLSType is assigned by comparing the sequences of 2,672 genetic loci and designating the same wgMLSType to cases that have matching patterns at >99.7% of loci.
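The matching rule described above can be illustrated with a minimal Python sketch. It assumes each isolate is represented as an equal-length list of allele calls (one per locus) and that every locus is called; how missing loci are handled and how types are actually assigned within the genotyping pipeline are not stated in the text, so the function names and data layout here are hypothetical.

```python
from typing import Sequence

N_LOCI = 2672            # number of loci compared, per the text
MATCH_THRESHOLD = 0.997  # matching fraction required, per the text


def fraction_matching(a: Sequence[int], b: Sequence[int]) -> float:
    """Return the fraction of loci at which two allele profiles agree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def same_wgmlstype(a: Sequence[int], b: Sequence[int],
                   threshold: float = MATCH_THRESHOLD) -> bool:
    """Hypothetical pairwise check: profiles match at more than 99.7% of loci."""
    return fraction_matching(a, b) > threshold


# With 2,672 loci, the threshold tolerates at most 8 differing loci.
reference = list(range(N_LOCI))
eight_diffs = [-1] * 8 + reference[8:]
nine_diffs = [-1] * 9 + reference[9:]
print(same_wgmlstype(reference, eight_diffs))  # True
print(same_wgmlstype(reference, nine_diffs))   # False
```

Under these assumptions, 2,664 of 2,672 matching loci (99.701%) clears the threshold and 2,663 (99.663%) does not, so the result is the same whether the boundary is read as strict or inclusive.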

We performed whole-genome single-nucleotide polymorphism (wgSNP) comparisons for all isolate pairs identified by the plausible source–case algorithm as a potential source and secondary case during the study period. We used BioNumerics 7.6.3 (Applied Maths, http://www.applied-maths.com) to perform wgSNP comparisons, which provide increased molecular resolution compared with wgMLSType. We excluded any single-nucleotide polymorphisms (SNPs) in a wgSNP comparison if total coverage was <5 reads or if it contained any ambiguous or unreliable bases or gaps. We also excluded all SNPs that were <12 bp from another SNP.
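The SNP exclusion criteria above can be sketched in Python as follows. This is not the BioNumerics implementation; the `SnpCall` record, the single `ambiguous` flag, and the choice to measure spacing against all candidate SNPs before quality filtering are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_COVERAGE = 5     # SNPs with total coverage < 5 reads are excluded
MIN_SPACING_BP = 12  # SNPs < 12 bp from another SNP are excluded


@dataclass(frozen=True)
class SnpCall:
    position: int            # genome coordinate in bp
    coverage: int            # total reads covering the position
    ambiguous: bool = False  # ambiguous or unreliable base, or gap


def filter_snps(calls: Iterable[SnpCall]) -> List[SnpCall]:
    """Keep SNPs that pass the coverage, ambiguity, and spacing rules."""
    ordered = sorted(calls, key=lambda c: c.position)
    kept = []
    for i, call in enumerate(ordered):
        if call.coverage < MIN_COVERAGE or call.ambiguous:
            continue
        # Assumption: spacing is checked against every candidate SNP,
        # including neighbors that fail the quality checks themselves.
        near_prev = i > 0 and call.position - ordered[i - 1].position < MIN_SPACING_BP
        near_next = (i + 1 < len(ordered)
                     and ordered[i + 1].position - call.position < MIN_SPACING_BP)
        if near_prev or near_next:
            continue
        kept.append(call)
    return kept


calls = [
    SnpCall(100, 10), SnpCall(105, 10),  # 5 bp apart: both excluded
    SnpCall(200, 3),                     # low coverage: excluded
    SnpCall(300, 20), SnpCall(312, 20),  # exactly 12 bp apart: both kept
    SnpCall(400, 20, ambiguous=True),    # ambiguous: excluded
]
print([c.position for c in filter_snps(calls)])  # [300, 312]
```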

Recent Transmission Estimates

We applied the field-validated plausible source–case algorithm to culture-confirmed, genotyped TB cases reported during 2020–2022 that had nonunique wgMLSTypes. We attributed a case to recent transmission if ≥1 plausible source case of pulmonary or laryngeal TB was diagnosed within the previous 2 years in persons ≥10 years of age who had matching wgMLSType and resided within a 10-mile radius (18).

We excluded plausible source cases with isolates that differed by >5 SNPs from the secondary case’s isolate to restrict the dataset to case pairs that are most likely to represent recent transmission (27). We extracted all remaining plausible source cases reported during 2018–2020; we classified any case not identified as a plausible source case for ≥1 case attributed to recent transmission as a nonplausible source. Limiting the plausible source–case population to 2018–2020 enabled an equal 2-year follow-up for any subsequent secondary cases when assigning plausible source case status.
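The pairing criteria in this section can be summarized as a single predicate. The Python sketch below is illustrative only and is not the validated algorithm: the `Case` fields, the use of geocoded coordinates with a great-circle distance, the 730-day reading of "previous 2 years," the treatment of the age cutoff as 10 years or older, and the inclusive boundaries are all assumptions.

```python
import math
from dataclasses import dataclass
from datetime import date

EARTH_RADIUS_MILES = 3958.8
MAX_DISTANCE_MILES = 10  # residence radius, per the text
MAX_LOOKBACK_DAYS = 730  # assumption: "previous 2 years" as 730 days
MIN_SOURCE_AGE = 10      # assumption: sources are 10 years of age or older
MAX_SNP_DISTANCE = 5     # pairs differing by >5 SNPs are excluded


@dataclass(frozen=True)
class Case:
    case_id: str
    wgmlstype: str
    diagnosed: date
    age_years: int
    lat: float
    lon: float
    pulmonary_or_laryngeal: bool


def miles_between(a: Case, b: Case) -> float:
    """Great-circle (haversine) distance between two residences."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def is_plausible_source(source: Case, secondary: Case, snp_distance: int) -> bool:
    """Return True if `source` meets every criterion for `secondary`."""
    if source.case_id == secondary.case_id:
        return False
    lookback = (secondary.diagnosed - source.diagnosed).days
    return (source.wgmlstype == secondary.wgmlstype
            and source.pulmonary_or_laryngeal
            and source.age_years >= MIN_SOURCE_AGE
            and 0 <= lookback <= MAX_LOOKBACK_DAYS
            and miles_between(source, secondary) <= MAX_DISTANCE_MILES
            and snp_distance <= MAX_SNP_DISTANCE)
```

A secondary case would then be attributed to recent transmission when at least one other case satisfies the predicate, and any case satisfying it for some secondary case would be counted as a plausible source.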

Study Population

We used a single binary outcome for analyses, defined as whether a case was identified as a plausible source case during 2018–2020 for ≥1 secondary case attributed to recent transmission during 2020–2022. Because a secondary case can have multiple plausible sources, we assessed 2 scenarios: inclusion of all plausible sources (all scenario) and selection of a single most likely plausible source (most likely scenario).

Figure 1

Determination of the most likely plausible source cases responsible for recent Mycobacterium tuberculosis transmission, United States, 2018–2022. Flowchart shows the structured decision process used to identify the most likely plausible source case. The algorithm evaluated SNP differences, epidemiologic links in the NTSS, and an index of infectiousness. If ties remained after applying those criteria, 1 plausible source case was randomly selected from the remaining candidates. CXR, chest radiograph; CT, computed tomography; NTSS, National Tuberculosis Surveillance System; SNP, single-nucleotide polymorphism; TB, tuberculosis.

We developed a decision tree to determine the most likely plausible source case. For each secondary case, we selected a single most likely source by prioritizing minimum wgSNP distance, then documented epidemiologic link, followed by highest infectiousness index. We resolved ties at random (Figure 1). We applied the most likely scenario to plot the distribution of the number of secondary cases attributed to recent transmission during 2020–2022 associated with each plausible source identified during 2018–2020.
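The selection hierarchy described above can be expressed as successive filters over the candidate sources for one secondary case. The following Python sketch is a hedged illustration: the `Candidate` record is hypothetical, and the infectiousness index is represented as a plain integer score because its construction is not detailed in this section.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    case_id: str
    snp_distance: int    # wgSNP distance to the secondary case
    epi_link: bool       # documented epidemiologic link
    infectiousness: int  # hypothetical index; higher means more infectious


def most_likely_source(candidates: Sequence[Candidate],
                       rng: Optional[random.Random] = None) -> Candidate:
    """Pick one source: min SNP distance, then epi link, then index, then random."""
    if not candidates:
        raise ValueError("at least one plausible source is required")
    rng = rng or random.Random()

    # Step 1: keep candidates at the minimum wgSNP distance.
    min_snp = min(c.snp_distance for c in candidates)
    pool = [c for c in candidates if c.snp_distance == min_snp]

    # Step 2: among ties, prefer a documented epidemiologic link.
    if len(pool) > 1 and any(c.epi_link for c in pool):
        pool = [c for c in pool if c.epi_link]

    # Step 3: among remaining ties, prefer the highest infectiousness index.
    if len(pool) > 1:
        top = max(c.infectiousness for c in pool)
        pool = [c for c in pool if c.infectiousness == top]

    # Step 4: resolve any remaining tie at random.
    return pool[0] if len(pool) == 1 else rng.choice(pool)
```

Passing a seeded `random.Random` makes the final tie-break reproducible, which is useful when the assignment must be rerun for sensitivity analyses.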

Descriptive Analyses

We compared individual-level demographic, clinical, and social risk factors and community-level SVI measures of plausible versus nonplausible sources under both scenarios. To assess characteristics associated with plausible source cases that transmit to ≥3 secondary cases, we performed an additional descriptive analysis by using the most likely scenario and stratified the outcome into 0, 1–2, and ≥3 secondary cases. We used the χ² test of independence or Fisher exact test for categorical variables and Wilcoxon rank-sum tests for continuous variables. We considered p<0.05 statistically significant.

Generalized Linear Mixed Models

We fit a multivariable generalized linear mixed model (GLMM) to determine individual- and community-level characteristics associated with being a plausible source case and developed separate models for the all and most likely scenarios. We incorporated a random intercept for county to accommodate clustering of cases by county and inclusion of county-level SVI measures; we assessed the separate factors in any SVI theme that were significantly associated with the outcome. We performed backward elimination starting with all variables included in the descriptive analysis. We assessed effect modification in the reduced model and evaluated multicollinearity by using variance inflation factors. We sequentially removed covariates if their exclusion decreased the Akaike information criterion, finalized models when no further improvement was observed, and calculated adjusted odds ratios (aORs) and 95% CIs.
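For orientation, a random-intercept model of this kind is commonly written as shown below. The notation is an assumption introduced here, not taken from the article: Y_ij indicates whether case i in county j is a plausible source, x_ij holds individual-level covariates, z_j holds county-level SVI measures, and u_j is the county random intercept. A logit link is assumed because results are reported as adjusted odds ratios.

```latex
\[
\operatorname{logit}\,\Pr(Y_{ij}=1 \mid u_j)
  = \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
  + \mathbf{z}_{j}^{\top}\boldsymbol{\gamma} + u_j,
\qquad u_j \sim \mathcal{N}(0,\sigma_u^2)
\]
\[
\mathrm{aOR}_k = \exp(\beta_k),
\qquad
\mathrm{AIC} = 2p - 2\ln\hat{L}
\]
```

Here p is the number of estimated parameters and \hat{L} is the maximized likelihood; under the stated procedure, a covariate is dropped when the model without it has a lower AIC.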

Machine Learning Models

We also evaluated machine learning models (MLMs) that predict whether a TB case is estimated to be a plausible source. Using the most likely scenario, we developed a machine learning workflow to assess 10 different methods and included all features (i.e., variables) from the descriptive analysis (28) (Appendix).

Sensitivity Analyses

We reran both the GLMM and MLM workflows to evaluate the effects of using the infectiousness index to determine the outcome and smear positivity and cavitary disease as predictors. First, we restricted the data to the subset of secondary cases for which the most likely plausible source was determined using only wgSNP difference or epidemiologic link. Then, we reran the selection hierarchy without the infectiousness index (Appendix).

We used SAS version 9.4 (SAS Institute, Inc., https://www.sas.com) for data management, descriptive analyses, and GLMM development and Python version 3.9.13 (Python Software Foundation, https://www.python.org) for MLM analyses. CDC determined this activity to be routine public health surveillance and not human subjects research. WGS was performed as part of routine public health surveillance and no new sequence data were generated as part of this study. Sequence data included in this analysis are available in the National Center for Biotechnology Information (BioProject no. PRJNA1237251).

Results

Study Population

Figure 2

Flow diagram showing selection of recent transmission source–secondary case pairs from 50 states and Washington, DC, included in an analysis of plausible source cases responsible for recent Mycobacterium tuberculosis transmission, United States, 2018–2022. Among 3,762 RT case pairs identified, case pairs with >5 SNP differences were excluded (n = 1,840), leaving 1,922 WGS–validated RT case pairs. The analytic dataset included 922 TB cases attributed to RT during 2020–2022 and plausible source cases identified during 2018–2020 (all plausible source cases [n = 893] and most likely plausible source cases [n = 645]). RT, recent transmission; SNP, single-nucleotide polymorphism; TB, tuberculosis; WGS, whole-genome sequencing.

During 2018–2022, a total of 41,264 TB cases were reported from the 50 US states and the District of Columbia, of which 32,110 (77.8%) were culture-confirmed and genotyped, and 61% (n = 19,577) of culture-confirmed and genotyped cases were reported during 2018–2020. Using the plausible source–case algorithm, we identified 3,762 recent transmission source–secondary case pairs for which 1,922 (51.1%) were supported by wgSNP analysis (Figure 2). The 1,922 case pairs comprised 922 cases attributed to recent transmission during 2020–2022, indicating a mean of 2.1 (range 1–24) plausible source cases per secondary case. We identified 893 (4.6%) unique plausible source cases during 2018–2020 for the all scenario (Table 1; Figure 2; Appendix Table 1). We found secondary cases attributed to recent transmission in 44 states.

For the most likely scenario, we identified 645 (3.3%) unique plausible source cases during 2018–2020 (Table 2; Appendix Table 2). Among the 922 cases attributed to recent transmission, we chose 753 (81.7%) plausible source cases by using wgSNP difference, 114 (12.4%) by using the index of infectiousness, 44 (4.8%) by using random selection, and 11 (1.2%) by using epidemiologic links (Appendix Table 4).

Descriptive Analyses

In the all scenario, plausible sources were more often male sex, US-born, <65 years of age, and other than non-Hispanic White or non-Hispanic Asian race/ethnicity, and they more frequently reported homelessness and substance use. Those plausible sources also more often had smear positivity and cavitary disease. County-level social vulnerability was higher among plausible sources, especially the SES theme (Table 1; Appendix Table 1).

Using the most likely scenario, characteristic distribution differed greatly across the 3 transmission categories (nonplausible source, plausible source for 1–2 cases, and plausible source for ≥3 cases) (Table 3; Appendix Table 3). Descriptively, among plausible source cases estimated to have transmitted to ≥3 (range 3–15) secondary cases, the highest percentages were among persons reporting male sex, 25–44 years of age, US-born, non-Hispanic Black race/ethnicity, experiencing homelessness, noninjection drug use, sputum smear positivity, and cavitary disease. Social vulnerability, including all 4 SVI themes, was also highest for the ≥3 secondary cases category.

GLMMs

Of the 19,577 cases available for analysis, we excluded 311 (1.6%) with missing data for sex, birth country, race/ethnicity, sputum smear, or county. We excluded all cases among children 0–4 years of age from analysis because plausible source cases had to be ≥10 years of age. For the all scenario, we found that the following characteristics were most associated with being a plausible source case: race/ethnicity, specifically Native Hawaiian/Pacific Islander non-Hispanic (aOR 5.34 [95% CI 3.11–9.17]), American Indian/Alaska Native non-Hispanic (aOR 2.12 [95% CI 1.10–4.06]), and Black non-Hispanic (aOR 1.82 [95% CI 1.40–2.36]), compared with White non-Hispanic persons; age <65 years compared with ≥65 years of age, the greatest association of which was 15–24 years of age (aOR 3.01 [95% CI 2.21–4.30]); US-born (aOR 2.41 [95% CI 1.99–2.91]); experiencing homelessness (aOR 1.89 [95% CI 1.49–2.39]); and indicators of infectiousness, specifically positive sputum smear (aOR 1.71 [95% CI 1.42–2.07]) and cavitary disease (aOR 1.69 [95% CI 1.42–2.00]) (Table 4). Noninjection drug use, male sex, and Hispanic ethnicity also were associated with plausible source cases.

Among the SVI themes, only SES was significantly associated with plausible source cases in multivariable generalized linear mixed modeling (p<0.001). Thus, we included 2 of the SES component factors in the final model: housing cost burden (aOR 1.20 [95% CI 1.06–1.36]), defined as households spending >30% of annual income on housing (26); and not having health insurance (aOR 1.13 [95% CI 1.02–1.25]). We calculated SVI aORs on the basis of a 0.20-unit (i.e., 20%) increase for each factor. Multicollinearity was modest for age and race/ethnicity, but no variance inflation factors exceeded 4. We found no statistically significant effect modification in the final model.
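Because the SVI aORs are expressed per 0.20-unit increase, they can be rescaled to other increments if the log-odds are assumed to be linear in the ranking. The short Python sketch below shows that arithmetic; the `rescale_or` helper is hypothetical, and extrapolating across the full 0 to 1 range is an illustration of the scaling, not a result reported by the authors.

```python
import math


def rescale_or(or_value: float, from_step: float, to_step: float) -> float:
    """Rescale an odds ratio from one increment to another.

    Assumes the log-odds change linearly with the predictor, so
    OR(to_step) = OR(from_step) ** (to_step / from_step).
    """
    return or_value ** (to_step / from_step)


# Reported aOR for housing cost burden per 0.20-unit increase: 1.20
per_unit_beta = math.log(1.20) / 0.20     # implied log-odds slope per 1.0 unit
full_range = rescale_or(1.20, 0.20, 1.0)  # same as 1.20 ** 5
print(round(per_unit_beta, 3), round(full_range, 3))
```

Rescaling the confidence limits the same way gives the corresponding interval on the new scale, under the same linearity assumption.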

The final model for the most likely scenario included the same variables as the all scenario except for SES factors (Table 5). We only retained the not having health insurance (aOR 1.12 [95% CI 1.04–1.21]) factor in that model.

Machine Learning Predictive Models

After random selection of nonplausible source cases, we ran MLMs under the most likely scenario by using 3,456 observations, among which we used 2,686 (77.8%) for training and 770 (22.2%) for testing (Appendix Table 5). We assessed recall and F1 statistic, which is the harmonic mean of weighted-average recall and weighted-average precision computed as prevalence-weighted averages across the classes. We noted recall and F1 statistic were highest for gradient boosting (recall 0.758; F1 0.752), adaptive boosting (recall 0.751, F1 0.750), and random forest (recall 0.755; F1 0.740) methods; we selected those methods for hyperparameter tuning. The tuned adaptive boosting model had the highest area under the receiver operating characteristic curve (AUC; 0.780) (Appendix).
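The F1 statistic as defined above (harmonic mean of prevalence-weighted recall and prevalence-weighted precision) can be computed with a few lines of standard-library Python. This sketch follows the wording of the text; it is an assumption that the authors' workflow computed it exactly this way, and some libraries instead report a weighted F1 that averages per-class F1 scores, which can give a slightly different value.

```python
from collections import Counter
from typing import Sequence, Tuple


def weighted_precision_recall(y_true: Sequence[int],
                              y_pred: Sequence[int]) -> Tuple[float, float]:
    """Prevalence-weighted averages of per-class precision and recall."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        weight = count / n
        precision += weight * (tp / predicted if predicted else 0.0)
        recall += weight * (tp / count)
    return precision, recall


def f1_from_weighted(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of weighted-average precision and recall."""
    p, r = weighted_precision_recall(y_true, y_pred)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Toy labels (1 = plausible source, 0 = nonplausible source); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(weighted_precision_recall(y_true, y_pred), f1_from_weighted(y_true, y_pred))
```

Prevalence-weighted recall is algebraically equal to overall accuracy, so the recall values reported for each method can also be read as the share of cases classified correctly.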

Figure 3

Characteristic importance from an adaptive boosting model of plausible source cases responsible for recent Mycobacterium tuberculosis transmission, United States, 2018–2022. Horizontal bars show the normalized impurity-based importance (mean decrease in node impurity) associated with each predictor; larger values indicate greater importance. NH/PI, Native Hawaiian/Pacific Islander; SVI, Social Vulnerability Index.

We found no reduction in predictive performance for the tuned adaptive boosting model when applied to the test set (F1 0.761; AUC 0.811) versus model training (F1 0.755; AUC 0.780). Sensitivity was 55.4% (95% CI 48.7%–62.2%), specificity 82.7% (95% CI 79.5%–85.8%), and AUC 0.81 (95% CI 0.78–0.84) with the test set. Individual-level factors dominated feature importance and county-level SVI measures were less predictive (Figure 3; Appendix).

Distribution of Secondary Cases Associated with Plausible Source Cases

The distribution of the number of secondary cases attributed to each plausible source case was right skewed, indicating heterogeneity of TB transmission (Appendix Figure). Most (76.6%, n = 494) plausible source cases transmitted to a single secondary case and comprised 53.6% of all estimated transmission events. Plausible source cases that transmitted to 3–15 secondary cases (8.1%, n = 52) comprised 24.9% of all estimated transmission events.
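The percentages above can be reproduced from counts reported in the Results. The Python sketch below assumes that, under the most likely scenario, each of the 922 secondary cases counts as exactly one transmission event assigned to one of the 645 sources; the number of sources with exactly 2 secondary cases and the number of events attributed to the 3–15 group are derived here by subtraction and are not stated in the text.

```python
TOTAL_SOURCES = 645  # most likely plausible source cases
TOTAL_EVENTS = 922   # secondary cases, one event each (assumption)

single_sources = 494  # sources linked to exactly 1 secondary case
high_sources = 52     # sources linked to 3-15 secondary cases

# Derived by subtraction (not stated in the text).
double_sources = TOTAL_SOURCES - single_sources - high_sources
events_single = single_sources * 1
events_double = double_sources * 2
events_high = TOTAL_EVENTS - events_single - events_double


def pct(part: int, whole: int) -> float:
    """Percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


print(pct(single_sources, TOTAL_SOURCES))  # 76.6 (% of sources)
print(pct(events_single, TOTAL_EVENTS))    # 53.6 (% of events)
print(pct(high_sources, TOTAL_SOURCES))    # 8.1  (% of sources)
print(pct(events_high, TOTAL_EVENTS))      # 24.9 (% of events)
```

Under these assumptions the derived counts are 99 sources with 2 secondary cases and 230 events in the 3–15 group, which is consistent with every percentage reported in the paragraph.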

Sensitivity Analyses

We restricted analyses to source assignments solely on the basis of wgSNP difference or epidemiologic link and we removed the index of infectiousness from the hierarchy. In both instances, associations for smear positivity and cavitary disease were modestly attenuated but remained statistically significant (Appendix Table 6).

Discussion

M. tuberculosis transmission was rare in the United States during 2018–2022; nonetheless, cases attributed to recent transmission were diagnosed in nearly every US state. In mixed models, sputum smear positivity, cavitary disease, race/ethnicity other than non-Hispanic White or Asian, age 15–44 years, being US-born, and homelessness were associated with being a plausible source for transmission. We found heterogeneous transmission; 8.1% of plausible source cases identified by the recent transmission algorithm were linked to 3–15 secondary cases and accounted for 24.9% of inferred transmission events. Among plausible source cases identified during 2018–2020 that were linked to ≥3 secondary cases, 86.5% were sputum smear–positive versus 48.1% of all genotyped cases reported during that period; similarly, 76.9% of plausible source cases had cavitary disease versus 36.7% of all genotyped cases. US-born persons only accounted for 27.4% of all genotyped cases, but 71.2% of plausible sources linked to ≥3 secondary cases were among US-born persons.

Our results align with prior investigations of source-case characteristics. A study from the Netherlands defined source cases as the first case diagnosed in a genotype-matched cluster (29). That study found fewer secondary cases for female sex and decreasing numbers of cases with increasing source age. In another study conducted among smear-positive TB patients in Barcelona, Spain, researchers used contact investigations to identify secondary cases and found that more secondary cases occurred after cases in younger adults, those with cavitary disease, and persons who injected drugs (30). In Peru, sources identified by using wgSNP distance more often were <34 years of age, were male, and had incarceration history or reported alcohol use or smoking (31).

Characteristics of plausible source cases in our study were similar to characteristics of cases estimated to be attributed to recent transmission. One study found that among the largest 10% of recent transmission clusters, cases attributed to recent transmission were more likely to be in persons who were US-born, American Indian/Alaska Native non-Hispanic, Native Hawaiian/Pacific Islander non-Hispanic, Black non-Hispanic, and Asian non-Hispanic and who reported homelessness (17).

In our study, positive sputum smear and cavitary disease, markers of infectiousness and advanced disease (32–34), were consistently associated with source case status in both GLMM and MLM, including sensitivity analyses where those factors were not used to select between potential source cases. Those associations support the current practice of prioritizing contact investigations for smear-positive and cavitary TB cases (13) and suggest that diagnostic delays contribute to transmission. Delays might reflect barriers to care, including homelessness (4,5,10,35), or missed diagnoses after seeking healthcare services (36). Because TB is uncommon in the United States, clinicians might not routinely consider it in symptomatic patients. Of note, whereas most US TB cases occur among persons born outside the United States, plausible source cases in our study had higher odds of being in US-born persons. Although Asian non-Hispanic persons account for most TB cases (21), they were not substantially associated with plausible source status in this study. Missed diagnoses and longer infectious periods among higher-risk groups could explain those findings. Therefore, robust epidemiology and outbreak detection should be used to customize local TB control and prevention efforts focused on at-risk persons.

We found that county-based social determinants of health related to SES (i.e., burdened by housing cost and not having health insurance) were associated with plausible source case attribution in the GLMM (26). Those associations could reflect barriers to TB care, including lack of insurance and constrained resources in settings with high housing costs (37,38). A study that examined data from TB cases reported to the California TB Registry during 2012–2016 found higher TB rates in the lowest SES areas, which defined SES by low education, crowding, poverty, and the California Healthy Places Index (39). An ecologic analysis in another study reported that census tracts with lower median incomes, more racial/ethnic minority groups, and more migrants had higher pediatric TB rates; however, overcrowding and unemployment were not associated (40). As in those prior studies, our analysis suggests that area-based SES measures could inform TB prevention efforts.

Adaptive boosting ranked male sex and younger age (<65 years) as the main predictors, followed by race/ethnicity, clinical indicators of infectiousness, and birth origin (Figure 3). Ensemble methods outperformed regression-based approaches in this analysis (Appendix Table 5) and have been used to predict TB outcomes, including cluster growth and positive laboratory results (41,42). The GLMM supports inference through interpretable adjusted associations, and the MLM offers a complementary perspective focused on prediction. Despite differences in ranking, many of the same predictors were influential across both approaches, including markers of infectiousness, age, race/ethnicity, origin of birth, and several social factors. That convergence supports the robustness of our findings.

Strengths of our analysis included the use of wgSNP comparisons to refute case pairs that were not likely to be the result of recent transmission, assessment of both patient- and community-level predictors for association with being a plausible source case, and analysis of plausible source cases of recent transmission at a national level. The first limitation of this analysis is that some cases could have been misattributed as not sources by the algorithm because the source was reported after the secondary case, the source was outside a 10-mile radius, or the secondary case was not genotyped. Although ≈75% of US cases are culture-confirmed (21) and >96% of those are genotyped, TB cases in young children, which would predominantly result from recent transmission, have substantially lower rates of culture confirmation. Furthermore, because we limited running the algorithm to starting in 2020, some of the 2018–2019 cases could have been misattributed as not a source if the resulting secondary case was also reported in 2018–2019. We also might have misattributed cases as not sources if all contacts with latent TB infection underwent successful treatment and did not develop TB disease. Second, the COVID-19 pandemic occurred during the analysis period. The pandemic was associated with changes in healthcare seeking behavior that could have affected TB diagnoses; therefore, generalization of our results to other time periods should be done with caution.

In summary, although M. tuberculosis transmission is relatively rare in the United States, targeted control efforts could prevent outbreaks that overwhelm public health programs. We identified both patient-level and, to a lesser extent, community-level characteristics as predictors of being a source of recent transmission. Cases with clinical indicators of increased infectiousness were more likely to be transmission sources, supporting current guidance to prioritize those cases for contact investigation. Demographic characteristics associated with being a source case, such as race and origin of birth, differed from those of overall TB cases, highlighting the need for prompt TB diagnosis in US-born persons with risk factors, particularly homelessness and substance use, for preventing outbreaks. In addition, enhanced efforts to promptly diagnose TB in communities with more uninsured residents and those with high housing costs might also reduce transmission. Findings from this analysis suggest that intensifying public health interventions on TB cases in persons with certain demographic and clinical characteristics could yield a greater than expected reduction in M. tuberculosis transmission in the United States.

Mr. Kammerer is a health statistician at the Centers for Disease Control and Prevention in Atlanta. His primary research interests include tuberculosis molecular epidemiology and outbreak detection methods.

Acknowledgments

We thank the state and local health departments who collect and report the TB data that were used for these analyses. We specifically thank Noah Schwartz for his helpful comments during manuscript preparation.

CDC’s Division of Tuberculosis Elimination, National Center for HIV, Viral Hepatitis, STD, and Tuberculosis Prevention, provided funding support through employee salaries for this publication.

We used the large language model–based tool ChatGPT (OpenAI, https://openai.com) for limited editorial assistance (e.g., wording and formatting suggestions). All analyses, drafts of the manuscript, interpretations, figures, graphs, and final wording decisions were made by the authors.

References

  1. World Health Organization. Global tuberculosis report no. 978-92-4-010153-1. Geneva: The Organization; 2024.
  2. Williams  PM, Pratt  RH, Walker  WL, Price  SF, Stewart  RJ, Feng  PI. Tuberculosis—United States, 2023. MMWR Morb Mortal Wkly Rep. 2024;73:26570. DOIPubMedGoogle Scholar
  3. LoBue  PA, Mermin  JH. Latent tuberculosis infection: the final frontier of tuberculosis elimination in the USA. Lancet Infect Dis. 2017;17:e32733. DOIPubMedGoogle Scholar
  4. Raz  KM, Talarico  S, Althomsons  SP, Kammerer  JS, Cowan  LS, Haddad  MB, et al. Molecular surveillance for large outbreaks of tuberculosis in the United States, 2014–2018. Tuberculosis (Edinb). 2022;136:102232. DOIPubMedGoogle Scholar
  5. Haddad  MB, Mitruka  K, Oeltmann  JE, Johns  EB, Navin  TR. Characteristics of tuberculosis cases that started outbreaks in the United States, 2002–2011. Emerg Infect Dis. 2015;21:50810. DOIPubMedGoogle Scholar
  6. Stewart  RJ, Raz  KM, Burns  SP, Kammerer  JS, Haddad  MB, Silk  BJ, et al. Tuberculosis outbreaks in state prisons, United States, 2011–2019. Am J Public Health. 2022;112:11709. DOIPubMedGoogle Scholar
  7. Stalter  RM, Pecha  M, Dov  L, Miller  D, Ghazal  Z, Wortham  J, et al. Tuberculosis outbreak in a state prison system—Washington, 2021–2022. MMWR Morb Mortal Wkly Rep. 2023;72:30912. DOIPubMedGoogle Scholar
  8. Groenweghe  E, Swensson  L, Winans  KD, Griffin  P, Haddad  MB, Brostrom  RJ, et al. Outbreak of multidrug-resistant tuberculosis—Kansas, 2021–2022. MMWR Morb Mortal Wkly Rep. 2023;72:95760. DOIPubMedGoogle Scholar
  9. Labuda  SM, McDaniel  CJ, Talwar  A, Braumuller  A, Parker  S, McGaha  S, et al. Tuberculosis outbreak associated with delayed diagnosis and long infectious periods in rural Arkansas, 2010–2018. Public Health Rep. 2022;137:94101. DOIPubMedGoogle Scholar
  10. Mindra  G, Wortham  JM, Haddad  MB, Powell  KM. Tuberculosis outbreaks in the United States, 2009–2015. Public Health Rep. 2017;132:15763. DOIPubMedGoogle Scholar
  11. Churchyard  G, Kim  P, Shah  NS, Rustomjee  R, Gandhi  N, Mathema  B, et al. What we know about tuberculosis transmission: an overview. J Infect Dis. 2017;216:S62935. DOIPubMedGoogle Scholar
  12. Mangione CM, Barry MJ, Nicholson WK, Cabana M, Chelmow D, Coker TR, et al.; US Preventive Services Task Force. Screening for latent tuberculosis infection in adults: US Preventive Services Task Force recommendation statement. JAMA. 2023;329:1487–94.
  13. Cole B, Nilsen DM, Will L, Etkind SC, Burgos M, Chorba T. Essential components of a public health tuberculosis prevention, control, and elimination program: recommendations of the Advisory Council for the Elimination of Tuberculosis and the National Tuberculosis Controllers Association. MMWR Recomm Rep. 2020;69:1–27.
  14. Smith JP, Cohen T, Dowdy D, Shrestha S, Gandhi NR, Hill AN. Quantifying Mycobacterium tuberculosis transmission dynamics across global settings: a systematic analysis. Am J Epidemiol. 2023;192:133–45.
  15. Smith JP, Gandhi NR, Silk BJ, Cohen T, Lopman B, Raz K, et al. A cluster-based method to quantify individual heterogeneity in tuberculosis transmission. Epidemiology. 2022;33:217–27.
  16. Shrestha S, Winglee K, Hill AN, Shaw T, Smith JP, Kammerer JS, et al. Model-based analysis of tuberculosis genotype clusters in the United States reveals high degree of heterogeneity in transmission and state-level differences across California, Florida, New York, and Texas. Clin Infect Dis. 2022;75:1433–41.
  17. Yuen CM, Kammerer JS, Marks K, Navin TR, France AM. Recent transmission of tuberculosis—United States, 2011–2014. PLoS One. 2016;11:e0153728.
  18. France AM, Grant J, Kammerer JS, Navin TR. A field-validated approach using surveillance and genotyping data to estimate tuberculosis attributable to recent transmission in the United States. Am J Epidemiol. 2015;182:799–807.
  19. Noppert GA, Yang Z, Clarke P, Davidson P, Ye W, Wilson ML. Contextualizing tuberculosis risk in time and space: comparing time-restricted genotypic case clusters and geospatial clusters to evaluate the relative contribution of recent transmission to incidence of TB using nine years of case data from Michigan, USA. Ann Epidemiol. 2019;40:21–27.e3.
  20. Mamiya H, Schwartzman K, Verma A, Jauvin C, Behr M, Buckeridge D. Towards probabilistic decision support in public health practice: predicting recent transmission of tuberculosis from patient attributes. J Biomed Inform. 2015;53:237–42.
  21. Centers for Disease Control and Prevention. Reported tuberculosis in the United States, 2021 [cited 2025 Jul 7]. https://www.cdc.gov/tb/statistics/reports/2021/default.htm
  22. Ypma RJ, Altes HK, van Soolingen D, Wallinga J, van Ballegooijen WM. A sign of superspreading in tuberculosis: highly skewed distribution of genotypic cluster sizes. Epidemiology. 2013;24:395–400.
  23. Stein RA. Super-spreaders in infectious diseases. Int J Infect Dis. 2011;15:e510–3.
  24. Rodriguez CA, Li T, Self JL, Jenkins HE, Horsburgh CR, White LF. Genotyping indicates marked heterogeneity of tuberculosis transmission in the United States, 2009–2018. Epidemiol Infect. 2021;149:e215.
  25. Melsew YA, Gambhir M, Cheng AC, McBryde ES, Denholm JT, Tay EL, et al. The role of super-spreading events in Mycobacterium tuberculosis transmission: evidence from contact tracing. BMC Infect Dis. 2019;19:244.
  26. Centers for Disease Control and Prevention; Agency for Toxic Substances and Disease Registry. CDC/ATSDR Social Vulnerability Index 2022 database [cited 2025 Jul 10]. https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html
  27. Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13:137–46.
  28. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16:321–57.
  29. Borgdorff MW, Nagelkerke NJ, de Haas PE, van Soolingen D. Transmission of Mycobacterium tuberculosis depending on the age and sex of source cases. Am J Epidemiol. 2001;154:934–43.
  30. Rodrigo T, Caylà JA, García de Olalla P, Galdós-Tangüis H, Jansà JM, Miranda P, et al. Characteristics of tuberculosis patients who generate secondary cases. Int J Tuberc Lung Dis. 1997;1:352–7.
  31. Trevisi L, Brooks MB, Becerra MC, Calderón RI, Contreras CC, Galea JT, et al. Who transmits tuberculosis to whom: a cross-sectional analysis of a cohort study in Lima, Peru. Am J Respir Crit Care Med. 2024;210:222–33.
  32. Lau A, Barrie J, Winter C, Elamy AH, Tyrrell G, Long R. Chest radiographic patterns and the transmission of tuberculosis: implications for automated systems. PLoS One. 2016;11:e0154032.
  33. Asadi L, Croxen M, Heffernan C, Dhillon M, Paulsen C, Egedahl ML, et al. How much do smear-negative patients really contribute to tuberculosis transmissions? Re-examining an old question with new tools. EClinicalMedicine. 2022;43:101250.
  34. Urbanowski ME, Ordonez AA, Ruiz-Bedoya CA, Jain SK, Bishai WR. Cavitary tuberculosis: the gateway of disease transmission. Lancet Infect Dis. 2020;20:e117–28.
  35. Shrestha S, Cilloni L, Asay GRB, Kammerer JS, Raz K, Shaw T, et al. Model-based analysis of impact, costs, and cost-effectiveness of tuberculosis outbreak investigations, United States. Emerg Infect Dis. 2025;31:497–506.
  36. Wallace RM, Kammerer JS, Iademarco MF, Althomsons SP, Winston CA, Navin TR. Increasing proportions of advanced pulmonary tuberculosis reported in the United States: are delays in diagnosis on the rise? Am J Respir Crit Care Med. 2009;180:1016–22.
  37. Simon AE, Fenelon A, Helms V, Lloyd PC, Rossen LM. HUD housing assistance associated with lower uninsurance rates and unmet medical need. Health Aff (Millwood). 2017;36:1016–23.
  38. Baker DW, Shapiro MF, Schur CL. Health insurance and access to care for symptomatic conditions. Arch Intern Med. 2000;160:1269–74.
  39. Bakhsh Y, Readhead A, Flood J, Barry P. Association of area-based socioeconomic measures with tuberculosis incidence in California. J Immigr Minor Health. 2023;25:643–52.
  40. Myers WP, Westenhouse JL, Flood J, Riley LW. An ecological study of tuberculosis transmission in California. Am J Public Health. 2006;96:685–90.
  41. Althomsons SP, Winglee K, Heilig CM, Talarico S, Silk B, Wortham J, et al. Using machine learning techniques and national tuberculosis surveillance data to predict excess growth in genotyped tuberculosis clusters. Am J Epidemiol. 2022;191:1936–43.
  42. Smith JP, Milligan K, McCarthy KD, Mchembere W, Okeyo E, Musau SK, et al. Machine learning to predict bacteriologic confirmation of Mycobacterium tuberculosis in infants and very young children. PLOS Digit Health. 2023;2:e0000249.

Suggested citation for this article: Kammerer S, Flanagan D, Raz K, Shaw T, Wortham J, Talarico S. Characteristics of plausible source cases responsible for recent Mycobacterium tuberculosis transmission, United States, 2018–2022. Emerg Infect Dis. 2026 Jun [date cited]. https://doi.org/10.3201/eid3206.260104

DOI: 10.3201/eid3206.260104

Original Publication Date: May 15, 2026

Address for correspondence: Sarah Talarico, Centers for Disease Control and Prevention, 1600 Clifton Rd NE, Mailstop H24-3, Atlanta, GA 30329-4018, USA

Page created: April 13, 2026
Page updated: May 15, 2026
Page reviewed: May 15, 2026
The conclusions, findings, and opinions expressed by authors contributing to this journal do not necessarily reflect the official position of the U.S. Department of Health and Human Services, the Public Health Service, the Centers for Disease Control and Prevention, or the authors' affiliated institutions. Use of trade names is for identification only and does not imply endorsement by any of the groups named above.