Disclaimer: Early release articles are not considered as final versions. Any changes will be reflected in the online version in the month the article is officially released.
Volume 32, Number 6—June 2026
Research
Characteristics of Plausible Source Cases Responsible for Recent Mycobacterium tuberculosis Transmission, United States, 2018–2022
Abstract
Tuberculosis (TB) outbreaks in the United States can cause substantial illness. Using surveillance and genotyping data, we applied a plausible source–case algorithm to identify TB cases reported during 2018–2020 responsible for secondary cases attributed to recent Mycobacterium tuberculosis transmission during 2020–2022. We used mixed models and a machine learning workflow to assess sociodemographic, clinical, and social risk factors associated with plausible sources. In mixed models, sputum smear positivity, cavitary disease, race/ethnicity other than non-Hispanic White or non-Hispanic Asian, age <65 years, US birth, and homelessness were associated with plausible sources. An adaptive boosting model achieved an area under the receiver operating characteristic curve of 0.81 on test data. Transmission was heterogeneous; 8.1% of sources linked to 3–15 secondary cases accounted for 24.9% of transmission events. Focusing case management and contact investigations on cases with the characteristics we identified could reduce M. tuberculosis transmission and improve TB prevention.
Tuberculosis (TB) is the leading infectious cause of death worldwide (1). The World Health Organization estimated that >10 million persons had TB develop during 2023 (1). TB incidence in the United States is low at <3 cases/100,000 population (≈9,000 annual cases) reported in recent years (2). Most US TB cases reflect reactivation of past Mycobacterium tuberculosis infection rather than recent transmission within US borders and most cases are diagnosed among non–US-born persons (2,3). However, outbreaks resulting from M. tuberculosis transmission within the United States continue to occur, disproportionately affect US-born persons, and can cause substantial illness (4–10). Therefore, targeted interventions to prevent M. tuberculosis transmission and TB outbreaks are crucial.
Public health interventions to prevent reactivation of latent TB differ from interventions designed to prevent M. tuberculosis transmission (11). Whereas preventing reactivation requires diagnosing and treating asymptomatic latent TB among persons with epidemiologic risk factors (e.g., birth outside the United States) (3,12), preventing M. tuberculosis transmission requires intensive case management to promptly initiate treatment to cure TB and identify, fully evaluate, and treat close contacts of TB case-patients (13). Clinically distinguishing between TB resulting from reactivation versus recent M. tuberculosis transmission often is impossible; thus, estimates of the percentage of cases attributable to recent transmission are needed to inform prioritization of resources and activities for TB prevention (14–20). With this aim, the Centers for Disease Control and Prevention (CDC) developed and validated a plausible source–case algorithm to estimate the percentage of TB cases attributable to recent transmission within US borders over a 2-year period (17,18). On the basis of that algorithm, ≈10%–15% of TB cases nationwide were attributed to recent M. tuberculosis transmission, but local estimates varied widely (17,21). During 2020–2021, a total of 1,400 cases were attributed to recent transmission in the 50 US states and the District of Columbia (21). Recent literature also suggests heterogeneity in transmission at the individual case level; some analyses have proposed that transmission associated with a few TB cases accounts for a disproportionate share of secondary cases (15,16,22–25) and that transmission associated with those cases is more likely to result in outbreaks (15,24).
Focused interventions to prevent transmission, such as intensive case management and contact investigations for cases most likely to generate secondary cases, could reduce TB illnesses. Nonetheless, few analyses have sought to describe the characteristics of cases presumed to have transmitted M. tuberculosis to secondary cases (17,18). We used national molecular and case surveillance data to characterize the sociodemographic, clinical, and social risk factors associated with plausible TB source cases in the United States.
Data Sources
We obtained patient and case characteristics from the National Tuberculosis Surveillance System for incident TB cases reported to CDC during 2018–2022; demographic, clinical, and social risk factors are reported for each case (21). We included cases from all 50 US states and the District of Columbia.
For community-level characteristics, we used the CDC/Agency for Toxic Substances and Disease Registry’s Social Vulnerability Index (SVI) based on the US Census Bureau’s 2016–2020 American Community Survey (26). The SVI provides a relative vulnerability ranking for each US county by using 16 social factors grouped into 4 themes (i.e., socioeconomic status [SES], household characteristics, racial and ethnic minority status, and housing type and transportation) and an overall ranking. Rankings are values from 0 to 1, where higher values indicate higher social vulnerability.
We used genotyping data generated from whole-genome sequencing methods that assign a whole-genome multilocus sequence type (wgMLSType). M. tuberculosis isolate sequencing has been routinely available for >96% of culture-confirmed TB cases reported since 2018. The wgMLSType is assigned by comparing the sequences of 2,672 genetic loci and designating the same wgMLSType to cases that have matching patterns at >99.7% of loci.
We performed whole-genome single-nucleotide polymorphism (wgSNP) comparisons for all isolate pairs identified by the plausible source–case algorithm as a potential source and secondary case during the study period. We used BioNumerics 7.6.3 (Applied Maths, http://www.applied-maths.com) to perform wgSNP comparisons, which provide increased molecular resolution compared with wgMLSType. We excluded any single-nucleotide polymorphisms (SNPs) in a wgSNP comparison if total coverage was <5 reads or if it contained any ambiguous or unreliable bases or gaps. We also excluded all SNPs that were <12 bp from another SNP.
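The SNP exclusion rules described above can be sketched in a few lines. This is only an illustration, not the BioNumerics implementation: the `SnpCall` record, its field names, and the choice to judge SNP proximity against all called SNPs (before the quality filters are applied) are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SnpCall:
    position: int       # genome coordinate of the SNP, in bp (hypothetical field)
    coverage: int       # total reads covering the position
    is_reliable: bool   # False if the site has ambiguous/unreliable bases or gaps

MIN_COVERAGE = 5   # SNPs with total coverage <5 reads are excluded
MIN_SPACING = 12   # SNPs <12 bp from another SNP are excluded

def filter_snps(calls):
    """Return the SNP calls that survive the three exclusion rules."""
    ordered = sorted(calls, key=lambda c: c.position)
    kept = []
    for i, call in enumerate(ordered):
        if call.coverage < MIN_COVERAGE or not call.is_reliable:
            continue
        # Assumption: proximity is measured to the nearest called SNP,
        # whether or not that neighbor passes the quality filters.
        near_prev = i > 0 and call.position - ordered[i - 1].position < MIN_SPACING
        near_next = (i + 1 < len(ordered)
                     and ordered[i + 1].position - call.position < MIN_SPACING)
        if not (near_prev or near_next):
            kept.append(call)
    return kept

example_calls = [
    SnpCall(100, 30, True),    # 5 bp from the next SNP: excluded
    SnpCall(105, 30, True),    # 5 bp from the previous SNP: excluded
    SnpCall(500, 3, True),     # coverage below 5 reads: excluded
    SnpCall(900, 40, False),   # ambiguous or unreliable site: excluded
    SnpCall(1500, 25, True),   # passes all rules
]
print([c.position for c in filter_snps(example_calls)])
```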
Recent Transmission Estimates
We applied the field-validated plausible source–case algorithm to culture-confirmed, genotyped TB cases reported during 2020–2022 that had nonunique wgMLSTypes. We attributed a case to recent transmission if ≥1 plausible source case existed, defined as a case of pulmonary or laryngeal TB diagnosed within the previous 2 years in a person ≥10 years of age who had a matching wgMLSType and resided within a 10-mile radius (18).
We excluded plausible source cases with isolates that differed by >5 SNPs from the secondary case’s isolate to restrict the dataset to case pairs that are most likely to represent recent transmission (27). We extracted all remaining plausible source cases reported during 2018–2020; we classified any case not identified as a plausible source case for ≥1 case attributed to recent transmission as a nonplausible source. Limiting the plausible source–case population to 2018–2020 enabled an equal 2-year follow-up for any subsequent secondary cases when assigning plausible source case status.
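As a rough illustration of how the pairing criteria and the SNP threshold combine, the check for a single source–secondary pair might look like the sketch below. The `TbCase` record, the use of great-circle distance between residence coordinates, the 730-day reading of "2 years," the strict ordering of diagnosis dates, and the inclusive age and distance cutoffs are all assumptions for the example; the validated algorithm is described in the cited references.

```python
import math
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class TbCase:                      # hypothetical record layout
    case_id: str
    diagnosed: date
    age_years: int
    wgmlstype: str
    lat: float
    lon: float
    pulmonary_or_laryngeal: bool

def miles_between(a, b):
    """Great-circle (haversine) distance in miles between two residences."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(h))

def is_plausible_source(source, secondary, snp_distance):
    """Apply the pairing criteria plus the wgSNP threshold to one case pair."""
    days_before = (secondary.diagnosed - source.diagnosed).days
    return (
        source.case_id != secondary.case_id
        and source.pulmonary_or_laryngeal            # infectious form of TB
        and source.age_years >= 10                   # assumed inclusive cutoff
        and source.wgmlstype == secondary.wgmlstype  # matching genotype
        and 0 < days_before <= 730                   # assumed: 2 years = 730 days
        and miles_between(source, secondary) <= 10   # within a 10-mile radius
        and snp_distance <= 5                        # pairs differing by >5 SNPs are excluded
    )

src = TbCase("S1", date(2019, 3, 1), 40, "A", 33.749, -84.388, True)
sec = TbCase("C1", date(2020, 6, 15), 29, "A", 33.800, -84.400, True)
print(is_plausible_source(src, sec, snp_distance=2))
```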
Study Population
We used a single binary outcome for analyses, defined as whether a case was identified as a plausible source case during 2018–2020 for ≥1 secondary case attributed to recent transmission during 2020–2022. Because a secondary case can have multiple plausible sources, we assessed 2 scenarios: inclusion of all plausible sources (all scenario) and selection of a single most likely plausible source (most likely scenario).
We developed a decision tree to determine the most likely plausible source case. For each secondary case, we selected a single most likely source by prioritizing minimum wgSNP distance, then documented epidemiologic link, followed by highest infectiousness index. We resolved ties at random (Figure 1). We applied the most likely scenario to plot the distribution of the number of secondary cases attributed to recent transmission during 2020–2022 associated with each plausible source identified during 2018–2020.
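The selection hierarchy can be expressed as a lexicographic comparison followed by a random tie-break. The sketch below assumes each candidate carries a numeric infectiousness index (how that index is built is not covered here); the `Candidate` record and the function names are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:                 # hypothetical record for one plausible source
    source_id: str
    snp_distance: int            # wgSNP distance to the secondary case
    epi_link: bool               # documented epidemiologic link
    infectiousness: float        # assumed numeric infectiousness index

def priority(c):
    # Smaller tuples win: fewest SNPs, then linked before unlinked,
    # then highest infectiousness index.
    return (c.snp_distance, not c.epi_link, -c.infectiousness)

def most_likely_source(candidates, rng=None):
    """Pick one source per secondary case; ties are resolved at random."""
    rng = rng or random.Random()
    best = min(priority(c) for c in candidates)
    tied = [c for c in candidates if priority(c) == best]
    return rng.choice(tied)

candidates = [
    Candidate("A", snp_distance=0, epi_link=False, infectiousness=3.0),
    Candidate("B", snp_distance=0, epi_link=True, infectiousness=1.0),
    Candidate("C", snp_distance=1, epi_link=True, infectiousness=5.0),
]
print(most_likely_source(candidates).source_id)
```

Passing a seeded `random.Random` instance makes the tie-break reproducible, which is useful when the assignment needs to be rerun.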
Descriptive Analyses
We compared individual-level demographic, clinical, and social risk factors and community-level SVI measures of plausible versus nonplausible sources under both scenarios. To assess characteristics associated with plausible source cases that transmit to ≥3 secondary cases, we performed an additional descriptive analysis by using the most likely scenario and stratified the outcome into 0, 1–2, and ≥3 secondary cases. We used the χ2 test of independence or Fisher exact test for categorical variables and Wilcoxon rank-sum tests for continuous variables. We considered p<0.05 statistically significant.
Generalized Linear Mixed Models
We fit a multivariable generalized linear mixed model (GLMM) to determine individual- and community-level characteristics associated with being a plausible source case and developed separate models for the all and most likely scenarios. We incorporated a random intercept for county to accommodate clustering of cases by county and inclusion of county-level SVI measures; we assessed the separate factors in any SVI theme that were significantly associated with the outcome. We performed backward elimination starting with all variables included in the descriptive analysis. We assessed effect modification in the reduced model and evaluated multicollinearity by using variance inflation factors. We sequentially removed covariates if their exclusion decreased the Akaike information criterion, finalized models when no further improvement was observed, and calculated adjusted odds ratios (aORs) and 95% CIs.
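The covariate-removal rule can be illustrated with a generic greedy loop. This is a sketch only: the models in this analysis were fit in SAS, `aic_of` stands in for whatever routine refits the mixed model and returns its Akaike information criterion, and removing the single covariate that lowers the criterion most at each step is an assumption about the order of removal.

```python
def backward_eliminate(variables, aic_of):
    """Greedy backward elimination driven by AIC.

    aic_of: hypothetical callable that takes a tuple of covariate names,
    refits the model with them, and returns its AIC.
    """
    current = list(variables)
    current_aic = aic_of(tuple(current))
    while current:
        # Refit once per candidate removal and keep the best single drop.
        trials = [
            (aic_of(tuple(v for v in current if v != drop)), drop)
            for drop in current
        ]
        best_aic, drop = min(trials)
        if best_aic >= current_aic:   # no removal improves the fit: stop
            break
        current.remove(drop)
        current_aic = best_aic
    return current, current_aic

# Toy stand-in for model fitting: each covariate lowers the deviance by a
# made-up amount and costs the usual AIC penalty of 2 per parameter.
toy_gain = {"smear": 30.0, "cavitary": 20.0, "noise1": 0.5, "noise2": 1.0}

def toy_aic(names):
    return 100.0 - sum(toy_gain[n] for n in names) + 2 * len(names)

print(backward_eliminate(list(toy_gain), toy_aic))
```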
Machine Learning Models
We also evaluated machine learning models (MLMs) that predict whether a TB case is estimated to be a plausible source. Using the most likely scenario, we developed a machine learning workflow to assess 10 different methods and included all features (i.e., variables) from the descriptive analysis (28) (Appendix).
Sensitivity Analyses
We reran both the GLMM and MLM workflows to evaluate the effects of using the infectiousness index to determine the outcome and smear positivity and cavitary disease as predictors. First, we restricted the data to the subset of secondary cases for which the most likely plausible source was determined using only wgSNP difference or epidemiologic link. Then, we reran the selection hierarchy without the infectiousness index (Appendix).
We used SAS version 9.4 (SAS Institute, Inc., https://www.sas.com) for data management, descriptive analyses, and GLMM development and Python version 3.9.13 (Python Software Foundation, https://www.python.org) for MLM analyses. CDC determined this activity to be routine public health surveillance and not human subjects research. Whole-genome sequencing was performed as part of routine public health surveillance, and no new sequence data were generated as part of this study. Sequence data included in this analysis are available in the National Center for Biotechnology Information (BioProject no. PRJNA1237251).
Study Population
During 2018–2022, a total of 41,264 TB cases were reported from the 50 US states and the District of Columbia, of which 32,110 (77.8%) were culture-confirmed and genotyped; 19,577 (61%) of those culture-confirmed and genotyped cases were reported during 2018–2020. Using the plausible source–case algorithm, we identified 3,762 recent transmission source–secondary case pairs, of which 1,922 (51.1%) were supported by wgSNP analysis (Figure 2). The 1,922 case pairs comprised 922 cases attributed to recent transmission during 2020–2022, indicating a mean of 2.1 (range 1–24) plausible source cases per secondary case. We identified 893 (4.6%) unique plausible source cases during 2018–2020 for the all scenario (Table 1; Figure 2; Appendix Table 1). We found secondary cases attributed to recent transmission in 44 states.
For the most likely scenario, we identified 645 (3.3%) unique plausible source cases during 2018–2020 (Table 2; Appendix Table 2). Among the 922 cases attributed to recent transmission, we chose 753 (81.7%) plausible source cases by using wgSNP difference, 114 (12.4%) by using the index of infectiousness, 44 (4.8%) by using random selection, and 11 (1.2%) by using epidemiologic links (Appendix Table 4).
Descriptive Analyses
In the all scenario, plausible sources were more often male sex, US-born, <65 years of age, and other than non-Hispanic White or non-Hispanic Asian race/ethnicity, and they more frequently reported homelessness and substance use. Those plausible sources also more often had smear positivity and cavitary disease. County-level social vulnerability was higher among plausible sources, especially the SES theme (Table 1; Appendix Table 1).
Using the most likely scenario, characteristic distribution differed greatly across the 3 transmission categories (nonplausible source, plausible source for 1–2 cases, and plausible source for ≥3 cases) (Table 3; Appendix Table 3). Descriptively, among plausible source cases estimated to have transmitted to ≥3 (range 3–15) secondary cases, the highest percentages were among persons reporting male sex, 25–44 years of age, US-born, non-Hispanic Black race/ethnicity, experiencing homelessness, noninjection drug use, sputum smear positivity, and cavitary disease. Social vulnerability, including all 4 SVI themes, was also highest for the ≥3 secondary cases category.
GLMMs
Of the 19,577 cases available for analysis, we excluded 311 (1.6%) with missing data for sex, birth country, race/ethnicity, sputum smear, or county. We excluded all cases among children 0–4 years of age from analysis because plausible source cases had to be ≥10 years of age. For the all scenario, we found that the following characteristics were most associated with being a plausible source case: race/ethnicity, specifically Native Hawaiian/Pacific Islander non-Hispanic (aOR 5.34 [95% CI 3.11–9.17]), American Indian/Alaska Native non-Hispanic (aOR 2.12 [95% CI 1.10–4.06]), and Black non-Hispanic (aOR 1.82 [95% CI 1.40–2.36]), compared with White non-Hispanic persons; age <65 years compared with ≥65 years of age, the greatest association of which was 15–24 years of age (aOR 3.01 [95% CI 2.21–4.30]); US-born (aOR 2.41 [95% CI 1.99–2.91]); experiencing homelessness (aOR 1.89 [95% CI 1.49–2.39]); and indicators of infectiousness, specifically positive sputum smear (aOR 1.71 [95% CI 1.42–2.07]) and cavitary disease (aOR 1.69 [95% CI 1.42–2.00]) (Table 4). Noninjection drug use, male sex, and Hispanic ethnicity also were associated with plausible source cases.
Among the SVI themes, only SES was significantly associated with plausible source cases in multivariable generalized linear mixed modeling (p<0.001). Thus, we included 2 of the SES component factors in the final model: housing cost burden (aOR 1.20 [95% CI 1.06–1.36]), defined as households spending >30% of annual income on housing (26); and not having health insurance (aOR 1.13 [95% CI 1.02–1.25]). We calculated SVI aORs on the basis of a 0.20-unit (i.e., 20%) increase for each factor. Multicollinearity was modest for age and race/ethnicity, but no variance inflation factors exceeded 4. We found no statistically significant effect modification in the final model.
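Because the SVI odds ratios are reported per 0.20-unit increase, it can help to see how such an estimate converts to other step sizes. The sketch below relies only on the standard relationship between a logistic regression coefficient and its odds ratio, and it assumes the effect is linear on the log-odds scale across the whole 0 to 1 ranking; the function name is hypothetical.

```python
import math

def rescale_odds_ratio(odds_ratio, from_step, to_step):
    """Convert an odds ratio reported per `from_step` increase to one per `to_step`."""
    beta_per_unit = math.log(odds_ratio) / from_step   # log-odds change per 1.0 unit
    return math.exp(beta_per_unit * to_step)

# Housing cost burden: aOR 1.20 per 0.20-unit increase in the county ranking.
per_tenth = rescale_odds_ratio(1.20, from_step=0.20, to_step=0.10)
per_full_range = rescale_odds_ratio(1.20, from_step=0.20, to_step=1.0)
print(round(per_tenth, 3), round(per_full_range, 3))
```

Under that linearity assumption, comparing a county at the top of the ranking with one at the bottom corresponds to an odds ratio of 1.20 raised to the fifth power, which is why the reporting increment matters when reading the estimates.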
The final model for the most likely scenario included the same variables as the model for the all scenario, except for the SES factors (Table 5). Of those factors, we retained only not having health insurance (aOR 1.12 [95% CI 1.04–1.21]) in that model.
Machine Learning Predictive Models
After random selection of nonplausible source cases, we ran MLMs under the most likely scenario by using 3,456 observations, of which we used 2,686 (77.7%) for training and 770 (22.3%) for testing (Appendix Table 5). We assessed recall and the F1 statistic, defined as the harmonic mean of weighted-average recall and weighted-average precision, with each average weighted by class prevalence. Recall and F1 were highest for gradient boosting (recall 0.758; F1 0.752), adaptive boosting (recall 0.751; F1 0.750), and random forest (recall 0.755; F1 0.740) methods; we selected those methods for hyperparameter tuning. The tuned adaptive boosting model had the highest area under the receiver operating characteristic curve (AUC; 0.780) (Appendix).
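For readers who want to see the metric definition spelled out, the sketch below computes prevalence-weighted recall and precision and then takes their harmonic mean, following the wording above. The labels are made up for illustration, and the function name is hypothetical. Note that this quantity can differ from the weighted F1 option in common libraries such as scikit-learn, which averages per-class F1 values instead.

```python
import math

def weighted_metrics(y_true, y_pred):
    """Return (weighted recall, weighted precision, F1 as their harmonic mean)."""
    n = len(y_true)
    w_recall = w_precision = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        support = sum(1 for t in y_true if t == cls)      # true members of the class
        predicted = sum(1 for p in y_pred if p == cls)    # predictions of the class
        recall = tp / support
        precision = tp / predicted if predicted else 0.0
        weight = support / n                              # class prevalence
        w_recall += weight * recall
        w_precision += weight * precision
    total = w_precision + w_recall
    f1 = 2 * w_precision * w_recall / total if total else 0.0
    return w_recall, w_precision, f1

# Made-up labels: 1 = plausible source, 0 = nonplausible source.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(weighted_metrics(y_true, y_pred))
```

Prevalence-weighted recall is algebraically the same as overall accuracy, so in this toy example it equals the share of correct predictions (7 of 10).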
We found no reduction in predictive performance for the tuned adaptive boosting model when applied to the test set (F1 0.761; AUC 0.811) versus model training (F1 0.755; AUC 0.780). Sensitivity was 55.4% (95% CI 48.7%–62.2%), specificity 82.7% (95% CI 79.5%–85.8%), and AUC 0.81 (95% CI 0.78–0.84) with the test set. Individual-level factors dominated feature importance and county-level SVI measures were less predictive (Figure 3; Appendix).
Distribution of Secondary Cases Associated with Plausible Source Cases
The distribution of the number of secondary cases attributed to each plausible source case was right skewed, indicating heterogeneity of TB transmission (Appendix Figure). Most (76.6%, n = 494) plausible source cases transmitted to a single secondary case and comprised 53.6% of all estimated transmission events. Plausible source cases that transmitted to 3–15 secondary cases (8.1%, n = 52) comprised 24.9% of all estimated transmission events.
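The reported shares can be cross-checked with simple arithmetic. The sketch below uses only counts stated in the text and assumes that, under the most likely scenario, each of the 922 secondary cases is assigned to exactly one of the 645 plausible sources, so every source outside the 1-case and 3–15-case groups has exactly 2 secondary cases. The variable names are illustrative.

```python
total_sources = 645    # unique plausible sources, most likely scenario
total_events = 922     # secondary cases, each assigned a single source
single_sources = 494   # sources linked to exactly 1 secondary case
high_sources = 52      # sources linked to 3-15 secondary cases

# Sources not in either group must have exactly 2 secondary cases.
two_case_sources = total_sources - single_sources - high_sources

# Events left over after the 1-case and 2-case sources belong to the 3-15 group.
events_from_high = total_events - single_sources * 1 - two_case_sources * 2

pct_sources_high = round(100 * high_sources / total_sources, 1)
pct_events_high = round(100 * events_from_high / total_events, 1)
pct_events_single = round(100 * single_sources / total_events, 1)

print(two_case_sources, events_from_high)                     # 99 230
print(pct_sources_high, pct_events_high, pct_events_single)   # 8.1 24.9 53.6
```

The derived figures reproduce the reported percentages, and they imply that the 52 highest-transmitting sources averaged about 4.4 secondary cases each (230 / 52).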
Sensitivity Analyses
We restricted analyses to source assignments solely on the basis of wgSNP difference or epidemiologic link and we removed the index of infectiousness from the hierarchy. In both instances, associations for smear positivity and cavitary disease were modestly attenuated but remained statistically significant (Appendix Table 6).
M. tuberculosis transmission was rare in the United States during 2018–2022; nonetheless, cases attributed to recent transmission were diagnosed in nearly every US state. In mixed models, sputum smear positivity, cavitary disease, race/ethnicity other than non-Hispanic White or Asian, age 15–44 years, being US-born, and homelessness were associated with being a plausible source for transmission. Transmission was heterogeneous: 8.1% of plausible source cases identified by the recent transmission algorithm were linked to 3–15 secondary cases and accounted for 24.9% of inferred transmission events. Among plausible source cases identified during 2018–2020 that were linked to ≥3 secondary cases, 86.5% were sputum smear–positive versus 48.1% of all genotyped cases reported during that period; similarly, 76.9% of those plausible source cases had cavitary disease versus 36.7% of all genotyped cases. US-born persons accounted for only 27.4% of all genotyped cases, but 71.2% of plausible sources linked to ≥3 secondary cases were among US-born persons.
Our results align with prior investigations of source-case characteristics. A study from the Netherlands defined source cases as the first case diagnosed in a genotype-matched cluster (29). That study found fewer secondary cases for female sex and decreasing numbers of cases with increasing source age. In another study conducted among smear-positive TB patients in Barcelona, Spain, researchers used contact investigations to identify secondary cases and found that more secondary cases occurred after cases in younger adults, those with cavitary disease, and persons who injected drugs (30). In Peru, sources identified by using wgSNP distance more often were <34 years of age, were male, and had incarceration history or reported alcohol use or smoking (31).
Characteristics of plausible source cases in our study were similar to those of cases attributed to recent transmission. One study found that, among the largest 10% of recent transmission clusters, cases attributed to recent transmission were more likely to be in persons who were US-born, American Indian/Alaska Native non-Hispanic, Native Hawaiian/Pacific Islander non-Hispanic, Black non-Hispanic, and Asian non-Hispanic and who reported homelessness (17).
In our study, positive sputum smear and cavitary disease, markers of infectiousness and advanced disease (32–34), were consistently associated with source case status in both GLMM and MLM, including sensitivity analyses in which those factors were not used to select between potential source cases. Those associations support the current practice of prioritizing contact investigations for smear-positive and cavitary TB cases (13) and suggest that diagnostic delays contribute to transmission. Delays might reflect barriers to care, including homelessness (4,5,10,35), or missed diagnoses after seeking healthcare services (36). Because TB is uncommon in the United States, clinicians might not routinely consider it in symptomatic patients. Of note, whereas most US TB cases occur among persons born outside the United States, plausible source cases in our study had higher odds of being in US-born persons. Although Asian non-Hispanic persons account for most TB cases (21), Asian non-Hispanic race/ethnicity was not substantially associated with plausible source status in this study. Missed diagnoses and longer infectious periods among higher-risk groups could explain those findings. Therefore, robust epidemiology and outbreak detection should be used to customize local TB control and prevention efforts focused on at-risk persons.
We found that county-based social determinants of health related to SES (i.e., burdened by housing cost and not having health insurance) were associated with plausible source case attribution in the GLMM (26). Those associations could reflect barriers to TB care, including lack of insurance and constrained resources in settings with high housing costs (37,38). A study that examined data from TB cases reported to the California TB Registry during 2012–2016 found higher TB rates in the lowest SES areas, with SES defined by low education, crowding, poverty, and the California Healthy Places Index (39). An ecologic analysis in another study reported that census tracts with lower median incomes, more racial/ethnic minority groups, and more migrants had higher pediatric TB rates; however, overcrowding and unemployment were not associated with pediatric TB rates (40). As in those prior studies, our analysis suggests that area-based SES measures could inform TB prevention efforts.
Adaptive boosting ranked male sex and younger age (<65 years) as the main predictors, followed by race/ethnicity, clinical indicators of infectiousness, and birth origin (Figure 3). Ensemble methods outperformed regression-based approaches in this analysis (Appendix Table 5) and have been used to predict TB outcomes, including cluster growth and positive laboratory results (41,42). The GLMM supports inference through interpretable adjusted associations, and the MLM offers a complementary perspective focused on prediction. Despite differences in ranking, many of the same predictors were influential across both approaches, including markers of infectiousness, age, race/ethnicity, origin of birth, and several social factors. That convergence supports the robustness of our findings.
Strengths of our analysis included the use of wgSNP comparisons to refute case pairs that were not likely to be the result of recent transmission, assessment of both patient- and community-level predictors for association with being a plausible source case, and analysis of plausible source cases of recent transmission at a national level. The first limitation of this analysis is that the algorithm could have misclassified some true source cases as nonplausible sources because the source was reported after the secondary case, the source was outside a 10-mile radius, or the secondary case was not genotyped. Although ≈75% of US cases are culture-confirmed (21) and >96% of those are genotyped, TB cases in young children, which would predominantly result from recent transmission, have substantially lower rates of culture confirmation. Furthermore, because we applied the algorithm only to cases reported from 2020 onward, some of the 2018–2019 cases could have been misclassified as nonplausible sources if the resulting secondary case was also reported in 2018–2019. We also might have misclassified cases as nonplausible sources if all contacts with latent TB infection underwent successful treatment and did not develop TB disease. Second, the COVID-19 pandemic occurred during the analysis period. The pandemic was associated with changes in healthcare-seeking behavior that could have affected TB diagnoses; therefore, generalization of our results to other time periods should be done with caution.
In summary, although M. tuberculosis transmission is relatively rare in the United States, targeted control efforts could prevent outbreaks that overwhelm public health programs. We identified both patient-level and, to a lesser extent, community-level characteristics as predictors of being a source of recent transmission. Cases with clinical indicators of increased infectiousness were more likely to be transmission sources, supporting current guidance to prioritize those cases for contact investigation. Demographic characteristics associated with being a source case, such as race and origin of birth, differed from those of overall TB cases, highlighting the need for prompt TB diagnosis in US-born persons with risk factors, particularly homelessness and substance use, for preventing outbreaks. In addition, enhanced efforts to promptly diagnose TB in communities with more uninsured residents and those with high housing costs might also reduce transmission. Findings from this analysis suggest that intensifying public health interventions on TB cases in persons with certain demographic and clinical characteristics could yield a greater than expected reduction in M. tuberculosis transmission in the United States.
Mr. Kammerer is a health statistician at the Centers for Disease Control and Prevention in Atlanta. His primary research interests include tuberculosis molecular epidemiology and outbreak detection methods.
Acknowledgments
We thank the state and local health departments who collect and report the TB data that were used for these analyses. We specifically thank Noah Schwartz for his helpful comments during manuscript preparation.
CDC’s Division of Tuberculosis Elimination, National Center for HIV, Viral Hepatitis, STD, and Tuberculosis Prevention, provided funding support through employee salaries for this publication.
We used the large language model–based tool ChatGPT (OpenAI, https://openai.com) for limited editorial assistance (e.g., wording and formatting suggestions). All analyses, drafts of the manuscript, interpretations, figures, graphs, and final wording decisions were made by the authors.
References
- World Health Organization. Global tuberculosis report no. 978-92-4-010153-1. Geneva: The Organization; 2024.
- Williams PM, Pratt RH, Walker WL, Price SF, Stewart RJ, Feng PI. Tuberculosis—United States, 2023. MMWR Morb Mortal Wkly Rep. 2024;73:265–70. DOIPubMedGoogle Scholar
- LoBue PA, Mermin JH. Latent tuberculosis infection: the final frontier of tuberculosis elimination in the USA. Lancet Infect Dis. 2017;17:e327–33. DOIPubMedGoogle Scholar
- Raz KM, Talarico S, Althomsons SP, Kammerer JS, Cowan LS, Haddad MB, et al. Molecular surveillance for large outbreaks of tuberculosis in the United States, 2014–2018. Tuberculosis (Edinb). 2022;136:
102232 . DOIPubMedGoogle Scholar - Haddad MB, Mitruka K, Oeltmann JE, Johns EB, Navin TR. Characteristics of tuberculosis cases that started outbreaks in the United States, 2002–2011. Emerg Infect Dis. 2015;21:508–10. DOIPubMedGoogle Scholar
- Stewart RJ, Raz KM, Burns SP, Kammerer JS, Haddad MB, Silk BJ, et al. Tuberculosis outbreaks in state prisons, United States, 2011–2019. Am J Public Health. 2022;112:1170–9. DOIPubMedGoogle Scholar
- Stalter RM, Pecha M, Dov L, Miller D, Ghazal Z, Wortham J, et al. Tuberculosis outbreak in a state prison system—Washington, 2021–2022. MMWR Morb Mortal Wkly Rep. 2023;72:309–12. DOIPubMedGoogle Scholar
- Groenweghe E, Swensson L, Winans KD, Griffin P, Haddad MB, Brostrom RJ, et al. Outbreak of multidrug-resistant tuberculosis—Kansas, 2021–2022. MMWR Morb Mortal Wkly Rep. 2023;72:957–60. DOIPubMedGoogle Scholar
- Labuda SM, McDaniel CJ, Talwar A, Braumuller A, Parker S, McGaha S, et al. Tuberculosis outbreak associated with delayed diagnosis and long infectious periods in rural Arkansas, 2010–2018. Public Health Rep. 2022;137:94–101. DOIPubMedGoogle Scholar
- Mindra G, Wortham JM, Haddad MB, Powell KM. Tuberculosis outbreaks in the United States, 2009–2015. Public Health Rep. 2017;132:157–63. DOIPubMedGoogle Scholar
- Churchyard G, Kim P, Shah NS, Rustomjee R, Gandhi N, Mathema B, et al. What we know about tuberculosis transmission: an overview. J Infect Dis. 2017;216:S629–35. DOIPubMedGoogle Scholar
- Mangione CM, Barry MJ, Nicholson WK, Cabana M, Chelmow D, Coker TR, et al.; US Preventive Services Task Force. Screening for latent tuberculosis infection in adults: US Preventive Services Task Force recommendation statement. JAMA. 2023;329:1487–94. DOIPubMedGoogle Scholar
- Cole B, Nilsen DM, Will L, Etkind SC, Burgos M, Chorba T. Essential components of a public health tuberculosis prevention, control, and elimination program: recommendations of the Advisory Council for the Elimination of Tuberculosis and the National Tuberculosis Controllers Association. MMWR Recomm Rep. 2020;69:1–27. DOIPubMedGoogle Scholar
- Smith JP, Cohen T, Dowdy D, Shrestha S, Gandhi NR, Hill AN. Quantifying Mycobacterium tuberculosis transmission dynamics across global settings: a systematic analysis. Am J Epidemiol. 2023;192:133–45. DOIPubMedGoogle Scholar
- Smith JP, Gandhi NR, Silk BJ, Cohen T, Lopman B, Raz K, et al. A cluster-based method to quantify individual heterogeneity in tuberculosis transmission. Epidemiology. 2022;33:217–27. DOIPubMedGoogle Scholar
- Shrestha S, Winglee K, Hill AN, Shaw T, Smith JP, Kammerer JS, et al. Model-based analysis of tuberculosis genotype clusters in the United States reveals high degree of heterogeneity in transmission and state-level differences across California, Florida, New York, and Texas. Clin Infect Dis. 2022;75:1433–41. DOIPubMedGoogle Scholar
- Yuen CM, Kammerer JS, Marks K, Navin TR, France AM. Recent transmission of tuberculosis—United States, 2011–2014. PLoS One. 2016;11:
e0153728 . DOIPubMedGoogle Scholar - France AM, Grant J, Kammerer JS, Navin TR. A field-validated approach using surveillance and genotyping data to estimate tuberculosis attributable to recent transmission in the United States. Am J Epidemiol. 2015;182:799–807. DOIPubMedGoogle Scholar
- Noppert GA, Yang Z, Clarke P, Davidson P, Ye W, Wilson ML. Contextualizing tuberculosis risk in time and space: comparing time-restricted genotypic case clusters and geospatial clusters to evaluate the relative contribution of recent transmission to incidence of TB using nine years of case data from Michigan, USA. Ann Epidemiol. 2019;40:21–27.e3. DOIPubMedGoogle Scholar
- Mamiya H, Schwartzman K, Verma A, Jauvin C, Behr M, Buckeridge D. Towards probabilistic decision support in public health practice: predicting recent transmission of tuberculosis from patient attributes. J Biomed Inform. 2015;53:237–42. DOIPubMedGoogle Scholar
- Centers for Disease Control and Prevention. Reported tuberculosis in the United States, 2021 [cited 2025 Jul 7]. https://www.cdc.gov/tb/statistics/reports/2021/default.htm
- Ypma RJ, Altes HK, van Soolingen D, Wallinga J, van Ballegooijen WM. A sign of superspreading in tuberculosis: highly skewed distribution of genotypic cluster sizes. Epidemiology. 2013;24:395–400.
- Stein RA. Super-spreaders in infectious diseases. Int J Infect Dis. 2011;15:e510–3.
- Rodriguez CA, Li T, Self JL, Jenkins HE, Horsburgh CR, White LF. Genotyping indicates marked heterogeneity of tuberculosis transmission in the United States, 2009–2018. Epidemiol Infect. 2021;149:e215.
- Melsew YA, Gambhir M, Cheng AC, McBryde ES, Denholm JT, Tay EL, et al. The role of super-spreading events in Mycobacterium tuberculosis transmission: evidence from contact tracing. BMC Infect Dis. 2019;19:244.
- Centers for Disease Control and Prevention; Agency for Toxic Substances and Disease Registry. CDC/ATSDR Social Vulnerability Index 2022 database [cited 2025 Jul 10]. https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html
- Walker TM, Ip CL, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13:137–46.
- Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002;16:321–57.
- Borgdorff MW, Nagelkerke NJ, de Haas PE, van Soolingen D. Transmission of Mycobacterium tuberculosis depending on the age and sex of source cases. Am J Epidemiol. 2001;154:934–43.
- Rodrigo T, Caylà JA, García de Olalla P, Galdós-Tangüis H, Jansà JM, Miranda P, et al. Characteristics of tuberculosis patients who generate secondary cases. Int J Tuberc Lung Dis. 1997;1:352–7.
- Trevisi L, Brooks MB, Becerra MC, Calderón RI, Contreras CC, Galea JT, et al. Who transmits tuberculosis to whom: a cross-sectional analysis of a cohort study in Lima, Peru. Am J Respir Crit Care Med. 2024;210:222–33.
- Lau A, Barrie J, Winter C, Elamy AH, Tyrrell G, Long R. Chest radiographic patterns and the transmission of tuberculosis: implications for automated systems. PLoS One. 2016;11:e0154032.
- Asadi L, Croxen M, Heffernan C, Dhillon M, Paulsen C, Egedahl ML, et al. How much do smear-negative patients really contribute to tuberculosis transmissions? Re-examining an old question with new tools. EClinicalMedicine. 2022;43:101250.
- Urbanowski ME, Ordonez AA, Ruiz-Bedoya CA, Jain SK, Bishai WR. Cavitary tuberculosis: the gateway of disease transmission. Lancet Infect Dis. 2020;20:e117–28.
- Shrestha S, Cilloni L, Asay GRB, Kammerer JS, Raz K, Shaw T, et al. Model-based analysis of impact, costs, and cost-effectiveness of tuberculosis outbreak investigations, United States. Emerg Infect Dis. 2025;31:497–506.
- Wallace RM, Kammerer JS, Iademarco MF, Althomsons SP, Winston CA, Navin TR. Increasing proportions of advanced pulmonary tuberculosis reported in the United States: are delays in diagnosis on the rise? Am J Respir Crit Care Med. 2009;180:1016–22.
- Simon AE, Fenelon A, Helms V, Lloyd PC, Rossen LM. HUD housing assistance associated with lower uninsurance rates and unmet medical need. Health Aff (Millwood). 2017;36:1016–23.
- Baker DW, Shapiro MF, Schur CL. Health insurance and access to care for symptomatic conditions. Arch Intern Med. 2000;160:1269–74.
- Bakhsh Y, Readhead A, Flood J, Barry P. Association of area-based socioeconomic measures with tuberculosis incidence in California. J Immigr Minor Health. 2023;25:643–52.
- Myers WP, Westenhouse JL, Flood J, Riley LW. An ecological study of tuberculosis transmission in California. Am J Public Health. 2006;96:685–90.
- Althomsons SP, Winglee K, Heilig CM, Talarico S, Silk B, Wortham J, et al. Using machine learning techniques and national tuberculosis surveillance data to predict excess growth in genotyped tuberculosis clusters. Am J Epidemiol. 2022;191:1936–43.
- Smith JP, Milligan K, McCarthy KD, Mchembere W, Okeyo E, Musau SK, et al. Machine learning to predict bacteriologic confirmation of Mycobacterium tuberculosis in infants and very young children. PLOS Digit Health. 2023;2:e0000249.
Suggested citation for this article: Kammerer S, Flanagan D, Raz K, Shaw T, Wortham J, Talarico S. Characteristics of plausible source cases responsible for recent Mycobacterium tuberculosis transmission, United States, 2018–2022. Emerg Infect Dis. 2026 Jun [date cited]. https://doi.org/10.3201/eid3206.260104
Original Publication Date: May 15, 2026

![Flow diagram showing selection of recent transmission source–secondary case pairs from 50 states and Washington, DC, included in an analysis of plausible source cases responsible for recent Mycobacterium tuberculosis transmission, United States, 2018–2022. Among 3,762 RT case pairs identified, case pairs with >5 SNP differences were excluded (n = 1,840), leaving 1,922 WGS–validated RT case pairs. The analytic dataset included 922 TB cases attributed to RT during 2020–2022 and plausible source cases identified during 2018–2020 (all plausible source cases [n = 893] and most likely plausible source cases [n = 645]). RT, recent transmission; SNP, single-nucleotide polymorphism; TB, tuberculosis; WGS, whole-genome sequencing.](/eid/images/26-0104-F2-tn.jpg)

Address for correspondence: Sarah Talarico, Centers for Disease Control and Prevention, 1600 Clifton Rd NE, Mailstop H24-3, Atlanta, GA 30329-4018, USA