
Volume 14, Number 7—July 2008


Determinants of Cluster Size in Large, Population-Based Molecular Epidemiology Study of Tuberculosis, Northern Malawi

Judith R. Glynn*, Amelia C. Crampin*†, Hamidou Traore*, Steve Chaguluka†, Donex T. Mwafulirwa†, Saad Alghamdi*, Bagrey M.M. Ngwira†, Malcolm D. Yates‡, Francis D. Drobniewski‡, and Paul E.M. Fine*
Author affiliations: *London School of Hygiene and Tropical Medicine, London, UK; †Karonga Prevention Study, Chilumba, Malawi; ‡Health Protection Agency, London



Tuberculosis patients with identical strains of Mycobacterium tuberculosis are described as clustered. Cluster size may depend on patient or strain characteristics. In a 7-year population-based study of tuberculosis in Karonga District, Malawi, clusters were defined by using IS6110 restriction fragment length polymorphism, excluding patterns with <5 bands. Spoligotyping was used to compare strains with an international database. Among 682 clustered patients, cluster size ranged from 2 to 37. Male patients, young adults, and town residents were over-represented in large clusters. Cluster size was not associated with HIV status or death from tuberculosis. Spoligotypes from 9 (90%) of 10 large cluster strains were identical or very similar (1 spacer different) to common spoligotypes found elsewhere, compared with 37 (66%) of 56 of those from nonclustered patients (p = 0.3). Large clusters were associated with factors likely to be related to social mixing, but spoligotypes of common strains in this setting were also common types elsewhere, consistent with strain differences in transmissibility.

Molecular techniques, in particular restriction fragment length polymorphism (RFLP) based on the IS6110 insertion element, are used to define clusters of isolates of Mycobacterium tuberculosis with identical DNA fingerprints. Many studies have investigated risk factors for clustering, but relatively little is known about the determinants of cluster size (1,2). The size of clusters could depend on factors favoring transmission or on differences in the strains themselves. M. tuberculosis strains found in persons with smear-positive disease, many contacts, or delays in diagnosis and effective treatment are particularly likely to be transmitted. Some strains may be inherently more transmissible than others, perhaps because they are particularly likely to give rise to sputum smear–positive disease, they are associated with a more insidious onset of clinical symptoms (so patients are infectious for longer), or they are more virulent and are therefore more likely to give rise to secondary cases within the period studied (3). Large clusters may also be observed if the strain has a particularly stable RFLP pattern; this may be more likely for strains with few bands.

Epidemiologic differences can be explored by examining risk factors for cluster size. Giordano et al. (1) hypothesized that cluster size would be related to duration of symptoms. Those researchers found no evidence of this but did find inverse associations with age and HIV status in a population-based study in Texas in the United States. Strain-related differences are likely if the same strains give rise to large clusters in unrelated populations. The ubiquity of the Beijing family of strains has led to speculation that they may be particularly virulent or transmissible (4).

In a population-based study of the molecular epidemiology of tuberculosis in northern Malawi, we found that clustering was associated with young age, female sex, area of residence, and, in older adults, HIV positivity (5). We explored the determinants of cluster size and the characteristics of the larger clusters.


As part of the Karonga Prevention Study, northern Malawi, all persons with suspected tuberculosis at peripheral clinics and the district hospital are seen by project staff. Sputum is collected for smear and culture; lymph node and pleural and peritoneal aspirates are also cultured, when available. Cultures are set up in the project laboratory in Malawi, and those macroscopically consistent with M. tuberculosis are sent to the Health Protection Agency Mycobacterium Reference Unit, London, United Kingdom, for species identification and drug resistance testing. HIV testing is conducted after counseling, if consent is given. Patients are treated for tuberculosis according to Malawi government guidelines (6).

DNA fingerprinting using IS6110 RFLP has been conducted on isolates from patients who have been diagnosed since late 1995, following standard procedures (7). Patients whose disease was diagnosed up to early 2003 were included in this analysis. RFLP patterns were compared by using computer-assisted (Gelcompar 4.1; Applied Maths, Kortrijk, Belgium) visual comparison. Laboratory error was thought likely if isolates with identical RFLP patterns were isolated on the same day from patients with no known epidemiologic relationship and if, in addition, there was no other laboratory evidence of tuberculosis, or they were the only 2 examples of this RFLP pattern, or the patients had other isolates with different patterns (8). After likely laboratory errors were excluded, RFLP patterns shared by >1 patient were classified as clustered. Some patients had >1 isolate. To define whether a strain was clustered and to determine the size of the cluster, patients were included more than once if they had >1 RFLP pattern. Thereafter, patients were only included once, for their first episode of tuberculosis for which an RFLP result was available.
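The cluster definition above can be illustrated with a minimal Python sketch. It assumes each record is a (patient, RFLP pattern) pair after likely laboratory errors have been removed; the function names and the example records are hypothetical and are not taken from the study data.

```python
from collections import defaultdict


def cluster_sizes(records):
    """Count distinct patients per RFLP pattern.

    A patient with >1 pattern contributes to each of those patterns,
    but repeat isolates of the same pattern are counted only once.
    """
    patients_by_pattern = defaultdict(set)
    for patient_id, pattern in records:
        patients_by_pattern[pattern].add(patient_id)
    return {pattern: len(ids) for pattern, ids in patients_by_pattern.items()}


def clustered_patterns(sizes):
    """Keep only patterns shared by >1 patient (the clustered strains)."""
    return {pattern: n for pattern, n in sizes.items() if n > 1}


# Hypothetical records: (patient id, pattern label), for illustration only.
records = [
    ("p1", "A"), ("p2", "A"), ("p2", "B"),
    ("p3", "A"), ("p3", "A"),  # repeat isolate, same pattern
    ("p4", "C"), ("p5", "B"),
]
sizes = cluster_sizes(records)
clusters = clustered_patterns(sizes)
```

In this toy input, pattern C is carried by a single patient and is therefore treated as unique rather than clustered.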

Spoligotyping (9) was performed on at least 2 isolates of clusters containing at least 15 patients, to enable comparison of strains with international databases (10,11). Changes over time in the proportion of tuberculosis cases caused by each of these large cluster strains were examined by using the Fisher exact test to compare proportions and the χ2 test for linear trend. Spoligotyping was also performed on unique (not clustered) strains from patients with smear-positive tuberculosis in 1998 or 1999, as examples of strains that had apparently not spread in the population; and from all positive cultures from 2002. Previously identified spoligotypes were defined as widespread if the international database described them as both “ubiquitous” and “recurrent,” “common,” or “epidemic.”
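As an illustration of the χ2 test for linear trend mentioned above, the following Python sketch uses the Cochran-Armitage form of the statistic with 1 degree of freedom. This is an assumption about the variant used; the text does not specify it. The function name, the equally spaced period scores, and the counts are hypothetical and do not come from Table 3.

```python
import math


def chi2_linear_trend(cases, totals, scores=None):
    """Chi-square test for linear trend in proportions (1 df).

    cases[i]  = patients with the strain in period i
    totals[i] = all patients in period i
    scores[i] = score for period i (defaults to 0, 1, 2, ...)
    """
    if scores is None:
        scores = list(range(len(cases)))
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # overall proportion with the strain
    # Score-weighted sum of observed minus expected cases.
    t = sum(x * (r - n * p_bar) for x, r, n in zip(scores, cases, totals))
    sum_nx = sum(n * x for x, n in zip(scores, totals))
    sum_nx2 = sum(n * x * x for x, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_nx2 - sum_nx ** 2 / n_total)
    if variance <= 0:
        raise ValueError("trend test undefined for these counts")
    chi2 = t * t / variance
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts over three periods, for illustration only.
chi2, p_value = chi2_linear_trend(cases=[10, 20, 30], totals=[100, 100, 100])
```

With these made-up counts the proportion rises steadily across periods, so the statistic is large and the p-value small; flat proportions would give a statistic of 0.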

Analysis of cluster size excluded unique strains and strains with <5 bands on the RFLP (because patterns with few bands are insufficiently discriminatory). Cluster size was divided into 4 groups (Table 1), and associations with cluster size were determined by using maximum-likelihood ordered logistic regression with the ologit command in STATA (12). With this method, the odds ratios calculated represent the summary relative odds of larger clusters compared to smaller clusters across the 4 groups. This method was used in preference to linear regression because cluster size is not normally distributed, and in preference to logistic regression because it avoids arbitrary dichotomization of cluster size. All available risk factors for cluster size were assessed individually (Table 1), and factors that were significant at the 5% level, after adjusting for other factors, or that confounded other variables were retained in the final model. The molecular epidemiologic work of the Karonga Prevention Study was approved by the Malawi National Health Sciences Research Committee and the ethics committee of the London School of Hygiene and Tropical Medicine.
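The ordered logistic model described above can be written out as follows. This is the standard proportional-odds parameterization, which is assumed here to match the ologit command; the symbols Y (cluster-size group, 1 to 4), x (covariates), κ_j (cut points), and β (coefficients) are notation introduced for this sketch and do not appear in the original text.

```latex
\[
\operatorname{logit} \Pr(Y \le j \mid \mathbf{x})
  = \kappa_j - \mathbf{x}^{\top}\boldsymbol{\beta},
  \qquad j = 1, 2, 3,
\]
\[
\mathrm{OR}_k = e^{\beta_k}
  = \frac{\Pr(Y > j \mid x_k + 1)\,/\,\Pr(Y \le j \mid x_k + 1)}
         {\Pr(Y > j \mid x_k)\,/\,\Pr(Y \le j \mid x_k)}
  \quad \text{for every } j .
\]
```

Because the same β applies at each of the 3 cut points, a single odds ratio per covariate summarizes the relative odds of being in a larger rather than a smaller cluster-size group, which is why no dichotomization of cluster size is needed.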


Over the study period, 1,248 cases of culture-positive tuberculosis were diagnosed in patients in Karonga District. RFLP results were available on 1,194 isolates from 1,044 patients. After we excluded 25 isolates because laboratory error was suspected (8), there were results for 1,029 patients. Eighty-one had <5 bands so they were excluded. Of the remaining 948 patients, 682 (72%) were clustered and form the basis of this analysis.

Cluster size varied from 2 to 37. The determinants of cluster size are shown in Table 1. Older patients were less likely than younger patients to be in large clusters. Male patients were more likely than female patients to be in large clusters, and there was variation by geographic area. Cluster size was not statistically associated with HIV status, type of tuberculosis, previous tuberculosis, or drug resistance. Patients in small clusters were as likely to die during treatment as those in large clusters. In the multivariate analysis, the results were similar (Table 2), with significant associations with age, sex, and area of residence. The results were unchanged by adjusting for year or for RFLP band number. None of the other factors shown in Table 1 was associated with clustering after we adjusted for possible confounders. Repeating the analysis with different categorizations of cluster size gave similar results (not shown).


Figure. Geographic distribution of the 4 most common strains defined by restriction fragment length polymorphism: A) strain kps12, B) strain kps121, C) strain kps41, and D) strain kps44. Each o represents a patient. Each square is 10 km × 10 km. The background shading represents the total number of tuberculosis (TB) cases in each area during the study period, which largely reflects the population density.

All of the large cluster strains (≥15 people) were found in at least 4 of the 6 geographic areas of the district, and most were found throughout the district. The distributions of the 4 largest clusters are shown in the Figure. Patients with strains from most of the large clusters were present in the district throughout the study period. Trends over time for strains involving at least 15 people are shown in Table 3. Only 1 strain, kps121, showed statistically significant changes over time; it appeared to be decreasing.

Spoligotypes from large clusters (≥15 people) were compared with the international database (10,11). The results, displayed according to the octal code, are shown in Table 4 (13). Six of the large cluster strains had patterns identical or very similar to spoligotype 59, which is classified as ubiquitous and recurrent (10,11). These 6 RFLP-defined strains (kps10, 12, 20, 21, 41, and 64) had similar RFLP patterns, with a similarity coefficient of 79% (with 1% position tolerance).

The spoligotypes for RFLP-defined strains kps104, kps44, and kps97 were also identical or similar to previously described widespread spoligotypes, types 21, 53, and 1 (Beijing), respectively. The spoligotype for strain kps121, spoligotype 129, was not similar to any widespread types.

The spoligotypes from the RFLP-defined large cluster strains were compared with spoligotypes from patients with positive cultures in 2002, and from patients with smear-positive tuberculosis and unique RFLP patterns in 1998 through 1999. Overall, 9 (90%) of 10 of the large cluster strains had spoligotypes that were identical to, or only 1 spacer different from, previously described widespread spoligotypes. For the patients from 2002, this proportion was 90 (71%) of 126 (p = 0.3 when compared to the large cluster strains), and for the smear-positive unique strains, it was 37 (66%) of 56 (p = 0.3 compared to the large cluster strains).
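The two comparisons above can be checked with a short Python sketch. It assumes a two-sided Fisher exact test was used; the text names that test only for comparisons of proportions over time, so this is an assumption, and the function name is hypothetical. The counts are those reported in the paragraph above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table. Integer
    weights are compared exactly, so no floating-point tolerance is needed.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d

    def weight(k):
        # Unnormalized probability of k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k)

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed
    )
    return extreme / comb(n, row1)


# Rows: large cluster strains vs comparison group.
# Columns: widespread spoligotype (identical or 1 spacer different) vs not.
p_vs_unique = fisher_exact_two_sided(9, 1, 37, 19)  # 9/10 vs 37/56
p_vs_2002 = fisher_exact_two_sided(9, 1, 90, 36)    # 9/10 vs 90/126
```

Under this assumption both p-values come out near 0.3, in line with the values reported in the text.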

All the spoligotypes that were found in the RFLP-defined large cluster strains were also found among (RFLP-defined) unique strains. Seventeen of the unique strains had spoligotype 59, and 2 others had closely related patterns (i.e., 1 spacer different); 1 had spoligotype 21, and 1 had a closely related pattern; 4 had spoligotype 53, and 2 had closely related patterns; and 6 had spoligotype 129. Of the 56 patients from 1998 to 1999, none had Beijing spoligotypes, but we have previously described strains with Beijing spoligotypes and unique RFLP patterns in this population (14).

The spoligotypes found in the large cluster strains were also common among the unselected patients from 2002. Thirty-six (29%) had spoligotype 59, and 10 more had closely related patterns; 11 (9%) had spoligotype 21; 8 (6%) had spoligotype 53, and 2 had closely related patterns; 7 (6%) had the Beijing spoligotype; and 8 (6%) had spoligotype 129. The 36 isolates with spoligotype 59 had 23 different RFLP patterns with a similarity coefficient of 63%.


This study suggests that both epidemiologic and strain-related factors may contribute to large cluster size. In large clusters, young adults, male patients, and those living in the town were over-represented; all of these factors are likely to be associated with increased social mixing. Similar associations with age and sex have been found previously, in the United States and Denmark. In Denmark the largest cluster was particularly predominant in the capital city (1,2).

There was no significant association between tuberculosis type (smear positive, smear-negative pulmonary, or extrapulmonary) and cluster size, but most patients had sputum smear–positive disease. There was also no statistically significant association with degree of smear positivity (not shown). An overall association with infectiousness would not necessarily be expected: the infectiousness of the first cases of a cluster may be important in determining size, but the first cases for the large clusters, which were found throughout the period of study, are not identifiable. There was no significant association with isoniazid resistance, but only 39 (6%) patients had resistant strains. Isoniazid resistance has been associated with reduced clustering and reduced generation of secondary cases (15,16) so it might have been expected to be less common in the larger clusters. Only 3 clustered patients had rifampin resistance in our study (2 with 1 strain and 1 with another), so the effect of this factor on cluster size could not be investigated.

The factors associated with cluster size were not identical to those associated with clustering overall (5). Younger adults were more likely both to have clustered strains and to be in large clusters. Female patients were more likely to have clustered strains, but among clustered case-patients, male patients were more likely to be in large clusters. Known contact with a previous tuberculosis patient is an important risk factor for tuberculosis, especially for women in this population (17). It may be that women are particularly likely to become infected at home (and therefore be in small clusters) and that men are more likely to become infected outside the home, sometimes from outside the area (seen as unique strains) and sometimes as part of large clusters.

We found no evidence of an association of cluster size with HIV status, although we had previously found HIV to be associated with clustering among older patients (5). The effect of HIV infection on clustering is complex since it depends both on the biologic effects of HIV (increasing the risks for active disease—perhaps to different extents for primary and postprimary disease—and decreasing infectiousness) and on any tendency for HIV and tuberculosis to affect the same subpopulations with shared risk factors.

Strain virulence was assessed by examining the proportion of patients who died: there was no association with cluster size either overall, or separately, in HIV-positive or -negative patients (data not shown). Virulent strains could lead to large clusters if virulence were associated with increased transmission rates or increased rates of disease after infection (3). However, virulent strains could have less opportunity to transmit if the severity of symptoms leads to early treatment or death, thus reducing the duration of the infectious period.

Evidence that strain characteristics may have contributed to cluster size comes from the finding that the spoligotypes of most of the common RFLP-defined strains in this study were identical to, or only 1 spacer different from, widespread spoligotypes already described. Unique RFLP-defined strains from smear-positive patients in the early part of the study were used as a comparison group. Smear-positive case-patients were chosen to maximize the likelihood of transmission occurring; early cases were used to allow time for secondary cases to have been identified if they had occurred. These unique strains were less likely than the large cluster strains to have spoligotypes that were closely related to widespread types, but this difference was not statistically significant, and the spoligotypes that were found in the large cluster strains were also found among the unique strains. Interestingly, strain kps121, which was the only large cluster strain with a spoligotype not closely related to a widespread previously described type, was also the 1 large cluster strain that was clearly decreasing in the Karonga population.

The finding of large cluster strains with previously described widespread spoligotypes may suggest that these strains are particularly transmissible or particularly likely to cause disease. Other possibilities are that they are older in evolutionary terms, and thus have had more time to become widespread, or that we are seeing a founder effect in some populations with subsequent spread following human migration patterns. Spoligotype 59 was common in the Malawi population in all groups of patients, clustered and unique, and was associated with a wide diversity of RFLP patterns, which suggests that it may be a longstanding strain in this area. It was also the most common spoligotype found in studies in Zimbabwe and Zambia (18,19). However, spoligotype 59 was particularly common among the isolates from large clusters, with more closely related RFLP patterns, consistent with some variants having high transmissibility. Spoligotype 59 has been classified as belonging to the Latin-American-Mediterranean lineage (18), and as part of the strain family Southern Africa Family 1 (19). The large cluster strain kps97 had a Beijing spoligotype; in total, we have previously identified 44 patients with Beijing strains in this dataset, with 12 different RFLP patterns (14). Beijing strains have been associated with increased virulence and growth rates in vitro (20–22). That there are true differences in strain characteristics between other clustered and nonclustered strains is beginning to be established in in vitro studies from other populations (23).

Dr Glynn is professor of infectious disease epidemiology at the London School of Hygiene and Tropical Medicine, London, UK. Her research interests include tuberculosis, HIV, and molecular epidemiology.


We thank the Government of the Republic of Malawi for their interest in this project and the National Health Sciences Research Committee of Malawi for permission to publish the article. We also thank Pam Sonnenberg for helpful comments on an earlier draft, Keith Branson for the maps, and Sian Floyd for statistical advice.

Until 1996 the Karonga Prevention Study was funded primarily by the British Leprosy Relief Association and the International Federation of Anti-Leprosy Organizations, with contributions from the World Health Organization/United Nations Development Program/World Bank Special Programme for Research and Training in Tropical Diseases. Since 1996, the Wellcome Trust has been the principal funder. J.R.G. was supported in part by the UK Department for International Development and the UK Department of Health (Public Health Career Scientist award).


  1. Giordano TP, Soini H, Teeter LD, Adams GJ, Musser JM, Graviss EA. Relating the size of molecularly defined clusters of tuberculosis to the duration of symptoms. Clin Infect Dis. 2004;38:10–6.
  2. Lillebaek T, Dirksen A, Kok-Jensen A, Andersen AB. A dominant Mycobacterium tuberculosis strain emerging in Denmark. Int J Tuberc Lung Dis. 2004;8:1001–6.
  3. Valway SE, Sanchez MP, Shinnick TF, Orme I, Agerton T, Hoy D, et al. An outbreak involving extensive transmission of a virulent strain of Mycobacterium tuberculosis. N Engl J Med. 1998;338:633–9.
  4. Bifani PJ, Mathema B, Kurepina NE, Kreiswirth BN. Global dissemination of the Mycobacterium tuberculosis W-Beijing family strains. Trends Microbiol. 2002;10:45–52.
  5. Glynn JR, Crampin AC, Yates MD, Traore H, Mwaungulu FD, Ngwira BM, et al. The importance of recent infection with M. tuberculosis in an area with high HIV prevalence: a long-term molecular epidemiological study in northern Malawi. J Infect Dis. 2005;192:480–7.
  6. Glynn JR, Crampin AC, Ngwira BM, Mwaungulu FD, Mwafulirwa DT, Floyd S, et al. Trends in tuberculosis and the influence of HIV infection in northern Malawi, 1988–2001. AIDS. 2004;18:1459–63.
  7. van Embden JD, Cave MD, Crawford JT, Dale JW, Eisenach KD, Gicquel B, et al. Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. J Clin Microbiol. 1993;31:406–9.
  8. Glynn JR, Yates MD, Crampin AC, Ngwira BM, Mwaungulu FD, Black GF, et al. DNA fingerprint changes in tuberculosis: re-infection, evolution, or laboratory error? J Infect Dis. 2004;190:1158–66.
  9. Kamerbeek J, Schouls L, Kolk A, van Agterveld M, van Soolingen D, Kuijper S, et al. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol. 1997;35:907–14.
  10. Filliol I, Driscoll JR, Van Soolingen D, Kreiswirth BN, Kremer K, Valetudie G, et al. Global distribution of Mycobacterium tuberculosis spoligotypes. Emerg Infect Dis. 2002;8:1347–9.
  11. Filliol I, Driscoll JR, van Soolingen D, Kreiswirth BN, Kremer K, Valetudie G, et al. Snapshot of moving and expanding clones of Mycobacterium tuberculosis and their global distribution assessed by spoligotyping in an international study. J Clin Microbiol. 2003;41:1963–70.
  12. StataCorp. STATA base reference manual. College Station (TX): STATA Press; 2003.
  13. Dale JW, Brittain D, Cataldi AA, Cousins D, Crawford JT, Driscoll J, et al. Spacer oligonucleotide typing of bacteria of the Mycobacterium tuberculosis complex: recommendations for standardised nomenclature. Int J Tuberc Lung Dis. 2001;5:216–9.
  14. Glynn JR, Crampin AC, Traore H, Yates MD, Mwaungulu F, Ngwira B, et al. Mycobacterium tuberculosis Beijing genotype, northern Malawi. Emerg Infect Dis. 2005;11:150–3.
  15. van Soolingen D, Borgdorff MW, de Haas PE, Sebek MM, Veen J, Dessens M, et al. Molecular epidemiology of tuberculosis in the Netherlands: a nationwide study from 1993 through 1997. J Infect Dis. 1999;180:726–36.
  16. Burgos M, DeRiemer K, Small PM, Hopewell PC, Daley CL. Effect of drug resistance on the generation of secondary cases of tuberculosis. J Infect Dis. 2003;188:1878–84.
  17. Crampin AC, Glynn JR, Floyd S, Malema SS, Mwinuka VM, Ngwira B, et al. Tuberculosis and gender: exploring the patterns in a case control study in Malawi. Int J Tuberc Lung Dis. 2004;8:194–203.
  18. Easterbrook PJ, Gibson A, Murad S, Lamprecht D, Ives N, Ferguson A, et al. High rates of clustering of tuberculosis strains in Harare, Zimbabwe: a molecular epidemiological study. J Clin Microbiol. 2004;42:4536–44.
  19. Chihota V, Apers L, Mungofa S, Kasongo W, Nyoni IM, Tembwe R, et al. Predominance of a single genotype of Mycobacterium tuberculosis in regions of Southern Africa. Int J Tuberc Lung Dis. 2007;11:311–8.
  20. Lopez B, Aguilar D, Orozco H, Burger M, Espitia C, Ritacco V, et al. A marked difference in pathogenesis and immune response induced by different Mycobacterium tuberculosis genotypes. Clin Exp Immunol. 2003;133:30–7.
  21. Dormans J, Burger M, Aguilar D, Hernandez-Pando R, Kremer K, Roholl P, et al. Correlation of virulence, lung pathology, bacterial load and delayed type hypersensitivity responses after infection with different Mycobacterium tuberculosis genotypes in a BALB/c mouse model. Clin Exp Immunol. 2004;137:460–8.
  22. Zhang M, Gong J, Yang Z, Samten B, Cave MD, Barnes PF. Enhanced capacity of a widespread strain of Mycobacterium tuberculosis to grow in human macrophages. J Infect Dis. 1999;179:1213–7.
  23. Theus SA, Cave MD, Eisenach KD. Intracellular macrophage growth rates and cytokine profiles of Mycobacterium tuberculosis strains with different transmission dynamics. J Infect Dis. 2005;191:453–60.



Suggested citation for this article: Glynn JR, Crampin AC, Traore H, Chaguluka S, Mwafulirwa DT, Alghamdi S, et al. Determinants of cluster size in large, population-based molecular epidemiology study of tuberculosis, northern Malawi. Emerg Infect Dis [serial on the Internet]. 2008 Jul [date cited]. Available from

DOI: 10.3201/eid1407.060468


Address for correspondence: Judith R. Glynn, Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK
