Improving Estimates of Social Contact Patterns for Airborne Transmission of Respiratory Pathogens

Data on social contact patterns are widely used to parameterize age-mixing matrices in mathematical models of infectious diseases. Most studies focus on close contacts only (i.e., persons spoken with face-to-face). This focus may be appropriate for studies of droplet and short-range aerosol transmission but neglects casual or shared air contacts, who may be at risk from airborne transmission. Using data from 2 provinces in South Africa, we estimated age-mixing patterns relevant for droplet transmission, nonsaturating airborne transmission, and Mycobacterium tuberculosis transmission, an airborne infection for which saturation of household contacts occurs. Estimated contact patterns by age did not vary greatly between the infection types, indicating that widespread use of close contact data may not be resulting in major inaccuracies. However, contact in persons ≥50 years of age was lower when we considered casual contacts, and therefore the contribution of older age groups to airborne transmission may be overestimated when close contact data are used.

Simulated mixing patterns can have a considerable effect on model dynamics (1), underscoring the importance of simulating realistic mixing patterns. Mixing patterns are frequently shaped by social contact data (i.e., empirical data collected from respondents about the persons with whom they had contact during a set period) (2).
Most social contact data collection has focused on close contacts, using a definition of contacts that required a 2-way face-to-face conversation of ≥3 words, close proximity (e.g., within 2 meters), physical contact, or some combination of those criteria (2). Those types of contact may approximate reasonably well the types of contact that are relevant for infections transmitted primarily through direct contact, short-range aerosols, droplets, or some combination of these modes. For obligate, preferential, or opportunistic airborne infections such as measles, Mycobacterium tuberculosis, and SARS-CoV-2, however, this definition probably excludes many potentially effective contacts, because transmission of airborne infections can occur between anybody sharing air in inadequately ventilated indoor spaces, regardless of whether conversation occurs, and over distances >2 meters (3). For airborne infections, estimates of casual contact time may therefore be more appropriate, calculated as the time spent in indoor locations multiplied by the number of other persons present.
Tuberculosis also differs from most respiratory infections in the long periods during which persons are potentially infectious; an estimated 9-36 months elapses between disease development and diagnosis (or notification) in 11 countries with high tuberculosis incidence (4). Therefore, transmission to repeated contacts can partially saturate (even allowing for reinfection), making the relationship between contact time and infection risk nonlinear (5). This effect is most pronounced for contact between household members (5). Household membership and repeated contacts are rarely explicitly simulated in mathematical models, and therefore the effects of contact saturation need to be incorporated into the mixing matrices used to parameterize the models.
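To make the nonlinearity concrete, consider an illustrative assumption that is not part of this study's analysis: each unit of contact time with an infectious person carries a constant hazard β of infection. The probability that a contact has been infected after cumulative contact time t with the same index case is then

```latex
P(t) = 1 - e^{-\beta t}, \qquad P(t) \approx \beta t \quad \text{when } \beta t \ll 1 .
```

Under this assumption, risk grows almost linearly with contact time for brief contact but levels off for the long cumulative exposures typical of household members: βt = 2 gives P ≈ 0.86 and βt = 4 gives P ≈ 0.98, so doubling contact time adds little further risk.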
In this article, we describe methods for estimating age-mixing patterns relevant for nonsaturating airborne transmission and M. tuberculosis transmission, using a novel weighted approach to incorporate the effects of household contact saturation into our estimates for M. tuberculosis. We generate estimates of age mixing using data on close and casual contacts from 2 communities in South Africa and compare the estimated mixing patterns with those typically used in mathematical modeling studies (i.e., patterns generated from close contact numbers, which are more suitable for droplet or short-range aerosol transmission).

Methods
We collected social contact data in 2 study communities in South Africa: 1 in KwaZulu-Natal Province and 1 in Western Cape Province. Both communities have high rates of unemployment, high prevalence of HIV, and high incidence of tuberculosis compared with the other provinces as a whole. The study community in KwaZulu-Natal consisted of a population of ≈46,000, living in the predominantly rural and peri-urban areas in the catchment areas of 2 primary care clinics and within a demographic surveillance area (DSA). The study community in Western Cape was a peri-urban community of ≈27,000 and was an established research site with biennial censuses.

Data Collection
We collected the KwaZulu-Natal data during March-December 2019. We sampled 3,093 adults (≥18 years of age) at random from an estimated population of 33,288, stratified by residential area (small-scale divisions with ≈350 households per area) and with probability proportional to the number of eligible persons in each area, based on the most recent DSA census conducted before area entry. We made up to 3 attempts to contact sampled persons.
We collected the Western Cape data during May-October 2019. In total, we selected 1,530 adults (≥15 years of age) from an estimated population of 20,633 by using age- and sex-stratified random sampling, based on a census conducted in the study population in February and March 2019. We made up to 5 attempts to contact selected persons on different days of the week (including weekends).
For both surveys, we conducted interviews face-to-face at respondents' homes by using interviewer-administered questionnaires on tablet computers. We conducted interviews in isiZulu in KwaZulu-Natal and in English or isiXhosa in Western Cape. We asked respondents about their movements on a randomly assigned day during the preceding week in KwaZulu-Natal and on the day before the interview in Western Cape. To allow casual contact time (defined as time spent "sharing air" indoors or on transport) to be estimated, we asked respondents to list the places they had visited (including their own home) and the transport they had used. For each location, questions included:
• What type of location was it? (Appendix Figure 5, https://wwwnc.cdc.gov/EID/article/28/10/21-2567-App1.pdf)
• How long did you spend there? (recorded in hours and minutes)
• How many persons were there halfway through the time you were there?
We did not ask respondents for the ages of persons present because we considered it unlikely that they would be able to accurately remember and estimate the ages of all persons present in every indoor location visited and transport used. We also asked respondents about their close contacts, defined as persons with whom the respondent had a face-to-face conversation. We first asked respondents to make a numbered list of all their contacts, with help from the interviewer. We then asked respondents questions about 10 contacts (selected at random from the numbered list by the tablet computer) or about all of their contacts if they reported <10. Questions included:
• Is this contact a member of your household?
• How old do you think they are?
• How much time did you spend with them in total?
We also collected respondents' basic demographic information. For the KwaZulu-Natal community, we obtained data on household size and residence type (i.e., urban, peri-urban, or rural) from the most recent DSA census. We collected all other data directly from the respondents.

Data Analysis
We estimated close contact numbers and times by using data on persons with whom the respondents reported having a face-to-face conversation. We generated 95% plausible intervals for the age-mixing matrices by using bootstrapping.
We estimated casual contact time in a location as the duration of time the respondent reported spending there multiplied by the reported number of persons present. We generated central estimates for casual contact time age-mixing matrices by using the method outlined in McCreesh et al. (6). In brief, because data were collected only on the total numbers of persons and of children present in indoor locations, and not on the ages of adults, we needed to estimate the age distribution of adult casual contacts. We therefore assumed that the age distribution of adult contacts in each location type matched the weighted age distribution of respondents who reported visiting locations of that type. Again, we generated 95% plausible ranges by using bootstrapping.
We adjusted the age-mixing matrices to be symmetric by using the study community age structures. We used data on adult contact numbers and time with children to estimate child contact numbers and time with adults, assuming reciprocity: the overall contact numbers and time that children have with adults in each age group are equal to the overall contact numbers and time that adults in that age group have with children. To enable comparison between the 2 study communities, the lowest respondent age group was set at 15-19 years for both surveys. Because persons 15-17 years of age were not interviewed in KwaZulu-Natal, we assumed that contact patterns in persons 18-19 years of age were representative of contact patterns in all persons 15-19 years of age (Appendix).
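A minimal pure-Python sketch of the two adjustments described above. It assumes that `c[i][j]` is the mean number (or time) of contacts a person in age group `i` has with age group `j` and that `n[i]` is the community population of group `i`. The text does not spell out the exact symmetrization formula, so averaging the two reciprocal whole-population totals is our assumption (it is one common choice); `symmetrize` and `fill_child_rows` are hypothetical names.

```python
def symmetrize(c, n):
    """Make a contact matrix reciprocal: c[i][j] * n[i] == c[j][i] * n[j].

    c[i][j]: mean contacts (or contact time) per person in group i with group j.
    n[i]:    population size of group i.
    """
    k = len(n)
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            # Average the two reported whole-population totals for the pair (i, j)...
            total = (c[i][j] * n[i] + c[j][i] * n[j]) / 2
            # ...and convert back to a per-person rate for group i.
            out[i][j] = total / n[i]
    return out


def fill_child_rows(c, n, child_groups, adult_groups):
    """Estimate children's contact with adults from adults' contact with children.

    Assumes total child-to-adult contact equals total adult-to-child contact,
    i.e. c[child][adult] * n[child] == c[adult][child] * n[adult].
    Child-to-child cells are left unchanged (not observable from adult respondents).
    """
    out = [row[:] for row in c]
    for i in child_groups:
        for j in adult_groups:
            out[i][j] = c[j][i] * n[j] / n[i]
    return out


# Example: group 0 = children, group 1 = adults (illustrative numbers only).
n = [40, 80]
adult_reports = [[0.0, 0.0],   # child row unknown (children were not interviewed)
                 [2.0, 5.0]]   # adults: 2 contacts/day with children, 5 with adults
filled = fill_child_rows(adult_reports, n, child_groups=[0], adult_groups=[1])
print(filled[0][1])  # each child: 2 * 80 / 40 = 4.0 contacts/day with adults
```

Working with whole-population totals makes the reciprocity constraint explicit: every contact counted from one side must also be counted from the other.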

Generating Age-Mixing Matrices for Droplet and Nonsaturating Airborne Transmission and Mycobacterium tuberculosis
We set age-mixing matrices relevant for droplet transmission to be equal to age-mixing matrices calculated using close contact numbers (Figure 1). We set age-mixing matrices relevant for nonsaturating airborne transmission to be equal to the unweighted sum of the household close contact time matrices and the nonhousehold casual contact time matrices. For household estimates, we used close contact time between household members rather than casual contact time occurring in households, because most contact between household members is likely to meet the definition of close contact and because this approach enabled the age structures of households to be more accurately reflected in the age-mixing matrices. We set age-mixing matrices relevant for M. tuberculosis transmission to be equal to the sum of the household close contact number matrices and the nonhousehold casual contact time matrices. We weighted these matrices to reflect empirical estimates of the proportion of tuberculosis that results from household transmission (central estimate 12% [range 8%-16%]) (5).
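One way to realize the household weighting is sketched below; the authors' exact weighting procedure is not reproduced here. We assume that both input matrices hold whole-population contact totals, so their grand sums are comparable, and that the "proportion of tuberculosis from household transmission" is imposed as the household component's share of the combined matrix. `combine_with_household_share` is a hypothetical name.

```python
def combine_with_household_share(household, nonhousehold, p_household):
    """Combine household and nonhousehold matrices so that the household part
    contributes a fraction p_household of the total contact intensity.

    household:    household close contact numbers (whole-population totals)
    nonhousehold: nonhousehold casual contact time (whole-population totals)
    The result sums to 1; its overall scale is set in a later rescaling step.
    """
    h_sum = sum(sum(row) for row in household)
    c_sum = sum(sum(row) for row in nonhousehold)
    k = len(household)
    return [
        [p_household * household[i][j] / h_sum
         + (1 - p_household) * nonhousehold[i][j] / c_sum
         for j in range(k)]
        for i in range(k)
    ]


household = [[1.0, 0.0], [0.0, 1.0]]      # illustrative numbers only
nonhousehold = [[1.0, 1.0], [1.0, 1.0]]
# Central estimate 12%, plus the reported range 8%-16% for sensitivity analysis.
for p in (0.08, 0.12, 0.16):
    m = combine_with_household_share(household, nonhousehold, p)
    print(p, [[round(x, 3) for x in row] for row in m])
```

Normalizing each component by its own total before mixing is what lets a single parameter control the household share, independent of the very different units (contact numbers vs. contact time) of the two inputs.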
To enable direct comparisons to be made between the different age-mixing matrices, we adjusted the matrices for nonsaturating airborne transmission and M. tuberculosis transmission to give the same mean contact intensity between adults as the matrices for droplet transmission. We used bootstrapping to generate plausible ranges (Appendix).
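The rescaling step can be sketched as follows, assuming (our simplification) that "mean contact intensity between adults" is the unweighted mean over the adult-by-adult cells of each matrix; `rescale_to_reference` and `adult_idx` are hypothetical names.

```python
def rescale_to_reference(matrix, reference, adult_idx):
    """Scale `matrix` so its mean adult-adult intensity equals that of `reference`."""
    def adult_mean(m):
        cells = [m[i][j] for i in adult_idx for j in adult_idx]
        return sum(cells) / len(cells)

    factor = adult_mean(reference) / adult_mean(matrix)
    return [[x * factor for x in row] for row in matrix]


droplet = [[3.0, 1.0], [1.0, 6.0]]   # illustrative reference (close contact numbers)
airborne = [[1.0, 1.0], [1.0, 3.0]]  # illustrative matrix on an arbitrary scale
print(rescale_to_reference(airborne, droplet, adult_idx=[1]))
# → [[2.0, 2.0], [2.0, 6.0]]
```

Because only a single multiplicative factor is applied, the relative pattern across age groups, which is what the comparison between matrices is about, is preserved.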

Results
Of the 1,530 persons sampled in Western Cape, 1,214 (79%) were successfully contacted, 117 (8%) had moved or died, 193 (13%) had incorrect information listed in the census, and 6 were uncontactable. Of the 1,214 persons contacted, 77 (6%) refused to be interviewed and 14 were ineligible (because of disability or lack of fluency in English and isiXhosa). Of the 1,123 persons interviewed, data from 8 interviews were lost between collection and transfer to the database because of unexplained technical issues, leaving 1,115 (92%) completed interviews.
For both populations, the recruited sample was a reasonable match to the target population in terms of sex, age, and residence type (urban, peri-urban, or rural) (Table). Respondents in KwaZulu-Natal were also a close match to the target population in terms of employment status (Appendix). No data on employment status for the target population were available for Western Cape.

Contact Numbers and Time
We stratified household and nonhousehold close contact numbers and time, and casual contact time, in KwaZulu-Natal and Western Cape by sex, age, and household size (Figures 2, 3; Appendix Tables 1-6). Overall, close contact numbers and time, as well as casual contact time, were higher for women than for men in both communities; however, the …
…, and casual contact time (C) for study of social contact patterns for airborne transmission of respiratory pathogens, KwaZulu-Natal Province, South Africa, by sex, age group, and household size. Error bars show 95% CIs for total contact numbers or time. For KwaZulu-Natal, household size data were taken from census data and did not always correspond exactly with respondents' views of who they considered to be household members. For this reason, some contact with household members was reported by respondents whom we recorded as having a household size of 1.
… generated 95% plausible ranges for these matrices (Appendix Figures 1, 2). Estimated contact patterns by age did not vary greatly between the infection types in either community. However, age-mixing patterns were less assortative in the nonsaturating airborne and M. tuberculosis matrices than in the droplet matrices in both communities (Appendix). The exception to this pattern was contact between persons 15-19 years of age in KwaZulu-Natal, which was more intense in the nonsaturating airborne and M. tuberculosis matrices than in the droplet matrices. In both communities, relative to other adult age groups, overall contact intensities in persons ≥50 years of age were lower for contact relevant to nonsaturating airborne or M. tuberculosis transmission than for contact relevant to droplet transmission.

Discussion
Using data from 2 provinces in South Africa, we estimated contact and age-mixing patterns relevant for the transmission of droplet infections, nonsaturating airborne infections, and M. tuberculosis. In our communities, contact patterns did not vary greatly between contacts relevant for droplet infections and those relevant for nonsaturating airborne or M. tuberculosis transmission. However, using close contact data in models of the transmission of M. tuberculosis or other airborne infections in our study communities may mean that the importance of adults ≥50 years of age to transmission is overestimated.
Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 28, No. 10, October 2022
… Province, South Africa, by sex, age, and household size, for study of social contact patterns for airborne transmission of respiratory pathogens. Error bars show 95% CIs for total contact numbers or time. In Western Cape, contact with household members was reported by a small proportion of respondents who had reported having no household members, most likely reflecting errors in the data.
Very few data are available on casual contact patterns from any setting. Previous studies in the same community in Western Cape have found greater drops in casual contact time than in close contact numbers in older age groups (6) and decreases in indoor casual contact numbers with age (7). Another study in the same community found high levels of age-assortative mixing with respect to casual contact time in schools and workplaces (8). More data are needed on casual contact patterns, and on age-mixing patterns in particular, to determine whether the findings of this study are generalizable to other settings and to improve the predictions of mathematical models of the transmission of M. tuberculosis and other airborne infections.
… Province, South Africa. Panels A, C, and F show absolute contact intensities between respondents and contacts in each age group; panels B, D, and G show intensities of contact between each member of each age group; panels E and H show intensities for airborne infections and M. tuberculosis compared with intensities for droplet infections, respectively. Numbers shown in panel A are the mean number of contacts respondents in each age group have with contacts in each age group per day. Numbers shown in panel B are the rate of contact between each person in the population per day, expressed as rates × 10⁵. Numbers and rates in panels C, D, F, and G are standardized so that the mean overall contact intensity reported by adult respondents is the same as the mean number of overall close contacts reported by adult respondents (panel A). Contact numbers between child respondents and contacts in each age group were estimated from data on contact between adult respondents and child contacts.
Our approaches to generating the separate droplet and airborne transmission matrices are necessarily simplifications, and many infections will not fit neatly into these 2 categories. In addition, considerable uncertainty exists about the role of different transmission routes in the spread of many infections. Droplets have traditionally been considered the main transmission route for most respiratory viruses; however, there is evidence that airborne transmission can occur for a wide range of pathogens, including influenza, respiratory syncytial virus, Middle East respiratory syndrome coronavirus, and SARS-CoV-2 (9). One model using data on household transmission of influenza A suggested that airborne transmission was responsible for about half of infections (10). For infections where both airborne and droplet or short-range aerosol transmission are thought to play an important role, an intermediate matrix may be preferable.
There are 2 main differences between our droplet and airborne or M. tuberculosis age-mixing matrices. The first is the type of nonhousehold contacts considered: close (face-to-face conversation) or casual (sharing space indoors). The second is that the airborne matrices and the nonhousehold component of the M. tuberculosis matrices are based on contact time rather than unique contact numbers. The primary reason for using contact time for casual contacts is that respondents are unlikely to be able to estimate unique casual contact numbers for many locations they visit, which leaves a choice between using contact time and making assumptions about the rate of turnover of unique persons in a location. For our droplet transmission matrices, we chose to use unique contact numbers in a 24-hour period because that is the most commonly used method (2) and therefore enables comparison with what is typically done. However, both the choice of a 24-hour period and the lack of any weighting or restriction by contact duration or other measures of closeness are relatively arbitrary.
… Province, South Africa. Panels A, C, and F show absolute contact intensities between respondents and contacts in each age group; panels B, D, and G show intensities of contact between each member of each age group; panels E and H show intensities for airborne infections and Mycobacterium tuberculosis compared with intensities for droplet infections, respectively. Numbers shown in panel A are the mean number of contacts respondents in each age group have with contacts in each age group per day. Numbers shown in panel B are the rate of contact between each person in the population per day, expressed as rates × 10⁵. Numbers and rates in panels C, D, F, and G are standardized so that the mean overall contact intensity reported by adult respondents is the same as the mean number of overall close contacts reported by adult respondents (panel A). Contact numbers between child respondents and contacts in each age group were estimated from data on contact between adult respondents and child contacts.
Robust evidence on the types of contact most relevant to transmission is limited for respiratory infections. Several studies have compared how well models fit age-specific seroprevalence data for varicella, parvovirus B19, or influenza A when parameterized with contact patterns derived from close contact data in a range of different ways (11-13). Overall, those studies suggest that analysis methods that give greater weight to more intimate contacts may be preferable in some circumstances, for instance, restricting what counts as a contact to those involving physical touch or a minimum contact duration, or using contact time rather than contact numbers. Approaches based on contact numbers may be more suitable for highly transmissible infections such as measles, for which only a short duration of contact is needed for transmission, whereas approaches based on contact time may be more suitable for less transmissible infections, for which repeated or longer contacts are needed (14).
Fewer studies have considered expanding the pool of contacts beyond close contacts only, to also include casual contacts. However, a study that had paired individual-level contact data and pandemic influenza A serologic data found that models that included a variable for number of locations visited were strongly supported over those that only included variables for age and close contact numbers (15). This finding suggests that airborne transmission may play a role in the spread of influenza A, or that the standard close contact definition misses a substantial proportion of contacts at risk for droplet transmission.
Other factors not accounted for in our analyses may also influence airborne and M. tuberculosis transmission risk. Ventilation rates play a large role in determining airborne infection risk (16), and giving less weight to contact occurring in better-ventilated settings would improve our airborne and M. tuberculosis matrices. Unfortunately, few data on ventilation rates by location type are available, and those that exist show large variation between locations and within the same location on different days (17). Saturation of contacts may occur for infections other than M. tuberculosis, particularly highly transmissible pathogens such as measles virus. An approach based on casual contact numbers may be preferable for these infections but would be highly dependent on assumptions about how unique contact numbers relate to cross-sectional estimates of the number of persons present.
There are several limitations when using casual contact data to estimate mixing patterns. First, estimates of contact time in places where large numbers of persons are present are likely to be less reliable because a person's estimates of the number of persons present are likely to be poor and because the assumption that a risk for transmission exists between all persons present in the space may not be true in larger spaces. Estimates may be poorer when asking about a random day in the past week (as we did in KwaZulu-Natal) than when asking about the day before the interview (as we did in Western Cape). In our main analysis, when estimating contact time, we cap the number of persons at risk for transmission at 100. In our sensitivity analyses, we show that using a cap of 20 persons or not capping the numbers of persons has a moderate effect on casual contact time age-mixing matrices (Appendix). Conducting similar sensitivity analyses may be necessary when using age-mixing matrices calculated using casual contact time in mathematical models.
A second limitation is that our approach to determining the ages of adults present in locations other than respondents' own homes is indirect and relies on the assumption that the age distribution of adults present in a location type reflects the duration of time respondents of different ages reported spending in that location type. This assumption may not hold if different age groups tend to visit different locations of the same type (or visit at different times) or if substantial mixing occurs with persons from outside the study community. These issues are discussed further in McCreesh et al. (6).
An additional limitation, specific to our KwaZulu-Natal estimates, is that we did not recruit persons 15-17 years of age and instead assumed in the analysis that contact by persons 18-19 years of age was representative of contact by all persons 15-19 years of age. This assumption is unlikely to be true, given that contact patterns of persons 15-17 and 18-19 years of age differ greatly in Western Cape (Appendix Figure 9). For this reason, our estimates for persons 15-19 years of age in KwaZulu-Natal should be treated with caution.
To conclude, our estimated age-mixing matrices for droplet transmission, nonsaturating airborne transmission, and M. tuberculosis transmission were not substantially different from each other in either community. This finding provides some reassurance that the widespread use of close contact data to parameterize age-mixing matrices for transmission models of airborne infections may not be resulting in major inaccuracies. Some differences were observed, however, particularly in the oldest age group, and our data were from only 2 communities in South Africa. We recommend that future social contact surveys collect data on casual contacts as well as close contacts to determine whether the similarity between different types of contact pattern holds in other settings. We would also urge mathematical modelers to consider whether unique close contact numbers in a 24-hour period are the most appropriate contacts for the infection and scenario they are simulating and to consider performing sensitivity analyses when uncertainty exists as to the most appropriate contact definition.

Comparison of employment status with target population, KwaZulu-Natal
To investigate whether we may have under-recruited people who were employed in KwaZulu-Natal, we compared employment data from the most recent DSA survey between respondents in the social contact survey and the target population aged 18+ years as a whole. No data on employment status for the target population were available for Western Cape.

Weighting
All analyses of contact numbers and contact time were weighted. For KwaZulu-Natal, they were weighted to the study population composition by age group (18-19, 20-29, 30-39, 40-49, 50+) and sex. For Western Cape, they were weighted to the study population composition by age group (15-17, 18-19, 20-29, 30-39, 40-49, 50+) and sex. As fewer respondents were asked about Fridays and Saturdays, the Western Cape data were also weighted by the day of the week.
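A minimal post-stratification sketch: each respondent's weight is the population share of their stratum divided by the sample share of that stratum. Strata here are hypothetical `(age_group, sex)` tuples and the population counts are illustrative. Handling the Western Cape day-of-week weighting by adding the day to the stratum (or multiplying by a separate day weight) would be our assumption, not a documented detail; `poststratification_weights` is a hypothetical name.

```python
from collections import Counter


def poststratification_weights(respondent_strata, population_counts):
    """Return one weight per respondent so that weighted sample shares match
    population shares for every stratum present in the sample.

    respondent_strata: list of stratum labels, e.g. ("20-29", "F"), one per respondent
    population_counts: dict mapping stratum label -> population size
    """
    n = len(respondent_strata)
    sample_counts = Counter(respondent_strata)
    # Only strata with at least one respondent can be weighted.
    pop_total = sum(population_counts[s] for s in sample_counts)
    weights = []
    for s in respondent_strata:
        # population share / sample share, computed with a single division
        weights.append((population_counts[s] * n) / (pop_total * sample_counts[s]))
    return weights


strata = [("20-29", "F"), ("20-29", "F"), ("50+", "M")]
population = {("20-29", "F"): 500, ("50+", "M"): 500}
print(poststratification_weights(strata, population))  # [0.75, 0.75, 1.5]
```

The weights sum to the sample size, so weighted means remain on the same scale as unweighted ones.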

Estimating close contact numbers
Respondents were first asked to make a numbered list of all their contacts, with help from the interviewer. The total number of contacts was recorded on the tablet computers, along with the number of those contacts who were members of the respondent's household. Close contact numbers by respondent characteristic were estimated using the total number of close contacts that the respondent reported, and the number of those contacts who were household members.
Respondents who reported more household contacts than total contacts were excluded from the analysis.
Close contact age-mixing matrices were generated using data on close contacts for which more detailed information was available (all contacts if respondents reported ≤10, a random 10 if they reported more). When generating the central estimate, contact numbers by age group for each respondent were multiplied by T / Da, where T was the total number of contacts that they reported, and Da was the total number of contacts whose age was provided. Respondents who reported a nonzero number of close contacts but did not give the age of any of their contacts were excluded. 95% plausible intervals were generated using 10,000 bootstrapped samples, re-sampling respondents with replacement within age categories, and re-sampling T contacts with replacement from the set of all contacts of the respondent for which detailed information was collected (1). When estimating household and 'other' age-mixing patterns, only sampled contacts who were, or were not, reported to be members of the respondent's household, respectively, were included.
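The central-estimate scaling and one bootstrap replicate can be sketched as below. This is a simplified illustration under our own assumptions: survey weights and the symmetry adjustment are omitted, respondents excluded under the rule above are assumed to have been removed beforehand, and each respondent is a dict with hypothetical keys `age_group`, `total` (T), and `contact_groups` (age groups of the detailed contacts, `None` where age was not given). All function names are hypothetical.

```python
import random
from collections import defaultdict


def scaled_contacts_by_age(total_reported, contact_groups):
    """Central estimate for one respondent: detailed contacts scaled by T / Da.

    Returns None if the respondent reported contacts but gave no ages (excluded).
    """
    with_age = [g for g in contact_groups if g is not None]
    if not with_age:
        return None if total_reported > 0 else {}
    factor = total_reported / len(with_age)  # T / Da
    counts = defaultdict(float)
    for g in with_age:
        counts[g] += factor
    return dict(counts)


def bootstrap_replicate(respondents, rng):
    """One replicate of mean contacts by (respondent age group, contact age group).

    Respondents are resampled with replacement within their age group; for each
    resampled respondent, T contacts are resampled with replacement from the
    detailed contacts whose age was given.
    """
    by_age = defaultdict(list)
    for r in respondents:
        by_age[r["age_group"]].append(r)
    means = {}
    for age, members in by_age.items():
        totals = defaultdict(float)
        for r in rng.choices(members, k=len(members)):
            pool = [g for g in r["contact_groups"] if g is not None]
            if not pool:
                continue  # respondent with no contacts contributes zero
            for g in rng.choices(pool, k=r["total"]):
                totals[g] += 1
        means[age] = {g: v / len(members) for g, v in totals.items()}
    return means


rng = random.Random(1)
respondents = [
    {"age_group": "20-29", "total": 20, "contact_groups": ["20-29"] * 5 + ["30-39"] * 5},
    {"age_group": "30-39", "total": 4, "contact_groups": ["20-29", None]},
]
print(scaled_contacts_by_age(20, respondents[0]["contact_groups"]))
# → {'20-29': 10.0, '30-39': 10.0}
replicates = [bootstrap_replicate(respondents, rng) for _ in range(200)]  # 10,000 in the study
```

Percentiles of each matrix cell across the replicates would then give the plausible interval for that cell.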
The age mixing matrices were adjusted to be symmetric, using the study community age structures. Data on adult contacts with children were used to estimate child contact with adults.
To allow comparison between the two study communities, and between close and casual age-mixing patterns, the lowest respondent age group was set at 15-19 years for both surveys. As 15-17 year olds were not interviewed in KwaZulu-Natal, we assumed that contact patterns in 18-19 year olds were representative of contact patterns in all 15-19 year olds, and adjusted the weights accordingly.

Estimating close contact time
The approach used for estimating close contact time was the same as that used for estimating close contact numbers, except that contact numbers were multiplied by the amount of time that respondents reported spending with each contact that day.
Contacts with the contact duration missing were excluded when generating bootstrap samples for estimating close contact time age mixing patterns. When generating the central estimate, contact numbers by age group for respondents were multiplied by T / Dad, where T was the total number of contacts that they reported, and Dad was the total number of contacts whose age and duration were both provided.
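For contact time, the same scaling applies but uses only contacts with both age and duration recorded (T / Dad), and each contact contributes its duration. A sketch with hypothetical names, where each detailed contact is an `(age_group, minutes)` pair and either element may be `None`; treating a respondent with no complete contacts as excluded is our assumption.

```python
def scaled_contact_time_by_age(total_reported, detailed_contacts):
    """Central estimate of contact time (minutes) by contact age group.

    detailed_contacts: list of (age_group, minutes) pairs; None marks a missing value.
    Contacts missing either age or duration are dropped; the rest are scaled by T / Dad.
    """
    complete = [(g, m) for g, m in detailed_contacts if g is not None and m is not None]
    if not complete:
        # Assumption: no usable contacts -> excluded (None) unless none were reported.
        return None if total_reported > 0 else {}
    factor = total_reported / len(complete)  # T / Dad
    time_by_age = {}
    for g, minutes in complete:
        time_by_age[g] = time_by_age.get(g, 0.0) + factor * minutes
    return time_by_age


contacts = [("20-29", 60), ("30-39", 30), (None, 10), ("20-29", None)]
print(scaled_contact_time_by_age(4, contacts))  # {'20-29': 120.0, '30-39': 60.0}
```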

Estimating casual contact time
For each location visited, respondents were asked to identify the location type from a list of frequently visited location types identified by local researchers and fieldworkers before the start of data collection. If the interviewer could not identify the correct location type on the list, they selected 'Other' and gave details. In the analysis, locations were excluded if it was considered likely that all or most of the time would have been spent outdoors (e.g., if the details given were 'gardening'). Several responses were re-coded if it was considered plausible from the details given that the location was covered by one of the original options (e.g., 'domestic worker' was changed to 'House off plot'). Several new location categories were added if reported by multiple respondents (e.g., 'factory'). Finally, remaining responses in the 'Other' category were recorded as 'Other' if the type of location could be determined from the free-text variable, and as 'Missing' if it could not be.
Respondents were excluded from all casual contact time analyses if:
1) they reported visiting no locations (including their own home) and using no transport;
2) the variable giving the total number of locations visited was missing, and no information was provided on any locations visited; or
3) no information was available on any of the locations visited or transport used.
Casual contact time was estimated as the duration of time that respondents reported spending in a location, multiplied by the number of people that they estimated were present at the location halfway through their time there. In the analysis, the total number of people present was capped at 100, because above this value it is unlikely that the respondent had sufficient contact with each person present to allow transmission. Where the estimated total exceeded 100, the estimated numbers of adults and children present at that location were reduced by the same proportion, giving a total of 100 people present.
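The capping rule can be sketched in Python for one location as below. The function name is hypothetical, durations are assumed to be in hours, and passing cap=None reproduces the uncapped variant used in the sensitivity analysis.

```python
def casual_contact_time(duration_h, n_adults, n_children, cap=100):
    """Return (adult, child) casual contact time for one location.

    If the total number of people present exceeds `cap`, adults and children
    are scaled down by the same proportion; cap=None applies no cap.
    """
    total = n_adults + n_children
    if cap is not None and total > cap:
        scale = cap / total  # same proportional reduction for adults and children
        n_adults, n_children = n_adults * scale, n_children * scale
    return duration_h * n_adults, duration_h * n_children
```

For instance, 2 hours in a location with 150 adults and 50 children gives 75 adults and 25 children after capping, so (150.0, 50.0) person-hours.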
If the estimated number of people or children present was missing for a location, or if the estimated number of children present was greater than the estimated total number of people present, then the numbers of adults and children present were set equal to the mean reported number of adults and children present at locations of that type in the same community (KwaZulu-Natal or Western Cape). These numbers were rounded to the nearest whole number when generating bootstrap samples. If the duration of time spent in the location was missing, the duration was set equal to the mean duration for locations of that type.
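A sketch of these imputation rules follows, assuming each location is a dictionary with hypothetical keys for community, location type, total people, children and duration (None when missing). The text specifies community-level means for numbers present but does not say whether duration means were also computed within community, so this sketch pools durations across communities; it also assumes every location type has at least one valid record. Note that Python's round() rounds halves to the nearest even number.

```python
import statistics
from collections import defaultdict

def impute_locations(locs, rounded=False):
    """Fill missing or inconsistent numbers present, and missing durations.

    locs: list of dicts with keys 'community', 'type', 'total', 'children',
    'duration'; missing values are None. Use rounded=True for bootstrap samples.
    """
    def valid(loc):
        return (loc['total'] is not None and loc['children'] is not None
                and loc['children'] <= loc['total'])

    adults_by, children_by, duration_by = defaultdict(list), defaultdict(list), defaultdict(list)
    for loc in locs:
        if valid(loc):
            key = (loc['community'], loc['type'])
            adults_by[key].append(loc['total'] - loc['children'])
            children_by[key].append(loc['children'])
        if loc['duration'] is not None:
            duration_by[loc['type']].append(loc['duration'])

    out = []
    for loc in locs:
        if valid(loc):
            adults, children = loc['total'] - loc['children'], loc['children']
        else:
            # Mean reported adults and children for this location type and community
            key = (loc['community'], loc['type'])
            adults = statistics.mean(adults_by[key])
            children = statistics.mean(children_by[key])
            if rounded:
                adults, children = round(adults), round(children)
        duration = loc['duration']
        if duration is None:
            duration = statistics.mean(duration_by[loc['type']])
        out.append({**loc, 'adults': adults, 'children': children, 'duration': duration})
    return out
```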
Age mixing matrices for casual contact time were generated using the data on locations visited, the duration of time spent in each location, and the estimated numbers of adults and children present. Central estimates were generated using the method outlined in McCreesh et al. (2). Because data were collected only on the numbers of people and children present in indoor locations, and not on the ages of the adults present, the age distribution of adult casual contacts had to be estimated. To do so, we assumed that the age distribution of adults present in each location type matched the age distribution of respondents who reported visiting locations of that type, weighted by the duration of time they reported spending in that location type and weighted to the sampled population age and sex distribution. To generate plausible ranges, 10,000 bootstrap samples were generated by resampling respondents with replacement within age categories, and the ages of adults present were sampled by resampling, with replacement, respondents who reported visiting locations of that type (weighted by duration, and to the sampled age and sex distribution). The number of children present was set equal to the number of children present reported by the respondent. Contact times were then estimated by multiplying the duration of time the respondent reported spending in each location by the sampled number of people of each age group present.
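The adult-age sampling step within one bootstrap replicate might look like the Python sketch below. It assumes age groups are integer indices and that each respondent in the pool for a location type carries a weight equal to their duration there multiplied by their survey weight; the function and argument names are hypothetical, and the outer resampling of respondents within age categories is not shown.

```python
import numpy as np

def sample_adult_contact_time(duration_h, n_adults, pool_age_groups, pool_weights,
                              n_groups, rng):
    """Contact time (hours) with adults of each age group at one location.

    pool_age_groups: age-group index of each respondent who visited this location type
    pool_weights: duration spent there x survey weight for each of those respondents
    """
    p = np.asarray(pool_weights, dtype=float)
    p = p / p.sum()  # sampling probabilities proportional to the weights
    # Assign an age group to each adult present by resampling the pool with replacement
    ages = rng.choice(np.asarray(pool_age_groups), size=int(round(n_adults)),
                      replace=True, p=p)
    counts = np.bincount(ages, minlength=n_groups)
    return duration_h * counts
```

A generator such as np.random.default_rng(42) would be created once and passed to every call, so that all replicates draw from a single reproducible stream.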
The age mixing matrices were made symmetric, using data on adult contact time with children to estimate child contact time with adults. Because respondents were asked to estimate the numbers of children present who were aged under 15 years, we set the lowest respondent age group to be 15-19 years. As 15-17-year-olds were not interviewed in KwaZulu-Natal, we assumed that contact patterns in 18-19-year-olds were representative of contact patterns in all 15-19-year-olds, and adjusted the weights accordingly.
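A minimal sketch of how the child row could be filled by reciprocity follows. It assumes the matrix holds mean contact time per person in the row age group with persons in the column age group, that index 0 is the under-15 group, and that population sizes by age group are available; these conventions and the function name are hypothetical. Total child-adult contact time must equal total adult-child contact time, so M[0, j] * N[0] = M[j, 0] * N[j].

```python
import numpy as np

def fill_child_row(M, pop):
    """Fill the unobserved child row of a contact-time matrix by reciprocity.

    M[i, j]: mean contact time per person in age group i with persons in age group j;
    row 0 (children under 15) is unobserved because no children were interviewed.
    pop[i]: population size of age group i.
    """
    M = np.array(M, dtype=float)
    pop = np.asarray(pop, dtype=float)
    # Total child-adult time equals total adult-child time: M[0, j] * pop[0] == M[j, 0] * pop[j]
    M[0, 1:] = M[1:, 0] * pop[1:] / pop[0]
    M[0, 0] = np.nan  # child-child contact cannot be estimated from adult respondents
    return M
```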

Estimating age-mixing patterns relevant for the transmission of droplet infections
Age-mixing patterns relevant for the transmission of droplet infections were assumed to be equal to age mixing patterns calculated from close contact numbers.

Estimating age-mixing patterns relevant for the transmission of airborne infections
To generate age-mixing patterns relevant for the transmission of airborne infections, we summed estimated close contact time between household members and estimated casual contact time occurring in locations other than the respondents' own houses. The 95% plausible ranges were generated by pairing each of the 10,000 bootstrapped household close contact time matrices with one of the 10,000 bootstrapped outside-household casual contact time matrices.
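The combination step can be sketched as follows, assuming the bootstrapped matrices are stacked in arrays of shape (B, n, n) and that the b-th household matrix is paired with the b-th outside-household matrix; the pairing rule, array layout and function name are assumptions. The 95% plausible range is taken here as the 2.5th to 97.5th percentile of each cell, which is also an assumption about how the ranges were summarized.

```python
import numpy as np

def airborne_matrices(hh_close_central, out_casual_central, hh_close_boot, out_casual_boot):
    """Sum household close contact time and outside-household casual contact time.

    *_boot arrays have shape (B, n, n); replicate b of one is paired with replicate b
    of the other. Returns the central estimate and the lower and upper bounds.
    """
    central = hh_close_central + out_casual_central
    boot = hh_close_boot + out_casual_boot  # paired element-wise by replicate index
    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
    return central, lower, upper
```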
To allow direct comparisons to be made between the different age mixing matrices, the matrices for airborne infections (central estimate and 10,000 individual bootstrapped matrices) were then adjusted to give the same mean contact intensity between adults as the matrices for close contacts (using the central estimate matrices and 10,000 individual bootstrapped matrices respectively).
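One way to implement this adjustment is sketched below, assuming that "mean contact intensity between adults" means the unweighted mean of the adult-by-adult cells; the published analysis may weight cells differently, and the function name is hypothetical. The same function would be applied to the central estimates and to each bootstrap pair in turn.

```python
import numpy as np

def match_adult_intensity(target, reference, adult_idx):
    """Rescale `target` so the mean of its adult-by-adult cells equals that of `reference`.

    adult_idx: indices of the adult age groups (15-19 years and older)
    """
    block = np.ix_(adult_idx, adult_idx)
    factor = np.nanmean(reference[block]) / np.nanmean(target[block])
    return target * factor
```

Because only a single multiplicative factor is applied, the relative structure of each matrix is unchanged; only its overall level is aligned with the close contact matrix.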

Estimating age-mixing patterns relevant for the transmission of Mycobacterium tuberculosis
It is estimated that, in high-incidence settings, only 8%-19% of tuberculosis results from household transmission (2). Because of long disease durations (3), transmission to household members may also partially saturate, so that the relationship between contact time and transmission is nonlinear for household contacts. We therefore estimated age mixing matrices relevant to the transmission of Mtb by creating weighted averages of close contact numbers with household members and casual contact time occurring outside respondents' own households.
For each pair of bootstrapped matrices (household close contact numbers and non-household casual contact time), the proportion of contact that should occur in households was sampled from a uniform distribution between 8% and 16% (the range of values found by different studies), with 12% used for the central estimate. A weighted average of the two matrices was then generated so that the desired proportion of overall 'contact' by adults occurred in households.
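A sketch of the weighted average for one bootstrap pair follows. It assumes each matrix is first normalized so that its adult rows sum to 1, which makes the household share of adult 'contact' exactly p_household; this normalization and the function name are assumptions rather than the documented procedure. In the bootstrap, p_household would be drawn with rng.uniform(0.08, 0.16), with 0.12 used for the central estimate.

```python
import numpy as np

def mtb_matrix(hh_close_numbers, out_casual_time, p_household, adult_idx):
    """Weighted average of household close contact numbers and non-household
    casual contact time, with a share p_household of adult 'contact' in households."""
    # Normalize each matrix so that its adult rows sum to 1 (assumption)
    hh = hh_close_numbers / np.nansum(hh_close_numbers[adult_idx, :])
    out = out_casual_time / np.nansum(out_casual_time[adult_idx, :])
    return p_household * hh + (1.0 - p_household) * out
```

The resulting matrix is in arbitrary units, which is why it is subsequently rescaled to the mean adult contact intensity of the close contact matrix.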
To allow direct comparisons to be made between the different age mixing matrices, the matrices for Mtb (central estimate and 10,000 individual bootstrapped matrices) were then adjusted to give the same mean contact intensity between adults as the matrices for close contacts (using the central estimate matrices and 10,000 individual bootstrapped matrices respectively).

Assortative mixing by age group
We quantified the assortativeness of mixing by age group using the index Q, which takes the value of 1 when all contact occurs within age groups, and 0 when there is homogeneous mixing by age (4). The 95% plausible ranges and p-values for the difference between assortativeness in the airborne and Mtb transmission matrices compared with the droplet transmission matrices were generated using the bootstrapped samples.
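A short sketch of the index follows, assuming the standard form Q = (tr(P) - 1) / (n - 1), where P is the contact matrix with each row normalized to sum to 1 and n is the number of age groups; that reference (4) uses exactly this form is an assumption. The matrix must contain no missing cells.

```python
import numpy as np

def assortativity_q(M):
    """Q = (trace(P) - 1) / (n - 1), where P is M with each row normalized to sum to 1.

    Q is 1 when all contact is within age groups and 0 under homogeneous mixing.
    """
    M = np.asarray(M, dtype=float)
    P = M / M.sum(axis=1, keepdims=True)  # proportion of each group's contact with each group
    n = P.shape[0]
    return (np.trace(P) - 1.0) / (n - 1)
```

Computing Q for every bootstrapped droplet, airborne and Mtb matrix yields the distributions from which plausible ranges and differences can be derived.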

Sensitivity analyses
In the main analysis, we capped the number of people at risk of infection in locations (i.e., casual contact numbers) at 100. In the sensitivity analysis, we explored the effect of setting the cap at 20 or applying no cap.
When generating the non-saturating airborne and Mtb age-mixing matrices, we used close contact time and close contact number data, respectively, to estimate contact between household members. In the sensitivity analysis, we explored the effect of using casual contact time instead.
For all sensitivity analysis age-mixing matrices, we rescaled contact times to give the same overall mean contact hours per adult as in the main analysis. This was done because the primary use of age mixing matrices is in mathematical modeling, where it is usually the relative values of the cells in the matrices, not the absolute values, that affect model dynamics.

Missing/incomplete data
Close contact data
• The reported number of household contacts was higher than the reported total number of contacts for two respondents in KwaZulu-Natal. They were excluded from all analyses of close contacts.
• Three respondents in KwaZulu-Natal had the contact age missing for all of their contacts and were excluded from the age-mixing analysis.
• The number of household contacts was unknown for 11 respondents in KwaZulu-Natal. They were not included in the analysis of household or non-household contacts (Appendix Figures 10-13).
• Contact ages were missing for 54 contacts in KwaZulu-Natal and 16 contacts in Western Cape.
• Whether a contact was a household member was missing for 16 contacts in KwaZulu-Natal and two in Western Cape.

Sensitivity analyses
Changing the cap on people at risk in locations
Overall casual contact time was lower when the number of people at risk was capped at 20, and higher when it was not capped, than when it was capped at 100 people (Appendix Figure 1). Changing the cap had a moderate effect on casual contact time age-mixing patterns, although most changes were not large compared with the widths of the 95% plausible ranges (Appendix Figure 2).

Generating age-mixing matrices using casual contact time for household contact
Calculating age-mixing matrices relevant for non-saturating airborne and Mtb transmission using casual contact time data for contact between household members had very little effect on estimated age-mixing patterns (Supporting information, Appendix Figures 5, 6).
The exception was contact relevant to airborne transmission between 15-19-year-olds in KwaZulu-Natal, which was lower when casual contact time data were used than in the main analysis.