Development of the multivariable logistic regression model
Initially, separate univariable analyses were performed on all study variables within each exposure group to generate crude odds ratios (OR) with 95% confidence intervals (CI). Exposure variables significantly associated with Campylobacter infection in these crude analyses were considered as candidate variables for development of the multivariable model. Using sequential backward elimination of non-significant variables (based on the model deviance statistic), multivariable logistic regression models were then constructed for each exposure group, controlling for state, sex and education. The confounders sex and education were identified from a separate multivariable logistic model of demographic variables, and state was considered a design variable (Table 2A). Again using backward elimination, an omnibus multivariable (main-effects) model was then constructed using all significant exposure variables derived from each of the separate multivariable exposure group models as candidate variables, again controlling for state, sex and education. Finally, once the most parsimonious multivariable model was identified, two-factor interactions were introduced into the model and backward elimination of non-significant terms was undertaken (based on the model deviance statistic) until the final model was ascertained. The two-factor interactions considered were based on biological plausibility and prior knowledge from the literature. The Hosmer-Lemeshow goodness-of-fit test was performed on all multivariable models to check model adequacy. SPSS (SPSS Version 11.0; SPSS Inc., Chicago) was used for all regression analyses, and a significance level of α = 0.05 was used to define statistical significance. To reduce the risk of a type I error, only significant variables were included in the multivariable logistic regression models. A detailed description of the analytical approach and pursuant results has been published elsewhere (7).
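The univariable screening step can be illustrated with a short sketch. The text does not state how the crude confidence intervals were computed, so the log-based (Woolf) interval used here is an assumption; the function name and the counts are hypothetical and are not study data.

```python
import math


def crude_odds_ratio(exposed_cases, unexposed_cases,
                     exposed_controls, unexposed_controls, z=1.96):
    """Crude OR from a 2x2 table with a log-based (Woolf) 95% CI.

    Assumption: the Woolf interval is used for illustration only; the
    source does not say which interval method was applied.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
crude_or, ci_lower, ci_upper = crude_odds_ratio(40, 60, 20, 80)

# A variable would be a candidate for the multivariable model when its
# 95% CI excludes 1, which corresponds to significance at alpha = 0.05.
is_candidate = ci_lower > 1 or ci_upper < 1
```

With these made-up counts the interval excludes 1, so the variable would be carried forward as a candidate.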

Simulation methods
To calculate the proportion of campylobacteriosis cases that occur among persons aged five years and older in Australia, Australian notification data for the years 2001 to 2003 were reviewed (12). The yearly proportions of cases aged 5 years and older among all notified cases reported by the National Notifiable Diseases Surveillance System (NNDSS) between 2001 and 2003 were 84.3%, 85.1% and 87.4%, respectively.
Simulations of size 1,000,000 were undertaken in SAS (The SAS System for Windows, version 9.1; SAS Institute Inc., Cary, NC, USA) to estimate the total number of Campylobacter infections attributable to each specific risk factor identified in the final multivariable model using the following steps:
1. Total Campylobacter case numbers (N_j). We assumed that 223,000 (95% CrI: 94,000, 363,000) cases of campylobacteriosis occur in Australia in a typical year (3). As this distribution is asymmetrical about its mean, a power transformation of 7/8 was applied (removing the asymmetry in the 95% CrI), 1,000,000 random variates were generated, and these variates were then back-transformed to the original scale. We denote these back-transformed variates as N_j for j = 1,…,1,000,000.
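The transform, simulate and back-transform procedure for total case numbers can be sketched as follows. The text does not name the distribution used on the transformed scale. This sketch assumes a normal distribution centred on the midpoint of the transformed 95% CrI, with the interval spanning ±1.96 standard deviations; the transformed mean lies close to that midpoint. The sketch also discards the rare non-positive draws and uses fewer draws than the study, purely for speed. All names are hypothetical.

```python
import random

# Interval quoted in the text (cases per year in Australia).
CRI_LOWER, CRI_UPPER = 94_000, 363_000
POWER = 7 / 8        # power transformation stated in the text
N_DRAWS = 100_000    # the study used 1,000,000; reduced here for speed


def simulate_total_cases(n_draws=N_DRAWS, seed=2001):
    """Draw variates on the transformed scale, then back-transform."""
    rng = random.Random(seed)
    lower_t = CRI_LOWER ** POWER
    upper_t = CRI_UPPER ** POWER
    # Assumption: normal on the transformed scale, centred on the midpoint
    # of the transformed interval, interval width = 2 * 1.96 SD.
    centre_t = (lower_t + upper_t) / 2
    sd_t = (upper_t - lower_t) / (2 * 1.96)

    draws = []
    while len(draws) < n_draws:
        x = rng.gauss(centre_t, sd_t)
        if x > 0:  # assumption: redraw the rare non-positive values
            draws.append(x ** (1 / POWER))  # back-transform to case counts
    return draws


total_cases = simulate_total_cases()
total_cases.sort()
median_cases = total_cases[len(total_cases) // 2]
```

Under these assumptions the simulated median and the 2.5th and 97.5th percentiles should fall near the quoted 223,000 and (94,000, 363,000).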

2. Eligible Campylobacter case numbers (n_j).
where aOR_i is the i-th category-specific adjusted odds ratio calculated from the logistic regression model and p_i is the proportion of all study cases falling into the i-th exposure level, for a categorical variable with k levels and reference category denoted by i = 1. The total population attributable risk proportion (PAR) is given by equation (ii).
As the distribution of aOR_i is log-normal, PAR_i values for each category level i, i > 1, were derived in the following manner. Simulated log(aOR_i) values were randomly generated from a normal distribution with mean and standard deviation estimates derived from the i-th exposure category of the risk factor under investigation in the final multivariable logistic regression model, i = 2,…,k. These generated log(aOR_i) values were exponentiated, producing aOR_i values. The proportion of people within each of the i-th exposure categories, denoted by p_i, was generated from a binomial distribution via x_i ~ Binomial(q_i, m), where q_i is the proportion of cases in the i-th exposure category, m is the number of cases with non-missing data, and p_i = x_i / m. Simulated PAR_i and PAR values were then derived by combining the generated aOR_i and p_i as given by equations (i) and (ii) above. This process was repeated j = 1,…,1,000,000 times.
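Equations (i) and (ii) are not reproduced in this text. The sketch below therefore assumes a commonly used case-based form that matches the definitions given: PAR_i = p_i (aOR_i - 1) / aOR_i for each non-reference level, and PAR equal to the sum of the PAR_i over i = 2,…,k. The function name and the inputs are hypothetical, not study estimates, and the number of draws is kept small for speed.

```python
import math
import random


def simulate_par(log_aor_means, log_aor_ses, case_props, n_cases,
                 n_draws=2_000, seed=2003):
    """Simulate the total PAR for one risk factor with levels i = 2..k.

    Each of the three sequences holds one entry per non-reference level.
    """
    rng = random.Random(seed)
    par_draws = []
    for _ in range(n_draws):
        par_total = 0.0
        for mu, se, q in zip(log_aor_means, log_aor_ses, case_props):
            # log(aOR_i) ~ Normal(mu, se), then exponentiate.
            aor = math.exp(rng.gauss(mu, se))
            # x_i ~ Binomial with m trials and success probability q_i,
            # drawn here as a sum of m Bernoulli trials.
            x = sum(rng.random() < q for _ in range(n_cases))
            p = x / n_cases
            # Assumed form of equation (i).
            par_total += p * (aor - 1) / aor
        # Assumed form of equation (ii): sum over non-reference levels.
        par_draws.append(par_total)
    return par_draws


# Hypothetical two-level exposure: aOR = 2.0, SE(log aOR) = 0.2,
# and 30% of 500 cases exposed.
par_draws = simulate_par([math.log(2.0)], [0.2], [0.30], n_cases=500)
par_draws.sort()
par_median = par_draws[len(par_draws) // 2]
```

For these made-up inputs the median PAR should sit near 0.30 × (2 - 1) / 2 = 0.15.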

4. Attributable Campylobacter case numbers. Finally, the eligible campylobacteriosis case numbers and simulated PAR values derived in Steps 2 and 3 above were multiplied together to produce distributions of the total number of Campylobacter infections attributable to each specific risk factor. Because some distributions are skewed, we present medians and 95% credible intervals (defined as the 2.5th and 97.5th percentiles) for the simulation results.
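The final multiplication and summary step can be sketched as below. The draw-by-draw pairing of case numbers with PAR values and the linear-interpolation percentile rule are assumptions, since the text does not specify either. The function names and the three-element inputs are hypothetical and serve only to make the arithmetic easy to follow.

```python
def percentile(sorted_values, pct):
    """Percentile by linear interpolation between order statistics."""
    pos = (len(sorted_values) - 1) * pct / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def summarise_attributable_cases(eligible_cases, par_values):
    """Multiply paired draws, then report the median and 95% CrI."""
    # Assumption: the j-th eligible case number is paired with the
    # j-th simulated PAR value.
    attributable = sorted(n * par for n, par in zip(eligible_cases, par_values))
    return {
        "median": percentile(attributable, 50),
        "lower": percentile(attributable, 2.5),
        "upper": percentile(attributable, 97.5),
    }


# Tiny hypothetical inputs (three draws) for illustration only.
summary = summarise_attributable_cases(
    [100_000, 200_000, 300_000],
    [0.10, 0.20, 0.10],
)
```

In practice the two input lists would be the full sets of simulated values from the earlier steps, one entry per simulation draw.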