The serial interval of COVID-19 from publicly reported confirmed cases

We estimate the distribution of serial intervals for 468 confirmed cases of COVID-19 reported in 93 Chinese cities by February 8, 2020. The mean and standard deviation are 3.96 (95% CI 3.53–4.39) and 4.75 (95% CI 4.46–5.07) days, respectively, with 12.6% of reports indicating pre-symptomatic transmission.



Data
We collected publicly available data on 6,903 confirmed cases from 271 cities of mainland China that were available online as of February 8, 2020. The data were extracted in Chinese from the websites of provincial public health departments and translated to English (Table S5). We then filtered the data for clearly indicated transmission events consisting of: (i) a known infector and infectee, (ii) reported locations of infection for both cases, and (iii) reported dates and locations of symptom onset for both cases. We thereby obtained 468 infector-infectee pairs identified via contact tracing in 93 Chinese cities between January 21, 2020 and February 8, 2020 (Figure S1). The index cases (infectors) for each pair are reported as either importations from the city of Wuhan (N = 239), importations from cities other than Wuhan (N = 106) or local infections (N = 122). The cases included 752 unique individuals, with 98 index cases who infected multiple people and 17 individuals who appear as both infector and infectee. They range in age from 1 to 90 years and include 386 females, 363 males and 3 cases of unreported sex.

medRxiv preprint, this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Estimating serial interval distribution
For each pair, we calculated the number of days between the reported symptom onset date for the infector and the reported symptom onset date for the infectee. Negative values indicate that the infectee developed symptoms before the infector. We then used the fitdist function in MATLAB (19) to fit a normal distribution to all 468 observations, which yields unbiased estimates of the mean and standard deviation together with 95% confidence intervals. We applied the same procedure to estimate the means and standard deviations with the data stratified by whether the index case was imported or infected locally.
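The calculation described above can be sketched in Python as follows. This is an illustration only, not the MATLAB code used in the analysis: the onset dates are made-up values, the function names are hypothetical, and the confidence intervals use a large-sample normal approximation rather than the exact intervals that fitdist reports.

```python
from datetime import date
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical (infector onset, infectee onset) pairs; NOT the study data.
pairs = [
    (date(2020, 1, 20), date(2020, 1, 25)),
    (date(2020, 1, 22), date(2020, 1, 21)),  # infectee symptomatic first
    (date(2020, 1, 23), date(2020, 1, 30)),
    (date(2020, 1, 25), date(2020, 1, 27)),
    (date(2020, 1, 26), date(2020, 2, 3)),
]

def serial_intervals(pairs):
    """Days from infector onset to infectee onset (may be negative)."""
    return [(infectee - infector).days for infector, infectee in pairs]

def fit_normal(intervals, level=0.95):
    """Sample mean and SD with approximate (large-sample) confidence intervals."""
    n = len(intervals)
    mu = mean(intervals)
    sigma = stdev(intervals)  # n - 1 denominator
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_mu = sigma / sqrt(n)
    se_sigma = sigma / sqrt(2 * (n - 1))  # assumption: asymptotic SE of the SD
    return {
        "mean": mu,
        "mean_ci": (mu - z * se_mu, mu + z * se_mu),
        "sd": sigma,
        "sd_ci": (sigma - z * se_sigma, sigma + z * se_sigma),
    }

intervals = serial_intervals(pairs)
fit = fit_normal(intervals)
print(intervals)
print(fit)
```

Stratified estimates would follow by calling fit_normal separately on the intervals for imported and locally infected index cases.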

Estimating the basic reproduction number ($R_0$)
Given an epidemic growth rate $r$ and normally distributed generation times with mean $\mu$ and standard deviation $\sigma$, the basic reproduction number is given by $R_0 = e^{r\mu - \frac{1}{2} r^2 \sigma^2}$ (6).
Since we do not know the COVID-19 generation time distribution, we use our estimates of the COVID-19 serial interval distribution as an approximation (Table S1), noting that the serial interval distribution tends to be more variable than the generation time distribution. We assume that the COVID-19 growth rate ($r$) is 0.10 per day [95% CI 0.050–0.16] based on a recent analysis of COVID-19 incidence in Wuhan, China (13). To estimate $R_0$, we take 100,000 Monte Carlo samples of the growth rate ($r$) and of the mean ($\mu$) and standard deviation ($\sigma$) of the serial interval. We thereby estimate an $R_0$ of 1.32 [95% CI 1.16–1.48].
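A minimal Python sketch of this Monte Carlo step is given below. It assumes the standard relation for normally distributed generation times, R0 = exp(r·mu − r²·sigma²/2), and it assumes, purely for illustration, that each parameter is drawn from a normal distribution whose standard error is backed out of the reported 95% CI. The sampling distributions used in the original analysis may differ, so the interval printed here need not match the reported one. All function and variable names are hypothetical.

```python
import random
from math import exp

def r0_normal(r, mu, sigma):
    """R0 for growth rate r and normal generation times (mean mu, SD sigma)."""
    return exp(r * mu - 0.5 * r**2 * sigma**2)

def se_from_ci(lower, upper, z=1.96):
    """Assumption: treat the 95% CI as symmetric normal, SE = width / (2z)."""
    return (upper - lower) / (2 * z)

rng = random.Random(2020)  # fixed seed so the sketch is reproducible
n_samples = 100_000

# Point estimates and CIs as reported in the text.
r_hat, r_se = 0.10, se_from_ci(0.050, 0.16)
mu_hat, mu_se = 3.96, se_from_ci(3.53, 4.39)
sigma_hat, sigma_se = 4.75, se_from_ci(4.46, 5.07)

samples = sorted(
    r0_normal(
        rng.gauss(r_hat, r_se),
        rng.gauss(mu_hat, mu_se),
        rng.gauss(sigma_hat, sigma_se),
    )
    for _ in range(n_samples)
)

point = r0_normal(r_hat, mu_hat, sigma_hat)
lower = samples[int(0.025 * n_samples)]
upper = samples[int(0.975 * n_samples)]
print(f"R0 point estimate {point:.2f}, 95% interval {lower:.2f} to {upper:.2f}")
```

Plugging the point estimates directly into the formula gives a value close to the reported 1.32; the width of the simulated interval depends on the assumed sampling distributions.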

Model Comparison
We used maximum likelihood fitting and the Akaike information criterion (AIC) to evaluate four candidate models for the COVID-19 serial interval distribution: normal, lognormal, Weibull and gamma. Since our serial interval data include a substantial number of non-positive values, we fit the four distributions both to truncated data, in which all non-positive values are removed, and to shifted data, in which 12 days are added to each observation (Figure S1 and Tables S2-S3). The lognormal distribution provides the best fit for the truncated data (followed closely by the gamma and Weibull). However, we do not believe there is cause for excluding the non-positive data and would caution against making assessments and projections based on the truncated data. The normal distribution provides the best fit for the full dataset (shifted or not) and thus is the distribution we recommend for future epidemiological assessments and planning.

Table S4.
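As a rough illustration of the AIC comparison, the Python sketch below fits only the normal and lognormal candidates, because their maximum likelihood estimates have closed forms; the gamma and Weibull fits need numerical optimization and are left out here. The serial intervals are made-up values, not the study data, and the function names are hypothetical.

```python
from math import log, pi
from statistics import fmean

def normal_aic(xs):
    """AIC of a normal fit by maximum likelihood (2 parameters)."""
    n = len(xs)
    mu = fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE variance (n denominator)
    loglik = -0.5 * n * (log(2 * pi * var) + 1)
    return 2 * 2 - 2 * loglik

def lognormal_aic(xs):
    """AIC of a lognormal fit by maximum likelihood; requires all x > 0."""
    logs = [log(x) for x in xs]
    n = len(xs)
    mu = fmean(logs)
    var = sum((v - mu) ** 2 for v in logs) / n
    # Normal log-likelihood on the log scale plus the Jacobian term -sum(log x).
    loglik = -0.5 * n * (log(2 * pi * var) + 1) - sum(logs)
    return 2 * 2 - 2 * loglik

# Hypothetical serial intervals in days, including non-positive values.
raw = [-3, 0, 1, 2, 3, 4, 4, 5, 6, 7, 9, 12]

truncated = [x for x in raw if x > 0]  # drop non-positive values
shifted = [x + 12 for x in raw]        # add 12 days to every observation

for label, data in (("truncated", truncated), ("shifted", shifted)):
    results = {"normal": normal_aic(data), "lognormal": lognormal_aic(data)}
    best = min(results, key=results.get)
    print(label, {k: round(v, 2) for k, v in results.items()}, "lowest AIC:", best)
```

The normal log-likelihood does not change when a constant is added to every observation, which is why the normal fit can be compared on the full data whether shifted or not.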

Supplementary Analysis
To facilitate interpretation and future analyses, we summarize key characteristics of the COVID-19 infection report data set.

Figure S2. Number of infections per unique index case in the infection report data set.
There are 301 unique infectors across the 468 infector-infectee pairs. The number of transmission events reported per infector ranges from 1 to 16, with ~55% having only one.
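Summary counts of this kind can be derived directly from a list of infector-infectee pairs. The Python sketch below uses made-up case identifiers (not the study data) and hypothetical variable names.

```python
from collections import Counter

# Hypothetical (infector_id, infectee_id) pairs; NOT the study data.
pairs = [("A", "B"), ("A", "C"), ("B", "D"), ("E", "F"), ("G", "H"), ("A", "I")]

# Number of reported transmission events per unique infector.
events_per_infector = Counter(infector for infector, _ in pairs)

n_infectors = len(events_per_infector)
n_single = sum(1 for count in events_per_infector.values() if count == 1)
share_single = n_single / n_infectors

# Individuals who appear as both infector and infectee.
both_roles = {i for i, _ in pairs} & {j for _, j in pairs}

print(n_infectors, max(events_per_infector.values()), share_single, both_roles)
```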
Figure S3. Geographic composition of the infection report data set. The data consist of 468 infector-infectee pairs reported by February 8, 2020 across 93 cities in mainland China. Colors represent the number of reported events per city, which ranges from 1 to 72, with an average of 5.03 (SD 8.54) infection events. The 71 cities with fewer than five events are colored in blues; the 22 cities with at least five events are colored in shades of orange.