Title: The Novel Coronavirus, 2019-nCoV, is Highly Contagious and More Infectious Than Initially Estimated

The novel coronavirus (2019-nCoV) is a recently emerged human pathogen that has spread widely since January 2020. Initially, the basic reproductive number, R0, was estimated to be 2.2 to 2.7. Here we provide a new estimate of this quantity. We collected extensive individual case reports and estimated key epidemiological parameters, including the incubation period. We integrated these estimates and high-resolution real-time human travel and infection data with mathematical models. From this, we estimated that the number of infected individuals during the early epidemic doubled every 2.4 days, and that R0 is likely to be between 4.7 and 6.6. We further show that quarantine and contact tracing of symptomatic individuals alone may not be effective, and that early, strong control measures are needed to stop transmission of the virus.
One-sentence summary: By collecting and analyzing spatiotemporal data, we estimated the transmission potential of 2019-nCoV.

real-time domestic travel data in China. Third, to address the issue of potential data collection and methodological bias or incomplete control of confounding variables, we implemented two distinct modeling approaches using different sets of data. These analyses produced estimates of the exponential growth rates that are consistent with one another and higher than previous estimates.
A unique feature of our case report dataset (Table S1) is that it includes reports on many of the first (or first few) individuals confirmed with the virus infection in each province, for whom dates of departure from Wuhan were reported. Altogether, we collected 140 individual case reports (Table S1). These reports include demographic information (age, sex and location of hospitalization) as well as epidemiological information (potential time periods of infection and dates of symptom onset, hospitalization and case confirmation).
Using this dataset, we estimated the distributions of key durations: from initial exposure to symptom onset, from symptom onset to hospitalization, and from hospitalization to discharge or death. Our estimate of the time from initial exposure to symptom onset is 4.2 days, with a 95% confidence interval (CI below) of 3.5 to 5.1 days (Fig. 1C). This estimated period is about 1 day shorter and has lower variance than a previous estimate (1). The shorter time is likely caused by the expanded temporal range of our data, which includes cases occurring after broad public awareness of the disease. Patients reported in the Li et al. study (1) are all from Wuhan, and most had symptom onset before mid-January. In our dataset, many patients had symptom onset during or after mid-January and were reported in provinces other than Hubei province (of which Wuhan is the capital). The time from symptom onset to hospitalization showed evidence of time dependence (Fig. 1D and Fig. S1). Before January 18, the time from symptom onset to hospitalization was 5.5 days (CI: 4.6 to 6.6 days). After January 18, it shortened significantly to only 1.5 days (CI: 1.2 to 1.9 days; p < 0.001, Mann-Whitney U test). The change in the distribution coincides with three events: infected cases were first confirmed in Thailand, news reports of potential human-to-human transmission appeared, and China CDC upgraded its emergency response to Level 1. The emerging consensus about the risk of 2019-nCoV likely led to a significant behavior change over this period, with symptomatic people seeking medical care more promptly. We also found that the time from initial hospital admittance to discharge is 11.5 days (Fig. 1E; CI: 8.0 to 17.3 days). The time from initial hospital admittance to death is 11.2 days (Fig. 1F; CI: 8.7 to 14.9 days).
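The before/after comparison of onset-to-hospitalization delays can be sketched with SciPy's `mannwhitneyu`. This is a minimal illustration only. The delays are hypothetical draws from gamma distributions whose means match the reported 5.5 and 1.5 days. The shape parameter of 2 and the group sizes of 50 are assumptions. None of these values are the case-report data in Table S1.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical onset-to-hospitalization delays (days). Gamma shape 2 and n = 50 per
# group are assumptions; only the means (5.5 and 1.5 days) come from the text.
rng = np.random.default_rng(1)
shape = 2.0
before_jan18 = rng.gamma(shape, 5.5 / shape, size=50)
after_jan18 = rng.gamma(shape, 1.5 / shape, size=50)

# Two-sided rank-sum test: does one group tend to have longer delays than the other?
result = mannwhitneyu(before_jan18, after_jan18, alternative="two-sided")
print(f"U = {result.statistic:.0f}, p = {result.pvalue:.1e}")
```

The rank-based test makes no normality assumption, which suits skewed, non-negative delay data.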
Moving from empirical estimates of basic epidemiological parameters to an understanding of the actual epidemiology of 2019-nCoV requires model-based inference. We thus used mathematical models to integrate the empirical estimates with spatiotemporal domestic travel and infection data outside of Hubei province to infer the outbreak dynamics in Wuhan. Inference based on data outside of Hubei is more reliable for the following reason. As the outbreak in Wuhan unfolded and awareness of the risk of transmission grew, other provinces implemented intensive surveillance systems to detect individuals with high temperatures. They also closely tracked travelers from Wuhan using digital data to identify infected individuals (6).
We collected real-time travel data during the epidemic using the Baidu® Migration server (Fig. 2A and Table S2). The server is an online platform summarizing mobile phone travel data through Baidu® Huiyan [https://huiyan.baidu.com/]. Baidu® Huiyan is a widely used positioning system in China. It processes >120 billion positioning requests daily through GPS, WiFi and other means [https://huiyan.baidu.com/]. Therefore, the data represent a reliable, real-time and high-resolution source of travel patterns in China. We extracted daily travel data from Wuhan to each of the provinces. Before the lockdown of the city on January 23, between 40,000 and 140,000 people in Wuhan generally traveled to destinations outside of Hubei province each day. Travel peaked on January 9, 21 and 22 (Fig. 2B). Thus, it is likely that this massive flow of people from Hubei province during January facilitated the rapid dissemination of the virus.
We integrated the travel data into our inferential models using two approaches. The first model is the 'first-arrival' approach. Its rationale is that as the fraction of people infected in Wuhan increases, so does the likelihood that one such case is exported to other provinces. Hence, how soon new cases are observed in other provinces can inform disease progression in Wuhan (Fig. 2C). This resembles earlier analyses that estimated the size of the 2019-nCoV outbreak in Wuhan from international travel data (5,7,8). However, inference based on infected cases outside of China may suffer from large uncertainty because the volume of international travel is low. In our model, we assumed exponential growth for the infected population $I^*$ in Wuhan, $I^*(t) = e^{r(t - t_0)}$. Here $r$ is the exponential growth rate and $t_0$ is the time at which exponential growth begins, i.e. $I^*(t_0) = 1$. Note that $t_0$ is likely to be later than the date of the first infection event, because multiple infections may be needed before exponential growth sets in (9). We used travel data to each of the provinces (Table S3) to infer $r$ and $t_0$ (see Supplementary Materials for details). We also used the earliest times that an infected individual arrived in each of 26 provinces (Fig. 2D). Model predictions of arrival times in the 26 provinces fitted the actual data well (Fig. S2). We estimated that exponential growth began on December 20, 2019 (CI: December 11 to 26). This suggests that human infections in early December may be due to spillovers from the animal reservoir or limited chains of transmission (10,11). The growth rate of the outbreak is estimated to be 0.29 per day (CI: 0.21 to 0.37 per day), a much higher rate than two recent estimates (1,5). This growth rate corresponds to a doubling time of ln 2 / 0.29 ≈ 2.4 days. We further estimated that the total infected population in Wuhan was approximately 4,100 (CI: 2,423 to 6,178) on January 18. This is remarkably consistent with a recently posted estimate (7). The estimated number of infected individuals is 18,700 (CI: 7,147 to 38,663) on January 23, the date the Wuhan lockdown began. We projected that without any control measures, the infected population would have reached approximately 233,400 (CI: 38,757 to 778,278) by the end of January (Fig. S3).
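Growth rate and doubling time are related by $T_d = \ln 2 / r$. Below is a minimal sketch of this arithmetic, also applied to the CI bounds. The resulting doubling-time interval is derived here for illustration and is not a figure reported in the text.

```python
import math

def doubling_time(r: float) -> float:
    """Doubling time (days) of exponential growth exp(r * t) with rate r per day."""
    return math.log(2) / r

print(round(doubling_time(0.29), 1))  # 2.4 days for the point estimate
# Larger growth rates give shorter doubling times, so the CI bounds swap order.
for r in (0.37, 0.21):
    print(r, round(doubling_time(r), 1))
```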
An alternative model, the 'case count' approach, used daily case counts from provinces outside of Hubei between January 19 and 26 to infer when the outbreak began and how fast it grew. We restricted the data to this period for the following reason. During this time, infected persons found outside of Hubei province generally reported visiting Wuhan within 14 days of becoming symptomatic, so cases from this period were indicative of the dynamics in Wuhan. We developed a metapopulation model based on the classical SEIR model (12). We assumed deterministic exponential growth for the infected populations in Wuhan. In other provinces, we represented the trajectory of infected individuals who traveled from Wuhan using a stochastic agent-based model. The transitions of infected individuals from symptom onset to hospitalization and then to case confirmation were assumed to follow the distributions inferred from the case report data (see Supplementary Materials for details). Simulations using the best-fit parameters described the observed case counts over time well (Fig. 2E). The estimated date of exponential growth initiation is December 16, 2019 (CI: December 12 to December 21). The exponential growth rate is 0.30 per day (CI: 0.26 to 0.34 per day). These estimates are consistent with those from the 'first-arrival' approach (Fig. 2F and G, and Fig. S4).
We note that both approaches assume perfect detection of infected cases outside of Hubei province, i.e. that the dates of first arrival and the case counts are accurate. This may be a reasonable assumption for symptomatic individuals because of the intensive surveillance implemented in China, for example, tracking individuals' movements using digital transportation data (6). However, a fraction of infected individuals may not have been hospitalized, for example, individuals with mild or no symptoms (13). In that case, we would underestimate the true size of the infected population in Wuhan. We undertook sensitivity analyses with both approaches to investigate how this issue affects our estimates (see Supplementary Materials for details). If a proportion of cases remained undetected, the estimated start of exponential growth moved earlier than December 20. This translates into a larger infected population in January, but the estimated growth rate remained the same. Overall, the convergence of the growth-rate estimates from the two approaches emphasizes the robustness of our estimates to model-dependent assumptions.
Our estimated outbreak growth rate is significantly higher than in two recent reports, which estimated the growth rate to be 0.1 per day (1,5). These estimates were based on early case counts from Wuhan (1) or international air travel data (5). However, these data suffer from important limitations. Case counts in Wuhan during the early outbreak are likely to be underreported for many reasons. In addition, few individuals travel abroad compared to the total population of Wuhan. As a result, inferring the infected population size and outbreak growth rate from infected cases outside of China suffers from large uncertainty (7,8). Our estimated exponential growth rate, 0.29/day (a doubling time of 2.4 days), is consistent with the rapidly growing outbreak during late January (Fig. 1A).
Using the exponential growth rate, we next estimated the range of the basic reproductive number, R0. This estimate has been shown to depend on the distributions of the latent period and the infectious period (14). The latent period is defined as the time between when an individual is infected and when they become infectious. We assumed a gamma distribution for both periods and varied the mean and the shape parameter over a wide range to reflect the uncertainties in these distributions (see Supplementary Materials). It is not clear when an individual becomes infectious, so we considered two scenarios. In scenario 1, the latent period is the same as the incubation period. In scenario 2, the latent period is 2 days shorter than the incubation period, i.e. individuals start to transmit 2 days before symptom onset. We integrated the uncertainties in the growth rate from the 'first-arrival' approach with those in the durations of the latent and infectious periods. This gave R0 estimates of 6.3 (CI: 3.3 to 11.3) and 4.7 (CI: 2.8 to 7.6) for the first and second scenarios, respectively (Fig. 3A). Using the estimates from the 'case count' approach, we obtained slightly higher R0 values of 6.6 (CI: 4.0 to 10.5) and 4.9 (CI: 3.3 to 7.2) for the first and second scenarios, respectively (Fig. S5). Overall, we report that R0 is likely to be between 4.7 and 6.6, with a CI spanning 2.8 to 11.3. We argue that the high R0 and a relatively short incubation period explain why the 2019-nCoV outbreak grew far faster than the 2003 SARS epidemic, for which R0 was estimated to be between 2.2 and 3.6 (15,16).
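The closed-form expression below illustrates how R0 depends on the latent and infectious period distributions. It is derived by Wearing, Rohani and Keeling (2005) from the Euler-Lotka relation for SEIR-type models with gamma-distributed latent and infectious periods. The latent period has mean $1/\sigma$ and shape $m$; the infectious period has mean $1/\gamma$ and shape $n$:

$R_0 = r\,(r/(\sigma m) + 1)^m / \{\gamma\,[1 - (r/(\gamma n) + 1)^{-n}]\}$

Two assumptions apply. First, whether this is exactly the formula of Ref. (14) is not confirmed. Second, the infectious-period mean and both shape parameters in the usage lines are hypothetical placeholders, not the values sampled in the Supplementary Materials.

```python
def r0_from_growth_rate(r, latent_mean, latent_shape, infectious_mean, infectious_shape):
    """R0 implied by exponential growth rate r (per day) for an SEIR-type model with
    gamma-distributed latent and infectious periods (Wearing, Rohani & Keeling 2005)."""
    sigma = 1.0 / latent_mean        # 1 / mean latent period
    gamma = 1.0 / infectious_mean    # 1 / mean infectious period
    m, n = latent_shape, infectious_shape
    numerator = r * (r / (sigma * m) + 1.0) ** m
    denominator = gamma * (1.0 - (r / (gamma * n) + 1.0) ** (-n))
    return numerator / denominator

# Hypothetical inputs: latent period equal to the 4.2-day incubation estimate (scenario 1)
# or 2 days shorter (scenario 2); infectious-period mean and both shapes are placeholders.
for latent_mean in (4.2, 2.2):
    print(latent_mean, round(r0_from_growth_rate(0.29, latent_mean, 4.0, 5.0, 2.0), 2))
```

With exponentially distributed periods ($m = n = 1$), the expression reduces to the familiar $(1 + r/\sigma)(1 + r/\gamma)$. This is a convenient sanity check.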
The high R0 values we estimated have important implications for disease control. For example, basic theory predicts that the force of infection has to be reduced by a fraction 1 − 1/R0 to guarantee extinction of the disease. At R0 = 2.2 this fraction is only 55%, but at R0 = 6.7 it rises to 85%. To translate this into meaningful predictions, we used the framework proposed by Lipsitch et al. (16) with the parameters we estimated for 2019-nCoV. Transmission of the virus from asymptomatic individuals has recently been reported (13). We therefore allowed for a fraction of infected individuals who are asymptomatic and can transmit the virus (see Supplementary Materials). Results show that if as few as 20% of infected persons are asymptomatic and can transmit the virus, then even 95% quarantine efficacy will not contain the virus (Fig. 3B). Three factors make 2019-nCoV difficult to control once it establishes itself in a new population (17). These are the rapid rate of spread, the sensitivity of control effectiveness to asymptomatic infections, and the potential for transmission before symptom onset. Several unknowns remain, such as the fraction of asymptomatic individuals, the time when individuals become infectious, and the existence of superspreaders. Future field, laboratory and modeling studies addressing them are needed to accurately predict the impact of various control strategies (9,17).
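The threshold above follows from requiring the reproductive number after control, $(1 - f)R_0$, to fall below 1. This gives $f > 1 - 1/R_0$. A minimal sketch of this arithmetic:

```python
def critical_reduction(r0: float) -> float:
    """Minimum fractional reduction f in transmission so that (1 - f) * R0 < 1."""
    return 1.0 - 1.0 / r0

for r0 in (2.2, 6.7):
    print(f"R0 = {r0}: reduce transmission by at least {critical_reduction(r0):.0%}")
# R0 = 2.2: reduce transmission by at least 55%
# R0 = 6.7: reduce transmission by at least 85%
```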
Fortunately, we see evidence that control efforts have a measurable effect on the rate of spread. Since January 23, Wuhan and other cities in Hubei province have implemented vigorous control measures, such as shutting down transportation and mass gatherings. Other provinces have also escalated their public health alert levels and implemented strong control measures. The growth rate of the daily number of new cases in provinces outside of Hubei has slowed gradually since late January (Fig. 3B). Wuhan and other cities in Hubei have been closed. As a result, cases reported in other provinces during this period should increasingly reflect local infection dynamics rather than imports from Wuhan. We estimated that the exponential growth rate has decreased to 0.14 per day (CI: 0.12 to 0.15 per day) since January 30. We assumed an R0 between 4.7 and 6.6 before the control measures and applied the formula in Ref. (14). This calculation suggested that a drop in growth rate from 0.29 to 0.14 per day translates to a 50% to 59% decrease in R0, to between 2.3 and 3.0. This is in agreement with previous estimates of the impact of effective social distancing during the 1918 influenza pandemic (18). Thus, the reduction in growth rate may reflect the impact of the vigorous control measures and of individual behavior changes in China during the course of the outbreak.
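The text does not state how the post-January 30 growth rate was estimated. One simple approach is an ordinary least-squares fit of log daily counts against time; this is an assumption, not necessarily the authors' procedure. The counts below are synthetic and noise-free.

```python
import numpy as np

def fit_growth_rate(days, daily_new_cases):
    """Slope r of an ordinary least-squares fit of log(cases) = a + r * day."""
    log_cases = np.log(np.asarray(daily_new_cases, dtype=float))  # counts must be > 0
    slope, _intercept = np.polyfit(np.asarray(days, dtype=float), log_cases, deg=1)
    return slope

# Synthetic, noise-free counts growing at 0.14 per day (not the reported data).
days = np.arange(10)
cases = 200.0 * np.exp(0.14 * days)
print(round(fit_growth_rate(days, cases), 3))  # 0.14
```

Real counts are noisy and may include zeros. In that case a count-based regression (e.g. Poisson) is a common alternative to the log-linear fit.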
The 2019-nCoV epidemic is still growing rapidly and had spread to more than 20 countries as of February 5, 2020. Here, we estimated the growth rate of the early outbreak in Wuhan to be 0.29 per day (a doubling time of 2.4 days). We estimated the basic reproductive number, R0, to be between 4.7 and 6.6 (CI: 2.8 to 11.3). Among many factors, the Lunar New Year travel rush in early and mid-January 2020 may or may not have contributed to the high outbreak growth rate. Note that the SARS epidemic also overlapped with the Lunar New Year travel rush. How contagious 2019-nCoV is in other countries remains to be seen. If R0 is as high in other countries, our results suggest that active and strong population-wide social distancing might be needed to reduce overall contacts and contain the spread of the virus. Such efforts include closing down transportation systems and schools and discouraging travel.

Case count and individual case reports
We collected and translated reports from documents published daily on the China CDC website and on the official websites of health commissions across provinces and special municipalities in China (website URLs are available upon request). We collected daily counts of confirmed cases in each province as well as 140 individual case reports (Table S1). Many of the individual reports were also published on the China CDC official website (http://www.chinacdc.cn/jkzt/crb/zl/szkb_11803/) and in the China CDC weekly bulletin (in English) (http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm). Our dataset includes demographic information (age and sex) as well as epidemiological information (dates of symptom onset, hospitalization, case confirmation, and discharge or death). Most of the health commissions in provinces and special municipalities documented and published detailed information on the first or first few cases confirmed with 2019-nCoV infection. As a result, this dataset includes case reports of many of the first or first few individuals confirmed with the virus infection in each province, for whom dates of departure from Wuhan were available.

Travel data
We used the Baidu® Migration server (https://qianxi.baidu.com/) to estimate the number of daily travelers into and out of Wuhan (Table S2). Specifically, we extracted the Immigration Index and Emigration Index for Wuhan from the server. Based on cell phone positioning data, these indexes are linearly related to the number of travelers going into and out of Wuhan, respectively. We also extracted the fraction of individuals who went to or came from a particular province. It has been reported that 5 million people left Wuhan between the start of the Chinese New Year travel rush and January 23 (https://www.washingtonpost.com/world/asia_pacific/china-coronavirus-liveupdates/2020/01/30/1da6ea52-4302-11ea-b5fc-eefa848cde99_story.html; accessed Feb. 2, 2020). This allowed us to calibrate the Emigration Index and estimate the number of daily travelers to or from a particular province. From this we obtained the fraction of people traveling to or from each province (Table S3). These data were used in mathematical models to estimate the s

Estimating distributions of epidemiological parameters from individual reports
We used the first confirmed cases in provinces other than Hubei to inform the time between patient infection and the onset of symptoms (n = 24). These individuals had all traveled to Wuhan shortly before symptom onset. Since they were the first cases detected in their province, the infection most likely occurred during their recent stay in Wuhan. We approximated the time of infection as the midpoint of their stay. The delays between infection and symptom onset vary between patients, so we modeled the delay using a gamma distribution. Its support is nonnegative, and it permits delays that are relatively large compared to the median. Figure 1 in the main text presents the results of fitting the distribution to the data.
The fitting procedure maximized the likelihood of the observed delays between infection and symptom onset. For a single observation, the individual likelihood is the gamma density function evaluated at the infection-to-onset delay. Some of the delays were censored, i.e. bounded by a certain value. For example, in some cases only the times of infection and hospitalization were reported, and the time of symptom onset was missing from the case report. In such cases, we assumed that the missing onset time lies between the times of infection and hospitalization. The likelihood for this observation is then the cumulative gamma distribution evaluated at the censoring bound, i.e. the delay from infection to hospitalization. The maximum likelihood estimates (MLEs) are the shape and scale parameters that maximize the sum of the individual log-likelihoods over all observations. We used differential_evolution in the scipy.optimize library (Python) to perform the maximization. This stochastic optimization algorithm helps avoid being trapped in local optima (1). The likelihood-based confidence intervals were computed with the methods reported in Raue et al. (2). We used the same approach to fit distributions to three other delays: symptom onset to hospitalization (n = 96), hospitalization to discharge (n = 6), and hospitalization to death (n = 23). The reported dates for these events were obtained directly from official sources. Data from cases originating from all over China and from neighboring countries were used for distribution fitting. Detailed patient-level data are provided in Table S1.
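Below is a minimal sketch of the censored maximum-likelihood fit. It uses SciPy's `gamma` distribution (shape `a`, `scale`) and `differential_evolution`. Fully observed delays contribute the log-density, and delays known only to be at most a bound contribute the log-CDF. Several details are assumptions: the function names, the parameter search ranges, and the synthetic delays in the usage lines. Delays must be strictly positive, because the gamma density is degenerate at zero. The profile-likelihood confidence intervals are not shown.

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import gamma

def neg_log_likelihood(params, exact_delays, upper_bounds):
    """Negative log-likelihood of a gamma(shape, scale) delay model.

    exact_delays: fully observed delays in days (contribute the log-density).
    upper_bounds: delays known only to be at most these values, e.g. the
        infection-to-hospitalization delay when onset is missing (contribute the log-CDF).
    """
    shape, scale = params
    total = np.sum(gamma.logpdf(exact_delays, a=shape, scale=scale))
    if upper_bounds.size:
        total += np.sum(gamma.logcdf(upper_bounds, a=shape, scale=scale))
    return -total

def fit_gamma_delays(exact_delays, upper_bounds=(), seed=0):
    """Maximum-likelihood (shape, scale) found by differential evolution."""
    exact = np.asarray(exact_delays, dtype=float)
    upper = np.asarray(upper_bounds, dtype=float)
    search_bounds = [(0.1, 20.0), (0.05, 10.0)]  # assumed ranges for shape and scale
    result = differential_evolution(neg_log_likelihood, search_bounds,
                                    args=(exact, upper), seed=seed)
    return tuple(result.x)

# Hypothetical usage on synthetic delays (not the case-report data): the last 20
# observations are treated as censored, with bounds 1 day above their true values.
rng = np.random.default_rng(0)
synthetic = rng.gamma(4.0, 1.05, size=200)  # shape 4, scale 1.05 (mean 4.2 days)
shape_hat, scale_hat = fit_gamma_delays(synthetic[:180], upper_bounds=synthetic[180:] + 1.0)
print(f"mean delay = {shape_hat * scale_hat:.2f} days")
```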

The 'first-arrival' model: Inferring disease dynamics in Wuhan using the first-arrival times at other provinces
In this model, we used the first-arrival time of a patient who traveled from Wuhan to a specific other province and was later confirmed to have been infected with 2019-nCoV. The rationale behind our approach is that as the fraction of people infected in Wuhan increases, so does the likelihood that one such case is exported to other provinces. Hence, how soon new cases are observed in other provinces can inform the disease progression in Wuhan. We hypothesize that this information is more reliable because the infected population in Wuhan must be sufficiently large before exporting an infected individual becomes probable. The expected flow of cases depends on the flow of travelers to each province and on the proportion of the Wuhan population infected by the virus.
We first estimated the daily number of travelers from Wuhan to each province of China. For this purpose, we used Wuhan's daily migration index to other provinces and the daily distribution of traveler destinations from Wuhan (see Data Collection). We assumed a linear relationship between the migration index and the total number of exported individuals. Under this assumption, a migration index of 1 corresponds to approximately 5 million individuals divided by the sum of the daily migration indexes from January 10 to January 25, 2020. This calibration uses the reported figure of 5 million individuals leaving Wuhan during that period (see Data Collection section). The number of daily Wuhan travelers to a province on a given date was then set to the total estimated from the migration index times the fraction of travelers going to that province. The results are reported in Table S2.
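The calibration described above can be sketched as follows. Like the text, it assumes that the migration (Emigration) index is linear in traveler numbers. The function name and the three-day, two-province inputs are hypothetical.

```python
import numpy as np

def calibrate_daily_travelers(emigration_index, destination_fraction, total_travelers=5_000_000):
    """Daily travelers from Wuhan to each province, assuming the index is linear in traveler counts.

    emigration_index: daily index over the calibration window, shape (days,)
    destination_fraction: share of each day's travelers going to each province, shape (days, provinces)
    """
    index = np.asarray(emigration_index, dtype=float)
    fraction = np.asarray(destination_fraction, dtype=float)
    travelers_per_index_unit = total_travelers / index.sum()   # calibrate to the reported 5 million
    daily_total = index * travelers_per_index_unit             # people leaving Wuhan each day
    return daily_total[:, None] * fraction                     # people per day per province

# Hypothetical three-day window and two provinces.
flows = calibrate_daily_travelers([1.0, 2.0, 2.0], [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]])
print(flows)
```

When each day's destination fractions sum to 1, the calibrated flows add up to the reported total over the window.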
An infected traveler may be pre-symptomatic, i.e. exposed to the virus ($E$) but without symptoms yet, or may already be symptomatic ($I$). In fact, for many individuals, symptom onset was recorded days after their departure from Wuhan (see Table S1). We assume that travelers represent a random sample of the whole population. The probability that a traveler is infected then equals the number of exposed or infected individuals in Wuhan ($I^* = E + I$) divided by the total Wuhan population ($N(t)$). The total population size varied during the infection period. We estimated it from the daily inflow and outflow of individuals from Wuhan (see Table S2). To represent the beginning of an outbreak, we modeled an exponential increase in the size of the exposed and infected population over time $t$:

$$I^*(t) = E(t) + I(t) = e^{r(t - t_0)}, \tag{1}$$

where $r$ is the infection growth rate and $t_0$ is the time of onset of the exponential outbreak.
Equation (1) yields a simple analytic expression for the likelihood of the first-case arrival times in each province other than Hubei. For a specific province, indexed by $i$, we modeled the arrival of new cases during short time intervals as a Poisson random process $X_i(t)$. The rate parameter of this Poisson process is $\lambda_i(t) = I^*(t)\,L_i(t)/N(t)$. It depends on three time-varying quantities: the sum of the exposed and symptomatic populations $I^*(t)$, the flow of people $L_i(t)$ transported from Wuhan to province $i$, and the population size $N(t)$.
It can be shown mathematically (3) that the probability that no exposed or symptomatic traveler arrived in province $i$ during a short time interval $(t, t + \Delta t)$, $\Delta t \ll 1$, is:

$$\mathbb{P}\{X_i(t + \Delta t) - X_i(t) = 0\} \approx \exp\left(-\frac{I^*(t)\,L_i(t)}{N(t)}\,\Delta t\right). \tag{2}$$

We assume no delay was incurred due to transportation in our model. Equation (2) is valid for any $t > 0$. Because the overall process is Markovian, we can write the probability that the arrival time of the first case in province $i$, $T_i$, is later than $t$ as:

$$\mathbb{P}\{T_i > t\} = \lim_{K \to \infty} \prod_{k=0}^{K-1} \exp\left(-\frac{I^*(t_k)\,L_i(t_k)}{N(t_k)}\,\Delta t\right) = \exp\left(-\int_{t_0}^{t} \frac{I^*(s)\,L_i(s)}{N(s)}\,ds\right). \tag{3}$$

Here $[t_0, t)$ was partitioned into $K$ equal intervals of length $\Delta t = (t - t_0)/K$, with $t_k = t_0 + k\,\Delta t$. The Riemann sum in the exponent converges to the integral as $K \to \infty$ ($\Delta t \to 0$). Finally, we apply $d/dt$ to $1 - \mathbb{P}\{T_i > t\}$ to obtain the probability density function (PDF) of the first-arrival time in province $i$:

$$f_i(t) = \frac{I^*(t)\,L_i(t)}{N(t)} \exp\left(-\int_{t_0}^{t} \frac{I^*(s)\,L_i(s)}{N(s)}\,ds\right). \tag{4}$$

The PDF in Eq. (4) was used to compute the likelihood of the observed arrival times in each province as a function of the growth rate $r$ and outbreak initiation time $t_0$. This likelihood was maximized, again using differential_evolution in scipy.optimize (1). The confidence intervals for $r$ and $t_0$ were obtained through profile likelihood (2). Numerical integration was performed by discretizing time into daily intervals, since both the flow of travelers and the population size in Wuhan were estimated daily.

Sensitivity analyses for the 'first-arrival' model
The arrival times were fitted using three versions of the above model. Each version made a different assumption about the probability that an infected or exposed individual who arrived at a location is later diagnosed with the coronavirus. In the first sensitivity analysis, we assumed that this probability was 50%. In the second analysis, we assumed it to be 10%. Finally, we assumed that this probability was 0% for cases that arrived before December 31, 2019, and 50% for infected individuals arriving after that date.
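The model formulation above needed a small modification for these analyses. Consider the event $\mathcal{E}$: "no arrival before time $t$ is later diagnosed with the infection". This event now occurs in any of the following cases: no infected individual arrived before time $t$; exactly one infected individual arrived before time $t$ and remained undiagnosed; exactly two arrived and both remained undiagnosed; and so on. For a Poisson process with fixed parameter $\Lambda$, the probability of $\mathcal{E}$ can be expressed as:

$$\mathbb{P}\{\mathcal{E}\} = \sum_{k=0}^{\infty} \frac{\Lambda^k e^{-\Lambda}}{k!}\,(1 - p)^k = e^{-p\Lambda},$$

where $p$ is the probability of detection. With $\Lambda = \int_{t_0}^{t} I^*(s)\,L_i(s)/N(s)\,ds$, it follows that the modified PDF for the sensitivity analyses is:

$$f_i(t) = p\,\frac{I^*(t)\,L_i(t)}{N(t)} \exp\left(-p\int_{t_0}^{t} \frac{I^*(s)\,L_i(s)}{N(s)}\,ds\right).$$

This PDF was used instead of Eq. (4) to obtain maximum likelihood estimates of the growth rate and outbreak initiation date for the sensitivity analyses.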
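Below is a minimal discretized sketch of the first-arrival likelihood. It covers Eq. (4) and its thinned sensitivity-analysis variant, with detection probability `p_detect`; setting `p_detect = 1` recovers Eq. (4). Following the text, time is discretized into days and the integral is replaced by a sum over the days before the arrival day. Setting $I^*(t) = 0$ before $t_0$ reflects the integral starting at $t_0$. Several details are hypothetical: the function names, the array layout, the search bounds for $(r, t_0)$, and the synthetic inputs in the check below.

```python
import numpy as np
from scipy.optimize import differential_evolution

def first_arrival_log_likelihood(theta, arrival_days, travelers, population, p_detect=1.0):
    """Log-likelihood of first-arrival days (Eq. 4 when p_detect = 1).

    theta: (r, t0), growth rate per day and initiation time in days
    arrival_days: integer day index of the first arrival in each province, shape (P,)
    travelers: daily travelers from Wuhan to each province, shape (T, P)
    population: Wuhan population on each day, shape (T,)
    """
    r, t0 = theta
    days = np.arange(travelers.shape[0], dtype=float)
    infected = np.where(days >= t0, np.exp(r * (days - t0)), 0.0)  # I*(t), zero before t0
    rate = p_detect * infected[:, None] * travelers / population[:, None]  # p * I* L_i / N
    cumulative = np.cumsum(rate, axis=0) - rate  # sum of daily rates before each day
    cols = np.arange(travelers.shape[1])
    rate_at_arrival = rate[arrival_days, cols]
    cumulative_at_arrival = cumulative[arrival_days, cols]
    # Floor the rate so arrivals before t0 give a large but finite penalty.
    return float(np.sum(np.log(np.maximum(rate_at_arrival, 1e-300)) - cumulative_at_arrival))

def fit_first_arrival(arrival_days, travelers, population, p_detect=1.0, seed=0):
    """Maximize the log-likelihood over (r, t0) with differential evolution."""
    bounds = [(0.05, 0.6), (0.0, travelers.shape[0] - 1.0)]  # assumed search ranges

    def objective(theta):
        return -first_arrival_log_likelihood(theta, arrival_days, travelers, population, p_detect)

    result = differential_evolution(objective, bounds, seed=seed)
    return result.x
```

Here each province's daily rate is $p\,I^*(t)L_i(t)/N(t)$. The log-likelihood sums the log-rate on the arrival day minus the cumulative rate before it, across provinces.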

Results from sensitivity analyses
The following are the maximum likelihood estimates for the growth rate and date of outbreak initiation in the hypothetical situations mentioned above.When the probability of detection of a case was set to 50%, the estimated growth rate was 0.29/day, while the time of outbreak initiation was Dec 18, 2019.The same estimates were obtained if we assumed no case could be detected for individuals having arrived from Wuhan before Dec 31, 2019.When the probability of detection of a case was set to 10%, the estimated growth rate remained 0.29/day, but the estimated outbreak initiation date was Dec 12, 2019.

The 'case count' model: The SEIR-type hybrid stochastic model
Model 1 (the 'first-arrival' model) fitted the time of arrival of the first confirmed case in each province. Here we used a different approach and a different dataset to infer disease dynamics. In particular, we constructed a hybrid stochastic model that infers the disease dynamics from all confirmed cases outside Hubei. Measurements in Wuhan, Hubei, may have been biased early in the outbreak. Our aim is therefore to use data from outside Hubei to infer the growth rate $r$ and the onset time $t_0$. Time $t = 0$ is 0:00 on January 1, 2020, and $t_0$ is the time when the sum of the exposed and symptomatic populations in Wuhan is approximately 1. The model is hybrid in the sense that it couples two components. A deterministic exponential growth describes the outbreak in Wuhan. An agent-based model describes the discrete population dynamics of patients after they left Hubei for other provinces. A schematic diagram of the hybrid metapopulation model is presented in Supplementary Fig. 6.

Agent-based model for patients who have left Wuhan to other provinces
We assume that between 1/1 and 1/26 the populations in Wuhan are large. Their dynamics can therefore be reasonably approximated by the above deterministic, exponentially growing curves. However, the initial propagation of the disease to other provinces in China involves only a small population of individuals who left Hubei province. These individuals were either exposed ($E_O$, O for Others) or symptomatic. These patients pass through several phases in other provinces: from exposed ($E_O$) to symptomatic ($I_O$), then to hospitalized ($H_O$), and finally to confirmed by laboratory examination ($C_O$). The transition times are also variable (as quantified in Fig. 1C-F). Consequently, the resulting population dynamics in other provinces are highly stochastic. We thus adopt an agent-based modeling approach. We rely on the kinetic Monte Carlo sampling techniques detailed below to simulate the population dynamics in other provinces. With this approach, we generate samples of two things: (1) each individual patient who left Wuhan on a specific date, and (2) that individual's health status as time progresses (susceptible, exposed, or symptomatic). The goal is to accumulate a large number of Monte Carlo samples. From these we compute the key summary statistic, the average number of cases reported on each day between 1/18 and 1/26, for comparison with the data. We achieve this with the following algorithmic procedure.

1. Generate random numbers of infected individuals leaving Wuhan. We collected the migration index, which quantifies the fraction of the total population of Wuhan (14 million) that traveled to other provinces on each date $t_j$, $j = 1, \dots, 26$ (see Table S3). We assume that an individual's health state (susceptible, exposed, or symptomatic) is independent of the decision to leave for other provinces. Then, on each date $t_j$, the numbers of exposed and symptomatic individuals leaving Hubei follow two binomial distributions (sums of independent Bernoulli trials). These are $N_E(t_j) \sim \mathrm{Binomial}(E_W(t_j), p(t_j))$ and $N_I(t_j) \sim \mathrm{Binomial}(I_W(t_j), p(t_j))$, where $p(t_j)$ is the fraction of the Wuhan population traveling on date $t_j$. Here, $E_W(t)$ and $I_W(t)$ are the exposed and symptomatic populations in Wuhan. They are obtained by rounding the previously prescribed exponential growth, given model parameters $(r, t_0)$, to the nearest integers. To generate one stochastic sample path (realization), we draw binomially distributed numbers of individuals leaving Hubei on each day between 1/1 and 1/26 (both included). We then model the health state of each of these in-silico patients by the following procedure.

2. Generate the progression of the health state for each patient. We assume that each hypothetical patient generated by the above procedure progresses stochastically, identically and independently toward being confirmed ($C_O$) and reported in one of the other provinces. If an individual was exposed ($E_O$) when s/he left Hubei on date $t_i$, we draw a Gamma-distributed random time $\Delta t_{E\to I} \sim \Gamma(\alpha_1, \beta_1)$. We then update the individual's health state to symptomatic ($I_O$) at time $t_i + \Delta t_{E\to I}$. We chose a time-dependent waiting-time distribution for the progression from the symptomatic state $I_O$ to reflect the two regimes observed in the data (see main text). If $t_i + \Delta t_{E\to I}$ falls on or before 1/18, we draw $\Delta t_{I\to H} \sim \Gamma(\alpha_{2,1}, \beta_{2,1})$ to model the waiting time for an infected patient to be hospitalized. Otherwise, if it is later than 1/18, $\Delta t_{I\to H} \sim \Gamma(\alpha_{2,2}, \beta_{2,2})$.
Consequently, the patient's state changes to $H_O$ at time $t_H = t_i + \Delta t_{E\to I} + \Delta t_{I\to H}$. If $t_H$ is before 1/19, the patient waits in the $H_O$ state until 1/19, when the policy of case confirmation was announced and institutionalized. The confirmation process is then modeled by another Gamma-distributed random time $\Delta t_{H\to C} \sim \Gamma(\alpha_3, \beta_3)$. The patient is confirmed and reported at time $\max(t_H, 19) + \Delta t_{H\to C}$ (in days of January), and we add one case report on the next integer day. A similar procedure applies to a patient who had already progressed to the $I_O$ state before leaving Hubei on date $t_i$, except that the first random waiting time is omitted. For that patient, the hospitalization time is $t_H = t_i + \Delta t_{I\to H}$. We repeat the procedure for each in silico patient who left Wuhan.
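The two steps above can be sketched as a single Monte Carlo routine. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' code. The gamma (shape, scale) values, the callables `E_W`, `I_W` and `p`, and the function names are hypothetical placeholders. The fitted waiting-time distributions (Fig. 1C-F) and the migration index (Table S3) are not reproduced here. For travelers who are already symptomatic, the departure date is used as the reference time for choosing the symptomatic-to-hospitalized regime, which is our reading of the procedure.

```python
import numpy as np

# Hypothetical gamma (shape, scale) parameters; the fitted values are in Fig. 1C-F.
GAMMA_E_TO_I = (4.0, 1.3)        # exposed -> symptomatic
GAMMA_I_TO_H_EARLY = (2.0, 2.5)  # symptomatic -> hospitalized, onset on/before 1/18
GAMMA_I_TO_H_LATE = (2.0, 1.0)   # symptomatic -> hospitalized, onset after 1/18
GAMMA_H_TO_C = (2.0, 1.5)        # hospitalized -> confirmed
CONFIRMATION_START = 19.0        # confirmation policy starts on 1/19
REPORT_DAYS = np.arange(18, 27)  # 1/18 ... 1/26, the fitted window


def simulate_reports(E_W, I_W, p, rng):
    """One sample path of daily new confirmed cases outside Hubei on days 18..26.

    E_W, I_W: hypothetical callables giving exposed / symptomatic populations
    in Wuhan on day t; p: hypothetical callable giving the fraction leaving on day t.
    """
    report_days = []
    for t in range(1, 27):  # departures on 1/1 ... 1/26, both included
        # Step 1: each infected resident leaves independently with probability p(t).
        n_exp = rng.binomial(int(round(E_W(t))), p(t))
        n_sym = rng.binomial(int(round(I_W(t))), p(t))
        # Step 2: exposed travelers wait an E->I period; symptomatic ones skip it.
        onset = np.concatenate([t + rng.gamma(*GAMMA_E_TO_I, size=n_exp),
                                np.full(n_sym, float(t))])
        # Time-dependent I->H delay: "early" regime if onset is on or before 1/18.
        early = rng.gamma(*GAMMA_I_TO_H_EARLY, size=onset.size)
        late = rng.gamma(*GAMMA_I_TO_H_LATE, size=onset.size)
        hospitalized = onset + np.where(onset <= 18.0, early, late)
        # Patients hospitalized before 1/19 wait in H until confirmation starts.
        confirmed = (np.maximum(hospitalized, CONFIRMATION_START)
                     + rng.gamma(*GAMMA_H_TO_C, size=onset.size))
        # Each case is reported on the next integer day (date in January).
        report_days.append(np.ceil(confirmed).astype(int))
    counts = np.bincount(np.concatenate(report_days), minlength=27)
    return counts[REPORT_DAYS]


def mean_reports(E_W, I_W, p, n_samples=2**13, seed=0):
    """Monte Carlo mean of daily new confirmed cases over n_samples sample paths."""
    rng = np.random.default_rng(seed)
    paths = [simulate_reports(E_W, I_W, p, rng) for _ in range(n_samples)]
    return np.mean(paths, axis=0)
```

NumPy's `Generator.gamma` takes a shape and a scale. A gamma distribution specified by a rate would therefore need `scale = 1 / rate`. Vectorizing per departure day keeps the 8192-sample average tractable in pure NumPy.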
Parameter estimation and uncertainty quantification of $(r, t_0)$
Our task is to infer the unknown parameters from the numbers of confirmed cases reported between 1/18 and 1/26. These parameters are the exponential growth rate $r$ and the exponential growth onset time $t_0$. This is possible because $(r, t_0)$ determine the deterministic growth of the exposed population $E_W(t)$ and the symptomatic population $I_W(t)$. These populations in turn shape the random numbers of individuals who leave Hubei on each date. Those individuals follow statistically quantified processes until their final confirmation outside of Hubei, and the resulting counts can be compared against the reported data.
We devise an error measure to assess the quality of fit of the model for a given parameter set $(r, t_0)$ as follows. For each parameter set, we generate $M = 2^{13} = 8192$ Monte Carlo samples. On each date $t_i$, the $j$th sample reports a random number $C_O(t_i \mid r, t_0, j)$ of newly confirmed cases. We average over all samples to obtain the mean number of newly confirmed cases on date $t_i$,
$$\bar{C}_O(t_i \mid r, t_0) := \frac{1}{M} \sum_{j=1}^{M} C_O(t_i \mid r, t_0, j),$$
and compare it to the actual data $C^{\mathrm{data}}_O(t_i)$. We quantify the quality of the fit by the sum of the squared residuals,
$$\mathrm{SSR}(r, t_0) = \sum_{i=1}^{N} \left[\bar{C}_O(t_i \mid r, t_0) - C^{\mathrm{data}}_O(t_i)\right]^2 .$$
A 100 × 100 grid-based parameter scan over the region $0.22 < r < 0.42$ and $-20 \le t_0 \le -5$ is performed to identify the best-fit parameters,
$$(r^*, t_0^*) = \arg\min_{(r, t_0)} \mathrm{SSR}(r, t_0).$$
For uncertainty quantification, we formulate the logarithm of the likelihood $\mathcal{L}$ of a parameter set $(r, t_0)$ as
$$\ln \mathcal{L}(r, t_0) = -\frac{N}{2}\ln\left(2\pi\hat{\sigma}^2\right) - \frac{\mathrm{SSR}(r, t_0)}{2\hat{\sigma}^2}, \qquad \hat{\sigma}^2 = \frac{\mathrm{SSR}(r^*, t_0^*)}{N}.$$
Here, $N = 9$ is the number of data points used to fit the model. This likelihood rests on two assumptions. First, the data (the number of newly reported cases on date $t_i$) are normally distributed with a mean equal to the Monte Carlo mean number of newly reported cases in our model. Second, the noise is independently and identically distributed, with a variance equal to the mean squared residual of the best-fit model.
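The fitting criterion and likelihood above can be sketched on a grid. This is a minimal sketch in which `model(r, t0)` is a hypothetical callable returning the Monte Carlo mean daily counts (one value per data point). For a quick check, any cheap stand-in model can be used. The function names are illustrative, and the sketch's `np.linspace` grid includes the endpoints of the scanned ranges.

```python
import numpy as np


def ssr(model_mean, data):
    """Sum of squared residuals between model means and reported daily counts."""
    residuals = np.asarray(model_mean, float) - np.asarray(data, float)
    return float(np.sum(residuals ** 2))


def grid_scan(model, data, r_range=(0.22, 0.42), t0_range=(-20.0, -5.0), size=100):
    """Evaluate SSR on a size x size grid of (r, t0) and return the best-fit pair.

    `model(r, t0)` is a hypothetical callable returning one mean count per data point.
    """
    rs = np.linspace(*r_range, size)
    t0s = np.linspace(*t0_range, size)
    S = np.array([[ssr(model(r, t0), data) for t0 in t0s] for r in rs])
    i, j = np.unravel_index(np.argmin(S), S.shape)
    return rs, t0s, S, (rs[i], t0s[j])


def log_likelihood(S, n_data):
    """Gaussian log-likelihood on the grid, with the noise variance fixed at the
    best fit's mean squared residual, SSR(r*, t0*) / N."""
    sigma2 = S.min() / n_data
    return -0.5 * n_data * np.log(2 * np.pi * sigma2) - S / (2 * sigma2)
```

Because $\hat{\sigma}^2$ is fixed, $\ln\mathcal{L}$ is a decreasing function of SSR. The maximum likelihood point therefore coincides with the least-squares best fit. The likelihood adds the scale needed to turn the SSR surface into a probability distribution.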
Because the likelihood is narrowly distributed within the scanned region, we can numerically compute the marginalized posterior. This posterior is reported in Fig. 2D-F and is used to calculate the bounds of the centered 95% probability mass, which serve as the confidence interval of the growth rate $r$.
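The marginalization step can be sketched as follows. This sketch assumes a flat prior over the scanned grid, an assumption of the sketch rather than something stated above. It marginalizes out $t_0$ by summing the normalized likelihood over the grid columns, then reads off the central 95% interval of $r$ from the cumulative sum. `marginal_interval` is a hypothetical name.

```python
import numpy as np


def marginal_interval(rs, log_lik, mass=0.95):
    """Marginal posterior of r on the grid and its central `mass` interval.

    log_lik has shape (len(rs), len(t0s)); a flat prior over the grid is assumed.
    """
    w = np.exp(log_lik - log_lik.max())  # subtract the max to avoid underflow
    marginal = w.sum(axis=1)             # integrate out t0 (sum over grid columns)
    marginal /= marginal.sum()
    cdf = np.cumsum(marginal)
    tail = (1.0 - mass) / 2.0
    # First grid point whose cumulative mass reaches each tail; clip for round-off.
    lo_idx = min(int(np.searchsorted(cdf, tail)), len(rs) - 1)
    hi_idx = min(int(np.searchsorted(cdf, 1.0 - tail)), len(rs) - 1)
    return marginal, rs[lo_idx], rs[hi_idx]
```

The interval resolution is limited by the grid spacing, which is about 0.002 for 100 points over $0.22 < r < 0.42$.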

Calculation of R0 from exponential growth rate
Assuming gamma distributions for the latent and infectious periods, Wearing et al. (4) have shown that R0 can be calculated from the estimated exponential growth rate, $r$, of an outbreak as
$$R_0 = \frac{r\left(\dfrac{r}{m\sigma} + 1\right)^{m}}{\gamma\left[1 - \left(\dfrac{r}{n\gamma} + 1\right)^{-n}\right]},$$
where $1/\sigma$ and $1/\gamma$ are the mean latent and infectious periods, respectively, and $m$ and $n$ are the shape parameters of the gamma distributions for the latent and infectious periods, respectively.
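The conversion from growth rate to R0 can be checked numerically with a short function. The period lengths and shape parameters in the usage line are hypothetical placeholders, not the values used in this study. The growth rate comes from the 2.4-day doubling time quoted in the abstract ($r = \ln 2 / 2.4$). As a sanity check, with $m = n = 1$ the formula reduces to the exponentially distributed SEIR result $(1 + r/\sigma)(1 + r/\gamma)$.

```python
import math


def r0_from_growth_rate(r, latent_mean, infectious_mean, m, n):
    """R0 from exponential growth rate r with gamma-distributed latent and
    infectious periods (means in days; m, n are the gamma shape parameters).
    Requires r > 0."""
    sigma = 1.0 / latent_mean        # 1/sigma = mean latent period
    gamma = 1.0 / infectious_mean    # 1/gamma = mean infectious period
    numerator = r * (r / (m * sigma) + 1.0) ** m
    denominator = gamma * (1.0 - (r / (n * gamma) + 1.0) ** (-n))
    return numerator / denominator


# Usage with hypothetical latent/infectious settings and r from a 2.4-day doubling time.
r = math.log(2) / 2.4
print(r0_from_growth_rate(r, latent_mean=4.0, infectious_mean=5.0, m=4, n=2))
```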

Calculation of the impact of intervention strategies
Using a susceptible-exposed (noninfectious)-infectious-recovered (SEIR) compartmental model, Lipsitch et al. (5) evaluated the impact of two measures. The first is quarantine of symptomatic cases to prevent further transmission. The second is quarantine and close observation of asymptomatic contacts of cases, so that they can be isolated when they show possible signs of the disease. Assuming that only symptomatic individuals transmit the pathogen, they showed that the reproductive number after the intervention, $R_q$, can be expressed as
$$R_q = R\left[(1 - q) + q\,\frac{D_q}{D}\right],$$
where $R$ is the reproductive number before intervention, $q$ is the fraction of infected individuals being quarantined, and $D_q$ and $D$ are the mean durations of the infectious period with and without intervention, respectively.
We adopted this formulation in our model, but we assumed that a fraction $f$ of infected individuals are asymptomatic and can transmit. In this case, quarantine of symptomatic individuals only reduces the contribution of these individuals to the reproductive number. Thus, we calculate the reproductive number under quarantine, $R_q$, as
$$R_q = f R + (1 - f)\,R\left[(1 - q) + q\,\frac{D_q}{D}\right].$$
We also considered another form of control: population-level measures that reduce the overall number of daily contacts in the population by a fraction $c$. These measures include shutting down transportation systems, closing workplaces and/or schools, etc. Since $R$ depends linearly on the number of daily contacts, we calculate the combined impact of individual-based quarantine and population-level control as
$$R_{q,c} = (1 - c)\,R_q .$$
In our calculations, we assumed that the mean infectious period of 2019-nCoV is 5 days, i.e., $D = 5$ days, and that $D_q = 2$ days. We set $R$ to the maximum likelihood estimate of $R_0$. We then calculated the impact of the two types of interventions.
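The intervention formulas above can be sketched directly. Solving $R_{q,c} = 1$ for $c$ gives $c = 1 - 1/R_q$, the minimum population-level reduction needed for a given quarantine level $q$. The threshold lines in Fig. 3B are drawn from the same kind of relation. The function names and the asymptomatic fraction in the usage loop are hypothetical; $D = 5$ and $D_q = 2$ days follow the text.

```python
def r_quarantine(R, q, f, D=5.0, D_q=2.0):
    """R_q: the asymptomatic fraction f transmits fully; a fraction q of
    symptomatic cases is quarantined after D_q of their D infectious days."""
    return f * R + (1.0 - f) * R * ((1.0 - q) + q * D_q / D)


def r_combined(R, q, c, f, D=5.0, D_q=2.0):
    """R_{q,c}: population-level measures cut daily contacts by a fraction c."""
    return (1.0 - c) * r_quarantine(R, q, f, D, D_q)


def min_contact_reduction(R, q, f, D=5.0, D_q=2.0):
    """Smallest c with R_{q,c} <= 1 (0 if quarantine alone is enough)."""
    return max(0.0, 1.0 - 1.0 / r_quarantine(R, q, f, D, D_q))


# Usage: threshold contact reduction for R = 4.7 and a hypothetical f = 0.2.
for q in (0.0, 0.5, 1.0):
    print(q, round(min_contact_reduction(4.7, q, f=0.2), 3))
```

Even with perfect quarantine ($q = 1$) and no asymptomatic transmission, $R_q = 0.4R$ under these settings. Quarantine alone therefore cannot bring $R$ below 1 once $R > 2.5$.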

Fitting the number of new cases in and out of Hubei
To infer the growth rate of the number of new cases, we used linear regression on the log-transformed case counts, with the day in January 2020 as the independent variable. For this analysis, we excluded case counts below 10, because infection dynamics at such low counts may be dominated by stochasticity. For cases inside Hubei, we used the numbers of cases reported between Jan. 16 and Feb. 4. For cases outside of Hubei, we used the numbers reported between Jan. 20 and Feb. 4. To assess whether the growth rate outside of Hubei changed after Jan. 25, we evaluated the significance of the interaction term between the variable day and an indicator variable for dates on or after Jan. 25. The results are presented in Fig. 3C. All regressions and confidence interval estimates were computed in R.
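The regression with an interaction term can be sketched in NumPy. The analysis itself was done in R; this Python version is only an illustrative equivalent. The design matrix holds an intercept, the day, an indicator for day ≥ 25, and their product. The slope before the breakpoint is the coefficient on day, and the slope afterwards adds the interaction coefficient. `fit_log_linear` and its parameters are hypothetical names, and significance testing is omitted.

```python
import numpy as np


def fit_log_linear(days, counts, breakpoint=25, min_count=10):
    """OLS of log(count) on day with a day x indicator(day >= breakpoint) term.

    Returns the growth rates (slopes of log counts per day) before and from
    the breakpoint on. Counts below min_count are dropped before fitting.
    """
    days = np.asarray(days, dtype=float)
    counts = np.asarray(counts, dtype=float)
    keep = counts >= min_count  # drop low counts dominated by stochasticity
    d, y = days[keep], np.log(counts[keep])
    after = (d >= breakpoint).astype(float)
    # Columns: intercept, day, indicator, day x indicator (the interaction term).
    X = np.column_stack([np.ones_like(d), d, after, d * after])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[1] + beta[3]
```

In R, a comparable fit would be something like `lm(log(count) ~ day * after)`. Its `summary()` reports the t-test for the `day:after` interaction coefficient.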

Fig. 1. Epidemiological characteristics of the early dynamics of the 2019-nCoV outbreak in China. (A-B) Daily new and cumulative confirmed cases in Hubei province (A) and in provinces other than Hubei (B). (C-F) Distributions of key epidemiological parameters. These include the durations from infection to symptom onset (C), from symptom onset to hospitalization (D), from hospitalization to discharge (E), and from hospitalization to death (F). Filled circles and bars on the x-axes denote the estimated means and 95% confidence intervals.

Fig. 2. Two different approaches using high-resolution travel data reached consistent estimates of the exponential growth rate and the date of exponential growth initiation of the 2019-nCoV outbreak. (A) A modified snapshot of the Baidu® Migration online server interface showing the migration pattern out of Wuhan (red dot) on January 19, 2020. The thickness of the curved white lines denotes the size of the traveler population to each province. The names of most provinces are shown in white. (B) Estimated daily population sizes of travelers from Wuhan, Hubei province, to other provinces. (C) A schematic illustrating the export of infected individuals from Wuhan. Travelers (dots) are assumed to be random samples from the total population (the whole pie). The infected population (orange pie) grows while the total population in Wuhan shrinks over time. It therefore becomes increasingly likely that infected individuals travel to other provinces (orange dots). (D) The dates of documented first arrivals of infected cases in 26 provinces. Names of provinces are shown vertically. (E) Best fit of the 'case count' model to daily counts of new cases (including only imported cases) in provinces other than Hubei. Error bars show the standard deviations of the sample distribution. (F and G) The marginalized likelihoods of the growth rate $r$ (F) and the exponential growth initiation time (G) are consistent between the 'first arrival' model and the 'case count' model.

Fig. 3. Estimation of the basic reproductive number, $R_0$, and the impact of control measures. (A) Histograms and means (stars) of estimated $R_0$, assuming individuals become infectious at symptom onset (blue) or 2 days before symptom onset (orange). The dotted line denotes $R_0 = 1$. (B) The minimum levels of effort (lines) for intervention strategies needed to control the spread of the virus. The strategies considered are quarantine of symptomatic individuals and of individuals who had contact with them (x-axis), and population-level efforts to reduce overall contact rates (y-axis). Different colored lines denote different assumptions about the fraction of asymptomatic individuals in the infected population. Solid and dashed lines correspond to $R_0 = 4.7$ and $6.3$ (i.e., the estimated means of $R_0$), respectively. (C) The cumulative number of cases outside of Hubei province in late January 2020. The growth rate decreased to 0.14 per day from January 30 onward. The dashed black line marks January 23, when Wuhan was locked down.

Fig. S6. Schematic diagram of the proposed meta-population model, a hybrid stochastic model. The model is a variant of the SEIR model with two geographic compartments, Wuhan (subscripted $W$) and other provinces (subscripted $O$). In Wuhan, a susceptible individual in compartment $S_W$ is first exposed and moves to the exposed state ($E_W$). The individual then becomes infected ($I_W$), is hospitalized ($H_W$), becomes a confirmed case ($C_W$), and either recovers ($R_W$) or dies ($D_W$). A portion of the ill population ($E_W$ and $I_W$) moved to other provinces and followed a similar progression. Because these populations are small, their dynamics are stochastic. We therefore adopt an agent-based approach to simulate the disease dynamics ($E_O(t)$, $I_O(t)$, $H_O(t)$ and $C_O(t)$) in other provinces. The daily case reports in other provinces were compared against the model output, $C_O(t)$, to constrain the unknown initial onset and growth rate in Wuhan.

Figure S1. The duration from symptom onset to hospitalization decreases over time during the outbreak.

Figure S2. Predictions of the 'first arrival' model using best-fit parameters agree well with the data. Curves show probability densities of the times of first arrival of infected cases in each province, based on our maximum likelihood estimate. Lines show the documented times of first arrival of infected individuals in our case report dataset.

Figure S3. Projections of the number of infected individuals in Wuhan between January 1 and 30, 2020, using the likelihood profile of parameter values in the 'first arrival' approach. Projections after the lockdown of Wuhan on January 23 are hypothetical scenarios assuming no control measures were implemented.

Figure S4. Log-likelihood profiles of the estimated exponential growth rate of the outbreak, $r$ (x-axis), and the date of exponential growth initiation (y-axis) from the 'first arrival' model (A) and the 'case count' model (B).

Figure S5. Histogram of the basic reproductive number, $R_0$, using the 'case count' model, assuming individuals become infectious at symptom onset (blue) or 2 days before symptom onset (orange). The mean estimates are $R_0 = 6.6$ (blue star), with a CI between 4.0 and 10.5, and $R_0 = 4.9$ (orange star), with a CI between 3.3 and 7.2. The dashed line denotes $R_0 = 1$.