Use of Genomics to Track Coronavirus Disease Outbreaks, New Zealand

Real-time genomic sequencing has played a major role in tracking the global spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), contributing greatly to disease mitigation strategies. In August 2020, after having eliminated the virus, New Zealand experienced a second outbreak. During that outbreak, New Zealand used genomic sequencing in a primary role, leading to a second elimination of the virus. We generated genomes from 78% of the laboratory-confirmed samples of SARS-CoV-2 from the second outbreak and compared them with the available global genomic data. Genomic sequencing rapidly identified that the virus causing the second outbreak in New Zealand belonged to a single cluster, thus resulting from a single introduction. However, successful identification of the origin of this outbreak was impeded by substantial biases and gaps in global sequencing data. Access to a broader and more heterogeneous sample of global genomic data would strengthen efforts to locate the source of any new outbreaks.

countries and over time (Figure 1). For example, the COVID-19 Genomics UK Consortium (https://www.cogconsortium.uk) has led to the United Kingdom being the most represented sampling location, totaling ≈180,000 genomes and comprising 44% of the global dataset despite recording only ≈4% of the world's positive cases (n = 3,669,658). Conversely, SARS-CoV-2 genomes sequenced in India represent just 1% of the global dataset but 11% of the world's total reported cases (n = 10,677,710).
Such disparate sequencing efforts can have major implications for data interpretation and must be carefully considered. Real-time sequencing of SARS-CoV-2 genomes has, however, been particularly useful for tracking the re-emergence of the virus in New Zealand. By June 2020, New Zealand had effectively eliminated COVID-19 in the community and positive cases were limited to those linked to managed quarantine facilities at the border (7,15; J. Douglas et al., unpub. data, https://www.medrxiv.org/content/10.1101/2020.08.04.20168518v1). After ≈100 days with no detected community transmission of COVID-19, on August 11, 2020, four new cases emerged with no apparent epidemiologic link to any known case. We used genomic sequencing of SARS-CoV-2 cases to investigate the probable origins of this outbreak, generating genomes for 78% (n = 140) of the 179 laboratory-confirmed samples from this outbreak.
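The sampling imbalance described above can be made concrete with a small calculation. The following Python sketch is offered as an illustration only and is not part of the study's analysis. It divides each country's share of the global genome dataset by its share of reported cases, using only the approximate percentages quoted in the text, and also computes the proportion of outbreak samples that yielded genomes. The function and variable names are hypothetical.

```python
def representation_ratio(genome_share_pct, case_share_pct):
    """Ratio of a country's share of global genomes to its share of global cases.

    A value above 1 means the country is over-represented in the genomic
    dataset relative to its case burden; below 1 means under-represented.
    """
    if case_share_pct <= 0:
        raise ValueError("case share must be positive")
    return genome_share_pct / case_share_pct


# Approximate percentages quoted in the text: (genome share, case share).
shares = {
    "United Kingdom": (44.0, 4.0),
    "India": (1.0, 11.0),
}

ratios = {
    country: representation_ratio(genome_pct, case_pct)
    for country, (genome_pct, case_pct) in shares.items()
}

# Proportion of laboratory-confirmed outbreak samples that yielded genomes.
outbreak_coverage = 140 / 179

if __name__ == "__main__":
    for country, ratio in ratios.items():
        print(f"{country}: {ratio:.2f}x representation")
    print(f"Outbreak genome coverage: {outbreak_coverage:.0%}")
```

Under these rounded figures, the United Kingdom is over-represented roughly 11-fold relative to its share of cases, whereas India is under-represented by about the same factor.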
We obtained nasopharyngeal samples positive for SARS-CoV-2 by real-time reverse transcription PCR (rRT-PCR) from public health medical diagnostics laboratories located throughout New Zealand. All samples had been de-identified before receipt. Under contract for the New Zealand Ministry of Health, the Institute of Environmental Science and Research (ESR) has approval to conduct genomic sequencing for surveillance of notifiable diseases.

Genomic Sequencing
Of 179 laboratory-confirmed samples of SARS-CoV-2 from the August 2020 outbreak in New Zealand, 172 were received by ESR for whole-genome sequencing. Genome sequencing of SARS-CoV-2 samples was performed as described previously (7). In brief, viral extracts were prepared from respiratory tract samples in which SARS-CoV-2 was detected by rRT-PCR by using World Health Organization-recommended primers and probes targeting the envelope and nucleocapsid genes. Extracted RNA from SARS-CoV-2-positive samples was subjected to whole-genome sequencing by following the ARTIC network protocol version 3 (https://www.protocols.io/view/ncov-2019-sequencing-protocol-v3-locost-bh42j8ye) and using the Massey University 1200-bp primer set (https://www.protocols.io/view/ncov-2019-sequencing-protocol-rapid-barcoding-1200-bh7hj9j6) (16).
We used 1 of the tiling amplicon designs to amplify viral cDNA prepared with SuperScript IV (Thermo-Fisher Scientific, https://www.thermofisher.com). Sequence libraries were then constructed by using Oxford Nanopore Ligation Sequencing and Native  (21).

Results
We generated virus genomes in real time for 78% of cases in this cluster, from August 11 through September 14, 2020, when the last case in this outbreak was reported; the maximum distance among the genomes was 5 single-nucleotide polymorphisms. When we compared the genomes from patients in the August 2020 New Zealand outbreak with sequenced genomes from patients affected by the first COVID-19 wave in New Zealand and those in quarantine facilities, we found no link. Most available sequence data from case-patients in New Zealand quarantine facilities indicated virus lineages different from those of the August 2020 outbreak. However, this observation was of limited value given that only 42% of case-patients in those quarantine facilities had adequate viral RNA for successful genomic sequencing. To determine the likely origins of this outbreak, we compared genomes from the new community outbreak to the global dataset. An initial genomic sequence analysis found that the reemergence of COVID-19 in New Zealand was caused by a SARS-CoV-2 virus from the (now ancestral) lineage B.1.1.1 of the pangolin nomenclature (17). Of the countries that have contributed SARS-CoV-2 data, 30% had genomes of this lineage. Remarkably, 80% of B.1.1.1 genomes were from the United Kingdom and were generated during March 2020-January 2021; however, most samples were collected during the first wave of disease in the United Kingdom (Figure 2). Phylogenetic analysis of the most recently sampled B.1.1.1 genomes identified genomes from South Africa, England, and Switzerland in August as the most likely to be contained within the sister clade (Figure 2); these genomes were the closest sampled genomic relatives of the viruses associated with the August 2020 outbreak in New Zealand (Appendix 2, https://wwwnc.cdc.gov/EID/article/27/5/20-4579-App2.pdf).
Because of the dynamic nature of the pangolin lineage nomenclature, genomes sampled from the August 2020 outbreak in New Zealand are now distinctly classified as lineage C.12, which is now extinct.
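The maximum genome distance reported above is a pairwise count of single-nucleotide differences. As a minimal sketch of how such a figure can be computed, assuming the genomes are already aligned to the same length, the Python example below counts differing positions between each pair of sequences and takes the largest value. It is not the pipeline used in the study: the sequences are toy fragments, and the function names and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
from itertools import combinations

# Characters treated as uninformative (assumption for this sketch).
AMBIGUOUS = set("N-?")


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ, ignoring
    positions where either sequence has a gap or an ambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )


def max_pairwise_distance(genomes):
    """Largest SNP distance over all pairs in a dict of name -> sequence."""
    return max(
        (snp_distance(a, b) for a, b in combinations(genomes.values(), 2)),
        default=0,
    )


# Toy aligned fragments (hypothetical, not real SARS-CoV-2 data).
toy_genomes = {
    "case_1": "ACGTACGTAC",
    "case_2": "ACGTACGTAT",
    "case_3": "ACGAACNTAT",
}

if __name__ == "__main__":
    print(max_pairwise_distance(toy_genomes))
```

A small maximum distance across all sampled genomes, as reported for this outbreak, is what would be expected if every case descends from a single recent introduction.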
Additional Bayesian analysis estimated that the outbreak originated 10 days before the first transmission event (95% highest posterior density interval, 0-25 days). We also estimated that the first transmission event in the outbreak occurred during July 22-August 13, 2020 (95% highest posterior density interval; mean date, August 2). Epidemiologic data showed that 2 confirmed case-patients linked to the outbreak had a symptom onset date of July 31, although the most probable sampled genomes within the sister clade were sampled later, August 6-28. Hence, it is unlikely that the currently available global genomic dataset contains the source of this outbreak.
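The timing argument above rests on simple date comparisons. The Python sketch below restates it using only the dates quoted in the text; it is a simplified consistency check, not a reproduction of the Bayesian analysis, and the variable names are hypothetical.

```python
from datetime import date

# Dates quoted in the text (all in 2020).
earliest_symptom_onset = date(2020, 7, 31)
sister_clade_sampling = (date(2020, 8, 6), date(2020, 8, 28))
first_transmission_interval = (date(2020, 7, 22), date(2020, 8, 13))

# Days between the earliest known symptom onset in New Zealand and the
# earliest sampling date among the closest sampled relatives.
gap_days = (sister_clade_sampling[0] - earliest_symptom_onset).days

# Were any of the closest sampled relatives collected on or before the
# first known symptom onset in the outbreak?
sampled_before_onset = sister_clade_sampling[0] <= earliest_symptom_onset

# Does the earliest symptom onset fall inside the estimated interval
# for the first transmission event?
onset_within_interval = (
    first_transmission_interval[0]
    <= earliest_symptom_onset
    <= first_transmission_interval[1]
)

if __name__ == "__main__":
    print(f"Gap between onset and earliest sister-clade sample: {gap_days} days")
    print(f"Sister clade sampled before onset: {sampled_before_onset}")
    print(f"Onset within estimated transmission interval: {onset_within_interval}")
```

Because the closest sampled relatives were all collected after symptoms had already appeared in New Zealand, they cannot themselves be the sampled source of the outbreak, which is consistent with the conclusion drawn in the text.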

Discussion
Genomic epidemiologic analysis of the possible origins of the COVID-19 re-emergence in New Zealand in August 2020 was inconclusive, probably because of missing genomic data within the quarantine border facilities and in the global dataset. A glimpse into the genomic diversity probably omitted from the global dataset can be seen in the genomes sequenced in New Zealand from SARS-CoV-2-positive quarantined case-patients, comprising citizens and residents returning from across the globe. For example, 12 SARS-CoV-2 genomes from persons returning to New Zealand from India who arrived on the same flight fell across at least 4 genomic lineages and showed sequence divergence of up to 34 single-nucleotide polymorphisms (https://www.nextstrain.org). This divergence represented far more genomic mutations than were observed in New Zealand during the first outbreak in March-May 2020 (7). Such a high level of diversity in just a small sample of SARS-CoV-2-positive case-patients from India suggests that the currently available genomic data fail to encompass the true diversity that existed locally, let alone globally.
The genome sequences identified after the reemergence of SARS-CoV-2 in New Zealand in August 2020 exemplified one of the most complete genomic datasets for a specific outbreak compiled to date, comprising 78% of positive case-patients (140 of 179 total case-patients SARS-CoV-2 positive by PCR). Real-time genomic sequencing quickly informed track-and-trace efforts to control the outbreak, setting New Zealand on track to eliminate the virus from the community for the second time. The rapid genome sequencing of positive samples provided confidence to public health teams regarding links to the outbreak and identified that cases and subclusters were linked to a single genomic lineage, resulting from a single introduction event. Indeed, the timing and length of lockdown measures were partly informed on the basis of these data. Overall, real-time viral genomics has played a pivotal role in eliminating COVID-19 from New Zealand and has since helped prevent additional regional lockdowns, leading to substantial economic savings. Nevertheless, the biased nature of global sampling, including the contribution of very few genome sequences from certain geographic locations, clearly limited the power of genomics to attribute the geographic origin of the August 2020 outbreak in New Zealand. We therefore advocate that potential sampling biases and gaps in available genomic data be carefully considered whenever attempting to determine the geographic origins of a specific SARS-CoV-2 outbreak. Analyses should consider all available evidence, including that from genomic and epidemiologic sources.