Predictors of Test Positivity, Mortality, and Seropositivity during the Early Coronavirus Disease Epidemic, Orange County, California, USA

We conducted a detailed analysis of coronavirus disease in a large population center in southern California, USA (Orange County, population 3.2 million), to determine heterogeneity in risks for infection, test positivity, and death. We used a combination of datasets, including a population-representative seroprevalence survey, to assess the actual burden of disease and testing intensity, test positivity, and mortality. In the first month of the local epidemic (March 2020), case incidence clustered in high-income areas. This pattern quickly shifted, and cases then clustered at much higher rates in the north-central area of the county, which has a lower socioeconomic status. Beginning in April 2020, a concentration of reported cases, test positivity, testing intensity, and seropositivity in a north-central area persisted. At the individual level, several factors (e.g., age, race or ethnicity, and ZIP codes with low educational attainment) strongly affected risk for seropositivity and death.

At this time, the mandated social distancing measures had exceptions in place for persons working in essential jobs, which were broadly defined and included medical professionals, food providers, delivery agencies, public officials, construction contractors, and building laborers (19). The social and economic characteristics of persons working essential jobs differ from those of the overall population (20).
Almost half of OC residents >5 years of age speak a language other than English at home. In addition, many within the Hispanic/Latinx and Asian communities of OC live below the poverty level (17.9% and 12.0%, respectively) and face challenges in education, household income, access to healthcare, health disparities, and life expectancy (21,22). The relatively small land area, high population density, and diverse population of OC provide a unique opportunity to explore potentially important social, economic, and demographic correlates of COVID-19 epidemiology.
Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 27
We conducted a detailed spatiotemporal epidemiologic analysis of COVID-19 in OC during March 1-August 16, 2020. We drew from reported tests and mortality data from the county health agency. Given that passively detected cases are prone toward bias, in July 2020 we also conducted a seroprevalence
pdf), which is intended to be representative of the age, income, and racial and ethnic diversity of OC. We recruited 1 participant per household (by email or phone) to participate in a survey on their thoughts and opinions regarding COVID-19. The survey included questions on sociodemographics, occupation, social activities, any illness or symptoms in the past few months, and whether the person had been diagnosed with COVID-19. After completing this portion of the survey, each eligible participant was asked if they would be willing to participate in a drive-through blood test for SARS-CoV-2 antibodies. Eligibility for antibody testing was restricted to a quota sample designed to be demographically representative of the county as a whole. Recruitment to the antibody test was delayed to the end of the questionnaire to avoid biasing the serologic survey toward persons who believed that they were infected with SARS-CoV-2. A total of 10 field sites for drive-through blood tests were dispersed throughout OC to minimize driving distances for participants. This cross-sectional survey was conducted July 10-August 16, 2020. The seroprevalence study design and overall findings for OC have been described previously (23).

Serologic Test Data
We used a coronavirus antigen microarray to classify participants from the serologic survey as seropositive or seronegative.
age with at least a bachelor's degree, the percentage of adults who had health insurance in the previous 5 years, the number of persons per square kilometer, and the percentage of houses with >1 person per room. These data came from the 2018 American Community Survey (21).

Analysis

Descriptive Spatiotemporal Data Analysis
We aggregated reported cases and number of tests at the ZIP code level and by week. We included 86 ZIP codes in the analysis. For plotting cases on OC maps, we further aggregated the data by month (March-August). We calculated and mapped case incidence as positive cases per 100,000 population per week, testing intensity as total number of tests per 100,000 persons per week, and test positivity as the percentage of positive tests for each month. We conducted formal testing of spatial autocorrelation by using the global Moran's I statistic and spatial correlograms. We then used local clustering statistics (local indicators of spatial autocorrelation [LISA] [24]) to visualize the location of clusters. We ran all tests for case incidence, test positivity, and test intensity. We also used LISA statistics to map and assess seropositivity.

Risk for Death, and Seropositivity
We used logistic regressions to explore geographic, demographic, economic, and epidemiologic predictors of the odds of testing positive for COVID-19, of dying from COVID-19, and of being seropositive for SARS-CoV-2 antibodies. Predictors in our models were age group, sex, and race or ethnicity at the individual level (Appendix Table 1). ZIP code-level predictors were median household income, the percentage of adults >25 years of age with at least a bachelor's degree, the percentage of adults who had health insurance in the previous 5 years, population density (persons/km²), and house crowding (the percentage of houses with >1 person per room).
We tested several specifications of the models. Through preliminary exploratory analyses, we noted that the first cases were reported from coastal ZIP codes but that this pattern had shifted inland over time. The best fitting model included a smoothed interaction term for time, coded by day (Appendix Table 1), and median household income at the ZIP code level.
We included the same predictors in the model for risk for death, except for the interaction between time and median household income, which did not improve model performance. Given reports of increased mortality rates related to hospital bed shortages, we also included as a predictor the number of intensive care unit beds occupied by suspected or confirmed COVID-19 patients on the day that any person tested positive for SARS-CoV-2. For all model results, we calculated model-adjusted odds ratios (aORs) with 95% CIs.
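As a reminder of how an odds ratio and its 95% CI follow from a logistic regression coefficient, the sketch below exponentiates a coefficient and its Wald interval. It is a generic illustration, not the authors' code: the function name is hypothetical, and the coefficient and standard error are assumed values chosen only so that the output resembles the scale of the estimates reported in this article.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound) of a Wald 95% CI
    for a logistic regression coefficient beta with standard error se."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Assumed inputs for illustration: beta = ln(2), standard error = 0.0737.
beta, se = math.log(2.0), 0.0737
aor, lower, upper = odds_ratio_with_ci(beta, se)
print(f"aOR {aor:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

Because the interval is computed on the log-odds scale and then exponentiated, it is asymmetric around the odds ratio.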

Ethics Considerations
This analysis constitutes a retrospective analysis of deidentified, anonymized epidemiologic records. Therefore, it is exempt from ethics review.

Results
A total of 597,922 tests were reported to OCHCA through August 16, 2020. After excluding repeated tests and those with incomplete data, 316,626 persons (53.0% of all records) were included in the test positivity analysis; 37,546 (12.0%) persons tested positive for COVID-19. A total of 42,383 persons with positive COVID-19 tests were included in the mortality analysis; 1,038 (2.5%) died from the disease. In the separate population-based serologic survey, 2,979 persons participated and 350 tested seropositive.

Spatial Patterns in Reported COVID-19 Cases, Testing Intensity, and Seropositivity
The tests for spatial autocorrelation indicated significant clustering in reported cases and testing intensity in the first month (March) of the local epidemic (Figure 3). The highest reported case incidence in March was along the central coast and southern portion of the county (Figure 2, panel A). The LISA statistics indicated statistically significant clustering of high-incidence ZIP codes in the central coast area (Figure 2,
Clustering of reported cases and test positivity increased in magnitude in May (Table 1; Appendix Figures 1, 3). Although clustering in test intensity was high in March (Table 1; Appendix Figure 2), it decreased in May as access to testing spread throughout much of the county. Clustering in testing intensity increased again in June and July (centered on the hotspots in the north-central part of the county) (Figures 2, 4). By April, case incidence, testing intensity, and test positivity had all shifted to the north-central part of the county. ZIP code-level seropositivity also revealed a cluster in the north-central part of OC (Figure 5), especially in the city of Santa Ana (Figure 1).

SARS-CoV-2 infection
Age was a strong predictor of testing positive. Persons in the 10-14- and 15-19-year age groups had the highest odds of testing positive (both with ≈2.30 times the odds of testing positive compared with the 0-4 year age group) (Table 2; Figure 6). Men and boys had 1.20 times the odds of testing positive compared with women and girls (95% CI 1.18-1.23). Persons who identified as Hispanic or Latinx had 1.7 times the odds of testing positive (95% CI 1.60-1.76) compared with non-Hispanic Whites, whereas Asian (aOR 0.55; 95% CI 0.52-0.58), Black (aOR 0.58; 95% CI 0.51-0.65), and Pacific Islander (aOR 0.35; 95% CI 0.29-0.42) persons had lower odds of testing positive than did non-Hispanic Whites. A large proportion of persons did not have attributable race or ethnicity data in the records (72% of all records through August 16). This unknown category includes persons who had no race or ethnicity categories recorded, those who had unknown or mixed listed for race or ethnicity, and those who listed multiple races.
ZIP code-level population density was not a significant predictor of testing positive (Table 2; Figure 6). However, education (percentage of adults >25 years of age with at least a bachelor's degree), health insurance coverage (percentage of adults who had health insurance in the previous 5 years), median household income, and household crowding were all statistically significant predictors of testing positive. For example, persons who lived in ZIP codes with the highest education levels had 32% decreased odds of testing positive (aOR for the fourth quartile 0.68, 95% CI 0.56-0.83). In addition, the interaction between time and ZIP code-level median household income (Figure 7) indicates that persons from wealthier ZIP codes had increased risk for testing positive at the beginning of the epidemic in OC. However, this pattern quickly shifted, and persons from lower-income areas showed the highest odds of testing positive in subsequent months.
Figure 6. Model-adjusted odds ratios and 95% CIs from the logistic regression for odds of testing positive for severe acute respiratory syndrome coronavirus 2, Orange County, California, USA, July-August 2020. Corresponding data presented in Table 2.

Factors Associated with Dying from COVID-19
For each 10-year increase in age, we observed an associated 2.5-fold increase in the odds of death (aOR 2.56, 95% CI 2.45-2.67; Table 3; Figure 8). Infected men and boys had twice the odds of dying from COVID-19 compared with women and girls (aOR 2.00, 95% CI 1.73-2.31). Although persons who identified as Asian were less likely to test positive for SARS-CoV-2 infection (Table 2), those who did test positive had higher odds of death. Compared with non-Hispanic Whites, this group had 54% increased odds of dying from COVID-19 (aOR 1.54, 95% CI 1.23-1.93).
Living in ZIP codes with high education levels and health insurance coverage was also predictive of mortality outcomes (Table 3; Figure 8). Persons who tested positive for COVID-19 and lived in ZIP codes with the highest levels of educational attainment had 49% lower odds of dying from COVID-19 (aOR for the fourth quartile 0.51, 95% CI 0.31-0.84). Persons who lived in ZIP codes with the highest levels of health insurance coverage had 21% lower odds of dying from COVID-19. ZIP code-level household crowding and the number of COVID-19 patients in hospital beds were both significant predictors of death. Risk for death from COVID-19 decreased over the study period.
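The percentage statements in this section are a direct restatement of the adjusted odds ratios: the percent change in odds is (aOR − 1) × 100. The short sketch below applies this conversion to three aORs quoted in the text; the function name is a hypothetical helper introduced for illustration.

```python
def percent_change_in_odds(aor):
    """Convert an odds ratio to a percent change in odds relative to the reference group."""
    return (aor - 1.0) * 100.0


# aORs quoted in the text: education (death), education (test positivity), Asian (death).
for label, aor in [
    ("highest education quartile, death", 0.51),
    ("highest education quartile, testing positive", 0.68),
    ("Asian vs. non-Hispanic White, death", 1.54),
]:
    print(f"{label}: {percent_change_in_odds(aor):+.0f}% odds")
```

Negative values correspond to lower odds than the reference group and positive values to higher odds.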

Factors Associated with SARS-CoV-2 Seropositivity
ZIP code-level cumulative incidence was a significant predictor of individual-level seropositivity in the absence of other ZIP code-level predictors. Every 10% increase in the ZIP code cumulative incidence was associated with an approximately 50% increase in the odds that a person would be seropositive (Appendix Table 2). ZIP code-level cumulative incidence was no longer a statistically significant predictor of seropositivity when other ZIP code-level predictors were added to the model (Table 4; Figure 9). In the full model (including all ZIP code-level covariates), median household income had a protective effect; persons from ZIP codes with higher median household income had lower odds of being seropositive for SARS-CoV-2 antibodies (aOR for every 1 SD increase 0.75, 95% CI 0.57-1.00).
We found no difference in age groups with regard to seropositivity. Although men and boys were more likely to test positive for or to die from SARS-CoV-2 infection, they were less likely than women and girls to be seropositive (aOR 0.75, 95% CI 0.59-0.94). Hispanic and Latinx persons had 54% increased odds of being seropositive (aOR 1.54, 95% CI 1.17-2.03). Pacific Islanders may also have had higher odds of being seropositive, but with small total numbers and broad 95% CIs (aOR 3.89, 95% CI 1.04-14.65); 3 of 12 Pacific Islanders tested were seropositive.

Discussion
Infectious disease data from passive case detection can be biased in various ways, including the well-documented challenge of uneven access to testing and diagnosis (25) and a general bias toward persons who are seeking clinical care for symptomatic disease. In our analysis of COVID-19 in OC, we used a rich set of complementary data that included those passively collected (e.g., reported cases and mortality records) and those from active screening (e.g., population-based serologic testing). Results indicate that, in the early days of the epidemic in OC, both testing intensity and test positivity were concentrated in affluent areas along the central coast. After March, however, a large cluster of reported cases formed in lower-income north-central OC (especially the cities of Santa Ana and Anaheim) (Figures 1, 2), growing in size in May and persisting over time. Testing intensity spread throughout the county during this same period. Consistent with other reports, we also found that age and male sex strongly predict testing positive and COVID-19-associated death (26). Intriguingly, whereas older age groups and men and boys were more likely to have symptomatic disease, our population-based serologic survey found that women and girls were more likely than their male counterparts to be seropositive. Hispanic and Latinx persons had higher risk for infection and testing positive, even after controlling for several ZIP code-level socioeconomic factors. Given the consistency of this finding between the models for test positivity and seropositivity, the risk for infection among Hispanic and Latinx persons appears to extend beyond the risks of living in a ZIP code with high transmission or a ZIP code with low income and low levels of educational attainment. Other studies also note an increased risk for testing positive among Hispanic and Latinx persons (27)(28)(29). Our seroprevalence survey indicates that in OC, this finding is not an artifact of passive case detection but instead represents a true greater risk for infection for Hispanic and Latinx persons.
Table 2 because of more extensive data curation for mortality data than for general test data. More rows of data were dropped because of missing information (e.g., on age or sex) in the test positivity data than in the mortality data. COVID-19, coronavirus disease; ICU, intensive care unit; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
†Model intercept represents odds of death for a White female diagnosed with SARS-CoV-2 in the 0-4 years age group in a ZIP code in the first quartile of college degree and insured with the average population density in Orange County. The odds of this person dying from COVID-19 are estimated to be zero.
‡Estimated percentage of population density in a person's ZIP code.
§Percentage of hospital beds not being used by COVID-19 patients in Orange County.
Although persons identifying as Asian were less likely to test positive for SARS-CoV-2, they were more likely to die when infected. This disparity is consistent with national data, though its cause is uncertain (30). This pattern may reflect discrepancies in outreach communication to these communities or other socioeconomic and cultural factors (31,32) and warrants further detailed investigation.
Social determinants of health, defined as "conditions in which people are born, grow, work, live, age, and the wider set of forces and systems," play a critical role in the creation of disparities related to illness, death, and quality of life (33). These social determinants include (among other factors) poverty, wealth, educational quality, household and neighborhood conditions, childhood experience, and social support. Several speculative explanations have been proposed for these sociodemographic patterns related to COVID-19, including living in dense quarters (and this pattern is evident in our analyses). In addition, as the state and local shelter-in-place and social distancing policies were mandated, persons who are independently wealthy or who work in occupations where working from home was a viable option were more capable of practicing social distancing. Persons from low socioeconomic status areas, by contrast, may have less ability to practice social distancing. Our analyses show that persons from ZIP codes with lower overall educational attainment and health insurance coverage and with higher housing density were more likely to test positive for and die from COVID-19. The association with median household income was more complex and changed over time with regard to test positivity. However, we found that persons from ZIP codes with lower median household income were also more likely to be seropositive for SARS-CoV-2. These findings underscore the importance of understanding contextual factors surrounding infectious disease outbreaks.
Table 3. COVID-19, coronavirus disease; ICU, intensive care unit.
Study limitations include that county-reported testing and mortality data did not include individual-level information on income, education, and insurance. These variables were only available at the ZIP code level, and ZIP codes are unlikely to adequately represent important spatial units. Our seroprevalence survey occurred during July 10-August 16, 2020. We limited our analyses of test positivity and risk for death to before August 16, to correspond with the end of the seroprevalence survey. However, the survey occurred over a period of just over a month, during which time the cumulative incidence was changing.
Missing data on race and ethnicity (72% of all official test records) and small counts of some racial and ethnic groups may have affected our findings for groups with low counts in this analysis. Even when race or ethnicity data were available, they were recorded only in broad categories (e.g., Asian rather than specific Asian ethnicities), which is a major limitation of these data; efforts are being made to improve collection of race and ethnicity data. A major challenge over the course of this pandemic has been collecting data in a standardized format when test results are being reported from a wide variety of laboratories that are affiliated with many different private and governmental entities. We do not believe that the race and ethnicity data are missing at random, but we are not able to assess the magnitude of bias that this possibility would introduce, especially given that race and ethnicity appear to be risk factors for infection.
Study strengths include the diversity of OC in terms of socioeconomic and demographic predictors, which provides sufficient power to investigate these factors in our analyses. California was also one of the first states to issue an executive order for residents to stay home, providing data for several months when only essential workers were permitted to work outside the home. Our analyses were able to identify temporal shifts in the demographics of COVID-19 test positivity that likely reflect disparities related to occupation type that are further amplified by household characteristics. Finally, we are able to assess differences in risk for infection and test positivity by comparing our population-level serologic survey to routinely collected (passive) data from county statistics.
The reasons for the spatial, sociodemographic, and economic patterns we discovered are likely complex and broadly related to issues of accessing healthcare and general social determinants of health. The clear disparities in how this disease has manifested in OC point toward the need for approaches that are socio-culturally appropriate and have a focus on health equity. The large amount of missing data and the collection of only broad categories of race and ethnicity information highlight the need for improved data collection. Finally, measures that focus on the hardest-hit communities, including those that involve working with community-based organizations that have experience working with hard-hit demographic and geographic groups to ensure equitable access to health services, may serve as efficient points of intervention for COVID-19.
Figure 9. Model-adjusted odds ratios and 95% CIs from the logistic regression for the odds of being seropositive for severe acute respiratory syndrome coronavirus 2, Orange County, California, USA, July-August 2020. Corresponding data presented in Table 4. +, positive.
Emerging Infectious Diseases • www.cdc.gov/eid • Vol. 27, No. 10, October 2021