Prevalence of SARS-CoV-2 Antibodies after First 6 Months of COVID-19 Pandemic, Portugal

In September 2020, we tested 13,398 persons in Portugal for antibodies against severe acute respiratory syndrome coronavirus 2 by using a quota sample stratified by age and population density. We found a seroprevalence of 2.2%, 3–4 times larger than the official number of cases at the end of the first wave of the pandemic.

August), the number of cases was somewhat elevated, but steady, with a daily average of 255 cases.
As of June 2, 2020, Portugal was one of the ten countries in the world with highest levels of testing in per capita terms (22). This notwithstanding, the potential for asymptomatic infections makes it difficult to estimate the true extent of SARS-CoV-2 infections in Portugal after the first phase of the pandemic, although an earlier, more limited, study estimated seroprevalence at 2.9% (23).

Calculation of Sample Size
The sample was stratified by age group (<18, 18 to 54, ≥55 years old) crossed by population density of the place of residence (<60; 60 to 500; >500 persons/km²). These strata were chosen for epidemiologic reasons. Age is a major factor in COVID-19 severity, and the three age groups were chosen based on cut-offs proposed in a vaccine trial (24). Population density is a major factor in the transmission of infectious diseases, and the three groups were chosen to have a good balance between number of counties sampled and total population in each density stratum. At the same time, we strove to keep the total number of strata at <10, to reduce the logistical complexities and sample size associated with more strata. The overall sample size was determined by assuming low prevalence in each of the nine strata, between 0.1% and 3%, with lower levels in the regions of low population density. We also defined a relative error margin of 15% for the global prevalence estimate (i.e., error margin could be at most 15% of the observed prevalence). In addition, we assumed that the test to be used would have 99% sensitivity and 98.7% specificity. Accounting for the test characteristics changes the fraction of positive results expected to be observed in the study (see below). We then used these corrected seroprevalence values and Cochran's formula for proportional allocation to estimate sample size in stratified populations (25), and obtained a sample size of at least 11,241 persons divided proportionally among the 9 strata mentioned. To guarantee precision in the lowest population density regions, where prevalence was expected to be lower, the sample size in those strata (each of the 3 age groups) was increased by 50% of the value calculated. Thus, the final sample is no longer proportional to the population. The total sample size should be at least 11,994 persons distributed according to Appendix Table 1.
To achieve the required allocation by population density, the 308 counties of Portugal (including both the Madeira and Azores archipelagos) were subdivided into the three levels of population density, and 104 were randomly selected to be sampled from among all counties with a collection laboratory, with the number of persons in each age group per county as prescribed (Appendix Table 2).
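The sample size reasoning above can be illustrated with a minimal sketch. It assumes a normal-approximation form of Cochran's formula for proportional allocation with a finite population correction and z = 1.96; the stratum weights and prevalences below are hypothetical placeholders, not the values used in the study, so the output does not reproduce the 11,241 figure.

```python
import math

def apparent_prevalence(p, sensitivity, specificity):
    """Fraction of positive tests expected when the true prevalence is p."""
    return p * sensitivity + (1.0 - p) * (1.0 - specificity)

def stratified_sample_size(weights, prevalences, population, rel_error, z=1.96):
    """Total sample size under proportional allocation (Cochran-style sketch).

    weights     -- population share of each stratum (should sum to 1)
    prevalences -- expected (apparent) prevalence in each stratum
    rel_error   -- error margin as a fraction of the overall prevalence
    """
    overall = sum(w * p for w, p in zip(weights, prevalences))
    target_variance = (rel_error * overall / z) ** 2
    n0 = sum(w * p * (1.0 - p) for w, p in zip(weights, prevalences)) / target_variance
    # Finite population correction
    return math.ceil(n0 / (1.0 + n0 / population))

# Hypothetical three-stratum illustration (the study used nine strata).
weights = [0.2, 0.5, 0.3]
true_prevalences = [0.005, 0.02, 0.03]
observed = [apparent_prevalence(p, 0.99, 0.987) for p in true_prevalences]
n_total = stratified_sample_size(weights, observed, population=10_300_000, rel_error=0.15)
print(n_total)
```

The apparent prevalences are used in place of the true ones because imperfect specificity inflates the fraction of positive results that the study will actually observe.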

Recruitment of Study Participants
For logistical reasons, we recruited volunteers to this study, according to the quotas defined (Appendix Table 2). Thus, this study uses a convenience sample. To achieve the needed number of participants from all of Portugal, we developed a communication and study dissemination strategy with several layers. One month before the beginning of the study, the main media groups in Portugal were contacted to aid in the broadcasting of this project. Media Capital, a large group representing 2 TV channels (TVI and TVI 24, over the air broadcast and cable, respectively) and several radio stations (Rádio Comercial, M80, Cidade FM) with national coverage, promptly joined in, promoting short campaign videos featuring TV and news hosts in teasers aired at the beginning of the recruitment. Additionally, a press release containing all the information about the study and how to participate was widely distributed to the Portuguese media 1 week before the beginning of the study (with embargo). This enabled several news pieces to be prepared in advance and released on the first day of the study. We also implemented a campaign of leaflet distribution and poster advertisements, through one of the funding partners of this study: Jeronimo Martins Group, which owns one of the largest supermarket chains in Portugal (Pingo Doce), again with a presence in all regions of Portugal. To help disseminate the study to a larger audience, a leaflet was prepared and distributed in the Pingo Doce stores across the country. At the entrance of the stores, advertisement posters were visible to all the clients. Additionally, advertisement posters were distributed to the 314 participating Germano de Sousa laboratories.
Finally, we used social media: a short video (https://www.youtube.com/watch?v=TiKMz-Ne9bo) and specifically designed materials were produced to communicate the project through the institutional social media channels (Facebook, Instagram, LinkedIn, Twitter, and YouTube), again reaching a wide audience. We also had an email address and phone lines dedicated to the study, through which interested persons could reach us for help in registration or information about the study. All participants were recruited by voluntary registration through a Web site specifically designed for the study. To help citizens with fewer digital skills, the enrollment could be done directly at one of the 314 participating blood collection laboratories (Germano de Sousa Laboratories), where the local technicians could support and assist in the process of registration through the Web site. Participants were not given any compensation beyond being informed of their serologic status. Participants were excluded only if they had any contraindication for phlebotomy. Prior diagnosis of SARS-CoV-2 infection was not an exclusion criterion.

Blood Collection and Serologic Tests
All blood collections and serologic tests were done by Centro de Medicina Laboratorial Germano de Sousa (CMLGS), an ISO 9001:2015 certified private laboratory, which performs serologic tests for SARS-CoV-2 according to the clinical guidelines issued by the Directorate-General of Health (DGS), within the Portuguese Ministry of Health. CMLGS has a national network of collection sites, of which 314 were involved in this study. This network enabled blood collection from the participants wherever it was most convenient for them, typically in their area of residence. Each participant donated 7-9 mL of blood, collected into tubes with separation gel and without any anticoagulant, to obtain a 4-5 mL serum sample by centrifugation. All samples were transported to the central laboratory, according to usual procedures, where they were assayed.
Blood samples were assayed for total antibodies against SARS-CoV-2 by using the Siemens SARS-CoV-2 Total (COV2T) (Advia Centaur Siemens, Siemens Healthcare, Portugal), a chemiluminescent immunoassay test targeting the spike protein. Positive samples were stored at Biobanco-iMM, Lisbon Academic Medical Center.

Epidemiologic Questionnaire and Outcomes
All participants completed a questionnaire with sociodemographic, general health and clinical/epidemiologic questions regarding SARS-CoV-2 exposure, including symptoms of interest. The full (translated) questionnaire is presented near the end of this Appendix. The questionnaire was in Portuguese (the overwhelmingly dominant language in Portugal), and it was tested beforehand in a study of the University of Lisbon, involving ≈2,500 persons (mostly staff).
The questionnaire was completed at enrollment, and it was the only way participants could get a code to perform the free blood draw, within 2 weeks.
The primary outcome was the proportion of serologic positive cases, defined as the fraction of participants who were positive for SARS-CoV-2 specific total antibodies, overall and stratified by age and population density. The secondary outcomes included the proportion of serologic positive cases without any symptoms of interest (asymptomatic cases), or with <3 symptoms and without sudden loss of smell or taste (pauci-symptomatic cases, as defined in (3)).
The symptoms of interest reported by participants in the questionnaire were: loss of smell/taste, fever, chills, cough (dry or with mucus), muscle or joint pain, sore throat, headaches, general weakness/tiredness, respiratory difficulty, gastrointestinal issues (vomit, nausea, diarrhea), loss of appetite, rashes, rhinorrhea, or loss of consciousness.
Finally, the associations between antibody positivity and the sociodemographic, health and epidemiologic characteristics of the participants were explored. We included questions about education, household size, occupation, chronic disease conditions, body mass index, exercise, smoking habits, influenza and Bacille Calmette-Guerin (BCG) vaccine (against tuberculosis), contact with persons who had COVID-19, previous tests for SARS-CoV-2, among others (see questionnaire).

Adjustment of Seroprevalence for Sample Weights
To extrapolate our results for the entire population, sample seroprevalences were adjusted based on official estimates for the resident population, per quinquennial age group, in each county of Portugal as of December 31, 2019 (26), and further adjusted for the overrepresentation of women by post-stratifying the sample on sex. The weights for each of the 9 study strata divided by sex are presented (Appendix Table 3).
Because of the low seroprevalence values, specific methods were favored for calculating the upper and lower limits of the CIs, rather than methods based on the normal approximation to the binomial distribution. In particular, Jeffreys CIs for a proportion were used at the stratum level (27). To calculate CIs for aggregated strata (i.e., marginal values), we used the exact limiting terms for the binomial parameter adapted for weighted proportions (28).
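The weighting step described in this section can be sketched as follows. This is a simplified illustration of post-stratified point estimation only (it does not compute the CIs); the function name and the stratum counts are hypothetical and are not taken from Appendix Table 3.

```python
def weighted_prevalence(strata):
    """Population-weighted prevalence.

    strata -- list of (population_size, n_tested, n_positive) tuples,
              one per stratum (e.g., age x density x sex cell).
    """
    total_population = sum(pop for pop, _, _ in strata)
    return sum((pop / total_population) * (pos / n) for pop, n, pos in strata)

# Hypothetical two-stratum example: the second stratum is undersampled
# relative to its population share, so its prevalence is weighted up.
example = [(1000, 100, 10), (3000, 100, 2)]
print(weighted_prevalence(example))
```

Each stratum contributes its observed prevalence in proportion to its share of the resident population, not its share of the sample.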

Correcting Seroprevalence Estimates with Test Sensitivity and Specificity
The total antibody test has a sensitivity, from 14 days post-infection, of 98.1% (based on 536 positive samples); and a specificity of 99.9% (based on 994 samples) (29).
The seroprevalence observed in our weighted sample was adjusted taking into consideration the sensitivity and specificity of the tests by using the Rogan-Gladen estimator
$$P_{adj} = \frac{P_m + Sp - 1}{S + Sp - 1}$$
where Pm is the measured prevalence and Padj is the final adjusted prevalence, as reported in the main text, with the test specificity Sp and sensitivity S.
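As a minimal sketch, the Rogan-Gladen correction, (Pm + Sp − 1)/(S + Sp − 1), can be written as a small function. The sensitivity and specificity defaults are the values quoted above (98.1% and 99.9%); the measured prevalence in the example is a hypothetical input, and truncating the result to [0, 1] is a choice made here for illustration.

```python
def rogan_gladen(p_measured, sensitivity=0.981, specificity=0.999):
    """Adjust a measured prevalence for imperfect test accuracy."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("test must perform better than chance (S + Sp > 1)")
    adjusted = (p_measured + specificity - 1.0) / denominator
    # Keep the estimate inside the valid range for a proportion.
    return min(1.0, max(0.0, adjusted))

# Hypothetical measured prevalence of 2.25%
print(round(rogan_gladen(0.0225), 4))
```

With a highly specific test, as here, the correction is small: few of the observed positives are expected to be false positives.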

Correcting the Asymptomatic and Pauci-Symptomatic Prevalence Estimates with Test Sensitivity and Specificity
The proportion of asymptomatic participants observed in our weighted sample was adjusted taking into consideration the sensitivity and specificity of the tests, by using the following formula, deduced by applying standard results from probability theory (see the section at the end of this Appendix)
$$A_{adj} = \frac{A \, P_m - (1 - Sp) \, A_S}{P_m - (1 - Sp)}$$
where A is the observed weighted proportion of asymptomatic participants among the seropositive participants, Pm is the measured seroprevalence, AS is the observed proportion of asymptomatic participants in the full sample, Sp is the test specificity, and Aadj is the final adjusted proportion of asymptomatic participants, as reported in the main text. Similarly, the proportion of pauci-symptomatic participants observed in our weighted sample was adjusted taking into consideration the sensitivity and specificity of the tests.
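The adjustment can be sketched in code under one explicit assumption made here: given a person's true antibody status, the test result does not depend on whether the person had symptoms. Under that assumption the adjusted proportion depends only on A, Pm, AS, and Sp, as Aadj = (A·Pm − (1 − Sp)·AS)/(Pm − (1 − Sp)). The function name and the numbers in the example are hypothetical.

```python
def adjust_asymptomatic(a_positive, p_measured, a_sample, specificity):
    """Adjusted proportion of asymptomatic persons among the truly seropositive.

    a_positive  -- observed proportion asymptomatic among test-positive persons (A)
    p_measured  -- measured seroprevalence (Pm)
    a_sample    -- observed proportion asymptomatic in the full sample (AS)
    specificity -- test specificity (Sp)
    """
    false_positive_rate = 1.0 - specificity
    denominator = p_measured - false_positive_rate
    if denominator <= 0:
        raise ValueError("measured prevalence must exceed the false-positive rate")
    return (a_positive * p_measured - false_positive_rate * a_sample) / denominator

# Hypothetical inputs: 30% asymptomatic among test-positives, 2.94% measured
# prevalence, 49.4% asymptomatic in the whole sample, 99% specificity.
print(round(adjust_asymptomatic(0.30, 0.0294, 0.494, 0.99), 3))
```

The correction removes the share of asymptomatic test-positives that is expected to come from false positives, who resemble the general (mostly asymptomatic) population.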

Comparison to Official Reported Cases
To compare our seroprevalence results with official reported cases, we used cutoffs in 10-year age groups, which is how the official statistics are presented. For each of the age intervals (Figure 1 of the main text), we calculated the seroprevalence in Portugal by sex and compared it to the fraction of reported cases, as a proportion of the respective age-sex population in Portugal.
We then calculated the multiplier corresponding to how many more cases our seroprevalence study found compared with those officially reported. For this analysis, we used the number of reported cases on September 1, 2020 (21). We use this date to account for some time between infection and seroconversion, which has been reported to take ≈2 weeks (32-34). Because 90% of blood samples from participants were collected between September 8 and September 19, 2020, the chosen date is appropriate for this comparison. Note that incidence was stable: ≈50 cases/million persons/day in early September (Appendix Figure 1).
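The multiplier described above is a simple ratio, sketched here for a single age-sex group. The function name and all input numbers are hypothetical and chosen only to illustrate the arithmetic.

```python
def underreporting_multiplier(seroprevalence, population, reported_cases):
    """Ratio of infections estimated from seroprevalence to officially reported cases."""
    estimated_infections = seroprevalence * population
    return estimated_infections / reported_cases

# Hypothetical group: 1,000,000 residents, 2.2% seroprevalence,
# 6,000 cases reported by the reference date.
print(round(underreporting_multiplier(0.022, 1_000_000, 6_000), 2))
```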

Calculation of Infection-Fatality Rates
We used the official number of deaths due to COVID-19 by age and sex divided by our estimated number of cases in the total population to obtain the infection-fatality rate (IFR).
Again, we used cutoffs in 10-year age groups, which is how the official statistics are presented.
In addition, we took into account the typical delay between infection and death, which we assumed to be ≈3 weeks (35,36). If we assume that we are estimating infections up to September 1, 2020 (see above), then we should calculate IFR with death data from September 21. We note that there are more sophisticated ways to take into account the distribution of times until death (37,38), but here for simplicity and for lack of data on that distribution, we just calculate the quotient of deaths on September 21 by the total number of estimated infected in our study. Thus, this is only an approximation to the IFR, albeit likely a good one, because the numbers of cases and deaths were relatively low around these dates.
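The approximation described above reduces to a single quotient per age-sex group, sketched below. The function name and the inputs are hypothetical; the deaths figure stands for the official count at the lagged date (≈3 weeks after the reference date for infections).

```python
def infection_fatality_rate(deaths_at_lagged_date, seroprevalence, population):
    """Approximate IFR: deaths divided by infections estimated from seroprevalence."""
    estimated_infections = seroprevalence * population
    if estimated_infections <= 0:
        raise ValueError("estimated number of infections must be positive")
    return deaths_at_lagged_date / estimated_infections

# Hypothetical group: 200 deaths, 2.2% seroprevalence, 1,000,000 residents.
ifr = infection_fatality_rate(200, 0.022, 1_000_000)
print(f"{100 * ifr:.2f}%")
```

Because the denominator comes from the seroprevalence estimate, any uncertainty or bias in that estimate carries over directly to the IFR.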

Statistical Analyses
We used the χ² test to compare categorical variables (e.g., distribution of the number of positive and negative participants with a given symptom), except when the numbers in some groups were low, when we used the Fisher exact test. We used logistic regression to analyze the effect of smoking status on prevalence of seropositivity, controlling for sex and age. For this, we used the survey package of R (39). We did not impute any missing values.
All statistical analyses were two-sided, the significance level was α = 0.05, and reported CIs are at the 95% level. Statistical analyses were done by using SAS version 9.4 (SAS Institute Inc, Cary, NC, USA) and R version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria).
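For readers unfamiliar with the Fisher exact test used for small counts, a minimal standard-library sketch for a 2×2 table follows. It is an illustration only, not the SAS or R code used in the study, and it uses the common two-sided convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )

# Example: symptom present/absent (columns) by seropositive/seronegative (rows)
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```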

Sample Representativeness
Overall, compared with the sociodemographic characteristics of the Portuguese population, we found an overrepresentation of the education and health sectors (36% of employees in the sample, compared with 19% in the population). This had an impact on some characteristics of the 18-54 age group: more women, more graduates, and fewer persons living alone than in the general population of these ages. We present the characterization of this sample regarding sociodemographic and health characteristics (Appendix Tables 5-7).

SARS-CoV-2 Antibody Seroprevalence in the Population in Portugal
As we mentioned in the main text, the differences in seroprevalence across age groups were not statistically significant. However, this difference was highly dependent on population density, with the lowest observed seroprevalence in the youngest group in low population density areas (0.6%) and the highest seroprevalence also in the youngest group, but in high population density areas, with a point estimate 6 times higher (3.5%) (Table 1, main text).
After adjusting for sensitivity and specificity, the estimated proportion of asymptomatic persons among seropositive persons was 17.4% (95% CI 14.1%-22.9%), and the prevalence of asymptomatic cases was much higher in persons <18 years of age (Appendix Table 8). If we consider pauci-symptomatic cases, which also include asymptomatic cases, the proportion among seropositive persons increases to 19.9% (95% CI 16.1%-25.4%), also with significantly higher values for persons <18 years old (39.6%) (Appendix Table 8).

Demographic, Health, and Epidemiologic Determinants of Seroprevalence
We found no difference between seropositivity levels in men and women (2.3% vs. 2.1%) (Table 2; Appendix Table 9). In terms of occupation, there were small differences in seroprevalence between employed persons (2.3%), unemployed persons (2.5%), or students (2.3%). However, for retired persons, we found a lower seroprevalence level (1.6%). It is noteworthy that healthcare professionals (3.2%) and transport sector workers (3.2%) had higher levels of seroprevalence than other workers, such as persons in commerce, industry, education, services, or construction. About 15% (n = 1,104) of employed participants reported that they were teleworking, and teleworkers showed a lower seroprevalence (1.4%) than non-teleworkers (2.4%), independently of whether the latter had contact with other persons at work (Table 2; Appendix Table 9).
We also enquired about health conditions, and 27.7% (n = 3,717) of participants reported at least 1 chronic condition, but we found no differences in seroprevalence for persons with or without such conditions (Appendix Table 10). However, there was a significant difference (p = 0.002) between persons who do not smoke (n = 9,235 participants) and those who smoke (n = …). We also considered other health-related variables. For example, there was no difference in seroprevalence among participants who practice regular exercise versus those who do not. We also enquired about Bacille Calmette-Guerin (BCG) status (a vaccine against tuberculosis). In our study, 688 participants reported not taking this vaccine versus 10,672 who did, and seroprevalence was not statistically different between these groups (Appendix Table 10). Finally, although we found a slight over-representation of overweight and obese persons in seropositive when compared with seronegative participants, this result was not statistically significant (Table 2).
Among participants who believed that they had been in contact with an infected person, prevalence was 16.2% (95% CI 14.2%-19.3%), and most of these contacts were reported to be at work. Prevalence among participants who had someone infected in their household was 28.3% (95% CI 24.5%-33.7%) (Table 2). Of the 401 participants who indicated that someone in their household had been given a diagnosis of COVID-19, 71.3% (n = 286) were seronegative, and presumably were not infected by their household contact.

Clinical Comparison of Seropositive with Seronegative Cases
Based on the clinical questionnaire, the symptoms with largest differences in reporting between seropositive and seronegative participants were loss of taste (… These are the symptoms, and the subgroups of participants, in whom prevalence is the highest, indicating a good positive predictive value. A total of 50.0% of seropositive participants had never been given a diagnosis of having a case or suspected case of infection (Appendix … Altogether, among the 2,025 seronegative participants who had an RT-PCR before our study, 1.2% (n = 24) were positive. These tests were performed a median of 88 days (minimum 12 days and maximum 186 days) before the study.

Results in Context
We found an overall prevalence of 2.2% of persons positive for antibodies against SARS-CoV-2 in the population of Portugal. This prevalence was lower than that for an earlier smaller study, using samples from persons who were tested in clinical laboratories for non-SARS-CoV-2 reasons, which showed a seroprevalence of 2.9% (23). Our results suggest that there were 3-4-fold as many persons infected by SARS-CoV-2 as those officially reported by health authorities. However, this factor varied across age groups, being ≈9-fold among younger persons (<18 years of age, both males and females). This result is striking because it contradicts the recent suggestion that young persons might have a lower susceptibility to infection compared with adults (40). However, other seroprevalence studies also reported this large discrepancy between seropositive young persons and official reported cases (41).
We found that ≈40% of infections were asymptomatic in persons <18 years old, whereas this proportion was much lower in older persons. However, we note that, in this study, a participant was considered asymptomatic if she or he had not experienced any of the listed symptoms since the beginning of the pandemic (i.e., within a period of 6 months). Thus, the percentage of asymptomatic infection is probably an underestimate, although it is consistent with other values reported (1-4).
Spain, the only country with which Portugal has land borders, reported 5% seroprevalence in a study done 4 months before ours (3). The dire situation observed early on in some regions and hospitals of Spain had a profound influence on the nonpharmaceutical control measures imposed by the Portuguese authorities, and these seem to have been successful in controlling the spread of infection.
We found similar seroprevalence estimates for men (2.3%) and women (2.1%), which translates into more women having been infected than men because ≈53% of the population in Portugal are women (42), and it is also consistent with the number of confirmed cases, in which women had ≈54% of the cases. Our results also show that retired (older) persons, who might take more care not to expose themselves to the virus, had lower seroprevalence (1.6%) than other groups. Among those working, teleworking resulted in lower seroprevalence, when compared with persons physically present at their work locations. In addition, in workers of certain sectors (such as healthcare or transportation) seroprevalence was higher. Some of these differences did not reach statistical significance, but are suggestive of differences in risk for acquiring infections.
In this respect, we did not find differences in seroprevalence among persons with and without previous chronic health conditions. Given the widespread knowledge that some chronic conditions are major risk factors for severe disease, one might expect persons who had comorbidities to take extra precautions to avoid infection. However, our data do not support this expectation.
We were also able to analyze 2 controversial issues related to the risk for infection. First, there have been some reports of a link between smoking and risk for SARS-CoV-2 infection (or COVID-19 severity). A few studies looked at risk for infection (asymptomatic, mild, or severe), including an ecologic meta-analysis (43), and a study of an outbreak on an aircraft carrier (44), indicating a potential protective effect of smoking. Conversely, a large cross-sectional study based on a symptom app indicated an increased risk for (symptomatic) infection for smokers (45). In our population-based study, with self-reported smoking status, we found a lower seroprevalence in smokers (1.0%) vs. non-smokers (2.4%), which was one of the most robust differences, even when accounting for sex and age of the participants. Women were the drivers of this finding, and if we analyzed only the men, we found that the difference in prevalence between smokers and non-smokers was no longer significant. Although these results were clear, it is essential to stress that smoking is a well-known risk factor for many other pathologies, most more pathogenic than SARS-CoV-2 infection (46). In addition, it is probable that once infected, smokers have a worse prognosis (47). Thus, our findings should be interpreted cautiously.
Another debated issue is the suggestion that the BCG vaccine might be protective against infection (48), which led to some ongoing clinical trials to analyze that hypothesis (49). In our study, there was a slightly increased prevalence of total SARS-CoV-2 antibodies in those reporting not taking the BCG vaccine (2.6%) versus participants who had taken the vaccine before (2.2%), which was not statistically significant, but it is consistent with a recent result (50).
We note that only a small percentage of persons (≈6%) report not taking this vaccine (excluding those that did not know their BCG status), which is in accordance with the recommendation of universal vaccination in Portugal until 2016.
Some seronegative patients reported that they had been given a diagnosis of having a suspicious case of COVID-19. However, almost none of these cases were actually confirmed by PCR. This finding is probably caused by heightened awareness of the infection, leading to many spurious diagnoses. According to the responses of participants, >60% of these suspicious cases were diagnosed by using SNS24, a National Health Service telephone line managed by the government as a first line of medical advice (not just during the pandemic). The national health authorities reported the number of suspected cases in their daily briefings until August 16, 2020 (21). On that day, 2 weeks before the start of our study, there were 468,937 suspected cases, which corresponds to 4.6% of the ≈10.3 million persons in Portugal. The number of suspicious diagnoses in our sample is consistent with that value. However, there were 24 seronegative persons who reported having a positive RT-PCR result before our study.
There are several possible explanations for this observation. These persons could have true negative results (e.g., persons who did not yet have antibodies, persons who might have lost antibodies (seroreversion), or persons who had a false-positive RT-PCR result). Alternatively, they could be persons who had false-negative results in our antibody test. In any case, when correcting our prevalence estimates with the sensitivity and specificity of the test, we are (up to a point) taking into account these potential false-negative results in the antibody test.
As stated in the main text, our study has some limitations. We used quota sampling, relying on volunteers for the study. Thus, our sample might not reflect the population of Portugal in some demographic/epidemiologic respects. We stratified the study and sampled over counties in Portugal to at least have an appropriate representation over these variables (age and population density). In addition, we checked sex distribution by strata and found a distortion in the 18-54 years age group, for all density levels, leading us to post-stratify by sex, despite the resulting larger imprecision in the estimates. However, there is always the possibility that access to the internet, interest in finding serostatus results, and other factors bias the sample of participants. In this regard, it is useful to note that other sample characteristics that deviated from the population statistics, such as education level or household size, were not associated with seroprevalence.
One reason we chose our method of enrollment was to achieve a fast enrollment process. During an infection outbreak, the number of persons infected, who eventually will seroconvert, is changing continuously. This process is different from other study situations in which the outcome is more stable (e.g., chronic conditions, behavior, or opinions). If the study (i.e., enrollment) takes too long, then large changes in prevalence during the study period are possible, and it is unclear how to associate the prevalence estimate with a given time period. We reasoned that the occurrence of such changes could bias the study more than the method of recruitment. In addition, we note that studies designed to have a fully random sample often end up with a large fraction of persons not participating (e.g., refusing to participate or could not be contacted), negating the objective of that design choice (3,16). Another limitation is that we used relatively large intervals for age groups. Likely, a more fine-grained stratification (e.g., 0-5, 6-10, 11-20, 21-50, 51-60, 61-70, 71-80, >80 years) would be more representative of epidemiologic and clinical aspects of SARS-CoV-2. However, such stratification, as well as adding other variables (e.g., biologic sex), would need a much larger sample size.
Our study was also based on a self-reported questionnaire, often answered retrospectively, especially for such issues as past symptoms and behaviors, and we cannot exclude errors in this reporting. We did recontact persons who consented and for whom there were inconsistencies in the questionnaire results that were obvious mistakes. In addition, in a study of seroprevalence, there are always potential issues of assay imprecision, which we attempted to correct on the basis of published sensitivity and specificity. Finally, we did not correct for potential seroreversion, which has been suggested (51-53). This phenomenon would reduce the fraction of seropositive persons detected in our study in relation to the actual number of past infections, so correcting for it would also lower the estimated IFR. We note that this study was conducted 6 months after the start of the pandemic in Portugal, and persons were infected at various times within that period. Several studies, including our own, have now demonstrated that antibodies to SARS-CoV-2 are often detectable for >6 months (6,54-58).
The proportion of asymptomatic participants observed in our weighted sample was adjusted taking into consideration the sensitivity and specificity of the test, using the following formula
$$A_{adj} = \frac{A \, P_m - (1 - Sp) \, A_S}{P_m - (1 - Sp)}$$
where A is the observed weighted proportion of asymptomatic participants among the seropositive participants, Pm is the measured seroprevalence, AS is the observed proportion of asymptomatic participants in the full sample, Sp is the test specificity, and Aadj is the final adjusted proportion of asymptomatic participants.

Derivation of the Formula Based on Conditional Probabilities and Bayes' Law
Consider these events/statements:
Ab, having antibodies
T+, having a positive antibody test, and the corresponding probability Pm = P[T+]
T−, ~T+ (having a negative antibody test)
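The remainder of the derivation is not available in this text, so the following is only a sketch of how the adjustment for the asymptomatic proportion can be obtained from the law of total probability. It assumes that, given antibody status, the test result is independent of symptoms. The symbol As (being asymptomatic) and the use of ¬ for negation are notation introduced here for illustration; S and Sp are the test sensitivity and specificity.

```latex
\begin{align*}
P_m = P[T^+] &= S\,P[Ab] + (1 - Sp)\,(1 - P[Ab])
  \;\Rightarrow\; (S + Sp - 1)\,P[Ab] = P_m - (1 - Sp),\\
A\,P_m = P[As, T^+] &= S\,P[As, Ab] + (1 - Sp)\,P[As, \neg Ab],\\
A_S = P[As] &= P[As, Ab] + P[As, \neg Ab],\\
A\,P_m - (1 - Sp)\,A_S &= (S + Sp - 1)\,P[As, Ab],\\
A_{adj} = P[As \mid Ab] = \frac{P[As, Ab]}{P[Ab]}
  &= \frac{A\,P_m - (1 - Sp)\,A_S}{P_m - (1 - Sp)}.
\end{align*}
```

The sensitivity cancels in the last step, which is consistent with the adjusted proportion being expressed in terms of A, Pm, AS, and Sp only.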