
Increasing test specificity without impairing sensitivity: lessons learned from SARS-CoV-2 serology
  1. Thomas Perkmann1,
  2. Thomas Koller1,
  3. Nicole Perkmann-Nagele1,
  4. Maria Ozsvar-Kozma1,
  5. David Eyre2,
  6. Philippa Matthews3,
  7. Abbie Bown4,
  8. Nicole Stoesser3,
  9. Marie-Kathrin Breyer5,6,
  10. Robab Breyer-Kohansal5,6,
  11. Otto C Burghuber6,7,
  12. Sylvia Hartl5,6,7,
  13. Daniel Aletaha8,
  14. Daniela Sieghart8,
  15. Peter Quehenberger1,
  16. Rodrig Marculescu1,
  17. Patrick Mucher1,
  18. Astrid Radakovics1,
  19. Miriam Klausberger9,
  20. Mark Duerkop10,
  21. Barbara Holzer11,
  22. Boris Hartmann11,
  23. Robert Strassl1,
  24. Gerda Leitner12,
  25. Florian Grebien13,
  26. Wilhelm Gerner14,15,16,
  27. Reingard Grabherr9,
  28. Oswald F Wagner1,
  29. Christoph J Binder1,
  30. Helmuth Haslacher1
  1. 1Department of Laboratory Medicine, Medical University of Vienna, Wien, Austria
  2. 2Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  3. 3Nuffield Department of Medicine, University of Oxford, Oxford, UK
  4. 4Public Health England Porton Down, Salisbury, UK
  5. 5Department of Respiratory and Critical Care Medicine, Clinic Penzing, Vienna, Austria
  6. 6Ludwig Boltzmann Institute for Lung Health, Vienna, Austria
  7. 7Sigmund Freud Private University Vienna, Vienna, Austria
  8. 8Division of Rheumatology, Department of Medicine III, Medical University of Vienna, Vienna, Austria
  9. 9Institute of Molecular Biotechnology, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU) Vienna, Vienna, Austria
  10. 10Institute of Bioprocess Science and Engineering, Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU) Vienna, Vienna, Austria
  11. 11Institute for Veterinary Disease Control, Austrian Agency for Health and Food Safety (AGES), Moedling, Austria
  12. 12Department of Blood Group Serology and Transfusion Medicine, Medical University of Vienna, Vienna, Austria
  13. 13Institute for Medical Biochemistry, University of Veterinary Medicine Vienna, Vienna, Austria
  14. 14Institute of Immunology, University of Veterinary Medicine Vienna, Vienna, Austria
  15. 15Christian Doppler Laboratory for an Optimized Prediction of Vaccination Success in Pigs, University of Veterinary Medicine Vienna, Vienna, Austria
  16. 16The Pirbright Institute, Pirbright, UK (current)
  1. Correspondence to Dr Helmuth Haslacher, Department of Laboratory Medicine, Medical University of Vienna, Wien 1090, Austria; helmuth.haslacher@meduniwien.ac.at

Abstract

Background Serological tests are widely used in various medical disciplines for diagnostic and monitoring purposes. Unfortunately, the sensitivity and specificity of test systems are often poor, leaving room for false-positive and false-negative results. Moreover, conventional approaches increase specificity only at the cost of sensitivity and vice versa. Using SARS-CoV-2 serology as an example, we propose here a novel testing strategy: the ‘sensitivity improved two-test’ or ‘SIT²’ algorithm.

Methods SIT² involves confirmatory retesting of samples with results falling in a predefined retesting zone of an initial screening test, with adjusted cut-offs to increase sensitivity. We verified and compared the performance of SIT² with that of single tests and orthogonal testing algorithms (OTA) in an Austrian cohort (1117 negative, 64 post-COVID-positive samples) and validated the algorithm in an independent British cohort (976 negative and 536 positive samples).

Results The specificity of SIT² was superior to single tests and non-inferior to OTA. The sensitivity was maintained or even improved using SIT² when compared with single tests or OTA. SIT² allowed correct identification of infected individuals even when a live virus neutralisation assay could not detect antibodies. Compared with single testing or OTA, SIT² significantly reduced total test errors to 0.46% (0.24–0.65) at 5% and 1.60% (0.94–2.38) at 20% seroprevalence.

Conclusion For SARS-CoV-2 serology, SIT² proved to be the best diagnostic choice at both 5% and 20% seroprevalence in all tested scenarios. It is an easy-to-apply algorithm and could potentially also be helpful for the serology of other infectious diseases.

  • serology
  • allergy and immunology
  • medical laboratory science




What is already known on this topic

  • Serological tests are widely used throughout medical disciplines. When a serological assay is to be established, usually a threshold is defined above or below which a result is considered indicative of a certain medical condition. This cut-off comes with a distinct sensitivity and specificity. Sensitivity and specificity are communicating vessels—increasing one comes at the cost of the other. Common orthogonal testing algorithms concentrate on confirming positive cases, thereby increasing specificity, but decreasing sensitivity.

What this study adds

  • Here, we propose a novel orthogonal test algorithm applying serological assays with adjusted cut-offs. The reduction of the thresholds for positivity in both the screening and confirmation tests, as well as the additional introduction of a high cut-off in the screening test above which no further confirmation is required, allows the specificity to be increased without compromising the sensitivity. This algorithm, which we termed ‘sensitivity-improved two-test, SIT²’, was derived in an Austrian cohort using five different SARS-CoV-2 antibody tests and validated in an independent UK cohort.

How this study might affect research, practice or policy

  • This paper clearly shows that, in the case of SARS-CoV-2 serology, the use of two randomly chosen test systems allowed test specificity to be increased without impairing sensitivity. This is of particular interest when an ongoing pandemic leads to waning antibody levels; in this case, sensitivity should not be impaired any further. Furthermore, we are confident that the principle of SIT² is universally applicable and that this algorithm could also be used with serological assays other than those for SARS-CoV-2.

Introduction

Serological tests are commonly used diagnostic tools across a broad range of medical fields, spanning from infectiology1 2 to autoimmunity,3 4 oncology5 and transplantation medicine.6 They also play a critical role in animal disease surveillance.7 However, many serological tests come with acceptable but imperfect sensitivities and specificities. Tests with specificities slightly above 90% are considered good8 or even highly specific.5 Yet at low seroprevalence rates, every single per cent counts: if the frequency of a given disease in the tested population is only 5%, a specificity of 90% would mean that, even at a sensitivity of 100%, 5 true positives would be matched by roughly 10 false positives. Thus, the probability that an individual with a positive test is a true positive (the positive predictive value, PPV) would be only about 33%.
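
To make this arithmetic explicit, the following minimal Python sketch computes the PPV from sensitivity, specificity and prevalence using the numbers of this example; it is an illustration of the calculation only, not part of the study's analysis code.

```python
# Positive predictive value (PPV) for the worked example above:
# prevalence 5%, specificity 90%, sensitivity 100%.
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(true positive | positive test), by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sens, spec, prev = 1.00, 0.90, 0.05
print(f"true positives per 100 tested:  {sens * prev * 100:.1f}")              # 5.0
print(f"false positives per 100 tested: {(1 - spec) * (1 - prev) * 100:.1f}")  # 9.5
print(f"PPV: {ppv(sens, spec, prev):.0%}")  # ~34%, i.e. roughly one in three
```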

During the early phase of the SARS-CoV-2 pandemic, seroprevalences were far below 1%.9 Therefore, highly specific test systems (>99.5%) were necessary to provide good PPVs.10 Sensitivity and specificity can be seen as communicating vessels: improving one usually comes at the expense of the other.11 Consequently, test systems adjusted by the manufacturers to very high specificities (>99%) showed only moderate sensitivity. This problem was particularly evident when non-hospitalised patients were included in the cohorts studied.12–14 To further increase specificity at very low seroprevalence levels, various methods have been proposed, for example, raising the thresholds for positivity or confirming every positive result with a second test (orthogonal testing algorithm, OTA).11 15 16 Unfortunately, these specificity improvement strategies inevitably lead to a further reduction of the already moderate sensitivities.
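
The trade-off behind classical OTA can be illustrated with a short sketch. Assuming, purely for illustration, that the two assays err independently, requiring both tests to be positive multiplies the single-test sensitivities while pushing the combined specificity towards 100%; the numbers below are hypothetical, not taken from the study.

```python
# Classical orthogonal testing: a sample is reported positive only if BOTH the
# screening and the confirmatory assay are positive. Under an (illustrative)
# independence assumption, sensitivity shrinks multiplicatively while
# specificity approaches 100%.
def ota_sensitivity(sens_screen: float, sens_confirm: float) -> float:
    return sens_screen * sens_confirm

def ota_specificity(spec_screen: float, spec_confirm: float) -> float:
    # A false positive has to slip through both assays.
    return 1 - (1 - spec_screen) * (1 - spec_confirm)

# Hypothetical single-test characteristics:
print(ota_sensitivity(0.90, 0.90))   # 0.81   -> noticeable sensitivity loss
print(ota_specificity(0.99, 0.99))   # 0.9999 -> near-perfect specificity
```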

As the pandemic progressed, the problem became more pronounced as antibodies declined, and sophisticated statistical models were required to compensate for the waning sensitivity.17 In the case of SARS-CoV-2, as with any evolving pandemic, increasing seroprevalence rates worldwide have attenuated the need for higher specificity.

However, the problem persists in non-epidemic diseases, where seroprevalence remains low. Moreover, each new pandemic begins with extremely low seroprevalence rates, so better diagnostic strategies for infectious disease serology should be ready for the future.

In the present work, we propose for the first time an orthogonal test algorithm based on real-life data for the SARS-CoV-2 antibody tests of Roche, Abbott and DiaSorin and two commercial SARS-CoV-2 ELISAs,18 with the aim of maximising specificity and sensitivity at the same time. Although our algorithm follows a general principle, it was developed based on SARS-CoV-2 antibody tests. The SARS-CoV-2 pandemic provided a unique opportunity in this regard, as historical samples from before the pandemic are negative by definition (thus allowing accurate specificity testing). In addition, sufficient PCR-confirmed positive cases became available quickly, enabling reliable sensitivity verification. Thus, for SARS-CoV-2, in contrast to most other circulating microorganisms, a realistic and unusually accurate estimation of the specificities and sensitivities of serological tests was possible. We took advantage of this to develop our diagnostic algorithm.

Methods

Study design and cohorts

Sera used in this non-blinded prospective cross-sectional study were either residual clinical specimens or samples stored in the MedUni Wien Biobank (n=1181), a facility specialised in the preservation and storage of human biomaterial, which operates within a certified quality management system (ISO 9001:2015).19

For derivation of the SIT2 algorithm, sample sets from individuals known to be negative and positive were established for testing. As previously described,20 samples collected before 1 January 2020 (ie, assumed SARS-CoV-2 negative) were used as a specificity cohort (n=1117): a cross-section of the Viennese population (the LEAD (Lung, hEart, sociAl, boDy) study),21 preselected for samples collected between November and April to enrich for seasonal infections (n=494); a collection of healthy voluntary donors (n=265); a disease-specific collection of samples from patients with rheumatic diseases (n=358) (see also online supplemental tables S1 and S2).


The SARS-CoV-2-positive cohort (n=64 samples from 64 individuals) included patients testing positive by reverse transcription PCR (RT-PCR) during the first wave and their close, symptomatic contacts. Of this cohort, 5 individuals were asymptomatic, 42 had mild to moderate symptoms, 4 reported severe symptoms and 13 were admitted to the intensive care unit. The time since symptom onset was determined by questionnaire for convalescent donors and by review of individual health records for patients; the median was 41 (26, 25–49) days. For asymptomatic donors (n=5), the time since SARS-CoV-2 RT-PCR confirmation was used instead (for more details, see online supplemental tables S1 and S3). All included participants gave written informed consent to donate their samples for research purposes. The overall evaluation plan conformed with the Declaration of Helsinki as well as relevant regulatory requirements.

For validation of the SIT² algorithm, we used data from an independent UK cohort,22 including 1512 serum/plasma samples (536 PCR-confirmed SARS-CoV-2-positive cases and 976 negative cases collected before 2017).

Antibody testing

For the derivation analyses, SARS-CoV-2 antibodies were measured either on three different commercially available automated platforms according to the manufacturers’ instructions (Roche Elecsys SARS-CoV-2 (total antibody assay detecting IgG, IgM and IgA antibodies against the viral nucleocapsid, further referred to as Roche NC), Abbott SARS-CoV-2 IgG assay (nucleocapsid IgG assay, Abbott NC), DiaSorin LIAISON SARS-CoV-2 S1/S2 assay (S1/S2 combination antigen IgG assay, DiaSorin S1/S2)) or using 96-well ELISAs (Technoclone Technozym RBD and Technozym NP) yielding quantitative results18 (for details, see online supplemental methods). The antibody assays used in the validation cohort were Abbott NC, DiaSorin S1/S2, Roche NC, Siemens RBD total antibody and a novel 384-well trimeric spike protein ELISA (Oxford Immunoassay),22 resulting in 20 evaluable combinations. All samples from the Austrian SARS-CoV-2-positive cohort also underwent live virus neutralisation testing (VNT), and neutralisation titres (NT) were calculated, as described in detail in the online supplemental methods.

Sensitivity-improved two-test method

Our newly developed sensitivity-improved two-test (SIT²) method consists of the following key components: (1) sensitivity improvement by cut-off modification and (2) specificity rescue by a second, confirmatory test (figure 1A).

Figure 1

(A) The sensitivity improved two-test (SIT2) algorithm includes sensitivity improvement by adapted cut-offs and a subsequent specificity rescue by retesting all samples within the retesting zone of the screening test with a confirmatory test. (B) Testing algorithm for SIT2 using a screening test on an automated platform (ECLIA/Roche, CMIA/Abbott, CLIA/DiaSorin) and a confirmation test, performed either on one of the remaining platforms or by ELISA (Technozym RBD, NP). (C) All test results between a reduced cut-off suggested by the literature and a higher cut-off, above which no further false positives were observed, were subject to confirmation testing. **Results between 12.0 and 15.0, which the manufacturer considers borderline, were treated as positive; ***suggested as a cut-off for seroprevalence testing; ****determined by in-house modelling; 1see ref 23; 2see ref 24; 3see ref 25.

For the first component of the SIT² algorithm, positivity thresholds were optimised for sensitivity according to the first published alternative thresholds for the respective assays, calculated, for example, by ROC (receiver operating characteristic) analysis.23–25 Additionally, a high cut-off was defined, above which a result can reliably be regarded as true positive without the need for further confirmation. These levels were based on in-house observations20 and represent those values (including a safety margin) above which no more false positives were found. The highest results seen in false positives were 1.800 COI, 2.86 Index and 104.0 AU/mL, respectively. Hence, we defined the high cut-off for Roche and Abbott as 3.00 COI/Index and for DiaSorin as 150.0 AU/mL. The lowering of positivity thresholds improves sensitivity; the high cut-off prevents unnecessary retesting of clearly positive samples. Moreover, the high cut-off avoids possible erroneous exclusion of such samples by the confirmatory test. The newly defined interval between the reduced threshold for positivity and the high cut-off is the retesting zone (figure 1A). The initial antibody test (screening test) is then followed by a confirmatory test, whereby samples falling within the retesting zone of the screening test are retested. Sensitivity-adapted assay thresholds are also needed for the confirmatory test (figure 1A,B). As false-positive samples are usually only positive in one test system (online supplemental figure S1), false positives can be identified and specificity markedly restored with minimal additional testing, as most samples do not fall within the retesting zone.16 20 A flowchart of the testing strategy and the applied cut-off levels and their associated quality criteria are presented in figure 1B,C.
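
As an illustration of this decision flow, the minimal Python sketch below encodes the SIT² logic described above. It is not the authors' implementation; the default thresholds are placeholders taken from the text (the reduced and high cut-offs quoted for Roche NC as screening test and the reduced Abbott NC cut-off for confirmation), and for any real screening/confirmation pair the assay-specific thresholds of figure 1C would apply.

```python
# Minimal sketch of the SIT² decision logic (illustrative, not the study code).
# Placeholder thresholds: Roche NC screening with a reduced cut-off of 0.165 COI
# and a high cut-off of 3.00 COI, confirmed by Abbott NC at 0.55 Index.
from typing import Optional

def sit2(screen_value: float,
         confirm_value: Optional[float] = None,
         screen_low: float = 0.165,   # sensitivity-adapted positivity threshold
         screen_high: float = 3.00,   # above this, no confirmation is required
         confirm_low: float = 0.55) -> str:
    if screen_value < screen_low:
        return "negative"             # below the reduced cut-off: negative
    if screen_value >= screen_high:
        return "positive"             # reliably positive, no retesting needed
    # Retesting zone: the result must be confirmed by the second assay,
    # which is itself read against a sensitivity-adapted cut-off.
    if confirm_value is None:
        return "retest with confirmatory assay"
    return "positive" if confirm_value >= confirm_low else "negative"

print(sit2(5.20))                        # -> positive (above the high cut-off)
print(sit2(0.80, confirm_value=1.20))    # -> positive (confirmed)
print(sit2(0.80, confirm_value=0.10))    # -> negative (not confirmed)
```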

Test strategy evaluation

On the derivation cohort, we compared the overall performance of the following SARS-CoV-2 antibody testing strategies at assumed seroprevalences of 5% and 20%: (1) testing using single assays, (2) simple lowering of thresholds, (3) classical OTA and (4) our newly developed SIT2 algorithm. As part of the derivation, we also benchmarked OTAs and SIT2 against the results of a virus neutralisation assay. On the validation cohort, we compared the performance of OTAs and SIT2. Finally, we used data from this cohort to evaluate the performance of SIT2 versus single tests at seroprevalences of 5%, 10%, 20% and 50% using the Abbott and DiaSorin assays (ie, assays with differing trade-offs between sensitivity and specificity).

Statistical analysis

Unless otherwise indicated, categorical data are given as counts (percentages) and continuous data are presented as median (IQR). Total test errors were compared by Mann-Whitney tests or, where paired, by Wilcoxon tests. 95% CIs for sensitivities and specificities were calculated according to Wilson; 95% CIs for predictive values were computed according to Mercaldo-Wald, unless otherwise indicated. Sensitivities and specificities were compared using z-scores. P values <0.05 were considered statistically significant. All calculations were performed using Analyse-it V.5.66 (Analyse-it Software, Leeds, UK) and MedCalc V.19.6 (MedCalc bvba, Ostend, Belgium). Graphs were drawn using Microsoft Visio (Armonk, USA) and GraphPad Prism V.7.0 (La Jolla, USA).
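
For reference, the Wilson score interval used for sensitivities and specificities has a simple closed form. The sketch below is an illustrative re-implementation; the published CIs were computed with Analyse-it and MedCalc, and the example proportion (62 of 64) is used for illustration only.

```python
# Wilson score 95% CI for a binomial proportion, e.g. a sensitivity of 62/64.
# Illustrative only; the study's CIs were produced with Analyse-it/MedCalc.
from math import sqrt

def wilson_ci(successes: int, n: int, z: float = 1.96):
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

low, high = wilson_ci(62, 64)
print(f"proportion 62/64 = {62/64:.1%}, 95% CI {low:.1%} to {high:.1%}")
# roughly 96.9%, 95% CI 89.3% to 99.1%
```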

Results

In the derivation cohort of 1117 prepandemic sera and 64 sera from convalescent patients with COVID-19 (80% non-hospitalised, 20% hospitalised), the Roche NC, Abbott NC and DiaSorin S1/S2 antibody assays gave rise to 7/64, 10/64 and 11/64 false-negative, as well as to 3/1117, 9/1117 and 20/1117 false-positive results. Assuming a seroprevalence of 20%, this led to 2180, 3120 and 3440 false-negative results per 100 000 tests, and 240, 650 and 1440 false-positive results per 100 000 tests, respectively (figure 2A, right panel).
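
The per-100 000 projections quoted throughout the Results follow directly from the observed error fractions and the assumed seroprevalence. As a check, the short sketch below reproduces the calculation for the Abbott NC single-test figures above; minor differences from the published numbers reflect rounding.

```python
# Projecting observed error counts onto 100 000 tests at an assumed
# seroprevalence, as done throughout the Results section (illustrative sketch).
def errors_per_100k(fn: int, n_pos: int, fp: int, n_neg: int,
                    seroprevalence: float, total: int = 100_000):
    fn_rate = fn / n_pos                 # fraction of true positives missed
    fp_rate = fp / n_neg                 # fraction of true negatives flagged
    fn_projected = fn_rate * seroprevalence * total
    fp_projected = fp_rate * (1 - seroprevalence) * total
    return round(fn_projected), round(fp_projected)

# Abbott NC single test in the derivation cohort: 10/64 FN and 9/1117 FP,
# projected to 100 000 tests at 20% seroprevalence.
print(errors_per_100k(fn=10, n_pos=64, fp=9, n_neg=1117, seroprevalence=0.20))
# -> (3125, 645); the text quotes approximately 3120 and 650 after rounding
```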

Figure 2

False positives (FP)/false negatives (FN) (A) and total error (B) of single tests, tests with reduced thresholds according to refs 23–25, orthogonal testing algorithms (OTAs) and the sensitivity improved two-test (SIT2) algorithm at 5% and 20% estimated seroprevalence. Data in (B) were compared by Mann-Whitney tests (unpaired) or Wilcoxon tests (paired). *p<0.05; **p<0.01; ***p<0.001.

Effects of threshold lowering on sensitivity and specificity

Lowering the positivity thresholds for the Roche NC, Abbott NC and Diasorin S1/S2 to 0.165 COI, 0.55 Index and 9 AU/mL increased the sensitivity significantly and reduced false-negative results to 1/64, 2/64, and 7/64 (320, 620 and 2180 per 100 000 tests at a seroprevalence of 20%), but substantially increased false-positive results to 18/1117, 27/1117 and 31/1117, respectively (1280, 1920 and 2240 per 100 000 tests, at an assumed seroprevalence of 20%; online supplemental table S4, figure 2A, right panel).

Classical OTA compared with SIT2

Subsequently, we evaluated 12 OTA combinations using the fully automated SARS-CoV-2 antibody tests from Roche NC, Abbott NC and DiaSorin S1/S2 as screening tests, each combined with one of the other fully automated assays or a commercially available NC or RBD-specific ELISA as a confirmation test. Combining these tests as classical OTAs significantly increased specificity and reduced false positives to 0 (0–1)/1117. However, the rate of false negatives was 14 (12–16)/64 (1095 (955–1230) per 100 000 tests at 20% seroprevalence), and, therefore, considerably higher than for single testing strategies. In contrast, the SIT2 algorithm minimised false positives to 0 (0–2)/1117 (0 (0–140) per 100 000 tests at 20% seroprevalence) while also reducing false negatives to 5 (3–8)/64 (1560 (940–2420) per 100 000 tests at 20% seroprevalence, figure 2A right panel; online supplemental table S5).

Reduction of total error rates by the SIT2

Of all the methods assessed, SIT2 reached the lowest total error rates per 100 000 tests at both 5% and 20% assumed seroprevalence (455 (235–685) and 1600 (940–2490) per 100 000 tests) (figure 2B). At a seroprevalence of 5%, OTAs on average performed better than the Abbott NC and DiaSorin S1/S2 single tests, though not better than the Roche NC single test (OTA 1095 (955–1325) vs 830 (Roche NC), 1540 (Abbott NC) and 2570 (DiaSorin S1/S2) per 100 000 tests). At a seroprevalence of 20%, however, the performance of OTAs worsened compared with single tests (OTA 4380 (3820–5000) vs 1600 (Roche), 2540 (Abbott) and 4420 (DiaSorin) per 100 000 tests) (figure 2B). Therefore, at both 5% and 20% seroprevalence, SIT2 resulted in the lowest overall errors. Compared with OTAs, SIT2 yielded a similar improvement in specificity without suffering a significant sensitivity reduction (online supplemental figure S2). Since the better overall performance of SIT2 compared with OTAs was due not to increased specificity but to improved sensitivity, we subsequently examined these differences in more detail.

Sensitivities of single tests, OTA and SIT2 in relation to NT

Next, we compared the sensitivities of the three screening tests as single tests and in both two-test methods (OTA and SIT2), benchmarking them in the Austrian sensitivity cohort (n=64), which was simultaneously evaluated with an authentic SARS-CoV-2 VNT. Regardless of the screening test used (Roche NC, Abbott NC or DiaSorin S1/S2), OTAs had lower sensitivities than single tests (80.5% (78.5–83.6), 78.1% (75.8–82.8) or 75.8% (71.5–78.9) vs 89.1%, 84.4% or 82.8%, respectively), and SIT2 showed the best sensitivities of all methods (95.3% (93.0–96.5), 93.8% (92.2–96.5) or 87.5% (85.1–88.7)) (figure 3). SIT2 algorithms incorporating the Roche NC and Abbott NC assays achieved similar or even higher sensitivities than VNT (figure 3, VNT reference line), made possible by the unique retesting zone of SIT2 (online supplemental figure S3).

Figure 3

Sensitivities of single tests, orthogonal testing algorithms (OTAs) and the sensitivity improved two-test (SIT2) algorithm. The dotted line indicates the sensitivity of virus neutralisation test (VNT).

Validation of the SIT2 using an independent cohort

To confirm the improved sensitivity of SIT2 compared with OTA, we analysed the sensitivities of OTAs and SIT2 in an independent validation cohort of 976 prepandemic samples and 536 post-COVID samples. Out of 20 combinations using the assays Roche NC (total antibody), Abbott NC (IgG), DiaSorin S1/S2 (IgG), Siemens RBD (total antibody) and Oxford trimeric-S (IgG), SIT2 showed a statistically significant improvement in sensitivity over OTAs in 18 combinations (figure 4). For the remaining two combinations (Siemens RBD with Oxford trimeric-S and vice versa), performance was comparable, but no statistically significant improvement was achieved because of the high pre-existing sensitivities of these assays in this particular sample cohort.

Figure 4

Differences in sensitivity and specificity (mean±95% CI) between the sensitivity improved two-test (SIT2) algorithm and standard orthogonal testing algorithms (OTAs) within the UK validation cohort. *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001.

To further illustrate the effect of SIT2 on the outcome of SARS-CoV-2 antibody testing, we compared single testing versus SIT2 with the Abbott and DiaSorin assays at varying assumed seroprevalences (5, 10, 20 and 50%), given that the Abbott NC assay is a highly specific (99.9%), but moderately sensitive test (92.7%), and the DiaSorin S1/S2 assay has the most limited specificity (98.7%) of all evaluated assays but an acceptable sensitivity (96.3%). Regardless of whether a lack of specificity (DiaSorin S1/S2) or sensitivity (Abbott NC) had to be compensated for, SIT2 improved the overall error rate compared with the individual tests in all four combinations and at all four assumed seroprevalence levels (figure 5).

Figure 5

Comparing false-positives (FP), false-negatives (FN) and total error (TE) for two selected test systems, (A) Abbott, (B) DiaSorin, between different sensitivity improved two-test (SIT2) combinations and the respective single test within the UK validation cohort for different estimated seroprevalences.

Discussion

Serology is a commonly used, multi-purpose analytical method.1–6 However, not all serological assays have adequate sensitivities and specificities, especially in low-prevalence settings. The SARS-CoV-2 pandemic prompted the simultaneous development of several antibody tests and, unusually, made it possible to evaluate these tests with both confirmed positive and confirmed negative cases, the latter derived from biobank collections established before the virus emerged. In the case of SARS-CoV-2, false-positive samples are usually not simultaneously reactive in different test systems.16 20 This led to the hypothesis that a two-test approach with reduced thresholds for positivity in both the screening and confirmation tests could increase specificity without impairing sensitivity. A further improvement in sensitivity was achieved by defining a high cut-off for the screening test above which, owing to the excellent reliability of high test results, no further confirmation (and thereby no possible false-negative result in the confirmation test) was necessary.

In the early waves of the SARS-CoV-2 pandemic, many commercially available SARS-CoV-2 antibody tests did not provide sufficient specificity to achieve acceptable PPVs, for example, at a seroprevalence of 1–5%.15 20 Lowering positivity thresholds can improve test sensitivity,23–25 and conventional orthogonal testing can maximise specificity.11 26 27 The latter may increase the PPV, but this gain is only relevant at low seroprevalences. However, since seroprevalence is often unknown and varies widely from region to region, it is difficult to judge whether a less specific or a less sensitive test is the lesser of two evils.

Based on real-world SARS-CoV-2 data, we propose a new, universally adaptable two-test system that, in the case of SARS-CoV-2, performed better than any other known approach regardless of the actual seroprevalence: the sensitivity-improved two-test or SIT2. To this end, we established the algorithm in our COVID-19 cohort (1181 samples: 1117 prepandemic negative and 64 confirmed post-COVID-positive samples) and validated it in a completely independent UK cohort (1512 samples: 976 negatives and 536 positives). The associations found were therefore not tied to a particular cohort or analysing institution. All Austrian cohort samples were tested with the following assays: Roche, Abbott, DiaSorin S1/S2, Technozym RBD and Technozym NP. The UK validation cohort included a complete data set of all samples analysed with the Roche, Abbott, DiaSorin S1/S2, Siemens and Oxford assays. Hence, the Austrian and the UK cohorts shared three test systems (Roche, Abbott and DiaSorin S1/S2) but differed regarding specific characteristics of the included negative and positive samples. Besides these three overlapping test systems, each cohort contributed data from two additional SARS-CoV-2 antibody assays exclusive to that cohort. The use of these different combinations underscores the universality of SIT².

Its generalisability is further supported by the following features: (1) the adapted cut-offs used to optimise sensitivity were determined in various independent studies and were not explicitly calculated for our cohort,23–25 (2) SIT2 was effective, although with different efficiencies, in a total of 32 different test combinations and (3) SIT2 was successfully validated in an independent cohort that differed profoundly from the derivation cohort. The robustness of a diagnostic algorithm to analytical variability (lot-to-lot variability, instrument-dependent variability or method-specific confounders) is essential. Based on our study design with three overlapping assays (Roche, Abbott and DiaSorin) tested at two sites with two different cohorts but without lot matching, we did not find any adverse effects of these potential confounders on the robustness of our algorithm. Moreover, we estimated the robustness of SIT² to between-lot variability by simulating how the algorithm’s performance would change if results varied according to their respective reference change values. For this, we used a SIT² algorithm consisting of Roche and Abbott as an example and concluded that the expected between-assay variability would only marginally affect the algorithm (data not shown). Therefore, SIT2 does not require a particular infrastructure, high-performance individual test systems or specific reagent lots to work but can optimise the performance of any available test system.
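
A robustness check of this kind can be sketched as follows: each measured value is perturbed within an assumed reference change value and the SIT²-style classification is re-evaluated. The perturbation magnitude, cut-offs, example results and random model below are illustrative assumptions, not the parameters used in the study.

```python
# Schematic robustness check: perturb results within an assumed reference
# change value (RCV) and count how often the SIT²-style classification flips.
# All parameters here are illustrative, not those used in the study.
import random

def classify(screen: float, confirm: float,
             screen_low: float = 0.165, screen_high: float = 3.0,
             confirm_low: float = 0.55) -> bool:
    if screen < screen_low:
        return False
    if screen >= screen_high:
        return True
    return confirm >= confirm_low

def flip_rate(samples, rcv: float = 0.30, n_sim: int = 1000, seed: int = 1) -> float:
    """Fraction of simulated perturbations that change a sample's classification."""
    rng = random.Random(seed)
    flips = trials = 0
    for screen, confirm in samples:
        baseline = classify(screen, confirm)
        for _ in range(n_sim):
            s = screen * (1 + rng.uniform(-rcv, rcv))
            c = confirm * (1 + rng.uniform(-rcv, rcv))
            flips += classify(s, c) != baseline
            trials += 1
    return flips / trials

# Three hypothetical (screening, confirmation) result pairs near the cut-offs:
print(flip_rate([(0.15, 0.50), (0.20, 0.60), (2.80, 0.10)]))
```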

Our SIT2 strategy can rescue the specificity with minimal repeat testing required (see online supplemental table S6). For example, when applying the Roche NC assay as a screening test to our cohort, only 27 out of 1181 samples needed confirmation testing with the Abbott NC test to correctly identify 62/64 true positives. Simultaneously, all false-positive results were eliminated, including those added by lowering the cut-offs (online supplemental table S4 and figure S1). Additionally, SIT2 was more sensitive than VNT, which identified only 60/64 clinical positives (figure 3). This result is not completely surprising, as it is known that not all patients who have recovered from COVID-19 show detectable levels of neutralising antibodies.28 Nevertheless, it should be noted that although antibody binding assays may have a higher sensitivity than neutralisation assays, they only partially reflect the functional activity of SARS-CoV-2-reactive antibodies.29 30

The sensitivity of SARS-CoV-2 tests may change over time, as prominently shown in a Brazilian study, where pronounced antibody waning led to an apparent decrease in seroprevalence only a few months after a SARS-CoV-2 infection wave.17 However, this was mainly caused by the strongly decreasing sensitivity of the test system used. The measured seroprevalence decreased from 46.3% in June 2020 to only 20.7% in October 2020 when the standard manufacturer cut-off of 1.4 was used for the Abbott NC test. When the same data were analysed with a reduced cut-off of 0.4, the values changed from 54.3% in June to 44.6% in October, so the apparent decrease in seroprevalence was much less pronounced. Lowering the cut-off to increase the sensitivity of a test system (and, therefore, also to compensate for such time-dependent sensitivity losses) is the first step of our SIT2 algorithm. As this cut-off lowering reduces the specificity of a test (with the 0.4 cut-off, the seroprevalence in June was 8 percentage points higher than with the 1.4 cut-off, including more false positives), it is necessary to rescue this loss of specificity with a second test (also made highly sensitive by cut-off lowering). This illustrates that, even though some test systems lose sensitivity more rapidly over time and antibody levels physiologically decline with time, SIT2 offers a strategy to counteract this development by increasing sensitivity through cut-off lowering and subsequently correcting specificity. Thus, these time-dependent sensitivity changes are not a significant problem for SIT2. Accordingly, there are far-reaching potential applications. Regarding SARS-CoV-2, on the one hand, the use of an algorithm of this kind could increase the reliability of seroprevalence analyses, especially in low-prevalence areas. On the other hand, its use in routine clinical diagnostics is also conceivable. In the case of SARS-CoV-2, the emergence of new viral variants particularly affects test sensitivity.31 This could be counteracted by increasing sensitivity through modified cut-offs, with specificity subsequently restored by a second test.
For SARS-CoV-2 testing, it must further be emphasised that different mechanisms of immunisation induce different humoral responses: whereas an infection usually leads to both antinucleocapsid and antispike antibodies, the immune response to an mRNA-based, vector-based or protein-based vaccine that introduces only the spike protein lacks antinucleocapsid antibodies.32 Accordingly, among vaccinated individuals, tests assessing antispike antibodies might not be useful for detecting previous SARS-CoV-2 infection, as the measured antibody levels would have been at least partly induced by the vaccine. However, an additional infection could boost antispike levels.33 These conditions must be considered when searching for the optimal combination of tests for a SIT2 approach.

Our study has both strengths and limitations. One strength is the size of the cohorts examined, both for deriving the SIT2 algorithm (n=1181) and for validating it (n=1512). The composition of our specificity cohort is also unique: it consists of three subcohorts with selection criteria chosen to further challenge analytical specificity. The lower cut-offs used to increase sensitivity were not modelled within our data sets but were derived from ROC analysis data of independent studies.23–25 Furthermore, we were able to test the performance of the two-test systems in a total of 32 combinations, 12 in the derivation cohort and another 20 in the validation cohort. As a limitation, only samples collected ≥14 days after symptom onset were included in the Austrian cohort; therefore, no conclusions on sensitivity during the early seroconversion phase can be drawn from these data. Furthermore, mild and asymptomatic cases were under-represented in the British cohort, possibly leading to higher observed sensitivities of the test systems. Moreover, the analysis only included samples collected during the first wave; therefore, positive individuals were most likely infected by the wild-type virus. However, as stated above, the emergence of new variants challenges a test system’s sensitivity even more, which only reinforces the need to increase sensitivity without harming specificity, as we propose here with SIT2.

In conclusion, we describe the novel two-test algorithm SIT2, which makes it possible to maintain or even significantly improve sensitivity while approaching 100% specificity.

Data availability statement

Data are available upon reasonable request. Data are available to interested researchers upon request from the corresponding author.

Ethics statements

Patient consent for publication

Ethics approval

The study was reviewed and approved by the ethics committee of the Medical University of Vienna (1424/2020). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

We sincerely thank Marika Gerdov, Susanne Keim, Karin Mildner, Elisabeth Ponweiser, Manuela Repl, Ilse Steiner, Christine Thun, and Martina Trella for excellent technical assistance. Finally, we want to thank all the donors of the various study cohorts.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Handling editor Tahir S Pillay.

  • Contributors TP and TK contributed equally. Conceptualisation: TP, TK, OFW, HH. Methodology: TP, TK, NP-N, HH; investigation: TP, TK, NP-N, MO-K, DWE, PM, AB, NS, M-LB, RB-K, OCB, SH, DA, DS, PQ, RM, PM, AR, MK, MD, BHo, BHa, RS, GL, FG, WG, RG, HH; data curation: TP, TK, DW, PM, AB, NS, M-LB, RB-K, OCB, SH, DA, DS; project administration: PM, AR; formal analysis: HH; validation: DWE, PM, AB, NS; writing—original draft: TP, TK, NP-N, HH; visualisation: HH; supervision: OFW, CJB, HH; resources: DWE, PM, AB, NS, M-LB, RB-K, OCB, SH, DA, DS, PQ, RM, MK, MD, BHo, BHa, RS, GL, FG, WG, RG, OFW, CJB; writing—review and editing: all authors; guarantor: HH.

  • Funding The MedUni Wien Biobank is funded to participate in the biobank consortium BBMRI.at (www.bbmri.at) by the Austrian Federal Ministry of Science, Research and Technology. There was no external funding received for the work presented. However, test kits for the Technoclone ELISAs were kindly provided by the manufacturer.

  • Competing interests NP-N received a travel grant from DiaSorin. DWE reports lecture fees from Gilead outside the submitted work. OCB reports grants from GSK, grants from Menarini, grants from Boehringer Ingelheim, grants from Astra, grants from MSD, grants from Pfizer, and grants from Chiesi, outside the submitted work. SH does receive unrestricted research grants (GSK, Boehringer, Menarini, Chiesi, Astra Zeneca, MSD, Novartis, Air Liquide, Vivisol, Pfizer, TEVA) for the Ludwig Boltzmann Institute of COPD and Respiratory Epidemiology, and is on advisory boards for GSK, Boehringer Ingelheim, Novartis, Menarini, Chiesi, Astra Zeneca, MSD, Roche, Abbvie, Takeda and TEVA for respiratory oncology and COPD. PQ is an advisory board member for Roche Austria and reports personal fees from Takeda outside the submitted work. The Dept. of Laboratory Medicine (Head: OWF) received compensations for advertisement on scientific symposia from Roche, DiaSorin, and Abbott and holds a grant for evaluating an in-vitro diagnostic device from Roche. CJB is a Board Member of Technoclone. HH receives compensations for biobank services from Glock Health Science and Research and BlueSky immunotherapies.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.