Down's syndrome screening: a controversial test, with more controversy to come!
T M Reynolds

Queen's Hospital and Division of Clinical Sciences, Wolverhampton University, Wolverhampton WV1 1SB, UK

Professor Reynolds, Clinical Chemistry Department, Queen's Hospital, Belvedere Road, Burton on Trent, Staffordshire DE13 0RB, UK Tim.Reynolds{at}Queens.Burtonh-tr.wmids.nhs.UK

## Abstract

By 1998, most health authorities offered antenatal screening for Down's syndrome, usually by biochemical methods. To date, the development of this form of screening has not been coordinated by a national body and, consequently, there are wide variations in practice between localities. Fortunately, many of these variations have not led to any noticeable inequality of health provision, but the wide variation in risk cut offs used by different centres does. Other variations merely lead to potentially unnecessary expenditure; whereas it is believed that adding extra tests to the screening procedure is beneficial (such as moving from a double test to a triple test), statistical evaluation of the confidence intervals for the detection rates quoted indicates that there is no evidence that the extra test provides an increase in detection. The cervical screening programme has progressively improved, partly through the auspices of a national framework. A similar national approach would benefit Down's screening and is only now being considered: the National Screening Committee (NSC) is currently drafting recommendations. To ensure optimum screening performance, the NSC should specify the risk thresholds applied and the screening protocols to be used (that is, an opt-in programme with a minimum, possibly even a maximum, of two biochemical analytes or a nuchal fold evaluation), and perhaps should even recommend national population parameters to be used for risk calculation. It might even be advisable for statistical work to be carried out to determine whether local derivation of medians is truly necessary. Furthermore, defined options for older women could be specified: for example, should all older patients have the option to proceed directly to amniocentesis if they wish, or should National Health Service amniocentesis be available only for those with a “high risk” screening result?
The difficulties that will face the NSC in deciding which screening policy to adopt are also considered; specifically, the lack of evidence to suggest that triple testing is superior to double testing, and the lack of evidence to prove the superiority of one analyte over another. This inadequacy of evidence is not from want of trying, but is caused by the problems of collecting enough data to provide statistical significance. Finally, there is one important difference between cervical and Down's syndrome screening that has a major impact on the advice given by any “expert”; namely, patents. Many aspects of Down's screening are subject to patents and, therefore, there is more potential for apparently uncontroversial decisions to rebound with future retrospective patent infringement claims. Thus, it would be sensible to insist that any member of a national body deciding upon Down's screening policy must fully disclose all potential conflicts of interest, both personal and family, before they are allowed to sit on the committee. Furthermore, if a national policy is decided upon, worldwide patent searches should be carried out to determine whether there are any possible unforeseen legal consequences of any recommendation.

• trisomy 21
• screening
• nuchal fold

Down's syndrome screening became feasible in 1988 when the seminal paper on the subject was published.1 The first routine National Health Service (NHS) Down's syndrome screening programme using biochemical markers began in February 1990 in Newport and Cardiff. This was initially run as a one year prospective trial, which demonstrated that screening was acceptable to both doctors and the local population.2 By 1998, 72% of UK health authorities offered serum screening for all women, 10% offered age restricted serum screening, and only 8% restricted screening to age alone. Most of those offering biochemical screening use just two markers (α fetoprotein (AFP) and either total or free human chorionic gonadotrophin (β-HCG)). Nuchal fold screening was provided by 7% of health authorities to women of any age.3

Dutch researchers have recently called for all pregnant women to be offered screening for Down's syndrome in the first trimester of pregnancy using a combination of maternal age, ultrasound, and biochemical markers. Currently, the only legal method for Down's screening in the Netherlands is maternal age.4 However, serum screening has been available in the Netherlands for many years, despite government disapproval, although not reimbursed by the health care system.5 Pressure for change has come from a publication of the Amsterdam Medical Centre's screening results, suggesting an 85% detection rate for a 5% false positive rate.6 In the UK, there has been little government pronouncement on the issue of Down's screening.

Since its inception, Down's screening has been controversial. This is primarily because pregnancy is a very common condition and a child with Down's syndrome is seen as a disaster by many potential parents. The frequency of pregnancy means that the test has to be carried out on a very large number of people. Consequently, there is a large market for reagents and even a small profit margin can net a large income, in several different ways. Furthermore, the impact on parents means that Down's screening is one of the few situations where failure of a midwife to collect a sample, or failure of the sample to arrive at the laboratory, can result, and has resulted, in litigation in which the screening laboratory has been the defendant for others' negligence.

(J Clin Pathol 2000;53:894–899)

## Patents

Recently, the Biomedical Patent Management Association filed a patent infringement suit and retrospectively demanded $5 for every Down's syndrome screening test that had been carried out in the USA using HCG (the patent did not apply in the UK). The state of California alone had screened 12 million women at this time and the ongoing additional cost to their screening programme would be $1.7 million/year.7 At the same time, it was asked whether the possession of patents affected the impartiality of persons promoting various screening modalities, with the suggestion that patent holding without clear declaration is unethical. Indeed, concern about such commercialism was expressed almost from the very beginning of Down's screening.8

## Software and population parameters

Recently, it was reported that there had been a computer problem at a Sheffield laboratory, which had resulted in several patients being given incorrect Down's risk assessments. Software has caused considerable problems for Down's screening. There are several commercial software packages available, most of which perform their function perfectly adequately, but most have the disadvantage that they are stand alone packages, not integrated with the main laboratory computer. This not only leads to extra expense in terms of hardware but also to a potential for data loss because, despite exhortations to the contrary, stand alone PCs are rarely backed up as assiduously as they should be. A further problem is that such software is often a “black box” and users rarely understand how the results are generated. Consequently, when the National External Quality Assessment Scheme (NEQAS) sent MoM (multiple of the median) results on dummy patients for Down's syndrome risk calculation to participating laboratories, the coefficient of variation of the estimated risks was 36.5% for double screening with AFP and total HCG, 31.5% for double screening with AFP and free β-HCG, and 55.6% for triple screening with AFP, HCG, and unconjugated oestriol (uE3).9 These variations were sufficient to change the estimate of risk from clearly indicating amniocentesis to clearly precluding amniocentesis, simply as a result of software variation. Much of this variation is likely to result from different embedded population parameters in the software.

The difficulty with population parameters extends further than just the ability to alter values recorded in software. There is a multiplicity of different population parameters available, but no one can be sure which parameters should be applied to their population. There are several schools of thought. Essentially, some believe only locally derived population data should be applied, which leads to extreme difficulties in deriving local Down's syndrome population parameters, whereas others believe that published population parameters are acceptable. Herein lies another problem: there are many different sets of population parameters available; which should one use? Furthermore, the decision on screening parameters can be affected by commercial interests because if the parameters published for one analyte are given incorrectly, these can falsely improve the apparent performance of another analyte and indicate that extra screening tests must be added.10 In the case of one analyte, the failure to present important data in a publication was also queried.11,12

## Analytes: which and how many?

This is another very controversial area. As stated above, most screening laboratories use a double test because AFP alone is only marginally effective; some add a third analyte, uE3, and still fewer add a fourth analyte, inhibin A.3 In the past, up to five analytes in a single screen have been suggested.13 The arguments for increasing the number of analytes always hinge upon gains in detection rate, with no increase in false positive rate, which can then be used in cost effectiveness analyses to justify the extra expense of the third assay.1,13–23 The evidence against adding a third analyte, which includes results published by the manufacturer of the first second trimester optimised uE3 assay (other uE3 assay kits are now available), either shows no increase in detection or a loss of detection,24–32 or questions the validity of the assay.33 Furthermore, some researchers have reported evidence in favour of the use of uE3 in one study16 but against it in another.27

Why should there be such a major split in the literature about such a fundamental question? The simple answer could be that the difference in performance is close to zero, but that each individual study is too small for any observed differences to be significant. Most studies describe no more than 100 cases of Down's syndrome. With minor variation for maternal age distribution, the SD of the detection rate, assuming a fixed 5% false positive rate, can be estimated as $38.9/\sqrt{N}$ for a triple test and as $34.8/\sqrt{N}$ for a double test.34 For example, for a study containing 20 Down's cases with a detection rate of 67%, the SD of a triple test detection rate would be $38.9/\sqrt{20} = 8.7\%$, giving rise to an approximate 95% confidence interval of 67% ± 2 × 8.7% or 49.6% to 84.4%. Because most studies estimate benefits of only a few per cent, it is clear that despite the large number of studies carried out, there is in fact no evidence that triple testing is superior to double testing because in every study the overlap of confidence intervals is so great that there is no true statistical difference between the detection rates in the different arms of the study! Studies that claim to show a benefit from adding a fourth analyte are similarly inconclusive.
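The confidence interval arithmetic above can be reproduced with a short sketch. The constants 38.9 and 34.8 are those quoted in the text for a fixed 5% false positive rate; the function name `detection_rate_interval` and the dictionary `SD_CONSTANT` are illustrative names only, not part of any published software.

```python
import math

# Constants quoted in the text (reference 34), valid for a fixed 5% false positive rate.
SD_CONSTANT = {"triple": 38.9, "double": 34.8}

def detection_rate_interval(detection_rate_pct, n_cases, test="triple"):
    """Return (sd, lower, upper) in percentage points, using DR +/- 2 x SD."""
    sd = SD_CONSTANT[test] / math.sqrt(n_cases)
    return sd, detection_rate_pct - 2 * sd, detection_rate_pct + 2 * sd

# The worked example from the text: 20 Down's cases, 67% detection, triple test.
sd, low, high = detection_rate_interval(67.0, 20, "triple")
print(f"SD = {sd:.1f}%, interval = {low:.1f}% to {high:.1f}%")
# prints: SD = 8.7%, interval = 49.6% to 84.4%
```

Applying the same function to a study of 100 cases still gives an interval of roughly ± 7.8 percentage points for a triple test, which is wider than the few per cent of benefit that most studies report.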

The next area of controversy is whether free β-HCG is superior to total HCG. There is a large literature suggesting that it does achieve a higher detection rate,11,29,35–39 with some dissenting voices.23,40–44 Others have identified problems of monoclonal antibody specificity measuring different variants (nicked and non-nicked) of free β-HCG,45 and have queried the validity of free β-HCG results because of sample instability.46–51 In common with the difference between a double and a triple test, all studies demonstrating the benefit of free β-HCG have been small. Therefore, there is once again no evidence that free β-HCG is significantly superior to total HCG.

Notwithstanding the worries about free β-HCG sample instability, this lack of clear evidence either way for free β-HCG or total HCG probably explains the split between laboratories found in Wald's survey.3 Several of the random access immunoanalysers (for example, Abbott AxSym, Bayer Immuno1, Chiron ACS) carry out total HCG assays, whereas others (for example, Wallac DELFIA) carry out free β-HCG assays. The choice of Down's syndrome screening assay is thus influenced by the department's other immunoassay needs, rather than by any specific Down's syndrome screening related imperative. The choice to use more than two analytes is also governed more by resource availability than by scientific concerns. However, in the current financial situation, the NHS should consider seriously whether the evidence available warrants the extra spending on more than two assays for second trimester Down's screening. This should be considered urgently by the National Screening Committee (NSC).

## When should screening be done: chemistry v ultrasound or both?

Currently, most Down's screening takes place in the second trimester using biochemical markers. The next big debate in respect of Down's screening is when the test should be carried out and how: biochemically, ultrasonographically, or by both modalities. It has also been suggested that testing should be carried out in both the first and second trimesters, with results being imparted after the second trimester result.52 This approach has been criticised ethically and scientifically,53,54 but is too recent for its impact to be identified.

First, let us assume that first trimester testing is better for the patient and consider the potential markers. The most effective first trimester biochemical markers appear to be free β-HCG and pregnancy associated plasma protein A (PAPP-A), but other substances such as dimeric inhibin A have also been shown to be markers.55–66 However, some data suggest that inhibin A is only a weak marker in the first trimester and is better in the second.67 The other available marker is nuchal fold translucency, an ultrasonographically measured parameter that can be used without biochemistry to assess the risk of trisomy.68

If nuchal translucency alone could be used as a screening test, it would offer immediate superiority to biochemical screening because the measurement, risk calculation, and result presentation could all be carried out in a single session. However, the UK multicentre project68 demonstrating the “superiority” of nuchal fold screening had a major flaw: it was an interventional study—if an anomaly was identified action was taken. Consequently, the 80% detection rate presented in the paper was biased because the numerator (the Down's cases detected) was inflated by those Down's cases that would have spontaneously aborted and the denominator (total of cases detected and born) was similarly incorrect. Consequently, the true detection rate for nuchal fold screening should have been approximately 60% for a false positive rate of 5%,69–71 which is comparable with most second trimester screening programmes. Furthermore, nuchal translucency can be used to screen for other major defects, especially cardiac anomalies, which may indicate that it identifies those Down's fetuses that are most likely to abort spontaneously.72,73 Consequently, the currently available evidence is not sufficient to identify whether nuchal fold screening, instead of biochemical screening, is a better way to spend the limited funds available.
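The direction of this bias can be illustrated with a small sketch. All the numbers below are hypothetical and chosen only to show the mechanism; they are not data from the multicentre project, and the function names are illustrative. The sketch removes from both numerator and denominator those cases assumed to be destined for spontaneous loss, and compares a scenario in which losses are spread evenly with one in which they are concentrated among the screen-detected cases.

```python
def detection_rate(detected, missed):
    """Detection rate as a percentage of all affected cases."""
    return 100.0 * detected / (detected + missed)

def term_detection_rate(detected, missed, lost_detected, lost_missed):
    """Detection rate among cases that would have reached term,
    after removing cases assumed to be destined for spontaneous loss."""
    return detection_rate(detected - lost_detected, missed - lost_missed)

# Hypothetical cohort: 100 affected pregnancies, 80 screen detected, 20 missed,
# of which 40 in total are assumed to be destined for spontaneous loss.
print(detection_rate(80, 20))                # apparent rate: 80.0
print(term_detection_rate(80, 20, 32, 8))    # losses spread evenly: 80.0
print(term_detection_rate(80, 20, 38, 2))    # losses mostly among detected: 70.0
```

Under these assumptions the apparent detection rate is unbiased only when fetal loss is unrelated to the screening result; if the marker preferentially identifies fetuses that would have been lost, the rate among term pregnancies is lower than the rate reported.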

Biochemical studies of first trimester screening have usually been retrospective so do not suffer the same fetal loss bias as the nuchal fold study. These give detection rates comparable with second trimester studies. Using the same statistical method as described above, there is therefore no evidence that first trimester biochemical screening provides superior detection. Any decision to change must be based on the evidence of the patient benefits of early screening. One difficulty that will be encountered if it is decided to introduce first trimester screening is that many women do not present to their doctor early enough for first trimester screening to be carried out. This means that it might be necessary for laboratories to offer both first and second trimester screening tests.

Finally, would it be possible to combine nuchal fold and biochemical screening? This appears to provide the best opportunity to improve detection rates. Some impressive detection rates have been quoted, but often with high false positive rates or very small samples (for example, an 87.5% detection rate for a 14% false positive rate61; 100% detection in women aged < 35 years and 92% for women aged > 35 years for a 5% false positive rate, although there were only six and 12 cases, respectively64; and a 76% detection rate for a 5% false positive rate65). One trial using stored sera from 210 Down's syndrome cases suggests that a detection rate of 89% could be achieved at a false positive rate of 5%.74 Unfortunately, because the cases came from the UK multicentre project,68 the detection rate estimate suffers from the same flaws. Consequently, the detection rate estimate cannot be compared directly with second trimester detection rates for the same reasons as described above. Empirically, 20% of detection was excluded from the multicentre results and a similar amount could be excluded from this trial to give a 69% detection rate for a 5% false positive rate. This is in a similar range to some estimates of second trimester detection rates, and given the wide confidence intervals for detection rates, we again cannot be certain that it offers any advantages.

At present, the benefits of combined nuchal fold and biochemical screening can only be estimated from models until prospective trials have been carried out. Model based screening programmes are often contentious and attract statistical comment.75,76 However, it will be difficult to carry out truly prospective trials in the future because the established “standard of care” now includes prenatal diagnosis and an offer of termination.

The Royal College of Obstetricians and Gynaecologists organised a study group to consider the issue of first trimester screening. The report77 concluded that: (1) preliminary data supported the use of nuchal fold translucency screening in centres with experienced ultrasound staff who participate in an external quality assurance scheme and are subject to regular audit; (2) the combination of biochemical and ultrasonographic markers might prove superior to either marker type alone; and (3) a national screening policy should be implemented. The final verdict is yet to be decided . . .

## Progress towards national coordination

The NSC antenatal subgroup is currently considering Down's screening and draft recommendations are soon to be published. These are rumoured to include a recommendation that a standardised risk threshold is used to determine who should be considered “high risk”. It is likely that a 1 : 250 risk cut off will be chosen as the national risk threshold because this is the most commonly used.78

## Conclusions

Since antenatal serum screening for Down's syndrome began in 1988 there have been a great many developments but, despite occasional calls for a national screening policy, no coordination has yet been imposed. Consequently, even when screening is performed, there may be wide differences in the risk cut offs applied and thus the detection rate.78 For example, in one village local to my laboratory, it is possible for screening samples to be sent to one centre, which has a 1 : 250 cut off, another with a 1 : 150 cut off, and another with a 1 : 100 cut off. This is confusing for patients and could be a source of litigation. The changes in other screening services, especially cervical screening and the advent of NICE (National Institute for Clinical Excellence) and CHIMP (Commission for Health Improvement) have shown that not only does national policy improve effectiveness but there is now a political wish for change. It is therefore excellent news that the NSC is to implement recommendations to ensure consistency of screening performance.
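The effect of the three local cut offs can be shown with a minimal sketch. The patient risk of 1 : 200 is a hypothetical value, and the convention that a risk equal to the cut off counts as “high risk” is an assumption; the function name `is_high_risk` is illustrative.

```python
def is_high_risk(risk_denominator, cutoff_denominator):
    """A risk of 1 : risk_denominator is classed as high risk when it is at
    least as great as the cut off 1 : cutoff_denominator. A smaller
    denominator means a higher risk. Treating equality as high risk is an
    assumption made for this sketch."""
    return risk_denominator <= cutoff_denominator

CUTOFFS = [250, 150, 100]   # the three cut offs mentioned in the text
example_risk = 200          # hypothetical patient with a 1 : 200 risk

for cutoff in CUTOFFS:
    label = "high risk" if is_high_risk(example_risk, cutoff) else "low risk"
    print(f"cut off 1 : {cutoff} -> {label}")
# prints:
# cut off 1 : 250 -> high risk
# cut off 1 : 150 -> low risk
# cut off 1 : 100 -> low risk
```

The same result would therefore lead to an offer of amniocentesis at one centre and not at the other two.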

However, it is essential that the difference between Down's screening and cervical screening is recognised and acted upon. It is clear that researchers have the right to patent their findings but, equally clearly, if they are to be involved in making national policy, any potential benefit to them or their families from those patents should be known to other members of the panel. This will protect them from later allegations that their advice to the panel was subject to bias if it is to their financial benefit.

The NSC is clearly going to deal with the issue of risk thresholds but there are several other issues that are vital to ensure true parity of screening. Down's screening relies on software interpretation of results by comparison with a set of standard population parameters. Because the many studies carried out have produced a wide range of different parameters for each analyte it is possible for one set of results to generate many different risk estimates, as has been demonstrated by NEQAS studies. I believe it would be entirely rational for the NSC to define a small number of acceptable “national standard population parameter” sets, which should be regularly reviewed to take account of new developments. This parameter set would include the population mean and SD for unaffected, Down's syndrome, and trisomy 18 pregnancies for each analyte, with defined weeks of gestation for which the parameter set is valid. Some analytes, such as PAPP-A, which change throughout the period for which they are useful, might need a more complex definition, including a calculation to account for gestational age. This is one area where complete impartiality/disclosure of interests would be essential because the incorrect setting of one parameter can falsely indicate that extra screening tests are useful.10
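How a change in population parameters alters the reported risk can be sketched as follows. The article does not specify the calculation used by any software package; the sketch assumes a commonly described likelihood ratio approach, simplified here to a single analyte with Gaussian distributions of log10 MoM, whereas real packages combine several analytes. The parameter sets `SET_A` and `SET_B`, the prior risk, and the MoM value are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def risk_denominator(mom, prior_denominator, affected, unaffected):
    """Return n for a risk of 1 : n, from a single analyte likelihood ratio.
    affected and unaffected are (mean, sd) of log10 MoM (hypothetical values)."""
    x = math.log10(mom)
    likelihood_ratio = gaussian_pdf(x, *affected) / gaussian_pdf(x, *unaffected)
    prior_odds = 1.0 / (prior_denominator - 1)      # risk 1 : n is odds of 1 to (n - 1)
    posterior_odds = prior_odds * likelihood_ratio
    return 1.0 + 1.0 / posterior_odds

UNAFFECTED = (0.0, 0.25)   # hypothetical mean and SD of log10 MoM
SET_A = (0.30, 0.27)       # hypothetical affected parameters, set A
SET_B = (0.20, 0.30)       # hypothetical affected parameters, set B

for name, affected in (("A", SET_A), ("B", SET_B)):
    n = risk_denominator(2.0, 450, affected, UNAFFECTED)
    print(f"parameter set {name}: risk 1 : {n:.0f}")
```

With these made-up values, the same MoM and the same prior risk fall on opposite sides of a 1 : 250 cut off depending only on which parameter set is embedded in the software.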

Defining national population parameter sets would of course have a knock on effect on software. Some software packages are black boxes that have their own defined population parameters that are not accessible to the user. It would be essential for the NSC to have the authority to order software manufacturers to alter their population parameters or to specify which features make software acceptable, with loss of accreditation for screening if inappropriate software is used.

The implementation of a national screening framework would also have to include some strategy for audit. This would be easier if screening were carried out by large units because there would be less coordination required to link cytogenetic and laboratory/ultrasonography results. The NSC may therefore consider a minimum workload rule, similar to that used for cervical cytology screening centres. Working in a laboratory that processes only approximately 2500 Down's samples each year, I believe that small (local) is good because we can return results in hours when sending samples to a large centre would take days. It may be better for a centrally funded audit programme (essentially a central, virtual laboratory) to be developed, which would collect screening results from individual laboratories (allowing comparison of risk estimate distributions against the mathematically derived theoretical distribution), collect cytogenetic reports to match with screening tests, and collect birth defect registry reports to assess false negative results.

The final task of a national audit group should be to determine whether the current practice of every centre setting its own local medians is really necessary. At present, all laboratories tend to derive their own calculation factors and may use different strategies, resulting in different MoM distributions when calculations are completed. The aggregation of data nationally would allow statistical analysis to determine whether medians could be specified for particular batches/brands of assay.
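The dependence of the MoM on the chosen median can be made concrete with a short sketch. A MoM is the measured concentration divided by the median for unaffected pregnancies at the same gestation; the concentration and the two medians below are hypothetical values, and the function name is illustrative.

```python
def multiple_of_median(measured, median_for_gestation):
    """MoM: measured concentration divided by the median concentration for
    unaffected pregnancies at the same gestational age."""
    return measured / median_for_gestation

# Hypothetical AFP result and two hypothetical locally derived medians for
# the same gestational week and the same assay.
afp_result = 30.0
local_medians = {"laboratory 1": 40.0, "laboratory 2": 36.0}

for lab, median in local_medians.items():
    print(f"{lab}: MoM = {multiple_of_median(afp_result, median):.2f}")
# prints:
# laboratory 1: MoM = 0.75
# laboratory 2: MoM = 0.83
```

Because the MoM is the input to the risk calculation, differences of this size between locally derived medians carry through to the risk estimate, which is why pooled national data would be needed to judge whether local derivation is necessary.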

## Declaration

I declare the following personal and family interests in Down's screening:

1. Patents: none held.

2. Research grants: none currently held. One grant for 30 000 Swiss francs held in the past.

3. Software: Downcalc, DOS based software, non-millennium compliant. Previously made available at cost varying from £0 to £50 (profit approximately £500). RiskCalc, Windows based. Software provided free of charge if for research purposes only. Seven copies sold (profit after cost of computer upgrade £700).

4. Consultancies: in the past, I have acted as a paid consultant on behalf of several diagnostic companies in respect of Down's syndrome screening developments. I am not currently acting as a consultant for any company in this area (total income £5000).

5. Medicolegal: I have acted as a paid expert witness in one case of litigation in respect of Down's syndrome screening.