How providers of external quality assessment (EQA) programmes relate to and interact with the monitors and watchdog of clinical laboratory performance in the UK is described. With regard to the quality of antibiotic assays, the changes in methodologies and in performance quality between 1971 (when the UK NEQAS for Antibiotic Assays began) and 1999 are reviewed. How improvements in performance and changes of methodology are related is discussed. The findings and conclusions of two experimental pilot EQA distributions (the teicoplanin assay and serum bactericidal test) are also discussed.
- external quality assessment
- antibiotic assays
- United Kingdom National Quality Assessment Schemes
External quality assessment (EQA) is one of many procedures used in a laboratory to ensure that its tests are accurate and reproducible and its interpretations are appropriate. In the UK, EQA in clinical laboratories involves three organisations:
The providers of EQA programmes, among which the UK National Quality Assessment Schemes (UK NEQAS) are pre-eminent.
The monitors of laboratory performance, the National Quality Assurance Advisory Panels (NQAAP). Each EQA provider is required to report poor performing UK clinical laboratories to an appropriate NQAAP.
An independent watchdog, the Joint Working Group on Quality Assurance (JWG), which provides the necessary safeguards of quality without the requirement for mandatory licensing or legislation.
Some explanation of how these organisations work together is necessary to understand how EQA functions in the UK.
External quality assessment programmes in the UK began over 30 years ago when interested professionals began to look at the quality of laboratory tests through the use of interlaboratory comparisons. The results of these studies revealed that, with many investigations, there was so little consensus between laboratories that urgent action was needed.1–3 This led to the establishment of UK National Quality Assessment Schemes (UK NEQASs) for clinical chemistry in Birmingham and for haematology in London some 30 years ago. Subsequently, further UK NEQAS programmes were initiated in several centres organised by experts in the various fields. Their common aim was to improve the reliability of laboratory investigations through the use of homogeneous samples distributed to many laboratories, with the programmes designed to be educational rather than punitive. In the 1990s, the individual schemes formed a consortium to take collective responsibility for maintaining the professional standards and characteristics of UK NEQAS through adherence to an agreed code of practice. In 1995, the consortium formed a charity (Pathology Quality Assessment (PQA) Ltd), the principal objective of which was “to advance education and promote the preservation of good health by providing external quality assessment services for clinical laboratories”. PQA is run by an executive elected by the UK NEQAS consortium from members of each division of laboratory medicine and is accountable to the full membership. A UK NEQAS board, comprising the executive and four additional advisors from purchasers and the professions to provide external accountability, held its inaugural meeting on 15 December 1999. All UK NEQAS programmes are funded through annual subscriptions on a non-profit making basis. Any EQA scheme can apply to join the UK NEQAS consortium but must agree to uphold the UK NEQAS code of practice.
The National Quality Assurance Advisory Panels (NQAAPs) are professional groups that have responsibility for maintaining satisfactory standards of analytical and interpretive performance in all (private or public) UK laboratories that perform investigations for the detection, diagnosis, or management of disease in humans. Each panel comprises representatives of the Royal College of Pathologists, the Institute of Biomedical Science, and other appropriate professional bodies. The chairman reports back to the Joint Working Group on Quality Assurance (JWG). Currently, there are five panels: chemical pathology; haematology; histopathology and cytopathology; immunology; and medical microbiology.
Although NQAAPs are not part of the UK NEQAS organisation, UK NEQAS and other approved EQA providers in the UK are required, using agreed measures of performance, to bring poorly performing laboratories to the attention of the appropriate panel. The panel usually contacts a poorly performing laboratory to offer help and advice. Laboratory identity is not revealed to the panel unless poor performance continues, in which case the chairman contacts the laboratory with the aim of resolving the problem. An organiser would normally have had correspondence with a laboratory before involving the appropriate panel.
The JWG is an independent professional watchdog comprising representatives of the professional bodies associated with laboratory medicine. Its major roles are:
To observe, support, and monitor the activity of NQAAPs.
To deal with complaints regarding EQA, which have not been satisfactorily resolved by the EQA provider or the appropriate NQAAP.
To consider suggestions for modification, development, or expansion of EQA in any particular area.
To formulate and update the process of handling unsatisfactory performance of UK clinical laboratories.
The JWG currently comprises 19 members representing 12 professional bodies (Association of Clinical Biochemists, Association of Clinical Cytogeneticists, Association of Clinical Pathologists, Association of Clinical Microbiologists, British Blood Transfusion Society, British Society for Clinical Cytology, British Society for Haematology, British Society for Immunology, Clinical Molecular Genetics Society, Institute of Biomedical Science, Pathological Society of Great Britain and Ireland, and Royal College of Pathologists), NQAAP chairmen, and observers from the Department of Health and Clinical Pathology Accreditation (UK) Ltd.
History and origins of the UK NEQAS for Antibiotic Assays
When the aminoglycoside gentamicin was introduced for clinical use, medical microbiology laboratories became aware of the need to assay serum gentamicin concentrations to optimise efficacy while minimising ototoxicity and nephrotoxicity. David Reeves decided to exchange specimens of gentamicin in serum with three other laboratories in 1971 and 1972, and the assay results showed such a worrying lack of interlaboratory agreement that the British Society for Antimicrobial Chemotherapy (BSAC) and the Public Health Laboratory Service (PHLS) funded some larger surveys in 1973 and 1974.
Table 1 summarises the developments in the scheme. By the mid 1970s there were around 300 participants. This number is almost the same today, despite continued laboratory rationalisation and amalgamation, but now about 12% are non-UK laboratories. The scheme has its own web site (www.ukneqasaa.win-uk.net) with links to the UK NEQAS home page (www.ukneqas.org.uk).
How performance is scored and monitored by the UK NEQAS for Antibiotic Assays
THE STATISTICAL BASIS OF ASSESSING PERFORMANCE
Like many EQA programmes, the UK NEQAS for Antibiotic Assays chose to score performance statistically by taking account of both the accuracy (closeness to the true concentration) and reproducibility (degree of variability when the same sample is assayed several times) of clinical antibiotic assays.
Accuracy (or perhaps more appropriately inaccuracy) can be expressed in terms of bias, which may be positive (consistently above the true concentration), negative (consistently below the true concentration), fixed (always x mg/litre), proportional (always x%), or variable. Thus, every individual assay result will have an associated bias. The scheme chose to call this bias the “%error” for a particular result; some other EQA programmes use the term BIAS.
Reproducibility (or perhaps more appropriately irreproducibility) can only be calculated if more than one determination is made. Rather than ask a laboratory to assay the same sample more than once (because this would allow for the possibility of “trimming” to improve the data, that is, ignoring the worst replicates), the scheme chose to base the determination of reproducibility on a batch of six samples of differing concentration. The sample standard deviation (SD) of the six %error values about their mean (the mean %error, mean BIAS, or MBIAS) was chosen as the measure of reproducibility. Some other EQA programmes use the term VAR for this parameter.
DEFINING ACCEPTABLE AND POOR PERFORMANCE
It was considered important that the definition of poor performance should be based on medical need rather than purely on statistical parameters, and Reeves4 suggested that a clinical aminoglycoside assay result should be within 25–30% of the true concentration if it was to be sufficiently reliable to be used as the basis for making a clinical decision (that is, a dosage correction). This definition was made at a time when many laboratories would have had difficulty achieving it (see below), but remains unchanged today and is used as the basis for determining poor performance of all analytes distributed by the scheme.
Taking the returns of an individual laboratory for six samples, the mean %error (MBIAS) and the SD about this mean (VAR) are calculated. If the MBIAS is negative the modulus is taken (that is, the sign is ignored) and this value is added to twice the SD to create the so called MEAN +2 SD. Laboratories scoring a MEAN +2 SD of 30 or less are considered to be performing acceptably (table 2) and are given a score of +2. This scoring system is currently harmonised with the scoring system of the UK NEQAS for General Microbiology.
In simple language, a MEAN +2 SD of X (where X is 30 or less) approximates to: “this laboratory is performing satisfactorily and 95% of the time the results from this laboratory will be within ± X% of the true concentration”.
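The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scheme's own software: the function names and the sample concentrations are hypothetical, and only the acceptable threshold (a MEAN +2 SD of 30 or less) is taken from the text, because the other bands of table 2 are not reproduced here.

```python
from statistics import mean, stdev

ACCEPTABLE_LIMIT = 30.0  # MEAN +2 SD of 30 or less is acceptable (from the text)


def percent_errors(reported, true):
    """Signed %error (bias) of each result relative to the true concentration."""
    return [100.0 * (r - t) / t for r, t in zip(reported, true)]


def mean_plus_2sd(reported, true):
    """Modulus of the mean %error (MBIAS) plus twice the sample SD (VAR)."""
    errors = percent_errors(reported, true)
    mbias = mean(errors)   # mean %error over the batch
    var = stdev(errors)    # sample SD of the %errors about their mean
    return abs(mbias) + 2.0 * var


def is_acceptable(reported, true):
    """True when the batch meets the acceptable criterion described above."""
    return mean_plus_2sd(reported, true) <= ACCEPTABLE_LIMIT


# Hypothetical batch of six samples (mg/litre); not data from the scheme.
true_conc = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

# Every result 5% high: a fixed proportional bias with no scatter.
biased = [2.1, 4.2, 6.3, 8.4, 10.5, 12.6]
print(round(mean_plus_2sd(biased, true_conc), 1))     # approximately 5.0

# Results alternately 10% high and 10% low: no mean bias, but poor reproducibility.
scattered = [2.2, 3.6, 6.6, 7.2, 11.0, 10.8]
print(round(mean_plus_2sd(scattered, true_conc), 1))  # approximately 21.9
```

The second batch shows why both components are combined: a laboratory with no net bias can still approach the limit through scatter alone.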
Trends in methodology
In the early 1970s, bioassay was the most commonly used methodology, and of the three bioassay techniques in use when the scheme began (plate assay with Gram negative indicator strain, plate assay with Gram positive indicator strain, and a broth dilution method) the Gram negative plate rapidly became dominant. The broth dilution method disappeared by 1977 as a result of its poor performance5 (see below). Non-bioassay methods were developed to overcome the shortcomings of bioassays. A method based on aminoglycoside modifying enzymes and radioactive cofactors, the transferase method,6 appeared in the mid 1970s but never gained wide appeal, possibly because of the need for a scintillation counter. This method disappeared in the mid 1980s, a few years after a commercial enzyme immunoassay (EMIT) for gentamicin was launched in the UK. Despite the expense of EMIT kits, the method rapidly gained popularity, becoming the pre-eminent method by 1984; its rise being mirrored by the decline of bioassay usage.
A commercial latex agglutination assay kit (Macro-Vue) appeared in 1979, and because it was inexpensive compared with EMIT and did not require expensive equipment there was a view that it might become very popular. However, it never achieved this and was withdrawn from the market after 1985.
Fluoroimmunoassays appeared in the 1980s and these comprised commercial kits based on somewhat differing techniques, including quenching fluoroimmunoassay7 and substrate labelled fluoroimmunoassay.8 However, the whole field was revolutionised in the early 1980s when Abbott launched its fluorescence polarisation immunoassays (FPIAs) and the TDX analyser. Despite high costs the assays were easy to perform, very fast, and highly reproducible. The Abbott FPIA performed on the TDX (or latterly the FLX analyser) rapidly became the most popular method for aminoglycoside assays, a position it still holds today. Now, third party kits for use on the Abbott analysers are available and FPIA remains the method of choice for most laboratories for gentamicin, tobramycin, netilmicin, and amikacin assays.
The trends are summarised in table 3. Demand for vancomycin assay samples has steadily increased and FPIA has been by far the most popular method since 1985 when its users first outnumbered bioassay users. High performance liquid chromatography (HPLC) has never been popular for vancomycin assays. The EMIT vancomycin assay that appeared in the 1990s never became very popular, despite its improved specificity for microbiologically active vancomycin compared with the Abbott FPIA for the TDX/FLX.9 This is probably because of the decline in the use of EMIT for aminoglycoside assays and the associated decommissioning of appropriate analysers.
Trends are illustrated in fig 1. Chloramphenicol was added in 1982 initially with 20 participants (14 bioassay, six HPLC). The EMIT assay appeared in 1987, participation continued to increase, and within a year EMIT had become the most popular method. In contrast to vancomycin (see above), the numbers of laboratories performing chloramphenicol assays have, after peaking in the mid 1980s, steadily declined. The main reason for this is the reduced number of clinical indications for chloramphenicol treatment as a result of the availability of newer less toxic alternatives (such as third generation cephalosporins and fluoroquinolones). The Gram positive plate method has disappeared and very few laboratories continue with any form of bioassay. EMIT popularity peaked in the late 1980s but declined rapidly in the 1990s, possibly because most laboratories that stopped assaying chloramphenicol were EMIT users, but also because of a change from EMIT to HPLC for purely financial reasons. Southmead Hospital made this change back to HPLC because the reduced demand for chloramphenicol assays made running an EMIT analyser solely for this purpose uneconomical. By 1996, there were on average only 11 chloramphenicol returns, the bulk of which were by HPLC (one bioassay, three EMIT). HPLC is the method of choice for most of the laboratories that still assay chloramphenicol, and the scheme currently has only one EMIT user and one bioassay user. The UK NEQAS for Antibiotic Assays continues to supply chloramphenicol assay samples for those few remaining laboratories that still perform the assay. Performance is not currently formally scored (table 4) because of the small number of returns (< 10).
Trends are summarised in table 3. The numbers of laboratories assaying flucytosine have remained small. No commercial immunoassays are available for this drug and yeast bioassay and HPLC have vied for popularity. Both methods are in common usage but gas liquid chromatography (GLC) and fluorimetry10 have not been used.
Trends in performance
For the purposes of this article I have looked at trends in performance in three different ways, namely:
Mean laboratory scores have been tabulated for each analyte. In general, the nearer the mean is to 2 the better, but it must be borne in mind that when numbers of participants are small (for example, flucytosine and latterly chloramphenicol) one poor performing laboratory will have a large influence on the mean score.
Numbers of advisory letters sent by the scheme organiser, NQAAP, and the chairman of NQAAP to laboratories with performance problems.
The percentage acceptable and percentage poor performers using a particular method at a particular time.
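The caveat about small numbers of participants in the first of these measures can be illustrated with a short arithmetic sketch. The figures are hypothetical: every laboratory but one is assumed to score +2, and the remaining laboratory is assumed to score −1, a value used here purely for illustration.

```python
from statistics import mean


def mean_score(n_labs, n_poor, good=2, poor=-1):
    """Mean laboratory score when n_poor of n_labs score `poor` and the rest `good`.

    The score values are illustrative assumptions, not the scheme's full scale.
    """
    scores = [good] * (n_labs - n_poor) + [poor] * n_poor
    return mean(scores)


# One poor performer among 10 participants (for example, a small analyte group).
print(mean_score(10, 1))    # 1.7

# One poor performer among 300 participants.
print(mean_score(300, 1))   # 1.99
```

With these assumed scores, a single poor return lowers the mean by 0.3 in the small group but by only 0.01 in the large one.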
MEAN LABORATORY SCORES FOR EACH ANALYTE
Table 4 tabulates the mean scores and overall mean scores. For every analyte, with the exception of amikacin, there was a consistent improvement after its introduction into the scheme, followed by a plateau at about 1.8 for those analytes assayed mainly by immunoassay and a somewhat lower plateau (1.5–1.6) for flucytosine and chloramphenicol. Despite the fact that most chloramphenicol assays were immunoassays (EMIT) in the late 1980s, mean scores never reached those achieved by the aminoglycosides and vancomycin.
NUMBERS OF ADVISORY LETTERS SENT
White5 summarised these data up to 1997 and they are updated here. Between 1986 and mid 1999, 513 organiser's letters, 155 NQAAP letters, and only six chairman's letters were sent to participants. Numbers of organiser's letters have, after initial high activity, remained relatively constant (> 30 in 1986 and 1987, between 16 and 24 each period up to 1992, between five and 23 after 1992). The number of NQAAP letters sent for each six monthly period has steadily declined (> 10 in 1986 and 1987, between five and 10 each period up to 1992, < 5 after 1992, < 3 after 1995, and only one in the whole of 1998). The chairman has not written to any laboratory since 1995. Because the purpose of the organiser's letters is to advise laboratories of a performance concern that will be brought to the attention of NQAAP if it continues, it appears that they are effective. These data all suggest continually improving performance, which is now of a high standard.
ACCEPTABLE AND POOR PERFORMERS USING A PARTICULAR METHOD
Taking the data for gentamicin assays as a model and scoring performance using the current criteria outlined in table 2, some interesting observations can be made.5 The tube dilution bioassay and the Gram positive plate had 93% and 43% poor performers, respectively, so it is not surprising that they rapidly lost popularity. The Gram negative plate, which dominated in the 1970s and early 1980s, had fewer than 3% poor performers, but < 40% acceptable performers, leaving a very large number of borderline laboratories. The commercial latex agglutination immunoassay performed poorly (37% poor, only 15% acceptable), which might explain its withdrawal from the market. If the leading immunoassays (EMIT and fluoroimmunoassay) are examined, a clear difference between the 1980s and the 1990s can be seen. Over 60% of EMIT users were performing acceptably and just below 10% were performing poorly in the 1980s. In the 1990s there were no poor performers, yet acceptable performance was still only around 60%. In the 1980s, fluoroimmunoassay performance was not impressive, with < 40% acceptable performers and 15% poor performers (worse figures than Gram negative bioassay!). However, these assays were a mixture of various commercial kits, some of which performed better than others. By the 1990s almost all fluoroimmunoassay users used FPIA kits run on Abbott TDX or FLX analysers. Fluoroimmunoassay performance is now impressive, with around 90% of users performing satisfactorily and < 2% performing poorly. As stated above, with fluoroimmunoassay this improvement is mainly the result of a move to FPIA; with EMIT it might be related to a move from manual to automated assays, but the continued surveillance that EQA provides might also be important.
With regard to chloramphenicol assays only 32% of bioassay users performed well and 30% performed poorly, probably accounting for the rapid decline of this method. In contrast, both EMIT and HPLC showed over 70% of acceptable performers in the 1980s, with only 9% and 12% of poor performers, respectively. As with aminoglycoside immunoassays, both HPLC and EMIT chloramphenicol assay performance showed further improvement in the 1990s, with acceptable performance rising to over 80% and poor performance dropping to < 2% for both techniques.
With flucytosine assays, where HPLC and bioassay are equally popular, it is not possible to say that one method performs better than another. There are laboratories that consistently perform either method well. However, HPLC does have the advantage of speed over bioassay for clinical flucytosine assays.
Common reasons for poor performance
The single most common reason for poor performance of an antibiotic assay remains gross errors (termed “blunders”) that are not related to the method used. The two most common errors are:
Transposition of results.
Failure to multiply the result of a diluted sample by the dilution factor.
These types of mistakes can only be minimised by improved working practices, stringent internal quality checks, and detailed standard operating procedures. Human errors will always be made, but systems can be continually improved on the basis of experience to minimise and/or identify them. EQA, laboratory accreditation and the philosophy of clinical governance can only aid this process.
EQA samples do sometimes reveal unexpected problems. A participant a few years ago discovered that their new laboratory computer system only allocated one digit before the decimal point for gentamicin assay results. Only when a UK NEQAS return that was more than 10 mg/litre had the first digit removed by the computer, plunging the laboratory score to −1, did the problem come to light! It is the first and only time a laboratory has thanked the organiser for a poor score!
Twenty two laboratories in the UK, Eire, France, Germany, and Switzerland were sent two distributions, each of six samples.11 Most laboratories (14 of 22) used an FPIA kit manufactured by Oxis (Portland, USA) for use with the Abbott TDX/FLX analyser, and performance was generally satisfactory with at least one distribution. Some laboratories used bioassay (three of 22) or HPLC (five of 22), and some of these performed satisfactorily, whereas others did not. Only seven laboratories performed acceptably with both distributions (five FPIA, one HPLC, and one bioassay). Currently, only a few UK laboratories assay teicoplanin, most assays being performed in reference centres. If in house assay becomes more widespread in the future, the distribution of EQA specimens might be indicated. Whether sufficient participants (> 10) could be recruited in the UK or whether a European wide circulation will be required is not clear at the present time.
Serum bactericidal titre (SBT) determination was a controversial test with many protagonists and antagonists. Before it was possible to assess its clinical usefulness it was considered necessary for the quality of the investigations and the interpretive criteria used by laboratories to be determined.
Two hundred laboratories completed methodology/interpretation questionnaires and were sent experimental samples; initially Staphylococcus aureus and a serum containing vancomycin and gentamicin, and subsequently penicillin sensitive and penicillin tolerant streptococci and a freeze dried serum containing penicillin.
The returns indicated that a very wide range of interpretative criteria, definitions of endpoints, and methodologies were used. When asked to define satisfactory pre and post dose titres in the management of α-haemolytic streptococcal endocarditis, 12 different sets of recommendations with five or more proponents were put forward, indicating an alarming lack of consensus! Approximately 75% of laboratories correctly determined a bactericidal titre using S aureus or a penicillin sensitive streptococcus, but only 34% correctly determined the bactericidal titre for a penicillin tolerant streptococcus.12
A second distribution comprised tolerant and sensitive streptococci, together with a control strain and methodology recommendations, plus a methodology mini questionnaire based around issues that the previous distribution had suggested could be related to good or bad performance. Multivariate regression was performed and, of 81 returns that were analysed, 24 gave a correct result for all three strains. Responses to only two questions were identified as predictors of a laboratory getting a correct result. These questions were:
Did you add a measured volume to the recovery medium? (p = 0.034)
The odds of getting all three strains correct were 5.8 times higher (95% confidence interval (CI), 1.17 to 30.1) if the laboratory gave a “yes” reply to the first question, but lower (odds ratio, 0.21; 95% CI, 0.05 to 0.98) if they replied “yes” to the second question. The data were insufficient to allow for an interaction between the questions to be tested.
Only 11 of the 200 original participants returned acceptable results for all five streptococcal strains distributed and these included the expert laboratories used for predistribution testing. In view of the lack of consensus for interpretation and the inability of > 60% of laboratories to determine correctly a bactericidal titre with a penicillin tolerant strain, it is impossible to recommend the use of the bactericidal titre test and no further EQA distributions are planned. The BSAC Endocarditis Working Party has ceased to recommend the use of this test,13 based on the lack of evidence supporting its predictive accuracy of treatment outcome.
The data presented above show that since UK NEQAS started circulating samples for antibiotic assays there has been a considerable improvement in performance, together with a massive swing away from bioassay to immunoassay for those analytes (aminoglycosides, glycopeptides, and chloramphenicol) for which commercial kits are available. The EMIT immunoassay dominated the chloramphenicol assays at one time, but as demand for chloramphenicol assays declined and numbers of participants dropped, HPLC became the most popular method. Performance analysis showed that EMIT users performed similarly to HPLC users with regard to chloramphenicol assays. Demand for chloramphenicol assays has, as stated above, dropped dramatically because of changes in clinical practice and laboratory rationalisation, but has EQA of antibiotic assays been a “shrinking market” over the past 20 years? Table 5 summarises the picture. Although demand for the original analyte (gentamicin) has dropped slightly and that for tobramycin and netilmicin has dropped considerably, the new analytes amikacin, vancomycin, and flucytosine have shown a sustained growth in demand in the 1990s. The demand for vancomycin samples, for example, has doubled. Taking the addition of new analytes into account as an important factor, EQA activity in antibiotic assays has increased by nearly 50% since 1981 and nearly 15% since 1990. The UK NEQAS for Antibiotic Assays remains the largest EQA programme for antibiotic assays in Europe. As for the future, the addition of teicoplanin assays is likely if laboratories begin to assay this antibiotic more frequently, and the loss of chloramphenicol assays is a real possibility. It is possible that collaboration between UK NEQAS and other EQA providers will be the most fruitful way to deal with those analytes for which the number of provider laboratories is declining (chloramphenicol) or small (flucytosine) and the newer analytes such as teicoplanin.