Background: The original role of the National Health Service breast screening programme (pathology) external quality assessment (EQA) scheme was educational; it aimed to raise standards, reinforce use of common terminology, and assess the consistency of pathology reporting of breast disease in the UK.
Aims/Methods: To examine the performance (scores) of pathologists participating in the scheme in recent years. The scheme has evolved to help identify poor performers, reliant upon setting an acceptable cutpoint. Therefore, the effects of different cutpoint strategies were evaluated and implications discussed.
Results/Conclusions: Pathologists who joined the scheme improved over time, particularly those who did less well initially. There was no obvious association between performance and the number of breast cancer cases reported each year. This is not unexpected because the EQA does not measure expertise, but was established to demonstrate a common level of performance (conformity to consensus) for routine cases, rather than the ability to diagnose unusual/difficult cases. A new method of establishing cutpoints using interquartile ranges is proposed. The findings also suggest that EQA can alter a pathologist’s practice: those who leave the scheme (for whatever reason) have, on average, marginally lower scores. Consequently, with the cutpoint methodology currently used (which is common to several EQA schemes) there is the potential for the cutpoint to drift upwards. In future, individuals previously deemed competent could subsequently be erroneously labelled as poor performers. Due consideration should be given to this issue with future development of schemes.
- EQA, external quality assessment
- NHSBSP, National Health Service breast screening programme
- RCPath, Royal College of Pathologists
- external quality assessment
The success of the UK National Health Service breast cancer screening programme (NHSBSP) is dependent on a high quality service. This programme required from the outset a commitment from the NHS to a systematic programme of quality assurance involving all stages of the screening process. For pathologists in particular, this required for the first time mandatory participation in an external quality assessment (EQA) scheme. The scheme was introduced nationally in 1990, with the objective of improving the consistency of pathologists reporting breast disease.
In the late 1990s the Department of Health, in conjunction with the Royal College of Pathologists (RCPath) and the Association of Clinical Pathologists, established a working group to make recommendations for the future of EQA (in histopathology and cytopathology). Interestingly, the RCPath recommendations1 are not prescriptive with regard to the type of scoring system to be used. Indeed, because of the variation in the case mix and case difficulty, different methods are in operation in different schemes.2 A key problem with EQA in histopathology is defining the correct diagnosis. In most schemes, the consensus answer is regarded as the correct diagnosis. Furthermore, for most schemes only cases that achieve 80% or more agreement among participants with regard to the diagnosis are used to determine the performance scores of candidates. Cases with less than 80% agreement are rejected for the purposes of performance assessment.
The RCPath report does not recommend a single minimum score. However, the document did recommend that the best approach was to compare an individual’s score with those of their peers. There remains controversy over what can be deemed a poor level of performance in EQA.3,4
Apart from the NHSBSP EQA, which is compulsory for pathologists who participate in breast screening, other histopathology EQAs have been voluntary. This has started to change, and in future, participation in EQAs will be required by the General Medical Council as part of a consultant’s portfolio for revalidation.5 Furthermore, because the NHSBSP EQA is about to obtain Clinical Pathology Accreditation (CPA) in 2005, the scheme will be able to implement the action points for identified poor performance. This involves a graded incremental response that includes advice, education, and if necessary investigation and intervention.
Here, we document our experience with the scheme in recent years and discuss some of the implications for the future.
MATERIALS AND METHODS
Data for our study were obtained from the NHSBSP EQA. A general description and details of standard operating procedures have been published recently.6 In brief, over 50 sets of 12 cases are circulated to pathologists in the UK who report breast pathology over a three month period. Each case comprises one haematoxylin and eosin stained slide representative of the lesion removed from a patient. The cases are submitted by participants for use in the scheme. The participating pathologist independently examines the slides and for each case completes a tick box proforma, which includes their opinion on the diagnosis. Where appropriate (for example, for carcinomas), additional information regarding tumour size and grade can be recorded. Each pathologist has a unique identification code number, which they record on each submitted proforma. These completed proformas are returned to the cancer screening evaluation unit (Institute of Cancer Research, Sutton, UK), where the participants’ opinions for each case are collated. This procedure is repeated twice a year.
The proforma returns are analysed after the closure date for slide circulation.
Cases are scored as follows.
Each diagnosis is categorised into one of four possibilities. A maximum of three points is awarded when the participant agrees with the consensus answer. A score of two points is allocated if the participant deviates by one diagnostic group and one point if the participant deviates by two groups (table 1).
Thus, for an invasive carcinoma (consensus diagnosis), a participant would score three points if this was the offered diagnosis or two points if in situ carcinoma was offered. One point would be awarded for atypical hyperplasia and none if a benign diagnosis was tendered.
For each participant, the points for each case are totalled and the results (final score) expressed as a percentage as follows:
score (%) = total points/(number of cases × 3) × 100.
Scores are rounded to the nearest whole number. For the 12 cases, possible scores are 100 (36/36), 97 (35/36), 94 (34/36), 92 (33/36), etc. The fact that there are no scores of 95, 96, 98, or 99 etc. is an artefact of the scoring system. Very occasionally, a few pathologists do not examine all 12 cases and this can lead to other scores, because the divisor will be less than 36.
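The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the order of the four diagnostic groups is taken from the worked example in the text, the function and variable names are assumptions made for the sketch, and table 1 may contain details that are not captured here.

```python
# Ordered diagnostic groups, as implied by the worked example in the text.
# The labels are illustrative, not the scheme's official codes.
GROUPS = ["benign", "atypical hyperplasia", "in situ carcinoma", "invasive carcinoma"]


def case_points(consensus: str, offered: str) -> int:
    """Three points for agreement, minus one per diagnostic group of deviation."""
    deviation = abs(GROUPS.index(consensus) - GROUPS.index(offered))
    return 3 - deviation


def percentage_score(consensus: list[str], offered: list[str]) -> int:
    """score (%) = total points / (number of cases x 3) x 100, rounded half up."""
    total = sum(case_points(c, o) for c, o in zip(consensus, offered))
    divisor = 3 * len(consensus)
    # Integer arithmetic avoids Python's round-half-to-even behaviour.
    return (200 * total + divisor) // (2 * divisor)
```

Only the cases a participant actually examined should be passed in, so that the divisor falls below 36 when fewer than 12 cases are returned.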
Participants receive feedback in the form of a document after each EQA circulation. This document details individual scores in the form of a list of participant code numbers and score obtained, together with consensus diagnosis and specialist opinion of the cases. There is also a regional feedback session at which pathologists meet and are able to review cases retrospectively, together with the consensus diagnosis and specialist opinions.
Cutpoints can be established using rank order and percentile methodologies. Both these methods are in use in histopathology EQAs in the UK. Since 2003, the fifth percentile method has been used by the NHSBSP EQA.
In the rank order method, scores are placed in ascending order and the cutpoint established by the number of participants in the bottom X% (for example, 2.5%). In contrast, in the (per)centile method, the cutpoint is defined as the score below (but not at) a designated value. Centiles are obtained by calculating:
q = k (n + 1)/100, where k is the centile and q is the ranked observation. For example, the fifth centile of 149 observations is q = 5 × 150/100 = 7.5. Therefore, the fifth centile is half way between the seventh and eighth observations. If the seventh observation is 97 and the eighth is 107 the fifth centile is 102. Both methods suffer from the problem that the cutpoint is often in the middle of a group, so the cutpoint is taken below that group.
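The centile calculation above can be reproduced with a short Python sketch. The function name, the clamping of q to the available ranks, and the filler values in the example list are assumptions made for illustration; only the seventh and eighth observations (97 and 107) come from the worked example in the text.

```python
def centile(scores, k):
    """Return the k-th centile using q = k(n + 1)/100 with linear interpolation."""
    ordered = sorted(scores)
    n = len(ordered)
    q = k * (n + 1) / 100          # rank of the centile (1-based, may be fractional)
    q = min(max(q, 1), n)          # clamp to the available ranks (an assumption)
    lower = int(q)                 # rank of the observation at or below q
    fraction = q - lower
    if fraction == 0:
        return float(ordered[lower - 1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Worked example from the text: 149 observations, seventh = 97, eighth = 107.
# The other values are arbitrary fillers chosen to keep the ordering intact.
example = [90] * 6 + [97, 107] + [110] * 141
fifth = centile(example, 5)
```

With these inputs q is 7.5, so the result lies half way between the seventh and eighth observations, matching the value of 102 given in the text.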
We present the data for the percentile method. The rank order and percentile methods usually give similar results, although very occasionally small differences can occur. When this is experienced this is highlighted in the results tables.
A new method for establishing cutpoints has recently been proposed using outlier statistics (DM Parham. Performance assessment of histopathologists. Dissertation, University of Warwick, 2003),3 and this approach is also shown here. This method was originally derived from the statistical graphing method known as the box plot. In brief, the modified box plot rule uses the interquartile range (Q3–Q1) as the basis for setting the upper and lower limits (fences) used to detect outliers. For performance in EQA the lower fence is used to establish the cutpoint and this is defined as Q1–k(Q3–Q1), where k is the outlier coefficient. Typical values of k are either 1.5 or 3 although other values could be used. When k = 3 the modified box plot rule is quite conservative at identifying outliers. Indeed, it can be calculated that for scores that follow a normal distribution, if there are 100 participants the probability of erroneously being labelled as an outlier is 0.3% (when k is smaller this percentage increases).
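A minimal Python sketch of the lower fence, Q1–k(Q3–Q1), is given here for illustration. It assumes quartiles are computed with the standard library's default (n + 1) convention, which the text does not specify; the function names and the scores are hypothetical.

```python
import statistics


def lower_fence(scores, k=1.5):
    """Cutpoint from the modified box plot rule: Q1 - k(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
    return q1 - k * (q3 - q1)


def below_cutpoint(scores, k=1.5):
    """Scores falling strictly below the lower fence."""
    fence = lower_fence(scores, k)
    return [s for s in scores if s < fence]


# Hypothetical scores, invented purely for illustration.
scores = [70, 92, 94, 94, 97, 97, 97, 100, 100, 100, 100]
```

For these scores Q1 is 94 and Q3 is 100, so the fence is 85 when k = 1.5 and 76 when k = 3; the larger coefficient places the cutpoint further from the bulk of the scores.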
Additional data analysis was undertaken using Windows based software: Microsoft Excel 2000 (Microsoft Corporation) and software accompanying the book Confidence interval analysis.7
Performance profiles of pathologists in the NHSBSP EQA
Ten circulations of 12 slides were examined between the years 1998 and 2002. Each set was circulated to between 407 and 485 pathologists at six month intervals. Only cases that obtained a minimum of 80% agreement among coordinators with regard to diagnosis are used in the NHSBSP EQA assessment of performance (cases not achieving 80% consensus on the diagnosis are excluded from the measurement of performance). As shown in table 2, in some circulations this meant that up to three cases were excluded. In our study, the performance scores of pathologists were examined using this measure (retained cases). In addition, performance scores using all cases circulated were calculated.
The most recent circulation examined (2002, no. 2) is typical of the score profile for participants. Figure 1 shows the score distribution (for all cases), and demonstrates a range of scores, with one possible outlier scoring 86. However, there is a larger group of “borderline scores”; this group could be part of a “normal distribution” of scores and the lack of participants with a score of 95 or 96 could result from an “artefact” in the scoring scheme. The interquartile range methodology with both a stringent and non-stringent k value locates one participant below the cutpoint. The rank and percentile methodologies group 10 participants below the cutpoint (table 3).
When the data are examined using the retained cases only, the distribution of scores is tighter (fig 2). Both the rank order and percentile methodologies place a group of 10 participants below the cutpoint. This group is distinct from most of the other results. Because of the tightness of the distribution, it is not possible to obtain an interquartile range. However, using a 10–90th range with an appropriately adjusted (DM Parham. Performance assessment of histopathologists. Dissertation, University of Warwick, 2003)3 stringent k value, this method also identifies a similar group of participants below the cutpoint (table 4).
Tables 5 and 6 summarise the number of participants failing to make the cutpoints by the different methodologies for all cases and retained cases, respectively. Overall, there is no consistent difference between the numbers failing to make the cutpoints using the different methods and comparable thresholds.
Table 7 shows the average score for coordinators and non-coordinators for all cases; the difference in score between the two groups is an average of one percentage point.
Educational value of EQA
Between EQA circulations 1998 no. 1 and 2002 no. 1, inclusive, 199 new participants joined the EQA scheme. Data were obtained over subsequent EQAs for each participant. To preserve anonymity, their first result was zeroed and subsequent results were expressed either as an increase or decrease in score. For participants who joined at EQA circulation 2002 no. 1, there would be only one subsequent EQA score to follow.
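The zeroing step described above amounts to subtracting each participant's first score from their later scores. A minimal sketch follows; the function name and the score history are hypothetical and are not taken from the scheme's data.

```python
def changes_from_first(scores):
    """Zero the first result and express later results as differences from it."""
    baseline = scores[0]
    return [score - baseline for score in scores[1:]]


# Hypothetical score history for one new participant (invented for illustration).
history = [89, 92, 94, 92]
changes = changes_from_first(history)
```

A participant with only one result after joining contributes a single change value, and a participant with no subsequent result contributes none.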
Table 8 shows the improvement in scores for all new participants (based on all cases). Similar results (not shown) were obtained when the analysis was confined to the retained cases only (that is, cases obtaining greater than 80% agreement). In both cases, there was a marginal improvement in performance scores between the first and subsequent EQAs. In most subsequent EQAs the confidence interval for the mean improvement bridges zero.
It was possible to obtain separate data for the new participants with the lowest scores (defined subjectively from examination of the distribution of scores obtained from new participants). For the performance scores based on all cases, over the period examined, there was a group of 11 low scorers. The scores for these individuals were zeroed and their improvement (or deterioration) tracked over subsequent EQAs. Table 9 shows a consistent improvement in the second and subsequent EQA scores compared with their first scores. The results were reanalysed using only the test cases that achieved an 80% agreement among coordinators with regard to diagnosis. Review of performance scores showed a group of 12 low scorers. Table 10 shows the results in subsequent EQAs when these individuals were followed.
We analysed the scores of participants leaving the EQA scheme. Information regarding the reason for leaving (for example, retirement, change of employment, change of job description) is not collected by the data centre. Most individuals cease to participate without informing the data collection centre of their intention not to continue. For our study, individuals were deemed to have left the EQA scheme if they had not participated in three consecutive EQA circulations up to circulation 2003 no. 1. The last score for individuals meeting these criteria was obtained. For each EQA circulation the average score for leavers was calculated and compared with non-leavers. As shown in table 11, the average score of leavers tended to be slightly lower than that for non-leavers. However, table 12 shows that there has been no corresponding upward drift in the cutpoint over the circulations under consideration.
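One way to express the leaver definition above in code is sketched below. It assumes that each participant's record is a chronological list with one entry per circulation, ending at the last circulation considered, with None marking a circulation in which no proforma was returned; this data layout and the function names are assumptions for illustration, not a description of the data centre's systems.

```python
from typing import Optional


def is_leaver(scores: list[Optional[int]]) -> bool:
    """True if a participant has a score on record but missed the last three circulations."""
    if len(scores) < 4:
        return False
    took_part_before = any(s is not None for s in scores[:-3])
    missed_last_three = all(s is None for s in scores[-3:])
    return took_part_before and missed_last_three


def last_score(scores: list[Optional[int]]) -> Optional[int]:
    """Most recent recorded score, or None if there is none."""
    for s in reversed(scores):
        if s is not None:
            return s
    return None
```

The last score of each leaver can then be averaged within the circulation in which it was obtained and compared with the scores of those who remained.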
For EQA circulation 2002 no. 2, information regarding the number of breast cancers that the individual participants report annually was requested. This information was returned by 305 participants. Some pathologists in the scheme report only breast biopsies and resected breast specimens, whereas others are generalists reporting specimens from a wide variety of body sites including breast. This difference in work is reflected in the variation in the number of breast cancers reported (range, 45–900/year). No data were collected on the total workload of individuals. From fig 3 it is apparent that there is no correlation between the number of breast cancers reported and an individual’s performance score.
The results obtained in the EQA do not necessarily reflect the routine performance of pathologists in the breast screening programme or an individual breast screening unit. EQA cases may be assessed by some pathologists in a more diligent fashion than routine cases. It is also unusual for a pathologist to make a diagnosis solely on the basis of a single haematoxylin and eosin section, or not to have access to clinical information or the opportunity to take additional sections or to perform special stains that could aid in diagnosis. In addition, as part of the breast screening programme, cases are reviewed at multidisciplinary meetings, and the final classification takes account of this interaction and availability of additional information. Despite this artificial nature of the EQA environment, the methods used by the UK NHSBSP EQA are typical of many other histopathology schemes, and benefit from a simple standard methodology, which appears to be acceptable to participants.
During the five year period of study, 11 of the 120 cases used in the EQA circulations did not achieve an 80% consensus among more than 400 participants. This is not unexpected. In an Italian study using 81 breast cases, only 27% achieved a 100% consensus as to the diagnosis (using a similar four group classification) among 16 histopathologists.8 However, the remit of our study was not to analyse the error rate in histopathology.
The EQA reveals the consistency of reporting among the participants and gives a range of scores for pathologists reflecting their closeness to the peer group. However, it does not measure expertise: although there is a tail of scores representing those who perform less well (with regard to consensus agreement), there is little separation or discrimination between most pathologists. Furthermore, the difference (1%) in the scores obtained between the coordinators (who have a special interest in breast pathology) and other participants was small (although this suggests there may be room for improvement). However, this relative lack of difference could result from the fact that the EQA policy is that the consensus answer should be deemed the correct answer; this may not necessarily be the same as an “expert answer”. A negative aspect of this policy is that although interobserver consistency may be reassuring there is a risk of groupthinking9—the participants may all agree but be wrong (equally an expert opinion is not necessarily always correct!). This indicates that the EQA is not a measure of expertise but a measure of a common level of attainment (conformity to consensus). To balance this risk before formal publication of the results each case is reviewed by the coordinating group midway through a round, when approximately 50% of participants’ scores are available for evaluation. This review serves to give feedback to participants on the issues deemed relevant for each case. In addition, in the very rare circumstances where a case is believed to have been classified incorrectly by the majority, the case can be withdrawn from individual performance appraisal.
The literature regarding the setting of cutpoints for satisfactory performance is well established and has been reviewed by Cizeck.10 There is no gold standard method for establishing cutpoints. Jaeger compared 32 different methods and found little agreement regarding the level of the cutpoint established.11 Our study also found differences in the numbers failing to reach the cutpoints by the different methodologies examined (the variation between different circulations could result from differences in the degree of difficulty of the slides used in each circulation).
Take home messages
- Pathologists who joined the National Health Service breast screening programme external quality assessment (EQA) scheme showed improvement over time, particularly those who did less well initially.
- There was no obvious association between the number of breast cancer cases that participants report each year and performance, probably because the EQA does not measure expertise, but was established to demonstrate a common level of performance (conformity to consensus) for routine cases, rather than the ability to diagnose unusual/difficult cases.
- We propose a new method of establishing cutpoints using interquartile ranges, which can help identify poor performers.
- Those who leave the scheme (for whatever reason) have marginally lower scores.
A fundamental issue with these tests is setting the standard between acceptable and unacceptable levels of performance, and this should be taken in the context of what is achievable and practicable, not setting a cutpoint that is too high or too low, but realistic. As stated by Kane et al, the standard should be high enough to provide protection for the public while not unnecessarily restricting the supply of qualified practitioners or excluding competent workers.12 The precision and objectivity of the cutpoint is crucial. In the UK, the present method of defining poor performance (and hence cutpoints) in histopathology EQA has received much criticism in the professional press.3,13 Failing the bottom 2.5% or 5% in each EQA is arbitrary and may not be appropriate, because it does not take into account the distribution of scores and what is achievable or practicable. If one assumes that most participants within the EQA are competent, and that the cutpoint should be based on the performance profile of the majority, the use of the interquartile method based on the scores obtained from the middle majority has a reassuring logic and rationality. This method would seem to solve many of the problems associated with the current methodology, as suggested elsewhere.3
The principal function of EQA when first introduced was educational, derived through the content and personal feedback, the aim of the EQA scheme being to raise standards. The potential educational value of EQA schemes was recognised by the Department of Health and the RCPath.14 Evidence to support this assertion to date is limited. Data from our study have shown that the educational value for most participants is marginal when considering the overall diagnosis (for example, benign versus malignant), and most noticeable after completing three circulations. Nevertheless, it has been shown that in specific areas (such as tumour grading) improved performance can be demonstrated by the EQA.15 However, an important feature demonstrated by our study is that individuals who perform less well in their first EQA improve in subsequent EQAs. This supports the key aim of EQA of raising standards. This improvement could be educational, through test content itself or subsequent self education. There are other possibilities for the improved scores and these include better familiarisation with the EQA scheme, the marking procedures, or the statistical phenomenon known as “regression to the mean”.
A potential unintended consequence of an EQA scheme is that, rather than striving to improve their performance, individuals with low scores would withdraw from the scheme. This possibility has not been examined in detail in the past. In our study, the scores of those leaving the scheme were, on average, marginally lower than those who remained in the scheme. Although there is the potential, with the cutpoint methodologies presently used, for an upward drift in the cutpoint, we have not observed such a drift in the scheme so far, possibly because of the tendency of new participants to have lower scores. If there were to be an upward drift then participants who originally performed adequately would subsequently risk being caught. Although this may not be a bad scenario—“inadvertently” raising minimum standards—it could be viewed as an inappropriate means of obtaining this gain. It should be noted that these data cover all leavers.
Unfortunately, no information was available as to the reason that these individuals left. Some would leave because of retirement or job changes, but a few could have left because of their low score in the scheme. This issue is likely to become more problematic as the action points for poor performance start to be implemented from 2005. Because the interquartile methodology3 uses cutpoints based on the performance of the middle majority, an upward drift in the cutpoint is less likely to occur if those with low scores withdraw from the scheme.
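The contrast between the two approaches can be illustrated with a small constructed example in Python. The scores below are invented and deliberately chosen to make the point; they are not scheme data, and with other distributions the interquartile cutpoint can also move. The function names and the quartile convention (the standard library's default (n + 1) method) are assumptions.

```python
import statistics


def fifth_centile(scores):
    """Fifth centile using the (n + 1) convention."""
    return statistics.quantiles(scores, n=100)[4]


def lower_fence(scores, k=1.5):
    """Interquartile cutpoint: Q1 - k(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q1 - k * (q3 - q1)


# Constructed scores (not scheme data): 39 participants.
before = [80] * 2 + [86] * 2 + [92] * 8 + [94] * 10 + [97] * 10 + [100] * 7
# The two lowest scorers withdraw from the scheme.
after = before[2:]

centile_shift = fifth_centile(after) - fifth_centile(before)
fence_shift = lower_fence(after) - lower_fence(before)
```

In this example the fifth centile rises from 80 to 86 once the two lowest scorers withdraw, whereas the lower fence stays at 84.5 because the quartiles of the middle majority are unchanged.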
It is widely believed that workload (both excessive and insufficient) in histopathology may affect performance.16 In our study, there was no relation between workload in breast pathology and performance in the EQA. However, no data were available on the total workload undertaken by individual pathologists. It is therefore not possible to comment on the relation of specialised workload (or the effect of total workload) to performance in the EQA. Nevertheless, as noted earlier, the EQA is not a measure of expertise; it is designed to assess competence in routine pathology, not rare, unusual, or difficult cases. Therefore, there may be no relation between the number of breast specimens a pathologist sees annually and their performance in the EQA. However, this needs further investigation.
The effect of specialist workload on performance would be an important area to investigate because there is debate as to whether pathologists should remain specialists or generalists.17–19 Knowledge of such information would have important implications for the shaping of service provision in the future. At present in the UK most pathologists are generalists. Most hospitals have a few general pathologists covering all areas of the work. If it were shown that specialist reporting influenced EQA scores positively, this would be a driver for more specialist (and fewer general) pathologists. This in turn would require an “exponential” increase in the number of pathologists with the possible concentration of specialists to fewer sites.
In conclusion, this study has highlighted some interesting phenomena occurring among participants in the EQA scheme. As yet, there has been no formal audit of participants’ experience within the NHSBSP EQA. Further data need to be collected regarding the reason for participant withdrawal from the scheme, and detailed examination of the effect of total workload and specialist workload on performance needs to be undertaken. These findings could have implications for shaping the provision of pathology services in the UK.
The authors of this paper are members of the external quality assessment (EQA) scheme management group of the UK national coordinating committee for breast pathology, which is responsible for pathology quality assurance in the UK National Health Service breast screening programme and preparation of minimum dataset standards in breast cancer pathology for the Royal College of Pathologists. The committee acts as the steering committee for the UK national breast screening histopathology EQA scheme.
Membership of the UK national coordinating committee for breast pathology: Dr Al-Sam, Dr N Anderson, Dr L Bobrow, Dr I Buley, Mr D Coleman, Professor C E Connolly, Dr N S Dallimore, Professor I O Ellis, Dr S Hales, Professor A Hanby, Dr S Humphreys, Dr F Knox, Mrs S Kodikara, Professor S Lakhani, Dr J Lowe, Dr J Macartney, Dr S Moss, Dr R Nash, Dr D Parham, Mrs J Patnick, Dr S E Pinder, Dr C M Quinn, Dr A J Robertson, Dr J Shrimankar, Professor R A Walker, Dr C A Wells, and Mr R Winder.