
Review of the national external quality assessment (EQA) scheme for breast pathology in the UK
  1. Emad A Rakha1,
  2. Rachel L Bennett1,
  3. Derek Coleman1,
  4. Sarah E Pinder2,
  5. Ian O Ellis1
  6. On behalf of the UK National Coordinating Committee for Breast Pathology (EQA Scheme Steering Committee)
  1. 1Department of Histopathology, Nottingham City Hospital, Nottingham, UK
  2. 2Department of Research Oncology, King's College London, Guy's Hospital, London, UK
  1. Correspondence to Professor Ian O Ellis and Professor Emad A Rakha, Division of Cancer and Stem Cells, Department of Histopathology, Nottingham City Hospital, University of Nottingham, Hucknall Rd, Nottingham NG5 1PB, UK; emadrakha{at}


Background The National Health Service Breast Screening Programme (NHSBSP; pathology) external quality assurance (EQA) scheme aims to provide a mechanism for examination and monitoring of concordance of pathology reporting within the UK. This study aims to review the breast EQA scheme performance data collected over a 24-year period following its introduction.

Methods Data on circulations, number of cases and diagnosis were collected. Detailed analyses with and without combinations of certain diagnostic entities, and over different time periods were performed.

Results Overall, of 576 cases (172 benign, 11 atypical hyperplasia, 98 ductal carcinoma in situ/microinvasive and 295 invasive disease), consistency of assessment of diagnostic parameters was very high (overall κ=0.80; κ for benign diagnosis=0.79; κ for invasive disease=0.91). For distinguishing benign from malignant lesions, no further improvement is considered possible in view of the limitations of the scheme methodology. Although diagnostic consistency for atypical hyperplasia remains low, combining it with the benign category results in a high level of agreement (κ=0.93). The level of consistency of reporting prognostic information is variable, and some items, such as lymphovascular invasion and tumour size measurement, may need further intervention to improve their reporting consistency. Although the consistency of reporting of histological grade remained moderate overall (κ=0.48), it varied among cases and appears to have levelled off; no further significant improvement is expected, and no significant impact of the previous publication of guidelines was observed.

Conclusions These results provide further evidence of the value of the breast EQA scheme in monitoring performance and identifying specific areas where improvement or new approaches are required. For most parameters, the concordance of reporting reached a plateau a few years after the introduction of the EQA scheme. It is important to maintain this high level and also to tackle specific low-performance areas innovatively.



The National Health Service Breast Screening Programme (NHSBSP) commenced in 1988, and the pathology external quality assurance (EQA) scheme was introduced shortly afterwards. The scheme was initially piloted among the members of the National Coordinating Group for Breast Screening Pathology, and the first national circulation was conducted in the second half of 1990. One of the main objectives of the scheme is to provide a mechanism for the examination and monitoring of concordance of breast pathology reporting within the UK. In addition to the provision of a mechanism for individual performance appraisal, collective performance appraisal is likely to identify problems that could be addressed through various initiatives. The large number of EQA participants (currently more than 700), the choice of representative cases and the availability of multiple circulations over the years make review of the process likely to provide a true reflection of performance.

In reporting the slides, EQA participants are expected to follow NHSBSP guidelines. Three editions have been published to date (1989, 1995 and 2005),1,2 and the fourth edition is in press. These guidelines have taken into account, among other things, the findings from the previous years of the EQA scheme. Consistency studies of the scheme have been published; the first, in 1994,3 addressed observations derived from the first 3 years following its introduction, while the second was published in 2006,4 outlining the findings of the first 10 years of the scheme including the impact of the revised national guidelines published in 1995.1

The aims of the present study are to summarise the findings of the scheme, identify changes and trends following the initial reviews3–5 and assess the impact it has had on pathological reporting in the UK since its inception. This study also aims to comment on the trends in the performance following the publication of the third edition of the national guidelines in 2005 and the impact of an initiative on the consistency of tumour size assessment.

Materials and methods

This study is based on data obtained from the NHSBSP breast pathology EQA scheme over more than 20 years (1990–2014). Description and details of the standard operating procedures have been published previously.4–6 In brief, over 65 sets (approximately 1 set per 10 participants) of 12 cases (plus 3 educational cases) are circulated twice a year (over a 3-month period) to pathologists who report breast pathology in the UK. Each case comprises one representative H&E stained slide. No clinical details or immunohistochemistry results are provided. The cases are submitted by participants themselves for use in the scheme from their diagnostic practice and represent routinely reported cases. The slides are checked at the coordinating centre and reviewed by the organiser (IOE) following preparation of the sections to ensure that they show identical histological features. Cases are eliminated if section quality is too poor to interpret the histological appearances adequately, or if the key lesion is not adequately represented in all the sections cut. A mix of cases is selected to represent the main diagnostic categories, particularly invasive, benign and in situ diseases (see online supplementary table 1S). Categories of cases are selected but there is no selection within these categories by ease or difficulty of diagnosis, typicality of appearance or other criteria. A standard reporting form is completed by the participants for each case, including diagnostic classification of the lesion. For breast cancer cases, this includes the minimum standard data set information as required by the UK NHSBSP and by the Royal College of Pathologists1 ,2 ,7 ( The standard operating procedures for the scheme have been published and can be downloaded from the UK NHS Cancer Screening Programme's website (

The scheme currently includes more than 700 participants, a more than threefold increase in number since its introduction (220 pathologists took part in 1990 (circulation 19902) and 466 in 2000 (circulation 20002)). Although it is anticipated that participants should take part in each circulation, adequate participation is defined as completion of two of every three circulations. The participating pathologist independently examines the slides and, for each case, completes a tick box proforma, now an online electronic process ( Unlike routine practice, discussion of cases with colleagues or consultation during reporting of the slides is not allowed. Cases are classified into four categories: benign, atypical hyperplasia, in situ carcinoma and invasive carcinoma. Cases are classified in accordance with the published guidelines, which define criteria for the different categories based on the diagnostic features, including the definition of atypical hyperplasia1,2,7 ( For the purposes of the present analysis, microinvasive and in situ carcinomas were grouped together, microinvasive carcinoma being defined as an in situ carcinoma with one or more foci of invasion, none exceeding 1 mm in maximal dimension.
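The participation rule above can be sketched in a few lines. This is only an illustration: the text does not say how "two of every three circulations" is evaluated, so the sliding-window reading below, and the function name `adequate_participation`, are assumptions rather than the scheme's actual procedure.

```python
from typing import Sequence


def adequate_participation(completed: Sequence[bool]) -> bool:
    """Check a participant's record against the two-of-every-three rule.

    `completed` holds one flag per consecutive circulation, oldest first.
    ASSUMPTION: the rule is read as "every run of three consecutive
    circulations contains at least two completed ones".
    """
    if len(completed) < 3:
        raise ValueError("need at least three circulations to apply the rule")
    # Slide a window of three circulations along the record.
    windows = [completed[i:i + 3] for i in range(len(completed) - 2)]
    return all(sum(window) >= 2 for window in windows)


# Hypothetical records for two participants.
print(adequate_participation([True, True, False, True, True]))   # True
print(adequate_participation([True, False, False, True]))        # False
```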

The level of agreement was assessed using κ statistics, as previously described.4,8,9 Values of κ range from 0 for chance agreement only to +1 for perfect agreement, with a negative value implying systematic disagreement. The range of κ is interpreted as follows: 0–0.20=slight agreement; 0.21–0.40=fair; 0.41–0.60=moderate; 0.61–0.80=substantial and 0.81–1.00=almost perfect agreement. In this study, the values are presented for groups of three circulations, to minimise differences caused by case selection. Educational cases are not included in this review.
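For readers unfamiliar with multi-rater κ, the following is a minimal sketch of one common estimator, Fleiss' κ, together with the interpretation bands quoted above. It is an assumption that this estimator resembles the one used by the scheme (the exact method is given in the cited references), and the sketch further assumes that every case is read by the same number of participants. The function names and the example counts are hypothetical.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = raters placing case i in category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of raters")
    n_categories = len(counts[0])
    total = n_cases * n_raters
    # Overall proportion of all ratings falling in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement: share of rater pairs that agree, averaged over cases.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases
    # Agreement expected by chance alone.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


def interpret_kappa(kappa: float) -> str:
    """Map kappa to the bands quoted in the text."""
    if kappa < 0:
        return "systematic disagreement"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical data: two cases, five readers, two categories.
print(fleiss_kappa([[5, 0], [0, 5]]))   # 1.0 (complete agreement)
print(interpret_kappa(0.48))            # moderate
```

Because κ corrects observed agreement for chance using the category proportions, it is sensitive to case mix, which is consistent with the decision to pool circulations in groups of three.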


Results

Major diagnostic categories

Table 1 shows κ values for the four major diagnostic categories (benign, atypical hyperplasia, in situ carcinoma and invasive carcinoma). Detailed results are shown since 2001, and results from the earlier years of the scheme are summarised (and have been previously published).4 The level of agreement for diagnosing invasive carcinoma was in the almost perfect range, and in the substantial to almost perfect range for benign cases. As in the earlier period, the number of examples of atypical hyperplasia was low, and the level of consistency has remained low. Owing to the small number of cases of in situ/microinvasive carcinoma included in recent years, there is considerable variation.

Table 1

κ Values for overall diagnosis (all participants)

Figure 1 shows κ values for the major diagnostic categories since the beginning of the breast pathology EQA scheme. The level of agreement for diagnosing invasive cancer has been in the almost perfect range for the duration of the scheme, but there is a suggestion of an improvement over time. However, caution is necessary when interpreting the κ values as they are sensitive to case mix, and this has changed over the duration of the scheme, with a disproportionate number of radial scars in the first six circulations, while more recent circulations have included a lower proportion of in situ/microinvasive cases (see online supplementary table for breakdown).

Figure 1

Line graph of the overall diagnosis of different lesions during the period of study.

Prognostic features of invasive carcinoma

Tumour subtype

Table 2 summarises the findings of analysis of histological subtype of invasive carcinoma. Overall, the levels of agreement for diagnosing mucinous and invasive lobular carcinomas were in the almost perfect and substantial ranges, respectively. The remaining subtypes were in the moderate range, with the exception of the mixed type, where the level of agreement was very low. The term ‘medullary-like carcinoma’ was introduced in the third edition of the reporting guidelines in 2005;2 however, no additional medullary carcinomas were included in the scheme after this date, so no comment can be made on the impact of this terminology and the revised diagnostic criteria on the level of consistency.

Table 2

κ Values for subtype of invasive carcinoma (all participants)

Histological grade

Overall, histological grading2 ,10 achieved a moderate level of consistency (table 3). As reported in an earlier review, an improvement was observed which coincided with the publication of the revised guidelines in 1995.1 However, further improvement has not been seen.

Table 3

κ Values for histological grade of no special type (NST) cases of invasive carcinoma (all participants)

Recording of the histological grade components was introduced in 2000 (first circulation 20001), and a total of 111 cases (no special type (NST) only) have been analysed as part of the scheme. The highest level of consistency for the three components was achieved in categorisation of tubule formation (κ=0.49) (particularly for solid tumours that are scored as 3 (κ=0.62)), followed by nuclear pleomorphism and mitotic count (κ=0.36 and 0.34, respectively). Tumours with either low or high mitotic counts (score 1 or 3, respectively) showed similar κ values (κ=0.45), while tumours with a mitotic count score of 2 showed the lowest κ of all components (κ=0.13).

Lymphovascular invasion

Lymphovascular invasion (LVI) is reported on the EQA proforma as present or not seen, with no probable/possible category; immunohistochemistry is not used. Table 3 shows the results for categorisation of LVI. Although the overall level of agreement was in the fair range, there was a suggestion of a slight improvement since results were previously published, and consistency has reached a moderate level of agreement in more recent years (κ=0.42 for 20011–20141 vs κ=0.30 for 19902–20002). The histological features and factors underlying low LVI concordance are the subject of a companion paper.

Invasive tumour size

The percentage of measurements within ±3 mm of the median was calculated for each case. The distribution of these values in 10% bands and the median values are shown in table 4. Overall, the measurement of invasive carcinomas has remained essentially similar to earlier results, with a high rate of concordance (95.0% within ±3 mm of median) but with lower concordance for invasive lobular carcinoma (88.2% of readings within ±3 mm of median). No overall difference was observed in the measurement of ductal/NST cases when whole tumour measurement (ie, invasive carcinoma plus surrounding ductal carcinoma in situ (DCIS)) was also taken into account (table 5). The results suggest that there were fewer outliers since the last review, with at least 80% of readings being within ±3 mm of the median for 93% of ductal/NST cases between 20011 and 20141 compared with 84% of cases in the earlier period of the scheme.
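The per-case size concordance measure described above is simple to compute. The sketch below shows one way to do it; the function names, the handling of band edges and the example readings are assumptions made for illustration and are not taken from the scheme's own software.

```python
from statistics import median
from typing import Sequence


def percent_within_tolerance(sizes_mm: Sequence[float],
                             tolerance_mm: float = 3.0) -> float:
    """Percentage of readings for one case within +/- tolerance of the median."""
    if not sizes_mm:
        raise ValueError("at least one measurement is required")
    centre = median(sizes_mm)
    within = sum(1 for size in sizes_mm if abs(size - centre) <= tolerance_mm)
    return 100.0 * within / len(sizes_mm)


def band_label(percent: float) -> str:
    """Place a percentage in a 10% band.

    ASSUMPTION: bands are closed at the lower edge, and 100% is
    reported in the top band (90-100%).
    """
    lower = min(int(percent // 10) * 10, 90)
    return f"{lower}-{lower + 10}%"


# Hypothetical readings (mm) from five participants for a single case.
readings = [18, 20, 21, 22, 35]
share = percent_within_tolerance(readings)
print(share, band_label(share))   # 80.0 80-90%
```

In this example the median is 21 mm, so the single reading of 35 mm is the only outlier; this mirrors the pattern noted in the text, where a few wide outliers occur even when overall agreement is high.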

Table 4

Consistency of measuring maximum diameter of invasive carcinoma

Table 5

Consistency of measuring invasive carcinoma of NST cases when whole tumour measurements are also taken into account

Ductal carcinoma in situ

The growth pattern was the sole method of categorisation of DCIS up to circulation 19951, when it was superseded by classification of DCIS as per cytonuclear grade; recording of growth pattern in the EQA was discontinued at circulation 20001. Only the nuclear grade and the presence of Paget's disease and of microinvasion are required to be recorded for in situ cases. Few cases were recorded as low grade (four cases in both 19951–20002 and 20011–20141 circulation groups). Therefore, cases recorded as either low or intermediate nuclear grade have been combined in table 6. The overall level of consistency of DCIS nuclear grade was in the moderate range (κ=0.55). Only a small number of DCIS cases have been included in recent circulations, but if circulation groups with six or more cases are considered, the results suggest that consistency has decreased since the last review.

Table 6

κ Values for classification of ductal carcinoma in situ (DCIS) by nuclear grade

Benign breast disease and epithelial proliferations

Over the whole scheme, 22% of the benign cases were excluded from analysis as being difficult or educational, compared with 6% of invasive carcinoma cases; this may reflect the greater need for immunohistochemistry in benign cases. The level of concordance of benign cases was high (overall κ=0.90) with little variation among circulations (range 0.82–0.92; table 7). When cases in the atypia category were combined with benign lesions, the κ value increased to 0.93, with figures up to 0.96 achieved in some circulations. Fibroadenoma and benign intraductal papilloma without florid epithelial proliferation or atypia were associated with the highest level of agreement, while papillary and proliferative lesions with atypia were associated with the lowest level of concordance.
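The effect of combining the atypia category with benign lesions can be illustrated with a small sketch. It pools the two columns of a case-by-category count table and compares raw pairwise agreement before and after pooling. Raw agreement is used here only as a simpler stand-in for κ, which additionally corrects for chance; the function names, category labels and counts are hypothetical.

```python
from typing import List, Sequence, Tuple


def merge_categories(counts: Sequence[Sequence[int]], labels: Sequence[str],
                     merged: Sequence[str], into: str
                     ) -> Tuple[List[List[int]], List[str]]:
    """Pool the columns named in `merged` into one new column called `into`."""
    kept = [label for label in labels if label not in merged]
    new_counts = []
    for row in counts:
        by_label = dict(zip(labels, row))
        pooled = sum(by_label[label] for label in merged)
        new_counts.append([pooled] + [by_label[label] for label in kept])
    return new_counts, [into] + kept


def mean_pairwise_agreement(counts: Sequence[Sequence[int]]) -> float:
    """Average, over cases, of the share of reader pairs that agree."""
    per_case = []
    for row in counts:
        n = sum(row)
        per_case.append(sum(c * (c - 1) for c in row) / (n * (n - 1)))
    return sum(per_case) / len(per_case)


labels = ["benign", "atypical hyperplasia", "in situ", "invasive"]
# Hypothetical counts: ten readers, two cases.
counts = [[6, 4, 0, 0],    # readers split between benign and atypia
          [0, 0, 1, 9]]    # near-unanimous invasive carcinoma
pooled, pooled_labels = merge_categories(
    counts, labels, ["benign", "atypical hyperplasia"], "benign or atypical")

print(round(mean_pairwise_agreement(counts), 3))   # 0.633
print(round(mean_pairwise_agreement(pooled), 3))   # 0.9
```

Disagreement that lies entirely between the two pooled categories disappears after merging, which is why the benign-versus-atypia split in the first case no longer counts against agreement.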

Table 7

Benign lesions and epithelial proliferation assessment by all participants (excluding educational cases)

Discussion


Advances in the classification of breast cancer, improved methods of radiological detection and sampling of breast lesions, the increasing availability of a wide variety of therapies and the move towards personalised medicine have increased demand on pathologists for accurate and consistent diagnoses and high-quality prognostic information. It is well known that routine pathological examination of breast lesions can provide invaluable, cost-effective diagnostic and prognostic information that cannot be replaced by other techniques. However, it is important to note that the level of consistency of pathology reporting is determined not only by the performance of the pathologists and the degree of adherence to published guidelines, but also by the methodology that they use and the inherent limitations and degree of subjectivity in the test items assessed.

The NHSBSP breast pathology EQA scheme, which aims to identify substandard performance and monitor consistency of reporting, can provide a valuable source of data to determine the robustness of the diagnosis and classification systems of breast diseases. Since its inception, it was clear that the collective data derived from reviews of the scheme could be used to identify problematic and challenging performance issues and investigate lesions that show persistent low reproducibility; this could be applied to help develop ways to tackle areas of difficulty. Observations derived from the previous reviews3 ,4 have been used to revise and update the national guidelines and then re-assess the subsequent impact of such publications.

In addition, specific EQA initiatives have been adopted, aiming to improve performance in certain poor performance areas such as tumour size measurement. The initial improvement that was achieved following the introduction of the scheme, and the improvement identified following publication of the national guidelines, indicated that further improvements can be achieved in histopathological diagnosis with evidence-based interventional methods. Therefore, the present study aimed to review the EQA scheme data collected over a longer period (24 years) to identify trends over time and to assess the impact of previous guideline publications and EQA initiatives. From observations published in the previous EQA reviews,3 ,4 four main situations have been identified with respect to reporting consistency:

  1. The first situation is where consistency is very high, including diagnosing in situ and invasive carcinoma (and certain distinctive tumour subtypes) and uncomplicated benign lesions. Although the level of consistency in this category has been similar over time, there was evidence of improvement over the 24 years of the scheme.4 The current study confirms that the diagnosis of invasive carcinoma remained in the almost perfect category. Apart from a few exceptions, this remained the same over time. Further improvement is not expected, taking into account that EQA diagnosis is made on a single H&E slide with no auxiliary techniques or consultation with colleagues, both of which are available in routine practice. As no further improvement is expected, the main role of the scheme in such situations is to monitor the performance of the participants, identify individual underperformance and introduce remedial actions.

  2. The second situation is where consistency is suboptimal, but where it might be anticipated that it could be improved by making guidelines more detailed and explicit. Histological grading is the archetypal example in this category, and detailed information on histological grade assessment, including photographic illustrations, has subsequently been added to the guidelines.1,2 However, although significant improvement in the consistency of grading was achieved following the first few circulations in the EQA scheme and the publication of the first revised national guidelines in 1995,1 no further improvement was noticed. The current study also shows that no improvement was achieved following the publication of the subsequent revised guidelines in 2005.2 Further analysis of grade, which is known to represent a biological spectrum, reveals that tumours with features at the extreme ends of the spectrum (obviously grade 1 or grade 3 tumours) show a high level of consistency, while the lowest level of consistency is achieved for cases located in the middle part of the spectrum. Of note, despite reported moderate variability in grade in clinical series,10,11 its high clinical value is unquestionable. While the subjectivity in the assessment of grade 2 tumours has been shown to be reduced in some series using additional methods such as Ki-67 immunohistochemistry,12 this is not routinely undertaken. Although guidance including greater explicitness of grading criteria improved consistency after its first publication,1 this approach seems unlikely to produce further improvement, as evidenced by the results after the 2005 publication.2 However, continuous education, training, guidance and emphasis on the prognostic value of grade may help to maintain the current level of consistency and avoid pathologists becoming outliers.

  3. The third situation is where consistency is acceptable but a few wide outliers are often encountered. Assessment of invasive tumour size is an example of this category. In 1995 (circulation 19952), a ‘whole tumour size’ measurement was introduced to include any DCIS extending for more than 1 mm beyond the invasive component, in addition to the size of the invasive component alone. However, no improvement was noted. Specifically, a few widely outlying measurements were encountered even in cases with an overall high level of agreement on tumour size.4 Therefore, an initiative to identify the specific problems and to improve consistency was undertaken in 2002 and again in 2008. Details of the results of these initiatives are discussed in a companion paper; however, results from the current study suggest a slight improvement in performance following the initiatives. Although poorly delineated tumours and those with multiple foci of invasion have been reported to contribute to inconsistency,3,4 no evidence to comment on these features could be identified in this study. As histological assessment of tumour size is important both clinically and as a radiological target within the NHSBSP, and as this should theoretically be an objective feature, investigation of this issue remains underway to improve further the consistency of reporting this important prognostic variable and to identify consistent outliers for possible remedial actions.

  4. The fourth situation is where no improvement in consistency has been achieved in the EQA scheme, including diagnosing atypical hyperplasia and the classification of DCIS. These have remained at a low level despite publication of guidelines containing explicit and refined diagnostic criteria.1,4 Although this study shows no improvement in this category, this is, fortunately, of limited practical importance, as difficult or uncertain lesions in this group are typically diagnosed in routine practice with the help of additional immunohistochemistry and consultation with colleagues.

It is also important to note that the methodology of the scheme itself may preclude reproducible assessment of certain features (eg, atypical hyperplasia and LVI). The small and focal nature of these lesions, the variation between different sections and the lack of additional investigation (ie, immunohistochemistry) and/or access to second opinions may all contribute to the low level of consistency. Despite this, as with histological grade, continuous education, training, guidance and emphasis on the clinical impact of the diagnosis may help to avoid misclassification, underdiagnosis and overdiagnosis.13 The current study, however, reports overall levels of agreement similar to some other concordance studies14,15 and higher than other studies of similar methodology in the research setting.16–18 It also appears that categories with lower case numbers had lower κ values. This likely explains the major drop in agreement for in situ/microinvasive cases in the 20082–20101 circulations, which included only a single case. In addition, the very low numbers of cases of atypia over the entire period essentially make these data inadequate for looking at time trends. However, the numbers in other categories are consistently high in all circulations, making interpretation reliable.

Classification of DCIS using growth pattern alone showed an unacceptably low concordance level with no evidence of improvement over multiple circulations. It is now well recognised that DCIS growth pattern is not infrequently heterogeneous within an individual lesion, and nuclear grade is now used as the main classifier of this disease. However, there is a low level of agreement in classification as low or intermediate nuclear grade DCIS, while high-grade DCIS is associated with a higher level of agreement (data not shown). This also requires further research and classification initiatives, given the ongoing clinical trials for these low-risk forms of DCIS. The updated guidelines in press address this with greater detail on the methodology for categorisation of DCIS, and subsequent analyses of EQA data will show whether this has improved consistency.

In this study, complex sclerosing lesions and papillary lesions, particularly those associated with florid epithelial proliferation or atypia, are associated with a low level of consistency. Again, it is important to note that such problematic breast lesions are usually diagnosed with the help of auxiliary techniques including immunohistochemistry, examination of multiple blocks, often at additional levels, and discussion with colleagues. The low consistency levels in such lesions in fact may reflect the limitation of the scheme methodology rather than diagnostic variation among participants.

The lowest level of agreement of classification of tumour types was seen for those with mixed morphology. The highest level was obtained with invasive lobular and mucinous carcinomas. Contrary to the current notion that medullary carcinoma has a low level of agreement, this study showed that the level of agreement in diagnosing medullary carcinoma was not different from that of ductal/NST carcinomas.

In conclusion, these results demonstrate a high level of consistency of reporting diagnostic and prognostic information of breast disease. However, limitations exist related to methodology or to innate problems in the EQA scheme, which may underestimate pathologists' performance in routine practice. More effort should be directed at investigating the reasons for the low level of agreement in reporting LVI and grading DCIS, and at those individual pathologists who are often outliers in size measurement. Although guidance, education and training can only improve consistency up to a point, they can certainly maintain the currently high level of performance and help to avoid any reduction in overall performance in the future.

Take home messages

  • The National Health Service Breast Screening Programme (NHSBSP) breast pathology EQA scheme provides a valuable source of data to determine the robustness of the diagnosis and classification systems of breast diseases.

  • The overall level of consistency in reporting aspects of breast pathology is very high.

  • For most parameters, the concordance of reporting reached a plateau a few years after the introduction of the EQA scheme.

  • It is important to maintain this high level but also to tackle specific low-performance areas innovatively.

  • Limitations of the scheme in assessing focal lesions such as atypical ductal hyperplasia and lymphovascular invasion should be acknowledged.


The authors acknowledge members of the UK National Health Service Breast Screening Programme (NHSBSP) and National Coordinating Committee for Breast Pathology. They also thank Claire Hawkes, Patricia Islip and Kevin Lea for providing technical, clerical and IT support to the scheme.



  • Handling editor Cheok Soon Lee

  • Collaborators The authors of this manuscript are members of the EQA scheme management group of the UK National Coordinating Committee for Breast Pathology, which is responsible for pathology quality assurance in the UK National Health Service Breast Screening Programme (NHSBSP) and preparation of minimum data set standards in breast cancer pathology for the Royal College of Pathologists. The committee acts as the steering committee for the UK National Breast Screening Histopathology EQA scheme. The scheme is a member of UK NEQAS.

  • Contributors All authors contributed to the study.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
