Aims—To investigate interlaboratory variance in the immunohistochemical (IHC) detection of oestrogen receptors so as to determine the rate of false negatives, which could adversely influence the decision to give adjuvant tamoxifen treatment.
Methods—To ensure that similar results are obtained by different institutions, 200 laboratories from 26 countries have joined the UK national external quality assessment scheme for immunocytochemistry (NEQAS-ICC). Histological sections from breast cancers having low, medium, and high levels of oestrogen receptor expression were sent to each of the laboratories for immunohistochemical staining. The results obtained were evaluated for the sensitivity of detection, first by estimating threshold values of 1% and 10% of stained tumour cells, and second by the Quick score method, by a panel of four assessors judging individual sections independently on a single blind basis. The results were also evaluated using participants' own threshold values.
Results—Over 80% of laboratories were able to demonstrate oestrogen receptor positivity on the medium and high expressing tumours, but only 37% of laboratories scored adequately on the low expressing tumour. Approximately one third of laboratories failed to register any positive staining in this tumour, while one third showed only minimal positivity.
Conclusions—There is considerable interlaboratory variability, especially in relation to the detection of breast cancers with low oestrogen receptor positivity, with a false negative rate of between 30% and 60%. This variability appears to be caused by minor differences in methodology that may be rectified by fine adjustment of overall technique.
- oestrogen receptors
- interlaboratory variation
The importance of establishing the oestrogen receptor status of tumours for the treatment of women with breast cancer has recently been emphasised.1 The authors concluded that the fundamental question to be asked when predicting the likely outcome for a particular woman receiving adjuvant tamoxifen treatment is not whether she is young or old, with or without nodal involvement, or receiving chemotherapy—but whether or not her tumour is completely oestrogen receptor negative. Oestrogen receptor status is now often established by an immunohistochemical (IHC) test employing monoclonal antibodies.2–4 This assay has been shown to be at least as sensitive as the biochemical ligand binding assay5,6 and has the advantages of being applicable to small tumours and Tru-Cut biopsy samples, and of allowing only tumour cells to be assessed for oestrogen receptor status. The IHC assay can be conducted inexpensively7,8 on routinely processed tissue sections, with no need for specialised equipment. Consequently in many countries IHC analysis has become the chosen technique for establishing oestrogen receptor status in a routine pathology setting.9,10
In view of the increasing use of the oestrogen receptor IHC assay, it is vital that good quality assurance procedures are in place to assess the quality of the assays carried out by different laboratories.10 The United Kingdom national external quality assessment scheme for immunocytochemistry11 (UK NEQAS-ICC) currently assesses the quality of many immunohistochemical techniques carried out in the majority of UK clinical laboratories and in various laboratories based outside the United Kingdom. Since April 1994 the scheme has provided an external quality assessment (EQA) programme for the demonstration of oestrogen and progesterone receptors on routinely processed breast tumours.
In this paper we report on the degree of variability between 200 laboratories in demonstrating oestrogen receptors by immunohistochemistry on the same cases. The main aim of the study was to establish the proportion of laboratories able to demonstrate oestrogen receptors reliably in a weakly positive tumour, as there is a danger that these tumours could be erroneously reported as negative if the IHC assay is not of adequately high sensitivity.
Laboratories participating in the UK NEQAS-ICC programme for steroid hormone receptors (table 1) were sent two unstained slides containing histological tissue sections of formalin fixed and paraffin processed breast tumours showing different levels of receptor expression. Included in the composite tumour block, comprising three different oestrogen receptor positive infiltrating ductal carcinomas (X, Y, and Z), was some normal glandular breast tissue which acted as an internal control. In order to ensure that all sections contained a similar proportion of oestrogen receptor positive cells, every 100th section was immunostained for oestrogen receptors by the organising laboratory. Each participant was asked to demonstrate oestrogen receptors and to return the best stained slide, along with their own in-house control slide and a completed questionnaire giving methodological details (including details of the threshold value used by the laboratory), to the UK NEQAS-ICC coordinating centre for assessment. An expert panel of four, comprising pathologists (BJ, LB) and biomedical and clinical scientists (AR, DB), examined the slides and assessed the quality of the IHC assay performed by each laboratory.
METHODS OF EVALUATION
For the purposes of the present study, the “Quick” score method of assessment12,13 was used to assess the range of immunostaining performed by the participating laboratories. With this method the intensity of the immunohistochemical reaction as viewed under the light microscope was recorded as follows: 0, negative (no staining of any nuclei even at high magnification); 1, weak (only visible at high magnification); 2, moderate (readily visible at low magnification); 3, strong (strikingly positive even at low power magnification). The proportion of tumour nuclei showing positive staining was also recorded as: 0 (none); 1 (approximately 1–25%); 2 (26–50%); 3 (51–75%); or 4 (76–100%). The score for intensity was added to the score for proportion, giving the Quick score, with a range of 0–7 for each individual tumour.
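The arithmetic of the Quick score described above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of any published scoring software. Because the text gives the lowest positive band only as "approximately 1–25%", the sketch assumes that any non-zero percentage up to 25% falls in band 1.

```python
def proportion_score(percent_positive: float) -> int:
    """Map the percentage of stained tumour nuclei to the 0-4 proportion score."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive == 0:
        return 0
    # Assumption: any staining above 0% and up to 25% is placed in band 1.
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4


def quick_score(intensity: int, percent_positive: float) -> int:
    """Add the intensity score (0-3) to the proportion score (0-4), giving 0-7."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity + proportion_score(percent_positive)
```

For example, moderate staining (intensity 2) in 30% of tumour nuclei (proportion 2) gives a Quick score of 4. The sketch does not check that intensity and proportion are mutually consistent, such as an intensity of 0 recorded alongside a non-zero proportion.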
The proportion of cells stained in each tumour in the composite block was also recorded as either 0, ≥ 1% but < 10%, or ≥ 10%. The absence or presence of staining of the nuclei of non-neoplastic ducts in adjacent tissue was also recorded. This served as an internal control. Slides which failed to show any staining in the normal internal control or which showed excessive non-specific immunostaining in the stromal component were deemed unsatisfactory and were excluded from statistical analysis.
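The threshold categories and the exclusion rule described above can be expressed as a short sketch. All names are hypothetical. The text does not say how staining between 0% and 1% was recorded, so the sketch assumes it is grouped with the unstained category.

```python
def proportion_category(percent_positive: float) -> str:
    """Assign a slide to one of the three proportion categories."""
    # Assumption: anything below 1% is grouped with the "0" category.
    if percent_positive < 1:
        return "<1%"
    if percent_positive < 10:
        return "1% to <10%"
    return ">=10%"


def is_evaluable(internal_control_stained: bool,
                 excessive_nonspecific_staining: bool) -> bool:
    """A slide enters the analysis only if the normal ducts stained and
    the stroma did not show excessive non-specific staining."""
    return internal_control_stained and not excessive_nonspecific_staining


def er_positive(percent_positive: float, threshold_percent: float) -> bool:
    """Call a tumour oestrogen receptor positive at a given threshold (eg, 1 or 10)."""
    return percent_positive >= threshold_percent
```

A tumour with 5% of nuclei stained would therefore be called positive at the 1% threshold but negative at the 10% threshold.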
OESTROGEN RECEPTOR STATUS OF THE REFERENCE TUMOURS X, Y, AND Z
From the UK NEQAS participants, six were identified as having published clinical studies relating oestrogen receptor positivity to tamoxifen treatment. These studies are not referred to in this paper as this would identify the laboratories concerned and in so doing transgress the UK NEQAS code of practice which confers anonymity to all participants.14 The assessment results from these laboratories and the initial testing performed by the organising centre were used to establish the oestrogen receptor status of the tumours X, Y, and Z, and are recorded in table 2. Additional confirmation of the oestrogen receptor positive status was provided in the form of the results of previous biochemical assays conducted on these cases.
Median values were established for the Quick scores achieved by participating laboratories on the infiltrating ductal carcinomas (IDC) labelled X, Y, and Z. Spearman's rank coefficient was used to test for correlation between the levels of sensitivity achieved on the three different tumours, and differences in the proportion of laboratories showing oestrogen receptor positivity at various threshold values were tested by means of the χ² test. Kendall's coefficient of concordance (Kendall's W) was used to determine the level of agreement between assessors.
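For readers unfamiliar with Kendall's W, the following sketch shows the textbook calculation for several assessors scoring the same set of slides. It is an illustration only and not the software used in the study. Tied scores receive average ranks, and the sketch omits the correction for ties, which is an assumption, as the text does not state whether one was applied. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def kendalls_w(scores_by_assessor):
    """Kendall's W for m assessors (rows) scoring the same n slides (columns).

    W = 12 S / (m^2 (n^3 - n)), where S is the sum of squared deviations of
    the slide rank sums from their mean. No tie correction is applied.
    """
    m = len(scores_by_assessor)
    n = len(scores_by_assessor[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two assessors and two slides")
    if any(len(row) != n for row in scores_by_assessor):
        raise ValueError("every assessor must score the same slides")
    ranked = [average_ranks(row) for row in scores_by_assessor]
    rank_sums = [sum(row[i] for row in ranked) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((total - mean_rank_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))
```

W ranges from 0 (no agreement in ranking) to 1 (identical rankings by all assessors).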
When the staining results were analysed by the Quick score (fig 1) the median scores were 2 for tumour X (low oestrogen receptor expressor), 4 for tumour Y (medium oestrogen receptor expressor), and 6 for tumour Z (high oestrogen receptor expressor).
Spearman's rank coefficient showed a highly significant positive correlation between the level of sensitivity achieved by individual laboratories on the tumours of differing oestrogen receptor expression (tables 3–6).
When only the proportion of nuclei stained in the tumours was evaluated, 99.0% of participants demonstrated 10% or more of the nuclei of the high expressor, while 99.5% demonstrated 1% or more. For the medium expressor, 84.5% demonstrated 10% or more of nuclei, while 88.0% demonstrated 1% or more. For the low expressor, 37.3% demonstrated 10% or more of tumour nuclei, with 66.3% demonstrating 1% or more (fig 2). When the threshold values used by participants to designate a tumour as either oestrogen receptor positive or oestrogen receptor negative were used, the proportion of assays which would have recorded the high, medium, and low expressing tumours as oestrogen receptor positive fell to 98.0%, 80.0%, and 32.8%, respectively (for all evaluations, p < 0.0001, two tailed). Approximately one third of participants failed to demonstrate any tumour nuclei at all in the low expressor (fig 3).
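Because all three tumours were established as oestrogen receptor positive, the proportion of assays failing to reach a given threshold on the low expressor can be read as a false negative rate at that threshold. The sketch below simply takes the complement of the percentages reported above; the variable names are illustrative.

```python
# Percentage of assays recording the low expressor (tumour X) as positive,
# taken from the figures reported in the text.
positive_rate_low_expressor = {
    "10% threshold": 37.3,
    "1% threshold": 66.3,
    "participants' own thresholds": 32.8,
}

# The complement of each figure is the percentage of assays that would
# have reported this oestrogen receptor positive tumour as negative.
false_negative_rate = {
    threshold: round(100 - positive, 1)
    for threshold, positive in positive_rate_low_expressor.items()
}

for threshold, rate in false_negative_rate.items():
    print(f"{threshold}: {rate}% false negative")
```

This gives 62.7% at the 10% threshold, 33.7% at the 1% threshold, and 67.2% with the participants' own thresholds.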
Kendall's coefficient of concordance revealed a significant level of concordance between assessors in the evaluation of slides (Kendall's W = 0.014, p = 0.040).
With immunocytochemistry for oestrogen receptors, it is a commonly observed phenomenon that the first sign of a fall in sensitivity of the IHC technique is a diminution in staining intensity, and this is followed by a reduction in the proportion of tumour nuclei demonstrated. For this reason, three methods of evaluation were used to assess one or both of these criteria.
The Quick score method was included on the basis that it was a previously validated system for evaluating oestrogen receptor status of each of the tumours,12,13 in conjunction with a simple but clinically validated 10% oestrogen receptor positive threshold.15–19 This threshold is commonly used by many laboratories to differentiate between breast tumours which are likely to respond to tamoxifen treatment and those which are not (table 7). We also included the recently recommended 1% threshold value, considered to be clinically relevant by some workers.6,8,21 Positive IHC assays using this cut off value have been associated with a large improvement in disease-free survival in patients receiving adjuvant tamoxifen (∼30% at five years), with nearly one tenth of all oestrogen receptor positive patients investigated having only 1–10% of oestrogen receptor positive nuclei in their tumours.21 Lastly, the oestrogen receptor status of the tumours was evaluated using the threshold values employed in the participants' own laboratories.
The overall analysis showed that while the majority of laboratories had little difficulty in demonstrating the tumours with high oestrogen receptor expression, a significant proportion (62.7%, p < 0.0001) failed to demonstrate 10% or more of the nuclei of the low expressor (fig 2). Interestingly there was a three way split in these results, with approximately one third of the assays staining no nuclei at all, one third staining some nuclei but less than 10%, and one third staining 10% or more (fig 3). Clearly with such wide interlaboratory variation in the assay sensitivity, a 10% threshold value used in one laboratory is unlikely to be applicable in another. The same would apply to the Quick score, with relatively large interquartile ranges of 0–3 for the low expressing carcinoma and 2–5 for the medium expressing carcinoma (fig 1). This interlaboratory variance is not caused by inconsistencies at the time of evaluation, as the level of agreement between individual assessors was good, as it was in a previous study,13 but by variations in the sensitivity of the IHC method. Consequently the oestrogen receptor status (positive or negative) of these tumours and the predicted response to adjuvant tamoxifen treatment are considerably influenced by which laboratory has performed the assay.
The choice of threshold value could compensate for the slightly differing levels of IHC sensitivity observed between laboratories. It has been recommended that threshold values should always be gauged against clinical outcome.13 Consequently laboratories with different assay sensitivities could theoretically obtain the same result on the same tumour, as long as individual threshold values have been carefully adjusted to clinical outcome (assuming a similar proportion of patients respond to adjuvant tamoxifen treatment in different populations). In order to make allowance for this, the oestrogen receptor status of the tumours used in the present study was also established, using the participants' own threshold values. The fact that the interlaboratory variance persisted and if anything increased when the laboratories' chosen threshold values were used (fig 2) indicates that these would not compensate entirely for the differences in sensitivity observed between laboratories.
The positive oestrogen receptor status of the three tumours used in this study, as determined by the organising centre, is ratified by the results of the biochemical analyses. Furthermore the results of all six of the expert laboratories known to use clinically validated oestrogen receptor assays indicated that the high and medium expressing tumours were oestrogen receptor positive, and four of the six agreed that the low expressing tumour was positive, using either their own threshold value or a 10% cut off. Yet further support for the view that all the tumours were oestrogen receptor positive was obtained indirectly from the significant correlation between the Quick scores achieved on the medium expressing tumour and the proportion of nuclei stained on the low expressing tumour (table 6). Approximately 70% of laboratories that achieved higher than the median Quick score of 4 on the medium expressing tumour demonstrated ≥ 10% of nuclei in the low expressing tumour. In contrast only 18% of those scoring less than 4 on the medium expressor demonstrated ≥ 10% of nuclei in the low expressor. Consequently a Quick score of less than the median value on a relatively high oestrogen receptor expressing tumour correlates with < 10% of nuclei staining on the low expressor, while a Quick score greater than the median correlates with ≥ 10% of nuclei staining on the low expressor.
The significant positive correlation between the level of sensitivity achieved by the same laboratories on the different tumours (tables 3–6) indicates that less than optimum sensitivity on relatively high expressing tumours equates to poor and sometimes inadequate demonstration of very low expressors. This is because in the low expressing tumours the amount of oestrogen receptor present is much closer to the designated threshold value, and a slight fall in sensitivity can result in the number of nuclei demonstrated being below this value.
Interestingly, of all the threshold values investigated, the recently recommended 1% threshold value6,8,21 would result in a significant number of laboratories recording all three categories of tumour used in the present study, including the low expressing infiltrating ductal carcinoma, as oestrogen receptor positive (fig 2). The reason for this is that the 1% threshold alone would make sufficient allowance for the observed interlaboratory variation in IHC sensitivity. However, it must be emphasised that a 1% threshold could result in detection of a higher proportion of oestrogen receptor positive unresponsive tumours from laboratories using a more sensitive method of detection. Hence, as emphasised by Barnes et al, a reasonable balance must be achieved between sensitivity and specificity in order to predict more accurately the proportion of patients likely to benefit from hormone treatment.10,13
Once improvement in interlaboratory consistency in carrying out the IHC assay has been achieved, it will be possible to address two outstanding questions: first the “accuracy” of the assay, and second the choice of cut off point. In the past, when the cytosol assay was used, there was always a small number of oestrogen receptor “negative” cases that responded to endocrine treatment. It is not clear whether these were genuinely negative or whether there was insufficient tumour in the sample used to prepare the cytosol. The advantage of IHC is that the presence of tumour can be confirmed by eye. Conversely there are also unresponsive oestrogen receptor positive cases. This may happen because the tumour burden is so great that treatment is ineffective or it could reflect the presence of oestrogen receptor in normal epithelial cells; again negative staining of tumour cells can now be checked visually.
The question of the cut off values remains a topic of much discussion. These may well differ according to whether the assay is to provide prognostic or predictive information. Much experience has been gained from the treatment of metastatic disease but less is available from the adjuvant setting. The increased use and improvements in quality of IHC will enable critical examination of relations between different cut off points and response. This in turn will lead to a consensus as to the “correct” values and make comparisons between studies easier.
In this study, we have investigated the ability of laboratories participating in the United Kingdom NEQAS-ICC for hormonal receptors to demonstrate positive staining in mammary carcinomas shown by experienced laboratories to have an oestrogen receptor positive status. The difficulties experienced by some laboratories in achieving this goal are highlighted and have since been communicated to the participants, with special emphasis on the false negative results. The reasons for the underachievement by some laboratories may lie in variations in the sensitivity of the overall staining technique. The sensitivity of the IHC assay is determined by several variables, which include the quality and concentration of the primary antibody used, the power of the antigen retrieval, and the secondary detection systems and quality of the fixation of the tissue. A superficial comparison of these variables among the assay systems used by different laboratories has failed to reveal any that are predominantly responsible for the differences observed. However, quality assurance is a continual process and the ongoing cycle of assessment runs, currently in progress for the oestrogen receptor IHC assay, may show that a combination of these factors is responsible for the observed interlaboratory variance. Better optimisation of such factors is needed to ensure that the results produced in one laboratory are comparable with those produced in another. This in turn may allow the chosen set of prognostic/therapeutic threshold values for selecting treatment for both primary and metastatic breast cancers to be safely applicable in the majority of laboratories offering the specialist oestrogen receptor IHC assay service.
We thank Elizabeth Anderson, Andre Balaton, Rudolf Baumann, and Rastko Golouh for providing us with invaluable assistance, and all the participants of UK NEQAS-ICC without whom this study would not have been possible.