Abstract
Aims Histological grade is widely used to guide the management of invasive breast cancer (IBC). Yet, substantial interlaboratory and intralaboratory grading variations exist in daily pathology practice. To create awareness and to facilitate quality improvement, feedback reports, containing case-mix-adjusted laboratory-specific grades benchmarked against other laboratories, were sent to the individual laboratories by 1 March 2018. We studied the effect of these feedback reports on interlaboratory grading variation up to 1 year later.
Methods Overall, 17 102 synoptic pathology reports of IBC resection specimens from 33 laboratories, obtained between 1 March 2017 and 1 March 2019 were retrieved from the Dutch Pathology Registry (PALGA). An overall deviation score (ODS), representing the sum of deviations from the grade-specific overall proportions, was calculated to compare the absolute deviation for all grades at once. Case-mix correction was performed by two multivariable logistic regression analyses, providing laboratory-specific ORs for high-grade versus low-grade IBC.
Results After feedback, the overall range between laboratories decreased by 3.8%, 6.4% and 6.6% for grades I, II and III, respectively. Though the mean ODS remained similar (13.8% vs 13.7%), the maximum ODS decreased from 34.1% to 29.4%. The range of laboratory-specific ORs decreased by 21.9% for grade III versus grades I–II.
Conclusions An encouraging decrease in grading variation of IBC was observed after laboratory-specific feedback. Nevertheless, the overall grading variation remains substantial. In view of the important role of grading in patient management, it is imperative not only that feedback is provided on a regular basis but also that other interventions, such as additional training, are implemented.
- breast cancer
- breast pathology
- oncology
- histopathology
- epidemiology
Introduction
To date, the histological grade is widely used to guide the therapeutic management of invasive breast cancer (IBC)1–4 as it remains one of the most well-established prognostic factors.2 3 5 When strictly adhering to the current Dutch guideline,1 which is similar to the rest of Europe,4 grade even indicates the need for adjuvant chemotherapy in approximately one third of patients with breast cancer in the Netherlands.6 Furthermore, grading is used to guide radiotherapy decisions1 4 7 and the use of genetic profiling tests.1 4 8–10
Despite its important role in patient management, we previously showed that substantial, and clinically relevant, variation in the grading of IBC exists on a nationwide scale in daily clinical practice in the Netherlands.6 Studies in which multiple IBC lesions were graded by several pathologists also showed that reproducibility was no more than moderate.11–14 This suggests that patients may be undergraded or overgraded in specific pathology laboratories and/or by specific pathologists, which may subsequently result in undertreatment or overtreatment of a substantial number of patients with breast cancer.6 As this may influence outcome, including exposure to unnecessary toxicity, it is clear that standardised histological grading is of key importance.
The results of our previous study6 were sent to the individual laboratories as feedback reports by the nationwide Dutch Pathology Registry (PALGA) to facilitate quality improvement, as auditing and benchmarking improve the quality of breast cancer care.15–20 By benchmarking their laboratory-specific proportions per histological grade against other laboratories, pathologists in individual laboratories were enabled to discuss and reflect on their grading practices and could conclude that adaptations were necessary.
This study was conducted to examine the effect of the case-mix-adjusted, laboratory-specific feedback reports on the interlaboratory variation in histological grading of IBC using real-life data from synoptic (structured) pathology reports in the Netherlands.
Methods
Data source
Data were retrieved from PALGA, the nationwide network and registry of histopathology and cytopathology in the Netherlands, which contains excerpts of all pathology reports from Dutch laboratories, with nationwide coverage since 1991.21 All data within the PALGA database are pseudonymised both in the laboratories and by a trusted third party (ZorgTTP, Houten, the Netherlands). All pathology laboratories were anonymised to the researchers by PALGA in a final step. Laboratories that wanted to receive feedback on the pathologist level (in addition to the overall laboratory feedback) were asked to send their local pathologist information to PALGA, as the PALGA database did not contain pathologist information before 2019. All data were retrieved and handled in compliance with the General Data Protection Regulation Act and this study was approved by the scientific and privacy committee of PALGA.
Study population
All synoptic pathology reports of patients with IBC resection specimens in the Netherlands between 1 March 2017 and 1 March 2019 were retrieved from PALGA (n=25 420) (online supplementary 1).
Overall, 38 of 42 Dutch pathology laboratories used the synoptic (PALGA) protocol from 1 March 2017 and onwards. Of these laboratories, we included those that reported at least 50 IBC resection specimens per year.
We excluded all resection specimen reports of patients with neoadjuvant treatment as grading may be influenced by chemotherapy.22–24 Furthermore, synchronous IBCs, defined as an ipsilateral lesion within 6 months of the previous IBC resection report, were considered paired measurements of which we only included the first report (online supplementary 1).
Per pathology report, we extracted patient characteristics (sex, age and type of surgery) and tumour characteristics (tumour size, histological subtype, histological grade, oestrogen receptor (ER) and progesterone receptor (PR) status, and human epidermal growth factor receptor 2 (HER2) status). Reports with missing data on any of the patient or tumour characteristics were excluded from further data analysis (online supplementary 1).
Feedback reports
Laboratory-specific feedback reports, regarding the variation in the grading of IBC between 1 January 2013 and 31 December 2016, were sent to the laboratories by PALGA by 1 March 2018. These feedback reports showed the laboratory-specific proportions per histological grade, benchmarked against the overall national proportions and the proportions of the other anonymised laboratories. Thereby, laboratories were enabled to discuss and reflect on their grading practice, and perhaps conclude that adaptations were necessary. The general feedback report is available on the PALGA website (in Dutch only).25
Ten laboratories provided (coded) pathologist information for their data, which gave these pathologists the advantage to benchmark their own grading practice against other pathologists in their laboratory and to the national mean. According to the literature, this type of individual feedback is more effective than providing general (laboratory level) data only.26–29 Feedback reports were sent to the laboratories by 1 March 2018, which resulted in a prefeedback group of synoptic pathology reports from 1 March 2017 to 1 March 2018, and a postfeedback group of synoptic pathology reports from 2 March 2018 to 1 March 2019.
Histological grading
Histological grading of IBC was determined according to the modified Bloom and Richardson guideline (Elston-Ellis modification),30 31 with a score of 1–3 on its three components (tubule formation, nuclear pleomorphism and mitotic count). This results in a total score and subsequent grade (3–5=grade I, 6–7=grade II, 8–9=grade III).
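The mapping from component scores to grade described above can be written out as a minimal sketch. The function and parameter names are hypothetical and chosen for illustration only; the cut-offs (3–5, 6–7, 8–9) are those stated in the text.

```python
def histological_grade(tubule_formation: int,
                       nuclear_pleomorphism: int,
                       mitotic_count: int) -> str:
    """Return the grade (I, II or III) for three component scores of 1-3 each.

    Illustrative helper (name and signature are assumptions, not from the study).
    """
    scores = (tubule_formation, nuclear_pleomorphism, mitotic_count)
    if any(score not in (1, 2, 3) for score in scores):
        raise ValueError("each component score must be 1, 2 or 3")

    total = sum(scores)  # total score lies between 3 and 9
    if total <= 5:
        return "I"       # total score 3-5
    if total <= 7:
        return "II"      # total score 6-7
    return "III"         # total score 8-9
```

For example, component scores of 3, 2 and 2 give a total score of 7 and therefore grade II, whereas raising any single component by one point moves the tumour to grade III.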
Statistical analysis
Patient and tumour characteristics were summarised and differences between the prefeedback and postfeedback group were tested by means of a χ2 test for categorical variables and by a non-parametric Mann-Whitney U test for continuous variables.
Overall proportions per grade (I, II and III) were determined before and after the feedback reports and considered the national proportions. The absolute differences from the national proportion per laboratory are presented in bar charts per grade for the prefeedback and postfeedback periods. Laboratories that also received feedback on the pathologist level are indicated by striped bars.
An overall deviation score (ODS) was computed to compare the absolute deviation for all three grades at once. The ODS was calculated by the sum of absolute deviations from the grade-specific national proportions per period (prefeedback and postfeedback). Differences in ODS of individual laboratories before and after feedback were compared using a Wilcoxon signed-rank test.
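As an illustration of the ODS definition above, the following sketch sums the absolute deviations of one laboratory from the national proportions. The national proportions are the prefeedback values reported in the Results; the laboratory proportions are hypothetical and serve only as an example, as do the function and variable names.

```python
GRADES = ("I", "II", "III")

def overall_deviation_score(lab_proportions, national_proportions):
    """Sum of absolute deviations (in percentage points) over all three grades."""
    return sum(abs(lab_proportions[grade] - national_proportions[grade])
               for grade in GRADES)

# National prefeedback proportions per grade, in %, as reported in the Results.
national_pre = {"I": 30.5, "II": 49.5, "III": 20.0}

# Hypothetical laboratory, for illustration only (not a laboratory from the study).
example_lab = {"I": 25.0, "II": 50.0, "III": 25.0}

ods = overall_deviation_score(example_lab, national_pre)
print(ods)  # 5.5 + 0.5 + 5.0 = 11.0
```

Because deviations are taken as absolute values, a laboratory that undergrades and one that overgrades by the same amount receive the same ODS; the score captures the magnitude of deviation, not its direction.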
As a possible way to interpret the type of change in laboratories after feedback, we used multiple definitions of change. First, we arbitrarily defined laboratories with an absolute change of ≤2% as ‘not shifting’. Second, in the case of an absolute change of >2%, we defined two types of change. Laboratories with a smaller deviation from the overall mean were defined as ‘less deviant’. Similarly, laboratories that became more deviant from the overall mean were defined as ‘more deviant’.
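The definitions of change above can be sketched as a small classification rule. The text does not specify exactly which quantity the absolute change of 2% refers to; this sketch assumes it is the change in a laboratory's absolute deviation from the national proportion for a given grade, in percentage points. The function name and this reading are assumptions for illustration.

```python
def classify_change(deviation_before: float,
                    deviation_after: float,
                    threshold: float = 2.0) -> str:
    """Classify a laboratory's change for one grade.

    Deviations are signed differences from the national proportion (percentage
    points). Assumption: 'absolute change' is the change in absolute deviation.
    """
    change = abs(deviation_after) - abs(deviation_before)
    if abs(change) <= threshold:
        return "not shifting"
    return "less deviant" if change < 0 else "more deviant"
```

Under this reading, a laboratory moving from 6.0 to 3.0 percentage points above the national proportion would be 'less deviant', whereas a move from 6.0 to 5.0 would count as 'not shifting'.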
To compare relative differences among laboratories, we used a logistic regression model, providing ORs and 95% CIs per laboratory. We performed two logistic regression analyses, with different definitions of low-grade and high-grade IBC, as there is no clear binary cut-off in clinical practice. For example, grade III is considered a risk factor (high grade) according to radiotherapy guidelines,1 4 7 whereas according to chemotherapy guidelines grades II–III is considered a risk factor (high grade) with possible subsequent therapy consequences.1 4 Therefore, in one logistic regression analysis, low-grade IBC was defined as grades I–II and high-grade IBC as grade III, whereas in the other logistic regression analysis, low-grade IBC was defined as grade I and high-grade IBC as grades II–III.
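The two outcome definitions can be made explicit with a short sketch. The function name and the labels used for the two definitions are hypothetical; only the groupings of grades come from the text.

```python
def is_high_grade(grade: str, definition: str) -> bool:
    """Dichotomise a grade (I, II or III) under one of the two definitions.

    'III_vs_I-II' : high grade = grade III (as in the radiotherapy guidelines)
    'II-III_vs_I' : high grade = grades II-III (as in the chemotherapy guidelines)
    The definition labels are illustrative assumptions.
    """
    if grade not in ("I", "II", "III"):
        raise ValueError("grade must be 'I', 'II' or 'III'")
    if definition == "III_vs_I-II":
        return grade == "III"
    if definition == "II-III_vs_I":
        return grade in ("II", "III")
    raise ValueError("unknown definition")
```

Grade II is the only grade whose classification differs between the two analyses: it counts as low grade in the first and as high grade in the second.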
For the choice of reference laboratory, we arbitrarily chose the laboratory best resembling the national distribution with regard to the specific logistic regression analysis. Multivariable logistic regression analyses were performed to correct for differences in case mix. Case-mix variables were selected based on our previous research6 and included age, tumour size, type of surgery, histological subtype, HER2 status and hormone receptor status. Hormone receptor status was considered positive when the ER status, the PR status or both were positive, and was taken into account as a binary variable (either positive or negative). According to the current Dutch guideline,1 the receptor status for ER and PR is considered positive when ≥10% of the tumour cells show staining for the respective receptor by immunohistochemistry (IHC). The overall number of men was too low to take into account in a multivariable model; however, men did not cluster in specific laboratories. To compare differences in the case-mix-adjusted ORs of the individual laboratories, we calculated the positive OR difference (ie, the difference of a laboratory-specific OR to the reference OR of 1.00) both prefeedback and postfeedback and compared the differences of the individual laboratories by a Wilcoxon signed-rank test for both multivariable logistic regression analyses (grade I vs grades II–III and grades I–II vs grade III).
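The binary hormone receptor variable used as a case-mix variable can be sketched as follows. This assumes that the percentage of stained tumour cells is available for both receptors, which may not match how the registry stores receptor status; the function and parameter names are hypothetical.

```python
def hormone_receptor_positive(er_percent: float,
                              pr_percent: float,
                              cutoff: float = 10.0) -> bool:
    """Return True when ER, PR or both reach the positivity cut-off.

    A receptor is considered positive when >=10% of tumour cells stain by IHC,
    following the Dutch guideline cited in the text.
    """
    er_positive = er_percent >= cutoff
    pr_positive = pr_percent >= cutoff
    return er_positive or pr_positive
```

A tumour with 0% ER staining and 80% PR staining is therefore coded as hormone receptor positive, whereas a tumour with 9% ER and 5% PR staining is coded as negative.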
Lastly, the effect of feedback on the pathologist level was tested by comparing the mean ODS before and after feedback, between the laboratories that received feedback both on pathologist and laboratory levels and laboratories that received feedback on the laboratory level only. In addition, type of change between these groups was compared by means of a χ2 test.
All statistical analyses were performed using IBM SPSS Statistics V.25.0.0.2. P values <0.05 were considered statistically significant.
Results
Patient, tumour, and laboratory characteristics
A total of 17 102 IBC synoptic resection specimen reports from 16 734 patients were included in our data analysis. For some patients, more than one pathology report was included as this concerns either a bilateral tumour or an ipsilateral tumour that was reported >6 months after the first diagnosis (online supplementary 1). Of the included reports, 8767 were reported before and 8335 were reported after feedback reports were sent to the laboratories by PALGA.
All patients originated from 33 of 42 Dutch pathology laboratories, as 4 laboratories did not implement synoptic reporting between 1 March 2017 and 1 March 2019 and 5 laboratories graded <50 IBC lesions within the synoptic PALGA protocol per period (prefeedback and/or postfeedback). The number of synoptic IBC reports per laboratory ranged from 64 to 613 (median 239) in the year before the feedback reports, whereas the number of synoptic pathology reports per laboratory in the year after the feedback reports ranged from 52 to 637 (median 207). Characteristics of all included IBC resection specimen reports are listed in table 1.
The overall mean age (SD) at diagnosis was 63.2 (11.9) years and patients were primarily female individuals (99.2%). Breast-conserving surgery was performed in approximately two thirds of patients (68.2%). The majority of tumours were of ductal (not otherwise specified) subtype (78.2%), with a positive ER/PR status (89.8%), whereas only a small minority of tumours had a positive HER2 status (7.9%). Most characteristics, including age, sex, tumour size, type of surgery and histological subtype, were similarly distributed prefeedback and postfeedback. A minimal but significant increase of hormone-receptor-positive tumours was observed after the feedback reports, whereas a significant decrease was observed for HER2-positive tumours (p=0.010).
Overall national proportions for IBC grades I, II and III were, respectively, 30.5%, 49.5% and 20.0% before the feedback reports, whereas IBC grades I, II and III were reported in, respectively, 32.0%, 49.2% and 18.8% after the feedback reports (p=0.048).
Interlaboratory differences in histological grading
After feedback, the total range between laboratories decreased for all grades: by 3.8% for grade I (from 17.5%–45.5% to 17.3%–41.5%), by 6.4% for grade II (from 34.3%–64.5% to 35.0%–58.8%) and by 6.6% for grade III (from 10.9%–37.1% to 9.9%–29.5%) (figure 1).
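The reported decreases follow directly from the widths of the ranges before and after feedback and are expressed in percentage points. The short check below reproduces them from the minimum and maximum proportions given above; the variable and function names are illustrative.

```python
# (minimum, maximum) laboratory proportion per grade, in %, as reported above.
ranges = {
    "I":   {"pre": (17.5, 45.5), "post": (17.3, 41.5)},
    "II":  {"pre": (34.3, 64.5), "post": (35.0, 58.8)},
    "III": {"pre": (10.9, 37.1), "post": (9.9, 29.5)},
}

def range_width(bounds):
    """Width of a (minimum, maximum) range in percentage points."""
    low, high = bounds
    return high - low

def range_decrease(grade):
    """Decrease in range width after feedback, rounded to one decimal."""
    widths = ranges[grade]
    return round(range_width(widths["pre"]) - range_width(widths["post"]), 1)

for grade in ranges:
    print(grade, range_decrease(grade))
```

For grade I, for instance, the width narrows from 28.0 to 24.2 percentage points, which gives the reported decrease of 3.8.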
The mean overall ODS remained similar after feedback (13.8% vs 13.7%), which is also reflected by the similar ODS of individual laboratories (Wilcoxon signed-rank test, p=0.955). The maximum ODS, however, decreased from 34.1% to 29.4% (figure 2). Overall, 11 (33.3%), 13 (39.4%) and 16 (48.5%) of laboratories showed no shift (≤2%) after feedback for grades I, II and III, respectively (table 2).
Among laboratories that shifted >2% after feedback, the number of laboratories that became more deviant was similar to the number of laboratories that became less deviant after feedback (30.3% vs 36.4% for grade I, 30.3% vs 30.3% for grade II and 24.2% vs 27.3% for grade III, respectively) (table 2).
For the multivariable logistic regression analysis of grade III versus grades I–II, laboratory 30 had the lowest mean deviation from the national proportion before and after feedback for grade III (0.2%) and was chosen as the reference laboratory. Before feedback, adjusted ORs ranged from 0.37 (95% CI: 0.21 to 0.67) to 2.15 (95% CI: 1.26 to 3.67). After feedback, adjusted ORs ranged from 0.37 (95% CI: 0.20 to 0.68) to 1.76 (95% CI: 1.01 to 3.07) (figure 3A). Consequently, the absolute overall OR range decreased by 21.9%, from 1.78 to 1.39. Positive OR differences of the individual laboratories did not differ significantly before and after feedback (Wilcoxon signed-rank test, p=0.702).
For the multivariable logistic regression analysis of grades II–III versus grade I, laboratory 32 had the lowest mean deviation from the national proportion before and after feedback for grade I (0.9%) and was chosen as the reference laboratory. Before feedback, adjusted ORs ranged from 0.48 (95% CI: 0.29 to 0.77) to 2.00 (95% CI: 1.10 to 3.65), resulting in an absolute overall OR range of 1.52. After feedback, the range of adjusted ORs increased slightly (10.5%), with ORs from 0.42 (95% CI: 0.26 to 0.67) to 2.10 (95% CI: 1.24 to 3.58) and a corresponding absolute overall OR range of 1.68 (figure 3B). Again, positive OR differences of the individual laboratories did not differ significantly (Wilcoxon signed-rank test, p=0.640).
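The relative changes in the absolute OR range can be verified from the lowest and highest adjusted ORs reported above. The sketch below does this arithmetic; the function names are illustrative and not part of the original analysis, which was performed in SPSS.

```python
def or_range(lowest_or: float, highest_or: float) -> float:
    """Absolute OR range: highest minus lowest adjusted OR."""
    return highest_or - lowest_or

def relative_change_percent(before: float, after: float) -> float:
    """Change from before to after, as a percentage of the value before."""
    return (after - before) / before * 100

# Grade III versus grades I-II: lowest and highest adjusted ORs as reported.
grade3_before = or_range(0.37, 2.15)   # 1.78
grade3_after = or_range(0.37, 1.76)    # 1.39
grade3_change = round(relative_change_percent(grade3_before, grade3_after), 1)

# Grades II-III versus grade I.
grade23_before = or_range(0.48, 2.00)  # 1.52
grade23_after = or_range(0.42, 2.10)   # 1.68
grade23_change = round(relative_change_percent(grade23_before, grade23_after), 1)

print(grade3_change, grade23_change)
```

A negative value indicates a narrowing of the range (here the 21.9% decrease for grade III versus grades I–II) and a positive value a widening (the 10.5% increase for grades II–III versus grade I).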
Feedback on the pathologist level
Ten of the included laboratories received feedback both on laboratory and on pathologist levels (figures 1 and 2; striped bars). Although the mean prefeedback ODS of these laboratories was lower (10.7%) than the mean prefeedback ODS of laboratories that only received feedback on the laboratory level (15.1%), neither group showed noteworthy changes after feedback (10.7% and 15.0%, respectively). Furthermore, the type of change in laboratories after feedback did not significantly differ between the two groups (table 2). Yet, a significantly higher proportion of laboratories that received feedback on the pathologist level showed no shift after feedback for grade I. A similar, though non-significant, pattern was observed for grade III (p=0.103), but not for grade II.
Discussion
Using nationwide data from structured (synoptic) pathology reports, we studied case-mix-adjusted, laboratory-specific feedback reports as an intervention to decrease interlaboratory variation in histological grading of IBC. This study shows an encouraging decrease in nationwide grading variation after sending feedback reports to individual laboratories, reflected by a decrease in absolute range of grade-specific proportions after feedback for all grades (I–III), the decrease of the maximum ODS and the range of laboratory-specific ORs showing a notable decrease of 21.9% for grade III versus grades I–II. Overall, this shows that most deviant laboratories became less deviant, whereas the overall mean ODS and positive OR differences of individual laboratories did not significantly differ.
The primary aim of the laboratory-specific feedback reports was to create awareness among pathologists by highlighting that grading variation in current clinical practice is substantial and improvement is warranted. It is important to stress that the aim of the feedback reports was not simply to make ‘higher’ grading pathologists grade their tumours lower and vice versa. The awareness that the feedback reports created enabled pathologists to discuss how they grade with other pathologists. Furthermore, they could perhaps conclude that they interpret the guideline differently or less strictly than other pathologists. In addition to inciting a dialogue between pathologists, we also hope that our previous paper6 opens the dialogue between pathologists and oncologists. As we have previously shown, grade determines whether chemotherapy is indicated in approximately 30% of patients with breast cancer;6 awareness of grading variation is therefore also very important to oncologists. Moreover, one could also think of peer consultation in cases where grade determines whether a specific therapy is indicated.
Data included in this study were from synoptic pathology reports only, as currently over 80% of IBC resection specimens are reported this way.32 Moreover, besides increased overall completeness of pathology reports,33 34 it has recently been shown that synoptic reporting also improves patient care.34 Besides advantages in patient care, easy data extraction from synoptic pathology reports also enables the assembly of nationwide laboratory-specific feedback reports on any chosen biomarker (histological grade or hormone receptor and/or HER2 status35). We believe that these feedback reports are an important first step towards the improvement of breast cancer care by creating insight and awareness in the variation of biomarker assessment, which is supported by the results of this study.
Thirty-eight of the current 42 pathology laboratories in the Netherlands implemented synoptic reporting between 1 March 2017 and 1 March 2019. Five of these 38 laboratories were, nevertheless, excluded from further data analyses as they synoptically graded <50 IBC in either the prefeedback or postfeedback period. Two laboratories likely started using the protocol somewhere in the prefeedback period (<50 reports) as their synoptic IBC report number increased considerably (>230) during the postfeedback period. Two other laboratories had low synoptic IBC report numbers in general (30–60 per period) and the fifth laboratory stopped reporting synoptically (425 prefeedback, 0 postfeedback) for unknown reasons.
Although it seems that some laboratories (or pathologists) grade only a few IBC cases annually, it is important to emphasise that these pathologists may still report IBC resection specimens narratively (ie, outside the synoptic PALGA protocol); they may thus grade more IBC cases in clinical practice than our data suggest. In addition, we previously showed that both laboratories that grade few and laboratories that grade many IBC within the synoptic protocol may report significantly deviant proportions per grade.6
However, if IBC resection specimen numbers from our study are the true numbers per laboratory, one could question the desirability of laboratories assessing <50 IBC resection specimens annually, which comes down to <1 IBC resection specimen per week. With the current situation of substantial nationwide interlaboratory grading variation, and the potential clinical consequences in mind,6 one could argue that grading should only be undertaken by trained, or maybe even only by expert, breast pathologists. This should be the subject of future research.
Overall, grades I, II and III IBC were observed in 31.2%, 49.4% and 19.4% of all IBC resection specimens, respectively. This is in line with our previous study (2013–2016) and other studies,36–41 although percentages vary between the different studies. The distribution of grades did vary significantly between the prefeedback and postfeedback period (p=0.048); a slight increase of grade I IBC (30.5% vs 32.0%) and a slight decrease of grade III IBC (20.0% vs 18.8%) were apparent, whereas grade II IBC remained relatively stable (49.5% vs 49.2%). This shift in distribution may have been initiated by the feedback reports; however, it may also reflect a true change or random variation in the breast cancer population. Either way, this made it more difficult to show significant deviations towards the mean.
In addition, we found relatively low numbers of HER2-positive IBC reports (7.9%) as compared with generally adopted numbers of 15%–20%.42 Furthermore, we found relatively high numbers of almost 90% ER-positive tumours, whereas approximately 15% of breast cancers are usually reported as triple negative.43 Both findings are likely due to the fact that we excluded patients who received neoadjuvant treatment, which is the preferred initial approach in patients with HER2-positive and triple-negative breast cancer.44
We analysed the data in an absolute and a relative manner, comparing laboratories both to the national proportion and a reference laboratory. Overall, the mean ODS and changes within individual laboratories were non-significant (ODS, positive OR differences). However, all analyses did show a decrease in the extremes (absolute range per grade, maximum ODS and the 21.9% decrease in ORs for grade III vs grades I–II) after feedback. In addition, the slight increase (10.5%) of the absolute range of OR before and after feedback for grades II–III versus I seems to be mainly caused by a single extremely deviating laboratory (#12), as this laboratory became significantly more deviant in the same direction after feedback. An impressive decrease in the absolute OR range of 45.0% can be observed for the remainder of the laboratories. Overall, this shows that most deviant laboratories became less deviant, whereas the majority of the other laboratories remained stable. We therefore believe that these results show an encouraging decrease in breast cancer grading variation after feedback.
Interestingly, we found no differences between laboratories that only received feedback on the laboratory level and laboratories that additionally received feedback on the pathologist level. This may be due to the multistep, semiobjective nature of grading according to the modified Bloom and Richardson guideline,30 31 with scores on the three components of grading, resulting in a total score and subsequent grade. This makes grading more objective and thus more robust to direct influences on the overall grade. In this light, it may be interesting to reflect on the three different subcategories (tubular differentiation, nuclear pleomorphism, mitotic count) at the pathologist level. In addition, although the mean ODS did not change in either group, the mean ODS of laboratories that also received feedback on the pathologist level was notably lower than the mean ODS of laboratories that received feedback on the laboratory level only. Hence, at the starting point, laboratories with feedback on the pathologist level were already less deviant from the overall mean, which may have reduced their sense of urgency to adjust their grading practices. This may be reflected by the significantly higher proportion of laboratories showing no shift for grade I among laboratories that received feedback on the pathologist level as compared with laboratories that received feedback on the laboratory level only. Lastly, the relatively low number of laboratories that received feedback on the pathologist level (n=10) makes it difficult to draw definite conclusions. However, the literature does suggest that feedback is more effective when individual rather than general data are provided.26–29
As with all interventions implemented in an uncontrolled environment, such as daily clinical practice, the observed decrease in grading variation can probably not solely be attributed to the feedback reports. However, within the time frame of our previous study6 and this study (31 December 2016 to 1 March 2019), no guideline changes and no major other interventions or events on a nationwide scale took place.
Although feedback reports were sent by 1 March 2018, it is very likely that they were discussed in the laboratories in the week(s) thereafter, for example, in a regular staff meeting. The actual postfeedback period may therefore have started somewhat later than 1 March 2018, which could have clouded the effect of the feedback reports. Hence, the actual effect of the feedback reports may be even greater.
Despite the encouraging decrease in nationwide grading variation, we also showed that grading variation remains substantial. Besides continuous monitoring and benchmarking26 of histological grading (and other crucial biomarkers like ER, PR and HER2), which is already being considered by PALGA and the Dutch Society of Pathology, future research might focus on developing an e-learning module to train pathologists and residents in histological grading of IBC in a standardised way to further decrease grading variation. This is underlined by Elston and Ellis, who state that grading of IBC should only be undertaken by specifically trained pathologists.45
Conclusions
An encouraging decrease in nationwide Dutch grading variation of IBCs was observed after feedback. As feedback reports were sent to the laboratories for the first time, this was not (yet) a closed quality loop. Therefore, although our results are encouraging, the full potential of these feedback reports is still unknown. As overall grading variation remains substantial, it seems worthwhile to monitor this by continuing with feedback reports. Closing the quality loop and further training of pathologists, for example, by e-learning, may help to further decrease grading variation and improve clinical decision making and thereby the outcome of our patients.
Take home messages
Substantial, and clinically relevant, grading variation of invasive breast cancer exists in daily pathology practice.
To create awareness and to facilitate quality improvement, feedback reports, containing case-mix-adjusted laboratory-specific grades benchmarked against other laboratories, were sent to the individual pathology laboratories by 1 March 2018.
An encouraging decrease in nationwide Dutch grading variation was observed within the year after laboratory-specific feedback reports.
Footnotes
Handling editor Cheok Soon Lee.
Contributors CvD, PJvD, IOB, EvdW, IAD: involved in the design of the study; read and revised the paper, and agreed with the final version of the paper. CD and IAD: data collection. CvD: analysed data; wrote the paper. CvD, PJvD, EvdW, IAD: data interpretation. PJvD: obtained funding.
Funding This research was funded by the Quality Foundation of the Dutch Association of Medical Specialists (SKMS).
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data may be obtained from a third party and are not publicly available. Our data are derived from PALGA, the nationwide network and registry of histo- and cytopathology in the Netherlands. Data are available upon reasonable request by researchers. More information can be found at: https://www.palga.nl/en/data-requests/researchers.html.