Background: The grade of dysplasia found in Barrett’s oesophagus surveillance biopsies is a major factor to determine follow up and treatment. However, it has been reported that the reproducibility of the grading system is not optimal.
Aims: To compare routine and expert dysplasia grades in Barrett’s oesophagus surveillance biopsies. To evaluate prospectively morphometrical grading support and to assess the pitfalls in its daily application.
Methods: Consecutive biopsies (n = 143) were graded routinely by experienced general surgical pathologists as no dysplasia (ND), indefinite for dysplasia, low grade dysplasia (LGD), and high grade dysplasia (HGD). Two expert gastrointestinal pathologists blindly reviewed all sections. The stratification index of nuclei, mean nuclear area, and Ki67area% were assessed routinely according to a strict protocol. With these features, the previously described morphometrical grade was calculated for each case. The grades provided by the experts, surgical pathologists, and morphometry were compared.
Results: The general pathologists graded many more cases as dysplastic than did the experts. Complete agreement between the experts’ grades and the original grades was 50 of 143 (35%). Sixty four of the 71 original LGDs and 11 of the 23 original HGDs were downgraded by the experts, whereas one LGD was upgraded. In 93 of the 143 biopsies, pitfalls or special characteristics of a technical nature (tangential cutting, severe inflammation, ulcer or the squamocylindrical junction very close by, among others) were seen at review in the part of the biopsy marked as diagnostic. These probably contributed in part to the original overdiagnoses and could have been prevented or corrected. The morphometrical grading model has not been developed to compensate for this; the current morphometrical grading method should not be applied to such sections because it may give erroneous (usually too high) morphometrical grades. In spite of this, all HGDs according to the experts were recognised as such by morphometry, even in these technically less adequate sections or areas. However, 46% of the experts’ downgrades occurred in technically adequate sections and thus were caused by a difference in interpretation. Here, morphometrical support proved to be useful because, in agreement with the experts, it downgraded 51% of the original LGDs, upgraded one of eight NDs to LGD and one of 39 LGDs to HGD.
Conclusions: Experts downgraded a high proportion of biopsies graded as LGDs and HGDs by the surgical pathologists. Morphometrical grading can be used for daily quality control; the results were close to those of the experts and corrected a large number of cases erroneously graded by surgical pathologists.
- Barrett’s oesophagus
- diagnostic support
- GI, gastrointestinal
- HGD, high grade dysplasia
- H&E, haematoxylin and eosin
- LGD, low grade dysplasia
- MNA, mean nuclear area
- ND, no dysplasia
- PBS, phosphate buffered saline
- SCJ, squamo columnar junction
- SI, stratification index of nuclei
Normally, the oesophagus is covered by non-keratinising squamous epithelium. Intestinal-type cylindrical (columnar) epithelium in the oesophagus (Barrett’s oesophagus) is a pre-malignant condition.1 Endoscopical surveillance biopsies are generally recommended to prevent cancer by the detection of non-invasive precursors. This is not always possible and sometimes invasive carcinoma is detected in a surveillance biopsy. However, even then the favourable effect of surveillance is clear because such cancers have a better prognosis than incidentally detected cancers in non-surveyed patients.2 The grade of dysplasia has prognostic value.3 As a result, therapeutic and follow up decision schemes for the surveillance of Barrett’s oesophagus are guided by the grade of dysplasia of the endoscopical biopsies, but subjective grading is prone to observer variation, even between experts.4–6 However, a recent evaluation showed fair to good agreement,7 although this is probably too optimistic an impression for the pathology community in general, because the participants of this study were specialised gastrointestinal (GI) pathologists. The results for general surgical pathologists (who perform most Barrett’s oesophagus evaluations and gradings) may not be as good. For instance, one reproducibility study between GI pathologists showed an interobserver difference in grade in 21% of the cases. Moreover, repeated grading of the consensus cases one year later showed an intra-observer variation of 12%.6
Morphometrical analysis can provide objective support in the grading of dysplasia of different organ tracts (for a general overview see Baak8, for special applications to the GI tract see Jarvis and Whitehead, Tosi P et al, Meijer et al, and Hamilton and colleagues9–12). Two previous publications have shown that when grading dysplasia in oesophageal resection specimens6 and surveillance biopsies from patients with Barrett’s oesophagus,13 computerised morphometrical analysis is useful to obtain a more objective grade. The morphometrical grading model can correct the subjective grades, even of specialised GI pathologists,6 and thus can be used in principle for daily quality control and quality assurance. The quantitative features stratification index of nuclei (SI), mean nuclear area (MNA), and area% of Ki-67 positive nuclei in glands are the important morphometrical features for grading. The analysis of p53 did not add to the discriminating power of these features6 and was more variable and less contrast rich than Ki-67. This resulted in low reproducibility of p53 (JPA Baak, unpublished results, 2002).
Thus, the results of the two studies mentioned are promising, particularly because standard histological sections can be used, the methods are easy to perform, and they do not require expensive or complicated equipment. However, a formal prospective comparison of consecutive unselected Barrett’s oesophagus biopsies graded by experts and general surgical pathologists to assess (lack of) reproducibility in grading has not been carried out. Moreover, the morphometrical grading model described by Sandick and colleagues13 for Barrett’s oesophagus biopsies has not been tested in an independent prospective study with regard to its practical feasibility and value as a diagnostic support tool. Therefore, we set out to study these topics.
MATERIALS AND METHODS
We studied 143 consecutive oesophageal biopsy specimens from a surveillance program, routinely diagnosed as Barrett’s oesophagus by experienced surgical pathologists. In addition to the normal histological examination, all consecutive cases were studied routinely with morphometrical methods within one week after the biopsy was taken. Immediately after the endoscopical biopsy, the material was routinely fixed in 4% buffered formaldehyde for up to 72 hours (usually 18–24 hours), dehydrated, and embedded in paraffin wax. Four micrometer thick sections were cut and stained with haematoxylin and eosin (H&E). These were used for grading and morphometrical assessment of SI and MNA (see below).
Grading of dysplasia
On routine histological examination, the biopsies were graded by several experienced general surgical pathologists as no dysplasia (ND), indefinite for dysplasia, low grade dysplasia (LGD), or high grade dysplasia (HGD). These routine diagnoses will be referred to as “the original diagnosis”. At the beginning of our study, the pathologists (who were all experienced) stated that they felt comfortable with grading Barrett’s oesophagus biopsies and, therefore, no special appointments were made as to grading. The pathologists all received the original publications of Sandick and colleagues13 in which the morphometrical methods, results, and inclusion/exclusion criteria for morphometrical analysis were described (table 1). Moreover, instruction was given both to the pathologists and the technicians who performed the measurements, so that there was no doubt that the possibilities and limitations of the method were clear. It was also explicitly agreed that the pathologists should carefully mark with a black marker the diagnostic area on the section of the biopsy, where the measurements had to be done. The applicability of morphometry thus remained the responsibility of the pathologist who signed out the report. At the end of the intake period, the cases were independently graded by two of us (FK and GO) with a special GI pathology interest (these diagnoses are called the “experts’ diagnoses”). In addition to the histopathological revision, we checked at review whether the samples were adequate for morphometrical analysis according to the criteria listed in table 1.
Paraffin wax embedded sections of 4 μm thickness, adjacent to the H&E section used for the grade assessment, were mounted on to SuperFrostPlus slides (Menzel-Glaeser, Braunschweig, Germany) and dried overnight at 37°C. The sections were dewaxed in xylene and rehydrated in alcohol. After rehydration, the endogenous peroxidase activity was blocked by 3% H2O2 in phosphate buffered saline (PBS) for five minutes. The sections were immersed in sodium citrate buffer (0.1M; pH 6.0) and heated at 1000 W for two minutes and at 160 W for 15 minutes in a microwave oven. Before immunostaining, sections were soaked quickly in PBS (pH 7.4). A three step streptavidin–biotin complex method was used. Sections were first incubated with the polyclonal antibody Ki-67 (Dako, Glostrup, Denmark) at a dilution of 1/100 in PBS with 1% normal swine serum for 30 minutes. Subsequently, sections were incubated with biotinylated swine antirabbit antibody (Dako) at a 1/600 dilution in PBS for 30 minutes, followed by the third incubation step with a streptavidin–biotin–peroxidase complex (Dako), at a 1/100 dilution for 30 minutes. Visualisation of the complex was with diaminobenzidine/H2O2 for 10 minutes at room temperature. Two washes in PBS came after each incubation step. The counterstaining was with Mayer’s haematoxylin. Finally, sections were dehydrated in graded ethanol and xylene and mounted with DPX (Nustain, Nottingham, UK).
Quantitative image analysis
As described above, it is the responsibility of the diagnosing pathologist to delineate carefully the specific diagnostic area to be measured. The technicians who did the measurements were unaware of the pathologist’s grade. They were instructed not to accept a section for Barrett’s oesophagus measurement that had not been marked by the pathologist who requested the morphometrical grading. The morphometrical methods have been described in detail previously,13 and a short description follows here. The SI and the MNA were assessed in the standard H&E section used for the routine diagnostic procedure. The area% of Ki-67 positive (versus negative) glandular nuclei was determined in standard Ki-67 immunostained sections. Using these measurements, the cases were classified as ND, indefinite for dysplasia, LGD, or HGD based on the morphometrical model developed previously.13
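As an illustration only (this is not the measurement software used in the study), the two simplest features could be computed from a list of measured nuclear areas roughly as follows. The function names, the input format, and the definition of Ki-67 area% as positive nuclear area divided by total (positive plus negative) glandular nuclear area are assumptions made for this sketch; the exact feature definitions, including that of the SI, are those of the earlier publication.13

```python
from statistics import mean


def mean_nuclear_area(nuclear_areas_um2):
    """Mean nuclear area (MNA) of the measured nuclei, in square micrometres."""
    if not nuclear_areas_um2:
        raise ValueError("at least one nucleus must be measured")
    return mean(nuclear_areas_um2)


def ki67_area_percent(positive_area_um2, negative_area_um2):
    """Assumed definition: Ki-67 positive nuclear area as a percentage of
    the total (positive + negative) glandular nuclear area."""
    total = positive_area_um2 + negative_area_um2
    if total <= 0:
        raise ValueError("total nuclear area must be positive")
    return 100.0 * positive_area_um2 / total


# Hypothetical measurements, for illustration only.
areas = [40.0, 50.0, 60.0]
print(mean_nuclear_area(areas))
print(ki67_area_percent(positive_area_um2=30.0, negative_area_um2=70.0))
```

Only glands inside the area marked by the pathologist would be passed to such functions, which is why the marking step matters so much for the result.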
To investigate possible measurement shifts over time, time trend and regression analysis were performed. Tables were made to compare the expert, original, and the morphometrical model grades and the Spearman statistic was calculated. Descriptive statistics (mean, standard deviation, minimum, maximum) and two and three dimensional scatter plots were made of the quantitative features for the different grades. Tests for normality were carried out and, because all the features showed a non-normal distribution, the non-parametric Mann-Whitney test was used to compare the means of the different grades. These statistical analyses were performed using SPSS 8.0 (SPSS Inc, Chicago, Illinois, USA).
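For readers unfamiliar with the Mann-Whitney test, the statistic on which it is based can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function name and the example values are hypothetical, and no P value is computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: the number of (a, b) pairs in which a > b,
    with tied pairs counting as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical feature values for two grade groups, for illustration only.
group_low = [1.0, 2.0, 3.0]
group_high = [4.0, 5.0, 6.0]

u_low = mann_whitney_u(group_low, group_high)
u_high = mann_whitney_u(group_high, group_low)

# The two U values always sum to the number of pairs, len(a) * len(b).
print(u_low, u_high, u_low + u_high)
```

Because the statistic depends only on the ordering of the values, it makes no assumption of normality, which is the reason the test suits features with a non-normal distribution.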
RESULTS
The general pathologists graded many more cases as dysplastic than did the experts. Complete agreement between the grades given by the experts and the original grades was seen in only 50 of 143 biopsies (35%) (table 2). Sixty four of the 71 original LGDs were downgraded by the experts, in addition to 11 of the 23 original HGDs. There were two causes for these overly high grades given by the general surgical pathologists, namely: (1) diagnostic pitfalls/less adequate sections in 50 of the 93 downgraded biopsies and (2) differences in the interpretation of adequate sections without clear diagnostic pitfalls in the other 43 biopsies.
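The proportions quoted in this paragraph can be checked with simple arithmetic. The sketch below uses only counts stated in the text; the helper name `percent` is introduced here purely for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts taken from the text above.
print(round(percent(50, 143)))    # complete agreement, quoted as 35%
print(round(percent(64, 71), 1))  # original LGDs downgraded by the experts
print(round(percent(11, 23), 1))  # original HGDs downgraded by the experts
print(round(percent(43, 93)))     # downgrades attributable to interpretation

# The two causes account for all 93 downgraded biopsies.
print(50 + 43)
```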
With regard to the first cause, in 79 of all biopsies and in 50 of the 93 cases that were downgraded, at review certain characteristics (tangential cutting, severe inflammation, ulcer, among others) were seen in the biopsy (or in the part marked as diagnostic), which probably contributed to the original grade being too high. Table 3 gives a summary of the experts’ comments about these special characteristics in relation to the type of downgrading. Most of these aberrantly high gradings could easily have been prevented. For example, in 16 samples that were downgraded, the area marked by the pathologists consisted exclusively of glands immediately adjacent to the squamous epithelium (squamo–columnar junction; SCJ). These glands, when normal, often have a higher mitotic activity and larger “immature” nuclei than glands at some distance from the SCJ. As a result, they can be overdiagnosed as LGD if this “contextual geographical information” about the squamous epithelium nearby is not taken into account. For morphometrical analysis, glands should be selected at some distance from the SCJ (as indicated in table 1). Other pitfall causes could have been corrected (for example, tangentially cut sections (fig 1); deeper cuts from the paraffin wax blocks in such biopsies may reveal the deeper crypts more clearly). Severe inflammation and ulcer were present in 15 and 10 additional cases, respectively, which led the expert to grade the biopsy as ND, LGD, or indefinite for dysplasia, whereas the general surgical pathologists graded most of these cases as LGD, and two were even graded as HGD. The current morphometrical model from Sandick et al does not compensate for this and may result in the calculation of falsely positive, aberrantly high dysplasia grades. As a result, in these and other incorrectly marked biopsies and areas, considerable overlap in the morphometrical features was found. 
However, it is important that in spite of the inappropriate use of the morphometrical method in these biopsy areas, HGDs could still be discriminated from NDs and LGDs in all of the 143 biopsies (fig 2).
The second important cause of overdiagnosis (43 of the 93 downgrades) was differences in the interpretation of the microscopical image in adequate sections. Technically, there was no special characteristic in these sections that could explain the aberrantly high original grade. Here, morphometrical grading support is useful because in these 64 adequate sections morphometrical grades downgraded 20 of 39 (51%) and upgraded one of 39 (3%) of the original LGDs but none of the HGDs was downgraded (table 4), in agreement with the experts’ grades. Figure 3 illustrates the classification of the original grades in relation to the morphometrical features in the 64 adequate biopsies, together with the decision lines (according to Sandick et al) between the three grade categories. Note the much better discrimination in this set compared with fig 2. Cases of LGD are more likely to be downgraded by morphometrical grading (compare fig 2 and table 4). Note also that the LGD that is classified by morphometry as HGD (in agreement with the experts, arrow in fig 3) is close to the border (decision line) of LGD and HGD but still clearly HGD with morphometry. Thus, the lesion looks like LGD and an underdiagnosis by subjective analysis in this case is understandable. However, measurements can reveal the subtle quantitative differences that characterise the case as HGD. Two and three dimensional scatter plots of the SI, Ki67 area%, and MNA allow this case to be analysed further and show that it does belong to the HGD grade subset, as indicated by the morphometrical grading model (fig 4) and confirmed by the experts’ review. Table 5 compares the experts’ and morphometrical grades in the adequate sections. The overall agreement is 75%, and all NDs and HGDs are correctly classified. The experts downgraded several morphometrical LGD cases, but not as many as the original subjective LGD grades. 
It is interesting that most of these downgraded morphometrical LGDs were close to the decision line between ND and LGD.
DISCUSSION
The agreement between the experts’ review and the original grades in Barrett’s oesophagus surveillance biopsies was low. This is not surprising because similar results have been obtained in many other areas.8 Such differences in grade could have resulted in considerable differences in follow up and treatment, which are often determined by the Barrett’s oesophagus dysplasia grade. Two previous publications have shown that in dysplasia grading of Barrett’s oesophagus, computerised morphometrical and immunocytometrical analysis are useful to obtain a more objective grade, both in oesophagus resection specimens and surveillance biopsies.6,13 The retrospective study on surveillance biopsies13 also concluded that morphometrical grading seems feasible in adequate Barrett’s oesophagus biopsies. However, in the routine daily practice of a general pathology laboratory many factors may disturb the “ideal” situation of a research project, such as small variations in tissue processing, sectioning, selection of areas for analysis, and also the measurement itself. Thus, the most important question was how the quantitative methods would perform in a truly prospective environment—the “real world”—when general surgical pathologists would do the selection of diagnostic areas for analysis and four independent technicians (who had not been involved in the original research studies) would perform the measurements according to the morphometrical protocol guidelines.
The results make it clear that the routine use of morphometrical grading of Barrett’s oesophagus biopsies is not only feasible but also practically useful. In agreement with the experts, morphometrical grading downgraded many original LGDs and HGDs. The sections were often suboptimal. In spite of this, morphometry recognised all the cases that were classified as HGD by the experts. Recognition of NDs and LGDs by morphometry also works reasonably well, but here it is more important that the measurement protocol and histological inclusion/exclusion criteria are carefully followed. Unfortunately, this was not the case in many of the 143 biopsies. In several biopsies offered by the pathologists for morphometrical grading one or more exclusion criteria were found at review. It is probable that the overdiagnosis by the general surgical pathologists in 54% of the biopsies was caused in part by these diagnostic pitfalls and many could have been solved or prevented beforehand—for example, demarcation exclusively of glands in direct contact with the squamous epithelium of the SCJ. In such cases, grading and morphometry should be done in glands that are not in direct contact with the squamous epithelium, because morphometrical analysis in the glands immediately neighbouring the squamous epithelium can give too high a grade. This is not a pitfall restricted to morphometry; it also holds for the grading of Barrett’s oesophagus biopsies in general. However, pathologists can include this contextual information, as they should, whereas the morphometrical grading system has not been developed for this (although it is technically possible to do this). Tangential cutting was another problem that could have been prevented or solved before the grade was assessed (fig 1); 16 tangentially cut biopsies were downgraded. 
Thus, in 68 of the biopsies, the technical problems were the result of inaccuracies that could easily have been prevented, but in the other 25 (17%) of the 143 biopsies, morphometrical grading should not have been performed because of technical limitations. Sandick et al found that in 39% of their sections morphometrical grading was not possible, although that study was a retrospective one on biopsies from many years ago.
Take home messages
General surgical pathologists overgraded a high proportion of Barrett’s oesophagus biopsies
Thus, expert gastrointestinal pathologists downgraded many of the biopsies routinely graded as low grade dysplasia (LGD) and high grade dysplasia (HGD)
The cause of overdiagnosis was technical in nature in some cases (which could have been avoided or corrected), but in other cases it was the result of differences in interpretation
The results of morphometrical grading were close to those of the experts and corrected a large number of cases erroneously graded by surgical pathologists
Therefore, morphometrical grading could be used as an adjunct to daily quality control
The question of course is: what is the value of a laboratory test that excludes 17% of all biopsies? We have considerable experience in Barrett’s oesophagus biopsy grading but still feel that morphometrical support is a valuable adjunct that regularly helps in coming to a definitive diagnosis (in agreement with Polkowski and colleagues6). It is unacceptable that patient treatment should be dependent on a subjective grading method alone, which shows considerable lack of observer reproducibility.
Figures 3–4 illustrate how morphometrical analysis can be used routinely in a diagnostic practice as a quality control method. The case that differs between the original and morphometrical grading (LGD by original and HGD by morphometrical grade, the latter in agreement with the experts’ grade), is not only at one side of the diagnostic decision line of fig 3, but is in the HGD cluster at three dimensional scaling and also has an MNA value that is in the HGD range (fig 4). Such illustrations can be used in daily routine surgical practice for visual quality control of the dysplasia grade.
Alternative approaches for a better definition of LGD and HGD have been investigated.13 The tumour suppressor genes p53, p16, and APC are often deleted in adenocarcinomas derived from Barrett’s oesophagus, and the homozygous deletion of the p16 gene is often seen. Mutations of these genes are also found in these adenocarcinomas.14 However, the absence of easily detectable and sensitive genetic alterations in normal Barrett’s oesophagus and the low grade dysplastic epithelia suggests that mutations in these genes develop later in the progression from Barrett’s oesophagus to adenocarcinoma. Moreover, in our experience p16 immunohistochemical staining results are highly variable. Compared with these and other supportive methods, such as DNA and p53 flow cytometry, the advantage of the current morphometrical methods is that standard H&E and Ki-67 stained histological tissue sections can be used. Because these are available in all pathology laboratories, the methods are not only feasible but also easily applicable. Moreover, the application of morphometrical methods reduces patient discomfort considerably and is relatively inexpensive, certainly when compared with the health care costs of unnecessary follow up, endoscopy, and eventual oesophagus resection. The technician time required for morphometrical analysis is limited (approximately 25 minutes for each case if the sections are adequate). The total cost of one determination in the Netherlands, including working time, capital depreciation, etc, is approximately €50 (not including the marginal costs for a Ki-67 section and the pathologist’s time). Moreover, the measurement techniques are user friendly. The most important factor in the daily application of morphometrical grading is keeping to the biopsy inclusion/exclusion criteria protocol, which is the responsibility of the pathologist.
These results allow us to conclude that morphometrical grading of Barrett’s oesophagus biopsies can be used routinely as an adjuvant diagnostic support method.
This study was made possible in part by grant 02-203 of the Stichting Bevordering Diagnostische Morfometrie. We thank the many pathologists who have diagnosed the Barrett’s oesophagus biopsies and all the technicians who did the measurements.