Background/Aims—Grading of Helicobacter pylori induced atrophic gastritis using the updated Sydney system is severely limited by high interobserver variability. The aim of this study was to develop a quantitative test of gastric corpus mucosal atrophy in tissue sections and to assess its reproducibility and its correlation with the Sydney scores of atrophy.
Method—Mucosal atrophy was assessed in 124 haematoxylin and eosin stained corpus biopsy specimens by two experienced gastrointestinal pathologists (EB, JL) according to the updated Sydney system as none (n = 33), mild (n = 33), moderate (n = 33), or pronounced (n = 25). In each specimen, the proportions of glands, stroma, infiltrate, and intestinal metaplasia in the glandular zone were measured as volume percentages using a point counting method. The optimal point sample size, intra-observer and interobserver reproducibility, discriminative power for degrees of atrophy, and correlations with H pylori status were evaluated.
Results—Counting 400 points in 200 fields of vision was the smallest sample size that still gave excellent intra-observer and interobserver reproducibility (r ≥ 0.96). Overall, the volume percentages of glands (VPGL), infiltrate (VPI), and stroma (VPS) correlated well with the Sydney scores for atrophy (p ≤ 0.003). However, no significant differences were found between non-atrophic mucosa and mild atrophy. Age correlated with neither the Sydney grade of atrophy nor the VPGL or VPS. In non-atrophic mucosa and mild atrophy, H pylori positive cases showed a significantly higher VPI than H pylori negative cases. In the mild atrophy group, VPGL was significantly lower in H pylori positive than in H pylori negative cases. VPS did not correlate with H pylori status within any grade of atrophy.
Conclusion—Point counting is a powerful and reproducible tool for the quantitative analysis of corpus mucosal atrophy in tissue sections. When the updated Sydney system is used, these data favour combining the “none” and “mild” categories of atrophy into one, resulting in a three class grading system for corpus atrophy.
- gastric atrophy
- Sydney classification
- point counting
It has generally been accepted that the intestinal type of gastric cancer arises from a sequence of biological events reflected by a spectrum of histological changes,1 starting with chronic inflammation of the gastric mucosa, usually caused by colonisation with Helicobacter pylori. This can lead to mucosal gland loss or atrophic gastritis, which might precede the development of intestinal metaplasia, dysplasia, and cancer.
For a better understanding of this process and an adequate diagnosis in individual patients, various classification systems of gastritis have been used.2–5 In an attempt at international standardisation, these led to the introduction of the Sydney system in 1991. This system defined atrophy as “loss of specialised glands from either the antrum or corpus” and graded it on a four grade scale (no atrophy, mild, moderate, and pronounced atrophy),6 instead of the three grade scale that had been proposed previously.4 In response to criticism, including concerns about the reproducibility of atrophy scores, an updated Sydney system was devised,7 which included a visual analogue scale for the grading of inflammation, atrophy, and metaplasia. However, even with these improvements, the system still suffers from rather high interobserver variability.8
Therefore, the aim of our study was to design a rapid, reproducible, quantitative method for the assessment of gastric atrophy in tissue sections of corpus biopsies.
Materials and methods
Patients referred for upper gastrointestinal (GI) endoscopy were examined with Olympus GIF100 and 1T100 endoscopes. We aimed to acquire groups of approximately equal size for each category (no atrophy, mild, moderate, and pronounced atrophy). During upper GI endoscopy, four biopsies were taken from the corpus and two from the antrum with standard biopsy forceps. Because grading of atrophy according to the Sydney system is much more reproducible in corpus biopsy specimens than in antral specimens, only corpus biopsy specimens were measured. After standard tissue processing, including fixation in formaldehyde, 4 μm thick tissue sections were cut and stained with haematoxylin and eosin (H&E). Two experienced GI pathologists then scored the slides for degrees of inflammation, atrophy, intestinal metaplasia, and H pylori density according to the updated Sydney system.7 In cases of disagreement, the pathologists discussed the case to reach a consensus.
We obtained biopsy specimens from 99 consecutive patients with non-atrophic (n = 33), mild (n = 33), and moderately (n = 33) atrophic corpus mucosa. Subsequently, we obtained 25 samples from patients with pronounced atrophy. In total, 124 cases (71 men, 53 women; mean age, 55 years; SD, 14 years) were studied. Four patients in the pronounced atrophy group were known to have pernicious anaemia. In addition, six other patients showed pronounced atrophy of the corpus but not of the antrum, which also suggested pernicious anaemia. However, this diagnosis was not confirmed by the presence of antiparietal cell antibodies.
Volume percentages of glands (VPGL), stroma (VPS), infiltrate (VPI), and intestinal metaplasia (VPIM) were measured using a point counting technique.9 For this purpose, we used the stereology module of an interactive video overlay microscopic measuring system (QPRODIT, software version 6.1; Leica, Cambridge, UK) equipped with an automated motorised scanning stage. The microscopic image, obtained with a ×40 objective (numerical aperture, 0.75), was recorded by a CCD camera mounted on top of a standard microscope and displayed in full colour on the computer monitor. A two point Weibel test grid was superimposed on the image as an overlay.10,11 The point distance of the test grid was 54.7 μm.
First, the most atrophic region was selected from the four biopsy specimens in every case. In this region, the glandular area of interest was then demarcated on the slide with a marker, avoiding areas that were completely tangentially cut or contained artefacts. The optimal number of fields of vision (FOVs) to count was determined beforehand in a pilot study, as described below. In subsequent experiments, this predetermined number of FOVs was selected by the computer from the demarcated area of interest using systematic random sampling.
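The text does not describe how the QPRODIT software implements systematic random sampling. As a minimal sketch of the general technique, assuming the demarcated area has already been tiled into an ordered list of candidate FOV positions (the raster of stage coordinates below is hypothetical), one picks a uniformly random start within the first sampling interval and then takes every interval-th position:

```python
import random
from typing import List, Optional, Sequence, Tuple

Position = Tuple[int, int]  # hypothetical (x, y) stage coordinates of one FOV


def systematic_random_sample(
    candidates: Sequence[Position], n: int, rng: Optional[random.Random] = None
) -> List[Position]:
    """Pick n FOVs at a fixed interval after a single uniformly random start."""
    if not 0 < n <= len(candidates):
        raise ValueError("n must be between 1 and the number of candidate FOVs")
    rng = rng if rng is not None else random.Random()
    step = len(candidates) / n      # sampling interval; may be fractional
    start = rng.random() * step     # random start in [0, step)
    last = len(candidates) - 1
    # min() guards against floating-point rounding at the very end of the list.
    return [candidates[min(int(start + i * step), last)] for i in range(n)]


# Usage: a hypothetical 30 x 20 raster of FOVs covering the demarcated area,
# from which 200 FOVs are drawn, matching the number used in the study.
raster = [(x, y) for y in range(20) for x in range(30)]
fovs = systematic_random_sample(raster, 200, random.Random(1))
print(len(fovs), fovs[:3])
```

Because a single random start is followed by evenly spaced picks, the sampled FOVs are spread over the whole demarcated region, and when the interval is a whole number every candidate position has the same chance of being included.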
Next, all FOVs were screened consecutively with the overlying two point test grid.11 For each point of the test grid, the observer recorded whether it hit normal glandular epithelium or gland lumen; stroma; inflammatory cells (lymphocytes, plasma cells, or granulocytes) in either the lamina propria or the glands; or intestinal metaplastic glands (recognised by the presence of goblet cells, brush border, or Paneth cells). After the counts in all FOVs had been completed, the volume percentage of each tissue component (VPGL, VPS, VPI, and VPIM) was determined.
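The conversion from point tallies to volume percentages rests on the classical stereological principle that the fraction of test points hitting a component estimates that component's volume fraction. A minimal sketch, assuming each grid point has already been assigned to one of the four categories described above (the label strings are hypothetical, and glandular epithelium and gland lumen are pooled as "glands", as in the text):

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical labels for the four scored categories.
COMPONENTS = ("glands", "stroma", "infiltrate", "intestinal_metaplasia")


def volume_percentages(hits: Iterable[str]) -> Dict[str, float]:
    """Volume percentage of each component = 100 * points on it / all points."""
    counts = Counter(hits)
    unknown = set(counts) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no grid points were scored")
    # Counter returns 0 for components that were never hit.
    return {c: 100.0 * counts[c] / total for c in COMPONENTS}


# Usage with made-up tallies for one case (400 points from 200 FOVs);
# these numbers are illustrative, not data from the study.
hits = ["glands"] * 280 + ["stroma"] * 105 + ["infiltrate"] * 15
print(volume_percentages(hits))
```

Because every point is assigned to exactly one category, the four percentages always sum to 100%, so a loss of glands necessarily appears as a gain in one or more of the other components.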
The quantitative characteristics (VPGL, VPS, VPI, and VPIM) were correlated with the Sydney scores of atrophy, age, and H pylori status. For statistical analyses, Student's t test and the Mann-Whitney U test were used as appropriate. A two sided p value < 0.05 was considered significant. Reproducibility was evaluated with scatter plots and Pearson's correlation coefficient (r).
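Reproducibility was summarised with Pearson's correlation coefficient. For reference, a dependency-free sketch of the coefficient applied to paired measurements of the same slides (the VPGL values below are invented for illustration; in practice a statistics package would normally be used):

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient of paired values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Usage: hypothetical VPGL (%) of five slides measured twice by one observer.
first = [72.0, 68.5, 54.0, 41.0, 30.5]
second = [71.0, 69.5, 52.5, 43.0, 29.0]
print(round(pearson_r(first, second), 3))
```

Note that r measures linear association rather than agreement: two series differing by a constant offset (for example, one observer consistently scoring 5 percentage points higher) still give r = 1, so a high r alone does not exclude systematic bias between measurements.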
Results
In a pilot study, the optimal sample size was determined. Seven slides were each measured four or five times, with 100, 200, 400, 600, or 800 FOVs counted per measurement. The VPGL remained constant when at least 400 grid points were counted in 200 FOVs (fig 1). After the area with the highest degree of atrophy had been demarcated, measurement of 400 points per case took on average 10 minutes. Twenty cases, evenly representing all four degrees of atrophy according to the updated Sydney classification (0–3), were measured twice by observer 1 (NvG) and once by observer 2 (MW), counting 200 FOVs for each specimen. Intra-observer and interobserver reproducibility of VPGL, VPS, VPI, and VPIM was high (r ≥ 0.96; fig 2A–D).
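The pilot study fixed the sample size empirically, by observing when VPGL stopped changing. As a rough, idealised complement (an assumption of this sketch, not the authors' method), treating each grid point as an independent trial gives a binomial standard error for a point-counted volume percentage. Points within the same FOV are not truly independent, so real errors may be somewhat larger:

```python
import math


def point_count_se(p_percent: float, n_points: int) -> float:
    """Binomial standard error, in percentage points, of a point-counted volume percentage.

    Idealised: assumes every test point is an independent Bernoulli trial.
    """
    if not 0.0 <= p_percent <= 100.0 or n_points < 1:
        raise ValueError("p_percent must be in [0, 100] and n_points >= 1")
    p = p_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_points)


# Usage: precision of a VPGL near 70% for increasing numbers of counted points.
for n in (100, 200, 400, 800):
    print(n, round(point_count_se(70.0, n), 2))
```

Under this idealisation the standard error for a VPGL near 70% falls from about 4.6 percentage points at 100 points to about 2.3 at 400 points and 1.6 at 800 points. Because it scales with 1/√n, halving it requires four times as many points, which illustrates why gains in precision flatten beyond a few hundred points.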
Volume percentages of the respective tissue components in 124 cases were correlated with the Sydney scores (fig 3A–D). The mean volume percentage of glands (VPGL) decreased with increasing Sydney scores for atrophy from 71.9% (range, 55.1–81.7%) in non-atrophic mucosa to 68.7% (range, 41.9–81.7%), 53.3% (range, 31.5–74.5%), and 30.5% (range, 7.9–53.3%) in mild, moderate, and pronounced atrophy, respectively. The VPI showed an increase from 2.7% (range, 0.4–6.4%) in non-atrophic mucosa to 3.4% (range, 0.5–11.0%), 8.2% (range, 1.6–23.3%), and 11.8% (range, 4.8–23.1%) in mild, moderate, and pronounced atrophy, respectively. At the same time, the VPS increased from 25.4% (range, 16.5–40.7%) in non-atrophic mucosa to 27.9% (range, 16.2–47.1%), 38.4% (range, 23.4–53.0%), and 51.7% (range, 26.4–64.0%) in mild, moderate, and pronounced atrophy, respectively. Intestinal metaplasia (IM) was not present in non-atrophic or mildly atrophic mucosa and was found only once in moderate atrophy (VPIM, 3.1%), whereas IM occurred in 48% of cases with pronounced atrophy (mean VPIM, 5.5%; range, 0–53.3%). p Values for differences in VPGL, VPS, and VPI between atrophy grades 0 and 1 were not significant, but for grades 1 and 2 and grades 2 and 3, differences were significant, with p ≤ 0.003 (Student's t test) for all three parameters (fig 3A–D). VPIM was significantly higher in pronounced atrophy compared with moderate atrophy (p = 0.01, Student's t test).
Within the pronounced atrophy group, cases in which IM was present showed a lower mean VPGL than cases without IM (25.3% v 33.7%, respectively), but this difference did not reach significance (p = 0.1, Mann-Whitney U test). No significant difference was found between these groups with respect to VPS (51.1% v 54.1%, respectively; p = 0.4). Figure 4 shows the mean cumulative percentages of glands, stroma, infiltrate, and intestinal metaplasia for each Sydney grade of atrophy.
No correlation was found between VPGL and age (r = 0.007, p = 0.9).
After stratification of all groups into H pylori positive and negative cases (non-atrophic mucosa, n = 18 v n = 15; mild atrophy, n = 17 v n = 16; moderate atrophy, n = 23 v n = 10; pronounced atrophy, n = 11 v n = 14), mean VPGL was significantly lower in H pylori positive than in H pylori negative cases in the mild atrophy group (66% v 72%, p = 0.004). For non-atrophic mucosa and for moderate and pronounced atrophy, no significant differences were found (70% v 74%, p = 0.08; 52% v 57%, p = 0.3; 29% v 31%, p = 0.7, respectively; Mann-Whitney U test; fig 5A; table 1). In every atrophy group, the mean VPI was higher in H pylori positive than in H pylori negative cases, although the difference was significant only in non-atrophic mucosa and mild atrophy (3.4% v 1.9%, p = 0.002; 4.6% v 2.1%, p = 0.001; 9.2% v 5.8%, p = 0.06; 12.6% v 11.3%, p = 0.4, for non-atrophic mucosa and mild, moderate, and pronounced atrophy, respectively; Mann-Whitney U test; fig 5B). VPS did not differ significantly with respect to H pylori status in any category, nor did VPIM differ between H pylori positive and negative cases in the pronounced atrophy group.
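The H pylori comparisons above used the Mann-Whitney U test. To show what the test computes, here is a minimal sketch: U counts, over all pairs of observations from the two groups, how often a value in the first group exceeds one in the second (tied pairs count one half), and the p value uses the plain large-sample normal approximation without tie or continuity correction. That simplification is an assumption of this sketch; for real analyses an established routine such as scipy.stats.mannwhitneyu, which offers exact and corrected methods, is preferable.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p from the normal approximation)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    # U counts pairs (a, b) with a > b; tied pairs contribute 0.5.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Usage with invented VPI values (%) for H pylori positive and negative cases.
hp_pos = [3.1, 4.0, 2.8, 5.2, 3.6, 4.4]
hp_neg = [1.5, 2.2, 1.9, 2.6, 1.4]
print(mann_whitney_u(hp_pos, hp_neg))
```

Because the test uses only the ordering of values, it does not assume normally distributed volume percentages, which suits skewed components such as VPI.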
Discussion
The most recent approach to grading the histopathological features of gastritis is the updated Sydney system, which includes visual analogue scales for scoring inflammation, H pylori, atrophy, and metaplasia.7 This system yields good interobserver agreement for H pylori density, polymorphonuclear neutrophil activity, density of mononuclear cells, and the amount of intestinal metaplasia. However, the assessment of glandular atrophy has remained difficult, with poor interobserver agreement.8,12 This lack of agreement is caused primarily by difficulties in scoring gland loss in the presence of mucosal inflammation.
The problems of subjective classification have long been recognised and have led to several approaches for the quantitative assessment of gastric atrophy in tissue sections of gastric biopsy specimens. One early approach, the measurement of gastric mucosal thickness,13–15 yielded no information on the composition of the gastric mucosa and was never tested for its power to discriminate between Sydney grades of atrophy. Furthermore, this method requires well oriented sections cut strictly perpendicular to the mucosal surface, which makes it difficult to apply in the routine assessment of gastric mucosal atrophy. Separate measurements of the volume densities of parietal cells, chief cells, and mucous neck cells, and of their ratios, have also been used,16,17 but this method may require additional staining of different epithelial cells and has also not been tested for a correlation with Sydney scores.18 Syntactic structure analysis, a technique that measures the arrangement of different components in tissues, has also been used.19 With this approach, significant differences were found between atrophy grades I and III, but the other Sydney grades could not be differentiated. Some of these techniques are complicated, others are tedious to perform, and none has led to routine or more widespread research application. In our study, we used point counting to develop a reproducible, quantitative, and objective system for grading atrophy of the gastric mucosa across all Sydney scores. Point counting has proved to be a reliable method for measuring volume percentages of tissue components in several organs.10,20,21 Measuring the volume percentages of tissue components other than glandular epithelium, such as stroma and inflammatory infiltrate, has the advantage of showing how the loss of one component (such as epithelium) is accompanied by its replacement by others.
For the purpose of this correlation study, the system was set up for corpus biopsy specimens only, because grading of corpus atrophy is less ambiguous than that of antral atrophy.
The quantitative measurements were correlated with a “gold standard” of Sydney scores set by two experienced GI pathologists. The differences in VPGL, VPS, and VPI between the non-atrophy and mild atrophy groups were negligible and not significant, but VPGL, VPS, and VPI did show strong discriminative power for the other degrees of atrophy (fig 3). These results suggest a three class scale for grading corpus atrophy, distinguishing non-atrophic mucosa (Sydney grades 0 and 1), low grade atrophy (Sydney grade 2), and high grade atrophy (Sydney grade 3), rather than the current four class grading system. Based on our results, VPGL values of 44% and 63% would be the best cut off values to distinguish the three groups, because they give the highest sensitivity and specificity for discriminating between non-atrophic mucosa, low grade atrophy, and high grade atrophy (fig 6). A new classification based on these cut off values would, of course, still need to be correlated with clinical follow up.
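The proposed three class scale can be expressed as a simple rule on VPGL. The sketch below applies the 44% and 63% cut offs reported above; how values lying exactly on a cut off are assigned is not stated in the text, so the boundary handling here is an assumption. The text also does not say how "highest sensitivity and specificity" was operationalised; the companion function uses Youden's index (sensitivity + specificity − 1), one common criterion, purely as an illustration.

```python
from typing import Sequence, Tuple

# VPGL cut offs (%) reported in the text; boundary handling is assumed.
HIGH_CUTOFF = 63.0  # at or above: non-atrophic mucosa (Sydney grades 0 and 1)
LOW_CUTOFF = 44.0   # at or above, but below HIGH_CUTOFF: low grade (Sydney grade 2)


def grade_corpus_atrophy(vpgl: float) -> str:
    """Map a volume percentage of glands to the proposed three class scale."""
    if vpgl >= HIGH_CUTOFF:
        return "non-atrophic"
    if vpgl >= LOW_CUTOFF:
        return "low grade"
    return "high grade"  # below LOW_CUTOFF: Sydney grade 3


def youden_cutoff(values: Sequence[float], is_case: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    A sample is called positive (more atrophic) when its value is <= cutoff,
    because atrophy lowers VPGL.
    """
    if len(values) != len(is_case):
        raise ValueError("values and is_case must have equal length")
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    best_t, best_j = float("nan"), -1.0
    for t in sorted(set(values)):
        tp = sum(1 for v, c in zip(values, is_case) if c and v <= t)
        tn = sum(1 for v, c in zip(values, is_case) if not c and v > t)
        j = tp / n_case + tn / n_ctrl - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Usage: the mean VPGL of each Sydney group reported above.
for vpgl in (71.9, 68.7, 53.3, 30.5):
    print(vpgl, grade_corpus_atrophy(vpgl))
```

Applied to the group means reported above, the rule places the no atrophy and mild atrophy groups in the non-atrophic class, moderate atrophy in the low grade class, and pronounced atrophy in the high grade class.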
Intestinal metaplasia did not occur in the groups with no or mild atrophy, but was present in 48% of patients with pronounced atrophy and in one patient with moderate atrophy. This is consistent with the hypothesis that intestinal metaplasia is a more advanced stage than atrophy in the histological sequence that leads to gastric cancer,1 and with previous cohort studies showing that gland loss and metaplasia occur as consecutive steps.22,23 In our Dutch study population, atrophy was almost invariably high grade when intestinal metaplasia was present.
We also evaluated the influence of H pylori on the density of the different tissue components. Helicobacter pylori positive and negative cases were present in all atrophy groups. Infected patients had higher numbers of inflammatory cells than non-infected patients, which is not surprising because H pylori causes inflammation in almost all colonised patients. It has been suggested previously that, in the presence of H pylori infection, inflammatory cells and oedema separating the glands could create a false impression of atrophy.24 Our data show that inflammatory infiltrate occupies only a limited proportion of the tissue, whereas the increase in stroma was much larger in every category of atrophy (table 1; fig 4). However, our present approach cannot reliably distinguish oedema from stroma, so inflammation-associated oedema may contribute to a higher VPS. Studying the effect of H pylori eradication on VPS and the other parameters may answer this question; such studies are presently ongoing.
In summary, point counting is a powerful tool for the assessment of body type gastric mucosal atrophy, irrespective of H pylori status, and it meets the requirements of a reliable test: the results are highly reproducible and correlate well with the Sydney scores, and the test is easy and rapid to perform.
This work was funded by Glaxo Wellcome BV, the Netherlands; Glaxo Wellcome Ltd, UK; and the Netherlands Foundation of Gastrointestinal and Liver Diseases.