Background and Aims Coeliac disease (CD) diagnosis requires the detection of characteristic histological alterations of small bowel mucosa, which are prone to interobserver variability. This study evaluated the agreement in biopsy interpretation between different pathology practice types.
Methods Biopsies from community hospitals (n=46), university hospitals (n=18) and commercial laboratories (n=38) were blindly reassessed by a pathologist at our institution. Differences in histopathology reporting were recorded, and agreement in the diagnosis of CD and in the degree of villous atrophy (VA) was assessed by κ analysis.
Results Agreement for primary diagnosis was very good between this institution and university hospitals (κ=0.888), but moderate compared with community hospitals (κ=0.465) or commercial laboratories (κ=0.419). Diagnosis differed in 26 (25%) cases, leading to a 20% increase in CD diagnosis after review. Among those diagnosed with CD by both institutions (n=49), agreement in degree of VA was fair (κ=0.292), with moderate agreement between the authors and commercial laboratories (κ=0.500) and fair agreement with university hospitals (κ=0.290) or community hospitals (κ=0.211). The degree of VA was upgraded in 27% and downgraded in 2%. Within different Marsh score categories, agreement was poor (κ≤0.0316) for scores 1 and 2, both missed at other centres, and fair or moderate for scores 3a and 3b. Information regarding degree of VA and intraepithelial lymphocytosis was lacking in 26% and 86% of reports, respectively, and non-quantifiable descriptors, eg, ‘blunting’ or ‘marked atrophy’, were prevalent.
Conclusions CD-related histological changes are underdiagnosed in community-based hospitals and commercial pathology laboratories. Because incorrect biopsy interpretation can cause underdiagnosis of CD, greater CD awareness and uniformity in small bowel biopsy reporting are required among pathologists.
- Coeliac disease
- interobserver agreement
- lymph node pathology
- Marsh score
- small bowel biopsy
- small intestine
- villous atrophy
Coeliac disease (CD) is a multisystem disorder characterised by increased intraepithelial lymphocytes (IEL), crypt hyperplasia and villous atrophy (VA) of the small bowel mucosa.1 Biopsy findings are the current gold standard for the diagnosis of CD;2 therefore, it is essential that biopsy interpretation be accurate and reproducible among pathologists in different countries and across all types of practice settings.
The Marsh scoring system, which was originally developed as a morphometry-based investigative approach to study small bowel mucosal abnormalities in a variety of diseases, was widely adopted for the semiquantitative evaluation of histological changes in CD.3 It proved useful in standardising criteria for the diagnosis of CD and in monitoring healing of the small intestinal lesion. A major feature of this classification was the requirement for description of the degree of VA, IEL infiltration and crypt hyperplasia. In 1999, Oberhuber et al4 published a modification of the Marsh classification to describe small bowel histopathological alterations in CD patients. This was a first step in simplifying the Marsh scoring system and encouraged its use in routine small bowel biopsy pathology reporting.
While the modified Marsh–Oberhuber classification system has been used in most published studies, the extent of its use in routine clinical practice is unknown. Only a few studies have assessed interobserver agreement in the interpretation of small bowel biopsies for diagnosing CD,5–7 and none in the USA. In addition, it is estimated that the vast majority (>90%) of patients with CD in the USA currently remain undiagnosed.8 9 An inadequate number of biopsies taken at endoscopy10 and a general lack of physician awareness11 contribute to the low diagnosis rate; however, another possible explanation could be the failure of pathologists to recognise the histological features of CD. The aim of this study was to evaluate the degree of agreement in biopsy interpretation and the variability in biopsy reporting between different pathology settings, in order to determine their impact on CD diagnosis.
Material and methods
After approval by our Institutional Review Board, all patients who had undergone upper endoscopy and small bowel biopsy at another institution for suspected CD, before being seen at our centre, between October 2009 and December 2009, were considered for the study.
Pathology practice setting classification
For the purpose of this study, the originating or referring pathology settings were classified according to the type of medical practice, as follows: (1) commercial laboratories: certified pathology laboratories that typically provide consultation pathology services to office-based physicians or free-standing endoscopy facilities; (2) community hospitals: local or regional hospitals that provide specialised medical and diagnostic pathology services; and (3) university hospitals: academic medical centres affiliated with universities that provide specialised and subspecialised medical and diagnostic pathology services and allied laboratory tests.
Review of biopsy histopathology
Slides from the referring pathology practices, each with two or three serial sections of formalin-fixed, paraffin-embedded and H&E-stained small bowel biopsies, were collected and blindly reviewed by a pathologist with expertise in small bowel disorders (GB) at our tertiary referral centre. For each patient/case, we generated a report that included a primary diagnosis and detailed interpretation of the biopsy according to the Marsh–Oberhuber classification,4 with semiquantitative assessment of the different histological parameters: villous to crypt ratio, increase in IEL and degree of lamina propria inflammation, as well as any increase in subepithelial collagen, the presence or absence of Brunner's glands (figure 1), and information regarding the biopsy site(s), number and adequacy (orientation) of pieces. Data from the in-house and outside reports were then transferred by another investigator (CA-G) to a database and changes in diagnosis and in grade of VA, with direction of change (upgrade/downgrade), were recorded. The absence of histopathological information and types of reporting terminology were also annotated.
Interobserver agreement for the diagnosis of CD and for grade of VA, overall and according to the type of pathology setting, was determined using κ statistics. κ is an accepted coefficient for measuring agreement between observers that corrects for the agreement expected by chance.12 Although there is no established consensus on the interpretation of κ, some guidelines have been adopted by several authors and are widely used in practice. According to these guidelines, a κ coefficient between 0.81 and 1.00 is considered ‘very good agreement’; between 0.61 and 0.80, ‘good agreement’; between 0.41 and 0.60, ‘moderate agreement’; between 0.21 and 0.40, ‘fair agreement’; and less than 0.20, ‘poor agreement’. κ for grade of VA was computed only for cases agreed on by both in-house and referring pathologists as being consistent with CD, and partial κ values were calculated to analyse agreement between pathologists for each Marsh score category. All tests were two-tailed with a significance level of 5% and were performed using SPSS 18.0 statistical software, with MacKappa used for partial κ calculation.13
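For readers who want to reproduce κ values of this kind, a minimal Python sketch of unweighted Cohen's κ is given below, together with the descriptive bands quoted above. It assumes a square contingency table with the in-house pathologist's calls as rows and the referring pathologist's calls as columns; the function names are illustrative, and this is not the SPSS or MacKappa implementation used in the study.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Unweighted Cohen's kappa for a square k x k contingency table.

    Rows: rater A (e.g. in-house pathologist); columns: rater B
    (e.g. referring pathologist). Cell [i][j] counts cases rated i by A and j by B.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("contingency table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement expected from the row and column marginal totals.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the descriptive bands quoted in the text."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"
```

With a hypothetical 2×2 table of 100 cases (40 and 45 agreements, 5 and 10 disagreements), observed agreement is 0.85 and chance agreement 0.50, giving κ=0.70 (‘good’).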
Specimen and patient characteristics
One hundred and two small bowel biopsy slides, belonging to 98 adults (mean age 42±18.45 years), two adolescents and two children less than 12 years old, from community hospitals (n=46), university hospitals (n=18) and commercial laboratories (n=38), were collected and reviewed at our institution. Ninety-two samples were taken from the duodenum, two from the jejunum and eight from unspecified sites. The biopsy locations were the second part of the duodenum (n=69), bulb (n=3) and other (n=13), with no information provided for 17 cases. Overall, the average number of biopsy pieces per slide was 4.5 (range 1–14, mode 3); university hospitals, mean 5.4 (range 3–11, mode 4); community hospitals, mean 4.8 (range 1–14, mode 3); and commercial laboratories, mean 3.6 (range 1–10, mode 2). According to the in-house pathologist, 11 cases had no pieces oriented adequately for assessment of the crypt to villous ratio, and 43% had only one well-oriented piece.
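As a quick arithmetic check, the overall mean of 4.5 pieces per slide can be recovered from the three practice-type means by weighting each mean by its number of slides. The sketch below assumes the reported group means are exact rather than rounded.

```python
def pooled_mean(groups: list[tuple[int, float]]) -> float:
    """Weighted mean of group means, weighting each by its number of slides."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (number of slides, mean pieces per slide) for each practice type, as reported
groups = [
    (18, 5.4),  # university hospitals
    (46, 4.8),  # community hospitals
    (38, 3.6),  # commercial laboratories
]
overall = pooled_mean(groups)
print(f"{overall:.2f}")  # 4.46, which rounds to the reported 4.5
```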
Agreement in diagnosis of CD
The pathologists' diagnoses, based on biopsy interpretation, are shown in a contingency table (table 1). We found a significant difference in the interpretation of biopsies between our institution and other pathology practice settings (χ²=94.208, p<0.0001). Overall, agreement in the diagnosis of CD between our pathologist and the referring pathologists was moderate (κ=0.529, p<0.0001), with a change of diagnosis in 26 (25%) of the 102 cases reviewed. According to the referring pathologists, 55 (54%) cases were considered to have CD, seven (7%) to have non-specific duodenitis and 38 (37%) to be normal. After review by our pathologist, 66 (65%) cases were deemed to have findings compatible with CD and 29 (28%) were considered normal; five were suspected to have a disease other than CD (autoimmune enteritis, n=2; peptic duodenitis, n=1; drug-induced injury, n=1; and collagenous sprue, n=1, in which subepithelial fibrosis had not been reported by the referring pathologist), and one case was considered inconclusive because of inadequate tissue. After review, the number of CD diagnoses increased by 20% (from 55 to 66) and the number of normal cases decreased by 24% (from 38 to 29).
When agreement was analysed by type of pathology practice setting (table 2), it ranged from ‘very good’ with other university hospitals (κ=0.888) to ‘moderate’ with community hospitals and commercial laboratories (κ=0.465 and κ=0.419, respectively).
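The 20% and 24% figures are relative changes from the referring pathologists' counts, not differences in percentage points (the proportion of cases called CD rose from 54% to 65%, that is, by 11 points). A short sketch of the calculation, using only the counts given in the text:

```python
def relative_change(before: int, after: int) -> float:
    """Relative change (%) of a count with respect to its baseline value."""
    if before == 0:
        raise ValueError("baseline count must be non-zero")
    return 100.0 * (after - before) / before


cd_change = relative_change(55, 66)      # CD diagnoses: referring vs in-house
normal_change = relative_change(38, 29)  # cases called normal: referring vs in-house
print(f"CD: {cd_change:+.0f}%, normal: {normal_change:+.0f}%")
# CD: +20%, normal: -24%
```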
Agreement in grade of VA
We next analysed whether there were differences in interpretation of the grade of VA. For this, referral cases without a precise Marsh score reported (n=68), but with a severity grade mentioned, were ascribed a Marsh score (normal, Marsh 0; mild atrophy, Marsh 3a; moderate atrophy, Marsh 3b and severe atrophy, Marsh 3c). If severity could not be interpreted from the report content because of the use of imprecise or vague terms, then the Marsh score was considered ‘incomplete’ (n=9), and cases lacking any mention of atrophy (n=7) were considered as ‘non-reported’. The distribution of all Marsh scores after re-coding is shown in table 3. For the final analysis of agreement in the severity of VA, however, only cases diagnosed as CD by both pathologists (box insert, table 3) were included and cases with a Marsh score of 0 or with inconclusive biopsy findings were excluded.
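The re-coding rule can be expressed as a small lookup. In this sketch the helper name is hypothetical, and it assumes the severity term has already been extracted from the free-text report; any term outside the four-level scale (such as ‘blunting’ or ‘Marsh 3’) is treated as incomplete, and a missing term as non-reported.

```python
from typing import Optional

# Severity wording -> Marsh score, following the mapping described in the text.
SEVERITY_TO_MARSH = {
    "normal": "0",
    "mild": "3a",
    "moderate": "3b",
    "severe": "3c",
}


def recode_report(severity_term: Optional[str]) -> str:
    """Ascribe a Marsh score to a report that gives a severity term but no Marsh score.

    Hypothetical helper: the input is assumed to be the severity word already
    pulled out of the report text.
    """
    if severity_term is None or not severity_term.strip():
        return "non-reported"  # no mention of atrophy at all
    term = severity_term.strip().lower()
    # Vague or non-quantifiable wording cannot be mapped to a single grade.
    return SEVERITY_TO_MARSH.get(term, "incomplete")
```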
Overall, the level of agreement in the degree of VA among cases diagnosed as CD by both pathologists (n=49) was fair (κ=0.292, p<0.0001). The severity assessment was concordant in 21 (43%) cases and was changed in 14 (29%) cases after review at our institution, being downgraded in one (2%) and upgraded in 13 (27%).
To evaluate pathologist agreement within each Marsh score category, partial κ values were calculated for each Marsh grade. Agreement was generally suboptimal (table 3) but tended to be higher for cases with higher Marsh scores (Marsh 1 κ=0.0316; Marsh 2 κ=0.0049; Marsh 3a κ=0.3019; Marsh 3b κ=0.1794; and Marsh 3c κ=0.4974). Agreement among normal cases (Marsh 0) was moderate (κ=0.5777); however, it was poor for cases with Marsh scores 1 (n=5) and 2 (n=1), which were the most frequently misdiagnosed and were considered normal by the referring pathologist in every case. Marsh 3a was also poorly recognised and was considered normal (or the pattern was ascribed to diseases other than CD) in 41% (9/22) of cases. In comparison, Marsh grades 3b and 3c, although sometimes graded as less severe, were never misdiagnosed as normal at other types of pathology practices.
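Classifying each case as concordant, upgraded or downgraded only requires treating the Marsh grades as an ordered scale. A minimal sketch, with the in-house grade taken as the reference; the function name is illustrative, and cases graded ‘incomplete’ or ‘non-reported’ are assumed to have been excluded beforehand.

```python
# Marsh-Oberhuber grades in order of increasing severity.
MARSH_ORDER = ["0", "1", "2", "3a", "3b", "3c"]


def change_direction(referring: str, in_house: str) -> str:
    """Direction of change from the referring grade to the in-house grade.

    Raises ValueError if either grade is not on the Marsh scale
    (e.g. 'incomplete'), so such cases must be filtered out first.
    """
    ref_rank = MARSH_ORDER.index(referring)
    own_rank = MARSH_ORDER.index(in_house)
    if own_rank > ref_rank:
        return "upgraded"
    if own_rank < ref_rank:
        return "downgraded"
    return "concordant"
```

For the counts reported above, 21/49, 13/49 and 1/49 correspond to 43%, 27% and 2%, respectively.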
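The exact partial κ computed by MacKappa is not described here. One common way to obtain a category-specific κ, sketched below as an assumption rather than as the study's method, is to collapse the full table into a 2×2 table of ‘this Marsh grade’ versus ‘any other grade’ and compute Cohen's κ on it. Function names are illustrative.

```python
from typing import Sequence


def kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for the 2x2 table [[a, b], [c, d]] (rows: rater A)."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def category_kappa(table: Sequence[Sequence[int]], cat: int) -> float:
    """Kappa for one category versus all others, from a k x k table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    both = table[cat][cat]                                 # both raters chose cat
    a_only = sum(table[cat]) - both                        # A chose cat, B another
    b_only = sum(table[i][cat] for i in range(k)) - both   # B chose cat, A another
    neither = n - both - a_only - b_only
    return kappa_2x2(both, a_only, b_only, neither)
```

For a 2×2 table, both category-specific values equal the overall κ, which is a useful sanity check.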
According to the type of pathology setting (table 4), agreement in the degree of VA among cases with CD (n=11 for university hospitals, n=27 for community hospitals and n=11 for commercial laboratories) was higher (moderate) between our institution and commercial laboratories (κ=0.500, 63% agreement), than with community hospitals (κ=0.211, 37% agreement) or university hospitals (κ=0.290, 45% agreement), where it was fair.
Information content of biopsy reports
Information content recorded in the biopsy reports is shown in table 5. Most reports from the referring pathology practice settings lacked information relevant to the correct interpretation of small bowel biopsy pathology. The site (segment of small bowel) and location (eg, bulb, second part of the duodenum) were missing in 80 and 93 of the 102 cases, respectively. Villous to crypt ratios were not reported in any of the referring pathology reports, nor was the presence or absence of subepithelial collagen deposition; adequacy of orientation for interpretation of the biopsy was reported in only one case. Information regarding the presence or degree of lamina propria inflammation was missing in 95% of the reports, and an increase in IEL was not reported in 42 (86%) cases, despite this feature being an integral component of CD diagnosis. A Marsh score was not provided in 68/102 (67%) cases; of interest, this omission was quite frequent in reports from university hospital-based practice settings (75%). In all the referral cases where the degree of VA was reported, the terms mild, moderate and severe were preferred to the modified Marsh score. In the incomplete cases (9/102), the non-specific terms used included ‘blunting’, ‘marked atrophy’, ‘Marsh 3’ and ‘patchy’ or ‘focal’ atrophy. Biopsies lacking any mention of the grade of VA (non-reported cases) were diagnosed as CD without mention of either VA or intraepithelial lymphocytosis.
This study demonstrates an overall modest agreement in establishing a diagnosis of CD and in assessing or reporting the grade of VA when evaluating small bowel biopsies between pathologists practising in different types of settings. Our findings suggest that CD is underdiagnosed in community practice settings (such as community hospitals and commercial laboratories), where agreement with our institution was only moderate, but not in other academic or university-based institutions (overall, review increased the number of CD diagnoses by 20%), and that the severity of VA is underestimated in community-based and university hospitals. Our results also show that the degree of agreement is related to the severity of small bowel mucosal alterations, with quite poor agreement observed in cases with lower grades of VA. In addition, there is substantial variability in the type and amount of histopathological data reported, with frequent lack of information regarding the degree of VA and elevations in IEL, as well as common use of non-specific terms such as ‘villous blunting’ or ‘marked atrophy’ in the reports.
These results contrast with earlier studies from other countries. Previous studies have shown moderate to good agreement in CD diagnosis among different pathologists in Italy and Scandinavia.5 6 While those studies involved expert gastrointestinal pathologists, Pinto Sanchez et al7 showed a high rate of overdiagnosis among community pathologists compared with academic pathologists in Argentina. Both underdiagnosis and overdiagnosis of CD have relevant clinical implications. For instance, in a study from the UK, Shidrawi et al14 demonstrated that misinterpretation of poorly oriented biopsies by non-academic pathologists may lead to an inappropriate diagnosis of CD, initiation of a gluten-free diet and subsequent assessment for failure to respond to the diet. It is now well known that CD is a patchy disease; therefore, the sites and amount of lesional tissue sampled are important for correct diagnosis. In this study, it was interesting to note that the number of biopsies submitted for analysis, especially to community and commercial laboratories, was generally lower than currently recommended.10 This is of paramount importance because orientation of biopsies before embedding is not routine practice at most centres in North America; in our study, 11% of the cases were inadequately oriented, which hampered assessment of villous to crypt ratios.
We also noted that, while Marsh scores at both ends of the disease spectrum (Marsh 0 and Marsh 3c) showed the highest agreement, disagreement was more common in cases with milder changes: biopsies with features of Marsh 1 and 2 were considered normal by pathologists at other centres and, intriguingly, even cases with a Marsh score of 3a were wrongly assessed (considered normal or less severe) in over 40% of cases. Although the clinical implications of errors in grading the severity of VA are less important than a failure to establish a diagnosis of CD, incorrect assessment of severity can affect monitoring of the response to a gluten-free diet or lead to a false diagnosis of refractory CD. Moreover, recent studies have shown that patients with intestinal inflammation (Marsh 1 and 2) have a higher mortality risk than the general population,15 16 and thus identification of lower grades of mucosal alterations by the pathologist may be important. IEL assessment is essential for diagnosing CD, especially the histologically milder forms of disease, because increased IEL may be the only abnormality present. For this reason, pathologists should be trained to assess IEL systematically in small bowel biopsies. While counting IEL per 100 or 500 enterocytes has been standard practice in most studies, counting IEL per 20 enterocytes at the villous tips has been proposed as a simpler method for routine practice that also appears to discriminate better between CD and other small intestinal disorders characterised by increased IEL.17 18 In addition, less experienced pathologists may benefit from immunohistochemical staining for T-cell antigens to increase the accuracy of IEL assessment, especially when biopsy histology is suboptimal.
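The two IEL counting approaches differ mainly in their denominator. A minimal sketch of both normalisations follows; the function names are hypothetical, and no diagnostic cut-off is encoded because none is given in this article.

```python
def iel_per_100(iel: int, enterocytes: int) -> float:
    """IEL density normalised to 100 enterocytes (e.g. from a 100- or 500-cell count)."""
    if enterocytes <= 0:
        raise ValueError("enterocyte count must be positive")
    return 100.0 * iel / enterocytes


def mean_tip_count(tip_counts: list[int]) -> float:
    """Mean number of IEL per 20 enterocytes, averaged over the villous tips counted.

    Each entry is the IEL count among 20 enterocytes at the tip of one villus.
    """
    if not tip_counts:
        raise ValueError("at least one villous tip is required")
    return sum(tip_counts) / len(tip_counts)
```

For example, 45 IEL among 500 enterocytes gives 9.0 IEL per 100 enterocytes, and hypothetical tip counts of 4, 7, 6, 5 and 8 give a mean of 6.0 IEL per 20 tip enterocytes.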
To improve the diagnostic yield of CD, we therefore recommend that: (1) endoscopists take at least four biopsies from the descending duodenum and two from the bulb; (2) specimens are properly oriented; (3) IEL are systematically assessed along the entire villous length or at the villous tips, with immunohistochemical staining for T-cell antigens (eg, CD3) in equivocal cases; and (4) a detailed, perhaps templated, report including all relevant histological parameters is provided. While a precise consensus regarding terminology may not be essential, pathology reports should include, at the minimum, information regarding specimen adequacy, especially whether biopsy pieces are well oriented, crypt to villous ratio or degree of VA, and any increase in IEL. Inclusion of this information would aid clinicians in assessing the degree of intestinal damage and in monitoring the response to treatment. The use of a standard reporting format (like the one shown in figure 1) would allow gastroenterologists to assess the adequacy of their biopsies, and in turn ensure consistency and reliability in histopathological interpretation. In addition, unified descriptive parameters and diagnostic criteria would allow reproducibility and comparisons between reports from different pathologists.
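A templated report can be paired with a simple completeness check. The sketch below is hypothetical: the field names are illustrative, not a published template or the format of figure 1, and the check only flags the minimum items listed above (specimen adequacy, crypt to villous ratio or degree of VA, and IEL assessment).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BiopsyReport:
    """Hypothetical structured small bowel biopsy report; None means 'not reported'."""
    site: Optional[str] = None                 # e.g. "duodenum"
    location: Optional[str] = None             # e.g. "bulb", "second part"
    n_pieces: Optional[int] = None
    well_oriented: Optional[bool] = None       # specimen adequacy
    villous_crypt_ratio: Optional[float] = None
    va_grade: Optional[str] = None             # e.g. Marsh "3a" or "mild"
    iel_increased: Optional[bool] = None
    lamina_propria_inflammation: Optional[str] = None
    subepithelial_collagen_increased: Optional[bool] = None


def missing_minimum_fields(report: BiopsyReport) -> list[str]:
    """Return the minimum reporting items that this report leaves blank."""
    missing = []
    if report.well_oriented is None:
        missing.append("specimen adequacy/orientation")
    if report.villous_crypt_ratio is None and report.va_grade is None:
        missing.append("crypt to villous ratio or degree of VA")
    if report.iel_increased is None:
        missing.append("IEL assessment")
    return missing
```

An empty report is flagged on all three items, whereas a report stating orientation, a VA grade and the IEL assessment passes.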
The modified Marsh–Oberhuber scoring system, which was developed to minimise disagreement and maximise cross-validation among different pathologists, remains problematic with regard to interobserver agreement and is not routinely used by most centres, as evidenced by this study. With the aim of simplifying this scoring system, alternative three-tiered classification schemes (instead of the six Marsh–Oberhuber grades) have been proposed, such as those by Corazza and Villanacci19 or Ensari.20 Similar to our findings, Corazza and Villanacci19 observed only fair agreement, even among experienced pathologists, when using the modified Marsh–Oberhuber classification; however, agreement improved when their simplified system was used.5 Further large studies at different centres, incorporating the simplified scoring schemes, will help determine their clinical utility.
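To make the three-tier idea concrete, the sketch below collapses Marsh–Oberhuber grades into Corazza–Villanacci grades. The correspondence used (Marsh 1–2 to grade A, 3a–3b to B1, 3c to B2) is the commonly cited one and is an assumption here, not taken from this study; it should be checked against the original publications.

```python
from typing import Optional

# ASSUMED correspondence between Marsh-Oberhuber and Corazza-Villanacci grades.
MARSH_TO_THREE_TIER = {
    "1": "A",    # non-atrophic lesion
    "2": "A",    # non-atrophic lesion
    "3a": "B1",  # atrophic, villi still detectable
    "3b": "B1",  # atrophic, villi still detectable
    "3c": "B2",  # atrophic, flat mucosa
}


def to_three_tier(marsh: str) -> Optional[str]:
    """Three-tier grade for a Marsh score; None for Marsh 0 (no lesion).

    Raises KeyError for anything that is not a Marsh 0-3c grade.
    """
    if marsh == "0":
        return None
    return MARSH_TO_THREE_TIER[marsh]
```

Because 3a and 3b fall in the same tier, a disagreement between them no longer counts as a disagreement, which is one plausible reason why simpler schemes yield higher agreement.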
One limitation of this study was the use of only histopathological features to diagnose CD. The original intent of the study was not to evaluate whether CD was being diagnosed adequately, but rather to assess agreement in the histopathological interpretation of biopsies and the clinical utility of the reported information; thus, serological results were not evaluated. Another limitation was the comparison of many different pathologists at different practice locations with just one in-house pathologist with expertise in evaluating small bowel biopsies for CD and other small bowel disorders. This does not allow for adequate evaluation of interobserver agreement between pathologists within similar practice settings. Biopsy interpretation might differ in settings where the case mix is different and only a minority of patients have CD (community hospitals and commercial laboratories) from that in a tertiary referral centre specialising in small intestinal disorders. Future studies evaluating agreement among different university hospitals, or within the same institution, are encouraged to determine the contribution of pathology underreporting to the underdiagnosis of CD. One interesting observation of this study was the amount of histopathological information needed for correct interpretation that was missing from reports from referring centres, as well as the frequent use of non-specific terminology. In this context, it is also worth mentioning that the term VA, which is in popular use, might not be correct from a biological perspective, because crypt hyperplasia seems to be the dominant reason for mucosal flattening and the apparent loss of villi.
In conclusion, histopathological interpretation of small bowel biopsies varies among different types of pathology settings, which might be related to the experience of the pathologist. Failure to interpret small bowel biopsies correctly could be one of the reasons for the underdiagnosis of CD. Awareness of CD should be raised among pathologists and the use of standardised reporting methods should be encouraged to ensure greater uniformity of data generation and interpretation.
Variability in the reporting of VA and diagnosis of CD among different pathology services could be one of the reasons for the underdiagnosis of CD in the USA.
Underdiagnosis of CD appears to be greater in community hospitals and commercial pathology practices, whereas underestimation of the degree of VA is not infrequent in community and university-based practices.
Uniformity in small bowel histopathology reporting among pathologists may increase the diagnosis rate of CD.
Competing interests None declared.
Ethics approval Ethics approval was provided by Columbia University Institutional Review Board.
Provenance and peer review Not commissioned; externally peer reviewed.