Measuring interobserver variation in a pathology EQA scheme using weighted κ for multiple readers

Karen C Wright¹; Jane Melia¹; Sue Moss¹; Dan M Berney²; Derek Coleman¹; Patricia Harnden³

doi:10.1136/jclinpath-2011-200229

Original article

¹Cancer Screening Evaluation Unit, Institute of Cancer Research, Sutton, Surrey, UK
²Molecular Oncology, Barts Cancer Institute, London, UK
³Department of Histopathology, Leeds Teaching Hospitals NHS Trust, Leeds, UK

Correspondence to Karen C Wright, ICON Clinical Research, 2 Globeside, Globeside Business Park, Marlow, Buckinghamshire SL7 1TB, UK; karen.wright{at}iconplc.com

Abstract

Background A Urological Pathology External Quality Assurance (EQA) Scheme in the UK has reported observer variation in the diagnosis and grading of adenocarcinoma in prostatic biopsies using basic κ statistics, which rate all disagreements equally.
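
To make the limitation described in the Background concrete, here is a minimal sketch of unweighted Cohen's κ for two readers. The function name and the toy Gleason sums are hypothetical illustrations; the EQA scheme's own software is not described in this abstract.

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Unweighted Cohen's kappa for two readers rating the same cases.

    Every disagreement counts as a complete miss, however close the
    two categories are.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must rate the same, non-empty set of cases")
    n = len(reader_a)
    # Observed agreement: proportion of cases given identical labels.
    p_o = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement from each reader's own marginal proportions.
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical Gleason sums from two readers on the same five biopsies.
READER_A = [6, 7, 7, 8, 9]
READER_B = [6, 7, 8, 8, 7]

if __name__ == "__main__":
    print(round(cohen_kappa(READER_A, READER_B), 3))  # 0.444
```

The 7 versus 8 disagreement and the 9 versus 7 disagreement each contribute nothing to observed agreement, even though one is a one-point difference and the other a two-point difference.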

Aim The aim of this study is to use customised weighting schemes to report κ statistics that reflect the closeness of interobserver agreement in the prostate EQA scheme.
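
A weighting scheme assigns each pair of categories an agreement weight between 0 and 1, with 1 for exact agreement. The sketch below builds two kinds of weight table: standard linear weights, and a hand-chosen "custom" table in which only selected pairs earn partial credit. The function names, the Gleason sum categories and the 0.5 partial credit are hypothetical illustrations, not the weights used in the study.

```python
def linear_weights(categories):
    """Linear agreement weights 1 - |i - j| / (k - 1) for ordered categories."""
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    return {
        (a, b): 1 - abs(i - j) / (k - 1)
        for i, a in enumerate(categories)
        for j, b in enumerate(categories)
    }


def custom_weights(categories, partial_credit):
    """Identity weights plus hand-chosen partial credit for selected pairs.

    partial_credit maps a pair (a, b) to a weight in [0, 1] and is treated
    as symmetric; unlisted pairs of different categories score 0.
    """
    weights = {}
    for a in categories:
        for b in categories:
            if a == b:
                weights[(a, b)] = 1.0
            else:
                weights[(a, b)] = partial_credit.get(
                    (a, b), partial_credit.get((b, a), 0.0)
                )
    return weights


# Hypothetical example: Gleason sums 6 to 10, with half credit when two
# readers differ by one point and no credit otherwise.
GLEASON_SUMS = [6, 7, 8, 9, 10]
ADJACENT_HALF_CREDIT = custom_weights(
    GLEASON_SUMS, {(s, s + 1): 0.5 for s in range(6, 10)}
)
```

A custom table lets the weights follow clinical judgement (for example, treating some diagnostic disagreements as more serious than others) rather than simple distance between categories.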

Methods A total of 83, 114 and 116 pathologists took part, respectively, in three web-based circulations and were classified as either expert or other readers. For analyses of diagnosis, there were 10, 8 and 8 cases in the three circulations, respectively. For analyses of Gleason Sum Score, only invasive cases were included, leaving 5, 5 and 6 cases, respectively. Analyses were conducted using customised weighting schemes with ‘pairwise-weighted’ κ for multiple readers.
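
The abstract does not give the formula for ‘pairwise-weighted’ κ. One common way to extend weighted κ to many readers is to average the agreement weight over every pair of readers on each case, then compare that with the weight expected by chance from the pooled category proportions. The sketch below does exactly that; it is an assumption, not necessarily the estimator used in the study. With identity weights and the same number of readers on every case it reduces to Fleiss' κ. The function name and toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def pairwise_weighted_kappa(cases, weights):
    """Multi-reader weighted kappa averaged over all pairs of readers.

    cases: one list of ratings per case (one rating per reader, >= 2 each).
    weights: dict mapping (category, category) to an agreement weight in
    [0, 1], with 1 on the diagonal.
    """
    # Observed: mean weight over every ordered pair of different readers,
    # averaged first within each case and then across cases.
    per_case = []
    for ratings in cases:
        if len(ratings) < 2:
            raise ValueError("each case needs at least two readers")
        pairs = list(permutations(ratings, 2))
        per_case.append(sum(weights[p] for p in pairs) / len(pairs))
    p_o = sum(per_case) / len(per_case)

    # Chance: weights applied to pooled category proportions, as if two
    # ratings were drawn independently from all ratings combined.
    counts = Counter(r for ratings in cases for r in ratings)
    total = sum(counts.values())
    p_e = sum(
        weights[(a, b)] * (counts[a] / total) * (counts[b] / total)
        for a in counts
        for b in counts
    )
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: three readers grading four cases (Gleason sums).
GRADES = [6, 7, 8]
IDENTITY = {(a, b): float(a == b) for a in GRADES for b in GRADES}
ADJACENT = {
    (a, b): 1.0 if a == b else (0.5 if abs(a - b) == 1 else 0.0)
    for a in GRADES
    for b in GRADES
}
CASES = [[6, 6, 7], [7, 7, 7], [7, 8, 8], [8, 8, 8]]

if __name__ == "__main__":
    print(round(pairwise_weighted_kappa(CASES, IDENTITY), 3))  # 0.467
    print(round(pairwise_weighted_kappa(CASES, ADJACENT), 3))  # 0.564
```

Every disagreement in these toy ratings is between adjacent grades, so giving half credit raises κ from about 0.47 to about 0.56, the same direction of change as the weighted results reported for Gleason Sum Score.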

Results Analysis of diagnosis for all circulations and all readers gave a composite κ value of 0.86 and a pairwise-weighted κ (κ_p–w) value of 0.91, both regarded as ‘almost perfect’ agreement. The higher weighted value reflects the high proportion of responses that showed partial agreement. Analysis of Gleason Sum Score over all circulations and all readers gave κ=0.38 and κ_p–w=0.58, indicating that discrepancies occur at the boundary between adjacent grades and may be less clinically significant than the composite κ suggests.
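
The ‘almost perfect’ label matches the benchmark bands proposed by Landis and Koch, which are widely used to describe κ values. Assuming those are the bands intended, the hypothetical helper below maps a κ value to its label. Applied to the figures reported here, it places both diagnosis values in the ‘almost perfect’ band and the Gleason Sum Score values in the ‘fair’ (κ=0.38) and ‘moderate’ (κ_p–w=0.58) bands.

```python
def landis_koch_label(kappa):
    """Descriptive label for a kappa value using the Landis and Koch bands."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    # Upper bound (inclusive) of each band, in increasing order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


if __name__ == "__main__":
    # Values reported in the Results above.
    for value in (0.86, 0.91, 0.38, 0.58):
        print(value, landis_koch_label(value))
```

These bands are conventions rather than statistical thresholds, so the labels are descriptive only.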

Conclusion Weighted κ statistics show higher levels of agreement than previously reported because they apply weights that reflect the relative importance of different types of discordance in diagnosis or grading. Agreement on grading remained low.

Keywords: observer variation; κ statistics; customised weighting; diagnosis of prostate biopsies; EQA scheme; Gleason grading

Footnotes

Funding Karen Wright, Sue Moss, Derek Coleman and Jane Melia were funded by the Policy Research Programme of the Department of Health. The views expressed in this paper are those of the authors and not necessarily those of the Department of Health. Dr Dan Berney is funded by Orchid (Registered with the Charity Commission No. 1080540 and registered in England 3963360). The Prostate EQA is funded by the NHS Cancer Screening Programme.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.