Background Kappa statistics are frequently used to analyse observer agreement for panels of experts and External Quality Assurance (EQA) schemes, and generally treat all disagreements as total disagreement. However, the differences between ordered categories may not be of equal importance (eg, the difference between grades 1 vs 2 compared with 1 vs 3). Weighted kappa can be used to adjust for this when comparing a small number of readers, but it has not yet been applied to the large number of readers typical of a national EQA scheme.
Aim To develop and validate a method for applying weighted kappa to a large number of readers within the context of a real dataset: the UK National Urological Pathology EQA Scheme for prostatic biopsies.
Methods Data on Gleason grade recorded by 19 expert readers were extracted from the fixed text responses of 20 cancer cases from four circulations of the EQA scheme. Composite kappa, currently used to compute an unweighted kappa for large numbers of readers, was compared with the mean kappa for all pairwise combinations of readers. Weighted kappa generalised for multiple readers was compared with the newly developed ‘pairwise-weighted’ kappa.
Results For unweighted analyses, the median increase from composite to pairwise kappa was 0.006 (range −0.005 to +0.052). The difference between the pairwise-weighted kappa and generalised weighted kappa for multiple readers never exceeded ±0.01.
Conclusion Pairwise-weighted kappa is a suitable and highly accurate approximation to weighted kappa for multiple readers.
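The pairwise approach summarised above can be sketched in a few lines: compute Cohen's weighted kappa for every pair of readers and average over all pairs. This is a minimal illustration only, not the scheme's actual implementation; the linear weighting scheme, the reader labels and the grade values below are illustrative assumptions.

```python
from itertools import combinations

def weighted_kappa(a, b, categories):
    """Cohen's weighted kappa for two readers over ordered categories.

    Uses linear weights w_ij = 1 - |i - j| / (k - 1), so adjacent-grade
    disagreements are penalised less than distant ones (an assumption;
    quadratic weights are another common choice).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)
    # Observed joint proportions for the two readers.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1.0 / n
    # Marginal proportions for each reader.
    pa = [sum(obs[i][j] for j in range(k)) for i in range(k)]
    pb = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    w = lambda i, j: 1.0 - abs(i - j) / (k - 1)
    # Weighted observed and chance-expected agreement.
    po = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    pe = sum(w(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k))
    return (po - pe) / (1.0 - pe)

def pairwise_weighted_kappa(ratings, categories):
    """Mean weighted kappa over all pairwise combinations of readers.

    `ratings` maps a reader label to that reader's list of grades,
    one entry per case, in a fixed case order.
    """
    pairs = list(combinations(ratings.values(), 2))
    return sum(weighted_kappa(a, b, categories) for a, b in pairs) / len(pairs)
```

With 19 readers this averages over 171 pairs; replacing the linear weight function with all-or-nothing weights (1 on the diagonal, 0 elsewhere) recovers the unweighted pairwise kappa used in the composite comparison.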
- Interobserver agreement
- observer variation
- weighted kappa statistics
- prostate cancer
- Gleason sum score
Funding KCW, SM and JM are funded by the Policy Research Programme of the Department of Health. DMB is funded by Orchid (registered with the Charity Commission No 1080540 and registered in England 3963360). The Prostate External Quality Assurance is funded by the NHS Cancer Screening Programme.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.