A practical application of analysing weighted kappa for panels of experts and EQA schemes in pathology
Karen C Wright (1), Patricia Harnden (2), Sue Moss (1), Dan M Berney (3), Jane Melia (1)

  1. Cancer Screening Evaluation Unit, Institute of Cancer Research, Sutton, UK
  2. Department of Histopathology, Leeds Teaching Hospitals NHS Trust, St James's University Hospital, Leeds, UK
  3. Centre for Molecular Oncology & Imaging, Barts and the London School of Medicine and Dentistry, London, UK

Correspondence to K C Wright, Cancer Screening Evaluation Unit, Institute of Cancer Research, Sir Richard Doll Building, 15 Cotswold Road, Sutton SM2 5NG, UK; Karen.Wright{at}


Background Kappa statistics are frequently used to analyse observer agreement for panels of experts and External Quality Assurance (EQA) schemes, and they generally treat all disagreements as total disagreement. However, the differences between ordered categories may not be of equal importance (eg, the difference between grades 1 vs 2 compared with 1 vs 3). Weighted kappa can be used to adjust for this when comparing a small number of readers, but it has not yet been applied to the large number of readers typical of a national EQA scheme.

Aim To develop and validate a method for applying weighted kappa to a large number of readers within the context of a real dataset: the UK National Urological Pathology EQA Scheme for prostatic biopsies.

Methods Data on Gleason grade recorded by 19 expert readers were extracted from the fixed text responses of 20 cancer cases from four circulations of the EQA scheme. Composite kappa, currently used to compute an unweighted kappa for large numbers of readers, was compared with the mean kappa for all pairwise combinations of readers. Weighted kappa generalised for multiple readers was compared with the newly developed ‘pairwise-weighted’ kappa.
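As an illustration of the pairwise approach described in the Methods, the following is a minimal sketch only, not the authors' implementation. It assumes that the pairwise statistic is the mean of Cohen's kappa over all reader pairs, and that the weighted version simply swaps in disagreement weights; the abstract does not give the formula or the weighting scheme, so the linear weights, the function names and the example grades below are all hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, categories, weight="linear"):
    """Cohen's kappa for two readers, using disagreement weights.

    `categories` must list the ordered categories (at least two).
    `weight` is "unweighted", "linear" or "quadratic" (an assumed choice;
    the source does not state which weights were used).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (reader a category, reader b category).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for maximal disagreement.
        if weight == "unweighted":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0.0:
        return float("nan")  # both readers used one identical category
    return 1.0 - observed / expected


def pairwise_kappa(ratings, categories, weight="linear"):
    """Mean kappa over all pairwise combinations of readers."""
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b, categories, weight) for a, b in pairs) / len(pairs)


# Hypothetical data: three readers grading the same four cases.
readers = [
    [6, 6, 7, 8],
    [6, 7, 7, 8],
    [6, 6, 7, 8],
]
grades = [6, 7, 8]
print(pairwise_kappa(readers, grades, weight="unweighted"))
print(pairwise_kappa(readers, grades, weight="linear"))
```

With linear weights, a one-grade disagreement is penalised half as much as a two-grade disagreement, so the weighted value is higher than the unweighted one for readers who differ only by adjacent grades.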

Results For unweighted analyses, the median increase from composite to pairwise kappa was 0.006 (range −0.005 to +0.052). The difference between the pairwise-weighted kappa and generalised weighted kappa for multiple readers never exceeded ±0.01.

Conclusion Pairwise-weighted kappa is a suitable and highly accurate approximation to weighted kappa for multiple readers.

  • Interobserver agreement
  • observer variation
  • weighted kappa statistics
  • prostate cancer
  • Gleason sum score
  • epidemiology
  • prostate

  • Funding KCW, SM and JM are funded by the Policy Research Programme of the Department of Health. DMB is funded by Orchid (registered with the Charity Commission No 1080540 and registered in England 3963360). The Prostate External Quality Assurance is funded by the NHS Cancer Screening Programme.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.