Measuring interobserver variation in a pathology EQA scheme using weighted κ for multiple readers
Karen C Wright (1), Jane Melia (1), Sue Moss (1), Dan M Berney (2), Derek Coleman (1), Patricia Harnden (3)

(1) Cancer Screening Evaluation Unit, Institute of Cancer Research, Sutton, Surrey, UK
(2) Molecular Oncology, Barts Cancer Institute, London, UK
(3) Department of Histopathology, Leeds Teaching Hospitals NHS Trust, Leeds, UK

Correspondence to Karen C Wright, ICON Clinical Research, 2 Globeside, Globeside Business Park, Marlow, Buckinghamshire SL7 1TB, UK; karen.wright{at}


Background A Urological Pathology External Quality Assurance (EQA) Scheme in the UK has reported observer variation in the diagnosis and grading of adenocarcinoma in prostatic biopsies using basic κ statistics, which rate all disagreements equally.

Aim To report κ statistics for the prostate EQA scheme using customised weighting schemes that reflect the closeness of interobserver agreement.

Methods A total of 83, 114 and 116 pathologists took part, respectively, in three web-based circulations and were classified as either expert or other readers. For analyses of diagnosis, there were 10, 8 and 8 cases in the three circulations, respectively. For analyses of Gleason Sum Score, only invasive cases were included, leaving 5, 5 and 6 cases, respectively. Analyses were conducted using customised weighting schemes with ‘pairwise-weighted’ κ for multiple readers.
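
As an illustration of the general idea behind a weighted κ for multiple readers, the following is a minimal sketch in Python. It is not the authors' implementation: the abstract does not give the exact definition of the pairwise-weighted statistic or the weights used, so the pooling of observed and chance-expected agreement over all reader pairs, the function name `pairwise_weighted_kappa`, the three-category coding and the linear weights are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def pairwise_weighted_kappa(ratings, weights):
    """Weighted kappa pooled over all pairs of readers (illustrative only).

    ratings: (n_cases, n_readers) array of category indices 0..K-1.
    weights: (K, K) agreement weights, 1.0 on the diagonal, 0.0 for the
             worst disagreement. These are hypothetical, not the paper's.
    """
    ratings = np.asarray(ratings)
    weights = np.asarray(weights, dtype=float)
    n_cases, n_readers = ratings.shape
    n_cat = weights.shape[0]

    # Each reader's marginal distribution over the categories.
    marginals = np.stack([
        np.bincount(ratings[:, r], minlength=n_cat) / n_cases
        for r in range(n_readers)
    ])

    observed, expected = [], []
    for a, b in combinations(range(n_readers), 2):
        # Observed weighted agreement for this pair of readers.
        observed.append(weights[ratings[:, a], ratings[:, b]].mean())
        # Chance-expected weighted agreement from the two marginals.
        expected.append(marginals[a] @ weights @ marginals[b])

    po, pe = np.mean(observed), np.mean(expected)
    return (po - pe) / (1.0 - pe)


# Hypothetical data: 4 cases, 2 readers, 3 ordered categories.
example_ratings = [[0, 0], [0, 1], [1, 1], [2, 2]]
identity_weights = np.eye(3)  # every disagreement counts equally
linear_weights = 1.0 - np.abs(np.subtract.outer(np.arange(3), np.arange(3))) / 2.0

unweighted = pairwise_weighted_kappa(example_ratings, identity_weights)
weighted = pairwise_weighted_kappa(example_ratings, linear_weights)
```

With identity weights the function reduces to an unweighted κ for the two readers. With the linear weights, the single adjacent-category disagreement receives partial credit, so the weighted value is higher than the unweighted one.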

Results Analysis of diagnosis for all circulations and all readers gave a composite κ value of 0.86 and a pairwise-weighted κ (κp–w) value of 0.91, both regarded as ‘almost perfect’ agreement. The higher pairwise-weighted value was due to the high proportion of responses that showed partial agreement. Analysis of Gleason Sum Score gave κ=0.38 and κp–w=0.58 over all circulations and all readers, indicating that discrepancies occur at the boundary between adjacent grades and may not be as clinically significant as the composite κ suggests.
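
The term ‘almost perfect’ matches the widely used Landis and Koch descriptors for κ. The abstract does not name the scale, so the short Python sketch below assumes that scale in order to show how the four reported values would be labelled; the function name `landis_koch_band` is hypothetical.

```python
def landis_koch_band(kappa):
    """Return the Landis and Koch (1977) descriptor for a kappa value.

    Assumes this is the scale behind the abstract's 'almost perfect' label.
    """
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Values reported in the Results paragraph.
reported = {
    "diagnosis, composite": 0.86,
    "diagnosis, pairwise-weighted": 0.91,
    "Gleason Sum Score, composite": 0.38,
    "Gleason Sum Score, pairwise-weighted": 0.58,
}
labels = {name: landis_koch_band(value) for name, value in reported.items()}
```

On this assumed scale, weighting moves the Gleason Sum Score result from the ‘fair’ band to the ‘moderate’ band, while both diagnosis values stay in the ‘almost perfect’ band.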

Conclusion Weighted κ statistics show higher levels of agreement than previously reported because they apply weighting that reflects the relative importance of different types of discordance in diagnosis or grading. Agreement on grading remained low.

  • Observer variation
  • κ statistics
  • customised weighting
  • diagnosis of prostate biopsies
  • EQA scheme
  • Gleason grading
  • EQA
  • diagnosis
  • prostate
  • cancer
  • urinary tract tumours
  • testis
  • genitourinary pathology
  • bladder
  • urogenital pathology
  • kidney
  • cytopathology
  • uropathology
  • histopathology
  • colorectal cancer
  • gall bladder
  • oncogenes
  • P53
  • pancreas

  • Funding Karen Wright, Sue Moss, Derek Coleman and Jane Melia were funded by the Policy Research Programme of the Department of Health. The views expressed in this paper are those of the authors and not necessarily those of the Department of Health. Dr Dan Berney is funded by Orchid (Registered with the Charity Commission No. 1080540 and registered in England 3963360). The Prostate EQA is funded by the NHS Cancer Screening Programme.

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.