Integration of computer-aided automated analysis algorithms in the development and validation of immunohistochemistry biomarkers in ovarian cancer
  1. Lucy Gentles1,
  2. Rachel Howarth1,
  3. Won Ji Lee1,
  4. Sweta Sharma-Saha1,
  5. Angela Ralte2,
  6. Nicola Curtin1,
  7. Yvette Drew1,3,
  8. Rachel Louise O'Donnell1,4
  1. 1 Translational and Clinical Institute, Newcastle Cancer Centre, Newcastle upon Tyne, UK
  2. 2 Department of Pathology, Queen Elizabeth Hospital Gateshead, Gateshead, UK
  3. 3 Northern Centre for Cancer Care, Freeman Hospital, Newcastle upon Tyne, UK
  4. 4 Northern Centre for Gynaecological Surgery, Royal Victoria Infirmary, Newcastle upon Tyne, UK
  1. Correspondence to Dr Rachel Louise O'Donnell, Translational and Clinical Institute, Newcastle Cancer Centre, Newcastle upon Tyne, UK; rachel.o’donnell{at}newcastle.ac.uk

Abstract

In an era when immunohistochemistry (IHC) is increasingly relied on for histological subtyping, and IHC-determined biomarkers informing rapid treatment choices are on the horizon, reproducible, quantifiable techniques are required. This study aimed to evaluate automated IHC scoring for the quantification of six DNA damage response protein markers using a tissue microarray of 66 ovarian cancer samples. Accuracy of quantification was compared between manual H-score and computer-aided quantification using Aperio ImageScope with and without a tissue classification algorithm. High levels of interobserver variation were seen with manual scoring. With automated methods, inclusion of the tissue classifier mask resulted in greater accuracy within carcinomatous areas and a median overall increase in H-score of 11.5% (0%–18%). Without the classifier, the score was underestimated by a median of 10.5 (5.2–25.6). Automated methods are reliable and superior to manual scoring. Fixed algorithms offer the reproducibility needed for high-throughput clinical applications.

  • ovary
  • carcinoma
  • antibodies
  • cell biology
  • immunohistochemistry

Introduction

Immunohistochemical assays performed on formalin-fixed paraffin-embedded (FFPE) tissue sections have traditionally been semiquantified through visual scoring by pathologists. Immunohistochemistry (IHC) is playing an increasing role in the modern diagnostic pathology of ovarian cancer, serving as an important complementary tool for accurate assignment of histological subtype, identification of different molecular phenotypes and provision of prognostic information.1–4

Recent developments in the understanding and application of targeted therapies in ovarian cancer mean that biomarker-driven stratification is of high importance. Direct quantification of protein may be superior to genomic and transcriptomic studies given the high rates of discordance5 described between genomic studies and actual protein levels.6 Additionally, the universal availability of IHC throughout diagnostic clinical laboratories makes this technique highly attractive for the application of validated biomarkers. IHC techniques benefit from the ability to locate the protein of interest within each cellular compartment of the tumour examined. This may be particularly important given the heterogeneous expression profile frequently observed in disseminated ovarian cancer,7 which may require multiple tumour sampling from a number of intra-abdominal sites for accurate assessment of overall biomarker status. The demand for multiple sampling for each patient, alongside supercentralisation of ovarian cancer services with specialist pathology hubs, increases the need for reproducible, automated quantitative IHC processes.

Visual IHC scoring by a pathologist takes time, is subjective and typically produces categorical scoring rather than continuous data. Scoring systems like H-score use a summation of the percentage of tumour area stained multiplied by the intensity score.8 Several studies examining the reproducibility of histological typing of ovarian carcinomas identify a poor or variable degree of interobserver agreement.9–14 We thus hypothesise that this interobserver variation also exists in the interpretation of IHC-detected protein expression. Computer-aided automation offers the possibility to overcome many of these limitations, with technology aiming to standardise scoring and provide accurate and reproducible measures in a high-throughput computer-based analysis.

This study aimed to compare computer-based automated quantitative IHC scoring with traditional visual scoring in the evaluation of protein expression in an ovarian cancer tissue microarray (TMA).

Methods

Patient samples

Patients provided written informed consent. FFPE omental tumour samples from patients with disseminated high grade serous ovarian cancer (HGSOC) were collated.

TMA construction and IHC

Duplicate 1 mm core samples from 33 patients were randomly distributed alongside positive control tissue cores (testis, breast, uterus and skin) to create the ovarian cancer (OvCa) TMA using a Galileo TMA CK3500 tissue microarrayer (Integrated Systems Engineering, Milan, Italy). The TMA was serially sectioned to a thickness of 4 µm, mounted, deparaffinised and rehydrated before antigen retrieval, staining with primary antibody and visualisation with 3,3′-diaminobenzidine (table 1).

Table 1

Antibody panel

Slide digitisation, visual H-scoring and computer-aided image analysis

Digital images of H&E-stained and IHC-stained TMA slides were captured at 20× magnification using a whole slide scanner (Leica ScanScope, Aperio Technologies, UK) and images saved in SVS format (Aperio), managed with server software (ImageScope, Aperio), and retrieved with a file management web interface (Spectrum, Aperio).

H&E-stained TMA cores were examined by a specialist pathologist (AR) and technician (LG) with carcinomatous areas marked. Minimum adequate tumour content was defined as 20% with inadequate cores excluded from analysis. All validations were undertaken blinded to patient data and IHC results.

Manual score

For each core, the H-score was calculated as a multiplicative score of maximal stain intensity (0–3; where 0 was no staining, 1 was weak staining, 2 was moderate staining and 3 was strong staining), multiplied by percentage area of staining relative to the entire core area (categorised into 0.1%–4%=1, 5%–19%=2, 20%–39%=3, 40%–59%=4, 60%–79%=5, 80%–100%=6), to give a final score of 0–18.
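
The manual scoring rule above can be sketched in Python as follows. This is a minimal illustration rather than the authors' tooling: the function names are hypothetical, and two details the text leaves open are assumptions here, namely that 0% stained area maps to an area category of 0 and that each band's lower bound is inclusive (so a fractional value such as 4.5% falls in the 0.1%–4% band).

```python
def area_category(percent_stained: float) -> int:
    """Map the percentage of core area stained to the 0-6 area category."""
    if not 0 <= percent_stained <= 100:
        raise ValueError("percent_stained must be between 0 and 100")
    # Lower bound of each band, highest first (assumed inclusive).
    bands = [(80, 6), (60, 5), (40, 4), (20, 3), (5, 2)]
    for lower_bound, category in bands:
        if percent_stained >= lower_bound:
            return category
    # Any staining below 5% is category 1; no staining is assumed to be 0.
    return 1 if percent_stained > 0 else 0


def manual_h_score(max_intensity: int, percent_stained: float) -> int:
    """Multiply maximal stain intensity (0-3) by area category (0-6)."""
    if max_intensity not in (0, 1, 2, 3):
        raise ValueError("max_intensity must be 0, 1, 2 or 3")
    return max_intensity * area_category(percent_stained)
```

Under these assumptions, a core with strong staining (intensity 3) over 85% of its area scores 3 × 6 = 18, the maximum of the 0–18 range.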

Default automated scoring (unmasked)

For automated default scoring, each TMA core was analysed using an unmodified default image analysis algorithm only, which did not include tissue classification evaluation or masking processes.

Automated scoring with tissue class and masking

Through an iterative process, representative regions of three pathologist-defined tissue classes (carcinoma, stroma and background) were annotated on the digital slides to build a training montage of morphological parameters for each tissue class (ImageScope, Aperio). Using the montage, a Genie Classifier algorithm was created and evaluated for accuracy in identifying the annotated regions of interest by overlaying a coloured map of positively identified regions of each tissue class. Accuracy was assessed visually with regard to the proportion of area covered as well as the definition of the borders of the tumour area in comparison to the benchmark pathologist-assigned classifier. The resulting ‘tumour mask’ was then used to limit scoring to the defined tumour regions.

Automated antigen scoring algorithm

The previously described nuclear V.9 algorithm (Aperio)15 16 was then applied to the TMA digital image with and without the tumour mask to identify individual staining parameters for quantification. Staining was quantified as the product of the staining intensity (table 2) and the percentage of carcinoma with positive antigen staining.16 Blue (no stain), yellow (weak), orange (moderate) and red (strong) colour mark-ups conveyed intensity of nuclear-specific staining (figure 1).
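
To illustrate why restricting scoring to the tumour mask can change the result, the sketch below computes a conventional 0–300 H-score (the sum over intensity classes of intensity multiplied by the percentage of nuclei in that class) with and without a mask. This is an assumption made for illustration only: the internal calculation of the Aperio nuclear algorithm and the output format of the Genie classifier are not described in the text, and the function names and per-nucleus input lists are hypothetical.

```python
from collections import Counter
from typing import Sequence


def h_score(intensities: Sequence[int]) -> float:
    """Sum of intensity class (1-3) times the percentage of nuclei in it."""
    if not intensities:
        raise ValueError("at least one nucleus is required")
    counts = Counter(intensities)
    total = len(intensities)
    return sum(level * 100.0 * counts[level] / total for level in (1, 2, 3))


def masked_h_score(intensities: Sequence[int],
                   in_tumour: Sequence[bool]) -> float:
    """Score only the nuclei that fall inside the (hypothetical) tumour mask."""
    kept = [i for i, inside in zip(intensities, in_tumour) if inside]
    return h_score(kept)


# Toy core: three stained tumour nuclei and three unstained stromal nuclei.
intensities = [3, 3, 2, 0, 0, 0]
in_tumour = [True, True, True, False, False, False]
unmasked = h_score(intensities)
masked = masked_h_score(intensities, in_tumour)
```

In this toy core the unstained stromal nuclei halve the unmasked score (about 133) relative to the masked score (about 267), which is the kind of dilution a tumour mask is intended to remove.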

Table 2

Scoring criteria thresholds (arbitrary values of pixel intensity) for weak, moderate and strong staining for nuclear V.9 algorithm default settings and following modification to improve accuracy

Figure 1

Representative images for PAR, DNAPKcs, RPA, Ku70, ATM and ATR. (A) H&E manually annotated by a specialist pathologist for areas of carcinoma. (B) Consecutive tissue section stained with target antibody and counterstained with haematoxylin. (C) Staining intensity mark-up following automated scoring with nuclear V.9 default settings (unmasked). (D) Genie classifier algorithm training annotations indicating carcinoma (pink) and stroma (blue). (E) Positively identified carcinomatous regions by genie classifier algorithm. (F) Staining intensity mark-up following genie tissue classification algorithm to select out carcinoma and image analysis with a modified nuclear V.9 algorithm (masked). ATM, ataxia telangiectasia mutated; ATR, ataxia telangiectasia and Rad3-related; DNAPKcs, DNA-dependent protein kinase catalytic subunit; PAR, poly (ADP-ribose); RPA, replication protein A.

Statistical analysis

IHC staining was evaluated in each TMA core by the three scoring modalities: (1) Visual H-score; (2) Automated unmasked analysis and (3) Automated analysis with masking of carcinomatous area. Each core served as its own control. Correlation analysis and Bland-Altman plots were used to measure agreement between scoring methods.17
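
As a rough sketch of the Bland-Altman approach, the function below computes the bias (mean difference between two methods), the SD of the differences and the conventional 95% limits of agreement (bias ± 1.96 SD). The function name and input lists are hypothetical, and the statistical software actually used in the study is not stated in the text.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def bland_altman(method_a: Sequence[float],
                 method_b: Sequence[float]) -> Dict[str, float]:
    """Bias, SD of differences and 95% limits of agreement for paired scores."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired lists with at least two scores")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    # Fraction of cores whose difference lies within the limits of agreement.
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"bias": bias, "sd": sd, "lower": lower,
            "upper": upper, "fraction_within": within}
```

Each core contributes one paired difference, plotted against the average of the two scores for that core; the proportion of points inside the limits summarises agreement.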

Results

Classifier accuracy by automation

Representative images of H&E TMA cores with areas of carcinoma marked by a pathologist, alongside target antigen stain and classification maps generated from the default and modified algorithms, are shown in figure 1. The automated unmasked Aperio algorithm shows no ability to distinguish carcinoma from non-carcinomatous areas, with all nuclei scored irrespective of cellular morphology and tissue classification (figure 1C). Incorporation of the tissue classifier (figure 1D) results in accurate mapping of tumour areas (figure 1E). This map was then used to mask for subsequent automated nuclear antigen intensity staining (figure 1F).

Scoring

Antigen expression was variable between and within each TMA core, irrespective of scoring method used. Correlation between scoring modalities was assessed for automated default (unmasked) versus Genie-modified (masked), as well as each automated (masked and unmasked) versus manual scoring methods for each antigen (figure 2).

Figure 2

Linear regression analysis showing correlation between automated unmasked and manual H-scores in (A) ATM (r2=0.1833, p<0.0001), (B) ATR (r2=0.4330, p<0.0001), (C) DNAPKcs (r2=0.6296, p<0.0001), (D) Ku70 (r2=0.5891, p<0.0001), (E) PAR (r2=0.3978, p<0.0001), (F) RPA (r2=0.4453, p<0.0001) and between automated unmasked and Genie-modified masked H-scores in (G) ATM (r2=0.8347, p<0.0001), (H) ATR (r2=0.8307, p<0.0001), (I) DNAPKcs (r2=0.8312, p<0.0001), (J) Ku70 (r2=0.7638, p<0.0001), (K) PAR (r2=0.8663, p<0.0001), (L) RPA (r2=0.8141, p<0.0001). Pink data points denote Y-intercept. ATM, ataxia telangiectasia mutated; ATR, ataxia telangiectasia and Rad3-related; DNAPKcs, DNA-dependent protein kinase catalytic subunit; PAR, poly (ADP-ribose); RPA, replication protein A.

Weak correlation was observed between automated unmasked and manual scoring methods (r2=0.1833–0.6296, p<0.0001) (figure 2A–2F). Strong linear correlation was observed between automated unmasked and Genie-modified masked scoring methods across all antigens (r2=0.7638–0.8663, p<0.0001) (figure 2G–2L). Positive Y-intercept values ranged from 5.2 in ataxia telangiectasia mutated (ATM) and ataxia telangiectasia and Rad3-related (ATR) to 25.6 in poly (ADP-ribose) (PAR), demonstrating a consistent underestimation of antigen expression following automated unmasked scoring compared with the masked method, which is likely to be a reflection of scoring ‘dilution’ in stromal tissue.
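
The regression quantities reported above (r2 and Y-intercept) can be obtained from an ordinary least-squares fit, sketched below. The function name is hypothetical, and the choice of axes is an assumption, since the text does not state it: unmasked scores are placed on x and masked scores on y, so that a positive intercept corresponds to a masked score that remains above zero where the unmasked score is zero.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float],
             y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain more than one distinct value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

Called as `fit_line(unmasked_scores, masked_scores)` with one entry per core, the second returned value plays the role of the Y-intercept discussed above.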

Percentage change in H-Score following application of the Genie-modified mask was distinctly varied between and within antigens (figure 3). Across all targets, there was an overall upward trajectory, with median percentage H-score change ranging from 0% (ATM) to 18% (DNA-dependent protein kinase catalytic subunit (DNAPKcs) and PAR).

Figure 3

Percentage (%) change in H-score from automated unmasked analysis to Genie-modified masked analysis of all targets with dotted line at 0. Positive values (above dotted line) indicate a % increase in H-score while negative values (below dotted line) indicate a % decrease in H-score. Median % change in H-score was calculated: ATM=0% (−46 to +212), ATR=10% (−29 to +77), DNAPKcs=18% (−50 to +475), Ku70=12% (−75 to +206), RPA=11% (−67 to +180), PAR=18% (−11 to +65). ATM, ataxia telangiectasia mutated; ATR, ataxia telangiectasia and Rad3-related; DNAPKcs, DNA-dependent protein kinase catalytic subunit; PAR, poly (ADP-ribose); RPA, replication protein A.
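
A minimal sketch of the per-core percentage change and its median is given below. The function name is hypothetical, and skipping cores with an unmasked score of zero (where a percentage change is undefined) is an assumption, as the text does not say how such cores were handled.

```python
from statistics import median
from typing import List, Sequence


def percent_changes(unmasked: Sequence[float],
                    masked: Sequence[float]) -> List[float]:
    """Per-core % change in H-score from unmasked to masked analysis."""
    if len(unmasked) != len(masked):
        raise ValueError("score lists must be paired core by core")
    changes = []
    for before, after in zip(unmasked, masked):
        if before == 0:
            continue  # assumption: undefined change, core skipped
        changes.append(100.0 * (after - before) / before)
    return changes


# Toy example with three cores (values are illustrative, not study data).
toy_changes = percent_changes([100, 50, 200], [110, 75, 200])
toy_median = median(toy_changes)
```

Positive values correspond to points above the dotted line in the figure and negative values to points below it.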

Agreement between automated scoring with and without the tumour classifier mask is high, with >95% of TMA scores falling within the limits of agreement on Bland-Altman analysis (figure 4). The greatest variation in scores is seen at the extremes of protein staining.

Figure 4

Bland-Altman method comparison plots of difference versus average between automated unmasked and Genie-modified masked algorithms in (A) ATM (SD of bias=10.16), (B) ATR (SD of bias=10.15), (C) DNAPKcs (SD of bias=19.55), (D) Ku70 (SD of bias=19.47), (E) PAR (SD of bias=16.75), (F) RPA (SD of bias=10.56). ATM, ataxia telangiectasia mutated; ATR, ataxia telangiectasia and Rad3-related; DNAPKcs, DNA-dependent protein kinase catalytic subunit; PAR, poly (ADP-ribose); RPA, replication protein A.

Analysis across histological subtype

The tissue classifier created to select areas of carcinoma for protein quantification is histological subtype-specific. The HGSOC mask created for the OvCa TMA was used to assess TMA cores from mucinous, endometrioid and clear cell ovarian carcinomas. Masking was imprecise with poor categorisation of carcinoma and stromal regions, leading to inaccurate staining mark-up and final H-score (figure 5).

Figure 5

Analysis of mucinous, endometrioid and clear cell histological subtypes using high grade serous ovarian cancer (HGSOC) mask. (A) Tissue section stained with 8-hydroxy-2'-deoxyguanosine (8-OHdG), (B) Imprecise identification of carcinomatous regions (pink) and excluded stromal regions (grey), (C) Subsequent staining intensity mark-up based on regional categorisation of carcinoma and stroma.

Discussion

The growing spotlight on IHC characterisation of prognostic biomarkers as well as biomarkers of therapy response in ovarian cancer demands robust, reproducible and automated processes to accurately quantify expression.

This study highlights the potential inaccuracies and interobserver variation in protein assessment using traditional visual scoring. Importantly, we demonstrate the accuracy and reproducibility of the Aperio image analysis algorithm and additionally demonstrate increased accuracy through modification of the algorithm using histology-specific tissue classifiers. We quantify the gross underestimation of protein expression with unmasked processes, revealing potential for misclassification if accurate tissue classifiers are not incorporated into the scoring algorithm. Once algorithms are established, complete automation producing continuous data sets for all cores within a single TMA can be achieved, with the same algorithm applied to subsequent TMAs containing the same histological cancer type. The unique morphology seen within each of the histological subtypes of epithelial ovarian cancer necessitates the creation of an individual tissue classifier algorithm for each subtype. The approach taken in this study, building consensus on tissue classifiers as well as intensity IHC scoring in the development phase of an automated approach, will reduce clinical and technical time in validation. Future work aims to validate the HGSOC classifiers on other FFPE catalogues.

Ethics statements

Ethics approval

Ethical approval for donation of patient material and data was granted by the local ethics committee (North-East Newcastle Research Ethics Committee: 12/NW/0202, 17/NE/0395).

Acknowledgments

The authors thank the Newcastle upon Tyne Hospitals NHS Charity for funding this work. The authors also thank the clinical teams at Newcastle Centre for Cancer (NCCC) and Northern Gynaecological Oncology Centre (NGOC) for their recruitment of patients to the study and all the patients who generously donated clinical samples for this study.

References

Footnotes

  • Handling editor Runjan Chetty.

  • Contributors LG acquired laboratory data, devised methodologies, curated data, undertook formal analysis, drafted and revised the manuscript; RH acquired laboratory data and devised methodologies; WJL acquired laboratory data; SS-S contributed to data analysis; AR acquired clinical data and devised methodologies; NC revised and provided final approval of manuscript; YD recruited participants, acquired clinical data and revised the manuscript; RLO recruited participants, acquired clinical data, undertook formal analysis, drafted and revised the manuscript, and provided final approval of the manuscript. RLO has full access to the data in the study and is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

  • Funding Work in this study was funded by the Newcastle upon Tyne Hospitals NHS Charity (registered charity number 1057213).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; internally peer reviewed.
