Aims To investigate the accuracy and reproducibility of a scoring system for cervical intraepithelial neoplasia (CIN1–3) based on immunohistochemical (IHC) biomarkers Ki-67 and p16ink4a.
Methods 115 cervical tissue specimens were reviewed by three expert gynaecopathologists and graded according to three strategies: (1) CIN grade based on H&E staining only; (2) immunoscore based on the cumulative score of Ki-67 and p16ink4a only (0–6); and (3) CIN grade based on H&E supported by non-objectified IHC 2 weeks after scoring 1 and 2. The majority consensus diagnosis of the CIN grade based on H&E supported by IHC was used as the Reference Standard. The proportion of test positives (accuracy) and the absolute agreements across pathologists (reproducibility) of the three grading strategies within each Reference Standard category were calculated.
Results We found that immunoscoring with positivity definition 6 yielded the highest proportion of test positives for Reference Standard CIN3 (95.5%), in combination with the lowest proportion of test positives in samples with CIN1 (1.8%). The proportion of test positives for CIN3 was significantly lower for sole H&E staining (81.8%) or combined H&E and IHC grading (84.8%) with positivity definition ≥CIN3. Immunoscore 6 also yielded high absolute agreements for CIN3 and CIN1, but the absolute agreement was low for CIN2.
Conclusions The higher accuracy and reproducibility of the immunoscore opens the possibility of a more standardised and reproducible definition of CIN grade than conventional pathology practice, allowing a more accurate comparison of CIN-based management strategies and evaluation of new biomarkers to improve the understanding of progression of precancer from human papillomavirus infection to cancer.
- cervical cancer
- Ki 67
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Cervical screening programmes are based on the detection and treatment of cervical precancer. Accurate colposcopy and histological assessment of cervical precursor lesions are essential to determine clinical management. Histologically, cervical lesions are classified as cervical intraepithelial neoplasia (CIN) and categorised as CIN1, CIN2 or CIN3 based on the extension of immature, dysplastic cells into the squamous epithelium above the basal layer (one-third to full thickness) and the severity of cellular abnormality. CIN3 is considered the most advanced precancerous lesion. Although CIN is classified into three grades, the development of CIN to cancer is a dynamic process and represents a morphological continuum.1
The diagnosis of the pathologist is subjective and based on personal experience, making use of histomorphological features in H&E stained slides alone or in combination with immunohistochemical (IHC) findings. The heterogeneity inherent in this subjective grading of CIN results in limited reproducibility with moderate interobserver and intraobserver agreement,2 3 and consequently has effects on treatment. Generally, CIN3 is treated by excision and CIN1 is managed conservatively. However, for CIN2 management differs between clinics, that is, either excisional treatment or close surveillance, because of the high regression rate of CIN2.4 5
Because of the moderate reproducibility of CIN grading, the WHO introduced a two-tiered grading system in 2013,6 in which the term high-grade CIN is used for CIN2 and CIN3, and low-grade CIN for CIN1. The diagnosis of high-grade squamous intraepithelial lesion (HSIL) results in excisional treatment to prevent precancers developing into cancer. Consequently, this approach results in overtreatment of potentially non-progressive or regressive lesions. The USA adopted and recently optimised this two-tiered classification system,2 whereas most European countries adopted the CIN grading system.7 In the latter, CIN2 should preferably be divided into (early productive) CIN1-like and (late transforming) CIN3-like lesions.8
An IHC or biomarker-based classification system might improve the accuracy and reproducibility of grading CIN, and hence standardisation of diagnosis. This will allow comparison of the results of clinical management between centres for clinical audit and also simpler comparison of alternative management strategies in clinical trials. Ki-67 and p16ink4a are used widely to guide CIN grading by pathologists and expression of these markers increases with increasing CIN.2 5 9–16 Ki-67 staining in suprabasal and parabasal layers is an indicator of cellular proliferation, whereas diffuse p16ink4a staining occurs when p16ink4a is overexpressed as a result of functional inactivation of retinoblastoma protein by the human papillomavirus (HPV) E7 protein. These are the consequences of a persistent cervical HPV infection and deregulation of E6 and E7 oncogene expression in proliferating cells. Deregulation of E6 and E7 oncogenes causes chromosomal instability, which is the driving force for accumulation of (epi)genetic aberrations in host cell genes and drives the progression of CIN lesions towards cervical cancer.8 17
In this study, we propose a classification system using the cumulative immunoscore value of Ki-67 and p16ink4a, compare accuracy and reproducibility of CIN grading based on this score to classical histomorphological and IHC CIN grading and show the benefits of this scoring system.
In this retrospective cross-sectional study, we selected 115 formalin-fixed paraffin-embedded cervical biopsy and large loop excision of the transformation zone (LLETZ) specimens guided by severity of the initial diagnosis (no dysplasia n=22; CIN1 n=22; CIN2 n=27; CIN3 n=22; squamous cell carcinoma (SCC) n=22) from the files of the Pathology Department of the VU University Medical Center in Amsterdam, The Netherlands. The selected tissue blocks were anonymously processed for the purposes of this study. Ethical approval was waived according to the regulations in the Netherlands.18
All tissue blocks were cut to provide sections of 3 µm. The first and last sections were used for H&E staining to ensure the presence of the same cervical lesion, and in-between sections were used for immunostaining with mouse monoclonal antibodies against Ki-67 antigen (DAKO, Clone MIB-1) or p16ink4a antigen (Roche, CINtec, Clone E6H4), using the automated IHC Ventana staining machine (Roche, Basel, Switzerland).
Scoring of Ki-67
Nuclear Ki-67 staining in cells of the squamous epithelium was scored positive. Score 0 is a normal staining pattern (ie, staining of nuclei in the basal layer, figure 1A). Score 1 is defined as positive nuclei predominantly found in the lower one-third of the epithelium (figure 1B). Score 2 is defined as positive nuclei predominantly found in the lower two-thirds of the epithelium (figure 1C). Score 3 is defined as positive nuclei in more than two-thirds of the epithelium (figure 1D). The presence of a few scattered positive individual cells in the upper two-thirds layer of the epithelium in a predominant staining pattern in the lower one-third is scored as 1 (figure 1E). Also, a few scattered positive individual cells found in the upper one-third layer of the epithelium in a predominant pattern with two-thirds involvement of the epithelium is scored as 2.
Areas where dermal papillae narrow down the width of the epithelium cannot be scored reliably and are therefore not taken into account.
Scoring of p16ink4a
Diffuse or ‘block’ staining for p16ink4a of the cell cytoplasm or nucleus in squamous epithelium was considered positive.2 3 19 Score 0 is defined as either no p16ink4a positivity or focally scattered positive cells or small cell clusters (ie, patchy staining, figure 2A). Score 1 is defined as low intensity, diffuse positivity restricted to the lower one-third part of the epithelium (figure 2B). Score 2 is defined as continuous positivity in the lower two-thirds of the epithelium (figure 2C). Score 3 is defined as positive cells involving the full thickness of the epithelium (ie, diffuse full thickness staining, figure 2D).
It is considered important to distinguish positive diffuse or ‘block’ p16ink4a staining from patchy or background staining.
Three expert gynaecopathologists (P1, P2 and P3) from two different laboratories (MCGB, DJ and MvdS) received the H&E slides, selected the area with the most dysplastic features of the epithelium and graded the CIN lesion based on morphologic characteristics (further referred to as ‘H&E score’). All specimens were classified for H&E scoring as no dysplasia, CIN1, CIN2, CIN3 or SCC, according to international criteria.20
Ki-67 and p16ink4a immunoscore
Subsequently, the pathologists scored the Ki-67 and p16ink4a expression on a separate scoring sheet, using the scoring system described above and depicted in figures 1 and 2. This scoring sheet, including examples of the scoring, was discussed with and approved by all three pathologists prior to the start of the study. No morphologic features were taken into account in this scoring. The cumulative immunoscore of Ki-67 (scores 0–3) and p16ink4a (scores 0–3) could vary from 0 to 6, irrespective of how the score was obtained (further referred to as ‘Ki-67 and p16ink4a immunoscore’). For example, we considered a cumulative score of 5, either established by a Ki-67 score of 2 and p16ink4a score of 3, or vice versa, as identical.
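The cumulative score described above can be illustrated with a minimal Python sketch. The class name `MarkerScores` and its field names are hypothetical and chosen for illustration only; the sketch encodes just what the text states, namely that each marker is scored 0–3 and that the two scores are summed without regard to which marker contributed which value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerScores:
    """One pathologist's IHC scores for one lesion (hypothetical container)."""

    ki67: int  # Ki-67 score, 0-3
    p16: int   # p16ink4a score, 0-3

    def __post_init__(self):
        # Each marker is scored on the four-level scale 0, 1, 2, 3.
        for name, value in (("ki67", self.ki67), ("p16", self.p16)):
            if value not in (0, 1, 2, 3):
                raise ValueError(f"{name} score must be 0, 1, 2 or 3, got {value!r}")

    @property
    def immunoscore(self) -> int:
        # Cumulative immunoscore, 0-6, irrespective of how it was obtained.
        return self.ki67 + self.p16


# A Ki-67 score of 2 with p16ink4a 3, or vice versa, gives the same immunoscore.
print(MarkerScores(ki67=2, p16=3).immunoscore == MarkerScores(ki67=3, p16=2).immunoscore)
```

Because only the sum is retained, the sketch deliberately discards which marker drove the score, mirroring the definition in the text.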
Combined H&E and non-objectified IHC score
At least 2 weeks after scoring of the immunostains, each pathologist was asked to render the CIN grade now based on morphologic features in combination with the subjective interpretation of the immunostaining (further referred to as ‘combined H&E and non-objectified IHC score’).
The consensus diagnosis of the combined H&E and non-objectified IHC score, based on agreement of CIN grade in at least two out of three pathologists, was used as the ‘Reference Standard’. In six lesions in which no majority CIN score was available, consensus was reached in a panel discussion with a fourth pathologist (CJLMM).
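The majority rule for the Reference Standard can be sketched as follows. The function name `consensus_grade` is hypothetical; returning `None` is an illustrative way of flagging lesions without a majority grade, which in the study were resolved in a panel discussion rather than by any algorithm.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_grade(grades: Sequence[str]) -> Optional[str]:
    """Return the grade given by at least two of the pathologists, else None.

    `grades` holds one combined H&E and non-objectified IHC grade per
    pathologist. None marks a lesion that needs panel discussion.
    """
    grade, count = Counter(grades).most_common(1)[0]
    return grade if count >= 2 else None


print(consensus_grade(["CIN2", "CIN3", "CIN2"]))  # majority grade
print(consensus_grade(["CIN1", "CIN2", "CIN3"]))  # no majority
```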
A flow chart of the statistical analysis is shown in figure 3. To assess the accuracy of a test strategy, we calculated the proportion of test positives for all CIN grading strategies separately within each of the different Reference Standard diagnoses (ie, no dysplasia, CIN1, CIN2, CIN3 and SCC). A positive status for each strategy was obtained using the following definitions: diagnosis of ≥CIN2 based on sole H&E scoring for strategy I; diagnosis of ≥CIN3 based on sole H&E scoring for strategy II; diagnosis of ≥CIN2 based on combined H&E and non-objectified IHC scoring for strategy III; diagnosis of ≥CIN3 based on combined H&E and non-objectified IHC scoring for strategy IV; immunoscore of ≥4 based on Ki-67 and p16ink4a immunoscoring for strategy V; immunoscore of ≥5 based on Ki-67 and p16ink4a immunoscoring for strategy VI; and immunoscore of 6 based on Ki-67 and p16ink4a immunoscoring for strategy VII (table 1). We accounted for the scores of three pathologists by pooling the proportions of test positives of the individual pathologists. Absolute differences between the pooled proportions of test positives of each strategy per Reference Standard category were calculated with 95% CIs. If a CI did not include the value 0, the difference was considered statistically significant.
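The seven positivity definitions and the pooled proportion of test positives can be written out as a short Python sketch. All names here (`STRATEGIES`, `grade_at_least`, `pooled_proportion_positive`, and the dictionary keys) are hypothetical. The sketch also assumes that pooling means combining the readings of all pathologists within a Reference Standard category before taking the proportion, which equals the mean of the per-pathologist proportions when every pathologist grades the same specimens. It does not reproduce the CI calculation, whose method is not specified in the text.

```python
# Ordered diagnostic categories, from least to most severe.
GRADE_ORDER = ["no dysplasia", "CIN1", "CIN2", "CIN3", "SCC"]


def grade_at_least(grade, threshold):
    """True if `grade` is at or above `threshold` in GRADE_ORDER."""
    return GRADE_ORDER.index(grade) >= GRADE_ORDER.index(threshold)


# Positivity definition per grading strategy (I-VII). Each reading is a dict
# with keys "he", "he_ihc", "immunoscore" and "reference" (assumed layout).
STRATEGIES = {
    "I": lambda r: grade_at_least(r["he"], "CIN2"),
    "II": lambda r: grade_at_least(r["he"], "CIN3"),
    "III": lambda r: grade_at_least(r["he_ihc"], "CIN2"),
    "IV": lambda r: grade_at_least(r["he_ihc"], "CIN3"),
    "V": lambda r: r["immunoscore"] >= 4,
    "VI": lambda r: r["immunoscore"] >= 5,
    "VII": lambda r: r["immunoscore"] == 6,
}


def pooled_proportion_positive(readings, strategy, reference_category):
    """Proportion of test-positive readings within one Reference Standard category.

    `readings` has one entry per (specimen, pathologist) pair, so the readings
    of all pathologists are pooled before the proportion is taken.
    """
    subset = [r for r in readings if r["reference"] == reference_category]
    if not subset:
        raise ValueError(f"no readings for category {reference_category!r}")
    positives = sum(1 for r in subset if STRATEGIES[strategy](r))
    return positives / len(subset)


# Toy data (invented for illustration): one CIN3 specimen read by three pathologists.
toy = [
    {"he": "CIN3", "he_ihc": "CIN3", "immunoscore": 6, "reference": "CIN3"},
    {"he": "CIN2", "he_ihc": "CIN3", "immunoscore": 6, "reference": "CIN3"},
    {"he": "CIN2", "he_ihc": "CIN2", "immunoscore": 5, "reference": "CIN3"},
]
for name in STRATEGIES:
    print(name, round(pooled_proportion_positive(toy, name, "CIN3"), 3))
```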
Second, to assess the reproducibility of a test strategy, we calculated the average of absolute agreements for every CIN grading strategy (with positivity definitions of strategies I–VII, see table 1) between pathologists (P1 vs P2, P1 vs P3 and P2 vs P3) within each Reference Standard category.
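The reproducibility measure can be sketched in the same way. The function name `mean_pairwise_agreement` is hypothetical, and the sketch assumes that the absolute agreement of a pair is the fraction of specimens on which both pathologists give the same positive or negative call under a given strategy; the calls passed in are expected to be restricted to one Reference Standard category.

```python
from itertools import combinations


def mean_pairwise_agreement(calls_by_pathologist):
    """Average absolute agreement over all pathologist pairs.

    `calls_by_pathologist` maps a pathologist label to a list of booleans
    (test positive or not), aligned so that index i is the same specimen
    for every pathologist.
    """
    pair_agreements = []
    for a, b in combinations(sorted(calls_by_pathologist), 2):
        calls_a, calls_b = calls_by_pathologist[a], calls_by_pathologist[b]
        if not calls_a or len(calls_a) != len(calls_b):
            raise ValueError("call lists must be non-empty and of equal length")
        agree = sum(x == y for x, y in zip(calls_a, calls_b))
        pair_agreements.append(agree / len(calls_a))
    return sum(pair_agreements) / len(pair_agreements)


# Toy calls (invented for illustration) for four specimens.
toy_calls = {
    "P1": [True, True, False, False],
    "P2": [True, False, False, False],
    "P3": [True, True, False, True],
}
print(round(mean_pairwise_agreement(toy_calls), 3))
```

Absolute agreement does not correct for agreement expected by chance, so it should be read alongside the proportion of test positives in each category.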
Calculations were performed in SPSS (V.22) and STATA (V.14.1).
Different grades for H&E, H&E and IHC, Ki-67 and p16ink4a immunoscores, and the Reference Standard category by the three pathologists for the 115 cervical biopsy and LLETZ specimens are shown in table 2.
Overall, taking the proportion of test positives as a measurement of the accuracy, CIN grading based on the Ki-67 and p16ink4a immunoscore, with immunoscore 6 to define a positive status (CIN grading strategy VII), detected the highest numbers of Reference Standard CIN3, combined with the fewest CIN1 (figure 4, online supplementary table 1). Furthermore, CIN grading based solely on H&E grading compared with grading based on combined H&E and non-objectified interpretation of IHC (strategy I vs III, and strategy II vs IV) showed no significant differences in detection of all Reference Standard categories (online supplementary table 2).
In more detail, within Reference Standard no dysplasia and SCC, no significant differences in accuracy (online supplementary table 2) between any of the CIN grading strategies were observed, with the proportions of test positives of all grading strategies being close to 0 and 100%, respectively.
Within Reference Standard CIN1, the accuracy of CIN grading strategy I (≥CIN2 based on sole H&E scoring), strategy III (≥CIN2 based on combined H&E and non-objectified interpretation of IHC), strategy V (immunoscore ≥4) and strategy VI (immunoscore ≥5), with proportions of test positives ranging from 17.5% to 28.1%, was significantly higher than the accuracy of strategy II (≥CIN3 based on sole H&E scoring), strategy IV (≥CIN3 based on combined H&E and non-objectified interpretation of IHC) and strategy VII (immunoscore 6), with proportions of test positives ranging from 1.7% to 5.3%.
Within Reference Standard CIN2, the accuracy of CIN grading strategies I, III, V and VI, with proportions of test positives ranging from 76.5% to 90.2%, was significantly higher than the accuracy of strategy VII, with a proportion of test positives of 54.9%. Furthermore, the accuracy of CIN grading strategies II and IV was significantly lower compared with strategy VII, with proportions of test positives of 27.5% and 31.4%.
For the detection of Reference Standard CIN3, the accuracy of CIN grading strategies I, III, V, VI and VII was near 100%, with proportions of test positives ranging from 97.0% to 100%, and significantly higher than the accuracy of strategies II and IV with proportions of test positives of 81.8% and 84.8%.
Overall, grading of CIN based on the Ki-67 and p16ink4a immunoscore with a definition for positivity of immunoscore 6 (strategy VII) showed a high absolute agreement for Reference Standard CIN3 (90.9%) in combination with a high agreement for Reference Standard CIN1 (96.5%, figure 5, online supplementary table 3). Other grading strategies with a high absolute agreement for Reference Standard CIN3 showed moderate agreement for Reference Standard CIN1. All strategies showed only moderate agreement for Reference Standard CIN2.
In more detail, the absolute agreement between pathologists was high for Reference Standard no dysplasia and SCC (all ≥96.2%; figure 5 and online supplementary table 3). For Reference Standard CIN1, the absolute agreement was moderate to high (range 64.9%–96.5%) with highest agreement for CIN grading strategies IV and VII. For Reference Standard CIN2, the absolute agreement was moderate (range 37.3%–84.3%). For Reference Standard CIN3, the absolute agreement was high for CIN grading strategies I, III, V, VI and VII (range 90.9%–100%) and moderate for strategies II and IV (69.7% and 72.7%).
In this cross-sectional study, we show that a simple CIN grading system based solely on a cumulative immunoscore for the biomarkers Ki-67 and p16ink4a performs better in terms of accuracy and reproducibility than the classical histological and non-objectified IHC CIN grading system. We have added Ki-67 scores to the p16ink4a immunoscore because it is essential to identify proliferative activity in p16ink4a-positive CIN. Performance of p16ink4a scoring without other markers or H&E grading is known to increase the shift from CIN1 to CIN2, resulting in overtreatment.2 Interestingly, our additional statistical analysis shows that sole p16ink4a staining has a lower accuracy for CIN3 than combined p16ink4a and Ki-67 staining (online supplementary figure 1). By using the Ki-67 and p16ink4a immunoscore, we showed that immunoscore 6 reliably detected the highest number of CIN3, combined with the lowest proportion of CIN1 lesions. The proportion of test positives for CIN3 of immunoscore 6 (95.5%) was significantly higher than that of classical CIN grading based on sole H&E staining (81.8% for positivity definition ≥CIN3) or combined H&E with non-objectified IHC interpretation (84.8% for positivity definition ≥CIN3). This accurate detection of CIN3 and CIN1 by immunoscore 6 indicates that a substantial proportion of classically graded CIN2 will be recategorised into CIN1 and CIN3 lesions by use of this immunoscore. Because the Ki-67 and p16ink4a immunoscore defines the grade of the cervical lesion with more accuracy and better reproducibility, it facilitates clear and accurate communication about CIN lesions between pathologists and clinicians and provides a more reliable basis to decide whether cervical treatment or a wait-and-see policy is appropriate.
Decisions on clinical management are based primarily on CIN grade. However, management guidelines of CIN2 vary substantially between and within non-US countries. Presently, gynaecologists generally will treat all CIN3 lesions, but for CIN2 lesions, depending on the size of the lesion and patient’s preference and age, both treatment and a wait-and-see policy are widely advocated.21–23 The CIN grading system presented in this paper can be used as a proposal for further research to validate our results and to achieve more standardisation in CIN management.
This proposal is based on the most accurate CIN grading strategy found in this study to detect CIN3 (treatment) and CIN1 (no treatment). For clinical use we propose to first define a CIN grade (CIN1–3) based on H&E staining. Then the immunoscore of the CIN lesion should be reported. Treatment can then be guided by the immunoscore: no treatment for lesions with an immunoscore of 0–3, a wait-and-see policy for lesions with an immunoscore of 4 or 5, and excisional treatment for lesions with an immunoscore of 6. Thus, treatment is based on the most accurate and reproducible CIN grading. Especially for CIN2, of which only 55% had an immunoscore of 6, use of the immunoscore would direct clinical management objectively and separate this group into lesions requiring close follow-up and lesions requiring treatment. Additional studies with a large number of samples should validate these data and give insights on management suggestions before implementation in clinical practice can be recommended.
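The proposed thresholds can be summarised in a few lines of Python. The function name `proposed_management` and the returned labels are hypothetical. The mapping only restates the proposal made in this paper, which still awaits validation, and is not a clinical decision rule.

```python
def proposed_management(immunoscore: int) -> str:
    """Map a cumulative Ki-67 and p16ink4a immunoscore (0-6) to the proposed policy."""
    if not isinstance(immunoscore, int) or not 0 <= immunoscore <= 6:
        raise ValueError(f"immunoscore must be an integer from 0 to 6, got {immunoscore!r}")
    if immunoscore <= 3:
        return "no treatment"
    if immunoscore <= 5:
        return "wait-and-see"
    return "excisional treatment"


for score in range(7):
    print(score, proposed_management(score))
```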
In the recently published US Lower Anogenital Squamous Terminology Standardization Project for HPV-Associated Lesions (LAST) guidelines, Darragh et al extensively investigated the optimal classification strategy for the grading of genital intraepithelial neoplasia.2 They further optimised the Bethesda classification for dividing genital lesions into HSILs and low-grade squamous intraepithelial lesions. They thereby discourage classical three-tiered CIN grading and recommend p16ink4a staining only for CIN2+ lesions where the pathologist is in doubt. Positivity for p16ink4a of at least one-third of the epithelium supports the diagnosis of HSIL, and treatment of all HSILs is recommended.2 As more than half of CIN2 lesions and a substantial number of CIN3 lesions will regress,4 23–26 treatment of all HSILs results in considerable overtreatment, which has major consequences, especially for women of reproductive age, in terms of cervical morbidity and preterm delivery.27 The Ki-67 and p16ink4a immunoscore could benefit patients by reducing the current practice of overtreatment by excising or ablating all CIN2, which is known to include productive, regressive and progressive lesions.
The Ki-67 and p16ink4a scoring system has some limitations. It still involves microscopic evaluation of a biopsy with its attendant sampling error, and interpretation of immunostaining may be difficult in some cases. However, by strict definition of the different scores and addressing difficulties in scoring as described in this article, potential scoring problems have been reduced to very low levels.
To define the Reference Standard score, we used the consensus diagnosis of CIN based on the combined H&E score with non-objectified Ki-67 and p16ink4a interpretation. This was done because the use of H&E in conjunction with these IHC markers has the highest accuracy.19 28–30 To prevent the Ki-67 and p16ink4a immunoscore from influencing the CIN grade based on H&E and non-objectified Ki-67 and p16ink4a interpretation, at least 2 weeks elapsed between the scoring of Ki-67 and p16ink4a and the grading of CIN based on combined H&E and non-objectified Ki-67 and p16ink4a interpretation.
A further limitation is that expression of Ki-67 and p16ink4a represents only part of the complex process of progression of CIN to cancer. Reduction or loss of completion of the oncogenic HPV life cycle, as indicated for instance by loss or reduced expression of HPV E4 protein in CIN3, is also potentially relevant for progression of CIN and for treatment decisions, especially in CIN2.31 32 In addition, the presence of hypermethylation of promoter regions in host cell genes (ie, CADM1, MAL, miR124-2 and FAM19A4) is associated with CIN2/3 lesions with a high short-term progression risk for cervical cancer.8 33 The evaluation of HPV E4 protein staining in CIN, classified by the immunoscore system and the presence of hypermethylation of these genes, is presently under investigation. Preliminary results seem to confirm that, in HPV-positive lesions, E4 staining decreases from immunoscore 0 to 4, whereas hypermethylation starts to be present in immunoscore 4 lesions with highest values in immunoscore 6 lesions.34 Collectively, the immunoscore system seems to provide a classification system that could be useful and important in studying the role of these additional markers in improving management of CIN.
In conclusion, the grading of CIN by this simple Ki-67 and p16ink4a immunoscore system shows a higher accuracy and better reproducibility than the classical CIN grading system, especially for CIN3 (treatment) and CIN1 (no treatment). Due to the optimisation of CIN3 and CIN1 diagnoses, a division of classical CIN2 into these categories can be made. Validation in larger studies, preferably in a prospective trial in which other biomarkers such as E4 and hypermethylation of host cell genes are also taken into account, is needed. Furthermore, the Ki-67 and p16ink4a immunoscore system better reflects where the cervical lesion is situated in the biological trajectory from infection through precancer to cervical cancer. This new grading system might provide a basis for development of standardisation in the diagnosis of CIN and clinical management in women with cervical precancer.
Take home messages
Grading of cervical intraepithelial neoplasia (CIN) by a simple Ki-67 and p16ink4a immunoscore system has a higher accuracy and reproducibility compared with current CIN grading.
By use of the Ki-67 and p16ink4a immunoscore, most of classical CIN2 can be divided into more accurately graded CIN1 and CIN3.
Use of the Ki-67 and p16ink4a immunoscore for CIN grading allows better evaluation of the role of new biomarkers in the development of cervical cancer.
We gratefully acknowledge all the research staff and technicians of the Department of Pathology VU University Medical Center.
AL and WWK contributed equally.
Handling editor Cheok Soon Lee.
Contributors MZ and CJLMM have set up the trial. MZ, AL, WWK, MCGB, DJ, MvdS, DAMH, RDMS, PJFS, JB, WGVQ and CJLMM were involved in data collection. MZ and HB performed the statistical analysis. MZ managed the database. MZ, MCGB, DJ and CJLMM drafted the manuscript. All authors critically reviewed the manuscript and approved the final version. All authors had full access to all of the data in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis and believe that the manuscript represents honest work. CJLMM affirms that the manuscript is an honest, accurate and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests DAMH, PJFS, RDMS and CJLMM are minority shareholders of Self-screen, a spin-off company of VUmc, of which CJLMM is part-time director since September 2017. Self-screen holds patents related to the work (ie, hrHPV test and methylation markers for cervical cancer screening). DAMH serves occasionally on the scientific advisory board of Pfizer. PJFS has been on the speakers bureau of Roche diagnostics, Gen-Probe, Abbott, Qiagen and Seegene and has been a consultant for Crucell. JB received travel support from DDL Diagnostic Laboratory, speakers’ fees from Qiagen and consultancy fees from Roche, GlaxoSmithKline and Merck/SPMSD; all JB’s fees were collected by his employer. WGVQ is shareholder of DDL Diagnostic Laboratory. CJLMM has received speakers’ fee from Qiagen and SPMSD/Merck, served occasionally on the scientific advisory board (expert meeting) of Qiagen and SPMSD/Merck and has been by occasion consultant for Qiagen. CJLMM has a very small number of shares in Qiagen, and was minority shareholder of Diassay until April 2016. All other authors have no conflict of interest to declare.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.