Aims Experience in the use of whole slide imaging (WSI) for primary diagnosis in pathology is very limited. We aimed to determine the accuracy of interpretation of WSI compared with conventional light microscopy (CLM) in the diagnosis of routine gynaecological biopsies.
Methods All gynaecological specimens (n=452) received over a 2-month period at the Department of Pathology of the Hospital Clinic of Barcelona were analysed blindly by two gynaecological pathologists, one using CLM and the other WSI. All slides were digitised in a Ventana iScan HT (Roche diagnostics) at 200×. All discrepant diagnoses were reviewed, and a final consensus diagnosis was established. The results were evaluated by weighted κ statistics for two observers.
Results The level of interobserver agreement between WSI and CLM evaluations was almost perfect (κ value: 0.914; 95% CI 0.879 to 0.949) and increased during the study period: κ value 0.890 (95% CI 0.835 to 0.945) in the first period and 0.941 (95% CI 0.899 to 0.983) in the second period. Major discrepancies (differences in clinical management or prognosis) were observed in 9 cases (2.0%). All major discrepancies consisted of small lesions (8 high-grade squamous intraepithelial lesions of the uterine cervix, one lymph node micrometastasis of an ovarian carcinoma) underdiagnosed or missed in the WSI or the CLM evaluation. Discrepancies with no or minor clinical relevance were identified in 3.8% of the biopsies. No discrepancy was related to the poor quality of the WSI image.
Conclusions Diagnosis of gynaecological specimens by WSI is accurate and may be introduced into routine diagnosis.
- DIGITAL PATHOLOGY
- GYNAECOLOGICAL PATHOLOGY
Whole slide imaging (WSI), also referred to as virtual microscopy or digital pathology, allows digitisation of entire glass slides for the diagnosis of pathological specimens. WSI scanners create a digital slide of the tissue section which, with the use of specific software, can be viewed and magnified in real time across the web, much as with a conventional light microscope (CLM).1–3 WSI has been shown to have many practical applications including education,4–6 teleconsultation for second opinions,7–10 intraoperative frozen section consultation10 ,11 and quality assurance.12 ,13 Potential additional benefits of WSI include improvement of the workflow by eliminating the task of slide distribution and facilitating slide filing and retrieval.1–3
The rapid advances in this technology and its many potential benefits will probably result in a progressive shift from conventional to virtual microscopy in the routine diagnosis in pathology. Currently, several commercially available systems are able to digitise glass slides containing tissue sections and produce virtual slides of excellent quality. However, although WSI has been around for more than a decade, its widespread application in primary histological diagnosis still awaits validation as opposed to traditional CLM. Moreover, despite several pilot studies suggesting that WSI is as useful as CLM for diagnostic purposes,14–22 WSI-based diagnosis has yet to be integrated in routine pathological studies, with a very small number of exceptions. The use of WSI in the routine practice of pathology laboratories is still not common because of difficulties in the integration between the WSI software and the laboratory information systems (LIS), insufficient scanning speed and robustness, and limitations in storage capacity and/or excessive costs of file storage. The lack of systematic validation studies on their use in primary diagnosis is also a major concern that hampers the introduction of this technique.1 ,3
The College of American Pathologists Pathology and Laboratory Quality Center has recently published the guidelines for validating the use of WSI for diagnostic purposes.3 According to these guidelines all laboratories should conduct their own validation studies, and the validation should include at least 60 samples. However, very few large validation studies have focused on pathology subspecialties. Indeed, only one study has been published on gynaecological disease, evaluating the correlation between WSI and CLM in the assessment of the diagnoses of frozen sections of 52 ovarian lesions.11 No previous studies have evaluated the accuracy of WSI diagnosis in the routine practice of gynaecological pathology. In the present study we aimed to determine the accuracy of interpretation using WSI as compared with CLM in a series of consecutive gynaecological specimens, representative of the spectrum of specimen types and diagnoses encountered in the routine practice of a large academic department.
Materials and methods
Characteristics of the institution
This study was performed at the Department of Pathology, Hospital Clinic (Barcelona, Spain), a large academic department composed of 15 staff pathologists, 8 residents and additional fellows. There are 14 subspecialties, and most pathologists limit their practice to one or two subspecialty areas. In 2013 the Department handled 41 928 specimens with 83 619 blocks. The number of gynaecological specimens analysed during 2013 was 4909, with 16 805 blocks of gynaecological specimens and with a median number of slides per case of 1.
Sample size calculation
Based on previous reports,15 ,23 the rate of major discrepancy between the original diagnosis by CLM and that by WSI was estimated to be 3%, with a non-inferiority margin for WSI review of 4%. A one-sided binomial test was used for comparison at a significance level of 0.05, with a target power of 80%. Based on these assumptions, it was calculated that 450 cases would need to be reviewed to establish non-inferiority.
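The source does not state which formula produced the figure of 450. As a hedged illustration only, the sketch below applies one common normal-approximation formula for a non-inferiority comparison of two proportions, treating the CLM and WSI readings as two arms with the same expected 3% rate. The function name and the two-arm framing are assumptions, not the authors' documented method.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n_per_group(expected_rate, margin, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a one-sided
    non-inferiority comparison of two proportions that are assumed
    to share the same true rate (an illustrative assumption)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # one-sided significance level
    z_beta = NormalDist().inv_cdf(power)        # target power
    variance = 2 * expected_rate * (1 - expected_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)


# Inputs quoted in the text: 3% expected major discrepancy, 4% margin.
n_per_group = noninferiority_n_per_group(0.03, 0.04)
total_reads = 2 * n_per_group
print(n_per_group, total_reads)
```

Under these assumptions the formula gives 225 per arm, that is 450 in total, which is in line with the number reported. A different formula or an exact binomial calculation could give a different figure.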
Specimens included in the study
All gynaecological specimens consecutively received over a 2-month period (July–August 2013) were included in the study at the Department of Pathology of the Hospital Clinic of Barcelona (n=452). This represented 9.21% (452/4909) of the total number of gynaecological specimens evaluated in 2013: 353/452 (78.1%) specimens were evaluated in the 1st month (July) and 99/452 (21.9%) in the 2nd month (August). Table 1 shows a summary of the location and type (either biopsy or resection) of the specimens, and the number of cases included for each type. The number of blocks per case ranged from 1 to 30 (median 1, IQR 1–3). The overall number of slides scanned was 1253.
To evaluate possible changes associated with increasing experience in the use of WSI over the study period and the agreement between WSI and CLM observers, the 452 specimens were divided into two periods: one including the first 226 specimens and the second including the last 226 specimens.
The Department of Gynecology of our hospital has a very active referral Colposcopy Clinic.24 Consequently, the series included 157 biopsies or excisions of the uterine cervix from patients referred to colposcopy because of an abnormal Pap smear. In addition to evaluation in the general analysis, these specimens were evaluated independently.
The study was approved by the Hospital Clinic Institutional Ethical Review Board.
Scanning process and characteristics of the WSI display
In July and August 2013 all the slides of gynaecological pathology were scanned daily after diagnosis by light microscopy. Scanning was performed on a Ventana iScan HT (Roche Diagnostics, Sant Cugat, Spain) at a magnification of 200×. The system creates high-resolution digital images of the tissue sections. The whole scanning process runs automatically (including selection of the area of the slide that contains tissue, placing focus points, calibration, etc). In cases with step sections of a sample on a single slide the system scans all the sections. No specific quality control of the scanned slides was performed by the technicians prior to evaluation by the pathologist. The WSI produced are stored in a dedicated mass storage environment and linked to the pathology report, based on the recognition of a quick response (QR) code on the slide label. Although WSI can be accessed through the pathology LIS software (Novopath, Vitrosoft, Sevilla, Spain), for the purposes of this study the WSI were accessed through the viewer.
The images are viewed in the Virtuoso viewer (Roche), which works as a web browser and simulates a conventional microscope. The images are shown using the same structure provided by the LIS. No specific software installation is required to visualise the WSI. The images scanned can be viewed up to a real magnification of 200× and up to 400× with digital zoom, are always in focus, with optimised contrast and adjusted illumination. The viewer shows a thumbnail of the whole slide, which allows confirmation that all the material present on the glass slide has been included in the digital image and helps in the navigation through the slide. Figure 1 shows the appearance of the virtual microscope display.
The WSI were displayed on a 30” Coronis fusion MDC4130 monitor which has a resolution of four megapixels (Barco Electronic Systems, Barcelona, Spain).
WSI and CLM diagnosis
All cases were analysed blindly by two gynaecological pathologists, one using CLM and the other WSI. The pathologist doing the WSI evaluations had previously had a 1-week training course on the use of WSI. WSI were presented to the pathologist per case, together with the original clinical information in order to emulate the real clinical environment, and blinded to the original report based on CLM. For the purposes of this study only the H&E slides were evaluated.
The original CLM and the WSI-based diagnoses were compared by an independent gynaecologist, who judged the concordance of the two diagnoses as: (A) complete agreement between the original diagnosis and that determined with WSI; (B) minor discrepancy (mild differences which would not have any clinical or prognostic implications); and (C) major discrepancy (differences with clinical and/or prognostic implications for the patient).
Final gold standard diagnosis
The gold standard was considered to be the concordant diagnosis in all cases with complete agreement in both evaluations. Each case with a discrepant result was reviewed by the two pathologists involved in the study. The review was performed using CLM, and a final consensus diagnosis was established. In this final adjudication process, the H&E slides, as well as the immunohistochemical stains (when necessary), were used to achieve the final diagnosis.
Statistical analysis
The SPSS statistical programme (SPSS TM140, V.18, Chicago, Illinois, USA) was used for statistical analysis. The results for categorical variables are expressed as absolute numbers and percentages with 95% CIs. The χ2 or Fisher’s exact tests were used to compare the variables. The results were evaluated by weighted κ statistics for two raters. This measure quantifies the degree of agreement in classification over that which would be expected by chance; it equals 1 for perfect agreement and 0 for chance-level agreement, and is negative when agreement is worse than chance. Following the Landis-Koch benchmarks the strength of agreement of the κ values is: <0 poor; 0–0.20 slight; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 substantial; 0.81–1.00 almost perfect.25 For the purposes of weighted κ calculation the diagnoses were categorised from normal to cancer as 1: normal tissue or reactive lesions; 2: benign tumours; 3: low-grade premalignant lesions; 4: high-grade premalignant lesions; and 5: malignant tumours.
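To make the weighted κ computation concrete, the following is a minimal sketch for two raters using the five ordered categories described above. The analysis in this study was run in SPSS; the function below is an illustrative re-implementation, and the choice of linear weights is an assumption, since the weighting scheme is not stated. The ratings in the usage lines are invented toy values, not study data.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories=5, weights="linear"):
    """Weighted Cohen's kappa for two raters over ordered categories 1..k.

    Uses disagreement weights: |i - j| / (k - 1) for "linear",
    and the square of that distance for any other value of `weights`.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    k = n_categories
    n = len(ratings_a)

    # Cross-tabulate the two raters (rows: rater A, columns: rater B).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a - 1][b - 1] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            distance = abs(i - j) / (k - 1)
            w = distance if weights == "linear" else distance ** 2
            observed_disagreement += w * observed[i][j]
            # Count expected by chance from the marginal totals.
            expected_disagreement += w * row_totals[i] * col_totals[j] / n
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - observed_disagreement / expected_disagreement


# Toy example (hypothetical ratings): one case differs by a single category.
clm_toy = [1, 2, 3, 4, 5]
wsi_toy = [1, 2, 3, 4, 4]
print(round(weighted_kappa(clm_toy, wsi_toy), 3))
```

Because the weights grow with the distance between categories, a high-grade lesion read as normal is penalised more heavily than one read as low grade, which is why a weighted statistic suits this ordered five-level scale.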
Results
Overall, 218/452 specimens (48.2%) were evaluated as being composed of normal tissue or showing reactive lesions; 130 (28.8%) were benign tumours, 18 (4.0%) were low-grade premalignant lesions, 48 (10.6%) were high-grade premalignant lesions and 38 (8.4%) showed malignant tumours. Table 2 shows the distribution of these diagnoses for each specific site.
Agreement between WSI and CLM evaluations
Interpretations by WSI and CLM completely agreed in 94.2% of the biopsies (95% CI 91.7 to 96.0) while major discrepancies were observed in 9/452 (2.0%) and minor discrepancies were identified in 17/452 (3.8%) of the biopsies.
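The agreement percentage and its confidence interval follow directly from the counts: 452 specimens minus 9 major and 17 minor discrepancies leaves 426 concordant cases. The sketch below computes a Wilson score interval for that proportion. The source does not say which interval method SPSS was asked to use, so the Wilson method is an assumption here and other methods (for example an exact binomial interval) can give slightly different limits.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, total, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the text: 452 specimens, 9 major and 17 minor discrepancies.
concordant = 452 - 9 - 17
low, high = wilson_interval(concordant, 452)
print(f"{concordant / 452:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The same function can be applied to the cervical and non-cervical subsets (136/157 and 290/295 concordant cases), bearing in mind that the limits may differ in the last decimal from those reported if a different interval method was used.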
The final consensus diagnosis achieved after the adjudication meeting was in agreement with the CLM evaluation in 7/9 (77.8%) major discrepancies and in 11/17 (64.7%) minor discrepancies. Discrepancy in interpretations between WSI and CLM evaluations occurred in only two settings. Eight out of the nine major discrepancies (88.9%) observed in the study were related to the diagnosis of a high-grade squamous intraepithelial lesion (H-SIL) of the uterine cervix, as a low-grade squamous intraepithelial lesion (L-SIL) or as negative (four cases each, figure 2). The consensus diagnosis was in keeping with the CLM evaluation in six of eight cases and with the WSI evaluation in two of eight cases. The last case was a small lymph node metastasis of an ovarian carcinoma missed in the WSI evaluation (figure 3). In this latter case a small lipogranuloma was identified close to the small metastatic nest missed in the WSI evaluation. Thirteen out of the 17 minor discrepancies were related to overdiagnosis or underdiagnosis of L-SIL. In 10 cases the L-SIL lesions were missed in the evaluation (8 in the WSI and 2 in the CLM evaluation). Three cases were reactive changes in the uterine cervix overdiagnosed as L-SIL (three biopsies, two overdiagnosed in the WSI and one in the CLM evaluation). Two cases of endometrial polyps were missed (one case missed in the WSI evaluation) or overdiagnosed (one case, overdiagnosed in the CLM evaluation). The other two minor discrepancies were two small foci of endometriosis (one in the ovary, one in the Fallopian tube) missed in the CLM evaluation. None of the discrepancies was related to the poor quality of the WSI image or to insufficient magnification.
Overall the level of interobserver agreement between the WSI and CLM evaluations was almost perfect (κ value: 0.914; 95% CI 0.879 to 0.949).
Concordance in biopsies of the uterine cervix and in other samples
In the subset of 157 biopsies or excisions of the uterine cervix from patients referred to colposcopy because of abnormal Pap smear, complete agreement was observed between the WSI and CLM interpretations in 86.6% (95% CI 80.3 to 91.5) of the biopsies. Major discrepancies were observed in 8/157 (5.1%) and minor discrepancies in 13/157 (8.3%) of the samples. The κ value for this subset of samples was 0.832 (95% CI 0.757 to 0.906).
In the 295 gynaecological specimens representing tissues other than cervical biopsies and excisions, complete agreement between WSI and CLM was observed in 98.3% (95% CI 96.1 to 99.4) of the biopsies. Major discrepancies were observed in 1/295 (0.3%) and minor discrepancies in 4/295 (1.4%) of the samples. The κ value for this subset of samples was 0.976 (95% CI 0.950 to 1).
κ Analysis and discrepant diagnoses in the two study periods
Interobserver agreement increased during the study period: the κ value was 0.890 (95% CI 0.835 to 0.945) in the first period and 0.941 (95% CI 0.899 to 0.983) in the second period. In the first period of the study 5/226 (2.2%) major discrepancies and 12/226 (5.3%) minor discrepancies were detected. The number of major and minor discrepancies in the second period was 4/226 (1.8%) and 5/226 (2.2%), respectively. Interestingly, whereas the consensus gold standard diagnosis was in keeping with the CLM diagnosis in 14/17 (82.4%) discrepancies observed in the initial period, the consensus was in keeping with the WSI evaluation in 5/9 (55.6%) discrepancies observed in the second period (p=0.078).
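The comparison between periods can be laid out as a 2×2 table: in the first period the consensus matched CLM in 14 and WSI in 3 of 17 discrepancies, and in the second period it matched CLM in 4 and WSI in 5 of 9. The text does not say which test produced the p value for this comparison. As an illustration, the sketch below runs a two-sided Fisher's exact test (one of the two tests named in the statistical methods) on that table; the function name and the table layout are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: first and second period; columns: consensus matched CLM, matched WSI.
p_value = fisher_exact_two_sided(14, 3, 4, 5)
print(round(p_value, 3))
```

With these counts the test gives a p value of about 0.078, consistent with the value quoted above, although this does not prove that the same procedure was used.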
Discussion
The results of our study show a high concordance between WSI and CLM evaluations (over 94%) in the diagnosis of a large series of routine gynaecological specimens. The κ value, considered as a measure of the level of agreement among observers corrected for chance, was at the almost perfect level (0.914). Thus, our results confirm that WSI may safely be used for performing primary histological diagnoses of gynaecological specimens.
The results of our study are comparable with other validation studies conducted on skin,5 ,19 breast,14 ,26 prostate,17 ,27 urinary bladder,18 gastrointestinal8 ,20 and paediatric pathology,28 which show a high rate of concordance between WSI and CLM-based diagnoses. However, no previous studies have evaluated the accuracy of WSI diagnosis in the routine practice of gynaecological pathology, and neither are there any data about intraobserver or interobserver agreement in the evaluation of routine gynaecological specimens using CLM. The rate of discrepancies observed in our study is within the range of generally observed intraobserver variability in pathology.27 ,29 Interestingly, the final consensus diagnosis was in agreement with the WSI evaluation in 22.2% of the major discrepancies and in 35.3% of the minor discrepancies. None of the discrepancies was related to the poor quality of the WSI image or to insufficient magnification, but rather were mostly associated with different interpretations of difficult or borderline cases or with the presence of small lesions overlooked in the evaluation.
Eight out of the nine major discrepancies (88.9%) observed in the study were related to the diagnosis of H-SIL as L-SIL or as negative or reactive changes in the uterine cervix (four cases each). Similarly, 13 out of the 17 minor discrepancies (76.5%) were related to discrepancies in the diagnosis of L-SIL lesions versus normal/reactive cervical epithelium. Consequently, the κ value for the subset of cervical biopsies or excisions from patients referred because of abnormal Pap smear was 0.832, clearly lower than the general value. A number of studies using CLM have shown that there is substantial variation between and within observers in the interpretation of squamous intraepithelial lesions on H&E-stained tissue sections. Indeed, κ coefficients are typically found within the range of 0.45–0.50,29–36 indicating moderate agreement. Estimates for the reproducibility of histological cervical specimen interpretations performed in the course of the ASCUS/L-SIL Triage Study comparing the diagnostic results of the original clinical centre pathologists with the results from a quality control review35 showed that the reproducibility of histological interpretations of biopsy specimens was moderate (κ<0.5). In the latter study, the lack of reproducibility was substantially higher for punch biopsy specimens than loop excision procedure specimens, and variability was more prominent in the low-grade abnormalities, similar to what was observed in our study. The p16 immunohistochemical stain, strongly expressed in almost all H-SIL and not in reactive lesions,36 has recently been recommended by the College of American Pathologists to reduce interobserver variability, particularly in cases of professional disagreement.37 This technique was used in our series to achieve the final consensus diagnosis in all disagreements between CLM and WSI detected in biopsies of the uterine cervix.
The pathologist who performed the WSI evaluation had previously had very little experience in the use of WSI, although this did not severely affect the reproducibility, even in the initial period of the study. Nevertheless, a clear increase in the rate of reproducibility was observed during the study period. This suggests that minor difficulties may arise in the initial periods of using this new tool and that increasing experience with WSI increases the accuracy of the diagnosis. Moreover, whereas the consensus diagnosis reached in the discrepant cases was in keeping with the CLM diagnosis in most discrepancies observed in the initial period (82.4%), in the second period there was a tendency towards a more balanced situation and in 55.6% of the cases the gold standard diagnosis was in keeping with the WSI evaluation.
The pathologist working with WSI did not report difficulties in rendering the diagnosis at the magnification of 200× applied, indicating that a higher magnification does not seem to be relevant for most cases. This scanning strategy which is used in most validation studies,15 ,17 ,19–21 ,23 ,28 significantly saves scanning time and storage requirements. However, when WSI is routinely used for primary diagnosis it is very likely that the pathologist would require an increase in the scanning magnification in a small percentage of cases to safely establish the diagnosis. Although no formal timing was performed, the pathologist doing the WSI evaluation perceived the diagnostic process to be a little slower.
The main strength of our study is that it is the largest validation study focused on gynaecological pathology that includes a sufficient number of cases to allow robust statistical power. Indeed, only one published study has analysed the correlation between WSI and CLM in the evaluation of frozen section diagnosis of 52 ovarian lesions, showing that, as observed in our study, the correlation between CLM and WSI diagnoses is very good.11 The second strength of our study is that the participating pathologists were aware of the clinical information which could affect the diagnostic outcome.
The main limitation of this study is that intraobserver variability, which is considered the best design to confirm the reproducibility of the results obtained with two different techniques, was not evaluated.38 However, the very good interobserver reproducibility observed in our series suggests that very similar data would be obtained when cases are evaluated by the same observer.
A potential advantage of WSI is that it allows performing image analysis. This may assist in the objective evaluation of the size of the tumours and the depth of the invasion, which are relevant information for tumours of the vulva, cervix and endometrium. Eventually, this may even permit computer-assisted diagnosis that could help improve the diagnosis and decrease interobserver variability. Several legal issues have arisen from the use of WSI for primary diagnosis related to image quality, image presentation (monitor quality), storage space, adequate backup, document transfer, patient confidentiality and the confidence of the pathologist to sign out a pathology report depending on WSI. Most of these issues will probably be settled in the near future. Several digital pathology vendors are currently seeking approval from the US Food and Drug Administration to use WSI in primary diagnosis, which will definitely encourage the general use of this technique.
In conclusion, the diagnosis of gynaecological specimens using WSI shows a high concordance with the results of CLM evaluation. Our results confirm that WSI may safely be used for performing primary histological diagnosis of gynaecological specimens.
Take home messages
Interobserver agreement between whole slide imaging (WSI) and conventional light microscopy (CLM) evaluations is very good, with κ values of over 0.91.
Although the accuracy of WSI diagnosis is good even in users with limited experience, the reproducibility between WSI and CLM improves over time, indicating that increasing experience with WSI increases the accuracy of the diagnosis.
WSI may safely be used for performing primary histological diagnosis of gynaecological specimens using current scanning technology and viewing interfaces.
Scanning slides at a magnification of 200× is sufficient to achieve a correct diagnosis in most gynaecological biopsies. This scanning strategy significantly saves scanning time and storage requirements.
The authors are grateful to Rosana Millan, Margarita Mainar, Berta Coloma, Ingrid Rubio, Olga Ten, Gemma Laguna, Silvia Moya, and Arantxa Sánchez, members of the technical and administrative staff of the department of Pathology, for their support in the scanning of the slides. The authors thank Antonio Teruel, Anna Rubi, Jaume Barderi, Jose Antonio Collados and Enric Vidal, members of the staff of Roche for their technical support. The authors thank Donna Pringle for English revision of the manuscript.
Presented in part at the annual meeting of the United States and Canadian Academy of Pathology; March 2014; San Diego, California, USA.
Contributors JO: conception, design and conduct of the study, analysis and writing of the manuscript, approval of the final version of the manuscript. PC, AS, LR-C: conduct of the study, writing of the manuscript, approval of the final version of the manuscript. OO: conception, design and writing of the manuscript, approval of the final version of the manuscript. MdP: conception of the study, the statistical analysis and writing of the manuscript, approval of the final version of the manuscript. JR: design of the study, writing of the manuscript, approval of the final version of the manuscript.
Competing interests None.
Ethics approval Ethical committee of clinical research of Hospital Clinic of Barcelona.
Provenance and peer review Not commissioned; externally peer reviewed.