Aims: Receiver operating characteristic (ROC) curve analysis is a well-established method to study the accuracies of biological markers. It may, however, be suboptimal for analysing outcomes over time, such as prognosis. Here, the clinical value of time-dependent ROC curve analysis for improving the identification of high-risk patients with colon cancers and diffuse large B-cell lymphomas (DLBCL) is explored.
Methods: Using tissue microarrays, immunohistochemistry was performed on two matched sets (N = 469, each) of colon cancers (p53, CD8+ tumour infiltrating lymphocytes (TILs), mammalian sterile-like 20 kinase 1 (MST1), mucin 2 (MUC2) and urokinase plasminogen activator receptor (uPAR)) and on 208 DLBCL (Bcl2, Bcl6, CD10, FOXP1 and Ki67). The area-under-the-curve (AUC)-over-time plots, cut-off scores for tumour marker positivity and Kaplan–Meier survival curves were analysed.
Results: With the exception of uPAR, all markers were most accurate within the first 18 months following diagnosis. Expression of p53 (AUC = 0.75), uPAR (AUC = 0.64), Bcl2 (AUC = 0.58) and FOXP1 (AUC = 0.68) was linked to more aggressive tumours, while TILs (AUC = 0.38), MST1 (AUC = 0.39), MUC2 (AUC = 0.38), Bcl6 (AUC = 0.4), CD10 (AUC = 0.49) and Ki67 (AUC = 0.41) were predictive of improved survival. Cut-off scores for markers at their peak accuracies as well as survival time differences were reproducible between colon cancer groups. Only FOXP1 at its optimal cut-off of 60% had significant effects on survival in DLBCL (p = 0.019).
Conclusions: Time-dependent ROC curve analysis is a novel tool for identifying potential immunohistochemical prognostic markers across varying follow-up times. Use of this tool could facilitate the identification of high-risk patients not only with colon cancer and DLBCL but with a range of other tumour types.
Immunohistochemistry (IHC) is an established part of the surgical pathologist’s daily routine known to yield reliable qualitative diagnostic information. It has also become an indispensable research tool and has contributed to the identification of proteins which are current therapeutic targets.1 2 3 In prognostic studies, however, the clinical value of IHC is largely dependent on the scoring methodology used for assessment. Often, tumours are scored as positive or negative around a predetermined cut-off score dependent on the number of immunoreactive tumour cells, for example 10% for p53.4 5 These arbitrarily chosen cut-off scores, however, can vary significantly between both studies and observers, thus rendering the true prognostic value of a protein difficult to ascertain.6 7 8 Staining intensity is also commonly used as a measure of positivity but can be affected by storage time and laboratory processing techniques. Such intensity scores, in many instances, yield only low to moderate interobserver agreement between independent pathologists.9 10 An alternative approach would be to evaluate the proportion of immunoreactive tumour cells over the total number of tumour cells resulting in a percentage of positivity. This scoring method leads to acceptable interobserver agreement and has the considerable advantage of allowing more clinically relevant cut-off scores for positivity to be identified due to the quantitative nature of such scores.11
Receiver operating characteristic (ROC) curve analysis is an established statistical tool in many areas of medical practice and is used to evaluate the sensitivity, specificity and diagnostic utility of quantitative tests when the endpoint of interest has two possible outcomes.12 13 14 This method has at least two direct applications to IHC. By evaluating the sensitivity and specificity of a tumour marker at every possible IHC score, a cut-off value for positivity can be selected from the ROC curve such that it optimally discriminates between patients with or without an outcome of interest.15 Such a systematic approach to selecting cut-off scores increases the probability of observing clinically relevant findings in subsequent analyses. This method has recently been used for immunohistochemical markers of tumour progression in colorectal cancer, lymphoma and urothelial bladder cancer.16 17 18 19 20 ROC curve analysis can also be applied to evaluate and compare the diagnostic or prognostic accuracies of protein markers. A marker capable of perfectly discriminating between all patients with or without an outcome will have an area under the curve (AUC) of 1.0.14 A protein with no predictive ability will have an AUC of 0.5 and resemble a line bisecting the ROC plot. The greater the AUC, the greater the capacity of the marker to correctly classify patients. This aspect of ROC curve analysis has been used in medicine to test the performance of standard versus novel serum markers capable of discriminating between different diagnoses, predicting disease recurrence or classifying patients into prognostic subgroups.21 22 Often, however, prognosis is not characterised by the binary outcome “deceased” or “living” but rather as time to death, or survival time. A comparison of the prognostic accuracies of different markers using a standard ROC curve approach does not account for this time-dependent variable and may thus prove incomplete or even misleading.
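The interpretation of the AUC described above can be illustrated with a minimal sketch. It relies on the standard equivalence between the AUC and the probability that a randomly chosen patient with the outcome has a higher marker score than a randomly chosen patient without it (ties counted as one half). The function name `auc` and the percentage scores are hypothetical and serve only as an illustration; they are not data from this study.

```python
def auc(scores_event, scores_no_event):
    """AUC as the fraction of (event, no-event) pairs in which the
    patient with the event has the higher score; ties count as 0.5."""
    wins = 0.0
    for s_e in scores_event:
        for s_n in scores_no_event:
            if s_e > s_n:
                wins += 1.0
            elif s_e == s_n:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))


# Hypothetical percentage-positivity scores (illustration only)
deceased = [80, 60, 70, 90]
alive = [10, 20, 60, 30]

example_auc = auc(deceased, alive)   # close to 1: good discrimination
no_overlap = auc([1, 2], [3, 4])     # 0.0: higher scores mark survivors
uninformative = auc([5], [5])        # 0.5: no discriminatory ability
```

An AUC below 0.5, as in the second call, corresponds to a marker whose higher values accompany the absence of the outcome.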
Recently, an extension of ROC curve analysis was established to handle time-to-event outcomes, such as prognosis.23 This extension has so far been used to study gene-expression profiles in breast and lung cancer as well as the changes in prostate specific antigen (PSA) over time in patients with prostate cancer.24 25 26 This time-dependent ROC curve approach does not appear to have been previously used in the context of IHC but may have significant implications for the selection of protein markers for future prognostic studies. In this study, we explore the clinical utility and possible benefits of time-dependent ROC curve analysis for determining the prognostic accuracies of IHC markers. To establish this proof of concept, five prognostic protein markers, namely p53, CD8 for detecting cytotoxic tumour infiltrating lymphocytes (TILs), mammalian sterile-like 20 kinase 1 (MST1), mucin 2 (MUC2) and urokinase plasminogen activator receptor (uPAR), were evaluated on 938 cases of colon cancer randomised into two matched test groups. The method was then applied to 208 cases of diffuse large B-cell lymphoma (DLBCL) for the markers Bcl2, Bcl6, CD10, FOXP1 and Ki67.
Patients and clinicopathological features
Tissues from 1420 primary colorectal cancers were included on a previously described tissue microarray.27 Only tumours located in the colon (N = 938) were entered into the analysis. The clinicopathological information for these patients is summarised in table 1 and included sex, age, tumour diameter, T classification (pT) and N classification (pN) stage, tumour grade, vascular invasion, mismatch-repair status and survival time. Censored observations involved patients who were alive at last follow-up or who died from causes other than colon cancer. In addition, information on the presence or absence of local recurrence or distant metastasis was available in 361 and 371 cases, respectively, and on adjuvant therapy in 362 patients, of which 21% (N = 76) received postoperative therapy.
Previously described tissue microarrays of 208 clinically well-characterised DLBCL were used for the study.28 29 The clinicopathological information for these patients included sex, age, tumour stage, the international prognostic index (IPI), presence or absence of B symptoms, therapy, death causes and survival time (table 2).
In colon cancer cases, IHC was performed for p53, CD8+ TILs, MST1, MUC2 and uPAR, while in DLBCL cases, IHC was performed for Bcl2, Bcl6, CD10, FOXP1 and Ki67, as described previously for both entities.19 28 Intraepithelial CD8+ TILs located in direct contact with tumour cells were quantified over the area of the entire punch for each case in the tissue microarray. The remaining protein markers were scored by evaluating the proportion of positive tumour cells over the total number of tumour cells, resulting in a percentage of positivity from 0% to 100%. Staining intensity was not assessed.10
Randomisation of colon cancer cases
Colon cancers were randomised into two test groups of N = 469 patients each, matched on gender, pT, pN, tumour grade, vascular invasion and survival time. The 5-year survival rate was 57.4% (95% CI 52 to 62%) in test group 1 and 57.7% (95% CI 52 to 62%) in test group 2. Because there were relatively few cases of DLBCL, these tumours were not randomised.
A time-dependent ROC curve analysis was performed with R software, version 2.7.1 (The R Foundation for Statistical Computing 2007, http://www.r-project.org/) and with the “survivalROC” package (http://faculty.washington.edu/heagerty/Software/SurvROC/). In a first step, the prognostic accuracies of all markers were evaluated by plotting the cumulative AUC over time curve. From the curve, the time point with the greatest accuracy for predicting survival was then identified, and the 95% CI for the AUC at that time point was obtained from 200 bootstrap replications of the data. In a second step, the ROC curve for the marker at the time of greatest accuracy was plotted and used to identify the “optimal” immunohistochemical cut-off score. For AUC values >0.5, the optimal cut-off score was selected by identifying the point on the curve with the shortest distance to the point (0,1), or the upper left-hand corner of the ROC curve plot. In contrast, for AUC values <0.5, the optimal cut-off was selected by identifying the point on the curve with the shortest distance to the lower right-hand corner of the ROC curve plot (http://en.wikipedia.org/wiki/Receiver_operating_characteristic). In a third step, Kaplan–Meier survival curves for negative and positive marker expression were analysed by the logrank test. A multivariable survival analysis was carried out by Cox regression analysis, after verification of the proportional hazards assumption. For colon cancer, the independent variables for multiple regression included the marker itself, pT stage, pN stage, tumour grade, vascular invasion, age and mismatch-repair status, while for DLBCL, variables included the marker itself and IPI or stage. Survival analyses were carried out with SAS (Version 9.1, The SAS Institute, Cary, North Carolina).
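The first step described above, an AUC at a given follow-up time with a bootstrap confidence interval, can be sketched as follows. This is a deliberately simplified illustration and not the estimator used in the study: patients censored before the time horizon are simply dropped, whereas the survivalROC package offers Kaplan–Meier and nearest-neighbour estimators that handle censoring more carefully. The percentile method for the interval is also an assumption, since the text does not state how the interval was derived from the 200 replications. All names (`auc_at_time`, `bootstrap_ci`) and all data values are hypothetical.

```python
import random


def auc_at_time(markers, times, events, horizon):
    """Naive AUC at `horizon` (months).
    Cases: event observed at or before the horizon.
    Controls: still under follow-up beyond the horizon.
    Patients censored before the horizon are excluded (simplification)."""
    cases = [m for m, t, e in zip(markers, times, events)
             if e == 1 and t <= horizon]
    controls = [m for m, t in zip(markers, times) if t > horizon]
    if not cases or not controls:
        return None  # AUC undefined without both groups
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_ci(markers, times, events, horizon, n_boot=200, seed=0):
    """Percentile 95% interval from patient-level resampling (assumed method)."""
    rng = random.Random(seed)
    n = len(markers)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        a = auc_at_time([markers[i] for i in idx],
                        [times[i] for i in idx],
                        [events[i] for i in idx], horizon)
        if a is not None:
            stats.append(a)
    if not stats:
        return None
    stats.sort()
    lower = stats[int(0.025 * (len(stats) - 1))]
    upper = stats[int(0.975 * (len(stats) - 1))]
    return lower, upper


# Hypothetical data: marker score (%), follow-up (months), event (1 = died)
markers = [90, 80, 70, 75, 10, 30, 60, 40]
times = [3, 5, 8, 30, 40, 25, 12, 6]
events = [1, 1, 1, 0, 0, 1, 1, 0]

auc_12 = auc_at_time(markers, times, events, horizon=12)
ci_12 = bootstrap_ci(markers, times, events, horizon=12)
```

Repeating `auc_at_time` over a grid of horizons would give the points of an AUC-over-time plot, from which the time of greatest accuracy can be read off.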
The prognostic accuracy of p53 was greatest within 7 months (AUC = 0.75) and 4 months (AUC = 0.74) after surgery in test groups 1 and 2, respectively (fig 1A,B). The cut-off score for p53 positivity was determined to be 15% (fig 1C,D) but did not lead to significant differences in survival time when negative (⩽15%) and positive (>15%) tumours were analysed (fig 1E,F) in either univariate or multivariable analysis.
In test groups 1 and 2, CD8+ TILs demonstrated the greatest prognostic value within the first 7 months (AUC = 0.42) and 17 months (AUC = 0.38), respectively (fig 2A,B). Cut-off values of ⩽2 TILs/high-power field (HPF) (fig 2C,D) resulted in reproducible and significant prognostic differences between negative and positive patients in univariate analysis. These results were not maintained after adjusting for the effect of mismatch-repair status in multivariable analysis (fig 2E,F).
Maximum AUC values of 0.34 and 0.39 were found for the marker MST1 (fig 3A,B). The cut-off score of 70% was reproduced in both test groups (fig 3C,D) and led to considerable survival time differences, with MST1 positivity resulting in favourable prognosis compared with MST1 negativity (fig 3E,F). Although MST1 was found to be an independent prognostic factor after adjusting for pT, pN, grade and vascular invasion, it did not maintain this independence after including mismatch-repair status in multivariable analysis.
The prognostic value of MUC2 was the greatest within the first 5 to 7 months (AUC = 0.44) and 13 months (AUC = 0.38) in test groups 1 and 2, respectively (fig 4A,B), and an identical cut-off score of 5% was reproduced for both (fig 4C,D). Patients with negative MUC2 staining demonstrated a significantly more adverse survival compared with patients with positive MUC2 (fig 4C–F). MUC2 was found to be a prognostic factor independent of pT and pN stage, tumour grade and vascular invasion but not after adjusting for mismatch-repair status.
Prognostic accuracies of 0.59 and 0.64 were found for uPAR in test groups 1 and 2 (fig 5A,B). A cut-off score of 75% was obtained in both groups and led to significant survival time differences between positive and negative tumours (fig 5C–F). Although uPAR was an independent prognostic factor in test group 1 (HR = 1.42 (1.1 to 2.0)), this effect was not reproduced in test group 2.
The prognostic accuracy of all markers of DLBCL was the strongest within the first 9 months after diagnosis (fig 6A–E). FOXP1 was found to have the strongest prognostic ability (AUC = 0.68) followed by Bcl6 (AUC = 0.6), Ki67 (AUC = 0.59) and Bcl2 (AUC = 0.58). Even at the time of greatest accuracy, the AUC for CD10 was 0.49, indicating no discriminatory ability of the marker for survival. Cut-off scores for each marker were obtained (fig 6F–J) and subsequently applied for Kaplan–Meier survival curve analysis (fig 6K–O). Only FOXP1 demonstrated significant differences in prognosis, with patients having positive tumours experiencing an unfavourable outcome compared with patients with negative FOXP1. Expression of FOXP1 was found to have a significant independent effect on prognosis (p = 0.037, HR = 1.8 (1.1 to 3.2)) in multivariable analysis with tumour stage but not after adjusting for IPI (p = 0.099, HR = 1.7 (0.9 to 3.2)).
A summary of prognostic marker accuracies, cut-off scores along with their sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and survival analysis can be found in table 3 for all markers.
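For readers who wish to reproduce the kind of summary measures listed above, the following minimal sketch shows how sensitivity, specificity, PPV and NPV follow from a chosen cut-off score. Positivity is defined as a score above the cut-off, in line with the convention used for p53 (negative ⩽15%, positive >15%). The function name `classification_metrics` and the scores are hypothetical; they are not values from table 3.

```python
def classification_metrics(scores, died, cutoff):
    """Cross-tabulate marker positivity (score > cutoff) against outcome."""
    pairs = list(zip(scores, died))
    tp = sum(1 for s, d in pairs if s > cutoff and d)        # positive, died
    fp = sum(1 for s, d in pairs if s > cutoff and not d)    # positive, alive
    fn = sum(1 for s, d in pairs if s <= cutoff and d)       # negative, died
    tn = sum(1 for s, d in pairs if s <= cutoff and not d)   # negative, alive

    def ratio(num, den):
        return num / den if den else None  # undefined when a margin is empty

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical percentage scores and outcomes (1 = died), illustration only
scores = [80, 60, 20, 10, 70, 5]
died = [1, 1, 0, 0, 0, 1]

metrics = classification_metrics(scores, died, cutoff=15)
```

Note that this cross-tabulation ignores follow-up time; in a time-dependent setting the outcome column would be defined at the follow-up time of interest.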
Our results clearly outline the dynamic prognostic accuracies of immunohistochemical biological markers. This temporal heterogeneity can be exploited to identify the time interval after diagnosis over which a marker can be optimally utilised for prognostic studies. The time-dependent method proposed and validated here in two different disease entities and in two randomised groups of colon cancers can identify protein markers which could be considered candidates for future prognostic studies.
This methodology has several advantages over standard ROC curve approaches, the first and foremost being the visualisation of the changes in discriminatory power of the marker from the time of diagnosis over various follow-up times. This procedure is summarised in fig 7A. The first possibility is that marker accuracy over time is maintained around 0.5 and therefore has no particular discriminatory ability. A second option is that increasing protein marker expression is related to poorer prognosis, and so AUC values are >0.5 (marker A). The cut-off score is chosen by plotting the standard ROC curve at the time where the marker has the greatest prognostic impact and identifying the point on the curve with the shortest distance to the upper left-hand corner of the graph (fig 7B).15 If one were to classify tumours as positive or negative using this cut-off score, one would expect positivity to occur more frequently in patients who have died over this time interval. This scenario is exemplified by uPAR in colorectal cancer where the positivity of the protein was associated with adverse survival. The third option is that increasing protein expression of the marker is related to favourable prognosis and thus has AUC values <0.5 (marker B). For marker B, the optimal cut-off score is found by selecting the point on the ROC curve with the shortest distance to the lower right-hand corner (panel C). Positivity is therefore expected to occur more frequently in patients who have survived. For these markers of favourable outcome such as CD8+ TILs, MUC2 or MST1, one may be more interested in visualising the effects of negative expression and its adverse effect on prognosis. In this case, the ROC curve can easily be inverted. The orientation of the ROC curves is simply dependent on the target outcome, and most statistical packages will either perform this inversion automatically or allow the user to choose the outcome of interest.
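The cut-off selection for markers A and B can be sketched as follows. The code evaluates every observed score as a candidate cut-off, computes the corresponding point on the ROC curve, and returns the cut-off closest to the upper left-hand corner (0,1) for a marker of adverse outcome or to the lower right-hand corner (1,0) for a marker of favourable outcome. The function names, the `adverse` flag and the data are hypothetical, and ties in distance are resolved here by taking the lowest cut-off, which is an arbitrary choice.

```python
import math


def roc_points(scores, died):
    """Return (cutoff, false positive rate, true positive rate) for every
    observed score used as a cut-off; positive means score > cutoff."""
    n_died = sum(1 for d in died if d)
    n_alive = len(died) - n_died
    points = []
    for c in sorted(set(scores)):
        tpr = sum(1 for s, d in zip(scores, died) if d and s > c) / n_died
        fpr = sum(1 for s, d in zip(scores, died) if not d and s > c) / n_alive
        points.append((c, fpr, tpr))
    return points


def optimal_cutoff(scores, died, adverse=True):
    """Cut-off whose ROC point lies nearest the relevant corner:
    (0, 1) when AUC > 0.5 (adverse marker), (1, 0) when AUC < 0.5."""
    corner = (0.0, 1.0) if adverse else (1.0, 0.0)
    best = min(roc_points(scores, died),
               key=lambda p: math.hypot(p[1] - corner[0], p[2] - corner[1]))
    return best[0]


# Hypothetical percentage scores, illustration only
scores = [10, 20, 30, 40, 50, 60, 70, 80]
died_a = [0, 0, 0, 0, 1, 0, 1, 1]   # marker A: high scores among the deceased
died_b = [1, 1, 0, 1, 0, 0, 0, 0]   # marker B: high scores among survivors

cutoff_a = optimal_cutoff(scores, died_a, adverse=True)
cutoff_b = optimal_cutoff(scores, died_b, adverse=False)
```

For marker B, the same cut-off would be obtained by swapping the target outcome and searching towards the upper left-hand corner, which is the inversion described in the text.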
The example of p53 highlights the usefulness of time-dependent ROC curve analysis. Although p53 was found to be the most accurate colon cancer marker in this study, the Kaplan–Meier survival curves showed no significant differences between p53 negative and positive tumours. This result can be explained by focusing on the changes in accuracy of this marker over time. The AUC-over-time plot shows a highly significant drop in AUC values immediately following the peak time points in both test groups. At 2-year, 5-year and even 10-year follow-up times, p53 undergoes a complete loss of discriminatory ability, a loss which is reflected in the Kaplan–Meier survival curve of both matched groups and occurred despite having selected the optimal cut-off score for this marker. The impact of p53 on clinical outcome may therefore only be relevant at earlier follow-up times, which could help to explain the discrepant findings between study groups investigating p53 IHC and 5-year survival.30 31 Moreover, plotting the Kaplan–Meier curve for p53 for the first year follow-up only leads to significant differences in survival time with patients having positive tumours demonstrating a worse prognosis compared with patients with p53-negative tumours. In fact, with the exception of uPAR, the prognostic accuracy of the remaining colon cancer markers was also the greatest within the first 18 months after surgery, suggesting that they are more reliable when used as prognostic markers within this time frame.
The prognostic findings using this time-dependent approach are in agreement with the established biological function of these markers in the literature. p53 gene expression has been linked to worse survival, while positivity for CD8+ TILs, MST1 and MUC2 is associated with less aggressive tumour phenotypes and improved survival.19 31 32 33 34 In our study, uPAR was found to be the only colon cancer marker whose accuracy was improved or at least maintained at later time points. Since uPA and uPAR have been implicated in the initiation of metastasis, this finding could be related to the time necessary for the seeding of disseminated tumour cells from the primary tumour and the formation of metastases at distant secondary sites.35 36 37 These results suggest that immunohistochemical assessment of uPAR has significant clinical relevance and may be useful in selecting patients who would benefit from close follow-up between 2 and 5 years following surgical resection.
In DLBCL, all markers were found to be most accurate within the first year following diagnosis. Time-dependent ROC curve analysis also allowed us to identify FOXP1 as a significant adverse prognostic factor. The FOXP1 forkhead transcription factor maps to a solid tumour suppressor locus on 3p14.1, and nuclear protein expression is commonly lost in epithelial malignancies.38 Recent reports have indicated that FOXP1 expression is associated with a subset of DLBCL with a non-germinal centre origin and that recurrent chromosome translocations targeting the FOXP1 locus are linked to increased expression of this protein in some DLBCL.39 40 41 FOXP1 expression in DLBCL is in turn associated with poorer survival time.42 43 Here, we document the role of FOXP1 as a strong indicator of a more aggressive tumour phenotype which may have significant clinical value for identifying high-risk DLBCL patients within the first year following diagnosis.
Although our findings clearly underline the relevance of adding time to prognostic marker studies, it is less clear how this addition affects the selection of cut-off scores to define tumour positivity. Using standard ROC analysis, we previously reported cut-off scores of 60–75% for uPAR, 60–80% for MST1, fewer than five CD8+ TILs per HPF and less than 5% staining for MUC2 for various clinical endpoints. These results are almost identical to the cut-off scores obtained here with time-dependent ROC curve analysis. Although the reason for this is unclear at present, the preliminary results presented here indicate that a standard ROC curve approach may be sufficient to obtain cut-off scores for immunohistochemical prognostic markers. To clarify this, further optimisation of this process will be required, including consideration of the choice of antigen, laboratory protocols and interobserver agreement of scores from independent pathologists.
Time-dependent ROC curve analysis may increase the identification of prognostically relevant immunohistochemical markers. In addition to the advantages of standard ROC curve analysis, this time-dependent approach provides information on the time interval over which the marker is most reliable, how clinically relevant it is and which cut-off scores should be used to discriminate between better or worse survival. Most importantly, this method could reveal prognostic markers that facilitate the identification of high-risk patients not only with colon cancer and DLBCL but also eventually with other tumour types.
The prognostic accuracy of immunohistochemical protein markers varies significantly with follow-up time.
The prognostic accuracy of different protein markers can be compared by time-dependent ROC curve analysis using a graphical display of the area under the curve (AUC) over follow-up time.
Using this approach, the time interval over which a marker can provide optimal prognostic information can be easily determined and employed to identify patients with adverse prognosis.
Time-dependent ROC curve analysis makes it possible to identify optimal cut-off scores for immunohistochemical prognostic marker positivity.
The novel application of time-dependent ROC curve analysis to identify potential immunohistochemical prognostic markers could have a significant impact on the identification of high-risk patients with poor clinical outcome.
The authors would like to thank Y Zheng, Fred Hutchinson Cancer Research Center, for valuable help with statistical programming, and K Baker, for revision and editing of the manuscript.
Funding This project was supported by the Novartis Foundation, formerly Ciba-Geigy-Jubilee-Foundation (IZ). The sponsor has had no role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.
Competing interests None.
Ethics approval Ethics approval was provided by University Hospital of Basel, Basel, Switzerland.