Aims Morphological differentiation among different blast cell lineages is a difficult task and there is a lack of automated analysers able to recognise these abnormal cells. This study aims to develop a machine learning approach to predict the diagnosis of acute leukaemia using peripheral blood (PB) images.
Methods A set of 442 smears was analysed from 206 patients. It was split into a training set with 75% of these smears and a testing set with the remaining 25%. Colour clustering and mathematical morphology were used to segment cell images, which allowed the extraction of 2,867 geometric, colour and texture features. Several classification techniques were studied to obtain the most accurate classification method. Afterwards, the classifier was assessed with the images of the testing set. The final strategy was to predict the patient’s diagnosis using the PB smear, and the final assessment was done with the cell images of the smears of the testing set.
Results The highest classification accuracy was achieved with the selection of 700 features with linear discriminant analysis. The overall classification accuracy for the six groups of cell types was 85.8%, while the overall classification accuracy for individual smears was 94% as compared with the true confirmed diagnosis.
Conclusions The proposed method achieves a high diagnostic precision in the recognition of different types of blast cells among other mononuclear cells circulating in blood. It is the first encouraging step towards the idea of being a diagnostic support tool in the future.
- image analysis
- automatic classification
- morphological analysis
- peripheral blood
Acute leukaemias (AL) are among the most common neoplastic diseases affecting both adults and children. The starting point for clinical pathologists in the diagnosis of most AL is the detection of blast cells in peripheral blood (PB). For this reason, morphology is still important, since a reliable and quick diagnosis can be made using basic haematological parameters combined with microscopic investigation.
Automated analysers based on digital image analysis, which pre-classify normal PB cells, are used in laboratories. However, the morphological discrimination of abnormal PB cells by these devices is still an unresolved problem, and some authors have reported accuracies in blast classification of around 76%.1–3 Moreover, these analysers cannot discriminate between myeloid and lymphoid blast lineages, which is essential for choosing the suitable initial treatment.
Multiple efforts have been made to develop image-based automated methods for classifying blasts, some of them proposing the use of machine learning techniques.4–16 Most of these previous papers focused on the automatic recognition of myeloblasts vs. lymphoblasts,9 16 lymphoblasts vs. lymphocytes,5 10 11 and lymphoblasts or myeloblasts vs. leukocytes.7 12 13 15 Nevertheless, automatic distinction between blasts and reactive lymphocytes can also be difficult since both share some morphological similarities, as highlighted in some external quality control surveys.17 18 This is why a previous paper by the authors19 included, for the first time, reactive lymphocytes for automatic recognition alongside myeloblasts and lymphoblasts.
The objective of this paper is to develop a machine learning system to predict the diagnosis of acute leukaemia using PB images. The system input is a set of cell images of an individual smear, and the output is the prediction of one of the following diagnoses: myeloid leukaemia, promyelocytic leukaemia, lymphoid leukaemia or infection. It will be achieved through the automatic recognition of different blasts (myeloid and lymphoid origin) and pathological promyelocytes, reactive lymphocytes and other normal mononuclear cells, such as lymphocytes and monocytes.
Such a high number of groups has not been jointly addressed before. This adds more complexity to the challenging automatic classification of blasts because of the overlapping morphological characteristics they exhibit.20–22 A particular major challenge is the detection of pathological promyelocytes, since affected patients may suffer serious haemorrhagic events and die if treatment is not initiated promptly.23 24
Materials and methods
This study was performed following two stages: (1) classifier development, and (2) diagnostic system design and assessment.
Blood samples, collected in EDTA, were automatically prepared using the slide maker–stainer SP1000i (Sysmex, Kobe, Japan) and stained with May Grünwald-Giemsa. Digital images of PB cells were acquired by the CellaVision®DM96 (CellaVision, Lund, Sweden) (363×360 pixels) from the routine workload of the Core Laboratory of the Hospital Clinic of Barcelona. PB cell images were identified by pathologists (AMe and AMo) according to their morphological characteristics. Patients’ diagnoses were confirmed by the integration of all supplementary test results: flow cytometry, cytogenetics and molecular biology. Patients with AL were diagnosed by the Clinic Haematology Service of the Hospital following the WHO 2016 classification.25 Table 1 shows the number of images and patients with AL and infections included in training and testing sets.
A total of 442 PB smears from 206 patients were analysed: 86 corresponding to outpatients (not admitted to the hospital) without any haematological disease, in which cell blood count and smear review were normal, 53 with viral or other infections and 67 with AL: myeloid (18), promyelocytic (9), monocytic (17) and lymphoid (23). From each smear, pathologists selected those cell images corresponding to the confirmed diagnosis, obtaining a total of 7,468 images to work with. The training set was arranged with 75% of these smears (332 with 5,493 images), distributed in seven cell groups: lymphocytes (n=844) and monocytes (n=656) as the control group, reactive lymphocytes (n=944), myeloblasts (n=858), monoblasts (n=657), pathological promyelocytes (n=597) and B-lymphoblasts (n=937).
The testing set included the remaining 25% of the smears (110 with 1,975 images), distributed as follows: lymphocytes (n=245), monocytes (n=227), reactive lymphocytes (n=264), myeloblasts (n=337), monoblasts (n=357), pathological promyelocytes (n=157) and B-lymphoblasts (n=388).
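The split described above is made on smears rather than on individual images, so that no smear contributes images to both sets. A minimal Python sketch of such a grouped split follows; the function name `split_by_smear`, the `smear_id` tags, the random seed and the toy data are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict


def split_by_smear(image_smear_ids, train_fraction=0.75, seed=0):
    """Split image indices so that no smear contributes to both sets.

    image_smear_ids: one smear identifier per image (hypothetical input).
    Returns (train_indices, test_indices).
    """
    # Group image indices by the smear they come from.
    by_smear = defaultdict(list)
    for index, smear_id in enumerate(image_smear_ids):
        by_smear[smear_id].append(index)

    # Shuffle the smears (not the images) reproducibly.
    smears = sorted(by_smear)
    random.Random(seed).shuffle(smears)

    n_train = round(train_fraction * len(smears))
    train_smears = set(smears[:n_train])

    train, test = [], []
    for smear_id, indices in by_smear.items():
        (train if smear_id in train_smears else test).extend(indices)
    return train, test


# Toy example: 8 smears with 3 images each.
smear_ids = [f"smear_{i // 3}" for i in range(24)]
train_idx, test_idx = split_by_smear(smear_ids)
```

Because whole smears are assigned to one side, the proportion of images in each set only approximates the requested fraction when smears contain different numbers of images.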
Stage 1: classifier development
Image-based recognition systems include image segmentation, feature extraction/selection and classification (see figure 1). In this work, we used an automatic segmentation procedure recently published by our group. This algorithm uses the image colour information through fuzzy clustering of different colour components and the application of the watershed transformation with markers. A more detailed explanation can be found in Alférez et al.26 This segmentation allowed us to obtain three regions of interest (ROIs): the nucleus, the whole cell and the peripheral zone around the cell. A fourth ROI was obtained for the cytoplasm by the difference between the cell and the nucleus.
From each ROI, quantitative features were calculated for the purpose of morphological analysis and further classification. In this study, 28 geometric and 2,839 colour and texture features were calculated for each segmented cell. Geometric features give quantitative geometric interpretations of cell and nucleus shapes, whose definitions can be found in Alférez et al.27.
Colour is a physical property very helpful in characterising blood cells. An image is separated into three colour components: red, green and blue. To explore the rich amount of colour information in malignant blood cells, more colour spaces were used. Each colour component results in an individual greyscale image, which contains shades of grey, varying from black at the weakest intensity (0) to white at the strongest (255).28
Colour features are calculated from the histogram of a greyscale image. This plot displays the number of pixels corresponding to different intensity values. For each histogram, six statistical features were calculated: mean, standard deviation, skewness, kurtosis, energy and entropy.27 29
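As an illustration, the six histogram statistics can be sketched in Python as follows. The formulas are the usual first-order definitions computed from the normalised histogram; the exact definitions used in the study are those of the cited references and may differ in detail (for example, in whether kurtosis is reported with 3 subtracted). The function name and the toy pixel list are assumptions made for this sketch.

```python
import math


def histogram_features(pixels, levels=256):
    """First-order statistics of a greyscale region given as a flat list of ints."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    total = len(pixels)
    p = [c / total for c in counts]  # normalised histogram

    mean = sum(i * p_i for i, p_i in enumerate(p))
    variance = sum((i - mean) ** 2 * p_i for i, p_i in enumerate(p))
    std = math.sqrt(variance)
    if std > 0:
        skewness = sum((i - mean) ** 3 * p_i for i, p_i in enumerate(p)) / std ** 3
        # Non-excess kurtosis (a normal distribution gives 3); an assumption here.
        kurtosis = sum((i - mean) ** 4 * p_i for i, p_i in enumerate(p)) / std ** 4
    else:
        skewness = kurtosis = 0.0  # constant region: standardised moments undefined
    energy = sum(p_i ** 2 for p_i in p)
    entropy = -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)
    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "energy": energy,
        "entropy": entropy,
    }


# Toy region: half of the pixels black, half white.
features = histogram_features([0, 0, 255, 255])
```

In practice the pixel list would hold the values of one colour component inside one ROI, so the same function would be applied once per component and region.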
Texture in digital images is identified by uniformity, density, pixel tone and their spatial relationships, among others.30 31 Statistical measures are calculated from the grey level co-occurrence matrix (GLCM), which records how frequently pairs of neighbouring pixels with given intensity values occur together in the image.32–34 Examples of these statistical parameters are correlation, cluster shade and maximum probability. In this study, we also considered more mathematical approaches to characterise texture, whose explanation can be found in Alférez et al.27 35 36
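A minimal Python sketch of a GLCM and of the correlation statistic derived from it is given below. It assumes a single horizontal offset of one pixel, a symmetric matrix and a small number of grey levels; these choices, the function names and the two toy images are assumptions for illustration and are not taken from the study.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric grey level co-occurrence matrix for one offset."""
    matrix = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                matrix[a][b] += 1
                matrix[b][a] += 1  # count each pair in both directions
    total = sum(map(sum, matrix))
    return [[v / total for v in row] for row in matrix]


def glcm_correlation(p):
    """Correlation between the grey levels of neighbouring pixels."""
    n = range(len(p))
    mu_i = sum(i * p[i][j] for i in n for j in n)
    mu_j = sum(j * p[i][j] for i in n for j in n)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i in n for j in n)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i in n for j in n)
    if var_i == 0 or var_j == 0:
        return 1.0  # constant image; convention chosen for this sketch
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i in n for j in n)
    return cov / math.sqrt(var_i * var_j)


# Two toy 4-level images: large uniform blocks vs. a checkerboard.
blocky = [[0, 0, 3, 3], [0, 0, 3, 3]]
checker = [[0, 3, 0, 3], [3, 0, 3, 0]]
blocky_corr = glcm_correlation(glcm(blocky, levels=4))
checker_corr = glcm_correlation(glcm(checker, levels=4))
```

The image with large uniform areas yields the higher correlation, because most neighbouring pairs share the same grey level and fall on the main diagonal of the matrix.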
After feature extraction, each cell image is uniquely described by a set of numerical features. Using this set, the classifier output is the prediction of the class to which the cell image belongs. In machine learning, it is important to select a reduced number of features to decrease complexity and computation time.37 In this study, selection techniques were employed to determine the most relevant features by using conditional mutual information maximisation criteria.38 This procedure was implemented within the tuning of the classifier parameters, as shown by the feedback arrow in figure 1, to obtain the most relevant features from the best classification.
Several classification techniques were studied to face the challenging number of cell types under study: linear discriminant analysis (LDA), k-nearest neighbour, naive Bayes, support vector machine and random forest. A 5-fold cross validation technique was performed using the images of the training set. The overall classification accuracy (ratio of images correctly classified in their true category) was used to choose the best features and classifier. This classifier was further validated using the images of the testing set, which were not previously used in the training step.
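The cross-validation procedure can be sketched in Python as follows. To keep the example self-contained, a simple nearest-centroid classifier is used as a stand-in; it is not LDA or any of the five classifiers studied here, and the helper names, the seed and the toy feature vectors are assumptions for illustration only.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=distance)


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Overall accuracy: correctly classified held-out samples / all samples."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)


# Toy data: two well-separated classes described by two features each.
X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 1.0] for i in range(10)]
y = ["lymphocyte"] * 10 + ["monocyte"] * 10
accuracy = cross_validated_accuracy(X, y)
```

Repeating this loop for each candidate classifier and each number of top-ranked features, and keeping the combination with the highest accuracy, corresponds to the selection step described above.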
Furthermore, to interpret the most relevant features involved in the discrimination of the cells under study, statistical analysis was performed on the selected features using R software39: the Kolmogorov-Smirnov test for normality of residuals, the Fligner-Killeen test for homogeneity of variance, the Kruskal-Wallis test to calculate p-values, and Dunn’s test after Kruskal-Wallis, with Bonferroni adjustment for multiple comparisons, to see which pairs of cells could be quantitatively differentiated.
Stage 2: diagnostic system design and assessment
Once the system was ready for the classification of individual cell images, our strategy was to predict the patient’s diagnosis using the PB smear, trying to emulate the way pathologists interpret results in clinical laboratories. The system input was a set of cell images of an individual smear and the output was the prediction of one of the following diagnoses (see figure 2): myeloid leukaemia, promyelocytic leukaemia, lymphoid leukaemia or infection. The diagnosis was made by identifying the cell class that predominated in the smear. This required a preliminary step to establish a threshold value, so that the diagnosis is predicted by identifying the cell class whose percentage of classified images exceeds this value. We used all the smears of the training set to perform a multiclass Receiver Operating Characteristic (ROC) analysis with the statistical R software. The result was the optimal threshold in terms of the percentage of images correctly classified by the system over the total number of images in the smear, such that it guarantees the largest number of smears correctly assigned to their true diagnosis within the training set. Once the threshold was determined, the automatic recognition system was assessed using the smears of the testing set, which were not used before.
A confusion matrix was obtained to evaluate the system performance, and sensitivity and specificity were calculated.
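For a multiclass confusion matrix, sensitivity and specificity are usually obtained per class in a one-vs-rest fashion. The following Python sketch shows that calculation; the function name, the class labels and the counts are made up for illustration and are not the data of this study.

```python
def sensitivity_specificity(confusion, labels):
    """Per-class (sensitivity, specificity) from a square confusion matrix.

    confusion[i][j] = number of items of true class i predicted as class j.
    Assumes every class has at least one true item and one true non-item.
    """
    total = sum(map(sum, confusion))
    results = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                   # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp    # other classes predicted as k
        tn = total - tp - fn - fp
        results[label] = (tp / (tp + fn), tn / (tn + fp))
    return results


# Illustrative counts only (rows: true class, columns: predicted class).
labels = ["infection", "myeloid", "lymphoid"]
confusion = [
    [10, 0, 0],
    [1, 8, 1],
    [0, 2, 8],
]
metrics = sensitivity_specificity(confusion, labels)
```

Sensitivity is the fraction of items of a class that are recognised as such, and specificity is the fraction of items of all other classes that are not assigned to it.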
Results
Stage 1: classifier development
For the classifier development, we extracted 2,867 quantitative features, which were ranked based on a relevance criterion. As shown in figure 3, we trained and tuned five classifiers using different numbers of the best-ranked features and displayed the corresponding accuracy. The highest accuracy was achieved with the LDA classifier when using the best 700 features.
Table 2 lists the 10 most relevant features. Among them, three features were geometric and seven represented colour and texture. Eight features were calculated from the whole cell and two from the nucleus.
As an example of a colour statistical feature calculated from the histogram, figure 4 shows the kurtosis of the green–red component of the cell. It was the best feature to distinguish among the seven cell groups. It considers the green–red pixels and measures how peaked the histogram is. We observed that kurtosis is related to the nucleus/cell ratio: cells with a high ratio show higher kurtosis values. Figure 4 shows two original images of a B-lymphoblast and a monoblast and the histograms from the green–red component. The histogram of the B-lymphoblast (in blue) presents a narrow, sharp peak with the pixels very localised within a short intensity range, so that the kurtosis of the cell is higher. In contrast, the histogram of the monoblast (in red) shows two peaks with lower height, indicating more information variety, corresponding to cytoplasm and nucleus, respectively. Table 2 summarises the results of the statistical analysis performed among the cell pairs in which recognition by morphology is difficult. It shows that kurtosis was significantly different between reactive lymphocytes and myeloblasts or B-lymphoblasts (p<0.0001), between B-lymphoblasts and myeloblasts (p<0.0001), and between myeloblasts and monoblasts (p<0.0001).
Regarding geometric features, nuclear area, cellular area and Nucleus/Cell (N/C) ratio were the most representative. Cells with a larger nucleus and a larger cellular area, such as pathological promyelocytes and monoblasts, had higher values for nuclear and cellular areas. In contrast, both myeloblasts and B-lymphoblasts are smaller cells with a smaller nucleus, which was reflected in their lower values for these features. The N/C ratio was significantly different between myeloblasts and B-lymphoblasts (p<0.0001), between myeloblasts and pathological promyelocytes (p<0.01) and between myeloblasts and monoblasts (p<0.0001) (see table 2).
The blue correlation of the cell is the first among the texture features. Figure 5 shows this statistical feature, which is based on the GLCM. Correlation measures how correlated a pixel is to its neighbours over the whole cell. As shown in figure 5, the blue correlation for a reactive lymphocyte with a lower nucleus/cell ratio was high (0.94), since bright intensity pixels are close to each other (corresponding to cytoplasm), as are those of darker intensities (nucleus). Moreover, the majority of values were grouped along the main diagonal of the GLCM, which resulted in a higher correlation value. In contrast, figure 5 shows a myeloblast with the nucleus occupying almost the whole cell and presenting many pixels with different levels of grey intensities. This indicates more intensity differences between neighbouring pixels and, therefore, a lower correlation value (0.42). As seen in table 2, blue correlation showed significant differences between reactive lymphocytes and myeloblasts and B-lymphoblasts (p<0.0001), between myeloblasts and lymphoblasts (p<0.0001), and between myeloid and monocytic blasts (p<0.0001).
Classification by individual cells
The last step for classifier development was a validation of the LDA classifier (with the best 700 features). This was done through a blind classification of all the individual images of the testing set. Table 3 shows the confusion matrix that summarises the classification results. The true positive rates shown in the main diagonal were: 97.7% for reactive lymphocytes, 97.6% for lymphocytes, 93% for monocytes, 80.8% for myeloid blasts (myeloblasts and monoblasts), 72.6% for pathological promyelocytes and 78.9% for B-lymphoblasts. The overall classification accuracy was 85.8%. Most of the individual cell images were automatically assigned to their true class.
Stage 2: diagnostic system design and assessment
Prior to assessment, a multiclass ROC analysis was performed on all the smears of the training set; figure 6 shows the results. This curve was obtained by averaging the single ROC curves obtained for each of the six classes under study. The best threshold to predict the patient’s diagnosis through the smear was found to be 50% of the cell images correctly classified, with an area under the curve of 0.99.
For example, this threshold means that if more than 50% of the cell images of a smear are classified as myeloblasts, the predicted diagnosis is acute myeloid leukaemia.
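This decision rule can be sketched in a few lines of Python. The function name and the class label strings are illustrative, and returning `None` when no class exceeds the threshold is an assumption of this sketch, since the text does not state how such smears are handled.

```python
from collections import Counter


def predominant_class(predicted_classes, threshold=0.5):
    """Return the cell class whose share of the smear's images exceeds the threshold.

    predicted_classes: classifier output for every cell image of one smear.
    Returns None if no class exceeds the threshold (assumed behaviour).
    """
    if not predicted_classes:
        return None
    label, count = Counter(predicted_classes).most_common(1)[0]
    return label if count / len(predicted_classes) > threshold else None


# 7 of 10 images classified as myeloblasts: the class predominates.
smear = ["myeloblast"] * 7 + ["lymphocyte"] * 2 + ["monocyte"]
result = predominant_class(smear)

# Exactly 50% is not "more than 50%", so no class predominates.
mixed = predominant_class(["myeloblast", "lymphocyte", "monocyte", "myeloblast"])
```

The predominant cell class is then translated into the corresponding smear-level diagnosis, for instance myeloblasts into acute myeloid leukaemia as in the example above.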
Classification by individual smears
Table 4 shows the classification results for the smears of the testing set, considering the 50% threshold. As seen in the main diagonal, 100% of smears corresponding to patients with infections were correctly classified, as well as those containing lymphocytes or monocytes (accuracies of 100% and 97%, respectively). In regard to AL, the true positive rates were 88% for myeloid leukaemia, 85% for lymphoid leukaemia and 80% for promyelocytic leukaemia. The overall classification accuracy of individual smears was 94%. Moreover, sensitivity values above 97% were obtained for normal smears and those related to infections and above 80% for all AL subtypes, while specificity values for all categories were above 96%.
Discussion
Morphological review of PB smears is still relevant nowadays in that it provides the primary evidence of a specific haematological or non-haematological diagnosis. In this sense, blood cell morphology has become crucial for the initial diagnosis of AL subtype, especially in acute promyelocytic leukaemia, to apply the suitable treatment.
In this paper, for the first time, we employ an LDA classifier combined with our recently published automated segmentation algorithm26 and with feature extraction for the automatic recognition of blasts, reactive lymphocytes and other mononuclear cells, as well as for the distinction among myeloblasts, B-lymphoblasts and pathological promyelocytes. The automatic recognition of this wide variety of cell types is a new contribution since it has not previously been accomplished. Previous results showed accuracy in the automatic classification of B-lymphoblasts and lymphocytes.10 However, these two groups of lymphoid cells are usually easier to differentiate than myeloblasts and lymphoblasts. Other studies tried to automatically detect lymphoblasts7 13 or myeloblasts6 among other leukocytes, which show a very different morphology since the nucleus of neutrophils, eosinophils and basophils is lobulated and their cytoplasm shows abundant granules. From a diagnostic perspective, our contribution is relevant since it allows us (1) to diagnose malignant and non-malignant diseases and (2) to recognise the AL subtype.
In previous studies, we have described results about the automatic recognition of different types of abnormal lymphocytes, where high precision has been achieved when selecting 140 features.1 In regard to leukaemic cells, the results of this study demonstrate that the best classification is achieved when the number of features is increased compared to our previously published paper.19 When we considered smaller numbers of features, the accuracy decreased (data not shown). The reason why this study required a greater number of features may be related to the inclusion of a larger image data set, as well as many more types of blasts and other mononuclear cells.
Regarding feature analysis, our results demonstrated that the N/C ratio was one of the most important geometric features. This is consistent with our previous publications, which reported that this feature was also the most relevant for the recognition among several abnormal lymphocyte subsets40 and blasts.19 Moreover, the features most commonly used for the automatic recognition of leukaemia have been listed elsewhere,41 with the N/C ratio being one of them. Wahhab et al.42 found that the N/C ratio was among the best geometric features to distinguish between lymphoblasts and myeloblasts with an accuracy of 93.6%.
The relevance obtained in colour and texture features, such as Kurtosis and Correlation, can be explained by the large cytoplasm of reactive lymphocytes, monoblasts and some pathological promyelocytes in comparison with the scant cytoplasm of myeloblasts and B-lymphoblasts. The Mean provides essential information for cell types with dispersed chromatin. The differences observed among different myeloid blasts can be related to the immature chromatin with visible nucleolus in myeloblasts, which is seen as a thinner texture in monoblasts, compared to the abundant azurophil granulation typical in pathological promyelocytes. Moreover, Cluster Shade allows the distinction among cells with thinner and condensed texture. Lower values shown in lymphoblasts and myeloblasts can be associated with their nucleus occupying almost the entire cell, resulting in a cell image with a very condensed texture. Higher values are seen in monoblasts and reactive lymphocytes, which exhibit a thinner texture because of their cytoplasm. The results obtained in this work are in accordance with previous studies, such as that of Wahhab et al.,42 who found that information of the nuclear chromatin distribution pattern could be provided by GLCM features, and cluster shade, correlation, mean and skewness were among the most important texture features to distinguish between lymphoblasts and myeloblasts.
The automatic recognition of blasts has always been addressed considering each cell image as a unit,4 37 without considering individual smears when arranging sets for both training and assessing the classifier. It may happen that images from the same smear are used in both stages, causing an undesired bias. An innovation of this study is that we arranged a set of smears for the system development and a different one for the assessment. For the first time, PB smear was used as a diagnostic unit, which allowed us to achieve an efficient approach for diagnostic prediction in patients with AL or infection using digital images. Future implementation in clinical laboratories of such technology will entail challenges in the standardisation of PB smear staining to ensure high-quality smears and to minimise diversity among images from different sources.
A satisfactory diagnostic ability was achieved as the system differentiated normal smears from those related to infections and with respect to smears with abnormal leukaemic cells (as seen in table 4). It is important to note that the lower accuracy for smears with pathological promyelocytes (80%), in relation to the other AL, is because the remaining 20% were classified as smears from patients with myeloid leukaemia, a result that is still consistent, since promyelocytic leukaemia is a subtype of acute myeloid leukaemia.
The key contribution of this work is a system which has integrated feature selection to lead to the most relevant quantitative features to achieve the highest percentage of classification accuracy. Moreover, such detailed analysis of the most relevant features to understand their importance has not previously been done. All these improvements related to methodology have allowed us to achieve high overall accuracy not only when classifying cell by cell but also when classifying simultaneously all the images from a single smear for diagnostic purposes.
For the first diagnostic orientation of leukaemia in the clinical laboratory, the detection of quantitative abnormalities in leucocyte count, haemoglobin level or thrombocyte count is important, as it triggers the smear review. The detection of blast cells circulating in blood is the next step. The integration of laboratory parameters, smear review and an automatic classification system able to confirm blast detection and to discriminate among blasts of different origin may improve the diagnostic orientation of patients with AL. In the end, the proposed method would be a practical tool to assist the pathologist in the initial diagnosis of acute leukaemia through the morphological examination of PB cells.
Take home messages
Morphological review of the peripheral blood smear is the first step to detect blast cells, being crucial for the initial diagnosis of acute leukaemias.
The contribution of this paper is to provide an automatic recognition system able to predict the diagnosis of patients with acute myeloid leukaemia, acute promyelocytic leukaemia, acute lymphoid leukaemia and infection using peripheral blood images.
An innovation of this study is that the smear is used as a diagnostic unit, which enabled the achievement of a satisfactory diagnostic ability when differentiating normal smears from those related to infections and with respect to smears with abnormal leukemic cells.
In addition, the proposed system would be a practical tool to assist the pathologist in the initial diagnosis of acute leukaemia during the morphological examination in peripheral blood.
Handling editor Prof Mary Frances McMullin.
Contributors All the authors have fulfilled the criteria of authorship and have made substantial contributions to the study and the interpretation of data. They have revised the work, as well as the current revision of the manuscript.
Funding The study was funded by the Directory of Science, Technology and Innovation of the Ministry of Economy and Competitiveness of Spain (grant number: DPI2015-64493-R).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.