Aims This work aims to propose a set of quantitative features, obtained through digital image analysis, that capture significant qualitative morphological features of different cells, allowing an objective discrimination among reactive, abnormal and blast lymphoid cells.
Methods The study included abnormal lymphoid cells circulating in peripheral blood in chronic lymphocytic leukaemia, B-prolymphocytic leukaemia, hairy cell leukaemia, splenic marginal zone lymphoma, mantle cell lymphoma, follicular lymphoma, T-prolymphocytic leukaemia, T large granular lymphocytic leukaemia and Sézary syndrome, as well as normal, reactive and blast lymphoid cells. A total of 12 574 cell images were obtained from 325 patients, and 2676 features (27 geometric and 2649 related to colour and texture) were extracted and analysed.
Results We analysed the 20 most relevant features for the morphological differentiation of the 12 lymphoid cell groups under study. Most of them showed significant differences between pairs of groups: 19 when comparing follicular lymphoma and mantle cell lymphoma cells, 18 for blast versus reactive cells, 17 for Sézary cells versus T prolymphocytes, and 16 each for B versus T prolymphocytes and for the two villous lymphocyte subsets. Moreover, five quantitative features were significant for the discrimination between reactive lymphocytes and the set of abnormal lymphoid cells included.
Conclusions Image analysis may assist in quantifying cell morphology, turning qualitative data into quantitative values. New cytological variables were established based on geometric and colour/texture features to contribute to a more accurate and objective morphological assessment of lymphoid cells; combining them with flow cytometry methods may be worth exploring in the near future.
- image analysis
Microscopic examination of a blood smear is necessary and clinically useful in many circumstances and for several reasons;1–4 however, it suffers from interobserver and intraobserver variation5 because there are no objective values to define cytological variables.6 Peripheral blood (PB) film morphology is crucial to distinguish among reactive lymphocytes (RL), abnormal lymphocytes (AL) and lymphoid blasts (LB). AL are the cells that participants in external quality assessment schemes in haematology find most difficult to identify.7
Lymphoid cells in chronic lymphocytic leukaemia (CLL) and follicular lymphoma (FL) are often small, while those in mantle cell lymphoma (MCL) are pleomorphic, varying in size and nucleus/cytoplasm (N/C) ratio; some cells may appear blastic, with a cleft nucleus and a prominent nucleolus (blastic variant). Hairy cells are larger than mature lymphocytes (ML) and have abundant pale blue-grey cytoplasm with circumferential ‘hairy’ projections. Splenic marginal zone lymphoma (SMZL) cells typically show short polar villi. B prolymphocytes (B-PL) exhibit a soft central nucleolus. Nuclei are irregular in T prolymphocytes (T-PL). Large granular lymphocytes (LGL) have fine azurophilic granules, and Sézary cells (SC) may be large or small (Lutzner variant), but their nuclear morphology, described as cerebriform, is their distinguishing feature.8
During the 1990s, studies aimed at quantifying cytological parameters were performed on histological sections,9–11 whereas in the following decade, studies on PB became more prominent.6 12–15 Benattar and Flandrin12 13 perform a morphometric and colorimetric analysis of PB lymphocytes in B cell disorders and create a scoring system to characterise these diseases by combining seven relevant morphometric criteria: nuclear shape, cellular shape and area, N/C ratio, nuclear red/blue ratio, cytoplasmic green/blue ratio and proportion of cells with nucleolus. The study includes some T cell disorders but, because of the small sample (n=10), no conclusions are drawn for them. Regarding B cell disorders, the study shows that introducing colorimetric measurements can help quantify relevant features, but standardised conditions are imperative.
Angulo et al14 15 use mathematical morphology tools to process PB images, extract quantitative measures of size, shape, colour and texture (using particle size) and analyse specific nuclear and cytoplasmic irregularities. They build an ontology-based framework to automatically describe AL from CLL, hairy cell leukaemia (HCL), SMZL, FL, MCL and T-PLL. Jahanmehr et al6 use image analysis to quantify cytological parameters (such as cellular area and diameter, and nuclear area and density), which allows them to distinguish among CLL, MCL and PLL.
The contribution of this paper is to provide a set of 20 new cytological variables (geometric, colour and texture) with the following properties: (1) they have quantitative formulations, (2) they allow qualitative morphological interpretations and (3) they are efficient in discriminating among a significant number of different abnormal lymphoid cell groups.
Materials and methods
The study included: ML from healthy individuals; AL from patients with CLL, B-PLL, HCL, SMZL, MCL, FL, T-PLL, T-LGL and Sézary syndrome; LB from patients with B-lymphoid precursor neoplasms; and RL from patients with viral or other infections. Malignant diagnoses were confirmed following the WHO classification.8 A total of 325 patients were included (table 1). PB samples, collected in EDTA, were obtained from the laboratory workload. Smears were prepared using the SP1000i slide maker stainer (Sysmex, Kobe, Japan) and stained with May Grünwald-Giemsa. A total of 12 574 cell images (363×360 pixels) were obtained using the CellaVision DM96 (CellaVision AB, Lund, Sweden) (table 1). All further processing was implemented in MATLAB on a Dell Precision 5810 workstation. Through an automatic segmentation procedure based on colour clustering and mathematical morphology,16–18 three main regions of interest (ROI) were obtained: nucleus, whole cell and peripheral zone around the cell. A fourth ROI, the cytoplasm, was obtained as the difference between the cell and nucleus regions. This step is critical for successful feature extraction.19 In total, 2676 features were used: 27 geometric and 2649 describing colour and texture.
With respect to geometric features, the area, perimeter, circularity, equivalent diameter, eccentricity, solidity, extent, elongation, roundness, convexity, circle variance and elliptic variance were calculated for both the nucleus and the whole cell regions; their interpretation can be found in ref 20. The remaining three geometric features involved both ROI: N/C ratio, nuclear eccentricity and hairiness. The N/C ratio is calculated as the ratio between the nuclear and cytoplasmic areas, where the cytoplasmic area is obtained by subtracting the nuclear area from the cellular area. The nuclear eccentricity is calculated as the distance between the cellular and nuclear centres.15 20 Finally, ‘hairiness’ was calculated as described by Alférez et al16 to represent the external profile of the cytoplasm.
From a numerical point of view, a digital blood image is composed of a rectangular finite grid of pixels. Each pixel is identified by its specific location and holds an intensity value. As a common practice, an original colour image is decomposed into three or four colour components, which define a colour space. For example, the most commonly used colour space is RGB, formed by three colour components: red (R), green (G) and blue (B). Other colour spaces used in digital image processing include CMYK (cyan, magenta, yellow and black) and HSV (hue, saturation, value). Each colour component results in an individual greyscale image, composed exclusively of shades of grey, varying from black at the weakest intensity to white at the strongest.21
The histogram is a very practical way to describe such images quantitatively and is the basis of many processing techniques, such as those developed in this work.19 As an illustration, figure 1 shows two examples of cell images with their corresponding greyscale images for the magenta colour component. The histogram is a two-axis plot where the horizontal axis represents the grey intensity within a range from 0 (black) to a normalised maximum value (white), divided into a discrete number of intervals. The vertical axis gives the number of pixels in the image with intensities falling in each interval.
Texture in a digital image is defined by uniformity, density, pixel tone and their spatial relationships, among others.22 23 To identify different texture patterns, in this work we considered the following classes of features: (1) statistical (first-order and second-order statistical features),24 25 (2) wavelet,26 (3) granulometric16 19 27 and (4) Gabor features.22 28 29
First-order statistical features are parameters calculated from the histogram of a greyscale digital image:30 (1) mean (average of all ROI intensity values); (2) SD (intensity dispersion around the mean); (3) skewness (histogram asymmetry around the mean); (4) kurtosis (histogram flatness); (5) energy; and (6) entropy 1. The energy and entropy of a histogram correspond to the uniformity and variability of the image, respectively, and are inversely related.31 Both texture features are exemplified in figure 1. Notice that the histogram of image I of figure 1 (high entropy and low energy) shows substantial numbers of pixels over most of the intensity range, which indicates considerable variability. In contrast, image II of figure 1 has most of its pixels in a short high-intensity interval, corresponding to a quite uniform intensity distribution. Second-order statistical features provide more information about the texture of the ROI and are based on the grey level co-occurrence matrix of a digital image.24 Some of these features are homogeneity, information measure of correlation 1 and cluster shade.25
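The segmentation step itself (colour clustering plus mathematical morphology, refs 16–18) is not reproduced here. As a minimal sketch, assuming binary nucleus and whole-cell masks are already available, the cytoplasm and the peripheral zone can be derived with simple mask arithmetic. The helper names and the `ring_width` parameter that sets the width of the peripheral zone are hypothetical, since that width is not stated.

```python
import numpy as np

def grow(mask: np.ndarray, steps: int) -> np.ndarray:
    """Binary dilation by `steps` one-pixel, 4-connected growth steps."""
    out = mask.copy()
    for _ in range(steps):
        g = out.copy()
        g[1:, :] |= out[:-1, :]   # grow downwards
        g[:-1, :] |= out[1:, :]   # grow upwards
        g[:, 1:] |= out[:, :-1]   # grow to the right
        g[:, :-1] |= out[:, 1:]   # grow to the left
        out = g
    return out

def derive_rois(cell_mask, nucleus_mask, ring_width=5):
    """Return the four ROI as boolean masks (hypothetical helper)."""
    cell = np.asarray(cell_mask, dtype=bool)
    nucleus = np.asarray(nucleus_mask, dtype=bool) & cell   # nucleus assumed inside the cell
    cytoplasm = cell & ~nucleus                              # fourth ROI: cell minus nucleus
    periphery = grow(cell, ring_width) & ~cell               # band just outside the cell
    return {"nucleus": nucleus, "cell": cell,
            "cytoplasm": cytoplasm, "periphery": periphery}

# Usage with synthetic concentric disks
yy, xx = np.mgrid[:64, :64]
cell = (yy - 32) ** 2 + (xx - 32) ** 2 <= 20 ** 2
nuc = (yy - 32) ** 2 + (xx - 32) ** 2 <= 12 ** 2
rois = derive_rois(cell, nuc, ring_width=3)
```

Because the cytoplasm is defined by set difference, its pixel count is always the cell count minus the nuclear count, which is also what the N/C ratio relies on.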
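A minimal sketch of the geometric features defined in the text that involve both ROI, assuming the same binary masks as input: the N/C ratio (nuclear area over cytoplasmic area) and the nuclear eccentricity (distance between the cell and nuclear centres, taken here as centroids). The equivalent diameter is included as one example of the per-region descriptors, using its standard definition (diameter of the circle with the same area), which is assumed to match ref 20. Areas are pixel counts, and the function names are illustrative, not the authors' MATLAB code.

```python
import numpy as np

def centroid(mask):
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

def geometric_features(cell_mask, nucleus_mask):
    cell = np.asarray(cell_mask, dtype=bool)
    nucleus = np.asarray(nucleus_mask, dtype=bool) & cell
    a_cell = int(cell.sum())
    a_nuc = int(nucleus.sum())
    a_cyto = a_cell - a_nuc                      # cytoplasmic area = cell area - nuclear area
    cy, cx = centroid(cell)
    ny, nx = centroid(nucleus)
    return {
        "cell_area": a_cell,
        "nucleus_area": a_nuc,
        # diameter of the circle with the same area as the nucleus
        "nucleus_equivalent_diameter": float(np.sqrt(4.0 * a_nuc / np.pi)),
        # nuclear area / cytoplasmic area
        "nc_ratio": a_nuc / a_cyto if a_cyto > 0 else float("inf"),
        # distance in pixels between cell and nucleus centroids
        "nuclear_eccentricity": float(np.hypot(cy - ny, cx - nx)),
    }

# Usage: nucleus shifted 5 pixels to the right of the cell centre
yy, xx = np.mgrid[:80, :80]
cell = (yy - 40) ** 2 + (xx - 40) ** 2 <= 30 ** 2
nuc = (yy - 40) ** 2 + (xx - 45) ** 2 <= 10 ** 2
f = geometric_features(cell, nuc)
```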
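The decomposition of a colour image into greyscale component images can be sketched for CMYK. The formula below is the common naive RGB-to-CMYK mapping (K = 1 - max(R, G, B), then C, M, Y rescaled by 1 - K); the text does not state which conversion was used, so treat it as an assumption.

```python
import numpy as np

def rgb_to_cmyk(rgb):
    """rgb: uint8 array (H, W, 3). Returns C, M, Y, K greyscale images scaled to [0, 255]."""
    x = np.asarray(rgb, dtype=float) / 255.0
    r, g, b = x[..., 0], x[..., 1], x[..., 2]
    k = 1.0 - x.max(axis=-1)
    denom = np.where(k < 1.0, 1.0 - k, 1.0)   # avoid division by zero on pure black
    c = (1.0 - r - k) / denom
    m = (1.0 - g - k) / denom
    y = (1.0 - b - k) / denom
    return {name: 255.0 * comp for name, comp in zip("CMYK", (c, m, y, k))}

# Usage: a red, a white and a black pixel
px = np.array([[[255, 0, 0], [255, 255, 255], [0, 0, 0]]], dtype=np.uint8)
components = rgb_to_cmyk(px)
```

Each returned array is an ordinary greyscale image, so every texture feature described below can be applied to it in the same way as to the R, G or B channel.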
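Computing the histogram of one colour component inside a ROI amounts to counting pixels per intensity interval. In this sketch the number of intervals is a free parameter; the default (one bin per grey level of an 8-bit image) is an assumption, as the bin count used in the study is not stated.

```python
import numpy as np

def roi_histogram(grey, roi, n_bins=256, max_value=255):
    """Pixel counts per intensity interval inside a ROI.

    The range [0, max_value] is split into n_bins equal intervals; with the
    defaults each integer grey level of an 8-bit image has its own bin.
    """
    values = np.asarray(grey, dtype=float)[np.asarray(roi, dtype=bool)]
    counts, edges = np.histogram(values, bins=n_bins, range=(-0.5, max_value + 0.5))
    return counts, edges

# Usage: only the first row belongs to the ROI
grey = np.array([[0, 0, 10, 255], [7, 7, 7, 7]])
roi = np.array([[True] * 4, [False] * 4])
counts, edges = roi_histogram(grey, roi)
```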
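The six first-order features can be computed directly from the normalised histogram, and a second-order feature such as homogeneity from a grey level co-occurrence matrix. This sketch assumes base-2 logarithms for entropy, one bin per grey level, and a single horizontal one-pixel offset with eight quantisation levels for the co-occurrence matrix. The exact variant behind ‘entropy 1’ and the offsets and quantisation used by the authors are not stated, so absolute values will not match those reported.

```python
import numpy as np

def first_order_features(grey, roi, n_bins=256, max_value=255):
    """Histogram-based (first-order) texture features of a greyscale ROI."""
    values = np.asarray(grey, dtype=float)[np.asarray(roi, dtype=bool)]
    counts, edges = np.histogram(values, bins=n_bins, range=(-0.5, max_value + 0.5))
    p = counts / counts.sum()                      # normalised histogram
    levels = 0.5 * (edges[:-1] + edges[1:])        # bin centres
    mean = float((levels * p).sum())
    sd = float(np.sqrt(((levels - mean) ** 2 * p).sum()))
    z = (levels - mean) / sd if sd > 0 else np.zeros_like(levels)
    nz = p[p > 0]
    return {
        "mean": mean,
        "sd": sd,
        "skewness": float((z ** 3 * p).sum()),     # histogram asymmetry
        "kurtosis": float((z ** 4 * p).sum()),     # peakedness versus flatness
        "energy": float((p ** 2).sum()),           # high for uniform images
        "entropy": float(-(nz * np.log2(nz)).sum()),  # high for variable images
    }

def glcm_homogeneity(grey, roi, levels=8, max_value=255):
    """Homogeneity of a symmetric co-occurrence matrix (offset: one pixel to the right)."""
    g = np.asarray(grey, dtype=float)
    m = np.asarray(roi, dtype=bool)
    q = np.minimum((g / (max_value + 1) * levels).astype(int), levels - 1)  # quantise
    valid = m[:, :-1] & m[:, 1:]                   # both pixels of the pair inside the ROI
    P = np.zeros((levels, levels))
    np.add.at(P, (q[:, :-1][valid], q[:, 1:][valid]), 1)
    P = P + P.T                                    # count each pair in both directions
    P /= P.sum()
    i, j = np.indices(P.shape)
    return float((P / (1.0 + np.abs(i - j))).sum())

# Usage: half of the ROI at intensity 50, half at 200
stats = first_order_features(np.array([[50, 50, 200, 200]]), np.ones((1, 4), bool))
```

With two equally populated intensities the histogram carries exactly one bit of entropy and an energy of 0.5, which illustrates the inverse relation between the two features mentioned above.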
The purpose of granulometry is to estimate the size distribution of particles in an image. In a greyscale image, two types of particles are usually considered: those lighter and those darker than the background. They can be identified by means of mathematical morphology operations such as erosion and dilation.19 These operations essentially consist of processing the original image with a small auxiliary image (the so-called structuring element), which acts as a probe that moves over the image to extract localised patterns of interest. The most typical structuring element is a disk with a given radius measured in pixels.
Erosion removes the bright details (higher intensities) in the image that are smaller than the structuring element. The whole procedure consists of applying successive erosions with disks of increasing size and computing the sum of the pixel intensities after each erosion. Since erosion reduces the bright intensities, this sum decreases with each larger disk, and the procedure finishes after a given number of iterations. To emphasise variations, it is better to calculate the differences between the sums obtained in successive operations, which correspond to the intensity removed at each step.
In the opposite direction, dilation removes the smaller dark details by raising the lower intensities through a similar iterative procedure, which yields the differences in intensities for successive sizes of the structuring element. Both procedures are visualised by means of the so-called pseudo-granulometric curve, as illustrated in figure 2. The horizontal axis represents the radius of the structuring disk element and the vertical axis the differences in intensities. The most significant information is given by the peaks of the plot, which indicate the most relevant size distributions of small elements in the image. Notice that the curve has two parts: the right part (positive sizes) represents the density of bright elements, while the left part (negative sizes) corresponds to the dark elements.
The above procedures can also be performed with the alternative morphological operations of opening and closing, whose objectives are similar to those of erosion and dilation, respectively. The results are plotted in the so-called granulometric curve, also illustrated in figure 2. Both curves are commonly used in practice since they allow texture features to be extracted.
The mean, SD, skewness and kurtosis were calculated over the granulometric and pseudo-granulometric curves. These eight parameters compose the granulometric features used in this work. Figure 2 depicts an explanatory example of granulometry with two lymphoid cell images with different degrees of granulation: an LGL with abundant granulation and a cell without granulation (AL of SMZL).
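The erosion and dilation procedures can be sketched with flat digital disks. Assumptions not stated in the text: each erosion or dilation is applied to the original image with a disk of radius r (r = 1, 2, ...), image borders are replicated, the maximum radius is a free parameter, and the differences are normalised by the total image intensity. The curve is returned with negative sizes (dark elements, from dilation) on the left and positive sizes (bright elements, from erosion) on the right, following the layout described for figure 2.

```python
import numpy as np

def disk_filter(img, radius, reducer):
    """Flat grey-level erosion (reducer=np.min) or dilation (reducer=np.max) with a digital disk."""
    a = np.asarray(img, dtype=float)
    r = int(radius)
    padded = np.pad(a, r, mode="edge")            # replicate borders
    h, w = a.shape
    shifted = [padded[r + dy:r + dy + h, r + dx:r + dx + w]
               for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if dy * dy + dx * dx <= r * r]     # offsets inside the disk
    return reducer(np.stack(shifted), axis=0)

def pseudo_granulometric_curve(img, max_radius=6):
    img = np.asarray(img, dtype=float)
    total = img.sum()
    ero = [total] + [disk_filter(img, r, np.min).sum() for r in range(1, max_radius + 1)]
    dil = [total] + [disk_filter(img, r, np.max).sum() for r in range(1, max_radius + 1)]
    # Intensity change caused by each larger disk, normalised by the total intensity
    bright = [(ero[k - 1] - ero[k]) / total for k in range(1, max_radius + 1)]
    dark = [(dil[k] - dil[k - 1]) / total for k in range(1, max_radius + 1)]
    sizes = np.array(list(range(-max_radius, 0)) + list(range(1, max_radius + 1)))
    curve = np.array(dark[::-1] + bright)         # left: dark elements, right: bright elements
    return sizes, curve

# Usage: one bright one-pixel "granule" on a flat background
img = np.full((15, 15), 100.0)
img[7, 7] = 200.0
sizes, curve = pseudo_granulometric_curve(img, max_radius=3)
```

Because each larger disk contains the smaller ones, successive eroded sums never increase and dilated sums never decrease, so both halves of the curve are non-negative. In the example, the single bright pixel is removed entirely by the first erosion, so the bright half has one peak at size 1.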
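The granulometric curve follows the same scheme with opening (erosion followed by dilation) and closing (dilation followed by erosion), and the four moments are then computed over each curve, giving eight features. The text does not say whether the moments are taken over the curve values or over the curve viewed as a size distribution; this sketch assumes the former and reuses the disk, border and normalisation assumptions of the pseudo-granulometric case. Note that with digital disks, opening by a larger disk is not guaranteed to remove at least as much as opening by a smaller one, so small negative differences are possible on real images.

```python
import numpy as np

def _disk_filter(img, radius, reducer):
    """Flat grey-level erosion (np.min) or dilation (np.max) with a digital disk."""
    a = np.asarray(img, dtype=float)
    r = int(radius)
    padded = np.pad(a, r, mode="edge")
    h, w = a.shape
    shifted = [padded[r + dy:r + dy + h, r + dx:r + dx + w]
               for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if dy * dy + dx * dx <= r * r]
    return reducer(np.stack(shifted), axis=0)

def opening(img, r):
    return _disk_filter(_disk_filter(img, r, np.min), r, np.max)   # erosion, then dilation

def closing(img, r):
    return _disk_filter(_disk_filter(img, r, np.max), r, np.min)   # dilation, then erosion

def granulometric_curve(img, max_radius=6):
    img = np.asarray(img, dtype=float)
    total = img.sum()
    op = [total] + [opening(img, r).sum() for r in range(1, max_radius + 1)]
    cl = [total] + [closing(img, r).sum() for r in range(1, max_radius + 1)]
    bright = [(op[k - 1] - op[k]) / total for k in range(1, max_radius + 1)]
    dark = [(cl[k] - cl[k - 1]) / total for k in range(1, max_radius + 1)]
    return np.array(dark[::-1] + bright)          # same layout as the pseudo-granulometric curve

def curve_moments(curve, prefix):
    """Mean, SD, skewness and kurtosis of the curve values (assumed reading)."""
    c = np.asarray(curve, dtype=float)
    mu, sd = c.mean(), c.std()
    z = (c - mu) / sd if sd > 0 else np.zeros_like(c)
    return {f"{prefix}_mean": mu, f"{prefix}_sd": sd,
            f"{prefix}_skewness": (z ** 3).mean(), f"{prefix}_kurtosis": (z ** 4).mean()}

# Usage: same synthetic granule as before
img = np.full((15, 15), 100.0)
img[7, 7] = 200.0
gran = granulometric_curve(img, max_radius=3)
feats = curve_moments(gran, "gran")
```

Applying `curve_moments` to both the granulometric and the pseudo-granulometric curve of a component and ROI yields the eight granulometric parameters.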
Gabor features are obtained from orientation-sensitive filters used for edge detection.32
In summary, 43 texture features were applied over six colour spaces comprising 19 components altogether (RGB, CMYK, XYZ, Lab, Luv and HSV) and three ROI; adding 22 Gabor features, applied over the same ROI but only on three colour components, resulted in a total of 2649 colour/texture features.
We employed information-theoretic feature selection33 34 to determine the 20 most relevant and least redundant features among the 2676 obtained (27 geometric and 2649 colour/texture) for feature analysis. Statistical tests were performed using R code35 over the selected features:36 the Kolmogorov-Smirnov test to check the normality of residuals, the Fligner-Killeen test for homogeneity of variances, the Kruskal-Wallis test to calculate p values, Dunn's post hoc test after Kruskal-Wallis with Bonferroni adjustment for multiple comparisons37 38 and Pearson's correlation.
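Gabor features are typically obtained by filtering the image with Gabor kernels (a Gaussian envelope multiplied by an oriented cosine) at several orientations and summarising the filter responses. The 22 Gabor features used in the study and their parameters are not specified in this section, so the kernel parameters (σ, wavelength, aspect ratio γ, four orientations) and the choice of the mean and SD of the absolute response as features are assumptions for illustration.

```python
import numpy as np

def gabor_kernel(sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real Gabor kernel; made zero-mean so that flat regions give no response."""
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # coordinates rotated by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    k = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lambd + psi)
    return k - k.mean()

def filter2d(img, kernel):
    """Correlate img with a square kernel ('same' output size, reflected borders)."""
    a = np.asarray(img, dtype=float)
    h = kernel.shape[0] // 2
    padded = np.pad(a, h, mode="reflect")
    H, W = a.shape
    out = np.zeros((H, W))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + H, dx:dx + W]
    return out

def gabor_features(grey, roi, sigma=4.0, lambd=8.0, n_orientations=4):
    feats = {}
    for i in range(n_orientations):
        theta = i * np.pi / n_orientations
        resp = np.abs(filter2d(grey, gabor_kernel(sigma, theta, lambd)))[np.asarray(roi, bool)]
        feats[f"gabor_mean_{i}"] = float(resp.mean())
        feats[f"gabor_sd_{i}"] = float(resp.std())
    return feats

# Usage: stripes whose intensity varies along x only
xs = np.arange(64)
stripes = np.tile(128 + 50 * np.cos(2 * np.pi * xs / 8), (64, 1))
f = gabor_features(stripes, np.ones(stripes.shape, bool))
```

The kernel oriented along the intensity variation (θ = 0) responds far more strongly than the perpendicular one (θ = π/2), which is the orientation sensitivity referred to above.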
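The feature counts can be checked arithmetically, assuming that each of the 43 texture features is computed once per colour component and per ROI, and that the 22 Gabor features are likewise computed per component (three components) and per ROI. This per-component reading is an interpretation of the text.

```python
# Colour/texture features (interpretation of the counts in the text)
n_texture = 43                              # texture features per component and ROI
components = {"RGB": 3, "CMYK": 4, "XYZ": 3, "Lab": 3, "Luv": 3, "HSV": 3}
n_components = sum(components.values())     # colour components across six spaces
n_roi = 3
n_gabor, n_gabor_components = 22, 3

colour_texture = n_texture * n_components * n_roi + n_gabor * n_gabor_components * n_roi
# Geometric: 12 shape descriptors for nucleus and cell, plus N/C ratio,
# nuclear eccentricity and hairiness
geometric = 12 * 2 + 3
print(n_components, colour_texture, geometric, geometric + colour_texture)
# prints: 19 2649 27 2676
```

Under this reading the counts are consistent: 43 × 19 × 3 = 2451 plus 22 × 3 × 3 = 198 gives 2649, and adding the 27 geometric features gives the 2676 features in total.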
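Refs 33 and 34 are not reproduced here, so the exact selection criterion is unknown. As one common information-theoretic option, a greedy minimum-redundancy maximum-relevance (mRMR) selection can be sketched: at each step it picks the feature whose mutual information with the class label, minus its mean mutual information with the features already selected, is largest. The quantile discretisation into eight bins and all function names are assumptions.

```python
import numpy as np

def discretise(x, n_bins=8):
    """Map a continuous feature to integer codes 0..n_bins-1 using quantile edges."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def mutual_information(a, b):
    """Mutual information (bits) between two non-negative integer-coded vectors."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy mRMR: maximise relevance to y minus mean redundancy with selected features."""
    Xd = np.column_stack([discretise(X[:, j]) for j in range(X.shape[1])])
    relevance = np.array([mutual_information(Xd[:, j], y) for j in range(X.shape[1])])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(Xd[:, j], Xd[:, s]) for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected

# Usage: feature 1 is a near-copy of feature 0, so it should not be picked second
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 100)
f0 = y + 0.1 * rng.normal(size=300)
X = np.column_stack([f0, f0 + 1e-9 * rng.normal(size=300),
                     rng.normal(size=300), (y == 2) + 0.1 * rng.normal(size=300)])
sel = mrmr_select(X, y, 2)
```

The redundancy term is what distinguishes this from ranking features by relevance alone: a duplicate of an already selected feature is as relevant as the original but is penalised for carrying no new information.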
The 20 most relevant features obtained for the discrimination of the 12 lymphoid cell groups under study are shown in table 2. All of them quantified different morphological characteristics, since significantly different values among groups were obtained for all 20 features (p<0.0001). The N/C ratio was the best feature to distinguish among the groups, and its values were consistent with the morphology of the different lymphoid cell groups. Figure 3 shows the cell images with the lowest and highest N/C ratio values for each group, together with their box plots. Table 3 presents the median values of the five most relevant geometric features.
The remaining 15 most relevant features were colour/texture features. Only three colour spaces (CMYK, RGB and HSV) and six of the initial 19 colour components were involved among the first 20 features (table 2). Of the 15 texture features, 13 were statistical (eight first order and five second order) and two were granulometric. Regarding the ROI involved, 10 were calculated from the whole cell, 3 from the cytoplasm and 2 from the nucleus.
For space reasons, we present box plots for a reduced number of representative features. Figure 4A depicts the box plots of the fourth most relevant feature, the kurtosis of the pseudo-granulometric curve of the cyan component of the cell: the highest values are found in B-PL (5.9) and LB (5.6), while the lowest are seen in villous lymphocytes of hairy cell leukaemia (VL-HCL; 3.3) and LGL (3.6) (p<0.0001). Figure 5 illustrates the technical basis of two types of texture features to help understand how they relate to the morphological characteristics they discriminate. The remaining features calculated from granulometric curves and histograms have a similar interpretation. Figure 5A shows (I) two ML images with the lowest (L=2.85) and highest (H=6.12) kurtosis values, (II) their cyan components and (III) their pseudo-granulometric curves. The left part of each curve (negative size) represents the size density of dark granules, while the right part represents the size density of bright granules. Note that when the proportion of bright granules in the lymphoid cell image is higher (L), the contribution of the cyan component to the image is greater, resulting in greater uniformity and a wider curve (lower kurtosis, red curve).
The fifth feature was the skewness of the blue histogram of the cell (BHC), and figure 4B shows its box plots: the highest values are seen in AL-CLL (0.97) and SC (0.92), whereas the lowest are found in VL-HCL (0.18) and RL (0.23) (p<0.0001). Figure 5B depicts (I) two RL images with the lowest (L=−0.29) and highest (H=0.64) skewness values, (II) their blue components and (III) their histograms. Looking at the histograms of blue pixel intensities, the L histogram has negative skewness, while the H histogram has positive skewness. Image L shows a higher proportion of bright pixels (seen in the cytoplasm), so the bulk of the histogram shifts to the right, leaving a tail towards darker intensities (negative skewness). Conversely, positive skewness is observed when the majority of pixels are dark, belonging to the nucleus (image H).
The SD of the BHC was the seventh most relevant feature; it is calculated from the same histogram as the fifth feature, but in this case the statistical parameter is the SD. The less dispersed the histogram, the lower the SD. Figure 4C displays its box plots: the lowest medians are seen in AL-FL (10.7) and LB (12.6), whereas VL-HCL (21.0) and VL-SMZL (18.7) show the highest (p<0.0001).
The above two features (fifth and seventh) are associated with the blue intensity of the whole cell image, but it is worth noting their relation with the N/C ratio. For lymphoid cells with a low N/C ratio, the histogram has lower skewness, since nuclear and cytoplasmic pixels are more balanced. The SD, in contrast, behaves opposite to the skewness. Indeed, we found that AL-CLL, T-PL, AL-FL and LB, which frequently have a high N/C ratio, showed higher median skewness values (0.97, 0.84, 0.71 and 0.63, respectively) and lower median SD values (13.8, 12.9, 10.7 and 12.6, respectively) (p<0.0001). By contrast, cells with a low N/C ratio, such as VL-HCL, VL-SMZL and RL, had lower median skewness values (0.18, 0.41 and 0.23, respectively) and higher median SD values (21.0, 18.7 and 17.4, respectively) (p<0.0001). As shown in table 4, a positive correlation between the N/C ratio and the fifth feature was seen in most of the lymphoid cell groups, whereas an inverse correlation was detected for the seventh feature. In other words, lymphoid cells with very little cytoplasm, and therefore a high N/C ratio, showed high skewness values because the dark blue pixels of the nucleus predominate over those of the cytoplasm. In contrast, in these same cells, the SD of the BHC was lower because of the small variation of the blue component across the cell.
The eighth feature is the entropy 1 of the magenta histogram of the cell region (E1MHC), a first-order statistical texture feature. Figure 4D displays its box plots: RL, VL-SMZL, VL-HCL, LGL and B-PL show the highest entropy values (2.7, 2.6, 2.5, 2.5 and 2.4, respectively), whereas AL-FL and T-PL show the lowest (1.4 and 1.5, respectively) (p<0.0001). When correlating this feature with the N/C ratio, we found that lower entropy values corresponded to higher N/C ratio values in some lymphoid cell groups (see table 4). This means that lymphoid cells with more cytoplasm and a low N/C ratio, such as reactive or villous lymphocytes, show high E1MHC values because of the greater differences between nucleus and cytoplasm.
The ninth feature is the skewness of the green histogram of the cell, which is related to the green intensity of the cell image. Figure 4E depicts its box plots: VL-HCL and RL show the lowest skewness values (0.30 and 0.41, respectively), while the highest are seen in T-PL (1.54) (p<0.0001). This feature is calculated like the fifth feature, but with green as the colour component. As shown in table 4, strong correlations are found between the N/C ratio and this feature in lymphoid cells with a large cytoplasm (r=0.852–0.909; p<0.00001).
The most relevant feature obtained from the cytoplasm was the 10th feature: the mean of the pseudo-granulometric curve of the black component of the cytoplasm. B-PL showed the highest values (0.106) and AL-FL the lowest (0.030) (p<0.0001). The mean of the blue histogram of the nucleus was the most relevant feature obtained from the nucleus and ranked 11th. Figure 4F contains its box plots: the highest median value is found in RL (142) and the lowest in ML (128) (p<0.0001).
Table 5 presents the significance obtained when comparing the 20 features between six pairs of lymphoid cell subsets whose discrimination by morphology in PB smears is usually difficult. Most features showed significant differences: 19 when comparing AL-FL and AL-MCL, 18 for LB versus RL, 17 for T-PL versus SC, and 16 each for ML versus AL-CLL, for the two VL subsets and for the two prolymphocyte subsets. Table 6 includes the five colour/texture features that were significant in the multiple comparisons between RL and each AL subset.
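The relation between the N/C ratio and the skewness and SD of the blue histogram can be illustrated with an idealised cell in which every nuclear pixel has one dark value and every cytoplasmic pixel one bright value. This two-level model is an assumption made only for illustration (real cells have continuous intensity distributions), and the intensities 60 and 180 are arbitrary. In this model skewness rises with the N/C ratio everywhere, whereas the SD falls only once nuclear pixels are the majority (N/C ratio above 1), which is the range sampled below.

```python
import numpy as np

def two_level_cell_stats(nc_ratio, n_pixels=1000, dark=60.0, bright=180.0):
    """Skewness and SD of an idealised cell: nuclear pixels dark, cytoplasmic pixels bright."""
    # N/C = A_nucleus / A_cytoplasm, so A_nucleus / A_cell = (N/C) / (1 + N/C)
    p_nucleus = nc_ratio / (1.0 + nc_ratio)
    n_nuc = int(round(p_nucleus * n_pixels))
    pixels = np.array([dark] * n_nuc + [bright] * (n_pixels - n_nuc))
    mu, sd = pixels.mean(), pixels.std()
    skewness = ((pixels - mu) ** 3).mean() / sd ** 3
    return skewness, sd

nc_ratios = np.linspace(1.2, 4.0, 8)
skews, sds = zip(*(two_level_cell_stats(r) for r in nc_ratios))
r_skew = np.corrcoef(nc_ratios, skews)[0, 1]   # Pearson r, N/C ratio vs skewness
r_sd = np.corrcoef(nc_ratios, sds)[0, 1]       # Pearson r, N/C ratio vs SD
print(f"r(N/C, skewness) = {r_skew:.3f}, r(N/C, SD) = {r_sd:.3f}")
```

As the dark nuclear pixels become the majority, the remaining bright cytoplasmic pixels form a shrinking tail to the right (growing positive skewness) while the overall spread narrows (smaller SD), mirroring the positive and inverse correlations reported for the fifth and seventh features.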
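The multiple-comparison step can be sketched in Python as Dunn's test after Kruskal-Wallis with a Bonferroni correction. The authors used R; this re-implementation (average ranks for ties, tie-corrected variance, two-sided normal p values) is an illustrative sketch, and the group names and values in the usage example are synthetic.

```python
import math
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def kruskal_dunn_bonferroni(groups):
    """groups: dict name -> 1D values. Returns (H, {(a, b): (z, Bonferroni-adjusted p)})."""
    names = list(groups)
    sizes = [len(groups[n]) for n in names]
    values = np.concatenate([np.asarray(groups[n], dtype=float) for n in names])
    N = len(values)
    ranks = average_ranks(values)
    bounds = np.cumsum([0] + sizes)
    mean_rank = [ranks[bounds[i]:bounds[i + 1]].mean() for i in range(len(names))]
    _, counts = np.unique(values, return_counts=True)
    ties = float((counts.astype(float) ** 3 - counts).sum())   # sum of (t^3 - t) over tie groups
    # Kruskal-Wallis H statistic with tie correction
    H = 12.0 / (N * (N + 1)) * sum(n * (r - (N + 1) / 2.0) ** 2 for n, r in zip(sizes, mean_rank))
    H /= 1.0 - ties / (N ** 3 - N)
    # Dunn's pairwise z statistics with Bonferroni adjustment
    m = len(names) * (len(names) - 1) // 2                     # number of comparisons
    var_base = N * (N + 1) / 12.0 - ties / (12.0 * (N - 1))
    results = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            se = math.sqrt(var_base * (1.0 / sizes[i] + 1.0 / sizes[j]))
            z = (mean_rank[i] - mean_rank[j]) / se
            p = math.erfc(abs(z) / math.sqrt(2.0))              # two-sided normal p value
            results[(names[i], names[j])] = (z, min(1.0, p * m))
    return H, results

# Usage: A and C overlap, B is clearly shifted
groups = {"A": np.arange(10.0), "B": np.arange(10.0) + 100, "C": np.arange(10.0) + 0.5}
H, res = kruskal_dunn_bonferroni(groups)
```

In the example only the comparisons involving B remain significant after multiplying each p value by the three comparisons performed.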
Morphological differentiation among AL requires experience and skill, since these are the most difficult cells to recognise using morphological features alone.7 The goal of this work was to use image analysis to address this problem with a set of quantitative cytological features. The outcome is a set of 20 quantitative features that have shown the ability to differentiate among 12 lymphoid cell groups: nine AL (including six B cell and three T cell disorders), RL, LB and ML. Of these features, five are geometric and 15 are based on colour/texture properties. Geometric quantitative measures are easy to interpret in terms of visual morphological characteristics. Colour/texture features have a physical basis but require a more complex mathematical background for their quantitative expression and calculation. In this respect, this work has drawn reasonable cytological interpretations to help understand their ability to discriminate among the cell types under study. This discussion focuses on relevant issues concerning the features obtained.
Regarding features used in previous works for lymphoid cell differentiation, Benattar and Flandrin12 and Jahanmehr et al6 calculate geometric features to discriminate between lymphoid cell groups by quantifying cytological variables. Sabino et al39 use texture features such as co-occurrence probabilities to differentiate among five types of normal leukocytes and only one group of AL: CLL. Angulo et al15 apply geometric and colour/texture features using the Lab and Luv colour spaces. In our study, we considered it useful to use six colour spaces to take advantage of the richer information that more colour spaces may provide for discriminating the large variety of lymphoid cell types under study. Although many of the resulting features may be redundant or irrelevant for cell recognition, this is not known a priori. The selection approach reduced the number of colour/texture features to 15, ultimately involving three colour spaces (CMYK, RGB and HSV) and six components (cyan, magenta, black, green, blue and saturation).
The results showed that the five relevant geometric features are consistent with the morphological characteristics of all lymphoid cell groups. In particular, the N/C ratio was the main feature, containing the most information regarding the cell type. As seen in figure 3, the two VL groups (VL-HCL and VL-SMZL) and RL showed the lowest values, while AL-FL showed the highest. Previous works of our group also found that the N/C ratio was the most relevant feature in the differentiation among some AL subsets18 and between blast cells and RL.40 Other studies6 12 15 calculated the N/C ratio but did not report its relevance. In addition, the wide spread of values found for the nuclear perimeter in AL-MCL and SC is due to morphological heterogeneity in the former and to the inclusion of small SC in the latter.
As described for the first time in a previous work,16 hairiness (17th feature) was proposed to quantify the cytoplasmic profile in VL-HCL. This work has shown that it can also be used for VL-SMZL. This feature indeed exhibits high specificity for villous lymphocytes.
Analysing the colour/texture features, we may observe interesting relations with the N/C ratio. To the authors' knowledge, no previous studies have quantified the N/C ratio using colour/texture features. Cells with a large cytoplasm and a low N/C ratio (figure 5B, image L) have brighter pixels and show a more symmetric histogram with lower skewness. For this reason, lymphoid cells in which the proportion of cytoplasm is very low, and the N/C ratio therefore high, showed lower correlation values for the fifth and ninth features, both of which contain information about the proportion of bright and dark pixels in the whole cell.
In addition, we found that the SD of the BHC (seventh feature) presented an inverse correlation with the N/C ratio for all lymphoid cell groups. In the case of the E1MHC (eighth feature), lower entropy values corresponded to higher N/C ratio values only in some lymphoid cell groups. To the best of our knowledge, no previous studies have used entropy-based features for AL recognition.
In summary, we found a set of 20 quantitative features that were relevant for the discrimination among 12 lymphoid cell groups. It is important to note that most of them were significant for pairs of cell groups whose recognition by morphology in PB smears can be difficult, such as ML versus AL-CLL, HCL versus SMZL, AL-FL versus AL-MCL, LB versus RL, B-PL versus T-PL and T-PL versus SC. Moreover, five quantitative colour/texture features were significant to discriminate RL related to infections from AL related to neoplasms.
The applicability of this research lies in its potential to reduce the subjectivity associated with PB morphology. Standardising the use of the quantitative features described here could help support differential morphological diagnosis, especially when lymphoid cell immunophenotyping by flow cytometry is inconclusive. New haematological analysers based on image analysis of individual cells are currently under development.41 42 They could benefit from relevant quantitative features, such as those described in this work, to discriminate among reactive, abnormal and blast lymphoid cells. Combining the recognition of these cell groups by image analysis with flow cytometry methods could be an interesting avenue to explore in the near future.
Take home messages
Peripheral blood (PB) film morphology is the first step in detecting reactive lymphocytes (RL), abnormal lymphocytes (AL) and blast cells, and these subsets share subtle morphological features.
The contribution of this paper is to provide a set of 20 new cytological variables (geometric, colour and texture) with the following properties: (1) they have quantitative formulations, (2) they allow qualitative morphological interpretations and (3) they are efficient in discriminating among a significant number of different abnormal lymphoid cell groups.
Most of the 20 features were significant for the discrimination of pairs of lymphoid cell groups whose recognition by morphology in PB smears can be difficult, such as mature lymphocytes versus AL-CLL (chronic lymphocytic leukaemia), hairy cell leukaemia versus splenic marginal zone lymphoma, AL-FL (follicular lymphoma) versus AL-MCL (mantle cell lymphoma), lymphoid blasts versus RL, B prolymphocytes versus T prolymphocytes (T-PL) and T-PL versus Sézary cells.
In addition, new haematological analysers based on image analysis of individual cells, which are currently under development, could benefit from relevant quantitative features, such as those described in this work, to discriminate among AL, blast cells and RL.
This work is part of a research project funded by the Directory of Science, Technology and Innovation of the Ministry of Economy and Competitiveness of Spain, with reference DPI2015-64493-R. LP acknowledges the Polytechnic University of Catalonia for a PhD grant within the Biomedical Engineering Program as well as the International Council for Standardization in Hematology for receiving a Carol Briggs-Smalley Scholarship in 2016.
Contributors All the authors have contributed to this article. The team is a multidisciplinary group involving medical and engineering fields.
Funding This work is part of a research project funded by the Directory of Science, Technology and Innovation of the Ministry of Economy and Competitiveness of Spain, with reference DPI2015-64493-R.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Correction notice This paper has been amended since it was published Online First. Owing to a scripting error, some of the publisher names in the references were replaced with 'BMJ Publishing Group'. This only affected the full text version, not the PDF. We have since corrected these errors and the correct publishers have been inserted into the references.