
Clinical use of WHO haemoglobin colour scale: validation and critique
C F Ingram (1), S M Lewis (2)
(1) Helen Joseph Hospital Laboratory, South African Institute for Medical Research, PO Box 1038, Johannesburg 2000, South Africa
(2) WHO Collaborating Centre for Haematology Technology, Department of Haematology, Hammersmith Hospital, Imperial College School of Medicine, London W12 0NN, UK
  1. Dr Lewis smlenl{at}


Aim—The World Health Organisation (WHO) haemoglobin colour scale has been developed as a simple, inexpensive clinical device for diagnosing anaemia when laboratory based haemoglobinometry is not available. In an initial validation study at several health centres, scale readings were compared with laboratory measurements of haemoglobin. This showed the scale to have 90% sensitivity and 70% specificity in identifying whether or not anaemia was present. In addition, when anaemia was present, its degree was correctly classified in clinical terms as moderate, pronounced, or severe, with an overall sensitivity of 60% and specificity of 88%. Errors were mainly marginal (that is, between two adjacent categories), but there were also some major discrepancies, such as a blood sample with a haemoglobin of 6–7 g/dl being read as normal, or vice versa. Because such errors would compromise the scale's reliability in practice, this study was undertaken to identify the causes of the discrepancies and to reassess the performance of the scale.

Methods—Venous blood samples were collected into potassium EDTA from patients attending selected clinics at three South African hospitals with good laboratory facilities. A prototype of the device was used unsupervised by nursing staff, doctors, and phlebotomists, who were told to follow the printed instructions. The blood specimens were then immediately sent to the laboratory where haemoglobin was measured by standardised automated blood cell counters. Any discrepancies > 1 g/dl were recorded and the tests were repeated by the same operators under supervision of the investigators.

Results—Almost all the errors that occurred resulted from the incorrect use of the device, namely: inadequate or excessive blood, reading the results too soon or too late (beyond the limit of two minutes), poor lighting, or holding the scale at the wrong angle. The accuracy improved dramatically when the tests were repeated under supervision and these faults were avoided: 95% of readings were within 1 g/dl of the reference measurements, and 97% within 1.5 g/dl. Anaemia screening showed 96% sensitivity and 86% specificity. Clinical judgement of pallor was frequently wrong, whereas the scale gave the correct diagnosis in more than 97% of cases.

Conclusion—The study confirmed the usefulness and reliability of the scale and its advantage over clinical signs for the diagnosis of anaemia, thus providing a clinically reliable near patient method in the absence of a laboratory. The instructions are easy to follow but must be strictly adhered to.

  • diagnosis of anaemia
  • haemoglobinometry
  • haemoglobin colour scale
  • clinical assessment of anaemia


A simple and inexpensive device for providing a reliable indication of the presence and severity of anaemia would be of considerable value in situations where laboratory based haemoglobinometry is not readily available. The simplest method is to match the colour of a drop of blood on absorbent paper against a colour scale; however, previous devices of this type have been too unreliable to serve their intended purpose.

Stott and Lewis investigated the various factors responsible for the wide margin of error occurring with the available colour scales.1 This led them, in collaboration with Wynn,2 to identify Whatman 31ET chromatography paper as suitable for the preparation of test strips of 45 × 15 mm on which to absorb drops of blood. They established a set of standards consisting of blood samples whose haemoglobin concentration was measured by spectrometry with the ICSH (International Committee for Standardisation in Haematology; Expert Panel on Haemoglobinometry) reference method,3 and adjusted to obtain a range of 4–14 g/dl in 2 g/dl steps. (It was thought that the intended users of the scale would be more familiar with this traditional unit for haemoglobin than with g/l.) The spectral characteristics of the colour produced by a drop of blood on a test strip from each of the haemoglobin standards were identified by a computerised analytical spectrophotometer. These specifications were reproduced in light resistant printing inks that were prepared from pigments of the three primary colours and a neutral diluent. The colour shades were then printed at a defined ink density on sheets of paper which, to avoid affecting the ink colour, were chemically neutral, unbleached, chlorine free, and resistant to UV light. After the ink was spread the paper was dried and varnished. The colour strips were mounted on a neutral grey surround with a PVC backing, which allowed any traces of blood to be wiped easily off the back of the scale. The viewing area was restricted to a circular aperture of 8–9 mm diameter in the centre of each colour standard (fig 1). Optimal conditions were established for colour matching: the best results were obtained with the scale held at an angle of about 45° with the light (daylight or artificial tungsten or fluorescent light) coming over the shoulder of the observer.
Although the scale is a clinical device, not intended to be compared with laboratory based measurement, the latter was necessary to provide reference measurements for validating the reliability of the scale. By this means, a laboratory based study on 1213 blood samples showed that with training it was possible to detect anaemia reliably with the scale, and to classify the degree of anaemia correctly in clinical terms as moderate (8 to < 12 g/dl), pronounced (6 to < 8 g/dl), or severe (< 6 g/dl).2

Figure 1

The World Health Organisation haemoglobin colour scale.

To test the feasibility of using the scale in field conditions a study was undertaken in a rural hospital with a group of nurses, medical students, and untrained assistants. After a 30 minute training session, they were able to identify the degree of anaemia correctly in a batch of blood samples, albeit with some individual discrepancies in readings.4 The device was also used with satisfactory results in a WHO training course on malaria control,5 peripheral health clinics in the Gambia,6 a rural antenatal clinic in Malawi,7 and a survey of anaemia in preschool children.8

An international study with the prototype device was organised by the WHO Programme on Health Technology to assess its validity when used by health clinic staff who were provided with detailed written instructions but no individually supervised training. Scale readings were judged against the measurement of haemoglobin on the blood samples by laboratories using haemoglobinometers calibrated with the international haemiglobincyanide standard. A total of 3600 tests were performed at the clinics and 2800 at blood donor sessions. The results were similar to those in the original evaluation,2 with 90% sensitivity and 70% specificity in identifying whether or not anaemia was present. In blood donor screening, 97% were correctly identified at a 12 g/dl discrimination value. When anaemia was present, its severity was assessed as moderate, pronounced, or severe with an overall sensitivity of 60% and specificity of 88% (unpublished report to the WHO). There were also some unexpected mistakes: in some centres there were differences as great as 3–4 g/dl within an otherwise accurate batch of readings in a run, and although these discrepancies occurred in only a small proportion of the total number of tests, they called into question the reliability of the scale. Accordingly, a study was devised to assess the usefulness of the scale in practice at busy health clinics.


The project was carried out with the prototype scale in general medical and antenatal clinics at three hospitals in the neighbourhood of Johannesburg. The participants included nursing staff (senior nurses, midwives, student nurses), doctors, and phlebotomists. The tests were performed on a total of 548 patients who presented routinely at various outpatient clinics. Using venous blood samples collected by a phlebotomist, the participants measured the haemoglobin with the scale independently and without supervision, and recorded their readings on a report form (“initial readings”). The specimens and forms were then sent without delay to the hospital laboratory (a branch of the South African Institute for Medical Research), where the haemoglobin was measured on an automated blood cell analyser calibrated by reference to the ICSH international haemiglobincyanide (HiCN) standard.3 These results were entered on the report forms, which were then reviewed by the investigators (CFI and SML). Discrepancies of > 1 g/dl between the scale reading and the haemoglobin measurement were noted, and the test was repeated on the original specimens by the same participants (who were not told the laboratory results), but now under the supervision of the investigators, who watched for any discrepancies in technique and for any difficulty in distinguishing between the colour shades.

It was impractical to retest the entire set of specimens, but about 10% of the correctly read specimens were included randomly in the batches for the repeated tests. None of these gave different repeat readings; this was consistent with the intra-observer reproducibility (SD, < 1 g/dl) recorded in the original study.2
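The selection of specimens for supervised retesting described above (every specimen with a discrepancy of > 1 g/dl, plus roughly 10% of the correctly read specimens mixed in at random) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the investigators' procedure: the function name `build_retest_batch`, the tuple layout of the records, and the shuffling step are all assumptions.

```python
import random

def build_retest_batch(records, threshold=1.0, control_fraction=0.10, seed=None):
    """Select specimens for supervised repeat testing (hypothetical helper).

    records: list of (specimen_id, scale_g_dl, reference_g_dl) tuples.
    Every specimen whose scale reading differs from the reference by more
    than `threshold` g/dl is retested; roughly `control_fraction` of the
    concordant specimens are drawn at random and mixed in as controls.
    """
    rng = random.Random(seed)
    discrepant = [sid for sid, scale, ref in records if abs(scale - ref) > threshold]
    concordant = [sid for sid, scale, ref in records if abs(scale - ref) <= threshold]
    n_controls = round(len(concordant) * control_fraction)
    controls = rng.sample(concordant, n_controls)
    batch = discrepant + controls
    rng.shuffle(batch)  # assumption: hide which specimens were originally discrepant
    return discrepant, controls, batch

# Illustrative records only (not study data).
records = [("S1", 10, 10.4), ("S2", 12, 8.6), ("S3", 6, 7.5), ("S4", 8, 8.9)]
records += [(f"C{i}", 10, 10.2) for i in range(20)]
discrepant, controls, batch = build_retest_batch(records, seed=1)
print(discrepant)  # ['S2', 'S3']
```

A difference of exactly 1 g/dl is treated as concordant here, matching the "> 1 g/dl" criterion in the text.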

At some of the clinics the clinicians (doctors or nurses) assessed clinical signs of anaemia, examining conjunctiva, palms, nail bed, and gums for pallor, in accordance with their standard routine practice. These observations were judged against the laboratory haemoglobin measurements on the patients' venous blood samples as reference; a comparison was made with the ability of the scale to identify anaemia correctly.


There were two sets of results for the scale: initial readings and repeat readings. The ICSH reference method was used for haemoglobinometry.3 These measurements were recorded in g/dl to one decimal place, but values above 14 g/dl were analysed as 14.0. Tables 1 and 2 compare the scale readings with the reference measurements.
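Because the darkest shade on the scale corresponds to 14 g/dl, reference values above it are analysed as 14.0. A hedged Python sketch of that capping step, followed by the "within 1 g/dl" and "within 1.5 g/dl" agreement proportions quoted in the abstract, is shown below; the function names and example values are hypothetical.

```python
def cap_reference(values, ceiling=14.0):
    """Analyse reference haemoglobins above the top shade (14 g/dl) as 14.0."""
    return [min(v, ceiling) for v in values]

def agreement(scale, reference, tolerance):
    """Fraction of paired readings whose absolute difference is within `tolerance` g/dl."""
    if len(scale) != len(reference) or not scale:
        raise ValueError("need equal-length, non-empty paired lists")
    # The small epsilon stops floating-point noise from excluding an exact boundary case.
    within = sum(1 for s, r in zip(scale, reference) if abs(s - r) <= tolerance + 1e-9)
    return within / len(scale)

# Illustrative values only (not study data).
scale = [10, 12, 8, 6, 14]
reference = cap_reference([10.4, 10.9, 9.0, 8.6, 15.2])
print(agreement(scale, reference, 1.0))   # 0.6
print(agreement(scale, reference, 1.5))   # 0.8
```

Without the cap, the last pair (scale 14, reference 15.2) would count as a 1.2 g/dl disagreement even though the scale cannot read above 14 g/dl.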

Table 1

Comparisons of scale and reference measurements

Table 2

Discrepancies between scale and reference haemoglobins in individual cases

Paired t test of scale v reference measurements: initial scale readings, 0.3743; repeat scale readings, 0.0805.

These results indicate a 70% probability that there is no significant difference between the reference measurements and the initial scale readings, and a 95% probability of no significant differences with the repeat readings. This indicates the occurrence of some random errors in the initial readings, with no specific bias.
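The text does not state whether 0.3743 and 0.0805 are t statistics or p values. If they are t statistics from about 548 pairs, a normal approximation gives two-sided p values of roughly 0.71 and 0.94, close to the 70% and 95% quoted; this interpretation is an assumption. (Strictly, a p value is the probability of a mean difference at least this large if there were no true bias, not the probability that no difference exists.) A minimal Python sketch of the paired t test, with hypothetical helper names, is:

```python
import math
from statistics import mean, stdev

def paired_t(scale, reference):
    """Return (t, degrees of freedom) for the paired differences scale - reference."""
    diffs = [s - r for s, r in zip(scale, reference)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero spread; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

def two_sided_p_normal(t):
    """Two-sided p value from the normal approximation, 2 * (1 - Phi(|t|)).

    Reasonable for large samples such as n = 548; small samples need the t distribution.
    """
    return math.erfc(abs(t) / math.sqrt(2))

# Assumption: the reported 0.3743 and 0.0805 are t statistics.
for label, t in [("initial", 0.3743), ("repeat", 0.0805)]:
    print(label, round(two_sided_p_normal(t), 2))  # initial 0.71, repeat 0.94
```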


Diagnostic usefulness was assessed by comparing scale readings with the reference haemoglobin measurements in four clinically demarcated groups, namely: no anaemia (≥ 12 g/dl), moderate anaemia (8 to < 12 g/dl), pronounced anaemia (6 to < 8 g/dl), and severe anaemia (< 6 g/dl), with the repeat readings used wherever there had previously been a discrepancy (table 3).
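The four clinical groups can be expressed as a simple classification, and cross-tabulating the group assigned by the scale against the group given by the reference measurement shows where misclassifications fall (off-diagonal cells, usually between adjacent groups). The following Python sketch is illustrative; the function names and labels are hypothetical, while the cut-offs are those stated above.

```python
from collections import Counter

def classify_hb(hb_g_dl):
    """Assign a haemoglobin value (g/dl) to one of the study's four clinical groups."""
    if hb_g_dl >= 12:
        return "no anaemia"
    if hb_g_dl >= 8:
        return "moderate"     # 8 to < 12 g/dl
    if hb_g_dl >= 6:
        return "pronounced"   # 6 to < 8 g/dl
    return "severe"           # < 6 g/dl

def cross_tabulate(scale, reference):
    """Count (reference group, scale group) pairs; off-diagonal cells are misclassifications."""
    return Counter((classify_hb(r), classify_hb(s)) for s, r in zip(scale, reference))

# Illustrative values only (not study data).
table = cross_tabulate(scale=[10, 12, 6, 5], reference=[10.4, 11.5, 7.2, 5.8])
print(table[("moderate", "no anaemia")])  # 1: a marginal error between adjacent groups
```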

Table 3

Diagnostic usefulness and predictive values at different haemoglobin (Hb) values using best scale readings where discrepancies occurred (n = 548)

Diagnostic sensitivity (the proportion of anaemic patients whom the scale identified as anaemic) and positive predictive value (the proportion of positive scale results that were correct) describe how reliably anaemia is detected, whereas diagnostic specificity (the proportion of non-anaemic patients whom the scale identified as non-anaemic) and negative predictive value (the proportion of negative scale results that were correct) are the corresponding indices for negative results.9 The results in table 3 indicate a high degree of accuracy both in identifying when anaemia was present and in estimating its degree at the clinical cut off points. The likelihood ratio was calculated by Youden's method (sensitivity + specificity − 1),10 for which values from +0.1 to +1.0 indicate an increasing probability that the test gives correct diagnostic information; the results show that the ratio was satisfactorily high at all values, especially in anaemia with haemoglobin < 6 g/dl.
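The indices above all derive from a 2 × 2 table of scale result against reference result at a given cut off. The Python sketch below assumes that a value below the cut off counts as "positive" (anaemic) and computes Youden's index as J = sensitivity + specificity − 1; the function names and the counts in the example are hypothetical, chosen only for illustration.

```python
def counts_at_cutoff(scale, reference, cutoff=12.0):
    """Return (tp, fp, fn, tn), treating a value below `cutoff` g/dl as 'anaemic'."""
    tp = fp = fn = tn = 0
    for s, r in zip(scale, reference):
        test_pos, truly_pos = s < cutoff, r < cutoff
        if test_pos and truly_pos:
            tp += 1
        elif test_pos:
            fp += 1
        elif truly_pos:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def _ratio(num, den):
    return num / den if den else float("nan")  # undefined when the denominator is empty

def summarise(tp, fp, fn, tn):
    """Sensitivity, specificity, predictive values, and Youden's index J."""
    sens = _ratio(tp, tp + fn)  # anaemic patients read as anaemic
    spec = _ratio(tn, tn + fp)  # non-anaemic patients read as non-anaemic
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),  # positive readings that were correct
        "npv": _ratio(tn, tn + fn),  # negative readings that were correct
        "youden_j": sens + spec - 1,  # ranges from -1 to +1
    }

# Hypothetical counts for illustration only.
print(summarise(45, 5, 5, 45)["youden_j"])
```

The same functions can be applied at each clinical cut off (12, 8, and 6 g/dl) in turn.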


Tables 4 and 5 compare the scale with the clinical ability to diagnose anaemia from physical signs. In severe anaemia (haemoglobin, < 6 g/dl) no judgmental errors were made on physical examination, but with higher haemoglobins the physical features were increasingly difficult to detect and to interpret. Even when haemoglobin was normal, pallor was diagnosed in 16.6% of cases, whereas only 3.4% of the normals were recorded by the scale as having mild, clinically unimportant anaemia, with haemoglobin readings of 10 or 11 g/dl. On the other hand, when the reference measurements indicated moderate anaemia (haemoglobin, 8–9 g/dl), this was not detected in one third of the patients by physical examination, whereas the scale identified moderate anaemia correctly in 97% of cases.

Table 4

Assessment of clinical signs of anaemia (n = 335)

Table 5

Comparison of clinical signs and colour scale


The haemoglobin colour scale was developed as a clinical device to provide a means for identifying whether an individual is anaemic and, in a broad classification, the severity of anaemia. It does not aim to compete with a haemoglobinometer in the laboratory, but it is intended for use when the latter is not available or practical.

Previous studies, including a multicentre evaluation organised by the WHO Programme on Health Technology, indicated its clinical reliability in general medical and antenatal clinics for detecting anaemia and for discriminating between moderate, pronounced, and severe anaemia. A study of anaemia in pregnancy in Malawi7 provided broadly similar results, with a likelihood ratio of correct diagnosis from 1.4 to 3. However, in all these studies there were some unexpected discrepancies and even some gross mistakes. Thus, for the scale to be depended upon at all times, it became necessary to investigate the possible causes of such faulty readings.

Our present study was undertaken in medical and antenatal clinics in the neighbourhood of Johannesburg with medical and nursing staff and phlebotomists. Initially, they performed the tests on their own in accordance with the written instructions used in the earlier WHO study, and some of them also received a brief preliminary demonstration. Liaison with the laboratory ensured that the reference haemoglobin measurements were obtained soon after the scale was read. Where there were discrepancies, the scale test was repeated on the original blood by the original users under expert supervision. In some cases, the participants found it helpful to undertake an exercise with blood samples of known haemoglobin values before repeating the test. This checking resulted in a significant reduction in reading errors (tables 1 and 2), confirming that the scale is fundamentally reliable as long as it is used correctly. Accordingly, it was considered justifiable to use the repeat measurements in assessing the usefulness of the scale. From these results (table 3), it can be seen that the scale is sensitive and specific, with a high degree of discrimination. However, to ensure its reliability in practice it was important to identify the causes of the errors and to incorporate this information in the instructions to users.

Several factors appear to contribute to incorrect readings. (1) The instructions state that colour matching must be made only after waiting for at least 30 seconds, and that the test must be completed within two minutes because the blood stain changes colour (becoming paler) after this time. In most cases where an error occurred, it was the result of not waiting for the requisite time; a wait of one minute has subsequently been proposed. Conversely, in some cases readings were delayed beyond the stipulated two minutes, and it was necessary to stop users from adding drops of blood to the test strips for several consecutive tests at the same time. (2) The size of the drop is important. In a small number of cases the initial reading was made on a test strip with too little blood, leading to inadequate spread, with a white periphery at the adjacent matching area. Conversely, too much blood led to a thick spread and insufficient drying in the prescribed time. The solution to this problem was to take up blood to a distance of about 3 cm in a capillary tube for delivery on to the test strip. (3) It is essential to have good light. This can be either daylight (but not direct sunlight) or artificial light, or a mixture of both. The study was carried out in midsummer, but outside light varied from bright sunshine to heavy rain clouds. The tests were performed in clinics with daylight coming through windows of different sizes, supplemented by fluorescent light. Scale readings were not affected by any of these variable conditions except on one occasion, when the tests were performed in a poorly lit basement examination room: 30% of the results were discrepant, but they were corrected when the same operators repeated the tests after moving to a better lit room. Problems also occurred when users stood in their own shadow or did not hold the scale at the recommended viewing angle.
Similarly, there were occasional errors from failure to ensure close apposition of the test strip to the apertures on the scale, because any gap cast a shadow on the colour strip. (4) Some training is required, albeit minimal: a careful understanding of the instructions and a few minutes of practice with blood samples of known haemoglobin content are all that is needed.
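The procedural causes of error listed above lend themselves to a pre-reading checklist. The following Python sketch is hypothetical: the field names, the function `check_reading`, and the idea of encoding the instructions as a checklist are assumptions for illustration, whereas the 30 second minimum, the proposed one minute alternative, and the two minute limit are taken from the text.

```python
from dataclasses import dataclass

@dataclass
class ReadingConditions:
    """Hypothetical record of how a single scale reading was made."""
    seconds_since_blood_applied: float
    good_light: bool                  # daylight (not direct sunlight) and/or artificial light
    scale_at_recommended_angle: bool
    strip_flush_with_aperture: bool   # a gap casts a shadow on the colour strip
    white_periphery_visible: bool     # too little blood: inadequate spread
    spot_still_wet: bool              # too much blood: thick spread, not dry in time

def check_reading(c, min_wait_s=30, max_wait_s=120):
    """Return the procedural faults described in the study (empty list = no fault found)."""
    faults = []
    if c.seconds_since_blood_applied < min_wait_s:
        faults.append("read too soon")
    elif c.seconds_since_blood_applied > max_wait_s:
        faults.append("read too late")
    if not c.good_light:
        faults.append("poor lighting")
    if not c.scale_at_recommended_angle:
        faults.append("wrong viewing angle")
    if not c.strip_flush_with_aperture:
        faults.append("strip not flush with aperture")
    if c.white_periphery_visible:
        faults.append("too little blood")
    if c.spot_still_wet:
        faults.append("too much blood")
    return faults

ok = ReadingConditions(45, True, True, True, False, False)
print(check_reading(ok))                  # []
print(check_reading(ok, min_wait_s=60))   # ['read too soon'] under the proposed one-minute wait
```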

Clerical errors, such as incorrect entry of results and specimen mix up, are recognised as major causes of error in any laboratory; such errors occurred on two occasions in our study during the stress of busy clinics. This problem is not unique to scale readings and should not be attributed to failure of the scale.

The relative frequency of the different faults does not imply that some are more important than others, but only indicates how individuals performed. It demonstrates the importance of adhering to the instructions, which take account of all the causes of error listed above.

Because the scale is intended to be a clinical facility, the study included a comparison with the clinical ability to diagnose anaemia from physical signs, based on the presence of pallor. With severe anaemia the clinical features were clearly positive, but with moderate anaemia pallor was difficult to detect: when the reference haemoglobin was 8–9 g/dl, anaemia was not detected in one third of the patients by physical examination, whereas the scale identified moderate anaemia correctly in 97% of cases. Conversely, pallor was recorded (misleadingly) in 16.6% of cases when haemoglobin was normal, whereas 96.6% were normal by the scale reading, and only 3.4% were read as mild anaemia with haemoglobin readings of 10–11 g/dl.

Without exception the scale was well received, and the clinic staff indicated their wish to use it as soon as possible in their routine practice. Most participants found the scale and the test strips easy to handle, and the test easy to perform and user friendly. Some found it difficult to distinguish between 10 and 12 g/dl. Consequently, these adjacent shades have been adjusted in the production version of the scale to ensure that discrimination will be easier. Confidence in selecting the correct shade increased with experience and most users had no difficulty in judging intermediate values.


Our study confirmed the usefulness and reliability of the scale, its great advantage over clinical signs for diagnosing anaemia, and its value as a reliable near patient method for screening for anaemia in the absence of a laboratory. Most health workers should be able to obtain reliable readings of haemoglobin, accurate to within 1 g/dl, as long as the correct procedure is strictly adhered to. It is useful, but not essential, to have a brief introductory training session with a batch of blood samples of known haemoglobin values, and even trained users might find it helpful to have a brief refresher course with such bloods from time to time. It is essential to accompany the scale and test strips with a leaflet containing clearly written instructions for use, emphasising the potential causes of error. This study has been valuable in identifying these errors, and thus provides guidance both for the specifications of the scale and for the correct technique for its use.


We thank Professor J Hofmeyr, Head of Obstetrics at Coronation Hospital; Professor Van Gelderen, Head of Obstetrics at Chris Hani Baragwanath Hospital; and Professor MacPhail, Head of Medicine at Helen Joseph Hospital for providing the facilities to carry out this study. We are grateful to Dr J Hall, Consultant Obstetrician at Chris Hani Hospital for her support, and the nursing staff, phlebotomists, clinicians, and laboratory technologists at the three centres who made the study possible by their collaboration and enthusiastic support, despite having to cope simultaneously with their heavy routine workload.