
Design of a real time quantitative PCR assay to assess global mRNA amplification of small size specimens for microarray hybridisation
V Choesmel, F Foucault, J P Thiery, N Blin

UMR144 CNRS, Research Division, Institut Curie, 75248 Paris cedex 05, France

Correspondence to: Dr N Blin, UMR144 CNRS, Research Division, Institut Curie, 75248 Paris cedex 05, France


Background: Low RNA yields from clinical samples are a limiting step for microarray technology.

Aims: To design an accurate real time quantitative polymerase chain reaction (PCR) assay to assess the crucial step of global mRNA amplification performed before microarray hybridisation, using less than 1 µg total RNA.

Methods: Three RNA extraction procedures were compared for small size samples. Total RNA was amplified from universal RNA or the BC-H1 breast cancer micrometastatic cell line using three different protocols. Real time quantitative PCR technology was used for accurate measurement of urokinase plasminogen activator receptor and cytokeratin 8 RNA amplification rates and ratios, using primer sets binding at various distances from the 3′ end of transcripts. A 50 mer oligomeric array targeting 87 genes potentially involved in breast cancer metastatic progression was built and hybridised with amplified RNA.

Results: Eighteen nanograms of total RNA could be purified from 1000 BC-H1 micrometastatic cells. Amplification rates of 25 000 to 100 000 were achieved with as little as 10 ng of starting material. However, results were highly variable, depending on the amount of starting material, gene characteristics, sample quality, and protocols used. Oligomeric array hybridisation with 20 µg reference RNA resulted in specific and reproducible signals for 83% of the genes, whereas mRNA amplification from less than 400 ng of starting material resulted in selective detection of signals from highly expressed genes.

Conclusions: Improvements in the design of global mRNA amplification procedures and oligomeric arrays are needed to extract informative gene expression data from clinical samples containing limited cell numbers.

Abbreviations:

  • aRNA, antisense RNA
  • BM, bone marrow
  • Ct, cycle threshold
  • GAPDH, glyceraldehyde 3-phosphate dehydrogenase
  • IVT, in vitro transcription
  • Ker8, cytokeratin 8
  • PCR, polymerase chain reaction
  • RT, reverse transcription
  • rtqPCR, real time quantitative PCR
  • uPAR, urokinase plasminogen activator receptor
  • uRNA, universal RNA

Keywords:

  • amplified RNA
  • in vitro transcription
  • real time quantitative PCR
  • oligomeric array
  • expression profiling


It is currently thought that human cancer cells acquire the hallmarks of malignancy by accumulating gene activation and inactivation events over long periods of time.1 Gene expression profiling by means of cDNA or oligonucleotide arrays has opened up a new field in genomic research, making it possible to study several thousand genes at once. Diverse applications have been found for this novel microtechnology, particularly in the area of breast cancer research and treatment.2–5 However, microarray analyses are still based on the assumption that a tumour is a single entity. It has been shown that specific genes are correlated with metastatic potential, but this discovery cannot be extended further unless the metastatic step in which the gene is involved is identified at the cellular level. Ideally, pure cell population identification and separation techniques combined with array based approaches could be used to define accurately the role of specific gene products in individual steps of the metastatic progression.


An important challenge in this field of investigation is the gene expression profiling of human carcinoma micrometastases. The detection of occult tumour cells disseminated in bone marrow (BM) has been reported as an independent prognostic factor for the risk of relapse in patients with breast cancer.6,7 Thus, the identification of new molecular markers for the detection and targeting of micrometastatic cells may be of clinical relevance and crucially important to our understanding of the molecular mechanisms involved in the spread of cancer. During the past decade, many studies have used the technique of reverse transcription polymerase chain reaction (RT-PCR) targeting genes such as EGP-2, cytokeratin 19, mammaglobin, MAGE-A, and MUC1 to detect micrometastases in blood, BM, and lymph nodes.8–10 However, the ectopic expression of target genes or the presence of pseudogenes led to inherent limitations in studying such rare cells.11–13 For gene expression profiling on microarrays, there are major technical difficulties, such as the limited number of tumour cells that can be immunopurified from patients’ BM aspirates and the generation of RNA samples of sufficient quality and size. Typical cDNA microarray labelling procedures require 10–20 µg total RNA. This amount of RNA can be obtained from samples of human tissue weighing at least 50 mg or from approximately 10⁷ isolated cells. However, breast cancer biopsy samples generally weigh 10–25 mg and yield only 3–5 µg of total RNA. A technique for the preparation of labelled cDNA probes from 0.5 to 1 µg RNA without the need for signal or template amplification has been reported,14 but this procedure is inappropriate for use in the study of clinical samples as rare as BM micrometastases.
One way of circumventing the small amount of RNA available from specimens involves the use of indirect cDNA labelling protocols, optimised to increase sensitivity and decrease variability, in which aminoallyl modified nucleotides are incorporated and subsequently coupled to fluorescent Cy3 or Cy5 dyes.14,15

RNA amplification provides another alternative, and requires less material than techniques based on signal amplification. The two most frequently used RNA amplification technologies are RT-PCR and T7 based global mRNA amplification procedures. Exponential amplification in RT-PCR is believed to distort abundance relations because cDNAs differing in length and composition, and transcripts differing in abundance, will probably be amplified with different efficiencies. The T7 based global mRNA amplification technique, commonly referred to as the Eberwine procedure,16 is designed to minimise such bias. Typically, the generation of aRNA (antisense RNA) is preceded by first strand cDNA synthesis, using an oligonucleotide primer containing a bacteriophage T7 RNA polymerase promoter proximal to an oligo(dT) sequence. After second strand cDNA synthesis and purification, an in vitro transcription (IVT) reaction is performed, using T7 RNA polymerase in the presence of labelled nucleotides. This method was first described for use with 1–10 µg total RNA, and is currently preferred because RNA polymerase activity is generally not affected by template sequence or concentration in a complex transcription mixture. Indeed, this procedure has been shown to provide information about the relative abundance of transcripts,17,18 and is particularly useful if only limited amounts of RNA are available for gene expression profiling.

Although several laboratories have used RNA amplification before microarray hybridisation, the accurate identification of outlying genes by the global mRNA amplification approach has not been fully validated. Moreover, few quantitative studies of differences in transcript ratios between amplified RNA and total RNA have been carried out, and no consensus has been reached concerning the basis of observed results. We describe here a comparative study of the T7 based global mRNA amplification method, using breast cancer cell lines and limited amounts of nucleic acids to mimic immunopurified BM micrometastases in experimental conditions. We carried out a series of amplification reactions based on two protocols optimised for small size samples. The amplified RNA was first hybridised to a 16 mm² oligomeric array targeting 87 genes potentially involved in breast cancer metastasis. This made it possible to evaluate this custom built microarray and to determine the minimum amount of aRNA required for global gene expression profiling. We then designed and carefully standardised a real time quantitative PCR (rtqPCR) assay, using 3′, 500 bp distant from 3′ (midtranscript), and 5′ primer sets targeting two breast tumour cell genes encoding the urokinase plasminogen activator receptor (uPAR) and cytokeratin 8 (Ker8), with the glyceraldehyde 3-phosphate dehydrogenase (GAPDH) reporter gene used as the reference. This made it possible to analyse the reproducibility, reliability, and sensitivity of one or two rounds of global mRNA amplification.


RNA purification

The RNA used in the experiments was obtained from three different sources. The universal human reference RNA (uRNA) (Stratagene, La Jolla, California, USA) is composed of total RNA extracted from 10 human cell lines. It approximates the expression profile of most human genes and was used for the standardisation of all new procedures and for the normalisation of each rtqPCR experiment. Total RNA was also extracted from two breast cancer cell lines: the MCF7 metastatic line and the BC-H1 micrometastatic line (kindly provided by Professor K Pantel, Institute of Tumour Biology, Hamburg, Germany). MCF7 cells were cultured according to the ATCC (Rockville, Maryland, USA) protocol and BC-H1 cells were cultured as described in a previous study.19 We compared three methods in terms of their efficiency for extracting high quality total RNA from 10³ to 10⁶ cells. All three RNA purification methods were derived from Chomczynski’s procedure. They were based on Trizol (Invitrogen, Carlsbad, California, USA) or RNA-Plus (Qbiogene, Carlsbad, California, USA) reagents, or the Absolutely RNA Nanoprep kit (Stratagene) recommended for very small volume samples. Pellet Paint Co-Precipitant (Novagen, Madison, Wisconsin, USA) was added as a visible dye labelled carrier for the precipitation of nucleic acids. The yield and integrity of extracted total RNA were determined with a Kontron 810 spectrophotometer (BioServ, Rostock, Germany), and the RNA 6000 Nano LabChip kit combined with the Agilent 2100 bioanalyser (Agilent, Palo Alto, California, USA).

RNA amplification and labelling

RNA was amplified using protocols derived from the T7 based Eberwine procedure optimised for small amounts of RNA.17 The first protocol was that of the MessageAmp aRNA Manual Version 0110 from Ambion (Austin, Texas, USA). The second protocol was the GeneChip eukaryotic small sample preparation protocol from Affymetrix (Santa Clara, California, USA; provided as supplemental data online). The third protocol was an optimised version of the Affymetrix method with two additional steps. First, double stranded cDNA was extracted in phenol/chloroform/isoamyl alcohol in Phase Lock Gel™ tubes (PolyLabo, Eppendorf, Germany) and then precipitated in ethanol. Second, before further purification, the IVT product was treated with DNase I to remove the contaminating double stranded cDNA. All three protocols proceed in five steps: (1) first strand cDNA synthesis using T7 oligo(dT) primers, (2) second strand cDNA synthesis, (3) cDNA purification, (4) in vitro transcription, and (5) aRNA purification. The only differences lie in the methods used to purify cDNA and aRNA, or are related to enzyme and reagent sources. In both protocols, the IVT procedure is performed with the MEGAscript™ kit (Ambion). Although two different IVT incubation times are recommended (6–14 hours for Ambion and four hours for Affymetrix), they were standardised to six hours so that the protocols could be compared.

One or two rounds of amplification were carried out for each of the three procedures, and aRNA products were measured at each step using Agilent technology. RNA was labelled by adding aminoallyl labelled UTP (Sigma, St Louis, Missouri, USA) during the IVT step and by coupling the aminoallyl groups to cyanine Cy3 or Cy5 ester fluorescent dyes (Molecular Probes, Eugene, Oregon, USA), following the protocol of the Institute for Genomic Research.

Oligomeric array design, hybridisation, and analysis

Eighty seven target genes (the list of genes is supplied as a supplemental file available online) were chosen based on their expression pattern in breast cancer cell lines, as reported in microarray studies.2,4 We designed, aligned, and selected 185 specific 50–55 mer oligonucleotides with Oligo Primer Analysis version 4.0 software (National Biosciences, Plymouth, Minnesota, USA) and the annotated RefSeq database (NCBI, Bethesda, Maryland, USA). Oligonucleotides (50–100 pmoles) were covalently bound to 3D-link™ activated glass slides (Surmodics, Eden Prairie, Minnesota, USA) using a Microgrid II TAS (Biorobotics, Cambridge, UK) spotter. Six 16 mm² matrices of 196 spots each, with spots measuring 200–250 µm in diameter and separated by 300 µm, were spotted on to each slide under sterile conditions, at 20°C, 60% humidity. Oligonucleotides were spotted in duplicate, and a series of control oligonucleotides targeting three different Arabidopsis thaliana genes (negative controls) and the β actin and GAPDH genes (positive controls) were set in different locations of the matrix. Postcoupling and hybridisation of the oligomeric array were performed as described in the Codelink Activated Slides Protocol. Slides were scanned with ScanArray ExpressHT (Packard Biosciences, Perkin Elmer, Boston, Massachusetts, USA) at a 5 µm/pixel resolution, and data were acquired from the images with GenePix Pro 3.6 software (Axon Instruments, Union City, California, USA). The intensity of each spot was corrected by subtracting the mean intensity of the pixels in the local background from the mean intensity of the pixels in the spot.
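The local background correction described for each spot can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the GenePix implementation: the function name and the pixel values are hypothetical, and the pixel lists are assumed to have been extracted already from the scanned image.

```python
def corrected_spot_intensity(spot_pixels, background_pixels):
    """Mean spot pixel intensity minus mean local background pixel intensity."""
    spot_mean = sum(spot_pixels) / len(spot_pixels)
    background_mean = sum(background_pixels) / len(background_pixels)
    return spot_mean - background_mean


# Hypothetical pixel intensities for one spot and its local background.
spot = [1200, 1180, 1220]
background = [100, 90, 110]
signal = corrected_spot_intensity(spot, background)
```

A corrected value close to zero, or negative, would suggest that the spot cannot be distinguished from its surroundings.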

Quantitative real time PCR

We carried out rtqPCR experiments with the AbiPrism® 7900HT sequence detection system (Applied Biosystems, Foster City, California, USA) using SYBR® green technology. The coding sequences of the human GAPDH, uPAR, and Ker8 genes were retrieved from the NCBI database, and primers (EuroGentec, Seraing, Belgium) were designed with the Oligo Primer Analysis 4.0 software. We generated cDNA from uRNA or aRNA samples (1/10th of the sample), using Superscript II RNase H− reverse transcriptase (Invitrogen) and a 50 : 50 ratio of random primer to oligo(dT)12–18 primer (Invitrogen). We carried out rtqPCR with 1/10th of the reverse transcription products, in a total volume of 25 µl, using the AmpliTaq Gold® DNA polymerase and the SYBR® Green master mix (Applied Biosystems), in the presence of 600 nM sense and antisense primers. Amplification was carried out in a thermal cycler under the following conditions: heating for two minutes at 50°C, followed by 10 minutes at 95°C, and then 40 cycles of denaturing for 15 seconds at 95°C followed by annealing/extension for one minute at 60°C. Baseline fluorescence levels were determined (normalised background fluorescence of cycles 3–11 for GAPDH and 3–15 for uPAR and Ker8), and calibration curves were generated for each gene and primer set, using 0.1, 1, 10, and 100 ng total RNA. The overall efficiency of rtqPCR was calculated from the gradient of the standard curve (determinations were carried out in triplicate), and dissociation plots were systematically derived to check for product formation. The initial template molecules in the samples were measured in duplicate and expressed as the mean (SD). Negative controls for reverse transcription (no reverse transcriptase or no RNA) and for rtqPCR (A thaliana primer sets or no cDNA), and positive controls for each rtqPCR (cDNA prepared from 1 µg uRNA) were included in each experiment.
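The calculation of efficiency from the gradient of a standard curve can be sketched as follows. The sketch assumes the conventional relation E = 10^(−1/slope) − 1 for a plot of Ct against log10 of the input amount; the text does not state which formula was used, so this is an assumption. The Ct values are invented for illustration and the function names are hypothetical.

```python
import math


def fit_standard_curve(inputs_ng, ct_values):
    """Least squares fit of Ct against log10(input amount); returns (slope, intercept)."""
    xs = [math.log10(q) for q in inputs_ng]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency_from_slope(slope):
    """Assumed conventional formula: a slope of about -3.32 corresponds to 100%."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Read an unknown sample's input amount (ng) off the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


# Calibration points as in the text (0.1, 1, 10, 100 ng); Ct values are invented.
inputs_ng = [0.1, 1.0, 10.0, 100.0]
ct_values = [28.32, 25.00, 21.68, 18.36]
slope, intercept = fit_standard_curve(inputs_ng, ct_values)
efficiency = efficiency_from_slope(slope)
```

With a fitted curve in hand, `quantity_from_ct` gives the amount of template in a sample from its measured Ct, which is how amounts in nanograms can be derived for each primer set.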


RNA purification and amplification from limited numbers of cells

We compared three RNA purification protocols in terms of their efficiency for the extraction of high quality total RNA from 10⁶, 10⁵, 10⁴, and 10³ BC-H1 micrometastatic cells. The amount and quality of RNA were determined by calculating the A260/280 ratio and by carrying out Agilent electropherogram analyses. These two methods gave similar results, but the Agilent technology nanochip was more sensitive, making it possible to compare the total RNA purified from 10⁴ cells (100 ng RNA). The Trizol purification method was chosen because it gave high yields and preserved the integrity of the total RNA extracted from small size samples (fig 1A). This protocol was highly reproducible and a linear association between various amounts of starting material and the amount of RNA purified was observed. We have shown that 90–1600 (mean, 555; SD, 460) carcinoma cells can be immunopurified from the entire bone marrow aspirate of patients with “advanced” breast cancer.20 Using the Trizol extraction protocol, it was possible to isolate 18 (SD, 3) ng total RNA from 1000 BC-H1 micrometastatic cells. Therefore, we chose 10 ng of RNA as the limiting condition to be investigated in all experiments performed in our study.

Figure 1

 Qualitative analysis of RNA before and after amplification. Total RNA extracted from (A) 10 000 BC-H1 micrometastatic cells, and amplified RNA ((B) one round or (C) two rounds) from 10 ng (purple), 100 ng (green), and 1000 ng (blue) BC-H1 total RNA, were tested for yield and integrity using the Agilent nanotechnology. Electropherograms and gel images show high quality mRNA amplification, with a decrease in amplified RNA size during the second round of amplification for the 100 ng sample.

Global mRNA amplification is required to obtain the microgram quantities of RNA needed for successful hybridisation on microarrays. We simultaneously amplified 10, 100, and 1000 ng of total RNA from BC-H1 micrometastatic cells, by means of a T7 based protocol (Ambion). The quality of the resulting aRNA was determined by analysing size distribution using the Agilent technology (fig 1B, C). Electropherograms demonstrated a high quality of mRNA amplification, with most transcripts 500–2000 bases long, and no DNA or ribosomal RNA contamination. However, a characteristic decrease to 300 nucleotides in the mean size of transcripts was seen after the second round of amplification. Using this technique and 100 ng of total RNA starting material, we generated about 0.5 µg and 5.6 µg of aRNA after one round and two rounds of amplification, respectively. This amount of RNA should be sufficient for microarray hybridisation.

Evaluation of a custom built oligomeric array

To overcome the problems associated with the use of small size samples, we developed a new strategy based on building a 16 mm² oligomeric array targeting approximately 100 genes, and facilitating hybridisation in a volume as small as 4 µl. Several key factors determine the quality of microarray data. A reproducible aRNA sample preparation method is required and RNA labelling and microarray hybridisation should be sensitive and reproducible. Aliquots of 20 µg of uRNA were labelled with Cy3 or Cy5 and were then hybridised with the oligomeric array matrix in a single experiment (data not shown). The data were expressed as studentised residuals of signal intensities and a strong correlation (r² = 0.9784) was seen, with a variance ratio between 0.5 and 2, providing cutoff values for expression data. However, if the same experiment was performed with two different oligomeric array matrices, the dots were highly dispersed, and the correlation coefficient decreased to 0.6498.

Because the main application of microarray technology is the identification of differentially expressed genes, we assessed the ability of three T7 based global mRNA amplification protocols to achieve this goal. Therefore, Ambion and Affymetrix protocols optimised for small size specimens were compared with a modified Affymetrix procedure, improved as described under the materials and methods section. The per cent present call comparison was used to estimate the overall extent to which this custom built oligomeric array could reliably detect variations in expression levels (data not shown). About 80% of the oligonucleotides were called as present, using uRNA as the common reference. The modified Affymetrix amplification protocol performed the best, with 97% of oligonucleotides called as present for 400 ng of amplified uRNA, as compared with 50% and 63% obtained with the Ambion and Affymetrix protocols, respectively. However, all three protocols were unable to deliver similar transcript representations for smaller size samples (200 ng amplified uRNA). The 87 genes of the oligomeric array displayed higher per cent present calls than the oligonucleotides (61%, 76%, and 98% for 400 ng of amplified uRNA with the Ambion, Affymetrix, and modified Affymetrix protocols, respectively). Indeed, most of the oligomers more than 1000 bases distant from the 3′ poly(A) were less frequently called as present after the amplification of 400 ng of uRNA, and were not called as present at all if only 200 ng of uRNA was amplified. This may reflect transcript shortening after the T7 based IVT amplification procedure.

We further investigated signals obtained on the oligomeric array with amplified RNA samples compared with expected expression levels of uRNA. From the scatter plots of 20 µg unamplified uRNA against 400 ng aRNA (data not shown), the three protocols provided poor correlation coefficients (r² = 0.35–0.40). Thus, the variance introduced during the amplification step may account for large discrepancies and result in biased transcriptome analysis when working with RNA amounts less than 400 ng. Therefore, we decided to set up a rigorous assay to assess the variability introduced by the amplification procedure.

Validation of a rtqPCR assay

We designed and standardised a rtqPCR assay to assess the Ambion and modified Affymetrix amplification protocols. Specific primers targeting two genes expressed in breast carcinoma (uPAR and Ker8) and the reference GAPDH reporter gene were chosen (fig 2A). To detect possible shortening of transcripts, as suggested by the Agilent and oligomeric array data, we designed three sets of primers for each gene, which bound at various distances from the 3′ end of the transcript (fig 2B). The 3′, midtranscript, and 5′ sets bound 10–50 bp, 400–500 bp, and 1000–1500 bp away from the 3′ poly(A), respectively.

Figure 2

 Real time quantitative PCR strategy. (A) Name, sequence, and position in coding sequence (cds), for primer sets specific for the human glyceraldehyde 3-phosphate dehydrogenase (GAPDH), cytokeratin 8 (Ker8), urokinase plasminogen activator receptor (uPAR), and the Arabidopsis thaliana rubisco activase (ATH rbcl) genes. (B) The above primer sets have been designed such that they bind at the 5′, the middle (mid), or the 3′ region of the coding sequence.

We thoroughly investigated this rtqPCR assay setup, by assessing its technical value in validating RNA amplification data. Primers were designed to give amplicons of similar size (112–166 bases long). Figure 3 illustrates representative data obtained with the GAPDH 3′ set reference primers (O3-O4). PCR amplification kinetics (fig 3A) were similar over the whole range of template molecules, with high efficiencies (E = 108.3%; SD, 3.3%) and correlation coefficients (r² = 0.995), as averaged for all the primer sets. Under these conditions, we obtained mean cycle threshold (Ct) values of 15.5 (SD, 0.4) for the uRNA positive control template and of 40.0 (SD, 0.1) for A thaliana negative control primers. All differences in Ct values obtained in reactions with or without the reverse transcription (RT) step before rtqPCR were in the range of 8.5–14.5 (mean, 11.8; SD, 2.2). Dissociation plots (fig 3B) were used to check for the absence of contaminants or irrelevant products. Standard curves (fig 3C) obtained by plotting Ct values against template concentrations (0.1 ng, 1 ng, 10 ng, and 100 ng uRNA or BC-H1 cell RNA) showed overall RT-rtqPCR efficiencies to be similar (E = 82.4% (SD, 6.3%) and 82.2% (SD, 10.4%), respectively).

Figure 3

 Real time quantitative polymerase chain reaction (rtqPCR) validation. (A) glyceraldehyde 3-phosphate dehydrogenase (GAPDH) rtqPCR amplification, (B) dissociation, and (C) standard curve plots, exemplify the methodology used to validate each primer set (human GAPDH O3-O4 here) and experimental conditions.

As another validation step, we calculated the relative efficiencies of primer sets with respect to the GAPDH O3-O4 reference primers. The data obtained with the various genes and primers showed equivalent relative efficiencies (data not shown). Therefore, we used the comparative Ct (ddCt) method on uRNA and BC-H1 cell RNA preparations to measure the expression of the uPAR and Ker8 genes with respect to the GAPDH gene. The overall reproducibility of the RT-rtqPCR technique was confirmed by the mean dCt(uPAR–GAPDH) of 5.7 (SD, 0.4) for 10 ng BC-H1 RNA and of 5.9 (SD, 0.1) for a corresponding 1000 BC-H1 cell extract, and the mean dCt(Ker8–GAPDH) of 7.3 (SD, 0.3) for 10 ng BC-H1 RNA and of 7.2 (SD, 0.6) for a 1000 BC-H1 cell extract. Measurement of gene expression by the comparative Ct method showed that Ker8 transcripts were 25 times more abundant than uPAR in uRNA, whereas they were about 160 times more abundant in uRNA than in BC-H1 RNA samples. These data are in accordance with signals measured by expression profiling on the oligomeric array, Ker8 being 17 times more abundant than uPAR in uRNA.
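The comparative Ct arithmetic can be illustrated with a short Python sketch. It assumes the widely used form of the method in which product doubles at each cycle, so that relative abundance is 2^(−dCt) and the fold change between two samples is 2^(−ddCt); the efficiencies measured in this study were close to, but not exactly, 100%, so the figures are approximations. The function names are hypothetical; the dCt values are those reported above for uPAR in BC-H1 RNA.

```python
def delta_ct(ct_target, ct_reference):
    """dCt: Ct of the target gene minus Ct of the reference gene (here GAPDH)."""
    return ct_target - ct_reference


def relative_abundance(dct):
    """Target abundance relative to the reference, assuming doubling per cycle."""
    return 2.0 ** (-dct)


def fold_change(dct_sample, dct_calibrator):
    """2^(-ddCt): expression in the sample relative to the calibrator."""
    return 2.0 ** (-(dct_sample - dct_calibrator))


# Mean dCt(uPAR-GAPDH) reported in the text for two BC-H1 preparations.
dct_10ng_rna = 5.7
dct_1000_cells = 5.9
upar_vs_gapdh = relative_abundance(dct_10ng_rna)
between_preparations = fold_change(dct_10ng_rna, dct_1000_cells)
```

Under these assumptions a dCt difference of 0.2 cycles corresponds to a fold change of roughly 1.15, which gives a sense of how closely the two preparations agree.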

Evaluation of the T7 based global mRNA amplification method

The gene expression results obtained with amplified material may be affected by the amplification step. Therefore, it is important to be aware of the variability of the amplification procedure if reliable data are to be obtained. The inherent variability of most experimental models makes it difficult to detect such effects. Our carefully designed and reliable RT-rtqPCR procedure makes it possible to compare amplification of the uPAR, Ker8, and GAPDH transcripts relative to their length and to their abundance. We also assessed the reliability of the Ambion and modified Affymetrix methods, using 1, 10, and 100 ng of reference uRNA or 10, 100, and 1000 BC-H1 micrometastatic cells.

We first evaluated the Ambion and modified Affymetrix amplification methods using uRNA and 3′ primer sets for the three genes (fig 4). Measurement of the amplified RNA showed that one round was sufficient to generate 10–25 µg of aRNA from 100 ng starting material, depending on the method used (fig 4A, C). This equates to an amplification rate of 2000 to 5000 if we assume that mRNA accounts for about 5% of the total RNA in eukaryotes. Two consecutive amplification rounds from 10 ng uRNA samples can result in the production of 12.5–50 µg aRNA, depending on the method used (fig 4B, D), equivalent to an amplification rate of 25 000 to 100 000. The two procedures gave similar amplification profiles for the three genes, but the modified Affymetrix protocol was clearly more efficient at generating large amounts of aRNA. We assessed the linearity of the amplification reaction by determining amplification rates for 1 ng, 10 ng, and 100 ng uRNA starting material in rtqPCR assays targeting GAPDH with 3′ primer sets (fig 5). Although amplification should theoretically not be limited at lower RNA concentrations, fidelity seems to decrease with decreasing amounts of RNA, as shown under our experimental conditions. The Ambion protocol performed better when only one round was carried out and when smaller amounts of starting material were available (fig 5A). In contrast, and surprisingly, the modified Affymetrix protocol was more efficient after two rounds, with 10 ng of starting material (fig 5B). Similar amplification profiles were obtained with the uPAR and Ker8 primer sets for both protocols (data not shown).
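The amplification rates quoted above follow directly from the aRNA yield and the assumed mRNA content of the input. A minimal Python sketch of that arithmetic is given here; the function name is hypothetical and the 5% mRNA fraction is the assumption stated in the text.

```python
MRNA_FRACTION = 0.05  # assumption from the text: mRNA is about 5% of total RNA


def amplification_rate(arna_ug, total_rna_ng, mrna_fraction=MRNA_FRACTION):
    """Ratio of aRNA produced to the mRNA estimated to be in the input."""
    mrna_ng = total_rna_ng * mrna_fraction
    arna_ng = arna_ug * 1000.0
    return arna_ng / mrna_ng


# One round from 100 ng total RNA yielding 10 to 25 µg aRNA.
one_round_low = amplification_rate(10.0, 100.0)
one_round_high = amplification_rate(25.0, 100.0)

# Two rounds from 10 ng total RNA yielding 12.5 to 50 µg aRNA.
two_rounds_low = amplification_rate(12.5, 10.0)
two_rounds_high = amplification_rate(50.0, 10.0)
```

The four values reproduce the ranges of 2000 to 5000 and 25 000 to 100 000 given in the paragraph above. Changing `mrna_fraction` shows how sensitive the reported rate is to the assumed mRNA content.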

Figure 4

 Real time quantitative polymerase chain reaction (rtqPCR) quantification of amplified RNA (aRNA) obtained with two T7 based protocols. Aliquots of 1 ng (yellow), 10 ng (green), and 100 ng (blue) universal RNA were amplified using (A, B) the Ambion and (C, D) the modified Affymetrix protocols. The first round (A, C) and the second round (B, D) of amplification were assessed, using rtqPCR with 3′ primers for the glyceraldehyde 3-phosphate dehydrogenase (GAPDH), urokinase plasminogen activator receptor (uPAR), and cytokeratin 8 (Ker8) genes. Amounts are expressed in nanograms and were calculated from standard curves for each primer set.

Figure 5

 Assessment of the linearity of the global mRNA amplification procedure. Aliquots of 1 ng (yellow), 10 ng (green), and 100 ng (blue) universal RNA were subjected to (A) a first and (B) a second round of amplification, using the Ambion and modified Affymetrix (Affy-mod) protocols. Real time quantitative polymerase chain reaction was performed using glyceraldehyde 3-phosphate dehydrogenase (GAPDH) 3′ primers, and amplification rates are reported as a ratio of the amount of amplified RNA produced with respect to the total RNA input.

We then investigated 5′ truncation of the transcript during the T7 based amplification process by comparing the rtqPCR data obtained with the 3′, midtranscript, and 5′ primer sets (fig 6). Amplification rates were conserved for the GAPDH, uPAR, and Ker8 genes if the 3′ primers were used (fig 6A), whereas the use of 5′ primers decreased amplification yields by a factor of three and generated gene dependent variations (fig 6B). These variations did not seem to be correlated with transcript size, because the GAPDH amplicon was similar in length to the Ker8 one, but longer than the uPAR amplicon. All sets of amplified samples were affected similarly by this 5′ truncation, regardless of the amount of starting material used. Thus, truncation results solely from the amplification step. We characterised the truncation process during the first and second rounds of amplification with both methods by systematically measuring the 3′ to midtranscript and 3′ to 5′ ratios for the GAPDH reference gene (table 1). The mean ratio (SD) obtained after the first round was 2.3 (1.1) with both protocols, indicating high quality aRNA samples in both cases. In contrast, most of the ratios obtained for the second round of amplification exceeded 3, and some were as high as 15 to 30 for the Ambion protocol. These results indicate the differential loss of the 5′ end and middle part of transcripts in amplified samples. This may account for the decrease in RNA size seen with Agilent technology, and the lower per cent present call obtained in microarray hybridisation for the oligomers located more than 1000 bases from the 3′ poly(A). Thus, we assume that each round of amplification induces some truncation of RNA molecules, possibly as a result of priming the second strand from an internal site and/or cleavage of the relatively labile RNA.
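The 3′ to midtranscript and 3′ to 5′ ratios used to characterise truncation are simple quotients of the amounts measured with each primer set. The sketch below shows the calculation in Python; the function name and the amounts are hypothetical, and the amounts are assumed to have been derived from the standard curve of each primer set.

```python
def truncation_ratios(amount_3p, amount_mid, amount_5p):
    """Ratios of the amount measured with the 3' primers to the amounts
    measured with the midtranscript and 5' primers; higher values indicate
    greater loss of the middle or 5' part of the transcript."""
    return {
        "3p_to_mid": amount_3p / amount_mid,
        "3p_to_5p": amount_3p / amount_5p,
    }


# Invented amounts (ng) for one amplified sample and the GAPDH primer sets.
ratios = truncation_ratios(amount_3p=300.0, amount_mid=150.0, amount_5p=100.0)
```

A ratio of 1 would mean that all regions of the transcript are equally represented in the amplified sample.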

Table 1

 Amplification ratios derived from rtqPCR using GAPDH 3′, midtranscript, and 5′ primer sets

Figure 6

 Evaluation of 5′ truncation of the transcript during the T7 based amplification procedure. Aliquots of 1 ng, 10 ng, and 100 ng of uRNA were subjected to one round of amplification with the Ambion protocol. Real time quantitative polymerase chain reaction targeting the glyceraldehyde 3-phosphate dehydrogenase (GAPDH; blue), urokinase plasminogen activator receptor (uPAR; purple), and cytokeratin 8 (Ker8; yellow) genes, with (A) 3′ and (B) 5′ primers, was used to measure the amplified RNA yield and to deduce amplification rates.

Because we aimed to assess new techniques for deciphering the transcriptome of BM micrometastatic cells, we amplified total RNA extracted from 10, 100, and 1000 BC-H1 cells. The Ambion procedure gave high amplification rates (20 000 for a 10 cell RNA extract), whereas the modified Affymetrix protocol did not work with fewer than 1000 cells (data not shown). However, very high 3′ to midtranscript and 3′ to 5′ ratios (from 5 to 102) were obtained with the Ambion method for small cell numbers, whereas more reasonable ratios were obtained with the modified Affymetrix protocol (table 1). Therefore, the method used to prepare RNA samples seems to have a large effect on amplification parameters. The Ambion protocol appears to be more efficient with “home prepared” biological samples, whereas the modified Affymetrix procedure requires RNA preparations of higher quality, provided either by the use of uRNA or by larger cell number extracts.


BM micrometastases have been implicated in metastatic relapse in patients with breast cancer.6,7 Micrometastatic cells are a prerequisite for cancer progression. Deciphering the transcriptome of these cells should make it possible to identify key genes associated with one or several steps in the metastatic cascade, involved in dormancy and/or chemoresistance, or induced in the early stages of relapse. These cells are frequently collected from BM in vivo, to prevent the distortions in expression pattern that might occur with the in vitro culture of cell lines. We have demonstrated the feasibility of immunopurifying these cells with antibody activated magnetic beads,21 and have shown that the number of EpCAM positive cells is correlated with the stage of the disease.20 An average of 1000 cells can be immunopurified from the BM aspirate of patients with “advanced disease”, which corresponds to approximately 1 ng total RNA and 50 pg mRNA. For all these reasons, microarray expression profiling of these cells represents a major challenge. In our study, we established an accurate rtqPCR assay to assess the performance and limitations of the global mRNA amplification procedure, which is indispensable for addressing this issue.

Using appropriate RNA extraction methods is essential for successful and valid molecular experiments. We compared RNA extraction protocols in terms of the generation of sufficient amounts of high quality RNA from a limited number of cells. The Trizol method gave the highest RNA yields and purity. It has been described as the method of choice for extracting RNA from patient samples.22–24

In an attempt to reduce the amount of RNA required for microarray analysis, we built a 16 mm² 50 mer array targeting only about 100 genes and facilitating hybridisation in a volume as small as 4 µl. Oligomeric arrays have advantages over cDNA microarrays, because cross hybridisation is prevented and quality control by means of sequence validation of PCR clones is not required.25–27 Longer oligonucleotides, 50–70 nucleotides in length, are favoured because they require smaller amounts of target for each hybridisation. In addition, their sequence specificity is higher than that of the 20 mer oligonucleotides generally used, such that one probe/gene is generally sufficient. The microarray procedure includes several steps, such as hybridisation and signal amplification, in which it is difficult to control for the level of variability. The use of a common reference RNA pool (uRNA) as a standard in each experiment may allow differences in array variations to be equalised. In this custom built oligomeric array, the gene and oligomer per cent present calls and the studentised residuals of signal intensities obtained in log ratio tests with 400 ng uRNA were high. However, there was no guarantee of reproducible and representative data if less starting material was used for amplification. Considering the low gene per cent present call measured on the oligomeric array for 200 ng amplified uRNA, it is likely that very few or no molecules would be available as template for several of the genes monitored by the microarray when starting from 1, 10, or 100 ng of RNA. This may lead to large discrepancies in gene expression profiling for small size specimens, as exemplified by the absence of a hybridisation signal for the uPAR gene, the expression of which is 15 to 25 times lower than that of Ker8, as measured by rtqPCR.

We aimed to identify the source of variation in gene expression data as a result of amplification from small size specimens, regardless of variations in the oligomeric array data set. Despite the increasing use of the T7 based global mRNA amplification procedure in the study of human diseases, the reproducibility and accuracy of this approach remain unclear. However, such information is essential to determine the extent to which the amplified sample resembles the unamplified one, and to extend the application of this technique to the study of clinical specimens. Therefore, we used a reliable rtqPCR based assay to assess two protocols optimised for the amplification of small size RNA samples. One round of amplification was found to be sufficient to generate 10–25 µg aRNA from 100 ng starting material, whereas two consecutive amplification rounds resulted in the production of 12.5 µg and 50 µg aRNA from 10 ng uRNA. These yields are higher than or similar to those reported in previous studies based on similar methods.28,29 However, neither method amplified linearly across different amounts of starting material, as judged by amplification rates. Conflicting reports have been published, either claiming that the technique is highly reproducible and linear,28,30,31 or describing high levels of variability and transcript 5′ truncation for samples as small as 100 ng RNA.29
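The relation between aRNA yield and amplification rate can be sketched with simple arithmetic. The example below assumes that mRNA makes up 5% of total RNA by mass, a figure taken from the “1 ng total RNA and 50 pg mRNA” estimate given earlier in this discussion, and that the rate is the mass of aRNA divided by the mass of input mRNA. The rates reported in the study were deduced from rtqPCR measurements, so this mass based calculation is only an approximation, and the function and constant names are hypothetical.

```python
# Assumption: mRNA is 5% of total RNA by mass (50 pg mRNA per 1 ng total RNA).
MRNA_FRACTION = 0.05


def amplification_rate(arna_ug, total_rna_ng, mrna_fraction=MRNA_FRACTION):
    """Fold amplification of mRNA: aRNA mass divided by input mRNA mass."""
    arna_ng = arna_ug * 1000.0                 # convert micrograms to nanograms
    input_mrna_ng = total_rna_ng * mrna_fraction
    return arna_ng / input_mrna_ng


# Yields quoted in the text: (aRNA in micrograms, starting total RNA in ng).
one_round = [(10.0, 100.0), (25.0, 100.0)]
two_rounds = [(12.5, 10.0), (50.0, 10.0)]

for label, yields in (("one round", one_round), ("two rounds", two_rounds)):
    for arna_ug, input_ng in yields:
        rate = amplification_rate(arna_ug, input_ng)
        print(f"{label}: {arna_ug} ug from {input_ng} ng -> about {rate:.0f} fold")
```

With these inputs, the two round yields from 10 ng correspond to rates of 25 000 and 100 000, which matches the range quoted in the abstract, and the one round yields from 100 ng correspond to rates of 2000 to 5000.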


We compared the technical and biological variability of the amplification procedures. Although the amplification protocols tested herein were similar, slight differences in enzyme/reagent sources and/or nucleic acid purification methods seemed to introduce substantial variability. Moreover, any skewing of an mRNA population resulting from amplification would be compounded by the multiple rounds of amplification necessary when working with rare cell specimens such as BM micrometastases. Probe shortening can be detected by comparing signals, using several primer sets binding at various distances from the 3′ end of the transcript. In agreement with previous studies,17,29 we found that RNA amplification systematically decreased with the distance of the target sequence from the 3′poly(A). Moreover, caution is required when using amplification procedures because IVT may introduce a bias in initial transcript levels as a result of sequence specificity of the RNA polymerase. This phenomenon is partly the result of uncontrolled termination because of very long poly(A) tails or strong secondary structures in the mRNA. Considering that a finite number of molecules is in fact used as template for each gene under study, variations seen may also reflect stochastic variations in the small number of molecules available. In addition, the method used to prepare the starting material may introduce some variability, because the two amplification protocols behaved differently with uRNA and cell extract RNA samples.
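The point about stochastic variation can be illustrated under a simple Poisson sampling assumption, which is not a model stated in the study. If the number of template molecules captured for a gene follows a Poisson distribution with mean N, its coefficient of variation is 1/√N, so sampling noise grows as the copy number falls. The copy numbers below are hypothetical.

```python
import math


def poisson_cv(mean_copies):
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(mean_copies)


# Hypothetical numbers of template molecules available for one gene.
for copies in (10, 100, 1000, 10000):
    print(f"{copies:>6} copies: sampling CV of about {poisson_cv(copies) * 100:.1f}%")
```

Under this assumption, a transcript present at about 10 copies would show roughly 32% sampling variation before any amplification bias is added, compared with 10% at 100 copies.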

Measures such as aRNA yields may be useful for the evaluation of novel amplification protocols or the comparison of existing ones, but the key information is the extent to which differences in gene expression can be detected in an experiment. Amplification efficiency may well differ from one gene to another. In fact, the relative intensities of gene expression varied little under our conditions. Furthermore, we did not observe the “compression” phenomenon reported in previous studies in which small amounts of RNA were used as starting material for amplification.17 In contrast, the gene expression profiles of uPAR, Ker8, and GAPDH in amplified RNA clearly differed, but were unaffected by the amount of starting material.

Take home messages

  • For the genes and the amplification protocols used here, we found that amplification was responsible for considerable variability in gene expression profiles, and that transcript ratios were not preserved

  • Oligomeric array hybridisation with 20 µg reference RNA resulted in specific and reproducible signals for 83% of the genes, whereas mRNA amplification from less than 400 ng of starting material resulted in selective detection of signals from highly expressed genes

  • Thus, improvements in the design of global mRNA amplification procedures and oligomeric arrays are required for the extraction of informative gene expression data from extremely limited cell numbers, such as bone marrow breast carcinoma micrometastases

Hence, within the limits investigated so far, distinguishing between genes with truly different patterns of expression and genes that are simply affected by various sources of noise remains a challenge. For the data set under study (the genes chosen, and the amplification protocols used), we found that amplification was responsible for considerable variability, with transcript ratios not preserved. If less than 100 ng of starting material is used, even Affymetrix amplification protocols optimised for small samples are not recommended for array hybridisation experiments.29 Caution may therefore be necessary when interpreting results obtained from amplification procedures. New strategies are being developed in the field.32,33 New oligomeric arrays with better sequence information and improved probe selection (nearer the 3′poly(A) of transcripts) will be considered. Improvements in the design of global mRNA amplification procedures and oligomeric arrays are required for the extraction of informative gene expression data from extremely limited cell numbers, such as BM breast carcinoma micrometastases.


This work was sponsored by the “Programme Incitatif et Coopératif” on micrometastases at the Institut Curie.


Supplementary materials

  • Supplementary Material

    The supplementary material is available as a downloadable PDF (printer friendly file) and a MS PowerPoint file.


    Files in this Data Supplement: