Abstract
Aims PCR amplicon-based next-generation sequencing (NGS) panels are increasingly used for clinical diagnostic assays. Amplification bias is a well-known limitation of PCR amplicon-based approaches. We sought to characterise lower-performance amplicons in an off-the-shelf NGS panel (TruSight Myeloid Sequencing Panel) for myeloid neoplasms and attempted to patch the low read depth for one of the affected genes, CEBPA.
Methods We performed targeted NGS of 158 acute myeloid leukaemia samples and analysed the amplicon read depths across 568 amplicons to identify lower-performance amplicons. We also correlated the amplicon read depths with the template GC content. Finally, we attempted to patch the low read depth for CEBPA using a parallel library preparation (Nextera XT) workflow.
Results We identified 16 lower-performance amplicons affecting nine genes, including CEBPA. There was a slight negative correlation between the amplicon read depths and template GC content. Addition of the separate CEBPA library generated a minimum read depth per base across the CEBPA gene ranging from 268× to 758× across eight samples.
Conclusions The identification of lower-performance amplicons will be informative to laboratories intending to use this panel. We have also demonstrated proof-of-concept that different libraries (TruSight Myeloid and Nextera XT) can be combined and sequenced on the same flow cell to generate additional reads for CEBPA.
- GENETICS
- LEUKAEMIA
- MOLECULAR PATHOLOGY
Introduction
Next-generation sequencing (NGS) platforms are being rapidly adopted for clinical application in a wide spectrum of disease conditions. Methodological approaches comprise either a global genome-wide analysis such as whole-genome or whole-exome sequencing or targeted panels comprising tens to hundreds of genes.1
PCR amplicon-based panels are widely used for targeted workflows.2 One of the well-known limitations of a PCR amplicon-based approach is the presence of PCR amplification bias, for which template GC content is one of the recognised reasons.3–5
We have profiled a local cohort of acute myeloid leukaemia (AML) samples using an off-the-shelf amplicon-based targeted panel. Our objectives in this study were to identify amplicons that displayed persistently low read depth (which we term lower-performance amplicons) and to identify factors (such as template GC content) that result in low amplicon read depth. We also attempted to ‘patch’ CEBPA, one of the genes affected by lower-performance amplicons, using a separate library preparation workflow.
Materials and methods
Study population
One hundred and fifty-eight AML bone marrow or peripheral blood samples were obtained from the archives of the Department of Haematology-Oncology, National University Hospital, Singapore.
NGS sample preparation and library generation
Genomic DNA was obtained using the Gentra Puregene Blood Kit (Qiagen, Valencia, California, USA) according to the manufacturer's instructions. Input DNA quantitation was performed using a Qubit fluorometer (Life Technologies, Carlsbad, California, USA) with 50 ng input per sample. Quantitated DNA was then processed using the TruSeq Custom Amplicon (TSCA) assay for the TruSight Myeloid Sequencing Panel (Illumina), which includes the hybridisation of upstream and downstream oligos, extension-ligation to produce PCR-ready templates, followed by PCR amplification with unique dual-index primers and adapters. PCR products were examined via agarose gel electrophoresis. After PCR clean-up via AMPure XP beads, the normalised libraries were pooled and loaded on a MiSeq for sequencing using the 600-cycle MiSeq Reagent Kit v3 to generate 2×150 bp reads. The TruSight Myeloid Sequencing Panel comprises 568 amplicons interrogating 54 genes associated with myeloid neoplasms: ABL1, ASXL1, ATRX, BCOR, BCORL1, BRAF, CALR, CBL, CBLB, CBLC, CDKN2A, CEBPA, CSF3R, CUX1, DNMT3A, ETV6, EZH2, FBXW7, FLT3, GATA1, GATA2, GNAS, HRAS, IDH1, IDH2, IKZF1, JAK2, JAK3, KDM6A, KIT, KMT2A, KRAS, MPL, MYD88, NOTCH1, NPM1, NRAS, PDGFRA, PHF6, PTEN, PTPN11, RAD21, RUNX1, SETBP1, SF3B1, SMC1A, SMC3, SRSF2, STAG2, TET2, TP53, U2AF1, WT1 and ZRSR2.
CEBPA-specific Nextera XT sample preparation
PCR amplification of the CEBPA gene was performed using 3 µL of genomic DNA (100 ng/µL) via the Expand Long Range dNTPack (Roche Diagnostics GmbH, Mannheim, Germany). Both PCR primers contained the 34-bp Nextera transposase sequences to enable the tagmentation reaction at both ends of the amplicons. Each reaction contained 1× Expand Long Range Buffer with MgCl2; 0.5 mM dNTPs and 3.5 U of enzyme mix; 0.3 µM of both forward/reverse primers and 8% DMSO. The 50 µL reaction mixture was initially incubated for 2 min at 96°C, followed by 10 cycles of denaturation (96°C for 10 s), annealing-extension (69°C for 3 min), then 20 cycles of denaturation (96°C for 10 s), annealing-extension (69°C for 3 min, with an additional 20 s cycle elongation for every successive cycle) and a final extension at 70°C for 7 min. Five microlitres of the 1340-bp CEBPA PCR products (0.2 ng/µL) was processed using the Nextera XT DNA Sample Preparation Kit (Illumina, San Diego, California, USA) according to the manufacturer's protocol (ver. October 2012). Briefly, the preparation workflow involves tagmentation of amplicons by the Nextera XT transposome at 55°C for 5 min followed by adding 5 µL of Neutralize Tagment Buffer for 5 min to neutralise the reaction. The tagmented DNA is then amplified via a limited-cycle PCR programme (12 cycles) which adds adapters required for cluster establishment. Post-amplification clean-up was performed using 90 µL of AMPure XP beads (Beckman Coulter Inc, Australia). The purified library DNA was then normalised by adding 45 µL of combined library normalisation additives/beads. Both the TruSight Myeloid and CEBPA-specific Nextera XT libraries were combined for sequencing on a MiSeq system run with v3 reagents. The final diluted amplicon libraries were pooled at a ratio of 57 parts TruSight Myeloid libraries (eight samples) to 1 part CEBPA-specific Nextera XT libraries (eight samples).
Bioinformatics analysis
Amplicon coverage data (.coverage.csv files) were generated from the TruSeq Amplicon (BaseSpace Workflow) V.1.1.0.0 algorithm and ISIS (Analysis Software) V.2.4.6.20 (Illumina). In this study, the average read depth per amplicon is defined as the mean coverage of the particular amplicon across all 158 samples. The average read depth was used to identify lower-performance amplicons (defined as amplicons with an average read depth less than 500×). The average read depth was correlated with the GC content of the amplicon, where the GC content is calculated by the ‘nuc’ function in BEDTools2.6 In addition, the read depth of the ‘patched’ CEBPA amplicons was calculated from BAM files generated from the CEBPA-specific Nextera XT workflow using the ‘depth’ function in SAMtools.7
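The definitions above can be illustrated with a minimal Python sketch. It is not the BaseSpace or BEDTools implementation: the function names, the nested-dictionary layout of the coverage data and the toy values are all assumptions made for illustration, and the sketch assumes that an amplicon missing from a sample counts as zero depth.

```python
from statistics import mean

THRESHOLD = 500.0  # read-depth cut-off used in this study to define lower performance


def gc_fraction(sequence):
    """Fraction of G and C bases in a template sequence (0.0 for empty input)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_depth_per_amplicon(coverage):
    """Average each amplicon's depth across samples.

    `coverage` maps sample name -> {amplicon name -> read depth}
    (a hypothetical layout, not the .coverage.csv format itself).
    """
    amplicons = set()
    for per_sample in coverage.values():
        amplicons.update(per_sample)
    # Assumption: an amplicon absent from a sample contributes zero depth.
    return {
        amp: mean(per_sample.get(amp, 0.0) for per_sample in coverage.values())
        for amp in amplicons
    }


def lower_performance(mean_depths, threshold=THRESHOLD):
    """Names of amplicons whose average read depth is below the threshold."""
    return sorted(amp for amp, depth in mean_depths.items() if depth < threshold)


# Toy data with made-up amplicon names and depths.
coverage = {
    "S1": {"AMP_A": 1200.0, "AMP_B": 300.0, "AMP_C": 0.0},
    "S2": {"AMP_A": 800.0, "AMP_B": 900.0, "AMP_C": 2.0},
}
means = mean_depth_per_amplicon(coverage)
low = lower_performance(means)
```

In this toy example AMP_B falls below 500× in one sample but is not flagged, because the criterion is applied to the average across samples rather than to individual samples.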
Results
Characterisation of the mean read depth per amplicon
The mean amplicon read depth (across 568 amplicons) per sample ranged from 2249.8× to 18069.1×. At the sample level, therefore, the mean amplicon read depth was above 500× (the threshold used to define lower-performance amplicons) for all 158 samples.
However, the mean read depth per amplicon (across 158 samples) ranged from 1.1× (DNMT3A.CDS.6.line.68.chr2.25475062.25475066_tile_1.1) to 33790.2× (TET2.exon.3.line.113.chr4.106155053.106158597_tile_17.1). As seen in figure 1, 16 of the 568 (2.8%) amplicons showed a mean read depth less than 500×, which is the minimum coverage recommended for somatic mutation detection by some authors (http://www.wadsworth.org/labcert/TestApproval/forms/NextGenSeq_ONCO_Guidelines.pdf; accessed 28 October 2015). Overall, the performance exceeded the manufacturer's specification that >95% of amplicons should have a read depth >500×: the remaining 552 of 568 amplicons (97.2%) had a mean read depth of at least 500×.
Characterisation of lower-performance amplicons
As seen in table 1, the 16 lower-performance amplicons affected nine genes: DNMT3A, GATA2, CUX1, CDKN2A, CEBPA, RUNX1, STAG2, BCOR and BCORL1. These lower-performance amplicons were further characterised with regard to their GC content and the percentage of samples with zero coverage (table 1).
Of note, the read depths for several of these amplicons did not increase proportionally with an increase in mean amplicon read depths (across 568 amplicons) per sample (figure 2), suggesting that increasing the amount of pooled amplicons for sequencing may not result in significantly improved read depth in these lower-performance amplicons.
The mean read depth per amplicon and GC content
We observed an inverse relationship between the mean read depth per amplicon and GC content (figure 3), suggesting that amplicon GC content may be one of the variables affecting the mean amplicon read depth. Examples of such amplicons with high GC content (>0.7) and low amplicon read depth include those present within CDKN2A, CEBPA and CUX1 (table 1).
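A relationship of this kind can be quantified with a correlation coefficient. The sketch below computes Pearson's r in plain Python; the choice of Pearson's r is an assumption, since the text does not state which statistic underlies figure 3, and the GC and depth values are invented for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up values: GC fraction per amplicon and its mean read depth.
gc = [0.35, 0.45, 0.55, 0.65, 0.75]
depth = [9000.0, 8000.0, 6500.0, 3000.0, 400.0]
r = pearson_r(gc, depth)  # negative when depth falls as GC content rises
```

A rank-based statistic such as Spearman's coefficient may be preferable where read depths span several orders of magnitude, as they do in this panel.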
Patching the lower-performance CEBPA amplicons using an additional CEBPA library
An additional CEBPA library generated a minimum read depth per base across the CEBPA gene ranging from 268× to 758× across eight samples.
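The minimum read depth per base can be derived from the tab-separated output of the SAMtools ‘depth’ function (contig, 1-based position, depth). The following is a minimal sketch rather than the analysis script used here: the function name, contig label and coordinates are hypothetical. It treats positions absent from the output as zero coverage, because ‘depth’ omits uncovered positions unless run with the -a option.

```python
def min_depth_per_base(depth_text, chrom, start, end):
    """Minimum depth over chrom:start-end (1-based, inclusive).

    `depth_text` holds tab-separated lines of contig, position and depth.
    """
    depths = {}
    for line in depth_text.splitlines():
        if not line.strip():
            continue
        name, pos, depth = line.split("\t")[:3]
        if name == chrom and start <= int(pos) <= end:
            depths[int(pos)] = int(depth)
    # Positions missing from the output are counted as zero coverage.
    return min(depths.get(p, 0) for p in range(start, end + 1))


# Made-up three-base region on a hypothetical contig "chrN".
example = "chrN\t100\t300\nchrN\t101\t268\nchrN\t102\t758\n"
lowest = min_depth_per_base(example, "chrN", 100, 102)
with_gap = min_depth_per_base(example, "chrN", 100, 103)  # position 103 has no line
```

Counting missing positions as zero matters for this metric: a single uncovered base would otherwise be silently skipped and the minimum overstated.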
Discussion
In this study, we have performed a detailed evaluation of the read depth of the 568 amplicons across 158 samples. We identified 16 amplicons that met the criteria for lower performance, including amplicons covering the CEBPA gene. Mutational analysis of the CEBPA gene is known to be technically challenging for various reasons, such as its high GC content,8 and our findings are consistent with the reported literature.
Besides template GC content, other factors that might account for PCR bias include high melting temperature regions that could not be predicted by overall assessment of GC content.9
Our identification of lower-performance amplicons informs the design of an alternative approach to overcome this issue. A PCR amplification step targeting the lower-performing amplicons may be performed in parallel to the TruSight Myeloid library preparation step, and the amplicons sequenced together with the TruSight library. This has been successfully accomplished with the HaloPlex methodology.10
Among the various genes that were affected by the lower-performing amplicons, we considered CEBPA to be the most important. This is in view of emerging guidelines that ‘AML with CEBPA mutation’ will formally become an entity,11 necessitating molecular haematology services to offer CEBPA mutational analysis as standard of care. To this end, we performed a separate library preparation (Nextera XT) workflow for the CEBPA gene and combined this library with the TruSight Myeloid library for eight samples. Sequencing of both libraries was performed on the same flow cell. Bioinformatics analysis showed that the CEBPA library generated a minimum read depth per base across the CEBPA gene ranging from 268× to 758× across eight samples. Four samples still had a minimum read depth per base less than 500×. While this is not ideal, we think that this issue might be solved by increasing the amount of CEBPA library added to the pooled amplicon library for sequencing. It is also possible in principle to merge the data (either FASTQ or BAM files) from both libraries to generate a combined read depth, which we did not explore in this study.
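Both remedies mentioned above can be sketched numerically. The Python below is illustrative only and was not part of this study: the function names and per-base values are hypothetical, combined depth is taken as the simple per-position sum of the two libraries, and the pooling estimate assumes that read depth scales linearly with the amount of CEBPA library loaded, which would need to be verified experimentally.

```python
def combined_depth(depth_a, depth_b):
    """Per-position sum of two depth maps ({position -> depth})."""
    positions = set(depth_a) | set(depth_b)
    # A position covered by only one library keeps that library's depth.
    return {p: depth_a.get(p, 0) + depth_b.get(p, 0) for p in sorted(positions)}


def pooling_scale_factor(observed_min_depth, target_depth=500):
    """Fold increase in CEBPA library input needed to reach the target depth.

    Assumes depth is proportional to the amount of library pooled.
    """
    if observed_min_depth <= 0:
        raise ValueError("observed minimum depth must be positive")
    return max(1.0, target_depth / observed_min_depth)


# Made-up per-base depths for three positions from the two libraries.
trusight = {100: 40, 101: 15}
nextera = {100: 300, 101: 268, 102: 758}
merged = combined_depth(trusight, nextera)

# Lowest and highest minimum depths reported for the eight samples.
scale_low = pooling_scale_factor(268)
scale_high = pooling_scale_factor(758)
```

Under the linearity assumption, the sample with a minimum depth of 268× would need a little under twice as much CEBPA library to reach 500×, while the sample at 758× would need no increase.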
In summary, we have identified lower-performance amplicons within an off-the-shelf targeted NGS panel, and this will be informative for laboratories intending to use this panel. We have also demonstrated proof-of-concept that two different libraries can be combined and sequenced on the same flow cell to generate additional reads for CEBPA, which is a clinically important gene affected by lower-performing amplicons. Moving forward, we intend to further study and optimise the workflow to ensure adequate read depth coverage for all the genes affected by the identified lower-performance amplicons once firm evidence for their clinical utility emerges.
Take home messages
PCR amplicon-based next-generation sequencing (NGS) panels, increasingly used for clinical diagnostic assays, suffer from amplification bias.
We performed targeted NGS on 158 samples using an off-the-shelf panel for myeloid neoplasms and identified and characterised 16 lower-performance amplicons that did not attain a mean read depth of 500×.
We attempted to ‘patch’ one of the genes affected by these lower-performance amplicons, CEBPA, using a parallel workflow, demonstrating proof-of-concept that two different libraries can be combined and sequenced on the same flow cell.
Acknowledgments
The authors are grateful to Kahlil Lawless, Yue Ying Tan, Heidi Cheng and Kenneth Yew from Illumina Singapore for technical support.
Footnotes
Handling editor Runjan Chetty
Contributors BY, ES-CK and W-JC designed the study. ES and CHN organised the clinical samples. CN, PTH, P-LL and LC performed the experiments. YH, KHKB and TWT analysed the data.
Funding W-JC is supported by NMRC Clinician Scientist Investigator Award. This work is partly supported by Singapore Cancer Syndicate Grant. This research is supported by the National Research Foundation Singapore and the Singapore Ministry of Education under the Research Centers of Excellence initiative.
Competing interests None declared.
Ethics approval Institutional review board, NUHS (reference number 2015/00111).
Provenance and peer review Not commissioned; externally peer reviewed.