Aim The presence of biallelic CEBPA mutations is a favourable prognostic feature in acute myeloid leukaemia (AML). CEBPA mutations are currently identified through conventional capillary sequencing (CCS). With the increasing adoption of next-generation sequencing (NGS) platforms, challenges with regard to amplification efficiency of CEBPA due to the high GC content may be encountered, potentially resulting in suboptimal coverage. Here, the performance of an amplicon-based NGS method using a laboratory-developed CEBPA-specific Nextera XT (CEBNX) was evaluated.
Methods Mutational analyses of the CEBPA gene of 137 AML bone marrow or peripheral blood retrospective specimens were performed by the amplification of the CEBPA gene using the Expand Long Range dNTPack and the amplicons processed by CCS and NGS. CEBPA-specific libraries were then constructed using the Nextera XT V.2 kit. All FASTQ files were then processed with the MiSeq Reporter V.188.8.131.52 using the PCR Amplicon workflow via the customised CEBPA-specific manifest file. The variant calling format files were analysed using the Illumina Variant Studio V.2.2.
Results A coverage per base of 3631X to 28184X was achieved. 22 samples (16.1%) were found to contain CEBPA mutations, with variant allele frequencies (VAF) ranging from 3.8% to 58.2%. Taking CCS as the ‘gold standard’, a sensitivity of 97% and a specificity of 97% were achieved. For the transactivation domain 2 polymorphism (c.584_589dupACCCGC/p.His195_Pro196dup), the CEBNX achieved 100% sensitivity and 100% specificity relative to CCS.
Conclusions Our laboratory-developed CEBNX workflow shows high coverage and thus overcomes the challenges associated with amplification efficiency and low coverage of CEBPA. Therefore, our assay is suitable for deployment in the clinical laboratory.
- molecular pathology
Acute myeloid leukaemia (AML) is a malignancy of undifferentiated myeloid precursors.1 In the recent WHO classification, AML with biallelic mutations of CEBPA is recognised as a distinct category with a favourable prognosis.2 Therefore, an accurate platform is needed for CEBPA mutational analysis.
Conventional capillary sequencing (CCS) has hitherto been the gold-standard DNA sequencing technique.3 However, this method is limited by its sensitivity and throughput when concurrent analysis of multiple genes is required.3 Recently, advancements in sequencing technology have led to the emergence of next-generation sequencing (NGS) platforms, which in general have higher throughput and sensitivity relative to conventional sequencing methods.4 However, amplification efficiency and suboptimal coverage may affect amplicon-based NGS panel assays, as we have previously reported for CEBPA.5
The CEBPA gene, located on chromosome band 19q13.1, is intron-less and has a GC-rich coding region (approximately 75% GC content).6 It encodes a protein that belongs to the basic leucine zipper family of transcription factors and consists of two N-terminal transactivation domains and a C-terminal DNA-binding and dimerisation motif.7 8 CEBPA is expressed in myelomonocytic cells and is specifically upregulated during granulocyte differentiation.9
Approximately 15% of all patients with AML have mutations in the CEBPA gene.6 There are two main types of mutations: N-terminal frameshift mutations and C-terminal mutations. N-terminal mutations occur between the major translational start codon and second ATG in the same open reading frame. This leads to a premature stop of translation of the wild-type (WT) p42 CEBPA protein, while conserving the translation of a short p30 isoform. The p30 isoform inhibits the function of the full-length protein by a dominant negative mechanism.10 C-terminal mutations are generally in-frame insertions/deletions in the DNA-binding domain (DBD) or basic leucine zipper (bZIP) domain that disrupt binding to DNA or dimerisation.8 Mutations in the transactivation domain 1 (TAD1) and transactivation domain 2 (TAD2) domains were shown to result in early termination of the protein sequence, giving rise to a truncated CEBPA protein.11 One of the commonly occurring polymorphisms in CEBPA is located in the TAD2 domain, which is an in-frame duplication of 6 bp (c.584_589dupACCCGC) resulting in four ACCCGC repeats and a His-Pro duplication (p.His195_Pro196dup), whereas the WT sequence contains three repeats of ACCCGC.12 The incidence of this polymorphism was reported to be 6.6% in healthy controls and 6.5% in patients with AML.12
Due to the high GC content of the gene leading to suboptimal amplification efficiency, we observed poor coverage and read depth in this gene using the TruSight Myeloid Sequencing Panel (TMSP).5 Hence, we designed a CEBPA-specific NGS protocol, CEBPA-specific Nextera XT amplicon workflow (CEBNX) to overcome these challenges. In this study, we evaluated the performance characteristics of the CEBNX.
Materials and methods
Patient cohort and cell lines
One hundred and thirty-seven banked AML bone marrow (BM) or peripheral blood (PB) samples were obtained from the archives of the Department of Haematology-Oncology, National University Hospital, Singapore. These samples were used for analytical validation of the CEBNX. Of these 137 samples, 122 samples were previously analysed by the TMSP.5
Cell line mix
The MOLM-14 cell line harbours the common TAD2 polymorphism (c.584_589dupACCCGC). To determine the limit of detection (LoD) of CEBNX and TMSP, the MOLM-14 cell line was mixed with the OCI-AML3 cell line to obtain expected variant allele frequencies of 2%, 5%, 10%, 20% and 40%. Genomic DNA (gDNA) was then extracted and analysed by CCS, CEBNX and TMSP.
CEBPA long-range PCR and CCS
A summary of the workflow is shown in figure 1. gDNA was obtained using the Gentra Puregene Blood Kit (Qiagen, Valencia, California, USA) as per the manufacturer’s instruction. PCR amplification of the CEBPA gene was performed using 100 ng/µL of genomic DNA using the Expand Long Range dNTPack (Roche Diagnostics GmbH, Mannheim, Germany). Each reaction contained 1X Expand Long Range Buffer with MgCl2; 0.5 mM dNTPs and 3.5 U of enzyme mix; 0.3 µM of both forward/reverse primers and 8% dimethyl sulfoxide (DMSO). The 25 µL reaction mixture was initially incubated for 2 min at 96°C, followed by 10 cycles of denaturation (96°C for 10 s), annealing-extension (69°C for 3 min), then 20 cycles of denaturation (96°C for 10 s), annealing-extension (69°C for 3 min, with an additional 20 s cycle elongation for every successive cycle) and a final extension at 70°C for 7 min. Products were then separated on a 2.0% agarose gel with 1X Tris-borate-EDTA, followed by gel purification and cycle sequencing using the BigDye Terminator, V.3.1, Cycle Sequencing Kit (Applied Biosystems). Sequencing products were then cleaned by using DyeEx V.2.0 Spin Kit (Qiagen, Valencia, California, USA) as per the manufacturer’s instructions. The cleaned sequencing products were loaded on a 3130xl Genetic Analyzer (Applied Biosystems). Results were then analysed using Sequencing Analysis Software V.5.2 (Applied Biosystems). All sequencing results were aligned and analysed using the ATF Software (Conexio Genomics, Fremantle Western Australia, Australia) and Mutation Surveyor Software (SoftGenetics, State College, Pennsylvania, USA).
Library preparation for NGS: CEBNX and TMSP
CEBPA-specific library preparation for 137 samples was performed as previously described.12 Briefly, long-range PCR and Nextera XT library preparation were performed to fragment the amplicon into varying sizes. PCR amplification of the CEBPA gene was performed using 100 ng/µL of genomic DNA via the Expand Long Range dNTPack (Roche Diagnostics GmbH, Mannheim, Germany). Both PCR primers contain the 34 bp Nextera transposase sequences to enable the tagmentation reaction at both ends of the amplicons. Each reaction contained 1×Expand Long Range Buffer with MgCl2; 0.5 mM dNTPs and 3.5 U of enzyme mix; 0.3 µM of both forward/reverse primers and 8% DMSO. The 50 µL reaction mixture was initially incubated for 2 min at 96°C, followed by 10 cycles of denaturation (96°C for 10 s), annealing-extension (69°C for 3 min), then 20 cycles of denaturation (96°C for 10 s), annealing-extension (69°C for 3 min, with an additional 20 s cycle elongation for every successive cycle) and a final extension at 70°C for 7 min. Five microlitres of the 1340 bp CEBPA PCR products (0.2 ng/µL) were processed using the Nextera XT DNA Sample Preparation Kit (Illumina, San Diego, California, USA) according to the manufacturer’s protocol (ver. October 2012). Briefly, the preparation workflow involves tagmentation of amplicons by the Nextera XT transposase at 55°C for 5 min, followed by the addition of 5 µL of Neutralise Tagment Buffer for 5 min to neutralise the reaction. The tagmented DNA is then amplified via a 12-cycle PCR protocol, which adds the adapters required for cluster establishment. Postamplification clean-up was performed using 90 µL of AMPure XP beads (Beckman Coulter, Australia). The purified library DNA was then normalised by adding 45 µL of combined library normalisation additives/beads. Both the TMSP and CEBNX libraries were combined for sequencing on a MiSeq system run with v3 reagents.
The TMSP library preparation was also performed on 122 samples as previously described.5 In brief, quantitation of gDNA was performed using a Qubit fluorometer (Life Technologies, Carlsbad, California, USA) with 50 ng input per sample. Quantitated DNA was then processed using the TruSeq Custom Amplicon (TSCA) assay for the TruSight Myeloid Sequencing Panel (Illumina, San Diego, California, USA), which involves the hybridisation of upstream and downstream oligos, extension-ligation to produce PCR-ready templates, followed by PCR amplification with unique dual-index primers and adapters. PCR products were examined using agarose gel electrophoresis. After PCR clean-up via AMPure XP beads, the normalised libraries were pooled and loaded on a MiSeq for sequencing using the 600-cycle MiSeq Reagent Kit v3 to generate 2×150 read lengths. The TMSP comprises 568 amplicons interrogating 54 genes associated with myeloid neoplasms: ABL1, ASXL1, ATRX, BCOR, BCORL1, BRAF, CALR, CBL, CBLB, CBLC, CDKN2A, CEBPA, CSF3R, CUX1, DNMT3A, ETV6, EZH2, FBXW7, FLT3, GATA1, GATA2, GNAS, HRAS, IDH1, IDH2, IKZF1, JAK2, JAK3, KDM6A, KIT, KMT2A, KRAS, MPL, MYD88, NOTCH1, NPM1, NRAS, PDGFRA, PHF6, PTEN, PTPN11, RAD21, RUNX1, SETBP1, SF3B1, SMC1A, SMC3, SRSF2, STAG2, TET2, TP53, U2AF1, WT1 and ZRSR2.
Bioinformatics data analysis
Paired-end FASTQ files generated by the MiSeq Software V.2.6 were processed with MiSeq Reporter V.184.108.40.206 using the TruSeq or PCR Amplicon workflow via the TruSight Myeloid Manifest file downloaded from the Illumina website or a user-customised manifest file, respectively. The customised manifest file targeted the CEBPA gene (NM_004364.3) only at nucleotide positions 33792213–33793514 of chromosome 19, with upstream and downstream probe lengths of 23 and 17, respectively. Briefly, paired-end alignments were performed using the mem-algorithm of the BWA-0.7.9a-isis-1.0.0 software, against the reference genome (Homo sapiens, hg19, build 37.2), followed by variant calling using the Somatic VariantCaller-3.6.2 with default settings.
The variant calling format file generated was imported into the Illumina Variant Studio V.2.2 for variant annotation, alongside a TruSight Myeloid Amplicon BED file downloaded from the Illumina website. Additional filtering parameters were applied to facilitate the data interpretation. Briefly, all genotype categories (ie, heterozygote, homozygote and hemizygote), all variant types (ie, single nucleotide variations, insertions, deletions and reference), all possible consequences of the variants (ie, missense, frameshift, stop-gained, stop lost, initiator codon, in-frame insertion, in-frame deletion and splice), pass filter with quality >99, read depth >100 and alternate variant frequency of at least 5, were defined in the Variant Studio. In addition, only variants with population frequency of <5% in all populations were included. The depth of coverage for the CEBPA amplicon was obtained using the Depth Of Coverage tool available from the Genome Analysis Toolkit (V.3.4–46)13 and plotted using an in-house Python script. In addition, the raw FASTQ files were searched manually with the grep tool on Linux to isolate fragments with 3 and 4 repeats. The numbers of fragments containing 3 full repeats and 4 full repeats were counted.
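The filtering criteria described above can be illustrated with a minimal Python sketch. It is not the Variant Studio implementation: the record layout and field names (`filter`, `quality`, `read_depth`, `alt_freq_pct`, `pop_freqs_pct`) are hypothetical, and the alternate variant frequency threshold of 5 is assumed here to be a percentage.

```python
from typing import Dict

# Thresholds taken from the filtering description; field names are
# hypothetical and are NOT Variant Studio column names.
MIN_QUALITY = 99       # quality must be > 99
MIN_READ_DEPTH = 100   # read depth must be > 100
MIN_ALT_FREQ = 5.0     # alternate variant frequency, assumed to be in percent
MAX_POP_FREQ = 5.0     # population frequency (%) must be < 5 in all populations


def passes_filters(variant: Dict) -> bool:
    """Return True if a variant record satisfies every filter listed above."""
    return (
        variant["filter"] == "PASS"
        and variant["quality"] > MIN_QUALITY
        and variant["read_depth"] > MIN_READ_DEPTH
        and variant["alt_freq_pct"] >= MIN_ALT_FREQ
        and all(f < MAX_POP_FREQ for f in variant["pop_freqs_pct"])
    )


# Illustrative record (invented values, not study data).
example = {
    "filter": "PASS",
    "quality": 100,
    "read_depth": 5000,
    "alt_freq_pct": 12.5,
    "pop_freqs_pct": [0.1, 0.0],
}
print(passes_filters(example))
```

Keeping each threshold as a named constant makes it straightforward to audit the filter against the written protocol.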
Statistical methods: sensitivity and specificity
Sensitivity is calculated as the percentage of CEBPA mutations detected by CEBNX over those detected by CCS, or the percentage of CEBPA mutations detected by TMSP over those detected by CEBNX NGS. Specificity is calculated as the percentage of WT CEBPA detected by CEBNX over those detected by CCS, or the percentage of WT CEBPA detected by TMSP over those detected by CEBNX.14 Sensitivity and specificity were calculated using statistical software Stata/SE14.0 (StataCorp, USA).
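The definitions above correspond to the usual confusion-matrix formulas. The following Python sketch shows the point estimates only; it does not reproduce the Stata analysis or its confidence intervals, and the paired calls in the example are invented for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(reference: Sequence[bool], test: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN from paired calls (True = mutation detected).

    `reference` holds the calls of the comparator method (e.g. CCS) and
    `test` holds the calls of the method under evaluation (e.g. CEBNX).
    """
    tp = sum(r and t for r, t in zip(reference, test))
    fp = sum((not r) and t for r, t in zip(reference, test))
    tn = sum((not r) and (not t) for r, t in zip(reference, test))
    fn = sum(r and (not t) for r, t in zip(reference, test))
    return tp, fp, tn, fn


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity %, specificity %) relative to the reference method."""
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    return sensitivity, specificity


# Invented paired calls for five samples (not study data).
reference_calls = [True, True, False, False, False]
test_calls = [True, False, False, False, True]
print(sensitivity_specificity(*confusion_counts(reference_calls, test_calls)))
```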
CEBPA-specific NGS via the Nextera XT workflow (CEBNX)
A median amplicon coverage graph was plotted using an in-house Python script to assess the quality of the runs. The median of the coverage at each nucleotide across the CEBPA gene is shown by the dark grey line in figure 2, with the minimum median coverage of 3631X located at nucleotide position (np) 112 and the maximum coverage of 28 184X located at np 946. All sequencing runs had an average of >90% of clusters passing filter. The sequencing quality metrics on the MiSeq system are shown in table 1.
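The in-house script itself is not reproduced here. As a rough sketch of the calculation it describes, the per-position median across samples and the positions of the lowest and highest medians could be obtained as follows. The input layout (one list of depths per sample, all of equal length) and the sample names are assumptions made for illustration.

```python
from statistics import median
from typing import Dict, List, Tuple


def median_coverage_per_base(depths_by_sample: Dict[str, List[int]]) -> List[float]:
    """Median depth at each nucleotide position across all samples.

    Assumes every sample has one depth value per position, in the same order.
    """
    return [median(column) for column in zip(*depths_by_sample.values())]


def coverage_extremes(medians: List[float]) -> Tuple[Tuple[int, float], Tuple[int, float]]:
    """Return ((position, min median), (position, max median)), 1-based positions."""
    lowest = min(range(len(medians)), key=medians.__getitem__)
    highest = max(range(len(medians)), key=medians.__getitem__)
    return (lowest + 1, medians[lowest]), (highest + 1, medians[highest])


# Toy depths for three hypothetical samples over three positions.
toy = {"s1": [10, 20, 30], "s2": [30, 40, 50], "s3": [20, 60, 10]}
medians = median_coverage_per_base(toy)
print(medians, coverage_extremes(medians))
```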
Limit of detection: CEBNX and TMSP
The LoD refers to the lowest amount of analyte that can be detected; for mutation-based tests, this refers to the minimum detectable variant allelic fraction in a given sample.15 Through CCS, we confirmed the presence of the TAD2 polymorphism in the MOLM-14 cell line (figure 3A). As determined by cell line mixture experiments, the LoDs of CEBNX and TMSP are 5% and 2%, respectively (figure 3B).
Comparison of CEBNX vs CCS
A total of 137 retrospective samples were analysed using both the CEBNX and CCS. CEBNX detected 32 variants: 3 insertion (9.4%), 13 deletion (40.6%), 10 duplication (31.3%) and 6 substitution (18.8%) mutations (sensitivity: 97%, 95% CI 82% to 100% and specificity: 97%, 95% CI: 92% to 99%). CEBNX revealed four additional mutations (AD371, c.238dupG; AD38, c.568T>C; AD62, c.934C>A and AD544, c.840_842delGAA) that were not detected by CCS. About a third of the mutations (34.4%, 11 of 32) occurred in the C-terminal region of the CEBPA gene. The spectrum of mutations found in the CEBPA gene by CEBNX is shown in figure 4 and table 4. Table 2 shows the concordance between CEBNX and CCS, while figure 5 shows an example of a mutation detected by CEBNX but not by CCS.
Comparison of CEBNX vs TMSP
Results from CEBNX and TMSP were compared for 122 samples. TMSP identified 29 mutations (table 3): 3 insertion (10.34%), 10 deletion (34.48%), 7 duplication (24.14%) and 9 substitution (31.03%) variants. CEBNX identified 14 mutations (AD165, c.562_562insT; AD165, c.558_561delGCCG and c.558_566delGCCGCCGCC; AD193, c.435_436dupCC; AD351, c.134_136delCACCTGCCGCCCC and c.934_936dupCAG; AD267, c.138delT; AD193, c.435_436dupCC; AD371, c.238dupG; AD492, c.376_379dupGGGC; AD532, c.68dupC; AD38, c.568T>C; AD62, c.934C>A; AD399, c.268A>T) that were not observed in the TMSP (table 4). The VAFs of mutations detected only by TMSP range from 7.8% to 85.7% (sensitivity: 47%, 95% CI 31% to 62% and specificity: 94%, 95% CI 88% to 98%) (table 4). Table 2 shows the concordance between CEBNX and TMSP.
TAD2 (c.584_589dupACCCGC) polymorphism in the CEBPA gene: CCS, CEBNX and TMSP
We observed 100% concordance for CEBNX and CCS in the detection of the TAD2 polymorphism (n=22) (sensitivity: 100%, 95% CI 87% to 100%; specificity: 100%, 95% CI: 97% to 100%). However, nine of them (AD229, AD330, AD331A, AD372, AD416, AD472, AD475, AD532 and AD560) were not detected by the TMSP NGS (sensitivity: 59%, 95% CI 36% to 79% and specificity: 100%, 95% CI 97% to 100%). A comparison of the CEBNX and TMSP NGS results is shown in tables 2 and 5.
In this study, we report the performance characteristics of an NGS-based assay for CEBPA mutational analysis. This assay is 96.5% concordant with CCS for the identification of a variety of CEBPA mutations that include point mutations and indels. In addition, it is 100% concordant with CCS for the identification of the common TAD2 polymorphism (c.584_589dupACCCGC). Our protocol may be useful for laboratories that intend to use the TMSP (or other amplicon-based NGS panels) as a laboratory-developed test and require solutions for adequate coverage of CEBPA.16
One issue worth highlighting first is that variant calling was confounded by the repeat or palindromic sequences present in CEBPA. For instance, in our study, CCS detected a mutation (AD544; c.838_840delAAG) which was not identified by CEBNX. However, CEBNX identified another mutation, c.840_842delGAA, for the same sample. Manual inspection of the CCS result showed c.838_840delAAG (REFSEQ: 5'-GAAGTCGGTGGACAAGAACAGCAACGAGTA-3' → MUT: 5'-GAAGTCGGTGGAC_AACAGCAACGAGTA-3'), whereas the Variant Studio V.2.2 reported the result to be c.840_842delGAA (REFSEQ: 5'-GAAGTCGGTGGACAAGAACAGCAACGAGTA-3' → MUT: 5'-GAAGTCGGTGGACAA_CAGCAACGAGTA-3'), with both deletions resulting in the same mutant sequence. Standardisation of variant-calling pipelines is an important aspect that needs to be addressed.
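The equivalence of the two deletion descriptions can be checked directly on the 30-nucleotide reference fragment quoted above. The Python sketch below is illustrative only: the 0-based offsets 13 and 15 are positions within that fragment (corresponding to c.838 and c.840, respectively), and the helper names are hypothetical.

```python
from typing import List

# Reference fragment quoted in the text (5' to 3').
REF = "GAAGTCGGTGGACAAGAACAGCAACGAGTA"


def delete(seq: str, start: int, length: int) -> str:
    """Return seq with `length` bases removed, starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


def equivalent_deletion_starts(seq: str, length: int, mutant: str) -> List[int]:
    """All 0-based start offsets whose deletion of `length` bases yields `mutant`."""
    return [i for i in range(len(seq) - length + 1) if delete(seq, i, length) == mutant]


ccs_call = delete(REF, 13, 3)    # removes AAG (reported by CCS as c.838_840delAAG)
cebnx_call = delete(REF, 15, 3)  # removes GAA (reported as c.840_842delGAA)

print(ccs_call == cebnx_call)
print(equivalent_deletion_starts(REF, 3, ccs_call))
```

Because several start positions give the same mutant sequence, two pipelines can report different coordinates for one event unless both apply the same alignment convention, such as the HGVS rule of describing the most 3' position.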
Another example pertains to the common TAD2 (c.584_589dupACCCGC) polymorphism. We observed TAD2 polymorphism VAFs ranging from 11.0% to 35.1% by CEBNX. This finding appears to be incongruous with the expected VAF of 50%, given the heterozygous status of this widely known polymorphism.16 We believe that this is due to the fragmented libraries, which pose challenges for accurate VAF determination of duplication variants. Of note, the CEBPA sequence with the TAD2 polymorphism contains four repeats of ‘ACCCGC’ (4RP), while the WT sequence contains three repeats of ‘ACCCGC’ (3RP). CEBNX randomly cleaves the CEBPA amplicons, generating fragments of different lengths (figure 6). Fragments with <3 ACCCGC repeats, including fragments generated from sequences with four repeats, will be mapped to the WT sequence. This is a plausible explanation for the consistently lower observed (relative to expected) VAFs in CEBNX as shown in figure 3B. We further investigated this phenomenon by manually inspecting the FASTQ files using the grep tool on Linux and computed the total number of 3RP (with flanking 5’ and 3’ sequence) and 4RP (without flanking 5’ and 3’ sequence), assuming that there are no partial fragments occurring within 3RP and 4RP. Our results showed that the mapping algorithm and Somatic Variant Caller under-report the VAFs of both 3RP and 4RP. This is further supported by the observation that the TAD2 polymorphism VAFs were consistently higher for the TMSP workflow, where intact (instead of fragmented) amplicons were mapped.
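The repeat-counting idea can be sketched in Python as an alternative to manual grep searches. This is not the procedure used in the study: the flank requirement, the handling of reverse-complement reads and the toy flanking sequences are all assumptions. A run of repeats is counted only if at least six non-repeat bases are present on each side within the read, so that runs truncated by fragmentation are discarded rather than miscounted as 3RP.

```python
import re
from collections import Counter
from typing import Iterable, Optional

UNIT = "ACCCGC"
RUN = re.compile(f"(?:{UNIT})+")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def full_repeat_count(read: str, min_flank: int = len(UNIT)) -> Optional[int]:
    """Number of ACCCGC units in the longest run, if the run is fully flanked.

    Returns None when the read has no repeat or when the run lies within
    `min_flank` bases of either read end (possibly a truncated run).
    """
    if UNIT not in read:
        read = reverse_complement(read)  # try the opposite strand
    runs = list(RUN.finditer(read))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    if longest.start() < min_flank or len(read) - longest.end() < min_flank:
        return None
    return (longest.end() - longest.start()) // len(UNIT)


def four_repeat_fraction(reads: Iterable[str]) -> Optional[float]:
    """Fraction of informative reads carrying 4 repeats among 3RP + 4RP reads."""
    counts = Counter(full_repeat_count(r) for r in reads)
    informative = counts[3] + counts[4]
    return counts[4] / informative if informative else None


# Toy reads with hypothetical flanks (not the real CEBPA flanking sequence).
read_3rp = "TTGGAA" + UNIT * 3 + "GGTTAA"
read_4rp = "TTGGAA" + UNIT * 4 + "GGTTAA"
truncated = "CCGC" + UNIT * 3 + "GGTTAA"  # could derive from a 4RP allele
print(four_repeat_fraction([read_3rp, read_3rp, read_4rp, truncated]))
```

Requiring a flank at least as long as one repeat unit ensures that the bases adjacent to a counted run are not themselves a complete unit, which is the property that a fragmented read cannot otherwise guarantee.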
In addition, when comparing the performance of the CEBNX and TMSP workflows for CEBPA mutational analysis, we observed that the TMSP detected several mutations that were not identified using CCS and the CEBNX (eg, AD165; c.568_569delT, c.564delG with VAFs of 81.6% and 78.5%, respectively). This could potentially be due to a difference in polymerase fidelity, as the polymerase used in TMSP is different from that used in CEBNX. In TMSP, the Phusion polymerase is used, while CEBNX uses a thermostable Taq/Tgo DNA polymerase blend that possesses superior 3'−5' exonuclease proofreading activity, which allows for the amplification of DNA three times more accurately than the regular Taq DNA polymerase.17
Lastly, as our pipeline can identify mutations that are present below the LoD of CCS, we believe that this has important clinical implications, as the CEBPA mutational status (on which the current prognostic model of biallelic CEBPA mutant-AML having a favourable outcome rests) has hitherto been defined by CCS or technologies with equivalent LoD. The clinical significance of subclonal CEBPA mutations (defined as CEBPA mutations below the LoD of CCS for the purposes of our study) is at present unknown, necessitating future work in this area.
In summary, we have optimised an NGS workflow that may be useful as a complementary solution for attaining optimal CEBPA coverage where amplicon-based NGS panels have been found to be inadequate. However, the minor limitations of this CEBNX workflow centre mainly on the difficulty of accurately profiling variants that are repeat sequences, such as the common TAD2 (c.584_589dupACCCGC) polymorphism. The identification of subclonal CEBPA mutations using NGS technologies may pose challenges with regard to clinical management of patients with AML, and additional work is warranted to understand the clinical impact of such subclonal mutations.
Take home messages
The CEBPA-specific Nextera XT (CEBNX) protocol has a limit of detection (LoD) of 5%.
Both the sensitivity and specificity of CEBNX relative to conventional capillary sequencing (CCS) for CEBPA mutational analysis are 97%.
The high coverage of CEBNX enables detection of subclonal mutations in CEBPA.
CWSN and BK contributed equally.
Handling editor Runjan Chetty.
Funding This study was funded by Centre for Personalized and Precision Health (CPPH) of National University Hospital (NUH), Singapore (CPPH/FY2016/ProjectBudget/003).
Competing interests None declared.
Patient consent Obtained.
Provenance and peer review Not commissioned; externally peer reviewed.