  • Brief Communication

Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity

Abstract

We report a single-cell bisulfite sequencing (scBS-seq) method that can be used to accurately measure DNA methylation at up to 48.4% of CpG sites. Embryonic stem cells grown in serum or in 2i medium displayed epigenetic heterogeneity, with '2i-like' cells present in serum culture. Integration of 12 individual mouse oocyte datasets largely recapitulated the whole DNA methylome, which makes scBS-seq a versatile tool to explore DNA methylation in rare cells and heterogeneous populations.

Figure 1: scBS-seq is an accurate and reproducible method for genome-wide methylation analysis.
Figure 2: scBS-seq reveals DNA methylation heterogeneity in ESCs.

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Sequence Read Archive

References

  1. Jones, P.A. Nat. Rev. Genet. 13, 484–492 (2012).

  2. Smith, Z.D. & Meissner, A. Nat. Rev. Genet. 14, 204–220 (2013).

  3. Jaitin, D.A. et al. Science 343, 776–779 (2014).

  4. Deng, Q. et al. Science 343, 193–196 (2014).

  5. Macaulay, I.C. & Voet, T. PLoS Genet. 10, e1004126 (2014).

  6. Lee, H.J. et al. Cell Stem Cell 14, 710–719 (2014).

  7. Miura, F. et al. Nucleic Acids Res. 40, e136 (2012).

  8. Shirane, K. et al. PLoS Genet. 9, e1003439 (2013).

  9. Chambers, I. et al. Nature 450, 1230–1234 (2007).

  10. Islam, S. et al. Nat. Methods 11, 163–166 (2014).

  11. Hayashi, K. et al. Cell Stem Cell 3, 391–401 (2008).

  12. Torres-Padilla, M.E. & Chambers, I. Development 141, 2173–2181 (2014).

  13. Ficz, G. et al. Cell Stem Cell 13, 351–359 (2013).

  14. Habibi, E. et al. Cell Stem Cell 13, 360–369 (2013).

  15. Stadler, M.B. et al. Nature 480, 490–495 (2011).

  16. Ziller, M.J. et al. Nature 500, 477–481 (2013).

  17. Hon, G.C. et al. Nat. Genet. 45, 1198–1206 (2013).

  18. Guo, H. et al. Genome Res. 23, 2126–2135 (2013).

  19. Smallwood, S.A. et al. Nat. Genet. 43, 811–814 (2011).

  20. Quail, M.A. et al. Nat. Methods 9, 10–11 (2012).

  21. Krueger, F. & Andrews, S.R. Bioinformatics 27, 1571–1572 (2011).

  22. Illingworth, R.S. et al. PLoS Genet. 6, e1001134 (2010).

  23. Creyghton, M.P. et al. Proc. Natl. Acad. Sci. USA 107, 21931–21936 (2010).

  24. Li, Y. et al. PLoS Biol. 8, e1000533 (2010).

  25. Bock, C. et al. Mol. Cell 47, 633–647 (2012).

Acknowledgements

We thank K. Tabbada and the Wellcome Trust Sanger Institute sequencing pipeline team for assistance with Illumina sequencing, R. Walker for assistance with flow cytometry, T. Hore (Babraham Institute, Cambridge, UK) for providing ESCs maintained in 2i medium and serum conditions, and T. Hore, J. Huang, I. Macaulay, S. Lorenz, M. Quail, T. Voet and H. Swerdlow for helpful discussions. This work was supported by the UK Biotechnology and Biological Sciences Research Council grant BB/J004499/1, UK Medical Research Council grant MR/K011332/1, Wellcome Trust award 095645/Z/11/Z and EU FP7 EpiGeneSys and BLUEPRINT.

Author information

Contributions

S.A.S. and H.J.L. designed the study, prepared scBS-seq libraries, analyzed data and wrote the manuscript. F.K., H.S. and S.R.A. performed sequence mapping and analyzed data. J.P. contributed to technical developments. C.A. and O.S. analyzed data. O.S. provided advice on statistical analyses. W.R. and G.K. supervised the study and wrote the manuscript.

Corresponding authors

Correspondence to Wolf Reik or Gavin Kelsey.

Ethics declarations

Competing interests

W.R. is a consultant to Cambridge Epigenetix Ltd.

Integrated supplementary information

Supplementary Figure 1 Quality control of scBS-seq libraries.

(a) Mapping efficiency of scBS-Seq samples and negative controls. Boxplot of the mapping efficiencies (calculated on sequences obtained after trimming and mapping against the mouse genome) for each single cell and negative control (red crosses represent individual cell values). The overall higher mapping efficiency of oocytes relative to ESCs can be explained by the amount of DNA in each cell (4n for MII oocytes and 2n for ESCs), which results in a relatively lower contribution of spurious sequences in MII oocytes (see Supplementary Fig. 2). All negative controls had a mapping efficiency below 3.5% (the dashed line indicates 5% mapping efficiency). (b) Visualization of the scBS-Seq library fragment size distribution on the Bioanalyser platform. The Bioanalyser trace of library MII#1 is shown as an example.
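The mapping efficiency in panel (a) is a ratio of aligned to analysed sequences. The following is a minimal Python sketch of how each library, including the negative controls, could be checked against the 5% line. It assumes Bismark's report convention (sequences with a unique best alignment divided by all sequences analysed after trimming), and the function names are hypothetical.

```python
def mapping_efficiency(unique_hits: int, sequences_analysed: int) -> float:
    """Percentage of analysed (post-trimming) sequences with a unique best alignment.

    Assumed definition, mirroring the 'Mapping efficiency' line of a Bismark report.
    """
    if sequences_analysed <= 0:
        raise ValueError("sequences_analysed must be positive")
    return 100.0 * unique_hits / sequences_analysed


def flag_controls(controls: dict[str, tuple[int, int]],
                  threshold_pct: float = 5.0) -> dict[str, bool]:
    """True for each negative control whose mapping efficiency is below threshold_pct.

    `controls` maps a (hypothetical) control name to (unique_hits, sequences_analysed).
    """
    return {
        name: mapping_efficiency(hits, total) < threshold_pct
        for name, (hits, total) in controls.items()
    }


# Illustrative numbers only, not data from this study.
print(flag_controls({"neg_ctrl_1": (30, 1000), "neg_ctrl_2": (80, 1000)}))
# -> {'neg_ctrl_1': True, 'neg_ctrl_2': False}
```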

Supplementary Figure 2 Contribution of spurious sequences to scBS-seq mapping efficiency.

(a) The relatively low mapping efficiency of scBS-Seq is associated with a substantial fraction of sequences that map to multiple genomic locations and are therefore discarded. (b) Analysis of the G+C content of the raw sequences (i.e., prior to mapping) of scBS-Seq libraries revealed many sequences with <3% G+C, which were absent from bulk samples. These correspond to poly-T stretches (poly-Ts; i.e., (T)N with N>50). Poly-Ts are present in both actual samples and the corresponding negative controls, suggesting that a contaminant is their main source. (c,d) Poly-Ts are more abundant in ESCs than in oocytes, and the percentage of sequences with poly-Ts and the percentage of sequences with multiple alignments are tightly correlated across samples. (e) This suggests that poly-Ts are the major cause of the low mapping efficiency of scBS-Seq. To test this, we trimmed sequences containing poly-Ts of at least 50 bp from the raw FASTQ file and repeated the mapping. This drastically reduced the percentage of sequences with multiple alignments and increased the percentage of sequences with unique alignments. Poly-Ts are inherent to our current methodology; alternative protocols we developed do not generate these artifacts but still yield significantly fewer measured CpGs.
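Panels (b) and (e) rest on two per-read tests: G+C content and the presence of a long T run. The following is a minimal Python sketch that computes G+C content and drops reads carrying a run of at least 50 T bases before remapping. The function names are hypothetical, and the sketch assumes standard 4-line FASTQ records. It removes whole reads; trimming only the poly-T stretch would be another reading of the procedure.

```python
import re
from typing import Iterable, Iterator

Record = tuple[str, str, str, str]  # header, sequence, separator, qualities


def gc_percent(seq: str) -> float:
    """G+C content of a raw read, in percent."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def has_poly_t(seq: str, min_len: int = 50) -> bool:
    """True if the read contains a run of at least `min_len` consecutive T bases."""
    return re.search(f"T{{{min_len},}}", seq.upper()) is not None


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records from any iterable of lines (e.g. an open file)."""
    it = iter(lines)
    for header in it:
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header.rstrip("\n"), seq.rstrip("\n"), sep.rstrip("\n"), qual.rstrip("\n")


def drop_poly_t_reads(lines: Iterable[str], min_len: int = 50) -> tuple[list[Record], int]:
    """Return the reads kept and the number removed for carrying a poly-T run."""
    kept: list[Record] = []
    removed = 0
    for record in read_fastq(lines):
        if has_poly_t(record[1], min_len):
            removed += 1
        else:
            kept.append(record)
    return kept, removed
```

For example, passing an open FASTQ file (a hypothetical `sample.fastq`) to `drop_poly_t_reads` returns the retained records and the number removed. The retained records could then be written back out and passed to the aligner.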

Supplementary Figure 3 Saturation level of scBS-seq libraries.

For each individual MII scBS-Seq library and one representative bulk BS-Seq example (PBAT), the percentage of informative CpGs is plotted for 10% increments of mapped sequences. In contrast to the bulk BS-Seq example (black line), the MII scBS-Seq libraries (colored lines) have not reached a saturation plateau, indicating that further sequencing would yield additional information. MII#2 Deep Seq and MII#5 Deep Seq correspond to deeper sequencing of these libraries (see main text and Supplementary Table 1).
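A saturation curve of this kind can be approximated by subsampling mapped reads in 10% increments and counting the CpGs covered at each step. The following is a minimal Python sketch. It assumes that "informative" means covered by at least one read and that each read is represented as the set of CpG identifiers it overlaps. Both are assumptions, and the function name is hypothetical.

```python
import random
from collections.abc import Sequence


def saturation_curve(reads: Sequence[frozenset[int]], total_cpgs: int,
                     steps: int = 10, seed: int = 0) -> list[tuple[int, float]]:
    """Percentage of genomic CpGs covered by >=1 read at each subsampling step.

    reads[i] holds the (hypothetical) CpG identifiers covered by mapped read i.
    A single random permutation is used, so each subsample contains the
    previous one and the curve is monotonic.
    """
    order = list(range(len(reads)))
    random.Random(seed).shuffle(order)
    covered: set[int] = set()
    curve = []
    done = 0
    for step in range(1, steps + 1):
        cutoff = round(len(reads) * step / steps)
        for idx in order[done:cutoff]:
            covered.update(reads[idx])
        done = cutoff
        curve.append((100 * step // steps, 100.0 * len(covered) / total_cpgs))
    return curve


# Toy input: 100 reads, each covering one distinct CpG, out of 200 CpGs in total.
toy_reads = [frozenset({i}) for i in range(100)]
print(saturation_curve(toy_reads, total_cpgs=200)[-1])  # -> (100, 50.0)
```

A curve that is still rising at 100% (as for the single-cell libraries) indicates that more sequencing would add CpGs. A curve that flattens (as for bulk) indicates saturation.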

Supplementary Figure 4 scBS-seq generates a digital output of DNA methylation.

(a) For each single-cell MII library and for the bulk MII sample, CpGs were grouped by read depth, and the proportion of CpGs in each group with a methylation value of either 0% or 100% (digital output) was calculated. The boxplot represents the results from all 12 single MII libraries; the results from the bulk MII sample are superimposed as solid blue circles. As expected, the proportion of digital CpGs in the scBS-Seq libraries was very high (>90% for read depths 2–5 in all cells; dashed line). In contrast, the bulk sample had fewer digital CpGs (66% at read depth 5) owing to cell-to-cell variability within the population. (b) Histograms of the distribution of CpG methylation values for MII bulk and MII single cells, for CpGs covered by at least 2 reads.
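Panel (a) groups CpGs by read depth and asks how often the methylation value is exactly 0% or 100%. The following is a minimal Python sketch. It assumes that per-CpG counts of methylated and total reads are already available, for instance from a methylation extractor's coverage output, and the function name is hypothetical.

```python
from collections import defaultdict


def digital_fraction_by_depth(calls: dict[tuple[str, int], tuple[int, int]],
                              min_depth: int = 2) -> dict[int, float]:
    """Fraction of CpGs whose methylation is exactly 0% or 100%, per read depth.

    calls maps (chromosome, position) -> (methylated_reads, total_reads).
    Depth-1 CpGs are skipped by default because a single read is always 0% or 100%.
    """
    digital: dict[int, int] = defaultdict(int)
    total: dict[int, int] = defaultdict(int)
    for methylated, depth in calls.values():
        if depth < min_depth:
            continue
        total[depth] += 1
        if methylated == 0 or methylated == depth:
            digital[depth] += 1
    return {depth: digital[depth] / total[depth] for depth in sorted(total)}


toy = {("chr1", 10): (0, 2), ("chr1", 20): (1, 2), ("chr1", 30): (3, 3), ("chr1", 40): (1, 1)}
print(digital_fraction_by_depth(toy))  # -> {2: 0.5, 3: 1.0}
```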

Supplementary Figure 5 CpG concordance obtained from MIIs and ESCs using scBS-seq.

(a) CpG concordance was calculated for each pair of cells as the proportion of overlapping CpGs with an identical methylation state. On average, 1.8 M CpGs were measured in each pairwise analysis. Within each cell type, samples are ordered from bottom to top as in Supplementary Table 1 (for oocytes, the bottom sample is MII#1 and the top sample is MII#12). (b) Pearson correlation matrix of MII, 2i ESC and serum ESC scBS-Seq data, calculated using methylation values in 2-kb windows.
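Both panels reduce to simple pairwise statistics. The following is a minimal Python sketch with hypothetical function names. It assumes that single-cell CpG calls are binary (0 or 1) for concordance and that a window's methylation value is the mean of its CpG values. The study's exact windowing and filtering are not specified in this caption.

```python
import math
from collections import defaultdict

Key = tuple[str, int]  # (chromosome, position) for CpGs; (chromosome, window index) for windows


def concordance(a: dict[Key, int], b: dict[Key, int]) -> tuple[int, float]:
    """Number of shared CpGs and the fraction with an identical binary state."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0, math.nan
    same = sum(1 for site in shared if a[site] == b[site])
    return len(shared), same / len(shared)


def window_means(calls: dict[Key, float], window: int = 2000) -> dict[Key, float]:
    """Mean CpG methylation per fixed, non-overlapping window."""
    sums: dict[Key, float] = defaultdict(float)
    counts: dict[Key, int] = defaultdict(int)
    for (chrom, pos), value in calls.items():
        key = (chrom, pos // window)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def window_correlation(a: dict[Key, float], b: dict[Key, float], window: int = 2000) -> float:
    """Pearson correlation over windows measured in both cells."""
    wa, wb = window_means(a, window), window_means(b, window)
    shared = sorted(wa.keys() & wb.keys())
    return pearson([wa[k] for k in shared], [wb[k] for k in shared])
```

Concordance is computed only on CpGs covered in both cells. This is why the number of compared sites (about 1.8 M on average here) is much smaller than the number measured in any single cell.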

Supplementary Figure 6 scBS-seq accurately determines CpG island (CGI) methylation status in MII oocytes.

(a) Heatmap showing the methylation level, in individual MII libraries, of CGIs identified in bulk as methylated (>80%) or unmethylated (<20%; random selection). The number at the top indicates the number of individual MIIs in which the CGIs are commonly informative. The difference between the numbers of methylated and unmethylated CGIs informative across single cells reflects the different CpG densities of these two groups, as previously described19. (b) Histogram showing, for MII bulk and individual MII libraries, the percentage of total CGIs (23,020) found methylated, unmethylated or with an intermediate level of methylation, and the percentage of wrong calls (i.e., CGIs methylated in bulk (>80%) but called unmethylated (<20%) in single cells, and vice versa). (c) Boxplot showing the methylation level in each individual MII of CGIs found methylated in bulk (>80%). The percentage of these CGIs that are informative in each MII and have a methylation level below 80% is shown below the plot. (d) As in (c), for CGIs unmethylated in bulk (<20%).
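The categories in panel (b) follow directly from the stated cut-offs. The following is a minimal Python sketch with hypothetical names. It assumes per-CGI methylation levels in percent and uses the caption's denominator of all annotated CGIs (23,020 in the study). Wrong calls are counted in addition to the methylated and unmethylated classes, which is one possible reading of the caption.

```python
from collections import Counter


def classify(level: float, high: float = 80.0, low: float = 20.0) -> str:
    """Methylated (>80%), unmethylated (<20%) or intermediate, using the caption's cut-offs."""
    if level > high:
        return "methylated"
    if level < low:
        return "unmethylated"
    return "intermediate"


def cgi_call_summary(bulk: dict[str, float], single: dict[str, float],
                     total_cgis: int) -> dict[str, float]:
    """Percentages of all CGIs called in each class in one cell, plus wrong calls.

    A wrong call is a CGI methylated in bulk but unmethylated in the cell, or vice versa.
    """
    counts: Counter = Counter()
    for cgi, level in single.items():
        call = classify(level)
        counts[call] += 1
        if cgi in bulk and {call, classify(bulk[cgi])} == {"methylated", "unmethylated"}:
            counts["wrong"] += 1
    keys = ("methylated", "unmethylated", "intermediate", "wrong")
    return {key: 100.0 * counts[key] / total_cgis for key in keys}
```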

Supplementary Figure 7 scBS-seq provides information on all genomic contexts.

(a) Snapshot showing the read distribution across 61 Mbp of chromosome 19. The mapped reads and their quantification (log number of reads per 25-kb window) are displayed below the annotation tracks. (b) The representation of different genomic contexts in single-cell and bulk libraries, shown as fold enrichment over the expected value (dashed line). The boxplot represents the values for all single-cell samples; the bulk samples are superimposed as blue diamonds (MII), purple crosses (serum ESCs) and red plus signs (2i ESCs).
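Panel (b) reports fold enrichment over an expected value. The following is a minimal Python sketch with hypothetical names. It assumes that the expected value is the fraction of all genomic CpGs falling in a context and that the observed value is the fraction of informative CpGs in that context. The caption does not define the baseline.

```python
def fold_enrichment(observed_in_context: int, observed_total: int,
                    genomic_in_context: int, genomic_total: int) -> float:
    """Observed share of informative CpGs in a context divided by its genomic share.

    1.0 means the context is represented exactly as expected (the dashed line).
    """
    if min(observed_total, genomic_in_context, genomic_total) <= 0:
        raise ValueError("totals and genomic count must be positive")
    # Cross-multiplied form of (obs_ctx / obs_total) / (gen_ctx / gen_total).
    return (observed_in_context * genomic_total) / (observed_total * genomic_in_context)


def enrichment_table(observed: dict[str, int], genomic: dict[str, int],
                     observed_total: int, genomic_total: int) -> dict[str, float]:
    """Fold enrichment for every context annotated in `genomic` (missing observations count as 0)."""
    return {
        ctx: fold_enrichment(observed.get(ctx, 0), observed_total, n, genomic_total)
        for ctx, n in genomic.items()
    }
```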

Supplementary Figure 8 Union and intersect for scBS-seq libraries.

Number of CpGs (a) and CGIs (b) for the union and intersect of all possible combinations of the 12 individual MII scBS-Seq libraries. The union shows that pooling data from multiple scBS-Seq samples increases the number of measured sites. The intersect shows that the number of measured sites common to multiple scBS-Seq datasets decreases as the number of datasets increases. Dotted lines show the information obtained in standard BS-Seq experiments as well as the number of CpGs and CGIs in the mouse genome.
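Both panels enumerate every combination of the 12 libraries. The following is a minimal Python sketch. It assumes each library is represented as a set of measured CpG (or CGI) identifiers, and the function name is hypothetical. With 12 libraries there are 2^12 - 1 = 4,095 combinations, so on genome-scale sets this is slow but feasible.

```python
from collections.abc import Set
from itertools import combinations


def union_intersect_sizes(datasets: list[Set[int]]) -> dict[int, tuple[list[int], list[int]]]:
    """For k = 1..n, sizes of the union and intersect for every combination of k datasets."""
    result = {}
    for k in range(1, len(datasets) + 1):
        unions, intersects = [], []
        for combo in combinations(datasets, k):
            unions.append(len(set().union(*combo)))
            intersects.append(len(set(combo[0]).intersection(*combo[1:])))
        result[k] = (unions, intersects)
    return result
```

Union sizes grow with k (pooling adds sites), while intersect sizes shrink (fewer sites are shared by every library). This matches the two trends described in the caption.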

Supplementary Figure 9 scBS-seq snapshot of the imprinted locus Plagl1.

The imprinted Plagl1 locus (top) and the Plagl1 maternal DMR/CGI (bottom) are shown for all 12 individual MIIs, the merged MIIs and MII bulk. Quantification shows the absolute level of methylation (%) at individual-CpG resolution, as indicated on the scale to the left of each sample (0 is 0% methylation, 1 is 100% methylation).

Supplementary Figure 10 Comparison of cluster analyses for ESCs.

Cluster dendrograms are shown for (a) genome-wide methylation estimates (equivalent to the dendrogram shown in Figure 2b) and (b) the top 300 most variable sites among single ESC samples (equivalent to the dendrogram shown in Figure 2c). The cell IDs are included for direct comparison between dendrograms. (c) The distance matrix for the 300 most variable sites is broadly similar to that for all sites (Figure 2b). Cells are presented in the order shown in (b).
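Selecting the 300 most variable sites and comparing cells on them can be sketched as follows. This is a minimal Python illustration under stated assumptions: variability is scored as the population variance of per-cell methylation rates at sites measured in at least three cells, and cell-to-cell distance is the mean absolute difference over shared sites. Neither the variance filter nor the distance metric is taken from the study, and the names are hypothetical.

```python
from statistics import pvariance

# matrix[site][cell] = methylation rate in [0, 1]; cells without data at a site are absent.
Matrix = dict[str, dict[str, float]]


def top_variable_sites(matrix: Matrix, n: int = 300, min_cells: int = 3) -> list[str]:
    """Rank sites by population variance across cells; ties broken by site name."""
    scored = [
        (pvariance(list(values.values())), site)
        for site, values in matrix.items()
        if len(values) >= min_cells
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [site for _, site in scored[:n]]


def cell_distance(matrix: Matrix, sites: list[str], cell_a: str, cell_b: str) -> float:
    """Mean absolute methylation difference over selected sites measured in both cells."""
    diffs = [
        abs(matrix[s][cell_a] - matrix[s][cell_b])
        for s in sites
        if cell_a in matrix[s] and cell_b in matrix[s]
    ]
    return sum(diffs) / len(diffs) if diffs else float("nan")
```

The pairwise distances from `cell_distance` would form the distance matrix that a hierarchical clustering routine takes as input to build a dendrogram.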

Supplementary Figure 11 Cluster dendrogram and distance matrix for the most variable sites in ESCs.

The top 300 ranked most variable sites in ESCs show similar methylation patterns across ESCs, as indicated by the low distance between sites.

Supplementary Figure 12 Detailed variance analysis for different genomic contexts.

(a) Receiver Operating Characteristic (ROC) curves showing the fraction of annotated sites (sensitivity) versus the fraction of non-annotated sites (1-specificity). Sites with high variance are more likely to belong to a given genomic context if the ROC curve is above the diagonal (e.g. H3K4me1), and less likely to belong to genomic contexts if the ROC curve is below the diagonal (e.g. CGI). (b) Different genomic contexts have different mean methylation values. (c) For most genomic contexts, variance was greatest for sites with mean methylation rates close to 50%. H3K27ac and H3K4me1 sites were among the most variable, even after accounting for mean methylation rate. CGI and p300 sites with intermediate mean methylation rates were also highly variable.
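The ROC curves in panel (a) can be outlined by ranking sites by variance and sweeping a threshold. The following is a minimal Python sketch with hypothetical names. It treats annotated sites (for example, those overlapping an H3K4me1 peak) as positives, groups tied scores into a single threshold, and summarizes the curve by its trapezoidal area. An AUC above 0.5 corresponds to a curve above the diagonal.

```python
def roc_curve(scores: dict[str, float], annotated: set[str]) -> list[tuple[float, float]]:
    """(1 - specificity, sensitivity) points, thresholding from the highest score down."""
    positives = sum(1 for site in scores if site in annotated)
    negatives = len(scores) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both annotated and non-annotated sites")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][1]
        # Consume every site tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][1] == threshold:
            if ranked[i][0] in annotated:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```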

Supplementary Figure 13 Comparison of scRRBS and scBS-seq in MII oocytes.

(a) Summary table showing the numbers of raw sequences, informative CpGs and informative CGIs. For scRRBS, the numbers of CpG dinucleotides and informative CGIs were calculated using the methylation calls in the .bed file of GEO accession GSE47343 from Guo et al.18. (b) Plots showing the number of raw sequences generated and the corresponding number of CpGs obtained in MII oocytes for both methods.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13 (PDF 2148 kb)

Supplementary Table 1

Characteristics and statistics of scBS-seq libraries. (XLSX 40 kb)

Supplementary Table 2

Representation of regulatory regions in ESC scBS-seq datasets. (XLSX 20 kb)

About this article

Cite this article

Smallwood, S., Lee, H., Angermueller, C. et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods 11, 817–820 (2014). https://doi.org/10.1038/nmeth.3035

  • DOI: https://doi.org/10.1038/nmeth.3035
