Short history of 5-methylcytosine: from discovery to clinical applications
  1. Olga Taryma-Lesniak,
  2. Katarzyna Ewa Sokolowska,
  3. Tomasz Kazimierz Wojdacz
  1. Independent Clinical Epigenetics Laboratory, Pomeranian Medical University in Szczecin, Szczecin, Zachodniopomorskie, Poland
  1. Correspondence to Dr Tomasz Kazimierz Wojdacz, Independent Clinical Epigenetics Laboratory, Pomeranian Medical University in Szczecin, 71-252 Szczecin, Zachodniopomorskie, Poland; tomasz.wojdacz{at}


Covalent modifications of nucleotides in genetic material have been known since the beginning of the last century. Currently, one of those modifications, referred to as DNA methylation, is impacting personalised medicine both as a treatment target and as a biomarker source for clinical disease management. In this short review, we describe landmark discoveries that led to the elucidation of the importance of DNA methylation in the cell’s physiology and clarified its role as one of the major processes in disease pathology. We also describe turning points in the development of methodologies to study this modification, which ultimately resulted in the development of in-vitro diagnostic kits targeting disease-related DNA methylation changes as biomarkers.

  • methods
  • diagnosis
  • neoplasms
  • biomarkers
  • tumor

Covalent modifications of genetic material

In the 1940s, nucleotides, which are the building blocks of the genetic material, were shown to contain covalent modifications, such as a methyl group at the carbon 5 position on the pyrimidine ring of cytosines, referred to as DNA methylation.1 In 1964, P R Srinivasan and E Borek, while studying the biological role of this phenomenon, suggested that this modification may act to protect DNA from enzymes (figure 1 represents a timeline schematic of the discoveries described in this short review).2 In the subsequent years, DNA modifications were identified to play a central role in the Restriction and Modification system used by bacteria as a defence against foreign DNA.3 4 In this system, nucleic acid carrying methyl groups is protected from cutting by endonucleases, whereas non-modified nucleic acid is not. These discoveries led R Holliday and J E Pugh in 1975 to suggest that, besides methylation-sensitive endonucleases, there are other proteins that can distinguish covalently modified nucleotides present in regulatory regions, such as gene promoters, and that these proteins regulate the expression of genes with specific covalent modifications in the promoter.5 Also in 1975, but independently, A D Riggs suggested that DNA methylation may take part in X chromosome inactivation in eukaryotes.6 Nevertheless, at that time none of the above authors had experimental evidence proving the claims they were putting forward.

Figure 1

The timeline with schematic representation of discoveries described in this short review. 5mC, 5-methylcytosine.

Along with the discoveries leading to elaboration of the function of covalent modifications of genetic material, the term epigenetics was proposed to describe all mechanisms involved in the development of a phenotype from genotype. In 1979, R Holliday suggested that “the specific methylation of bases such as adenine or cytosine could provide epigenetic switches in gene activity”7 and from that point on the field began to recognise covalent modifications of DNA as a mechanism of epigenetic gene expression regulation. Today, after years of research, several different covalent modifications of genetic material have been discovered.

The most frequent covalent modification of nuclear DNA in mammals is the addition of a methyl group to the 5th atom in the 6-atom ring of cytosine, counting counterclockwise from the NH nitrogen at the 6 o'clock position. In the human genome, methylated cytosines (5-methylcytosine (5-mC)) account for about 4% of all cytosines and are frequently referred to as the 5th base.8 In general terms, genes with methylated cytosines in the promoter are not transcriptionally active. However, a substantial volume of research was needed to elucidate the function of this modification in gene expression regulation and we will briefly review these discoveries below.

The 5th base

Methylated cytosine (5-methylcytosine (5-mC)) was synthesised in 1904 by H L Wheeler and T B Johnson, who also showed that 5-mC has very similar chemical properties to cytosine.9 The presence of 5-mC in nucleic acid was first described by T B Johnson and R D Coghill in 1925,10 who obtained a crystallised form of cytosine and 5-mC from hydrolysed nucleic acid of tubercle bacillus and were able to distinguish picrate salt crystals of cytosine from those of 5-mC under a microscope with polarised light. Despite the fact that the paper describing those findings was widely criticised and other researchers failed to reproduce those results,1 11 this publication was significant as it accelerated the progress in the field. Further evidence for the presence of 5-mC in the genetic material was described in 1948 by R D Hotchkiss, who noticed an additional spot on paper chromatographs representing the 5th base, while studying the components of nucleic acids obtained from calf thymus.1 R D Hotchkiss also named the 5th base ‘epi-cytosine’. Subsequently, G R Wyatt showed that apart from calf thymus, 5-mC is present in DNA of several other species, including beef spleen, ram sperm, herring sperm, locusts (whole) and wheat germ.12 The precise localisation of 5-mC within the context of nucleotides in DNA was described in 1968 by P Grippo et al. This group used a DNase digest of the genetic material isolated from sea urchin embryos to show that the distribution of 5-mC in the DNA of this invertebrate is not random and that 90% of 5-mC occurs within CpG dinucleotides separated from the digest.13

Subsequent analyses of methylation within CpG sites showed that the methylation status of those dinucleotides differs between tissues and cells14 and that CpG sites frequently cluster in genomic regions with a higher than expected concentration of CpG sites, known as CpG islands (CGIs).15 With the advent of genome-wide screening technologies, both CGIs and single CpG sites outside of the CGIs were mapped to specific functional parts of the genomes, and 60% of human genes have been shown to have promoters embedded in CGIs. This makes DNA methylation a potential mechanism of expression regulation for over half of the human genes.16 17
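To make the phrase "higher than expected concentration of CpG sites" concrete, the following is a minimal sketch of one commonly used way to quantify it: the ratio of observed CpG dinucleotides to the number expected from the C and G content of the sequence. The function names and the default thresholds (length of at least 200 bases, GC content above 50%, observed/expected ratio above 0.6) are assumptions for illustration; they are widely quoted criteria but are not taken from this review.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are C or G."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies.

    Expected count = (number of C * number of G) / sequence length.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the assumed thresholds (illustrative defaults, not from the review)."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_obs_exp)


if __name__ == "__main__":
    print(cpg_obs_exp("CCGG"))                 # one CpG among two C and two G
    print(looks_like_cpg_island("CG" * 100))   # CpG-dense 200-base toy sequence
    print(looks_like_cpg_island("AT" * 100))   # no C or G at all
```

In practice such a test would be applied in a sliding window along a genomic sequence; the sketch evaluates a single window only.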

DNA methylation in disease

Studies of the significance of DNA methylation in disease began to develop rapidly following the publication by A P Feinberg and B Vogelstein in 1983. Using methylation-sensitive restriction enzymes, they showed that the global level of 5-mC in DNA from human cancer was significantly lower than in the DNA from the tissue that the cancer originated from.18 Subsequently, S B Baylin et al reported that the 5’ region of the calcitonin gene had a different number of methylated CpG sites in different cell lines.19 Those findings, together with the results of earlier experiments, in which they showed that the same cell lines had different levels of calcitonin gene expression,20 led S B Baylin et al to suggest that methylation of a gene promoter regulates gene expression.19 The above discoveries paved the way for further studies on the role of DNA methylation in regulation of genes involved in oncogenesis.

The retinoblastoma suppressor gene (RB), considered at the time to be a tumour suppressor prototype, was the first gene for which the direct correlation between DNA methylation and gene expression was established. In 1989, V Greger et al, while studying retinoblastoma tumours, found that one of the patients, who was unilaterally affected and did not have a family history of retinoblastoma, harboured methylation of the RB gene promoter. With those observations, the authors speculated that hypermethylation of the gene could be involved in human neoplasia development and that hypomethylation can result in spontaneous regression of tumours.21 Subsequently, in 1991, T Sakai et al confirmed V Greger’s results in a study performed on a larger number of retinoblastoma patients and additionally were able to demonstrate allele-specific methylation of the RB gene. Moreover, as the patient they studied with allele-specific RB gene methylation did not harbour any mutation within the gene, they speculated that methylation alone could lead to cancer development.22 However, the direct evidence that hypermethylation of the RB gene promoter results in the gene inactivation was published in 1993 by N Ohtani-Fujita et al, who in a study performed on a neuroblastoma cell line showed that cells harbouring gene constructs with methylated RB promoter had different expression levels of the gene.23 These were the pivotal discoveries that in principle initiated and accelerated studies of the involvement of methylation-dependent gene expression regulation in disease.

The methodology to study the 5th base

The functional studies of DNA methylation accelerated with the development of methodologies facilitating differentiation of patterns of DNA methylation between various nucleic acid sources. This was initiated by A P Bird and E M Southern, who in 1978 adapted the previously described bacterial restriction enzymes4 to study DNA methylation in eukaryotes. Using DNA from two different tissues of the vertebrate Xenopus laevis, they showed that some enzymes are able to cleave the DNA at non-methylated CpG sites, while DNA with a methyl group attached to cytosine at the same CpG sites was resistant to digestion. Thus, the digestion of DNA with this type of enzyme resulted in methylation-specific DNA fragmentation.24 Another milestone in the development of methodologies to study methylation patterns was the discovery by C Waalwijk and R A Flavell, who also in 1978 showed that the enzyme MspI (an isoschizomer of HpaII) is capable of cleaving both unmethylated and methylated DNA, while HpaII cleaves only the unmethylated sites. This enabled the study of the methylation status of a single CpG dinucleotide.25 Using those enzymes, C Waalwijk and R A Flavell showed that various tissues differ in methylation status at specific CpG sites.26 However, studies of the biological function of DNA methylation based on methylation-sensitive restriction enzyme (MSRE) digestion followed by Southern blot hybridisation required large amounts of DNA, could not detect methylation if a CpG site was methylated in only a few per cent of alleles in the sample and could only assess methylation within the enzyme’s recognition sites. A significant improvement of this approach was the combination of MSRE with PCR, first used in 1990 by J Singer-Sam et al. In this approach, a CpG site is flanked with PCR primers and the presence or absence of the PCR product from the amplification of the DNA digest indicates the methylation status of that site.27 Although this combination increased the sensitivity of methylation detection, the limitation of the technology to assessing only CpG sites within the recognition sequences of methylation-sensitive restriction enzymes remained.
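The logic of the HpaII/MspI comparison and of MSRE combined with PCR can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of real digestion kinetics: the recognition sequence CCGG and the cut position after the first C are not given in this review and are included here as background knowledge, only one strand is modelled, methylation is represented as a set of 0-based positions of methylated cytosines, and all function names are hypothetical.

```python
import re

SITE = "CCGG"  # recognition sequence assumed for both HpaII and MspI


def cut_positions(seq: str, methylated: set, enzyme: str) -> list:
    """Return the indices at which the enzyme cuts the sequence.

    HpaII is blocked when the internal C of the site (the CpG cytosine)
    is methylated; MspI cuts regardless of that methylation.
    """
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError("enzyme must be 'HpaII' or 'MspI'")
    cuts = []
    for match in re.finditer(SITE, seq.upper()):
        internal_c = match.start() + 1
        if enzyme == "HpaII" and internal_c in methylated:
            continue  # methylated site is protected from HpaII
        cuts.append(match.start() + 1)  # cut after the first C (assumed)
    return cuts


def digest(seq: str, methylated: set, enzyme: str) -> list:
    """Return the fragments produced by the digest."""
    seq = seq.upper()
    bounds = [0] + cut_positions(seq, methylated, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def amplicon_survives(seq: str, methylated: set, enzyme: str,
                      start: int, end: int) -> bool:
    """MSRE-PCR readout: a product is expected only if no cut lies in the amplicon."""
    return not any(start < cut < end
                   for cut in cut_positions(seq, methylated, enzyme))


if __name__ == "__main__":
    template = "AAACCGGTTTTCCGGAAA"   # two CCGG sites, internal Cs at 4 and 12
    methylated = {4}                  # only the first site is methylated
    print(digest(template, methylated, "HpaII"))
    print(digest(template, methylated, "MspI"))
    print(amplicon_survives(template, methylated, "HpaII", 0, 10))
```

With HpaII, an amplicon spanning the first site survives only when that site is methylated, which is the presence/absence readout described above; MspI serves as a control that cuts in either case.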

The breakthrough in development of technologies to study DNA methylation was the combination of sodium bisulfite modification of DNA with PCR amplification.28 Sodium bisulfite deaminates unmethylated cytosines to uracil, while methylated cytosines are resistant to that modification.29 Thus, treatment of the DNA with sodium bisulfite preserves the methylation status of the cytosines in a sequence of interest during PCR amplification, as in the PCR product amplified from the bisulfite modified template all non-methylated cytosines are amplified as thymine (or uracil) and methylated ones as cytosines.
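The principle described above can be sketched in a few lines of code. This is a simplified, single-strand illustration with hypothetical function names: methylation is given as a set of 0-based positions of methylated cytosines, conversion is assumed to be complete, and PCR is reduced to copying uracil as thymine.

```python
def bisulfite_convert(seq: str, methylated: set) -> str:
    """Deaminate unmethylated C to U; methylated C is left unchanged."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def pcr_read(converted: str) -> str:
    """After amplification, uracil in the template is read as thymine."""
    return converted.replace("U", "T")


def call_methylation(reference: str, read: str) -> dict:
    """Compare the amplified read with the untreated reference sequence.

    A cytosine retained as C is called methylated (True);
    a cytosine read as T is called unmethylated (False).
    """
    calls = {}
    for i, (ref_base, obs_base) in enumerate(zip(reference.upper(), read)):
        if ref_base == "C":
            if obs_base == "C":
                calls[i] = True
            elif obs_base == "T":
                calls[i] = False
    return calls


if __name__ == "__main__":
    reference = "ACGTCGCA"
    converted = bisulfite_convert(reference, methylated={1})
    read = pcr_read(converted)
    print(converted)                           # ACGTUGUA
    print(read)                                # ACGTTGTA
    print(call_methylation(reference, read))   # {1: True, 4: False, 6: False}
```

The sketch shows why the methylation pattern survives amplification: it has been rewritten as a sequence difference (C versus T) that ordinary PCR and sequencing can read.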

One of the first methods to take advantage of bisulfite conversion, and a landmark one, was methylation-specific PCR, published by J G Herman et al in 1996.30 This method requires PCR primers designed to contain as many CpG sites as possible within the primer. Those primers bind only to the template where CpG sites were methylated before bisulfite modification and amplify only fully methylated template. The disadvantage of this method is that another primer set needs to be designed and a second PCR performed to confirm the non-methylated status of the screened locus. To overcome this limitation, a second primer design approach was proposed in which primers are targeted to the parts of the template devoid of CpG sites.31 32 That allows amplification of the locus of interest regardless of its methylation status and investigation of the methylation status of that locus after PCR with technologies such as sequencing or melting analysis. A review of technologies for methylation studies is out of scope for this publication (more detailed information can be found in reference 33).
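The difference between the two primer design approaches can be illustrated with a toy example. The sketch below makes strong simplifying assumptions: a primer is treated as binding only when it matches the bisulfite-converted top strand exactly (real primers tolerate some mismatches), only a forward primer is considered, and the sequences and function names are invented for illustration.

```python
def converted_template(seq: str, methylated: set) -> str:
    """Sequence read after bisulfite treatment and PCR: unmethylated C becomes T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def primer_binds(primer: str, template: str) -> bool:
    """Toy binding rule (assumption): exact match to the converted strand."""
    return primer in template


# Hypothetical locus: CpG cytosines at positions 2 and 5, one non-CpG C at 11.
LOCUS = "TTCGACGAGGACTAGA"

fully_methylated = converted_template(LOCUS, {2, 5})
partly_methylated = converted_template(LOCUS, {2})
unmethylated = converted_template(LOCUS, set())

M_PRIMER = "TTCGACGA"    # covers both CpG sites, designed for methylated template
U_PRIMER = "TTTGATGA"    # same region, designed for unmethylated template
MIP_PRIMER = "GGATTAGA"  # CpG-free region, methylation-independent

if __name__ == "__main__":
    for name, template in [("methylated", fully_methylated),
                           ("partial", partly_methylated),
                           ("unmethylated", unmethylated)]:
        print(name,
              primer_binds(M_PRIMER, template),
              primer_binds(U_PRIMER, template),
              primer_binds(MIP_PRIMER, template))
```

Under these assumptions the methylation-specific primer matches only the fully methylated template, a separate primer is needed for the unmethylated one, and the CpG-free primer matches all three, leaving methylation status to be resolved after PCR.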

PCR-based technologies are indispensable in the assessment of methylation within specific loci. However, with technological advances, the assessment of methylation at the genome level has become available. Microarray technologies first delivered the possibility to screen for methylation at thousands of loci (for example, all the promoters) in a single experiment at oligonucleotide resolution. Subsequently, with the advent of next generation sequencing (NGS), we are currently able to obtain global maps of methylation within whole genomes at single nucleotide resolution. It is worth mentioning the enormous progress in the development of NGS for the study of methylation patterns. With the use of NGS, within a few years we went from being able to study methylation of single molecules in the pool of the genomes extracted from a tissue or a mix of tissues using ultra-deep bisulfite sequencing,34–36 to studying methylomes of single cells.37–39

Methylation biomarkers in the in-vitro diagnostic setting

The disease-related methylation changes involving single CpG sites as well as a region containing a number of CpG sites (eg, promoter CGIs) can be considered methylation biomarkers. The diagnostic utility of methylation alterations at single CpG sites is still to be elucidated. However, with increasing research data indicating that methylation of a single CpG can interfere with the binding of, for example, the SP1 transcription factor40 or enhancers,41 it is likely that methylation changes at single CpG sites will become clinically useful biomarkers. Currently, however, the vast majority of the methylation biomarkers used in clinical practice are regional methylation changes in gene promoters that have been associated with clinical outcomes. The regional methylation changes are likely to remain the best methylation biomarker candidates, mainly because studies of the distribution of methylation across the genome show that the short distance between consecutive CpG sites is the best determinant of methylation status and the cytosines in the clusters of CpG sites (CpG islands) are characterised by uniform methylation status.42 Consequently, PCR-based methods are currently the most frequently used technologies in methylation biomarker screening in diagnostic laboratories. Those techniques are labour and cost effective and relatively easy to implement in a diagnostic setting, as opposed to the genome-wide methylation screening technologies, which are still technically complex and require scientific knowledge for result interpretation. Regardless of the technological challenges in measuring methylation changes, it is no longer a question whether methylation biomarkers can significantly contribute to all aspects of personalised patient care, including risk assessment, disease detection, clinical disease management and, with the increasing number of diseases becoming chronic, monitoring for relapse.43–45

The discussion of specific in-vitro diagnostic tests currently used or at the advanced stages of clinical validation for diagnostic use is out of scope for this short review, but the references to the studies reporting clinical validation data for the methylation biomarkers currently entering in-vitro diagnostics can be found in our recent review.46

Current state of the application of methylation biomarkers in clinical disease management

Since the first evidence that methylation changes contribute to the disease phenotype, which also indicated that those changes can be used as biomarkers useful in clinical disease management, the number of research publications indicating potential in-vitro diagnostic utility of disease-related methylation changes has been growing exponentially (figure 2). Nevertheless, the use of disease-related methylation changes in in-vitro diagnostics is still lagging. This is mainly due to the fact that the introduction of biomarkers to in-vitro diagnostic testing is strictly regulated and requires systematic and comprehensive evidence of clinical validity for each biomarker test. This evidence can rarely be obtained in a research study in which potential disease-related biomarkers are discovered. Thus, the biggest challenge the field of methylation biomarkers is currently facing is to implement a systematic assessment of the clinical validity of the large number of known disease-related methylation changes. Nevertheless, recent successes of diagnostic tests such as Cologuard (from Exact Sciences Corporation, USA), which is partly built on methylation biomarkers, or Epi proLung (Epigenomics AG, Germany), which is solely built on methylation biomarkers, indicate that in the near future we will witness increasing use of methylation biomarkers in personalised medicine.

Figure 2

Increase in the number of publications per annum in the field of DNA methylation biomarkers (source PubMed).

Take home messages

  • Methylation of cytosine is the most common covalent modification of nucleotides in humans.

  • DNA methylation regulates gene expression in healthy cells.

  • Changes of gene methylation contribute to disease phenotype and can be used as biomarkers.

  • Aberrant methylation is potentially reversible and thus can be targeted by therapy.

Abstract translation

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Handling editor Tahir S Pillay.

  • Contributors OTL and KES: performed the literature searches, drafted the manuscript. TKW: performed the literature searches, drafted the manuscript and coordinated editing of the manuscript.

  • Funding This work was financed by the Polish National Agency for Academic Exchange

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.
