
Gene of the month: the 2019-nCoV/SARS-CoV-2 novel coronavirus spike protein
Tahir S Pillay1,2
1 Department of Chemical Pathology, Faculty of Health Sciences, University of Pretoria & National Health Laboratory Service, Pretoria, South Africa
2 Division of Chemical Pathology, University of Cape Town, Cape Town, South Africa
Correspondence to Professor Tahir S Pillay, Chemical Pathology, University of Pretoria, Pretoria, 0002, South Africa; jclinpatheic{at}


The year 2020 has seen a major and sustained outbreak of infection with a novel betacoronavirus, severe acute respiratory syndrome (SARS)-coronavirus (CoV)-2, which causes fever, severe respiratory illness and pneumonia, a disease called COVID-19. At the time of writing, the death toll was greater than 120 000 worldwide with more than 2 million documented infections. The genome of the CoV encodes a number of structural proteins that facilitate cellular entry and assembly of virions, of which the spike protein S appears to be critical for cellular entry. The spike protein guides the virus to attach to the host cell and contains a receptor-binding domain (RBD), a fusion domain and a transmembrane domain. The RBD of spike protein S binds to angiotensin-converting enzyme 2 (ACE2) to initiate cellular entry. The spike protein of SARS-CoV-2 shows more than 90% amino acid similarity to the pangolin and bat CoVs, and these also use ACE2 as a receptor. Binding of the spike protein to ACE2 exposes the cleavage sites to cellular proteases. Cleavage of the spike protein by transmembrane protease serine 2 and other cellular proteases initiates fusion and endocytosis. The spike protein contains an additional furin cleavage site that may allow it to be ‘preactivated’ and highly infectious after replication. The fundamental role of the spike protein in infectivity suggests that it is an important target for vaccine development, blocking therapy with antibodies and diagnostic antigen-based tests. This review briefly outlines the structure and function of the 2019 novel CoV/SARS-CoV-2 spike protein S.

  • antiviral agents
  • infections
  • laboratory infection
  • viruses
  • virology

This article is made freely available for use in accordance with BMJ’s website terms and conditions for the duration of the covid-19 pandemic or until otherwise determined by BMJ. You may use, download and print the article for any lawful, non-commercial purpose (including text and data mining) provided that all copyright notices and trade marks are retained.

The pathogen severe acute respiratory syndrome (SARS)-coronavirus (CoV)-2 is a newly discovered member of the genus Betacoronavirus. It is related to SARS-CoV but is readily transmitted between humans, which led WHO to declare first a Public Health Emergency of International Concern and later a pandemic. In the new millennium, several CoVs have crossed species to cause severe and often fatal pneumonia: SARS-CoV, Middle East respiratory syndrome (MERS)-CoV and the 2019 novel CoV/SARS-CoV-2. SARS-CoV-2 originated in Wuhan in the Hubei province of China and was first identified in December 2019. SARS-CoV-2 is the seventh CoV that has been shown to infect humans.1

CoVs are lipid enveloped and spherical, with a diameter of approximately 100–120 nm (figure 1A). The nucleocapsid houses the single-stranded, non-segmented, positive-sense RNA genome.2 The name coronavirus is derived from the appearance of the virus: large petal-shaped spikes, 20–40 nm long, protrude from the envelope, and these are the spike glycoprotein. In some betacoronaviruses, there is a second, shorter projection, the haemagglutinin-esterase (HE) protein3 (figure 1A). The most abundant protein, the M protein, provides structural support. The E protein is a small membrane protein that is essential for the assembly and release of virions.

Figure 1

Structure of the (A) novel coronavirus severe acute respiratory syndrome-CoV-2 and the (B) spike protein.

The genome sequence of SARS-CoV-2 was elucidated in January 2020.4–7 The SARS-CoV-2 genome is 96% identical to that of the bat coronavirus (BatCoV) RaTG13 and 80% identical to the SARS-CoV genome.7 The positive-sense, single-stranded RNA genome contains 29.3 kilobases4 6 (figure 2). A distinctive feature of the CoV family is the synthesis of a nested set of subgenomic mRNAs that are transcribed from negative-strand RNAs. The subgenomic RNAs all contain a leader sequence from the 5′ end of the genome. Transcription is discontinuous: the viral RNA-dependent RNA polymerase (RdRp) skips from one part of the genome to another, joining the 5′ leader to downstream sequences to generate the RNAs.8 9 The most commonly used nucleic acid detection tests for SARS-CoV-2 are based on real-time reverse transcription PCR (RT-PCR) detection of the RdRp, E (envelope) and N (nucleocapsid) genes (figure 2).10 11
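
The nested, leader-containing structure of the subgenomic RNAs can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the ‘genome’, the leader length and the junction positions are made up, the negative-strand intermediate is skipped, and real CoVs select the junctions through transcription-regulating sequences rather than from a supplied list of positions. The function name is hypothetical.

```python
from typing import List


def subgenomic_rnas(genome: str, leader_len: int, body_starts: List[int]) -> List[str]:
    """Join the 5' leader to each downstream body segment (toy model).

    Each product runs from its body start to the 3' end of the genome, so all
    products share the same 5' leader and the same 3' end: a nested set.
    """
    leader = genome[:leader_len]
    # The smallest body start gives the longest product; list longest first.
    return [leader + genome[start:] for start in sorted(body_starts)]


# Hypothetical toy genome: a 4-letter leader "LLLL", then three "genes" A, B and C.
toy_genome = "LLLL" + "AAAA" + "BBBB" + "CCCC"
rnas = subgenomic_rnas(toy_genome, leader_len=4, body_starts=[12, 8])
for rna in rnas:
    print(rna)
# LLLLBBBBCCCC
# LLLLCCCC
```

Every product begins with the same leader and ends at the same 3′ terminus, which is what makes the set ‘nested’.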

Figure 2

Schematic diagram of the genomic structure of the 29.3 kilobase 2019 novel coronavirus (2019-nCoV) and the gene and domain structure of the 1273 amino acid spike glycoprotein S (not to scale). E, envelope protein gene; M, membrane protein gene; N, nucleocapsid protein gene; RBM, receptor-binding motif; RdRP, RNA-dependent RNA polymerase; S, spike protein gene.

In common with other viruses, the genome of the virus encodes proteins that facilitate cellular entry (figure 3). In the case of SARS-CoV-2 (and other CoVs), this is the spike protein (S), a 1273 amino acid homotrimeric class I fusion protein that allows the viral membrane to fuse with the host cell membrane.12 (A class I viral fusion protein is a viral protein that exists as a trimer and is primed by cleavage of a single-chain precursor. The classical example of a class I fusion protein is influenza virus HA; other examples include the fusion proteins of Ebola virus and paramyxoviruses.) The third open reading frame in the genome encodes the spike protein5 (figure 2). Each monomer of the spike protein trimer is approximately 180 kDa. The S protein is the largest of the four structural proteins (S, M, E and N). The spike protein of SARS-CoV-2 has 76% amino acid sequence identity with that of SARS-CoV, suggesting that it interacts with similar protein targets. The spike protein of SARS-CoV-2 shows 93% and 97% amino acid identity with those of BatCoV RaTG13 and Pangolin-CoV, respectively, strongly hinting at the origin of SARS-CoV-2 and its possible intermediate hosts.4 7 13 14 Owing to the strong conservation of residues and the shared receptor, immunity to SARS-CoV may confer some very limited immunity to SARS-CoV-2: SARS-CoV antibodies, in either convalescent sera or sera from immunised rabbits, are able to prevent the entry of SARS-CoV-2 into cells.15 16 However, in spite of the sequence identity, the electrostatic surface potential, antigenicity and epitopes of the spike glycoproteins from different CoVs are distinct, even though they use the same ACE2 receptor protein.17
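
Figures such as ‘76% amino acid sequence identity’ come from comparing aligned sequences position by position. A minimal sketch, assuming the sequences are already aligned (with ‘-’ marking gaps) and that identity is counted only over columns where neither sequence has a gap; this is one of several conventions, and they can give different percentages for the same alignment. The peptides are made up and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Identity is counted over alignment columns where neither sequence has a
    gap (one convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Made-up peptides, not real spike sequences.
print(percent_identity("ACDEFGHIKL", "ACDEYGHIRL"))  # 80.0
print(percent_identity("AC-DE", "ACGDF"))            # 75.0
```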

Figure 3

Cellular entry of severe acute respiratory syndrome coronavirus 2 requires ACE2 and a cellular protease such as TMPRSS2.

The spike protein can be divided into a number of regions, functional domains or subunits. The globular head (or ‘petal’) contains the longer N-terminal S1 subunit, which carries the RBD. The stem contains the C-terminal membrane fusion subunit (S2), which comprises two heptad repeat regions (HR1 and HR2), the transmembrane (TM) domain and the cytosolic tail2 (figure 2). The RBD of S1 is approximately 200 amino acids long. Binding of S1 to its receptor exposes regions within the S2 subunit.18

There are significant post-translational modifications in the S protein. The spike protein ectodomain is heavily glycosylated with heterogeneous N-linked glycans19 and exists in a prefusion and a postfusion conformation. The switch between these conformations is triggered by binding to its cellular receptor, ACE2 (figure 3). The oligosaccharides could influence priming by host proteases and determine antibody recognition. Cysteine residues in the cytosolic tail undergo palmitoylation.19

Proteolytic conversion and activation of the spike protein prior to fusion

Entry of the virus depends on proteolytic activation of the spike protein. During attachment of the virus and entry into cells, the spike protein is cleaved at its protease cleavage sites into the S1 and S2 subunits, and the S2 subunit is released.17 18 20 The process has several steps: binding of the virus to the cell surface; a change in the conformation of the spike protein; proteolysis of the spike protein; and release of the S2 subunit, which then mediates fusion of the virion and endocytosis.21 22
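
The ordered steps of entry can be sketched as a simple pipeline in which blocking any one step halts entry at that point, which is the logic behind targeting receptor binding or proteolysis therapeutically. This is a minimal illustration under stated assumptions: the step labels and the helper `last_completed_step` are hypothetical names chosen here, and a strictly linear order is a simplification of the biology.

```python
from enum import IntEnum
from typing import Optional, Set


class EntryStep(IntEnum):
    """Entry steps in the order listed in the text (hypothetical labels)."""
    RECEPTOR_BINDING = 1       # S1 RBD binds the cell-surface receptor (ACE2)
    CONFORMATIONAL_CHANGE = 2  # the spike protein changes conformation
    PROTEOLYSIS = 3            # host proteases cleave the spike protein
    S2_RELEASE = 4             # the S2 subunit is released
    FUSION = 5                 # S2 mediates fusion of the virion and endocytosis


def last_completed_step(blocked: Set[EntryStep]) -> Optional[EntryStep]:
    """Walk the steps in order and stop at the first blocked one.

    Returns the last step that could be completed, or None if the very first
    step is blocked.
    """
    completed: Optional[EntryStep] = None
    for step in EntryStep:  # iterates in definition order
        if step in blocked:
            break
        completed = step
    return completed


# Example: a protease inhibitor blocks proteolysis, so entry stalls after the
# conformational change.
print(last_completed_step({EntryStep.PROTEOLYSIS}).name)  # CONFORMATIONAL_CHANGE
```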

The first step in cellular entry is binding of the spike protein to the cell surface receptor, ACE2, mediated by the RBD of S1.16–18 20 23 24 Because the S protein is critical for the interaction with the cell membrane, it could be a target for therapy using antibodies or chemical compounds, as well as a vaccine target. Attachment of the S1 RBD to ACE2 exposes a cleavage site on S2 that is acted on by host cell proteases such as TMPRSS2 to initiate cellular entry.12 16

The RBD binds to the carboxypeptidase domain of ACE2. The RBD moves like a hinge between two conformations (‘up’ or ‘down’) that expose or hide the residues that bind the ACE2 receptor.25 Within the RBD, a receptor-binding motif (RBM) makes the primary contact with the peptidase domain of ACE2.26 27 The RBM is about 50% conserved between the spike proteins of SARS-CoV and SARS-CoV-2, and these residues are absent from SARS-related CoVs that do not use ACE2.
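
Because the spike protein is a homotrimer and each RBD is either ‘up’ or ‘down’, a small enumeration shows how many of the possible RBD arrangements expose at least one ACE2-binding site. This is a bookkeeping sketch only, assuming that only ‘up’ RBDs can engage ACE2; it says nothing about how often each arrangement occurs, and the function name is hypothetical.

```python
from itertools import product
from typing import Sequence


def ace2_accessible_rbds(rbd_states: Sequence[str]) -> int:
    """Count protomers whose RBD is 'up' (ACE2-binding residues exposed)."""
    return sum(1 for state in rbd_states if state == "up")


# Every combination of 'up'/'down' across the three protomers of the trimer.
trimer_states = list(product(("up", "down"), repeat=3))
binding_competent = [s for s in trimer_states if ace2_accessible_rbds(s) > 0]

print(len(trimer_states), len(binding_competent))  # 8 7
```

Only the all-‘down’ arrangement hides every binding site, so under this assumption 7 of the 8 arrangements can engage at least one ACE2 molecule.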

Cleavage of the spike protein takes place in two steps, a ‘priming’ cleavage and then an ‘activation’ cleavage,21 22 at a cleavage site between the S1 and S2 domains and at the S2′ site. The spike proteins of most CoVs are cleaved at the junction between S1 and S2, and the S1/S2 cleavage site contains multibasic arginine residues. Although SARS-CoV also uses the ACE2 receptor, SARS-CoV-2 has a distinct furin cleavage site (Arg-Arg-Ala-Arg) at residues 682–685, between the S1 and S2 domains (figure 2), which is not found in SARS-CoV and may explain some of the biological differences between the two viruses. Removal of this motif affects cellular entry. Because furin proteases are widely expressed, especially in the respiratory tract, the furin cleavage site expands the versatility of SARS-CoV-2 for cleavage by cellular proteases and potentially its tropism and transmissibility.12 This means that newly synthesised virions may be secreted in a ‘preactivated’ state, ready to fuse with and infect other cells without the need to bind to a cellular receptor such as ACE2.
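
A multibasic site such as the furin site can be located in a protein sequence with a simple pattern search. This sketch assumes the minimal furin recognition pattern R-X-X-R (Arg, any two residues, Arg), which simplifies real furin specificity; the helper name is hypothetical, and the test peptide is a placeholder (glycines around an RRAR motif) numbered so that the motif falls at residues 682–685 as stated in the text.

```python
import re
from typing import List, Tuple

# Minimal furin recognition pattern R-X-X-R (an assumed simplification).
# The lookahead lets overlapping motifs be reported.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_motifs(protein: str, first_residue: int = 1) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for each R-X-X-R match.

    Positions are 1-based and inclusive, offset by `first_residue` so that a
    fragment can be numbered as it would be in the full-length protein.
    """
    hits = []
    for match in FURIN_MINIMAL.finditer(protein):
        start = match.start(1) + first_residue
        hits.append((start, start + 3, match.group(1)))
    return hits


# Placeholder peptide: glycines flank an RRAR motif; numbering starts at 677
# so that the motif lands on residues 682-685.
print(find_furin_motifs("GGGGGRRARGGG", first_residue=677))  # [(682, 685, 'RRAR')]
```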

In this two-step sequential process, the activation cleavage at the S2′ site is thought to activate the protein for membrane fusion, which then allows fusion of the viral membrane with the cellular membrane. Inhibition of transmembrane protease serine 2 (TMPRSS2) with camostat mesylate partially blocks the entry of SARS-CoV-2.16 TMPRSS2 is also critical for infection by other viruses such as influenza A.16

Apart from TMPRSS2, the SARS-CoV-2 spike protein can be proteolytically activated by a variety of other cellular proteases, including cathepsins B and L (endosomal cysteine proteases). Combined inhibition of these proteases prevents viral entry. Other proteases such as furin, elastase, factor X and trypsin are also capable of the ‘priming’ proteolysis that initiates cellular entry.22 28
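
The protease redundancy described here, where camostat mesylate only partially blocks entry because other proteases remain available, can be expressed as a small set calculation. This is a hypothetical sketch: grouping TMPRSS2 as a ‘cell surface’ route and cathepsins B and L as an ‘endosomal’ route is a simplification, and `ENTRY_ROUTES` and `open_routes` are names introduced here for illustration.

```python
from typing import Dict, Set

# Hypothetical grouping of the proteases named in the text into two routes.
ENTRY_ROUTES: Dict[str, Set[str]] = {
    "cell surface": {"TMPRSS2"},
    "endosomal": {"cathepsin B", "cathepsin L"},
}


def open_routes(expressed: Set[str], inhibited: Set[str]) -> Set[str]:
    """Routes that still have at least one expressed, uninhibited protease."""
    available = expressed - inhibited
    return {route for route, proteases in ENTRY_ROUTES.items() if proteases & available}


cell = {"TMPRSS2", "cathepsin B", "cathepsin L"}
print(sorted(open_routes(cell, inhibited={"TMPRSS2"})))  # ['endosomal']
print(sorted(open_routes(cell, inhibited=cell)))         # []
```

In this toy model, inhibiting TMPRSS2 alone leaves the endosomal route open, mirroring the partial block by camostat mesylate, while inhibiting every listed protease closes both routes.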

There is some evidence that the SARS-CoV-2 S protein can trigger protease-independent, receptor-dependent cellular fusion to enhance viral spreading. PIKfyve, the kinase that generates phosphatidylinositol 3,5-bisphosphate (PI(3,5)P2), regulates early-to-late endosomal maturation, and its inhibition may block viral entry.29

ACE2 as a receptor

The role of ACE2 as a receptor for CoVs was first identified during the SARS-CoV outbreak using coimmunoprecipitation studies.30 31 A similar role for ACE2 in SARS-CoV-2 infection was identified in 2020.7 The S1 RBD (amino acid residues 333–527) of SARS-CoV-2 binds to ACE2 with nanomolar affinity, implicating it as an important component of the interaction between the virus and the cell.12 The RBD of SARS-CoV-2 binds to ACE2 with more than 10-fold higher affinity than that of SARS-CoV.17 25 Within the RBD, the RBM lies at residues 438–506. Other cellular receptors used by CoVs include mouse carcinoembryonic antigen-related cell adhesion molecule 1a (mouse hepatitis virus) and human dipeptidyl peptidase 4 (MERS-CoV).1
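
To put ‘more than 10-fold higher affinity’ in thermodynamic terms, a fold change in the dissociation constant K_d can be converted into a difference in binding free energy using ΔG° = RT ln(K_d / 1 M). The sketch below assumes 25 °C (298.15 K) and a 1 M standard state; the function name is hypothetical, and no measured K_d values are used.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_fold_change(fold: float, temperature_k: float = 298.15) -> float:
    """Magnitude of the binding free-energy difference (kcal/mol).

    If one binder has a K_d that is `fold` times lower than another, then
    |ddG| = RT ln(fold), from dG = RT ln(K_d / 1 M).
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


print(round(ddg_from_fold_change(10), 2))  # 1.36
```

Under these assumptions, a 10-fold tighter K_d corresponds to roughly 1.4 kcal/mol more favourable binding, so ‘more than 10-fold’ implies at least this difference.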

ACE2 is a carboxypeptidase that removes a single C-terminal amino acid. It primarily exists as an 805 amino acid type 1 TM protein (a single-pass TM protein with an extracellular N-terminus) with a zinc-binding domain, and is predominantly expressed in lung, heart, kidney, testis and gastrointestinal tract.32 33 In the lung, ACE2 is found in alveolar epithelial type II cells (and in bronchial epithelial cells) and is a receptor for SARS-CoV and SARS-CoV-2.34 MERS-CoV, by contrast, uses dipeptidyl peptidase 4 (DPP4/CD26) as its receptor.1 There is also a second, circulating soluble form of ACE2. ACE2 appears to have physiological functions distinct from those of ACE, and ACE inhibitors used clinically as antihypertensive agents do not affect ACE2 activity. The protease activity of ACE2 has no role in facilitating viral entry; ACE2 merely appears to act as a receptor that guides attachment and fusion of the spike protein. ACE2 contains two functional regions: the N-terminal M2 peptidase domain and the C-terminal collectrin domain, which regulates the trafficking of amino acid transporters to the cell membrane. The physiological function of ACE2 is to convert angiotensin I to angiotensin (1-9) and angiotensin II to angiotensin (1-7).33 35 Angiotensin (1-7) binds to the Mas receptor, leading to vasodilatation and anti-inflammatory effects. This is in contrast to (and antagonises) the action of ACE, which converts angiotensin I to angiotensin II. Angiotensin II causes vasoconstriction and proinflammatory effects via the angiotensin II receptor type 1.
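
The angiotensin conversions listed here follow from where each enzyme cuts. ACE2 removes one C-terminal residue (angiotensin I, residues 1-10, becomes angiotensin (1-9); angiotensin II, residues 1-8, becomes angiotensin (1-7)), whereas ACE removes a C-terminal dipeptide (angiotensin I becomes angiotensin II). A minimal sketch using the angiotensin I sequence in one-letter code (Asp-Arg-Val-Tyr-Ile-His-Pro-Phe-His-Leu); the function names are hypothetical, and the model ignores enzyme kinetics and substrate preference.

```python
ANGIOTENSIN_I = "DRVYIHPFHL"  # Ang(1-10), one-letter amino acid code


def remove_c_terminal(peptide: str, n: int) -> str:
    """Trim n residues from the C-terminus (the right-hand end)."""
    if len(peptide) <= n:
        raise ValueError("peptide too short to cleave")
    return peptide[:-n]


def ace2_cleave(peptide: str) -> str:
    """ACE2: removes a single C-terminal residue."""
    return remove_c_terminal(peptide, 1)


def ace_cleave(peptide: str) -> str:
    """ACE: removes a C-terminal dipeptide."""
    return remove_c_terminal(peptide, 2)


angiotensin_ii = ace_cleave(ANGIOTENSIN_I)     # DRVYIHPF, Ang(1-8)
angiotensin_1_9 = ace2_cleave(ANGIOTENSIN_I)   # DRVYIHPFH, Ang(1-9)
angiotensin_1_7 = ace2_cleave(angiotensin_ii)  # DRVYIHP, Ang(1-7)
print(angiotensin_ii, angiotensin_1_9, angiotensin_1_7)
```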

The critical residues of the spike protein RBM that interact with ACE2 have been identified from cocrystal structures.18 26

While both SARS-CoV and SARS-CoV-2 bind to ACE2 and their binding sites overlap, there are structural differences between the two spike proteins that influence binding characteristics. For example, the SARS-CoV-2 RBM has a four-residue motif (Gly-Val-Glu-Gly) at residues 482–485 that allows better contact with ACE2. ACE2 contains two virus-binding hotspots centred on lysine residues that appear to be critical for CoV binding.18 These create positive charges that need to be neutralised by the CoV. Two key residues in the RBM of SARS-CoV-2, Gln493 and Leu455, bind to these hotspots,18 leading to considerable stabilisation of binding and a higher affinity for ACE2 than that of SARS-CoV.
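
The residue positions quoted here can be checked against the spike protein regions given in this review with a small lookup. A minimal sketch, assuming the ranges stated in the text (RBD 333–527, RBM 438–506, Gly-Val-Glu-Gly motif 482–485, furin site 682–685, full length 1273 residues); boundaries vary slightly between publications, and `SPIKE_REGIONS` and `regions_at` are hypothetical names.

```python
from typing import Dict, List, Tuple

SPIKE_LENGTH = 1273  # amino acids

# 1-based, inclusive ranges as stated in this review (approximate).
SPIKE_REGIONS: Dict[str, Tuple[int, int]] = {
    "RBD": (333, 527),
    "RBM": (438, 506),
    "Gly-Val-Glu-Gly motif": (482, 485),
    "furin site (RRAR)": (682, 685),
}


def regions_at(residue: int) -> List[str]:
    """Names of all annotated regions that contain `residue`."""
    if not 1 <= residue <= SPIKE_LENGTH:
        raise ValueError(f"residue must be between 1 and {SPIKE_LENGTH}")
    return [name for name, (start, end) in SPIKE_REGIONS.items() if start <= residue <= end]


for label, position in [("Leu455", 455), ("Gly482", 482), ("Gln493", 493)]:
    print(label, regions_at(position))
```

Both hotspot-binding residues, Gln493 and Leu455, fall inside the RBM and therefore inside the RBD.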

Using the structural information, variations in the ACE2 gene have been studied to identify susceptibility determinants, and it is postulated that two ACE2 alleles, rs73635825 (S19P) and rs143936283 (E329G), may confer resistance to SARS-CoV-2 infection.36
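
The labels S19P and E329G are one-letter shorthand for amino acid substitutions (reference residue, position, alternative residue). As a sketch, such labels can be expanded into the three-letter protein notation of the HGVS nomenclature; the function name is hypothetical, and the sketch handles only simple missense substitutions.

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Reference residue, position, alternative residue, e.g. "S19P".
MISSENSE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def to_hgvs_protein(short: str) -> str:
    """Expand one-letter shorthand (e.g. 'S19P') to 'p.Ser19Pro'.

    HGVS writes predicted (not experimentally confirmed) protein changes in
    parentheses, e.g. 'p.(Ser19Pro)'; this sketch omits them.
    """
    match = MISSENSE.match(short)
    if match is None or match.group(1) not in THREE_LETTER or match.group(3) not in THREE_LETTER:
        raise ValueError(f"not a simple missense substitution: {short!r}")
    ref, position, alt = match.groups()
    return f"p.{THREE_LETTER[ref]}{position}{THREE_LETTER[alt]}"


# The two ACE2 variants named in the text.
for rsid, change in [("rs73635825", "S19P"), ("rs143936283", "E329G")]:
    print(rsid, to_hgvs_protein(change))
```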

Other binding determinants, including glycoproteins, may also play a role in the interaction of the spike glycoprotein with cells: the differential susceptibility of cells to SARS-CoV-2 relative to SARS-CoV implies that interaction with ACE2 may not be sufficient for strong attachment to respiratory cells.37 A ganglioside-binding domain has been identified in the N-terminal domain of the SARS-CoV-2 spike protein, and it has been proposed that drugs such as chloroquine and hydroxychloroquine may interfere with this interaction.37 Mouse studies have shown that ACE inhibitors and angiotensin receptor blockers increase the expression of ACE2. In humans, antihypertensives such as olmesartan (an angiotensin receptor blocker) increase urinary soluble ACE2, and diabetes mellitus results in increased circulating ACE2. It is therefore speculated that increased expression of ACE2 could potentiate infection with SARS-CoV-2. Conversely, increased soluble ACE2 may hinder binding of SARS-CoV-2 to cells and attenuate injury to cellular tissues. In engineered human tissues, administration of soluble recombinant ACE2 blocks the growth of SARS-CoV-2.38

In conclusion, the spike protein of SARS-CoV-2 plays a critical and fundamental role in viral infection and, in comparison with that of SARS-CoV, its structure has evolved to exploit respiratory cell receptors and proteases to enable rapid spread.

Take home messages

  • The spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is critical for viral entry into cells and is a target for vaccine development, blocking therapy and antigen testing.

  • The receptor for SARS-CoV-2 in the type II alveolar respiratory cell is ACE2. Binding is mediated by the receptor-binding domain and motif (RBM) of the S1 region of the spike protein.

  • Viral attachment, fusion and entry begin with the interaction of the spike protein with ACE2 and proteolytic cleavage by cellular proteases such as transmembrane protease serine 2.

  • SARS-CoV-2 binds to ACE2 with higher affinity than SARS-CoV owing to the presence of a 4-residue motif in the RBM that makes better contact with ACE2 than the spike protein of SARS-CoV.




  • Handling editor Runjan Chetty.

  • Contributors TSP is the sole author who conceived and wrote the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; internally peer reviewed.
