Aims Brazil is nowadays one of the epicentres of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic and new therapies are needed to face it. In the context of specific immune response against the virus, a correlation between Major Histocompatibility Complex Class I (MHC-I) and the severity of the disease in patients with COVID-19 has been suggested. Aiming at better understanding the biology of the infection and the immune response against the virus in the Brazilian population, we analysed SARS-CoV-2 protein S peptides in order to identify epitopes able to elicit an immune response mediated by the most frequent MHC-I alleles using in silico methods.
Methods Our analyses consisted in searching for the most frequent Human Leukocyte Antigen (HLA)-A, HLA-B and HLA-C alleles in the Brazilian population, excluding the genetic isolates; then, we performed: molecular modelling for unsolved structures, MHC-I binding affinity and antigenicity prediction, peptide docking and molecular dynamics of the best fitted MHC-I/protein S complexes.
Results We identified 24 immunogenic epitopes in the SARS-CoV-2 protein S that could interact with 17 different MHC-I alleles (namely, HLA-A*01:01; HLA-A*02:01; HLA-A*11:01; HLA-A*24:02; HLA-A*68:01; HLA-A*23:01; HLA-A*26:01; HLA-A*30:02; HLA-A*31:01; HLA-B*07:02; HLA-B*51:01; HLA-B*35:01; HLA-B*44:02; HLA-B*35:03; HLA-C*05:01; HLA-C*07:01 and HLA-C*15:02) in the Brazilian population.
Conclusions Being aware of the intrinsic limitations of in silico analysis (mainly the differences between the real and the Protein Data Bank (PDB) structure; and accuracy of the methods for simulate proteasome cleavage), we identified 24 epitopes able to interact with 17 MHC-I more frequent alleles in the Brazilian population that could be useful for the development of strategic methods for vaccines against SARS-CoV-2.
- HLA antigens
This article is made freely available for use in accordance with BMJ’s website terms and conditions for the duration of the covid-19 pandemic or until otherwise determined by BMJ. You may use, download and print the article for any lawful, non-commercial purpose (including text and data mining) provided that all copyright notices and trade marks are retained.https://bmj.com/coronavirus/usage
Statistics from Altmetric.com
In December 2019, a novel coronavirus strain was detected and isolated in the city of Wuhan, Hubei province, China.1 This emerging viral infection was associated with severe human respiratory disease with a fatality rate of ~2%–3%.2 Despite the strict containment measures, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread rapidly worldwide, with cases now confirmed in multiple countries and propagation still ongoing. On 30 January 2020, the WHO declared COVID-19 a public health emergency of international concern.3
Brazil is actually one of the epicentres of the pandemic.4 Day by day, the epidemic advances with a high daily rate of cases per million people. Nonetheless, these data may be underestimated because of the reduced number of SARS-CoV-2 molecular detection tests being made.5
The SARS-CoV-2 is an enveloped positive-strand RNA virus belonging to the Betacoronavirus genus of the Coronaviridae family of the Nidovirales order.6 SARS-CoV-2 genome has ~30 kilobases that encode structural and non-structural proteins.7 The 5′ encodes two large genes (ORF1a and ORF1b), which are translated into two polyproteins (pp1a and pp1ab), which are cleaved in a set of non-structural proteins essential for virus replication.8 The 3′-terminal region of the genome encodes four structural proteins, namely spike (S), nucleocapsid (N) envelope (E) and membrane (M).9
The protein S on the virion surface mediates receptor binding and membrane fusion.6 For SARS-CoV-2 to infect a host cell, it is mandatory that protein S is cleaved (S1/S2 cleavage) by host cell proteases into two units: the N-terminal ectodomain (S1) subunit and the C-terminal membrane-anchored (S2) subunit.10 During the receptor binding process, S1 binds via its receptor binding domain to the extracellular peptidase domain of the host receptor ACE-2, which is mainly expressed in the lung, gastrointestinal tract, kidney and heart tissues, although it is also present in other tissues, dictating viral tropism.10 11 After that, S2 subunit mediates membrane fusion. Although cell entry is not yet fully understood, it is likely that a second proteolytic site (S2′) at subunit S2 is required for viral entry.12 Viral entry triggers the host’s immune system and initiates an inflammatory cascade that starts with the mechanism of antigen presentation.6
During the process of presentation of antigenic peptides to CD8+ (cytotoxic) T cells by Class I Major Histocompatibility Complex (MHC-I) molecules, peptides are generated by proteasomal cleavage in the cytosol and transported to the lumen of the endoplasmic reticulum by a transporter associated with antigen processing protein before they can bind MHC groove and trigger an immune response.13 The correlation between MHC-I and the severity of the disease in patients with COVID-19 has been previously hypothesised.14
The understanding of which SARS-CoV-2 epitopes are immunogenic may help advances in the development of diagnostic kits and prophylactic vaccines. An immunoinformatic approach is suitable for an initial screening, since it may predict algorithms and test for the immunogenicity of a vast quantity of peptides and alleles in a cost-effective manner, reducing the costs and time on workbench. Here, we analysed SARS-CoV-2 protein S peptides aiming to prospect epitopes and their ability to elicit an immune response mediated by the most frequent MHC-I alleles in the Brazilian population.
Selection of the MHC-I alleles
We searched for the most frequent MHC-I alleles in the Brazilian population through the Human Leukocyte Antigen (HLA) Allele Frequency Database.15 We selected allele frequency data from overall Brazilian populations, excluding genetic isolates, such as indigenous tribes or Quilombos. After that, we calculated the average allelic frequency for each HLA-A, HLA-B and HLA-C allele with data from more than one population data. Finally, we chose the 10 most frequent class I alleles.
The Protein Data Bank (PDB) structure of some alleles was not available (table 1). Therefore, we used SWISS-MODEL16 to perform modelling by homology. The best model was chosen using the default parameters. Model refinement was made using 3Drefine software.17 The quality of the refinement was evaluated by the MolProbity score. For model validation, ERRAT18 and RAMPAGE19 softwares were used with their default parameters.
MHC binding affinity and antigenicity prediction
For immunogenicity inference, we used the NetMHCPan V.4.020 server, by importing the FASTA sequence of the SARS-CoV-2 spike protein retrieved from PDB (PDB id: 6VSB) that will be subsequently cleaved in peptides with a length of 10-mer. The interrogation of each 10-mer peptide was made querying the selected alleles ranked in the previous section. We accepted as possible candidates, those allele:peptide complexes with strong binding affinity, that is, those with %Rank score <0.50.
After that, we tested the strong binding peptides for probable antigenicity using Vaxijen V.2.0,21 using a threshold of score <0.5 to filter out possible non-antigenic peptides.
Peptide docking was carried out using the CABS-dock server.22 We submitted to a docking simulation, those peptides that showed strong binding affinity and probable antigenicity to their specific MHC-I allele using the default parameters. After the docking, we excluded the complexes that showed root mean squared deviation (RMSD) values above 3 Å to follow the molecular dynamics (MD).23
The complexes from the docking simulations (RMSD values below 3 Å) were further investigated by MD simulations using GROMACS 2020.1.24 The complex was inserted in a 0.7 nm cubic box, filling the voided space with SPC-E water molecules. Calcium and sodium ions were added for reaching the system electronic equilibrium. Energy minimisation was carried out using default parameters. The equilibrium phase was made during 10 ns (for temperature and pressure), while the production phase lasted 100 ns. These steps were also performed using default parameters.
In order to investigate the stability of the complexes, we calculated RMSD, root mean squared fluctuation, radius of gyration (Rg), solvent accessible surface area and the determination of hydrogen bonds.
Results and discussion
Table 1 shows the most frequent HLA-A, HLA-B and HLA-C alleles, according to the HLA Allele Frequency Database. The top 10 HLA-A alleles correspond to 68.7% of the A* alleles, whereas the top 10 HLA-B alleles represent 53% and the HLA-C alleles 68.8%. These values may not be sufficient to depict the MCH-I allelic variation, but it can give an estimation of the major diversity of the class I genes.
We modelled the MHC-I alleles, whose PDB structures were not available. The templates used for model prediction, refinement and validations are reported in online supplementary table 1. Overall, the models were considered suitable for docking analysis since they presented significant amino acid sequence similarity to the templates and were above RAMPAGE and ERRAT thresholds (98% and 80%, respectively).
Two hundred twenty-eight peptides were considered with strong binding affinity, according to NetMHCpan V.4.0 results (online supplementary table 2); of these, 73 were also considered probably antigenic based on the Vaxijen V.2.0 results. We carried out peptide docking simulations for all of them except for two (pep88 and pep111), which could not be docked by CABS-dock server, possibly due to some issues with the MHC-I structure. As result, 61 peptides had good docking predictions, as they showed average RMSD <3 Å (online supplementary table 3).
These complexes comprised 24 different MHC-I alleles: HLA-A*01:01 (four epitopes), HLA-A*02:01 (2), HLA-A*03:01 (3), HLA-A*11:01 (5), HLA-A*23:01 (4), HLA-A*24:02 (3), HLA-A*26:01 (3), HLA-A*30:02, HLA-A*31:01 (3), HLA-A*68:01 (5), HLA-B*07:02 (3), HLA-B*18:01 (1), HLA-B*35:01 (4), HLA-B*35:03 (2), HLA-B*38:01 (2), HLA-B*44:02 (2), HLA-B*44:03 (1), HLA-B*51:01 (1), HLA-C*05:01 (2), HLA-C*07:01 (2), HLA-C*07:02 (1), HLA-C*08:02 (1), HLA-C*15:02 (1) and HLA-C*17:01 (1). To evaluate the stability of these complexes, we completed MD simulations to make trajectory analyses.
Twenty-four MHC-I and protein S peptides (pep) complexes presented good results from MD simulations; these are summarised in table 2. Trajectory analysis of HLA-A alleles showed that the most compact and rigid complexes were: HLA-A*01:01 with the peptides pep2 and pep3 (online supplementary figure 1); HLA-A*02:01 with pep6 and pep7 (online supplementary figure 2); HLA-A*11:01 combined with pep15 and pep18 (online supplementary figure 4); HLA-A*24:02 with pep24 (online supplementary figure 6); HLA-A*68:01 with pep65 (online supplementary figure 10); HLA-A*23:01 in complex with pep83 (online supplementary figure 5); HLA-A*26:01 with pep85 and pep87 (online supplementary figure 7); HLA-A* 30:02 with pep92 (online supplementary figure 8); and HLA-A*31:01 in complex with pep96 (online supplementary figure 9). It is worth noting that even among the most stable complexes, the solvent accessibility varied.
This variation was due to the peptides binding in different positions to the HLA-A alleles. The other peptides (less compact and more flexible) showed more fluctuations in the loop conformations of the complex, including in the peptide structure by itself. In fact, a comparison among the structures confirmed that the major difference among the complex is in the peptides and not in the regions of HLA-A without secondary conformation (online supplementary figures 25 and 26). However, although these differences may affect the stability of the complexes, they do not induce structural modifications sufficient to unfold the protein.
Regarding HLA-B alleles, the complexes with minimal changes from the initial form during the simulations were: HLA-B*07:02 with pep59 (online supplementary figure 11); HLA-B*51:01 in complex with pep70 (online supplementary figure 18); HLA-B*35:01 with pep72, pep73 and pep75 (online supplementary figure 13); HLA-B*38:01 with pep79 (online supplementary figure 15); and HLA-B*35:03 in complex with the peptides pep97 and pep98 (online supplementary figure 14). Similar to the most compact and stable complexes of HLA-A, the major differences among the most rigid and flexible complexes were not in the peptides itself but in regions of HLA-B alleles without secondary conformations (online supplementary figures 27 and 28).
Analysing the trajectories for HLA-C, it was possible to observe that the most stable complexes were HLA-C*05:01, with pep103 (online supplementary figure 19); HLA-C*07:01, with pep105 (online supplementary figure 20) and HLA-C*15:02, with pep109 (online supplementary figure 23). The latter complex, though, appeared to be stable only by the end of the simulation.
There are a few works in the literature using a similar approach to evaluate possible immunogenic peptides in the SARS-CoV-2 protein S. One study identified five CD8+ T cells epitopes (YLQPRTFLL, GVYFASTEK, EPVLKGVKL, VVNQNAQAL and WTAGAAAYY) and eight B cell epitopes that bind the MHC class I and II alleles more frequent in China.25 A second study found 13 MHC-I (SQCVNLTTR, GVYYHKNNK, GKQGNFKNL, GIYQTSNFR, VSPTKLNDL, KIADYNYKL, KVGGNYNYL, EGFNCYFPL, GPKKSTNLV, SPRRARSVA, LGAENSVAY, FKNHTSPDV and DEDDSEPVL) and three MHC-II possible protein S antigenic peptides.26Joshi et al27 reported the MHC-I ITLCFTLKR as a possible candidate for vaccine development. Comparing these previous results with our simulations, only GWTAGAAAYY complexed with HLA-A*01:01 and HLA-A*26:01 presented good NetMHC-Pan and Vaxijen scores, as well as good docking stability and RMSD during the MD.
In conclusion, being aware of the intrinsic limitations of in silico analysis (differences between the real and the PDB structure, accuracy of the methods for simulate proteasome cleavage, as well as molecular modelling, docking and dynamics’ shortcomings), we described 24 epitopes present in the SARS-CoV-2 protein S that could interact with 17 different MHC-I alleles in the Brazilian population. These epitopes can elicit an effective CD8+ T cells immune response and could be useful to develop strategic methods for vaccines against COVID-19. Finally, our immunoinformatic approach could be a useful tool to determine a guided starting point to design and develop epitope-based vaccines.
Take home messages
The understanding of which severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epitopes are immunogenic may help in the development of diagnostic kits and prophylactic vaccines.
An immunoinformatic approach is suitable for Major Histocompatibility Complex Class I (MHC-I) screening, to predict the immunogenicity of a vast quantity of peptides and alleles in a cost-effective manner.
SARS-CoV-2 protein S peptides have been analysed in silico aiming to prospect epitopes and their ability to elicit an immune response mediated by the most frequent MHC-I alleles in the Brazilian population.
Twenty-four epitopes present in the SARS-CoV-2 protein S have been observed, which could interact with 17 different MHC-I alleles identified in the Brazilian population.
CAS-S and AMB-I thank CAPES (Coordination for the Improvement of Higher Education Personnel, Brazil) and CNPq (Brazilian National Council for Scientific and Technological Development) for financial support and scholarships.
Handling editor Runjan Chetty.
Contributors RRdM, CAS-S, BRA and NS made substantial contributions to the conception or design of the work, or the acquisition, analysis or interpretation of data. RRdM, AA, SC, CAS-S, LB and AMB-I drafted the work or revising it critically for important intellectual content. SC and LB made the final approval of the version published. The authors do agree that they have the accountability for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding This study has been supported by the following grants: Italian-Slovenian Ecosystem for Electronic and Mobile Health from European Community (grant number: 07/2019) and BioHub—A High-throughput Platform For OMICs Data Analysis And Integration from the Italian Ministry of Health (grant number: RC03/20). LB is recipient of a senior fellowship from the Brazilian National Council for Scientific and Technological Development (CNPq) (grant number: 308540/2017). RRdM is recipient of a senior fellowship from RC03/20 project from IRCCS Burlo Garofolo, Trieste, Italy.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; internally peer reviewed.
Data availability statement Data are available upon reasonable request. All data relevant to the study are included in the article or uploaded as supplementary information. Raw data generated in this study may be requested by sending an email to firstname.lastname@example.org.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.