Signature sequence validation of human papillomavirus type 16 (HPV-16) in clinical specimens
  1. Sin Hang Lee,
  2. Veronica S Vigliotti,
  3. Suri Pappu
  1. Milford Hospital, Milford, Connecticut, USA
  1. Correspondence to Dr Sin Hang Lee, Department of Pathology, Milford Hospital, 300 Seaside Avenue, Milford, CT 06460, USA; sinhang.lee{at}


Aims Persistent infection indicated by detection of human papillomavirus 16 (HPV-16) on repeat testing over a period of time poses the greatest cervical cancer risk. However, variants of HPV-16, HPV-31 and HPV-33 may share several short sequence homologies in the hypervariable L1 gene commonly targeted for HPV genotyping. The purpose of this study was to introduce a robust laboratory procedure to validate HPV-16 detected in clinical specimens, using the GenBank sequence database as the standard reference for genotyping.

Methods A nested PCR with two pairs of consensus primers was used to amplify the HPV DNA released in crude proteinase K digest of the cervicovaginal cells in liquid-based Papanicolaou cytology specimens. The positive nested PCR products were used for direct automated DNA sequencing.

Results A 48-base sequence downstream of the GP5+ priming site, or a 34-base sequence upstream thereof, was needed for unequivocal validation of an HPV-16 isolate. Selection of a 45-base, or shorter, sequence immediately downstream of the GP5+ site for Basic Local Alignment Search Tool sequence analysis invariably led to ambiguous genotyping results.

Conclusions DNA sequence analysis may be used for differential genotyping of HPV-16, HPV-31 and HPV-33 in clinical specimens. However, selection of the signature sequence for Basic Local Alignment Search Tool algorithms is crucial to distinguish certain HPV-16 variants from other closely related HPV genotypes.

  • HPV-16
  • HPV-16 genotyping
  • validation
  • DNA sequencing
  • signature DNA sequences
  • gynaecological pathology
  • STD
  • tumour virology


Almost all cervical cancers or precancerous lesions harbour a human papillomavirus (HPV) long before a cytohistological diagnosis is established. The risk of developing precancer or cancer is greatest in women who test positive for the same ‘high-risk’ HPV genotype on repeat testing over a period of time, an indication of persistent HPV infection.1–4 Therefore, accurate HPV genotyping may play an important role in clinical management. The College of American Pathologists has urged rigorous validation for all HPV assays performed in clinical laboratories.5 6

Since HPV-16 is consistently detected in 50% of cervical cancers and cervical intraepithelial neoplasia 3 (CIN3) lesions,7 8 a reliable method to detect and validate HPV-16 in Pap cytology specimens would be a valuable tool for following patients with persistent HPV-16 infection before precancerous changes become obvious on Pap cytology.

Accurate genotyping of HPV, especially of HPV-16, is challenging in clinical pathology. Two commercial HPV genotyping kits, the SPF10-line probe assay (SPF10-LiPA; DDL Diagnostic Laboratory, Voorburg, The Netherlands) and the line blot assay (Roche Molecular Systems, Pleasanton, California, USA), have been evaluated against each other and showed only poor to intermediate agreement in genotyping results.9 When the same specimens were tested in parallel for comparison, the SPF10-LiPA detected more HPV-31 than HPV-16, while the line blot assay detected more HPV-16 than HPV-31.9 10 These results indicate that some HPV-16 and HPV-31 isolates from clinical samples might have been classified interchangeably as one or the other. The direct automated Sanger sequencing method11–21 may validate type-specific signature DNA sequences for differential genotyping of HPV-16.

Materials and methods

A total of 2740 alcohol-preserved Cytyc or Surepath liquid-based specimens submitted by the gynaecologists practicing in southern Connecticut for HPV PCR testing were included in this analysis. These HPV tests were primarily ordered by the physicians affiliated with Milford Hospital for their patients 30 years and older (up to 65 years) as adjunctive screening to routine Pap cytology and for patients below the age of 30 years who had a cytology result of atypical squamous cells of undetermined significance or more severe changes. In this rural and suburban population, the cervical cancer prevalence is less than 6.8/100 000 women.22 The HPV positive prevalence rate for the patients below 30 years old was found to be 36.1%, and that for those 30 years and older was 7.3%.18 Publication of laboratory data with concealed patient identities was approved by the Milford Hospital Institutional Review Board.

For HPV detection and genotyping, the pellet derived from 5% of the liquid-based Pap cytology sample was digested in proteinase K, and 1 μl of the digestate was used for nested PCR amplification, first with a pair of MY09/MY11 degenerate outer primers and then with a nested PCR primer pair (ie, a GP5+/GP6+ pair or a GP6/MY11 pair (or its equivalent)).17–19 21

The positive nested PCR products were subjected to direct automated DNA sequencing without further purification.17–21 A unique ‘100% identities’ match between the query and subject sequences, returned by the Basic Local Alignment Search Tool (BLAST) online algorithm and exclusive to a single genotype, was required for genotyping, except for variants not yet recorded in GenBank, as reported previously.17–19 21
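
The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the laboratory's software: the `Hit` record and its field names are hypothetical stand-ins for rows of a BLAST report and do not correspond to any BLAST API, and variants not yet recorded in GenBank would still need manual review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One row of an alignment report (hypothetical fields, not a BLAST API)."""
    genotype: str      # e.g. "HPV-16"
    identities: int    # number of matched bases in the alignment
    query_length: int  # length of the submitted query sequence


def call_genotype(hits: List[Hit]) -> Optional[str]:
    """Return a genotype only when every perfect, full-length hit names one type."""
    perfect = {h.genotype for h in hits if h.identities == h.query_length}
    if len(perfect) == 1:
        return next(iter(perfect))
    # Either no 100% match, or 100% matches to more than one genotype.
    return None
```

A query that matches an HPV-16 and an HPV-31 entry equally well therefore returns `None`, which corresponds to the ambiguous typing result discussed in this study.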


Results

Analysis of a DNA sequence of 34 bases immediately downstream of the GP5+ priming site was adequate for accurate genotyping of all HPV variants encountered, except for HPV-16, HPV-31 and HPV-33 (figure 1). When a sequence of fewer than 46 bases from this region of the HPV-16 L1 gene was submitted for BLAST analysis, the online BLAST reports invariably showed that some variants of HPV-16, HPV-31 and HPV-33 in the GenBank database share a sequence homology of up to 45 bases in this region (figure 2), giving an ambiguous typing result. Distinction between these three HPV types depended on identifying the adjacent three ‘crucial’ bases further downstream. The crucial three-base sequence ‘GTT’ identified one of the HPV-16 variants represented by HPV-16 GenBank Locus FJ006723; the sequence ‘GGT’ identified one of another group of HPV-16 variants represented by HPV-16 GenBank Locus AF134178; the sequence ‘GGC’ identified an HPV-31 variant (GenBank Locus EF140820); and the sequence ‘CGT’ identified an HPV-33 variant (GenBank Locus DQ448214).
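
As a rough illustration of how the three crucial bases separate these variants, the lookup below reads positions 46 to 48 of a sequence that starts immediately downstream of the GP5+ priming site. The function name and the placeholder input are assumptions made for the example. The table covers only the four GenBank loci named above and is meaningful only when the first 45 bases match the shared homology, so it is a sketch and not a substitute for a full BLAST alignment.

```python
from typing import Optional

# Crucial bases at positions 46-48 downstream of the GP5+ priming site,
# as reported in the text for the named GenBank loci only.
CRUCIAL_BASES = {
    "GTT": "HPV-16 (Locus FJ006723 and variants)",
    "GGT": "HPV-16 (Locus AF134178 and variants)",
    "GGC": "HPV-31 (Locus EF140820)",
    "CGT": "HPV-33 (Locus DQ448214)",
}


def classify_by_crucial_bases(downstream: str) -> Optional[str]:
    """Return the type implied by bases 46-48, or None if it cannot be decided."""
    if len(downstream) < 48:
        return None  # too short: the 45-base homology alone is ambiguous
    triplet = downstream[45:48].upper()  # 0-based slice for positions 46-48
    return CRUCIAL_BASES.get(triplet)    # None for an unlisted triplet


# Placeholder input: 45 unspecified bases followed by the crucial triplet.
example = "N" * 45 + "GGT"
result = classify_by_crucial_bases(example)
```

A return value of `None` signals that the read should be extended or the isolate checked by another route before a genotype is reported.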

Figure 1

Computer-generated electropherogram showing a 72-base sequence of the L1 gene of a human papillomavirus 16, Locus FJ006723 or its variant. A GP5+/GP6+ nested PCR amplicon was the template. The sequence of the GP5+ priming site has been deleted from the upstream 5′ end on the right. The 3-base crucial sequence GTT, for differential genotyping of HPV-16, HPV-31 and HPV-33, is underlined.

Figure 2

Alignment of four 72-base sequences representing two common variants of human papillomavirus 16 (HPV-16), one variant of HPV-31 and one variant of HPV-33, immediately downstream of their GP5+ priming site (not shown). The three crucial bases for differential genotyping are bold.

Using the crude proteinase K digest of complex clinical materials for PCR and direct automated DNA sequencing might generate electropherograms of varying quality. When the quality of an electropherogram was high, recognition of the 3-base ‘crucial’ sequence (figure 1) for differential genotyping of HPV-16 was straightforward, provided that at least 48 bases downstream of the GP5+ priming site were selected for BLAST analysis. However, when the electropherogram was less than perfect (figure 3), an inexperienced sequence analyst might have selected only the first 34 bases for BLAST sequence alignment, leading to genotyping ambiguities or errors. To guard against this, a GP6/MY11 heminested or an equivalent HiFi nested PCR primer amplicon was used to generate an extended electropherogram with an additional base-calling stretch upstream of the GP5+ priming site. In this upstream stretch, the GP5+ priming site became part of the sequence useful for base calling.
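
The length requirement can be expressed as a simple pre-submission check. The sketch below assumes, purely for illustration, that the base caller writes an ambiguous call as `N` and that the read begins immediately downstream of the GP5+ priming site. The function names are hypothetical.

```python
from typing import Optional

MIN_DOWNSTREAM_BASES = 48  # 45 shared bases plus the 3 crucial bases


def clean_prefix(read: str) -> str:
    """Return the bases that precede the first ambiguous call ('N')."""
    read = read.upper()
    cut = read.find("N")
    return read if cut == -1 else read[:cut]


def blast_query(read: str) -> Optional[str]:
    """Return a query for BLAST, or None if fewer than 48 clean bases exist."""
    prefix = clean_prefix(read)
    if len(prefix) < MIN_DOWNSTREAM_BASES:
        return None  # do not submit; use the extended upstream read instead
    return prefix
```

Returning `None` instead of a shortened query is a deliberate choice in this sketch: it makes the analyst fall back on the extended GP6/MY11 read and avoids submitting a sequence that lies entirely within the shared homology.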

Figure 3

A less than perfect computer-generated electropherogram similar to that shown in figure 1. An imperfect base-calling tracing may prevent selection of a sequence long enough to include the three crucial bases GTT (underlined) for Basic Local Alignment Search Tool (BLAST) analysis, causing ambiguity in differential genotyping of human papillomavirus 16.

Any 34-base sequence in the extended upstream segment, including all or part of the 23 bases at the GP5+ binding site, was found to be sufficient for distinguishing any HPV-16 variant from other closely related HPV types beyond reasonable doubt. As illustrated in the two sample electropherograms selected for demonstration (figure 4 and figure 5), all HPV-16 isolates detected in this population had a ‘100% identity’ match in sequence with a variant of one of the two HPV-16 prototypes: Locus FJ006723 and Locus AF134178. These two representative prototypes of HPV-16 differ from each other by only 1 base, at nucleotide 47 downstream of the GP5+ priming site. Each of them, in turn, depends on the 3-base crucial sequence at this site for distinction from certain variants of HPV-31 or HPV-33 (figure 2).
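
To make the phrase ‘any 34-base sequence’ concrete, the sketch below enumerates every contiguous 34-base window in a 50-base segment, which gives 17 candidate query sequences. The segment used here is a placeholder string and not the HPV-16 sequence, and the function name is an assumption for the example.

```python
from typing import List

WINDOW = 34  # query length found sufficient in the upstream segment


def candidate_windows(segment: str, size: int = WINDOW) -> List[str]:
    """Return all contiguous windows of `size` bases, in order of position."""
    # range() is empty when the segment is shorter than the window.
    return [segment[i:i + size] for i in range(len(segment) - size + 1)]


placeholder_segment = "ACGTT" * 10  # 50 placeholder bases, not real data
windows = candidate_windows(placeholder_segment)
# A 50-base segment yields 50 - 34 + 1 = 17 windows.
```

Each window could then be submitted as a separate BLAST query to confirm that it returns matches for a single genotype only.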

Figure 4

Extension of a sequence electropherogram to include the 23-base GP5+ priming site (underlined) and 27 bases further upstream. Any 34-base sequence selected at random from the 50 bases on the right gives a type-specific ‘100% identity’ match for all variants of human papillomavirus 16 (HPV-16). Note the crucial three-base sequence GTT (underlined) in the region downstream of the GP5+ priming site for HPV-16 Locus FJ006723 and variants.

Figure 5

Sequence electropherogram almost identical to that in figure 4, characteristic of another common human papillomavirus 16 (HPV-16) variant with the 23-base GP5+ priming site underlined. Note the three-base crucial sequence GGT (underlined) in the region downstream of the GP5+ priming site for genotyping of HPV-16 Locus AF134178 and variants.

Of the 2740 specimens analysed, 202 (7.4%) were found to be positive for one of the 13 ‘high-risk’ HPV genotypes targeted by the FDA-approved Digene HC2 assay. Specimens with mixed HPV infections were excluded from this study. In this local patient population, mixed HPV infections were found in 4.7%–8.5% of the HPV-positive specimens.18 19

Each nested PCR amplicon was confirmed by matching its sequence with a unique genotype-specific DNA sequence through online BLAST algorithms, using the HPV DNA sequences deposited in the GenBank database as the reference. There were 35 single HPV genotypes identified, including the high-risk and low-risk types. The general distribution pattern of the individual HPV genotypes in this rural and suburban population has been published elsewhere.18

HPV-16 was found to be the most prevalent among the ‘high-risk’ HPV infections (n=68; 33.7%). Detection of the HPV-16 DNA in 28 of the 68 positive specimens (41.2%) relied on a nested PCR. The prevalence of the individual ‘high-risk’ HPV genotypes is summarised in table 1.

Table 1

Genotype prevalence of 202 ‘high-risk’ HPV isolates from 2740 specimens in southern Connecticut

There were 10 high-grade squamous intraepithelial lesion (HSIL) results in the 2740 companion Pap cytology reports. Correlation of the HPV genotyping data with the Pap cytology results showed that six cases of HSIL were associated with a single HPV-16 infection, two with an HPV-31 infection, one with an HPV-52 infection and one with an HPV-69 infection. No HSIL cytology was associated with a mixed HPV infection.

Colposcopic biopsies were performed on 18 of the 68 patients positive for HPV-16, including the six HPV-16 positive patients whose Pap cytology results were HSIL. Of these six HPV-16 positive HSIL cases, two (2/6) were finally confirmed by histology to harbour a CIN3/2 lesion, and the other four (4/6) were found to be histologically negative on colposcopic biopsy, indicating that reversible HSIL cytology was associated with an HPV-16 infection in these four patients. Therefore, the positive predictive value (ppv) of HSIL cytology for CIN3/2 was 33.3%.

Among the other 12 HPV-16 positive patients with a Pap cytology lower than HSIL, but selected by the gynaecologists for cervical biopsies, four (4/12=33.3%) were found to harbour a CIN3/2 histology.
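
The two proportions quoted above follow directly from the biopsy counts. The short calculation below reproduces them; the function name is an assumption made for the example, and the counts are those given in the text.

```python
def percent_confirmed(confirmed: int, biopsied: int) -> float:
    """Percentage of biopsied patients with histologically confirmed CIN3/2."""
    if biopsied <= 0:
        raise ValueError("biopsied must be a positive count")
    return round(100.0 * confirmed / biopsied, 1)


# HPV-16 positive patients with HSIL cytology: 2 of 6 confirmed as CIN3/2.
ppv_hsil = percent_confirmed(2, 6)

# HPV-16 positive patients with cytology lower than HSIL: 4 of 12 confirmed.
yield_below_hsil = percent_confirmed(4, 12)
```

Both values come to 33.3%, so in this small series the biopsy yield among HPV-16 positive patients was the same whether or not the cytology was HSIL.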

There was no invasive cervical cancer recorded in the 68 patients who were found to be positive for HPV-16 DNA in their Pap cytology specimens. Since only 12 of the 62 patients whose specimens were positive for HPV-16 and whose Pap cytology was lower than HSIL were subjected to colposcopic biopsies, the gynaecologists in this community apparently did not rely on a one-occasion HPV-16 genotyping test result as the determining triage tool for referral to colposcopic biopsy. The availability of routine HPV genotyping for follow-up of persistent HPV infections, and a well-known low cervical cancer prevalence rate in this local patient population with an above-average level of healthcare education, might have influenced the practice of the local gynaecologists.


Discussion

The role of persistent infection by HPV as a tumour promoter in cervical cancer induction is well recognised.1–4 23 The risk of developing precancerous lesions or cancer is greatest in women positive for the same genotype of HPV on repeat testing over a period of time.1–4 The medical community is now urged to emphasise the persistence of a cervical HPV infection, not the single-time detection of HPV, in management strategies and health messages.24

According to the definition adopted in virology and used by GenBank, a genotype of HPV differs in its L1 gene DNA sequence by at least 10% from every other known HPV type.25 However, sequence dissimilarities are unevenly distributed, with scattered short sequence homologies between variants of different papillomavirus genotypes in various segments of the L1 gene.26 As shown in the GenBank database, there are a few short sequence homologies, including one of 45 bases, between HPV-16, HPV-31 and HPV-33 in their L1 genes downstream of the GP5+ priming site (figure 2).
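
The 10% criterion can be illustrated with a simple position-by-position comparison. The sketch assumes two sequences that are already aligned and of equal length, and it ignores gaps, so it is much cruder than the full L1 gene alignments on which the definition rests. The function names are assumptions for the example.

```python
TYPE_THRESHOLD_PERCENT = 10.0  # minimum L1 difference between two HPV types


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of positions that differ between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differing = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * differing / len(seq_a)


def is_distinct_type(seq_a: str, seq_b: str) -> bool:
    """True when the sequences differ by at least the 10% type threshold."""
    return percent_difference(seq_a, seq_b) >= TYPE_THRESHOLD_PERCENT
```

Because the criterion is an average over the whole gene, two types that satisfy it can still be identical over a short stretch, which is how a 45-base homology can coexist with an overall difference of 10% or more.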

Some of the hybridisation probes used for HPV-16 genotyping, including the HPV-16A probe (GATATGGCAGCACATAATGAC) published by Roche Molecular Systems, Alameda, California, USA,27 lie within this 45-base sequence homology. Such a probe will hybridise with any variant of HPV-16, HPV-31 or HPV-33 that has a DNA strand complementary to its sequence (figure 2). Another HPV-16 capture probe (GTAGTTTCTGAAGTAGATATGG) used for chip development28 contains a strand of seven nucleotides of the aforementioned Roche HPV-16A probe, with an extension downstream to include the 3-base crucial sequence GTT. This latter HPV-16 capture probe would theoretically be able to discriminate against the HPV-31 and HPV-33 variants, which do not have a 3-base sequence complementary to GTT at this location (figure 2). By the same token, however, this probe will also fail to capture those HPV-16 variants with a complementary GGT sequence at this location, for example, HPV-16 GenBank Locus AF134178 and its variants (figure 2 and figure 5). According to the GenBank database, the unique signature sequence of HPV-16 in this region for genotyping differentiation from HPV-31 and HPV-33 is located in the 50-base segment further upstream, including the 23 bases of the GP5+ binding site (underlined in figure 4). This information appears to have been overlooked by some manufacturers of commercial test kits using GP5+ and GP6+ PCR primers for HPV-16 DNA amplification, although a 22-base hypervariable strand of DNA upstream of the GP5+ binding site has been used for probe design by the SPF10-LiPA.
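
The relationship between the two probes can be checked directly from the sequences quoted above. The sketch below finds the longest stretch at the end of the capture probe that also begins the Roche HPV-16A probe, and then counts the mismatches that arise when GTT in the capture probe is replaced by GGT. Treating the single GTT occurrence in the capture probe as the crucial triplet is an assumption of the example, as are the function names.

```python
ROCHE_HPV16A = "GATATGGCAGCACATAATGAC"    # probe sequence quoted in the text
CAPTURE_PROBE = "GTAGTTTCTGAAGTAGATATGG"  # capture probe quoted in the text


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def mismatches(seq_a: str, seq_b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


shared = suffix_prefix_overlap(CAPTURE_PROBE, ROCHE_HPV16A)   # 7
ggt_form = CAPTURE_PROBE.replace("GTT", "GGT", 1)             # GGT counterpart
single_base_gap = mismatches(CAPTURE_PROBE, ggt_form)         # 1
```

The seven shared nucleotides are GATATGG, and the GTT and GGT forms differ at one position only, so capture of the AF134178 group by this probe depends entirely on how strongly the assay conditions penalise a single mismatch.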

When ‘high-risk’ HPV DNA testing is relied on in combination with cytological screening for triage, more than 95% of referrals to colposcopic biopsy for detection of cancer and precancer have been found to be excessive or unnecessary.29 It has been suggested that an increase in test specificity may lead to large changes in referral rates.29 Our experience shows that, when the most sensitive and specific nested PCR/DNA sequencing method is used for HPV DNA testing and an HSIL cytology result is taken as the endpoint for evaluation, a one-occasion HPV test result may have a 100% clinical sensitivity, a 94% clinical specificity and a 100% negative predictive value, but its ppv is only 3.4%–6% in a rural and suburban US population with a cervical cancer prevalence rate of <6.8/100 000 women.18 19 According to the data presented in this series, if all 68 women found to be positive for HPV-16 had been subjected to colposcopic biopsy work-up, the one-occasion HPV-16 test would have had a ppv of 8.8% (6/68), provided that CIN3/2 is used as the endpoint for evaluation and that the six histologically confirmed cases were the only CIN3/2 lesions among these women. The appropriate use of HPV-16 genotyping is to identify potentially carcinogenic persistent infections,30 not as a referral tool to colposcopic biopsy.
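
The low ppv of a highly sensitive and specific test follows from the low prevalence of the endpoint. The sketch below applies the standard relationship between ppv, sensitivity, specificity and prevalence. The sensitivity and specificity are the figures quoted above; using the 10 HSIL results among the 2740 specimens of this series as the prevalence is an assumption made only for illustration, since the quoted ppv range comes from earlier reports.18 19

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive + false_positive == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positive / (true_positive + false_positive)


# Illustrative prevalence (assumption): 10 HSIL results in 2740 specimens.
hsil_prevalence = 10 / 2740
ppv = ppv_from_rates(sensitivity=1.0, specificity=0.94, prevalence=hsil_prevalence)
# ppv is about 0.058, i.e. just under 6%
```

Under these assumptions the result falls within the 3.4%–6% range quoted above. With the endpoint present in fewer than 0.4% of specimens, even a 6% false-positive rate produces many more false positives than true positives.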

Take-home message

  • Accurate human papillomavirus 16 (HPV-16) genotyping plays an important role in cervical cancer risk management. However, errors in HPV-16 genotyping may occur because certain variants of HPV-16, HPV-31 and HPV-33 share short sequence homologies in the L1 gene commonly targeted for probe hybridisation. This study shows that the signature DNA sequence for differential genotyping is located in a 50-base segment outside of this common target.



  • Competing interests Dr Sin Hang Lee declares that he is a pathologist at Milford Hospital, Milford, Connecticut, USA, and the director of Milford Medical Laboratory. Dr Lee receives a fixed salary from the hospital, which charges fees for cancer biopsies and HPV testing. Dr Lee is also the president and a shareholder of HiFi DNA Tech, LLC, a company specialised in transferring the Sanger DNA sequencing technology to community hospital laboratories for DNA testing. Veronica S Vigliotti: None to declare. Dr Suri Pappu: None to declare.

  • Ethics approval Ethics approval was obtained from the Milford Hospital Institutional Review Board.

  • Patient consent Not obtained.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license.
