Peptides and synthetic peptide-like molecules are powerful tools for the analysis and control of biological function. The major problem with the use of peptides is the instability of their interactions with biomolecules: affinities are typically micromolar, a consequence of the limited accessible surface area and the intrinsic flexibility of peptides. Nevertheless, appending a short peptide tag is the most common way to allow a protein of interest to be isolated or detected with minimal perturbation to protein function. This article explains the approach using two examples.

Application I

Here we have designed a way to bind a peptide tag irreversibly, by adopting a recently discovered feature of amino acid chemistry: the spontaneous formation of an amide bond between a Lys and an Asn side chain in the appropriate environment. Amide linkages outside of the protein main chain are termed isopeptide bonds. Isopeptide bonds are chemically stable and resistant to most proteases. Enzymes such as transglutaminases catalyze isopeptide formation, stabilizing the extracellular matrix and strengthening blood clots, but these enzymes are large and have low sequence specificity. Recently, certain proteins were discovered to autocatalyze single-turnover isopeptide bond formation, yielding the ultrathin chain mail of viral capsids or the proteolytically stable pili of Gram-positive bacteria. The bond forms through nucleophilic attack by the ε-amino group of a Lys on the Cγ of an Asn, promoted by a nearby Glu.


To apply spontaneous isopeptide bond formation to direct new covalent peptide interactions, the major pilin protein Spy0128 from Streptococcus pyogenes was dissected to explore whether the two fragments would covalently associate. Split proteins have been successfully reconstituted in many cases, including enzymes and fluorescent proteins, albeit through noncovalent interactions. Spy0128 was split at the final β-strand of the C-terminal domain, to give the fragment pilin-C (Spy0128 residues 18-299, with N-terminal His6) and the isopeptag (Spy0128 residues 293-308: TDKDMTITFTNKKDAE). This placed the reactive Asn on the isopeptag and the reactive Lys on pilin-C. To enhance recombinant expression in E. coli, the isopeptag was genetically fused to the N-terminus of maltose binding protein (MBP).
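As a quick consistency check on the residue numbering, the isopeptag sequence can be indexed by Spy0128 residue number. The snippet below is only a minimal sketch: the helper name `residue_at` is our own, and it assumes the peptide starts at residue 293, as stated above.

```python
# Isopeptag sequence and the Spy0128 residue number of its first residue,
# both taken from the text above.
ISOPEPTAG = "TDKDMTITFTNKKDAE"
FIRST_RESIDUE = 293


def residue_at(number, sequence=ISOPEPTAG, first=FIRST_RESIDUE):
    """Return the one-letter amino acid at a given Spy0128 residue number."""
    index = number - first
    if not 0 <= index < len(sequence):
        raise ValueError(f"residue {number} lies outside the peptide")
    return sequence[index]


# Residue number of the last position, and of every Asn in the peptide.
last_residue = FIRST_RESIDUE + len(ISOPEPTAG) - 1
asn_positions = [FIRST_RESIDUE + i for i, aa in enumerate(ISOPEPTAG) if aa == "N"]

print(last_residue)     # 308
print(asn_positions)    # [303]
print(residue_at(303))  # N
```

The peptide contains a single Asn, at position 303, so the reactive Asn carried by the isopeptag is unambiguous.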

Testing Covalent Reaction

To test whether the two fragments had covalently associated, isopeptag-MBP and pilin-C were mixed, each at 10 μM, and the samples were boiled in SDS before SDS-PAGE. A new product formed at ~80 kDa, consistent with a reaction between isopeptag-MBP and pilin-C. Amide bond formation between isopeptag-MBP and pilin-C was verified by mass spectrometry, which showed the loss of NH3 upon reaction. Pilin-C K179A, lacking the reactive Lys, did not form a covalent complex with isopeptag-MBP, as determined by SDS-PAGE and mass spectrometry. Also, pilin-C did not react with MBP fused to an alternative peptide containing four potentially reactive Asn residues (MBP-isopeptag-N).
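The mass spectrometry check rests on simple arithmetic: each isopeptide bond releases one NH3, so the covalent product should be lighter than the sum of its partners by about 17 Da per bond. The sketch below illustrates this. The two partner masses are placeholders chosen only for illustration, not measured values, and the function name `expected_mass` is our own.

```python
# Average atomic masses (Da) used to derive the mass of ammonia.
MASS_N = 14.0067
MASS_H = 1.00794
MASS_NH3 = MASS_N + 3 * MASS_H  # ~17.03 Da lost per isopeptide bond


def expected_mass(partner_masses, n_isopeptide_bonds):
    """Mass of a product in which each isopeptide bond has released one NH3."""
    return sum(partner_masses) - n_isopeptide_bonds * MASS_NH3


# Placeholder masses (Da), chosen only for illustration; NOT measured values.
isopeptag_mbp = 45_000.0
pilin_c = 33_000.0

covalent = expected_mass([isopeptag_mbp, pilin_c], n_isopeptide_bonds=1)
noncovalent_sum = expected_mass([isopeptag_mbp, pilin_c], n_isopeptide_bonds=0)

print(f"NH3 mass: {MASS_NH3:.2f} Da")
print(f"Mass shift on reaction: {noncovalent_sum - covalent:.2f} Da")
```

The same function with `n_isopeptide_bonds=2` describes a single chain carrying two intramolecular isopeptide bonds, where the expected loss is two NH3 units.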

Spy0128 contains another isopeptide bond in its N-terminal domain. The generality of this strategy for designing spontaneous amide bond-forming peptides was shown by dissecting Spy0128 at its N-terminal β-strand, this time with the reactive Lys on the peptide (isopeptag-N) and the reactive Asn on the protein fragment (pilin-N): these partners also formed a covalent bond to each other when mixed.

The features of pilin-C and the isopeptag that are important for reaction were then examined: truncating pilin-C earlier or later in the final β-strand did not substantially change the reactivity, but reaction was dramatically reduced upon truncating the isopeptag by the 5 residues of the loop preceding the final β-strand.

The speed of pilin reconstitution was tested: with each partner at 10 μM, reaction was clearly detectable at 1 h and reached 60% yield at later time points. With a 2-fold excess of isopeptag, 98% of pilin-C had reacted within 24 h.


The concentration dependence of the reaction was tested by incubating both partners at 1, 5, or 10 μM: the extent of reconstitution increased with concentration over this range. Surprisingly, the yield and speed of reaction were largely temperature-independent at 4-37 °C. Reaction was also largely independent of pH at pH 6-8 but was reduced by 15% at pH 5 after 24 h. Bond formation proceeded to a similar extent in a range of biological buffers, including with detergent, and with no requirement for any particular monovalent or divalent ions.
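The dependence on concentration is what one would expect for a bimolecular reaction. As a rough illustration only, the sketch below uses an idealized irreversible second-order model with equal starting concentrations of the two partners. The rate constant is an arbitrary placeholder, not a value reported for this system, and the model ignores whatever limits the observed yield, so it should not be read as a fit to the data.

```python
def fraction_reacted(conc_molar, rate_constant, time_s):
    """Fraction of each partner consumed in an irreversible A + B -> P reaction.

    Assumes equal starting concentrations of A and B and simple
    second-order kinetics: f = k*C0*t / (1 + k*C0*t).
    """
    x = rate_constant * conc_molar * time_s
    return x / (1.0 + x)


# Hypothetical rate constant (M^-1 s^-1); an arbitrary placeholder, not a
# value reported for the isopeptag/pilin-C pair.
K_ASSUMED = 10.0
ONE_HOUR = 3600.0

# Compare the three starting concentrations mentioned in the text.
for micromolar in (1, 5, 10):
    f = fraction_reacted(micromolar * 1e-6, K_ASSUMED, ONE_HOUR)
    print(f"{micromolar:>2} uM: {100 * f:.1f}% reacted after 1 h")
```

In this model the fraction reacted at a fixed time rises monotonically with the starting concentration, which is the qualitative trend described above.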

The conditions required to prevent spontaneous amide bond formation have not yet been tested. The rate of the intramolecular Lys-Asn bond formation has not been determined, because the reaction had gone to completion by the time the pilin was isolated, but it is likely to be substantially faster than the ~25 min generation time of S. pyogenes. Future screening of phage-display peptide libraries may identify isopeptag variants that associate rapidly and approach the intramolecular rate of reaction.

To demonstrate that this spontaneous amide bond formation would occur within living cells, we made a bicistronic construct, with pilin-C and isopeptag-MBP expressed from the same promoter. Inside the cytosol of E. coli, pilin-C but not the pilin-C K179A control efficiently reconstituted with isopeptag-MBP.

Specificity Test

To test the specificity of the pilin-C:isopeptag interaction in a complex environment, the isopeptag was targeted to the surface of mammalian cells. Isopeptag-CFP-TM was labeled by pilin-C, but no binding was detected with the control pilin-C K179A, indicating good specificity of isopeptide bond formation on cells.

This work illustrates how autocatalytic side chain amide bond formation can be harnessed to provide a genetically encoded covalent reaction between a peptide and a protein. The reaction proceeded with similar efficiency at 4 and 37 °C. Since there must be an activation barrier to the reaction, a plausible explanation is that association of the isopeptag with pilin-C in a conformation suitable for reaction is the rate-limiting step, and that such a conformation is less stable at elevated temperature. This temperature independence opens up the possibility of isolating isopeptag-containing proteins from cell lysates at 4 °C, to minimize sample degradation. Some split proteins do not reconstitute or remain soluble at 37 °C, but we obtained reaction at 37 °C and observed solubility >200 μM for pilin-C and isopeptag-MBP. Small amounts of side products of the pilin-C reaction were formed, which may point to alternative conformations in which amide bond formation can occur. It will be valuable to explore the behavior upon splitting of several of the other domains known to contain spontaneous isopeptide bonds. Spontaneous amide bond formation proceeded over a pH range from 5 to 8, indicating that it could be applied even in low-pH cellular compartments such as endosomes. Pilin-C and the isopeptag do not contain cysteines, so the redox status of the compartment should not matter for reaction, in contrast to bisarsenicals and most split inteins.

The specificity of spontaneous amide bond formation at the surface of mammalian cells was also demonstrated. There are many other approaches for labeling cellular proteins with fluorophores, but few precedents for covalent labeling of a genetically encoded peptide with a genetically encoded protein partner on cells. Alternative approaches to form covalent bonds to peptides include the sortase-catalyzed reaction of N-terminal oligoglycine with C-terminal LPXTG, which has the advantage of requiring only small tags but needs millimolar calcium (disruptive in the cytosol and nucleus) and is only applicable at termini. Disulfide bonds can also be used for covalent peptide binding but are reversible and prone to nonspecific interactions. Covalent bond formation will be particularly advantageous either when peptide attachment must be stable over long periods, such as for bio-assembly or imaging, or when proteins are subject to high forces, such as from shear in the bloodstream or from the firing of molecular motors.

Application II: Stabilized isopeptide bonds

Bacterial pili are filamentous structures that extend from the bacterial cell surface and mediate host cell adhesion, bacterial motility, and other critical aspects of colonization. The pili of pathogenic bacteria are also major virulence factors and important vaccine candidates. The best-characterized are the type I and type IV pili of Gram-negative organisms, for which considerable information exists on subunit structure and assembly. These pili are long (1 to 4 μm), thin (5 to 8 nm), and flexible, but are nonetheless very strong and can withstand extreme physical stresses. By contrast, the pili on Gram-positive bacteria went mostly unrecognized until recently, probably because they are extremely thin (2 to 3 nm) and hard to see. Unlike Gram-negative pili, whose subunits associate via noncovalent interactions, Gram-positive pili are assembled by bacterially encoded transpeptidase enzymes called sortases.

These enzymes recognize specific sequence motifs in the pilin subunits, elongate the pilus oligomer by progressive addition of subunits joined by intermolecular isopeptide bonds, and then tether the entire assembly to the cell wall peptidoglycan. The pili thus consist of multiple, covalently bonded copies of a single backbone pilin, to which can be added a few accessory proteins.

Streptococcus pyogenes [group A Streptococcus (GAS)] infects the human throat and skin, causing common infections such as sore throat and tonsillitis, as well as severe invasive illnesses such as necrotizing fasciitis, rheumatic fever, and streptococcal toxic shock syndrome. Thin pili, ~2 nm wide and >1 μm long, have been revealed by electron microscopy and were shown to be essential for adhesion to human tonsil and skin cells, as well as to be promising vaccine candidates against virulent GAS bacteria. The pilus-forming proteins are encoded in a small gene cluster within a pathogenicity island known as the FCT (fibronectin-binding, collagen-binding T antigen) region. In the S. pyogenes M1 strain SF370, spy0128 encodes the backbone pilin, spy0129 the sortase C1, and spy0125 and spy0130 two pilin-associated proteins. The backbone pilin subunits are Lancefield T antigens, named for their antigenicity and their extreme resistance to trypsin (T) digestion.


To understand pilus stability and assembly in Gram-positive organisms, the backbone pilin protein Spy0128 from an M1 strain of S. pyogenes was studied. This 340-residue protein has a sortase recognition motif, Glu-Val-Pro-Thr-Gly, at residues 308 to 312. Constructs comprising residues 18 to 311 and 18 to 308 were prepared. We obtained excellent crystals for the latter and solved its crystal structure at 2.2 Å resolution (R = 20.3%, Rfree = 26.4%).

The Spy0128 monomer has an elongated two-domain structure, with length 98 Å and width 20 to 30 Å. Both domains have irregular all-β structures that are modified variants of the immunoglobulin fold. The N-terminal domain, residues 18 to 171, forms a β sandwich in which the strands in one β sheet are progressively extended such that the upper portion of this β sheet, at the top of the domain, is relatively exposed. The C-terminal domain, residues 173 to 307, comprises 11 β strands. Its core is a β sandwich in which a five-stranded β sheet packs against a four-stranded β sheet. A prominent β ribbon (strands β3 and β4) extends the first sheet to seven strands and provides a wide loop at the base of the domain. Overall, the domain is wedge-shaped with a broad base and a narrower top where it joins to the N domain. The two domains are intimately associated, with only one residue, Ser172, separating the final β strand of the N domain from the first of the C domain. The interface between domains is mostly hydrophobic and buries ~1200 Å² of surface area.

The crystal asymmetric unit contains three independent Spy0128 molecules that generate columns of molecules extending through the crystal. This arrangement, found also in another crystal form, provides a compelling model for the assembly of GAS pili. Successive molecules stack head-to-tail, related by an approximate 3₁ helical screw along their long axis. Each interface, between the N domain of one molecule and the C domain of the next, buries ~850 Å² of solvent-accessible surface with a shape complementarity of 0.72, comparable with other protein oligomerization interfaces. There is very little lateral interaction between columns of molecules in the crystal.

The head-to-tail packing means that Phe307, which closely precedes the sortase recognition motif in Spy0128, packs against the exposed face of the N-domain β sheet. Sortase action cleaves the Thr311-Gly312 bond, after which isopeptide bond formation between the new C terminus and a Lys residue covalently links adjacent pilin subunits. Five invariant lysines are potential candidates for this intermolecular linkage. Of these, only Lys161, near the top of the N domain and 11 to 13 Å below Phe307 of the next molecule in the column, is a viable candidate for generating an elongated pilus. We used mass spectrometry of pilus fractions from S. pyogenes to show that Lys161 is indeed the essential lysine involved in oligomerization. This finding strongly supports the biological relevance of the assembly seen in all crystal forms. Residues 308 to 311 would continue below Phe307, packing against a highly sequence-conserved region of the β sheet and allowing isopeptide bond formation between the Thr311 carboxyl and Lys161 Nζ of the next molecule.

Intermolecular isopeptide bonds are known in other contexts besides the sortase-generated isopeptide bonds of Gram-positive pili. In ubiquitination, specific lysine residues of a target protein are covalently linked by ubiquitin ligases to the terminal carboxylate of ubiquitin. In transglutamination, enzyme-catalyzed isopeptide bond formation occurs between Gln and Lys side chains, as in the cross-linking of fibrin subunits, catalyzed by factor XIII. A rare example of self-generated isopeptide bonds between Asn and Lys residues occurs in the bacteriophage HK97, where capsid subunits are covalently cross-linked to form interlocked circular rings that give extraordinary stability. No examples of intramolecular isopeptide bonds had been reported, however. The Spy0128 structure revealed two intramolecular isopeptide bonds within the pilin subunit, one in each domain, formed by covalent bonding between lysine and asparagine side chains (Lys36-Asn168 in the N domain; Lys179-Asn303 in the C domain). Each is indicated by continuous electron density extending through the lysine ε-amino group into the δ-carboxyamide group of asparagine.

Mass spectrometry provided independent confirmation. The protein molecular mass was consistent with the loss of two NH3 units through isopeptide bond formation, and proteolytic digestion and peptide mapping gave cleavage products containing nonconsecutive sequences, which mapped to the peptides surrounding both isopeptide bonds. These bonds appear, as in HK97, to be self-generated.

An essential Glu residue is associated with each bond, forming hydrogen bonds to the isopeptide C=O and NH groups. The hydrogen bonding implies that both glutamic acids, Glu117 and Glu258, are protonated. In each case, the Lys, Asn, and Glu residues are surrounded by a cluster of aromatic residues, which would favor elevation of the pKa of the glutamic acid and reduction of the pKa of the lysine ε-amino group. In the N domain, the isopeptide moiety sits over the aromatic ring plane of Phe52, and Glu117 is surrounded by Phe54, Tyr128, and Phe166. Similar roles are played by Phe192, Phe194, Tyr261, and Phe301 for the C-terminal isopeptide. A plausible mechanism for isopeptide bond formation, first suggested for HK97, is that the protonated Glu polarizes the C=O bond of the Asn side chain, inducing positive charge on Cγ. Nucleophilic attack on Cγ by the unprotonated Lys ε-amino group then generates the isopeptide bond.

The importance of the Glu residues for isopeptide bond formation was studied by mutating Glu117 and Glu258 to alanine, creating proteins E117A and E258A. Mass spectrometry showed the loss of one isopeptide bond from each mutant, and crystallographic analysis of E117A confirmed that the N-domain isopeptide bond was not formed when Glu117 was mutated. Both mutants also showed greatly increased susceptibility to proteolysis, indicating the stabilizing effect of these cross-links.

Sequence comparisons suggested that the isopeptide bonds may be a conserved feature of the pili of all GAS. Despite low overall sequence identity in Spy0128 alleles, the Lys, Asn, and Glu residues of the isopeptide bonds are strictly conserved, as are five of the eight aromatic residues surrounding them. The other aromatics are replaced only by hydrophobic residues. The isopeptide bonds are strategically located in each domain (just before the interdomain connection and the sortase recognition motif, respectively), tying together the first and last β strands. Sequence similarities with the major pilins from other Gram-positive bacteria are too low to determine whether isopeptide bonds are a common feature, but a conserved Asn precedes the sortase motif by 5 to 8 residues in all sequences that have been examined, and conserved Lys and Glu residues can also be traced.
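The pKa argument for the Glu and Lys residues can be made concrete with the Henderson-Hasselbalch relation. The sketch below compares approximate textbook side-chain pKa values in water with shifted values for a buried, aromatic environment. The shifted values are hypothetical numbers chosen purely for illustration, not measurements on Spy0128.

```python
def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a group in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH = 7.0

# Approximate textbook side-chain pKa values in water.
PKA_GLU_WATER = 4.2
PKA_LYS_WATER = 10.5

# Hypothetical shifted values for a buried, aromatic environment.
# These are illustrative assumptions, not measurements on Spy0128.
PKA_GLU_BURIED = 7.0
PKA_LYS_BURIED = 8.0

for label, pka_glu, pka_lys in (
    ("water", PKA_GLU_WATER, PKA_LYS_WATER),
    ("buried (assumed)", PKA_GLU_BURIED, PKA_LYS_BURIED),
):
    # Protonated Glu is the form proposed to polarize the Asn C=O bond;
    # neutral (deprotonated) Lys is the form that can act as the nucleophile.
    glu_protonated = 1.0 - fraction_deprotonated(pka_glu, PH)
    lys_neutral = fraction_deprotonated(pka_lys, PH)
    print(f"{label}: Glu-COOH {glu_protonated:.4f}, Lys-NH2 {lys_neutral:.4f}")
```

With the water values, both reactive forms are rare at neutral pH. Raising the Glu pKa and lowering the Lys pKa increases the population of protonated Glu and unprotonated Lys together, which is the situation the proposed mechanism requires.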

Evidence for intramolecular isopeptide bonds was also found in other cell surface proteins. The C-terminal domain of the pilin-associated Cpa (GAS collagen-binding protein), encoded by spy0125, is homologous with the C domain of Spy0128, with the residues involved in the C-terminal isopeptide bond (Lys, Asn, Glu, and three Phe) invariant across all 14 Cpa sequences in the current sequence database. Examination of the recently released structure of a minor pilin, GBS52 from Streptococcus agalactiae, reveals an unrecognized Lys-Asn isopeptide bond like those in Spy0128. A search of the Protein Data Bank using a Lys-Asn-Glu/Asp structural template identified the collagen-binding adhesin Cna from Staphylococcus aureus as also having previously unrecognized isopeptide bonds in its A and B domains. Further sequence searches showed many instances of these domains containing predicted isopeptide bond-forming residues in the same locations, all from Gram-positive organisms and all (where functionally characterized) cell surface adhesion proteins.
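A structural template search of this kind can be sketched in a few lines. The example below is not the procedure used in the original study, whose details are not given here. It is a minimal geometric filter under our own assumptions: a Lys NZ within covalent bonding distance of an Asn CG, with a Glu or Asp carboxyl oxygen nearby. Both distance cutoffs and the toy coordinates are invented for illustration, and chain identifiers are ignored for brevity.

```python
import math

# Each atom: (residue name, residue number, atom name, (x, y, z) in angstroms).
BOND_CUTOFF = 1.6      # assumed upper limit for a covalent NZ-CG amide bond
ACID_CUTOFF = 4.5      # assumed limit for a nearby Glu/Asp carboxyl oxygen
ACID_OXYGENS = {("GLU", "OE1"), ("GLU", "OE2"), ("ASP", "OD1"), ("ASP", "OD2")}


def find_isopeptide_candidates(atoms):
    """Return (lys_resnum, asn_resnum, acid_resnum) triples matching the template."""
    lys_nz = [a for a in atoms if a[0] == "LYS" and a[2] == "NZ"]
    asn_cg = [a for a in atoms if a[0] == "ASN" and a[2] == "CG"]
    acids = [a for a in atoms if (a[0], a[2]) in ACID_OXYGENS]

    hits = []
    for lys in lys_nz:
        for asn in asn_cg:
            # Keep only Lys/Asn pairs close enough to be covalently bonded.
            if math.dist(lys[3], asn[3]) > BOND_CUTOFF:
                continue
            # Collect acidic residues with a carboxyl oxygen near the Asn CG.
            near = sorted({acid[1] for acid in acids
                           if math.dist(acid[3], asn[3]) <= ACID_CUTOFF})
            for acid_resnum in near:
                hits.append((lys[1], asn[1], acid_resnum))
    return hits


# Toy coordinates invented for illustration (not taken from any PDB entry).
toy_atoms = [
    ("LYS", 10, "NZ", (0.0, 0.0, 0.0)),
    ("ASN", 50, "CG", (1.3, 0.0, 0.0)),
    ("GLU", 30, "OE1", (1.3, 3.0, 0.0)),
    ("LYS", 70, "NZ", (20.0, 0.0, 0.0)),   # too far from any Asn
]
print(find_isopeptide_candidates(toy_atoms))  # [(10, 50, 30)]
```

In practice the atom records would be parsed from coordinate files, and any hits would need to be inspected against the electron density before a bond is assigned.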

The isopeptide bonds we have found in GAS pili and other Gram-positive adhesins provide a striking parallel with the disulfide bridges found in Gram-negative pilins and adhesins, which are important for pilus assembly and substrate binding. We hypothesize that in Gram-positive organisms, which lack the disulfide bond formation machinery of Gram-negative bacteria, intramolecular isopeptide bonds may provide an alternative mode of stabilization for cell surface proteins involved in host pathogenesis. We propose a model for the assembly of S. pyogenes pili in which self-generated intramolecular isopeptide bonds complement the sortase-catalyzed intermolecular bonds. The long, thin GAS pili are only ~2 nm (one molecule) thick but typically >100 molecules long, and we infer that these bonds play a critical role in maintaining pilus integrity in the face of severe mechanical and chemical stress while bound to host cells. GAS pili show considerable antigenic variation, indicating an important role in virulence, and the pilin subunits are T antigens that are used for serotyping. The presence of several conserved regions on a highly variable background suggests that the structure could help provide an effective pilus-based vaccine against GAS.
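The figure of >100 molecules per pilus can be related to the 98 Å monomer length by simple arithmetic. The sketch below assumes that each subunit extends the pilus by its full length, ignoring any overlap at the head-to-tail interfaces, so it gives an upper estimate of the length per subunit. The function name `pilus_length_um` is our own.

```python
MONOMER_LENGTH_A = 98.0   # Spy0128 monomer length in angstroms, from the text
N_SUBUNITS = 100          # "typically >100 molecules long", from the text
ANGSTROM_PER_UM = 10_000.0


def pilus_length_um(n_subunits, rise_per_subunit_a=MONOMER_LENGTH_A):
    """Pilus length in micrometres for a single-file column of subunits.

    Assumes each subunit contributes its full length (no overlap).
    """
    return n_subunits * rise_per_subunit_a / ANGSTROM_PER_UM


print(f"{pilus_length_um(N_SUBUNITS):.2f} um")  # 0.98 um
```

A column of 100 subunits is therefore on the order of a micrometre long while remaining only one molecule thick.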