Remarkable Indications-How the Glycomics is Coded in Cells and Tissues

N-glycosylation is presumed to encode crucial information about a protein's structure and function, its age and its localization. Although N-glycosylation is often described as a posttranslational modification, recent studies show that it occurs cotranslationally. During translation, the ribosome is attached to a protein translocation channel in the endoplasmic reticulum membrane and the nascent polypeptide is threaded through the channel; oligosaccharyltransferase, adjacent to the channel, catalyses the attachment of the oligosaccharide to the polypeptide chain before the protein acquires secondary or tertiary structure [1]. This commentary highlights the connection between this process of glycoprotein folding and glycan–protein interactions on cell surfaces. It seems that both processes use special amino acid triplets (glycocodons) to program glycomics events, and that this coding is related to the genetic code.


Jozef Nahalka* Institute of Chemistry, Centre for Glycomics, Slovak Academy of Sciences, Dúbravská cesta 9, SK-84538 Bratislava, Slovak Republic; Institute of Chemistry, Centre of Excellence for White-Green Biotechnology, Slovak Academy of Sciences, Trieda Andreja Hlinku 2, SK-94976 Nitra, Slovak Republic
There is a widespread belief that "eukaryotic cells form glycoproteins" and "prokaryotic cells form non-glycosylated proteins". In the eukaryotic cell, N-glycosylation is carried out in the endoplasmic reticulum (ER) by oligosaccharyltransferases (OSTs). However, OST homologues are also found in archaea and in bacteria [2,3]. N-glycosylation can therefore be regarded as a phenomenon that unites eukarya, bacteria and archaea, and the conservation of the structure and sequence of OSTs across these three domains of life suggests that similar glycosylation processes take place on the surfaces of bacterial and ER membranes, which is a simple confirmation of the endosymbiotic theory [4]. In other words, glyco-coding is not a eukaryotic phenomenon; it is older and must have originated before the last universal common ancestor. Some researchers even accept the theory that prebiotic peptides, which catalysed the first reactions in the primordial soup, established stereochemical principles that were later adopted as the basis for the coding of biochemical reactions and the formation of cellular life [5]. For example, homochiral L-dipeptides stereospecifically condense glycolaldehyde into D-sugars [6]. Nature preferentially uses L-amino acids and D-sugars. In light of this, the types of peptide bonds that occur in monosaccharide-recognizing proteins (lectins) were recently quantified [7]. This bioinformatics study revealed that the distribution of peptide bond types is quite similar among lectins that recognize the same monosaccharide, and that a typical pair of amino acids can be associated with each particular monosaccharide. The maximal occurrence was detected for the WS pair with galactose (Gal), EW with fucose (Fuc), AY with mannose (Man), FS with N-acetylglucosamine (GlcNAc), MS and DD with N-acetylgalactosamine (GalNAc), and MF with glucose (Glc).
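The kind of pair counting described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure used in [7]: the function name `count_adjacent_pairs`, the choice to count ordered pairs (so that WS and SW are distinct), and the toy sequences are all assumptions made for the example.

```python
from collections import Counter

def count_adjacent_pairs(sequences):
    """Count every ordered pair of adjacent residues (one per peptide bond)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # zip pairs residue i with residue i+1, giving one item per peptide bond
        counts.update(a + b for a, b in zip(seq, seq[1:]))
    return counts

# Toy sequences invented for illustration only (these are not real lectins)
toy_sequences = ["GWSNWSA", "AWSGD"]
pair_counts = count_adjacent_pairs(toy_sequences)

# Show the most frequent peptide bond types in the toy set
print(pair_counts.most_common(3))
```

For a real comparison, the raw counts would presumably need to be normalized by sequence length and by the background amino acid composition before lectin families are compared; that step is left out of this sketch.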
When the search was restricted to the amino acids G, A, V and D, which are the main amino acids detected in carbonaceous meteorites [8] and in Miller's experiments simulating amino acid synthesis on the primitive Earth [9], the pair GA was detected for Gal, DG for Fuc, GD for Man, VG for GlcNAc, VA, VV and DD for GalNAc, and AA for Glc. A comparison of the hydropathic similarity of the detected amino acids, together with a comparison with the genetic codon table, revealed that the modern amino acid pairs related to monosaccharides probably evolved from pairs established from the prebiotic GAVD amino acid pool. It is also known that sugar hydroxyl groups are well suited for directional acceptor/donor hydrogen (H) bonds (or for the coordination of Ca2+ ions), and that polar amino acids such as D, E, Q, N and R are well suited for cooperative H-bonding [10]. This means that at least one polar amino acid must be added to a detected amino acid pair to obtain a basic peptide (glycocodon) that "recognizes" a monosaccharide in three dimensions. As a result, the glycocodon AAR can be recognized for Glc, GWN for Gal, QGD for Man, and so on [7]. Human Galectin-3 in complex with LacNAc can serve as an example of how the deduced glycocodons are used for Gal reading (Figure 1). The crystallized carbohydrate recognition domain (CRD) shows that a GW amino acid pair, inserted among polar amino acids, is used for Gal reading. It appears that GW detects the hydrophobic patch over the C3-C6 positions of galactose, while the polar amino acids are responsible for detecting the sugar hydroxyl groups. Scientists are now beginning to understand how distant protein regions, which often do not adopt a well-defined structure, are involved in dynamic readout mechanisms [11]. The N-terminal non-CRD region of Galectin-3 was found to be biologically important: intact Gal-3 has a 3.8-fold higher affinity than the crystallized CRD [12].
In Figure 1, the sequences composed of potential glycocodons in the non-CRD region are marked (Y is considered polar: it shares the same second base in its DNA codons with polar amino acids, and its biomolecular behavior in the context of the polypeptide chain is probably polar). It seems that the non-CRD segment modulates the conformational preference and flexibility of the CRD through triplets (glycocodons) distributed along the non-CRD region.
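The second-base argument for treating Y as polar can be checked directly against the standard genetic code. The short sketch below lists the codons that have A in the middle position; the table is the standard code in the DNA alphabet, while the dictionary and function names are hypothetical and chosen only for this example.

```python
# Codons of the standard genetic code with A as the second base (DNA alphabet).
# "*" marks the two stop codons in this column of the codon table.
SECOND_BASE_A = {
    "TAT": "Y", "TAC": "Y", "TAA": "*", "TAG": "*",
    "CAT": "H", "CAC": "H", "CAA": "Q", "CAG": "Q",
    "AAT": "N", "AAC": "N", "AAA": "K", "AAG": "K",
    "GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
}

def residues_with_second_base_a():
    """Return the amino acids (one-letter codes) whose codons have A in the middle."""
    return sorted({aa for aa in SECOND_BASE_A.values() if aa != "*"})

print(residues_with_second_base_a())
```

Tyrosine shares this column with H, Q, N, K, D and E, all of which are polar or charged residues, which is the grouping the text relies on.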
Lectins, adhesins and other proteins exposed on cell membrane surfaces are responsible for glycomics coding in cell-cell interactions, whereas at the level of glycoprotein synthesis OSTs are mainly responsible for implementing the influence of particular glycan sequences on the formation of protein structure. The first crystal structure of a complete OST, PglB from Campylobacter lari (PDB 3RCE), showed that the conserved WWD triplet defines the substrate specificity but is not directly involved in catalysis [3]. In the case of OSTs, a similar glycocodon system seems to be involved in coding the glycoprogram. W463-W464-D465 is hydropathically close to AAD and GWD, the glycocodons identified for glucose and galactose; the following sequence Y466-G467-Y468 is close to the mannose and fucose glycocodons; and the sequence E457-D458-Y459-V460-V461-A462, which precedes W463-W464-D465, can be divided into the glycocodons E457-D458-Y459 and Y459-V460-V461 and the V461-A462 peptide bond specific for GalNAc (Y is considered a "polar amino acid" according to the second base in its DNA codons). OSTs transfer isoprenoid-pyrophosphate-linked oligosaccharides to the asparagine side chain within specific sequence motifs (sequons) of proteins. According to the glycocodon theory [7], there should be a correlation between the sequon and the glycocodon for the monosaccharide of the transferred oligosaccharide that is directly attached to the sequon's asparagine. The sequon for eukaryotic N-glycosylation is either N-X-S or N-X-T, where X is any amino acid except proline, S denotes serine and T denotes threonine. It seems that N-X-S matches the NFS glycocodon deduced for N-acetylglucosamine and, interestingly, adding F to the N-X-T motif doubles the number of proteins that can be stabilized by glycosylation without having to alter the native reverse turn type [13].
For bacterial PglB, DQNATF has been recognized as an optimal acceptor sequence [14]; DQN can be a GalNAc glycocodon, and TF is close to the FS/SF pair identified for GlcNAc.
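The eukaryotic sequon rule stated above (N-X-S or N-X-T, with X being any residue except proline) translates naturally into a regular expression. The following Python sketch is a minimal illustration; the function name `find_sequons` and the second test sequence are assumptions made for the example, and the scan says nothing about whether a site is actually glycosylated in vivo.

```python
import re

# Eukaryotic N-glycosylation sequon: N-X-S or N-X-T, X is any residue except proline.
# The zero-width lookahead lets overlapping sequons (e.g. in "NNSS") be reported separately.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (1-based position of the asparagine, matched triplet) for each sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]

# Acceptor peptide quoted in the text for bacterial PglB
print(find_sequons("DQNATF"))

# Hypothetical sequence: N-P-S is skipped because X is proline, N-F-S is reported
print(find_sequons("MNPSANFSG"))
```

The acceptor peptide DQNATF contains the triplet NAT, so it also satisfies the eukaryotic N-X-T pattern.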
Theories about the origin of the genetic code can be divided into RNA world theories, protein world theories, co-evolution theories and stereochemical theories. All of these take different approaches with the aim of finding "the first word", the most ancient codons. Recent results confirm older proposals that a system of four codons ("gnc", n = a, g, c, u) and four amino acids (G, A, V, D) could have been the original genetic code [7,15]. There is a connection here: present-day glycocodons can be simplified to triplets composed of the GAVD amino acids. This indicates that prebiotic [GAVD]-peptides established "specificity" for each monosaccharide and that, during evolution, [GAVD]-glycocodons were likely transformed into novel glycocodons by positive selection for the increased diversity and functionality of the "sugar-protein language" that a larger amino acid alphabet makes possible. Nevertheless, the evolutionary process preserves hydropathic similarity: amino acids in glycocodons are substituted by amino acids with similar polar properties, which minimizes errors in established sugar-protein interactions [7]. In conclusion, these observations suggest that the origin of the glycocode is related to the origin of the genetic code.
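The four-codon "gnc" system mentioned above can be written out explicitly. In the standard genetic code, GGC, GCC, GUC and GAC encode G, A, V and D respectively, and the sketch below simply enumerates and applies that mapping. The helper names `gnc_codons` and `translate_gnc` are hypothetical, and the reduced translator is an illustration of the proposal rather than a model taken from [7,15].

```python
# Standard genetic code entries for the four "gnc" codons (RNA alphabet).
GNC_CODE = {"GGC": "G", "GCC": "A", "GUC": "V", "GAC": "D"}

def gnc_codons():
    """Enumerate the codons g-n-c for n in a, g, c, u with their amino acids."""
    return {f"G{n}C": GNC_CODE[f"G{n}C"] for n in "AGCU"}

def translate_gnc(rna):
    """Translate an in-frame RNA string using only the four-codon code.

    Raises KeyError for any codon outside the gnc set; trailing bases that
    do not fill a complete codon are ignored.
    """
    rna = rna.upper()
    usable = len(rna) - len(rna) % 3
    return "".join(GNC_CODE[rna[i:i + 3]] for i in range(0, usable, 3))

print(gnc_codons())
print(translate_gnc("GGCGCCGUCGAC"))
```

Every peptide this reduced code can produce is built from the GAVD alphabet, which is the pool from which the text proposes the earliest glycocodons were drawn.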
The author appreciates the support from the project: Centre of excellence for white-green biotechnology, ITMS 26220120054, supported by the Research & Development Operational Programme funded by the ERDF.