Department of Heteroorganic Chemistry, Sienkiewicza 112, 90-363 Lódz, Poland
Received Date: October 06, 2014; Accepted Date: October 29, 2014; Published Date: November 5, 2014
Citation: Blaszczyk J (2014) Handling of Selenomethionines in Macromolecular Refinement. Biochem Anal Biochem 3:155. doi:10.4172/2161-1009.1000155
Copyright: © 2014 Blaszczyk J, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In the Protein Data Bank, we can find X-ray structures that were determined from selenium-derivatized proteins but whose deposited coordinates, to our surprise, contain native methionines. The problem is likely due to incomplete substitution of native methionines with selenomethionines during the chemical reaction. When a crystal grows from such a sample and is later subjected to X-ray analysis, it may show partial presence of native sulfur. This paper is a voice in the discussion of how to handle the refinement of selenomethionines. The purpose of this work is to explain why selenomethionines can’t be refined purely as native methionines, particularly in macromolecular structures determined at high resolution. This matters because selenium and sulfur are different chemical elements. To explain the importance of refining the correct atom type, the author provides evidence collected mainly from his previously reported small organic structures containing sulfur or selenium. The author emphasizes that X-ray structures that contain selenium are never identical to their sulfur-containing analogs. They differ at least in the respective bond lengths. Two publicly available X-ray structures, one small molecule and one macromolecule, have been re-refined with the atom type intentionally changed. Refinement of a chemically incorrect atom type yielded unnatural behavior of the atomic displacement parameters. The author concludes that structures which contain incorrect atom types can’t represent the experimental data but can serve only as models. There are examples in the PDB which show that selenomethionines in not fully Se-Met derivatized proteins can be refined with arbitrarily assumed fractional (less than 100%) occupancies. This inspired the author to perform various re-refinements of the above-mentioned Se-Met structure that contained S atoms in its coordinates. In these re-refinements, selenomethionines are handled in different ways, and the results are compared.
The author suggests, as a method, refinement of both chemical identities, selenium and sulfur, with fractional occupancies.
Methionine; Selenomethionine; Sulfur; Selenium; Chemical identity; X-ray crystal structure; Theoretical model; Macromolecular crystallography; Small molecule crystallography; Structure solution and refinement.
PDB: Protein Data Bank; JCSG: Joint Center for Structural Genomics; S-Met: Native Methionine; Se-Met: Selenomethionine; S-Cys: Native Cysteine; Se-Cys: Selenocysteine; B-factor: Atomic Displacement Parameter
The worldwide Protein Data Bank (wwPDB) is an archive of macromolecular structures that have been determined by experimental methods such as X-ray, NMR, electron microscopy, or hybrid methods [1,2]. Since 2006, the Protein Data Bank has not accepted theoretical models. The models that were deposited before 2006 still exist in the PDB, but they are separated from the main archive. In May 2014, the total number of depositions reached 100000, and it continues to grow.
Advances in macromolecular crystallography methodology have significantly increased the number of structures determined at atomic resolution. The beauty of atomic resolution is that various structural details may be built and refined with much higher confidence. This includes the clear appearance, in electron density, of multiple conformations of amino acid side chains or entire chain fragments. Another advantage is that high resolution allows the chemical identity of the terminal side chain atoms to be determined (for example, N versus O atoms in Asn or Gln, or C versus N atoms in His residues). In high-resolution structures, the atom type in these side chains can be determined in a manner similar to that used in small-molecule crystallography, i.e., by simple inspection of the atomic displacement parameters (B-factors) and interatomic distances, without time-consuming and uncertain analyses of the entire systems of intra- or intermolecular contacts. Determination of the atom type by the small-molecule approach has been accomplished, for example, in the PDB structures 1EQO, 1F9H, 1F9Y, 1HQ2, and 2O90 [5,6].
However, even as the methods of crystallization, data collection, computing, and visualization become more and more advanced, the refinement of protein structures remains a challenge. The difficulties include, for example, handling the refinement of flexible fragments of the molecule. This happens when side chains, loops, or entire domains lie at the surface of the protein and are solvent-exposed. Such fragments are frequently omitted from the final coordinate file, even if the refinement is performed at atomic resolution. For example, in the 1.07 Å PDB entry 2O90, seven C-terminal residues are missing from the final structure of dihydroneopterin aldolase due to lack of electron density. Interestingly, the entire polypeptides of this protein are completely visible in two other PDB entries, 2NM2 and 2NM3, even though these structures were determined at lower resolution, 1.50 and 1.68 Å, respectively.
Other refinement problems may arise during crystallization. Protein crystals usually grow from a ‘soup’ that contains many different components such as salts, additives, metal ions, etc. Some extra entities can also bind to the protein earlier, during protein preparation steps such as purification. Very frequently, such extra entities are clearly visible in the electron density but are very difficult to interpret. An inspection of the X-ray crystal structures deposited in the PDB reveals many coordinate files that contain such elements. Unidentified elements are labeled in PDB entries in three ways: as UNX, UNL, and UNK. UNX is an abbreviation for unknown atom or ion, and UNL stands for unknown ligand. The UNX or UNL elements are present, for example, in PDB entries 4IGV, 4IAJ, or 3S8S. UNK is an abbreviation for unknown amino acid residue. For example, in PDB entries 4I79 or 4GS0, the authors were able to identify only a fragment of the polypeptide, and they are uncertain of the sequence registration.
The next issue is how to handle, during refinement and deposition, protein residues that are only partially visible in the experimental electron density. Because of software limitations, crystallographers usually can’t refine amino acid residues with missing atoms. Instead, they use a trick: the visible fragment of the residue is renamed to a residue that is complete in that shape. For example, residues with longer side chains, such as lysine, glutamine, cysteine, methionine, etc., for which the authors see density only up to the Cβ atom, are usually renamed to alanines and further refined as alanines.
The PDB policies and deposition procedures do not allow depositions in which the amino acid sequence information differs from the atomic coordinates. Therefore, the longer residues that were not completely visible in density and are reported as alanines have their original names restored during the PDB deposition and annotation process, and the information about missing atoms is included in the text of the processed entry. Remark 470 in the PDB entry 4NZK is an example of such a deposition.
In the PDB, there are many entries (7839 entries available in September 2014) that contain atomic coordinates obtained from refinement against data collected from selenium-containing protein. The main purpose of protein derivatization by substitution of native methionines (S-Met) with relatively ‘heavier’ selenomethionines (Se-Met; atomic number 34 for Se versus 16 for S) is to facilitate the solution of the phase problem. Selenium can be used for phase determination in the multiwavelength anomalous diffraction (MAD) method of crystal structure solution. Incorporation of selenium is a way of producing modified proteins without the structural disturbances that are commonly associated with heavy-atom incorporation. It removes the need for time-consuming and challenging screening for heavy-atom derivatives.
Another amino acid residue that contains selenium is selenocysteine (Se-Cys). Selenocysteines are naturally present in selenoproteins. Protein derivatization by replacement of cysteines with selenocysteines, for the purpose of structure solution by X-ray crystallography, is not as common an approach as the replacement of S-Met with Se-Met. In September 2014, there were only 34 entries available in the PDB that contained Se-Cys residues.
A detailed inspection of the structural information in the PDB archive reveals entries that still (despite the PDB deposition policies) have serious inconsistencies in chemical identity between the sequence information provided by the authors and the coordinates. An example of such an inconsistency is the PDB entry 3B40, which contains the following statement: ‘SeMet modeled as Met’ (see remark 3, other refinement remarks, in the PDB file, or refine.details in the cif file). In other words, the PDB entry 3B40 is an X-ray structure of a selenomethionine-derivatized protein. Surprisingly, no Se-Met residues are present in the sequence information or in the coordinate section. Instead, the sequence and coordinates contain S-Met residues.
This report is an attempt to answer whether an X-ray structure that contains a different atom type in the coordinates (S-Met) than in the crystal (Se-Met) can still be considered a true structure, or only a representative model of the investigated macromolecule.
It is widely reported that the replacement of methionines with selenomethionines usually does not affect protein properties such as folding [15,19,20]. Therefore, native, S-Met containing proteins are frequently derivatized with selenium to facilitate the structure solution process. After successful determination of the structure of the Se-derivatized protein, the obtained Se-Met containing coordinate set is then used as a starting model for structure solution and refinement against the native, S-Met, data. This approach has been used, for example, in the X-ray structure determination of 6-hydroxymethyl-7,8-dihydropterin pyrophosphokinase or the ESCRT-II eap-45 Glue domain.
Even if the main purpose of using the Se-Met data is only to yield a starting model for solving the native (S-Met) structure, we can find many cases in which both Se-Met and native data are deposited. The reason is that the authors found the native and Se-Met structures to be non-isomorphous, with some interesting differences in the three-dimensional shape (conformation). Examples of such a situation include both the native and Se-derivative structures reported for orotidine-5’-monophosphate decarboxylase, the disulfide-bond protein dsbG, the RNA-dependent RNA polymerase QDE-1, or phosphoinositide phosphatase MTMR2.
It also happens that the Se-derivatized protein gives much better diffracting crystals than the native protein, or that the structure of the native protein can’t be solved by using the Se-Met structure as a starting model. In such a situation, the Se-Met structure, which is determined from the Se-Met data, becomes the only structure to report. This took place, for example, in the structure determination of alpha-isopropylmalate synthase, XLF/Cernunnos protein, chorismate synthase, Rho-associated kinase coiled-coil domain, or bacterioferritin A.
Because of the difference in chemical identity, the Se-Met structure (which is determined from the Se-Met derivatized protein) can’t simply represent the structure of the native protein, and vice versa, the S-Met structure determined from the native S-Met data can’t represent the structure of the Se-Met derivatized protein. Each can serve only as a model of the other.
Crystallographers very frequently use models to solve new structures by the molecular replacement method. The starting models used in molecular replacement usually contain chain fragments that have only a similar fold to that of the investigated protein, with many differences in the amino acid sequence. In many cases, the starting models represent only fragments of the newly investigated structures. Therefore, starting models at the stage right after successful rotation and translation, without any refinement, can’t represent the final structures. At the least, the missing fragments need to be built, the sequence in the rotated models has to be corrected, and some side chains or larger fragments of the chain need to be repositioned to fit the experimental electron density.
Like a molecular replacement model, the Se-model that is used to solve the native, S-containing protein (and vice versa) needs to be refined afterwards. The atoms with a different chemical identity have to be corrected first.
One does not need to be an experienced chemist to say that selenium- and sulfur-containing compounds are, from the chemical point of view, not the same. It is as simple as saying that brass is not gold, even if both shine yellow. There are many examples that could explain why selenium-containing compounds are chemically not the same as sulfur-containing compounds. Leaving aside the differences in physical, biological, physiological, and other properties, sulfur-containing compounds and their selenium-containing analogs, even if they are isostructural, i.e., their three-dimensional structures share the same shape (conformation), most likely will not show the same chemical features such as reactivity [32-34].
Small-molecule crystallography is much more powerful than macromolecular crystallography in judging whether the refined atom type is correct. The very high resolution of the majority of determined small-molecule structures allows the correct atom type to be figured out simply by analysis of interatomic distances and atomic displacement parameters.
Table 1 contains a collection of interatomic distances that involve S or Se atoms, and shows that the distances involving Se atoms are much longer than the distances involving S atoms. All examples provided in Table 1 have been selected from the author’s previous work on X-ray structure determination of small-molecule organic compounds such as phosphoroorganic disulfides and diselenides [35-41], selenides, dithianes, dithiolanes, oxathiaphospholanes, dithiaphospholanes and oxaselenaphospholanes [43-53], phosphoramidothioates, phosphoramidoselenoates, sulfoxides [55-59] and thiophosphoryl compounds. For a full listing of compounds 1-42 included in that comparison, please see the supplementary material.
Table 1: Interatomic distances, which involve sulfur and selenium atoms, in the X-ray crystal structures of selected organic compounds [35-60].
The values listed in row ‘1-42’ are the average bond lengths calculated from the respective values for all compounds 1-42. The detailed bond length listing for compounds 4-42 is provided in the supplementary material.
1a: bis((2,3,4,6)-tetra-O-acetyl-beta-D-glucopyranosyl) disulfide 
1b: bis((2,3,4,6)-tetra-O-acetyl-beta-D-glucopyranosyl) diselenide 
2: bis((2,3,4,6)-tetra-O-acetyl-beta-D-glucopyranosyl) diselenide, crystal form 2 
3a: bis(5,5-dimethyl-2-thioxo-1,3,2-dioxaphosphorinan-2-yl) disulfide 
3b: bis(5,5-dimethyl-2-thioxo-1,3,2-dioxaphosphorinan-2-yl) diselenide 
The crystal structures of bis((2,3,4,6)-tetra-O-acetyl-beta-D-glucopyranosyl) disulfide 1a and diselenide 1b are not identical (Table 1), even though 1a and 1b are crystallographically isomorphous. Both molecules 1a and 1b share the same crystallographic symmetry, space group, and three-dimensional shape (conformation). However, the bond lengths that involve selenium are much longer than the respective bonds that involve sulfur (Table 1). Moreover, the unit-cell dimensions a, b, c of diselenide 1b are all longer than the unit-cell constants of the respective disulfide 1a.
Bis(5,5-dimethyl-2-thioxo-1,3,2-dioxaphosphorinan-2-yl) disulfide 3a and diselenide 3b are examples of analogs that are totally non-isomorphous. They do not share even such crystallographic features as crystal symmetry or space group. The overall three-dimensional shape (conformation) of the two molecules is also quite different. As in the previous observation for compounds 1a and 1b, the bond lengths that involve selenium are much longer than the respective bonds that involve sulfur (Table 1).
The bonds that involve selenium atoms are longer than the bonds that involve sulfur atoms. This phenomenon is very clear and consistent in all compared compounds 1-42 in Table 1 [35-60], despite the differences in chemical structure, and despite the fact that the S or Se atoms in some of these compounds are involved in exo- and/or endocyclic connections. For details, please see the Table in the supplementary material.
The average length of the single P-Se bond, calculated for the compounds 1-42 (Supplementary material), is 2.244 Å (Table 1) and is much longer than the average length of the single P-S bond (2.087 Å, Table 1). Similarly, the average length of the double P=Se bond (2.090 Å) is much longer than that of the respective P=S bond (1.917 Å, Table 1).
The average length of the C-Se bond, calculated from the values for the compounds 1-42 (Table 1 and Supplementary material), is 1.949 Å, and is much longer than the average length of the C-S bond (1.808 Å). Similarly, the average length of the diselenide bond (Se-Se; 2.331 Å) is also much longer than that of the respective disulfide bond (S-S; 2.057 Å; Table 1).
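The size of each elongation follows directly from the averages quoted above. The short Python sketch below only subtracts those published values; the dictionary layout and the function name are illustrative choices, not part of the original work.

```python
# Average bond lengths (in angstroms) quoted in the text from Table 1.
AVERAGE_BOND_LENGTHS = {
    "P-X single": {"S": 2.087, "Se": 2.244},
    "P=X double": {"S": 1.917, "Se": 2.090},
    "C-X": {"S": 1.808, "Se": 1.949},
    "X-X": {"S": 2.057, "Se": 2.331},
}


def elongation(bond):
    """Return how much longer (in angstroms) the Se bond is than its S analog."""
    pair = AVERAGE_BOND_LENGTHS[bond]
    return round(pair["Se"] - pair["S"], 3)


for bond in AVERAGE_BOND_LENGTHS:
    print(f"{bond:12s} Se minus S = {elongation(bond):.3f} A")
```

With these averages, the single bonds to carbon and phosphorus lengthen by roughly 0.14 to 0.17 Å, while the chalcogen-chalcogen bond, which contains two heavier atoms, lengthens by about 0.27 Å.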
From the above analysis of bond lengths involving Se and S atoms in small molecules 1-42, it is easy to conclude that the three-dimensional structures of S- and Se-containing compounds that differ only in chemical identity, S versus Se, will never align well. Thus, the selenomethionine side chain, Cα-Cβ-Cγ-Seδ-Cε, will align well with the respective methionine chain, Cα-Cβ-Cγ-Sδ-Cε, only up to the Cγ atom. The average linear difference between the γ and ε positions in such an alignment is about 0.29 Å. Similarly, the side chains of selenocysteine, Cα-Cβ-Seγ, will not align well with native cysteines, Cα-Cβ-Sγ. At atom position γ, the average difference in length is about 0.15 Å. If the cysteines in the native protein form disulfide bridges, Cα1-Cβ1-Sγ1-Sγ2-Cβ2-Cα2, the alignment of the entire disulfide bridges with the respective diselenide bridges in the Se-derivatized proteins, Cα1-Cβ1-Seγ1-Seγ2-Cβ2-Cα2, will be much worse than the alignment of single (free) cysteines with free selenocysteines. The alignment of a disulfide bridge with a diselenide bridge, which starts from the Cα1 atom, will end already at the Cβ1 atom, and will give a linear difference between the distant β1 and α2 positions of about 0.57 Å.
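One rough way to see where differences of this size come from is to add up the Se-minus-S elongations of the bonds crossed along each chain, using the average bond lengths quoted earlier in the text. The sketch below is an assumption-laden estimate: it treats the chain as a straight line and ignores bond angles, and it may not be how the author obtained the quoted values. The function and variable names are illustrative.

```python
# Se-minus-S elongations (angstroms) from the average bond lengths in the text.
C_X = 1.949 - 1.808  # C-Se versus C-S
X_X = 2.331 - 2.057  # Se-Se versus S-S


def accumulated_offset(bond_differences):
    """Sum the elongations of all bonds crossed (straight-line estimate)."""
    return round(sum(bond_differences), 3)


# Methionine: C(gamma)-X(delta) and X(delta)-C(epsilon) both lengthen.
met_offset = accumulated_offset([C_X, C_X])
# Cysteine: only C(beta)-X(gamma) lengthens.
cys_offset = accumulated_offset([C_X])
# Bridge: C(beta)1-X1, X1-X2 and X2-C(beta)2 all lengthen.
bridge_offset = accumulated_offset([C_X, X_X, C_X])

print(met_offset, cys_offset, bridge_offset)
```

Under these assumptions the estimate gives about 0.28 Å for the methionine side chain, 0.14 Å for cysteine, and 0.56 Å for the bridge, close to the values of about 0.29, 0.15, and 0.57 Å given in the text.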
The evidence provided above is, hopefully, a convincing argument that selenomethionines can’t be refined in a macromolecular structure as native methionines.
A more proper way of handling the Se-Met residues in macromolecular refinement has been established, as a research protocol, by the Joint Center for Structural Genomics (JCSG). This protocol employs some features of small-molecule refinement. In each entry deposited by this Center that contains Se-Met, we can find the following information: ‘A met-inhibition protocol was used for selenomethionine incorporation during protein expression. The occupancy of the SE atoms in the MSE residues was reduced to 0.75 for the reduced scattering power due to partial S-Met incorporation’. For example, see that information (in remark 3 in the pdb file, or in refine.details in the cif file) in the PDB entries 4Q6K, 4R7S, 2OGI, or 2OU5, or in any other Se-Met entry deposited by the JCSG. In September 2014, there were a total of 1334 such JCSG depositions in the PDB archive. However, in these entries, the authors do not mention anything about the handling of the remaining 25 percent of the atom type in these residues.
Refinement of incorrect atom type, Se instead of S, in small molecule
Crystallographic refinement of small molecules, which is usually performed at high resolution, is very powerful in judging whether the atom type in the refined structure is correct or wrong. As an example, in this work the author re-refines one recently reported structure, that of 4-isothiocyanato-1-butyl 2’,2’,2’-trifluoroethyl sulfoxide. For a view of the molecule, see Figure 1A. For bond lengths, see the supplementary material (compound 36). For the purpose of this work, an incorrect atom type has been intentionally assumed for the two original sulfur atoms (Figure 1B). The two sulfurs, S1 and S2, have been renamed to Se1 and Se2, respectively, and this model has been refined with the software available in the author’s facility [65,66].
The refinement performed with the assumed wrong atom type (Se instead of S) yielded unreasonably enlarged thermal ellipsoids for these atoms and a significant increase in the crystallographic R-factor. For the values of the refined equivalent isotropic atomic displacement parameters (B-factors), please see Figure 1. The R-factor of the refined ‘wrong’ structure (Se instead of S, Figure 1B) is dramatically higher, R=0.201 [this work], than the R-factor of the ‘correct’ structure (with S atoms, Figure 1A), which is R=0.016. While the B-factors of sulfurs S1 and S2 were comparable to the B-factors of atoms in the closest vicinity (Figure 1A), the B-factors of the wrongly assumed selenium atoms Se1 and Se2 are unreasonably increased (Figure 1B) [this work]. The incorrect Se1 and Se2 atom types are crossed out in Figure 1B to indicate their wrong chemical identity.
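For readers less familiar with the statistic, the conventional crystallographic R-factor is the sum of the absolute differences between observed and calculated structure-factor amplitudes divided by the sum of the observed amplitudes. The sketch below implements that definition on hypothetical toy amplitudes; the numbers are invented for illustration, are not data from this work, and are assumed to be on a common scale already.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical toy amplitudes, invented for illustration only.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc_close = [98.0, 51.0, 25.0, 24.0]  # model that agrees well
f_calc_far = [80.0, 60.0, 30.0, 20.0]    # model that agrees poorly

print(r_factor(f_obs, f_calc_close))
print(r_factor(f_obs, f_calc_far))
```

A model whose calculated amplitudes track the observations gives a small R, while a model with systematically wrong scattering power, such as one with a mis-assigned atom type, gives a much larger one.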
Figure 1: (A) The X-ray crystal structure of 4-isothiocyanato-1-butyl 2’,2’,2’-trifluoroethyl sulfoxide (compound number 36 in the supplementary material). The values near the atom names are the equivalent isotropic atomic displacement parameters (B-factors) refined for these atoms. (B) Re-refined model of the same structure with assumed incorrect atom types (Se instead of S) [this work]. The model shows an artificial increase in the size of the thermal ellipsoids, and increased values of the equivalent isotropic B-factors, due to the wrong atom type assignment. The Se1 and Se2 atoms are crossed out to indicate their wrong chemical identity, and to show that this model was incorrectly refined and can’t represent the real structure.
Handling of two different atom types, Se and S, in refinement of Se-Met proteins
An X-ray structure, which is usually displayed in the form of a set of atomic coordinates, is an average structure of all molecules present in the investigated crystal, i.e., the crystal exposed to X-rays. Therefore, it is very likely that a crystal grown from a buffer containing Se-Met derivatized protein may include molecules in which the Se-derivatization process did not proceed with 100% yield. In consequence, some of the individual molecules in that crystal may still contain native, S-Met, residues.
The entry 3B40, which is publicly available from the Protein Data Bank and is under extensive investigation in this work, is most likely an example of that case. The partial presence of S-Met residues or, in other words, incomplete Se-Met derivatization of the protein, is possible. To investigate this problem in detail, six different re-refinement paths of the entry 3B40 have been carried out.
For the purpose of this work, the coordinates (file 3B40.pdb) and structure factors (file 3b40-sf.cif) have been downloaded from the PDB site and converted to appropriate formats. The downloaded coordinate file (file 3B40.pdb) is an S-Met containing model, and the downloaded structure factor file (file 3b40-sf.cif) contains X-ray intensities collected from the Se-Met crystal.
For the purpose of further comparisons, the downloaded model 3B40 (the coordinates which contain S atoms) has been initially re-refined [65,66]. Four rounds of re-refinement were carried out. The coordinates and isotropic atomic displacement parameters (B-factors) were refined for all atoms. During this refinement, some side chains were repositioned in the electron density, and some new, well visible water molecules were added to the solvent network.
A separate coordinate file, in which all S atoms have been renamed to Se, has been created from the re-refined model. Then, two separate refinement paths, A and B, against the same experimental data set (3b40-sf.cif) have been carried out. In refinement path A, the S-containing coordinates have been used and refined in the same way as the authors originally did in their deposited structure. In refinement path B, the atom type was changed from S to Se. In both refinement paths A and B, the isotropic atomic displacement parameters (B-factors) were initially set to arbitrary values and then refined individually, together with the B-factors of the other atoms. The occupancies of the S atoms in path A and of the Se atoms in path B were fixed at 100% and were not refined. The B-values obtained from both refinement paths A and B are collected in Table 2.
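The renaming step can be pictured on a single coordinate record. The following is a minimal sketch, assuming the fixed-column legacy PDB format, in which selenomethionine is conventionally written as HETATM records of residue MSE with the atom SE in place of SD. It is not the author's software; the function name, the `se_occupancy` parameter, and the example record are hypothetical.

```python
def met_line_to_mse(line, se_occupancy=1.00):
    """Rewrite one fixed-column PDB coordinate record of MET as MSE.

    Records of other residues, and non-coordinate records, are returned
    unchanged. The SD atom becomes SE with the requested occupancy.
    """
    is_coordinate = line.startswith(("ATOM  ", "HETATM"))
    if not is_coordinate or line[17:20] != "MET":
        return line
    line = line.rstrip("\n").ljust(80)
    atom_name = line[12:16]  # columns 13-16
    occupancy = line[54:60]  # columns 55-60
    element = line[76:78]    # columns 77-78
    if atom_name == " SD ":
        atom_name = "SE  "
        element = "SE"
        occupancy = f"{se_occupancy:6.2f}"
    return (
        "HETATM" + line[6:12] + atom_name + line[16] + "MSE"
        + line[20:54] + occupancy + line[60:76] + element + line[78:]
    )


# Hypothetical record for illustration only (not taken from entry 3B40).
example = (
    "ATOM  " + f"{341:5d}" + " " + " SD " + " " + "MET" + " " + "A"
    + f"{43:4d}" + "    " + f"{12.345:8.3f}{6.789:8.3f}{-1.234:8.3f}"
    + f"{1.00:6.2f}{25.10:6.2f}" + " " * 10 + " S" + "  "
)
print(met_line_to_mse(example, se_occupancy=0.75))
```

Leaving `se_occupancy` at 1.00 corresponds to a fully occupied selenium site as in path B, while a value of 0.75 mirrors the fixed occupancy quoted from the JCSG entries.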
|Residue number|Average B side-chain, S excluded|Path A: B-factor, 100% S|Path B: B-factor, 100% Se|Path C: B-factor, 75% Se|Path D: refined occupancy, Se only|Path E: global occupancy, 84% S, 16% Se|Path F: refined occupancy, S/Se|
Table 2: Atomic displacement parameters (B-factors, Å²) and occupancies for the S and Se atoms in re-refinement of the PDB entry 3B40.
Path A: refined B-factors, the presence of 100% sulfur was assumed. Path B: refined B-factors, the presence of 100% selenium was assumed. Path C: refined B-factors, the presence of selenium was assumed with 75% occupancy (JCSG approach). Path D: individually refined occupancies for Se atoms. Path E: refined value of global occupancy ratio of partial presence of S or Se atom type. Path F: individually refined values of occupancy ratio of partial presence of S or Se atom type in the residues. In paths D, E, and F, the B-values were set as equal to average values in the chain (values shown in parenthesis) and not refined.
Table 2 contains the values of the refined isotropic atomic displacement parameters (B-factors) of the atoms that were refined as sulfurs in path A (column ‘Path A’) and as seleniums in path B (column ‘Path B’). The residues that contain these heteroatoms are in the following ten sequence positions: 43, 104, 136, 158, 166, 168, 191, 245, 283, and 333. Refinement path B, in which the heteroatoms have the chemical identity consistent with the experimental data, should yield B-factors comparable to the average B-factors of the entire side chains. In the values collected in column ‘Path B’ in Table 2, it is difficult to find such a regularity. Only for residues 166 and 191, and less evidently for 104 and 136, may the refined B values (in path B) speak for the presence of selenium. For residues 283, 168, and 245, the B-factors that are closer to the average side-chain value are those from the S-atom refinement (path A), which speaks rather for the presence of sulfur in these residues. The B-factors of the remaining residues 43, 158, and 333 likely indicate the partial presence of both sulfur and selenium (in fractional, less than 100%, occupancy). Such a distribution of B-factor values in refinement paths A and B (Table 2) may speak for the partial presence of both sulfur and selenium in the residues originally refined as S-Met by the authors. Moreover, for the ten sequence positions listed in Table 2, the proportion of S versus Se seems to be different in each residue. This indicates the possibility that the investigated protein 3B40 was not fully Se-Met derivatized. Neither refinement path A nor B, as carried out in this work, allowed answering the question of which atom type, Se or S, is correct in the refined structure.
In order to investigate whether sulfur or selenium was present in the ten sequence positions listed in Table 2, four additional refinement paths, C, D, E, and F, have been carried out. Refinement path C followed exactly the refinement protocol established by the JCSG. In the starting coordinate set, all sulfurs were renamed to seleniums, and the occupancies of the selenium atoms were arbitrarily set to 0.75 and not refined. The refinement yielded the values of atomic displacement parameters (B-factors) collected in Table 2 in column ‘Path C’. Comparison of the B-values from refinement path C (75% Se) with the B-values from refinement path B (100% Se) shows that the B-values in path C come closer to the average values for the respective side chains. However, all refined B-values in path C are still too large compared with the respective average side-chain B-values.
It became interesting to find out to which values the individual occupancies of the Se atoms would refine when their B-values are set equal to the average values for the respective side chains. Therefore, in refinement path D, the B-values of the ten selenium atoms were set equal to the average values for the respective side chains and not refined. Instead, the fractional occupancies of these atoms were refined. The obtained values are collected in Table 2 in column ‘Path D’. All refined occupancies are lower than the value of 0.75, which was arbitrarily used in the JCSG approach (path C).
Refinement paths C and D showed that the occupancies of the selenium atoms are lower than 100%. This may indicate that the ten methionines in the native protein were not completely Se-derivatized. However, the question of what happened to the missing 25% of the atom type after refinement path C, and to the missing higher percentages in path D, remains open.
It seemed that the best way to investigate that problem was to employ the small-molecule approach directly. In paths E and F, the heteroatom positions were set to involve both Se and S atom types with initially equal occupancies (50% Se and 50% S). The B-values for these atoms were arbitrarily set equal to the average values of the entire side chains and not refined. In path E, the global occupancy ratio for the Se/S identity was refined (as the same value for all ten residues). In path F, the occupancy ratios for the Se/S identity were refined for each residue individually.
Refinement path E yielded an occupancy ratio for the Se/S identity equal to 16%/84%, respectively. This may indicate that the overall protein derivatization with selenium was only about 16 percent successful. Refinement path F showed that the occupancy ratios were different in each of the ten residues. For details, see Table 2. However, it would go too far to conclude, from the values given in column ‘Path F’ in Table 2, that methionine number 43 underwent Se-derivatization to exactly 13%, methionine 104 to 19%, methionine 136 to 37%, etc. The resolution of the collected data is only 2.0 Å, which is not high enough to draw such precise conclusions. Similarly, the refined value of the global occupancy ratio of the Se/S atom type in path E, and the refined individual occupancies for the Se-‘only’ atoms in path D, can be considered only very approximate.
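The different occupancy models can be compared on a common footing by counting the electrons each places at the heteroatom site, using the atomic numbers 34 for Se and 16 for S. The sketch below is a rough approximation only: it ignores the resolution dependence of the scattering factors, the B-factors, anomalous scattering, and the small positional shift between S and Se. The function names are illustrative.

```python
Z_S = 16   # atomic number of sulfur
Z_SE = 34  # atomic number of selenium


def mixed_site_electrons(se_fraction):
    """Electrons at a site shared by Se (se_fraction) and S (the remainder)."""
    if not 0.0 <= se_fraction <= 1.0:
        raise ValueError("se_fraction must lie between 0 and 1")
    return se_fraction * Z_SE + (1.0 - se_fraction) * Z_S


def equivalent_se_fraction(se_only_occupancy):
    """Se fraction of a fully occupied Se/S site with the same electron
    count as a Se-only site of the given occupancy.

    A negative result means the site holds fewer electrons than pure sulfur.
    """
    return (se_only_occupancy * Z_SE - Z_S) / (Z_SE - Z_S)


# Path E result quoted in the text: 16% Se and 84% S.
print(mixed_site_electrons(0.16))
# Se-only occupancy of 0.75, as in the JCSG approach (path C).
print(equivalent_se_fraction(0.75))
```

In this simplified picture, the 16%/84% mixture of path E corresponds to about 18.9 electrons, only slightly more than pure sulfur, while a selenium atom at 0.75 occupancy carries 25.5 electrons, the same count as a fully occupied site with roughly 53% Se and 47% S.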
Handling of selenomethionines in the crystallographic refinement of selenium-derivatized proteins is not as straightforward as, for example, the refinement of native methionines in native proteins. The problem arises from the frequently incomplete substitution of native methionines with their selenium-containing equivalents during the chemical reaction. Se-Met derivatized protein samples used for crystallization may contain macromolecules in which native methionines are still partially present. As a result, a crystal that grows from such a sample, and is later subjected to X-ray analysis, may partially contain native sulfur.
Selenomethionines can’t be refined purely as native methionines, as was done, for example, by the authors of the PDB-deposited entry 3B40. The reason is simple: selenium and sulfur are different chemical elements. The analysis of bond lengths presented above, which involves Se and S atoms in selected small molecules 1-42 [35-60], shows clearly that the three-dimensional structures of S-containing compounds and their Se-analogs can never be identical, because bonds involving selenium are much longer than the respective bonds involving sulfur (Table 1). Even if the S- and Se-analogs are crystallographically isomorphous, they will never align well. The unit-cell dimensions of isomorphous Se- and S-analogs will also never be identical, since the Se-containing molecules are larger than their S-containing counterparts; see, for example, the crystal structures of compounds 1a and 1b (Table 1). In fact, nothing prevents Se- and S-containing analogs from being structurally non-isomorphous; see, for example, the crystal structures of compounds 3a and 3b (Table 1). Refinement of a small molecule with an intentionally altered atom type (Figure 1) yielded unnatural atomic displacement parameters and very poor refinement statistics. This means that coordinate sets containing atom types different from those in the investigated crystal can serve, at most, only as models, and definitely can’t directly represent the structure of the investigated compound.
If selenomethionines can’t be handled in refinement as 100% methionines, would it be better to handle them as UNX (unknown atom types) or UNK (unknown amino acid residues)? Most likely not, because we already know that the atom type is either Se or S, and that the unknown amino acid is either Se-Met or S-Met.
If, again, the Se atoms can’t be refined as 100% S, would it be better to delete these atoms from the coordinates and refine the remaining residue fragments as alanines? Most likely not either, because the entire side chains are already visible in the electron density, so there is no reason to ‘shorten’ them.
Refining selenomethionines with full (100%) occupancies of the Se atoms in structures of incompletely Se-Met derivatized proteins is also not the best option, since it yields unnaturally enlarged atomic displacement parameters.
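The coupling between occupancy and B-value behind this effect can be illustrated with the standard isotropic Debye-Waller factor, exp(-B/(4d²)), where d is the resolution in Å. The sketch below is an illustration under stated assumptions only: the true Se occupancy of 0.75 and the B-value of 20 Å² are hypothetical inputs, the function names are invented for this example, and the comparison is made for a single atom type at a single resolution, so the atomic form factor cancels.

```python
import math


def damped_weight(occ: float, b_iso: float, d_spacing: float) -> float:
    """Occupancy multiplied by the isotropic Debye-Waller factor at resolution d."""
    return occ * math.exp(-b_iso / (4.0 * d_spacing ** 2))


def compensating_b(occ_true: float, b_true: float,
                   occ_model: float, d_spacing: float) -> float:
    """B-value that lets an atom modeled with occ_model reproduce, at
    resolution d, the damped weight of an atom with occ_true and b_true."""
    return b_true + 4.0 * d_spacing ** 2 * math.log(occ_model / occ_true)


# Hypothetical site: truly 75% occupied with B = 20 A^2, modeled at 100%.
b_at_2A = compensating_b(0.75, 20.0, 1.0, 2.0)
b_at_4A = compensating_b(0.75, 20.0, 1.0, 4.0)

print(f"B needed at d = 2.0 A: {b_at_2A:.1f} A^2")
print(f"B needed at d = 4.0 A: {b_at_4A:.1f} A^2")
```

Within this simplified picture, an overestimated occupancy can only be offset by a larger B-value, and the size of the required shift depends on the resolution shell, so no single B-value compensates exactly over the whole data set.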
An approach that is closer to reality has been suggested by researchers from the Joint Center for Structural Genomics (JCSG), who refine Se-Met residues with the occupancies of the Se atoms reduced to 75%.
In reality, the partial derivatization of native methionines with selenium does not have to proceed with the same yield for each individual methionine in the polypeptide chain. The experiment performed in this work (Table 2, path D) suggests that the occupancies of the Se atoms can be refined individually for each Se-Met residue.
The approach that is commonly employed in small-molecule crystallography, and that can mimic the real situation most closely, is the simultaneous refinement of both Se and S identities with fractional occupancies. This means that the residues of interest are refined partially as selenomethionines and partially as methionines. The experiments performed in this work showed, however, that this method has essential limitations. Macromolecular crystals usually do not diffract to resolutions as high as those of small-molecule crystals, and the limited resolution of macromolecular data restricts the use of the small-molecule approach. Therefore, handling residues partially as Se-Met and partially as S-Met can’t serve as a general approach for the macromolecular refinement of every Se-Met derivatized protein. The handling of selenomethionines in macromolecular refinement depends on the individual case, mainly on the resolution of the collected data. The small-molecule approach is applicable only to macromolecular data collected from very well ordered crystals that diffract to high resolution.
This paper is dedicated to Professor Helen M. Berman on the occasion of her birthday. Financial support (Grant No. DEC-2012/05/B/ST4/00075) by the Polish National Science Center is gratefully acknowledged.