Structure and Catalytic Mechanism of a Glycoside Hydrolase Family-127 β-L-Arabinofuranosidase (HypBA1)

The β-L-arabinofuranosidase from Bifidobacterium longum JCM 1217 (HypBA1), a DUF1680 family member, was recently characterized and classified into glycoside hydrolase family 127 (GH127) by CAZy. HypBA1 exerts exo-glycosidase activity, hydrolyzing β-1,2-linked arabinofuranose disaccharides from the non-reducing end into individual L-arabinoses. In this study, the crystal structures of HypBA1 and its complex with L-arabinose and Zn2+ ion were determined at 2.23-2.78 Å resolution. HypBA1 consists of three domains, denoted the N-, S- and C-domains. The N-domain (residues 1-5 and 434-538) and C-domain (residues 539-658) adopt β-jellyroll architectures, and the S-domain (residues 6-433) adopts an (α/α)6-barrel fold. HypBA1 utilizes the S- and C-domains to form a functional dimer. The complex structure suggests that the catalytic core lies in the S-domain, where Cys417 and Glu322 serve as the nucleophile and general acid/base, respectively, to cleave the glycosidic bonds via a retaining mechanism. The enzyme contains a restricted carbohydrate-binding cleft, which accommodates only short arabino-oligosaccharides. In addition to the complex structure, we obtained a crystal of apo HypBA1 that lacks the Zn2+ ion. In this structure, the Cys417-containing loop is shifted away because all coordinate bonds are lost in the absence of the Zn2+ ion. Cys417 is thus diverted from the attack position, and is probably also protonated, which disables its role as the nucleophile. Therefore, the Zn2+ ion is indeed involved in the catalytic reaction by maintaining the proper configuration of the active site. Thus, the unique catalytic mechanism of GH127 enzymes is now well elucidated.

Very recently, the crystal structures of the ligand-free and L-Araf complex forms of HypBA1 (PDB ID: 3WKW and 3WKX) were determined [12]. Based on structural analyses, biochemical experiments and quantum mechanical calculations, Ito showed that the nucleophile function is likely served by Cys417 rather than a glutamate [12]. In the meantime, we have determined the crystal structures of HypBA1 in native form, in apo form and in complex with its product L-Araf. Here, by analyzing these structures, we elucidate the relationship between the glutamate residues, the catalytic cysteine, the Zn2+ ion and the substrate in HypBA1, and discuss the implications for the retaining mechanism of GH127-family enzymes.

Protein expression, purification, crystallization and data collection
The expression and purification methods employed for the protein have been described before [13]. To obtain phase information by multiple isomorphous replacement (MIR), apo HypBA1 crystals grown in 0.4 M ammonium acetate and 18% w/v polyethylene glycol 3350 [13] were used to prepare heavy-atom derivative crystals with the Heavy Atom Screen Hg kit (Hampton Research). The apo crystals (isomorphous to the native crystal) were soaked with various mercury-containing reagents (final concentration 2 mM) in cryoprotectant solution (0.5 M ammonium acetate, 25% w/v polyethylene glycol 3350 and 5% w/v glycerol) for at least 1 h.
To remove any metal ions from the protein, the purified enzyme (both Se-Met and native protein) was dialyzed twice against 100 mM EDTA (at least 12 h each time) before crystallization. The crystals of Se-Met protein without Zn2+ ion (apo crystals) diffracted X-rays better than the native crystals. The HypBA1-Araf complex crystal was obtained by soaking the native crystal with a cryoprotectant solution containing 50 mM Araf.
The X-ray diffraction datasets were collected at beamlines BL13B1 and BL13C1 of the National Synchrotron Radiation Research Center (NSRRC, Taiwan). The data were processed using HKL2000 [14]. Prior to structural refinement, 5% of the reflections were randomly selected and set aside for calculating Rfree as a monitor [15].
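The free-set procedure described above can be illustrated with a minimal Python sketch. It is not the code used in this work (refinement here relied on CNS); the function names are hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones, which real refinement programs handle with an explicit scale factor.

```python
import random


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Pick a random subset of reflection indices to serve as the free (test) set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    return sorted(rng.sample(range(n_reflections), n_free))


def r_factor(f_obs, f_calc):
    """R = sum(||Fo| - |Fc||) / sum(|Fo|) over the supplied reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, free_indices):
    """Compute R separately on the working set and on the held-out free set."""
    free = set(free_indices)
    work = [i for i in range(len(f_obs)) if i not in free]
    held = sorted(free)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in held], [f_calc[i] for i in held])
    return r_work, r_free
```

Because the free reflections are never used to adjust the model, a gap between the two values that grows during refinement is the usual warning sign of over-fitting.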

Size exclusion chromatography-multi-angle light scattering (SEC/MALS)
The absolute molecular weight of the purified protein was determined by static light scattering (SLS) using a Wyatt Dawn Heleos II multi-angle light scattering detector (Wyatt Technology) coupled to an AKTA Purifier UPC10 FPLC protein purification system with a Superdex 200 10/300 GL size-exclusion column (GE Healthcare). HypBA1 protein at concentrations of 0.75, 1.4, 1.6, 10 and 20 mg/ml was applied to the size-exclusion column in a buffer containing 20 mM HEPES (pH 7.0) and 0.02% NaN3 at a flow rate of 0.5 ml/min. BSA at 1.8 mg/ml was used as a control for system calibration. The absolute molecular weights of individual peaks in the size-exclusion chromatogram were determined from the SLS data in conjunction with the refractive index measurements (Wyatt Optilab rEX, connected downstream of the LS detector). A standard refractive index increment, dn/dc = 0.185 ml/g, was used for proteins, and the buffer viscosity η = 1.0164 cP at 25°C was calculated using SEDNTERP. The reference refractive index, 1.3441 RIU, was taken directly from the Wyatt Optilab rEX measurement with buffer only passing through the reference cell.
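To show how the quoted dn/dc and reference refractive index enter a light-scattering mass estimate, the following Python sketch evaluates the standard optical constant K = 4π²n0²(dn/dc)²/(N_A λ⁴) and the dilute-limit, zero-angle relation M ≈ R(0)/(K c). The laser wavelength of 658 nm is an assumption for illustration only (it is not stated in the text), the function names are hypothetical, and the instrument software applies a fuller angular and concentration analysis than this.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def optical_constant(n0, dn_dc_ml_per_g, wavelength_nm):
    """Optical constant K in mol*cm^2/g^2 for vertically polarized incident light."""
    wavelength_cm = wavelength_nm * 1e-7
    return (4 * math.pi ** 2 * n0 ** 2 * dn_dc_ml_per_g ** 2
            / (AVOGADRO * wavelength_cm ** 4))


def molar_mass(rayleigh_ratio_per_cm, conc_g_per_ml, k):
    """Dilute-limit, zero-angle estimate: M = R(0) / (K * c), in g/mol."""
    return rayleigh_ratio_per_cm / (k * conc_g_per_ml)


# Values from the text: n0 = 1.3441 RIU, dn/dc = 0.185 ml/g.
# ASSUMPTION: 658 nm laser wavelength, chosen for illustration only.
K = optical_constant(n0=1.3441, dn_dc_ml_per_g=0.185, wavelength_nm=658.0)
```

Because K depends on the square of dn/dc, the choice of the refractive index increment directly affects the computed mass.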

Sedimentation velocity
Sedimentation velocity experiments were performed with an XL-A analytical ultracentrifuge (Beckman Coulter, Fullerton, CA) using absorption optics. HypBA1 and control reference buffer (approximately 400 μL) were loaded into 12-mm-thick Epon double-sector centerpieces in an AN60-Ti rotor and spun at 20°C and 45,000 rpm after an initial 90-min temperature equilibration period. Concentrations as a function of radial position and time were detected by optical density measurements at a wavelength of 280 nm, and absorbance profiles were recorded every 3 min. The protein samples were in a buffer containing 20 mM HEPES, pH 7, at a concentration of ca. 6 μM. The buffer density and viscosity were calculated by SEDNTERP [16], and the sorted data were analyzed with the standard c(s) model [17].

Structural determination and refinement
The HypBA1 structure was solved by MIR with anomalous scattering (MIRAS) using one native and two mercury datasets [(C2H5HgO)HPO2 and C(HgOOCCH3)4]. In our previous studies, mercury was found to bind free Cys residues readily, and the phase problem could be solved by MIR using at least two mercury datasets [18][19][20][21]. The mercury-binding sites were located using the AutoSol wizard of PHENIX with a 2.4 Å resolution cutoff [22]. Two mercury-binding sites were identified, and density modification was then performed to improve the phases (figure of merit = 0.33). Automatic initial model building was carried out with AutoBuild of PHENIX, yielding a model that was 94% complete (620 residues with side chains). Subsequent model building and structural refinement were carried out using the programs COOT [23] and CNS [24].
The apo (treated with EDTA; no Zn2+ ion observed) and product-complex (HypBA1-L-Araf) structures were determined by molecular replacement (MR) with PHASER [25], using the refined native structure as a search model. The 2Fo - Fc difference Fourier map showed clear electron densities for most amino acid residues. In subsequent refinements, ligands and water molecules were incorporated according to the map contoured at the 1.0 σ level. The refined structures were validated using RAMPAGE [26]. Data collection and refinement statistics of these crystals are summarized in Table 1 and Supplementary Table S1. All figures were prepared using PyMOL [27].

X-ray fluorescence scan analysis
To identify the metal ion present in the HypBA1 crystals, a fluorescence scan of the protein crystal was performed at beamline BL13B1 of the National Synchrotron Radiation Research Center (NSRRC, Taiwan). The crystal was scanned over an energy range from 12 eV to 25588 eV to detect any metal ions present.
In order to investigate whether L-Araf (the product) could cause a conformational change, the native, apo and complex structures were superimposed, and no obvious conformational change was observed (data not shown). However, a few regions close to the L-Araf-binding cavity were missing in both the native and apo structures (residues 31-50 and 247-250 for native, residues 35-50 and 413-415 for apo), whereas all these loop regions can be clearly seen in the complex structure. Interestingly, the longest loop, consisting of residues 31-50, was found stretching across the surface of the S-domain to cover, or cap, the ligand-binding cavity (Figure 1B). The complex structure suggests that the capping loop undergoes induced fit and is then stabilized by interacting with L-Araf. Accordingly, the capping loop might play an important role in the catalytic reaction by regulating substrate binding, because one of the ligand-binding residues, Gln45, is located on the capping loop; a similar case has been reported [35].
On the other hand, although not supplemented in the crystallization solution, a Zn2+ ion was observed adjacent to the active site as a strong electron density. Its identity was further validated by X-ray fluorescence scan analysis and an anomalous difference Fourier map (Figure 2A). Based on the |Fo| - |Fc| omit map, it is clear that the Zn2+ ion forms a typical tetrahedral complex with four chelating residues, Glu338, Cys340, Cys417 and Cys418, and all the metal-to-ligand distances are about 2.3 Å (Figure 2A). Interestingly, even though no significant conformational change was caused by Zn2+ binding, the two Zn2+-chelating residues Cys417 and Cys418 in the apo structure were shifted away from their original positions (Figure 2B). Therefore, when the Zn2+ ion is absent, only a broken loop can be observed because of the absence of stabilizing interactions (Figure 2C).

Dimerization
Interestingly, although there is only one HypBA1 monomer in the asymmetric unit, HypBA1 forms a dimer with a crystallographic symmetry-related molecule (Figure 3A and 3B). An analysis with PDBePISA shows that the contact interface encompasses 77 residues that bury a total surface area of about 2781 Å2 on the S- and C-domains [36]. The intermolecular forces include hydrogen bonds (not shown) and salt bridges (Figure 3C). To confirm that HypBA1 also forms a dimer in solution, size exclusion chromatography coupled with multi-angle light scattering (SEC/MALS) was conducted. SEC/MALS offers an estimate of the absolute molecular weight in solution based on the angular dependence of the scattered light intensity, which is less dependent on molecular shape. At protein concentrations of 0.75-1.60 mg/ml, the SEC/MALS analysis using a Superdex 200 10/300 GL column indicates a molecular mass of 134.9-138.8 kDa, corresponding to a dimeric form of HypBA1 (Figure 4A). As the elution peak of the SEC/MALS is relatively symmetric, the calculated molecular weight distribution indicates that the sample is monodisperse. Accordingly, our SEC/MALS data suggest that the HypBA1 protein exists as a very stable dimer in solution, both at low protein concentrations and at increased protein concentrations of 10 mg/ml or 20 mg/ml (data not shown).
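As a rough arithmetic check on the dimer assignment, the measured masses can be divided by an estimated monomer mass. The Python sketch below assumes an average residue mass of 110 Da, a common rule of thumb rather than the sequence-derived mass of the recombinant construct, and takes the chain length of 658 residues from the domain boundaries given in this paper; the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # ASSUMPTION: rule-of-thumb average residue mass
N_RESIDUES = 658             # last residue of the C-domain as given in the text


def estimated_monomer_mass_kda(n_residues=N_RESIDUES,
                               residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate monomer mass in kDa from the chain length."""
    return n_residues * residue_mass_da / 1000.0


def subunits(observed_mass_kda, monomer_mass_kda):
    """Ratio of the observed mass to the monomer mass (not rounded)."""
    return observed_mass_kda / monomer_mass_kda


monomer = estimated_monomer_mass_kda()
# SEC/MALS mass range reported in the text: 134.9-138.8 kDa
ratios = [subunits(m, monomer) for m in (134.9, 138.8)]
```

With these assumptions both ends of the SEC/MALS range give a ratio close to two subunits, in line with the dimer seen in the crystal.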
Furthermore, to determine the molecular weight of HypBA1 in solution at an even lower concentration, we performed sedimentation velocity experiments by analytical ultracentrifugation (AUC-SV). At a concentration of 6 μM, the polypeptide was detected as a single species with a sedimentation coefficient of 7.77 S, which corresponds to a molecular mass of 143.0 kDa, the mass of a HypBA1 dimer (Figure 4B and 4C). Taken together, the results from two independent biophysical methods establish the oligomeric state of recombinant HypBA1 in solution. At a much higher protein concentration (ca. 10 mg/ml), a symmetric elution peak was still observed in SEC/MALS, and the corresponding molecular weight also coincided with the dimeric form of HypBA1. These results are consistent with the AUC-SV data, which showed that HypBA1 was dimeric at a low protein concentration and that only a small population of higher oligomers emerged when the protein concentration was increased (Figure 4). The results of both the SEC/MALS and AUC-SV analyses are in good agreement with the crystallographic finding that HypBA1 has a strong propensity to form a stable dimer.
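The relation between the sedimentation coefficient and the molecular mass can be checked for plausibility with the Svedberg relation, s = M(1 - v̄ρ)/(N_A f). The Python sketch below computes the frictional ratio f/f0 implied by the reported values (7.77 S and 143.0 kDa). The partial specific volume, solvent density and viscosity are generic assumptions for a protein in water at 20°C, not the SEDNTERP values used in this work, the reported s-value is used as given, and the function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def frictional_ratio(s_svedberg, mass_g_per_mol,
                     vbar_ml_per_g=0.73,       # ASSUMPTION: typical protein value
                     density_g_per_ml=0.9982,  # ASSUMPTION: water at 20 degrees C
                     viscosity_poise=0.01002): # ASSUMPTION: water at 20 degrees C
    """Frictional ratio f/f0 implied by a sedimentation coefficient and a mass."""
    s = s_svedberg * 1e-13  # 1 S = 1e-13 s
    buoyancy = 1.0 - vbar_ml_per_g * density_g_per_ml
    # Svedberg relation rearranged for the frictional coefficient (g/s)
    f = mass_g_per_mol * buoyancy / (AVOGADRO * s)
    # Radius (cm) of an anhydrous sphere with the same mass and volume
    r0 = (3.0 * mass_g_per_mol * vbar_ml_per_g
          / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    f0 = 6.0 * math.pi * viscosity_poise * r0  # Stokes friction of that sphere
    return f / f0


ratio = frictional_ratio(7.77, 143000.0)
```

Under these assumptions the ratio comes out between 1.2 and 1.3, a range commonly seen for compact globular proteins, so the reported pair of values is at least physically self-consistent with a dimer.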

Ligand binding and substrate modeling
The |Fo| - |Fc| omit map of the bound L-Araf and the anomalous difference Fourier map of the Zn2+ ion are both very clear (Figure 2A). In the observed binding mode of L-Araf, there are ten hydrogen bonds between the sugar unit and eight amino acid residues, namely Gln45, His142, His194, His270, Glu322, Glu338, Tyr386 and Cys415 (Figure 5A), but not Glu366. According to the relative spatial positions of L-Araf and the Zn2+ ion in the complex structure, the Zn2+ ion might not be directly involved in the catalytic reaction because it is located far from the C1 of the product (5.0 Å) (data not shown). Moreover, the configuration of the Zn2+ ion also makes it unlikely to activate a water molecule for catalysis (which would yield an inverted α-sugar), because the Zn2+ ion has formed an almost perfect tetrahedral coordination with Glu338, Cys340, Cys417 and Cys418 (Figure 2A).
By analyzing the HypBA1-L-Araf complex structure, we believe that the bound L-Araf (product) corresponds to the -1 subsite. The space adjacent to the O2 atom is too small to accommodate a sugar residue. To further elucidate the possible substrate binding mode, a two-sugar-unit substrate (Araf-β1,2-Araf; β-Ara2) was manually modeled into the potential subsites from -1 to +1 in the substrate-binding cavity (Figure 5B). The model was subsequently subjected to several cycles of energy minimization with CNS [24]. Interestingly, the simulated model closely fits the size and shape of the calculated cavity map (Figure 5B), which was generated using the web server POCASA (POcket-CAvity Search Application) [37]. This agreement supports the plausibility of the simulated model. In this model, the side chain of Glu322 also binds to the O1 of the +1 sugar, which is in turn bound to Tyr386, and two other residues, Gln44 and Tyr250, can interact with the O5 of the same +1 sugar. Beyond the +1 sugar, there is no room to accommodate additional L-Araf and Hyp units unless the capping loop is opened. How the enzyme binds to Ara2-Hyp and Ara3-Hyp remains to be elucidated.

Proposed catalytic mechanism
As previously mentioned, three potential catalytic residues (Glu322, Glu338 and Glu366) have been proposed. Among them, Glu366 is too far away from the substrate-binding cavity and is unlikely to participate directly in the catalytic reaction (Figure 1A). However, the mutant E366A had 16% activity left in a previous study [11]. Consequently, Glu366 might play a role in structural stability, although not participating directly in the catalytic reaction. By contrast, the residues Glu322 and Glu338 are more reasonable catalytic amino acids due to their proximal locations. In the crystal structure of the HypBA1-L-Araf complex, Glu322 is hydrogen bonded to the O1 atom of the sugar and is in a good position for the general acid/base role. In the absence of L-Araf, Glu322 is probably hydrogen bonded to His270, which is supposed to be protonated. When His270 turns to interact with the sugar, the proton may remain bound to Glu322, making it a good general acid catalyst. On the other side, Glu322 is hydrogen bonded to Tyr 336, which may also serve as a proton donor. Both His270 and Tyr 336 are strictly conserved among the GH127 enzymes (Figure 3).
Regarding the nucleophile, Glu338 binds to the O2 atom but it is nearly 4 Å away from C1, too far to be likely to undertake this role, although it can be fully ionized by binding to Zn2+. However, the finding that the mutant E338A had only 0.0013% activity left in a previous study [11] clearly shows that Glu338 plays an important role in ligand recognition and Zn2+ binding. Very recently, Ito solved the Zn2+ ion-containing HypBA1 structure in ligand-free and complex forms [12]. The subsequent structure-based mutagenesis and biochemical analysis, in conjunction with quantum mechanical calculations, allowed Ito to make a clear proposal that the nucleophile should be Cys417 rather than Glu338 [12]. In our apo structure, the Cys417-containing loop is shifted away because all coordinate bonds are lost in the absence of the Zn2+ ion (Figure 2C). Cys417 is thus diverted from the attack position, and is probably also protonated, which disables its role as the nucleophile. Therefore, the Zn2+ ion is involved in the catalytic reaction by maintaining the proper configuration of the active site.
As noted by Ito, however, we cannot rule out other possible catalytic mechanisms, such as the use of two carboxylate residues (Glu322 and Glu338) separated by a suitable distance (5.4 Å) for a retaining mechanism. In this case, the bound L-Araf would represent the +1 sugar, and the -1 sugar would have to be severely skewed to fit into the limited space, which is not likely. On the other hand, a recent review suggests that some GH families employ novel mechanisms instead of the typical carboxylate base/nucleophile, including substrate-assisted mechanisms, proton-transferring networks, utilization of non-carboxylate residues and utilization of an exogenous base/nucleophile [38]. Interestingly, apart from Glu322, Glu338, Cys340, Cys417 and Cys418, Tyr386 is also strictly conserved among several GH127 members (Figure 3). The side chain of Tyr386 is as close to the C1 of L-Araf (3.2 Å) as that of Cys417, and it may correspond to the non-carboxylate residue in an alternative mechanism. However, the lack of a base to abstract its proton renders Tyr386 a weak nucleophile. Consequently, the most reasonable catalytic mechanism may involve a Cys417-sugar intermediate, as shown in Scheme 1. Apart from the use of a different nucleophile (Cys417 rather than an Asp or Glu) in the first step, the remaining steps are almost the same as those of the classic retaining mechanism.
In summary, the native, apo and complex crystal structures of HypBA1 give us a first glimpse of the GH127 family with respect to protein folding and catalytic mechanism. The results presented here provide a critical starting point and a firm basis for further studies of the GH127 family. In addition to the catalytic S-domain, HypBA1 also contains an N-domain and a C-domain, the latter participating in dimer formation. To investigate the functions of this novel multidomain protein, further experiments with mutagenesis and truncation are required.
The synchrotron data collection was conducted at beamlines BL13B1 and BL13C1 of NSRRC (National Synchrotron Radiation Research Center, Taiwan).