Diversity of Hydrocarbon-Related Catabolic Genes in Oil Samples from Potiguar Basin (RN, Brazil)

Biodegradation may result in physicochemical changes in crude oil and natural gas properties, being responsible for the decrease of saturated hydrocarbons and yielding heavy oil with low economic value. Studies on the diversity of microbial catabolic genes in oil reservoirs are scarce and could help to predict the potential of a petroleum sample to be biodegraded. The aim of this study was to evaluate the diversity of genes involved in hydrocarbon degradation in Brazilian petroleum samples (biodegraded and non-biodegraded) through the construction and analysis of gene libraries (alkane monooxygenase – alk, dioxygenase – ARHDs and 6-oxocyclohex-1-ene-1-carbonyl-CoA hydrolase – bamA). The results showed a differential distribution of catabolic genes between the sites, with the biodegraded oil being more diverse for the alk and bamA genes. Sequences were similar to the alkB genes from Geobacillus thermoleovorans and several species of Acinetobacter, to ARHD genes from Pseudomonas spp. and two species of Burkholderia, and to bamA genes from deltaproteobacteria. Interestingly, most of the catabolic sequences recovered from both petroleum reservoirs grouped together, forming distinct clusters in the phylogenetic tree reconstruction, and may correspond to potentially new genes, possibly harbored by yet uncultivated microorganisms. This is the first report on the detection of alk, ARHD and bamA genes in petroleum reservoir environments, demonstrating the genetic potential of such microbial communities to biodegrade the oil.


Introduction
Most of the world's oil is biodegraded and while the effects of biodegradation on the molecular composition and physical properties of crude oil and natural gas are relatively well known empirically, the actual processes that occur during the biodegradation of oil in deep reservoirs (below some hundred meters) remain unclear [1]. During biodegradation, the hydrocarbon content is transformed, with a consequent increase in oil density, sulfur content, acidity and viscosity. All these factors interfere with the extraction and refining operations, resulting in significant economic losses [2]. On the other hand, the mechanisms of oil degradation in petroleum reservoirs, as well as the microorganisms involved are still poorly understood.
Oil is a complex mixture of hydrocarbons such as saturated, unsaturated, linear, monoaromatic and polycyclic hydrocarbons [3,4]. Each of these compounds is biodegraded through different routes of several steps. Many studies have already demonstrated the existence of large and diverse populations of microbes with different metabolic activities in petroleum systems [5,6].
Nonetheless, knowledge of the diversity of catabolic genes involved in oil degradation is still scarce. Information on the community composition of bacteria and on the genes involved in the biodegradation process can shed light on microbial metabolic pathway preferences and support a better application of specific microorganisms in biodegradation or bioremediation processes. Techniques able to identify the genes involved in the catabolic degradation of hydrocarbons are valuable tools for elucidating the structure of the microbial community that is truly functional in the environment.
The functional gene involved in the aerobic degradation of aromatic hydrocarbons, the dioxygenase protein coding gene, was also analyzed in this study. The first step in the degradation of aromatic hydrocarbons usually occurs through the incorporation of molecular oxygen into the aromatic nucleus by a multicomponent enzyme system, forming a cis-dihydrodiol. In this system, the terminal dioxygenase is composed of one large α subunit and one small β subunit [22]. The α subunit contains two conserved regions: the Rieske center [2Fe-2S] and the catalytic domain containing mononuclear iron [23]. The dioxygenases belong to a large family known as aromatic-ring-hydroxylating dioxygenases (ARHDs) [24]. These genes are located in chromosomal or plasmid DNA and were identified in bacterial strains belonging to α-Proteobacteria (Sphingomonas) [25], β-Proteobacteria (Alcaligenes, Burkholderia, Comamonas, Polaromonas, Ralstonia) [26][27][28] and γ-Proteobacteria (Pseudomonas) [29].
The process of biodegradation in subsurface reservoirs is often anaerobic in nature [2,6]. Most degraded oils contain metabolic markers for anaerobic degradation, such as specific reduced naphthoic acids, indicating that the anaerobic metabolism of oil is the mechanism by which most of the biodegraded oil in the world was produced [30].
Therefore, research on the anaerobic degradation of hydrocarbons is also necessary for understanding the whole process of oil degradation in reservoirs. In this sense, this study also examined bamA, the functional gene involved in the degradation route of benzoyl-CoA, by which most aromatic substrates (benzene, toluene, ethylbenzene, xylenes and other aromatic compounds) are anaerobically degraded [31]. The bamA gene encodes the protein 6-oxocyclohex-1-ene-1-carbonyl-CoA hydrolase (6-OCH-CoA) and has been used for the detection of a wide range of anaerobic microorganisms able to degrade hydrocarbons [32]. This functional marker was recently used to detect anaerobic hydrocarbon-degrading microorganisms in environmental samples [32], as well as to study the microbial community structure of sulfate-reducing enrichment cultures growing on petroleum hydrocarbons, where Desulfosarcina ovata was shown to be dominant [33]. bamA was also used as a biomarker to investigate the diversity of hydrocarbon-degrading bacteria under iron-reducing conditions in a leachate-contaminated aquifer, with bamA sequences found to be closely related to those from Geobacter species and Georgfuchsia toluolica [31].
This study aimed at investigating the presence and diversity of catabolic genes involved in the hydrocarbon degradation process in biodegraded and non-biodegraded oil samples from petroleum reservoirs through the construction and analyses of alk, ARHD and bamA gene libraries. The results allowed us to evaluate the effectiveness of the primer sets used as a tool to analyze the community structure of bacteria that have the genetic potential to degrade hydrocarbons by aerobic and anaerobic pathways.

Sampling
Oil samples (naturally mixed with formation water) were collected in July 2008 from two production wells, named GMR75 and PTS1, at the onshore Potiguar Basin (Northeast, Brazil), with logistic support from CENPES/Petrobras. Sampling details as well as oil characteristics and geological settings are given in Silva et al. [34].

Total DNA extraction and PCR amplification of alk, ARHD and bamA genes
DNA extraction from oil/formation water samples was carried out using the PowerSoil™ DNA Isolation Kit (MoBio Laboratories, California), according to the manufacturer's instructions. Five tubes from the kit were used to start the DNA extraction. At the end of the procedure, DNA samples obtained from the five tubes were pooled, concentrated and used for subsequent PCR amplification. Crude oil/formation water samples were not filtered or centrifuged before DNA extraction. The PCR amplifications of the three functional genes were performed independently using the respective set of primers (Table 1) [35]. Twenty-five microliter reaction mixtures contained 5 µl of total DNA, 1 U Platinum Taq DNA polymerase (Invitrogen), 0.2 mM dNTP mix (GE Healthcare), 1.2 µM of each primer, 1X Taq buffer and 1.5 mM MgCl2. PCR amplifications were done using an Eppendorf Mastercycler Gradient (Eppendorf Scientific, New York, USA) with the programs shown in Table 2. The amplification products were checked by electrophoresis in 1.2% (wt/vol) agarose gels. Four replicate PCR reactions were done for each sample with each pair of primers. The corresponding PCR replicates were then pooled and concentrated in a speed vacuum concentrator (Eppendorf 5301, A-2-VC rotor).
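The reaction composition described above can be converted into pipetting volumes with the dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch only: the final concentrations, the 5 µl of template and the 25 µl total volume come from the protocol, whereas all stock concentrations (10X buffer, 50 mM MgCl2, 10 mM dNTP mix, 10 µM primers, 5 U/µl polymerase) are assumptions made for illustration and are not stated in the text.

```python
# Values taken from the protocol text.
TOTAL_UL = 25.0       # final reaction volume (ul)
TEMPLATE_UL = 5.0     # total DNA added per reaction (ul)
TAQ_UNITS = 1.0       # units of polymerase per reaction

# name: (final concentration from the text, ASSUMED stock concentration).
# Both values of a pair share the unit given in the name.
REAGENTS = {
    "Taq buffer (X)": (1.0, 10.0),
    "MgCl2 (mM)": (1.5, 50.0),
    "dNTP mix (mM)": (0.2, 10.0),
    "forward primer (uM)": (1.2, 10.0),
    "reverse primer (uM)": (1.2, 10.0),
}
TAQ_STOCK_U_PER_UL = 5.0  # ASSUMED enzyme stock concentration


def reaction_volumes(total_ul=TOTAL_UL):
    """Return the volume (ul) of each component for one reaction."""
    # C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
    vols = {
        name: final * total_ul / stock
        for name, (final, stock) in REAGENTS.items()
    }
    vols["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    vols["template DNA"] = TEMPLATE_UL
    # Water makes up whatever volume is left.
    vols["water"] = total_ul - sum(vols.values())
    return vols


if __name__ == "__main__":
    for name, ul in reaction_volumes().items():
        print(f"{name:22s} {ul:6.2f} ul")
```

With different stock solutions only the second value of each pair needs to change; the water volume is recomputed automatically.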

Clone library construction and sequencing
The PCR products amplified with functional gene-targeted primer pairs were cloned using the pGEM-T Easy Vector (Promega), according to the manufacturer's guidelines, and transformed into Escherichia coli JM109 cells. The functional gene-targeted inserts were amplified from plasmid DNA of selected clones using the universal M13 forward (5'-CGC CAG GGT TTT CCC AGT CAC GAC-3') and reverse (5'-TTT CAC ACA GGA AAC AGC TAT GAC-3') primers. PCR was performed in a 50 μL reaction volume containing 1 μL of an overnight clone culture, 0.4 μM of each primer, 0.2 mM dNTP mix, 2 U Taq DNA polymerase (Invitrogen), 1X Taq buffer, and 1.5 mM MgCl2. The amplification program consisted of an initial denaturation step at 94°C for 3 min, followed by 30 cycles of 94°C/20 s, 60°C/20 s, and 72°C/90 s. PCR products were purified as previously described for automated sequencing in the MegaBace DNA Analysis System 500 (GE Healthcare) and ABI Prism 377 DNA Sequencer IM (Applied Biosystems). The sequencing was carried out with the corresponding primers used for PCR amplification. The sequencing programs consisted of 30 cycles of

Phylogenetic analysis
The partial sequences from GMR75 and PTS01 clones were assembled in a contig (consensus sequence) using the Phred/Phrap/Consed program [36]. Nucleotide sequences were translated into protein sequences using the translate tool in the ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (http://us.expasy.org/tools/dna.html). Deduced protein sequences from each library were grouped into Operational Protein Families (OPFs) using MOTHUR version 1.14.0 [37]. The cutoff value was selected after initial examination of phylogenetic trees that included all protein sequences generated by neighbor-joining analysis [38] using MEGA5 software [39], with 1,000 bootstrap replicates. Deduced protein sequences were aligned using ClustalX, version 1.83 [40], with related reference sequences identified from BLASTp searches, in addition to several representative alkB, ARHD and bamA sequences identified from the literature.
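The translation step can be illustrated with a short script. This is a sketch of conceptual translation with the standard genetic code, not a reproduction of the ExPASy tool used in the study; the function name `translate` and the choice to report unknown codons as `X` are assumptions made here for illustration.

```python
# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate a nucleotide sequence in the given forward frame (0, 1 or 2).

    Stop codons are returned as '*'; codons containing ambiguous bases
    (for example N) are returned as 'X'. Trailing incomplete codons are ignored.
    """
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # Hypothetical fragment, not a sequence from this study.
    print(translate("ATGGAACAT"))
```

For a cloned insert whose orientation is unknown, the same function can be applied to the three frames of the reverse complement as well, keeping the frame without internal stop codons.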

Diversity indices and statistics
Diversity index calculations (α-diversity measures) were performed individually for each catabolic gene library with the program MOTHUR [37]. Rarefaction curves were calculated using the OPF cutoff values established based on MOTHUR analyses in combination with phylogenetic clustering of the catabolic sequences. In addition, the Shannon (H') diversity index, Simpson index [41], and the nonparametric richness estimators ACE (abundance-based coverage estimator) [42] and Chao1 [43] were calculated for each community based on each target gene.
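The indices named above can be computed from a list of clone counts per OPF. The sketch below uses textbook forms of the Shannon index, the Simpson index (as a dominance value, where lower means more diverse) and the bias-corrected Chao1 estimator; whether these match the exact variants implemented in the MOTHUR version used here is an assumption, ACE is omitted for brevity, and the example counts are invented rather than taken from Table 3.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OPFs with at least one clone."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson index D = sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OPFs seen exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Hypothetical library: four OPFs with 5, 3, 1 and 1 clones.
    library = [5, 3, 1, 1]
    print(shannon(library), simpson(library), chao1(library))
```

A library dominated by one or two OPFs gives a low Shannon value and a high Simpson value, while many singleton OPFs push Chao1 well above the observed OPF count.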

Nucleotide sequences accession numbers
The partial sequences of the alk, ARHD and bamA genes determined in this study for the environmental clones were deposited in the GenBank database under the following accession numbers: alk, JX171303 to JX171314; ARHD, JX512729 to JX512738; and bamA, JX512739 to JX512747. It is worth noting that only one representative sequence of each OPF was deposited in the database.

Results and Discussion
The composition of microbial communities in samples from Brazilian petroleum fields was determined by the construction and analysis of three functional gene libraries: alk, ARHDs and bamA. In the case of alk gene libraries, 63 clones from GMR75 oil sample (biodegraded) and 87 clones from PTS01 oil sample (non-biodegraded) were obtained. For the ARHDs gene libraries, 88 clones from GMR75 sample and 80 clones from PTS01 sample were obtained, and about 90 clones from GMR75 library and 93 clones from PTS01 library were obtained for the bamA gene.
Operational Protein Families (OPFs) were defined based on cutoff values at the 93% dissimilarity level between sequences [44] for the alk gene, and 96% and 97% for the ARHD and bamA genes, respectively. These cutoff values were selected based on the analysis of the phylogenetic trees of total clones in combination with MOTHUR analysis. Shannon, ACE and Chao1 scores indicated that the GMR75 library is more diverse than the PTS01 library for two of the functional genes analyzed, alk and bamA (Table 3).
In both libraries of the alk gene, all 150 sequences showed the presence of the two internal conserved regions (motif B, EHXXGHH and motif C, NYXEHYG) [45,46] and matched alk genes from GenBank using the BLASTX program. Seven OPFs in GMR75, 2 OPFs in PTS01 and 3 OPFs composed of sequences from both alk gene libraries were found (Figure 1).
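The motif check described above can be sketched as a simple pattern search on the deduced protein sequences. In the sketch, X in each motif is treated as any single residue; the function name `has_alk_motifs` and the example peptides are hypothetical and are not sequences from the libraries.

```python
import re

# Conserved AlkB regions as given in the text, with X read as any residue.
MOTIF_B = re.compile(r"EH..GHH")   # motif B: EHXXGHH
MOTIF_C = re.compile(r"NY.EHYG")   # motif C: NYXEHYG


def has_alk_motifs(protein):
    """Return True if both conserved motifs occur in the protein sequence."""
    protein = protein.upper()
    return bool(MOTIF_B.search(protein)) and bool(MOTIF_C.search(protein))


if __name__ == "__main__":
    # Hypothetical peptides used only to exercise the check.
    print(has_alk_motifs("AAEHTLGHHKKKNYIEHYGLL"))  # both motifs present
    print(has_alk_motifs("AAEHTLGHHKKK"))           # motif C missing
```

A stricter variant could also require motif B to precede motif C, which would flag chimeric or reverse-oriented inserts.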
One of these OPFs (OPF 11) matched the alkB gene from Geobacillus thermoleovorans, which has the ability to grow in oil and whose hydrocarbon degradation capacity was reported in an extensive study of the genus Geobacillus [47]. This bacterium has been commonly isolated from petroleum reservoirs, especially high-temperature oil fields [48,49]. Geobacillus spp. have attracted industrial interest due to their potential applications in biotechnological processes as sources of various thermostable enzymes [50]. Shestakova and co-authors [51] also detected the presence of Geobacillus-like organisms in the alk gene RNA library and in the 16S rRNA gene library from hydrocarbon-oxidizing aerobic enrichments from a high-temperature petroleum reservoir in China, showing the predominance and functional activity of geobacilli.
Three of the alk OPFs (01, 02 and 03) matched genes from Acinetobacter baumannii, Acinetobacter calcoaceticus and some strains of Acinetobacter spp. Acinetobacter spp. are widespread in nature and can be obtained from water, soil and living organisms. They can use various carbon sources for growth [52]. Sette et al. [53] characterized and compared the bacterial community structure of two distinct samples (non-biodegraded and highly biodegraded oils) from a petroleum field in Brazil and showed that the genus Acinetobacter was exclusive to and predominant in the biodegraded oil sample. Bacteria belonging to this genus are known to be involved in biodegradation and are considered among the most efficient oil degraders [9,11,52]. OPF04 was represented by only one sequence, which was recovered in a clearly separate cluster, not related to any previously identified alk sequences present in the public database (Figure 1). This OPF is a strong candidate for a putative new alk gene from a bacterium indigenous to the oil reservoir. Six OPFs (05, 06, 07, 08, 09 and 10) grouped together and were only distantly related to sequences of organisms represented in the databases. These results indicate the presence of novel putative alkane monooxygenase genes in petroleum reservoirs not previously identified in any other environment or isolate. Kuhn and colleagues [19] found a differential distribution of alk genes between two sites of Antarctic marine environments, and the predominant presence of new alk genes, mainly in the pristine site. Recently, Miqueletto et al. [54] also observed a higher diversity of an alkane hydroxylase gene in a petroliferous soil sample in comparison to a non-petroliferous soil. The authors showed that most of the alkane hydroxylase genes corresponded to potentially new genes.
Diversity analyses clearly showed profound differences between the GMR75 and PTS01 libraries. ACE and Chao estimators yielded higher richness values of alk genes in the GMR75 library (Table 3). Both Shannon and Simpson diversity indices were consistent with rarefaction analyses, showing higher diversity of alk sequences in the GMR75 library than in the PTS01 library (Table 3 and Figure 2).
In the ARHD gene libraries, the majority of the sequences from both libraries matched ARHD genes from GenBank using the BLASTx program. Phylogenetic analysis revealed 3 OPFs exclusive to the GMR75 library, 6 OPFs exclusive to the PTS01 library and 1 OPF composed of sequences from both libraries (Figure 3). Five OPFs (OPFs 01, 02, 03, 04 and 05) were related to dioxygenase genes from several species of Pseudomonas. The enzymes for the degradation of aromatic hydrocarbons have been reported to be encoded by genes located on plasmids (e.g., TOL, NAH, SAL, OCT, and CAM) in Pseudomonas spp. [55]. A new genetic organization and co-regulation of a cluster of genes involved in the first steps of phenol and benzene catabolic pathways was described in Pseudomonas sp. M1 [56]. The authors showed that, differently from the established models for the Pseudomonas upper pathway, Pseudomonas sp. M1 exhibited exceptional biodegradation ability towards a wide range of toxic and/or recalcitrant compounds. On the other hand, 5 OPFs (OPFs 06, 07, 08, 09 and 10) were related to dioxygenase genes from Burkholderia cepacia and Burkholderia xenovorans. These two species have been extensively studied in terms of degradation of the low-molecular-weight PAHs naphthalene and phenanthrene [57] and have been detected in many crude-oil samples [34,58,59].
Cunha et al. [60] isolated bacteria of the genus Bacillus from the rock of an oil reservoir in Brazil and detected the presence of catechol dioxygenase genes in 4 Bacillus isolates by PCR. Interestingly, in the present study none of the ARHD gene sequences analyzed matched with ARHD genes from Bacillus strains represented in the databases. In addition, rarefaction analysis of ARHD genes (Figure 2) yielded plateau-shaped curves, indicating that the ARHD gene diversity in the environment was close to saturation and the sampling effort was satisfactory. These results might suggest that ARHD genes belonging to the genus Bacillus are simply not present in Potiguar oil samples, or that they could not be detected due to extremely low abundance or to bias inherent to the DNA extraction (differential lysis) and/or PCR techniques (preferential amplification).
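The plateau argument can be made concrete with an analytic rarefaction curve, which gives the expected number of OPFs in a random subsample of n clones. The sketch below uses the classical hypergeometric formula rather than the resampling procedure that MOTHUR may apply, so it is an approximation of the published curves under that assumption; the clone counts are invented for illustration.

```python
from math import comb


def rarefy(counts, n):
    """Expected number of OPFs observed in a random subsample of n clones.

    E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N is the library
    size and N_i the number of clones in OPF i.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the library size")
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


if __name__ == "__main__":
    # Hypothetical library: four OPFs with 5, 3, 1 and 1 clones.
    library = [5, 3, 1, 1]
    for n in range(0, sum(library) + 1, 2):
        print(n, round(rarefy(library, n), 3))
```

A curve whose last few points add almost no new OPFs indicates that further sequencing of the same library is unlikely to reveal many additional families.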
Phylogenetic analysis of bamA gene libraries obtained from samples GMR75 and PTS01 (Figure 4) showed that nine OPFs grouped together and were only distantly related to sequences of organisms represented in the databases, with bamA genes from Syntrophus aciditrophicus and Desulfobacterium anilini being the closest described sequences. Syntrophus aciditrophicus is a bacterium that degrades a wide range of organic compounds, including alcohols, fatty acids, and hydrocarbons, including methane, which are degraded syntrophically under anaerobic conditions [61]. Desulfobacterium anilini is a sulfate-reducing bacterium known to be involved in phenol degradation [62]. Kuntze et al. [32] observed that the bamA gene was highly conserved among organisms studied by the group. In the present study, most of the clone sequences from the oil samples GMR75 and PTS01 were clustered into OPF02 and OPF04 (Figure 4), demonstrating that the gene under study does indeed have a conserved region. In addition, all of the 183 sequences obtained from GMR75 and PTS01 showed no or low similarity to the sequences from reference organisms. This suggests that the sequences corresponding to the bamA gene recovered from the petroleum reservoirs are potentially new sequences.
Although higher diversity indices were observed for the GMR75 library in the case of the alk and bamA genes, the same was not observed for the ARHD gene. ACE and Chao indices indicated that the GMR75 library contained more distinct OPFs (greater richness) than the PTS01 library for both catabolic genes (Table 3). Shannon and Simpson diversity indices also showed higher diversity of alk and bamA sequences in library GMR75 than in library PTS01. These results suggest that the bacterial communities of the biodegraded and non-biodegraded oil reservoirs also differ in the distribution of such catabolic genes.
A recent paper published by our research team [34] reported the microbial communities present in the same oil samples used in the present work, originating from the biodegraded (GMR75) and non-biodegraded (PTS1) terrestrial reservoirs from Potiguar Basin, by using 16S rRNA gene libraries. The authors likewise demonstrated the presence of the genera Acinetobacter, Geobacillus, Pseudomonas and Syntrophus in the microbial community. These results are in agreement with the ones found in the present work, confirming the presence of bacteria bearing catabolic genes involved in the process of biodegradation. However, Silva et al. [34] found the predominance of Geobacillus in the microbial community of the PTS1 sample, but no representative of this genus was found in the GMR75 sample. These results are in contrast with those found in the present work, since alk genes similar to Geobacillus alkane monooxygenases were found only in the GMR75 oil. These contrasting results could be explained by the fact that the sampling effort was not sufficient to cover both the phylogenetic [34] and functional diversity (Figure 2) present in these oil samples. On the other hand, Silva et al. [34] suggested that anaerobic metabolism is the dominant process for hydrocarbon degradation in the GMR75 oil field, corroborating the result that the biodegraded oil sample was more diverse for the bamA gene.
In conclusion, the combined results obtained from the catabolic gene libraries showed that both oil samples contained alk, ARHD and bamA genes related to genes present in bacterial species known to be hydrocarbon degraders, such as Acinetobacter, Pseudomonas, Geobacillus, Burkholderia and deltaproteobacteria. Nonetheless, subsets of the catabolic sequences recovered from both petroleum reservoirs may correspond to new genes endemic to such environments, possibly harbored by yet uncultivated and unidentified microorganisms. Moreover, it is important to point out that this was the first work using degenerate primers to detect the presence of functional genes involved in the degradation of hydrocarbons in petroleum reservoirs, demonstrating the genetic potential of such microbial communities to biodegrade the oil. However, such biodegradation will only occur if the environmental conditions are appropriate for the metabolism of the degrading microorganisms. Curiously, the non-biodegraded oil sample proved to be more diverse in terms of ARHD catabolic genes, suggesting that the environmental parameters that control biodegradation play a major role in differentiating biodegraded from non-biodegraded oil reservoirs. As previously detailed in Silva et al. [34], the oil sample GMR75 has been classified as moderately biodegraded, produced from a reservoir with an in situ temperature of 42.2°C at a depth of 535.5-540.5 m. On the other hand, the PTS1 sample has been classified as non-biodegraded, collected from a deeper and hotter reservoir, with an in situ temperature of 48.3°C at a depth of 801-803 m [63]. Both oils were collected in the same basin, but they are related to different petroleum systems, and therefore to different source rocks and reservoirs. These differences in in situ temperature, depth and, possibly, source rock composition and water salinity might explain the functional differences of the hydrocarbon-degrading populations between the two reservoirs.
It is worth mentioning that studies based on DNA sequences reveal only the bacteria with the potential to degrade oil; further studies at the level of gene expression, together with GC-MS assays, are required for a precise evaluation of the degradation process.
Finally, the results obtained allowed us to confirm the effectiveness of the primer sets used for the analysis of the community structure of bacteria that have the genetic potential to degrade hydrocarbons by aerobic and anaerobic pathways. The application of molecular methods for the rapid detection of specific microorganisms or genes in environmental samples is a valuable tool for studies focusing on diversity, abundance and distribution of hydrocarbon degradation populations, providing new insights into the diversity of catabolic genes in the oil reservoir environment and allowing a better understanding of the role of microorganisms on the biodegradation process.