High Resolution Mass Spectrometry Improves Data Quantity and Quality as Compared to Unit Mass Resolution Mass Spectrometry in High-Throughput Profiling Metabolomics

Volume 4 • Issue 2 • 1000132
includes the precursor unit mass profile (including adducts, in-source fragments, isotopes, etc.), retention time, and MS/MS spectra on the ions from the authentic standard. Experimental data is then searched against this library and detected compounds are rapidly identified. This multi-criteria authentic standard library dramatically diminishes the need for HRAM data for compound identification since the multiple data streams (i.e., mass, retention and fragmentation pattern) provide the needed specificity to make identifications. Using this library, our methodology monitors each sample for over 3200 endogenous and exogenous metabolites. In addition, this library includes over 4000 chemicals whose identities have yet to be determined (unknowns). It is important to note that although this library contains such a large number of compounds, not all compounds are routinely detected in each experimental analysis. Many are matrix specific; for example, found in cells or urine only. Others may be species specific or disease specific. The field at large has yet to agree on the number of metabolites that are routinely detected in mammalian or plant species and there is much debate about how many should, can or will ultimately be detectable; estimates range from the low hundreds to many thousands [4, 27, 28].


Introduction
Metabolomics has repeatedly demonstrated utility in identifying biomarkers, elucidating disease and treatment mode of action, bioprocess improvement and other areas of study [1][2][3][4][5][6][7]. The instrumentation applied to metabolomics varies widely depending on the approach used and the desired properties of the final dataset. NMR is often utilized when rapid classification of study samples is needed; however, NMR is limited by low sensitivity [8][9][10][11][12][13]. Triple quadrupole mass spectrometers are often used when the desired output is the quantification of a specific subset of known biochemicals, often referred to as targeted metabolomics [14][15][16]; however, this approach is blind to novel changes and novel biochemicals. Finally, there is small molecule profiling, otherwise known as non-targeted metabolomics, which aims to detect and semi-quantify as many biochemicals, both known and unknown, as possible. This approach is often used to discover new insights into biological phenomena, but presents challenges in compound identification and data processing. It is often necessary to utilize several of the above noted techniques in combination. For example, non-targeted metabolomics techniques may be used to discover biomarkers followed by targeted metabolomics, based on standard analytical chemistry techniques, to validate the biomarkers [17][18][19].
The current non-targeted high-throughput biochemical profiling approach utilized by our group differs from many other methodologies in the field, which typically rely on High Resolution Accurate Mass (HRAM) data output to drive compound identification. A great deal of literature has been focused on how to best utilize these data streams for compound identification [20][21][22][23][24][25]. In our approach, rather than relying on HRAM data to identify biochemicals, identifications are based on matching multiple orthogonal criteria to a unit mass spectral library built from authentic standards, so-called tier 1 identifications [26]. This library

Abstract
Metabolomics is a technique in which the small molecule component from a biological source material is analyzed for changes resulting from some set of test conditions. Liquid chromatography tandem mass spectrometry (LC/MS/MS) methods are commonly used because of the sensitivity and specificity of the data collected. The sensitivity of these methods permits the detection of a large number of small molecules, leading to greater coverage of the biochemical pathways involved in the system being tested. The success of metabolomic studies is partially reliant upon instrumentation, but to what extent? Here we present an evaluation of the analytical attributes of a high resolution accurate mass (HRAM) orbitrap-based mass spectrometer compared to a unit mass resolution (UMR) ion-trap mass spectrometer as applied to high-throughput, non-targeted metabolomics. To carry out this evaluation, different sets of samples were analyzed and the data evaluated for analytical performance. Two dilution series of authentic standards demonstrated that the HRAM data stream improved the limit of detection from several fold to several orders of magnitude and showed an increased linear dynamic range of an order of magnitude over the UMR data stream. Analysis of a biological serum sample set demonstrated that the HRAM data stream enabled the detection of 118 additional named/known compounds, leading to the detection of 531 tier 1 and tier 2 identified compounds in human serum, with decreased process variability, increased consistency and accuracy of detection and integration.
Even though, given our methodology, accurate mass instrumentation is not necessary for compound identification, we wanted to assess the other potential analytical benefits, above and beyond compound identification, from the use of HRAM data on our non-targeted metabolomics methodology [29]. The goal of this evaluation was to compare and contrast the analytical performance characteristics of HRAM data to Unit Mass Resolution (UMR) data. The analyses included assessing Linear Dynamic Range (LDR), Limit Of Detection (LOD)/sensitivity, scan rate, and mass accuracy, then determining how these different factors impacted the process variability, the number of compounds and features detected, and the overall quality of data in a biological non-targeted metabolomics analysis.
To perform this evaluation, separate sets of data were analyzed. To compare the limit of detection/sensitivity and linear dynamic range for the different instrument data streams, two different dilution series of isotopically labeled standards, ranging from 0.05 ng/mL to 250,000 ng/mL, and spanning almost seven orders of magnitude, were analyzed. The different dilution series were designed to assess different aspects of the sensitivity profile of the instruments. One series contained standards which spanned chromatographic time and was analyzed using reverse-phase chromatography while the other spanned a wider mass range and was analyzed using Hydrophilic Interaction Liquid Chromatography (HILIC). The next set of data analyzed was from 30 individual human serum samples, and 11 Quality Control (QC) samples, which included six technical replicates of a pool of aliquots from each of the 30 serum samples [30] to assess process variability, and five water blanks used to identify process contributed artifacts. The individual serum and QC samples were used to compare the analytical performance of the instrument data streams based on the total number of chromatographic peaks detected, the total number of named/known compounds that were detected and identified, scan speed, mass accuracy and precision, process variability, reproducibility/consistency and accuracy of detection and integration.
While we have not focused on data processing software in this manuscript, without such tools and methodologies the robust analysis of the data would not have been possible. It is well established that one challenge of any high-throughput screening methodology is data processing. A great deal of previous work has established the necessary software applications, tools and methodologies in order to permit rapid compound detection, integration, identification and QC of the data streams being analyzed in this study [31][32][33].

Sample Material
Found in Supplementary Information 1.

Sample Preparation
Dilution Series: The aliquots were analyzed on two separate ThermoFisher Scientific (Waltham, MA) mass spectrometers, a Linear Ion-Trap (LTQ) and an Orbitrap (Q-Exactive), to determine the limit of detection and the linear dynamic range of each instrument for each standard. Two different dilution series were prepared: one destined for a reverse-phase chromatographic method and the other for a Hydrophilic Interaction Liquid Chromatographic (HILIC) method. The dilution series of standards ranged from 0.05 ng/mL to 250,000 ng/mL and included one blank. For the reverse-phase dilution series, aliquots were dried and then reconstituted with 100 µL 0.1% formic acid in water. The list of standards in the reverse-phase dilution series can be found in Supplementary Information 2. For the HILIC dilution series of energy metabolites, 50 µL aliquots were plated into two 96-well PCR plates each at twice the final concentration in 60/40 acetonitrile/10 mM ammonium formate buffer (pH 10.6) and brought to final concentration with 50 µL acetonitrile. The list of standards in the HILIC dilution series can be found in Table 1.

Biological Samples
Biological samples were stored at -80°C until needed and then thawed on ice just prior to extraction. Extraction of samples was executed using an automated liquid handling robot (Hamilton LabStar, Hamilton Robotics, Inc., Reno, NV), where 450 µL of methanol was added to 100 µL of sample to precipitate proteins. The methanol contained four recovery standards, DL-2-fluorophenylglycine, tridecanoic acid, d6-cholesterol and 4-chlorophenylalanine, to allow confirmation of extraction efficiency. Four aliquots of each sample were taken from the extract and dried. For serum samples, two aliquots of each sample were reconstituted in 50 µL of 6.5 mM ammonium bicarbonate in water (pH 8) for the negative ion analysis and another two aliquots of each were reconstituted using 50 µL 0.1% formic acid in water (pH ~3.5) for the positive ion method. Urine samples were extracted similarly but reconstituted with 100 µL of reconstitution solvent. Reconstitution solvents contained instrument internal standards (listed in Supplementary Information 2) to assess instrument performance and to serve as retention index markers for chromatographic alignment. Extracts of a pooled serum sample were injected six times for each data set on each instrument to assess process variability and five water aliquots were also extracted and analyzed to serve as process blanks for artifact determination.

UPLC Method
Separations were performed using a Waters Acquity UPLC (Waters, Milford, MA). Reverse-phase (RP) positive ion method analysis used mobile phases consisting of 0.1% formic acid in water (A) and 0.1% formic acid in methanol (B). Reverse-phase negative ion analysis used mobile phases consisting of 6.5 mM ammonium bicarbonate in water, pH 8 (A) and 6.5 mM ammonium bicarbonate in 95% methanol/5% water (B). The gradient profiles can be found in Supplementary Information 3. The sample injection volume was 5 µL and a 2x needle loop overfill was used. Separations utilized separate acid and base-dedicated 2.1 mm × 100 mm Waters BEH C18 1.7 µm columns held at 40°C.

Unit Mass Resolution (UMR) Method
A ThermoFisher Scientific (Waltham, MA) LTQ was the unit mass resolution instrument tested. Detailed source, MS and MS/MS settings can be found in Supplementary Information 4. For all methods, the scan range was 80-1000 m/z with a scan speed of ~4.5 scans per second (alternating between MS and MS/MS scans). The MS/MS dynamic exclusion option was enabled with the user-set exclusion duration time of 3.5 s. Calibration of the LTQ instrument was performed as needed.

High Resolution Accurate Mass (HRAM) Method
A ThermoFisher Scientific (Waltham, MA) Q-Exactive [34] was the HRAM instrument tested. Detailed source, MS and MSⁿ settings can be found in Supplementary Information 4. The scan range was 80-1000 m/z with a scan speed of ~9 scans per second (alternating between MS and MS/MS scans), and the resolution was set to 35,000 (measured at 200 m/z). Mass calibration was performed as needed to maintain <5 ppm mass error for all standards monitored.

Data Processing and Analysis
A detailed description of data processing including chromatographic alignment, QC practices and compound identification can be found in references [31][32][33]. A brief description is provided below.

Dilution Series Analysis
To analyze the data from the dilution series experiment, the ThermoFisher Scientific software Xcalibur QuanBrowser was used for peak detection and integration. This software package targeted the specific compounds in the dilution series and permitted optimization of peak detection and integration criteria on a per-compound and a per-sample basis. The integration of each individual chromatographic peak was manually approved and the integration refined, if necessary, for each standard in each step of the series to ensure an accurate comparison of instrument performance.

Biological Sample Analysis
In-house peak detection and integration software was used whose data output was a list of m/z ratios, retention indices and area under the curve (AUC) values. User specified criteria for peak detection included thresholds for signal to noise ratio, area and width. Relative standard deviations (RSDs) of peak area were determined for each internal and recovery standard to confirm extraction efficiency, instrument performance, column integrity, chromatography and mass calibration. The biological data sets, including QC samples, were chromatographically aligned based on a retention index that utilized internal standards assigned a fixed RI value [35,36]. The RI of the experimental peak was determined by assuming a linear fit between flanking RI markers whose RI values are set.
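The retention index assignment described above, a linear fit between the two flanking RI markers, can be sketched as follows. This is a minimal illustration and not the in-house software: the function name, the marker times and the RI values are all hypothetical, and the markers are assumed to be sorted with strictly increasing retention times.

```python
from bisect import bisect_right


def retention_index(rt, markers):
    """Interpolate a retention index (RI) for retention time `rt`.

    `markers` is a list of (retention_time, fixed_RI) pairs for the internal
    standards, sorted by strictly increasing retention time (an assumption of
    this sketch). The RI is taken from the straight line joining the two
    markers that flank `rt`.
    """
    if len(markers) < 2:
        raise ValueError("at least two RI markers are required")
    times = [t for t, _ in markers]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the marker range")
    # Index of the upper flanking marker; clamp so rt == last marker still works.
    upper = min(bisect_right(times, rt), len(markers) - 1)
    lower = upper - 1
    (t0, ri0), (t1, ri1) = markers[lower], markers[upper]
    return ri0 + (rt - t0) * (ri1 - ri0) / (t1 - t0)


# Hypothetical markers: (retention time in seconds, assigned RI value).
example_markers = [(60.0, 1000.0), (120.0, 2000.0), (180.0, 3000.0)]
print(retention_index(90.0, example_markers))
```

Peaks that elute outside the marker range are rejected here rather than extrapolated; how such peaks were actually handled is not stated in the text.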
Peaks were matched against an in-house library of authentic standards and routinely detected unknown compounds specific to the respective method. Identifications were based on retention index values within 150 RI units (~10 s), experimental precursor mass match to the library authentic standard within 0.4 m/z for the LTQ or 0.005 m/z for the accurate mass instrument and quality of MS/MS match. All proposed identifications were then manually reviewed and hand curated by an analyst who approved or rejected each identification based on the criteria above [31,32].
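As a rough sketch of how the three matching criteria combine before manual curation, the check below applies the stated tolerances (150 RI units; 0.4 m/z for the LTQ and 0.005 m/z for the accurate mass instrument). Several details are assumptions made only for illustration: the tolerances are treated as maximum absolute differences, the MS/MS match quality is reduced to a single score with a hypothetical cutoff of 0.7, and the library entry shown is invented.

```python
from dataclasses import dataclass

# Tolerances taken from the text; interpreting them as maximum absolute
# differences is an assumption of this sketch.
MZ_TOLERANCE = {"UMR": 0.4, "HRAM": 0.005}
RI_TOLERANCE = 150.0


@dataclass(frozen=True)
class LibraryEntry:
    name: str
    mz: float  # precursor m/z of the authentic standard
    ri: float  # retention index of the authentic standard


def is_candidate(peak_mz, peak_ri, msms_score, entry, instrument,
                 min_msms_score=0.7):
    """Return True if a peak passes all three criteria for a library entry.

    `msms_score` and `min_msms_score` are hypothetical stand-ins for the
    quality-of-MS/MS-match assessment, which the text does not quantify.
    """
    mass_ok = abs(peak_mz - entry.mz) <= MZ_TOLERANCE[instrument]
    ri_ok = abs(peak_ri - entry.ri) <= RI_TOLERANCE
    msms_ok = msms_score >= min_msms_score
    return mass_ok and ri_ok and msms_ok


# Invented entry and peak, for illustration only.
entry = LibraryEntry(name="example standard", mz=200.100, ri=2500.0)
print(is_candidate(200.130, 2550.0, 0.9, entry, "UMR"))
print(is_candidate(200.130, 2550.0, 0.9, entry, "HRAM"))
```

A peak passing this check would still be a proposed identification only; in the workflow described above every proposal is reviewed by an analyst.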

Dilution Series Limit of Detection (LOD)/Sensitivity
The LOD of an instrument is a direct measurement of an instrument's capability to distinguish a compound's signal from any noise present in the mass channel. The lower the limit of detection, the more sensitive the instrument is and therefore the more signals can be detected and/or distinguished from noise. In the application of non-targeted metabolomics, it is critical to be able to detect as many compounds as possible and therefore any technology that offers lower limits of detection and improved sensitivity provides increased compound detection. To compare the limits of detection and therefore sensitivity of each instrument, two separate dilution series, one using reverse-phase chromatography and the other hydrophilic interaction liquid chromatography (HILIC), were run. Each dilution series contained a unique set of compounds used to assess different aspects of instrument sensitivity; the reverse-phase dilution series standards spanned the chromatographic time window and the HILIC dilution series standards covered a wider mass range. The reverse-phase dilution series included nine labeled standards ranging in concentration from 0.05 ng/mL to 250,000 ng/mL, with each concentration being run in triplicate.
This dilution series demonstrated that the HRAM data stream had consistently lower LODs than the UMR data stream (Supplementary Information 2) in scanning mode. The degree of improved sensitivity ranged from several fold to several orders of magnitude. The LOD was determined as the lowest concentration where a discernible and reproducible peak could be detected and/or distinguished from the background in all technical replicates and demonstrated dilution from the next higher concentration.
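The LOD rule stated above can be expressed as a small procedure. The sketch below is only an approximation of it: in the study each peak was manually approved, whereas here "detected" is reduced to a positive area in every replicate and "demonstrated dilution" to a mean area lower than that of the next higher concentration. The function name and data layout are hypothetical.

```python
def limit_of_detection(series):
    """Return the lowest concentration meeting the LOD rule, or None.

    `series` maps concentration (ng/mL) to a list of replicate peak areas,
    with None marking a replicate in which no peak was detected.
    """
    concentrations = sorted(series)

    def detected_in_all(conc):
        return all(a is not None and a > 0 for a in series[conc])

    def mean_area(conc):
        return sum(series[conc]) / len(series[conc])

    # The highest concentration has no higher neighbour to dilute from,
    # so it cannot be evaluated and is skipped.
    for lower, higher in zip(concentrations, concentrations[1:]):
        if (detected_in_all(lower) and detected_in_all(higher)
                and mean_area(lower) < mean_area(higher)):
            return lower
    return None


# Invented triplicate areas for one standard.
example = {
    0.05: [None, None, None],
    0.5: [10.0, None, 12.0],
    5.0: [100.0, 110.0, 90.0],
    50.0: [1000.0, 1100.0, 900.0],
}
print(limit_of_detection(example))
```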
The improvement in sensitivity is likely a result of the decreased noise associated with the smaller isolation window utilized with the HRAM data. The HRAM data demonstrated better than 3 ppm mass accuracy for the dilution standards and therefore, when integrating peaks, a 5 ppm mass window could confidently be used to detect and quantify these standards. This meant that instead of having to use a total mass window of 0.4 m/z, which was used for the UMR analysis, a much smaller mass window could have been used to isolate the same analytical signature. As an example, the mass window of 0.001 m/z could be used to isolate the analytical signature for d3-leucine on the HRAM data, while for this same signature in the UMR data one would have to use a 0.4 m/z window. Ultimately, using smaller mass windows included significantly less noise thus improving the signal/noise ratio. The reasoning for assessing the difference in sensitivity of these two instruments using dilution series was that often the noise associated with the HRAM data stream was minimal to non-existent, therefore sensitivity was determined as the first concentration where a signal was detected.
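To make the difference in window widths concrete, the conversion between a ppm tolerance and an absolute m/z window can be sketched as below. The m/z value of 135.0 is a rounded, illustrative figure chosen only to be in the low-mass region discussed here; it is not taken from the paper, and the function names are hypothetical.

```python
def ppm_to_mz(mz, ppm):
    """Absolute m/z deviation corresponding to `ppm` parts per million at `mz`."""
    return mz * ppm / 1e6


def mz_to_ppm(mz, delta_mz):
    """Relative deviation in ppm corresponding to `delta_mz` at `mz`."""
    return delta_mz / mz * 1e6


def extraction_window(mz, ppm):
    """(low, high) m/z bounds of a symmetric +/- `ppm` extraction window."""
    half_width = ppm_to_mz(mz, ppm)
    return mz - half_width, mz + half_width


illustrative_mz = 135.0  # assumed value, for illustration only
low, high = extraction_window(illustrative_mz, 5.0)
total_width = high - low
print(f"+/-5 ppm window at m/z {illustrative_mz}: {total_width:.5f} m/z wide")
print(f"a 0.4 m/z window is about {0.4 / total_width:.0f} times wider")
```

At low m/z a 5 ppm tolerance corresponds to a window on the order of 0.001 m/z, which is consistent with the example given in the text and hundreds of times narrower than the 0.4 m/z window needed for the UMR data.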
It should be noted that the HRAM instrument tested contained a different and newer source design than the UMR instrument. It is possible that the new source design increased signal, which in addition to the reduction in noise due to tighter mass tolerance, improved the sensitivities of these standards as well. In order to better assess the relative contribution of improved sensitivity resulting from an increase in signal or a decrease in noise we performed another dilution series which included several higher mass energy metabolites analyzed using hydrophilic interaction liquid chromatography (HILIC). In general, the amount of background noise in a mass spectrometer is higher at lower masses and decreases with higher masses. This low mass noise is primarily from solvent clusters and contaminants. If the gain in sensitivity seen is driven by a reduction of noise, then at higher masses the difference between the HRAM data and the UMR data would be less pronounced as there is less noise in the higher mass region. Table 1 shows that the LODs for the higher mass standards are the same or very similar between the HRAM and UMR instruments. This data supports the theory that the improvement in sensitivity between these instruments is primarily due to the reduction of noise provided by the tighter mass window tolerance.

Number of Compounds and Peaks Detected in a Biological Sample Set
The dilution series data demonstrated the improved sensitivity of the HRAM data over UMR data using standards in a neat environment. In order to understand the implications of improved sensitivity on biologically variable and complex samples, a sample set consisting of 30 individual human serum samples, along with QC samples, was analyzed. This sample set was run on both instruments and the data monitored for approximately 3200 known compounds using an in-house authentic standards library. Identification of compounds was based on three criteria: 1) mass match within 0.4 m/z for the UMR data and a very conservative value of 0.005 m/z for the HRAM data, 2) fragmentation spectral match, and 3) retention time/index match, all to the authentic standard library entry for each compound. Data was manually inspected to remove compounds not present with at least 3x greater concentration than the corresponding peaks found in the water blanks and to assess quality of fragmentation spectral match and consistency of integration of peaks sample to sample [31,32]. (See Supplementary Information 13 and 14.) This data shows that the HRAM data enabled the detection of more compounds when monitoring for positive or negative ions as compared to UMR data. In total, the HRAM data stream permitted the detection of an additional 118 unique compounds over the UMR data stream.
The major factors contributing to the increase in the number of compounds detected by using HRAM data over UMR data are the added mass resolution and sensitivity permitting the detection of metabolites whose masses could not be resolved previously, either from other compounds or from noise. Many examples of these phenomena were seen in the data. Figure 2 demonstrates the ability of the improved mass resolution to separate two known and co-eluting compounds. In this example, the HRAM data stream (Figure 2A) permitted the clean detection of the significantly lower intensity N-acetylglutamine peak underneath the N6-acetyllysine peak, whereas N-acetylglutamine was masked by N6-acetyllysine in the UMR data (Figure 2B). There were also examples in the data where a metabolite could not be differentiated from the noise in the UMR data but could reproducibly be detected using HRAM data (Figure 3). In Figure 3, the HRAM data stream (Figure 3A) is able to distinguish the family of methylxanthine compounds from a noisy mass channel that masks the family almost entirely in the UMR data (Figure 3B). In addition to detecting more unique/new compounds, the HRAM data also showed improved consistency of detection, with more compounds being detected in 100% of the experimental samples in the HRAM data than the UMR data (Supplementary Information 5).
The number of named compounds varies in different matrices. The data presented here is from human serum, which is a relatively simple matrix, in terms of number of compounds, when compared to other matrices like urine or feces. For example, initial data using HRAM instrumentation have demonstrated that we were able to detect almost 800 named compounds in feces (data not shown). It is important to note that this large increase in detected compounds is accompanied by a large increase in the number of chromatographic/mass peaks (ion-features) detected. The HRAM data stream produced three to four times more ion-features than the UMR data stream (Supplementary Information 6). The ~3-4x increase in ion-features did not translate into 3-4x the number of compounds that were detected, because these additional peaks had multiple sources. Some of these additional peaks were from the detection of newly detectable known compounds as a result of the improved sensitivity and mass resolution, as evidenced by the increased number of named compounds detected. However, in addition to these newly detected compounds, some of these new peaks are simply new redundant measurements of the same parent compound in the form of new adducts, in-source fragments, and multimers not previously detected for an individual metabolite [20,37,38]. Finally, some of the additional peaks detected could represent compounds not previously detected or characterized. As a result, data mining will likely add to the number of named compounds.

Process Variability
The overall process variability of an analytical method contributes significantly to the ability to effectively detect changes in concentrations of compounds within a biologically variable sample set. The lower the process variability of the measurement for any given compound the smaller the biologically relevant concentration shift which can be accurately and reproducibly detected. In this way, it becomes imperative to continually reduce the process variability in order to detect smaller yet statistically significant biological concentration changes.
Biological variability is routinely much higher than process variability. Therefore the technical replicates of the serum samples were used to assess the process variability. In order to assess the effect these two data streams had on overall process variability, the median Relative Standard Deviation (RSD) for the compounds (excluding internal standards) detected in 100% of the technical replicates (6 total) was determined. The total process variability of the HRAM data set was reduced 50% as compared to the UMR data for both the positive and negative ion methods. The median RSD went from 13 in the UMR data to 6 in the HRAM data, even though the HRAM data set detected more total compounds (Supplementary Information 7). This reduction in process variability seems to be mostly a result of improved consistency and quality of peak detection and integration, again due to the reduction of noise associated with the tighter mass tolerance that can be utilized in the HRAM data. In Figure 4, the selected ion chromatogram for N-acetylhistidine is shown from two individual human urine samples (black and red traces respectively) from UMR data (Figure 4A) and from HRAM data (Figure 4B). Given the more clearly defined peak start and stop in the HRAM data (Figure 4B), automated software could more readily detect and integrate peaks as compared to the UMR data shown in Figure 4A, thus driving reduced overall process variability.
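The process variability metric used here, the median RSD over compounds detected in all six technical replicates, can be sketched as follows. The data layout and names are hypothetical, missing detections are represented as None, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, median, stdev


def rsd_percent(areas):
    """Relative standard deviation of replicate peak areas, in percent."""
    average = mean(areas)
    if average == 0:
        raise ValueError("mean area is zero; RSD is undefined")
    return 100.0 * stdev(areas) / average


def median_process_rsd(replicate_areas, n_replicates=6):
    """Median RSD over compounds detected in every technical replicate.

    `replicate_areas` maps compound name to a list of peak areas, one per
    technical replicate, with None where the compound was not detected.
    Internal standards are assumed to have been excluded by the caller.
    """
    rsds = [
        rsd_percent(areas)
        for areas in replicate_areas.values()
        if len(areas) == n_replicates
        and all(a is not None and a > 0 for a in areas)
    ]
    return median(rsds)


# Invented areas for three compounds across six pooled-serum injections.
example = {
    "compound A": [100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
    "compound B": [90.0, 110.0, 90.0, 110.0, 90.0, 110.0],
    "compound C": [50.0, None, 55.0, 52.0, 51.0, 49.0],  # excluded: one miss
}
print(round(median_process_rsd(example), 2))
```

Using the median rather than the mean keeps a few poorly integrated compounds from dominating the summary value.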
Another observation from this data is that the noise associated with the UMR data can mask the differences in concentration between samples, as seen in Figure 4. In this example, the reduced noise of the HRAM data stream (Figure 4B) permitted the detection of the difference in concentration of N-acetylhistidine between these two different urine samples that was completely masked in the noisier UMR data (Figure 4A). In this way, the use of HRAM data permitted the detection of potentially significant biological changes in concentration that were obscured in the UMR data.

Dilution Series Linear Dynamic Range (LDR)
The LDR of an instrument is a measurement of an instrument's ability to accurately represent concentration changes seen in experimental samples. To compare the LDRs of the HRAM and UMR instruments, the reverse-phase dilution series data was used. For this analysis, the data from the nine labeled standards in the dilution series were fitted with a linear trend line. When the area response for a concentration deviated enough to drop the linear fit below an R² of 0.98, that concentration was considered to have become non-linear (Supplementary Information 8). Comparing the data streams, the HRAM data demonstrated an overall increased LDR compared to the UMR data. The average LDR for the HRAM data was four orders of magnitude, while the UMR data had an average LDR of three orders of magnitude (Supplementary Information 9). Interestingly, the UMR data was capable, in several cases, of improved linear behavior at higher concentrations while still demonstrating an overall reduced LDR as compared to the HRAM dataset. This overall reduced LDR was a result of the decreased sensitivity at low concentrations for the UMR data (Supplementary Information 2). A summary of all LOD and LDR data for each standard can be found in Supplementary Information 10. As expected, all standards on both instruments demonstrated deviations from linear area response behavior at high concentrations (Supplementary Information 8-11). This deviation from linearity likely derives from electrospray saturation. Even though both instruments displayed deviations from linear behavior, the replicate injections demonstrated a high degree of precision in the determination of area, with the HRAM instrument demonstrating tighter precision than the UMR instrument (Supplementary Information 11). In addition, while the area response deviated from linear behavior, even at high concentration the mass measurements were still within a 5 ppm tolerance (data not shown) for the HRAM instrument.
Figure 4: ESI+ extracted ion chromatogram of the mass for N-acetylhistidine from two different human urine extracts (black and red traces respectively) from UMR data (A) and HRAM data (B). Precise mass and mass windows used are labeled on each panel. The added mass resolution and accuracy of the HRAM data stream permitted the detection of the difference in concentration of N-acetylhistidine between these two samples that was masked in the noisier UMR data. This data was reproduced on an alternate HRAM data stream instrument to confirm the finding (data not shown).
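One way to turn the R² criterion into a procedure is sketched below. It is an interpretation, not the authors' exact method: the fit is an unweighted least-squares line on untransformed concentrations and areas, the range is assumed to start at the LOD, and the highest concentrations are dropped one at a time until R² reaches 0.98. All names and the example values are hypothetical.

```python
import math


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    # For a straight-line fit with intercept, R^2 equals the squared
    # Pearson correlation coefficient.
    return sxy * sxy / (sxx * syy)


def linear_dynamic_range(concentrations, areas, min_r2=0.98):
    """Return (low, high, orders_of_magnitude) of the linear range, or None.

    `concentrations` must be ascending and start at the LOD; `areas` holds
    the matching mean peak areas. At least three points are required.
    """
    end = len(concentrations)
    while end >= 3 and r_squared(concentrations[:end], areas[:end]) < min_r2:
        end -= 1  # drop the highest remaining concentration
    if end < 3:
        return None
    low, high = concentrations[0], concentrations[end - 1]
    return low, high, math.log10(high / low)


# Invented response that saturates at the highest concentration.
conc = [1.0, 10.0, 100.0, 1000.0, 10000.0]
area = [2.0, 20.0, 200.0, 2000.0, 6000.0]
print(linear_dynamic_range(conc, area))
```

Because an unweighted fit is dominated by the highest concentrations, a weighted or log-log fit could give a different range; the sketch follows the linear trend line mentioned in the text.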

Scan Speed
The scan speed comparison of these instruments is highly dependent on the methods and instrument settings used. The UMR instrument scanned approximately 4.5 scans per second, which permitted adequate sensitivity and UHPLC peak sampling. When operated at 35,000 resolution (measured at 200 m/z) the HRAM instrument scanned two times faster than the UMR instrument (Supplementary Information 12 and Figure 3, vertical lines). This increased scan speed likely had some positive influence on the observed reduction of process variability, but more practically, this improved scan speed opened a great deal of flexibility around method development. For example, given our requirement of MS/MS spectrum match for compound identification, the HRAM instrument scanned fast enough that one could choose to take one full mass MS scan followed by two or even three data dependent MS/MS scans (top 2 or 3, respectively) and still obtain more full mass MS scans in a second, which are used for determining the area under the curve for quantification purposes (Supplementary Information 12). Others might find the ability to increase the mass resolution of the instrument, thus lowering the scan speed in order to gain mass resolution, more important to their specific application. Either way the added scan capacity opens up more options for the user. For our instrument comparison the same scan profile was maintained, specifically one MS scan followed by one MS/MS scan (top 1), for all of the reported comparisons.
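The trade-off between MS/MS depth and full MS sampling can be illustrated with simple arithmetic. The sketch assumes that every scan, MS or MS/MS, takes the same time and uses the approximate rates quoted in the methods (~4.5 and ~9 scans per second); real scan times depend on the instrument settings, so the numbers are indicative only. The function name is hypothetical.

```python
def full_ms_scans_per_second(total_scans_per_second, top_n):
    """Full MS scans per second for a cycle of one MS scan plus `top_n` MS/MS scans.

    Assumes all scans in the cycle take the same time (a simplification).
    """
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    return total_scans_per_second / (1 + top_n)


UMR_RATE = 4.5   # approximate total scans per second, from the methods
HRAM_RATE = 9.0  # approximate total scans per second, from the methods

print("UMR  top 1:", full_ms_scans_per_second(UMR_RATE, 1))
for n in (1, 2, 3):
    print(f"HRAM top {n}:", full_ms_scans_per_second(HRAM_RATE, n))
```

Under these nominal figures a top 2 cycle on the HRAM instrument still yields more full MS scans per second than the top 1 cycle on the UMR instrument, while a top 3 cycle yields roughly the same number; which side of that line a real method falls on depends on the actual scan times.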

Accurate Mass for Unknown Identification
While the presented work focused on the known or named compounds detected, a large component of any global screening metabolomics method is the detection of unknowns. Unknowns or unnamed compounds are compounds with reproducible mass, retention and fragmentation characteristics but for which a precise identity has yet to be determined. While the majority of compounds detected in the serum sample set were named/known compounds matched to authentic standard library entries, many were unknown compounds. There is certainly a great deal of work by the community focused on identifying similar unknowns found in various biological data since these can often be distinguishing biomarkers or display important correlations with the study design at hand. By using the HRAM stream of data the mass assignments of these unknowns can be automatically used to identify or aid in identification of these unknowns. For many of the lower mass unknowns detected in biological data the accurate mass data can lead to a unique molecular formula, particularly when fragmentation and isotope ratios are included in the determination [21,22]. Therefore these instruments also provide powerful additional information to aid in the identification of unknowns without need for additional sample analyses. In the time between the analysis and data processing of the data presented in this manuscript and more recent data analyzed in human serum the number of named/known compounds has increased to over 600 (data not shown) as a result of the accurate mass data stream permitting us to identify and annotate unknowns.

Conclusion
Our results demonstrate the utility of HRAM data above and beyond its use for compound identification. The HRAM data offered significant analytical benefits to every aspect of data quality investigated and improved downstream data processing of high throughput metabolomics data. The HRAM data, mostly through the reduction of noise and interferences, demonstrated greater sensitivity, wider dynamic range, reduced process variability and permitted the detection of more compounds than the UMR data without detrimental effect on scan speed. While only orbitrap-based HRAM instrumentation was directly evaluated, it is likely that other accurate mass instrumentation, such as ToF, will demonstrate similar analytical benefits [39]. While metabolomics, as a whole, is far from being only an instrumentation problem, our results indicate that the HRAM data stream demonstrated significant analytical improvement. In addition to the benefit of accurate mass for compound identification, this type of instrumentation is likely to be extremely beneficial to practitioners of non-targeted metabolomics.