Metabolon, Incorporated, 617 Davis Drive, Suite 400, Durham, 27713 North Carolina
Received date: March 09, 2012; Accepted date: March 29, 2012; Published date: March 31, 2012
Citation: Evans AM, Mitchell MW, Dai H, DeHaven CD (2012) Categorizing Ion -Features in Liquid Chromatography/Mass Spectrometry Metobolomics Data. Metabolomics 2:110. doi:10.4172/2153-0769.1000110
Copyright: © 2012 Evans AM, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Visit for more related articles at Metabolomics:Open Access
Mass spectrometry based metabolomics experiments generate copious amounts of signal data which in turn is processed to ultimately convert the signal data into identified metabolites so that biological interpretation and pathway analysis can be performed. The actual number of biochemicals detected in global biochemical profiling studies utilizing liquid chromatography coupled to mass spectrometry (LC/MS) is much lower than the total number of mass spectral ion-features detected, particularly when using positive electrospray ionization (ESI+). Given the conflicting numbers of detected metabolites reported in literature, a detailed analysis of the ion-feature composition is warranted. Ultrahigh pressure liquid chromatography (UHPLC)/Ion-trap MS and fragmentation (MS2) nominal mass data from 10 human plasma samples were analyzed in triplicate. The resulting detected ion-features were analyzed for ion-feature reproducibility, type and source. It was found that nearly 70% of all ion-features detected were non-reproducible, that 22% were from chemicals contributed to the samples due to storage and processing and that only 25% of the reproducible and annotatable ion-features could be determined to be protonated molecular ions. In addition, a previously undocumented ion-feature type; amalgam adducts, and ion-feature source; ions arising from chemistry of compounds occurring within extracted samples is reported. Ultimately, this analysis demonstrated that from an average of 10,000 ion-features detected in a human plasma sample ultimately only 220 compounds of biological origin were detected and identified from a positive ion analysis only.
Global biochemical profiling, also known as metabolomics, is a technique by which the small molecule complement, generally molecules with masses less than 1000 Da, is analyzed from biological source material for changes resulting from some set of test conditions. Metabolomics has been utilized in a wide range of applications using a wide variety of instrumentation [1-10]. For the purposes of this manuscript we focus on one of the commonly used methods to detect, quantify, and identify this small molecule complement: liquid chromatography coupled to mass spectrometry (LC/MS).
LC/MS methods are commonly used because of the sensitivity and specificity of the data collected. The sensitivity found with these methods allows an analyst to detect and monitor a larger number of small molecules, leading to greater coverage of the biochemical pathways potentially involved in the system being tested. The specificity of mass spectrometry, in the form of mass to charge (m/z) and fragmentation data, gives an analyst the ability to identify the detected biochemicals in order to gain some biological insight to the question at hand. However, the processing and identification of the detected biochemicals is complicated by the fact that a single biochemical does not give rise to a single mass-to-charge signal but rather many in the form of co-eluting isotopes, adducts, in-source fragments, and multimers, especially when electrospray ionization techniques are used [11,12]. This convoluted data can have a significant effect on the time and the ease with which data is interpreted.
Recently there have been a number of publications addressing the need for comprehensive software packages that are capable of deconvoluting such datasets [11,13-17]. These programs are a necessary first step in aiding the analyst to accurately estimate the “actual” number of biochemicals detected. Most of these software packages use combinations of the following principles to deconvolute ion-feature types: predefined known mass relationships, peak intensity or peak area correlation analysis across samples, and chromatographic peak shape correlation.
There unfortunately still remains wide differences in the numbers of “features” and biochemicals reported in literature [18-21]. This discrepancy is partly due to a lack of a standardized language in the field to discuss these data, but is certainly also dependent on what methods and data-processing were used. Ultimately the total number of biologically relevant biochemicals detected and identified, rather than the number of ion-features detected, is essential to the actual biological understanding.
While the tools to deconvolute the many ion-features produced by a single chemical are becoming available, there is still a lack of a clear picture as to "how many biochemicals are actually being detected" and a general lack of discussion concerning the source of the ion-features being detected and the amount of reproducibility associated with these types of analyses. Given the implications to successful metabolomics analyses we felt a comprehensive analysis and annotation of the ionfeatures detected in an electrospray ionization metabolomics dataset was warranted. We analyzed ion-features detected in an ion-trap mass spectrometry based metabolomics study of 10 human plasma samples. This analysis included the distribution of types of ion-features: adducts, isotopes, etc., the number of ion-features which could be classified as non-reproducible, an analysis of the number of ion-features that were also present in water blanks, i.e. artifacts, as well as an analysis and discussion of other unique ion-feature types not previously reported, to our knowledge, namely ion-features that are amalgams of monomers from two separate co-eluting chemicals and ion-features that arise from chemistry occurring within the extracted chemical samples while awaiting LC/MS analysis.
Ten plasma samples were selected randomly from a large cohort of Europeans with varying degrees of risk factors for insulin resistance. The subjects used in this smaller study ranged in age (30-59) and gender. In addition to these biological samples, aliquots of 18MΩ high purity water were also extracted to serve as process blanks.
The analytical method including sample preparation, LC/MS analysis, data processing and biochemical identification are reported in detail in  for the UHPLC method. Brief descriptions of each section are included below but for full method disclosure see .
Sample preparation: Prior to extraction, samples were stored at -80ºC. On the day of extraction, samples were thawed on ice. Proteins were precipitated from 100 μL of plasma with methanol using an automated liquid handler (Hamilton LabStar). The methanol contained four standards, which permitted the monitoring of extraction reproducibility. The resulting supernatant was split into two aliquots and dried under nitrogen and then further dried in vacuo. One of the two aliquots was reconstituted in 100 μL of 0.1% formic acid and the other was reserved for other analyses not discussed in this manuscript. In addition to the 10 human plasma samples, 100 μL of 18 MΩ high purity water was also extracted five independent times to serve as process blanks. Every sample analyzed was reconstituted with solvent that contained standards in order to monitor and evaluate instrument and extraction performance and align chromatograms. Most of these standards were isotopically labeled versions of endogenous chemicals and were carefully chosen so as not to interfere with the measurement of the endogenous species.
Instrument analysis: All 10 biological plasma samples were analyzed in triplicate; specifically three separate injections were made from the same extract vial. All sample injections were chromatographically separated using a Waters Acquity UPLC (Waters, Millford, MA). All samples were gradient-eluted at 350 μL/min using (A) 0.1% formic acid in water and (B) 0.1% formic acid in methanol (gradient profile: 0.5%-70% B in 4 min, 70-98% B in 0.5 min, 98% B for 0.9 min) using a 2.1 mm x 100 mm Waters BEH C18 1.7 μm particle column heated to 40ºC. A 5 μL aliquot of sample was injected using 2x overfill and analyzed using an LTQ ion-trap mass spectrometer (MS) (ThermoFisher Corp.) using electrospray ionization (ESI) (detailed MS conditions for positive ion analysis can be found in ). The extracts were monitored for positive ions. Injection order, including replicate injections, was randomized, i.e. the replicate injections were not analyzed back-to-back but rather at random times throughout the entire dataset. The instrument scanned 99-1000 m/z and alternated between MS and MS/MS scans utilizing dynamic exclusion at a rate of approximately six total scans/sec.
Data processing and analysis
Quality assessment: Once data collection was completed, the process coefficients of variation (CVs) were checked for all standards to ensure high quality data including consistent and non-trending instrument response, consistent and non-trending chromatography, and extraction reproducibility. Retention time alignment via internal standards was also checked and validated.
Peak detection and integration: Peak detection and integration were performed by proprietary software, but any vendor-supplied or open access software is sufficient for peak detection and integration. The outcome from this process is a list of detected chromatographic peaks characterized with m/z, retention and area under the curve values. Raw MS data was smoothed with a five point Gaussian smooth, prior to peak detection. Criteria for peak detection included signal to noise threshold of five, height threshold of 2000, area threshold of 15000, and peak width threshold of 0.035 min.
Chromatographic alignment: All samples were aligned chromatographically based on spiked standards. These standards were given a fixed retention index (RI) value [22-24]. The RIs of the experimentally detected peaks were determined assuming a linear fit between their individual flanking markers. Therefore, each compound’s RI was based on its elution relationship to its two surrounding retention markers. This system corrects for any systematic errors in retention time relating to column age or sample pH by setting everything to a locked scale via the retention markers.
Ion-feature binning for reproducibility assessment: The resulting ion-feature list, organized by mass to charge (m/z), retention time/ retention index and area, was processed per sample to find common ion features detected among the three replicates for the 10 different plasma samples or the five replicates among the water process blanks. To find and group common ion features among the replicate samples, ion feature data was binned by mass to charge and time . Binning parameters included a mass window of 0.4 m/z and a 35 RI unit window (approximately two sec) for all but the end of the chromatography 0-6 min (~0-6000 RI) and then a 55 RI unit window (approximately three sec) was used from six minutes to the end of data acquisition (~6000- end RI).
1. Sorting ions by their areas in descending order.
2. Bin ions with smaller areas around ions with larger areas, with the larger ions serving as the bin centers.
3. Calculate the statistics of each ion bin: mean mass, mean area, mean RI and their standard deviations, respectively, from all singlet ions in the bin.
4. Reset the bin center mass and center RI to its mean mass and mean RI to take into account the ion distribution within the bin. Remove all bins that have no singlet ions.
5. Re-binning all ions into these bins. If an ion can’t be binned into any of them, a new bin is created with its mass and RI as the center mass and RI.
6. Repeat steps 4 and 5 for optimized binning of all ions.
Bins were separated into groups by those with one out of the three common ion features per bin, two out of three common ion features per bin and so on. For those ion features binned and found in all three replicates, the average values of mass, area, and RI were calculated and stored.
Ion-feature bins grouped based on correlation: Only bins with ion-features detected in all three replicates and found in all 10 plasma samples were included in the correlation and grouping analysis. Due to study design (three replicates of 10 different plasma samples) correlation analysis could be skewed by the replicate injections of the same sample. For that reason, when correlation analysis was performed the mean area, mean RT, and the mean mass of the three ion-features per bin per plasma sample were used.
Software utilizing the QUICS method  was used to generate groups of ion features with high levels of correlation across all of the samples within the experiment. Each group is therefore composed of the various ion-features that are generated from any given molecule. Settings for the application of the method were that ions must fall within a 35 RI unit window and correlation must be greater than 0.7 across all samples. These groups were then compared to the internal spectral library for matching and identification against a library of standards. This methodology was particularly helpful in identifying insource fragments and unique adduct and multimer combinations that did not have previously characterized mass relationships.
Identification of artifacts: Artifact detection logic was applied at the bin and group level. This was accomplished by comparing the bins and groups identified within the water process blanks and the experimental samples. Any bin or group found in the experimental samples that was also found in the water process blank samples whose area was not at least five times higher in the experimental samples was automatically marked as a potential artifact.
Compound identification: The resulting data was searched against an in-house chemical library generated by analyzing over 2000 authentic standards through the method described above. Identifications are therefore based on three criteria: retention index within 150 RI units of the proposed identification, experimentally detected precursor mass match to the authentic standard within 0.4 m/z and a MS/MS fragmentation spectral score. The MS/MS score is based on a comparison of the forward and reverse scores, specifically whether all the ions present in the authentic standard library entry are present and at the correct ratios in the experimental MS/MS spectrum and vice versa. All proposed library calls were hand curated by an analyst who ultimately was responsible for library identification being accepted or rejected based on the above stated criteria.
Ion-feature analysis: A detailed analysis of all reproducible ionfeatures detected (i.e. present in all three replicates) in plasma sample B was performed. This analysis first included identifying which ion-features were isotopes, adducts, protonated molecular ions, multicharged ions, amalgams, in-source fragments and multimers based on approved biochemicals. Approved biochemicals are those biochemicals that were detected and identified from the in-house spectral library and included known named biochemicals from authentic standard analysis, unknown biochemicals routinely detected and monitored standards and artifacts. After the ion-features that originated from these biochemicals were accounted for, the remaining ion-features were analyzed. The groups created by the correlation analysis aided in identifying ion-features that were related even if their chemical composition was unknown. This analysis is particularly helpful in identifying in-source fragments and unique adducts.
How many of the ion-features detected are noise, i.e. nonreproducible? In order to estimate this, each biological sample was run in triplicate. The ion-feature totals per replicate, per 10 biologically variant European plasma samples are shown (Figure 1). The total number of ion-features detected across the technical replicates and also from plasma-to-plasma sample does not vary highly. The similarity in ion-feature totals across the different plasma samples might be related to the highly homeostatic nature of plasma in healthy individuals. However, given human variability there were a significant number of ion-features that were only reproducibly detected in sub-fractions of the plasma samples tested, including a large number of ion-features only reproducibly detected in one out of the ten samples tested (Table 1). Many of these include OTC or prescription drugs and their metabolites as well as lifestyle indicators such as nicotine metabolites.
|Number of Plasma Samples out of 10||Number of eproducible Ion-feature Bins|
Table 1: The number of reproducible i.e. detected in all three technical replicates, ion-feature bins detected in varying sub-fractions of plasma samples. Eg. There were 260 reproducible ion-feature bins detected in 7 out of the 10 plasma samples.
The technical replicate ion-feature totals within each plasma sample are also quite consistent (Figure 1). However, based on an analysis of the same ion-features showing up in these technical replicates, this consistency in total ion-feature counts is not an indication of a high degree of ion-feature reproducibility. An ion-feature was considered to be the same across replicates if the mass of the detected ion-feature was within a 0.4 m/z window and eluted within a ~2 sec window. The number of ion-features detected in all three technical replicates of the same plasma sample averages around 28% (Table 2). Another 24% of ion-features detected were detected in two out of the three replicates and 30% were detected in only one out of the three replicates. Approximately, 19% of the ion-features fell into the “other” category, which includes ion-feature bins where more than one feature was detected within the specified windows. Manual inspection of some of the ion-features falling into the “other” category showed many partially resolved chromatographic peaks leading to irreproducibility. Overall, this analysis indicates that over 70% of the ion-features detected were not reproducible as a result of miss-integration or non-detection. Perhaps not surprisingly, the ion-features only detected in one-outof- three technical replicates have the lowest overall area distribution followed by the two-out-of-three category then the three-out-of-three technical replicates category (Figure 2). This supports the hypothesis that the reason for much of the poor reproducibility is that many of these ion-features represent a population of border-line detectable peaks.
|Ion-feature Reproducibility||Detected in 3 out of 3 replicates||Detected in 2 out of 3 replicates||Detected in 1 out of 3 replicates||Other|
|% of total Ion-features in plasma B||29%||24%||30%||17%|
|Avg over all 10 plasma samples||28%||24%||30%||19%|
Table 2: Ion-feature reproducibility based on a sum total of 34053 ion-features detected in the three replicates of plasma sample B.
The total number of ion-features detected will certainly strongly depend on the instrumentation and method used. For example, a much longer chromatographic gradient will result in the generation of larger ion-feature totals, as will accurate mass instrumentation which is capable of higher sensitivity measurements as a result of lower noise. Presumably the percentage of non-reproducible ion-features would likely still be quite high with other methods as there would still be compounds present at borderline detection levels.
Distribution of ion-feature type
Of the ion-features detected, how many are protonated molecular ions (m+H)? How many are adducts? When attempting to identify ion-features, the majority of databases assume that the detected ion represents a protonated molecular ion when searching for potential matching molecular formulas. It is therefore critical to understand how often this assumption is valid. For this analysis, all ion-features reproducibly detected in all three technical replicates of a single plasma sample were considered and manually annotated. These ion-features were searched against an in-house authentic standard spectral library of over 2000 known chemicals, plus over 2400 previously detected and documented unknown chemicals. Ion-feature type, i.e. isotope, adduct, multimer, etc., was only assigned after visual inspection. It is important to note that almost 75% of ion-features detected did not show any relationship with other ion-features, in other words there was no known mass relationship or high degree of correlation among these ion-features. In these cases there was no way to gauge the ion-feature type based on a lack of supporting information (Figure 3A).
Ion-features were divided into seven different types: isotopes, protonated molecular ions (m+H), adducts, amalgams, in-source fragments, multi-charged ions, and multimers (Figure 3A and 3B). Ion-features were only classified if there was data to support the classification. Supporting data could be the presence of other ions such as consistent adducts to frame the molecular ion or a MS/ MS fragmentation match to a known chemical in the library. Ionfeatures were classified as follows: if an ion-feature was an isotope then it was placed in this category regardless of whether it was an isotope of the parent molecule or whether it was an isotope of an adduct, multi-charged ion, in-source fragment, etc. The adducts most commonly detected were Na and NH4, however other adduct forms were also detected. An ion-feature could be classified an adduct even if the adducting species was unknown, i.e., not one of the commonly detected ESI+. We classified the ion-feature as an adduct if the ion-feature showed a high degree of signal correlation, sample to sample, with the protonated molecular ion and co-eluted with the molecular ion. Adducts were classified as such regardless of whether they were an adduct of a molecular ion, multimer, in-source fragment, etc. In-source fragments were classified by matching them to the authentic standard library, i.e., if the analysis of the authentic standard showed an insource fragment then the experimentally detected ion-feature could be classified an in-source fragment. The main exception to this was in the case of peptides where in-source fragments could be readily identified by looking at the MS/MS spectrum for the intact peptide. Often the major fragment ions seen in the MS/MS spectrum are the same as those ion-features arising from in-source fragmentation.
The most common ion-feature type, of those able to be identified, was isotopes at 38% (Figure 3B). This is not surprising as all other ion-feature types; adducts, multimers, etc., have isotopes if present at high enough levels to be detected. The high percentage of isotopes is quite advantageous as isotopes are by far the easiest ion-feature type to identify and some of the commercially available software data handling packages already offer isotope removal logic [13-17,26]. Adducts and protonated molecular ions make up the next two largest ion-feature types, both making up ~25% of the identified ion-features. The remaining 12% is divided among in-source fragments, multi-charged ions, multimers, and amalgam ion-features. The number of ion-features detected on a per molecule basis is still low (Figure 4), with almost 85% of the molecules detected producing five or fewer ion-features.
These data are similar to previously reported ion-feature annotation studies where protonated molecular ion ion-features did not represent the major ion-feature type [11,12]. It is important to note that since the percentages reported here only include those that were able to be annotated i.e. not taking into account the large percent of ion-features with no readily discernable annotation, the percentages are larger than they would be if all detected ion-features were used in the calculation.
This analysis suggests that, assuming an analyst can readily identify isotopes, for the most part an ion-feature has an equal chance of being an adduct or a protonated molecular ion. However, within the adduct population some adducts are more readily identifiable than others and any ability of the analyst to successfully identify an adduct as such will shift this 50/50 chance. For example, a Na adduct can be readily identified based on the unique mass contribution of Na, whereas it is far more difficult to identify a NH4 and various solvent adducts (MeOH, ACN, etc.) because there are no atoms unique to these adducts that are not commonly seen in the molecules themselves. In the case of NH4 or solvent adducts, if there is no indication that the detected ionfeature is an adduct, then the molecular formula posed will include the adduct and databases searches will yield inaccurate chemical structure possibilities.
These data are highly dependent on the type of ionization technique and mass spectrometer used as well as the database/library used for identification. Certainly one could expect a very different profile of ion-feature types if atmospheric pressure chemical or photo ionization (APCI and APPI, respectively) techniques were used. These particular ionization mechanisms tend to induce more in-source fragmentation and fewer adducts and multimers. In addition, the analysis of the ionfeature types is heavily influenced by the size and “completeness” of the library used for identification of the molecules detected. If the perfect database of all chemicals existed, the ratios of these ion-feature types would likely shift somewhat, but to what extent is difficult to assess. While this analysis is certainly heavily influence by the method used we feel it is still relevant beyond the boundaries of this specific method .
Perhaps one of the most interesting findings while annotating ionfeature types was the discovery of what we are terming “amalgam” ion-features. Amalgam ion-features arise from the interaction of the monomers of two co-eluting biochemicals and are often seen when two abundant biochemicals co-elute or closely elute. The m/z of the detected “amalgam” ion is the sum total of the two separate molecules (Figure 5, panels A, B and C) and the fragmentation pattern often shows the neutral loss of one or both of the intact molecules (Figure 5, panel D). Amalgam ions can be thought of as a type of adduct, however as they are dependent on the chromatographic resolution of the biochemicals and vary sample type to sample type, we felt they should be categorized separately. Amalgam ion-features are most easily identified by looking at the fragmentation spectrum. An amalgam ion-feature is unique because while essentially an adduct it does not co-elute with either molecule making up the amalgam but rather elutes only where the two molecules overlap. This is demonstrated in Figure 5, panels A, B and C where the amalgam ion in panel C does not perfectly co-elute with either monomer (panel A and B). At present it appears that any coeluting or partial overlapping strong signals can form amalgam ions. The ratio of amalgam ion-feature to the two monomers is quite low (Figure 5 panels A, B, and C) thus for the amalgam to be detectable the two monomers must be abundant. Other examples of amalgam ionfeatures that have been detected in other data sets include urea amalgam adducts which are often detected in urine samples where urea is present at very high levels and often elutes over a wide time range (data not shown). Amalgam adducts of nicotinamide and hypoxanthine, glucose and lactate, glucose and glutamine, as well as several amalgam adducts of various known compounds with unknown compounds have also been detected experimentally (data not shown).
Figure 5: Amalgam ion-features are formed when two monomers of abundant and closely eluting biochemicals adduct with one another forming a unique “dimer”. Panel A shows the XIC for a biochemical at m/z 205 in positive ESI and Panel B shows the XIC for a biochemical at 210 m/z in positive ESI. Panel C shows the XIC for the amalgam ion which is a dimer of one monomer of the 205 ion and one monomer of the 210 ion. Panel D shows the positive ion iontrap MS/MS spectrum of the 414 m/z amalgam ion where the fragment ions are the individual monomer masses.
Ultimately, the analysis of ion-feature type points to the need for intelligent deconvolution software packages that are able to readily identify isotopes, adducts, etc, particularly when ESI techniques are utilized. Fortunately there has been a recent increase in the number of software packages capable of said deconvolution, but their wide spread use have yet to be implemented [11,13-17]. Unfortunately, the common methodology is still to perform statistical analysis on all detected ion-features without any ion-feature deconvolution (ion-centric). Given the large percentage of non-protonated molecular ion species, this method is inefficient and time-consuming in terms of identifying the chemicals being altered, but also lead to more false discoveries as a result of more measurements being processed through a statistical analysis . In addition, when using this approach the list of statistically significant ion-features will likely include multiple versions of the same biochemical. As an example, the sodium adducts, dimer, protonated molecular ion and all associated isotopes of a single biochemical might be listed as statistically significant ion-features. The analyst will have taken the time to determine each of these ion-features and ultimately have only discovered one interesting biochemical.
A different approach to address the many redundant ionfeatures types is what we call a chemo-centric approach. In a chemocentric approach, detected ion-features are first identified, including determining which ions are isotopes, adducts, multimers, etc. of the various detected biochemicals. This allows one to choose one ion per biochemical to represent that biochemical in statistics, simultaneously removing the others redundant measurements from the data set. This approach dramatically reduces the number of measurements being process through statistics thus reducing the number of false discoveries and simplifying the resulting data output. The previously mentioned software applications use this chemo-centric approach. These programs attempt to group related ion-features by using chromatographic peak shape, cross-sample correlation and/or known mass relationships and ultimately utilize one of these ion-features, presumably and ideally the protonated or deprotonated molecular ion, for statistics and identification purposes. There are limitations and potential pit falls associated with using these applications particularly if using them for identification purposes, namely accuracy of the algorithms to determine the molecular ion. However, they do function to deconvolute and reduce redundant data and offer significant advantages and improvements to the standard ion-centric method.
Another method of identifying the ion-features in order to properly deconvolute them, and one of the methods used in this study, is by use of an authentic standard spectral library. The library used in this study contained enough information to permit confident identification of all of the various types of detected ion-features. The library used in this study contained expected retention time/index, m/z, and MS/MS spectral data on all molecules present in the library, including their associated adducts, in-source fragments, multimers, etc. For example, with this library, when one experimentally detects a sodium adduct it can be immediately identified as such and not a new biochemical. In this way, the library permits rapid and high confidence identifications of a compound’s protonated molecular ion as well as associated ion-features without need of further analyses. The most significant drawback to this type of approach is the time and financial resources needed to build and maintain an extensive spectral library. In addition, one must still have a means of grouping ion-features from chemicals not present in the library . Overall, the time savings associated with using a spectral library is significant when considering reduction of false discoveries and reduced complexity of data output.
Distribution of ion-feature source
Of the chemicals detected how many of them were of biological origin and how many were contributions from the analytical process? Not all biochemicals detected in a traditional global small molecule profiling study originate from the biological matrix under study. An analysis of this data set showed that only 63% of the ion-features (74% of the biochemicals) detected originated from the biological matrix (Figure 6, “Other”). The remaining 37% of ion-features (26% biochemicals) were a combination of standards (15%), artifacts (21%), and process artifacts (1%).
Figure 6: Ion-feature totals per biochemical class or type. Pie chart is based on the 299 approved biochemicals detected in all three replicates of the Plasma B sample. 640(220) designates 640 total ion-features detected from 220 named and unknown biochemicals. Standards are isotopically labeled compounds spiked into every sample. Artifacts are defined as compounds detected in both the plasma samples and the water process blank where the plasma sample did not exceed 5x the levels seen in the blanks. Process artifacts are compounds not detected in water process blank but are known products of chemistry occurring within the extracted samples. “Other” refers to detected biologically derived ion-features/biochemicals. In Legend, percentage value is percentage of total ion-features detected.
The preponderance of ion-features arising from only 14 standards exemplifies the fact that the larger the MS response the more ion-features are typically detected. This fact raises the question of what effect does differential ion-feature production have on statistics in an ion-centric / all ion-feature statistical analysis? Do biochemicals producing more ion-features differentially affect the statistical outcome? Is there any possible effect of over/under representation of biochemicals through ion-features that may skew the statistics? As with any statistical results this will likely depend on the type of statistical analysis performed, but being aware of the potential issue is important.
Artifacts comprise 21% of ion-features and 20% of the unique biochemicals detected in this study. Artifacts are defined as biochemicals detected in the samples resulting from storage and processing of those samples and not of biological significance as a result. These ion-features/ biochemicals include plasticizers used to soften plastic tubing and storage vials, releasing agents for ease of separating plastic from moulds as well as chemicals added to the samples for stabilization purposes, e.g. EDTA. Many of these ion-features/biochemicals were determined by analysis of a water blank carried throughout the entire process sideby- side with the experimental samples. Any artifacts extracted into the biological experimental samples were also extracted into the water blank. Any ion-features detected in the water process blank samples whose area was not at least 5x lower than the experimental sample detection area was automatically marked as an artifact.
Perhaps the most difficult ion-feature/compound artifact to identify is process artifacts. Process artifacts are artifacts arising from chemistry occurring within the small molecule extract and not of biological origin and only comprised 1% of the ion-features detected. These biochemicals are difficult to identify because they are not present in water blanks as they only arise when the complex mixture of extracted chemicals are in solution together. These process artifacts were initially discovered and identified when glycosylated forms of deuterated amino acid standards were detected in experimental samples. Further experimentation demonstrated that a neat mixture of natural amino acids in the presence of glucose readily formed glycosylated amino acid derivatives in aqueous and neutral conditions typical of sample reconstitution and extraction conditions (data not shown). While the possibility remains that some portion of these glycosylated amino acids are biologically relevant, since there is known process contribution their quantity and biological significance is called into question. While thus far these glycosylated amino acid artifacts have been the only identified process artifact, it is possible that chemistry is also occurring between glucose/sugars with a variety of other molecules or even between as of yet unidentified compounds.
There is little discussion in literature about the prevalence and number of ion-features/chemicals detected that are non-biologically derived artifacts. Given the large number of artifacts (21% of ionfeatures detected) and process artifacts (1% of ion-features detected) in global profiling they can contribute significantly to false discovery. This study demonstrates the importance of analyzing some sort of blank in order to remove these signatures prior to statistical analysis, however as mentioned previously blanks will not help the analyst identify process artifacts.
Number of compounds detected
Together these data show that from an original injection where ~10,000 ion-features were detected only 220 compounds of biological origin could be identified. The total number of compounds detected and identified is directly related to the methods used as well as the completeness of the library/database used. There are a number of experimental ways to increase the number of compounds detected; one can diversify one’s method, e.g. monitoring negative ions, or one can collect additional data streams, e.g. collecting GC/MS data or HILIC based method. On the other side, increasing the number of compounds identified is most heavily influenced by the library/database used. Again, if the perfect database of all biochemicals existed, then more compounds would certainly be identified. The extent of the increase in number of detected compounds is difficult to assess. However, even if the number of identified biochemicals could be three times higher there is still a significant difference between the number of ion-features detected and the number of compounds detected/identified in global biochemical profiling.
This analysis points out the non-trivial composition and source of ion-features produced in a global small molecule profiling/ metabolomics analysis using electrospray ionization. It was found that nearly 70% of all ion-features detected were non-reproducible as a result of non-detection or miss-integration. Multiple ion-features were produced per molecule such that only 25% of the reproducible ion-features that could be annotated were determined to be protonated molecular ions. In addition, large percentages of the detected ionfeatures arose from non-biological sources (22% artifacts and 15% standards). These data demonstrate the need for software approaches directed toward identifying and removing the redundant ion-features prior to statistical analysis as well as pointing out the importance of collecting data that allows the analyst to readily identify artifacts. In total, this analysis clearly demonstrates that the actual number of biologically relevant biochemicals detected in this global biochemical profiling study is much lower than the total number of ion-features detected in this type of analysis.
The authors thank the platform, IT and software development teams for their assistance in collecting and processing the data presented in this manuscript. We also want to thank Sandy Stewart, Director of Platform Operations; Mike Milburn, CSO; and John Ryals, CEO for their support and guidance.