Use of Pharmacogenetic Testing in Routine Clinical Practice Improves Outcomes for Psychiatry Patients

Introduction: In clinical trials, pharmacogenetic testing has been shown to improve outcomes in psychiatric patients. It is unclear whether these improved outcomes translate into routine clinical practice. A significant impediment to evaluation of pharmacogenetic testing in routine practice has been a lack of quantifiable outcomes data. This study leverages longitudinal symptom evaluations using validated computer-based assessments to evaluate the impact of pharmacogenetic testing across a number of psychiatric symptom dimensions in routine clinical practice.


Introduction
Despite a substantial number of FDA-approved medications, many psychiatric conditions remain challenging to treat. In particular, depression and psychosis have low success rates with the initially selected medication. In fact, the initial medication typically achieves meaningful response in 40% or fewer of patients [1].
The advent of personalized medicine offers an objective, biological tool to enhance success rates. Genetic variation in metabolic enzyme genes has long been known to alter the pharmacokinetics of drug metabolism. Indeed, many psychiatric medications have language in their labels specifically highlighting metabolism by enzymes with known genetic variants that alter activity.
Retrospective analysis of clinical cohorts demonstrates the correlation of genetic information with various outcomes. De Leon et al. have described how cytochrome P450 (CYP) 2D6 poor metabolizer status greatly increases the risk of adverse events for risperidone-treated patients [2][3][4][5][6][7][8]. The Mayo Clinic has an extensive publication history dealing with pharmacogenetic (PGx) testing and depression [3,4]. A number of groups have reported on the impact of genetic variation in the serotonin transporter [5,6]. The Clinical Antipsychotic Trials of Intervention Effectiveness working group has published the results of PGx analysis of response to antipsychotics [7,8]. Similarly, many analyses for large multicenter trials looking at the impact of genetics on treatment response in depression have been published, including both gene-specific and large-scale genetic analyses [9][10][11][12].
Within the past few years, some prospective studies have been conducted. The Mayo group has published several studies showing that therapy directed by genetic testing outperforms standard of care for the treatment of depression [3]. Herbild et al. reported on a prospectively designed study demonstrating that PGx-guided therapy led to lower health care costs, presumably through better treatment outcomes in schizophrenia [13]. The two studies shared a focus on CYP variants. The Mayo studies also evaluated variation in the gene encoding the serotonin transporter. Each of these studies demonstrated the benefits of incorporating PGx testing in a controlled setting. Recently, consortia have begun publishing consensus recommendations for antidepressant dosing based on metabolism gene status [14]. Olgiati et al. reported that PGx testing can be cost effective in high income European Union countries, which have economies similar to the United States [15]. More recently, confirmatory findings are beginning to emerge from studies in naturalistic settings [5].
diversity. Accordingly, it has become increasingly important to assess the efficacy of medical interventions in routine clinical practice. These kinds of assessments can help to establish whether technological advances (such as PGx testing) improve quality of care and, consequently, whether they are cost-effective.
The present study examined the impact of use of a commercial PGx test, the SureGene Test for Antipsychotic and Antidepressant Response (STA²R), in routine clinical practice in a private setting (The Neuropsychiatric Clinic [NPC] at Carolina Partners in Raleigh, North Carolina). The NPC patient population demographics are reflective of the American South-East, modified by the presence of numerous universities and high technology companies located in the Research Triangle Area of North Carolina. Typically, evaluating results in a non-controlled setting can be troublesome due to lack of standardized measures of efficacy. However, the NPC routinely assesses patients with a validated computer-based self-report instrument, the NeuroPsych Questionnaire-Short Form (NPQ) [16]. The NPQ is part of a commercially available computerized neurocognitive test battery (CNS Vital Signs). The NPQ has been shown to correlate with standardized rating scales including the Hamilton and Beck depression and anxiety rating scales [16]. This availability of a consistent measure of psychiatric symptom burden enabled a direct comparison between tested and untested patients and evaluation of the benefit of STA²R testing in a routine setting.

Patients
The patients for this analysis came from The NPC at Carolina Partners, a private practice in Raleigh, North Carolina. The Neuropsychiatric Clinic staff routinely assesses patients with various computer-based assessments (CBAs). These assessments are monitored by psychometricians. In particular, most patients complete the NeuroPsych Questionnaire-Short Form (NPQ) at each visit [16].
The inclusion and exclusion criteria for the study were as follows: (1) All patients with at least four assessments with NPQ between July 2012 and August 2013 were included.
(2) Patients that had undergone transcranial magnetic stimulation (TMS) therapy at the NPC at any time were excluded.
Both tested and untested patients who met these criteria were included in this retrospective analysis. There were sufficient NPQ data for a total of 57 untested and 74 tested individuals to be included in the study. Table 1 lists the frequency of the various diagnoses for tested and untested subjects. Most subjects had more than one diagnosis. As expected for a neuropsychiatric practice, the majority of patients also had at least one neurological disorder.
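The two selection criteria can be expressed as a simple filter over assessment records. The following is a minimal sketch, not the procedure actually used at the NPC: the record layout, the function name `eligible_patients`, and the exact window bounds (the text gives only the months) are assumptions made for illustration, and the records are made up.

```python
from collections import Counter
from datetime import date

# Assumed window bounds: the text states only "July 2012" and "August 2013".
WINDOW_START = date(2012, 7, 1)
WINDOW_END = date(2013, 8, 31)
MIN_ASSESSMENTS = 4


def eligible_patients(assessments, tms_patients,
                      start=WINDOW_START, end=WINDOW_END,
                      min_assessments=MIN_ASSESSMENTS):
    """Return the sorted IDs of patients who meet both criteria.

    assessments: iterable of (patient_id, assessment_date), one per NPQ.
    tms_patients: set of IDs of patients who received TMS at any time.
    """
    # Criterion 1: count NPQ assessments that fall inside the window.
    counts = Counter(pid for pid, day in assessments if start <= day <= end)
    # Criterion 2: drop anyone who ever underwent TMS therapy.
    return sorted(pid for pid, n in counts.items()
                  if n >= min_assessments and pid not in tms_patients)


# Made-up records for illustration only.
records = [
    ("A", date(2012, 7, 15)), ("A", date(2012, 10, 1)),
    ("A", date(2013, 1, 10)), ("A", date(2013, 6, 20)),
    ("B", date(2012, 6, 30)), ("B", date(2012, 9, 1)),   # first visit is out of window
    ("B", date(2013, 2, 1)), ("B", date(2013, 8, 15)),
    ("C", date(2012, 8, 1)), ("C", date(2012, 11, 1)),
    ("C", date(2013, 3, 1)), ("C", date(2013, 7, 1)),
]
print(eligible_patients(records, tms_patients={"C"}))
```

In this toy data, patient B has only three assessments inside the window and patient C received TMS, so only patient A would be retained.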

Clinical data
The primary assessment used to determine clinical progress for this study was the NPQ. This self-reported CBA demonstrates good reliability both for retesting patients over time and between raters, and it is sensitive to treatment effects [16]. The test is provided through an online questionnaire, and its ease of use along with the psychometric properties noted above allows it to be used routinely in clinical practice. Only retrospective, de-identified clinical outcomes data were used. The principles outlined in the Declaration of Helsinki were followed.

Statistical methods
We used the Student's t-test to test the hypothesis that there were no significant differences between baseline NPQ scores of tested and untested patients. To test whether the genetic testing influenced patients' improvement as measured by the change of NPQ item score (dScore), we built a general linear model for dScore that included days from the initial assessment and median-centered baseline, both nested in Case (tested) / Control (untested) status. We used all available observations up to day 300 from each subject to fit the model [17]. We used the model for two purposes. First, the model tested whether the slope of the line indicating change over time differed significantly from zero (i.e., a significant difference indicates that there is a treatment effect). Second, the model was used to predict response at 300 days of treatment at the NPC. We then compared the average predicted improvement at 300 days for tested versus untested patients using the t-test and further evaluated those dimensions that showed significant differences using linear regression with covariate adjustment for baseline score.
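To make the model structure concrete, the sketch below fits a linear model for dScore with days and median-centered baseline each nested in tested/untested status, then predicts the change at day 300. It is an illustration under stated assumptions, not the authors' analysis code: the shared intercept, taking the median over observation rows, the ordinary least-squares solver, and all function names and data are hypothetical, and the significance tests on the slopes are omitted.

```python
from statistics import median


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def design_row(days, base_c, tested):
    """Columns: intercept, days|tested, days|untested, baseline|tested, baseline|untested."""
    t = 1.0 if tested else 0.0
    return [1.0, days * t, days * (1 - t), base_c * t, base_c * (1 - t)]


def fit(rows, max_day=300):
    """rows: (days, baseline, tested, d_score). Returns (coefficients, baseline median)."""
    rows = [r for r in rows if r[0] <= max_day]  # observations up to day 300 only
    med = median(r[1] for r in rows)             # simplification: median over rows
    x = [design_row(d, b - med, t) for d, b, t, _ in rows]
    y = [r[3] for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty), med                  # least squares via normal equations


def predict(coef, med, days, baseline, tested):
    """Model-predicted dScore for one patient at a given day."""
    return sum(c * v for c, v in zip(coef, design_row(days, baseline - med, tested)))


# Noise-free synthetic data generated from made-up coefficients.
TRUE = [0.5, -0.02, -0.005, -0.3, -0.2]
rows = []
for tested in (True, False):
    for days in (0, 30, 90, 200):
        for baseline in (4.0, 6.0, 9.0):
            d_score = sum(c * v for c, v in
                          zip(TRUE, design_row(days, baseline - 6.0, tested)))
            rows.append((days, baseline, tested, d_score))

coef, med = fit(rows)
print([round(c, 4) for c in coef])
print(predict(coef, med, 300, 6.0, True), predict(coef, med, 300, 6.0, False))
```

Because both covariates are nested in group, each group gets its own slope over time and its own baseline adjustment; a negative slope corresponds to a falling symptom score.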

Genetic testing
The patients were tested with the commercially available STA²R test. This test includes several genetic markers that provide information regarding the likely efficacy of olanzapine (sulfotransferase family member 4A1 gene [SULT4A1]) [9,10], and of selective serotonin reuptake inhibitors (serotonin transporter gene [SLC6A4]) [18,19]. Additionally, STA²R provides information on the probable, genetically determined activity level of several cytochrome P450 (CYP) genes: 1A2, 2C9, 2C19, 2D6, and 3A4/5. The CYP enzymes metabolize a broad range of medications including, but not limited to, commonly used antipsychotics (aripiprazole, clozapine, haloperidol, olanzapine, quetiapine, and risperidone) and antidepressants (amitriptyline, citalopram, duloxetine, escitalopram, fluoxetine, nortriptyline, and venlafaxine). For patients having CYP genetic variation that results in altered activity levels, STA²R provides information that facilitates physician-directed dose adjustments for many of the most commonly prescribed medications.

Results
A total of 131 patients, 74 tested and 57 untested, from the NPC met the study criteria. For the tested patients, Table 2 lists the tested genes, commonly used drugs impacted by the tested gene, and the frequency of test results for each gene indicating altered metabolism, function, or efficacy in the tested population. All of the genes had a meaningful percentage of patients with altered function, ranging from 8% for CYP 3A4 to 86% for CYP 1A2. Predictors of efficacy, SULT4A1 and SLC6A4 (for olanzapine and SSRIs respectively), occurred frequently, 29% and 76% respectively. Of tested patients, only 1 of 74 had no impactful variants.
Using the statistical model described in the Methods section, we evaluated whether or not the slope of the line describing patient improvement over time differed significantly from zero. The model used for this analysis included an adjustment for baseline (median centered). For all symptom dimensions, baseline score was a highly significant term in the model, with all p-values < 10⁻⁶. Table 4 lists the estimated slope of the change in each NPQ dimension score over time for tested and untested patients. The Table 4 values represent the change per day in the outcome measure. A negative slope indicates a decrease in symptom burden, corresponding to clinical improvement.
In tested patients, aggression, anxiety, depression, fatigue, impulsivity, mood instability, and panic showed significant daily improvement, i.e. had negative slopes that differed from zero. In untested patients, only mood instability had a significantly negative slope. After correcting for multiple comparisons, anxiety, panic, and mood instability continued to show significantly negative slopes in the tested group, while no domains did so for the untested group.
Table 5 provides the mean model-predicted response for each of the symptom dimensions for tested and untested patients. As above, the model corrects for differences in baseline scores. Notably, for eight of the 12 dimensions, tested patients experienced significantly greater decreases in symptom burden than untested subjects. While the predictive model incorporated baseline scores, we performed a linear regression analysis that explicitly included a baseline covariate to eliminate the possibility that the significant differences between tested and untested patients could be attributed to the significant differences in baseline values reported in Table 3.
Table 6 shows the results from a linear regression analysis of the significant items from Table 5: aggression, anxiety, depression, fatigue, impulsivity, mood instability, panic, and suicide. After correction for baseline score, all eight symptom dimensions demonstrated significantly greater symptom reduction in tested subjects. Indeed, all dimensions except depression retained or improved the degree of significance. We also calculated the effect size for the difference in response between tested and untested patients. All eight dimensions had an effect size greater than the value (0.25) often used to separate a drug from placebo in clinical trials.
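The two summary calculations mentioned above, a correction for multiple comparisons and an effect size, can be sketched as follows. The correction method is not named in the text, so the Bonferroni rule used here is an assumption; the effect size follows the Beta/SD definition given with Table 6, although which standard deviation is meant is also not specified. Function names and all numbers are made up for illustration and are not values from the study's tables.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant when p < alpha / number of tests.

    Bonferroni is an assumed choice; the source does not name its method.
    """
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


def effect_size(beta, sd):
    """Standardized effect: regression coefficient divided by a standard deviation."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return beta / sd


# Made-up p-values for four hypothetical dimensions (threshold = 0.05 / 4).
example_p = {"anxiety": 0.001, "panic": 0.003, "fatigue": 0.03, "sleep": 0.40}
flags = bonferroni_significant(example_p)
print(flags)

# Made-up coefficient and SD; the magnitude is compared with the 0.25 benchmark.
es = effect_size(beta=-1.5, sd=3.0)
print(abs(es) > 0.25)
```

With twelve NPQ dimensions, the same rule would place the per-test threshold at 0.05/12, which is why fewer dimensions remain significant after correction than before.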

Discussion
We found that multiple, clinically important psychiatric symptoms improved significantly more for patients with pharmacogenetic (PGx) testing compared to those who were not tested. On average, the tested patients tended to have higher baseline NPQ values than untested patients, reflecting a likely bias toward performing PGx testing for patients who are more ill and/or treatment refractory. It is important to note that our response model incorporated baseline NPQ values, and, therefore, the greater improvement seen for tested patients cannot be attributed to higher baseline values. The magnitude of the enhanced response enabled the tested patients to achieve similar final symptom burdens despite having significantly higher baseline values for many dimensions. Furthermore, to rule out the possibility that the observed difference was driven by differences in baseline score, we conducted linear regression analysis including baseline score as a covariate, which confirmed that a significant portion of the observed difference was due to tested status.
For three of the four dimensions that did not show superior improvement in tested versus untested patients (memory, pain, and sleep), the test results did not provide information that would meaningfully impact the treatment decision. For attention, the test results provided information regarding dosing of amphetamines, patients for whom atomoxetine was inappropriate, and no information regarding use of methylphenidate. Conversely, with the exception of fatigue, all of the dimensions for which tested subjects displayed superior response compared to untested patients were treated extensively with antidepressants and antipsychotics. As shown in Table 6, most test results will have vital information concerning the selection and/or dosing of the commonly used antidepressants and antipsychotics. For example, seventy to ninety percent of test reports will provide actionable information on the most common SSRIs. Similarly, over fifty percent of test reports will provide actionable information regarding the use and/or dosing of aripiprazole, olanzapine, risperidone, and/or quetiapine.
While this study had several differences from tightly controlled clinical trials, such as baseline differences between the two study groups, lack of randomization and blinding, as well as inclusion of multiple diagnoses, these issues are an intrinsic component of routine clinical practice [3,4,20-24]. Additionally, rather than focusing on a single, disease-specific efficacy measure, e.g. Hamilton Rating Scale for Depression or Positive and Negative Syndrome Scale, this study focused on improvement across a wide range of common psychiatric symptoms that are more reflective of typical psychiatric patient populations. Furthermore, this study did not dictate a specific treatment algorithm or plan. The study simply provided the treating physician with information regarding how the biology and genetics of individual patients were likely to influence the pharmacokinetics and pharmacodynamics of commonly used psychotropic medications and classes of medication. The treating physician then incorporated this information into a personalized treatment plan for each individual patient. These issues and differences between initial, controlled clinical trials and real-world care will always be present when translating any new therapy or testing methodology into routine clinical use. Nonetheless, this naturalistic study does reflect routine practice at a specific clinic trying to optimize patient outcomes, and as such, provides evidence of the value of PGx testing in psychiatric practice. This study provides evidence that incorporating PGx testing into psychiatric practice can benefit appropriate patients [17,24]. This study thus provides some of the first evidence that the superior treatment response observed in patients undergoing PGx testing compared to untested patients in structured clinical trials indeed translates into better outcomes for tested patients in routine clinical practice [3,4,21].
Moreover, virtually all tested patients had at least one genetic variant with actionable information, indicating that routine testing would be beneficial to most patients being treated with psychotropic medications.

Table 5 footnotes: (1) Each row represents one of the twelve symptom dimensions for which the NPQ provides quantitative values of severity; (2) Mean is the mean of the model-predicted values for each of the two patient groups; (3) SD is the standard deviation of the mean; (4) The difference is calculated as (Mean of Tested) - (Mean of Untested). A negative number indicates that Tested patients experienced greater symptom reduction than Untested patients; (5) P-value is calculated by applying the t-test to the distribution of the change in dScore for each symptom dimension for Tested vs. Untested patients.
Table 6: Linear regression analysis of predicted change in NPQ values at day 300 using baseline NPQ score as a covariate. Footnote (5): Effect size is a measure of the relative magnitude of the response and is calculated as Beta/SD.

Conclusion
The use of the STA²R test in routine clinical practice can enable significant improvement in clinically important outcomes for psychiatric patients. Hopefully, as the use of personalized medicine increases, more evidence from routine clinical practice will emerge that assesses outcomes for appropriately used PGx testing. As the healthcare environment continues to change in the United States, it will be more important to evaluate the impact of new technologies, including PGx testing, on clinical outcomes in order to justify their use [13,15].
In routine clinical care, if such testing results in significantly better improvement in symptom burden, as found here, the increased costs of testing may be justified. Further, if real-world outcomes are better for tested patients, this may encourage psychiatrists to incorporate use of this technology to benefit their patients and payers to provide access to and coverage for PGx tests.

Disclosures
Sandeep Vaishnavi and Elizabeth Griffin are employees of Carolina Partners, a North Carolina psychiatric practice. Qian Liu has no conflicts. Timothy Ramsey is a shareholder in SureGene, LLC and an employee of Clinical Reference Laboratories. Mark Brennan is a shareholder in SureGene, LLC.