Meta-Analysis of Psychopharmacologic Treatment of Child and Adolescent Depression: Deconstructing Previous Reviews, Moving Forward

The proliferation of Selective Serotonin Reuptake Inhibitors (SSRIs) in the 1980s led to increased use of antidepressant medications for children and adolescents with Major Depressive Disorder (MDD). Since then, there have been 18 reviews of this literature, nine of them meta-analytic. Many of these meta-analyses suffer from methodological problems: they did not statistically compare medication efficacy, included only randomized placebo-controlled trials, calculated response rate rather than risk difference or odds ratio, were conducted prior to the 2009 publication of the PRISMA meta-analysis standards, rarely addressed publication bias, and failed to conduct meta-regressions to account for moderator variables. The purpose of the present meta-analysis was to address each of these limitations. Results indicated that SSRIs were the most effective class of medication, with Sertraline having the highest response rate and Citalopram the lowest. Overall, Nefazodone had the highest response rate of any medication regardless of class, although its sample size was relatively small (n = 39). When examining publication bias, only SSRIs had statistically significant positive findings. In terms of moderator variables, RCTs and open-label trials predicted response rate, as did age and gender (females).


Introduction
Major depressive disorder (MDD) is a chronic and recurrent condition that affects approximately 2% of children and 6% of adolescents [1,2]. Many children with MDD experience significant psychosocial impairments and require multimodal treatments including pharmacologic agents [3]. The proliferation of selective serotonin reuptake inhibitors (SSRIs) in the 1980s, with a wider margin of safety than tricyclic antidepressants, has led to their increased use among children and adolescents [4]. Nevertheless, SSRIs were included in the 2007 Food and Drug Administration (FDA) updated black box warning label that targeted the risk for suicidal thinking and behavior [5]. However, in a meta-analysis of the literature, Bridge et al. [6] found evidence that the benefits of antidepressants for children and adolescents appear to be much greater than the risks of suicidal thoughts or behavior.
Considerable uncertainty and speculation remain concerning the use of antidepressants with children and adolescents, even though approximately 18 reviews and analyses on this subject were published from 1995 through 2012. Part of the reason is that many of these reviews were purely descriptive and did not statistically synthesize (i.e., meta-analyze) the findings of the research [1,4,7-10]. They also had different foci including, but not limited to, effects of medications specifically on treatment-resistant depression [11], safety of medications [9], and current issues associated with pharmacological treatment of childhood and adolescent depression [12].
There have been several reviews of the literature using meta-analytic techniques, but these too have methodological issues and different foci. First, the majority of the reviews examined only randomized placebo-controlled trials [6,13-19]. Second, most studies used response rate as the primary effect measure, with only one utilizing risk difference (RD) [6] and two using odds ratios (OR) [15,20]. Third, many of these analyses occurred prior to, or did not adhere to, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [21]. Fourth, because many of the reviews did not follow PRISMA guidelines, only three had replicable descriptions [6,19,20]. Finally, most reviews focused solely on SSRIs [6] and did not include third generation medications (e.g., venlafaxine, reboxetine, nefazodone, mirtazapine). The analysis by Bridge et al. [6] also included children with anxiety disorders and only randomized placebo-controlled trials.
In addition to these different foci, there is an important issue in meta-analyses that none of the previous syntheses addressed: the specific publication bias of the "file drawer" problem. Sussman [22], in particular, stressed the importance of addressing this problem when assessing the efficacy and safety of antidepressants. The basic notion of the file drawer problem is that statistically nonsignificant results are less likely to be published [23,24]. Consequently, the effect size obtained from a meta-analysis is likely to be distorted, and the exaggerations are strongest when the true effect size approaches zero [25]. However, if the missing studies are randomly omitted, there is no systematic impact on the effect size [23]. Conversely, if there is a systematic omission from a literature synthesis (i.e., consistent lack of publishing nonsignificant results), readers and reviewers of the meta-analysis may draw the wrong conclusion about what that body of research shows [26]. As a result, there is a continuing and pervasive concern within the meta-analytic literature regarding the universality and consequences of the file drawer phenomenon for the synthesis of research [27].
Information sources for the present meta-analysis were the following online databases: Medline, PsycINFO, and ERIC. The following Boolean phrase was used for each source: ("Depression") AND ("medication" OR "Anti-depressant") AND ("Child*" OR "youth" OR "student" OR "adolescent"), with the last search completed on September 3, 2013. The study criteria were limited to reports of empirical studies, quantitative studies, literature reviews, clinical trials, systematic reviews, and meta-analyses.

Eligibility criteria
There were four eligibility criteria for inclusion in the current review and analysis: (a) type of study, (b) type of participants, (c) type of intervention, and (d) type of outcome measures. First, articles were published in English, reported results of empirical data, and were quantitative in design. Second, participants were children or adolescents between the ages of 0 and 18 years with Major Depressive Disorder (MDD) as the primary diagnosis. Studies of depressive disorders as a secondary comorbid condition to another psychiatric condition, or of individuals with bipolar disorder, were not considered. Third, studies were considered that examined the efficacy of a physician-prescribed medication and oversight intended to reduce depressive symptoms. In order to be included, the medication must have been one condition in isolation, and not combined with another intervention (e.g., cognitive-behavioral therapy with medication). Fourth, outcome measures were standardized psychometric instruments for determining severity of depressive symptomatology. Where more than one measure was used, each measure was coded independently.
Initial screening of articles was conducted by the second author and a research assistant independently reading the title and abstract of each manuscript and including or excluding the article based on the four eligibility criteria. In the event it was unclear from the abstract or title that the article met each of the eligibility criteria, the article was retained in the search rather than excluding it at this initial stage. In cases of research syntheses, articles included were added to the list of articles screened for inclusion, removing duplicates as necessary. Article assessment and selection was performed independently by the second and third authors, with disagreements between reviewers resolved by consensus.

Data collection and analysis
A data collection sheet was developed by the research team and pilot tested using five randomly selected articles. The second author performed all data extraction, with the third author independently checking the extracted data on 100% of the studies. Disagreements were resolved through discussion. In the event no agreement could be made, the first author acted as the tie-breaker. Information extracted from each article included the following: (a) characteristics of participants (mean age, % female, and race), (b) type of medication administered, (c) treatment length, (d) study design, and (e) measures of effect along with raw scores when available. Because the objective of this analysis was to be as inclusive as possible, the authors hypothesized that effect sizes of individual studies may differ as a result of study design.
Two meta-analyses were performed. First, the response rate (RR) to treatment with psychopharmacologic medication was calculated for children and adolescents with MDD. Although response rate is not a standard effect measure, it does allow for the inclusion of summary data from studies that may not have had a control condition. Second, an experimental odds ratio (OR) and 95% confidence intervals were calculated for all studies using response rates from the studies versus the average response to placebo (RR = 49.15%) and weighted to the treatment n. For example, if the treatment group (n = 10) had 6 responders, the placebo condition was determined as n = 10 with 5 (10 × 0.4915, rounded) responders, giving odds of 6/4 for treatment and 5/5 for placebo, for OR = (6/4)/(5/5) = 1.5. The OR was calculated for two reasons: (a) it provides a standard measure of effect that is recognized in medical research and (b) it provides an index of how each study's results relate to the standard response rate.
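The comparison against a synthetic placebo arm can be sketched in a few lines. This is a minimal illustration using the numbers from the example above (n = 10, 6 responders, pooled placebo rate of 49.15%); the function name and the rounding of placebo responders to a whole number are assumptions made here, not details confirmed by the analysis. The sketch returns both the ratio of response proportions, (6/10)/(5/10) = 1.2, and the ratio of odds, (6/4)/(5/5) = 1.5, because the two quantities are easily confused.

```python
def placebo_referenced_effects(responders, n, placebo_rate=0.4915):
    """Compare a treatment arm with a synthetic placebo arm of equal size.

    placebo_rate is the pooled placebo response rate reported in the text.
    The synthetic arm is illustrative, not an observed control group.
    """
    # Assumption: expected placebo responders are rounded to a whole person.
    placebo_responders = round(n * placebo_rate)
    if responders in (0, n) or placebo_responders in (0, n):
        raise ValueError("odds are undefined when an arm has 0% or 100% response")

    # Ratio of response proportions (a risk ratio).
    risk_ratio = (responders / n) / (placebo_responders / n)

    # Ratio of odds, where odds = responders / non-responders.
    odds_treatment = responders / (n - responders)
    odds_placebo = placebo_responders / (n - placebo_responders)
    odds_ratio = odds_treatment / odds_placebo
    return risk_ratio, odds_ratio


risk_ratio, odds_ratio = placebo_referenced_effects(responders=6, n=10)
print(f"ratio of proportions = {risk_ratio:.2f}, odds ratio = {odds_ratio:.2f}")
```

The two measures diverge more as response rates move away from zero, which is one reason to state explicitly which of them is being reported.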
The meta-analyses were then performed using SPSS v. 22 and Comprehensive Meta-analysis (v. 2.2) employing a random-effects model. In order to account for studies that included more than one outcome measure, the most conservative measure (determined by ranking total effects from each measure across studies) was used. In the event a study did not provide data on moderating variables (e.g., age, % female) the mean of the included studies was used. Studies were then weighted by the inverse of the variance consistent with procedures set forth by Lipsey and Wilson [28].
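For readers who want to see what inverse-variance weighting under a random-effects model involves, the following is a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance, which is a common choice but is not named in the text; the function name and the input values in the usage line are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Pool study effect sizes (e.g., log odds ratios) with a random-effects model.

    Assumption: between-study variance (tau^2) is estimated with the
    DerSimonian-Laird method of moments.
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures dispersion of study effects around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical log odds ratios and their variances for three studies.
pooled, se, tau2 = pool_random_effects([0.50, 0.30, 0.65], [0.04, 0.09, 0.02])
print(f"pooled log OR = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

When the studies agree closely, tau^2 is estimated as zero and the result reduces to the fixed-effect inverse-variance average.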

Risk of publication bias
The probability of a treatment effect reported in a systematic review resembling the truth depends on the validity of the studies included in the analysis because certain methodological characteristics may be associated with effect sizes [29,30]. Therefore, it is important to determine whether the obtained sample of studies was representative of the totality of research conducted on the efficacy of psychotropic medications to treat childhood and adolescent depression. The possibility of bias resulting from a tendency of only positive findings being published, known as the "file drawer effect", was addressed using two methods: calculating the fail-safe N [31] and the p-curve approach [32]. The fail-safe N is determined by calculating the number of studies with an average null result necessary to make the overall results nonsignificant. The p-curve was introduced to account for "p-hacking", a theory asserting that researchers may be able to get most studies to find positive results through differing statistical methods [32]. The p-curve assesses the skew of the reported p-values to determine if p-hacking has occurred. Essentially, data that skew to the right are evidence of little or no p-hacking, whereas data skewing to the left may be evidence of p-hacking.
In primary studies, regression is used to determine the relation between one or more moderators and a dependent variable. The same approach is essentially used with meta-regression except that the covariates are at the level of the study rather than the level of the participant, and the dependent variable is the effect size in the studies rather than participant scores [23]. In the present study, fixed-effects model meta-regressions were computed using response rate to medication with the following moderator variables: age, gender, treatment length, study design, medication class, and measurement instrument. Categorical data (medication class and study design) were dummy coded into binary data where each category was coded individually as "yes" = 1 and "no" = 0. The following statistic was used to test the significance of the slope(s):
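The fail-safe N calculation can be illustrated with a short sketch based on Rosenthal's formula, N = (sum of Z)^2 / z_alpha^2 - k, where k is the number of included studies and z_alpha = 1.645 is the one-tailed critical value at p = .05. The function name and the example Z scores are hypothetical; only the formula itself is standard.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N.

    Returns the number of additional studies averaging a null result (Z = 0)
    needed to pull the combined one-tailed p value above the alpha level.
    """
    k = len(z_scores)
    n_fs = (sum(z_scores) ** 2) / (z_alpha ** 2) - k
    # A negative value means the combined result is already nonsignificant.
    return max(0.0, n_fs)


# Hypothetical per-study Z scores.
print(fail_safe_n([2.1, 1.4, 2.8, 0.9, 1.7]))
```

A small fail-safe N indicates a fragile finding, because only a handful of unpublished null studies would be enough to overturn it.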
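The statistic referred to above is not reproduced in this text, and the sketch below does not attempt to recover it. As a hedged illustration only, it assumes a single study-level moderator, inverse-variance weights, and a slope test of the form Z = B / SE(B), which is one common formulation for fixed-effect meta-regression. The function name and all data values are hypothetical.

```python
import math


def fixed_effect_meta_regression(x, y, variances):
    """Weighted least-squares regression of effect size on one moderator.

    x: study-level moderator (e.g., a dummy code with 1 = SSRI, 0 = other)
    y: study effect sizes (e.g., response rates)
    variances: within-study variances, used as inverse-variance weights
    Assumption: the slope is tested with Z = slope / SE(slope).
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum_w

    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With known within-study variances, Var(slope) = 1 / sxx.
    se_slope = math.sqrt(1.0 / sxx)
    z = slope / se_slope
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return slope, intercept, se_slope, z, p_two_sided


# Hypothetical data: two non-SSRI studies (x = 0) and two SSRI studies (x = 1).
result = fixed_effect_meta_regression(
    x=[0, 0, 1, 1], y=[0.50, 0.54, 0.60, 0.62], variances=[0.01] * 4
)
print(result)
```

With a single dummy-coded moderator and equal weights, the slope is simply the difference between the two group means, which makes the output easy to check by hand.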

Study selection
A total of 38 studies involving 14 different prescribed antidepressant medications were identified for inclusion in the analysis. Figure 1 provides a diagram of the study selection process in accordance with the PRISMA (2009) standards. The search of the Medline, PsycINFO, and ERIC databases resulted in a total of 2,311 citations after removal of duplicates, with 23 articles included from previous reviews. Of the 2,334 citations, 2,267 were excluded after reviewing titles and abstracts. The 67 remaining manuscripts were read in their entirety, resulting in 38 studies meeting all inclusion criteria and being incorporated into the meta-analysis (Figure 1).

Study characteristics
The majority of studies (n=22) were Randomized Placebo-Controlled Trials (RCTs), followed by open-label studies (n=11), and comparison designs (n=5). Treatment length ranged between four and 12 weeks, with the majority of studies (n = 19) employing an 8-week treatment phase (not including added "placebo lead-ins").
Participants: A total of 2,328 participants from the 38 studies received antidepressant medication treatment. In addition, there were a total of 1,627 placebo participants in the included 22 randomized controlled trials. Sample sizes of the included studies ranged from 3 to 185 (M = 56.75, SD = 55.47). Participant characteristic means were calculated by using the study mean and weighting each study in SPSS by the n of each study's treatment condition. Mean age of the participants ranged from 9.1 to 18.8 years old with a weighted mean of 13.92 (SD = 1.74) [33,34]. The percent of females included in each study ranged from 11% to 87% with a weighted mean of 56.31 (SD = 10.25) [35,36]. In addition, the percentage of ethnic-minority participants was quite small (0.8% to 14.5%). Table 1 presents study characteristics and results of individual studies.

Syntheses of medication results
Response rates: Syntheses of each medication classification and medication were completed using the response rates from each study. Each study was weighted by the treatment n. Mean RR for SSRI medication was 60.79% (SD = 8.47), followed by TCAs (RR = 54.22%, SD = 13.48), and the 'other' category (RR = 52.92%, SD = 8.95). Of the SSRIs, Sertraline (RR = 67%) had the highest response rate, followed by Escitalopram (RR = 64%) and Fluoxetine (RR = 63%), while Citalopram performed the poorest (RR = 52%). Amitriptyline, a TCA, had a response rate of 76%, although the three studies examining its impact amounted to a modest total participant n of 34. In the 'other' category, Nefazodone had the highest average response rate (RR = 76%), followed by Bupropion (RR = 73%); however, only one study examined Bupropion's impact, with only 11 participants and an open-label design. Finally, a total of 1,180 participants responded to placebo conditions in the RCT studies from a total of 2,401 placebo participants, for a placebo response rate of 49.15%.
Odds ratios: Response rates were converted into odds ratios for all studies using the response rate from included studies versus the average response to placebo (RR = 49.15%) and weighted by the inverse of the variance. The odds of responding to SSRI medication were 1.582 times the odds of responding to the average placebo response (95% CI 1.37-1.82, p < 0.001). The odds of responding to TCAs were 1.163 times the placebo odds (95% CI 0.86-1.58, p = 0.335), followed by the 'other' category at 1.176 times (95% CI 0.89-1.56, p = 0.263).
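The two summary quantities in this paragraph, an n-weighted mean response rate and a pooled placebo response rate, can be sketched as follows. The placebo counts (1,180 of 2,401) come from the text; the per-study rates and sample sizes in the weighted-mean example are hypothetical, as are the function names.

```python
def weighted_response_rate(rates, ns):
    """Mean response rate across studies, weighting each study by its treatment n."""
    return sum(rate * n for rate, n in zip(rates, ns)) / sum(ns)


def pooled_rate(responders, participants):
    """Pooled response rate: total responders divided by total participants."""
    return responders / participants


# Placebo counts reported in the text.
print(f"placebo response rate = {100 * pooled_rate(1180, 2401):.2f}%")

# Hypothetical studies: response rates of 50% (n = 30) and 80% (n = 10).
print(f"weighted mean RR = {100 * weighted_response_rate([0.5, 0.8], [30, 10]):.1f}%")
```

Weighting by n keeps very small studies, such as a single open-label trial with 11 participants, from dominating a class-level average.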
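As a rough consistency check on results of this kind, a two-sided p value can be recovered from an odds ratio and its 95% confidence interval. The sketch below assumes the interval is symmetric on the log scale and based on a normal approximation (critical value 1.96); that assumption, and the function name, are introduced here for illustration. Applied to the intervals above, it gives p values in the neighborhood of those reported for the TCA and 'other' categories and a very small p value for SSRIs.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate SE, Z, and two-sided p from an odds ratio and its 95% CI.

    Assumption: the CI was built as exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return se, z, p


for label, or_, lo, hi in [
    ("SSRI", 1.582, 1.37, 1.82),
    ("TCA", 1.163, 0.86, 1.58),
    ("other", 1.176, 0.89, 1.56),
]:
    se, z, p = p_from_or_ci(or_, lo, hi)
    print(f"{label}: SE = {se:.3f}, Z = {z:.2f}, p = {p:.4f}")
```

Small discrepancies from the reported values are expected because the published odds ratios and interval bounds are rounded.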

Risk of Publication Bias Within Studies
The possibility of bias in publication resulting from a tendency of only positive findings being published was assessed through two methods: (a) Rosenthal's fail-safe N [31] and (b) the p-curve approach [32]. Table 2 provides results of analyses related to publication bias.

Fail-Safe N:
The fail-safe N was calculated for each class of medication and each individual medication (provided more than two studies were present) using the experimental Odds Ratio calculated from the aggregate placebo response rate of the included studies with placebo controls. Of the classifications of medications, only SSRIs had statistically significant positive findings and those positive findings would require 184 studies with null findings to bring the p value to insignificant levels. The most significant findings from SSRI medications with more than five studies were for Fluoxetine, which would require 19 null studies to bring the p value to >0.05 followed by Paroxetine and Sertraline. Further, Amitriptyline was the only TCA with positive findings and would require only two studies with null findings to invalidate the results.

P-Curve:
The p-curve was applied to account for p-hacking [32], a theory suggesting that studies may have utilized certain statistical procedures to ensure finding positive results. In calculating the p-curve, only medications or classifications of medications with more than five studies were included. Results indicated that neither the extant SSRI literature (p = 0.1104) nor the specific medication Fluoxetine (p = 0.5328) has sufficient evidence in its findings, although there does not appear to be evidence that the literature has been p-hacked. Furthermore, the TCA literature lacks evidence of findings (p = 0.0197) and nearly has evidence to suggest that it has been p-hacked (p = 0.0648).

Additional Analyses
Meta-Regressions: Meta-regressions were performed to determine which of the moderator variables predicted response rate to medication. Of the medication classes coded, not being a TCA (t = -10.052, p < 0.001) or in the 'other' category (t = -2.975, p = 0.003) significantly predicted response rate. Of the designs employed, only open-label trials significantly predicted RR (t = 2.927, p = 0.004). Further, the percentage of females in the study (t = -2.742, p = 0.006) and participant age (t = 3.389, p = 0.001) significantly predicted response rate to treatment.
ANOVAs: Two analyses of variance (ANOVA) were performed to determine if certain medication classes or study designs significantly affected the response rates to medication. Regarding medication classes, differences between groups were significant (F[2, 887] = 5.075, p = 0.007). Scheffé post-hoc tests were then performed to examine individual differences, with SSRIs significantly outperforming the TCA category (M = 0.0515, p = 0.010), but not the 'other' category (M = 0.0215, p = 0.353). Further, the 'other' category had a nominally higher response rate than TCAs (M = 0.03004, p = 0.342), although this difference was not statistically significant. Of the study designs, differences between groups were significant (F[2, 887] = 9.598, p < 0.001), with both randomized control trials (M = 0.03266, p = 0.001) and open-label trials (M = 0.05558, p = 0.001) associated with significantly higher response rates than comparison trials. However, there was no significant difference between RCTs and open-label trials in terms of response rate.
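For reference, the F statistic of a one-way ANOVA is the ratio of between-group to within-group mean squares, with degrees of freedom (k - 1, N - k) for k groups and N observations. The following minimal sketch computes it directly; the function name and the toy data are hypothetical and are not values from the included studies.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic with its degrees of freedom.

    groups: list of lists, one list of observations per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variability: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data for three groups.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A significant F only indicates that at least one group mean differs; post-hoc comparisons such as the Scheffé test are what identify which pairs differ.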

Discussion
Results of the present analysis can be summarized as follows: The majority of studies had Caucasian participants, and the percentage of participants who were African-American or Hispanic in the few studies that included them was very low. When there were sufficient studies, the most effective class of medication was SSRIs, with Sertraline having the highest response rate and Citalopram the lowest. However, regardless of medication class, Nefazodone was the medication with the highest response rate overall. This finding should be interpreted cautiously because it is based on only 38 participants. Although Amitriptyline and Bupropion had response rates higher than the SSRIs, the number of studies for each (N = 3 and N = 1, respectively) was small. In terms of risk of bias, it is unlikely that the extant SSRI literature suffers from the file drawer problem because it would take 184 studies with null findings to bring the p value to nonsignificant levels.
In terms of moderator variables, not being a TCA or 'other' medication predicted response rate, open-label trials were the only design that predicted response rate, and age and gender (females) predicted response rate to treatment. Specifically, as the age of participants increased, so too did their response rate to medication. Also, the lower the percentage of female participants, the higher the response rates. Of the study designs, RCTs and open-label trials resulted in significantly higher response rates than did comparison trials.

Medication class, type, and efficacy
It was not easy to reach unequivocal conclusions regarding which class and type of medication was the most efficacious because some medications (e.g., Amitriptyline, Bupropion) had too few studies and participants to determine their differential efficacy with any certainty. Response rates and odds ratios were calculated, and both indicated that SSRIs were more effective than TCAs and medications in the 'other' category when sufficient studies and participants existed. These results are consistent with past research in which TCAs have repeatedly been found to be no more effective than placebo and to present the possibility of severe, even lethal, side effects [7,8,20,37].
Another important point in determining differential efficacy of medication involves the three phases with which depression is treated: acute, continuation, and maintenance. Boylan et al. [1] stated that most studies in children and adolescents have evaluated treatments during the acute phase, with only one controlled trial for continuation [38] and no maintenance studies. Therefore, recommendations regarding medication efficacy during the continuation and maintenance treatments must be extrapolated from the adult literature, but there are a variety of pharmacokinetic differences that may impact how youngsters respond to medication during these phases [39].
At this juncture, there are simply too few studies to reach definitive conclusions regarding which class and type of medication is the most efficacious for children and adolescents. However, the current analysis was the first to statistically compare medications and, within the class of SSRIs, determine that Sertraline had the highest response rate and Citalopram the lowest. Relatedly, Pfalzgraf et al. [40] surveyed child psychiatrists and found that the antidepressants of choice tended to be the SSRIs fluoxetine or sertraline, a finding that is consistent with the results of the present meta-analysis.

Publication Bias: The "File Drawer Problem"
The "file drawer problem" was articulated by Rosenthal [31] approximately 35 years ago. Basically, the extent of research on a given topic that has not been published (i.e., nonsignificant results) cannot be determined. This systematic omission from the literature may distort the effect size obtained from a meta-analysis, and the exaggerations are strongest when the true effect size approaches zero [25]. In the present study, results of the fail-safe N reflect how fragile "significant effects" from a body of literature can be. For example, Amitriptyline, the only TCA with positive findings, would require only two studies with null results to invalidate its effectiveness. On a positive note, invalidating the statistically significant findings for SSRIs would require 184 studies with null findings. The most significant SSRI finding was for Fluoxetine, which would require 19 null studies. Ironically, in the present study, Fluoxetine was one of the least effective SSRIs, yet it is the medication for which there was the most confidence that no p-hacking occurred. Conversely, not only were TCAs no more effective than placebo, there was some evidence suggesting they had been p-hacked.
Although attempts to account for the file drawer problem, such as the two procedures used in the present analysis, have often been undertaken by researchers, some unknowable number of nonsignificant findings remains unrecoverable. Furthermore, Kromrey and Rendina-Gobioff [41] concluded that current statistical methods to account for publication bias may fail to control Type I error rates or lack sufficient power. Therefore, Howard et al. [42] proposed a different way of addressing the file drawer problem using the example of the psychotherapy efficacy literature. Rather than correcting for bias statistically, they suggested performing a new mini-literature review meta-analysis of all the new studies and examining whether the results approached the value of a meta-analysis obtained from the entire, presumably biased (i.e., file drawer effect), literature or were closer to the null value (d = 0.00). Clearly, the solutions to the file drawer problem present a vexing and challenging issue to meta-analytic research, and it will likely take a paradigm shift to truly address this problem, such as authors submitting only their literature review and methods, abandoning conventional inferential statistics in favor of Bayesian approaches, or registering studies and protocols online prior to conducting a study [43].

Age, gender, and response rates
In the present analysis, participants' response rates to medication increased with age. This finding is congruent with results of previous reviews [6] and also with data indicating that some neurotransmitter systems related to affect are not fully mature during childhood and adolescence [39]. Moreno et al. [39] described a number of pharmacokinetic factors that affect children's and adolescents' response to antidepressants compared to adults, such as lower absorption rate, increased metabolism rate, reduced level of drug protein binding, and a more permeable blood-brain barrier. They also believed that the typical approach for children and adolescents, based on adult studies, of proportionally reducing doses by body weight may result in nontherapeutic levels that would yield negative results. Results from the present analysis partially confirm this assumption because older participants had a greater response rate than younger participants, but also partially refute it because significant effects were obtained for participants of all ages. Therefore, in the absence of a pediatric dosing protocol, the present approach seems sufficient and appropriate.
Gender differences in the prevalence, phenomenology, and natural history of MDD have been well documented. The rate of MDD is approximately equal between boys and girls during childhood, but during adolescence there is a dramatic increase in depression among females, and that trend continues through adult life [44,45]. Although men and women differ in the metabolism and distribution of antidepressants, actual gender differences in terms of response rate remain a controversial topic that is marked by individual-specific variability that may be due, in part, to genetic disparities [44]. In the present study, the lower the percentage of female participants, the higher the response rates. However, gender and age interact and influence each other: response rates were higher in studies with older participants and with fewer females.

Design
In terms of study design, randomized control trials and open-label trials had participants with significantly better response rates than participants in comparison trials. However, there were no differences in response rate between randomized control trials and open-label trials. A similar result was obtained by Biederman et al. [46], who conducted a meta-analysis of open-label versus randomized, placebo-controlled trials to predict results of psychopharmacologic treatments for pediatric bipolar disorder. They concluded that open-label trials are useful predictors of the potential safety and efficacy of a psychopharmacologic agent to treat youth with bipolar disorder. It may come down to a clinical judgment of perceived risk whether to expose patients to randomized placebo control trials versus open-label trials. For example, in a sample of adults receiving antidepressive treatment, Deuschle et al. [47] found no difference between each research design in terms of clinical outcome, but suggested that randomized control trials may expose patients to an increased risk of adverse events compared to the open condition. This increased risk is mitigated in most other fields of medicine by using a standard of care (SOC) guideline as the control condition. For example, the American Society of Clinical Oncology (ASCO; 2014) publishes guidelines for the most current SOC for treating each type of cancer [48]. Those standards of care are then used to inform quality of practices used by physicians and to guide subsequent clinical trials that use the SOC as a control condition with other treatments compared to it, thereby ensuring that the patient is not exposed to increased risk.
Unfortunately, the field of psychiatry has relatively vague guidelines for treating depression in adults [48] allowing for the use of a variety of pharmacotherapies (e.g., SSRIs, SNRIs, Bupropion), psychotherapy, or other somatic therapies (e.g., electro-convulsive therapy), making the systematic study of any treatment modality against a SOC difficult.

Conclusion
Major depressive disorder in children and adolescents has been the topic of much research including, but not limited to, diagnosis, assessment, prevalence, characteristics, and treatment. In the latter case, most of the research points to cognitive-behavioral therapy and psychopharmacology, or a combination of both, being the most efficacious approaches for treating children and adolescents with MDD. The current review analyzed 38 studies, which is the most to date, in order to address several methodological issues inherent to previous meta-analyses. Specifically, the present review was the first to adhere to the PRISMA standards for reporting systematic reviews and meta-analyses of studies regarding efficacy of psychopharmacologic therapy to treat MDD in children and adolescents. In addition, the current review was the first to statistically compare the efficacy of different classes and types of medication, address the publication bias that the file-drawer problem presents, and examine various moderator variables.
There are several limitations to the present study. First, there are different databases from which studies can be obtained (e.g., Medline versus PubMed). Also, it is always problematic determining which search terms to input. These considerations may lead to different studies being obtained. That is why the PRISMA statement recommends authors include their search parameters so that replication is possible. Second, there were simply too few studies reviewed to examine the impact of race on the effect of antidepressants to treat MDD in children and adolescents. Third, the outcome measures used were not consistent across studies, ranging from a variety of clinician rating scales (e.g., CGI-I) to depression inventories (e.g., BDI); thus, results should be treated with caution. Fourth, the results regarding the differential efficacy of antidepressants must be evaluated carefully because of the small number of studies and/or participants in individual studies for certain classes and types of medication. For example, in the present review, there was only one study apiece on the response rates of Azalopram and Bupropion. By far, the most research has been conducted on SSRIs, but research on the comparative efficacy of medications within this class has been sparse. There is also little research on the comparative efficacy of SSRIs versus third generation antidepressants that make up the serotonin-norepinephrine-dopamine reuptake inhibitors (SNDRIs). Among adults, Oliver et al. [49] found third generation antidepressants to be just as effective and safe as the previous class of SSRIs. A final area for future study is the effect of socioeconomic status and ethnicity on response rate to antidepressants. Especially important is ensuring that future meta-analyses adhere to the PRISMA standards and that, regardless of the study design, participants receive a treatment, as is typical in other areas of medical treatment.