Advancements in Medical Statistics Promote Economies of Scale

Introduction
Modern medical studies, especially those intended for high-impact journals, can easily cost millions of US dollars (USD). For example, the cost of drug development, including phase I to III trials, averages about USD 800 million [1,2]. Studies searching for single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) that contribute to disease are also costly. The full processing cost of an Affymetrix SNP 6.0 microarray, for example, can be USD 600 [3]; multiplied by the 6000 to 9000 patients typical of recent high-impact Genome-Wide Association Study (GWAS) publications [4-9], this comes to roughly USD 3.6 to 5.4 million for array processing alone. Still, researchers aim for even larger sample sizes because clear answers about variants' relationships with disease are not forthcoming at current sample sizes. For example, exome sequencing studies of 5300 and 6700 patients yielded only very rare variants or variants with small effect sizes [10]. More traditional clinical studies, which measure clinical parameters to differentiate between groups of patients, may also incur high costs from recruitment, treatment, testing, and follow-up. Their costs rise further when a large sample size is required to achieve sufficient power to detect significant effects for the questions of interest [11,12].
The funding required for these studies strains the resources of even large pharmaceutical companies [4], wealthy corporations [13], and government-funded groups such as The Cancer Genome Atlas (TCGA) [14] and the National Heart, Lung, and Blood Institute Exome Sequencing Project (ESP) [15], which host some of the largest current medical studies. To fund studies of increasing size, it is necessary to implement economies of scale. Economies of scale maintain or lower a study's cost while increasing its number of patients, its number of potential analyses, and its power. Most of the articles in this special issue achieve economies of scale in one or more of these ways.

Recent Advances
Four studies in this issue increase their economy of scale by lowering the cost per patient. Two studies accomplished this through inexpensive data collection. Rogus et al. [16] retrieved data that a health system had collected in the normal course of business. In this way, over 600,000 measurements from over 39,000 patients were obtained at low additional cost. Such a large number of patients gave the study high power to detect the difference between case and control groups, which was found to be highly significant. Maetani and Gamel [12] used existing data from the Cancer Institute Hospital in Tokyo, which covered surgery, follow-up, and survival of about 3600 patients for up to 50 years. The relatively low cost of obtaining existing data gave them a sample large enough to run analyses on three subgroups by age, on 50 subgroups by demographic and prognostic factors, and on patients treated in different years with different regimens. Lifelong follow-up data, which they required to validate their Boag model [17], were also cheaply and readily available. Another two studies lowered their cost per patient by creating new study designs. Matched case-control studies can be costly because of issues such as the cost of follow-up until the end of recruitment. Sugihara et al. [18] created a study design that avoids this cost. Their dynamic registration method draws from a pool of potential patients while balancing the two groups of patients on all prognostic factors. Erbas et al. [19] avoided the cost of control recruitment and follow-up by implementing a case-crossover design for their asthma study. Given typical recruitment difficulties, the larger the study, the more resources this design will save.
Advancements in medical statistics also increase economies of scale by increasing power, the number of feasible analyses, or both, without increasing the number of patients recruited. Meta-analysis increases power with very little increase in cost by combining existing studies into a single dataset with a larger sample size. Meta-analysis also makes additional analyses feasible when, in a single study, these analyses either cannot be run or cannot reach statistical significance. Sandoval and Zarate [11] list these and six more ways in which meta-analysis increases economy of scale.
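The power gain from pooling can be illustrated with a minimal sketch of fixed-effect, inverse-variance meta-analysis. This is a generic textbook approach chosen for illustration, not the method of any article in this issue; the function name `pool_fixed_effect` and all numbers are hypothetical.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study effect estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect estimates (e.g. log odds ratios) from three small studies.
estimates = [0.30, 0.25, 0.35]
std_errors = [0.20, 0.20, 0.20]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)

# z-statistic of each study alone, and of the pooled estimate.
z_single = [est / se for est, se in zip(estimates, std_errors)]
z_pooled = pooled / pooled_se

print("single-study z:", [round(z, 2) for z in z_single])
print("pooled z:", round(z_pooled, 2))
```

In this made-up example no single study exceeds the usual two-sided threshold of 1.96, while the pooled estimate does, because the pooled standard error shrinks as the combined weight grows. A real meta-analysis would also need to assess heterogeneity between studies, which this sketch ignores.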

Rogus et al. [16], Ketchum et al. [20], and Maetani and Gamel [12] increased the ability of their studies to achieve significant results without increasing cost. The advancement of Rogus et al. [16] was to use Monte Carlo resampling to correct for intra-subject correlation. This correlation kept their highly negative z-statistic from being significant despite the large number of subjects in the study. Maetani and Gamel [12] developed the Boag model in such a way that the expense of long-term follow-up is not necessary to predict the overall survival curve and the long-term effects of the variables measured in a study. Ketchum et al. [20] developed a within-subject normal-mixture model that minimizes cost by requiring only a few subjects to obtain significant results. This was achieved by incorporating data from multiple tests per subject into the model. The model estimates accurately with as few as 20 patients, uses all of the data, and can include all covariates.
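As a rough illustration of resampling that respects intra-subject correlation, the sketch below runs a Monte Carlo permutation test in which group labels are shuffled between whole subjects, so each subject's repeated measurements always stay together. It is a generic technique shown under simplifying assumptions, not the procedure of Rogus et al. [16]; the function names, the simulated data, and the choice of test statistic are all hypothetical.

```python
import random
from statistics import mean

def mean_difference(subjects, labels):
    """Mean measurement of cases (label 1) minus controls (label 0)."""
    case_values = [v for vals, lab in zip(subjects, labels) if lab == 1 for v in vals]
    control_values = [v for vals, lab in zip(subjects, labels) if lab == 0 for v in vals]
    return mean(case_values) - mean(control_values)

def subject_level_permutation_p(subjects, labels, n_resamples=1000, seed=0):
    """Two-sided Monte Carlo p-value, permuting labels across subjects."""
    rng = random.Random(seed)
    observed = mean_difference(subjects, labels)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # a subject's measurements move as one block
        if abs(mean_difference(subjects, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_resamples + 1)

# Simulated (hypothetical) data: 15 cases and 15 controls, 8 measurements each.
rng = random.Random(42)
subjects, labels = [], []
for label in (1, 0):
    for _ in range(15):
        subject_effect = rng.gauss(0.0, 1.0)  # shared by one subject's measurements
        shift = -0.5 if label == 1 else 0.0
        subjects.append([shift + subject_effect + rng.gauss(0.0, 0.3) for _ in range(8)])
        labels.append(label)

observed_diff, p_value = subject_level_permutation_p(subjects, labels)
print("observed difference:", round(observed_diff, 3), "p-value:", round(p_value, 3))
```

Because the reference distribution is built from whole subjects, measurements from the same subject are never counted as independent pieces of evidence, which is the design point of resampling at the subject level.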
Several studies in this issue increase the number of potential analyses without increasing cost. The study design of Erbas et al. [19] allows more analyses than a case-control study of the same size. In particular, they had more power to detect interactions of time-related risk factors with asthma. Halabi [21] developed an adjustment for the type I error rate that preserves the rate at the selected alpha level through multiple intermediate analyses. The adjustment also allows flexibility in how the type I error rate is allocated. In this way, the cost of the study is maintained while its ability to detect meaningful changes is maximized. The study also becomes adaptable to intermediate discoveries without the need to make costly changes to it or to design an entirely new study. Bergemann et al. [22] combined three test scores over four time points to make global normalized z-scores. These scores have more power, detecting significant differences at all time points where single test scores detected significance at only some time points. The global scores also yield lower p-values than previous analyses.
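The idea of a global normalized z-score can be sketched as follows: each test's raw scores are standardized, and the resulting z-scores are averaged per subject. This is an illustrative assumption about how such a composite might be formed at a single time point, not the exact procedure of Bergemann et al. [22]; the test names and scores are hypothetical, and the sketch standardizes against the sample's own mean and standard deviation where a real analysis might use a reference group.

```python
from statistics import mean, stdev

def z_normalize(scores, ref_mean, ref_sd):
    """Convert raw scores to z-scores relative to a reference mean and SD."""
    return [(s - ref_mean) / ref_sd for s in scores]

def global_z_scores(tests):
    """Average per-test z-scores into one global score per subject.

    tests: dict mapping test name -> list of raw scores, with the same
    subject order in every list.
    """
    normalized = [
        z_normalize(scores, mean(scores), stdev(scores))
        for scores in tests.values()
    ]
    # zip(*normalized) groups the z-scores belonging to each subject.
    return [mean(subject_z) for subject_z in zip(*normalized)]

# Hypothetical raw scores for four subjects on three tests at one time point.
tests_at_one_time_point = {
    "memory": [10, 12, 14, 16],
    "attention": [50, 55, 60, 65],
    "speed": [3, 4, 5, 6],
}

global_z = global_z_scores(tests_at_one_time_point)
print([round(z, 3) for z in global_z])
```

Putting the tests on a common scale before averaging lets each one contribute comparably regardless of its units; the same steps would be repeated at each time point.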