Instrumental Variable Analysis in Epidemiologic Studies: An Overview of the Estimation Methods

Instrumental variable (IV) analysis seems an attractive method to control for unmeasured confounding in observational epidemiological studies. Here, we provide an overview of the estimation methods of IV analysis and indicate their possible advantages and limitations. We found that two-stage least squares is the method of first choice if exposure and outcome are both continuous and show a linear relation. In case of a nonlinear relation, two-stage residual inclusion may be a suitable alternative. In settings with binary outcomes as well as nonlinear relations between exposure and outcome, generalized method of moments (GMM), structural mean models (SMM), and bivariate probit models perform well, yet GMM and SMM are generally more robust. The standard errors of the IV estimate can be estimated using a robust or bootstrap method. All estimation methods are prone to bias when the IV assumptions are violated. Researchers should be aware of the underlying assumptions of the estimation methods as well as the key assumptions of the IV when interpreting the exposure effects estimated through IV analysis.


Introduction
Instrumental variable (IV) analysis has primarily been used in economics and social science research as a tool for causal inference, but has begun to appear in epidemiologic research over the last decade to control for unmeasured confounding [1][2][3][4][5][6]. An IV is a variable that can be considered to mimic the treatment assignment process in a randomized study [7][8][9][10]. IV analysis generally involves a two-stage modelling approach to estimate the exposure effects. In the first stage, the effect of the IV on exposure is estimated, whereas in the second stage, outcomes are compared in terms of predicted exposure rather than the actual exposure [11]. To appraise the estimates obtained through IV analysis, it is important to understand the underlying methodology of the estimation methods.
Over the last decade, several reviews of IV analysis were published, covering various aspects including the key assumptions, estimated parameters, possible IVs, estimation methods, reporting of the results, and the use of IVs in comparative effectiveness research [3,4,12-23]. We summarized these reviews in Table 1. However, none of these articles included all possible estimation methods of IV analysis. Hence, we aimed to provide an overview of the estimation methods and to indicate their possible advantages and limitations. After a general introduction to the assumptions underlying IV analysis, we will describe the methods that have been used in IV studies in medical research.

Instrumental variables
The IV is an observed variable that is related to exposure and only related to the outcome through exposure. This resembles a randomized trial, in which treatment allocation typically almost perfectly coincides with the actual treatment received and (in case of a double-blind trial) treatment assignment only affects the outcome through the received treatment (hence the term pseudo-randomisation that is used for IV methods). This implies that an IV is neither directly nor indirectly (e.g., through observed or unobserved confounders) associated with the outcome [6,18,24]. Therefore, all observed and unobserved confounders should on average be equally distributed among different levels of the IV (similar to a randomized trial). These assumptions are illustrated in Figure 1. Along with these basic assumptions, there are other assumptions (i.e., homogeneous treatment effects, monotonicity) that are needed for point identification of IV estimates [14,19].

Notation
Throughout this article, we use the following notation: Y denotes the outcome, X denotes exposure, and Z denotes the IV. C and U denote the (one or more) observed and unobserved confounding variables, respectively. $\hat{X}$ denotes the predicted value of exposure. Finally, $\hat{\beta}_{IV}$ indicates the IV estimator, i.e., the estimator of the causal relation between exposure and outcome.

Ratio estimator (RE)
In a study with a single binary IV, the RE (also called the Wald [25] or grouping estimator) can be applied, which is expressed as:

$$\hat{\beta}_{IV} = \frac{\bar{y}_1 - \bar{y}_0}{\bar{x}_1 - \bar{x}_0} \qquad (1)$$

$$\hat{\beta}_{IV} = \frac{\bar{y}_1 - \bar{y}_0}{\Pr(X=1 \mid Z=1) - \Pr(X=1 \mid Z=0)} \qquad (2)$$

$$\hat{\beta}_{IV} = \frac{\Pr(Y=1 \mid Z=1) - \Pr(Y=1 \mid Z=0)}{\Pr(X=1 \mid Z=1) - \Pr(X=1 \mid Z=0)} \qquad (3)$$

where $\bar{y}_1$ and $\bar{x}_1$ are the means of y and x, respectively, when Z=1, and $\bar{y}_0$ and $\bar{x}_0$ are the corresponding means when Z=0; $\Pr(X=1 \mid Z=1) - \Pr(X=1 \mid Z=0)$ is the difference in the probability of being exposed between Z=1 and Z=0; and $\Pr(Y=1 \mid Z=1) - \Pr(Y=1 \mid Z=0)$ is the risk difference of an event between Z=1 and Z=0. Equation (1) is suitable for settings with continuous exposure and continuous outcome, equation (2) for binary exposure and continuous outcome [26,27], and equation (3) for binary exposure and binary outcome.

Figure 1: Schematic presentation of valid and invalid instrumental variables. X, Y, Z, and U denote the exposure, outcome, IV, and confounders (observed or unobserved), respectively. a) Z is associated with X and only related to Y through X (valid IV); b) Z is not associated with X (first IV assumption is violated); c) Z is not independent of confounders, i.e., Z has an indirect effect on Y (second IV assumption is violated); d) Z is not independent of Y given X and U, i.e., Z has a direct effect on Y (third IV assumption is violated).
The RE is a simple method to estimate the exposure effect in an IV analysis. However, it is not suitable for multiple IVs or for situations in which measured confounders need to be adjusted for in the analysis.
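As an illustration of the ratio estimator for a continuous exposure and outcome, the following minimal Python sketch applies it to simulated data with an unmeasured confounder. The data-generating process, the function names (`simulate`, `ratio_estimator`, `naive_slope`), and all parameter values are assumptions made for this example only; they are not taken from the cited studies.

```python
import random


def simulate(n=20000, true_effect=2.0, seed=1):
    """Simulate a binary IV z, a continuous exposure x and outcome y.

    u is an unmeasured confounder of x and y; z affects y only
    through x, so z is a valid instrument by construction.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        xi = 1.0 * zi + u + rng.gauss(0.0, 1.0)
        yi = true_effect * xi + 3.0 * u + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def mean(values):
    return sum(values) / len(values)


def ratio_estimator(z, x, y):
    """Wald/ratio estimator: (ybar_1 - ybar_0) / (xbar_1 - xbar_0)."""
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    x1 = mean([xi for zi, xi in zip(z, x) if zi == 1])
    x0 = mean([xi for zi, xi in zip(z, x) if zi == 0])
    return (y1 - y0) / (x1 - x0)


def naive_slope(x, y):
    """Least-squares slope of y on x, ignoring the confounder."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


z, x, y = simulate()
print("ratio estimator:", round(ratio_estimator(z, x, y), 3))
print("naive regression slope:", round(naive_slope(x, y), 3))
```

In this simulated setting the ratio estimator should land close to the assumed effect of 2.0, whereas the naive slope is pulled upwards by the confounder.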

Two-stage least squares method (2SLS)
The best-known two-stage method for IV analysis is the 2SLS method, which is traditionally used in IV analyses [10,28,29]. Unlike ratio estimators, this method can adjust for measured confounders. The 2SLS estimator can be obtained from the following models:

$$X = \alpha_0 + \alpha_1 Z + \varepsilon_1 \qquad (4)$$

$$Y = \beta_0 + \beta_{IV} \hat{X} + \varepsilon_2 \qquad (5)$$

where measured confounders C can be added as covariates to both models. The first model estimates the effect of the IV on exposure, whereas in the second model outcomes are compared in terms of predicted exposure rather than the actual exposure. The latter model yields the estimated parameter $\hat{\beta}_{IV}$, which is the IV estimator. For a single IV, $\hat{\beta}_{IV}$ is equivalent to the estimators in equations (1), (2), and (3). In case of multiple IVs, information on these IVs can be simultaneously incorporated in model (4). Then, $\hat{\beta}_{IV}$ is a weighted average of the ratio estimators [30]. With multiple IVs, 2SLS can provide biased estimates [30][31][32], and another method, e.g., limited information maximum likelihood (LIML) [33], can be an alternative. One of the conditions of 2SLS is that the error term should be homoscedastic (homogeneity of variance). In case of heteroscedasticity, other methods (e.g., generalized method of moments) can be considered [34]. Moreover, 2SLS may produce biased results in the case of binary variables or a non-linear relation between exposure and outcome (Table 2).
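The two models described above can be sketched in a few lines of Python for the simplest case of a single continuous IV and no measured confounders. The helper names (`ols`, `two_stage_least_squares`, `simulate`) and the simulated data are hypothetical and serve only to illustrate the mechanics; in applied work, dedicated IV routines in statistical software would normally be used.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """2SLS with a single instrument and no covariates."""
    a0, a1 = ols(z, x)                  # first stage: exposure on instrument
    x_hat = [a0 + a1 * zi for zi in z]  # predicted exposure
    _, beta_iv = ols(x_hat, y)          # second stage: outcome on predicted exposure
    return beta_iv


def simulate(n=5000, true_effect=1.5, seed=7):
    """Hypothetical data: continuous instrument z, unmeasured confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        xi = 0.8 * zi + u + rng.gauss(0.0, 1.0)
        yi = true_effect * xi - 2.0 * u + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
beta_2sls = two_stage_least_squares(z, x, y)
# With one instrument, 2SLS equals the ratio of the two slopes on z.
beta_ratio = ols(z, y)[1] / ols(z, x)[1]
print("2SLS:", round(beta_2sls, 4), "ratio of slopes:", round(beta_ratio, 4))
```

The sketch returns only the point estimate. A standard error taken directly from the second-stage regression would ignore the uncertainty in the predicted exposure.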

Linear probability model (LPM)
This method is a particular form of 2SLS in which the outcome, exposure, and IV are binary, and it provides exposure effects on the risk difference scale. When there is a single binary IV, the estimator can be expressed as in equation (3) [13,35-37].
LPM is a simple technique: the parameter is estimated and interpreted as a regression coefficient from a linear regression. However, LPM may provide ambiguous results because the common linear IV technique is designed for a continuous response [38]. It should be noted that the LPM for binary exposure and outcome may produce predicted values outside the 0-1 range [28]. Hence, for rare binary outcomes, some predicted probabilities may become negative [39]. In addition, the probability of success increases linearly with exposure, that is, the marginal or incremental effect of exposure remains constant [37], which is logically impossible for binary outcomes [14].

Two-stage predictor substitution (2SPS)
Two-stage predictor substitution is an extension of 2SLS to nonlinear models, which targets a marginal (population-averaged) odds ratio [36,40-42]. In the first stage, a nonlinear least squares (NLS) method or any other consistent estimation technique is used to estimate the relation between the IV and exposure [43]. Then, the predicted exposure status from the first-stage model replaces the observed exposure as the principal covariate in the second-stage model on the outcome [43,44]. For a continuous exposure and outcome, 2SPS and 2SLS show similar results [24,36].

Two-stage residual inclusion (2SRI)
2SRI (also called the control function estimator) [45] is another two-stage method and was first suggested by Hausman [46]. The general notion of 2SRI is to include the error terms (residuals) from the first-stage model as an additional variable along with the exposure in the second-stage model [47]. The models in the first and second stage can be either linear or nonlinear. In case of linear models, the 2SRI estimate is equivalent to the 2SLS and 2SPS estimates [44,48].

Table 2 (continued)

Generalized method of moments (GMM)
-it requires specification only of certain moment conditions
-applicable for linear and nonlinear models
-the non-linear GMM estimator is asymptotically more efficient than 2SLS
-more robust and less sensitive to parametric conditions
-works better than 2SLR when exposure and outcome are binary
-in case of heteroskedasticity, it is more efficient than the linear IV estimators
-the GMM estimator with a logistic regression model is not consistent for the causal odds ratio (COR) due to non-collapsibility of the OR

Bivariate probit models (BPM)
-two-stage method that, unlike 2SLS, models the probabilities directly, which are restricted to [0,1]
-full information maximum likelihood is used to estimate the parameter
-accounts for the correlation between the errors
-estimated parameter: probit coefficient*
-for binary outcome and exposure, BPM performs better than linear IV methods
-the BPM estimator has no interpretation like the OR; however, by multiplying a probit coefficient by approximately 1.6, the estimator can be made to approximate the OR
-when the distribution of the error terms is not normal or the average probability of the outcome variable is close to one or zero, the BPM estimator may not be consistent for the ACE

Remarks for all methods:
-all basic IV assumptions are needed for all estimation methods; if any IV assumption is violated, all methods provide biased results
-under a constant exposure effect, all methods provide the ACE; in case of heterogeneous treatment effects, under the monotonicity and no effect modification assumptions, all methods (except SMMs) provide the LATE and SMMs provide the ATT, respectively
*the bivariate probit model is fully parametric; all of the treatment parameters, such as the risk difference, odds ratio, or risk ratio, can be derived from the probit coefficients as marginal effects. However, for the logistic regression model (LRM), the 2SRI estimator may not provide the causal odds ratio due to non-collapsibility of the odds ratio.
2SRI yields consistent estimates for both linear and nonlinear models [49,50]. The advantage of 2SRI over 2SLS is that 2SLS is only consistent when the second-stage model is linear, whereas this restriction does not hold for 2SRI [43,51]. Moreover, this method shows more precise estimates than 2SPS [52].
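The residual-inclusion idea can be illustrated with a small Python sketch for the linear case only, in which the 2SRI coefficient of the exposure coincides with the single-IV 2SLS estimate. The function names and the simulated data are hypothetical. The nonlinear second-stage models for which 2SRI is mainly of interest (e.g., logistic models) are not covered by this sketch.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_two_regressors(x1, x2, y):
    """Least-squares slopes (b1, b2) of y on x1 and x2 (with intercept)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def two_stage_residual_inclusion(z, x, y):
    """Linear 2SRI: add the first-stage residual to the outcome model."""
    a1 = slope(z, x)
    mz, mx = sum(z) / len(z), sum(x) / len(x)
    residual = [xi - (mx + a1 * (zi - mz)) for zi, xi in zip(z, x)]
    beta_x, _ = slopes_two_regressors(x, residual, y)
    return beta_x


# Hypothetical simulated data with an unmeasured confounder u.
rng = random.Random(11)
z, x, y = [], [], []
for _ in range(5000):
    zi = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)
    xi = 0.8 * zi + u + rng.gauss(0.0, 1.0)
    z.append(zi)
    x.append(xi)
    y.append(1.5 * xi - 2.0 * u + rng.gauss(0.0, 1.0))

beta_2sri = two_stage_residual_inclusion(z, x, y)
beta_2sls = slope(z, y) / slope(z, x)  # single-IV 2SLS as a ratio of slopes
print("2SRI:", round(beta_2sri, 4), "2SLS:", round(beta_2sls, 4))
```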

Two-stage logistic regression (2SLR)
When both the outcome and exposure are binary and the interest is in using an IV to estimate odds ratios, 2SLR can be applied. It is similar to 2SLS, but uses logistic models instead of linear models in both stages [4,53]. This method is fully parametric, and maximum likelihood estimation is used to estimate the parameters. If the first-stage logistic model is not correctly specified, the estimates from the second stage can be biased [54,55]. Also, note that this method may not provide the causal odds ratio due to the non-collapsibility of the OR [19].

Three-stage least squares method (3SLS)
The 3SLS generalizes the 2SLS. Possible correlation of the errors ($\varepsilon_1$ and $\varepsilon_2$) in equations (4) and (5) is not taken into account by 2SLS. 3SLS accounts for the possible correlation between the errors and may improve the efficiency of the estimator [56,57]. Unlike 2SLS, in which the coefficients of the two equations are estimated separately, in 3SLS all coefficients are estimated simultaneously. This requires three steps. The first stage is similar to 2SLS, i.e., a linear regression of X on Z to obtain $\hat{X}$. In the second stage, the residuals of the second-stage 2SLS model are obtained to estimate the cross-model correlation matrix (correlation between the error terms in both models). Finally, in the third stage the estimated correlation matrix is used to obtain the IV estimator. When there is no correlation between the error terms of the 2SLS models, 3SLS reduces to 2SLS. However, 3SLS is more vulnerable to misspecification error, since misspecification of one of the models in the first or second stage will affect the third-stage model [58].

Structural mean models (SMMs)
SMMs, which explicitly use counterfactuals or potential outcomes [52], were originally proposed by Robins [59] in the context of randomized trials with non-compliance to estimate the causal effects for the treated (exposed) individuals. SMMs are semi-parametric models and use IVs via G-estimation for identification and estimation of the causal parameter. This method involves the assumption of conditional mean independence [14,19,60-62] and does not make distributional assumptions about the exposure [19]. SMMs with an identity link are sometimes called additive SMMs and can be used for continuous outcomes, whereas multiplicative SMMs with a log-linear model can be used for positive-valued or binary outcomes in order to estimate the causal risk ratio [19,63]. Additionally, the logistic structural mean model (LSMM) developed by Vansteelandt and Goetghebeur [64] and Robins and Rotnitzky [65] can also be used for binary outcomes in order to estimate the causal odds ratio [19,63].
To handle continuous outcome data, the IV estimator from the additive SMM can be expressed as equation (2), given that the assumptions of conditional mean independence (CMI) and no effect modification by Z are fulfilled [14,62,66,67]. This estimator provides the average treatment effect for the treated individuals (ATT) [19,68].
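A minimal sketch of G-estimation for an additive SMM may help to see why the resulting estimator reduces to a ratio. It assumes a single binary IV, a binary exposure, no covariates, and a constant exposure effect; the estimating function, the function names, and the simulated data are illustrative assumptions and not the formulation used in the cited papers.

```python
import random


def estimating_function(psi, z, x, y):
    """Sum of (z - zbar) * (y - psi * x); it is zero at the G-estimate."""
    zbar = sum(z) / len(z)
    return sum((zi - zbar) * (yi - psi * xi) for zi, xi, yi in zip(z, x, y))


def g_estimate(z, x, y):
    """Solve the estimating equation for psi.

    The function is linear in psi, so two evaluations determine its root.
    """
    e0 = estimating_function(0.0, z, x, y)
    e1 = estimating_function(1.0, z, x, y)
    return e0 / (e0 - e1)


def ratio_estimator(z, x, y):
    """Difference in mean outcome over difference in exposure probability."""
    def arm_mean(values, arm):
        selected = [v for zi, v in zip(z, values) if zi == arm]
        return sum(selected) / len(selected)
    return (arm_mean(y, 1) - arm_mean(y, 0)) / (arm_mean(x, 1) - arm_mean(x, 0))


# Hypothetical data: binary IV, binary exposure, continuous outcome.
rng = random.Random(3)
z, x, y = [], [], []
for _ in range(20000):
    zi = 1 if rng.random() < 0.5 else 0
    u = rng.gauss(0.0, 1.0)  # unmeasured confounder
    xi = 1 if 1.0 * zi + u + rng.gauss(0.0, 1.0) > 0.5 else 0
    z.append(zi)
    x.append(xi)
    y.append(2.0 * xi + 1.5 * u + rng.gauss(0.0, 1.0))

psi_hat = g_estimate(z, x, y)
print("G-estimate:", round(psi_hat, 3))
print("ratio estimator:", round(ratio_estimator(z, x, y), 3))
```

The idea is that the exposure-free outcome Y - psi X should be mean-independent of the IV at the true value of psi. With richer models (covariates, several IVs, nonlinear links) the estimating equation generally has to be solved numerically.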
The advantage of this method is that it relaxes several of the modelling restrictions such as homogeneous treatment effects required by more classical methods such as RE/two-stage IV methods [14,19]. One of the key assumptions of this method is no effect modification, which is difficult to verify in practical situations [67].
SMMs have been extended by Robins [60] to a general setting of structural nested mean models (SNMM) for repeated measures at multiple time points. The SMMs are a subclass of the SNMM [59,69]. When instruments, exposures, and confounders are time-dependent, SNMMs can be used [14]. Details and mathematical formulations of SMMs are described elsewhere [14,19,63].

Generalized method of moments (GMM)
When applying the GMM, a system of equations is set up, which is then solved numerically. This technique was formalized by Hansen [70] and is a broad class of estimation methods that allows for a larger number of equations (moment conditions) than parameters [4,53,71], which is not possible in the multiplicative SMM (MSMM) and LSMM [19]. More specifically, the GMM allows for estimation of parameters in an over-identified model (number of IVs greater than the number of exposures). GMM with a linear model can be similar to 2SLS [72], but GMM also provides a non-linear analogue of 2SLS [17], which is called multiplicative GMM (MGMM). Detailed explanations can be found elsewhere [4,19,53].
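To make the notion of an over-identified model concrete, the sketch below sets up two moment conditions (one per IV) for a single exposure effect in a linear model and minimises the sum of squared sample moments. It deliberately uses an identity weight matrix, which is an assumption made here for brevity; efficient GMM would use an estimated optimal weight matrix. All names and simulated values are hypothetical.

```python
import random


def sample_cov(a, b):
    """Mean of (a - abar) * (b - bbar); centring absorbs the intercept."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n


def gmm_identity_weight(instruments, x, y):
    """One-parameter linear GMM with an identity weight matrix.

    Moment j is cov(z_j, y - beta * x) = 0. Minimising the sum of squared
    sample moments over beta has the closed form used below.
    """
    a = [sample_cov(zj, y) for zj in instruments]  # cov(z_j, y)
    b = [sample_cov(zj, x) for zj in instruments]  # cov(z_j, x)
    return sum(aj * bj for aj, bj in zip(a, b)) / sum(bj * bj for bj in b)


# Hypothetical data: two valid instruments, one exposure, true effect 1.0.
rng = random.Random(5)
z1, z2, x, y = [], [], [], []
for _ in range(5000):
    a = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)  # unmeasured confounder
    xi = 0.7 * a + 0.4 * b + u + rng.gauss(0.0, 1.0)
    z1.append(a)
    z2.append(b)
    x.append(xi)
    y.append(1.0 * xi + 2.0 * u + rng.gauss(0.0, 1.0))

beta_gmm = gmm_identity_weight([z1, z2], x, y)
single_iv = [sample_cov(zj, y) / sample_cov(zj, x) for zj in (z1, z2)]
print("GMM:", round(beta_gmm, 3))
print("single-IV estimates:", [round(b, 3) for b in single_iv])
```

With this weighting the combined estimate is a weighted average of the two single-IV ratio estimates, so it always lies between them.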
In general, the nonlinear optimum GMM estimator is asymptotically more efficient than 2SLS [73]. Since GMM is a moment-based method that does not require full parametric assumptions, it is less prone to model misspecification than 2SLR or bivariate probit models when exposure and outcome are binary [4]. In case of a linear model and a single IV, the GMM estimator is equivalent to 2SLS, additive SMM, and LIML [53,66,74]. On the other hand, with a log-linear model (i.e., MGMM) [19], it is equivalent to the MSMM and provides the population causal risk ratio [19]. However, this estimator with a logistic regression model is not consistent for the causal odds ratio due to non-collapsibility of the odds ratio [17].
In case of a binary or count outcome, Palmer et al. [75] suggested a two-stage IV method in which the first stage is a linear regression and the second-stage model is a logistic or log-linear model [19]. Since IV analysis with logistic regression may not provide a consistent exposure effect, GMM with a log-linear model is preferable for estimating the causal risk ratio. Moreover, 2SRI [48] is also applicable in the setting of count outcomes.

Bivariate probit models (BPM)
When the outcome of interest is binary, so-called probit models can be applied for IV analysis. In contrast to 2SLS, probit models directly model probabilities (i.e., are restricted to (0, 1)) [4,30]. BPM can be applied in two stages, but unlike common two-stage estimation methods, the model is estimated via full-information maximum likelihood, which takes into account the correlation between the error terms in the two equations [24]. A more detailed model description can be found elsewhere [4,30].
The interpretation of BPM parameters differs from that of ordinary regression model parameters (e.g., the logarithm of the odds ratio from a logistic model). However, by multiplying a probit coefficient by approximately 1.6 or 1.8, probit coefficients approximate the coefficients obtained through logistic regression [4].
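The scaling rule can be checked numerically by comparing the standard normal distribution function with a logistic distribution function whose argument is multiplied by the scaling factor. The short sketch below does this on a grid; the grid limits and the helper names are arbitrary choices made for illustration.

```python
import math


def normal_cdf(v):
    """Standard normal distribution function (probit link inverse)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def logistic_cdf(v):
    """Logistic distribution function (logit link inverse)."""
    return 1.0 / (1.0 + math.exp(-v))


def max_gap(scale, lo=-4.0, hi=4.0, steps=801):
    """Largest absolute difference between Phi(v) and expit(scale * v)."""
    gap = 0.0
    for i in range(steps):
        v = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(normal_cdf(v) - logistic_cdf(scale * v)))
    return gap


for scale in (1.0, 1.6, 1.8):
    print("scale", scale, "maximum gap", round(max_gap(scale), 4))
```

With a factor of 1.6 or 1.8 the two curves stay within a few percentage points of each other over the grid, whereas without rescaling (factor 1.0) they differ far more. This is only an approximation of the link functions and does not by itself give probit coefficients an odds ratio interpretation.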
In case of binary outcome, linear IV methods may yield biased results and BPM may be preferable [30,47]. Furthermore, the estimates are more efficient than 2SLS, whereas 2SLS models are more robust to incorrect modelling assumptions regarding the bivariate normal distribution of the error terms [76,77]. However, when the distribution of error terms is not normal or the average probability of the outcome variable is close to one or zero, or if there is more than one exposure, the estimates from the BPM are generally not consistent for the average causal effect [30,77].

Other estimation methods
Apart from the outcome types discussed above, the outcome variable in epidemiologic research may also be a time-to-event variable. For such outcomes, IV analysis has also been applied using a two-stage method, in which the second-stage model could be a Cox proportional hazards model [78][79][80]. However, Brookhart et al. [3] stated that this approach for IV analysis is not motivated by a theoretical model and, therefore, parameters that are obtained from this approach may not be causally interpretable. Examples of this approach are a study of the effect of rosiglitazone on (time to) cardiovascular hospitalization and all-cause mortality using facility-prescribing patterns as an IV [78], and a study of the effect of adjuvant chemotherapy on (time to) breast cancer recurrence using physician preference as an IV [79].

Standard error and characteristics of IV estimators
Consider two-stage models for IV analysis, in which the predicted value of exposure from the first-stage model is included in the second-stage model. The uncertainty around this prediction is not taken into account in the latter model, which may therefore result in incorrect precision. Typically, standard errors (SEs) of the IV estimate from the second-stage model are too small [24,30,44,45]. An alternative method to estimate a correct SE is the so-called sandwich variance estimator (robust SE), which involves cross products of the predicted treatment and a dispersion factor based on the observed treatment [49]. Most statistical software packages provide this sandwich variance estimate [10]. Angrist and Krueger [10] noticed that these SEs are asymptotically valid, but in practice (with finite sample size) they are only approximately valid.
Another way of estimating SEs is the bootstrap method [81]. Here, bootstrap samples of the original data are used to estimate the variation in the IV estimates and hence the SE [4,6,82-84]. It should be stressed that one of the weaknesses of the IV estimator is that it tends to display large SEs relative to the conventional regression estimator [13,85]. It is also noted that the IV estimator can perform poorly in finite samples and show biased results [31], and this bias is amplified when the IV is weak [14,31].
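A nonparametric bootstrap for the SE of a single-IV estimate can be sketched as follows. The number of bootstrap replicates, the seeds, and the simulated data are arbitrary assumptions for illustration; the whole two-stage procedure (here written as a ratio of covariances) is repeated within each bootstrap sample so that first-stage uncertainty is reflected in the SE.

```python
import math
import random


def iv_ratio(z, x, y):
    """IV estimate with one instrument: cov(z, y) / cov(z, x)."""
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    czy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx


def bootstrap_se(z, x, y, n_boot=200, seed=0):
    """Resample subjects with replacement and re-estimate each time."""
    rng = random.Random(seed)
    n = len(z)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(iv_ratio([z[i] for i in idx],
                                  [x[i] for i in idx],
                                  [y[i] for i in idx]))
    centre = sum(estimates) / n_boot
    return math.sqrt(sum((e - centre) ** 2 for e in estimates) / (n_boot - 1))


# Hypothetical data: binary IV, continuous exposure and outcome.
rng = random.Random(21)
z, x, y = [], [], []
for _ in range(1000):
    zi = 1 if rng.random() < 0.5 else 0
    u = rng.gauss(0.0, 1.0)  # unmeasured confounder
    xi = 1.0 * zi + u + rng.gauss(0.0, 1.0)
    z.append(zi)
    x.append(xi)
    y.append(2.0 * xi + 3.0 * u + rng.gauss(0.0, 1.0))

estimate = iv_ratio(z, x, y)
se = bootstrap_se(z, x, y)
print("IV estimate:", round(estimate, 3), "bootstrap SE:", round(se, 3))
```

With a weak IV the denominator can come close to zero in some bootstrap samples, which makes the bootstrap distribution heavy-tailed; percentile-based intervals may then be more informative than the SE alone.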

Interpretation of exposure effects from IV analysis
Researchers may be interested in estimating the average treatment effect over the entire study population [27]. However, it has been argued that the basic assumptions of IV analysis are not sufficient to obtain point estimates of the causal effect of exposure on the outcome, but only upper and lower bounds of this parameter [14,86,87]. To achieve a point estimate of the average causal effect (ACE) over the entire study population, the additional strong assumption of homogeneity of exposure effects across levels of the IV should be satisfied [52]. Moreover, IV analysis captures the ATT under the assumption of no effect modification by the IV [52]. When exposure effects are not homogeneous across IV levels, under the monotonicity assumption (i.e., the IV affects the treatment deterministically in one direction), the IV estimate quantifies the local average treatment effect (LATE) [88], which is only informative for a subset of the study population, namely those who comply with the IV [27,89-91].
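The distinction between the LATE and the average effect in the whole population can be made concrete with a small noise-free example. The population below is entirely hypothetical: it consists of compliers, always-takers, and never-takers (no defiers, so monotonicity holds), each with a different exposure effect, and the same mix of types in both IV arms.

```python
def build_population():
    """Hypothetical noise-free population with three compliance types.

    Each tuple: (units per IV arm, baseline outcome, exposure effect,
    exposure if Z=0, exposure if Z=1).
    """
    types = {
        "complier": (50, 10.0, 2.0, 0, 1),
        "always-taker": (20, 12.0, 5.0, 1, 1),
        "never-taker": (30, 8.0, 1.0, 0, 0),
    }
    rows = []
    for count, baseline, effect, x_if_z0, x_if_z1 in types.values():
        for z, x in ((0, x_if_z0), (1, x_if_z1)):
            rows.extend([(z, x, baseline + effect * x)] * count)
    return rows


def wald(rows):
    """Ratio estimator computed from (z, x, y) rows."""
    def arm_mean(z_value, index):
        values = [row[index] for row in rows if row[0] == z_value]
        return sum(values) / len(values)
    return (arm_mean(1, 2) - arm_mean(0, 2)) / (arm_mean(1, 1) - arm_mean(0, 1))


rows = build_population()
late = wald(rows)
# Population-average effect: type proportions times type-specific effects.
average_effect = 0.5 * 2.0 + 0.2 * 5.0 + 0.3 * 1.0
print("IV (ratio) estimate:", round(late, 6))
print("population-average effect:", round(average_effect, 6))
```

In this constructed population the ratio estimator recovers the effect among compliers (2.0) and not the population-average effect (2.3), because only compliers change exposure with the IV.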

Assessment of IV assumptions
As noted, IV analysis must satisfy three basic assumptions, and if these assumptions do not hold, results may be severely biased [3,13]. The first assumption (i.e., the IV is related to exposure) is generally easier to check using available statistical methods than the other two assumptions. The second (IV has no direct effect on outcome) and third (IV is independent of confounders) assumptions are unverifiable or not directly testable as they involve unobservable variables [1,13,18,19,68,76,92]. Some authors proposed circumstantial evidence to support these assumptions [2,5,93,94]. Alternatively, for the third assumption a falsification test based on the standardized difference can be applied [95].
In order to check the first assumption, the F-statistic from the first-stage linear regression model is widely used, although this statistic is highly affected by sample size [76,83,85]. There is a rule of thumb that if the F-statistic is greater than 10, the first assumption holds [13,96,97]. Other measures for the strength of the association between IV and exposure include the first-stage regression coefficient of the IV [50,98] or the R² of a linear first-stage model [15,78,83], the odds ratio [6,93], or the pseudo-R² of the first-stage model [76]. When the correlation between IV and exposure is not strong enough, IV analysis is likely to be biased (weak IV bias, which increases with the weakness of the IV). A weak IV will provide large SEs for the IV estimator [3,13,31,47,99].

Discussion
We provided an overview of estimation methods for IV analysis, highlighting their strengths and limitations for epidemiological research. These methods share aspects, yet also have some particularities. However, when the IV assumptions are violated, the sample size is small, or IV models are not correctly specified, all methods tend to perform poorly and show biased results.
The methods can be categorized as moment-based and semiparametric (e.g., 2SLS, GMM, SMM) or likelihood-based (e.g., BPM, 2SLR, LSMM) methods. The moment-based or semiparametric methods are in general less efficient than likelihood-based methods. However, likelihood-based methods are more vulnerable to incorrect modelling assumptions, in which case moment-based methods are more robust. In empirical data, several IV methods can be applicable to the same combination of IV, exposure, and outcome; however, because the methods rely on different assumptions, they target different parameters, so the interpretation of the estimated exposure effects differs [45]. Therefore, choosing an appropriate IV method requires attention [76].
In order to obtain the ACE, LATE, or ATT, extra assumptions should be fulfilled along with the basic assumptions: a homogeneous exposure effect, monotonicity in case of heterogeneous exposure effects, or no effect modification by the IV, respectively. These different assumptions result in the estimation of different causal effects, and hence researchers should take care when interpreting IV estimates [14].
In randomized trials, treatment assignment as an IV satisfies the assumptions by design, but in observational studies, this is not the case. In the latter situation, subject matter knowledge and theoretical motivations (why is an IV related to treatment and unrelated to patients' characteristics and outcome?) should be given, especially regarding the second and third conditions underlying the IV method. If the IV is weakly related to exposure and correlated with unmeasured variables, IV methods may yield biased results [100]. In addition, the main critique of any IV analysis is that the IV may affect the outcome through some pathway other than through the exposure of interest [32]. This condition cannot be verified empirically.
From a methodological perspective, the IV method is a powerful statistical tool, given that a valid IV is present and IV analysis is correctly applied. In that case, it can provide a valid estimate in the presence of measured and unmeasured confounding. However, if there is a strong confounding effect, it is difficult to find an appropriate IV [13].
A limitation of our study is that we restricted ourselves to IV methods that are commonly used in epidemiologic research. We did not discuss nonparametric and Bayesian IV methods. We refer to the literature for examples of these methods [12,38,86,101-104]. Because of limited space, we did not describe mathematical models with detailed derivations of IV estimators for all methods.
In conclusion, IV analysis is a potentially powerful method to control for confounding (both measured and unmeasured). Some estimation methods (e.g., 2SLS, 2SRI) can be applied in many situations, whereas others (e.g., RE, BPM, 2SLR) can only be applied in a limited number of situations. Irrespective of the methods that are used in a particular study, in order to provide a valid interpretation of the exposure effect on the outcome, researchers should be aware of the underlying methodology of the estimation method as well as the key assumptions of the IV.