Method | Basic notion | Exposure effects | Strength | Limitation
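Ratio estimator (RE)
Basic notion: appropriate when there is only one IV
Exposure effects: RD, RR, OR
Strength: simple estimation method
Limitation: coincides with 2SLS only when there is a single binary IV and no measured confounders; unlike 2SLS, it cannot adjust for measured confounders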
Two-stage least squares (2SLS)
Basic notion:
- fits linear models without parametric (distributional) assumptions on the error terms
- with multiple IVs, the estimator is a weighted average of the IV-specific ratio estimators
- the estimator is analogous to classical (ordinary least squares) regression
Exposure effects: RD
Strength:
- natural starting point for an IV analysis
- the estimator is consistent (asymptotically unbiased)
- widely used for binary exposures and outcomes, giving the exposure effect on the risk difference scale
- unlike the RE, it can adjust for measured confounders
Limitation:
- may give biased results with binary variables or non-linear models
- with multiple IVs, 2SLS is biased in finite samples, so limited information maximum likelihood (LIML) is an alternative
- in smaller samples, the LIML estimator tends to be less biased than 2SLS
- IV/2SLS is a special case of GMM, and efficient GMM gives the same results as 2SLS when the error variance is homoscedastic
Linear probability models (LPM)
Basic notion:
- applied when the outcome, exposure, and IV are binary; the data are modelled with linear functions
- with a single binary IV, the estimator is equivalent to the RE on the RD scale
Exposure effects: RD
Strength:
- simple to estimate, and the coefficients are interpreted as in ordinary regression
- the RD is consistent for the average causal effect (ACE)
Limitation:
- predicted probabilities can fall outside the 0–1 range; for rare outcomes they may become negative
- assumes that the marginal (incremental) effect of exposure is constant, which cannot hold over the full range of a bounded (binary) outcome probability
Two-stage predictor substitution (2SPS)
Basic notion:
- a direct extension of the linear IV approach to non-linear models
- targets a marginal (population-averaged) odds ratio
- mimics 2SLS: the exposure in the second-stage model is replaced by its first-stage predicted value
- non-linear least squares is used to estimate the parameters
- for a linear model, 2SPS = 2SLS
Exposure effects: RD, RR, OR
Strength:
- suitable for non-linear associations between exposure and outcome
Limitation:
- in practice, 2SPS in non-linear models does not always yield consistent estimates of the exposure effect on the outcome
- parameter estimation is more difficult than for 2SLS
- under a logistic regression model, 2SPS may not estimate the causal OR
Two-stage residual inclusion (2SRI)
Basic notion:
- the first-stage residual, which estimates the unobserved confounding, is included as an additional covariate alongside the exposure in the second-stage model
- also called the control function estimator
- under a linear model, 2SRI = 2SLS = 2SPS
Exposure effects: RD, RR, OR
Strength:
- yields consistent estimates for linear and non-linear models
- performs better than 2SPS
- can be applied with a binary exposure and a binary or count outcome
- with a log-linear second-stage model, the 2SRI estimator gives the causal risk ratio (CRR)
Limitation:
- may give biased estimates under strong unmeasured confounding, which is usually present when an IV analysis is needed
- under a logistic regression model, the 2SRI estimator may not estimate the causal OR
- generally requires a continuous exposure rather than a binary, discrete, or censored one
Two-stage logistic regression (2SLR)
Basic notion:
- used when the outcome and exposure are binary and the aim is to estimate an OR
- fully parametric; maximum likelihood is used to estimate the parameters
Exposure effects: OR
Strength:
- parallels 2SLS, with logistic regression models (LRMs) in both stages instead of linear models
Limitation:
- if the first-stage logistic model is misspecified, the second-stage parameter estimates may be biased
- the estimator does not give the causal OR (COR)
Three-stage least squares (3SLS)
Basic notion:
- an extension of 2SLS in which, unlike 2SLS, all coefficients are estimated simultaneously; requires three steps
- when the errors of the equations in a 2SLS system are correlated, 3SLS can be a suitable alternative
Exposure effects: RD, RR
Strength:
- uses more information, so the estimators are likely to be more efficient than 2SLS
Limitation:
- more vulnerable to misspecification of the error terms
- very rarely applied in epidemiologic studies
- the estimation process is more complicated than for 2SLS
- becomes inconsistent if the errors are heteroskedastic
Structural mean models (SMM)
Basic notion:
- SMMs use IVs through G-estimation and rely on a conditional mean independence assumption
- additive SMMs are used for continuous outcomes and multiplicative SMMs for positive-valued outcomes
- the multiplicative SMM (MSMM) assumes a log-linear model to estimate the risk ratio
- the logistic SMM (LSMM) assumes a logistic regression model fitted by maximum likelihood
Exposure effects: RD, RR, OR
Strength:
- relaxes several modelling restrictions (e.g., constant treatment effects) required by the ratio estimator and two-stage methods
- can be used with time-dependent instruments, exposures, and confounders
- provides the average treatment effect in the treated (ATT)
Limitation:
- the assumption of no effect modification is impossible to verify
- with a binary outcome, additive SMMs and MSMMs share the limitations of linear and log-linear models (e.g., predicted response probabilities may fall outside the interval [0, 1])
Generalized method of moments (GMM)
Basic notion:
- a non-linear analogue of 2SLS; the standard IV (2SLS) estimator is a special case of a GMM estimator
- makes assumptions about the moments of the error term
- allows estimation of parameters in over-identified models (more IVs than exposure variables)
- the parameters are estimated iteratively
Exposure effects: RD, RR, OR
Strength:
- requires specification of only certain moment conditions
- applicable to linear and non-linear models
- the non-linear GMM estimator is asymptotically more efficient than 2SLS
- more robust and less sensitive to parametric assumptions
- performs better than 2SLR when the exposure and outcome are binary
- under heteroskedasticity, more efficient than the linear IV estimators
Limitation:
- a GMM estimator with a logistic regression model is not consistent for the COR because the OR is non-collapsible
Bivariate probit models (BPM)
Basic notion:
- like 2SLS, it has an exposure equation and an outcome equation, but unlike 2SLS it models the probabilities directly, so they are restricted to [0, 1]
- full information maximum likelihood is used to estimate the parameters
- accounts for the correlation between the errors
Exposure effects: probit coefficient*
Strength:
- for a binary outcome and exposure, BPMs perform better than linear IV methods
Limitation:
- the BPM estimate has no OR interpretation; however, multiplying a probit coefficient by approximately 1.6 approximates the corresponding logistic coefficient (the log OR), and exponentiating that gives an approximate OR
- when the error terms are not normally distributed, or the average probability of the outcome is close to 0 or 1, the BPM estimator may not be consistent for the ACE
Remarks for all methods:
-all estimation methods require the basic IV assumptions; if any of them is violated, all methods give biased results
-under a constant exposure effect, all methods estimate the average causal effect (ACE); with heterogeneous treatment effects, all methods except SMMs estimate the local average treatment effect (LATE) under the monotonicity assumption, whereas SMMs estimate the average treatment effect in the treated (ATT) under the no effect modification assumption
*Because the bivariate probit model is fully parametric, all treatment parameters, such as the risk difference, odds ratio, or risk ratio, can be derived from the probit coefficients as marginal effects.
Table 2: Overview of commonly used estimation methods for IV analysis (basic notions, estimator, strengths, and limitations)
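Table 2 states that, with a single binary IV and no covariates, the RE and 2SLS coincide, and that under a linear model 2SPS and 2SRI also equal 2SLS. The following minimal Python sketch (standard library only) checks this numerically on simulated binary data. The data-generating values and the function names (`ratio_estimator`, `two_sls`, `two_sps`, `two_sri`) are illustrative assumptions, not any package's API, and no standard errors are computed.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    # Sample covariance; the n-1 divisor cancels in every ratio below.
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ratio_estimator(z, x, y):
    # Wald/ratio estimator for a binary IV: difference in mean outcome
    # divided by difference in mean exposure between the Z=1 and Z=0 groups.
    def group_mean(v, level):
        return mean([vi for zi, vi in zip(z, v) if zi == level])
    return (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(x, 1) - group_mean(x, 0))


def first_stage_fitted(z, x):
    # First stage: OLS of the exposure X on the IV Z (with intercept).
    slope = cov(z, x) / cov(z, z)
    intercept = mean(x) - slope * mean(z)
    return [intercept + slope * zi for zi in z]


def two_sls(z, x, y):
    # Closed-form IV/2SLS estimate with one IV and no covariates.
    return cov(z, y) / cov(z, x)


def two_sps(z, x, y):
    # Second stage: OLS of Y on the predicted exposure.
    xhat = first_stage_fitted(z, x)
    return cov(xhat, y) / cov(xhat, xhat)


def two_sri(z, x, y):
    # Second stage: OLS of Y on X plus the first-stage residual R;
    # the 2x2 normal equations are solved with Cramer's rule.
    xhat = first_stage_fitted(z, x)
    r = [xi - hi for xi, hi in zip(x, xhat)]
    sxx, sxr, srr = cov(x, x), cov(x, r), cov(r, r)
    sxy, sry = cov(x, y), cov(r, y)
    return (srr * sxy - sxr * sry) / (sxx * srr - sxr * sxr)


# Hypothetical simulated data: binary IV Z, unmeasured confounder U,
# binary exposure X and binary outcome Y, both affected by U.
random.seed(2024)
n = 10_000
z = [random.randint(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [int(0.8 * zi + ui + random.gauss(0, 1) > 0.5) for zi, ui in zip(z, u)]
y = [int(0.3 * xi + 0.5 * ui + random.gauss(0, 1) > 1.0) for xi, ui in zip(x, u)]

estimates = {
    "RE": ratio_estimator(z, x, y),
    "2SLS": two_sls(z, x, y),
    "2SPS": two_sps(z, x, y),
    "2SRI": two_sri(z, x, y),
}
for name, value in estimates.items():
    print(f"{name}: {value:.6f}")
```

All four numbers agree up to floating-point rounding. With a binary outcome they are on the risk difference scale; the equivalence breaks down once the second stage is non-linear (for example, logistic), which is where the table's 2SPS and 2SRI limitations apply.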
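To illustrate the LPM limitation on out-of-range predictions, this sketch fits an ordinary least-squares line to a rare binary outcome using a continuous covariate. The data are hypothetical, and this is plain OLS rather than a full IV fit. With a single binary regressor, the fitted values would just be the two group proportions and would stay within [0, 1]; it is a multi-valued covariate that lets the linear fit drift below 0.

```python
def fit_lpm(x, y):
    # OLS fit of a binary outcome on one covariate: P(Y = 1 | x) ~ a + b * x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical data: covariate values 0..9 and a rare outcome
# (only the last subject has Y = 1).
x = list(range(10))
y = [0] * 9 + [1]

a, b = fit_lpm(x, y)
fitted = [a + b * xi for xi in x]
negative = [xi for xi, p in zip(x, fitted) if p < 0]
print(f"intercept = {a:.4f}, slope = {b:.4f}")
print("covariate values with a negative predicted probability:", negative)
```

Here the intercept is -8/55, so the predicted "probabilities" for the three lowest covariate values are negative.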
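The SMM row describes G-estimation. As a hedged illustration, this sketch assumes a binary IV Z, a binary exposure X, and a binary outcome Y that follow a multiplicative SMM. The data are simulated here and every numeric value is hypothetical. The G-estimating equation sum_i (Z_i - Zbar) Y_i exp(-psi X_i) = 0 then has a closed-form solution, and exp(psi) estimates the causal risk ratio among the exposed. The closed form relies on X being binary; other exposures would need a numerical root finder. `msmm_g_estimate` is a hypothetical helper, not a library function.

```python
import math
import random


def msmm_g_estimate(z, x, y):
    """G-estimate psi in a multiplicative SMM with a binary IV Z and binary exposure X.

    Estimating equation: sum_i (Z_i - Zbar) * Y_i * exp(-psi * X_i) = 0.
    Because X is binary it splits into A + exp(-psi) * B = 0, so
    psi = log(-B / A) in closed form.
    """
    zbar = sum(z) / len(z)
    a = sum((zi - zbar) * yi for zi, xi, yi in zip(z, x, y) if xi == 0)
    b = sum((zi - zbar) * yi for zi, xi, yi in zip(z, x, y) if xi == 1)
    return math.log(-b / a)


# Hypothetical simulation satisfying the MSMM with a true causal risk ratio of 2.
random.seed(7)
true_psi = math.log(2.0)
z, x, y = [], [], []
for _ in range(200_000):
    zi = random.randint(0, 1)      # binary instrument
    ui = random.randint(0, 1)      # unmeasured confounder, independent of Z
    xi = int(random.random() < 0.2 + 0.5 * zi + 0.2 * ui)
    risk_unexposed = 0.1 + 0.2 * ui
    yi = int(random.random() < risk_unexposed * math.exp(true_psi * xi))
    z.append(zi)
    x.append(xi)
    y.append(yi)

psi_hat = msmm_g_estimate(z, x, y)
print(f"estimated causal risk ratio exp(psi) = {math.exp(psi_hat):.3f} (true value 2)")
```

For an additive SMM the analogous equation, sum_i (Z_i - Zbar)(Y_i - psi X_i) = 0, gives psi = cov(Z, Y) / cov(Z, X), which is the ratio estimator. This sketch computes no confidence interval.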
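The 2SLS and GMM rows state that 2SLS is a special case of GMM and that GMM can gain efficiency under heteroskedasticity. This minimal linear sketch is written in pure Python; its small matrix helpers are assumptions of the example, not a library API. It fits an over-identified model (one exposure, two IVs) on hypothetical simulated data in three ways:

- explicit two-stage 2SLS
- one-step GMM with weight matrix (Z'Z)^{-1}
- two-step GMM with a heteroskedasticity-robust weight matrix

It covers only the linear case, not the non-linear GMM estimators discussed in the table.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    # Gauss-Jordan elimination with partial pivoting (fine for small, well-conditioned matrices).
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(X, y):
    Xt = transpose(X)
    return matmul(inverse(matmul(Xt, X)), matmul(Xt, y))


def tsls_two_stages(X, Z, y):
    # Stage 1: regress each column of X on the instruments Z and keep the fitted values.
    Zt = transpose(Z)
    Xhat = matmul(Z, matmul(inverse(matmul(Zt, Z)), matmul(Zt, X)))
    # Stage 2: OLS of y on the fitted regressors.
    return ols(Xhat, y)


def linear_gmm(X, Z, y, W):
    # Minimises g(b)' W g(b) with moments g(b) = Z'(y - X b) / n:
    # b = (X'Z W Z'X)^{-1} X'Z W Z'y.
    XtZ_W = matmul(matmul(transpose(X), Z), W)
    ZtX = matmul(transpose(Z), X)
    Zty = matmul(transpose(Z), y)
    return matmul(inverse(matmul(XtZ_W, ZtX)), matmul(XtZ_W, Zty))


def two_step_gmm(X, Z, y):
    # Step 1: W = (Z'Z)^{-1}, which reproduces 2SLS.
    Zt = transpose(Z)
    b1 = linear_gmm(X, Z, y, inverse(matmul(Zt, Z)))
    # Step 2: re-weight with the inverse of a heteroskedasticity-robust moment covariance.
    resid = [yi[0] - sum(xv * bv[0] for xv, bv in zip(xi, b1)) for xi, yi in zip(X, y)]
    k = len(Z[0])
    S = [[sum(e * e * zi[r] * zi[c] for e, zi in zip(resid, Z)) / len(Z)
          for c in range(k)] for r in range(k)]
    return b1, linear_gmm(X, Z, y, inverse(S))


# Hypothetical data: one endogenous exposure, two instruments (over-identified),
# an unmeasured confounder u, and an error variance that depends on z1.
random.seed(11)
X, Z, y = [], [], []
for _ in range(10_000):
    z1, z2, u = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    exposure = 0.6 * z1 + 0.4 * z2 + u + random.gauss(0, 1)
    error = u + (1 + abs(z1)) * random.gauss(0, 1)
    X.append([1.0, exposure])
    Z.append([1.0, z1, z2])
    y.append([1.0 + 0.5 * exposure + error])

b_2sls = tsls_two_stages(X, Z, y)
b_gmm1, b_gmm2 = two_step_gmm(X, Z, y)
print(f"2SLS: {b_2sls[1][0]:.4f}  GMM step 1: {b_gmm1[1][0]:.4f}  "
      f"two-step GMM: {b_gmm2[1][0]:.4f}  (true 0.5)")
```

The first two estimates agree up to rounding error, which illustrates the special-case relation. Whether the two-step estimate is noticeably more precise depends on the degree of heteroskedasticity and cannot be judged from a single simulated sample.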
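The BPM row's rule of thumb says that a probit coefficient multiplied by about 1.6 approximates the logistic coefficient (the log OR). This works because the standard normal CDF Phi(t) is close to the logistic CDF Lambda(1.6 t). The short standard-library check below measures that approximation on a grid and applies it to a hypothetical probit coefficient of 0.5. It illustrates only the scaling approximation; it does not fit a bivariate probit model.

```python
import math


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))


SCALE = 1.6  # rule-of-thumb probit-to-logit factor quoted in the table

# How closely does the logistic CDF at 1.6*t track the normal CDF at t on [-4, 4]?
grid = [i / 100 for i in range(-400, 401)]
max_gap = max(abs(normal_cdf(t) - logistic_cdf(SCALE * t)) for t in grid)
print(f"max |Phi(t) - Lambda({SCALE} t)| on [-4, 4] = {max_gap:.4f}")

# Hypothetical probit coefficient for the exposure in the outcome equation.
probit_coef = 0.5
approx_log_or = SCALE * probit_coef
print(f"approximate log OR = {approx_log_or:.2f}, "
      f"approximate OR = {math.exp(approx_log_or):.2f}")
```

The two CDFs differ by less than 0.02 everywhere on the grid. The scaled coefficient is therefore a rough log OR, and only its exponential is an approximate OR.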