Research Article 
Open Access 

Subhash S Naik^{1*} and YV Reddy^{2} 
^{1}Head, Asset Optimization & Industrial Engineering Department, Vedanta Group of Companies, India 
^{2}Reader, Department of Commerce, Goa University, Goa, India 
^{*}Corresponding author: 
Subhash S Naik
Head, Asset Optimization & Industrial Engineering Department
Vedanta Group of Companies, India
Email: [email protected] 


Received February 01, 2013; Published February 23, 2013 

Citation: Naik SS, Reddy YV (2013) Does my Research Thesis Proposed Model Represent the Authentic Study? An Assessment of the Appropriate Use of Structural Equation Modeling (SEM) Model Fit Indices. 2:648 doi:10.4172/scientificreports.648 

Copyright: © 2013 Naik SS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 

Abstract 

Structural Equation Modeling (SEM) is a widely used multivariate technique for modeling multiple dependence relationships in many social science disciplines. However, very few studies in the social science literature report the correct set of model Fit Indices (FIs), and those that do offer little justification for their choice. This trend is largely symptomatic of a lack of clarity on the acceptable criteria for choosing FIs and on their threshold values for the a priori model. The objective of this article is to reduce some of the confusion surrounding the appropriate use of SEM model fit indices and to present the arguments for their use. I also discuss the merits and demerits of some of the “popular” model fit indexes and, towards the end, provide some practical thumb rules for the appropriate use of FIs, to support the a priori model’s claim to represent the real world phenomenon. 

Keywords 

Structural equation modeling; Model fit index; Chi-square; Sample size; Multivariate 

Introduction 

The Structural Equation Modeling (SEM) method, also known as covariance structure analysis or latent variable analysis, is an advanced multivariate technique for examining multiple dependence relationships between variables simultaneously. SEM is considered more advanced vis-à-vis other multivariate techniques because it can estimate a series of interrelated dependence relationships simultaneously. Byrne [1] notes that SEM is: 

• A confirmatory rather than exploratory approach to test the relationships 

• Accounts for measurement errors in the course of model testing 

• Can incorporate observed (indicator) variables as well as latent (unobserved) variables and most importantly, 

• Tests a priori relationships rather than allowing the technique or data to define the nature of relationship between the variables. 

Ullman [2] prefers to describe SEM as Exploratory Factor Analysis (EFA) combined with multiple regression. However, Schreiber et al. [3] assert that SEM is rather a combination of Confirmatory Factor Analysis (CFA) and multiple regression, since it is more a confirmatory technique than an exploratory one. 

In linear Structural Equation Models (SEM), the measured variables, which have a correlational relationship among them, are represented through an assumed linear relationship, so that the model is not only meaningful and parsimonious but also approximates the real world phenomenon as closely as is plausible. Since this real world phenomenon can never be fully understood, the purpose of such models is to evaluate the best possible outcome. This amounts to choosing a particular model which fits the observed data closely and is amenable to an interpretable solution, yet is only one of many plausible representations of the phenomenon that produced the data. In other words, model fit is never absolute in the strictest sense, and the model we choose as a result of evaluating various fit indexes is only one of many possible and plausible models that may fit the observed data to the same degree (equivalent fit) but attach different meanings or interpretations to it. Such models with equivalent fits may even be infinite in number, yet they can be distinguished on the basis of their substantive meaning. Hence evaluation of model fit by itself does not constitute theory or a theoretical contribution to the literature; instead, theory should at least partly explain, or aid the explanation of, the equivalently fitting models in SEM and help us choose the one with the most meaningful, parsimonious and relevant interpretation in the context of our research question. 

Model specification in SEM involves one or many technically correct models, so that model parameter estimates and measures of fit can be estimated correctly. Yet there are ample studies which have tried to develop various relative and absolute fit indexes that assess the adequacy of the most appropriate model. Nevertheless, the debate on the adequacy of these fit indexes points towards the inherent limitations of each of them, and hence the need to complement one index with others. This article attempts to resolve the apparent conflict between the pictures painted by the various fit statistics, and reviews the literature with the objective of suggesting some practical thumb rules that social science researchers can use when claiming that their structural model fits the real world phenomenon. In fitting the a priori theoretical model, the Chi-square (χ²) goodness-of-fit statistic and fit indexes are the most commonly used measures of model fit in SEM. The so-called fit is actually the degree to which the particular model matches the observed data. The important aspect of these fit statistics is that each supplements the others. In our discussion, a priori models will be the focus. 

Why the χ² test statistic? 

The χ² test statistic T = (N−1)F_min, with a large-sample (asymptotic) χ² distribution, tests the significance of the discrepancy between the sample and fitted covariance matrices, in a sample of size N with discrepancy function F. 

Cudeck and Henly [4] mention three types of discrepancy functions which may be used as a basis for model selection: sample discrepancy, overall discrepancy and discrepancy due to approximation. 

The χ² statistic should be non-significant (the critical value T_α at level of significance α should exceed T, for the given degrees of freedom) for the model to be accepted as a good representation of the real world phenomenon, for then it fairly (within acceptable limits) mirrors the process that generated the given data in the population. 

The limitation of this test is that T may not follow a χ² distribution when N is small, and what exactly constitutes a small sample size is itself not clearly resolved among researchers. However, even in large samples the χ² test is not free of problems. This is due to the large statistical power of a large sample, which makes even a trivially small discrepancy between the sample covariance matrix and the fitted model reject the specified model. Also, since multivariate normality is an underlying assumption of the χ² test, in the case of violation of this assumption the T statistic may not be χ² distributed. Green et al. [5] have shown that the χ² statistic also varies with the number of categories in the response variable. 

What follows from the above discussion is that the χ² statistic is not a complete and reliable measure for evaluating model fit, because a significant χ² value may result from reasons other than an inadequate model fit, like model misspecification, the power of the test, violation of normality assumptions (for the ML and GLS statistics), or sample size effects. There are multiple χ² tests, depending on the choice of discrepancy function F and T statistic. 

Several researchers have also looked into non-normal theory based testing. Browne [6], using multivariate elliptical theory, introduced a kurtosis parameter beyond the normal distribution parameters to get optimal asymptotic estimates and χ² goodness-of-fit tests. On similar lines, Kano [7] introduced the heterogeneous kurtosis statistic, which subsumes elliptical and normal theory based statistics. Normal theory is a special case when the kurtosis parameter is zero, while elliptical theory is based on homogeneous kurtosis of the variables. The Satorra-Bentler correction is applied in cases when the assumptions related to normal, elliptical or heterogeneous kurtosis theory are violated, and is independent of the distribution of variables. ADF (Asymptotically Distribution Free) methods are also seemingly independent of the distribution of variables, provided the sample size is high. 

Popular use of this method is restricted by the requirement of sample sizes in the range of 5,000–10,000. Under conditions of dependence between latent variates, the Satorra-Bentler corrected statistic performs better than the others, and in the case of independence of these latent variates, its performance is as good as normal-theory methods. 

However, it is still sensitive to sample size, and with smaller sample sizes it tends to over-reject correct models more frequently. Typically the Satorra-Bentler correction is applied to the ML χ² statistic; however, Hu and Bentler [8] showed that the GLS χ² statistic performed better than the ML estimator even at smaller sample sizes, and is one of the most adequate model fit statistics. In large samples, though, the ML and GLS estimators are equally good. 

In the case of a noncentral distribution of the χ² variate, for given degrees of freedom, a noncentrality parameter λ is introduced that measures the discrepancy between the population covariance matrix and the fitted model, and is also a measure of the error of approximation (when λ = 0, we have the central χ² distribution). According to statistical theory, the noncentral χ² measure is predicted to be less dependent on sample size variations (McDonald and Marsh [9]). For comparison between alternative models, the scaled noncentrality parameter, SNCP = (χ² − df)/N, is often used. 
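The SNCP defined above is simple enough to sketch in code. A minimal illustration follows; the function name and the numeric χ², df and N values are hypothetical, chosen only to show how the scaled index behaves when comparing two alternative models:

```python
# Scaled noncentrality parameter (SNCP) as defined above: (chi2 - df) / N.
def sncp(chi2: float, df: int, n: int) -> float:
    """Estimate approximation error per observation for model comparison."""
    return (chi2 - df) / n

# Illustrative comparison: smaller SNCP suggests less approximation error.
model_a = sncp(chi2=85.0, df=40, n=300)
model_b = sncp(chi2=60.0, df=40, n=300)
assert model_b < model_a
```

Because N appears in the denominator, the same raw χ² excess over df translates into a smaller SNCP as the sample grows, which is why the scaled form is preferred for cross-model comparison.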

Fit indices 

The biggest limitation of the χ² statistic is that while it works well with large samples, the trade-off is the large power that large samples inherently have to reject specified models with only an incremental discrepancy from the population matrix. Hence a dichotomous decision based on the χ² test fails to convey the model’s degree of fit. This gap in information is filled by the use of fit indexes, which alleviate many of the problems related to sample size and misspecification of the distributional assumption in the χ² test. FIs measure the variance and covariance accounted for by the specified model and, unlike the χ² test, do not test a null hypothesis. However, several studies [10,11] have repeatedly shown that FIs too are not free of the effects of sample size, distributional misspecification, and even model complexity. 

Two main types of FIs are accepted in the literature: absolute FIs and relative FIs. Unlike the χ² statistic, all FIs are non-statistical measures of model fit. There are also other types of fit statistics: parsimony FIs and noncentrality FIs (Appendix 1). 

Absolute fit indexes 

These FIs assess how well the a priori model reproduces the sample data, with reference to the saturated model. A saturated model is assumed to exactly reproduce the observed covariance matrix. In this respect the absolute FI resembles the R² of a linear regression model. The Jöreskog and Sörbom [12] adjusted fit index for ML methods is given by: 

AGFI_ML = 1 − [p(p+1)/(2df)] (1 − GFI_ML) 

where p = number of observed variables (the adjustment rewards parsimony) and 

GFI_ML is given by: 

GFI_ML = 1 − tr[(Σ⁻¹S − I)²] / tr[(Σ⁻¹S)²] 

and is the measure of the relative amount of variances and covariances in S accounted for by the implied model Σ. 

When S = Σ, GFI = AGFI = 1 for ML estimates. 
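The GFI and AGFI formulas above can be checked numerically. The following is a minimal sketch, assuming numpy and a toy 2×2 covariance matrix; the function names are hypothetical, not from any SEM package:

```python
import numpy as np

def gfi_ml(sigma_hat: np.ndarray, s: np.ndarray) -> float:
    """GFI for ML estimation: 1 - tr[(Sigma^-1 S - I)^2] / tr[(Sigma^-1 S)^2]."""
    m = np.linalg.solve(sigma_hat, s)   # Sigma^-1 S without an explicit inverse
    i = np.eye(m.shape[0])
    return 1.0 - np.trace((m - i) @ (m - i)) / np.trace(m @ m)

def agfi(gfi: float, p: int, df: int) -> float:
    """Adjust GFI for parsimony; p = number of observed variables."""
    return 1.0 - (p * (p + 1) / (2.0 * df)) * (1.0 - gfi)

# When the implied matrix reproduces S exactly, GFI = AGFI = 1, as stated above.
s = np.array([[1.0, 0.3], [0.3, 1.0]])
assert abs(gfi_ml(s, s) - 1.0) < 1e-12
```

Using `np.linalg.solve` rather than inverting Σ directly is the standard numerically stable way to form Σ⁻¹S.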

Based on Akaike’s information criterion [10,13]: 

CAK = [T_T/(N−1)] + [2q/(N−1)] 

and also another version of AIC called CK 

CK = [T_T/(N−1)] + [2q/(N−p−2)] 

Here p and q denote the number of variables and parameters, respectively. 

Both CAK and CK are used to select models, on the criterion that the smaller the value, the better the fit of the implied model. Another index is McDonald’s [14]: 

MCI = exp[−½ (T_T − df_T)/(N−1)] 

MCI normally ranges from 0 to 1, but may even exceed 1 due to sampling error. Another fit index, Hoelter’s [15] CN, aims at being independent of sample size. CN is thus also used to estimate the sample size adequate for an acceptable fit of the model in a χ² test. 

CN = {[z_critical + √(2df − 1)]² / [2T_T/(N−1)]} + 1 

where z_critical is the z value at the given level of α. For adequate fit, CN should be ≥ 200. 

The ECVI approximates the goodness of fit that the estimated model would achieve in another sample of the same size N. ECVI also incorporates the number of estimated parameters for both structural and measurement models, and is calculated as: 

ECVI = [χ²/(N−1)] + [2 × No. of estimated parameters/(N−1)] 

ECVI is used for comparison of several alternative models, and hence no threshold cutoff values or acceptable ranges are specified. 
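The absolute fit indexes above (CAK, CK, MCI, Hoelter's CN and ECVI) all reduce to arithmetic on the same handful of quantities. A minimal sketch follows; the function names and the numeric values of T, N, q, p and df are purely illustrative:

```python
import math

# T: minimized test statistic; N: sample size; q: free parameters;
# p: observed variables; df: degrees of freedom.
def cak(t, n, q):
    return t / (n - 1) + 2 * q / (n - 1)

def ck(t, n, q, p):
    return t / (n - 1) + 2 * q / (n - p - 2)

def mci(t, df, n):
    return math.exp(-0.5 * (t - df) / (n - 1))

def hoelter_cn(t, df, n, z_critical=1.96):
    return (z_critical + math.sqrt(2 * df - 1)) ** 2 / (2 * t / (n - 1)) + 1

def ecvi(chi2, n, n_params):
    return chi2 / (n - 1) + 2 * n_params / (n - 1)

# Hypothetical model output; CN >= 200 suggests an adequate fit at alpha = .05.
t, df, n, q, p = 52.0, 40, 400, 20, 12
print(cak(t, n, q), ck(t, n, q, p), mci(t, df, n), hoelter_cn(t, df, n))
```

Note that ECVI and CAK share the same algebraic form, which is why ECVI is reserved for comparing alternative models rather than judged against a fixed cutoff.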

Residual analysis 

Besides the χ² statistic and fit indexes, there is a relatively more straightforward method of evaluating model fit, called residual evaluation. Here we observe the discrepancy between the observed correlations and the model-reproduced correlations. If this discrepancy is found to be small, then model fit is deemed to be high. Residual analysis should be carried out over and above the χ² statistic and the FIs, and should serve to supplement our overall evaluation of model fit. Yet it is still unresolved in the literature whether such residuals too are sensitive to sample size and estimation method, as in the case of the RMR statistic. 

The Root Mean Square Error of Approximation (RMSEA) is “representative of the goodness-of-fit that could be expected if the model were estimated in the population” (Hair [16]). RMSEA measures the discrepancy per degree of freedom; a value ≤ 0.05 indicates close fit and ≤ 0.08 a reasonable fit [10]. The Root Mean Square Residual (RMSR) is the square root of the mean of the squared residuals between the observed and input matrices. In the case of covariances, RMSR is the mean residual covariance, and in the case of correlations, it is the mean residual correlation. RMSR is most usable and useful for correlations, since they are all on the same scale. However, no threshold cutoff value can be recommended for RMSR; its use depends on the research objectives and the observed covariances/correlations. Although both RMSEA and RMSR are discrepancy measures, RMSEA differs from RMSR in that it is measured in terms of the population, not just the sample used for estimation (Steiger [17]). 

Mulaik [18] suggests a modified version of RMSEA, converting it to a 0–1 index and multiplying the resulting value by PR (the parsimony ratio, equal to the degrees of freedom divided by the number of data points). Through trial and error he found that taking the exponential of minus RMSEA gives an index that ranges between 0 and 1, with 1 being good fit (RMSEA of zero). This he calls the ER index (exponentialised RMSEA): ER = exp(−RMSEA). 

From about 1.00 down to around 0.80, ER tracks the CFI quite closely; they deviate much more rapidly as CFI falls below this. Converting the RMSEA to ER, the product ER × PR should have a value of around 0.85. 
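Mulaik's transformation above is a one-liner in code. A minimal sketch, with hypothetical function names and illustrative df/data-point counts:

```python
import math

def er_index(rmsea: float) -> float:
    """Mulaik's exponentialised RMSEA: ER = exp(-RMSEA), in the 0-1 range."""
    return math.exp(-rmsea)

def er_pr(rmsea: float, df_model: int, n_data_points: int) -> float:
    """ER multiplied by the parsimony ratio (df over number of data points)."""
    pr = df_model / n_data_points
    return er_index(rmsea) * pr

# An RMSEA of zero gives ER = 1, the good-fit end of the index.
assert er_index(0.0) == 1.0
print(er_index(0.05))   # a "close fit" RMSEA maps to an ER just above 0.95
```

The exponential keeps the index bounded and monotone: as RMSEA grows, ER decays smoothly toward zero rather than going negative.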

Incremental FI 

An incremental FI, in contrast to an absolute FI, measures the proportionate improvement in fit of the target model over a nested baseline model. Such a baseline model can be a null model with all observed variables uncorrelated (Bentler and Bonett [19]). These FIs are also known as comparative FIs (CFIs). Hu and Bentler [8] propose, based on McDonald and Marsh [9], three types of comparative FIs: 

Type-1 index: Only information from fitting the target and baseline models of the optimized statistic is used, with no distributional assumptions. However, both models should follow the same fit function. 

Bentler and Bonett’s [19] Normed Fit Index (NFI) is given by: 

NFI = (T_B − T_T)/T_B 

where T denotes the non-negative statistic associated with the target (T_T) and baseline (T_B) models, which is based on certain statistical and mathematical assumptions that are the same for both models. Here the null model is used as the baseline model, and NFI denotes the proportion of total covariance among observed variables that a given target model is able to explain with respect to the baseline model. The test statistics T_B and T_T are “normed” into a 0–1 range (they may not follow the χ² distribution), and T_B ≥ T_T for optimized indexes (Hu and Bentler). Bollen [20] has also suggested another type-1 index, given by: 

[(T_B/df_B) − (T_T/df_T)] / (T_B/df_B) 

where df_B and df_T are the degrees of freedom of the baseline and target models. 
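The two type-1 indexes above can be sketched directly from their formulas. The function names and the T and df values below are hypothetical, chosen so that the null (baseline) model fits far worse than the target:

```python
def nfi(t_baseline: float, t_target: float) -> float:
    """Normed fit index: proportion of baseline misfit removed by the target."""
    return (t_baseline - t_target) / t_baseline

def bollen_type1(t_b: float, df_b: int, t_t: float, df_t: int) -> float:
    """Bollen's type-1 index, built on statistic-per-df ratios."""
    return ((t_b / df_b) - (t_t / df_t)) / (t_b / df_b)

# Illustrative values: a target model removing 90% of the baseline misfit.
print(nfi(900.0, 90.0))
print(bollen_type1(900.0, 45, 90.0, 40))
```

Both indexes approach 1 as the target statistic shrinks relative to the baseline; Bollen's version additionally credits the target model for spending fewer degrees of freedom.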

Type-2 index: Additional information from the expected values of the target model is used, under the assumption of a central χ² distribution. Tucker and Lewis’s [21] index is given by: 

[(T_B/df_B) − (T_T/df_T)] / [(T_B/df_B) − 1] 

Here the assumption of normality is made and the ML estimation method is used. This index is also called the Non-normed Fit Index (NNFI), as the T statistic is not normed (it need not lie in the 0–1 range). 
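The Tucker-Lewis formula above differs from Bollen's type-1 index only in its denominator. A minimal sketch with hypothetical values:

```python
def tli(t_b: float, df_b: int, t_t: float, df_t: int) -> float:
    """Tucker-Lewis (non-normed) fit index; can fall outside the 0-1 range."""
    return ((t_b / df_b) - (t_t / df_t)) / ((t_b / df_b) - 1.0)

# Illustrative values: baseline chi2 of 900 on 45 df, target 50 on 40 df.
print(tli(900.0, 45, 50.0, 40))

# "Non-normed": a target statistic below its df pushes TLI above 1.
assert tli(900.0, 45, 30.0, 40) > 1.0
```

The denominator (T_B/df_B − 1) is what makes the index non-normed: there is nothing forcing the ratio to stay at or below 1.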

Type-3 index: Uses type-1 information along with information from the expected values of the target and/or baseline models, under a noncentral χ² distribution. Type-2 and type-3 indexes are expected to perform better than type-1 indexes. Based on the reduction in the amount of misfit of the target model relative to the baseline model, Bentler suggested a noncentrality fit index, given by: 

δ = (λ_B − λ_T)/λ_B 

where λ_B and λ_T are the noncentrality population parameters associated with the baseline and target models, respectively. Hence δ is larger when the misspecification of the target model (λ_T) is lower. 

Another form of the above fit index, using estimates of the noncentrality parameters, is given by Bentler [22]: 

BFI (Bentler’s Fit Index) = [(T_B − df_B) − (T_T − df_T)] / (T_B − df_B) 

If BFI falls outside the 0–1 range, it can be modified and respecified as the comparative fit index: 

CFI = 1 − max[(T_T − df_T), 0] / max[(T_T − df_T), (T_B − df_B), 0] 
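The max() terms in the CFI formula above are exactly what keeps the index inside 0–1 when BFI would stray outside it. A minimal sketch, with hypothetical values:

```python
def cfi(t_t: float, df_t: int, t_b: float, df_b: int) -> float:
    """Comparative fit index: BFI truncated into the 0-1 range via max()."""
    num = max(t_t - df_t, 0.0)
    den = max(t_t - df_t, t_b - df_b, 0.0)
    return 1.0 - num / den if den > 0 else 1.0

# If the target statistic falls below its df (T_T - df_T < 0), the numerator
# truncates to zero and CFI is exactly 1, where raw BFI would exceed 1.
assert cfi(30.0, 40, 900.0, 45) == 1.0
print(cfi(60.0, 40, 900.0, 45))   # a well-fitting but imperfect target model
```

The guard on a zero denominator covers the degenerate case where both models fit no worse than expected by chance.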

Parsimonious fit measures 

The normed χ² statistic is the ratio of χ² to its degrees of freedom. This measure may reflect an overfitted model (capitalizing on chance) as well as model misfit. Indicative values of the normed chi-square for an acceptable model fit lie in the range 1.0 to 2.0. But even this statistic is sensitive to sample size effects (hence unreliable) and should be used in combination with fit indexes [16]. 

The Parsimonious Normed Fit Index (PNFI) is defined as: (df_proposed/df_null) × NFI 

Higher values are better, and this measure is most suited for comparison of alternative models with different degrees of freedom, as it rewards parsimony. Model differences are said to be substantial when the difference between alternative models is 0.06 to 0.09 (Williams and Holahan [23]). 

The Parsimonious Goodness of Fit Index (PGFI) is defined as: [df_proposed / (½ × p(p+1))] × GFI, where p is the number of manifest variables. 

The value of PGFI lies in the range 0 to 1.0; the higher the value, the greater the model parsimony. 

Akaike Information Criterion (AIC): This measure, based on statistical information theory, is used for comparison between alternative models with varying numbers of constructs. 

AIC = χ² + 2 × No. of estimated parameters 

A value of AIC close to 0 indicates a better fit with greater parsimony; smaller values occur when fewer coefficients are estimated. Lower AIC values also denote that the model is not only a good fit but also not prone to overfitting. 
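The parsimony measures in this section can be sketched together. The function names and all numeric inputs below are hypothetical, serving only to show how each measure rewards or penalizes model complexity:

```python
def normed_chi2(chi2: float, df: int) -> float:
    """Normed chi-square; roughly 1.0-2.0 indicates acceptable fit."""
    return chi2 / df

def pnfi(df_proposed: int, df_null: int, nfi: float) -> float:
    """Parsimonious Normed Fit Index: NFI discounted by the df ratio."""
    return (df_proposed / df_null) * nfi

def pgfi(df_proposed: int, n_manifest: int, gfi: float) -> float:
    """Parsimonious GFI: GFI discounted by df over total data points."""
    return (df_proposed / (0.5 * n_manifest * (n_manifest + 1))) * gfi

def aic(chi2: float, n_params: int) -> float:
    """AIC as defined above; lower values favor fit with parsimony."""
    return chi2 + 2 * n_params

# Illustrative model: chi2 = 60 on 40 df, 10 manifest variables, 20 parameters.
print(normed_chi2(60.0, 40), pnfi(40, 45, 0.93), pgfi(40, 10, 0.95), aic(60.0, 20))
```

Note how both PNFI and PGFI multiply a raw fit index by a df-based ratio, so an over-parameterized model pays directly for each degree of freedom it consumes, while AIC charges two points per estimated parameter.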

Fit indices: limitations and challenges in use 

There are huge challenges and limitations in using the correct fit indices for representing reality. Many of the absolute FIs, like MCI, CN, CAK and CK, have N included in the formula of the FI. Bearden et al. [24] find that the mean of NFI is positively associated with sample size, and that for smaller sample sizes NFI is far less than 1, possibly due to systematic fit index bias. Similarly, Bollen [20] notes that it is important to know whether N directly enters the calculation of an FI; however, subsequent research has not found many takers for this issue. Tanaka and Huba [25] argue that estimation-specific FIs like GFI are more appropriate than estimation-general FIs like NFI in the case of finite samples under normal-theory based estimation methods. 

The literature is still not clear about the adequacy of FIs when latent variables are dependent, and also when the distributional assumptions underlying the estimation methods are violated. But at least some studies have found that NFI, which is an unpredictable FI for small samples, becomes even more unpredictable when latent variables are dependent. Like NFI, Bollen’s FI (BFI) is also sensitive to estimation method; thus ML, GLS and ADF estimates are inconsistent, especially when N is less than 1000. Moreover, BFI is also sensitive to dependency among latent variables. Hu and Bentler [8] thus recommend not using BFI as a reliable FI for evaluating model fit. These authors further note that, among incremental FIs, TLI is quite independent of N when the ML estimation method is used and latent variables are independent (for N ≤ 1000). However, GLS estimates of TLI were found to be underestimated, and over-rejected models when N = 150. When latent variables are dependent, the mean value of TLI based on all three methods was related to sample size. The two authors also found that when N = 150, TLI (ML) rejected 30% of the models. 

Even for other FIs like Bentler’s FI, MCI and Marsh’s noncentrality index (δ), a large proportion of models were rejected (against a cutoff value of 0.9) under the condition of dependence of latent variables, using the GLS and ADF methods. Hu and Bentler [26] have suggested that GFI at N ≥ 250, with a cutoff value of 0.95, provides a reliable measure for evaluating model fit, even under conditions of dependency among latent variables. GFI behaved consistently across the ML and GLS methods at all sample sizes, under conditions of independence of common and error variates, irrespective of the distributional form of the variates. At N = 5000, all three methods converged under dependency of latent variables; for AGFI, by contrast, the ML and GLS estimates converged at N ≥ 500, while all three estimates never converged up to N = 5000. CAK and CK are related to sample size, and were consistent across the ML and GLS methods at all sample sizes under the condition of independence of latent variables, and when N ≥ 500 when they were dependent (Hu and Bentler [26]). The mean of MCI was likewise consistent across the ML and GLS methods in being unrelated to sample size under the independence condition. However, in the case of dependence of latent variables, all three estimates were related to the mean of MCI. The cutoff value of 0.9 was found to be useless under ML/GLS when N ≤ 500 under dependency of latent variables, since it resulted in over-rejection of models; while under independence conditions, when N ≥ 250, it resulted in 0% rejection of models using the ML/GLS estimation methods. Hoelter’s Critical N (CN) mean was found to be positively associated with sample size (Hu and Bentler [26]). These authors recommend that the cutoff/critical CN value should be substantially higher than 200 to effectively evaluate model fit. This may be because at N ≥ 250 under independence conditions, and N ≥ 500 under dependence conditions, there was almost complete acceptance of all models. 
In general, Hu and Bentler [26] conclude that type-1 incremental FI values (like NFI and Bollen’s 1986 FI) seem to be positively associated with sample size, compared to type-2 and type-3 indexes, which are seemingly less biased and better performers than type-1 and absolute FIs. This means that NFI is less reliable than CFI, TLI and BFI as an evaluator of model fit. At small sample sizes, type-1 indexes are recommended only with the ML estimation method. In general too, ML is preferred over the GLS or ADF estimation methods. 

Hu and Bentler [8] recommend the following FIs for evaluating model fit under the stated conditions. Under independence of latent variables: ML based MCI, BFI (RNI), CFI, BL89 and TLI; TLI should be avoided when N is very small (say 50), in which case only GFI is recommended. Under dependency between latent variables (including error variates), the authors recommend using a cutoff value greater than 0.9; however, type-1, 2 and 3 and absolute FIs tend to over-reject models when N ≤ 250. Where N ≥ 250, GFI (GLS), BFI (ML), CFI, BL89 and TLI are fairly adequate. 

Hair et al. [16] suggest that to describe the strength of the model’s predictions, results from three different perspectives should be combined: overall fit, comparative fit to a base model, and model parsimony. 

This view is tempered by Bollen [20]: “...selecting a rigid cutoff for incremental fit indices is like selecting a minimum R² for a regression equation. Any value will be controversial. Awareness of the factors affecting the values and good judgment are the best guides to evaluating their sizes.” 

The sample size phenomenon 

The effect of sample size on the adequateness of an FI is the most prominent one. What constitutes an adequate N for the adequateness of the fit index is decided by the trade-off between too little statistical power in small samples and too much in large ones; in other words, the fine balancing act is in choosing between too little power to detect large discrepancies and too much power to detect trivial discrepancies. Schreiber et al. [3] suggest that sample size is an important issue since it determines the stability of the estimated parameters, and go on to recommend replication with multiple samples as the key to demonstrating the stability of parameters; for single-sample analyses, however, they suggest that 10 participants (data points) per estimated parameter is a good thumb rule. Pohlmann [27] recommends estimating the model twice with the same data, by randomly splitting it into two halves. 
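The two sample-size recommendations above (10 data points per estimated parameter, and split-half re-estimation) can be sketched as follows; the function names and the seed are hypothetical:

```python
import random

def minimum_n(n_estimated_params: int, per_param: int = 10) -> int:
    """Schreiber et al.'s thumb rule: ~10 data points per estimated parameter."""
    return n_estimated_params * per_param

def split_half(rows, seed: int = 42):
    """Pohlmann's check: randomly split the data set for estimating twice."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]

# A model with 25 free parameters would call for roughly 250 cases.
assert minimum_n(25) == 250

# Each half would then be fed to the SEM software as a separate estimation run.
first_half, second_half = split_half(range(300))
assert len(first_half) == len(second_half) == 150
```

If the parameter estimates from the two halves agree closely, that is evidence of their stability in the sense Schreiber et al. describe.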

Further complications arise from the sensitivity of sample size effects to the dependency between the latent variables. Normal theory statistics using the ML/GLS χ² statistics fail when common and error variates are dependent. The Satorra-Bentler corrected χ² statistic works better with the GLS correction, more so when N is small. A third aspect of the sample size effect is that it affects the magnitude of the FIs. Hence, in the case of dependencies among latent variables, no FI is adequate when N is small (when the asymptotic assumption is violated). Between dependence and independence of latent variables, ceteris paribus, a larger N is required in the former case. As for the χ² test statistic, it is most appropriate when 100 ≤ N ≤ 200; outside this range, the significance test becomes less consistent and less reliable. 


Effect of the estimation method 

Similar to the sample size effect, the estimation method effect is greater when latent variables are dependent. Hence, while the χ² statistic is inadequate when dependency occurs, the Satorra-Bentler statistic is seemingly adequate. For ADF, a very large sample size (say 5000 or more) is required for the χ² statistic to be adequate. Only GFI is an adequate FI when the ADF estimation method is used, while ML based methods underestimate the FI asymptotic values to a lesser extent. Under dependency conditions, all FIs behave inconsistently under all three methods when N is small. When common and error variates are independent, type-2, type-3 and absolute FIs are consistent across the ML and GLS methods, while type-1 FIs behave erratically across the three methods in small samples. 

Model complexity 

More complex and saturated models have higher values of FIs, due to lower degrees of freedom [13,28,29]. To adjust for model complexity bias, which results in a higher FI value due to over-parametrisation, AGFI was proposed as an FI. Mulaik et al. [30] proposed a parsimony ratio (defined as the degrees of freedom of the target model relative to the total number of relevant degrees of freedom in the data) to penalize complex models. AIC was also developed to adjust the goodness of fit for the number of parameters estimated in the model. On similar lines, CAK and CK are indexes which allow less complex models to be selected for smaller N and more complex models for larger N. 

Discussion 

I would now like to provide a few thumb rules for the use of SEM fit indices (FIs) that researchers can report in their studies. These thumb rules have been arrived at by the following three-step process: 

1. Review of around 35 published studies using SEM FIs in the literature (1981–2006). 

2. Reconciliation of all conflicting claims about different FIs in each of the four categories. 

3. Synthesis of the extant recommendations for the use of FIs, to arrive at a few practically useful methods of application. 

Thumb rules for use of model FIs 

1. Regarding overall fit, use the FI cutoffs for continuous data as: RMSEA < 0.06, TLI > 0.95, CFI > 0.95, standardized root mean square residual (SRMR) < 0.08 [3,8]. 

2. For categorical variables, use the above cutoff values, except SRMR; the weighted root mean square residual (WRMR) < 0.90 also works well for continuous and categorical data, and WRMR ≤ 1.0 even for moderately non-normal continuous data (e.g. 2002). 

3. For non-normal continuous data when N > 250, the SB-based CFI cutoff value is 0.95 and SRMR 0.07 (acceptable type I and II errors) (e.g. 2002). When N ≥ 500, TLI (ML) and CFI (ML) at the suggested values were acceptable with non-normal data. 

4. The power of TLI, CFI and RMSEA to detect models with misspecified loadings is higher than their power to detect models with misspecified covariances (e.g. 2002); the power of SRMR is greater for detecting models with misspecified covariances. The Yuan–Bentler statistic should be used when N is in the range 60–120 (e.g. Bentler and Yuan [31]). 

5. The Q-plot should be discussed, as standardized residuals that depart excessively from the Q-plot line indicate a misspecified model (e.g. Byrne [32]). 

6. Only CN, NNFI and RMSEA are not significantly related to study characteristics. NNFI is the most suitable index, as it was not significantly related to study characteristics such as sample size, number of indicators per latent variable, number of latent variables, number of estimated paths, and degrees of freedom (e.g. Berndt [33]). 
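Thumb rule 1 above amounts to a simple screening checklist over the indices a researcher already has from SEM software output. A minimal sketch, with hypothetical function and variable names and illustrative index values:

```python
# Cutoffs from thumb rule 1 for continuous data: (direction, threshold).
CUTOFFS = {
    "rmsea": ("<", 0.06),
    "tli": (">", 0.95),
    "cfi": (">", 0.95),
    "srmr": ("<", 0.08),
}

def check_fit(indices: dict) -> dict:
    """Return pass/fail for each reported index against the cutoffs above."""
    result = {}
    for name, (op, cut) in CUTOFFS.items():
        if name in indices:
            value = indices[name]
            result[name] = value < cut if op == "<" else value > cut
    return result

# A hypothetical model whose overall fit clears every screening cutoff.
print(check_fit({"rmsea": 0.045, "tli": 0.97, "cfi": 0.96, "srmr": 0.05}))
```

As the thumb rules themselves stress, passing this screen supports but does not by itself establish the a priori model's claim to represent the phenomenon; residual analysis and parsimony considerations still apply.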

Concluding Remarks 

This article is not only a summary of what we already know about SEM FIs; it goes beyond this by cutting across the endless debates in various fora (published studies, working papers and especially SEMNET, an e-group for SEM users), by providing statistically valid and practically useful thumb rules which are supported by seminal studies in the literature. All thumb rules have strong support in the literature and hence are not conjectural in nature. We suggest that, besides simulation, meta-analytic studies should be conducted to further validate our current understanding. Towards this end, the current study can be a good starting point. Future researchers can extend work on the appropriate application of model fit indices in this direction. 


References 

1. Byrne BM (1998) Structural Equation Modelling with LISREL, PRELIS and SIMPLIS. Hillsdale, NJ: Lawrence Erlbaum.
2. Ullman JB (2001) Structural equation modelling. In: Tabachnick BG, Fidell LS (eds) Using Multivariate Statistics (4th edn). Needham Heights, MA: Allyn and Bacon.
3. Schreiber JB, Nora A, Stage FK, Barlow EA, King J (2006) Reporting structural equation modelling and confirmatory factor analysis results: a review. J Educ Res 99: 323-337.
4. Cudeck R, Henly SJ (1991) Model selection in covariance structures analysis and the "problem" of sample size: a clarification. Psychological Bulletin 109: 512-519.
5. Green SB, Akey TM, Fleming KK, Hershberger SC, Marquis JG (1997) Effect of the number of scale points on chi-square fit indices in confirmatory factor analysis. Structural Equation Modeling 4: 108-120.
6. Cudeck R, Browne MW (1983) Cross-validation of covariance structures. Multivariate Behavioral Research 18: 147-167.
7. Kano Y, Berkane M, Bentler PM (1993) Statistical inference based on pseudo-maximum likelihood test. J Am Stat Assoc 88: 135-143.
8. Hu L, Bentler PM (1995) Evaluating model fit. In: Hoyle RH (ed) Structural Equation Modeling: Concepts, Issues, and Applications. Thousand Oaks, CA: Sage.
9. McDonald RP, Marsh HW (1990) Choosing a multivariate model: noncentrality and goodness of fit. Psychological Bulletin 107: 247-255.
10. Browne MW, Cudeck R (1993) Alternative ways of assessing model fit. In: Bollen KA, Long JS (eds) Testing Structural Equation Models. Newbury Park, CA: Sage.
11. Gerbing DW, Anderson JC (1993) Monte Carlo evaluations of goodness-of-fit indices for structural equation models. In: Bollen KA, Long JS (eds) Testing Structural Equation Models. Newbury Park, CA: Sage.
12. Jöreskog KG, Sörbom D (1984) LISREL VI User's Guide (3rd edn). Mooresville, IN: Scientific Software.
13. Akaike H (1987) Factor analysis and AIC. Psychometrika 52: 317-332.
14. McDonald RP (1989) An index of goodness-of-fit based on noncentrality. J Classif 6: 97-103.
15. Hoelter JW (1983) The analysis of covariance structures: goodness-of-fit indices. Sociological Methods and Research 11: 325-344.
16. Hair JF, Anderson RE, Tatham RL, Black WC (1998) Multivariate Data Analysis (5th edn). Upper Saddle River, NJ: Prentice Hall.
17. Steiger JH (1990) Structural model evaluation and modification: an interval estimation approach. Multivariate Behavioral Research 25: 173-180.
18. Mulaik SA (2001) The curve-fitting problem: an objectivist view. Philosophy of Science 68: 218-241.
19. Bentler PM, Bonett DG (1980) Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88: 588-606.
20. Bollen KA (1989) Structural Equations with Latent Variables. New York: Wiley.
21. Tucker LR, Lewis C (1973) A reliability coefficient for maximum likelihood factor analysis. Psychometrika 38: 1-10.
22. Bentler PM (1990) Comparative fit indexes in structural models. Psychological Bulletin 107: 238-246.
23. Williams LJ, Holahan PJ (1994) Parsimony-based fit indices for multiple-indicator models: do they work? Structural Equation Modeling 1: 161-189.
24. Bearden WO, Sharma S, Teel JE (1982) Sample size effects on chi-square and other statistics used in evaluating causal models. J Mark Res 19: 425-430.
25. Tanaka JS, Huba GJ (1989) A general coefficient of determination for covariance structure models under arbitrary GLS estimation. British J Math Stat Psy 42: 233-239.
26. Hu LT, Bentler PM (1999) Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling 6: 1-55.
27. Pohlmann JT (2004) Use and interpretation of factor analysis in The Journal of Educational Research. J Educ Res 98: 14-23.
28. Jöreskog KG, Sörbom D (1984) LISREL V: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood. Mooresville, IN: Scientific Software.
29. Steiger JH, Lind JC (1980) Statistically based tests for the number of common factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City, IA.
30. Mulaik SA, James LR, Van Alstine J, Bennett N, Lind S, et al. (1989) Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin 105: 430-445.
31. Bentler PM, Yuan KH (1999) Structural equation modelling with small samples: test statistics. Multivariate Behavioral Research 34: 181-197.
32. Byrne BM (1989) A Primer of LISREL: Basic Applications and Programming for Confirmatory Factor Analytic Models. New York: Springer-Verlag.
33. Berndt AE (1998) "Typical model" features and their effects on goodness-of-fit indices. Presented at the 106th Annual Convention of the American Psychological Association, San Francisco, CA.


