Open Access Scientific Reports

Does my Research Thesis Proposed Model Represent the Authentic Study? An Assessment of the Appropriate Use of Structural Equation Modeling (SEM) Model Fit Indices

Research Article Open Access
Subhash S Naik1* and YV Reddy2
1Head, Asset Optimization & Industrial Engineering Department, Vedanta Group of Companies, India
2Reader, Department of Commerce, Goa University, Goa, India
*Corresponding author: Subhash S Naik
Head, Asset Optimization & Industrial Engineering Department
Vedanta Group of Companies, India
E-mail: [email protected]
Received February 01, 2013; Published February 23, 2013
Citation: Naik SS, Reddy YV (2013) Does my Research Thesis Proposed Model Represent the Authentic Study? An Assessment of the Appropriate Use of Structural Equation Modeling (SEM) Model Fit Indices. 2:648 doi:10.4172/scientificreports.648
Copyright: © 2013 Naik SS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Structural Equation Modeling is a widely used multivariate multiple dependence technique in many social science disciplines. However, very few studies in the social science literature report the correct set of model Fit Indices (FIs), and those that do offer little justification for their choice. This trend is largely symptomatic of the lack of clarity about the acceptable criteria for choosing FIs for the a priori model and about their threshold values. The objective of this article is to reduce some of the confusion surrounding the appropriate use of SEM model fit indices, and to present the arguments for their use. We also discuss the merits and demerits of some of the “popular” model fit indices and, towards the end, provide some practical thumb rules for the appropriate use of FIs to support the a priori model’s claim to represent the real world phenomenon.
Structural equation modeling; Model fit index; Chi-square; Sample size; Multivariate
Structural Equation Modeling (SEM), also known as covariance structure analysis or latent variable analysis, is an advanced multivariate technique for examining multiple dependence relationships between variables simultaneously. SEM is considered more advanced than other multivariate techniques because it can estimate a series of interrelated dependence relationships simultaneously. Byrne [1] notes that SEM:
• Takes a confirmatory rather than exploratory approach to testing the relationships
• Accounts for measurement errors in the course of model testing
• Can incorporate observed (indicator) variables as well as latent (unobserved) variables and most importantly,
• Tests a priori relationships rather than allowing the technique or data to define the nature of relationship between the variables.
Ullman [2] prefers to call SEM a method that combines Exploratory Factor Analysis (EFA) with multiple regression. However, Schreiber et al. [3] assert that SEM is rather a combination of Confirmatory Factor Analysis (CFA) and multiple regression, since it is more a confirmatory technique than an exploratory one.
In linear Structural Equation Models (SEM), the measured variables, which are correlated with one another, are represented through an assumed linear relationship that is meaningful and parsimonious and that represents the real world phenomenon as closely as is plausible. Since this real world phenomenon can never be fully understood, the purpose of such models is to arrive at the best possible approximation. This is tantamount to choosing a particular model which fits the observed data closely and is amenable to an interpretable solution, yet is only one of many plausible representations of the phenomenon that produced the data. In other words, model fit is never absolute in the strictest sense: the model that we choose after evaluating various model fit indices is only one of many plausible models that may fit the observed data to the same degree (equivalent fit) while giving different interpretations of the same data. Such equivalently fitting models may even be infinite in number, yet they can be distinguished by their substantive meaning. Hence evaluation of model fit does not by itself constitute theory or a theoretical contribution to the literature; instead, theory should at least partly explain the equivalently fitting models in SEM and help us choose the one that is most meaningful, parsimonious and relevant in the context of our research question.
Model specification in SEM involves one or more models that must be technically correct for the model parameters and measures of fit to be estimated correctly. Many studies have proposed relative and absolute fit indices for judging the adequacy of the most appropriate model. Nevertheless, the debate on the adequacy of these fit indices points towards the inherent limitations of each index, and hence the need to complement one index with others. This article attempts to resolve the apparent conflict between the pictures presented by the various fit statistics, and reviews the literature with the objective of suggesting some practical thumb rules for social science researchers to use when claiming that their structural model fits the real world phenomenon.
Fitting a priori theoretical model
The chi-square (χ2) goodness-of-fit statistic and fit indices are the most commonly used measures to evaluate model fit in SEM. The so-called fit is the degree to which the particular model matches the observed data. An important aspect of these fit statistics is that each supplements the others. In our discussion, a priori models are the focus.
Why χ2 test statistic?
The χ2 test statistic $T = (N-1)F_{\min}$, which has a large-sample (asymptotic) χ2 distribution, tests the significance of the discrepancy between the sample and fitted covariance matrices, where N is the sample size and $F_{\min}$ is the minimized value of the discrepancy function F.
Cudeck and Henly [4] mentioned 3 types of discrepancy functions, which may be used as a basis for model selection: sample discrepancy, overall discrepancy and discrepancy due to approximation.
The χ2 statistic should be non-significant for the model to be accepted as a good representation of the real world phenomenon: the critical value $T_\alpha$ at significance level α should exceed T for the given degrees of freedom. A non-significant result indicates that the model mirrors, within acceptable limits, the process that generated the given data in the population.
The limitation of this test is that T may not follow a χ2 distribution when N is small. What exactly counts as a small sample size is itself not resolved among researchers. In large samples too, however, the χ2 test is not free of problems. A large sample has high statistical power, so even a trivially small discrepancy between the sample covariance matrix and the fitted model leads to rejection of the specified model. Also, since multivariate normality is an underlying assumption of the χ2 test, the T statistic may not be χ2 distributed when this assumption is violated. Green et al. [5] have shown that the χ2 statistic also varies with the number of categories in the response variable.
It follows from the above discussion that the χ2 statistic is not a complete and reliable measure of model fit, because a significant χ2 value may result from several sources: model misspecification, the power of the test, violation of normality assumptions (for ML and GLS statistics), or the sample size effect. There are multiple χ2 tests, depending on the choice of discrepancy function F and T statistic.
Several researchers have also looked into non-normal theory based testing. Browne [6], using multivariate elliptical theory, introduced a kurtosis parameter beyond the normal distribution parameters to obtain optimal asymptotic estimates and χ2 goodness-of-fit tests. On similar lines, Kano [7] introduced the heterogeneous kurtosis statistic, which covers both elliptical and normal theory based statistics. Normal theory is the special case in which there is no excess kurtosis, while elliptical theory is based on homogeneous kurtosis of variables. The Satorra-Bentler correction is applied when the assumptions of normal, elliptical or heterogeneous kurtosis theory are violated, and is independent of the distribution of variables. ADF (Asymptotic Distribution Free) methods are also seemingly independent of the distribution of variables, provided the sample size is large.
Popular use of ADF methods is restricted by their sample size requirement, which is in the range of 5,000-10,000. When latent variates are dependent, the Satorra-Bentler corrected statistic performs better than the others, and when the latent variates are independent, it performs as well as normal-theory methods.
However, it is still sensitive to sample size, and with smaller samples it tends to over-reject correct models. Typically the Satorra-Bentler correction is applied to the ML χ2 statistic; however, Hu and Bentler [8] showed that the GLS χ2 statistic performed better than its ML counterpart even at smaller sample sizes, and was one of the most adequate model fit statistics. In large samples, both ML and GLS estimators are equally good.
In the case of a non-central χ2 distribution with given degrees of freedom, a noncentrality parameter λ is introduced that measures the discrepancy between the population covariance matrix and the fitted model, and is also a measure of the error of approximation (λ = 0 corresponds to the central χ2 distribution). According to statistical theory, the noncentrality measure is expected to be less dependent on sample size variations (McDonald and Marsh [9]). For comparison between alternative models, the scaled noncentrality parameter, $\mathrm{SNCP} = (\chi^2 - df)/N$, is often used.
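As a small numerical illustration of the two quantities defined so far, the following Python sketch computes T = (N − 1)F_min and the scaled noncentrality parameter SNCP = (χ2 − df)/N. The function names and the example values (N = 201, F_min = 0.25, df = 40) are hypothetical and chosen only for illustration.

```python
def chi_square_statistic(n, f_min):
    """Test statistic T = (N - 1) * F_min for a sample of size n."""
    return (n - 1) * f_min


def scaled_noncentrality(chi2, df, n):
    """Scaled noncentrality parameter SNCP = (chi2 - df) / N."""
    return (chi2 - df) / n


# Hypothetical example values, not taken from any study.
t = chi_square_statistic(201, 0.25)
sncp = scaled_noncentrality(t, 40, 201)
print(t, round(sncp, 4))
```

When comparing alternative models fitted to the same data, the model with the smaller SNCP has the smaller estimated error of approximation.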
Fit indices
The biggest limitation of the χ2 statistic is that, while it works well with large samples, large samples also carry so much power that they reject specified models with only a trivial discrepancy from the population matrix. Hence a dichotomous decision based on the χ2 test fails to convey the model’s degree of fit. This gap in information is filled by fit indices, which help to alleviate many of the problems related to sample size and misspecification of the distributional assumption in the χ2 test. FIs measure the variance and covariance accounted for by the specified model and, unlike the χ2 test, do not test a null hypothesis. However, several studies [10,11] have repeatedly shown that FIs too are not free of the effects of sample size, distributional misspecification, and even model complexity.
Two main types of FIs are accepted in the literature: absolute FIs and relative FIs. Unlike the χ2 statistic, all FIs are non-statistical measures of model fit. There are also other types of fit statistics: parsimony FIs and non-centrality FIs (Appendix 1).
Absolute fit indexes
These FIs assess how well an a priori model reproduces the sample data, with reference to the saturated model. A saturated model is assumed to reproduce the observed covariance matrix exactly. In this respect an absolute FI (AFI) resembles the R2 of a linear regression model. The Jöreskog and Sörbom [12] adjusted goodness-of-fit index (AGFI) for ML methods is given by:
$$\mathrm{AGFI}_{ML} = 1 - \left[\frac{p(p+1)}{2\,df}\right]\left(1 - \mathrm{GFI}_{ML}\right)$$
where p is the number of observed variables, so that the ratio of the p(p+1)/2 data points to the degrees of freedom adjusts for parsimony, and $\mathrm{GFI}_{ML}$ is given by:
$$\mathrm{GFI}_{ML} = 1 - \frac{\mathrm{tr}\left[(\Sigma^{-1}S - I)^2\right]}{\mathrm{tr}\left[(\Sigma^{-1}S)^2\right]}$$
which measures the relative amount of the variances and covariances in S accounted for by the implied model Σ.
When S = Σ, GFI = AGFI = 1 for ML estimates.
Two further indices are based on Akaike’s information criterion [10,13]:
$$C_{AK} = \frac{T_T}{N-1} + \frac{2q}{N-1}$$
and another version of AIC, called $C_K$:
$$C_K = \frac{T_T}{N-1} + \frac{2q}{N-p-2}$$
Here p and q denote the number of variables and the number of parameters, respectively.
Both $C_{AK}$ and $C_K$ are used to select models on the criterion that the smaller the value, the better the fit of the implied model. Another index is McDonald’s [14]:
$$\mathrm{MCI} = \exp\left[-\frac{1}{2}\,\frac{T_T - df_T}{N-1}\right]$$
MCI typically ranges from 0 to 1, but may exceed 1 due to sampling error. Another fit index, Hoelter’s [15] CN, aims at being independent of sample size. CN is thus also used to estimate the sample size that would be adequate for an acceptable fit of the model in a χ2 test.
$$\mathrm{CN} = \frac{\left[z_{critical} + \sqrt{2\,df - 1}\right]^2}{2T_T/(N-1)} + 1$$
where $z_{critical}$ is the z value at the given level of α. For adequate fit, CN should be ≥ 200.
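The absolute fit indices above are simple functions of the test statistic, the degrees of freedom and the sample size. The sketch below implements them in Python as an illustration only. The function names, the default critical value of 1.645 (the one-tailed z value at α = 0.05) and all example numbers are assumptions made for this sketch and do not come from the studies cited.

```python
import math


def agfi(gfi, p, df):
    """AGFI = 1 - [p(p+1) / (2 df)] * (1 - GFI), p = number of observed variables."""
    return 1.0 - (p * (p + 1) / (2.0 * df)) * (1.0 - gfi)


def c_ak(t, q, n):
    """C_AK = T/(N-1) + 2q/(N-1), q = number of estimated parameters."""
    return t / (n - 1) + 2.0 * q / (n - 1)


def c_k(t, q, n, p):
    """C_K = T/(N-1) + 2q/(N-p-2)."""
    return t / (n - 1) + 2.0 * q / (n - p - 2)


def mci(t, df, n):
    """McDonald's index: exp(-0.5 * (T - df) / (N - 1))."""
    return math.exp(-0.5 * (t - df) / (n - 1))


def hoelter_cn(t, df, n, z_critical=1.645):
    """Hoelter's CN; z_critical = 1.645 is an assumed default (alpha = 0.05)."""
    return (z_critical + math.sqrt(2 * df - 1)) ** 2 / (2.0 * t / (n - 1)) + 1.0


# Hypothetical model: 6 observed variables, 13 parameters, df = 21 - 13 = 8.
T, DF, N, P, Q, GFI = 16.0, 8, 201, 6, 13, 0.97
print(round(agfi(GFI, P, DF), 4))
print(round(c_ak(T, Q, N), 4), round(c_k(T, Q, N, P), 4))
print(round(mci(T, DF, N), 4), round(hoelter_cn(T, DF, N), 1))
```

In this hypothetical case CN comes out slightly below 200, so the model would fall short of the CN ≥ 200 rule even though its GFI and MCI are high.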
ECVI approximates the goodness of fit that the estimated model would achieve in another sample with the same sample size N. ECVI also incorporates the number of estimated parameters for both structural and measurement models, and is calculated as:
$$\mathrm{ECVI} = \frac{\chi^2}{N-1} + \frac{2 \times (\text{number of estimated parameters})}{N-1}$$
ECVI is used for comparison of several alternative models and hence no threshold cut-off values or range are specified as acceptable values.
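Because ECVI has no cut-off, it is only informative when several candidate models are compared on the same sample. The following Python sketch illustrates such a comparison. It reads the ECVI formula as χ2/(N − 1) + 2q/(N − 1), with q the number of estimated parameters, and the two candidate models and their values are hypothetical.

```python
def ecvi(chi2, n_params, n):
    """ECVI = chi2/(N-1) + 2 * n_params/(N-1)."""
    return chi2 / (n - 1) + 2.0 * n_params / (n - 1)


# Hypothetical candidates: (chi-square, number of estimated parameters).
candidates = {"model_a": (16.0, 13), "model_b": (14.5, 16)}
sample_size = 201

scores = {name: ecvi(chi2, q, sample_size)
          for name, (chi2, q) in candidates.items()}
preferred = min(scores, key=scores.get)  # smaller ECVI is preferred
print(scores, preferred)
```

Here the second model has the lower χ2 but estimates three more parameters, so its ECVI is larger and the simpler model is preferred.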
Residual analysis
Besides the χ2 statistic and fit indices, there is a more straightforward method of evaluating model fit, called residual evaluation. Here we examine the discrepancy between the observed correlations and the model-reproduced correlations. If this discrepancy is small, the model fit is deemed to be good. Residual analysis should be carried out over and above the χ2 statistic and the FIs, and should supplement our overall evaluation of model fit. It is still unresolved in the literature whether such residuals, like the RMR statistic, are sensitive to sample size and estimation method. Root Mean Square Error of Approximation (RMSEA) is “representative of the goodness-of-fit that could be expected if the model were estimated in the population” (Hair [16]). RMSEA measures the discrepancy per degree of freedom; a value ≤ 0.05 indicates close fit and a value ≤ 0.08 indicates reasonable fit [10]. Root Mean Square Residual (RMSR) is the square root of the mean of the squared residuals between the observed and estimated input matrices. In the case of covariances, RMSR is the average residual covariance, and in the case of correlations, the average residual correlation. RMSR is most useful for correlations, since they are all on the same scale. However, no cut-off value can be recommended for RMSR; its use depends on the research objectives and the observed covariances or correlations. Although both RMSEA and RMSR are discrepancy measures, RMSEA differs from RMSR in that it is expressed in terms of the population and not just the sample used for estimation (Steiger [17]).
Mulaik [18] suggests an improved version of RMSEA, obtained by converting it to a 0-1 index and multiplying the resulting value by the parsimony ratio (PR, equal to the degrees of freedom divided by the number of data points). Through trial and error he found that taking the exponential of minus RMSEA gives an index that ranges between 0 and 1, with 1 indicating good fit (RMSEA equal to zero). He calls this the ER index (exponentialised RMSEA): ER = exp(–RMSEA).
From about 1.00 down to around 0.80, ER tracks the CFI quite closely; the two diverge much more rapidly as CFI falls below this. The product ER*PR should have a value of around 0.85.
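The residual-based measures can be sketched in a few lines of Python. Two points in this sketch are assumptions rather than statements from the text: RMSR is averaged over the unique (lower-triangular, including diagonal) elements of the residual matrix, and RMSEA uses the commonly reported sample estimate sqrt(max(T − df, 0) / (df (N − 1))). The function names and example values are hypothetical.

```python
import math


def rmsr(observed, implied):
    """Root mean square residual over the unique elements (assumed convention)."""
    p = len(observed)
    squared = [(observed[i][j] - implied[i][j]) ** 2
               for i in range(p) for j in range(i + 1)]
    return math.sqrt(sum(squared) / len(squared))


def rmsea(t, df, n):
    """Common sample estimate of RMSEA (assumed form, not given in the text)."""
    return math.sqrt(max(t - df, 0.0) / (df * (n - 1)))


def er_index(rmsea_value):
    """Exponentialised RMSEA: ER = exp(-RMSEA)."""
    return math.exp(-rmsea_value)


def parsimony_ratio(df, p):
    """PR = df / number of data points, with p(p+1)/2 data points."""
    return df / (p * (p + 1) / 2.0)


# Hypothetical 2 x 2 correlation matrices and fit results.
obs = [[1.0, 0.5], [0.5, 1.0]]
imp = [[1.0, 0.4], [0.4, 1.0]]
value = rmsea(16.0, 8, 201)
print(round(rmsr(obs, imp), 4), round(value, 4))
print(round(er_index(value) * parsimony_ratio(8, 6), 4))
```

Because PR penalises models that use up most of the available data points, ER*PR can be low even when RMSEA itself indicates reasonable fit.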
Incremental FI
An incremental FI, in contrast to an AFI, measures the proportionate improvement in fit of the target model over a nested baseline model. Such a baseline model can be a null model in which all observed variables are uncorrelated (Bentler and Bonett [19]). These FIs are also known as comparative FIs. Hu and Bentler [8], following Marsh et al. [9], distinguish 3 types of incremental FIs:
Type-1 Index: Only information from the optimized statistic used to fit the target and baseline models is used, with no distributional assumption being made. However, both models should be fitted with the same fit function.
Bentler and Bonett’s [19] Normed Fit Index (NFI) is given by:
$$\mathrm{NFI} = \frac{T_B - T_T}{T_B}$$
where $T_B$ and $T_T$ denote the non-negative test statistics of the baseline and target models, which rest on the same statistical and mathematical assumptions for both models. Here the null model is used as the baseline model, and NFI denotes the proportion of total covariance among observed variables that the target model explains relative to the baseline model. The index is “normed” to a 0-1 range (the statistics themselves need not follow the χ2 distribution), since $T_B \geq T_T$ for optimized statistics (Hu and Bentler). Bollen [20] has suggested another type-1 index (BL86), given by:
$$\mathrm{BL86} = \frac{(T_B/df_B) - (T_T/df_T)}{T_B/df_B}$$
where $df_B$ and $df_T$ are the degrees of freedom of the baseline and target models.
Type-2 Index: Additional information from the expected value of the target model statistic under the central χ2 distribution is used. The Tucker and Lewis [21] index (TLI) is given by:
$$\mathrm{TLI} = \frac{(T_B/df_B) - (T_T/df_T)}{(T_B/df_B) - 1}$$
Here normality is assumed and the ML estimation method is used. This index is also called the Non-normed Fit Index (NNFI), as it is not normed (it need not lie in the 0-1 range).
Type-3 Index: Uses type-1 information along with information from the expected values of either or both of the target and baseline models, under the non-central χ2 distribution. Type-2 and type-3 indices are expected to perform better than type-1 indices. Based on the reduction in the amount of misfit that the target model achieves relative to the baseline model, Bentler has suggested a noncentrality fit index, given by:
$$\delta = \frac{\lambda_B - \lambda_T}{\lambda_B}$$
where $\lambda_B$ and $\lambda_T$ are the population noncentrality parameters associated with the baseline and target models, respectively. Hence δ is larger when the misspecification of the target model ($\lambda_T$) is lower.
Another form of the above fit index, using estimates of the noncentrality parameters, is given by Bentler [22]:
$$\mathrm{BFI} = \frac{(T_B - df_B) - (T_T - df_T)}{T_B - df_B}$$
where BFI denotes Bentler’s Fit Index. If BFI falls outside the 0-1 range, it can be respecified as the comparative fit index:
$$\mathrm{CFI} = 1 - \frac{\max[(T_T - df_T),\, 0]}{\max[(T_T - df_T),\, (T_B - df_B),\, 0]}$$
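All of the incremental indices in this section use the same four inputs: the test statistics and degrees of freedom of the baseline and target models. The Python sketch below collects them in one place, with NFI taken in its usual form (T_B − T_T)/T_B. The function names, the handling of a zero denominator in CFI and the example values are assumptions made for illustration.

```python
def nfi(t_b, t_t):
    """Normed Fit Index: (T_B - T_T) / T_B."""
    return (t_b - t_t) / t_b


def bl86(t_b, df_b, t_t, df_t):
    """Bollen's type-1 index: [(T_B/df_B) - (T_T/df_T)] / (T_B/df_B)."""
    return (t_b / df_b - t_t / df_t) / (t_b / df_b)


def tli(t_b, df_b, t_t, df_t):
    """Tucker-Lewis index: [(T_B/df_B) - (T_T/df_T)] / [(T_B/df_B) - 1]."""
    return (t_b / df_b - t_t / df_t) / (t_b / df_b - 1.0)


def bfi(t_b, df_b, t_t, df_t):
    """Bentler's fit index: [(T_B - df_B) - (T_T - df_T)] / (T_B - df_B)."""
    return ((t_b - df_b) - (t_t - df_t)) / (t_b - df_b)


def cfi(t_b, df_b, t_t, df_t):
    """Comparative fit index; returns 1.0 if the denominator is zero (assumed)."""
    denominator = max(t_t - df_t, t_b - df_b, 0.0)
    if denominator == 0.0:
        return 1.0
    return 1.0 - max(t_t - df_t, 0.0) / denominator


# Hypothetical baseline (null) and target model results.
args = (400.0, 15, 16.0, 8)
print(round(nfi(400.0, 16.0), 4), round(bl86(*args), 4))
print(round(tli(*args), 4), round(bfi(*args), 4), round(cfi(*args), 4))
```

When the target statistic falls below its degrees of freedom, BFI exceeds 1 while CFI is truncated to 1, which is the only case in which the two differ for a reasonable baseline.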
Parsimonious fit measures
The normed χ2 statistic is the ratio of χ2 to its degrees of freedom. This measure can flag both a model that is over-fitted by capitalizing on chance and a model that does not fit. Indicative values of the normed chi-square for an acceptable model fit lie between 1.0 and 2.0. But even this statistic is sensitive to sample size effects (hence unreliable) and should be used in combination with fit indices [16].
Parsimonious Normed FI (PNFI) is defined as: $\mathrm{PNFI} = (df_{proposed}/df_{null}) \times \mathrm{NFI}$
Higher values are better, and this measure is most suited to comparing alternative models with different degrees of freedom. This measure rewards parsimony. Differences of 0.06 to 0.09 between alternative models are said to indicate substantial model differences (Williams and Holahan [23]).
Parsimonious GFI (PGFI) is defined as: $\mathrm{PGFI} = \left[df_{proposed} \,/\, \tfrac{1}{2}\,p(p+1)\right] \times \mathrm{GFI}$, where p is the number of manifest variables.
The value of PGFI lies in the range 0 to 1.0, and the higher the value, the higher the model parsimony.
Akaike Information Criterion (AIC): This is a measure based on statistical information theory, and is used for comparison between alternative models with varying numbers of constructs.
$$\mathrm{AIC} = \chi^2 + 2 \times (\text{number of estimated parameters})$$
A value of AIC close to 0 indicates better fit with greater parsimony; smaller values occur when fewer coefficients are estimated. Lower AIC values also denote that the model is not only a good fit but is also not prone to overfitting.
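The parsimony measures above can be computed directly from quantities that SEM software reports. The following Python sketch is illustrative only; the function names and the example values (χ2 = 16, df = 8, null-model df = 15, six manifest variables, 13 estimated parameters) are hypothetical.

```python
def normed_chi_square(chi2, df):
    """Normed chi-square: chi2 / df."""
    return chi2 / df


def pnfi(df_proposed, df_null, nfi_value):
    """PNFI = (df_proposed / df_null) * NFI."""
    return (df_proposed / df_null) * nfi_value


def pgfi(df_proposed, n_manifest, gfi_value):
    """PGFI = df_proposed / [0.5 * p * (p + 1)] * GFI, p = manifest variables."""
    return df_proposed / (0.5 * n_manifest * (n_manifest + 1)) * gfi_value


def aic(chi2, n_params):
    """AIC = chi2 + 2 * number of estimated parameters."""
    return chi2 + 2 * n_params


# Hypothetical fit results for a single proposed model.
print(normed_chi_square(16.0, 8))
print(round(pnfi(8, 15, 0.96), 4), round(pgfi(8, 6, 0.97), 4))
print(aic(16.0, 13))
```

PNFI, PGFI and AIC are meant for ranking alternative models, so a single value such as those printed here has little meaning on its own.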
Fit indices: limitations and challenges in use
There are considerable challenges and limitations in choosing the correct fit indices to represent reality. Many of the absolute FIs, such as MCI, CN, CAK and CK, have N included in their formulas. Bearden [24] finds that the mean of NFI is positively associated with sample size, and that for smaller sample sizes NFI is far less than 1, possibly due to systematic fit index bias. Similarly, Bollen [20] notes that it is important to know whether N directly enters the calculation of an FI. However, subsequent research has not pursued this issue much. Tanaka and Huba [25] say that estimation-specific FIs like GFI are more appropriate than estimation-general FIs like NFI in finite samples under normal-theory based estimation methods.
The literature is still not clear about the adequacy of FIs when latent variables are dependent, and when the distributional assumptions underlying the estimation methods are violated. At least some studies have found that NFI, which is unpredictable in small samples, becomes even more unpredictable when latent variables are dependent. Just like NFI, Bollen’s FI (BL86) is also sensitive to estimation method; its ML, GLS and ADF estimates are inconsistent, especially when N is less than 1000. Moreover, BL86 is also sensitive to dependency among latent variables. Hu and Bentler [8] thus recommend against using BL86 as a reliable FI for evaluating model fit. These authors further report that, among incremental FIs, TLI is quite independent of N when the ML estimation method is used and latent variables are independent (for N ≤ 1000). However, GLS estimates of TLI were found to be under-estimated, and to over-reject models, when N = 150. When latent variables are dependent, the mean value of TLI under all three methods was related to sample size. The two authors also found that when N = 150, TLI (ML) rejected 30% of the models.
Even for other FIs such as Bentler’s FI, MCI and Marsh’s non-centrality index [δ], a large proportion of models were rejected (with 0.9 as the cut-off value) under dependence of latent variables, using GLS and ADF methods. Hu and Bentler [26] have suggested that GFI, at N ≥ 250 and with a cut-off value of 0.95, provides a reliable measure of model fit even under dependency among latent variables. GFI behaved consistently across ML and GLS methods at all sample sizes under independence of common and error variates, irrespective of the distributional form of the variates. When N = 5000, all three methods converged under dependency of latent variables; for AGFI, by comparison, ML and GLS estimates converged at N ≥ 500, while the three estimates together did not converge even at N = 5000.
CAK and CK are related to sample size; they were consistent across ML and GLS methods at all sample sizes under independence of latent variables, and at N ≥ 500 under dependence (Hu and Bentler [26]). The mean of MCI too was consistent across ML and GLS methods in being unrelated to sample size under the independence condition. However, under dependence of latent variables, the mean of MCI was related to sample size under all three estimation methods. The cut-off value of 0.9 was found to be of little use under ML/GLS when N ≤ 500 under dependency of latent variables, since it resulted in over-rejection of models; under independence conditions, when N ≥ 250, it resulted in 0% rejection of models using ML/GLS estimation methods.
The mean of Hoelter’s Critical N (CN) was found to be positively associated with sample size (Hu and Bentler [26]). These authors recommend that the cut-off CN value be substantially higher than 200 to evaluate model fit effectively. This may be because at N ≥ 250 under independence conditions, and N ≥ 500 under dependence conditions, there was almost complete acceptance of all models.
In general, Hu and Bentler [26] conclude that type-1 incremental FI values (like NFI and Bollen’s BL86) seem to be positively associated with sample size, whereas type-2 and type-3 indices are seemingly less biased and perform better than type-1 and absolute FIs. This means that NFI is less reliable than CFI, TLI and BFI as an evaluator of model fit. At small sample sizes, type-1 indices are recommended for use only with ML estimation. In general, too, ML is preferred over GLS or ADF methods for estimation.
Hu and Bentler [8] recommend the following FIs for evaluating model fit under the stated conditions. Under independence of latent variables: ML-based MCI, BFI (RNI), CFI, BL89 and TLI. TLI should be avoided when N is very small (say 50); in this case only GFI is recommended. Under dependency between latent variables (including error variates), the authors recommend using a cut-off value greater than 0.9; however, type-1, 2 and 3 indices and AFIs tend to over-reject models when N ≤ 250. In cases where N ≥ 250, GFI, GLS, BFIML, CFI, BL89 and TLI are fairly adequate.
Hair et al. [16] suggest that to describe the strength of the model’s predictions, results from three different perspectives should be combined: overall fit, comparative fit to a base model, and model parsimony.
This view is tempered by Bollen [20]: “...selecting a rigid cut-off for incremental fit indices is like selecting a minimum R2 for a regression equation. Any value will be controversial. Awareness of the factors affecting the values and good judgment are the best guides to evaluating their sizes.”
The sample size phenomenon
The effect of sample size on the adequacy of an FI is the most prominent one. What constitutes an adequate N is decided by the trade-off between too little power in small samples and too much statistical power in large samples. In other words, the balancing act is between having too little power to detect large discrepancies and having so much power that trivial discrepancies are detected. Schreiber et al. [3] suggest that sample size is an important issue since it determines the stability of the estimated parameters, and recommend replication with multiple samples as the key to demonstrating the stability of parameters; for a one-sample analysis, however, they suggest that 10 participants (data points) per estimated parameter is a good thumb rule. Pohlmann [27] recommends estimating the model twice with the same data, by randomly splitting it into two halves.
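The two practical suggestions in this paragraph, the 10-participants-per-parameter rule and the random split into two halves, are easy to apply before any model is fitted. The Python sketch below is one way to do so; the function names, the fixed random seed and the example figures are assumptions for illustration, and the model estimation itself is left to whatever SEM software is in use.

```python
import random


def minimum_sample_size(n_params, per_param=10):
    """Thumb rule: per_param participants for each estimated parameter."""
    return per_param * n_params


def split_half(rows, seed=0):
    """Randomly split the cases into two halves for a replication check."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


# Hypothetical study: 13 estimated parameters and 200 cases (row indices).
needed = minimum_sample_size(13)
first_half, second_half = split_half(range(200), seed=42)
print(needed, len(first_half), len(second_half))
```

Note that each half must still be large enough on its own, so the split-half check is realistic only when the full sample is about twice the size suggested by the thumb rule.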
Further complications arise from the interaction between sample size and the dependency between the latent variables. Normal theory ML/GLS χ2 statistics fail when common and error variates are dependent. The Satorra-Bentler corrected χ2 statistic works better in GLS correction, more so when N is small. The third aspect of the sample size effect is that it affects the magnitude of FIs. Hence, when latent variables are dependent, no FI is adequate when N is small (when the asymptotic assumption is violated). Ceteris paribus, a larger N is required when latent variables are dependent than when they are independent. As for the χ2 test statistic, it is most appropriate when 100 ≤ N ≤ 200. Outside this range, the significance test becomes less consistent and less reliable.
Effect of the estimation method
Like the sample size effect, the estimation method effect is stronger when latent variables are dependent. Hence, while the χ2 statistic is inadequate when dependency occurs, the Satorra-Bentler statistic is seemingly adequate. For ADF, a very large sample size (say 5000 or more) is required for the χ2 statistic to be adequate. Only GFI is an adequate FI when the ADF estimation method is used, while ML-based methods underestimate the asymptotic FI values to a lesser extent. Under dependency conditions, all FIs behave inconsistently under all three methods when N is small. When common and error variates are independent, type-2, type-3 and absolute FIs are consistent across ML and GLS methods, while type-1 FIs behave erratically across the three methods in small samples.
Model complexity
More complex models, which are closer to saturation, have higher FI values because of their lower degrees of freedom [13,28,29]. AGFI was proposed to adjust for this model complexity bias, in which overparametrisation inflates the FI value. Mulaik et al. [30] have proposed a parsimony ratio (defined as the degrees of freedom of the target model relative to the total number of relevant degrees of freedom in the data) to penalize complex models. AIC was also developed to adjust the goodness of fit for the number of parameters estimated in the model. On similar lines, CAK and CK are indices which allow less complex models to be selected for smaller N and more complex models for larger N.
We now provide a few thumb rules for the use of SEM fit indices (FIs) that researchers can use when reporting their studies. These thumb rules have been arrived at by the following three-step process:
1. Review of around 35 published studies using SEM FIs in the literature (1981-2006).
2. Reconciliation of all conflicting claims about different FIs in each of the 4 categories.
3. Synthesis of the extant recommendations for the use of FIs, to arrive at a few practically useful methods of application.
Thumb rules for use of model FIs
1. Regarding the overall fit, use the following FI cut-offs for continuous data: RMSEA < 0.06, TLI > 0.95, CFI > 0.95, standardized root mean square residual (SRMR) < 0.08 [3,8].
2. For categorical variables, use the above cut-off values, except SRMR; also, weighted root mean square residual (WRMR) < 0.90 works well for continuous and categorical data, and WRMR ≤ 1.0 even for moderately non-normal continuous data (e.g. 2002).
3. For non-normal continuous data when N > 250, the SB-based CFI cut-off value is 0.95 and the SRMR cut-off is 0.07 (acceptable type I and II errors) (e.g. 2002). When N ≥ 500, the ML-based TLI and CFI at the suggested values were acceptable with non-normal data.
4. The power of TLI, CFI and RMSEA to detect models with misspecified loadings is higher than their power to detect models with misspecified covariances (e.g. 2002). The power of SRMR is larger for detecting models with misspecified covariances. The Yuan-Bentler statistic should be used when N is in the range 60-120 (e.g. Bentler and Yuan [31]).
5. The Q-plot should be discussed, as standardized residuals that depart excessively from the Q-plot line indicate a misspecified model (e.g. Byrne [32]).
6. Only CN, NNFI and RMSEA are not significantly related to study characteristics. NNFI is the most suitable index, as it was not significantly related to study characteristics such as sample size, number of indicators per latent variable, number of latent variables, number of estimated paths, and degrees of freedom (e.g. Berndt [33]).
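The first thumb rule can be turned into a simple reporting check. The Python sketch below encodes only the cut-offs for continuous data listed in rule 1; the dictionary layout, the function name and the example fit values are hypothetical, and the check is a reading aid rather than a substitute for the judgment the rules call for.

```python
# Cut-offs for continuous data from thumb rule 1: (direction, threshold).
CONTINUOUS_CUTOFFS = {
    "rmsea": ("<", 0.06),
    "tli": (">", 0.95),
    "cfi": (">", 0.95),
    "srmr": ("<", 0.08),
}


def check_overall_fit(values, cutoffs=CONTINUOUS_CUTOFFS):
    """Return, for each index, whether the reported value meets its cut-off."""
    results = {}
    for name, (direction, threshold) in cutoffs.items():
        value = values[name]
        if direction == "<":
            results[name] = value < threshold
        else:
            results[name] = value > threshold
    return results


# Hypothetical fit values reported for one model.
report = check_overall_fit(
    {"rmsea": 0.045, "tli": 0.96, "cfi": 0.97, "srmr": 0.09}
)
print(report)
```

In this hypothetical report only SRMR misses its cut-off, which, following rule 4, would point towards checking the specified covariances rather than the loadings.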
Concluding Remarks
This article not only summarizes what we already know about SEM FIs, but goes beyond this by cutting across the endless debates in various fora (published studies, working papers and especially SEMNET, an e-group for SEM users) and providing statistically valid and practically useful thumb rules which are supported by seminal studies in the literature. All thumb rules have strong support from the literature and hence are not conjectural in nature. We suggest that, besides simulation, meta-analytic studies should be conducted to further validate our current understanding. Towards this end, the current study can be a good starting point. Future researchers can extend this work on the appropriate application of model fit indices.