Author(s): Holt B, Benfer RA Jr
Abstract: The problem of missing data is common in all fields of science. Various methods of estimating missing values in a dataset exist, such as deletion of cases, insertion of sample mean, and linear regression. Each approach presents problems inherent in the method itself or in the nature of the pattern of missing data. We report a method that (1) is more general in application and (2) provides better estimates than traditional approaches, such as one-step regression. The model is general in that it may be applied to singular matrices, such as small datasets or those that contain dummy or index variables. The strength of the model is that it builds a regression equation iteratively, using a bootstrap method. The precision of the regressed estimates of a variable increases as regressed estimates of the predictor variables improve. We illustrate this method with a set of measurements of European Upper Paleolithic and Mesolithic human postcranial remains, as well as a set of primate anthropometric data. First, simulation tests using the primate data set involved randomly turning 20% of the values to "missing". In each case, the first iteration produced significantly better estimates than other estimating techniques. Second, we applied our method to the incomplete set of human postcranial measurements. MISDAT estimates always perform better than replacement of missing data by means and better than classical multiple regression. As with classical multiple regression, MISDAT performs when squared multiple correlation values approach the reliability of the measurement to be estimated, e.g., above about 0.8. Copyright 2000 Academic Press.
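The general idea of iterative regression imputation can be illustrated with a short sketch. This is not the MISDAT program described in the abstract, whose internals are not given here. It is a minimal illustration that assumes mean substitution as the starting point, ordinary least squares fitted with NumPy's `np.linalg.lstsq`, and a fixed number of passes. The function name `iterative_regression_impute` and the parameter `n_iter` are hypothetical, and every column is assumed to have at least one observed value.

```python
import numpy as np

def iterative_regression_impute(X, n_iter=10):
    """Fill NaNs in a 2-D array by repeated per-column least-squares regression.

    Illustrative sketch only; not the published MISDAT algorithm.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Starting estimates: replace each missing value with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any() or miss_j.all():
                continue  # nothing to estimate, or nothing to fit on
            # Predictors: an intercept plus all other columns, using the
            # current estimates wherever those columns had missing values.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(filled)), others])
            # Fit only on rows where column j was actually observed.
            coef, *_ = np.linalg.lstsq(A[~miss_j], filled[~miss_j, j], rcond=None)
            # Refresh the estimates for the missing entries of column j.
            filled[miss_j, j] = A[miss_j] @ coef
    return filled
```

Because each pass refits every column against the latest estimates of the others, the estimates of one variable can improve as the estimates of its predictors improve, which is the property the abstract emphasizes. A simulation in the spirit of the one described would set a random 20% of known values to `np.nan`, run the function, and compare the imputed entries with the held-out originals.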
This article was published in J Hum Evol and referenced in Molecular Biology: Open Access.