Comparison of Statistical Models for the Estimation of Age at Death Using Subjective Adult Human Dental Indicators

Objective: The main objective of this paper is to estimate age at death from subjective dental data, an approach that is particularly useful in developing and underdeveloped countries. Methods: This study provides a framework for estimating age at death from subjective measurements of the teeth using (i) Generalized Linear Models (GLMs) and (ii) Generalized Additive Models (GAMs). The predictors of age were all ordinal in nature. A dataset comprising measurements taken on 71 maxillary incisors from different individuals at the time of their death was used. Two models, the Gamma GLM and the Gamma GAM, are compared to illustrate the flexibility of this approach and the predictive power of the statistical modelling process. Results: The models were assessed using the Akaike Information Criterion (AIC) and the proportion of correct predictions within each of the age groups. The Gamma GAM had the higher AIC but the better predictive performance within the age groups. Conclusion: Statistical modelling that caters for the type of data at hand can give reasonable predictions of age at death.


Introduction
In underdeveloped and developing countries, a lack of financial resources often leaves law enforcement and forensic officers with the daunting task of estimating age at death when confronted with partially decomposed human remains. A relatively cheap and effective method of age estimation from human remains is lacking.
Forensic odontology has its foundations in the observation and examination of teeth and the presentation of dental evidence in the field of law [1]. Dental tissue is resistant to environmental factors such as extreme temperatures and humidity, and this resilience contributes significantly to the use of teeth in the estimation of age at death. Past studies used mainly linear regression techniques in age estimation and relied on observational measurements taken on the teeth [2][3][4][5]. When the assumptions of linear regression are not met, the model often overestimates (or underestimates) age. In addition, hypothesis tests on the model parameters are often incorrect because of inflated standard errors. This paper illustrates the efficacy of the GLM and the GAM as alternative models for the estimation of age at death. It extends earlier work [6] on the same dataset. These methods are often more flexible than linear regression, especially when the response variable (in our case age) is not normally distributed. At the same time, the paper illustrates how dentistry has a part to play in law enforcement efforts through forensic science. To the best of our knowledge, this is the first time the GAM has been applied to dental variables in forensic science.

The data
We used the dataset from Lucy et al. [7], a compilation of data on 71 maxillary incisors obtained from different individuals with ages ranging from 17 to 86 years. Figure 1 shows a cross-section of such a tooth. The dataset was originally collected by Lucy et al. [7], but this is the first time these data have been presented with such clarity.
The dataset consisted of a continuous response variable (age at death) and independent dental variables that are associated with age: periodontal recession, secondary dentin, apical translucency, root colour and root roughness of cementum. Periodontal recession, secondary dentin and apical translucency were measured using a seven-point scoring system [5], whereas colour estimate and root roughness were measured using a five-point scoring method [8].
Secondary dentin is deposited continuously throughout the lifetime of the individual. The scoring system is based on measurements of lengths of the pulp chamber [9]. The apical translucency variable describes how transparent the denser part of the tooth appears in light. Apical translucency is caused by an increase in mineral deposits on the tooth, which usually starts at the root tip and moves upward towards the crown, and is associated with aging [10]. Root roughness of cementum concerns the tissue that covers the root of the tooth and anchors the fibres to the root surface. The amount of cementum on the root surface increases throughout the life of the individual and usually triples between the ages of 11 and 76 years [10]. Periodontal recession describes how much the roots of the teeth are exposed due to receding gums and is measured in millimetres (mm) by examining the retraction of the gums from the root surface [11]. Root colour measures the degree of discoloration of the teeth, which increases with age, on a scale from 1 (mild or no discoloration) to 5 (severe discoloration) [8].

The method
Previous studies involving the use of dental data for estimation of age at death have relied on simple (and multiple) linear regression analysis [1][2][3][4][5]. These methods are appropriate when the residuals are normally distributed, independent and have constant variance. Most of the time these assumptions are violated. This study seeks to address these deficiencies by taking into account the shape of the distribution of the response variable (age), which is not normal.
The GLM accounts for the variation in the data by using an appropriate density function to model the response. The GLM consists of three parts: the random component, the systematic component and the link function.
The random component describes the probability distribution of the response variable. The systematic component combines the independent variables into a linear predictor, while the link function specifies how the mean response is related to that linear predictor. There are always competing statistical models. We use the AIC as a guide and show that it may not always lead to a vastly superior model. All the data analysis was done using the R software.
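The three components of a GLM can be made concrete with a small sketch. The Python code below is not the authors' R analysis; it is a minimal illustration that assumes a Gamma random component, a single ordinal score as the systematic component, and a log link (the paper does not state which link was used). The function name and the toy data are hypothetical.

```python
import math

def fit_gamma_glm_log_link(x, y, max_iter=100, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) with a Gamma response by IRLS.

    For a Gamma response with a log link the IRLS weights are all 1,
    so each iteration is an ordinary least-squares fit of the working
    response z on x. All y values must be positive.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    mu = list(y)          # start from the observed values
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Link function: linear predictor eta = log(mu).
        eta = [math.log(m) for m in mu]
        # Working response for the log link: eta + (y - mu) / mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        z_bar = sum(z) / n
        new_b1 = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
        new_b0 = z_bar - new_b1 * x_bar
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        # Inverse link: fitted means from the systematic component.
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        if change < tol:
            break
    return b0, b1

# Hypothetical toy data: an ordinal dental score and age at death.
scores = [1, 2, 3, 4, 5, 6, 7]
ages = [20, 28, 33, 45, 52, 70, 80]
intercept, slope = fit_gamma_glm_log_link(scores, ages)
predicted_ages = [math.exp(intercept + slope * s) for s in scores]
```

Under the log link, predicted ages are always positive, which is one reason this combination is a common choice for a non-negative response such as age.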
The Generalized Additive Model (GAM) has a linear predictor that depends linearly on unknown smooth functions of some of the predictor variables. It is therefore a more flexible method of modelling.
Table 1 shows the mean and standard deviation of age for each category of the predictor variables. An ANOVA test, or a Kruskal-Wallis test when the assumptions of the ANOVA did not hold, was carried out to determine whether there were significant differences in age among the categories of each variable. P-values <0.05 indicated statistical significance. There were differences in mean age among the categories of each variable. This can give an idea of which variables were able to distinguish between age groups. The categories of secondary dentin and apical translucency (and to a lesser extent colour estimate and root roughness) showed greater coverage of the entire age band covered in the study. This would suggest that these four variables may be reasonable univariate predictors of age.
A gamma distribution was used for the response variable since age is non-negative. The gamma distribution has two parameters, and changing these parameters can accommodate many shapes.
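To illustrate how the two parameters of the gamma distribution change its shape, the following Python sketch evaluates the density under the shape and scale parameterization (an assumption; the paper does not say which parameterization was used). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def gamma_pdf(y, shape, scale):
    """Density of a Gamma(shape, scale) distribution at y > 0."""
    log_pdf = ((shape - 1.0) * math.log(y) - y / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def gamma_mean_and_variance(shape, scale):
    """The mean is shape * scale and the variance is shape * scale ** 2."""
    return shape * scale, shape * scale ** 2

# Keep the mean at 50 (a hypothetical mean age) and vary the shape:
# a small shape gives a strongly right-skewed density, a large shape
# gives a more symmetric one.
for shape in (2.0, 4.0, 16.0):
    scale = 50.0 / shape
    mean, variance = gamma_mean_and_variance(shape, scale)
    density_at_mean = gamma_pdf(mean, shape, scale)
    print(shape, scale, mean, variance, density_at_mean)
```

Holding the mean fixed while increasing the shape parameter reduces the variance, so the same family can describe both widely spread, skewed ages and tightly concentrated ones.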

Model Fitting
Both the GLM and the GAM with a Gamma response were fitted to the data. First, all the predictors were included in the model, and stepwise regression was used to determine which of these predictors were necessary. The predicted values of age for each of the 71 maxillary incisors were obtained from both models, and the percentage of correct predictions within each of the seven age bands was compiled.
To determine the best model for the data, it is customary to compare the Akaike Information Criterion (AIC) of the models. The model with the smaller AIC is usually deemed the better model. Table 3 shows that the AIC for the gamma GLM is marginally smaller than the AIC for the gamma GAM. The gamma GLM is therefore the better model based on AIC model selection. However, Table 2 shows that the gamma GAM has a higher proportion of correct predictions within each of the age categories. Within this context, it might therefore be better to choose the gamma GAM.
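The percentage of correct predictions within age bands, as used in the model comparison, can be computed along the following lines. This Python sketch is an illustration only: the band limits are hypothetical (the text does not list the seven bands), and a prediction is counted as correct when it falls in the same band as the true age, which is our reading of the criterion.

```python
from bisect import bisect_left

# Hypothetical inclusive upper limits of the first six bands;
# the seventh band is open-ended (ages above 80).
BAND_UPPER_LIMITS = [30, 40, 50, 60, 70, 80]

def age_band(age):
    """Return the index (0 to 6) of the band that contains `age`."""
    return bisect_left(BAND_UPPER_LIMITS, age)

def proportion_correct_by_band(true_ages, predicted_ages):
    """Share of cases in each true-age band whose prediction falls in that band.

    Bands with no cases are reported as None.
    """
    n_bands = len(BAND_UPPER_LIMITS) + 1
    totals = [0] * n_bands
    hits = [0] * n_bands
    for true_age, predicted_age in zip(true_ages, predicted_ages):
        band = age_band(true_age)
        totals[band] += 1
        if age_band(predicted_age) == band:
            hits[band] += 1
    return [h / n if n else None for h, n in zip(hits, totals)]

# Hypothetical true and predicted ages for five cases.
example = proportion_correct_by_band([25, 28, 45, 55, 85], [27, 35, 48, 62, 90])
```

Reporting the proportion per band, rather than a single overall figure, shows where a model predicts well and where sparse data make its predictions unreliable.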

Model        AIC Value
Gamma GLM    509.72
Gamma GAM    512.98
Table 3: AIC values for both models.
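For reference, the AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The short Python sketch below states this definition and compares the two AIC values reported in Table 3; the function and variable names are hypothetical.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 * ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood

# AIC values as reported in Table 3.
reported_aic = {"Gamma GLM": 509.72, "Gamma GAM": 512.98}

# The model with the smallest AIC is preferred under this criterion.
best_model = min(reported_aic, key=reported_aic.get)

# Difference of each model from the best one.
aic_difference = {model: round(value - reported_aic[best_model], 2)
                  for model, value in reported_aic.items()}
```

The difference between the two reported values is small relative to their size, which is consistent with the text's description of the gamma GLM's AIC as only marginally smaller.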

Discussion
Previous studies have relied on the linear regression model, which assumes that the data are normally distributed. Age is positive and continuous, hence a more appropriate distribution would be the gamma distribution. Previous studies usually overestimated (and sometimes underestimated) age [2][3][4][5] because the response variable was not properly modelled. Although our study used only 71 cases, the gamma GAM performs well when the age is less than 30 or between 41 and 60. The poorer performance outside these age categories could be attributed to the small number of cases in those categories. The findings of this study are significant since they present a framework that can be used to predict age at death using simple subjective dental measurements. The study also shows how the use of statistical models can enhance forensic science. Future research may use larger datasets from various populations to validate some of our initial findings. The use of more sophisticated statistical models for estimating age at death from dental data is also an area for future research.

Conclusions
This study shows that, with sufficient data and simple subjective measurements on the teeth of human remains, it is possible to predict a person's age at death within reasonable bounds. This method relies on measurements that do not require sophisticated equipment and is thus an inexpensive alternative that can be used quite easily in underdeveloped and developing countries.
We recognize that tooth structure varies among populations. A database needs to be constructed for each population so that results are more in keeping with the norms of that population. Models can then be constructed for each population to give more valid predictions.
The main goal of this paper was to highlight the use of advanced statistical methodology in age estimation. The authors used the Johansen dataset, which comprises subjective measurements on 78 incisors. While the authors recognize that the dataset is not new or up to date, it was used purely as an illustration to highlight the power of advanced statistical methodology. In fact, this paper paves the way for a more detailed study. The authors are also cognizant of the fact that more technologically advanced measurements exist today, such as radiographic images, which are highly accurate. However, the cost of implementing these would be prohibitive in underdeveloped countries.