Brandon Renfro* and Rebecca Asante
Hampton University, Hampton, Virginia, USA
Received date: July 07, 2014; Accepted date: August 10, 2014; Published date: August 17, 2014
Citation: Brandon Renfro and Rebecca Asante (2014) Critical Analysis of the Stochastic Volatility of the S&P 500 Index between 2000-2010. Bus Eco J 5:102. doi: 10.4172/2151-6219.1000102
Copyright: © 2014 Renfro B, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This paper presents an empirical analysis of the correlation between selected demographic and financial predictor variables and the stochastic volatility of the Standard and Poor’s (S&P) 500 index between January 2000 and December 2010 inclusive. The predictor variables used for the statistical analysis are: the prime rate (PR(t)), the United States population proportion between the ages of 40-64 (PP(t)), the inflation rate (IR(t)), the logarithm of the unemployment rate (log(UE)(t)), and consumer confidence (CC(t)). The empirical relationship between these variables is established using multiple regression techniques in EXCEL. The relevance of each predictor variable is assessed by inspecting the P-value of the associated multiple regression coefficient. A plot of the observed and modeled S&P 500 index for the 149 data points (months) corresponding to the period spanning January 2000 and December 2010 illustrates the potential of the empirical model to forecast the volatility of the S&P 500 for the period in question. The constructed empirical multiple regression model for the observed S&P 500 has the configuration:
The adjusted R2 for the empirical model is approximately 0.49. This means that during the period 2000-2010, about 49% of the variability of the S&P 500 volatility could be explained by the joint influence of the five predictor variables.
Stochastic Volatility; Predictor Variables; Standard and Poor Index
The Standard and Poor’s (S&P) 500 is a free-float capitalization-weighted index which is a measure of the market dynamics of 500 large companies listed on the NYSE or NASDAQ. The long-term dynamics of the S&P 500 can be influenced by macro-economic, demographic, and financial factors.
This paper is an attempt to utilize a mathematical model to explain the high volatility of the S&P 500 during the period 2000 to 2010. It is plausible to hypothesize that the stochastic dynamics of the S&P 500 during the decade of 2000 to 2010 was influenced jointly by a linear combination of demographic, macro-economic, and financial predictor variables. Demographic factors, such as age groups, specific age structure, year of birth, number of persons per household, family size, birth and death rate, and retirement age can affect macro-economic variables such as aggregate consumption, savings, asset acquisition, labor supply, and social programs.
The Baby Boomers consist of persons born over an 18-year period from 1946 to 1964 and are widely distributed by age across that period. During the period 2000-2010, the Baby Boomer generation attained ages in the range 40-64 years. Thus, during the period of high volatility and turbulence of the S&P 500 between 2000 and 2010, the Baby Boomers were in their prime asset acquisition and savings age. There are several conjectures about the possible effects of aging Baby Boomers on the dynamics of the financial markets. Bakshi and Chen conjectured that relative risk aversion in asset acquisition is positively correlated with age, formulated a life-cycle risk aversion hypothesis, and established a positive, statistically significant relation between U.S. stock excess returns and growth in the average age of the U.S. population. Wallick, Shanahan, and Tasopoulous stated that a 2006 analysis by the U.S. Government Accountability Office of the S&P 500 index’s stock market returns from 1948 through 2004 found that demographic variables accounted for less than 6% of stock market variability. This, they explained, was far less than the share attributable to macro-economic, financial, and other predictor variables. Based on 2010 Survey of Consumer Finances (SCF) data provided by Wallick, Shanahan, and Tasopoulous, the Baby Boomers aged 46-64 owned nearly 47% of U.S. equities. Their data indicated that in the year 2000, the age group 34-52 owned 40% of U.S. equities. Also, in the years 2004 and 2007, the age groups 40-58 and 43-61 owned 31% and 35% of U.S. equities, respectively. Poterba could not find a statistically significant positive relationship between population age structure and equity returns in the U.S. He stated that he could not find robust evidence in a time series plot that equilibrium returns on financial assets vary in response to changes in population age structure.
This paper examines the extent to which demographic factors and financial factors could jointly have influenced the S&P 500 index volatility of the time period spanning from 2000 to 2010. In particular, the paper investigates the joint role of the population proportion of the 40-64 year olds in the U.S., the U.S. inflation rate, the logarithm of the U.S. unemployment rate, the U.S. prime lending rate, and the U.S. consumer confidence index on the observed S&P 500 stochastic time series dynamics during 2000 to 2010.
This paper’s contribution is the formulation of an empirical mathematical model which relates the joint influence of slow-moving demographic predictors (population proportion of 40-64 year-olds, log of unemployment) and financial factors (prime rate, inflation rate) on the S&P 500 index for the period 2000 to 2010.
The paper is organized into four sections: introduction, methodology, results, and conclusion.
In this section, the statistical techniques used in the analysis will be presented.
Definition of predictor variables
The predictor variables and their acronyms are specified as follows for the period January 2000 to December 2010.
PR(t): Prime rate measured in percentages
PP(t): Population proportion of Baby Boomers aged 40-64. It is measured as a time-varying fraction between 0 and 1.
Log(UE)(t): Logarithm of the unemployment rate. This predictor variable is used because the S&P 500 index between 2000 and 2010 exhibits a relatively greater correlation with the transformed variable than with the untransformed unemployment rate. Its values span both negative and positive real numbers. The predictor variable is dimensionless.
IR(t): Inflation rate measured in percentages and expressed as a decimal fraction
CC(t): Consumer Confidence Index expressed as a number depicting how people feel about the economy based on their responses to certain macro-economic and financial questions.
YSP500: The observed S&P 500 index. This is the dependent (response) variable.
ŶSP500: The S&P 500 index predicted from the sample by the empirical statistical model.
Construction of the empirical formula
In this subsection, a mathematical model is constructed that relates the chosen predictor variables to the dependent variable, the observed S&P 500 index.
Let S = {PR(t), PP(t), IR(t), Log(UE)(t), CC(t)} be the set of predictor variables.
It should be noted that this set may not be complete in the sense that it may not necessarily contain all the relevant influential variables that dictate the stochastic dynamics of the S&P 500 index. The completeness of the set will be determined by the value of the adjusted R2 variable, the Durbin-Watson measure, and the P-value of the global multiple regression F ratio.
The linear combination of the elements in the set S is a linear predictor of the observed S&P 500 index from 2000 to 2010.
The observed global S&P 500 index from 2000 to 2010 is related to the predictor variables in the set S by the multiple linear regression formula
YSP500 =β0+ β1PR(t)+ β2PP(t)+ β3IR(t)+ β4log(UE)(t)+ β5CC(t)+εt (2.1)
where the constant coefficients are defined as follows:
β0: intercept of observed S&P 500 multiple regression line.
βi (i = 1, ..., 5): population coefficients of the predictor variables in the multiple regression model of the observed S&P 500 index.
In particular, the sample multiple regression line is defined by the equation:
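ŶSP500 = b0 + b1PR(t) + b2PP(t) + b3IR(t) + b4log(UE)(t) + b5CC(t) (2.2)
where the constant coefficients are defined as follows:
b0: intercept of the sample (fitted) S&P 500 multiple regression line.
bi (i = 1, ..., 5): sample estimates of the coefficients of the predictor variables in the multiple regression model. The fitted line carries no error term; the residual for each month is the difference between the observed and the predicted index.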
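The sample coefficients in a model of this form are ordinary least-squares estimates. The following is a minimal sketch of that computation using only the Python standard library, not a reproduction of the EXCEL analysis. The data are synthetic and noise-free, and the coefficient values, value ranges, and function names (`fit_ols`, `predict`, `solve_linear_system`) are illustrative assumptions, not figures from the paper.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] solving the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, row):
    """Evaluate b0 + b1*x1 + ... + bk*xk for one observation."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Synthetic monthly observations in the order PR, PP, IR, log(UE), CC.
# All numbers below are made up for illustration only.
rng = random.Random(0)
true_coeffs = [1200.0, -20.0, 500.0, 15.0, -100.0, 2.0]
rows = [
    [rng.uniform(3, 9), rng.uniform(0.2, 0.4), rng.uniform(0, 5),
     rng.uniform(1.3, 2.3), rng.uniform(40, 110)]
    for _ in range(24)
]
y = [predict(true_coeffs, r) for r in rows]  # no noise term added

estimated = fit_ols(rows, y)
print([round(b, 4) for b in estimated])
```

Because the synthetic response contains no noise, the estimates should match the assumed coefficients up to rounding error. With real data the residuals would be non-zero, and a numerically more stable solver (for example a QR-based routine from a numerical library) would usually be preferred over the normal equations.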
Computation of the sample coefficient of determination (R2)
The multiple coefficient of determination R2 is defined by the formula:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$
where
Ŷi: is the predicted value of the S&P 500 index from the sample.
Yi: is the observed value of the S&P 500 index.
Ȳ: is the mean of the observed S&P 500 values in the sample data.
In particular, a sample R2=0 implies that the model explains none of the variation in the observed S&P 500 values, whereas R2=1 implies a complete fit of the sample model to the observed S&P 500 values. The sample R2 for the multiple linear regression analysis measures the joint influence of the predictor variables on the predicted model.
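As a small worked illustration, R2 and the adjusted R2 can be computed directly from observed and predicted values. This sketch assumes the standard definitions R2 = 1 - SSE/SST and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1). The four data points are invented for the example and the function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(observed) / len(observed)
    sse = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of predictors k, given n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Toy values, not S&P 500 data.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(observed, predicted)
adj_r2 = adjusted_r_squared(r2, n=len(observed), k=1)
print(round(r2, 3), round(adj_r2, 3))
```

The adjusted value is never larger than R2 when k is at least 1, which is why it is the preferred summary when several predictors are included.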
Hypothesis testing of the coefficients
The predictive utility of the joint (partial) effect of regressors in the set S can be determined by the following hypothesis tests.
H0: βi = 0 for all i = 1, 2, 3, 4, 5
Ha: At least one βi is not zero
Rejection Region: F > Fα(k, n-(k+1))
where n = number of data points (149)
k = number of β values in the model, excluding β0 (5)
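The global F ratio used in this test can be obtained from R2, n, and k. The sketch below assumes the standard identity F = (R2/k) / ((1 - R2)/(n - (k+1))) and plugs in the approximate fit statistic of 0.49 quoted in this paper purely as an illustration. The function name is hypothetical, and the critical value Fα(k, n-(k+1)) still has to be taken from an F table or a statistics package.

```python
def global_f_statistic(r2, n, k):
    """F ratio for H0: beta_1 = ... = beta_k = 0, computed from R^2."""
    df_model = k            # numerator degrees of freedom
    df_error = n - (k + 1)  # denominator degrees of freedom
    f_value = (r2 / df_model) / ((1.0 - r2) / df_error)
    return f_value, df_model, df_error


# Illustrative inputs: n = 149 data points, k = 5 predictors, R^2 of about 0.49.
f_value, df1, df2 = global_f_statistic(r2=0.49, n=149, k=5)
print(round(f_value, 2), df1, df2)
```

H0 is rejected at level α when the computed ratio exceeds the tabulated critical value for (df1, df2) degrees of freedom.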
The Durbin-Watson Test
This is a test that determines whether residuals from a linear regression or multiple regression are independent or auto-correlated.
H0: ρ=0 (no auto-correlation)
H1: ρ>0 (existence of positive auto-correlation)
Durbin-Watson Test Statistic:
$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
where ei=yi-ŷi and yi and ŷi are respectively the observed and the predicted values of the response variable for individual i. d becomes smaller as the serial correlations increase. Let dU and dL be the upper and lower critical values of the DW test statistic for a given level of significance α.
The conclusions are as follows:
If d<dL, reject H0: ρ=0
If d>dU, do not reject H0: ρ=0
If dL<d<dU, the test is inconclusive
If H0 is rejected, it means that the errors are positively auto-correlated and hence the covariance of the errors is not zero. This will undermine the ability of the model to forecast S&P 500 index beyond the given period.
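The Durbin-Watson procedure can be sketched in a few lines. The code assumes the usual statistic, the sum of squared successive residual differences divided by the sum of squared residuals. The residual series and the critical values passed as `d_lower` and `d_upper` are hypothetical placeholders; real critical values depend on n, k, and α and must be looked up in a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Apply the three-way decision rule for positive autocorrelation."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Made-up residuals: a slowly drifting series versus an alternating one.
drifting = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0]

d_drift = durbin_watson(drifting)
d_alt = durbin_watson(alternating)

# Placeholder critical values, for illustration only.
print(d_drift, dw_decision(d_drift, d_lower=1.0, d_upper=1.5))
print(d_alt, dw_decision(d_alt, d_lower=1.0, d_upper=1.5))
```

Residuals that keep the same sign for long runs push d toward 0, while residuals that flip sign every period push it toward 4; values near 2 are consistent with no first-order autocorrelation.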
The data for the statistical analysis of the S&P 500 were obtained from Yahoo Finance and the website maintained by Robert Shiller. Initially, thirty-one predictor variables, including their logarithms and reciprocals, were used as regressors on the S&P 500 data. Using a 5% level of significance, the set S was chosen, consisting of the prime rate, the proportion of the population between the ages of 40 and 64, the inflation rate, the log of unemployment, and consumer confidence. The data were analyzed using Microsoft EXCEL 2013. In particular, the constants and coefficients of the sample multiple regression analysis and their significance were determined using the EXCEL software. There are 149 data points which represent the number of months from January 2000 to December 2010.
The EXCEL data output contains information on the coefficients, confidence intervals, R2 , adjusted R2 , F-value, P-value for the F-statistic, and P-values of significance of the predictor variables at 5% level of significance. The hypothesis test for the significance of the regressors has the form:
H0i: The ith regressor in S does not have a significant impact on the S&P 500 (βi = 0).
Hai: The ith regressor in S has a significant impact on the S&P 500 (βi ≠ 0).
In this section of the paper, the EXCEL output of the regression data will be presented and discussed.
Regression of S&P 500 and individual predictors
The results of the regression of the observed S&P 500 index on each of the regressors are listed in Figure 1, Figure 3, Figure 5, Figure 7, Figure 9 and Figure 11. The corresponding graphs are exhibited in Figure 2, Figure 4, Figure 6, Figure 8, Figure 10 and Figure 12.
The results of the analysis in this section show that, taken separately, the individual predictor variables except the population proportion are significant in explaining the observed S&P 500 stochastic dynamics. On the other hand, the graphical output and the corresponding R2 values show that each individual influence is small compared with the joint influence. A point of interest is that the R2 value of the S&P 500 regression on population proportion is zero. This finding is analogous to Poterba’s finding of a lack of positive correlation between the percentage of the population between the ages of 40 and 64 and the real return on Treasury bills and long-term government bonds. The joint influence of the predictor variables is considered in the next subsection.
Multiple Regression of the observed S&P 500 Index on the Linear combination of predictors
In this subsection, the joint influence of all five variables on the observed S&P 500 index between January 2000 and December 2010 is analyzed using multiple linear regression.
The EXCEL output data for the analysis is exhibited in Table 6 and the corresponding graph is shown in Figure 6.
By inspection, the predicted S&P 500 index model gives a better fit to the observed S&P 500 index than those of the individual predictor variables considered separately.
The current paper uses empirical multiple regression techniques to quantitatively explain the observed high frequency fluctuation of the S&P 500 during the financially turbulent period of 2000-2010 in the U.S. The finite set of demographic and financial variables used as predictor factors comprises the prime rate (PR(t)), the population proportion of persons aged 40-64 (PP(t)), the inflation rate (IR(t)), the logarithm of the unemployment rate (Log(UE)(t)), and the Consumer Confidence Index (CC(t)). Of the predictor variables chosen, the demographic factor, the population proportion between the ages of 40-64, has been the subject of conjecture as to its ability to influence high frequency changes in the S&P 500 index [2,4,5,6,7,8]. In particular, the period chosen for the analysis, 2000-2010, has unique demographic, macro-economic, and financial features such as stable birth rates, stable life expectancy, rising average age, Baby Boomers’ attainment of the prime age of 40-64, rising economic uncertainty, collapse of the housing market, collapse of sub-prime lending, and rising unemployment. Thus, the empirical model consisting of the selected predictor variables may be applicable uniquely to the period 2000-2010.
Figure 1 through Figure 5 show the respective dynamics of each of the predictor variables in comparison to the observed S&P 500 index time series from January 2000 to December 2010, represented by the 149 data points. The joint influence of the predictor variables on S&P 500 index dynamics is shown in Figure 6. In particular, the P-value of the hypothesis test for the linear regression coefficient for the population proportion (of Baby Boomers aged 40-64) is 0.011. This consolidates the assertion by Poterba that demographic changes have minimal influence on high frequency fluctuation of financial indices such as the S&P 500 index.
Table 6 and Figure 6 depict the summary of the data and hypothesis tests for the multiple regression of the S&P 500 index on the five predictor variables. The measure of R2=0.49 implies that the five predictor variables jointly account, in the sample, for 49% of the observed volatility of the S&P 500 index between 2000 and 2010. The corresponding P-value of the global F-statistic is essentially zero. This means that in the population, the five predictor variables were jointly significant in explaining the high frequency stochastic dynamics of the S&P 500 index during the time period under investigation. The Durbin-Watson statistic for the residuals of the multiple regression analysis is 0.011, which indicates that the results are applicable uniquely to the period under consideration and may not be accurate in forecasting the dynamics of the S&P 500 for other periods. In particular, it also implies that the set of predictor variables is incomplete and that other macro-economic and financial variables may be implicated in the turbulence of the S&P 500 index for 2000-2010.