S&P 500 Daily Stock Returns

T. Evgeniou, N. Nassuphis, D. Spinellis
INSEAD, Satrapade, AUEB

Disclaimer

This project is meant to be an example of how to organize a data analytics case study/project. It is not meant to provide insights for stock data or stock trading. It also does not build on any finance literature (e.g. regarding risk factors such as size, growth, or momentum).

The returns generated may also be different from the returns of, say, the S&P 500 index, as the universe of stocks/data used may be biased (e.g. survivorship bias).

Project Description

A simple analysis of daily stock returns of S&P 500 stocks.

The Data

About 10 years (from 2003-01-03 to 2013-04-12) of daily returns for 423 companies that were in the S&P 500 index in February 2013. Every row is a day and every column is an individual stock. The data matrix has 2586 rows and 423 columns.

Histogram of Daily Returns

[Figure: histogram of daily returns]

How Cumulative Returns are Calculated

All returns reported are the total sum of daily returns when we invest 1 dollar every day. For example, here the market return is 110.8691, which means that over the whole period we would have made a total of 110.8691% of 1 dollar, namely 1.1087 dollars. If the return were, say, -200%, we would have lost 2 dollars.

Note: No transaction costs are included. Moreover, given these are the stocks that "survived" in the S&P index until 2013, the returns are not the same as the actual returns of the S&P index.
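As a minimal sketch of this convention (the return values below are made up for illustration and are not taken from the data set), the cumulative return is a running sum of daily percentage returns, not a compounded product:

```r
# Hypothetical daily returns in percent (illustrative values only).
daily_returns_pct <- c(0.5, -1.2, 0.8, 2.0)

# With 1 dollar invested afresh each day, gains are not reinvested,
# so the cumulative return is a running sum rather than a compounded product.
cumulative_pct <- cumsum(daily_returns_pct)

# Total profit in dollars: the summed percentage applied to 1 dollar.
total_dollars <- sum(daily_returns_pct) / 100

print(cumulative_pct)
print(total_dollars)
```

The same arithmetic turns the reported 110.8691% into 1.1087 dollars.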

Cumulative Returns of the Equally-Weighted Market

[Figure: cumulative returns of the equally-weighted market]

Interactive chart: hover over the plot to see daily values, and zoom by clicking and dragging in the smaller graph below.

Summary Statistics of Equal Weighted Market

Summary Statistics of Daily Market Returns (in percent)
Min.      1st Qu.   Median   Mean    3rd Qu.   Max.
-10.543   -0.514    0.099    0.043   0.673     10.948

Monthly and Yearly Returns of the Equal Weighted Market

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Year
2003 -5.80 -1.40 1.30 8.00 8.90 1.20 2.70 4.50 -1.60 7.10 2.10 4.20 31.20
2004 1.60 2.80 0.30 -2.10 2.20 3.20 -3.60 -0.20 3.60 2.10 5.90 3.60 19.60
2005 -2.20 3.20 -1.10 -3.20 4.70 1.80 5.70 -0.90 1.10 -2.00 4.30 0.70 12.20
2006 4.90 -0.10 2.30 1.00 -3.40 0.30 -1.60 2.60 2.10 3.50 2.80 0.20 14.70
2007 2.70 -0.40 1.10 4.10 3.40 -1.90 -3.30 1.10 3.20 2.20 -4.30 -0.90 7.10
2008 -5.50 -1.90 -0.60 5.40 3.40 -9.50 -0.70 2.40 -10.10 -23.50 -10.60 3.10 -48.30
2009 -8.20 -11.80 9.20 14.40 4.50 -0.20 9.00 4.20 4.80 -3.00 5.20 4.30 32.30
2010 -3.80 4.00 6.40 3.00 -8.00 -6.30 6.60 -5.10 9.70 3.40 1.00 6.90 18.00
2011 2.00 3.90 0.90 2.80 -0.70 -1.90 -3.80 -6.60 -9.30 12.40 -0.30 0.00 -0.80
2012 5.20 3.70 2.30 -0.90 -7.50 3.40 0.30 2.60 2.00 -1.00 0.90 2.10 13.20
2013 5.90 1.10 3.80 0.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 11.70
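Because returns are summed rather than compounded, the Year column is the sum of the monthly columns (for example, the 2003 months add up to 31.20). The following is a minimal sketch of such an aggregation, assuming a vector of dates and simulated daily returns; the variable names are illustrative and not the project's own:

```r
# Hypothetical business-day dates and simulated daily returns in percent.
set.seed(1)
dates <- seq(as.Date("2003-01-03"), as.Date("2003-03-31"), by = "day")
dates <- dates[!(as.POSIXlt(dates)$wday %in% c(0, 6))]  # drop Sundays (0) and Saturdays (6)
daily_returns_pct <- rnorm(length(dates), mean = 0.04, sd = 1)

# Sum the daily returns within each calendar month and within each year.
monthly <- tapply(daily_returns_pct, format(dates, "%Y-%m"), sum)
yearly <- tapply(daily_returns_pct, format(dates, "%Y"), sum)

print(round(monthly, 2))
print(round(yearly, 2))
```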

Best Stock (in terms of returns) with Hindsight

Stock: MNST

[Figure: returns of the best stock, MNST]

Worst Stock with Hindsight

Stock: C

[Figure: returns of the worst stock, C]

Mean Reversion of the Market

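# Bet against the previous day's move; shift() is a helper (not shown here) assumed to lag the series by one day.
mr_strategy = matrix(-sign(shift(market, 1)) * market, ncol = 1)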
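
To make the line above concrete, here is a small self-contained sketch. The helper `lag_series` is a hypothetical stand-in for the project's `shift`, whose exact behavior (in particular how it pads the first day) is not shown in this note, and the return values are made up:

```r
# Hypothetical lag helper standing in for the project's shift():
# moves the series forward by k days and pads the start with zeros.
lag_series <- function(x, k = 1) c(rep(0, k), head(x, -k))

# Illustrative daily market returns in percent (not from the data set).
market <- c(1.0, -0.5, -2.0, 1.5, 0.3)

# Bet against yesterday's move: short after an up day, long after a down day.
position <- -sign(lag_series(market, 1))
mr_strategy <- position * market

print(mr_strategy)
```

On the first day there is no previous return, so the position is zero and the strategy earns nothing.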

[Figure: returns of the market mean reversion strategy]

Monthly and Yearly Returns of the Mean Reversion Strategy

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Year
2003 6.10 3.20 -4.30 0.90 2.00 -3.10 3.40 -5.90 6.00 0.90 -2.70 0.30 7.00
2004 0.40 -5.00 -5.40 -1.40 -2.20 2.80 5.90 -4.70 2.70 -2.20 -2.30 0.60 -10.80
2005 -1.70 1.80 -0.10 6.70 3.00 1.70 0.20 2.40 -1.10 0.10 -0.50 0.50 13.00
2006 2.10 4.40 -1.30 -2.20 1.30 -6.40 0.70 -0.90 3.00 2.70 -4.90 -0.80 -2.20
2007 0.20 -4.90 -0.40 -2.40 0.40 2.80 8.30 5.90 8.90 6.80 16.20 1.00 43.00
2008 4.90 -6.20 11.70 -6.70 3.10 2.90 14.60 5.90 25.90 -23.10 1.40 39.40 73.80
2009 11.20 -2.60 7.70 -4.60 2.70 -1.20 2.80 -2.40 -4.50 1.30 2.70 -2.40 10.60
2010 2.80 1.50 -4.90 -4.70 9.20 -3.50 0.00 2.00 1.40 5.50 -3.40 -0.20 5.60
2011 4.40 -2.40 -2.70 -1.80 -2.60 0.80 -1.30 -6.30 -7.70 4.80 -6.40 5.50 -15.70
2012 -1.20 2.40 -1.00 -3.40 -6.40 -4.40 -7.70 -0.90 0.00 -2.00 -3.90 0.90 -27.50
2013 -1.60 7.30 -0.70 2.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.20

Mean Reversion of the Market: only when the market fell the day before

[Figure: mean reversion of the market, trading only after down days]

Monthly and Yearly Returns of the Mean Reversion Strategy after Down Days only

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Year
2003 0.30 0.90 -1.50 4.40 5.50 -0.90 3.10 -0.70 2.20 4.00 -0.30 2.30 19.20
2004 1.00 -1.10 -2.60 -1.70 0.00 3.00 1.20 -2.50 3.20 0.00 1.80 2.10 4.40
2005 -1.90 2.50 -0.60 1.80 3.90 1.80 3.00 0.70 0.00 -1.00 1.90 0.60 12.60
2006 3.50 2.10 0.50 -0.60 -1.00 -3.10 -0.40 0.80 2.60 3.10 -1.10 -0.30 6.20
2007 1.50 -2.60 0.40 0.80 1.90 0.50 2.50 3.50 6.00 4.50 6.00 0.10 25.10
2008 -0.30 -4.10 5.60 -0.60 3.20 -3.30 7.00 4.10 7.90 -23.30 -4.60 21.20 12.70
2009 1.50 -7.20 8.40 4.90 3.60 -0.70 5.90 0.90 0.10 -0.90 4.00 1.00 21.50
2010 -0.50 2.80 0.80 -0.80 0.60 -4.90 3.30 -1.50 5.60 4.40 -1.20 3.30 11.80
2011 3.20 0.70 -0.90 0.50 -1.70 -0.60 -2.60 -6.40 -8.50 8.60 -3.40 2.70 -8.20
2012 2.00 3.10 0.60 -2.20 -7.00 -0.50 -3.70 0.90 1.00 -1.50 -1.50 1.50 -7.20
2013 2.20 4.20 1.50 1.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 9.50

Mean Reversion of the Market: only when the market rose the day before

[Figure: mean reversion of the market, trading only after up days]
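
The two conditional variants (trading only after a down day, or only after an up day) can be sketched as the two legs of the full mean reversion strategy. This is an illustration under assumptions: `lag_series` is a hypothetical lag helper and the returns are made up.

```r
# Hypothetical lag helper: shifts the series by k days, padding with zeros.
lag_series <- function(x, k = 1) c(rep(0, k), head(x, -k))

market <- c(1.0, -0.5, -2.0, 1.5, 0.3)  # illustrative returns in percent
previous <- lag_series(market, 1)

# Leg 1: hold the market only on days that follow a down day.
after_down <- ifelse(previous < 0, market, 0)
# Leg 2: short the market only on days that follow an up day.
after_up <- ifelse(previous > 0, -market, 0)

# The full mean reversion strategy is the sum of the two legs.
mr_strategy <- -sign(previous) * market
print(rbind(after_down, after_up, mr_strategy))
```

Adding the two legs recovers the full strategy, while subtracting the second from the first recovers the market itself on every day with a nonzero previous return.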

Most Mean Reverting Stock with Hindsight

Stock: HBAN

[Figure: mean reversion returns of the most mean reverting stock, HBAN]

Most Momentum Stock with Hindsight

Stock: MU

[Figure: returns of the most momentum stock, MU]

Average ("Market") of Mean Reversion of All Stocks

[Figure: average of the mean reversion strategy over all stocks]

Average of Selecting between Mean Reversion and Momentum for each Stock

What if we select (with hindsight) whether to follow a mean reverting or a momentum strategy for each individual stock, e.g. choosing whichever of the two leads to the best cumulative returns over the entire 10-year period?

Note: this requires exactly 1 bit of information per stock, that is, only 423 bits of hindsight information for the entire 10 years of 423 stocks, a data set of 2586 x 423 = 1093878 real numbers.

The code:

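# One hindsight bit per stock: keep mean reversion if it made money over the full period, otherwise flip to momentum.
selected_strat = apply(mr_ProjectData, 2, function(r) if (sum(r) < 0) -r else r)
# Average the selected strategies across stocks for each day.
selected_mr_market = apply(selected_strat, 1, mean)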
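
The selection can be tried on a toy example. The sketch below assumes a small matrix of per-stock mean reversion strategy returns (rows are days, columns are stocks); the tickers `AAA` and `BBB` and all values are hypothetical:

```r
# Toy matrix of daily mean reversion strategy returns (illustrative values).
mr_returns <- cbind(AAA = c(1, -2, -1), BBB = c(0.5, 1, -0.5))

# One hindsight bit per stock: flip to momentum if mean reversion lost money overall.
use_momentum <- colSums(mr_returns) < 0
selected <- sweep(mr_returns, 2, ifelse(use_momentum, -1, 1), `*`)

# Equally weighted average across stocks for each day.
selected_market <- rowMeans(selected)

print(use_momentum)
print(selected_market)
```

By construction every column of `selected` has a non-negative total, which is why the averaged result looks favorable regardless of whether the data contain any real signal.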

[Figure: average of selecting between mean reversion and momentum for each stock]

Averaging with Hindsight per Time Window

One can repeat the same selection every day, or once per time period of some length, fixing the momentum or mean reversion choice for each stock for the entire period.

Averaging with Hindsight: Recent Third of the Days

We repeat the same analysis, but make the selection using only the performance over the most recent third of the days, namely the last 862 days.

[Figure: average of selection with hindsight, using the most recent third of the days]

Analysis with Hindsight

Note: For computational reasons and simplicity, all the analysis in this note is performed with hindsight. One could perform the exact same analysis using a rolling window (e.g. of 250 or 60 days), repeating the analysis every day on the data in the corresponding window and deciding which stocks to trade the next day.
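
A rolling-window version of the selection could look like the following sketch. It is not the authors' implementation: the data are simulated, and the rule (follow the sign of the strategy's summed return over the previous `window` days) is one simple assumption among many possible ones.

```r
set.seed(42)
n_days <- 300
window <- 60  # look-back length in days
strategy <- rnorm(n_days, mean = 0, sd = 1)  # simulated daily strategy returns in percent

# For each day, choose the direction using only the previous `window` days,
# so that no information from the future is used.
rolling_returns <- rep(0, n_days)
for (t in (window + 1):n_days) {
  past_sum <- sum(strategy[(t - window):(t - 1)])
  direction <- if (past_sum < 0) -1 else 1
  rolling_returns[t] <- direction * strategy[t]
}

print(sum(rolling_returns))
```

Unlike the hindsight selection, nothing here guarantees a positive total: the first `window` days are not traded, and each later day uses only past data.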

Principal Component Analysis of Daily S&P 500 Stock Returns: The Scree Plot

[Figure: scree plot of the principal components]

Principal Component Analysis of Daily S&P 500 Stock Returns: Variance Explained

Eigenvalues and Variance Explained
         eigenvalue   percentage of variance   cumulative percentage of variance
comp 1   175.220      41.423                   41.423
comp 2   14.593       3.450                    44.873
comp 3   11.487       2.716                    47.589
comp 4   8.789        2.078                    49.666
comp 5   4.944        1.169                    50.835
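
The percentages in this table follow from the eigenvalues of the correlation matrix, which sum to the number of stocks (for example, 175.220 / 423 is about 41.42%). Below is a minimal sketch of the computation on simulated data; the data-generating setup and the variable names are assumptions made for illustration:

```r
set.seed(7)
n_days <- 500
n_stocks <- 6
# Simulated returns: one common "market" driver plus stock-specific noise.
common <- rnorm(n_days)
returns <- sapply(1:n_stocks, function(i) common + rnorm(n_days))

# Eigen-decomposition of the correlation matrix, i.e. PCA on standardized returns.
eig <- eigen(cor(returns))
pct_variance <- 100 * eig$values / sum(eig$values)
variance_table <- cbind(
  eigenvalue = eig$values,
  pct_variance = pct_variance,
  cumulative_pct = cumsum(pct_variance)
)

print(round(variance_table, 3))
```

As in the real data, a single common driver makes the first component account for far more variance than any other.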

Returns of First Principal Component

Correlation with the market: 0.9998

[Figure: returns of the first principal component]

Portfolio weights of First Principal Component

[Figure: portfolio weights of the first principal component]

Returns of Second Principal Component

Correlation with the market: 0.0211

[Figure: returns of the second principal component]

Portfolio weights of Second Principal Component

[Figure: portfolio weights of the second principal component]

Top Long and Short Stocks in Second Principal Component

Top 10 stocks with the largest positive weights: DVN, APA, DO, NOV, EOG, DNR, SWN, NBL, NE, CHK.

Top 10 stocks with the largest negative weights: BBT, STI, MTB, CMA, JPM, WFC, ZION, USB, DLTR, FHN.

Residual Portfolios

  1. Estimate "risk factors"

  2. Regress daily returns of a stock on these factors using least squares regression (or any other regression method)

  3. Estimate the residuals

  4. Trade the portfolios generating these residuals (with weights scaled to invest the desired amount)
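
The four steps above can be sketched for a single stock as follows. This is an illustration under assumptions: the factors and the stock are simulated, the regression has no intercept, and scaling by the sum of absolute weights is just one way to "invest the desired amount" (here 1 dollar of gross exposure).

```r
set.seed(3)
n_days <- 400
# Step 1: two simulated "risk factor" return series (stand-ins for estimated factors).
factors <- cbind(f1 = rnorm(n_days), f2 = rnorm(n_days))
# A simulated stock driven by the factors plus idiosyncratic noise.
stock <- 0.8 * factors[, "f1"] - 0.3 * factors[, "f2"] + rnorm(n_days, sd = 0.5)

# Step 2: least squares without an intercept, beta = (X'X)^{-1} X'y.
betas <- solve(t(factors) %*% factors, t(factors) %*% stock)

# Step 3: the residuals are the part of the stock's returns the factors do not explain.
residuals <- stock - factors %*% betas

# Step 4: the residual portfolio is long the stock and short the factor portfolios;
# rescale so that the absolute weights add up to 1 dollar (an assumed convention).
weights <- c(stock = 1, -betas[, 1])
weights <- weights / sum(abs(weights))

print(round(betas, 3))
print(round(weights, 3))
```

By construction the residuals are uncorrelated in sample with the factor returns, which is what makes the residual portfolio a bet on the stock alone.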

Residual Portfolios: Example

  • We use the first 3 Principal Components of our data as "risk factors"

  • We assume 0 mean and 0 alpha/regression constant

  • Scale the regression weights ("betas") to have norm 1.

Residual Portfolios: Example Code (Part 1)

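# PCA via the eigen-decomposition of the correlation matrix of the returns.
SP500PCA_simple <- eigen(cor(ProjectData))
# Keep the first numb_components_used eigenvectors as factor portfolio weights.
TheFactors = SP500PCA_simple$vectors[, 1:numb_components_used]
# Flip signs so each factor has a positive total return, then rescale with norm1 (helper, not shown here).
TheFactors = apply(TheFactors, 2, function(r) if (sum(ProjectData %*% r) < 0) -r else r)
TheFactors = apply(TheFactors, 2, function(r) norm1(r))
# Daily returns of the factor portfolios.
Factor_series = ProjectData %*% TheFactors
# use_mean_alpha acts as a switch: when it is 0, nothing is demeaned and the alphas are zero.
demean_IVs = apply(Factor_series, 2, function(r) r - use_mean_alpha * mean(r))
ProjectData_demean = apply(ProjectData, 2, function(r) r - use_mean_alpha * mean(r))
# Least squares betas: (X'X)^{-1} X'Y.
XXtY = (solve(t(demean_IVs) %*% demean_IVs) %*% t(demean_IVs))
stock_betas = XXtY %*% (ProjectData_demean)
# Regression constants (alphas), replicated for every day.
Ybar = t(stock_betas) %*% matrix(apply(Factor_series, 2, mean), ncol = 1)
stock_alphas = apply(ProjectData_demean, 2, mean) - Ybar
stock_alphas = use_mean_alpha * matrix(stock_alphas, nrow = 1)
stock_alphas_matrix = rep(1, nrow(ProjectData)) %*% stock_alphas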

Residual Portfolios: Example Code (Part 2)

# make sure each residuals portfolio invests a total of 1 dollar.
stock_betas_stock = apply(rbind(stock_betas, rep(1, ncol(stock_betas))), 2, 
    norm1)
stock_betas = head(stock_betas_stock, -1)  # last one is the stock weight
stock_weight = rep(1, nrow(ProjectData)) %*% tail(stock_betas_stock, 1)
Stock_Residuals = stock_weight * ProjectData - (Factor_series %*% stock_betas + 
    stock_alphas_matrix)

Trading Long-Short Stocks-Risk Portfolios

Note that "trading the residuals" implies that every day we trade the portfolios corresponding to the residuals (with portfolio weights given by the estimated "betas", scaled to invest 1 dollar every day).

Best Residual Portfolio (with hindsight)

Stock: MNST

[Figure: returns of the best residual portfolio, MNST]

Most Mean Reverting Residuals Portfolio

Stock: XRX

[Figure: mean reversion returns of the most mean reverting residual portfolio, XRX]

Average of Selection with Hindsight

We can repeat the analysis above using the residuals portfolios.

We select with hindsight whether to use mean reversion or momentum for each residual portfolio for the entire 10-year period (hence 1 bit of information with hindsight per stock) and then average.

The Code:

selected_strat_res = apply(mr_Stock_Residuals, 2, function(r) if (sum(r) < 0) -r else r)
selected_mr_market_res = apply(selected_strat_res, 1, mean)

[Figure: average of selection with hindsight for the residual portfolios]

How Many Bits of Information are there in the S&P500 Data?

The results "with hindsight" may give the impression that, even though one cannot achieve them in practice, there is a lot of potential. After all, one only has to choose 423 binary variables for the entire 10 years of data: whether to follow a mean reversion or a momentum strategy for each individual stock or residual portfolio over the whole period. At first glance, a "423 bits" decision does not seem like much: it is as if you only see 423 bits of information about 10 years of 423 stocks, namely about 1093878 real numbers. This is especially so if the data is "close to random" (note: known risk factors, such as momentum, indicate this is not the case, depending on how one models the series). But maybe this is indeed as many bits of information as one could possibly need to "know all about the S&P 500 stocks for 10 years"...

As always, one has to be very aware of the signal to noise ratio in the data one explores. This is what "fooled by randomness" can really mean.
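
One way to see how little signal the hindsight results require is to repeat the selection on pure noise. The sketch below is an illustration only: it simulates independent zero-mean "returns" with an assumed daily standard deviation of 2 percent, on a matrix of the same size as the data, and applies one hindsight bit per column.

```r
set.seed(123)
n_days <- 2586
n_stocks <- 423
# Pure noise: independent zero-mean daily "returns" in percent, so no strategy has a real edge.
noise <- matrix(rnorm(n_days * n_stocks, mean = 0, sd = 2), nrow = n_days)

# One hindsight bit per column: flip the sign of any column that lost money overall.
flipped <- apply(noise, 2, function(r) if (sum(r) < 0) -r else r)

# Total return of the equally weighted average, with and without the hindsight bits.
hindsight_total <- sum(rowMeans(flipped))
no_hindsight_total <- sum(rowMeans(noise))

print(c(hindsight = hindsight_total, no_hindsight = no_hindsight_total))
```

The hindsight total equals the average absolute column sum, so it is positive and grows with the square root of the number of days even though the data contain no structure at all.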

What if we know a different 423 bits with hindsight?

Instead of selecting between mean reversion and momentum for each of the 423 stocks or residual portfolios, we could select with hindsight whether to buy (long) or sell (short) each of the 423 stocks. Are these 423 bits as informative?

Here is the code for seeing the returns of such a portfolio. Try it by uncommenting the plot line.

hindsight_long_short = apply(ProjectData, 2, function(r) if (sum(r) < 0) -r else r)
hindsight_long_short_market = apply(hindsight_long_short, 1, mean)
names(hindsight_long_short_market) <- rownames(market)
# pnl_plot(hindsight_long_short_market)

Not all "binary choices" have the same information...

Lessons Learned

  • Basic analysis of daily stock returns.

  • There appear to be market regimes.

  • The "equally weighted market" is essentially the first Principal Component of the daily returns data (correlation 0.9998).

  • Example of statistical estimation of, what one could call, "risk factors".

  • Example mean reverting or momentum daily trading strategies.

  • It only takes a few bits of information with hindsight to get fooled by randomness with this data.