Principal Components Analysis of Stock Prices
1. Why?

In graduate school, I learned about Principal Components Analysis and did a project with it. Years later, I was curious to try this on stock prices.

In the past, I've tried automating several stock trading strategies based on the Technical Analysis methods described in various investing books. Most technical analysis methods use only stock price and volume data, which makes them easy to automate. However, I slowly came to the painful conclusion that it is very easy to write a convincing book showing several winning stock strategies when the author has the benefit of applying them to historical data and can cherry-pick the cases that fit the model.

Our world today is highly interconnected. Companies produce products that drive other companies' production of their own products. Most investment advice suggests a diversified portfolio -- but what is diversity? For example, the fortunes of a steel manufacturer and an automobile company are likely to be codependent. However, an automobile company is unlikely to be affected if a cosmetics company fails (although sales of pink Cadillacs may suffer).

I was curious whether PCA could help with this...

2. Results

My limited experiments so far reveal some expected outcomes and some unexpected ones. The graph below shows most (~29) of the Dow Jones Industrial Average stocks. [My list of DJIA stocks is a bit outdated and predates the merger of SBC and AT&T.]

Principal Components Analysis (PCA) finds the factors that contribute to the variance in a data series. Applied in this manner, PCA just indicates correlations between stocks without explaining why. Of course, correlation is not causation... A few expected similarities are shown:

However, there are some things which seem odd.

Of course, it is entirely possible that PCA is being abused a bit here. Many of these stocks have absolutely nothing to do with each other. There may be statistical issues (such as short-term trends that pollute the covariance matrix or the fact that the various industries are not equally represented) causing the eigenvectors to be skewed toward a funny "direction".
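For anyone who wants to experiment with the idea, here is a minimal sketch of the kind of computation discussed above. It is not the code behind my results: the ticker names are hypothetical, the data is synthetic (one shared "market" factor plus two made-up sector factors), and the choices to work with daily returns and to use the correlation matrix rather than the raw covariance matrix are assumptions made for the example.

```python
import numpy as np


def pca_loadings(returns):
    """PCA of a (days x stocks) array via the correlation matrix.

    Returns (explained_variance_ratio, components); column k of
    `components` holds the per-stock weights of the k-th component.
    """
    # Using correlations instead of covariances keeps one very
    # volatile stock from dominating the result.
    corr = np.corrcoef(returns, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # re-sort, largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs


# Synthetic daily returns (assumption: NOT real market data).
rng = np.random.default_rng(0)
days = 500
tickers = ["STEEL", "AUTO", "COSMETICS", "RETAIL"]  # hypothetical names

market = rng.normal(0.0, 0.01, days)      # factor shared by every stock
industrial = rng.normal(0.0, 0.01, days)  # factor shared by STEEL and AUTO
consumer = rng.normal(0.0, 0.01, days)    # factor shared by COSMETICS and RETAIL
noise = rng.normal(0.0, 0.005, (days, len(tickers)))

returns = np.column_stack([
    market + industrial,
    market + industrial,
    market + consumer,
    market + consumer,
]) + noise

ratio, components = pca_loadings(returns)

for name, w1, w2 in zip(tickers, components[:, 0], components[:, 1]):
    print(f"{name:10s} PC1={w1:+.2f}  PC2={w2:+.2f}")
print("share of variance per component:", np.round(ratio, 2))
```

In this toy setup the first component weights all four stocks in the same direction (the shared factor), while the second gives STEEL and AUTO one sign and COSMETICS and RETAIL the other. The overall sign of each component is arbitrary. Real prices are messier: price levels tend to trend, which is one way the covariance matrix can get skewed as described above, so feeding raw prices into the same function may give quite different eigenvectors.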