Stock market prediction using machine learning (Elman, regression, and GMDH)

My primary interest is machine learning and computer vision, but in winter quarter, I took a graduate course in computational statistics.

We had a fun group project that used R to analyze stock prices. It later turned into a presentation at SOURCE 2016, after we added some machine learning techniques to make it more interesting.

There is a great R package called quantmod (http://www.quantmod.com/), which we used to get stock data.

It is very easy to use. For example:


library(quantmod)  # Stock data download and helpers such as Cl().
library(ggplot2)   # Include ggplot2 so we can graph it.
start <- as.Date("1986-03-01")
end <- as.Date("2015-12-30")
getSymbols(c('AAPL', 'MSFT', '^IXIC', 'NDX'), from = start, to = end)

This loads the quantmod package and automatically downloads stock price information between 1986-03-01 and 2015-12-30 for Apple (AAPL), Microsoft (MSFT), the Nasdaq Composite (^IXIC), and the Nasdaq-100 (NDX).

Want to quickly graph the closing prices of Microsoft stock during that time? That’s just two lines of code:



MSFT.df = data.frame(date = time(MSFT), Cl(MSFT))

ggplot(data = MSFT.df, aes(x = date, y = MSFT.Close)) + geom_point() + geom_smooth(se = FALSE) + labs(x = "Date", y = "Close")

Closing prices of Microsoft stock as given by quantmod package and graphed with ggplot2.

As you can see, R facilitates very fast data analytics.

We went on to build some simple predictive regression models, and also used the R packages RSNNS and GMDH.
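As a rough idea of what a simple predictive regression can look like, here is a minimal sketch using only base R. It assumes a lag-1 linear model (today's close regressed on yesterday's close) and uses synthetic prices; the model form and the names close, lagged, fit, and next_close are illustrative assumptions, not necessarily the models from the project.

```r
# Synthetic closing prices stand in for real data (an assumption for illustration).
set.seed(42)
close <- 50 + cumsum(rnorm(300, mean = 0.05, sd = 1))

# Pair each day's close with the previous day's close.
n <- length(close)
lagged <- data.frame(today = close[-1], yesterday = close[-n])

# Fit today's close as a linear function of yesterday's close.
fit <- lm(today ~ yesterday, data = lagged)

# One-step-ahead forecast: feed the most recent close in as "yesterday".
next_close <- unname(predict(fit, newdata = data.frame(yesterday = close[n])))
print(next_close)
```

With real data, the synthetic vector could be replaced by something like as.numeric(Cl(MSFT)) from quantmod.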

Like most R packages, RSNNS is very easy to use:


library(quantmod)  # For stock data.
library(RSNNS)     # Stuttgart Neural Network Simulator.

The training and prediction code segment is here:


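# Train an Elman network with 8 hidden units on the closing prices.
modelElman = elman(df$date, df$MSFT.Close, size=8, learnFuncParams=c(0.1), maxit=1000)
# Predict the next step (n+1) and append it to the running vector of predictions.
pre = append(pre, predict(modelElman, n+1)[1])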

We ran this in a loop to get a series of predictions for various dates.
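A loop like that might be structured roughly as follows. This is only a sketch: walk_forward and trend_model are hypothetical names, the prices are synthetic, and a plain lm() trend fit stands in for the neural network so the example runs with base R alone.

```r
# Hypothetical helper: walk forward through the series, refitting at each step.
# `fit_predict` takes the history so far and returns a forecast for the next value.
walk_forward <- function(close, first_day, fit_predict) {
  pre <- c()
  for (n in first_day:(length(close) - 1)) {
    history <- close[1:n]                      # only data up to day n is visible
    pre <- append(pre, fit_predict(history))   # forecast for day n + 1
  }
  pre
}

# Stand-in model (an assumption): linear regression of close on the day index.
trend_model <- function(history) {
  d <- data.frame(day = seq_along(history), close = history)
  fit <- lm(close ~ day, data = d)
  unname(predict(fit, newdata = data.frame(day = length(history) + 1)))
}

set.seed(1)
close <- 50 + cumsum(rnorm(120))               # synthetic prices
pre <- walk_forward(close, first_day = 100, fit_predict = trend_model)
length(pre)  # 20 forecasts, one for each of days 101 through 120
```

The point of the structure is that each forecast only sees prices from earlier days, so the predictions can be compared fairly against what actually happened.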

It’s similarly easy to use the GMDH model:


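library(GMDH)  # Provides fcast() for GMDH forecasting.
# Create the time series.
n = nrow(df)
stock <- ts(df, start=1, end=n, frequency=1)
# Forecast the next value and append it to the running vector of predictions.
out = fcast(stock, input = 3, layer = 4, f.number = 1, tf = "all")
pre = append(pre, out$mean[1])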

We then did a simulation to see which method performs the best on a range of stock values using a simple investment strategy:

Every time the model says the stock price will go up tomorrow, buy 10 shares.
Every time the model says the stock price will go down tomorrow, sell everything!
Continue for a year.
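The strategy above can be sketched in a few lines of R. The function name simulate_strategy is hypothetical, and the sketch makes several assumptions that the rules leave open: trades execute at the current day's close, there is no cash limit or transaction cost, and any shares still held at the end are sold at the final close.

```r
# Hypothetical helper: simulate the buy-10 / sell-everything strategy.
# close[t] is the closing price on day t; predicted[t] is the model's
# forecast for day t + 1, made on day t.
simulate_strategy <- function(close, predicted, lot = 10) {
  cash <- 0
  shares <- 0
  for (t in seq_along(predicted)) {
    if (predicted[t] > close[t]) {
      # Model expects a rise: buy one lot at today's close.
      shares <- shares + lot
      cash <- cash - lot * close[t]
    } else if (predicted[t] < close[t]) {
      # Model expects a fall: sell everything at today's close.
      cash <- cash + shares * close[t]
      shares <- 0
    }
  }
  # Assumption: remaining shares are sold at the final close.
  cash + shares * close[length(close)]
}

close     <- c(10, 11, 12, 11, 13)
predicted <- c(11, 12, 11, 13)        # forecasts for days 2 through 5
simulate_strategy(close, predicted)   # 50
```

In this toy run the strategy buys on days 1 and 2, sells 20 shares on day 3, buys again on day 4, and cashes out at the last close for a profit of 50.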

Elman neural networks gave the best results on a per-stock basis, followed very closely by GMDH, with regression far behind. Interestingly, however, if you had followed this strategy with each of the models in 2015, you would actually have made money with both Elman and regression. Surprisingly, GMDH lost money.

This is what you’d have made using our models and investment strategy on Yahoo, JP Morgan, CMS Energy Corporation, Verizon, Apple, and Microsoft.

2015:
Elman: $1,335.00
Regression: $383.70
GMDH: -$623.10

It’s surprising that an Elman neural network did this well with only closing prices as input. Closing prices alone are obviously not very reliable predictors of future stock prices, but the network managed anyway.

Clearly no one should actually use such a simple method with real money at stake, but it’s still interesting.