Rise Of Automated Trading: Machines Trading S&P 500

Nowadays, more than 60 percent of trading activities with different assets (such as stocks, index futures, commodities) are not made by “human being” traders anymore, instead relying on automated trading. There are specialized programs based on particular algorithms that automatically buy and sell assets over different markets, meant to achieve a positive return in the long run.

In this article, I’m going to show you how to predict, with good accuracy, how the next trade should be placed to get a positive gain. For this example, as the underlying asset to trade, I selected the S&P 500 index, the weighted average of 500 US companies with bigger capitalization. A very simple strategy to implement is to buy the S&P 500 index when Wall Street Exchange starts trading, at 9:30 AM, and selling it at the closing session at 4:00 PM Eastern Time. If the closing price of the index is higher than the opening price, there is a positive gain, whereas a negative gain would be achieved if the closing price is lower than the opening price. So the question is: how do we know if the trading session will end up with a closing price higher than opening price? Machine Learning is a powerful tool to achieve such a complex task, and it can be a useful tool to support us with the trading decision.

Machine Learning is the new frontier of many useful real life applications. Financial trading is one of these, and it’s used very often in this sector. An important concept about Machine Learning is that we do not need to write code for every kind of possible rules, such as pattern recognition. This is because every model associated with Machine Learning learns from the data itself, and then can be later used to predict unseen new data.

Machine Learning is the new frontier of many useful real life applications

Machine Learning is the new frontier of many useful real life applications

Disclaimer: The purpose of this article is to show how to train Machine Learning methods, and in the provided code examples not every function is explained. This article is not intended to let one copy and paste all the code and run the same provided tests, as some details are missing that were out of the scope the article. Also, base knowledge of Python is required. The main intention of the article is to show an example of how machine learning may be effective to predict buys and sells in the financial sector. However, trade with real money means to have many other skills, such as money management and risk management. This article is just a small piece of the “big picture”.

Building Your First Financial Data Automated Trading Program

So, you want to create your first program to analyze financial data and predict the right trade? Let me show you how. I will be using Python for Machine Learning code, and we will be using historical data from Yahoo Finance service. As mentioned before, historical data is necessary to train the model before making our predictions.

To begin, we need to install:

Note that only a part of GraphLab is open source, the SFrame, so to use the entire library we need a license. There is a 30 day free license and a non-commercial license for students or those one participating in Kaggle competitions. From my point of view, GraphLab Create is a very intuitive and easy to use library to analyze data and train Machine Learning models.

Digging in the Python Code

Let’s dig in with some Python code to see how to download financial data from the Internet. I suggest using IPython notebook to test the following code, because IPython has many advantages compared to a traditional IDE, especially when we need to combine source code, execution code, table data and charts together on the same document. For a brief explanation to use IPython notebook, please look at the Introduction to IPython Notebook article.

So, let’s create a new IPython notebook and write some code to download historical prices of S&P 500 index. Note, if you prefer to use other tools, you can start with a new Python project in your preferred IDE.

import graphlab as gl
from __future__ import division
from datetime import datetime
from yahoo_finance import Share
# download historical prices of S&P 500 index
today = datetime.strftime(datetime.today(), "%Y-%m-%d")
stock = Share('^GSPC') # ^GSPC is the Yahoo finance symbol to refer S&P 500 index
# we gather historical quotes from 2001-01-01 up to today
hist_quotes = stock.get_historical('2001-01-01', today)
# here is how a row looks like
hist_quotes[0]
{'Adj_Close': '2091.580078',
'Close': '2091.580078',
'Date': '2016-04-22',
'High': '2094.320068',
'Low': '2081.199951',
'Open': '2091.48999',
'Symbol': '%5eGSPC',
'Volume': '3790580000'}

Here, hist_quotes is a list of dictionaries, and each dictionary object is a trading day with OpenHighLowCloseAdj_closeVolumeSymbol and Date values. During each trading day, the price usually changes starting from the opening price Open to the closing price Close, and hitting a maximum and a minimum valueHigh and Low. We need to read through it and create lists of each of the most relevant data. Also, data must be ordered by the most recent values at first, so we need to reverse it:

l_date = []
l_open = []
l_high = []
l_low = []
l_close = []
l_volume = []
# reverse the list
hist_quotes.reverse()
for quotes in hist_quotes:
l_date.append(quotes['Date'])
l_open.append(float(quotes['Open']))
l_high.append(float(quotes['High']))
l_low.append(float(quotes['Low']))
l_close.append(float(quotes['Close']))
l_volume.append(int(quotes['Volume']))

We can pack all downloaded quotes into an SFrame object, which is a highly scalable column based data frame, and it is compressed. One of the advantages is that it can also be larger than the amount of RAM because it is disk-backed. You can check the documentation to learn more about SFrame.

So, let’s store and then check the historical data:

qq = gl.SFrame({'datetime' : l_date, 
'open' : l_open, 
'high' : l_high, 
'low' : l_low, 
'close' : l_close, 
'volume' : l_volume})
# datetime is a string, so convert into datetime object
qq['datetime'] = qq['datetime'].apply(lambda x:datetime.strptime(x, '%Y-%m-%d'))
# just to check if data is sorted in ascending mode
qq.head(3)

close datetime high low open volume
1283.27 2001-01-02 00:00:00 1320.28 1276.05 1320.28 1129400000
1347.56 2001-01-03 00:00:00 1347.76 1274.62 1283.27 1880700000
1333.34 2001-01-04 00:00:00 1350.24 1329.14 1347.56 2131000000

Now we can save data to disk with the SFrame method save, as follows:

qq.save(“SP500_daily.bin”)
# once data is saved, we can use the following instruction to retrieve it 
qq = gl.SFrame(“SP500_daily.bin/”)

Let’s See What the S&P 500 Looks Like

To see how the loaded S&P 500 data will look like, we can use the following code:

import matplotlib.pyplot as plt
%matplotlib inline # only for those who are using IPython notebook
plt.plot(qq['close'])

The output of the code is the following graph:

Read the full article by Andrea Nalon, a Toptal freelance developer here. 




Leave comments

authimage

Copyright(c) 2017 - PythonBlogs.com
By using this website, you signify your acceptance of Terms and Conditions and Privacy Policy
All rights reserved