How to Run the World’s Easiest Backtest
Backtesting a crypto trading strategy in just 2 lines of Python code with Sanpy
In the most general sense, backtesting is the process of analyzing the performance of a trading strategy based on historical data. Through this process, your main goal is to find a data-backed answer to a single question: what would be my returns had I traded this strategy over a set period of time?
In my experience as a data scientist in crypto, one of the biggest obstacles to running your own backtests is the steep learning curve linked to quantitative trading.
Typical backtest frameworks are still relatively difficult for laypeople, and learning to plot and manipulate data is not exactly straightforward. Plus, most non-devs tend to just zzz out whenever you put them in front of a wall of code.
So while backtesting trades makes a lot of sense - and a lot of money - for crypto capital funds and big portfolio managers, the barrier to entry is usually considered too high for little Joe Retail.
In reality, with just a few lines of code and the right set of data, you could literally run hundreds of high ROI backtests, and discover new, uniquely profitable market alphas.
This is what we’ll teach you to do in this article.
Part One: Setting the Stage
In this post, we will test a simple moving average crossover strategy for Bitcoin, using a 50-day moving average and a 100-day moving average on a 1-year time frame. By the end of it, hopefully we’ll remove some of the mystery of quantitative trading and prove how easy and powerful it can really be.
Our strategy is going to be very straightforward: over the last 365 days, buy BTC whenever its 50-day MA is above its 100-day MA, and sell whenever it’s not. We will end by plotting the results of our strategy on a chart and comparing it to the benchmark (HODLing BTC).
Below is a quick and dirty way to calculate the returns of a trading strategy over time. You can also find the full code in this Google Colab notebook. But don’t be fooled by its ‘simplicity’: even this can already give you a huge advantage over other traders and save you a lot of money that you’d have otherwise spent trading useless, untested strategies.
To perform the world’s easiest backtest, we’ll use Python 3 and just two modules:
- 1.) Our own Sanpy module, which lets you tap into Santiment data for over 900 cryptocurrencies
- 2.) Matplotlib module for plotting our backtest results onto a chart
Sanpy is a custom Python wrapper developed by Santiment, and it’s ideal for data scientists and beginner quants. This is the only module that lets you pull raw on-chain, pricing, social and development data for over 900 cryptocurrencies with just a few lines of code.
You can get access to Sanpy and our full API here.
If you don’t have Python installed, you can run the entire backtest in Google Colaboratory, a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. Just click on File->New Notebook and you’re off to the races.
Keep in mind that the following backtest is leaving out quite a bit. For example, you’ll notice that we do not account for any transaction costs that you’d pay for each trade executed as part of the strategy. This would naturally reduce your overall performance compared to our results.
Also, you would normally calculate a set of relevant metrics on top of the pure performance, including the standard deviation of the strategy, its Sharpe Ratio, the maximum drawdown and a few others. Let us know if you’d like us to teach you how to do this in a future article!
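If you’d like a head start on those metrics, here is a minimal plain-Python sketch. It is not part of our backtest: risk_metrics is a hypothetical helper, the returns are made up, the risk-free rate is assumed to be zero, and we annualise with 365 periods on the assumption that crypto trades every day of the year.

```python
import math
import statistics

def risk_metrics(daily_returns, periods_per_year=365):
    """Return (annualised volatility, Sharpe ratio, maximum drawdown)."""
    mean = statistics.mean(daily_returns)
    stdev = statistics.stdev(daily_returns)  # sample standard deviation
    volatility = stdev * math.sqrt(periods_per_year)
    # Sharpe ratio, assuming a risk-free rate of zero
    sharpe = (mean / stdev) * math.sqrt(periods_per_year) if stdev else float("nan")

    # Maximum drawdown: the largest peak-to-trough drop of the equity curve
    equity, peak, max_drawdown = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, 1 - equity / peak)
    return volatility, sharpe, max_drawdown

example_returns = [0.10, -0.50, 0.20, 0.05]  # made-up daily returns
volatility, sharpe, max_drawdown = risk_metrics(example_returns)
```

In this toy series the portfolio falls from 1.10 to 0.55 before recovering, so the maximum drawdown comes out at 50%.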
Part Two: Running Your First Backtest
Before we can start backtesting, we first need to get some data to backtest. In this case, we’ll pull the last year of Bitcoin’s OHLCV data with one Sanpy query:
If this is your first time using Python, don’t panic. Let’s break down this block of code nice and slow:
The first two lines import the only two modules we will need for our backtest:
- 1.) Our Sanpy module (SAN) that lets us import Bitcoin’s OHLCV data
- 2.) The matplotlib module for plotting the results of our backtest on a chart
Once we imported the modules, we loaded the last year (default value) of Bitcoin’s OHLCV data using the san.get() function.
We also stored the result in a variable called data, so from now on, whenever you see ‘data’ in our code, think ‘a table holding the last year of Bitcoin’s OHLCV data’.
You can check out all the available functions in Sanpy and examples for each in our documentation.
The last line - data.head() - will return the first 5 rows of OHLCV data, so we can quickly test if we actually have the right information. The return will look like this:
So, to recap: we’ve imported the necessary modules for our backtest, and used san.get() to pull the last year of Bitcoin’s OHLCV data.
As mentioned, our backtest revolves around two indicators: a 50-day and 100-day moving average. So with all the preparations made, let’s generate our moving averages:
The first two lines define our moving averages. As you can see, we execute the rolling() function on the closing BTC price, in order to apply it over the rolling windows of 50 and 100 days, respectively. We then apply the mean() function over the results to get a moving average.
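If rolling() and mean() feel like black boxes, here is a minimal plain-Python sketch of the same calculation. It is not the code from our backtest: rolling_mean is a hypothetical helper, the closing prices are made up, and we assume that a value only appears once a full window of prices is available.

```python
def rolling_mean(values, window):
    """Simple moving average; None until `window` values are available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result

closes = [10, 11, 12, 13, 14, 15]   # made-up closing prices
ma_short = rolling_mean(closes, 2)  # stands in for the 50-day MA
ma_long = rolling_mean(closes, 4)   # stands in for the 100-day MA
```

Notice the empty slots at the start of each list. Under the same assumption, a 100-day window leaves the first 99 days of our data without a long moving average, so the crossover can only be evaluated from day 100 onwards.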
The third line generates Bitcoin’s daily returns by calculating the percent change between the previous day’s BTC closing price and the current day’s BTC closing price. This will be needed later.
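To make the percent change concrete, here is a small plain-Python sketch of that calculation. The daily_returns helper and the prices are made up for illustration; the backtest itself does this on the closing price column.

```python
def daily_returns(closes):
    """Fractional change from the previous close; the first day has no return."""
    returns = [None]  # nothing to compare the first close against
    for prev, curr in zip(closes, closes[1:]):
        returns.append(curr / prev - 1)
    return returns

closes = [100.0, 110.0, 99.0]     # made-up closing prices
returns = daily_returns(closes)   # roughly [None, 0.10, -0.10]
```

A move from 100 to 110 is a return of +10%, and the drop from 110 to 99 is a return of -10%, even though the dollar amounts differ.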
So, to recap: we have now defined our 50 and 100-day moving averages, and added a column of daily BTC returns (in percentages).
Now that we have all of our data imported and our indicators in place, we can finally define and backtest our MA crossover strategy in just two lines of code. Here they are:
The first line defines our main condition: we want to buy Bitcoin every day that the smaller moving average (50 days) is above the larger moving average (100 days).
The result of the first line of code will be a sequence of ‘True’ and ‘False’ values for every day in our time frame, depending on whether the 50-day MA is above 100-day MA (returns ‘True’) or below it (returns ‘False’).
A key thing to know is that Python encodes True as 1 and False as 0. While this sounds strange, you can test it by typing True + True in your Python terminal (mind the capitalization: Python does not recognize TRUE). The output will be 2. This will be important in a minute.
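You can check this behaviour yourself with a few lines. Note that Python’s boolean values are spelled True and False, with only the first letter capitalized; the 5% return below is a made-up number.

```python
two = True + True        # booleans behave like the integers 1 and 0
kept = True * 0.05       # a 5% return multiplied by True is kept
dropped = False * 0.05   # the same return multiplied by False becomes 0.0

print(two, kept, dropped)  # prints: 2 0.05 0.0
```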
The second line defines our whole strategy, and is the most complicated line in the entire backtest. Let’s break it down:
- 1.) data.returns.shift(-1) * trades
Here we tell Python to multiply BTC’s daily returns (defined previously as % change from yesterday to today) by trades (defined as a series of ‘True’ and ‘False’ values).
As we just explained, True = 1 and False = 0 in Python. So what we’re actually saying to Python is:
- Multiply the daily BTC returns by 1 when 50-day MA > 100-day MA
- Multiply the daily BTC returns by 0 when 50-day MA ≤ 100-day MA (or when either MA is not yet available)
In other words, we’re keeping the daily returns (simulates buying Bitcoin) on all days that our condition has been met, and discarding the daily returns (simulates selling Bitcoin) on all days that our condition has not been met.
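Here is a tiny plain-Python sketch of that multiplication, using made-up returns and a made-up signal rather than real BTC data:

```python
returns = [0.02, -0.01, 0.03, -0.04]     # made-up daily returns
in_market = [True, False, True, False]   # made-up crossover signal

# True keeps the day's return, False replaces it with zero
strategy_returns = [r * held for r, held in zip(returns, in_market)]
```

The second and fourth days drop out of the strategy entirely, which in this toy case happens to remove both losing days.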
We also use the shift(-1) function to move the daily returns up by one day. If we don’t do this, we would essentially be trading today’s market based on today’s closing prices. In other words, we would look at BTC’s moving averages in the evening, and if they meet our condition, would essentially go back in time and buy BTC in the morning. That’s no bueno.
Since this would be impossible in reality, we shift the returns up by one day so that we have tomorrow’s returns next to today’s MAs. This way we make sure that - if today’s MAs meet our condition - we only act after the signal is known and collect the following day’s returns, thereby not breaking the laws of physics.
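The alignment is easier to see on a toy example. Below is a plain-Python sketch, where shift_up is a hypothetical stand-in for shift(-1) and all the numbers are made up:

```python
def shift_up(values):
    """Move every value one slot earlier; the last slot has no successor."""
    return values[1:] + [None]

returns = [None, 0.10, -0.05, 0.02]   # return realised on each day (day 0 has none)
signal = [True, False, True, True]    # crossover state known at each day's close

next_day_returns = shift_up(returns)  # [0.10, -0.05, 0.02, None]

# Pair today's signal with tomorrow's return
strategy_returns = [
    None if nxt is None else nxt * sig
    for nxt, sig in zip(next_day_returns, signal)
]
```

The signal on day 0 now decides whether we collect the return of day 1, and so on. The last day ends up empty because there is no tomorrow to pair it with.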
- 2.) Adding +1 to the strategy’s returns and calculating its cumulative product
By invoking the cumprod() function, we compound all the daily BTC returns on days that met our condition (50MA>100MA) over the set time frame (1 year). Since we are using normal returns (and not log returns), we cannot simply sum them together, so we add 1 to each daily return and take the cumulative product instead.
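To see why we compound rather than sum, here is a plain-Python sketch with made-up strategy returns. It uses itertools.accumulate as a stand-in for the cumulative product:

```python
from itertools import accumulate
from operator import mul

strategy_returns = [0.10, 0.0, -0.05, 0.20]   # made-up daily strategy returns

# Turn each return into a growth factor, then multiply them cumulatively
growth_factors = [1 + r for r in strategy_returns]
equity_curve = list(accumulate(growth_factors, mul))

naive_sum = sum(strategy_returns)   # what plain addition would suggest
```

Adding the returns suggests a 25% gain, while the compounded curve ends at roughly 1.254, a 25.4% gain. The gap grows with the size and number of the daily moves.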
And that’s it! Finally, we can plot the resulting performance of our strategy by using the plot() function of the matplotlib module:
The above chart shows the cumulative returns of our strategy: we bought BTC whenever its 50-day MA was above its 100-day MA, and sold whenever it was not.
Every time that our strategy decides to sell, we see the chart flatlining (we’re not adding Bitcoin’s daily returns to our strategy).
The y-scale on the chart shows the multiple of our returns. A good way to think of this is as our ROI on investing $1 in this model. By following our MA crossover strategy, we would have turned our $1 into ~$2.5 within a year - pocketing 150% gains. Not too shabby! Or is it?
Our performance doesn’t really tell us much if we don’t compare it to the performance of other investment strategies. The strategy that we often use as a benchmark in our backtests is HODLing over the same timeframe. The reason is simple - if you’re actively trading Bitcoin, you would at least hope to do better than simply holding Bitcoin, right? Otherwise, what's the point in trading?
To calculate the returns of HODLing, we do almost the same thing as before - calculate Bitcoin’s cumulative daily returns - except this time we remove the [* trades] part of the code, so that we don’t discard the days when our condition (50-day MA>100-day MA) hasn’t been met. This will leave us with cumulative BTC returns for all 365 days.
Then, we just plot both strategies at the same time to compare their results:
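The difference between the two curves boils down to one multiplication. Here is a plain-Python sketch with made-up numbers, where equity_curve is a hypothetical helper. The numbers were picked so that the strategy wins, so they say nothing about real performance:

```python
from itertools import accumulate
from operator import mul

def equity_curve(daily_returns):
    """Cumulative product of (1 + return) for each day."""
    return list(accumulate((1 + r for r in daily_returns), mul))

next_day_returns = [0.05, -0.10, 0.04, -0.02]   # made-up, already shifted
signal = [True, False, True, False]             # made-up crossover signal

hodl = equity_curve(next_day_returns)           # every day counts
strategy = equity_curve(r * s for r, s in zip(next_day_returns, signal))
```

On days when the signal is False the strategy curve stays flat, while the HODL curve keeps moving with the market.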
As you can see, while our MA crossover strategy would turn every dollar you invested into ~$2.5, HODLing Bitcoin would turn every dollar you invested into ~$2. So we beat the benchmark!
If you’re interested in the exact performance numbers, you can run the following code:
Why the index -2 instead of -1? Because the last entry in the time series was turned to NaN when you shifted the whole time series up by one.
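Here is a small plain-Python sketch of the same idea, using a made-up equity curve whose last entry is NaN. It also shows an alternative that does not depend on knowing how many trailing values are missing:

```python
import math

equity_curve = [1.05, 1.05, 1.092, math.nan]   # last value lost to the shift

last_value = equity_curve[-1]       # nan, not a usable result
final_multiple = equity_curve[-2]   # 1.092, the last real number

# Alternative: take the last entry that is an actual number
last_valid = [v for v in equity_curve if not math.isnan(v)][-1]
```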
Congratulations - you just backtested your first strategy!
Here’s the full code:
As mentioned, you can also get access to the full code here.
In total, we only needed 12 lines of code to get the data, backtest a moving average strategy AND plot the results against holding. Not bad!
For a slightly more impressive visualization, you can add a few additional lines at the end to make our matplotlib charts pop:
This returns the following graph:
Part Three: Closing Thoughts
As stated already, this is a very raw backtest. Professional backtesting software is typically much more complex, and if you’re a serious trader or quant you would want to have such tools in your arsenal. However, when it comes to proving a strategy’s usefulness, a simple backtest is better than no backtest at all.
If you want to improve this backtest, the next steps could include:
- Adding a counter that tells you how many trades you would have executed
- Calculating transaction costs every time the strategy does a trade and removing those costs from the performance of the backtest
- Calculating risk metrics like the volatility, Sharpe ratio and maximum drawdown to get more insights into the strategy’s performance.
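As a starting point for the first two items, here is a minimal plain-Python sketch. Everything in it is an assumption made for illustration: count_trades and apply_costs are hypothetical helpers, the 0.1% fee is a placeholder rather than any exchange’s real rate, and subtracting the fee directly from the day’s return is an approximation.

```python
def count_trades(signal):
    """Number of position changes, counting the first entry into the market."""
    trades, previous = 0, False
    for held in signal:
        if held != previous:
            trades += 1
        previous = held
    return trades

def apply_costs(strategy_returns, signal, fee=0.001):
    """Subtract a flat fee from the return on every day the position changes."""
    net, previous = [], False
    for r, held in zip(strategy_returns, signal):
        cost = fee if held != previous else 0.0
        net.append(r - cost)
        previous = held
    return net

signal = [False, True, True, False, True]         # made-up crossover signal
strategy_returns = [0.0, 0.02, 0.01, 0.0, 0.03]   # made-up strategy returns

n_trades = count_trades(signal)                          # buy, sell, buy
net_returns = apply_costs(strategy_returns, signal, fee=0.001)
```

Compounding net_returns instead of the raw strategy returns gives a performance curve after costs, which will always sit at or below the original one.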
As for backtesting in general, it might also be important to note that the performance of a backtest will almost always overstate the returns you would get in real trading. Even if you account for transaction costs, slippage and other possible biases, the fact that you are testing a lot of different strategies on the same historical data is a bias in itself.
Nevertheless, backtesting is an immensely useful tool that can help you estimate the potential of your strategy without actually trading and should be a major part of every trader’s toolbox.
If you want to start backtesting your own crypto strategies - basic or advanced - go visit Sanpy, the only Python wrapper that lets you pull raw on-chain, pricing, social and development data on 900+ cryptocurrencies.
Happy data crunching!