Released a downloader that retrieves execution data using exchange APIs
Even if "the ultimate trading strategy I came up with" is completed, it's unclear whether it would work in the actual market.
However, if you actually trade, your funds will quickly run out, or you can only verify for a limited period.
Therefore, to conduct backtests using historical data for verification, you first need to collect data.
The historical data you use can come in various granularities.
To verify 30 days, you need 24 * 30 = 720 candles at 1-hour intervals, or 720 * 60 = 43,200 candles at 1-minute intervals.
A candle is essentially irreversibly compressed data, so data volume and verification accuracy trade off against each other.
So what should you use to verify trading strategies that are highly sensitive to timing?
The finest-grained data you can obtain after the fact is execution data like the following (an example from bitFlyer's /executions endpoint):
{
  "id": 39287,
  "side": "BUY",
  "price": 31690,
  "size": 27.04,
  "exec_date": "2015-07-08T02:43:34.823",
  "buy_child_order_acceptance_id": "JRF20150707-200203-452209",
  "sell_child_order_acceptance_id": "JRF20150708-024334-060234"
}
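If you want to handle these records in Rust, for example, a minimal sketch of deserializing them with serde might look like this (the serde and serde_json dependencies and the struct itself are my own illustration, not something the exchange or the library prescribes):

use serde::Deserialize;

/// One execution record in the shape shown above.
#[derive(Debug, Deserialize)]
struct Execution {
    id: u64,
    side: String, // "BUY" or "SELL"
    price: f64,
    size: f64,
    exec_date: String, // e.g. "2015-07-08T02:43:34.823"
    buy_child_order_acceptance_id: String,
    sell_child_order_acceptance_id: String,
}

/// The /executions endpoint returns a JSON array of such records.
fn parse_executions(body: &str) -> Result<Vec<Execution>, serde_json::Error> {
    serde_json::from_str(body)
}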
This kind of record of when, at what price, and how much was bought or sold should be available from pretty much any exchange that offers an API.
How much of this data accumulates over 30 days?
It varies greatly by period and exchange, but for bitFlyer it came to about 80 million executions.
Compared with the roughly 43,000 candles at 1-minute intervals, that is about 2,000 times as much.
Obviously, you can't retrieve that much in a single request.
APIs usually cap each response at 500 or 1,000 records, so you have to call the API at least 80,000 times.
Since you don't want to re-fetch everything from the API on every run, you also need a process that stores the data in local storage.
You have to advance the query parameters correctly on every call, and the calling conventions and data formats differ from exchange to exchange.
Ah, it's troublesome...
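For a sense of what the manual work involves, here is a minimal hand-rolled sketch of the fetch loop for a single exchange. It assumes the reqwest crate (with the blocking and json features) and serde_json, and uses bitFlyer's public /v1/executions endpoint with its product_code, count, and before query parameters; it is an illustration, not code from the library described below.

use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Id of the oldest execution fetched so far; bitFlyer pages backwards by id.
    let mut before: Option<i64> = None;
    loop {
        let mut url =
            "https://api.bitflyer.com/v1/executions?product_code=BTC_JPY&count=500".to_string();
        if let Some(id) = before {
            url.push_str(&format!("&before={}", id));
        }
        // Each response holds at most `count` executions, newest first.
        let page: Vec<Value> = reqwest::blocking::get(url.as_str())?.json()?;
        if page.is_empty() {
            break; // nothing older is left
        }
        for exec in &page {
            // A real downloader would write these to MySQL, a CSV file, etc. instead.
            println!("{}", exec);
        }
        // Continue from the oldest id in this page on the next request.
        before = page.last().and_then(|e| e["id"].as_i64());
        // Be polite to the rate limit.
        std::thread::sleep(std::time::Duration::from_millis(500));
    }
    Ok(())
}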
Library Description
So, I published a downloader library as a Rust crate.
It has the following features:
- Repeatedly hits the API and writes the results to MySQL, standard output, etc.
- Resuming from interruptions
- Easy customization by simply implementing and passing traits
- Built-in downloaders for bitFlyer, Liquid, and BitMEX
By implementing each of the following individually, you can change the behavior to your liking:
- Retrieval from exchange API
- Writing execution data
- Writing progress
For example, if you want to fetch data from Binance, you only have to implement the API-retrieval part and pass it in.
Likewise, you can output to MySQL or to CSV, or point the progress-writing layer at Cloud Storage when running on GCP,
adapting it to your environment with minimal code.
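As a purely illustrative sketch of this kind of layered design (the trait and function names below are hypothetical and are not the crate's actual API; see the doc for the real one):

/// Minimal execution record, just enough for the sketch.
struct Execution {
    id: u64,
}

/// Layer 1: fetch a page of executions from some exchange (bitFlyer, Binance, ...).
trait FetchExecutions {
    fn fetch(&self, cursor: Option<u64>) -> Result<Vec<Execution>, Box<dyn std::error::Error>>;
}

/// Layer 2: write executions somewhere: MySQL, a CSV file, standard output, ...
trait WriteExecutions {
    fn write(&mut self, executions: &[Execution]) -> Result<(), Box<dyn std::error::Error>>;
}

/// Layer 3: persist progress (e.g. to a local file, or to Cloud Storage on GCP)
/// so an interrupted download can resume where it left off.
trait WriteProgress {
    fn load(&self) -> Result<Option<u64>, Box<dyn std::error::Error>>;
    fn save(&mut self, cursor: u64) -> Result<(), Box<dyn std::error::Error>>;
}

/// The downloader itself just wires the three layers together.
fn run(
    fetcher: &dyn FetchExecutions,
    sink: &mut dyn WriteExecutions,
    progress: &mut dyn WriteProgress,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut cursor = progress.load()?;
    loop {
        let page = fetcher.fetch(cursor)?;
        if page.is_empty() {
            break;
        }
        sink.write(&page)?;
        let last_id = page.last().unwrap().id;
        cursor = Some(last_id);
        progress.save(last_id)?;
    }
    Ok(())
}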
Please refer to the doc for details.
About Publishing the Crate
This was my first time publishing a crate, but with documentation, CI settings, and tooling already well established, it was a fairly comfortable experience.
In particular, if you use TravisCI you can simply port a .travis.yml from existing crates.
Also, cargo-release lets you automate the version-bump work.
To catch items missing doc comments, redundant code, and the like, it also helps to enable compiler warnings and Clippy lints up front, like this:
#![warn(
    missing_debug_implementations,
    missing_docs,
    trivial_numeric_casts,
    unsafe_code,
    unused_extern_crates,
    unused_import_braces,
    unused_qualifications
)]
#![deny(clippy::pedantic)]
#![deny(warnings)]
#![allow(clippy::doc_markdown)]
#![allow(clippy::stutter)]
#![allow(clippy::cast_precision_loss)]
#![allow(clippy::cast_sign_loss)]
#![allow(clippy::cast_possible_wrap)]
Summary
To simplify the first step of backtesting, retrieving trade data, I turned the common processing into a library.
By separating it into layers, I also made it easy to extend and to handle environment-dependent processing.
If it fits your needs, please give it a try.