
Released a downloader that retrieves execution data using exchange APIs


Even if "the ultimate trading strategy I came up with" is completed, it's unclear whether it would work in the actual market.
However, if you actually trade, your funds will quickly run out, or you can only verify for a limited period.
Therefore, to conduct backtests using historical data for verification, you first need to collect data.

There are several granularities to choose from for that historical data.
Verifying 30 days takes 24 * 30 = 720 candles at 1-hour intervals, and 720 * 60 = 43,200 candles at 1-minute intervals.
A candle is essentially irreversibly compressed data, so data volume and verification accuracy are in a trade-off.
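
To see why this compression is irreversible, here is a minimal sketch in Rust (toy types of my own, not the library's) of how one period's trades get folded into a single OHLCV candle; the order and timing of individual trades within the period are discarded and cannot be recovered.

/// A single trade, reduced to the two fields a candle needs.
struct Trade {
    price: f64,
    size: f64,
}

/// A 1-hour (or 1-minute, ...) candle: all that survives aggregation.
struct Candle {
    open: f64,
    high: f64,
    low: f64,
    close: f64,
    volume: f64,
}

/// Fold one period's trades into a candle; per-trade timing is lost.
fn aggregate(trades: &[Trade]) -> Option<Candle> {
    Some(Candle {
        open: trades.first()?.price,
        close: trades.last()?.price,
        high: trades.iter().map(|t| t.price).fold(f64::MIN, f64::max),
        low: trades.iter().map(|t| t.price).fold(f64::MAX, f64::min),
        volume: trades.iter().map(|t| t.size).sum(),
    })
}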

So, what should you use to verify trading strategies that are highly timing-sensitive?
The finest-grained data you can obtain after the fact is execution data, like this (an example from bitFlyer's /executions):

{
  "id": 39287,
  "side": "BUY",
  "price": 31690,
  "size": 27.04,
  "exec_date": "2015-07-08T02:43:34.823",
  "buy_child_order_acceptance_id": "JRF20150707-200203-452209",
  "sell_child_order_acceptance_id": "JRF20150708-024334-060234"
}

This kind of "when, at what price, how much, and which side" information is provided by pretty much any exchange with an API.
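
Incidentally, a record like this maps directly onto a serde struct. A minimal sketch, assuming serde and serde_json as dependencies (the field names simply mirror the JSON above):

use serde::{Deserialize, Serialize};

/// One execution record from bitFlyer's /executions, mirroring the
/// JSON above. Parse one record with serde_json::from_str.
#[derive(Debug, Serialize, Deserialize)]
struct Execution {
    id: u64,
    side: String,
    price: f64,
    size: f64,
    exec_date: String,
    buy_child_order_acceptance_id: String,
    sell_child_order_acceptance_id: String,
}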

How much data does this amount to over 30 days?
It varies greatly by period and exchange, but for bitFlyer it was 80 million records.
Compared with the 40,000-odd 1-minute candles, that is roughly 2,000 times as much.

Now, this much data obviously cannot be retrieved in a single request.
The per-request limit is usually 500 or 1,000, so you need to hit the API at least 80,000 times.
And since you can't re-fetch everything from the API on every run, you also need a process that stores the data in local storage.

On top of that, you have to advance the query parameters correctly, and the calling conventions and data formats differ from exchange to exchange.
Ugh, what a pain...
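
For a concrete picture, here is a rough sketch of the kind of backward-pagination loop this implies, written against bitFlyer's public endpoint with the reqwest crate (blocking client, json feature enabled). The count/before query parameters follow bitFlyer's documented API; the Execution struct is the serde sketch above, and rate-limit handling is reduced to a fixed sleep.

use std::fs::OpenOptions;
use std::io::Write;

/// Page backwards through /executions, appending each batch to a
/// local file so later backtests never have to hit the API again.
fn download(mut before: u64) -> Result<(), Box<dyn std::error::Error>> {
    let file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("executions.jsonl")?;
    loop {
        let url = format!(
            "https://api.bitflyer.com/v1/executions?product_code=BTC_JPY&count=500&before={}",
            before
        );
        let batch: Vec<Execution> = reqwest::blocking::get(&url)?.json()?;
        if batch.is_empty() {
            break; // no older executions remain
        }
        for e in &batch {
            writeln!(&file, "{}", serde_json::to_string(e)?)?;
        }
        // responses are newest-first, so continue from the oldest id seen
        before = batch.iter().map(|e| e.id).min().unwrap();
        // crude politeness toward the exchange's rate limit
        std::thread::sleep(std::time::Duration::from_millis(500));
    }
    Ok(())
}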

Library Description

So, I published a downloader library as a Rust crate.

https://github.com/esplo/pikmin

https://crates.io/crates/pikmin

It has the following features:

  • Repeatedly hits the API and writes the results to MySQL, standard output, etc.
  • Resuming from interruptions
  • Easy customization by simply implementing and passing traits
  • Built-in downloaders for bitFlyer, Liquid, and BitMEX

By implementing each of the following individually, you can change the behavior to your liking:

  • Retrieval from exchange API
  • Writing execution data
  • Writing progress

For example, if you want to pull data from Binance, you only need to implement and pass the API-retrieval part.
Whether you write output to MySQL or CSV, or point the progress-writing layer at Cloud Storage when running on GCP,
you can adapt it to your environment with minimal code, roughly along the lines sketched below.
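
To illustrate the layering, here is a hypothetical sketch of the three swap points; the trait names and signatures are my own shorthand, not pikmin's actual API (see the doc for the real ones).

/// Retrieval from an exchange API (one page per call).
trait Fetcher {
    fn fetch(&self, cursor: u64) -> Result<Vec<Execution>, Box<dyn std::error::Error>>;
}

/// Writing execution data (MySQL, CSV, stdout, ...).
trait ExecutionWriter {
    fn write(&mut self, batch: &[Execution]) -> Result<(), Box<dyn std::error::Error>>;
}

/// Writing progress, so an interrupted run can resume
/// (a local file, Cloud Storage on GCP, ...).
trait ProgressWriter {
    fn save(&mut self, cursor: u64) -> Result<(), Box<dyn std::error::Error>>;
}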

Please refer to the doc for details.

About Publishing the Crate

This was my first time publishing a crate, but with documentation, CI setup, and tooling all readily available, it was fairly painless.
With TravisCI in particular, you can simply port a .travis.yml from a predecessor crate.
Using cargo-release also lets you automate the version-bump process.

https://rust-lang-nursery.github.io/api-guidelines/

To catch missing doc comments and redundant code, it's good to enable compiler warnings and Clippy lints up front, like this:

#![warn(
  missing_debug_implementations,
  missing_docs,
  trivial_numeric_casts,
  unsafe_code,
  unused_extern_crates,
  unused_import_braces,
  unused_qualifications
)]
#![deny(clippy::pedantic)]
#![deny(warnings)]
#![allow(clippy::doc_markdown)]
#![allow(clippy::stutter)]
#![allow(clippy::cast_precision_loss)]
#![allow(clippy::cast_sign_loss)]
#![allow(clippy::cast_possible_wrap)]

Summary

To simplify the first step of backtesting, collecting trading data, I turned the common processing into a library.
By separating it into layers, I also made environment-dependent processing easy to extend and swap out.

If it fits your needs, please give it a try.
