
A Downloader that Retrieves Trade Data Using Exchange APIs


Even if I complete my "ultimate trading strategy," it's unclear whether it will work in actual markets.
However, if I test it with real trades, my funds could quickly run out, and I can only verify it over a limited period.
Therefore, I need to backtest against historical data, and to do that, I first need to collect the data.

The historical data I use comes in various granularities.
To verify 30 days, I need 720 records of 1-hour data, or 43,200 records of 1-minute data.
Since coarser-grained data is a lossy (non-reversible) compression of the raw trades, data volume and verification accuracy are in a trade-off relationship.

So, what should I use to verify extremely timing-sensitive trading strategies?
The finest-grained data that can be obtained after the fact is trade data (e.g., bitFlyer's /executions).

{
  "id": 39287,
  "side": "BUY",
  "price": 31690,
  "size": 27.04,
  "exec_date": "2015-07-08T02:43:34.823",
  "buy_child_order_acceptance_id": "JRF20150707-200203-452209",
  "sell_child_order_acceptance_id": "JRF20150708-024334-060234"
}

This "when, where, how much, and bought/sold" information is provided by exchanges with APIs.

How much data accumulates in 30 days?
It varies greatly by period and exchange, but on bitFlyer it's about 80 million records.
Since 30 days of 1-minute data is around 40,000 records, that's roughly 2,000 times more.

Of course, I can't obtain all of it with a single request.
The per-request limit is around 500 or 1,000 records, so I need to hit the API at least 80,000 times.
Since I can't fetch everything in one go, I also need to store the data locally.

I need to keep updating the query parameters correctly, and both the calling conventions and the data formats differ between exchanges.
Ugh, what a hassle….
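
To see what that hassle looks like, here is a minimal sketch of walking backwards through bitFlyer's /executions using its count and before query parameters, with the reqwest crate doing the HTTP work. This is my own illustration of the general pattern, not code from the library, and the sleep interval is a placeholder rather than the exchange's documented rate limit.

use serde::Deserialize;
use std::{thread, time::Duration};

#[derive(Debug, Deserialize)]
struct Execution {
    id: u64,
    // remaining fields omitted; serde ignores unknown JSON fields by default
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    // No `before` on the first request: start from the newest trades.
    let mut before: Option<u64> = None;

    loop {
        let mut url =
            "https://api.bitflyer.com/v1/executions?product_code=BTC_JPY&count=500".to_string();
        if let Some(id) = before {
            url.push_str(&format!("&before={}", id));
        }

        let page: Vec<Execution> = client.get(url.as_str()).send()?.json()?;
        let last = match page.last() {
            Some(e) => e.id,
            None => break, // reached the oldest available trade
        };

        // ... write `page` to local storage and persist `last` so an
        // interrupted run can resume from here ...

        before = Some(last);
        thread::sleep(Duration::from_millis(500)); // stay under the rate limit
    }
    Ok(())
}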

About the Library

So, I released a downloader library as a Rust crate.

https://github.com/esplo/pikmin

https://crates.io/crates/pikmin

The library is equipped with the following features:

  • Repeatedly hitting the API and writing to MySQL or standard output
  • Resuming from interruptions
  • Simple customization using traits
  • Built-in downloaders for bitFlyer, Liquid, and BitMEX

You can change the behavior to your liking by implementing each of the following separately:

  • Obtaining data from exchange APIs
  • Writing trade data
  • Writing progress

For example, if you want to obtain data from Binance, you only need to implement the API-access part.
You can likewise switch the output destination to MySQL or CSV, or write progress to Cloud Storage if you're running on GCP.
Environment-dependent processing is kept to a minimum.
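
To make this layering concrete, here is a rough sketch of how three such traits can be combined by a generic driver. The trait and type names are hypothetical, invented for this illustration rather than taken from pikmin's API; see the docs below for the real interfaces.

// Hypothetical trait names for illustration; pikmin's actual API differs.

/// Fetches one page of trades starting from a cursor position.
trait Fetcher {
    type Trade;
    fn fetch(&self, cursor: u64) -> Result<Vec<Self::Trade>, String>;
}

/// Persists trades (MySQL, CSV, standard output, ...).
trait TradeWriter {
    type Trade;
    fn write(&mut self, trades: &[Self::Trade]) -> Result<(), String>;
}

/// Persists download progress so an interrupted run can resume.
trait ProgressWriter {
    fn save(&mut self, cursor: u64) -> Result<(), String>;
    fn load(&self) -> Result<Option<u64>, String>;
}

/// The driver depends only on the traits, so swapping an exchange, an
/// output destination, or a progress store means implementing one trait.
fn run<T, F, W, P>(fetcher: &F, writer: &mut W, progress: &mut P) -> Result<(), String>
where
    F: Fetcher<Trade = T>,
    W: TradeWriter<Trade = T>,
    P: ProgressWriter,
{
    let mut cursor = progress.load()?.unwrap_or(0);
    loop {
        let trades = fetcher.fetch(cursor)?;
        if trades.is_empty() {
            return Ok(()); // caught up with the latest data
        }
        writer.write(&trades)?;
        cursor += trades.len() as u64; // simplistic cursor for the sketch
        progress.save(cursor)?;
    }
}

// Minimal stub implementations so the sketch runs end to end.
struct DummyFetcher;
impl Fetcher for DummyFetcher {
    type Trade = String;
    fn fetch(&self, cursor: u64) -> Result<Vec<String>, String> {
        if cursor >= 4 {
            Ok(vec![])
        } else {
            Ok(vec![format!("trade {}", cursor), format!("trade {}", cursor + 1)])
        }
    }
}

struct StdoutWriter;
impl TradeWriter for StdoutWriter {
    type Trade = String;
    fn write(&mut self, trades: &[String]) -> Result<(), String> {
        for t in trades {
            println!("{}", t);
        }
        Ok(())
    }
}

struct InMemoryProgress(Option<u64>);
impl ProgressWriter for InMemoryProgress {
    fn save(&mut self, cursor: u64) -> Result<(), String> {
        self.0 = Some(cursor);
        Ok(())
    }
    fn load(&self) -> Result<Option<u64>, String> {
        Ok(self.0)
    }
}

fn main() {
    run(&DummyFetcher, &mut StdoutWriter, &mut InMemoryProgress(None)).unwrap();
}

Because the driver only sees the traits, each of the three responsibilities can be swapped independently, which is the point of the layering.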

Please refer to the documentation for details.

About Crate Publication

This was my first published crate, and setting up the documentation and CI was relatively painless.
In particular, Travis CI is easy to set up: just transplant a .travis.yml from an existing crate.
Using cargo-release also automates the release and version-bump process.

https://rust-lang-nursery.github.io/api-guidelines/

It's also a good idea to turn on warnings and Clippy lints up front, to catch undocumented items and redundant code.

#![warn(
    missing_debug_implementations,
    missing_docs,
    trivial_numeric_casts,
    unsafe_code,
    unused_extern_crates,
    unused_import_braces,
    unused_qualifications
)]
#![deny(clippy::pedantic)]
#![deny(warnings)]
#![allow(clippy::doc_markdown)]
#![allow(clippy::stutter)]
#![allow(clippy::cast_precision_loss)]
#![allow(clippy::cast_sign_loss)]
#![allow(clippy::cast_possible_wrap)]

Summary

To simplify the first step of backtesting, obtaining trade data, I turned the common processing into a library.
By layering the processing, I made it easy to extend and kept environment-dependent code isolated.

Please feel free to use it if you need it.
