A Complete Guide to Downloading Binance Historical Crypto Data

The binance-historical-data Python package offers a streamlined solution for acquiring comprehensive historical cryptocurrency market data from Binance. It enables developers, traders, and researchers to build a local, ready-to-use dataset of prices and volumes with minimal code.

You do not need a Binance account to download the data. The package handles the whole process: it fetches the archives from Binance's servers, saves them locally, and unzips them, leaving you with a local copy of the data for offline analysis.
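To make the "fetch, dump, unzip" flow more concrete, here is a minimal sketch of the unzip step using only the standard library. It is not the package's own implementation: the function names and the demo file name are hypothetical, and the archive is built in memory with placeholder values so the example runs without network access.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Placeholder row with made-up values, not real market data.
DEMO_ROW = "0,1.0,2.0,0.5,1.5,10.0\n"


def extract_archive(zip_bytes: bytes, dest_dir: Path) -> list[Path]:
    """Unpack every member of an in-memory zip archive into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)
    return [dest_dir / name for name in names]


def make_demo_archive() -> bytes:
    """Build a tiny zip in memory that stands in for a downloaded file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("BTCUSDT-1m-2022-01-01.csv", DEMO_ROW)
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in extract_archive(make_demo_archive(), Path(tmp)):
            print(path.name)
```

In a real download the bytes would come from an HTTP response rather than from make_demo_archive().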

A key feature is its simplicity. Downloading a full historical dataset or updating an existing one can be accomplished with just three lines of Python code. However, users should note that there is a slight delay in data availability. The previous day's data typically appears on Binance's servers a few minutes after midnight UTC.

Installation and Setup

Getting started is straightforward. The package can be installed using pip, Python's standard package manager.

pip install binance_historical_data

After installation, you can import the library and initialize the main data dumper object, which is the core of the package's functionality.

Initializing the Data Dumper

The BinanceDataDumper class is your primary interface. You configure it by specifying where to save the data and what type of information you want to collect.

from binance_historical_data import BinanceDataDumper

data_dumper = BinanceDataDumper(
    path_dir_where_to_dump=".",
    asset_class="spot",
    data_type="klines",
    data_frequency="1m"
)

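Here’s a breakdown of the initialization arguments:

path_dir_where_to_dump: the folder where downloaded files are saved ("." is the current working directory).

asset_class: the market to pull from: "spot", "um" (USDⓈ-Margined Futures), or "cm" (Coin-Margined Futures).

data_type: the kind of data to download; "klines" selects candlestick data.

data_frequency: the candle interval; "1m" means one-minute candles.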
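One way to see how these arguments fit together is to look at how a daily klines file is addressed on Binance's public data archive. The sketch below reflects my understanding of that URL layout and should be read as an assumption, not as the package's internals; build_daily_klines_url is a hypothetical helper and it performs no network request.

```python
import datetime

# Assumed root of Binance's public data archive.
BASE_URL = "https://data.binance.vision/data"


def build_daily_klines_url(
    asset_class: str,
    ticker: str,
    data_frequency: str,
    day: datetime.date,
) -> str:
    """Compose the assumed archive URL for one ticker, interval, and day."""
    if asset_class == "spot":
        market_path = "spot"
    elif asset_class in ("um", "cm"):
        # Futures markets are assumed to live under a "futures" prefix.
        market_path = f"futures/{asset_class}"
    else:
        raise ValueError(f"unknown asset_class: {asset_class!r}")
    file_name = f"{ticker}-{data_frequency}-{day.isoformat()}.zip"
    return (
        f"{BASE_URL}/{market_path}/daily/klines/"
        f"{ticker}/{data_frequency}/{file_name}"
    )


if __name__ == "__main__":
    print(build_daily_klines_url("spot", "BTCUSDT", "1m", datetime.date(2022, 1, 1)))
```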

Core Function: Dumping Data

The main method for retrieving data is dump_data(). It is highly configurable, allowing you to specify exactly which assets and date ranges you're interested in.

data_dumper.dump_data(
    tickers=None,
    date_start=None,
    date_end=None,
    is_to_update_existing=False,
    tickers_to_exclude=["UST"]
)

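Key Parameters for Data Download

tickers: the trading pairs to download. With None, all USDT pairs are downloaded.

date_start and date_end: the date range to download. With None, the full available history is used.

is_to_update_existing: set to True to re-download data that already exists locally.

tickers_to_exclude: tickers to skip during the download.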

Understanding the Data Output

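When you download klines (candlestick) data, the resulting CSV files contain a standardized set of columns describing market activity for each time interval. Per Binance's public data documentation, the columns are, in order: open time, open, high, low, close, volume, close time, quote asset volume, number of trades, taker buy base asset volume, taker buy quote asset volume, and an unused "ignore" field.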
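A minimal sketch of reading such a file with the standard library follows. The snake_case column labels are my own naming for the fields in Binance's public data documentation, read_klines is a hypothetical helper, and the sample row holds made-up values. The sketch assumes timestamps are in milliseconds; some newer files may use a finer resolution, so check your data before relying on the conversion.

```python
import csv
import datetime
import io

# Assumed column order for a Binance klines CSV file.
KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_asset_volume", "taker_buy_quote_asset_volume", "ignore",
]

# Made-up sample row; open_time is 2022-01-01 00:00:00 UTC in milliseconds.
SAMPLE = "1640995200000,100.0,110.0,90.0,105.0,12.5,1640995259999,1300.0,42,6.0,630.0,0\n"


def read_klines(csv_text: str) -> list[dict]:
    """Parse klines CSV text into dicts with typed price and time fields."""
    rows = []
    for raw in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and an optional header row.
        if not raw or not raw[0].isdigit():
            continue
        record = dict(zip(KLINE_COLUMNS, raw))
        record["open_time"] = datetime.datetime.fromtimestamp(
            int(raw[0]) / 1000, tz=datetime.timezone.utc
        )
        for name in ("open", "high", "low", "close", "volume"):
            record[name] = float(record[name])
        rows.append(record)
    return rows


if __name__ == "__main__":
    for row in read_klines(SAMPLE):
        print(row["open_time"].isoformat(), row["close"])
```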

Practical Usage Examples

The examples below cover the most common workflows.

Downloading All Historical Data

To download the entire available history for all USDT trading pairs in the spot market, you simply call the dump_data() method without any arguments. Be aware that this initial full download is the most time-consuming operation and may take up to 40 minutes depending on your internet connection and the number of pairs.

data_dumper.dump_data()
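If the full download is more than you need, the tickers, date_start, and date_end parameters from the dump_data() signature can narrow it. The helper below is a hypothetical convenience, not part of the package: it only builds the keyword arguments for a recent window ending yesterday. The ticker strings in the usage note are an assumed format.

```python
import datetime
from typing import Optional


def recent_window_kwargs(
    tickers: list[str],
    days_back: int,
    today: Optional[datetime.date] = None,
) -> dict:
    """Build dump_data() keyword arguments for a short, recent date range."""
    today = today or datetime.date.today()
    # End at yesterday, since the current day's file is not published yet.
    date_end = today - datetime.timedelta(days=1)
    date_start = date_end - datetime.timedelta(days=days_back)
    return {
        "tickers": list(tickers),
        "date_start": date_start,
        "date_end": date_end,
    }


if __name__ == "__main__":
    print(recent_window_kwargs(["BTCUSDT"], days_back=10))
```

The result can be passed straight through, for example data_dumper.dump_data(**recent_window_kwargs(["BTCUSDT", "ETHUSDT"], days_back=30)).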

Updating Your Existing Data

Keeping your local dataset synchronized with the latest market data is incredibly simple. The same dump_data() method is used. The dumper intelligently checks your local files and only appends new data that has been generated since your last download.

data_dumper.dump_data()
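The idea behind an incremental update can be sketched as a comparison between the days you want and the days already on disk. This is an illustration of the concept only, not the package's actual logic. The file naming pattern (TICKER-FREQUENCY-YYYY-MM-DD.csv) is an assumption, and both function names are hypothetical.

```python
import datetime
import re

# Assumed daily file name pattern, e.g. "BTCUSDT-1m-2022-01-01.csv".
DAILY_NAME = re.compile(
    r"^(?P<ticker>[A-Z0-9]+)-(?P<freq>\w+)-(?P<day>\d{4}-\d{2}-\d{2})\.csv$"
)


def days_already_downloaded(file_names: list[str], ticker: str) -> set:
    """Collect the dates covered by existing daily files for one ticker."""
    days = set()
    for name in file_names:
        match = DAILY_NAME.match(name)
        if match and match["ticker"] == ticker:
            days.add(datetime.date.fromisoformat(match["day"]))
    return days


def missing_days(
    file_names: list[str],
    ticker: str,
    date_start: datetime.date,
    date_end: datetime.date,
) -> list:
    """Return the days in [date_start, date_end] that have no local file."""
    have = days_already_downloaded(file_names, ticker)
    total = (date_end - date_start).days + 1
    wanted = (date_start + datetime.timedelta(days=i) for i in range(total))
    return [day for day in wanted if day not in have]


if __name__ == "__main__":
    local = ["BTCUSDT-1m-2022-01-01.csv", "BTCUSDT-1m-2022-01-03.csv"]
    print(missing_days(local, "BTCUSDT", datetime.date(2022, 1, 1), datetime.date(2022, 1, 4)))
```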

Reloading a Specific Time Period

If you need to refresh or re-download data for a specific historical period, you can specify the date range and set the is_to_update_existing flag to True.

import datetime

data_dumper.dump_data(
    date_start=datetime.date(year=2021, month=1, day=1),
    date_end=datetime.date(year=2022, month=1, day=1),
    is_to_update_existing=True
)

Additional Useful Methods

Beyond dumping data, the package provides several utility methods to help you manage and inspect your data collection.

Cleaning Up Daily Files

To save disk space, you can automatically delete daily data files for which consolidated monthly data has already been downloaded. This keeps local storage compact.

data_dumper.delete_outdated_daily_results()
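To illustrate what "outdated" means here, the sketch below picks out daily files whose month is already covered by a monthly file. It is a conceptual example, not the package's implementation: the file naming patterns are assumptions, redundant_daily_files is a hypothetical helper, and it only returns names without deleting anything.

```python
import re

# Assumed naming: daily files end in YYYY-MM-DD.csv, monthly files in YYYY-MM.csv.
DAILY = re.compile(r"^(?P<prefix>.+)-(?P<month>\d{4}-\d{2})-\d{2}\.csv$")
MONTHLY = re.compile(r"^(?P<prefix>.+)-(?P<month>\d{4}-\d{2})\.csv$")


def redundant_daily_files(daily_names: list[str], monthly_names: list[str]) -> list[str]:
    """Return daily file names whose ticker, interval, and month have a monthly file."""
    covered = set()
    for name in monthly_names:
        match = MONTHLY.match(name)
        if match:
            covered.add((match["prefix"], match["month"]))
    redundant = []
    for name in daily_names:
        match = DAILY.match(name)
        if match and (match["prefix"], match["month"]) in covered:
            redundant.append(name)
    return redundant


if __name__ == "__main__":
    daily = ["BTCUSDT-1m-2022-01-31.csv", "BTCUSDT-1m-2022-02-01.csv"]
    monthly = ["BTCUSDT-1m-2022-01.csv"]
    print(redundant_daily_files(daily, monthly))
```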

Frequently Asked Questions

Do I need a Binance account or API keys to use this package?
No, that's a major advantage of this tool. It pulls publicly available historical data from Binance's servers without requiring user authentication, API keys, or even an account.

What is the difference between 'spot', 'um', and 'cm' asset classes?
'Spot' refers to the regular cryptocurrency market for immediate settlement. 'um' (USDⓈ-Margined Futures) and 'cm' (Coin-Margined Futures) are derivatives markets where traders use leverage; 'um' contracts are settled in a stablecoin such as USDT, while 'cm' contracts are settled in the underlying coin itself.

Why would I exclude certain tickers like UST?
Some assets may become delisted or experience extreme volatility and breakdowns (like TerraUSD UST). Excluding them prevents errors during the download process and ensures you only collect data for active, stable trading pairs.

How do I handle the data delay after midnight UTC?
If you run your data update script immediately after midnight UTC, you might not get the previous day's complete data. It's best to schedule your daily update scripts to run at least 15-30 minutes after 00:00 UTC to ensure all data is available.
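A script can apply that buffer itself by working out which day is safe to request. The sketch below is a hypothetical helper, not part of the package; the 30-minute default mirrors the upper end of the suggestion above and is an assumption, not a guarantee from Binance.

```python
import datetime


def latest_day_to_request(
    now_utc: datetime.datetime,
    buffer_minutes: int = 30,
) -> datetime.date:
    """Return the most recent day whose file is expected to be published.

    now_utc must be a timezone-aware datetime in UTC.
    """
    publish_cutoff = datetime.datetime.combine(
        now_utc.date(), datetime.time(0, 0), tzinfo=datetime.timezone.utc
    ) + datetime.timedelta(minutes=buffer_minutes)
    # Before the cutoff, yesterday's file may be missing, so step back two days.
    days_back = 1 if now_utc >= publish_cutoff else 2
    return now_utc.date() - datetime.timedelta(days=days_back)


if __name__ == "__main__":
    now = datetime.datetime.now(datetime.timezone.utc)
    print(latest_day_to_request(now))
```

The returned date can be used as date_end when calling dump_data().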

What can I use this historical data for?
This data is essential for backtesting trading algorithms, conducting quantitative research, performing technical analysis, building machine learning models for price prediction, and creating market visualization dashboards.

The first download is taking a very long time. Is this normal?
Yes, downloading the entire historical dataset for all USDT pairs is a significant task involving gigabytes of data. The initial download is the longest; subsequent updates for new data are much faster.