The binance-historical-data Python package offers a streamlined solution for acquiring comprehensive historical cryptocurrency market data from Binance. It enables developers, traders, and researchers to build a local, ready-to-use dataset of prices and volumes with minimal code.
This tool is designed for efficiency. You don't even need a Binance account to begin downloading vast amounts of historical data. The package handles the entire process: it fetches the data from Binance servers, dumps it locally, and unzips it, ensuring you have a perfect mirror of the data for offline analysis.
A key feature is its simplicity. Downloading a full historical dataset or updating an existing one can be accomplished with just three lines of Python code. However, users should note that there is a slight delay in data availability. The previous day's data typically appears on Binance's servers a few minutes after midnight UTC.
Installation and Setup
Getting started is straightforward. The package can be installed using pip, Python's standard package manager.
pip install binance_historical_data

After installation, you can import the library and initialize the main data dumper object, which is the core of the package's functionality.
Initializing the Data Dumper
The BinanceDataDumper class is your primary interface. You configure it by specifying where to save the data and what type of information you want to collect.
from binance_historical_data import BinanceDataDumper
data_dumper = BinanceDataDumper(
path_dir_where_to_dump=".",
asset_class="spot",
data_type="klines",
data_frequency="1m"
)

Here’s a breakdown of the initialization arguments:
- path_dir_where_to_dump: (String) The path to the local directory where the downloaded data will be saved.
- asset_class: (String) The market from which to pull data. The options are:
  - spot: the spot market.
  - um: USDⓈ-Margined futures.
  - cm: Coin-Margined futures.
- data_type: (String) The type of market data to download. For the spot market, you can choose from aggTrades, klines, or trades. Futures markets support these plus additional types such as indexPriceKlines and markPriceKlines.
- data_frequency: (String) The timeframe of the candlestick data. This can be any of the following intervals: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, or 12h.
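To make the relationship between these options concrete, here is a small illustrative sketch of a pre-flight check you might add to your own wrapper script before constructing a dumper. The `validate_dumper_config` helper and its lookup tables are hypothetical (not part of the package); the allowed values are taken from the lists above.

```python
# Illustrative helper (not part of binance_historical_data): checks that an
# (asset_class, data_type, data_frequency) combination matches the options
# documented above before you construct a BinanceDataDumper.

VALID_DATA_TYPES = {
    "spot": {"aggTrades", "klines", "trades"},
    "um": {"aggTrades", "klines", "trades", "indexPriceKlines", "markPriceKlines"},
    "cm": {"aggTrades", "klines", "trades", "indexPriceKlines", "markPriceKlines"},
}
VALID_FREQUENCIES = {"1m", "3m", "5m", "15m", "30m",
                     "1h", "2h", "4h", "6h", "8h", "12h"}

def validate_dumper_config(asset_class: str, data_type: str,
                           data_frequency: str) -> bool:
    """Return True if the combination is one the package documents as supported."""
    return (
        asset_class in VALID_DATA_TYPES
        and data_type in VALID_DATA_TYPES[asset_class]
        and data_frequency in VALID_FREQUENCIES
    )

print(validate_dumper_config("spot", "klines", "1m"))           # True
print(validate_dumper_config("spot", "markPriceKlines", "1m"))  # False: futures-only type
```

Failing fast like this is cheaper than discovering a typo in `data_type` after a long download has already started.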
Core Function: Dumping Data
The main method for retrieving data is dump_data(). It is highly configurable, allowing you to specify exactly which assets and date ranges you're interested in.
data_dumper.dump_data(
tickers=None,
date_start=None,
date_end=None,
is_to_update_existing=False,
tickers_to_exclude=["UST"]
)

Key Parameters for Data Download
- tickers: (List) Specific trading pairs to download (e.g., ['BTCUSDT', 'ETHUSDT']). If set to None, the package will automatically download data for all available USDT-quoted trading pairs.
- date_start & date_end: (datetime.date objects) Define the start and end dates for the data range. If None is used for the start date, the download will begin from the earliest available data (around 2017-01-01). If None is used for the end date, it will default to the current day.
- is_to_update_existing: (Boolean) A crucial flag for managing your local dataset. When set to True, the dumper will check existing files and only download missing data for the specified period, making incremental updates efficient.
- tickers_to_exclude: (List) Ticker symbols you wish to skip during the download process.
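The date defaults described above can be sketched with a few lines of standard-library Python. The `resolve_date_range` helper is illustrative only (not a package function); it simply mirrors the documented behavior, using 2017-01-01 as the approximate earliest-data floor mentioned above.

```python
import datetime

# Illustrative sketch of the documented dump_data() date defaults:
# date_start=None -> earliest available data (~2017-01-01),
# date_end=None   -> the current day.
EARLIEST_DATA = datetime.date(2017, 1, 1)

def resolve_date_range(date_start=None, date_end=None):
    start = date_start or EARLIEST_DATA
    end = date_end or datetime.date.today()
    if start > end:
        raise ValueError("date_start must not be after date_end")
    return start, end

start, end = resolve_date_range(date_start=datetime.date(2021, 1, 1))
print(start)  # 2021-01-01
```

Knowing these defaults helps you predict how much data a call will fetch: an all-`None` call spans the full history, which is why the first run is the slowest.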
Understanding the Data Output
When you download klines (candlestick) data, the resulting CSV files contain a standardized set of columns that provide a detailed view of market activity for each time interval:
- Open time (Timestamp)
- Open price
- High price
- Low price
- Close price
- Volume
- Close time (Timestamp)
- Quote asset volume
- Number of trades
- Taker buy base asset volume
- Taker buy quote asset volume
- Ignore
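Because the downloaded CSV files have no header row, it is handy to map each row onto these column names yourself. The following stdlib-only sketch does exactly that; the sample row values are made up for illustration, and the `KLINE_COLUMNS` names are informal labels derived from the list above, not identifiers defined by the package.

```python
import csv
import io

# Column order of a Binance klines CSV row, as listed above (informal names).
KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_volume", "taker_buy_quote_volume", "ignore",
]

# A single made-up 1m candle, shaped like a row from a downloaded file.
sample = ("1609459200000,28923.63,28961.66,28913.12,28961.66,27.45,"
          "1609459259999,794323.44,952,13.90,402277.19,0\n")

row = next(csv.reader(io.StringIO(sample)))
candle = dict(zip(KLINE_COLUMNS, row))
print(candle["close"], candle["number_of_trades"])  # 28961.66 952
```

In practice you would pass the same column list to your CSV loader of choice so that every downloaded file is parsed with consistent field names.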
Practical Usage Examples
The true power of this package is revealed through simple, practical code examples.
Downloading All Historical Data
To download the entire available history for all USDT trading pairs in the spot market, you simply call the dump_data() method without any arguments. Be aware that this initial full download is the most time-consuming operation and may take up to 40 minutes depending on your internet connection and the number of pairs.
data_dumper.dump_data()

Updating Your Existing Data
Keeping your local dataset synchronized with the latest market data is incredibly simple. The same dump_data() method is used. The dumper intelligently checks your local files and only appends new data that has been generated since your last download.
data_dumper.dump_data()

Reloading a Specific Time Period
If you need to refresh or re-download data for a specific historical period, you can specify the date range and set the is_to_update_existing flag to True.
import datetime
data_dumper.dump_data(
date_start=datetime.date(year=2021, month=1, day=1),
date_end=datetime.date(year=2022, month=1, day=1),
is_to_update_existing=True
)

Additional Useful Methods
Beyond dumping data, the package provides several utility methods to help you manage and inspect your data collection.
- get_list_all_trading_pairs(): Fetches a list of all available trading pairs from Binance for the configured asset class.
- get_min_start_date_for_ticker(): Returns the earliest date for which data is available for a specific ticker.
- get_all_tickers_with_data(timeperiod_per_file="daily"): Lists all tickers for which you have already downloaded data locally.
- get_all_dates_with_data_for_ticker(ticker, timeperiod_per_file="monthly"): Returns all dates for which local data exists for a given ticker.
- get_local_dir_to_data(ticker, timeperiod_per_file): Provides the full local directory path where data for a specific ticker is stored.
- create_filename(ticker, date_obj, timeperiod_per_file="monthly"): Generates the standardized filename used for a data file, which can be useful for custom data loading scripts.
Cleaning Up Daily Files
To save disk space, you can automatically delete daily data files for which consolidated monthly data has already been downloaded. This helps maintain an efficient local storage system.
data_dumper.delete_outdated_daily_results()
Frequently Asked Questions
Do I need a Binance account or API keys to use this package?
No, that's a major advantage of this tool. It pulls publicly available historical data from Binance's servers without requiring user authentication, API keys, or even an account.
What is the difference between 'spot', 'um', and 'cm' asset classes?
'Spot' refers to the regular cryptocurrency market for immediate settlement. 'um' (USDⓈ-Margined Futures) and 'cm' (Coin-Margined Futures) are derivatives markets where traders use leverage; 'um' contracts are settled in USDT or BUSD, while 'cm' contracts are settled in the base coin itself.
Why would I exclude certain tickers like UST?
Some assets may become delisted or experience extreme volatility and breakdowns (like TerraUSD UST). Excluding them prevents errors during the download process and ensures you only collect data for active, stable trading pairs.
How do I handle the data delay after midnight UTC?
If you run your data update script immediately after midnight UTC, you might not get the previous day's complete data. It's best to schedule your daily update scripts to run at least 15-30 minutes after 00:00 UTC to ensure all data is available.
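A scheduled job can apply this rule with a simple guard before calling the dumper. The `previous_day_likely_available` helper below is an illustrative sketch (not part of the package), using the 30-minute margin suggested above.

```python
import datetime

# Illustrative guard (not part of binance_historical_data): skip an update run
# that starts too close to midnight UTC, when the previous day's files may not
# yet be published on Binance's servers.
SAFE_DELAY = datetime.timedelta(minutes=30)

def previous_day_likely_available(now_utc: datetime.datetime) -> bool:
    """True once at least SAFE_DELAY has elapsed since midnight UTC."""
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return now_utc - midnight >= SAFE_DELAY

print(previous_day_likely_available(datetime.datetime(2024, 1, 2, 0, 10)))  # False
print(previous_day_likely_available(datetime.datetime(2024, 1, 2, 0, 45)))  # True
```

If the guard returns False, the job can simply sleep or exit and let the next scheduled run pick up the data.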
What can I use this historical data for?
This data is essential for backtesting trading algorithms, conducting quantitative research, performing technical analysis, building machine learning models for price prediction, and creating market visualization dashboards.
The first download is taking a very long time. Is this normal?
Yes, downloading the entire historical dataset for all USDT pairs is a significant task involving gigabytes of data. The initial download is the longest; subsequent updates for new data are much faster.