The binance-historical-data Python package offers a streamlined solution for acquiring comprehensive historical cryptocurrency market data from Binance. It enables developers, traders, and researchers to build a local, ready-to-use dataset of prices and volumes with minimal code.
This tool is designed for efficiency. You don't even need a Binance account to begin downloading vast amounts of historical data. The package handles the entire process: it fetches the data from Binance servers, dumps it locally, and unzips it, ensuring you have a perfect mirror of the data for offline analysis.
A key feature is its simplicity. Downloading a full historical dataset or updating an existing one can be accomplished with just three lines of Python code. However, users should note that there is a slight delay in data availability. The previous day's data typically appears on Binance's servers a few minutes after midnight UTC.
Installation and Setup
Getting started is straightforward. The package can be installed using pip, Python's standard package manager.
pip install binance_historical_data

After installation, you can import the library and initialize the main data dumper object, which is the core of the package's functionality.
Initializing the Data Dumper
The BinanceDataDumper class is your primary interface. You configure it by specifying where to save the data and what type of information you want to collect.
from binance_historical_data import BinanceDataDumper
data_dumper = BinanceDataDumper(
path_dir_where_to_dump=".",
asset_class="spot",
data_type="klines",
data_frequency="1m"
)

Here’s a breakdown of the initialization arguments:
- path_dir_where_to_dump: (String) The path to the local directory where the downloaded data will be saved.
- asset_class: (String) The market from which to pull data. The options are:
  - spot: the spot market.
  - um: USDⓈ-Margined futures.
  - cm: Coin-Margined futures.
- data_type: (String) The type of market data to download. For the spot market, you can choose from aggTrades, klines, or trades. Futures markets support these plus additional types such as indexPriceKlines and markPriceKlines.
- data_frequency: (String) The timeframe of the candlestick data. This can be any of the following intervals: 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, or 12h.
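To make the relationship between these options concrete, here is a small illustrative sketch of a pre-flight check you might add to your own wrapper script before constructing a dumper. The `validate_dumper_config` helper and its lookup tables are hypothetical (not part of the package); the allowed values are taken from the lists above.

```python
# Illustrative helper (not part of binance_historical_data): checks that an
# (asset_class, data_type, data_frequency) combination matches the options
# documented above before you construct a BinanceDataDumper.

VALID_DATA_TYPES = {
    "spot": {"aggTrades", "klines", "trades"},
    "um": {"aggTrades", "klines", "trades", "indexPriceKlines", "markPriceKlines"},
    "cm": {"aggTrades", "klines", "trades", "indexPriceKlines", "markPriceKlines"},
}
VALID_FREQUENCIES = {"1m", "3m", "5m", "15m", "30m",
                     "1h", "2h", "4h", "6h", "8h", "12h"}

def validate_dumper_config(asset_class: str, data_type: str,
                           data_frequency: str) -> bool:
    """Return True if the combination is one the package documents as supported."""
    return (
        asset_class in VALID_DATA_TYPES
        and data_type in VALID_DATA_TYPES[asset_class]
        and data_frequency in VALID_FREQUENCIES
    )

print(validate_dumper_config("spot", "klines", "1m"))           # True
print(validate_dumper_config("spot", "markPriceKlines", "1m"))  # False: futures-only type
```

Failing fast like this is cheaper than discovering a typo in `data_type` after a long download has already started.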
Core Function: Dumping Data
The main method for retrieving data is dump_data(). It is highly configurable, allowing you to specify exactly which assets and date ranges you're interested in.
data_dumper.dump_data(
tickers=None,
date_start=None,
date_end=None,
is_to_update_existing=False,
tickers_to_exclude=["UST"]
)

Key Parameters for Data Download
- tickers: (List) Specific trading pairs to download (e.g., ['BTCUSDT', 'ETHUSDT']). If set to None, the package will automatically download data for all available USDT-quoted trading pairs.
- date_start & date_end: (datetime.date objects) Define the start and end dates for the data range. If None is used for the start date, the download will begin from the earliest available data (around 2017-01-01). If None is used for the end date, it will default to the current day.
- is_to_update_existing: (Boolean) A crucial flag for managing your local dataset. When set to True, the dumper will check existing files and only download missing data for the specified period, making incremental updates efficient.
- tickers_to_exclude: (List) Ticker symbols you wish to skip during the download process.
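The date defaults described above can be sketched with a few lines of standard-library Python. The `resolve_date_range` helper is illustrative only (not a package function); it simply mirrors the documented behavior, using 2017-01-01 as the approximate earliest-data floor mentioned above.

```python
import datetime

# Illustrative sketch of the documented dump_data() date defaults:
# date_start=None -> earliest available data (~2017-01-01),
# date_end=None   -> the current day.
EARLIEST_DATA = datetime.date(2017, 1, 1)

def resolve_date_range(date_start=None, date_end=None):
    start = date_start or EARLIEST_DATA
    end = date_end or datetime.date.today()
    if start > end:
        raise ValueError("date_start must not be after date_end")
    return start, end

start, end = resolve_date_range(date_start=datetime.date(2021, 1, 1))
print(start)  # 2021-01-01
```

Knowing these defaults helps you predict how much data a call will fetch: an all-`None` call spans the full history, which is why the first run is the slowest.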
Understanding the Data Output
When you download klines (candlestick) data, the resulting CSV files contain a standardized set of columns that provide a detailed view of market activity for each time interval:
- Open time (Timestamp)
- Open price
- High price
- Low price
- Close price
- Volume
- Close time (Timestamp)
- Quote asset volume
- Number of trades
- Taker buy base asset volume
- Taker buy quote asset volume
- Ignore
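Because the downloaded CSV files have no header row, it is handy to map each row onto these column names yourself. The following stdlib-only sketch does exactly that; the sample row values are made up for illustration, and the `KLINE_COLUMNS` names are informal labels derived from the list above, not identifiers defined by the package.

```python
import csv
import io

# Column order of a Binance klines CSV row, as listed above (informal names).
KLINE_COLUMNS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_volume", "taker_buy_quote_volume", "ignore",
]

# A single made-up 1m candle, shaped like a row from a downloaded file.
sample = ("1609459200000,28923.63,28961.66,28913.12,28961.66,27.45,"
          "1609459259999,794323.44,952,13.90,402277.19,0\n")

row = next(csv.reader(io.StringIO(sample)))
candle = dict(zip(KLINE_COLUMNS, row))
print(candle["close"], candle["number_of_trades"])  # 28961.66 952
```

In practice you would pass the same column list to your CSV loader of choice so that every downloaded file is parsed with consistent field names.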
Practical Usage Examples
The true power of this package is revealed through simple, practical code examples.
Downloading All Historical Data
To download the entire available history for all USDT trading pairs in the spot market, you simply call the dump_data() method without any arguments. Be aware that this initial full download is the most time-consuming operation and may take up to 40 minutes depending on your internet connection and the number of pairs.
data_dumper.dump_data()

Updating Your Existing Data
Keeping your local dataset synchronized with the latest market data is incredibly simple. The same dump_data() method is used. The dumper intelligently checks your local files and only appends new data that has been generated since your last download.
data_dumper.dump_data()

Reloading a Specific Time Period
If you need to refresh or re-download data for a specific historical period, you can specify the date range and set the is_to_update_existing flag to True.
import datetime
data_dumper.dump_data(
date_start=datetime.date(year=2021, month=1, day=1),
date_end=datetime.date(year=2022, month=1, day=1),
is_to_update_existing=True
)

Additional Useful Methods
Beyond dumping data, the package provides several utility methods to help you manage and inspect your data collection.
- get_list_all_trading_pairs(): Fetches a list of all available trading pairs from Binance for the configured asset class.
- get_min_start_date_for_ticker(): Returns the earliest date for which data is available for a specific ticker.
- get_all_tickers_with_data(timeperiod_per_file="daily"): Lists all tickers for which you have already downloaded data locally.
- get_all_dates_with_data_for_ticker(ticker, timeperiod_per_file="monthly"): Returns all dates for which local data exists for a given ticker.
- get_local_dir_to_data(ticker, timeperiod_per_file): Provides the full local directory path where data for a specific ticker is stored.
- create_filename(ticker, date_obj, timeperiod_per_file="monthly"): Generates the standardized filename used for a data file, which can be useful for custom data loading scripts.
Cleaning Up Daily Files
To save disk space, you can automatically delete daily data files for which consolidated monthly data has already been downloaded. This helps maintain an efficient local storage system.
data_dumper.delete_outdated_daily_results()
Frequently Asked Questions
Do I need a Binance account or API keys to use this package?
No, that's a major advantage of this tool. It pulls publicly available historical data from Binance's servers without requiring user authentication, API keys, or even an account.
What is the difference between 'spot', 'um', and 'cm' asset classes?
'Spot' refers to the regular cryptocurrency market for immediate settlement. 'um' (USDⓈ-Margined Futures) and 'cm' (Coin-Margined Futures) are derivatives markets where traders use leverage; 'um' contracts are settled in USDT or BUSD, while 'cm' contracts are settled in the base coin itself.
Why would I exclude certain tickers like UST?
Some assets may become delisted or experience extreme volatility and breakdowns (like TerraUSD UST). Excluding them prevents errors during the download process and ensures you only collect data for active, stable trading pairs.
How do I handle the data delay after midnight UTC?
If you run your data update script immediately after midnight UTC, you might not get the previous day's complete data. It's best to schedule your daily update scripts to run at least 15-30 minutes after 00:00 UTC to ensure all data is available.
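A scheduled job can apply this rule with a simple guard before calling the dumper. The `previous_day_likely_available` helper below is an illustrative sketch (not part of the package), using the 30-minute margin suggested above.

```python
import datetime

# Illustrative guard (not part of binance_historical_data): skip an update run
# that starts too close to midnight UTC, when the previous day's files may not
# yet be published on Binance's servers.
SAFE_DELAY = datetime.timedelta(minutes=30)

def previous_day_likely_available(now_utc: datetime.datetime) -> bool:
    """True once at least SAFE_DELAY has elapsed since midnight UTC."""
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return now_utc - midnight >= SAFE_DELAY

print(previous_day_likely_available(datetime.datetime(2024, 1, 2, 0, 10)))  # False
print(previous_day_likely_available(datetime.datetime(2024, 1, 2, 0, 45)))  # True
```

If the guard returns False, the job can simply sleep or exit and let the next scheduled run pick up the data.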
What can I use this historical data for?
This data is essential for backtesting trading algorithms, conducting quantitative research, performing technical analysis, building machine learning models for price prediction, and creating market visualization dashboards.
The first download is taking a very long time. Is this normal?
Yes, downloading the entire historical dataset for all USDT pairs is a significant task involving gigabytes of data. The initial download is the longest; subsequent updates for new data are much faster.