HF Data Library

Free, research-grade, high-frequency U.S. equity data for academics and researchers. Documented, version-controlled, and updated daily.

1,391 Tickers
1,533,014,567 1-Minute Bars
23+ Years of Data
25 Academic Variables
New: data is now updated daily.

What is this?

One-minute OHLCV bars for 1,391 U.S. equities and ETFs, from December 2002 through the present.

Two cleaning versions (Raw and Clean) so you can choose the level of processing appropriate for your research. Twenty-five pre-computed academic variables per ticker per day. Full methodology documentation.

Updated daily. No subscription. No paywall. Licensed under CC BY 4.0.

# Python — load any ticker in seconds
import pandas as pd

df = pd.read_parquet("AAPL_clean.parquet")
print(df.head())

# datetime              Open    High    Low     Close   Volume
# 2002-12-30 09:30:00   0.98    0.99    0.98    0.98    842900
# 2002-12-30 09:31:00   0.98    0.99    0.98    0.99    521400
# ...

Two cleaning versions. You choose.

Version 1: Raw

Data as received from the source. No outlier removal, no gap-filling. Prices are split/dividend adjusted. 1.5+B bars.

Best for: Market microstructure research, missingness analysis, studying the data itself.

Version 2: Clean

A nine-step cleaning pipeline is applied, including outside-hours removal, removal of non-positive prices, OHLC-violation checks, duplicate-bar removal, and a Brownlees-Gallo outlier filter. Gaps are preserved. 1.5+B bars.

Best for: Volatility estimation, spread measurement, jump detection — most empirical finance.
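
To give a feel for what the simpler Clean steps do, here is a minimal pandas sketch of four of the filters named above. It is an illustration, not the library's actual pipeline: the function name `basic_clean`, the 09:30 to 15:59 session window, and the keep-first rule for duplicates are assumptions, and the Brownlees-Gallo filter and the remaining steps are left out. Column names follow the sample output shown earlier, with a DatetimeIndex in exchange local time.

```python
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]

def basic_clean(bars: pd.DataFrame) -> pd.DataFrame:
    """Illustrative subset of bar-level filters; not the library's pipeline."""
    # 1. Outside-hours removal (assumed session: 09:30 through 15:59).
    minutes = bars.index.hour * 60 + bars.index.minute
    out = bars.loc[(minutes >= 9 * 60 + 30) & (minutes < 16 * 60)]
    # 2. Duplicate bars: keep the first bar seen for each timestamp.
    out = out.loc[~out.index.duplicated(keep="first")]
    # 3. Non-positive prices in any OHLC field.
    out = out.loc[(out[PRICE_COLS] > 0).all(axis=1)]
    # 4. OHLC violations: High must be the largest field, Low the smallest.
    consistent = (out["High"] >= out[PRICE_COLS].max(axis=1)) & (
        out["Low"] <= out[PRICE_COLS].min(axis=1)
    )
    return out.loc[consistent]
```

Applying the same function to the Raw file and comparing row counts is a quick way to see how many bars each kind of filter would touch for a given ticker.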

Note: A gap-filled version is not distributed. Gap-filling by last observation carried forward (LOCF) introduces biases documented in the accompanying paper. Researchers who need a regular grid can apply LOCF to the Clean version themselves.
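
If you do need a regular grid, the following is a minimal sketch of LOCF applied to one trading day of Clean bars. The function name `locf_one_day`, the 390-bar session (09:30 through 15:59), and the conventions for filled bars (all four prices set to the last observed Close, Volume set to zero) are assumptions for illustration. Minutes before the first observed bar stay empty because there is nothing to carry forward.

```python
import pandas as pd

def locf_one_day(day_bars: pd.DataFrame, date: str) -> pd.DataFrame:
    """Reindex one day of bars to a 1-minute grid and carry the last Close forward."""
    # Assumed regular session: 390 one-minute bars.
    grid = pd.date_range(f"{date} 09:30", f"{date} 15:59", freq="min")
    out = day_bars.reindex(grid)
    # Remember which grid points had no observed bar.
    was_missing = out["Close"].isna()
    out["Close"] = out["Close"].ffill()
    # A filled bar is flat at the carried-forward Close.
    for col in ["Open", "High", "Low"]:
        out[col] = out[col].fillna(out["Close"])
    # No trades were observed in a filled bar.
    out["Volume"] = out["Volume"].fillna(0)
    out["was_missing"] = was_missing
    return out
```

Keeping the `was_missing` flag lets you drop or down-weight filled bars later, for example when computing returns that would otherwise be artificial zeros.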

25 pre-computed academic variables

Computed daily for each ticker in each cleaning version. Ready to use in your research.

Volatility

Realized variance (1-min, 5-min), bipower variation, Parkinson range, Yang-Zhang OHLC
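
For orientation, here is a minimal sketch of textbook forms of three of these measures, computed from one day of bars. The function names are hypothetical, and the library's exact conventions (overnight returns, scaling, annualization) are documented in the methodology and may differ from these definitions. Realized variance is the sum of squared log returns, bipower variation is pi/2 times the sum of products of adjacent absolute returns, and the Parkinson estimator uses the daily high and low.

```python
import numpy as np

def realized_variance(closes) -> float:
    """Sum of squared intraday log returns."""
    r = np.diff(np.log(np.asarray(closes, dtype=float)))
    return float(np.sum(r ** 2))

def bipower_variation(closes) -> float:
    """(pi / 2) * sum of |r_t| * |r_{t-1}|, which is robust to isolated jumps."""
    r = np.abs(np.diff(np.log(np.asarray(closes, dtype=float))))
    return float(np.pi / 2 * np.sum(r[1:] * r[:-1]))

def parkinson_variance(high: float, low: float) -> float:
    """Parkinson range-based daily variance: ln(H/L)^2 / (4 ln 2)."""
    return float(np.log(high / low) ** 2 / (4 * np.log(2)))
```

A 5-minute realized variance can be obtained from the same function by subsampling, for example `realized_variance(closes[::5])`, assuming a gap-free 1-minute series.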

Spreads

Roll (1984) implied spread, Corwin-Schultz (2012) high-low spread
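
As an illustration of the first measure, the Roll (1984) estimator infers the spread from the negative first-order autocovariance of price changes: spread = 2 * sqrt(-cov). The sketch below is a textbook version with a hypothetical function name. Returning NaN when the autocovariance is not negative is one common convention and an assumption here; the library's handling may differ.

```python
import numpy as np

def roll_spread(prices) -> float:
    """Roll (1984) implied spread from the autocovariance of price changes."""
    dp = np.diff(np.asarray(prices, dtype=float))
    if dp.size < 3:
        return float("nan")
    # Sample covariance between consecutive price changes.
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    # The estimator is only defined for negative autocovariance.
    return float(2.0 * np.sqrt(-cov)) if cov < 0 else float("nan")
```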

Autocorrelation

First-order return AC(1), variance ratio VR(5), VR(10)
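
A minimal sketch of both statistics, assuming an array of 1-minute returns for one day. The function names are hypothetical, and this variance ratio uses overlapping q-period sums without the small-sample bias corrections that the methodology may apply. Under a random walk VR(q) is close to 1; values below 1 point to mean reversion such as bid-ask bounce.

```python
import numpy as np

def ac1(returns) -> float:
    """First-order autocorrelation of returns."""
    r = np.asarray(returns, dtype=float)
    return float(np.corrcoef(r[1:], r[:-1])[0, 1])

def variance_ratio(returns, q: int) -> float:
    """Var of overlapping q-period returns divided by q times the 1-period variance."""
    r = np.asarray(returns, dtype=float)
    # Overlapping q-period returns as rolling sums of 1-period returns.
    rq = np.convolve(r, np.ones(q), mode="valid")
    return float(np.var(rq, ddof=1) / (q * np.var(r, ddof=1)))
```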

Jump Detection

BNS z-statistic, jump indicators at 1% and 5% significance
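
The sketch below shows one widely used form of this test, the ratio statistic with a tripower quarticity term and a max(1, ...) adjustment, as an assumption about the general approach and not a statement of the library's exact formula. The function names are hypothetical. The statistic compares realized variance with bipower variation; a large positive z suggests that a jump contributed to the day's variance, and the indicators are one-sided normal tests.

```python
import math
from statistics import NormalDist
import numpy as np

def bns_z(returns) -> float:
    """Ratio jump statistic: (1 - BV/RV) scaled by its asymptotic standard error."""
    r = np.abs(np.asarray(returns, dtype=float))
    n = r.size
    rv = float(np.sum(r ** 2))
    bv = float(math.pi / 2 * np.sum(r[1:] * r[:-1]))
    if n < 3 or rv == 0.0 or bv == 0.0:
        return float("nan")
    # mu_{4/3} = E|Z|^(4/3) for a standard normal Z.
    mu43 = 2 ** (2 / 3) * math.gamma(7 / 6) / math.gamma(1 / 2)
    # Tripower quarticity, a jump-robust estimate of integrated quarticity.
    tq = n * mu43 ** -3 * float(np.sum((r[2:] * r[1:-1] * r[:-2]) ** (4 / 3)))
    theta = math.pi ** 2 / 4 + math.pi - 5
    return (1 - bv / rv) / math.sqrt(theta * max(1.0, tq / bv ** 2) / n)

def jump_indicator(z: float, alpha: float) -> bool:
    """One-sided test: flag a jump when z exceeds the (1 - alpha) normal quantile."""
    return z > NormalDist().inv_cdf(1 - alpha)
```

Calling `jump_indicator(z, 0.01)` and `jump_indicator(z, 0.05)` gives flags at the two significance levels listed above.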

Liquidity

Amihud illiquidity, daily dollar volume, share volume, observed trade count
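
As a quick illustration, the Amihud measure for one day is the absolute daily return divided by that day's dollar volume. In this sketch dollar volume is approximated from minute bars as the sum of Close times Volume. That approximation, the function names, and the absence of any scaling factor (a factor of 10^6 is common in the literature) are assumptions, not the library's stated definitions.

```python
import numpy as np

def dollar_volume(close, volume) -> float:
    """Approximate daily dollar volume as the sum of Close * Volume over bars."""
    c = np.asarray(close, dtype=float)
    v = np.asarray(volume, dtype=float)
    return float(np.sum(c * v))

def amihud_illiquidity(daily_return: float, dollar_vol: float) -> float:
    """|daily return| per dollar traded; undefined when nothing traded."""
    if dollar_vol <= 0:
        return float("nan")
    return abs(daily_return) / dollar_vol
```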

Data Quality

Gap rate, observed/filled bar counts, longest gap, bars since last trade
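
Two of these measures can be approximated directly from a day's timestamps, as in the sketch below. The definitions are assumptions for illustration: a 390-bar regular session, gap rate as the share of expected bars that are absent, and longest gap as the longest run of missing minutes between two observed bars (gaps before the first or after the last bar are not counted). The function name is hypothetical and the index is assumed to be sorted.

```python
import pandas as pd

def gap_stats(day_index: pd.DatetimeIndex, expected_bars: int = 390) -> dict:
    """Observed bar count, gap rate, and longest interior gap for one day."""
    observed = len(day_index)
    gap_rate = 1 - observed / expected_bars
    # Minutes between consecutive observed bars; a step of 1 means no gap.
    steps = day_index.to_series().diff().dropna() / pd.Timedelta(minutes=1)
    longest_gap = int(steps.max() - 1) if len(steps) else 0
    return {"observed": observed, "gap_rate": gap_rate, "longest_gap": longest_gap}
```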

Multiple ways to access the data

Browser Download

Download individual tickers or pre-packaged bundles (S&P 500, Nasdaq 100, by sector). Click and go — no account needed for basic downloads.

Browse Downloads

REST API

Programmatic access to any ticker, date range, and version. JSON, CSV, or parquet. Free API key with 300 requests/minute. Python, R, and Stata examples provided.
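
If you script many requests, a small client-side throttle helps you stay under the 300 requests/minute limit. The sketch below is a generic sliding-window limiter using only the standard library. The class name `MinuteRateLimiter` is hypothetical, and it makes no assumption about the API's endpoints or parameters, which are described in the API docs.

```python
import time
from collections import deque

class MinuteRateLimiter:
    """Block so that at most max_calls happen in any window of period seconds."""

    def __init__(self, max_calls=300, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self) -> None:
        now = self.clock()
        # Forget calls that have left the sliding window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Wait until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Create one limiter per API key and call `limiter.acquire()` immediately before each request.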

API Docs

Bulk Download

Full dataset dump — all 1,391 tickers, all versions, all timeframes. Parquet format. Updated daily.

Full Dataset

How this compares

| Feature | HF Data Library | CRSP/TAQ | Yahoo Finance | Polygon.io |
|---|---|---|---|---|
| Price | Free | $25,000+/yr | Free | $199+/mo |
| Frequency | 1-minute bars | Tick-level | Daily only | 1-minute bars |
| Cleaning versions | 2 versions (Raw, Clean) | 1 version | None | None |
| Cleaning documentation | Full pipeline | Minimal | None | None |
| Academic variables | 25 measures | None | None | None |
| Data quality scores | Per-ticker | No | No | No |
| REST API | Free | No | Unofficial | Paid |
| DOI / Citable | Zenodo DOI | No | No | No |
| License | CC BY 4.0 | Restrictive | ToS restricted | Commercial |
| Updated | Daily | Quarterly | Daily | Real-time |