Free, research-grade, high-frequency U.S. equity data for academics and researchers. Documented, version-controlled, and updated weekly.
One-minute OHLCV bars for 1,391 U.S. equities and ETFs, from December 2002 through the present.
Three cleaning versions so you can choose the level of processing appropriate for your research. Twenty-seven pre-computed academic variables per ticker per day. Full methodology documentation.
Updated every week. No subscription. No paywall. Licensed under CC BY 4.0.
```python
# Python: load any ticker in seconds
# pd.read_parquet needs a Parquet engine installed (pyarrow or fastparquet)
import pandas as pd

df = pd.read_parquet("AAPL_clean.parquet")
print(df.head())
# datetime             Open  High  Low   Close  Volume
# 2002-12-30 09:30:00  0.98  0.99  0.98  0.98   842900
# 2002-12-30 09:31:00  0.98  0.99  0.98  0.99   521400
# ...
```
Data as received from the source. No outlier removal, no gap-filling. Prices are split/dividend adjusted. 1,533,403,126 bars.
Best for: Market microstructure research, missingness analysis, studying the data itself.
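Missingness in the raw version can be profiled directly from the bar timestamps. The sketch below is a hypothetical helper (`daily_missingness` is not part of the library). It assumes the timestamps are a tz-naive DatetimeIndex in Eastern time (call `df.set_index("datetime")` first if they load as a column) and treats every session as 390 minutes, so early-close days will show an inflated gap rate.

```python
import numpy as np
import pandas as pd

SESSION_MINUTES = 390  # 09:30 through 15:59 ET, inclusive


def daily_missingness(df: pd.DataFrame) -> pd.DataFrame:
    """Per-day observed bars, gap rate, and longest run of missing minutes.

    Hypothetical helper. Assumes a tz-naive DatetimeIndex in US/Eastern time
    with one row per observed one-minute bar. Days with no bars at all do not
    appear in the output, and early closes are not handled specially.
    """
    rth = df.between_time("09:30", "15:59")  # drop any outside-hours bars
    rows = []
    for day, day_bars in rth.groupby(rth.index.normalize()):
        open_ts = day + pd.Timedelta(hours=9, minutes=30)
        # Minute offset of each bar from the open: 0 .. 389
        offsets = ((day_bars.index - open_ts) // pd.Timedelta(minutes=1)).to_numpy()
        observed = np.zeros(SESSION_MINUTES, dtype=bool)
        observed[offsets] = True
        # Longest run of consecutive missing minutes
        longest = run = 0
        for present in observed:
            run = 0 if present else run + 1
            longest = max(longest, run)
        n_obs = int(observed.sum())
        rows.append({"date": day, "observed_bars": n_obs,
                     "gap_rate": 1 - n_obs / SESSION_MINUTES,
                     "longest_gap": longest})
    return pd.DataFrame(rows).set_index("date")


# Example: one synthetic day with ten consecutive minutes missing
idx = pd.date_range("2024-01-02 09:30", periods=SESSION_MINUTES, freq="min")
bars = pd.DataFrame({"Close": 1.0}, index=idx.delete(list(range(100, 110))))
print(daily_missingness(bars))
```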
Nine-step cleaning pipeline applied, including outside-hours removal and filters for non-positive prices, OHLC violations, duplicate bars, and outliers (Brownlees-Gallo). Gaps are preserved. 1,533,014,567 bars.
Best for: Volatility estimation, spread measurement, jump detection, and most other empirical finance work.
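A minimal sketch of the five filters named above, assuming the column names from the loading example and a tz-naive Eastern-time DatetimeIndex. This is not the library's nine-step pipeline: the filter order, the Brownlees-Gallo parameters (`k`, `delta`, `gamma`), and applying the rule to one-minute closes day by day are illustrative assumptions, and `clean_bars` and `brownlees_gallo_keep` are hypothetical names. The rule itself keeps a price when it lies within three trimmed standard deviations (plus a small granularity term) of the trimmed mean of its neighbours.

```python
import numpy as np
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def brownlees_gallo_keep(prices, k=20, delta=0.10, gamma=0.02):
    """True where |p_i - trimmed_mean| < 3 * trimmed_std + gamma.

    Statistics use up to k neighbours (k/2 on each side, excluding p_i),
    after trimming a fraction delta from each tail. Parameter values are
    illustrative assumptions, not the library's settings.
    """
    p = np.asarray(prices, dtype=float)
    keep = np.ones(p.size, dtype=bool)
    half = k // 2
    for i in range(p.size):
        nbrs = np.sort(np.concatenate([p[max(0, i - half):i], p[i + 1:i + 1 + half]]))
        cut = int(delta * nbrs.size)
        core = nbrs[cut:nbrs.size - cut]
        if core.size < 2:
            continue  # too few neighbours to judge; keep the bar
        keep[i] = abs(p[i] - core.mean()) < 3 * core.std(ddof=1) + gamma
    return keep


def clean_bars(df, **bg_kwargs):
    """Apply the five named filters (not the full nine-step pipeline)."""
    out = df.between_time("09:30", "15:59")                  # outside-hours bars
    out = out[~out.index.duplicated(keep="first")]           # duplicate bars
    out = out[(out[PRICE_COLS] > 0).all(axis=1)]             # non-positive prices
    ohlc_ok = (out["High"] >= out[["Open", "Close", "Low"]].max(axis=1)) & (
        out["Low"] <= out[["Open", "Close"]].min(axis=1))
    out = out[ohlc_ok]                                       # OHLC violations
    keep = out.groupby(out.index.normalize())["Close"].transform(
        lambda s: brownlees_gallo_keep(s.to_numpy(), **bg_kwargs))
    return out[keep.astype(bool)]                            # outliers, per day


# Example: a smooth synthetic hour with one bad print and one OHLC violation
idx = pd.date_range("2024-01-02 09:30", periods=60, freq="min")
close = 100 + 0.01 * np.sin(np.arange(60) / 5)
bars = pd.DataFrame({"Open": close, "High": close + 0.01, "Low": close - 0.01,
                     "Close": close, "Volume": 100}, index=idx)
bars.loc[idx[30], PRICE_COLS] = [150.0, 150.01, 149.99, 150.0]  # outlier bar
bars.loc[idx[10], "High"] = 90.0                                # High below Open
cleaned = clean_bars(bars)
print(len(bars), "->", len(cleaned))
```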
Clean data with last-observation-carried-forward (LOCF) gap-filling, producing a regular 390-bar daily grid (09:30–15:59 ET). Every bar is flagged as original or filled. 2,342,519,726 bars.
Best for: Machine learning, backtesting systems, time-series models requiring regular grids.
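A minimal LOCF sketch that reindexes each trading day to the 390-minute grid and flags the bars it creates. `fill_to_grid` is a hypothetical helper, and two conventions are assumptions rather than the library's documented behaviour: a filled bar is drawn flat at the last observed close with zero volume, and the carry-forward is allowed to cross day boundaries (so a day's leading gap takes the prior day's close).

```python
import numpy as np
import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]


def fill_to_grid(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex to 390 one-minute bars per trading day and LOCF-fill the gaps.

    Assumes a tz-naive ET DatetimeIndex without duplicates. Bars before the
    first observation in the sample stay NaN (nothing to carry forward).
    """
    days = df.index.normalize().unique()
    grid = pd.DatetimeIndex(np.concatenate([
        pd.date_range(d + pd.Timedelta(hours=9, minutes=30),
                      periods=390, freq="min").to_numpy()
        for d in days]))
    out = df.reindex(grid)                 # drops bars off the grid, adds NaN rows
    out["filled"] = out["Close"].isna()    # True for bars created by the fill
    last_close = out["Close"].ffill()      # last observed close, carried forward
    for col in PRICE_COLS:
        out[col] = out[col].fillna(last_close)
    out["Volume"] = out["Volume"].fillna(0)
    return out


# Example: a day missing the 09:31 and 09:32 bars
idx = pd.date_range("2024-01-02 09:30", periods=390, freq="min").delete([1, 2])
bars = pd.DataFrame({"Open": 10.0, "High": 10.5, "Low": 9.5, "Close": 10.2,
                     "Volume": 100}, index=idx)
grid = fill_to_grid(bars)
print(grid.loc["2024-01-02 09:30":"2024-01-02 09:33"])
```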
Computed daily for each ticker in each cleaning version. Ready to use in your research.
- Volatility: realized variance (1-min, 5-min), bipower variation, Parkinson range estimator, Yang-Zhang OHLC estimator
- Spreads: Roll (1984) implied spread, Corwin-Schultz (2012) high-low spread
- Efficiency: first-order return autocorrelation AC(1), variance ratios VR(5) and VR(10)
- Jumps: Barndorff-Nielsen and Shephard (BNS) z-statistic, jump indicators at the 1% and 5% significance levels
- Liquidity and volume: Amihud illiquidity, daily dollar volume, share volume, observed trade count
- Data quality: gap rate, observed/filled bar counts, longest gap, bars since last trade
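For intuition about the volatility and jump measures, here is a sketch of one common set of textbook formulas applied to a single day's one-minute bars. It is not the library's code: the bipower and tripower scalings and the ratio form of the BNS statistic with the max adjustment follow Huang and Tauchen (2005), the 5-minute RV simply takes every fifth close, and Yang-Zhang is omitted to keep the sketch short. `daily_vol_and_jumps` is a hypothetical name.

```python
import math
from statistics import NormalDist

import numpy as np

MU1 = math.sqrt(2 / math.pi)                                 # E|Z|, Z ~ N(0, 1)
MU43 = 2 ** (2 / 3) * math.gamma(7 / 6) / math.gamma(1 / 2)  # E|Z|^(4/3)
THETA = math.pi ** 2 / 4 + math.pi - 5                       # BNS variance constant


def daily_vol_and_jumps(close, high, low):
    """One day's volatility and jump measures from one-minute bars (sketch)."""
    lp = np.log(np.asarray(close, dtype=float))
    r = np.diff(lp)                                   # one-minute log returns
    n = r.size
    a = np.abs(r)
    rv1 = float(np.sum(r ** 2))                       # realized variance, 1-min
    rv5 = float(np.sum(np.diff(lp[::5]) ** 2))        # every fifth close
    bv = MU1 ** -2 * n / (n - 1) * float(np.sum(a[1:] * a[:-1]))
    tq = n * MU43 ** -3 * n / (n - 2) * float(
        np.sum((a[2:] * a[1:-1] * a[:-2]) ** (4 / 3)))  # realized tripower quarticity
    parkinson = math.log(np.max(high) / np.min(low)) ** 2 / (4 * math.log(2))
    # Ratio jump statistic, approximately N(0, 1) when there are no jumps
    z = ((rv1 - bv) / rv1) / math.sqrt(THETA / n * max(1.0, tq / bv ** 2))
    return {"rv_1min": rv1, "rv_5min": rv5, "bipower": bv,
            "parkinson": parkinson, "bns_z": z,
            "jump_5pct": z > NormalDist().inv_cdf(0.95),
            "jump_1pct": z > NormalDist().inv_cdf(0.99)}


# Example: 390 synthetic closes with a 5% log-price jump injected mid-day
rng = np.random.default_rng(42)
r = rng.normal(0.0, 0.001, 389)
r[200] += 0.05
close = 100 * np.exp(np.concatenate([[0.0], np.cumsum(r)]))
stats = daily_vol_and_jumps(close, close * 1.0005, close * 0.9995)
print(stats)
```

Bipower variation is robust to a single large return, so the gap between RV and BV is what the z-statistic measures; the one-sided critical values come from the standard normal.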
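The spread, efficiency, and liquidity measures reduce to a few lines each. The sketch below uses the plain textbook forms (Roll in price units, variance ratios on overlapping sums without bias correction, Amihud as absolute return per dollar traded); Corwin-Schultz is omitted for brevity, and the function names are hypothetical. A daily dollar volume could be approximated from bars as the sum of `Close * Volume`, which is itself an assumption.

```python
import numpy as np


def roll_spread(prices):
    """Roll (1984): 2 * sqrt(-Cov(dp_t, dp_{t-1})); NaN if the covariance is >= 0."""
    dp = np.diff(np.asarray(prices, dtype=float))
    cov = np.cov(dp[1:], dp[:-1])[0, 1]
    return float(2 * np.sqrt(-cov)) if cov < 0 else float("nan")


def ac1(returns):
    """First-order autocorrelation of returns."""
    r = np.asarray(returns, dtype=float)
    return float(np.corrcoef(r[1:], r[:-1])[0, 1])


def variance_ratio(returns, q):
    """Var(overlapping q-bar sums) / (q * Var(one-bar returns)); no bias correction."""
    r = np.asarray(returns, dtype=float)
    rq = np.convolve(r, np.ones(q), mode="valid")
    return float(rq.var(ddof=1) / (q * r.var(ddof=1)))


def amihud(daily_return, dollar_volume):
    """Amihud (2002) illiquidity: absolute return per dollar traded."""
    return abs(daily_return) / dollar_volume


# Example: constant value with random bid/ask bounce, true spread 0.10
rng = np.random.default_rng(7)
side = rng.choice([-1.0, 1.0], size=20_000)   # -1 = trade at bid, +1 = at ask
prices = 100 + 0.05 * side
rets = np.diff(np.log(prices))
print(roll_spread(prices), ac1(rets), variance_ratio(rets, 5))
```

Under pure bid/ask bounce the Roll estimate should land near the true spread of 0.10, AC(1) near -0.5, and VR(q) near 1/q, which is why negative autocorrelation and variance ratios below one are read as microstructure noise.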
Download individual tickers or pre-packaged bundles (S&P 500, Nasdaq 100, by sector). Click and go: no account is needed for basic downloads.
Programmatic access to any ticker, date range, and version through a REST API, in JSON, CSV, or Parquet. A free API key allows 300 requests per minute. Python, R, and Stata examples are provided.
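When pulling many tickers, the 300 requests/minute quota is easiest to respect with a client-side throttle. The API's endpoints and authentication are not described here, so this generic sketch makes no HTTP calls; you would call `limiter.wait()` immediately before each request. `RateLimiter` is a hypothetical helper, and whether the server counts a sliding or fixed window is an assumption. A sliding window is the conservative choice, since it also satisfies any fixed one-minute window.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most `max_calls` calls in any `period` seconds."""

    def __init__(self, max_calls=300, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def wait(self):
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))


# Demo with a fake clock so it runs instantly: the 4th call waits 60 s
class FakeClock:
    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
limiter = RateLimiter(max_calls=3, period=60.0, clock=clock.now, sleep=clock.sleep)
for _ in range(4):
    limiter.wait()
print(clock.t)  # 60.0
```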
Full dataset dump: all 1,391 tickers, all versions, all timeframes, in Parquet format. Updated weekly.

| Feature | HF Data Library | CRSP/TAQ | Yahoo Finance | Polygon.io |
|---|---|---|---|---|
| Price | Free | $25,000+/yr | Free | $199+/mo |
| Frequency | 1-minute bars | Tick-level | Daily only | 1-minute bars |
| Cleaning versions | 3 versions | 1 version | None | None |
| Cleaning documentation | Full pipeline | Minimal | None | None |
| Academic variables | 27 measures | None | None | None |
| Data quality scores | Per-ticker | No | No | No |
| REST API | Free | No | Unofficial | Paid |
| DOI / Citable | Zenodo DOI | No | No | No |
| License | CC BY 4.0 | Restrictive | ToS restricted | Commercial |
| Updated | Weekly (automated) | Quarterly | Daily | Real-time |