Living errata. Every discovered issue documented with date and resolution.
The data source changes in March 2022. Data before the transition comes from the full consolidated tape (all U.S. exchanges). Data after the transition comes from IEX Exchange only (~2–3% of consolidated volume). This creates a structural break in the series. Please read the details below before using this data in research.
Coverage: Full consolidated tape (CTA/UTP) — captures all trades across all U.S. equity exchanges (NYSE, Nasdaq, ARCA, BATS, IEX, etc.).
This data represents the complete picture of U.S. equity trading. Volume figures, OHLCV bars, and all derived academic measures reflect the full market. This is the same underlying data used by CRSP and TAQ.
Status: Known limitation, documented
Coverage: IEX Exchange only — approximately 2–3% of total consolidated volume.
Post-March 2022 data is sourced from IEX Exchange HIST, which provides publicly available packet captures (pcap) of IEX's own trading activity. IEX is the only U.S. exchange that makes its historical data freely available and allows redistribution.
What this means for researchers:
When IEX data is appropriate:
When IEX data is NOT appropriate:
How to identify the data source: Each bar has a source column. Bars with source="pitrading" are from the consolidated tape. Bars with source="iex" are from IEX Exchange HIST only.
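As a minimal sketch of how the source column might be used, the snippet below splits bars into the two groups before any analysis. Only the column name `source` and the values `"pitrading"` and `"iex"` come from this page. The CSV layout, the other column names, the sample values, and the helper `split_by_source` are assumptions made for illustration, not part of the library.

```python
import csv
import io

# Illustrative rows only: every column except `source`, and every value,
# is made up for this sketch.
SAMPLE_CSV = """timestamp,close,volume,source
2022-01-03 09:30:00,100.10,152000,pitrading
2022-01-03 09:31:00,100.25,98000,pitrading
2022-06-01 09:30:00,101.40,3100,iex
2022-06-01 09:31:00,101.35,2700,iex
"""

# The two source labels documented on this page.
KNOWN_SOURCES = {"pitrading", "iex"}


def split_by_source(rows):
    """Group bars by their `source` value, rejecting unexpected labels."""
    groups = {name: [] for name in KNOWN_SOURCES}
    for row in rows:
        source = row["source"]
        if source not in groups:
            raise ValueError(f"unexpected source value: {source!r}")
        groups[source].append(row)
    return groups


bars = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
groups = split_by_source(bars)

consolidated_bars = groups["pitrading"]  # full consolidated tape
iex_bars = groups["iex"]                 # IEX Exchange HIST only
print(len(consolidated_bars), len(iex_bars))
```

Raising on an unknown label is a deliberate choice in this sketch: a bar with a missing or misspelled source should not silently end up in either group.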
Consolidated tape data (CTA/UTP) is owned by NYSE and Nasdaq. All providers that redistribute it (Alpaca, Polygon, Yahoo Finance, etc.) charge licensing fees and prohibit further redistribution in their terms of service. IEX is the only U.S. equity exchange that makes its historical data (IEX Exchange HIST) freely available and permits redistribution with attribution. Using IEX data trades coverage for accessibility.
Researchers who require full consolidated tape data for post-March 2022 analysis should supplement this library with institutional access to WRDS/TAQ or similar licensed sources.
Status: Known structural break
The transition from PiTrading (consolidated tape) to IEX Exchange HIST (single exchange) in March 2022 creates a structural break in volume, trade count, and potentially in price coverage. Step 9 of the cleaning pipeline checks for artificial price jumps at this boundary, but the volume discontinuity is inherent to the source change and cannot be cleaned away.
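Because the volume discontinuity cannot be cleaned away, one possible approach is to compute volume statistics separately for each source instead of pooling across the boundary. The sketch below is an illustration under assumptions, not a method this library prescribes. The daily volumes are invented, the tuple layout is hypothetical, and the helpers `mean_volume_by_source` and `relative_volume` are names introduced here.

```python
from statistics import mean

# Hypothetical daily rows as (date, volume, source). The numbers are made up
# and are not meant to reproduce the actual size of the break.
daily = [
    ("2022-01-03", 1_000_000, "pitrading"),
    ("2022-01-04", 1_200_000, "pitrading"),
    ("2022-06-01", 25_000, "iex"),
    ("2022-06-02", 30_000, "iex"),
]


def mean_volume_by_source(rows):
    """Average volume within each source, never across the boundary."""
    by_source = {}
    for _date, volume, source in rows:
        by_source.setdefault(source, []).append(volume)
    return {source: mean(volumes) for source, volumes in by_source.items()}


def relative_volume(rows):
    """Express each day's volume relative to the mean of its own source."""
    means = mean_volume_by_source(rows)
    return [(date, volume / means[source], source) for date, volume, source in rows]


means = mean_volume_by_source(daily)
relative = relative_volume(daily)
for date, ratio, source in relative:
    print(date, source, round(ratio, 3))
```

Relative figures of this kind are only comparable within a source. They do not turn IEX volume into an estimate of consolidated volume.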
This page will be updated as issues are discovered. Check back after each data update.
Email [email protected] with the ticker, date(s), and description. See the contact page for full reporting guidelines.