Known Issues

Living errata: every discovered issue is documented with its date and resolution.

Transparency policy: All data issues are documented here as they are discovered. If you find an issue, please report it. Reporters are credited.

Source limitations

Important: Data source change in March 2022

The data source changes in March 2022. Pre-March 2022 data comes from the full consolidated tape (all U.S. exchanges). Post-March 2022 data comes from IEX Exchange only (~2–3% of consolidated volume). This creates a structural break in the series. Please read the details below before using this data in research.

Pre-March 2022: Consolidated tape (PiTrading)

Coverage: Full consolidated tape (CTA/UTP) — captures all trades across all U.S. equity exchanges (NYSE, Nasdaq, ARCA, BATS, IEX, etc.).

This data represents the complete picture of U.S. equity trading. Volume figures, OHLCV bars, and all derived academic measures reflect the full market. This is the same underlying data used by CRSP and TAQ.

Post-March 2022: IEX Exchange HIST (single exchange)

Status: Known limitation, documented

Coverage: IEX Exchange only — approximately 2–3% of total consolidated volume.

Post-March 2022 data is sourced from IEX Exchange HIST, which provides publicly available packet captures (pcap files) of IEX's own trading activity. IEX is the only U.S. exchange that makes its historical data freely available and allows redistribution.

What this means for researchers:

When IEX data is appropriate:

When IEX data is NOT appropriate:

How to identify the data source: Each bar has a source column. Bars with source="pitrading" are from the consolidated tape. Bars with source="iex" are from IEX Exchange HIST only.
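As a minimal sketch of how the source column can be used, the snippet below groups bars by their source label so that consolidated-tape and IEX-only bars are never mixed by accident. Only the column name source and its two values ("pitrading" and "iex") come from this page. The row layout, the ticker, the dates, and the numbers are hypothetical placeholders, and the library's real loading interface may differ.

```python
from collections import defaultdict

# Hypothetical sample bars. Only the "source" column and its two values
# ("pitrading", "iex") are taken from the documentation; everything else
# (field names, ticker, dates, prices, volumes) is invented for illustration.
bars = [
    {"ticker": "XYZ", "date": "2022-01-10", "close": 100.0, "volume": 1_000_000, "source": "pitrading"},
    {"ticker": "XYZ", "date": "2022-01-11", "close": 101.0, "volume": 1_100_000, "source": "pitrading"},
    {"ticker": "XYZ", "date": "2022-04-11", "close": 100.5, "volume": 27_000, "source": "iex"},
    {"ticker": "XYZ", "date": "2022-04-12", "close": 102.0, "volume": 25_000, "source": "iex"},
]


def split_by_source(rows):
    """Group bars by their source label, preserving input order within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["source"]].append(row)
    return dict(groups)


by_source = split_by_source(bars)
consolidated = by_source.get("pitrading", [])  # full consolidated tape
iex_only = by_source.get("iex", [])            # IEX Exchange HIST only

print(len(consolidated), "consolidated-tape bars;", len(iex_only), "IEX-only bars")
```

If the bars are held in a DataFrame instead, the same split is a boolean mask on the source column.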

Why not use a full-tape source?

Consolidated tape data (CTA/UTP) is owned by NYSE and Nasdaq. All providers who redistribute it (Alpaca, Polygon, Yahoo Finance, etc.) charge licensing fees and prohibit further redistribution in their terms of service. IEX is the only U.S. equity exchange that makes its historical data (IEX Exchange HIST) freely available and permits redistribution with attribution. Using it trades coverage for accessibility.

Researchers who require full consolidated tape data for post-March 2022 analysis should supplement this library with institutional access to WRDS/TAQ or similar licensed sources.

Splice boundary (March 2022)

Status: Known structural break

The transition from PiTrading (consolidated tape) to IEX Exchange HIST (single exchange) in March 2022 creates a structural break in volume, trade count, and potentially in price coverage. Step 9 of the cleaning pipeline checks for artificial price jumps at this boundary, but the volume discontinuity is inherent to the source change and cannot be cleaned away.
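To see the size of the volume discontinuity for a given ticker, one rough diagnostic is the ratio of median bar volume after the source change to median bar volume before it. The sketch below is an illustration only and is not part of the cleaning pipeline. The function name volume_ratio, the row layout, and the sample volumes are hypothetical; the volumes were chosen to be consistent with the ~2–3% share stated above. The ratio measures the break and does not correct it.

```python
from statistics import median

# Hypothetical bars for one ticker. Volumes are invented so that the IEX
# figures are roughly 2-3% of the consolidated-tape figures.
bars = [
    {"date": "2022-01-10", "volume": 1_000_000, "source": "pitrading"},
    {"date": "2022-01-11", "volume": 1_200_000, "source": "pitrading"},
    {"date": "2022-01-12", "volume": 800_000, "source": "pitrading"},
    {"date": "2022-04-11", "volume": 25_000, "source": "iex"},
    {"date": "2022-04-12", "volume": 30_000, "source": "iex"},
    {"date": "2022-04-13", "volume": 20_000, "source": "iex"},
]


def volume_ratio(rows):
    """Return median IEX volume divided by median consolidated-tape volume.

    Returns None when the rows do not include both sources, or when the
    consolidated-tape median is zero.
    """
    pit = [r["volume"] for r in rows if r["source"] == "pitrading"]
    iex = [r["volume"] for r in rows if r["source"] == "iex"]
    if not pit or not iex:
        return None
    pit_median = median(pit)
    if pit_median == 0:
        return None
    return median(iex) / pit_median


ratio = volume_ratio(bars)
if ratio is not None:
    print(f"IEX median volume is {ratio:.1%} of the consolidated-tape median")
```

A ratio far below 1 is expected here and reflects the change in coverage, not a change in trading activity. Volume-based measures computed across the boundary therefore need to be handled per source.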


No other issues reported

This page will be updated as issues are discovered. Check back after each data update.


Report an issue

Email [email protected] with the ticker, date(s), and description. See the contact page for full reporting guidelines.