This section gives the complete column definitions for the raw bar files, followed by the specifications of the computed academic variables.

Each file contains one-minute OHLCV bars for a single U.S. equity or ETF.

| Column | Type | Description |
|---|---|---|
| datetime | datetime64 | Bar timestamp (Eastern Time), format: YYYY-MM-DD HH:MM:SS |
| Date | string | Trading date, format: MM/DD/YYYY |
| Time | string | Bar time, format: HHMM (24-hour) |
| Open | float64 | Opening price of the 1-minute bar (split/dividend adjusted) |
| High | float64 | Highest price during the 1-minute bar |
| Low | float64 | Lowest price during the 1-minute bar |
| Close | float64 | Closing price of the 1-minute bar |
| Volume | int64 | Number of shares traded during the 1-minute bar |
| source | string | Data source: "pitrading" (pre-2022) or "alpaca" (post-2022, IEX via Alpaca) |
| is_filled | bool | True if this bar was gap-filled (Version 3 only) |
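
The `datetime`, `Date`, and `Time` columns encode the same instant in three formats. The following is a minimal sketch of a consistency check using only the standard library. It assumes each row has been read into a plain dict of strings. The helper names are hypothetical and are not part of the dataset.

```python
from datetime import datetime

def combine_date_time(date_str: str, time_str: str) -> datetime:
    """Build a naive (Eastern Time) timestamp from Date (MM/DD/YYYY) and Time (HHMM)."""
    # Assumption: Time may have lost its leading zero if a loader read it as an integer (930 -> "0930").
    return datetime.strptime(f"{date_str} {time_str.zfill(4)}", "%m/%d/%Y %H%M")

def parse_datetime(ts: str) -> datetime:
    """Parse the datetime column (YYYY-MM-DD HH:MM:SS)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

def row_is_consistent(row: dict) -> bool:
    """True if the datetime column agrees with the Date and Time columns."""
    return parse_datetime(row["datetime"]) == combine_date_time(row["Date"], row["Time"])

# Hypothetical row for illustration.
row = {"datetime": "2021-03-01 09:31:00", "Date": "03/01/2021", "Time": "0931"}
print(row_is_consistent(row))  # True
```

The timestamps stay naive here. If they have to be compared with timezone-aware data, attach `zoneinfo.ZoneInfo("America/New_York")`.
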

The following variables are computed daily for each ticker in each cleaning version.

| # | Variable | Formula / Reference |
|---|---|---|
| 1 | Realized variance (5-min) | $RV = \sum_i r_{t,i}^2$, where $r_{t,i}$ is the $i$-th intraday return on day $t$, using 5-minute sampled returns |
| 2 | Realized variance (1-min) | $RV = \sum_i r_{t,i}^2$, using all 1-minute returns |
| 3 | Bipower variation | $BV = \frac{\pi}{2} \sum_{i \ge 2} \lvert r_{t,i} \rvert \, \lvert r_{t,i-1} \rvert$ (Barndorff-Nielsen and Shephard 2004) |
| 4 | Parkinson range volatility | $\sigma^2 = \frac{1}{4 \ln 2} \left( \ln \frac{H}{L} \right)^2$ (Parkinson 1980) |
| 5 | Yang-Zhang volatility | OHLC-based estimator (Yang and Zhang 2000) |

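
The first four variables reduce to a few lines of arithmetic. The sketch below makes two illustrative assumptions rather than following a documented pipeline: returns are log returns between consecutive bar closes, and 5-minute returns come from keeping every fifth 1-minute close. The function names are hypothetical. Yang-Zhang is omitted for brevity.

```python
import math

def log_returns(closes):
    """Log returns between consecutive closing prices (assumed return definition)."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]

def realized_variance(returns):
    """RV: sum of squared intraday returns."""
    return sum(r * r for r in returns)

def bipower_variation(returns):
    """BV: (pi/2) times the sum of products of adjacent absolute returns."""
    return (math.pi / 2) * sum(abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1]))

def parkinson_variance(high, low):
    """Parkinson (1980): (ln(H/L))^2 / (4 ln 2), using the day's high and low."""
    return math.log(high / low) ** 2 / (4 * math.log(2))

# Hypothetical 1-minute closes for illustration only.
closes = [100.0, 100.2, 99.9, 100.1, 100.4, 100.3, 100.0, 100.5, 100.6, 100.2, 100.4]
rv_1min = realized_variance(log_returns(closes))
# Assumption: 5-minute sampling keeps every fifth close, starting at the first bar.
rv_5min = realized_variance(log_returns(closes[::5]))
bv = bipower_variation(log_returns(closes))
pk = parkinson_variance(100.8, 99.7)  # hypothetical daily High and Low
```

Under these assumptions, `rv_1min` and `bv` estimate the same quantity when there are no jumps. That is why their difference feeds the jump test below.
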
| # | Variable | Formula / Reference |
|---|---|---|
| 6 | Roll implied spread | $S = 2\sqrt{-\operatorname{Cov}(r_t, r_{t-1})}$, in basis points (Roll 1984); undefined when the covariance is non-negative |
| 7 | Corwin-Schultz spread | High-low spread estimator (Corwin and Schultz 2012) |

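
The Roll estimator is illustrated below as a minimal sketch. It assumes that `returns` holds one day's 1-minute log returns and that the sample covariance of the $(r_t, r_{t-1})$ pairs uses an $n-1$ denominator. Returning `NaN` for a non-negative covariance is also an assumption; some studies set the spread to zero instead. The function name is hypothetical.

```python
import math

def roll_spread_bps(returns):
    """Roll (1984) spread 2*sqrt(-Cov(r_t, r_{t-1})), converted to basis points."""
    x, y = returns[1:], returns[:-1]          # pairs (r_t, r_{t-1})
    if len(x) < 2:
        return float("nan")                   # not enough pairs for a covariance
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    if cov >= 0:
        return float("nan")                   # estimator undefined for non-negative serial covariance
    return 2 * math.sqrt(-cov) * 10_000       # log-return units -> basis points
```

Bid-ask bounce produces the negative serial covariance that the estimator relies on. A trending series gives a positive covariance, and the function returns `NaN`.
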
| # | Variable | Formula / Reference |
|---|---|---|
| 8 | AC(1) | First-order autocorrelation of 1-minute log returns |
| 9 | VR(5) | Variance ratio: Var(5-min returns) / [5 × Var(1-min returns)] |
| 10 | VR(10) | Variance ratio: Var(10-min returns) / [10 × Var(1-min returns)] |

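
A sketch of AC(1) and VR(q) follows. It assumes that q-minute returns are non-overlapping sums of consecutive 1-minute log returns, which is valid because log returns add over time. It also assumes that both variances are sample variances and that any trailing partial block is dropped. The function names are hypothetical.

```python
from statistics import mean, variance

def ac1(returns):
    """First-order sample autocorrelation of a return series."""
    m = mean(returns)
    num = sum((a - m) * (b - m) for a, b in zip(returns[1:], returns[:-1]))
    den = sum((r - m) ** 2 for r in returns)
    return num / den

def variance_ratio(returns, q):
    """VR(q) = Var(q-period returns) / (q * Var(1-period returns)), non-overlapping blocks."""
    n = len(returns) // q * q                                # drop a trailing partial block
    agg = [sum(returns[i:i + q]) for i in range(0, n, q)]    # log returns add over time
    return variance(agg) / (q * variance(returns))
```

Under a random walk, VR(q) is close to 1 and AC(1) is close to 0. Mean reversion such as bid-ask bounce pushes VR(q) below 1 and AC(1) below 0.
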
| # | Variable | Formula / Reference |
|---|---|---|
| 11 | BNS z-statistic | $z = \dfrac{RV - BV}{\sqrt{\theta \cdot \max\left(1, QV/BV^2\right)}}$, with $\theta = \pi^2/4 + \pi - 5 \approx 0.609$ (Barndorff-Nielsen and Shephard 2006) |
| 12 | BNS jump (1%) | Indicator: 1 if z > 2.326 (one-sided 1% normal critical value), else 0 |
| 13 | BNS jump (5%) | Indicator: 1 if z > 1.645 (one-sided 5% normal critical value), else 0 |

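
The thresholds 2.326 and 1.645 are the upper one-sided critical values of the standard normal distribution. The sketch below derives them with the standard library and applies the jump indicator. It takes the z-statistic as a given input, and the function names are hypothetical.

```python
from statistics import NormalDist

def critical_value(alpha):
    """One-sided upper critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha)

def jump_indicator(z, alpha):
    """1 if the BNS z-statistic exceeds the one-sided critical value at level alpha, else 0."""
    return 1 if z > critical_value(alpha) else 0

print(round(critical_value(0.01), 3), round(critical_value(0.05), 3))  # 2.326 1.645
```

A day with z = 2.0 is flagged at the 5% level but not at the 1% level.
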
| # | Variable | Formula / Reference |
|---|---|---|
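| 14 | Amihud illiquidity | $\lvert r_{\text{daily}} \rvert \, / \, \text{dollar volume}$ (Amihud 2002) |
| 15 | Daily dollar volume | $\sum_i \text{Close}_i \times \text{Volume}_i$ over the day's 1-minute bars |
| 16 | Daily share volume | $\sum_i \text{Volume}_i$ |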
| 17 | Number of trades | Count of observed (non-filled) bars; a proxy, since one bar can aggregate many trades |

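
The liquidity variables can be computed in a single pass over one day's bars. This is a minimal sketch under two assumptions. First, bars are dicts keyed by the column names above. Second, the daily return used in the Amihud ratio is passed in explicitly, because the table does not state which daily return is meant. The function name and the output keys are hypothetical.

```python
def liquidity_measures(bars, r_daily):
    """Daily liquidity variables from one day's 1-minute bars.

    bars: dicts with 'Close', 'Volume' and (optionally) 'is_filled' keys.
    r_daily: daily return for the Amihud ratio (which return to use is an assumption).
    """
    dollar_volume = sum(b["Close"] * b["Volume"] for b in bars)
    share_volume = sum(b["Volume"] for b in bars)
    # Bars without an is_filled key (Versions 1 and 2) are treated as observed.
    observed_bars = sum(1 for b in bars if not b.get("is_filled", False))
    amihud = abs(r_daily) / dollar_volume if dollar_volume > 0 else float("nan")
    return {"amihud": amihud, "dollar_volume": dollar_volume,
            "share_volume": share_volume, "n_trades_proxy": observed_bars}
```
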
| # | Variable | Formula / Reference |
|---|---|---|
| 18 | Gap rate | Fraction of the 390-bar daily grid with no trade |
| 19 | Observed bars | Number of bars with actual trades |
| 20 | Filled bars | Number of bars filled by last-observation-carried-forward (LOCF) imputation (Version 3 only) |
| 21 | Longest gap | Maximum consecutive missing bars in the day |
| 22 | is_filled count | Count of filled bars in the day |
| 23 | Max bars since last trade | Largest gap between consecutive observed bars |

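
The grid-based variables and the LOCF fill are sketched below. The sketch assumes that observed bars are given as minute indices 0 to 389 on the regular-session grid. How Version 3 treats gaps before the first trade is not specified, so here leading gaps stay empty. All names are hypothetical.

```python
def gap_statistics(observed_minutes, grid_size=390):
    """Gap rate, observed/missing counts and longest run of missing bars on a daily grid."""
    observed = set(observed_minutes)
    flags = [i in observed for i in range(grid_size)]
    n_obs = sum(flags)
    longest = run = 0
    for has_trade in flags:
        run = 0 if has_trade else run + 1   # length of the current run of missing bars
        longest = max(longest, run)
    return {"gap_rate": 1 - n_obs / grid_size, "observed_bars": n_obs,
            "missing_bars": grid_size - n_obs, "longest_gap": longest}

def locf(values):
    """Carry the last non-None value forward; leading gaps stay None (assumption)."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out
```

If every missing bar is filled, `missing_bars` equals the filled-bar count. Leading gaps that stay unfilled make the two differ.
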
| # | Variable | Formula / Reference |
|---|---|---|
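| 24 | Open-to-close return | $\ln(\text{Close}_{\text{last}} / \text{Open}_{\text{first}})$ |
| 25 | Overnight return | $\ln(\text{Open}_{\text{today}} / \text{Close}_{\text{yesterday}})$ |
| 26 | Daily high-low range | $\ln(\text{High}_{\max} / \text{Low}_{\min})$, using the day's highest High and lowest Low |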
| 27 | Intraday return std | Standard deviation of 1-minute log returns |
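
A minimal sketch of the four return variables for one trading day follows. It assumes chronologically ordered bar dicts and takes the previous day's last close as an input. It also assumes that intraday returns are close-to-close log returns between consecutive bars and that the reported standard deviation is the sample standard deviation. The function name is hypothetical.

```python
import math
from statistics import stdev

def daily_return_measures(bars, prev_close):
    """Return variables for one day's chronologically ordered OHLC bars."""
    closes = [b["Close"] for b in bars]
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]  # assumed: close-to-close
    return {
        "open_to_close": math.log(bars[-1]["Close"] / bars[0]["Open"]),
        "overnight": math.log(bars[0]["Open"] / prev_close),
        "hl_range": math.log(max(b["High"] for b in bars) / min(b["Low"] for b in bars)),
        "intraday_std": stdev(rets) if len(rets) >= 2 else float("nan"),  # sample std (assumption)
    }
```
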