[ FAQ & Data Quality ]

> What data does MarketParquet provide?

We provide historical OHLCV (Open, High, Low, Close, Volume) data for US Stocks, ETFs, and Futures. Data is available in five timeframes:

  • 1-minute bars -- finest available granularity for intraday backtesting
  • 5-minute bars -- balance of detail and file size
  • 30-minute bars -- swing trading timeframe
  • 1-hour bars -- intermediate timeframe
  • Daily (EOD) bars -- end-of-day for longer-horizon analysis

All data is delivered as Apache Parquet files, partitioned by date.

> What is the date range?

Stock and ETF data covers January 2000 to present. Futures data covers December 2007 to present. New data is added every trading day after market close (~6:30 PM ET). Coverage includes all NYSE/NASDAQ-listed stocks, US-listed ETFs, and the 130+ most active futures contracts.

> How is data quality ensured?

Our data pipeline runs automated checks including:

  • Gap detection -- identifying missing trading days
  • Duplicate screening -- ensuring no repeated bars
  • Price validation -- flagging anomalous OHLCV values
  • Volume consistency -- checking for truncated or zero-volume bars

Zero-volume bars are excluded from the datasets. The resulting gaps are normal and reflect low-liquidity periods, which are common in small-cap stocks and pre/post-market sessions.
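When stitching bars into a continuous series, it can help to make those gaps explicit. A minimal pandas sketch (the column name and timestamps here are illustrative, not the guaranteed file schema):

```python
import pandas as pd

# Toy 1-minute close series with a missing bar at 09:32
# (zero-volume bars are dropped upstream, leaving a gap).
bars = pd.DataFrame(
    {"close": [100.0, 100.5, 101.0]},
    index=pd.to_datetime(["2024-01-15 09:30", "2024-01-15 09:31", "2024-01-15 09:33"]),
)

# Reindex onto the full minute grid so the gap shows up as NaN,
# then forward-fill if the strategy needs an unbroken series.
grid = pd.date_range("2024-01-15 09:30", "2024-01-15 09:33", freq="1min")
filled = bars.reindex(grid)
print(int(filled["close"].isna().sum()))  # → 1 missing minute
```

Whether to forward-fill, drop, or flag such gaps depends on the strategy; the important part is that the gaps are visible rather than silently absorbed.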

> How are stock splits and dividends handled?

Stock and ETF prices are adjusted for splits and dividends. Historical prices are retroactively adjusted so that price series are continuous and directly comparable over time. This is the standard format for backtesting -- no manual adjustment needed.
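As a concrete illustration of how back-adjustment works, consider a hypothetical 2-for-1 split: every bar before the split date is scaled by the adjustment factor 0.5, so the series shows no artificial 50% drop:

```python
# Hypothetical 2-for-1 split before the third bar: pre-split closes are
# multiplied by the 0.5 adjustment factor so the series stays continuous.
raw_close = [200.0, 202.0, 101.5]   # last bar trades at the new, post-split price
split_factor = 0.5
adjusted = [p * split_factor for p in raw_close[:2]] + raw_close[2:]
print(adjusted)  # → [100.0, 101.0, 101.5]
```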

> What timezone is the data in?

All timestamps are in US/Eastern time. Regular market hours are 9:30 AM - 4:00 PM ET. The 1-minute data includes pre-market (from 4:00 AM ET) and after-hours (until 8:00 PM ET) sessions where trades occurred.
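If your pipeline works in UTC, localize and then convert. This sketch assumes the stored timestamps are tz-naive US/Eastern (verify against the actual files before relying on it):

```python
import pandas as pd

# Timestamps are US/Eastern; localize them, then convert to UTC.
ts = pd.to_datetime(["2024-01-15 09:30", "2024-01-15 16:00"])  # EST (UTC-5) in January
utc = ts.tz_localize("US/Eastern").tz_convert("UTC")
print(utc[0])  # → 2024-01-15 14:30:00+00:00
```

Using a named zone like "US/Eastern" (rather than a fixed offset) keeps daylight-saving transitions correct across the year.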

> Why Parquet format?

Parquet is a columnar storage format that offers significant advantages over CSV for financial data:

  • ~5-10x smaller than equivalent CSV (Snappy compression)
  • Typed columns -- no parsing timestamps or floats from strings
  • Column pruning -- read only the columns you need
  • Native support in pandas, polars, DuckDB, Spark, and most analytics tools

> What file structure do you use?

  by_date/{asset}_{timeframe}/YYYY-MM-DD.parquet

  examples:
    by_date/stock_1min/2024-01-15.parquet     (all stocks, 1-min bars)
    by_date/etf_daily/2024-01-15.parquet      (all ETFs, daily bars)
    by_date/futures_5min/2024-01-15.parquet   (all futures, 5-min bars)

Each file contains all symbols for a single trading day. This makes it easy to load cross-sectional data (all stocks on a given day) or build time series by reading a range of dates.
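Because the layout is mechanical, paths for a date range can be generated directly. A small sketch of the layout described above:

```python
from datetime import date, timedelta

def file_path(asset: str, timeframe: str, day: date) -> str:
    """Build the by_date path for one trading day, per the layout above."""
    return f"by_date/{asset}_{timeframe}/{day.isoformat()}.parquet"

# Three consecutive calendar days of daily stock files
# (files simply do not exist for weekends and market holidays).
paths = [file_path("stock", "daily", date(2024, 1, 15) + timedelta(days=i))
         for i in range(3)]
print(paths[0])  # → by_date/stock_daily/2024-01-15.parquet
```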

> How many symbols are covered?

Coverage varies by date as securities are listed and delisted. Recent files typically contain roughly 7,000 stocks, 3,000 ETFs, and 130+ futures contracts. Historical files include delisted securities, providing survivorship-bias-free data for backtesting.

> What is the difference between Free and Pro?

  Feature                  Free                  Pro ($49/mo)
  -----------------------  --------------------  --------------------
  Daily (EOD) data         Full history          Full history
  Intraday (1min-1hour)    Last 30 days          Full history
  Asset types              Stock, ETF, Futures   Stock, ETF, Futures
  Downloads/day            5                     Unlimited
  API access               --                    Yes

> How do I download data?

Web: Browse the data by asset type and date, then click download.

API (Pro): Use your API key with curl or any HTTP client:

  # List available assets and timeframes
  curl -H "Authorization: Bearer bt_YOUR_KEY" \
    https://marketparquet.com/api/v1/assets

  # List available dates for an asset type
  curl -H "Authorization: Bearer bt_YOUR_KEY" \
    https://marketparquet.com/api/v1/dates/stock_1min

  # Get a presigned download URL (valid 60s)
  curl -H "Authorization: Bearer bt_YOUR_KEY" \
    https://marketparquet.com/api/v1/download/stock_1min/2024-01-15

  # Download the file directly
  URL=$(curl -s -H "Authorization: Bearer bt_YOUR_KEY" \
    https://marketparquet.com/api/v1/download/stock_1min/2024-01-15 \
    | python3 -c "import sys,json; print(json.load(sys.stdin)['download_url'])")
  curl -o stock_1min_2024-01-15.parquet "$URL"

Full API docs at /docs (Swagger UI).
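For scripted downloads, the same two-step flow (fetch a presigned URL, then download the file) can be sketched in Python using only the standard library. The endpoint paths are those shown in the curl examples above; `bt_YOUR_KEY` remains a placeholder:

```python
import json
import urllib.request

API = "https://marketparquet.com/api/v1"

def download_url_endpoint(asset_tf: str, day: str) -> str:
    """Endpoint that returns a presigned download URL (valid ~60s)."""
    return f"{API}/download/{asset_tf}/{day}"

def download_day(key: str, asset_tf: str, day: str, out_path: str) -> None:
    """Fetch the presigned URL for one trading day, then save the file."""
    req = urllib.request.Request(
        download_url_endpoint(asset_tf, day),
        headers={"Authorization": f"Bearer {key}"},
    )
    with urllib.request.urlopen(req) as resp:
        presigned = json.load(resp)["download_url"]
    urllib.request.urlretrieve(presigned, out_path)

# download_day("bt_YOUR_KEY", "stock_1min", "2024-01-15",
#              "stock_1min_2024-01-15.parquet")
```

Since the presigned URL expires in about 60 seconds, fetch it immediately before downloading rather than caching it.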

> How quickly is new data available?

Our pipeline runs daily after market close. New data is typically available by 6:30 PM ET on trading days. Weekends and market holidays are skipped.

> Can I use this data commercially?

Yes. Pro subscribers may use the data for personal trading, research, and internal business purposes. Redistribution of raw data files is not permitted.