Historical bar data missing data

I tried to use StockBarsRequest from alpaca-py to directly request 1-minute level data like this:

import os
from datetime import datetime

from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame, TimeFrameUnit

symbol = "NVDA"
timeframe = TimeFrame(1, TimeFrameUnit.Minute)

api_key = os.getenv("APCA_API_KEY_ID")
secret_key = os.getenv("APCA_API_SECRET_KEY")
client = StockHistoricalDataClient(api_key, secret_key)

os.makedirs("./data/landing", exist_ok=True)

# Naive datetimes; alpaca-py treats them as UTC
start_date = datetime(2024, 5, 8)
end_date = datetime(2024, 5, 10)

request_params = StockBarsRequest(
    symbol_or_symbols=[symbol], timeframe=timeframe, start=start_date, end=end_date
)
bars = client.get_stock_bars(request_params)

pandas_df = bars.df
pandas_df.reset_index(inplace=True)
pandas_df['timestamp'] = pandas_df['timestamp'].dt.tz_convert('America/New_York')

And the returned data is like this:

It seems that bars are missing for some minutes, such as 04:03:00, 04:05:00, …
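For anyone who wants to list such gaps rather than spot them by eye, here is a minimal sketch. The helper name `missing_minutes` and the timestamps are made up for illustration (they are not real Alpaca output); the idea is simply to compare the bar timestamps against a complete one-minute grid.

```python
import pandas as pd

def missing_minutes(timestamps, start, end):
    """Return the minutes in [start, end) that have no bar (hypothetical helper)."""
    expected = pd.date_range(start=start, end=end, freq="1min", inclusive="left")
    return expected.difference(pd.DatetimeIndex(timestamps))

# Synthetic bar timestamps for illustration only
bar_times = pd.to_datetime(
    ["2024-05-08 04:00", "2024-05-08 04:01", "2024-05-08 04:02",
     "2024-05-08 04:04", "2024-05-08 04:06"]
).tz_localize("America/New_York")

gaps = missing_minutes(
    bar_times,
    start=pd.Timestamp("2024-05-08 04:00", tz="America/New_York"),
    end=pd.Timestamp("2024-05-08 04:07", tz="America/New_York"),
)
print(gaps)  # the 04:03 and 04:05 minutes have no bar
```

With a real result you would pass the `timestamp` column of the bars DataFrame instead of `bar_times`.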

So I tried to directly download the raw trade records and manually aggregate them like this:

from alpaca.data.requests import StockTradesRequest

start_date = datetime(2024, 5, 8, 7, 30, 0)
end_date = datetime(2024, 5, 8, 8, 30, 0)

# Trade requests take no timeframe; only symbols and a time range
request_params = StockTradesRequest(
    symbol_or_symbols=[symbol], start=start_date, end=end_date
)
trades = client.get_stock_trades(request_params)

trades_df = trades.df
trades_df.reset_index(inplace=True)
trades_df['timestamp'] = trades_df['timestamp'].dt.tz_convert('America/New_York')
trades_df.set_index('timestamp', inplace=True)

minute_aggregations = trades_df.resample('1min').agg({
    'size': 'sum',    # Sum of size for volume
    'id': 'count',    # Count of IDs for trade count
    'price': 'ohlc'   # Open, high, low, close of price
})

print("Individual minute volumes and trade counts:")
minute_aggregations.head(10)

And we can get something like this:

Individual minute volumes and trade counts:
                             size   id   price
                             size   id    open    high     low   close
timestamp
2024-05-08 04:00:00-04:00  1388.0  117  904.13  905.54  904.13  905.00
2024-05-08 04:01:00-04:00  1330.0  120  904.70  904.80  903.79  903.96
2024-05-08 04:02:00-04:00  1690.0   93  903.96  904.27  902.50  902.54
2024-05-08 04:03:00-04:00  1094.0   81  902.80  902.80  901.80  902.08
2024-05-08 04:04:00-04:00   902.0   72  902.22  903.33  902.19  903.33
2024-05-08 04:05:00-04:00   904.0   54  902.97  902.98  902.03  902.30
2024-05-08 04:06:00-04:00  1815.0   94  902.40  902.40  901.53  901.97
2024-05-08 04:07:00-04:00   583.0   56  901.97  903.00  901.97  902.67
2024-05-08 04:08:00-04:00   698.0   52  903.00  903.46  902.45  903.46
2024-05-08 04:09:00-04:00   634.0   36  903.43  903.84  902.87  903.68

As you can see, for the minutes that do appear in the StockBarsRequest result, the size matches. The trade data also contains valid records for the minutes that are absent from the bars, such as 04:03:00 and 04:05:00.

So is this a problem with how the data is aggregated in Alpaca’s backend? I’m confused.

@mag1cfrog Bars are only created if there are ‘valid’ trades during the bar’s interval. No valid trades, no bar. Not all trades are equal: each is categorized by its associated ‘trade conditions’, and one must look at those conditions to determine whether a trade is included in a specific bar calculation. This is true for all data providers, not just Alpaca.

The trades reported by the Securities Information Processor (SIP) fall into two general categories:

  • informational
  • actual order executions

For example, conditions M and Q are Market Center Official Close and Market Center Official Open respectively. Those are just informational, and the actual close, or open, trade appears as a separate ‘trade’. To avoid double counting, informational ‘trades’ are excluded from bar calculations.

Of the ‘actual’ trades, not all are what one would consider a normal retail execution. For example, condition R is a ‘Seller’ trade. These trades do not settle in the typical T+1 settlement time, but rather give the seller the option to deliver the shares within a specified time of up to 60 days. Condition 4 is a ‘Derivatively Priced’ trade, which is priced not from the current quote but from some pre-defined price. Because these trades do not execute at the same prices as ‘regular’ trades, they are typically excluded from bar calculations.

The largest portion of trades excluded from bar calculations are ‘Odd Lot’ trades of fewer than 100 shares. These have a trade condition I and are excluded from all bar calculations.
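To make the effect concrete, here is a rough sketch of filtering trades by condition before aggregating. It is not Alpaca's actual aggregation logic. The exclusion set holds only the conditions mentioned above, and the `conditions` column (a list of codes per trade), the helper names, and the trade data are all assumptions made for illustration. The real rules are more granular, so check the docs before relying on this.

```python
import pandas as pd

# Conditions named in this reply; an assumed exclusion set, not the full rule table
EXCLUDED_CONDITIONS = {"M", "Q", "R", "4", "I"}

def is_bar_eligible(conditions):
    """True if none of the trade's conditions is in the exclusion set."""
    return not (set(conditions) & EXCLUDED_CONDITIONS)

def aggregate_valid_trades(trades):
    """Build 1-minute bars from eligible trades only (hypothetical helper)."""
    valid = trades[trades["conditions"].apply(is_bar_eligible)]
    price = valid["price"].resample("1min").ohlc()
    volume = valid["size"].resample("1min").sum().rename("volume")
    count = valid["price"].resample("1min").count().rename("trade_count")
    bars = pd.concat([price, volume, count], axis=1)
    # No valid trades, no bar: drop the minutes that ended up empty
    return bars[bars["trade_count"] > 0]

# Synthetic trades for illustration only; 04:03 contains nothing but odd lots
trades = pd.DataFrame(
    {
        "price": [904.13, 905.00, 902.80, 902.08, 902.22],
        "size": [200, 100, 50, 30, 300],
        "conditions": [["@"], ["@", "T"], ["@", "I"], ["@", "T", "I"], ["@"]],
    },
    index=pd.to_datetime(
        ["2024-05-08 04:00:05", "2024-05-08 04:00:40",
         "2024-05-08 04:03:10", "2024-05-08 04:03:50",
         "2024-05-08 04:04:20"]
    ).tz_localize("America/New_York"),
)

bars = aggregate_valid_trades(trades)
print(bars)
```

In this toy data the 04:03 minute has two trades, but both carry condition I, so no bar is produced for it even though the raw trade records exist.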

There is more detail on how bars are calculated in this article and here in the docs.

That explains 1) why there are sometimes ‘missing’ bars and 2) why simply adding up all individual trades won’t match the bar data.