Historical bar data missing data

mag1cfrog · September 21, 2024, 8:01am

I tried to use StockBarsRequest from alpaca-py to request 1-minute bars directly, like this:

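```python
import os
from datetime import datetime

from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame, TimeFrameUnit

symbol = "NVDA"
timeframe = TimeFrame(1, TimeFrameUnit.Minute)

# API credentials are read from environment variables
api_key = os.getenv("APCA_API_KEY_ID")
secret_key = os.getenv("APCA_API_SECRET_KEY")

client = StockHistoricalDataClient(api_key, secret_key)

os.makedirs("./data/landing", exist_ok=True)

start_date = datetime(2024, 5, 8)
end_date = datetime(2024, 5, 10)

request_params = StockBarsRequest(
    symbol_or_symbols=[symbol], timeframe=timeframe, start=start_date, end=end_date
)

bars = client.get_stock_bars(request_params)

# bars.df is indexed by (symbol, timestamp); flatten it and convert timestamps to New York time
pandas_df = bars.df
pandas_df.reset_index(inplace=True)
pandas_df["timestamp"] = pandas_df["timestamp"].dt.tz_convert("America/New_York")
```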

And the returned data is like this:

It seems we are missing data points, such as the minutes 04:03:00, 04:05:00, …
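One way to list exactly which minutes are absent is to compare the bar timestamps against a complete minute grid. This is a minimal sketch, assuming a DataFrame with a tz-aware `timestamp` column like `pandas_df` above; `find_missing_minutes` is a hypothetical helper and the sample bars are made up.

```python
import pandas as pd


def find_missing_minutes(bars: pd.DataFrame, start: str, end: str,
                         tz: str = "America/New_York") -> pd.DatetimeIndex:
    """Return the minutes in [start, end) that have no row in `bars`."""
    # Every minute we would expect a bar for, localized to `tz`
    expected = pd.date_range(start=start, end=end, freq="1min", tz=tz, inclusive="left")
    # Minutes that actually came back from the bars request
    present = pd.DatetimeIndex(bars["timestamp"])
    return expected.difference(present)


# Made-up bars with gaps at 04:03 and 04:05, mirroring the question
ts = pd.to_datetime([
    "2024-05-08 04:00", "2024-05-08 04:01", "2024-05-08 04:02",
    "2024-05-08 04:04", "2024-05-08 04:06",
]).tz_localize("America/New_York")
bars = pd.DataFrame({"timestamp": ts, "volume": [100, 200, 300, 400, 500]})

missing = find_missing_minutes(bars, "2024-05-08 04:00", "2024-05-08 04:07")
print([t.strftime("%H:%M") for t in missing])  # ['04:03', '04:05']
```

The grid covers every minute in the window, including minutes where the stock simply may not have traded, so a gap flagged this way is not by itself evidence of lost data.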

So I tried to directly download the raw trade records and manually aggregate them like this:

```python
from datetime import datetime

from alpaca.data.requests import StockTradesRequest

# Reuses `client` and `symbol` from the previous snippet
start_date = datetime(2024, 5, 8, 7, 30, 0)
end_date = datetime(2024, 5, 8, 8, 30, 0)

# Trades have no timeframe, so only the symbol and time window are passed
request_params = StockTradesRequest(
    symbol_or_symbols=[symbol], start=start_date, end=end_date
)

trades = client.get_stock_trades(request_params)

trades_df = trades.df
trades_df.reset_index(inplace=True)
trades_df["timestamp"] = trades_df["timestamp"].dt.tz_convert("America/New_York")
trades_df.set_index("timestamp", inplace=True)

# Aggregate every trade into 1-minute buckets, regardless of trade conditions
minute_aggregations = trades_df.resample("1min").agg({
    "size": "sum",    # Sum of size for volume
    "id": "count",    # Count of IDs for trade count
    "price": "ohlc",  # Open, high, low, close of price
})

print("Individual minute volumes and trade counts:")
print(minute_aggregations.head(10))
```

And we can get something like this:

```
Individual minute volumes and trade counts:
                             size   id    price
                             size   id    open    high     low   close
timestamp
2024-05-08 04:00:00-04:00  1388.0  117  904.13  905.54  904.13  905.00
2024-05-08 04:01:00-04:00  1330.0  120  904.70  904.80  903.79  903.96
2024-05-08 04:02:00-04:00  1690.0   93  903.96  904.27  902.50  902.54
2024-05-08 04:03:00-04:00  1094.0   81  902.80  902.80  901.80  902.08
2024-05-08 04:04:00-04:00   902.0   72  902.22  903.33  902.19  903.33
2024-05-08 04:05:00-04:00   904.0   54  902.97  902.98  902.03  902.30
2024-05-08 04:06:00-04:00  1815.0   94  902.40  902.40  901.53  901.97
2024-05-08 04:07:00-04:00   583.0   56  901.97  903.00  901.97  902.67
2024-05-08 04:08:00-04:00   698.0   52  903.00  903.46  902.45  903.46
2024-05-08 04:09:00-04:00   634.0   36  903.43  903.84  902.87  903.68
```

As you can see, the size matches what we get from StockBarsRequest for the minutes that do exist, and for minutes like 04:03:00 and 04:05:00 we also have valid data.

So is this a problem with how the data gets aggregated in Alpaca’s backend? I’m confused.

Dan_Whitnable_Alpaca · September 22, 2024, 12:15pm

@mag1cfrog Bars are only created if there are ‘valid’ trades during the bar. No valid trades, no bar. Not all trades are equal; each one is tagged with ‘trade conditions’. One must look at a trade’s conditions to determine whether it is included in a specific bar calculation. This is true for all data providers, not just Alpaca.
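As a rough illustration of ‘no valid trades, no bar’, the sketch below aggregates only trades marked valid and drops any minute left empty. The boolean `valid` column and the `build_minute_bars` helper are hypothetical stand-ins for the real trade-condition rules, and the trades are made up.

```python
import pandas as pd


def build_minute_bars(trades: pd.DataFrame) -> pd.DataFrame:
    """Aggregate trades into 1-minute OHLCV bars.

    `trades` is indexed by timestamp and has "price", "size" and a boolean
    "valid" column (a hypothetical flag standing in for trade-condition rules).
    """
    valid = trades[trades["valid"]]
    grouped = valid.resample("1min")
    bars = grouped["price"].ohlc()
    bars["volume"] = grouped["size"].sum()
    bars["trade_count"] = grouped["size"].count()
    # resample() still emits empty minutes (NaN prices, zero volume):
    # no valid trades, no bar, so drop them
    return bars[bars["trade_count"] > 0]


idx = pd.to_datetime([
    "2024-05-08 04:00:10", "2024-05-08 04:00:40",
    "2024-05-08 04:01:15", "2024-05-08 04:02:05",
]).tz_localize("America/New_York")
trades = pd.DataFrame({
    "price": [10.00, 10.50, 10.20, 10.10],
    "size": [100, 200, 50, 100],
    "valid": [True, True, False, True],  # the only 04:01 trade is not valid
}, index=idx)

bars = build_minute_bars(trades)
print(bars.index.strftime("%H:%M").tolist())  # ['04:00', '04:02']
print(bars["volume"].tolist())               # [300, 100]
```

The 04:01 minute had a trade, but because that trade is not valid the minute produces no bar at all, which is the same pattern as the ‘missing’ minutes in the question.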

The trades reported by the Securities Information Processor (SIP) fall into two general categories:

informational
actual order executions

For example, conditions M and Q are Market Center Official Close and Market Center Official Open, respectively. These are purely informational; the actual closing or opening trade appears as a separate ‘trade’. To avoid double counting, informational ‘trades’ are excluded from bar calculations.

Of the ‘actual’ trades, not all are what one would consider a normal retail execution. For example, condition R is a ‘Seller’ trade. These trades do not settle in the typical T+1 settlement cycle; instead, they give the seller the option to deliver the shares within a specified time of up to 60 days. Condition 4 is a ‘Derivatively Priced’ trade, meaning its price is based not on the current quote but on some predefined price. Because these trades do not execute at the same prices as ‘regular’ trades, they are typically excluded from bar calculations.

The largest share of trades excluded from bar calculations are ‘Odd Lot’ trades of fewer than 100 shares. These carry trade condition I and are excluded from all bar calculations.
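A minimal filtering sketch, assuming each trade’s `conditions` value is a list of code strings (as in the `conditions` column of the alpaca-py trades DataFrame). It uses only the condition codes mentioned in this reply and treats each one as fully excluded from the bar; the complete, per-condition rules are in Alpaca’s documentation and are not reproduced here. The `is_bar_eligible` helper and the sample trades are hypothetical.

```python
import pandas as pd

# Only the condition codes named in this reply; this is NOT the full exclusion list
EXCLUDED_CONDITIONS = {
    "M",  # Market Center Official Close (informational)
    "Q",  # Market Center Official Open (informational)
    "R",  # Seller
    "4",  # Derivatively Priced
    "I",  # Odd Lot
}


def is_bar_eligible(conditions) -> bool:
    """True if none of the trade's condition codes is in EXCLUDED_CONDITIONS."""
    return EXCLUDED_CONDITIONS.isdisjoint(conditions or ())


# Made-up trades; "@" is used here only as a placeholder for a code not in the set
trades = pd.DataFrame({
    "price": [10.00, 10.05, 10.10, 10.20],
    "size": [100, 37, 200, 5000],
    "conditions": [["@"], ["@", "I"], ["@"], ["M"]],
})
eligible = trades[trades["conditions"].apply(is_bar_eligible)]
print(eligible["size"].sum())  # 300
```

Here the odd-lot trade (37 shares, condition I) and the informational official-close print (condition M) are dropped, so only 300 of the 5,337 shares would count toward the bar.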

There is more detail on how bars are calculated in this article and here in the docs.

That explains 1) why there are sometimes ‘missing’ bars and 2) why simply adding up all individual trades won’t match the bar data.
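To see point 2) concretely, one can line up per-minute volume computed from raw trades against the bar volume and keep only the minutes that disagree. This is a minimal sketch with made-up numbers; `compare_volumes` and the `trade_volume` column name are hypothetical, not part of alpaca-py.

```python
import pandas as pd


def compare_volumes(trade_minutes: pd.DataFrame, bars: pd.DataFrame) -> pd.DataFrame:
    """Return the minutes where raw-trade volume and bar volume disagree.

    trade_minutes: minute index, "trade_volume" column (all trades summed)
    bars:          minute index, "volume" column (bar data)
    A minute with no bar is treated as bar volume 0.
    """
    merged = trade_minutes[["trade_volume"]].join(bars[["volume"]], how="left")
    merged["volume"] = merged["volume"].fillna(0)
    merged["difference"] = merged["trade_volume"] - merged["volume"]
    return merged[merged["difference"] != 0]


minutes = pd.date_range("2024-05-08 04:00", periods=4, freq="1min", tz="America/New_York")
trade_minutes = pd.DataFrame({"trade_volume": [500, 120, 300, 80]}, index=minutes)
# Made-up bars: 04:01 and 04:03 have no bar, 04:02 excludes some volume
bars = pd.DataFrame({"volume": [500, 250]}, index=minutes[[0, 2]])

mismatches = compare_volumes(trade_minutes, bars)
print(mismatches.index.strftime("%H:%M").tolist())  # ['04:01', '04:02', '04:03']
print(mismatches["difference"].tolist())            # [120.0, 50.0, 80.0]
```

A left join keeps every minute that had trades, so minutes with no bar at all surface alongside minutes whose bar volume is only partly reduced by excluded trades.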
