I tried to use `StockBarsRequest` from alpaca-py to request 1-minute bars directly, like this:
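```python
import os
from datetime import datetime

from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame, TimeFrameUnit

symbol = "NVDA"
timeframe = TimeFrame(1, TimeFrameUnit.Minute)
api_key = os.getenv("APCA_API_KEY_ID")
secret_key = os.getenv("APCA_API_SECRET_KEY")
client = StockHistoricalDataClient(api_key, secret_key)
os.makedirs("./data/landing", exist_ok=True)

# Timezone-naive datetimes; alpaca-py treats these as UTC
end_date = datetime(2024, 5, 10)
start_date = datetime(2024, 5, 8)
request_params = StockBarsRequest(
    symbol_or_symbols=[symbol], timeframe=timeframe, start=start_date, end=end_date
)
bars = client.get_stock_bars(request_params)
pandas_df = bars.df
pandas_df.reset_index(inplace=True)  # move the (symbol, timestamp) index into columns
pandas_df["timestamp"] = pandas_df["timestamp"].dt.tz_convert("America/New_York")
```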
The returned data looks like this:
It looks like some minutes are missing, e.g. 04:03:00, 04:05:00, …
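To list exactly which minutes have no bar, one option is to compare the returned timestamps against a complete 1-minute range. This is a minimal sketch: `find_missing_minutes` is a hypothetical helper, and the timestamps below are made up for illustration (not real Alpaca output). With the real data you would pass `pandas_df["timestamp"]` for a single symbol. It only identifies the gaps; it does not explain why they occur.

```python
import pandas as pd


def find_missing_minutes(timestamps, start, end, tz="America/New_York"):
    """Return the 1-minute slots in [start, end] that have no bar timestamp."""
    # Every minute we would expect a bar for, in the given time zone
    expected = pd.date_range(start=start, end=end, freq="1min", tz=tz)
    # Bar timestamps actually returned (must be tz-aware), in the same time zone
    present = pd.DatetimeIndex(timestamps).tz_convert(tz)
    return expected.difference(present)


# Illustrative, made-up timestamps (hypothetical, not real Alpaca output)
bar_times = pd.Series(
    pd.to_datetime([
        "2024-05-08 04:00", "2024-05-08 04:01", "2024-05-08 04:02",
        "2024-05-08 04:04", "2024-05-08 04:06",
    ]).tz_localize("America/New_York")
)
missing = find_missing_minutes(bar_times, "2024-05-08 04:00", "2024-05-08 04:06")
print(list(missing.strftime("%H:%M")))  # ['04:03', '04:05']
```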
So I downloaded the raw trades directly and aggregated them into minutes myself:
```python
from alpaca.data.requests import StockTradesRequest

# Reuses symbol, client and the datetime import from the snippet above.
# 07:30-08:30 UTC on 2024-05-08, i.e. 03:30-04:30 in New York
start_date = datetime(2024, 5, 8, 7, 30, 0)
end_date = datetime(2024, 5, 8, 8, 30, 0)
# StockTradesRequest has no timeframe parameter: trades come back one by one
request_params = StockTradesRequest(
    symbol_or_symbols=[symbol], start=start_date, end=end_date
)
trades = client.get_stock_trades(request_params)
trades_df = trades.df
trades_df.reset_index(inplace=True)
trades_df["timestamp"] = trades_df["timestamp"].dt.tz_convert("America/New_York")
trades_df.set_index("timestamp", inplace=True)

# Aggregate the raw trades into 1-minute buckets
minute_aggregations = trades_df.resample("1min").agg({
    "size": "sum",    # sum of trade sizes = volume
    "id": "count",    # number of trade IDs = trade count
    "price": "ohlc",  # open, high, low, close of price
})
print("Individual minute volumes and trade counts:")
print(minute_aggregations.head(10))
```
This produces:
Individual minute volumes and trade counts:

| timestamp | size (volume) | id (trade count) | price: open | price: high | price: low | price: close |
|---|---|---|---|---|---|---|
| 2024-05-08 04:00:00-04:00 | 1388.0 | 117 | 904.13 | 905.54 | 904.13 | 905.00 |
| 2024-05-08 04:01:00-04:00 | 1330.0 | 120 | 904.70 | 904.80 | 903.79 | 903.96 |
| 2024-05-08 04:02:00-04:00 | 1690.0 | 93 | 903.96 | 904.27 | 902.50 | 902.54 |
| 2024-05-08 04:03:00-04:00 | 1094.0 | 81 | 902.80 | 902.80 | 901.80 | 902.08 |
| 2024-05-08 04:04:00-04:00 | 902.0 | 72 | 902.22 | 903.33 | 902.19 | 903.33 |
| 2024-05-08 04:05:00-04:00 | 904.0 | 54 | 902.97 | 902.98 | 902.03 | 902.30 |
| 2024-05-08 04:06:00-04:00 | 1815.0 | 94 | 902.40 | 902.40 | 901.53 | 901.97 |
| 2024-05-08 04:07:00-04:00 | 583.0 | 56 | 901.97 | 903.00 | 901.97 | 902.67 |
| 2024-05-08 04:08:00-04:00 | 698.0 | 52 | 903.00 | 903.46 | 902.45 | 903.46 |
| 2024-05-08 04:09:00-04:00 | 634.0 | 36 | 903.43 | 903.84 | 902.87 | 903.68 |
As you can see, the volume (summed `size`) matches what `StockBarsRequest` returns for the minutes that do have bars, and minutes such as 04:03:00 and 04:05:00, which have no bar, clearly do have trades.
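One way to check this systematically, rather than by eye, is to line up the per-minute trade volume against the bar volume and flag minutes that had trades but no bar. This is a minimal sketch: `compare_minute_volume` is a hypothetical helper, the data below is made up for illustration, and it assumes both inputs use the same time zone. It quantifies the discrepancy but does not explain its cause.

```python
import pandas as pd


def compare_minute_volume(trades, bars):
    """Line up per-minute volume from raw trades against bar volume.

    trades: DataFrame indexed by tz-aware trade timestamp, with a 'size' column.
    bars: DataFrame with 'timestamp' and 'volume' columns (bars.df after reset_index()).
    """
    trade_volume = trades["size"].resample("1min").sum().rename("trade_volume")
    bar_volume = bars.set_index("timestamp")["volume"].rename("bar_volume")
    # Outer-join on the minute timestamp so minutes present in only one source survive
    out = pd.concat([trade_volume, bar_volume], axis=1)
    out["volume_diff"] = out["trade_volume"] - out["bar_volume"]
    # Minutes that had trades but no bar at all
    out["missing_bar"] = out["bar_volume"].isna() & (out["trade_volume"] > 0)
    return out


tz = "America/New_York"
# Illustrative, made-up data (hypothetical, not real Alpaca output)
trades = pd.DataFrame(
    {"size": [100, 50, 30, 20]},
    index=pd.to_datetime([
        "2024-05-08 04:00:10", "2024-05-08 04:00:40",
        "2024-05-08 04:01:05", "2024-05-08 04:02:30",
    ]).tz_localize(tz),
)
bars = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-08 04:00", "2024-05-08 04:02"]).tz_localize(tz),
    "volume": [150, 20],
})
result = compare_minute_volume(trades, bars)
print(result)
print(result[result["missing_bar"]].index)
```

With the real data you would pass `trades_df` and only the rows of `pandas_df` that fall inside the same one-hour window; otherwise every bar outside that window shows up with no trade volume.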
So is this related to how Alpaca aggregates the data on its backend? I'm confused.