Hello, while running some backtests I noticed that some of the minute bars are missing, specifically at the market open. I can only assume there will be missing minute bars during the rest of the trading day as well. I am using the SIP feed when fetching the historical bars.
2023-12-01 16:48:12,558 WARNING:First bar is not 14:30 for symbol: FI Close High Low n Open Volume vw
Timestamp
2023-11-10 14:31:00+00:00 119.62 119.95 119.62 33 119.95 18653 119.919696
Have there been any updates on this topic? I am seeing missing minutes when querying QQQ data from the IEX feed. The SIP feed has no missing data. I assume that IEX just did not have any activity in that one-minute interval. This creates problems when generating plots with multiple tickers: the number of samples per ticker varies and is not 390 per day.
@torobot There will always be ‘missing bars’. Fundamentally, bars are only created if there are ‘valid’ trades during the bar. No valid trades, no bar. Even with highly traded symbols such as SPY there can be a ‘missing’ bar every once in a while. You can read a bit more on how bars are created and what constitutes a ‘valid trade’ in this article.
Using IEX data there will be more missing bars than when using SIP data. Why? There simply are fewer trades executed on the IEX exchange. SIP data includes all trades from all execution venues. The trades executed on the IEX exchange are only a small subset of these (about 2%). As long as one sets an end datetime at least 15 minutes in the past, one can specify feed=sip and get full market bars even with a free data subscription.
For applications which don’t respond well to ‘missing’ bars, a typical approach is to forward fill the data. In Python, pandas has several useful methods for this. Check out ffill and resample.
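As a minimal sketch of what resample and ffill do together, here is a toy example. The prices and volumes are made up for illustration, and the frame has a single symbol with a plain timestamp index rather than the symbol/timestamp index the API returns.

```python
import pandas as pd

# Toy minute bars with the 09:32 bar missing (values are made up for illustration).
idx = pd.DatetimeIndex(
    ["2023-11-10 09:30", "2023-11-10 09:31", "2023-11-10 09:33"],
    tz="America/New_York",
    name="timestamp",
)
bars = pd.DataFrame(
    {"close": [100.0, 100.5, 101.0], "volume": [500, 300, 200]},
    index=idx,
)

# resample creates one row per minute; ffill copies the previous bar into each gap.
filled = bars.resample("1min").ffill()
print(filled)
```

Note that ffill copies every column, volume included, so a filled bar repeats the previous bar's volume.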
All understood, Dan. I was just hoping that the fills could be done on the Alpaca side (as they are with another IEX data provider I have been testing with). The difficulty with using the Python fill is filling in only the missing minutes during trading hours, on valid trading days. I am running a backtest with two or three years' worth of 1-minute data and the fill process is complex.
@torobot The forward fill process shouldn’t be too complex, but it does entail two steps: 1) forward fill the data, then 2) filter the data for only ‘market hour’ minutes.
As you realized, the resample method creates and forward fills minutes for non-market hours, including weekends and holidays. So this will get you the ‘big list’ of all resampled and forward filled minutes:
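import pandas as pd
from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame

# client is assumed to be an authenticated data client, e.g.
# client = StockHistoricalDataClient(API_KEY, SECRET_KEY)
symbol_list = ['IBM', 'SPY']

# set end to midnight today (New York time), then start to 10 days before that
end = pd.Timestamp.now(tz='America/New_York').normalize()
start = end - pd.Timedelta(days=10)

# fetch the minute bars. symbol_list should contain all the symbols you want
bars = client.get_stock_bars(StockBarsRequest(
    symbol_or_symbols=symbol_list,
    timeframe=TimeFrame.Minute,
    start=start,
    end=end,
    adjustment='split',
)).df.tz_convert('America/New_York', level='timestamp')

# resample each symbol to 1-minute frequency and forward fill the gaps
resampled_bars = bars.reset_index('symbol').groupby('symbol').resample('1min').ffill()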
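One side effect of a plain forward fill is that each filled minute repeats the previous bar's open, high, low, and volume, even though no trades happened. If that matters for a backtest, a common variant is to fill gaps with a flat bar at the last close and zero volume. The sketch below is one way to do that for a single symbol. `fill_gaps_flat` is a hypothetical helper name, and it assumes a timestamp index and lowercase `open`, `high`, `low`, `close`, `volume` columns.

```python
import pandas as pd

def fill_gaps_flat(bars: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing minutes with flat, zero-volume bars."""
    # Build every minute between the first and last bar (keeps the timezone).
    full_index = pd.date_range(
        bars.index.min(), bars.index.max(), freq="1min", name=bars.index.name
    )
    out = bars.reindex(full_index)
    # Carry the last traded price forward as the close of each missing bar.
    out["close"] = out["close"].ffill()
    # A bar with no trades is flat: open, high, and low all equal that close.
    for col in ["open", "high", "low"]:
        out[col] = out[col].fillna(out["close"])
    # No trades means no volume.
    out["volume"] = out["volume"].fillna(0)
    return out
```

Like the resample above, this still creates rows for non-market minutes between the first and last bar, so the market hours filter is still needed afterwards.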
You are correct that it takes a bit of logic to figure out which minutes are market hour minutes and disregard the rest. There is, however, a very helpful package called exchange_calendars, which has a number of handy methods including is_trading_minute. Pass it a timestamp and it will return True if markets were open at that time. Something like this could then filter resampled_bars to return just the market_hour_bars.
# install and import exchange_calendars (the leading ! runs pip from a notebook cell)
!pip install -q exchange_calendars
import exchange_calendars
# instantiate a calendar object
us_markets = exchange_calendars.get_calendar("XNYS")
# create columns for bar timestamp and another if that timestamp is during market hours
resampled_bars['bar_timestamp'] = resampled_bars.index.get_level_values(level='timestamp')
resampled_bars['is_market_hour_bar'] = resampled_bars.bar_timestamp.apply(us_markets.is_trading_minute)
# filter for market hour bars using the query method
market_hour_bars = resampled_bars.query('is_market_hour_bar')
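To check that the result really has a consistent number of samples per ticker (390 for a regular session, fewer on early-close days), a quick count per symbol and date can help. This is a rough sketch: `bars_per_session` is a hypothetical helper name, and it assumes the frame has `symbol` and `timestamp` index levels with timestamps already in New York time.

```python
import pandas as pd

def bars_per_session(bars: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: count bars per symbol per calendar date."""
    symbols = bars.index.get_level_values("symbol")
    timestamps = bars.index.get_level_values("timestamp")
    # .date uses the index's own timezone, so convert to New York time first.
    sessions = pd.Index(timestamps.date, name="session")
    return bars.groupby([symbols, sessions]).size()
```

Calling `bars_per_session(market_hour_bars)` and looking for values other than 390 is a quick way to spot sessions that still have gaps, for example a day where the first bars are missing and there was nothing earlier to fill from.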