I reached out on the Alpaca Slack and here’s the answer from Dan Whitnable that helped me:
There isn’t any issue with missing Polygon data (at least not that I have ever seen). The issue is that the polygon.historic_agg_v2 method limits the number of rows it returns. If one requests data between two dates and the result would exceed that limit, it is simply truncated. Unfortunately this happens ‘silently’, without any error indicating that the result is incomplete. The exact cutoff also isn’t documented.
That is why it looked like several days were missing from the data: they had been truncated. One can verify this behavior by requesting incrementally longer timeframes of data. Start by requesting 1 day, then 2, and so forth. Everything will be correct up to about 10 days (depending on how the weekends fall). Beyond that, the data for the last dates are dropped.
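To make that experiment less manual, a response can be checked for this kind of silent truncation by comparing the last date it actually contains with the last weekday that was requested. The sketch below is a heuristic, not part of the Alpaca or Polygon API: the function names are hypothetical, it only knows about weekends, and a requested range that ends on a market holiday will be reported as truncated even when it isn't. One would pass it the dates of the returned bars' timestamps.

```python
from datetime import date, timedelta


def last_weekday_on_or_before(day):
    """Step back from a Saturday or Sunday to the preceding Friday."""
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def looks_truncated(requested_end, returned_dates):
    """Heuristic check for a silently truncated response (hypothetical helper).

    Returns True when the latest returned date falls before the last weekday
    that was requested. Market holidays are NOT accounted for, so a range
    ending on a holiday produces a false positive.
    """
    if not returned_dates:
        return True
    return max(returned_dates) < last_weekday_on_or_before(requested_end)
```

Running it on windows of 1, 2, 3, ... days reproduces the experiment described above: it should start returning True once the window grows past the cutoff.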
I’m sure there are more elegant solutions to get around this behavior, but I use a small function to fetch data one day at a time, like this:
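```python
import pandas as pd
import alpaca_trade_api as tradeapi

api = tradeapi.REST()  # assumes API keys are set in environment variables

def get_hourly_history(ticker, from_date, to_date):
    """Get Polygon data between two dates and return it in a dataframe."""
    frames = []
    for day in pd.date_range(start=from_date, end=to_date):
        # Request one day per call so the response stays under the row limit
        data = api.polygon.historic_agg_v2(ticker, 1, 'hour', _from=day, to=day).df
        frames.append(data)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    return pd.concat(frames)
```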
That will fetch one day at a time and return all the data in a single dataframe. It could be sped up by fetching data in longer ‘chunks’ of days, and only fetching data for trading days, but this is simple.
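Here is a rough sketch of the chunked variant. It splits the requested range into groups of weekdays and calls a fetch function once per group. The function names are hypothetical, weekends are skipped but market holidays are not, and `chunk_size=5` is an assumption chosen to stay under the roughly 10-day cutoff observed for hourly bars. Since the real limit is undocumented, it is worth checking against your own data.

```python
from datetime import date, timedelta


def trading_day_chunks(start, end, chunk_size=5):
    """Split the weekdays from start to end (inclusive) into (first, last)
    pairs covering at most chunk_size weekdays each.

    Weekends are skipped; market holidays are NOT (they are treated as
    ordinary weekdays and will simply return no rows).
    """
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(d)
        d += timedelta(days=1)
    return [
        (days[i], days[min(i + chunk_size, len(days)) - 1])
        for i in range(0, len(days), chunk_size)
    ]


def fetch_in_chunks(fetch, start, end, chunk_size=5):
    """Call fetch(first, last) once per chunk and return the results in order."""
    return [fetch(first, last) for first, last in trading_day_chunks(start, end, chunk_size)]
```

With the Alpaca client, `fetch` could be something like `lambda a, b: api.polygon.historic_agg_v2(ticker, 1, 'hour', _from=a, to=b).df`, and the per-chunk frames can then be combined with `pd.concat(...)`. Passing the fetch call in as a parameter keeps the date logic separate from the network call, so the chunking can be tested without an API key.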
Hope that helps.