Issues Retrieving Historical Data From Polygon API

Environment

Language
Python 3.7

Alpaca SDK Version
Latest

Other Environment Details
Using historic_agg_v2 polygon function to get aggregate data

Problem

Summary
The number of entries returned by the "polygon.historic_agg_v2" function varies with the timespan, ticker, and date range. I assume there are limits on the number of results per API call, but the results are not consistent enough for me to deduce a pattern. I recall reading somewhere in the documentation that the limit is 3000, but that doesn't reconcile with what I'm seeing either.

Case 1 (pasted below): Hourly data on ‘AAPL’ from 2012-01-01 to 2013-07-01: Returns 1253 entries (starting from 2013-03-06).
Case 2 (omitted for brevity): Hourly data on ‘SPY’ from the same date range: Returns 1041 entries.
Case 3 (pasted below): Minute data on ‘AAPL’ from the same date range: Returns exactly 50,000 entries, but the returned start date is almost the same (off by a day) as it was in Case 1.
Case 4 (omitted for brevity): Minute data on ‘SPY’ from the same date range: Returns exactly 50,000 entries, again with an almost identical start date to Case 2 (hourly data).

The goal is to retrieve intraday data on a universe of tickers in order to run a backtest. I'm trying to understand what limits exist on the Polygon API and how I can fetch historical data en masse. Surely this isn't the way the API was designed to function. What am I missing?!
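If the cap turns out to be per call, one possible workaround is to split the date range into smaller windows and request each window separately. The sketch below only builds the windows; `date_windows` is a hypothetical helper, and the 30-day window size is a guess that would need tuning against whatever the real limit is. The commented usage assumes the same `api.polygon.historic_agg_v2` call shown in the example code and that pandas is installed.

```python
from datetime import date, timedelta

def date_windows(start, end, days=30):
    """Yield (from, to) ISO date strings that cover [start, end]
    in consecutive, non-overlapping windows of at most `days` days."""
    one_day = timedelta(days=1)
    cur = start
    while cur <= end:
        # Last window is clipped so it never runs past `end`.
        stop = min(cur + timedelta(days=days) - one_day, end)
        yield cur.isoformat(), stop.isoformat()
        cur = stop + one_day

windows = list(date_windows(date(2012, 1, 1), date(2013, 7, 1), days=30))

# Hypothetical usage (not run here; needs an authenticated `api` object):
# import pandas as pd
# frames = [
#     api.polygon.historic_agg_v2('AAPL', 1, timespan='minute', _from=f, to=t).df
#     for f, t in windows
# ]
# aapl = pd.concat(frames).sort_index()
```

Each window is requested on its own, so no single call should need more rows than the cap allows, provided the window is small enough for the ticker's trading activity.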

Paper or Live Trading?
Tried It On Both…Identical Results

Example Code
Case 1:
aapl = api.polygon.historic_agg_v2('AAPL', 1, timespan='hour', _from='2012-01-01', to='2013-07-01').df

                         open     high      low    close      volume

timestamp
2013-03-06 08:00:00-05:00 61.8500 61.8500 61.8286 61.8286 2800.0
2013-03-06 09:00:00-05:00 61.8243 62.1786 60.9071 61.0000 25419079.0
2013-03-06 10:00:00-05:00 61.0000 61.2857 60.8157 61.1114 21132972.0
2013-03-06 11:00:00-05:00 61.1157 61.5257 61.0143 61.1514 12994989.0
2013-03-06 12:00:00-05:00 61.1341 61.2543 60.8400 60.8771 8170974.0
… … … … … …
2013-06-28 15:00:00-04:00 56.9286 57.1757 56.5329 56.5943 19002739.0
2013-06-28 16:00:00-04:00 56.5829 57.1414 56.2157 56.7071 27717753.0
2013-06-28 17:00:00-04:00 56.7014 56.7371 56.5725 56.7286 5209022.0
2013-06-28 18:00:00-04:00 56.7214 56.7286 56.7143 56.7286 15925.0
2013-06-28 19:00:00-04:00 56.7214 56.7429 56.7214 56.7286 23758.0

[1253 rows x 5 columns]

Case 3:
aapl = api.polygon.historic_agg_v2('AAPL', 1, timespan='minute', _from='2012-01-01', to='2013-07-01').df

                          open     high      low    close    volume

timestamp
2013-03-05 09:45:00-05:00 60.6429 60.7014 60.5843 60.6543 626703.0
2013-03-05 09:46:00-05:00 60.6600 60.6928 60.1429 60.6721 590422.0
2013-03-05 09:47:00-05:00 60.6729 60.6729 60.5300 60.5457 385637.0
2013-03-05 09:48:00-05:00 60.5457 60.6429 60.5429 60.6429 491316.0
2013-03-05 09:49:00-05:00 60.6186 60.6571 60.6029 60.6286 505883.0
… … … … … …
2013-06-28 19:43:00-04:00 56.7357 56.7357 56.7357 56.7357 2100.0
2013-06-28 19:50:00-04:00 56.7400 56.7414 56.7400 56.7414 3703.0
2013-06-28 19:54:00-04:00 56.7429 56.7429 56.7286 56.7286 4823.0
2013-06-28 19:55:00-04:00 56.7286 56.7286 56.7286 56.7286 3416.0
2013-06-28 19:59:00-04:00 56.7286 56.7286 56.7286 56.7286 2191.0
[50000 rows x 5 columns]

In all cases, the results seem to end at the date I pass as "to" and work backward until some limit is hit. The number of returned entries differs between minute bars (exactly 50,000 every time, which looks like a global upper limit) and hour bars (on the order of 1,000), yet in all cases the returned date range seems to be roughly 3 months.

Scratch that. It varies with the extent of ETH (extended-hours) trading. On tickers with infrequent ETH trading (e.g. 'CMG'), the date range returned can be 6-7 months. But I'm still getting on the order of 1,000 to 1,400 entries for hour bars.
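To compare the cases on a common footing, the row counts can be divided by the number of weekdays in each returned window. This is a rough sketch only: `weekdays_between` is a hypothetical helper that ignores market holidays, and the dates and row counts are copied from the Case 1 and Case 3 output above.

```python
from datetime import date, timedelta

def weekdays_between(start, end):
    """Count Monday-Friday dates in [start, end], inclusive.
    Market holidays are NOT excluded, so this slightly overcounts."""
    n_days = (end - start).days + 1
    return sum(
        1 for i in range(n_days)
        if (start + timedelta(days=i)).weekday() < 5
    )

# Case 1: 1253 hour bars, first row 2013-03-06, last row 2013-06-28.
hour_days = weekdays_between(date(2013, 3, 6), date(2013, 6, 28))
hour_rows_per_day = 1253 / hour_days

# Case 3: 50,000 minute bars, first row 2013-03-05, last row 2013-06-28.
minute_days = weekdays_between(date(2013, 3, 5), date(2013, 6, 28))
minute_rows_per_day = 50000 / minute_days

print(f"hour:   {hour_days} weekdays, {hour_rows_per_day:.1f} rows/day")
print(f"minute: {minute_days} weekdays, {minute_rows_per_day:.1f} rows/day")
```

Since both windows cover nearly the same dates, one possible reading (unconfirmed here) is that the cap is applied to the underlying minute bars before they are aggregated into hour bars, which would explain why the hourly row count varies by ticker while the minute count is always 50,000.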