Wrong amount of data?

Why am I getting more bars for 1-day intervals than for 4-hour intervals?

MSFT, 2018 - 2020, 4 hour intervals:

api.polygon.io/v2/aggs/ticker/MSFT/range/4/hour/2018-01-01/2020-10-01?apiKey=******

response:
{
"ticker": "MSFT",
"status": "OK",
"queryCount": 5708,
"resultsCount": 50,
"adjusted": true,
"results": [ ... ]
}

MSFT, 2018 - 2020, 1 day intervals:

api.polygon.io/v2/aggs/ticker/MSFT/range/1/day/2018-01-01/2020-10-01?apiKey=******

response:
{
"ticker": "MSFT",
"status": "OK",
"queryCount": 690,
"resultsCount": 690,
"adjusted": true,
"results": [ ... ]
}

If I convert all the timestamp values to dates for the 4-hour results, I get this:

2020-09-29 09:00:00.000Z
2020-09-29 05:00:00.000Z
2020-09-28 21:00:00.000Z
2020-09-28 17:00:00.000Z
2020-09-28 13:00:00.000Z
2020-09-28 09:00:00.000Z
2020-09-28 05:00:00.000Z
2018-01-17 17:00:00.000Z
2018-01-17 13:00:00.000Z
2018-01-17 09:00:00.000Z
2018-01-16 21:00:00.000Z
2018-01-16 17:00:00.000Z
2018-01-16 13:00:00.000Z
2018-01-16 09:00:00.000Z
2018-01-12 21:00:00.000Z
2018-01-12 17:00:00.000Z
2018-01-12 13:00:00.000Z
2018-01-12 09:00:00.000Z
2018-01-11 21:00:00.000Z
2018-01-11 17:00:00.000Z
2018-01-11 13:00:00.000Z
2018-01-11 09:00:00.000Z
2018-01-10 21:00:00.000Z
2018-01-10 17:00:00.000Z
2018-01-10 13:00:00.000Z
2018-01-10 09:00:00.000Z
2018-01-09 21:00:00.000Z
2018-01-09 17:00:00.000Z
2018-01-09 13:00:00.000Z
2018-01-09 09:00:00.000Z
2018-01-08 21:00:00.000Z
2018-01-08 17:00:00.000Z
2018-01-08 13:00:00.000Z
2018-01-08 09:00:00.000Z
2018-01-05 21:00:00.000Z
2018-01-05 17:00:00.000Z
2018-01-05 13:00:00.000Z
2018-01-05 09:00:00.000Z
2018-01-04 21:00:00.000Z
2018-01-04 17:00:00.000Z
2018-01-04 13:00:00.000Z
2018-01-04 09:00:00.000Z
2018-01-03 21:00:00.000Z
2018-01-03 17:00:00.000Z
2018-01-03 13:00:00.000Z
2018-01-03 09:00:00.000Z
2018-01-02 21:00:00.000Z
2018-01-02 17:00:00.000Z
2018-01-02 13:00:00.000Z
2018-01-02 09:00:00.000Z
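
One way to spot the hole in a list like the one above programmatically is to sort the timestamps and flag any jump larger than a long weekend. This is only a sketch: `find_gaps` is a hypothetical helper, the 5-day threshold is an arbitrary assumption, and the input format is assumed to match the strings printed above.

```python
from datetime import datetime, timedelta

# Format of the strings listed above, e.g. "2020-09-29 09:00:00.000Z".
FMT = "%Y-%m-%d %H:%M:%S.%fZ"

def find_gaps(stamps, max_gap=timedelta(days=5)):
    """Return (before, after) pairs where consecutive bars are more than max_gap apart.

    max_gap defaults to 5 days (an assumed threshold) so that ordinary
    weekends and single holidays are not reported.
    """
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]

if __name__ == "__main__":
    sample = [
        "2020-09-28 09:00:00.000Z",
        "2020-09-28 05:00:00.000Z",
        "2018-01-17 17:00:00.000Z",
        "2018-01-17 13:00:00.000Z",
    ]
    for before, after in find_gaps(sample):
        print("gap between", before, "and", after)
```

On the 50 results above this would report a single gap, between 2018-01-17 17:00 and 2020-09-28 05:00.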

Hi, in my experience Polygon returns data in segments, meaning there are
patches of missing data. For example, we request data from 2020-03-01 to 2020-07-01
and get back something like this: 2020-03-01:2020-03-15, 2020-06-25:2020-07-01.
That makes life difficult: there’s no way to know which patches will
be returned and which ones we need to request again.
So the solution is to request the data in segments. I picked an arbitrary
time window of 2 weeks and split the calls until I had all the required
data.
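
The windowing idea described above can be sketched roughly like this. It is not an official client: `split_range` and `window_urls` are hypothetical helper names, the 14-day window is the arbitrary choice mentioned above, and the URL layout is copied from the requests in this thread (the https scheme is assumed and the apiKey parameter is left off).

```python
from datetime import date, timedelta

def split_range(start, end, days=14):
    """Yield (window_start, window_end) date pairs covering start..end inclusive."""
    cur = start
    while cur <= end:
        # Each window spans `days` calendar days, clipped to the overall end date.
        win_end = min(cur + timedelta(days=days - 1), end)
        yield cur, win_end
        cur = win_end + timedelta(days=1)

def window_urls(ticker, multiplier, timespan, start, end, days=14):
    """Build one aggregates URL per window (apiKey must still be appended)."""
    base = "https://api.polygon.io/v2/aggs/ticker"
    return [
        f"{base}/{ticker}/range/{multiplier}/{timespan}/{s.isoformat()}/{e.isoformat()}"
        for s, e in split_range(start, end, days)
    ]

if __name__ == "__main__":
    for url in window_urls("MSFT", 4, "hour", date(2020, 3, 1), date(2020, 7, 1)):
        print(url)
```

Each URL would then be fetched separately and the results concatenated. The windows do not overlap, so no de-duplication should be needed.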

I suspect it relates to an internal limit on queryCount in Polygon.

Because the 4-hour query is built from minute data, the number of days that can be returned at a time is much smaller.

For the 1-day query, in contrast, queryCount matches resultsCount, which indicates that the backend is much more efficient for that query.

If you probed a few different queries, you could likely identify where the internal limit on queryCount is and adjust the from-to span to avoid bumping into it. @Shlomik’s approach is probably staying within that internal limit.

Looking at…
api.polygon.io/v2/aggs/ticker/MSFT/range/4/hour/2020-08-01/2020-09-01
and then advancing through 2020-08-02, etc., the queryCount is exactly 5000 for me. It’s odd/interesting that for your ranges the limit seems to be in the 5700-5800 range.


For 4-hour intervals, 1-minute sampling implies 240 samples per interval. 5000/240 = 20.83… So I wouldn’t expect to get more than 20 intervals reliably.

Assuming trading from 4am to 8pm, that is at most 16 hours per day, which is 4 intervals. So I wouldn’t assume you can query more than 5 days (20 intervals / 4 intervals per day) at a time.
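
The same arithmetic as a small sketch, so the numbers can be re-run for other bar sizes. `max_days_per_request` is a hypothetical helper, and the 5000 limit is only the queryCount value observed in this thread, not a documented figure.

```python
def max_days_per_request(query_limit=5000, bar_hours=4, session_hours=16):
    """Estimate how many trading days fit in one request.

    Assumes every bar is built from 1-minute samples and that a day has
    at most `session_hours` of trading (4am to 8pm gives 16).
    """
    minutes_per_bar = bar_hours * 60                    # 240 for 4-hour bars
    bars_per_request = query_limit // minutes_per_bar   # 5000 // 240 = 20
    bars_per_day = session_hours // bar_hours           # 16 // 4 = 4
    return bars_per_request // bars_per_day             # 20 // 4 = 5

if __name__ == "__main__":
    print(max_days_per_request())  # 5
```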

api.polygon.io/v2/aggs/ticker/MSFT/range/4/hour/2020-08-03/2020-08-07

{
    "ticker":"MSFT",
    "queryCount":3626,
    "resultsCount":20,
    "adjusted":true,
    "results":[
        { ... "t":1596441600000 ... },
        ...
        { ... "t":1596830400000 ... }
 ... ]
}

1596441600000 = Monday, August 3, 2020 4:00:00 AM GMT-04:00 DST
1596830400000 = Friday, August 7, 2020 4:00:00 PM GMT-04:00 DST
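
For anyone checking these conversions, here is a minimal sketch that turns the millisecond "t" values into wall-clock times. `ms_to_local` is a hypothetical helper, and it uses a fixed UTC-4 offset to match the GMT-04:00 shown above, so it does not handle the switch out of daylight saving time.

```python
from datetime import datetime, timedelta, timezone

def ms_to_local(ms, utc_offset_hours=-4):
    """Convert a millisecond epoch timestamp to a datetime at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(ms / 1000, tz=tz)

if __name__ == "__main__":
    print(ms_to_local(1596441600000).isoformat())  # 2020-08-03T04:00:00-04:00
    print(ms_to_local(1596830400000).isoformat())  # 2020-08-07T16:00:00-04:00
```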

That’s 5 trading days with all intervals present (via market or extended hours) and the queryCount is well below the apparent internal limit of 5000-58xx.

Looking at the queryCount:
3626 / 240 = 15.1083…

Perhaps some minutes had no data.