Historic_agg_v2 returning an extra day

The historic_agg_v2 function appears to return an extra day under some circumstances.

api.polygon.historic_agg_v2("SPY", 1, "day", _from="2020-08-17", to="2020-08-19")

returns
open high low close volume vwap
timestamp
2020-08-17 00:00:00-04:00 337.94 338.34 337.48 337.91 35092877.0 337.8453
2020-08-18 00:00:00-04:00 338.34 339.10 336.61 338.64 38627631.0 338.3195
2020-08-19 00:00:00-04:00 339.05 339.61 336.62 337.23 68215296.0 338.2099
2020-08-20 00:00:00-04:00 335.36 338.80 335.22 338.28 42408654.0 337.4729

and
api.polygon.historic_agg_v2("SPY", 1, "day", _from="2020-11-12", to="2020-11-12")

returns
open high low close volume vwap
timestamp
2020-11-12 00:00:00-05:00 355.58 356.7182 351.26 353.21 66838046.0 353.6225
2020-11-13 00:00:00-05:00 355.27 358.2000 354.71 358.10 44288743.0 356.4866

As far as I can tell, this may have something to do with the fix_daily_bar_date(date, timespan) function.

Any help would be appreciated. Thank you,
David

It appears that previously Polygon had a bug and didn’t include the end day when fetching daily data. The Alpaca python SDK included a ‘patch’ to fix this called fix_daily_bar_date (as was mentioned by @DavidW). Time passed, and now voila, Polygon fixed their API so it DOES include the last day. So now the SDK adds a day where it’s not necessary.
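
To make the mechanism concrete, here is a minimal sketch of how an "add one day to the end date" patch would produce the extra row once the upstream API includes the end day. Both functions are hypothetical stand-ins written for illustration; this is not the SDK's actual fix_daily_bar_date code, and it counts calendar days without regard to weekends or holidays.

```python
from datetime import date, timedelta


def hypothetical_shift_end_date(end: date) -> date:
    # Hypothetical stand-in for the SDK patch: push the end date out by one day
    return end + timedelta(days=1)


def inclusive_days(start: date, end: date) -> list:
    # Models an upstream API that now includes both the start and the end day
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


# Same request as in the original post: one day, 2020-11-12 to 2020-11-12
start = end = date(2020, 11, 12)
returned = inclusive_days(start, hypothetical_shift_end_date(end))
print([d.isoformat() for d in returned])
```

Under those assumptions, a one-day request comes back with two days, which matches the output reported above.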

I checked this out by directly calling the Polygon API for two days (2019-01-02 and 2019-01-03). It returns two days as expected.

https://api.polygon.io/v2/aggs/ticker/AAPL/range/1/day/2019-01-02/2019-01-03?sort=asc&apiKey=xxxxxx

It returns

{
  "ticker": "AAPL",
  "status": "OK",
  "queryCount": 2,
  "resultsCount": 2,
  "adjusted": true,
  "results": [
    {
      "v": 148158948,
      "vw": 38.8871,
      "o": 38.7225,
      "c": 39.48,
      "h": 39.7125,
      "l": 38.5575,
      "t": 1546405200000,
      "n": 1
    },
    {
      "v": 365248780,
      "vw": 35.895,
      "o": 35.995,
      "c": 35.5475,
      "h": 36.43,
      "l": 35.5,
      "t": 1546491600000,
      "n": 1
    }
  ],
  "request_id": "2fe31826b3f0d0d4cee2570f78adf9f1"
}
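
To double check that those two results really are 2019-01-02 and 2019-01-03, the "t" values can be decoded by hand. This is a small sketch that assumes "t" is milliseconds since the Unix epoch and uses a fixed UTC-5 offset (US Eastern standard time in January) rather than a full time zone database.

```python
from datetime import datetime, timedelta, timezone

# "t" values copied from the response above (assumed to be epoch milliseconds)
timestamps_ms = [1546405200000, 1546491600000]

# Fixed UTC-5 offset: US Eastern standard time, which applies in January
eastern_standard = timezone(timedelta(hours=-5))

bar_dates = [
    datetime.fromtimestamp(ms / 1000, tz=eastern_standard).date().isoformat()
    for ms in timestamps_ms
]
print(bar_dates)  # ['2019-01-02', '2019-01-03']
```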

We should probably submit an issue on GitHub to fix this.

Is it a big problem the way it is? All it does is give an extra day of data. As a workaround, you could 'clip' the returned dataframe to just the desired days. Something like this:

start_date = '2020-11-12'
end_date = '2020-11-12'
polygon_data = api.polygon.historic_agg_v2('SPY', 1, 
                                           'day', 
                                           _from=start_date, 
                                           to=end_date).df
clipped_data = polygon_data[start_date:end_date]
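
For anyone who wants to try the clipping without hitting the API, here is a self-contained sketch. The dataframe is a hand-built stand-in that uses the two rows from the original post (only the open and close columns), and it assumes the real result has a sorted, timezone-aware DatetimeIndex like the one shown there.

```python
import pandas as pd

# Stand-in for historic_agg_v2(...).df, built from the rows in the original
# post; no API call is made here.
index = pd.DatetimeIndex(
    ["2020-11-12", "2020-11-13"], tz="America/New_York", name="timestamp"
)
polygon_data = pd.DataFrame(
    {"open": [355.58, 355.27], "close": [353.21, 358.10]}, index=index
)

start_date = "2020-11-12"
end_date = "2020-11-12"

# Slicing a sorted DatetimeIndex with date strings is inclusive on both ends,
# so this keeps every bar from start_date through end_date and drops the rest.
clipped_data = polygon_data.loc[start_date:end_date]
print(clipped_data)
```

Using .loc makes it explicit that the slice is by index label. If the slice comes back empty or raises an error, it is worth checking that the index is actually a DatetimeIndex and is sorted.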

Thank you! You’re right, it’s not a big issue, but I figured other people might encounter it as well, and I’m just trying to help improve this great platform.
-David

THIS IS A BIG ISSUE. I was comparing backtesting results with the online performance of my algorithm. I spent a day finding out that the root cause is this bug in the API. Any discrepancy between documentation and implementation is completely unacceptable.

Yes, this change is messing with my data as well. I’m trying to use Dan’s “clipped_data” method above but not having any luck (i.e., probably not doing it right; still a rookie coder here).

It would be awesome if Alpaca could remove their patch now that Polygon is returning the end date data.

Thank you!