This message is for Dan Whitnable from Alpaca.
Dan, we’ve already established here that your daily market data may be affected by bad prints. Since you are not going to do anything about it, I can code around it. But now I am discovering that your minute data has misprints as well. Below is an example for Boeing (BA).
I’m afraid your historical data is utterly useless in its current shape, at least to me. Please let me know if you plan to fix this and other issues. Thank you.
Dan, did you have a chance to look into the above issue?
I also use a similar Alpaca vs. Polygon audit and am finding the same issue with the historical data v2 GET requests. I’ll continue to use Polygon until the data is more consistent.
@SpyToTheSky I wasn’t able to replicate the discrepancy between Alpaca and Polygon data. When querying minute bar data for BA on 2021-01-27 15:49, both Polygon and Alpaca return identical data.
Here is the code
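# TimeFrame comes from the alpaca_trade_api Python SDK
from alpaca_trade_api.rest import TimeFrame
# `api` and `polygon_api` are assumed to be already-authenticated alpaca_trade_api REST clients
# Set the symbol and date to fetch
symbol = 'BA'
date = '2021-01-27'
# Get Alpaca data and convert from UTC
alpaca_bars = api.get_bars(symbol, TimeFrame.Minute, start=date, end=date, adjustment='raw').df
alpaca_bars.index = alpaca_bars.index.tz_convert('America/New_York')
# Get Polygon data
polygon_bars = polygon_api.polygon.historic_agg_v2(symbol, 1, 'minute', date, date).df
# Display the two
print(alpaca_bars)
print(polygon_bars)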
Here are the results
It appears the data in the original post incorrectly listed the close price instead of the high price for the PolygonHighPrice. That made it appear the two were different.
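For anyone running a similar audit, here is a minimal sketch of a field-by-field comparison that makes it harder to line up the wrong columns by accident. It assumes both DataFrames have already been normalised to the same lowercase column names and the same timestamp index; the `compare_bars` helper and the sample numbers are hypothetical and are not real BA data.

```python
import pandas as pd

def compare_bars(a, b, columns=("open", "high", "low", "close", "volume"), tol=1e-6):
    """Return one row per (timestamp, field) where the two bar sets disagree."""
    cols = list(columns)
    # Inner join keeps only the timestamps present in both sources
    joined = a[cols].join(b[cols], how="inner", lsuffix="_a", rsuffix="_b")
    rows = []
    for col in cols:
        diff = (joined[f"{col}_a"] - joined[f"{col}_b"]).abs()
        for ts in diff[diff > tol].index:
            rows.append({
                "timestamp": ts,
                "field": col,
                "a": joined.at[ts, f"{col}_a"],
                "b": joined.at[ts, f"{col}_b"],
            })
    return pd.DataFrame(rows, columns=["timestamp", "field", "a", "b"])

# Made-up bars for illustration only
idx = pd.to_datetime(["2021-01-27 15:48", "2021-01-27 15:49"])
source_a = pd.DataFrame(
    {
        "open": [193.4, 193.5],
        "high": [193.9, 197.08],
        "low": [193.3, 193.4],
        "close": [193.5, 193.6],
        "volume": [1000, 2000],
    },
    index=idx,
)
source_b = source_a.copy()

# Identical inputs produce no mismatches
identical = compare_bars(source_a, source_b)

# Simulate the mix-up described above: close stored in the high column
mislabelled = source_b.copy()
mislabelled["high"] = mislabelled["close"]
mismatches = compare_bars(source_a, mislabelled)
print(mismatches)
```

Because each mismatch is reported along with its field name, a swapped column shows up as a run of differences in one field, not as a scattering of bad prints.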
There were 883 trades during the 15:49 bar. The actual high trade, at $197.08, occurred at 2021-01-27 15:49:20.878600-05:00 and was ID 7171443368123184 on tape A. Both Alpaca and Polygon (and most data providers) get the raw trades from the SIP feeds and use these same trades to generate aggregated ‘bar’ data.
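To make the trade-to-bar aggregation concrete, here is a rough sketch using pandas. It is not Alpaca's or Polygon's actual implementation, only the generic idea of resampling trades into 1-minute OHLCV bars. The `trades_to_minute_bars` helper and the trades themselves are made up for illustration.

```python
import pandas as pd

# Made-up trades for illustration only; these are not real BA prints
trades = pd.DataFrame(
    {
        "price": [193.50, 193.80, 197.08, 193.60, 193.70],
        "size": [100, 200, 100, 300, 50],
    },
    index=pd.to_datetime(
        [
            "2021-01-27 15:49:05",
            "2021-01-27 15:49:12",
            "2021-01-27 15:49:20",
            "2021-01-27 15:49:48",
            "2021-01-27 15:50:03",
        ]
    ).tz_localize("America/New_York"),
)

def trades_to_minute_bars(trades: pd.DataFrame) -> pd.DataFrame:
    """Aggregate trades into 1-minute OHLCV bars labelled by bar start time."""
    bars = trades["price"].resample("1min").ohlc()
    bars["volume"] = trades["size"].resample("1min").sum()
    # Drop minutes in which no trade occurred
    return bars.dropna(subset=["open"])

bars = trades_to_minute_bars(trades)
print(bars)
```

With this kind of aggregation, a single trade at an outlying price sets the bar's high even if every other trade in that minute printed several dollars lower.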
Alpaca strives to provide high quality data following SIP and FINRA reporting guidelines. That’s not to say there aren’t mistakes but they would be rare. We take data seriously and encourage users to bring forth any discrepancies they find.
Dan, as usual, great job researching the issue. Thank you. I really want your company to succeed and that is why I keep bugging you.
Now I better understand your method of computing quote bars. I’m not an expert in this field, but I think that, for algotrading purposes, computing bars from trades that did not occur within those bars is detrimental to the cause. Please correct me if I’m wrong, but in the example above you are factoring in a trade with condition “P”, which according to the CTS spec stands for “Prior Reference Price”. In fact, during that minute the NBBO never reported any quote with an ask price higher than $193.8 or so. Your bar, on the other hand, shows a high price of $197.08. Why would you want some kind of “Prior Reference Price” to influence the current bar price by over $3? I think you may want to add some kind of smarts to your bar calculation to make the bars more useful.
P.S. I don’t know how your polygon_api.polygon.historic_agg_v2 is implemented, but when I get my minute bars from Polygon I get the right prices.
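As a sketch of the kind of "smarts" I mean, the snippet below drops trades carrying certain sale conditions before computing open/high/low/close. The exclusion set, the `minute_bars_filtered` helper, the `conditions` column layout, and the trades are all assumptions for illustration; which conditions should count toward high/low is a policy choice, and this is not how any particular vendor does it.

```python
import pandas as pd

# Hypothetical exclusion set, chosen only for this example
EXCLUDE_FROM_PRICES = frozenset({"P"})

def minute_bars_filtered(trades: pd.DataFrame, excluded=EXCLUDE_FROM_PRICES) -> pd.DataFrame:
    """Build 1-minute bars whose prices ignore trades with excluded conditions.

    `trades` needs a DatetimeIndex and the columns price, size, conditions,
    where conditions is a list of condition codes per trade.
    """
    has_excluded = trades["conditions"].apply(lambda codes: bool(set(codes) & set(excluded)))
    eligible = trades[~has_excluded]
    bars = eligible["price"].resample("1min").ohlc()
    # Volume still counts every trade, including the excluded ones
    bars["volume"] = trades["size"].resample("1min").sum()
    return bars.dropna(subset=["open"])

# Made-up trades for illustration only
sample_trades = pd.DataFrame(
    {
        "price": [193.50, 193.80, 197.08, 193.60],
        "size": [100, 200, 100, 300],
        "conditions": [[], [], ["P"], []],
    },
    index=pd.to_datetime(
        [
            "2021-01-27 15:49:05",
            "2021-01-27 15:49:12",
            "2021-01-27 15:49:20",
            "2021-01-27 15:49:48",
        ]
    ),
)

filtered = minute_bars_filtered(sample_trades)
unfiltered = minute_bars_filtered(sample_trades, excluded=frozenset())
print(filtered)
print(unfiltered)
```

In this toy data the filtered bar's high stays at the highest eligible trade, while the unfiltered bar picks up the out-of-range print.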