Bad prints in market data

SpyToTheSky · May 15, 2021, 11:09pm

This message is for Dan Whitnable from Alpaca.

Dan, we’ve already established here that your daily market data may be affected by bad prints. Since you are not going to do anything about it, I can code around it. But now I am discovering that your minute data has misprints as well. Below is an example for Boeing (BA):

Market Data Discrepancy

I’m afraid your historical data is utterly useless in its current shape, at least to me. Please let me know if you plan to fix this and other issues. Thank you.

SpyToTheSky · May 18, 2021, 4:06pm

Dan, did you have a chance to look into the above issue?

Beauj34 · May 25, 2021, 5:42pm

I also run a similar Alpaca vs. Polygon audit and am finding the same issue with the historical data v2 GET requests. I’ll continue to use Polygon until the data is more consistent.

Dan_Whitnable_Alpaca · May 26, 2021, 9:35am

@SpyToTheSky I wasn’t able to replicate the discrepancy between Alpaca and Polygon data. When querying minute bar data for BA on 2021-01-27 15:49, both Polygon and Alpaca return identical data.

Here is the code

# TimeFrame comes from the alpaca_trade_api package
from alpaca_trade_api.rest import TimeFrame

# Note: `api` and `polygon_api` are client objects created elsewhere (setup not shown),
# and `display` is the Jupyter/IPython notebook helper.

# Set the symbol and date to fetch
symbol = 'BA'
date = '2021-01-27'

# Get Alpaca data and convert from UTC
alpaca_bars = api.get_bars(symbol, TimeFrame.Minute, start=date, end=date, adjustment='raw').df
alpaca_bars.index = alpaca_bars.index.tz_convert('America/New_York')

# Get Polygon data
polygon_bars = polygon_api.polygon.historic_agg_v2(symbol, 1, 'minute', date, date).df

# Display the two
display(alpaca_bars.between_time('15:48', '15:50'))
display(polygon_bars.between_time('15:48', '15:50'))

Here are the results

It appears the data in the original post incorrectly listed the close price instead of the high price for the PolygonHighPrice. That made it appear the two were different.
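For anyone running a similar audit, here is a minimal sketch of a like-for-like comparison that lines up the same column from both sources, so a close price can't end up compared against a high. The function name `bar_discrepancies`, the column names, and the sample values are assumptions for illustration; the numbers are made up and are not the actual BA bars.

```python
import pandas as pd

def bar_discrepancies(left, right, columns=("open", "high", "low", "close"), tol=1e-6):
    """Return absolute differences for bars where any column differs by more than tol."""
    cols = list(columns)
    # Keep only timestamps present in both frames so rows are compared like for like
    l, r = left[cols].align(right[cols], join="inner", axis=0)
    diff = (l - r).abs()
    return diff[(diff > tol).any(axis=1)]

# Made-up bars for illustration only
idx = pd.to_datetime(["2021-01-27 15:48", "2021-01-27 15:49"])
source_a = pd.DataFrame(
    {
        "open": [193.40, 193.50],
        "high": [193.60, 197.08],
        "low": [193.30, 193.45],
        "close": [193.50, 193.55],
    },
    index=idx,
)

# Simulate the mix-up described above: the second source's "high" holds the close
source_b = source_a.copy()
source_b.loc[idx[1], "high"] = source_b.loc[idx[1], "close"]

mismatches = bar_discrepancies(source_a, source_b)
print(mismatches)
```

With identical inputs the result is an empty frame; with the simulated mix-up only the 15:49 row is reported.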

There were 883 trades during the 15:49 bar and the actual high trade for $197.08 occurred at 2021-01-27 15:49:20.878600-05:00, and was ID 7171443368123184 on tape A. Both Alpaca and Polygon (and most data providers) get the raw trades from the SIP feeds and use these same trades to generate aggregated ‘bar’ data.
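To make the aggregation step concrete, here is a rough sketch of how per-minute bars can be built from a list of trades with pandas. It is not Alpaca's or Polygon's actual implementation; the column names (`price`, `size`) and the trade values are made up for illustration. It shows how a single print sets the bar's high even when every other trade in that minute is several dollars lower.

```python
import pandas as pd

# Made-up trades for illustration only; not the actual BA prints
trades = pd.DataFrame(
    {
        "price": [193.50, 193.62, 197.08, 193.55, 193.70],
        "size": [100, 200, 100, 300, 100],
    },
    index=pd.to_datetime(
        [
            "2021-01-27 15:49:05",
            "2021-01-27 15:49:12",
            "2021-01-27 15:49:20",
            "2021-01-27 15:49:41",
            "2021-01-27 15:50:03",
        ]
    ).tz_localize("America/New_York"),
)

def trades_to_minute_bars(trades):
    """Aggregate trades into one-minute OHLC bars with total volume."""
    bars = trades["price"].resample("1min").ohlc()
    bars["volume"] = trades["size"].resample("1min").sum()
    return bars

bars = trades_to_minute_bars(trades)
print(bars)
```

In this toy data the 15:49 bar gets a high of 197.08 from one trade, while its close is 193.55.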

Alpaca strives to provide high quality data following SIP and FINRA reporting guidelines. That’s not to say there aren’t mistakes but they would be rare. We take data seriously and encourage users to bring forth any discrepancies they find.

SpyToTheSky · May 29, 2021, 5:05am

Dan, as usual, great job researching the issue. Thank you. I really want your company to succeed and that is why I keep bugging you.
Now I better understand your method of computing quote bars. I’m not an expert in this field, but I think that, for algotrading purposes, computing bars with trades that did not occur within those bars is detrimental to the cause. Please correct me if I’m wrong, but in the example above you are factoring in a trade with condition “P”, which according to the CTS spec stands for “Prior Reference Price”. In fact, during that minute the NBBO never reported any quote with an ask price higher than $193.8 or so. Your bar, on the other hand, shows a high price of 197.08. Why would you want some kind of “Prior Reference Price” to influence the current bar price by over $3? I think you may want to add some kind of smarts to your bar calculation to make the bars more useful.

P.S. I don’t know how your polygon_api.polygon.historic_agg_v2 is implemented but when I get my minute bars from polygon I get the right prices.
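A minimal sketch of the kind of filtering suggested above, for anyone who wants to build their own bars from raw trades: drop trades carrying selected condition codes before computing the bar high. The exclusion set here contains only "P" because that is the code discussed in this thread; it is an assumption for illustration, not a statement of the SIP update rules or of what any provider does. The `conditions` column name and the trade values are also made up.

```python
import pandas as pd

# Hypothetical exclusion set based on this thread; adjust to your own rules
EXCLUDED_CONDITIONS = frozenset({"P"})

def filter_trades(trades, excluded=EXCLUDED_CONDITIONS):
    """Keep only trades that carry none of the excluded condition codes."""
    keep = trades["conditions"].apply(lambda conds: excluded.isdisjoint(conds))
    return trades[keep]

def minute_high(trades):
    """Highest trade price in each one-minute bucket."""
    return trades["price"].resample("1min").max()

# Made-up trades for illustration only; not the actual BA prints
trades = pd.DataFrame(
    {
        "price": [193.50, 193.62, 197.08, 193.55],
        "conditions": [["@"], ["@"], ["P"], ["@"]],
    },
    index=pd.to_datetime(
        [
            "2021-01-27 15:49:05",
            "2021-01-27 15:49:12",
            "2021-01-27 15:49:20",
            "2021-01-27 15:49:41",
        ]
    ),
)

raw_high = minute_high(trades)
filtered_high = minute_high(filter_trades(trades))
print(raw_high.iloc[0], filtered_high.iloc[0])
```

In this toy data the unfiltered high is 197.08 and the filtered high is 193.62. Whether excluding a given condition is appropriate depends on how you intend to use the bars.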
