I am streaming trade data from Polygon with StreamConn(data_stream='polygon'), and I noticed that for some stocks the trade data contains duplicates, while other stocks are fine.
For example, I streamed trade data for CIEN for 10 minutes, from 10:50am to 11:00am EST today, and aggregated it at runtime: the total volume was 127492 and the total number of trades was 600. I then called the Polygon API historic_trades_v2() for the same period, which gives a total volume of 63746 and 300 trades. The streaming totals are exactly double the historic ones.
When I look at my log of individual trades, I see that the same trades appear exactly twice. Again, not every stock has this issue.
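In case it helps anyone reproduce this, here is a rough sketch of how repeated trades can be counted in a log. The record layout (symbol, timestamp, price, size) and the function name find_repeated_trades are my own assumptions for illustration, not the actual stream format, and the sample records are made up. Note that two genuinely distinct trades could share all four fields, so this is only a heuristic.

```python
from collections import Counter


def find_repeated_trades(trades):
    """Return {trade_key: count} for trades that appear more than once.

    Each trade is assumed to be a dict with hypothetical keys
    'symbol', 'timestamp', 'price' and 'size'.
    """
    counts = Counter(
        (t["symbol"], t["timestamp"], t["price"], t["size"]) for t in trades
    )
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample records, only to show the shape of the check.
log = [
    {"symbol": "CIEN", "timestamp": 1, "price": 40.0, "size": 100},
    {"symbol": "CIEN", "timestamp": 1, "price": 40.0, "size": 100},
    {"symbol": "CIEN", "timestamp": 2, "price": 40.1, "size": 50},
]

repeated = find_repeated_trades(log)
for key, n in repeated.items():
    print(key, "seen", n, "times")
```

If every key in the result has a count of 2, that matches the "exactly double" totals I am seeing.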
I wonder if anyone has seen this before?
I checked again today using Polygon’s stream API to see raw data from Polygon. Here are some findings.
The raw data from Polygon does indeed contain duplicate trades for some stocks. For example, below is one stream message for PAYC:
This message contains two trades with exactly the same content, including the trade ID (the “i” field). In other cases the two identical trades arrive in separate messages. I can find the same trades in the Alpaca stream, but in separate messages.
Do we know why the same trade appears twice in the stream? Should we treat the duplicates as a single trade? Apparently Polygon's aggregate bars treat them as one.
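If the duplicates should indeed count as one trade, a minimal sketch of filtering them on the client side could look like this. It keys on the trade ID in the “i” field; the other field names ("sym" for symbol, "x" for exchange, "s" for size), the class name TradeDeduplicator and the sample values are assumptions on my part, so please check them against the actual messages.

```python
class TradeDeduplicator:
    """Remember trade keys already seen and flag repeats."""

    def __init__(self):
        self._seen = set()

    def is_new(self, trade):
        # "i" is the trade ID; "sym" and "x" are assumed field names,
        # included in case IDs are only unique per symbol and exchange.
        key = (trade.get("sym"), trade.get("x"), trade.get("i"))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


dedup = TradeDeduplicator()

# Made-up message with one trade repeated, mimicking the duplicates.
message = [
    {"sym": "PAYC", "x": 4, "i": "1001", "s": 100},
    {"sym": "PAYC", "x": 4, "i": "1001", "s": 100},
    {"sym": "PAYC", "x": 4, "i": "1002", "s": 25},
]

unique = [t for t in message if dedup.is_new(t)]
total_volume = sum(t["s"] for t in unique)
print(len(unique), total_volume)
```

Because the same deduplicator instance is reused across messages, this also catches duplicates that arrive in separate messages. The set grows for as long as the stream runs, so a long-running process would probably want to clear it periodically, for example once per trading day.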
Also, Alpaca does not include the trade ID in its stream data, which makes it impossible to identify duplicates. I believe it should be added.