Alpaca websocket trade stream has ~20 percent fewer trades than Polygon trade stream

I have the Pro data plan, and for comparison I ran websocket trade updates for AAPL on March 10, 2021 on both Alpaca and Polygon. I consistently found that I was getting only around 80% of the trades I was getting from the Polygon websocket.
Here are the stats for a 10-minute window:
Start time: 2021-03-10T17:10:00Z (NYC 12:10 PM)
End time: 2021-03-10T17:20:00Z (NYC 12:20 PM)

#Trades received in Alpaca websocket = 9170
#Trades received in Polygon websocket = 11236
#Trades in that timewindow in historic data download via Polygon = 11236

Alpaca is missing about 18% of trades (2,066 of 11,236), while Polygon has 100% coverage.
I’m wondering what is wrong on Alpaca’s side.
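
For reference, a minimal Python sketch of the coverage arithmetic on the two counts from this window (`coverage` is a hypothetical helper, just for illustration):

```python
def coverage(received: int, reference: int) -> tuple[int, float]:
    """Return (number of missing trades, fraction of reference trades received)."""
    missing = reference - received
    return missing, received / reference

# Counts from the 10-minute window above (Alpaca vs. Polygon websocket)
missing, frac = coverage(received=9170, reference=11236)
print(f"missing={missing}, coverage={frac:.1%}")  # missing=2066, coverage=81.6%
```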

I did a bit more analysis on the missing trades. Here are the trades in a 1-second window (2021-03-10T17:10:04.000Z to 2021-03-10T17:10:05.000Z) for the Alpaca stream and the Polygon data.
Polygon has 23 trades, but Alpaca has only 13.
Example of a trade that is missing on the Alpaca side:
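
When counting trades per window, it helps to use the same timestamp field and a half-open interval for both feeds, so a trade stamped exactly 17:10:05.000 is never counted in two windows. A minimal sketch, assuming Polygon-style `{seconds, nanos}` timestamps; `to_ns`, `utc_ns` and `in_window` are hypothetical helper names:

```python
from datetime import datetime, timezone

def to_ns(seconds: int, nanos: int) -> int:
    """Combine a Polygon-style {seconds, nanos} timestamp into epoch nanoseconds."""
    return seconds * 1_000_000_000 + nanos

def utc_ns(iso: str) -> int:
    """Epoch nanoseconds for a whole-second UTC time such as '2021-03-10T17:10:04'."""
    dt = datetime.strptime(iso, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000

def in_window(ts_ns: int, start_ns: int, end_ns: int) -> bool:
    # Half-open [start, end): a trade exactly at `end` belongs to the next window.
    return start_ns <= ts_ns < end_ns

start = utc_ns("2021-03-10T17:10:04")
end = utc_ns("2021-03-10T17:10:05")

# sip_timestamp of Polygon trade 56233 (quoted below)
trade_ns = to_ns(1615396204, 628679924)
print(in_window(trade_ns, start, end))  # True
```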

Polygon Trade

price: 120.27
size: 100
trade_id: "56233"
exchange_id: 11
tape: 3
sip_timestamp {
  seconds: 1615396204
  nanos: 628679924
}
exchange_timestamp {
  seconds: 1615396204
  nanos: 628310272
}
sequence_no: 3761048

Example of a trade that appears on both sides:

Alpaca

price: 120.275
size: 12
trade_id: "53983"
exchange_id: 4
tape: 3
exchange_timestamp {
  seconds: 1615396204
  nanos: 318000000
}
condition: 0
condition: 37
symbol: "AAPL"
raw_json: "{\n \"T\": \"t\",\n \"i\": 53983,\n \"S\": \"AAPL\",\n \"x\": \"D\",\n \"p\": 120.275,\n \"s\": 12,\n \"t\": \"2021-03-10T17:10:04.318Z\",\n \"c\": [\n \"@\",\n \"I\"\n ],\n \"z\": \"C\"\n}"

Polygon

price: 120.275
size: 12
trade_id: "53983"
exchange_id: 4
trf_id: 201
tape: 3
sip_timestamp {
  seconds: 1615396204
  nanos: 320072326
}
exchange_timestamp {
  seconds: 1615396204
  nanos: 318000000
}
trf_timestamp {
  seconds: 1615396204
  nanos: 319714994
}
sequence_no: 3760972
condition: 37
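
Decoding Alpaca's raw_json and lining its `t` field up against Polygon's timestamps for the same trade suggests that Alpaca stamps trades with the exchange (participant) timestamp rather than the SIP timestamp, which is relevant to the bar discussion later in the thread. A minimal sketch; the field meanings in the comment follow Alpaca's v2 trade message format as I understand it, and `rfc3339_to_ns` / `to_ns` are hypothetical helpers:

```python
import json
from datetime import datetime, timezone

def rfc3339_to_ns(ts: str) -> int:
    """Epoch nanoseconds for a UTC RFC 3339 string such as '...T17:10:04.318Z'.

    Parsed by hand because datetime keeps only microseconds, while some
    Alpaca timestamps carry nanoseconds.
    """
    whole, _, frac = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0"))

def to_ns(seconds: int, nanos: int) -> int:
    """Combine a Polygon-style {seconds, nanos} pair into epoch nanoseconds."""
    return seconds * 1_000_000_000 + nanos

# Alpaca message for trade 53983 (same content as raw_json above):
# T=message type, i=trade ID, S=symbol, x=exchange code, p=price,
# s=size, t=timestamp, c=conditions, z=tape
msg = json.loads('{"T": "t", "i": 53983, "S": "AAPL", "x": "D", "p": 120.275,'
                 ' "s": 12, "t": "2021-03-10T17:10:04.318Z", "c": ["@", "I"], "z": "C"}')

# Polygon's timestamps for the same trade, copied from the dump above
polygon_exchange = to_ns(1615396204, 318000000)
polygon_sip = to_ns(1615396204, 320072326)

alpaca_ns = rfc3339_to_ns(msg["t"])
print("matches exchange_timestamp:", alpaca_ns == polygon_exchange)  # True
print("matches sip_timestamp:", alpaca_ns == polygon_sip)            # False

# Trade 56056 (quoted later in the thread) carries nanosecond precision
print(rfc3339_to_ns("2021-03-10T17:08:21.151366144Z") == to_ns(1615396101, 151366144))  # True
```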


Have you tried looking at the exchange IDs? Is it possible that Alpaca is missing trades from one or more exchanges? Or is there a pattern in the trade conditions? Trades with certain special conditions might be filtered out for some reason. Just an idea.
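
To follow up on the exchange/condition idea, here is a minimal sketch that finds Polygon trades with no matching Alpaca trade and tallies them by exchange and condition. Assumptions: both feeds have been collected as Python dicts using the field names shown in this thread, trade IDs are unique only within an exchange, and `ALPACA_TO_POLYGON` is a hypothetical, partial mapping built only from the two matched trades in this thread (D ↔ 4, P ↔ 11); the remaining codes would have to come from each provider's exchange list.

```python
from collections import Counter

# Hypothetical, partial mapping from Alpaca exchange codes to Polygon exchange IDs,
# inferred only from the two trades matched in this thread. Alpaca trades on any
# other code map to None and never match, so complete this before trusting results.
ALPACA_TO_POLYGON = {"D": 4, "P": 11}

def missing_from_alpaca(alpaca, polygon):
    """Polygon trades with no Alpaca trade sharing the same (exchange, trade ID)."""
    # Trade IDs are assumed unique only per exchange, so the exchange is part of the key.
    seen = {(ALPACA_TO_POLYGON.get(t["x"]), str(t["i"])) for t in alpaca}
    return [t for t in polygon if (t["exchange_id"], t["trade_id"]) not in seen]

def breakdown(missing):
    """Count missing trades per Polygon exchange ID and per condition code."""
    by_exchange = Counter(t["exchange_id"] for t in missing)
    by_condition = Counter(c for t in missing for c in (t.get("conditions") or ["none"]))
    return by_exchange, by_condition

# Toy input: the trades quoted above for 17:10:04
alpaca = [{"i": 53983, "x": "D", "c": ["@", "I"]}]
polygon = [{"trade_id": "53983", "exchange_id": 4, "conditions": [37]},
           {"trade_id": "56233", "exchange_id": 11}]  # no conditions listed

by_exchange, by_condition = breakdown(missing_from_alpaca(alpaca, polygon))
print(by_exchange)   # Counter({11: 1})
print(by_condition)  # Counter({'none': 1})
```

Run over the full 10-minute window, a skew toward a few exchange IDs or condition codes would support one of the hypotheses above.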

Can someone at Alpaca comment on this difference?
Where is the Alpaca data being sourced from in general?
I assumed they were just going to offer Polygon data straight through the UI for $50/month, but that doesn’t seem to be the case.

I think it’s IEX, if I recall correctly.

I think it’s IEX or SIP, but that still doesn’t explain how it differs from Polygon. I should have asked a better question initially.

It might just be integrated into the IEX feed.

I want to clarify that I am using wss://stream.data.alpaca.markets/v2/sip

So I am comparing Alpaca SIP feed with Polygon SIP feed.


I also noticed another difference: trades reported via Alpaca with condition “F” (Intermarket Sweep) never carry a condition like “Trade Thru Exempt”.

"{\n \"T\": \"t\",\n \"i\": 56056,\n \"S\": \"AAPL\",\n \"x\": \"P\",\n \"p\": 120.22,\n \"s\": 100,\n \"t\": \"2021-03-10T17:08:21.151366144Z\",\n \"c\": [\n \"@\",\n \"F\"\n ],\n \"z\": \"C\"\n}"

But in the Polygon feed and historical data, most condition 14 (Intermarket Sweep) trades also have condition 41 (Trade Thru Exempt) attached.
For trading decisions I generally ignore trades with condition 41 because they don’t reflect the true price, but this information is missing on the Alpaca side.

The same trade looks like this on the Polygon side:

price: 120.22
size: 100
trade_id: "56056"
exchange_id: 11
tape: 3
sip_timestamp {
  seconds: 1615396101
  nanos: 151729411
}
exchange_timestamp {
  seconds: 1615396101
  nanos: 151366144
}
sequence_no: 3742968
condition: 14
condition: 41
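
The condition-41 filtering described above could look like this minimal sketch. The condition codes (41 on Polygon, "F" on Alpaca) are taken from this thread; because Alpaca's feed carries no Trade Thru Exempt flag, the Alpaca-side function is only a rough, hypothetical proxy that drops every intermarket sweep.

```python
# Polygon condition code described in this thread as "Trade Thru Exempt"
TRADE_THRU_EXEMPT = 41

def polygon_filter(trades):
    """Drop Polygon trades that carry condition 41."""
    return [t for t in trades if TRADE_THRU_EXEMPT not in t.get("conditions", [])]

def alpaca_proxy_filter(trades):
    """Rough stand-in for the Alpaca feed, which has no Trade Thru Exempt flag:
    drop every Intermarket Sweep ("F") trade. This over-filters, because it also
    removes sweeps that Polygon would not have marked with condition 41."""
    return [t for t in trades if "F" not in t.get("c", [])]

# The two trades quoted in this thread, in each feed's field names
polygon = [{"trade_id": "56056", "conditions": [14, 41]},
           {"trade_id": "53983", "conditions": [37]}]
alpaca = [{"i": 56056, "c": ["@", "F"]},
          {"i": 53983, "c": ["@", "I"]}]

print([t["trade_id"] for t in polygon_filter(polygon)])  # ['53983']
print([t["i"] for t in alpaca_proxy_filter(alpaca)])     # [53983]
```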

I reported this issue a while back and am glad someone else found the same problem. Sadly, I haven’t heard anything from Alpaca yet.

One thing I found is that Polygon includes dark pool trades, and I guess Alpaca doesn’t.

It seems a potential lack of dark-pool trades is the leading hypothesis for the 10-20% tape gap. That would make sense, since the tape can be perpetually ~10 seconds behind for dark pool executions. Polygon advertises all 16 exchanges and all US volume, and Alpaca’s Pro description of “all US exchanges” sounds equivalent.
Is everyone seeing equivalent data for:

  1. before/after-hours data?
  2. order cancellations?

All trades from dark pools must be included in the SIP feed for everyone to see (https://www.nasdaq.com/articles/slicing-the-liquidity-pie-2019-02-11). If the data is missing, it’s being excluded on purpose.

This is what I’m trying to figure out. If they are sourcing the data from the SIP feeds, everything would be included: dark pools are required to report to FINRA, and FINRA’s data gets broadcast through CTA and UTP. Maybe they are sourcing data from a third party? I really have no idea, but that would make more sense than purposely excluding 10-20% of the data. And if they are sourcing the data through a third party, they still have to get NBBO data from somewhere to execute trades at NBBO prices (which I assume they are doing).
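
One way to test the dark-pool hypothesis is to compare the off-exchange share of trades and volume in both feeds over the same window. A minimal sketch, assuming (from trade 53983 above) that FINRA-reported off-exchange prints, which include dark pools as well as other off-exchange venues, show up as exchange code "D" on Alpaca and exchange_id 4 (with a trf_id) on Polygon. Note that trade 53983 is itself such a print and does appear in Alpaca’s feed, so at least some off-exchange trades get through. The helper names are hypothetical and the toy data below is far too small to mean anything.

```python
def off_exchange_share(trades, is_off_exchange, size_of):
    """Return (share of trade count, share of volume) reported off-exchange."""
    if not trades:
        return 0.0, 0.0
    off = [t for t in trades if is_off_exchange(t)]
    total_volume = sum(size_of(t) for t in trades)
    off_volume = sum(size_of(t) for t in off)
    return len(off) / len(trades), off_volume / total_volume

# Assumption: off-exchange prints are Alpaca code "D" / Polygon exchange_id 4
def alpaca_off(t):
    return t["x"] == "D"

def polygon_off(t):
    return t["exchange_id"] == 4

# Toy input: the trades quoted earlier in the thread for 17:10:04
polygon = [{"trade_id": "53983", "exchange_id": 4, "size": 12},
           {"trade_id": "56233", "exchange_id": 11, "size": 100}]
alpaca = [{"i": 53983, "x": "D", "s": 12}]

pc, pv = off_exchange_share(polygon, polygon_off, lambda t: t["size"])
ac, av = off_exchange_share(alpaca, alpaca_off, lambda t: t["s"])
print(f"Polygon: {pc:.2f} of trades, {pv:.3f} of volume")  # Polygon: 0.50 of trades, 0.107 of volume
print(f"Alpaca:  {ac:.2f} of trades, {av:.3f} of volume")  # Alpaca:  1.00 of trades, 1.000 of volume
```

If dark-pool prints were being dropped, the Alpaca share over a full window should come out well below Polygon’s.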

Waiting on a response from the Alpaca Team.


Has anybody done a recent comparison between Alpaca and Polygon to see if this issue is still present?


I found numerous data issues with Alpaca, to the point that I’m quite confident the team behind it is not competent and has no fucking idea what they are doing. You won’t hear from them why it’s messed up, because they don’t know themselves.


I’d prefer to pay for Polygon (historical and real-time market data) as long as the rest of Alpaca worked.


Or they can’t disclose that they are not actually receiving full SIP data. They aren’t listed as a vendor on UTP’s site (UTPPlan).

To my knowledge, nothing related to Alpaca appears in the ‘Data Provider’ dropdown menu.

I’d love to know if someone, or the original poster, has done a recent comparison. I want the dark pool data and am debating a possible switch to Polygon if they seem to have it, or more of it.

I ran into a similar issue today. Based on my research, I believe the issue is that Alpaca uses the exchange-provided timestamp for its time windows, while Polygon uses the SIP timestamp.

While Polygon provides both the SIP and exchange timestamps, Alpaca unfortunately only seems to provide/use the exchange timestamp. This causes quite significant discrepancies in how bars are aggregated, leading to incorrect prices and volume. I will say that the full volume of trades does seem to exist, just not in the correct bars. For example, on 5/4/2022 the 8AM EST bar for TQQQ shows around 200k volume on Polygon/TradingView, but only ~20k (10%) on Alpaca. However, if I look at the volume of trades spanning 7AM EST to 8AM EST, the totals are more or less the same. Unfortunately, Alpaca isn’t just off by 1 or 2 bars; sometimes the trades are an hour off compared to Polygon.
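
To see how the choice of timestamp alone can move volume between bars, here is a minimal sketch that buckets the same trade by either its exchange or its SIP timestamp. The trade and its timestamps are made up for illustration (nanosecond integers, as in Polygon’s feed), and `hourly_volume` is a hypothetical helper, not an Alpaca or Polygon API.

```python
from collections import defaultdict
from datetime import datetime, timezone

BAR_NS = 3_600 * 1_000_000_000  # one-hour bars, in nanoseconds

def hourly_volume(trades, ts_field):
    """Sum trade sizes into hourly bars, bucketing each trade by t[ts_field]."""
    bars = defaultdict(int)
    for t in trades:
        ts = t[ts_field]
        bars[ts - ts % BAR_NS] += t["size"]  # floor to the start of the hour
    # Label each bar by its UTC start time
    return {datetime.fromtimestamp(k // 10**9, tz=timezone.utc).isoformat(): v
            for k, v in bars.items()}

# Hypothetical trade whose two timestamps straddle 12:00 UTC (8 AM New York, May 2022)
boundary = int(datetime(2022, 5, 4, 12, tzinfo=timezone.utc).timestamp()) * 10**9
trade = {"size": 500,
         "exchange_ts": boundary - 1_000_000,  # 1 ms before the hour
         "sip_ts": boundary + 1_000_000}       # 1 ms after the hour

print(hourly_volume([trade], "exchange_ts"))  # {'2022-05-04T11:00:00+00:00': 500}
print(hourly_volume([trade], "sip_ts"))       # {'2022-05-04T12:00:00+00:00': 500}
```

How far a trade moves depends on how far apart its two timestamps are, so only trades whose timestamps straddle a bar boundary change bars.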

See my forum post for details: Invalid bars around 8am EST shuffle for TQQQ - #3 by CapitalMastery

I currently have a ticket open with support, but I’m not feeling very hopeful. I get the feeling that the support person doesn’t understand my issue, and for whatever reason isn’t able or willing to pull in someone from the data team.

Alpaca could be the perfect offering if they get these data issues fixed. I’m currently paying $200/mo for Polygon, and I’m really hoping I can switch to Alpaca sooner than later.


Bars and Polygon aside… I just posted this on a different thread (link below).

If you just query Alpaca for all trades on a given day and add up the volume, it comes up short of what brokerage apps and Yahoo show. I’m trying to figure that one out. It seems like a similar kind of issue, so I wanted to share.