Minute Bars with Wrong Prices

Hi,

On a couple of occasions I have received minute bar data that appears to be inaccurate. This is very concerning, because my app is making incorrect decisions based on it. Here is a detailed example I observed this morning:

I am streaming minute bar data from the SIP feed over the websocket for a list of small-cap stocks. For the ticker PBM on the NASDAQ, there were a couple of anomalies over roughly 3 minutes around 8:12am EST. Here is the data that I received from Alpaca:

Market Data - Symbol: PBM
datetime open high low close volume transactions
11/20/25 07:56:00-05:00 2.28 2.34 2.27 2.29 63540 0
11/20/25 07:57:00-05:00 2.29 2.34 2.27 2.28 64362 0
11/20/25 07:58:00-05:00 2.28 2.33 2.28 2.32 20710 0
11/20/25 07:59:00-05:00 2.32 2.39 2.32 2.36 58024 0
11/20/25 08:00:00-05:00 2.95 2.95 2.24 2.4475 1143447 0
11/20/25 08:01:00-05:00 2.45 3.17 2.36 2.38 269585 0
11/20/25 08:02:00-05:00 2.38 3.0903 2.3003 2.9802 141110 0
11/20/25 08:03:00-05:00 2.38 2.42 2.38 2.4 34498 0
11/20/25 08:04:00-05:00 2.4 2.41 2.37 2.38 26512 0
11/20/25 08:05:00-05:00 2.38 3.09 2.35 2.36 55555 0
11/20/25 08:06:00-05:00 2.35 3.79 2.34 2.35 796816 0
11/20/25 08:07:00-05:00 2.36 3.04 2.32 2.97 67066 0
11/20/25 08:08:00-05:00 2.98 3.03 2.32 2.35 24036 0
11/20/25 08:09:00-05:00 2.3502 2.37 2.35 2.36 8329 0
11/20/25 08:10:00-05:00 2.36 2.37 2.34 2.37 15589 0
11/20/25 08:11:00-05:00 2.3698 3.795 2.3403 2.35 1045568 0
11/20/25 08:12:00-05:00 2.35 3.76 2.32 3.3899 54545 0
11/20/25 08:13:00-05:00 3.25 3.7845 2.32 3.59 221009 0
11/20/25 08:14:00-05:00 3.53 3.79 2.35 3.59 223916 0
11/20/25 08:15:00-05:00 2.39 2.41 2.35 2.36 242412 0

The rows from 8:11 through 8:14 have suspect open, close, and high prices (the lows actually appear to be correct). When I check the same symbol on TradingView, I see this candle chart:

You can see that around 8:12am, TradingView does not show this stock jumping in price by over $1.50. Such a jump also isn’t consistent with the low prices provided by Alpaca. I have seen this on a couple of different stocks at different times over the past few weeks.
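As a rough sketch of the kind of sanity check that would catch these rows: the snippet below flags any minute bar whose high-low range is very wide relative to its low. The 25% threshold and the `suspect_bars` name are my own placeholders, not anything provided by Alpaca, and the sample rows are a subset of the table above.

```python
# Hypothetical sanity check: flag minute bars whose high-low range is
# implausibly wide relative to the low. The threshold is an assumption.
BARS = [
    # (time, open, high, low, close, volume), copied from the PBM table
    ("07:59", 2.32, 2.39, 2.32, 2.36, 58024),
    ("08:09", 2.3502, 2.37, 2.35, 2.36, 8329),
    ("08:10", 2.36, 2.37, 2.34, 2.37, 15589),
    ("08:11", 2.3698, 3.795, 2.3403, 2.35, 1045568),
    ("08:12", 2.35, 3.76, 2.32, 3.3899, 54545),
    ("08:15", 2.39, 2.41, 2.35, 2.36, 242412),
]


def suspect_bars(bars, max_range_ratio=0.25):
    """Return the times of bars where (high - low) / low exceeds the ratio."""
    flagged = []
    for time, _open, high, low, _close, _volume in bars:
        if low > 0 and (high - low) / low > max_range_ratio:
            flagged.append(time)
    return flagged


print(suspect_bars(BARS))
```

A check like this only marks bars for review. It cannot tell a bad print from a genuine spike.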

If I can’t count on accurate data from Alpaca, I will be forced to move to a different provider.

@rsdimaggio You asked about historical data around 8 AM to 8:15 AM ET having spikes in highs and lows. The issue here is an artifact of how prices are received and timestamped by the SIPs. All Alpaca timestamps are the ‘participant timestamps’ or, more precisely, ‘Timestamp 1’ as received from the SIPs. Below is the definition from the CTS Specification (the UTP Specification is similar).

Timestamp 1
2 x Integer (pair of integers).
Timestamp 1 is a Participant-provided timestamp representing the number of nanoseconds since Epoch. The first integer contains the number of seconds from epoch 1/1/1970, 00:00:00 UTC. The next integer contains the nanosecond portion of the time (e.g., 972402315). For any messages generated by CTS, e.g., Messages generated on behalf of a Participant, Price Band messages, Control messages and Market Status messages, the Timestamp 1 field will be set to current SIP time.

If from an Exchange: Timestamp 1 denotes the Exchange Matching Engine Publication timestamp for a transaction. Exchanges use a clock sync methodology ensuring that timestamps are accurate within tolerances of 100 microseconds or less. Exchanges shall provide the timestamp in terms of nanoseconds since Epoch.

If from the FINRA Alternative Display Facility (ADF) or a FINRA Trade Reporting Facility (TRF): Timestamp1 denotes the time of execution that a FINRA member reports to the FINRA ADF or a FINRA TRF. FINRA shall provide such times to the Processor in nanoseconds since Epoch.
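To make the field layout in the quoted definition concrete, here is a minimal sketch that combines the (seconds, nanoseconds) pair into a single value. The function names are hypothetical and this is not how Alpaca or the SIPs decode the field. It only illustrates the arithmetic, using the nanosecond value from the spec's own example.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_SECOND = 1_000_000_000


def timestamp1_to_nanos(seconds: int, nanos: int) -> int:
    """Combine the integer pair into total nanoseconds since the epoch."""
    return seconds * NANOS_PER_SECOND + nanos


def timestamp1_to_datetime(seconds: int, nanos: int) -> datetime:
    """Convert the pair to a UTC datetime.

    Python datetimes only resolve microseconds, so the last three
    digits of the nanosecond portion are dropped here.
    """
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)


print(timestamp1_to_nanos(1, 972402315))
print(timestamp1_to_datetime(0, 972402315).isoformat())
```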

We often generalize the timestamp to be the ‘time the trade executed’, but this isn’t exactly the definition. It is true for trades executed on an exchange. However, for trades not executed on an exchange and instead reported to a FINRA Alternative Display Facility (ADF) or a FINRA Trade Reporting Facility (TRF), it is the time at which the trade was reported to the reporting facility. (These trades have an exchange code of “D”.) This time is typically very close to the execution time, but it really distorts things for trades executed after 20:00 ET.

Why? The ADF and TRF facilities aren’t open at that time. They don’t open until 8:00 AM ET the following morning. Therefore, execution platforms don’t report overnight trades until 8:00 AM ET, even though the trades may have executed the previous evening. For those trades, the timestamp is the time the trade is reported, so they all get a timestamp of 8:00 AM (or shortly after) and all get reported at once. If you look at trade volume between 8:00 and 8:15 AM ET, you will often see a big spike along with swings in prices. That is because the trades from the previous evening are all reported when the reporting facilities open.

One of the major overnight exchanges is BOATS, and they recognize this reporting anomaly in their FAQ, quoted below. Note the comment "Trades executed between 8:00 PM ET and 12:00 AM ET will carry a trade date of the following trade day."

The core offering is the Blue Ocean Session, which operates overnight in the US from 8:00 PM ET- 4:00 AM ET (Sunday-Thursday). It operates only on those calendar days when the NYSE Trade Report Facility (TRF) is open for reporting the following morning. Trades executed between 8:00 PM ET and 12:00 AM ET will carry a trade date of the following trade day. Settlement date will reflect the first business day following the transaction (T+1).

For example, when operating from Sunday evening at 8:00 PM to 4:00 AM ET Monday morning, the trades executed during that session will be reported to the NYSE TRF no later than 8:15 AM ET that same Monday morning.

Unfortunately, there is no way to filter out the overnight trades from the pre-market data. Essentially, the SIP timestamp is the time the trade is received by the SIP. There is no other identifier in the SIP data to indicate that these trades actually occurred at a different time. Other providers may augment their data with other sources and be able to filter these trades.
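A blunt workaround, if you build your own bars from trades, is to drop every exchange code “D” trade timestamped in the 8:00 to 8:15 AM ET window. The sketch below is only an illustration under assumptions: the trade dict keys (`t`, `x`, `p`, `s`), the window bounds, and the function names are hypothetical, the sample trades are made up, and `t` is assumed to be a datetime already converted to Eastern time. It also throws away genuine off-exchange pre-market trades in that window, since the data cannot distinguish them from late overnight reports.

```python
from datetime import datetime, time

TRF_CODE = "D"              # exchange code for FINRA ADF/TRF trades
WINDOW_START = time(8, 0)   # assumed window in which overnight reports arrive
WINDOW_END = time(8, 15)


def is_possible_overnight_print(trade):
    """True for TRF trades timestamped inside the assumed morning window."""
    in_window = WINDOW_START <= trade["t"].time() < WINDOW_END
    return trade["x"] == TRF_CODE and in_window


def build_minute_bars(trades, drop_suspect=True):
    """Aggregate trades into OHLCV bars keyed by the start of each minute."""
    bars = {}
    for trade in sorted(trades, key=lambda tr: tr["t"]):
        if drop_suspect and is_possible_overnight_print(trade):
            continue
        minute = trade["t"].replace(second=0, microsecond=0)
        price, size = trade["p"], trade["s"]
        bar = bars.get(minute)
        if bar is None:
            bars[minute] = {"open": price, "high": price, "low": price,
                            "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars


# Made-up trades for illustration only.
trades = [
    {"t": datetime(2025, 11, 20, 8, 11, 5), "x": "Q", "p": 2.35, "s": 100},
    {"t": datetime(2025, 11, 20, 8, 11, 20), "x": "D", "p": 3.79, "s": 500},
    {"t": datetime(2025, 11, 20, 8, 11, 40), "x": "Q", "p": 2.36, "s": 200},
]
print(build_minute_bars(trades))
```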

Below is this behavior for one symbol (SLNT), which is typical of many symbols.

Extended hour data can be misleading in this regard.


Does the trade-level WebSocket stream ("trades" channel) expose the raw SIP Condition Codes in the c (conditions) field for every message?

Specifically, if I consume the raw trade stream, will I be able to see and programmatically ignore trades with the following Sale Conditions:

  • U (Extended hours, sold out of sequence)
  • T (Extended hours)
  • Z (Sold out of sequence)

I want to confirm that these flags are preserved in the WebSocket message payload so I can exclude them from my candle calculations to eliminate the 8 AM “ghost” spikes.
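Roughly, the filter I have in mind looks like the sketch below. It assumes each trade message arrives as a dict whose `c` field is a list of condition code strings, which is exactly the part I am asking to have confirmed. The `keep_for_candles` name and the sample messages are made up for illustration. Note that excluding “T” would drop every trade flagged as extended hours, not only the delayed ones.

```python
EXCLUDED_CONDITIONS = {"U", "T", "Z"}


def keep_for_candles(trade, excluded=EXCLUDED_CONDITIONS):
    """Return True if none of the trade's condition codes are excluded.

    A missing or empty "c" field is treated as having no conditions.
    """
    conditions = trade.get("c") or []
    return excluded.isdisjoint(conditions)


# Made-up messages for illustration only.
trades = [
    {"p": 2.35, "s": 100, "c": ["@"]},
    {"p": 3.79, "s": 500, "c": ["@", "T"]},
    {"p": 2.36, "s": 200, "c": []},
    {"p": 3.10, "s": 50, "c": ["U"]},
]
kept = [trade["p"] for trade in trades if keep_for_candles(trade)]
print(kept)
```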

Has anyone successfully implemented this filter using the Alpaca trades stream?
