Missing minutes in real-time aggregate collection

Hi there,

I’ve just subscribed to the paid unlimited data plan and have been testing the real-time collection of minute aggregates for stocks in the S&P 500 (as well as SPY). I use the QuantRocket platform to interface with your API.

I’ve noticed that on certain days, there are several minutes with null price data for highly traded stocks (e.g. SPY), which should not happen. There is no apparent pattern to the null minutes. Here’s a screenshot of the CSV I’ve written the SPY price data to for each minute today:

I’ve also noticed that this happens for many of the stocks in the S&P 500. In fact, there were some minutes where null prices were received for 500 of the 503 stocks being collected. Here’s a screenshot of how many stocks (out of 503 collected) received null ‘Close’ prices at each minute:
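For anyone who wants to reproduce the per-minute count, here is a minimal sketch of the tally. It assumes the collected data has already been loaded as (minute, symbol, close) tuples with `None` standing in for a null Close; the function name and the sample rows are hypothetical, not real data from my collection.

```python
from collections import Counter


def count_null_closes(rows):
    """Count, per minute, how many symbols have a null Close.

    rows: iterable of (minute, symbol, close) tuples, where close is
    None when no price was recorded for that symbol in that minute.
    """
    nulls = Counter()
    for minute, symbol, close in rows:
        if close is None:
            nulls[minute] += 1
    return dict(nulls)


# Made-up sample rows, for illustration only.
rows = [
    ("09:31", "SPY", 430.1),
    ("09:31", "AAPL", None),
    ("09:32", "SPY", None),
    ("09:32", "AAPL", None),
]
null_counts = count_null_closes(rows)
print(null_counts)
```

A minute where the count is close to the number of symbols collected points to something shared across symbols rather than thin trading in one name.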

Is there any explanation for the gaps in the stream? What’s the expected latency when streaming minute aggregates for 500-1000 stocks? Keep in mind, I’m using the paid subscription with all exchanges, so data gaps in something like SPY and other S&P mainstays would be very unusual.

Data integrity in the historical API remains a massive issue that is often brushed aside (for whatever reason). I hope you receive an answer and am looking forward to Alpaca’s response.

I have made the very same observation. The explanation I have read in this forum is that odd lots (< 100 shares) are excluded when transactions are counted. If there are no regular-lot transactions in a 1-minute period, no 1-minute bar is generated, even though several odd lots may have been traded.
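To make that explanation concrete, here is a rough sketch of how excluding odd lots would leave a minute without a bar. This is only an illustration of the rule as described in this forum, not Alpaca's actual aggregation code; the function name, the trade tuples, and the `include_odd_lots` switch are all hypothetical.

```python
from collections import defaultdict

ROUND_LOT = 100  # shares; trades smaller than this are odd lots


def minute_bars(trades, include_odd_lots=False):
    """Build 1-minute bars from (minute, price, size) trades in time order.

    With include_odd_lots=False, trades under ROUND_LOT shares are skipped,
    so a minute containing only odd lots produces no bar at all.
    """
    by_minute = defaultdict(list)
    for minute, price, size in trades:
        if include_odd_lots or size >= ROUND_LOT:
            by_minute[minute].append((price, size))

    bars = {}
    for minute, fills in by_minute.items():
        prices = [p for p, _ in fills]
        bars[minute] = {
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            "volume": sum(s for _, s in fills),
        }
    return bars


# Made-up trades: 09:32 contains only odd lots.
trades = [
    ("09:31", 100.0, 200),
    ("09:32", 100.5, 10),
    ("09:32", 100.4, 50),
    ("09:33", 100.2, 100),
]
bars = minute_bars(trades)
bars_all = minute_bars(trades, include_odd_lots=True)
print(sorted(bars))      # minutes that got a bar when odd lots are excluded
print(sorted(bars_all))  # minutes that got a bar when odd lots are counted
```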

That said, even in some high-volume stocks such as SPY, AMZN, and AAPL, I see missing 1-minute bars frequently, 5-minute bars sometimes, and 15-minute bars occasionally. What bothers me most is the missing 15-minute bars: I have difficulty imagining a day or situation in which an entire 15-minute period passes without any regular-lot transactions.

But I may be wrong; it may be that a good portion of high-frequency trades, which make up more than half of market volume, are odd-lot trades.

By the way, this is with the paid data subscription. Another thought: I think there are only 1-2 major data providers that feed brokers, including Alpaca. It is then up to the broker to process the data and pass it on to you.

Then again, so many gaps make me nervous.

It’s very unlikely that this is explained by the exclusion of odd lots. Aside from the fact that these are heavily institutionally traded S&P 500 stocks, it can’t be a coincidence that so many of the stocks would return null prices for the same minute (see the second table above). This strongly suggests a systemic bug in the data pathway from the exchanges to Alpaca’s aggregation.

As an update, I’ve run the above test for the entire week. Here is the number of null aggregate minutes per day for SPY:

Fri Jul 2: 18
Tue Jul 6: 0
Wed Jul 7: 15
Thu Jul 8: 1
Fri Jul 9: 8

For Alpaca’s diagnosis: the platform I use (QuantRocket) will report null values from your API if the aggregate bars are received more than 5 seconds after the minute ends. So either the API is reporting empty aggregates in these minutes, or the bars are arriving more than 5 seconds after the minute ends. Either way, it’s a problem.
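To tell those two cases apart, one option is to log each bar's arrival time and classify it against the 5-second cutoff described above. The sketch below is an assumption about how such a check could look, not QuantRocket's or Alpaca's implementation; `classify_bar`, its arguments, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(seconds=5)  # cutoff value taken from the post above


def classify_bar(minute_start, arrival, close):
    """Classify one minute bar for one symbol.

    minute_start: start of the bar's minute
    arrival: local time the bar was received, or None if it never arrived
    close: the bar's close price, or None if the bar carried no price
    """
    minute_end = minute_start + timedelta(minutes=1)
    if arrival is None:
        return "missing"  # nothing was received for this minute
    if close is None:
        return "empty"    # a bar arrived but had no price
    if arrival - minute_end > CUTOFF:
        return "late"     # arrived after the cutoff, would be recorded as null
    return "ok"


# Illustrative timestamps only.
start = datetime(2021, 7, 9, 9, 31)
on_time = classify_bar(start, datetime(2021, 7, 9, 9, 32, 3), 430.1)
late = classify_bar(start, datetime(2021, 7, 9, 9, 32, 7), 430.1)
missing = classify_bar(start, None, None)
print(on_time, late, missing)
```

If most null minutes come back as "late", the issue is delivery latency; if they come back as "missing" or "empty", the aggregates themselves are not being produced.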
