Alpaca market data so bad

Alpaca:
CWBR ('2021-06-23', 1.2, 1.22, 1.16, 1.21, 580421)

Yahoo:
Jun 23, 2021 1.1600 1.2200 1.1600 1.2200 624,040

open price was like 1.2 vs 1.16… this is like 3%+ diff…

The Alpaca data is based on the same SIP feeds which most data providers use. The SIPs also provide guidance on how to calculate the High, Low, Close and Volume data. For this reason, all data providers generally have the same data. There can be discrepancies though. One area is in open prices. The SIPs don’t give guidance on how to calculate open prices, so different data providers may have slightly different rules for that. I have also seen cases where some providers do not calculate volume correctly. The SIPs provide separate guidance on which trades to include for pricing and which to include for volume. Some providers only include those trades which are used for pricing, which is not what the SIPs recommend.
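
To make the pricing/volume distinction concrete, here is a minimal sketch of building a daily bar from a list of trades. The `updates_price` and `updates_volume` flags are hypothetical stand-ins for whatever the SIP condition tables say about each trade; they are not real API fields, and the sample trades are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Trade:
    price: float
    size: int
    updates_price: bool   # hypothetical flag: eligible for high/low/close
    updates_volume: bool  # hypothetical flag: eligible for volume


def daily_hlcv(trades: Iterable[Trade]) -> dict:
    """Aggregate trades (assumed sorted by time) into high/low/close/volume."""
    high: Optional[float] = None
    low: Optional[float] = None
    close: Optional[float] = None
    volume = 0
    for t in trades:
        if t.updates_volume:
            volume += t.size
        if t.updates_price:
            high = t.price if high is None else max(high, t.price)
            low = t.price if low is None else min(low, t.price)
            close = t.price
    return {"high": high, "low": low, "close": close, "volume": volume}


# Invented sample data: the middle trade counts toward volume only.
trades = [
    Trade(1.16, 500, True, True),
    Trade(1.25, 40, False, True),
    Trade(1.22, 300, True, True),
]
bar = daily_hlcv(trades)

# What a provider would report if it (incorrectly) summed only price-eligible trades.
price_only_volume = sum(t.size for t in trades if t.updates_price)
```

In this made-up case the bar's volume is 840, while a provider that filters volume by price eligibility would report 800.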

So, let’s look specifically at CWBR on 2021-06-23. Here is the Alpaca data, which I fetched using the Python SDK…

Here is the Yahoo data…

And here is NASDAQ just for comparison…

So, a few things to notice… the High, Low, and Close prices all match. The Alpaca and Nasdaq volumes match but Yahoo’s is lower. I didn’t check, but in the past this has been because they, incorrectly, also exclude from volume the trades which are excluded from HLC prices. Alpaca’s Open price differs from the other two. As mentioned, the SIPs do not give guidance on which trades to include/exclude in the open prices. I believe NASDAQ and Yahoo simply exclude any trades they would exclude in the HLC prices. Alpaca, however, also includes “Market Center Official Open” trades (condition Q) when determining the open prices. Let’s take a look at all the trades which occurred for CWBR on 2021-06-23 at open.

Notice Alpaca is picking up the Market Center Official Open of the NYSE Arca exchange as its open price. However, Yahoo and NASDAQ choose the NASDAQ OMX trade (which occurred 0.75 seconds later) as their open.
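
The two open-price rules can be sketched roughly as below. This is only an illustration of the behaviour described in this post, not Alpaca's or anyone else's actual implementation. The `Trade` fields and the `price_eligible` flag are hypothetical, and the condition set on the NASDAQ trade is assumed to be empty. The prices and the 0.75 second gap are taken from the CWBR example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Trade:
    seconds_after_open: float            # hypothetical time offset
    exchange: str
    price: float
    conditions: Set[str] = field(default_factory=set)
    price_eligible: bool = True          # hypothetical: would update HLC


def open_price(trades: List[Trade], include_official_open: bool) -> Optional[float]:
    """Return the first trade price that qualifies as the open under the chosen rule."""
    for t in sorted(trades, key=lambda tr: tr.seconds_after_open):
        if t.price_eligible:
            return t.price
        # Rule described in the post: also accept "Market Center Official Open" (Q).
        if include_official_open and "Q" in t.conditions:
            return t.price
    return None


trades = [
    Trade(0.00, "NYSE Arca", 1.20, {"Q"}, price_eligible=False),
    Trade(0.75, "NASDAQ", 1.16),
]
with_official_open = open_price(trades, include_official_open=True)
hlc_rule_only = open_price(trades, include_official_open=False)
```

With these inputs the first rule yields 1.20 and the second yields 1.16, which is the difference reported at the top of the thread.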

Hope that helps explain some of the data difference between data providers.

Thanks for the detailed explanations, learned a lot BTW.

Wouldn’t it be nicer to have the same exclusion/inclusion rules as Yahoo and Nasdaq? That would make Alpaca acceptable to more users, because, to be honest, users usually compare the prices with Yahoo. Having the same rules would get Alpaca data more users, in my opinion. Just my 2 cents.

I second steveli’s suggestion. Aside from the debate about the right or wrong way of determining the open price, why not use the same criteria as Yahoo, since this is the main benchmark for most users and what is seen in most charting tools, like TradingView? I trade a strategy that relies on doji candle formations, and it can be challenging when a pattern that appears in the Alpaca data cannot be visually verified in TradingView.

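Not sure what is going on here with the intraday data, but a lot of data is missing.
import datetime as dt
import pytz
from alpaca_trade_api.rest import REST, TimeFrame
api = REST()  # assumption: API keys are read from environment variables
symbol = 'SPY'
st = dt.datetime(2021, 12, 13, 9).astimezone(pytz.utc).isoformat()
ed = dt.datetime(2021, 12, 13, 14).astimezone(pytz.utc).isoformat()
df_min = api.get_bars(symbol, TimeFrame.Minute, st, ed).df
print(df_min)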

Is there a data issue today? No matter which stock I plug in, historical bar, trade, and quote data is missing. I can live with some data discrepancy, but when 10%-15% of the data is missing (I am getting only 260 of the 300 bars expected), it renders the data useless. Is anyone else running into this today?

I’m also having issues, starting from 17:40 UTC.

Firstly, this is a fantastic answer. Thank you, Dan :pray:

However, I am having some trouble with data:

1. Certain bars are missing. Here I want the last 50 1-minute bars for CEI (executed at 09:55 NY time)

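from datetime import datetime as dt
import pytz
# api: an alpaca_trade_api REST client created elsewhere
datetime_now = dt.now(pytz.timezone('US/Eastern'))
bars = api.get_bars('CEI', '1Min', None, datetime_now.strftime('%Y-%m-%dT%H:%M:%SZ'), limit=50)
print(bars.df)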

This is the result:

                            open    high  ...  trade_count      vwap
timestamp                                  ...                       
2021-12-21 09:00:00+00:00  0.9400  0.9400  ...            2  0.940000
2021-12-21 09:01:00+00:00  0.9400  0.9400  ...            9  0.939629
2021-12-21 09:02:00+00:00  0.9597  0.9597  ...            7  0.958237
2021-12-21 09:03:00+00:00  0.9595  0.9595  ...            6  0.958894
2021-12-21 09:04:00+00:00  0.9590  0.9594  ...           10  0.959283
2021-12-21 09:05:00+00:00  0.9594  0.9594  ...            9  0.957695
2021-12-21 09:06:00+00:00  0.9400  0.9400  ...           12  0.940000
2021-12-21 09:07:00+00:00  0.9400  0.9500  ...           19  0.935196
2021-12-21 09:08:00+00:00  0.9308  0.9308  ...            4  0.930649
2021-12-21 09:09:00+00:00  0.9400  0.9494  ...            8  0.947381
2021-12-21 09:10:00+00:00  0.9499  0.9499  ...            8  0.949618
2021-12-21 09:11:00+00:00  0.9494  0.9500  ...           14  0.949883
2021-12-21 09:12:00+00:00  0.9583  0.9590  ...            3  0.958887
2021-12-21 09:13:00+00:00  0.9590  0.9591  ...           13  0.957994
2021-12-21 09:14:00+00:00  0.9591  0.9650  ...           15  0.960255
2021-12-21 09:17:00+00:00  0.9642  0.9660  ...            5  0.963317
2021-12-21 09:18:00+00:00  0.9605  0.9605  ...            5  0.960015
2021-12-21 09:19:00+00:00  0.9600  0.9600  ...            4  0.960022
2021-12-21 09:21:00+00:00  0.9600  0.9600  ...            5  0.960000
2021-12-21 09:23:00+00:00  0.9600  0.9600  ...            1  0.960000
2021-12-21 09:24:00+00:00  0.9600  0.9600  ...            1  0.960000
2021-12-21 09:25:00+00:00  0.9685  0.9685  ...            5  0.960988
2021-12-21 09:26:00+00:00  0.9600  0.9600  ...           27  0.956413
2021-12-21 09:27:00+00:00  0.9591  0.9591  ...            2  0.959100
2021-12-21 09:28:00+00:00  0.9672  0.9672  ...            2  0.966979
2021-12-21 09:34:00+00:00  0.9600  0.9600  ...           18  0.959679
2021-12-21 09:35:00+00:00  0.9619  0.9619  ...            6  0.960558
2021-12-21 09:36:00+00:00  0.9591  0.9591  ...           11  0.958919
2021-12-21 09:37:00+00:00  0.9565  0.9565  ...           23  0.950265
2021-12-21 09:38:00+00:00  0.9500  0.9500  ...            7  0.950000
2021-12-21 09:39:00+00:00  0.9450  0.9523  ...           12  0.944038
2021-12-21 09:40:00+00:00  0.9559  0.9559  ...            2  0.956062
2021-12-21 09:41:00+00:00  0.9591  0.9591  ...            1  0.959100
2021-12-21 09:43:00+00:00  0.9527  0.9527  ...            4  0.950410
2021-12-21 09:44:00+00:00  0.9507  0.9507  ...            6  0.949947
2021-12-21 09:45:00+00:00  0.9406  0.9406  ...            1  0.940600
2021-12-21 09:46:00+00:00  0.9406  0.9406  ...            4  0.940522
2021-12-21 09:47:00+00:00  0.9468  0.9488  ...            2  0.948299
2021-12-21 09:49:00+00:00  0.9488  0.9532  ...            8  0.949277
2021-12-21 09:51:00+00:00  0.9590  0.9590  ...            1  0.959000

[40 rows x 7 columns]

This only goes back to 09:00, presumably because that is the earliest time Alpaca considers to be pre-market. You can also see that several bars are missing, e.g. 09:15-16, 09:20, 09:22, 09:29-33, 09:42, 09:48, 09:50, and 09:52-55 (when the request was made).

2. I ran this same code a few minutes before and got an error about my subscription not allowing data from the last 15 minutes. I’m no longer getting that error, so I can’t provide the exact message, but is this because I have only a free subscription? Or was it because I ran the command less than 15 minutes after the market opened?

Any advice would be greatly appreciated.

Hi

Regarding 1. I think you have a timezone mix-up. As I understand it, the pre-market data starts at 09.00 UTC, corresponding to 04.00 US Eastern time, and the times in your “result” are UTC, so everything you see is pre-market data. That explains why you have missing bars (no trading happened in those minutes). If you had instead run the code for the regular opening hours (from 14.30 UTC), you should not be missing any bars, unless of course no trading happened in the period.
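
For what it is worth, the snippet in the question formats the New York wall-clock time and then appends a literal `Z`, which marks the string as UTC, so 09:55 in New York is sent as 09:55 UTC. A minimal sketch of converting to UTC before formatting is below. It uses the standard-library `zoneinfo` module (Python 3.9+) rather than `pytz`, and the helper name `to_rfc3339_utc` is made up for this example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def to_rfc3339_utc(aware_dt: datetime) -> str:
    """Convert a timezone-aware datetime to a UTC timestamp string ending in 'Z'."""
    if aware_dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return aware_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 09:55 New York time on the day from the question.
ny_time = datetime(2021, 12, 21, 9, 55, tzinfo=NEW_YORK)
end = to_rfc3339_utc(ny_time)
```

Here `end` is `2021-12-21T14:55:00Z`, and that string is what would be passed as the end of the requested range.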

I ran ‘get_bars’ for the same period and got the following, which clearly shows that the real action is happening between 14.30 and 21.00 UTC:

                               open      high  ...  trade_count      vwap
timestamp                                      ...                       
2021-12-21 09:00:00+00:00  0.940000  0.940000  ...            2  0.940000
2021-12-21 09:01:00+00:00  0.940000  0.940000  ...            9  0.939629
2021-12-21 09:02:00+00:00  0.959700  0.959700  ...            7  0.958237
2021-12-21 09:03:00+00:00  0.959500  0.959500  ...            6  0.958894
2021-12-21 09:04:00+00:00  0.959000  0.959400  ...           10  0.959283
...
2021-12-21 14:28:00+00:00  0.995500  0.997200  ...           33  0.995533
2021-12-21 14:29:00+00:00  0.996200  0.996200  ...           18  0.991649
2021-12-21 14:30:00+00:00  1.000000  1.000000  ...          123  0.998485
2021-12-21 14:31:00+00:00  0.999900  1.010000  ...          176  1.000984
2021-12-21 14:32:00+00:00  1.002400  1.010000  ...           75  1.002298
2021-12-21 14:33:00+00:00  1.008100  1.008700  ...          361  0.993364
2021-12-21 14:34:00+00:00  0.995200  0.995200  ...          114  0.989725
2021-12-21 14:35:00+00:00  0.989500  0.995400  ...          278  0.984770
2021-12-21 14:36:00+00:00  0.984000  0.985000  ...          431  0.976377
...
2021-12-21 20:57:00+00:00  0.880000  0.880000  ...          188  0.879170
2021-12-21 20:58:00+00:00  0.876000  0.878000  ...          246  0.875913
2021-12-21 20:59:00+00:00  0.875201  0.879900  ...          327  0.877178
2021-12-21 21:00:00+00:00  0.880400  0.888799  ...           49  0.880513
2021-12-21 21:01:00+00:00  0.885000  0.887500  ...           29  0.884815
2021-12-21 21:02:00+00:00  0.885000  0.887500  ...           22  0.885682
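
If you want to check which minutes have no bar in a given window, a small standard-library sketch like the one below could help. The function name `missing_minutes` is made up for this example; the timestamps could come from the index of `bars.df`. The sample minutes are the ones present in the 09:00-09:51 UTC output posted in the question.

```python
from datetime import datetime, timedelta, timezone


def missing_minutes(bar_times, start, end):
    """Return every whole minute in [start, end] that has no bar."""
    present = set(bar_times)
    missing = []
    current = start
    while current <= end:
        if current not in present:
            missing.append(current)
        current += timedelta(minutes=1)
    return missing


def at(minute):
    """Timestamp for 2021-12-21 09:<minute> UTC."""
    return datetime(2021, 12, 21, 9, minute, tzinfo=timezone.utc)


# Minutes that do have a bar in the posted output.
present_minutes = (
    list(range(0, 15)) + [17, 18, 19, 21] + list(range(23, 29))
    + list(range(34, 42)) + list(range(43, 48)) + [49, 51]
)
bars = [at(m) for m in present_minutes]
gaps = [t.minute for t in missing_minutes(bars, at(0), at(51))]
```

For that window the gaps come out as minutes 15, 16, 20, 22, 29-33, 42, 48 and 50, matching the list in the question.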

Regarding 2. I don’t have a proven answer, but if you only have a free subscription, you won’t be able to see data from the last 15 mins, so you probably ran into that limitation somehow.
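
One way to stay clear of that limitation might be to cap the end of the request at 15 minutes before the current time. This is only a sketch: the 15-minute figure is the one mentioned above and has not been checked against the documentation, and `delayed_end` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone


def delayed_end(now: datetime, delay_minutes: int = 15) -> str:
    """Return a UTC timestamp string that is `delay_minutes` before `now`."""
    end = now.astimezone(timezone.utc) - timedelta(minutes=delay_minutes)
    return end.strftime("%Y-%m-%dT%H:%M:%SZ")


# Example with a fixed "now" so the result is reproducible.
example_end = delayed_end(datetime(2021, 12, 21, 14, 55, tzinfo=timezone.utc))
```

In live use you would pass `datetime.now(timezone.utc)` instead of the fixed time.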

Additional question
I have an additional question regarding an issue which is also seen in your snippet. If I take ‘limit’ to be 50 and make a ‘get_bars’ at e.g. 09.20 US eastern time, I would expect to get the first 20 bars for that day (assuming no bars are missing), and then get the remaining 30 bars from the end of the prior day, but for some reason ‘get_bars’ stops at the current day, which is also seen in your snippet…

Can someone explain why, and maybe suggest a way to make ‘get_bars’ continue into earlier days?

The reason for asking is that I would like to specify only some large ‘limit’ (and perhaps an ‘end’ time) and then get several days of data, but with these parameters I only get the data for the current day. For some reason I also have to specify a ‘start’ date to get the data for earlier days, which is a bit annoying, since I then need to work out how far back the start must be to get the full set of bars…
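
Until there is a better answer, a rough workaround could be to compute a deliberately generous ‘start’ and trim the result afterwards. The sketch below is a heuristic, not anything from the API: `lookback_start` is a made-up name, 390 bars per session assumes one-minute bars over regular hours only, and the weekend and holiday padding is a guess.

```python
import math
from datetime import datetime, timedelta, timezone


def lookback_start(end: datetime, limit: int, bars_per_session: int = 390) -> datetime:
    """Estimate a start time early enough to cover `limit` one-minute bars."""
    sessions = math.ceil(limit / bars_per_session)
    # Stretch trading sessions to calendar days (5 sessions per 7 days),
    # then add a few days of padding for weekends and holidays.
    calendar_days = math.ceil(sessions * 7 / 5) + 3
    return end - timedelta(days=calendar_days)


end = datetime(2021, 12, 21, 14, 20, tzinfo=timezone.utc)
start = lookback_start(end, limit=50)
```

The idea would be to pass `start.isoformat()` as the start of the request and then keep only the last `limit` rows of the returned DataFrame, for example with `df.tail(limit)`. For thinly traded symbols with many empty minutes the padding may need to be larger.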

/Steven

Thank you for your reply. I used the fix provided in “Keep getting ServerSelectionTimeoutError - #13 by Gorkem_Erdogan” (MongoDB Atlas, MongoDB Developer Community Forums) and it worked for me.
