Can't read SPY minutes

I’m getting the following error when I try to read the latest SPY minute bars:

Exception has occurred: APIError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
{"message":"subscription does not permit querying recent SIP data"}

I understand that the free plan doesn’t include SIP data, but SPY should still be available via IEX.

I’m currently shopping around for a broker for algo trading. Unfortunately Alpaca doesn’t seem to have enough volume on the Crypto side, so I’m wondering if it could still make sense for securities. I’m not keen to sign up for a subscription just to test if Alpaca actually works and does what I need it to. Is there any way to “try before you buy”?

@Eric_Risser Due to licensing restrictions, the Free Market Data plan doesn’t include the most recent 15 minutes of full market (i.e., SIP) data. Your query must have 1) had an end datetime within the last 15 minutes and 2) specified feed=sip (or defaulted to the SIP feed).

To avoid that error, either 1) ensure the end datetime is at least 15 minutes before the current time or 2) explicitly specify feed=iex. It’s probably always best to include a feed parameter to be sure exactly which data one is getting (i.e., full market SIP data or single exchange IEX data) and not rely on the defaults.
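
As a rough sketch with alpaca-py (the key and secret strings are placeholders), both options look something like this:

    from datetime import datetime, timedelta, timezone

    from alpaca.data.historical import StockHistoricalDataClient
    from alpaca.data.requests import StockBarsRequest
    from alpaca.data.timeframe import TimeFrame
    from alpaca.data.enums import DataFeed

    client = StockHistoricalDataClient("YOUR_KEY", "YOUR_SECRET")

    # Option 1: full market SIP data, but end at least 15 minutes in the past
    end = datetime.now(timezone.utc) - timedelta(minutes=16)
    sip_bars = client.get_stock_bars(StockBarsRequest(
        symbol_or_symbols="SPY",
        timeframe=TimeFrame.Minute,
        start=end - timedelta(hours=1),
        end=end,
        feed=DataFeed.SIP,
    ))

    # Option 2: IEX data, which has no 15 minute restriction
    iex_bars = client.get_stock_bars(StockBarsRequest(
        symbol_or_symbols="SPY",
        timeframe=TimeFrame.Minute,
        start=datetime.now(timezone.utc) - timedelta(hours=1),
        feed=DataFeed.IEX,  # end omitted, so it defaults to "now"
    ))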

One thing to note: subscribers to the Free plan can access full market SIP quotes, trades, and bars as long as the data is more than 15 minutes old. That ‘historical’ data is identical to the paid plan’s. The paid plan simply removes that restriction and additionally offers full market ‘real time’ data. Depending upon your use case, the Free plan may work for you. If you wish to “try before you buy”, send an email to support and ask for a coupon code for a 30 day free trial. Then subscribe to the Paid plan and use that coupon code. You won’t get billed for the current month, and the subscription won’t renew if you cancel within 30 days. (Perhaps mention in the email that Dan Whitnable suggested you contact support.)

Thanks for the quick response. I’m loving the forum support!

Okay gotcha, looks like I had a couple of incorrect assumptions. I thought I was already on the free plan, and I thought I could query up to the current time and the system would just omit the latest 15 minutes without throwing an error.

In terms of my use case, I’ve developed an ML model that’s achieving a very respectable per-bar profit factor on historical data. It has a great Martin ratio, with superiority confirmed through Monte Carlo permutation tests. It appears robust as well; it seems to work consistently on anything I throw at it. The problem is that it runs on minute bars, and my understanding is that live data at that frequency can be very noisy relative to the cleaned up historical data. Specifically, I know that there’s a couple seconds of lag before the latest minute bar is available, and then another 10 seconds where orders are still being logged and the bar is being updated.

My goal is to run my ML strategy through a live paper trade and better understand the various challenges that are introduced. When you say the latest 15 minutes are restricted, do you mean that you can only query historical data with the free plan, not “live” data? Or do you mean that the live data is given a 15 minute delay, so the bars behave just like the paid plan’s bars did 15 minutes earlier? The latter scenario is fine for me, as I’m just trying to understand how noisy the live data actually is and how that affects my strategy’s performance.

Thanks,
Eric

@Eric_Risser You mentioned a few things and I’ll try to clarify.

> “…live data at that frequency can be very noisy relative to the cleaned up historical data … there’s a couple seconds of lag before the latest minute bar is available, and then another 10 seconds where orders are still being logged and the bar is being updated.”

I wouldn’t say that live data is “very” noisy. Typically, fewer than 0.5% of trades get updated or added to a bar after it is first calculated, ~1 second after the close of the bar. Moreover, most of those ‘updates’ don’t impact the bar OHLC values (i.e., the updated trade wasn’t an open, high, low, or close trade).

That said, some of the updates can have an ‘outsized’ impact on some algos. Why? Two common updates are 1) a derivatively priced or other “non-market” trade originally has incorrect trade conditions and is included in the bar calcs, or 2) a trade is executed at a price outside the NBBO but is later adjusted with a ‘price improvement’ by the broker. These appear in the original data as outliers. If one’s algo is specifically looking for “outliers” or anomalies, these will occur much more frequently in live trading than in simulations with historical data. If, however, one’s algo is looking for trends and doesn’t trigger off a single anomaly, bar updates typically aren’t impactful.

Another area where updates occur with higher frequency is extended hours trades. For several reasons, historical extended hours bars can look different from live (i.e., not yet updated) bars. Moreover, a few updated bars can have an outsized impact simply because there are many fewer trades during extended hours. If one’s algo specifically looks at extended hours bars, the results in live trading can be quite different from tests with historical bars.

> “do you mean that the live data is given a 15 minute delay?”

The term “15 minute delayed data” is often used by data providers but, as you noted, can be confusing. There isn’t really a ‘delay’. A more accurate description is that current full market SIP data (i.e., data within the last 15 minutes) is simply restricted. One cannot access it without a paid subscription. One can access any SIP data older than 15 minutes, just not current real-time data. It’s not exactly delayed but rather ‘blocked’. This is due to licensing terms imposed by the exchanges.

This restriction makes testing any algo which relies on ‘real time’ data for trade decisions difficult. The simple fix is to pay for a full market data subscription. This, however, can (understandably) be cost prohibitive for a developer just doing initial debugging. Therefore, as an option, Alpaca offers data from the IEX exchange. IEX is the single exchange which doesn’t impose the 15 minute restriction on its data. So, one can debug and test one’s algo in ‘real time’ by fetching IEX data without paying for a data subscription. The big caveat, of course, is that the data reflects only a subset of full market trades and can at times be quite different. It’s provided primarily to debug one’s code and not to test the efficacy of one’s strategy.
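
For example, fetching the latest bar from IEX works on the Free plan with no 15 minute restriction. A minimal sketch with alpaca-py (key and secret are placeholders):

    from alpaca.data.historical import StockHistoricalDataClient
    from alpaca.data.requests import StockLatestBarRequest
    from alpaca.data.enums import DataFeed

    client = StockHistoricalDataClient("YOUR_KEY", "YOUR_SECRET")

    # The latest minute bar from IEX only -- no paid subscription required
    request = StockLatestBarRequest(symbol_or_symbols="SPY", feed=DataFeed.IEX)
    bar = client.get_stock_latest_bar(request)["SPY"]
    print(bar.timestamp, bar.open, bar.high, bar.low, bar.close, bar.volume)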

Hope that helps.

Thanks again for the detailed clarifications. Everything makes perfect sense now and I have a good idea how best to proceed.

While I’ve explored some trend following and mean reversion strategies, my background is machine learning, so I’m mostly exploring methods that look for repeating patterns that can offer some amount of short term prediction. This is a tricky problem because the enemy of any ML approach is noise, and the market is mostly noise. It’s reassuring to know how little additional noise is introduced by live trading. In general, outliers shouldn’t be a problem; by definition they don’t form part of a pattern, and the chance they would trigger a false positive seems low. Still… assumptions often come back to bite you in ML, so it’s best to just try it out and see what happens.

As advised, I’ll debug using IEX and take you up on the 30 day trial once the system appears to be “working”.

Thanks again!


Hi Dan,

I’ve gotten my strategy running live using market orders with the IEX feed, and I’m happy to say that the live bars and trading signals match the historical backtest I run the next day, so it looks like latency/bar accuracy is all good.

Obviously market orders aren’t great, so I’m trying to implement a dynamic limit order strategy, which means I need to continuously gather real-time quote data asynchronously so I can choose the limit price based on the bid-ask spread and market depth.

I used the alpaca-py API to get the historical and latest minute bars. Specifically, I used:

from alpaca.data.historical import StockHistoricalDataClient

and thought it worked great. So to get real time quote data I decided to go with

from alpaca.data.live import StockDataStream

with the actual code:

        # Create the websocket stream and register an async quote handler
        stream = StockDataStream(appKeyPaper, appSecretPaper)
        stream.subscribe_quotes(self.handle_quote, self.symbol)
        # run() starts the stream's event loop and blocks until stopped
        stream.run()

and while this does work, it blocks execution of the program. I’ve just spent the day trying to refactor my code to create a master event loop so I can move the StockDataStream calls into their own asynchronous function that won’t block the program. Unfortunately, the stream.run() command doesn’t seem to play nice with the main event loop. I’m no Python synchronization expert, and I’m not really keen to become one, so I thought I’d check with you whether any of this is actually necessary. Is there an easy way to get an asynchronous quote stream going? I see that there are a bunch of different ways to get market data from Alpaca: multiple Python APIs, REST, websockets, etc. I’m not sure if those are old, defunct solutions you’ve pivoted away from, or if they’re just different tools for different jobs?

In any case, could you point me in the right direction please? Again, many thanks for your time and patience!

@Eric_Risser Glad to hear you have your algo up and running live. Great accomplishment.

I would suggest, if possible, sticking with REST calls to fetch quotes rather than streaming them. Here are some of the reasons, in no particular order:

  • Count on your data stream occasionally dropping its connection (for various reasons). This could be network issues or your algo not being able to process the data fast enough.

  • It can take a finite amount of time, maybe over a minute, just to detect that there is an issue with the data stream. Error checking and recovery require extra coding. Error checking and retry for API calls, on the other hand, are very straightforward.

  • It’s not always obvious, or easy to spot, if one’s code is dropping data from either not ‘keeping up’ or from network errors. Websockets aren’t inherently ‘self correcting’.

  • Unless one’s algo is ok with missing data, one needs to implement a method to ‘backfill’ using the REST APIs anyway (see the sketch after this list). Why not simply implement that API process initially and skip streaming to start?

  • Websockets are not supported on many cloud infrastructures (for example, Google Cloud Functions), which limits scalability and architecture options. The basic issue is that streaming requires an algo to be ‘stateful’, while many architectures require algos to be ‘stateless’ (so they can be easily started and restarted).

  • One only gets a single websocket connection with each market data subscription. Running a live algo while testing/debugging a second (or third) algo is therefore problematic.

  • Getting some data, for example the last trade price, by streaming requires considerable code. At a minimum, one needs to store the data internally, and then, as each new trade is received, parse the trade conditions to determine if it’s a ‘valid’ trade. That same info can be easily fetched with a single API call: no storage and no logic needed. Other data can similarly be easier to get using the APIs.

  • Storing streamed data can be problematic, especially bar data. Streamed bars can be revised 30 seconds after the initial bar is streamed to account for ‘slow’ arriving trades, so one needs to check for and handle those updates. Bar data fetched with the APIs is updated by Alpaca, so this isn’t necessary. Also, if one is storing bar data overnight, adjusting those prices for splits is a challenge. Again, the APIs do that for you.
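
For the ‘backfill’ point above, a minimal sketch using alpaca-py (the key/secret strings and the backfill_bars helper are placeholders for illustration):

    from alpaca.data.historical import StockHistoricalDataClient
    from alpaca.data.requests import StockBarsRequest
    from alpaca.data.timeframe import TimeFrame
    from alpaca.data.enums import DataFeed

    client = StockHistoricalDataClient("YOUR_KEY", "YOUR_SECRET")

    def backfill_bars(symbol, last_seen_timestamp):
        # Re-fetch every minute bar since the last one we processed
        request = StockBarsRequest(
            symbol_or_symbols=symbol,
            timeframe=TimeFrame.Minute,
            start=last_seen_timestamp,
            feed=DataFeed.IEX,
        )
        return client.get_stock_bars(request).data.get(symbol, [])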

Of course everyone’s situation is different, but perhaps get things working with REST first.

In your case, it sounds like you are already making the decision to open or close a position without streamed data, and you only need the quotes to “choose the limit price around the bid-ask”. The easiest approach: simply call the latest_quotes or the snapshots endpoint immediately before submitting your order. That is the same data which is streamed. No need to implement any async code.
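
A minimal sketch, again with alpaca-py (key/secret are placeholders, and the midpoint logic is just for illustration):

    from alpaca.data.historical import StockHistoricalDataClient
    from alpaca.data.requests import StockLatestQuoteRequest
    from alpaca.data.enums import DataFeed

    client = StockHistoricalDataClient("YOUR_KEY", "YOUR_SECRET")

    # Fetch the latest quote immediately before submitting the order
    request = StockLatestQuoteRequest(symbol_or_symbols="SPY", feed=DataFeed.IEX)
    quote = client.get_stock_latest_quote(request)["SPY"]

    # Choose a limit price inside the spread (placeholder logic)
    limit_price = round((quote.bid_price + quote.ask_price) / 2, 2)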

That’s my ‘2 cents’ as they say.