Hi, I’m just trying to start using the API and have got stuck on the first hurdle. The following request just seems to hang forever, it never returns. Should this work?
from alpaca.data import StockHistoricalDataClient
from alpaca.data.requests import StockQuotesRequest
from alpaca.data.timeframe import TimeFrame
from alpaca.data.enums import DataFeed
from datetime import datetime
client = StockHistoricalDataClient(key, secret)
req = StockQuotesRequest(
symbol_or_symbols = ["AAPL"],
start = datetime(2023, 10, 1),
end = datetime(2023, 10, 3),
feed = DataFeed.IEX
)
quotes = client.get_stock_quotes(req)
print(quotes)
edit: I think it’s just returning too many results and is actually taking that long to process! I need to use the limit parameter
@pw12345 The code you have should work. The issue, as you surmised, is there are A LOT of quotes which take awhile to download. However, a couple of things. First, printing ‘quotes’ doesn’t result in anything meaningful because ‘quotes’ is an object. It’s much more meaningful to convert it into a dataframe using the .df method. The second comment i’d make is to always use timezone aware datetimes and always explicitly set the timezone to 'America/New_York. That way there is no confusion about what exact time one wants. I highly recommend using the pandas time methods over datetime because they are more flexible. Additionally, in general do not specify the feed and use the defaults based upon your data subscription. Why? By specifying IEX one only gets IEX data which is not full market data and probably not what you want. A free data plan will allow full market data as long as it’s not the most recent 15 minutes.
So, something like this may work better.
!pip install -q alpaca-py
from alpaca.data import StockHistoricalDataClient
from alpaca.data.requests import StockQuotesRequest
from alpaca.data.enums import DataFeed
import pandas as pd
ALPACA_API_KEY = 'xxxxx'
ALPACA_API_SECRET_KEY = 'xxxxx'
client = StockHistoricalDataClient(ALPACA_API_KEY, ALPACA_API_SECRET_KEY)
req = StockQuotesRequest(
symbol_or_symbols = ["AAPL"],
start = pd.to_datetime('2023-11-03 10:30').tz_localize('America/New_York'),
end = pd.to_datetime('2023-11-03 10:35').tz_localize('America/New_York'),
)
quotes = client.get_stock_quotes(req).df
print(quotes)
I got this working, and spitting out a lot of data. 2 issues:
For my use case I only need 1 quote for every minute. I tried using the timeframe parameter but it’s still spitting out a lot of quotes from many exchanges. How do I limit it to 1 quote for every minute if possible? Or do I need to download it all and filter myself? In this case, should I use a specific exchange?
@Joseph Unfortunately, there isn’t a direct method to getting a downsampled set of quotes. All the endpoint delivers is all quotes between two times. One would need to load all the quotes and downsample locally with resample or some other approach. This is unfortunate because it makes it very difficult actually using historical quote data because of the shear quantity needed to download.
As far as your second question “I seem to be getting the wrong data when I look at quotes from 2017 for AAPL” why is it you feel it’s “wrong”? Quotes are always raw unadjusted prices. Could it be that you are comparing it to adjusted price data?
In this case, would asking for a specific exchange trim down the volume of data? If so, which exchange?
Regarding wrong data: Ask_price for AAPL was not 171.35 back in November 2017. Yahoo shows it to be around the $43 range. I’m pretty sure 171.35 is a recent price.
@Joseph Regarding wrong data, the issue is Yahoo results are adjusted for stock splits. One would have actually paid $171 per share of AAPL back in November 2017. In the case of AAPL there was a 4:1 split on August 28, 2020. That split effectively reduced the ‘adjusted’ price by 1/4 which is the $43 you see on Yahoo. One can fetch both split adjusted and actual raw prices using the Alpaca API. Use the parameter adjustment=split or adjustment=raw. Below are those prices split and actual/raw respectively, Note however that the trade and quote data is only raw unadjusted data. Only bars are ever adjusted.
Regarding reducing the volume by specifying an exchange, one can specify feed=iex which will limit quotes to only those orders placed on the IEX exchange. This will reduce the quotes but also provide a rather distorted set of data. Many symbols do not trade much on the IEX exchange. The quotes can be many minutes (or hours) old.
Depending upon your use case, you may be able to use estimates of the bid ask spread. Technically, one could calculate average bid-ask spreads directly from quote data. However, the practical problem one soon realizes is the HUGE amount of data one needs to fetch. Trying to, for example, download the daily quotes for a single large volume stock like SPY takes almost 8 minutes (4M+ rows). Trying to download quote data for 100 or 1000 stocks would be practically impossible. Fortunately, there are some excellent approaches to estimating bid-ask spreads using only bar data. The one I like is the Ardia, Guidotti, Kroencke method. It is more accurate than another popular approach Corwin Shultz method especially for small spreads. There is a great python package called bidask which uses this former approach. Using bid-ask It takes about 350ms to fetch data (minute bars) and calculate a days bid-ask spread for SPY and is implemented in a single line of code (once one has the bar data). Something like this