EOD Data via get_stock_snapshot dictionary

My goal is simply to get end-of-day OHLC and volume data from the pricing API to update, on a daily basis, a larger historic database of all the securities I have already pulled from the Alpaca API. I am able to get the data in what appears to be a nested dictionary using this code:

def multiquotes():
    from alpaca.data.historical import StockHistoricalDataClient
    from alpaca.data.requests import StockSnapshotRequest

    # keys required for stock historical data client
    client = StockHistoricalDataClient(ALPACA_API_KEY, ALPACA_SECRET_KEY)

    # multi symbol request - single symbol is similar
    multisymbol_request_params = StockSnapshotRequest(symbol_or_symbols=["SPY", "TSLA"])

    latest_multisymbol_quotes = client.get_stock_snapshot(multisymbol_request_params)
    print(latest_multisymbol_quotes)

The issue is that it seems to return a large nested dictionary, like so:

{'SPY': {'daily_bar': {'close': 446.46,
                       'high': 447.68,
                       'low': 445.08,
                       'open': 446.1,
                       'symbol': 'SPY',
                       'timestamp': datetime.datetime(2023, 9, 13, 4, 0, tzinfo=datetime.timezone.utc),
                       'trade_count': 9198.0,
                       'volume': 1021666.0,
                       'vwap': 446.589737},
         'latest_quote': {'ask_exchange': 'V',
                          'ask_price': 449.21,
                          'ask_size': 10.0,
                          'bid_exchange': 'V',
                          'bid_price': 448.03,
                          'bid_size': 10.0,
                          'conditions': ['R'], ...

But normal code to work with dictionaries fails:
open_prices = {}

# Iterate through each symbol's data
for symbol, symbol_data in latest_multisymbol_quotes.items():
    # Get the 'open' price from the 'daily_bar' section
    open_price = symbol_data['daily_bar']['open']
    # Store the open price in the dictionary with the symbol as the key
    open_prices[symbol] = open_price

because I am getting this error
    open_price = symbol_data['daily_bar']['open']
TypeError: 'Snapshot' object is not subscriptable

What is the easiest way to extract this info or convert the snapshot object to Pandas and then deal with it there?

Thanks

@KentP I use two approaches to 'unpack' the snapshots object and put the data into a dataframe. Both approaches leverage list comprehension and the magic of the Pandas DataFrame. One issue with the alpaca-py SDK is that it turns everything into objects, which are cumbersome to work with (the alpaca_trade_api doesn't do this). So, you can instruct the data_client to NOT turn things into objects by setting the raw_data=True parameter when instantiating the client, like this:

client = StockHistoricalDataClient(ALPACA_API_KEY, ALPACA_SECRET_KEY, raw_data=True)
snapshots = client.get_stock_snapshot(StockSnapshotRequest(symbol_or_symbols=symbols))
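
(As a side note, if you prefer to keep the default object responses rather than raw_data=True, the Snapshot models expose the same data as attributes, which is what the 'not subscriptable' error was hinting at. A minimal sketch, assuming the attribute names shown in the printed snapshot above:)

from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockSnapshotRequest

# default client (raw_data=False) returns Snapshot model objects
obj_client = StockHistoricalDataClient(ALPACA_API_KEY, ALPACA_SECRET_KEY)
obj_snapshots = obj_client.get_stock_snapshot(StockSnapshotRequest(symbol_or_symbols=["SPY", "TSLA"]))

# attribute access instead of dict subscripting
eod = {symbol: {'open': snap.daily_bar.open,
                'high': snap.daily_bar.high,
                'low': snap.daily_bar.low,
                'close': snap.daily_bar.close,
                'volume': snap.daily_bar.volume}
       for symbol, snap in obj_snapshots.items() if snap and snap.daily_bar}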

If one has a limited number of fields to extract, I find it easiest to explicitly specify each field. Something like this

snapshot_df = pd.DataFrame(data=[{
                  'latest_price': snapshot['latestTrade']['p'],
                  'latest_bid': snapshot['latestQuote']['bp'],
                  'latest_close': snapshot['dailyBar']['c'],
               } for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])

This creates a dataframe indexed by symbol with one column per field.
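
Since the original goal is EOD OHLC and volume, the same pattern extends to the other dailyBar fields. A sketch along the same lines, assuming the raw key names (o/h/l/c/v/t) used by the data API:

import pandas as pd

# build an EOD OHLCV dataframe straight from the raw dailyBar of each snapshot
eod_df = pd.DataFrame(data=[{
                  'timestamp': snapshot['dailyBar']['t'],
                  'open': snapshot['dailyBar']['o'],
                  'high': snapshot['dailyBar']['h'],
                  'low': snapshot['dailyBar']['l'],
                  'close': snapshot['dailyBar']['c'],
                  'volume': snapshot['dailyBar']['v'],
               } for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])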

If one has a lot of fields to extract, it may be easiest to convert the entire snapshot to a dataframe. The downside of course is the dataframe can get quite large. Something like this

#get separate dataframes
trades = pd.DataFrame([snapshot['latestTrade'] for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])
quotes = pd.DataFrame([snapshot['latestQuote'] for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])
min_bars = pd.DataFrame([snapshot['minuteBar'] for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])
daily_bars = pd.DataFrame([snapshot['dailyBar'] for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])
prev_daily_bars = pd.DataFrame([snapshot['prevDailyBar'] for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])

#add a prefix to each column label since some labels are duplicated
trades = trades.add_prefix('trade_')
quotes = quotes.add_prefix('quote_')
min_bars = min_bars.add_prefix('min_bar_')
daily_bars = daily_bars.add_prefix('daily_bar_')
prev_daily_bars = prev_daily_bars.add_prefix('prev_daily_bar_')

#now concatenate these 5 dataframes into one
snapshot_df = pd.concat([trades, quotes, min_bars, daily_bars, prev_daily_bars], axis=1)

This creates a single wide dataframe indexed by symbol, with prefixed columns from each of the five groups.
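
From there, trimming the wide dataframe down to just the EOD fields for the daily database update is a simple column selection. A sketch, assuming the o/h/l/c/v/t keys keep their names behind the daily_bar_ prefix:

# keep only the end-of-day columns and give them friendlier names
eod_df = (snapshot_df[['daily_bar_t', 'daily_bar_o', 'daily_bar_h',
                       'daily_bar_l', 'daily_bar_c', 'daily_bar_v']]
          .rename(columns={'daily_bar_t': 'timestamp', 'daily_bar_o': 'open',
                           'daily_bar_h': 'high', 'daily_bar_l': 'low',
                           'daily_bar_c': 'close', 'daily_bar_v': 'volume'}))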

That may give you some ideas.


Thanks very much


Thanks, I have been able to get the quotes and parsing to work. The issue I am running into now is that when you feed get_stock_snapshot a list with many symbols, say 10 or 200, and even one symbol cannot fetch data, none of the data for the other 9 or 199 is returned. That means every symbol has to be requested individually with error handling like this, which isn't very time efficient:

for symbol in all_symbols:
    try:
        # Fetch data for an individual symbol
        data = get_stock_data_for_symbols([symbol])
        if data:  # Check if the returned list is not empty
            df = pd.concat([df, pd.DataFrame(data, columns=["Symbol", "Open", "Close", "Volume"])],
                           ignore_index=True)
    except Exception as e:
        print(f"Error fetching data for symbol {symbol}: {e}")

@KentP I revised my code above to include an if snapshot filter on the DataFrame methods. Verify you included that, and add it if not. That will gracefully handle the case where an entire snapshot is missing. I'm not sure of the impact if a snapshot is present but one or more of the 5 data groups is missing.

Anyway, one thing at a time. Checking if the snapshot exists will fix an entire symbol not responding.
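
(In other words, you should be able to keep a single batched request and let the filter drop any symbol that comes back empty. A rough sketch of the idea, continuing with the raw_data=True client from above, where BADTICKER is just a hypothetical symbol with no snapshot data:)

import pandas as pd

symbols = ['SPY', 'TSLA', 'BADTICKER']   # BADTICKER stands in for a symbol that returns no data
snapshots = client.get_stock_snapshot(StockSnapshotRequest(symbol_or_symbols=symbols))

# one request for the whole list; symbols without a snapshot are skipped by the `if snapshot` filter
daily_bars = pd.DataFrame([snapshot['dailyBar'] for snapshot in snapshots.values() if snapshot],
               index=[symbol for symbol, snapshot in snapshots.items() if snapshot])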


Thanks, that helps. My confusion now is focused on Alpaca EOD data quality. For multiple securities I am finding prices that are off market versus Yahoo, Nasdaq, and BBG, all of which show the same open and closing prices, which is weird, so I am considering another API source even if I trade at Alpaca.

@KentP Could you provide an example (or two) of a discrepancy you see in the data? I can check it out.


These are the numbers pulled from daily_bar at 18:30 EST (prices shown are Open, Close, Volume):

AAPL 176.48 174.95 1,292,908
The open is right, but 175.01 is the close I get elsewhere, and the volume is obviously too low.

A 114.395 115.89 225,834
Yahoo has 114.43 / 115.91 and higher volume.

If daily_bar is not the place to get EOD OHLC and volume data, please point me to another source.


@KentP You must be using the Basic (free) Alpaca market data for the snapshot endpoint? That only reflects trades, quotes, and bars executed on the IEX exchange. The data (especially volume) will typically be much different from the full consolidated market data provided by the Algo Trader Plus subscription.

For example, this is the daily_bar data you are seeing

snapshot = api_data.get_snapshot(symbol, feed='iex')

BarV2({ 'c': 174.95,
        'h': 176.48,
        'l': 173.825,
        'n': 13640,
        'o': 176.48,
        't': '2023-09-15T04:00:00Z',
        'v': 1292908,
        'vw': 174.890334})

This, however, is the daily_bar when using the full market (SIP) data with the Algo Trader Plus subscription.

snapshot = api_data.get_snapshot(symbol, feed='sip')

BarV2({ 'c': 175.01,
        'h': 176.495,
        'l': 173.82,
        'n': 722788,
        'o': 176.48,
        't': '2023-09-15T04:00:00Z',
        'v': 79290990,
        'vw': 175.114879})

Which matches the data from Yahoo.

You may, however, be able to get the data you need with a free subscription. The Basic (free) subscription doesn't fetch current SIP data and must wait 15 minutes; it does, however, fetch current IEX data. By definition, the snapshot endpoint always fetches the most current data, so it only returns IEX data for the Basic subscription (one cannot get full SIP data). The workaround is to use the historical data endpoints. For example, use the bars endpoint to fetch today's daily bar. As long as the end time is not within 15 minutes of the current time, the Basic subscription will fetch full market SIP data.

As an example, to fetch today's bar:

symbols = ['AAPL']

start_time = pd.to_datetime("2023-09-15 00:00:00").tz_localize('America/New_York')
end_time = pd.to_datetime("2023-09-15 16:00:00").tz_localize('America/New_York')

bars = api.get_bars(symbols,
                    '1Day',
                    start=start_time.isoformat(),
                    end=end_time.isoformat(),
                    adjustment='all',
                    feed='sip',
                    limit=10000,
                    ).df.tz_convert('America/New_York')

This will return the full market (SIP) daily bar with either the Basic (free) or a paid subscription.

Notice this matches the Yahoo numbers (close of 175.01).

The same can be done to get historical trades and quotes.
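
(For completeness, if you are on the newer alpaca-py SDK rather than alpaca_trade_api, the same workaround looks roughly like this. This is only a sketch, assuming the StockBarsRequest, TimeFrame, and DataFeed names from alpaca-py:)

import pandas as pd
from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest
from alpaca.data.timeframe import TimeFrame
from alpaca.data.enums import DataFeed

client = StockHistoricalDataClient(ALPACA_API_KEY, ALPACA_SECRET_KEY)

# end time must be 15+ minutes in the past for SIP data on the Basic plan
request = StockBarsRequest(symbol_or_symbols=['AAPL'],
                           timeframe=TimeFrame.Day,
                           start=pd.Timestamp('2023-09-15 00:00', tz='America/New_York'),
                           end=pd.Timestamp('2023-09-15 16:00', tz='America/New_York'),
                           feed=DataFeed.SIP)

bars_df = client.get_stock_bars(request).df   # multi-indexed by (symbol, timestamp), OHLCV columns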


Thanks very much for the help
