Clarification about streaming trade data

hello,

Could someone clarify what “size” represents from streaming trade data? (see the image below)

Is it the number of shares that were just traded? Or something else? When I add them up for a 1-minute window, the total does NOT equal the volume shown by either Google or Yahoo Finance for that same time period. So what does “size” actually mean?

One of these, I think, depending on the stream:

  • Trade Size: The number of shares traded in a single transaction.
  • Quote Size: The number of shares available for sale (ask size) or purchase (bid size) at a given price level.
  • Message Size: The amount of data in a single message or data packet transmitted via the API.

I thought it was Trade Size also, but as I mentioned above, adding all the trade sizes in a 1 minute window does not equal the volume you get by calling:

api.get_bars_iter("NVDA", TimeFrame.Minute, "2024-06-03", "2024-06-03", adjustment='raw')

for that same 1 minute period. If it was the shares traded, then it should equal the volume for the same time. As you can see in the image above, the trades are all small, e.g. 15, 5 and a bunch of 1’s.

Why are they not equal? What am I missing here?

That's above me, but I've heard of phantom stock shares.

@Artic The size attribute is the quantity of shares in the trade. Summing the sizes over a minute will equal the minute bar volume, with a couple of caveats: 1) group and sum by the trade timestamps, not by when you receive the trades, and 2) three trade conditions (M, Q, and 9) are excluded from the bar volume calculation, so leave trades with those conditions out of your sum. Additionally, be aware that trades are sometimes ‘corrected’. Those corrections are reflected in the bar calculations but typically not in the trades.
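The two caveats above can be sketched on made-up data. This is only an illustration, not Alpaca's actual bar logic: the trades, the `minute_volumes` helper, and the tuple layout (timestamp, size, list of condition codes) are all hypothetical, and it uses plain Python so it runs without an API key.

```python
from collections import defaultdict
from datetime import datetime

# Conditions named above as excluded from the bar volume calculation.
EXCLUDED_CONDITIONS = {"M", "Q", "9"}

def minute_volumes(trades):
    """Sum trade sizes per minute of the trade timestamp, skipping excluded conditions."""
    volumes = defaultdict(int)
    for timestamp, size, conditions in trades:
        # Skip any trade carrying at least one excluded condition code.
        if EXCLUDED_CONDITIONS.intersection(conditions):
            continue
        # Bucket by the trade's own timestamp, not by arrival time.
        minute = timestamp.replace(second=0, microsecond=0)
        volumes[minute] += size
    return dict(volumes)

# Made-up trades: (trade timestamp, size, condition codes).
trades = [
    (datetime(2024, 6, 3, 10, 30, 1), 15, ["@"]),
    (datetime(2024, 6, 3, 10, 30, 20), 5, ["@", "I"]),
    (datetime(2024, 6, 3, 10, 30, 59), 1, ["@", "I"]),
    (datetime(2024, 6, 3, 10, 30, 59), 100, ["@", "M"]),  # excluded (M)
    (datetime(2024, 6, 3, 10, 31, 5), 1, ["@", "Q"]),     # excluded (Q)
    (datetime(2024, 6, 3, 10, 31, 30), 7, ["@"]),
]

volumes = minute_volumes(trades)
for minute, volume in sorted(volumes.items()):
    print(minute, volume)
```

Summing every trade without the filter would give 121 and 8 for the two minutes instead of 21 and 7, which is the kind of gap you see when excluded trades are counted.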

Below is some sample code to compare volumes calculated from trades against bar volumes. In this particular case the volumes match. Because of the occasional corrected trade, you may see small variances if you resample other time periods.


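import pandas as pd
from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockBarsRequest, StockTradesRequest
from alpaca.data.timeframe import TimeFrame

# Placeholder credentials (not part of the original post); substitute your own keys
client = StockHistoricalDataClient('YOUR_API_KEY', 'YOUR_SECRET_KEY')

symbol = 'NVDA'
start = pd.to_datetime('2024-06-03 10:30:00').tz_localize('America/New_York')
end = pd.to_datetime('2024-06-03 11:00:00').tz_localize('America/New_York')

# Fetch 1-minute bars, indexed by timestamp in market time
bars = (client.get_stock_bars(StockBarsRequest(
                                  symbol_or_symbols = symbol,
                                  timeframe = TimeFrame.Minute,
                                  start = start,
                                  end = end))
                                  .df
                                  .tz_convert('America/New_York', level='timestamp')
                                  .reset_index('symbol'))

# Fetch the individual trades for the same window
trades = (client.get_stock_trades(StockTradesRequest(
                                  symbol_or_symbols = symbol,
                                  start = start,
                                  end = end))
                                  .df
                                  .tz_convert('America/New_York', level='timestamp')
                                  .reset_index('symbol'))

# Sum trade sizes per minute of the trade timestamp
# ('1min' replaces the '1T' alias, which newer pandas versions deprecate)
calculated_sizes = trades.resample('1min')['size'].sum()

# Align on the timestamp index and compare against the bar volume
bars['calculated_size'] = calculated_sizes
bars['calculated_diff'] = bars.calculated_size - bars.volume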

Here’s the result. Note the calculated_diff column is all 0.

Not sure why your calculations do not match other sources, but they should be identical to Alpaca unless they are not filtering the trades properly. In that case you would see lower volume on other sources.
