Incorrect historical Trades info using SDK

Hello,
I just downloaded the latest Alpaca python SDK and tried to retrieve historical data for TSLA stock. I am able to get the data but the time is somehow terribly off. I am not a python programmer but the code is very simple as below:

client = StockHistoricalDataClient('api_key', 'secret_key')

stkReq = StockTradesRequest(
    symbol_or_symbols=["TSLA"],
    start=datetime(2023, 9, 5, 9, 30, 0, 0, pytz.timezone('America/New_York')),
    end=datetime(2023, 9, 5, 9, 30, 59, 999999, pytz.timezone('America/New_York'))
    )

trades = client.get_stock_trades(stkReq)
print(trades.df)

So, I was expecting first minute of trade data on Sept 5 2023 at open (9:30:00 am EST). What I got was data from 10:26:00 am EST (14:26:00 UTC). That’s just a weird time offset that doesn’t make sense. I did double check that price at 10:26:00 EST was precisely what is shown in the output below.

                                        exchange     price  ...  conditions  tape
symbol timestamp                                            ...
TSLA   2023-09-05 14:26:00.006475+00:00        P  253.1800  ...         [@]     C
       2023-09-05 14:26:00.006620+00:00        Z  253.1800  ...      [@, I]     C
       2023-09-05 14:26:00.009469+00:00        Q  253.1700  ...         [@]     C
       2023-09-05 14:26:00.055273+00:00        D  253.1799  ...         [@]     C
       2023-09-05 14:26:00.128756+00:00        D  253.1800  ...      [@, I]     C
...                                          ...       ...  ...         ...   ...
       2023-09-05 14:26:59.973602+00:00        D  252.8700  ...      [@, I]     C
       2023-09-05 14:26:59.973744+00:00        D  252.8700  ...      [@, I]     C
       2023-09-05 14:26:59.974710+00:00        D  252.8700  ...      [@, I]     C
       2023-09-05 14:26:59.974778+00:00        D  252.8700  ...      [@, I]     C
       2023-09-05 14:26:59.974858+00:00        D  252.8700  ...      [@, I]     C

[6463 rows x 6 columns]

Any thoughts what might be happening?
Thanks.

@AmarDes The issue is with pytz. Basically it doesn’t work. The documentation even has a caution about this

Unfortunately using the tzinfo argument of the standard datetime constructors ‘’does not work’’ with pytz for many timezones.

This can be seen by simply displaying the timezone info

   tz = pytz.timezone('America/New_York')
   display(tz)

The result is

  <DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>

This indicates it subtracts 1 day (ie 24 hours) from UTC then adds 19:04:00. In other words, it subtracts 05:04:00 from UTC. The correct offset is 04:00:00. That is exactly what you are seeing with the first time in your dataframe of 14:26:00 but which should be 13:30:00 UTC or 09:30 Market time. You are seeing an incorrect offset applied by pytz.

The fix is to not use pytz. A much more user friendly approach (which always provides the correct datetimes) is to use the Pandas to_datetime method. Something like this

start=pd.to_datetime('2023-09-05 09:30:00').tz_localize('America/New_York'),
end=pd.to_datetime('2023-09-05 09:30:59.999999').tz_localize('America/New_York')

The to_datetime method is very flexible, but I like it simply because one can specify the time in human readable form.

So change your code to be something like this

stkReq = StockTradesRequest(
    symbol_or_symbols=["TSLA"],
    start=pd.to_datetime('2023-09-05 09:30:00').tz_localize('America/New_York'),
    end=pd.to_datetime('2023-09-05 09:30:59.999999').tz_localize('America/New_York')
    )

trades = client.get_stock_trades(stkReq)

That will give you the trades you expected. However, all market data APIs return data in UTC. Even if one specifies the time in some other timezone, the output will always be UTC. I find it easiest to alway convert to market time. This can be done like this.

trades_in_market_time = trades.df.tz_convert(‘America/New_York’, level=‘timestamp’)

The result is probably more what you expected

1 Like

Thank you Dan for pointing this out. I tried your suggestion and it works as expected.
On a separate note, the opening price shown here vs. what’s available say on sites like yahoo/Interactive brokers, differ a bit. Is it because Alpaca shows historical price only from few exchanges? Would this be the same for live data as well? I do sometimes deal in pennies so it kind of makes a difference for my trading strategies.

@AmarDes The Alpaca daily open price (as well as the high, low, and close) for TSLA on 2023-09-05 matches Yahoo. In general Alpaca’s data matches all other providers because they are all sourced from the same Securities Information Processors (SIPs). See the screenshots below.

.
.

I believe however, your question is why the first trade of the day isn’t the open trade? The simple answer is not all trades are included when determining OHLC data. The SIPs provide guidelines for which trades to include for each calculation. Different trades are included for the Open Price, High/Low, Close, and Volume based upon the trade’s conditions. Additionally, different trades are included in minute vs daily bar calculations. There is an article here which explains some of the details.

So let’s look at the first 30 trades for TSLA on 2023-09-05 after 09:30 (ie when markets open).
.

Trade number 25 is used as the “Open” with a price of 245.00. Why? That is the first ‘valid’ or ‘non-excluded’ trade after markets opened. All previous trades had a trade condition of either 1) I which indicates an ‘odd lot’ trade less than 100 shares 2) T which indicates a non-market hour trade or 3) Q a ‘Market Center Open’ which isn’t a real trade but simply a data posting restating the previous trade. Those trades are all excluded from being used as an open price. The first ‘non-excluded’ trade (ie trade 25) is used as the Open.

1 Like

Thanks much Dan… learned something new about the SIP guidelines and trade conditions! The first 2 trade conditions (F&I, F&T) in your above screenshot appears to be conflicting codes, i.e. F states to update OHLC but I&T says not to. So, the not-to-include codes have a higher precedence then?
Is there a feed option in API to only show “non-excluded” trades?

BTW, just to confirm, the “tape” column denotes exchange codes, i.e. “c” being for National Stock Exchange?

@AmarDes If any condition excludes a trade then that trade is excluded. There isn’t an option to show only “non-excluded” trades. Partly because the excluded trades depend upon which value (ie Last, HIGH/Low, Volume) you are looking at. Minute bars also exclude slightly different conditions than daily bars.

The ‘tape’ is used to determine which trade condition guidelines to follow. There are two SIPs each with very similar but not identical guidelines - the Consolidated Tape Association (CTA) and The UTP Plan. Look at the CTA guidelines if the tape is A or B. Look at the UTP guidelines if the tape is C. The ‘tape’ is an indicator of the assets ‘listing exchange’ and is fixed per asset (ie it doesn’t change per trade).

1 Like