Data is delayed by 22 minutes?

When running a very basic query for 1-minute data, the data being returned is delayed by up to 22 minutes (sometimes longer). Is this normal?

The first datetime is the current local datetime, the second is the datetime being returned by Alpaca along with the price.

barset = self.api.get_barset('DIA', '1Min', limit=18)

local: 2020-10-02 10:57:15.888914-04:00
alpaca: 2020-10-02 10:39:00-04:00 334.13

local: 2020-10-02 10:57:17.290897-04:00
alpaca: 2020-10-02 10:39:00-04:00 334.13

local: 2020-10-02 10:57:18.778982-04:00
alpaca: 2020-10-02 10:39:00-04:00 334.13

local: 2020-10-02 10:57:20.086349-04:00
alpaca: 2020-10-02 10:39:00-04:00 334.13

I’ve never seen data delayed 20 minutes, though it is often 2-3 minutes. Here is a little code snippet (Python) I have used to check that very same concern.

from datetime import datetime
from pytz import timezone

# api is an authenticated alpaca_trade_api REST client;
# display() assumes this is running in a Jupyter notebook
barset = api.get_barset('DIA', '1Min', limit=5).df
display(barset)

market_timezone = timezone('America/New_York')
current_time = datetime.now(market_timezone)
display('{:%Y-%m-%d %H:%M:%S} is the current time'.format(current_time))

# Age of the newest bar, split into whole minutes and leftover seconds
last_bar_time = barset.index.max()
delta_time = current_time - last_bar_time
delta_time_sec = delta_time.total_seconds() % 60
delta_time_min = delta_time.total_seconds() // 60
display('delta time is {:.0f} min {:.0f} sec'.format(delta_time_min, delta_time_sec))

The result is typically something like this

DIA

time                       open     high     low      close    volume
2020-10-02 11:40:00-04:00  276.070  276.070  276.070  276.070  375
2020-10-02 11:41:00-04:00  276.100  276.160  276.100  276.140  1100
2020-10-02 11:42:00-04:00  276.220  276.310  276.220  276.310  333
2020-10-02 11:43:00-04:00  276.240  276.240  276.240  276.240  100
2020-10-02 11:47:00-04:00  276.403  276.403  276.403  276.403  263

2020-10-02 11:52:23 is the current time
delta time is 5 min 24 sec

In this case the delay was over 5 minutes. However, note that no data will be returned if there were no trades in a given minute. I’m surprised this happens with DIA, but notice in the above example the bars skip from 11:43:00 to 11:47:00. This implies there were no trades during that 4-minute gap, which may actually be the reason for the 5-minute delay. I typically see 2-3 minutes.
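
To make those missing minutes explicit, one can reindex the bars onto a complete 1-minute range (a minimal sketch, assuming barset is the DataFrame returned by the snippet above):

import pandas as pd

bars = barset['DIA']  # drop the symbol level of the column index
full_index = pd.date_range(bars.index.min(), bars.index.max(), freq='1min')
# Minutes with no reported trades show up as NaN rows after the reindex
with_gaps = bars.reindex(full_index)
print(with_gaps[with_gaps['close'].isna()])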

One could also use the Polygon version

barset = api.polygon.historic_agg_v2('DIA', 1, 'minute', current_time, current_time).df

My experience is the latency is a little lower, but still often 2-3 minutes. Though that may simply be because Polygon has ‘denser’ data? To get the most up-to-date data very quickly, one can fetch all the trades and aggregate them oneself.

last_trades = api.polygon.historic_trades_v2('DIA', current_time.date()).df

That may be the most ‘foolproof’ method, albeit a bit more effort.
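
For instance, pandas can resample the raw trades into 1-minute OHLCV bars (a rough sketch; it assumes the trades DataFrame is indexed by trade timestamp and has price and size columns):

# Build 1-minute OHLC bars from the individual trades
bars = last_trades['price'].resample('1min').ohlc()
# Sum the trade sizes to get a per-minute volume
bars['volume'] = last_trades['size'].resample('1min').sum()
print(bars.tail())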

Hey Dan, thanks for the response! I implemented some of your time code to double check, and I was in fact using the first returned record in the data set instead of the last, which explains the 22-minute delay.

With that said, with DIA I am still seeing a delay of over 3 minutes. Looking at ThinkOrSwim (my stock trading platform) I can easily see Time and Sales showing trades within that time period, hundreds of them in fact.

As you can see from the data below, after 3 minutes a new Alpaca bar is produced, but it is already over a minute old relative to the current time.

Not terribly bad, but 3 minutes is a lot when one is expecting 1Min data…

0 days 00:02:56.204460 277.13
0 days 00:02:57.727826 277.13
0 days 00:02:59.416921 277.13
0 days 00:03:00.924555 277.13
0 days 00:03:02.734181 277.13
0 days 00:03:04.323960 277.13
0 days 00:03:05.941439 277.13
0 days 00:03:07.358110 277.13
0 days 00:03:08.836316 277.13
0 days 00:01:10.238462 277.31
0 days 00:01:11.817407 277.31
0 days 00:01:13.516082 277.31
0 days 00:01:14.945051 277.31
0 days 00:01:16.434369 277.31
0 days 00:01:18.549853 277.31
0 days 00:01:20.119851 277.31

import pandas as pd

now = pd.Timestamp.now(tz='America/New_York')
barset = self.api.get_barset(ticker, '1Min', limit=10)
last_bar_time = barset.df.index.max()
delta_time = now - last_bar_time
# Print the age of the newest bar along with its close price
print(delta_time, barset[ticker][-1].c)

Here is a cleaner output of DIA on 1Min bars from a DataFrame… it shows a 7-minute delay at 2020-10-02 13:28:00…

2020-10-02 13:10:00-04:00 278.250
2020-10-02 13:11:00-04:00 278.170
2020-10-02 13:13:00-04:00 277.880
2020-10-02 13:15:00-04:00 277.851
2020-10-02 13:16:00-04:00 278.130
2020-10-02 13:19:00-04:00 278.110
2020-10-02 13:23:00-04:00 278.230
2020-10-02 13:27:00-04:00 278.490
2020-10-02 13:28:00-04:00 278.380
2020-10-02 13:35:00-04:00 277.840
2020-10-02 13:40:00-04:00 277.160
2020-10-02 13:41:00-04:00 277.131
2020-10-02 13:42:00-04:00 277.201
2020-10-02 13:43:00-04:00 277.240
2020-10-02 13:45:00-04:00 277.130
2020-10-02 13:47:00-04:00 277.310
2020-10-02 13:49:00-04:00 276.900
2020-10-02 13:50:00-04:00 276.841
2020-10-02 13:52:00-04:00 276.990
2020-10-02 13:56:00-04:00 277.260
2020-10-02 13:58:00-04:00 277.280
2020-10-02 13:59:00-04:00 277.350

The output above is good in that it shows the ‘delays’ probably aren’t really delays. Rather, they are chunks of time which don’t contain any trades. How can this be? Probably the biggest cause is that the default data feed Alpaca uses only reports trades which occur on a limited number of exchanges. There may be a lot of trading going on but, if it’s not on those exchanges, it won’t show up. One way to ‘fill those holes’ is to use Polygon data, which is consolidated from all exchanges and is much more ‘dense’. There’s a little more on this here

That said, there will always be the ‘rounding’ delay. Data is only aggregated once per minute, so even in the best scenario there can be an almost 1-minute delay based simply on when it is fetched. 1-3 minutes is probably the best one can ever achieve using aggregated data. To get data with less latency, maybe use the historic_trades_v2 API. Latency there is seconds, not minutes.
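
As a quick check, one could compare the timestamp of the newest trade against the clock (a sketch; it assumes the trades DataFrame is indexed by trade timestamp):

import pandas as pd

trades = api.polygon.historic_trades_v2('DIA', current_time.date()).df
last_ts = trades.index.max()
if last_ts.tzinfo is None:  # localize if the wrapper returns naive timestamps
    last_ts = last_ts.tz_localize('America/New_York')
latency = pd.Timestamp.now(tz='America/New_York') - last_ts
print('newest trade is {:.1f} seconds old'.format(latency.total_seconds()))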

Thanks for the clarification. I am still in testing and playing mode so Polygon isn’t an option yet.

One solution I developed today was to get the data once with get_barset and then, from that point on, update my dataset using get_last_quote. This keeps my dataset up to date, and the data matches what I am seeing in my brokerage account.
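
Roughly, that approach looks like the sketch below. The mid-quote rule is just one way to fold quotes into the bars, not necessarily the only (or best) choice; one could also use get_last_trade for the last traded price.

import time
import pandas as pd

# Seed the dataset with recent history, then keep it fresh with quotes
bars = api.get_barset('DIA', '1Min', limit=100).df['DIA']

for _ in range(60):                    # poll for about a minute as a demo
    quote = api.get_last_quote('DIA')  # v1 data API bid/ask snapshot
    mid = (quote.bidprice + quote.askprice) / 2
    minute = pd.Timestamp.now(tz='America/New_York').floor('1min')
    # Update (or create) the current minute's close with the live mid-quote
    bars.loc[minute, 'close'] = mid
    time.sleep(1)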

Thanks for the information!