Wrong Historical Data

If we retrieve historical daily bar data from Alpaca for ~800 common ETFs, we see the following for many of them:

  • Open or Close lower than daily bar Low
  • Open or Close higher than daily bar High

All those events should be impossible.

Even if we add a tolerance to account for possible rounding of values to 2 decimals, the problem still exists, and it is significant. A data audit against Yahoo data shows that Yahoo does not have this issue. Often, the Yahoo Open or Close is quite different from the one reported by Alpaca. Like Yahoo, Polygon is consistent as well.

Two questions:

  1. Does Alpaca acknowledge this issue?
  2. What is Alpaca doing to solve this issue?

Historical data is fundamentally wrong.

Thank you.

I should add that this is using a paper API key. But since this is historical daily bar data, it should not matter whether the key is live or paper.

Have you gotten any response?

No response but still hoping. Appreciate your message.

Hey @dominicp, if possible, could you please share a sample query or script you are using to pull this bar data (with the API keys removed)? Then I can take a look into this.

Best,

Dylan

Hi Dylan,

We see frequent inconsistencies in open / close / high / low historical data. For example, an Open which is higher than the High.

Just get the history for the universe of stocks and run through this logic to see very common problems:
Errors.Assert(this.Open > 0);
Errors.Assert(this.Close > 0);
Errors.Assert(this.High > 0);
Errors.Assert(this.Low > 0);
Errors.Assert(this.Open <= this.High);
Errors.Assert(this.Open >= this.Low);
Errors.Assert(this.Close <= this.High);
Errors.Assert(this.Close >= this.Low);

Example for AAAA:
Close=25.04
Date=4/1/2021
High=25.04
Low=25.04
Open=25.05 ← higher than High!
Volume=105
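For anyone not on C#, here is a rough Python sketch of the same checks, with an optional tolerance for the 2-decimal rounding mentioned earlier. The `Bar` class and `ohlc_violations` function are hypothetical names made up for illustration; they are not part of any Alpaca SDK. The sample values are the ones from the example bar above.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    # Hypothetical container for one daily bar (not an SDK type).
    open: float
    high: float
    low: float
    close: float


def ohlc_violations(bar, tol=0.0):
    """Return a list of consistency problems found in one bar."""
    problems = []
    # All four prices must be positive.
    for name in ("open", "high", "low", "close"):
        if getattr(bar, name) <= 0:
            problems.append(f"{name} is not positive")
    # High must not be below Low (beyond the tolerance).
    if bar.high + tol < bar.low:
        problems.append("high < low")
    # Open and Close must lie inside [Low, High], give or take the tolerance.
    for name in ("open", "close"):
        value = getattr(bar, name)
        if value > bar.high + tol:
            problems.append(f"{name} > high")
        if value < bar.low - tol:
            problems.append(f"{name} < low")
    return problems


# Values from the example bar above; half a cent of tolerance is an assumption.
bar = Bar(open=25.05, high=25.04, low=25.04, close=25.04)
print(ohlc_violations(bar, tol=0.005))  # ['open > high']
```

Even with half a cent of tolerance the bar is still flagged, which matches the observation that rounding alone does not explain the problem.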

Hey :wave:

@dominicp Yes, it can happen that Open is higher than High. That’s because certain trade conditions update the opening price of the bar but don’t update the high value. If you give me the exact symbol and timestamp of your example bar, I can give a more precise explanation.

Hi @Gergely_Alpaca,

Polygon data does not have this behavior. Verified across their entire universe of symbols:
Errors.Assert(this.Open > 0);
Errors.Assert(this.Close > 0);
Errors.Assert(this.High > 0);
Errors.Assert(this.Low > 0);
Errors.Assert(this.Open <= this.High);
Errors.Assert(this.Open >= this.Low);
Errors.Assert(this.Close <= this.High);
Errors.Assert(this.Close >= this.Low);

Previously, I provided the sample symbol and timestamp:
Example for AAAA:
Close=25.04
Date=4/1/2021
High=25.04
Low=25.04
Open=25.05 ← higher than High!
Volume=105

Using HistoricalBarsRequest with ListHistoricalBarsAsync() and BarTimeFrame.Day will yield this. This is C#, but I am sure it can be translated into another programming language.
DateTime date = new DateTime(2021, 4, 1);
HistoricalBarsRequest historicalBarsRequest = new HistoricalBarsRequest("AAAA", date.AddDays(-1), date.AddDays(1), BarTimeFrame.Day);
IPage<IBar> page = this.dataClient.ListHistoricalBarsAsync(historicalBarsRequest).Result;
foreach (IBar bar in page.Items)
{
    // run the validation above on each bar
}

Appreciate any explanation.

I can’t seem to find any AAAA bars in our database. Are you sure the symbol is correct?

@Gergely_Alpaca, I’m sorry for the confusion, and thank you for taking a look - it’s AAA.

Here is the API response for the daily bar:
{ TimeUtc = 2021-04-01T04:00:00.0000000Z, Symbol = "AAA", Open = 25.05, High = 25.04, Low = 25.04, Close = 25.04 }

The Open on Yahoo finance and Polygon is 25.04 instead of 25.05.

Note: the same discrepancy applies to hundreds of data points across various symbols.

@dominicp

Here are all the trades for AAA for that day:

api.get_trades("AAA", start="2021-04-01", end="2021-04-02").df

We’re using the SIP guidelines in the UTP specification to calculate the bars. There were 4 different trade conditions that day:

  • I: Odd lot trades never update OHLC, only the volume (that’s why I crossed them out)
  • Q: Market Center Official Open updates O, but not HLC
  • M: Updates nothing
  • 9: Updates HLC.

So when the first Q trade comes in, the bar (O H L C) looks like this: 25.05 ? ? ?.
When the first 9 trade comes in, it’s updated to: 25.05 25.04 25.04 25.04, as you noticed.
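The rules above can be sketched in a few lines of Python. This is only an illustration of the four conditions as described in this post, not Alpaca's actual aggregation code. The function name `aggregate_ohlc` is hypothetical, the trade list is made up (prices chosen to reproduce the bar in question), and volume is left out of the sketch.

```python
def aggregate_ohlc(trades):
    """Fold (condition, price) pairs into an OHLC bar using the rules above."""
    bar = {"open": None, "high": None, "low": None, "close": None}
    for condition, price in trades:
        if condition == "Q":
            # Market Center Official Open: sets O only.
            bar["open"] = price
        elif condition == "9":
            # Updates H, L and C, but leaves O alone.
            bar["high"] = price if bar["high"] is None else max(bar["high"], price)
            bar["low"] = price if bar["low"] is None else min(bar["low"], price)
            bar["close"] = price
        # "I" (odd lot) and "M" do not touch OHLC in this sketch.
    return bar


# Hypothetical trades, for illustration only (not the real AAA tape).
trades = [("Q", 25.05), ("I", 25.06), ("9", 25.04), ("M", 25.04)]
print(aggregate_ohlc(trades))
# {'open': 25.05, 'high': 25.04, 'low': 25.04, 'close': 25.04}
```

Because the Q trade never feeds into High or Low, nothing in these rules prevents Open from ending up outside the High/Low range.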

This explains why you’re currently getting this bar. I’m not saying our aggregation logic is perfect; actually, we’re working on making it better as we speak. Am I right to assume you’re saying that O and C should never be higher than H or lower than L?

@Gergely_Alpaca thank you for digging into this.

Yes, when we get historical daily bars from Alpaca, I feel the results should be as follows:

  • Self-consistent (High cannot be lower than O/C, Low cannot be higher than O/C, High cannot be lower than Low, etc.)
  • Consistent with other providers (Yahoo, Polygon, etc.)

Regards,
Dominic.

@dominicp This issue has been fixed.

api.get_bars("AAA", "1Day", start="2021-04-01", limit=1).df

Screenshot 2022-11-02 at 15.28.31

Hi @Gergely_Alpaca,

Thank you for doing this - appreciated!