Historical Bar Data Integrity Error

I’ve written a historical bar data downloader using the Alpaca API. I finished it today and downloaded 1-minute bar data for several sample tickers.
I then checked the integrity of the downloaded data and got the errors below:

[A] 2021-01-04 14:30:00 ~ 2021-09-30 20:57:00
10:37:10.234 [0]: [A] Price integrity error: [ O:(164.3700) > H:(163.7800) ] at [2021-08-18 13:30]i(61148)!
10:37:10.246 [0]: [A] Price integrity error: [ O:(172.1100) > H:(172.0300) ] at [2021-09-15 13:31]i(68563)!
10:37:10.251 [0]: [A] Price integrity error: [ O:(172.3200) < L:(172.4200) ] at [2021-09-21 13:31]i(70100)!
10:37:10.254 [0]: [A] Price integrity error: [ O:(160.3100) < L:(161.1310) ] at [2021-09-30 13:30]i(72781)!

[AACG] 2021-01-04 14:30:00 ~ 2021-09-30 20:00:00
10:37:10.326 [0]: [AACG] Price integrity error: [ O:(2.8500) > H:(2.8200) ] at [2021-04-21 13:30]i(14085)!
10:37:10.329 [0]: [AACG] Price integrity error: [ O:(2.9100) > H:(2.8600) ] at [2021-05-10 13:30]i(15304)!
10:37:10.332 [0]: [AACG] Price integrity error: [ O:(3.1200) > H:(3.1077) ] at [2021-06-16 13:36]i(17100)!
10:37:10.334 [0]: [AACG] Price integrity error: [ O:(2.8900) < L:(2.9000) ] at [2021-06-23 13:37]i(17396)!
10:37:10.340 [0]: [AACG] Price integrity error: [ O:(3.0800) < L:(3.0900) ] at [2021-07-01 13:30]i(17679)!
10:37:10.346 [0]: [AACG] Price integrity error: [ O:(2.7900) > H:(2.7780) ] at [2021-07-12 13:30]i(17969)!
10:37:10.349 [0]: [AACG] Price integrity error: [ O:(3.2900) < L:(3.3000) ] at [2021-07-21 13:39]i(20067)!
10:37:10.356 [0]: [AACG] Price integrity error: [ O:(3.3300) > H:(3.2800) ] at [2021-07-23 13:30]i(20623)!
10:37:10.361 [0]: [AACG] Price integrity error: [ O:(2.8200) > H:(2.8100) ] at [2021-08-13 13:32]i(22406)!
10:37:10.364 [0]: [AACG] Price integrity error: [ O:(2.6300) < L:(2.6700) ] at [2021-08-26 13:30]i(23028)!
10:37:10.370 [0]: [AACG] Price integrity error: [ O:(2.7100) > H:(2.6841) ] at [2021-08-30 13:30]i(23113)!
10:37:10.373 [0]: [AACG] Price integrity error: [ O:(2.7900) > H:(2.7700) ] at [2021-09-07 13:30]i(23345)!
10:37:10.376 [0]: [AACG] Price integrity error: [ O:(2.3600) > H:(2.3500) ] at [2021-09-24 13:50]i(24047)!

[AAPL] 2017-01-03 09:03:00 ~ 2017-12-30 00:59:00
10:37:10.528 [0]: [AAPL] Detected O-price glitch from (143.5100) to (646.0000): [450.1%] at [2017-07-03 22:33], i(72607)
10:37:10.528 [0]: [AAPL] Detected H-price glitch from (143.5100) to (646.2900): [450.3%] at [2017-07-03 22:33], i(72607)
10:37:10.528 [0]: [AAPL] Detected L-price glitch from (143.5100) to (645.2900): [449.6%] at [2017-07-03 22:33], i(72607)
10:37:10.528 [0]: [AAPL] Detected C-price glitch from (143.5100) to (645.7600): [450.0%] at [2017-07-03 22:33], i(72607)
10:37:10.528 [0]: [AAPL] Detected L-price glitch from (645.2900) to (123.4500): [19.1%] at [2017-07-03 22:34], i(72608)
10:37:10.528 [0]: [AAPL] Detected C-price glitch from (645.7600) to (123.4500): [19.1%] at [2017-07-03 22:34], i(72608)
10:37:10.529 [0]: [AAPL] Detected O-price glitch from (645.6900) to (143.5800): [22.2%] at [2017-07-05 08:00], i(72609)
10:37:10.529 [0]: [AAPL] Detected H-price glitch from (646.2900) to (143.5800): [22.2%] at [2017-07-05 08:00], i(72609)

I only downloaded and tested 5 tickers, and I already got these errors.
I think Alpaca should fix this problem ASAP.
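For readers who want to reproduce this kind of check, here is a minimal sketch in Python. It is not the checker that produced the log above; the function names (`check_bar`, `jump_percent`, `is_glitch`) and the `max_ratio` threshold are hypothetical. It only assumes that a bar should satisfy L <= O, C <= H, and that the logged percentage is the new price divided by the previous price, which matches the figures in the AAPL lines.

```python
def check_bar(o, h, l, c):
    """Return a list of OHLC consistency violations for one bar."""
    errors = []
    if h < l:
        errors.append("H < L")
    # Open and close must both lie inside the [low, high] range.
    for name, value in (("O", o), ("C", c)):
        if value > h:
            errors.append(f"{name} > H")
        if value < l:
            errors.append(f"{name} < L")
    return errors


def jump_percent(prev, cur):
    """New price expressed as a percentage of the previous price."""
    return cur / prev * 100.0


def is_glitch(prev, cur, max_ratio=2.0):
    """Flag a bar-to-bar move larger than max_ratio in either direction.

    max_ratio is an arbitrary illustrative threshold, not a value
    taken from the original post.
    """
    ratio = cur / prev
    return ratio > max_ratio or ratio < 1.0 / max_ratio


# Values taken from the log and trade listing in this thread.
print(check_bar(172.11, 172.03, 171.825, 171.83))
print(round(jump_percent(143.51, 646.00), 1), is_glitch(143.51, 646.00))
```

A check like this flags the bars but cannot tell whether the cause is bad data or an aggregation rule, so the underlying trades still need to be inspected.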

Hey, Alpaca, aren’t you gonna fix this problem? This is serious to everyone using historical market data!

Dear @xsiboss,

Thank you for reporting the issue. Bars are aggregated based on trade conditions.

Querying one of the requested periods you get the following result:
❯ curl -s -H 'Apca-Api-Key-Id: *****' -H 'Apca-Api-Secret-Key: *****' 'https://data.alpaca.markets/v2/stocks/A/trades?start=2021-09-15T13:31:00Z&end=2021-09-15T13:32:00Z' | jq '.trades[] | {t:.t, p:.p, c:.c}' -c | egrep -v '"I"'
{"t":"2021-09-15T13:31:23.061752832Z","p":172.11,"c":[" ","Q"]}
{"t":"2021-09-15T13:31:23.062294784Z","p":172.03,"c":[" ","F"]}
{"t":"2021-09-15T13:31:23.062866688Z","p":172,"c":[" ","F"]}
{"t":"2021-09-15T13:31:23.062951424Z","p":172,"c":[" "]}
{"t":"2021-09-15T13:31:23.063Z","p":172,"c":[" "]}
{"t":"2021-09-15T13:31:23.071Z","p":172.02,"c":[" "]}
{"t":"2021-09-15T13:31:23.227819282Z","p":171.825,"c":[" "]}
{"t":"2021-09-15T13:31:44.053Z","p":171.98,"c":[" "]}
{"t":"2021-09-15T13:31:44.053Z","p":171.985,"c":[" "]}
{"t":"2021-09-15T13:31:44.827Z","p":171.83,"c":[" "]}

I have filtered out ‘I’ (OddLotTrade) trades because they update neither the opening price nor the high price. The ‘Q’ (MarketCenterOfficialOpen) condition updates the opening price but does not update the high price, which is why the bar’s open (172.11) ends up above its high (172.03).

See: https://www.ctaplan.com/publicdocs/ctaplan/CTS_Pillar_Output_Specification.pdf (pages 64-65)
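To illustrate the effect, here is a minimal sketch in Python of condition-aware aggregation. It is not Alpaca's implementation. It encodes only the two rules stated above (‘I’ updates neither open nor high, ‘Q’ updates open but not high) and additionally assumes that ‘Q’ and ‘I’ do not update the low, and that every other condition updates all fields. The names `aggregate_bar`, `OPEN_ONLY` and `IGNORED` are hypothetical.

```python
# Assumed rule sets for this sketch; the real rules are in the CTS specification.
OPEN_ONLY = {"Q"}  # sets the open, does not touch high/low
IGNORED = {"I"}    # odd lots: skipped entirely here


def aggregate_bar(trades):
    """Build open/high/low from (price, conditions) pairs in time order."""
    o = h = l = None
    for price, conditions in trades:
        conds = set(conditions)
        if conds & IGNORED:
            continue
        if o is None:
            # First open-eligible trade defines the open (simplification).
            o = price
        if conds & OPEN_ONLY:
            continue  # official open print: no high/low update
        h = price if h is None else max(h, price)
        l = price if l is None else min(l, price)
    return {"o": o, "h": h, "l": l}


# The trades from the query above (odd lots already filtered out).
trades = [
    (172.11, [" ", "Q"]),
    (172.03, [" ", "F"]),
    (172.0, [" ", "F"]),
    (172.0, [" "]),
    (172.0, [" "]),
    (172.02, [" "]),
    (171.825, [" "]),
    (171.98, [" "]),
    (171.985, [" "]),
    (171.83, [" "]),
]
bar = aggregate_bar(trades)
print(bar, "open above high:", bar["o"] > bar["h"])
```

Under these assumptions the open comes from the ‘Q’ print at 172.11 while the high comes from the next eligible trade at 172.03, reproducing the O > H bar reported for 2021-09-15 13:31.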

Regards, Gabor

What you have provided here is neither an answer nor a solution; you have simply explained why users are getting the wrong result. Data integrity is a vital aspect of historical data, and Alpaca’s data fails on it; the team needs to fix this and provide an issue number for the bug fix.