`get_bars_async` failing to connect

I am trying to download one year of history for the entire universe of stocks available at Alpaca. I filter the list of assets to contain only stocks that are 'active', ignoring those that trade on BATS or OTC. This results in 9,462 stocks.

When I attempt to use the `get_bars_async` function I get 3,879 "Cannot connect to host data.alpaca.markets:443" errors. My code is a trimmed-down version of this example: alpaca-trade-api-python/historic_async.py at master · alpacahq/alpaca-trade-api-python · GitHub. When I loop over the tickers using the vanilla `get_bars` I don't encounter any issues.

We had a similar problem with the .NET SDK, but after switching to the HTTP/2 transport the problem went away. I'm not sure about the Python SDK, but maybe this information will help you.

Hi @pablo.mitchell, could you add the code you are trying to run here?
I will try to help you debug your problem.
I wrote the historic_async module.

import asyncio
import os
import sys
import time

import pandas as pd

import alpaca_trade_api as tradeapi
from alpaca_trade_api.rest import TimeFrame
from alpaca_trade_api.rest_async import gather_with_concurrency, AsyncRest

NY = 'America/New_York'

async def get_historic_bars(
        symbols,
        start,
        end,
        timeframe: TimeFrame,
):
    major = sys.version_info.major
    minor = sys.version_info.minor

    if (major, minor) < (3, 6):
        raise Exception('asyncio is not supported by your python version')

    print('Getting bars:')
    print(f'\t n_symbols={len(symbols)}')
    print(f'\t timeframe={timeframe}')
    print(f'\t start={start}')
    print(f'\t end={end}')

    tasks = []

    for symbol in symbols:
        args = [symbol, start, end, timeframe]
        tasks.append(rest.get_bars_async(*args))

    if minor >= 8:
        results = await asyncio.gather(*tasks, return_exceptions=True)
    else:
        results = await gather_with_concurrency(500, *tasks)

    n_errors = 0
    n_bad_requests = 0

    for response in results:
        if isinstance(response, Exception):
            n_errors += 1
            print(f"Got an error: {response}")
        elif not len(response[1]):
            n_bad_requests += 1
            print(f'bad response: {response}')
        else:
            # print(response)
            pass

    print('Showing results:')
    print(f'\t n_bars={len(results)}')
    print(f'\t n_errors={n_errors}')
    print(f'\t n_bad_requests={n_bad_requests}')

async def main(symbols):
    start = pd.Timestamp('2020-09-29', tz=NY).date().isoformat()
    end = pd.Timestamp('2021-09-29', tz=NY).date().isoformat()
    timeframe: TimeFrame = TimeFrame.Day
    await get_historic_bars(symbols, start, end, timeframe)

if __name__ == '__main__':
    key_id = os.environ.get('APCA_API_KEY_ID')
    secret_key = os.environ.get('APCA_API_SECRET_KEY')
    base_url = os.environ.get('APCA_API_BASE_URL')

    feed = "sip"  # ???

    rest = AsyncRest(key_id=key_id, secret_key=secret_key)
    api = tradeapi.REST(key_id=key_id, secret_key=secret_key, base_url=base_url)

    symbols = [
        asset.symbol for asset in api.list_assets(status='active') if
        asset.exchange not in ('BATS', 'OTC') and
        asset.tradable
    ]
    # symbols = symbols[-10:]

    start_time = time.time()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main(symbols))
    print(f"took {time.time() - start_time:.0f} sec")

Hi @pablo.mitchell
I ran your example code and these are the results I get:

For 10 symbols, 1 year daily data

Getting bars:
	 n_symbols=10
	 timeframe=1Day
	 start=2020-09-29
	 end=2021-09-29
bad response: ('ZWS', Empty DataFrame
Columns: []
Index: [])
Showing results:
	 n_bars=10
	 n_errors=0
	 n_bad_requests=1
took 1 sec

For 100 symbols, 1 year daily data

Getting bars:
	 n_symbols=100
	 timeframe=1Day
	 start=2020-09-29
	 end=2021-09-29
bad response: ('WOLF', Empty DataFrame
Columns: []
Index: [])
...
bad response: ('ZWS', Empty DataFrame
Columns: []
Index: [])
Showing results:
	 n_bars=100
	 n_errors=0
	 n_bad_requests=4
took 1 sec

Process finished with exit code 0

For 1000 symbols, 1 year daily data

Getting bars:
	 n_symbols=1000
	 timeframe=1Day
	 start=2020-09-29
	 end=2021-09-29
bad response: ('RRX', Empty DataFrame
Columns: []
Index: [])
...
bad response: ('ZWS', Empty DataFrame
Columns: []
Index: [])
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=14
took 5 sec

Process finished with exit code 0

For all 9,516 symbols, 1 year daily data

Getting bars:
	 n_symbols=9516
	 timeframe=1Day
	 start=2020-09-29
	 end=2021-09-29
...
Showing results:
	 n_bars=9516
	 n_errors=530
	 n_bad_requests=127
took 303 sec

Process finished with exit code 0

I do see the errors you refer to when trying to get data for 9,000 stocks, but they occur for roughly 600 out of the 9,000.

An important thing to note is the time it took for 9,000 stocks: 5 minutes! You could never get those results with the synchronous REST module (it would take you hours).

What can you do?
Every API has limitations and doesn't allow a user to fetch endless data at once. But we can work with that.

Split your requests into segments of 1,000 stocks each.
When I did that I got this result:

Getting bars:
	 n_symbols=9516
	 timeframe=1Day
	 start=2020-09-29
	 end=2021-09-29
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=7
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=12
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=16
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=14
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=20
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=19
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=15
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=7
Showing results:
	 n_bars=1000
	 n_errors=0
	 n_bad_requests=10
Showing results:
	 n_bars=516
	 n_errors=0
	 n_bad_requests=11
took 49 sec

Process finished with exit code 0

Roughly 130 stocks with bad data out of ~9,000. Not bad, and it only took 50 seconds!

I will add the modified code in the next comment

Here’s the modified code that achieves these results:

I hope that helps 🙂

async def get_historic_bars(
        symbols,
        start,
        end,
        timeframe: TimeFrame,
):
    major = sys.version_info.major
    minor = sys.version_info.minor

    if (major, minor) < (3, 6):
        raise Exception('asyncio is not supported by your python version')

    print('Getting bars:')
    print(f'\t n_symbols={len(symbols)}')
    print(f'\t timeframe={timeframe}')
    print(f'\t start={start}')
    print(f'\t end={end}')

    step_size = 1000

    for i in range(0, len(symbols), step_size):
        n_bad_requests = 0
        n_errors = 0
        tasks = []
        for symbol in symbols[i:i+step_size]:
            args = [symbol, start, end, timeframe]
            tasks.append(rest.get_bars_async(*args))

        if minor >= 8:
            results = await asyncio.gather(*tasks, return_exceptions=True)
        else:
            results = await gather_with_concurrency(500, *tasks)

        for response in results:
            if isinstance(response, Exception):
                n_errors += 1
                # print(f"Got an error: {response}")
            elif not len(response[1]):
                n_bad_requests += 1
                # print(f'bad response: {response}')
            else:
                # print(response)
                pass

        print('Showing results:')
        print(f'\t n_bars={len(results)}')
        print(f'\t n_errors={n_errors}')
        print(f'\t n_bad_requests={n_bad_requests}')

Thank you for taking the time to confirm the bug.

Pardon me if I disagree with the approach you took to remedy it. I don't think your hack is the solution; diagnosing the bug and fixing it should be. Also, the existing serial version of `get_bars` does not exhibit this issue. Moreover, if I wrap `get_bars` in a `ThreadPoolExecutor` I achieve comparable download speeds, so for the time being I'm going to stick with `get_bars`.
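For what it's worth, the threaded approach I mean looks roughly like this. Here `fetch_bars` is a hypothetical stand-in for the real `api.get_bars(...)` call, so the sketch runs without API credentials:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_bars(symbol):
    # Hypothetical stand-in for api.get_bars(symbol, TimeFrame.Day, start, end).df;
    # a stub keeps the sketch runnable without API credentials.
    return symbol, [1.0, 2.0, 3.0]

symbols = ['AAPL', 'MSFT', 'TSLA']

# Threads spend most of their time blocked on network I/O, so even a
# modest pool gets close to async throughput for this kind of workload.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = dict(pool.map(fetch_bars, symbols))
```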

Yes, of course, choose your preferred method; both are valid.
If the thread approach works for you, great!
How long does it take you to get the same result? Would you mind adding a code snippet?

Just to point something out: this is not a hack, nor a bug.
No server in the world will let you open 9,000 simultaneous requests from the same IP address, and that is exactly what the original code does: connect to the server with X simultaneous requests (in this case X = 9,000).
By splitting it into segments of 1,000 you get all the results in less than a minute.
Having said that, you are of course free to select your preferred approach.

Disagree again. If the function fails as implemented, then it is a bug. You should build throttling into it if that's the issue, and not expect consumers of your code to do it.
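For illustration, throttling takes only a few lines with an `asyncio.Semaphore`; this mirrors what the SDK's own `gather_with_concurrency` helper does. Here `fetch` is a hypothetical stand-in for `rest.get_bars_async`:

```python
import asyncio

async def fetch(symbol):
    # Hypothetical stand-in for rest.get_bars_async(symbol, start, end, timeframe).
    await asyncio.sleep(0)
    return symbol, 'bars'

async def gather_throttled(coros, limit=500):
    # A semaphore caps the number of in-flight requests so the server
    # never sees thousands of simultaneous connections from one IP.
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros),
                                return_exceptions=True)

results = asyncio.run(gather_throttled([fetch(s) for s in ['A', 'B', 'C']]))
```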

@camelpac Hi, a year late, I am trying the same thing: getting 1Day historical data for all US stocks. I am using segments of only 500 stocks but get an enormous number of empty responses (in fact I get data back for only about 800 stocks). I am on a paper trading account for now and am using iex as the feed. The results from using your code are below.

Getting Bars data for 7291 symbols, timeframe: 1Day between dates: start=2022-01-01, end=2022-11-14
Requesting tickers 0 - 500
Showing results:
n_bars=500
n_errors=0
n_bad_requests=261
Requesting tickers 500 - 1000
Showing results:
n_bars=1000
n_errors=0
n_bad_requests=720
Requesting tickers 1000 - 1500
Showing results:
n_bars=1500
n_errors=0
n_bad_requests=1179
Requesting tickers 1500 - 2000
Showing results:
n_bars=2000
n_errors=0
n_bad_requests=1626
Requesting tickers 2000 - 2500
Showing results:
n_bars=2500
n_errors=0
n_bad_requests=2080
Requesting tickers 2500 - 3000
Showing results:
n_bars=3000
n_errors=0
n_bad_requests=2543
Requesting tickers 3000 - 3500
Showing results:
n_bars=3500
n_errors=0
n_bad_requests=3003
Requesting tickers 3500 - 4000
Showing results:
n_bars=4000
n_errors=0
n_bad_requests=3458
Requesting tickers 4000 - 4500
Showing results:
n_bars=4500
n_errors=1
n_bad_requests=3925
Requesting tickers 4500 - 5000
Showing results:
n_bars=5000
n_errors=1
n_bad_requests=4378
Requesting tickers 5000 - 5500
Showing results:
n_bars=5500
n_errors=1
n_bad_requests=4820
Requesting tickers 5500 - 6000
Showing results:
n_bars=6000
n_errors=1
n_bad_requests=5287
Requesting tickers 6000 - 6500
Showing results:
n_bars=6500
n_errors=1
n_bad_requests=5749
Requesting tickers 6500 - 7000
Showing results:
n_bars=7000
n_errors=1
n_bad_requests=6215
Requesting tickers 7000 - 7500
Showing results:
n_bars=7291
n_errors=1
n_bad_requests=6480
took 186.6031038761139 sec

Could you kindly help with what I am doing wrong? Many thanks in advance.

Hi @pablo.mitchell, if you don't mind me asking, how do you handle the 200 calls per minute limit when using `get_bars` instead of `get_bars_async`, and how does that affect the overall time taken? Thanks in advance.
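For context, the naive client-side approach I can think of is sleeping between calls so that at most 200 go out per minute; a rough sketch (not part of the SDK, and the `RateLimiter` name is my own):

```python
import time

class RateLimiter:
    """Block so that at most `calls` invocations happen per `period` seconds."""

    def __init__(self, calls=200, period=60.0):
        self.min_interval = period / calls
        self._last = float('-inf')

    def wait(self):
        # Sleep just long enough to keep the minimum spacing between calls.
        now = time.monotonic()
        delay = self._last + self.min_interval - now
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()

limiter = RateLimiter()
# for symbol in symbols:
#     limiter.wait()
#     bars = api.get_bars(symbol, TimeFrame.Day, start, end)
```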
