Scanning for stocks efficiently

I’m building a stock scanner/screener, and think I’ve got it figured out, but feel like there should be a more efficient way. For example, let’s say I want to find all stocks between $2 and $10.

  • Use REST.list_assets() to retrieve a giant object with all stocks
  • Loop through the object and retrieve a day bar with REST.get_barset() for yesterday’s closing price
  • If the price is within my parameters, append the symbol as a string to a local list variable.
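For reference, those three steps might look roughly like the sketch below. It assumes `api` is an `alpaca_trade_api.REST` instance, that `get_barset()` accepts a list of symbols and returns a dict-like object mapping each symbol to a list of bars with the close in `.c`, and that around 200 symbols per request is acceptable (the batch size is my assumption, so adjust it to whatever the API allows). `scan_by_price` is just a name made up for illustration.

```python
def scan_by_price(api, low=2.0, high=10.0, batch_size=200):
    """Return symbols whose most recent daily close lies in [low, high]."""
    # Step 1: retrieve every active asset in one large response.
    assets = api.list_assets(status="active")
    symbols = [asset.symbol for asset in assets]

    matches = []
    # Step 2: request the latest daily bar, several symbols per call.
    # (During market hours the latest bar may be today's partial bar.)
    for start in range(0, len(symbols), batch_size):
        batch = symbols[start:start + batch_size]
        barset = api.get_barset(batch, "day", limit=1)
        for symbol in batch:
            bars = barset.get(symbol) or []
            if not bars:
                continue  # no price data came back for this symbol
            close = bars[-1].c
            # Step 3: keep the symbol if the close is within range.
            if low <= close <= high:
                matches.append(symbol)
    return matches


# Usage (needs API keys, so left commented out):
# import alpaca_trade_api as tradeapi
# cheap_stocks = scan_by_price(tradeapi.REST(), 2.0, 10.0)
```

Passing several symbols per call cuts the number of requests, but the filtering itself still happens on my side, which is the part I would like to avoid.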

The thing I don’t get is why should I have to retrieve thousands of asset objects, each with 9 properties that I don’t need, and then loop through them to retrieve yet another giant list of barset objects in order to check if it matches my criteria?

Isn’t there a way for me to provide parameters in the original query so that the filtering happens server side? Something like REST.list_assets(price <10.00) so it only requests a list of assets that are less than $10 in the first place?

It looks like the list_assets function only accepts status as a parameter. I hope they will add others.

If you are using Python, you may want to look at pipeline-live. In the background it still fetches data for each stock one by one, but it hides all the implementation details. You simply need to do something like this to get all stocks with a closing price between $2 and $10:

  close_price = USEquityPricing.close.latest
  my_universe = (close_price > 2) & (close_price < 10)

There’s a bit more setup but that’s basically it.

Pipelines do the following behind the scenes for you

  • fetch data for a large number of securities by simply supplying a list of symbols
  • fetch data for long periods of time (e.g. a 2-year moving average price) and break up the Polygon calls as needed if the data limit is reached
  • cache the resulting data so subsequent calls hit the cache and do not re-query the API
  • allow for complex data manipulation using built-in and user-defined functions (e.g. RSI)
  • automatically calculate the amount of data to fetch for each function (e.g. 10-day returns will fetch 10 trading days, not calendar days, of data)
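To make the batching and caching bullets a bit more concrete, here is a toy sketch in plain Python. It is not pipeline-live's actual code: `CachedFetcher`, `fetch_fn`, and `chunk_size` are made-up names, and `fetch_fn` stands in for whatever API call returns data for a list of symbols.

```python
class CachedFetcher:
    """Fetch data for many symbols in chunks and remember earlier results."""

    def __init__(self, fetch_fn, chunk_size=200):
        # fetch_fn is hypothetical: it takes a list of symbols and
        # returns a dict mapping each symbol to its data.
        self.fetch_fn = fetch_fn
        self.chunk_size = chunk_size
        self.cache = {}

    def get(self, symbols):
        # Only query the API for symbols that are not cached yet.
        missing = [s for s in symbols if s not in self.cache]
        # Split the request so no single call exceeds the size limit.
        for start in range(0, len(missing), self.chunk_size):
            chunk = missing[start:start + self.chunk_size]
            self.cache.update(self.fetch_fn(chunk))
        return {s: self.cache[s] for s in symbols if s in self.cache}
```

The first `get()` for a symbol costs an API call; asking for the same symbol again is served from the in-memory dict.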

There are some other nice features too, so it’s worth a look. Check it out on GitHub: https://github.com/alpacahq/pipeline-live


I don’t think pipeline-live is working. I tried installing it and got multiple errors, even inside a virtual environment. I checked GitHub, and other people there are reporting similar errors.