@guillem You asked about best practices for streaming trades. Generally, subscribe only to the trades you need. The single biggest issue with streaming data is whether the client (i.e. the algo) can process the trades fast enough, and processing trades only to filter them out is a waste of CPU time.
The biggest issues we see with websocket implementations are: 1) the client disconnects but then ‘silently’ reconnects (and therefore misses data), 2) the client doesn’t keep up with the streamed data and processes messages more slowly than they arrive, and 3) the client doesn’t leave time for the required underlying ping-pong protocol.
Issue 1 “silently disconnecting”. This is a direct result of the websocket implementations in some of the SDKs, which automatically reconnect when a connection is lost (any data streamed while disconnected is simply missed). If you’re implementing your own code this shouldn’t be an issue: your code knows when a connection is lost, so the reconnect won’t be ‘silent’.
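If you are rolling your own client rather than using an SDK, a minimal sketch of an explicit reconnect loop with the websockets package might look like the following (the url and handle() are placeholders and the Alpaca-specific auth/subscribe messages are omitted); the point is simply that every disconnect is logged and handled, never silent.
import asyncio
import logging

import websockets

log = logging.getLogger(__name__)

async def stream_with_visible_reconnects(url):
    while True:
        try:
            async with websockets.connect(url) as ws:
                log.info("connected")
                async for message in ws:
                    handle(message)  # placeholder for your own message handling
        except websockets.ConnectionClosed as exc:
            # the disconnect is explicit: we know when it happened and what
            # window of data may have been missed
            log.warning("disconnected (%s); reconnecting", exc)
            await asyncio.sleep(1)  # simple backoff before reconnecting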
Issue 2 “slow message processing”. If messages aren’t pulled from the websocket stream fast enough they are buffered. The Alpaca servers buffer streamed messages, but if that buffer fills, the server will disconnect. The buffer is quite large, so many minutes of messages can queue up; an algo can get 30 minutes (or more) behind without realizing it is working with ‘stale’ data.
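One common mitigation (just a sketch, not Alpaca-specific) is to keep the stream handler trivially cheap and hand each message to a separate worker task through a bounded asyncio.Queue; process_trade below is a placeholder for your own logic, and the bounded queue makes any backlog visible instead of hiding it.
import asyncio

# bounded queue so a backlog is visible rather than silently growing
trade_queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)

async def trade_data_handler(data):
    # keep this handler as cheap as possible: just enqueue and return
    try:
        trade_queue.put_nowait(data)
    except asyncio.QueueFull:
        # the worker is falling behind; log (or drop) rather than stall the stream
        print(f"queue full, dropping trade for {data.symbol}")

async def trade_worker():
    # run this as a separate task; it does the (possibly slower) real work
    while True:
        data = await trade_queue.get()
        process_trade(data)  # placeholder for your own trade logic
        trade_queue.task_done()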
Issue 3 “no pong message”. The client is expected to send a ‘keep alive’ pong message to the server at least every 20 seconds (this is handled transparently by most websocket packages). The algo must ensure it never gets so busy processing messages that it leaves no time to send these pongs. If the server doesn’t receive them, it assumes the client went away and disconnects.
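To illustrate that point (a sketch only): anything long and synchronous inside the handler, e.g. blocking HTTP calls or heavy DataFrame work, keeps the event loop from answering pings. On Python 3.9+ one option is to push such work onto a thread; heavy_work below is a placeholder for your own function, not part of the SDK.
import asyncio

async def trade_data_handler(data):
    # a blocked event loop cannot service the ping-pong frames and the server
    # will eventually disconnect, so avoid long synchronous calls here
    # (offloading to a thread helps most when the work does I/O or releases
    # the GIL, e.g. numpy/pandas)
    await asyncio.to_thread(heavy_work, data)  # heavy_work is a placeholder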
Your algo should always check the timedelta between when a message is received and the timestamp on the message. For quotes this should typically be under 25 ms. Trades should also be around 25 ms, except that trades can be reported up to 10 seconds after execution, so there may be outliers with timedeltas approaching 10 seconds. If the algo ever sees many trades with timedeltas over 10 seconds, or if the timedeltas are steadily increasing, it’s a sign the algo is processing messages too slowly.
Below is some sample Python code using the alpaca-py SDK that implements this timedelta check.
import pandas as pd

from alpaca.data.live import StockDataStream
from alpaca.data.enums import DataFeed

ALPACA_API_KEY = 'xxxx'
ALPACA_API_SECRET_KEY = 'xxxx'

stream = StockDataStream(ALPACA_API_KEY, ALPACA_API_SECRET_KEY, feed=DataFeed.SIP)

MY_SYMBOLS = ['IBM', 'SPY', 'NVDA']
MAX_TIME_DIFFERENCE = pd.Timedelta(seconds=1)

def timestamp_difference(timestamp):
    # delta between 'now' and the timestamp on the message
    return pd.Timestamp.utcnow() - timestamp

# async function to handle stream data
async def trade_data_handler(data):
    # trade data will arrive here
    time_delta = timestamp_difference(data.timestamp)
    if time_delta < MAX_TIME_DIFFERENCE:
        # do whatever your logic is here
        print(f"delta in ms: {time_delta.total_seconds() * 1000:.1f} {time_delta}")
    else:
        # skip processing and log skipped data
        # this should clear the buffer in most cases
        print(f"Algo not keeping up. Skipping data. {pd.Timestamp.utcnow()} {data}")

stream.subscribe_trades(trade_data_handler, *MY_SYMBOLS)

stream.run()
If you will be streaming ~6k symbols, I would recommend starting with just a few symbols. Get a baseline timedelta, then incrementally add more symbols and make sure your algo keeps up (a sketch of one way to track that baseline is below).
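A rough sketch of tracking a rolling baseline (illustrative only, reusing the handler pattern above):
from collections import deque

import pandas as pd

recent_deltas = deque(maxlen=1000)  # rolling window of recent timedeltas in ms
trade_count = 0

async def trade_data_handler(data):
    global trade_count
    delta_ms = (pd.Timestamp.utcnow() - data.timestamp).total_seconds() * 1000
    recent_deltas.append(delta_ms)
    trade_count += 1
    if trade_count % 1000 == 0:
        # if this average creeps up as symbols are added, the algo isn't keeping up
        avg_ms = sum(recent_deltas) / len(recent_deltas)
        print(f"rolling average delta over last {len(recent_deltas)} trades: {avg_ms:.1f} ms")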
Another thing to remember is that not all trades are equal. You will typically want to filter out and exclude trades carrying any of these trade conditions: [‘B’, ‘C’, ‘G’, ‘H’, ‘I’, ‘M’, ‘N’, ‘P’, ‘Q’, ‘R’, ‘U’, ‘V’, ‘W’, ‘Z’, ‘4’, ‘7’, ‘9’]. A sketch of that filter is below.
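A sketch of that filter inside the trade handler (this assumes the trade object exposes a conditions list, as alpaca-py's Trade model does):
# condition codes to exclude from the trade logic
EXCLUDED_CONDITIONS = {'B', 'C', 'G', 'H', 'I', 'M', 'N', 'P', 'Q',
                       'R', 'U', 'V', 'W', 'Z', '4', '7', '9'}

async def trade_data_handler(data):
    # data.conditions may be None for some trades
    if any(c in EXCLUDED_CONDITIONS for c in (data.conditions or [])):
        return  # skip trades flagged with an excluded condition
    # ... normal trade logic here ...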
Hope that helps.