I am using the streaming API to get real-time trade data.
What is the timestamp in the trade schema?
“t RFC-3339 formatted timestamp with nanosecond precision.”
Assuming it is the SIP timestamp, how can we filter out trades that are reported too late?
If a trade was reported 10 seconds late, I simply want to ignore it.
Here is an example of a trade that the SIP feed reported 5 seconds late (AAPL, 2021-02-26, data obtained from the Polygon APIs):
You can see that “sip_timestamp” is 5 seconds after “exchange_timestamp”, so the trade was reported 5 seconds after it actually took place at the exchange.
The majority of trades (~95%) are reported within 100-200 ms, but some are reported too late and I want to ignore them.
But without the exchange timestamp in the stream trade schema, it is impossible to do that.
Can we please also have the exchange timestamp?
The current timestamp is actually the ‘participant_timestamp’ and not the ‘sip_timestamp’, so it is the time at which the trade took place. Alpaca is looking into possibly including both, but that’s the current state.
The best way to filter ‘late’ data is to look at the trade condition. Trades which are reported ‘late’ will have an L as one of the trade conditions. This implies it was reported more than 10 seconds after the actual trade during market hours or 15 minutes after the trade during extended hours trading.
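A minimal sketch of the condition-based filter described above. It assumes each streamed trade arrives as a dict whose condition codes sit in a list under the key "c"; that key name and the sample messages are assumptions for illustration, so check them against the trade schema you actually receive.

```python
from typing import Iterable, Iterator

LATE_CONDITION = "L"  # condition code named in the post above


def drop_late_trades(trades: Iterable[dict],
                     late_code: str = LATE_CONDITION) -> Iterator[dict]:
    """Yield only trades whose condition list does not contain the late code."""
    for trade in trades:
        # "c" is assumed to hold the list of condition codes; a missing key
        # is treated as "no conditions".
        if late_code not in trade.get("c", []):
            yield trade


# Made-up sample messages, for illustration only.
sample = [
    {"S": "AAPL", "p": 121.30, "c": ["@"], "t": "2021-02-26T15:00:00.100Z"},
    {"S": "AAPL", "p": 121.10, "c": ["@", "L"], "t": "2021-02-26T14:59:48.000Z"},
]
kept = list(drop_late_trades(sample))
```

Here `kept` holds only the first sample trade, because the second one carries the L condition.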
As an alternative, one could also just compare the current time to the timestamp. Consider using the clock API to get the current time synchronized with the markets.
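The age-based alternative could look roughly like the sketch below. It parses the RFC-3339 "t" field by hand (truncating nanoseconds to microseconds, which is all Python's datetime stores) and compares it with a `now` value that the caller supplies, for example from the clock API or a synchronized local clock. The helper names and the 10-second default threshold are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC-3339 timestamp, truncating the fraction to microseconds."""
    match = _RFC3339.match(ts)
    if match is None:
        raise ValueError(f"not an RFC-3339 timestamp: {ts!r}")
    base, frac, tz = match.groups()
    micros = int((frac or "")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(microsecond=micros)
    if tz == "Z":
        offset = timezone.utc
    else:
        sign = 1 if tz[0] == "+" else -1
        offset = timezone(sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6])))
    return parsed.replace(tzinfo=offset)


def is_stale(trade: dict, now: datetime,
             max_age: timedelta = timedelta(seconds=10)) -> bool:
    """Return True if the trade timestamp is older than max_age relative to now."""
    return now - parse_rfc3339(trade["t"]) > max_age
```

Passing `now` in explicitly keeps the check testable and lets you decide which clock to trust; a local clock that drifts will make the threshold unreliable.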
After running the websocket for a day, I could verify that it is indeed the “participant_timestamp”. That is sufficient for me.
Thanks for the response. And thanks for so actively responding to everyone’s questions on Slack and here. I learned a lot just by reading your responses to various questions.
A quick update. Alpaca is looking into changing the way v2 data is presented and aggregate bars are calculated. First, both the participant_timestamp and the sip_timestamp will be surfaced. Additionally, aggregated bars will be available immediately after the minute. However, an additional check is made a bit later (we are thinking 30 seconds). If there were any late but still ‘valid’ trades which should have appeared in the original bar, we will send out an ‘updated bar’. There will be a field in a bar which indicates if it is an update or not. This is a way we can provide both timely ‘real time’ data but also balance that with more correct aggregates.
@Dan_Whitnable_Alpaca Hi! Is there any update on the sip_timestamp? Or was the choice made not to make the sip_timestamp available (now or in the future)?
Hi! Without the SIP timestamp it is impossible to know in PM whether a trade was reported late or not… conditions like ‘L’ do not filter all the trades… any news on this?
@guillem The timestamp Alpaca reports for trades is Timestamp 1 from the SIP feed. This is typically referred to as the participant timestamp which, in most cases, is the trade execution time. However, there is one notable exception.
Timestamp 1: 2 x Integer (pair of integers). Timestamp 1 is a Participant-provided timestamp representing the number of nanoseconds since Epoch. The first integer contains the number of seconds from epoch 1/1/1970, 00:00:00 UTC. The next integer contains the nanosecond portion of the time (e.g., 972402315). For any messages generated by CTS, e.g., Messages generated on behalf of a Participant, Price Band messages, Control messages and Market Status messages, the Timestamp 1 field will be set to current SIP time.
If from an Exchange: Timestamp 1 denotes the Exchange Matching Engine Publication timestamp for a transaction. Exchanges use a clock sync methodology ensuring that timestamps are accurate within tolerances of 100 microseconds or less. Exchanges shall provide the timestamp in terms of nanoseconds since Epoch.
If from the FINRA Alternative Display Facility (ADF) or a FINRA Trade Reporting Facility (TRF): Timestamp1 denotes the time of execution that a FINRA member reports to the FINRA ADF or a FINRA TRF. FINRA shall provide such times to the Processor in nanoseconds since Epoch.
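To make the integer-pair encoding quoted above concrete, here is a small sketch that combines the seconds and nanoseconds parts into a single nanosecond count and into a UTC datetime. The function names are hypothetical, and the datetime version drops the sub-microsecond digits because Python's datetime has microsecond resolution.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000


def timestamp1_to_ns(seconds: int, nanos: int) -> int:
    """Combine the (seconds, nanoseconds) pair into nanoseconds since Epoch."""
    return seconds * NANOS_PER_SECOND + nanos


def timestamp1_to_datetime(seconds: int, nanos: int) -> datetime:
    """Convert the pair to an aware UTC datetime, truncated to microseconds."""
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole.replace(microsecond=nanos // 1000)


# Seconds value chosen for illustration (2021-02-26 15:00:00 UTC), paired
# with the nanosecond example from the quoted definition.
total_ns = timestamp1_to_ns(1614351600, 972402315)
as_datetime = timestamp1_to_datetime(1614351600, 972402315)
```

Keeping the raw integer (`total_ns`) is the safer choice when you need to compare or sort timestamps at full nanosecond precision.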
We often generalize this timestamp to be the ‘time the trade executed’; however, this isn’t exactly the definition. It is true for trades executed on an exchange. However, for trades not executed on an exchange and reported to a FINRA Alternative Display Facility (ADF) or a FINRA Trade Reporting Facility (TRF), it is the time at which the trade was reported to the reporting facility. (These trades have an exchange code of “D”.) This time is typically very close to the execution time; however, it really messes things up for trades executed after 20:00 ET. Why? The ADF and TRF facilities aren’t open then. They don’t open until 8:00 AM ET the following morning. Therefore, execution platforms don’t report those trades until 8:00 AM ET (even though they may have executed the previous evening), and the timestamp on them is the time they were reported. If you look at trade volume between 8:00 and 8:15 AM ET you will often see a big spike along with swings in prices. That is because trades from the previous evening all get reported when the reporting facilities open.
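One rough way to act on this is to flag trades that carry exchange code “D” and a timestamp inside the 8:00-8:15 AM ET window mentioned above. The sketch below is only a heuristic under those assumptions: it cannot tell a delayed overnight report from a genuine off-exchange trade executed just after 8:00, so it will also flag some legitimate trades. The function name and the window bounds as parameters are hypothetical.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

EASTERN = ZoneInfo("America/New_York")


def maybe_overnight_report(exchange: str, ts: datetime,
                           window_start: time = time(8, 0),
                           window_end: time = time(8, 15)) -> bool:
    """Flag FINRA-reported trades stamped inside the morning reporting window.

    `ts` must be a timezone-aware datetime; it is converted to Eastern Time
    so that daylight saving time is handled by the tz database.
    """
    if exchange != "D":
        return False
    local_time = ts.astimezone(EASTERN).time()
    return window_start <= local_time < window_end


# 13:05 UTC on 2021-02-26 is 08:05 in New York (EST, UTC-5).
flagged = maybe_overnight_report("D", datetime(2021, 2, 26, 13, 5, tzinfo=timezone.utc))
```

Treating the result as a flag rather than silently dropping the trade leaves room to review how many legitimate trades the heuristic catches.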
Alpaca is currently looking into options for reporting overnight trades with the execution time and not the reporting time. Currently that information is simply not available in the SIP data which is received.
Is there a specific reason you want the time a trade was reported to the SIP?
Hi @Dan_Whitnable_Alpaca thank you so much for your detailed answer! I actually don’t need “SIP timestamp” for any other reason than to “clean” those unreported overnight trades. I thought that timestamp would allow me to do so, but it seems it’s not that simple.
I’ve tried avoiding the “D” exchange as you mentioned, but it’s not always 100% accurate. In your experience, what would be the best way to handle this with Alpaca’s current data? It would be awesome if there were a way to mark those trades somehow to maintain a clean feed.