Compute PnL statements from `trade_confirmation.json` files

Hi,
I downloaded all the trade_confirmation.json in my account statement in my Alpaca Web dashboard, for the last financial year and now have it in a dataframe (table). My question is without a common position ID which links the open and close of a trade. See highlighted coloured boxes below which links the open and close of a few symbols.

REQUIREMENT
How do I go about creating a function (in pandas python) to compute a capital gains PnL statement?

The capital gains statement should have the following columns:
[ Symbol, TradeOpenDateTime, Direction@TradeOpen, Units,
OpenPrice, OpenAmount(USD), ClosePrice,
CloseAmount(USD), Gain/Loss]

Anyone did something like this before?

@trader_drk Basically, to get the net overall profit (or loss), simply sum the net_amount column. If one wants the profit (or loss) by symbol, first group by the symbol the sum net_amount column. Like this

profit_by_symbol = df.groupby('symbol').net_amt.sum()

This returns something like this

symbol
AAPD     11.1000
AMD      32.3130
NVDA    -79.2435
NVDS     40.2800
SE        0.5655
TSLA    167.3394

However, this works only because all the symbols started with 0 qty and ended with 0 qty. In other words, all the positions have been closed, so all of the gains are ‘realized gains’. This greatly simplifies the task and can be verified by checking the net qty of each symbol.

net_qty_by_symbol = df.groupby(‘symbol’).qty.sum()

This then returns something like this. Note the qty all sum to 0.

symbol
AAPD    0
AMD     0
NVDA    0
NVDS    0
SE      0
TSLA    0

One can put it all together into a single dataframe which can serve as the basis of a ‘statement’. Simply create an empty dataframe and add the desired columns. Something like this

#create an empty dataframe
pl = pd.DataFrame()

#add columns by grouping on symbol, filtering if needed, then performing any calculations
pl[‘trade_open_datetime’] = df.groupby(‘symbol’).trade_time.first()
pl[‘direction_at_open’] = df.groupby(‘symbol’).side.first()
pl[‘units’] = df.query(‘qty>0’).groupby(‘symbol’).qty.sum()
pl[‘open_price’] = df.query(‘qty>0’).groupby(‘symbol’).avg_price.mean()
pl[‘close_price’] = df.query(‘qty<0’).groupby(‘symbol’).avg_price.mean()
pl[‘open_amt’] = df.query(‘qty>0’).groupby(‘symbol’).net_amt.sum()
pl[‘close_amt’] = df.query(‘qty<0’).groupby(‘symbol’).net_amt.sum()
pl[‘gain_loss’] = df.groupby(‘symbol’).net_amt.sum()

This results in a dataframe like this

This probably isn’t exactly what you wanted? One issue is the ‘open_price’ and ‘close_price’ are really just the 'buy" and “sell” prices. For short positions you may want to reverse these? This can be done like this. Notice that to ‘swap’ two columns we need to create a temporary series to hold the original values

#create a temp series to hold the original values
orig_open_price = pl.open_price
orig_close_price = pl.close_price

orig_open_amt = pl.open_amt
orig_close_amt = pl.close_amt

#swap the values if not a “buy”
pl[‘open_price’] = pl.open_price.where(pl.direction_at_open==“buy”, orig_close_price)
pl[‘close_price’] = pl.close_price.where(pl.direction_at_open==“buy”, orig_open_price)

pl[‘open_amt’] = pl.open_amt.where(pl.direction_at_open==“buy”, orig_close_amt)
pl[‘close_amt’] = pl.close_amt.where(pl.direction_at_open==“buy”, orig_open_amt)

Now the ‘open_amt’ columns will better align with the direction_at_open.

Another issue is this treats an entire symbol as a single ‘lot’. However, one could initially go long, close out the long position, then go short. This approach treats that a single long position. If one wishes to capture the long and short positions separately, then simply groupby ['symbol', 'open_side']. One would need to create the new column ‘open_side’ with values like buy, sell, buy_to_cover, and sell_short, but that would be the approach.

My columns don’t exactly match your columns, but you get the idea. Is that sort of what you were looking for?

1 Like

Thanks @Dan_Whitnable_Alpaca ,
You solution worked to a large extent. I had to do some type conversions on the float columns and remove a long_hold position and after that the output came as expected. Attaching the final output as jpeg file

1 Like