0

Currently i have dataframe which has has stock ticker and return

ticker_name    return
"AAPL 2020"    -15%
"AAPL 2019"     20%
"AAPL 2018"     40%
"AAPL 2017"     30%
"AAPL 2016"    -10%
....

I also have the data of index return in yearly format for the last x years.With this data want to tag if specific stock have above market return

sp_500_year return
"2020"      -30%
"2019"       10%
"2018"       10%
"2017"       10%
"2016"       20%
....

Expected output are new column which are tags in boolean format. 1 if they have above market return , and 0 if they underperform the market.

ticker_name    return   above_market
"AAPL 2020"    -15%        1
"AAPL 2019"     20%        1
"AAPL 2018"     40%        1
"AAPL 2017"     30%        1
"AAPL 2016"    -10%        0
....



I found a similar question to mine, however this question is comparing strings and only have two possible input (BULL & BEAR) while mine is float and vary (index return).

Pandas: if row in column A contains "x", write "y" to row in column B

1 Answers1

2

First I would suggest splitting your ticker_name column into two columns.

Below df_ticker stores your first DataFrame with the stock ticker, year and return. Here the Series.str.split() method is used to split the ticker from the year, then we used the trick described here Python pandas split column list into multiple columns.

df_ticker = df_ticker.join(pd.DataFrame(df_ticker['ticker_name'].str.split().tolist(), columns=['ticker', 'year']))

Two new columns are created :

    ticker_name return  ticker  year
0   AAPL 2020   -0.15   AAPL    2020
1   AAPL 2019   0.20    AAPL    2019
2   AAPL 2018   0.40    AAPL    2018
3   AAPL 2017   0.30    AAPL    2017
4   AAPL 2016   -0.10   AAPL    2016

Then I would suggest setting the year as the index, this will ease comparisons between DataFrames.

df_ticker = df_ticker.set_index('year')

Grants:

    ticker_name return  ticker
year            
2020    AAPL 2020   -0.15   AAPL
2019    AAPL 2019   0.20    AAPL
2018    AAPL 2018   0.40    AAPL
2017    AAPL 2017   0.30    AAPL
2016    AAPL 2016   -0.10   AAPL

And

df_index = df_index.set_index('sp_500_year')

Grants

    return
sp_500_year 
2020    -0.3
2019    0.1
2018    0.1
2017    0.1
2016    0.2

Now you can safely compare both DataFrames

df_ticker['above_index'] = df_ticker['return'] > df_index['return']
    ticker_name return  ticker  above_index
year                
2020    AAPL 2020   -0.15   AAPL    True
2019    AAPL 2019   0.20    AAPL    True
2018    AAPL 2018   0.40    AAPL    True
2017    AAPL 2017   0.30    AAPL    True
2016    AAPL 2016   -0.10   AAPL    False
arhr
  • 1,505
  • 8
  • 16