I want to pick two biggest values on each row, sort them and get column names as values. Other values are dropped from dataframe.
import pandas as pd
d = {'col1': [1, 2, np.nan], 'col2': [2,3,3], 'col3': [3,6,5], 'col4': [4,9,10], 'col5': [5,1, np.nan], 'col6': [7,np.nan,2], 'col7': [np.nan, 5,6]}
df = pd.DataFrame(data=d)
I'm now able to get two biggest values of each row but reshaping dataframe based on column values is another task. Code below leaves rest of the values as Nan.
lasttwo = df.stack().sort_values(ascending=True).groupby(level=0).tail(2).unstack()
There is this example from another thread in here Find the column names which have top 3 largest values for each row Example code is down below and it works nearly fine except it doesn't work with Nan-values.
last = pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:2]]), axis=1).to_list(), columns=['Last', 'Second last'])
How to deal with these nan's?
Example:
| --- | col1 | col2 | col3 | col4 | col5 | col6 | col7 |
|---|---|---|---|---|---|---|---|
| a | 1 | 2 | 3 | 4 | 5 | 7 | Nan |
| b | 2 | 3 | 6 | 9 | 1 | Nan | 5 |
| c | Nan | 3 | 5 | 10 | Nan | 2 | 6 |
Intended result:
| --- | Last | Second last |
|---|---|---|
| a | col6 | col5 |
| b | col4 | col3 |
| c | col4 | col7 |