What I have:

|    ID   |   Possible_Size    |   Actual_Size     |
|:------: |:------------------:|:-----------------:|  
|   1234  |         BIG        |        BIG        |
|   5678  |       MEDIUM       |        BIG        |
|   9876  |        BIG         |       SMALL       |
|   1092  |       MEDIUM       |       MEDIUM      |

What I would like to create:

|    ID   |   Possible_Size    |   Actual_Size     |       Big       |
|:------: |:------------------:|:-----------------:|:---------------:|  
|   1234  |         BIG        |        BIG        |  True Positive  |
|   5678  |       MEDIUM       |        BIG        |  False Negative |  
|   9876  |        BIG         |       SMALL       |  False Positive |   
|   1092  |       MEDIUM       |       MEDIUM      |                 |

What I have tried:

    def sizes(row):
                        
        if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
            df['Big'] = 'True Positive'
        elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
            df['Big'] = 'False Negative'
        elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
            df['Big'] = 'False Positive'  
        else:
            df['Big'] = ''
                        
    df.apply(sizes, axis=1)

Currently I am getting a blank 'Big' column.

big_soapy
  • `==` is not `=` – Selcuk Jul 29 '20 at 01:52
  • Yes, thank you. I had tried both `=` and `==`. Having changed that, I am now just getting a blank 'Big' column. – big_soapy Jul 29 '20 at 02:12
  • Can you try printing `row['Actual_Size']` to see what is in the variable? My best guess is that there may be trailing spaces in the string. If so, you might want to call `strip()` before doing the comparison. – Prashanth Mariswamy Jul 29 '20 at 02:37
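
The whitespace check suggested in the last comment could look roughly like this. It is a minimal sketch with made-up values: the stray spaces are an assumption for illustration, not data from the question.

```python
import pandas as pd

# Hypothetical values with stray whitespace (not taken from the question).
df = pd.DataFrame({'Actual_Size': ['BIG ', ' SMALL', 'MEDIUM']})

# repr() makes leading/trailing spaces visible when printing.
print(df['Actual_Size'].map(repr).tolist())

# Strip whitespace once, up front, so later comparisons behave as expected.
df['Actual_Size'] = df['Actual_Size'].str.strip()
print(df['Actual_Size'].map(repr).tolist())
```

If the first printout shows no extra spaces, whitespace is not the cause and the stripping step is unnecessary.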

1 Answer

For multiple if/elif conditions like these, you could use `np.select`:

    import numpy as np

    # Conditions are checked in order; the first one that is True picks the label.
    conditions = [
        df['Actual_Size'].isin(['BIG']) & df['Possible_Size'].isin(['BIG']),
        df['Actual_Size'].isin(['BIG']) & df['Possible_Size'].isin(['MEDIUM', 'SMALL']),
        df['Actual_Size'].isin(['MEDIUM', 'SMALL']) & df['Possible_Size'].isin(['BIG']),
    ]
    choices = ['True Positive', 'False Negative', 'False Positive']

    # Rows matching none of the conditions get the default, an empty string.
    df['Big'] = np.select(conditions, choices, default='')
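
For anyone who wants to run this end to end, here is a self-contained sketch. It assumes the sample data from the question, with ID 9876 as BIG/SMALL as in the desired output, and it assumes the only sizes are BIG, MEDIUM and SMALL, so "not BIG" can stand in for `isin(['MEDIUM', 'SMALL'])`. The helper names `actual_big` and `possible_big` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the question's tables.
df = pd.DataFrame({
    'ID': [1234, 5678, 9876, 1092],
    'Possible_Size': ['BIG', 'MEDIUM', 'BIG', 'MEDIUM'],
    'Actual_Size': ['BIG', 'BIG', 'SMALL', 'MEDIUM'],
})

# Boolean masks, one value per row.
actual_big = df['Actual_Size'] == 'BIG'
possible_big = df['Possible_Size'] == 'BIG'

conditions = [
    actual_big & possible_big,    # predicted BIG, really BIG
    actual_big & ~possible_big,   # predicted something else, really BIG
    ~actual_big & possible_big,   # predicted BIG, really something else
]
choices = ['True Positive', 'False Negative', 'False Positive']

# np.select takes the choice of the first matching condition per row,
# and the default where no condition matches.
df['Big'] = np.select(conditions, choices, default='')
print(df)
```

If the columns can hold values other than these three sizes, the explicit `isin` lists are the safer choice, since `~possible_big` would also match those other values.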

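If you want to keep your original solution, the problem is that the function returns nothing, so `df.apply` has nothing to build a column from. Each call also assigns a single value to the whole `df['Big']` column, so after the last row (which hits the `else` branch) the column is blank. Return the label from the function and assign the result of `apply` instead: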

    def sizes(row):
        if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
            return 'True Positive'
        elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
            return 'False Negative'
        elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
            return 'False Positive'
        else:
            return ''

    df['Big'] = df.apply(sizes, axis=1)

Both approaches give the same output:

    df
         ID Possible_Size Actual_Size             Big
    0  1234           BIG         BIG   True Positive
    1  5678        MEDIUM         BIG  False Negative
    2  9876           BIG       SMALL  False Positive
    3  1092        MEDIUM      MEDIUM
MrNobody33
  • Yes that worked thanks so much! Just out of interest, is there any reason you would use one solution over the other? – big_soapy Jul 29 '20 at 03:21
  • With multiple conditions like this, I would use `np.select`: it is vectorized, so it generally performs better, whereas `apply` [can be slower](https://stackoverflow.com/q/38697404/13676202). – MrNobody33 Jul 29 '20 at 03:29
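
To see the difference on your own machine, a rough comparison could look like the sketch below. The random data, the row count `n`, and the function names are all assumptions made for illustration; timings depend on hardware and library versions, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: random sizes, not taken from the question.
rng = np.random.default_rng(0)
sizes = ['BIG', 'MEDIUM', 'SMALL']
n = 2_000
df = pd.DataFrame({
    'Possible_Size': rng.choice(sizes, size=n),
    'Actual_Size': rng.choice(sizes, size=n),
})

def label_with_select(frame):
    # Vectorized: each condition is evaluated once for the whole column.
    actual_big = frame['Actual_Size'] == 'BIG'
    possible_big = frame['Possible_Size'] == 'BIG'
    conditions = [
        actual_big & possible_big,
        actual_big & ~possible_big,
        ~actual_big & possible_big,
    ]
    choices = ['True Positive', 'False Negative', 'False Positive']
    return np.select(conditions, choices, default='')

def label_row(row):
    # Row-wise: called once per row by DataFrame.apply.
    actual_big = row['Actual_Size'] == 'BIG'
    possible_big = row['Possible_Size'] == 'BIG'
    if actual_big and possible_big:
        return 'True Positive'
    if actual_big:
        return 'False Negative'
    if possible_big:
        return 'False Positive'
    return ''

def label_with_apply(frame):
    return frame.apply(label_row, axis=1).to_numpy()

# Check that both approaches agree before comparing their speed.
same = label_with_select(df).tolist() == label_with_apply(df).tolist()
print('identical labels:', same)

print('np.select:', timeit.timeit(lambda: label_with_select(df), number=5))
print('apply:    ', timeit.timeit(lambda: label_with_apply(df), number=5))
```

For a handful of rows either approach is fine; the gap tends to matter only as the frame grows.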