I need to assign boolean values to the rows of a new column Y based on the value of a column called X (which takes the values 1, 2, 3, 4, 5). I have this column in a dataset df:

X
1
1
1
3
2
5
2
4
1

I would like a new column, Y, in a new dataset that is a copy of df, where:

  • if row has X value = 1 then True
  • if row has X value = 2 then False
  • if row has X value = 3 then False
  • if row has X value = 4 then True
  • if row has X value = 5 then False

So I should have

X        Y
1      true
1      true
1      true
3      false
2      false
5      false
2      false
4      true
1      true

I wrote this code:

new_df=df.copy()
new_df['Y'] = False
for index in df.iterrows():
    if   df['X'] == 1:
        new_df.iloc[index,9] = True
    elif df['X'] == 2:
        new_df.iloc[index,9] = False
    elif df['X'] == 3:
        new_df.iloc[index,9] = False
    elif df['X'] == 4:
        new_df.iloc[index,9] = True
    else:
        new_df.iloc[index,9] = False

but I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can you please help me to fix the code to get the expected output? Thank you

  • https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column shows you how to conditionally create a column based on multiple conditions. However, in the example you've provided none of this is necessary because your logic can be simplified to "`True` if X is 1 or 4, and `False` otherwise", which is a simple `df['X'].isin([1, 4])` – ALollz Jun 16 '20 at 19:56
  • Thank you. I did not look at that question. It is extremely helpful –  Jun 16 '20 at 19:59
  • I added your comment to my answer because it does provide a very useful simplification @ALollz thanks! – Celius Stingher Jun 16 '20 at 20:03

1 Answer

Edit: np.where() is preferred to map(), since it is vectorized.

I believe what you need is a custom function that holds the if-elif-else logic, which you then apply to the column with map. Something along the lines of:

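def evaluator(x):
    # Return the boolean that Y should hold for a single value of X.
    if x == 1:
        return True
    elif x == 2:
        return False
    elif x == 3:
        return False
    elif x == 4:
        return True
    else:
        return False

# map calls evaluator once per element of the column.
df['Y'] = df['X'].map(evaluator)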
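
As for the ValueError in the question: inside the loop, `df['X'] == 1` compares the whole column rather than the current row, so `if` receives a Series of booleans instead of a single True or False. The following is only a minimal sketch of that behaviour, rebuilding the sample column from the question; the names `mask` and `raised` are illustrative and not part of the original code.

```python
import pandas as pd

# Sample column copied from the question.
df = pd.DataFrame({"X": [1, 1, 1, 3, 2, 5, 2, 4, 1]})

# Comparing the whole column yields one boolean per row, not a single value.
mask = df["X"] == 1
print(mask.tolist())

# `if mask:` asks pandas to collapse all of those booleans into one,
# which it refuses to do; bool(mask) triggers the same check.
try:
    bool(mask)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

This is why applying the condition per element (as map does) or to the whole column at once avoids the error.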

@ALollz's comment provides a useful simplification, which also allows for a vectorized operation with np.where():

import numpy as np
df['Y'] = np.where(df['X'].isin([1, 4]), True, False)

With your input dataframe, this outputs:

   X      Y
0  1   True
1  1   True
2  1   True
3  3  False
4  2  False
5  5  False
6  2  False
7  4   True
8  1   True
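
Since the question asks for Y in a copy of df, here is a self-contained sketch of the same idea applied to `new_df`. It assumes the sample data from the question and relies on the fact that `isin` already returns booleans, so the result can be assigned directly; the variable `same` is just an illustrative check that it matches the np.where version.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"X": [1, 1, 1, 3, 2, 5, 2, 4, 1]})

# Work on a copy so the original df keeps only its X column.
new_df = df.copy()

# isin returns one boolean per row, so it can be assigned as the new column.
new_df["Y"] = new_df["X"].isin([1, 4])

# Check that the np.where version produces the same values.
same = bool((np.where(df["X"].isin([1, 4]), True, False) == new_df["Y"]).all())

print(new_df)
print(same)
```

If the rules ever stop reducing to a simple membership test, the linked question on conditional column creation covers the more general cases.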
Celius Stingher