Assigning a value to specific words in a dataframe in Python

Question

Hi, I have a dataframe of 7989 rows × 1 column. Each row describes the consequences of a maritime piracy attack.

I want to assign a value to each row depending on whether it contains a word from one of the lists below. The value assigned depends on which list the word belongs to.

The 6 lists:

five =['kill','execute','dead']
four =['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']
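Because each list corresponds to one score, the six lists can be folded into a single word-to-score lookup. A minimal sketch, assuming the score is simply the list's position (zero → 0 up to five → 5); the name `score` is hypothetical:

```python
five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']
three = ['injur', 'wound', 'assault']
two = ['captur', 'hijack']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
zero = ['alarm', 'no', 'none']

# Hypothetical lookup: the position in this list is the score (zero -> 0, ..., five -> 5)
score = {word: n
         for n, words in enumerate([zero, one, two, three, four, five])
         for word in words}

print(score['kill'], score['captur'], score['none'])  # 5 2 0
```

With this dictionary, scoring a word is a single lookup instead of six membership tests.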

I have tried to do it like this:

df['five']=df.apply(lambda x: '5' if x == 'five' else '-')

where df is my dataframe.
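The attempt above fails for two reasons: `df.apply` with the default `axis=0` passes each whole column to the lambda, so `if x == 'five'` raises a ValueError (the truth value of a Series is ambiguous), and `x == 'five'` compares against the string `'five'` rather than checking membership in the list `five`. A minimal sketch of the intended check, assuming each cell holds a list of words and the column is named `outcome` (a hypothetical name and sample data):

```python
import pandas as pd

five = ['kill', 'execute', 'dead']

# Hypothetical sample: each cell holds the list of words for one attack
df = pd.DataFrame({'outcome': [['kill', 'dead'], ['abduct'], ['hostag', 'execute']]})

# Series.apply passes one cell (a list) at a time; test membership in the list `five`
df['five'] = df['outcome'].apply(
    lambda words: '5' if any(w in five for w in words) else '-')

print(df['five'].tolist())  # ['5', '-', '5']
```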

Can anyone help?

Can you embed a short example in your question? – Sayan Dey Aug 22 '20 at 14:55

jezrael · Answer 1 · 2020-08-22T15:30:30.117

You can create a dictionary for each list that maps every word to its score, merge all the dictionaries together, and then set new columns with numpy.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'outcom':[['kill','dead'],['abduct','aaaa'],['hostag']]})

# the six keyword lists
five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']
three = ['injur','wound','assault']
two = ['captur','hijack']
one = ['stolen','damage','threaten','robber','destroy']
zero = ['alarm','no','none']

# map every word in each list to that list's score
d5 = dict.fromkeys(five, '5')
d4 = dict.fromkeys(four, '4')
d3 = dict.fromkeys(three, '3')
d2 = dict.fromkeys(two, '2')
d1 = dict.fromkeys(one, '1')
d0 = dict.fromkeys(zero, '0')

# merge into a single word -> score dictionary
d = {**d5, **d4, **d3, **d2, **d1, **d0}
print (d)

# one new column per word: the score if the word is in the row's list, else '-'
for k, v in d.items():
    df[k] = np.where(df['outcom'].apply(lambda x: k in x), v, '-')

print (df)
           outcom kill execute dead kidnap hostag taken abduct
0    [kill, dead]    5       -    5      -      -     -      -
1  [abduct, aaaa]    -       -    -      -      -     -      4
2        [hostag]    -       -    -      -      4     -      -
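Since the goal is a single 0-5 value per row, the per-word columns can be collapsed into one by taking the row-wise maximum. A sketch under a few assumptions: scores are stored as integers instead of the strings `'5'`, `'4'`, …, `-1` stands in for `'-'`, `score` is a hypothetical column name, and only `five` and `four` are included for brevity:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'outcom': [['kill', 'dead'], ['abduct', 'aaaa'], ['hostag']]})
five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']

# Same word -> score dictionary as above, but with integer scores
d = {**dict.fromkeys(five, 5), **dict.fromkeys(four, 4)}

# One column per word: the score if the word is in the row's list, -1 otherwise
cols = {k: np.where(df['outcom'].apply(lambda x: k in x), v, -1) for k, v in d.items()}
per_word = pd.DataFrame(cols, index=df.index)

# Row-wise maximum gives a single score per attack (-1 means no listed word)
df['score'] = per_word.max(axis=1)
print(df['score'].tolist())  # [5, 4, 4]
```

Taking the maximum means a row containing both a "kill" word and a "kidnap" word gets the more severe score.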

Dr. Prof. Patrick · Answer 2 · score 0 · 2020-08-22T15:20:33.570

You can use the `loc` indexer (see the pandas documentation) like so:

import pandas as pd

five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
     Words
0        I
1     like
2    bacon
3       in
4      the
5  morning

df["New"] = df["Words"].copy()
df.loc[df["New"] == "I", "New"] = 5

     Words      New
0        I        5
1     like     like
2    bacon    bacon
3       in       in
4      the      the
5  morning  morning

You can then use a for-loop over your lists to score every word the same way.
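A sketch of that loop, assuming one word per cell as in the example above. `Series.isin` replaces the `== "I"` comparison so that a whole list is matched at once; the `Score` column name, the sample words, and `-1` for "no listed word" are assumptions:

```python
import pandas as pd

five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']
zero = ['alarm', 'no', 'none']

# Hypothetical sample with one word per cell
df = pd.DataFrame({'Words': ['kill', 'hostag', 'bacon', 'none']})

# Start with -1 (no listed word), then fill in one list at a time
df['Score'] = -1
for value, words in [(0, zero), (4, four), (5, five)]:
    df.loc[df['Words'].isin(words), 'Score'] = value

print(df['Score'].tolist())  # [5, 4, -1, 0]
```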

answered Aug 22 '20 at 14:56, edited Aug 22 '20 at 15:20

Thank you so much! But I would like to make a new column with a value 0-5 depending on which word is included. And it is not possible to write df['new_name_column'] = df.where(df == ['kidnap','hostag','taken','abduct'], 4) – Sofie Amalie Bodenhoff Aug 22 '20 at 15:04
I guess I could also make a loop, but it doesn't work. :/ – Sofie Amalie Bodenhoff Aug 22 '20 at 15:06
That is probably because I made a mistake: unlike NumPy's `where`, pandas' `where` replaces the values that do NOT satisfy the condition – Dr. Prof. Patrick Aug 22 '20 at 15:09
@SofieAmalieBodenhoff I fixed my answer and ran the code, it works :) – Dr. Prof. Patrick Aug 22 '20 at 15:21

I tried, but it still doesn't work. I have a column with multiple words; is that the problem? – Sofie Amalie Bodenhoff Aug 22 '20 at 15:32
@SofieAmalieBodenhoff - I think you are right, the answer is wrong, because it cannot work with lists. – jezrael Aug 22 '20 at 15:36
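When every cell holds a list, comparing the column with `==` compares whole cells, so one workaround is to flatten the lists first. A sketch using `Series.explode` (available since pandas 0.25), assuming a word-to-score dictionary built from the lists and `-1` for rows with no listed word; `score` and the sample data are hypothetical:

```python
import pandas as pd

five = ['kill', 'execute', 'dead']
four = ['kidnap', 'hostag', 'taken', 'abduct']
zero = ['alarm', 'no', 'none']
# Hypothetical word -> score lookup built from three of the lists
score = {**dict.fromkeys(zero, 0), **dict.fromkeys(four, 4), **dict.fromkeys(five, 5)}

df = pd.DataFrame({'outcome': [['kill', 'dead'], ['abduct', 'aaaa'], ['none'], []]})

# One row per word; the original row label is repeated for each word
words = df['outcome'].explode()

# Unknown words (and empty lists, which explode to NaN) map to NaN;
# the per-row maximum then skips NaN, and all-NaN rows become -1
df['score'] = words.map(score).groupby(level=0).max().fillna(-1).astype(int)
print(df['score'].tolist())  # [5, 4, 0, -1]
```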

score 0 · Answer 3 · answered Aug 22 '20 at 16:27

Thank you all for the help. I think I found a way to make it work:

list_of_words = zero + one + two + three + four + five

# Keep only the words that appear in one of the six lists
outcome_refined = df_Stop2['outcome'].apply(
    lambda x: [item for item in x if item in list_of_words])

outcome_numbered = []  # Create an empty list

def max_val(values):  # Ensures that we only keep the largest value
    maximum_value = 0
    for i in values:
        if i > maximum_value:
            maximum_value = i
    return maximum_value

# Loop through each row's list of words
for words in outcome_refined:
    tmp = []  # Create a temporary empty list
    for word in words:
        if word in zero:
            word = 0
        elif word in one:
            word = 1
        elif word in two:
            word = 2
        elif word in three:
            word = 3
        elif word in four:
            word = 4
        elif word in five:
            word = 5
        tmp.append(word)
    outcome_numbered.append(max_val(tmp))

df_Stop['outcome_numbered'] = outcome_numbered.copy()

df_Stop

It finally works!
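The nested loop above can also be written as a single `apply`, using a word-to-score dictionary and the built-in `max` with `default=0` in place of `max_val`. A sketch, assuming each cell of `outcome` holds a list of words; `score` and the sample data are hypothetical:

```python
import pandas as pd

zero = ['alarm', 'no', 'none']
one = ['stolen', 'damage', 'threaten', 'robber', 'destroy']
two = ['captur', 'hijack']
three = ['injur', 'wound', 'assault']
four = ['kidnap', 'hostag', 'taken', 'abduct']
five = ['kill', 'execute', 'dead']

# Hypothetical lookup: word -> index of the list it belongs to
score = {w: n for n, words in enumerate([zero, one, two, three, four, five]) for w in words}

df = pd.DataFrame({'outcome': [['hijack', 'kill'], ['stolen'], ['aaaa'], ['injur', 'captur']]})

# Highest score among the known words; 0 when none match, like max_val
df['outcome_numbered'] = df['outcome'].apply(
    lambda ws: max((score[w] for w in ws if w in score), default=0))
print(df['outcome_numbered'].tolist())  # [5, 1, 0, 3]
```

Note that, as in the loop version, a row with no listed word and a row containing only "zero" words both end up as 0.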
