Hi, I have a DataFrame with 7989 rows × 1 column. Each row describes the consequences of a maritime piracy attack.

I want to assign a value to each row depending on whether it contains a word from one of the lists below. The value assigned depends on which list the word belongs to.

The 6 lists:

five =['kill','execute','dead']
four =['kidnap','hostag','taken','abduct']
three =['injur','wound','assault']
two =['captur','hijack']
one =['stolen','damage','threaten','robber','destroy']
zero =['alarm','no','none']

I have tried to do it like this:

df['five']=df.apply(lambda x: '5' if x == 'five' else '-')

where df is my DataFrame.

Can anyone help?

dh762

3 Answers

1

You can create a dictionary for each list that maps every word to that list's value, merge all the dictionaries together, and then set the new columns with numpy.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'outcom':[['kill','dead'],['abduct','aaaa'],['hostag']]})

# one list of keywords per severity level
five = ['kill','execute','dead']
four = ['kidnap','hostag','taken','abduct']
three = ['injur','wound','assault']
two = ['captur','hijack']
one = ['stolen','damage','threaten','robber','destroy']
zero = ['alarm','no','none']

# one dict per list, mapping each keyword to its value
d5 = dict.fromkeys(five, '5')
d4 = dict.fromkeys(four, '4')
d3 = dict.fromkeys(three, '3')
d2 = dict.fromkeys(two, '2')
d1 = dict.fromkeys(one, '1')
d0 = dict.fromkeys(zero, '0')

# merge all dicts into a single keyword -> value lookup
d = {**d5, **d4, **d3, **d2, **d1, **d0}
print(d)

# add one column per keyword: its value where the row contains it, '-' otherwise
for k, v in d.items():
    df[k] = np.where(df['outcom'].apply(lambda x: k in x), v, '-')

# show only the columns for the first two lists to keep the output narrow
print(df[['outcom'] + five + four])
           outcom kill execute dead kidnap hostag taken abduct
0    [kill, dead]    5       -    5      -      -     -      -
1  [abduct, aaaa]    -       -    -      -      -     -      4
2        [hostag]    -       -    -      -      4     -      -
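
If a single value per row is wanted rather than one column per keyword, the same lookup idea can be reduced to the highest matching value. The following is only a sketch under some assumptions: the names levels, word_to_score, highest_score and the score column are made up for illustration, the scores are integers so they can be compared, and -1 is an arbitrary placeholder for rows that match no keyword.

```python
import pandas as pd

# Toy data in the same shape as above: one list of tokens per row.
df = pd.DataFrame({'outcom': [['kill', 'dead'], ['abduct', 'aaaa'], ['hostag'], ['aaaa']]})

# Keyword lists from the question, keyed by the value to assign.
levels = {
    5: ['kill', 'execute', 'dead'],
    4: ['kidnap', 'hostag', 'taken', 'abduct'],
    3: ['injur', 'wound', 'assault'],
    2: ['captur', 'hijack'],
    1: ['stolen', 'damage', 'threaten', 'robber', 'destroy'],
    0: ['alarm', 'no', 'none'],
}

# Flatten into a single word -> score lookup.
word_to_score = {word: score for score, words in levels.items() for word in words}

def highest_score(tokens, default=-1):
    """Return the largest score among the tokens, or `default` if none match."""
    return max((word_to_score[t] for t in tokens if t in word_to_score), default=default)

df['score'] = df['outcom'].apply(highest_score)
print(df)
```

This relies on exact membership, so it only works if the tokens in each row are already reduced to the same stems as the keywords.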
jezrael
0

Edited

You can use the DataFrame.loc indexer (see the pandas documentation) like so:

import pandas as pd

five = ["I", "like"]
df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])
     Words
0        I
1     like
2    bacon
3       in
4      the
5  morning

df["New"] = df["Words"].copy()
df.loc[df["New"] == "I", "New"] = 5

     Words      New
0        I        5
1     like     like
2    bacon    bacon
3       in       in
4      the      the
5  morning  morning

You can then use a for loop over your lists to assign the other values in the same way.
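
A rough sketch of what such a loop might look like, assuming the lists are collected in a dictionary keyed by the value to assign. The word_lists dictionary and its second entry are hypothetical, and -1 is just a placeholder for rows that match nothing:

```python
import pandas as pd

df = pd.DataFrame(["I", "like", "bacon", "in", "the", "morning"], columns=["Words"])

# Hypothetical lists keyed by the value to assign; only the first one
# corresponds to the `five` list in the example above.
word_lists = {5: ["I", "like"], 4: ["bacon"]}

# Start from a placeholder value, then overwrite the rows matching each list.
df["New"] = -1
for value, words in word_lists.items():
    df.loc[df["Words"].isin(words), "New"] = value

print(df)
```

Series.isin tests each row against the whole list at once, so only one pass per list is needed. If a word appeared in more than one list, the list processed last would win.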

Dr. Prof. Patrick
0

Thank you all for the help. I think I found a way to make it work:

list_of_words = zero + one + two + three + four + five

# Keep only the words in each row that appear in one of the six lists
outcome_refined = df_Stop2['outcome'].apply(
    lambda x: [item for item in x if item in list_of_words])

outcome_numbered = []  # Create an empty list

def max_val(values):  # Ensures that we only get the largest possible value
    maximum_value = 0
    for i in values:
        if i > maximum_value:
            maximum_value = i
    return [maximum_value]

# Make sure that you loop through each of the lists
for words in outcome_refined:
    tmp = []  # Create a temporary empty list
    for word in words:
        if word in zero:
            word = 0
        elif word in one:
            word = 1
        elif word in two:
            word = 2
        elif word in three:
            word = 3
        elif word in four:
            word = 4
        elif word in five:
            word = 5
        tmp.append(word)
    tmp = max_val(tmp)
    outcome_numbered.append(tmp)

# Assigning a list requires df_Stop to have as many rows as df_Stop2
df_Stop['outcome_numbered'] = outcome_numbered.copy()

df_Stop

Finally working
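
For anyone whose column holds raw text instead of token lists, a substring-based variant might look roughly like the sketch below. It is not what I used above: the example descriptions are invented, the score column name is arbitrary, and -1 is a placeholder for rows that match no keyword.

```python
import re
import pandas as pd

# Invented example descriptions; the real data is not shown in this thread.
df = pd.DataFrame({'outcome': [
    'Two crew members killed',
    'Ship hijacked and crew taken hostage',
    'Cash stolen from the safe',
    'Alarm raised, pirates fled',
    'Vessel continued its voyage',
]})

levels = {
    0: ['alarm', 'no', 'none'],
    1: ['stolen', 'damage', 'threaten', 'robber', 'destroy'],
    2: ['captur', 'hijack'],
    3: ['injur', 'wound', 'assault'],
    4: ['kidnap', 'hostag', 'taken', 'abduct'],
    5: ['kill', 'execute', 'dead'],
}

df['score'] = -1  # placeholder for rows without any keyword
# Go from the lowest to the highest value so that higher values overwrite lower ones.
for score in sorted(levels):
    pattern = '|'.join(re.escape(word) for word in levels[score])
    mask = df['outcome'].str.contains(pattern, case=False, regex=True)
    df.loc[mask, 'score'] = score

print(df)
```

Because this matches substrings, a short keyword such as 'no' also matches inside longer words like "unknown". Since it carries the lowest value, that only affects rows that match nothing else, but it is worth checking on real data.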