Assigning NumPy arrays into a DataFrame column inside for loops

Question

The 'Net Entries' and 'Net Exits' columns contain unreasonably high values as well as negative values. I am trying to fix them with the code below, but I keep encountering the error shown after it. Here is my code:

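import numpy as np
import pandas as pd

# D is the DataFrame being cleaned (its construction is not shown)
indexes = [*D.index.unique()]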
list_ = []

for index in indexes :
    
    df = D[D.index == index]
    
    array_ent = np.array(df['Net Entries'])
    array_ext = np.array(df['Net Exits'])
    
    avg_ent = np.mean(array_ent[(array_ent > 0) & (array_ent < 5040)])
    avg_ext = np.mean(array_ext[(array_ext > 0) & (array_ext < 5040)])
    
    array_ent[(array_ent < 0) | (array_ent > 5040)] = avg_ent
    array_ext[(array_ext < 0) | (array_ext > 5040)] = avg_ext
    
    df['x'] = array_ent
    df['y'] = array_ext
    
    list_.append(df)
    
MTA = pd.concat(list_, axis = 0)

D.head()

RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,

RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

Can anyone help me solve this problem?

Those are _warnings_, not errors. You seem to have some iterations where no values in `array_ent` or `array_ext` fulfill your conditions. — Pranav Hosangadi, Dec 26 '22 at 23:21
Side note: your loop looks suspicious. Are you trying to manually iterate over the multiindex instead of calling `groupby`? A typical example of transforming a dataframe with `groupby`: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#transformation. Replacing with the per-group mean can be done with `where` https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.where.html: `lambda x: x.where((x >= 0) & (x <= 5040), x.mean())` (`where` keeps the values for which the condition is true and replaces the rest). One-liners with `clip` in answers here: https://stackoverflow.com/q/47187359. — Lodinn, Dec 27 '22 at 03:46
Yes, this use of a lambda function can be very helpful. However, I am not trying to replace the values x < 0 or x > 5040 with x.mean(); I am trying to replace those values with the mean of the elements that lie between 0 and 5040. — OnurYukay, Dec 27 '22 at 14:26
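
To make the comments above concrete, here is a minimal sketch of a `groupby`-based version that fills out-of-range values with the mean of the in-range values of the same group. The sample frame, the `station` index name, and the helper `fill_with_valid_mean` are assumptions made for illustration; the real `D` is not shown in the question. The thresholds and the `x`/`y` column names come from the question's code.

```python
import pandas as pd

LOW, HIGH = 0, 5040  # bounds taken from the question


def fill_with_valid_mean(s: pd.Series) -> pd.Series:
    """Replace values below LOW or above HIGH with the mean of the values strictly inside."""
    inside = (s > LOW) & (s < HIGH)
    outside = (s < LOW) | (s > HIGH)
    # If no value lies inside the range, the mean is NaN,
    # so the out-of-range values of that group become NaN.
    fill = s[inside].mean()
    return s.mask(outside, fill)


# Hypothetical sample data with a repeated index, as in the question
D = pd.DataFrame(
    {
        "Net Entries": [100.0, -5.0, 300.0, 9000.0, 7000.0],
        "Net Exits": [50.0, 60.0, 99999.0, -1.0, 20.0],
    },
    index=pd.Index(["A", "A", "A", "B", "B"], name="station"),
)

grouped = D.groupby(level=0)
# transform returns one value per original row, in the original row order
x = grouped["Net Entries"].transform(fill_with_valid_mean)
y = grouped["Net Exits"].transform(fill_with_valid_mean)

# to_numpy() assigns by position, which sidesteps alignment on the repeated index
MTA = D.assign(x=x.to_numpy(), y=y.to_numpy())
print(MTA)
```

In this sample, group "B" has no 'Net Entries' value inside the range, which is the situation that produces the "Mean of empty slice" warning in the original loop. Here the result for that group is simply NaN, and it is up to you whether to leave it, drop it, or fall back to some other value.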

score 1 · Answer 1 · answered Dec 26 '22 at 23:30

You are looking for the `.clip()` method.

df['Net Entries'] = df['Net Entries'].clip(0, 5040)
df['Net Exits']   = df['Net Exits'].clip(0, 5040)

Once clipped, process those features as you wish: median, mean, whatever.
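
As a small illustration of what `clip` does, using made-up values and the bounds from the question: values outside the range are set to the nearest bound, not to a mean, so this may or may not match what the asker described in the comments.

```python
import pandas as pd

# Hypothetical values; 0 and 5040 are the bounds used in the question
s = pd.Series([-20, 100, 6000, 300], name="Net Entries")

# Values below 0 become 0, values above 5040 become 5040
clipped = s.clip(0, 5040)
print(clipped.tolist())  # [0, 100, 5040, 300]
```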

J_H
