There are unreasonably high values, as well as negative values, in the 'Net Entries' and 'Net Exits' columns. I am trying to fix them with the code below, but I keep getting the error shown after it. Here is my code:

indexes = [*D.index.unique()]
list_ = []

for index in indexes :
    
    df = D[D.index == index]
    
    array_ent = np.array(df['Net Entries'])
    array_ext = np.array(df['Net Exits'])
    
    avg_ent = np.mean(array_ent[(array_ent > 0) & (array_ent < 5040)])
    avg_ext = np.mean(array_ext[(array_ext > 0) & (array_ext < 5040)])
    
    array_ent[(array_ent < 0) | (array_ent > 5040)] = avg_ent
    array_ext[(array_ext < 0) | (array_ext > 5040)] = avg_ext
    
    df['x'] = array_ent
    df['y'] = array_ext
    
    list_.append(df)
    
MTA = pd.concat(list_, axis = 0)  

D.head()

RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)

Can anyone help me solve this problem?

Pranav Hosangadi
OnurYukay
  • Those are _warnings_, not errors. You seem to have some iterations where no values in `array_ent` or `array_ext` fulfill your conditions. – Pranav Hosangadi Dec 26 '22 at 23:21
  • Side note - your loop looks suspicious. Are you trying to manually iterate over the multiindex instead of calling `groupby`? A typical example of transforming a dataframe with `groupby`: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#transformation. Replacing with the per-group mean can be done by using `where` (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.where.html), which keeps values where the condition holds and replaces the rest: `lambda x: x.where((x >= 0) & (x <= 5040), x.mean())`. One-liners with `clip` in answers here: https://stackoverflow.com/q/47187359. – Lodinn Dec 27 '22 at 03:46
  • Yes, this use of a lambda function can be very helpful. However, I am not trying to replace the values where x < 0 or x > 5040 with x.mean(); I am trying to replace those values with the mean of the elements that are between 0 and 5040. – OnurYukay Dec 27 '22 at 14:26
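
The comments above can be combined into a rough sketch: group by the index with `groupby`, compute the mean of only the in-range values, and substitute it for the out-of-range ones. The sample data, the index name `station`, and the helper `replace_out_of_range` are assumptions made up for illustration; the bounds 0 and 5040 and the column names come from the question. Using `Series.mean()` instead of `np.mean` on a NumPy slice means a group with no in-range values yields `NaN` without the `RuntimeWarning`.

```python
import numpy as np
import pandas as pd

LOW, HIGH = 0, 5040  # bounds taken from the question


def replace_out_of_range(s: pd.Series) -> pd.Series:
    """Hypothetical helper: swap out-of-range values for the in-range mean."""
    # Mean of the values strictly inside the bounds, as in the question.
    # For a group with no such values this is NaN, and no warning is raised.
    in_range_mean = s[(s > LOW) & (s < HIGH)].mean()
    out_of_range = (s < LOW) | (s > HIGH)
    # mask() replaces the entries where the condition is True
    return s.mask(out_of_range, in_range_mean)


# Made-up sample data; group "C" has no in-range values at all
D = pd.DataFrame(
    {
        "Net Entries": [100.0, -5.0, 300.0, 9000.0, 400.0, -1.0],
        "Net Exits": [50.0, 70.0, 6000.0, 20.0, -2.0, 8000.0],
    },
    index=pd.Index(["A", "A", "A", "B", "B", "C"], name="station"),
)

cleaned = D.copy()
for col in ["Net Entries", "Net Exits"]:
    # transform() returns a Series aligned with the original rows
    cleaned[col] = D.groupby(level=0)[col].transform(replace_out_of_range)

print(cleaned)
```

Groups that end up as `NaN` (like "C" here) are the ones that triggered the warnings in the original loop; they can be handled afterwards, for example with `fillna`, depending on what makes sense for the data.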

1 Answer

You are looking for the `.clip()` method.

df['Net Entries'] = df['Net Entries'].clip(0, 5040)
df['Net Exits']   = df['Net Exits'].clip(0, 5040)

Once clipped, process those features as you wish: median, mean, whatever.
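
A small illustration of what `clip` does, using a made-up Series rather than the asker's data: values outside the bounds are set to the nearest bound, not to a mean. Any statistic computed afterwards therefore includes those boundary values, which may or may not be what is wanted.

```python
import pandas as pd

# Made-up values: one negative, two in range, one too large
s = pd.Series([-5, 100, 300, 9000], name="Net Entries")

clipped = s.clip(0, 5040)  # -5 becomes 0, 9000 becomes 5040
print(clipped.tolist())

# The mean after clipping includes the boundary values 0 and 5040
print(clipped.mean())

# For comparison: the mean of only the values strictly inside the bounds
print(s[(s > 0) & (s < 5040)].mean())
```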

J_H