I have a standard Pandas data frame:-

<class 'pandas.core.frame.DataFrame'>

The final column, called 'text', contains a text string on each row. I am trying to transform this column by applying a function that replaces each value with a new string.

However, whatever I do I seem to get the following warning:-

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py:1763: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(loc, value)

Assigning the result of my function over the column produces the above warning:-

X_train.loc[:,"text"] = X_train.loc[:,"text"].transform(lambda x : a_function(x))

This also produces the warning:-

X_train.loc[:,"text"] = X_train.loc[:,"text"].apply(lambda x : a_function(x))

Putting the right hand side of the equation into a variable I realise that there is no error from that part of the code (i.e. from the code on the right of the equals sign, X_train.loc[:,"text"].apply(lambda x : a_function(x))). So I know the issue must be connected to the way I am assigning over the top of the dataframe (i.e. the code on the left hand side of the equals sign, X_train.loc[:,"text"])

I have tried just assigning a text string over the top of the values:-

X_train.loc[:,'text'] = "a text string"

I have also tried assigning a new Pandas series object containing the new text strings:-

X_train.loc[:,'text'] = a_series

Both of these experiments confirm that the issue is to do with trying to assign over X_train.loc[:,'text'].

It seems particularly odd to me that the code appears to be in the .loc[row_indexer,col_indexer] = value format suggested by the warning.

I have researched the Pandas documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy. However, I am still unclear as to how to resolve my error though I realise that it must be to do with the danger of transforming a shallow copy of my data.

It is worth noting that when I inspect the dataframe after applying the transform, the original data has been amended. The fact that I still get a warning from Pandas concerns me, so I don't feel comfortable just switching the warnings off. I would rather write code that doesn't produce the warning, so that I know my code is robust to any new data thrown at it and won't suddenly start amending a shallow copy whilst leaving the original unchanged.

If I am asking too much and

JasonExcel
  • What does this get you? `X_train["text"] = X_train["text"].apply(lambda x : a_function(x))` I don't think you need to use `.loc` in this instance, as you are applying the lambda to every row. – Jonathan Leon Jul 09 '21 at 00:22
  • Thanks Jonathan. Unfortunately it still produces the same warning:- /usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. It was worth a try though, so thanks for the suggestion. – JasonExcel Jul 09 '21 at 19:51
  • Can you post more of your code? It's a little hard without seeing it all. Is X_train itself a subset or slice of a different dataframe? Or try `X_train1 = X_train.copy()`, then operate on X_train1. I use the syntax I provided all the time, so there's something before this that is causing it. – Jonathan Leon Jul 09 '21 at 21:59
  • Agreed with Jonathan. I believe you are likely doing something like `x_train=SomeOtherDataframe` before applying your function. – Jason Cook Jul 10 '21 at 15:20
  • Thank you Jonathan. You have solved my problem! `X_train = X_train.copy()` resolves the issue completely. You also pointed me to what was creating the warning; you were right, it was something earlier in the code. I had used the scikit-learn function `train_test_split()`; it turns out that this returns a subset that Pandas still treats as derived from the original dataframe rather than as an independent copy. Hence the warning from Pandas. – JasonExcel Jul 10 '21 at 16:34
  • Just seen your comment Jason - thank you for confirming too. This was indeed the root of the problem. These things always look simple enough once you know the answer but when still in confusion they can be maddening. It has cost me a _huge_ amount of pain getting this far so thank you both again for the help or I would doubtless still be languishing! – JasonExcel Jul 10 '21 at 16:41
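For anyone landing here later, the fix from the comments can be sketched roughly as below. This is only an illustration under assumptions: `df`, its contents, and the body of `a_function` are hypothetical stand-ins, and the `iloc` row selection merely imitates the kind of subset that `train_test_split()` hands back. It is not the original code from the question.

```python
import pandas as pd


def a_function(text):
    # Hypothetical stand-in for the transformation in the question.
    return text.upper()


# Hypothetical stand-in for the full dataset before it is split.
df = pd.DataFrame({"label": [0, 1, 0, 1], "text": ["a", "b", "c", "d"]})

# Selecting rows yields a subset that pandas may still associate with `df`;
# assigning into such a subset is what can trigger SettingWithCopyWarning.
X_train = df.iloc[[0, 2, 3]]

# Take an explicit, independent copy before modifying the subset.
X_train = X_train.copy()

# The assignment now targets a dataframe that owns its own data.
X_train["text"] = X_train["text"].apply(a_function)

print(X_train["text"].tolist())
print(df["text"].tolist())
```

After the explicit `.copy()`, the transformed values end up in `X_train` only and `df` keeps its original strings, which is the behaviour the question was trying to guarantee.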
