I have a standard Pandas data frame:-
<class 'pandas.core.frame.DataFrame'>
The final column, called 'text', contains a text string on each row. I am trying to transform the values in this column by applying a function that replaces each one with a new string.
However, whatever I do I seem to get the following warning:-
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py:1763: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  isetter(loc, value)
Assigning the result of my function back to the column produces the above warning:-
X_train.loc[:,"text"] = X_train.loc[:,"text"].transform(lambda x : a_function(x))
This also produces the warning:-
X_train.loc[:,"text"] = X_train.loc[:,"text"].apply(lambda x : a_function(x))
When I put the right-hand side of the assignment into a variable, that part of the code (i.e. X_train.loc[:,"text"].apply(lambda x : a_function(x))) runs without any warning. So I know the issue must be connected to the way I am assigning back into the dataframe (i.e. the code on the left-hand side of the equals sign, X_train.loc[:,"text"]).
I have tried just assigning a text string over the top of the values:-
X_train.loc[:,'text'] = "a text string"
I have also tried assigning a new Pandas Series object containing the new text strings:-
X_train.loc[:,'text'] = a_series
Both of these experiments confirm that the issue is to do with assigning to X_train.loc[:,'text'].
It seems particularly odd to me that the code appears to be in the .loc[row_indexer,col_indexer] = value format suggested by the warning.
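For reference, here is a minimal sketch of one way this situation can arise. It rests on an assumption: that X_train was created by selecting rows from a larger frame, which is not shown above. The names `df` and `label` and the body of `a_function` are hypothetical stand-ins. Under that assumption, pandas versions that emit SettingWithCopyWarning may flag X_train as derived from `df`, even when the assignment itself uses .loc. Taking an explicit copy when X_train is created makes it an independent frame.

```python
import pandas as pd


def a_function(s):
    # Hypothetical stand-in for the real text transformation.
    return s.upper()


# Hypothetical parent frame; the real data is not shown in the question.
df = pd.DataFrame({"label": [0, 1, 0, 1], "text": ["a", "b", "c", "d"]})

# Assumption: X_train was built by selecting rows from a larger frame.
# Without .copy(), pandas versions that emit SettingWithCopyWarning may
# warn on a later X_train.loc[:, "text"] = ... assignment.
# Taking an explicit copy makes X_train independent of df.
X_train = df[df["label"] == 0].copy()

# Same assignment pattern as in the question, now on an independent frame.
X_train.loc[:, "text"] = X_train["text"].apply(a_function)

print(X_train["text"].tolist())  # ['A', 'C']
print(df["text"].tolist())       # ['a', 'b', 'c', 'd'] (parent unchanged)
```

If X_train comes from somewhere else (for example a library's train/test splitting helper), the same idea would apply: take the .copy() at the point where X_train is first created, not at the point of assignment.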
I have read the Pandas documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy. However, I am still unclear as to how to resolve the warning, though I realise that it must be to do with the danger of transforming a shallow copy of my data.
It is worth noting that when I inspect the dataframe after applying the transform, the original data has been amended. The fact that I still get a warning from Pandas concerns me, so I don't feel comfortable just switching the warnings off. I would rather write code that doesn't produce the warning, so that I know my code is robust to any new data thrown at it and won't suddenly start amending a shallow copy whilst leaving the original unchanged.
If I am asking too much and