
Consider -

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df

   a
0  1
1  2
2  3

I'd like to do two things:

  1. Convert the dataframe to sparse with a default fill value of False
  2. Assign a column of all False values to this sparse dataframe


Here are two seemingly similar approaches I've come up with.

First method: assign the column, then convert the result to sparse.

df.assign(newcol=False).to_sparse(fill_value=False)

   a  newcol
0  1   False
1  2   False
2  3   False
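
Since the printed frame doesn't show dtypes, I've been sanity-checking this first result with the ordinary .dtypes attribute (nothing sparse-specific; output omitted since it may vary by pandas version):

# check what dtype newcol actually ended up with in the first method
df.assign(newcol=False).to_sparse(fill_value=False).dtypes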

Second method: first convert to sparse, then assign the column.

df.to_sparse(fill_value=False).assign(newcol=False)

   a  newcol
0  1     0.0
1  2     0.0
2  3     0.0

These 0.0s threw me off.
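
The same kind of check on the second result is consistent with what's printed above: newcol comes back as a float column rather than bool. I'm only using type() and .dtypes here, nothing beyond that (output omitted):

# compare what the second method produces
sdf = df.to_sparse(fill_value=False).assign(newcol=False)
print(type(sdf))    # confirm what kind of frame assign returned
print(sdf.dtypes)   # newcol is a float dtype here, matching the 0.0s shown above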

FWIW, this variant of the second method (converting first, then assigning an explicit SparseSeries) also seems to work properly, giving False instead of 0.0 -

df = df.to_sparse(fill_value=False)
df['newcol'] = pd.SparseSeries([False] * len(df), dtype='bool_', fill_value=False)
df

   a  newcol
0  1   False
1  2   False
2  3   False

I'm confused as to why two seemingly similar approaches produce such radically different results. What's the correct way to do this, and why do the outputs differ?
