Consider -
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df
   a
0  1
1  2
2  3
I'd like to do two things:
- Convert the dataframe to sparse with a default fill value of False
- Assign a column of all False values to this sparse dataframe
Here are two seemingly similar approaches I've come up with.
First method: assign the column, then convert the result to sparse.
df.assign(newcol=False).to_sparse(fill_value=False)
   a newcol
0  1  False
1  2  False
2  3  False
Second method: convert to sparse first, then assign the column.
df.to_sparse(fill_value=False).assign(newcol=False)
   a  newcol
0  1     0.0
1  2     0.0
2  3     0.0
These 0.0s threw me off.
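To narrow it down, a dtype check should show where the divergence happens (a minimal sketch, assuming the same pandas version as above, where to_sparse still exists):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Method 1: assign the bool column, then convert to sparse
m1 = df.assign(newcol=False).to_sparse(fill_value=False)

# Method 2: convert to sparse, then assign the bool column
m2 = df.to_sparse(fill_value=False).assign(newcol=False)

# newcol prints as False for m1 and as 0.0 for m2, so comparing the
# reported dtypes should show at which step the column is upcast to float
print(m1.dtypes)
print(m2.dtypes)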
FWIW, this variant of the second method (convert first, then assign) does seem to work properly, giving False instead of 0.0:
df = df.to_sparse(fill_value=False)
df['newcol'] = pd.SparseSeries([False] * len(df), dtype='bool_', fill_value=False)
df
   a newcol
0  1  False
1  2  False
2  3  False
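The dtype of the column assigned via SparseSeries can be inspected the same way (again just a sketch, under the same version assumptions):

# The output above shows False rather than 0.0, so this column's dtype
# should report as bool; the explicitly set fill value can be checked too,
# assuming the column comes back as a SparseSeries
print(df['newcol'].dtype)
print(df['newcol'].fill_value)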
I'm confused as to why two seemingly similar methods produce radically different outputs. What's the correct way to do this, and why do the results differ?