strange behaviour when using filter + if_else #474

@danielspringt

Description

Hi, the following example produces unexpected results:

import siuba as sb
from siuba import _, mutate, count, if_else
from siuba.data import penguins

print(f'initial rows:{penguins.shape[0]}')
dat = penguins >> sb.filter(_.island != "Torgersen")
print(f'rows after filtering:{dat.shape[0]}')

dat = dat >> mutate(
    binary_col=if_else(_.island == 'Biscoe', 1, 0)
)

dat_count = dat >> count(_.binary_col)
print(dat_count)

I use filter to drop some of the rows. When I then call mutate on the filtered DataFrame, the previously dropped rows
somehow still appear in the result.

I would expect a count output like:

   binary_col    n
0         0.0  110
1         1.0  130

but instead the dropped observations show up labeled NaN:

   binary_col    n
0         0.0  110
1         1.0  130
2         NaN   52

What am I doing wrong?
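
One possible explanation, offered as a guess rather than a confirmed diagnosis: the three counts sum to 292, which is the number of rows left after filtering if `penguins` has 344 rows (not shown in the output above), so the NaN rows may not be the dropped rows coming back. Instead this looks like pandas index alignment. Filtering keeps the original index labels, and if the new column arrives as a Series with a fresh 0..n-1 index, assignment matches by label and leaves NaN wherever a label is missing. The sketch below reproduces that pattern in plain pandas. The toy frame and the `flags` Series are hypothetical stand-ins, and it is an assumption that `if_else` behaves this way internally.

```python
import numpy as np
import pandas as pd

# Toy data standing in for penguins (hypothetical values).
df = pd.DataFrame({"island": ["Torgersen", "Biscoe", "Dream", "Biscoe"]})

# Boolean filtering keeps the original index labels: 1, 2, 3.
filtered = df[df["island"] != "Torgersen"]

# Assumed stand-in for a helper that returns a Series with a new
# RangeIndex (labels 0, 1, 2) instead of reusing the input's index.
flags = pd.Series(np.where(filtered["island"] == "Biscoe", 1, 0))

# Assignment aligns on index labels, not on position, so label 3
# finds no match and becomes NaN; the other values are shifted.
misaligned = filtered.assign(binary_col=flags)
print(misaligned)

# Possible workaround: reset the index after filtering so that
# labels and positions agree again.
clean = filtered.reset_index(drop=True)
fixed = clean.assign(binary_col=np.where(clean["island"] == "Biscoe", 1, 0))
print(fixed)
```

If this is indeed the cause, calling `.reset_index(drop=True)` on the filtered frame before the `mutate` step might avoid the NaN rows in the original example. Note that in the misaligned case the non-NaN values can also be attached to the wrong rows, so the 110/130 split above may not be trustworthy either.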
