Implement unnest() for LazyFrame #397
Yes, that is useful to have.
There is pl$same_outer_dt():
library(polars)
pl$set_options(do_not_repeat_call = TRUE)
dtypes_are_struct = \(dtypes) sapply(dtypes, \(dt) pl$same_outer_dt(dt, pl$Struct(pl$UInt8))) # or whatever inner type
test <- pl$DataFrame(iris[, 1:2])$
with_columns(pl$col("Sepal.Width")$to_struct())
test$dtypes |> dtypes_are_struct()
> FALSE TRUE
Diving into the error:
# We cannot currently make an empty Struct datatype with no inner Fields = (stringname, DataType).
# Although rare, it is a valid type, and once the bug is fixed this would happen:
pl$Struct()
> DataType: Struct(
[],
)
# In py-polars it is possible to create an empty Struct DataType, so we should be able to do that too.
pl.Struct()
> Struct
pl.Struct([pl.Int64]) # just to show how inner types are printed in py-polars
> Struct([Int64])
# but it is still not possible to create an empty struct in the same way
pl.struct()
> ... PanicException: index out of bounds: the len is 0 but the index is 0
# Note:
# struct, spelled with a lowercase s, is the "struct" as a lazy Expr or eager Series
# Struct, with a capital S, is the datatype of a struct
# similarly, List is the datatype of a list
# DataType is the class name of a polars datatype
# probably same error in R
pl$struct(list())
> polars Expr: thread '<unnamed>' panicked at 'index out of bounds: the len is 0 but the index is 0'
# or this error in R. As py-polars has not defined what pl.struct() should do, I guess any error
# should do for now
> pl$struct()
Error in pl$struct() : argument "exprs" is missing, with no default
In addition: Warning message:
In str(x) : restarting interrupted promise evaluation
Not sure I understood; it would not be meaningful to have a struct or a DataFrame with undefined names.
# in py-polars
pl.struct([1, 2], eager=True)
> DuplicateError: multiple fields with name 'literal' found
pl$struct(list(1, 2), eager = TRUE)
Error: Execution halted with the following contexts
0: In R: in pl$struct:
0: During function call [pl$struct(list(1, 2), eager = TRUE)]
1: Encountered the following error in Rust-Polars:
duplicate: multiple fields with name 'literal' found
Maybe you mean unnest(names = NULL)? What about it?
I don't quite follow what you are aiming for here. If you do not use alias the
Ohh, I read this page as an issue, not a PR, sorry @etiennebacher. Makes a lot more sense now :)
Thank you for the explanation @sorhawell, I'll wait for #398 to be fixed before completing this one.
@sorhawell I used this occasion to do the detection of struct columns on the R side rather than in Rust for both DataFrame and LazyFrame. I don't think we lose a lot of speed or memory, and the Rust code is cleaner.
I don't think so either, unless there are 10k columns, and even then probably less than 1s :)
This looks good to me 👨🎤
This works when we specify names but not when we don't pass anything, while it should unnest all Struct columns. The problem is that:
[...] Struct because it's a special DataType format compared to Float64, for example
[...] dtype or columns methods that themselves require an iter method.
@sorhawell could you take a look at the case when names = NULL? Is it possible to have a helper is_struct() on the R side?
Works fine when we specify names: