More support for lubridate
#53
If you come across this issue and want to help, take a look at https://tidypolars.etiennebacher.com/contributing#how-to-add-support-for-an-r-function-in-tidypolars
I second this! As someone new to polars, I've found {tidypolars} to be a very helpful tool. I came here to ask for more lubridate support, saw this issue, and wanted to give it a +1. I was trying to use floor_date and make_date with no luck. Also, there isn't a lot of documentation on time series with polars in R, so I came up with my own workaround for now. Not sure if anyone has recommendations, but based on the r-polars vignette and the Python user guide, this is the workflow that has worked best so far:
Initial test: making a date with paste0() is too slow.
Making a date with concat_str() and to_date() is way faster.
Note that this is on ~1 million rows of Citi Bike data.
Hi, thanks for your interest in As I said above, I don't have much time to dedicate to compatibility with For reference, here's a small reprex for your example, with a shorter syntax for method 2:
library(tidypolars)
library(polars)
library(dplyr, warn.conflicts = FALSE)
# 1M copies of the same timestamp, parsed from string into a Datetime column
foo <- pl$DataFrame(x = rep("2009-08-03 12:01:59", 1e6))$select(pl$col("x")$str$to_datetime())
# lubridate-style component extraction, translated to polars by tidypolars
foo2 <- foo |>
mutate(
hour = hour(x),
month = month(x),
year = year(x),
mday = mday(x)
)
# Method 1: build the date string with paste0() through tidypolars (slow)
system.time({
foo2 |>
mutate(
str_dt = paste0(year, "-", month, "-", mday)
)
})
#> user system elapsed
#> 1.18 0.05 1.25
# Method 2: native polars concat_str() + str$to_date() (fast)
system.time({
foo2$with_columns(
# eventually this should be replaced by `str_dt = pl$date("year", "month", "mday")`
str_dt = pl$concat_str("year", pl$lit("-"), "month", pl$lit("-"), "mday")$str$to_date("%Y-%m-%d")
)
})
#> user system elapsed
#> 0.09 0.01 0.11
pl$date() and pl$datetime() are now available in the development version of r-polars and will be included in polars 0.16.0. Here's an example with 50M observations:
library(polars)
test <- pl$DataFrame(
y = sample(2000:2019, 5*1e7, TRUE),
m = sample(1:12, 5*1e7, TRUE),
d = sample(1:31, 5*1e7, TRUE)
)
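### OLD
system.time({
test$with_columns(
# d is sampled from 1:31 for every month, so some rows are invalid dates (e.g. Feb 31);
# strict = FALSE returns null for those instead of raising an error
date = pl$concat_str("y", pl$lit("-"), "m", pl$lit("-"), "d")$str$to_date("%Y-%m-%d", strict = FALSE)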
)$print()
})
#> shape: (50_000_000, 4)
#> ┌──────┬─────┬─────┬────────────┐
#> │ y ┆ m ┆ d ┆ date │
#> │ --- ┆ --- ┆ --- ┆ --- │
#> │ i32 ┆ i32 ┆ i32 ┆ date │
#> ╞══════╪═════╪═════╪════════════╡
#> │ 2011 ┆ 10 ┆ 22 ┆ 2011-10-22 │
#> │ 2016 ┆ 6 ┆ 16 ┆ 2016-06-16 │
#> │ 2007 ┆ 4 ┆ 21 ┆ 2007-04-21 │
#> │ 2012 ┆ 2 ┆ 9 ┆ 2012-02-09 │
#> │ 2014 ┆ 11 ┆ 25 ┆ 2014-11-25 │
#> │ … ┆ … ┆ … ┆ … │
#> │ 2002 ┆ 3 ┆ 26 ┆ 2002-03-26 │
#> │ 2001 ┆ 1 ┆ 21 ┆ 2001-01-21 │
#> │ 2011 ┆ 12 ┆ 18 ┆ 2011-12-18 │
#> │ 2009 ┆ 9 ┆ 18 ┆ 2009-09-18 │
#> │ 2012 ┆ 5 ┆ 19 ┆ 2012-05-19 │
#> └──────┴─────┴─────┴────────────┘
#> user system elapsed
#> 4.76 0.82 5.66
### NEW
system.time({
test$with_columns(date = pl$date("y", "m", "d"))$print()
})
#> shape: (50_000_000, 4)
#> ┌──────┬─────┬─────┬────────────┐
#> │ y ┆ m ┆ d ┆ date │
#> │ --- ┆ --- ┆ --- ┆ --- │
#> │ i32 ┆ i32 ┆ i32 ┆ date │
#> ╞══════╪═════╪═════╪════════════╡
#> │ 2011 ┆ 10 ┆ 22 ┆ 2011-10-22 │
#> │ 2016 ┆ 6 ┆ 16 ┆ 2016-06-16 │
#> │ 2007 ┆ 4 ┆ 21 ┆ 2007-04-21 │
#> │ 2012 ┆ 2 ┆ 9 ┆ 2012-02-09 │
#> │ 2014 ┆ 11 ┆ 25 ┆ 2014-11-25 │
#> │ … ┆ … ┆ … ┆ … │
#> │ 2002 ┆ 3 ┆ 26 ┆ 2002-03-26 │
#> │ 2001 ┆ 1 ┆ 21 ┆ 2001-01-21 │
#> │ 2011 ┆ 12 ┆ 18 ┆ 2011-12-18 │
#> │ 2009 ┆ 9 ┆ 18 ┆ 2009-09-18 │
#> │ 2012 ┆ 5 ┆ 19 ┆ 2012-05-19 │
#> └──────┴─────┴─────┴────────────┘
#> user system elapsed
#> 2.64 0.41 3.06
system.time({
test$with_columns(date = pl$datetime("y", "m", "d"))$print()
})
#> shape: (50_000_000, 4)
#> ┌──────┬─────┬─────┬─────────────────────┐
#> │ y ┆ m ┆ d ┆ date │
#> │ --- ┆ --- ┆ --- ┆ --- │
#> │ i32 ┆ i32 ┆ i32 ┆ datetime[μs] │
#> ╞══════╪═════╪═════╪═════════════════════╡
#> │ 2011 ┆ 10 ┆ 22 ┆ 2011-10-22 00:00:00 │
#> │ 2016 ┆ 6 ┆ 16 ┆ 2016-06-16 00:00:00 │
#> │ 2007 ┆ 4 ┆ 21 ┆ 2007-04-21 00:00:00 │
#> │ 2012 ┆ 2 ┆ 9 ┆ 2012-02-09 00:00:00 │
#> │ 2014 ┆ 11 ┆ 25 ┆ 2014-11-25 00:00:00 │
#> │ … ┆ … ┆ … ┆ … │
#> │ 2002 ┆ 3 ┆ 26 ┆ 2002-03-26 00:00:00 │
#> │ 2001 ┆ 1 ┆ 21 ┆ 2001-01-21 00:00:00 │
#> │ 2011 ┆ 12 ┆ 18 ┆ 2011-12-18 00:00:00 │
#> │ 2009 ┆ 9 ┆ 18 ┆ 2009-09-18 00:00:00 │
#> │ 2012 ┆ 5 ┆ 19 ┆ 2012-05-19 00:00:00 │
#> └──────┴─────┴─────┴─────────────────────┘
#> user system elapsed
#> 2.25 0.53 2.78
First of all, thank you for your modification of method 2. Second, I had not thought about checking the development version. The new support for pl$date() and pl$datetime() is mainly what I was after, so this is great to hear! Lastly, I still give this ticket a +1 for more lubridate support, but I don't think I'm ready for a PR on it. The help you gave me is exactly what I need for now.
Even if you did, I only added it in
Once
@frankiethull I have added support for If you want to try to implement some
polars has tons of datetime functions (not all are supported in the R implementation for now), but I don't use lubridate enough to thoroughly test them (I don't have real workflows where I can check that they work as expected). Some help on this would be greatly appreciated. The way to add support for new functions is a bit convoluted (I should make that easier), but I'm happy to help if someone wants to take a shot.
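For anyone picking this up, here is a minimal sketch of how a few lubridate operations map onto polars datetime expressions, written against r-polars directly rather than through tidypolars. It assumes r-polars >= 0.16.0 (for pl$date()). The correspondences noted in the comments, in particular floor_date(x, "month") as $dt$truncate("1mo"), are assumptions for illustration, not how tidypolars actually translates these functions.

```r
library(polars)

# Toy data: integer year/month/day columns plus a datetime column
# parsed from strings (same approach as the reprex earlier in the thread)
df <- pl$DataFrame(
  y = c(2009L, 2016L),
  m = c(8L, 6L),
  d = c(3L, 16L),
  x = c("2009-08-03 12:01:59", "2016-06-16 08:30:00")
)$with_columns(pl$col("x")$str$to_datetime())

out <- df$with_columns(
  # Roughly lubridate::make_date(y, m, d); pl$date() needs r-polars >= 0.16.0
  date = pl$date("y", "m", "d"),
  # Assumed equivalent of lubridate::floor_date(x, "month"):
  # truncate each datetime down to midnight on the first day of its month
  month_start = pl$col("x")$dt$truncate("1mo"),
  # Component extraction, like lubridate::year() and lubridate::mday()
  year = pl$col("x")$dt$year(),
  mday = pl$col("x")$dt$day()
)
out$print()
```

Comparing results like these against the lubridate output on real data (month boundaries, time zones, invalid dates) is the kind of testing that would help validate each translation.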