
Commit

docs: trailing whitespaces
eitsupi committed Mar 11, 2024
1 parent 5ab94d9 commit d4caebe
Showing 3 changed files with 20 additions and 20 deletions.
12 changes: 6 additions & 6 deletions altdoc/reference_home.Rmd
@@ -26,11 +26,11 @@ to choose between eager and lazy evaluation, that require respectively a
for grouped data).

We can apply functions directly on a `DataFrame` or `LazyFrame`, such as `rename()`
or `drop()`. Most functions that can be applied to `DataFrame`s can also be used
on `LazyFrame`s, but some are specific to one or the other. For example:

* `$equals()` exists for `DataFrame` but not for `LazyFrame`;
* `$collect()` executes a lazy query, which means it can only be applied on
a `LazyFrame`.
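As a minimal sketch of the eager/lazy distinction (the tiny one-column frame here is purely illustrative):

```r
library(polars)

df = pl$DataFrame(x = 1:3) # eager: data lives in memory
lf = df$lazy()             # lazy: only a query plan so far

df$equals(df)  # available for DataFrame only
lf$collect()   # executes the plan and returns a DataFrame
```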

Another common data structure is the `Series`, which can be considered as the
@@ -89,7 +89,7 @@ test$group_by(pl$col("cyl"))$agg(
## Expressions

Expressions are the building blocks that give all the flexibility we need to
modify or create new columns.

Two important expression starters are `pl$col()` (names a column in the context)
and `pl$lit()` (wraps a literal value or vector/series in an Expr). Most other
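For instance, a minimal illustration of these two starters (the column name is made up for the example):

```r
library(polars)

df = pl$DataFrame(x = c(1, 2, 3))
df$with_columns(
  # pl$col("x") refers to the existing column "x";
  # pl$lit(10) wraps the literal value 10 in an Expr
  (pl$col("x") * pl$lit(10))$alias("x10")
)
```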
@@ -118,7 +118,7 @@ when it is applied on binary data or on string data.
To be able to distinguish those usages and to check the validity of a query,
`polars` stores methods in subnamespaces. For each datatype other than numeric
(floats and integers), there is a subnamespace containing the available methods:
`dt` (datetime), `list` (list), `str` (strings), `struct` (structs), `cat`
(categoricals) and `bin` (binary). As a side note, there is also a more exotic
subnamespace called `meta`, rarely needed, that is used to manipulate the expressions
themselves. Each subsection in the "Expressions" section lists all operations
@@ -148,7 +148,7 @@ df$with_columns(
)
```

Similarly, to convert a string column to uppercase, we use the `str` prefix
before using `to_uppercase()`:

```{r}
26 changes: 13 additions & 13 deletions vignettes/performance.Rmd
@@ -17,7 +17,7 @@ options(rmarkdown.html_vignette.check_title = FALSE)


As highlighted by the [DuckDB benchmarks](https://duckdblabs.github.io/db-benchmark/),
`polars` is very efficient at dealing with large datasets. Still, one can make `polars`
even faster by following some good practices.


@@ -100,7 +100,7 @@ will internally check whether it can be optimized, for example by reordering
some operations.

Let's re-use the example above but this time with `polars` syntax and 10M
observations. For the purpose of this vignette, we can create a `LazyFrame`
directly in our session, but if the data was stored in a CSV file for instance,
we would have to scan it first with `pl$scan_csv()`:
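As a hedged sketch of that scanning step (the file name and the filter value are hypothetical), `pl$scan_csv()` returns a `LazyFrame` without reading the whole file into memory:

```r
library(polars)

# "data.csv" is a placeholder path; scanning only builds a plan,
# nothing is read until $collect() is called
lf = pl$scan_csv("data.csv")
lf$filter(pl$col("country") == "France")$collect()
```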

@@ -140,7 +140,7 @@ lazy_query = lf_test$
lazy_query
```

However, this doesn't do anything to the data until we call `collect()` at the
end. We can now compare the two approaches (in the `lazy` timing, calling `collect()`
both reads the data and processes it, so we include the data loading part in the
`eager` timing as well):
@@ -165,11 +165,11 @@ bench::mark(


On this very simple query, using lazy execution instead of eager execution led
to a 1.7-2.2x decrease in execution time.

So what happened? Under the hood, `polars` reorganized the query so that it
filters rows while reading the CSV into memory, and then sorts the remaining
data. This can be seen by comparing the original query (`describe_plan()`) and
the optimized query (`describe_optimized_plan()`):

```{r}
@@ -179,7 +179,7 @@
lazy_query$describe_optimized_plan()
```


Note that the queries must be read from bottom to top, i.e. the optimized query
is "select the dataset where the column 'country' matches these values, then sort
the data by the values of 'country'".

@@ -188,13 +188,13 @@ the data by the values of 'country'.

`polars` comes with a large number of built-in, optimized, basic functions that
should cover most aspects of data wrangling. These functions are designed to be
very memory efficient. Therefore, using R functions or converting data back and
forth between `polars` and R is discouraged as it can lead to a large decrease in
efficiency.

Let's use the test data from the previous section and let's say that we only want
to check whether each country contains "na". This can be done in (at least) two
ways: with the built-in function `contains()` and with the base R function
`grepl()`. However, using the built-in function is much faster:

```r
@@ -207,7 +207,7 @@ bench::mark(
grepl("na", s)
})
),
grepl_nv = df_test$limit(1e6)$with_columns(
pl$col("country")$apply(\(str) {
grepl("na", str)
}, return_type = pl$Boolean)
@@ -221,12 +221,12 @@ bench::mark(
#> # A tibble: 3 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 contains 387.02ms 432.12ms 2.27 401.86KB 0
#> 2 grepl 2.06s 2.11s 0.466 114.79MB 0.512
#> 3 grepl_nv 6.42s 6.52s 0.153 7.65MB 10.3
```

Using custom R functions can be useful, but when possible, you should use the
functions provided by `polars`. See the Reference tab for a complete list of
functions.

@@ -236,7 +236,7 @@ functions.
Finally, quoting [Polars User Guide](https://pola-rs.github.io/polars-book/user-guide/concepts/streaming/):

> One additional benefit of the lazy API is that it allows queries to be executed
> in a streaming manner. Instead of processing the data all-at-once Polars can
> execute the query in batches allowing you to process datasets that are
> larger-than-memory.
2 changes: 1 addition & 1 deletion vignettes/polars.Rmd
@@ -319,7 +319,7 @@ column. See the section below for more details on data types.
## Reshape

Polars supports data reshaping, going both from long to wide (a.k.a. "pivoting",
or `pivot_wider()` in `tidyr`) and from wide to long (a.k.a. "unpivoting",
"melting", or `pivot_longer()` in `tidyr`).
Let's switch to the `Indometh` dataset to demonstrate some basic examples.
Note that the data are currently in long format.
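A small sketch of both directions, assuming the `$pivot()`/`$melt()` method names and their argument names (`values`, `index`, `columns`, `id_vars`), which may differ between `polars` versions:

```r
library(polars)

df = pl$DataFrame(Indometh)

# long -> wide: one concentration column per subject
wide = df$pivot(values = "conc", index = "time", columns = "Subject")

# wide -> long again
long = wide$melt(id_vars = "time")
```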
