-
Notifications
You must be signed in to change notification settings - Fork 3
/
other-gotchas.Rmd
96 lines (75 loc) · 2.83 KB
/
other-gotchas.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
# Other gotchas
This part is a collection of gotchas that don't fit anywhere else.
## Non-text headers e.g. dates
![](images/non-text-headers.png)
At the time of writing, readxl doesn't convert Excel dates to R dates when they
are in the header row.
Using tidyxl and unpivotr, you can choose to make a cell of any data type into a
tidy 'header', and you can reformat it as text before `spatter()` turns it into
the header of a data frame. Another way to format headers as part of the
`behead()` will be shown later.
```{r}
path <- system.file("extdata", "worked-examples.xlsx", package = "unpivotr")
xlsx_cells(path, sheet = "non-text headers") %>%
behead("left", name) %>%
behead("up", `academic-year`) %>%
mutate(`academic-year` = strftime(`academic-year`, "%Y")) %>%
select(row, data_type, `academic-year`, name, numeric) %>%
spatter(`academic-year`) %>%
select(-row)
```
When a single set of headers is of mixed data types, e.g. some character and
some date, `behead()` chooses the correct ones using the `data_type` column,
before converting them all to text via `format()`.
```{r}
xlsx_cells(path, sheet = "non-text headers") %>%
select(row, col, data_type, character, numeric, date) %>%
behead("up", header)
```
To format a header when a single set of headers are of mixed data types, you
can specify a function for each data type in the call to `behead()`.
```{r}
xlsx_cells(path, sheet = "non-text headers") %>%
select(row, col, data_type, character, numeric, date) %>%
behead("up", header, formatters = list(date = ~ strftime(.x, "%Y"),
character = toupper))
```
## Data embedded in comments
![](images/comments.png)
Comment strings are availabe in the `comment` column, just like `character`.
Comments can have formatting, but tidyxl doesn't yet import the formatting. If
you need this, please open an
[issue](https://github.com/nacnudus/tidyxl/issues). It would probably be
imported into a `comment_formatted` column, similarly to `character_formatted`.
```{r}
path <- system.file("extdata", "worked-examples.xlsx", package = "unpivotr")
xlsx_cells(path, sheet = "comments") %>%
select(row, col, data_type, character, numeric, comment) %>%
behead("up", "header")
```
Comments apply to single cells, so follow the same procedure as "Already a tidy
table but with meaningful formatting of single cells".
```{r}
cells <-
xlsx_cells(path, sheet = "comments") %>%
select(row, col, data_type, character, numeric, comment)
cells
values <-
cells %>%
select(-comment) %>%
behead("up", header) %>%
select(-col) %>%
spatter(header)
values
comments <-
cells %>%
behead("up", header) %>%
mutate(header = paste0(header, "_comment")) %>%
select(row, header, comment) %>%
spread(header, comment)
comments
left_join(values, comments, by = "row") %>%
select(-row)
```
## Named ranges
TODO