---
title: Performance
layout: default
---
# Performance
This chapter outlines general techniques for improving the performance of R code.
## Brainstorming
The most important step is to brainstorm as many alternative approaches as possible.
It's good to have a variety of approaches to call upon.

* Read blogs
* Algorithm/data structure courses (https://www.coursera.org/course/algs4partI)
* Books
* Read R code

We introduce a few at a high level in the Rcpp chapter.
## Caching
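`saveRDS()` and `readRDS()` save and restore a single R object; `save()` and `load()` do the same for one or more named objects. Either pair can be used to cache the results of slow computations on disk.
There are also packages dedicated to caching.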
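
As a minimal sketch of on-disk caching with `saveRDS()` and `readRDS()`: the helper `cached()` and the function `slow_square()` below are hypothetical names invented for illustration, not part of any package.

```R
# Hypothetical helper: return the cached value if the file exists,
# otherwise compute it, save it to disk, and return it
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, path)
  value
}

# Hypothetical slow computation
slow_square <- function() {
  Sys.sleep(0.1)
  (1:10)^2
}

cache_file <- tempfile(fileext = ".rds")
first <- cached(cache_file, slow_square)   # computes and writes the file
second <- cached(cache_file, slow_square)  # reads the saved result back
identical(first, second)
```

A real cache also needs a way to invalidate stale results, for example by deleting the file when the inputs change.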
### Memoisation
A special case of caching is memoisation.
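
Memoisation stores the result of a function call keyed by its arguments, so that repeated calls with the same arguments reuse the stored value. The following is a rough sketch for single-argument functions only; `memoise_simple()` is a hypothetical helper written for this example.

```R
# Hypothetical helper: wrap f so results are remembered per argument value.
# Assumes a single scalar argument that as.character() maps to a unique key.
memoise_simple <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# The recursive calls go through the memoised fib, so each value
# is computed only once
fib <- memoise_simple(function(n) {
  if (n < 2) return(n)
  fib(n - 1) + fib(n - 2)
})

fib(30)
```

Without memoisation this recursive definition makes an exponential number of calls; with it, each `n` is evaluated once.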
### Modifying in place vs. modifying a copy
```R
library(pryr)

x <- 1:5
address(x)
# Replacing an element with another integer keeps x an integer vector
x[2] <- 3L
address(x)

# Assigning a real number forces conversion of x to real (double)
x[2] <- 3
address(x)

# Modifying class or other attributes modifies in place
attr(x, "a") <- "a"
class(x) <- "b"
address(x)

# But once another name refers to the same object, modifying x
# creates a modified copy: it is no longer modified in place
y <- x
x[1] <- 2
address(x)
```
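
The same copy-on-modify behaviour can be observed with base R alone. This is a small sketch, assuming R was built with memory profiling; the `capabilities("profmem")` check guards against builds where `tracemem()` is unavailable.

```R
x <- c(1, 2, 3)
y <- x        # x and y now refer to the same underlying vector

# tracemem() reports when the object is copied
if (capabilities("profmem")) tracemem(x)

x[1] <- 10    # x is copied before being modified

if (capabilities("profmem")) untracemem(x)

# y still holds the original values: the copy protected it from the change
y
```

When tracing is available, R prints a `tracemem[...]` message at the assignment to `x[1]`, showing the old and new addresses.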
## Byte code compilation
R 2.13 introduced a new byte code compiler which can increase the speed of certain types of code 4-5 fold. This improvement is likely to get better in the future as the compiler implements more optimisations; this is an active area of research.
Using the compiler is an easy way to get speed-ups: it takes little effort to apply, so if it doesn't work well for your function, you haven't lost much.
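
A minimal sketch of compiling a function with `cmpfun()` from the `compiler` package. The function `sum_to()` is a hypothetical example, and the timings depend on the machine and R version; in recent versions of R the just-in-time compiler is enabled by default, so the difference may be small.

```R
library(compiler)

# Hypothetical example function: sum the integers 1..n with an explicit loop
sum_to <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Byte-compiled version of the same function
sum_to_cmp <- cmpfun(sum_to)

# Both versions must return the same answer
stopifnot(identical(sum_to(1000), sum_to_cmp(1000)))

# Compare timings (results vary)
system.time(for (k in 1:100) sum_to(10000))
system.time(for (k in 1:100) sum_to_cmp(10000))
```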
## Other people's code
One of the easiest ways to speed up your code is to find someone who's already done it! It's a good idea to search CRAN for existing packages.
For example, RcppGSL, RcppEigen and RcppArmadillo give access to the GSL, Eigen and Armadillo numerical libraries.
Stack Overflow can be a useful place to ask.
### Important vectorised functions
Not all base functions are fast, but many are. If you can find the one that best matches your problem, you may get big improvements.

* `cumsum`, `diff`
* `rowSums`, `colSums`, `rowMeans`, `colMeans`
* `rle`
* `match`
* `duplicated`
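
As a quick illustration of what the functions listed above do, here is a short sketch on small made-up inputs. Each call replaces what would otherwise be an explicit loop.

```R
cumsum(c(1, 2, 3, 4))            # running total: 1 3 6 10
diff(c(1, 4, 9, 16))             # successive differences: 3 5 7

m <- matrix(1:6, nrow = 2)       # columns are (1,2), (3,4), (5,6)
rowSums(m)                       # 9 12
colMeans(m)                      # 1.5 3.5 5.5

# Run-length encoding: lengths 2 1 3, values "a" "b" "c"
r <- rle(c("a", "a", "b", "c", "c", "c"))

# Position of each element in a lookup table (NA if absent): 2 NA
match(c("b", "z"), c("a", "b", "c"))

# Flags elements that repeat an earlier one: FALSE FALSE TRUE FALSE TRUE
duplicated(c(1, 2, 1, 3, 2))
```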
Read the source code: an implementation in C usually correlates with high performance.
## Rewrite in a lower-level language
C, C++ and Fortran are all easy to call from R. C++ is the easiest and the recommended choice, and is described in the following chapter.