-
Notifications
You must be signed in to change notification settings - Fork 0
/
200-fie-workshop.Rmd
577 lines (337 loc) · 16.8 KB
/
200-fie-workshop.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
# 2019 FIE Workshop {- #fie-workshop}
<!-- This file is included in the book only if it is listed in the _bookdown.yml file. -->
```{r include=FALSE}
library(knitr)
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE, collapse = FALSE)
opts_chunk$set(fig.width = 6, out.width = "70%", fig.align = "center", fig.asp = 0.618)
```
```{r}
htmltools::img(src =
knitr::include_graphics("images/midfield-grad-logo.png"),
alt = 'midfield logo',
style = 'position:absolute; top:50px; right:50px; padding:0px;',
width = 90)
```
Workshop conducted at the 2019 Frontiers in Education Conference (2019-10-16) in Cincinnati, OH.
The goal of the workshop is to share our data, methods, and metrics for intersectional research in student persistence. The workshop is designed for R beginners.
## Before you arrive {-}
If you are joining us for the first time, it is vital that you attempt to set up your computer with the necessary software in advance or it will be difficult to keep up.
- [Getting started](#getting-started) instructions to install R and RStudio
## Description {-}
The goal of the workshop is to make MIDFIELD more accessible to the FIE community via **midfieldr**. On completing the workshop, participants should be able to
- Describe key variables in the MIDFIELD data
- Select academic programs and populations to study
- Use R to compute and graph persistence metrics (e.g., graduation rate)
- Explain key features of effective data displays
- Continue using **midfieldr** to study additional persistence metrics
Participants should be sufficiently familiar with their operating systems to install software and navigate directories, but prior experience with R is not required.
## Agenda {-}
Workshop activities include active learning, discussion, and self-paced software tutorials. Links to the slides for each segment are listed in the table.
```{r}
suppressPackageStartupMessages(library(tidyverse))
df <- tribble(
~Min, ~Topic,
10, "Introductions [[slides](publish_slides/01-2019-FIE-slides-introduction.pdf)]",
5, "The data structure in MIDFIELD [[slides](publish_slides/03-2019-FIE-data-structure-in-midfield.pdf)]",
30, "Finding stories in the data [[slides](publish_slides/04-2019-FIE-slides-stories-in-the-data.pdf)]",
35, "Starting with R (tutorial) [[slides](publish_slides/05-2019-FIE-slides-starting-with-R.pdf)]",
15, "--- break ---",
20, "Elements of effective graphs [[slides](publish_slides/06-2019-FIE-slides-elements-effective-graphs.pdf)]",
5, "Introducing midfieldr [[slides](publish_slides/07-2019-FIE-slides-introduce-midfieldr.pdf)]",
50, "Starting with midfieldr (tutorial) [[slides](publish_slides/08-2019-FIE-slides-starting-with-midfieldr.pdf)]",
10, "Next steps [[slides](publish_slides/09-2019-FIE-slides-next-steps.pdf)]"
)
kable(df)
```
## Tutorial: Starting with R {-}
This is a self-paced tutorial.
- Don't worry about the pace of your work.
- Everyone works and learns new material at a different pace.
- Please ask questions of your neighbors as well as the facilitators
- If you finish early, ask if anyone near you needs assistance
- Save your work regularly
**1. Create an R project**
Open RStudio.
- File > New Project > New Directory > New Project
- Fill in the Directory Name text box with 2019-FIE-midfieldr-workshop
- Select a location on your computer to save the project
- Check the *Open in a new session* box
- Click Create Project
The new project directory will be all of these things:
- a directory or "folder" on your computer
- an RStudio Project
On your computer, if you navigate to the new project folder, you should have at least the following folders and files,
2019-FIE-midfieldr-workshop/
|-- 2019-FIE-midfieldr-workshop.Rproj
Always begin an R work session by opening the project's Rproj file. Let's practice:
- First, close all RStudio windows
- Navigate to the workshop folder you just created
- Open `2019-FIE-midfieldr-workshop.Rproj`
We work within an R project because it automatically sets project directory as the R *working directory*.
Set project options from the Tools pulldown menu
- Tools > Global Options
- *Restore RData into workspace at startup*: Uncheck this box
- *Save workspace to RData on exit*: Set to Never
```{r}
htmltools::img(src =
knitr::include_graphics("images/rstudio-workspace.png"),
alt = 'rstudio tools global options window',
width = 500)
```
Optional reading: To read all about R Projects, see the [RStudio support page](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects).
**2. Create an R script**
If you closed the project, then open `2019-FIE-midfieldr-workshop.Rproj`
- File > New File > R Script
- An Untitled script will open
- File > Save As
- Type in a file name `start-with-R.R`
- Save
Your project main directory should now have these files,
2019-FIE-midfieldr-workshop/
|-- start-with-R.R
|-- 2019-FIE-midfieldr-workshop.Rproj
In this workshop, all files we create will be saved in the main project folder, with no subdirectories. If you prefer to use subdirectories (folders) inside the project, that's OK,
In the R file (not the Console), type
library(tidyverse)
To run the script,
- Save
- Click the *Run* button to run a selected line or lines of code
- Click *Source* to run the entire script
```{r}
htmltools::img(src =
knitr::include_graphics("images/run-source.png"),
alt = 'showing the run and source buttons',
width = 600)
```
Notes
- If you get an error statement that the package does not exist, you probably have to install the package.
- When the script runs correctly, you should see set of messages like this, where in this work, a double hashtag (`##`) indicates R output.
```
## -- Attaching packages ------------------------------ tidyverse 1.2.1
## v ggplot2 3.0.0.9000 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts --------------------------------------- tidyverse_conflicts()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
```
Getting help: To see the help page for an R package or function, in your Console type a question mark followed by the object name, e.g.,
? tidyverse
? library
```{r}
htmltools::img(src =
knitr::include_graphics("images/library-help-page.png")
, alt = 'help page for the library function'
, width = 600
)
```
<br>
**3. Look at a tibble**
The tidyverse comes with several data sets. We will use the one called mpg.
The help page for the data set tells us its source package and explains the variables. In the Console, type
? mpg
```{r}
htmltools::img(src =
knitr::include_graphics("images/sample-help-page.png")
, alt = 'help page for the mpg dataset'
, width = 600
)
```
To get a glimpse of the data, just type its name in the script and run it.
- In an R script, anything after a single hashtag is treated as a comment.
- Type the comment in your script.
- Type `mpg` in your script and run it.
```{r echo = TRUE}
# mpg is a data frame included in the tidyverse package
mpg
```
The result tells us that the data has
- `r nrow(mpg)` observations (rows) and `r ncol(mpg)` variables (columns)
- variable types "`r typeof(mpg[[1]])`", "`r typeof(mpg[[3]])`", and "`r typeof(mpg[[4]])`"
*Data frames* are the most common way of storing data in R: a 2-dimensional data structure with columns of equal lengths. Columns can be of different types (numeric, character, logical, etc.) but all values in a column are the same type. Optional reading: [Data frames](http://adv-r.had.co.nz/Data-structures.html#data-frames).
In a *tidy* data frame, every column is a variable and every row is an observation. The `mpg` data is in tidy form.
The output above tells us that `mpg` is a *tibble*. Tibbles are data frames, but they tweak some older behaviors to make life a little easier. Optional reading: [Tibbles](http://r4ds.had.co.nz/tibbles.html).
<br>
**4. Assignment operator**
In R we assign values to objects with the assignment operator, `<-`. For example, suppose we want to assign the value $\pi$ to the object $x$. Type the following in your script and run it.
```{r echo = TRUE}
# an angle in radians
x <- pi
```
Type the object name `x` to see it's value. Type the following in your script and run it.
```{r echo = TRUE}
x
```
<br>
**5. Functions**
Functions are fundamental building blocks in R. Functions carry out specified tasks, typically operating on one or more arguments. For example, the cosine function, `cos()`, has one argument.
Type the following in your script and run it.
```{r echo = TRUE}
# above, x was assigned the value of pi radians
cos(x)
```
A function from the tidyverse that we will use regularly is `glimpse()`. It gives us a useful glimpse of a data frame, showing all the variables (column names) but written down the page, and data runs across the page. making it possible to see every column in a data frame.
Type the following in your script and run it.
```{r echo = TRUE}
glimpse(mpg)
```
The general form of a function is
```
function_name(arg1 = value, arg2 = value, ... )
```
<br>
**6. Graphs**
We use the `ggplot()` function from the tidyverse for creating graphs.
- Sometimes when clicking Source, a plot is not sent to the output. If that happens, select all the lines of code and press Run.
Type the following in your script and run it.
```{r echo = TRUE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point()
```
`ggplot()` arguments:
- `data` argument assigns the data frame
- `mapping` argument assigns the aesthetics `aes()` function
- `x` and `y` arguments assign variables from the data frame for graphing
- `+` adds a new layer to the graph
- `geom_point()` assigns points as the data markers
<br>
**7. Saving results to a file**
We can save any result to file, for example, to save the graph we just made as a PNG image, type the following in your script and run it.
ggsave("mileage.png")
with a filename in quotes, here `"mileage.png"`. To save a data frame as a CSV file, we use `write_csv()` (tidyverse). Type the following in your script and run it.
write_csv(mpg, "mpg_data.csv")
where `mpg` is the data frame to be written and `"mpg_data.csv"` is the filename. Add these lines to your script and run it.
If you examine your project folder you will find the PNG and CSV files your just made. Your directory should look like this now,
2019-FIE-midfieldr-workshop/
|-- start-with-R.R
|-- mpg_data.csv
|-- mileage.png
|-- 2019-FIE-midfieldr-workshop.Rproj
<br>
**8. Small multiple graphs**
We can use a 3rd variable from the data frame, the variable `class`, as a conditioning variable to create a set of small multiples (or *facets*). The new graph layer is created using the `facet_wrap()` function.
Type the following in your script and run it.
```{r echo = TRUE, fig.asp = 1, out.width = "100%"}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(vars(class))
```
Or, instead of class, we could condition using the number of cylinders in the engine. Type the following in your script and run it.
```{r echo = TRUE, fig.asp = 1, out.width = "100%"}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(vars(cyl))
```
<br>
**9. filter() and the pipe operator**
If we aren't interested in 5-cylinder cars, we omit them from the data frame with the `filter()` function (tidyverse). Filter is a row-operation, keeping all rows that meet the conditional statement and removing all other rows. Here we'll use the condition `cyl != 5`, that is, keep all rows for which the value of the `cyl` variable is not equal to 5 (`!` is the negation operator).
Type the following in your script and run it.
```{r echo=TRUE}
mpg_rev <- mpg %>%
filter(cyl != 5)
```
I've also introduced the pipe operator `%>%` here. After assigning `mpg` to a new object `mpg_rev`, the pipe allows us to operate on the result `mpg_rev` and filter its rows, leaving the original`mpg` data frame unaltered.
Thus, the pipe operator can be thought of as a "then" statement. Do this, "then" use the result and perform the next operation.
The same graph with the revised data frame has only three facets. The `nrow = 1` argument places the panels in one row.
Type the following in your script and run it.
```{r echo=TRUE, fig.asp = 5/12, out.width = "100%"}
ggplot(data = mpg_rev, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(vars(cyl), nrow = 1)
```
<br>
**10. Smooth curve**
Adding the `geom_smooth()` layer adds a loess curve (a local regression) to each panel.
Type the following in your script and run it.
```{r echo=TRUE, fig.asp = 5/12, out.width = "100%"}
ggplot(data = mpg_rev, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(vars(cyl), nrow = 1) +
geom_smooth()
```
If we want to try a linear regression , add the argument `method - "lm"`. Type the following in your script and run it.
```{r echo=TRUE, fig.asp = 5/12, out.width = "100%"}
ggplot(data = mpg_rev, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(vars(cyl), nrow = 1) +
geom_smooth(method = "lm")
```
The shaded area around the regression is the confidence interval. We can remove it using the argument `se = FALSE`. Type the following in your script and run it.
```{r echo=TRUE, fig.asp = 5/12, out.width = "100%"}
ggplot(data = mpg_rev, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(vars(cyl), nrow = 1) +
geom_smooth(method = "lm", se = FALSE)
```
<br>
**11. Labels**
We can change the axis labels and add a title (above the figure) and a caption (below the figure) using the `labs()` layer. Type `? mpg` in the Console to find out that the engine displacement is in liters.
Type the following in your script and run it.
```{r echo=TRUE, fig.asp = 6/12, out.width = "100%"}
ggplot(data = mpg_rev, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_wrap(vars(cyl), nrow = 1) +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Engine displacement (liters)",
y = "Highway mileage (mpg)",
title = "Fuel economy of popular cars, 1199--2008",
caption = "EPA http://fueleconomy.gov")
```
<br>
**12. Wrap-up**
This concludes our brief introduction to R. You learned to
- Create an R project
- Create an R script
- Save and run an R script
Regarding R syntax you should be able to
- Describe a data frame
- Describe the attributes of a tidy data frame
- Use the assignment operator
- Describe basic components of an R function
- Draw a simple scatterplot
- Create small multiples
- Add a smooth loess curve to a graph
- Add a linear regression to a graph
- Save a graph as a PNG file
- Save a data frame as a CSV file
**13. To learn more**
Navigate to the MIDFILED Institute site for three 50-minute tutorials that go into more depth on these topics:
- [R basics](https://midfieldr.github.io/workshops/r-basics.html)
- [Graph basics](https://midfieldr.github.io/workshops/graph-basics.html)
- [Data basics](https://midfieldr.github.io/workshops/data-basics.html)
## Tutorial: Starting with midfieldr {-}
This is a self-paced tutorial illustrating functions in the midfieldr package.
- Don't worry about the pace of your work.
- Everyone works and learns new material at a different pace.
- Please ask questions of your neighbors as well as the facilitators
- If you finish early, ask if anyone near you needs assistance
- Save your work regularly
**Create a new R script**
If you closed the project, then open `2019-FIE-midfieldr-workshop.Rproj`
- File > New File > R Script
- An Untitled script will open
- File > Save As
- Type in a new file name, for example, `start-with-midfieldr.R`
- Save
Your directory should look like this now,
2019-FIE-midfieldr-workshop/
|-- start-with-midfieldr.R
|-- start-with-R.R
|-- mpg_data.csv
|-- mileage.png
|-- 2019-FIE-midfieldr-workshop.Rproj
When you navigate to the tutorial website, you will
- Follow the instructions, adding lines of code to your new script
- Run the script after adding each line of code
- The URL is https://midfieldr.github.io/midfieldr/index.html
- Select the *Getting started* tab
- Scroll down to *Select programs to study*
```{r}
htmltools::img(src =
knitr::include_graphics("images/tutorial-screen-4.png")
, alt = 'the midfieldr getting started web page')
```
## Acknowledgements {-}
Our sincere thanks to our "R-bar" volunteers Hossein Ebrahiminejad and Hassan Al Yagoub, without whom the software tutorials could not have as smoothly as they did.
## References {-}