New to R?


Xena’s best friend by Vera is licensed under CC BY-NC-ND 2.0

Introduction

Researchers interested in exploring MIDFIELD data are often R novices, learning to use midfieldr at the same time they are learning to use R. There are many good online resources for learning R, so we do not attempt to reproduce that work here. However, we would like to offer some suggestions to help address some of the obstacles our new users have encountered in the past.

Keyboard shortcuts

If you are working in RStudio, you can view the full list of keyboard shortcuts by selecting Tools > Keyboard Shortcuts Help. The shortcuts we use regularly include

| Windows / Linux | Action | Mac OS |
|---|---|---|
| ctrl shift K | Compile R Markdown document | cmd shift K |
| ctrl L | Clear the RStudio Console | ctrl L |
| ctrl shift C | Comment/uncomment line(s) | cmd shift C |
| ctrl X, C, V | Cut, copy, paste | cmd X, C, V |
| ctrl F | Find in text | cmd F |
| ctrl I | Indent or re-indent lines | cmd I |
| alt - | Insert the assignment operator <- | option - |
| ctrl alt B | Run from beginning to line | cmd option B |
| ctrl alt E | Run from line to end | cmd option E |
| ctrl Enter | Run selected line(s) | cmd Return |
| ctrl S | Save | cmd S |
| ctrl A | Select all text | cmd A |
| ctrl Z | Undo | cmd Z |

Work in an RStudio Project

The first (highly opinionated) rule for MIDFIELD researchers:

Always work in an RStudio Project

  • A project can be any unit of work: a course, a workshop, a paper, a grant, practice R, etc.
  • Start every work session in an RStudio project.
  • If your last session was in an RStudio Project, launching RStudio will re-open that project.
  • Alternatively, launch a project by navigating to the project directory and opening the file with the .Rproj suffix.

RStudio is an integrated development environment (IDE) for R that includes a console, editor, and tools for plotting, history, debugging, and workspace management as well as access to GitHub for collaboration and version control [1]. If we provide IDE screenshots, they will be from the RStudio interface. As the folks at RStudio assert,

RStudio projects make it straightforward to divide your work into multiple contexts, each with their own working directory, workspace, history, and source documents.

The advantages of working in an RStudio Project are:

  • the working directory is set to the project directory, enabling the use of relative file paths in your scripts to support portability, reproducibility, and collaboration
  • active tabs are restored to where they were the last time the project was closed
  • multiple projects can be open at the same time

You can read more about RStudio Projects here.
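To see the first advantage for yourself, you can ask R where it thinks "here" is. This is a minimal sketch using only base R; the variable names `wd`, `rproj_files`, and `in_project_root` are our own, chosen for illustration.

```r
# In an RStudio Project, the working directory is the project directory.
wd <- getwd()
print(wd)

# Look for a project file (suffix .Rproj) in the working directory.
# The result is empty if R was not started from a project directory.
rproj_files <- list.files(wd, pattern = "\\.Rproj$")
in_project_root <- length(rproj_files) > 0
print(in_project_root)
```

If the second value is FALSE, relative paths in your scripts will probably not resolve the way you expect.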

Type once-only commands in the Console

The installation instructions for midfielddata and midfieldr start with

# install midfielddata first 
install.packages("midfielddata", 
                 repos = "https://MIDFIELDR.github.io/drat/", 
                 type = "source")

These lines can be typed in the RStudio console rather than an R script, because they generally only have to be run once. If you write them in a working script, you unnecessarily re-install the package every time you run the script.
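If you would still like a working script to remind you when a package is missing, one option is to check for it without installing anything. The sketch below assumes a helper we have named `is_missing()` (hypothetical, not part of midfieldr); it relies only on base R's `requireNamespace()`.

```r
# Hypothetical helper: TRUE when a package is not installed on this machine.
is_missing <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Remind the user instead of re-installing on every run of the script.
if (is_missing("midfielddata")) {
  message("midfielddata is not installed; run install.packages() once in the Console.")
}
```

The script stays cheap to re-run, and the installation itself remains a once-only Console command.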

Stay current

Running old software can be considerably harder than running new software. Get current at the start of a new project, but avoid updating if you are approaching a project deadline.

Navigate to Updating the R habitat for guidance in keeping R, RStudio, and R packages up-to-date.
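Before deciding whether to update, it can help to see what you are currently running. The following is a small sketch using base R functions; the package name stored in `pkg` is only an example, and the guard is there because `packageVersion()` raises an error for a package that is not installed.

```r
# Version of R in the current session
r_version <- getRversion()
print(r_version)

# Version of an installed package, checked only if the package is present
pkg <- "midfieldr"
if (requireNamespace(pkg, quietly = TRUE)) {
  print(packageVersion(pkg))
} else {
  message(pkg, " is not installed")
}
```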

Use relative paths

Explicitly link files using relative file paths within an RStudio Project. Paths are relative to the main directory of an RStudio Project. For example, you might have an R script in the scripts directory that reads a raw data file,

library("data.table") # provides fread() and fwrite()
DT <- fread("data-raw/2020-03-21-student-record-ver09.txt")

When the data have been cleaned up, you might save the results to the data directory, e.g.,

fwrite(DT, "data/2020-03-21-clean-student-record.txt")

Another script in the scripts directory might read this data,

DT <- fread("data/2020-03-21-clean-student-record.txt")

and after constructing a graph, write the graph to file in the figures directory,

# ggsave() is from ggplot2; it uses the file name as given, so include the extension
ggsave(filename = "fig-01-enrollees.png", 
       path = "figures/", 
       device = "png", 
       width = 4.5, 
       height = 3.5, 
       units = "in")
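The read, clean, write pattern above can be tried without any MIDFIELD data. The sketch below is a stand-in under several assumptions: it uses base R `read.csv()` and `write.csv()` in place of `fread()` and `fwrite()`, a temporary directory in place of a real project, and a made-up toy data frame. The file names are hypothetical.

```r
# Stand-in for a project directory (in practice, the folder holding the .Rproj file)
project_dir <- file.path(tempdir(), "project-name")
dir.create(file.path(project_dir, "data-raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "data"), showWarnings = FALSE)

# RStudio sets the working directory for you when a project opens;
# here we set it by hand and remember the old one so we can restore it.
old_wd <- setwd(project_dir)

# Toy raw data, written once to data-raw/
raw <- data.frame(id = 1:3, gpa = c(3.1, NA, 3.8))
write.csv(raw, "data-raw/toy-student-record.csv", row.names = FALSE)

# Read with a relative path, drop incomplete rows, save to data/
DT <- read.csv("data-raw/toy-student-record.csv")
clean <- DT[!is.na(DT$gpa), ]
write.csv(clean, "data/toy-clean-student-record.csv", row.names = FALSE)

# Restore the original working directory
setwd(old_wd)
```

Because every path is relative to the project directory, the same script should run unchanged on a collaborator's machine once they open the project.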

Plan your directory structure

At a minimum, a project might start with three directories,

project-name/
  |- data/ 
  |- documents/
  |- scripts/
  |- project-name.Rproj

data/
Data ready for analysis
File names carefully curated for reproducibility

documents/
Outlines, drafts, other text

scripts/
R scripts for analysis and creating graphs

Two additional directories are often useful:

data-raw/
Data in their original form, never edited manually

figures/
Graphs created by scripts are written to this directory, making them easy to find and sort through later
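You can create these directories by hand, but a few lines of base R will do it too. This is only a sketch: `create_project_dirs()` is a name we made up, and the example builds the folders under a temporary directory rather than in a real project.

```r
# Hypothetical helper: create the suggested directories under a project root.
create_project_dirs <- function(root,
                                dirs = c("data", "documents", "scripts",
                                         "data-raw", "figures")) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    # recursive = TRUE also creates the root if it does not exist yet
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Try it in a temporary location
root <- file.path(tempdir(), "project-name")
create_project_dirs(root)
print(list.dirs(root, full.names = FALSE, recursive = FALSE))
```

Existing directories are left untouched, so the helper is safe to re-run.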

Directories for larger projects

At the start of a project, you should carefully consider the number and types of files you will create over the life of the project and lay out a directory structure that helps you consistently organize your work.

For a longer-term project, the directory structure can become more detailed. Each publication directory could have its own sub-directories for documents, figures, manage, and scripts. For example:

project-name/
  |- 2018-conference-name/
      |- documents
      |- figures
      |- manage 
      |- scripts
  |- 2019-journal-name/
  |- 2020-presentation-name/
  |- admin/
  |- data/ 
  |- data-raw/
  |- resources/
  |- README.Rmd
  |- project-name.Rproj

yyyy-publication-name/
Everything that creates a specific publication, report, or presentation
Can include R scripts, reports, figures, and administrative materials relevant to the publication or report

admin/
Overall project management and correspondence
Proposals and RFPs
Contracts

manage/
Publication administrative materials
Conference registration
Correspondence with editors
Travel reservations, registration, etc.

resources/
Bibliography and CSL files accessed by any of the publications
Image downloads and screen-shots accessed by any of the publications

README.Rmd
Describes the project as a whole
Can facilitate collaboration if git and GitHub are being used
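When a new publication begins, its sub-directories can be set up in one step. The sketch below assumes a helper we have called `new_publication_dirs()` (hypothetical) and, for illustration, works under a temporary directory; it relies on `file.path()` being vectorized over the sub-directory names.

```r
# Hypothetical helper: create the per-publication sub-directories.
new_publication_dirs <- function(project_root, publication,
                                 subdirs = c("documents", "figures",
                                             "manage", "scripts")) {
  # One path per sub-directory, e.g. <root>/2018-conference-name/figures
  paths <- file.path(project_root, publication, subdirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Try it in a temporary location
root <- file.path(tempdir(), "larger-project")
paths <- new_publication_dirs(root, "2018-conference-name")
print(basename(paths))
```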

For different opinions on directory structure schemes, see

References

1. RStudio Team (2021) RStudio: Integrated development environment for R. http://www.rstudio.com/