Running list of R tips from super-advanced class

Cheat Sheets: https://rstudio.com/resources/cheatsheets/

tidyverse

here() : https://cran.r-project.org/web/packages/here/index.html
read_csv(): the way to read in a csv file tidily (as opposed to ‘read.csv’). Read in all data using “read_THING()”
unite(): paste several columns together into a single column
filter(): subset a dataframe
select(): select certain columns in a tibble
mutate(): create a new column as a function of existing columns
pivot_longer(): convert a tibble from wide to long format
pivot_wider(): convert a tibble from long to wide format
left_join(): take data set 1, and join matching values from data set 2 <– like bind/merge functions (but smarter)
right_join(): take data set 2, and join matching values from data set 1
inner_join(): join rows with matching values
full_join(): join the two data sets, keeping all rows and columns from both
readxl package: fancy way to read in excel data - https://readxl.tidyverse.org/
purrr::modify_if(is.factor, as.character) - turn factors into characters (!)
tidyr::expand_grid() - provides factorial combinations of things (!)
broom::augment() - add model results (fitted values, residuals, etc.) to the data
purrr::safely() = wrap functions with safely(fctn), and it returns a list with two objects: the result of the function, and any error messages
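
A minimal sketch of a few of the verbs above. The tibbles `sites` and `counts` and their columns are made up for illustration, not from the class:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy data for illustration only
sites  <- tibble(site = c("A", "B", "C"), elevation = c(10, 250, 900))
counts <- tibble(site = c("A", "B", "D"), y2018 = c(3, 5, 1), y2019 = c(4, 2, 6))

# wide -> long: one row per site-year combination
long <- pivot_longer(counts, cols = c(y2018, y2019),
                     names_to = "year", values_to = "n")

# long -> wide again recovers the original layout
wide <- pivot_wider(long, names_from = year, values_from = n)

# left_join keeps every row of `sites`; full_join keeps rows from both
left <- left_join(sites, counts, by = "site")
full <- full_join(sites, counts, by = "site")
nrow(left)  # 3 (sites A, B, C)
nrow(full)  # 4 (sites A, B, C, D)

# safely() wraps a function so errors are captured instead of thrown
safe_log <- safely(log)
ok  <- safe_log(10)    # ok$result holds log(10), ok$error is NULL
bad <- safe_log("a")   # bad$result is NULL, bad$error holds the error
```

Site C has no counts, so its `y2018` and `y2019` come back as NA after the left join.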

stringr package

lubridate package

years(): creates a period of N years (for date arithmetic); year(): extracts the year from a date
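
A quick sketch of the difference between the two, using an arbitrary example date:

```r
library(lubridate)

d <- ymd("2019-09-30")   # parse a date written as year-month-day

year(d)        # extracts the year component: 2019
d + years(1)   # adds a one-year period: "2020-09-30"
```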

broom

tidying up stats results!!!
tidy(): creates a tibble with results from stats functions (one row per model term) !!
augment(): adds per-observation stats results (fitted values, residuals, etc.) as columns on the dataframe <-- WOW!
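
A small sketch of both functions on a simple regression. The model (mpg on weight, from the built-in `mtcars`) is just an arbitrary example:

```r
library(broom)

fit <- lm(mpg ~ wt, data = mtcars)   # mtcars ships with R

coefs <- tidy(fit)      # one row per term: estimate, std.error, p.value, ...
obs   <- augment(fit)   # model data plus .fitted, .resid, and friends

coefs$term   # "(Intercept)" "wt"
nrow(obs)    # 32, one row per car
```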

janitor

clean_names(): cleans up column names by adding things like underscores
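
For instance, with a made-up data frame whose column names contain a space and a dot (a sketch, not from the class):

```r
library(janitor)

# Hypothetical messy column names
df <- data.frame(`First Name` = "Ada", `Sepal.Length` = 5.1, check.names = FALSE)

cleaned <- clean_names(df)
names(cleaned)   # "first_name" "sepal_length"
```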

ggplot

facet_wrap(~factor, scales = 'free_y')+: make multiple plots by a factor (e.g. same plot repeated by population); 'free_y' lets each plot's y axis adjust independently
facet_grid(site~factor)+: make multiple plots by two factors, laid out in a grid with one row per site and one column per level of factor
aes(col=factor, group=factor): try using group= in addition to color/shape etc. to ensure correct grouping.

Set a personalized theme for frequently used settings, so as not to repeat common customizations:

my_theme <- function(...) {
  theme(legend.title = element_blank(), #play around with turning these elements on and off one at a time
        plot.background = element_rect(), 
        panel.background = element_rect(fill = 'white'), #color background of plot
        panel.border = element_rect(fill = NA), #border of plot
        panel.grid = element_blank(),
        legend.key = element_blank(),
  ...)
}
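
As a rough usage sketch, a theme function like this is added to a plot like any other layer. The compact `my_theme()` below is a cut-down stand-in so the block runs on its own; the plot uses the built-in `iris` data:

```r
library(ggplot2)

# Cut-down stand-in for the theme function above
my_theme <- function(...) {
  theme(panel.background = element_rect(fill = "white"),
        panel.border = element_rect(fill = NA),
        panel.grid = element_blank(),
        ...)
}

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, col = Species, group = Species)) +
  geom_point() +
  facet_wrap(~Species, scales = "free_y") +   # one panel per species, own y axis
  my_theme(legend.position = "bottom")        # extra arguments pass through ...
```

Because of the `...`, one-off tweaks such as `legend.position` can be passed at call time without editing the function.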

Multipanel plots: patchwork library (https://github.com/thomasp85/patchwork) or the cowplot library.

Example of how to use cowplot for multipanel figures (this example in RMarkdown):

```{r cowplot, fig.width = 11}
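library(cowplot)            # provides plot_grid()
a <- ggplot(...)            # first panel
b <- ggplot(...)            # second panel
plot_grid(a, b, ncol = 2)   # place the two panels side by side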
```
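
For comparison, a sketch of the same idea with patchwork, which combines plots with `+`. The two plots are arbitrary examples built from `mtcars`:

```r
library(ggplot2)
library(patchwork)

a <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
b <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()

combined <- a + b + plot_layout(ncol = 2)   # two panels side by side
```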

Functions

Best practices:

Always end a function with a return()
If function produces a ton of information, you can end it with invisible()
formals() = arguments that you feed to the function
body() = meat of the function, the code
You can set default arguments (aka formals) when writing a function. When calling the function you can supply your own arguments, which override the defaults.
... - special argument that lets you pass additional arguments that are not named in the function definition.
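
A small sketch pulling these points together. `scale_by()` and `quiet_summary()` are hypothetical helpers written for this example:

```r
# Hypothetical helper: multiply x by a factor, then round
scale_by <- function(x, factor = 2, ...) {
  out <- round(x * factor, ...)   # ... is forwarded to round(), e.g. digits
  return(out)
}

names(formals(scale_by))   # "x" "factor" "..."
body(scale_by)             # the code inside the braces

scale_by(1.234)                          # default factor = 2, gives 2
scale_by(1.234, factor = 3, digits = 1)  # overrides the default, gives 3.7

# invisible() returns a value without printing it at the console
quiet_summary <- function(x) invisible(summary(x))
s <- quiet_summary(mtcars$mpg)   # nothing is printed, but s holds the summary
```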

Functional programming

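“pure” functions = always return the same output for the same input, with no side effects
“impure” functions = results depend on something outside the inputs (e.g. random numbers, files, the current time) or have side effects
Best practice is to group all your “impure” functions together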
Best practices - store each function in a separate file
!! (bang-bang) unquotes a variable captured with enquo(), so its contents are evaluated inside the call
the purrr package is good for lots of things that the apply functions do (see below)
In RStudio use “Code -> Insert Roxygen Skeleton” to autopopulate a roxygen header to describe a function
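
A minimal sketch of the pure/impure distinction, with a roxygen-style header on the pure function. Both functions are made up for illustration:

```r
#' Add tax to a price (pure: output depends only on the arguments)
#'
#' @param price Numeric price before tax.
#' @param rate Tax rate as a proportion.
#' @return The price including tax.
add_tax <- function(price, rate = 0.1) {
  return(price * (1 + rate))
}

# Impure: the result also depends on the random number generator state
noisy_price <- function(price) {
  return(price + rnorm(1))
}

identical(add_tax(100), add_tax(100))          # always TRUE
identical(noisy_price(100), noisy_price(100))  # almost surely FALSE
```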

Example of how to use a function to create a ggplot, where you can feed it different variables. More complicated than you think:

library(tidyverse)  # provides ggplot(), enquo() and %>%
foo <- function(data, histvar, fillvar) {
  histvar <- enquo(histvar) # captures what the user typed
  fillvar <- enquo(fillvar)
  data %>%
    ggplot(aes(!!histvar, fill = !!fillvar)) +
    geom_histogram()
}
foo(iris, Sepal.Length, Species)
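
For comparison, rlang 0.4.0 and later offer the `{{ }}` ("curly-curly") operator, which does the enquo() and !! steps in one go. A sketch of the same function written that way (`foo2` is just an illustrative name, and `bins = 30` is set only to silence the default-bins message):

```r
library(ggplot2)

foo2 <- function(data, histvar, fillvar) {
  # {{ }} captures the column the user typed and unquotes it inside aes()
  ggplot(data, aes({{ histvar }}, fill = {{ fillvar }})) +
    geom_histogram(bins = 30)
}
p <- foo2(iris, Sepal.Length, Species)
```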

Debugging

first run traceback(), which prints the sequence of calls that led to the error, so you can see where it happened
browser() = add this inside a function near the error, and it will walk you through how to find the error
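
A sketch of how the two fit together. `inner_fn()` and `outer_fn()` are hypothetical, and the `interactive()` guard is an addition here so the script still runs unattended:

```r
# Hypothetical functions: outer_fn() calls inner_fn(), which fails on text input
inner_fn <- function(x) {
  if (interactive()) browser()   # pauses here, in an interactive session only
  log(x)
}
outer_fn <- function(x) inner_fn(x) + 1

res <- outer_fn(10)   # fine: log(10) + 1

# outer_fn("a")   # errors inside log(); running traceback() afterwards
#                 # lists the calls from outer_fn() down to inner_fn()
```

Inside the browser, `n` runs the next line, `c` continues to the end, and `Q` quits.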

`purrr`

map = workhorse, basically the apply() function: map(“Lists to apply function to”,”Function to apply across lists”,”Additional parameters”).
map(mtcars, mean, na.rm = T) # <-- calculate the mean of mtcars columns, spits out a list
map_THING() = there are lots of map variations to define the output format
map_df(mtcars, mean, na.rm=T) # <-- calculate the mean of mtcars columns, spits out a dataframe (well, a tibble)
map2() = allows you to apply a function to pairs of elements from two lists. Example: map2_chr(c('one','two','red','blue'), c('fish'), paste) ## [1] "one fish" "two fish" "red fish" "blue fish"

A more explicit way of calling the paste with map2, using an anonymous function (~) -

    map2_chr(c('one','two','red','blue'), c('fish'), ~paste(.x, .y))
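
A short sketch of how the output type changes with the map variant, again on `mtcars`:

```r
library(purrr)

means_list <- map(mtcars, mean, na.rm = TRUE)       # list, one element per column
means_dbl  <- map_dbl(mtcars, mean, na.rm = TRUE)   # named numeric vector

is.list(means_list)     # TRUE
is.numeric(means_dbl)   # TRUE
names(means_dbl)[1:2]   # "mpg" "cyl"

# map2 walks two inputs in parallel; a length-1 input is recycled
fish <- map2_chr(c("one", "two"), "fish", ~ paste(.x, .y))
fish   # "one fish" "two fish"
```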

Wrangling and leveraging lists

listviewer::jsonedit(MYLIST) - great way to interact with lists
Tibbles let you store lists inside dataframe columns. (Handy for any time you want to organize a complex result (like a regression) in a dataframe.)

nest(). Here’s a really cool script that fits and returns an lm object of life expectancy on GDP per capita for each country:

  gapminder %>% 
    group_by(country) %>% 
    nest() %>% 
    mutate(foo_model = map(data, ~lm(lifeExp ~ gdpPercap, data = .x))) 

Then, if we want to clean up the model output names using broom::tidy(), then unnest lists to show lm statistical results in dataframe format:

    gapminder %>% 
      group_by(country) %>% 
      nest() %>% 
      mutate(foo_model = map(data, ~lm(lifeExp ~ gdpPercap, data = .x))) %>% 
      mutate(foo_coefs = map(foo_model, broom::tidy)) %>% 
      unnest(cols = foo_coefs)

Wrangling lists with `purrr`

Extract items from a list with map_chr('ELEMENT NAME'). Example: "got_chars" is a list containing information about Game of Thrones characters. So, to get the "name" element from the first 5 items in the list:

  got_chars[1:5] %>%
      map_chr('name')

Another example:

  thing <- list(list(y = 2, z = list(w = 'hello')),  # create nested list (russian doll)
                list(y = 2, z = list(w = 'world')))
  map_chr(thing, c("z","w"))  # drills into z, then w, in each element

Another example, where we apply the function [ to extract the names and allegiances of the first 5 characters in the Game of Thrones list:

    got_chars[1:5] %>%
      map(`[`, c('name', 'allegiances'))  

Putting things together, now suppose that we want to find all the Lannisters in the GoT list. The ~ in the map_lgl call is purrr shorthand for an anonymous function, with .x standing for the current element.

    names <- map_chr(got_chars, "name")
    is_lannister <- map_lgl(names, ~ stringr::str_detect(.x, "Lannister")) 
    got_chars %>% 
      purrr::set_names(names) %>% 
      purrr::keep(is_lannister) %>% 
      listviewer::jsonedit()
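
`got_chars` is not built into R (it ships with the repurrrsive package, as far as I know), so here is a sketch of the same pattern on a tiny hand-made list. `chars` and its contents are hypothetical, and base `grepl()` stands in for `stringr::str_detect()` to keep dependencies down:

```r
library(purrr)

# Hypothetical stand-in for got_chars
chars <- list(
  list(name = "Jaime Lannister",  allegiances = "House Lannister"),
  list(name = "Arya Stark",       allegiances = "House Stark"),
  list(name = "Tyrion Lannister", allegiances = "House Lannister")
)

char_names   <- map_chr(chars, "name")                          # pull out each name
is_lannister <- map_lgl(char_names, ~ grepl("Lannister", .x))   # TRUE/FALSE per name

lannisters <- chars %>%
  set_names(char_names) %>%   # label each element with its name
  keep(is_lannister)          # keep only the elements flagged TRUE

names(lannisters)   # "Jaime Lannister" "Tyrion Lannister"
```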

RMarkdown

RMarkdown cheatsheet
x~i~ for subscript
x^2^ for superscript
Writing with color: Roses are $\color{red}{\text{beautiful red}}$, violets are $\color{blue}{\text{lovely blue}}$.
Remember to use here() to read in / save files: here::here("data","mydata.csv")
Can include R code in-line in text!!! (Wow). Just wrap it in backticks with an "r " before the R object that was previously defined, for instance: `r my_amazing_result`
Equations: use $ for inline equations (e.g. $x = \beta^2$) and $$ for centered/new paragraph equation.
knitr::kable() function can be used to format tables. (e.g. below). Other packages for highly customized table formatting include: kableExtra, gt. Formatting regression tables = stargazer, xtable, huxtable/huxreg.
```
  knitr::kable(head(mtcars), caption = "My Table", format = "html")
```
RMarkdown has a table of contents view option:

The YAML at the beginning:

---
title: "My Brilliant Paper"
author: "Awesome Me"
date: "2019-11-03"
output: 
  pdf_document: default
  html_document:
    toc: yes
    toc_float: yes
---

Setup code chunk example to load things that won’t be included in knitted file

    {r setup, include=FALSE}              # <--- applies to just this chunk
    library(tidyverse)
    knitr::opts_chunk$set(echo = FALSE)   # <--- applies to all chunks (but can override in a chunk)

RMarkdown chunk setup options

cache=TRUE: Good for debugging formatting, or for re-running just a subset of chunks. It saves results from code chunks, and only re-runs a chunk if that chunk has changed. Don’t use for re-running the full analysis.
eval=FALSE: doesn’t evaluate that chunk
{python}: run code in python
echo = FALSE: Hide the actual code in each chunk (just show results)
warning = FALSE: Hide all warning messages
fig.dim = c(6,4): All plots are 6 inches wide and 4 inches high
fig.align = "center": And all plots centered
results="asis": need to use this when creating a table. Note: table formatting can be finicky, need to specify format="latex" for PDF output, and format="html" for HTML output.
error=TRUE: if an error is encountered in the code chunk, continue knitting the rest of the RMarkdown file. If error=FALSE, knitr stops when it encounters an error.

`bookdown` to write papers in RMarkdown

Use the bookdown package to write and knit to a bookdown document
Write entire papers
Use cross referencing to keep figures, tables, etc. dynamic (if a figure moves, that # and references change automatically)
Citation managing - export a .bib file from your citation manager, add that to the YAML. Also specify the citation style in the YAML using a .csl file, which can be found in this zotero repository. References are formatted this way: [@aristotle320BC;@cat2019]; in-line just use the @ symbol, for instance @cat2019 is the source.

Parallelization etc.

htop - not automatically on computer, similar to top in that it shows which programs are running, but with pretty graphics.
beepr package - creates sound when process is complete, can be customized.
doParallel package - run processes in parallel.
To run R script from terminal: Rscript [script name] (may have to add Rscript to path)
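
A minimal sketch of a parallel loop with doParallel. The two-worker cluster and the squaring task are arbitrary choices for illustration, and the beep at the end only fires in an interactive session with beepr installed:

```r
library(doParallel)   # also attaches foreach and parallel

cl <- makeCluster(2)      # two worker processes (adjust to your machine)
registerDoParallel(cl)    # tell foreach to use this cluster

# Each iteration runs on a worker; .combine = c glues results into a vector
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)           # always release the workers when done
squares                   # 1 4 9 16

# Optional: play a sound when finished
if (interactive() && requireNamespace("beepr", quietly = TRUE)) beepr::beep()
```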

Written on September 30, 2019