# Running list of R tips from super-advanced class

Cheat Sheets: https://rstudio.com/resources/cheatsheets/

## tidyverse

here() : https://cran.r-project.org/web/packages/here/index.html
read_csv(): the tidy way to read in a csv file (as opposed to base R's read.csv()). readr names its readers read_THING(), e.g. read_tsv() and read_delim()
unite(): paste several columns together into a single column
filter(): subset a dataframe
select(): select certain columns in a tibble
mutate(): create a new column as a function of existing columns
pivot_longer(): convert a tibble from wide to long format
pivot_wider(): convert a tibble from long to wide format
left_join(): take data set 1, and join matching values from data set 2 <-- like bind/merge functions (but smarter)
right_join(): take data set 2, and join matching values from data set 1
inner_join(): join rows with matching values
full_join(): keep all rows from both data sets, filling unmatched values with NA
readxl package: fancy way to read in excel data - https://readxl.tidyverse.org/
purrr::modify_if(is.factor, as.character) - turn factors into characters (!)
tidyr::expand_grid() - provides factorial combinations of things (!)
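broom::augment() - adds model statistics (fitted values, residuals, etc.) to the data
purrr::safely() - wrap a function with safely(fctn); the wrapped function returns a list with two elements: the result of the function (result), and any error that was raised (error)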
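
To make the verbs above concrete, here is a minimal sketch on made-up toy data (the tibbles `counts_wide` and `site_info` are hypothetical, invented only for illustration). It assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per site, one column per year (wide format)
counts_wide <- tibble(
  site   = c("A", "B"),
  `2018` = c(10, 20),
  `2019` = c(15, 25)
)

# Wide -> long: one row per site-year combination
counts_long <- counts_wide %>%
  pivot_longer(cols = -site, names_to = "year", values_to = "count")

# Long -> wide again (round trip)
counts_back <- counts_long %>%
  pivot_wider(names_from = year, values_from = count)

# filter rows, create a column, then keep only some columns
counts_2019 <- counts_long %>%
  filter(year == "2019") %>%
  mutate(count_doubled = count * 2) %>%
  select(site, count_doubled)

# Joins: site "C" exists only in the lookup table
site_info <- tibble(site = c("A", "C"), region = c("north", "south"))
left  <- left_join(counts_wide, site_info, by = "site")   # keeps sites A and B
inner <- inner_join(counts_wide, site_info, by = "site")  # keeps site A only
full  <- full_join(counts_wide, site_info, by = "site")   # keeps sites A, B and C

# unite() pastes columns together; expand_grid() builds all combinations
united <- unite(counts_long, "site_year", site, year, sep = "_")
grid   <- expand_grid(site = c("A", "B"), treatment = c("ctrl", "heat"))
```

Comparing `nrow(left)`, `nrow(inner)` and `nrow(full)` is a quick way to see how the joins differ.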

#### lubridate package

years(): creates a period of N years (e.g. to add to a date); year(): extracts the year from a date
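
A small sketch of the difference between the two functions, assuming lubridate is installed; the date is arbitrary.

```r
library(lubridate)

d <- ymd("2019-09-30")      # parse a date

this_year <- year(d)        # year(): extract the year component of a date
next_year <- d + years(1)   # years(): a period of one year, added to the date
```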

#### broom

tidying up stats results!!!
tidy(): Creates a tibble with results from stats functions !!
augment(): adds model statistics (fitted values, residuals, etc.) to the dataframe <-- WOW!
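
As a rough illustration of both functions, here is a sketch using a simple linear model on the built-in mtcars data (the model itself is an arbitrary choice for the example).

```r
library(broom)

# An arbitrary example model: fuel efficiency as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)

# tidy(): one row per model term, with estimate, std.error, statistic, p.value
coefs <- tidy(fit)

# augment(): the model data plus per-observation columns such as .fitted and .resid
with_fits <- augment(fit)
```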

#### janitor

clean_names(): standardizes column names (lower case, with underscores replacing spaces and punctuation)
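
A quick sketch of what this looks like in practice, assuming janitor is installed. The data frame `messy` and its column names are made up for the example.

```r
library(janitor)

# Hypothetical data frame with awkward column names
messy <- data.frame(
  `Site ID`      = 1:2,
  `Sepal.Length` = c(5.1, 4.9),
  check.names    = FALSE   # keep the awkward names as typed
)

cleaned <- clean_names(messy)
names(cleaned)   # snake_case versions of the original names
```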

#### ggplot

facet_wrap(~factor, scales = 'free_y')+: make multiple plots by a factor (e.g. same plot repeated by population); 'free_y' lets each plot have its own y axis
facet_grid(site~factor)+: make multiple plots by two factors, gridded out so that rows = site and columns = factor
aes(col=factor, group=factor): try using group= in addition to color/shape etc. to ensure correct grouping
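
Here is a minimal sketch of both faceting styles, using the mpg data set that ships with ggplot2. The choice of variables is arbitrary and only meant to show the syntax.

```r
library(ggplot2)

# One panel per vehicle class, each panel with its own y axis
p_wrap <- ggplot(mpg, aes(displ, hwy, colour = drv, group = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  facet_wrap(~class, scales = "free_y")

# Grid of panels: rows = drive type, columns = number of cylinders
p_grid <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
```

Printing `p_wrap` or `p_grid` draws the plots.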

Set a personalized theme for frequently used settings, so as not to repeat common customizations:

    my_theme <- function(...) {
      theme(legend.title = element_blank(), # play around with turning these elements on and off one at a time
            plot.background = element_rect(),
            panel.background = element_rect(fill = 'white'), # color background of plot
            panel.border = element_rect(fill = NA), # border of plot
            panel.grid = element_blank(),
            legend.key = element_blank(),
            ...)
    }


Multipanel plots: the patchwork library (https://github.com/thomasp85/patchwork) or the cowplot library.

Example of how to use cowplot for multipanel figures (this example in RMarkdown):

    # RMarkdown chunk header: {r cowplot, fig.width = 11}
    library(ggplot2)
    library(cowplot)
    a <- ggplot(...)
    b <- ggplot(...)
    plot_grid(a, b, ncol = 2)
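
For comparison, a minimal patchwork sketch of the same idea, assuming patchwork is installed. The two plots are arbitrary examples built from mtcars.

```r
library(ggplot2)
library(patchwork)

a <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
b <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()

side_by_side <- a + b   # patchwork: `+` places plots next to each other
stacked      <- a / b   # patchwork: `/` stacks plots vertically
```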



### Functions

Best practices:

• Always end a function with a return()
• If a function produces a ton of output, you can end it with invisible() so the result is returned without being printed
• formals() = arguments that you feed to the function
• body() = meat of the function, the code
• You can set default arguments (aka formals) when writing a function. When calling the function you can supply your own arguments, which override the defaults.
• ... - special argument that lets you pass along additional arguments that are not named in the function definition.
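
The points above can be sketched in a few lines of base R. The functions `rounded_mean()` and `quiet_double()` are hypothetical, written only to illustrate the ideas.

```r
# A default argument (digits) and `...` passed on to mean()
rounded_mean <- function(x, digits = 2, ...) {
  out <- round(mean(x, ...), digits)
  return(out)
}

arg_names <- names(formals(rounded_mean))   # the arguments: x, digits, ...
fn_body   <- body(rounded_mean)             # the code inside the function

with_default <- rounded_mean(c(1, 2, NA), na.rm = TRUE)   # na.rm travels through `...`
overridden   <- rounded_mean(c(1.234, 2.345), digits = 1) # overrides the default digits

# invisible(): the value is returned but not printed at the console
quiet_double <- function(x) {
  invisible(x * 2)
}
doubled <- quiet_double(21)
```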

### Functional programming

• "pure" functions = always return the same output for the same input, and have no side effects
• "impure" functions = results depend on something besides the input (e.g. the current time, random numbers, files on disk), or they have side effects such as printing or writing files
• Best practice is to group all your "impure" functions together
• Best practice - store each function in a separate file
• !! (bang-bang) evaluates the contents of the enquoted variable
• the purrr package is good for lots of the things that the apply functions do (see below)
• In RStudio use "Code -> Insert Roxygen Skeleton" to autopopulate a roxygen header to describe a function
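
A small sketch of the pure/impure distinction, with a roxygen-style header on the pure function. Both functions are hypothetical examples, not from the class.

```r
#' Convert Celsius to Fahrenheit
#'
#' @param celsius Numeric vector of temperatures in degrees Celsius.
#' @return Numeric vector of temperatures in degrees Fahrenheit.
c_to_f <- function(celsius) {
  # Pure: the output depends only on the input, and nothing else is touched
  return(celsius * 9 / 5 + 32)
}

log_reading <- function(celsius) {
  # Impure: prints a message (a side effect) that depends on the current time
  message("Reading logged at ", Sys.time())
  return(c_to_f(celsius))
}

boiling <- c_to_f(100)
logged  <- log_reading(0)
```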

#### Example of how to use a function to create a ggplot, where you can feed it different variables. More complicated than you think:

    library(ggplot2)
    library(dplyr) # for %>%

    foo <- function(data, histvar, fillvar) {
      histvar <- enquo(histvar) # captures what the user typed
      fillvar <- enquo(fillvar)
      data %>%
        ggplot(aes(!!histvar, fill = !!fillvar)) +
        geom_histogram()
    }
    foo(iris, Sepal.Length, Species)
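
An alternative that was not part of the class notes: recent versions of rlang and ggplot2 support the "embrace" operator `{{ }}`, which combines the enquo() and !! steps. A minimal sketch, assuming a reasonably current ggplot2 (the name `foo2` is made up here):

```r
library(ggplot2)

foo2 <- function(data, histvar, fillvar) {
  # {{ var }} captures what the user typed and unquotes it in one step
  ggplot(data, aes({{ histvar }}, fill = {{ fillvar }})) +
    geom_histogram(bins = 30)
}

p <- foo2(iris, Sepal.Length, Species)
```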


#### Debugging

• first run traceback(), which prints the sequence of function calls that led to the most recent error, so you can see where it happened
• browser() = add this inside a function just before the error; execution pauses there and you can step through the code line by line and inspect variables
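
A sketch of how the two tools fit together. The functions `double_it()` and `wrapper()` are hypothetical; the interactive calls are left commented out because they only make sense in a live session.

```r
double_it <- function(x) {
  # browser()   # uncomment to pause here and inspect `x` interactively
  if (!is.numeric(x)) stop("x must be numeric")
  return(x * 2)
}

wrapper <- function(x) {
  return(double_it(x))
}

# In an interactive session:
# wrapper("a")    # raises the error
# traceback()     # then shows that the error came from double_it(), called by wrapper()

# Non-interactive equivalent: capture the error message instead of stopping
msg <- tryCatch(wrapper("a"), error = function(e) conditionMessage(e))
```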

#### purrr

• map() = the workhorse, basically purrr's version of lapply(): map("list to apply the function to", "function to apply to each element", "additional arguments passed to the function").

    map(mtcars, mean, na.rm = TRUE) # <-- calculates the mean of each mtcars column, returned as a list

• map_THING() = there are lots of map variants that define the output format

    map_df(mtcars, mean, na.rm = TRUE) # <-- calculates the mean of each mtcars column, returned as a dataframe (well, a tibble)

• map2() = applies a function over two lists in parallel. Example:

    map2_chr(c('one','two','red','blue'), c('fish'), paste)
    ## [1] "one fish"  "two fish"  "red fish"  "blue fish"

A more explicit way of calling paste with map2, using an anonymous function (~):

    map2_chr(c('one','two','red','blue'), c('fish'), ~paste(.x, .y))
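
To see how the output type changes between variants, and how safely() (from the tidyverse list at the top) fits in, here is a minimal sketch assuming purrr is installed.

```r
library(purrr)

means_list <- map(mtcars, mean, na.rm = TRUE)      # map(): always returns a list
means_dbl  <- map_dbl(mtcars, mean, na.rm = TRUE)  # map_dbl(): returns a numeric vector

# map2_chr(): walk two inputs in parallel; the length-1 "fish" is recycled
fish <- map2_chr(c("one", "two"), "fish", ~ paste(.x, .y))

# safely(): wrap a function so errors are captured instead of stopping the run
safe_log <- safely(log)
ok  <- safe_log(10)    # ok$result holds the value, ok$error is NULL
bad <- safe_log("a")   # bad$result is NULL, bad$error holds the error
```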


#### Wrangling and leveraging lists

• listviewer::jsonedit(MYLIST) - great way to interact with lists
• Tibbles let you store lists inside dataframe columns. (Handy for any time you want to organize a complex result (like a regression) in a dataframe.)
• nest(). Here's a really cool script that fits a linear model of life expectancy against GDP per capita for each country and stores the lm objects in a list column:

    library(tidyverse) # dplyr, tidyr, purrr
    library(gapminder) # provides the gapminder data

    gapminder %>%
      group_by(country) %>%
      nest() %>%
      mutate(foo_model = map(data, ~lm(lifeExp ~ gdpPercap, data = .x)))


Then, to tidy the model output with broom::tidy() and unnest the resulting list column so the lm results appear in dataframe format:

    gapminder %>%
      group_by(country) %>%
      nest() %>%
      mutate(foo_model = map(data, ~lm(lifeExp ~ gdpPercap, data = .x))) %>%
      mutate(foo_coefs = map(foo_model, broom::tidy)) %>%
      unnest(cols = foo_coefs)
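
If gapminder is not installed, the same nest / map / tidy / unnest pattern can be tried on the built-in mtcars data. This is only a sketch; the grouping variable and model formula are arbitrary choices for the example.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

models <- mtcars %>%
  group_by(cyl) %>%                                    # one group per cylinder count
  nest() %>%                                           # each group's rows go into a list column `data`
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)), # one lm per group
         coefs = map(model, tidy)) %>%                 # one tidy coefficient table per model
  unnest(cols = coefs)                                 # flatten the coefficient tables into rows
```

The result has one row per group and model term (intercept and slope for each of the three cylinder groups).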


#### Wrangling lists with purrr

• Extract items from a list by name with map_chr('ELEMENT NAME'). For example, got_chars (from the repurrrsive package) is a list containing information about Game of Thrones characters. To get the "name" element from the first 5 items in the list:

    got_chars[1:5] %>%
      map_chr('name')

Another example:

    thing <- list(list(y = 2, z = list(w = 'hello')),  # create an embedded list (Russian doll)
                  list(y = 2, z = list(w = 'world')))
    map_chr(thing, c("z", "w"))


Another example, where we apply the function `[` to extract the names and allegiances of the first 5 characters in the Game of Thrones list:

    got_chars[1:5] %>%
      map(`[`, c('name', 'allegiances'))


Putting things together, suppose we want to find all the Lannisters in the GoT list. The ~ in the map_lgl() call tells R that what follows is an anonymous function, not the name of an existing object.

    names <- map_chr(got_chars, "name")
    is_lannister <- map_lgl(names, ~ stringr::str_detect(.x, "Lannister"))
    got_chars %>%
      purrr::set_names(names) %>%
      purrr::keep(is_lannister) %>%
      listviewer::jsonedit()
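
The same steps can be tried without the got_chars data. The sketch below uses a tiny hand-made list (`chars` is hypothetical and far simpler than the real data) and a predicate function inside keep() rather than a precomputed logical vector.

```r
library(purrr)

# Hypothetical stand-in for got_chars: a list of small named lists
chars <- list(
  list(name = "Tyrion Lannister", allegiances = "House Lannister", id = 1),
  list(name = "Jon Snow",         allegiances = "House Stark",     id = 2),
  list(name = "Cersei Lannister", allegiances = "House Lannister", id = 3)
)

char_names <- map_chr(chars, "name")                 # pull one element out by name
trimmed    <- map(chars, `[`, c("name", "allegiances"))  # keep two elements per character

# Name the list, then keep only elements whose name contains "Lannister"
lannisters <- chars %>%
  set_names(char_names) %>%
  keep(~ grepl("Lannister", .x$name))
```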


#### RMarkdown

• RMarkdown cheatsheet
• x~i~ for subscript
• x^2^ for superscript
• Writing with color: Roses are $\color{red}{\text{beautiful red}}$, violets are $\color{blue}{\text{lovely blue}}$.
• Remember to use here() to read in / save files: here::here("data","mydata.csv")
• Can include R code in-line in text!!! (Wow). Wrap the code in backticks and start it with "r " to call an R object that was previously defined, for instance: `r my_amazing_result`
• Equations: use $ for inline equations (e.g. $x = \beta^2$) and $$ for centered/new-paragraph equations.
• knitr::kable() can be used to format tables (example below). Other packages for highly customized table formatting include kableExtra and gt. For formatting regression tables: stargazer, xtable, huxtable/huxreg.

    knitr::kable(head(mtcars), caption = "My Table", format = "html")

• RMarkdown has a table of contents option, set in the YAML at the beginning:

    ---
    title: "My Brilliant Paper"
    author: "Awesome Me"
    date: "2019-11-03"
    output:
      pdf_document: default
      html_document:
        toc: yes
        toc_float: yes
    ---

• Setup code chunk example to load things that won't be shown in the knitted file:

    # chunk header: {r setup, include=FALSE}  <--- applies to just this chunk
    library(tidyverse)
    knitr::opts_chunk$set(echo = FALSE) # <--- applies to all chunks (but can be overridden in a chunk)

##### RMarkdown chunk setup options

cache=TRUE: Good for debugging formatting, or for re-running only a subset of chunks. It saves the results of each code chunk and only re-runs a chunk if its code has changed. Don't rely on it when re-running the full analysis.
eval=FALSE: doesn't evaluate that chunk
{python}: run the chunk's code in Python
echo = FALSE: Hide the actual code in each chunk (just show results)
warning = FALSE: Hide all warning messages
fig.dim = c(6,4): All plots are 6 inches wide and 4 inches high
fig.align = "center": And all plots centered
results="asis": use this when a chunk prints raw table markup (e.g. from stargazer or xtable). Note: table formatting can be finicky; specify format="latex" for PDF output and format="html" for HTML output.
error=TRUE: if an error is encountered in the code chunk, keep knitting the rest of the RMarkdown file. If error=FALSE, knitr stops when it encounters an error.
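
Several of these options can be set once for the whole document from the setup chunk. A minimal sketch, assuming knitr is installed; the particular values are just the ones listed above.

```r
library(knitr)

# Set document-wide defaults; any single chunk can still override them in its header
opts_chunk$set(
  echo      = FALSE,     # hide code, show results
  warning   = FALSE,     # hide warnings
  fig.dim   = c(6, 4),   # width and height in inches
  fig.align = "center"
)

current_echo <- opts_chunk$get("echo")   # check what is currently set
```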

### bookdown to write papers in RMarkdown

• Use the bookdown package to write and knit to a bookdown document
• Write entire papers
• Use cross referencing to keep figures, tables, etc. dynamic (if a figure moves, its number and all references to it change automatically)
• Citation managing - export a .bib file from your citation manager and add it to the YAML. Also specify the citation style in the YAML using a .csl file, which can be found in the Zotero style repository. References are formatted this way: [@aristotle320BC; @cat2019]; in-line just use the @ symbol, for instance @cat2019 is the source.

### Parallelizing etc.

• htop - not installed by default; similar to top in that it shows which programs are running, but with pretty graphics.
• beepr package - plays a sound when a process is complete, can be customized.
• doParallel package - run processes in parallel.
• To run an R script from the terminal: Rscript [script name] (you may have to add Rscript to your PATH)
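
A minimal sketch of running a loop in parallel with doParallel (which loads foreach and parallel). The two-worker cluster and the toy computation are arbitrary choices for the example.

```r
library(doParallel)

cl <- makeCluster(2)     # start two worker processes
registerDoParallel(cl)   # tell foreach to use them

# Each iteration runs on a worker; .combine = c collects the results into a vector
roots <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

stopCluster(cl)          # always shut the workers down when finished
```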
Written on September 30, 2019