class: center, middle, inverse, title-slide

# Best practices for using and teaching R
### Amelia McNamara
### 9/2/2021

---

## Best practices

- Best practices for coding (for us, and to teach students)
- Best practices for teaching
- Biggest takeaways:
  - Be consistent
  - Consider cognitive load

---

## Style guides

- [Google](https://google.github.io/styleguide/Rguide.html)
- [Tidyverse](https://style.tidyverse.org)

---

## Naming conventions

Pick one and stick to it. Two common conventions are `CamelCase` and `snake_case`. Don't use dots in names (e.g. `object.case`).

(I'm not great at this one, but I try to use `snake_case`.)

---

## Assignment operators

Pick one and stick to it. The most common are `<-` and `=`.

I try to use `<-` for assignment, and only use `=` for specifying function arguments.

```r
m1 <- lm(mpg ~ cyl, data = mtcars)
```

---

## Whitespace

Pick a system and stick to it. I recommend following the spacing suggestions in the tidyverse style guide.
Put a space after a comma, never before.

```r
# Good
x[, 1]

# Bad
x[,1]
x[ ,1]
x[ , 1]
```

---

## Whitespace

Pick a system and stick to it. I recommend following the spacing suggestions in the tidyverse style guide.
Don't put spaces inside the parentheses of a function call, or between the function name and the opening parenthesis.

```r
# Good
mean(x, na.rm = TRUE)

# Bad
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )
```

---

## Whitespace

Pick a system and stick to it. I recommend following the spacing suggestions in the tidyverse style guide.
Put spaces around assignment operators and other "infix operators" (such as `=`, `+`, and `*`).

```r
# Good
height <- (feet * 12) + inches
mean(x, na.rm = TRUE)

# Bad
height<-feet*12+inches
mean(x, na.rm=TRUE)
```

---

## Whitespace

Pick a system and stick to it. I recommend following the spacing suggestions in the tidyverse style guide.
Put spaces around pipes, and a line break after each pipe.

```r
# Good
iris %>%
  group_by(Species) %>%
  summarize_if(is.numeric, mean) %>%
  ungroup() %>%
  gather(measure, value, -Species) %>%
  arrange(value)

# Bad
iris %>% group_by(Species) %>% summarize_all(mean) %>%
ungroup %>% gather(measure, value, -Species) %>%
arrange(value)
```

---

## Bonus tip: styler

The package `styler` will automatically style selections or entire files for you! In particular, it adds whitespace and makes assignment operators consistent.

I run the `styler` add-in on every document before I hand it to students.

---

## Miscellaneous best practices

- Don't end lines with semicolons; R doesn't need them
- Use `library()` rather than `require()` ([why?](https://yihui.org/en/2014/07/library-vs-require/))
- Give yourself an understandable file structure, e.g. a `data/` folder ([why? and how?](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745))
- Aim for reproducibility (R Markdown or .R files)
- When you close R, don't save your workspace ([why? and how?](https://rstats.wtf/save-source.html#always-start-r-with-a-blank-slate))
- Don't use `rm(list = ls())` ([why?](https://rstats.wtf/save-source.html#rm-list-ls))
- Use projects and relative paths instead of `setwd()` ([why?](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/))
- Don't use `attach()`; instead use `with()`, `$`, or tidyverse syntax ([why?](https://search.r-project.org/R/refmans/base/html/attach.html))

---

## Syntax

Pick one and stick to it. The three major syntaxes are:

- base
- formula (interface)
- tidyverse

See [my cheatsheet](https://osf.io/2k8fw/) for more examples, or [the labs I taught fully in formula and fully in tidyverse](https://www.amelia.mn/STAT220labs).

---

## Use RStudio!

Even if you aren't going to use packages from the tidyverse, RStudio is a great Integrated Development Environment (IDE).

One simple thing it does is make it impossible to lose the Plots pane.
It also makes it easier to look at help while you work, shows what's in your Environment, and much more!

---

## RStudio Cloud

RStudio Cloud is a cloud version of RStudio that lets you set up packages and files for students ahead of time. It now costs money, but could be worth it.

---

## File types

Pick one and stick to it. The most common are .R files and .Rmd files.

I personally use R Markdown for everything, because it lets you mix text and code, and it has nice output formats. But .R is fine, too.

Another option is to skip files entirely and do everything in the Console. But students still need to take notes somehow!

---

## Cognitive load

Especially when introducing R, try to reduce cognitive load as much as possible.

- Offer scaffolding
  - documents with some pre-filled material
  - a 'cheatsheet' with all the code they will use (see [mine](https://www.amelia.mn/STAT220labs))
- "Let them eat cake first" (via [Data Science in a Box](https://datasciencebox.org/design-principles.html#start-with-cake))
  - start with the good stuff. This is often visualization, but could be some other task.
- If at all possible, **don't** start by having them install R locally

---

## Cognitive load

- Make sure every function you introduce comes back later.
- Reduce the total number of functions you show students
  - In my formula labs, I showed 37 functions
  - In my tidyverse labs, I showed 50 functions
  - The two sets of labs had 18 functions in common
- These numbers might seem high (they do to me!), but I recommend auditing your own materials to see how many you show. The function `getParseData()` (in the `utils` package) will parse R code so you can list the functions it calls.
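---

## Counting the functions you show

Here is a minimal sketch of how `getParseData()` could be used to do this count. It assumes the lab code is available as a character vector. `count_functions()` and the two example snippets are hypothetical and were made up for illustration; only `parse()`, `getParseData()`, and the set functions come from base R.

```r
# A minimal sketch: list the distinct functions called in some R code.
# count_functions() is a hypothetical helper written for this example.
count_functions <- function(code) {
  # keep.source = TRUE is required, otherwise getParseData() has no data
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  # Names of called functions carry the token "SYMBOL_FUNCTION_CALL"
  calls <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
  sort(unique(calls))
}

# Two made-up snippets standing in for a formula lab and a tidyverse lab.
# The code is only parsed, never run, so no packages need to be loaded.
formula_lab <- c(
  "m1 <- lm(mpg ~ cyl, data = mtcars)",
  "summary(m1)",
  "mean(mtcars$mpg)"
)
tidy_lab <- c(
  "mtcars %>% summarize(avg = mean(mpg))",
  "summary(mtcars)"
)

count_functions(formula_lab)       # "lm" "mean" "summary"
length(count_functions(tidy_lab))  # 3

# Functions the two labs have in common
intersect(count_functions(formula_lab), count_functions(tidy_lab))  # "mean" "summary"
```

This only counts functions that are *called*, like `mean(x)`. A function passed by name, such as `mean` in `summarize_if(is.numeric, mean)`, is tagged as a plain `SYMBOL` and is not counted. For an .Rmd lab, one option is to extract the code chunks first with `knitr::purl()`, then pass `readLines()` of the resulting .R file to `count_functions()`.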
---

## Resources

- [Google style guide](https://google.github.io/styleguide/Rguide.html)
- [Tidyverse style guide](https://style.tidyverse.org)
- [Best Practices for Scientific Computing](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745)
- [Good Enough Practices in Scientific Computing](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510)
- [Data Science in a Box](https://datasciencebox.org/)
- [Syntax cheatsheet](https://osf.io/2k8fw/) (also available on the [RStudio Cheatsheets page](https://www.rstudio.com/resources/cheatsheets/))
- [What they forgot to teach you about R](https://rstats.wtf/)
- [Integrating Computation in Statistics: Instructional Decisions for Teaching R](https://youtu.be/ZsYJ81TwGW8)
- [Speaking R](https://www.youtube.com/watch?v=ckW9sSdIVAc&t=676s)