Introduction to the R Statistical Computing Environment: One-Day Workshop

John Fox

Department of Sociology, McMaster University

ICPSR Summer Program at York University

2017

Short URL: tinyurl.com/York-R-course

Please read the installation instructions for R, RStudio, and Stan.

The R statistical programming language and computing environment is the de facto standard among statisticians for writing statistical software and has become very popular in other fields, including the social sciences. It is now possibly the most widely used statistical software in the world. R is a free, open-source implementation and extension of the S language, and is available for Windows, Mac OS X, and Linux/Unix systems. The substantial capabilities of the basic R software are augmented by about 11,000 contributed R "packages" for various statistical methods, freely available on the Comprehensive R Archive Network (CRAN).

Most, but not all, of the material for the R workshop is drawn from Fox and Weisberg, An R Companion to Applied Regression, Second Edition, and from the third edition of this book, which is in preparation. Topics to be covered in the workshop include (with related resources):

• Getting started with R and RStudio: getting-started.R

• Workflow in R and RStudio, enabling reproducible research: Duncan.Rmd

• Reading and manipulating data in R: data.R, Duncan.txt, Duncan.csv, Duncan.xlsx

• Basic statistical graphics in R: graphs.R

• Fitting and working with linear and generalized linear models in R: models.R
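As a rough preview of the kind of session these topics lead to, here is a minimal sketch that uses only base R. It is not taken from the workshop scripts: the built-in mtcars data set stands in for the Duncan files, and the object names (csv_file, cars, mod_lm, mod_glm) are made up for illustration.

```r
# Write a built-in data set to a temporary CSV file and read it back,
# standing in for a course data file such as Duncan.csv (not used here).
csv_file <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_file)
cars <- read.csv(csv_file, row.names = 1)  # first column holds the row names
str(cars)

# Basic statistical graphics: a scatterplot with a lowess smooth.
plot(mpg ~ wt, data = cars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
lines(lowess(cars$wt, cars$mpg), lty = 2)

# Linear model: fuel economy regressed on weight and horsepower.
mod_lm <- lm(mpg ~ wt + hp, data = cars)
summary(mod_lm)

# Generalized linear model: logistic regression of transmission type
# (am: 0 = automatic, 1 = manual) on weight.
mod_glm <- glm(am ~ wt, family = binomial, data = cars)
summary(mod_glm)
```

The same pattern (read the data, plot, fit, summarize) carries over to other data sets by changing the file name and the model formulas.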

Acquiring R, RStudio, and Stan

R, RStudio, and Stan are all free software, available on the internet. Stan implements state-of-the-art methods for Bayesian inference, and may be accessed through R via the rstan package. Please see the instructions for installing R, RStudio, and Stan.

Selected Bibliography

Publishers of statistical texts have been producing a steady stream of books on R. Of particular note are Springer's Use R! series and Chapman and Hall/CRC's The R Series.

For a more extensive bibliography, see the syllabus for my R lectures at the ICPSR Summer Program in Ann Arbor.

Basic Text

A principal reference for this workshop is J. Fox and S. Weisberg, An R Companion to Applied Regression, Second Edition, Sage, 2011, but you should be able to follow the workshop without reading the book. Additional materials are available on the web site for the book, including several appendices (on multivariate linear models, structural-equation models, mixed models, survival analysis, and more). As mentioned, a third edition of this book is in preparation. The book is associated with the car and effects packages, both of which will be substantially revised for the third edition of the book.

Manuals

R is distributed with a set of manuals, which can be accessed conveniently through the RStudio help tab. The R manuals are also available at the CRAN web site.

A great deal of information about using the RStudio interactive development environment is available on the RStudio website (see under "Documentation"). The RStudio IDE and R Markdown "cheat sheets" and the R Markdown Reference Guide are particularly useful.

Similarly, the Stan website has a great deal of information about using Stan.

Mixed-Effects Models in R

Also see the package listing on CRAN and the Bayesian Inference and Statistics for the Social Sciences CRAN "task views."

A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, Third Edition. Boca Raton: CRC/Chapman & Hall, 2013. More demanding than McElreath’s text, described below, this is a tour-de-force exposition of Bayesian methods, including their application to mixed-effects models. An appendix to the text explains how to use R and Stan for Bayesian inference. Andrew Gelman and Aki Vehtari are among the developers of Stan.

A. Gelman and J. Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press, 2007. A wide-ranging and accessible yet deep treatment of hierarchical models and various related topics, predominantly but not exclusively from a Bayesian perspective, using both R and BUGS software. (BUGS is widely used software for fitting Bayesian models, largely superseded by Stan.)

R. McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: CRC/Chapman & Hall, 2016. The title is reasonably descriptive of this very readable introduction to modern Bayesian methods. The use of R and Stan in the book is somewhat idiosyncratic, employing the author’s rethinking package, which is freely available but not from CRAN.

J. C. Pinheiro and D. M. Bates, Mixed-Effects Models in S and S-PLUS. New York: Springer, 2000. An extensive treatment of linear and nonlinear mixed-effects models in S, focused on the authors' nlme package. Does not cover Bates, Maechler, and Bolker’s newer lme4 package, the capabilities of which overlap substantially with the nlme package. Unlike the nlme package, the lme4 package is capable of fitting generalized linear mixed models, but, also unlike nlme, lme4 does not support autocorrelated individual-level errors.
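The division of labour between the two packages can be sketched as follows. This is an illustration under stated assumptions, not code from the workshop: it uses the Ovary data set distributed with nlme and the cbpp data set distributed with lme4, the object names fm_ar1 and fm_glmm are made up, and the lme4 part runs only if that package happens to be installed.

```r
library(nlme)  # one of the recommended packages distributed with R

# nlme: a linear mixed model with first-order autoregressive (AR(1))
# within-subject errors, fit to the Ovary data from the nlme package.
fm_ar1 <- lme(follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time),
              data = Ovary,
              random = pdDiag(~ sin(2 * pi * Time)),
              correlation = corAR1())
summary(fm_ar1)

# lme4: a generalized linear mixed model (binomial response with a random
# intercept for herd), fit to the cbpp data from the lme4 package.
# lme4 is not part of the standard distribution, so check for it first.
if (requireNamespace("lme4", quietly = TRUE)) {
  fm_glmm <- lme4::glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                         data = lme4::cbpp, family = binomial)
  print(summary(fm_glmm))
}
```

Calling lme4::glmer() without attaching lme4 avoids masking the functions that share a name across the two packages.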

W. N. Venables and B. D. Ripley, Modern Applied Statistics with S, Fourth Edition. New York: Springer, 2002. An influential and wide-ranging treatment of data analysis using S and R, including a chapter on mixed-effects models. Many of the facilities described in the book are programmed in the associated (and very useful) MASS, nnet, and spatial packages, which are included in the standard R distribution. This text is more advanced and has a broader focus than the R Companion. I once considered the MASS book the best moderately advanced reference on statistical data analysis in S and R. The book is still very useful, but it is showing its age.