Structural Equation Modeling with the sem package for R, Writing R Packages, Teaching Statistics Using R and the R Commander

John Fox

(McMaster University)

IQS Barcelona

January 2016

Short URL: tinyurl.com/IQS-R-Course

Instructions for Installing R, RStudio, and R Packages

The R statistical programming language and computing environment has become the de facto standard for writing statistical software among statisticians and has made substantial inroads in the social sciences; it is now possibly the most widely used statistical software in the world. R is a free, open-source implementation of the S language, and is available for Windows, Mac OS X, and Unix/Linux systems.

The basic R system is developed and maintained by the R Core group, comprising 20 members, many of them eminent in the field of statistical computing. The R Project for Statistical Computing is a project of the R Foundation, whose membership includes the R Core group and several other individuals, and is also associated with the Free Software Foundation.

I'll cover three R-related topics that are essentially independent of each other: an introduction to structural-equation modeling that uses the sem package for R; an introduction to writing R packages; and a discussion of using R and the R Commander graphical user interface for teaching statistics courses.


Resources

Dates: Jan. 19-21
Topic: structural-equation models with the sem package
Related readings: on-line appendix on structural-equation models to Fox and Weisberg, An R Companion to Applied Regression, Second Edition
Materials: script, notes, problems, answers; data files: Lincoln.R, Rindfuss.R, Wheaton.R

Dates: Jan. 22 (IQS) & Jan. 26 (Univ. de Barcelona)
Topic: writing and building R packages
Related readings: Writing R Extensions manual
Materials: script, notes, matrixDemos.R, matrixDemos_1.0-5.tar.gz, matrixDemos_1.0-5.zip

Dates: Jan. 25
Topic: teaching with R and the R Commander
Materials: notes, data file: States.txt, sample problem

Selected (English) Bibliography

Publishers of statistical texts have been producing a steady stream of books on R. Of particular note is Springer's Use R! series of brief paperbacks on various R-related topics. Similarly, Chapman and Hall/CRC Press has The R Series.

Also see the package listing on CRAN and the various CRAN "task views."

R Manuals

R is distributed with a set of manuals, which are also available at the CRAN web site.

A manual for S-PLUS Trellis Graphics (also useful for the lattice package in R) is also available on the web.

A great deal of information about using the RStudio interactive development environment is available on the RStudio website.

Programming in R (and S)

R. A. Becker, J. M. Chambers, and A. R. Wilks, The New S Language: A Programming Environment for Data Analysis and Statistics. Pacific Grove, CA: Wadsworth, 1988. Defines S Version 2, which forms the basis of the currently used S Versions 3 and 4, as well as R. (Sometimes called the “Blue Book.”)

J. M. Chambers, Programming with Data: A Guide to the S Language. New York: Springer, 1998. Describes the then-new features in S Version 4, including the newer formal object-oriented programming system (also incorporated in R), by the principal designer of the S language and a member of the R Core group of developers. Not an easy read. (The “Green Book.”)

J. M. Chambers, Software for Data Analysis: Programming with R. New York: Springer, 2008. Chambers’s newest book ranges quite widely, and emphasizes a deep understanding of the R language, along with object-oriented programming, and links between R and other software. Some topics are unusual, such as processing text data in R.

J. M. Chambers and T. J. Hastie, eds., Statistical Models in S. Pacific Grove, CA: Wadsworth, 1992. An edited volume describing the statistical modeling capabilities in S, Versions 3 and 4, and R, and the object-oriented programming system used in S Version 3 and R (and available, for “backwards compatibility,” in S Version 4). In addition, the text covers S software for particular kinds of statistical models, including linear models, nonlinear models, generalized linear models, local-polynomial regression models, and generalized additive models. (The “White Book.”)

D. Eddelbuettel, Seamless R and C++ Integration with Rcpp. New York: Springer, 2013. Judicious use of compiled code written in C, C++, or Fortran can substantially improve the efficiency of some R programs. The Rcpp package and its cousins simplify the process of integrating C++ code in R. I recommend this book to those who already know C++.

R. Gentleman, R Programming for Bioinformatics, Boca Raton: Chapman and Hall, 2009. A thorough, though at points relatively difficult, treatment of programming in R, by one of the original co-developers of R and a founder of the related Bioconductor Project (which develops computing tools for the analysis of genomic data). Don’t let the title fool you: Most of the book is of general interest to R programmers.

G. Grolemund, Hands-On Programming with R, Sebastopol CA: O'Reilly, 2014. A readable, easy-to-follow, basic introduction to R programming, which also introduces RStudio.

R. Ihaka and R. Gentleman, “R: A language for data analysis and graphics.” Journal of Computational and Graphical Statistics, 5:299-314, 1996. The original published description of the R project, now quite out of date but still worth looking at.

W. N. Venables and B. D. Ripley, S Programming. New York: Springer, 2000. A companion volume to Modern Applied Statistics with S, and at the time of its publication the definitive treatment of writing software in the various versions of S-PLUS and R; now increasingly dated, particularly with respect to R, but still useful for its programming advice. Brian Ripley is a member of the R Core group of developers, and Bill Venables is a member of the R Foundation.

H. Wickham, Advanced R. Boca Raton FL: Chapman and Hall/CRC, 2015. Hadley Wickham has contributed a number of widely used R packages (such as ggplot2 for graphics and plyr for data manipulation) and is associated with RStudio. As the name implies, you may (and should!) be interested in reading this book after you’ve learned the basics of R programming. A related volume by Wickham, R Packages, Sebastopol CA: O'Reilly, 2015, is (as its name implies) about how to write R packages. Wickham's approach to R programming is sometimes idiosyncratic but always carefully considered and interesting. The websites for the books provide access to the text. Hadley Wickham is a member of the R Foundation.

Y. Xie, Dynamic Documents with R and knitr. Boca Raton FL: Chapman and Hall/CRC, 2013. Yihui Xie describes the use of his knitr package for creating LaTeX documents with embedded executable R code. This package also provides the basis for R Markdown in RStudio.

Other R Sources (Some Free)

See the publications list on the R web site. The R Journal, the journal of the R Project for Statistical Computing, and its predecessor R News, are also good sources of information, as is the Journal of Statistical Software, an on-line American Statistical Association journal dominated by coverage of R packages.

A Source in Spanish

Maribel Peró Cebollero, David Leiva Ureña, Joan Guàrdia Olmos, Antonio Solanas Pérez, Estadística aplicada a las ciencias sociales mediante R y R-Commander, Garceta grupo editorial (2012).

Some Readings on Structural Equation Models

K. A. Bollen, "Latent Variables in Psychology and the Social Sciences", Annual Review of Psychology, 2002, 53: 605-634. Provides a good brief overview of latent-variable models.

K. A. Bollen, Structural Equations with Latent Variables (Wiley, 1989). Although it is now a bit dated, it's still my favourite book-length treatment of SEMs.

J. Fox, "Linear Structural-Equation Models" (Chapter 4 of Linear Statistical Models and Related Methods, Wiley, 1984). This is the basis of much of my lecture-slide material.