Documentation

Michael C Sachs

2025-05-27

The R help system

First line support: ?

A central difference between scripts and packages is structured documentation integrated into the R command line.

Example: ?abs or help(abs) gives a detailed description of two arithmetic functions (abs and sqrt), with a reference, links to related subjects, and examples (to be run as example(abs)).

Writing documentation is your responsibility. The standard format is the R documentation format. .Rd-files live in the man-folder of the source package. R builds both interactive HTML and static PDF from the source.

This is what the .Rd file for abs looks like: MathFun.Rd

.Rd-format uses markup similar to LaTeX to define fields (name, title) and sections (description); it allows lists, links and some formatting, see Hadley’s R package book.

Building help from .Rd

Let’s start with a stupid function:

say_hello = function(name = "my little friend", punct = ".")
{
  cat( paste0("Hello, ", name, punct), "\n", sep="")
}
say_hello()
Hello, my little friend.

Let’s build a simple package in the working directory:

library(devtools)
create("say")
wd("say")
dump("say_hello", file = "R/say_hello.R")
rm(say_hello)  ## remove local copy
load_all()     ## load the package
say_hello()    ## this is the package-version
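At this point the package has no help page for say_hello. Writing one by hand means adding a file man/say_hello.Rd. The following is a minimal sketch of such a file, written from R into a temporary directory so that it can be checked without touching the package; the title and argument descriptions are my own wording, not taken from an existing file.

```r
# Lines of a minimal hand-written .Rd file for say_hello().
# Backslashes are doubled because these are R string literals.
rd <- c(
  '\\name{say_hello}',
  '\\alias{say_hello}',
  '\\title{Print a greeting}',
  '\\description{Prints a short greeting to the console.}',
  '\\usage{say_hello(name = "my little friend", punct = ".")}',
  '\\arguments{',
  '  \\item{name}{Character string: who to greet.}',
  '  \\item{punct}{Character string: closing punctuation.}',
  '}',
  '\\examples{say_hello("world", punct = "!")}'
)

# In a real package this file would be man/say_hello.Rd;
# here it goes to a temporary directory (an assumption for the demo).
rd_file <- file.path(tempdir(), "say_hello.Rd")
writeLines(rd, rd_file)

tools::checkRd(rd_file)   # reports problems with the markup, if any
tools::Rd2txt(rd_file)    # renders the page as plain text on the console
```

Even for a two-argument function the markup is noticeably heavier than the function itself.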

Building help with roxygen2

Problems with .Rd files:

  1. Markup is heavy, ugly and not very intuitive
  2. Documentation separate from code, hard to keep synchronized

Solution: package roxygen2

  • use special comments + tags with @ directly in source .R file
  • write documentation block right on top of function
  • use document() to translate to .Rd file(s) during building

This is what our example could look like:
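A possible version of R/say_hello.R with a roxygen2 header. This is a sketch: the title, descriptions and examples are illustrative wording of mine, and the function body is the one defined earlier.

```r
#' Say hello
#'
#' Prints a short greeting to the console.
#'
#' @param name Character string: who to greet.
#' @param punct Character string: closing punctuation.
#' @return Called for its side effect; returns \code{NULL} invisibly.
#' @examples
#' say_hello()
#' say_hello("world", punct = "!")
#' @export
say_hello <- function(name = "my little friend", punct = ".") {
  cat(paste0("Hello, ", name, punct), "\n", sep = "")
}
```

Each line starting with #' belongs to the documentation block; the first line becomes the title, the next paragraph the description.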

roxygen2 support is built into devtools, so we simply update the documentation like this:

document()
load_all()
?say_hello

Note: text formatting, links, lists etc. work exactly as in .Rd

The @export tag

roxygen2 solves the problem of having separate .Rd files for documentation.

roxygen2 also solves the problem of having a separate NAMESPACE file:

  • only functions exported in NAMESPACE are directly available when the package is loaded
  • by default, all functions defined in the package are exported via a call to exportPattern
  • document() will add all functions with the @export tag to NAMESPACE
  • functions without the tag will stay private (accessible through :::)
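A small sketch of the difference between an exported and a private function. The helper make_greeting is hypothetical and introduced only for illustration; it is not part of the example package as built so far.

```r
# Internal helper: no @export tag, so document() leaves it out of NAMESPACE.
# Inside the package it is called normally; from outside it would be
# reached as say:::make_greeting().
make_greeting <- function(name, punct) {
  paste0("Hello, ", name, punct)
}

#' Say hello
#'
#' @param name Character string: who to greet.
#' @param punct Character string: closing punctuation.
#' @export
say_hello <- function(name = "my little friend", punct = ".") {
  cat(make_greeting(name, punct), "\n", sep = "")
}

# After document(), NAMESPACE is expected to contain the line
#   export(say_hello)
# and no entry for make_greeting.
```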

Tips

Error messages like this are common when installing with devtools:

Error in fetch(key) : lazy-load database 'C:/Users/aleplo/Documents/R/R-3.6.1/library/say/help/say.rdb' is corrupt

Restart R to get rid of them.

A roxygen2-style documentation header is often useful for functions in scripts:

  • Structured specification of what the function does
  • Easy upgrade to package later

Writing good help

Frequent recommendations for writing good documentation:

  • Make the introduction short (1-5 lines).

  • Avoid long argument descriptions; move extra material to Details.

  • Describe the return value; make use of itemize/describe for structured return values (e.g. data frames).

  • Use examples to demonstrate common use cases; examples should be self-contained, using built-in datasets or simulated data.

  • Consider vignettes for complex use cases.

If you find it very hard to document your function(s), you may want to re-think your function- or package design.
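Several of the recommendations above can be seen together in one sketch. The function group_means is hypothetical and exists only to carry the documentation block: short introduction, brief argument descriptions, extra material in Details, a structured return value, and an example that runs on the built-in mtcars data.

```r
#' Group means of a numeric column
#'
#' Computes the number of observations and the mean of one column
#' within the groups defined by another column.
#'
#' @param data A data frame.
#' @param value Name of the numeric column to summarise.
#' @param group Name of the grouping column.
#' @details Groups are ordered as by \code{\link{split}}. Missing values
#'   are not removed before the mean is computed.
#' @return A data frame with one row per group and the columns
#' \describe{
#'   \item{group}{Group label, as character.}
#'   \item{n}{Number of observations in the group.}
#'   \item{mean}{Mean of \code{value} in the group.}
#' }
#' @examples
#' group_means(mtcars, value = "mpg", group = "cyl")
#' @export
group_means <- function(data, value, group) {
  parts <- split(data[[value]], data[[group]])
  data.frame(
    group = names(parts),
    n = lengths(parts),
    mean = vapply(parts, mean, numeric(1)),
    row.names = NULL
  )
}
```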

Limitations of help

R documentation mixes reference and tutorial, but is closer to a reference. It is often awkward for describing

  • background and motivation,

  • multiple or complex use cases,

  • use cases involving multiple functions.

Package-level documentation can help (see below), but these things are generally better in a vignette.

Vignettes

Complement to help

Any files in inst/doc in the source package will be copied to doc in the built/installed package: any format, any source.

A package vignette is a PDF or HTML document in doc in the built package, generated from plain text literate source files (.Rmd, .Rnw) in the folder vignettes in the source package.

Multiple vignettes are possible, any content, but the main idea is tutorial or background rather than reference, e.g. browseVignettes(package = "survival").

Unless a package is extremely simple (a one-function package), write a vignette.

How to vignette

Generate a vignette template for our silly example:

use_vignette("SayingThings")

This creates a new sub-directory in the package folder called vignettes and puts a template .Rmd file there.

Vignettes are NOT built by default when using install; you need to insist:

install(build_vignettes = TRUE)
browseVignettes(package = "say")

Other

Adding package-level help

It can be helpful to have a central help page for the package as a whole. E.g.

package?devtools

roxygen can do this, too:

  1. Write documentation block for object "_PACKAGE"
  2. Add tag @docType package
  3. It will automatically parse the DESCRIPTION file

Commonly this header is put into a separate .R file called <package-name>.R, but any R file will do.

This is what a package documentation for our package could look like: say.R
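A possible R/say.R, as a sketch only: the title and description are made-up wording. As far as I know, recent roxygen2 releases set the documentation type from the "_PACKAGE" sentinel on their own, so the sketch leaves out the @docType tag; add it back if your version needs it.

```r
#' say: a toy package for greetings
#'
#' The package exists to demonstrate the documentation tools. Its only
#' function is \code{\link{say_hello}}.
#'
#' Title, author and version information are taken from DESCRIPTION
#' when document() builds the help page.
"_PACKAGE"
```

After document() and load_all(), the page should be available as package?say.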

Adding & documenting data sets

Ready-made

If your package cannot use a standard data set, include your own.

Data sets live in directory data of the package. Technically, data can be stored as a text file, R code that will generate data, or an .rda file. The latter is most common, recommended and well supported by devtools:

names = c("Ronald", "Herbert", "Bill", "George")
use_data( names )

generates file data/names.rda with the single vector names. A local copy can be generated via

rm( names )     ## delete local copy
load_all()
data( names )
names

Data sets also need to be documented. This is again simple using roxygen, by including the data name with a roxygen header in one of the package .R files:

#' First names
#' 
#' A vector with four common English first names.
#' 
#' @format A character vector of length four. 
#' @source Synthetic data set
"names"

Documentation for data sets is built as before:

document()
load_all()
?names

Comments:

  1. Command from Hadley: Never @export a data set.

  2. You can use @examples, too.

Pre-processing files

Sometimes an external data set needs clean-up for inclusion in a package.

You can do this at the command line, and call use_data. However, for reasons of

  • documentation,
  • replicability,
  • maintenance

you really should do it in an R script.

You could keep the script in data, in which case it will be run every time you load, build or install: that is probably overkill, and may be slow.

Instead:

  • Add a directory data-raw to the source package with code and raw data
  • Exclude this directory from the build (.Rbuildignore)
  • Run the code manually whenever the data needs updating

This is set up by calling use_data_raw:

use_data_raw("corenames")

which sets up the directory, puts a script template with the given name there, and updates .Rbuildignore.

Your job is to copy the data file into data-raw, write the pre-processing code, and run the script once to build the .rda file.
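What data-raw/corenames.R might contain, as a minimal sketch. The raw file name corenames.csv, its column name, and the cleaning steps are all assumptions for illustration; the cleaning is kept in a small function so it can be tried out without the raw file.

```r
## data-raw/corenames.R -- sketch; file and column names are hypothetical

# Cleaning step: trim whitespace, drop missing or empty entries,
# remove duplicates and sort.
clean_names <- function(x) {
  x <- trimws(x)
  x <- x[!is.na(x) & nzchar(x)]
  sort(unique(x))
}

raw_file <- file.path("data-raw", "corenames.csv")

# Only runs when called from the package root with the raw file in place.
if (file.exists(raw_file)) {
  raw <- read.csv(raw_file, stringsAsFactors = FALSE)
  corenames <- clean_names(raw$name)
  # Writes data/corenames.rda; overwrite = TRUE allows re-running the script.
  usethis::use_data(corenames, overwrite = TRUE)
}
```

Because the script is rerun by hand, the cleaning stays documented and repeatable without slowing down load_all() or install().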

Of course, you still have to document the data.

Diátaxis framework

[Figure: diagram of the Diátaxis framework]

Organizing documentation

  • I recommend using pkgdown to build a website for your package
  • You can flexibly organize documentation and vignettes

Example: causaloptim

Exercise

  1. Using your own package, or one you pick, classify the documentation according to the Diátaxis framework
  2. How would you improve the documentation or reorganize it?