The R help system

First line support: ?

A central difference between scripts and packages is structured documentation integrated into the R commandline.

Example: ?abs or help(abs) gives a detailed description of two arithmetic functions, with a reference, links to related subjects, and examples (to be run as example(abs))

Writing documentation is your responsibility. The standard format is the R documentation format. .Rd-files live in the man-folder of the source package. R builds boths interactive HTML and static PDF from the source.

This is what the .Rd file for abs looks like: MathFun.Rd

.Rd-format uses markup similar to LaTeX to define fields (name, title) and sections (description); it allows lists, links and some formatting, see Hadley’s R package book.

Building help from .Rd

Let’s start with a stupid function:

say_hello = function(name = "my little friend", punct = ".")
{
  cat( paste0("Hello, ", name, punct), "\n", sep="")
}
say_hello()
Hello, my little friend.

Let’s build a simple package in the working directory:

library(devtools)
create("say")
wd("say")
dump("say_hello", file = "R/say_hello.R")
rm(say_hello)  ## remove local copy
load_all()     ## load the package
say_hello()    ## this is the package-version

We can use a standard R function to generate a template .Rd file:

dir.create("man")
prompt(say_hello, filename = "man/say_hello.Rd")

This is what the template thus generated looks like: say_hello_template.Rd.

We can now edit the template to fit our needs. This is what the edited template could look like: say_hello.Rd.

Note: while editing a .Rdfile in RStudio, the usual Compile report shortcut (Ctrl-Shift-K) will show a preview of the help page.

Once we are happy with the documentation (for now, at least), we can build the package again and check what it looks like:

load_all()
?say_hello

Note: % is a comment character and can break the build. Here in the URL, the % is escaped by a backslash \

Note: links to other functions from the See Also section do not work when the package is only loaded for development - the links will work when the package is installed:

install()
?say_hello   ## link to paste should be active

Easier: building help with roxygen2

Problems with .Rd files:

  1. Markup is heavy, ugly and not very intuitive
  2. Documentation separate from code, hard to keep synchronized

Solution: package roxygen2

  • use special comments + tags with \@ directly in source .R file
  • write documentation block right on top of function
  • use document() to translate to .Rd file(s) during building

This what our example could look like: say_hello.R

roxygen2 support is built into devtools, so we simply update the documentation like this:

document()
load_all()
?say_hello

Note: text formatting, links, lists etc. work exactly as in .Rd

The @export tag

roxygen2 solves the problem of having separate .Rd files for documentation.

roxygen2 also solves the problem of having a separate NAMESPACE file:

  • only functions exported in NAMESPACE are directly avaiable when the package is loaded
  • by default, all functions defined in the package are exported via a call to exportPattern
  • document() will add all functions with the @export tag to NAMESPACE
  • functions without the tag will stay private (accessable through :::)

Tips

Error messages like this are common when installing from devtools:

Error in fetch(key) : lazy-load database 'C:/Users/aleplo/Documents/R/R-3.6.1/library/say/help/say.rdb' is corrupt

Restart R to get rid of them.

A roxygen2-style documentation header is often useful for functions in scripts:

  • Structured specification of what the function does
  • Easy upgrade to package later

Writing good help

Frequent recommendation for writing good documentation:

  • Make the introduction short (1-5 lines).

  • Avoid long argument descriptions move extra material to Details.

  • Describe the return value; make use of itemize/describe for structured return values (e.g. data frames).

  • Use examples to demonstrate common use cases; examples should self-contained, using built-in datasets or simulated data.

  • Consider vignettes for complex use cases.

If you find it very hard to document your function(s), you may want to re-think your function- or package design.

Limitations of help

R documentation mixes reference and tutorial, but is closer to a reference. It is often awkward for describing

  • background and motivation,

  • multiple or complex use cases,

  • use cases involving multiple functions.

Package-level documentation can help (see below), but these things are generally better in a vignette.

Vignettes

Complement to help

Any files in inst/doc in the source package will be copied to doc in the built/installed package: any format, any source.

A package vignette is a PDF or HTML document in doc in the built package generated from plain text literate source files (.Rmd, .Rnw) folder vignettes in the source package.

Multiple vignettes are possible, any content, but the main idea is tutorial or background rather than reference, e.g. browseVignettes(package = "survival").

Unless a package is extremely simple (a one-function package), write a vignette.

How to vignette

Generate a vignette template for our silly example:

use_vignette("SayingThings")

This creates a new sub-directory in the package folder called vignettes and puts a template .Rmd file there.

You can edit this file as you see fit, though the current example may be a bit thin. This is what a an example could look like: SayingThings.Rmd

Vignettes are NOT built by default when using install, you need to insist:

install(build_vignettes = TRUE)
browseVignettes(package = "say")

Other

Adding package-level help

It can be helpful to have a central help page for the package as a whole. E.g.

package?devtools

roxygen can do this, too:

  1. Write documentation block for object NULL
  2. Add tags \@docType package and \@name <pkgname> to the block

Commonly this header is put into a separate .R file called <package-name>.R., but any R file will do.

This is what a package documentation for our package could look like: say.R

Adding & documenting data sets

Ready-made

If your package cannot use a standard data set, include your own.

Data sets live in directory data of the package. Technically, data can be stored as a text file, R code that will generate data, or an .rda file. The latter is most common, recommended and well supported by devtools:

names = c("Ronald", "Herbert", "Bill", "George")
use_data( names )

generates file data/names.rda with the single vector names. A local copy can be generated via

rm( names )     ## delete local copy
load_all()
data( names )
names

Data sets also need to be documented. This is again simple using roxygen, by including the data name with a roxygen header in one of the package .R files:

#' First names
#' 
#' A vector with four common English first names.
#' 
#' @format A character vector of length four. 
#' @source Synthetic data set
"names"

Documentation for data sets is built as before:

document()
load_all()
?names

Comments:

  1. Command from Hadley: Never @export a data set.

  2. You can use @example, too.

Pre-processing files

Sometimes an external data set needs clean-up for inclusion in a package.

You can do this at the command line, and call use_data. However, for reasons of

  • documentation,
  • replicability,
  • maintenance

you really should do it in an R script.

You could keep the script in data, in which case it will be run every time you load, build or install: that is probably overkill, and may be slow.

His Hadleyness recommends instead:

  • Add a directory data-raw to the source package with code and raw data
  • Exclude this directory from the build (.Rbuildignore)
  • Run the code manually whenever the data needs updating

This is set up by calling use_data_raw:

use_data_raw("corenames")

which sets up the directory, puts a script template with the given name there, updates .Rbuildignore, milks the cows and fixes the tractor.

Your job is to copy the data file into data-raw, write the pre-processing code, and run the script once to build the .rda file.

This is what data file and script could like here: corenames.txt corenames.R

Of course, you still have to document the data, e.g. data.R.

Exercises

  1. Take function nestedCC and add a roxygen header for documentation. Re-build the package, and check the help you have generated.

  2. Add a simple vignette to the package with nestedCC.

  3. Add the example file cohort.txt to the package:

    1. using use_data;
    2. using use_data_raw: clean up the data somewhat, e.g. define factors;
    3. write a short documentation of the data file.
  4. Add documentation for the package itself.