Dynamic Documents

Day 1, A

Michael C Sachs

What?

Code and prose

A “literate document” or “dynamic document” is a file that contains both code and prose.

The idea was developed by Donald Knuth in the 1980s. His motivation was software development: letting programmers “concentrate … on explaining to humans what we want the computer to do”.

When the document is rendered, the code is also run, from top to bottom in the order it appears in the document, and the results are inserted dynamically into the output document.

How does it work?

In an rmarkdown (.Rmd) or Quarto (.qmd) document, at the top you have the yaml header, with key: value pairs listed inside a start and end fence (“---”). These key: value pairs contain information about the document and control how it is rendered.

Prose is written in markdown. This is a lightweight markup language (headers, bold, italics, links, figures, tables).

Code is placed in chunks, which are delimited by start and end fences (“```”). Options controlling how the code is run and how its output is shown can be added to each chunk.
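The three ingredients above (YAML header, markdown prose, code chunk) fit together in a few lines. The sketch below writes a minimal document to a temporary file using base R only; the file name, the title, and the chunk contents are made up for illustration.

```r
# Build the chunk fence programmatically so the three backticks are easy to see
fence <- strrep("`", 3)

doc <- c(
  "---",                          # start of the YAML header
  "title: A minimal example",     # hypothetical title
  "format: html",
  "---",                          # end of the YAML header
  "",
  "Some *prose* written in markdown.",
  "",
  paste0(fence, "{r}"),           # start of a code chunk
  "#| echo: false",               # chunk option (Quarto style)
  "summary(cars)",                # code that would run at render time
  fence                           # end of the code chunk
)

# Hypothetical file name; the temporary directory keeps the example tidy
path <- file.path(tempdir(), "minimal-example.qmd")
writeLines(doc, path)

# Inspect the source that would later be rendered
cat(readLines(path), sep = "\n")
```

Nothing is rendered here; the point is only to show where each piece lives in the source file.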

Markdown

---
title: Example yaml header
author: Homer J Simpson
format: html
---

# First level header
## Second level 
### etc

*italics* for emphasis, **bold** for highlighting

- Lists
- can 
- also
    + be 
    + nested

1. Or enumerated
2. B
3. C

![alt text](path-to-image.png)

[text](link.html)

Equations:

$$
T = \frac{\overline{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}
$$
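The displayed equation is the one-sample t statistic. As a small sketch of how a formula in the prose can be paired with code that computes it, the following evaluates the formula on simulated data and compares the result with R's built-in t.test. The data and the null value mu0 are assumptions made for illustration.

```r
set.seed(1)
x   <- rnorm(30, mean = 0.5)   # simulated sample, for illustration only
mu0 <- 0                       # hypothesised mean under the null
n   <- length(x)

# T = (sample mean - mu0) / (estimated standard deviation / sqrt(n))
t_by_hand <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# The built-in test reports the same statistic
t_builtin <- unname(t.test(x, mu = mu0)$statistic)

all.equal(t_by_hand, t_builtin)
```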

Code chunks

Code chunks look like this, with the output shown below:

```{r}
rnorm(1)
```
[1] 1.202232

In quarto, chunk options go inside the fence on lines that start with #|, one option per line, e.g., #| echo: fenced

In rmarkdown, chunk options go inside the curly braces of the chunk header, e.g., {r, echo=FALSE} (this is compatible with quarto for now)

You can also have code inline:

For example this `r rnorm(1)` will show up inline

which inserts the result (here -0.3477475) into the sentence without showing the code.
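Raw inline results often carry more digits than you want in a sentence. One common approach, sketched here with made-up data, is to compute and format the value in a chunk and then refer to the formatted object inline. The object names are hypothetical.

```r
set.seed(2024)                       # make the simulated data reproducible
x <- rnorm(50, mean = 10, sd = 2)    # made-up data for illustration

# Format once in a chunk, keeping two decimal places ...
mean_txt <- formatC(mean(x), format = "f", digits = 2)

# ... so that the prose can contain: The mean was `r mean_txt`.
# The line below mimics the sentence that would appear in the output.
sentence <- paste0("The mean was ", mean_txt, ".")
print(sentence)
```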

Rendering

  • The code chunks will be run sequentially in a single, fresh R session (remember when I said to restart R frequently when developing?)
  • The code and output will be inserted into an intermediate document, as directed by the code chunk options
  • That document is then processed by pandoc which turns it into a format designed for human readability, as directed by the document yaml header
    • html web pages
    • pdf documents (requires LaTeX)
    • word documents
    • html presentations (like this one)

Presentations

I like to use format: revealjs

A new slide starts with a level-1 (#) or level-2 (##) heading, whose text becomes the slide title. After that you can include anything: lists, code, output, etc.

## This is a slide title

Here is the _slide content_ 


## This starts a new slide

- More 
- ... content

# This is a new section

Getting output to look nice

Tools of the trade

  • Figures are straightforward, and can use captions and cross-references
  • For inline output, you can write custom print methods, which we will try later
  • Tables are the hardest, but there are some packages that help
    • knitr and the function kable, also the package kableExtra
    • xtable (designed for pdf output)
    • gtsummary
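For the inline-output bullet above, one possible shape of a custom method is sketched here. The class name estimate, its fields, and the constructor are all hypothetical; the sketch defines format and print methods so that an object can be dropped into a sentence with an explicit call to format().

```r
# Hypothetical constructor for a point estimate with a confidence interval
new_estimate <- function(est, lower, upper) {
  structure(list(est = est, lower = lower, upper = upper), class = "estimate")
}

# format method: turn the object into one publication-style string
format.estimate <- function(x, digits = 2, ...) {
  num <- function(v) formatC(v, format = "f", digits = digits)
  paste0(num(x$est), " (95% CI: ", num(x$lower), " to ", num(x$upper), ")")
}

# print method: reuse format so console and document output agree
print.estimate <- function(x, ...) {
  cat(format(x, ...), "\n", sep = "")
  invisible(x)
}

hr <- new_estimate(1.234, 0.81, 1.879)   # made-up numbers
format(hr)   # "1.23 (95% CI: 0.81 to 1.88)"
```

In the prose you could then write `r format(hr)` and the rounding and layout stay consistent everywhere the estimate is reported.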

Customizing rendering

  • Bibliographies and cross references fully supported
  • With Quarto, layout and style can be controlled with commands inside ::: fences
  • html output: further tweaks with custom CSS
  • PDF output: LaTeX and related commands
  • word documents: template files? and some html is supported

Table example

This function outputs the data in a markdown table, which is then interpreted and rendered nicely when the intermediate document is processed by pandoc

knitr::kable(head(palmerpenguins::penguins))

| species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
|---|---|---|---|---|---|---|---|
| Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
| Adelie | Torgersen | 39.5 | 17.4 | 186 | 3800 | female | 2007 |
| Adelie | Torgersen | 40.3 | 18.0 | 195 | 3250 | female | 2007 |
| Adelie | Torgersen | NA | NA | NA | NA | NA | 2007 |
| Adelie | Torgersen | 36.7 | 19.3 | 193 | 3450 | female | 2007 |
| Adelie | Torgersen | 39.3 | 20.6 | 190 | 3650 | male | 2007 |
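To make the idea of outputting the data as a markdown table concrete, here is a hand-rolled stand-in written in base R. It is only a sketch, not knitr's implementation, and the function name to_markdown_table is made up. It uses the built-in iris data so that it runs without extra packages.

```r
# Hypothetical helper: convert a data frame to the lines of a pipe table
to_markdown_table <- function(df) {
  # Header row and the rule that separates it from the body
  header <- paste0("| ", paste(names(df), collapse = " | "), " |")
  rule   <- paste0("|", paste(rep("---", ncol(df)), collapse = "|"), "|")

  # Convert every column to text, showing missing values as NA
  cells <- lapply(df, function(col) ifelse(is.na(col), "NA", as.character(col)))

  # Paste the columns together row by row
  rows <- do.call(paste, c(unname(cells), sep = " | "))

  c(header, rule, paste0("| ", rows, " |"))
}

md <- to_markdown_table(head(iris, 3))
cat(md, sep = "\n")
```

The result is plain text. It only becomes a nicely rendered table once pandoc processes the intermediate document.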

Endless possibilities

  • In theory, in your paper/presentation, every number and figure that is a result of your analysis can be generated from code
  • Easy to update if the data change
  • Can inspect the source document to determine where the number came from
  • In practice, do the best you can, and sometimes a well-documented “business process” is just as good
  • Content comes first: the default “look” and layout are good enough for most applications, but you can also endlessly tweak them to suit your needs

How?

  • Rmarkdown, works great, been around for a while now
  • Quarto, relatively new. Main advantages are increased flexibility in controlling output (and speed), and more features are being introduced
  • Sweave, old school format used in R. See the survival package vignettes and in particular the “noweb” subdirectory which takes the literate programming approach.
  • Org mode, an emacs thing

More about quarto

Cell options affect the execution and output of executable code blocks. They are specified within comments at the top of a block. For example:

```{r}
#| label: fig-polar
#| echo: false
#| fig-cap: "A line plot on a polar axis"
```

Even more

Defaults can be set globally in the YAML header, e.g.,

---
execute:
  echo: true
---

  • Quarto works well with Python, Julia, and Observable (JavaScript)
  • You can organize collections of documents into books, websites, etc.

Try it yourself

Practical

We will briefly try out rmarkdown or quarto so that you are set up for taking notes and keeping track of your exercise solutions.

Link to lesson