Object-oriented programming

Basic idea

  • Hide complexity behind abstraction

  • Model concrete entities as software entities (“objects”)

  • Define relationships between software entities that mimic those between concrete entities

  • A way of thinking about problems and software

  • Many, many different implementations

Concepts

Class: a general definition of a data format

Ex.: a person class with name, personal ID number, email address etc.

Object: a specific instance of a class

Ex.: an object relating to Marie

Method: a procedure / function acting on objects from a class

Ex.: we can send emails to persons, including Marie

Inheritance: classes can form a hierarchy with shared data, methods

Ex.: class employee inherits name, ID number etc. from class person, but also has information on contract and length of employment; we can send emails to employees, because they are persons.
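The four concepts can be sketched in a few lines of R, using the S3 system introduced in the next section. This is only an illustration: the names person, employee and send_email are hypothetical and chosen to match the examples above.

```r
# Class: a constructor for the hypothetical class "person"
person <- function(name, id, email) {
  structure(list(name = name, id = id, email = email), class = "person")
}

# Inheritance: "employee" extends "person" with contract information
employee <- function(name, id, email, contract) {
  obj <- person(name, id, email)
  obj$contract <- contract
  class(obj) <- c("employee", class(obj))
  obj
}

# Method: a generic function plus a method for class "person"
send_email <- function(to, ...) UseMethod("send_email")
send_email.person <- function(to, ...) {
  paste("Sending email to", to$email)
}

# Objects: specific instances of the two classes
marie <- person("Marie", 1001, "marie@example.org")
paul  <- employee("Paul", 1002, "paul@example.org", contract = "permanent")

send_email(marie)
send_email(paul)   # works because an employee is also a person
```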

The classic class system: S3

How come…? A demonstration

Let’s say we have run a linear model:

> set.seed(313)
> x = rnorm(10); y = 2 + 3* x + rnorm(10)/4
> lm1 = lm(y~x) 

So we can look at the regression table:

> summary(lm1)

Call:
lm(formula = y ~ x)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.35985 -0.13888 -0.00807  0.13181  0.44398 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.96291    0.08037   24.42 8.43e-09 ***
x            2.98959    0.11895   25.13 6.72e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2471 on 8 degrees of freedom
Multiple R-squared:  0.9875,    Adjusted R-squared:  0.9859 
F-statistic: 631.6 on 1 and 8 DF,  p-value: 6.725e-09

But we can also use summary for a data frame:

> summary(sleep)
     extra        group        ID   
 Min.   :-1.600   1:10   1      :2  
 1st Qu.:-0.025   2:10   2      :2  
 Median : 0.950          3      :2  
 Mean   : 1.540          4      :2  
 3rd Qu.: 3.400          5      :2  
 Max.   : 5.500          6      :2  
                         (Other):8  

Is summary a long & complex function that does both things?

Generic functions, methods and classes

Not exactly:

> summary
function (object, ...) 
UseMethod("summary")
<bytecode: 0x55b7ad6c2518>
<environment: namespace:base>

summary is a generic function, as seen by the call to UseMethod: depending on the class of argument object, R will call a suitable method (a function) that gives a meaningful summary for whatever object is: one thing for a linear model, another thing for a data frame.

Calling the correct method for an object is called method dispatch. In S3, method dispatch is based on the generic function called (what the user wants to do) and the class of the first argument (what object the user wants to do something to). The corresponding method is found by its name:

<name of generic>.<name of argument class>

For our example, we check the class of our object:

> class(lm1)
[1] "lm"

so the name of the method dispatched is summary.lm:

> findFunction("summary.lm")
[[1]]
<environment: package:stats>
attr(,"name")
[1] "package:stats"
attr(,"path")
[1] "/usr/lib/R/library/stats"
> args(summary.lm)
function (object, correlation = FALSE, symbolic.cor = FALSE, 
    ...) 
NULL
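To make the naming rule concrete, the following sketch compares the implicit route through the generic with a direct call to the method. It fits its own small model on the built-in cars data, so the object name fit is an assumption of this example and not part of the demonstration above.

```r
# Fit a small model on the built-in cars data
fit <- lm(dist ~ speed, data = cars)

# Implicit: the generic dispatches on class(fit), which is "lm"
via_generic <- summary(fit)

# Explicit: call the method by its full name
via_method <- summary.lm(fit)

# Both routes produce the same coefficient table
identical(via_generic$coefficients, via_method$coefficients)

# getS3method() finds the method from the generic name and the class name
found <- getS3method("summary", "lm")
is.function(found)
```

In everyday code the implicit route is preferable: calling the method directly bypasses dispatch and breaks as soon as the object has a different class.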

Three standard generics are widely used for different classes:

  1. print is used to display an expression (implicitly or explicitly)

  2. summary generates a compressed summary of its argument

  3. plot provides a graphical display

Let’s look for functions that could be methods for generic print:

> apropos("^print\\.", ignore.case = FALSE, mode = "function")
 [1] "print.AsIs"                  "print.by"                   
 [3] "print.condition"             "print.connection"           
 [5] "print.data.frame"            "print.Date"                 
 [7] "print.default"               "print.difftime"             
 [9] "print.Dlist"                 "print.DLLInfo"              
[11] "print.DLLInfoList"           "print.DLLRegisteredRoutines"
[13] "print.eigen"                 "print.factor"               
[15] "print.function"              "print.hexmode"              
[17] "print.libraryIQR"            "print.listof"               
[19] "print.NativeRoutineList"     "print.noquote"              
[21] "print.numeric_version"       "print.octmode"              
[23] "print.packageInfo"           "print.POSIXct"              
[25] "print.POSIXlt"               "print.proc_time"            
[27] "print.restart"               "print.rle"                  
[29] "print.simple.list"           "print.srcfile"              
[31] "print.srcref"                "print.summary.table"        
[33] "print.summary.warnings"      "print.summaryDefault"       
[35] "print.table"                 "print.warnings"             

Note that this does hide complexity: the user does not have to know what class an object has, but can (try to) print, summarize or plot it anyway.

(.S3methods(print) actually provides a more complete list of print methods.)
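As a small sketch, the user-facing function methods() from package utils can be used in both directions: to list the methods of a generic, or to list the methods available for a class. The exact listing depends on the R version and on the packages that are currently loaded, so no output is shown here.

```r
# All print methods that are currently registered or visible
print_methods <- methods("print")
length(print_methods)

# Turn the question around: which methods exist for class "lm"?
lm_methods <- methods(class = "lm")
length(lm_methods)
```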

Default methods

Many functions seem to have .default at the end:

> apropos("\\.default$", ignore.case = FALSE, mode = "function")
 [1] "all.equal.default"     "anyDuplicated.default" "aperm.default"        
 [4] "as.array.default"      "as.character.default"  "as.data.frame.default"
 [7] "as.Date.default"       "as.expression.default" "as.function.default"  
[10] "as.list.default"       "as.matrix.default"     "as.null.default"      
[13] "as.POSIXct.default"    "as.POSIXlt.default"    "as.single.default"    
[16] "as.table.default"      "barplot.default"       "boxplot.default"      
[19] "by.default"            "chol.default"          "confint.default"      
[22] "contour.default"       "cut.default"           "density.default"      
[25] "diff.default"          "duplicated.default"    "format.default"       
[28] "hist.default"          "image.default"         "is.na<-.default"      
[31] "kappa.default"         "labels.default"        "levels.default"       
[34] "lines.default"         "mean.default"          "median.default"       
[37] "merge.default"         "model.frame.default"   "model.matrix.default" 
[40] "pairs.default"         "plot.default"          "points.default"       
[43] "pretty.default"        "print.default"         "qr.default"           
[46] "range.default"         "rev.default"           "row.names.default"    
[49] "row.names<-.default"   "rowsum.default"        "scale.default"        
[52] "seq.default"           "solve.default"         "sort.default"         
[55] "split.default"         "split<-.default"       "subset.default"       
[58] "summary.default"       "t.default"             "text.default"         
[61] "toString.default"      "transform.default"     "unique.default"       
[64] "update.default"        "with.default"          "xtfrm.default"        

These are not for a mystical default-class, but the fall-back functions used when the generic function can’t find a matching method.

E.g. mean is a generic function:

> mean
function (x, ...) 
UseMethod("mean")
<bytecode: 0x55b7af9fd598>
<environment: namespace:base>

The default method deals with non-numeric data, removes NA and trims the tails, if required, and then calls an internal function which does the actual work:

> mean.default
function (x, trim = 0, na.rm = FALSE, ...) 
{
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(NA_real_)
    }
    if (na.rm) 
        x <- x[!is.na(x)]
    if (!is.numeric(trim) || length(trim) != 1L) 
        stop("'trim' must be numeric of length one")
    n <- length(x)
    if (trim > 0 && n) {
        if (is.complex(x)) 
            stop("trimmed means are not defined for complex data")
        if (anyNA(x)) 
            return(NA_real_)
        if (trim >= 0.5) 
            return(stats::median(x, na.rm = FALSE))
        lo <- floor(n * trim) + 1
        hi <- n + 1 - lo
        x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    }
    .Internal(mean(x))
}
<bytecode: 0x55b7ae7749a8>
<environment: namespace:base>
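The fall-back mechanism can be reproduced with a small generic of our own. The generic describe and its two methods are hypothetical and only serve as a minimal sketch.

```r
# Hypothetical generic
describe <- function(x, ...) UseMethod("describe")

# Specific method for data frames
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

# Fall-back for every class without its own method
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1], "and length", length(x))
}

describe(sleep)     # dispatches to describe.data.frame
describe(letters)   # no describe.character, so describe.default is used
```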

Inheritance

The package data.table provides a rectangular (tidy) data format with powerful and efficient processing procedures.

> library(data.table)
> dt = data.table(mtcars)
> class(dt)
[1] "data.table" "data.frame"

All data table objects (like dt here) have two classes: (1) data.table and (2) data.frame.

This means that class data.table inherits from class data.frame, or equivalently, class data.table extends class data.frame; object dt is both a data table and a data frame, and all methods for both classes can be used:

> inherits(dt, "data.frame")
[1] TRUE

However, the generic will dispatch along the class-vector, starting at the beginning: by default, the data.table methods will be called.

> ## Implicitly, dispatch to head.data.table on class(dt)[1] 
> head(dt)                          
    mpg cyl disp  hp drat    wt  qsec vs am gear carb
1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> 
> ## Explicitly, direct call
> data.table:::head.data.table(dt)  
    mpg cyl disp  hp drat    wt  qsec vs am gear carb
1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

If no method for a class is defined, the dispatch moves to the next class element, and tries to find a fitting method for that class.

E.g. data.table has no proper summary-method, and falls back on summary.data.frame:

> test <- try( getS3method(f = "summary", class = "data.table"), silent = TRUE)
> cat( test )
Error in getS3method(f = "summary", class = "data.table") : 
  S3 method 'summary.data.table' not found
> summary( dt )
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  
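The same behaviour can be shown without data.table. In this sketch the generic speak and the classes dog and animal are hypothetical; the object carries a two-element class vector, just like dt.

```r
speak <- function(x, ...) UseMethod("speak")
speak.animal <- function(x, ...) "generic animal noise"

# An object whose class vector is c("dog", "animal")
rex <- structure(list(name = "Rex"), class = c("dog", "animal"))

# No speak.dog exists yet: dispatch moves on to speak.animal
first <- speak(rex)
first

# Once a method for the first class exists, it is found first
speak.dog <- function(x, ...) "Woof"
second <- speak(rex)
second
```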

Making it classy

To create an object of class foo, simply set the class attribute to foo:

> x <- 1:10
> class(x) <- "foo"
> x
 [1]  1  2  3  4  5  6  7  8  9 10
attr(,"class")
[1] "foo"

You can remove the class attribute using unclass:

> unclass(x)
 [1]  1  2  3  4  5  6  7  8  9 10

You can hand-build an object of a certain class:

> x = 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> levels(x) = as.character(x)
> x
 [1]  1  2  3  4  5  6  7  8  9 10
attr(,"levels")
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"
> class(x) = "factor"
> x
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 2 3 4 5 6 7 8 9 10
> is.factor(x)
[1] TRUE

Generally, classes have a constructor function which builds an object of the desired class from user input; often, the name of the function is the same as the class - e.g. lm to create lm, or here

> x = factor(1:10)
> x
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 1 2 3 4 5 6 7 8 9 10
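A constructor for a class of your own can be very short. The following is a minimal sketch for a hypothetical class temperature (values in degrees Celsius); the checks are one possible choice, not a required pattern.

```r
# Hypothetical constructor: checks the input, then sets the class attribute
temperature <- function(x) {
  if (!is.numeric(x)) stop("'x' must be numeric")
  if (any(x < -273.15, na.rm = TRUE)) stop("temperature below absolute zero")
  structure(x, class = "temperature")
}

t1 <- temperature(c(21.5, 19, 23.2))
class(t1)
inherits(t1, "temperature")

# Invalid input is rejected before an object is created
res <- try(temperature("warm"), silent = TRUE)
inherits(res, "try-error")
```

Compared with setting the class attribute by hand, the constructor gives one place where invalid objects can be stopped.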

Rolling your own

Writing a new method: both the class and the generic function exist already

When the class already has that method, this means that you override the existing method: this is risky and messy and should not be done lightly; see Exercise 2.

When the class does not have that method, you basically generate a new method for the existing class. E.g. there is no coef method defined for base type numeric, and the default method cannot handle numeric vectors:

> x <- rnorm(10)
> test <- try( coef(x), silent = TRUE )
> cat(test)
Error : $ operator is invalid for atomic vectors

Say we want to have a method that returns the coefficient of variation (standard deviation divided by mean) for numeric vectors, we can do this:

> coef.numeric <- function(object, ...) sd(object)/mean(object)
> coef(x)
[1] -2.668404

  1. While legal, this is a bad idea, as this method really does something completely different from what coef is supposed to do, which is extracting model coefficients from a fitted model.

  2. Your method should generally have the same arguments (name and order) as the generic function, in this case coef.
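A coef method that respects both points might look like the sketch below. The class linefit is hypothetical: a toy "model" that stores the intercept and slope of the line through two points.

```r
# Argument names of the generic: the method should use the same ones
names(formals(coef))

# Hypothetical class "linefit": the line through two points
linefit <- function(x, y) {
  slope <- (y[2] - y[1]) / (x[2] - x[1])
  structure(list(intercept = y[1] - slope * x[1], slope = slope),
            class = "linefit")
}

# Same arguments in the same order as the generic: object, ...
coef.linefit <- function(object, ...) {
  c(intercept = object$intercept, slope = object$slope)
}

fit <- linefit(x = c(0, 2), y = c(1, 5))
coef(fit)
```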

Writing a new generic: this really only makes sense if you also write a corresponding method, otherwise there is nothing to do.

> blurp = function(x, ...) UseMethod("blurp")
> test = try( blurp(1), silent = TRUE)
> cat(test)
Error in UseMethod("blurp") : 
  no applicable method for 'blurp' applied to an object of class "c('double', 'numeric')"
> blurp.default = function(x, ...) "BLUUUURP!"
> blurp(1)
[1] "BLUUUURP!"
> blurp("a")
[1] "BLUUUURP!"

This should be some kind of activity that is relevant across a range of different classes. This means you should probably write several methods for your generic, or offer something else to make this extra abstraction worthwhile.
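As a sketch of what "several methods" could mean for blurp, the following adds methods for three classes next to the default. What each method returns is of course made up for this example.

```r
blurp <- function(x, ...) UseMethod("blurp")

blurp.default    <- function(x, ...) "BLUUUURP!"
blurp.numeric    <- function(x, ...) paste("Blurp number", x)
blurp.character  <- function(x, ...) toupper(x)
blurp.data.frame <- function(x, ...) paste("Blurp table with", nrow(x), "rows")

blurp(1)        # numeric method
blurp("a")      # character method
blurp(sleep)    # data frame method
blurp(TRUE)     # no method for logical: falls back on the default
```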

You should also probably check the existing generics:

> .knownS3Generics
          Math            Ops        Summary        Complex   as.character 
        "base"         "base"         "base"         "base"         "base" 
 as.data.frame as.environment      as.matrix      as.vector          cbind 
        "base"         "base"         "base"         "base"         "base" 
        labels          print          rbind            rep            seq 
        "base"         "base"         "base"         "base"         "base" 
       seq.int          solve        summary              t           edit 
        "base"         "base"         "base"         "base"        "utils" 
           str        contour           hist       identify          image 
       "utils"     "graphics"     "graphics"     "graphics"     "graphics" 
         lines          pairs           plot         points           text 
    "graphics"     "graphics"     "graphics"     "graphics"     "graphics" 
          add1            AIC          anova         biplot           coef 
       "stats"        "stats"        "stats"        "stats"        "stats" 
       confint       deviance    df.residual          drop1     extractAIC 
       "stats"        "stats"        "stats"        "stats"        "stats" 
        fitted        formula         logLik    model.frame   model.matrix 
       "stats"        "stats"        "stats"        "stats"        "stats" 
       predict        profile         qqnorm      residuals    se.contrast 
       "stats"        "stats"        "stats"        "stats"        "stats" 
         terms         update           vcov 
       "stats"        "stats"        "stats" 

(For a more complete list, use utils:::getKnownS3generics().)

Writing your own class: This is surprisingly easy - you only have to set the class attribute of an object; all generic functions will dispatch on the new class, though only to the default method, as you have not defined any methods yet.

You generally want a constructor function that returns an object of the new class.

This is the most common and easiest use case: you have a new thing that is a bit more complex, so make it a class, and write a print-method, maybe a summary and plot, too.
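This use case might be sketched as follows. The class survey_result, its constructor and its fields are all hypothetical; the point is only the pattern of one constructor plus a print method.

```r
# Hypothetical constructor for a small composite object
survey_result <- function(question, answers) {
  structure(list(question = question, answers = answers),
            class = "survey_result")
}

# Print method: a compact display instead of the raw list
print.survey_result <- function(x, ...) {
  cat("Question:", x$question, "\n")
  cat("Answers: ", length(x$answers), "\n")
  cat("Mean:    ", round(mean(x$answers), 2), "\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

res <- survey_result("Hours of sleep?", c(7, 6.5, 8, 5))
res   # implicit call to print, which dispatches to print.survey_result
```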

Documentation

S3 generic functions are functions: use Roxygen

S3 classes have no formal definition, so document the constructor function(s) that generate class objects:

  • Works well for class lm, which has essentially one constructor (lm, the function)

  • Does not work well for class htest, which has many, many constructors: see Value description in ?t.test.default, ?prop.test, ?wilcox.test

S3 methods are functions, and should be documented using Roxygen (unless they are really simple).
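A possible layout for documenting a generic and one of its methods with roxygen2 comments is sketched below. The generic describe is hypothetical; the tags used (@param, @return, @export, @rdname) are standard roxygen2 tags, and the comments have no effect when the code is simply sourced.

```r
#' Describe an object in one line
#'
#' Hypothetical generic, used only to illustrate the documentation layout.
#'
#' @param x An object.
#' @param ... Further arguments passed to methods.
#' @return A character string of length one.
#' @export
describe <- function(x, ...) UseMethod("describe")

#' @rdname describe
#' @export
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

describe(sleep)
```

Here @rdname puts the method on the same help page as the generic, which is usually enough for simple methods.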

Other class systems in R

S4

A more formal version of S3:

  • explicit class definitions (using @ for slots)

  • helper functions for defining generic functions & methods

  • multiple dispatch: methods for combinations of arguments

  • implemented in package methods

Example: mle in package stats4

> ## From example(mle)
> library(stats4)
> 
> ## Define data
> x <- 0:10
> y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
> 
> ## Define a Poisson -log(likelihood)
> nLL <- function(lambda) -sum(stats::dpois(y, lambda, log = TRUE))
> 
> ## Fit the model
> fit1 <- mle(nLL, start = list(lambda = 5), nobs = length(y),
+             method = "Brent", lower = 1, upper = 20)
> fit1

Call:
mle(minuslogl = nLL, start = list(lambda = 5), method = "Brent", 
    nobs = length(y), lower = 1, upper = 20)

Coefficients:
  lambda 
11.54545 
> 
> ## This is an S4 object
> isS4(fit1)
[1] TRUE
> str(fit1)
Formal class 'mle' [package "stats4"] with 9 slots
  ..@ call     : language mle(minuslogl = nLL, start = list(lambda = 5), method = "Brent", nobs = length(y),      lower = 1, upper = 20)
  ..@ coef     : num 11.5
  ..@ fullcoef : Named num 11.5
  .. ..- attr(*, "names")= chr "lambda"
  ..@ vcov     : num [1, 1] 1.05
  ..@ min      : num 42.7
  ..@ details  :List of 6
  .. ..$ par        : num 11.5
  .. ..$ value      : num 42.7
  .. ..$ counts     : Named logi [1:2] NA NA
  .. .. ..- attr(*, "names")= chr [1:2] "function" "gradient"
  .. ..$ convergence: int 0
  .. ..$ message    : NULL
  .. ..$ hessian    : num [1, 1] 0.953
  ..@ minuslogl:function (lambda)  
  .. ..- attr(*, "srcref")= 'srcref' int [1:8] 9 8 9 65 8 65 9 9
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x55b7b2f1e1e8> 
  ..@ nobs     : int 11
  ..@ method   : chr "Brent"
> 
> ## Methods work as before
> summary(fit1)
Maximum likelihood estimation

Call:
mle(minuslogl = nLL, start = list(lambda = 5), method = "Brent", 
    nobs = length(y), lower = 1, upper = 20)

Coefficients:
     Estimate Std. Error
[1,] 11.54545   1.024493

-2 log L: 85.45356 
> AIC(fit1)
[1] 87.45356
> AIC
standardGeneric for "AIC" defined from package "stats"

function (object, ..., k = 2) 
standardGeneric("AIC")
<environment: 0x55b7af711378>
Methods may be defined for arguments: object, k
Use  showMethods("AIC")  for currently available ones.
> 
> ## Get rid of stats4
> detach("package:stats4")
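The S4 features listed at the start of this section can also be sketched with a small class of our own. The class Person and the generic greet are hypothetical; setClass, setGeneric, setMethod and new are the helper functions from package methods.

```r
library(methods)

# Explicit class definition with typed slots
setClass("Person", slots = c(name = "character", age = "numeric"))

# Generics are created with a helper function
setGeneric("greet", function(x, y, ...) standardGeneric("greet"))

# Multiple dispatch: the method is chosen by the classes of both x and y
setMethod("greet", signature("Person", "Person"),
          function(x, y, ...) paste(x@name, "greets", y@name))
setMethod("greet", signature("Person", "missing"),
          function(x, y, ...) paste(x@name, "greets nobody in particular"))

marie <- new("Person", name = "Marie", age = 35)
paul  <- new("Person", name = "Paul",  age = 41)

isS4(marie)
marie@age             # slots are accessed with @
greet(marie, paul)    # method for (Person, Person)
greet(marie)          # method for (Person, missing)
```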

RC

“Reference Classes”:

  • More conventional class-system, where methods are part of the class definition (“message passing OO”)

  • Objects are mutable: assignment does not copy them, so a change made through one reference is visible through all others

  • Based on environments
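A minimal sketch of a reference class, assuming a hypothetical class Account with one field and one method; setRefClass comes from package methods.

```r
library(methods)

# Fields and methods are both part of the class definition
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(amount) {
      balance <<- balance + amount   # modifies the object in place
      invisible(.self)
    }
  )
)

a <- Account$new(balance = 100)
b <- a               # b refers to the same object, no copy is made
a$deposit(50)
b$balance            # the change is visible through b as well

d <- a$copy()        # an explicit copy is independent of a
d$deposit(1)
a$balance            # unchanged by the deposit into d
```

Note the contrast with S3 and S4 objects, where b <- a followed by a change to a would leave b untouched.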

Exercises

  1. We have used lm(1~1) as a small artificial test example for linear models.

    1. Looking at the standard methods for lm-objects, find evidence that there is an even smaller valid linear model in R.
    2. Specify this model.
  2. An annoying feature of large data frames is that printing shows lots of useless information on the console. A tibble, as promoted by Hadley Wickham, only shows the top of the data, together with the dimension of the tibble, and lists extra columns at the bottom.

    1. Write a replacement method for print.data.frame that does something similar: start with a simple prototype that only shows the first few rows of the data, then add printing the dimension of the data frame on top.
    2. Study the tibble-package: which print method is dispatched for showing a tibble? Which function does the hard work? If it is a method, show the corresponding generic.