Hide complexity behind abstraction
Model concrete entities as software entities (“objects”)
Define relationships between software entities that mimic those between concrete entities
A way of thinking about problems and software
Many, many different implementations
Class: a general definition of a data format
Ex.: a person class with name, personal ID number, email address etc.
Object: a specific instance of a class
Ex.: an object representing Marie
Method: a procedure / function acting on objects from a class
Ex.: we can send emails to persons, including Marie
Inheritance: classes can form a hierarchy with shared data and methods
Ex.: class employee inherits name, ID number etc. from class person, but also has information on contract and length of employment; we can send emails to employees, because they are persons.
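The class / object / method / inheritance vocabulary above can be sketched in R's S3 system. This is a hypothetical illustration: the constructors person() and employee(), the generic send_email() and the email address are made up for this sketch. send_email() only builds a message string and does not send anything.

```r
# Hypothetical constructor for class "person": a list with a class attribute
person <- function(name, id, email) {
  structure(list(name = name, id = id, email = email), class = "person")
}

# Hypothetical constructor for class "employee", which extends "person":
# it adds contract information and puts "employee" in front of "person"
employee <- function(name, id, email, contract, years) {
  obj <- person(name, id, email)
  obj$contract <- contract
  obj$years <- years
  class(obj) <- c("employee", class(obj))
  obj
}

# Hypothetical generic, with a method defined only for "person"
send_email <- function(x, ...) UseMethod("send_email")
send_email.person <- function(x, ...) {
  # Illustration only: returns the message instead of sending it
  paste0("Sending email to ", x$name, " <", x$email, ">")
}

marie <- employee("Marie", 1, "marie@example.org",
                  contract = "permanent", years = 3)
class(marie)       # "employee" "person"
send_email(marie)  # no send_email.employee, so send_email.person is used
```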
Let’s say we have run a linear model:
> set.seed(313)
> x = rnorm(10); y = 2 + 3* x + rnorm(10)/4
> lm1 = lm(y~x)
So we can look at the regression table:
> summary(lm1)
Call:
lm(formula = y ~ x)
Residuals:
     Min       1Q   Median       3Q      Max
-0.35985 -0.13888 -0.00807  0.13181  0.44398
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.96291    0.08037   24.42 8.43e-09 ***
x            2.98959    0.11895   25.13 6.72e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2471 on 8 degrees of freedom
Multiple R-squared:  0.9875,  Adjusted R-squared:  0.9859
F-statistic: 631.6 on 1 and 8 DF,  p-value: 6.725e-09
But we can also use summary for a data frame:
> summary(sleep)
     extra        group        ID
 Min.   :-1.600   1:10   1      :2
 1st Qu.:-0.025   2:10   2      :2
 Median : 0.950          3      :2
 Mean   : 1.540          4      :2
 3rd Qu.: 3.400          5      :2
 Max.   : 5.500          6      :2
                         (Other):8
Is summary a long and complex function that does both things?
Not exactly:
> summary
function (object, ...)
UseMethod("summary")
<bytecode: 0x55b7ad6c2518>
<environment: namespace:base>
summary is a generic function, as seen by the call to UseMethod: depending on the class of the argument object, R will call a suitable method (a function) that gives a meaningful summary for whatever object is: one thing for a linear model, another thing for a data frame.
Calling the correct method for an object is called method dispatch. In S3, method dispatch is based on the generic function called (what the user wants to do) and the class of the first argument (what object the user wants to do something to). The corresponding method is found by its name:
<name of generic>.<name of argument class>
For our example, we check the class of our object:
> class(lm1)
[1] "lm"
so the name of the method dispatched is summary.lm:
> findFunction("summary.lm")
[[1]]
<environment: package:stats>
attr(,"name")
[1] "package:stats"
attr(,"path")
[1] "/usr/lib/R/library/stats"
> args(summary.lm)
function (object, correlation = FALSE, symbolic.cor = FALSE,
...)
NULL
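The naming rule can also be checked programmatically. This is a small sketch that fits a model on the built-in cars data set instead of lm1, so it runs on its own. paste0() builds the expected method name from the generic and the first element of the class vector. getS3method() looks up the method that dispatch would find, and it also finds methods that are registered but not visible on the search path.

```r
# Fit a small model on a built-in data set
fit <- lm(dist ~ speed, data = cars)

# Build the method name <generic>.<class> by hand
method_name <- paste0("summary", ".", class(fit)[1])
method_name  # "summary.lm"

# Look up the S3 method that dispatch would use for this class
m <- getS3method("summary", class(fit)[1])

# The method found by name is the summary.lm function from stats
identical(m, stats:::summary.lm)
```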
Three standard generics are widely used for different classes:
print is used to display an object (explicitly, or implicitly when the object is typed at the console)
summary generates a compressed summary of its argument
plot provides a graphical display
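Because print is a generic, a print method for a new class is used both for an explicit print() call and for implicit printing at the console. Here is a minimal sketch with a hypothetical class "money"; the class name and its fields are assumptions made for the illustration. By convention, print methods return their argument invisibly.

```r
# Hypothetical object of class "money"
price <- structure(list(amount = 12.5, currency = "EUR"), class = "money")

# print method with the same arguments as the generic print(x, ...)
print.money <- function(x, ...) {
  cat(format(x$amount, nsmall = 2), " ", x$currency, "\n", sep = "")
  invisible(x)  # print methods conventionally return their input invisibly
}

print(price)  # explicit call dispatches to print.money
price         # implicit printing at the console also uses print.money
```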
Let’s look for functions that could be methods for the generic print:
> apropos("^print\\.", ignore.case = FALSE, mode = "function")
[1] "print.AsIs" "print.by"
[3] "print.condition" "print.connection"
[5] "print.data.frame" "print.Date"
[7] "print.default" "print.difftime"
[9] "print.Dlist" "print.DLLInfo"
[11] "print.DLLInfoList" "print.DLLRegisteredRoutines"
[13] "print.eigen" "print.factor"
[15] "print.function" "print.hexmode"
[17] "print.libraryIQR" "print.listof"
[19] "print.NativeRoutineList" "print.noquote"
[21] "print.numeric_version" "print.octmode"
[23] "print.packageInfo" "print.POSIXct"
[25] "print.POSIXlt" "print.proc_time"
[27] "print.restart" "print.rle"
[29] "print.simple.list" "print.srcfile"
[31] "print.srcref" "print.summary.table"
[33] "print.summary.warnings" "print.summaryDefault"
[35] "print.table" "print.warnings"
Note that this does hide complexity: the user does not have to know what class an object has, but can (try to) print, summarize or plot it anyway.
(.S3methods(print) actually provides a more complete list of print methods, as it also includes registered methods that are not visible on the search path.)
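The searches above start from a generic. You can also start from a class and ask which methods exist for it, because methods() from utils accepts a class argument. Here is a short sketch for class lm. The exact list depends on which packages are loaded, so the test only checks membership. The underlying values are the full method names.

```r
# All methods (S3 and S4) currently available for class "lm"
lm_methods <- methods(class = "lm")
lm_methods

# The stored values are full method names such as "summary.lm"
"summary.lm" %in% unclass(lm_methods)
```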
Many functions seem to have .default at the end:
> apropos("\\.default$", ignore.case = FALSE, mode = "function")
[1] "all.equal.default" "anyDuplicated.default" "aperm.default"
[4] "as.array.default" "as.character.default" "as.data.frame.default"
[7] "as.Date.default" "as.expression.default" "as.function.default"
[10] "as.list.default" "as.matrix.default" "as.null.default"
[13] "as.POSIXct.default" "as.POSIXlt.default" "as.single.default"
[16] "as.table.default" "barplot.default" "boxplot.default"
[19] "by.default" "chol.default" "confint.default"
[22] "contour.default" "cut.default" "density.default"
[25] "diff.default" "duplicated.default" "format.default"
[28] "hist.default" "image.default" "is.na<-.default"
[31] "kappa.default" "labels.default" "levels.default"
[34] "lines.default" "mean.default" "median.default"
[37] "merge.default" "model.frame.default" "model.matrix.default"
[40] "pairs.default" "plot.default" "points.default"
[43] "pretty.default" "print.default" "qr.default"
[46] "range.default" "rev.default" "row.names.default"
[49] "row.names<-.default" "rowsum.default" "scale.default"
[52] "seq.default" "solve.default" "sort.default"
[55] "split.default" "split<-.default" "subset.default"
[58] "summary.default" "t.default" "text.default"
[61] "toString.default" "transform.default" "unique.default"
[64] "update.default" "with.default" "xtfrm.default"
These are not methods for some mystical class called default, but the fallback functions used when the generic function cannot find a matching method.
E.g. mean is a generic function:
> mean
function (x, ...)
UseMethod("mean")
<bytecode: 0x55b7af9fd598>
<environment: namespace:base>
The default method handles non-numeric input, removes NA values and trims the tails if required, and then calls an internal function that does the actual work:
> mean.default
function (x, trim = 0, na.rm = FALSE, ...)
{
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(NA_real_)
    }
    if (na.rm)
        x <- x[!is.na(x)]
    if (!is.numeric(trim) || length(trim) != 1L)
        stop("'trim' must be numeric of length one")
    n <- length(x)
    if (trim > 0 && n) {
        if (is.complex(x))
            stop("trimmed means are not defined for complex data")
        if (anyNA(x))
            return(NA_real_)
        if (trim >= 0.5)
            return(stats::median(x, na.rm = FALSE))
        lo <- floor(n * trim) + 1
        hi <- n + 1 - lo
        x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    }
    .Internal(mean(x))
}
<bytecode: 0x55b7ae7749a8>
<environment: namespace:base>
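The sketch below makes two quick checks of the points above, using only base R. First, the trim branch of mean.default keeps only the sorted values from position lo = floor(n * trim) + 1 to hi = n + 1 - lo before averaging. Second, a class with its own method (here Date, handled by mean.Date) never reaches mean.default.

```r
x <- c(1, 2, 3, 100)

# The untrimmed mean is pulled up by the outlier
mean(x)               # 26.5

# trim = 0.25 with n = 4: lo = floor(4 * 0.25) + 1 = 2, hi = 4 + 1 - 2 = 3,
# so only the sorted values 2 and 3 are averaged
mean(x, trim = 0.25)  # 2.5

# A Date vector dispatches to mean.Date, which returns a Date
d <- as.Date(c("2024-01-01", "2024-01-03"))
mean(d)               # "2024-01-02"
class(mean(d))        # "Date"
```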
The package data.table provides a rectangular (tidy) data format with powerful and efficient processing procedures.
> library(data.table)
> dt = data.table(mtcars)
> class(dt)
[1] "data.table" "data.frame"
All data table objects (like dt here) have two classes: (1) data.table and (2) data.frame.
This means that class data.table inherits from class data.frame, or equivalently, class data.table extends class data.frame; object dt is both a data table and a data frame, and all methods for both classes can be used:
> inherits(dt, "data.frame")
[1] TRUE
However, the generic dispatches along the class vector, starting at the first element: by default, the data.table methods are called.
> ## Implicitly, dispatch to head.data.table on class(dt)[1]
> head(dt)
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
>
> ## Explicitly, direct call
> data.table:::head.data.table(dt)
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If no method is defined for a class, dispatch moves on to the next element of the class vector and tries to find a fitting method for that class.
E.g. data.table has no summary method of its own, and falls back on summary.data.frame:
> test <- try( getS3method(f = "summary", class = "data.table"), silent = TRUE)
> cat( test )
Error in getS3method(f = "summary", class = "data.table") :
S3 method 'summary.data.table' not found
> summary( dt )
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
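The same dispatch order can be reproduced without data.table. In this sketch, the class name "my_df" and the generic kind() are hypothetical. A data frame gets the class vector c("my_df", "data.frame"), and kind() has methods for data.frame and a default. Because there is no kind.my_df, the data.frame method is the first match along the class vector. getS3method() with optional = TRUE returns NULL for a missing method instead of raising an error, which avoids the try() construction used above.

```r
# A data frame with an extra (hypothetical) class in front
df <- data.frame(a = 1:3, b = letters[1:3])
class(df) <- c("my_df", class(df))
class(df)  # "my_df" "data.frame"

# Position of each class in the class vector (0 = not inherited)
inherits(df, c("my_df", "data.frame", "list"), which = TRUE)  # 1 2 0

# Hypothetical generic with a data.frame method and a default method
kind <- function(x, ...) UseMethod("kind")
kind.data.frame <- function(x, ...) "data.frame method"
kind.default <- function(x, ...) "default method"

kind(df)   # no kind.my_df, so dispatch moves on to kind.data.frame
kind(1:3)  # no matching class at all: the default method is used

# Checking for a method without an error
is.null(getS3method("kind", "my_df", optional = TRUE))  # TRUE
```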
To create an object of class foo, simply set the class attribute to foo:
> x <- 1:10
> class(x) <- "foo"
> x
[1] 1 2 3 4 5 6 7 8 9 10
attr(,"class")
[1] "foo"
You can remove the class attribute using unclass:
> unclass(x)
[1] 1 2 3 4 5 6 7 8 9 10
You can hand-build an object of a certain class:
> x = 1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> levels(x) = as.character(x)
> x
[1] 1 2 3 4 5 6 7 8 9 10
attr(,"levels")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
> class(x) = "factor"
> x
[1] 1 2 3 4 5 6 7 8 9 10
Levels: 1 2 3 4 5 6 7 8 9 10
> is.factor(x)
[1] TRUE
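The same kind of hand-built factor can be written in one step with structure(), which sets the attributes (here levels and class) when the object is created. In this small sketch, comparing the result with factor() shows that it is the same object the constructor produces. unclass() then gives back the integer codes, which still carry the levels attribute.

```r
# Integer codes plus levels and class attributes, set in one call
x <- structure(1:3, levels = c("a", "b", "c"), class = "factor")
x
is.factor(x)

# Same object as built by the constructor
identical(x, factor(c("a", "b", "c")))

# Removing the class leaves integer codes with a levels attribute
unclass(x)
```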
Generally, classes have a constructor function that builds an object of the desired class from user input; often, the function has the same name as the class, e.g. lm to create lm objects, or factor here:
> x = factor(1:10)
> x
[1] 1 2 3 4 5 6 7 8 9 10
Levels: 1 2 3 4 5 6 7 8 9 10
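A common convention, described for example in Hadley Wickham's Advanced R, splits object creation into three functions. A low-level constructor only sets the structure and class, a validator checks the values, and a user-facing helper is named after the class. The sketch below applies this convention to a hypothetical class "temperature". The names new_temperature(), validate_temperature() and temperature() are illustrative and do not come from any package.

```r
# Low-level constructor: checks types, sets attributes and class
new_temperature <- function(x = double(), unit = "C") {
  stopifnot(is.double(x), is.character(unit), length(unit) == 1)
  structure(x, unit = unit, class = "temperature")
}

# Validator: checks values that the constructor does not check
validate_temperature <- function(obj) {
  unit <- attr(obj, "unit")
  if (!unit %in% c("C", "K")) {
    stop("unit must be \"C\" or \"K\"", call. = FALSE)
  }
  if (unit == "K" && any(unclass(obj) < 0, na.rm = TRUE)) {
    stop("temperatures in Kelvin must be non-negative", call. = FALSE)
  }
  obj
}

# User-facing helper named after the class: coerces input, then validates
temperature <- function(x = double(), unit = "C") {
  validate_temperature(new_temperature(as.double(x), unit))
}

t1 <- temperature(c(20, 25))
class(t1)                  # "temperature"
attr(t1, "unit")           # "C"
try(temperature(-5, "K"))  # error raised by the validator
```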
Writing a new method: both the class and the generic function exist already
When the class already has that method, this means that you override the existing method: this is risky and messy and should not be done lightly; see Exercise 2.
When the class does not have that method, you basically generate a new method for the existing class. E.g. there is no coef method defined for the base type numeric, and the default method cannot handle numeric vectors:
> x <- rnorm(10)
> test <- try( coef(x), silent = TRUE )
> cat(test)
Error : $ operator is invalid for atomic vectors
Say we want a method that returns the coefficient of variation (standard deviation divided by mean) for numeric vectors; we can do this:
> coef.numeric <- function(object, ...) sd(object)/mean(object)
> coef(x)
[1] -2.668404
While legal, this is a bad idea, as this method really does something completely different from what coef is supposed to do, which is extracting model coefficients from a fitted model.
Your method should generally have the same arguments (name and order) as the generic function, in this case coef.
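For contrast, here is a sketch of a method that matches what coef is meant for. A hypothetical model class "mean_fit", made up for this example, stores its estimate in a component called estimate. The default method looks for a component called coefficients and would not find it, so coef.mean_fit() extracts the estimate instead. The method uses the generic's argument names, (object, ...).

```r
# Hypothetical model-fitting function returning an object of class "mean_fit"
fit_mean <- function(y) {
  structure(list(estimate = c(mean = mean(y)), n = length(y)),
            class = "mean_fit")
}

# Method with the same arguments as the generic coef(object, ...)
coef.mean_fit <- function(object, ...) {
  object$estimate
}

names(formals(coef))           # "object" "..."
names(formals(coef.mean_fit))  # same as the generic

fit <- fit_mean(c(2, 4, 6))
coef(fit)                      # named vector: mean = 4
```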
Writing a new generic: this really only makes sense if you also write a corresponding method; otherwise there is nothing to dispatch to.
> blurp = function(x, ...) UseMethod("blurp")
> test = try( blurp(1), silent = TRUE)
> cat(test)
Error in UseMethod("blurp") :
no applicable method for 'blurp' applied to an object of class "c('double', 'numeric')"
> blurp.default = function(x, ...) "BLUUUURP!"
> blurp(1)
[1] "BLUUUURP!"
> blurp("a")
[1] "BLUUUURP!"
The generic should represent some kind of activity that is relevant across a range of different classes. This means you should probably write several methods for your generic, or offer something else to make this extra abstraction worthwhile.
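Here is a sketch of a new generic that comes with several methods. The generic describe() is hypothetical and not part of base R. It returns a one-line description, and each method decides what is worth reporting for its class. Classes without a method fall back to describe.default. A double vector has the implicit class c("double", "numeric"), so it is handled by describe.numeric.

```r
# Hypothetical generic: a one-line description of an object
describe <- function(x, ...) UseMethod("describe")

describe.default <- function(x, ...) {
  paste("object of class", paste(class(x), collapse = "/"))
}

describe.numeric <- function(x, ...) {
  sprintf("numeric vector of length %d, mean %.2f", length(x), mean(x))
}

describe.character <- function(x, ...) {
  sprintf("character vector of length %d, %d distinct values",
          length(x), length(unique(x)))
}

describe.data.frame <- function(x, ...) {
  sprintf("data frame with %d rows and %d columns", nrow(x), ncol(x))
}

describe(c(1, 2, 6))        # numeric method
describe(c("a", "b", "a"))  # character method
describe(mtcars)            # data.frame method
describe(TRUE)              # no logical method: default
```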
You should also probably check the existing generics:
> .knownS3Generics
Math Ops Summary Complex as.character
"base" "base" "base" "base" "base"
as.data.frame as.environment as.matrix as.vector cbind
"base" "base" "base" "base" "base"
labels print rbind rep seq
"base" "base" "base" "base" "base"
seq.int solve summary t edit
"base" "base" "base" "base" "utils"
str contour hist identify image
"utils" "graphics" "graphics" "graphics" "graphics"
lines pairs plot points text
"graphics" "graphics" "graphics" "graphics" "graphics"
add1 AIC anova biplot coef
"stats" "stats" "stats" "stats" "stats"
confint deviance df.residual drop1 extractAIC
"stats" "stats" "stats" "stats" "stats"
fitted formula logLik model.frame model.matrix
"stats" "stats" "stats" "stats" "stats"
predict profile qqnorm residuals se.contrast
"stats" "stats" "stats" "stats" "stats"
terms update vcov
"stats" "stats" "stats"
(For a more complete list, use utils:::getKnownS3generics().)
Writing your own class: this is surprisingly easy, as you only have to set the class attribute of an object; all generic functions will dispatch on the new class, though only to the default method, as you have not defined any methods yet.
You generally want a constructor function that returns an object of the new class.
This is the most common and easiest use case: you have a new thing that is a bit more complex, so make it a class, and write a print method, and maybe summary and plot methods, too.
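Here is a minimal sketch of that use case for a hypothetical class "scores". The constructor, the class name and the output format are assumptions made for this illustration. The summary method follows the same pattern as summary.lm and print.summary.lm: it returns an object of class "summary.scores", and that class has its own print method. A plot method is defined but not called here.

```r
# Hypothetical constructor for class "scores"
scores <- function(x, label = "scores") {
  stopifnot(is.numeric(x))
  structure(list(values = x, label = label), class = "scores")
}

# print method: short display of the object
print.scores <- function(x, ...) {
  cat("<scores: ", x$label, ", n = ", length(x$values), ">\n", sep = "")
  invisible(x)
}

# summary method: returns an object of class "summary.scores"...
summary.scores <- function(object, ...) {
  structure(list(label = object$label, stats = summary(object$values)),
            class = "summary.scores")
}

# ...which is displayed by its own print method
print.summary.scores <- function(x, ...) {
  cat("Summary of ", x$label, "\n", sep = "")
  print(x$stats)
  invisible(x)
}

# plot method: a histogram of the values
plot.scores <- function(x, ...) {
  graphics::hist(x$values, main = x$label, ...)
  invisible(x)
}

s <- scores(c(4, 8, 15, 16, 23, 42), label = "exam")
s           # <scores: exam, n = 6>
summary(s)  # printed by print.summary.scores
```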
S3 generic functions are functions: use Roxygen
S3 classes have no formal definition, so document the constructor function(s) that generate class objects:
Works well for class lm, which has essentially one constructor (lm, the function)
Does not work well for class htest, which has many, many constructors: see the Value sections in ?t.test.default, ?prop.test, ?wilcox.test
S3 methods are functions, and should be documented using Roxygen (unless they are really simple).
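Here is a sketch of how roxygen2 comments might look for a generic, one method and a constructor in a package. The generic area() and the class "square" are hypothetical. In current roxygen2 versions, @export on a method is usually enough to register it with S3method() in NAMESPACE. @describeIn adds the method to the generic's help page. Because roxygen comments are ordinary R comments, the code also runs as plain R.

```r
#' Area of a shape
#'
#' Generic function computing the area of a geometric shape.
#'
#' @param x A shape object.
#' @param ... Further arguments passed to methods.
#' @return A single number, the area of x.
#' @export
area <- function(x, ...) UseMethod("area")

#' @describeIn area Area of a square.
#' @export
area.square <- function(x, ...) {
  x$side^2
}

#' Create a square
#'
#' Constructor for the S3 class "square"; documenting the constructor
#' is how the class itself gets documented.
#'
#' @param side Side length, a single positive number.
#' @return An object of class "square", a list with component side.
#' @export
square <- function(side) {
  stopifnot(is.numeric(side), length(side) == 1, side > 0)
  structure(list(side = side), class = "square")
}

area(square(3))  # 9
```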
S4, a more formal version of S3:
explicit class definitions (with slots, accessed using @)
helper functions for defining generic functions & methods
multiple dispatch: methods for combinations of arguments
implemented in package methods
Example: mle in package stats4
> ## From example(mle)
> library(stats4)
>
> ## Define data
> x <- 0:10
> y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
>
> ## Define a Poisson -log(likelihood)
> nLL <- function(lambda) -sum(stats::dpois(y, lambda, log = TRUE))
>
> ## Fit the model
> fit1 <- mle(nLL, start = list(lambda = 5), nobs = length(y),
+ method = "Brent", lower = 1, upper = 20)
> fit1
Call:
mle(minuslogl = nLL, start = list(lambda = 5), method = "Brent",
nobs = length(y), lower = 1, upper = 20)
Coefficients:
lambda
11.54545
>
> ## This is an S4 object
> isS4(fit1)
[1] TRUE
> str(fit1)
Formal class 'mle' [package "stats4"] with 9 slots
..@ call : language mle(minuslogl = nLL, start = list(lambda = 5), method = "Brent", nobs = length(y), lower = 1, upper = 20)
..@ coef : num 11.5
..@ fullcoef : Named num 11.5
.. ..- attr(*, "names")= chr "lambda"
..@ vcov : num [1, 1] 1.05
..@ min : num 42.7
..@ details :List of 6
.. ..$ par : num 11.5
.. ..$ value : num 42.7
.. ..$ counts : Named logi [1:2] NA NA
.. .. ..- attr(*, "names")= chr [1:2] "function" "gradient"
.. ..$ convergence: int 0
.. ..$ message : NULL
.. ..$ hessian : num [1, 1] 0.953
..@ minuslogl:function (lambda)
.. ..- attr(*, "srcref")= 'srcref' int [1:8] 9 8 9 65 8 65 9 9
.. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x55b7b2f1e1e8>
..@ nobs : int 11
..@ method : chr "Brent"
>
> ## Methods work as before
> summary(fit1)
Maximum likelihood estimation
Call:
mle(minuslogl = nLL, start = list(lambda = 5), method = "Brent",
nobs = length(y), lower = 1, upper = 20)
Coefficients:
Estimate Std. Error
[1,] 11.54545 1.024493
-2 log L: 85.45356
> AIC(fit1)
[1] 87.45356
> AIC
standardGeneric for "AIC" defined from package "stats"
function (object, ..., k = 2)
standardGeneric("AIC")
<environment: 0x55b7af711378>
Methods may be defined for arguments: object, k
Use showMethods("AIC") for currently available ones.
>
> ## Get rid of stats4
> detach("package:stats4")
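The S4 ingredients listed above (explicit class definitions with slots, helpers for generics and methods, and multiple dispatch) also appear in a minimal self-made example. The class "Person", its validity check and the generic greet() are hypothetical, chosen to echo the person example at the start. setClass(), setValidity(), setGeneric(), setMethod() and new() come from package methods.

```r
library(methods)

# Explicit class definition with typed slots
setClass("Person", slots = c(name = "character", age = "numeric"))

# Validity check, run by new() when slots are supplied
setValidity("Person", function(object) {
  if (length(object@name) != 1) return("name must be a single string")
  if (length(object@age) != 1 || object@age < 0)
    return("age must be a single non-negative number")
  TRUE
})

# New generic function
setGeneric("greet", function(x, y) standardGeneric("greet"))

# Multiple dispatch: methods for combinations of argument classes
setMethod("greet", signature("Person", "character"), function(x, y) {
  paste0(y, ", ", x@name, "!")
})
setMethod("greet", signature("Person", "Person"), function(x, y) {
  paste0(x@name, " greets ", y@name)
})

marie <- new("Person", name = "Marie", age = 30)
marie@name             # slot access with @
greet(marie, "Hello")  # dispatch on (Person, character)
greet(marie, new("Person", name = "Paul", age = 41))  # (Person, Person)
try(new("Person", name = "X", age = -1))  # fails the validity check
```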
“Reference Classes”:
More conventional class system, where methods are part of the class definition (“message passing OO”)
Objects are mutable (assignment does not copy, so changes made through one name are visible through all others)
Based on environments
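Here is a minimal sketch of reference semantics with setRefClass() from package methods. The class "Account" and its deposit() method are hypothetical. The method is part of the class definition. Assigning the object to a second name does not copy it, so a change made through one name is visible through the other. The object's copy() method makes a real, independent copy.

```r
library(methods)

# Class definition with a field and a method (methods belong to the class)
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(x) {
      balance <<- balance + x  # modifies the field in place
      invisible(.self)
    }
  )
)

a <- Account$new(balance = 100)
b <- a          # no copy: a and b refer to the same object
b$deposit(50)
a$balance       # 150: the change made through b is visible through a

c2 <- a$copy()  # explicit, independent copy
c2$deposit(10)
a$balance       # still 150
```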
We have used lm(1~1) as a small artificial test example for linear models.
lm-objects, find evidence that there is an even smaller valid linear model in R.
An annoying feature of large data frames is that printing shows lots of useless information on the console. A tibble as promoted by HW only shows the top of the data, together with the dimension of the tibble, and lists extra columns at the bottom.
print.data.frame that does something similar: start with a simple prototype that only shows the first few rows of the data, then add printing the dimension of the data frame on top.
tibble-package: which print method is dispatched for showing a tibble? Which function does the hard work? If it is a method, show the corresponding generic.