In R, everything that happens is a function call, and everything that exists is an object; functions themselves are objects. Functional programming is about how functions work together. One way to see this is to classify functions according to their inputs (the rows of the table below) and their outputs (the columns):
Input \ Output | Data | Function |
---|---|---|
Data | Regular function | Function factory |
Function | Functional | Function operator |
These concepts are loosely defined, because functions can take both data and function arguments and return data and function results.
Two additional concepts we will cover are closures, which are what function factories return and which are particularly useful in statistics, and recursion.
Functionals invoke functions over some data structure to create an output that corresponds to that data structure. We have already used one functional: lapply. Here is a simpler one:
> random_stat <- function(fun) {
+ fun(runif(100))
+ }
>
> random_stat(mean)
[1] 0.5001716
The data structure here is a single numeric vector (runif(100)). Functionals are useful for iterating over more complex data structures as well.
lapply iterates over a list, applying the same function to each element. How could you achieve the same result using a for loop?
> random_stats <- function(fun) {
+
+ lapply(list(uniform = runif(100), normal = rnorm(100),
+ t = rt(100, 4), exponential = rexp(100)), fun)
+
+ }
>
> random_stats(mean)
$uniform
[1] 0.5107888
$normal
[1] 0.1364124
$t
[1] -0.004655624
$exponential
[1] 0.9776181
> random_stats(sd)
$uniform
[1] 0.3128442
$normal
[1] 1.041357
$t
[1] 1.33664
$exponential
[1] 0.9067396
>
> lapply(list(mean = mean, sd = sd), random_stats)
$mean
$mean$uniform
[1] 0.5078511
$mean$normal
[1] 0.186606
$mean$t
[1] 0.1041629
$mean$exponential
[1] 1.025738
$sd
$sd$uniform
[1] 0.274014
$sd$normal
[1] 1.006277
$sd$t
[1] 1.376337
$sd$exponential
[1] 1.161874
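One possible answer to the earlier question about for loops, as a sketch (random_stats_loop is a hypothetical name, and the list of samples mirrors the one inside random_stats): build the result list element by element.

random_stats_loop <- function(fun) {
  samples <- list(uniform = runif(100), normal = rnorm(100),
                  t = rt(100, 4), exponential = rexp(100))
  out <- vector("list", length(samples))  # pre-allocate the result list
  names(out) <- names(samples)
  for (i in seq_along(samples)) {
    out[[i]] <- fun(samples[[i]])
  }
  out
}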
mapply iterates over several lists (or vectors) in parallel, passing one element from each to the function. Same question: how could you achieve this with a for loop?
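For example (a sketch; the parameter names mu and sigma are just illustrative), mapply can walk over two vectors in parallel, drawing one sample per (mu, sigma) pair:

# one normal sample per (mu, sigma) pair, summarised by its mean
mapply(function(mu, sigma) mean(rnorm(100, mean = mu, sd = sigma)),
       mu = c(0, 1, 2), sigma = c(1, 2, 3))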
All of these iteration problems can be solved with loops. What are the advantages/disadvantages of using loops?
Function operators take functions as input and return functions as output, generally modifying the behaviour of the function.
Let’s see if we can implement an error-capturing adverb.
> #' Modify a function to fail without stopping
> #'
> #' @param f A function
>
> capture_errors <- function(f) {
+
+ function(...) {
+ tryCatch(f(...),
+ error = function(e) as.character(e))
+
+ }
+ }
>
> geometric_mean <- function(x) {
+
+ exp(mean(log(x)))
+
+ }
>
> safe_geom_mean <- capture_errors(geometric_mean)
> # lapply(iris, geometric_mean)  # would stop with an error at the Species column
> lapply(iris, safe_geom_mean)
$Sepal.Length
[1] 5.78572
$Sepal.Width
[1] 3.026598
$Petal.Length
[1] 3.238267
$Petal.Width
[1] 0.8417075
$Species
[1] "Error in Math.factor(x): 'log' not meaningful for factors\n"
Function factories take data as input and return functions:
> power1 <- function(exp) {
+ function(x) {
+ x ^ exp
+ }
+ }
>
> square <- power1(2)
> cube <- power1(3)
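A quick check of what the factory produced (the values in the comments are the exact results of these calls): each returned function is a closure that remembers its own value of exp.

square(4)                # 16
cube(2)                  # 8
environment(square)$exp  # 2: the value of exp enclosed by square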
Function factories are useful in statistics. Let’s do some regression!
> x <- rnorm(50)
> y <- 1 + 1.5 * x + rnorm(50, sd = 1)
>
> mse_loss <- function(xmat, outcome) {
+
+ function(beta) {
+
+ linpred <- xmat %*% beta
+ mean((linpred - outcome)^2)
+
+ }
+
+ }
>
>
> mse_loss(cbind(1, x), y)(c(0, 0))
[1] 4.938843
>
> optim(par = c(0, 0), mse_loss(cbind(1, x), y))
$par
[1] 1.066395 1.440556
$value
[1] 0.8810061
$counts
function gradient
75 NA
$convergence
[1] 0
$message
NULL
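Minimizing mean squared error in a linear model is exactly ordinary least squares, so we can sanity-check the optim solution against lm (a sketch; with this random data the two estimates should agree up to optim's convergence tolerance):

coef(lm(y ~ x))  # compare with the $par values above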
What are some ways we could make this function more general, for example handling multiple predictors or different loss functions?
>
> general_loss <- function(xmat, outcome, fun) {
+
+ function(beta) {
+
+ linpred <- xmat %*% beta
+ fun(linpred, outcome)
+
+ }
+
+ }
>
> mad <- function(x1, x2) {
+
+ median(abs(x1 - x2))
+
+ }
> mad_loss <- general_loss(cbind(1, x), y, mad)
>
>
> optim(par = c(0, 0), mad_loss)
$par
[1] -0.06130709 0.29488930
$value
[1] 1.491382
$counts
function gradient
83 NA
$convergence
[1] 0
$message
NULL
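As a check that general_loss really does generalize mse_loss (a sketch; sq_err is a hypothetical name), plugging in a squared-error function should reproduce the earlier fit:

sq_err <- function(pred, obs) mean((pred - obs)^2)
optim(par = c(0, 0), general_loss(cbind(1, x), y, sq_err))$par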
Recursive functions are functions that call themselves. It is critical that they have an exit condition (a base case), otherwise the recursion never terminates. The classic example is computing the Fibonacci numbers:
> fibonacci <- function(n) {
+
+ if(n < 2){
+
+ return(1)
+
+ } else {
+
+ fibonacci(n - 1) + fibonacci(n - 2)
+ }
+
+ }
>
> fibonacci(9)
[1] 55
This may seem like a useless novelty, but recursive functions are quite useful for working with DAGs (directed acyclic graphs) and nested data structures.
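For instance, here is a sketch of a recursive function (depth is a hypothetical helper, not a base R function) that measures how deeply a list is nested, something that is awkward to do with a single loop:

depth <- function(x) {
  if (!is.list(x)) {
    return(0)  # exit condition: a non-list leaf
  }
  1 + max(vapply(x, depth, numeric(1)), 0)
}

depth(list(1, list(2, list(3))))  # 3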
The %>% operator from the magrittr package allows you to perform function composition in a clear, linear way. E.g.:
> library(magrittr)
>
> set.seed(123)
> mean(exp(rnorm(100)))
[1] 1.652545
>
> ## becomes
> set.seed(123)
> rnorm(100) %>%
+ exp %>%
+ mean
[1] 1.652545
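The pipe is essentially function composition applied left to right. You can also build the composed function explicitly (a sketch; compose2 is a hypothetical helper):

compose2 <- function(f, g) {
  function(...) f(g(...))  # apply g first, then f
}

mean_exp <- compose2(mean, exp)
set.seed(123)
mean_exp(rnorm(100))  # same result as the pipeline above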
Is function composition an example of functional programming? Why or why not?