Background

In R, everything that happens is due to a function call, and everything that exists is an object. Functions themselves are objects, so they can be passed as arguments and returned as results. Functional programming is the style of programming that exploits this: it builds programs by making functions work together. How do functions work together? We can classify functions according to their inputs and outputs:

Input \ Output   Data               Function
Data             Regular function   Function factory
Function         Functional         Function operator

These concepts are loosely defined, because functions can take both data and function arguments and return data and function results.

Two additional concepts we will cover are closures (the environment-capturing functions that function factories return, which are useful in statistics) and recursion.

Functionals

Functionals take a function as input and invoke it over some data structure, producing an output that corresponds to that data structure. We’ve already used one functional: lapply. Here is a simpler one:

> random_stat <- function(fun) {
+     fun(runif(100))
+ }
> 
> random_stat(mean)
[1] 0.5001716

The data structure here is a single variable (runif(100)). Functionals are useful for iterating over more complex data structures as well.
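
For instance, base R's apply is a functional over matrices (a sketch added here for illustration, not part of the original transcript):

```r
# apply() is a functional that iterates over the margins of a matrix:
# MARGIN = 1 applies the function to each row, MARGIN = 2 to each column.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)  # one result per row: c(9, 12)
col_maxs <- apply(m, 2, max)  # one result per column: c(2, 4, 6)
```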

lapply iterates over a list, applying the same function to each element. How can you achieve the same result using a for loop?

> random_stats <- function(fun) {
+     
+     lapply(list(uniform = runif(100), normal = rnorm(100), 
+                 t = rt(100, 4), exponential = rexp(100)), fun)
+     
+ }
> 
> random_stats(mean)
$uniform
[1] 0.5107888

$normal
[1] 0.1364124

$t
[1] -0.004655624

$exponential
[1] 0.9776181
> random_stats(sd)
$uniform
[1] 0.3128442

$normal
[1] 1.041357

$t
[1] 1.33664

$exponential
[1] 0.9067396
> 
> lapply(list(mean = mean, sd = sd), random_stats)
$mean
$mean$uniform
[1] 0.5078511

$mean$normal
[1] 0.186606

$mean$t
[1] 0.1041629

$mean$exponential
[1] 1.025738


$sd
$sd$uniform
[1] 0.274014

$sd$normal
[1] 1.006277

$sd$t
[1] 1.376337

$sd$exponential
[1] 1.161874
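
One way to answer the for-loop question posed above (a sketch, not from the original notes): preallocate a list, fill it element by element, and carry the names over.

```r
# A for-loop equivalent of lapply(xs, fun): preallocate the result list,
# then fill and name each element explicitly.
lapply_loop <- function(xs, fun) {
    out <- vector("list", length(xs))
    for (i in seq_along(xs)) {
        out[[i]] <- fun(xs[[i]])
    }
    names(out) <- names(xs)
    out
}

xs <- list(a = 1:3, b = 4:6)
identical(lapply_loop(xs, sum), lapply(xs, sum))  # TRUE
```

The functional version hides the bookkeeping (preallocation, indexing, naming), which is exactly what makes it less error-prone than the loop.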

mapply iterates over several vectors or lists in parallel, passing the i-th element of each as arguments to the function. Same question: how would you do this with a for loop?
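
A sketch of what this means (the example data here is invented for illustration): mapply walks both lists with the same index, and the equivalent for loop does the indexing explicitly.

```r
# mapply passes the i-th element of each argument to FUN in parallel,
# so this computes weighted sums pairwise. Returns c(1.5, 3).
mapply(function(x, w) sum(x * w),
       x = list(1:2, 3:4),
       w = list(c(0.5, 0.5), c(1, 0)))

# The equivalent for loop indexes both lists with the same counter:
xs <- list(1:2, 3:4)
ws <- list(c(0.5, 0.5), c(1, 0))
out <- numeric(length(xs))
for (i in seq_along(xs)) out[i] <- sum(xs[[i]] * ws[[i]])
```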

All of these iteration problems can be solved with loops. What are the advantages/disadvantages of using loops?

Function operators

Function operators take functions as input and return functions as output, generally modifying the behaviour of the function.
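
As an illustration (a sketch; the name time_it is invented here), a function operator can wrap any function so that it reports how long it takes:

```r
# A function operator that times whatever function it wraps.
# It returns a new function accepting the same arguments as the original.
time_it <- function(f) {
    function(...) {
        start <- Sys.time()
        result <- f(...)
        message("elapsed: ", format(Sys.time() - start))
        result
    }
}

slow_mean <- time_it(mean)
slow_mean(runif(10))  # prints the elapsed time, then returns the mean
```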

Let’s see if we can implement an error-capturing adverb (“adverb” is a common nickname for a function operator, since it modifies what a function “verb” does).

> #' Modify a function to fail without stopping
> #' 
> #' @param f A function
> 
> capture_errors <- function(f) {
+     
+     function(...) {
+         tryCatch(f(...), 
+                  error = function(e) as.character(e))    
+         
+     }
+ }
> 
> geometric_mean <- function(x) {
+     
+     exp(mean(log(x)))
+     
+ }
> 
> safe_geom_mean <- capture_errors(geometric_mean)
> #lapply(iris, geometric_mean)
> lapply(iris, safe_geom_mean)
$Sepal.Length
[1] 5.78572

$Sepal.Width
[1] 3.026598

$Petal.Length
[1] 3.238267

$Petal.Width
[1] 0.8417075

$Species
[1] "Error in Math.factor(x): 'log' not meaningful for factors\n"

Function factories and closures

Function factories take data as input and return functions:

> power1 <- function(exp) {
+   function(x) {
+     x ^ exp
+   }
+ }
> 
> square <- power1(2)
> cube <- power1(3)
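
The returned functions are closures: each carries its own binding of exp in its enclosing environment. A quick check (a sketch added here, not in the original transcript):

```r
power1 <- function(exp) {
  function(x) {
    x ^ exp
  }
}
square <- power1(2)
cube <- power1(3)

square(4)  # 16
cube(2)    # 8

# Each closure remembers the exp it was created with, stored in the
# environment of the power1() call that made it:
environment(square)$exp  # 2
environment(cube)$exp    # 3
```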

Function factories are useful in statistics. Let’s do some regression!

> x <- rnorm(50)
> y <- 1 + 1.5 * x + rnorm(50, sd = 1)
> 
> mse_loss <- function(xmat, outcome) {
+     
+     function(beta) {
+         
+         linpred <- xmat %*% beta
+         mean((linpred - outcome)^2)
+         
+     }
+     
+ }
> 
> 
> mse_loss(cbind(1, x), y)(c(0, 0))
[1] 4.938843
> 
> optim(par = c(0, 0), mse_loss(cbind(1, x), y))
$par
[1] 1.066395 1.440556

$value
[1] 0.8810061

$counts
function gradient 
      75       NA 

$convergence
[1] 0

$message
NULL
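
As a sanity check (a sketch with its own simulated data, since the run above used no seed): the minimiser of the mean squared error should agree with the ordinary least squares fit from lm to within optim's convergence tolerance.

```r
set.seed(1)  # reproducible data; true coefficients are 1 and 1.5
x <- rnorm(50)
y <- 1 + 1.5 * x + rnorm(50, sd = 1)

mse_loss <- function(xmat, outcome) {
    function(beta) {
        linpred <- xmat %*% beta
        mean((linpred - outcome)^2)
    }
}

fit_optim <- optim(par = c(0, 0), mse_loss(cbind(1, x), y))
fit_lm <- lm(y ~ x)

# The two estimates should agree to a few decimal places:
round(fit_optim$par, 3)
round(unname(coef(fit_lm)), 3)
```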

What are some ways we could make this function more general? Multivariate, different loss functions?

> 
> general_loss <- function(xmat, outcome, fun) {
+     
+     function(beta) {
+         
+         linpred <- xmat %*% beta
+         fun(linpred, outcome)
+         
+     }
+     
+ }
> 
> ## Note: this definition masks stats::mad (median absolute deviation)
> mad <- function(x1, x2) {
+   
+   median(abs(x1 - x2))
+   
+ }
> mad_loss <- general_loss(cbind(1, x), y, mad)
> 
> 
> optim(par = c(0, 0), mad_loss)
$par
[1] -0.06130709  0.29488930

$value
[1] 1.491382

$counts
function gradient 
      83       NA 

$convergence
[1] 0

$message
NULL

Recursive functions

Recursive functions are functions that call themselves. It is critical that they have a base case: a condition under which they return without recursing. The classic example is computing the Fibonacci numbers:

> fibonacci <- function(n) {
+     
+     if(n < 2){
+         
+         return(1)
+         
+     } else {
+         
+         fibonacci(n - 1) + fibonacci(n - 2)
+     }
+     
+ }
> 
> fibonacci(9)
[1] 55

This may seem like a useless novelty, but recursive functions are actually quite useful for traversing DAGs and nested data structures.
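
Closures and recursion also combine nicely: a memoised version caches previously computed values in the enclosing environment, turning the exponential-time recursion above into a linear-time one (a sketch, not part of the original notes; the name make_fib is invented here):

```r
# The returned function is a closure over `cache`; <<- updates the cache
# in the enclosing environment, so results survive across calls.
make_fib <- function() {
    cache <- c(1, 1)  # fibonacci(0) and fibonacci(1), as defined above
    function(n) {
        if (n + 1 <= length(cache) && !is.na(cache[n + 1])) {
            return(cache[n + 1])
        }
        # Recall() refers to the currently executing function
        result <- Recall(n - 1) + Recall(n - 2)
        cache[n + 1] <<- result
        result
    }
}

fib <- make_fib()
fib(9)   # 55, same as before
fib(30)  # fast, because intermediate results are cached
```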

Discussion

  1. The pipe operator %>% from the magrittr package allows you to perform function composition in a clear, linear way. E.g.,
> library(magrittr)
> 
> set.seed(123)
> mean(exp(rnorm(100)))
[1] 1.652545
> 
> ## becomes
> set.seed(123)
> rnorm(100) %>% 
+     exp %>% 
+     mean
[1] 1.652545

Is function composition an example of functional programming? Why or why not?
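
One way to connect this back to the taxonomy above (a sketch, not from the original notes; the name compose is invented here): composition itself can be written as a function operator using Reduce.

```r
# compose() takes any number of functions and returns their composition,
# applied left to right like the pipe.
compose <- function(...) {
    fns <- list(...)
    function(x) Reduce(function(acc, f) f(acc), fns, x)
}

set.seed(123)
exp_mean <- compose(exp, mean)
exp_mean(rnorm(100))  # same value as the pipelines above
```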