Functions

Day 2, B

Michael C Sachs

Using functions

A function and its components

Almost everything you do in R involves functions. You call a function by typing its name with its arguments (inputs) inside the parentheses:
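As a minimal sketch of such a call (the inputs are illustrative, and the result is random, so no output is shown):

```r
# Draw 3 values at random from the integers 1 to 10
draw <- sample(1:10, size = 3)
length(draw)  # 3
```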

The function takes the arguments you provide, does something, and then returns an object. To see how a function works, type its name without parentheses to print its source:

sample
function (x, size, replace = FALSE, prob = NULL) 
{
    if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 
        1) {
        if (missing(size)) 
            size <- x
        sample.int(x, size, replace, prob)
    }
    else {
        if (missing(size)) 
            size <- length(x)
        x[sample.int(length(x), size, replace, prob)]
    }
}
<bytecode: 0x586014475ba8>
<environment: namespace:base>

The source shows you the arguments, their default values, and the expression defining the function. You can also look at the help file for the documentation:

help("sample")
# or
?sample

Using functions – arguments

Functions can have 0 or more arguments, with or without defaults.

The arguments can be given in order, or by name

Names can be partially matched, which can be confusing:
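A minimal sketch of the matching styles, using a hypothetical function power that is not part of the course material:

```r
# Hypothetical function for illustration only
power <- function(base, exponent = 2) base^exponent

power(3)                       # 9: exponent takes its default value
power(3, 3)                    # 27: arguments matched by position
power(exponent = 3, base = 2)  # 8: matched by name, order does not matter
power(2, exp = 3)              # 8: 'exp' partially matches 'exponent'
```

Partial matching fails with an error when the abbreviation matches more than one argument name, which is one reason to spell names out in full.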

The ellipsis argument

Some functions take ... as an argument, e.g., paste, list, also the apply family.

There are 2 reasons for this:

  1. There could be varying numbers of arguments
  2. To pass optional arguments to other functions involved
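The two reasons above can be sketched as follows. The wrapper center is a hypothetical helper written for this illustration:

```r
# Reason 1: paste() accepts any number of strings through ...
paste("a", "b", "c", sep = "-")  # "a-b-c"

# Reason 2: a hypothetical wrapper that forwards ... to mean()
center <- function(x, ...) {
  x - mean(x, ...)
}
center(c(1, 2, 3, NA), na.rm = TRUE)  # -1 0 1 NA
```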

Using functions – composition

Often we want to use the result of one function as the argument to another function. There are many ways to do this:

  1. Intermediate variables
  2. Nested function calls
  3. The pipe operator |> (available since R 4.1.0)
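The three approaches in the list above, sketched on an illustrative vector; all three give the same result:

```r
x <- c(1, 4, 9, 16)

# Intermediate variables
roots  <- sqrt(x)
total1 <- sum(roots)

# Nested function calls
total2 <- sum(sqrt(x))

# The native pipe operator
total3 <- x |> sqrt() |> sum()

c(total1, total2, total3)  # 10 10 10
```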

Using functions – the apply family

Some functions take other functions as arguments. An example is the apply family of functions, which applies a function over an index or iterator. See help(apply).

apply repeatedly applies a function over the dimensions of an array. MARGIN indicates which dimension, and then for each index in that dimension, it applies FUN to the sub-array.
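A small sketch with an illustrative matrix, where MARGIN = 1 means rows and MARGIN = 2 means columns:

```r
m <- matrix(1:6, nrow = 2)  # 2 rows, 3 columns, filled column by column

row_sums <- apply(m, MARGIN = 1, FUN = sum)  # 9 12
col_sums <- apply(m, MARGIN = 2, FUN = sum)  # 3 7 11
```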

Apply continued

tapply is commonly used with data. It subsets the data X based on the INDEX argument, then applies a function to each subset:
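A minimal sketch, assuming a small made-up data frame (the names d, group and value are illustrative, not from the course data):

```r
# Illustrative data
d <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Mean of value within each level of group
group_means <- tapply(X = d$value, INDEX = d$group, FUN = mean)
group_means  # a: 2, b: 4
```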

lapply

lapply is more general, in that it can take any index and apply any function that takes the index as an argument. It always returns a list. sapply does the same, but simplifies the result to an array, if possible.
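The difference in return type can be seen in a short sketch with an illustrative index:

```r
squares_list <- lapply(1:3, function(i) i^2)  # a list holding 1, 4, 9
squares_vec  <- sapply(1:3, function(i) i^2)  # a numeric vector: 1 4 9

is.list(squares_list)  # TRUE
is.list(squares_vec)   # FALSE
```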

mapply

This is the multivariate version of sapply: it applies a function to the first elements of each vector argument, then to the second elements, and so on.
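A minimal sketch with two illustrative vectors that are walked through in parallel:

```r
# Calls the function with (1, 4), then (2, 5), then (3, 6)
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums  # 5 7 9
```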

See also the purrr package

Notes on speed and flexibility

The apply family of functions is computationally equivalent to a loop (with pre-allocation)

Using apply instead of a for loop will not be faster computationally

It may be faster to write, but it may also be harder to understand

You can do whatever you want inside a for loop. How would you do something more complex with lapply?
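One possible answer, sketched on illustrative data: put the multi-step body inside an anonymous function with braces. Both versions below compute the same thing.

```r
x <- c(1, 2, 3, 4)

# for loop with a pre-allocated result vector
res_loop <- numeric(length(x))
for (i in seq_along(x)) {
  centered <- x[i] - mean(x)
  res_loop[i] <- centered^2
}

# The same steps inside an anonymous function passed to sapply
res_apply <- sapply(seq_along(x), function(i) {
  centered <- x[i] - mean(x)
  centered^2
})

identical(res_loop, res_apply)  # TRUE
```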

Writing your own functions

A simple function

A function with arguments
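A minimal sketch of both cases. The functions hello and greet are hypothetical examples, not necessarily the ones used in the lecture:

```r
# A simple function with no arguments
hello <- function() {
  "Hello!"
}
hello()  # "Hello!"

# A function with arguments; 'greeting' has a default value
greet <- function(name, greeting = "Hello") {
  paste0(greeting, ", ", name, "!")
}
greet("Ada")                   # "Hello, Ada!"
greet("Ada", greeting = "Hi")  # "Hi, Ada!"
```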

Local variables and scoping

name2 is a local variable. It exists only inside the function.

Modifying a variable with the same name outside the function has no effect on the local variable. But be careful:
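A sketch of what this could look like. Only the name name2 comes from the text; the functions shout and shout_global are hypothetical:

```r
shout <- function(name) {
  name2 <- toupper(name)  # name2 is local to this call
  name2
}

name2 <- "global value"
shout("ada")  # "ADA": the local name2 is used
name2         # "global value": the outside variable is untouched

# Be careful: a function that does not define name2 itself
# silently picks up the one from outside
shout_global <- function() name2
shout_global()  # "global value"
```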

Arguments passed by value

Likewise, arguments modified inside the function do not change the object outside the function.
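A minimal sketch with a hypothetical function add_one that changes its argument internally:

```r
add_one <- function(v) {
  v[1] <- v[1] + 1  # changes the function's own copy of v
  v
}

x <- c(1, 2, 3)
y <- add_one(x)
x  # 1 2 3: unchanged
y  # 2 2 3
```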

Lexical scoping

This is called lexical scoping: it defines how R looks for objects when they are referred to by name

If R sees a variable it needs to use inside a function, and it is not an argument or local variable, then it follows these rules to find the object with that name:

  1. Look in the environment where the function was defined.
  2. If not found, look in the parent environment of the environment in step 1.
  3. If not found, continue up through the parent environments until there are no more.
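The lookup rules above can be sketched with two hypothetical functions f and g. The key point is that f looks for y where f was defined, not where it is called:

```r
y <- 10
f <- function(x) x + y  # y is neither an argument nor a local variable

g <- function() {
  y <- 100  # local to g; f does not see it
  f(1)
}

g()  # 11, not 101
```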

Note the wording above: R "sees a variable it needs to use". This is called lazy evaluation: R does not evaluate anything until it needs to use it.

Lexical scoping example

This can be used to your advantage, e.g.,
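One common way to exploit this, sketched with a hypothetical function make_multiplier: a function that returns another function, which remembers the environment it was created in.

```r
make_multiplier <- function(k) {
  # k is found in the environment where the inner function was defined
  function(x) x * k
}

double <- make_multiplier(2)
triple <- make_multiplier(3)
double(5)  # 10
triple(5)  # 15
```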

Lazy evaluation example

One way to manually check for arguments is with missing:
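A sketch of both ideas, using hypothetical functions first_only and round_to. In the first, the second argument is never used and therefore never evaluated. In the second, missing supplies a fallback:

```r
# y is never used, so the expression passed for it is never evaluated
first_only <- function(x, y) x
first_only(1, stop("never evaluated"))  # 1

# missing() tests whether the caller supplied an argument
round_to <- function(x, digits) {
  if (missing(digits)) digits <- 2
  round(x, digits)
}
round_to(3.14159)     # 3.14
round_to(3.14159, 4)  # 3.1416
```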

Using match.arg

Look at the help file for t.test, and specifically the alternative argument. Its default is a character vector with 3 elements, but only one is used. Also, it can be partially matched, e.g.,

How does that work? Using match.arg inside the function:
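A minimal sketch of the pattern, using a hypothetical function side that mimics the alternative argument of t.test:

```r
side <- function(alternative = c("two.sided", "less", "greater")) {
  # Picks the first choice if nothing is supplied, and
  # expands a partial match to the full choice
  alternative <- match.arg(alternative)
  alternative
}

side()      # "two.sided"
side("gr")  # "greater"
```

Calling side with a value that matches none of the choices gives an error, so match.arg also acts as input validation.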

Anonymous functions

Your own functions do not need to be saved and assigned names. If a function does not have a name, it is anonymous. I use these often with the apply family:

Since R 4.1.0, \() can be used as shorthand for function():
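Both forms side by side, in a small illustrative sketch:

```r
# Anonymous function written with the function keyword
tens_long <- sapply(1:3, function(i) i * 10)

# The same anonymous function written with the \() shorthand
tens_short <- sapply(1:3, \(i) i * 10)

identical(tens_long, tens_short)  # TRUE
```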

Operators

Operators are symbols like +, <-, %*%, [.

These are functions! To treat them like functions instead of operators, use backticks:

You can then treat operators as you would any other function, using them in apply or otherwise
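A short sketch with illustrative inputs:

```r
# Calling an operator like an ordinary function
`+`(1, 2)  # 3

# Passing operators to other functions
minus_one <- sapply(1:3, `-`, 1)  # 0 1 2
seconds   <- sapply(list(c(1, 2), c(3, 4)), `[`, 2)  # 2 4
total     <- Reduce(`+`, 1:4)  # 10
```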

You can also define your own operators:

Assignment operators have a special syntax:
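A sketch of both kinds of definition. The operator %+% and the replacement function second<- are hypothetical names chosen for this illustration:

```r
# A custom infix operator: the name is wrapped in percent signs
`%+%` <- function(a, b) paste(a, b)
"hello" %+% "world"  # "hello world"

# An assignment (replacement) function: the name ends in <- and
# the last argument must be called value
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 10
v  # 1 10 3
```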

Generic methods/functions

Look at the function print

print
function (x, ...) 
UseMethod("print")
<bytecode: 0x586013aa86a0>
<environment: namespace:base>

It is a generic function. UseMethod tells R to look at the class of the argument x and call a suitable method (a function) designed for objects of that class.

You can find all the special methods by running methods("print") (try it now).

The class of the object is a simple attribute and the method is defined by appending the class name after the function name separated by a dot. This is called the S3 class system:
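A minimal sketch, assuming a made-up class called temperature. The method is named print.temperature, so print dispatches to it for objects of that class:

```r
# Create an object and set its class attribute (hypothetical class)
temp <- structure(list(value = 21.5, unit = "C"), class = "temperature")

# The print method for class "temperature"
print.temperature <- function(x, ...) {
  cat("Temperature:", x$value, x$unit, "\n")
  invisible(x)
}

print(temp)  # prints: Temperature: 21.5 C
```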

Summary

In R, everything that happens is due to a function, and everything that exists is an object. Functions themselves are objects.

How do functions work together? We can classify functions according to their inputs and outputs:

Input \ Output    Data                Function
Data              Regular function    Function factory
Function          Functional          Function operator

These concepts are loosely defined, because functions can take both data and function arguments and return data and function results.
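One hypothetical example of each of the four kinds, as a rough sketch (all function names here are made up for illustration):

```r
# Regular function: data in, data out
square <- function(x) x^2

# Function factory: data in, function out
power_of <- function(p) function(x) x^p

# Functional: function in, data out
apply_twice <- function(f, x) f(f(x))

# Function operator: function in, function out
negate <- function(f) function(...) -f(...)

cube <- power_of(3)
cube(2)                 # 8
apply_twice(square, 3)  # 81
negate(square)(3)       # -9
```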

Designing functions

When should you write a function? How should it be designed?

  1. The DRY principle: don’t repeat yourself
  2. Consider the audience
  • Don’t write a function unless you expect somebody to use it
  • Consider the most likely use cases, and remember you can’t make everyone happy
  3. Balance ease-of-use with understandability
  • Break down the task into a series of smaller tasks, and abstract them away into functions
  • Reuse or build? Dependencies (using functions from other packages) may change in unpredictable ways
  • Default arguments and error checking – you can’t prevent all errors; ultimately it is the users’ responsibility to use the tools correctly

Some more advanced topics

get and assign

Recall that we can retrieve a variable from a data frame by using a character string, e.g., penguins[["species"]].

We can use a character string to get or assign any other object using these functions. For example, this returns the function called mean

which we can use like a function

Likewise, an object can be created with assign
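A minimal sketch of both functions. The variable names f and my_value are illustrative:

```r
# Retrieve the function called mean from its name as a string
f <- get("mean")
f(c(1, 2, 3))  # 2

# Create an object called my_value from its name as a string
assign("my_value", 42)
my_value  # 42
```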

Uses of get and assign

Example, iterating over functions by name:

Example, retrieving a function programmatically,

Example, programmatically creating new variables,
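A sketch of these uses on illustrative data. The names stats and var1 to var3 are made up for this example:

```r
x <- c(1, 2, 3, 10)

# Iterate over functions by name, retrieving each one with get
stats <- sapply(c("mean", "median", "max"), function(fname) get(fname)(x))
stats  # mean: 4, median: 2.5, max: 10

# Programmatically create the variables var1, var2, var3
for (i in 1:3) {
  assign(paste0("var", i), i^2)
}
var3  # 9
```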

do.call

A variant on get is do.call. This takes a function as the first argument, then a list containing the arguments for the function, do.call(<function>, <list of arguments to function>).

A common use for this is with functions that take a variable number of arguments, e.g., cbind, paste, where the arguments are created programmatically.

Simple example,

Arranging a list into a matrix
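Both cases can be sketched as follows, with illustrative inputs:

```r
# Simple example: the same as paste("a", "b", sep = "-")
joined <- do.call(paste, list("a", "b", sep = "-"))
joined  # "a-b"

# Arranging a list into a matrix: the same as rbind(rows[[1]], rows[[2]], rows[[3]])
rows <- list(c(1, 2), c(3, 4), c(5, 6))
m <- do.call(rbind, rows)
dim(m)  # 3 2
```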

Global assignment operator

There is the <<- operator, which is used in functions and does (re)assignment outside the function. It searches the parent environments and reassigns where the name is found. If it is not found, it assigns in the global environment.

This is generally considered to be a bad idea, but now you know about it.
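For reference, a sketch of how it behaves, using a hypothetical counter. Here <<- finds count in the enclosing function's environment and reassigns it there:

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # reassigns count in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```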

Recursive functions

Functions that call themselves are possible.

As with repeat loops, they need to have a break condition

These are actually useful when working with nested lists and directed acyclic graphs, for example.
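A minimal sketch for the nested-list case, using a hypothetical function count_leaves. The stopping condition is reaching an element that is not a list:

```r
# Count the non-list (leaf) elements in a nested list
count_leaves <- function(x) {
  if (!is.list(x)) {
    return(1)  # stopping condition: a leaf counts as one
  }
  # Otherwise recurse into each element and add up the counts
  sum(vapply(x, count_leaves, numeric(1)))
}

nested <- list(1, list(2, 3, list(4)), "a")
count_leaves(nested)  # 5
```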

Practical

  1. Modify and write functions
  2. Use apply to iterate functions over data
  3. Write your own class and generic print function
