Flow control and loops

Day 2, A

Michael C Sachs

Flow control

What?

  • Normally the code gets run line by line, in order, top to bottom

  • There are special commands that allow you to change that

  • Conditional execution: choose which code to run depending on logical conditions

    • if(<condition>)
    • else if
    • else
  • Loops: repeat a chunk of code several times

    • repeat
    • for
    • while

If then else

if(<condition>) {
  
  <do something>
    
} else {
  
  <do something else>
  
}

If the <condition> evaluates to TRUE, then <do something> gets executed; otherwise <do something else> gets executed.

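<condition> must be a single logical value (length one); vectors are not allowed. Since R 4.2.0 a condition of length greater than one is an error, and a condition that is NA is always an error.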
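
When the test involves a whole vector, it has to be collapsed to a single TRUE or FALSE first. A minimal sketch, assuming a made-up vector x, that uses all() and any() together with an else if branch:

```r
x <- c(3, -1, 7)  # hypothetical example data

# x > 0 has length 3, so it cannot be used directly as a condition;
# all() and any() each reduce it to a single logical value
if (all(x > 0)) {
  msg <- "all positive"
} else if (any(x > 0)) {
  msg <- "some positive"
} else {
  msg <- "none positive"
}
msg
```

Here msg ends up as "some positive", because only two of the three values are greater than zero.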

Examples

num_reps <- if(final_report) 5000 else 200

Curly brackets are helpful for understanding and clarity:

p_value <- if(nonparametric) {
  wilcox.test(mpg ~ vs, data = mtcars, exact = FALSE)$p.value
} else {
  t.test(mpg ~ vs, data = mtcars)$p.value
}

The expression does not need to return anything; it can be run only for its side effects:

if(log_transform) {
  
  data$Y <- log(data$Y)
  
}

ifelse

This is a function, not a statement like if and else.

It is vectorized and hence better suited for working with data.

ifelse(<logical vector>, <yes vector>, <no vector>). The result has the same length as the logical vector; yes and no are recycled to that length if they are shorter.

It returns a vector with elements from yes when TRUE, and elements from no when FALSE.

rawdata <- c("12.63", "62.45", "<2") ## lower limit of detection
as.numeric(ifelse(rawdata == "<2", 1, rawdata))
[1] 12.63 62.45  1.00
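
A further sketch, with made-up values, shows recycling and missing values: the single strings given for yes and no are recycled to the length of the test, and an NA in the test gives NA in the result.

```r
bmi <- c(17.5, 22.0, NA, 31.2)  # hypothetical measurements, one missing

# "high" and "not high" have length one and are recycled to length 4;
# the comparison NA >= 25 is NA, so the third element of the result is NA
category <- ifelse(bmi >= 25, "high", "not high")
category
```

The result is "not high", "not high", NA, "high", so missing inputs stay missing rather than being silently assigned to one of the two groups.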

Loops

Basic concepts

A loop repeatedly and sequentially evaluates an expression, i.e.,

{
  <stuff contained inside curly brackets>
}

A loop will continue forever unless you tell it to stop. How you tell it to stop differs between the types of loop.

Repeat

This is the simplest loop: it repeats an expression until it encounters break.

This will run forever

repeat {
  print("hello")
}

This will run exactly once

repeat {
  print("hello")
  break
}

This will run 5 times

i <- 1
repeat {
  print("hello")
  if(i == 5) break
  i <- i + 1
}

While

Notice the pattern: repeat an expression until a condition is met.

The condition usually depends on a variable that changes at each iteration, in this case i, the iterator.

while loops explicitly state the condition at the start:

i <- 1
while(i <= 5) {
  print("hello")
  i <- i + 1
}
[1] "hello"
[1] "hello"
[1] "hello"
[1] "hello"
[1] "hello"
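
A while loop is a natural fit when the number of iterations is not known in advance. As an illustrative sketch (not one of the original examples), the loop below approximates the square root of 2 by repeatedly averaging guess and target / guess, and stops once the guess is accurate enough. The names target, guess, tol and iterations are chosen here for illustration.

```r
target <- 2      # number whose square root we want
guess <- 1       # starting value
tol <- 1e-8      # stop when guess^2 is this close to target
iterations <- 0

while (abs(guess^2 - target) > tol) {
  guess <- (guess + target / guess) / 2  # average guess and target / guess
  iterations <- iterations + 1
}

guess       # close to sqrt(2)
iterations  # number of updates that were needed
```

The condition is checked before every pass, so if the starting guess were already within tol the body would not run at all.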

For loops

A for loop explicitly states the sequence of the iterator at the start. Then the “end condition” is that the loop has reached the end of the sequence.

for(i in 1:5) {
  print("hello")
}
[1] "hello"
[1] "hello"
[1] "hello"
[1] "hello"
[1] "hello"

In this case, it is the most concise. I personally use for loops more than any other loop.
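
One caution when the sequence is written as 1:length(x): if x is empty, 1:0 counts down through 1 and 0, so the body runs twice instead of not at all. A small sketch of the difference, using seq_along(), which returns an empty sequence in that case:

```r
empty <- numeric(0)  # a vector with no elements

runs_colon <- 0
for (i in 1:length(empty)) {   # 1:0 is c(1, 0)
  runs_colon <- runs_colon + 1
}

runs_seq <- 0
for (i in seq_along(empty)) {  # seq_along(empty) is integer(0)
  runs_seq <- runs_seq + 1
}

c(colon = runs_colon, seq_along = runs_seq)
```

The first loop runs twice and the second runs zero times, which is why seq_along(x) is usually the safer way to iterate over the positions of x.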

break can be used inside any loop to end it. next can be used to go to the next iteration.

for(i in 1:5) {
  if(i == 2) next
  print(paste("hello", i))
}
[1] "hello 1"
[1] "hello 3"
[1] "hello 4"
[1] "hello 5"
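
break works the same way inside a for loop. A minimal sketch, with made-up numbers, that stops at the first value greater than 15:

```r
values <- c(4, 8, 15, 16, 23, 42)  # hypothetical data
first_big <- NA                    # stays NA if nothing qualifies

for (v in values) {
  if (v > 15) {
    first_big <- v
    break          # leave the loop; 23 and 42 are never examined
  }
}

first_big
```

Here first_big is 16, the first element that satisfies the condition.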

Example - Iterating through species

library(palmerpenguins)

species_names <- levels(penguins$species)
mean_bill_length <- numeric(length(species_names))
names(mean_bill_length) <- species_names

for(i in species_names){  ## iterator is a character
  
  mean_bill_length[i] <- ## indexing by name
    mean(subset(penguins, species == i)$bill_length_mm, na.rm = TRUE)
  
}

mean_bill_length
   Adelie Chinstrap    Gentoo 
 38.79139  48.83382  47.50488 
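
The same kind of per-group summary can also be computed without an explicit loop. The sketch below uses the built-in mtcars data instead of penguins, so that it runs without extra packages, and compares a for loop with tapply(); the variable names are chosen here for illustration.

```r
# Loop version: mean mpg for each number of cylinders
cyl_values <- sort(unique(mtcars$cyl))
mean_mpg <- numeric(length(cyl_values))
names(mean_mpg) <- cyl_values

for (k in seq_along(cyl_values)) {
  mean_mpg[k] <- mean(mtcars$mpg[mtcars$cyl == cyl_values[k]])
}

# tapply version: apply mean() to mpg within each level of cyl
mean_mpg_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Both approaches give the same numbers
all.equal(unname(mean_mpg), as.vector(mean_mpg_tapply))
```

Which version is clearer is partly a matter of taste; the loop spells out each step, while tapply() is shorter once the pattern is familiar.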

Bootstrap

mu.body_mass <- mean(penguins$body_mass_g, na.rm = TRUE)  ## observed mean
bsmeans <- vector("numeric", length = 2000)  ## preallocate the results
for(i in seq_along(bsmeans)) {
  ## resample with replacement, same size as the original data
  resampled.body_mass <- sample(penguins$body_mass_g, replace = TRUE)
  bsmeans[i] <- mean(resampled.body_mass, na.rm = TRUE)
}
hist(bsmeans)
abline(v = mu.body_mass, col = "red")  ## mark the observed mean
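
The vector of bootstrap means can be summarised further, for example with a percentile interval. The sketch below is self-contained and uses the built-in mtcars data rather than penguins; the seed, the number of replicates and the 95% level are arbitrary choices made here for illustration.

```r
set.seed(2024)                 # arbitrary seed, only for reproducibility
mpg <- mtcars$mpg              # built-in data, no missing values

bs_mpg_means <- numeric(2000)  # preallocate the results
for (i in seq_along(bs_mpg_means)) {
  bs_mpg_means[i] <- mean(sample(mpg, replace = TRUE))
}

# 2.5% and 97.5% quantiles of the bootstrap means
ci <- quantile(bs_mpg_means, probs = c(0.025, 0.975))
ci
```

The two quantiles give a rough 95% interval for the mean, based only on the spread of the resampled means.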

Nested loops

A loop can itself contain a loop, or multiple loops.

species_names <- levels(penguins$species)
island_names <- levels(penguins$island)
mean_bm_matrix <- matrix(NA, nrow = length(species_names), 
                         ncol = length(island_names), 
                         dimnames = list(species_names, island_names))

for(i in species_names) {
  for(j in island_names) {
   
    thisset <- subset(penguins, species == i & 
                        island == j)
    
    if(nrow(thisset) == 0) next
    
    mean_bm_matrix[i, j] <- mean(thisset$body_mass_g, na.rm = TRUE)
     
  }
}
mean_bm_matrix
            Biscoe    Dream Torgersen
Adelie    3709.659 3688.393  3706.373
Chinstrap       NA 3733.088        NA
Gentoo    5076.016       NA        NA
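
In nested loops, next and break act only on the innermost loop that contains them. A small sketch (the matrix visited is a made-up helper for illustration) that records which (i, j) pairs are reached when the inner loop breaks at j == 2:

```r
visited <- matrix(FALSE, nrow = 3, ncol = 3)

for (i in 1:3) {
  for (j in 1:3) {
    if (j == 2) break       # ends the inner loop only
    visited[i, j] <- TRUE
  }
  # the outer loop carries on with the next value of i
}

visited
```

Only the first column is TRUE: for every i the inner loop stops at j == 2, while the outer loop still runs for all three values of i.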

Note on speed

You may see people warn you not to use for loops in R, “because they are slow”.

That is partially true, but speed is not the only consideration: for loops can be much clearer and more understandable than the alternatives.

The slow part in R is changing the size of an object inside the loop, because growing it one element at a time forces R to copy it repeatedly. You can avoid that by creating a vector/matrix/array of the correct size beforehand to hold the results of the loop:

system.time({
A1 <- NULL
for(i in 1:100000) {
  A1 <- c(A1, rnorm(1))
}
})
   user  system elapsed 
 12.344   0.101  12.464 
system.time({
A2 <- numeric(100000)
for(i in seq_along(A2)) {
  A2[i] <- rnorm(1)
}
})
   user  system elapsed 
  0.164   0.001   0.165 
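
When the function being called is itself vectorized, the loop can be dropped entirely. A sketch, assuming the only goal is a vector of standard normal draws as in the timing example above; the vapply() line shows one option for a function that handles one value at a time, and the squaring function is a made-up stand-in.

```r
n <- 100000

# rnorm() is vectorized: one call fills the whole vector
A3 <- rnorm(n)
length(A3)

# vapply() calls a function once per element and checks that each
# result is a single number, so the output size is known up front
squares <- vapply(1:5, function(k) k^2, numeric(1))
squares
```

Neither version grows an object step by step, so both avoid the repeated copying that makes the first loop slow.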

Practical

  1. Practice working with if and else statements
  2. Practice working with loops
