Flow control and loops – exercise solutions

Day 2, A

Author

Michael C Sachs

Simple loops and conditional statements

  1. Use a loop to print every number from 1 to 10
  2. Modify the loop to print every even number from 1 to 10 (hint: add an if statement and use (i %% 2) == 0 to check whether i is divisible by 2).
## 1.
for(i in 1:10){
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
## 2. The %% operator is the modulo function.
##    i %% 2 attempts to divide i by 2, with 
##    whole numbers only and returns the remainder. 
for(i in 1:10){
  if((i %% 2) == 0) {
  print(i)
  }
}
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10

Loops for statistical analysis

library(palmerpenguins)
  1. Write a loop that calculates and prints out the mean for each numeric variable in the penguins dataset
for(i in 1:ncol(penguins)) {
  ## cat prints to the console without quotes
  cat(names(penguins)[i], " - ", sep = "")
  vari <- penguins[[i]] ## this iterates through the columns of penguins
  if(is.numeric(vari)) { ## check whether it is numeric
    cat("Mean:", mean(vari, na.rm = TRUE))
  } else {
    cat("not numeric")
  }
  cat("\n") # new line
}
species - not numeric
island - not numeric
bill_length_mm - Mean: 43.92193
bill_depth_mm - Mean: 17.15117
flipper_length_mm - Mean: 200.9152
body_mass_g - Mean: 4201.754
sex - not numeric
year - Mean: 2008.029
  1. Modify your loop in 1 so that it prints out the mean, standard deviation, median, and interquartile range for each numeric variable in penguins.
f.names <- c("mean", "std dev", "median", "interquartile range")
functions <- list(mean, sd, median, IQR)
## we will add an inner loop that iterates through these function and names

for(i in 1:ncol(penguins)) {
  cat(names(penguins)[i], ":\n", sep = "")
  vari <- penguins[[i]]
  
  if(is.numeric(vari)) { 
    for(fn in 1:length(f.names)) {
      cat(f.names[fn], functions[[fn]](vari, na.rm = TRUE), "\n")
    }
  } else {
    cat("not numeric\n")
  }
  cat("--------\n") # new line
}
species:
not numeric
--------
island:
not numeric
--------
bill_length_mm:
mean 43.92193 
std dev 5.459584 
median 44.45 
interquartile range 9.275 
--------
bill_depth_mm:
mean 17.15117 
std dev 1.974793 
median 17.3 
interquartile range 3.1 
--------
flipper_length_mm:
mean 200.9152 
std dev 14.06171 
median 197 
interquartile range 23 
--------
body_mass_g:
mean 4201.754 
std dev 801.9545 
median 4050 
interquartile range 1200 
--------
sex:
not numeric
--------
year:
mean 2008.029 
std dev 0.8183559 
median 2008 
interquartile range 2 
--------
  1. Write a loop to compute 500 bootstrap replicates of the means of bill length, bill depth, and flipper length. Remember to pre-allocate a data structure to store the 500 times 3 values. Provide an estimate of the correlation of the sample means.
bootmeans <- matrix(NA, nrow = 500, ncol = 3, 
                    dimnames = list(NULL, c("bill_length", "bill_depth", "flipper_length")))
for(i in 1:nrow(bootmeans)){
  bootindex <- sample(1:nrow(penguins), replace = TRUE)
  peng.star <- penguins[bootindex, ]
  
  bootmeans[i, 1] <- mean(peng.star$bill_length_mm, na.rm = TRUE)
  bootmeans[i, 2] <- mean(peng.star$bill_depth_mm, na.rm = TRUE)
  bootmeans[i, 3] <- mean(peng.star$flipper_length_mm, na.rm = TRUE)
}

summary(bootmeans)
  bill_length      bill_depth    flipper_length 
 Min.   :43.11   Min.   :16.78   Min.   :198.1  
 1st Qu.:43.72   1st Qu.:17.07   1st Qu.:200.4  
 Median :43.92   Median :17.14   Median :200.9  
 Mean   :43.92   Mean   :17.15   Mean   :200.9  
 3rd Qu.:44.12   3rd Qu.:17.22   3rd Qu.:201.4  
 Max.   :44.88   Max.   :17.49   Max.   :203.1  
cor(bootmeans)
               bill_length bill_depth flipper_length
bill_length      1.0000000 -0.2786473      0.6238246
bill_depth      -0.2786473  1.0000000     -0.6494218
flipper_length   0.6238246 -0.6494218      1.0000000

Loops for numeric calculation

Loops are sometimes unavoidable if a calculation depends on the value at one or more of the previous iterations.

One way to compute the Kaplan-Meier curve for right censored data is to loop through the death times and accumulate the product of 1 minus the number of deaths at each time over the number at risk at that time. Complete the following code to compute the KM curve and compare to the result from the survival package.

amldat <- survival::aml
library(survival)

deathtimes <- c(0, sort(unique(amldat$time[amldat$status == 1])))
surv <- c(1, numeric(length(deathtimes) - 1))

for(i in 2:length(deathtimes)) {
  
  atrisk <- subset(amldat, time > deathtimes[i - 1])
  
  deaths_at_ti <- sum(atrisk$status[atrisk$time == deathtimes[i]] == 1)
  
  surv[i] <- surv[i - 1] * (1 - (deaths_at_ti / 
                                     nrow(atrisk)))
  
}

plot(surv ~ deathtimes)
lines(survfit(Surv(time, status) ~ 1, data = amldat))

Loops to do data manipulation

  1. Write a loop that contains an if then else statement that goes through the variables in penguins and replaces missing values with the mean for numeric double variables, and the most frequent value for characters or factors.
my_mode <- function(x) {
  
  converter <- get(paste0("as.", class(x)))
  tab <- table(x) |> sort(decreasing = TRUE) 
  names(tab)[1] |> converter()
  
}


for(colnum in 1:ncol(penguins)){
  
  thiscol <- penguins[[colnum]]
  if(is.double(thiscol)) {
    
    thismean <- mean(thiscol, na.rm = TRUE)
    penguins[is.na(thiscol), colnum] <- thismean
    
  } else {
    
    penguins[is.na(thiscol), colnum] <- my_mode(thiscol)
    
  } 
}

summary(penguins)
      species          island    bill_length_mm  bill_depth_mm  
 Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
 Chinstrap: 68   Dream    :124   1st Qu.:39.27   1st Qu.:15.60  
 Gentoo   :124   Torgersen: 52   Median :44.25   Median :17.30  
                                 Mean   :43.92   Mean   :17.15  
                                 3rd Qu.:48.50   3rd Qu.:18.70  
                                 Max.   :59.60   Max.   :21.50  
 flipper_length_mm  body_mass_g       sex           year     
 Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
 1st Qu.:190.0     1st Qu.:3550   male  :179   1st Qu.:2007  
 Median :197.0     Median :4025                Median :2008  
 Mean   :200.9     Mean   :4199                Mean   :2008  
 3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
 Max.   :231.0     Max.   :6300                Max.   :2009