Understanding conditional and repeated execution of R expressions
Author
Michael C Sachs
Learning objectives
In this lesson you will:
Practice working with if and else statements for conditional execution
Practice working with loops for repeated execution
Simple loops and conditional statements
Use a loop to print every number from 1 to 10
Modify the loop to print every even number from 1 to 10 (hint: add an if statement and use (i %% 2) == 0 to check whether i is divisible by 2).
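To leave the exercise itself for you, the same pattern is shown here with a different condition. The sketch prints the multiples of 3 from 1 to 12 by combining a for loop with an if/else statement and the %% (remainder) operator. It also collects those numbers in a vector named found and counts the skipped numbers in n_skipped. Both names are arbitrary choices made for this illustration.

```r
found <- integer(0)   # multiples of 3, collected for later inspection
n_skipped <- 0        # how many numbers failed the divisibility test

for (i in 1:12) {
  if ((i %% 3) == 0) {          # remainder 0 means i is divisible by 3
    print(i)                    # print() is needed to show values inside a loop
    found <- c(found, i)
  } else {
    n_skipped <- n_skipped + 1  # i is not a multiple of 3
  }
}
```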
Loops for statistical analysis
Load the palmerpenguins dataset:
library(palmerpenguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
Write a loop that calculates and prints the mean of each numeric variable in the penguins dataset.
Hints
How do you determine whether a variable is numeric? Use the is.numeric function, which returns TRUE or FALSE.
Inside a loop, R does not automatically print the value of an expression to the console, so you need to wrap it in the print function, e.g., print(mean(x)). Some variables contain missing values (NA); use mean(x, na.rm = TRUE) to ignore them.
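Here is a minimal sketch of the loop-plus-is.numeric pattern. It runs on a small made-up data frame (toy, with hypothetical values) instead of penguins, so it needs no packages. The means are also stored in a named vector, toy_means. That vector is an illustrative addition and is not part of the exercise.

```r
# Hypothetical toy data standing in for penguins: two numeric columns
# (one with a missing value) and one factor column.
toy <- data.frame(
  height = c(1.5, 2.0, NA, 2.5),
  count  = c(3L, 5L, 4L, 8L),
  group  = factor(c("a", "b", "a", "b"))
)

toy_means <- numeric(0)              # named vector, filled in as we go
for (v in names(toy)) {
  x <- toy[[v]]                      # extract the column as a vector
  if (is.numeric(x)) {               # skips non-numeric columns such as factors
    m <- mean(x, na.rm = TRUE)       # na.rm = TRUE drops the missing values
    toy_means[v] <- m
    print(paste0(v, ": ", m))        # explicit print() is needed inside a loop
  }
}
```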
Modify your loop from the previous exercise so that it prints the mean, standard deviation, median, and interquartile range of each numeric variable in penguins.
Hints
Use a nested loop where one of the iterators is the name of a function (the relevant functions are mean, sd, median, and IQR). To retrieve a function by name, use the get function; e.g., get("mean") returns the mean function, which can then be saved as an intermediate object and used like any other function.
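Below is a sketch of the nested loop. It assumes a small made-up data frame in place of penguins. The outer loop runs over variables, the inner loop runs over function names, and get() turns each name into a function. All four functions accept an na.rm argument. The results matrix res is one possible way to store the output.

```r
# Hypothetical data standing in for the numeric penguin variables.
toy <- data.frame(
  a = c(2, 4, 4, 5, 7, 9),
  b = c(1, NA, 3, 3, 10, 6)
)
stat_names <- c("mean", "sd", "median", "IQR")

# Pre-allocate a matrix with one row per variable and one column per statistic.
res <- matrix(NA_real_, nrow = ncol(toy), ncol = length(stat_names),
              dimnames = list(names(toy), stat_names))

for (v in names(toy)) {            # outer loop: variables
  for (fname in stat_names) {      # inner loop: function names
    f <- get(fname)                # look up the function object by its name
    res[v, fname] <- f(toy[[v]], na.rm = TRUE)
    print(paste0(v, " ", fname, ": ", round(res[v, fname], 3)))
  }
}
```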
Write a loop to compute 500 bootstrap replicates of the means of bill length, bill depth, and flipper length. Remember to pre-allocate a data structure to store the 500 × 3 values. Use the replicates to estimate the correlations between the three sample means.
Hints
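To get a bootstrap sample, use the sample function with the argument replace = TRUE. Resample row indices, e.g., sample(nrow(penguins), replace = TRUE), so that the three measurements from each penguin stay together; resampling each variable separately would destroy the correlation you are trying to estimate.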
You can store the replicates in a matrix with 500 rows and 3 columns. Refer to the data structures lecture for information about indexing matrices.
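The following sketch of the bootstrap loop uses the built-in mtcars data (columns mpg, wt, and hp) as a stand-in for the three penguin measurements. It resamples whole rows, stores the replicates in a pre-allocated 500 × 3 matrix, and then estimates the correlations with cor(). With penguins you would likely need colMeans(..., na.rm = TRUE) because of the missing values.

```r
set.seed(1)                                  # make the resampling reproducible
dat <- mtcars[, c("mpg", "wt", "hp")]        # built-in data standing in for penguins
B <- 500
n <- nrow(dat)

# Pre-allocate: one row per bootstrap replicate, one column per variable.
boot_means <- matrix(NA_real_, nrow = B, ncol = ncol(dat),
                     dimnames = list(NULL, names(dat)))

for (b in seq_len(B)) {
  idx <- sample(n, size = n, replace = TRUE) # resample rows, keeping each car's values together
  boot_means[b, ] <- colMeans(dat[idx, ])    # means of the three columns in this replicate
}

r <- cor(boot_means)                         # estimated correlations of the sample means
r
```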
Loops for numeric calculation
Loops are sometimes unavoidable when the calculation at one iteration depends on values computed at one or more previous iterations.
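For example, each Fibonacci number is the sum of the two numbers before it. A loop that fills a pre-allocated vector one element at a time is therefore a natural way to compute the sequence. This is a generic illustration and is not part of the exercise below.

```r
n <- 10
fib <- numeric(n)   # pre-allocate the result
fib[1:2] <- 1       # starting values

for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]   # depends on the two previous iterations
}
fib
```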
One way to compute the Kaplan-Meier curve for right-censored data is to loop through the distinct death times t_i and accumulate the product of the factors 1 - d_i/n_i, where d_i is the number of deaths at t_i and n_i is the number at risk just before t_i. Complete the following code to compute the KM curve and compare it to the result from the survival package.
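The setup is as follows. The distinct death times are t_1 < t_2 < ... . At each t_i there are d_i deaths, and n_i subjects are at risk just before t_i (those whose observed time is at least t_i). In symbols, the estimate and the recursive update that the loop implements are:

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right),
\qquad
\hat{S}(t_i) = \hat{S}(t_{i-1}) \left(1 - \frac{d_i}{n_i}\right),
\qquad
\hat{S}(t_0) = \hat{S}(0) = 1.
```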
amldat <- survival::aml
library(survival)
## 0 followed by the distinct death times, so that surv[1] = S(0) = 1
deathtimes <- c(0, sort(unique(amldat$time[amldat$status == 1])))
surv <- c(1, numeric(length(deathtimes) - 1))
for (i in 2:length(deathtimes)) {
  ## subjects still at risk at t_i have observed time >= t_i; n_i = nrow(atrisk)
  atrisk <- subset(amldat, time >= deathtimes[i])
  ## count the number of deaths at time t_i, call it d_i,
  ## then compute 1 - d_i / n_i and multiply it by the previous survival probability
}
## plot(surv ~ deathtimes, type = "s")
## lines(survfit(Surv(time, status) ~ 1, data = amldat))
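To check your understanding of the update, here is a sketch of the same loop on a tiny made-up dataset. The vectors time and status are hypothetical and do not come from aml. The data are small enough to verify by hand, and the estimates at 0 and the four death times are 1, 5/6, 2/3, 4/9, and 0.

```r
# Hypothetical toy data (not aml): observed times and event indicators (1 = death).
time   <- c(2, 3, 3, 5, 6, 8)
status <- c(1, 1, 0, 1, 0, 1)

tt <- c(0, sort(unique(time[status == 1])))  # 0 followed by the distinct death times
km <- c(1, numeric(length(tt) - 1))          # km[1] = S(0) = 1

for (i in 2:length(tt)) {
  n_i <- sum(time >= tt[i])                  # number at risk just before t_i
  d_i <- sum(time == tt[i] & status == 1)    # number of deaths at t_i
  km[i] <- km[i - 1] * (1 - d_i / n_i)       # update uses the previous iteration
}
cbind(time = tt, surv = km)
```

If the survival package is installed, summary(survfit(Surv(time, status) ~ 1))$surv on the same two vectors should reproduce the values after the first.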
Optional: Loops to do data manipulation
Important
This exercise uses some concepts from functional programming, which we have not covered yet. Try it if you have time; otherwise, return to it after the functions lecture.
You may notice that some of the variables have missing values. We would like to replace the missing values with the “typical” value that is observed.
Write a loop that goes through the variables in penguins and uses an if/else statement to replace missing values with the mean for double (numeric) variables and with the most frequent value for character or factor variables.
Hints
Use this function to calculate the mode in a way that returns the same data type.
Inside the loop you will need to check whether the variable is of type “double”. Do this using the is.double function, which returns TRUE or FALSE.
To assign a new value m to only the missing elements of a vector x, use x[is.na(x)] <- m. Refer to the data structures lecture for information about indexing vectors.
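The sketch below runs the imputation loop on a small made-up data frame. The lesson's own mode function is not reproduced here, so the block defines a hypothetical helper, most_frequent(). It returns the most common non-missing value and keeps the input's type. Ties go to the value that appears first. Under this if/else, integer columns such as flipper_length_mm in penguins are not double, so they fall into the mode branch.

```r
# Hypothetical helper (not the lesson's mode function): most frequent
# non-missing value of x, returned with the same type as x.
most_frequent <- function(x) {
  x_obs <- x[!is.na(x)]
  ux <- unique(x_obs)                        # distinct observed values, same type as x
  ux[which.max(tabulate(match(x_obs, ux)))]  # first value with the highest count
}

# Hypothetical toy data with one missing value per column.
toy <- data.frame(
  mass  = c(3.5, NA, 4.5, 4.0),
  sex   = factor(c("male", NA, "female", "male")),
  count = c(2L, 2L, NA, 5L)
)

for (v in names(toy)) {
  x <- toy[[v]]
  if (is.double(x)) {
    m <- mean(x, na.rm = TRUE)   # doubles: replace NA with the mean
  } else {
    m <- most_frequent(x)        # factors, characters, integers: use the mode
  }
  x[is.na(x)] <- m               # overwrite only the missing elements
  toy[[v]] <- x
}
toy
```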