split(penguins$body_mass_g, list(penguins$species, penguins$sex)) |>
lapply(FUN = mean_sd)
Functions – exercises
Day 2, B
Understanding how to create and use your own functions
Learning objectives
In this lesson you will
- Learn how the
apply
family of functions works, and the alternatives usingdplyr
anddata.table
- Practice writing and reusing your own functions
Iterating over data with functions
- Load the Palmer penguins dataset.
- Write a function to compute the mean and standard deviation of a numeric vector. We will apply this function to the numeric variables in
penguins
, and also by different subgroups
Stop and think
- What are the components of a function?
- Do I have to worry about missing data? How can I deal with it?
- What sort of data structure should I return?
Using the function
- Use your function to compute the mean and sd of all the numeric variables in
penguins
. - Use your function to compute the mean and sd of body mass by species and sex
- Try using one of the apply functions
Hints
sapply
instead of lapply
. What about tapply
?
- Try using
dplyr
: check out the functionsgroup_by
andsummarize
Hints
library(dplyr)
|> group_by(species, sex) |>
penguins summarize(mean_sd(body_mass_g))
- Try using
data.table
: use the.by
argument in the[
Hints
library(data.table)
<- data.table(penguins)
pengdt
mean_sd(body_mass_g), by = list(species, sex)]
pengdt[, ## . can be used as shorthand for list in data table
mean_sd(body_mass_g), by = .(species, sex)] pengdt[,
Classes and custom generics
Now that you have some functions to do something interesting, let’s create a “class” to indicate that the object has a specific meaning.
- Modify your mean and sd function so that the data structure that is returned has class “meansd”. There are two ways to do this:
- Say the object you currently return is called
res
, instead ofres
, returnstructure(res, class = "meansd")
- Add the line
class(res) <- "meansd"
before returning `res`` - Use the
attr
function to create and assign additional information, for example the name of the variable, You can get the name of the object passed tox
usingdeparse1(substitute(x))
.
- Say the object you currently return is called
Hints
<- function(x, na.rm = TRUE) {
mean_sd2 <- c(mean = mean(x, na.rm = na.rm),
res sd = sd(x, na.rm = na.rm))
attr(res, "variable") <- deparse1(substitute(x))
class(res) <- "meansd"
res }
- Write a custom print function
print.meansd
that nicely prints the mean and standard deviation. Use the functionsround
andpaste
functions to create a string, then print it out using thecat
function.
Hints
<- function(x, digits = 2) {
print.meansd
<- paste0(round(x["mean"], digits = digits), " (",
msd round(x["sd"], digits = digits), ")")
cat("mean (sd) of ",
attr(x, "variable"), ":",
"\n")
msd,
}
More functions
- Write another function that constructs a one-sample t-statistic from an estimated mean and standard deviation. Recall that the t-statistic to test the null hypothesis that \(\mu = \mu_0\) is \[ T = \frac{\overline{X} - \mu_0}{\hat{\sigma}/\sqrt{n}} \] where \(\overline{X}\) is the sample mean and \(\hat{\sigma}\) is the sample standard deviation and \(n\) is the sample size.
- Write another function that takes the t-statistic and calculates a p-value
- Compose your custom functions in order to test the null hypothesis that the mean body mass of penguins is 4000g. Try using the pipe operator
|>
.
If you have time or on your own
- Expand your class to include confidence interval and p-value calculation/printing. Check out the
scales::pvalue
function. - Look at the
t.test
function. What type of object does this return? - Look at the print method for the class of the object returned by
t.test
, use the commandstats:::print.htest
to find the source. How does it work? How would you modify it? - Are there any other methods are available for that class? Use the
methods
function to find out. What would be another useful method for this class?