Introduction and Overview

Day 1

Michael C Sachs

Introduction

Learning objectives

  • Understand and apply programming principles to handle repetitive tasks
  • Understand how to use and make your own functions in R
  • Use loops in R
  • Apply these principles to perform efficient data manipulation, analysis, and reporting

Why?

If your research involves any data, it likely also involves:

  • Data management and manipulation
  • Analysis
  • Reporting of results

This means lots of code.

We can borrow some ideas from software development to

  1. Make our lives easier – repetitive tasks, updates with new data
  2. Make science better – reproducible, understandable, reusable

Goals

At a bare minimum, we want our code to be:

Organized

  • Coherent structure of
    • folders, files, and scripts

Transportable

  • Works on
    • my future computer
    • other people’s computers

Understandable

  • Clearly specified dependencies
  • Readable code, sensible naming of things
  • Written documentation explaining what and how

Extras

Nice to have, but not absolutely necessary

Provably correct

Through automated testing

  • Pieces of code correctly do what they should
  • Code works together correctly, not sensitive to minor changes in the data
  • Valid statistical properties

Version controlled

  • History of development is tracked and documented

Overview of this course

Lectures

Lessons

Work through a problem together, learn tools and strategies

Sticky note system

red note = “I need help!”

green note = “Good to go!”

No sticky = still working

Share solutions/questions/responses on padlet:

https://padlet.com/sachsmc/rprogs23

Course website:

https://sachsmc.github.io/r-programming/

Schedule of topics

Today

  1. Reporting, dynamic documents
  2. Data structures and indexing

Thursday

  1. Flow control and loops
  2. Creating and using functions

Next Monday

  1. Working with data, merging and reshaping
  2. Working with data, dates and characters

Next Thursday

  1. Data visualization
  2. Time to work on exam

About me

What I do

  • Develop and evaluate statistical methods
    • Write R packages
    • Write R code to test the methods
  • Collaborate on research projects involving data analysis
    • Use register and other data from stats Denmark
    • Data analysis, visualization and reporting

My R philosophy

  • I started learning R in 2005 (version 2.2)
  • There is almost no problem that can’t be solved with R
    • Flexible and dynamic
    • Plays nicely with almost any other programming language

Course philosophy

  • There are no stupid questions
  • There are many ways to solve a problem, there is no single “right way”
  • I will not force you to learn any particular way, e.g., tidyverse vs data.table, we will focus on learning the general principles.
    • Try them out, then choose one and get good at it
    • Same with project organization, choose a system and stick with it
  • Show up to class and participate, you will pass and hopefully learn something useful

Exam

To get credit for the course, you must hand in and pass the exam which will be an email sent to michael.sachs@sund.ku.dk no later than 23.59 on April 4.

It must contain as attachments:

  1. A rendered output document in a format of your choice that includes prose, code, and output from code.
  2. The source document containing code
  3. Any other files necessary to run the code

The body of the email should be the “readme” file, with details on

  1. What is included
  2. How to run the code and reproduce the results

How to pass the exam

  1. Send something on time (Before midnight on April 4)
  2. Demonstrate mastery of at least 3 out of the 4 learning objectives.

An easy way to do this is to do the exercises and save them all in a single Rmarkdown or Quarto document.

Office hours

The week after the course and before April 4, I will hold office hours where you can come ask questions and get help on any R topic

I will send sign up link by email

Questions?

Link home