Day 3, B
We’ve worked a bit with the patient register data, defining possible exposures/outcomes using ICD codes and dates.
Often the goal of a join or merge is to add new variables to an existing data table.
Example: given a table of hospitalization diagnoses, I want to know the drug dispensations that happened for each person after the hospitalization date.
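A minimal base R sketch of adding variables with a left join, using small hypothetical tables. The names hosp, disp, id, hosp_date, disp_date and drug are illustrative assumptions, not the register's actual variable names. Joining on id alone does not yet restrict to dispensations after the hospitalization date, so here the date condition is only flagged in a new column.

```r
# Hypothetical toy tables; column names are illustrative, not the register's.
hosp <- data.frame(
  id = c(1, 2, 3),
  hosp_date = as.Date(c("2020-03-01", "2020-06-15", "2020-09-30"))
)
disp <- data.frame(
  id = c(1, 1, 2, 2),
  disp_date = as.Date(c("2020-02-01", "2020-03-10", "2020-07-01", "2020-08-01")),
  drug = c("A", "B", "A", "C")
)

# Left join: keep every hospitalization and add the dispensation columns.
# Person 3 has no dispensations, so disp_date and drug are NA for that row.
hosp_disp <- merge(hosp, disp, by = "id", all.x = TRUE)

# The date condition is not applied by the join: person 1's dispensation on
# 2020-02-01 (before the hospitalization) is still in the result.
hosp_disp$after_hosp <- hosp_disp$disp_date > hosp_disp$hosp_date
hosp_disp
```

Restricting to dispensations after the hospitalization date needs an inequality condition on the dates, which is what the rest of this section covers.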
same as base R.
Slightly different (right join):
merge(x, y, by = "key", all.y = TRUE)
Inner join (the default):
merge(x, y, by = "key", all = FALSE)
Full join:
merge(x, y, by = "key", all = TRUE)
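A small sketch comparing the merge() variants on two hypothetical tables x and y whose keys only partly overlap (keys 1 to 3 in x, 2 to 4 in y). The left join line (all.x = TRUE) is included for completeness.

```r
# Hypothetical tables with partly overlapping keys.
x <- data.frame(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- data.frame(key = c(2, 3, 4), y_val = c("B", "C", "D"))

inner <- merge(x, y, by = "key")                # all = FALSE (default): keys 2, 3
left  <- merge(x, y, by = "key", all.x = TRUE)  # keys 1, 2, 3; y_val NA for key 1
right <- merge(x, y, by = "key", all.y = TRUE)  # keys 2, 3, 4; x_val NA for key 4
full  <- merge(x, y, by = "key", all = TRUE)    # keys 1, 2, 3, 4

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```

Comparing row counts before and after a merge is a quick way to see which rows a given variant keeps or drops.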
I want to merge these two tables where key1 matches exactly and key2 in y is less than key2 in x.
This is a multi-step process in base R.
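A minimal sketch of the multi-step approach, assuming hypothetical tables x and y with an exact key key1, an inequality key key2 (for example a date) and a row identifier x_id. First merge on key1 only, then filter on the inequality, then optionally add back unmatched x rows.

```r
# Hypothetical example tables.
x <- data.frame(key1 = c("A", "A", "B"), key2 = c(10, 20, 15), x_id = 1:3)
y <- data.frame(key1 = c("A", "A", "A", "B"), key2 = c(5, 12, 25, 20),
                y_val = c("p", "q", "r", "s"))

# Step 1: merge on the exact key only. Every x row is paired with every y row
# sharing its key1, so this intermediate table can get large on real data.
# The shared column key2 gets the default suffixes: key2.x and key2.y.
step1 <- merge(x, y, by = "key1")

# Step 2: keep only the pairs where key2 in y is less than key2 in x.
step2 <- step1[step1$key2.y < step1$key2.x, ]

# Step 3 (optional): add back the x rows without any match, as a left join would.
result <- merge(x, step2[, c("x_id", "key2.y", "y_val")], by = "x_id", all.x = TRUE)
result
```

Here step1 has 7 rows, of which 3 satisfy the inequality; x_id 3 has no match and ends up with NA in result.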
The previous example returns all matches satisfying the inequality. What if we only want the closest one?
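In base R, one possible way (a sketch under assumptions, not necessarily the approach used in the lecture) is to extend the multi-step approach: after filtering on the inequality, sort so the largest qualifying key2 from y comes first within each x row, then keep one row per x_id. The tables x and y are hypothetical and redefined so the block runs on its own.

```r
# Hypothetical example tables.
x <- data.frame(key1 = c("A", "A", "B"), key2 = c(10, 20, 15), x_id = 1:3)
y <- data.frame(key1 = c("A", "A", "A", "B"), key2 = c(5, 12, 25, 20),
                y_val = c("p", "q", "r", "s"))

# Merge on the exact key, then keep pairs where key2 in y is less than key2 in x.
step1 <- merge(x, y, by = "key1")
step2 <- step1[step1$key2.y < step1$key2.x, ]

# Sort so that, within each x_id, the largest qualifying key2.y comes first,
# i.e. the y value closest to (but below) key2 in x.
step2 <- step2[order(step2$x_id, -step2$key2.y), ]

# Keep the first row per x_id. With ties in key2.y, only one of them is kept.
closest <- step2[!duplicated(step2$x_id), ]
closest
```

Rows of x without any qualifying match (x_id 3 here) are dropped; they can be added back with merge(..., all.x = TRUE) as in a left join.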
Joining on inequality conditions is available in dplyr since version 1.1.0, through join_by().
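A minimal dplyr sketch of the same inequality join, assuming the hypothetical tables x and y below. Inside join_by(), the name on the left of a condition refers to x and the name on the right to y, so key2 > key2 means "key2 in x is greater than key2 in y".

```r
library(dplyr)  # join_by() and inequality joins need dplyr >= 1.1.0

# Hypothetical example tables.
x <- tibble(key1 = c("A", "A", "B"), key2 = c(10, 20, 15), x_id = 1:3)
y <- tibble(key1 = c("A", "A", "A", "B"), key2 = c(5, 12, 25, 20),
            y_val = c("p", "q", "r", "s"))

# key1 must match exactly, and key2 in y must be less than key2 in x.
# left_join() keeps x rows without any match (here x_id 3) with NA values.
res <- left_join(x, y, by = join_by(key1, key2 > key2))
res
```

The condition that took a merge plus a subset in base R is written in a single join_by() call.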
dplyr can also keep only the closest match, using closest() inside join_by().
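A minimal sketch of such a rolling join, again on hypothetical tables x and y. Wrapping the inequality in closest() keeps, for each x row, only the y row with the largest key2 that is still below key2 in x.

```r
library(dplyr)  # closest() inside join_by() needs dplyr >= 1.1.0

# Hypothetical example tables.
x <- tibble(key1 = c("A", "A", "B"), key2 = c(10, 20, 15), x_id = 1:3)
y <- tibble(key1 = c("A", "A", "A", "B"), key2 = c(5, 12, 25, 20),
            y_val = c("p", "q", "r", "s"))

# For each x row: among y rows with the same key1 and key2 below x's key2,
# keep the one with the largest key2 (the closest earlier value).
res <- left_join(x, y, by = join_by(key1, closest(key2 > key2)))
res
```

x_id 1 is matched to key2 = 5, x_id 2 to key2 = 12, and x_id 3 has no earlier value in y, so its y_val is NA.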
I prefer dplyr in this case because it makes it easier to communicate and inspect what is going on.
When merging real data, remember these principles.
After the next lecture, we will continue using the register data example to practice merging and defining new variables based on dates and strings.