Examples from the register
- Inpatient and outpatient hospitalizations are in different tables, outpatient has only admission date, inpatient also has discharge date
- Drugs are large in number, so drug registers are stored in different files by year
- Dates do not match exactly in different registers, often want to match within a range of dates
Non exact joins
I want to merge these two tables where key1
matches exactly, and key2
in y
is less than key2
in x
Base R
Multi step process in base R
Rolling joins with data.table
The previous example returns all matches satisfying the inequality, what if we only want the closest one?
Tidyverse
This is available in dplyr
since version 1.1.0
Also can keep only the closest.
I prefer the dplyr in this case, because it is easier to communicate and inspect what is going on.
Remember
When working with real data, remember these principles when merging
- What is the key? Is it named something else in the other table?
- Does the key define unique observations? If not (e.g., multiple rows per individual), then you will have a non one-to-one merge.
- Working with dates and ranges is possible, but be careful and check the results carefully
Practical
After the next lecture, we will continue using the register data example to practice merging and defining new variables based on dates and strings.
Link home