5.2 Joins
We mentioned earlier that there are several kinds of ways we can join data. The different kinds of joins are described below.
Let’s look at these joins with an example. We have two simulated datasets looking at DOS real estate allocation and staffing. We will look at how much area on average each bureau has given the number of employees
staffing_data <- import('data/Staffing_by_Bureau.csv') %>% as_tibble()
real_estate <- import('data/DoS_Real_Estate_Allocation.csv') %>% as_tibble()
real_estate
## # A tibble: 666 x 4
## Building Bureau Location Size
## <chr> <chr> <int> <int>
## 1 HST Administration (A) 4779 640
## 2 SA2 Administration (A) 4801 1090
## 3 HST Administration (A) 5109 1040
## 4 HST Administration (A) 3717 1620
## 5 SA4 Administration (A) 3940 1390
## 6 HST Administration (A) 3661 1480
## 7 HST Administration (A) 3374 1770
## 8 HST Administration (A) 3387 1940
## 9 SA10 African Affairs (AF) 2605 640
## 10 HST African Affairs (AF) 3573 720
## # … with 656 more rows
staffing_data
## # A tibble: 10,000 x 6
## Bureau Gender Grade Title Name YearsService
## <chr> <chr> <chr> <chr> <chr> <int>
## 1 Protocol (S/CPR) female FS1 Manager Cathy Ca… 13
## 2 Administration (A) male GS-9 Team Me… Jeffery … 13
## 3 Intelligence and Research … male FS-6 Analyst Max Green 11
## 4 Mission to the United Nati… male FS-3 Manager Donald A… 7
## 5 Foreign Missions (OFM) male FS-6 Team Me… Thomas L… 22
## 6 International Narcotics an… male GS-8 Team Me… Joseph A… 12
## 7 Administration (A) male GS-12 Analyst Michael … 6
## 8 Intelligence and Research … male FS-5 Team Me… Jesus Sh… 2
## 9 Science & Technology Advis… male N/A Manager Lawrence… 19
## 10 Administration (A) female FS-8 Team Me… Jennie C… 17
## # … with 9,990 more rows
The strategy is going to be to do a grouped summary of the staffing data to see how many people are in each Bureau, and then join that with the real estate data to compute the average area per employee by Bureau.
staff_summary <- staffing_data %>%
group_by(Bureau) %>%
tally(name = 'Pop')
realestate_summary <- real_estate %>%
group_by(Bureau) %>% summarize(Size = sum(Size))
realestate_summary %>% left_join(staff_summary) %>%
mutate(unit_area = Size/Pop) %>%
arrange(unit_area)
## Joining, by = "Bureau"
## # A tibble: 54 x 4
## Bureau Size Pop unit_area
## <chr> <int> <int> <dbl>
## 1 Global Youth Issues (GYI) 2090 345 6.06
## 2 Policy Planning Staff (S/P) 2420 240 10.1
## 3 Science & Technology Adviser (STAS) 4240 305 13.9
## 4 Foreign Missions (OFM) 4420 311 14.2
## 5 Trafficking in Persons (TIP) 5150 247 20.9
## 6 Medical Services (MED) 6760 308 21.9
## 7 Protocol (S/CPR) 7730 327 23.6
## 8 Administration (A) 10970 454 24.2
## 9 Oceans and International Environmental and Scient… 8420 330 25.5
## 10 Energy Resources (ENR) 10890 369 29.5
## # … with 44 more rows