Instructions:
I suggest you start working on this week’s assignment in a plain R script file, and work out the different tasks using R code alone.
The submission, of course, has to be an R Markdown file and the corresponding knitted HTML file. You will create this R Markdown file yourself, adding R chunks that contain code copied from your script file. You also have to write the question numbers and a minimal narrative in Markdown, so that the file reads more like a report.
Make sure that the HTML file renders properly. Both files are required; omitting either one makes the assignment incomplete.
Also, please remove any and all template text from the R Markdown file before you submit it. We don’t want to see template material in any submission.
Let’s look at the breast cancer dataset we’ve been using.
geom_bar. How would you modify the dataset so that there is no separate bar for NAs?
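One way to drop the NA bar is to filter out missing values before plotting. The sketch below assumes a data frame with a HER2.Status column; the name brca and its rows are hypothetical placeholders, not the real breast cancer data.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in for the breast cancer data (placeholder rows only);
# assumes a HER2.Status column that contains some NA values.
brca <- data.frame(
  HER2.Status = c("Positive", "Negative", NA, "Negative", "Equivocal", NA)
)

# Keep only rows where HER2.Status is observed, so geom_bar() has no NA level.
brca_complete <- brca %>% filter(!is.na(HER2.Status))

p <- ggplot(brca_complete, aes(x = HER2.Status)) +
  geom_bar()
```

tidyr::drop_na(brca, HER2.Status) is an equivalent way to remove those rows.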
aes(color = HER2.Status) or aes(fill = HER2.Status) to the geom_bar statement. What is the difference between these two choices?)
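A minimal sketch of the two mappings, using a small hypothetical data frame. ER.Status on the x-axis is an assumption, since the x variable for this chart is not given here. Note that quoting the column name inside aes() maps the constant string "HER2.Status" rather than the column, so every bar gets the same colour.

```r
library(ggplot2)

# Hypothetical placeholder data; ER.Status as the x variable is an assumption.
brca <- data.frame(
  ER.Status   = c("Positive", "Positive", "Positive", "Negative", "Negative"),
  HER2.Status = c("Positive", "Negative", "Negative", "Negative", "Positive")
)

# color: HER2.Status sets the bar outlines; interiors keep the default grey.
p_color <- ggplot(brca, aes(x = ER.Status)) + geom_bar(aes(color = HER2.Status))

# fill: HER2.Status sets the bar interiors, giving coloured stacked segments.
p_fill <- ggplot(brca, aes(x = ER.Status)) + geom_bar(aes(fill = HER2.Status))

# Quoted name: maps the constant string "HER2.Status", not the column,
# so all bars share one fill and the legend has a single entry.
p_const <- ggplot(brca, aes(x = ER.Status)) + geom_bar(aes(fill = "HER2.Status"))
```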
?geom_bar as well as section 3.8 of R4DS to find this solution.
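In the first edition of R4DS, section 3.8 covers position adjustments, which control how the coloured segments of a bar are arranged. The sketch below compares the three common choices on hypothetical placeholder data; whether this is the exact solution intended here is an assumption.

```r
library(ggplot2)

# Hypothetical placeholder data; ER.Status on the x-axis is an assumption.
brca <- data.frame(
  ER.Status   = c("Positive", "Positive", "Positive", "Negative", "Negative"),
  HER2.Status = c("Positive", "Negative", "Negative", "Negative", "Positive")
)

base <- ggplot(brca, aes(x = ER.Status, fill = HER2.Status))

p_stack <- base + geom_bar()                    # default: segments stacked
p_dodge <- base + geom_bar(position = "dodge")  # segments placed side by side
p_prop  <- base + geom_bar(position = "fill")   # stacks rescaled to height 1
```

position = "fill" is useful for comparing proportions across groups of different sizes; "dodge" makes the individual counts easier to read.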
The diamonds dataset is included in the ggplot2 package.
x, y and z variables)
The following links provide U.S. incidence rates per 100,000, standardized to the 2000 U.S. standard population, for brain, colon, esophageal, lung and oral cancers for the period 1975-2016. These data/HW6 are provided by the SEER program.
Our goal is to create a single graphic showing the patterns of incidence rates over this time period for the 5 cancers, something like the one below:
brain, colon, esophagus, lung and oral.
For the next few steps, I will refer to the both-sexes datasets. You’ll do the same for the male and female datasets.
both_sexes with the names of the cancer sites
names(brain) gives the column names for the brain data. You can also change column names using names(brain) <- ..., since this is just a vector. In particular, you can try something like names(brain) <- stringr::str_replace(names(brain), 'both_sexes', 'brain').
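A minimal sketch of the renaming step. The column layout (a year column plus one rate column per race, each ending in both_sexes) is a hypothetical assumption, and the values are placeholders, not SEER data.

```r
library(stringr)

# Hypothetical layout for the brain data; placeholder values only.
brain <- data.frame(
  year                 = 1975:1977,
  all_races_both_sexes = c(1, 2, 3),
  white_both_sexes     = c(1, 2, 3),
  black_both_sexes     = c(1, 2, 3)
)

names(brain)  # the column names are just a character vector

# Swap "both_sexes" for the site name so columns stay distinct after joining.
names(brain) <- str_replace(names(brain), "both_sexes", "brain")
names(brain)
```

Renaming each site's columns this way means that, after the joins, every rate column still says which cancer it belongs to.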
Create a new composite dataset by repeatedly using left_join or inner_join to add each site-specific dataset to the composite dataset. That is, join A and B, then join C to the result, then join D to that result, and so on.
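A sketch of the repeated join, assuming each site-specific dataset shares a year column (a hypothetical name) and has already had its rate columns renamed. The three small data frames hold placeholder values only.

```r
library(dplyr)

# Hypothetical site-specific datasets after renaming; placeholder values.
brain <- data.frame(year = 1975:1977, all_races_brain = c(1, 2, 3))
colon <- data.frame(year = 1975:1977, all_races_colon = c(4, 5, 6))
lung  <- data.frame(year = 1975:1977, all_races_lung  = c(7, 8, 9))

# Join A and B, then join C to the result, and so on.
composite <- brain %>%
  left_join(colon, by = "year") %>%
  left_join(lung, by = "year")

# Same result for any number of datasets: fold left_join over a list.
composite2 <- Reduce(function(a, b) left_join(a, b, by = "year"),
                     list(brain, colon, lung))
```

With complete yearly data, left_join and inner_join give the same rows; they differ only if some dataset is missing years.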
pivot_longer to make a dataset with 3 columns: year, type of cancer and cancer incidence rate.
Create 3 plots like the one above, one for all races, one for whites and one for blacks. Assign the ggplot code for each plot to a name, e.g. plt1 <- ggplot(...) + ... Then display the graph for all races, and create and display a panelled plot where the white and black plots are presented side by side. You can use functions from cowplot, ggpubr or patchwork as you like.
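The sketch below strings the last steps together under several assumptions: the composite data has a year column and rate columns named like white_brain (the pattern the str_replace renaming would produce), the target figure is a line chart of rate against year, and patchwork is used for the panel. The helpers make_long and plot_rates are hypothetical names, and the values are placeholders, not SEER rates.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(patchwork)

# Hypothetical wide composite data: one row per year, one column per
# race x site. Placeholder values, NOT real SEER incidence rates.
composite <- data.frame(
  year = 1975:1977,
  all_races_brain = c(1, 2, 3), white_brain = c(1, 2, 3), black_brain = c(1, 2, 3),
  all_races_colon = c(4, 5, 6), white_colon = c(4, 5, 6), black_colon = c(4, 5, 6)
)

# Hypothetical helper: keep one race's columns and reshape them into
# 3 columns (year, cancer, rate), stripping the race prefix from the names.
make_long <- function(data, race) {
  data %>%
    select(year, starts_with(paste0(race, "_"))) %>%
    pivot_longer(
      cols = -year,
      names_to = "cancer",
      names_prefix = paste0(race, "_"),
      values_to = "rate"
    )
}

# Hypothetical helper: one line per cancer site over time.
plot_rates <- function(long_data, title) {
  ggplot(long_data, aes(x = year, y = rate, color = cancer)) +
    geom_line() +
    labs(title = title, x = "Year", y = "Incidence per 100,000",
         color = "Cancer site")
}

plt1 <- plot_rates(make_long(composite, "all_races"), "All races")
plt2 <- plot_rates(make_long(composite, "white"), "Whites")
plt3 <- plot_rates(make_long(composite, "black"), "Blacks")

print(plt1)          # all races on its own
print(plt2 | plt3)   # patchwork: whites and blacks side by side
```

Wrapping the reshaping and plotting in small functions keeps the three plots consistent; cowplot::plot_grid(plt2, plt3) would be an alternative to the patchwork operator.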