You are expected to edit this R Markdown document, especially the R code chunks. Once you’re done, please change eval=F to eval=T in each chunk in order to make the code run. Most of the R chunks are fill-in-the-blanks, with a few left empty for you to completely fill in.
Your submission for this week will comprise 2 files:
my_variable <- 1:5
my_var1able
mean(airquality$wind)
table(iris$Sepal_Length)
data folder of your RStudio Project for this class.
breast_cancer.
# Code to import data file here
__________ <- __::______("data/clinical_data_breast_cancer_modified.csv")
# Check for data types
____(breast_cancer)
# Convert here
breast_cancer$Gender <- as.factor(breast_cancer$Gender)
## You can repeat this for the other variables, making sure you spell them properly. Or you can comment
## this out and proceed as follows:
breast_cancer <- breast_cancer %>%
  mutate(Gender = as.factor(Gender),
         ______ = ________(_______),
         ..... # fill this in with the other variables
         )
## Comment out one of the two strategies for your submission
# Convert any new variables here
breast_cancer
match the modifications you made
brca_data.
brca_data <- ____::________('data/clinical_data_breast_cancer_hw.csv')
___(brca_data)
## Add code here to correct any problems in the data set
## There are two approaches here. The first looks at the data and finds out the actual values that aren't Positive or Negative
brca_data1 <- brca_data %>%
mutate(ER.Status = recode_factor(ER.Status, Indeterminate = NA_character_),
mutate(________ = recode_factor(_______, '_________' = NA_character_))
## There is an error in the above code, and a couple of solutions.
## The other is to run the same function on the columns for ER, PR and HER2 status
clean_markers <- function(x){
  x <- recode_factor(x, Positive = 'Positive', _______ = '________', .default = NA_character_)
  return(x)
}
brca_data2 <- brca_data %>%
  mutate(across(c(____, ____, ____), ________))
## Verify that both data sets are identical
all(brca_data1 == brca_data2, na.rm=TRUE)
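If the across() pattern above is unfamiliar, here is a minimal sketch on made-up data. The data frame toy, its columns, and the helper to_flag are all hypothetical and have nothing to do with the breast cancer file; the point is only that across() applies one function to each of the columns you name and leaves the others alone.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  id = 1:3,
  x = c("yes", "no", "maybe"),
  y = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where the value is exactly "yes"
to_flag <- function(v) {
  v == "yes"
}

# Apply the same helper to columns x and y; id is not touched
toy2 <- toy %>%
  mutate(across(c(x, y), to_flag))

print(toy2)
```

Writing the rule once as a function and applying it to several columns avoids copy-paste mistakes when the same cleaning step is needed in more than one place.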
brca_data3 <- _______(brca_data, ______, ______, _______, _______)
Creating new variables (what dplyr function will you use for all of these?): ___________
Create a variable giving the TNM status of each patient. The T, N and M statuses are given separately. I want a single variable encoded as, for example, “T2N0M0”. [Hint: The function paste is your friend]
brca_data <- brca_data %>%
  ______(tnm_status = ___________(Tumor, Node, Metastasis, sep = ""))
brca_data3$tnm_status <- brca_data$tnm_status
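To see how paste behaves when sep = "", here is a small sketch on made-up vectors. The names tumor, node and metastasis below are stand-alone toy vectors invented for this example, not columns read from the homework file.

```r
# Toy staging components, made up for illustration
tumor <- c("T1", "T2", "T3")
node <- c("N0", "N1", "N0")
metastasis <- c("M0", "M0", "M1")

# paste() works element-wise; sep = "" joins the pieces with no separator
stage <- paste(tumor, node, metastasis, sep = "")
print(stage)
```

Without sep = "", paste puts a single space between the pieces, which is not the encoding asked for here.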
case_when might help]:
- Luminal (ER positive and/or PR positive)
- HER2 (HER2 positive)
- Basal-like (ER, PR and HER2 negative)
brca_data3 <- brca_data3 %>%
mutate(mol_cat = case_when( # fill in the next 4 lines
)
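As a reminder of how case_when reads, here is a sketch on an unrelated toy example. The data frame readings and the temperature cut-offs are hypothetical and chosen only to show the syntax. Conditions are checked from top to bottom, the first one that is TRUE decides the value, and rows matching no condition get NA.

```r
library(dplyr)

# Hypothetical toy data, unrelated to the breast cancer file
readings <- data.frame(temp_c = c(-5, 12, 31, NA))

readings <- readings %>%
  mutate(temp_cat = case_when(
    temp_c < 0   ~ "freezing",  # checked first
    temp_c < 25  ~ "mild",      # only reached if the first condition is not TRUE
    temp_c >= 25 ~ "hot"
  ))

print(readings)
```

Because the order matters, put the most specific or highest-priority category first when categories can overlap.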
Vital.Status) or the time of last contact if they are alive. This is a common computation for survival analysis studies, and is called the overall survival time. [Hint: the function ifelse might be useful]
## Fill this in yourself. Time to start leaving the nest
brca_data <- brca_data %>%
  mutate(event_time = ___________(Vital.Status == 1, _____________, _______________))
brca_data3$event_time <- brca_data$event_time
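If you have not used ifelse before, the sketch below shows its three arguments on made-up vectors: a logical test, the value to use where the test is TRUE, and the value to use where it is FALSE. The vectors delivered, delivery_day and last_update_day are hypothetical and are not part of the homework data.

```r
# Hypothetical toy vectors, invented for illustration
delivered <- c(TRUE, FALSE, TRUE)
delivery_day <- c(10, 20, 30)
last_update_day <- c(100, 200, 300)

# Element-wise choice: take delivery_day where delivered is TRUE,
# otherwise take last_update_day
tracked_day <- ifelse(delivered, delivery_day, last_update_day)
print(tracked_day)
```

ifelse is vectorised, so it works directly inside mutate on whole columns.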
Save the cleaned breast cancer dataset as brca_cleaned. You can save this to your computer using saveRDS(brca_cleaned, file="<a filename of your choice>.rds"). We’ll be using this dataset again when we do plots and modeling.
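For reference, here is a minimal round trip with saveRDS and readRDS on a made-up data frame. The object toy and the temporary file path are assumptions for this sketch only; for your submission you would use brca_cleaned and a filename of your choice.

```r
# Hypothetical toy data frame with a factor column
toy <- data.frame(id = 1:3, group = factor(c("a", "b", "a")))

# Write to a temporary file so the example does not clutter your project
path <- tempfile(fileext = ".rds")
saveRDS(toy, file = path)

# Read it back and check that nothing changed, including the factor column
restored <- readRDS(path)
print(identical(toy, restored))
```

An .rds file keeps column types such as factors, so the conversions you did during cleaning do not have to be redone the way they would after re-importing a CSV.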