We will read a breast cancer data set available at http://www.araastat.com/BIOF339_PracticalR/Lectures/lecture2_files/BreastCancer.xlsx. This is a Microsoft Excel file of 569 subjects with breast tumor biopsies. The objective of this data set and study was to predict the class of tumor (malignant/benign) from several tumor characteristics.
Use the function download.file to download this file to your computer, specifying where you save it.
Install the package readxl from your favorite CRAN mirror
Read the file into R using the read_excel function from the readxl package.
Find the structure of this data set
Find the dimension of this data set
Create new variables that are the differences between the worst recorded and mean values of each predictor (radius, texture, perimeter, etc.), and add them as new columns in the data
Create two subsets based on tumor status
In each subset, compute summaries of the different mean variables
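The steps above can be sketched in R roughly as follows. This is one possible approach, not the course's reference solution. The save location, the helper name load_breast_cancer, the column names (diagnosis, radius_mean, radius_worst, and so on) and the "M"/"B" coding are all assumptions, since the sheet does not state them; check the real names with names() after reading the file. A small made-up data frame stands in for the workbook so the later steps run without a network connection.

```r
# Steps 1-3: download and read the workbook (needs network access and readxl).
url  <- "http://www.araastat.com/BIOF339_PracticalR/Lectures/lecture2_files/BreastCancer.xlsx"
dest <- file.path(tempdir(), "BreastCancer.xlsx")  # hypothetical save location

# Hypothetical helper; it is defined here but not called automatically.
load_breast_cancer <- function(url, dest) {
  # mode = "wb" keeps the binary .xlsx file intact on Windows
  download.file(url, destfile = dest, mode = "wb")
  readxl::read_excel(dest)
}
# install.packages("readxl")          # run once if readxl is not installed
# bc <- load_breast_cancer(url, dest) # uncomment to work with the real file

# Toy stand-in with made-up values and ASSUMED column names.
bc <- data.frame(
  diagnosis     = c("M", "B", "M", "B"),
  radius_mean   = c(18.0, 12.5, 20.6, 11.4),
  radius_worst  = c(25.4, 14.0, 25.0, 12.9),
  texture_mean  = c(10.4, 15.7, 17.8, 20.4),
  texture_worst = c(17.3, 20.1, 23.4, 26.5)
)

# Structure and dimension of the data set
str(bc)
dim(bc)

# Differences between worst and mean values, added as new columns.
# Predictor names are recovered from the columns ending in "_mean".
mean_cols  <- grep("_mean$", names(bc), value = TRUE)
predictors <- sub("_mean$", "", mean_cols)
for (p in predictors) {
  bc[[paste0(p, "_diff")]] <- bc[[paste0(p, "_worst")]] - bc[[paste0(p, "_mean")]]
}

# Two subsets based on tumor status (assumed coding: "M" malignant, "B" benign)
malignant <- subset(bc, diagnosis == "M")
benign    <- subset(bc, diagnosis == "B")

# Summaries of the mean variables within each subset
summary(malignant[mean_cols])
summary(benign[mean_cols])
```

Deriving the predictor names from the "_mean" suffix avoids typing each difference by hand, but it only works if the real file follows that naming pattern. If it does not, the differences can be written out one column at a time.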
haven package to load them into R. You may have to install it.