Read and load a data set from Excel

We will read a breast cancer data set available at http://www.araastat.com/BIOF339_PracticalR/Lectures/lecture2_files/BreastCancer.xlsx. This is a Microsoft Excel file of 569 subjects with breast tumor biopsies. The objective of the study behind this data set was to predict the class of tumor (malignant or benign) from several tumor characteristics.

  1. Use the function download.file to download this file to your computer, specifying where you save it.

  2. Install the package readxl from your favorite CRAN mirror.

  3. Read the file into R using the read_excel function from the readxl package.
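
The three steps above can be sketched as follows. This is one possible approach, not the only one: the variable names (data_url, dest_file, breast_cancer) and the choice to save into the current working directory are assumptions made for illustration. It needs an internet connection, and install.packages may ask you to pick a CRAN mirror.

```r
# URL from the exercise; the local file name and location are arbitrary choices.
data_url  <- "http://www.araastat.com/BIOF339_PracticalR/Lectures/lecture2_files/BreastCancer.xlsx"
dest_file <- file.path(getwd(), "BreastCancer.xlsx")

# Step 1: download the workbook, skipping the download if it is already there.
# mode = "wb" writes the file in binary mode, which matters on Windows.
if (!file.exists(dest_file)) {
  download.file(data_url, destfile = dest_file, mode = "wb")
}

# Step 2: install readxl only if it is not already available.
if (!requireNamespace("readxl", quietly = TRUE)) {
  install.packages("readxl")
}

# Step 3: read the first sheet of the workbook into a data frame (a tibble).
library(readxl)
breast_cancer <- read_excel(dest_file)
```

Saving the file with an explicit path makes it easy to find again later, and reading from the local copy avoids downloading it every time the script is run.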

Data Exploration

  1. Find the structure of this data set.

  2. Find the dimensions of this data set.

  3. For each predictor (radius, texture, perimeter, etc.), compute the difference between its worst recorded value and its mean value, and add these differences to the data as new columns.

  4. Create two subsets based on tumor status (malignant and benign).

  5. In each subset, compute summaries of the different mean variables.
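
The exploration steps above might look like the sketch below. Because the actual column names in the Excel file are not listed here, the example runs on a small made-up data frame; the names diagnosis, radius_mean, radius_worst, texture_mean and texture_worst, and all the values, are hypothetical stand-ins. Check the output of str on the real data and adjust the names accordingly.

```r
# Toy stand-in for the real data; column names and values are hypothetical.
toy <- data.frame(
  diagnosis     = c("M", "B", "M", "B"),
  radius_mean   = c(18.0, 12.5, 20.1, 11.4),
  radius_worst  = c(25.4, 14.0, 24.9, 12.3),
  texture_mean  = c(10.4, 17.8, 21.3, 15.0),
  texture_worst = c(17.3, 23.4, 26.5, 19.1)
)

# Steps 1 and 2: structure and dimensions (rows, then columns).
str(toy)
dim(toy)

# Step 3: worst minus mean for each predictor, stored as new columns.
predictors <- c("radius", "texture")
for (p in predictors) {
  toy[[paste0(p, "_diff")]] <- toy[[paste0(p, "_worst")]] - toy[[paste0(p, "_mean")]]
}

# Step 4: one subset per tumor status.
malignant <- toy[toy$diagnosis == "M", ]
benign    <- toy[toy$diagnosis == "B", ]

# Step 5: summaries of the mean variables within each subset.
mean_cols <- paste0(predictors, "_mean")
summary(malignant[, mean_cols])
summary(benign[, mean_cols])
```

Building the column names with paste0 keeps the loop short when there are many predictors; with only a few, writing each subtraction out by hand is just as reasonable.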

Notes

  1. If you have SAS or SPSS data sets, you can use the haven package to load them into R. You may have to install it first.
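
As a rough illustration of the note above, the sketch below writes a tiny made-up data frame to a temporary SPSS file with write_sav and reads it back with read_sav, so it runs without any external file. The data and the names toy, sav_file and spss_data are invented for the example, and the SAS path in the final comment is a placeholder.

```r
# Install haven only if it is not already available.
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven")
}
library(haven)

# Round trip through a temporary SPSS file so the example is self-contained.
toy <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
sav_file <- tempfile(fileext = ".sav")
write_sav(toy, sav_file)

# read_sav loads an SPSS .sav file into a data frame (a tibble).
spss_data <- read_sav(sav_file)

# For a SAS data set the analogous function is read_sas; the path is a placeholder.
# sas_data <- read_sas("path/to/your_file.sas7bdat")
```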