Lecture 2 Practicum

Read and load a data set in Excel

We will read a breast cancer data set available at http://www.araastat.com/BIOF339_PracticalR/Lectures/lecture2_files/BreastCancer.xlsx. This is a Microsoft Excel file of 569 subjects with breast tumor biopsies. The objective of the study that produced this data set was to predict whether a tumor is malignant or benign from several tumor characteristics.

Use the function download.file to download this file to your computer, using its destfile argument to specify where to save it.
Install the package readxl from your favorite CRAN mirror.
Read the file into R using the read_excel function from the readxl package.
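The three steps above can be sketched as follows. This is a minimal sketch, not the official solution: saving to tempdir() and using the cloud.r-project.org mirror are choices made here for illustration, so substitute your own folder and preferred mirror. Setting mode = "wb" makes download.file write the file in binary mode, which matters for Excel files on Windows.

```r
# Step 1: download the Excel file.
# Assumption: we save into R's temporary directory; replace with your own path.
url  <- "http://www.araastat.com/BIOF339_PracticalR/Lectures/lecture2_files/BreastCancer.xlsx"
dest <- file.path(tempdir(), "BreastCancer.xlsx")
download.file(url, destfile = dest, mode = "wb")  # "wb" = binary write

# Step 2: install readxl only if it is not already available.
# Assumption: the cloud.r-project.org mirror; any CRAN mirror works.
if (!requireNamespace("readxl", quietly = TRUE)) {
  install.packages("readxl", repos = "https://cloud.r-project.org")
}

# Step 3: read the first sheet of the workbook into R (returns a tibble).
library(readxl)
bc <- read_excel(dest)
bc
```

You only need to install readxl once per R installation; library(readxl) is needed in every new session.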

Data Exploration

Find the structure of this data set
Find the dimension of this data set
For each predictor (radius, texture, perimeter, etc.), create a new column in the data containing the difference between its worst recorded value and its mean value
Create two subsets of the data based on tumor status (malignant or benign)
In each subset, compute summaries of the variables that record mean values
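One way to approach the exploration steps above in base R is sketched below. It runs on a small hypothetical data frame so that it is self-contained; the values are made up for illustration. The column layout is also an assumption: we suppose each predictor has a `<predictor>_mean` and a `<predictor>_worst` column and that tumor status sits in a `diagnosis` column coded "M" (malignant) and "B" (benign). Check names(bc) on the real file and adjust the patterns if they differ. On the real data, skip the toy data frame and use the bc you read with read_excel.

```r
# Hypothetical toy data (made-up values) mimicking the assumed column layout.
# With the real data, drop this and use the `bc` returned by read_excel().
bc <- data.frame(
  id            = 1:4,
  diagnosis     = c("M", "B", "M", "B"),
  radius_mean   = c(17.0, 12.0, 20.0, 11.0),
  radius_worst  = c(25.0, 13.5, 24.0, 12.0),
  texture_mean  = c(10.0, 18.0, 21.0, 14.0),
  texture_worst = c(17.0, 24.0, 25.0, 19.0)
)

# Structure and dimension (rows, columns)
str(bc)
dim(bc)

# Predictors that have both a "_mean" and a "_worst" column
mean_cols  <- grep("_mean$", names(bc), value = TRUE)
predictors <- sub("_mean$", "", mean_cols)
predictors <- predictors[paste0(predictors, "_worst") %in% names(bc)]

# New columns: worst minus mean, e.g. radius_diff = radius_worst - radius_mean
for (p in predictors) {
  bc[[paste0(p, "_diff")]] <- bc[[paste0(p, "_worst")]] - bc[[paste0(p, "_mean")]]
}

# Two subsets by tumor status (assumed coding: "M" malignant, "B" benign)
malignant <- subset(bc, diagnosis == "M")
benign    <- subset(bc, diagnosis == "B")

# Summaries of the mean variables within each subset
summary(malignant[mean_cols])
summary(benign[mean_cols])
```

Using bc[[name]] rather than bc$name lets the loop build column names programmatically, and it also copes with names that contain spaces.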

Notes

If you have SAS or SPSS data sets, you can use the haven package to load them into R (you may have to install it first).
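A minimal sketch of reading SPSS data with haven follows. Since we have no SPSS file at hand, it first writes R's built-in mtcars data to a temporary .sav file with write_sav and then reads it back with read_sav. The SAS line is commented out because "mydata.sas7bdat" is a hypothetical file name; point it at your own file.

```r
# Install haven only if needed (assumption: cloud.r-project.org mirror)
if (!requireNamespace("haven", quietly = TRUE)) {
  install.packages("haven", repos = "https://cloud.r-project.org")
}
library(haven)

# Create an example SPSS file from the built-in mtcars data, then read it back
sav_path <- file.path(tempdir(), "example.sav")
write_sav(mtcars, sav_path)
spss_data <- read_sav(sav_path)
spss_data

# SAS data files are read the same way (hypothetical file name):
# sas_data <- read_sas("mydata.sas7bdat")
```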

Abhijit Dasgupta

September 19, 2018