BIOF 439: Data Visualization using R

Number of credits: 1

Summer 2021

Syllabus

Instructor

Abhijit Dasgupta, PhD

Contact information:

Course information

Prerequisites, if any: None

Course description

This course will demonstrate and practice the use of R in creating and presenting data visualizations. After a short introduction to R tools, especially the tidyverse packages, we will look at good principles for data visualization, examples of good and bad visualizations, and the use of ggplot2 to create static publication-quality graphs. We will also explore modern web-based interactive graphics using the htmlwidgets packages as well as dynamic graphics and dashboards that can be created using flexdashboard and Shiny. We will explore ways in which bioinformatics data can be presented using static and dynamic visualizations. Finally, we will use RMarkdown and several packages to develop web pages for presenting data visualizations as self-explanatory, and possibly interactive, storyboards.

Course materials

All course materials (lectures, videos, homework, discussions) will be available on the class Canvas site.

Learning Materials

Required and Recommended Texts: There are no required texts for this class. However, the following texts, freely available online, will be used for reference:

  1. R for Data Science [R4DS] by Hadley Wickham and Garrett Grolemund (available online)
  2. Fundamentals of Data Visualization [PDV] by Claus O. Wilke (available online)
  3. Data Visualization: A Practical Introduction [DV] by Kieran Healy (available online)

Required Journal Articles: There are no required journal articles for this class.

Course Goals

When you complete the course successfully, you will be able to:

  • Understand the principles of good data visualization and avoid poor or inappropriate visualizations
  • Use R at a practical, introductory level, including manipulating data to enable good visualizations
  • Make appropriate use of color, symbols and small multiples
  • Create static and dynamic data visualizations
  • Use the web as a presentation medium

Communication

This class will communicate primarily via Slack.

You will see a channel #spring2021-a. Please join this channel. Please use Slack for broadcasting messages, answering questions and the like. When you ask a question, please ask it in the #general or #spring2021-a channel, so others can learn as well. I should respond within 24 hours.

The Canvas Discussion forum will be used for graded class discussions.

I will also hold virtual office hours on Zoom, time TBD, one evening a week.

Structure of the course

This course will run for 7 weeks. Of these, there will be instructional material, including videos, lectures, slides, discussion, tutorials and homework, for 6 of the weeks. The seventh week will be dedicated to a culminating project that will be submitted by the end of the seventh week. Your grade will be determined by class participation, i.e., discussions & Slack participation (30%), homework assignments (50%) and the final project (20%).

Detailed course outline

Week 1

Week 2

Readings: R4DS Chapters 4 and 27
Resource: PDV Chapters 2-4

Theme: Descriptive plots

Week 3

Theme: Analytic plots

Week 4

Theme: R for Bioinformatics

Week 5

Theme: Dynamic visualization

Week 6

Theme: Presenting your graphs

Reference: R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire and Garrett Grolemund (available online)

Week 7

Class presentations and discussion

The Learning Process

I believe in teaching practical methods for using R as a tool in achieving informative data-driven visualizations. As such, this course is opinionated, in that I make certain choices of what parts of R to teach to make things most accessible and useful. The course will be a mixture of didactic lessons, interactive tutorials and exercises, culminating in a final project that brings different aspects of the course together into a single dashboard.

R is a tool to be used, not studied, so I promote active learning by doing: you become familiar with R, its advantages and its disadvantages by using it regularly throughout the course to visualize data. Students will be expected to create simple dashboards that show their data story from the first day, and in doing so learn how to apply what they learn to their own workflows and work environments.

Methods for students to achieve success

  1. Practice programming and coding with R
  2. Study high-quality online examples provided by members of the R community and learn from them
  3. Participate in class discussions on Slack
  4. Decide on a target visualization you would like to create for presentation to your lab, and work towards creating it

Time commitment

Daily practice for even 30 minutes is good, but for particular class work I don’t expect more than a couple of hours a week.

Students can be successful in this course through following the teaching materials, participating in discussions on Slack, and practice. R is a language in the same way that French or Japanese is a language (you’re just talking to a computer), and so the only way to retain the knowledge gained in this class is to use it. The exercises and tutorials are meant to get you used to using R for different purposes, so please do them diligently.

This course should take around 4-6 hours of time weekly, depending on the week.

Etiquette

The most important thing is to be polite, considerate and empathetic in all communications and discussions. There are different levels of knowledge about R in this class, and so some questions may appear trivial to some but are essential for others. Be kind, and if you can help a classmate, do so with grace and civility. The class learns best if we all help and support each other.

Policies

Academic Policies

This course adheres to all FAES policies described in the academic catalog and student handbook, including the Academic Integrity policy listed on page 11 of the academic catalog and student handbook. Be certain that you are knowledgeable about all of the policies listed in this syllabus, in the academic catalog and student handbook, and on the FAES website. As a student in this program, you are bound by those policies.

Guidelines for Disability Accommodations

FAES is committed to providing reasonable and appropriate accommodations to students with disabilities. Students with documented disabilities should contact Dr. Mindy Maris, Assistant Dean of Academic Programs.

Dropping the Course

Students are responsible for understanding FAES policies, procedures, and deadlines regarding dropping or withdrawing from the course or switching to audit status.

Harassment

FAES adheres to the NIH’s harassment policies (https://hr.nih.gov/working-nih/civil/statement-workplace-harassment). Faculty and students in FAES courses are responsible for being familiar with the NIH’s harassment policies and adhering to them.

Attendance

It is in your best interest to use, question and understand all the instructional material provided, and to submit questions and homework in a timely manner. Since this course is completely asynchronous, you are not required to attend at particular times.

Participation

Participation will be judged through the assigned discussions as well as through activity on Slack.

Assignment Submission

Assignment submission is through Canvas. Each submission will consist of an R Markdown file and the corresponding HTML file. Both are required. Submitting only the R Markdown doesn’t let us see the results easily, and submitting only the HTML doesn’t let us evaluate your code. If you have trouble knitting the R Markdown to HTML, let me know and I can help. If it’s really impossible and you’re tearing your hair out, reach out to me by Saturday at the latest so I can see whether (a) I can help, or (b) a reasonable accommodation can be made. The latter will be a rarity, generally.
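If you would rather knit from the R console than with the RStudio Knit button, a minimal sketch along these lines should produce the HTML file. It assumes the rmarkdown package and pandoc (which RStudio bundles) are installed; the helper name knit_to_html and the file name homework1.Rmd are hypothetical and used only for illustration.

```r
# Hypothetical helper: knit an R Markdown file to HTML next to the source file.
# Assumes the rmarkdown package and pandoc are installed.
knit_to_html <- function(rmd_path) {
  stopifnot(file.exists(rmd_path))
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("Please install the rmarkdown package first: install.packages('rmarkdown')")
  }
  # render() returns the path of the output file (invisibly)
  rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
}

# Example usage (the file name is a placeholder):
# html_file <- knit_to_html("homework1.Rmd")
# Then submit both homework1.Rmd and the resulting homework1.html
```

If the render step fails, the error message it prints is usually the most useful thing to include when you ask for help.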

Due Dates

Homework is assigned at 10am each Monday and is due by 11:59pm the following Sunday.

Late Submission Policies

No late submissions of homework or discussion are allowed. However, for homework, I will only use the top 4 scores for your grade, so you will have the option of not submitting or doing poorly on 2 of them.
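As a rough illustration of how the top-4 rule plays out, the sketch below uses made-up scores. It assumes each homework is scored out of 100, that a missing submission counts as zero, and that the four counted scores are averaged with equal weight; those details are assumptions for the example, not stated policy.

```r
# Hypothetical scores (out of 100) for the six homeworks; NA marks one not submitted
hw_scores <- c(85, 92, NA, 70, 88, 95)

# Treat a missing submission as a zero, then keep the four highest scores
hw_scores[is.na(hw_scores)] <- 0
top4 <- sort(hw_scores, decreasing = TRUE)[1:4]

# Average of the counted homeworks, and its contribution at the 50% homework weight
hw_average <- mean(top4)
hw_contribution <- 0.5 * hw_average
```

Here the missed homework and the score of 70 are dropped, so neither affects the homework portion of the grade.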

Step-by-Step Guidelines for Submitting Assignments:

The guidelines for submitting assignments will be posted as a screencast during the first week of class.

Expectations for instructor’s feedback on assignments:

We will get your assignment grades and feedback to you within a week of submission.

Major Assignments

Grades will be based on the following requirements:

  1. Homework for each week is due Sunday at 11:59pm (50%)
    • No late homework
    • There will be 6 homework assignments; I’ll count the top 4 toward your grade
  2. Final project: An R Markdown report/presentation demonstrating an end-to-end data analysis in R using your own data, from data ingestion to munging to analyses and graphics, with a brief introduction and conclusion (30%)
  3. Class participation (20%): Discussion topics in weeks 2-6

Final project

  • Create an R Markdown document or presentation
  • Use your own data, or data available on the web (legally)
  • Show me that you can
    • import data into R
    • manipulate (munge) the data
    • perform some analysis on the data
    • create a visualization
    • create a report in R Markdown
  • Give a 5-minute lightning talk, which can be recorded using QuickTime or Screencastify (the latter requires the Chrome browser)
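The steps listed above might look roughly like the following sketch. It uses the built-in mtcars data as a stand-in for your own data and base R for the import and munging steps so that it runs without extra packages; the plot assumes ggplot2 is installed. The variable names are illustrative only, and your own project will differ.

```r
# A minimal sketch of the import -> munge -> analyze -> visualize steps.

# 1. Import: write the data to a temporary CSV, then read it back
#    the way you would read your own file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
cars <- read.csv(csv_path)

# 2. Munge: make cylinders a factor and convert weight from 1000 lb to kg
cars$cyl <- factor(cars$cyl)
cars$wt_kg <- cars$wt * 1000 * 0.45359237

# 3. Analyze: simple linear regression of fuel efficiency on weight
fit <- lm(mpg ~ wt_kg, data = cars)
slope <- coef(fit)[["wt_kg"]]

# 4. Visualize: scatter plot with a fitted line (requires ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cars, aes(x = wt_kg, y = mpg)) +
    geom_point(aes(color = cyl)) +
    geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
    labs(x = "Weight (kg)", y = "Miles per gallon", color = "Cylinders")
  print(p)
}
```

Placed inside the code chunks of an R Markdown file, with a short introduction and conclusion written around them, these steps would also cover the report item on the list.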