Chapter 1 What is R?

R is the most popular¹ open source statistical programming language in the world. It allows you to

  1. read datasets written in a wide variety of formats,
  2. clean and process the data,
  3. derive summaries,
  4. run analytics,
  5. visualize the results, and
  6. create automated reports, presentations, websites, dashboards and interactive applications.
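
To make the first five steps concrete, here is a minimal sketch that uses only base R and the built-in mtcars dataset. The temporary CSV file, the variable names and the choice of a simple linear model are assumptions made for illustration, not a recommended workflow. Step 6 is left out because reports and dashboards usually rely on add-on packages.

```r
# Illustrative sketch only: base R and the built-in mtcars dataset.
# All variable names below are hypothetical, chosen for this example.

# 1. Read: write the data to a temporary CSV file, then read it back in
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
cars <- read.csv(csv_path)

# 2. Clean and process: recode the 0/1 transmission flag as a labelled factor
cars$transmission <- factor(cars$am,
                            levels = c(0, 1),
                            labels = c("automatic", "manual"))

# 3. Derive summaries: mean fuel economy for each transmission type
mean_mpg <- aggregate(mpg ~ transmission, data = cars, FUN = mean)
print(mean_mpg)

# 4. Run analytics: linear regression of fuel economy on weight
fit <- lm(mpg ~ wt, data = cars)
print(coef(fit))

# 5. Visualize: scatter plot with the fitted regression line
plot(cars$wt, cars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
abline(fit)
```

Each step is a line or two of code, and the result of one step (a data frame, a fitted model) feeds directly into the next.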

R is not just a language, but an ecosystem comprising over 15,000 user- and corporation-developed packages or modules that extend R for a variety of purposes. It is a very flexible and customizable language, which is why it is used by an estimated 2 million users worldwide for data analytics. The question R users often ask is not “Can it be done?” but rather “How can it be done?” R is used in areas as varied as healthcare, economics, forestry, oceanography, pharmaceuticals, artificial intelligence and natural language processing.

Why is R so widely used? Some reasons are:

  • R is open source, so it is accessible to anyone with a computer.
  • Since the code in R and all its packages is open, the community of users can help debug it and make it more reliable and robust.
  • The R ecosystem is particularly rich in tools for data analytics, so there is almost certainly something available for any given task.
  • The community of R users worldwide is a strong, well-connected group that is welcoming, ready to help, cooperative and inclusive. Many users find this community to be one of the most attractive things about R.
  • R produces attractive, customizable visualizations with relatively little effort, which was one of the first reasons for its popularity.