Chapter 1 What is R?
R is the most popular¹ open-source statistical programming language in the world. It allows you to
- read datasets written in a wide variety of formats,
- clean and process the data,
- derive summaries,
- run analytics,
- visualize the results,
- create automated reports, presentations, websites, dashboards and interactive applications.
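As a rough illustration of the first few steps in this list, the sketch below uses only base R and the built-in `mtcars` dataset. The temporary CSV file, the derived `kpl` column and the variable names are assumptions made up for this example, not part of any particular workflow.

```r
# Read: write a built-in dataset to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)
cars <- read.csv(csv_path, row.names = 1)

# Clean and process: keep complete rows and add a derived column
cars <- cars[complete.cases(cars), ]
cars$kpl <- cars$mpg * 0.425144  # miles per US gallon -> kilometres per litre

# Derive summaries: mean fuel economy by number of cylinders
summary_by_cyl <- aggregate(kpl ~ cyl, data = cars, FUN = mean)
print(summary_by_cyl)

# Run analytics: a simple linear regression of fuel economy on weight
fit <- lm(kpl ~ wt, data = cars)
print(coef(fit))

# Visualize: scatter plot with the fitted regression line
plot(cars$wt, cars$kpl,
     xlab = "Weight (1000 lbs)", ylab = "Fuel economy (km/L)")
abline(fit)
```

In real projects the data would come from your own files, and many users would reach for add-on packages for these steps; base R is used here only so the sketch runs without installing anything.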
R is not just a language but an ecosystem comprising over 15,000 user- and corporation-developed packages, or modules, built for the R language and serving a wide variety of purposes. It is a very flexible and customizable language, which is one reason an estimated 2 million people worldwide use it for data analytics. The question R users often ask is not “Can it be done?” but rather “How can it be done?”. R is used in areas as varied as healthcare, economics, forestry, oceanography, pharmaceuticals, artificial intelligence and natural language processing.
Why is R so widely used? Some reasons are:
- R is open source, so it is accessible to anyone with a computer
- Since the code of R and all its packages is open, the community of users can help debug it and make it more reliable and robust
- The R ecosystem is very rich in tools for doing data analytics in particular, so there is almost certainly something available for almost any task
- The community of R users worldwide is a strong, well-connected group that is welcoming, ready to help, cooperative and inclusive. Many users find this community to be one of the most attractive things about R
- R produces attractive, customizable visualizations with relatively little effort, which was one of the earliest reasons for its popularity