A note on coding and programming
R does not have the point-and-click interface that you are probably familiar with from Excel, Word or other computer applications. It requires you to code, i.e., to write instructions for the computer. In the case of R, those instructions tell it to read, analyze, graph and report on datasets.
R is first and foremost a language. So, instead of thinking that this is some geeky thing that “programmers” and “IT people” do, think of it as learning a language. You will see that, like any language, it has nouns, verbs, adjectives and adverbs, and you can create “sentences” that start with data and end in something useful like a table, graph or document. With a traditional spoken and written language like French, Arabic, Farsi or Japanese, you learn it to be able to interact with people at different posts around the world. With a programming language like R, you will be able to interact with data, to make sense of it, to describe it, and to present it.
Coding
Coding is writing explicit instructions for a very literal and, in some ways, stupid machine. The machine takes your code literally and will do exactly what you tell it to do. If you are getting unexpected results, it’s almost certainly your code that needs to be checked, not the machine.
R, the language
As we will see, R has many elements of a language.
- Objects: These are the nouns. We will act on objects to create new objects. Each object has a name which we will treat as the nouns in our code.
- Functions: These are the verbs. Functions will act on objects to create new objects.
- The %>% operator: This acts like the conjunction “then” to create “sentences” called pipes or chains.
- Optional function arguments: These are adverbs which modify the action of the function (verb).
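As a rough illustration of these parts of speech, the sketch below builds one small “sentence”. The object costs and its values are made up for this example, and the code assumes the magrittr package, which supplies %>%, is installed.

```r
library(magrittr)  # supplies the %>% operator (assumed to be installed)

# An object (noun): a small vector of made-up monthly costs, one of them missing
costs <- c(120, 95, NA, 143)

# A function (verb) acting on an object, with an optional argument (adverb):
# na.rm = TRUE tells mean() to skip the missing value
avg_cost <- mean(costs, na.rm = TRUE)

# The same idea as a pipe: take costs, THEN average it, THEN round the result
avg_cost_rounded <- costs %>%
  mean(na.rm = TRUE) %>%
  round(digits = 1)

avg_cost_rounded
```

Reading the pipe aloud as “take costs, then compute the mean, then round” is a reasonable way to check that the sentence says what you intended.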
While writing code in R, we should be aware that R is case-sensitive, so mydata is a different object than myData, which is also different from Mydata and My_Data and MyData and my_data. Likewise mean, which is a function in R, is different from Mean, which is not defined in R.
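A minimal sketch of what case sensitivity looks like in practice. The values are arbitrary, and the check on Mean assumes a fresh R session in which no loaded package happens to define a function of that name.

```r
# Two different objects whose names differ only in the case of one letter
mydata <- c(1, 2, 3)
myData <- c(10, 20, 30)

# R treats them as unrelated objects
same_object <- identical(mydata, myData)

# mean exists as a function; Mean does not (in a fresh session)
has_mean <- exists("mean")
has_Mean <- exists("Mean")

# Calling the misspelled function stops with an error, captured here as text
Mean_result <- tryCatch(Mean(mydata), error = function(e) conditionMessage(e))

same_object
has_mean
has_Mean
Mean_result
```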
You have to name all the objects you create in R if you want to see them again. Try to pick a naming system that is simple yet descriptive, rather than data1. Two typical conventions are CamelCase and pothole_case (also known as snake_case). So you could name a dataset of operational budgets for January 2019 as operations_budget_2019_jan or OperationsBudget2019Jan, or really anything you want, as long as it’s clear to you, doesn’t start with a number, and doesn’t include forbidden characters like -, @ and $, which are reserved for other purposes.
Some people have a system where the names of data objects (which may be a data.frame or tibble or vector or matrix) should be capitalized, while function names should not. Data objects should probably be nouns and functions verbs, since that reminds us of their roles. There are different opinions. Some influential ones are here and here. As they say, finding a good name is hard, but often worth the effort.
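One way to check whether a name is allowed is to compare it with the output of the base R function make.names(), which rewrites any string into a syntactically valid name. The sketch below does this; the candidate names are illustrative only.

```r
# Candidate names: the first two are valid, the last four break the rules above
candidates <- c(
  "operations_budget_2019_jan",
  "OperationsBudget2019Jan",
  "2019_operations_budget",   # starts with a number
  "operations-budget",        # contains -
  "operations@budget",        # contains @
  "operations$budget"         # contains $
)

# A name is syntactically valid when make.names() leaves it unchanged
is_valid <- candidates == make.names(candidates)

data.frame(candidates, is_valid)
```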
The ultimate goal for every script file is to create a “story” using the language of R: starting from data, creating descriptions, understanding patterns through visualization and modeling, and analyzing the data in general. Scripts make this story reproducible, and also transferable to different data sets.
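A very small script in this spirit might look like the sketch below. It uses mtcars, a dataset that ships with R, purely as a stand-in; the choice of variables and of a linear model is an assumption made for illustration.

```r
# 1. Start from data: mtcars is a built-in example dataset
cars_data <- mtcars

# 2. Describe it: a numeric summary of fuel efficiency
summary(cars_data$mpg)

# 3. Look for a pattern visually: fuel efficiency against weight
plot(cars_data$wt, cars_data$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 4. Model the pattern with a simple linear regression
fit <- lm(mpg ~ wt, data = cars_data)
coef(fit)
```

Swapping in a different data set on the first line and rerunning the script repeats the same story on new data.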
Of course, as with any beginning writer, your coding will be sloppy at first, with many stops and starts, strike-throughs, modifications and drafts thrown into the proverbial trash. With practice, it will become easier, smoother, more effective and more expressive. This workshop is designed to give you an initial push towards that goal.
So, let’s start this journey.