4.4 Other types of cleaning

Several other functions are useful for specific cleaning tasks. A selection is given below:

  1. distinct() keeps the unique (non-duplicate) rows of a dataset. Usage: dataset %>% distinct()
  2. If you want to keep only rows with complete data, use drop_na(). Usage: dataset %>% drop_na(). By default this drops every row that has a missing value in any column. To check only particular variables, name them inside the call, e.g. dataset %>% drop_na(var1)
  3. If you want to convert a value to missing (99 is a common code for missing data), use na_if() within mutate() to recode that value to NA on a column-by-column basis. Usage: dataset %>% mutate(var1 = na_if(var1, 99)). Note that replace_na() does the reverse: it replaces NA with a value you supply.
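
The three functions above can be chained in a single pipeline. The following is a minimal sketch, assuming the dplyr and tidyr packages are installed; the survey data frame and its columns (id, age, score) are made up for illustration, with 99 standing in as the missing-data code for age.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey data: the row for id 2 is duplicated,
# and 99 is used as a code for a missing age
survey <- tibble(
  id    = c(1, 2, 2, 3, 4),
  age   = c(34, 99, 99, 51, 28),
  score = c(7, 5, 5, NA, 9)
)

cleaned <- survey %>%
  distinct() %>%                     # keep one copy of each duplicated row
  mutate(age = na_if(age, 99)) %>%   # recode 99 to NA, in the age column only
  drop_na(age)                       # drop rows where age is missing

cleaned
```

Because drop_na() is given only age here, the row for id 3 is kept even though its score is missing. Calling drop_na() with no arguments would remove that row as well.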