4.4 Other types of cleaning
Different functions can be applied to a dataset for different cleaning purposes. A selection is given below:
- distinct() keeps the unique (non-duplicate) rows of a dataset. Usage: dataset %>% distinct()
- If you want to keep only rows with complete data, you can use drop_na. Usage: dataset %>% drop_na(). You can modify drop_na by specifying the variables from which you want to drop the missing values.
- If you want to convert a value to missing (commonly 99 is used for missing data), then you can use na_if within mutate to change that value to missing on a column-by-column basis. Usage: dataset %>% mutate(var1 = na_if(var1, 99))
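
As a minimal sketch of these functions working together, the example below builds a small made-up tibble and applies each one in turn. The data and the object names (deduped, complete_rows, complete_var1, recoded) are hypothetical and chosen only for illustration; the column names var1 and var2 follow the usage pattern above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one duplicated row, one NA in each variable,
# and 99 used as a code for missing data in var1
dataset <- tibble(
  id   = c(1, 2, 2, 3, 4),
  var1 = c(10, 99, 99, NA, 25),
  var2 = c("a", "b", "b", "c", NA)
)

# Keep only the unique rows (the repeated id = 2 row is dropped)
deduped <- dataset %>% distinct()

# Keep only rows with no missing values in any column
complete_rows <- deduped %>% drop_na()

# Keep rows that are complete for var1 only; missing var2 is allowed
complete_var1 <- deduped %>% drop_na(var1)

# Convert the code 99 to NA in var1
recoded <- deduped %>% mutate(var1 = na_if(var1, 99))
```

Note that drop_na() only removes values that are already NA, so a row coded 99 is kept unless it is recoded with na_if first.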