4.4 Other types of cleaning
Different functions can be applied to a dataset for different cleaning purposes. A selection is given below:
- distinct() keeps the unique (non-duplicate) rows of a dataset. Usage: dataset %>% distinct()
- If you want to keep only rows with complete data, you can invoke drop_na(). Usage: dataset %>% drop_na(). You can restrict drop_na() to specific variables, so that only rows with missing values in those variables are dropped, e.g. dataset %>% drop_na(var1).
- If you want to convert a value to missing (99 is a commonly used code for missing data), you can use na_if() within mutate() to do so on a column-by-column basis. Usage: dataset %>% mutate(var1 = na_if(var1, 99)). Note that replace_na() does the reverse: it replaces missing values with a value you supply.
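The distinct() item above can be illustrated with a minimal sketch, assuming the dplyr package is installed. The small dataset and its columns id and score are hypothetical. It also shows the optional form in which distinct() is given column names, which is standard dplyr behaviour but goes slightly beyond the usage shown in the list.

```r
# Assumes dplyr is installed: install.packages("dplyr")
library(dplyr)

# Hypothetical dataset: row 2 is an exact duplicate of row 1,
# and rows 3 and 4 share an id but differ in score
dataset <- tibble(
  id    = c(1, 1, 2, 2),
  score = c(10, 10, 15, 18)
)

# With no arguments, distinct() keeps rows that are unique across all columns
deduped <- dataset %>% distinct()
print(deduped)  # 3 rows: the exact duplicate of row 1 is removed

# With column names, uniqueness is judged on those columns only;
# .keep_all = TRUE keeps the other columns from the first matching row
by_id <- dataset %>% distinct(id, .keep_all = TRUE)
print(by_id)    # 2 rows: one per id
```

Deduplicating on a subset of columns is useful when near-duplicate records differ only in a column you do not care about, but it silently keeps whichever row comes first, so it is worth sorting the data beforehand if the choice matters.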
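The difference between drop_na() with and without variable names can be sketched as follows, assuming the tidyr and dplyr packages are installed. The dataset and its columns var1 and var2 are hypothetical.

```r
# Assumes tidyr (for drop_na) and dplyr (for tibble and %>%) are installed
library(dplyr)
library(tidyr)

# Hypothetical dataset with a missing value in each of two different rows
dataset <- tibble(
  var1 = c(1, NA, 3, 4),
  var2 = c("a", "b", NA, "d")
)

# No arguments: drop any row with a missing value in any column
complete_rows <- dataset %>% drop_na()
print(complete_rows)  # rows 1 and 4 remain

# With a variable: drop only rows where var1 is missing
var1_rows <- dataset %>% drop_na(var1)
print(var1_rows)      # rows 1, 3 and 4 remain (row 3 keeps its NA in var2)
```

Naming variables is usually the safer choice, since dropping rows on every column can discard far more data than intended when an unimportant column has many gaps.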
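The na_if() recode can be sketched as below, assuming dplyr (version 1.0.0 or later, for across()) is installed. The dataset and the columns var1 and var2 are hypothetical, and 99 stands in for whatever missing-data code a real dataset uses.

```r
# Assumes dplyr >= 1.0.0 is installed (across() was added in 1.0.0)
library(dplyr)

# Hypothetical dataset in which 99 is the code for "missing"
dataset <- tibble(
  var1 = c(5, 99, 7, 99),
  var2 = c(99, 2, 3, 4)
)

# Column by column: na_if(x, y) turns every value of x equal to y into NA.
# Only var1 is recoded here; the 99 in var2 is left untouched.
recoded <- dataset %>% mutate(var1 = na_if(var1, 99))
print(recoded)

# The same recode applied to several columns at once with across()
recoded_all <- dataset %>% mutate(across(c(var1, var2), ~ na_if(.x, 99)))
print(recoded_all)
```

Recoding one column at a time is the more cautious option: in some datasets 99 is a legitimate value in one column (an age, say) while serving as a missing-data code in another.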