Let's look at some useful tips for handling missing values in Stata. Left unhandled, they can cause problems in almost any dataset.
- Identify: Use the missing() function to find missing values. For example, to count the missing values in a variable:
sysuse auto, clear
count if missing(rep78)
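- Exclude: Use an if condition to exclude missing values when working with data. For instance, to summarize the repair record (rep78) over nonmissing observations only. Note that summarize already ignores missing values, so the condition here only makes the exclusion explicit. It matters in comparisons such as rep78 > 3, which also match missing values because Stata treats them as larger than any number: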
su rep78 if !missing(rep78)
- Replace: Use the replace command to fill in missing values with an appropriate value. If the weight variable had any missing values, we could replace them with the mean, like so:
su weight, meanonly
replace weight = r(mean) if missing(weight)
- Drop: Perhaps the most commonly used technique. Use the drop command to remove every observation that has a missing value in a given variable:
drop if missing(rep78)
- Categorize: For categorical variables, we can give missing values a category of their own. For instance, if the foreign variable had missing values, we could assign them the value 2. The missing keyword covers the system missing value (.) as well as the extended codes .a through .z:
recode foreign (missing = 2)
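
The tips above can be combined into one short do-file. What follows is a minimal sketch, not a definitive workflow: the variable names weight_imp and rep78_cat and the category code 9 are assumptions introduced here for illustration, and imputing into a copy rather than overwriting the original is a design choice, not something the tips above require.

```stata
* Sketch only: weight_imp, rep78_cat and the code 9 are hypothetical
* choices made for this example.
sysuse auto, clear

* Identify: count missing values in rep78, then get an overview
* of missing values across all variables
count if missing(rep78)
display "rep78 missing: " r(N)
misstable summarize

* Exclude: missing numeric values rank above every number, so an
* unguarded "rep78 > 3" would also match missing observations
count if rep78 > 3 & !missing(rep78)

* Replace: impute the mean into a copy so the original stays intact
* (weight has no missing values in auto, so this changes nothing here)
summarize weight, meanonly
generate weight_imp = weight
replace weight_imp = r(mean) if missing(weight_imp)

* Categorize: give missing rep78 its own labelled category
recode rep78 (missing = 9), generate(rep78_cat)
label define rep78_cat 9 "Missing"
label values rep78_cat rep78_cat
tabulate rep78_cat

* Drop: remove observations with missing rep78; done last because
* the dropped observations are gone from the data in memory
drop if missing(rep78)
```

Running the drop step last keeps the earlier steps working on the full data. If the original data are still needed afterwards, wrapping the do-file in preserve and restore is one way to get them back.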