Handling missing values in Stata

May 08, 2019

Let's look at some useful tips for handling missing values in Stata. Left unhandled, they can cause a lot of problems in any dataset.

  1. Identify: Use the missing() function to flag missing values in a variable. Combined with count, it reports how many observations are missing:

    sysuse auto, clear
    count if missing(rep78)
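  1. Exclude: Use an if condition to exclude missing values when working with data. summarize already skips missing values on its own, so the condition matters most in expressions that do not, such as rep78 > 3, where Stata treats a missing value as larger than any number. For instance, summarizing the number of repairs over non-missing observations only:

    su rep78 if !missing(rep78)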
  1. Replace: Use the replace command to fill in missing values with an appropriate value. If the weight variable had any missing values, we could replace them with the mean, like so:

    su weight, meanonly
    replace weight = r(mean) if missing(weight)
  1. Drop: Perhaps the most commonly used technique. Use the drop command to remove observations with missing values altogether:

    drop if missing(rep78)
  1. Categorize: For categorical variables, we can use a dedicated category to indicate missing values. For instance, if there were missing values in the foreign variable, we could assign them the value 2. The missing keyword matches every missing value (. as well as .a through .z):

    recode foreign (missing = 2)
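
Before picking one of the techniques above, it can help to check how much is missing and to see how missing values behave in comparisons. The following is a minimal sketch using the same auto dataset; the indicator variable name rep78_miss is made up for illustration, and mean imputation of rep78 is shown only as an example, not as a recommendation.

```stata
* Load the example dataset used throughout this post
sysuse auto, clear

* Overview of missingness across all variables
misstable summarize

* Stata treats missing values as larger than any number,
* so a plain comparison silently includes them
count if rep78 > 3                      // counts missing rep78 too
count if rep78 > 3 & !missing(rep78)    // counts observed values only

* Keep an indicator before imputing, so imputed rows stay identifiable
* (rep78_miss is a hypothetical name chosen for this sketch)
generate byte rep78_miss = missing(rep78)

* Fill the gaps with the mean of the observed values
summarize rep78, meanonly
replace rep78 = r(mean) if rep78_miss
```

Keeping the indicator makes it possible to compare results with and without the imputed observations later, for example by adding if !rep78_miss to a command.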

Written by Johan Osterberg, who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce.

2024 © Johan Osterberg