# Handling missing values in Stata

May 16, 2019

Handling missing data effectively in **Stata** largely comes down to understanding how Stata treats missing values to begin with. Stata treats a numeric missing value as larger than any nonmissing number (in effect, **positive infinity**) and denotes it with a period (i.e. **.**). Furthermore, Stata includes missing values when it evaluates logical expressions, which is why it is crucial to take special measures to avoid drawing false inferences from the data. The simplest, but also the most verbose, way to handle missing values is to exclude them explicitly in each condition with the **!=** operator. Other options include recoding missing values (using the **mvencode** command) or removing them altogether.
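The ordering described above can be checked directly with **display**. This is a minimal sketch that needs no dataset; it also touches on Stata's extended missing values (**.a** through **.z**), which the rest of this post does not use.

```stata
* A numeric missing value compares as larger than any nonmissing number
display (. > 1000000)

* Extended missing values (.a to .z) sort above the plain period
display (.a > .)

* missing() returns 1 for both kinds of missing value
display missing(.)
display missing(.a)
```

Each line should print **1**, since Stata represents a true expression as 1.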

Let’s look at some examples using the auto dataset that ships with Stata:

`sysuse auto`

Now we can use the **misstable** command in order to find missing values in the dataset, like so:

`misstable summarize`

Here we can see that there are 5 missing values for the variable **rep78** (repair record from 1978). As stated earlier, Stata treats these missing values as positive infinity, so they are regarded as greater than any of the actual repair records. One way to guard against this is to exclude them explicitly with the **!=** operator.
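The count reported by **misstable** can be cross-checked with a few standard commands. The following is only a sketch of one way to do it; listing the affected cars is an extra step, not something the report itself requires.

```stata
sysuse auto, clear

* Count the observations where rep78 is missing
count if missing(rep78)

* Restrict the misstable report to a single variable
misstable summarize rep78

* List the cars that lack a repair record
list make rep78 if missing(rep78)
```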

To see which cars have a repair record of 5 or higher, type:

`tab make if rep78 >= 5 & rep78 != .`
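To see why the extra condition matters, it can help to compare counts with and without it. This sketch uses **count** instead of **tab** purely for brevity. Note that **!= .** excludes only the plain period. If a dataset also uses extended missing values (**.a** through **.z**), then **!missing()** or **< .** excludes all of them.

```stata
sysuse auto, clear

* Without a guard, observations with a missing rep78 also satisfy rep78 >= 5
count if rep78 >= 5

* With a guard, only observed repair records are counted
count if rep78 >= 5 & !missing(rep78)

* Equivalent shorthand: every missing value is >= ., so "< ." excludes them all
count if rep78 >= 5 & rep78 < .
```

The first count should exceed the other two by the five missing observations found earlier.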

Another option is to recode the missing values with the **mvencode** command. This can be a more concise solution, although it may be difficult to come up with a meaningful recoding strategy. For example, to turn all missing values into **-1**, type:

`mvencode *, mv(-1)`
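One caveat with recoding is that the replacement value takes part in later calculations like any other number. The sketch below limits the recoding to **rep78** (an illustrative choice, not a requirement) and runs **summarize** before and after, so the effect on the observation count, minimum and mean is visible.

```stata
sysuse auto, clear

* Before recoding: missing values are left out of the summary
summarize rep78

* Recode missing values in rep78 only
mvencode rep78, mv(-1)

* After recoding: the -1 values are counted and pull the mean down
summarize rep78
count if rep78 == -1
```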

The **mvdecode** command can similarly be used to reverse the encoding by turning every **-1** back into a missing value:

`mvdecode *, mv(-1)`
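As a rough check that the round trip restores the original state, the missing values can be counted again after decoding. The same sketch also shows the third option mentioned in the introduction, removing the incomplete observations altogether. Whether dropping rows is appropriate depends on the analysis.

```stata
sysuse auto, clear

* Encode and then decode rep78
mvencode rep78, mv(-1)
mvdecode rep78, mv(-1)

* The number of missing values should match the original report
count if missing(rep78)

* Alternatively, remove the observations with a missing repair record
drop if missing(rep78)
count
```

Keep in mind that **mvdecode** converts every **-1** in the listed variables, so a variable that legitimately contains -1 would lose those values as well.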

Written by **Johan Osterberg** who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce.