Johan Osterberg - Product Engineer

Binary variables in Stata

April 29, 2019

Binary variables are useful for many purposes in Stata, for example when you quickly need an overview of a variable that takes a large set of values.

Using the nlsw88 training dataset we’ll generate a variable indicating high income. Let’s say the cut-off point will be $25 per hour:

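sysuse nlsw88, clear
* Stata treats missing values as larger than any number, so exclude them explicitly
generate high_income = 1 if wage > 25 & !missing(wage)
replace high_income = 0 if wage <= 25
tab high_income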

Here we generated a variable high_income with the value 1 for all observations with an hourly wage above $25. Then we set the newly created variable to 0 for all observations with an hourly wage of $25 or less.
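The two steps can also be collapsed into a single command. This is a minimal sketch, assuming that observations with a missing wage should stay missing rather than be counted as high income. In Stata a logical expression evaluates to 1 when true and 0 when false, so it can be assigned directly. The byte storage type is an optional choice made here to save memory.

```stata
sysuse nlsw88, clear
* wage > 25 evaluates to 1 (true) or 0 (false);
* the if qualifier leaves observations with a missing wage as missing
generate byte high_income = wage > 25 if !missing(wage)
tabulate high_income
```

Without the if qualifier, a missing wage would be coded as 1, because Stata treats missing values as larger than any number.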

Result of generating the high-income binary variable

In this example we can see that only 56 observations, or roughly 2.5%, fall into the high-income category.
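The share can also be read off without a table, because the mean of a 0/1 variable equals the proportion of ones. A small sketch, assuming the same $25 cut-off and a one-decimal display format chosen only for illustration:

```stata
sysuse nlsw88, clear
generate byte high_income = wage > 25 if !missing(wage)
* the mean of a 0/1 variable is the proportion of observations coded 1
quietly summarize high_income
display "Share with high income: " %4.1f 100*r(mean) "%"
```

The count of high-income observations is available the same way through r(sum) after summarize.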

To achieve the same results with the recode command, we’d do something like this:

drop high_income
* (nonmissing = 1) instead of (else = 1), so that missing wages are not coded as 1
recode wage (min/25 = 0) (nonmissing = 1), gen(high_income)
tab high_income

Tabulating this version of the variable should display the same results as the previous one.
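Rather than comparing the two tables by eye, the equivalence can be checked directly. The sketch below builds both versions side by side under the hypothetical names hi_gen and hi_rec (not part of the dataset) and uses assert, which stops with an error if any observation differs.

```stata
sysuse nlsw88, clear
* version 1: logical expression; version 2: recode
generate byte hi_gen = wage > 25 if !missing(wage)
recode wage (min/25 = 0) (nonmissing = 1), generate(hi_rec)
* assert exits with an error if the two variables disagree anywhere
assert hi_gen == hi_rec
* cross-tabulation: every observation should lie on the diagonal
tabulate hi_gen hi_rec, missing
```

If assert returns silently, the two approaches produced identical variables.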

Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specialized in e-commerce.