Factor variables in Stata

April 21, 2019

In Stata, categorical variables and binary variables (also called indicator or dummy variables; the terms are used interchangeably) can be handled as factor variables. Examples of categorical variables are industry or race, each of which contains several categories (e.g. black, white, Asian). An indicator variable has only two categories, coded 0 and 1, and represents a characteristic such as whether the respondent lives in the South or whether a car is foreign. In commands, a factor variable is marked with the i. prefix, such as:

sysuse nlsw88, clear
summarize i.south

The results are shown for the southerners in the sample (level 1), and the mean is the same as you would get from plain summarize south. The base level is 0; to show it in the output, use allbaselevels:

summarize i.south, allbaselevels

[Figure: summarize output including the base level]
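
The same idea carries over to a variable with more than two categories. The sketch below uses race from the same nlsw88 dataset and assumes Stata's default setting, where the lowest value of a factor variable is the base level (this can be changed with fvset, which is not covered here).

```stata
* Load the example data shipped with Stata
sysuse nlsw88, clear

* race has several levels; list them with their frequencies
tabulate race

* Only the non-base levels get a row in the output
summarize i.race

* allbaselevels adds a row for the omitted base level
summarize i.race, allbaselevels
```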

To get statistics for level 0 (i.e. south = 0), which is the base level by default, make level 1 the base level instead with ib1.:

summarize ib1.south
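
One way to check what ib1.south reports is to build the south = 0 indicator by hand and compare. This is only an illustrative sketch; the variable name notsouth is made up for the example and is not part of the dataset.

```stata
sysuse nlsw88, clear

* Hand-made indicator: 1 if the respondent does not live in the South
* (notsouth is a hypothetical name chosen for this example)
generate byte notsouth = (south == 0) if !missing(south)

* These two commands should report the same mean
summarize notsouth
summarize ib1.south
```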

Perhaps more interesting is to get statistics for all levels at once, which the ibn. prefix (no base level) does:

summarize ibn.south

[Figure: summarize output for all levels]
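
With more than two categories, ibn. is a quick way to read off the share of observations in each level, since the mean of a 0/1 indicator is a proportion. A minimal sketch, again using race from nlsw88:

```stata
sysuse nlsw88, clear

* ibn. means "no base level": every level gets its own indicator
summarize ibn.race

* The mean of each indicator is the share of observations in that level,
* so it should line up with the percentages from tabulate (divided by 100)
tabulate race
```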



Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specializing in e-commerce. Connect with me on LinkedIn.

2024 © Johan Osterberg