Johan Osterberg - Product Engineer

Distributional analysis in Stata

April 17, 2019

When working with a continuous variable such as income, it is often useful to get an overview of its distributional properties.

Let’s look at the variable wage in the nlsw88 dataset (an extract from the U.S. National Longitudinal Survey of Young Women that ships with Stata) by using the inspect command.

sysuse nlsw88, clear
inspect wage

In the Results window, Stata renders a rather primitive text histogram of the data, which shows that the distribution is skewed to the right. This alone suggests that wage is not a normally distributed variable.

Inspect wages
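The text histogram from inspect is coarse. As a minimal sketch of a graphical follow-up, Stata's histogram and kdensity commands both accept a normal option that overlays a normal density with the same mean and standard deviation as the data. The bin count and kernel bandwidth are left at Stata's defaults here and may need tuning, for example with histogram's bin() or width() options.

```stata
* Reload the example dataset shipped with Stata
sysuse nlsw88, clear

* Histogram of hourly wage (density scale) with a normal curve
* of the same mean and standard deviation overlaid for comparison
histogram wage, normal

* Kernel density estimate of wage, again with an overlaid normal density
kdensity wage, normal
```

Comparing the bars or the density line against the normal curve makes the right skew visible at a glance.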

For more detail about the variable wage, we can use the summarize command (abbreviated su below) with the detail option:

su wage, detail 

The output lists the percentiles, which span a wide range of hourly wages, as suspected. The positive skewness coefficient (3.096) again indicates that the distribution is skewed to the right. The kurtosis is also high: Stata reports kurtosis on a scale where a normal distribution has a value of 3, so a value well above that points to heavier tails than the normal distribution. Combined with the positive skew, this suggests a particularly long, thin tail to the right.

Summarize wages
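The skewness and kurtosis reported by summarize, detail are moment ratios. With m_r = (1/n) * sum of (x_i - mean)^r, skewness is m3 / m2^(3/2) and kurtosis is m4 / m2^2, so a normal distribution has skewness 0 and kurtosis 3. The sketch below reads the stored results back from r() and then recomputes both statistics by hand as a check. The scalar names (mu, m2, m3, m4, skew_manual, kurt_manual) and the variables dev2 to dev4 are arbitrary names chosen for illustration. A mean above the median is a further, informal sign of right skew.

```stata
sysuse nlsw88, clear

* Stored results from summarize, detail
quietly summarize wage, detail
display "mean     = " r(mean)
display "median   = " r(p50)
display "skewness = " r(skewness)
display "kurtosis = " r(kurtosis)

* Recompute from central moments with divisor n
* (illustrative names: mu, m2-m4, dev2-dev4, skew_manual, kurt_manual)
quietly summarize wage, meanonly
scalar mu = r(mean)
generate double dev2 = (wage - scalar(mu))^2
generate double dev3 = (wage - scalar(mu))^3
generate double dev4 = (wage - scalar(mu))^4
foreach k in 2 3 4 {
    * mean of the k-th power deviation = k-th central moment
    quietly summarize dev`k', meanonly
    scalar m`k' = r(mean)
}
scalar skew_manual = scalar(m3) / scalar(m2)^1.5
scalar kurt_manual = scalar(m4) / scalar(m2)^2
display "manual skewness = " scalar(skew_manual)
display "manual kurtosis = " scalar(kurt_manual)
```

If the manual values match r(skewness) and r(kurtosis), that confirms the definitions above.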

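To drive this point home, let’s try the sktest command, which tests for normality based on skewness and kurtosis. It reports whether the skewness and the kurtosis of wage each differ significantly from those of a normal distribution, along with a joint test of both.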

sktest wage  

The resulting p-values are well below 0.05, so we can reject the null hypothesis that wage is normally distributed.

Sktest wages
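As an optional cross-check, not part of the workflow above, Stata also provides the Shapiro-Francia test (sfrancia, documented alongside swilk in the Stata manual) and the normal quantile plot (qnorm). The sketch below runs both on the same data. As with sktest, the null hypothesis of sfrancia is that the variable is normally distributed, so a small p-value argues against normality.

```stata
sysuse nlsw88, clear

* Shapiro-Francia W' test; null hypothesis: wage is normally distributed
sfrancia wage
display "Shapiro-Francia p-value = " r(p)

* Normal quantile plot: points bending away from the reference line
* at the upper end indicate a heavy right tail
qnorm wage
```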


Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specializing in e-commerce.