Distributional analysis in Stata
April 17, 2019
When working with a continuous variable such as income, it is often useful to get an overview of its distributional properties.
Let’s look at the variable wage in the nlsw88 dataset (an extract from the U.S. National Longitudinal Survey of Young Women) using the inspect command.
sysuse nlsw88, clear
inspect wage
In the Results window, Stata renders a rather primitive text histogram of the data, showing that the distribution is skewed to the right. This already suggests that wage is not normally distributed.
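For a fuller picture than the text histogram from inspect, a graphical histogram can help. The following is a minimal sketch, not part of the original walkthrough; it assumes the same nlsw88 data and uses the normal option of histogram to overlay a normal density for comparison.

```stata
* Load the example dataset shipped with Stata
sysuse nlsw88, clear

* Histogram of hourly wage with a normal density curve overlaid
histogram wage, normal
```

If wage were close to normal, the bars would roughly follow the overlaid curve; a long right tail shows up as bars extending well beyond it on the right.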
To get richer detail about the variable wage we can use the summarize command and append the option detail:
su wage, detail
The Results window shows the percentile values, which indicate a wide range in hourly wage, as suspected. The positive skewness coefficient (3.096) again points to the distribution being skewed to the right. The high kurtosis coefficient (a normal distribution has a kurtosis of 3) indicates heavier tails than the normal distribution, and in this case a particularly long, thin tail to the right.
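The statistics reported by summarize, detail are also stored in r() and can be reused without reading them off the screen. A minimal sketch, assuming the same dataset; the choice of which values to display and the display formats are illustrative only.

```stata
* Load the data and run summarize quietly so only our own output is shown
sysuse nlsw88, clear
quietly summarize wage, detail

* Stored results left behind by summarize, detail
display "skewness = " %6.3f r(skewness)
display "kurtosis = " %6.3f r(kurtosis)
display "median   = " %6.3f r(p50)
```

Typing return list after summarize shows everything that is available in r().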
To drive this point home, let’s try out the sktest command. It tests whether the skewness and kurtosis of wage are consistent with a normal distribution.
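The post does not show the command itself, so the following is a minimal sketch of how the test would typically be run on the same data:

```stata
* Load the example dataset shipped with Stata
sysuse nlsw88, clear

* Skewness and kurtosis test for normality on wage
sktest wage
```

The output reports a p-value for skewness, a p-value for kurtosis, and a joint test combining the two.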
The resulting p-values are well below 0.05, which means that we can safely reject the hypothesis that the variable is normally distributed.
Written by Johan Osterberg who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce.