# Distributional analysis in Stata

April 17, 2019

When working with a continuous variable such as income, it is often useful to get an overview of its distributional properties.

Let’s look at the variable **wage** in the **nlsw88** dataset (an extract from the U.S. National Longitudinal Survey of Young Women) using the **inspect** command.

```
sysuse nlsw88, clear
inspect wage
```

In the Results window, Stata renders a rather primitive histogram of the data, which shows that the distribution is skewed to the right. This is a first indication that wage is not normally distributed.
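For a clearer picture than the text histogram, the distribution can also be drawn as a graph. The following is a minimal sketch, not part of the original walkthrough, that uses the built-in **histogram** and **kdensity** commands with the **normal** option to overlay a normal density for comparison:

```stata
* Sketch: compare the shape of wage with a normal curve
sysuse nlsw88, clear

* Histogram with a normal density overlaid
* (the curve uses the mean and standard deviation of wage)
histogram wage, normal

* Kernel density estimate with the same kind of normal overlay
kdensity wage, normal
```

If wage were roughly normal, the bars and the kernel density would follow the overlaid curve closely. A long right tail shows up as mass extending well beyond the curve on the right-hand side.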

To get richer detail about the variable wage we can use the **summarize** command and append the option **detail**:

`summarize wage, detail`

The Results window lists the percentiles, which confirm the wide range in hourly wage. The positive **skewness** coefficient (3.096) again points to a distribution that is skewed to the right; a symmetric distribution such as the normal has a skewness of 0. The high **kurtosis** coefficient, compared with the value of 3 that a normal distribution has, indicates heavier tails than the normal distribution. Together with the positive skewness, this points to a particularly long, thin tail to the right.
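The statistics shown by **summarize** are also left behind as stored results, so they can be reused without reading them off the screen. A small sketch, assuming only the standard `r()` results documented for `summarize, detail`:

```stata
* Sketch: pull selected statistics from the stored results
sysuse nlsw88, clear
quietly summarize wage, detail

* A mean above the median is typical of a right-skewed variable
display "mean     = " %6.3f r(mean)
display "median   = " %6.3f r(p50)

* Reference values for a normal distribution: skewness 0, kurtosis 3
display "skewness = " %6.3f r(skewness)
display "kurtosis = " %6.3f r(kurtosis)
```

Running `return list` right after the **summarize** call shows every stored result that is available.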

To drive this point home, let’s try the **sktest** command, which tests the variable for normality based on its skewness and kurtosis.

`sktest wage`

The resulting p-values are well below **0.05**, which means that we can reject the null hypothesis that the variable is normally distributed.
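A formal test can be complemented with a visual check. As a hedged sketch, not part of the original walkthrough, the built-in **qnorm** command plots the quantiles of wage against the quantiles of a normal distribution:

```stata
* Sketch: normal quantile plot as a visual companion to sktest
sysuse nlsw88, clear

* Points close to the reference line suggest normality;
* systematic departures in the upper tail suggest right skew
qnorm wage
```

With a large sample, tests such as **sktest** can reject normality even for modest departures, so it helps to look at the plot to judge how large the departure actually is.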

Written by **Johan Osterberg** who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce.