Let's look at two statistical concepts commonly used to describe a distribution: skewness and kurtosis.
Skewness measures the asymmetry of a probability distribution. Simply put, it indicates the direction in which the distribution is stretched out, and by how much.
- If skewness > 0, the data are spread out more to the right of the mean than to the left.
- If skewness < 0, the data are spread out more to the left.
- If skewness is close to 0, the data are roughly symmetrical.
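For reference, the moment-based definition behind these rules can be sketched as follows. The notation is our own assumption (n observations x_i with mean x-bar, and m_r for the r-th central moment); it corresponds to the unweighted form of the skewness coefficient that Stata's summarize reports.

```latex
m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r,
\qquad
\text{skewness} = \frac{m_3}{m_2^{3/2}}
```

The cubed deviations keep their sign, so a long right tail pulls the value above 0 and a long left tail pulls it below 0.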
Kurtosis measures the weight of the tails of a probability distribution. In other words, it tells us how heavy the tails are compared to those of a normal distribution, which has a kurtosis of 3.
- If kurtosis > 3, the distribution has heavier tails than the normal distribution and tends to produce more outliers. It often, though not always, has a sharper peak as well.
- If kurtosis < 3, the distribution has lighter tails than the normal distribution and tends to produce fewer outliers. It often, though not always, has a flatter peak as well.
- If kurtosis is close to 3, the tails are about as heavy as those of a normal distribution.
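The benchmark of 3 follows from the moment-based definition, sketched here in notation of our own choosing (n observations x_i with mean x-bar, and m_r for the r-th central moment). This is the unweighted form of the kurtosis coefficient that Stata's summarize reports, not the "excess" kurtosis that some other packages use.

```latex
m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r,
\qquad
\text{kurtosis} = \frac{m_4}{m_2^{2}}
```

For a normal distribution the fourth central moment equals three times the squared variance, so this ratio is 3. Subtracting 3 gives the excess kurtosis, for which the normal benchmark is 0.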
Let's look at some examples in Stata. The summarize command along with the detail option will suffice for this:
sysuse auto, clear
summarize mpg, detail
The output shows a skewness of 0.9487176 and a kurtosis of 3.975005. Going back to our previous reasoning, this means that mpg has a positively skewed distribution with somewhat heavier tails than a normal distribution. Both numbers are still reasonably close to 0 and 3 respectively, the values for a perfectly normal distribution, so mpg is not that far off.
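If you want to use these two numbers later rather than read them off the table, summarize with the detail option leaves them behind as stored results. The following is a minimal sketch, assuming the stored result names r(skewness) and r(kurtosis); the excess kurtosis line is our own addition for comparison with tools that report kurtosis relative to 0.

```stata
* Reload the data and run summarize without printing the table
sysuse auto, clear
quietly summarize mpg, detail

* Skewness and kurtosis are left behind as stored results
display "skewness        = " r(skewness)
display "kurtosis        = " r(kurtosis)

* Excess kurtosis: distance from the normal benchmark of 3
display "excess kurtosis = " r(kurtosis) - 3
```

Running return list after summarize shows everything else that the command stores.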
Let's look at another example, this time using the nlsw88 dataset.
sysuse nlsw88, clear
summarize wage, detail
Here we get some more interesting numbers, with a skewness of 3.096199 and a kurtosis of 15.85446. This indicates a strongly positively skewed distribution with very heavy tails.
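One way to check this reading is to look at the data and to test it formally. The sketch below is not part of the walkthrough above. It assumes you are happy to use two further built-in commands: histogram with the normal option, which overlays a normal density on the plot, and sktest, Stata's skewness and kurtosis test for normality.

```stata
sysuse nlsw88, clear

* Plot wage with a normal density overlaid for visual comparison
histogram wage, normal

* Test the null hypothesis that wage is normally distributed,
* based on its skewness and kurtosis
sktest wage
```

With skewness and kurtosis this far from 0 and 3, we would expect the histogram to show a long right tail and the test to reject normality.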