Correlation commands in Stata

May 28, 2019

In this post, let’s look at a few different types of correlation measures and how to compute each of them in Stata.

Pearson correlation measures the strength and direction of the linear relationship between two variables. If one variable tends to increase as the other increases, the Pearson correlation is positive; if one tends to decrease as the other increases, it is negative. The coefficient ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), and a value of 0 means there is no linear correlation. Both variables should be continuous, and the usual significance test assumes they are approximately normally distributed. In Stata we can use the pwcorr command, which computes the pairwise correlation coefficients for all the listed variables (the sig option adds significance levels):

sysuse auto, clear
pwcorr mpg weight price, sig

Looking at the result, we can see a negative correlation between weight and mpg, as well as between price and mpg: heavier and more expensive cars tend to have lower fuel economy. Stata also provides the correlate command, which can be abbreviated corr:

correlate mpg weight price
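To make the formula concrete, the sketch below (a do-file fragment; helper names such as dx, dy and r_manual are chosen only for illustration) computes Pearson’s r for mpg and weight by hand as the sum of cross-products of deviations from the means, divided by the square root of the product of the sums of squared deviations, and compares it with correlate. The second part uses rep78, which has missing values in the auto data, to show a practical difference between the two commands: by default pwcorr drops missing values pair by pair, while correlate only uses observations that are complete on every listed variable.

```stata
sysuse auto, clear

* Pearson's r by hand: r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)),
* where dx and dy are deviations from the means of mpg and weight
quietly summarize mpg
generate double dx = mpg - r(mean)
quietly summarize weight
generate double dy = weight - r(mean)
generate double dxdy = dx * dy
generate double dx2 = dx^2
generate double dy2 = dy^2

quietly summarize dxdy
scalar sxy = r(sum)
quietly summarize dx2
scalar sxx = r(sum)
quietly summarize dy2
scalar syy = r(sum)
scalar r_manual = sxy / sqrt(sxx * syy)

* Compare with the coefficient reported by correlate
quietly correlate mpg weight
scalar r_correlate = r(rho)
display "by hand: " %7.4f scalar(r_manual) "   correlate: " %7.4f scalar(r_correlate)

* rep78 has missing values: pwcorr uses every available observation per pair,
* the listwise option and correlate keep only observations complete on all three
pwcorr mpg price rep78, obs
pwcorr mpg price rep78, obs listwise
correlate mpg price rep78
```

With complete data, as for mpg, weight and price, the two commands give identical coefficients; the choice between them only matters once some variables have missing values.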

Spearman’s rank correlation measures the strength and direction of the relationship between two ranked variables; it is the Pearson correlation computed on the ranks of the data. Unlike Pearson, it does not assume that the variables are normally distributed or linearly related, only that the relationship is monotonic. The variables should be ordinal, interval, or ratio. It is particularly useful when the data are not normally distributed or when ordinal variables are involved. In Stata there’s the spearman command:

spearman mpg weight
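Since Spearman’s rho is the Pearson correlation of the ranks, it can be reproduced by ranking the variables first. This is a sketch with rank_mpg, rank_weight, rho_ranks and rho_spearman as illustrative names; egen’s rank() gives tied values their average rank, which matches how spearman handles ties. The last line shows the stats() option, which reports the coefficient, the number of observations and the p-value for each pair of variables.

```stata
sysuse auto, clear

* Rank both variables; rank() assigns tied values their average rank
egen double rank_mpg = rank(mpg)
egen double rank_weight = rank(weight)

* Pearson correlation of the ranks
quietly correlate rank_mpg rank_weight
scalar rho_ranks = r(rho)

* Spearman's rho as reported by spearman
quietly spearman mpg weight
scalar rho_spearman = r(rho)
display "Pearson on ranks: " %7.4f scalar(rho_ranks) "   spearman: " %7.4f scalar(rho_spearman)

* Several variables at once, with coefficient, observations and p-value
spearman mpg weight price, stats(rho obs p)
```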

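Kendall’s tau is similar to Spearman’s rank correlation but is based on the difference between the number of concordant and discordant pairs of observations. A pair is concordant when both variables move in the same direction from one observation to the other, and discordant when they move in opposite directions. It tends to perform better than Spearman on small samples or when there are many tied ranks. In Stata we have the ktau command: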

ktau mpg weight
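The concordant/discordant idea can be checked directly. This sketch loops over every pair of observations and adds the product of the signs of the differences in mpg and in weight: +1 for a concordant pair, -1 for a discordant one and 0 when either variable is tied. The total is Kendall’s score S, and dividing S by the number of pairs, n(n-1)/2, gives tau-a. The macro and scalar names are illustrative, and a double loop like this is only reasonable for a small dataset such as auto; ktau itself is the practical choice on larger data. Because mpg has many tied values, tau-b, which adjusts for ties, will differ from tau-a here.

```stata
sysuse auto, clear

* Kendall's score S: +1 per concordant pair, -1 per discordant pair, 0 for ties
local n = _N
local last = `n' - 1
local S = 0
forvalues i = 1/`last' {
    local first = `i' + 1
    forvalues j = `first'/`n' {
        local S = `S' + sign(mpg[`i'] - mpg[`j']) * sign(weight[`i'] - weight[`j'])
    }
}
scalar S_manual = `S'

* tau-a divides S by the number of pairs, n(n-1)/2
scalar taua_manual = S_manual / (`n' * (`n' - 1) / 2)
display "S by hand: " scalar(S_manual) "   tau-a by hand: " %7.4f scalar(taua_manual)

* Compare with ktau, asking for tau-a, tau-b, the score and the p-value
ktau mpg weight, stats(taua taub score p)
```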

Canonical correlation analyzes the relationship between two sets of variables by finding linear combinations of each set (i.e. canonical variables) whose correlation with each other is as large as possible. Its significance tests assume multivariate normality. Because it deals with sets of variables rather than pairs, canonical correlation is more complex than the previous measures and has different use cases: it is mainly used to explore how two groups of variables relate to each other. In Stata we use the canon command:

canon (length weight headroom trunk) (displ mpg gear_ratio turn)

The command above relates the first set (length, weight, headroom and trunk) to the second set (displacement, mpg, gear_ratio and turn); displ is an abbreviation of the variable name displacement.
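The canonical variables themselves can be generated after canon. This sketch assumes predict’s u and v statistics with the correlation(#) option and the e(ccorr) stored result of canon (worth double-checking against help canon postestimation for your Stata version). It builds the first pair of canonical variables and checks that their correlation reproduces the first canonical correlation; estat loadings then shows how each original variable correlates with the canonical variables. The names cc, u1, v1 and rho_uv are illustrative.

```stata
sysuse auto, clear
quietly canon (length weight headroom trunk) (displ mpg gear_ratio turn)

* Keep the canonical correlations reported by canon
matrix cc = e(ccorr)

* First pair of canonical variables: u1 combines the first set, v1 the second
predict double u1, u correlation(1)
predict double v1, v correlation(1)

* Their Pearson correlation should match the first canonical correlation
quietly correlate u1 v1
scalar rho_uv = r(rho)
display "corr(u1, v1): " %7.4f scalar(rho_uv) "   first canonical correlation: " %7.4f cc[1,1]

* Correlations between the original variables and the canonical variables
estat loadings
```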

This was a brief introduction to correlation; we’ll go more in depth on these concepts in upcoming posts.



Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specialized in e-commerce. Connect with me on LinkedIn.

2024 © Johan Osterberg