Johan Osterberg - Product Engineer

Combining datasets in Stata

April 23, 2019

If you have two (or more) datasets that you want to combine, you can use either the append or the merge command.

The main use case for the append command is when you have two datasets containing the same variables but different sets of observations. The append command simply adds the observations from dataset2 to the end of dataset1, like so:

use dataset1, clear
append using dataset2

The resulting dataset will now contain all the observations from both dataset1 and dataset2.
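To make the append step concrete, here is a minimal sketch that can be run as a do-file. The variable names (id, income), the values, and the temporary file names are made up for illustration and are not part of the original example. The generate() option is optional and is only used here to mark which dataset each observation came from.

```stata
* Build two small example datasets with the same variables
* (id and income are hypothetical names chosen for this sketch)
clear
input id income
1 100
2 200
end
tempfile first
save `first'

clear
input id income
3 300
4 400
end
tempfile second
save `second'

* Load the first dataset and stack the second one underneath it.
* generate(source) creates a marker: 0 = first dataset, 1 = appended dataset
use `first', clear
append using `second', generate(source)

* The combined dataset holds all four observations
list
```

If a variable exists in only one of the appended datasets, Stata keeps it and fills it with missing values for the observations that came from the other dataset.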

The merge command, on the other hand, is useful if you have two datasets containing different variables that you want to combine into one dataset. Let’s say the two datasets share one common variable, commonvar, that uniquely identifies each observation, and you want to perform a one-to-one merge:

use dataset1, clear
merge 1:1 commonvar using dataset2

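Note that observations found in only one of the datasets will have missing values for the variables that come from the other dataset. Stata also adds a variable named _merge that records whether each observation was found in dataset1 only, in dataset2 only, or in both.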
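The following sketch shows a one-to-one merge on a small scale and can be run as a do-file. Again, the variable names (id, income, age), the values, and the temporary file names are assumptions made for illustration only. The isid check is not required, but it is a cheap way to confirm that the key really is unique before attempting a 1:1 merge.

```stata
* Master dataset: id and income (hypothetical example data)
clear
input id income
1 100
2 200
3 300
end
isid id                  // stops with an error if id is not unique
tempfile master
save `master'

* Using dataset: id and age, with only partly overlapping ids
clear
input id age
2 35
3 41
4 58
end
isid id
tempfile extra
save `extra'

* One-to-one merge on the common key variable
use `master', clear
merge 1:1 id using `extra'

* _merge: 1 = master only, 2 = using only, 3 = matched in both
tabulate _merge
list
```

In this sketch, id 1 exists only in the master dataset, so its age is missing, and id 4 exists only in the using dataset, so its income is missing. If you only want the matched observations, one option is to run keep if _merge == 3 after the merge.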



Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specialized in e-commerce.