Using the split command in Stata

May 05, 2019

Let’s look at another example of recoding in Stata, this time using the split command. For this example we’ll take the variable make in the auto dataset and use it to populate a new string variable called brand that contains only the brand name.
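Before splitting, it can help to look at what make actually contains. The following is a minimal sketch, not part of the original example; nwords is a helper name introduced here only for illustration. It lists a few values and counts how many space-separated words each one has, which indicates how many variables split would create.

```stata
sysuse auto, clear

// show the first few values of make
list make in 1/5

// wordcount() returns the number of space-separated words in a string;
// nwords is a helper variable used only for this illustration
gen nwords = wordcount(make)
tab nwords
```

Each value starts with the brand, so the first word is the piece we want to keep.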

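sysuse auto, clear
gen brand = ""
foreach var of varlist make {
    // split make on spaces into brand_temp1, brand_temp2, ...
    split `var', generate(brand_temp)
    // the first word (brand_temp1) is the brand name
    replace brand = brand_temp1 if brand == ""
    // remove the temporary variables created by split
    drop brand_temp*
}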

tab brand

Here split breaks the make variable on spaces and stores each word in its own new variable. We then fill brand with the first word of make wherever brand is still an empty string. Finally we drop the variables created by the split command and tabulate brand to inspect our work.
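One way to sanity-check the result is to compare it against Stata's word() string function, which returns the nth word of a string. This is a sketch of a cross-check rather than part of the original approach; brand_check is a name made up for this illustration.

```stata
sysuse auto, clear

// approach from the text: split on spaces, keep the first piece
split make, generate(brand_temp)
gen brand = brand_temp1
drop brand_temp*

// alternative used only as a check: word(s, 1) returns the first word of s
gen brand_check = word(make, 1)

// assert stops with an error if the two ever disagree
assert brand == brand_check
```

If the assert passes silently, both approaches produce the same brand for every observation. When only the first word is needed, word() alone does the job without creating temporary variables.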

[Figure: tabulation of the new variable brand]



Written by Johan Osterberg, who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce.
