Johan Osterberg - Product Engineer

Working with strings in Stata

May 07, 2019

The four main commands used in Stata to convert between string and numeric variables are tostring, destring, encode and decode.

For this example, let’s load the auto training dataset and run the describe command:

sysuse auto, clear  
describe  

Result of describe command
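If you only care about a few variables, you do not have to read the full describe table. The following is a minimal sketch, assuming the auto dataset that ships with Stata, that lists variables by storage type and checks the types of price and make directly:

```stata
sysuse auto, clear

* Show storage type, display format and label for just these two variables
describe price make

* List the variables that are stored as strings, then the numeric ones
ds, has(type string)
ds, has(type numeric)

* confirm exits with an error if a variable is not of the stated type
confirm numeric variable price
confirm string variable make
```

The confirm lines are handy in do-files, since they stop the run early when a variable is not of the type you expect.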

We see that price, for instance, is of type int. Let’s change it to a string:

tostring price, replace 

The output from that command confirms that the variable price has now been converted to a string. We’ll run describe again just to make sure:

Result of describe command
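The replace option overwrites the original variable. If you would rather keep the numeric version, tostring also accepts generate(). Here is a small sketch of that variant; price_str is a hypothetical variable name picked for this example, and preserve/restore is used only so the snippet does not disturb the data already in memory:

```stata
* Set aside whatever is currently in memory; restore brings it back at the end
preserve
sysuse auto, clear

* Write the string copy to a new variable and leave price untouched
tostring price, generate(price_str)

* price is still numeric, price_str is the new string variable
confirm numeric variable price
confirm string variable price_str
describe price price_str

restore
```

Keeping both versions side by side makes it easy to compare them before deciding to drop one.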

Now let’s reverse this by converting price back to a numeric value:

destring price, replace  
describe

Result of describe command

The output confirms that all characters in price are numeric and that the variable has been converted back to int.
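In real data, numbers stored as text often contain extra characters such as thousands separators, and destring leaves such a variable unconverted unless told how to handle them. The sketch below builds that situation from the auto dataset and uses the ignore() option to strip the commas; price_txt and price_num are hypothetical names used only for illustration:

```stata
sysuse auto, clear

* Build a text version of price with thousands separators, e.g. "4,099"
generate price_txt = string(price, "%9.0fc")

* ignore() removes the listed characters before converting to numeric
destring price_txt, generate(price_num) ignore(",")

* The recovered numbers should equal the original prices
assert price_num == price
describe price price_txt price_num
```

destring also has a force option that turns anything it cannot convert into missing values, so it is worth checking the result before relying on it.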

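The encode command also converts a string variable to numeric, but it is meant for text rather than for digits stored as strings: each distinct string gets an integer code, and the original text is kept as a value label. Take the make variable for instance: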

encode make, gen(newmake)
tab newmake, nol

Result of encode command
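To see where the text went, you can look at the value label that encode creates. This is a minimal sketch, assuming the default behaviour where the label is named after the new variable (newmake here):

```stata
sysuse auto, clear
encode make, generate(newmake)

* The mapping from integer code to text lives in a value label
label list newmake

* Compare the original string with the underlying code for a few rows
list make newmake in 1/5, nolabel

* Codes are positive integers starting at 1
summarize newmake
assert r(min) == 1
```

Because newmake still displays the car names through its label, list and tabulate look unchanged unless you add nolabel.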

Here we encoded the string variable make and stored the result in a new variable called newmake. Tabulating it with the nol (nolabel) option shows the underlying numeric codes instead of the labels. The decode command works the opposite way, converting the labeled numeric values back into strings and storing them in a new variable:

decode newmake, gen(newmake2)  
describe  

Result of decode command

Here we can see that newmake, generated by encode, is numeric (long), while newmake2, generated by decode, is a string again.
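A quick way to check the round trip is to compare the decoded variable with the original. The sketch below does that and also shows why destring is not a substitute for encode on a variable like make; make_num is a hypothetical name used only for this example:

```stata
sysuse auto, clear
encode make, generate(newmake)
decode newmake, generate(newmake2)

* The decoded text should match the original string in every row
assert newmake2 == make

* make holds text, not digits, so destring reports nonnumeric
* characters and does not create make_num
destring make, generate(make_num)
capture confirm variable make_num
display _rc
```

A nonzero return code from the last confirm tells you that make_num was never created.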


Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specialized in e-commerce.