Working with strings in Stata
May 07, 2019
The main commands used in Stata for converting variables between string and numeric types are tostring, destring, encode and decode.
For this example, let’s load the auto example dataset that ships with Stata and run the describe command:
sysuse auto, clear
describe
The output shows that price, for instance, is stored as type int. Let’s try to change that to a string:
tostring price, replace
The output from that command confirms that price has been converted to a string. Let’s run describe again just to make sure:
describe
Now let’s reverse this by converting price back to a numeric variable:
destring price, replace
describe
The output confirms that all characters in price are numeric and that the variable has been replaced as type int.
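As an optional extension that is not part of the original walkthrough, the sketch below keeps price intact by using the generate() option instead of replace, checks that the string copy converts back to the same values, and then shows destring's ignore() option for strings that contain thousands separators. The variable names price_str, price_back, price_fmt and price_num are hypothetical, chosen only for illustration, and the strtrim() call assumes Stata 14 or later. Note that the block reloads the dataset.

```stata
* Reload a fresh copy of the example data
sysuse auto, clear

* generate() writes the string version to a new variable and leaves price untouched
* (price_str is an illustrative name)
tostring price, generate(price_str)

* Round-trip check: converting the string copy back should reproduce price
destring price_str, generate(price_back)
assert price_back == price

* Build strings with thousands separators, e.g. "4,099" (price_fmt is illustrative)
generate price_fmt = strtrim(string(price, "%9.0fc"))

* destring refuses non-numeric characters such as commas unless they are
* stripped first with ignore()
destring price_fmt, generate(price_num) ignore(",")
assert price_num == price
```

Using generate() rather than replace keeps the original variable available, which is what makes the assert comparisons possible.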
The encode command can also be used to convert string values to numeric ones. Take the make variable, for instance:
encode make, gen(newmake)
tab newmake, nol
Here we encoded the string variable make and stored the result in a new variable called newmake. Tabulating it with the nol (nolabel) option shows the underlying numeric codes instead of the value labels, confirming that the string values have been converted to numbers. The decode command works the other way around: it converts a numeric variable with value labels back into a string variable holding the label text, and stores the result in a new variable:
decode newmake, gen(newmake2)
describe
The describe output shows that newmake, created by encode, is numeric (type long), while newmake2, created by decode, is a string again.
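To make the encode/decode relationship concrete, here is a small sketch (an illustration, not the original author’s code) that reloads the data, checks that encode numbers the distinct makes 1, 2, … up to the number of distinct values, compares the labeled and unlabeled views, and verifies that decode reproduces make exactly. The local macro name n_makes is hypothetical.

```stata
* Reload a fresh copy of the example data
sysuse auto, clear

encode make, generate(newmake)

* encode assigns the codes 1, 2, ... to the distinct strings in sorted order,
* so the codes should run from 1 to the number of distinct makes
* (n_makes is an illustrative macro name)
quietly tabulate make
local n_makes = r(r)
summarize newmake, meanonly
assert r(min) == 1 & r(max) == `n_makes'

* Same observations shown with value labels, then as raw numeric codes
list make newmake in 1/3
list make newmake in 1/3, nolabel

* decode turns the value labels back into text, so the round trip
* should reproduce make exactly
decode newmake, generate(newmake2)
assert newmake2 == make
```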
Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specializing in e-commerce. Connect with me on LinkedIn.