Johan Osterberg - Product Engineer

Working with datatypes in Stata

June 13, 2019

For this tutorial I have imported data consisting of e-commerce orders from an Excel file into Stata and then saved it as a dataset.
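The import step itself is not shown in this post. A minimal sketch of what it could look like, assuming a hypothetical workbook called orders.xlsx with the variable names in its first row (both the file name and the layout are assumptions, not taken from the actual data):

```stata
* Hypothetical file name; replace with the path to your own workbook
import excel using "orders.xlsx", firstrow clear

* Save as a Stata dataset, then inspect the storage type of each variable
save "orders.dta", replace
describe
```

The describe command lists every variable together with its storage type, which is where a string column that should be numeric becomes visible.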


Describing a custom dataset

As the description shows, Stata got ORDERID and STATUS right: both are stored as numeric types. ORDERTOTAL, however, has been stored as a string, which is strange because the variable contains only numeric information. To fix this, we use the destring command:

destring ORDERTOTAL, replace 

Result of destringing a numeric variable

The operation completed successfully, and the variable in question was converted to double, which is adequate here.
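If a string variable contains non-numeric characters, destring leaves it unchanged unless told how to deal with them. The following is a sketch on made-up values (not the order data from this tutorial) of how such rows could be found and handled with the ignore() and force options; the variable name ORDERTOTAL_NUM is only an illustration:

```stata
* Made-up example values, for illustration only
clear
input str10 ORDERTOTAL
"199.00"
"1,250.50"
"n/a"
end

* List the values that real() cannot convert to a number
list ORDERTOTAL if missing(real(ORDERTOTAL)) & !missing(ORDERTOTAL)

* Strip thousands separators; force turns anything still non-numeric into missing
destring ORDERTOTAL, generate(ORDERTOTAL_NUM) ignore(",") force
list
```

Using generate() instead of replace keeps the original string around, so the conversion can be checked before the old variable is dropped.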

What about ORDERDATE? It is currently stored as a string as well, and it would be more usable as a proper date value. Stata has the date function, but that doesn't handle hours, minutes and seconds. Another option is the clock function, which returns the number of milliseconds since 01jan1960 00:00:00. These values are very large, which is why the result has to be stored as a double (a float would round it) and then formatted for human readability:

gen double ORDERDATE2 = clock(ORDERDATE, "YMD hms")
format ORDERDATE2 %tc

Listing the variable now, a datetime value like 2019-06-20 19:55:18.943 is rendered as 20jun2019 19:55:18, which is good enough in this case.
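To see why the double matters, and to show the milliseconds when they are needed, here is a small sketch on a single made-up timestamp. The variable names ORDERDATE_F and ORDERDAY are hypothetical and only used for illustration:

```stata
* One made-up timestamp, for illustration only
clear
input str23 ORDERDATE
"2019-06-20 19:55:18.943"
end

gen double ORDERDATE2 = clock(ORDERDATE, "YMD hms")

* Same value stored as a float: too few significant digits, so it gets rounded
gen float ORDERDATE_F = clock(ORDERDATE, "YMD hms")

* Display milliseconds instead of the default %tc layout
format ORDERDATE2 ORDERDATE_F %tcCCYY-NN-DD_HH:MM:SS.sss

* Date-only value derived from the datetime
gen ORDERDAY = dofc(ORDERDATE2)
format ORDERDAY %td

list
```

At this magnitude a float can be off by a minute or so, which is easy to miss until two orders end up with the same timestamp. The dofc function is handy when only the day matters, for example when grouping orders per date.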

Written by Johan Osterberg who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce.