Johan Osterberg - Product Engineer

Getting started with Stata

March 29, 2019

Stata is a powerful statistics package that has been around since the mid-1980s. And it sure looks like it! That was my first impression anyway. Looks can be deceiving though: as soon as I started using the application I set that initial reaction aside, and I have since come to appreciate the simplicity of the GUI's legacy look and feel.

Evaluation trials and short-term student licences of Stata are available; personally, I downloaded mine from the website of a university course I enrolled in. In general, Stata has similarities to SPSS, or even Excel for that matter. The interface used to type code is basically a CLI, which I find appealing compared with the GUI-heavy SPSS. Stata also starts up noticeably faster than SPSS, which is nice.

Installing Stata is straightforward: just follow the instructions. The command-line interface used to input code is docked at the bottom of the window. To find more information about a particular command, type help command_name, for instance help pwcorr, which gives information about the pairwise correlation command. This opens a new window documenting the command's syntax and the options that can be appended to it.
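As a small sketch of the help lookup described above, the lines below can be typed one at a time into the command window. The search line is my own addition rather than part of the original walkthrough; it assumes you only know a keyword and not the command name.

```stata
* Open the help page for the pairwise correlation command
help pwcorr

* Not sure what the command is called? Search the documentation by keyword
search correlation
```

Lines starting with an asterisk are comments, so the snippet can also be saved to a do-file and run as a whole.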

To try out some commands on real data, load one of Stata's sample data sets. In this case I will use the one called auto, like so: sysuse auto. The data set is now loaded into memory and available for queries. To browse it, type browse and a new window will open with the data laid out in tabular form. To view only selected columns, type browse make weight length and just these columns will be shown.
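The loading and browsing steps might look like this when typed in sequence. The clear option, the in 1/10 row range, and the list command are additions of mine for illustration, not something the walkthrough above requires.

```stata
* Load the bundled example data set; clear drops any data already in memory
sysuse auto, clear

* Open the data browser showing three variables, first ten rows only
browse make weight length in 1/10

* Print the same rows in the Results window instead of a separate window
list make weight length in 1/10
```

Using list rather than browse is handy when you want the output to end up in a log file.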

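To get a brief overview of the data set, use the summarize command, which can be abbreviated to su. It prints summary statistics (observation count, mean, standard deviation, minimum and maximum) for each variable in the main output window. Other similarly useful commands are tab (short for tabulate) and desc (short for describe).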
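A minimal sketch of the three overview commands, written with their full names and run against the auto data set. The choice of variables (price, weight, length, foreign) is just an example of mine.

```stata
* Load the example data set
sysuse auto, clear

* Variable names, storage types and labels
describe

* Observation count, mean, standard deviation, min and max for selected variables
summarize price weight length

* Frequency table for a categorical variable
tabulate foreign
```

Running summarize without a variable list covers every variable in the data set, which is what typing su on its own does.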

To do something more practical, try the regress command: just type regress price weight. In the resulting output, the coefficient on weight is 2.044063, meaning that each one-unit increase in weight is associated with an increase of about 2.04 in price. This is a regression slope, not a correlation coefficient. The p-value is displayed as 0.000, which means it is smaller than 0.0005, so the coefficient is statistically significant at conventional levels.
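The sketch below repeats the regression and then pulls out the slope on its own. The display and pwcorr lines are my own additions, included only to show that the regression coefficient and the pairwise correlation are two different numbers.

```stata
* Load the example data set
sysuse auto, clear

* Regress price on weight; the coefficient table appears in the Results window
regress price weight

* After regress, the estimated slope is available as _b[weight]
display _b[weight]

* For comparison: the pairwise correlation between the two variables,
* with its significance level printed underneath
pwcorr price weight, sig
```

The value shown by display should match the weight coefficient in the regression table, while pwcorr reports a value between -1 and 1.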

More on the regress command in another post.

Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specialized in e-commerce. Connect with me on LinkedIn.