From SQL to Stata part 1

May 27, 2019

In this post we’ll look at some Stata tips that might be useful if you’re coming from a SQL background (like myself). This will likely be a recurring theme, therefore this is gonna be part one. In any case, let’s start off with the basics:

  1. Data import: Similarly to how you would import data into your SQL database by using commands such as LOAD DATA INFILE or BULK in Stata there is the import delimited command that can be used to load data from an external csv-file.
  2. Data export: In Stata there is the export command which is equivalant to the bcp utily in SQL Server or SELECT INTO OUTFILE in MySQL.
  3. Data selection: In SQL, there’s the SELECT statement used to fetch data from a database. In Stata though the command with most similar purpose is list. As an example, list in 1/10 in Stata is equivalent to the SQL statement SELECT * FROM table LIMIT 10.

SQL:

SELECT column FROM table;

Stata:

list variable
  1. Data Filtering: The WHERE clause corresponds to Stata’s if condition used in commands. The main use of both is to filter data based on certain conditions, as in

SQL:

SELECT * FROM table WHERE column = 'value';

Stata:

list if variable == 'value'
  1. Data clustering: In similarity to the SQL clause GROUP BY which is used to cluster rows with same values into aggregated data, in Stata there is the bysort command. From the Stata docs: by and bysort are really the same command; bysort is just by with the sort option. The bysort command is especially useful in combination with commands such as summarize.

SQL:

SELECT column1, AVG(column2) FROM table GROUP BY column1;

Stata:

bysort variable1: summarize variable2
  1. Data merging: In SQL there are different join types (INNER JOIN, LEFT JOIN etc.) used combine rows from two or more tables. In Stata, the merge command accomplishes a similar task based on a common varible inbetween two datasets:

SQL:

SELECT column FROM table1 INNER JOIN table2 ON table1.id = table2.id;

Stata:

use dataset1, clear
merge 1:1 id using dataset2.dta

There you have it, some basic commonalities between SQL and Stata. In upcoming posts we’ll further explore more similar concepts.


Profile picture

Written by Johan Osterberg who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce. Connect with me on Linkedin

2024 © Johan Osterberg