Importing R datasets in Stata
April 09, 2020
In this post we look at how you would import an R dataset into Stata. In particular we will export the happy dataset which is in the…
Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specialized in e-commerce. Connect with me on LinkedIn.
It can be frustrating when your computer shuts down in the middle of an SQL session and you’ve seemingly lost your work. SSMS keeps…
Doing analysis of variance (often referred to as ANOVA) in Stata is easy. Let’s begin by finding a suitable dependent variable which should…
On this blog I frequently feature examples from Stata, but also from SPSS, Microsoft Excel and RStudio. They are all popular software…
Let’s look at an alternate way of performing a chi-square test in Stata, this time using the tabi command. First, let’s load up the auto…
In this example we’ll again look at some order data: Here we see a table of orders, the grand total for each order and two as yet empty…
For this post let’s look at simple linear regression in Stata. As discussed previously, simple regression basically involves one independent…
For this tutorial I have imported data consisting of e-commerce orders from an Excel file into Stata and then saved it as a dataset. So we…
One way to visualize a simple regression model is to use a scatter plot. For this example we’re gonna use the nlsw88 dataset and…
In the previous post we looked at regression in Stata by exploring two commands, correlate and pwcorr. In this brief post we’ll explore…
In this post we’ll briefly be looking at regression in Stata by exploring two commands, correlate and pwcorr. The difference between these…
This post describes how to calculate a confidence interval on a sample set of more than 30 observations. For this example let’s start out…
The list command is one of the most useful commands in Stata when inspecting data. In this post we’ll look at some ways we can list the…
In another post we looked at analyzing a git repository by the use of external software. This is of course possible to do with Stata as well…
Let’s look at two statistical concepts commonly used to describe a distribution - skewness and kurtosis. Skewness is a measurement of the…
In a recent post we looked at listing percentiles of a variable. When it comes to getting an overview of a variable perhaps the simplest and…
Sometimes you want to display the percentiles of a variable to get an idea of how values are distributed. One way of achieving this is by…
In this post we’ll look at some tips to get the most out of the Stata help. In order to use Stata effectively it’s crucial to get a good…
Whenever you get your hands on a new dataset you probably want to get as much insight as possible, as quickly as possible so that you know…
To identify unique values of variables in Stata one option is to use the levelsof command. From the Stata documentation: levelsof serves two…
In order to get the unique values of a variable (for example how many times an identifier occurs among observations) there are a few…
For this post, let’s look at some different types of correlation measures and how to perform the same in Stata. Pairwise (Pearson…
In this post we’ll look at some Stata tips that might be useful if you’re coming from a SQL background (like myself). This will likely be a…
The net command in Stata is a tool that allows users to install and manage user-written packages from the internet. With net, it’s possible…
Bar charts are another graphing tool that can be used for example when exploring relationships between variables. Let’s look at an example…
Local smoothing is a smoothing or generalization technique to further enhance the visual representation of a relationship between two…
In the last post we looked at linear fit in order to visualize relationships between two variables. For instance we can explore the…
In another post we talked about scatter plots and the possibility of overlaying multiple types of plots on top of each other. This is…
Scatter plots are great visualization tools, especially for smaller datasets. When exploring relationships between two variables, scatter…
Just as with commands such as describe or summarize, drawing pie charts can be useful when familiarizing yourself with a new dataset (or for…
The webuse command is used to load datasets directly from the Stata website. This is convenient as you don’t have to download datasets…
The bysort command in Stata is used as a prefix before other commands, and it allows you to perform those commands within groups of…
Running a one-sample t-test in SPSS is very straightforward. For the purpose of this example, let’s use the demo.sav training dataset. From…
Effectively handling missing data in Stata largely comes down to dealing with how Stata treats missing values to begin with. Stata treats…
In this tutorial we’ll look at performing a chi-square test in Stata. We’ll use the nlsw88 dataset for this, so in order to perform a chi…
In this quick tutorial we’ll look at performing a chi-square test on a single sample. More specifically, we’ll look into performing a…
The notes command is used for attaching notes to a variable. It can be useful when it might be hard to interpret a variable, or when any…
Looping in Stata is mainly achieved by use of the foreach command. Using the nlsw88 dataset we’ll generate a new variable with observations…
The codebook command in Stata is useful for initial data exploration. Like tabulate or inspect it provides a quick overview of a dataset…
Let’s look at some useful tips for handling missing values in Stata, as those can cause a lot of problems if not taken care of properly when…
The four main commands used in Stata when working with strings are tostring, destring, encode and decode. For this example, let’s load the…
Instead of using the split command for string manipulation we can use the word function. Looking at a previous example of transforming the…
Let’s look at another example of recoding in Stata, this time using the split command. For this example we’ll use the variable make in the…
All the time when using the Stata command editor or console (or whatever you wanna call it) I find that I would want to be able to use some…
Performing a kernel density estimation in Stata is a simple task. In particular it can be visualized by way of a kernel density plot which…
Let’s say you want to make some temporary in-memory manipulations of your data without having to reload the whole dataset in order to get…
Dropping observations in Stata is easy. Looking at the nlsw88 training dataset for instance, let’s say we’d want to narrow the survey to…
Creating binary variables in Stata is useful for many purposes, for example if you quickly need to get an overview of a variable with a…
Sometimes you need to split a variable into groups. There are several ways to achieve this in Stata; in this post we’ll use the egen command…
It’s possible to create indicator variables from categorical variables as well, by using the tabulate command. Let’s have a look: Here we…
As an alternative to dropping variables, it’s possible to choose which ones to keep in a dataset. You can use the Variables Manager in Stata…
This post will go through the basics of setting up Source{d} on Windows.
In this post we’ll continue looking at factor variables, and in particular how they can interact with continuous variables. Interactions can…
If you have two (or more) datasets that you want to combine, this can be achieved by using either the append or merge commands. The main use…
Dropping values in Stata is achieved by the drop command. Let’s load the auto training dataset and create a new variable: That’s just a new…
In Stata, categorical, binary, indicator, or dummy variables (terms used interchangeably) are treated as factor variables. Examples of categorical…
In a previous post we looked at performing basic graphic visualizations in Stata. It is also possible to save graphics in a number of…
Generating new variables in Stata is mainly achieved by use of the gen command. The simplest way to use it is by appending the newvar…
Stata offers basic graphic visualization, which can be useful to get a grasp of the distribution of a variable. As with everything in Stata, it’s…
I recently re-visited one of my all-time favorite books on software engineering, The Mythical Man-Month by Fred Brooks. The main reason for…
When working with a continuous variable such as income, you often need to get an overview of its distributional properties. Let’s look at the…
For the purpose of this tutorial, I’m gonna be using the sample data set demo.sav, available under installdir/IBM/SPSS/Statistics/[version…
Let’s look at some examples of recoding variables in Stata. First off we’ll recode a numeric variable into categories using the auto dataset…
Recoding variables in Stata is very simple. It is useful in many scenarios, for instance where you would want to merge a variable containing…
In a previous post we looked briefly at the summarize command; let’s look a bit deeper. When opening a new dataset it’s a great way to get…
Some of the most basic utility commands to summarize, describe and get an overview of your data are When opening a new dataset it’s crucial…
In this short post we’ll look at importing data from an Excel file into Stata. For this example we’ll use some training data from another…
When working with long commands sometimes you want to split them into several lines in your do-files, for brevity or readability (or…
Using do-files (a text file with the .do extension) is the preferred way of working in Stata. They are simply text files containing…
In a previous post we looked at working with log files in Stata. As discussed in that post, Stata log files take the file extension .smcl. If…
Log files in Stata can be useful if you want to keep track of your work over time, simply because it enables you to save your session for…
In this post we’ll have a look at the table command which can be used to visualise summary statistics by tabulating variables against one…
Tabulating and summarizing are common activities when exploring a new dataset. Tabulating can be useful when laying out categorical…
The Pragmatic Programmer: From Journeyman to Master is a classic software development book written by Andrew Hunt and David Thomas. First…
I recently read Actionable Agile Metrics for Predictability by Daniel S. Vacanti as I’ve heard several times that it’s one of the most…
One neat feature of Stata is that you can load and work with online datasets. For example on the Stata website there are numerous links to…
In order to browse or edit data cells in Stata, the browse and edit commands are the most commonly used. The difference between those two…
Whenever you have modified a dataset in Stata that you want to save you need to do so explicitly, otherwise it will just be cleared from…
Stata is a powerful statistics package that has been around since the mid-’80s or so. And it sure looks like it! That was my first…