Written by Johan Osterberg, who lives and works in Gothenburg, Sweden as a developer specialized in e-commerce. Connect with me on LinkedIn

  1. April 09, 2020

    In this post we look at how you would import an R dataset into Stata. In particular we will export the happy dataset which is in the…

  2. July 27, 2019

    Doing analysis of variance (often referred to as ANOVA) in Stata is easy. Let’s begin by finding a suitable dependent variable which should…

  3. July 03, 2019

    On this blog I frequently feature examples from Stata, but also from SPSS, Microsoft Excel and RStudio. They are all popular software…

  4. June 16, 2019

    Let’s look at an alternate way of performing a chi-square test in Stata, this time using the tabi command. First, let’s load up the auto…

  5. June 14, 2019

    For this post let’s look at simple linear regression in Stata. As discussed previously, simple regression basically involves one independent…

  6. June 13, 2019

    For this tutorial I have imported data consisting of e-commerce orders from an Excel file into Stata and then saved it as a dataset. So we…

  7. June 11, 2019

    In the previous post we looked at correlation in Stata by exploring two commands, correlate and pwcorr. In this brief post we’ll explore…

  8. June 10, 2019

    In this post we’ll briefly be looking at correlation in Stata by exploring two commands, correlate and pwcorr. The difference between these…

  9. June 08, 2019

    The list command is one of the most useful commands in Stata when inspecting data. In this post we’ll look at some ways we can list the…

  10. June 06, 2019

    In another post we looked at analyzing a git repository by the use of external software. This is of course possible to do with Stata as well…

  11. June 05, 2019

    Let’s look at two statistical concepts commonly used to describe a distribution - skewness and kurtosis. Skewness is a measurement of the…

  12. June 04, 2019

    In a recent post we looked at listing percentiles of a variable. When it comes to getting an overview of a variable perhaps the simplest and…

  13. June 03, 2019

    Sometimes you want to display the percentiles of a variable to get an idea of how values are distributed. One way of achieving this is by…

  14. June 01, 2019

    In this post we’ll look at some tips to get the most out of the Stata help. In order to use Stata effectively it’s crucial to get a good…

  15. May 30, 2019

    To identify unique values of variables in Stata one option is to use the levelsof command. From the Stata documentation: levelsof serves two…

  16. May 29, 2019

    In order to get the unique values of a variable (for example how many times an identifier occurs among observations) there are a few…

  17. May 28, 2019

    For this post, let’s look at some different types of correlation measures and how to perform the same in Stata. Pairwise (Pearson…

  18. May 27, 2019

    In this post we’ll look at some Stata tips that might be useful if you’re coming from a SQL background (like myself). This will likely be a…

  19. May 26, 2019

    The net command in Stata is a tool that allows users to install and manage user-written packages from the internet. With net, it’s possible…

  20. May 25, 2019

    Bar charts are another graphing tool that can be used for example when exploring relationships between variables. Let’s look at an example…

  21. May 24, 2019

    Local smoothing is a smoothing or generalization technique to further enhance the visual representation of a relationship between two…

  22. May 23, 2019

    In the last post we looked at linear fit in order to visualize relationships between two variables. For instance we can explore the…

  23. May 22, 2019

    In another post we talked about scatter plots and the possibility of overlaying multiple types of plots on top of each other. This is…

  24. May 21, 2019

    Scatter plots are great visualization tools, especially for smaller datasets. When exploring relationships between two variables, scatter…

  25. May 20, 2019

    Just as with commands such as describe or summarize, drawing pie charts can be useful when familiarizing yourself with a new dataset (or for…

  26. May 19, 2019

    The webuse command is used to load datasets directly from the Stata website. This is convenient as you don’t have to download datasets…

  27. May 18, 2019

    The bysort command in Stata is used as a prefix before other commands, and it allows you to perform those commands within groups of…

  28. May 17, 2019

    Running a one-sample t-test in SPSS is very straightforward. For the purpose of this example, let’s use the demo.sav training dataset. From…

  29. May 16, 2019

    Effectively handling missing data in Stata largely comes down to dealing with how Stata treats missing values to begin with. Stata treats…

  30. May 13, 2019

    In this tutorial we’ll look at performing a chi-square test in Stata. We’ll use the nlsw88 dataset for this, so in order to perform a chi…

  31. May 12, 2019

    In this quick tutorial we’ll look at performing a chi-square test on a single sample. More specifically, we’ll look into performing a…

  32. May 11, 2019

    The notes command is used for attaching notes to a variable. It can be useful when it might be hard to interpret a variable, or when any…

  33. May 10, 2019

    Looping in Stata is mainly achieved by use of the foreach command. Using the nlsw88 dataset we’ll generate a new variable with observations…

  34. May 09, 2019

    The codebook command in Stata is useful for initial data exploration. Like tabulate or inspect it provides a quick overview of a dataset…

  35. May 08, 2019

    Let’s look at some useful tips for handling missing values in Stata, as those can cause a lot of problems if not taken care of properly when…

  36. May 07, 2019

    The four main commands used in Stata when working with strings are tostring, destring, encode and decode. For this example, let’s load the…

  37. May 06, 2019

    Instead of using the split command for string manipulation we can use the word function. Looking at a previous example of transforming the…

  38. May 05, 2019

    Let’s look at another example of recoding in Stata, this time using the split command. For this example we’ll use the variable make in the…

  39. May 04, 2019

    All the time when using the Stata command editor or console (or whatever you wanna call it) I find that I would want to be able to use some…

  40. May 02, 2019

    Performing a kernel density estimation in Stata is a simple task. In particular it can be visualized by way of a kernel density plot which…

  41. May 01, 2019

    Let’s say you want to make some temporary in-memory manipulations of your data without having to reload the whole dataset in order to get…

  42. April 30, 2019

    Dropping observations in Stata is easy. Looking at the nlsw88 training dataset for instance, let’s say we’d want to narrow the survey to…

  43. April 29, 2019

    Creating binary variables in Stata is useful for many purposes, for example if you quickly need to get an overview of a variable with a…

  44. April 28, 2019

    Sometimes you need to split a variable into groups. There are several ways to achieve this in Stata; in this post we’ll use the egen command…

  45. April 26, 2019

    As an alternative to dropping variables, it’s possible to choose which ones to keep in a dataset. You can use the Variables Manager in Stata…

  46. April 24, 2019

    In this post we’ll continue looking at factor variables, and in particular how they can interact with continuous variables. Interactions can…

  47. April 23, 2019

    If you have two (or more) datasets that you want to combine, this can be achieved by using either the append or merge commands. The main use…

  48. April 22, 2019

    Dropping values in Stata is achieved by the drop command. Let’s load the auto training dataset and create a new variable: That’s just a new…

  49. April 21, 2019

    In Stata, categorical, binary, indicator, or dummy variables (terms often used interchangeably) are treated as factor variables. Examples of categorical…

  50. April 20, 2019

    In a previous post we looked at performing basic graphic visualizations in Stata. It is also possible to save graphics in a number of…

  51. April 19, 2019

    Generating new variables in Stata is mainly achieved by use of the gen command. The simplest way to use it is by appending the newvar…

  52. April 19, 2019

    Stata offers basic graphic visualization, which can be useful to get a grasp of the distribution of a variable. As with everything in Stata, it’s…

  53. April 18, 2019

    I recently re-visited one of my all-time favorite books on software engineering, The Mythical Man-Month by Fred Brooks. The main reason for…

  54. April 17, 2019

    When working with a continuous variable such as income, you often need an overview of its distributional properties. Let’s look at the…

  55. April 16, 2019

    For the purpose of this tutorial, I’m gonna be using the sample data set demo.sav, available under installdir/IBM/SPSS/Statistics/[version…

  56. April 16, 2019

    Let’s look at some examples of recoding variables in Stata. First off we’ll recode a numeric variable into categories using the auto dataset…

  57. April 15, 2019

    Recoding variables in Stata is very simple. It is useful in many scenarios, for instance where you would want to merge a variable containing…

  58. April 14, 2019

    In a previous post we looked briefly at the summarize command, let’s look a bit deeper. When opening a new dataset it’s a great way to get…

  59. April 13, 2019

    Some of the most basic utility commands to summarize, describe and get an overview of your data are When opening a new dataset it’s crucial…

  60. April 13, 2019

    In this short post we’ll look at importing data from an Excel file into Stata. For this example we’ll use some training data from another…

  61. April 12, 2019

    When working with long commands sometimes you want to split them into several lines in your do-files, for brevity or readability (or…

  62. April 12, 2019

    Using do-files (a text file with the .do extension) is the preferred way of working in Stata. They are simply text files containing…

  63. April 11, 2019

    In a previous post we looked at working with log files in Stata. As discussed in that post, Stata log files take the file extension .smcl. If…

  64. April 11, 2019

    Log files in Stata can be useful if you want to keep track of your work over time, simply because it enables you to save your session for…

  65. April 10, 2019

    In this post we’ll have a look at the table command which can be used to visualise summary statistics by tabulating variables against one…

  66. April 09, 2019

    Tabulating and summarizing are common activities when exploring a new dataset. Tabulating can be useful when laying out categorical…

  67. April 08, 2019

    The Pragmatic Programmer: From Journeyman to Master is a classic software development book written by Andrew Hunt and David Thomas. First…

  68. April 05, 2019

    One neat feature of Stata is that you can load and work with online datasets. For example on the Stata website there are numerous links to…

  69. April 03, 2019

    In order to browse or edit data cells in Stata, the browse and edit commands are the most commonly used. The difference between those two…

  70. March 30, 2019

    Whenever you have modified a dataset in Stata that you want to save you need to do so explicitly, otherwise it will just be cleared from…

  71. March 29, 2019

    Stata is a powerful statistics package that has been around since the mid-’80s or so. And it sure looks like it! That was my first…

2024 © Johan Osterberg