Analyzing git logs in Stata

June 06, 2019

In another post we looked at analyzing a git repository with external software. The same is possible in Stata, with the added benefit that you can apply Stata's statistical tools directly to the data. Let’s look at a simple example of how this could be done. Start by extracting the git log to a CSV file for further processing:

git log --all --pretty=format:'%H,"%s",%an,%ae,%ad' > git_log.csv

Here we extract and format the following commit information: commit hash (%H), subject (%s), author name (%an), author email (%ae) and author date (%ad). The subject is quoted because commit messages often contain commas, which would otherwise split the field. Your needs may vary; see the git log documentation for the full list of format placeholders.

Next, you can optionally clean or pre-process the CSV file in your programming language of choice. For instance, you might want to add column headers or reformat the subject or author date.
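If you would rather skip the external step, the same cleanup can be done in Stata itself after importing. The following is a minimal sketch, assuming git_log.csv has no header row, the columns are in the order produced by the git log command above, and the author date uses git's default format (for example `Thu Jun 6 10:15:32 2019 +0200`). The variable names are hypothetical choices.

```stata
* Import without a header row; Stata names the columns v1-v5
import delimited git_log.csv, clear varnames(nonames)

* Give the columns descriptive (hypothetical) names
rename (v1 v2 v3 v4 v5) (hash subject author_name author_email author_date)

* In git's default date, words 2, 3 and 5 are the month, day and year
generate date_text = word(author_date, 2) + " " + word(author_date, 3) + " " + word(author_date, 5)

* Convert the text to a Stata daily date and display it as a date
generate commit_date = date(date_text, "MDY")
format commit_date %td
drop date_text

* Show how many dates could not be parsed (should be 0)
count if missing(commit_date)
```

If you export with git's `--date=short` option instead, the dates come out as YYYY-MM-DD and `date(author_date, "YMD")` is enough.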

Whether or not you pre-process the CSV, you should be able to import it directly into Stata and start working with your git data:

import delimited git_log.csv, clear

If your CSV contains column headers and Stata does not detect them automatically, you can add the varnames(1) option:

import delimited git_log.csv, varnames(1) clear

By telling Stata explicitly to take the variable names from the first row of the CSV, you won't have to name the variables manually. Either way, your git data is now imported into Stata and you can start running statistical analyses on it.
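As a starting point, a few one-way tables already answer common questions about a repository. This is a minimal sketch, assuming git_log.csv has no header row, the column order from the git log command above, and git's default author date format; the variable names are hypothetical.

```stata
* Import without a header row and name the columns (hypothetical names)
import delimited git_log.csv, clear varnames(nonames)
rename (v1 v2 v3 v4 v5) (hash subject author_name author_email author_date)

* Number of commits per author, most active authors first
tabulate author_name, sort

* In git's default date, the first word is the weekday and the fifth the year
generate weekday = word(author_date, 1)
generate year = real(word(author_date, 5))

* Commits per weekday and per year
tabulate weekday
tabulate year
```

The tables are only a quick overview; the same variables can be fed into `collapse`, graphs or regressions like any other Stata data.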



Written by Johan Osterberg, who lives and works in Gothenburg, Sweden, as a developer specializing in e-commerce. Connect with me on LinkedIn.

2024 © Johan Osterberg