Open source tool called Gitrob trawls GitHub repositories for sensitive data

Security researcher Michael Henriksen, a member of the SoundCloud security team, has developed an open source command-line tool that crawls GitHub repositories and reports any sensitive information it finds.

SoundCloud tasked Henriksen with creating a system that would constantly check the company’s GitHub organizations (and the repositories they contain) for unintentionally leaked sensitive information. Henriksen did just that.

The open source command-line tool he developed can also be used for occasional checks of the same nature, both by companies’ security personnel and by professional penetration testers looking for an easy way into a target organization’s network.

Developers generally like to share their code, and many of them do so by open sourcing it on GitHub, a social code hosting and collaboration service. Many companies also use GitHub as a convenient place to host both private and public code repositories, creating GitHub organizations that their employees can join.

Sometimes employees publish things that should not be publicly available, such as credentials, private keys and secret tokens. This can happen by accident or because the employee does not realize how sensitive the information is. Cyber criminals can harvest such data and use it to directly compromise the systems of the company that owns the repository.

Henriksen’s tool Gitrob makes it easy to search all the public repositories of a company’s GitHub organization, as well as all the public repositories of the organization’s members (the company’s employees).

How it works

Gitrob starts by collecting all public repositories of the organization itself. It then collects all the organization's members and their public repositories, in order to compile a list of repositories that might be related to the organization.
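The collection step can be sketched in a few lines of Python. This is only an illustration of the idea, not Gitrob's own code: the function names and the injectable `fetch` parameter are assumptions made for the example. It uses the public GitHub REST API endpoints for organization repositories, organization members and user repositories, reads only the first page of each result, and makes unauthenticated requests, so it sees public data only and is subject to low rate limits.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_json(path):
    """Fetch one page of JSON from the GitHub REST API (unauthenticated)."""
    request = urllib.request.Request(
        API_ROOT + path,
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_repositories(org, fetch=fetch_json):
    """Return the sorted full names of public repositories that belong to
    the organization or to any of its publicly listed members.

    `fetch` is a hypothetical hook so the logic can be exercised offline.
    Pagination is deliberately ignored to keep the sketch short.
    """
    # Step 1: the organization's own public repositories.
    names = {repo["full_name"] for repo in fetch(f"/orgs/{org}/repos")}

    # Step 2: each member's public repositories.
    for member in fetch(f"/orgs/{org}/members"):
        login = member["login"]
        names.update(repo["full_name"] for repo in fetch(f"/users/{login}/repos"))

    return sorted(names)
```

Calling `collect_repositories("some-org")` would query the live API, while passing a stub as `fetch` lets the same logic run against canned data.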

When the list of repositories has been compiled, Gitrob gathers all the filenames in each repository and runs them through a series of observers that flag any file matching a pattern of known sensitive files. This step might take a while if the organization is big or if the members have a lot of public repositories.
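The filename-matching idea can be illustrated with a minimal sketch. The observer list below is hypothetical and far smaller than what a real tool would ship with. The descriptions, the patterns and the `flag_file` helper are assumptions chosen for the example rather than Gitrob's actual rules.

```python
import re
from pathlib import PurePosixPath

# Hypothetical observers: (description, part of the path to test, pattern).
OBSERVERS = [
    ("Private SSH key", "filename", re.compile(r"^id_(rsa|dsa|ed25519)$")),
    ("Potential cryptographic key bundle", "extension", re.compile(r"^\.(pem|p12|pfx)$")),
    ("Apache htpasswd file", "filename", re.compile(r"^\.htpasswd$")),
    ("Shell command history", "filename", re.compile(r"^\.(bash|zsh)_history$")),
]


def flag_file(path):
    """Return the descriptions of every observer that matches `path`."""
    parsed = PurePosixPath(path)
    # Each observer looks at either the bare filename or its extension.
    parts = {"filename": parsed.name, "extension": parsed.suffix}
    return [
        description
        for description, part, pattern in OBSERVERS
        if pattern.search(parts[part])
    ]


if __name__ == "__main__":
    for candidate in ["deploy/id_rsa", "certs/server.pem", "README.md"]:
        print(candidate, flag_file(candidate))
```

Only names are inspected here, not file contents, which is why a scan of this kind stays cheap per file even though the total number of files can be large.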

All of the members, repositories and files will be saved to a PostgreSQL database. When everything has been sifted through, it will start a Sinatra web server locally on the machine, which will serve a simple web application to present the collected data for analysis.
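To make the storage step concrete, here is a rough sketch of how members, repositories and files might be laid out relationally. It is an assumption-laden illustration: the table and column names are invented for the example, and Python's built-in SQLite module stands in for PostgreSQL so that the snippet runs without a database server.

```python
import sqlite3

# Hypothetical schema; SQLite is used here as a stand-in for PostgreSQL.
SCHEMA = """
CREATE TABLE members (
    id INTEGER PRIMARY KEY,
    login TEXT UNIQUE NOT NULL
);
CREATE TABLE repositories (
    id INTEGER PRIMARY KEY,
    full_name TEXT UNIQUE NOT NULL,
    owner_id INTEGER REFERENCES members(id)  -- NULL for organization repos
);
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    repository_id INTEGER NOT NULL REFERENCES repositories(id),
    path TEXT NOT NULL,
    finding TEXT  -- NULL when no observer flagged the file
);
"""


def open_store():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def flagged_files(conn):
    """Return (repository, path, finding) rows for every flagged file."""
    return conn.execute(
        "SELECT r.full_name, f.path, f.finding FROM files f "
        "JOIN repositories r ON r.id = f.repository_id "
        "WHERE f.finding IS NOT NULL ORDER BY r.full_name, f.path"
    ).fetchall()
```

A query like `flagged_files` is the kind of view a local web front end could render for an analyst reviewing the results.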

Henriksen tested the tool against a number of GitHub organizations belonging to big and small firms and found surprising results. “The tool found several interesting things ranging from low-level, to bad and all the way to company-destroying kind of information disclosure,” he noted, adding that he notified the affected companies so that they could remove the exposed information.

Gitrob can be downloaded from here along with information about how to install and use it.