We Created an Awesome Search Tool for GitHub Repos in Python
Recently, the folks at Cloudify created Surch, a tool that lets developers search GitHub for any string. Read on to learn what prompted its creation, what Surch can currently do, and which features are on the roadmap.
Recently, we got a notification from Amazon that one of our secret keys, which had accidentally been left in a public GitHub repo, was being abused. Since it wasn't our own team doing the abusing, we decided to make sure this couldn't happen again. The idea was to create a mechanism that searches GitHub for any string matching one of our secret keys. Naturally, you can use it to search for any string at all.
We looked at existing tools that already do this and found gitrob, a Ruby-based tool that doesn't offer many integrations and is fairly heavy on code. It also has little documentation and no real community to turn to for help.
So, we decided to build our own tool, and it's called Surch!
What Is Surch?
Surch is an open source Python package that can be installed with:
pip install Surch
Then simply run your Surch search in the terminal. If it finds a commit containing a string that matches one of your predetermined strings, the log on screen will show it. You can then manually delete the problematic commit and ask GitHub to remove it, giving them the commit's SHA.
NOTE: Whether to rotate the exposed credentials is up to you, but we highly recommend it.
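To make the core idea concrete, here is a minimal sketch of scanning a repository's history for known secrets. It is not Surch's code: it assumes a local clone, shells out to git log -p, and only checks lines that each commit added, so the commit that introduced a secret is the one reported. The names find_secrets_in_log and scan_local_clone are hypothetical.

```python
import re
import subprocess

# Matches the "commit <sha>" header git prints before each commit (SHA-1 or SHA-256).
COMMIT_HEADER = re.compile(r"^commit ([0-9a-f]{40,64})")


def find_secrets_in_log(log_text, secrets):
    """Return (commit_sha, secret) pairs for each secret found on a line
    that a commit added, given the text output of `git log -p`."""
    hits = []
    seen = set()
    current_sha = None
    for line in log_text.splitlines():
        header = COMMIT_HEADER.match(line)
        if header:
            current_sha = header.group(1)
            continue
        # Added lines start with "+"; "+++ b/path" is a file header, not content.
        if current_sha and line.startswith("+") and not line.startswith("+++"):
            for secret in secrets:
                if secret in line and (current_sha, secret) not in seen:
                    seen.add((current_sha, secret))
                    hits.append((current_sha, secret))
    return hits


def scan_local_clone(repo_path, secrets):
    """Scan the full history (all refs) of a local clone for the given secrets."""
    log_text = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_secrets_in_log(log_text, secrets)
```

Calling scan_local_clone("path/to/clone", ["my-secret-key"]) returns the SHA of every commit that added one of the listed strings, which is the identifier you would pass along when asking for its removal.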
When we started talking about how to build this tool, our original thought process went something like this:
- We want one script that simply runs the search across a GitHub organization, covering every repo in it.
- It needs to be able to run on public repos (private repos require authentication and are more complicated to handle).
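The first step of that original plan, finding every public repo in an organization, could start from the GitHub REST API's "list organization repositories" endpoint. This is a minimal sketch rather than Surch's implementation: fetch_json and list_public_repos are hypothetical names, the requests are unauthenticated (and therefore subject to GitHub's low anonymous rate limit), and the fetcher is injectable so the pagination logic can be exercised without network access.

```python
import json
import urllib.request

# GitHub REST API endpoint for listing an organization's repositories.
REPOS_URL = "https://api.github.com/orgs/{org}/repos?type=public&per_page={per_page}&page={page}"


def fetch_json(url):
    """GET a GitHub API URL (unauthenticated) and decode the JSON response."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_public_repos(org, fetch=fetch_json, per_page=100):
    """Return the clone URL of every public repo in a GitHub organization,
    requesting page after page until a short or empty page comes back."""
    clone_urls = []
    page = 1
    while True:
        batch = fetch(REPOS_URL.format(org=org, per_page=per_page, page=page))
        clone_urls.extend(repo["clone_url"] for repo in batch)
        if len(batch) < per_page:
            return clone_urls
        page += 1
```

Each returned URL could then be cloned and its history scanned; a scheduled job (such as the daily Jenkins run described next) would simply repeat the whole loop.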
For our purposes, we added a Jenkins job to automatically run Surch once per day.
By the time our first release came around, it was actually a lot more sophisticated, with the following additional features:
- You can choose to search only a specific repo by adding it to the search string
- You can choose to search only a specific user's repos by adding the user to the search string
- Options to include or exclude repos based on your search parameters
- Integrations with PagerDuty for instant notifications
- Vault integration: HashiCorp Vault provides secure storage for secrets and passwords, and we integrated it so that Surch is updated automatically when a secret key changes
- Integration with Slack
- Integration with AWS Lambda: a push to GitHub notifies Lambda that a commit was made, and Surch automatically searches that specific commit only
- The ability to run searches on private repos
- Automated handling and fixing of problematic commits
Getting started with Surch:
- Install it with pip install Surch
- Run Surch repo REPO_URL -FLAGS to search a specific repo
- Run Surch org ORG_NAME -FLAGS to search all repos in that organization
- Add --include-repo REPO_NAME to search only that repo within an organization; repeat the flag to search several repos
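The repeatable --include-repo flag, together with the include/exclude options in the feature list, can be sketched with argparse's append action, which collects one value per occurrence of a flag. This is illustrative only: the surch-sketch program name, the --exclude-repo flag, and select_repos are hypothetical, not Surch's actual interface.

```python
import argparse


def build_parser():
    """Hypothetical CLI mirroring the documented org command and its repo filters."""
    parser = argparse.ArgumentParser(prog="surch-sketch")
    parser.add_argument("org_name")
    # action="append" builds a list with one entry per occurrence of the flag.
    parser.add_argument("--include-repo", action="append", default=[])
    parser.add_argument("--exclude-repo", action="append", default=[])
    return parser


def select_repos(repo_names, include, exclude):
    """Keep repos named in `include` (every repo if it is empty), minus `exclude`."""
    return [
        name for name in repo_names
        if (not include or name in include) and name not in exclude
    ]


args = build_parser().parse_args(["acme", "--include-repo", "api", "--include-repo", "web"])
print(select_repos(["api", "web", "docs"], args.include_repo, args.exclude_repo))
```

Passing --include-repo twice yields the list ['api', 'web'], so only those two repos are kept; with no include list, every repo is considered unless it is explicitly excluded.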
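The Slack and PagerDuty integrations from the feature list boil down to posting a JSON body to each service when a match is found. A sketch under stated assumptions: Slack is reached through an incoming webhook URL and PagerDuty through its Events API v2 with an integration routing key; the message wording, the dedup_key scheme, and all helper names are hypothetical, not taken from Surch.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def alert_text(repo, sha):
    """Human-readable summary of a leaked-secret finding."""
    return f"Secret found in {repo} at commit {sha}"


def slack_payload(repo, sha):
    """Body for a Slack incoming webhook: a plain text message."""
    return {"text": alert_text(repo, sha)}


def pagerduty_payload(routing_key, repo, sha):
    """Trigger event for the PagerDuty Events API v2."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": f"{repo}:{sha}",  # repeated alerts for one commit share an incident
        "payload": {
            "summary": alert_text(repo, sha),
            "source": repo,
            "severity": "critical",
        },
    }


def post_json(url, body):
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Sending both alerts would then be post_json(slack_webhook_url, slack_payload(repo, sha)) followed by post_json(PAGERDUTY_EVENTS_URL, pagerduty_payload(routing_key, repo, sha)), where slack_webhook_url and routing_key stand in for your own configuration.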
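The AWS Lambda integration can be pictured as a handler that receives GitHub's push webhook and extracts only the pushed commits, so that just those commits get scanned. This is a hypothetical sketch: it assumes the webhook reaches Lambda through API Gateway or a function URL, which pass the request body as a string and set isBase64Encoded when it has been base64-encoded. It also omits verifying the X-Hub-Signature-256 header, which a real deployment should do.

```python
import base64
import json


def commits_from_push_event(event):
    """Extract (repo_full_name, commit_sha) pairs from a GitHub push
    webhook delivered to Lambda through an HTTP front end."""
    body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    push = json.loads(body)
    repo = push["repository"]["full_name"]
    return [(repo, commit["id"]) for commit in push.get("commits", [])]


def handler(event, context):
    """Hypothetical Lambda entry point that collects only the pushed commits."""
    targets = commits_from_push_event(event)
    # A real deployment would fetch and scan each (repo, sha) pair here.
    return {"statusCode": 200, "body": json.dumps({"queued": len(targets)})}
```

Scanning per push keeps each run small, in contrast to the daily full-organization sweep.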
For more info on this project, see the docs. All contributions are appreciated!
Published at DZone with permission of Haviv Weizman, DZone MVB. See the original article here.