After extensive testing, GitHub has made code scanning generally available. Anyone can now run a vulnerability scanner on their own repository and catch vulnerabilities before they reach production. The scanner supports repositories in C, C++, C#, JavaScript, TypeScript, Python and Go.

The scanner is built on CodeQL, a code analysis engine developed by Semmle, which GitHub acquired in 2019. CodeQL treats code as data: it builds a database from the source and finds vulnerable patterns by running queries against it. Beta testing on GitHub began in May 2020, and the feature is now available to everyone.

How to turn it on

Code scanning is set up from the Security tab of the repository.

There, click Set up code scanning.

In the next window, select the workflow to use for scanning. Several are offered because code scanning also supports third-party analysis engines. For the standard engine, select “CodeQL Analysis”.

The workflow can be customized: you can enable scheduled scans, scan on every push or pull request, use your own configuration file, or run additional queries during the scan.
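For reference, the workflow the wizard commits is a YAML file under .github/workflows/. The Python sketch below writes a minimal version of such a file that covers the options listed above: push and pull request triggers, a weekly schedule, an optional configuration file and an extra query suite. It assumes the github/codeql-action init, autobuild and analyze actions at major version v1, which was current when code scanning launched (later major versions have since replaced it). The helper name render_codeql_workflow, the cron expression and the suite name in the usage line are illustrative choices, not the exact file GitHub generates.

```python
from typing import List, Optional


def render_codeql_workflow(
    languages: List[str],
    cron: str = "0 3 * * 1",          # illustrative: Mondays at 03:00 UTC
    config_file: Optional[str] = None,
    queries: Optional[str] = None,
) -> str:
    """Return the YAML text of a minimal CodeQL workflow (hypothetical helper)."""
    # Inputs for the init step; "languages" takes a comma-separated list.
    init_inputs = [f"          languages: {', '.join(languages)}"]
    if config_file:
        init_inputs.append(f"          config-file: {config_file}")
    if queries:  # e.g. an extra query suite such as "security-extended"
        init_inputs.append(f"          queries: {queries}")
    init_with = "\n".join(init_inputs)
    return f"""name: "CodeQL"
on:
  push:
    branches: [master]
  pull_request:
    branches: [master]
  schedule:
    - cron: '{cron}'
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: github/codeql-action/init@v1
        with:
{init_with}
      - uses: github/codeql-action/autobuild@v1
      - uses: github/codeql-action/analyze@v1
"""


if __name__ == "__main__":
    print(render_codeql_workflow(["python", "javascript"], queries="security-extended"))
```

Saving the printed text as .github/workflows/codeql-analysis.yml gives a workflow similar in spirit to the wizard's; the trigger branches and the cron expression are what control how often scans run.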

Then click Start commit and enter a message for the new commit.

Choose whether to commit directly to the master branch or to create a new branch and open a pull request.

To finish, click Commit new file or Propose new file.

Once the workflow file is committed, the scanner analyzes your code according to the schedule and triggers defined in the workflow.

After activating CodeQL, you can view the results and change the scan parameters.

CodeQL engine

The CodeQL engine looks for potential vulnerabilities using a library of more than 2,000 queries compiled by GitHub and the community of users who tested the system. The library will be updated continually, and anyone can extend the set of queries run against their own repository simply by editing the configuration file.

The scanning tool is built on SARIF (Static Analysis Results Interchange Format), an OASIS standard for static analysis output, and supports third-party engines that work within the same interface. Results can also be exported through unified APIs.
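Because results are exchanged as SARIF, any tool that emits the format can feed the same interface, and the output is easy to process yourself. The sketch below reads a SARIF 2.1.0 log with Python's standard json module and counts results per tool, rule and severity level. It relies only on the standard SARIF fields runs, tool.driver.name, results, ruleId and level; the sample log and the function name summarize_sarif are made up for illustration.

```python
import json
from collections import Counter
from typing import Dict, Tuple


def summarize_sarif(sarif_text: str) -> Dict[Tuple[str, str, str], int]:
    """Count SARIF results by (tool name, rule id, level)."""
    log = json.loads(sarif_text)
    counts: Counter = Counter()
    for run in log.get("runs", []):
        tool = run["tool"]["driver"]["name"]
        for result in run.get("results", []):
            rule = result.get("ruleId", "<no rule>")
            # A missing level means "warning" unless the rule's default
            # configuration says otherwise; that lookup is skipped here.
            level = result.get("level", "warning")
            counts[(tool, rule, level)] += 1
    return dict(counts)


# Hypothetical sample log, trimmed to the fields used above.
SAMPLE = """{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "CodeQL"}},
    "results": [
      {"ruleId": "py/sql-injection", "level": "error",
       "message": {"text": "Query built from user-controlled sources."}},
      {"ruleId": "py/sql-injection", "level": "error",
       "message": {"text": "Query built from user-controlled sources."}},
      {"ruleId": "py/unused-import",
       "message": {"text": "Import of 'os' is not used."}}
    ]
  }]
}"""

if __name__ == "__main__":
    for (tool, rule, level), n in sorted(summarize_sarif(SAMPLE).items()):
        print(f"{tool:8} {rule:20} {level:8} {n}")
```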

Since the beta launched in May 2020, more than 12,000 repositories have been scanned (1.4 million scans in total) and more than 20,000 security issues have been identified, including remote code execution (RCE), SQL injection and cross-site scripting (XSS) vulnerabilities.
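To make one of these vulnerability classes concrete, here is the kind of pattern a SQL injection query looks for, sketched with Python's built-in sqlite3 module: user input concatenated into SQL text lets that input rewrite the query, while a parameterized query passes it purely as data. The table and function names are hypothetical, and the sketch illustrates the bug class, not how CodeQL detects it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with two users (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # VULNERABLE: user input is spliced into the SQL text, so a value
    # such as "x' OR '1'='1" changes the meaning of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # SAFE: the "?" placeholder sends the value separately from the SQL,
    # so it is only ever compared as a string.
    return conn.execute("SELECT name, secret FROM users WHERE name = ?",
                        (name,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(db, payload))  # leaks every row
    print(find_user_safe(db, payload))    # []
```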

Developers and maintainers fixed 72% of the reported vulnerabilities within 30 days of discovery, before the code was merged into the main branch. That is a good result: according to industry statistics, fewer than 30% of discovered vulnerabilities are fixed within a month of detection.

During the beta, the community made 132 commits to the open source query library. To let GitHub users run third-party tools, GitHub has reached agreements with more than a dozen vendors of security products and developers of open source tools for static analysis, container scanning and infrastructure-as-code validation. Infrastructure as code is the practice of describing and managing infrastructure through configuration files rather than by manually editing configurations on servers or working with them interactively.
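Third-party analyzers plug in by producing SARIF and uploading it to GitHub. The REST API endpoint for this, POST /repos/{owner}/{repo}/code-scanning/sarifs, expects the commit SHA, the Git ref and the SARIF log gzip-compressed and then Base64-encoded. The sketch below only builds that request body with the standard library; it does not send it, and authentication and the HTTP call are left out. The helper names and sample values are hypothetical.

```python
import base64
import gzip
import json
from typing import Dict


def build_sarif_upload_body(sarif_log: dict, commit_sha: str, ref: str) -> Dict[str, str]:
    """Build the JSON body for POST /repos/{owner}/{repo}/code-scanning/sarifs.

    The API expects the SARIF file gzip-compressed and then Base64-encoded.
    """
    raw = json.dumps(sarif_log).encode("utf-8")
    encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"commit_sha": commit_sha, "ref": ref, "sarif": encoded}


def decode_sarif(encoded: str) -> dict:
    """Inverse of the encoding above, handy for checking a payload locally."""
    return json.loads(gzip.decompress(base64.b64decode(encoded)))


if __name__ == "__main__":
    log = {"version": "2.1.0",
           "runs": [{"tool": {"driver": {"name": "example-linter"}}, "results": []}]}
    body = build_sarif_upload_body(log, "0" * 40, "refs/heads/master")
    print(json.dumps(body)[:80], "...")
```

Keeping the decoder next to the encoder makes it easy to confirm that a payload round-trips before it is sent.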

In addition to vulnerability scanning, GitHub has partnered with 24 third-party service providers to detect secrets they issue, such as access keys, that must never be published in plain text in code. Partners include AWS, Google Cloud, Azure, Dropbox, Slack, Discord, npm, Stripe and Twilio. Secret scanning runs automatically in both public and private repositories.
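At its core, secret scanning is pattern matching against token formats defined by each provider. The sketch below shows the idea with a single, simplified pattern for AWS access key IDs, which start with AKIA followed by 16 uppercase letters or digits. GitHub's actual partner patterns are not reproduced here, the function name is hypothetical, and a regex match is only a candidate that still needs verification.

```python
import re
from typing import List, Tuple

# Simplified pattern for AWS access key IDs (illustrative only).
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_candidate_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, match) pairs for strings that look like AWS key IDs."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in AWS_KEY_ID.finditer(line):
            hits.append((lineno, m.group(1)))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    source = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    print(find_candidate_secrets(source))  # [(2, 'AKIAIOSFODNN7EXAMPLE')]
```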

Code scanning is free for public repositories and is included in the Advanced Security package for GitHub Enterprise, which is a paid service. Some more specialized options (IP allow lists, SAML and LDAP support, etc.) are available only in the paid version.

It should be noted, however, that some open source authors complain that scanning produces too many false positives.

In theory, automatically checking every repository is a good thing. In practice, being constantly interrupted by reports of false “vulnerabilities” is unpleasant, especially in development repositories or old archives that will never reach production, and it quickly becomes tiresome. Some authors say that most of the vulnerabilities reported in their code are noise or do not apply in their particular case.
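One way to reduce the noise is to keep scans away from code that will never ship. The CodeQL configuration file has a paths-ignore setting for this; the sketch below applies the same idea after the fact, dropping SARIF results whose file path matches a list of glob patterns. The helper names and patterns are hypothetical, and matching uses Python's fnmatch, whose * also crosses directory separators, so it does not follow GitHub's own glob rules exactly.

```python
import fnmatch
from typing import Iterable, List


def result_path(result: dict) -> str:
    """File path of a SARIF result's first location ('' if none)."""
    locations = result.get("locations") or [{}]
    return (locations[0].get("physicalLocation", {})
            .get("artifactLocation", {}).get("uri", ""))


def drop_ignored(results: Iterable[dict], ignore: List[str]) -> List[dict]:
    """Keep only results whose path matches none of the glob patterns.

    fnmatchcase's "*" also matches "/", so "archive/*" covers nested files too.
    """
    return [r for r in results
            if not any(fnmatch.fnmatchcase(result_path(r), pat) for pat in ignore)]


if __name__ == "__main__":
    def loc(uri: str) -> dict:  # build a minimal SARIF result for the demo
        return {"ruleId": "py/example", "locations": [
            {"physicalLocation": {"artifactLocation": {"uri": uri}}}]}

    results = [loc("src/app.py"), loc("archive/2015/old.py"), loc("dev/scratch.py")]
    kept = drop_ignored(results, ["archive/*", "dev/*"])
    print([result_path(r) for r in kept])  # ['src/app.py']
```

Filtering in the configuration file is usually preferable because the ignored code is never analyzed at all; post-processing like this only hides results that were already produced.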

In other words, GitHub's scanner can bring on all the symptoms of a condition known as security fatigue.