GitHub Vulnerability: Deleted Repository Data Remains Accessible

Study Finds: Deleted GitHub Repository Data Remains Accessible Forever

Researchers from Truffle Security have discovered a vulnerability in GitHub that leaves data from deleted forks, deleted repositories, and private forks accessible indefinitely. The flaw could allow attackers to retrieve sensitive information, such as API keys and other secrets, even after users believe they have deleted it. GitHub is aware of the issue, and it is a fundamental part of the platform’s architecture rather than an isolated bug.

What Is the GitHub Cross Fork Object Reference (CFOR) Vulnerability?

Truffle Security has named the vulnerability GitHub Cross Fork Object Reference (CFOR). The vulnerability occurs when “one fork of a repository can access sensitive data from another fork, including data from private and deleted forks.”

How the Vulnerability Works

The researchers illustrate the vulnerability with a typical GitHub workflow. A user forks a public repository, commits changes to the fork, and then deletes the fork. One would expect the fork’s data to become inaccessible, but in practice those commits remain available indefinitely through the original repository, and the user loses control over that information.
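The workflow above can be sketched with a toy model. Everything in it is an assumption for illustration (the class and method names are hypothetical, not GitHub’s implementation): all repositories in a fork network share one object store, and deleting a repository removes only its references, not the commits it pushed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RepoNetwork:
    """Hypothetical toy model of a fork network that shares one object store."""

    objects: dict[str, str] = field(default_factory=dict)    # commit hash -> content
    refs: dict[str, set[str]] = field(default_factory=dict)  # repo name -> commits it points to

    def push(self, repo: str, sha: str, content: str) -> None:
        # A push from any member of the network lands in the shared store.
        self.objects[sha] = content
        self.refs.setdefault(repo, set()).add(sha)

    def fork(self, source: str, new_repo: str) -> None:
        # The fork starts with the source's references; objects are not copied.
        self.refs[new_repo] = set(self.refs.get(source, set()))

    def delete_repo(self, repo: str) -> None:
        # Deleting a repository drops its references only; objects stay in the store.
        self.refs.pop(repo, None)

    def lookup(self, via_repo: str, sha: str) -> str | None:
        # Any surviving repository can serve any object in the store by its hash.
        if via_repo not in self.refs:
            return None
        return self.objects.get(sha)


net = RepoNetwork()
net.push("upstream", "a1b2c3d", "initial commit")
net.fork("upstream", "user-fork")
net.push("user-fork", "deadbee", "API_KEY = 'example-secret'")
net.delete_repo("user-fork")
print(net.lookup("upstream", "deadbee"))  # API_KEY = 'example-secret'
```

In this model the deleted fork can no longer serve anything, but the upstream repository still returns the fork’s commit when asked for it by hash.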

The study found that data from deleted forks is easy to find. In several popular repositories belonging to a major artificial intelligence company, the researchers discovered dozens of valid API keys that had been hardcoded in example files and remained accessible even after the forks containing them were deleted.
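Keys like these are typically found by scanning text for credential-like assignments and discarding values that are not random enough to be real secrets. The sketch below is a hypothetical, simplified detector, not the logic of any particular scanning tool; the regular expression and the 3.5-bit entropy threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical pattern: a name containing "api_key", "secret", or "token",
# assigned a quoted value of 20 or more token characters.
CANDIDATE = re.compile(
    r"""(?i)\b(\w*(?:api[_-]?key|secret|token)\w*)\s*[:=]\s*["']([A-Za-z0-9_\-]{20,})["']"""
)


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def find_secrets(text: str, min_entropy: float = 3.5) -> list[tuple[str, str]]:
    """Return (name, value) pairs that look like hardcoded, random-looking secrets."""
    hits = []
    for match in CANDIDATE.finditer(text):
        name, value = match.group(1), match.group(2)
        # Low-entropy values such as placeholders are skipped.
        if shannon_entropy(value) >= min_entropy:
            hits.append((name, value))
    return hits


sample = '''
EXAMPLE_API_KEY = "Zx8Qv2Lm9Tr4Wp7Ks1Nd6Hb3"
placeholder_token = "aaaaaaaaaaaaaaaaaaaaaaaa"
'''
print(find_secrets(sample))  # [('EXAMPLE_API_KEY', 'Zx8Qv2Lm9Tr4Wp7Ks1Nd6Hb3')]
```

The entropy filter is what separates a random-looking key from a placeholder of repeated characters; real detectors usually add provider-specific patterns and may verify candidates against the issuing service.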

Beyond Deleted Forks: Other Risks

The problem goes beyond data from deleted forks. If a public repository has been forked and its owner later deletes it, commits pushed to the original after the fork was created remain accessible through that fork, even if the fork never synced them. In other words, all commits from the “parent” repository continue to exist and are accessible through any remaining fork.
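The gap between a commit being reachable from a branch and a commit still existing can be sketched as a graph walk. In this hypothetical toy model (the function names and data are illustrative assumptions, and history is kept linear for brevity), commits the parent made after the fork are unreachable from the fork’s history, yet they stay in the network’s shared store, which is why, according to the research, they can still be fetched by hash through the fork once the parent is deleted.

```python
from __future__ import annotations


def reachable(tip: str | None, parents: dict[str, str | None]) -> set[str]:
    """Commits reachable by following parent links back from a branch tip."""
    seen: set[str] = set()
    while tip is not None and tip not in seen:
        seen.add(tip)
        tip = parents[tip]
    return seen


def hidden_but_stored(parents: dict[str, str | None], surviving_tips: list[str]) -> set[str]:
    """Commits still in the shared store that no surviving branch can reach."""
    visible: set[str] = set()
    for tip in surviving_tips:
        visible |= reachable(tip, parents)
    return set(parents) - visible


# Shared store for the whole network: commit hash -> parent hash (None for the root).
store = {"c1": None, "c2": "c1", "c3": "c2"}
# The fork was created at c1 and never synced; the parent then added c2 and c3
# and was deleted, so only the fork's branch tip survives.
print(sorted(hidden_but_stored(store, surviving_tips=["c1"])))  # ['c2', 'c3']
```

Commits c2 and c3 never appear in the fork’s history, so normal browsing and cloning will not show them, but they have not been removed from the store.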

Another dangerous scenario involves private repositories. Suppose an organization keeps a private repository and a private fork of it in which it develops additional features, and later makes the original (“parent”) repository public. Changing the parent’s visibility splits the repository network into private and public versions, but commits added to the private fork before the change remain accessible through the now-public repository.
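The split can be sketched as a toy model. This is a hypothetical illustration of the behavior the researchers describe, not GitHub’s implementation: when the parent is made public, it moves into a new public network that starts with a copy of every object the private network held at that moment, so commits pushed to the private fork before the switch come along, while later ones do not.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Network:
    """Hypothetical toy model: one object store shared by every repo in the network."""

    repos: set[str]
    objects: set[str] = field(default_factory=set)


def make_public(private_net: Network, repo: str) -> Network:
    """Split `repo` out into a new public network (assumed behavior).

    The public network starts with every object the private network held at the
    moment of the switch, including commits only the private fork ever pushed.
    """
    private_net.repos.discard(repo)
    return Network(repos={repo}, objects=set(private_net.objects))


private = Network(repos={"upstream", "internal-fork"})
private.objects |= {"base"}       # pushed to upstream while it was private
private.objects |= {"feature-x"}  # pushed only to internal-fork before the switch
public = make_public(private, "upstream")
private.objects |= {"feature-y"}  # pushed to internal-fork after the switch

print("feature-x" in public.objects)  # True: reachable through the public repo
print("feature-y" in public.objects)  # False: the networks no longer share objects
```

The copy taken at the moment of the split is the key design point of the model: anything in the shared store at that instant, regardless of which repository pushed it, ends up on the public side.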

How Attackers Can Access Deleted Data

To access such data, an attacker needs only the commit hash. Destructive actions in a GitHub repository network, such as deleting a fork or a repository, remove references to the commit data from the standard web interface and from normal git operations, but the data itself remains and can be retrieved by anyone who knows the hash. Commit hashes can also be discovered through the GitHub API, which makes the data even easier to reach.
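Two details make the hash requirement a weak barrier. Git accepts abbreviated commit hashes (at least four hex digits) when they are unambiguous, so if a host resolves short prefixes, an attacker only has to try 16**n candidates for an n-digit prefix; and a commit page’s address follows a predictable pattern. The helpers below are a minimal sketch: they assume SHA-1 object names, the owner and repository names are placeholders, and the URL follows GitHub’s public commit-page pattern, `https://github.com/<owner>/<repo>/commit/<sha>`.

```python
import re

# Assumes SHA-1 object names: 40 hex digits in full, at least 4 when abbreviated.
_HEX_PREFIX = re.compile(r"[0-9a-f]{4,40}")


def commit_url(owner: str, repo: str, sha: str) -> str:
    """Build the web address of a commit page from a full or abbreviated hash."""
    if not _HEX_PREFIX.fullmatch(sha):
        raise ValueError("expected 4 to 40 lowercase hex digits")
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


def prefix_candidates(length: int) -> int:
    """How many distinct hex prefixes of the given length exist (16 ** length)."""
    if not 4 <= length <= 40:
        raise ValueError("length must be between 4 and 40")
    return 16 ** length


# "example-owner/example-repo" is a placeholder, not a real repository.
print(commit_url("example-owner", "example-repo", "deadbee"))
for n in (4, 7):
    print(n, prefix_candidates(n))  # prints "4 65536", then "7 268435456"
```

A seven-digit prefix already has about 268 million possibilities, but a four-digit prefix has only 65,536, which is why a commit hash is better treated as an identifier than as a secret.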

GitHub’s Response and User Awareness

GitHub does not hide its architectural decisions and documents them for users. However, many developers, especially beginners, may not realize the scale of the problem.

Key Takeaways and Recommendations

  • The findings are alarming and highlight the need to rotate any access key that has ever been committed, because deleting the commit, fork, or repository does not make it inaccessible.
  • GitHub, like other version control platforms, has architectural features that can lead to unintentional disclosure of sensitive information.
  • It is important to raise user awareness of such vulnerabilities and to take steps to protect data.

The researchers also noted that the retention of data from deleted and private repositories may not be limited to GitHub; similar vulnerabilities may exist on other version control platforms as well.
