Discovering Unsecured Firebase Databases

It is not news that finding leaked Firebase databases is simple. However, although the problem is well known, it is not immediately obvious whether Google has taken any further action to mitigate it. To put the severity of this issue into perspective, Goon Security conducted a 12-hour GasLeak campaign and analyzed a treasure trove of leaked data. The campaign uncovered over 13,000,000 records, including emails, passwords, phone numbers, and private messages, from over 450 databases. Goon Security is working with affected organizations to remediate these unsecured database instances.

GasLeak is a very simple tool that abuses a very simple feature of Firebase Realtime Databases. When creating a Firebase database, the user is prompted to select either a “locked mode” or a “test mode”. The “locked mode” option restricts reads and writes, while “test mode” allows anyone to read and write. A red warning message is displayed when public reads and writes are allowed.

When a user clicks “Dismiss” this warning goes away and seemingly never appears again.

This means that when a developer has finished testing and is ready to move to a production environment, it is up to them to remember that the database is still in “test mode”. While this is no fault of Google’s, it leaves a huge opportunity for attackers to capitalize on developers’ oversights.

EDIT: Google has made very important changes to this feature. Firebase users are emailed reminders about their insecure security rules, and testing mode now lasts only 30 days.

Data is ordinarily accessed via a JSON file, as shown in the screenshot below.

This means that even if the database were open to the public, an attacker would still need to brute-force the filenames.

However, omitting the filename returns the entire database, skipping the need for directory busting.
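The difference between the two requests can be sketched in Python. This is a minimal illustration for checking a single database, not GasLeak itself. The `firebaseio.com` host pattern, the `shallow=true` query parameter (used here so that only top-level keys come back rather than the full dataset), the status-code interpretation, and the names `root_url`, `classify`, and `probe` are all assumptions that do not appear in the original write-up.

```python
import urllib.error
import urllib.request


def root_url(database_name: str) -> str:
    """Build the root request: '.json' with no filename in front of it."""
    # Assumption: the database is served from <name>.firebaseio.com.
    return f"https://{database_name}.firebaseio.com/.json?shallow=true"


def classify(status: int) -> str:
    """Interpret the HTTP status of a root request (assumed mapping)."""
    if status == 200:
        return "publicly readable"
    if status in (401, 403):
        return "read access denied"
    return "inconclusive"


def probe(database_name: str, timeout: float = 10.0) -> str:
    """Send one GET request to the database root and classify the result."""
    try:
        with urllib.request.urlopen(root_url(database_name), timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the status code is still useful.
        return classify(error.code)
    except OSError:
        # DNS failures, refused connections, and timeouts end up here.
        return "inconclusive"
```

Calling `probe("my-project")` with the name of a database you control reports whether its root is readable without authentication.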

Now, all GasLeak has to do is abuse this feature by brute forcing subdomains. Goon Security ran a 12-hour campaign using a wordlist generated from the Alexa Top 1 million. 

Gigabytes of leaked databases were discovered, along with millions of lines of leaked credentials. Some of the interesting finds are outlined below.

PAN numbers / passbooks

Phone call / transaction logs

Appended at the bottom of 4 of these databases, it appears another researcher made their mark as well.

Team:

  • Jaggar Henry
  • Antero Nevarez-Lira
  • Donald Connors
  • Evelyn Griffin

Simple Subdomain Takeover Automation

Subdomain takeovers are not new; as a matter of fact, they have been explained many, many, many times. However, even though a spotlight has been shone on this issue, even tech giants such as Uber and Microsoft have felt the heat. With so much coverage, why does this continue to be a problem?

The answer is quite simple: DNS records are free, payment plans are not. When a paid SaaS is no longer needed, like an AWS S3 bucket from a dead project, it is easy to remember to cancel that service to avoid draining the account balance. The CNAME record pointing at it, however, is more likely to be forgotten, as there is no immediate extrinsic motivation to remove it. Unless the DevOps team actively audits zone files, this DNS record will be forgotten over time. The result is a subdomain that points to a third-party resource no longer controlled by the owner of the root domain.

When an individual can claim this third-party service, the subdomain is practically theirs, as they can now serve arbitrary content – i.e. subdomain takeover.

Rupert was developed to combat this issue: it is a tool that automatically enumerates subdomains and fingerprints them for possible takeovers. The program is written in Python 3 and includes a wrapper for the tool “subfinder” by ProjectDiscovery.

Using subfinder, subdomains are gathered using a variety of techniques ranging from search engine indexing to DNS dumpsters. An HTTP GET request is sent to each of these subdomains, and the response is parsed for takeover fingerprints.

Some of EdOverflow’s fingerprints

This list was compiled from EdOverflow’s “Can I takeover XYZ?” GitHub repository.

Rupert is very simple, but very, very effective. A request is made to the web application, and if the response contains one of the fingerprints, the subdomain is flagged as a possible takeover. Automating this process turned up over 1,000 unique subdomain takeovers, including plenty of .gov and .edu domains, Fortune 500 companies, and news sites.
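The fingerprint check itself amounts to a substring search over the response body. The sketch below is an illustration of that idea and not Rupert's actual source: the two fingerprint strings are examples of the kind of error messages abandoned services return, and the names `FINGERPRINTS` and `match_fingerprints` are hypothetical.

```python
# Illustrative fingerprints only; a real list would be far longer.
FINGERPRINTS = {
    "AWS S3": "The specified bucket does not exist",
    "GitHub Pages": "There isn't a GitHub Pages site here.",
}


def match_fingerprints(body: str, fingerprints: dict = FINGERPRINTS) -> list:
    """Return the services whose takeover fingerprint appears in the body."""
    return [service for service, marker in fingerprints.items() if marker in body]


# Example: the body of a response served by a deleted S3 bucket.
sample_body = "<Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message></Error>"
possible_takeovers = match_fingerprints(sample_body)
```

A match only marks the subdomain as a candidate; whether the third-party resource can actually be claimed still has to be confirmed by hand.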

Example domains helped by this tool:

  • Hawaii.gov
  • Intel.com
  • Mercedes-benz.com

Team:

  • Jaggar Henry
  • Antero Nevarez-Lira
  • Donald Connors
  • Evelyn Griffin

Discovering Leaked API Tokens in Webpack JS Files

On Wednesday, July 8th, Goon Security released research outlining the problems with how researchers scan code for leaked API tokens. Several false positive reduction techniques were introduced, and now it is time to go over how they fared in the real world. These techniques reduced false positives by a significant amount in controlled testing. An EAN campaign was then launched to discern whether it is feasible to scan thousands of live JavaScript files for leaks.

Download the tool: https://github.com/Goon-Security/EAN_CLI

Webpack

The EAN campaign was centered around Webpack, a module bundling system for Node.js. The reason for targeting Webpack specifically is that environment variables, and any tokens they hold, are embedded into the Webpack build. NOTE: This is not a problem with Webpack. Developers should read the documentation.

While in this particular excerpt only variables with the prefix “REACT_APP_” are mentioned, environment variables passed directly into the React app with the “--env” flag are also embedded in the build. The Webpack documentation is very clear about this feature, but not all developers read the documentation before building an application. This leaves a margin of error in which environment variables, which exist precisely so that secrets are not hardcoded, are automatically hardcoded by Webpack. Without consulting the developers directly, it is impossible to know for certain whether tokens leaked in the main build are a result of this feature. However, we can test this hypothesis by using the tool EAN to scan Webpack bundles for tokens.

This campaign lasted a total of 11 hours and scanned thousands of JavaScript files. Many “google-site-verification” keys were discovered, as well as integrity hashes, which were very easily discarded through trivial parsing. The remaining tokens were a mix of public Captcha keys and public analytics tokens which were not as easily ignored.
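The "trivial parsing" used to discard the first two groups might look like the following sketch. It assumes that integrity hashes follow the usual `sha256-`, `sha384-`, or `sha512-` prefix format and that verification keys sit near the literal text `google-site-verification`; the name `is_known_noise` and its `context` parameter are hypothetical and not taken from EAN.

```python
import re

# Assumed format: algorithm prefix followed by a base64 digest.
SRI_PATTERN = re.compile(r"^sha(256|384|512)-[A-Za-z0-9+/]+={0,2}$")


def is_known_noise(candidate: str, context: str = "") -> bool:
    """Return True for candidates that are safe to discard without review.

    `candidate` is the high-entropy string; `context` is the surrounding
    source text (for example, the line it was found on).
    """
    if SRI_PATTERN.match(candidate):
        return True
    return "google-site-verification" in context
```

Public Captcha keys and analytics tokens have no comparably reliable marker, which is why they could not be filtered the same way.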

The remaining tokens were assessed and sorted by potential impact. Upon analysis, numerous high-impact API tokens were discovered.

Social media tokens

The Facebook Graph API token of a popular news / media site was discovered. Two business pages were linked to the token. Both of these pages have ~1,000,000 likes. 

AWS Buckets

A popular review site published both its AWS_SECRET and AWS_KEY. This key pair lacked proper permission restrictions and granted sizable privileges to anyone who held it.

Contentful CDN

Many, many Contentful API keys were found during this campaign. Many of them appear to give the holder the ability to download, upload, and delete CDN material, potentially allowing for heavy escalation attacks.

Team:

  • Jaggar Henry
  • Antero Nevarez-Lira
  • Donald Connors
  • Evelyn Griffin

While plenty of the tokens have low to no impact, the sheer number of tokens found by EAN in Webpack builds makes it statistically improbable that none belong to targets of high interest. The Goon Security team will continue to research token analysis and false positive reduction techniques. This research is, and will continue to be, used to find new and innovative ways to detect vulnerabilities across thousands of platforms and to aid in remediation.

Discovering Leaked API keys in Web Applications with Modern Entropy Analysis

Whether by accident or from bad practice, sensitive data such as API keys is being leaked by developers who push hard-coded credentials into production environments.

Many solutions have been offered, both paid and free. While these tools have proven their ability to spot potential tokens in GitHub repositories, their lack of false positive reduction makes it troublesome to scan a large volume of files in other territories. JavaScript files, for example, may contain API keys, but current solutions generate so much noise that security researchers find themselves sorting through false positives by hand.

The crux of key discovery relies on entropy analysis. Tokens are designed to have high randomness, or entropy, to make them very difficult for an attacker to brute force. Researchers use this to their advantage by parsing files and looking for high-entropy strings. By far the most common way of doing so is Shannon’s entropy calculation. The mathematician Claude Shannon addressed the topic of quantifying entropy in 1948 in the paper “A Mathematical Theory of Communication”. Researchers have applied this math to the discovery of leaked secrets by setting a minimum score of 4.3. Take for example the two strings “qUMOImqy7XeJn4HB96RGLPTYp67wGm39” and “TotallyNotASecret183”. Using Shannon’s math, “qUMOImqy7XeJn4HB96RGLPTYp67wGm39” has an entropy score of 4.625 and passes the test, while “TotallyNotASecret183” has a score of 3.784 and does not meet the requirement for an API token.

While Shannon’s method of quantifying entropy works on a basic level, taking a look at the math and its modern application shows something curious.



When applied, this math uses the probability of a character being chosen from a sequence to quantify entropy. We challenge this thought. Applied to strings, this measures redundancy, NOT randomness: a character’s probability is unchanged regardless of the order of the sequence. This can be seen by comparing the entropy of “AABBCCDDEEFF” and “BECFDFBDEACA” – both strings contain the same characters with the same frequencies, and both score approximately 2.585. Shannon’s entropy calculation does not take relationships between characters into account and therefore misses the mark.
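For reference, the per-character calculation is H = -sum(p * log2(p)), where p is the relative frequency of each distinct character in the string. A minimal Python sketch of it follows; the function name `shannon_entropy` is an assumption and this is not necessarily how any particular scanner implements it.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character, from character frequencies only."""
    if not text:
        return 0.0
    length = len(text)
    counts = Counter(text)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


# Order does not matter: only the frequency of each character is used.
ordered = shannon_entropy("AABBCCDDEEFF")
shuffled = shannon_entropy("BECFDFBDEACA")
same_score = math.isclose(ordered, shuffled)
```

Because the function only ever sees a table of character counts, any rearrangement of the same characters produces the same score, which is exactly the limitation described above.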

The goal is to discover and remediate as many leaked tokens in the wild as possible. To accomplish this, several false positive reduction techniques have been developed. 

To attain maximum randomness, tokens are typically mixed-case. Put simply, a randomly generated ASCII letter has a 50% chance of being uppercase (it either is or it is not). This is easily checked by measuring the uppercase percentage of 1,000,000 freshly generated JWT tokens and of 1,216 JavaScript files.

On average, ~48.18% of characters in a JWT token are uppercase

On average, ~4%-8% of characters in a JavaScript file are uppercase

Tokens are random by nature, so it is futile to set a rule for exactly how many characters should be uppercase. However, we can set guidelines for what we should reasonably expect. If ~48.18% of characters in a JWT token are uppercase on average, it is reasonable to expect that at least 15% of the characters in any given token are uppercase. To emphasize this point, we tested 1,000,000 freshly generated JWT tokens: 0.0% had fewer than 15% uppercase characters.

Here is an example of the false positive reduction in action.

The string “abcdefghijklmnopqrst” has an entropy score of 4.32, which passes the original test that many tools use. However, fewer than 15% of its characters are uppercase, so it is disregarded. The same concept is used to set an upper bound: a string with more than 75% uppercase characters is not recognized as a valid token.
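The case check can be sketched as a simple ratio test. The 15% and 75% bounds come from the text above; the function names are hypothetical, and measuring the ratio over letters only (rather than over every character, digits and symbols included) is an assumption, since the write-up does not say which denominator is used.

```python
def uppercase_ratio(candidate: str) -> float:
    """Fraction of the letters in `candidate` that are uppercase."""
    letters = [char for char in candidate if char.isalpha()]
    if not letters:
        return 0.0  # no letters at all: treated as having no uppercase
    return sum(char.isupper() for char in letters) / len(letters)


def passes_case_filter(candidate: str, low: float = 0.15, high: float = 0.75) -> bool:
    """Keep only candidates whose uppercase ratio falls inside [low, high]."""
    return low <= uppercase_ratio(candidate) <= high


all_lowercase = passes_case_filter("abcdefghijklmnopqrst")
mixed_case = passes_case_filter("qUMOImqy7XeJn4HB96RGLPTYp67wGm39")
```

Here `all_lowercase` is rejected and `mixed_case` is kept, matching the example in the text.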

It is possible to reduce false positives even further by performing lexical analysis on the potential token. Lexico published a ranking of letters in the English language by how often they occur in written text. Since programming languages have their base in English, they follow similar patterns. This is easily checked by analyzing over 1,000 JavaScript files and counting occurrences of letters.

Based on the graph, “w”, “k”, “x”, “j”, “q”, and “z” are the 6 least common letters. Keeping this in mind, it is important to note that API tokens are designed to be random, so they do not adhere to these lexical patterns. Analyzing JWT tokens makes it clear that characters that appear less often in the English language statistically appear more often in strings designed for high entropy. In a 32-bit sequence of alphanumeric + symbol, 5.8 of the 6 least common characters appear on average.
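One way to turn this observation into a check is to count how many of the six rare letters a candidate contains. The sketch below is an assumption about how such a count could be done: the name `rare_letter_count` is hypothetical, matching is case-insensitive by choice, and no cutoff is applied because the write-up does not state one.

```python
# The six least common letters identified above.
RARE_LETTERS = frozenset("wkxjqz")


def rare_letter_count(candidate: str) -> int:
    """Number of distinct rare letters present in `candidate` (case-insensitive)."""
    return len(RARE_LETTERS & set(candidate.lower()))


token_like = rare_letter_count("qUMOImqy7XeJn4HB96RGLPTYp67wGm39")
english_like = rare_letter_count("TotallyNotASecret183")
```

The random-looking string contains four of the six rare letters, while the English-like string contains none, so a scanner could require some minimum count before treating a candidate as a token.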

With all of these techniques combined, the number of potential false positives is low enough that a Discord bot was built to automate much of this process.

The EAN Discord bot takes a user-supplied URL and crawls it for linked JavaScript files. These files are then scanned for high-entropy secrets, the candidates are sent through false positive reduction, and the remaining results are returned by the bot.
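The first step, finding the JavaScript files a page links to, can be sketched with Python's standard library alone. This illustrates the crawling idea and is not the bot's actual code; `ScriptCollector` and `linked_scripts` are hypothetical names, and the page HTML is assumed to have been downloaded already.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptCollector(HTMLParser):
    """Collects the absolute URL of every <script src="..."> on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:  # inline scripts have no src attribute and are skipped
            self.scripts.append(urljoin(self.base_url, src))


def linked_scripts(html: str, base_url: str) -> list:
    """Return the script URLs referenced by `html`, resolved against `base_url`."""
    collector = ScriptCollector(base_url)
    collector.feed(html)
    return collector.scripts


page = (
    '<html><head><script src="/static/main.js"></script>'
    "<script>var inline = 1;</script>"
    '<script src="https://cdn.example.com/lib.js"></script></head></html>'
)
found = linked_scripts(page, "https://example.com/app/")
```

Each URL in `found` would then be fetched and its contents passed to the entropy and false positive checks.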

The link to the bot can be found here:
https://github.com/Goon-Security/EAN-Discord

The next goal is to completely automate this process and scan the Alexa top 1,000 domains. Goon Security’s next release on July 14th, 2020 will explore the success of this process as well as vulnerabilities discovered from complete automation.

Team:

  • Jaggar Henry
  • Antero Nevarez-Lira
  • Donald Connors
  • Evelyn Griffin

Sources:

Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: University of Illinois Press.

Lexico. (n.d.). Retrieved from https://www.lexico.com/explore/which-letters-are-used-most