It is not news that finding leaked Firebase databases is simple. However, despite this problem being well-known, it is not immediately obvious if any further actions have been taken by Google as added mitigation. To put into perspective the severity of this issue, Goon Security conducted a 12-hour GasLeak campaign and analyzed a treasure trove of leaked data. This campaign resulted in over 13,000,000 records including emails, passwords, phone numbers, and private messages from over 450 databases. Goon Security is working with affected organizations to remediate these unsecured database instances.
GasLeak is a very simple tool that abuses a very simple feature of Firebase realtime databases. When creating a Firebase database the user is prompted to select either a “locked mode” or a “test mode”. The “locked mode” option restricts read and write, while “test mode” allows anyone to read and write. A red warning message is displayed when read and write is allowed publicly.
When a user clicks “Dismiss” this warning goes away and seemingly never appears again.
This means that when a developer has finished testing and is ready to move to a production environment, it is up to them to remember that the database is still in “test mode”. While this is no fault of Google’s, it leaves a huge opportunity for attackers to capitalize on developers’ oversights.
EDIT: Google has made very important changes to this feature. Firebase users are emailed reminders about their insecure security rules, and testing mode now lasts only 30 days.
Data is ordinarily accessed via a JSON file, as shown in the screenshot below.
This means that even if the database were open to the public, you would need to brute-force the filenames.
However, omitting the filename returns the entire database, skipping the need for directory busting.
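To check a database you own for this behavior, the two request forms described above can be sketched as follows. This is a minimal illustration, not GasLeak's code: the `<name>.firebaseio.com` host pattern and the status-code interpretation are assumptions to verify against the Firebase REST documentation, and the function names are hypothetical.

```python
def firebase_url(database: str, path: str = "") -> str:
    """Build a Realtime Database REST URL.

    Assumes the default `<database>.firebaseio.com` host naming.
    An empty path addresses the root, i.e. the entire database.
    """
    return f"https://{database}.firebaseio.com/{path.strip('/')}.json"


def classify_status(status_code: int) -> str:
    """Interpret the HTTP status of an unauthenticated GET.

    The mapping is an assumption for illustration: 200 means the data was
    returned without credentials, 401/403 means the rules denied the read.
    """
    if status_code == 200:
        return "publicly readable"
    if status_code in (401, 403):
        return "read denied"
    return "inconclusive"


# Root of the database versus a single named node.
print(firebase_url("my-own-project"))
print(firebase_url("my-own-project", "users"))
```

Requesting the root URL of your own project without credentials and passing the resulting status code to `classify_status` gives a quick indication of whether the security rules still allow public reads.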
Now, all GasLeak has to do is abuse this feature by brute forcing subdomains. Goon Security ran a 12-hour campaign using a wordlist generated from the Alexa Top 1 million.
Gigabytes of leaked databases were discovered, along with millions of lines of leaked credentials. Some of the interesting finds are outlined below.
PAN numbers / passbooks
Phone call / transaction logs
Appended at the bottom of 4 of these databases, it appears another researcher made their mark as well.
Subdomain takeovers are not new. As a matter of fact, they have been explained many, many, many times. However, even though a spotlight has been shone on this issue, even tech giants such as Uber and Microsoft have felt the heat. With so much coverage, why does this continue to be a problem?
The answer is quite simple: DNS records are free, payment plans are not. When a paid SaaS is no longer needed, such as an AWS S3 bucket from a dead project, it is easy to remember to cancel that service to avoid draining the account balance. The CNAME, however, is more likely to be forgotten, as there is no immediate extrinsic motivation to remove it. Unless the DevOps team actively audits zone files, this DNS record will be forgotten over time. This results in a subdomain that points to a third-party service no longer under the control of the root domain’s owner.
When an individual can claim this third-party service, the subdomain is practically theirs, as they can now serve arbitrary content – i.e. subdomain takeover.
To combat this issue, Rupert was developed. It is a tool that automatically enumerates subdomains and fingerprints them for possible takeovers. This program is built with Python 3 and includes a wrapper for the tool “subfinder” by ProjectDiscovery.
Using subfinder, subdomains are gathered using a variety of techniques ranging from search engine indexing to DNS dumpsters. An HTTP GET request is sent to each of these subdomains, and the response is parsed for takeover fingerprints.
Rupert is very simple, but very, very effective. A request is made to the web application, and if the response contains one of the fingerprints, it is flagged as a possible subdomain takeover. Automating this process resulted in over 1,000 unique subdomain takeovers, including plenty of .gov and .edu domains, Fortune 500 companies, and news sites.
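The fingerprint check described above can be sketched in a few lines of Python. This is not Rupert's code: the function name and the fingerprint table are hypothetical, and the two entries are only widely cited examples of the error text a service returns for an unclaimed resource.

```python
from typing import Dict, List

# Illustrative fingerprints only; a real tool would carry a much longer,
# regularly maintained list.
FINGERPRINTS: Dict[str, str] = {
    "Amazon S3": "NoSuchBucket",
    "GitHub Pages": "There isn't a GitHub Pages site here.",
}


def match_fingerprints(body: str) -> List[str]:
    """Return the services whose takeover fingerprint appears in the body."""
    return [service for service, needle in FINGERPRINTS.items() if needle in body]


# A response body resembling an S3 error for a bucket that no longer exists.
sample = "<Error><Code>NoSuchBucket</Code></Error>"
print(match_fingerprints(sample))
```

A match only marks the subdomain as a candidate. Whether the third-party resource can actually be claimed still has to be confirmed by hand.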
The EAN campaign was centered around Webpack, a module bundler for Node.js. The campaign targets Webpack specifically because environment variables, including tokens, are embedded into the Webpack build. NOTE: This is not a problem with Webpack. Developers should read the documentation.
While in this particular excerpt only variables with the prefix “REACT_APP_” are mentioned, environment variables passed directly into the React app with the “--env” flag are also embedded in the build. While the Webpack documentation is very clear about this feature, not all developers read the documentation prior to building an application. This allows for a margin of error where environment variables that were never meant to be hardcoded are automatically hardcoded by Webpack. It is impossible to know for certain whether tokens leaked in the main build are a result of this feature without consulting the developers directly. However, we can test this hypothesis by using the tool EAN to scan Webpack bundles for tokens.
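A first pass over a bundle might simply collect the quoted strings that are long enough to be tokens. The sketch below is an assumption about how such a pass could look, not EAN's implementation: the function name, the 20-character minimum, and the allowed character set are all chosen for illustration.

```python
import re
from typing import List

# Quoted runs of token-like characters, at least 20 characters long.
# Both the length and the character set are illustrative assumptions.
TOKEN_LIKE = re.compile(r"""(["'])([A-Za-z0-9_+/=.\-]{20,})\1""")


def candidate_strings(bundle_source: str) -> List[str]:
    """Return unique token-like string literals in order of first appearance."""
    found: List[str] = []
    for match in TOKEN_LIKE.finditer(bundle_source):
        value = match.group(2)
        if value not in found:
            found.append(value)
    return found


# A made-up fragment of minified bundle code with one embedded value.
bundle = 'var cfg={key:"qUMOImqy7XeJn4HB96RGLPTYp67wGm39",name:"app"};'
print(candidate_strings(bundle))
```

The candidates returned here are only raw material. They still need to be scored and filtered before any of them can be treated as a leaked token.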
The remaining tokens were assessed and sorted by potential impact. Upon analysis, numerous high-impact API tokens were discovered.
Social media tokens
The Facebook Graph API token of a popular news / media site was discovered. Two business pages were linked to the token. Both of these pages have ~1,000,000 likes.
A popular review site published both the AWS_SECRET and AWS_KEY. These credentials lacked properly scoped permissions and granted sizable access to whoever held them.
Many, many Contentful API keys were found during this campaign. It appears many of these gave the holder the ability to download, upload, and delete CDN material, potentially allowing for heavy escalation attacks.
While there are plenty of tokens that are low to no impact, the sheer number of tokens found by EAN in Webpack builds makes it statistically improbable that none belong to targets of high interest. The Goon Security team will continue to research token analysis and false positive reduction techniques. This research is, and will continue to be, used to find new and innovative ways to detect numerous vulnerabilities across thousands of platforms to aid in remediation.
Whether by accident or from bad practice, sensitive data such as API keys is being leaked by developers who push hard-coded credentials into production environments.
The crux of key discovery relies on entropy analysis. Tokens are designed to have high randomness, or entropy, to make them very difficult for an attacker to brute force. Researchers use this to their advantage by parsing files and looking for high-entropy strings. By far the most common way of doing so is to compute Shannon entropy. The mathematician Claude Shannon quantified entropy in his 1948 paper “A Mathematical Theory of Communication”, which was republished in book form the following year (Shannon & Weaver, 1949). Researchers have applied this math to the discovery of leaked secrets by setting a minimum entropy threshold of 4.3. Take for example the two strings “qUMOImqy7XeJn4HB96RGLPTYp67wGm39” and “TotallyNotASecret183”. Using Shannon’s math, “qUMOImqy7XeJn4HB96RGLPTYp67wGm39” has an entropy score of 4.625 and passes the test, while “TotallyNotASecret183” has a score of 3.784 and does not meet the requirement for an API token.
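The scores quoted above can be reproduced with a short Python sketch of the per-character Shannon entropy calculation. The function names are hypothetical, and treating the threshold as inclusive (a score of at least 4.3 passes) is an assumption.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.3  # threshold quoted in the text


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_entropy_test(text: str) -> bool:
    """True when the string meets the entropy threshold (assumed inclusive)."""
    return shannon_entropy(text) >= ENTROPY_THRESHOLD


for candidate in ("qUMOImqy7XeJn4HB96RGLPTYp67wGm39", "TotallyNotASecret183"):
    print(candidate, round(shannon_entropy(candidate), 3), passes_entropy_test(candidate))
```

Rounded to three decimal places, the two strings score 4.625 and 3.784, matching the figures in the text.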
While Shannon’s method of quantifying entropy works on a basic level, taking a look at the math and its modern application shows something curious.
When applied, this math uses the probability of a character being chosen from a sequence to quantify entropy. We challenge this approach. Applied to strings, it measures redundancy, NOT randomness, because a character’s probability is the same regardless of the order of the sequence. This is shown by comparing the entropy of “AABBCCDDEEFF” and “BECFDFBDEACA”: both strings contain the same characters in the same quantities, and both score approximately 2.585. Shannon’s entropy calculation does not take relationships between characters into account and therefore misses the mark.
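For reference, the standard definition of Shannon entropy makes this order-independence explicit. The notation is assumed here rather than taken from the original: N is the string length, k the number of distinct characters, and n_i the number of times the i-th distinct character appears.

```latex
H = -\sum_{i=1}^{k} p_i \log_2 p_i, \qquad p_i = \frac{n_i}{N}

% Both example strings have N = 12 and k = 6 distinct characters,
% each appearing n_i = 2 times, so p_i = 1/6 for every character:
H = -6 \cdot \tfrac{1}{6} \log_2 \tfrac{1}{6} = \log_2 6 \approx 2.585
```

Only the counts n_i enter the formula, so any rearrangement of the same characters produces the same score.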
The goal is to discover and remediate as many leaked tokens in the wild as possible. To accomplish this, several false positive reduction techniques have been developed.
On average, ~48.18% of characters in a JWT token are uppercase
Tokens are random by nature, so it is futile to set a rule for exactly how many characters should be uppercase. However, we can set guidelines for what we should reasonably expect. If ~48.18% of the characters in a JWT token are uppercase on average, it is reasonable to expect at least 15% of a token’s characters to be uppercase. To emphasize this point, we tested 1,000,000 freshly generated JWT tokens: 0.0% had fewer than 15% uppercase characters.
Here is an example of the false positive reduction in action.
The string “abcdefghijklmnopqrst” has an entropy score of 4.32, which passes the original test that many tools use. However, it lacks at least 15% uppercase characters, so it is disregarded. The same concept sets an upper bound: a string with more than 75% uppercase characters is not recognized as a valid token.
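The uppercase bounds can be expressed as a small filter. This is a sketch under stated assumptions rather than the tool's code: the function names are hypothetical, and the ratio is taken over all characters in the string, digits and symbols included.

```python
MIN_UPPER_RATIO = 0.15  # lower bound from the text
MAX_UPPER_RATIO = 0.75  # upper bound from the text


def uppercase_ratio(text: str) -> float:
    """Fraction of all characters in the string that are uppercase letters."""
    if not text:
        return 0.0
    return sum(ch.isupper() for ch in text) / len(text)


def passes_case_filter(text: str) -> bool:
    """True when the uppercase share falls within the expected band."""
    return MIN_UPPER_RATIO <= uppercase_ratio(text) <= MAX_UPPER_RATIO


for candidate in ("abcdefghijklmnopqrst", "qUMOImqy7XeJn4HB96RGLPTYp67wGm39"):
    print(candidate, passes_case_filter(candidate))
```

The all-lowercase string is rejected despite its high entropy score, while the random-looking string, with 15 of its 32 characters uppercase, stays in the candidate set.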
Based on the graph, the letters “w”, “k”, “x”, “j”, “q”, and “z” are the 6 least common letters in English. Keeping this in mind, it is important to note that API tokens are designed to be random, so they do not adhere to these lexical patterns. By analyzing JWT tokens, it is clear that characters that appear less often in the English language statistically appear more often in strings designed for high entropy. In a 32-bit sequence of alphanumeric + symbol, 5.8 of the 6 least common characters appear.
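Counting these letters in a candidate string is straightforward. The sketch below assumes a case-insensitive count of total occurrences, which the text does not specify, and uses a hypothetical function name. It deliberately sets no cutoff, since none is given above.

```python
# The six least common English letters named in the text.
RARE_LETTERS = frozenset("wkxjqz")


def rare_letter_count(text: str) -> int:
    """Count occurrences of the rare letters, ignoring case (an assumption)."""
    return sum(ch in RARE_LETTERS for ch in text.lower())


for candidate in ("qUMOImqy7XeJn4HB96RGLPTYp67wGm39", "TotallyNotASecret183"):
    print(candidate, rare_letter_count(candidate))
```

Here the random-looking string contains five rare letters and the English-like string contains none, which is the kind of gap this heuristic relies on.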
By using all of these techniques, the number of potential false positives is low enough that a Discord bot was built to automate much of this process.
The next goal is to completely automate this process and scan the Alexa top 1,000 domains. Goon Security’s next release on July 14th, 2020 will explore the success of this process as well as vulnerabilities discovered from complete automation.
Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: University of Illinois Press.