Let’s discuss escaping output with APIs, as I’ve found this is an area that’s often overlooked and may come back to bite you. I once found a vulnerability in a popular open-source project that made the unfortunate assumption that API output didn’t need to be escaped, and I’ll tell you that story shortly—but first, let’s look at what escaping is.
What Is Escaping Output?
If you’re a web developer, the first thing you think of when you hear “escaping output” will probably be escaping user input when it’s displayed within HTML, usually by translating special characters into HTML entities. This instructs the browser to render these characters as text, rather than interpret them as HTML.
For example, if we take the following string:
"Hello, World!" > "foobar"
It’s a perfectly legitimate string that a user may submit into a form field; however, if we try to put that value into the form field, we’ll have a problem:
<input type="text" value=""Hello, World!" > "foobar"">
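In the browser, the first double quote in the string ends the value attribute early, so the field renders empty and the leftover "foobar""> spills onto the page as plain text.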
While this is a safe example, it’s trivial to exploit this to inject some malicious code into the page.
Consider this input:
"><script>alert('Boom! XSS!')</script>
Which produces this HTML:
<input type="text" value=""><script>alert('Boom! XSS!')</script>">
In the browser, this pops up a JavaScript alert dialog showing “Boom! XSS!”.
The issue here is that the double quote (") within the string breaks out of the HTML value attribute, and the angle brackets (< >) close the input tag and open a new script tag.
The solution here is to use output escaping, specifically by swapping out those special characters for their HTML entity equivalents:
& → &amp;
" → &quot;
' → &#039;
< → &lt;
> → &gt;
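We can do this easily in PHP with the htmlspecialchars() function (since PHP 8.1 it escapes single quotes by default; on older versions, pass the ENT_QUOTES flag):
> htmlspecialchars("\"><script>alert('Boom! XSS!')</script>");
&quot;&gt;&lt;script&gt;alert(&#039;Boom! XSS!&#039;)&lt;/script&gt;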
The resulting string is safe to use within HTML - it won’t break out of any HTML attributes or introduce new HTML tags.
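To tie this back to the form field from earlier, here is a minimal sketch of escaping a value at the point where it is written into an attribute. The escapeHtml() helper and the $userInput variable are assumptions for illustration, not part of any framework; the flags are passed explicitly so the behavior doesn’t depend on the PHP version.

```php
<?php
// Hypothetical helper: escape a value for use in HTML text or attribute values.
function escapeHtml(string $value): string
{
    // ENT_QUOTES escapes both " and '; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// The malicious input from above (assumed to come from a form submission).
$userInput = '"><script>alert(\'Boom! XSS!\')</script>';

// Escape at the point of output, inside a quoted attribute.
$html = '<input type="text" value="' . escapeHtml($userInput) . '">';

echo $html, PHP_EOL;
```

The value attribute now contains only entities, so the quote can’t close the attribute and the angle brackets can’t open a new tag.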
But all of that relates to HTML, right?
Do I Need To Escape Non-HTML Output?
It’s fairly common for APIs to return JSON instead of HTML, and you’ll usually build your JSON using a converter. So, do you need to worry about escaping user values?
Let’s take a look.
Consider this PHP:
<?php
$output = [
    'output' => "<img src=x onerror=alert('Boom!')>",
];
echo json_encode($output);
We would expect the JSON output to look something like this:
{
    "output": "<img src=x onerror=alert('Boom!')>"
}
Which looks like safe JSON, right?
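Loading it in the browser, however, pops up the alert: PHP sends Content-Type: text/html by default, so the browser parses the response as HTML and runs the onerror handler.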
Ok, so you could argue that this is a content-type issue, and it is really easily solved by adding the following header:
header('Content-Type: application/json');
However, this relies on your application returning the right content type in the header and the browser actually honoring it. Unfortunately, browsers try to be helpful, and if the header is missing or gets mangled along the way, there is always the possibility that some HTML will be executed. You’d also have to load the API’s JSON page directly for it to execute, although this could be done inside an iframe under the right conditions.
So you probably noticed the odd-looking \u003C and \u003E in the above screenshot. Chrome did that automatically when rendering the JSON. I’m not entirely sure why, but I assume it’s a security escaping thing.
Regardless, here’s the raw output that was sent to the browser:
{ "output": "<img src=x onerror=alert('Boom!')>" }
However, this provides a direct hint as to what we can do to make this JSON safer—and we looked at it earlier!
We can escape the special characters!
We can’t use the HTML entities we used above, but we can use those hex sequences, which are JSON’s \uXXXX Unicode escapes, to represent the special characters.
This can be done with PHP’s json_encode() function using these flags:
JSON_HEX_TAG: Converts < and > to \u003C and \u003E.
JSON_HEX_AMP: Converts & to \u0026.
JSON_HEX_APOS: Converts ' (single quote) to \u0027.
JSON_HEX_QUOT: Converts " (double quote) to \u0022.
Here’s our new code:
<?php
header('Content-Type: application/json');
$output = [
    'output' => "<img src=x onerror=alert('Boom!')>",
];
echo json_encode($output, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
Which gives us this output:
{ "output": "\u003Cimg src=x onerror=alert(\u0027Boom!\u0027)\u003E" }
Now the brackets and quotes have been escaped, so removing the content type header from the code doesn’t affect the output. The output is properly escaped so that no malicious code can be executed.
Wrap up the function and flags in a helper function, and use it on all of your JSON outputs. You’ll have some solid protections in place.
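One possible shape for that helper is sketched below. The function names safeJsonEncode() and sendJson() are assumptions for illustration, and JSON_THROW_ON_ERROR is an extra flag added here (not part of the code above) so that an encoding failure raises an exception instead of silently returning false. The mixed type hint assumes PHP 8.0 or later.

```php
<?php
// Hypothetical helper: encode data as JSON with HTML-significant characters
// (< > & ' ") written as \uXXXX escapes.
function safeJsonEncode(mixed $data): string
{
    return json_encode(
        $data,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
            | JSON_THROW_ON_ERROR // assumption: fail loudly on bad input
    );
}

// Hypothetical helper: send a JSON response with an explicit content type.
function sendJson(mixed $data): void
{
    header('Content-Type: application/json');
    echo safeJsonEncode($data);
}

$json = safeJsonEncode(['output' => "<img src=x onerror=alert('Boom!')>"]);

// The escapes are ordinary JSON, so any client decodes them back to the
// original string; nothing is lost by escaping.
$decoded = json_decode($json, true);
```

Because \uXXXX escapes are part of the JSON format itself, consumers of the API don’t need to change anything: the decoded value is identical to the original string.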
What About Consuming APIs?
Unescaped output inside JSON is a common finding in my penetration tests, and Burp Suite always gives me a bunch of these reports after a scan of a SPA:
However, it’s worth pointing out that last line - which I’ve highlighted.
“However, the issue might be indirectly exploitable if a client-side script processes the response and embeds it into an HTML context.”
There are two sides to this:
First, if you’re consuming data from your own API and rendering it on the page, you need to be aware of the data you’re sending to your front end and how you’re rendering it.
Second, if you’re consuming APIs from other providers, how are you handling the data they’ve given you?
Consuming Your Own APIs
When you’re consuming your own APIs within something like a SPA, it’s tempting to put all of your escaping on the server side and have the front end blindly render whatever it’s given. Maybe you’re manipulating the data in complex ways to build specific HTML blocks or inject some markup? Or maybe it’s just more consistent to do everything on the server side and have the front end handle rendering the template?
While it’s not a terrible solution, it does require consistency across your application in how escaping is handled. It’s easy to overlook this if you’re working on the backend and know you’ll be sending the output to the view in JSON—not HTML.
I’ve come across a number of vulnerabilities in which HTML was constructed in the backend and sent to the browser—with neither side escaping the output! One of my recent ones involved search results, where the search terms were highlighted in the results through some HTML.
The backend just did a string-replace of the search term to add a <span> tag wrapper to highlight it, but didn’t escape the search term, which was user input. The front end rendered the output raw because of the injected <span> tags, and Cross-Site Scripting (XSS) was quite easy to inject and abuse.
There were two ways to fix this:
Either escape the search terms on the backend as part of the injection—which comes down to being consistent in escaping everything the API sends to the browser—or do the search term highlighting in the front end and escape the search term there.
Both are valid solutions, but you need to be consistent.
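As an illustration of the backend option, here is a minimal sketch of search-term highlighting that escapes everything it emits. It is not the code from the application I tested; the highlightTerm() function and the highlight CSS class are assumptions. It splits the raw text on the term first and escapes each piece afterwards, so the only markup in the result is the <span> the backend adds itself.

```php
<?php
// Hypothetical helper: wrap every occurrence of $term in a highlight <span>,
// escaping every piece of text (matches included) before it becomes HTML.
function highlightTerm(string $text, string $term): string
{
    $escape = static fn (string $s): string =>
        htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    if ($term === '') {
        return $escape($text);
    }

    // Split the raw text on the term (case-insensitive), keeping the matches.
    // With one capture group, odd indexes hold matches, even indexes the rest.
    $pattern = '/(' . preg_quote($term, '/') . ')/iu';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $escape($text); // e.g. invalid UTF-8: skip highlighting
    }

    $html = '';
    foreach ($parts as $i => $part) {
        $html .= $i % 2 === 1
            ? '<span class="highlight">' . $escape($part) . '</span>'
            : $escape($part);
    }

    return $html;
}

$result = highlightTerm('Results for <b>cats</b> & Cats', 'cats');
```

Splitting before escaping avoids a subtle problem with the reverse order: a search for “amp” would otherwise match inside an already-escaped &amp; entity.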
Consuming Other APIs
I teased it at the start, and now it’s time to investigate the vulnerability I found in Mastodon!
I was procrastinating one day when I noticed the following post:
It caught my eye because the text was clearly truncated, but it was showing the preview for a link (to my website) - which wasn’t present in what I could see.
Confused, I clicked “Show Original” and was presented with this:
The original message contained a lot more information (in German), including the link I was expecting to see. While I don’t know any words in German, it was pretty clear that the truncation lined up with that <script> tag on the second line, which got me thinking… 🤔😈
I checked the source, and the translation was injected onto the page without escaping!
So naturally, I had to try this myself. I composed a post in German and tried it!
Here’s the original, untranslated version:
And here’s the translation!
The browser’s Content Security Policy (CSP) blocked the attack, but I had a successful XSS vector on Mastodon, an open-source project used by a huge number of people…
I dutifully (responsibly) reported it to Mastodon, who resolved the issue quickly and rolled out an update.
The cause of the issue was simple: Mastodon trusted the translation service's API output to be safe and wasn’t escaping it.
The fix was pretty straightforward: they had to escape the output from the translation API before rendering it.
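The same principle can be sketched in PHP. This is not Mastodon’s code or its actual fix; the response shape, the translatedText key, and the renderTranslation() function are all hypothetical. The point is only that text from a third-party API is treated exactly like user input.

```php
<?php
// Hypothetical response from a third-party translation API.
$apiResponse = [
    'translatedText' => "Hello <script>alert('Boom!')</script> world",
];

// Hypothetical helper: treat third-party text as untrusted and escape it
// before placing it in HTML.
function renderTranslation(array $apiResponse): string
{
    $text = $apiResponse['translatedText'] ?? '';
    if (!is_string($text)) {
        $text = ''; // unexpected type: render nothing rather than guess
    }

    return '<p class="translation">'
        . htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
        . '</p>';
}

$html = renderTranslation($apiResponse);
```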
Summary
As a security person, I feel like I say this a lot, but I need to repeat it here again:
Don’t forget about output escaping!
Don’t forget about output escaping!
Don’t forget about output escaping!
Don’t forget about output escaping!
Don’t forget about output escaping!
It’s easy to overlook it and forget where the boundaries are or that it applies to more than just raw HTML, but it’s critical that you’re always thinking about escaping. This is where a tool like Treblle’s API Security can help - it’ll monitor your APIs and inform you about content type issues and malicious-looking values.