Avoiding javascript injection

The cardinal rule of web development is never trust user-supplied data to be safe.  A surprising number of developers don’t take this seriously when inserting into a database.  An even larger group incorrectly trust their raw data for output.  This opens up the browser to what are called injection attacks.

Injection attacks let malicious users get your application to output things you never intended it to, like a block of javascript that passes the session id to a remote server.  The solution is to always convert your data into a benign form before outputting.  With database queries this means adding backslashes before both quotes and backslash characters inside of your string variables.  In HTML this means converting dangerous characters into html entities.  (Those little &lt;, &gt;, and &amp; things you’ll see all over the source for the better web sites.)

Usually following these two techniques religiously is enough to secure your application from injection attacks.  However, I ran into an interesting problem the other day that requires a third type of escaping.

I was using PHP to generate Javascript string variables on the fly.  Normally, when creating HTML I would convert my data to html entities before outputting anything to the browser.  Unfortunately, when you’re outputting javascript you can’t count on a given browser to handle html entities.  Some browsers will not convert the entity into a character before passing it to their javascript engine, which can result in javascript bugs and ugly output.

My first thought was that since I was dealing with javascript strings I would simply slash the same things you would slash when preparing data for a database insert: quotes and backslashes.  Unfortunately, it turns out that for browser consumption this isn’t quite enough.

For example, imagine you have the following php script:

<script>
var bug="<?= addslashes( $_GET['urlvar'])?>";
</script>

I could abuse this script by passing a urlvar value that contains a closing script tag, which generates the following output:

<script>
var bug="</script><script>alert(document.domain)</script>";
</script>

That’s right: there are no quotes in my string that would need to be slashed.  The browser’s HTML parser looks for the closing script tag before the javascript engine ever sees the string, so the injected close script tag ends my script block early and the injected block after it runs.  My second thought was to enclose the string in a CDATA section, but to my surprise the browser (Firefox 3) totally ignored the CDATA tags and still interpreted the injected close script tag.  This wouldn’t have worked anyway, since an attacker could have injected a CDATA close and then the script tags.

So to prepare a string for output as a javascript string variable you also have to escape the less-than and greater-than characters.  For the sake of xhtml compliance we should probably escape the ampersands as well.  How do you escape a less-than character without using an html entity?

Hex codes, my friends, hex codes.  In a javascript string you can represent a character with \x## where each # is a hexadecimal digit.  Just look up the ascii values for less than (3C) and greater than (3E), and then you can escape the characters before outputting.  Don’t be alarmed by the ascii; the character codes for these two characters are the same in utf-8 as well.
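As a quick sanity check of the hex codes, here is a small javascript sketch.  It only uses standard string methods; the variable names are mine, picked for illustration.

```javascript
// Look up the hex codes for the two characters.
var lt = "<".charCodeAt(0).toString(16).toUpperCase(); // "3C"
var gt = ">".charCodeAt(0).toString(16).toUpperCase(); // "3E"

// Inside a string literal, the engine turns \x3C and \x3E back into < and >.
var decoded = "\x3C/script\x3E";

console.log(lt, gt, decoded === "</script>"); // prints: 3C 3E true
```

The point is that the escape only exists in the source text.  By the time your script runs, the string holds the real characters again.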

Here is a sample PHP function that does what we need:

/**
 * Builds a value that is safe for use in a javascript string variable embedded in XHTML.
 *
 * @param string $str
 * @return string
 */
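function js($str) {
    $l = strlen($str);
    $ret = "";
    for ($i = 0; $i < $l; ++$i) {
        switch ($str[$i]) {
            // A raw backslash would otherwise escape whatever follows it,
            // including the closing quote of the javascript string.
            case "\\": $ret .= "\\\\"; break;
            case "\r": $ret .= "\\r"; break;
            case "\n": $ret .= "\\n"; break;
            // Quotes and HTML-significant characters become \x## hex escapes.
            case "\"": case "'": case "<": case ">": case "&":
                $ret .= sprintf('\x%02X', ord($str[$i]));
                break;
            default:
                $ret .= $str[$i];
                break;
        }
    }
    return $ret;
}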

So the output with hex escapes in the string would be:

<script>
var bug="\x3C/script\x3E\x3Cscript\x3Ealert(document.domain)\x3C/script\x3E";
</script>
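To double-check the output above without a PHP install, here is a rough javascript port of the same escaping idea.  The name jsEscape is hypothetical and this is only a sketch: it escapes the backslash itself as well (a string literal needs that), and it assumes the five hex-escaped characters plus carriage return and newline are the only other ones that matter.

```javascript
// Hypothetical javascript port of the PHP js() function, for illustration only.
function jsEscape(str) {
    return String(str).replace(/[\\\r\n"'<>&]/g, function (ch) {
        switch (ch) {
            case "\\": return "\\\\";
            case "\r": return "\\r";
            case "\n": return "\\n";
            default:
                // Two-digit uppercase hex escape, e.g. < becomes \x3C.
                return "\\x" + ch.charCodeAt(0).toString(16).toUpperCase();
        }
    });
}

var attack = "</script><script>alert(document.domain)</script>";
var escaped = jsEscape(attack);

console.log(escaped);
// No raw angle brackets are left for the HTML parser to find.
console.log(escaped.indexOf("<") === -1 && escaped.indexOf(">") === -1);
```

The first line printed should match the escaped string shown above, and the second should be true.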

Congratulations!  That gets you 99% of the way.  But suppose I were to use my javascript variable to generate more html.  For example:

<div id="mydiv"> </div>
<script>
var bug="\x3C/script\x3E\x3Cscript\x3Ealert(document.domain)\x3C/script\x3E";
document.getElementById('mydiv').innerHTML=bug;
</script>

The script above re-introduces the injection vulnerability because it takes our nice clean string and tells the browser it is html.  Before doing that we need to again convert the string into a safe format.

Since we’re telling the browser the data is html, an html entity encoding is appropriate here.  You would have to implement an html entity function in javascript, because it doesn’t have a built-in one.  If we named our function html_entities() then it would be safe to do something like the following:

<div id="mydiv"> </div>
<script>
var bug="\x3C/script\x3E\x3Cscript\x3Ealert(document.domain)\x3C/script\x3E";
document.getElementById('mydiv').innerHTML=html_entities(bug);
</script>
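The html_entities() function used above is not defined anywhere in this article.  Here is a minimal sketch of one way to write it, assuming that &, <, >, and the two quote characters are the ones worth encoding.  The character map and the choice of &#39; for the single quote are my assumptions, not a built-in javascript API.

```javascript
// Minimal sketch of the html_entities() helper assumed in the example above.
function html_entities(str) {
    // Characters that are significant in HTML text and attribute values.
    var map = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        '"': "&quot;",
        "'": "&#39;"
    };
    // Each character is looked up exactly once, so an ampersand produced by
    // one replacement is never encoded a second time.
    return String(str).replace(/[&<>"']/g, function (ch) {
        return map[ch];
    });
}

var bug = "\x3C/script\x3E\x3Cscript\x3Ealert(document.domain)\x3C/script\x3E";
console.log(html_entities(bug));
// prints: &lt;/script&gt;&lt;script&gt;alert(document.domain)&lt;/script&gt;
```

Doing the replacement in a single pass avoids the ordering problem you get with chained replace() calls, where the ampersand has to be handled first.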

However, an alternative technique, and my preferred one, is to use jQuery and its built-in .text() method, which tells the browser to inject plain text rather than html.

Example:

<div id="mydiv"> </div>
<script>
var bug="\x3C/script\x3E\x3Cscript\x3Ealert(document.domain)\x3C/script\x3E";
$('#mydiv').text(bug);
</script>

Summary

Any time you are sending data between layers you need to examine the format of the data that the destination is expecting and translate your data as appropriate.  The following table summarizes the built-in php functions you can use to prepare your data for use.

Destination                  PHP function to encode with
Database                     addslashes(), mysql_real_escape_string()
HTML                         htmlentities()
Javascript string variable   No built-in function; see the js() function in this article
Command line                 escapeshellarg()

On a related note, if you are passing user-supplied data on the command line, don’t!  Inside a single-quoted shell string there is no way to escape a single quote character; the only option is to close the quotes, add an escaped quote, and reopen them, which is easy to get wrong.  Instead, find a way to pipe your data into your program rather than using the command line.  It’s much safer and will allow you to work with single quotes.  If you insist on passing command line variables, see php’s escapeshellarg() function.
