wonko.com

Hi! I'm Ryan Grove: Sorcerer at SmugMug, lover of movies, eater of pie, connoisseur of awesome.

Posts tagged with “security”

Why loading JavaScript over SSL from a third-party CDN is a bad idea

Let’s say you have a website at https://buygadgets.example.com. Users shop for shiny gadgets on your website and then enter their credit card numbers to buy them.

Because you value the security and privacy of your users, you use SSL for all traffic. You paid top dollar for an SSL certificate signed by one of the most trusted certificate authorities in the world, so your users can always be certain that they’re communicating with your website and not some other site pretending to be yours.

To build your site, you used an open source JavaScript library called FooLib. FooLib is awesome, and it’s backed by FooCo, one of the world’s largest and most trusted technology companies. This company even provides hosted versions of FooLib on their super fast content delivery network (CDN), so that anyone who wants to can link to http://cdn.foolib.com/foo.js instead of having to host the JavaScript on their own servers.

Because browsers display a warning when you serve a page that has a mix of HTTP and HTTPS content, you want to serve FooLib over SSL. Nobody wants to annoy their users with scary security warnings. Luckily, FooCo’s CDN supports SSL! You can just load https://cdn.foolib.com/foo.js, and now your users don’t see that pesky security warning anymore.

Unfortunately, you’re now deceiving your users, and that fancy SSL certificate you bought from the world’s most trusted CA is worthless.

Why? Because you’re letting FooCo execute any JavaScript it wants on your website. You’re loading that JavaScript securely over SSL, so the browser isn’t displaying any scary warnings, but now your users aren’t just communicating with buygadgets.example.com. Now they’re also communicating with cdn.foolib.com, and since cdn.foolib.com can run JavaScript on your pages, they can also see any information the user reads or enters on those pages.

“But why would FooCo do something like that?” you ask. “After all, their motto is `Don’t be naughty`!”

Of course FooCo would never do that. They’re a solid, upstanding, trustworthy company with nothing to gain from stealing credit card numbers. They’re providing a valuable service to the community, and they genuinely do it out of the goodness of their hearts.

But you’re still deceiving your users.

Your SSL certificate says to the user “Hey, you’re safe. It’s only you and me talking here, and nobody else can decrypt our communications. And you can rest assured that I’m really who you think I am, because this trustworthy CA says so.”

But when you load FooLib from FooCo’s CDN, you’re silently inviting FooCo into that conversation as well. FooCo has their own SSL certificate, which is also signed by a trustworthy CA, but your user doesn’t want to share their information with FooCo. They want to share their information with you. By inviting FooCo into this confidential conversation without even telling your user that you’ve done it, you’re breaking the contract that was implied by your site’s SSL certificate and by the soothing lock icon in the browser’s location bar.

The user thinks they’re only telling you their secrets, but they’re also telling FooCo their secrets. And that’s not cool.

If user trust is important to you, you shouldn’t load JavaScript from third-party CDNs on secure pages. Not even if those CDNs support SSL, and not even if they’re run by the world’s most trustworthy companies.

Now, let’s be realistic: security is always a tradeoff. You trade some convenience for some security. Some websites are willing to share their users’ private information with FooCo, because it’s more convenient to do that than to host a JavaScript library locally. That’s a decision you have to make for yourself.

But as a user, I can tell you that if I found out that a company I trusted was silently making my private information available to some other company without my knowledge, all while making me think they were keeping this information confidential, I’d be pretty pissed. Even if no harm came from it, and even if it was done with the best intentions, I’d consider it a violation of my trust. And I don’t like companies that violate my trust.

Obligatory disclaimer: I work on the YUI JavaScript library at Yahoo!, but these are my own opinions. Nothing in this post should be construed as representing the views of my employer or anyone else on the YUI team.

There's more to HTML escaping than &, <, >, and "

A few days ago I tweeted:

"If I had a dollar for every HTML escaper that only escapes &, <, >, and ", I'd have $0. Because my account would've been pwned via XSS."

This was exaggeration for effect—there aren’t many cases where a simple XSS injection could actually empty a bank account—but I wanted to make a point.

By some coincidence, I’ve found myself working with various open source projects recently that take a half-assed approach to HTML escaping. It’s something that tends to be implemented as an afterthought, which is unfortunate because it can be critical for the security of users of these projects. I won’t name any names in this post (pull requests are forthcoming), but I will explain some of the common problems I’ve seen, why they’re problems, and what can be done to fix them.

This post is not an introduction to HTML escaping. It assumes that you already know what HTML escaping is and why it’s necessary. This post also is not a comprehensive catalog of XSS vectors; the examples here are illustrative, but they certainly aren’t the only attacks you need to worry about. The intent of this post is to explain some dangers that you may not be aware of, and to encourage you to read more about them and write safer code.

Note that this post only discusses escaping, which is entirely different from (and far less complicated than) sanitizing. HTML sanitization is a topic for another time.

Escaping < and > isn’t enough

The worst HTML escaper I’ve seen in a major open source project only escapes the < and > characters. This may actually be worse than not escaping anything at all, since it gives the illusion of security, but is trivial to defeat.

For example, let’s say I have the following template, and I’m going to replace the placeholder values, indicated in [square brackets], with HTML-escaped user input:

<a href="/user/[username]">[username]</a>

The attacker enters foo" onmouseover="alert(1) as their username. End result, even after escaping:

<a href="/user/foo" onmouseover="alert(1)">foo" onmouseover="alert(1)</a>

Because the " character wasn’t escaped and the attacker’s input was used in an attribute value, the attacker was able to inject arbitrary attributes and therefore JavaScript (which, in a real XSS attack, would probably be something more harmful than an alert).

This is a classic example of making input safer in one context—in this case, as the content of an <a> element—without considering the other contexts in which it’s likely to be used, such as inside an attribute value.

Escaping &, <, >, and " isn’t enough

The characters &, <, >, and " are the ones most commonly targeted by HTML escaper implementations. This seems to be the minimum set of characters that people think need to be escaped. Unfortunately, it’s still not safe if you don’t have complete control over where the escaped values will be used.

Consider the following template, in which the template author has used single-quoted attribute values:

<a href='/user/[username]'>[username]</a>

This is exploitable using the same attack as the previous example, but with single quotes instead of double quotes: foo' onmouseover='alert(1):

<a href='/user/foo' onmouseover='alert(1)'>foo' onmouseover='alert(1)</a>

You may be saying, “But I always use double quotes to quote attribute values!” Are you also the only person who will ever use your HTML escaper? And are you immune to typos?

Escaping &, <, >, ", and ' isn't enough

This is the character set used by PHP’s ubiquitous htmlspecialchars function, and as you may have guessed, it still falls down on attribute values for two reasons.

First, as Hacker News users DanBlake and nbpoole pointed out in a discussion of this blog post, Internet Explorer treats ` as an attribute delimiter. It may be an edge case, but it’s still a potential attack vector, so ` needs to be escaped too.

Second, HTML also allows attribute values to be completely unquoted. Believe it or not, unquoted attribute values are fairly popular (some people are too lazy to quote them, others are performance zealots who can’t bear the thought of wasting those extra bytes).

Unquoted attribute values are one of the single biggest XSS vectors there is. If you don’t quote your attribute values, you’re essentially leaving the door wide open for naughty people to inject naughty things into your HTML. Very few escaper implementations cover all the edge cases necessary to prevent unquoted attribute values from becoming XSS vectors.
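As a rough sketch of the problem, here is a hypothetical Ruby escaper (escape_five is a made-up name, not a real library function) that handles the same five characters, applied to a template with an unquoted attribute value. The payload contains none of those five characters, so it passes through untouched:

```ruby
# Hypothetical escaper covering only &, <, >, ", and '.
FIVE_CHARS = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

def escape_five(text)
  text.to_s.gsub(/[&<>"']/, FIVE_CHARS)
end

# Template with an unquoted attribute value, using the same [placeholder]
# convention as the earlier examples.
template = '<a href=/user/[username]>[username]</a>'
username = 'foo onmouseover=alert(1)'

html = template.gsub('[username]') { escape_five(username) }
puts html
```

The space in the input ends the href value, so the browser reads onmouseover=alert(1) as a brand new attribute even though every "dangerous" character was escaped.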

Escaping &, <, >, ", ', `, , !, @, $, %, (, ), =, +, {, }, [, and ] is almost enough

All those characters up there (including the space character!) can be used to break out of an unquoted HTML attribute value. If you escape every last one of them, then you’re probably pretty close to being safe. But you’re still not so safe that you can just start throwing around user input willy nilly.

Why? Because this still doesn’t cover some context-specific cases like inserting user input into the body of an inline <script> element or using user input as part of a URL.
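For illustration only, an escaper for the character set in the heading above might look something like this in Ruby. The name escape_html_strict and the choice of hex entities are assumptions made for this sketch, not something taken from an existing library:

```ruby
# The characters listed in the heading above, including the space character.
UNSAFE_CHARS = /[&<>"'` !@$%()=+{}\[\]]/

# Hypothetical helper: replaces each listed character with a hex entity.
def escape_html_strict(text)
  text.to_s.gsub(UNSAFE_CHARS) { |char| format('&#x%X;', char.ord) }
end

puts escape_html_strict('foo onmouseover=alert(1)')
```

With the space, =, (, and ) all turned into entities, the payload can no longer end an unquoted attribute value. It still does nothing for input that lands inside a <script> element or a URL.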

Context is key

If you haven’t figured it out already, the primary message I’m trying to convey here is that you must be aware of the context in which you’re working with user input. Some contexts are more susceptible to attack than others, and there’s no single magic escaping bullet that will protect you or your users in all cases.

In other words, you don’t need to escape everything all the time, but you do need to escape everything that’s important in the particular contexts in which you’re displaying user input.
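Here is a small sketch of what that can look like in Ruby, using the standard library's ERB::Util helpers. The user_link helper and the /user/ path are hypothetical. The same username is escaped one way for the URL path and another way for the visible text, and the attribute value is always double-quoted:

```ruby
require 'erb'

# Hypothetical helper: builds a profile link from untrusted input.
def user_link(username)
  # URL context: percent-encode the value before it becomes a path segment.
  path_segment = ERB::Util.url_encode(username)

  # Attribute context: HTML-escape the finished URL.
  href = ERB::Util.html_escape("/user/#{path_segment}")

  # Element content context: HTML-escape the raw value.
  label = ERB::Util.html_escape(username)

  %(<a href="#{href}">#{label}</a>)
end

puts user_link('a b"/<x>')
```

This only covers two contexts. Input that ends up in an inline script, a style attribute, or a complete URL (where the scheme matters) needs its own handling.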

But there’s still one more wrench to throw into the works…

Always specify a charset, or UTF-7 will eat your face

Even if you do everything else right, serving a page that doesn’t explicitly specify a character set can leave Internet Explorer users open to XSS, thanks to the way IE sniffs out the charset when it isn’t specified.

If an attacker is able to get your page to echo back something that looks like UTF-7 encoding early enough in the page, he may be able to trick IE into rendering the page using UTF-7. This could turn the following seemingly harmless input…

+ADw-script+AD4-alert(1)+ADw-/script+AD4-

…into something potentially harmful:

<script>alert(1)</script>
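To make the transformation concrete, here is a deliberately incomplete Ruby sketch of the decoding step. It is not a real UTF-7 implementation and decode_utf7_runs is a made-up name. It only handles "+...-" runs, which UTF-7 uses to carry base64-encoded UTF-16 text:

```ruby
# Rough, incomplete UTF-7 decoder: handles only "+<base64>-" runs.
def decode_utf7_runs(text)
  text.gsub(%r{\+([A-Za-z0-9+/]*)-}) do
    run = Regexp.last_match(1)
    next '+' if run.empty? # "+-" stands for a literal plus sign

    # Pad the base64 run, decode it, and read the bytes as UTF-16BE.
    padded = run + '=' * ((4 - run.length % 4) % 4)
    padded.unpack1('m').force_encoding('UTF-16BE').encode('UTF-8')
  end
end

input = '+ADw-script+AD4-alert(1)+ADw-/script+AD4-'
puts decode_utf7_runs(input)
```

Nothing in the input is a character that a typical HTML escaper touches, which is why escaping alone can't stop this attack.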

I recommend specifying a UTF-8 charset in both the Content-Type HTTP response header and a <meta> tag, since it’s easy for one or the other to get switched off or omitted inadvertently as a codebase ages (this has happened to me).
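A minimal sketch of doing both, assuming a Rack-style response triple. The html_response helper is hypothetical and Rack itself isn't required to run it. Both the header and the <meta> tag are built from one constant, so they can't drift apart:

```ruby
CHARSET = 'utf-8'

# Hypothetical helper returning a Rack-style [status, headers, body] triple.
# Both arguments are assumed to be HTML-escaped already.
def html_response(escaped_title, escaped_body)
  page = <<~HTML
    <!DOCTYPE html>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=#{CHARSET}">
    <title>#{escaped_title}</title>
    </head>
    <body>#{escaped_body}</body>
    </html>
  HTML

  [200, { 'Content-Type' => "text/html; charset=#{CHARSET}" }, [page]]
end

status, headers, body = html_response('Hello', '<p>Hi</p>')
puts headers['Content-Type']
```

The <meta> tag comes first inside <head>, ahead of the title, so the declaration appears before anything an attacker might influence.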

Further reading

As I mentioned in the disclaimer at the top of this post, this is not a comprehensive reference of all the things that can go wrong with HTML escaping. It’s not even a guide. It’s more of a tip-of-the-iceberg preview. Please don’t assume that, having read this post, you now know everything there is to know about HTML escaping. I can guarantee that you don’t, because I don’t.

I learned a lot from the following sources, and I highly recommend them if you’re interested in learning more:

Sanitize 2.0.0 released

Version 2.0.0 of Sanitize, my whitelist-based HTML filtering library for Ruby, is now available. This release includes several new features and some changes to existing features. I’ll cover the big stuff in this blog post; for the complete list of changes, see the HISTORY.md file.

Installing

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize is fully compatible with Ruby 1.8.7, 1.9.1 and 1.9.2.

Transformers

The most significant change in this release is that Sanitize’s core filtering logic is now implemented entirely as a set of always-on transformers. This simplifies the core code and means that Sanitize itself is now built on the same powerful transformer architecture that you can use in your own apps to enhance or alter Sanitize’s functionality.

The environment object provided as input to transformers now contains a slightly different set of data, and transformer output has been simplified. Transformers are no longer required to return anything, and are expected to make any desired alterations directly to the current node and/or document.

Sanitize now has the ability to traverse the document and execute transformers using either depth-first traversal (the default behavior, same as before) or breadth-first traversal (new in 2.0.0). If necessary, you can even run one set of transformers using one traversal method and another using the other method. This allows for greater flexibility and less complexity when writing certain types of transformers.

The README has more details on these changes and new features.
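As a rough idea of what a transformer in this style might look like, the sketch below changes the node in place and returns nothing useful. It assumes the environment still exposes the node under :node, as in the 1.2.0 example further down this page. FakeNode is a made-up stand-in for Nokogiri::XML::Node so the sketch runs without any gems installed:

```ruby
# Made-up stand-in for Nokogiri::XML::Node, just enough for this sketch.
class FakeNode
  attr_accessor :name

  def initialize(name, attrs = {})
    @name  = name
    @attrs = attrs
  end

  def [](key)
    @attrs[key]
  end

  def []=(key, value)
    @attrs[key] = value
  end
end

# Transformer sketch: adds rel="nofollow" to every <a> element by altering
# the node directly. The return value carries no instructions.
add_nofollow = lambda do |env|
  node = env[:node]
  return nil unless node.name.to_s.downcase == 'a'

  node['rel'] = 'nofollow'
  nil
end

link = FakeNode.new('a', 'href' => '/about')
add_nofollow.call({ :node => link })
puts link['rel']

# With the real library this would presumably be registered through the
# :transformers setting, e.g.:
#   Sanitize.clean(html, :transformers => [add_nofollow])
```

Whether the added attribute survives the rest of the filtering depends on the whitelist in effect, so rel would need to be allowed on <a> elements. Check the README for the exact environment keys and config names in 2.0.0.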

Other notable changes

  • Sanitize now outputs HTML4/HTML5 markup by default instead of XHTML (e.g., <img src="foo.jpg"> instead of <img src="foo.jpg" />, etc.). If you prefer the old behavior, you can set the :output config to :xhtml.
  • Some new elements and attributes (including several HTML5 elements) have been added to the built-in basic and relaxed whitelists. See HISTORY.md for the complete list.
  • Elements like <br>, <p>, and others are now replaced with whitespace when they’re removed in order to preserve the readability of the remaining text content. The list of elements that will be replaced with whitespace when removed is configurable using the :whitespace_elements setting.

Be aware that if you expect specific output from Sanitize in your unit tests, you may need to update your tests. The HTML output from this release may not precisely match the output from previous releases.

Try it out, report bugs

As always, you can try out Sanitize’s built-in filters using the test page at sanitize.pieisgood.org. Please use Sanitize’s GitHub issue tracker to report bugs and file feature requests.

Privileges, rights, and the slippery slopes that surround them

The most common argument I see in support of the TSA’s authority to require passengers to undergo screenings that would, in other circumstances, be considered a violation of their Constitutional rights, is that traveling by air is a privilege and not a right. Since travelers choose to travel by air and are not required to do so, it’s legal for the TSA to require them to submit to X-rays, pat downs, and any other screening procedures the TSA deems necessary before they may board a plane.

This is a good argument. In fact, it’s so good that it has allowed the TSA (and its privately operated predecessors) to operate for years in the face of frequent legal challenges and protests.

It’s also a dangerous argument: it sneakily undermines citizens’ Constitutional rights by declaring those rights to exist only within certain boundaries—boundaries that are defined by the government from whom those rights are supposed to protect us, and defined in a way that bypasses the checks and balances required by the Constitution.

Flying was once a luxury, in the same way that traveling by automobile was once a luxury. Fifty years ago, few people could claim that their livelihood or their well-being required them to travel by air. One hundred years ago, the same was true of cars.

Today, though, air travel is commonplace. Many people have jobs that require them—require them—to travel frequently across long distances. In a large country like the United States, without any ground-based high speed transit infrastructure to speak of, flying is often the only realistic option. A trip that takes hours by plane may take days by train or by car.

To a casual traveler who objects to being X-rayed or patted down, this may seem like a relatively easy choice. Take fewer trips, or budget more time and turn them into road trips. To someone who is required to travel as a condition of their employment, this choice isn’t so easy. Budgeting time for road trips isn’t likely to be an option. This person is faced with a more difficult choice: they can forfeit their civil liberties or they can find a new job.

It would be unwise to assume that the TSA will be content to limit their screenings to air travel. With a strong foothold in all of the nation’s airports, it’s only a matter of time before the TSA asserts the authority to require screenings for rail travel. If things continue in this vein, we’ll see mandatory TSA checkpoints on major interstate highways within the next thirty years (the justification: drivers don’t have to take the interstate).

One thing history has taught us is that people who willingly give up their freedoms rarely have an easy time getting them back. Nobody wants to make it easy for evildoers to smuggle weapons onto airplanes, but we need to find better solutions than the ones the TSA claims are necessary, and these solutions need to be balanced with the actual risks they’re intended to guard against.

Freedom makes a terrible currency. Using freedom to buy safety only leads to bankruptcy.

"No thanks, I've already had cancer, just feel me up or whatever."

My former coworker Isaac Schlueter is flying today and brought some copies of the UCSF letter with him. While waiting in line at the TSA checkpoint, he struck up a conversation with the family behind him:

Turns out she’s a breast cancer survivor. And her doctor has told her to avoid x-rays, even at the dentist, unless absolutely medically necessary. And she didn’t realize that “millimeter wave digital backscatter detection” used x-rays, because the TSA doesn’t actually put that on the sign.

She did the rest.

When we got to the scanner, I opted out. Then they opted out. She’d already convinced the family behind them to do the same. Her response to the TSA agent was awesome, I wish I’d thought of it:

“Ma’am, please step over here.”

“No thanks, I’ve already had cancer, just feel me up or whatever.”

After the first 4 “OPT-OUT” calls, they just passed us all through the regular metal detector. No one got groped.

Internets, please do more things like this.

Why I will avoid flying from now on

I just got back from a trip to Sunnyvale for YUIConf 2010.

Since I work remotely, I typically fly to California every few months for a week in the office. I’ve always had decent luck at not getting selected for “enhanced” screening by the TSA, and this trip was no different. I managed to slip through unscathed, but only barely.

The Terminal B checkpoint at SJC now has a backscatter machine next to each metal detector. Of the two lines that were active when I went through security yesterday evening, one backscatter machine was in active use and another was roped off. I managed to get myself into the line with the roped-off machine, but even so, TSA agents were selecting people from both lines for additional screening.

I wasn’t selected, but as I was putting my shoes back on inside the checkpoint, I noticed that a woman behind me (blonde, wearing a spandex sports bra and shorts, with her midriff exposed) who had already gone through either the metal detector or the backscatter machine (I didn’t see which) had been selected for further screening by a male TSA agent.

She didn’t seem to be objecting, and I didn’t stick around to see what happened, but unless this woman was hiding explosives literally inside her body, this additional screening didn’t make anything safer for anyone.

Until today, I was under the impression that if I were to refuse both a backscatter scan and a pat-down, I would still be free to leave the airport and find another mode of travel. Then I saw this well-documented blog post containing both video (albeit pointed at the ceiling) and audio of a man attempting to do just that and being detained and then threatened with fines and a civil suit simply for trying to leave the airport after refusing to allow a TSA agent to give him a pat-down.

That changes the entire game. Now, merely entering the line at a security checkpoint apparently makes you a prisoner of the TSA. You either do whatever they tell you to, even if it means allowing them to shoot x-rays at you and scrutinize your genitalia, or you pay a $10,000 fine and fight a legal battle.

Sunnyvale isn’t far enough from Portland that I’m willing to give up both my civil liberties and my dignity to get there. I’ll drive from now on.

Sanitize 1.2.0 released

Version 1.2.0 of Sanitize, my whitelist-based HTML sanitizing library for Ruby, is now available. Consult the HISTORY file for a complete list of changes.

Introducing Transformers

This release adds a major new feature called transformers. Transformers allow you to filter and alter HTML nodes using your own custom logic, on top of (or instead of) Sanitize’s core filter. A transformer is any Ruby object that responds to call() (such as a lambda or proc) and returns either nil or a Hash containing certain optional response values.

To use one or more transformers, pass them to the :transformers config setting:

Sanitize.clean(html, :transformers => [transformer_one, transformer_two])

Input

Each registered transformer’s call() method will be called once for each element node in the HTML, and will receive as an argument an environment Hash that contains Sanitize config information and a reference to a Nokogiri::XML::Node object.

The transformer has full access to the Nokogiri::XML::Node that’s passed into it and to the rest of the document via the node’s document() method. Any changes will be reflected instantly in the document and passed on to subsequently-called transformers and to Sanitize itself. A transformer may even call Sanitize internally to perform custom sanitization if needed.

Transformers have a tremendous amount of power, including the power to completely bypass Sanitize’s built-in filtering.

Output

A transformer may return either nil or a Hash. A return value of nil indicates that the transformer does not wish to act on the current node in any way. A returned Hash may contain instructions that tell Sanitize to whitelist certain attributes or nodes, or to replace the current node with a new node (see the README for specifics).

Example: Transformer to whitelist YouTube video embeds

The following example demonstrates how to create a Sanitize transformer that will safely whitelist valid YouTube video embeds without having to blindly allow other kinds of embedded content, which would be the case if you tried to do this by just whitelisting all <object>, <embed>, and <param> elements:

lambda do |env|
  node      = env[:node]
  node_name = node.name.to_s.downcase
  parent    = node.parent

  # Since the transformer receives the deepest nodes first, we look for a
  # <param> element or an <embed> element whose parent is an <object>.
  return nil unless (node_name == 'param' || node_name == 'embed') &&
      parent.name.to_s.downcase == 'object'

  if node_name == 'param'
    # Quick XPath search to find the <param> node that contains the video URL.
    return nil unless movie_node = parent.search('param[@name="movie"]')[0]
    url = movie_node['value']
  else
    # Since this is an <embed>, the video URL is in the "src" attribute. No
    # extra work needed.
    url = node['src']
  end

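  # Verify that the video URL is actually a valid YouTube video URL. The
  # pattern is anchored with \A rather than ^, because ^ matches at the start
  # of any line in Ruby and a multi-line value could otherwise slip through.
  return nil unless url =~ /\Ahttp:\/\/(?:www\.)?youtube\.com\/v\//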

  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  Sanitize.clean_node!(parent, {
    :elements   => ['embed', 'object', 'param'],
    :attributes => {
      'embed'  => ['allowfullscreen', 'allowscriptaccess', 'height', 'src', 'type', 'width'],
      'object' => ['height', 'width'],
      'param'  => ['name', 'value']
    }
  })

  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node (<param> or <embed>) and its parent
  # (<object>).
  {:whitelist_nodes => [node, parent]}
end

For more details on transformers, consult the README.

Installing

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize 1.1.0 released

Sanitize 1.1.0 is now available. The biggest change in this release is a migration from Hpricot to Nokogiri, contributed by Adam Hooper. In addition, a new :output config setting allows you to specify whether you want Sanitize to output XHTML (the default) or HTML4, and Peter Cooper contributed a fix for a bug in which Sanitize would incorrectly strip a whitelisted URL if a path segment contained a colon.

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize 1.0.6 released

Sanitize 1.0.6 is now available. This release brings minor bug fixes and a new feature: you can now specify the symbol :all in place of an element name in the attributes config hash to allow certain attributes on all elements. This is useful if you want to allow all elements to have a class attribute, for example. Thanks to Mutwin Kraus for the patch.
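A config using :all might look something like the sketch below. The element list is arbitrary, and permitted_attributes is a hypothetical helper written for this illustration (it is not part of Sanitize) that shows which attributes such a config would allow on a given element:

```ruby
config = {
  :elements   => ['a', 'p', 'span'],
  :attributes => {
    :all => ['class'], # allowed on every whitelisted element
    'a'  => ['href']   # allowed only on <a>
  }
}

# Hypothetical helper (not part of Sanitize): lists the attributes a config
# like the one above would permit on a given element.
def permitted_attributes(config, element_name)
  rules = config[:attributes] || {}
  (Array(rules[:all]) + Array(rules[element_name])).uniq
end

puts permitted_attributes(config, 'a').inspect
puts permitted_attributes(config, 'p').inspect
```

Without :all, the class attribute would have to be repeated in the list for every element.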

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize 1.0.5 released with a security fix

Sanitize 1.0.5 fixes a bug introduced in version 1.0.3 that prevented non-whitelisted protocols from being cleaned when relative URLs were allowed. Upgrading is strongly recommended.

The DEFAULT and RESTRICTED configs in previous versions of Sanitize are not vulnerable to this bug. The BASIC and RELAXED configs, as well as any custom config that allows relative URLs, are vulnerable in versions 1.0.3 and 1.0.4.

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Thanks to Dev Purkayastha for reporting this issue and submitting additional test cases.