Merry Christmas, Internets! My gift to you this year is Sanitize, a whitelist-based HTML sanitizer written in Ruby. Given a list of acceptable elements and attributes, Sanitize will remove all unacceptable HTML from a string.
Using a simple configuration syntax, you can tell Sanitize to allow certain elements, certain attributes within those elements, and even certain URL protocols within attributes that contain URLs. Any HTML elements or attributes that you don’t explicitly allow will be removed.
Because it’s based on Hpricot, a full-fledged HTML parser, rather than a bunch of fragile regular expressions, Sanitize has no trouble dealing with malformed or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of caution.
Using Sanitize is easy. First, install it:
sudo gem install sanitize
Then call it like so:
require 'rubygems'
require 'sanitize'

html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'

Sanitize.clean(html) # => 'foo'
By default, Sanitize removes all HTML. You can use one of the built-in configs to tell Sanitize to allow certain attributes and elements:
Sanitize.clean(html, Sanitize::Config::RESTRICTED)
# => '<b>foo</b>'

Sanitize.clean(html, Sanitize::Config::BASIC)
# => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'

Sanitize.clean(html, Sanitize::Config::RELAXED)
# => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
Or, if you’d like more control over what’s allowed, you can provide your own custom configuration:
Sanitize.clean(html, :elements => ['a', 'span'],
:attributes => {'a' => ['href', 'title'], 'span' => ['class']},
:protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
For more details, see the Sanitize Documentation.
Comments
Cool plugin. but one little issue i'm having.
I appreciate you donating this code to the open source community. I have one small issue. The plugin works great except with relative links. I tried adding “/” to the protocols but it does not seem to work. Any advice would be useful….
Re: Cool plugin. but one little issue i'm having.
Good catch, Johnny. I’ve pushed a change to the git repo that adds support for relative URLs. With this change, you can allow relative URLs by including the special value :relative in a protocol config array, like so:
:protocols => { 'a' => {'href' => ['http', 'https', :relative]} }
The Basic and Relaxed configs have also been updated to allow relative URLs.
Thank you!
This looks very cool. I’ll definitely be using it in my next project. It’s very easy to use. Thanks and have a Happy New Year!
Dealing with entities
Hello and happy new year !
Thanx for this gem.
How can I deal with html entities ? Each one is replaced by “?” character :
>> Sanitize.clean(‘& eacute ;’)
=> “?”
Happy new year
using it with rails
thanks for the gem, it is awesome.
is there any other way to use this apart from installing it as a gem in the machine, if i want to use it in my rails application?? maybe like a plugin or something???
Re: Dealing with entities
This appears to be a bug in Hpricot. I’ve pushed a workaround to the git repo. Thanks for the report!
Re: using it with rails
I don’t use Rails myself, but you should be able to unpack the Sanitize gem (and its dependencies, Hpricot and HTMLEntities) into Rails’s vendor/gems directory. Here’s a nice howto guide.
Nokogiri
Ryan,
You might want to check out Nokogiri. …Nokogiri is faster, and less buggy than Hpricot… http://github.com/tenderlove/nokogiri/tree/master
(Nokogiri’s #inner_text will strip HTML.)
…might be worth getting your Sanitize lib to work with Nokogiri.
cheers
Re: Dealing with entities
Thank you Ryan, it’s working very well.
Sanitizers considered harmful
I am nothing if not skeptical about sanitization that does not involve a full pass through a real browser instance. The mother of all HTML sanitization hacks was probably Samy’s profile hack on MySpace.
Here is an explanation of what he did. My question is, will Sanitize correctly filter out his hack (and each of the steps we took to get there)? And second to that, shouldn’t it be part of the test cases, as it is about as pathological as you’ll get?
I’m pretty sure that the answer is going to be “no”. Sanitize is going to only filter out elements and attributes. The problem is that there are many, many ways to hide malicious things in the markup that are going to defeat this sanitizer. And to be honest, a bad sanitizer is probably worse than no sanitizer.
The problem is that you need to take into account all the ways that browsers really mess things up. I notice that you’re testing for entities in place of the colon to hide javascript:, but you have no test for an entity in place of any other character. Given how and when entities are handled, it’s reasonable for a browser to interpret an entity-encoded javascript: like javascript:. In fact, I believe that naive email address obfuscation scripts do this (you know, those scripts that try to hide your address from spam bots).
More notably, some browsers will actually interpret the string java\nscript: as javascript:; they probably shouldn’t ignore the newline, but some do (and given Samy’s success, I’ll bet it’s IE that does).
Before anyone uses this library in production, Ryan really needs to beef up his test cases. I also think you’re going to need to add something like JSLint, which can be set to use a safe subset of JavaScript. Now, JSLint is written in JavaScript, so you’ll either need to find a JS engine to work with or you’ll have to rewrite the code in Ruby; that shouldn’t be impossible, as Doug Crockford has described the methods he wrote to create JSLint.
I would also like to see a “beat the sanitizer” website set up where people can test the sanitizer against malicious code and send failure reports to Ryan (who would then fix the sanitizer and expand the test coverage accordingly). Mind you, you’d have to be careful with what you do with known malicious code that bypasses your sanitizer, since someone might send you truly malicious code, not just something that could be exploited to deliver malicious code.
Re: Sanitizers considered harmful
I appreciate your skepticism, Adam, and I welcome test cases that will help me improve Sanitize. However, it’s irresponsible of you to make accusations based purely on speculation. If you think you have a way to break Sanitize, test it. If it works, let me know so I can fix it. Don’t guess, and don’t make accusations based on guesses.
You seem to be under the misconception that Sanitize is intended to make it safe to include CSS and JavaScript in your HTML. It isn’t. Sanitize is intended only to clean HTML. If you tell Sanitize to allow elements (such as <script>) or attributes (such as style) that allow code execution, you’re taking your safety into your own hands and should definitely look into AdSafe, Caja, or other sandboxing tools. By default, Sanitize strips all elements and attributes, and none of the included configurations allows unsafe elements or attributes.
For the record, here’s the result of running a string containing the Samy worm through each of Sanitize’s built-in configurations:
And here’s the result of each of the javascript: variations you proposed, which aren’t tested for in the Sanitize unit tests because the only character that matters to Sanitize’s protocol-filtering code is the colon. As long as the colon is recognized correctly, the protocol will be sanitized properly:
s = Sanitize.new(Sanitize::Config::RELAXED)
s.clean("<a href=\"javascript:alert('hi')\">foo</a>") # => "<a>foo</a>"
s.clean("<a href=\"java\nscript:alert('hi')\">foo</a>") # => "<a>foo</a>"
Remember, Sanitize is based on a whitelist, not a blacklist. You don’t need to tell it what to block, you only need to tell it what to allow. When Sanitize checks for a valid protocol, it doesn’t look for variations of javascript: that need to be filtered out. It looks for a : character and then ensures that anything preceding it is in the protocol whitelist.
Again, skepticism is always healthy, and I thank you for that, but speculation without experimentation is useless and can result in harmful misinformation.
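The whitelist idea described above can be sketched in a few lines of plain Ruby. This is a simplified illustration and not Sanitize’s actual code: the helper name protocol_allowed? and the constant ALLOWED_PROTOCOLS are hypothetical, and the sketch assumes the colon appears literally (it does not handle entity-encoded colons).

```ruby
# Simplified sketch of whitelist-based protocol checking.
# NOT Sanitize's real implementation; names here are hypothetical.
ALLOWED_PROTOCOLS = ['http', 'https', 'mailto', :relative].freeze

def protocol_allowed?(url, allowed = ALLOWED_PROTOCOLS)
  # Split on the first colon; separator is "" when there is no colon.
  scheme, separator = url.partition(':')

  # No colon at all: treat the value as a relative URL.
  return allowed.include?(:relative) if separator.empty?

  # Whatever precedes the colon must match a whitelisted protocol exactly.
  allowed.include?(scheme.downcase)
end

puts protocol_allowed?('http://foo.com/')           # true
puts protocol_allowed?("javascript:alert('hi')")    # false
puts protocol_allowed?("java\nscript:alert('hi')")  # false
puts protocol_allowed?('/about')                    # true
```

Because the check only asks whether the text before the colon is in the list, obfuscated variants fail for the same reason plain javascript: does: they are not on the list. The sketch errs on the side of rejecting; for example, a relative URL that happens to contain a colon in its query string would be refused.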
Re: Sanitizers considered harmful
By the way, I couldn’t resist the challenge. You can now test Sanitize to your heart’s content on my very own server at http://sanitize.pieisgood.org. Do let me know if you discover anything saucy.
HTML 5
Very cool, Ryan; library looks good!
Quick question about some new behavior allowed by HTML 5; a minor change was made to allow any element (not just anchors) to include an HREF, such as divs and spans. How will Sanitize handle this?
I tried http://sanitize.pieisgood.org/ but it rejected all divs… didn’t delve much further yet, but may take another look later to see what I come up with.
But the question still remains: is this behavior that you’d like to whitelist by default (since it’s much like a simple link in a different tag)? That is, if you even do whitelist links.
Truly Wacky HTML
I had written a similar library to this for use scrubbing html emails down, and I ran across some truly weird HTML that I couldn’t get Hpricot to parse.
I ended up switching to Nokogiri, because it tossed anything that didn’t make sense. I ran the sample below through your tester and it came up with all the wacky attributes still intact, as did my own Hpricot parser.
Here’s a sample from one test case:
<table class="zarg" randomstuffhere background-image:url('http://images.webbuyersguide.com/newsletterimages/right_bg.gif') background-repeat:repeat-y>
… etc. etc.
Re: HTML 5
None of the built-in configs allows href attributes on elements other than <a>, but you can easily tell Sanitize to allow any attribute on any element you want:
html = '<div href="http://foo.com/">Foo</div>'

# Allow divs with href attributes containing relative URLs or HTTP/HTTPS URLs.
config = {
  :elements => ['div'],
  :attributes => {'div' => ['href']},
  :protocols => {'div' => {'href' => ['http', 'https', :relative]}}
}

Sanitize.clean(html, config) # => html (unmodified)
Sanitize doesn’t actually understand anything about the semantics of HTML other than what you tell it, so it won’t have any problem dealing with HTML 5.
As for your question about whether Sanitize will whitelist such things by default: nope, Sanitize will never ever whitelist anything by default, but as HTML evolves, the included configs will be updated to take such things into account.
Re: Truly Wacky HTML
Matt, when I run that example through Sanitize, it doesn’t leave it intact; it (correctly) entifies the markup, rendering it harmless but still displayable. This is one of Sanitize’s safety fallbacks when it encounters something it can’t parse.
Since the example is not even remotely valid HTML, I don’t think it’s fair to expect Hpricot (or any HTML parser) to be able to parse it. However, it is fair to expect that any worthwhile HTML sanitizer will at least sanitize it, which Sanitize does.
If you have any other wacky examples like that, I’d love to see what Sanitize does with them. In this case, though, I think it’s doing the right thing.
Rad!
This is pretty darn awesome.
Like your site, too. Nice work with the fonts and such things. :)
re: Nokogiri
While nokogiri might in some cases be faster than hpricot, there are also cases where nokogiri is exceptionally difficult to get working because of poor management of dependencies. I am sure that Ryan is aware of both nokogiri and hpricot, and that he’s made educated choices.
A whitelist-parser-based sanitizer is certainly a welcome addition to the toolbox. Thanks Ryan!
Sanitize rocks!
Thank you Ryan for your gift! Now i can parse web content using a few lines of code.
Happy new year!
Awesome
This is exactly what I was looking for. I’ll be adding it to my web site soon, to allow visitors to use a small subset of html to markup their entries. Thanks!
Oddity in parsing malformed href?
Hey, I’ve found what may be unintended behavior when parsing particularly malformed tags. Not sure if this is best approached via Sanitize or Hpricot, but here’s what I have…
Attempting to clean a malformed href tag NULLs the entire message when using a config that allows the anchor tag and allows protocols. (Basic, Relaxed.)
I’d expect the broken tag to be jettisoned, since without a properly-formed protocol reference, there’s essentially nothing there of interest — I’d just expect the accompanying text to be preserved.
Here’s a short irb transcript…
http://pastebin.com/f7a37ddb7
I’m seeing this with the 1.0.1 gem on Debian Etch, and have also verified that your sanitize.pieisgood.org interface behaves oddly with the above attempt: it produces a “500 Server internal error” message. Actually, this comment interface chokes on it too, necessitating the pastebin link. :)
Sanitize is great for my needs otherwise, by the way. I’ve got about 250k pieces (and growing) of wildly different user-submitted content that need to be sanitized and Sanitize performs exactly as desired on all but about 15 of them. All of those are related either to PEBCAK issues like the above, or to wacky Unicode strings I haven’t quite gotten my head around yet.
Thanks very much for your work!
Re: Oddity in parsing malformed href?
Thanks Daemian. That was indeed a bug in Sanitize. I’ve pushed a fix to the git repo. Please let me know if you discover anything else like that.
Re: Truly Wacky HTML
Ah, I didn’t realize that it had escaped the html. In my case, I need a library that strips invalid markup components, not one that escapes the whole tag. I certainly understand the rationale for escaping, but that’s not what I need :).
Nice work all the same!
clean! doesn't work as documented
The clean! method should return nil when no changes are required. However:
> Sanitize.clean!("<div id='myid' class='myclass' style='color:red'>hi</div>", :elements=>'div', :attributes=>{'div'=>'id class style'})
=> "<div class=\"myclass\" id=\"myid\" style=\"color:red\">hi</div>"
No changes are required but nil is not returned, instead the type of quotes and the order of the attributes have been modified.
This could be fixed by changing the comparison at the end of the clean! method from:
to
Re: clean! doesn't work as documented
Thanks Daniel. You’re right, the documentation in this case is misleading. It should say that clean! will return nil if no changes were made, not if no changes were necessary. I’ll include a fix in the next release.
sneaky html
Hi there!
First of all thx a lot for the gem, it’s been very useful for me!
However, I’ve just tried on the server you set up the following string:
‘’
And it seems that 2 of the logos appear…
uppsss
I didn’t want to put the images here!
So sorry!
It’s just the same IMG tag repeated 4 times. Somehow 2 of them appear!
Re: sneaky html
Thanks Cristobal, I’ll investigate and get a fix out as soon as possible.
In the future, please report things like this directly to me via email before disclosing them publicly so I have a chance to provide a fix before knowledge of the vulnerability is widespread.
Re: sneaky html
Sanitize 1.0.4 is now available via RubyGems with a fix for this issue.
form_for
I have a basic form:
<% form_for … do |f| %>
<%= Sanitize.clean(f.text_field :title,
Sanitize::Config::RESTRICTED) %>
<% end %>
This is incorrect, could anyone help me out.
How to specify insertion of nofollow rels?
I notice that Sanitize::Config::BASIC adds a rel=“nofollow” to links whereas Sanitize::Config::RELAXED doesn’t.
However looking at the Documentation for the two configs didn’t give me any clue how I could make my own config that would, e.g. be very similar to RELAXED but add the rel=“nofollow” attribute.
tips?
Re: How to specify insertion of nofollow rels?
The :add_attributes config param is what you’re looking for. It’s a Hash of element names, each of which is in turn a Hash of attribute names and values that should be added to all instances of that element.
Here’s the source of Sanitize::Config::BASIC so you can see how it’s done:
class Sanitize
  module Config
    BASIC = {
      :elements => [
        'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
        'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
        'sup', 'u', 'ul'],

      :attributes => {
        'a'          => ['href'],
        'blockquote' => ['cite'],
        'q'          => ['cite']
      },

      :add_attributes => {
        'a' => {'rel' => 'nofollow'}
      },

      :protocols => {
        'a'          => {'href' => ['ftp', 'http', 'https', 'mailto', :relative]},
        'blockquote' => {'cite' => ['http', 'https', :relative]},
        'q'          => {'cite' => ['http', 'https', :relative]}
      }
    }
  end
end
If you’d like a version of the RELAXED config that adds rel="nofollow" to links, this should do the trick:
config = Sanitize::Config::RELAXED.merge({:add_attributes => {'a' => {'rel' => 'nofollow'}}})
&apos; entity not expanded in IE6
I’ve just noticed that if I view Sanitize’s output in IE6, instances of the &apos; entity aren’t rendered as I’d expected. As it turns out, the &apos; entity is treated differently since it’s part of the XML spec. So the code would seem to be performing as specified, it’s just our (my) expectations that are off. This URL has more…
http://cssvault.com/blog/2007/10/17/internet-explorer-apos-feature/
I was all set to submit a patch for this behavior via github but grepping through sanitize’s source shows only one instance of the &apos; term (in a test), so I’m guessing this behavior might originate in hpricot. Let me know if that’s accurate and I’ll be happy to report this there and see if _why cares about IE6. (I have to, sadly, and once you start seeing ampersand entities instead of single quotes you realize just how outstandingly popular they are.)
Incidentally, looks like the title field and body field of this comment interface interpret “&apos;” differently.
Re: &apos; entity not expanded in IE6
Thanks for reporting this, Daemian. I don’t think Hpricot is at fault here, though. The HTMLEntities gem seems to be the culprit. I’ve been planning to get rid of that dependency anyway by rolling the necessary functionality into Sanitize, so I’ll fix this as part of that change.
As for the comment field behavior, that’s because the title field doesn’t allow HTML (so the string “&apos;” is escaped and displayed literally) whereas the comment body does allow HTML (so the string “&apos;” is not escaped, and is interpreted as an entity by the browser).
&apos; IE feature
I am having the same problem with the &apos ; in IE 7..
Re: &apos; IE feature
The &apos; issue is fixed in the latest development version of Sanitize on GitHub.
A Note for Rails Devs
The latest gem (1.0.5) works with a mongrel development server but fails during a “rake” Test::Unit run. I get an error that it could not find the HTMLEntities gem even though it was installed and unpacked into vendor/gems. To work around this I put the latest source of the gem in vendor/gems (it does not require HTMLEntities) and everything works with Rails.
*The hpricot gem is also in vendor/gems as I “vendor everything”. It should work with your Rails app if you just install them as gems. When 1.0.6 of this gem is released, it should just work out of the box without this workaround.
Redundant Elements
I’ve got some awful-looking HTML I’m parsing and Sanitize is doing a great job for the most part. However, there are some nested tags (<b> <b> Foo! </b> </b>) that it’s not cleaning up. Perhaps that’s outside the scope of an otherwise excellent plugin but it’d be neat if it could fix that as well :)
Re: Redundant Elements
Sanitize will strip nested tags if they’re not in the whitelist, but if a tag is whitelisted, Sanitize leaves it alone, even if it’s redundant. You’re probably looking for something more along the lines of Tidy.
Re: Redundant Elements
I probably am (as a bit of a newbie to Ruby). Thanks for the heads up! :)
Can Sanitize remove contents of a script tag?
I’m using Sanitize and it works great (thanks!), but when it strips out a script tag, it leaves the contents of the tag in place. While this makes sense for some tags, in this case it can leave a blob of javascript visible to the end user, which is undesirable (I am processing 3rd party HTML and can’t prevent script tags in the body content). Is there a way to have sanitize remove a tag and all of its contents? Thanks!
Re: Can Sanitize remove contents of a script tag?
By default Sanitize tries to preserve (but make safe) any non-tag content, since its primary use case is for sanitizing things like blog comments where removing the contents of a non-whitelisted tag could result in unexpected data loss.
That said, I do plan to add an option to a future version of Sanitize to allow you to specify that you want the contents of non-whitelisted tags removed completely (I’ve already received one or two patches along these lines, I just haven’t been entirely happy with them).
very quick question
It looks like someone asked this question already in the comments. However, I don’t see a solution. I could just be thick. I love the sanitize gem and had no problem using most of it. One issue I’m having is that whenever I have html entities like it turns it into a question mark. My question is thus: how do I allow html entities? Thanks ahead of time.
Re: very quick question
As of the latest release (1.0.6) Sanitize should pass all well-formed entities through untouched, assuming they’re not used in a malicious context. What version are you using?
clean! gives me issues as well
Love the gem, thanks for your work!
I am having an issue with the clean! function. I am writing a validation method that needs to know when the string of html is dirty (requires cleaning), however the clean! function (as pointed out in a previous comment) returns a true value when the string is modified, but not when it needed modification.
I’ve monkey-patched the gem in a rails/config/initializer script and that works for now. Do you plan to change the api in the future or do you plan to leave the current behaviour of clean! ?
Re: clean! gives me issues as well
The clean! method has always worked correctly; however, as discussed above, the documentation for the method in the first release of Sanitize was misleading as to the method’s purpose. The documentation in later releases is correct:
The purpose of clean! is to sanitize the given string in place rather than returning a sanitized copy of the given string. In other words, it’s a destructive version of clean.
It sounds like what you want is a method that tells you whether the given string needs to be sanitized, but doesn’t actually sanitize it. There currently isn’t a method that does this, but something like the following (which I imagine is similar to what you’ve hacked up) would work:
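A minimal sketch of such a check, under stated assumptions: the helper name dirty? is hypothetical, and the cleaner is passed in as a callable so the example runs without the gem installed. In real use the callable would wrap Sanitize.clean with whatever config applies; the STRIP_TAGS stand-in below is a crude placeholder for illustration only and is not safe for actual sanitizing.

```ruby
# Hypothetical helper: true when cleaning a copy of the string changes it.
# `cleaner` is any callable that returns a sanitized copy, for example
#   ->(s) { Sanitize.clean(s, Sanitize::Config::BASIC) }
def dirty?(html, cleaner)
  # Clean a duplicate so the caller's string is never modified.
  cleaner.call(html.dup) != html
end

# Stand-in cleaner so the sketch is self-contained. Illustration only.
STRIP_TAGS = ->(s) { s.gsub(/<[^>]*>/, '') }

puts dirty?('<b>foo</b>', STRIP_TAGS)  # true
puts dirty?('foo', STRIP_TAGS)         # false
```

Note that this reports whether the output differs from the input, which is not necessarily the same as whether the input was unsafe.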
I’m curious what your use case for this is, though. If you’re trying to save processing time by not sanitizing already-clean strings, this won’t do the trick, since the only way to determine whether a string is dirty is to actually clean it (or a copy of it, as in the example above).
Re: clean! gives me issues as well
Unfortunately this won’t work. Sanitize.clean! was returning html instead of nil because of 2 different situations:
For issue #1, I found that img tags with many attributes would sometimes come back with their attributes in a different order than they appeared in the original html. No modifications were necessary, yet the source and resultant html were different. I’m guessing it’s not limited to img tags, but applies to any tag with multiple attributes.
For issue #2 here is a small example,
Gets transformed into:
As far as I know the single quote is a perfectly valid character in HTML and doesn’t (normally) need be represented as an html entity.
I guess the next question is WHY do I care if the html experienced minor modifications that don’t affect anything visual. It’s a good / valid question. I use it for model attribute validation.
I prefer not to silently modify the data I get from outside sources (I deal with data from the web and from bulk files that are imported on a regular basis). It’s important (especially for the bulk files) for me to know when the data I get is invalid so it can be fixed at the source. I need to generate a log of results of what fields had to be skipped and why. Neither of the above two issues would require the data to be considered invalid.
I have also been experiencing seemingly random segmentation faults:
I’m guessing it stems from Hpricot. Thoughts?
Re: clean! gives me issues as well
If all you want to do is correct invalid HTML, leaving it untouched if it’s already valid, you want Tidy, not Sanitize. The sole purpose of Sanitize is to remove all but a safe subset of HTML from user-supplied input. If you’re using it for anything else, it’s probably not the best tool for the job. Sanitize doesn’t understand HTML; it understands whitelists, which tell it how to make HTML safe. Tidy, on the other hand, actually understands HTML (but won’t make it safe).
This is why Sanitize converts apostrophes to entities. It’s not always necessary, but it is always safer. Sanitize’s purpose is to sanitize input, which means that safety is its primary concern. The documentation for clean! (see above) says “if no changes were made” and not “if no changes were necessary” for this very reason.
The clean! method only exists in Sanitize because having a destructive alternative for a non-destructive string method is typical in Ruby classes and I thought it likely that people would ask for it if it wasn’t there. I’m starting to think it would be wiser to remove it though, since it seems to cause a significant amount of confusion.
As for the segfaults, Hpricot 0.7 seems to have been a pretty crappy release. I haven’t had a chance to test Sanitize with Hpricot 0.8 yet, but you may have better luck with it. I should have time to get up to speed on the latest Hpricot shenanigans this weekend.
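For reference, the destructive-method convention mentioned above can be checked with Ruby’s standard library alone. The sketch below records what a few String bang methods return; the constant name BANG_RESULTS is just for illustration.

```ruby
# Each entry calls a destructive String method on a fresh copy
# and stores the method's return value.
BANG_RESULTS = {
  strip_changed:   '  foo  '.dup.strip!,       # whitespace removed
  strip_unchanged: 'foo'.dup.strip!,           # nothing to strip
  gsub_unchanged:  'foo'.dup.gsub!(/x/, 'y'),  # pattern never matches
  cap_unchanged:   'Foo'.dup.capitalize!       # already capitalized
}.freeze

BANG_RESULTS.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Only the first call returns a string ("foo"); the other three return nil because the receiver was not modified.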
Re: clean! gives me issues as well
I believe the confusion with clean! comes from the non-standard behavior. The typical behavior of destructive string methods in Ruby is to always return the modified string (see gsub!, strip!, capitalize!, etc), whereas clean! only returns when something has changed.
The “if no changes were made” phraseology in the api for clean! also implies that the Sanitize gem might have the ability to pass up making changes if they are not necessary, which (as I’ve learned) isn’t always the case. It just compares the before and after results and returns nil if they are the same.
I would suggest leaving clean! but simply return the modified contents always. Had that been the original behavior I wouldn’t probably have gone down the rabbit hole of trying to figure out if it could also be used to validate w/out modifying strings.
Thanks for your awesome work. Even if I can’t use this to validate html prior to saving to the db, it’s an excellent tool and I plan to continue using it.
Re: clean! gives me issues as well
Actually, the standard behavior of all three of the methods you mentioned, and of most destructive string methods in Ruby, is to return the modified string or nil if the string was not modified, which is exactly what clean! does. Take a look at the API docs: gsub!, strip!, capitalize!.
The documentation for clean! is also patterned after the documentation of standard Ruby string methods, in which the phrase “if no changes were made” is frequently used to describe this behavior.
Your suggestion that clean! always return the modified string is a little puzzling, since this is already what happens. If there is a modified string, clean! will always return it. If the string is not modified, then clean! returns nil. To do anything else would be non-standard.
It sounds like your confusion stemmed more from the fact that you misunderstood the purpose of the library than from the behavior of clean!. I’ll try to make it more explicit in the documentation that Sanitize is not intended to be a validator or a replacement for HTML Tidy.
can it whitelist specific urls?
Sorry for the double post.
I’m curious how everyone handles things like embed tags. For example, I want to allow some sources of embed tags (i.e. youtube, dailyshow, etc) but not all embed tags. Can sanitize do this or how else does everyone handle this?
Re: can it whitelist specific urls?
Sanitize doesn’t currently provide an option to whitelist specific URLs, but I’ll consider adding this feature.
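Until such an option exists, one possible approach is to check embed sources yourself before handing markup to Sanitize. The sketch below uses only Ruby’s standard URI library; the helper name allowed_embed_src? and the host list are hypothetical examples, not part of Sanitize.

```ruby
require 'uri'

# Hypothetical allow-list; the host names are examples only.
ALLOWED_EMBED_HOSTS = ['www.youtube.com', 'youtube.com'].freeze

# Returns true only when the URL parses and its host is on the allow-list.
def allowed_embed_src?(src)
  host = URI.parse(src).host
  !host.nil? && ALLOWED_EMBED_HOSTS.include?(host.downcase)
rescue URI::InvalidURIError
  # Unparseable values are rejected rather than guessed at.
  false
end

puts allowed_embed_src?('http://www.youtube.com/v/abc')  # true
puts allowed_embed_src?('http://evil.example/x.swf')     # false
puts allowed_embed_src?('/local/movie.swf')              # false
```

Matching on the exact host rather than on a substring of the URL avoids accepting addresses that merely contain an allowed name somewhere in the path or query.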
Sanitize, Ruby 1.9, hpricot and nokogiri
First, I’d like to thank you for creating sanitize. It does just the work I need in treating some RSS feed entries.
I tried to use it with Ruby 1.9.1p129 and got problems related to encoding. So I changed the encoding of the string I was sending to ASCII-8BIT using the method “force_encoding” (I had similar problems in Rails).
Then I got another error, this time inside hpricot:
/usr/local/lib/ruby19/gems/1.9.1/gems/hpricot-0.8.1/lib/hpricot/traverse.rb:198:in `block in reparent': undefined method `parent=' for "":String (NoMethodError)
So, I think hpricot 0.8.1 is not 100% bug free with 1.9.1.
My question is the following: Do you think it’s worth porting Sanitize to use nokogiri instead of hpricot?
It would be nice to use nokogiri (which is faster) and ruby 1.9. It’s possible I’ll try this when I need to optimize my app.
Compliments
Neat! I have been using htmLawed (tinyurl.com/htmlawed) for my PHP projects, and this tool seems a good equivalent for my Ruby ones.
Matching Rails Sanitize API
Has anyone adapted this to Rails’ white-list sanitizer API? Would be handy to just do:
Rails::Initializer.run do |config| sanitizer = Sanitizer.new llowed_tags = ‘table’, ‘tr’, ‘td’ llowed_attributes = ‘id’, ‘class’, ‘style’
config.action_view.white_list_
config.action_view.sanitized_a
config.action_view.sanitized_a
end
If not I will take a stab.