wonko.com

Hi! I'm Ryan Grove: Sorcerer at SmugMug, lover of movies, eater of pie, connoisseur of awesome.

Posts tagged with “ruby”

Displaying items 1 - 10 of 33

Larch 1.1.0 released

Version 1.1.0 of Larch is now available.

Larch is a command line tool for copying email quickly and reliably from one IMAP server to another. It’s smart enough not to copy messages that already exist on the destination server, robust enough to pick up where it left off when interrupted, and also capable of more advanced operations like syncing message flags from the source server to the destination server. Larch is particularly well-suited for copying messages to, from, and between Gmail and Google Apps accounts.

Installing

Larch is a Ruby application and requires Ruby 1.8.6 or higher (1.9.2 is recommended). Once Ruby is installed, install or upgrade Larch via RubyGems:

gem install larch

What’s new

This release is the culmination of over a year of bug fixes and feature development. Larch is now faster and more reliable than ever. The most significant new features are:

  • Mailbox and message state information is now stored in a local SQLite database, which allows Larch to resync and resume interrupted operations much more quickly without having to rescan all messages.
  • You can now provide configuration options via a config file, so you don’t need to pass all options on the command line. This also allows the use of named session configs, so running larch gmail-to-yahoo will use the config options defined under the gmail-to-yahoo section in the config file.
  • Yahoo! Mail IMAP is now supported when connecting to imap.mail.yahoo.com or imap-ssl.mail.yahoo.com, even for non-pro accounts. Note that Yahoo! doesn’t officially support the use of IMAP by non-pro accounts, so this feature should be considered experimental.
  • Performance and reliability have been improved significantly. Many bugs have been fixed and many workarounds have been added for misbehaving servers.

This is only a partial list of changes. For the complete list, see the HISTORY file. For more details on how to use Larch’s new features, see the comprehensive README.

Try it out, report bugs

Please use Larch’s GitHub issue tracker to report bugs and request features. If you have questions or need help with Larch, the first thing you should do is read the very detailed documentation. If that fails, search the archives of the Larch mailing list on Google Groups. If you still can’t find an answer, send your question to the list.

Sanitize 2.0.0 released

Version 2.0.0 of Sanitize, my whitelist-based HTML filtering library for Ruby, is now available. This release includes several new features and some changes to existing features. I’ll cover the big stuff in this blog post; for the complete list of changes, see the HISTORY.md file.

Installing

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize is fully compatible with Ruby 1.8.7, 1.9.1 and 1.9.2.

Transformers

The most significant change in this release is that Sanitize’s core filtering logic is now implemented entirely as a set of always-on transformers. This simplifies the core code and means that Sanitize itself is now built on the same powerful transformer architecture that you can use in your own apps to enhance or alter Sanitize’s functionality.

The environment object provided as input to transformers now contains a slightly different set of data, and transformer output has been simplified. Transformers are no longer required to return anything, and are expected to make any desired alterations directly to the current node and/or document.

Sanitize now has the ability to traverse the document and execute transformers using either depth-first traversal (the default behavior, same as before) or breadth-first traversal (new in 2.0.0). If necessary, you can even run one set of transformers using one traversal method and another using the other method. This allows for greater flexibility and less complexity when writing certain types of transformers.

The README has more details on these changes and new features.

Other notable changes

  • Sanitize now outputs HTML4/HTML5 markup by default instead of XHTML (e.g., <img src="foo.jpg"> instead of <img src="foo.jpg" />, etc.). If you prefer the old behavior, you can set the :output config to :xhtml.
  • Some new elements and attributes (including several HTML5 elements) have been added to the built-in basic and relaxed whitelists. See HISTORY.md for the complete list.
  • Elements like <br>, <p>, and others are now replaced with whitespace when they’re removed in order to preserve the readability of the remaining text content. The list of elements that will be replaced with whitespace when removed is configurable using the :whitespace_elements setting.

Be aware that if you expect specific output from Sanitize in your unit tests, you may need to update your tests. The HTML output from this release may not precisely match the output from previous releases.

Try it out, report bugs

As always, you can try out Sanitize’s built-in filters using the test page at sanitize.pieisgood.org. Please use Sanitize’s GitHub issue tracker to report bugs and file feature requests.

Ruby script to retrieve and display Comcast data usage

Update (2011-04-03): Comcast’s user account pages now appear to require JavaScript, which makes it impossible to scrape the usage data using a simple script. As a result, this script no longer works.

Comcast has often advertised their high speed Internet service as providing “unlimited” data transfer, but when they say “unlimited”, what they really mean is “limited to 250GB a month”.

Just before the new year, Comcast finally rolled out a data usage meter to users in the Portland, Oregon area so we can actually tell when we’re in danger of exceeding that 250GB ceiling. I find this usage meter incredibly helpful in achieving my goal of using as much of my monthly 250GB data allotment as I possibly can. I feel it’s my duty to get my full money’s worth.

Unfortunately, the meter is buried several pages deep in Comcast’s account site, which is a slow and ugly beast that requires a login, several redirects, and a click or two. So I whipped up a little Ruby script to do the dirty work for me and just print out my current usage total.

Before using the script, you’ll need to install the Mechanize gem:

gem install mechanize

Here’s the script:

#!/usr/bin/env ruby

require 'rubygems'
require 'mechanize'

URL_LOGIN = 'https://login.comcast.net/login?continue=https://login.comcast.net/account'
URL_USERS = 'https://customer.comcast.com/Secure/Users.aspx'

abort "Usage: #{$0} <username> <password>" unless ARGV.length == 2

agent = Mechanize.new

agent.follow_meta_refresh = true
agent.redirect_ok = true
agent.user_agent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6'

login_page = agent.get(URL_LOGIN)

login_form = login_page.form_with(:name => 'signin')
login_form.user = ARGV[0]
login_form.passwd = ARGV[1]

redirect_page = agent.submit(login_form)
redirect_form = redirect_page.form_with(:name => 'redir')

abort 'Error: Login failed' unless redirect_form

account_page = agent.submit(redirect_form, redirect_form.buttons.first)

users_page = agent.get(URL_USERS)
usage_text = users_page.search("div[@class='usage-graph-legend']").first.content

puts usage_text.strip

Save it to an executable file (I called it capmon.rb), then run it like so, passing in your Comcast.net username and password (they’ll be sent securely over HTTPS):

./capmon.rb myusername mypass

The script will log into your Comcast account, go through all those painful redirects and clicks, and eventually spit out your usage stats, which will look something like this:

166GB of 250GB

Couldn’t be simpler! Naturally, this script won’t work for you unless you’re a Comcast customer in a region where the usage meter is currently available. Also, the script will break if Comcast changes their login flow or page structure, but I’ll try to keep this post updated if that happens.

This script is available as a GitHub gist as well. If you’d like to modify it and make it better, please fork the gist.

Monkeypatch to fix Ruby Net::IMAP + Dovecot response parsing bug

The Net::IMAP standard library distributed with Ruby 1.8.6, 1.8.7, and 1.9.1 contains a response parsing bug that can cause an endless hang (in 1.8.x) or raise an exception (in 1.9.1) when switching between mailboxes on a Dovecot 1.2.x server.

The bug has been fixed in Ruby’s SVN trunk and should eventually make it into the 1.9.2 release, but if you’re using Net::IMAP with a current or older Ruby release and need a fix for this, the following monkeypatch (which just replaces the old buggy method with the fixed one from SVN) should do the trick.

Fortunately, this fix is the only difference from the 1.8.6, 1.8.7, and 1.9.1 versions of this method, so the monkeypatch works for all three versions. Just add it to your own code at some point after requiring Net::IMAP.

if RUBY_VERSION <= '1.9.1'
  module Net # :nodoc:
    class IMAP # :nodoc:
      class ResponseParser # :nodoc:
        private

        # This monkeypatched method is the one included in Ruby SVN trunk as
        # of 2010-02-08.
        def resp_text_code
          @lex_state = EXPR_BEG
          match(T_LBRA)
          token = match(T_ATOM)
          name = token.value.upcase
          case name
          when /\A(?:ALERT|PARSE|READ-ONLY|READ-WRITE|TRYCREATE|NOMODSEQ)\z/n
            result = ResponseCode.new(name, nil)
          when /\A(?:PERMANENTFLAGS)\z/n
            match(T_SPACE)
            result = ResponseCode.new(name, flag_list)
          when /\A(?:UIDVALIDITY|UIDNEXT|UNSEEN)\z/n
            match(T_SPACE)
            result = ResponseCode.new(name, number)
          else
            token = lookahead
            if token.symbol == T_SPACE
              shift_token
              @lex_state = EXPR_CTEXT
              token = match(T_TEXT)
              @lex_state = EXPR_BEG
              result = ResponseCode.new(name, token.value)
            else
              result = ResponseCode.new(name, nil)
            end
          end
          match(T_RBRA)
          @lex_state = EXPR_RTEXT
          return result
        end
      end

    end
  end
end

If you’re a Larch user, the latest Larch development gem includes this fix.

Sanitize 1.2.0 released

Version 1.2.0 of Sanitize, my whitelist-based HTML sanitizing library for Ruby, is now available. Consult the HISTORY file for a complete list of changes.

Introducing Transformers

This release adds a major new feature called transformers. Transformers allow you to filter and alter HTML nodes using your own custom logic, on top of (or instead of) Sanitize’s core filter. A transformer is any Ruby object that responds to call() (such as a lambda or proc) and returns either nil or a Hash containing certain optional response values.

To use one or more transformers, pass them to the :transformers config setting:

Sanitize.clean(html, :transformers => [transformer_one, transformer_two])

Input

Each registered transformer’s call() method will be called once for each element node in the HTML, and will receive as an argument an environment Hash that contains Sanitize config information and a reference to a Nokogiri::XML::Node object.

The transformer has full access to the Nokogiri::XML::Node that’s passed into it and to the rest of the document via the node’s document() method. Any changes will be reflected instantly in the document and passed on to subsequently-called transformers and to Sanitize itself. A transformer may even call Sanitize internally to perform custom sanitization if needed.

Transformers have a tremendous amount of power, including the power to completely bypass Sanitize’s built-in filtering.

Output

A transformer may return either nil or a Hash. A return value of nil indicates that the transformer does not wish to act on the current node in any way. A returned Hash may contain instructions that tell Sanitize to whitelist certain attributes or nodes, or to replace the current node with a new node (see the README for specifics).

Example: Transformer to whitelist YouTube video embeds

The following example demonstrates how to create a Sanitize transformer that will safely whitelist valid YouTube video embeds without having to blindly allow other kinds of embedded content, which would be the case if you tried to do this by just whitelisting all <object>, <embed>, and <param> elements:

lambda do |env|
  node      = env[:node]
  node_name = node.name.to_s.downcase
  parent    = node.parent

  # Since the transformer receives the deepest nodes first, we look for a
  # <param> element or an <embed> element whose parent is an <object>.
  return nil unless (node_name == 'param' || node_name == 'embed') &&
      parent.name.to_s.downcase == 'object'

  if node_name == 'param'
    # Quick XPath search to find the <param> node that contains the video URL.
    return nil unless movie_node = parent.search('param[@name="movie"]')[0]
    url = movie_node['value']
  else
    # Since this is an <embed>, the video URL is in the "src" attribute. No
    # extra work needed.
    url = node['src']
  end

  # Verify that the video URL is actually a valid YouTube video URL.
  return nil unless url =~ /^http:\/\/(?:www\.)?youtube\.com\/v\//

  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  Sanitize.clean_node!(parent, {
    :elements   => ['embed', 'object', 'param'],
    :attributes => {
      'embed'  => ['allowfullscreen', 'allowscriptaccess', 'height', 'src', 'type', 'width'],
      'object' => ['height', 'width'],
      'param'  => ['name', 'value']
    }
  })

  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node (<param> or <embed>) and its parent
  # (<object>).
  {:whitelist_nodes => [node, parent]}
end

For more details on transformers, consult the README.

Installing

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize 1.1.0 released

Sanitize 1.1.0 is now available. The biggest change in this release is a migration from Hpricot to Nokogiri, contributed by Adam Hooper. In addition, a new :output config setting allows you to specify whether you want Sanitize to output XHTML (the default) or HTML4, and Peter Cooper contributed a fix for a bug in which Sanitize would incorrectly strip a whitelisted URL if a path segment contained a colon.

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Larch 1.0.1 released

Larch 1.0.1 has been released. Notable changes in this release include the following:

  • Ruby 1.9.1 support.
  • Much more robust handling of unexpected server disconnects and dropped connections.
  • Added --all option to copy all folders recursively.
  • Added --all-subscribed option to copy all subscribed folders recursively.
  • Added --dry-run option to simulate changes without actually making them.
  • Added --exclude and --exclude-file options to specify folders that should not be copied.
  • Added --ssl-certs option to specify a bundle of trusted SSL certificates.
  • Added --ssl-verify option to verify server SSL certificates.
  • Added a new “insane” logging level, which will output all IMAP commands and responses to STDERR.
  • Fixed excessive post-scan processing times for very large mailboxes.
  • Fixed potential scan problems with very large mailboxes on certain servers.
  • POSIX signals are no longer trapped on platforms that aren’t likely to support them.

To install Larch via RubyGems, run:

sudo gem install larch

Visit Larch’s GitHub page for usage information and other documentation. If you have questions or need help with something, please use the Larch mailing list.

Larch syncs messages from one IMAP server to another

I’ve just released Larch, a Ruby application that syncs messages from one IMAP server to another quickly and safely. It’s smart enough not to copy messages that already exist on the destination and robust enough to deal with ornery or misbehaving servers.

A couple years ago I posted a quick and dirty script that did this. Larch is a much more mature tool that does essentially the same thing (only faster and more reliably).

Larch is particularly well-suited for copying email to, from, or between Gmail accounts. During its development, I used it to sync over 25,000 messages from a Dovecot IMAP server to Gmail.

To install Larch via RubyGems, run:

sudo gem install larch

Visit Larch’s GitHub page for usage information and other documentation.

Update: Larch now has a mailing list where you can post questions and comments and find answers to questions that have already been answered.

Sanitize 1.0.6 released

Sanitize 1.0.6 is now available. This release brings minor bug fixes and a new feature: you can now specify the symbol :all in place of an element name in the attributes config hash to allow certain attributes on all elements. This is useful if you want to allow all elements to have a class attribute, for example. Thanks to Mutwin Kraus for the patch.

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize 1.0.5 released with a security fix

Sanitize 1.0.5 fixes a bug introduced in version 1.0.3 that prevented non-whitelisted protocols from being cleaned when relative URLs were allowed. Upgrading is strongly recommended.

The DEFAULT and RESTRICTED configs in previous versions of Sanitize are not vulnerable to this bug. The BASIC and RELAXED configs, as well as any custom config that allows relative URLs, are vulnerable in versions 1.0.3 and 1.0.4.

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Thanks to Dev Purkayastha for reporting this issue and submitting additional test cases.