Update (2009-03-16): This script has been superseded by Larch, a full-fledged Ruby application that does the same thing, only faster and more reliably.
Last night after Gmail began rolling out IMAP support, I started investigating ways to copy my huge email archive (thousands and thousands of messages dating back to 2003) from my IMAP server to Gmail’s IMAP server.
Copying the messages from one account to the other in Thunderbird works, but it’s glacially slow, needs babysitting, and is prone to creating duplicate messages unless the entire copy operation works right the first time. Great for copying a few messages, not so great for copying thousands.
I also investigated imapsync, a Perl script that’s somewhat faster and more reliable than Thunderbird and doesn’t create duplicate messages. For some reason, though, messages copied with imapsync end up timestamped on Gmail with the time they were imported rather than the time they were sent or received, which is unacceptable. I tried using the --syncinternaldates option to rectify this, but it didn’t work.
So, since the best way to get something done right is to do it yourself, I set about writing my own tool to transfer my email. Thanks to Ruby and Net::IMAP, this turned out to be pretty easy.
Here’s what I came up with. It’s not pretty, it’s not user friendly, and it doesn’t do much error checking, but it’s extremely fast, it works, and if it fails at any point you can just run it again and it’ll pick up where it left off. Share and enjoy.
#!/usr/bin/env ruby
require 'net/imap'
# Source server connection info.
SOURCE_NAME = 'username@example.com'
SOURCE_HOST = 'mail.example.com'
SOURCE_PORT = 143
SOURCE_SSL = false
SOURCE_USER = 'username'
SOURCE_PASS = 'password'
# Destination server connection info.
DEST_NAME = 'username@gmail.com'
DEST_HOST = 'imap.gmail.com'
DEST_PORT = 993
DEST_SSL = true
DEST_USER = 'username@gmail.com'
DEST_PASS = 'password'
# Mapping of source folders to destination folders. The key is the name of the
# folder on the source server, the value is the name on the destination server.
# Any folder not specified here will be ignored. If a destination folder does
# not exist, it will be created.
FOLDERS = {
  'INBOX' => 'INBOX',
  'sourcefolder' => 'gmailfolder'
}
# Maximum number of messages to select at once.
UID_BLOCK_SIZE = 1024
# Utility methods.
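# Prints a log line tagged with the destination account name.
def dd(message)
  puts "[#{DEST_NAME}] #{message}"
end
# Prints a log line tagged with the source account name.
def ds(message)
  puts "[#{SOURCE_NAME}] #{message}"
end
# Fetches the given UIDs in blocks of UID_BLOCK_SIZE and yields each result.
def uid_fetch_block(server, uids, *args)
  pos = 0
  while pos < uids.size
    # uid_fetch can return nil when nothing matches, so fall back to an
    # empty array.
    (server.uid_fetch(uids[pos, UID_BLOCK_SIZE], *args) || []).each {|data| yield data }
    pos += UID_BLOCK_SIZE
  end
end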
@failures = 0
@existing = 0
@synced = 0
# Connect and log into both servers.
ds 'Connecting...'
source = Net::IMAP.new(SOURCE_HOST, SOURCE_PORT, SOURCE_SSL)
ds 'Logging in...'
source.login(SOURCE_USER, SOURCE_PASS)
dd 'Connecting...'
dest = Net::IMAP.new(DEST_HOST, DEST_PORT, DEST_SSL)
dd 'Logging in...'
dest.login(DEST_USER, DEST_PASS)
# Loop through folders and copy messages.
FOLDERS.each do |source_folder, dest_folder|
  # Open source folder in read-only mode.
  begin
    ds "Selecting folder '#{source_folder}'..."
    source.examine(source_folder)
  rescue => e
    ds "Error: select failed: #{e}"
    next
  end
  # Open (or create) destination folder in read-write mode.
  begin
    dd "Selecting folder '#{dest_folder}'..."
    dest.select(dest_folder)
  rescue => e
    begin
      dd "Folder not found; creating..."
      dest.create(dest_folder)
      dest.select(dest_folder)
    rescue => ee
      dd "Error: could not create folder: #{ee}"
      next
    end
  end
  # Build a lookup hash of all message ids present in the destination folder.
  dest_info = {}
  dd 'Analyzing existing messages...'
  uids = dest.uid_search(['ALL'])
  if uids.length > 0
    uid_fetch_block(dest, uids, ['ENVELOPE']) do |data|
      dest_info[data.attr['ENVELOPE'].message_id] = true
    end
  end
  dd "Found #{uids.length} messages"
  # Loop through all messages in the source folder.
  uids = source.uid_search(['ALL'])
  ds "Found #{uids.length} messages"
  if uids.length > 0
    uid_fetch_block(source, uids, ['ENVELOPE']) do |data|
      mid = data.attr['ENVELOPE'].message_id
      # If this message is already in the destination folder, skip it.
      if dest_info[mid]
        @existing += 1
        next
      end
      # Download the full message body from the source folder.
      ds "Downloading message #{mid}..."
      msg = source.uid_fetch(data.attr['UID'], ['RFC822', 'FLAGS',
          'INTERNALDATE']).first
      # Append the message to the destination folder, preserving flags and
      # internal timestamp.
      dd "Storing message #{mid}..."
      tries = 0
      begin
        tries += 1
        dest.append(dest_folder, msg.attr['RFC822'], msg.attr['FLAGS'],
            msg.attr['INTERNALDATE'])
        @synced += 1
      rescue Net::IMAP::NoResponseError => ex
        if tries < 10
          dd "Error: #{ex.message}. Retrying..."
          sleep 1 * tries
          retry
        else
          @failures += 1
          dd "Error: #{ex.message}. Tried and failed #{tries} times; giving up on this message."
        end
      end
    end
  end
  source.close
  dest.close
end
puts "Finished. Message counts: #{@existing} untouched, #{@synced} transferred, #{@failures} failures."
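The reason the script can be re-run safely is that it compares Message-IDs before copying anything. Here’s a rough, network-free sketch of just that idea, in case it helps anyone adapting the script. The Message struct and the pending_messages method are made up for illustration and aren’t part of the script or of Net::IMAP.

```ruby
require 'set'

# Hypothetical stand-in for the envelope data the script fetches over IMAP.
Message = Struct.new(:uid, :message_id)

# Returns the source messages whose Message-ID is not yet present in dest.
def pending_messages(source, dest)
  seen = Set.new(dest.map(&:message_id))
  source.reject { |msg| seen.include?(msg.message_id) }
end

source = [
  Message.new(1, '<a@example.com>'),
  Message.new(2, '<b@example.com>'),
  Message.new(3, '<c@example.com>')
]
dest = [Message.new(10, '<a@example.com>')]

# Only messages 2 and 3 still need to be copied.
puts pending_messages(source, dest).map(&:uid).inspect
```

One caveat that seems to follow from this approach: messages with no Message-ID header all share the same nil key, so once one of them exists on the destination, the rest will look like duplicates and be skipped.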
Update: Now includes Steve K’s patch to fix BadResponseError exceptions. Thanks Steve!
Update (2009-03-02): Brought the script up to date with several bug fixes and enhancements (including those contributed in comments below). Thanks everyone!
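If you want to reuse the retry logic from the append step elsewhere, it can be pulled out into a small helper. This is only a sketch: with_retries and its parameters (max_tries, base_delay, error_class, sleeper) are hypothetical names, and unlike the script, which counts a failure and moves on, this version re-raises the error once it runs out of attempts.

```ruby
# Hypothetical helper: runs the block, retrying on error_class up to
# max_tries times and waiting base_delay * attempt seconds between attempts.
# The sleeper parameter exists so the delay can be stubbed out.
def with_retries(max_tries: 10, base_delay: 1, error_class: StandardError,
                 sleeper: Kernel.method(:sleep))
  tries = 0
  begin
    tries += 1
    yield tries
  rescue error_class
    raise if tries >= max_tries
    sleeper.call(base_delay * tries)
    retry
  end
end

# Simulate an operation that fails twice before succeeding, recording the
# delays instead of actually sleeping.
delays = []
attempts = 0
result = with_retries(max_tries: 5, sleeper: ->(s) { delays << s }) do |try|
  attempts = try
  raise IOError, 'temporary failure' if try < 3
  :stored
end

puts "#{result} after #{attempts} attempts, waits: #{delays.inspect}"
```

In the script itself, the equivalent call would wrap dest.append and pass Net::IMAP::NoResponseError as the error class.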