Wednesday, March 28, 2007

Posting to Blogger via Ruby

TextMate has what seems to be a very nice blogging bundle for programmatically sending posts to your blogging engine of choice. Except that it doesn't work for the new Blogger API. Or at least it didn't the last time I checked. Mostly I just wanted to see if I could write my own script to send to Blogger.

This is a Ruby script based on the Python script located at http://djcraven.blogspot.com/2006/10/success-posting-to-blogger-beta-using.html -- the author of that script did the heavy lifting in terms of the GData API calls; what I did was translate it into Ruby with a somewhat more flexible structure. At least I hope so. I also hope this will look like decent, idiomatic Ruby and not like a horrific hack. And I'd like a pony. If you're asking.

Let's do this in pieces. The first piece of the puzzle is using the Google ClientLogin API to get an authentication token. The token is then passed as a parameter to later calls when we actually want to post something. Google says that the lifespan of the token is dependent on the application being used, but I don't see where they specify how long Blogger keeps them.

Here's the start of our class. We're in a module called Blogger and a class called Blog. I've got an external dependency here on BlueCloth because I'm also going to automatically translate Markdown later, but that's not something you need to do...


module Blogger

  require 'net/https'
  require 'net/http'
  require "uri"
  require "bluecloth"

  class Blog

    attr_accessor :account, :password, :blogid

    @@auth_url = "www.google.com"
    @@auth_path = '/accounts/ClientLogin'

    # HTTPS connection to the Google login host
    def http
      http = Net::HTTP.new(@@auth_url, 443)
      http.use_ssl = true
      http
    end

    # Form body for ClientLogin: account, password, service, and source
    def request_data
      ["Email=#{account}",
       "Passwd=#{password}",
       "service=blogger",
       "source=TestCompany-TestApp-0.0"].join("&")
    end

    def auth_headers
      {'Content-Type' =>
       'application/x-www-form-urlencoded'}
    end

    # Logs in and returns the token, or nil if the login is refused
    def auth_token
      response = http.post(@@auth_path,
        request_data, auth_headers)
      return nil unless response.code == '200'
      response.body.match(/Auth=(\S*)/)[1]
    end


After the setup lines, we've got four methods here. The first three define the data objects. The first one creates a Ruby HTTP object, set up for an HTTPS connection to the Google ClientLogin host. The second builds the request data string, tying together the four pieces of data into a single string. The third supplies the content-type header. All three are used in the auth_token method -- I'm taking advantage of one of my favorite features of Ruby, which is the lack of distinction between local variables, no-argument methods, and data field getters.

So, http, request_data, and auth_headers are just dropped into the auth_token code as if they were local, even though they are separate methods. To me, that makes the code read cleanly, and encourages moving small bits out to separate methods where they can be separately tested and documented.
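If that feature is new to you, here's a tiny sketch of it outside the Blogger code. The Greeter class and everything in it are made up for illustration; the point is only that a getter, a no-argument method, and a local variable all look the same at the place where they're used.

```ruby
class Greeter
  attr_accessor :name          # generates a getter (and setter) for @name

  def initialize(name)
    @name = name
  end

  # A no-argument method, called without parentheses below.
  def greeting
    "Hello"
  end

  def message
    punctuation = "!"          # a plain local variable
    # A no-argument method, a getter, and a local all read the same way.
    "#{greeting}, #{name}#{punctuation}"
  end
end

puts Greeter.new("Blogger").message
```

If greeting later needs to become a computed value or a stored field, message doesn't have to change.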

The auth_token method itself takes the path, the request_data, and the headers, and uses the HTTP object to make a secure post call. Request data, by the way, has account and password information that I'm assuming would be defined in a concrete subclass of this Blog class. If the response code is 200, we're good to go, and we do a little regular expression magic to extract and return the authentication token.
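Here's a rough sketch of the two ends of that exchange that runs without touching the network. It assumes the login response is a plain-text body of key=value lines with the token on an Auth= line, which is how I read the ClientLogin docs. The helper names login_body and extract_auth and the sample values are mine, not part of the script above. The one real difference is that the form body goes through URI.encode_www_form, so a password containing "&" or "=" doesn't corrupt the request.

```ruby
require 'uri'

# Hypothetical helper: build the form-encoded login body.
# URI.encode_www_form escapes characters such as "&", "=" and "@".
def login_body(email, password, source)
  URI.encode_www_form(
    'Email'   => email,
    'Passwd'  => password,
    'service' => 'blogger',
    'source'  => source
  )
end

# Hypothetical helper: pull the token out of a body made of key=value lines.
# Returns nil when there is no Auth line, rather than raising.
def extract_auth(body)
  match = body.match(/^Auth=(\S+)/)
  match && match[1]
end

sample = "SID=aaa\nLSID=bbb\nAuth=ccc123\n"   # made-up values
puts login_body('me@example.com', 'p&ss=word', 'TestCompany-TestApp-0.0')
puts extract_auth(sample)
```

Returning nil for a missing Auth line means a refused login and a malformed response can be handled the same way by the caller.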

We use the authentication token in our actual post call, the one that contains the blog post itself. We need to build up the XML Atom document to send to the Blogger server. Step one is to get the data. Since I'm going to be doing this from TextMate, I'm going to assume we're going to start with the file name that will eventually be provided by the TextMate command. First, we'll convert it to an array of lines.


    def read_data(filename)
      File.readlines(filename)
    end


In case it's not already clear, this method and the ones that follow are all parts of the Blog class. We're splitting this into lines to facilitate some processing of the text. I'm assuming that the first line of my file will be the title of the post, the second line will be a comma-delimited list of category labels, and the remainder of the file will be the body, written in Markdown. Obviously, that specific format is a weird quirk of the particular blog I'm posting to, which is set up to take XML and not to preserve blank lines. In your case, do whatever you need to do. Building that data structure is quite simple in Ruby.


    def build_data(lines)
      categories = lines[1].split(",").map {|c| c.strip}
      body = BlueCloth.new(lines[2..-1].join(" ")).to_html
      data_xml(lines[0], categories, body)
    end
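To make the assumed file layout concrete, here's the same splitting logic run on some made-up lines, with the Markdown and XML steps left out so there's no BlueCloth dependency. The split_post name and the sample text are mine, and stripping the newline off the title is a small addition that build_data doesn't make.

```ruby
# Hypothetical helper: same splitting as build_data, minus Markdown and XML.
def split_post(lines)
  title      = lines[0].strip                        # line 1: the post title
  categories = lines[1].split(",").map { |c| c.strip } # line 2: labels
  body       = lines[2..-1].join(" ")                # the rest: Markdown body
  [title, categories, body]
end

# What readlines would return for a short post (each line keeps its "\n").
sample = ["Posting to Blogger via Ruby\n",
          "ruby, blogger , textmate\n",
          "First paragraph.\n",
          "\n",
          "Second paragraph.\n"]

title, categories, body = split_post(sample)
p title
p categories
p body
```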


The last line of build_data calls the data_xml method that builds up the XML document:


    def data_xml(title, categories, body)
      result = []
      result << "<entry xmlns='http://www.w3.org/2005/Atom'>"
      result << " <title type='text'>#{title}</title>"
      categories.each do |cat|
        result << " <category scheme='http://www.blogger.com/atom/ns#' "
        result << " term='#{cat.strip}'/>"
      end
      result << " <content type='xhtml'>"
      result << body.to_s
      result << " </content>"
      result << "</entry>"
      result.join("\n")
    end


I was going to use the REXML library for this -- normally I'm a big fan of building XML programmatically. However, REXML really, really didn't like it when the included content contained HTML tags. So I decided it'd be much less aggravating to build the XML from scratch. Details on the syndication format can be found in the GData and Atom online docs.
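One thing you give up when building XML by hand is automatic escaping: a title or category containing "&" or "<" would produce a malformed document. Here's a minimal sketch of one way to guard against that, using CGI.escapeHTML from the standard library. The xml_text helper and the sample strings are hypothetical and not part of the script above. The body is deliberately left alone, since it's meant to contain markup.

```ruby
require 'cgi'

# Hypothetical helper: escape plain text before interpolating it into XML.
# CGI.escapeHTML turns &, < and > into entities (recent Ruby versions
# escape quote characters as well).
def xml_text(text)
  CGI.escapeHTML(text.to_s.strip)
end

title = "Ruby & Blogger: <fun>"
puts " <title type='text'>#{xml_text(title)}</title>"
puts " term='#{xml_text("R&D")}'/>"
```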

Once we can create the data, we can finally make our post:


    def post_headers
      {'Content-Type' => 'application/atom+xml',
       'Authorization' => "GoogleLogin auth=#{auth_token}"}
    end

    def uri
      "http://www.blogger.com/feeds/#{blogid}/posts/full"
    end

    def post(lines)
      data = build_data(lines)
      headers = post_headers   # one login, reused across any redirects
      response = http.post(uri, data, headers)
      # On a redirect, repeat the post at the address Blogger hands back
      while response.code == '302'
        response = http.post(response['location'], data, headers)
      end
      response.code == '201'
    end

    def post_file(filename)
      post(read_data(filename))
    end
  end
end # of module


Couple things to point out here. First, we call auth_token directly when we're building up our header in preparation for the call -- so that's when we perform the HTTPS login shown above. The while loop takes care of following along if Blogger decides to redirect the post. Finally, we return true if Blogger sends back response code 201 (Created), meaning the post has been successfully added.
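One caveat about that while loop: if the server ever redirects in a circle, it never exits. Here's a sketch of the same redirect-following idea with a cap on the number of hops. It runs against a stand-in response object instead of the network, and the follow_redirects helper, the FakeResponse struct, and the URLs are all made up for illustration.

```ruby
# Stand-in for Net::HTTP responses so the sketch runs without a network.
FakeResponse = Struct.new(:code, :location)

# Hypothetical helper: yields each URL to the block, follows 302s,
# and gives up after `limit` hops instead of looping forever.
def follow_redirects(url, limit = 5)
  limit.times do
    response = yield(url)
    return response unless response.code == '302'
    url = response.location
  end
  raise "gave up after #{limit} redirects"
end

# Made-up routing table: the first URL redirects, the second accepts the post.
routes = {
  'http://example.com/old' => FakeResponse.new('302', 'http://example.com/new'),
  'http://example.com/new' => FakeResponse.new('201', nil)
}

final = follow_redirects('http://example.com/old') { |u| routes[u] }
puts final.code == '201'
```

In the real script the block would be the http.post call, with the location read from the response headers.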

To use this, you need to set up your own subclass of Blog with the expected information, for example:


class MyBlog < Blog
  def initialize
    @account = "YOUR_GMAIL_HERE"
    @password = "YOUR_PASSWORD_HERE"
    @blogid = "YOUR_BLOG_ID"
  end
end


As written, that would be included in the Blogger module; otherwise, you'd need to qualify the name Blog. The invocation of the whole thing looks like this. This assumes that the filename with the post is in ARGV[0], placed there by TextMate.


p Blogger::MyBlog.new.post_file(ARGV[0])


One problem, though. We're making an HTTPS connection for authentication to keep our password secure, but including the password in plain text in the script file. Good point. There's a cool way around that in TextMate, and probably