Mark Pilgrim has an interesting post today on using rewrite rules to configure Apache to keep out nasty bots; if you run your own server, it's a must-read.
In order to be a good boy myself, I hacked up the trackback module in LSblog to obey robots.txt files, even though I'm not sure it's strictly necessary (the worst it can do is go two requests deep into a site: one to find a trackback URL, and one to ping it). But, on the upside, it should stop some 403 errors when tracking back to Google.
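For the curious, the robots.txt check amounts to something like the sketch below. This isn't LSblog's actual code; the function name and user-agent string are made up, and it uses Python's stock robots.txt parser (shown here in its modern urllib.robotparser guise):

```python
import urllib.robotparser
from urllib.parse import urlsplit

USER_AGENT = "LSblog-trackback"  # hypothetical user-agent string

def may_ping(trackback_url):
    """Check the target site's robots.txt before sending a trackback ping."""
    parts = urlsplit(trackback_url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("%s://%s/robots.txt" % (parts.scheme, parts.netloc))
    try:
        rp.read()  # fetch and parse the site's robots.txt
    except OSError:
        return True  # couldn't reach robots.txt at all; assume pinging is fine
    return rp.can_fetch(USER_AGENT, trackback_url)
```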
Since I'm sort-of-iced-in (although the promised ice storm hasn't materialized here, I didn't feel like risking being stranded away from the house), I also moved LSblog to mod_python 3.0.1; it took some fiddling with RewriteRules to make it all work nicely. It's currently using the CGI emulation (which incidentally is buggy — apply this patch), although I'll probably move to the Publisher module eventually, mainly since it has a cooler interface. Currently both the main page and the RSS feed are being served via mod_python; it seems to have halved the page-load times. (There are still some icky database queries that have to be run on each page load; maybe eventually I'll stick a reverse caching proxy in front, if the load ever justifies it. But currently my load average is pegged at 1.00, so I'm in no hurry.)
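(The Publisher module, for anyone who hasn't played with it, publishes plain Python functions as URLs. The snippet below is just a sketch of the idea, not LSblog's real code; the file name, function names, and config lines are illustrative:)

```python
# blog.py -- a toy mod_python Publisher module, not LSblog itself.
# With something like this in the Apache config:
#   AddHandler mod_python .py
#   PythonHandler mod_python.publisher
# a request for /blog.py (or /blog.py/index) calls index(req) below,
# and /blog.py/rss calls rss(req).

def index(req):
    req.content_type = "text/html"
    # A real handler would pull posts out of the database here.
    return "<html><body>front page goes here</body></html>"

def rss(req):
    req.content_type = "application/rss+xml"
    return "<?xml version='1.0'?><rss version='2.0'><channel/></rss>"
```

Mapping URLs straight onto functions like that is roughly why I find the interface cooler than the CGI emulation.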
Hmm. The whole "obey robots.txt" thing didn't work out as well as hoped; it turns out we often want to access a cgi-bin directory (trackback URLs tend to point at scripts living there, like Mark's tb.cgi), but robots are normally excluded from those. (I guess it boils down to a question of how autonomous a robot must be before it's a robot...)
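To see the problem concretely: feed the robots.txt check from above a typical everything-under-cgi-bin exclusion and it refuses the trackback URL outright (both the robots.txt contents and the URL here are invented for illustration):

```python
import urllib.robotparser

# A robots.txt of the kind many sites serve (hypothetical contents):
#   User-agent: *
#   Disallow: /cgi-bin/
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /cgi-bin/"])

# The trackback endpoint lives under cgi-bin, so the check says no:
print(rp.can_fetch("LSblog-trackback", "http://example.org/cgi-bin/tb.cgi?tb_id=42"))
# -> False
```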
Deeply strange hoodoo. There must be some wacky interaction between LSblog's trackback and Mark's tb.cgi; the comment count goes up, but the comments page doesn't get updated.