The Cape Town March 2007 GeekDinner was held last night, and I think it can be called a success.  From my perspective, most of the geeks were mingling, there were lots of questions on the presentations, there was impromptu discussion and presentations in the form of open mic sessions, and so forth.  I found it engaging, dynamic, and interactive.

The sessions worked well - we had one fifteen-to-twenty minute session on WAPA, the Wireless Access Provider Association - I guess we could call it the main attraction.  This was given by David Jarvis, whose day job is running the wireless ISP Uninetwork.  He explained that WAPA is an industry representative body for wireless access providers, to work together to ensure a sustainable wireless access service industry.  They're there to self-regulate and live up to a code of conduct and to make sure they're all behaving properly, and ultimately would like proper legal recognition of this activity.

Then, a ten minute or so session announcing Teraco, the latest project by serial entrepreneurs Joe, Abraham (who are both behind the Frogfoot ISP and wireless access provider Amobia), and Matt Tagg (the guy behind Web Africa's success).  Teraco is to be a world-class vendor-neutral data centre in Cape Town, with N+1 redundancy throughout.  They'll just provide the location - customers will need to get their own agreements to have their traffic carried from the multiple carriers in the centre, and customers will also be able to get direct links to other customers in the data centre.

Then, another ten minute or so session by Jonathan Endersby on a restaurant review site he'd like to work on, with full Web 2.0 buzzword compliance.

I gave a hopefully-very-short (and hopefully somewhat accurate) talk on OpenID, entitled OpenID in three minutes.  There were quite a few questions during the talk and in the comments time though, so it was quite a bit more than three minutes.

Jeremy Thurgood gave a similarly quick talk on Erlang, a distributed, concurrent, robust, functional programming language with cool features like hot code upgrades and soft real-time scheduling.  Andy Rabagliati did one on peering, and Morgan Collett talked about the Ubuntu-ZA community.

You can follow others' thoughts on the GeekDinner on the GeekDinner planet.

The Cape Town Linux User Group was lucky to get both a behind-the-scenes and front-end explanation by Johan Hartzenburg of ZFS - Sun Microsystems' new advanced all-singing all-dancing filesystem which is also a volume manager and, I'm sure, will eventually be able to send email before becoming Emacs.

Johan explained to us how ZFS manages to always be consistent - by never editing existing metadata entries, but rather copying the entry to a new entry, editing the new entry, and then replacing the link of the original entry's parent to the new entry.  But, of course, because it never edits an entry directly, the parent goes through the same process, until it reaches the uberblock.  The uberblock never has a new copy created of itself, but there are multiple copies of it, and updating the uberblock is an atomic operation.  Even if things go awry while this is half-complete, any of the uberblocks is consistent (and, I think it has a timestamp to fall back on).

This all sounds really inefficient, but ends up not being so.  The new blocks are generally all written near each other, so what would have been a whole bunch of random writes is often actually more efficient, because all the new data and metadata land close together on disk.

Unused metadata and data blocks are then removed.

Using this design makes snapshots pretty trivial - since all you need to do is not delete the original metadata and data blocks used in the snapshot.  Everything speeds on ahead, and the scrubber just doesn't free those blocks.
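To make the copy-and-relink idea a bit more concrete, here is a toy sketch in Python.  It is only an illustration of the technique as I understood it from the talk, not how ZFS is actually implemented: the Node and Pool classes, and the single root reference standing in for the uberblock, are all made up for the example.

```python
class Node(object):
    """A tree node that is never edited once it has been created."""

    def __init__(self, children=None, data=None):
        self.children = dict(children or {})
        self.data = data


def cow_write(root, path, data):
    """Return a NEW root with `data` stored at `path`; `root` is untouched.

    Every node along the path is copied, mirroring how each parent has to
    be rewritten to point at its new child, all the way up to the top.
    """
    if not path:
        return Node(root.children if root else None, data)
    children = dict(root.children) if root else {}
    children[path[0]] = cow_write(children.get(path[0]), path[1:], data)
    return Node(children, root.data if root else None)


def read(root, path):
    """Follow `path` down from `root` and return the data stored there."""
    node = root
    for name in path:
        node = node.children[name]
    return node.data


class Pool(object):
    """Holds the current root (standing in for the uberblock) and snapshots."""

    def __init__(self):
        self.root = Node()
        self.snapshots = {}

    def write(self, path, data):
        new_root = cow_write(self.root, path, data)
        # The only "in place" change: a single reference swap, standing in
        # for the atomic uberblock update.
        self.root = new_root

    def snapshot(self, name):
        # A snapshot is just a kept reference to an old root, so nothing
        # reachable from it ever gets freed.
        self.snapshots[name] = self.root


pool = Pool()
pool.write(("etc", "hosts"), "127.0.0.1 localhost")
pool.write(("home", "notes.txt"), "first draft")
pool.snapshot("monday")
pool.write(("home", "notes.txt"), "second draft")

current = read(pool.root, ("home", "notes.txt"))
old = read(pool.snapshots["monday"], ("home", "notes.txt"))
```

After the second write, the live tree sees the new contents while the "monday" snapshot still sees the old ones, and the untouched "etc" subtree is shared between the two rather than copied.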

Also, using this design makes changing on-disk options pretty simple.  This includes, for example, how ZFS can efficiently handle different byte orders.  On read, ZFS can handle either order, but on write will always use the most efficient byte order.  Similarly, compression can be used on a data block level - every time a change happens to a file, it can compress the new data block that is created.

This also includes how to bring more members into the pool and harness the increased I/O bandwidth.  The "allocator" just needs to allocate the new blocks created to be edited to the new member, and as time goes by, all members of the pool naturally tend to have equal amounts of data, thus maximising bandwidth in concurrent read or write requests.
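A very rough sketch of that balancing effect, under the assumption that the allocator simply picks whichever member currently holds the least data (the real ZFS allocator is certainly more subtle than this, and the disk names and sizes here are invented):

```python
def pick_member(used_bytes):
    """Return the name of the pool member with the least data on it."""
    return min(sorted(used_bytes), key=lambda name: used_bytes[name])


def allocate(used_bytes, block_size, count):
    """Place `count` new blocks, always on the currently emptiest member."""
    for _ in range(count):
        used_bytes[pick_member(used_bytes)] += block_size
    return used_bytes


# Two members that already hold data, plus a freshly added empty one.
pool = {"disk0": 600, "disk1": 600, "disk2": 0}
allocate(pool, block_size=100, count=12)
```

The first writes all land on the new, empty member; once it has caught up, new blocks are spread across all three, so the members drift towards holding equal amounts of data.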

The command line tools are incredibly simple and powerful, and with ZFS you don't have to worry about device renaming, as it records on the disks all the information necessary to find out where in the ZFS hierarchy that disk lives. Easy to use, and hard to screw up?  How can it possibly succeed?

Solaris servers with Xen as Dom0 (which seems to be progressing well) with a massive ZFS storage pool and multiple virtual machines just sounds so much like a winning plan.  Or FreeBSD, once ZFS-on-FreeBSD (going well, I see) and Xen-Dom0-on-FreeBSD (not quite as encouraging) are available in stable forms.

I'm growing a little tired by how the "industry" complains about lack of skills.  Now, they're saying software graduates are lazy.

I think the "industry" has a few problems.  Frankly, it's boring.  And, well, it's wrong-headed.  Why does "the industry" always look for a quantity of software graduates?  I've seen job adverts for "10 Junior PHP programmers", for example.  It's not unusual for a large number of people with similar, low-end, skills to be asked for.  At the same time, there's a massive gap between that low experience level and the early-career experience level.

My first job was great.  I was well-paid, well-treated, and was surrounded by intelligent people.  Three rather decent jobs after that, I wasn't even earning the inflation-adjusted amount I was earning in my first job.  And yet, somehow, a few months later there was a difference of 40% or so in the salaries of the types of positions I was being asked to apply for.

I can only imagine it's as irritating for other people, working their way up between the fifteen billion other graduates that compete for the sorts of jobs you're able to apply to with your experience.  You're at least 20 times more effective than the average of those fifteen billion people, and somehow you're being extortionate for asking for 10-30% more salary.  And you need that salary, since, being interested in the field, you want to buy books, be online, and so forth.

The reason there are fifteen billion other graduates is because the companies are asking for quantity of staff, not quality of them.  People see lots of jobs open, and so decide to go "into IT".  Those people who go "into IT" because of the available jobs are just not worth as much as those actually interested in the particular subject.  10 low-experience programmers earning R5k a month (take home) aren't nearly as valuable as 2 higher-experience programmers earning R25k a month (take home), and the 10 low-experience programmers also cost more because they use more desk space, more parking bays, and so forth.

That's bad enough, of course, but now they're calling IT graduates lazy, because the average IT graduate probably is lazy, because the average IT graduate is in the wrong field.  If you want someone who isn't lazy, don't ask for 10 graduates - ask for 2 higher-experience people.

The worst reason I see for hiring more junior people is that more senior people - people with more love for the field - tend to move jobs, which is a lot worse than if you only have one or two of the ten people churning at a time.  But, frankly, look at the way you treat the more senior people, and you'll quickly find why they leave - because they're not given the things they need to perform.

Unless you're grossly underpaying your senior staff, they're likely to stay if they're treated well.   Sure, that may mean forking out more so that the development area is properly lit.  Or that the environment has enough air flow.  That the temperature is managed.  That there are enough plug points.  That there are two LCD monitors on their desks.  Heck, offices for every, or every two or three, developers, so that they're not constantly surrounded by noisy colleagues who sing along to music playing on their earphones, or are making sales calls, or who just talk to themselves loudly or discuss the cricket or how to make money fast with property with other members of the staff.  But, you'll find, they're worth it.  They're worth more than five other people, and don't cost five times as much.  And they're there and if you treat them well, they'll stay there.  And they know what they're doing already!  Less time wasted on training!

Now, people will say that I'm being elitist - that I'm not thinking about ways for junior people to join the industry.  Well, firstly, boo hoo!  Why should we care about all of the artificially high number of people who go into an industry for the wrong reason and into an industry that doesn't actually need them?  We should care about those that are in it for the right reason, and those that would be in it if given the opportunity.

With less chaff, there will be less competition for those that are in it for the jobs available.  Those that would be in it if given the opportunity are not a problem that is solved by having tons of low-experience jobs.  That requires work before they even decide what job they want to go into - they need to know that they're interested in it and/or suited to it by then.

Of course, this does leave a lot of people who've been through all these courses and so forth without something to do.  Maybe we can buy all of them a series of books by W. Richard Stevens, Frederick Brooks, and Donald Knuth, and see who floats.  It'll be cheaper than the fly-by-night or utterly useless "programming course" they've been on and will go on again when they're conned into thinking it'll get them a well-paying job.

Joe Frog was wondering how sorted the South African blogosphere is now that Amatomu is on the scene.  His commentary centred around three areas - "True blogs", "local traffic patterns", and "locally-hosted content".  I was writing a comment there until I realised it was a bit long.

True blogs 

The "True blogs" is perhaps the most contentious and problematic concept to discuss - because everyone does have an idea of what a "true blog" is, and they are all right, in a way.

Were I to worry where I am located in the stats, it would be terrible for me to know that I am the #500 most interesting person in the South African blogosphere.  But it wouldn't be so bad, if the #500 means "blogs" which includes 400 news sites, 50 mass-blogging sites, and so forth.  If I ended up at #25 of the personal blogs - the "true blogs" in my "personal blogger" mindset, then I'd feel quite good.  But I'd be a little irritated that I'd have to justify this to myself whenever I looked at the stats.  (Hourly, I'm sure, in this mindset.)

The #1 technology blog, and #3 overall blog, at the time of writing is Tectonic, the great news site started by Alastair Otter.  Because of my long-time admiration of it, it would be hard for me to support a definition of "true blog" that would deny them access.  The core content comes from three or so reporters who put a lot of effort into generating content - and is the primary source of news on the local open source scene (again, admitting my bias).

On the other hand, if IOL (which, since I was lead developer there for some time, I have an attachment to as well) was suddenly a "blog", it would devalue the meaning of the listings.  And, frankly, mainstream media is a lot of what people who follow blogs are looking for an alternative to.  The fact that IOL is primarily an aggregation of mainstream media stories - of articles that were in a newspaper or which arrived from a press feed - must deny them the title "blog".

Similarly to Tectonic, #3 politics blog, Commentary, is obviously a "true blog" with its three core contributors (I think there's been the occasional post by others - someone can correct me if it bothers them).  But I find it hard to justify #28 overall, My Broadband and My ADSL blogs, as a "true blog" (sorry to take it out on you - nothing personal, you're just the first to catch my eye).

Maybe it's about coherency and consistency: Commentary is very much about politics; it's what makes Commentary what it is.  But with group blogs (even though there seem to be only six or so people there) like My Broadband and My ADSL blogs, it's more like six separate voices through one funnel.  Those interested in Commentary are almost always going to be interested in everything said, due to the cohesive content, but I wouldn't like to have to listen to the other voices if I found a particular author on a group blog when adding the RSS feed.  If there's a separate RSS feed for each contributor, they should each be separate (or we'd have to treat dotnet.org.za with its 20+ bloggers with separate blogs as just one "blog" as well).

Beyond this, is there much we can do beyond classifying blogs as "personal", "company", "news", and so forth, and managing the list separately?  And what advantage is there to this beyond soothing a few bruised egos?

Well, for me, I'm just not interested in blogs that don't discuss personal feelings on news items.  Unless you're Andrew Sullivan, you just don't get to write personal feelings on the news with more than a few items a day.  I don't want to know _what_ happened, I want to know what people I respect think is important enough to talk about, and what their feelings on it are.  The value of Keo is much higher than the value of "rugby24.com" which just regurgitates what's happening in rugby.

Local traffic patterns

Local traffic patterns is an interesting subject, but there's not much to discuss.

As someone whose content is primarily of interest to two rather small niches - a larger group of open source (primarily Python) developers, and a much smaller group of South African open source people - I'd be rather unhappy if my "South African-ness" rating suffered simply because there aren't many South African open source (primarily Python) developers.  I'm a South African, and I generate all my content myself.  I don't "point" much, and when I do, it's for South African content (usually news about open source events).  I write code, and walk people through the code.  I explain how to do things.  If I'm not doing that, I'm giving personal opinion.

On the other hand, one has to wonder if something like Engadget (were it written by South Africans, or even hosted in South Africa (haha)) should be counted as a "South African blog", since its traffic would almost exclusively be from international visitors.

Something like GeoIP could be used to capture the data easily enough - but how to display it?

Locally-hosted content

Well, this is a no-brainer.  I can't get hosting in this country for nearly as cheaply as I can from elsewhere.  I don't run some boring blog-clone hosted on typepad or Blogspot, or run a blog-clone on my own Wordpress instance - I wrote my own damn blog software, and it doesn't use some lowest-common-denominator programming language!  And, well, I'm a bit of a geek, so I want root on the machine, and it'll run all sorts of other things besides a web site!  And it must have a decent amount of memory (512MB+), or I couldn't do lots of stuff I'd like to do!

Of course, I'd love to be hosted locally for the same amount of money I pay now (or even 25% more) for the exact same level of service - in terms of uptime, latency, bandwidth, and traffic costs.  But it's really not of interest to a service like Amatomu.

I was innocently trying to add OpenID to an application, following the advice in Damian Cugley's article on using OpenID with TurboGears and in the TurboGears documentation site's article on OpenID with Identity, when I realised that it was way too much like hard work to implement.  Edit this page, put this there, and so forth.  Why couldn't I just import something and have it Just Work?

Well, it seems, that's not hard at all, actually.  Thus, TGOpenIDLogin, which Just Works (well, python-openid Just Works, and I just use that) when you transfer login and so forth to it.  It's a TurboGears controller that you can hook up into any TurboGears application and have it take care of logging people in using OpenID.

It can't be simpler:

from tgopenidlogin.controllers import OpenIDLoginController

class Root(controllers.RootController):
    ...

    openid = OpenIDLoginController(User, VisitIdentity)
    login = openid.login
    logout = openid.logout

It remembers where you were trying to go, and comes with a simple OpenID form to put on any page which will remember what page you were on when you tried to log in - it's tgopenidlogin.widgets.OpenIDLoginForm.

You need to pass in the model for your User and VisitIdentity objects so that it can create users and update user details from their OpenID server, and so that it can log them on. Your User model needs to support usernames of 255 characters long. You can also pass in the web path to the OpenIDLoginController (it defaults to "/openid" relative to your web app base). You can pass in your own OpenID store, or it'll use a SQLite store (well, if you have pysqlite2 installed). You can also set your OpenID trust_root, or it'll default to the base of your web application.

Nothing invalidates logging in with plain username and password if you still want that. Your current login page can have a separate form (using widgets.OpenIDLoginForm, for simplicity) or a link to the TGOpenIDLoginController - just don't put in login = openid.login in your controller...

Still quite a bit to do:

  • It doesn't save original_parameters like the standard TurboGears login does.  This will require storing the parameters somewhere - probably the session.
  • The User/VisitIdentity stuff might only work with SQLAlchemy with assign_mapper and Elixir and maybe SQLObject - non-assign_mapper SQLAlchemy will need a separate handler.  This is probably easiest handled by making it easy to inherit from the controller.
  • The post-authentication action, redirecting to the target page, might not be useful for places that want full registration.  Again, probably best to stub it out with a default implementation, and let people inherit from the controller and override.
  • Oh, I haven't really tested interoperability.  I just used the example server from python-openid, and one that just failed, and a few pages without OpenID server links.

Three geeky meetings this week in Cape Town.  The (Western) Cape Linux User Group is having a meeting on Tuesday on the ZFS filesystem at their usual venue at UCT, and the usual dinner afterwards.  On Wednesday, the first revised re-hijacked GeekDinner will be held at Barbarella's at Constantia, and there'll be networking and the occasional presentation on things like geeky things (we hope).  And finally, the Cape Town Python User Group also returns after an absence for their second meeting on Saturday at AIMS in Muizenberg.

I've been trying to make Gibe, my not-yet-released web log software, less and less about my needs and more about being able to cope with others' needs.  That means ripping out a lot of stuff that's custom to me, but somehow making it still available on my own web log.  I've made some progress with that - I've made the comment format customisable, as well as found a solution to add custom scripts like Google Analytics and a syntax highlighter.  It also means making it easy to add new stuff, which is what my experiment over the weekend was - to add a set of social bookmarking links to posts, so that people can post interesting links to places like Digg, Reddit, del.icio.us, and South Africa's Muti.

The sociable Wordpress plugin is one I've seen around, and I figured I might as well see how it was put together.  At its core, it is a list of social bookmarking sites and how to post to them:

$sociable_known_sites = Array(

        'blinkbits' => Array(
                'favicon' => 'blinkbits.png',
                'url' => 'http://www.blinkbits.com/bookmarklets/save.php?v=1&source_url=PERMALINK&title=TITLE&body=TITLE',
        ),

        'BlinkList' => Array(
                'favicon' => 'blinklist.png',
                'url' => 'http://www.blinklist.com/index.php?Action=Blink/addblink.php&Url=PERMALINK&Title=TITLE',
                'description' => 'Description',
        ),

        'BlogMemes' => Array(
                'favicon' => 'blogmemes.png',
                'url' => 'http://www.blogmemes.net/post.php?url=PERMALINK&title=TITLE',
        ),
        ...
);

First order of business was to convert that into a Python data structure.  Besides taking a detour via the itertools module just for fun, it didn't take too much effort:

#!/usr/bin/env python

import sys
import itertools
import re

if len(sys.argv) > 1:
    fp = open(sys.argv[1])
else:
    fp = sys.stdin

lines = fp.readlines()

beg = re.compile(r'^\$sociable_known_sites')
end = re.compile(r'^\);')
def find_beginning(item):
    return not beg.match(item)

def find_end(item):
    return not end.match(item)
    
def dropbefore(lines):
    for line in itertools.dropwhile(find_beginning, lines):
        yield line

def dropafter(lines):
    for line in itertools.takewhile(find_end, lines):
        yield line

start_item = re.compile(r"""^\t('[^']*') => Array\(""")
favicon_item = re.compile(r"""\t\t('favicon') => '([^']*)',""")
mid_item = re.compile(r"""\t\t('[^']*') => ('[^']*'),""")
end_item = re.compile(r"""\t\),""")

def start_item_handle(m):
    print "\t%s: {" % (m.groups()[0],)

def favicon_item_handle(m):
    k, v = m.groups()
    print "\t\t%s: turbogears.url('/tg_widgets/tgsociable/images/%s')," % (k, v)

def mid_item_handle(m):
    k, v = m.groups()
    # Undo the HTML escaping of ampersands in the URL templates
    v = v.replace("&amp;", "&")
    print "\t\t%s: %s," % (k, v)

def end_item_handle(m):
    print "\t},"

handlers = [
    (start_item, start_item_handle),
    (favicon_item, favicon_item_handle),
    (mid_item, mid_item_handle),
    (end_item, end_item_handle),
]

def handle_lines(lines):
    for line in dropafter(dropbefore(lines)):
        line = line.rstrip()
        for regex, handler in handlers:
            m = regex.search(line)
            if m:
                handler(m)
                break

print "import turbogears"
print "all_sites = {"
handle_lines(lines)
print "}"

This outputs the expected Python code (using turbogears.url in a not-too-correct way...):

import turbogears
all_sites = {
        'blinkbits': {
                'favicon': turbogears.url('/tg_widgets/tgsociable/images/blinkbits.png'),
                'url': 'http://www.blinkbits.com/bookmarklets/save.php?v=1&source_url=PERMALINK&title=TITLE&body=TITLE',
        },
        'BlinkList': {
                'favicon': turbogears.url('/tg_widgets/tgsociable/images/blinklist.png'),
                'url': 'http://www.blinklist.com/index.php?Action=Blink/addblink.php&Url=PERMALINK&Title=TITLE',
        },
        ...
} 

Next, displaying the actual HTML.  At this point, the PHP is not conducive to programmatic Python conversion:

        $html .= "\n<div class=\"sociable\">\n<span class=\"sociable_tagline\">\n";
        $html .= get_option("sociable_tagline");
        $html .= "\n\t<span>" . __("These icons link to social bookmarking sites where readers can share and discover new web pages.", 'sociable') . "</span>";
        $html .= "\n</span>\n<ul>\n";

        foreach($display as $sitename) {
                // if they specify an unknown or inactive site, ignore it
                if (!in_array($sitename, $active_sites))
                        continue;

                $site = $sociable_known_sites[$sitename];
                $html .= "\t<li>";

                $url = $site['url'];
                $url = str_replace('PERMALINK', $permalink, $url);
                $url = str_replace('TITLE', $title, $url);
                $url = str_replace('RSS', $rss, $url);
                $url = str_replace('BLOGNAME', $blogname, $url);
                $url = str_replace('VERSION', $sociable_version, $url);

                $html .= "<a href=\"$url\" title=\"$sitename\"";
                if ($site['description'])
                    $html .= " onfocus=\"sociable_description_link(this, '{$site['description']}')\"";
                $html .= ">";
                $html .= "<img src=\"$imagepath{$site['favicon']}\" title=\"$sitename\" alt=\"$sitename\" class=\"sociable-hovers";
                if ($site['class'])
                    $html .= " sociable_{$site['class']}";
                $html .= "\" />";
                $html .= "</a></li>\n";
        }

        $html .= "</ul>\n</div>\n";

        return $html;

This turns out a lot prettier thanks to the templating engine (Kid, in this case):

class SociableWidget(widgets.Widget):
    template = """
<div class="sociable" xmlns:py="http://purl.org/kid/ns#">
  <span class="sociable_tagline">
    <strong py:content="sociable_tagline">get_option("sociable_tagline");</strong>
    <span py:content="sociable_tagline_description">_("These icons link to social bookmarking sites where readers can share and discover new web pages.", 'sociable')</span>
  </span>

  <ul>  
    <li py:for="site in sites">
      <a py:attrs="site['anchor_attrs']">
        <img py:attrs="site['img_attrs']" />
      </a>
    </li>
  </ul>
</div>
    """

There's a bit more work to make it automatically include the right CSS and Javascript, and to make it configurable (including allowing for extra sites to be added) from the caller of the widget:

    css = [
        widgets.CSSLink("tgsociable", "css/sociable.css"),
    ]
    javascript = [
        widgets.JSLink("tgsociable", "javascript/description_selection.js"),
    ]

    params_doc = {
        'active_sites' : 'Sites to display sociable icons for',
        'sociable_tagline' : 'Tag line heading',
        'sociable_tagline_description' : 'Tag line explanation',
        'extra_sites' : 'Sites not in the existing sites list that you want to use',
    }
    params = params_doc.keys()

    active_sites = ["Digg", "Reddit", "del.icio.us"]
    sociable_tagline = "Share and Enjoy:"
    sociable_tagline_description = "These icons link to social bookmarking sites where readers can share and discover new web pages."
    extra_sites = {
# Example:
#        'muti': {
#                'favicon': 'http://muti.co.za/images/favicon.ico',
#                'url': 'http://muti.co.za/submit?url=PERMALINK&title=TITLE',
#        },
    }

    def update_params(self, d):
        super(SociableWidget, self).update_params(d)
        active_sites = d['active_sites']
        d['sites'] = []

        my_all_sites = all_sites.copy()
        my_all_sites.update(d['extra_sites'])

        for sitename in active_sites:
            if sitename not in my_all_sites:
                continue

            site = my_all_sites[sitename]

            url = site['url']
            url = url.replace('PERMALINK', d['post_url'])
            url = url.replace('TITLE', d['post_title'])
            url = url.replace('RSS', d['blog_rss'])
            url = url.replace('BLOGNAME', d['blog_name'])
            sociable_version = "2.0"
            url = url.replace('VERSION', sociable_version)

            anchor_attrs = {}
            anchor_attrs['href'] = url
            if 'description' in site:
                anchor_attrs['onfocus'] = "sociable_description_link(this, '%s')" % (site['description'],)
            img_attrs = {}
            img_attrs['src'] = site['favicon']
            img_attrs['title'] = sitename
            img_attrs['alt'] = sitename
            img_attrs['class'] = "sociable_hovers"
            if 'class' in site:
                img_attrs['class'] += " " + site['class']
            d['sites'].append(dict(anchor_attrs = anchor_attrs, img_attrs = img_attrs))
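One thing neither the PHP nor my port above does is escape the values being substituted in, so a title with an ampersand in it would break the query string.  A possible fix, sketched here as a standalone helper rather than as part of the widget (fill_url is a made-up name, and the example.com permalink is just for illustration), is to percent-encode each value and do all the replacements in a single pass:

```python
import re

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def fill_url(template, values):
    """Replace placeholders such as PERMALINK with percent-encoded values.

    All placeholders are replaced in one pass, so a value that happens to
    contain the text of another placeholder is not substituted again.
    """
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda m: quote(values[m.group(0)], safe=""), template)


submit_url = fill_url(
    "http://muti.co.za/submit?url=PERMALINK&title=TITLE",
    {
        "PERMALINK": "http://example.com/blog/?p=1",
        "TITLE": "Share & Enjoy",
    },
)
```

In update_params, something like this could stand in for the chain of url.replace calls, with the dictionary built from d['post_url'], d['post_title'] and friends.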

To use the widget is trivial. First, create the widget instance:

# No configuration - shows Digg, Reddit, and del.icio.us
widget = SociableWidget()

# Choose which sites to show:
widget = SociableWidget(active_sites = ['del.icio.us'])

# Add extra sites of your own
extra_sites = {
    'muti': {
        'favicon': 'http://muti.co.za/images/favicon.ico',
        'url': 'http://muti.co.za/submit?url=PERMALINK&title=TITLE',
    },
}
widget = SociableWidget(extra_sites = extra_sites, active_sites = ['muti', 'del.icio.us'])

You can then pass this widget into a template. In the template, you need to provide the post URL, post title, RSS feed, and blog name when you display the widget - something like:

<span py:replace="ET(widget.display(post_url=tg.base_url, 
    post_title=post.title, blog_name = blog.name,
    blog_rss = tg.url_for('rss2.0.xml')))" />

At this point, this is just a plain TurboGears widget, that can be used in any TurboGears application (converting it to ToscaWidgets would be pretty trivial, and then it'll also be usable from Pylons and other platforms that are supported). Hooking it up so that it automatically gets displayed in Gibe posts took a bit more work, and I'll post about that later.

You can download the TGSociable widget on my TGSociable page.  You can also follow other posts about TGSociable here - I'll use the tag tgsociable.

Woo JoeGeekDinner has got some press on IOL Technology.  Hopefully this will help attract those geeks who aren't part of the whole blogging thing (yet).  It's just less than a week to go - 28th March at Barbarellas in Constantia Village, Cape Town.  Still need to figure out what I'm going to talk about in one of the three-minute-talk/two-minutes-questions sessions, though.

When I first started writing gibe, I wasn't too concerned about much beyond the adventure of writing my own software.  In fact, I'm still not particularly concerned about more than that.  But my initial three-second decision and implementation of TinyMCE for comments has caused at least as much trouble as I expected it to, and I've now crossed off the "make it so that different comment formats are supported" item in my checklist.

Gibe was always intended to have a plugin architecture.  Plugins would add new URLs to the routes router, allowing me to add RSS, tags pages, and so forth, all without changing any core code.  Plugins would register themselves using pkg_resources.  It would be a sunny, but cool, day at the beach with the dogs.  And so forth.

But until that wonderful day (I'm hoping this weekend), I'll have to settle for the comment format problem.

Initially, the comment form was generated using TurboGears's wonderful widgets system (and will soon use its heir, ToscaWidgets):

class CommentFormFields(widgets.WidgetsList):
    name = widgets.TextField(validator=validators.NotEmpty,
        label = "Your name", attrs=dict(size=60))
    email = widgets.TextField(
        validator=validators.All(validators.Email, validators.NotEmpty),
        label = "Email", attrs=dict(size=60))
    website = widgets.TextField(validator=validators.URL,
        label = "Site", attrs=dict(size=60))
    comment = TinyMCE(validator=validators.NotEmpty, label = "Comment",
        mce_options = dict(
            mode = "exact",
            theme = "advanced",
            plugins = "fullscreen",
            relative_urls = False,
            theme_advanced_buttons2_add = "fullscreen",
            extended_valid_elements = "a[href|target|name]",
            paste_auto_cleanup_on_paste = True,
            paste_convert_headers_to_strong = True,
            paste_strip_class_attributes = "all",
            theme_advanced_buttons3 = "",
            remove_linebreaks = False,
            browsers = "msie,gecko",
        ),
        rows=15,
        cols=60,
    )
    postid = widgets.HiddenField()

comment_form = widgets.TableForm(fields=CommentFormFields(), submit_text="Post")

The templates displayed the comments directly from the model, assuming the stored content was HTML:

        <div class="blogcomment" py:for="comment in post.comments" py:if="comment.approved">
        <div id="au${comment.comment_id}">
        <p><span class="blog_comment_post_time" 
          py:content="comment.posted_time.strftime(str('%B %d, %y %X'))">September 20, 2006 22:00</span>
        |
          <span class="reference" py:content="tg.ET(comment.getReference())"><a href="http://www.greenman.co.za/b2evolution/blogs/">Ian</a></span></p>
        <p py:content="HTML(comment.content)">Any idea what the other three were using?</p>
        </div>
        </div>

Next step - add a getContentHtml method to the model (could've been a property, I suppose. Can always change later...), and use that from the template. First, just the plain method for the current case:

    def getContentHtml(self):
        return self.content

Then, add a content_format column to my Comment model (I'm currently using ActiveMapper on SQLAlchemy, but I'll be moving to the Elixir declarative layer over SQLAlchemy soon), defaulting to html if no content_format is provided, so that a default conversion to HTML can be done for it:

class Comment(ActiveMapper):
    class mapping:
        comment_id = column(Integer, primary_key=True)
        post_id = column(Integer, foreign_key=ForeignKey('post.post_id'))
        blog_id = column(Integer, foreign_key=ForeignKey('blog.blog_id'))
        author_id = column(Integer, foreign_key=ForeignKey('user.user_id'))
        author_name = column(Unicode(255))
        author_url = column(String(255))
        author_email = column(String(255))
        content = column(Unicode())
        posted_time = column(DateTime)
        approved = column(Boolean, default=False)
        content_format = column(Unicode(25), default = "html")

I decided to use the dispatch module (sorry, that's the best link I could find) to decide how to convert from the comment's content in its format to HTML - it allowed me to easily provide alternate implementations in the same place before going fully into the plugin space:

    @dispatch.generic()
    def getContentHtml(self):
        pass

    @getContentHtml.when('self.content_format == "html"')
    def getContentHtmlfromHtml(self):
        return self.content
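
For anyone without the dispatch module to hand, the same idea can be sketched with a plain dictionary of converters keyed by content_format. This is only an illustration in present-day Python 3, not what gibe does: the converts decorator, the _converters dictionary and the stripped-down Comment class are all made up for the example.

```python
# Hypothetical stand-in for the generic function: a dict of converters
# keyed by content_format. None of these names come from gibe itself.
_converters = {}


def converts(content_format):
    """Register the decorated function as the HTML converter for a format."""
    def register(func):
        _converters[content_format] = func
        return func
    return register


class Comment:
    """Stripped-down comment with only the two attributes the sketch needs."""

    def __init__(self, content, content_format="html"):
        self.content = content
        self.content_format = content_format

    def getContentHtml(self):
        # Look up the converter for this comment's format and apply it.
        try:
            converter = _converters[self.content_format]
        except KeyError:
            raise LookupError("no HTML converter for %r" % self.content_format)
        return converter(self)


@converts("html")
def html_from_html(comment):
    # HTML comments are stored ready to display.
    return comment.content
```

A plugin would then add its own format with another @converts("...") function, much like the .when() rule above, but limited to matching on the format name.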

Everything still works, but there's no way to change the comment form field, and no way to specify the comment format or do anything necessary before writing it to the database. So I added an entry point: a means by which modules can advertise themselves (or their classes or functions) as available for a particular topic. In this case it is a comment format "topic", which is basically a registry of comment formats and how to deal with them. First, edit setup.py to add an entry point for the current TinyMCE-based system (the postmarkup line is for the second format, which comes later):

    entry_points = """
        [gibe.comment_formats]
        tinymce = gibe.tinymcesupport
        postmarkup = gibe.postmarkupsupport
    """,

I'm probably committing a major faux pas, but I read the entry points into a simple format registry dictionary:

format_registry = {}
for comment_format_mod in pkg_resources.iter_entry_points("gibe.comment_formats"):
    mod = comment_format_mod.load()
    format_registry[comment_format_mod.name] = mod
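
A slightly more defensive variant, sketched in present-day Python 3 with the standard library's importlib.metadata instead of pkg_resources, skips plugins that fail to import rather than taking the whole blog down with them. The function name load_format_registry and its entry_points parameter (there so the function can be exercised without installing anything) are my own assumptions, not part of gibe.

```python
import sys
from importlib import metadata


def load_format_registry(group="gibe.comment_formats", entry_points=None):
    """Build a {name: loaded object} registry from an entry point group."""
    if entry_points is None:
        eps = metadata.entry_points()
        if hasattr(eps, "select"):
            # Newer Pythons return a selectable collection.
            entry_points = eps.select(group=group)
        else:
            # Older Pythons return a plain dict of group name -> entry points.
            entry_points = eps.get(group, [])

    registry = {}
    for ep in entry_points:
        try:
            registry[ep.name] = ep.load()
        except Exception as exc:
            # A broken plugin should not stop the others from loading.
            sys.stderr.write("skipping comment format %r: %s\n" % (ep.name, exc))
    return registry
```

Calling load_format_registry() with no arguments reads whatever is installed under the gibe.comment_formats group.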

Now the comment form needs to be updated so that it can use whichever format it is configured to use in the configuration file. The comment form fields need to be modified by the comment format plugin, and a content_format field needs to be added to the comment form too (and should be validated to be a format we can handle). This turns out to be pretty easy:

def fields():
    formats = []

    formats_to_check = [
        cherrypy.config.get('gibe.comment_format.preferred'),
        cherrypy.config.get('gibe.comment_format.fallback', None),
        ['postmarkup', 'tinymce'],
    ]

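    for fs in formats_to_check:
        # An unset config option comes back as None; skip it rather than
        # trying to iterate over it.
        if fs is None:
            continue
        if isinstance(fs, (str, unicode)):
            fs = [fs]
        for f in fs:
            if f not in formats:
                formats.append(f)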

    for format in formats:
        if format in format_registry:
            class CommentFormFields(widgets.WidgetsList):
                postid = widgets.HiddenField()

                content_format = widgets.HiddenField(default=format,
                    validator=validators.OneOf(format_registry.keys()))

                name = widgets.TextField(validator=validators.NotEmpty,
                    label = "Your name", attrs=dict(size=60))

                email = widgets.TextField(label = "Email",
                    validator=validators.All(validators.Email, validators.NotEmpty),
                    attrs=dict(size=60))

                website = widgets.TextField(validator=validators.URL,
                    label = "Site", attrs=dict(size=60))

            comment_form_fields = CommentFormFields()

            format_registry[format].addCommentFields(comment_form_fields)
            return comment_form_fields

comment_form = widgets.TableForm(fields=fields(), submit_text="Post")
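
The format-picking part of fields() can be pulled out into a small pure function, which makes the preference order easier to see and to test. This is a sketch in present-day Python 3 rather than the code gibe runs; choose_format and its parameters are names invented for the example.

```python
def choose_format(preferred, fallback, defaults, registry):
    """Return the first candidate format that has a registered plugin.

    preferred and fallback may each be None, a single name, or a list of
    names; defaults is a list of names tried last. Returns None when
    nothing usable is registered.
    """
    candidates = []
    for entry in (preferred, fallback, defaults):
        if entry is None:
            continue  # option not set in the configuration
        if isinstance(entry, str):
            entry = [entry]
        for name in entry:
            if name not in candidates:
                candidates.append(name)

    for name in candidates:
        if name in registry:
            return name
    return None
```

For example, with only the tinymce plugin registered, choose_format("markdown", None, ["postmarkup", "tinymce"], registry) falls through to "tinymce".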

On the TinyMCE format plugin side, the addCommentFields function adds the TinyMCE form field to the widgets list:

from tinymce import TinyMCE
from turbogears import widgets, validators

def addCommentFields(wl):
    class CommentFormFieldsExtra(widgets.WidgetsList):
        comment = TinyMCE(validator=validators.NotEmpty, label = "Comment",
            mce_options = dict(
                mode = "exact",
                theme = "advanced",
                plugins = "fullscreen",
                relative_urls = False,
                theme_advanced_buttons2_add = "fullscreen",
                extended_valid_elements = "a[href|target|name]",
                paste_auto_cleanup_on_paste = True,
                paste_convert_headers_to_strong = True,
                paste_strip_class_attributes = "all",
                theme_advanced_buttons3 = "",
                remove_linebreaks = False,
                browsers = "msie,gecko",
            ),
            rows=15,
            cols=60,
        )
    wl.extend(CommentFormFieldsExtra())

Almost done with the core code - the add comment method needs to allow the comment format plugin to convert, sanitise, or reject the incoming comment. In the original, the Genshi HTMLSanitizer filter is used to sanitise the HTML. I decided that the plugin modules would have a class named Commenting available for doing conversion and rejection and possibly for post-save actions. (This also allows for modules to convert from a format to HTML, and then change the content_format variable to html, and not have to write their getContentHtml implementation.)

    @error_handler(post)
    @validate(form=comment_form)
    def add_comment(self, blog, **kw):
        post = Post.get_by(post_id = kw['postid'])
        if not post.accept_comments:
            flash("Post does not allow comments")
            raise routes.redirect_to('posts', post = post)

        content_format = kw['content_format']
        commenting = format_registry[content_format].Commenting()

        commenting.convert(kw)

        ckw = {
            'post_id': kw['postid'],
            'blog_id': blog.blog_id,
            'author_name': kw['name'],
            'author_url': kw['website'],
            'author_email': kw['email'],
            'content': kw['comment'],
            'approved': True,
            'posted_time': datetime.now(),
            'content_format': kw['content_format'],
        }

        c = Comment(**ckw)

        commenting.post_save(kw)

        self._check_for_spam(blog, c, kw)

        raise routes.redirect_to('posts', post = post)

Implementing this for the TinyMCE plugin is, again, quite trivial:

import commenting
from gibe.util import sanitise

class Commenting(commenting.Commenting):
    def convert(self, kw):
        kw['comment'] = sanitise(kw['comment']).decode('utf-8')

And that's about it for the core code, and recreating the TinyMCE support. Of course, the whole point was to offer other comment formats, and so I implemented the newly created postmarkup module which provides BBCode-like support for user-supplied data in a controlled fashion.

First, the postmarkup object needs to be created:

pm = postmarkup.PostMarkup().default_tags()
pm.add_tag('link', postmarkup.LinkTag, 'link')
pm.add_tag('quote', postmarkup.QuoteTag)
pm.add_tag('code', CodeTag)

Then, a function to convert from postmarkup to HTML:

from gibe.model import Comment

@Comment.getContentHtml.when('self.content_format == "postmarkup"')
def getContentHtmlPostMarkup(self):
    return pm.render_to_html(self.content.encode('utf-8')).decode('utf-8')

An addCommentFields function:

def addCommentFields(wl):
    class CommentFormFieldsExtra(widgets.WidgetsList):
        comment = widgets.TextArea(label = "Your comment", rows=15, cols=45,
            validator=validators.NotEmpty,
        )
        explanation = PostMarkupExplanation()
    wl.extend(CommentFormFieldsExtra())

No conversion necessary to write to the database, so just an empty Commenting class:

class Commenting(commenting.Commenting):
    def convert(self, kw):
        pass
    def post_save(self, kw):
        pass

For bonus points, I added an explanation to the add comment form so that people know how to format their comments:

from turbogears import widgets, validators

class PostMarkupExplanation(widgets.FormField):
    template = """
    <div xmlns:py="http://purl.org/kid/ns#"
        class="${field_class}"
        id="${field_id}"
    >
        <p>The text area above accepts Post Markup, a BBCode work-alike.</p>

        <pre>
[b]foo[/b]: <strong>foo</strong>
[i]foo[/i]: <em>foo</em>
[link]http://nxsy.org/[/link]: <a href="http://nxsy.org/">http://nxsy.org/</a> [nxsy.org]
[link http://nxsy.org/]Neil[/link]: <a href="http://nxsy.org/">Neil</a> [nxsy.org]
        </pre>

        <p>You can also use:</p>
        <pre>
[code python]
import foo
[/code]
        </pre>
    </div>
    """

Add it to the entry_points in setup.py, and we're done.

Usually I intro with some inane comment like:

There are two types of people - those who love Amarok, and those that don't matter.

But now I get to say:

I use Amarok, as recommended (and generally given fan-boy loving) by Wil Wheaton.

Anyway, I use Amarok (formerly amaroK), and I love that it makes exploring my music fun.  My noisy work environment (grr!) means that I'm spending almost all my time listening to music, which has certainly made me appreciate Amarok more.  But occasionally I'm summoned from my other, productive world by real-world "needs" like food, drink, the toilet, and having to find out what someone means when there's no spec to consult (grr!).

Being a former systems administrator (and, indeed, a former card-carrying security specialist - the card is now a bookmark...), I lock my console for even the smallest interruption.  After the first few hundred interruptions (i.e., the first two or three days), I got irritated by locking my screen without having paused my music, and then having to unlock it, pause the music, and lock again.  So, I wrote something to automatically pause when I lock my screen - I'm using GNOME's screensaver (aka gnome-screensaver) on Ubuntu.

Unlike xscreensaver, it doesn't have a -watch option - you have to listen to dbus events.  Hint to gnome-screensaver people - dbus is a nice behind-the-scenes way of doing things, but sometimes it is nice to have a specific way to watch for things.  Even if it just runs dbus-monitor with the right commands for you.  Let's not forget our Unix heritage...

Getting the pausing working from dbus messages was actually quite simple - I just combined a Perl regex from one source, and Amarok command line options from another, in a simple Python program:

#!/usr/bin/env python

import subprocess
import re

DBUS_MONITOR = ["dbus-monitor", "--session",
    "type='signal',interface='org.gnome.ScreenSaver',member='SessionIdleChanged'"]
PAUSE_AMAROK = ["amarok", "--pause"]
PLAY_AMAROK = ["amarok", "--play-pause"]

screensaver_on = re.compile("boolean true")
screensaver_off = re.compile("boolean false")

def main():
    a = subprocess.Popen(DBUS_MONITOR, bufsize=1,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        close_fds=True)
    out = a.stdout

    while a.poll() is None:
        line = out.readline()
        if screensaver_on.search(line):
            subprocess.Popen(PAUSE_AMAROK).communicate()
        if screensaver_off.search(line):
            subprocess.Popen(PLAY_AMAROK).communicate()

Simply, dbus-monitor watches the dbus events and writes the ones that are asked for (otherwise all of them) to stdout.  When the screen saver turns on, I tell Amarok to pause.  When it turns back off, I tell Amarok to unpause.  To be utterly random, I used the subprocess module to call dbus-monitor and Amarok's command line.
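
The line-matching step can be separated from the process handling so that it can be checked without a running screensaver. Here is a sketch in present-day Python 3, where pipe output arrives as bytes unless decoded; command_for_line is a name I made up, and the two command lists are copied from the script above.

```python
import re

PAUSE_AMAROK = ["amarok", "--pause"]
PLAY_AMAROK = ["amarok", "--play-pause"]

# dbus-monitor prints the signal's argument on its own line,
# e.g. "   boolean true".
_BOOLEAN = re.compile(r"\bboolean (true|false)\b")


def command_for_line(line):
    """Return the Amarok command for one line of output, or None."""
    if isinstance(line, bytes):
        line = line.decode("utf-8", "replace")
    match = _BOOLEAN.search(line)
    if match is None:
        return None  # header line or unrelated output
    if match.group(1) == "true":
        return PAUSE_AMAROK
    return PLAY_AMAROK
```

The main loop would then call command_for_line on each line it reads and pass any non-None result to subprocess.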

Amarok also offers a DCOP interface to tell it what to do and find out what it is doing.  Between the dbus and dcop Python modules, we could get rid of all the silly command line stuff.  But it works fine now.  (And since dbus is replacing DCOP in KDE4, there will almost certainly be an Amarok plugin built in to do this.)

I also added simple Python daemonising code, stolen from the ActiveState Python Cookbook, so that I can just fire-and-forget it:

def daemonize(func):
    import os
    import sys
    try: 
        pid = os.fork() 
        if pid > 0:
            # exit first parent
            sys.exit(0) 
    except OSError, e: 
        print >>sys.stderr, "fork #1 failed: %d (%s)" % (e.errno, e.strerror) 
        sys.exit(1)

    # decouple from parent environment
    os.chdir("/") 
    os.setsid() 
    os.umask(0) 

    # do second fork
    try: 
        pid = os.fork() 
        if pid > 0:
            # exit from second parent, print eventual PID before
            print "Daemon PID %d" % pid 
            sys.exit(0) 
    except OSError, e: 
        print >>sys.stderr, "fork #2 failed: %d (%s)" % (e.errno, e.strerror) 
        sys.exit(1) 

    # start the daemon main loop
    func()

if __name__ == "__main__":
    daemonize(main)