Benjamin Johnston - Markdown Metadata

My homepage is a place where I can experiment with technology, because I know the “client” is tech-savvy.

When designing the technical architecture behind this site, I had three objectives:

The site must be served from static HTML
The site must be easy to update
The technology must be simple and elegant

My solution is to use Markdown encapsulated in RFC822 messages.

A Directory of Text Files

Nothing could be simpler or easier to update than a directory of text files!

Markdown: Markdown is a simple plain-text syntax for rich documents. Markdown documents are easily transformed into formatted HTML, so a simple directory of Markdown files comes close to meeting all three objectives. Unfortunately, Markdown lacks a syntax for meta-data, so a bit of extra work is required to fit it into a web publishing workflow.
RFC822: The 20+ year old email message format defined in RFC822 (or RFC5322) is perfect for encapsulating text files. Indeed, this is all that email was in its early days: a text message with a set of headers containing meta-data.

Example Message

For example, a simple post might be stored in a file such as the following:

Title: Example
Type: Article

Hello, World!
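
To make the structure concrete, here is a minimal sketch of how Python's standard email module reads a message like the one above: header lines first, then a blank line, then the body. The variable names are only illustrative, and the Author header is a hypothetical field used to show what happens when a header is absent.

```python
import email

# The example post as a string: headers, a blank line, then the body
raw = "Title: Example\nType: Article\n\nHello, World!\n"
post = email.message_from_string(raw)

# Headers behave like a mapping of name/value pairs
headers = dict(post.items())

# Everything after the first blank line is the body (the Markdown text)
body = post.get_payload()

# Header names are matched case-insensitively
same = post['title'] == post['TITLE'] == 'Example'

# A header that is not present yields None rather than raising an error
missing = post['Author']
```

Because a missing header simply returns None, optional meta-data fields can be added to some posts without breaking the others.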

Implementation

The wonderful thing about using text-based formats and widespread standards is the ease of processing. For example, the following Python code will load a post stored in a file called test.txt, format the body into HTML and extract the field Title:

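import email
import markdown  # third-party: the Python-Markdown package

# Parse the file as an RFC822 message: headers, a blank line, then the body
with open('test.txt') as f:
    post = email.message_from_file(f)
body = post.get_payload()

# Render the Markdown body as HTML and read the Title header
# (header lookup is case-insensitive, so 'title' finds 'Title')
html = markdown.markdown(body, output_format='html5')
title = post['title']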

Generating the entire site is then a simple matter of transforming all text files into HTML and manipulating those files using the meta-data.
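
As a rough illustration of that step, the sketch below walks a directory of posts and writes one HTML file per post. It is not the script behind this site: the function name build_site, the .txt extension, the page layout and the returned index are all assumptions made for the example. The render parameter defaults to html.escape purely so the sketch runs with the standard library alone; passing markdown.markdown instead would give real Markdown output.

```python
import email
import html
import tempfile
from pathlib import Path

def build_site(src_dir, out_dir, render=html.escape):
    """Convert every .txt post in src_dir into an .html file in out_dir.

    Returns a list of (filename, title, type) tuples that could be used
    to generate an index page from the meta-data.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index = []
    for path in sorted(Path(src_dir).glob('*.txt')):
        with open(path) as f:
            post = email.message_from_file(f)
        # Fall back to the file name when a post has no Title header
        title = post['Title'] or path.stem
        body_html = render(post.get_payload())
        page = '<!DOCTYPE html>\n<title>%s</title>\n%s\n' % (
            html.escape(title), body_html)
        target = path.stem + '.html'
        (out / target).write_text(page)
        index.append((target, title, post['Type']))
    return index

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'posts')
    src.mkdir()
    (src / 'example.txt').write_text(
        'Title: Example\nType: Article\n\nHello, World!\n')
    index = build_site(src, Path(tmp, 'site'))
    page = Path(tmp, 'site', 'example.html').read_text()
```

Keeping the renderer as a parameter also leaves room for a preprocessing pass over the body before it is converted to HTML.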

Templating

To make it easier to ensure consistency across the site, I also used a simple templating language.

I preprocess the Markdown files to handle expressions such as <%= expr %>. These are evaluated in Python using an extremely simple templating engine:

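import re

def apply_template(template, vars):
    # Split into alternating text and expression pieces; the trailing '""'
    # gives the final piece of text an (empty) expression to pair with
    actions = re.split('<%=(.*?)%>', template) + ['""']
    result = []
    # Walk the list two items at a time: (text, expr)
    for text, expr in zip(* [iter(actions)] * 2):
        result.append(text)
        # eval runs arbitrary Python, so templates must come from a trusted source
        result.append(str(eval(expr, vars)))
    return ''.join(result)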

I admit that it isn’t a paragon of readability but it gets the job done without fuss.

Here’s how it works. Assume we’ve got a text file:

text_1 <%= expr_1 %> text_2 <%= expr_2 %> text_3

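The first line of apply_template splits the template into a list and appends an empty-string expression ('""'), so that the final piece of text also has an expression to pair with: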

['text_1 ',  ' expr_1 ', 
 ' text_2 ', ' expr_2 ', 
 ' text_3',  '""']

zip(* [iter(actions)] * 2) is a Python trick equivalent to:

action_iterator = iter(actions)
paired_list = zip(action_iterator, action_iterator)

Here, zip will draw consecutive elements from the same iterator. The result is that the elements of the actions list are paired into tuples:

[('text_1 ',  ' expr_1 '), 
 (' text_2 ', ' expr_2 '), 
 (' text_3',  '""')]

The loop then evaluates each expression (eval) and collects the results into a list that is concatenated and returned.
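
To see the engine end to end, here is a small usage sketch. The function is repeated so that the example runs on its own; the template text and the variables title and posts are hypothetical values chosen for illustration.

```python
import re

def apply_template(template, vars):
    # Same function as above, repeated so this example is self-contained
    actions = re.split('<%=(.*?)%>', template) + ['""']
    result = []
    for text, expr in zip(* [iter(actions)] * 2):
        result.append(text)
        result.append(str(eval(expr, vars)))
    return ''.join(result)

# Hypothetical variables made available to template expressions
page_vars = {'title': 'Example', 'posts': ['a', 'b', 'c']}

# Any Python expression can appear between <%= and %>
template = 'Title: <%= title.upper() %>, posts: <%= len(posts) %>.'

output = apply_template(template, page_vars)
print(output)  # Title: EXAMPLE, posts: 3.
```

Since each expression is passed to eval, a template can run arbitrary Python. That is convenient for a personal site where the author writes every template, but it would not be safe for text supplied by anyone else.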

Summary

Simple, no fuss and it works well. So far I’m very happy with this publishing workflow.