Invitations and the VCard Format

2010-12-16 | projects Themis

My next goal for the Themis project is to parse an invitation from an email. I am starting with invitations generated by MS Outlook because that’s my target audience, but a peak inside of a Google Calendar invitation gives me hope that I’ll be able to support multiple calendars without much trouble.

Outlook invitations are sent in the VCalendar format, content type “text/calendar”. The standard was published as RFC 2445 in 1998. It describes a standard layout for calendar data in the VCard format, which is described in RFC 2425.

VCard is a simple text based format with a nested structure that’s similar to XML. It uses colons and semicolons instead of angle brackets, supports attributes, and has standard representations for several common data types. Here is an example of the same data represented in both formats (content from RFC 2425):

An example of VCard data beside an XML representation of the same data.

I looked around for a library to do the job,but decided to build my own. Writing this kind of code isn’t a good way to deliver business value, but it sure is fun. A bit of time with a white board and a couple of evenings coding, and I have a mostly functional VCard parser ready to go.

When I am setting out to build something like a parser, I like to consider the designs of other established libraries that perform a similar function. As I’ve already mentioned, VCard is a lot like XML in structure, so XML parsers are a great place to look for inspiration. I built a structure that loosely mimics the XmlDocument model, an object tree that holds the entire document in memory. It won’t perform as fast as using something like an XmlReader, but it makes it easier to handle variations in document format with polymorphism. Since the VCard documents I’m processing should never be too big, it won’t be much of a performance burden anyway.

The object structure that holds VCard data looks like this:

At first I wanted to have a property like Value, but there is no reliable way to know the type of a VCard value without knowing the structure of the document, and at this level I don’t want to be coupled to the structure of the document. Instead, I will be adding a series of methods to get and set the EscapedValue as a specific type.

The VCardSimpleValue class was a late addition to the model. I needed a way to hold parameter values (the equivalent of attributes in XML), but since they can’t have parameters of their own I didn’t want to pick up the Parameters collection. I also considered making a type separate from VCardValue for the parameters, but both these classes will need the ability to read and write the escaped value, and I don’t want to write that code twice.

I’ve added the first unit tests to the solution, checking the parser against a number of examples in the specifications and real snippets extracted from emails sent by Outlook. I’ve also added tests for a number of failure cases in the parser such as groups without an ending or values lines without a value delimiter.

My next addition will be parsers for the value types and a stub in the test harness that replies to emails with some info about the original request.

The code at the time of this post is available here: https://github.com/jessemcdowell/themis/tree/8e0c428df0e33145df0323d6e9ae617114dd0f85