The Code of the XML Geeks

October 3, 1998

Peter Murray-Rust

    -----BEGIN GEEK CODE BLOCK-----
    GIT d- s-:+ a42 C+++$ UL+ P L(+) E W+(++) N o? K+ w-- O- M+(--) V !PS PE+ Y+ PGP- t 5 X+ R? tv b+ DI++ D--- G e>+++ h! r+ x+
    ------END GEEK CODE BLOCK------

From time to time XML:Geek encounters a mail message with a strange .sig, something like what you see above.

What is it? Runaway Perl? A uuencode of someone's cat? No! It's the Geek Code, originally defined on its official home page.[1]

The Geek Code was developed by Robert Hayden as a way for geeks to identify themselves and their interests to others, especially other geeks. It is both wonderful and horrible. Wonderful because it is in the highest traditions of the Internet. It represents the basic desire of all geeks to communicate as virtual social animals. It says "this is what I'm like" and invites the response:

"Wow - it's not just me who has this weird passion to [hack Linux | wear 4 pens in my trousers | attend Trekkie conventions | hate Barney the dinosaur | not know how old I am | not know who I am]!"

A geek can identify over 30 dimensions of his/her personality. [And you thought geeks were one-dimensional!] Each of these dimensions is quantised in many steps. The Geek Code is perhaps the most valuable piece of applied sociological technology of the late C20, but it is grossly underused, for the reasons below.

Horrible because it offends against the golden laws of communication laid down in the design goals of the XML 1.0 specification:

  • XML documents should be human-legible and reasonably clear.
  • Terseness in XML markup is of minimal importance.

In the wrong context, the traditional geek .sig sent over the wire might debit your bank account, post your personal details to alt.*, or interfere with the nearest nuclear power station. It might even corrupt your registry. So XML:geek comes to the rescue, using XML itself to extend the Geek Code. Even if you don't know geek, you'll understand xmlGeek.


First we create a DTD/namespace for the XML Geek Code. Names starting with xml are reserved by the W3C, but when they (the W3C) see this paper and understand its importance, <xmlGeek> will instantly be adopted as a Recommendation. So all elements and attributes belong to the xmlGeek namespace.

xmlGeek has a DTD which can be used to validate xmlGeek document instances, but XML:geeks who abhor DTDs (v.i.) can omit it. Whether they can extend the effective DTD (i.e. add elements and attributes as they wish) depends on the floating opinion of the XML:geek community.

The xmlGeek DTD

    <!-- xmlGeek DTD V0.1 -->
    <!ELEMENT xmlGeek (subject)*>
    <!ELEMENT subject (#PCDATA)>
    <!ATTLIST subject title (AFDR | attribute | book | DOM | DTD | entity | FPI |
                             namespace | notation | parser | RDF | SGML | W3C |
                             WF | XLink | XPointer | XSL | geekCode | other) #REQUIRED>
    <!ATTLIST subject qualifier CDATA #IMPLIED>

Even !(XML:geek)s or (!XML):geeks can understand what the DTD dictates about the structure of an xmlGeek document:

  • An xmlGeek element can contain zero or more subjects.
  • Each subject must have a title (or we wouldn't know what it was).
  • Each title must be chosen from about 20 options (this will increase in later versions).
  • Each subject may have a qualifier (v.i.). Its value is arbitrary, though some common semantics are suggested below.
  • Each subject can contain zero or more characters (#PCDATA) to add personalised information which the qualifiers cannot express.
  • That's it.
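A validating parser would enforce these rules straight from the DTD. For readers without one, here is a hedged sketch of the same rules checked by hand in Python with the standard library's xml.etree.ElementTree, which does not validate against a DTD. The function name check_xml_geek and its messages are hypothetical, and it only covers un-namespaced xmlGeek elements.

```python
import xml.etree.ElementTree as ET

# Enumerated values of the title attribute, copied from xmlGeek DTD V0.1.
TITLES = {
    "AFDR", "attribute", "book", "DOM", "DTD", "entity", "FPI",
    "namespace", "notation", "parser", "RDF", "SGML", "W3C",
    "WF", "XLink", "XPointer", "XSL", "geekCode", "other",
}
ATTRIBUTES = {"title", "qualifier"}  # the only attributes the DTD declares


def check_xml_geek(root: ET.Element) -> list[str]:
    """Hand-coded version of the DTD rules; returns problems (empty list = OK)."""
    problems = []
    if root.tag != "xmlGeek":
        problems.append(f"root is <{root.tag}>, expected <xmlGeek>")
    # (subject)* allows only whitespace between the subjects, no other text
    if root.text and root.text.strip():
        problems.append("text directly inside <xmlGeek>")
    for i, child in enumerate(root):
        if child.tail and child.tail.strip():
            problems.append(f"text after child {i}")
        if child.tag != "subject":
            problems.append(f"child {i} is <{child.tag}>, only <subject> is allowed")
            continue
        # (#PCDATA): a subject holds text only, never child elements
        if len(child):
            problems.append(f"subject {i} contains elements")
        # title (...) #REQUIRED: must be present and one of the enumerated values
        title = child.get("title")
        if title is None:
            problems.append(f"subject {i} has no title")
        elif title not in TITLES:
            problems.append(f"subject {i} has unknown title {title!r}")
        # qualifier CDATA #IMPLIED is optional free text; only report
        # attributes the DTD does not declare at all
        for name in sorted(set(child.attrib) - ATTRIBUTES):
            problems.append(f"subject {i} has undeclared attribute {name!r}")
    return problems


doc = ET.fromstring('<xmlGeek><subject title="RDF" qualifier="?"/>'
                    '<subject title="coffee"/></xmlGeek>')
print(check_xml_geek(doc))  # ["subject 1 has unknown title 'coffee'"]
```

Returning a list of problems rather than raising on the first one lets a geek see everything wrong with a .sig in one pass.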

Note, of course, that the DTD provides no formal way of associating semantics with any of these. The rest of this document (some of which will appear in later postings) adds textual semantics to the DTD. When schemas, such as XSchema, are finalised, the DTD will be converted to XML instance syntax, and comments and XLinks can then be added.


The simplest XML:geek document is an empty <xmlGeek/> element. It just announces "I am an XML:geek". The remaining semantics are undefined. It could mean:

  • I am giving out no other personal details.
  • My fingers are sore.
  • People send hate-mail if my .sig is over 4 lines.
  • My sysadmin doesn't like .sigs s/he can't understand.
  • I don't know anything else about myself.
  • I can express no other opinions without the permission of my employer.
  • I can't understand the rest of the xmlGeek Code [you're not a very geekish xmlGeek].


Most real xmlGeeks have strong opinions on XML, and many of them are not shy. Each subject defines a potential area of XML:geekdom. None are mandatory and all are repeatable (useful if you have a split personality, live in a multidimensional universe or are a virtual hyperperson). However, most geeks will only have one subject for a given title, except other.

The title attribute takes its value from a restricted list of known and carefully maintained categories of geekdom, which can nevertheless be eXtended (this is XML, of course) by the magic, repeatable, reserved word other. This allows free text to be added. The current values of title are enumerated and elucidated below.


The original Geek Code makes great use of character-based qualifiers/quantifiers such as ++ or ?. In honour of this tradition, the following values of qualifier are suggested (although, as this is CDATA, others can be added):

  • ++++ I am the world expert | this is the most important thing in my life | I publicly castigate opponents of my view
  • +++ I feel pretty strongly | I have written code/examples/talks/books
  • ++ I'm well on the way | I can pontificate in bars | I post to lists regularly | I think I could make some money from it
  • + I think I understand this | I bought a book about it
  • ? I don't understand this | I have never heard of it | I can't make it work | I have several opinions | I don't know who I am
  • ! I am not in this category | I don't have one | I never read books | I don't exist
  • - Probably a waste of time | I'm forced to do this by my boss
  • -- They got this seriously wrong | this fouls up my daily work | these guys are boneheads
  • --- This is an invention of the Evil Empire (identity of EE is time-dependent) | This will screw up computing for the next century | this has completely destroyed my current captive market
  • ---- I am running a public crusade against this
  • d numeric digit representing intensity (e.g. number of books/parsers/browsers written by me | mean time between crashes)
  • a alphabetic character (undefined)
  • p additional XML/Unix-like characters (e.g. * I can do this as little or as often as I please)

Qualifier primitives can be re-combined. Thus ++(?)2 could mean "I think I really like this idea. I don't understand it but I have had two attempts".
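The primitives are defined only informally. As a sketch, here is one way a program might split a qualifier into them in Python. The token kinds, reading parenthesised parts as hedged values, and the signed intensity() score are assumptions of this example, not part of the xmlGeek DTD, which declares qualifier as plain CDATA.

```python
import re

# One alternative per qualifier primitive: a run of "+" or of "-", "?", "!",
# a digit (d), a letter (a), or any other non-space character (p).
TOKEN = re.compile(r"\++|-+|\?|!|\d|[A-Za-z]|\S")


def parse_qualifier(q: str) -> list[tuple[str, str, bool]]:
    """Split a qualifier into (kind, text, in_parentheses) primitives."""
    tokens = []
    depth = 0  # nesting level of "(...)"; bracketed parts are treated as hedged
    for match in TOKEN.finditer(q):
        text = match.group()
        if text == "(":
            depth += 1
        elif text == ")":
            depth = max(depth - 1, 0)
        else:
            if text[0] in "+-":
                kind = "intensity"
            elif text in ("?", "!"):
                kind = text
            elif text.isdigit():
                kind = "digit"
            elif text.isalpha():
                kind = "alpha"
            else:
                kind = "other"
            tokens.append((kind, text, depth > 0))
    return tokens


def intensity(q: str) -> int:
    """Signed length of the first unbracketed +/- run: "+++" -> 3, "---(-)" -> -3."""
    for kind, text, bracketed in parse_qualifier(q):
        if kind == "intensity" and not bracketed:
            return len(text) if text[0] == "+" else -len(text)
    return 0  # no unbracketed +/- run, e.g. "?" or "!"


print(parse_qualifier("++(?)2"))
# [('intensity', '++', False), ('?', '?', True), ('digit', '2', False)]
```

Keeping the bracket flag on each token, rather than discarding parentheses, preserves the difference between "+++" and "+(++)".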

subject #PCDATA

The semantics of the content of subject are undefined. Any XML characters are allowed. (If you need to use CDATA sections to express your deepest personality you are a remarkable xmlGeek, but see below.) We suggest that the content is human-legible and reasonably clear. A Base64-encoded GIF of the author's coffee station is not encouraged. (All XML:geeks are assumed to be coffee addicts, so a coffee field would be redundant.) The author of the xmlGeek DTD takes no responsibility for the content of #PCDATA (or anything else).


We start with an example, before explaining the (context-dependent) interpretation of the qualifiers:

    <xmlGeek>
      <subject title="namespace" qualifier="+++"/>
      <subject title="DTD" qualifier="---">Waiting for DTD editor</subject>
      <subject title="RDF" qualifier="?"/>
      <subject title="book" qualifier="6"/>
      <subject title="other">Written XML robot</subject>
    </xmlGeek>

This XML:geek is worth knowing. S/he finds namespaces useful and can hack them. Likes the freedom of XML (no DTD needed). Could do with some help on RDF (doesn't/can't grok it). Spends hard-earned cash on XML books. And has found novel uses of XML. An exploratory greeting e-mail is almost mandatory.

When this information gets into XML-aware search engines, s/he will be instantly locatable. A namespace freak who hates DTDs can search for friends with an XPointer-like string:


(Since XPointer is not finalised, and there isn't a search syntax yet, don't shoot me.) It means "find all XML documents which contain an xmlGeek element, then find all its subject children with a title attribute whose value is namespace and a qualifier attribute whose value includes the string +++". It can be refined with booleans; we can exclude all notation freaks, for example.
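The query described above can be sketched in Python with xml.etree.ElementTree, filtering in plain Python rather than in any XPointer syntax. This is an illustration only: the function names, the sample documents, and the reading of "notation freak" as "has a notation subject whose qualifier also contains +++" are all assumptions, and the documents are un-namespaced like the example above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def is_fan(subjects: list[ET.Element], title: str, needle: str) -> bool:
    """True if some subject has this title and a qualifier containing needle."""
    return any(s.get("title") == title and needle in s.get("qualifier", "")
               for s in subjects)


def search(docs: dict[str, str], title: str = "namespace", needle: str = "+++",
           exclude: Optional[str] = None) -> list[str]:
    """Names of documents holding an xmlGeek element with a matching subject,
    skipping geeks who are equally keen (same needle) on the excluded title."""
    hits = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        # iter() includes root itself, so xmlGeek may be the root or nested deeper
        for geek in root.iter("xmlGeek"):
            subjects = geek.findall("subject")  # direct children only
            if not is_fan(subjects, title, needle):
                continue
            if exclude is not None and is_fan(subjects, exclude, needle):
                continue
            hits.append(name)
            break  # one matching xmlGeek per document is enough
    return hits


docs = {  # hypothetical sample documents
    "robot-writer": '<xmlGeek><subject title="namespace" qualifier="+++"/>'
                    '<subject title="RDF" qualifier="?"/></xmlGeek>',
    "notation-fan": '<page><xmlGeek><subject title="namespace" qualifier="++++"/>'
                    '<subject title="notation" qualifier="+++"/></xmlGeek></page>',
    "dtd-lover": '<xmlGeek><subject title="DTD" qualifier="++++"/></xmlGeek>',
}
print(search(docs))                      # ['robot-writer', 'notation-fan']
print(search(docs, exclude="notation"))  # ['robot-writer']
```

Because "+++" is matched as a substring, a "++++" geek also qualifies, which is what "includes the string +++" asks for.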

This will revolutionise e-social life and e-commerce. A bookshop could automatically mail compulsive XML:biblioaddicts like this XML:geek whenever a new title appeared. A specialist XML search engine could gather the weird other activities of XML:geeks and organise SIGs for them. And so on.


The title attribute (see the DTD) can take a number of predetermined values (at present about 20). The semantics of the values should be obvious (e.g. "DTD" is about DTDs; "AFDR" is about Architectural Forms - you knew that, didn't you?). If none of them fits, use other and use the subject content to express yourself. If there is popular demand for another subject, xmlGeek might be sympathetic in the next revision.

qualifier values

The semantics of each qualifier value sometimes depend on the title. These will be enumerated in detail for all titles, but in this first release only DTD is documented. (XML:geek is following the time-honoured tradition of releasing software without documentation.)


The DTD is at the heart of XML. Or is it? XML lets you skip validation against a DTD, or not have one at all. Opinions vary, and are summarised below.

++++ DTDs are the greatest invention since the printing press. I wrote most of the TEI. Good DTDs are huge.

+++ DTDs are great for subduing my staff. All documents, even the lunch menu, have to conform to the company DTD.

++ DTDs make it a lot easier to write my software. No unpleasant surprises in document instances. DTDs are the contract between authors and readers.

+ I use them occasionally

? DDT is for killing bugs, no?

! I can't use DTDs.

- I suppose some people need them. I never write XML documents for which DTDs would be useful.

-- XML documents with namespaces will never use DTDs. So stop trying to force them on us.

--- DTDs are an outdated hangover from SGML oldies. They were invented by the US military. They'll disappear in a year.

---- DTDs are a conspiracy of the publishing community to grind down authors. Using ISO 12083, Finnegans Wake would still be unpublished. No work of art needs any constraint.

2 I have written 2 DTDs. Find them on Robin Cover's page.

Using xmlGeek

An xmlGeek element is unlikely to form a complete document on its own; it will usually sit inside some other document, so it needs a namespace. A document might look like this:

    <?xml version="1.0"?>
    <memo xmlns="">
      <to>BigBoss</to>
      <from>XML greaseMonkey</from>
      <content>
        Your latest <a href="/memos/12083">memo</a> on a proposed company DTD is unworkable.
      </content>
      <G:xmlGeek xmlns:G="">
        <G:subject title="DTD" qualifier="---"/>
        <G:subject title="FPI" qualifier="---(-)"/>
        <G:subject title="namespace" qualifier="++++"/>
        <G:subject title="book">Understanding XML (1998), GreaseMonkey Press</G:subject>
      </G:xmlGeek>
    </memo>

Note the efficiency of communication. Not only does the XML:geek[3] give a clear message, s/he also indicates that s/he will never tolerate a company DTD, nor the absurd requirement to have FPIs for every department. And the XML:geek has written a book on XML and published it without a conventional publisher, so this is not a geek to take lightly. [A real geek will shudder at the lack of markup in the book description, but what else can PCDATA do?]
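A namespace-aware program finds the xmlGeek block by its namespace URI, not by its prefix. No namespace URI for xmlGeek is fixed in the text, so this Python sketch with xml.etree.ElementTree assumes the hypothetical URI http://example.org/xmlGeek and a cut-down copy of the memo; geek_profile is likewise a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, for illustration only.
GEEK_NS = "http://example.org/xmlGeek"

MEMO = f"""<?xml version="1.0"?>
<memo>
  <to>BigBoss</to>
  <from>XML greaseMonkey</from>
  <G:xmlGeek xmlns:G="{GEEK_NS}">
    <G:subject title="DTD" qualifier="---"/>
    <G:subject title="FPI" qualifier="---(-)"/>
    <G:subject title="namespace" qualifier="++++"/>
    <G:subject title="book">Understanding XML (1998), GreaseMonkey Press</G:subject>
  </G:xmlGeek>
</memo>"""


def geek_profile(xml_text: str) -> dict[str, str]:
    """Map each xmlGeek subject title to its qualifier ("" if absent)."""
    root = ET.fromstring(xml_text)
    # Matching is by namespace URI, so the query prefix "g" need not be
    # the prefix "G" used in the document.
    ns = {"g": GEEK_NS}
    profile = {}
    for subject in root.findall(".//g:xmlGeek/g:subject", ns):
        # A repeated title (e.g. several "other" subjects) keeps only the last.
        profile[subject.get("title")] = subject.get("qualifier", "")
    return profile


print(geek_profile(MEMO))
# {'DTD': '---', 'FPI': '---(-)', 'namespace': '++++', 'book': ''}
```

An xmlGeek element in no namespace, or in a different one, is simply not found, which is exactly the protection the namespace is meant to give the surrounding memo.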

xmlGeek extends the geekCode, which can be encapsulated in the geekCode subject. The opening Geek Code example could be written:

    <subject title="geekCode">GIT d- s-:+ a42 C+++$ UL+ P L(+) E W+(++) N o?
        K+ w-- O- M+(--) V !PS PE+ Y+ PGP- t 5 X+ R? tv b+ DI++ D--- G e>+++ h! r+ x+</subject>

True XML:geeks will have spotted the embedded ">" but realised that it is still parsable. Purists would embed it in <![CDATA[ ... ]]>. Others can never remember the appalling CDATA syntax. XML novices should use &gt;.
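The three spellings can be checked with any conforming parser. As a small sketch, assuming Python's xml.etree.ElementTree and a shortened Geek Code, all three forms below should parse and hand the application identical characters. In character data only "<" and "&" must always be escaped; a raw ">" is forbidden only in the sequence "]]>".

```python
import xml.etree.ElementTree as ET

# A shortened Geek Code that still contains the ">" in "e>+++".
CODE = "GIT d- a42 e>+++ h! r+ x+"

FORMS = {
    # raw ">" is well-formed in character data
    "raw": '<subject title="geekCode">GIT d- a42 e>+++ h! r+ x+</subject>',
    # the purist's CDATA section
    "cdata": '<subject title="geekCode"><![CDATA[GIT d- a42 e>+++ h! r+ x+]]></subject>',
    # the novice's predefined entity reference
    "escaped": '<subject title="geekCode">GIT d- a42 e&gt;+++ h! r+ x+</subject>',
}

for name, xml_text in FORMS.items():
    text = ET.fromstring(xml_text).text
    # Each spelling yields the same character data, so each line reports True.
    print(f"{name:8}{text == CODE}")
```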

More semantics will follow in the next "XML:geek".


[1] This was the official home of Geek Code V3.1 - I think it's moved. Will a Geek please let me know where it is now?

[2] The Geek Code carries the copyright:

The Geek Code is copyright 1993,1994 by Robert A. Hayden. All rights reserved. You are free to distribute this code in electronic format provided that the file remains unmodified and this copyright notice remains attached.

[3] I adopted the term "XML GreaseMonkey" from Murray Altheim's .sig because I love it. The content of the fictitious message does not represent his views or anyone else's.

Copyright Peter Murray-Rust, 1998