Article:
From Wiki to XML, through SGML
Subject: Please, more XML-Wiki articles
Date: 2004-03-04 02:00:37
From: Anthony Thompson

I think Wiki text is the perfect text for users to use in a home-grown content management system, since it's not too hard for users to learn that blank lines separate paragraphs, asterisks mean *bold*, etc.
Getting Wikitext -> XHTML, and the other way around (XHTML -> Wikitext), however, is the tough part, and something I imagine XML would be perfect for. Does anyone know of any solutions for this, other than having to go through SGML as this article indicates?
- Please, more XML-Wiki articles
2004-03-08 08:41:06 Brian Ewins
Going from XHTML to wiki text is fairly trivial with XSLT if you restrict the syntax enough, e.g. if you only look at pages generated by a specific CMS (such as another wiki).
I recently wrote some code to pull 'text-like' chunks out of XHTML for a translator to work on, which is somewhat relevant here. The spans of text with minimal embedded markup were identified by doing a depth-first search of a DOM for nodes that had mixed content (i.e. nodes with at least one non-blank text child node).
For each such node, this gives you a list of child nodes that may look text-like. The list was further narrowed by removing, from its start and end, any nodes that contained no non-blank text nodes at any depth (e.g. this omits "br" padding).
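The search and the 'narrow' step described above can be sketched roughly as follows. This is not the original Java code; it is a minimal Python illustration using the standard xml.dom.minidom module, and the function names (has_nonblank_text_child, has_nonblank_text, narrow, text_like_chunks) are made up for the example. It also assumes the search stops descending once it finds a mixed-content node, which the description above does not spell out.

```python
from xml.dom import minidom, Node


def has_nonblank_text_child(node):
    """True if node has at least one direct text child that is not just whitespace."""
    return any(
        child.nodeType == Node.TEXT_NODE and child.data.strip()
        for child in node.childNodes
    )


def has_nonblank_text(node):
    """True if node is, or contains at any depth, a non-blank text node."""
    if node.nodeType == Node.TEXT_NODE:
        return bool(node.data.strip())
    return any(has_nonblank_text(child) for child in node.childNodes)


def narrow(children):
    """Drop leading and trailing nodes that hold no non-blank text at any depth."""
    start, end = 0, len(children)
    while start < end and not has_nonblank_text(children[start]):
        start += 1
    while end > start and not has_nonblank_text(children[end - 1]):
        end -= 1
    return children[start:end]


def text_like_chunks(node):
    """Depth-first search yielding the narrowed child list of each mixed-content element."""
    if node.nodeType == Node.ELEMENT_NODE and has_nonblank_text_child(node):
        yield narrow(list(node.childNodes))
        return  # assumption: nested markup stays part of this chunk
    for child in node.childNodes:
        yield from text_like_chunks(child)


doc = minidom.parseString(
    "<div><p><br/>Hello <a href='x.html'>world</a>!<br/></p>"
    "<ul><li>One</li></ul></div>"
)
for chunk in text_like_chunks(doc):
    print("".join(n.toxml() for n in chunk))
```

The nodes that narrow() drops from either end are the 'omitted' nodes; a fuller version would return them as well so they can be processed separately.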
We processed the omitted nodes to pull out some attributes too. The alt, title and value attributes were interesting for translation; a wiki extractor would obviously be interested in the 9 URL attributes in HTML: action, src, codebase, usemap, cite, href, longdesc, profile and background (NB background isn't in XHTML 1.0 Strict; it started out as a Netscape thing).
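Pulling those attributes out is a simple tree walk. The sketch below is again Python with xml.dom.minidom rather than the original Java; collect_attrs and the two tuple names are hypothetical, and the attribute lists are just the ones named above.

```python
from xml.dom import minidom, Node

# Attribute names taken from the lists above.
TRANSLATABLE_ATTRS = ("alt", "title", "value")
URL_ATTRS = ("action", "src", "codebase", "usemap", "cite",
             "href", "longdesc", "profile", "background")


def collect_attrs(node, names):
    """Return (element name, attribute name, value) triples in document order."""
    found = []
    if node.nodeType == Node.ELEMENT_NODE:
        for name in names:
            if node.hasAttribute(name):
                found.append((node.tagName, name, node.getAttribute(name)))
    for child in node.childNodes:
        found.extend(collect_attrs(child, names))
    return found


doc = minidom.parseString(
    '<p><img src="a.png" alt="A logo"/><a href="b.html" title="Next">go</a></p>'
)
print(collect_attrs(doc, TRANSLATABLE_ATTRS))
print(collect_attrs(doc, URL_ATTRS))
```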
Once you're down to this 'minimal' markup it should be even easier to get to a wiki-like representation, as you're generally only left with "a" and "span" elements from your XHTML. I wrote this in Java, but looking back at where we ended up I'm sure the same algorithm is expressible in XSLT (I'm not sure how you'd do the 'narrow' step, though).
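As a rough illustration of that last step, here is a small Python sketch that flattens such minimal markup into wiki-like text. The target syntax is an assumption, since wiki dialects differ: links become [url label], "span" wrappers are simply dropped, and "b"/"strong" become *bold* as in the asterisk convention mentioned in the original post. The to_wiki function is a made-up name, not part of any library.

```python
from xml.dom import minidom, Node


def to_wiki(node):
    """Flatten a minimal-markup DOM subtree into wiki-like text (assumed syntax)."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    inner = "".join(to_wiki(child) for child in node.childNodes)
    if node.nodeType == Node.ELEMENT_NODE:
        if node.tagName == "a" and node.hasAttribute("href"):
            return "[%s %s]" % (node.getAttribute("href"), inner)
        if node.tagName in ("b", "strong"):
            return "*%s*" % inner
    # span and any other element: keep only the text inside it
    return inner


doc = minidom.parseString(
    '<p>See <a href="http://example.org/">the <span>example</span> site</a>'
    ' for <b>details</b>.</p>'
)
print(to_wiki(doc.documentElement))
```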