From English to Dutch?
July 28, 2004
Q: I need to translate text in an XML document from English to Dutch. The XML document in question is used as a language file in vBulletin forum software. The problem is that I have to get the text to be translated out of the XML code and, after translation, back into the code in the right place.
A: This is an interesting question -- although it may seem much more interesting at first glance than it really is.
First, for the uninitiated, vBulletin is an extremely popular package for managing bulletin boards -- forums -- on web sites. (At this writing, the vBulletin site lists over 4,800 vBulletin-powered forums. And those are just the sites that have explicitly requested a listing!) Written in the likewise-popular scripting language PHP, vBulletin uses the open source MySQL database as a storage-and-retrieval back end.
This question is interesting because it asserts that the questioner really does need XML to translate from one language (English) to another (Dutch). It's roughly possible to do this, and I'll discuss how in a moment. The question may be less interesting to you if you know that vBulletin comes with a language translation feature built in; what salvages the question for our purposes, I think, is that this translation feature is XML-based.
vBulletin's Built-In Language Translation Feature
The XML in vBulletin's language management is normally hidden from view, behind a vBulletin "Administrator Control Panel" interface. (It's accessed by way of the Control Panel's Languages & Phrases -> Language Manager menu.) This simplifies the task of making language changes.
Note: Tinkering with vBulletin's language translation features is fraught with potential complication. What follows in this column enormously simplifies the process. For full information, consult the vBulletin manual's section on translation and, especially, the vBulletin.org forum for administrators and "hackers" of vBulletin forums.
One thing to understand from the start is this: what's being translated is not forum user messages, replies, and the like -- the "content" of a forum. What vBulletin can translate is the forum user interface. For instance, a typical forum interface (vBulletin's or otherwise) includes text like the following:
- First
- 1 day ago
- Advanced search
- Delete thread
- Edit profile
- Contact us
Such English-language words and phrases are what vBulletin can translate into other languages, such as Dutch.
The principal mechanism for translation, what's called a "language pack," is an XML document whose structure looks something like this:
<language name="English (US)" vbversion="[vBulletin version]" type="custom">
  <settings>
    [various options]
  </settings>
  <phrasetype name="GLOBAL">
    <phrase name="[vBulletin phrase]">[translated value]</phrase>
    [etc. - more phrases of this type]
  </phrasetype>
  <phrasetype name="BB Code Tools">
    <phrase name="[vBulletin phrase]">[translated value]</phrase>
    [etc. - more phrases of this type]
  </phrasetype>
  <phrasetype name="vBulletin Settings">
    <phrase name="[vBulletin phrase]">[translated value]</phrase>
    [etc. - more phrases of this type]
  </phrasetype>
  <phrasetype name="FAQ Title">
    <phrase name="[vBulletin phrase]">[translated value]</phrase>
    [etc. - more phrases of this type]
  </phrasetype>
  <phrasetype name="FAQ Text">
    <phrase name="[vBulletin phrase]">[translated value]</phrase>
    [etc. - more phrases of this type]
  </phrasetype>
  <phrasetype name="Control Panel Global">
    <phrase name="[vBulletin phrase]">[translated value]</phrase>
    [etc. - more phrases of this type]
  </phrasetype>
  <phrasetype name="Permissions">
    <phrase name="[vBulletin phrase]">[translated value]</phrase>
    [etc. - more phrases of this type]
  </phrasetype>
  [etc. - many more phrase types!]
</language>
Among the options covered by the settings element are the character set (specified by a charset element) and the "thousands separator" for numbers (the thousandsep element). The settings element does not, however, drive any actual translation. All of that power resides in the phrasetype elements and their phrase children.
Each phrasetype element identifies some general component of the user interface; the phrase elements -- specifically, their name attributes -- identify specific English-language phrases to be translated as indicated by the corresponding "translated values." Each phrase's name attribute is an English-language phrase, all lowercase, with underscore characters in place of spaces.
For example, one phrasetype element, whose name is "Posting," has among its child phrase elements such names as "additional_options", "after_you_submit_your_message", "attach_files", "close_this_thread", and "delete_message". The English phrase "Attach Files" (which a forum user might see when composing a message for posting) is thus translated using the text content of the phrase element whose name attribute has a value of attach_files. If this text content were, for instance, the string "ABCD", then the phrase "Attach Files" would be replaced with the string "ABCD" wherever it appears in the vBulletin interface in the context of posting a message.
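Given that structure, the questioner's original problem (getting the text out of the XML and, after translation, back into the right place) can be sketched in a few lines. What follows is a minimal sketch in Python, not vBulletin's own import/export code. The sample document, its vbversion and charset values, and the Dutch string are made up for illustration; a real language pack is far larger and may carry details this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Tiny made-up sample in the shape of the language pack shown earlier.
SAMPLE_PACK = """<language name="English (US)" vbversion="3.0.0" type="custom">
  <settings><charset>ISO-8859-1</charset></settings>
  <phrasetype name="Posting">
    <phrase name="attach_files">Attach Files</phrase>
    <phrase name="delete_message">Delete Message</phrase>
  </phrasetype>
</language>"""


def extract_phrases(root):
    """Return {(phrasetype name, phrase name): text} for every phrase element."""
    phrases = {}
    for ptype in root.iter("phrasetype"):
        for phrase in ptype.findall("phrase"):
            key = (ptype.get("name"), phrase.get("name"))
            phrases[key] = (phrase.text or "").strip()
    return phrases


def apply_translations(root, translations):
    """Overwrite phrase text in place; phrases without a translation are left alone."""
    for ptype in root.iter("phrasetype"):
        for phrase in ptype.findall("phrase"):
            key = (ptype.get("name"), phrase.get("name"))
            if key in translations:
                phrase.text = translations[key]


root = ET.fromstring(SAMPLE_PACK)

# Step 1: pull the English text out, keyed so it can be put back later.
english = extract_phrases(root)

# Step 2: translate (by hand, here); the Dutch string is illustrative only.
dutch = {("Posting", "attach_files"): "Bestanden bijvoegen"}

# Step 3: put the translated text back in the same place.
apply_translations(root, dutch)
root.set("name", "Dutch")
translated_xml = ET.tostring(root, encoding="unicode")
```

Keying each phrase by its phrasetype name plus its phrase name is what lets the translated text find its way back to the right element; the phrase name alone might not be unique across phrase types.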
All of which sounds rather straightforward. So what's the big deal about translating the user interface via vBulletin's built-in feature set?
The big deal is in the number of phrases, the sheer breadth of text to be translated. I did see a post about one translation effort under way, which referred to "5000+ phrases." Whatever the exact number of words and phrases to be translated, a typical vBulletin translation must be the work of numerous (mostly volunteer) individuals, fluent in both English and whatever the target language might be, spread over the course of months of work.
There aren't that many language packs available so far. (The language-pack feature has been available only with vBulletin version 3+, which has been out for less than a year. Earlier versions required significant hacking of source code and/or vBulletin templates.) That said, I did find an English-to-Dutch effort already underway, begun in October, 2003; in May, 2004, they released version 1.0 of their Dutch language pack. And that, almost certainly, will be the questioner's best bet.
The Pure-XML Option
I can think of a few cases where the above solution won't help (there may be others, of course):
- Your version of vBulletin doesn't support the language-pack feature.
- Your version of vBulletin supports the language-pack feature, but you don't want to (or can't) use it.
- What you want to translate isn't the user-interface text, but the actual forum message content -- messages and replies.
Note: For information on English-to-Dutch translation in vBulletin versions earlier than 3.0, you might check this thread on the vBulletin.org community forum. The messages on the thread are themselves in Dutch, which I can't read, but the thread was pointed out to someone else who asked a similar question in November, 2001.
In any of these cases, your task is exponentially more complex. In theory, it's possible to do a word-for-word translation from one language to another; in practice, it's almost impossible: how do you handle things like irregular English verbs and other non-standard word forms? If you're translating forum content, how will you deal with misspellings? Do you need to consider "translating" in-word punctuation, such as hyphens and apostrophes, to full words in the target language? And so on.
The better approach -- the only one rooted in sanity -- is to do what vBulletin itself does: translate entire phrases. You might undertake to do so with an enormous XSLT stylesheet, which matches all possible phrases with their translations. (This is somewhat simpler if you're "just" translating user-interface text: the number of phrases may be quite large, but at least it's finite.)
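To make the phrase-matching idea concrete, here is a minimal sketch of a phrase table applied to a string of interface text. It uses Python rather than the XSLT stylesheet suggested above, simply because the lookup is easier to show that way. The table, the function names, and the Dutch strings are illustrative assumptions, not entries from any real language pack.

```python
import re

# Hypothetical phrase table; Dutch values are illustrative only.
PHRASES = {
    "Delete": "Verwijderen",
    "Delete thread": "Onderwerp verwijderen",
    "Advanced search": "Geavanceerd zoeken",
}


def build_matcher(table):
    """Compile one regex that matches any phrase in the table."""
    # Longest phrases first, so "Delete thread" wins over plain "Delete".
    ordered = sorted(table, key=len, reverse=True)
    pattern = "|".join(re.escape(phrase) for phrase in ordered)
    return re.compile(r"\b(?:" + pattern + r")\b")


def translate(text, table):
    """Replace every known phrase in text; unknown text passes through unchanged."""
    matcher = build_matcher(table)
    return matcher.sub(lambda match: table[match.group(0)], text)


result = translate("Delete thread | Delete | Edit profile", PHRASES)
# "Edit profile" is not in the table, so it stays in English.
```

Note the ordering: if the shorter phrase were tried first, "Delete thread" would come out half-translated, which is exactly the word-for-word trap described above. The match is also case-sensitive, so a real table would need an entry (or a normalization step) for each capitalization it expects to meet.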
Or you might instead leverage existing work in translating an XML document from one language to another. Many of the issues in doing so are covered in Andrzej Zydron's article here on XML.com, from January, 2004. Pay special attention to his discussion (in part two of the series) of XLIFF (the XML Localization Interchange File Format). Note, for example, the correspondence between XLIFF's source and target elements and the phrase element's name attribute and its text content in vBulletin. Also read and heed what Zydron says about "fuzzy" translation.
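The correspondence with XLIFF can be sketched as a small conversion: each phrase becomes a trans-unit whose source holds the phrase's name attribute and whose target holds its text content. This is a simplified sketch in Python under several assumptions: it omits the XLIFF namespace, the "original" file name is hypothetical, and the sample pack is made up. A file meant for real XLIFF tools would need to follow the specification more closely.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a vBulletin language pack.
SAMPLE_PACK = """<language name="English (US)" type="custom">
  <phrasetype name="Posting">
    <phrase name="attach_files">Attach Files</phrase>
    <phrase name="delete_message">Delete Message</phrase>
  </phrasetype>
</language>"""


def pack_to_xliff(pack_root, source_lang="en", target_lang="nl"):
    """Build a namespace-free, XLIFF-style tree from a language pack tree."""
    xliff = ET.Element("xliff", version="1.1")
    file_el = ET.SubElement(xliff, "file", {
        "original": "vbulletin-language.xml",  # hypothetical file name
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, "body")
    for ptype in pack_root.iter("phrasetype"):
        for phrase in ptype.findall("phrase"):
            # The id combines phrase type and phrase name so units stay unique.
            unit_id = f"{ptype.get('name')}/{phrase.get('name')}"
            unit = ET.SubElement(body, "trans-unit", id=unit_id)
            # name attribute -> source, text content -> target
            ET.SubElement(unit, "source").text = phrase.get("name")
            ET.SubElement(unit, "target").text = (phrase.text or "").strip()
    return xliff


xliff = pack_to_xliff(ET.fromstring(SAMPLE_PACK))
xliff_text = ET.tostring(xliff, encoding="unicode")
```

Once the phrases are in this shape, a translator (or a translation-memory tool that reads XLIFF) works only on the target elements, and the ids provide the route back to the corresponding phrase elements in the language pack.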
In any case, if you can't use vBulletin's built-in translation feature, don't expect that someone will already have a solution prepared for you. Ready yourself for many months of work!
On to Better Things
I've been writing XML.com's "XML Q&A" column for exactly four years now. In that time, the number of XML-relevant newsgroups, web sites, and other resources has exploded. No longer, as a rule, do newcomers to XML want to understand the basics of the standards; they've already got their answers online, or from any of a hundred books. The upshot is that a Q-and-A column has what might be (at best) only limited continued relevance.
Next month I'll be moving, like a hermit crab whose current quarters are starting to feel a bit tight, to the shelter of a new column focused on XML applications.
Hardcore XML veterans will infer from the word "application" a particular meaning: an XML application in this sense is also known, more commonly and informally, as a vocabulary, a dialect, a flavor -- a conceptual model that might be formalized in a DTD or XML schema. If you're new to XML, on the other hand, when you hear the word "application" you might think simply of software: parsers, editors, XSLT processors, and the like.
In the new column, I will cover both types of application: "vocabularies" -- such as vBulletin's XML-based language-translation described above -- and software that consumes or generates or otherwise uses XML in an interesting or novel way. In either case, my focus will not be on the applications you've probably already heard or read about. Don't expect me to devote much space to XML schema, say, or XSLT, or RSS, or SOAP; nor should you expect to find capsule reviews of packages such as XML Spy, Saxon, RenderX's Xep, or Macromedia's SVG Viewer. Rather, I hope to cast some light into XML's little back corners -- niches whose existence most of you may not have even suspected.
In the meantime, thanks to XML.com's editors for their support for "XML Q&A" (as well as the new column). Thanks, especially, to the people who posted their questions -- sometimes anguished, sometimes bemused -- in the newsgroups I monitored every month. And thanks, finally, to my readers, who every month gave me valuable feedback and the motivation to improve. "Q&A" wouldn't have lasted this long without any of you; I look forward to making your acquaintance all over again in the new column.