XML.com: XML From the Inside Out


SVG and Typography: Characters


May 12, 2004

In the second part of our discussion of SVG and typography we explore some time-honored practices of typographic excellence; as we go along, each “type issue” will lead to the discussion of relevant technical aspects of SVG. The typography issues covered are listed below. Beside each one of them is the associated technical SVG issue discussed:

  • Quotes, Hyphens, Ellipses (Character References)
  • Fonts (Embedding SVG Fonts, Creating SVG Fonts from TrueType)
  • Non-Latin Scripts (Proper fonts vs. cheating fonts, encodings, bidirectionality)
  • Ligatures and the Euro Sign

Quotes, Hyphens, Ellipses

“The devil is in the details,” details such as the quotes surrounding the previous phrase.

Fig 1. Smart vs. dumb quotes

There are two kinds of quotes: “straight” or “dumb” quotes and “curly” or “smart” quotes. As you can see in Fig. 1 on the left, using smart quotes gives the text a professional look. This is because the paired quotation marks are specifically designed for each font, unlike the neutral “dumb” marks which are often considered a faux pas in professional typesetting.

The common misuse of dumb quotes is just the most popular version of a larger problem: using characters which are not appropriate but are similar to the correct ones and easier to input with a keyboard. For some, editing software such as MS Word takes care of this problem by automatically converting straight quotes to curly ones; but since a large part of SVG development is manual or the output of our own custom software, we cannot count on an editor to fix it. We need to understand the details of how these characters are included.

Numeric Character References

The correct way to introduce curly quotes in SVG documents is through numeric character references. The Left Double Quotation Mark is character U+201C in the Unicode standard, so it can be included in your SVG document via the decimal numeric character reference &#8220;; its right counterpart, character U+201D, is included via &#8221;. It is also possible to use hexadecimal character references, in which case you would use &#x201C; and &#x201D;. The SVG code for Fig. 2 (quotes.svg) mixes the two approaches.

<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg"
     width="220" height="220" version="1.1">
  <!-- no fill is given, so the square takes the default fill, black -->
  <rect x="1" y="20" width="110" height="110"/>
  <text x="80" y="40"
        style="font-family:Arial; font-size:10pt; fill:red;">Eno</text>
  <!-- opening quote as a decimal reference, closing quote as a hexadecimal one -->
  <text x="3" y="110"
        style="font-family:Arial; font-size:12pt; fill:white;"> &#8220; Fabrication &#x201D; </text>
</svg>

Fig 2. Quotes.svg

There are several reasons for using numeric character references in SVG instead of other methods such as HTML character entity references. Let's examine them briefly:

  • HTML character entity references (&ldquo; and &rdquo; for double curly quotes) are not defined in SVG. If you try to include them in an SVG document, compliant viewers such as the Adobe SVG Viewer v3.01 will not show the character, because such entities are not predefined as they are in HTML.

  • Specifying an encoding such as UTF-8 and including the character directly in the document is a technically valid alternative. However, many programming tools have difficulties showing and manipulating UTF-8 and other encodings. This difficulty is relevant not only for curly quotes but also for any character that is difficult to input or display in common programming tools, including non-Latin alphabetic characters.
    Part of the beauty of SVG is that you can write simple programs to generate it and use common text programs to manipulate it; however, that simplicity is blurred when common operations like searching via the command line become difficult because the characters in question are not supported by your input methods. In other words, you can use any good old terminal to write grep -n "&#x0424;" code/*.svg, but you would have to go through contortions to type grep -n "Ф" code/*.svg.

  • Non-standard character sets such as Microsoft windows-1252 are being deprecated and should be avoided because of their conflict with Unicode. For a more detailed explanation of the problems related to windows-1252, please refer to David Wheeler's article “Curling Quotes in HTML, SGML, and XML”, which also mentions (in a slightly different light) the two points above.
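
The first two points can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module, which is our choice for illustration and not something SVG requires; the helper name text_of and the template string are hypothetical. It shows that decimal and hexadecimal references resolve to the same characters, while an HTML entity such as &ldquo; is rejected as undefined.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def text_of(svg_source):
    """Parse an SVG string and return the content of its first <text> element."""
    root = ET.fromstring(svg_source)
    return root.find("{%s}text" % SVG_NS).text

# Minimal SVG wrapper used only for this experiment.
template = '<svg xmlns="http://www.w3.org/2000/svg"><text>%s</text></svg>'

decimal = text_of(template % "&#8220;Fabrication&#8221;")
hexadecimal = text_of(template % "&#x201C;Fabrication&#x201D;")
same_character = decimal == hexadecimal  # both forms name U+201C and U+201D

try:
    text_of(template % "&ldquo;Fabrication&rdquo;")
    html_entity_accepted = True
except ET.ParseError:
    # The parser reports an undefined entity: &ldquo; is not declared in XML or SVG.
    html_entity_accepted = False
```

A viewer built on a conforming XML parser hits the same wall, which is why the HTML names cannot simply be carried over.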

XML allows both decimal and hexadecimal numeric character references, so just as shown in Fig. 2 you can use either one in SVG. I prefer and recommend hexadecimal references. Some respectable sources advocate the exclusive use of decimal references to keep “maximum backwards compatibility with SGML” because before XML, SGML only supported decimal references. In practice, however, one is more likely to use XML tools to process SVG, and there are ways in most modern SGML tools to enable hexadecimal references. More important is the argument that the Unicode standard and its literature refer to every character by its hexadecimal code, making hexadecimal references very convenient.
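
When SVG is generated by a program, the references can be produced mechanically. Here is a minimal sketch in Python; the function names to_hex_refs and to_decimal_refs are ours, chosen for illustration. The decimal variant relies on the standard "xmlcharrefreplace" error handler, while the hexadecimal variant is written by hand because that handler only emits decimal references. Neither function escapes markup characters such as & or <, so that step is assumed to happen elsewhere.

```python
def to_hex_refs(text):
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x%04X;" % ord(ch)
        for ch in text
    )

def to_decimal_refs(text):
    """Same idea with decimal references, using Python's built-in error handler."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

quoted = "\u201cFabrication\u201d"      # the curly-quoted word from Fig. 2
hex_form = to_hex_refs(quoted)          # "&#x201C;Fabrication&#x201D;"
decimal_form = to_decimal_refs(quoted)  # "&#8220;Fabrication&#8221;"
cyrillic = to_hex_refs("\u0424")        # "&#x0424;", which is easy to grep for
```

The hexadecimal output can be matched by eye against the U+XXXX names used in the Unicode charts, which is the convenience argued for above.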

Commonly Bungled Characters and their Correct Codes

Now that we know the how and why of inserting special characters in SVG, let's go back to typography and some variations of the bungled curly quotes syndrome, including single quotes, hyphens, and ellipses.

Character(s): Single Quotes
Common Error: Using the ASCII grave accent (U+0060) with a “corresponding” acute accent (U+00B4) is a common error, and it looks about as bad as using two apostrophes (U+0027). The correct single quotation marks are U+2018 and U+2019 (except when writing code that uses apostrophes).
Examples:
  • `this is a hack from typewriter days´ (wrong)
  • 'This is also wrong' (wrong)
  • ‘Nice, no?,’ she asked (correct)

Character(s): Double Quotes
Common Error: Using the ASCII quotation mark (a.k.a. dumb quotes) when quoting text is a common mistake. Instead, use smart quotes, characters U+201C and U+201D, inserted in SVG documents via their corresponding hexadecimal numeric character references &#x201C; and &#x201D;.
Examples:
  • "this is a common typographic typo too" (wrong)
  • print OUT "in code it is ok"; (correct, because it is code)
  • “This is not an exit,” Pat says (correct)

Character(s): Hyphen, n-dash, m-dash
Common Error: The character U+002D is the plain hyphen accessed on your keyboard. Its typographic purpose is to break words at the end of a line (to hyphenate); however, the hyphen is commonly abused to indicate ranges or to break the flow of a sentence. The correct characters for such purposes are, respectively, the n-dash (U+2013) and the m-dash (U+2014). The hyphen is the shortest of the three characters; the n-dash is longer and commonly about as wide as the letter “n”. The m-dash is the longest of the three and should not be replaced by two hyphens, as I'm sure you've seen done before.
Examples:
  • Using an n-dash in March 3–8 is a subtle but elegant improvement over March 3-8
  • this is wrong -- and ugly -- (wrong)
  • Boggart—or at least his presence—will remain. (correct)

Character(s): Ellipses
Common Error: Although many people are used to creating ‘faux ellipses’ from three dots (something that some packages like MS Word correct automatically), the horizontal ellipsis has its own character, U+2026, and we must include it explicitly using &#8230; (or &#x2026; in hexadecimal).
Examples:
  • Isn't that special… (correct)
  • This isn't... (wrong)
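
For SVG produced by our own software, the corrections in the table can be approximated in code. The following Python sketch is a set of naive heuristics under stated assumptions, not a complete typographic engine: the names RULES and smarten are hypothetical, the patterns will misfire on input such as phone numbers or quotes that follow punctuation, and the function must never be run over program code, where dumb quotes are the correct ones.

```python
import re

# Naive heuristics, applied in order; the patterns are illustrative only.
RULES = [
    (re.compile(r"\.\.\."), "\u2026"),          # three dots -> horizontal ellipsis
    (re.compile(r"--"), "\u2014"),              # double hyphen -> m-dash
    (re.compile(r"(?<=\d)-(?=\d)"), "\u2013"),  # hyphen between digits -> n-dash
    (re.compile(r'"(?=\w)'), "\u201c"),         # " before a word character opens
    (re.compile(r'"'), "\u201d"),               # any other " closes
    (re.compile(r"(?<!\w)'(?=\w)"), "\u2018"),  # ' at the start of a word opens
    (re.compile(r"'"), "\u2019"),               # any other ' closes or is an apostrophe
]

def smarten(text):
    """Replace commonly bungled ASCII characters with their typographic forms."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

quote = smarten('"This is not an exit," Pat says')
date_range = smarten("March 3-8")
trailing = smarten("This isn't...")
```

The results are ordinary Unicode strings; writing them into an SVG file as numeric character references is a separate step.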

The SVG graphic in Figure 3 and its associated code illustrate the points above.

<?xml version="1.0"?>
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     height="400" width="400">

  <image xlink:href="triceratops.png"
         width="303" height="216" x="1" y="1"/>

  <!-- horizontal ellipsis, decimal reference -->
  <text x="55" y="45"
        style="font-family:Arial; font-size:24pt; fill:#F8431C;"> sands of time&#8230; </text>
  <!-- curly quotes (decimal, then hexadecimal) around an m-dash -->
  <text x="2" y="60"
        style="font-family:Times New Roman; font-size:14pt; fill:#F8431C;"> &#8220;Not an experience&#x2014;a revelation&#x201D; </text>
  <text x="125" y="185"
        style="font-family:Times New Roman; font-size:14pt; fill:#F8431C;"> Stefan George Institute </text>
  <!-- n-dash for the date range -->
  <text x="210" y="205"
        style="font-family:Times New Roman; font-size:14pt; fill:#F8431C;"> June 10&#x2013;24 </text>
  <image xlink:href="triceratops.png"
         width="16" height="16" x="99" y="192"/>
</svg>

Fig 3. triceratops.svg

Before moving on, a word of caution about smart quotes: always use curly quotes except when showing code. String literals in programming languages, attributes in XML, and other such technical code are only correctly presented in dumb quotes, the way they would compile or parse. Using curly quotes to show code is not only incorrect but looks cluelessly affected, roughly similar to eating a Snickers bar with fork and knife.
