The Byte Order Mark

The BOM is a really helpful idea. To start with, it helps avoid confusing 8-bit and 16-bit character encodings. If you try to read a BOM-flagged 16-bit file in 7-bit or 8-bit mode, the first two characters are going to look like either -2 and -1 or 254 and 255, depending on whether you treat the bytes as signed or unsigned; either case is a good signal that something is seriously wrong.
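Here is a minimal Python sketch of what those two bytes look like to an 8-bit reader. It assumes the file begins with the BOM (code point U+FEFF) written high-half-first, so the leading bytes are FE FF; the sample data and variable names are illustrative only.

```python
import struct

# Assumed sample: a 16-bit file that starts with the BOM (high byte first),
# followed by the letter "A" stored as one 16-bit value.
data = b"\xfe\xff\x00A"

# The first two bytes interpreted as unsigned 8-bit values.
unsigned_pair = struct.unpack("2B", data[:2])

# The same two bytes interpreted as signed 8-bit values.
signed_pair = struct.unpack("2b", data[:2])

print(unsigned_pair)  # (254, 255)
print(signed_pair)    # (-2, -1)
```

Neither pair is a plausible start for ordinary 7-bit or 8-bit text, which is what makes the mismatch easy to spot.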

Once you know you're in 16-bit mode, the BOM is still helpful. This is because computers often treat 16-bit numbers internally as pairs of 8-bit numbers. Then, when they transmit a 16-bit number, sometimes they transmit the two 8-bit halves low-half-first, sometimes high-half-first. The rules say they should transmit the BOM (#feff) as (#fe, #ff). But if you look at the first two bytes of a file and see (#ff, #fe), this is a really reliable hint that the bytes are being swapped, and will in many cases allow the processor to read the file anyway by swapping them back.
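The byte-order check can be sketched in a few lines of Python. This is only an illustration, not how any particular processor does it: it assumes the BOM is the code point U+FEFF, so high-half-first transmission yields the bytes FE FF and low-half-first yields FF FE. The names `sniff_byte_order` and `read_units` are hypothetical helpers invented for this example.

```python
# Assumption: the BOM is U+FEFF, so these are its two possible byte layouts.
BOM_HIGH_FIRST = b"\xfe\xff"  # high half sent first
BOM_LOW_FIRST = b"\xff\xfe"   # low half sent first, i.e. bytes swapped


def sniff_byte_order(data: bytes) -> str:
    """Return 'big', 'little', or 'unknown' from the first two bytes."""
    head = data[:2]
    if head == BOM_HIGH_FIRST:
        return "big"
    if head == BOM_LOW_FIRST:
        return "little"
    return "unknown"


def read_units(data: bytes) -> list:
    """Return the 16-bit values that follow the BOM, in the sniffed order."""
    order = sniff_byte_order(data)
    if order == "unknown":
        raise ValueError("no byte order mark found")
    body = data[2:]
    if len(body) % 2 != 0:
        raise ValueError("odd number of bytes after the BOM")
    # Combine each pair of bytes into one 16-bit number.
    return [int.from_bytes(body[i:i + 2], order) for i in range(0, len(body), 2)]
```

With this sketch, `read_units(b"\xfe\xff\x00A")` and `read_units(b"\xff\xfeA\x00")` both give `[65]`: the same 16-bit value is recovered whichever way the halves were transmitted.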


Copyright © 1998, Tim Bray. All rights reserved.