The special case of the less-than ('<'), greater-than ('>'), and ampersand ('&') characters. In general, the safest way to write HTML is in US-ASCII (ANSI X3.4, a 7-bit code), expressing characters from the upper half of the 8-bit code by using HTML entities.
Working with 8-bit characters can also be successful in many practical situations: Unix and MS-Windows (using Latin-1), and also Macs (with some reservations).
Latin-1 (ISO-8859-1) is intended for English, French, German, Spanish, Portuguese, and various other western European languages. (It is inadequate for many languages of central and eastern Europe and elsewhere, let alone for languages not written in the Roman alphabet.) On the Web, these are the only characters reliably supported. In particular, characters 128 through 159 as used in MS-Windows are not part of the ISO-8859-1 code set and will not be displayed as Windows users expect. These characters include the em dash, en dash, curly quotes, bullet, and trademark symbol; neither the actual character (the single byte) nor its �nnn; decimal equivalent is correct in HTML. Also, ISO-8859-1 does not include the Euro currency character. (See the last paragraph of this answer for more about such characters.)
On platforms whose own character code isn't ISO-8859-1, such as MS-DOS and Mac OS, there may be problems: you have to use text transfer methods that convert between the platform's own code and ISO-8859-1 (e.g., Fetch for the Mac), or convert separately (e.g., GNU recode). Using 7-bit ASCII with entities avoids those problems, but this FAQ is too small to cover other possibilities in detail.
If you run a web server (httpd) on a platform whose own character code isn't ISO-8859-1, such as a Mac or an IBM mainframe, then it's the job of the server to convert text documents into ISO-8859-1 code when sending them to the network.
If you want to use characters not in ISO-8859-1, you must use HTML 4 or XHTML rather than HTML 3.2, choose an appropriate alternative character set (and for certain character sets, choose the encoding system too), and use one method or other of specifying this.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment