
 Character encodings in HTML 

HTML has been in use since 1991, but the first standardized version with a reasonably complete treatment of international characters was version 4.0, not published until 1997. Considerable care must be exercised when creating HTML pages with special characters outside the range of normal ASCII to ensure two goals: the integrity of the information stored in the HTML document, and proper display of the document by the largest possible variety of browsers.

The Document Character Set

When HTML documents are served to the viewer, there are two ways to tell the browser what specific character encoding is used. First, HTTP headers can be sent by the server along with each page. A typical header looks like this:

Content-Type: text/html; charset=ISO-8859-1
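As a rough illustration of what a receiver does with such a header, the Python sketch below pulls the charset parameter out of a Content-Type value. It reuses the standard library's MIME header parser (`email.message.Message`). The function name `charset_from_content_type` is a hypothetical helper, not part of any browser or server API.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value (lowercased), or None."""
    msg = Message()
    msg["Content-Type"] = value  # let the stdlib MIME parser split out the parameters
    # get_content_charset() lowercases the name, unquotes it,
    # and returns None when there is no charset parameter at all.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))                      # None
```

Charset names are case-insensitive, which is why the result is normalized to lower case.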

The other method is for the HTML document to include this information at its top, inside the HEAD element.

<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
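The in-document declaration can be read in a similar way. The following is a minimal sketch using Python's `html.parser.HTMLParser`, assuming the page has already been decoded into a string (ASCII-compatible encodings make that possible before the charset is known). `MetaCharsetFinder` and `meta_charset` are hypothetical names chosen for illustration.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Records the charset from the first <meta http-equiv="Content-Type"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        http_equiv = (attributes.get("http-equiv") or "").lower()
        content = attributes.get("content")
        if http_equiv == "content-type" and content:
            msg = Message()
            msg["Content-Type"] = content  # same syntax as the HTTP header value
            self.charset = msg.get_content_charset()


def meta_charset(document: str) -> Optional[str]:
    finder = MetaCharsetFinder()
    finder.feed(document)
    finder.close()
    return finder.charset


doc = ('<html><head><meta http-equiv="Content-Type" '
       'content="text/html; charset=US-ASCII"></head><body>hi</body></html>')
print(meta_charset(doc))  # us-ascii
```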

Either method advises the receiver that the file being sent uses the character set specified. Of course, it would be a very bad idea to send incorrect information. For example, a server where multiple users may place files created on different machines cannot promise that all the files it sends will conform (some users may have machines with different character sets). For this reason, many servers simply do not send the information at all, to avoid making any false promises.

Browsers receiving a file with no character set information must make a blind assumption. The safest assumption is probably ISO 8859-1, but it is also common for browsers to assume the character set native to the machine on which they are running. If the guess is wrong, characters outside the printable ASCII range (32 to 126) may be displayed incorrectly. This causes few problems for English text, but European users need characters outside that range for everyday use.
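The effect of a wrong guess is easy to reproduce. In this sketch the page bytes are hypothetical: the author is assumed to have saved the text as UTF-8, and the reader's browser is assumed to fall back to ISO 8859-1. The ASCII characters survive either way; only the accented letters are garbled.

```python
def view_as(data: bytes, assumed_charset: str) -> str:
    """Decode raw page bytes the way a browser would under `assumed_charset`."""
    # errors="replace" shows undecodable bytes as U+FFFD instead of raising.
    return data.decode(assumed_charset, errors="replace")


# Hypothetical page that the author saved as UTF-8 but sent without a charset.
page = "Café & crème".encode("utf-8")

print(view_as(page, "utf-8"))       # Café & crème
print(view_as(page, "iso-8859-1"))  # CafÃ© & crÃ¨me
```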

For maximum compatibility, it is increasingly common for multilingual websites to use the UTF-8 encoding of the ISO 10646/Unicode character set, which provides a superset of almost all existing character sets.
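A quick way to see why UTF-8 suits multilingual pages is to check whether a given text can be represented in a charset at all. The sample string below is arbitrary: its Latin letters fit in ISO 8859-1, but its Greek letters do not, while UTF-8 can encode the whole string.

```python
def can_encode(text: str, charset: str) -> bool:
    """True if every character of `text` is representable in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


sample = "naïve λόγος"  # Latin and Greek letters in one string
print(can_encode(sample, "iso-8859-1"))  # False: Greek is not in ISO 8859-1
print(can_encode(sample, "utf-8"))       # True
```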

It is important to point out that successful viewing of a page isn't necessarily an indication that it is encoded correctly. If the creator of a page and the reader are both assuming some machine-specific character set, and the server doesn't send any identifying information, then the reader will nonetheless see the page as the creator intended, but other readers with different native sets will not.

Character Entity References

In addition to native character encodings, characters can also be written as character references, a mechanism HTML inherits from SGML's character entities.

Many symbolic character entities have been defined. For example, the character 'λ' can be encoded as &lambda;. Because '&' serves as the escape character that introduces these entities, literal '&' characters in HTML must themselves be encoded as an entity, &amp;. Similar escapes are required for the '<' and '>' characters, encoded as &lt; and &gt; respectively.
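Python's standard `html` module happens to implement both directions, which makes it a convenient way to check these rules. The input text below is arbitrary; `quote=False` limits escaping to &, < and >, which is enough for element content (attribute values would also need their quote characters escaped).

```python
import html

text = "if a < b && b > c"
escaped = html.escape(text, quote=False)  # & is replaced first, then < and >
print(escaped)                     # if a &lt; b &amp;&amp; b &gt; c
print(html.unescape("&lambda;"))   # λ
print(html.unescape(escaped) == text)  # True
```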

Decimal and hexadecimal numeric character references can also be used, based on the character's Unicode code point. For example, λ (U+03BB) can be written as the decimal reference &#955; or the hexadecimal reference &#x3BB;.
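Numeric references are simply the code point written in decimal (`&#...;`) or hexadecimal (`&#x...;`). The helper names `decimal_ref` and `hex_ref` below are hypothetical; `html.unescape` is used only to confirm that both forms decode back to the same character.

```python
import html


def decimal_ref(ch: str) -> str:
    """Decimal numeric character reference for a single character."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Hexadecimal numeric character reference for a single character."""
    return f"&#x{ord(ch):X};"


print(decimal_ref("λ"))  # &#955;
print(hex_ref("λ"))      # &#x3BB;
print(html.unescape(decimal_ref("λ")) == html.unescape(hex_ref("λ")) == "λ")  # True
```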

 
Note that unnecessary use of HTML character references may significantly reduce the readability of HTML. If the character encoding for a web page is chosen appropriately then HTML character references are usually only required for a few special characters. The characters &, < and > always need to be encoded, as noted above.
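A minimal sketch of that advice, assuming the page text is held as a Python string and `to_html_bytes` is a hypothetical helper: always escape &, < and >, then let the encoder replace only the characters the chosen charset cannot represent with decimal references. With UTF-8 nothing beyond the three special characters is escaped; with US-ASCII every non-ASCII character becomes a reference.

```python
import html


def to_html_bytes(text: str, charset: str) -> bytes:
    # Escape the three characters that are always special in HTML text.
    escaped = html.escape(text, quote=False)
    # Characters the charset cannot represent become decimal references (&#NNN;).
    return escaped.encode(charset, errors="xmlcharrefreplace")


print(to_html_bytes("café λ & co", "utf-8"))     # b'caf\xc3\xa9 \xce\xbb &amp; co'
print(to_html_bytes("café λ & co", "us-ascii"))  # b'caf&#233; &#955; &amp; co'
```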



All content is licensed under the GNU FDL.
It uses material from Wikipedia.


