Re: Diacritical marks in array don't translate
comp.lang.javascript archive

From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date: Fri Nov 11 2005 - 16:42:24 CET

jiverbean wrote:

> I have an array with strings of German words:
>
> profile[1] = "Fröhliches Fräulein";
>
> Because HTML doesn't or didn't allow some of these characters,

That's an urban legend that will probably never die. HTML allows these
characters, and HTTP is and always has been 8-bit-safe. You just need to
declare the encoding with the Content-Type header and, for offline use,
with a meta element:

  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=...">
    ...
  </head>

A good reason for escaping 8-bit characters _in HTML_ is editing on
different platforms without having the knowledge or facility (due to
keyboard layout) to type them there.

<http://www.htmlhelp.com/faq/html/design.html#entity-or-number>

> I wrote:
>
> profile[1] = "Fr&ouml;hliches Fr&auml;ulein";

JS (programming language) is not HTML (markup language). This source code
is interpreted by the JS engine, and the engine does not, and is not
supposed to, "know" how to handle SGML character entity references like
"&ouml;".

There is no problem you have to work around.
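
A minimal sketch of what that means in practice, assuming the script is
served or saved with a correctly declared encoding. console.log() stands
in for alert() here so the lines also run outside a browser; the variable
names `escaped` and `entity` are only for illustration.

```javascript
var profile = [];

// 1. Literal characters: fine as long as the file's encoding is declared
//    correctly (HTTP header or meta element).
profile[1] = "Fröhliches Fräulein";

// 2. JS Unicode escape sequences: pure ASCII source, same string value.
var escaped = "Fr\u00F6hliches Fr\u00E4ulein";

// 3. HTML entity references: the JS engine leaves them untouched,
//    so the ampersand sequences stay in the string as-is.
var entity = "Fr&ouml;hliches Fr&auml;ulein";

console.log(profile[1] === escaped); // true
console.log(entity === escaped);     // false
console.log(entity);                 // Fr&ouml;hliches Fr&auml;ulein
```

If you cannot type or reliably store the characters, the \uXXXX form is
the JS counterpart to escaping them in HTML.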
 
> but when I use an alert(profile[1]); the dialog displays the escape
> codes instead of the diacritical marks. I then figured the unescape()
> function would solve the problem, but not. I don't want to write:
>
> profile[1] = "Fr%190hliches Fr%191ulein";
> alert(unescape(profile[1]));

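It is not supposed to work anyway. unescape(), which ECMAScript Edition 3
lists only in its informative compatibility annex, reads exactly two
hexadecimal digits after "%" (or four after "%u"), in contrast to the
standardized decodeURI*(), which decode UTF-8 byte sequences. So "%190"
is parsed as "%19" followed by a literal "0". The above results in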

  Fr<EM>0hliches Fr<EM>1ulein

where <EM> is the character at code point 0x19 (25).
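
A short sketch to check this, again with console.log() standing in for
alert(). The variable names are only for illustration, and the escape
sequences %F6/%E4 (ISO-8859-1 code points) and %C3%B6/%C3%A4 (UTF-8
bytes) are what the functions would expect instead of %190/%191.

```javascript
// "%19" is consumed as one escape; the following "0" stays literal.
var em = unescape("Fr%190hliches");
console.log(em.charCodeAt(2)); // 25
console.log(em.charAt(3));     // 0

// unescape() maps %XX directly to the code point 0xXX.
var latin1 = unescape("Fr%F6hliches Fr%E4ulein");

// decodeURIComponent() expects percent-encoded UTF-8 bytes instead.
var utf8 = decodeURIComponent("Fr%C3%B6hliches Fr%C3%A4ulein");
console.log(latin1 === utf8);  // true

// A lone %F6 is not valid UTF-8, so decodeURIComponent() throws.
var threw = false;
try {
  decodeURIComponent("Fr%F6hliches");
} catch (e) {
  threw = e instanceof URIError;
}
console.log(threw);            // true
```

None of this is needed for a string literal in your own source code; it
only matters when the data really arrives percent-encoded.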
 
> ________________________________________________
> [...]

Signatures are to be delimited by a line containing only "--<SP><CR><LF>".

HTH

PointedEars (a German)
Received on Mon Nov 21 03:25:50 2005