Re: 'ascii' codec can't encode character u'\u2013'

comp.lang.python archive

From: Fredrik Lundh <fredrik@pythonware.com>
Date: Fri Sep 30 2005 - 15:50:05 CEST

Thomas Armstrong wrote:

> I'm trying to parse a UTF-8 document with special characters like
> acute-accent vowels:
> --------
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> ...
> -------
>
> But I get this error message:
> -------
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in
> position 122: ordinal not in range(128)
> -------

> It works, but I don't want to substitute each special character, because there
> are always forgotten ones which can break the program.

if you really want to use latin-1 in the database, and you don't mind losing the
characters it cannot represent (such as u'\u2013'), you can turn each of them into
a question mark with

    text_extrated = text_extrated.encode('iso-8859-1', 'replace')

or drop them silently with

    text_extrated = text_extrated.encode('iso-8859-1', 'ignore')
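A small Python 3 sketch (the post predates Python 3, where `str` takes over the role of the `unicode` type) showing how the two error handlers differ on a string that mixes a latin-1 character (é, U+00E9) with the en dash U+2013 from the error message. The sample string is made up for illustration.

```python
# Sample text: 'é' (U+00E9) exists in latin-1, the en dash (U+2013) does not.
text = "caf\u00e9 \u2013 menu"

# 'replace' substitutes '?' for each character latin-1 cannot represent.
replaced = text.encode("iso-8859-1", "replace")

# 'ignore' silently drops those characters instead.
ignored = text.encode("iso-8859-1", "ignore")

print(replaced)  # b'caf\xe9 ? menu'
print(ignored)   # b'caf\xe9  menu'
```

Either way the é survives as the single latin-1 byte 0xE9; only the en dash is affected.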

a better approach is of course to convert your database to use UTF-8 and use

    text_extrated = text_extrated.encode('utf-8')
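For comparison, a Python 3 sketch that reproduces the original error with the ascii codec and shows that UTF-8 encodes the same (made-up) string without losing anything; the en dash simply becomes three bytes.

```python
text = "caf\u00e9 \u2013 menu"

# The ascii codec fails on any character above U+007F.
error_message = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)

# UTF-8 can represent every Unicode character, so the round trip is lossless.
encoded = text.encode("utf-8")
print(encoded)  # b'caf\xc3\xa9 \xe2\x80\x93 menu'
assert encoded.decode("utf-8") == text
```

Note that the ascii error here is reported for the é at position 3, since it is the first non-ASCII character in this sample.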

it's also a good idea to switch to parameter substitution in your SQL queries:

    cursor.execute("update ... set text = %s where id = %s", (text_extrated, id))
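A minimal sketch of parameter substitution, using the standard-library `sqlite3` module as a stand-in for whatever database the original poster used (the post doesn't say). Two assumptions worth flagging: `sqlite3` uses `?` placeholders rather than `%s` (the placeholder style is driver-specific; DB-API drivers advertise it in `paramstyle`), and the `docs` table is hypothetical. As with any DB-API driver, the parameters go to `execute` as a single sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table, for illustration only.
cursor.execute("create table docs (id integer primary key, text text)")
cursor.execute("insert into docs (id, text) values (?, ?)", (1, "draft"))

# The apostrophe and the en dash need no manual escaping; the driver binds them.
text_extrated = "it's caf\u00e9 \u2013 menu"
doc_id = 1
cursor.execute("update docs set text = ? where id = ?", (text_extrated, doc_id))
conn.commit()

# Read the value back to confirm it was stored unchanged.
stored = cursor.execute("select text from docs where id = ?", (doc_id,)).fetchone()[0]
conn.close()
```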

it's possible that your database layer can automatically encode unicode strings if
you pass them in as parameters; see the database API documentation for details.
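As one illustration (again using `sqlite3`, which may not be the database from the original question; other drivers behave differently), a Python 3 `str` can be bound directly as a parameter with no `.encode()` call. SQLite stores it in the database's text encoding, UTF-8 by default, which the sketch checks by asking SQLite for the hex of the bound value's bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The database's text encoding (UTF-8 unless configured otherwise).
encoding = conn.execute("pragma encoding").fetchone()[0]

# Bind the unicode string directly; no manual encode step.
dash = "\u2013"
hex_bytes = conn.execute("select hex(?)", (dash,)).fetchone()[0]
conn.close()

print(encoding)   # UTF-8
print(hex_bytes)  # E28093, the UTF-8 bytes of U+2013
```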

</F>
Received on Sat Oct 15 04:00:05 2005