Re: xml.sax removing newlines from attribute value?


comp.lang.python archive

From: Grant Edwards <grante@visi.com>
Date: Thu Sep 29 2005 - 21:12:20 CEST

On 2005-09-29, Fredrik Lundh <fredrik@pythonware.com> wrote:

>> I'm using xml.sax to parse the "datebook" xml file generated
>> by QTopiaDesktop. When I look at the xml file, some of the
>> attribute strings have newlines in them (as they are supposed
>> to).
>>
>> However, when xml.sax passes the attributes to my
>> startElement() method the newlines seem to have been deleted.
>>
>> How do I get the un-munged element attribute values?
>
> newlines as in chr(10) rather than &#xa; ?

Yup, looks that way.

> if so, the only way is to avoid XML:
>
> http://www.w3.org/TR/REC-xml/#AVNormalize

I can't quite find it in the BNF, but I take it that chr(10)
isn't really allowed in XML attribute strings. IOW, the file
generated by Trolltech's app is broken.
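
For reference, a minimal sketch of the attribute-value normalization
that the linked section describes, using the standard-library xml.sax.
The element name "event", the attribute name "note", and the helper
attribute_value() are made up for illustration; they are not taken
from the QTopiaDesktop file. Under those rules a literal chr(10)
inside an attribute value is not rejected but should reach
startElement() as a space, while the character reference &#10;
should come through as a real newline.

```python
import xml.sax


class AttrCollector(xml.sax.ContentHandler):
    """Collect the value of one attribute from every element seen."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def startElement(self, name, attrs):
        # attrs.get() returns None when the attribute is absent.
        value = attrs.get(self.attr_name)
        if value is not None:
            self.values.append(value)


def attribute_value(document, attr_name="note"):
    """Parse a bytes document and return the first matching attribute value."""
    handler = AttrCollector(attr_name)
    xml.sax.parseString(document, handler)
    return handler.values[0]


# Hypothetical sample documents: one with a literal newline in the
# attribute, one with the same newline written as a character reference.
literal = b'<event note="line1\nline2"/>'
escaped = b'<event note="line1&#10;line2"/>'

print(repr(attribute_value(literal)))
print(repr(attribute_value(escaped)))
```

Comparing the two printed values shows whether a given file lost its
newlines to normalization or never had them in the first place.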

> if the "yes, I know, but I have good reasons" approach is okay
> with you,

I didn't define the file or write the program that generated
it. It's claimed to be "xml", and I'm just trying to parse it.

> and you're big enough to defend yourself against the
> XML-Is-The-Law crowd, you can use a "sloppy" XML parser such
> as sgmlop to deal with your files:
>
> http://effbot.org/zone/sgmlop-index.htm

Good to know for future reference. For now, I think I'll just
live with the way it works. Everything basically works, except
some strings don't display quite "right". My current app
treats the file as read-only. If I ever get around to
modifying data and writing it back, I'll probably have to deal
with the newline issue at that point.
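
A rough sketch of what the write-back side could look like, assuming
the values are serialized with xml.sax.saxutils.quoteattr() from the
standard library. quoteattr() writes a newline as the character
reference &#10;, which a conforming parser should hand back as a
newline. The "event"/"note" names and the helper functions are
hypothetical, not part of the datebook format.

```python
import xml.sax
from xml.sax.saxutils import quoteattr


class FirstNote(xml.sax.ContentHandler):
    """Remember the 'note' attribute of the first element that has one."""

    def __init__(self):
        super().__init__()
        self.note = None

    def startElement(self, name, attrs):
        if self.note is None:
            self.note = attrs.get("note")


def write_event(note):
    """Serialize a single hypothetical <event> element as UTF-8 bytes."""
    # quoteattr() adds the surrounding quotes and escapes \n, \r, \t,
    # &, < and > so the value survives attribute-value normalization.
    return ("<event note=%s/>" % quoteattr(note)).encode("utf-8")


def read_event(document):
    """Parse the bytes produced by write_event() and return the note."""
    handler = FirstNote()
    xml.sax.parseString(document, handler)
    return handler.note


original = "Dentist\nBring insurance card"
serialized = write_event(original)
print(serialized)
print(read_event(serialized) == original)
```

This only helps for files the program writes itself; values that were
already written with literal newlines would still be normalized on
read.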

-- 
Grant Edwards                   grante             Yow!  When this load is
                                  at               DONE I think I'll wash
                               visi.com            it AGAIN...
Received on Sat Oct 15 03:57:28 2005