Re: not quite 1252
comp.lang.python archive

From: Anton Vredegoor <anton.vredegoor@gmail.com>
Date: Sat Apr 29 2006 - 12:08:12 CEST

Martin v. Löwis wrote:

> Well, if the document is UTF-8, you should decode it as UTF-8, of
> course.

Thanks. This and:

http://en.wikipedia.org/wiki/UTF-8

solved my problem with understanding the encoding.
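
A small illustration of why decoding a UTF-8 document with the wrong codec looks "not quite 1252". This is only a sketch: the sample string is made up and is not taken from the document discussed in this thread.

```python
# Illustration only: the sample text is hypothetical.
text = u"caf\u00e9 \u20ac5"      # "café €5"
raw = text.encode("utf-8")      # the bytes a UTF-8 document would contain

# Decoding with the right codec round-trips the text.
assert raw.decode("utf-8") == text

# Decoding the same bytes as cp1252 succeeds in this case, but every byte of
# a multi-byte UTF-8 sequence becomes a separate cp1252 character.
wrong = raw.decode("cp1252")

# The damage is reversible as long as nothing else touched the text.
repaired = wrong.encode("cp1252").decode("utf-8")
```

Here `wrong` is three characters longer than `text`, because the two-byte and three-byte UTF-8 sequences each turn into two or three separate characters.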

Anton

Proof that I understand it now (anyone, please prove me wrong if you can):

from zipfile import ZipFile, ZIP_DEFLATED

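def by80(seq):
     # Yield successive 80-character slices; the last one may be shorter.
     # (Looping on "while it:" never ends, because an iterator is always true.)
     for start in range(0, len(seq), 80):
         yield seq[start:start + 80]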

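def utfCheck(infn):
     # An .sxw file is a zip archive; content.xml holds the document text.
     zin = ZipFile(infn, 'r', ZIP_DEFLATED)
     data = zin.read('content.xml').decode('utf-8')
     for line in by80(data):
         # Raises UnicodeEncodeError for characters cp1252 cannot represent.
         print line.encode('cp1252')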

def test():
     infn = "xxx.sxw"
     utfCheck(infn)

if __name__=='__main__':
     test()
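
For anyone running Python 3, the same check might look roughly like the sketch below. It makes some assumptions: the archive is built in memory so the example runs without a real xxx.sxw, the names `chunks` and `check_utf8` are made up for illustration, and `errors="replace"` is my choice so that characters outside cp1252 come out as "?" instead of raising an exception.

```python
import io
from zipfile import ZipFile

def chunks(text, width=80):
    """Yield successive pieces of text, each at most width characters long."""
    for start in range(0, len(text), width):
        yield text[start:start + width]

def check_utf8(archive, member="content.xml"):
    """Decode a zip member as UTF-8 and re-encode each chunk as cp1252."""
    with ZipFile(archive) as zin:
        data = zin.read(member).decode("utf-8")
    # errors="replace" substitutes "?" for characters cp1252 cannot represent.
    return [piece.encode("cp1252", errors="replace") for piece in chunks(data)]

# Build a tiny stand-in archive in memory (hypothetical content, not a real .sxw).
buf = io.BytesIO()
with ZipFile(buf, "w") as zout:
    zout.writestr("content.xml", "<p>caf\u00e9 \u4e2d</p>" * 30)
buf.seek(0)

lines = check_utf8(buf)
for line in lines:
    print(line)
```

Each printed item is a bytes object of at most 80 bytes; the é survives as the single cp1252 byte 0xE9, while the CJK character, which cp1252 lacks, is replaced by "?".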
Received on Mon May 1 00:43:01 2006