Splitting on a word

comp.lang.python archive

From: <qwweeeit@yahoo.it>
Date: Wed Jul 13 2005 - 15:19:54 CEST

Hi all,
I am writing a script to visualize (and print)
the web references hidden in HTML files, i.e. the links
that a browser shows as underlined text.
While optimizing my code, I found that one essential step is
splitting on a word (in this case 'href').

Is there a more Pythonic alternative to the following?

# SplitMultichar.py

import re

# string s simulating an html file
s = 'ffy: ytrty <a href="www.python.org">python</a> fyt <A HREF="wwwx">wx</A> dtrtf'
p = re.compile(r'\bhref\b', re.I)

lHref = p.findall(s)  # lHref = ['href', 'HREF']
# for normal html files the lHref list has more elements
# (more web references)

c = '~'  # char to be used as delimiter
# c = chr(127)  # char to be used as delimiter
for i in lHref:
    # note: str.replace ignores the \b word boundaries of p, so it would
    # also replace 'href' inside a longer word
    s = s.replace(i, c)

# s = 'ffy: ytrty <a ~="www.python.org">python</a> fyt <A ~="wwwx">wx</A> dtrtf'

parts = s.split(c)  # renamed from 'list', which shadowed the built-in
# parts = ['ffy: ytrty <a ', '="www.python.org">python</a> fyt <A ',
#          '="wwwx">wx</A> dtrtf']
#=-----------------------------------------------------
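The split the post asks about can be done in a single call: a compiled pattern's split method cuts the string at every match, so no placeholder delimiter or findall/replace loop is needed, and the \b word boundaries are honored (str.replace ignores them). A minimal sketch, not from the original post, reusing the same sample string and pattern:

```python
import re

# String simulating an HTML file (same sample as in the post).
s = 'ffy: ytrty <a href="www.python.org">python</a> fyt <A HREF="wwwx">wx</A> dtrtf'

# Same pattern as in the post: the whole word 'href', case-insensitive.
p = re.compile(r'\bhref\b', re.I)

# Pattern.split cuts the string at every match directly.
parts = p.split(s)
print(parts)
# ['ffy: ytrty <a ', '="www.python.org">python</a> fyt <A ', '="wwwx">wx</A> dtrtf']

# With a capturing group, the matched text ('href' or 'HREF') is kept
# in the result, interleaved with the pieces.
parts_with_words = re.split(r'(\bhref\b)', s, flags=re.I)
print(parts_with_words)
# ['ffy: ytrty <a ', 'href', '="www.python.org">python</a> fyt <A ', 'HREF', '="wwwx">wx</A> dtrtf']
```

Without a capturing group the matched words are dropped, which gives the same list as the replace-then-split approach.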

If you save the original string s to xxx.html, any browser
can display it.
To be safe, I chose chr(127) as the delimiter,
since it should not normally appear in an HTML file.
Bye.
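The chr(127) sentinel only works as long as that character never occurs in the input. If the underlying goal is the list of web references itself, one alternative (not from the original post) is the standard library's html.parser module, which needs neither a sentinel nor a regex. A minimal sketch, assuming Python 3; LinkCollector is a hypothetical helper class introduced here for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags.

    Hypothetical helper for illustration; not from the original post.
    """

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, text) tuples
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so
        # <A HREF=...> and <a href=...> are handled the same way.
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text)))
            self._href = None


s = 'ffy: ytrty <a href="www.python.org">python</a> fyt <A HREF="wwwx">wx</A> dtrtf'
collector = LinkCollector()
collector.feed(s)
collector.close()  # flush any buffered data
print(collector.links)  # [('www.python.org', 'python'), ('wwwx', 'wx')]
```

A parser also copes with attribute order, quoting, and case differences that a plain split on 'href' leaves to the caller.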
Received on Thu Sep 29 16:55:44 2005