Re: Optimising a regexp


comp.lang.tcl archive

From: Glenn Jackman <xx087@freenet.carleton.ca>
Date: Wed Mar 01 2006 - 19:06:32 CET

At 2006-03-01 03:00AM, Paul Whitfield <NoSpam@iinet.net.au> wrote:
> Arjen Markus wrote:
> > You are looking for a fixed string (</table>) that marks the end of the
> > data you are interested in. Why not simply use [string first]?
> >
>
> I am actually looking at code from the tclwebtest package, so
> I can not claim ownership :-)
>
> The problem with string first is that it does not have a -nocase option,
> and the html tags can be upper, lower, or mixed case.
>
> And tclwebtest is a general html testing package so it is not safe to
> assume anything about case.

You used:
    set maybe_table_p [regexp -nocase -indices {.*?(</table>)} [string range $body $offset end] nada match]

Arjen is suggesting:
    set body_segment [string range $body $offset end]

    # search on a throw-away, lower case copy of $body
    set index [string first {</table>} [string tolower $body_segment]]

    # now get the "real" html you want
    if {$index != -1} {
        set maybe_table_p [string range $body_segment 0 $index]
    }

or even, without the intermediate $body_segment:
    set index [string first </table> [string tolower $body] $offset]
    if {$index != -1} {
        set maybe_table_p [string range $body $offset $index]
    }
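
The same idea can be wrapped in a small proc. This is only a sketch: the
name text_before_table_end is made up for illustration and is not part of
tclwebtest. It assumes you want the text up to, but not including, the
closing tag. Since [string first] returns the index of the tag's "<", the
sketch subtracts one from it. It also relies on [string tolower] mapping
character for character, so that indices in the lower-case copy should line
up with those in the original.

```tcl
# Hypothetical helper (not from tclwebtest): return the part of $body that
# runs from $offset up to, but not including, the next </table> tag,
# whatever its letter case. Returns an empty string if no tag is found.
proc text_before_table_end {body offset} {
    # Search a throw-away lower-case copy, starting at $offset.
    set index [string first </table> [string tolower $body] $offset]
    if {$index == -1} {
        return ""
    }
    # $index points at the "<" of the tag, so stop one character earlier.
    return [string range $body $offset [expr {$index - 1}]]
}

set body {<TABLE><tr><td>x</td></tr></Table><p>after</p>}
puts [text_before_table_end $body 0]
```

To keep the closing tag in the result, add the tag length minus one (7) to
$index instead of subtracting one.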

-- 
Glenn Jackman
Ulterior Designer
Received on Sun Apr 30 02:19:16 2006