Re: regexp for sequence of quoted strings
Available news archives: comp.lang.tcl - comp.lang.python - comp.security.firewalls - sci.crypt - comp.lang.php - comp.lang.javascript
Google
 
Web news.hping.org


comp.lang.python archive

Re: regexp for sequence of quoted strings

From: Alexander Schmolck <a.schmolck@gmx.net>
Date: Wed May 25 2005 - 22:55:28 CEST

gry@ll.mit.edu writes:

> I have a string like:
> {'the','dog\'s','bite'}
> or maybe:
> {'the'}
> or sometimes:
> {}
>
> [FYI: this is postgresql database "array" field output format]
>
> which I'm trying to parse with the re module.
> A single quoted string would, I think, be:
> r"\{'([^']|\\')*'\}"

what about {'dog \\', ...} ?

If you don't need to validate anything you can just forget about the commas
etc and extract all the 'strings' with findall,

The regexp below is a bit too complicated (adapted from something else) but I
think will work:

In [90]:rex = re.compile(r"'(?:[^\n]|(?<!\\)(?:\\)(?:\\\\)*\n)*?(?<!\\)(?:\\\\)*?'")

In [91]:rex.findall(r"{'the','dog\'s','bite'}")
Out[91]:["'the'", "'dog\\'s'", "'bite'"]

Otherwise just add something like ",|}$" to deal with the final } instead of a
comma.

Alternatively, you could also write a regexp to split on the "','" bit and trim
the first and the last split.

'as
Received on Thu Sep 29 16:13:54 2005