Re: How to search HUGE XML with DOM?

comp.lang.python archive

From: Diez B. Roggisch <deets@nospam.web.de>
Date: Fri Mar 31 2006 - 13:51:01 CEST

> xml.dom.minidom is too slow when parsing such a big XML file into a DOM
> object, and pulldom would take quite a long time to go through the
> whole database file. How can I improve the search speed? Are there
> existing solutions or algorithms? Thank you for your suggestions...

I've told you that before, and I'll tell you again: an RDBMS is the way to go.
There might be XML parsers that work faster - I suppose cElementTree can
gain you some speed - but ultimately the problems are inherent in the
DOM representation: no type information, no indices, no nothing. Just a
huge pile of nodes in memory.
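
To illustrate the "faster parser" option, here is a minimal sketch of a streaming search that never builds the whole tree. It assumes a hypothetical file layout (a <db> root containing <record id="..."> elements with a <name> child), since the thread does not show the real data. It uses xml.etree.ElementTree, which in current Python versions picks up the C accelerator that used to ship separately as cElementTree.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file layout is not given in the thread.
SAMPLE = b"""<db>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <record id="3"><name>gamma</name></record>
</db>"""

def find_names(source, wanted_id):
    """Stream through the XML and return the names of matching records."""
    hits = []
    # "end" events fire once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            if elem.get("id") == wanted_id:
                hits.append(elem.findtext("name"))
            # Drop the record's children so memory use stays roughly flat.
            elem.clear()
    return hits

result = find_names(io.BytesIO(SAMPLE), "2")
print(result)
```

This keeps memory down, but every query still reads the whole file, so the search remains linear in the size of the data.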

So all searches are linear in the number of nodes. Of course you might be
able to create indices yourself, even devise a clever scheme to make using
them as declarative as possible. But that would in the end mean nothing but
re-creating RDBMS technology - why do that, if it's already there?
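
A small sketch of the difference between a linear search and a hand-built index, again on a hypothetical <record>/<city> layout that is only an assumption for illustration. The dictionary here is exactly the kind of thing a database index gives you for free.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data, not taken from the original poster's file.
SAMPLE = """<db>
  <record id="1"><city>Berlin</city></record>
  <record id="2"><city>Paris</city></record>
  <record id="3"><city>Berlin</city></record>
</db>"""

root = ET.fromstring(SAMPLE)

def linear_search(root, city):
    # Visits every record on every query: cost grows with the node count.
    return [r.get("id") for r in root.iter("record")
            if r.findtext("city") == city]

def build_index(root):
    # One pass up front; afterwards each lookup is a dictionary access.
    index = defaultdict(list)
    for r in root.iter("record"):
        index[r.findtext("city")].append(r.get("id"))
    return index

city_index = build_index(root)
print(linear_search(root, "Berlin"))
print(city_index["Berlin"])
```

Both calls return the same ids, but the index has to be rebuilt or maintained whenever the data changes, and a separate one is needed for every field you want to search on. That bookkeeping is the part an RDBMS already does.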

Maybe there are frameworks out there that support you in this, but the very
nature of XML surely makes that a more tedious task than just defining a
simple SQL schema. If I had to search for XML tools that go beyond DOM,
I'd start with Uche Ogbuji's 4Suite and work my way down from there -
maybe Amara is what you need?
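
For comparison, a rough sketch of the "simple SQL schema" route. The table, column names, and sample records are assumptions made up for this example, and sqlite3 from the standard library is only a stand-in for whatever RDBMS you would actually use. The XML is streamed once into a table, and after that searches go through an index.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file layout is not given in the thread.
SAMPLE = b"""<db>
  <record id="1"><name>alpha</name><city>Berlin</city></record>
  <record id="2"><name>beta</name><city>Paris</city></record>
  <record id="3"><name>gamma</name><city>Berlin</city></record>
</db>"""

def load(conn, source):
    """Create the (assumed) schema and load all records from the XML."""
    conn.execute(
        "CREATE TABLE record (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute("CREATE INDEX record_city ON record (city)")
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            rows.append((int(elem.get("id")),
                         elem.findtext("name"),
                         elem.findtext("city")))
            elem.clear()  # free the parsed children as we go
    conn.executemany("INSERT INTO record VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, io.BytesIO(SAMPLE))

# The query can use the record_city index instead of scanning every row.
names = [row[0] for row in conn.execute(
    "SELECT name FROM record WHERE city = ? ORDER BY id", ("Berlin",))]
print(names)
```

The one-time import is still linear in the file size, but it is paid once rather than on every search.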

Now having said that: I'm not a SQL-bigot. Just use the right tool for the
job.

Regards,

Diez
Received on Sun Apr 30 21:46:19 2006