Donal K. Fellows wrote:
> MH wrote:
>> Well.. That depends.. I guess doing a chunk in binary mode would be
>> reasonably fast. Does Tcl actually DO a memcmp on the results though, or
>> does it first convert them to some other representation in order to do a
>> binary comparison?
>>
>> As for speed, well.. Doing Tcl_Gets on a channel (inside a C++ program)
>> versus fdreopen and fgets, for a 1 million line file adds a ~1.5 second
>> penalty to my parsing program.
>
> I was thinking about handling the files as binary (so no character set
> conversion, leading to the channel system managing its buffers using
> memcpy instead of slower operations), using [read] (so fixed size chunks
> which again encourage fast data handling), and using [string compare]
> (which really does do memcmp; I've checked!) If the chunks are fairly
> large (a few megabytes is reasonable on a modern machine) the overall
> comparison should be quick. This is considerably at odds with what you
> were proposing...
>
> Indeed, what I'm proposing is exactly this:
>
> proc filesEqual {file1 file2 {chunk 8388608}} {
>     # Test the "Duh!" case first
>     if {[file size $file1] != [file size $file2]} {
>         return 0
>     }
>
>     # Written to use 8.5a4 features...
>     set f1 [open $file1 rb]; set f2 [open $file2 rb]
>     # Otherwise, use [fconfigure $f1 -translation binary] here
>
>     while {![eof $f1]} {
>         # NB: the 'ne' operator is currently slower for this task
>         if {[string compare [read $f1 $chunk] [read $f2 $chunk]]} {
>             close $f1; close $f2
>             return 0
>         }
>     }
>     close $f1; close $f2
>     return 1
> }
> ...
Donal,
Would you not also want to fconfigure $f1 and $f2 with -buffersize $chunk?
Or would this not help?
(I figured it would configure the I/O subsystem to attempt read-aheads
to keep the buffer full.)
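
To make the question concrete, here is a minimal sketch of the variant being
asked about. It is not Donal's code: the name filesEqualBuffered is
hypothetical, and the cap of 1000000 bytes is an assumption based on the
fconfigure manual page, which as far as I recall limits -buffersize to about
one million bytes, so passing the 8 MB chunk size directly may be ignored or
clamped. Whether the larger buffer actually helps is exactly the open question.

```tcl
# Hypothetical variant of filesEqual from the quoted message.
# Assumption: -buffersize accepts at most about one million bytes,
# so the requested chunk size is capped before being used as a buffer size.
proc filesEqualBuffered {file1 file2 {chunk 8388608}} {
    # Files of different sizes can never be equal
    if {[file size $file1] != [file size $file2]} {
        return 0
    }

    set bufsize [expr {$chunk > 1000000 ? 1000000 : $chunk}]

    # Plain "r" plus -translation binary also works before 8.5
    set f1 [open $file1 r]
    set f2 [open $file2 r]
    fconfigure $f1 -translation binary -buffersize $bufsize
    fconfigure $f2 -translation binary -buffersize $bufsize

    # Compare chunk by chunk, stopping at the first difference
    set equal 1
    while {![eof $f1]} {
        if {[string compare [read $f1 $chunk] [read $f2 $chunk]]} {
            set equal 0
            break
        }
    }
    close $f1
    close $f2
    return $equal
}
```

Timing both versions on the same pair of large files, for example with
[time {filesEqualBuffered $a $b} 10] against [time {filesEqual $a $b} 10],
should show whether the buffer size makes any measurable difference.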
--
+--------------------------------+---------------------------------------+
| Gerald W. Lester |
|"The man who fights for his ideals is the man who is alive." - Cervantes|
+------------------------------------------------------------------------+
Received on Sun Apr 30 03:30:32 2006