Douglas A. Gwyn wrote:
>> Bryan Olson wrote:
>>
>>> Later, in code I wrote for that same thread, I passed an
>>> arbitrary char value to the C library's toupper() function. I
>>> had even looked it up in Harbison and Steele's highly-regarded
>>> /C, A Reference Manual/. Turned out the manual was wrong; passing
>>> an arbitrary value of type char to toupper() can cause a buffer
>>> overrun.
[...]
> Olson has mischaracterized the issue, as usual.
Olson stands by his reporting.
> The toupper function has an int argument, not char,
> and it is perfectly safe to feed it any character
> code (or EOF). Olson's problem seems to be that he
> was unaware of the possibility of sign extension
> upon widening of a signed integer type, which might
> necessitate masking off the extension. Of *course*
> if you feed a wildly out-of-range value to toupper
> you get undefined behavior.
That's a misunderstanding of sign-extension and how char values
are converted to int. The sign extension of (char)-17 is the int
value -17. There's no possibility of a "wildly out of range
value". The problem is *not* that C's conversions can change a
char value to an int value that's a different integer; they
cannot. The problem is that when char is signed, the integer -17 is a
legal value for a char, but under the ANSI standard, calling
toupper(-17) has undefined behavior (assuming EOF != -17).
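
To make the point concrete, here is a minimal sketch of the conversion in question. The helper names widen_schar and in_ctype_domain are hypothetical, introduced only for illustration; signed char is used instead of plain char so that the example behaves the same on platforms where plain char is unsigned.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: widen a signed char to int the way C's implicit
 * conversion does. The conversion is value-preserving, so -17 stays -17. */
static int widen_schar(signed char c)
{
    return c;
}

/* Hypothetical helper: returns 1 if v lies in the domain that C99 7.4
 * permits for the <ctype.h> functions (EOF or an unsigned char value). */
static int in_ctype_domain(int v)
{
    return v == EOF || (v >= 0 && v <= UCHAR_MAX);
}
```

With these, widen_schar(-17) yields the int -17, which in_ctype_domain rejects unless EOF happens to be -17. The same character converted through unsigned char is a non-negative value and is accepted.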
> The only way the C
> standard could have guaranteed well-defined behavior
> in the face of such abuse would have been to force
> the toupper function itself (and all the other
> <ctype.h> functions) to perform the masking
Far from being "the only way", it's not even a way. Masking off
bits is wrong here: distinct values could be masked to the same
thing. Nevertheless, any remotely competent C programmer could
write an efficient toupper() that takes all values representable
as either char or unsigned char.
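
As a rough illustration, one way such a function might look is sketched below. The name toupper_any is hypothetical, and the sketch delegates to the standard toupper() with range tests instead of using a lookup table, so it is not tuned for speed. It assumes that a negative value n stands for the same character as the unsigned char value n + UCHAR_MAX + 1. Note that if a negative char value coincides with EOF, the two cannot be told apart, and the sketch treats the value as EOF.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical extended-domain toupper: accepts EOF, every unsigned char
 * value, and every negative signed char value. */
static int toupper_any(int c)
{
    if (c == EOF)
        return EOF;            /* EOF wins if it collides with a char value */

    if (c >= 0 && c <= UCHAR_MAX)
        return toupper(c);     /* already inside the standard's domain */

    if (c >= SCHAR_MIN && c < 0) {
        /* View the same character as an unsigned char value, map it,
         * then shift the result back into the signed range if needed. */
        int u = toupper(c + UCHAR_MAX + 1);
        return (u > SCHAR_MAX) ? u - (UCHAR_MAX + 1) : u;
    }

    return c;                  /* outside both ranges: returned unchanged
                                  (a choice made for this sketch) */
}
```

Unlike masking, the range tests keep distinct arguments distinct: with 8-bit chars and the common EOF value of -1, masking with 0xFF would turn both EOF and UCHAR_MAX into 255.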
> in every
> case, even those in which there was never a problem
> (due to the programmer knowing what he was doing).
> It is part of the spirit of C as a SIL that it
> doesn't impose penalties on competent programmers
> merely in order to aid the incompetent.
I had thought I was on strong ground with my call to toupper(),
having read:
    All of the facilities described here operate properly on all
    values representable as type char or type unsigned char, and
    also for the value EOF, but are undefined for all other
    integer values unless the individual description states
    otherwise.
    [Samuel P. Harbison and Guy L. Steele, Jr., /C: A Reference
    Manual/, Fifth Edition, Prentice-Hall, 2002; Chapter 12,
    "Character Processing", page 335.]
The manual is widely respected; for example, the comp.lang.c FAQ
recommends it as "excellent". I was therefore quite surprised
when "infobahn" pointed out that the standard [ISO/IEC
9899:1999] reads:
    7.4 (ctype.h)
    The header <ctype.h> declares several functions useful for
    classifying and mapping characters.166) In all cases the
    argument is an int, the value of which shall be
    representable as an unsigned char or shall equal the value
    of the macro EOF. If the argument has any other value, the
    behavior is undefined.
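
Given that wording, the usual way to stay inside the permitted domain is to convert the char to unsigned char before the call. The wrapper below is a minimal sketch of that idiom; the name upper_char is hypothetical.

```c
#include <ctype.h>

/* Hypothetical wrapper: upper-case a plain char without ever handing
 * toupper() a negative argument. The cast to unsigned char puts the
 * value in the range 0..UCHAR_MAX, which 7.4 allows. */
static char upper_char(char c)
{
    return (char)toupper((unsigned char)c);
}
```

Calling upper_char('a') returns 'A'. Calling it with a char whose value is negative is well defined as far as toupper() is concerned, since the function only ever sees the corresponding unsigned char value.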
So at this point, I looked up the errata for H&S and purchased
a copy of the standard. The item was not yet in the errata, so I
wrote it up and submitted it. Sam Harbison acknowledged that it
seems to be a bug in the text, but noted he's behind in
maintaining the errata. It should appear in the next update.
I noted this to a friend who works on GCC. His reaction was that
it's really a bug in the standard.
--
--Bryan
Received on Thu Sep 29 21:38:26 2005