Re: Public disclosure of discovered vulnerabilities

sci.crypt archive

From: Bryan Olson <fakeaddress@nowhere.org>
Date: Wed May 25 2005 - 02:38:45 CEST

Douglas A. Gwyn wrote:
>> Bryan Olson wrote:
>>
>>> Later, in code I wrote for that same thread, I passed an
>>> arbitrary char value to the C library's toupper() function. I
>>> had even looked it up in Harbison and Steele's highly-regarded
>>> /C, A Reference Manual/. Turned out the manual was wrong; passing
>>> an arbitrary value of type char to toupper() can cause a buffer
>>> overrun.
[...]
> Olson has mischaracterized the issue, as usual.

Olson stands by his reporting.

> The toupper function has an int argument, not char,
> and it is perfectly safe to feed it any character
> code (or EOF). Olson's problem seems to be that he
> was unaware of the possibility of sign extension
> upon widening of a signed integer type, which might
> necessitate masking off the extension. Of *course*
> if you feed a wildly out-of-range value to toupper
> you get undefined behavior.

That's a misunderstanding of sign-extension and how char values
are converted to int. The sign extension of (char)-17 is the int
value -17. There's no possibility of a "wildly out of range
value". The problem is *not* that C's conversions can change a
char value to an int value that's a different integer; they
cannot. The problem is that when char is signed, the integer -17 is a
legal value for a char, but under the ANSI standard, calling
toupper(-17) has undefined behavior (assuming EOF != -17).
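
A minimal sketch of the conversion described above, assuming a
platform where signed char can hold -17. The helper names
(widen_plain, widen_for_ctype, toupper_schar) are hypothetical and
exist only for illustration. Plain widening preserves the value, so
-17 stays -17; converting through unsigned char first yields a value
in the range 0..UCHAR_MAX that the <ctype.h> functions accept.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: plain widening, as happens when a signed char
   is passed where an int is expected. The value is preserved. */
int widen_plain(signed char c)
{
    return c;
}

/* Hypothetical helper: convert to unsigned char first, so the result
   always lies in 0..UCHAR_MAX. */
int widen_for_ctype(signed char c)
{
    int r = (unsigned char)c;
    assert(r >= 0 && r <= UCHAR_MAX);
    return r;
}

/* Hypothetical wrapper: toupper() for any signed char value. */
int toupper_schar(signed char c)
{
    return toupper(widen_for_ctype(c));
}
```

With these definitions, widen_plain(-17) is -17, while
widen_for_ctype(-17) is UCHAR_MAX + 1 - 17, which is a valid
argument to toupper().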

> The only way the C
> standard could have guaranteed well-defined behavior
> in the face of such abuse would have been to force
> the toupper function itself (and all the other
> <ctype.h> functions) to perform the masking

Far from being "the only way", it's not even a way. Masking off
bits is wrong here: distinct values could be masked to the same
thing. Nevertheless, any remotely competent C programmer could
write an efficient toupper() that takes all values representable
as either char or unsigned char.
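
One possible sketch of such a function, under stated assumptions:
the names toupper_any and mask_low_byte are hypothetical, negative
values (including EOF) are returned unchanged on the assumption that
they have no uppercase mapping in the "C" locale, and the masking
helper assumes a two's complement representation. It is not taken
from any library; it only illustrates that a range check keeps
distinct inputs distinct where masking does not.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical: the masking approach. On a two's complement machine
   it maps -17 and UCHAR_MAX - 16 to the same value. */
int mask_low_byte(int c)
{
    return c & UCHAR_MAX;
}

/* Hypothetical: defined for every int. Values outside 0..UCHAR_MAX,
   which includes EOF and negative char values, pass through
   unchanged; everything else is delegated to the library. */
int toupper_any(int c)
{
    if (c < 0 || c > UCHAR_MAX)
        return c;
    assert(c >= 0 && c <= UCHAR_MAX); /* precondition of toupper() */
    return toupper(c);
}
```

The cost over the library call is a single range comparison. A
table indexed by c - CHAR_MIN would be another way to cover the
same range of arguments.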

> in every
> case, even those in which there was never a problem
> (due to the programmer knowing what he was doing).
> It is part of the spirit of C as a SIL that it
> doesn't impose penalties on competent programmers
> merely in order to aid the incompetent.

I had thought I was on strong ground with my call to toupper(),
having read:

      All of the facilities described here operate properly on all
      values representable as type char or type unsigned char, and
      also for the value EOF, but are undefined for all other
      integer values unless the individual description states
      otherwise.
      [Samuel P. Harbison and Guy L. Steele, Jr., /C: A Reference
      Manual/, Fifth Edition, Prentice-Hall, 2002; Chapter 12,
      "Character Processing", page 335.]

The manual is widely respected; for example, the comp.lang.c FAQ
recommends it as "excellent". I was therefore quite surprised
when "infobahn" pointed out that the standard [ISO/IEC
9899:1999] reads:

     7.4 (ctype.h)

     The header <ctype.h> declares several functions useful for
     classifying and mapping characters. In all cases the
     argument is an int, the value of which shall be
     representable as an unsigned char or shall equal the value
     of the macro EOF. If the argument has any other value, the
     behavior is undefined.

So at this point, I looked up the errata for H&S and purchased
a copy of the standard. The item was not yet in the errata, so I
wrote it up and submitted it. Sam Harbison acknowledged that it
seems to be a bug in the text, but noted he's behind in
maintaining the errata. It should appear in the next update.

I noted this to a friend who works on GCC. His reaction was that
it's really a bug in the standard.

-- 
--Bryan
Received on Thu Sep 29 21:38:26 2005