Re: SSN encryption
Available news archives: comp.lang.tcl - comp.lang.python - comp.security.firewalls - sci.crypt - comp.lang.php - comp.lang.javascript
Google
 
Web news.hping.org


sci.crypt archive

Re: SSN encryption

From: <drfremove@nber.org>
Date: Fri Sep 30 2005 - 00:30:37 CEST

Joseph Ashwood wrote:
> "drfremove@nber.org" <feenberg@gmail.com> wrote in message
> news:1127996717.351192.56310@g14g2000cwa.googlegroups.com...
> > The requirement is that the same SSN [and only the SSN] should encrypt to
> > the same value [for later comparison]
>
> Let's begin by attempting to find the security requirements (I'll give you a
> hint, any system that has to hold only the SSN like this will not be
> secure). First the SSN is not anything approaching random, in fact the
> Social Security Administration has a website on they are generated
> (http://www.ssa.gov/history/ssn/geocard.html) so there is very little that
> is unguessable, for example 602-12-3456 (apologies to whomever has this
> number, it has been issued and I only guessed) is from California,
> eliminating 3 digits immediately, and the 12 identifies that it is fairly
> early in the usage, in fact while I don't have the timing information in
> front of me 602-12-xxxx is probably mostly retired if not permanently so. So
> the level of protection has to be extremely high in order to prevent the
> leakage of the last group of numbers (serial number).

The dataset does include state of residence, which is correlated with
state of issuance of SSN. However, if I am looking for a particular
person, knowing that they are from California allows me to exclude 80%
of the records. Even if that knowledge also allows me to guess at part
of the SSN, once I have done so it really only allows me to exclude the
same 80%, so SSN structure isn't a source of weakness unless it helps
the intruder solve for the remaining digits. With the right encryption
technology, it shouldn't have that effect.

>
> The requirement of a 1-1 mapping leads to low levels of protection. Even
> assuming all 9 digits are unknown that is less than 2^30 work, so a modern
> computer can run through all combinations in a matter of minutes. This means

Let me make it clear that the threat model is not that someone would
take a record and solve for the SSN that goes with it, but that they
would locate the record in the database that had a particular SSN of
interest to them (an ex-spouse, for instance). With 100 million SSNs in
the database, that may be much harder (again, depending on the method).

I understand that for a 9 character message, there is no problem for a
computer to run through all possible messages, but this isn't the crux
of the problem, which would be to identify which solution was the
correct one. With some methods of encryption, it would be possible to
tell immediately that the correct key had been found - if the decrypt
contained only digits that would be strong evidence the key had been
found if invalid keys would decrypt to random bits which would rarely
be valid SSNs. With another method every key, correct or not, might
decrypt to a string of digits valid as an SSN. In that case it would be
more difficult to tell if the correct key had been found. The intruder
could look at the content of the record for clues, but by assumption
the intruder doesn't know enough of the record content to use that to
identify the individual. That is why I resist just hashing to a much
longer string.

> that for security you MUST have a random component. Unfortunately, this
> violates the aforementioned requirement.
>
> Your best bet would be to use a hash function (e.g. MD5 chosen because it's
> fast and the security levels here cannot be high anyway). Your other option

The intruder would take the known SSN, hash it using any MD5 program,
and search the database for the matching hash value. There is no key to
solve for - the equivalent of ROT13 security. This is much weaker than
necessary. For example, we could add (without carry) a constant K to
every SSN. Then the intruder would need to have alternate means of
identifying at least one record before he could solve for K and with
that search the database for the target record. That would be the ROTN
level of security. (Admittedly he could probably solve for the first
few digits from knowledge of the SSN structure - that isn't the point).
I expect it is possible to do much better. For example, if I make a
separate substitution map for each character position, I need 9 maps,
each with 10 elements. An intruder with knowledge of the true SSN for
as few as 9 records could solve for all the maps, still a substantial
improvement over ROTN, which beats ROT13. I'd like to do much better
than that.

> would be to use the suggestion by Tom, and depend on decrypting each of the
> SSNs for comparison on addition to the database (this can be secure).

If we are going make the comparisons to prior entries, then we can
assign serial numbers to individuals as they are encountered and no
encryption technology is required. However the comparison to the entire
database is expensive and something we were hoping to avoid with
appropriate encryption technology.

> Joe

Daniel Feenberg
feenberg isat nber dotte org
Received on Sat Oct 15 04:37:51 2005