Re: SSN encryption
Available news archives: comp.lang.tcl - comp.lang.python - comp.security.firewalls - sci.crypt - comp.lang.php - comp.lang.javascript
Google
 
Web news.hping.org


sci.crypt archive

Re: SSN encryption

From: Joseph Ashwood <ashwood@msn.com>
Date: Fri Sep 30 2005 - 02:06:15 CEST

"drfremove@nber.org" <feenberg@gmail.com> wrote in message
news:1128033037.054569.253290@f14g2000cwb.googlegroups.com...
> The dataset does include state of residence, which is correlated with
> state of issuance of SSN. However, if I am looking for a particular
> person, knowing that they are from California allows me to exclude 80%
> of the records. Even if that knowledge also allows me to guess at part
> of the SSN, once I have done so it really only allows me to exclude the
> same 80%, so SSN structure isn't a source of weakness unless it helps
> the intruder solve for the remaining digits. With the right encryption
> technology, it shouldn't have that effect.

You're overestimating the ability of encryption to solve the problem. But
I'll cover that more later.

>> The requirement of a 1-1 mapping leads to low levels of protection.

> Let me make it clear that the threat model is not that someone would
> take a record and solve for the SSN that goes with it, but that they
> would locate the record in the database that had a particular SSN of
> interest to them (an ex-spouse, for instance). With 100 million SSNs in
> the database, that may be much harder (again, depending on the method).

Actually it's even easier. Given any unkeyed 1-1 mapping this is simply a
database lookup. Using a keyed 1-1 mapping gets you some, but not much, the
only solution is the one that Tom gave, it is not perfect, and actually
violates your original statement, but it is the only secure way to do it.

> I understand that for a 9 character message, there is no problem for a
> computer to run through all possible messages, but this isn't the crux
> of the problem, which would be to identify which solution was the
> correct one.

Actually, that is the problem. If you have a 1-1 mapping where each output
is unique, there exists an onto mapping, and with only 2^30 of them even the
smallest hard drive on the market could store all of them making it actually
faster to break than compute in the first place.

> With some methods of encryption, it would be possible to
> tell immediately that the correct key had been found - if the decrypt
> contained only digits that would be strong evidence the key had been
> found if invalid keys would decrypt to random bits which would rarely
> be valid SSNs.

You seem to have a high opinion of encryption, unfortunately the limits of
encryption are far below where you put them. What you have is an unsolvable
problem, if you change a requirement (either security is not required or 1-n
is acceptable) there are solutions, without changing requirements the onto
hole will break your security.

> With another method every key, correct or not, might
> decrypt to a string of digits valid as an SSN.

Such systems would rely on compressing the plaintext first, or something
that is provably equivalent (see Unicity Distance).

> In that case it would be
> more difficult to tell if the correct key had been found.

Actually it wouldn't. Your original statement of the problem is that
encrypting the same data will lead to the same result. In the example given
you have an attacker who knows the information being encrypted, the
ciphertext and key are unknown. The odds of a collision in
plaintext:(ciphertext*) for even 4.2 billion (2^32) entries with a 128-bit
cipher are 1/2^96, so the key will be found. Again see Unicity Distance.

> The intruder
> could look at the content of the record for clues, but by assumption
> the intruder doesn't know enough of the record content to use that to
> identify the individual.

Earlier you said "ex-spouse" with known SSN, I should hope someone knows
enough about their ex-spouse to fill in the rest of the information. But if
the extra information is unique, use it in the hash, that will deliver the
security be eliminating one of the limiting assumptions.

>That is why I resist just hashing to a much
> longer string.

You won't find encryption that delivers better, going with 64-bit blocks
(e.g. 3DES) you have severe limiting factors on your database size before
hitting collisions (don't think you'll reach them with an SSN index, but
they are worth considering), also 64-bit blocks should generally not be
considered for new systems because of various limitations. 128-bit blocks
are the same size as the MD5 hash I suggested (although if you want it to
actually be secure I'd suggest a different hash, perhaps SHA-256, or
Whilrpool).

>
>> that for security you MUST have a random component. Unfortunately, this
>> violates the aforementioned requirement.
>>
>> Your best bet would be to use a hash function (e.g. MD5 chosen because
>> it's
>> fast and the security levels here cannot be high anyway). Your other
>> option
>
> The intruder would take the known SSN, hash it using any MD5 program,
> and search the database for the matching hash value. There is no key to
> solve for - the equivalent of ROT13 security.

Exactly the problem you will have unless you can add randomness to the
system. Without a random component your key is not random either, you have
in effect created an encryption oracle in the database add area, a
knowledgable attacker can simply use that to generate the colliding
information regardless of the system used. It is absolutely critical that
you eliminate one of your requirements or the problem is completely
intractable.

> This is much weaker than
> necessary.

Exactly my point, your problem as given is intractable, in my previous
example I chose to sacrifice the security requirement simply because with
the other requirements the security requirement appeared the softest.

> For example, we could add (without carry) a constant K to
> every SSN. Then the intruder would need to have alternate means of
> identifying at least one record before he could solve for K and with
> that search the database for the target record. That would be the ROTN
> level of security. (Admittedly he could probably solve for the first
> few digits from knowledge of the SSN structure - that isn't the point).
> I expect it is possible to do much better. For example, if I make a
> separate substitution map for each character position, I need 9 maps,
> each with 10 elements. An intruder with knowledge of the true SSN for
> as few as 9 records could solve for all the maps, still a substantial
> improvement over ROTN, which beats ROT13. I'd like to do much better
> than that.

Then you'll have to sacrifice one of the other requirements. In the mean
time it still won't beat the MD5 suggestion I made, you'll be less secure,
and it will likely be slower. It is necessary to sacrifice one of the
requirements, pick one, any one.

> If we are going make the comparisons to prior entries, then we can
> assign serial numbers to individuals as they are encountered and no
> encryption technology is required. However the comparison to the entire
> database is expensive and something we were hoping to avoid with
> appropriate encryption technology.

"Hoping to avoid" sounds like a soft requirement to me, maybe it should be
on the short list of ones to sacrifice.
                Joe
Received on Sat Oct 15 04:37:53 2005