Tag: Hashing

Hashes Are Not *$&%@! Magic

September 27, 2012 » Geek

I’m going to get on a programming soapbox real quick and cover a topic that seems to confuse some people.

Hashes Are Not *$&%@! Magic

Some people seem to think that swapping out a secret with a hashed version of that secret makes it all safe and cozy, but that’s simply not true.

Yes, cryptographic hashes are a very important part of digital security, for a number of good reasons, but they have to be applied in a manner which takes the whole system into account.

The impetus for this work was a login integration I recently updated, because some other developer foolishly applied hashes.

Essentially, we were cross-posting a login form on one website to another. Nothing fancy. Ignore the lack of CSRF control.

The New Form

But the new form would need a change. Instead of sending the username and password, we would send the username, and an MD5 hash of the concatenation of username and password.

Now, I’m sure when this idea was implemented, it was sold as a way to authenticate the user, without exposing their password in plaintext (note that they don’t use SSL). Brilliant!

Yes, it does obscure the plaintext password, but it is not any more secure.

You see, they didn’t think about the system as a whole, they were just focused on obscuring the password.

All that happened here is a substitution of shared secrets.

Previously the server compared the username and password it has on file to what was sent in. Now it compares the username and the hashed password to what it has on file. Do you see what we did? We’ve simply swapped the secret of the plaintext password for the secret of the hashed password. I can still intercept your form submission over the wire and steal your credentials.

I don’t have to prove I know the password, I have to prove I know the secret.

Zero gain, and you’ve added complexity.

MD5, lol

As a bonus, they picked MD5, probably because it’s been implemented many times, there is a JavaScript version readily available, and it tends to be one of the first hashes people learn about, due to it’s age.

But MD5 is weak. And we have the salt, if you can call it that, in the username. An old 2Ghz P4 can try about 20 Million hashes a second, and throwing a modern GPU at it you can test several billion hashes a second. If we want the plaintext password, we can get it unless it is reasonably large (7+ characters) and fairly complex (at least one non-alphanumeric character).

(╯°□°)╯︵ ┻━┻

For an extra thought, consider how they must be storing these passwords. Either there scheme has always been MD5(CONCAT(username,password)) or they are storing them in plaintext and are (hopefully) migrating to hashed.

Fundamentals: Hashing

November 20, 2011 » Geek

What the hash?

This post is about a fundamental programming concept, hashing.

“A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys.”

Hash function
Wikipedia – 2011/11/15

So, a hashing function can turn this:

Hello, my name is John Hobbs. I like hashes.

into this:

8ced81aa42cd91930024054f26ed92ea

Doesn’t seem super useful, does it? The content looses its semantic value. 8ced81aa42cd91930024054f26ed92ea is not the same as a personal introduction.

That’s okay though, we can find a use for a hash that won’t be seen by a human.

But first, let’s try our hand at writing a hash function.

Mapping to keys

So, the principal function of a hashing algorithm is to reduce a large block of data into a smaller block of data.

Let’s set up some arbitrary limits for developing our function.

For simplicity, let’s work at hashing values into a one byte integer, so eight bits.

Let’s also focus on only hashing ASCII strings, since ASCII characters are conveniently one byte long.

Now that that is established, let me propose a hashing function in pseudo-code:

For those of you not familiar with XOR, I mean exclusive or. This means comparing the bytes bit by bit. If the bits are the same, the result is zero. If they are different, the result is one. The common notation for XOR is ^. Here’s an example of 5 ^ 117:

Working at a byte by byte level like this ensures that our hash will not exceed 8 bits in length, or an integer value of 128.

Buckets

With our shiny little hash function in mind, let’s use find a use.

One great use is lookup tables. These are sometimes called dictionaries, associated arrays, hash tables, etc.

Basically, they map a key to a value. So you tell a table that “my name” is “john”, and then later, if you ask it for “my name” is tells you “john”. Great tool for amnesiacs.

Hashing is perfect here for faster lookups, and for giving bound memory performance.

Think of it as a finite line of buckets. When you want to store something you hand it to the walrus, and say “Please keep this fish in a bucket called ‘Salmon’.”

Now, the walrus only has so many buckets, and none of them are labeled “Salmon”.

So, he uses a special walrus method (a hashing function) that deterministically says “Salmon == 5”, and puts the fish in the fifth bucket.

Later, when you come back and ask for the fish from the “Salmon” bucket, he knows to get the fish out of the fifth bucket because he can derive that “Salmon == 5” using the same method as before.

Unfortunately, he ate the fish already, so there is nothing to give you

Walrus are selfish that way.

Implementation

Now that we have a use case, let’s implement our hashing function.

I chose JavaScript here.

Let’s give it a try:

Cool! Now there is just a little more glue to make this complete. We have to have a get and a set for values, and an array to store them in.

The get and set functions are pretty similar. You hash the key, and use that integer as the index to either set, or retrieve a value in the table array.

Let’s give it a try!

Perfect!

Collision course

Now let’s use our SimpleHashTable to store some stuff!

Wait, what the heck just happened?!

It shouldn’t have printed the value for q, we asked for salutations!

This is known as a collision, and it happens.

What happened is that the hash value we compute for q is the same value as the hash for salutations. It’s bound to happen, we only have 128 possible buckets and an infinite amount of possible keys.

So we have several ways to improve this situation.

One option is more buckets.

We can also improve the hashing function, but this only buys us fewer, more predictable collision rates.

Another common fix is a second level to each bucket keeping the exact key, value tuple, and doing a scan on collisions.

That would be like the walrus putting a sticky note that says “Salmon” on the fish before tossing it into the bucket. Then, when you ask for the “Salmon”, he rifles through the bucket, checking each fish until he finds one labeled “Salmon”.

I’ll leave these as an exercise for the reader, because it’s time we got moving on.

Real hashes

Okay, so now you have an example of a simple hash you could use for table lookups. That’s cool and everything, but in general we should leave details like that to the neckbeards. It’s good to know roughly how it works, but we don’t need to do it ourselves.

What else are hashes good for? Well, there is a specie of hash called a cryptographic hash. These hashes are cryptographic primitives, with strong promises of distribution and collision avoidance.

Examples you may have come across are MD5, or the SHA family of algorithms.

Implementation

These algorithms are considerably more complex than the hash function we created earlier, so I would urge you not to write your own except as an exercise.

Implementations are available in a variety of languages, so Google wisely.

Uses

This class of hash has several every day uses that apply to developer-types.

Data Integrity

Since hashes are deterministic functions, the output can be used as a fingerprint for the input. Passing data and the hash of that data allows for others to quickly check and make sure that the data has not become corrupted or tampered with in transit.

This assumes a secure channel for the hash, so use it wisely.

If you are on a Linux machine, you can quickly get the fingerprint for a file. In this example I’ll use the sha1 algorithm to verify the Redis 2.4.2 source code.

Comparing that hash to the Redis website, it looks like my download completed without errors. Huzzah!

Secret Verification

Another common use of hashing is to verify secrets, without storing the secret in the clear.

Say I was writing a system login, and I needed to use usernames and passwords.

I know it’s bad to store passwords in plaintext on my server. What if my server gets hacked and the user table is stolen?

I can’t store my passwords in the clear, but I need to reliably verify them.

Hashes make for a great solution to this problem.

You hash the password and store it in your database. Then, when a user wants to login, you hash the password they provide and compare it to the one in the database.

If they match, you log them in.

Doing this properly is a bit more complicated than that, so if you need to work with passwords like this, I recommend you read up.

Data Authenticity

We can also use hashes to confirm the authenticity of a message.

Say we both know a secret phrase, and no one else does.

If you want to send a message to me, and want it to be provably from you, you can send me the message along with a hash of the message concatenated to the secret.

Then, when I get the message, I also take the hash of the message plus the secret and compare it to your provided hash.

If they match, then I know it’s from you.

This is encapsulated in something called a Hashed Message Authentication Code, or HMAC.

File De-duplication

One final case that I regularly use is file de-duplication.

If you host uploads for a website, it’s a nice trick to store upload meta data (user, original name, upload time, etc.) separate from the file itself.

By de-coupling these you can hash files on upload, and if you already have a copy of that content, discard it and point the meta data at the existing file.

Here is an example in pseudo-code:

Conclusion

Hashes are useful tools for programmers, so learn the concepts and the applications.

I hope you learned something from this post! How do you use hashes?

Replacing Kohana 3 Auth module hashing

November 9, 2011 » Geek

The password hashing in the Auth module provided with Kohana 3.1 is not very good. By default it is a simple sha256 hmac with a global salt.

modules/auth/classes/kohana/auth.php

This isn’t strong. If you loose the hashes and the salt it’s just a matter of winding up a GPU.

So how can we fix this? Well, thanks to Kohana’s structure we can easily override the Auth class and tweak it. However, due to Auth’s structure, we can’t drop the global salt. The hash function has to stand alone, so no passing in salts from the database.

That leaves us with key stretching.

Now, I don’t want to deal with a custom key stretching implementation, I’m not a cryptographer. So, let’s find an existing algorithm.

One that pops to mind is PBKDF2. This is a pretty simple algorithm, so it was easy to find and spot check a PHP implementation

We just take some test vectors from RFC 3962 and run them against the code we found.

Run it, and everything checks out:

So now all that’s left is to drop it in, which is pretty simple. One thing to note is that I wanted this to stay compatible with the default auth config file, so I just extended that a little bit.

application/classes/auth.php

application/config/auth.php

One item to note is that I am packing these with base64_encode. This is to fit into the default field type for the ORM driver. That is also why my length is stunted to 45. If you really want to go all out, alter your table to use a TINYBLOB, up the length to 256 bit and up the rounds.

So that is how I replace weak hashing in K3 with something a bit better.

How do you do it?

Premature optimization is the root of all evil. But don’t be stupid.

August 2, 2010 » Geek

There is a relatively prevalent quote in the programming world, bandied about by programmers of all creeds.

“Premature optimization is the root of all evil.”

– Donald Knuth

I agree. What we intuit about optimization is usually wrong. And what’s more, until you hit a bottleneck, it’s often a waste of time. Moore has been good to us, and CPU bound problems aren’t as common as they once were.

That said, I think people should not cling to this. It’s dumb.

Today I was looking for a better way to compute sha1 file hashes in PHP. I know about sha1() obviously, but that’s not how it should be done for files.

Python provides the excellent hashlib that lets you update a hash with blocks of data. It’s excellent for file hashing, because you can read in data in chunks and update the hash as you go, thus avoiding reading the whole file into memory at once. Here, have a sample program that reads in 512 byte chunks.

So I quickly found the sha1_file() function. Perfect, I’m sure this reads in chunks, otherwise they would not have bothered to make the function.

Then I scroll down to check the user contributed notes for anything interesting. The top two notes are examples of using this snippet to get a sha1 hash of a file:

I was stunned. The definition of file_get_contents() is as follows:

“Reads entire file into a string.”

Surely they are not suggesting that I read the entire file into memory, as a PHP variable, and then pass it on to the hashing function? Who would think that is a good solution, when there is a perfectly good built in function to do this for you, in chunks, in a neater fashion?

My only explanation is ignorance, or utter lack of consideration. I gave it a test with these two PHP scripts.

The first version weighed in as using slightly more memory than second one. About 89k actually. And test.jpg is 88k.

Imagine if that picture was a few megs bigger. Imagine if I could somehow trigger this process on your website, over and over again with ab or something. It’s a DOS in the making.

Develop however you want to, just please, please don’t be stupid.