Tag: JavaScript

Fundamentals: Hashing

November 20, 2011 » Geek

What the hash?

This post is about a fundamental programming concept, hashing.

“A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys.”

Hash function
Wikipedia – 2011/11/15

So, a hashing function can turn this:

Hello, my name is John Hobbs. I like hashes.

into this:

8ced81aa42cd91930024054f26ed92ea

Doesn’t seem super useful, does it? The content looses its semantic value. 8ced81aa42cd91930024054f26ed92ea is not the same as a personal introduction.

That’s okay though, we can find a use for a hash that won’t be seen by a human.

But first, let’s try our hand at writing a hash function.

Mapping to keys

So, the principal function of a hashing algorithm is to reduce a large block of data into a smaller block of data.

Let’s set up some arbitrary limits for developing our function.

For simplicity, let’s work at hashing values into a one byte integer, so eight bits.

Let’s also focus on only hashing ASCII strings, since ASCII characters are conveniently one byte long.

Now that that is established, let me propose a hashing function in pseudo-code:

For those of you not familiar with XOR, I mean exclusive or. This means comparing the bytes bit by bit. If the bits are the same, the result is zero. If they are different, the result is one. The common notation for XOR is ^. Here’s an example of 5 ^ 117:

Working at a byte by byte level like this ensures that our hash will not exceed 8 bits in length, or an integer value of 128.

Buckets

With our shiny little hash function in mind, let’s use find a use.

One great use is lookup tables. These are sometimes called dictionaries, associated arrays, hash tables, etc.

Basically, they map a key to a value. So you tell a table that “my name” is “john”, and then later, if you ask it for “my name” is tells you “john”. Great tool for amnesiacs.

Hashing is perfect here for faster lookups, and for giving bound memory performance.

Think of it as a finite line of buckets. When you want to store something you hand it to the walrus, and say “Please keep this fish in a bucket called ‘Salmon’.”

Now, the walrus only has so many buckets, and none of them are labeled “Salmon”.

So, he uses a special walrus method (a hashing function) that deterministically says “Salmon == 5”, and puts the fish in the fifth bucket.

Later, when you come back and ask for the fish from the “Salmon” bucket, he knows to get the fish out of the fifth bucket because he can derive that “Salmon == 5” using the same method as before.

Unfortunately, he ate the fish already, so there is nothing to give you

Walrus are selfish that way.

Implementation

Now that we have a use case, let’s implement our hashing function.

I chose JavaScript here.

Let’s give it a try:

Cool! Now there is just a little more glue to make this complete. We have to have a get and a set for values, and an array to store them in.

The get and set functions are pretty similar. You hash the key, and use that integer as the index to either set, or retrieve a value in the table array.

Let’s give it a try!

Perfect!

Collision course

Now let’s use our SimpleHashTable to store some stuff!

Wait, what the heck just happened?!

It shouldn’t have printed the value for q, we asked for salutations!

This is known as a collision, and it happens.

What happened is that the hash value we compute for q is the same value as the hash for salutations. It’s bound to happen, we only have 128 possible buckets and an infinite amount of possible keys.

So we have several ways to improve this situation.

One option is more buckets.

We can also improve the hashing function, but this only buys us fewer, more predictable collision rates.

Another common fix is a second level to each bucket keeping the exact key, value tuple, and doing a scan on collisions.

That would be like the walrus putting a sticky note that says “Salmon” on the fish before tossing it into the bucket. Then, when you ask for the “Salmon”, he rifles through the bucket, checking each fish until he finds one labeled “Salmon”.

I’ll leave these as an exercise for the reader, because it’s time we got moving on.

Real hashes

Okay, so now you have an example of a simple hash you could use for table lookups. That’s cool and everything, but in general we should leave details like that to the neckbeards. It’s good to know roughly how it works, but we don’t need to do it ourselves.

What else are hashes good for? Well, there is a specie of hash called a cryptographic hash. These hashes are cryptographic primitives, with strong promises of distribution and collision avoidance.

Examples you may have come across are MD5, or the SHA family of algorithms.

Implementation

These algorithms are considerably more complex than the hash function we created earlier, so I would urge you not to write your own except as an exercise.

Implementations are available in a variety of languages, so Google wisely.

Uses

This class of hash has several every day uses that apply to developer-types.

Data Integrity

Since hashes are deterministic functions, the output can be used as a fingerprint for the input. Passing data and the hash of that data allows for others to quickly check and make sure that the data has not become corrupted or tampered with in transit.

This assumes a secure channel for the hash, so use it wisely.

If you are on a Linux machine, you can quickly get the fingerprint for a file. In this example I’ll use the sha1 algorithm to verify the Redis 2.4.2 source code.

Comparing that hash to the Redis website, it looks like my download completed without errors. Huzzah!

Secret Verification

Another common use of hashing is to verify secrets, without storing the secret in the clear.

Say I was writing a system login, and I needed to use usernames and passwords.

I know it’s bad to store passwords in plaintext on my server. What if my server gets hacked and the user table is stolen?

I can’t store my passwords in the clear, but I need to reliably verify them.

Hashes make for a great solution to this problem.

You hash the password and store it in your database. Then, when a user wants to login, you hash the password they provide and compare it to the one in the database.

If they match, you log them in.

Doing this properly is a bit more complicated than that, so if you need to work with passwords like this, I recommend you read up.

Data Authenticity

We can also use hashes to confirm the authenticity of a message.

Say we both know a secret phrase, and no one else does.

If you want to send a message to me, and want it to be provably from you, you can send me the message along with a hash of the message concatenated to the secret.

Then, when I get the message, I also take the hash of the message plus the secret and compare it to your provided hash.

If they match, then I know it’s from you.

This is encapsulated in something called a Hashed Message Authentication Code, or HMAC.

File De-duplication

One final case that I regularly use is file de-duplication.

If you host uploads for a website, it’s a nice trick to store upload meta data (user, original name, upload time, etc.) separate from the file itself.

By de-coupling these you can hash files on upload, and if you already have a copy of that content, discard it and point the meta data at the existing file.

Here is an example in pseudo-code:

Conclusion

Hashes are useful tools for programmers, so learn the concepts and the applications.

I hope you learned something from this post! How do you use hashes?

iCloud Shimmer Effect

October 12, 2011 » Geek

Today Alex pointed out the new iCloud website had lot’s of fancy effects. One I liked best was the polished metal effect on the login box that shimmered when you moved your mouse.

I went ahead duplicated it as best I could in a short time. There are some obvious differences in approach, but it’s essentially the same.

Shimmery Effect
View The Demo

One thing I did not do was the easing on the mouse move. I really like that, but it would be time consuming to get it running.

Also, I’m not browser compatible. I only tested it in Chrome 14.

Most of the work is done in two functions.

mousemove takes the mouse position and converts it to a degree of rotation.

draw rotates the canvas and draws the image onto it.

That’s essentially it. Simple, but visually powerful. The source is embedded in the demo and commented.

A Simple JavaScript Hooks System

August 19, 2010 » Geek

I was looking to add some more extensibility to a project this week and I couldn’t find a hook system for JavaScript. I wanted something similar to the PHP hook system in MediaWiki, but Google just wasn’t much help.

I’m sure there is something out there that does what I need, but it’s such a simple thing I went ahead and implemented it.

hook.js

Extensions can “get in line” for a hook by calling register with the hook name and callback.

Core code (or even other extensions actually) can call hooks by using the call method, with name and an argument array (think argv). If a hook returns anything other than true, processing of the hook ceases.

To do useful things you have to set up the right arguments. Since objects are passed by reference in JavaScript, you can manipulate anything in the argument array from inside of your hook (or even add to the array if you want).

Obviously this is a simplified tool. All code is implicitly trusted, argument specification is non-existent, there is no prioritization (except for insertion order) and hooks are not guaranteed to run. But it works!

You can check out some basic usage code and test it out here.

Suggestions are welcome!

PanoramAh: A jQuery Panorama Viewer

August 18, 2010 » Geek

Update (2010-08-25)

New version that supports multiple panorama’s on a page, check it out here.

As I have recently made some panoramas with Hugin I needed a solution for displaying them on the web. I searched around a bit and found this one from Gaya Designs.

I liked the clever technique for scrolling linked to the mouse via the background-position, but I didn’t care for the Prototype dependency or the rather large and feature-full script.

Taking that into consideration I whipped up my own viewer based on the same principles that used jQuery.

The Markup

The markup is dead simple. A class to identify the container, a little loading message and an img tag to preload into. I used the rel attribute on the container to carry the image URL and the width of the image. Height of the container should be set to the height of the panorama, width can be anything you like.

The JavaScript

Again, pretty super easy. I’ll let the code speak for itself.

The Demo

It works for me in Firefox and Chrome. YMMV.

Check it out!

Charting Weight Change With Google Visualizations

July 5, 2010 » Consume, Geek, Life

I started trying to lose weight a while back, since we both know I’m a bit heavy and sitting in front of a computer isn’t going to lose the weight for me.

Naturally, it’s important that I incorporate technology into my weight loss somehow, right? So I decided to give the Google Visualizations API a spin.

I worked up a quick data format and a method to pop the data out. Nothing fancy, just a fixed width flat file. This doesn’t deserve a database.

Easy to read, easy to edit, and easy to consume. Every morning I just hop on the server, add the day’s weight and log off.

Now I just needed to represent it. The API is very object oriented and easy to work with. I wish there was a less verbose way of presenting the data, but you can’t have everything.

Actually, there may be a better way, I just didn’t come across it while speed reading the docs.

And there you have it, fancy charting in no time.

Example Chart

See it in action at http://static.velvetcache.org/weight.php

Get the full source at http://gist.github.com/459148.