Category: Geek

Homoglyph Substitution for URLs

August 29, 2014 » Geek

At Pack we use ascii-based unique identifiers in URLs a lot. We call them slugs. Dogs have them, users have them, breeds have them, etc.

I made the decision early on to keep the slugs plain old ascii. No unicode. These are primarily for URLs, and I wanted them easy to type. Most slugs in the system are automatically generated, derived from names when a dog or user is created. This is a problem, because a lot of people in the world use characters outside the ascii set.

Usually, the solution is just to drop non-ascii characters. This is the simplest option, and it works. Designer News uses this technique, for example. In the case of John Henry Müller, they simply drop the ü, so Müller becomes mller. I find this less than optimal.

A second technique is homoglyph substitution. A homoglyph is a character which is visually similar to another, to the point that the two are difficult to distinguish at a glance. I’m familiar with them from the world of phishing, where people register domains that look very similar to other domains by using homoglyphs.

Once you build a list of homoglyphs, it’s easy to create ascii-only slugs through substitution. We expanded the definition of homoglyph for our list to include anything you could squint at and call similar. The method is a bit brute force, but it only ever runs once per string, and I think the outcome is worth it.
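In Python, the whole substitution-then-slugify pipeline can be sketched in a few lines. The table here is a tiny illustrative excerpt, not our actual list, and slugify is a hypothetical name:

```python
import re

# A tiny excerpt of a homoglyph table; the real list is much longer,
# and these particular mappings are illustrative.
HOMOGLYPHS = {
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
    "ð": "o", "þ": "th", "ü": "u", "ö": "o", "ñ": "n",
    "Á": "a", "É": "e", "Í": "i", "Ó": "o", "Ú": "u",
}

def slugify(name: str) -> str:
    # Substitute known homoglyphs, then drop anything still non-ascii.
    substituted = "".join(HOMOGLYPHS.get(ch, ch) for ch in name)
    ascii_only = substituted.encode("ascii", "ignore").decode("ascii")
    # Lowercase, collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")

# slugify("Hólmfríður frá Ólafsfjordur") → "holmfriour-fra-olafsfjordur"
```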

This works well for us: we get reasonable URLs for dogs like “Hólmfríður frá Ólafsfjordur”. holmfriour-fra-olafsfjordur is not the same, but it’s close enough for a URL that you don’t mind, and it’s much better than hlmfrur-fr-lafsfjordur.

Hólmfríður frá Ólafsfjordur

Unfortunately, this doesn’t work well for un-romanized languages, notably Asian languages, such as “クッキー”. In that case the system breaks down and we end up with no usable slug, so we build one from a default. I’m still seeking a solution for that. Maybe I should run it through automatic translation.

Custom Mailbox Betacoins

August 20, 2014 » Geek

Yesterday, Mailbox released their beta Mac app. One cute thing they did: instead of a beta link or code, they distributed little animated gif coins, which you could drop into a “tin can” in the app to gain access.

A betacoin

I was intrigued by the concept, so I got some used betacoins from my friends and did a little digging to figure out how they were doing it.

My plan was to diff the coins and see what was changed from coin to coin, but I didn’t even need to do that. A quick inspect with gifsicle revealed an obvious token in the gif comments extension block.
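For the curious, comment extensions are easy to pull out by hand. This is a naive scan in Python, not a full GIF parser (a stray 0x21 0xFE pair inside image data would fool it); gifsicle does it properly:

```python
def gif_comments(data: bytes) -> list:
    """Scan a GIF's bytes for comment extension blocks (0x21 0xFE)."""
    comments = []
    i = 0
    while True:
        i = data.find(b"\x21\xfe", i)
        if i == -1:
            break
        i += 2
        chunks = []
        # Comment data is a series of sub-blocks: a length byte, then
        # that many bytes of data, terminated by a zero-length block.
        while i < len(data) and data[i] != 0:
            size = data[i]
            chunks.append(data[i + 1 : i + 1 + size])
            i += 1 + size
        comments.append(b"".join(chunks))
        i += 1
    return comments
```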

From there I checked a couple other coins to see if they had differing comments, and sure enough they did.

So now the question became, could I add the comment from a valid betacoin to another gif and have it still work?

I grabbed a lovely gif of a barfing unicorn off the web, and set to work.
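Splicing a comment into another gif is just as simple. A rough sketch, assuming the file ends with the standard 0x3B trailer byte (the function name is mine, and the token itself is whatever Mailbox put in a real coin):

```python
def add_comment(gif: bytes, comment: bytes) -> bytes:
    """Append a comment extension block just before the GIF trailer (0x3B)."""
    assert gif[-1:] == b"\x3b", "expected a GIF trailer byte"
    # Sub-blocks hold at most 255 bytes each; a short token fits in one.
    blocks = b""
    for i in range(0, len(comment), 255):
        chunk = comment[i : i + 255]
        blocks += bytes([len(chunk)]) + chunk
    extension = b"\x21\xfe" + blocks + b"\x00"
    return gif[:-1] + extension + b"\x3b"
```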

I then downloaded the beta, crossed my fingers, and dragged the unicoin into the tin can. I was rewarded with a tinkle of a coin dropping in, and access to the beta.

This is a valid betacoin.

Turns out, Mailbox couldn’t care less what else is in your gif. As long as it has a comment with a valid token, the app will use that gif and animate it prettily.

As an aside, the coin gif has a staggering 122 frames. 122. Sparkles are expensive, yo.

Edit (2014-08-20)

I created a service for changing up your Mailbox betacoins, called Unicoin. You’re welcome.

Second Edit (2014-08-20)



Building Pandemonium

May 18, 2014 » Geek


Every year, What Cheer creates something fun for Big Omaha.

Previous years have been very interactive, requiring direct participation: a seek-and-find game, a conference-only chat tool, etc. These have been fun, but interaction with the project was sporadic, not ubiquitous. This year we decided to build something that everyone would participate in simply by being in the audience. Alex had the excellent idea of tracking the loudness of the auditorium over time, and we decided to monitor Twitter as well.


To measure sound levels in the auditorium (hangar? main stage?) we would obviously need some hardware on site. We chose a Raspberry Pi for simplicity, and because we already understood it. I initially experimented with using an electret microphone and GPIO, but as time ran out I went simpler and ordered a USB audio interface to plug in.

Before the event, Paul and I went to KANEKO to set things up. The helpful guy who was setting up the network gave us a hard line so we wouldn’t have to deal with wifi traffic. We ran the mic up the wall, plugged it in, and watched the data flow. A pretty smooth install.

Raspberry Pi taped to the floorboards.

Our little mic on the wall.


The architecture of Pandemonium is perhaps a bit overcomplicated, but I was having fun gluing things together, and who’s gonna stop me?

Pandemonium Architecture


Audio starts at the input, which we read with PyAudio. We read 10ms of audio, then calculate the RMS amplitude of that data to produce our “loudness” value.
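For reference, RMS amplitude over a buffer of signed 16-bit samples (the format PyAudio’s paInt16 delivers) looks roughly like this; a sketch of the idea, not our exact code:

```python
import math
import struct

def rms_amplitude(frames: bytes) -> float:
    """RMS amplitude of a buffer of signed 16-bit little-endian samples."""
    count = len(frames) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frames[: count * 2])
    # Root of the mean of the squared sample values.
    return math.sqrt(sum(s * s for s in samples) / count)
```

At 44.1kHz, 10ms of mono audio is 441 samples, so each call is cheap.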

Each loudness value gets pushed, with a timestamp, into a queue shared with the UDP client process. That process collects 50 samples (0.5 seconds), takes the peak value, wraps it with a signature, and sends it off. The signature is an abbreviated HMAC used to verify the origin and integrity of the data. Originally we sent every sample collected, 100 per second. We decided that was a bit extreme and added the summarization code to reduce it to twice per second.
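A rough sketch of the summarize-sign-send step; the wire format, field layout, and key here are illustrative guesses, not our actual packet:

```python
import hashlib
import hmac
import struct
import time

SECRET = b"shared-secret"  # placeholder; the real key is private

def make_packet(samples, key=SECRET):
    """Take the peak of a batch of loudness samples and wrap it with a
    timestamp and a truncated HMAC so the server can verify origin."""
    payload = struct.pack("<dI", time.time(), int(max(samples)))
    sig = hmac.new(key, payload, hashlib.sha256).digest()[:8]  # abbreviated
    return payload + sig

def verify_packet(packet, key=SECRET):
    """Return (timestamp, peak) if the signature checks out, else None."""
    payload, sig = packet[:-8], packet[-8:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(sig, expected):
        return None
    return struct.unpack("<dI", payload)
```

The truncation trades some collision resistance for smaller packets, which is a fine trade for half-second loudness samples.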

The UDP server receives the packet, unpacks it, and checks the signature. If it’s valid, it stores the sample in MySQL (asynchronously) and also pushes it to a Redis pubsub channel.

From there a node.js server picks it off the Redis pubsub channel and sends it down to waiting clients. Even with all these hops, the roundtrip is pretty snappy, with less than a second of apparent lag.

On the client side we had a digital VU-style meter which scaled the volume over its seven bars and lit up accordingly. We also pushed the data to a live graph powered by HighCharts.
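The bar scaling amounts to something along these lines, though the real meter’s scaling curve may differ:

```python
def lit_bars(loudness, peak, bars=7):
    """Map a loudness value onto a seven-bar VU-style meter.

    Linear scaling against the observed peak; clamped so a value
    above the peak never lights more bars than exist.
    """
    if peak <= 0:
        return 0
    return min(bars, int(round(bars * loudness / peak)))
```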

Pandemonium VU-Style Meter


Tweets were collected for the hashtag #bigomaha and stored directly into MySQL by a daemon using the Twython library.

A second process would aggregate and average the tweets per second, then push that data to a Redis pubsub channel to be distributed by the node.js bridge.

Since there isn’t a natural comparative value for tweets, the aggregator keeps the peak value in memory and compares the current value against it for a percentage. Not perfect, but it works.
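The peak-relative scaling amounts to this (a sketch; the names are mine):

```python
class TweetRateScaler:
    """Scale tweets-per-second against the running peak, since there's
    no natural maximum to compare a tweet rate against."""

    def __init__(self):
        self.peak = 0

    def percent(self, tweets_per_second):
        # The current value can itself become the new peak,
        # so the busiest second observed so far always reads 100%.
        self.peak = max(self.peak, tweets_per_second)
        if self.peak == 0:
            return 0.0
        return 100.0 * tweets_per_second / self.peak
```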

Mistakes Were Made

Everything performed better than I expected, honestly. We didn’t have the opportunity to test the audio sampling at a large, loud venue, so I was worried about that. Paul and I installed it in the back of the auditorium, just past a speaker, and put the mic as high up the wall as we could, which seemed to isolate it pretty well.

However, there were some problems. Due to a fat finger, none of the audio data from day one was saved until about 3pm. So that was a bummer. A quick fix gave us good data for day two, though.

My second goof was that the MySQL library I used for storing tweets assumed the data was latin-1, even though I had created my tables as utf-8. So when people tweeted anything with odd characters, the database barfed and dropped the tweets. That also got fixed on the afternoon of day one.
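You can reproduce the mismatch in a couple of lines of Python. Depending on the driver and table settings you get mojibake like this or an outright insert error; the fix on our end was telling the connection the data is utf-8 (e.g. the charset argument when opening the MySQLdb connection):

```python
# A utf-8 encoded name, read back by a connection that assumes latin-1:
tweet = "Müller"
sent = tweet.encode("utf-8")        # b'M\xc3\xbcller' on the wire
garbled = sent.decode("latin-1")    # what the latin-1 side thinks it got
# garbled is now "MÃ¼ller" -- two latin-1 characters where ü should be
```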


I think it was a neat project, and I certainly had fun building it. It worked, which is always what we’re aiming for, and it didn’t require any direct interaction from attendees to succeed; it survived on its own. I wish I hadn’t made mistakes, but they weren’t too damaging to the real-time experience at Big Omaha.

Day one data.


November 12, 2013 » Consume, Geek

Update (2014-02-10): I forked an existing mini-campfire chrome extension and added some tweaks of my own. That is what I use now. You can get it here;

I use Campfire in my browser, and I often run multiple rooms side by side. This gets crowded in each browser window, so I made a bookmarklet to easily trim out the extra UI. It drops the sidebar and header, and widens the remaining elements. You lose some functionality, but I find I don’t really need it. A refresh restores everything to normal.


You can add this to your bookmarks bar by dragging this link to the bar: ThinFire

Building the Chicken Cam

October 8, 2013 » Geek

Yesterday I published a Chicken Cam for Buttercup farm, basically just a webcam for the 21 chicks we got a few weeks ago.


Oh, hey.

I decided to do this project on a whim. I had a Raspberry Pi lying around unused, and I figured it would be simple to get running. It was a simple project, but I hit a few constraints which made it take longer than expected.


One of the first problems I came across was bandwidth. I live 10 minutes from Omaha, and 10 from Blair. We do not have blazing internet out here. I have ADSL which is spotty when the weather is bad. I can’t afford to dedicate too much of my upstream to a chicken cam.

My first thought was to take a photo at an interval and push it out to S3. That would save bytes, since it would push the cost of serving more than one viewer off of my link. The problem with that is that I had no simple mechanism to tell the camera to start and stop sending images. It would always be on, and always consuming bandwidth.

My second thought was a proxy-type system, and that’s what I ended up using. I wrote a quick node app with a background function that requests a new image from the camera on an interval. It stores the JPEG in a buffer and sleeps. When it wakes up, it checks a timestamp to see if anyone has recently requested a frame. If they have, we loop again; otherwise we wait a while and check again.

Update Loop

This way we serve images at a decent rate, and don’t use bandwidth when no one is watching.
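The actual app is node.js, but the logic is small enough to sketch in Python. The names and timing constants here are illustrative, not the real code:

```python
import time

FRESH_WINDOW = 10.0   # seconds: keep refreshing this long after a request
POLL_DELAY = 1.0      # how often the background loop wakes up

class FrameCache:
    """Self-updating image cache: a background loop refreshes the frame
    only while someone has recently asked for one."""

    def __init__(self, fetch_frame):
        self.fetch_frame = fetch_frame   # callable returning JPEG bytes
        self.frame = b""
        self.last_request = 0.0

    def get(self):
        """Called per client request: serve whatever is in the buffer."""
        self.last_request = time.time()
        return self.frame

    def tick(self):
        """One pass of the background loop: refresh only if watched."""
        if time.time() - self.last_request < FRESH_WINDOW:
            self.frame = self.fetch_frame()
            return True     # fetched a fresh frame from the camera
        return False        # no recent viewers; save the bandwidth
```

In the real app, `tick` runs on a `POLL_DELAY` interval in the background while the HTTP handler just calls `get`.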

I put this proxy up on Heroku and it’s been humming along just fine.


The Raspberry Pi

The RPi has decent specs, and it’s not a wimpy machine in an embedded context, but I figured, why push it? I wanted a really lightweight way to serve up the images.

I initially looked at motion, but it was way too feature-rich and heavy. Likewise, I ruled out ffmpeg because I wanted stills on demand, not an MJPEG stream.

Luckily, I eventually found tinycamd, a little C program which worked well after a few tweaks.

I had to compile this myself on the RPi since it’s not in the Debian repos. Easy enough. I started with the CNXSoft minimal image and installed build-essential and libjpeg-dev.

Let that run for a bit and then you can build the program. It’s a very simple program, with a very simple build process. No autotools, just type “make”.

One change I had to make to get it to compile was turning off -Werror, since the build was dying on a “variable set but not used” warning, which is safe to ignore.

Removing -Werror from the CFLAGS lets it build, which it does pretty quickly.

The Webcam

The next hurdle I encountered might be specific to my hardware. I’m running a Logitech UVC webcam I had lying around. It claims it can stream JPEGs straight from the hardware, and it claims you can set the JPEG compression rate, but it was dying during the ioctl for setting that compression level: VIDIOC_G_JPEGCOMP error 25, Inappropriate ioctl for device.

Rather than fighting it further, I commented out that chunk of tinycamd and ran it with the YUYV format, moving JPEG compression off the camera and into libjpeg. This consumes more resources, but it was the quickest workaround for me.

With all that done I installed it as a daemon (it ships with an init script) and I was good to go.

Running, it stays under 30% CPU and 3% of memory, and that is with YUYV conversion and 40% compression on the JPEG frames. Pretty good.

The Bunker

Got Concrete?

The final hurdle was our garage, where the chicks are kept. We have a very unusual garage, consisting of a concrete silo with very thick concrete walls. It’s also the furthest point from the router in the house. Wifi is slow and spotty out there, so I bought a bridge and ran 50′ of Ethernet cabling from the garage into the house where the bridge was set up. This decreased latency enough to make the camera viable.

The Ethernet Bridge

The End Result

It took more effort than I thought it would, and a few days of waiting for hardware to ship, but I think it was worth it. I intend to try and keep the camera running as they grow up and move outside, which will involve running even more cabling and probably a power line. We’ll see if I stick to it or not.

If you want to run tinycamd on your own Pi, I’ve included my build with an install script. This is the version with device JPEG compression disabled, so be aware of that if you decide to stream JPEG instead of YUYV. tinycamd-rpi-debian.tar.gz

You can also download my patch file if you want to build it yourself.

Update (2013-10-09): I published the proxy/cache code on github;

Update (2013-10-09)

In the comments Matt asked for clarification on the configuration, so I thought I would put that up here.

There are two services running: tinycamd on the RPi, and my node app on Heroku. Both are HTTP servers. I’ve port-forwarded 8080 through my router to the RPi (shown on the diagram), which means the tinycamd HTTP server on the RPi is available on the public internet if you know my WAN IP (direct chicken access!).

The Heroku app is configured with that WAN IP, so it knows where to make an HTTP request for a new JPEG. To simplify configuration, I actually have it pointing to a dynamic DNS host, which is kept up to date by software on my router.

The Network Configuration

So when you make a request to Heroku for an image, Heroku doesn’t forward that request to tinycamd; it just serves you whatever image is currently in memory. In that sense it’s not really a proxy, which is a bad term for it. It’s more of a self-updating cache, because it fetches frames on its own schedule, not in response to incoming requests.

Matt made a good point that WebSockets would be a better control mechanism. I agree, but this system doesn’t have hard real-time constraints, and I’m fine with leaking a few KB on unused frames. Polling is gross, but sometimes it’s the simple option. S3 is no longer involved in serving frames; that was my first approach, which I abandoned.

Update (2013-10-10): Added a version for giggles.