Tag: Work

Homoglyph Substitution for URLs

August 29, 2014 » Geek

At Pack we use ASCII-based unique identifiers in URLs a lot. We call them slugs. Dogs have them, users have them, breeds have them, etc.

I made the decision early on to keep the slugs plain old ASCII. No Unicode. These are primarily for URLs, and I wanted them easy to type. Most slugs in the system are automatically generated: they are derived from names when a dog or user is created. This is a problem, because there are a lot of people in the world who use characters outside of the ASCII set.

Usually, the solution is just to drop non-ASCII characters. This is the simplest option, and it works. For example, Designer News uses this technique. In the case of John Henry Müller, they simply drop the ü because of the umlaut, giving him the user URL of https://news.layervault.com/u/11655/john-henry-mller/. Müller becomes mller. I find this less than optimal.
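To make the "just drop it" approach concrete, here is a minimal Python 3 sketch. The function name `drop_non_ascii_slug` is made up for illustration, and this is only my guess at the general technique, not the code Designer News actually runs.

```python
def drop_non_ascii_slug(name):
    """Lowercase a name, drop every non-ASCII character, and hyphenate the words."""
    # encode(..., "ignore") silently discards anything outside the ASCII range.
    ascii_only = name.encode("ascii", "ignore").decode("ascii")
    return "-".join(ascii_only.lower().split())


print(drop_non_ascii_slug("John Henry Müller"))  # john-henry-mller
```

The ü simply vanishes, which is exactly the result I'm not happy with.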

A second technique is to use homoglyph substitution. A homoglyph is a character which is visually similar to another, to the point that they are difficult to quickly distinguish with just a glance. I’m familiar with them from the world of phishing, where people register domains that look very similar to other domains by using homoglyphs.

Once you build a list of homoglyphs, it’s easy to create ASCII-only slugs through substitution. We expanded the definition of homoglyph for our list to include any pair of characters you could squint at funny and think were similar. The method is a bit brute force, but it only ever runs once per string, and I think the outcome is worth it.

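# -*- coding: utf-8 -*-
# Python 2 code: it relies on the built-in unicode type.

# Each entry is (ASCII replacement, string of characters that map to it).
# Only the first and last entries are shown; the ones in between are omitted here.
UNICODE_ASCII_HOMOGLYPHS = (
    ('a', u'AaÀÁÂÃÄÅàáâãäåɑΑαаᎪAaĄĀāĂăąÀÁÂÃÄÅàáâãäå'),
    # ...
    ('z', u'ZzΖᏃZzŹźŻżŽž'),
)

def replace_homoglyphs(string):
    '''If a string is unicode, replace all of the unicode homoglyphs with ASCII equivalents.'''
    if isinstance(string, unicode):
        for ascii_char, homoglyphs in UNICODE_ASCII_HOMOGLYPHS:
            # Brute force: one replace() pass over the string per homoglyph.
            for homoglyph in homoglyphs:
                string = string.replace(homoglyph, ascii_char)
    return string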

This works well for us: we get reasonable URLs for dogs like “Hólmfríður frá Ólafsfjordur”. The slug holmfriour-fra-olafsfjordur is not the same, but it’s close enough for a URL that you don’t mind, and it’s better than hlmfrur-fr-lafsfjordur.
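If you are on Python 3, roughly the same idea can be sketched with `str.translate`, which does the substitution in a single pass. The table below is a tiny hypothetical stand-in for our real list (just enough to cover the example name), and `HOMOGLYPHS`, `TRANSLATION` and `slugify` are illustrative names, not what we run in production.

```python
import unicodedata

# Hypothetical, heavily abbreviated table; a real list would be much longer.
HOMOGLYPHS = {
    "a": "àáâãäå",
    "i": "ìíîï",
    "o": "òóôõöð",  # mapping ð to o matches the slug shown above
    "u": "ùúûü",
}

# Flatten into {code point of homoglyph: ASCII replacement} for str.translate.
TRANSLATION = {
    ord(glyph): ascii_char
    for ascii_char, glyphs in HOMOGLYPHS.items()
    for glyph in glyphs
}


def slugify(name):
    """Substitute homoglyphs, drop whatever is still non-ASCII, hyphenate."""
    # NFC keeps accented letters as single code points so the table can match them.
    lowered = unicodedata.normalize("NFC", name).lower()
    substituted = lowered.translate(TRANSLATION)
    ascii_only = substituted.encode("ascii", "ignore").decode("ascii")
    return "-".join(ascii_only.split())


print(slugify("Hólmfríður frá Ólafsfjordur"))  # holmfriour-fra-olafsfjordur
```

Lowercasing first means the table only needs lowercase variants, at the cost of not being a drop-in replacement for the function above.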


Unfortunately, this doesn’t work well for un-romanized languages, notably Asian languages, with names such as “クッキー”. In this case, the system breaks down and we end up with no usable slug, so we build one from a default. I’m still seeking a solution for that. Maybe I should use automatic translation on it.
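The fallback itself can be sketched like this in Python 3. How the default is actually built isn't described here, so the `prefix` argument and the random suffix are purely assumptions for illustration, as is the name `slug_or_default`.

```python
import re
import uuid


def slug_or_default(name, prefix="dog"):
    """Return an ASCII slug, or a generated default when nothing usable survives."""
    ascii_only = name.encode("ascii", "ignore").decode("ascii").lower()
    # Collapse every run of non-alphanumerics into one hyphen, then trim the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only).strip("-")
    if slug:
        return slug
    # Assumed fallback format: "<prefix>-<8 random hex characters>".
    return "{}-{}".format(prefix, uuid.uuid4().hex[:8])


print(slug_or_default("Rex"))       # rex
print(slug_or_default("クッキー"))  # something like dog-1a2b3c4d (random each run)
```

It keeps the URL valid, but the slug no longer says anything about the name, which is why I'd still like something better.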

Thursday Quote: Zach Holman

April 12, 2012 » Geek, Life

“Hours are bullshit. Worry about good work.”

– Zach Holman
How GitHub Works


Thursday Quote: Will Wilkinson

August 18, 2011 » Life

“I don’t want to maximise income. I want to maximise autonomy and time for unremunerative but satisfying creative work.”

– Will Wilkinson
Unemployment and jobs:
Work for post-materialists

Installing Scribe on OSX with Thrift 0.5.0

February 15, 2011 » Geek

Update (2011-02-24)

This also works on Ubuntu, with two little tweaks.

First, no need to install libevent from source, just do an apt-get install libevent-dev.

Second, after you install Scribe, you need to add Thrift to the shared library path so it will load.

Just add a new file called /etc/ld.so.conf.d/scribe.conf with this content:


Then run ldconfig and you should be good to go.

I looked for a way to install Facebook’s Scribe on OS X to test out some code I’m writing at work, but I could not find a process that worked for me.

The best guide I found was by @kpumuk, called Installing and Using Scribe with Ruby on Mac OS.

It got me close, but what I outline below got me the rest of the way. Hopefully it will help you (until it breaks too).

Install libevent (2.0.10)

You’ll need the development files for libevent, which you probably don’t have. Grab the latest stable package at http://monkey.org/~provos/libevent/. I used 2.0.10.

This one is easy, just configure and make install.

$ sudo ./configure
$ sudo make install

You might consider using --prefix=/opt/libevent on the configure to keep this libevent separate from any others that might get installed (via brew or ports). If so, be sure to change --with-libevent when compiling Thrift.

Install Thrift (0.5.0)

Now let’s install Thrift. Version 0.5.0 is the latest stable release, and it’s what I used: http://incubator.apache.org/thrift/.

Again, not a tough build, but you need to be sure that you set --with-libevent on configure, otherwise thriftnb won’t be built and you’ll have to do this compile again later when you get stuck in the Scribe build.

$ sudo ./configure --prefix=/opt/thrift --with-libevent=/usr/local/lib
$ sudo make install

Install FB303 (In Thrift source)

You also need Facebook Baseline (FB303), which is included in the Thrift source code. From your Thrift source directory, do the following:

$ cd contrib/fb303
$ sudo ./bootstrap.sh
$ sudo ./configure --prefix=/opt/fb303 --with-thriftpath=/opt/thrift
$ sudo make install

Install Scribe (> 2ee14d3)

Scribe commit 2ee14d3 fixes a build problem created by Thrift 0.5.0.

So, as of right now, you need to get your Scribe source from GitHub, at that commit or later.

Once you have it:

$ sudo ./bootstrap.sh
$ sudo ./configure --prefix=/opt/scribe --with-thriftpath=/opt/thrift --with-fb303path=/opt/fb303
$ sudo make install

Build Ruby Thrift structures

Okay, everything is installed now! Well, almost. You still need to generate the Thrift bindings if you are going to be using Ruby.

From the scribe source directory:

$ /opt/thrift/bin/thrift -o . -I /opt/fb303/share/ --gen rb if/scribe.thrift 
$ /opt/thrift/bin/thrift -o . -I /opt/fb303/share/ --gen rb /opt/fb303/share/fb303/if/fb303.thrift
$ sudo mkdir /opt/scribe/ruby
$ sudo mv gen-rb/ /opt/scribe/ruby/scribe

Copy a config from Scribe

You’ll also need a config file for Scribe, which you can get from the examples directory in the Scribe source.

Again, from the Scribe source root:

$ sudo mkdir /opt/scribe/conf/
$ sudo cp examples/example1.conf /opt/scribe/conf/test.conf

Start Scribe!

You are now ready to run scribe, so fire it up!

$ sudo /opt/scribe/bin/scribed -c /opt/scribe/conf/test.conf

Test it from Ruby

Now open up an editor and drop this into a Ruby script.

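# Make the generated bindings loadable (path from the "Build Ruby Thrift structures" step).
$LOAD_PATH.unshift('/opt/scribe/ruby/scribe')
require 'scribe'

begin
  socket = Thrift::Socket.new('localhost', 1463)
  transport = Thrift::FramedTransport.new(socket)
  protocol = Thrift::BinaryProtocol.new(transport, false)
  client = Scribe::Client.new(protocol)
  transport.open
  log_entry = LogEntry.new(
    :category => 'test',
    :message => 'This is a test message'
  )
  # Scribe's Log call takes a list of entries.
  client.Log([log_entry])
  transport.close
rescue Thrift::Exception => tx
  print 'Thrift::Exception: ', tx.message, "\n"
end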

When you run it, you should get a new directory and file in /tmp/scribetest containing your message.


Thursday Quote: Paul Graham

October 28, 2010 » Life

“To be happy I think you have to be doing something you not only enjoy, but admire. You have to be able to say, at the end, wow, that’s pretty cool.”

– Paul Graham
How to Do What You Love
