Tag: Internet

Homoglyph Substitution for URLs

August 29, 2014 » Geek

At Pack we use ASCII-based unique identifiers in URLs a lot. We call them slugs. Dogs have them, users have them, breeds have them, and so on.

I decided early on to keep the slugs plain old ASCII. No Unicode. These are primarily for URLs, and I wanted them to be easy to type. Most slugs in the system are generated automatically from a name when a dog or user is created. That is a problem, because a lot of people in the world use characters outside the ASCII set.

Usually, the solution is just to drop non-ASCII characters. This is the simplest option, and it works. Designer News, for example, uses this technique. For John Henry Müller, they simply drop the umlauted ü, giving him the user URL https://news.layervault.com/u/11655/john-henry-mller/. Müller becomes mller. I find this less than optimal.
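For comparison, here is a minimal Python 3 sketch of the drop-non-ASCII approach. It is not Designer News' actual code, which isn't shown anywhere; `naive_slug` is a hypothetical name, and lowercasing and joining words with hyphens are assumptions about how the rest of the slug gets built.

```python
import re

def naive_slug(name: str) -> str:
    """Hypothetical slugifier that simply discards every non-ASCII character."""
    # encode(..., "ignore") silently drops anything outside ASCII, so ü vanishes.
    ascii_only = name.encode("ascii", "ignore").decode("ascii")
    # Keep runs of lowercase letters and digits, joined by hyphens.
    return "-".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))

print(naive_slug("John Henry Müller"))  # john-henry-mller
```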

A second technique is homoglyph substitution. A homoglyph is a character that looks so much like another that the two are hard to tell apart at a glance. I’m familiar with them from the world of phishing, where people register domains that look very similar to other domains by using homoglyphs.

Once you build a list of homoglyphs, it’s easy to create ASCII-only slugs through substitution. For our list we stretched the definition of homoglyph to include anything that looks similar if you squint at it funny. The method is a bit brute force, but it only ever runs once per string, and I think the outcome is worth it.

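# -*- coding: utf-8 -*-

# Each entry pairs an ASCII character with the characters that look like it.
UNICODE_ASCII_HOMOGLYPHS = (
    ('a', u'AaÀÁÂÃÄÅàáâãäåɑΑαаᎪAaĄĀāĂăąÀÁÂÃÄÅàáâãäå'),
    # ... entries for 'b' through 'y' elided ...
    ('z', u'ZzΖᏃZzŹźŻżŽž'),
)

def replace_homoglyphs(string):
    '''If a string is unicode, replace all of the unicode homoglyphs with ASCII equivalents.'''
    # Python 3 str is unicode; the original Python 2 check was `unicode == type(string)`.
    if isinstance(string, str):
        for ascii_char, homoglyphs in UNICODE_ASCII_HOMOGLYPHS:
            for homoglyph in homoglyphs:
                string = string.replace(homoglyph, ascii_char)
    return string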

This works well for us: we get reasonable URLs for dogs like “Hólmfríður frá Ólafsfjordur”. The slug holmfriour-fra-olafsfjordur is not the same, but it’s close enough for a URL that nobody minds, and it beats hlmfrur-fr-lafsfjordur.
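A sketch of the whole pipeline, under a few assumptions: `HOMOGLYPH_SUBSET` is a hypothetical table holding only the entries this example needs (the ð → o mapping is inferred from the holmfriour slug above), and it swaps the repeated `str.replace` calls for a single `str.translate` pass, which performs the same substitution in one walk over the string. Whatever the table doesn't cover is then dropped, as in the drop-non-ASCII approach.

```python
import re

# Hypothetical subset of a homoglyph table: ASCII target -> look-alike characters.
HOMOGLYPH_SUBSET = {
    "a": "áÁàÀâäå",
    "i": "íÍìîï",
    "o": "óÓòôöðÐ",
    "u": "úÚùûü",
}

# One translation table, so every substitution happens in a single pass.
TRANSLATION = str.maketrans(
    {glyph: ascii_char for ascii_char, glyphs in HOMOGLYPH_SUBSET.items() for glyph in glyphs}
)

def make_slug(name: str) -> str:
    """Substitute homoglyphs, drop leftover non-ASCII, then hyphenate the words."""
    substituted = name.translate(TRANSLATION)
    ascii_only = substituted.encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))

print(make_slug("Hólmfríður frá Ólafsfjordur"))  # holmfriour-fra-olafsfjordur
```

With the same table, John Henry Müller gets john-henry-muller rather than john-henry-mller.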

Hólmfríður frá Ólafsfjordur

Unfortunately, this doesn’t work for languages that aren’t written in a Latin script, notably Asian languages, as with “クッキー”. In that case the system breaks down and we end up with no usable slug, so we build one from a default. I’m still looking for a solution for that. Maybe I should run automatic translation on it.
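The post doesn’t say how the default is built. The sketch below assumes a hypothetical scheme of a type name plus the object’s numeric id, which keeps fallback slugs unique; `ascii_slug` is a stand-in that skips homoglyph substitution for brevity.

```python
import re

def ascii_slug(name: str) -> str:
    # Stand-in slugifier; a real one would run homoglyph substitution first.
    ascii_only = name.encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))

def slug_with_fallback(name: str, default: str, object_id: int) -> str:
    """Return the ASCII slug, or build one from a default when nothing survives."""
    # Hypothetical fallback scheme: "<default>-<id>".
    return ascii_slug(name) or f"{default}-{object_id}"

print(slug_with_fallback("クッキー", "dog", 42))  # dog-42
```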

Thursday Quote: @hipsterhacker

June 9, 2011 » Geek

“The Cloud” is something idiots call what the rest of us call “The Internet”

– @hipsterhacker

Thursday Quote: Tim Berners-Lee

January 27, 2011 » Geek, Life

“Neither governments nor corporations should be allowed to use disconnection from the Internet as a way of arbitrarily furthering their own aims.”

Tim Berners-Lee

Extracting When You Visited A Page From Firefox

December 18, 2009 » Geek

Need to get the exact time that you visited a page in Firefox? I couldn’t find an easy way to look this up in the History interface, or anywhere else for that matter. I did, however, know that Firefox stores this kind of thing in SQLite databases. Here’s how I got what I needed.

First you have to find the SQLite databases. I’m on Linux, so they live in my home directory. The database you want is places.sqlite; open it in sqlite3. Your path will differ, because the directory is named after your profile. Mine is “gmail”, so I ended up with g69ap5lc.gmail.

$ sqlite3 ~/.mozilla/firefox/g69ap5lc.gmail/places.sqlite

Be aware that you have to shut down Firefox first, because it locks the file. Make sure your privacy settings won’t erase your history when you shut it down! I had to change mine to “Remember history” first.
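If you’d rather not dig for the profile directory by hand, a short Python sketch can list every places.sqlite under the Linux profile root used above. `find_places_databases` is a hypothetical helper, and other platforms keep profiles elsewhere, so the default path is an assumption.

```python
import glob
import os

def find_places_databases(firefox_dir: str = "~/.mozilla/firefox") -> list:
    """Return the path to places.sqlite in every profile directory under firefox_dir."""
    # Profile directories look like <random>.<profile name>, e.g. g69ap5lc.gmail.
    pattern = os.path.join(os.path.expanduser(firefox_dir), "*", "places.sqlite")
    return sorted(glob.glob(pattern))
```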

Next you need to find and grab the timestamp. This can be a chore if you don’t have the full URL. I was looking for the one from spiffie.org below.

sqlite>.headers on
sqlite>select * from moz_places;
1366|http://spiffie.org/kits/usb7/driver_linux.shtml|Linux USB7 Driver|gro.eiffips.|1|0|0||100|1261169238197827

The column we are interested in is last_visit_date, which is 1261169238197827 in our case. You can also list every recorded visit to the page from the moz_historyvisits table by matching its place_id against the page’s id.

sqlite> select * from moz_historyvisits where place_id = '1366';
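The same lookup can be scripted with Python’s built-in sqlite3 module. This sketch assumes you pass in a connection to places.sqlite and joins moz_places to moz_historyvisits on place_id, so a partial URL is enough; moz_historyvisits keeps each visit’s time in its visit_date column. `find_visits` is a hypothetical helper name.

```python
import sqlite3

def find_visits(conn: sqlite3.Connection, url_fragment: str) -> list:
    """Return (url, visit_date) pairs for every visit to a URL containing url_fragment."""
    query = """
        SELECT p.url, v.visit_date
        FROM moz_places AS p
        JOIN moz_historyvisits AS v ON v.place_id = p.id
        WHERE p.url LIKE ?
        ORDER BY v.visit_date
    """
    # Surrounding % wildcards turn LIKE into a substring match on the URL.
    return conn.execute(query, (f"%{url_fragment}%",)).fetchall()

# Usage, with Firefox shut down:
# conn = sqlite3.connect("/path/to/profile/places.sqlite")
# for url, visit_date in find_visits(conn, "spiffie.org"):
#     print(url, visit_date)
```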

Now we need to convert that timestamp into something we can read (unless you are a super UNIX geek and can read timestamps). The value is in microseconds since the epoch, which is too precise for the date command, so keep the first 10 digits and drop the last six (that is, divide by 1,000,000). In the example we use 1261169238.

$ date -d @1261169238
Fri Dec 18 14:47:18 CST 2009
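If you’re already in Python, a small sketch does the conversion without truncating anything. It treats the stored value as microseconds since the Unix epoch and returns a UTC datetime; `firefox_time_to_datetime` is a hypothetical name. 14:47:18 CST is 20:47:18 UTC, matching the date output.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def firefox_time_to_datetime(microseconds: int) -> datetime:
    """Convert a places.sqlite timestamp (microseconds since the epoch) to UTC."""
    # timedelta keeps the microseconds exact, unlike dividing into a float.
    return UNIX_EPOCH + timedelta(microseconds=microseconds)

print(firefox_time_to_datetime(1261169238197827).isoformat())
# 2009-12-18T20:47:18.197827+00:00
```

Call `.astimezone()` on the result to see it in your local time zone.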

Not short and sweet, but it works.

CSS Heart

December 7, 2009 » Geek

(Update 2009-12-07)
I fixed it to work in WebKit too.

I was playing with clip and border-radius and put this little guy together.

<3 CSS

I haven’t tested it in anything but Iceweasel 3.5.5, so your mileage may vary. You can check it out here: CSS Heart.

It takes only two divs to make the heart, which is much less impressive than the Homer Simpson in CSS.

I suppose I could have just used &hearts;, huh?