Thursday Quote: Rob Pike

“Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.”

- Rob Pike
Notes on C Programming

Posted March 11th, 2010
Categories: Geek
 
Thursday Quote: Brian Malow

“Schrödinger’s cat walks into a bar. And doesn’t.”

- Brian Malow
Science Comedian

Posted March 4th, 2010
Categories: Geek
 
Looking up words in a Dictionary using Python

First off, I do not mean dictionary in the Python sense of the word. I mean dictionary in the glossary sense, like Merriam-Webster. This collision of terminology makes Googling for this functionality particularly difficult and frustrating.

I came across three useful Python solutions, and I’m going to detail usage of two of them in this post.

Option 1: NLTK + Wordnet

First up is accessing Wordnet.

“Wordnet is a large lexical database of English…”

The only Python way of accessing this (that I came across) is NLTK, a set of

“Open source Python modules, linguistic data and documentation for research and development in natural language processing…”

Getting NLTK Installed

For various reasons, NLTK is not packaged by Debian, so I had to install it by hand. Even if your distro does package NLTK, you might want to read this bit anyway. Installing was a cinch with easy_install nltk. However, this does not install the corpus data (which is where WordNet lives), as shown below:

>>> from nltk.corpus import wordnet
>>> wordnet.synsets( 'cake' )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/nltk-2.0b8-py2.5.egg/nltk/corpus/util.py", line 68, in __getattr__
    self.__load()
  File "/usr/lib/python2.5/site-packages/nltk-2.0b8-py2.5.egg/nltk/corpus/util.py", line 56, in __load
    except LookupError: raise e
LookupError:
**********************************************************************
  Resource 'corpora/wordnet' not found.  Please use the NLTK
  Downloader to obtain the resource: >>> nltk.download().
  Searched in:
    - '/home/jmhobbs/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

So what we need to do is run the NLTK Downloader, as shown here:

>>> import nltk
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download      l) List      c) Config      h) Help      q) Quit
---------------------------------------------------------------------------
Downloader> d
 
Download which package (l=list; x=cancel)?
  Identifier> wordnet
    Downloading package 'wordnet' to /home/jmhobbs/nltk_data...
      Unzipping corpora/wordnet.zip.
 
---------------------------------------------------------------------------
    d) Download      l) List      c) Config      h) Help      q) Quit
---------------------------------------------------------------------------
Downloader> q
True
>>>
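
If you would rather skip the interactive prompt, the same download can probably be scripted. This is a minimal sketch, assuming a reasonably recent NLTK in which nltk.data.find and nltk.download accept the arguments used here. The helper name ensure_wordnet is my own invention, and unlike the Python 2 snippets elsewhere in this post it is written in Python 3 syntax.

```python
def ensure_wordnet():
    """Return True if the WordNet corpus is available, downloading it if needed.

    Hypothetical helper, not part of NLTK itself.
    """
    try:
        import nltk  # imported lazily so a missing install is reported, not fatal
    except ImportError:
        return False

    try:
        # Raises LookupError when the corpus is not in any nltk_data directory
        nltk.data.find("corpora/wordnet")
        return True
    except LookupError:
        # Non-interactive equivalent of choosing "d" and then "wordnet" above
        return bool(nltk.download("wordnet", quiet=True))
```

Calling ensure_wordnet() once at startup, before the first import of nltk.corpus.wordnet is used, should be enough. It needs network access the first time it runs.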

Using NLTK + Wordnet

Now that we have everything installed, using WordNet from Python is straightforward.

# Load the wordnet corpus
from nltk.corpus import wordnet
 
# Get a collection of synsets (synonym sets) for a word
synsets = wordnet.synsets( 'cake' )
 
# Print the information
for synset in synsets:
  print "-" * 10
  print "Name:", synset.name
  print "Lexical Type:", synset.lexname
  print "Lemmas:", synset.lemma_names
  print "Definition:", synset.definition
  for example in synset.examples:
    print "Example:", example

The output of that is:

----------
Name: cake.n.01
Lexical Type: noun.artifact
Lemmas: ['cake', 'bar']
Definition: a block of solid substance (such as soap or wax)
Example: a bar of chocolate
----------
Name: patty.n.01
Lexical Type: noun.food
Lemmas: ['patty', 'cake']
Definition: small flat mass of chopped food
----------
Name: cake.n.03
Lexical Type: noun.food
Lemmas: ['cake']
Definition: baked goods made from or based on a mixture of flour, sugar, eggs, and fat
----------
Name: coat.v.03
Lexical Type: verb.contact
Lemmas: ['coat', 'cake']
Definition: form a coat over
Example: Dirt had coated her face

Perfect!

Caveats

There are some caveats to using WordNet with NLTK. First is that the definitions aren’t always ordered in the way you would expect. For instance, look at the “cake” results above. Cake, as in the confection, is the third definition, which feels wrong. You can of course order and filter on the synset name to correct this to some degree.
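
Here is a rough sketch of that ordering and filtering idea. It works on plain synset name strings (copied from the output above) rather than on live NLTK objects, so it runs without the corpus. The function name rank_synset_names and the preference for nouns are assumptions of mine, and it is written for Python 3.

```python
def rank_synset_names(names, word, preferred_pos="n"):
    """Order synset names so exact matches for `word` come first.

    Names look like 'cake.n.03': head word, part of speech, sense number.
    The sort is stable, so WordNet's own order is kept within each group.
    """
    def sort_key(name):
        head, pos, _sense = name.rsplit(".", 2)
        # False sorts before True: exact head word first, preferred POS first
        return (head != word, pos != preferred_pos)

    return sorted(names, key=sort_key)


names = ["cake.n.01", "patty.n.01", "cake.n.03", "coat.v.03"]

ranked = rank_synset_names(names, "cake")
print(ranked)

# Filtering: keep only the synsets actually named after the word
exact_only = [n for n in ranked if n.rsplit(".", 2)[0] == "cake"]
print(exact_only)
```

This moves the confection up to second place, but it cannot tell that a bar of soap is a less likely meaning than dessert, which is why it only helps to some degree.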

Second, there is a major load time for getting WordNet ready to use. Your first call to wordnet.synsets will take considerably longer than the next ones. On my machine the difference was 3.5 seconds versus 0.0003 seconds.
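
If you want to measure this yourself, a small timing helper will do. In the sketch below, LazyCorpus is a stand-in I made up to imitate a lazily loaded corpus (it is not NLTK and the sleep is a fake load cost), so swap corpus.lookup for wordnet.synsets to time the real thing. Python 3 syntax.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


class LazyCorpus:
    """Stand-in for a lazily loaded corpus; NOT part of NLTK."""

    def __init__(self):
        self._data = None

    def lookup(self, word):
        if self._data is None:
            time.sleep(0.05)  # simulate the expensive one-time load
            self._data = {"cake": ["cake.n.01", "cake.n.03"]}
        return self._data.get(word, [])


corpus = LazyCorpus()
_, first = timed(corpus.lookup, "cake")   # pays the load cost
_, second = timed(corpus.lookup, "cake")  # data is already in memory
print("first call: %.4fs, second call: %.6fs" % (first, second))
```

In a long-running program, one throwaway lookup at startup moves that cost out of the first real request.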

Last, you are constrained to the English language, as analyzed by Princeton. I’ll address this issue in the next section.

Option 2: SDict Viewer

As I said above, using WordNet is simple, but restrictive. What if I want to use a foreign language dictionary? WordNet is only in English. This is where the SDict format comes in. It has lots of free resource files available at http://sdict.com/en/. The best existing parser I found was SDict Viewer, which is a dead project but remarkably complete.

SDict Viewer is an application

SDict Viewer is an application, so it’s not an easy-to-install library. However, it is very well written, and extracting what you need is simple. You can get my “library” version from http://github.com/jmhobbs/sdictviewer-lib.

Here is an example when it’s all finished:

import sys
 
import sdictviewer.formats.dct.sdict as sdict
import sdictviewer.dictutil
 
dictionary = sdict.SDictionary( 'webster_1913.dct' )
dictionary.load()
 
start_word = sys.argv[1]
 
found = False
 
for item in dictionary.get_word_list_iter( start_word ):
  try:
    if start_word == str( item ):
      instance, definition = item.read_articles()[0]
      print "%s: %s" % ( item, definition )
      found = True
      break
  except Exception:
    # Skip entries that fail to convert or read; a bare except would
    # also swallow Ctrl-C
    continue
 
if not found:
  print "No definition for '%s'." % start_word
 
dictionary.close()

Here is a sample run:

jmhobbs@katya:~$ python okay.py Cat
Cat: (n.) An animal of various species of the genera Felis and Lynx. The domestic cat is Felis domestica. The European wild cat (Felis catus) is much larger than the domestic cat. In the United States the name wild cat is commonly applied to the bay lynx (Lynx rufus) See Wild cat, and Tiger cat.
wrote /home/jmhobbs/.sdictviewer/index_cache/webster_1913.dct-1.0.index

As you can see, it gives a nice definition (thank you Webster 1913), followed by a line of junk at the end. That last line is SDict Viewer reporting that it wrote the index cache, a lookup table for finding words faster. You can avoid saving it by calling dictionary.close(False) instead.

Option 3: Aard Format

In option 2 I said that SDict Viewer was a dead project. This is because development has moved to the Aard Dictionary project. I chose not to pursue this format, as most of the existing resources are stored in HTML formats and I needed plain text. This might be ideal for you though, as they also provide access to Wikipedia archives.

All Done

So there you have it. Two viable ways of extracting a plain text definition for a word in Python. Best of luck to you!

Posted March 1st, 2010
Categories: Consume - Geek
 
Clean Auth module usage in Kohana

I’ve been learning the Kohana framework for a project at work, and I have to say I really like it. It has a lot of the things I liked about Rails, and it stays out of my way, unlike CakePHP.

I thought I’d highlight my authentication solution that uses the built-in Auth module and a base controller that I call Site_Controller. Keep in mind that all of my controllers derive from this one.

So, what’s it boil down to? Essentially you set up Auth and my base controller, then in your child controllers you set $access_control to an array of the methods you want protected. Each key is a method name and each value is the access level it requires. The value can be “*”, which means anyone logged in can use the method, a string naming a specific role, or an array of roles, any one of which is enough. Methods that are not listed are open to everyone. Take a look at the controller, then I’ll show you an example usage.

application/views/site.php

<?php
 
  class Site_Controller extends Template_Controller {
 
    public $template = 'layout';
 
    protected $access_control = array();
    protected $access_denied = "/user/login";
 
    //public $auto_render = false;
 
    function __construct () {
      parent::__construct();
      $this->session = Session::instance();
 
      // Check permissions
      if( array_key_exists( router::$method, $this->access_control ) ) {
        if( '*' == $this->access_control[router::$method] ) {
          if( ! Auth::instance()->logged_in() )
            url::redirect( $this->access_denied );
        }
        else if( is_array( $this->access_control[router::$method] ) ) {
          $can_proceed = false;
          foreach( $this->access_control[router::$method] as $role )
            if( Auth::instance()->logged_in( $role ) )
              $can_proceed = true;
 
          if( ! $can_proceed )
            url::redirect( $this->access_denied );
        }
        else {
          if( ! Auth::instance()->logged_in( $this->access_control[router::$method] ) )
            url::redirect( $this->access_denied );
        }
      }
    }
 
    public function __call( $method, $arguments ) {
      $this->template->title = "404";
      $this->template->content = new View( 'errors/404');
    }
  }

Here’s an example controller. In this case anyone can access login, anyone logged in can access index, and only logged-in admins can access adminsonly.

application/controllers/user.php

<?php
 
  class User_Controller extends Site_Controller {
 
    protected $access_control = array( 
        "index" => "*",
        "adminsonly" => "admin"
      );
 
    function  index () {
      $this->template->content = "index";
    }
 
    function login () {
      $this->template->content = "login";
    }
 
    function adminsonly () {
      $this->template->content = "admins only";
    }
  }
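
Restating the rule outside of PHP can make the cases easier to see. Below is a rough Python 3 sketch of the same decision logic as the constructor in Site_Controller. It is not Kohana code: is_allowed, logged_in and roles are names I made up for illustration, and the real check goes through Auth::instance()->logged_in().

```python
def is_allowed(access_control, method, logged_in, roles=()):
    """Decide whether a request for `method` may proceed.

    access_control maps method name -> "*", a role name, or a list of roles.
    """
    if method not in access_control:
        return True  # unlisted methods are public

    rule = access_control[method]
    if rule == "*":
        return logged_in  # any logged-in user will do

    # A single role is treated as a one-item list; any matching role passes
    required = rule if isinstance(rule, (list, tuple)) else [rule]
    return logged_in and any(role in roles for role in required)


# Same rules as User_Controller above
rules = {"index": "*", "adminsonly": "admin"}

print(is_allowed(rules, "login", logged_in=False))
print(is_allowed(rules, "index", logged_in=False))
print(is_allowed(rules, "adminsonly", logged_in=True, roles=["login", "admin"]))
```

When this returns false, the PHP version redirects to $access_denied.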

I haven’t done a ton of testing and it’s not the most robust solution, but I like it and it was easy to write.

Posted February 24th, 2010
Categories: Geek
 
Territorial Seeds User Script

So I was buying our garden seeds at Territorial Seed and they had a stupid 90’s “no right click” script installed. Lame, I need my tabs!

So I took a look and then wrote my first Greasemonkey script. Check it out if you need it; it should work on other sites that try the same trick.

territorial.seed.no.right.click.user.js

// Version 0.1
// Copyright (c) 2010, John Hobbs
// Released under the GPL license
// http://www.gnu.org/copyleft/gpl.html
//
// ==UserScript==
// @name          Territorial Seed No Right Click
// @namespace     http://www.velvetcache.org/
// @description   Disable the disable right click script on Territorial Seed
// @include       http://www.territorialseed.com/*
// ==/UserScript==
 
window.addEventListener (
  'load',
  function () {
    setTimeout( "document.oncontextmenu = null;", 150 );
  },
  true
);

Posted February 16th, 2010
Categories: Geek - Life
 
Copyright © 2006 - 2010 John Hobbs