MediaWiki and OmahaWiki.org

A while back I had a MediaWiki install at WikiOmaha.org, in the hope that a wiki could be formed for the Omaha community, by the Omaha community (sound familiar?). I never really did much with it, and a few days ago a professor from Creighton contacted me about my domain and about pooling resources.

He has created OmahaWiki.org, which WikiOmaha.org now redirects to, and he is having students flesh it out. I came into the picture to help set up some bots to manage the content.

It turns out there is a cool framework for MediaWiki sites called the “Python Wikipedia Robot Framework”, which is written in Python. I got the scripts working on my machine and then turned my attention to writing a bot that does a word count on every page and adds a stub tag to the page if it is under a given threshold.

I had forgotten how awesome Python is. It really is a good language; I just wish I had cause to use it more often. Anyway, here is my Python bot for that framework. You can grab a file version here.

#!/usr/bin/python
# -*- coding: utf-8  -*-
"""
-----// Stub Adder //------------------------------------------------------
File: jmh_addstubs.py
Version: 1.0
Author: John Hobbs
Contact: john@velvetcache.org
 
This bot will iterate through all pages of the wiki and append a generic
stub ('{{Stub}}') to them if they do not have one already and have under
a given number of "words" in them.  Words, here, are counted as _any_ series
of characters separated by a space.  The default maximum number of words
that the bot will work on is 5, so it is recommended that you pass it a more
realistic value.
 
Call

python jmh_addstubs.py

to have the check run on all pages of the wiki. If that takes too
long to work in one stroke, run:

python jmh_addstubs.py Pagename

to do all pages starting at Pagename.
 
There are two command line options:
 
-dryrun
    This will check and notify you but will not actually change anything.

-words=XX
    This is the word threshold. Replace XX with the biggest wordcount that you
    want the bot to append stubs to.
 
"""
import wikipedia
import pagegenerators
import sys
 
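def workon(page):
    # Redirect pages have no content of their own, so skip them.
    try:
        text = page.get()
    except wikipedia.IsRedirectPage:
        return

    # A "word" is anything between single spaces (see the docstring).
    jmh_tokens = text.split(' ')
    # Only tag short pages that do not already carry a stub template.
    if len(jmh_tokens) <= jmh_count and 'Stub}}' not in text:
        text += '{{Stub}}'
        if jmh_dryrun:
            print('--// MATCH: [[' + page.title() + ']] -> Dry Run, No Change //--')
        else:
            print('--// MATCH: [[' + page.title() + ']] -> Stub Added //--')
            page.put(text)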
 
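try:
    start = []
    jmh_dryrun = False
    jmh_count = 5  # default word threshold; override with -words=XX
    # handleArgs() deals with the framework's own options and returns the rest.
    for arg in wikipedia.handleArgs():
        if arg.startswith("-words="):
            jmh_count = int(arg.split('=', 1)[1])
        elif arg.startswith("-dryrun"):
            jmh_dryrun = True
        else:
            # Anything else is treated as part of the starting page title.
            start.append(arg)
    if start:
        start = " ".join(start)
    else:
        # "!" is the lowest printable non-space ASCII character.
        start = "!"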
    mysite = wikipedia.getSite()
    basicgenerator = pagegenerators.AllpagesPageGenerator(start=start)
    generator = pagegenerators.PreloadingGenerator(basicgenerator)
    for page in generator:
        workon(page)
 
finally:
    wikipedia.stopme()
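
One thing worth knowing about the script above: because it splits on a single space, a line break does not separate two words, and consecutive spaces produce empty "words". The sketch below is not part of the bot. It is a standalone illustration with no framework dependency, and the function names (`count_space_tokens`, `count_whitespace_words`, `needs_stub`) are made up for this example. It compares the bot's counting with an alternative that splits on any run of whitespace.

```python
def count_space_tokens(text):
    """Count tokens the way the bot does: split on single spaces only."""
    return len(text.split(' '))


def count_whitespace_words(text):
    """Alternative: split on any run of whitespace (spaces, tabs, newlines)."""
    return len(text.split())


def needs_stub(text, max_words, counter=count_space_tokens):
    """Return True if the text is short enough and has no stub template yet.

    `counter` is a hypothetical hook for swapping the counting rule; the
    original bot always uses the single-space split.
    """
    return counter(text) <= max_words and 'Stub}}' not in text


sample = "Omaha is a city\nin Nebraska.\nIt is on the Missouri River."
print(count_space_tokens(sample))
print(count_whitespace_words(sample))
print(needs_stub("Short page", 5))
```

For the sample text, the single-space split reports 10 tokens while the whitespace split reports 12 words, since the two line breaks each join a pair of words. Whether that difference matters depends on how pages on the wiki are formatted. Passing a slightly higher `-words` value is probably the simplest way to compensate without touching the bot.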

Posted March 2nd, 2007 - Permalink
Categories: Internet - Programming - Python