A while back I had a MediaWiki install at WikiOmaha.org, hoping that a wiki could be formed for the Omaha community, by the Omaha community (sound familiar?). Anyway, I never really did much with it, and a few days ago a professor from Creighton contacted me about my domain and pooling resources.
He has created OmahaWiki.org, which WikiOmaha.org now redirects to, and he is having students flesh it out. I came into the picture to help set up some bots to manage the content.
It turns out there is a cool framework for MediaWiki sites called the “Python Wikipedia Robot Framework”, which is written in Python. I got the scripts working on my machine and then turned my attention to writing a bot that would do a word count on every page and add a stub tag to any page that came in at or under a given threshold.
I had forgotten how awesome Python is. It really is a good language; I just wish I had call to use it more often. Anyway, here is my Python bot for that framework. You can grab a file version here.
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
-----// Stub Adder //------------------------------------------------------
 File: jmh_addstubs.py
 Version: 1.0
 Author: John Hobbs
 Contact: john@velvetcache.org

This bot will iterate through all pages of the wiki and append a generic stub
('{{Stub}}') to them if they do not have one already and have under a given
number of "words" in them.

Words, here, are counted as _any_ series of characters separated by a space.
The default maximum number of words that the bot will work on is 5, so it is
recommended that you pass it a more realistic value.

Call
    python wordcount.py
to have your change be done on all pages of the wiki.

If that takes too long to work in one stroke, run:
    python wordcount.py Pagename
to do all pages starting at pagename.

There are two command line options:

-dryrun
    This will check and notify you but will not actually change anything.

-words=XX
    This is the word threshold. Replace XX with the biggest wordcount
    that you want the bot to append stubs to.
"""

import wikipedia
import pagegenerators
import sys

def workon(page):
    # Redirect pages have no text of their own, so skip them.
    try:
        text = page.get()
    except wikipedia.IsRedirectPage:
        return

    # A "word" is anything between single spaces.
    jmh_tokens = text.split(' ')
    if len(jmh_tokens) <= jmh_count and -1 == text.find('Stub}}'):
        text += '{{Stub}}'
        if jmh_dryrun:
            print '--// MATCH: [['+page.title()+']] -> Dry Run, No Change //--'
        else:
            print '--// MATCH: [['+page.title()+']] -> Stub Added //--'
            page.put(text)

try:
    start = []
    test = False
    jmh_dryrun = False
    jmh_count = 5

    # Pull out our own options; anything left over is the starting page name.
    for arg in wikipedia.handleArgs():
        if arg.startswith("-words="):
            temp = arg.split('=')
            jmh_count = int(temp[1])
        elif arg.startswith("-dryrun"):
            jmh_dryrun = True
        else:
            start.append(arg)

    if start:
        start = " ".join(start)
    else:
        start = "!"

    mysite = wikipedia.getSite()
    basicgenerator = pagegenerators.AllpagesPageGenerator(start=start)
    generator = pagegenerators.PreloadingGenerator(basicgenerator)

    for page in generator:
        workon(page)

finally:
    wikipedia.stopme()
```
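If you want to play with the threshold without pointing the bot at a live wiki, the decision made inside `workon` can be pulled out into a plain function. This is only a sketch: `needs_stub`, `with_stub` and `STUB_TAG` are names I made up for illustration and are not part of the framework or the bot above. It keeps the same counting rule as the bot (split on single spaces, skip pages that already contain `Stub}}`) and should run under Python 2 or 3.

```python
# Hypothetical helper names, not part of the bot or the framework.
STUB_TAG = '{{Stub}}'


def needs_stub(text, max_words=5):
    """Return True if text is at or under the word limit and has no stub tag."""
    # Same rule as the bot: only single spaces separate "words", so
    # newlines and tabs do not start a new word.
    tokens = text.split(' ')
    return len(tokens) <= max_words and 'Stub}}' not in text


def with_stub(text, max_words=5):
    """Return text with the stub tag appended when needs_stub() says so."""
    if needs_stub(text, max_words):
        return text + STUB_TAG
    return text


if __name__ == '__main__':
    print(with_stub('Omaha is a city.'))          # short page, gets the tag
    print(with_stub('Omaha is a city.{{Stub}}'))  # already tagged, unchanged
```

One thing this makes easy to see: because the split is on single spaces only, a page made of many short lines with no spaces counts as one "word". Calling `text.split()` with no argument would split on any whitespace instead, if that is closer to what you want.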