jmhobbs

Replacing MySQL FULLTEXT With JavaScript

Over at OurUNO.com I have a few of those ajaxy auto-complete text boxes. Where you start typing and it presents you with a bunch of likely items from the database. It's great, really cool looking and all that, but sometimes when the server is slow these things start stacking XHR's and it drags to the point of not being usable. I've got some super-cheap shared hosting, so this isn't the most rare of things.

I decided to replace this often slow mechanism with some pure JavaScript and no more XHR. I figure I can do this because I only have about 150 rows x 2 cols of matchable data per autocompleter.

The core of this script is the item class. I suppose this could be better termed the dbrow class or something similar, because it is just used to store rows returned from the database to be used as a look up table. If you had more fields you wanted to search you would have to add on to the fields array of course. Other important info you don't want to have to run back to the database for can be stored in the class as well.

function item(_id,_field1,_field2) {
  this.id = _id;
  this.fields = new Array(_field1, _field2);
}

Next I use a call to the DB to get my rows. Knowing the number of rows is important, because we use this to create our array and populate it. I've shown the output of that PHP, skipping some rows for brevity :)

var items = new Array(115);

items[0] = new item(11,"2710","Introduction to Digital Design Principles");
items[1] = new item(6,"2830","Java Programming");
.
.
.
items[113] = new item(1905,"9420","Intelligent Agent Systems");
items[114] = new item(1906,"9710","Foundations of Software Engineering Research");

Next up is the itemMatch class. This is a little class to hold information on matches that we make from our search terms to the items we defined. It just holds some simple information: the items array index of the matching item, the field (type) it was found in, the index in the field where we found the match start, and the length of the search term we matched.

function itemMatch(_index,_type,_strstr,_strlen) {
  this.matchIndex = _index;
  this.matchType = _type;
  this.matchStrStr = _strstr;
  this.matchStrLen = _strlen;
}

Now for some magic. We enter the ftsearch function. The four parameters are: returnArray is the empty, temporary array that we will be stuffing our itemMatch objects into.inputArray is the array of item objects we want to search through, the items array in this case. fields is the number of fields in each item to search through (you don't have to search them all). searchTerms is the search string, likely captured from a form field.

function ftsearch(returnArray,inputArray,fields,searchTerms) {

The first thing we do is iterate through the entire inputArray and try to match the whole searchTerms string against them. Notice that I did not use regular expressions. I chose not to because I figured they would add additional overhead and I'm not too concerned about capitalization in any case.

  for(x = 0; x < inputArray.length; x++) {
    for(y = 0; y < fields; y++) {
      tempIndex = inputArray[x].fields[y].toLowerCase().indexOf(searchTerms.toLowerCase());
      if(tempIndex != -1) {
        returnArray.push(new itemMatch(x,y,tempIndex,searchTerms.length));
      }
    }
  }

Next we split searchTerms on spaces and repeat our search for each chunk. We do not search on empty chunks as this returns every item and wastes time.

  searchTerms = searchTerms.split(" ");
  for(j = 0; j < searchTerms.length; j++) {
    if(searchTerms[j] == '')
      continue;

    for(x = 0; x < inputArray.length; x++) {
      for(y = 0; y < fields; y++) {
        tempIndex = inputArray[x].fields[y].toLowerCase().indexOf(searchTerms[j].toLowerCase());
        if(tempIndex != -1) {
          returnArray.push(new itemMatch(x,y,tempIndex,searchTerms[j].length));
        }
      }
    }
    
  }

Okay, now that we have our big array of itemMatches we need to sort them and remove the duplicates. To sort them I chose a combination of the matchStrStr data member and matchStrLen. I reasoned that matches closer to the start of the string were more important than those deeper in, so I sorted numerically ascending on matchStrStr I now think this is a little shady, but it works well for my purposes. I then decided that the length of the match is important to, since a match of the full string 20 chars in is far better than a match of a 3 char term at the start of the string. To handle this I subtracted the matchStrLen from the matchStrStr to weight longer matches better.

  returnArray.sort(
    function sortMatches(a, b) {
      a = a.matchStrStr-a.matchStrLen;
      b = b.matchStrStr-b.machStrLen;
      return a - b;
    }
  );

Removing duplicates is a simple search and remove, and then ftsearch is all done.

  for(x = 0; x < returnArray.length; x++) {
    for(y = x+1; y < returnArray.length; y++) {
      if(returnArray[x].matchIndex == returnArray[y].matchIndex) {
        returnArray.splice(y,1);
      }
    }
  }

So now that we have our searching function we need to use it. We've generated our items array and so all we need is a function for our autocomplete. This is a relatively custom job, but easy to assemble and it will have some common components.

First of these is the limiter. I require a search term length of at least two before I go through the trouble of searching. If I've got enough in my terms I make up an empty array for my itemMatchs and let it rip.

function autoComplete() {
  document.getElementById('thisOne').innerHTML = '';

  if(document.getElementById('thisField').value.length < 2)
    return;

  tempArray = new Array();
  searchTerms = document.getElementById('thisField').value;

  ftsearch(tempArray,items,2,searchTerms);

At this point we have our array of unique matches and we need to do whatever it is we wanted them for. In this case it is to display them. I used the && x < 5 to limit the number printed to only 5 items.

for(x = 0; x < tempArray.length && x < 5; x++) {
      tempItem = items[tempArray[x].matchIndex];
      tempStrStr = tempArray[x].matchStrStr;
      tempStrLen = tempArray[x].matchStrLen;
      if(tempArray[x].matchType == 1) {
        tempString = tempItem.fields[0]+' - ';
        tempString += tempItem.fields[1].slice(0,tempStrStr);
        tempString += '';
        tempString += tempItem.fields[1].slice(tempStrStr,tempStrStr+tempStrLen);
        tempString += '';
        tempString += tempItem.fields[1].slice(tempStrStr+tempStrLen,tempItem.fields[1].length)
      }
      else {
        tempString = tempItem.fields[0].slice(0,tempStrStr);
        tempString += '';
        tempString += tempItem.fields[0].slice(tempStrStr,tempStrStr+tempStrLen);
        tempString += '';
        tempString += tempItem.fields[0].slice(tempStrStr+tempStrLen,tempItem.fields[0].length)
        tempString += ' - '+tempItem.fields[1];
      }
      document.getElementById('thisOne').innerHTML += tempString+'
'; } }

So there you have it. It's not perfect, but it does a fairly good job of replacing my ajax version and its pretty quick. I ran with a few different sets of terms and averaged about 18.5 ms per running of autocomplete with a set of 115 items and search terms from 1-15 chars. Your mileage may vary, let me know how it goes for you.

You can see a demo as described above here.
You can find a full colorized code view of that demo here.

Update (03/20/07)

Turns out I should have done some testing in IE. IE7 won't let you define an array sort function in the method call, so I had to move it outside. Easy enough, but kinda bothersome.