jmhobbs

Recursive Word Count With Bash

I was curious how many lines were in the new OurUNO rewrite, so I decided to write a little script to find out. Well, that got out of hand: I kept adding things like file masks and a recursion depth limit, and I only managed to stop myself before I added color to the output. So I did okay, I guess.

Anyway, here it is. I'm sure there's some really clever one-liner that does the same thing, but hey, that's life.
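For what it's worth, here's a rough sketch of that kind of one-liner, wrapped in a hypothetical count_lines_find function so it's reusable. It's not the script below. It swaps the grep regex mask for find's shell-style -name pattern, and it relies on -mindepth/-maxdepth, which GNU and BSD find support but POSIX doesn't define. With -maxdepth N it reaches the same files as --depth=N does below: depth 1 means only the files directly inside the directory.

```shell
# Hypothetical one-liner equivalent, wrapped in a function (not the post's script).
# Usage: count_lines_find DIR [GLOB] [DEPTH]
#   GLOB  - a shell-style name pattern such as '*.php' (default '*'),
#           not the grep regex that --mask takes
#   DEPTH - same meaning as --depth: 1 counts only the files directly in DIR
count_lines_find() {
  local dir="$1" glob="${2:-*}" depth="${3:-20}"
  # Dot-named entries are pruned, matching what 'ls -l' shows. cat feeds
  # every matching regular file to a single wc -l, and tr strips the
  # leading padding some wc versions print.
  find "$dir" -mindepth 1 -maxdepth "$depth" -name '.*' -prune \
    -o -type f -name "$glob" -exec cat {} + | wc -l | tr -d ' '
}
```

For example, count_lines_find . '*.php' 5 totals the .php files up to five levels down from the current directory.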

#!/bin/bash
# Needs bash, not plain sh: it uses bash-only features such as the function keyword.

function printUsage {

  echo "Usage: countLines directory [options]"
  echo
  echo "Options:"
  echo " -m=XX   --mask=XX    - The mask may be any grep style regular expression."
  echo " -d=XX   --depth=XX   - The maximum depth of recursion. Defaults to 20."
  echo
  exit
}

if [ $# -le 0 ]; then
  printUsage
  exit
fi

if [  "$1" == "-v" ]; then
  printUsage
  exit
fi

TOTALCOUNT=0
FILEMASK=''
MAXDEPTH=20

# Drag out our options... ($1 is the directory, so start at $2)
for i in "${@:2}"; do
  case "$i" in
    -m=*)      FILEMASK="${i#-m=}" ;;
    --mask=*)  FILEMASK="${i#--mask=}" ;;
    -d=*)      MAXDEPTH="${i#-d=}" ;;
    --depth=*) MAXDEPTH="${i#--depth=}" ;;
    *)         printUsage ;;
  esac
done

# The depth has to be a whole number.
case "$MAXDEPTH" in
  ''|*[!0-9]*) printUsage ;;
esac

CURDEPTH=0

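# Add the line counts of the regular files directly inside $1 whose names
# match FILEMASK to TOTALCOUNT. Globbing replaces parsing 'ls -l', whose
# file-name column moves with the date format and breaks on names with spaces.
function wcDir {
  local f lines
  for f in "$1"/*; do
    # Regular files only; skip directories and symlinks, as 'grep ^-' did.
    [ -f "$f" ] && [ ! -L "$f" ] || continue
    # Match the mask against the file name only, like the ls-based version.
    basename "$f" | grep -q -e "$FILEMASK" || continue
    lines=$(wc -l < "$f")
    TOTALCOUNT=$((TOTALCOUNT + lines))
  done
}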

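# Count the files in $1 and, while still under MAXDEPTH, in its subdirectories.
function recurseDir {
  # local keeps each call's loop variable to itself; bash variables are global otherwise.
  local d

  CURDEPTH=$((CURDEPTH + 1))

  if [ "$CURDEPTH" -lt "$MAXDEPTH" ]; then
    for d in "$1"/*; do
      # Real directories only; 'grep ^d' skipped symlinked directories too.
      [ -d "$d" ] && [ ! -L "$d" ] || continue
      recurseDir "$d"
    done
  fi

  wcDir "$1"

  CURDEPTH=$((CURDEPTH - 1))
}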

recurseDir "$1"

echo $TOTALCOUNT
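Since the mask goes straight to grep -e and is matched against each file name with no anchors, it's easy to catch more files than you meant to: a mask of .php also matches foo.php.bak, because . matches any character and nothing ties the pattern to the end of the name. Here's a small hypothetical mask_matches helper (not part of the script) that applies a mask the same way, so you can try one out before counting:

```shell
# Hypothetical helper, not part of countLines: succeeds when NAME passes
# MASK the same way the script filters names, i.e. an unanchored grep -e.
# Usage: mask_matches MASK NAME
mask_matches() {
  printf '%s\n' "$2" | grep -q -e "$1"
}

# mask_matches '\.php$' foo.php       -> succeeds
# mask_matches '\.php$' foo.php.bak   -> fails ($ anchors to the end)
# mask_matches '.php'   foo.php.bak   -> succeeds (unanchored, . is any char)
```

So --mask='\.php$' is usually what you want when you mean "PHP files".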

Bonus! Here's a tip for posting scripts on the interwebs: replace your tabs with spaces before copying them into your posts with:
$ sed 's/\t/ /g' scriptOrCodeSource
(The \t escape is a GNU sed extension, so other seds may not recognize it.)
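If your sed doesn't understand \t, the POSIX expand utility does the same job, and it pads each tab out to the next tab stop instead of collapsing it to a single space, so indentation still lines up. A minimal sketch, wrapped in a hypothetical detab function with an assumed default tab width of 4:

```shell
# Hypothetical helper: convert tabs to spaces before pasting code into a post.
# Each tab is padded to the next tab stop (every WIDTH columns, default 4).
# Usage: detab FILE [WIDTH] > FILE.spaces
detab() {
  expand -t "${2:-4}" "$1"
}
```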

Update (11/08/07): So that doesn't work as advertised. I think it's doing some double counting or something. I'll post the rewrite when I finish it.