check_html_recursive: a script to check website HTML validity

I made a dodgy bash script to check the validity of the web pages on my site, http://mathewpeet.org. There is probably a better way to do this, since the validator is open source, but I couldn’t easily see how it worked; any advice on that is welcome.

The script makes a list of the directories under the root directory set in the script (those containing an entry named content) and then checks index.html in each directory against the w3.org validator. The status of each file is printed to the screen, and error reports are appended to a readable text file.
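As a rough sketch of the step that turns each directory into the path used in the validator URL, prefix stripping avoids depending on how long the root path is. The function name `url_path_for` and the example paths are made up for illustration and are not part of the script.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): strip the local site
# root from a directory path to get the part that follows the domain name.
url_path_for() {
    local root=${1%/}          # local site root, trailing slash removed
    local dir=${2%/}           # a directory found under that root
    local rel=${dir#"$root"}   # drop the root prefix
    printf '%s\n' "${rel#/}"   # drop the leading slash, if any
}

# Example with an assumed directory layout:
url_path_for /home/user/www/mathewpeet.org/ /home/user/www/mathewpeet.org/blog/2009
# prints: blog/2009
```

With this approach the script would keep working if the site were moved to a root path of a different length.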

These are the contents of the file check_html_recursive:

#!/bin/bash
# Copyright Mathew Peet 2009, please use and modify
# but leave some credit
# This script checks whether the pages are valid XHTML 1.1 and appends any errors to ~/bin/errors.txt

myfiles=`find ~/www/mathewpeet.org/ -name 'content' -exec dirname {} \;`
#myfiles="/home/user/public_html/"
for x in $myfiles
 do 
    y=${x:32}              # takes the string after the nth character
    echo checking index in $y directory
    w3m -dump "http://validator.w3.org/check?uri=http://mathewpeet.org/$y/" > ~/bin/temp.txt
    popo=`grep "as XHTML 1.1!" ~/bin/temp.txt`
    echo $popo
    opop='Errors found while checking this document as XHTML 1.1!'
    if [ "$popo" != "$opop" ]; then
       echo "ok?!"
    else
       cat ~/bin/temp.txt >> ~/bin/errors.txt
       echo "$y/index.html does not validate as XHTML 1.1"
    fi
    echo ""
 done
echo "Any reported errors written to ~/bin/errors.txt (hopefully)"
echo "remove temp.txt"
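The pass/fail test in the loop could also be written with grep's exit status instead of comparing strings. This is only a sketch: the function name `report_has_errors` is hypothetical, and the example uses a made-up report file rather than a real validator run.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): succeeds (exit status 0)
# when a saved validator report contains the XHTML 1.1 error banner.
report_has_errors() {
    grep -q 'Errors found while checking this document as XHTML 1.1!' "$1"
}

# Example using a made-up report file instead of real w3m output.
report=$(mktemp)
echo 'Errors found while checking this document as XHTML 1.1!' > "$report"
if report_has_errors "$report"; then
    echo "does not validate as XHTML 1.1"
else
    echo "ok?!"
fi
rm -f "$report"
```

Because this matches on the banner text rather than on the whole line, any leading whitespace in the w3m dump would not affect the result.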