check_html_recursive script to check website HTML validity

I made a dodgy bash script to check the validity of the web pages on my site. There is probably a better way to do this since the validator is open source, but I couldn't easily see how it works; any advice there is welcome.

The script makes a list of the directories under the specified root directory and then checks the index.html in each directory against the W3C validator. The status of each file is printed to the screen, and error reports are written to a readable text file.
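The directory-discovery step can be sketched in isolation. The throwaway tree under /tmp below is a stand-in for the real ~/www layout, and the prefix-stripping uses the root path directly rather than a hard-coded character offset:

```shell
#!/bin/bash
# Build a throwaway tree that mimics the ~/www layout (these paths are
# stand-ins for illustration, not the real site).
root=$(mktemp -d)
mkdir -p "$root/blog/content" "$root/about/content"

# Same pattern as the script: find the 'content' entries and keep
# their parent directories.
dirs=$(find "$root" -name 'content' -exec dirname {} \;)

# Strip the root prefix; the script hard-codes a character offset
# (${x:32}) for the same purpose, which only works for one root path.
stripped=$(for d in $dirs; do echo "${d#"$root"}"; done | sort)
echo "$stripped"

rm -rf "$root"
```

Using `${d#"$root"}` keeps the sketch working whatever the root's length, where a fixed offset like `${x:32}` breaks as soon as the root path changes.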

These are the contents of the file check_html_recursive:

#!/bin/bash
# Copyright Mathew Peet 2009, please use and modify
# but leave some credit
# This script checks whether the pages are valid XHTML, and puts errors in ~/bin/errors.txt

# Set this to your own site's address; the validator fetches each page from it.
site='http://www.example.com'

myfiles=`find ~/www/ -name 'content' -exec dirname {} \;`
for x in $myfiles
do
    y=${x:32}       # takes the string after the nth character (strips the ~/www prefix)
    echo "checking index in $y directory"
    # The validator URL and $site are reconstructed; the original line was truncated.
    w3m -dump "http://validator.w3.org/check?uri=$site$y/" > ~/bin/temp.txt
    popo=`grep "as XHTML 1.1!" ~/bin/temp.txt`
    echo $popo
    opop='Errors found while checking this document as XHTML 1.1!'
    if [ "$popo" != "$opop" ]; then
        echo "ok?!"
    else
        cat ~/bin/temp.txt >> ~/bin/errors.txt
        echo "$y/index.html does not validate as XHTML 1.1"
    fi
    echo ""
done
echo "Any reported errors written to ~/bin/errors.txt (hopefully)"
rm ~/bin/temp.txt
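The pass/fail decision hinges on comparing the grep result against the validator's error banner. That comparison can be exercised on canned text; the "successfully checked" wording below is an assumption about the validator's output, not captured output:

```shell
#!/bin/bash
# Exercise the script's pass/fail comparison on canned validator output.
# The 'successfully checked' line is an assumed example, for illustration only.
tmp=$(mktemp)
printf 'This document was successfully checked as XHTML 1.1!\n' > "$tmp"

# Same grep and string comparison as in the script.
popo=$(grep "as XHTML 1.1!" "$tmp")
opop='Errors found while checking this document as XHTML 1.1!'
if [ "$popo" != "$opop" ]; then
    verdict="ok?!"
else
    verdict="does not validate"
fi
rm "$tmp"
echo "$verdict"
```

Note how fragile this is: any line containing "as XHTML 1.1!" that is not character-for-character the error banner counts as a pass, so a small wording change on the validator's side would silently mark every page as ok.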
