Overview

This article will walk you through configuring your server to monitor any web page on the internet and receive an email notification when the web page has been updated.

Connecting to the Server with SSH

First, let’s use SSH to connect to your remote server. See how to lease a server from a hosting provider here if you do not yet have one.

The default username for Ubuntu on Amazon Web Services is ubuntu.

ssh -i PATH-TO-CERTIFICATE ubuntu@SERVER-ADDRESS

ssh -i /User/Kurt/rsa-cert.pem ubuntu@203.0.113.10

Change the permissions of the certificate file to 600 if prompted with “Permissions 0644 for ‘/User/Kurt/Desktop/rsa-cert.pem’ are too open.”

chmod 600 PATH-TO-CERTIFICATE

chmod 600 /User/Kurt/Desktop/rsa-cert.pem

Or, if you have enabled password authentication:

ssh ubuntu@SERVER-ADDRESS

ssh ubuntu@203.0.113.10

Then enter the password at the prompt.

Writing a Bash script

Now you need to write a script to tell your server to read a web page, and to compare it to a previous read to see if there has been any change.

There are many tools and scripting languages that one can use to automate tasks on a server, and Bash is one of them.

Bash is one of the most widely used scripting languages and the default login shell on most Linux distributions. When you SSH into your Linux server, you are already using Bash.

Use the nano editor to open a new file and start writing the script.

sudo nano

The first line to add to the script shall be:

#!/bin/bash

This informs the system that this is a Bash script and is to be interpreted with Bash. This line is called a shebang.

After the shebang comes the rest of the script:

#!/bin/bash

# URL of the web page to monitor
URL="https://www.google.com"

# Full path of the directory the script lives in, and the script's file name
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
SCRIPT="$(basename "$(test -L "$0" && readlink "$0" || echo "$0")")"

# Name of the file that houses the signature of the web page content
SHA="$SCRIPT-sha"

# Download the web page and compute a SHA-256 signature of its content
cOUT="$( curl -s -L "$URL" )"
sOUT="$( echo "$cOUT" | sha256sum )"

if [ ! -f "$DIR/$SHA" ]; then
    # First run (or the signature file was lost): build the signature file
    echo "$sOUT" > "$DIR/$SHA"
else
    # Compare the current content against the stored signature
    if [[ $(echo "$cOUT" | sha256sum -c "$DIR/$SHA" --quiet) ]]; then
        # The page has changed: keep a copy, update the signature and send an email alert
        echo "$cOUT" > "$DIR/$SCRIPT-curl-output-$( date +"%Y-%m-%d-%H-%M-%S" )"
        echo "$sOUT" > "$DIR/$SHA"
        echo "Each update is stored in a separate file." | mail -s "Google.com homepage has been updated" -r sender@example.com recipient@example.com
        # crontab -l | grep -v "$SCRIPT" | crontab -
    fi
fi

Control + o to save the script. Enter a name for the new file, e.g. web-monitor-google, with a full path if it is to be saved in a location other than the current working directory, e.g. /home/kurt/web-monitor-google. Press “enter” to save.

Control + x to exit nano.

There are many elements in the script. Let’s look at them one by one.

Declaring Variables

The top part of the script declares a number of variables and sets their values accordingly.

URL="https://www.google.com"

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
SCRIPT="$(basename "$(test -L "$0" && readlink "$0" || echo "$0")")"

SHA="$SCRIPT-sha"

cOUT="$( curl -s -L $URL)"
sOUT="$( echo $cOUT | sha256sum )"

A variable can house a number, a text, or a collection of them. Once a variable is declared and set, it can be read over and over again throughout the script.

One benefit of a variable is that it allows you to store the output of a resource-heavy command or computation so whenever the output is needed throughout the script, it can be read from the variable instead of running the command or computation all over again.
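As a minimal illustration of this (the variable name and commands below are for demonstration only and are not part of the monitoring script), a command's output can be captured once and then read as many times as needed:

#!/bin/bash

# Capture the output of a command once.
TODAY="$( date +"%Y-%m-%d" )"

# Read it as many times as needed without re-running the command.
echo "Backup started on $TODAY"
echo "Log file: backup-$TODAY.log"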

For example the cURL command is used to download the content of the web page of interest. The content is then stored in a variable called cOUT. Whenever the script needs to read the content of the web page, instead of running the cURL command again to re-download it, the script can simply read the web page content from the cOUT variable. This reduces resource and bandwidth usage. It makes your script run faster.

Another benefit of a variable is that it makes future modification easier.

For example the URL of the web page of interest is stored in a variable declared right at the beginning of the script. Whenever the script needs to refer to the URL of the web page, it reads from the variable. If later on, this script is to be modified to monitor another web page, the URL can simply be changed in the variable, right at the beginning of the script, instead of having to go through the entire script and change every single mention of the URL. This saves time.

URL: the URL of the web page of interest

The first variable is the URL.

URL="https://www.google.com"

A variable called URL is declared and is set to the URL of the web page of interest. In this example, it will be https://www.google.com.

DIR: the full path of the script

The second variable is the DIR.

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

This DIR variable runs a series of commands to find out the full path of the script and stores it.

The script will be using a number of commands to download the web page of interest and generate a number of runtime files.

These runtime files include a file that houses a signature of the web page content so it can be compared to in subsequent runs, copies of the web page content whenever a change is detected, and of course a log of any error the script encounters.

It is a good practice to store these runtime files in the same directory as the script.

With the full path of the script stored in this DIR variable, whenever the script needs to save or read a runtime file, it can read the path from this variable.

SCRIPT: the name of the script file

The third variable is the SCRIPT.

SCRIPT="$(basename "$(test -L "$0" && readlink "$0" || echo "$0")")"

This SCRIPT variable runs a series of commands to find out the name of the script file and stores it.

Throughout the script, a number of runtime files will be generated. It is a good practice to use the name of the script file as a prefix for naming these runtime files.

With the name of the script file stored in this SCRIPT variable, whenever the script needs to save or read a runtime file that uses the script's name as a prefix, it can read the prefix from this variable.
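For example, assuming the script is saved as /home/kurt/web-monitor-google, the DIR variable holds /home/kurt and the SCRIPT variable holds web-monitor-google, so a runtime file can be addressed like this (the -example suffix is purely illustrative):

# Writes to /home/kurt/web-monitor-google-example
echo "runtime data" > "$DIR/$SCRIPT-example"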

SHA: the name of the file that houses a signature of the web page content

The fourth variable is the SHA.

SHA="$SCRIPT-sha"

This SHA variable constructs the name of the file that houses a signature of the web page content so it can be compared to in subsequent runs.

It uses the name of the script file as a prefix for naming the file that houses the web page signature.

cOUT: the current web page content

The fifth variable is the cOUT.

cOUT="$( curl -s -L $URL)"

This cOUT variable executes the cURL command to download the content of the web page specified by the URL variable and stores it.

Whenever the script needs to read the content of the web page, instead of running the cURL command again to re-download the web page, it can simply read the web page content from the cOUT variable.

sOUT: the signature of the current web page content

The last variable is the sOUT.

sOUT="$( echo $cOUT | sha256sum )"

This sOUT variable does three things.

First, it prints out the cOUT variable, which is the content of the web page of interest. Secondly, it executes the sha256sum command to generate a signature of the web page content. Lastly, it stores the signature.

Whenever the script needs to read the signature of the web page content, instead of re-downloading the web page and running the sha256sum command again to re-generate it, it can simply read the signature from the sOUT variable.
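To see what such a signature looks like, you can run sha256sum by hand on any input. The output is the SHA-256 hash followed by the name of the input, where a trailing “-” means the input came from standard input:

echo "hello" | sha256sum
5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03  -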

Performing Operations

The bottom part of the script performs operations on the variables to determine whether the web page has been updated, while logging the operation and storing a copy of the new content for reference.

if [ ! -f "$DIR/$SHA" ]; then
    echo "$sOUT" > "$DIR/$SHA"
else
    if [[ $(echo "$cOUT" | sha256sum -c "$DIR/$SHA" --quiet) ]]; then
        echo "$cOUT" > "$DIR/$SCRIPT-curl-output-$( date +"%Y-%m-%d-%H-%M-%S" )"
        echo "$sOUT" > "$DIR/$SHA"
        echo "Each update is stored in a separate file." | mail -s "Google.com homepage has been updated" -r sender@example.com recipient@example.com
        # crontab -l | grep -v "$SCRIPT" | crontab -
    fi
fi

Build the signature file on first run

The first IF statement checks if the file that houses a signature of the web page content exists.

if [ ! -f "$DIR/$SHA" ]; then
    echo "$sOUT" > "$DIR/$SHA"

The signature is generated by SHA-256 checksum and is to be compared to in subsequent runs to spot changes in the web page content.

If the signature file does not exist, this means either this is the first time the script runs, or that the signature file has been lost. The script should then build the signature file using the current web page content.

If, however, the signature file exists, then the script should proceed to compare the signature of the current web page content to the one in the signature file.

Compare the current web page content to a previous run

The sha256sum command is executed to compare the signatures.

else
    if [[ $(echo "$cOUT" | sha256sum -c "$DIR/$SHA" --quiet) ]]; then

The --quiet option instructs it to produce no output if the signatures match.

If there is no output, this means there has been no change to the web page content.

If there is output, this means there have been changes to the web page content. The script should then proceed accordingly.
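For reference, when the signatures do not match, the check produces output roughly like the following; the first line goes to standard output, the warning goes to standard error, and the exact wording may vary between versions:

-: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match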

Log the new content, update the signature file and send email notification

Now that there have been changes to the web page content, the script should proceed to log the change, update the signature file with the new signature, and send an email alert.

        echo "$cOUT" > "$DIR/$SCRIPT-curl-output-$( date +"%Y-%m-%d-%H-%M-%S" )"
        echo "$sOUT" > "$DIR/$SHA"
        echo "Each update is stored in a separate file." | mail -s "Google.com homepage has been updated" -r sender@example.com recipient@example.com
        # crontab -l | grep -v "$SCRIPT" | crontab -
    fi
fi

The first line here outputs the new web page content to a file. Its name is prefixed by the name of the script file, followed by “-curl-output-” and the current date and time.

The second line then updates the signature file with the new signature.

The third line uses the mail command to send an email to recipient@example.com from sender@example.com. The email is titled “Google.com homepage has been updated” and says “Each update is stored in a separate file.” in the email body.

The last line is prefixed by a hash sign (#). Bash is designed to disregard any text after a hash sign (#).

This allows you to insert text in the script as a comment, to make a note about a particular line, or to explain to your future self or anyone else using or developing the script what a particular line is intended to do.
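A minimal illustration of comments, not part of the monitoring script:

# This entire line is ignored by Bash.
echo "This line still runs."   # Everything after the # is ignored too.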

Later on, you will be adding this script to the system task scheduler, which will execute it on a regular schedule to monitor the web page.

This last line modifies the cron table to remove the script from the execution schedule.

If you would like the script to stop being executed on a regular schedule once a change is spotted in the web page content, remove the hash sign (#) from this last line so that Bash will execute it when the web page content changes, removing the script from the execution schedule.

This is not needed now. It may be needed in the future. The line is therefore commented out, so that it can be toggled on and off with a single keystroke.

Testing and Troubleshooting

Now the script can be tested.

The first thing to do is to give the script execute permission:

sudo chmod +x /full/path/to/script

sudo chmod +x /home/kurt/web-monitor-google

To execute the script, simply call it in your Bash shell:

/home/kurt/web-monitor-google

If all goes well, there should be no output message, and in the same directory as the script, there will be a new file, with a file name prefixed by the name of your script, followed by “-sha”, e.g. web-monitor-google-sha.

In it, there is a signature of the web page content generated by SHA-256 checksum.

This means the script has detected the absence of the signature file and built it from scratch.

To simulate a change in the web page content, use the nano editor to open the signature file and change it.

sudo nano /full/path/to/signature-file

sudo nano /home/kurt/web-monitor-google-sha

Simply change the first character into any other hexadecimal digit (0–9 or a–f). When done, control + x to exit, press “y” and “enter” to save.

Now execute the script again by calling it in your Bash shell:

/home/kurt/web-monitor-google

If all goes well, there should be an output message saying:

sha256sum: WARNING: 1 computed checksum did NOT match

This is expected of the script.

In the same directory as the script, there will be a new file, with a file name prefixed by the name of your script, followed by “-curl-output-” and the current date and time, e.g. web-monitor-google-curl-output-2017-01-01-00-00-00.

In it, there is a copy of the web page content.

You should also receive an email alerting you that the web page has been updated.

Once everything is good, you can clean up the signature file and the curl output:

sudo rm /full/path/to/signature-file /full/path/to/curl-output

sudo rm /home/kurt/web-monitor-google-sha /home/kurt/web-monitor-google-curl-output-2017-01-01-00-00-00

Below are some of the common errors.

Bash: permission denied
-bash: /full/path/to/script: Permission denied

Please make sure that your script has execute permission. To add execute permission:

sudo chmod +x /full/path/to/script

sudo chmod +x /home/kurt/web-monitor-google

curl: command not found
/full/path/to/script: line n: curl: command not found

Please make sure curl is installed on your server. Alternatively, use any other tool available on your server that can download a web page by URL.
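On Ubuntu, for example, curl can typically be installed with apt:

sudo apt-get install curl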

sha256sum: command not found
/full/path/to/script: line n: sha256sum: command not found

Please make sure sha256sum is installed on your server. Alternatively, use any other secure hash algorithm available on your server.

mail: command not found or error of any kind

Please make sure a mail transfer agent is properly installed on your server.

Digital Ocean has a lovely guide on setting up Postfix on Ubuntu 16.04.
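On Ubuntu, the mail command itself is commonly provided by the mailutils package, which can be installed alongside a mail transfer agent such as Postfix; this is one common setup and yours may differ:

sudo apt-get install mailutils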

Scheduling a Bash script

Now that the script works as expected, it can be scheduled to run on a regular basis to monitor the web page of interest for change.

Cron is the task scheduler for Linux. Tasks can be scheduled by adding them to the crontab, which is short for cron table.

To edit the crontab and add your Bash script there, do:

crontab -e

You may be asked which text editor should be used to edit the crontab. Select nano or your preferred one.

At the end of the crontab, add:

*/15 * * * * /full/path/to/script >> /full/path/to/script.log 2>&1

*/15 * * * * /home/kurt/web-monitor-google >> /home/kurt/web-monitor-google.log 2>&1

This line tells cron when this task should be executed and which script to execute when the time comes. It also redirects the output and errors of the script to a file of your choice for logging.

*/15 * * * * means that cron will run this every 15 minutes. Refer here for more information on scheduling.
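For reference only (you do not need to add these), here is how a few other common schedules would look with the same script and log file:

# Every hour, on the hour
0 * * * * /home/kurt/web-monitor-google >> /home/kurt/web-monitor-google.log 2>&1

# Every day at 06:30
30 6 * * * /home/kurt/web-monitor-google >> /home/kurt/web-monitor-google.log 2>&1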

When done, control + x to exit, press “y” and “enter” to save.

To stop the script from executing on a regular basis, simply edit the crontab and remove this line.

To have the script remove itself from the crontab once it spots an update to the web page, add or uncomment this line in the script:

crontab -l | grep -v "$SCRIPT" | crontab -
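Broken down, this pipeline works in three steps. The first two lines below are shown only for illustration; the full pipeline on the last line is what actually modifies the crontab, and it relies on the SCRIPT variable that is set inside the script:

crontab -l                                   # print the current cron table
crontab -l | grep -v "$SCRIPT"               # print it again, minus any line that mentions the script
crontab -l | grep -v "$SCRIPT" | crontab -   # install that filtered table as the new crontab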

Conclusion

This is just one example of using a simple Bash script and a system scheduler to automate tasks. There are many more amazing things one can do with a server.

Explore and be amazed.