Posted on May 15, 2021 — 13 Minutes Read
The rest of the code is containerised with Docker Compose for a modular and cloud-native deployment that fits in any microservice architecture, and is shared on GitHub for reference and further development.
Writing a Python Script
The task at hand now is to write a script that instructs your server to read a web page and compare it to a previous read to see whether there has been any update. Many tools and scripting languages suit this purpose, and Python is one of them.
Before writing the script, the latest version of Python and the corresponding package manager need to be properly installed.
$ apt install python3 python3-pip
Once these are set, the relevant libraries may be installed with the package manager as well.
$ pip3 install requests bs4
With everything in place, use the nano text editor to open a new file and start writing the script.
Thereafter goes the script.
Press Control + o to save the script. Enter a name for the new file e.g. web-js-monitor.py, with a full path if it is to be saved in a location other than the current working directory e.g. /app/web-js-monitor.py. Be sure to add .py at the end of the file name to mark the file as a Python script. Press enter to save.
Press Control + x to exit nano.
The key elements will be examined in the discussion that follows.
Before anything else come the import statements, which instruct Python to load objects and functions from standard or installed libraries for the script to build upon.
import requests
...
from sys import path, argv
...
Following object-oriented programming, this script is built around a main class aptly named monitor, within which variables and methods are housed.
class monitor:
    ...
The top part of the monitor object defines the error messages. These messages are referenced throughout the class. Declaring them once, right at the top of the class, makes modification easier should they ever need to be changed.
class monitor:
    ERR_USAGE = '''\
Usage: web-js-monitor.py [-e] -u
Option:
  -h           Display usage
  -e, --email  Send email notification for changes
  -u, --url    The URL of interest
'''
    ...
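To illustrate why declaring the messages once pays off, here is a minimal sketch in which a method refers to class-level messages rather than repeating their text. The class name Demo, the message ERR_NO_URL and the method check are hypothetical and are not taken from the original script.

```python
class Demo:
    # Messages declared once, right at the top of the class.
    ERR_USAGE = 'Usage: web-js-monitor.py [-e] -u'
    ERR_NO_URL = 'Error: no URL supplied.'  # hypothetical message

    def check(self, url):
        # Methods reference the constants, so a wording change
        # only ever has to be made in one place.
        if not url:
            return self.ERR_NO_URL + '\n' + self.ERR_USAGE
        return 'OK'


print(Demo().check(''))
```

Changing the wording of either message then only requires touching the top of the class.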
Downloading and Matching
The rest of the monitor object defines the available methods. First among them is the special __init__ method, which Python calls whenever an instance of the class is created. For the monitor object, this method reads the URL of interest, and whether or not email notification is needed, from the arguments supplied to the script. If email notification is requested, the script also reads the environment variables that supply the Simple Mail Transfer Protocol over SSL (SMTPS) port number, the address of the sender's mail server, the sender and receiver email addresses, and the sender email password.
If Gmail is used as the sender email address, note that for security reasons it does not by default allow access from scripts or applications that do not meet its security standards. For this script to sign in to the sender Gmail account and send an email to the designated recipient, less secure app access needs to be enabled. Follow these steps to turn it on before proceeding.
Comments enclosed in triple quotes (''') or leading with a hashtag (#) are left in the script for easier navigation along the way.
class monitor:
    ...
    def __init__(self, argv):
        ...
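The reading of arguments and environment variables described above might be sketched as follows. This is not the original implementation: the use of getopt, the function names and all five environment variable names are assumptions made for illustration. Only the -h, -e/--email and -u/--url options come from the usage message.

```python
import getopt
import os


def parse_arguments(args):
    """Return (url, email) from a list such as sys.argv[1:]."""
    # 'u:' and 'url=' mean that these options take a value.
    opts, _ = getopt.getopt(args, 'heu:', ['email', 'url='])
    url, email = None, False
    for opt, value in opts:
        if opt in ('-e', '--email'):
            email = True
        elif opt in ('-u', '--url'):
            url = value
    return url, email


def read_email_settings(env=os.environ):
    """Collect the settings needed for the email notification.

    All variable names here are hypothetical placeholders.
    """
    return {
        'port': int(env.get('SMTPS_PORT', '465')),  # 465 is the usual SMTPS port
        'server': env['SMTP_SERVER'],
        'sender': env['SENDER_EMAIL'],
        'receiver': env['RECEIVER_EMAIL'],
        'password': env['SENDER_PASSWORD'],
    }


url, email = parse_arguments(['-e', '-u', 'https://lookingglass.pccwglobal.com/'])
print(url, email)
```

Reading the password from an environment variable, as the script does, keeps it out of the source file and out of the crontab entry.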
Following the __init__ method is a method named match that does precisely what its name says, i.e. matching the generated SHA-256 checksum to that of a previous build. There are three exhaustive outcomes: a match, a mismatch, and a mismatch on the ground of a missing previous build. It is the job of the match method to determine which outcome applies and to call the other two methods that handle the output and the notification as appropriate.
class monitor:
    ...
    def __init__(self, argv):
        ...
    def match(self):
        ...
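The three outcomes can be sketched with the standard hashlib module. The function names, the outcome labels and the decision to store the new checksum immediately are assumptions for illustration. The original method also calls the output and notification methods, which are left out here.

```python
import hashlib
from pathlib import Path


def checksum(content: bytes) -> str:
    """Return the SHA-256 checksum of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def match(content: bytes, checksum_file: Path) -> str:
    """Return 'new', 'match' or 'changed' (hypothetical labels)."""
    current = checksum(content)
    if not checksum_file.exists():
        outcome = 'new'          # no previous build to compare against
    elif checksum_file.read_text().strip() == current:
        outcome = 'match'        # nothing has changed
    else:
        outcome = 'changed'      # the page differs from the previous build
    if outcome != 'match':
        # Assumed behaviour: remember the new checksum for the next run.
        checksum_file.write_text(current)
    return outcome
```

Calling match twice with the same content would be expected to give 'new' and then 'match', and a third call with different content would give 'changed'.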
The two remaining methods that follow, namely __write and __email, manage the output and the notification. These are private methods that are not designed to be called by anything outside of the class itself. Without the preceding match method to determine which of the three outcomes applies, dumping any output or triggering any email notification could have dire consequences. To keep these two methods private to the class, they are named with two leading underscores, which triggers name mangling: Python stores them as _monitor__write and _monitor__email, so a call by the plain name from outside the class fails. This guards against accidental use rather than enforcing strict privacy.
class monitor:
    ...
    def __init__(self, argv):
        ...
    def match(self):
        ...
    def __write(self):
        ...
    def __email(self):
        ...
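The effect of the two leading underscores can be seen in a stripped-down stand-in for the class. The method bodies below are placeholders and do not reflect what the original methods do.

```python
class monitor:
    def match(self):
        # Inside the class, the private method is called by its plain name.
        return self.__write()

    def __write(self):
        return 'written'  # placeholder body


m = monitor()
result = m.match()                        # works: called from inside the class
hidden = not hasattr(m, '__write')        # the plain name is not visible outside
mangled = hasattr(m, '_monitor__write')   # the mangled name still exists
print(result, hidden, mangled)
```

Since the mangled name remains reachable, the double underscore is best read as a strong hint to callers rather than a lock.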
Right after the monitor object is defined comes the instruction to create an instance of it with the arguments supplied when the script was called, if and only if the script is run directly instead of being imported as a library. Otherwise, it is up to the importing script to create the monitor object as appropriate.
if __name__ == '__main__':
    ...
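A minimal sketch of this guard is shown below, with a placeholder class that merely stores its arguments. The main helper and passing sys.argv[1:] are assumptions, as the original body is elided.

```python
import sys


class monitor:
    def __init__(self, args):
        # Placeholder: the real class parses these arguments.
        self.args = args


def main(args):
    return monitor(args)


# True only when the file is executed directly, e.g. with python3,
# and False when it is imported by another script.
if __name__ == '__main__':
    main(sys.argv[1:])
```

An importing script would skip the guarded block and could call main, or create the monitor object itself, with arguments of its own.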
Testing and Troubleshooting
With everything in place, the script can now be tested by simply calling Python to interpret and execute it with the URL of interest e.g. https://lookingglass.pccwglobal.com/.
$ python3 /app/web-js-monitor.py -e -u https://lookingglass.pccwglobal.com/
No output message will be printed if all goes well. In the same directory as the script, there will be a new directory named after the URL of the web page of interest, with the checksum stored in a file directly under it and the rest of the downloaded contents in a nested directory named after the full date and time of the download for ease of access.
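The layout described above might be derived from the URL as in the sketch below. The function name and the timestamp format are assumptions. The checksum file name follows the example path used later in this discussion.

```python
from datetime import datetime
from pathlib import Path
from urllib.parse import urlparse


def output_paths(url, base_dir, now=None):
    """Return (site_dir, checksum_file, download_dir) for a URL."""
    host = urlparse(url).netloc            # e.g. lookingglass.pccwglobal.com
    now = now or datetime.now()
    site_dir = Path(base_dir) / host
    checksum_file = site_dir / f'{host}-sha256hash'
    # Assumed timestamp format; the original format is not shown.
    download_dir = site_dir / now.strftime('%Y-%m-%d_%H-%M-%S')
    return site_dir, checksum_file, download_dir


site, chk, dl = output_paths('https://lookingglass.pccwglobal.com/', '/app')
print(chk.as_posix())
```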
Simulating a change in the web page content is cumbersome, and next to impossible if the web site is outside your control. It is easier to edit the SHA-256 checksum file to simulate a change in the web page.
Overwrite the content of the SHA-256 checksum file with an empty line.
echo > *full-path-to-sha-256-checksum*
$ echo > /app/lookingglass.pccwglobal.com/lookingglass.pccwglobal.com-sha256hash
Executing the script again with Python on the same URL will print no output if all goes well. In the directory named after the URL there will be a fresh set of the downloaded contents, and an email will be sent to the designated recipient noting that a change has been detected and that the updates are stored in separate files.
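For reference, a notification of this kind can be put together with the standard smtplib and email modules. The sketch below is not the original __email method: the function names, the subject line and the wording of the message are assumptions.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_notification(sender, receiver, url):
    """Compose the message without sending it."""
    msg = EmailMessage()
    msg['Subject'] = f'Change detected: {url}'  # hypothetical subject line
    msg['From'] = sender
    msg['To'] = receiver
    msg.set_content(
        f'A change has been detected on {url}\n'
        'The updates are stored in separate files.\n'
    )
    return msg


def send_notification(msg, server, port, password):
    """Send the message over SMTPS; this needs real credentials."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(server, port, context=context) as smtp:
        smtp.login(msg['From'], password)
        smtp.send_message(msg)


msg = build_notification('sender@example.com', 'receiver@example.com',
                         'https://lookingglass.pccwglobal.com/')
print(msg['Subject'])
```

Keeping the composition separate from the sending makes it possible to inspect the message without contacting a mail server.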
Below are some of the common errors.
python3: command not found
Be sure that the latest version of Python is properly installed. Please refer to the previous discussion.
ImportError: No module named requests
Be sure that the Python library, requests, is properly installed. Please refer to the previous discussion.
ImportError: No module named bs4
Be sure that the Python library, Beautiful Soup, is properly installed. Please refer to the previous discussion.
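The two import errors can also be checked for ahead of time. The sketch below uses the standard importlib module to report which of the required libraries cannot be found. The function name is hypothetical.

```python
import importlib.util


def missing_modules(names=('requests', 'bs4')):
    """Return the names of the modules that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


print(missing_modules())
```

An empty list means both libraries are available to the Python interpreter that ran the check.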
Scheduling a Python Script
With the script working as expected, it may now be scheduled to run on a regular basis to monitor the web page of interest for updates. Cron is the task scheduler for Linux. Tasks can be scheduled by adding them to the crontab, which is short for cron table.
To edit the crontab:
$ crontab -e
If asked which text editor should be used to edit the crontab, choose nano or any one of preference.
Add a new schedule at the end of the crontab:
*/15 * * * * python3 *full-path-to-script* -e -u *url-of-interest* >> *full-path-to-script.log* 2>&1
*/15 * * * * python3 /app/web-js-monitor.py -e -u https://lookingglass.pccwglobal.com/ >> /app/web-js-monitor.py.log 2>&1
The first part of the line, i.e. */15 * * * *, informs cron when and at what interval this task should be executed, which in this case is once every 15 minutes. The remaining part of the line tells cron which script to execute when the time comes. It also appends the standard output and standard error of the script to a log file of your choice.
Control + x to exit, and enter to save.
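To make the */15 in the minute field concrete, the short sketch below lists the minutes of each hour at which such a step expression fires. The function name is hypothetical, and this is an illustration of the step rule only, not a cron parser.

```python
def cron_step_minutes(step, field_range=range(60)):
    """Expand a cron step expression such as */15 over the minute field."""
    return [minute for minute in field_range if minute % step == 0]


run_minutes = cron_step_minutes(15)
print(run_minutes)  # [0, 15, 30, 45]
```

The task therefore runs on the hour and at a quarter past, half past and a quarter to every hour.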
Python is far more powerful than this discussion could demonstrate. It should come as no surprise that Python has, by some measures, been one of the most popular programming languages for years on end. Experiment and be amazed.