Warning
Keeping this operational with the ancient Python 2.3 is advantageous.
Simple monitoring and recording of the output of commands that return a single value or a dict string. The results are stored with a timestamp in a SQLite DB.
Usage with the diskmon section is shown below. The section name must match a section in the config file, which defaults to ~/.env.cnf:
valmon.py -s diskmon rec rep mon
Usage from cron:
52 * * * * ( valmon.py -s diskmon rec rep mon ) > $CRONLOG_DIR/diskmon.log 2>&1
Using the valmon.py script and the env python modules that it is based upon requires them to be installed as described at Installing env. Essentially this just requires a symbolic link from python site-packages and a PATH setting that gives easy access to the scripts, e.g. from /root/env/bin.
The value monitoring in valmon.py is kept generic: the specifics of obtaining the values are handled within the command called, and the constraints to apply to them are chosen within the config.
For example the diskmon section uses the disk_usage.py script which returns a dict string:
[blyth@cms01 e]$ disk_usage.py
{'gb_total': '131.74', 'gb_free': '24.90', 'percent_free': '18.90', 'percent_used': '76.02'}
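A minimal sketch of how such a dict string could be turned into numbers that a constraint like gb_free > 10 can test. This is not taken from valmon.py: the helper name parse_dict_string is hypothetical, and ast.literal_eval needs a much newer Python than the 2.3 mentioned above.

```python
import ast

def parse_dict_string(text):
    """Parse a dict string and coerce values to float where possible."""
    raw = ast.literal_eval(text.strip())  # literals only, no code execution
    if not isinstance(raw, dict):
        raise ValueError("expected a dict literal, got %s" % type(raw).__name__)
    parsed = {}
    for key, value in raw.items():
        try:
            parsed[key] = float(value)
        except (TypeError, ValueError):
            parsed[key] = value  # leave non-numeric values untouched
    return parsed

# Sample copied from the disk_usage.py output shown above
sample = "{'gb_total': '131.74', 'gb_free': '24.90', 'percent_free': '18.90', 'percent_used': '76.02'}"
usage = parse_dict_string(sample)
print(usage["gb_free"] > 10)
```

The values arrive as quoted strings, so without the float coercion a comparison such as gb_free > 10 would compare a string with a number.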
Other sections, like oomon, monitor the single integer returned by a command such as:
[root@cms02 ~]# grep oom /var/log/messages | wc -l
0
This approach allows the value monitoring and persistence framework to be reused for monitoring any quantity which commands or scripts can be written to obtain.
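As an illustration of that reuse, a generic runner only needs the command line and gives back the output and return code. The sketch below assumes a modern Python with subprocess.run and a POSIX shell; run_command is a hypothetical name, not the function used inside valmon.py.

```python
import subprocess

def run_command(cmd, timeout=30):
    """Run a shell command line, returning (stripped stdout, return code)."""
    proc = subprocess.run(
        cmd,
        shell=True,               # needed for pipelines such as "grep ... | wc -l"
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
        timeout=timeout,
    )
    return proc.stdout.strip(), proc.returncode

# A stand-in pipeline; any command printing a single integer works the same way
out, rc = run_command("echo 3 | wc -l")
val = int(out)
print(val, rc)
```

With a pipeline the return code is that of the last command, so a constraint on rc is most useful for single commands such as pgrep. Because shell=True hands the string to the shell, the command should only ever come from a trusted config file.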
The variables available in context which may be constrained correspond to the fields in the table. These have changed through various versions.
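Since the actual fields are not listed here and have varied between versions, the following is only a hypothetical layout showing how a value could be stored with a timestamp in SQLite. The column names date, val, rc and ret, and the record helper, are assumptions made for illustration.

```python
import sqlite3
import time

def record(conn, table, val, rc, ret):
    """Append one timestamped observation to the named table."""
    # Table names cannot be bound as SQL parameters, so validate before interpolating
    if not table.isidentifier():
        raise ValueError("unsafe table name: %r" % table)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS %s "
        "(date TEXT, val REAL, rc INTEGER, ret TEXT)" % table
    )
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    conn.execute(
        "INSERT INTO %s (date, val, rc, ret) VALUES (?, ?, ?, ?)" % table,
        (stamp, val, rc, ret),
    )
    conn.commit()
    return stamp

conn = sqlite3.connect(":memory:")  # a real setup would use a file such as the dbpath setting
record(conn, "oomon", 0, 0, "0")
rows = conn.execute("SELECT date, val, rc, ret FROM oomon").fetchall()
print(rows)
```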
The command to run and the constraints applied to what it returns are obtained from config. This approach is taken to allow most typical changes of varying constraints to be done via configuration only.
Examples:
[oomon]
note = despite notification being enabled this failed to notify me, apparently the C2 OOM issue made the machine incapable of sending email ?
cmd = grep oom /var/log/messages | wc -l
return = int
constraints = ( val == 0, )
dbpath = ~/.env/oomon.sqlite
tn = oomon
[diskmon]
note = stores the dict returned by the command as a string in the DB without interpretation
cmd = disk_usage.py /data
valmon_version = 0.2
return = dict
constraints = ( gb_free > 10, )
dbpath = ~/.env/envmon.sqlite
tn = diskmon
[sshagent_mon]
note = require an sshagent process is running by constraining the return code from the pgrep command
valmon_version = 0.2
email = blyth@hep1.phys.ntu.edu.tw
cmd = pgrep ssh-agent
return = int
constraints = ( rc == 0, )
dbpath = ~/.env/sshagent_mon.sqlite
tn = sshagent_mon
[dbsrvmon]
note = currently set to fail via age
chdir = /var/dbbackup/dbsrv/belle7.nuu.edu.tw/channelquality_db_belle7/archive/10000
cmd = digestpath.py
valmon_version = 0.2
return = dict
constraints = ( tarball_count >= 34, dna_mismatch == 0, age < 86400 , age < 1000, )
dbpath = ~/.env/dbsrvmon.sqlite
tn = channelquality_db
[envmon]
note = check C2 server from cron on other nodes
hostport = dayabay.phys.ntu.edu.tw
# from N need to get to C2 via nginx reverse proxy on H
#hostport = hfag.phys.ntu.edu.tw:90
cmd = curl -s --connect-timeout 3 http://%(hostport)s/repos/env/ | grep trunk | wc -l
return = int
constraints = ( val == 1, )
instruction = require a single trunk to be found, verifying that the apache interface to SVN is working
observations = May 16, 2013: observing variable response times that are triggering notifications with a 3s timeout
dbpath = ~/.env/envmon.sqlite
tn = envmon
[envmon_demo]
note = check C2 server from cron on C,
cmd = curl -s --connect-timeout 3 http://dayabay.phys.ntu.edu.tw/repos/env/ | grep trunk | wc -l
return = int
valmin = -100
valmax = 100
constraints = ( val == 1 and val < valmax, val > valmin , val < valmax )
instruction =
the simple python `constraints` expression is evaluated within the scope of
the section config values (with things that can be coerced to floats so coerced)
the constraint needs to evaluate to a tuple of one or more bools.
To specify a one element tuple a trailing comma is needed, eg "( val > valmin, )"
dbpath = ~/.env/envmon.sqlite
tn = envmon
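The evaluation described in the envmon_demo instruction can be sketched as follows. This is an illustration under assumptions, not the valmon.py code: it uses the Python 3 configparser module, and the names coerce and check_constraints are hypothetical.

```python
import configparser

# Trimmed copy of the envmon_demo section above
CONFIG_TEXT = """
[envmon_demo]
return = int
valmin = -100
valmax = 100
constraints = ( val == 1 and val < valmax, val > valmin , val < valmax )
"""

def coerce(value):
    """Coerce to float when possible, otherwise keep the original string."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value

def check_constraints(section, observed):
    """Evaluate the constraints expression within the scope of the section values."""
    scope = {key: coerce(value) for key, value in section.items()}
    scope.update(observed)  # measured values such as val and rc
    # Emptying __builtins__ is not a sandbox: the config must be trusted
    results = eval(section["constraints"], {"__builtins__": {}}, scope)
    if not isinstance(results, tuple):
        raise ValueError("constraints must evaluate to a tuple, "
                         "a single constraint needs a trailing comma")
    return tuple(bool(r) for r in results)

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
section = parser["envmon_demo"]
ok = check_constraints(section, {"val": 1, "rc": 0})
bad = check_constraints(section, {"val": 0, "rc": 0})
print(ok, bad)
```

Returning one bool per constraint, rather than a single combined result, keeps it possible to report which constraint failed.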
When forced to use a source-built Python rather than the system Python 2.3 on C2, the cron environment had to be set up accordingly:
SHELL=/bin/bash
HOME=/home/blyth
ENV_HOME=/home/blyth/env
CRONLOG_DIR=/home/blyth/cronlog
PATH=/home/blyth/env/bin:/data/env/system/python/Python-2.5.1/bin:/usr/bin:/bin
LD_LIBRARY_PATH=/data/env/system/python/Python-2.5.1/lib
42 * * * * ( valmon.py -s envmon rec rep mon ) > $CRONLOG_DIR/envmon.log 2>&1
This complication was later avoided by running yum install python-sqlite2; see simtab for notes on this.