scm-backup-postfix-start
invokes the following:
scm-backup-repo
scm-backup-trac
scm-backup-folder for the apache-confdir
scm-backup-purge : retain the backups from the last 7 days only
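For orientation, a minimal sketch of how the nightly wrapper might chain these steps; the error handling and the apache-confdir lookup are assumptions, not the actual function body:

# sketch only : illustrates the invocation order described above ; the real
# functions take name/path/stamp arguments and loop over all repos/tracs
scm-backup-nightly-sketch(){
   local msg="=== $FUNCNAME :"
   scm-backup-repo   || { echo $msg repo backup failed ; return 1 ; }
   scm-backup-trac   || { echo $msg trac backup failed ; return 1 ; }
   scm-backup-folder $(apache-confdir) || { echo $msg confdir backup failed ; return 1 ; }
   scm-backup-purge     ## retain the backups from the last 7 days only
}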
scm-recover-all fromnode
In addition to the Trac and SVN repos, this now also recovers users.conf and authz.conf with scm-recover-config, in a careful manner: prompting for confirmation before replacing these critical apache/svn/Trac config files.
scm-recover-config fromnode
Extracts the users.conf and authz.conf from the svnsetup.tar.gz backup file into a temporary location, and compares these temporaries with the corresponding config files within the apache-confdir. If there are preexisting config files, the diffs are shown and a confirmation dialog is required before replacing them with the extractions (a sketch follows the call list below).
This calls:
scm-recover-folders    # contrary to the name this just places a "last" link to identify the last tarball folder
scm-recover-users
scm-recover-authz
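A minimal sketch of the extract/diff/confirm approach, assuming a flat tarball layout and using apache-confdir for the live location; the real paths and member names may differ:

# sketch only : extract one config file from the backup tarball into a
# temporary location, diff against the live file, confirm before replacing
scm-recover-config-sketch(){
   local name=$1                          # eg users.conf or authz.conf
   local tgz=$2                           # path to the svnsetup.tar.gz backup
   local live=$(apache-confdir)/$name     # assumed live location
   local tmp=$(mktemp -d) || return 1
   tar -C $tmp -zxf $tgz $name || return 1
   if [ -f $live ] && ! diff $live $tmp/$name ; then
       read -p "replace $live with the extracted $name ? [y/N] " ans
       [ "$ans" = "y" ] || { rm -rf $tmp ; return 0 ; }
   fi
   cp $tmp/$name $live && rm -rf $tmp
}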
scm-recover-users fromnode
extracts the users file from the last svnsetup tarball; called by scm-recover-all. NB the other svnsetup files are sourced from the repository and contain system specific paths, so it is more direct to re-generate them rather than using the backups.
The users file is different because it is edited through the webadmin interface.
scm-recover-authz fromnode
Analogous to scm-recover-users for the authz file
still experimental .. NEEDS FURTHER CHECKING PRIOR TO REAL USAGE
recovers the users and permissions files from the last backup
scm-recover-lastlinks typ
typ defaults to tar.gz
this must be run from the backup folder that should contain the “last” link, eg:
/var/scm/backup/cms01/tracs/env/last -> 2008/08/14/174749
If the “last” link exists then exit without doing anything; however if the “last” link has been collapsed into a folder (eg by web transfers or non-careful copying) then delete that folder and attempt to recreate the “last” link, pointing it at the directory containing the last file of the given type.
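A sketch of that repair logic, assuming the date-stamped folder layout of the example above:

# sketch only : run from the backup folder that should contain the "last" link
scm-recover-lastlinks-sketch(){
   local typ=${1:-tar.gz}
   [ -L last ] && return 0      # genuine link already present : nothing to do
   [ -d last ] && rm -rf last   # link collapsed into a folder : delete it
   local latest=$(find . -name "*.$typ" | sort | tail -1)
   latest=${latest#./}
   [ -n "$latest" ] && ln -s $(dirname $latest) last
}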
scm-backup-purge from-node number-to-keep
scm-backup-rsync
rsyncs the backups to the paired node. To override and send the backup to a non-standard destination, eg while not inside the home internal network and needing to use G3R:
BACKUP_TAG=G3R scm-backup-rsync
scm-backup-rsync-from-node
rsyncs the backups from a remote node
scm-backup-dybsvn-from-node
copies over the repos for a specific day
scm-backup-eup
updates the env sphinx docs, including the SCM backup tarball monitoring pages and plots.
On repo node C2, this is done automatically via a root crontab running scm-backup-monitor. This means that in order to update the env docs on C2, it must be done as root:
ssh C2 /data/env/system/svn/subversion-1.4.6/bin/svn up ~/env
ssh C2R scm-backup- scm-backup-eup
Note that the rsync LOCKED status is propagated to the remote directory during the rsync transfer, thus avoiding usage during transfers.
Locking now prevents backup/rsync/recover functions, both locally and remotely, from touching partials. The backup procedures use hotcopy, so each backup should be internally consistent, although mismatches between what gets into the Trac instance backup and the SVN repo backup are possible. Such mismatches would not cause corruption, however; probably just warnings from Trac syncing.
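The locking might be implemented along the lines of the following sketch, assuming a LOCKED marker file inside the backup folder, which the rsync then propagates to the remote side along with everything else:

# sketch only : wrap a backup/rsync/recover invocation in a LOCKED marker
scm-backup-lock-sketch(){
   local fold=$1 ; shift
   local lock=$fold/LOCKED
   [ -f $lock ] && { echo $fold is LOCKED, skipping ; return 1 ; }
   date > $lock
   "$@"                        # run the wrapped backup/rsync/recover function
   local rc=$?
   rm -f $lock
   return $rc
}

Remote consumers can then refuse to touch a folder while it contains the LOCKED marker.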
The DNA check ensures that the tarball content immediately after creation corresponds precisely to the tarball at the other end of the transfers.
scm-backup-trac
- scm-tgzcheck-trac : does a tar ztvf to /dev/null, extracts trac.db from the tgz, dumps the Trac SQL using sqlite3
- scm-backup-dna : writes python dict containing md5 digest and size of tgz in sidecar .dna file
scm-backup-repo
- scm-tgzcheck-ztvf : does a tar ztvf to /dev/null
- scm-backup-dna : as above
scm-backup-rsync
- performs a remote DNA check for each paired backup node with scm-backup-dnachecktgzs : finds .tar.gz.dna sidecars and looks for mutants, by comparing the sidecar DNA with recomputed values (sketched below)
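A sketch of writing and checking such a sidecar; the dict format matches the digestpath.py output shown below, while the function names here are illustrative:

# sketch only : the sidecar holds a python dict of md5 digest and byte size
dna-write-sketch(){
   local tgz=$1
   echo "{'dig': '$(md5sum $tgz | cut -d' ' -f1)', 'size': $(stat -c%s $tgz)L}" > $tgz.dna
}
dna-check-sketch(){
   local tgz=$1
   local dna="{'dig': '$(md5sum $tgz | cut -d' ' -f1)', 'size': $(stat -c%s $tgz)L}"
   [ "$dna" = "$(cat $tgz.dna)" ] || echo MUTANT : $tgz
}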
[dayabay] /home/blyth/env > ~/e/base/digestpath.py /home/scm/backup/dayabay/svn/dybaux/2011/10/19/100802/dybaux-5086.tar.gz
{'dig': '7b87e78cc03ea544e2ad3abae46eecd1', 'size': 1915051630L}
[blyth@cms01 ~]$ ~/e/base/digestpath.py /data/var/scm/backup/dayabay/svn/dybaux/2011/10/18/100802/dybaux-5083.tar.gz
{'dig': 'da39aee61a748602a15c98e3db25d008', 'size': 1915004348L}
[blyth@cms01 ~]$ ~/e/base/digestpath.py /data/var/scm/backup/dayabay/svn/dybaux/2011/10/18/100802/dybaux-5083.tar.gz
{'dig': 'da39aee61a748602a15c98e3db25d008', 'size': 1915004348L}
Some time later, there is no change: transfer stalled? Checking the logs shows the error:
=== scm-backup-rsync : quick re-transfer /var/scm/backup/cms02 to C:/data/var/scm/backup/ after unlock
=== scm-backup-rsync : time rsync -e "ssh" --delete-after --stats -razvt /var/scm/backup/cms02 C:/data/var/scm/backup/ --timeout 10
Scientific Linux CERN SLC release 4.8 (Beryllium)
building file list ... done
rsync: mkdir "/data/var/scm/backup" failed: No such file or directory (2)
rsync error: error in file IO (code 11) at main.c(576) [receiver=3.0.6]
rsync: connection unexpectedly closed (8 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(359)
real 0m1.153s
Repeating the rsync command manually works, deleting the backlog of unpurged tarballs:
[root@cms02 log]# rsync -e "ssh" --delete-after --stats -razvt /var/scm/backup/cms02 C:/data/var/scm/backup/ --timeout 10
INFO:env.tools.libfab:ENV setting (key,val) (timeout,2)
INFO:__main__:to check db: echo .dump tgzs | sqlite3 /data/env/local/env/scm/scm_backup_monitor.db
INFO:env.scm.tgz:opening DB /data/env/local/env/scm/scm_backup_monitor.db
INFO:ssh.transport:Connected (version 1.99, client OpenSSH_4.3p2-6.cern-hpn-CERN-4.3p2-6.cern)
INFO:ssh.transport:Authentication (publickey) successful!
INFO:ssh.transport:Secsh channel 1 opened.
monitor cfg: {'HOST': 'C',
'HUB': 'C2',
'dbpath': '$LOCAL_BASE/env/scm/scm_backup_monitor.db',
'email': 'blyth@hep1.phys.ntu.edu.tw simon.c.blyth@gmail.com',
'jspath': '$APACHE_HTDOCS/data/scm_backup_monitor_%(node)s.json',
'reporturl': 'http://dayabay.phys.ntu.edu.tw/e/scm/monitor/%(srvnode)s/',
'select': 'repos/env tracs/env repos/aberdeen tracs/aberdeen repos/tracdev tracs/tracdev repos/heprez tracs/heprez',
'srvnode': 'cms02'}
[C] run: find $SCM_FOLD/backup/cms02 -name '*.gz' -exec du --block-size=1M {} \;
[C] out: /home/blyth/.bash_profile: line 32: /data/env/local/env/home/env.bash: No such file or directory^M
[C] out: /home/blyth/.bash_profile: line 313: sv-: command not found^M
[C] out: /home/blyth/.bash_profile: line 315: python-: command not found^M
[C] out: find: /backup/cms02: No such file or directory^M
Fatal error: run() received nonzero return code 1 while executing!
Changed Aug 2011 : cron job times changed to 15:00 and 09:00 (Beijing time).
Early versions of APR on its 0.9 branch, which Apache 2.0.x and Subversion 1.x use, have no support for copying large files (2GB+). A fix which solves the ‘svnadmin hotcopy’ problem has been applied and is included in APR 0.9.5+ and Apache 2.0.50+. The fix does not work on all platforms, but does work on Linux.
On C2 we are using source-built Apache at /data/env/system/apache/httpd-2.0.63
Note the potential issue of incomplete tarballs; to reduce the chance of these, test the backup functions on a single repo/trac first.
Run as root, eg from C2R:
scm-backup- ## pick up changes
t scm-backup-repo ## check the function
mkdir -p /tmp/bkp
scm-backup-repo newtest /var/scm/repos/newtest /tmp/bkp dummystamp
export LD_LIBRARY_PATH=/data/env/system/sqlite/sqlite-3.3.16/lib:$LD_LIBRARY_PATH ## for the right sqlite, otherwise aborts
scm-backup-trac newtest /var/scm/tracs/newtest /tmp/bkp dummystamp
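Then verify the test tarballs, eg:
ls -l /tmp/bkp
find /tmp/bkp -name '*.tar.gz' -exec tar ztvf {} \; > /dev/null && echo tarballs OK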
Run as root, eg from C2R:
scm-backup-
t scm-backup-all ## check the function
rm -rf /tmp/bkptest ; mkdir -p /tmp/bkptest
export LD_LIBRARY_PATH=/data/env/system/sqlite/sqlite-3.3.16/lib:$LD_LIBRARY_PATH
cd /tmp ; SCM_BACKUP_TEST_FOLD=/tmp/bkptest scm-backup-all
DELETE FROM bitten_log_message WHERE log IN (SELECT id FROM bitten_log WHERE build IN (SELECT id FROM bitten_build WHERE rev < 23000 AND config = 'trunk'))
DELETE FROM bitten_log WHERE build IN (SELECT id FROM bitten_build WHERE rev < 23000 AND config = 'trunk')
DELETE FROM bitten_error WHERE build IN (SELECT id FROM bitten_build WHERE rev < 23000 AND config = 'trunk')
DELETE FROM bitten_step WHERE build IN (SELECT id FROM bitten_build WHERE rev < 23000 AND config = 'trunk')
DELETE FROM bitten_slave WHERE build IN (SELECT id FROM bitten_build WHERE rev < 23000 AND config = 'trunk')
DELETE FROM bitten_build WHERE rev < 23000 AND config = 'trunk'
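The statements above prune old bitten build records (rev < 23000 on the trunk config) to shrink trac.db. A sketch of applying them with sqlite3, working on a copy first and vacuuming to actually reclaim the space; the paths here are illustrative:

cp /var/scm/tracs/dybsvn/db/trac.db /tmp/trac.db      ## work on a copy (path assumed)
sqlite3 /tmp/trac.db < bitten_prune.sql               ## file containing the DELETEs above
sqlite3 /tmp/trac.db "vacuum;"                        ## reclaim the freed space
ls -l /var/scm/tracs/dybsvn/db/trac.db /tmp/trac.db   ## compare sizes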
compare:
scm-backup-du
scm-backup-rls
check base/cron.bash ... usually some environment change has broken the env setup for cron. After modifications, reset the cron backups:
cron-
cron-usage
cron-backup-reset
cron-list root
cron-list blyth
Warning
Usage of cron fabrication is deprecated; it is easier to do this manually
Probably the agent needs restarting. This needs to be done manually after a reboot; see:
ssh--usage
ssh--agent-start
then check offbox passwordless access with:
scm-backup-
scm-backup-rls
Do an emergency backup and rsync, with:
scm-backup-all-as-root
scm-backup-rsync
scm-backup-rls ## check the remote tgz