The following items should be completed to maintain the health of your workstation or server. For compute clusters, please see Common Maintenance Tasks (Clusters).
Back up non-replaceable data
Remember that RAID is not a replacement for backups. If your system is stolen, hacked or catches fire, your data will be gone forever. Automate this task or you will forget.
- For many groups, a weekly or monthly cron job is fine. Write a script calling rsync or tar which writes the files to a separate server, NAS or SAN. Place the script in /etc/cron.weekly/ or /etc/cron.monthly/
- Users with more complex requirements should look at AMANDA or Bacula.
- Tape backup systems are still available for those who prefer them. Contact us.
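As a rough illustration of the cron-job approach above, the following sketch wraps tar in a small function. The function name backup_tree and the example paths /home and /mnt/nas/backups are placeholders chosen for this example, and the destination is assumed to be a mounted share that physically lives on another machine. An rsync-based script would follow the same pattern.

```shell
#!/bin/sh
# Sketch of a weekly backup job. Names and paths are placeholders.

# backup_tree SRC DEST_DIR
# Writes a dated, gzip-compressed tar archive of SRC into DEST_DIR and
# prints the archive path. DEST_DIR is assumed to be a mounted NAS/SAN
# share or other storage on a separate machine.
backup_tree() {
    src=$1
    dest_dir=$2
    [ -d "$src" ] || { echo "backup: source $src not found" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "backup: destination $dest_dir not available" >&2; return 1; }
    archive="$dest_dir/$(basename "$src")-$(date +%Y%m%d).tar.gz"
    # -C makes the paths inside the archive relative to the parent of SRC.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example call (uncomment and adjust both paths):
# backup_tree /home /mnt/nas/backups
```

The script must be executable for cron to run it. On Debian and Ubuntu, files in /etc/cron.weekly/ whose names contain a dot (for example backup.sh) are skipped, so a plain name such as backup is the safer choice.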
Verify the health of the drive arrays (RAIDs)
Drive sectors can go bad silently. Scheduling regular verifies will catch these problems before they lead to data loss. Automate them or you will forget.
- Linux Software RAID (mdadm) arrays can be easily kicked into verify mode. Many distributions (Red Hat, CentOS, Ubuntu) come with their own utilities. To manually start a verify, run this line for each RAID (as root):
echo check > /sys/block/md#/md/sync_action
Watch the text file /proc/mdstat and the output of dmesg to follow the status of each verify.
- Hardware RAID controllers provide their own methods for automated verifies and alert notification. Refer to the controller’s manual.
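For software RAID, the manual command above can be looped over every array so that it is suitable for a cron job. This is only a sketch: the function name start_md_checks is made up for this example, and the optional directory argument exists purely so the function can be tried against a scratch directory instead of the real /sys/block. Arrays that are already busy (resyncing, recovering or checking) are left alone.

```shell
#!/bin/sh
# start_md_checks [SYSFS_BLOCK_DIR]
# Writes "check" to the sync_action file of every md array that is
# currently idle. SYSFS_BLOCK_DIR defaults to /sys/block.
start_md_checks() {
    root=${1:-/sys/block}
    for action in "$root"/md*/md/sync_action; do
        [ -e "$action" ] || continue          # glob matched nothing
        array_dir=${action%/md/sync_action}   # e.g. /sys/block/md0
        if [ "$(cat "$action")" = "idle" ]; then
            echo check > "$action"
            echo "started check on $(basename "$array_dir")"
        fi
    done
}

# Example call (as root, uncomment to use):
# start_md_checks
```

Once a check finishes, the file mismatch_cnt in the same md directory reports how many inconsistent sectors were found.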
Monitor system alarms and system health
- Preferred: learn how to use the IPMI capability of your system for remote monitoring and management. You’ll spend a lot less time trekking to the datacenter.
- Alternative: listen for system alarms and check for warning LEDs.
Don’t ignore alarms! If you put it off, you’ll soon find that something else is wrong and the system needs major repair.
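One possible starting point for IPMI-based monitoring is the ipmitool utility, which can list sensor readings with "ipmitool sdr elist". The sketch below filters that listing down to sensors that need attention. It assumes the usual pipe-separated layout (name | id | status | entity | reading) with a status of "ok" for healthy sensors and "ns" for sensors with no reading; the function name flag_bad_sensors is made up for this example, and the exact output may differ between BMC vendors.

```shell
#!/bin/sh
# flag_bad_sensors
# Reads "ipmitool sdr elist"-style lines on stdin and prints every sensor
# whose status column (third field) is neither "ok" nor "ns".
# Returns 1 if any sensor was flagged, 0 otherwise.
flag_bad_sensors() {
    awk -F'|' '
        NF >= 3 {
            status = $3
            gsub(/[ \t]/, "", status)   # strip padding around the status
            if (status != "ok" && status != "ns") { print; bad = 1 }
        }
        END { exit bad }
    '
}

# Example call against the local BMC (needs root and the ipmitool package):
# ipmitool sdr elist | flag_bad_sensors || echo "sensor alarm on $(hostname)"
```

The same listing can be requested over the network with "ipmitool -I lanplus -H <bmc-address> -U <user> sdr elist", which is what makes remote monitoring possible without a trip to the datacenter.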