Sunday, August 24, 2014

Part III : ZFS - Scripts - Howto: Ubuntu Home Server (ZFS + virtual IpFire)

Setting up hardware monitoring and webmin 

Table of Contents:

Why hardware monitoring
CPU & Chipset temperature + fanspeed
HDD temperature
SMART monitoring
ZFS Health monitoring
Scheduling via CRON
Management via the browser



Why hardware monitoring

Modern consumer hardware is equipped with a number of monitoring sensors. Given that this server will be running 24/7 and hosting some of my most important data, I would like the peace of mind that nothing will go wrong unnoticed. Moreover, Linux can send an email upon disk failure, hardware failure, etc.

CPU & Chipset temperature + fanspeed

CPU and chipset temperatures and fan speeds are monitored by lm-sensors. Webmin (which we will use for our combined interface) also supports these. Installation is easy (after installing, run 'sudo sensors-detect' once so the right kernel modules are loaded):

sudo apt-get install lm-sensors

HDD temperature

HDD temperature can be monitored via the hddtemp daemon. Installation is again easy:

sudo apt-get install hddtemp

Automatically reading sensors

Both sensors and hddtemp are automagically read every hour by a cron job using the script below. Name the script e.g. sensor-reading.sh and place it in '/usr/sbin'. chmod +x so it is executable. The results are written to a file in the ZFS pool.
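To see the directory and file names the script will produce, you can try the date formats on their own first (a quick sketch; the formats match the ones used in the script):

```shell
# Directory is named after the day, file after the time of the run,
# e.g. 2014_08_24/20_30.txt
DIR=$(date +"%Y_%m_%d")
FILENAME=$(date +"%H_%M")
echo "$DIR/$FILENAME.txt"
```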

#!/bin/sh
# script by Gertdus
# Logs sensor and HDD temperature readings to a dated file in the ZFS pool.

date +"%H_%M_%d_%m_%Y"

cd /zfspool/ServerLogs || exit 1

DIR=$(date +"%Y_%m_%d")        # one directory per day
FILENAME=$(date +"%H_%M")      # one file per run

echo "$FILENAME"
echo "$DIR"

mkdir -p "$DIR"                # -p: do not fail if the directory already exists

cd "$DIR" || exit 1

date > "$FILENAME"

sensors >> "$FILENAME"
for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    hddtemp /dev/$disk >> "$FILENAME"
done

mv "$FILENAME" "$FILENAME.txt"

SMART monitoring

Finally, SMART data is monitored via smartmontools; again, install:

sudo apt-get install smartmontools

I use a daily and a monthly script: a daily read-out, and a monthly short self-test plus read-out. Results are written to log files in the ZFS pool ('/zfspool/ServerLogs/').

Daily script; name it e.g. smart-test.sh and place it in '/usr/sbin'. chmod +x so it is executable.

#!/bin/sh
# Script by Meliorator. irc://irc.freenode.net/Meliorator
# modified by Ranpha and Gertdus
# Daily SMART read-out for all drives.

cd /zfspool/ServerLogs || exit 1

DIR=$(date +"%Y_%m_%d")
echo "$DIR"

mkdir -p "$DIR"
cd "$DIR" || exit 1
mkdir -p smart-logs

for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    smartctl -a /dev/$disk >> smart-logs/smart-$disk.log
done


And the monthly script; name it e.g. smart-test_monthly.sh and place it in '/usr/sbin'. chmod +x so it is executable.


#!/bin/sh
# Script by Meliorator. irc://irc.freenode.net/Meliorator
# modified by Ranpha and Gertdus
# Monthly SMART short self-test plus read-out for all drives.

cd /zfspool/ServerLogs || exit 1

DIR=$(date +"%Y_%m_%d")
echo "$DIR"

mkdir -p "$DIR"
cd "$DIR" || exit 1
mkdir -p smart-logs

# Start a short self-test on each drive, staggered a little
for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    smartctl -t short -d ata /dev/$disk >> smart-logs/smart-$disk.log
    sleep 5
done

# A short self-test takes about two minutes; wait for the tests to
# finish before reading out the results and the self-test log
sleep 180

for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    smartctl -a /dev/$disk >> smart-logs/smart-$disk.log
done




ZFS Health monitoring

ZFS health is monitored by the script from calomel.org, with one change of mine: when all is well, the script prints "ZFS Healthy", so cron sends an email saying all is well.

Name the script e.g. zfs-health.sh and place it in '/usr/sbin'. chmod +x so it is executable.

#! /bin/bash
#
# Calomel.org
#     https://calomel.org/zfs_health_check_script.html
#     FreeBSD 9.1 ZFS Health Check script
#     zfs_health.sh @ Version 0.15


# Check health of ZFS volumes and drives. On any faults send email. In FreeBSD
# 10 there is supposed to be a ZFSd daemon to monitor the health of the ZFS
# pools. For now, in FreeBSD 9, we will make our own checks and run this script
# through cron a few times a day.


# 99 problems but ZFS aint one
problems=0


# Health - Check if all zfs volumes are in good condition. We are looking for
# any keyword signifying a degraded or broken array.


condition=$(/sbin/zpool status | egrep -i '(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover)')
if [ "${condition}" ]; then
       emailSubject="`hostname` - ZFS pool - HEALTH fault"
       problems=1
fi


# Capacity - Make sure pool capacities are below 80% for best performance. The
# percentage really depends on how large your volume is. If you have a 128GB
# SSD then 80% is reasonable. If you have a 60TB raid-z2 array then you can
# probably set the warning closer to 95%.
#
# ZFS uses a copy-on-write scheme. The file system writes new data to
# sequential free blocks first and when the uberblock has been updated the new
# inode pointers become valid. This method is true only when the pool has
# enough free sequential blocks. If the pool is at capacity and space limited,
# ZFS will have to randomly write blocks. This means ZFS can not create an
# optimal set of sequential writes and write performance is severely impacted.


maxCapacity=80


if [ ${problems} -eq 0 ]; then
  capacity=$(/sbin/zpool list -H -o capacity)
  for line in ${capacity//%/}
    do
      if [ $line -ge $maxCapacity ]; then
        emailSubject="`hostname` - ZFS pool - Capacity Exceeded"
        problems=1
      fi
    done
fi


# Errors - Check the columns for READ, WRITE and CKSUM (checksum) drive errors
# on all volumes and all drives using "zpool status". If any non-zero errors
# are reported an email will be sent out. You should then look to replace the
# faulty drive and run "zpool scrub" on the affected volume after resilvering.


if [ ${problems} -eq 0 ]; then
  errors=$(/sbin/zpool status | grep ONLINE | grep -v state | awk '{print $3 $4 $5}' | grep -v 000)
  if [ "${errors}" ]; then
       emailSubject="`hostname` - ZFS pool - Drive Errors"
       problems=1
  fi
fi


# Scrub Expired - Check if all volumes have been scrubbed in at least the last
# 30 days. The general guide is to scrub volumes on desktop quality drives once
# a week and volumes on enterprise class drives once a month. You can always
# use cron to schedule "zpool scrub" in off hours. We scrub our volumes every
# Sunday morning for example.
#
# Scrubbing traverses all the data in the pool once and verifies all blocks can
# be read. Scrubbing proceeds as fast as the devices allow, though the
# priority of any I/O remains below that of normal calls. This operation might
# negatively impact performance, but the file system will remain usable and
# responsive while scrubbing occurs. To initiate an explicit scrub, use the
# "zpool scrub" command.
#
# The scrubExpire variable is in seconds. The original script used 8 days
# (691200 seconds); here it is set to 30 days
# (30 days * 24 hours * 3600 seconds = 2592000 seconds).


scrubExpire=2592000


if [ ${problems} -eq 0 ]; then
  currentDate=$(date +%s)
  zfsVolumes=$(/sbin/zpool list -H -o name)


 for volume in ${zfsVolumes}
  do
   if [ $(/sbin/zpool status $volume | egrep -c "none requested") -ge 1 ]; then
       echo "ERROR: You need to run \"zpool scrub $volume\" before this script can monitor the scrub expiration time."
       break
   fi
   if [ $(/sbin/zpool status $volume | egrep -c "scrub in progress|resilver") -ge 1 ]; then
       break
   fi


   ### FreeBSD with *nix supported date format
    #scrubRawDate=$(/sbin/zpool status $volume | grep scrub | awk '{print $15 $12 $13}')
    #scrubDate=$(date -j -f '%Y%b%e-%H%M%S' $scrubRawDate'-000000' +%s)


   ### Ubuntu with GNU supported date format
   scrubRawDate=$(/sbin/zpool status $volume | grep scrub | awk '{print $11" "$12" " $13" " $14" "$15}')
   scrubDate=$(date -d "$scrubRawDate" +%s)


      if [ $(($currentDate - $scrubDate)) -ge $scrubExpire ]; then
       emailSubject="`hostname` - ZFS pool - Scrub Time Expired. Scrub Needed on $volume"
       problems=1
      fi
  done
fi


# Notifications - On any problems send email with drive status information and
# capacities including a helpful subject line to root. Also use logger to write
# the email subject to the local logs. This is the place you may want to put
# any other notifications like:
#
# + Update an anonymous twitter account with your ZFS status (https://twitter.com/zfsmonitor)
# + Playing a sound file or beep the internal speaker
# + Update Nagios, Cacti, Zabbix, Munin or even BigBrother


if [ "$problems" -ne 0 ]; then
 echo -e "$emailSubject \n\n\n `/sbin/zpool list` \n\n\n `/sbin/zpool status`" | mail -s "$emailSubject" root
 logger $emailSubject
fi


if [ "$problems" -eq 0 ]; then
 echo "ZFS Healthy"
fi


### EOF ###
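The heart of the health check is the keyword grep at the top of the script. To see what it matches, you can feed it a sample 'zpool status' excerpt (the sample text below is made up for illustration):

```shell
# Fake excerpt of 'zpool status' output from a degraded pool (illustration only)
SAMPLE=" state: DEGRADED
status: One or more devices could not be opened."

# Same keyword list as in zfs-health.sh (grep -E is the modern form of egrep)
condition=$(printf '%s\n' "$SAMPLE" | grep -Ei '(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover)')

if [ -n "$condition" ]; then
    echo "fault detected"    # prints "fault detected" for this sample
fi
```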


Scheduling via CRON

Now simply add these scripts to the crontab so they are executed automagically. If you have used the names above:

sudo leafpad /etc/crontab

# m h dom mon dow user command
29 * * * * root /usr/sbin/zfs-health.sh
30 20 * * * root /usr/sbin/smart-test.sh
31 * * * * root /usr/sbin/sensor-reading.sh
32 20 13 * * root /usr/sbin/smart-test_monthly.sh

The columns indicate Minute, Hour, Day-of-Month, Month, and Day-of-Week; an * means "every" minute/hour/etc.

Here ZFS health is checked every hour at 29 minutes past (first column), and the sensors are read every hour at 31 minutes past.
The SMART read-out runs daily at 20:30, and the SMART self-test monthly on the 13th at 20:32.
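A crontab line is just five time fields followed by the user and the command. A small sketch that splits one of the lines above into its fields:

```shell
# Split a crontab line into its fields; 'set -f' keeps the * fields
# from being expanded into file names by the shell
set -f
line="29 * * * * root /usr/sbin/zfs-health.sh"
set -- $line
echo "minute=$1 hour=$2 dom=$3 month=$4 dow=$5 user=$6 command=$7"
set +f
```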

Emailing via postfix

If you have not done so, set up postfix. I have forwarded root's email to my own address for convenience. This way you will also receive an email whenever a cron job produces output.



echo "bla@domain.com" | sudo tee /root/.forward

Test postfix:


sudo sendmail bla@domain.com
blabla
.

Or test postfix via telnet:
telnet localhost 25
ehlo localhost
mail from: root@localhost
rcpt to: root@localhost
data
Subject: My first mail on Postfix

Hi,
Are you there?
regards,
Admin
. (type the dot on a new line by itself and press Enter)
quit

Management via the browser

For management I tend to use Webmin (also because I am more or less securely behind a firewall). Installation is straightforward and instructions can be found here: http://www.webmin.com/

If all works, this is what it looks like:

One of the things I really like is the section 'Historic System Statistics' (Webminstats, http://webminstats.sourceforge.net/)


So that's it; I hope this is of use!

