Sunday, August 24, 2014

Part III : ZFS - Scripts - Howto: Ubuntu Home Server (ZFS + virtual IpFire)

Setting up hardware monitoring and webmin 

Table of Contents:

Why hardware monitoring
CPU & Chipset temperature + fanspeed
HDD temperature
SMART monitoring
ZFS Health monitoring
Scheduling via CRON
Management via the browser



Why hardware monitoring

Modern consumer hardware is equipped with a number of monitoring sensors. Given that this server will be running 24/7 and hosting some of my most important data, I would like the peace of mind that nothing will go wrong unnoticed. Moreover, Linux can send an email upon disk failure, hardware failure, etc.

CPU & Chipset temperature + fanspeed

CPU and chipset temperatures and fan speeds are monitored by lm-sensors. Webmin (which we will use for our combined interface) also supports these. Installation is easy (after installing, run 'sudo sensors-detect' once so the right kernel modules are loaded):

sudo apt-get install lm-sensors

HDD temperature

HDD temperature can be monitored via the hddtemp daemon. Installation is again easy:

sudo apt-get install hddtemp

Automatically reading sensors

Both sensors and hddtemp are automagically read every hour by a cron job using the script below. Name the script e.g. sensor-reading.sh and place it in '/usr/sbin'. chmod +x so it is executable. The results are written to a file in the ZFS pool.
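To see the directory and file names the script will produce, you can try the date formats on their own first (a quick sketch; the formats match the ones used in the script):

```shell
# Directory is named after the day, file after the time of the run,
# e.g. 2014_08_24/20_30.txt
DIR=$(date +"%Y_%m_%d")
FILENAME=$(date +"%H_%M")
echo "$DIR/$FILENAME.txt"
```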

#!/bin/sh
# script by Gertdus
# Logs sensor and HDD temperature readings to a dated file in the ZFS pool.

date +"%H_%M_%d_%m_%Y"

cd /zfspool/ServerLogs || exit 1

DIR=$(date +"%Y_%m_%d")        # one directory per day
FILENAME=$(date +"%H_%M")      # one file per run

echo "$FILENAME"
echo "$DIR"

mkdir -p "$DIR"                # -p: do not fail if the directory already exists

cd "$DIR" || exit 1

date > "$FILENAME"

sensors >> "$FILENAME"
for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    hddtemp /dev/$disk >> "$FILENAME"
done

mv "$FILENAME" "$FILENAME.txt"

SMART monitoring

Finally, SMART data is monitored via smartmontools; again, install:

sudo apt-get install smartmontools

I use a daily and a monthly script: a daily read-out, and a monthly short self-test plus read-out. Results are written to log files in the ZFS pool ('/zfspool/ServerLogs/').

Daily script; name it e.g. smart-test.sh and place it in '/usr/sbin'. chmod +x so it is executable.

#!/bin/sh
# Script by Meliorator. irc://irc.freenode.net/Meliorator
# modified by Ranpha and Gertdus
# Daily SMART read-out for all drives.

cd /zfspool/ServerLogs || exit 1

DIR=$(date +"%Y_%m_%d")
echo "$DIR"

mkdir -p "$DIR"
cd "$DIR" || exit 1
mkdir -p smart-logs

for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    smartctl -a /dev/$disk >> smart-logs/smart-$disk.log
done


And the monthly script; name it e.g. smart-test_monthly.sh and place it in '/usr/sbin'. chmod +x so it is executable.


#!/bin/sh
# Script by Meliorator. irc://irc.freenode.net/Meliorator
# modified by Ranpha and Gertdus
# Monthly SMART short self-test plus read-out for all drives.

cd /zfspool/ServerLogs || exit 1

DIR=$(date +"%Y_%m_%d")
echo "$DIR"

mkdir -p "$DIR"
cd "$DIR" || exit 1
mkdir -p smart-logs

# Start a short self-test on each drive, staggered a little
for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    smartctl -t short -d ata /dev/$disk >> smart-logs/smart-$disk.log
    sleep 5
done

# A short self-test takes about two minutes; wait for the tests to
# finish before reading out the results and the self-test log
sleep 180

for disk in sda sdb sdc sdd sde sdf sdg sdh; do
    smartctl -a /dev/$disk >> smart-logs/smart-$disk.log
done




ZFS Health monitoring

ZFS health is monitored by the script from calomel.org, with one change of mine: when all is well, the script prints "ZFS Healthy", so cron sends an email saying all is well.

Name the script e.g. zfs-health.sh and place it in '/usr/sbin'. chmod +x so it is executable.

#! /bin/bash
#
# Calomel.org
#     https://calomel.org/zfs_health_check_script.html
#     FreeBSD 9.1 ZFS Health Check script
#     zfs_health.sh @ Version 0.15


# Check health of ZFS volumes and drives. On any faults send email. In FreeBSD
# 10 there is supposed to be a ZFSd daemon to monitor the health of the ZFS
# pools. For now, in FreeBSD 9, we will make our own checks and run this script
# through cron a few times a day.


# 99 problems but ZFS aint one
problems=0


# Health - Check if all zfs volumes are in good condition. We are looking for
# any keyword signifying a degraded or broken array.


condition=$(/sbin/zpool status | egrep -i '(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover)')
if [ "${condition}" ]; then
       emailSubject="`hostname` - ZFS pool - HEALTH fault"
       problems=1
fi


# Capacity - Make sure pool capacities are below 80% for best performance. The
# percentage really depends on how large your volume is. If you have a 128GB
# SSD then 80% is reasonable. If you have a 60TB raid-z2 array then you can
# probably set the warning closer to 95%.
#
# ZFS uses a copy-on-write scheme. The file system writes new data to
# sequential free blocks first and when the uberblock has been updated the new
# inode pointers become valid. This method is true only when the pool has
# enough free sequential blocks. If the pool is at capacity and space limited,
# ZFS will have to randomly write blocks. This means ZFS can not create an
# optimal set of sequential writes and write performance is severely impacted.


maxCapacity=80


if [ ${problems} -eq 0 ]; then
  capacity=$(/sbin/zpool list -H -o capacity)
  for line in ${capacity//%/}
    do
      if [ $line -ge $maxCapacity ]; then
        emailSubject="`hostname` - ZFS pool - Capacity Exceeded"
        problems=1
      fi
    done
fi


# Errors - Check the columns for READ, WRITE and CKSUM (checksum) drive errors
# on all volumes and all drives using "zpool status". If any non-zero errors
# are reported an email will be sent out. You should then look to replace the
# faulty drive and run "zpool scrub" on the affected volume after resilvering.


if [ ${problems} -eq 0 ]; then
  errors=$(/sbin/zpool status | grep ONLINE | grep -v state | awk '{print $3 $4 $5}' | grep -v 000)
  if [ "${errors}" ]; then
       emailSubject="`hostname` - ZFS pool - Drive Errors"
       problems=1
  fi
fi


# Scrub Expired - Check if all volumes have been scrubbed in at least the last
# 30 days. The general guide is to scrub volumes on desktop quality drives once
# a week and volumes on enterprise class drives once a month. You can always
# use cron to schedule "zpool scrub" in off hours. We scrub our volumes every
# Sunday morning for example.
#
# Scrubbing traverses all the data in the pool once and verifies all blocks can
# be read. Scrubbing proceeds as fast as the devices allow, though the
# priority of any I/O remains below that of normal calls. This operation might
# negatively impact performance, but the file system will remain usable and
# responsive while scrubbing occurs. To initiate an explicit scrub, use the
# "zpool scrub" command.
#
# The scrubExpire variable is in seconds. The original script used 8 days
# (691200 seconds); here it is set to 30 days
# (30 days * 24 hours * 3600 seconds = 2592000 seconds).


scrubExpire=2592000


if [ ${problems} -eq 0 ]; then
  currentDate=$(date +%s)
  zfsVolumes=$(/sbin/zpool list -H -o name)


 for volume in ${zfsVolumes}
  do
   if [ $(/sbin/zpool status $volume | egrep -c "none requested") -ge 1 ]; then
       echo "ERROR: You need to run \"zpool scrub $volume\" before this script can monitor the scrub expiration time."
       break
   fi
   if [ $(/sbin/zpool status $volume | egrep -c "scrub in progress|resilver") -ge 1 ]; then
       break
   fi


   ### FreeBSD with *nix supported date format
    #scrubRawDate=$(/sbin/zpool status $volume | grep scrub | awk '{print $15 $12 $13}')
    #scrubDate=$(date -j -f '%Y%b%e-%H%M%S' $scrubRawDate'-000000' +%s)


   ### Ubuntu with GNU supported date format
   scrubRawDate=$(/sbin/zpool status $volume | grep scrub | awk '{print $11" "$12" " $13" " $14" "$15}')
   scrubDate=$(date -d "$scrubRawDate" +%s)


      if [ $(($currentDate - $scrubDate)) -ge $scrubExpire ]; then
       emailSubject="`hostname` - ZFS pool - Scrub Time Expired. Scrub Needed on $volume"
       problems=1
      fi
  done
fi


# Notifications - On any problems send email with drive status information and
# capacities including a helpful subject line to root. Also use logger to write
# the email subject to the local logs. This is the place you may want to put
# any other notifications like:
#
# + Update an anonymous twitter account with your ZFS status (https://twitter.com/zfsmonitor)
# + Playing a sound file or beep the internal speaker
# + Update Nagios, Cacti, Zabbix, Munin or even BigBrother


if [ "$problems" -ne 0 ]; then
 echo -e "$emailSubject \n\n\n `/sbin/zpool list` \n\n\n `/sbin/zpool status`" | mail -s "$emailSubject" root
 logger $emailSubject
fi


if [ "$problems" -eq 0 ]; then
 echo "ZFS Healthy"
fi


### EOF ###
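The heart of the health check is the keyword grep at the top of the script. To see what it matches, you can feed it a sample 'zpool status' excerpt (the sample text below is made up for illustration):

```shell
# Fake excerpt of 'zpool status' output from a degraded pool (illustration only)
SAMPLE=" state: DEGRADED
status: One or more devices could not be opened."

# Same keyword list as in zfs-health.sh (grep -E is the modern form of egrep)
condition=$(printf '%s\n' "$SAMPLE" | grep -Ei '(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED|FAIL|DESTROYED|corrupt|cannot|unrecover)')

if [ -n "$condition" ]; then
    echo "fault detected"    # prints "fault detected" for this sample
fi
```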


Scheduling via CRON

Now simply add these scripts to the crontab so they are executed automagically. If you have used the names above:

sudo leafpad /etc/crontab

# m h dom mon dow user command
29 * * * * root /usr/sbin/zfs-health.sh
30 20 * * * root /usr/sbin/smart-test.sh
31 * * * * root /usr/sbin/sensor-reading.sh
32 20 13 * * root /usr/sbin/smart-test_monthly.sh

The columns indicate Minute, Hour, Day-of-Month, Month, and Day-of-Week; an * means "every" minute/hour/etc.

Here ZFS health is checked every hour at 29 minutes past (first column), and the sensors are read every hour at 31 minutes past.
The SMART read-out runs daily at 20:30, and the SMART self-test monthly on the 13th at 20:32.
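A crontab line is just five time fields followed by the user and the command. A small sketch that splits one of the lines above into its fields:

```shell
# Split a crontab line into its fields; 'set -f' keeps the * fields
# from being expanded into file names by the shell
set -f
line="29 * * * * root /usr/sbin/zfs-health.sh"
set -- $line
echo "minute=$1 hour=$2 dom=$3 month=$4 dow=$5 user=$6 command=$7"
set +f
```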

Emailing via postfix

If you have not done so, set up postfix. I have forwarded root's email to my own address for convenience. This way you will also receive an email whenever a cron job produces output.



echo "bla@domain.com" | sudo tee /root/.forward

Test postfix:


sudo sendmail bla@domain.com
blabla
.

Or test postfix via telnet:
telnet localhost 25
ehlo localhost
mail from: root@localhost
rcpt to: root@localhost
data
Subject: My first mail on Postfix

Hi,
Are you there?
regards,
Admin
. (type the dot on a new line by itself and press Enter)
quit

Management via the browser

For management I tend to use Webmin (also because I am more or less securely behind a firewall). Installation is straightforward and instructions can be found here: http://www.webmin.com/

If all works, this is what it looks like:

One of the things I really like is the section 'Historic System Statistics' (Webminstats, http://webminstats.sourceforge.net/)


So that's it; I hope this is of use!

