Nagios Plugin to Check Extreme Networks Devices

Over at INEX we’ve embarked on a forklift upgrade of the primary peering LAN using Extreme Networks Summit x670’s and x460’s. As usual, we need to monitor these 24/7 and we have just written a new Extreme Networks chassis monitoring script which should work with most Extreme devices.

It will check and generate alerts on the following items:

  • a warning if the device was recently rebooted;
  • a warning / critical if any found temperature sensors are in a non-normal state;
  • a warning / critical if any found fans are in a non-normal state;
  • a warning / critical if any found PSUs are in a non-normal state (or missing);
  • a warning / critical if the 5 sec CPU utilisation is above set thresholds;
  • a warning / critical if the memory utilisation is above set thresholds.

You’ll find the script in this Github repository: barryo/nagios-plugins.

A some verbose output follows:

./check_chassis_extreme.php -c  -h 10.11.12.13 -v 
 
CPU: 5sec - 4% 
Last reboot: 4007.7666666667 minutes ago 
Uptime: 2.8 days. 
PSU: 1 - presentOK (Serial: 1429W-xxxxx; Source: ac) 
PSU: 2 - presentOK (Serial: 1429W-xxxxx; Source: ac) 
Fan: 101 - OK (4388 RPM) 
Fan: 102 - OK (9273 RPM) 
Fan: 103 - OK (4428 RPM) 
Fan: 104 - OK (9273 RPM) 
Fan: 105 - OK (4551 RPM) 
Fan: 106 - OK (9452 RPM) 
Temp: 39'C 
Over temp alert: NO 
Memory used in slot 1: 29% 
OK - CPU: 5sec - 4%. Uptime: 2.8 days.  PSUs: 1 - presentOK; 
  2 - presentOK;. Overall system power state: redundant power 
  available. Fans: [101 - OK (4388 RPM)]; [102 - OK (9273 RPM)];
  [103 - OK (4428 RPM)]; [104 - OK (9273 RPM)]; [105 - OK (4551 
  RPM)]; [106 - OK (9452 RPM)];. Temp: 39'C. 
  Memory (slot:usage%): 1:29%.

Nagios / Icinga Alerts via Pushover

I came across Pushover recently which makes it easy to send real-time notifications to your Android and iOS devices. And easy it is. It also allows you to set up applications with logos so that you can have multiple Nagios installations shunting alerts to you via Pushover with each one easily identifiable. After just a day playing with this, it’s much nicer than SMS’.

So, to set up Pushover with Nagios, first register for a free Pushover account. Then create a new application for your Nagios instance. I set the type to Script and also upload a logo. After this, you will be armed with two crucial pieces of information: your application API tokan/key ($APP_KEY) and your user key ($USER_KEY).

To get the notification script, clone this GitHub repository or just down this file – notify-by-pushover.php.

You can test this immediately with:

echo "Test message" | \
    ./notify-by-pushover.php HOST $APP_KEY $USER_KEY RECOVERY OK

The parameters are:

USAGE: notify-by-pushover.php  <$APP_KEY> \
    <$USER_KEY> <NOTIFICATIONTYPE>

Now, set up the new notifications in Nagios / Icinga:

# 'notify-by-pushover-service' command definition
define command{
    command_name notify-by-pushover-service
    command_line /usr/bin/printf "%b" "$NOTIFICATIONTYPE$: \
        $SERVICEDESC$@$HOSTNAME$: $SERVICESTATE$           \
        ($SERVICEOUTPUT$)" |                               \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        SERVICE $APP_KEY $CONTACTADDRESS1$                 \
        $NOTIFICATIONTYPE$ $SERVICESTATE$
}

# 'notify-by-pushover-host' command definition
define command{
  command_name notify-by-pushover-host
  command_line /usr/bin/printf "%b" "Host '$HOSTALIAS$'    \
        is $HOSTSTATE$: $HOSTOUTPUT$" |                    \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        HOST $APP_KEY $CONTACTADDRESS1$ $NOTIFICATIONTYPE$ \
        $HOSTSTATE$
}

Then, in your contact definition(s) add / update as follows:

define contact{
  contact_name ...
  ...
  service_notification_commands ...,notify-by-pushover-service
  host_notification_commands ...,notify-by-pushover-host
  address1 $USER_KEY
}

Make sure you break something to test that this works!

Monitoring SSL Certificate Expiry Dates with Nagios

It is good practice to separate Nagios checks of your web server being available from checking SSL certificate expiry. The latter need only be run once per day and should not add unnecessary noise to a more immediately important web service failure.

To use check_http to monitor SSL certificate expiry dates, first ensure you have a daily service definition – let’s call this service-daily. Now create two service commands as follows:

define command{
    command_name check_cert
    command_line /usr/lib/nagios/plugins/check_http -S \
        -I $HOSTADDRESS$ -w 5 -c 10 -p $ARG1$ -C $ARG2$
}

define command{
    command_name check_named_cert
    command_line /usr/lib/nagios/plugins/check_http -S \
        -I $ARG3$ -w 5 -c 10 -p $ARG1$ -C $ARG2$
}

The second is useful for checking named certificates on additional IP addresses on web servers serving multiple SSL domains.

We can use these to check SSL certificates for POP3, IMAP, SMTP and HTTP:

define service{
    use service-daily
    host_name mailserver
    service_description POP3 SSL Certificate
    check_command check_cert!993!21
}

define service{
    use service-daily
    host_name mailserver
    service_description IMAP SSL Certificate
    check_command check_cert!995!21
}

define service{
    use service-daily
    host_name mailserver
    service_description SMPT SSL Certificate
    check_command check_cert!465!21
}

define service{
    use service-daily
    host_name webserver
    service_description SSL Cert: www.example.com
    check_command check_named_cert!443!21!www.example.com
}

define service{
    use service-daily
    host_name webserver
    service_description SSL Cert: www.example.net
    check_command check_named_cert!443!21!www.example.net
}

Nagios Plugin for Checking Backups via rsnapshot

We’ve just added a check_rsnapshot.php script to our nagios-plugins bundle on Github. This script will verify rsnapshot backups via Nagios using a number of checks / tests:

  • minfiles – checks the number of files in a snapshot against a minimum expected number;
  • minsize – checks the size of a snapshot against a minimum expected size;
  • log – parses the rsnapshot log to ensure the most recent runs for each retention period completed successfully;
  • timestamp – checks for files created server side containing a timestamp and thus ensuring snapshots are succeeding;
  • rotation – checks that retention directories are being rotated; and
  • dir-creation – checks that retention directories are being created.

Please see this Github wiki page for more information including instructions.

We’ve Released Some of our Nagios Plugins

We create a lot of Nagios installations for our own systems over, for customer systems which we manage and as a service over at Open Solutions. We’ve written a lot of custom Nagios plugins over the years as part of this process.

We are now making a concerted effort to find them, clean them, maintain them centrally and release them for the good of others.

To that end, we have created a repository on GitHub for the task with a detailed readme file:

They main goal of Nagios plugins that we write and release are:

  • BSD (or BSD like) license so you can hack away to wield into something that may be more suitable for your own environment;
  • scalable in that if we are polling power supply units (PSUs) in a Cisco switch then it should not matter if there is one or a hundred – the script should handle them all;
  • WARNINGs are designed for email notifications during working hours; CRITICAL means an out of hours text / SMS message;
  • each script should be an independant unit with no dependancies on each other or unusual Perl module requirements;
  • the scripts should all be run with the --verbose on new kit. This will provide an inventory of what it finds as well as show anything that is being skipped. OIDs searched for by the script but reported as not supported on the target device should really be skipped via various --skip-xxx options.
  • useful help available via --help or -?

Some New Nagios Plugins

Over the past ten years I have left many many new and hacked Nagios plugins on many servers around the globe. I’m now making a concerted effort to find them, clean them, maintain them centrally and release them.

To that end, I have created a repository on GitHub for the task with a detailed readme file:

As a starting point, there are four plugins available now:

  • check_chassis_cisco.pl – a script to poll a Cisco switch or router and check if the device was recently rebooted; its temperature sensors; its fans; its PSU; its CPU utilisation; and its memory usage.

 

  • check_chassis_server.pl – a script to poll a Linux / BSD server and check its load average; memory and swap usage; and if it has been recently rebooted.

 

  • check_portsecurity.pl – a script to check all ports on a Cisco switch and issues a critical alert if port security has been triggered resulting in a shutdown port on the device.

 

  • check_portstatus.pl – a script which will issue warnings if the port status on any Ethernet (by default) port on a Cisco switch has changed within the last hour (by default). I.e. a port up or a port down event.