Nagios / Icinga Alerts via Pushover

I came across Pushover recently, which makes it easy to send real-time notifications to your Android and iOS devices. And easy it is. It also allows you to set up applications with logos so that you can have multiple Nagios installations shunting alerts to you via Pushover with each one easily identifiable. After just a day playing with this, I find it much nicer than SMS.

So, to set up Pushover with Nagios, first register for a free Pushover account. Then create a new application for your Nagios instance. I set the type to Script and also uploaded a logo. After this, you will be armed with two crucial pieces of information: your application API token/key ($APP_KEY) and your user key ($USER_KEY).

To get the notification script, clone this GitHub repository or just download this file – notify-by-pushover.php.

You can test this immediately with:

echo "Test message" | \
    ./notify-by-pushover.php HOST $APP_KEY $USER_KEY RECOVERY OK

The parameters are:

USAGE: notify-by-pushover.php <HOST|SERVICE> <$APP_KEY> \
    <$USER_KEY> <NOTIFICATIONTYPE> <STATE>
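
If the test message does not arrive, it can help to rule out the script by posting to Pushover's HTTP API directly. The following is a minimal sketch, assuming curl is installed; the function name pushover_send and the CURL override variable are my own and not part of the script above.

```shell
# Send a message straight to the Pushover messages endpoint.
# $1 = application token, $2 = user key, $3 = message text
# CURL is a hypothetical override so the command can be inspected
# (e.g. CURL=echo) without actually sending anything.
pushover_send() {
    ${CURL:-curl} -s \
        --form-string "token=$1" \
        --form-string "user=$2" \
        --form-string "message=$3" \
        https://api.pushover.net/1/messages.json
}

# Example (uncomment and substitute your own keys):
# pushover_send "$APP_KEY" "$USER_KEY" "Test message"
```

If this works but the script does not, the problem is more likely in the arguments passed to notify-by-pushover.php than in your keys.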

Now, set up the new notifications in Nagios / Icinga:

# 'notify-by-pushover-service' command definition
define command{
    command_name notify-by-pushover-service
    command_line /usr/bin/printf "%b" "$NOTIFICATIONTYPE$: \
        $SERVICEDESC$@$HOSTNAME$: $SERVICESTATE$           \
        ($SERVICEOUTPUT$)" |                               \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        SERVICE $APP_KEY $CONTACTADDRESS1$                 \
        $NOTIFICATIONTYPE$ $SERVICESTATE$
}

# 'notify-by-pushover-host' command definition
define command{
  command_name notify-by-pushover-host
  command_line /usr/bin/printf "%b" "Host '$HOSTALIAS$'    \
        is $HOSTSTATE$: $HOSTOUTPUT$" |                    \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        HOST $APP_KEY $CONTACTADDRESS1$ $NOTIFICATIONTYPE$ \
        $HOSTSTATE$
}

Then, in your contact definition(s) add / update as follows:

define contact{
  contact_name ...
  ...
  service_notification_commands ...,notify-by-pushover-service
  host_notification_commands ...,notify-by-pushover-host
  address1 $USER_KEY
}

Make sure you break something to test that this works!
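
Before breaking something for real, you can preview roughly what the service command pipes into the script. This is only a sketch: the function name preview_service_message is made up for illustration, and the arguments stand in for the Nagios macros that would normally be substituted.

```shell
# Build the same message text as the notify-by-pushover-service command,
# with the Nagios macros replaced by positional arguments.
# $1 = NOTIFICATIONTYPE, $2 = SERVICEDESC, $3 = HOSTNAME,
# $4 = SERVICESTATE,     $5 = SERVICEOUTPUT
preview_service_message() {
    printf "%b" "$1: $2@$3: $4 ($5)"
}

# Example, piping a fake alert through the script by hand:
# preview_service_message PROBLEM HTTP web1 CRITICAL "Connection refused" | \
#     ./notify-by-pushover.php SERVICE $APP_KEY $USER_KEY PROBLEM CRITICAL
```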

Monitoring LDAP – Example with Munin

Following up from my articles on Creating an LDAP Addressbook / Directory, Securing LDAP with TLS / SSL and Multi-Master LDAP Replication, I’ll now look at monitoring LDAP with Munin as an immediate example, with Nagios to follow.

First we need to enable monitoring on LDAP – execute:

cat <<EOF | ldapmodify -Y EXTERNAL -H ldapi:///
dn: cn=module{0},cn=config
changetype: modify
add: olcModuleLoad
olcModuleLoad: {2}back_monitor.la
EOF

after ensuring {2} is the appropriate next sequence number for olcModuleLoad. You can check this by running:

ldapsearch -Y EXTERNAL -H ldapi:/// -b cn=module{0},cn=config
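
To avoid reading the sequence number off by eye, the LDIF from that search can be piped through a small filter. This is a sketch only; next_module_index is a hypothetical helper name, and it assumes the olcModuleLoad values appear on unwrapped lines in the form {N}name.

```shell
# Read LDIF on stdin and print the next free {N} index for olcModuleLoad.
# Prints 0 if no olcModuleLoad values are present.
next_module_index() {
    awk '
        BEGIN { max = -1 }
        /^olcModuleLoad: [{][0-9]+[}]/ {
            n = $2
            sub(/^[{]/, "", n)      # strip the leading brace
            sub(/[}].*/, "", n)     # strip the closing brace and module name
            if (n + 0 > max) max = n + 0
        }
        END { print max + 1 }
    '
}

# Example:
# ldapsearch -LLL -Y EXTERNAL -H ldapi:/// -b 'cn=module{0},cn=config' | next_module_index
```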

Now create a user with access to the monitoring information:

cat <<EOF | ldapadd -x -H ldapi:/// -D cn=admin,dc=nodomain -w h.TDVyELBjm0g
dn: cn=monitor,dc=nodomain
objectClass: simpleSecurityObject
objectClass: organizationalRole
cn: monitor
description: LDAP monitor
userPassword: cA.5rMfzHw9vw
EOF

Lastly, configure the monitor database:

cat <<EOF | ldapadd -Y EXTERNAL -H ldapi:///
dn: olcDatabase={2}Monitor,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMonitorConfig
olcDatabase: {2}Monitor
olcAccess: {0}to dn.subtree="cn=Monitor" 
  by dn.base="cn=monitor,dc=nodomain" read by * none
EOF

The monitoring module should now be active and you can test with:

ldapsearch -x -D cn=monitor,dc=nodomain -w cA.5rMfzHw9vw -H ldapi:/// -b cn=Monitor
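
Most of the interesting values under cn=Monitor are operational attributes, so they only show up if you ask for them by name (or with '+'). As a rough sketch, the following pulls a single counter out of the LDIF; monitor_counter is a name I have made up, and the example assumes the monitor user and password created above.

```shell
# Read LDIF on stdin and print the first monitorCounter value found.
monitor_counter() {
    awk -F': ' '$1 == "monitorCounter" { print $2; exit }'
}

# Example: number of currently open connections.
# ldapsearch -LLL -x -D cn=monitor,dc=nodomain -w cA.5rMfzHw9vw -H ldapi:/// \
#     -b cn=Current,cn=Connections,cn=Monitor -s base monitorCounter | monitor_counter
```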

Configuring Munin

“Munin is a networked resource monitoring tool that can help analyze resource trends and “what just happened to kill our performance?” problems. It is designed to be very plug and play. A default installation provides a lot of graphs with almost no work.”

On Ubuntu, you can install Munin and the required packages for LDAP monitoring with:

apt-get install munin-node libnet-ldap-perl

Then edit /etc/munin/plugin-conf.d/munin-node and add a section such as:

[slapd_*]
env.server 127.0.0.1
env.binddn cn=monitor,dc=nodomain
env.bindpw cA.5rMfzHw9vw

During the install, Munin may have detected OpenLDAP and added the appropriate symlinks. If it didn’t, you may be able to create them from the output of:

munin-node-configure --suggest --shell

For me (Ubuntu 12.10), slapd showed up with the error “Wrong amount of autoconf”, which I haven’t debugged. Instead I just created the symlinks manually in Munin’s plugin directory:

cd /etc/munin/plugins
ln -s /usr/share/munin/plugins/slapd_ slapd_statistics_bytes
ln -s /usr/share/munin/plugins/slapd_ slapd_statistics_pdu
ln -s /usr/share/munin/plugins/slapd_ slapd_statistics_referrals
ln -s /usr/share/munin/plugins/slapd_ slapd_operations_diff
ln -s /usr/share/munin/plugins/slapd_ slapd_statistics_entries
ln -s /usr/share/munin/plugins/slapd_ slapd_connections
ln -s /usr/share/munin/plugins/slapd_ slapd_waiters
ln -s /usr/share/munin/plugins/slapd_ slapd_operations
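
The same links can be created in a loop, which is handy when repeating this on several servers. A minimal sketch, where link_slapd_plugins is a hypothetical helper and the destination directory is passed in explicitly (on Ubuntu that would normally be /etc/munin/plugins):

```shell
# Create one symlink per slapd_ graph in the directory given as $1.
link_slapd_plugins() {
    for graph in statistics_bytes statistics_pdu statistics_referrals \
                 operations_diff statistics_entries connections \
                 waiters operations; do
        ln -s /usr/share/munin/plugins/slapd_ "$1/slapd_$graph"
    done
}

# Example:
# link_slapd_plugins /etc/munin/plugins
```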

And restart Munin:

service munin-node restart

Adventures with LDAP (OpenLDAP) – SSL, Multi-Master Replication and Monitoring

In my career to date, I had successfully managed to avoid all but peripheral engagement with OpenLDAP. Until recently that is – we had to build a Microsoft Exchange-like environment with open source software in a way that was closely integrated and easily managed. But, more on that another time. For anyone else diving into OpenLDAP, here are some articles on my experiences that I have penned:

Monitoring SSL Certificate Expiry Dates with Nagios

It is good practice to separate Nagios checks of your web server being available from checking SSL certificate expiry. The latter need only be run once per day and should not add unnecessary noise to a more immediately important web service failure.

To use check_http to monitor SSL certificate expiry dates, first ensure you have a daily service definition – let’s call this service-daily. Now create two service commands as follows:

define command{
    command_name check_cert
    command_line /usr/lib/nagios/plugins/check_http -S \
        -I $HOSTADDRESS$ -w 5 -c 10 -p $ARG1$ -C $ARG2$
}

define command{
    command_name check_named_cert
    command_line /usr/lib/nagios/plugins/check_http -S \
        -I $ARG3$ -w 5 -c 10 -p $ARG1$ -C $ARG2$
}

The second is useful for checking named certificates on additional IP addresses on web servers serving multiple SSL domains.

We can use these to check SSL certificates for POP3, IMAP, SMTP and HTTP:

define service{
    use service-daily
    host_name mailserver
    service_description POP3 SSL Certificate
    check_command check_cert!995!21
}

define service{
    use service-daily
    host_name mailserver
    service_description IMAP SSL Certificate
    check_command check_cert!993!21
}

define service{
    use service-daily
    host_name mailserver
    service_description SMTP SSL Certificate
    check_command check_cert!465!21
}

define service{
    use service-daily
    host_name webserver
    service_description SSL Cert: www.example.com
    check_command check_named_cert!443!21!www.example.com
}

define service{
    use service-daily
    host_name webserver
    service_description SSL Cert: www.example.net
    check_command check_named_cert!443!21!www.example.net
}
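
To see by hand what the -C 21 threshold is being compared against, you can fetch a certificate’s expiry date with openssl and count the days remaining. This is a sketch under a couple of assumptions: the function names are my own, and the date arithmetic relies on GNU date’s -d option.

```shell
# Print the notAfter date of the certificate served on host $1, port $2,
# e.g. "Jan  1 00:00:00 2030 GMT".
cert_end_date() {
    openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null |
        openssl x509 -noout -enddate | cut -d= -f2
}

# Print the whole number of days from date $1 to date $2.
# Both arguments may be in any format GNU date understands.
days_between() {
    echo $(( ( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ) / 86400 ))
}

# Example: days left on a web server certificate.
# days_between now "$(cert_end_date www.example.com 443)"
```

With the service definitions above, Nagios would start alerting once that number drops below 21.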

Nagios Plugin for Checking Backups via rsnapshot

We’ve just added a check_rsnapshot.php script to our nagios-plugins bundle on GitHub. This script will verify rsnapshot backups via Nagios using a number of checks / tests:

  • minfiles – checks the number of files in a snapshot against a minimum expected number;
  • minsize – checks the size of a snapshot against a minimum expected size;
  • log – parses the rsnapshot log to ensure the most recent runs for each retention period completed successfully;
  • timestamp – checks for files created server side containing a timestamp and thus ensuring snapshots are succeeding;
  • rotation – checks that retention directories are being rotated; and
  • dir-creation – checks that retention directories are being created.

Please see this GitHub wiki page for more information including instructions.
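
To give a feel for the kind of test involved, here is a rough shell sketch of the idea behind minfiles. It is not the code from check_rsnapshot.php; the function name and messages are invented for illustration, and only the standard Nagios exit codes (0 for OK, 2 for CRITICAL) are assumed.

```shell
# Compare the number of regular files under snapshot directory $1
# against the minimum expected count $2.
check_minfiles() {
    dir=$1
    min=$2
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    if [ "$count" -ge "$min" ]; then
        echo "OK - $count files in $dir"
        return 0
    fi
    echo "CRITICAL - only $count files in $dir (expected at least $min)"
    return 2
}

# Example (the path is a placeholder):
# check_minfiles /path/to/snapshot 1000
```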

Analysing MySQL Slow Query Logs

MySQL has a really useful feature that allows it to log slow queries, where slow is a minimum time defined by you in seconds (with microsecond resolution). It helps a lot in diagnosing website outages or slow responsiveness issues after the fact.

Unfortunately I couldn’t find any nice graphical tools for analysing these but there are a few command line tools:

mysqldumpslow

MySQL’s own tool, mysqldumpslow, aggregates queries and allows you to sort them by: query time or average query time; lock time or average lock time; rows sent or average rows sent; or the number of queries.
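
For a quick sanity check of a log before reaching for the analysers, the Query_time header lines can be totalled with awk. This is just a sketch: slow_log_summary is a hypothetical helper, and it assumes the usual "# Query_time: ..." header line that precedes each logged query.

```shell
# Read a slow query log on stdin and print the number of queries,
# the total query time and the slowest single query.
slow_log_summary() {
    awk '
        /^# Query_time:/ {
            n++
            t = $3 + 0
            total += t
            if (t > max) max = t
        }
        END { printf "%d queries, total %.3fs, max %.3fs\n", n, total, max }
    '
}

# Example (the path is a placeholder):
# slow_log_summary < /path/to/mysql-slow.log
```

For the aggregated view, mysqldumpslow’s -s option selects the sort order (for example t for query time or c for count) and -t limits the output to the top N queries.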

Percona’s MySQL Slow Query Log Analyser

Back in 2006, Percona’s Peter Zaitsev wrote about their own version of a slow query log analyser (local copy), which has given me good results. Note that their micro time patch has since been incorporated into mainline MySQL.

One of the main differences from MySQL’s own tool is that, as well as printing the aggregated query (with number and string literals wildcarded), it also prints a real example of the query, allowing a copy and paste to MySQL for execution with EXPLAIN.

Example output with query details redacted:

### 230 Queries 
### Total time: 4708.948293, Average time: 20.4736882304348
### Taking 0.093420 to 203.693466 seconds to complete
### Rows analyzed 0 - 141008
SET timestamp=XXX;
SELECT ... FROM ... AS A 
        INNER JOIN ... AS C ON C.item_id = A.item_id 
    WHERE XXX AND C.item_lang = 'XXX' AND ... 
    ORDER BY C.item_sort LIMIT XXX;

SET timestamp=1348032761;
SELECT ... FROM ... AS A 
        INNER JOIN ... AS C ON C.item_id = A.item_id 
    WHERE 1 AND C.item_lang = '1' AND ... 
    ORDER BY C.item_sort LIMIT 1;

Centralised Logging

I’m currently looking at some centralised logging tools and the following stand out:

  • Octopussy – one I came across a long time ago but, looking at some of the others below, it may be past its sell-by date?
  • Graylog2 – GSOH (in dating parlance) – “Manage your logs in the dark and have lasers going and make it look like you’re from space.”
  • logstash – “a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). Speaking of searching, logstash comes with a web interface for searching and drilling into all of your logs.”
  • Kibana – “You have logs. Billions of lines of data. You shipped, dated it, parsed it and stored it. Now what do you do with it? Now you make sense of it… Kibana is an alternative browser based interface for Logstash… that allows you to efficiently search, graph, analyze and otherwise make sense of a mountain of logs.”

Kibana has a Bootstrap UI and is written in PHP which immediately bumps it up my list 😉

Useful RANCID Debugging Tips

I always find it difficult to find a good reference for RANCID debugging strategies so, after spending an afternoon debugging one installation, I put together my own list.

Note that in the following, I use clogin and rancid which assumes a Cisco device. Change to the appropriate variations if you’re not trying to work with a Cisco.

  1. Test logging into a device:
    > clogin rtr1.example.com
  2. Test logging into a device and running a single command:
    > clogin -t 90 -c"show version" rtr1.example.com
  3. Test logging into a device and run a sequence of commands:
    > clogin -t 90 -c"show version;show calendar" rtr1.example.com
  4. Show what RANCID does with debugging output:
    > rancid -d rtr1.example.com

    If the above throws some errors (especially a list of missed commands) and you’re using TACACS, ensure you have authorisation to run all the commands RANCID tries by logging into the router as the RANCID user and executing them one at a time.

  5. Same as (4) but record all router / switch output for analysis:
    > setenv NOPIPE YES
    > rancid -d rtr1.example.com

    and the complete output can then be found in the file: rtr1.example.com.raw (in this example).

  6. Run RANCID on a single switch / router tree rather than all:
    > /usr/local/bin/rancid-run [tree]
  7. Run RANCID normally:
    > /usr/local/bin/rancid-run
  8. Don’t forget that logs are available in RANCID’s logs/ directory.
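
When several devices are misbehaving, step (2) can be wrapped in a loop to see at a glance which logins fail. This is a sketch only: check_logins and the CLOGIN override variable are my own inventions, and it assumes your version of clogin returns a non-zero exit status when the login or command fails.

```shell
# Try "show version" on each device given as an argument and report
# OK or FAIL per device. CLOGIN is a hypothetical override so another
# login script (jlogin, flogin, ...) can be swapped in.
check_logins() {
    for host in "$@"; do
        if ${CLOGIN:-clogin} -t 90 -c "show version" "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
        fi
    done
}

# Example:
# check_logins rtr1.example.com rtr2.example.com
```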