Debugging NFS Slowness

During patching for the recent GHOST bug, I updated all packages (including the kernel) on an Ubuntu 14.04 file server (filer). This filer provided static content (mainly tens of thousands of images) to a number of web servers. You can see the effect in the following load graph from the filer:

Load average on the filer

As the graph shows, there were actually two issues. The first was solved by upgrading the filer from 14.04 to 14.10, based on a number of online references to the symptoms and fixes. About an hour after this upgrade, a new form of NFS slowness manifested and, needless to say, sites that had rendered in under a second were now taking more than 15 seconds.

Diagnosing the second issue took a while longer but some tips and utilities include:

  • check /var/log and see if any log files are growing rapidly;
  • run top and look for any processes with high / unusual utilisation;
  • use iostat (apt-get install sysstat) and pay particular attention to any devices with a high number of transactions per second (in my case it was the root filesystem rather than any of the mounted partitions exported by NFS);
  • use iotop (apt-get install iotop) and note any processes with high I/O utilisation (in my case jbd2/xvda1-8 was at 100%, and xvda1 is my root partition).
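The first check can be scripted roughly as follows. This is only a sketch: the function name recent_logs is made up for illustration, it relies on GNU find, and "recently modified and large" is a stand-in for "growing rapidly", not a measurement of growth.

```shell
# List files under a directory that were modified in the last N minutes,
# largest first. Defaults: /var/log and 5 minutes. Needs GNU find (-printf).
recent_logs() {
    find "${1:-/var/log}" -type f -mmin "-${2:-5}" -printf '%s %p\n' 2>/dev/null |
        sort -rn
}

# Usage:
#   recent_logs | head     # candidates for a rapidly growing log
#   iostat -x 5            # extended per-device statistics every 5 seconds
#   iotop -o               # only show processes actually doing I/O
```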

The jbd2 process is the ext4 journaling process. At this point you can evaluate fsck’ing your partition but I wanted to see if I could discover what was happening here. I enabled some debugging via:

# enable tracing:
echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable
# wait a couple of seconds and:
cat /sys/kernel/debug/tracing/trace
# and disable tracing:
echo 0 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable

What I found were lots of:

nfsd-2085  [001] .... 53730942.155573: ext4_sync_file_enter: dev 202,1 ino 276278 parent 149955 datasync 0
nfsd-2071  [001] .... 53730942.158743: ext4_sync_file_enter: dev 202,1 ino 276278 parent 149955 datasync 0
...

where every entry related to the same inode number (276278). We located the corresponding path via:

find / -inum 276278
/var/lib/nfs/v4recovery

The solution was to stop nfs-kernel-server, remove that directory entirely, recreate it and restart nfs-kernel-server. We got the permissions wrong on the first attempt, but this will be obvious from dmesg / kernel log messages such as:

kernel: [53731827.778104] NFSD: Failed to remove expired client state directory 8d97cccceb37641d3804a84683a9282a
kernel: [53731827.779204] NFSD: failed to write recovery record (err -13); please check that /var/lib/nfs/v4recovery exists and is writeableNFSD: Failed to remove expired client state directory 8d97cccceb37641d3804a84683a9282a
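One way to avoid the permissions mistake is to record the directory's owner and mode before removing it and reapply them afterwards. The sketch below assumes GNU stat; the helper name recreate_dir is invented for illustration and is not what we ran at the time. It also assumes the directory's permissions were correct to begin with.

```shell
# Recreate a directory from scratch while preserving its owner, group and mode.
recreate_dir() {
    dir=$1
    owner=$(stat -c '%u:%g' "$dir") || return 1   # numeric uid:gid
    mode=$(stat -c '%a' "$dir") || return 1       # octal permissions
    rm -rf -- "$dir" &&
        mkdir -- "$dir" &&
        chown "$owner" "$dir" &&
        chmod "$mode" "$dir"
}

# Usage on the filer (as root):
#   service nfs-kernel-server stop
#   recreate_dir /var/lib/nfs/v4recovery
#   service nfs-kernel-server start
```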

Development Contracts

At Open Solutions, we tend to undertake a lot of fixed price contracts to develop web applications. In fact, clients usually insist on fixed price contracts as they want to know in advance what the bill will be.

However, fixed price contracts have big negatives for both parties:

  • for the client, a fixed price contract can often limit them to their earliest ideas. Now, as a service provider, we want to be flexible and so we’re happy to chop and change as a project develops. But, this leads to:
  • for the service provider, if change and revision requests are not carefully managed, agreed and billed for, the service provider could very quickly end up making a loss on the contract and thus find themselves in the position of funding their client’s project!

To this end, we’ve recently been reviewing various web development contracts and have found some good inspiration on which to base our own.

Following the success of Killer Contract, Andy wrote a plain language NDA (also available as a Gist).

Virtual Mail with Ubuntu, Postfix, Dovecot and ViMbAdmin

As part of pushing our new release of ViMbAdmin, I wrote up a mini how-to for setting up a virtual email system on Ubuntu where the components are:

  • Postfix as the SMTP engine;
  • Dovecot for IMAP, POP3, Sieve and LMTP;
  • ViMbAdmin as the domain / mailbox / alias management system via web interface.

It supports a number of features including mailbox archival and deletion, quota support and display of mailbox sizes (as well as per domain totals).

Find the how-to at:

Querying Cisco MST Port Roles via SNMP with OSS_SNMP

OSS_SNMP is a PHP SNMP library that I wrote for people who hate SNMP. After a customer migration from PVST to MST (Multiple Spanning Tree), I have added a number of MST functions / MIBs to OSS_SNMP:

During a fairly significant network migration involving breaking / connecting a number of links, I wanted to be able to monitor the MST port role of significant ports at a glance. For this purpose, I wrote the mst-port-roles.php script and have committed it as an example to OSS_SNMP. First, here is what it looks like when run on the command line (with hostnames obfuscated):

MST Port Roles

From a very simple array of port details at the top of the script, it will poll all switches and for each port print:

  • device and port name;
  • port state and speed;
  • port role for each applicable MST instance.

I run it in bash and use bash colouring. The script is well documented and can easily be repurposed for other networks. You’ll find the source here.

Bird / Quagga with MD5 Support for IPv4/6 on FreeBSD & Linux

Over in INEX we run a route server cluster which alleviates the burden of setting up bilateral peering sessions for the more than 80% of the members that use them. The current hardware is now about six years old and we have a forklift upgrade in the works.

BGP allows for MD5 authentication between clients (using the TCP MD5 signature option, see RFC 2385) and – while RFC 2385 was recently obsoleted by RFC 5925 – it is still widely used on shared LAN media such as IXPs, primarily to prevent packet spoofing and session hijacking via recycled IP addresses.

Our current route server implementation runs on FreeBSD which does not support TCP MD5 in its stock kernel (you are required to compile a custom kernel – see below for details). Additionally, specifying the session MD5 is not done in the BGP daemon configuration but separately in the IPsec configuration. Lastly, our current FreeBSD version has no support for TCP MD5 over IPv6. These have all led to unnecessarily complex configurations and a degree of confusion.

Because of this, we decided to test up-to-date Linux and FreeBSD versions for native IPv4 and IPv6 TCP MD5 support with Bird and Quagga (our route server daemons of choice).

In each case, BGP sessions were tested for:

  • no MD5 on each end (expected to work);
  • same MD5 on each end (expected to work);
  • different MD5 on each end (expected not to work); and
  • MD5 on one end with no MD5 on the other end (expected not to work).
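The four cases reduce to one rule: a session should only establish when both ends hold the same secret, with "no secret" counted as a value in its own right. A small sketch of that rule follows; the function name and the placeholder secrets are made up for illustration.

```shell
# Expected outcome for one BGP session given the secret configured on each
# end; an empty string stands for "no MD5 configured".
expected_outcome() {
    if [ "$1" = "$2" ]; then
        echo established
    else
        echo rejected
    fi
}

# Print the four-case matrix (the secrets are placeholders).
for pair in ":" "s3cret:s3cret" "s3cret:other" "s3cret:"; do
    printf '%-8s | %-8s | %s\n' "${pair%%:*}" "${pair#*:}" \
        "$(expected_outcome "${pair%%:*}" "${pair#*:}")"
done
```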

For Linux, the platform chosen was Ubuntu 12.04 LTS with the stock 3.2.0-40-generic kernel.

  • Sessions were tested for Quagga to Quagga and Quagga to Bird;
  • Sessions were tested over both IPv4 and IPv6;
  • The presence of valid MD5 signatures was confirmed using tcpdump -M xxx;
  • Stock Quagga and Bird from the 12.04 apt repositories were used.

The results – everything worked and worked as expected:

  • BGP sessions only established when expected (no MD5 configured, same MD5 configured);
  • This held for both IPv4 and IPv6.

Summary: Linux supports TCP MD5 natively for IPv4 and IPv6 when using Quagga or Bird.

For FreeBSD, we used the latest production release of 9.1. TCP MD5 support is not compiled in by default so a custom kernel must be built with the additional options of:

options   TCP_SIGNATURE
options   IPSEC
device    crypto
device    cryptodev

In addition to this, the MD5 shared secrets need to be added to the IPsec SAD/SPD database via the setkey utility or, preferably, via the /etc/ipsec.conf file which, for example, would contain entries for IPv4 and IPv6 addresses such as:

add 192.0.2.1 192.0.2.2 tcp 0x1000 -A tcp-md5 "supersecret1";
add 2001:db8::1 2001:db8::2 tcp 0x1000 -A tcp-md5 "supersecret2";

where the addresses ending in .1/:1 are local and .2/:2 are the BGP neighbor addresses. This file can be processed by setting ipsec_enable="YES" in /etc/rc.conf and executing /etc/rc.d/ipsec reload.
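Since these entries follow a fixed pattern, a configuration generator could emit them from a list of peers. The following is a minimal sketch in plain sh: the function names and the peers.txt input file are hypothetical, and it assumes secrets contain no whitespace or double quotes.

```shell
# Emit one setkey "add" line for a TCP MD5 secret, in the format used above.
# Arguments: local address, neighbour address, shared secret.
tcpmd5_sa() {
    printf 'add %s %s tcp 0x1000 -A tcp-md5 "%s";\n' "$1" "$2" "$3"
}

# Read "local neighbour secret" triples from stdin, one per line, and
# print the matching entries; incomplete lines are skipped.
tcpmd5_conf() {
    while read -r local_addr neighbour secret; do
        if [ -n "$secret" ]; then
            tcpmd5_sa "$local_addr" "$neighbour" "$secret"
        fi
    done
}

# Usage (peers.txt is a hypothetical input file):
#   tcpmd5_conf < peers.txt > /etc/ipsec.conf
#   /etc/rc.d/ipsec reload
```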

  • Sessions were tested for Quagga/Linux to Quagga/FreeBSD and from Quagga/Linux to Bird/FreeBSD;
  • Sessions were tested over both IPv4 and IPv6;
  • The presence of valid MD5 signatures was confirmed using tcpdump -M xxx;
  • Stock Quagga from the 12.04 apt repositories and stock Quagga and Bird from FreeBSD ports were used.

The results – almost everything worked and worked as expected:

  • BGP sessions only established when expected (no MD5 configured, same MD5 configured);
  • This held for both IPv4 and IPv6;
  • one odd but expected behavior – you only need to set the MD5 via setkey / ipsec.conf – setting it (or not) in the Quagga and Bird config has no effect so long as it is set via setkey (but is useful for documentation purposes). However, trying to set it in Quagga without having rebuilt the kernel will result in an error.

Summary: FreeBSD will support TCP MD5 via a custom kernel and setkey / ipsec.conf for IPv4 and IPv6. Note that there is an additional complexity when changing or removing MD5 passwords as these need to be amended / deleted via setkey which can put an extra burden on automatic route server configuration generators.

Translating SNMP OIDs Using MIB Files

I get caught trying to remember this a lot and there’s a really useful tutorial on this at the Net-SNMP website: Using and loading MIBS.

If you’re using Ubuntu, also consider checking the comments in /etc/snmp/snmp.conf which (in 13.04) contains:

As the snmp packages come without MIB files due to license reasons, loading of MIBs is disabled by default. If you added the MIBs you can reenable loading them by commenting out the following line.

Also, run the following:

apt-get install snmp-mibs-downloader

which will download some basic MIBs as part of the installation.
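Putting those two steps together, something along these lines should get translation working. It is a sketch that assumes the stock Ubuntu snmp.conf, where the line in question starts with "mibs :", and GNU sed; the function name is made up for illustration.

```shell
# Comment out the "mibs :" line so that the Net-SNMP tools load MIBs again.
# Takes the path to snmp.conf; uses GNU sed's in-place editing.
enable_mib_loading() {
    sed -i 's/^[[:space:]]*mibs[[:space:]]*:/#&/' "$1"
}

# Usage (as root), then try a translation in each direction:
#   enable_mib_loading /etc/snmp/snmp.conf
#   snmptranslate -On SNMPv2-MIB::sysDescr.0
#   snmptranslate .1.3.6.1.2.1.1.1.0
```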

Nagios / Icinga Alerts via Pushover

I came across Pushover recently which makes it easy to send real-time notifications to your Android and iOS devices. And easy it is. It also allows you to set up applications with logos so that you can have multiple Nagios installations shunting alerts to you via Pushover with each one easily identifiable. After just a day playing with this, it’s much nicer than SMS.

So, to set up Pushover with Nagios, first register for a free Pushover account. Then create a new application for your Nagios instance. I set the type to Script and also uploaded a logo. After this, you will be armed with two crucial pieces of information: your application API token/key ($APP_KEY) and your user key ($USER_KEY).

To get the notification script, clone this GitHub repository or just download this file – notify-by-pushover.php.

You can test this immediately with:

echo "Test message" | \
    ./notify-by-pushover.php HOST $APP_KEY $USER_KEY RECOVERY OK

The parameters are:

USAGE: notify-by-pushover.php <HOST|SERVICE> <$APP_KEY> \
    <$USER_KEY> <NOTIFICATIONTYPE> <STATE>

Now, set up the new notifications in Nagios / Icinga:

# 'notify-by-pushover-service' command definition
define command{
    command_name notify-by-pushover-service
    command_line /usr/bin/printf "%b" "$NOTIFICATIONTYPE$: \
        $SERVICEDESC$@$HOSTNAME$: $SERVICESTATE$           \
        ($SERVICEOUTPUT$)" |                               \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        SERVICE $APP_KEY $CONTACTADDRESS1$                 \
        $NOTIFICATIONTYPE$ $SERVICESTATE$
}

# 'notify-by-pushover-host' command definition
define command{
  command_name notify-by-pushover-host
  command_line /usr/bin/printf "%b" "Host '$HOSTALIAS$'    \
        is $HOSTSTATE$: $HOSTOUTPUT$" |                    \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        HOST $APP_KEY $CONTACTADDRESS1$ $NOTIFICATIONTYPE$ \
        $HOSTSTATE$
}

Then, in your contact definition(s) add / update as follows:

define contact{
  contact_name ...
  ...
  service_notification_commands ...,notify-by-pushover-service
  host_notification_commands ...,notify-by-pushover-host
  address1 $USER_KEY
}

Make sure you break something to test that this works!
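If nothing arrives when something breaks, it can help to rule out the keys first by calling the Pushover message API directly with curl, bypassing Nagios and the PHP script altogether. A minimal sketch follows; the function name is invented, and it sends only the three required fields.

```shell
# Send a message straight to the Pushover API, bypassing Nagios.
# Arguments: application token, user key, message text.
pushover_send() {
    curl -s \
        --form-string "token=$1" \
        --form-string "user=$2" \
        --form-string "message=$3" \
        https://api.pushover.net/1/messages.json
}

# Usage:
#   pushover_send "$APP_KEY" "$USER_KEY" "Test message from the Nagios host"
```

If this works but Nagios alerts still do not arrive, the problem is more likely in the command or contact definitions than in the keys.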