Code

Most of the code referenced on this blog is available on my GitHub account here:

github.com/xb95

Contact

Twitter: @xb95
Freenode IRC: xb95
Send Email: mark@qq.is
Google Talk: smitty@gmail.com

About

Mark Smith is an engineer specializing in systems administration. He has worked for the likes of Google, Mozilla, StumbleUpon, and Bump doing a bit of everything.

Mark has a family (girlfriend, two small boys, and a cat) and lives in Mountain View, California. One day he wants to move far away from people and be a consultant or contractor.

MongoDB disappoints me again.

At my employer we use MongoDB for one of our core databases. I had never worked with it before I got here, but now I'm responsible for maintaining it, so I have spent a decent amount of time banging on it and learning about it.

I'm impressed with the ease of use, configuration, and general maintenance. It seems to do things in a reasonably sane fashion most of the time. I am happy to recommend it to people with small to medium infrastructures who want to focus more on the application development and worry less about the administration overhead on the backend. For the most part, MongoDB just works.

There are a few things that make me less happy with the system, though, and lead me to recommend against using it for highly critical systems or once you pass a certain size. That brings us to today.

Last week, there was an odd issue where we restarted one of our MongoDB instances and when it came back up, some of the journal files were owned by root. This caused the database to stop processing the journal and it started falling behind. It also couldn't download further journal data from the master, so it was effectively doing no work.

Our monitoring didn't catch it (it wasn't yet replicating so it wasn't showing any replication lag), so it went a while without being noticed. When I finally did realize it was broken, I fixed the ownership of the files and restarted it. A while later, I checked back on the status and saw that the replication state was RECOVERING. Great! I went about my business content in the knowledge that it was now recovering from the problem and would be back up to speed at some point.

That was Thursday. Today, the machine has still not recovered and seems to be falling farther and farther behind. That's odd. We aren't doing so many writes on this cluster that I would expect it to be that overloaded -- and the other replica members aren't having these issues. In fact, as I started to dig into it, I realized that it was doing no useful work at all -- not progressing even a tiny bit.

I ended up in the log files and found:

Mon Jan 30 11:59:03 [replica set sync] replSet error RS102 too stale to catch up, at least from blahblahblah:27018
Mon Jan 30 11:59:03 [replica set sync] replSet our last optime : Jan 21 11:00:02 4f1aef12:d4
Mon Jan 30 11:59:03 [replica set sync] replSet oldest at blahblahblah:27018 : Jan 29 06:05:59 4f253627:90
Mon Jan 30 11:59:03 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Mon Jan 30 11:59:03 [replica set sync] replSet error RS102 too stale to catch up
Mon Jan 30 11:59:03 [replica set sync] replSet RECOVERING

This is pretty obvious -- by the time it tried to recover, it was too far behind the master. The master no longer has the old replication log (oplog) entries it would need to send, so the instance can never just come back up and catch up on its own. That's fine. I've been a MySQL DBA long enough to know that this happens in any replicated system. No foul here.

The problem, though, is that MongoDB uses the state RECOVERING. That word has a very well understood meaning -- that something has happened and that whatever it was will be over at some point in the future. It is currently recovering from the failure. It's really not, though! This instance will never recover from the state that it is in. A more appropriate word would be FAILED or ERROR or something that actually indicates that there is a problem that requires manual intervention!
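
Since the state field can't be trusted here, one workaround is to watch the log for the RS102 message instead. Here is a minimal Python sketch of that idea, fed with the lines from the excerpt above. The function names are mine, and the year (2012) is an assumption taken from the date of this post, since the log timestamps don't include one.

```python
import re
from datetime import datetime

# Log lines copied from the excerpt above.
LOG = """\
Mon Jan 30 11:59:03 [replica set sync] replSet error RS102 too stale to catch up, at least from blahblahblah:27018
Mon Jan 30 11:59:03 [replica set sync] replSet our last optime : Jan 21 11:00:02 4f1aef12:d4
Mon Jan 30 11:59:03 [replica set sync] replSet oldest at blahblahblah:27018 : Jan 29 06:05:59 4f253627:90
Mon Jan 30 11:59:03 [replica set sync] replSet RECOVERING
"""

# Matches the two optime lines and captures the timestamp that follows them.
OPTIME = re.compile(
    r"replSet (our last optime|oldest at \S+) : (\w{3} +\d+ \d\d:\d\d:\d\d)"
)


def parse_optime(text, year=2012):
    # The log omits the year; 2012 is an assumption, not something the log says.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def check_stale(log_text):
    """Return (is_stale, gap): gap is how far our last optime trails the
    oldest entry the sync source still has, or None if either is missing."""
    is_stale = "RS102 too stale to catch up" in log_text
    ours = oldest = None
    for label, stamp in OPTIME.findall(log_text):
        if label.startswith("our"):
            ours = parse_optime(stamp)
        else:
            oldest = parse_optime(stamp)
    gap = oldest - ours if ours and oldest else None
    return is_stale, gap


stale, gap = check_stale(LOG)
print(stale, gap)
```

For the excerpt above this reports a stale member that is nearly eight days behind the oldest data the master still holds. A check like this should page someone, because no amount of waiting will fix it.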

I appreciate that MongoDB is a system that lends itself to ease of use and is very nice to set up. That's great. But if you want to be successful at companies with real traffic and usage, you have to build something that is reasonably sane for sysadmins to maintain. Our lives are already complicated enough with trying to manage dozens of systems built in thousands of ways -- if your system lies to me, I'm not going to feel comfortable with it and sure as heck won't recommend it to other companies!

The status fields of any system must be accurate. When you execute a SHOW SLAVE STATUS on MySQL, the Slave_IO_Running and Slave_SQL_Running columns need to be correct! If they're wrong, you suddenly can't trust the system and that takes it from a well-behaved system that is sane to administer to a black hole of fail that is going to bite you in the ass at some point.

For this and other reasons, we're in the process of moving off of MongoDB. It was a great system when we were smaller, but we're beyond that now. We need systems that we don't have to fight. (To that end, I have a lot of positive things to say about Riak. That's a subject for a different day, though.)

End of rant.

Posted on 2012-01-30 at 10:35


SSH key forwarding and screen/tmux

If you just want the answer, skip to the end. This is written as an educational post and has a lot more detail than just how to solve this problem. Thanks!

If you're like me, you spend a lot of time connected to various servers. In any given day I'm using a dozen or more servers to accomplish whatever it is I'm setting out to do. I'm also bouncing between networks -- wired and wireless, typically, but also sometimes the wireless drops, or I want to walk across the building, or even dare to go home sometimes.

For years now, I've been taking advantage of screen (and more recently, a newer system called tmux) to allow me to keep state when I'm reconnecting from various locations. If you haven't used it, it's well worth the time to learn one of these tools.

Next time you launch that six hour job and realize, three hours later, that it's time to go home -- no problem. You can just leave it running in the screen session and reconnect tomorrow or from home or wherever you go next. No status lost.

The biggest problem with using screen is that, unless you have properly configured everything, you often run into a problem with SSH key forwarding.

Before We Begin

To really follow along here, you're going to need the machine you're working on, a remote machine that you will connect to, and an SSH key. Setting up SSH key access to your server is beyond the scope of this particular tutorial.

Really, you will need to have two or more machines in your production environment, because this is an advanced technique designed for places where you have to connect to many servers.

I assume that you have SSH key forwarding (agent forwarding, e.g. ssh -A or ForwardAgent yes in your SSH config) working already. You should be able to ssh user@host and not have to type a password (except maybe your SSH key passphrase).

How SSH works, in brief

SSH is a layered system. If you are familiar with the OSI model, you know that there are different layers that build up the networking stack that we're familiar with. When you connect to a web site, the stack usually looks something like this:

  • Layers 5-7: HTTP in your browser (Chrome, Firefox, Safari, IE, etc...)
  • Layer 4: TCP (provides reliable, ordered delivery of bytes)
  • Layer 3: IP (allows two machines to talk to each other across the Internet)
  • Layer 2: Ethernet (your NIC on your computer)
  • Layer 1: CAT-5/6 cable (or other physical connection)

Each layer has its own set of responsibilities and allows the layers on top of it to operate without knowing the intricacies of how everything else works. When you want to connect to 8.8.8.8 on port 53, you don't care that this involves an extremely complex system involving everything from routing to physically sending electrical impulses. It just works.

SSH has its own layers. When you fire up an SSH connection to a machine, you are really establishing several things:

  • The SSH transport layer
  • User authentication to the remote machine
  • A plethora of distinct SSH channels for moving data

The transport and authentication layers are responsible for establishing your initial connection to the remote server. Once that's done, SSH gives you channels for moving data back and forth. This is very similar to how TCP gives you the ability to send data to a specific port -- the underlying network layer, IP (layer 3 in the OSI model), doesn't have that concept or care.

SSH uses a single TCP connection to a host to allow you to do many things over that single connection. If you are using port forwarding, SSH still uses a single TCP connection and multiplexes your forwarded connections, your shell, and whatever else you're doing all through the same pipe.
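
To make the multiplexing idea concrete, here is a toy Python sketch. It is emphatically not the SSH wire format: the frame layout (a channel id and a length, followed by the payload) and the function names are made up for illustration. It only shows how several logical channels can share one byte stream and be separated again on the other end.

```python
import struct
from collections import defaultdict

# Toy frame header: 4-byte channel id + 4-byte payload length, network order.
HEADER = "!II"
HEADER_SIZE = struct.calcsize(HEADER)


def pack_frame(channel_id, payload):
    """Prefix a payload with its channel id and length."""
    return struct.pack(HEADER, channel_id, len(payload)) + payload


def demultiplex(stream):
    """Split one byte stream back into per-channel data."""
    channels = defaultdict(bytes)
    offset = 0
    while offset < len(stream):
        channel_id, length = struct.unpack_from(HEADER, stream, offset)
        offset += HEADER_SIZE
        channels[channel_id] += stream[offset:offset + length]
        offset += length
    return dict(channels)


# Channel 0 plays the shell, channel 1 plays a forwarded port.
stream = (
    pack_frame(0, b"ls -l\n")
    + pack_frame(1, b"GET / HTTP/1.0\r\n")
    + pack_frame(0, b"exit\n")
)
print(demultiplex(stream))
```

The frames are interleaved on the "wire", but each channel comes out the other side with only its own bytes, in order.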

The problem statement

Now let's move to forwarding. In our example today, we're going to be using three machines. Your laptop will be named laptop (original, I know) and you will be first connecting to the machine named gateway. You have a screen session on that machine and you want to then connect to web01 and all of your other servers.

mark@laptop:~$ ssh gateway
mark@gateway:~$

When you type that command, SSH gets busy establishing a transport layer and performing user authentication. Since we're not debugging auth right now, let's just assume it works.

You are now presented with a shell on your remote machine. From this bare shell, you can connect off to your webserver and it should just work:

mark@gateway:~$ ssh web01
mark@web01:~$

Done. That was easy. If you just want to do this, there's really not much you have to do. Assuming your original SSH client is forwarding, you should be able to hop that to the next server.

But let's go back to our gateway machine and fire up screen...

mark@web01:~$ exit
mark@gateway:~$ screen

Now you will be back in a shell, but you will be inside of screen. I am also not going to give you a screen tutorial in this blog post. I will assume that you know the basics of using screen -- attach, detach, and reattach are all you really need to know for this.

From inside of screen, now SSH to your webserver. It works! But wait, you haven't done anything to configure anything yet! That's right, it'll work ... for now. Go ahead and detach from screen (detach -- don't terminate!) and then log out of your gateway machine.

mark@gateway:~$ ^ad
[detached from 23038.main]

mark@gateway:~$ exit
mark@laptop:~$

You are now back on your laptop, but your screen is still running. Reconnect to gateway and reattach your screen and then try to connect to your web server:

mark@laptop:~$ ssh gateway
mark@gateway:~$ screen -r
mark@gateway:~$ ssh web01
mark@web01's password:

You get a password prompt -- you aren't allowed in! How did this happen?

SSH forwarding, how it works

On gateway, after establishing the SSH connection, take a look at the environment of your shell:

mark@gateway:~$ env | grep SSH
SSH_CLIENT=68.38.123.35 45926 22
SSH_TTY=/dev/pts/0
SSH_CONNECTION=68.38.123.35 48926 10.1.35.23 22
SSH_AUTH_SOCK=/tmp/ssh-hRNwjA1342/agent.1342

The important one here is SSH_AUTH_SOCK, which is currently set to some file in /tmp. If you examine this file, you'll see that it's a Unix domain socket -- and it belongs to the particular SSH session (the sshd process on gateway) that you connected in on. Importantly, this changes every time you connect.

As soon as you log out, that particular socket file is gone. Now, if you go and reattach your screen, you'll see the problem. It has the environment from when screen was originally launched -- which could have been weeks ago. That particular socket is long since dead.

From inside of screen, your shell has no idea that there is a real SSH authentication socket somewhere else. It just knows that the one you have told it to use doesn't exist.
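
If you want to confirm that this is what you're hitting, a quick check from inside the stale screen session helps. The following is a small Python sketch (the function name and the status strings are my own) that classifies whatever path SSH_AUTH_SOCK currently holds.

```python
import os
import stat


def auth_sock_status(path):
    """Classify the path a shell has in SSH_AUTH_SOCK."""
    if not path:
        return "unset"
    try:
        mode = os.stat(path).st_mode  # os.stat follows symlinks
    except OSError:
        return "missing"  # the classic stale-screen symptom
    # A socket file existing does not prove the agent behind it answers,
    # but a missing one is definitely dead.
    return "socket" if stat.S_ISSOCK(mode) else "not-a-socket"


print(auth_sock_status(os.environ.get("SSH_AUTH_SOCK")))
```

In a freshly connected shell with forwarding on you should see "socket". In a reattached screen that was started under an earlier login, you'll most likely see "missing".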

Solving the crisis

There are several ways of solving this problem. I believe the following to be the easiest and most reliable of the ones I've tried. This works in bash and zsh and probably will work in other shells as well.

Solution: since we know the problem has to do with knowing where the currently live SSH authentication socket is, let's just put it in a predictable place!

In your .bashrc or .zshrc file, add the following:

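# Predictable SSH authentication socket location.
SOCK="/tmp/ssh-agent-$USER-screen"
# Only act when a socket was forwarded and it is not already our symlink.
if [ -n "$SSH_AUTH_SOCK" ] && [ "$SSH_AUTH_SOCK" != "$SOCK" ]
then
    # Repoint the symlink at this login's socket, then use the stable path.
    rm -f "$SOCK"
    ln -sf "$SSH_AUTH_SOCK" "$SOCK"
    export SSH_AUTH_SOCK="$SOCK"
fi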

That's it. Make sure to put this on every machine that you intend to connect through, then you're done. SSH to gateway, reconnect to your screen, and you can immediately SSH over to web01 or wherever you want to go. It just works.

All this code does, when you first SSH in to the machine, is set your SSH_AUTH_SOCK variable to a predictable value. It's a symlink that points to whatever your current SSH authentication socket happens to be. Every time you SSH in to this machine, that symlink gets rebuilt.

Inside of screen, the environment never has to change. It dereferences the symlink to find the correct socket and just works. No matter how many times you reconnect.
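
If the symlink trick feels like magic, here is a small Python simulation of it. Everything runs in a throwaway temp directory: plain files stand in for the per-login agent sockets, and the file names are made up for the demo. Only the mechanism is the same.

```python
import os
import tempfile


def repoint(stable_path, live_path):
    """Mirror the rm -f / ln -s pair from the shell snippet above."""
    if os.path.lexists(stable_path):
        os.unlink(stable_path)
    os.symlink(live_path, stable_path)


workdir = tempfile.mkdtemp()
stable = os.path.join(workdir, "ssh-agent-demo-screen")  # hypothetical name

# Plain files stand in for the per-login agent sockets.
first_login = os.path.join(workdir, "agent.1342")
second_login = os.path.join(workdir, "agent.2077")
open(first_login, "w").close()
open(second_login, "w").close()

repoint(stable, first_login)
seen_by_screen = stable  # this value is frozen in screen's environment

resolved_first = os.path.realpath(seen_by_screen)

os.unlink(first_login)  # logging out removes that login's socket
dangling = not os.path.exists(seen_by_screen)

repoint(stable, second_login)  # the next login rebuilds the symlink
resolved_second = os.path.realpath(seen_by_screen)

print(os.path.basename(resolved_first), dangling,
      os.path.basename(resolved_second))
```

The path that screen holds never changes. What it resolves to does: first the original login's socket, then nothing while you're logged out, then the new login's socket.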

Conclusion and room for improvement

It took me a while to settle on this method. Originally I tried something fancy with getting screen/tmux to automatically import the environment of the shell I was attaching from, but that proved hard/impossible.

I also tried building a wrapper around the SSH command to automatically set the right environment variables. That turned out to work OK but was clumsy and hard to maintain between different machines. It also required building more and more wrappers to get other commands to work and ultimately proved unsustainable.

This particular solution came from, I'm pretty sure, somewhere else on the Internet. I would attribute if I remembered where I got the idea from. It's simple and just works.

The only trouble I've had is when I leave a terminal up at home, then go to work and connect from there (overwriting the symlink), and then when I get back home I have to close that terminal. I can't just use it. This happens so rarely that I haven't tried to engineer a fix to it. Let me know if you come up with one, though.

Thanks for reading. I hope this improves your systems administration experience.

Posted on 2011-11-17 at 15:48


Next Generation Monitoring

I want to talk a bit about the thoughts in my head about building a new monitoring system to replace Nagios. This is something that I've been thinking about for years and years, but finally I'm getting enough internal momentum to actually make it happen. First, let's dive in and look at the existing landscape of monitoring tools (as I know them).

Define Your Terms

For the purpose of this blog post, I define "monitoring" loosely as the act of gathering information about your services for the express purpose of alerting you when there's a problem. The other side of things, where you are creating pretty graphs to see how your servers and services are behaving over time, is what I will call performance trending/analysis.

In short, Nagios is a monitoring system in that when your host goes down, it pages you. Cacti, on the other hand, is a performance analysis system that lets you keep track of how much RAM you have free, etc.

Many systems are both, too. But for the sake of this blog post, I'm mostly focusing on the monitoring side of the equation. If you want a good recommendation for performance analysis, please see OpenTSDB.

Monitoring Today

There are, it seems, two main approaches to monitoring: Nagios and everything else. Nagios is a fairly simple, relatively easy to use system that is good at doing a few things and doesn't really have many bells or whistles and doesn't do much else beyond monitoring your services.

Everything else seems to be a "Nagios and then some" system, providing some manner of bells and whistles that the traditional Nagios installation doesn't provide. That's fine, I don't really mind functionality, but it really gets away from the thing that I really need: something to let me know when my shit is broken.

I've spent a while over the years using Nagios, but every so often I go out and do a survey of the landscape. Sadly, the state of the art really hasn't changed a lot in ... well, years. You have Zabbix, OpenNMS, Zenoss, Hyperic, Icinga, Opsview, and I might be missing a few...

And, honestly, they're all probably good and accomplish the basic goals, but what they don't do for me is allow me to quickly and easily, with a minimum of fuss and nonsense, just monitor my infrastructure. I want something simple and easy to use. No surprises. A nearly flat learning curve. A UI that works. A CLI. (Preferably one that works, too!)

These tools are Enterprise. They've got sales reps, marketing videos, VM appliances, and some of them are even built to do Windows, Unix, Solaris, and VMS! It's great, I'm positive they fill needs that people have and I don't think they're bad products. They're really just not what I'm looking for. Far too big for my needs.

The only thing that comes close to meeting my needs (forget my wants) is Nagios Core.

So, why not Nagios Core?

Because the HTML it generates looks like this:

<table border=0 width=100% cellspacing=0 cellpadding=0>
<tr>
<td align=left valign=top width=33%>
<TABLE CLASS='infoBox' BORDER=1 CELLSPACING=0 CELLPADDING=0>
<TR><TD CLASS='infoBox'>
<DIV CLASS='infoBoxTitle'>Current Network Status</DIV>
Last Updated: Wed Nov 2 02:26:21 CDT 2011<BR>

Okay, a little more seriously: because it's basically crippleware. Nagios Core has been held back to the state it was in nearly a decade ago so that the company can differentiate its enterprise offering, Nagios XI.

I'm all for the company making money -- that's great -- but their decision to leave the open source version of the product back in the stone age makes it so that I can't really use it to meet my needs. Over the years I've put hundreds of my own hours into efforts that I really shouldn't have had to because the system lacks so much that I need:

  • A functioning CLI. Doesn't exist. I'm starting to write one, but I really shouldn't have to.

  • A UI that is at all modern. The code above demonstrates the point, but if you actually interact with Nagios Core, you'll pretty quickly regret it. It's hard to use and has arcane, confusing commentary. Just try to schedule a downtime and do it right the first time!

  • Nice to have: An API that I can integrate with. I would like to build my own UIs or dashboards, so please give me access to your data in a reasonable fashion.

  • Reasonable behavior -- this is a very personal opinion, but Nagios does a few things that confuse and consternate me.

To be fair -- Nagios is still, in my opinion, the only system that allows me to get a monitoring environment up and running in an hour or two of hacking. A basic setup is easy to accomplish and worth having. I've used the software for many years now and I still choose it over everything else, so it's not all bad.

In fact, I recommend it if you're not sure what to use. It is currently the best system out there for monitoring your infrastructure.

The Wheel, Again

Of course, I wouldn't have started this blog post if all I wanted to do was bash Nagios. I really don't intend to be that hard on it. It's a good system, it's just old and getting older. Today's infrastructures demand a new, more interoperable monitoring system, and that's what I want to talk about here.

I'm starting to put together a design for building a monitoring system. I have a few key points that I am keeping in mind while doing this, but they're things that I think should resonate with many of you:

  • Prioritize simple. I'm not building an Enterprise(TM) solution here, I'm building for the busy sysadmin who needs to make sure things are working. Configuration and usage should be damn easy. So should setup.

  • Keep it minimal. The core of this project can be defined as "make software that tells me if my shit is broken". Other functionality can be added by other software -- which I may or may not write, but won't be part of the core.

  • Integrate with everybody. Provide a functional API that allows people to write web interfaces, shell scripts, or whatever they want. I will provide libraries to do just that, too, to make it easier to get started.

Those are my three main points right now: write something simple, make it handle the few things it should, and allow other people to bolt things on if they want. Want to add a widget to your dashboard that shows the availability of a service? Great, that's a simple HTTP query that will return JSON for you to consume. Want a shell script to silence alerts? Easy.
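
To give a feel for what I mean, here is a rough Python sketch of both sides of that conversation. None of this exists yet: the /availability/ route, the JSON fields, and the sample data are all hypothetical, and the tiny in-process server only stands in for the monitoring core so that the example can run by itself.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical check history: service name -> recent pass/fail results.
RESULTS = {"web01-http": [True, True, False, True]}
PREFIX = "/availability/"  # hypothetical route, not a real API


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        name = self.path[len(PREFIX):] if self.path.startswith(PREFIX) else None
        checks = RESULTS.get(name)
        if not checks:
            self.send_error(404)
            return
        body = json.dumps(
            {"service": name, "availability": sum(checks) / len(checks)}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_availability(base_url, service):
    """What a dashboard widget would do: one GET, one JSON document."""
    # Ignore any proxy settings so the loopback request stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}{PREFIX}{service}", timeout=5) as response:
        return json.load(response)


server = HTTPServer(("127.0.0.1", 0), ApiHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = fetch_availability(f"http://127.0.0.1:{server.server_port}", "web01-http")
server.shutdown()
server.server_close()
print(result)
```

The point is the client half: fetch_availability is only a few lines, and that's about how much work I think a widget author should have to do.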

Implementation Notes

I've spent a lot of time considering my options here, and as much as I love Perl, these days I'm a Python guy. I'm going to stick to Python for now. I will probably also use the Diesel library. That provides a lot of network service and microthread functionality that certainly makes my life a lot easier.

Another goal (this may not be in v1, I'm not sure) is also to make it so that the system can run on N machines for redundancy. These days, there's very little reason to run your monitoring system in one place. Why not run it on five machines and just have them sort out how to divvy up the work? This is the way many things are moving, and I see no reason that monitoring systems can't as well.

In the name of allowing people to do some interesting and complicated things with the system, I really want to support a full event system. While this is actually not particularly complicated for a monitoring system, it has a lot of implications for the rest of the ecosystem.

For example, let's say that we have an event that fires when the monitoring system has determined that a host is down. Next, we give people the ability to write plugins for the monitoring system that can listen to events. Alternatively, we could allow people to subscribe to events using a pubsub-type model of some sort.

Either way, someone could potentially write code that does a database failover when the system detects that a database has gone down. Or maybe they have code to automatically restart a process, reboot a server, etc. The list of possibilities is endless and it doesn't compromise the vision to build a simple system -- you never have to touch it. The power is there, though.
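
As a thought experiment, the pubsub flavor could be as small as the Python sketch below. This is not a design decision: the EventBus class, the "host.down" event name, and both handlers are invented here purely to show the shape of the idea.

```python
class EventBus:
    """A minimal in-process publish/subscribe hub (hypothetical design)."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, event_name, callback):
        self._subscribers.setdefault(event_name, []).append(callback)

    def publish(self, event_name, **details):
        """Call every subscriber; one failing plugin must not stop the rest."""
        errors = []
        for callback in self._subscribers.get(event_name, []):
            try:
                callback(**details)
            except Exception as exc:
                errors.append(exc)
        return errors


actions = []  # stands in for real side effects


def failover_database(host, **_):
    if host.startswith("db"):
        actions.append(f"promote replica for {host}")


def page_oncall(host, **_):
    actions.append(f"page on-call: {host} is down")


bus = EventBus()
bus.subscribe("host.down", failover_database)
bus.subscribe("host.down", page_oncall)

# The core would publish this once it decides a host is really down.
bus.publish("host.down", host="db01")
print(actions)
```

The core only ever publishes "host.down". Whether anyone reacts with a failover, a page, or nothing at all is entirely up to whatever has subscribed.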

Closing Thoughts

Monitoring is a really interesting subject to me. It seems to me that the state of the art is really pretty woeful when you consider how important our infrastructure is these days. Most people use a handful of tools they've cobbled together combined with a few dozen scripts of their own and nobody ever seems to have a really great handle on it.

It would be good to simplify this and, to some extent, standardize it. The LAMP stack has nearly been commoditized at this point, giving rise to services like Heroku that allow you to just write code and not worry about your backend. Those are great and for those who can use them -- awesome. I envy you a bit.

For the rest of us, though: I think it's high time to improve the state of things. I welcome your feedback as I (continue to) embark on this crusade.

Posted on 2011-11-01 at 23:50