Wednesday, March 16, 2011

Nagios: Monitoring Systems & Applications

"Someone please help me."
These are usually the first words we say when our production system hangs or crashes for some reason.
But "Someone helps those who help themselves."  --Harpreet Singh ;-)

Jokes apart.
Suppose we got an alert before the system failed: maybe at the first sign of trouble, when the load started climbing, when the total number of processes grew too large, or when anything went wrong with the applications running on the server.

Wouldn't this be like a boon, a chance to save the system in time?
If you have seen my earlier posts, you will find monit/munin doing the same, but along the way in my learning I found that Nagios is a better (easier and more flexible/pluggable) tool.
Before explaining why my opinion changed, let me ask you one question.
What do you expect/need from a system/application monitoring tool?

The general answers would be:
 - Stable.
 - Good UI.
 - Easy Installation.
 - Easy configuration.
 - Good coverage of different applications and systems.
 - etc. etc.

Now let's see whether Nagios answers all of these.

Like other tools, Nagios has a client-server architecture, which gives us the freedom to monitor any number of systems/applications from one Nagios server.
It has a UI that is easy to understand and configure, through which you can do many things, like scheduling checks and controlling alerts. And if you are a CLI lover (as most Linux geeks are), you can do all of that from the command line as well.

Now here comes the most impressive part.
Nagios is highly flexible. First of all, it has a huge base of plugins already available for you to work with.
But if that is not enough for you, then ask yourself just one question:
Do I know how to write a script (bash, python etc.)?

I usually sum up Nagios in one line: "If you can do it through the CLI, you can definitely do it with Nagios." The same goes for the question you just asked yourself. If you can write a script that performs an action (login check, API call, application query etc.) and prints a small, readable output for both the success and failure cases, then it's child's play to integrate it into Nagios and see the same results in the UI.
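To make the "small readable output" idea concrete, here is a minimal sketch of such a script. It relies on the convention Nagios plugins follow: one status line on stdout, plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The file name check_load_sketch.py, the function names and the default thresholds are my own assumptions for illustration, not something that ships with Nagios.

```python
#!/usr/bin/env python3
"""Hypothetical Nagios-style check: 1-minute load average against two thresholds."""
import os
import sys

# Exit codes that Nagios plugins conventionally use.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to an exit code; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(argv):
    """Print one status line and return the exit code for it."""
    # Thresholds are illustrative defaults; override as: check_load_sketch.py WARN CRIT
    try:
        warn = float(argv[1]) if len(argv) > 1 else 4.0
        crit = float(argv[2]) if len(argv) > 2 else 8.0
        load1 = os.getloadavg()[0]  # 1-minute load average, Unix only
    except (ValueError, OSError) as exc:
        print(f"LOAD UNKNOWN - {exc}")
        return UNKNOWN
    code = evaluate(load1, warn, crit)
    print(f"LOAD {LABELS[code]} - load1={load1:.2f} (warn={warn}, crit={crit})")
    return code
```

To run it as a real check, the file would end with sys.exit(main(sys.argv)) so that the exit code reaches Nagios. The same pattern should work for a login check or an API call: do the action, print one line, and exit with the matching code.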

In simple words:
 - Write a script to perform a certain action.
 - Copy it to the Nagios plugin directory (just to ensure that you or anyone else doesn't accidentally delete it).
 - Add it to the Nagios command definitions.
 - Call that command from a check on the host you want.
 - And done.
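The "add a command" and "call it for a host" steps come down to two small object definitions in the Nagios configuration. The sketch below only renders them as text so you can see their shape. The command name check_load_sketch, the script name, the host web01 and the generic-service template are assumptions for illustration; $USER1$ is the macro that conventionally points at the plugin directory, and $ARG1$/$ARG2$ receive the values passed after each "!" in check_command.

```python
PLUGIN_DIR_MACRO = "$USER1$"  # conventionally resolves to the Nagios plugin directory


def command_definition(command_name, script, args=""):
    """Render a 'define command' block that runs a script from the plugin directory."""
    command_line = f"{PLUGIN_DIR_MACRO}/{script} {args}".strip()
    return (
        "define command {\n"
        f"    command_name    {command_name}\n"
        f"    command_line    {command_line}\n"
        "}\n"
    )


def service_definition(host_name, description, check_command, template="generic-service"):
    """Render a 'define service' block that attaches the command to one host."""
    return (
        "define service {\n"
        f"    use                    {template}\n"
        f"    host_name              {host_name}\n"
        f"    service_description    {description}\n"
        f"    check_command          {check_command}\n"
        "}\n"
    )


# Hypothetical names throughout; adjust to your own setup.
cmd = command_definition("check_load_sketch", "check_load_sketch.py", "$ARG1$ $ARG2$")
svc = service_definition("web01", "Load (sketch)", "check_load_sketch!4!8")
print(cmd)
print(svc)
```

The printed blocks would go into whichever object configuration files your installation loads. After that, verify the configuration and reload Nagios so the new check shows up in the UI.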

Another plus is that you can show off the work to your seniors (with minimal effort involved) ;-)
