How to monitor key server services (sshd, Bind, syslog, postfix, courier-imap, mysql, apache, vsftpd, MRTG, clamd, amavisd, vixie-cron) using the Monit package.
Every administrator wants to know the current state of their servers. As the number of servers grows, monitoring them becomes a painful task. Fortunately, Linux offers many tools that can monitor a server and inform the administrator about problems. On my LAMP servers I use a package called Monit, which helps me keep the servers up and running. Monit can monitor server load and notify the administrator when the load becomes too heavy. It can also monitor key server services and restart them if they are overloaded or have crashed. Monit is a superb failsafe utility.
I will share the Monit configuration from my LAMP and mail servers, hoping that someone will find this information useful. The configuration provided here should be applicable to any Linux distribution, but the installation instructions are specific to Gentoo Linux, which I use on my servers.
The first thing we need to do is install Monit on our server. In Gentoo Linux we can do it like this:
emerge -av app-admin/monit
Configuring Monit is extremely easy; it takes no more than 15 minutes. Monit can monitor the whole system as well as particular services. Your Monit configuration should reflect the services running on your server. On my LAMP servers I use ssh, the Bind DNS server, the MySQL database server, the Apache web server, the Vsftpd FTP server and the Subversion version control server. On my mail servers I additionally use the Postfix and Courier-Imap mail servers, the Clam antivirus and the Amavisd content checker. I also use Monit to monitor the Syslog-Ng, MRTG and Vixie-cron daemons. To create the Monit configuration in Gentoo Linux you have to edit the /etc/monitrc file. You can also create a directory called /etc/monit.d and keep separate Monit configuration files there. In fact, this layout may be the default in many Linux distributions. If you want to organize the configuration this way, make sure you uncomment the following section in /etc/monitrc:
###############################################################################
## Includes
###############################################################################
##
## It is possible to include additional configuration parts from other files or
## directories.
#
#  include /etc/monit.d/*
#
#
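If you prefer the one-file-per-service layout, a drop-in file could be created roughly as follows. This is only a minimal sketch: the variable MONIT_D is hypothetical and defaults to a scratch directory so that the script is safe to try, and the sshd check is the one from the full configuration shown later in this article.

```shell
#!/bin/sh
# Sketch: keep one Monit file per service in a drop-in directory.
# MONIT_D is a hypothetical variable; it defaults to a scratch directory so
# the sketch is safe to run. On a real server it would be /etc/monit.d.
MONIT_D="${MONIT_D:-${TMPDIR:-/tmp}/monit.d-example}"
mkdir -p "$MONIT_D"

# Write the sshd check into its own file.
cat > "$MONIT_D/sshd" <<'EOF'
check process sshd with pidfile /var/run/sshd.pid
    group system
    start program "/etc/init.d/sshd start"
    stop program "/etc/init.d/sshd stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout
EOF

# Keep the file private, since Monit files may contain passwords.
chmod 600 "$MONIT_D/sshd"
ls -l "$MONIT_D"
```

Run it with MONIT_D=/etc/monit.d (as root) once you are happy with the result; the include line in /etc/monitrc has to be uncommented for Monit to pick the file up.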
I will keep the configuration in one file. Below you will find the most important parts of my Monit configuration, coming from both types of servers. This example file is a merge of quite a few files from the different servers I use:
###############################################################################
## Global section
###############################################################################
##
## Start Monit in the background (run as a daemon):
#
set daemon 300              # check services at 5-minute intervals
    with start delay 600    # optional: delay the first check by 10-minutes

## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. If you want to log to
## a stand alone log file instead, specify the path to a log file
#
set logfile syslog facility log_daemon

### Set the location of the Monit id file which stores the unique id for the
### Monit instance. The id is generated and stored on first Monit start. By
### default the file is placed in $HOME/.monit.id.
#
set idfile /var/monit/.monit.id

### Set the location of monit state file which saves the monitoring state
### on each cycle. By default the file is placed in $HOME/.monit.state. If
### state file is stored on persistent filesystem, monit will recover the
### monitoring state across reboots. If it is on temporary filesystem, the
### state will be lost on reboot.
#
set statefile /var/monit/.monit.state

## Set the list of mail servers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - this
## is possible to override with the PORT option.
#
set mailserver primary_mail_server_address,     # primary mailserver
               secondary_mail_server_address    # fallback relay

## By default monit will drop alert events if no mail servers are available.
## If you want to keep the alerts for a later delivery retry, you can use the
## EVENTQUEUE statement. The base directory where undelivered alerts will be
## stored is specified by the BASEDIR option. You can limit the maximal queue
## size using the SLOTS option (if omitted, the queue is limited by space
## available in the back end filesystem).
#
set eventqueue
    basedir /var/monit    # set the base directory where events will be stored
    slots 100             # optionally limit the queue size

## You can set alert recipients who will receive alerts if/when a
## service defined in this file has errors. Alerts may be restricted on
## events by using a filter.
#
set alert my_email@adress.org    # receive all alerts

## Monit has an embedded web server which can be used to view status of
## services monitored and manage services from a web interface. See the
## Monit Wiki if you want to enable SSL for the web server.
#
set httpd port monit_web_port and
    allow login:password
    SSL ENABLE
    PEMFILE /etc/ssl/apache2/certificate.pem
    CLIENTPEMFILE /var/certs/monit-client.pem

###############################################################################
## Services
###############################################################################
##
## Check general system resources such as load average, cpu and memory
## usage. Each test specifies a resource, conditions and the action to be
## performed should a test fail.
#
check system localhost
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert

###################################################
#Check ssh
###################################################
check process sshd with pidfile /var/run/sshd.pid
    group system
    start program "/etc/init.d/sshd start"
    stop program "/etc/init.d/sshd stop"
    if failed port 22 protocol ssh then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check Bind
###################################################
check process named with pidfile /chroot/dns/var/run/named/named.pid
    group dns
    start program = "/etc/init.d/named start"
    stop program = "/etc/init.d/named stop"
    if failed host 127.0.0.1 port 53 type tcp then restart
    if failed host 127.0.0.1 port 53 type udp then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check syslog-ng
###################################################
check process syslog-ng with pidfile /var/run/syslog-ng.pid
    group system
    start program = "/etc/init.d/syslog-ng start"
    stop program = "/etc/init.d/syslog-ng stop"
    if 5 restarts within 5 cycles then timeout

###################################################
#Check Vixie-cron
###################################################
check process cron with pidfile /var/run/cron.pid
    group system
    start program = "/etc/init.d/vixie-cron start"
    stop program = "/etc/init.d/vixie-cron stop"
    if 5 restarts within 5 cycles then timeout

###################################################
#Check postfix
###################################################
check process postfix with pidfile /var/spool/postfix/pid/master.pid
    start program = "/etc/init.d/postfix start"
    stop program = "/etc/init.d/postfix stop"
    if cpu > 40% for 2 cycles then alert
    if cpu > 60% for 5 cycles then restart
    if totalmem > 512 MB then restart
    if failed port 25 protocol smtp then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check POP3
###################################################
check process pop3 with pidfile /var/run/pop3d.pid
    group mail
    start program = "/etc/init.d/courier-pop3d start"
    stop program = "/etc/init.d/courier-pop3d stop"
    if failed port 110 then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check POP3-SSL
###################################################
check process pop3-ssl with pidfile /var/run/pop3d-ssl.pid
    group mail
    start program = "/etc/init.d/courier-pop3d-ssl start"
    stop program = "/etc/init.d/courier-pop3d-ssl stop"
    if failed host localhost port 995 type tcpssl sslauto protocol pop then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check IMAP
###################################################
check process imap with pidfile /var/run/imapd.pid
    group mail
    start program = "/etc/init.d/courier-imap start"
    stop program = "/etc/init.d/courier-imap stop"
    if failed host localhost port 143 protocol imap then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check IMAP SSL
###################################################
check process imap-ssl with pidfile /var/run/imapd.pid
    group mail
    start program = "/etc/init.d/courier-imapd-ssl start"
    stop program = "/etc/init.d/courier-imapd-ssl stop"
    if failed host localhost port 993 type tcpssl sslauto protocol imap then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Courier-Authlib
###################################################
check process authdaemon with pidfile /var/run/authdaemon.pid
    group mail
    start program = "/etc/init.d/courier-authlib start"
    stop program = "/etc/init.d/courier-authlib stop"
    if 5 restarts within 5 cycles then timeout

###################################################
#SASL Auth Daemon
###################################################
check process saslauthd with pidfile /var/lib/sasl2/saslauthd.pid
    group mail
    depends on postfix
    start program = "/etc/init.d/saslauthd start"
    stop program = "/etc/init.d/saslauthd stop"
    if 5 restarts within 5 cycles then timeout

###################################################
#Amavisd-new
###################################################
check process amavisd with pidfile /var/amavis/amavisd.pid
    group mail
    start program = "/etc/init.d/amavisd start"
    stop program = "/etc/init.d/amavisd stop"
    if cpu > 40% for 2 cycles then alert
    if cpu > 60% for 5 cycles then restart
    if failed unixsocket /var/amavis/amavis.sock then restart
    if failed port 10024 then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#CLAM Antivirus
###################################################
check process clamd with pidfile /var/run/clamav/clamd.pid
    group virus
    start program = "/etc/init.d/clamd start"
    stop program = "/etc/init.d/clamd stop"
    if cpu > 40% for 2 cycles then alert
    if cpu > 60% for 5 cycles then restart
    if failed unixsocket /var/run/clamav/clamd.sock then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check svnserve
###################################################
check process svnserve with pidfile /var/run/svnserve.pid
    start program = "/etc/init.d/svnserve start"
    stop program = "/etc/init.d/svnserve stop"
    if failed host localhost port SVNSERVE_PORT then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check mysql
###################################################
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    group database
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed unix "/var/run/mysqld/mysqld.sock" then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check MRTG
###################################################
check process mrtg with pidfile /var/run/mrtg.pid
    group monitoring
    start program = "/etc/init.d/mrtg start"
    stop program = "/etc/init.d/mrtg stop"
    if 5 restarts within 5 cycles then timeout

###################################################
#Check apache2
###################################################
check process apache with pidfile /var/run/apache2.pid
    group www
    start program = "/etc/init.d/apache2 start"
    stop program = "/etc/init.d/apache2 stop"
    if failed host www.web_app_address.org port 80 protocol http
        and request "/monit/hello" then restart
    if failed host 127.0.0.1 port 80 protocol apache-status
        loglimit > 60% then restart
    if cpu > 60% for 2 cycles then alert
    if cpu > 80% for 5 cycles then restart
    if totalmem > 1024 MB for 2 cycles then alert
    if totalmem > 2048 MB for 5 cycles then restart
    if children > 500 then restart
    if loadavg(15min) > 10 for 8 cycles then restart
    if 5 restarts within 5 cycles then timeout

###################################################
#Check vsftpd
###################################################
check process vsftpd with pidfile /var/run/vsftpd.pid
    group ftp
    start program = "/etc/init.d/vsftpd start"
    stop program = "/etc/init.d/vsftpd stop"
    if failed port 2122 protocol ftp then restart
    if 5 restarts within 5 cycles then timeout
I will briefly explain this configuration. The first part of the file defines the configuration of the Monit service itself. All defined checks run at 5-minute intervals, and the first check is performed 10 minutes after the service starts. The next important thing is the mail server configuration. As you can see, I have defined a primary and a backup server. It is not a good idea to rely only on the localhost mail server, as most likely you will be monitoring that very server with Monit. Make sure the secondary mail server accepts relayed mail from the Monit server. The next lines define the EVENTQUEUE, which is used to store Monit events if neither mail server is available. The next thing you may notice in the configuration file is the alert recipient definition. I use one email address, but Monit allows a very detailed configuration of mail recipients. For example, it can send mails to different recipients depending on the type of the event. Monit comes with an embedded web server. My configuration allows connections to this server based on the provided username and password. The web connection requires SSL on the given server port, and access is allowed only from browsers presenting the matching client certificate. Make sure you open the specified web server port in the firewall for TCP connections.
The most important part is the Services section. First I defined server load monitoring. As you can see, most of these lines speak for themselves. The next part is service-specific monitoring. Every service is monitored using its pid file, followed by the start and stop commands for the service. As you can see, if something goes wrong Monit will try to restart the service 5 times within 5 cycles before assuming it can't be done. For some services (ssh, Bind, postfix, pop3, pop3-ssl, imap, imap-ssl, amavisd, svnserve, vsftpd) I use network connection monitoring; for others (amavisd, clamd, mysql) I use unix socket monitoring. Additionally, key services that may cause a large server load (postfix, amavisd, clamd) are monitored based on their CPU usage. First I generate an alert, which makes Monit send me an email, and if the load keeps growing Monit restarts the service. The most complex monitoring definition I use is for the Apache service. I love Monit for its ability to monitor the web server based on a request to a single web application.
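The Apache check above requests /monit/hello from the web application, so something has to answer at that path. A minimal sketch of one way to provide it follows, assuming a plain static file is enough. DOCROOT and probe_url are hypothetical names introduced for this example, and DOCROOT defaults to a scratch directory rather than a real document root.

```shell
#!/bin/sh
# Sketch: provide the /monit/hello resource that the Apache check requests.
# DOCROOT is an assumption; it defaults to a scratch directory so the sketch
# is safe to run. On a real server point it at the web application's
# document root.
DOCROOT="${DOCROOT:-${TMPDIR:-/tmp}/htdocs-example}"
mkdir -p "$DOCROOT/monit"
printf 'hello\n' > "$DOCROOT/monit/hello"

# Hypothetical helper: request a URL and succeed only on an HTTP success
# status, which is a rough manual equivalent of the Monit test (needs curl).
probe_url() {
    curl --fail --silent --show-error --output /dev/null "$1"
}

echo "token written to $DOCROOT/monit/hello"
# Example call once Apache serves the file:
#   probe_url "http://www.web_app_address.org/monit/hello" && echo "web app answers"
```

Using a small dedicated file keeps the check cheap, since Monit requests it on every cycle.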
After creating the Monit configuration you have to add the Monit service to your distribution's default run level. In Gentoo Linux we do it using the rc-update script like this:
rc-update add monit default
Next start the monit service:
/etc/init.d/monit start
If you run into problems and the service is not able to start, it is most likely because you made a simple mistake in the configuration file. The easiest way to find the mistake is to run the following command in a shell:
monit status
I found that this gives you detailed information about what is wrong with your configuration.
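Monit also has a dedicated syntax check, monit -t, which parses the control file without starting anything. Here is a minimal sketch of a wrapper around it, assuming monit is in the PATH and the control file is /etc/monitrc as in this article. The function name check_monitrc and the MONITRC variable are made up for the example.

```shell
#!/bin/sh
# Sketch: validate the Monit control file before (re)starting the daemon.
# MONITRC is an assumption based on the Gentoo path used in this article.
MONITRC="${MONITRC:-/etc/monitrc}"

check_monitrc() {
    rc="$1"
    if [ ! -f "$rc" ]; then
        echo "missing control file: $rc" >&2
        return 1
    fi
    if ! command -v monit >/dev/null 2>&1; then
        echo "monit is not installed" >&2
        return 127
    fi
    # -t runs a syntax check only, -c selects the control file to read.
    monit -t -c "$rc"
}

check_monitrc "$MONITRC" || echo "fix $MONITRC before starting monit"
```

Monit also refuses a control file that other users can read or write, so if the syntax is fine but the service still does not start, check that the file mode is 600.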
Monit is an extremely useful tool. My configuration uses only a part of the available options. For example, you can additionally monitor the init files and binaries of key services using checksums. I don't use this because I have the AIDE intrusion detection system, which takes care of such checks. Below you can see a screenshot of the Monit web interface from one of my servers.
With the web interface you can see detailed information about every service and start, stop or disable monitoring for each of them. Below you can see a screenshot of the monitoring interface for a single service.
If you want to use parts of this configuration on a distribution other than Gentoo Linux, make sure you check the init script names as well as the locations and names of the socket and pid files of the monitored services. These things may look slightly different than in Gentoo.
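For readers without AIDE, the checksum monitoring mentioned above could look roughly like the sketch below. It is not part of my configuration: the service name sshd_bin, the binary path /usr/sbin/sshd and the MONIT_D variable are assumptions made for the example, and the file is written to a scratch directory by default.

```shell
#!/bin/sh
# Sketch: a checksum test for a service binary, written as a drop-in file.
# MONIT_D is a hypothetical variable; on a real server it would be
# /etc/monit.d with the include line in /etc/monitrc uncommented.
MONIT_D="${MONIT_D:-${TMPDIR:-/tmp}/monit.d-example}"
mkdir -p "$MONIT_D"

# The path /usr/sbin/sshd is an assumption; check where your distribution
# installs the binary.
cat > "$MONIT_D/sshd_bin" <<'EOF'
check file sshd_bin with path /usr/sbin/sshd
    group system
    if failed checksum then alert
EOF

chmod 600 "$MONIT_D/sshd_bin"
cat "$MONIT_D/sshd_bin"
```

With this test Monit computes the checksum when it starts and alerts if the file content changes later, for example after an unexpected replacement of the binary.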