18 August 2010

Monitoring key server services with Monit

Categories:  Server  Monitoring  Linux  Gentoo

How to monitor key server services (sshd, Bind, syslog, postfix, courier-imap, mysql, apache, vsftpd, MRTG, clamd, amavisd, vixie-cron) with the Monit package.

Every administrator wants to know the current state of his servers. As the number of servers grows, monitoring them becomes a painful task. Fortunately, not on Linux: there are many tools that can take care of server monitoring and inform administrators about problems. On my LAMP servers I use a package called Monit, which helps me keep my servers up and running. Monit can monitor server load and inform the administrator when the load becomes too heavy. It can also monitor key server services and restart them if they are overloaded or have crashed for some reason. Monit is a superb failsafe utility.

I will share the Monit configuration from my LAMP and mail servers, hoping that someone will find this information useful. The configuration provided here should be applicable to any Linux distribution, but the installation instructions are specific to Gentoo Linux, which I use on my servers.

The first thing we need to do is install Monit on our server. In Gentoo Linux we can do it like this:

emerge -av app-admin/monit

Configuring Monit is extremely easy; it takes no more than 15 minutes. Monit can monitor the whole system as well as particular services. Your Monit configuration should reflect the services running on your server. On my LAMP servers I use ssh, the Bind DNS server, the MySQL database server, the Apache web server, the Vsftpd FTP server and the Subversion version control server. On my mail servers I additionally use the Postfix and Courier-Imap mail server software, the Clam antivirus and the Amavisd content checker. I also use Monit to monitor the Syslog-Ng, MRTG and Vixie-cron daemons.

To create the Monit configuration in Gentoo Linux you have to edit the /etc/monitrc file. You can also create a directory called /etc/monit.d and keep separate Monit configuration files there. In fact, this type of configuration may be the default on many Linux distributions. If you want to keep the configuration like this, make sure you uncomment the following section in /etc/monitrc:

###############################################################################
## Includes
###############################################################################
## 
## It is possible to include additional configuration parts from other files or
## directories.
#
#  include /etc/monit.d/*
#
#
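
With the include line enabled, each service can live in its own file under /etc/monit.d. The following is only a sketch of that layout: the `MONIT_CONF_DIR` variable and the `write_sshd_check` function are hypothetical names, and the directory defaults to a local scratch folder so it can be tried without root. The body of the file is the sshd check from the main configuration listing in this article.

```shell
# Assumption: MONIT_CONF_DIR is a made-up variable for this sketch.
# It defaults to a local directory; set it to /etc/monit.d on a real server.
MONIT_CONF_DIR="${MONIT_CONF_DIR:-./monit.d}"

# Hypothetical helper: write the sshd check into its own include file.
write_sshd_check() {
    mkdir -p "$MONIT_CONF_DIR"
    cat > "$MONIT_CONF_DIR/sshd" <<'EOF'
check process sshd with pidfile /var/run/sshd.pid
  group system
  start program = "/etc/init.d/sshd start"
  stop program = "/etc/init.d/sshd stop"
  if failed port 22 protocol ssh then restart
  if 5 restarts within 5 cycles then timeout
EOF
}

# Usage (as root): MONIT_CONF_DIR=/etc/monit.d write_sshd_check
```

Keeping one file per service makes it easy to copy only the checks a given server needs.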

I will keep the configuration in one file. Below you will find the most important parts of my Monit configuration from both types of servers. This example file is a merge of quite a few files from the different servers I use:

View the monitrc configuration file
###############################################################################
## Global section
###############################################################################
##
## Start Monit in the background (run as a daemon):
#
set daemon 300            # check services at 5-minute intervals
  with start delay 600    # optional: delay the first check by 10 minutes

## Set syslog logging with the 'daemon' facility. If the FACILITY option is
## omitted, monit will use 'user' facility by default. If you want to log to
## a stand alone log file instead, specify the path to a log file
#
set logfile syslog facility log_daemon

### Set the location of the Monit id file which stores the unique id for the
### Monit instance. The id is generated and stored on first Monit start. By
### default the file is placed in $HOME/.monit.id.
#
set idfile /var/monit/.monit.id

### Set the location of monit state file which saves the monitoring state
### on each cycle. By default the file is placed in $HOME/.monit.state. If
### state file is stored on persistent filesystem, monit will recover the
### monitoring state across reboots. If it is on temporary filesystem, the
### state will be lost on reboot.
#
set statefile /var/monit/.monit.state

## Set the list of mail servers for alert delivery. Multiple servers may be
## specified using comma separator. By default monit uses port 25 - this
## is possible to override with the PORT option.
#
set mailserver primary_mail_server_address,    # primary mailserver
               secondary_mail_server_address   # fallback relay

## By default monit will drop alert events if no mail servers are available.
## If you want to keep the alerts for a later delivery retry, you can use the
## EVENTQUEUE statement. The base directory where undelivered alerts will be
## stored is specified by the BASEDIR option. You can limit the maximal queue
## size using the SLOTS option (if omitted, the queue is limited by space
## available in the back end filesystem).
#
set eventqueue
  basedir /var/monit   # set the base directory where events will be stored
  slots 100            # optionally limit the queue size

## You can set alert recipients who will receive alerts if/when a
## service defined in this file has errors. Alerts may be restricted on
## events by using a filter.
#
set alert my_email@adress.org   # receive all alerts

## Monit has an embedded web server which can be used to view status of
## services monitored and manage services from a web interface. See the
## Monit Wiki if you want to enable SSL for the web server.
#
set httpd port monit_web_port and
  allow login:password
  SSL ENABLE
  PEMFILE /etc/ssl/apache2/certificate.pem
  CLIENTPEMFILE /var/certs/monit-client.pem

###############################################################################
## Services
###############################################################################
##
## Check general system resources such as load average, cpu and memory
## usage. Each test specifies a resource, conditions and the action to be
## performed should a test fail.
#
check system localhost
  if loadavg (1min) > 4 then alert
  if loadavg (5min) > 2 then alert
  if memory usage > 75% then alert
  if cpu usage (user) > 70% then alert
  if cpu usage (system) > 30% then alert
  if cpu usage (wait) > 20% then alert

###################################################
#Check ssh
###################################################

check process sshd with pidfile /var/run/sshd.pid
  group system
  start program = "/etc/init.d/sshd start"
  stop program = "/etc/init.d/sshd stop"
  if failed port 22 protocol ssh then restart
  if 5 restarts within 5 cycles then timeout

###################################################
#Check Bind
###################################################

check process named with pidfile /chroot/dns/var/run/named/named.pid
  group dns
  start program = "/etc/init.d/named start"
  stop program = "/etc/init.d/named stop"
  if failed host 127.0.0.1 port 53 type tcp then restart
  if failed host 127.0.0.1 port 53 type udp then restart
  if 5 restarts within 5 cycles then timeout

###################################################
#Check syslog-ng
###################################################

check process syslog-ng with pidfile /var/run/syslog-ng.pid
  group system
  start program = "/etc/init.d/syslog-ng start"
  stop program = "/etc/init.d/syslog-ng stop"
  if 5 restarts within 5 cycles then timeout

##################################################
#Check Vixie-cron
##################################################

check process cron with pidfile /var/run/cron.pid
  group system
  start program = "/etc/init.d/vixie-cron start"
  stop program = "/etc/init.d/vixie-cron stop"
  if 5 restarts within 5 cycles then timeout

##################################################
#Check postfix
##################################################

check process postfix with pidfile /var/spool/postfix/pid/master.pid
  start program = "/etc/init.d/postfix start"
  stop program = "/etc/init.d/postfix stop"
  if cpu > 40% for 2 cycles then alert
  if cpu > 60% for 5 cycles then restart
  if totalmem > 512 MB then restart
  if failed port 25 protocol smtp then restart
  if 5 restarts within 5 cycles then timeout

##################################################
#Check POP3
##################################################

check process pop3 with pidfile /var/run/pop3d.pid
  group mail
  start program = "/etc/init.d/courier-pop3d start"
  stop program = "/etc/init.d/courier-pop3d stop"
  if failed port 110 then restart
  if 5 restarts within 5 cycles then timeout

##################################################
#Check POP3-SSL
##################################################

check process pop3-ssl with pidfile /var/run/pop3d-ssl.pid
  group mail
  start program = "/etc/init.d/courier-pop3d-ssl start"
  stop program = "/etc/init.d/courier-pop3d-ssl stop"
  if failed host localhost port 995 type tcpssl sslauto protocol pop then restart
  if 5 restarts within 5 cycles then timeout

##################################################
#Check IMAP
##################################################

# Service names must be unique, so the plain IMAP check is named "imap".
check process imap with pidfile /var/run/imapd.pid
  group mail
  start program = "/etc/init.d/courier-imap start"
  stop program = "/etc/init.d/courier-imap stop"
  if failed host localhost port 143 protocol imap then restart
  if 5 restarts within 5 cycles then timeout

##################################################
#Check IMAP SSL
##################################################

check process imap-ssl with pidfile /var/run/imapd-ssl.pid
  group mail
  start program = "/etc/init.d/courier-imapd-ssl start"
  stop program = "/etc/init.d/courier-imapd-ssl stop"
  if failed host localhost port 993 type tcpssl sslauto protocol imap then restart
  if 5 restarts within 5 cycles then timeout

##################################################
#Courier-Authlib
##################################################

check process courier-authlib with pidfile /var/run/authdaemon.pid
  group mail
  start program = "/etc/init.d/courier-authlib start"
  stop program = "/etc/init.d/courier-authlib stop"
  if 5 restarts within 5 cycles then timeout

##################################################
#SASL Auth Daemon
##################################################

check process saslauthd with pidfile /var/lib/sasl2/saslauthd.pid
  group mail
  depends on postfix
  start program = "/etc/init.d/saslauthd start"
  stop program = "/etc/init.d/saslauthd stop"
  if 5 restarts within 5 cycles then timeout

##################################################
#Amavisd-new
##################################################

check process amavisd with pidfile /var/amavis/amavisd.pid
  group mail
  start program = "/etc/init.d/amavisd start"
  stop program = "/etc/init.d/amavisd stop"
  if cpu > 40% for 2 cycles then alert
  if cpu > 60% for 5 cycles then restart
  if failed unixsocket /var/amavis/amavis.sock then restart
  if failed port 10024 then restart
  if 5 restarts within 5 cycles then timeout

##################################################
#CLAM Antivirus
##################################################

check process clamd with pidfile /var/run/clamav/clamd.pid
  group virus
  start program = "/etc/init.d/clamd start"
  stop program = "/etc/init.d/clamd stop"
  if cpu > 40% for 2 cycles then alert
  if cpu > 60% for 5 cycles then restart
  if failed unixsocket /var/run/clamav/clamd.sock then restart
  if 5 restarts within 5 cycles then timeout

###################################################
#Check svnserve
###################################################

check process svnserve with pidfile /var/run/svnserve.pid
  start program = "/etc/init.d/svnserve start"
  stop program = "/etc/init.d/svnserve stop"
  if failed host localhost port SVNSERVE_PORT then restart
  if 5 restarts within 5 cycles then timeout

###################################################
#Check mysql
###################################################

check process mysql with pidfile /var/run/mysqld/mysqld.pid
  group database
  start program = "/etc/init.d/mysql start"
  stop program = "/etc/init.d/mysql stop"
  if failed unixsocket "/var/run/mysqld/mysqld.sock" then restart
  if 5 restarts within 5 cycles then timeout

###################################################
#Check MRTG
###################################################

check process mrtg with pidfile /var/run/mrtg.pid
  group monitoring
  start program = "/etc/init.d/mrtg start"
  stop program = "/etc/init.d/mrtg stop"
  if 5 restarts within 5 cycles then timeout

###################################################
#Check apache2
###################################################

check process apache with pidfile /var/run/apache2.pid
  group www
  start program = "/etc/init.d/apache2 start"
  stop program = "/etc/init.d/apache2 stop"
  if failed host www.web_app_address.org port 80
    protocol http and request "/monit/hello"
    then restart
  if failed host 127.0.0.1 port 80
    protocol apache-status loglimit > 60%
    then restart
  if cpu > 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if totalmem > 1024 MB for 2 cycles then alert
  if totalmem > 2048 MB for 5 cycles then restart
  if children > 500 then restart
  # An omitted operator means "equals", so the ">" is required here.
  if loadavg(15min) > 10 for 8 cycles then restart
  if 5 restarts within 5 cycles then timeout

###################################################
#Check vsftpd
###################################################

check process vsftpd with pidfile /var/run/vsftpd.pid
  group ftp
  start program = "/etc/init.d/vsftpd start"
  stop program = "/etc/init.d/vsftpd stop"
  if failed port 2122 protocol ftp then restart
  if 5 restarts within 5 cycles then timeout

I will try to briefly explain this configuration. The first part of the file defines the configuration of the Monit service itself. All defined checks run at 5-minute intervals, and the first check is performed 10 minutes after the service starts.

The next important thing is the mail server configuration. As you can see, I have defined a primary and a backup server. It is not good to rely only on the localhost mail server, as most likely you will be monitoring its proper work with Monit. Make sure the secondary mail server allows relaying from the Monit server. The next lines define the EVENTQUEUE, which is used to store Monit events if both mail servers are unavailable. The next thing you may notice in the configuration file is the alert recipient definition. I use one email address, but Monit allows a very detailed configuration of mail recipients. For example, it can send mail to different recipients depending on the type of event.

Monit comes with an embedded web server. My configuration allows connections to this server based on a username and password. The web connection requires SSL on the given server port, and access is allowed only from browsers that have a special client certificate. Make sure you open the specified web server port in the firewall for TCP connections.

The most important part is the service configuration section. First I defined server load monitoring. As you can see, most of these lines speak for themselves. The next part is service-specific monitoring. Every service is monitored using its pid file, followed by the service start and stop commands. As you can see, if something goes wrong, Monit will try to restart the service up to 5 times within 5 cycles before giving up and no longer monitoring it. For some services (ssh, Bind, postfix, pop3, pop3-ssl, imap, imap-ssl, amavisd, svnserve, vsftpd) I use network connection monitoring; for others (amavisd, clamd, mysql) I use unix socket monitoring. Additionally, key services that may cause a large server load (postfix, amavisd, clamd) are monitored based on their CPU usage. First I generate an alert, which makes Monit send me an email, and if the load keeps growing, Monit restarts the service. The most complex monitoring definition I use is for the Apache service. I love Monit for its ability to monitor the work of the web server based on a call to a single web application.

After creating the Monit configuration you have to add the Monit service to your distribution's default runlevel. In Gentoo Linux we do it using the rc-update script like this:

rc-update add monit default

Next start the monit service:

/etc/init.d/monit start

If you run into problems and the service is not able to start, it is most likely because you made some simple mistake in the configuration file. The easiest way to find the mistake is to run the following command in a shell:

monit status

I found that this gives you detailed information about what is wrong with your configuration.
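
Monit also has a dedicated syntax check: `monit -t` parses the control file and reports errors without starting anything, and `-c` selects which file to check. The sketch below wraps the two options in a small helper. The function name `validate_monitrc` is hypothetical, and the default path is simply the Gentoo location used in this article.

```shell
# Hypothetical helper: syntax-check a Monit control file before
# starting or restarting Monit.
validate_monitrc() {
    rc="${1:-/etc/monitrc}"   # default: Gentoo location used in this article
    if ! command -v monit >/dev/null 2>&1; then
        echo "monit is not installed or not on PATH" >&2
        return 127
    fi
    # -t runs a syntax check only; -c selects the control file to check.
    monit -t -c "$rc"
}

# Usage: validate_monitrc /etc/monitrc && /etc/init.d/monit restart
```

Running the check before every restart keeps a typo in the control file from leaving the server unmonitored.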

Monit is an extremely useful tool. My configuration uses only a part of the available options. For example, you can additionally monitor the init files and binaries of key services using checksums. I don't use this because I have the AIDE intrusion detection system, which takes care of such checks. Below you can see a screenshot of the Monit web interface from one of my servers.

[Screenshot: Monit web interface]

With the web interface you can see detailed information about every service and start, stop or disable monitoring for each of them. Below you can see a screenshot of the single-service monitoring interface.

[Screenshot: Monit single-service web interface]

If you want to use parts of this configuration on a distribution other than Gentoo Linux, make sure you check the proper init file names as well as the locations and names of the socket and pid files of the monitored services. These things may look slightly different than in Gentoo.
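
One way to do part of that check automatically is to pull the init script paths out of the control file and test whether they exist on the target machine. This is only a sketch: the function name `missing_init_scripts` is hypothetical, and it looks only at `start program` and `stop program` lines written in the quoted form used in this article. Pid files and sockets are not checked here, because they exist only while the service is running.

```shell
# Hypothetical helper: print the init scripts referenced by
# "start program" / "stop program" lines that are missing or not
# executable on this machine.
missing_init_scripts() {
    rc="$1"
    # Take the first word inside the quotes, i.e. the script path.
    sed -nE 's/^[[:space:]]*(start|stop) program[[:space:]]*=?[[:space:]]*"([^" ]+).*/\2/p' "$rc" |
    sort -u |
    while read -r script; do
        [ -x "$script" ] || echo "missing: $script"
    done
}

# Usage: missing_init_scripts /etc/monitrc
```

An empty result means every referenced init script was found; each `missing:` line points at a check that needs its start and stop commands adjusted.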
