Tag: nagios

Changing the sender address of Nagios alert emails

Posted by – 2010-03-10

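#I noticed that the sender address of Nagios alert emails looks like this:
#"<user running nagios>@<hostname configured in the server's hosts file>"

#I did not like that, so I changed it.
#Nagios sends mail by calling sendmail, so this comes down to sendmail configuration.
#On CentOS, sendmail installed with yum keeps its configuration files under /etc/mail/.
#sendmail.cf is sendmail's main configuration file. It consists of a very large number of macro definitions, so it is normally not edited by hand but generated from a file ending in .mc.
#Back up both files first (run these from /etc/mail/)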

cp sendmail.cf sendmail.cf.default
cp sendmail.mc sendmail.mc.default
vi sendmail.mc
#----------------------------quoted text: begin----------------------------
#Find:
dnl MASQUERADE_AS(`mydomain.com')dnl
#Remove the leading dnl (it comments the line out) and set the domain you want:
MASQUERADE_AS(`chengyongxu.com')dnl
#----------------------------quoted text: end----------------------------

#Then regenerate sendmail.cf from sendmail.mc
m4 /etc/mail/sendmail.mc  > /etc/mail/sendmail.cf

#If it fails with the following error

#----------------------------quoted text: begin----------------------------
sendmail.mc:10: m4: cannot open `/usr/share/sendmail-cf/m4/cf.m4': No such file or directory
#----------------------------quoted text: end----------------------------

#install sendmail-cf

yum install sendmail-cf

#Restart sendmail

service sendmail restart

#From now on, alert emails arrive with the sender

nagios@chengyongxu.com
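
#To double-check which domain is active, the value can be read back from sendmail.mc. This is only a sketch: masquerade_domain is a hypothetical helper name (not part of sendmail), and it assumes the MASQUERADE_AS line is written exactly in the form shown earlier in this post.

```shell
# Hypothetical helper: print the domain of the active MASQUERADE_AS line.
# Lines still commented out with a leading "dnl " are ignored because the
# pattern is anchored at the start of the line.
# Usage: masquerade_domain /etc/mail/sendmail.mc
masquerade_domain() {
    sed -n "s/^MASQUERADE_AS(\`\([^']*\)')dnl.*/\1/p" "$1"
}
```

#If it prints nothing, the line is probably still commented out with dnl.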

#A follow-up thought
#If I set the domain to yahoo.com,
#could the mail then pass itself off as coming from a Yahoo mailbox?

Nagios redundancy setup

Posted by – 2009-11-09

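#After the main monitoring server was moved to a new machine, I decided to keep the old setup around as a redundant monitor.
#The old main monitoring server's IP is 10.0.0.52
#The new main monitoring server's IP is 10.0.0.166
#On the old monitoring server, back up all the configuration files,
#then keep only one monitored host and one service

#Changes on the new monitoring server
#First allow the old monitoring server to query this machine through nrpe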

vi /usr/local/nrpe/etc/nrpe.cfg
#Change allowed_hosts=127.0.0.1,10.0.0.166
#to
#----------------------------quoted text: begin----------------------------
allowed_hosts=127.0.0.1,10.0.0.52,10.0.0.166
#----------------------------quoted text: end----------------------------
#Then add a definition for the check_nagios command
vi /usr/local/nagios/etc/nrpe.cfg
#Add the following line
#----------------------------quoted text: begin----------------------------
command[check_nagios]=/usr/local/nagios/libexec/check_nagios -e 5 -F /usr/local/nagios/var/status.dat -C /usr/local/nagios/bin/nagios
#----------------------------quoted text: end----------------------------
#nrpe.cfg is read by the nrpe daemon, so restart nrpe for the changes to take effect
service nrpe restart
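
#Before moving on, it can help to confirm that the old monitor's IP really ended up in allowed_hosts. A minimal sketch, assuming allowed_hosts is written on a single line without spaces as in this post; nrpe_allows is a hypothetical helper name, not an nrpe tool.

```shell
# Hypothetical helper: succeed if the IP in $2 is listed on the
# allowed_hosts line of the nrpe.cfg file given in $1.
# Usage: nrpe_allows /usr/local/nrpe/etc/nrpe.cfg 10.0.0.52 && echo allowed
nrpe_allows() {
    grep '^allowed_hosts=' "$1" | cut -d= -f2 | tr ',' '\n' | grep -qxF -- "$2"
}
```

#The exact-line match (-x) keeps 10.0.0.5 from being mistaken for 10.0.0.52.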

#Changes on the old monitoring server

vi hosts.cfg
#----------------------------quoted text: begin----------------------------
define host{
                host_name       10.0.0.166
                alias       166.chengyongxu.com
                address       10.0.0.166
                max_check_attempts       5
                #check_interval       1
                #retry_interval       1
                check_period       24x7
                contact_groups       sa_groups
                notification_interval       30
                #first_notification_delay       #
                notification_period       24x7
                notification_options      d,u
                }
#----------------------------quoted text: end----------------------------

vi services.cfg
#----------------------------quoted text: begin----------------------------
#monitor 166's nagios
define service{
                host_name       10.0.0.166
                service_description       check_nagios
                check_command       check_nrpe!check_nagios
                max_check_attempts       3
                check_interval       5
                retry_interval       1
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
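#----------------------------quoted text: end----------------------------
service nagios reload

#Done

#I did not set things up so that the standby automatically takes over when the main monitoring server dies. The reasons:
#1. The nagios process is very robust. The previous machine ran for more than half a year without the nagios process ever dying, and it still managed to send SMS alerts when the data center was under attack and bandwidth was quickly exhausted;
#2. If the main monitor is down, you would fix it right away anyway, and going unmonitored for that short while does no real harm;
#3. The setup described here only covers two cases: the main monitoring server suddenly losing power or network, and nagios hanging.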

Monitoring disk I/O with Nagios

Posted by – 2009-11-05

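#The official default plugin set does not include a plugin for this, but one is available for download from the official exchange site: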

http://exchange.nagios.org/directory/Plugins/Uncategorized/Operating-Systems/Linux/check_iostat-%252D-I-2FO-statistics/details

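#Note: make sure the sysstat package is installed on every monitored machine and that the iostat command runs

#After downloading, put the plugin in /usr/local/nagios/libexec/ on every monitored machine
#Then change its owner and group and make it executable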

chown nagios:nagios /usr/local/nagios/libexec/check_iostat
chmod 755 /usr/local/nagios/libexec/check_iostat

#On the monitored machine, edit the nrpe configuration file

vi /usr/local/nagios/etc/nrpe.cfg
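#Add the two lines below. If there are more disks, add a line for each. Adjust the warning and critical thresholds to suit your machines
#----------------------------quoted text: begin----------------------------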
command[check_sda_iostat]=/usr/local/nagios/libexec/check_iostat -d sda -w 100 -c 200
command[check_sdb_iostat]=/usr/local/nagios/libexec/check_iostat -d sdb -w 100 -c 200
#----------------------------quoted text: end----------------------------

#Restart the service

service nrpe restart
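
#On machines with many disks, the per-disk command lines for nrpe.cfg can be generated instead of typed. A small sketch: iostat_commands is a hypothetical helper name, and the thresholds 100/200 are simply the ones used in this post, not a recommendation.

```shell
# Hypothetical helper: print one nrpe command definition per disk name.
# Usage: iostat_commands sda sdb sdc >> /usr/local/nagios/etc/nrpe.cfg
iostat_commands() {
    for disk in "$@"; do
        printf 'command[check_%s_iostat]=/usr/local/nagios/libexec/check_iostat -d %s -w 100 -c 200\n' "$disk" "$disk"
    done
}
```

#Restart nrpe again after appending the generated lines.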

#Open servicegroups.cfg on the main monitoring server. I chose to monitor I/O on every machine, so I added the following block for the all_hosts group
#----------------------------quoted text: begin----------------------------
define service{
hostgroup_name       all_hosts
service_description       check_sda_iostat
check_command       check_nrpe!check_sda_iostat
max_check_attempts       4
check_interval       1440
retry_interval       5
check_period       24x7
notification_interval       1440
notification_period       24x7
notification_options      w,u,c
#contacts       contacts(*)
contact_groups       sa_groups
}
#----------------------------quoted text: end----------------------------

#To monitor only specific machines, add the following under the relevant host in services.cfg instead
#----------------------------quoted text: begin----------------------------
define service{
host_name       10.0.0.166
service_description       check_sda_iostat
check_command       check_nrpe!check_sda_iostat
max_check_attempts       3
check_interval       10
retry_interval       5
check_period       24x7
notification_interval       30
notification_period       24x7
notification_options      w,u,c
#contacts       contacts(*)
contact_groups       sa_groups
}
#----------------------------quoted text: end----------------------------

#Check the configuration and reload the service

service nagios checkconfig
service nagios reload

Tidying up Nagios configuration files

Posted by – 2009-10-30

Once there are many monitored machines, the hostgroups file ends up looking like this:

#----------------------------quoted text: begin----------------------------
define hostgroup{
hostgroup_name       all_hosts
alias       all_hosts
members       66.66.66.28,66.66.66.29,66.66.66.30,77.77.77.4,88.88.88.11,88.88.88.12,
88.88.88.13,88.88.88.14,88.88.88.15,88.88.88.16,88.88.88.17,88.88.88.18,
88.88.88.19,88.88.88.20,88.88.88.21,88.88.88.23,55.55.55.4,55.55.55.5,
55.55.55.6,55.55.55.7,99.99.99.28,99.99.99.4,99.99.99.2
#notes       note_string
#notes_url       url
#action_url       url
}

define hostgroup{
hostgroup_name       http_hosts
alias       http_hosts
members      66.66.66.29,77.77.77.4,88.88.88.11,88.88.88.12,88.88.88.13,55.55.55.4,
55.55.55.7,55.55.55.5,99.99.99.4,99.99.99.2
#notes       note_string
#notes_url       url
#action_url       url
}

define hostgroup{
hostgroup_name       mysql_hosts
alias       mysql_hosts
members     66.66.66.28,66.66.66.30,77.77.77.4,88.88.88.14,88.88.88.15,88.88.88.20,
88.88.88.21,55.55.55.6,55.55.55.7,99.99.99.2,99.99.99.3
#notes       note_string
#notes_url       url
#action_url       url
}
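#----------------------------quoted text: end----------------------------

I have not yet found a way to use include to move the members list into a separate file, so as a compromise
I wrote it like this:

#----------------------------quoted text: begin----------------------------
define hostgroup{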
hostgroup_name       all_hosts
alias       all_hosts
members   66.66.66.28,\
66.66.66.29,\
66.66.66.30,\
77.77.77.4,\
88.88.88.11,\
88.88.88.12,\
88.88.88.13,\
88.88.88.14,\
88.88.88.15,\
88.88.88.16,\
88.88.88.17,\
88.88.88.18,\
88.88.88.19,\
88.88.88.20,\
88.88.88.21,\
88.88.88.23,\
55.55.55.4,\
55.55.55.5,\
55.55.55.6,\
55.55.55.7,\
99.99.99.28,\
99.99.99.4,\
99.99.99.2
#notes       note_string
#notes_url       url
#action_url       url
}

define hostgroup{
hostgroup_name       http_hosts
alias       http_hosts
members    66.66.66.29,\
77.77.77.4,\
88.88.88.11,\
88.88.88.12,\
88.88.88.13,\
55.55.55.4,\
55.55.55.7,\
55.55.55.5,\
99.99.99.4,\
99.99.99.2
#notes       note_string
#notes_url       url
#action_url       url
}

define hostgroup{
hostgroup_name       mysql_hosts
alias       mysql_hosts
members   66.66.66.28,\
66.66.66.30,\
77.77.77.4,\
88.88.88.14,\
88.88.88.15,\
88.88.88.20,\
88.88.88.21,\
55.55.55.6,\
55.55.55.7,\
99.99.99.2,\
99.99.99.3
#notes       note_string
#notes_url       url
#action_url       url
}
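#----------------------------quoted text: end----------------------------

This is easy in vi: type a colon to enter command mode, then do a search and replace

:1,$ s/,/,\\\r/g

then

:<first line>,<last line> s/^/\t\t\t/g

and that is it

It is a trivial thing; I wrote it down mainly because I had forgotten that in a vi substitution the replacement that produces a line break is \r.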
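
The same reformatting can also be done outside the editor. The following is a sketch rather than the method used above: wrap_members is a hypothetical helper name, and it mirrors the two vi substitutions by breaking every comma-separated line after each comma with a trailing backslash and indenting the continuation lines with three tabs.

```shell
# Hypothetical helper: read config text on stdin and write it to stdout,
# splitting each line at its commas. Every piece except the last ends in
# ",\" and every piece except the first is indented with three tabs.
# Lines without commas pass through unchanged.
# Usage: wrap_members < hostgroups.cfg > hostgroups.cfg.new
wrap_members() {
    awk -F',' '{
        for (i = 1; i < NF; i++) {
            printf "%s%s,\\\n", (i > 1 ? "\t\t\t" : ""), $i
        }
        printf "%s%s\n", (NF > 1 ? "\t\t\t" : ""), $NF
    }'
}
```

Like the vi command, it splits at every comma in the input, so it is best run on a file that only has commas in members lines, and the result should be checked with nagios -v before reloading.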

Upgrading Nagios

Posted by – 2009-10-22

#################################
# Nagios upgrade
# Author: 楚霏
# Date: 2009-10-22
# Env: Centos 5.3 x86_64
#################################

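#nagios 3.2.0 has been out for a while, so I decided to upgrade my 3.1.2 and, while at it, upgrade the plugins from 1.4.13 to 1.4.14

I. Upgrade nagios on the main monitoring server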

cp -R /usr/local/nagios/etc /backup/nagios_config
cd /usr/local/src/
wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.0.tar.gz
tar xvf nagios-3.2.0.tar.gz
cd nagios-3.2.0
./configure --with-command-group=nagcmd
make all
make install
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
service nagios restart

II. Upgrade nagios-plugins on the main monitoring server

cd /usr/local/src/
wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
tar xvf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

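III. Upgrade nagios-plugins on all monitored machines
#The steps are the same as in Part II of this post; they are chained together here so they are quicker to run on many monitored machines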

cd /usr/local/src/ && wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.14.tar.gz
tar xvf nagios-plugins-1.4.14.tar.gz && cd nagios-plugins-1.4.14 && ./configure --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
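
#With many monitored machines, the chained commands can be built once and sent to each host. A sketch only: plugin_upgrade_cmd is a hypothetical helper name, and the commented ssh loop assumes root SSH access to the monitored machines, which this post does not describe.

```shell
# Hypothetical helper: print the whole plugin upgrade as one command line
# for the given nagios-plugins version.
# Usage: plugin_upgrade_cmd 1.4.14
plugin_upgrade_cmd() {
    ver=$1
    printf 'cd /usr/local/src/ && wget http://prdownloads.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-%s.tar.gz && tar xvf nagios-plugins-%s.tar.gz && cd nagios-plugins-%s && ./configure --with-nagios-user=nagios --with-nagios-group=nagios && make && make install\n' "$ver" "$ver" "$ver"
}

# Example (assumes root SSH access; the host list is illustrative):
# for host in 10.0.0.166; do ssh "root@$host" "$(plugin_upgrade_cmd 1.4.14)"; done
```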

IV. Check that the files in /usr/local/nagios/libexec/ now all carry the current date

ll /usr/local/nagios/libexec/
#----------------------------quoted text: begin----------------------------
-rwxr-xr-x 1 nagios nagios 339905 Oct 22 17:34 check_apt
-rwxr-xr-x 1 nagios nagios   2245 Oct 22 17:34 check_breeze
-rwxr-xr-x 1 nagios nagios 110956 Oct 22 17:34 check_by_ssh
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_clamd -> check_tcp
-rwxr-xr-x 1 nagios nagios  64022 Oct 22 17:34 check_cluster
-r-sr-xr-x 1 root   nagios 101536 Oct 22 17:34 check_dhcp
-rwxr-xr-x 1 nagios nagios 103508 Oct 22 17:34 check_dig
-rwxr-xr-x 1 nagios nagios 390573 Oct 22 17:34 check_disk
-rwxr-xr-x 1 nagios nagios   8080 Oct 22 17:34 check_disk_smb
-rwxr-xr-x 1 nagios nagios 110269 Oct 22 17:34 check_dns
-rwxr-xr-x 1 nagios nagios  58713 Oct 22 17:34 check_dummy
-rwxr-xr-x 1 nagios nagios   3056 Oct 22 17:34 check_file_age
-rwxr-xr-x 1 nagios nagios   6318 Oct 22 17:34 check_flexlm
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_ftp -> check_tcp
-rwxr-xr-x 1 nagios nagios 481109 Oct 22 17:34 check_http
-r-sr-xr-x 1 root   nagios 108694 Oct 22 17:34 check_icmp
-rwxr-xr-x 1 nagios nagios  70743 Oct 22 17:34 check_ide_smart
-rwxr-xr-x 1 nagios nagios  15137 Oct 22 17:34 check_ifoperstatus
-rwxr-xr-x 1 nagios nagios  12523 Oct 22 17:34 check_ifstatus
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_imap -> check_tcp
-rwxr-xr-x 1 nagios nagios   7355 Oct 22 17:34 check_ircd
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_jabber -> check_tcp
-rwxr-xr-x 1 nagios nagios  98072 Oct 22 17:34 check_load
-rwxr-xr-x 1 nagios nagios   6020 Oct 22 17:34 check_log
-rwxr-xr-x 1 nagios nagios  20287 Oct 22 17:34 check_mailq
-rwxr-xr-x 1 nagios nagios  71242 Oct 22 17:34 check_mrtg
-rwxr-xr-x 1 nagios nagios  71427 Oct 22 17:34 check_mrtgtraf
-rwxr-xr-x 1 nagios nagios  89016 Oct 22 17:34 check_nagios
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_nntp -> check_tcp
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_nntps -> check_tcp
-rwxrwxr-x 1 nagios nagios  65853 Sep  8 09:06 check_nrpe
-rwxr-xr-x 1 nagios nagios 100642 Oct 22 17:34 check_nt
-rwxr-xr-x 1 nagios nagios 105427 Oct 22 17:34 check_ntp
-rwxr-xr-x 1 nagios nagios  95287 Oct 22 17:34 check_ntp_peer
-rwxr-xr-x 1 nagios nagios  94825 Oct 22 17:34 check_ntp_time
-rwxr-xr-x 1 nagios nagios 125696 Oct 22 17:34 check_nwstat
-rwxr-xr-x 1 nagios nagios   8324 Oct 22 17:34 check_oracle
-rwxr-xr-x 1 nagios nagios  85798 Oct 22 17:34 check_overcr
-rwxr-xr-x 1 nagios nagios 124401 Oct 22 17:34 check_ping
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_pop -> check_tcp
-rwxr-xr-x 1 nagios nagios 357008 Oct 22 17:34 check_procs
-rwxr-xr-x 1 nagios nagios  83442 Oct 22 17:34 check_real
-rwxr-xr-x 1 nagios nagios   9584 Oct 22 17:34 check_rpc
-rwxr-xr-x 1 nagios nagios   1135 Oct 22 17:34 check_sensors
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_simap -> check_tcp
-rwxr-xr-x 1 nagios nagios 392331 Oct 22 17:34 check_smtp
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_spop -> check_tcp
-rwxr-xr-x 1 nagios nagios  82483 Oct 22 17:34 check_ssh
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_ssmtp -> check_tcp
-rwxr-xr-x 1 nagios nagios 100498 Oct 22 17:34 check_swap
-rwxr-xr-x 1 nagios nagios 126810 Oct 22 17:34 check_tcp
-rwxr-xr-x 1 nagios nagios  86459 Oct 22 17:34 check_time
lrwxrwxrwx 1 root   root        9 Oct 22 17:34 check_udp -> check_tcp
-rwxr-xr-x 1 nagios nagios  92746 Oct 22 17:34 check_ups
-rwxr-xr-x 1 nagios nagios  95025 Oct 22 17:34 check_users
-rwxr-xr-x 1 nagios nagios   2939 Oct 22 17:34 check_wave
-rwxr-xr-x 1 nagios nagios  90641 Oct 22 17:34 negate
-rwxr-xr-x 1 nagios nagios  93916 Oct 22 17:34 urlize
-rwxr-xr-x 1 nagios nagios   1921 Oct 22 17:34 utils.pm
-rwxr-xr-x 1 nagios nagios    862 Oct 22 17:34 utils.sh
#----------------------------quoted text: end----------------------------

#Make sure the nagios user has permission to use these plugins

chown -R nagios:nagios /usr/local/nagios/libexec/

#Done

Setting up a Nagios monitoring server

Posted by – 2009-08-11

####################################
#nagios_configuration
#Author:楚霏
#Date: 2009-3-19
#Update:2009-8-11
#Env: Centos 5.3 x86_64
#Thanks to Sery for his help
####################################

I. Preparation
####################################
Environment: CentOS 5.3 x86_64
Required software:

nagios-3.1.?.tar.gz
nagios-plugins-1.4.13.tar.gz
nrpe-2.12.tar.gz
httpd-2.2.??.tar.gz
gcc
glibc
glibc-common
gd
gd-devel
fetion20080910047-lin64.tar.gz
library64_linux.tar.gz
libstdc++-4.3.0-8.x86_64.rpm

####################################

####################################
#Download the software

cd /usr/local/src/
wget http://osdn.dl.sourceforge.net/sourceforge/nagios/nagios-3.1.2.tar.gz
wget http://osdn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz
wget http://jaist.dl.sourceforge.net/sourceforge/nagios/nrpe-2.12.tar.gz
wget ftp://mirror.switch.ch/pool/2/mirror/fedora/linux/releases/9/Fedora/x86_64/os/Packages/libstdc++-4.3.0-8.x86_64.rpm
wget http://www.it-adv.net/fetion/downng/fetion20090406003-linux.tar.gz
wget http://www.it-adv.net/fetion/downng/library_linux.tar.gz

####################################

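II. Environment
####################################
Both machines run CentOS 5.3 x86_64
Main monitoring server IP=10.0.0.52
Monitored machine IP=10.0.0.166
On the main monitoring server, nagios runs as the user nagios. This user belongs to the nagios group and to the nagcmd group, which the user running Apache also joins

The main monitoring server needs nagios, nagios-plugins, nrpe and fetion
The monitored machine only needs nagios-plugins and nrpe

A web environment with PHP and GD is not required by nagios itself. It is there so the monitoring status can be viewed in a browser, and the HTML pages shipped with nagios need php+gd

Adding or removing hosts and services is always configured on the main monitoring server
nagios.cfg on the main monitoring server is the top-level configuration file; among other things it lists where the other configuration files are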
####################################

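III. Installation and configuration
####################################
(1)Install a web environment with apache+php+gd on the main monitoring server. Building from source is recommended and not covered here; for convenience I installed it with yum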

yum -y install gcc glibc glibc-common gd gd-devel httpd php php-gd libpng

####################################

####################################
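(2)Install Nagios on the main monitoring server

#Create the required user and groups
useradd -m nagios
groupadd nagcmd && usermod -a -G nagcmd nagios

#The next command adds the web server user (apache) to the nagcmd group as well, so the CGIs can submit external commands
usermod -a -G nagcmd apache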

cd /usr/local/src/
tar xvf nagios-3.1.?.tar.gz ; cd nagios-3.1.?

#Optionally look at the configure help first
./configure --help
./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
make all

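#Step 1: make install installs the main program, the CGIs and the HTML files
#Step 2: make install-init installs the init script, so nagios can run as a service and start at boot
#Step 3: make install-commandmode sets the permissions on the directory that holds the external command file
#Step 4: make install-config copies the sample configuration files into the nagios install directory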
make install
make install-init
make install-commandmode
make install-config

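#Verify that the program was installed under the prefix given above (here /usr/local/nagios): check that the five directories etc, bin, sbin, share and var exist.
#bin      executables; the nagios binary lives here
#etc      configuration files; right after installation it holds only a few *.cfg-sample files
#sbin     the Nagios CGI files, that is, the files needed to run external commands from the web interface
#share    the Nagios web pages
#var      Nagios log files, the PID file and similar runtime files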
ls /usr/local/nagios
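
#The same check can be scripted instead of read off the ls output. A minimal sketch; check_nagios_dirs is a hypothetical helper name, and the list of five directories is taken from the comments above.

```shell
# Hypothetical helper: report which of the five expected directories are
# missing under the given prefix (default /usr/local/nagios).
# Returns 0 if all exist, 1 otherwise.
# Usage: check_nagios_dirs /usr/local/nagios
check_nagios_dirs() {
    prefix=${1:-/usr/local/nagios}
    missing=0
    for d in bin etc sbin share var; do
        if [ ! -d "$prefix/$d" ]; then
            echo "missing: $prefix/$d"
            missing=1
        fi
    done
    return "$missing"
}
```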

####################################

####################################
(3)Configure the web interface

#This amounts to having the following in httpd.conf

#----------------------------quoted text: begin----------------------------
# Load config files from the config directory "/etc/httpd/conf.d".
Include conf.d/*.conf
#----------------------------quoted text: end----------------------------

#and then creating a new file under <install path>/httpd/conf.d/ with this content:

#----------------------------quoted text: begin----------------------------
# SAMPLE CONFIG SNIPPETS FOR APACHE WEB SERVER
# Last Modified: 11-26-2005
#
# This file contains examples of entries that need
# to be incorporated into your Apache web server
# configuration file.  Customize the paths, etc. as
# needed to fit your system.

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"

<Directory "/usr/local/nagios/sbin">
#  SSLRequireSSL
   AuthType Basic
   Options ExecCGI
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>

Alias /nagios "/usr/local/nagios/share"

<Directory "/usr/local/nagios/share">
#  SSLRequireSSL
   AuthType Basic
   Options None
   AllowOverride None
   Order allow,deny
   Allow from all
#  Order deny,allow
#  Deny from all
#  Allow from 127.0.0.1
   AuthName "Nagios Access"
   AuthUserFile /usr/local/nagios/etc/htpasswd.users
   Require valid-user
</Directory>

#----------------------------quoted text: end----------------------------

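#With Apache installed from yum, the following command does all of the above for you
make install-webconf
#Create the user for HTTP authentication
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
#Add index.php to DirectoryIndex in httpd.conf
#The rest of the Apache configuration is not covered here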
service httpd start

####################################

####################################
(4)Install Nagios Plugins

cd /usr/local/src/
tar xvf nagios-plugins-1.4.??.tar.gz && cd nagios-plugins-1.4.??
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install

####################################

####################################
(5)Register Nagios as a service and do a trial run

chkconfig --add nagios
chkconfig --level 3 nagios on

#Check the configuration files
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

#Make sure the nagios user has permission to run the plugins
chown -R nagios:nagios /usr/local/nagios/libexec/

#If there are no errors, start the service
service nagios start

####################################

####################################
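(6)A brief tour of the Nagios configuration files

#Main configuration file: nagios.cfg

#Log file
#Format: log_file=<file_name>
#Example:
#log_file=/usr/local/nagios/var/nagios.log

#Object configuration files
#Format: cfg_file=<file_name>
#Example:
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg

#Object configuration directory
#Format: cfg_dir=<directory_name>
#Example:
#cfg_dir=/usr/local/nagios/etc/switches

#Nagios user
#Format: nagios_user=<username/UID>
#Example:
#nagios_user=nagios

#The configuration file cgi.cfg controls the CGI scripts

#Objects are all the elements that can be monitored or notified.
#The object configuration files used below are mainly
#hosts.cfg: monitored hosts
#hostgroups.cfg: groups of monitored hosts
#services.cfg: services
#servicegroups.cfg: service groups
#contacts.cfg: contacts
#contactgroups.cfg: contact groups
#timeperiods.cfg: time periods, for example 24×7 round-the-clock monitoring
#commands.cfg: commands
#servicedependency: service dependencies
#serviceescalation: service escalations
#hostdependency: host dependencies
#hostescalation: host escalations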

####################################

####################################
(7)Edit the main configuration file

cd /usr/local/nagios/etc/
cp nagios.cfg nagios.cfg.chushibak
vi nagios.cfg
#Change this part

#----------------------------quoted text: begin----------------------------
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#----------------------------quoted text: end----------------------------

#to
#----------------------------quoted text: begin----------------------------
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg

cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/servicegroups.cfg

cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
#----------------------------quoted text: end----------------------------

####################################

####################################
(8)Create and edit the object configuration files

cd /usr/local/nagios/etc/objects
mkdir bak
mv contacts.cfg ./bak/
mv localhost.cfg ./bak/

cat << EOF >> hosts.cfg
#----------------------------quoted text: begin----------------------------
define host{
                host_name       10.0.0.52
                alias       10.0.0.52
                address       10.0.0.52
                max_check_attempts       5
                #check_interval       1
                #retry_interval       1
                check_period       24x7
                contact_groups       sa_groups
                notification_interval       30
                #first_notification_delay       #
                notification_period       24x7
                notification_options      d,u,r
                }

define host{
                host_name       10.0.0.166
                alias       10.0.0.166
                address       10.0.0.166
                max_check_attempts       5
                #check_interval       1
                #retry_interval       1
                check_period       24x7
                contact_groups       sa_groups
                notification_interval       30
                #first_notification_delay       #
                notification_period       24x7
                notification_options      d,u,r
                }
EOF
#----------------------------quoted text: end----------------------------

cat << EOF >> hostgroups.cfg
#----------------------------quoted text: begin----------------------------
define hostgroup{
        hostgroup_name       all_hosts
        alias       all_hosts
        members       10.0.0.52,10.0.0.166
        #notes       note_string
        #notes_url       url
        #action_url       url
        }
define hostgroup{
        hostgroup_name       http_hosts
        alias       http_hosts
        members       10.0.0.166
        #notes       note_string
        #notes_url       url
        #action_url       url
        }
EOF
#----------------------------quoted text: end----------------------------

cat << EOF >> contacts.cfg
#----------------------------quoted text: begin----------------------------
define contact{
                contact_name       cheng
                alias       sa_cheng
                host_notifications_enabled      1
                service_notifications_enabled      1
                host_notification_period       24x7
                service_notification_period       24x7
                host_notification_options      d,u,r
                service_notification_options      w,u,c,r
                host_notification_commands       notify-host-by-email,notify-host-by-sms
                service_notification_commands       notify-service-by-email,notify-service-by-sms
                email       yxcx@yahoo.cn
                pager       13712345678
                can_submit_commands      1
                #retain_status_information       [0/1]
                #retain_nonstatus_information       [0/1]
                }
EOF
#----------------------------quoted text: end----------------------------

cat << EOF >> contactgroups.cfg
#----------------------------quoted text: begin----------------------------
define contactgroup{
		contactgroup_name       sa_groups
		alias       sa_groups
		members       cheng
		#contactgroup_members       contactgroups
		}
EOF
#----------------------------quoted text: end----------------------------

#The command named in check_command below must be defined either in the command configuration file or in the nrpe configuration file
#max_check_attempts (maximum number of check attempts): 3-4 is usually a good value. It keeps the check from being so sensitive that it raises false alarms; getting an SMS for every dropped packet would be maddening
#check_interval and retry_interval are given in minutes; adjust them per check as appropriate
#notification_interval is how many minutes to wait between repeated alerts once a problem has been detected.
#Notification options:
#d=send notifications on a DOWN state
#w=send notifications on a WARNING state
#c=send notifications on a CRITICAL state
#u=send notifications on an UNREACHABLE or UNKNOWN state
#r=send notifications on recoveries (OK state)
#f=send notifications when the host or service starts and stops flapping
#s=send notifications when scheduled downtime starts and ends
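
#To see how max_check_attempts and retry_interval interact, here is a small arithmetic sketch. It assumes the default interval_length of 60 seconds (so intervals are minutes) and ignores scheduling latency and check run time; minutes_until_notification is a hypothetical helper name.

```shell
# After the first failed check, nagios re-checks at retry_interval until
# max_check_attempts checks in total have failed, and only then notifies.
# So the wait after the first failure is (max_check_attempts - 1) retries.
# Usage: minutes_until_notification <max_check_attempts> <retry_interval>
minutes_until_notification() {
    echo $(( ($1 - 1) * $2 ))
}

# minutes_until_notification 4 1   # check_http below: prints 3
# minutes_until_notification 5 5   # check_load below: prints 20
```

#The problem itself can additionally go unnoticed for up to one check_interval before that first failed check.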

cat << EOF >> services.cfg
#----------------------------quoted text: begin----------------------------
#monitor hosts
define service{
                host_name       10.0.0.166
                service_description       check_ftp
                check_command       check_ftp
                max_check_attempts       3
                check_interval       10
                retry_interval       5
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
EOF
#----------------------------quoted text: end----------------------------

cat << EOF >> servicegroups.cfg
#----------------------------quoted text: begin----------------------------
#monitor all_hosts
define service{
                hostgroup_name       all_hosts
                service_description       check_host-alive
                check_command       check_ping
                max_check_attempts       5
                check_interval       3
                retry_interval       1
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
define service{
                hostgroup_name       all_hosts
                service_description       check_df
                check_command       check_nrpe!check_df
                max_check_attempts       4
                check_interval       1440
                retry_interval       5
                check_period       24x7
                notification_interval       1440
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
define service{
                hostgroup_name       all_hosts
                service_description       check_load
                check_command       check_nrpe!check_load
                max_check_attempts       5
                check_interval       5
                retry_interval       5
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
define service{
                hostgroup_name       all_hosts
                service_description       check_zombie_procs
                check_command       check_nrpe!check_zombie_procs
                max_check_attempts       5
                check_interval       5
                retry_interval       5
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
define service{
                hostgroup_name       all_hosts
                service_description       check_total_procs
                check_command       check_nrpe!check_total_procs
                max_check_attempts       5
                check_interval       5
                retry_interval       5
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
define service{
                hostgroup_name       all_hosts
                service_description       check_ssh
                check_command       check_ssh
                max_check_attempts       3
                check_interval       60
                retry_interval       5
                check_period       24x7
                notification_interval       60
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }

#monitor http_hosts
define service{
                hostgroup_name       http_hosts
                service_description       check_http
                check_command       check_http
                max_check_attempts       4
                check_interval       3
                retry_interval       1
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                #contacts       contacts(*)
                contact_groups       sa_groups
                }
EOF
#----------------------------quoted text: end----------------------------

####################################

####################################
(7)Install nrpe on the main monitoring server

cd /usr/local/src/
tar xvf nrpe-2.??.tar.gz && cd nrpe-2.??
./configure --prefix=/usr/local/nrpe

#When configure finishes, it prints a summary of the settings
#----------------------------quoted text: begin----------------------------
General Options:
 -------------------------
 NRPE port:    5666
 NRPE user:    nagios
 NRPE group:   nagios
 Nagios user:  nagios
 Nagios group: nagios
#----------------------------quoted text: end----------------------------
make
make install

#Copy a few plugins so that nrpe works properly
cp /usr/local/nrpe/libexec/check_nrpe  /usr/local/nagios/libexec/
cp /usr/local/nagios/libexec/check_disk  /usr/local/nrpe/libexec/
cp /usr/local/nagios/libexec/check_load  /usr/local/nrpe/libexec/
cp /usr/local/nagios/libexec/check_ping  /usr/local/nrpe/libexec/
cp /usr/local/nagios/libexec/check_procs  /usr/local/nrpe/libexec/
chown -R nagios:nagios /usr/local/nrpe/libexec/

#Add the following at a suitable place in /usr/local/nagios/etc/objects/commands.cfg; I put it between check_ssh and check_dhcp
vi /usr/local/nagios/etc/objects/commands.cfg
#----------------------------quoted text: begin----------------------------
# 'check_nrpe' command definition
define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
		}
#----------------------------quoted text: end----------------------------

####################################

####################################
(8)Configure nrpe

mkdir /usr/local/nrpe/etc
cp sample-config/nrpe.cfg  /usr/local/nrpe/etc/

#Edit the following options
#server_address= set according to your environment
#allowed_hosts= the machines allowed to query this nrpe daemon
#----------------------------quoted text: begin----------------------------
server_address=127.0.0.1
allowed_hosts=127.0.0.1
#----------------------------quoted text: end----------------------------

#Adjust the command section to your setup. For disks, I commented out the check_hda1 command and check all disks instead
#----------------------------quoted text: begin----------------------------
#command[check_hda1]=/usr/local/nrpe/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_df]=/usr/local/nrpe/libexec/check_disk -w 20% -c 10%
#----------------------------quoted text: end----------------------------

#Register nrpe as a service
cp init-script /etc/init.d/nrpe
chmod 755 /etc/init.d/nrpe
chkconfig --add nrpe
chkconfig --level 3 nrpe on

####################################

####################################
(9)Install the Fetion robot (used to send SMS alerts)

cd /usr/local/src/
rpm -Uvh libstdc++-4.3.0-8.x86_64.rpm
tar xvf fetion20090406003-linux.tar.gz
tar xvf library_linux.tar.gz
mv install ../sms
mv libACE* /usr/local/lib64/
mv libcrypto.so.0.9.8 /usr/local/lib64/
mv libssl.so.0.9.8 /usr/local/lib64/
echo "/usr/local/lib64/" >> /etc/ld.so.conf
ldconfig
chown -R nagios:nagios /usr/local/sms
chmod 755 /usr/local/sms/fetion

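#It is best to switch to the nagios user and send a test SMS
su nagios
#13744444444 is the mobile number used to send the SMS
#jiubugaosuni is the password for 13744444444
#Replace 13712345678 with your own mobile number
/usr/local/sms/fetion --mobile=13744444444 --pwd=jiubugaosuni --to=13712345678 --msg-utf8=test
#Do not forget to return to the root user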
exit

#Add the SMS notification commands; I put them right below the email section
vi commands.cfg
#----------------------------quoted text: begin----------------------------
# 'notify-host-by-sms' command definition
define command{
        command_name    notify-host-by-sms
        command_line    /usr/local/sms/fetion --mobile=13744444444 --pwd=jiubugaosuni --to=$CONTACTPAGER$ --msg-utf8="$NOTIFICATIONTYPE$: $HOSTNAME$ is $HOSTSTATE$ info: $HOSTOUTPUT$"
        }
# 'notify-service-by-sms' command definition
define command{
        command_name    notify-service-by-sms
        command_line    /usr/local/sms/fetion --mobile=13744444444 --pwd=jiubugaosuni --to=$CONTACTPAGER$ --msg-utf8="$NOTIFICATIONTYPE$: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$"
        }
#----------------------------quoted text: end----------------------------

#Update contacts.cfg and contactgroups.cfg accordingly, mainly the mobile number

####################################

####################################
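(10)Restart the nagios service and verify that the main monitoring server is monitoring itself

#Check the configuration files and look for any error output
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
service nagios restart
#Open http://ip/nagios/ in a browser and have a look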

####################################

####################################
(11)Install nagios-plugins and nrpe on the monitored machine

useradd -m nagios
cd /usr/local/src/
tar xvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
cd ../
tar xvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure
make
make install
mkdir /usr/local/nagios/etc/
cp sample-config/nrpe.cfg  /usr/local/nagios/etc/

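#Edit the following options in /usr/local/nagios/etc/nrpe.cfg
#server_address= set according to your environment
#allowed_hosts= the machines allowed to query this nrpe daemon
#----------------------------quoted text: begin----------------------------
server_address=10.0.0.166
allowed_hosts=127.0.0.1,10.0.0.52,10.0.0.166
#----------------------------quoted text: end----------------------------
#Adjust the command section to your setup. For disks, I commented out the check_hda1 command and check all disks instead
#nrpe was configured without --prefix on this machine, so the plugins are under /usr/local/nagios/libexec/
#----------------------------quoted text: begin----------------------------
#command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_df]=/usr/local/nagios/libexec/check_disk -w 20% -c 10%
#----------------------------quoted text: end----------------------------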
cp init-script /etc/init.d/nrpe
chmod 755 /etc/init.d/nrpe
chkconfig --add nrpe
chkconfig --level 3 nrpe on

####################################

####################################
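(12)How to add a monitored machine
#Steps:

#a. Make sure nagios-plugins and nrpe are correctly installed on the monitored machine
#b. Define the machine in hosts.cfg. Copying an existing host definition and adjusting it is enough
#c. In hostgroups.cfg, add the machine to the groups it should belong to
#d. If a service you need to monitor is not already covered by a definition in servicegroups.cfg, define it in services.cfg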
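
#Step b can be scripted by printing a host definition from a template. This is a sketch under the assumptions of this post: host_definition is a hypothetical helper name, the values copy the host blocks in hosts.cfg above, and the IP is used as host_name, alias and address just as those blocks do.

```shell
# Hypothetical helper: print a host definition for the given IP,
# modelled on the entries already in hosts.cfg.
# Usage: host_definition 10.0.0.167 >> /usr/local/nagios/etc/objects/hosts.cfg
host_definition() {
    cat << EOF
define host{
                host_name       $1
                alias       $1
                address       $1
                max_check_attempts       5
                check_period       24x7
                contact_groups       sa_groups
                notification_interval       30
                notification_period       24x7
                notification_options      d,u,r
                }
EOF
}
```

#Steps c and d still have to be done by hand, followed by a configuration check with nagios -v and a reload.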

####################################

####################################
(13)Notes on monitoring a MySQL server

#nagios-plugins must be configured with --with-mysql=/usr/local/mysql (your MySQL install path)
#./configure --with-mysql=/usr/local/mysql --with-nagios-user=nagios --with-nagios-group=nagios
#Run the following on the monitored machine
#The check simply connects as nrpe, a user with only SELECT privilege, to an empty database named nrpe. It is roughly equivalent to mysqladmin -u <user> --password='<password>' status -i 2
mysql -p
#----------------------------quoted text: begin----------------------------
mysql> create database nrpe;
mysql> grant select on nrpe.* to nrpe@localhost identified by 'password' with grant option;
mysql> grant select on nrpe.* to nrpe@'10.0.0.52' identified by 'password' with grant option;
#----------------------------quoted text: end----------------------------
#Trial run; it prints the MySQL status (-p is the password set in the grant statements above)
/usr/local/nagios/libexec/check_mysql -u nrpe -p password -d nrpe
#Trial run from the monitoring server (requires the MySQL client libraries there)
/usr/local/nagios/libexec/check_mysql -H 10.0.0.166 -u nrpe -p password -d nrpe

####################################

####################################
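(14)A web server can also be monitored through nrpe

#In services.cfg on the main monitoring server, wherever check_http would be called, call check_nrpe!check_http instead
#In nrpe.cfg on the monitored machine, add the following line
#----------------------------quoted text: begin----------------------------
command[check_http]=/usr/local/nagios/libexec/check_http -H www.chengyongxu.com -u /index.php
#----------------------------quoted text: end----------------------------
#In other words, the check requests one page on this web server; if the page loads normally, the web service is considered healthy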
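
#On the main monitoring server, the matching service entry could look like the output of the sketch below. nrpe_service_definition is a hypothetical helper name, and the intervals are copied from the check_http service defined earlier in this post rather than being requirements.

```shell
# Hypothetical helper: print a service definition that runs the named
# nrpe command on the given host through check_nrpe.
# Usage: nrpe_service_definition 10.0.0.166 check_http >> services.cfg
nrpe_service_definition() {
    cat << EOF
define service{
                host_name       $1
                service_description       $2
                check_command       check_nrpe!$2
                max_check_attempts       4
                check_interval       3
                retry_interval       1
                check_period       24x7
                notification_interval       30
                notification_period       24x7
                notification_options      w,u,c
                contact_groups       sa_groups
                }
EOF
}
```

#The second argument must match the name inside command[...] in the monitored machine's nrpe.cfg.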