Red Hat OpenShift Container Platform v3.6 and Cisco ACI APIC Controller Integration

Cisco has published the latest version of its Cisco ACI and Red Hat OpenShift Container Platform (OCP) 3.6 integration guide at the link below:

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_Cisco_ACI_and_OpenShift_Integration.html#id_60118

In this post, I want to share the tasks that have to be done during the OpenShift installation. Please note that:

  1. The 192.168.12.xx segment (kubeapi_vlan) will not exist until we run: acc-provision -f openshift-3.6 -c aci-provisions-config.yaml -o aci-provisions.yaml -a -u admin -p password
  2. Before installation, we use 192.168.11.xx to configure the hosts and to connect to the Cisco APIC controller. After running the command in step 1, we use kubeapi_vlan as the primary interface, with the default gateway on this interface.

Cisco LAB OCP – ACI in ABC Company

Hosts :

ocp-bastion IN A 192.168.12.20

ocp-master1 IN A 192.168.12.21

ocp-master2 IN A 192.168.12.22

ocp-master3 IN A 192.168.12.23

ocp-infra1 IN A 192.168.12.24

ocp-infra2 IN A 192.168.12.25

ocp-infra3 IN A 192.168.12.26

ocp-worker1 IN A 192.168.12.27

ocp-worker2 IN A 192.168.12.28

ocp-worker3 IN A 192.168.12.29

*.app.dev.abc IN A 192.168.12.24

—————————————————————–

***Configure DNS

[root@ocp-bastion ~]# cat /etc/named.conf
//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS
// server as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// See the BIND Administrator's Reference Manual (ARM) for details about the
// configuration located in /usr/share/doc/bind-{version}/Bv9ARM.html

options {
    listen-on port 53 { 192.168.12.20; };
    listen-on-v6 port 53 { ::1; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    allow-query { localhost; 192.168.12.0/24; };

    /*
     - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
     - If you are building a RECURSIVE (caching) DNS server, you need to enable
       recursion.
     - If your recursive DNS server has a public IP address, you MUST enable access
       control to limit queries to your legitimate users. Failing to do so will
       cause your server to become part of large scale DNS amplification
       attacks. Implementing BCP38 within your network would greatly
       reduce such attack surface
    */
    recursion yes;

    dnssec-enable no;
    dnssec-validation no;
    //dnssec-lookaside auto;

    //192.168.1.10 is my Public DNS server
    forwarders { 192.168.1.10; };

    /* Path to ISC DLV key */
    bindkeys-file "/etc/named.iscdlv.key";

    managed-keys-directory "/var/named/dynamic";

    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
};

logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };
};

zone "." IN {
    type hint;
    file "named.ca";
};

zone "dev.abc" IN {
    type master;
    file "forward.abc";
    allow-update { none; };
};

zone "12.168.192.in-addr.arpa" IN {
    type master;
    file "reverse.abc";
    allow-update { none; };
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";

———-

[root@ocp-bastion byo]# cat /var/named/forward.abc
$TTL 86400
@ IN SOA ocp-bastion.dev.abc. root.dev.abc. (
    2011071001 ;Serial
    3600 ;Refresh
    1800 ;Retry
    604800 ;Expire
    86400 ;Minimum TTL
)
@ IN NS ocp-bastion.dev.abc.
ocp-bastion IN A 192.168.12.20
ocp-master1 IN A 192.168.12.21
ocp-master2 IN A 192.168.12.22
ocp-master3 IN A 192.168.12.23
ocp-infra1 IN A 192.168.12.24
ocp-infra2 IN A 192.168.12.25
ocp-infra3 IN A 192.168.12.26
ocp-worker1 IN A 192.168.12.27
ocp-worker2 IN A 192.168.12.28
ocp-worker3 IN A 192.168.12.29
*.app.dev.abc. IN A 192.168.12.24

———-

[root@ocp-bastion byo]# cat /var/named/reverse.abc
$TTL 86400
@ IN SOA ocp-bastion.dev.abc. root.dev.abc. (
    2011071001 ;Serial
    3600 ;Refresh
    1800 ;Retry
    604800 ;Expire
    86400 ;Minimum TTL
)
@ IN NS ocp-bastion.dev.abc.
@ IN PTR dev.abc.
ocp-bastion IN A 192.168.12.20
ocp-master1 IN A 192.168.12.21
ocp-master2 IN A 192.168.12.22
ocp-master3 IN A 192.168.12.23
ocp-infra1 IN A 192.168.12.24
ocp-infra2 IN A 192.168.12.25
ocp-infra3 IN A 192.168.12.26
ocp-worker1 IN A 192.168.12.27
ocp-worker2 IN A 192.168.12.28
ocp-worker3 IN A 192.168.12.29
20 IN PTR ocp-bastion.dev.abc.
21 IN PTR ocp-master1.dev.abc.
22 IN PTR ocp-master2.dev.abc.
23 IN PTR ocp-master3.dev.abc.
24 IN PTR ocp-infra1.dev.abc.
25 IN PTR ocp-infra2.dev.abc.
26 IN PTR ocp-infra3.dev.abc.
27 IN PTR ocp-worker1.dev.abc.
28 IN PTR ocp-worker2.dev.abc.
29 IN PTR ocp-worker3.dev.abc.
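The PTR lines in the reverse zone mirror the A records in forward.abc, so they can be derived instead of typed by hand. A minimal sketch, assuming a /24 network and the dev.abc domain from this lab; a_to_ptr is a hypothetical helper name, not part of BIND:

```shell
#!/bin/sh
# a_to_ptr DOMAIN
# Reads "name IN A a.b.c.d" lines on stdin and prints
# "<last-octet> IN PTR name.DOMAIN." lines for a /24 reverse zone.
# Hypothetical helper: SOA/NS lines and wildcard records are skipped.
a_to_ptr() {
    domain="$1"
    awk -v domain="$domain" '
        $2 == "IN" && $3 == "A" && $1 !~ /^\*/ {
            n = split($4, octet, ".")
            if (n == 4) printf "%s IN PTR %s.%s.\n", octet[4], $1, domain
        }'
}

# Usage (paths as in this lab):
#   a_to_ptr dev.abc < /var/named/forward.abc
```

The output can be compared against reverse.abc before restarting named, which catches a host that was added to one file but not the other.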

————————————————————————

# Management network interface (not connected to ACI)
[root@ocp-master2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens192
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
NAME=ens192
UUID=d2acbd8f-0c9d-4fa0-aa3b-a1f7bbb7f937
DEVICE=ens192
ONBOOT=yes
IPADDR=192.168.11.22
PREFIX=24
#GATEWAY=192.168.11.1
#DNS1=192.168.1.10

# Interface connected to ACI
[root@ocp-master2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens224
TYPE=Ethernet
BOOTPROTO=none
NAME=ens224
DEVICE=ens224
ONBOOT=yes
MTU=1600

# ACI Infra VLAN
[root@ocp-master2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens224.3000
BOOTPROTO=dhcp
DEVICE=ens224.3000
PHYSDEV=ens224
ONBOOT=yes
VLAN=yes

[root@ocp-master2 ~]# cat /etc/sysconfig/network-scripts/route-ens224.3000
224.0.0.0/4 dev ens224.3000

# Node VLAN
[root@ocp-master2 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens224.3011
BOOTPROTO=none
DEVICE=ens224.3011
PHYSDEV=ens224
ONBOOT=yes
IPADDR=192.168.12.22
PREFIX=24
GATEWAY=192.168.12.1
DNS1=192.168.12.20
VLAN=yes

[root@ocp-master2 ~]# cat /etc/sysconfig/network-scripts/route-ens192
192.168.1.100/32 via 192.168.11.1 dev ens192

————————————————————————

# Create a new DHCP client file for the infra VLAN
# vi /etc/dhcp/dhclient-ens224.3000.conf
send dhcp-client-identifier 01:<mac-address of infra VLAN interface>;
request subnet-mask, domain-name, domain-name-servers, host-name;
send host-name <server-host-name>;

option rfc3442-classless-static-routes code 121 = array of unsigned integer 8;
option ms-classless-static-routes code 249 = array of unsigned integer 8;
option wpad code 252 = string;

also request rfc3442-classless-static-routes;
also request ms-classless-static-routes;
also request static-routes;
also request wpad;
also request ntp-servers;

————————————————————————

# Sample acc-provision-config.yaml
[root@ocp-master1 ~]# cat aci-containers-config.yaml
#
# Configuration for ACI Fabric
#
aci_config:
  system_id: MABC_DEV_OPENSHIFT       # Every opflex cluster must have a distinct ID
  apic_hosts:                         # List of APIC hosts to connect for APIC API
  - 192.168.100.1
  vmm_domain:                         # Kubernetes container domain configuration
    encap_type: vxlan                 # Encap mode: vxlan or vlan
    mcast_range:                      # Every opflex VMM must use a distinct range
      start: 225.20.1.1
      end: 225.20.255.255
    nested_inside:                    # Include if nested inside a VMM
      type: vmware                    # Specify the VMM vendor (supported: vmware)
      name: VDSMABCDEVOPS             # Specify the name of the VMM domain

  # The following resources must already exist on the APIC,
  # they are used, but not created by the provisioning tool.
  aep: MABC_DEV_VMWARE_AEP            # The AEP for ports/VPCs used by this cluster
  vrf:                                # This VRF used to create all kubernetes EPs
    name: MABC_DEV_DEVOPS_VRF
    tenant: common                    # This can be system-id or common
  l3out:
    name: MABC_DEV_ASA_DEVOPS_L3OUT   # Used to provision external IPs
    external_networks:
    - MABC_DEV_DEVOPS_NET             # Used for external contracts

#
# Networks used by Kubernetes
#
net_config:
  node_subnet: 192.168.12.1/24        # Subnet to use for nodes
  pod_subnet: 192.168.0.1/21          # Subnet to use for Kubernetes Pods
  extern_dynamic: 192.168.14.1/24     # Subnet to use for dynamic external IPs
  extern_static: 192.168.15.1/24      # Subnet to use for static external IPs
  node_svc_subnet: 192.168.13.1/24    # Subnet to use for service graph
  kubeapi_vlan: 3011                  # The VLAN used by the physdom for nodes
  service_vlan: 3012                  # The VLAN used by LoadBalancer services
  infra_vlan: 3000                    # The VLAN used by ACI infra

#
# Configuration for container registry
# Update if a custom container registry has been setup
#
registry:
  image_prefix: noiro                 # e.g: registry.example.com/noiro
  # image_pull_secret: secret_name    # (if needed)

———————————————————————————————————-

***Installation steps :

- Prepare the master, worker, and infra nodes as for a normal OCP installation

- Set up 2 interfaces on each host: 1 for management, and 1 carrying the 2 VLAN interfaces (infra VLAN and node VLAN)

- Install the Cisco acc-provision package

- Edit aci-provisions-config.yaml

- Run acc-provision

acc-provision -f openshift-3.6 -c aci-provisions-config.yaml -o aci-provisions.yaml -a -u admin -p password

- After provisioning, copy the dhclient-ens224.3000.conf file to /etc/dhcp/ and restart the network service

- The default gateway should be the ACI node_subnet gateway, not the management gateway

- Run the OpenShift installation. After installation, the nodes will not be in Ready status

- Run oc apply -f on the file generated by acc-provision (the -o file, called aci-containers.yaml in the rest of this post). Make sure all nodes can pull images from docker.io

- After this, all nodes should be Ready

***Installation notes

- Point the DNS records for all nodes at the kubeapi_vlan addresses (in this lab VLAN 3011, 192.168.12.xx), so that during installation etcd picks up the 192.168.12.xx IPs, not the ens192 IPs

- Before running oc apply -f aci-containers.yaml, make sure that all nodes have the label app=true

- A service has to be edited before it is exposed: set type: LoadBalancer under spec

- Notes for the Ansible hosts file :

- add these options :

openshift_master_ingress_ip_network_cidr: 192.168.15.0/24 ( go to extern_dynamic IP)

openshift_use_openshift_sdn: false

os_sdn_network_plugin_name: cni
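The node label from the notes above can be applied in one pass. A minimal sketch, assuming the node names match the DNS records from this lab; the FQDN form and the label_cmds helper are my assumptions, not something from the Cisco guide. It only prints the oc commands, so they can be reviewed before being run:

```shell
#!/bin/sh
# label_cmds NODE...
# Prints one "oc label" command per node name given as an argument.
# Dry run by design: pipe the output to sh once it looks right.
# Hypothetical helper; node names are assumed to be the FQDNs used in DNS.
label_cmds() {
    for node in "$@"; do
        printf 'oc label node %s app=true --overwrite\n' "$node"
    done
}

# Usage:
#   label_cmds ocp-master1.dev.abc ocp-infra1.dev.abc ocp-worker1.dev.abc
#   label_cmds ocp-worker1.dev.abc | sh
```

Afterwards, oc get nodes --show-labels shows whether every node carries app=true.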

Findings :

1. The aci-controller needs a minimum of 128 pod IPs, which are assigned automatically, so the pod network subnet should be at least a /21 for the abc case (9 nodes)

2. Deploying aci-containers.yaml needs access to *.docker.io and *.github.com

3. Open UDP port 8472 (VXLAN) in iptables

4. The MTU should be 1600

5. Before running oc apply -f aci-containers.yaml, make sure that all nodes have the label app=true

6. A service has to be edited before it is exposed: set type: LoadBalancer under spec

7. UDP and TCP port 53 have to be allowed to all nodes in iptables
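The /21 in finding 1 can be checked with a little arithmetic. A rough sketch, assuming the 128 pod IPs are reserved per node (my reading of the finding, not something the Cisco guide states here) and ignoring the few addresses lost to network, broadcast, and gateway; min_pod_prefix is a hypothetical helper:

```shell
#!/bin/sh
# min_pod_prefix NODES IPS_PER_NODE
# Prints the longest IPv4 prefix length whose address count is at least
# NODES * IPS_PER_NODE. Hypothetical helper for sizing pod_subnet.
min_pod_prefix() {
    needed=$(( $1 * $2 ))
    prefix=32
    size=1
    # Double the block size (shorten the prefix) until it is large enough.
    while [ "$size" -lt "$needed" ]; do
        size=$(( size * 2 ))
        prefix=$(( prefix - 1 ))
    done
    echo "$prefix"
}

# Usage: 9 nodes at 128 IPs each need 1152 addresses; a /22 holds 1024,
# so the result is the next size up.
#   min_pod_prefix 9 128
```

Under that assumption, a cluster that grows past 16 nodes would outgrow a /21 (2048 addresses) and need a larger pod_subnet.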


How to Configure Remote Execution in Red Hat Satellite 6 Server


Remote execution uses SSH connections secured with key authentication from the capsule to the hosts. By default, the capsule loads the SSH key from /usr/share/foreman-proxy/.ssh/id_rsa_foreman_proxy. The SSH keys are created automatically when installing a capsule.

To enable remote execution, the capsule’s public SSH key needs to be copied to the hosts.

Use one of the following methods to copy the SSH key to the hosts:

Method 1: Install the public key automatically during provisioning of hosts:

Include a remote_execution_ssh_keys snippet in your Kickstart templates:

<%= snippet 'remote_execution_ssh_keys' %>

You can see this snippet in the Satellite Kickstart Default template under Hosts → Provisioning templates

Method 2: Use the Satellite 6 API to distribute the public key from a capsule to hosts that are already installed (like your host1.example.com):

Make sure host1.example.com is running in RHEV

Look up the IP address in Satellite, it should be 192.168.0.200

Log in to host1.example.com via SSH using the IP address.

Run this command:

curl https://satellite6.example.com:9090/ssh/pubkey >> ~/.ssh/authorized_keys

Method 3: Copy the key directly from the capsule (your Satellite in this case) to the host:

ssh-copy-id -i /usr/share/foreman-proxy/.ssh/id_rsa_foreman_proxy.pub root@<IP address>
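Running the curl command from Method 2 more than once appends the same key again each time. A minimal sketch of an idempotent variant, assuming the key has first been saved to a file; append_key_once and the /tmp/capsule.pub path are hypothetical names of mine, not part of Satellite:

```shell
#!/bin/sh
# append_key_once KEY_FILE AUTHORIZED_KEYS
# Appends each non-empty line of KEY_FILE to AUTHORIZED_KEYS unless an
# identical line is already there. Hypothetical helper.
append_key_once() {
    key_file="$1"
    auth_file="$2"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue
        grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
    done < "$key_file"
}

# Usage on the host (URL as in Method 2 above):
#   curl https://satellite6.example.com:9090/ssh/pubkey > /tmp/capsule.pub
#   append_key_once /tmp/capsule.pub ~/.ssh/authorized_keys
```

Comparing whole lines with grep -xF keeps the check exact, so a key that merely shares a prefix with an existing entry is still added.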

How to Install and Configure Red Hat Satellite Server 6

Preparation :
Define foreman-initial-organization
Define foreman-initial-location
Define which repos need to be synced (is there any custom content?)
Define the DNS, DHCP, and TFTP servers (will they be managed by Satellite?)
Define the Satellite server's SSL certificate
Define the Life Cycle Environments
Define the location of the Satellite and Capsule servers, and whether each is physical or virtual
Define the hardening guide

Install Satellite server :
Action :
*Enable the firewall rules for Satellite
*Mount the ISO image, run ./install_packages
*Answer file: /etc/foreman-installer/scenarios.d/satellite-answers.yaml
satellite-installer --scenario satellite \
--foreman-initial-organization "initial_organization_name" \
--foreman-initial-location "initial_location_name" \
--foreman-admin-username admin-username \
--foreman-admin-password admin-password \
--foreman-proxy-dns-managed=false \
--foreman-proxy-dhcp-managed=false

*Register the Satellite server (or a Capsule server) to the Satellite server
rpm -Uvh /var/www/html/pub/katello-ca-consumer-latest.noarch.rpm
subscription-manager register --org=ExampleCompany \
--environment=Library

Install Capsule server :
firewall-cmd --permanent --add-port="53/udp" --add-port="53/tcp" --add-port="67/udp" --add-port="69/udp" --add-port="80/tcp" --add-port="443/tcp" --add-port="5647/tcp" --add-port="8000/tcp" --add-port="8140/tcp" --add-port="8443/tcp" --add-port="9090/tcp"
rpm -Uvh http://satellite.example.com/pub/katello-ca-consumer-latest.noarch.rpm
subscription-manager register --org organization_name
subscription-manager attach --pool=Red_Hat_Satellite_Capsule_Pool_Id
subscription-manager repos --enable rhel-7-server-rpms --enable rhel-7-server-satellite-capsule-6.2-rpms --enable rhel-server-rhscl-7-rpms --enable rhel-7-server-extras-rpms
yum install satellite-capsule
capsule-certs-generate --capsule-fqdn "mycapsule.example.com" --certs-tar "~/mycapsule.example.com-certs.tar" (run this on the Satellite server, and don't forget to copy the command printed in its output)
satellite-installer --scenario capsule --capsule-parent-fqdn "satellite.example.com" --foreman-proxy-register-in-foreman "true" --foreman-proxy-foreman-base-url "

Configure Satellite Server :
1. Create the Satellite Organization
2. Attach subscriptions to the current Satellite Organization
3. Download the manifest file
4. Upload the manifest file to the Satellite server from Content > Red Hat Subscription
5. Enable the products that you want to sync under RPMs
6. Create the Life Cycle Environment
7. Create a Content View
8. Publish and promote the Content View to the desired environment
9. Sync the Life Cycle Environment to the Capsule via the CLI: hammer capsule content add-lifecycle-environment --id ?, and hammer capsule content sync
10. Create a Compute Resource from RHEV, VMware, KVM or Bare Metal
11. Import the SCAP content
12. Download the puppet module foreman_scap_clients, and upload it to Satellite
13. Import the classes after uploading the puppet module to Satellite
14. The Puppet environment will be created automatically
15. If you want to provision hosts, you have to define the DHCP IP range at the beginning, set the DNS forwarders, and enable TFTP. When you create a subnet, it should be imported from the Capsule, not created by hand

*Apply Custom Certificate to Satellite and Capsule
1. Create a new certificate key : openssl genrsa -out /root/sat_cert/satellite_cert_key.pem 4096

2. Create a new Certificate Signing Request for each Satellite and Capsule hostname :
openssl req -new \
-key /root/sat_cert/satellite_cert_key.pem \
-out /root/sat_cert/satellite_cert_csr.pem

3. Check the certificate validity :
katello-certs-check \
-c /root/sat_cert/satellite_cert.pem \
-k /root/sat_cert/satellite_cert_key.pem \
-r /root/sat_cert/satellite_cert_csr.pem \
-b /root/sat_cert/ca_cert_bundle.pem

4. Run the Satellite installer to apply the new certificate to the Satellite server :
To install the Satellite main server with the custom certificates, run:

satellite-installer --scenario satellite \
--certs-server-cert "/root/sat_cert/satellite_cert.pem" \
--certs-server-cert-req "/root/sat_cert/satellite_cert_csr.pem" \
--certs-server-key "/root/sat_cert/satellite_cert_key.pem" \
--certs-server-ca-cert "/root/sat_cert/ca_cert_bundle.pem"

To update the certificates on a currently running Satellite installation, run:

satellite-installer --scenario satellite \
--certs-server-cert "/root/sat_cert/satellite_cert.pem" \
--certs-server-cert-req "/root/sat_cert/satellite_cert_csr.pem" \
--certs-server-key "/root/sat_cert/satellite_cert_key.pem" \
--certs-server-ca-cert "/root/sat_cert/ca_cert_bundle.pem" \
--certs-update-server --certs-update-server-ca

5. Generate a new certificate for the Capsule. Do this for each Capsule (the hostname and the certificate must match)

capsule-certs-generate --capsule-fqdn "" \
--certs-tar "/root/certs.tar" \
--server-cert "/root/sat_cert/satellite_cert.pem" \
--server-cert-req "/root/sat_cert/satellite_cert_csr.pem" \
--server-key "/root/sat_cert/satellite_cert_key.pem" \
--server-ca-cert "/root/sat_cert/ca_cert_bundle.pem" \
--certs-update-server

6. Apply the new tar file to the Capsule :
satellite-installer --scenario capsule \
--capsule-parent-fqdn "capsule.example.com"\

 

 


——————————————————————————————————————————————————————————————————————————————————————————————

Linux Performance Monitoring Tools

Command Line Tools

Top

This is a small tool which is pre-installed on many unix systems. When you want an overview of all the processes or threads running in the system, top is a good tool. It can order processes by different criteria; the default is CPU usage.

htop

Htop is essentially an enhanced version of top. It’s easier to sort by processes. It’s visually easier to understand and has built in commands for common things you would like to do. Plus it’s fully interactive.

atop

Atop monitors all processes much like top and htop, unlike top and htop however it has daily logging of the processes for long-term analysis. It also shows resource consumption by all processes. It will also highlight resources that have reached a critical load.

apachetop

Apachetop monitors the overall performance of your apache webserver. It’s largely based on mytop. It displays current number of reads, writes and the overall number of requests processed.

ftptop

ftptop gives you basic information of all the current ftp connections to your server such as the total amount of sessions, how many are uploading and downloading and who the client is.

mytop

mytop is a neat tool for monitoring threads and performance of mysql. It gives you a live look into the database and what queries it’s processing in real time.

powertop

powertop helps you diagnose issues that have to do with power consumption and power management. It can also help you experiment with power management settings to achieve the most efficient settings for your server. You switch tabs with the tab key.

iotop

iotop checks the I/O usage information and gives you a top-like interface to that. It displays columns on read and write and each row represents a process. It also displays the percentage of time the process spent while swapping in and while waiting on I/O.

Desktop Monitoring

ntopng

ntopng is the next generation of ntop and the tool provides a graphical user interface via the browser for network monitoring. It can do stuff such as: geolocate hosts, get network traffic and show ip traffic distribution and analyze it.

iftop

iftop is similar to top, but instead of mainly checking for cpu usage it listens to network traffic on selected network interfaces and displays a table of current usage. It can be handy for answering questions such as “Why on earth is my internet connection so slow?!”.

jnettop

jnettop visualises network traffic in much the same way as iftop does. It also supports customizable text output and a machine-friendly mode to support further analysis.

bandwidthd

BandwidthD tracks usage of TCP/IP network subnets and visualises that in the browser by building a html page with graphs in png. There is a database driven system that supports searching, filtering, multiple sensors and custom reports.

EtherApe

EtherApe displays network traffic graphically: the more talkative a node is, the bigger it appears. It either captures live traffic or reads it from a tcpdump file. The display can also be refined using a network filter with pcap syntax.

ethtool

ethtool is used for displaying and modifying some parameters of the network interface controllers. It can also be used to diagnose Ethernet devices and get more statistics from the devices.

NetHogs

Instead of breaking network traffic down per protocol or per subnet, NetHogs groups it by process. So if there's a surge in network traffic you can fire up NetHogs and see which process is causing it.

iptraf

iptraf gathers a variety of metrics such as TCP connection packet and byte count, interface statistics and activity indicators, TCP/UDP traffic breakdowns and station packet and byte counts.

ngrep

ngrep is grep but for the network layer. It's pcap aware and allows you to specify extended regular or hexadecimal expressions to match against the data payloads of packets.

MRTG

MRTG was originally developed to monitor router traffic, but now it's able to monitor other network related things as well. It typically collects data every five minutes and then generates an HTML page. It also has the capability of sending warning emails.

bmon

Bmon monitors and helps you debug networks. It captures network related statistics and presents them in a human friendly way. You can also interact with bmon through curses or through scripting.

traceroute

Traceroute is a built-in tool for displaying the route and measuring the delay of packets across a network.

IPTState

IPTState allows you to watch where traffic that crosses your iptables is going and then sort that by different criteria as you please. The tool also allows you to delete states from the table.

darkstat

Darkstat captures network traffic and calculates statistics about usage. The reports are served over a simple HTTP server and gives you a nice graphical user interface of the graphs.

vnStat

vnStat is a network traffic monitor that uses statistics provided by the kernel which ensures light use of system resources. The gathered statistics persists through system reboots. It has color options for the artistic sysadmins.

netstat

Netstat is a built-in tool that displays TCP network connections, routing tables and a number of network interfaces. It’s used to find problems in the network.

ss

It is, however, preferable to use ss instead of netstat. The ss command is capable of showing more information than netstat and is actually faster. If you want summary statistics you can use the command ss -s.

nmap

Nmap allows you to scan your server for open ports or detect which OS is being used. But you could also use this for SQL injection vulnerabilities, network discovery and other means related to penetration testing.

MTR


MTR combines the functionality of traceroute and the ping tool into a single network diagnostic tool. It limits the number of hops individual packets may travel and listens for the responses when they expire. It then repeats this every second.

tcpdump


tcpdump outputs a description of the contents of each captured packet that matches the expression you provided in the command. You can also save this data for further analysis.

Justniffer


Justniffer is a TCP packet sniffer. You can choose whether you would like to collect low-level data or high-level data with this sniffer. It also allows you to generate logs in a customizable way. You could for instance mimic the access log that apache has.

Infrastructure Monitoring

Server Density


Server Density is a server monitoring tool with a web interface that allows you to set alerts and view graphs for all system and network metrics. You can also set up monitoring of websites, whether they are up or down. Server Density allows you to set permissions for users, and you can extend your monitoring with its plugin infrastructure or API. The service already supports Nagios plugins.

OpenNMS


OpenNMS has four main functional areas: event management and notifications; discovery and provisioning; service monitoring and data collection. It’s designed to be customizable to work in a variety of network environments.

SysUsage


SysUsage monitors your system continuously via Sar and other system commands. It also allows notifications to alarm you once a threshold is reached. SysUsage itself can be run from a centralized place where all the collected statistics are also being stored. It has a web interface where you can view all the stats.

brainypdm


brainypdm is a data management and monitoring tool that has the capability to gather data from nagios or another generic source to make graphs. It’s cross-platform, has custom graphs and is web based.

PCP


PCP has the capability of collating metrics from multiple hosts and does so efficiently. It also has a plugin framework so you can make it collect specific metrics that are important to you. You can access graph data through either a web interface or a GUI. Good for monitoring large systems.

KDE system guard


This tool is both a system monitor and task manager. You can view server metrics from several machines through the worksheet and if a process needs to be killed or if you need to start a process it can be done within KDE system guard.

Munin


Munin is both a network and a system monitoring tool which offers alerts for when metrics go beyond a given threshold. It uses RRDtool to create the graphs and it has web interface to display these graphs. Its emphasis is on plug and play capabilities with a number of plugins available.

Nagios


Nagios is a system and network monitoring tool that helps you monitor your many servers. It has support for alerting when things go wrong. It also has many plugins written for the platform.

Zenoss


Zenoss provides a web interface that allows you to monitor all system and network metrics. Moreover it discovers network resources and changes in network configurations. It has alerts for you to take action on and it supports the Nagios plugins.

Cacti


Cacti is a network graphing solution that uses the RRDtool data storage. It allows a user to poll services at predetermined intervals and graph the result. Cacti can be extended to monitor a source of your choice through shell scripts.

Zabbix

Zabbix is an open source infrastructure monitoring solution. It can use most databases out there to store the monitoring statistics. The Core is written in C and has a frontend in PHP. If you don’t like installing an agent, Zabbix might be an option for you.

nmon


nmon either outputs the data on screen or saves it in a comma separated file. You can display CPU, memory, network, filesystems, top processes. The data can also be added to a RRD database for further analysis.

conky


Conky monitors a plethora of different OS stats. It has support for IMAP and POP3 and even support for many popular music players! For the handy person you could extend it with your own scripts or programs using Lua.

Glances


Glances monitors your system and aims to present a maximum amount of information in a minimum amount of space. It has the capability to function in a client/server mode as well as monitoring remotely. It also has a web interface.

saidar


Saidar is a very small tool that gives you basic information about your system resources. It displays a full screen of the standard system resources. The emphasis for saidar is being as simple as possible.

RRDtool


RRDtool is a tool developed to handle round-robin databases or RRD. RRD aims to handle time-series data like CPU load, temperatures etc. This tool provides a way to extract RRD data in a graphical format.

monit


Monit has the capability of sending you alerts as well as restarting services if they run into trouble. It’s possible to perform any type of check you could write a script for with monit and it has a web user interface to ease your eyes.

Linux process explorer

Linux process explorer is akin to the activity monitor for OSX or the windows equivalent. It aims to be more usable than top or ps. You can view each process and see how much memory usage or CPU it uses.

df


df is an abbreviation for disk free. It is a pre-installed program on all unix systems, used to display the amount of available disk space on the filesystems which the user has access to.
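As an illustration, the output of df can be filtered to flag filesystems that are nearly full. A minimal sketch, assuming the POSIX output format of df -P and mount points without spaces; fs_over is a hypothetical helper name:

```shell
#!/bin/sh
# fs_over LIMIT
# Reads "df -P" output on stdin and prints "<mount point> <use%>" for
# every filesystem whose usage is at or above LIMIT percent.
# Hypothetical helper; mount points containing spaces are not handled.
fs_over() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the percent sign
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Usage:
#   df -P | fs_over 90
```

Reading from stdin keeps the helper easy to try against saved df output from another machine.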

discus


Discus is similar to df, but it aims to improve on df by making it prettier, using fancy features such as colors, graphs and smart formatting of numbers.

xosview


xosview is a classic system monitoring tool and it gives you a simple overview of all the different parts of the system, including IRQ.

Dstat


Dstat aims to be a replacement for vmstat, iostat, netstat and ifstat. It allows you to view all of your system resources in real-time. The data can then be exported into csv. Most importantly dstat allows for plugins and could thus be extended into areas not yet known to mankind.

Net-SNMP

SNMP is the protocol ‘simple network management protocol’ and the Net-SNMP tool suite helps you collect accurate information about your servers using this protocol.

incron

Incron allows you to monitor a directory tree and then take action on those changes. If you wanted to copy files to directory ‘b’ once new files appeared in directory ‘a’ that’s exactly what incron does.

monitorix

Monitorix is a lightweight system monitoring tool. It helps you monitor a single machine and gives you a wealth of metrics. It also has a built-in HTTP server to view graphs and a reporting mechanism of all metrics.

vmstat


vmstat or virtual memory statistics is a small built-in tool that monitors and displays a summary about the memory in the machine.

uptime

This small command quickly gives you information about how long the machine has been running, how many users are currently logged on and the system load average for the past 1, 5 and 15 minutes.
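On Linux, the same three load averages are also exposed in /proc/loadavg, which is easier to parse in a script than the uptime output. A minimal sketch, assuming the usual Linux layout where the first three fields are the 1, 5 and 15 minute averages; loadavg_summary is a hypothetical helper name:

```shell
#!/bin/sh
# loadavg_summary
# Reads a /proc/loadavg style line on stdin and labels the first three
# fields. Hypothetical helper.
loadavg_summary() {
    awk '{ printf "1min=%s 5min=%s 15min=%s\n", $1, $2, $3 }'
}

# Usage:
#   loadavg_summary < /proc/loadavg
```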

mpstat


mpstat is a built-in tool that monitors cpu usage. The most common command is using mpstat -P ALL which gives you the usage of all the cores. You can also get an interval update of the CPU usage.

pmap


pmap is a built-in tool that reports the memory map of a process. You can use this command to find out causes of memory bottlenecks.

ps


The ps command will give you an overview of all the current processes. You can easily select all processes using the command ps -A

sar


sar is a part of the sysstat package and helps you to collect, report and save different system metrics. With different commands it will give you CPU, memory and I/O usage among other things.

collectl


Similar to sar collectl collects performance metrics for your machine. By default it shows cpu, network and disk stats but it collects a lot more. The difference to sar is collectl is able to deal with times below 1 second, it can be fed into a plotting tool directly and collectl monitors processes more extensively.

iostat


iostat is also part of the sysstat package. This command is used for monitoring system input/output. The reports themselves can be used to change system configurations to better balance input/output load between hard drives in your machine.

free


This is a built-in command that displays the total amount of free and used physical memory on your machine. It also displays the buffers used by the kernel at that given moment.

/Proc file system


The proc file system gives you a peek into kernel statistics. From these statistics you can get detailed information about the different hardware devices on your machine.
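As one example of reading these statistics directly, /proc/meminfo can be reduced to a single figure. A minimal sketch, assuming a kernel that reports a MemAvailable line (older kernels do not); mem_available_pct is a hypothetical helper name:

```shell
#!/bin/sh
# mem_available_pct
# Reads /proc/meminfo style lines on stdin and prints available memory as
# a whole-number percentage of total memory. Hypothetical helper; prints
# nothing when MemTotal is missing or zero.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Usage:
#   mem_available_pct < /proc/meminfo
```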

GKrellM

GKrellM is a GUI application that monitors the status of your hardware, such as CPU, main memory, hard disks, network interfaces and many other things. It can also monitor and launch a mail reader of your choice.

Gnome system monitor


Gnome system monitor is a basic system monitoring tool. Its features include a tree view of process dependencies, the ability to kill or renice processes, and graphs of all server metrics.

Log Monitoring Tools

GoAccess


GoAccess is a real-time web log analyzer which analyzes the access log from Apache, Nginx or Amazon CloudFront. It can also output the data as HTML, JSON or CSV. It will give you general statistics, top visitors, 404s, geolocation and many other things.

Logwatch

Logwatch is a log analysis system. It parses through your system’s logs and creates a report analyzing the areas that you specify. It can give you daily reports with short digests of the activities taking place on your machine.

Swatch


Much like Logwatch, Swatch also monitors your logs, but instead of producing reports it watches for regular expressions and notifies you via mail or the console when there is a match. It could be used for intruder detection, for example.
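The idea can be sketched in a few lines of shell. This is only an illustration of regex-based log watching, not how Swatch itself is configured; the function name alert_on_match and the sample log lines are made up for the example.

```shell
# alert_on_match PATTERN LOGFILE
# Print an ALERT line for every log line that matches the extended regex PATTERN.
# (Hypothetical helper for illustration; Swatch uses its own configuration file.)
alert_on_match() {
    grep -E -- "$1" "$2" | while IFS= read -r line; do
        echo "ALERT: $line"
    done
}

# Example with a throwaway log file
log=$(mktemp)
printf '%s\n' 'sshd: Accepted password for alice' \
              'sshd: Failed password for root' > "$log"
alert_on_match 'Failed password' "$log"
rm -f "$log"
```

For a log that is still being written, the same filter can be fed from tail -F instead of reading the file once.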

MultiTail


MultiTail helps you monitor logfiles in multiple windows. You can merge two or more of these logfiles into one. It will also use colors to display the logfiles for easier reading with the help of regular expressions.

Network Monitoring

acct or psacct

acct or psacct (depending on whether you use apt-get or yum) allows you to monitor all the commands a user executes on the system, including CPU and memory time. Once installed, you get that summary with the command ‘sa’.

whowatch

Similar to acct, this tool monitors users on your system and allows you to see in real time what commands and processes they are using. It gives you a tree structure of all the processes, so you can see exactly what’s happening.

strace


strace is used to diagnose, debug and monitor interactions between processes and the kernel. The most common use is to make strace print a list of the system calls made by a program, which is useful if the program does not behave as expected.

DTrace


DTrace is the big brother of strace. It dynamically patches live running instructions with instrumentation code. This allows you to do in-depth performance analysis and troubleshooting. However, it’s not for the faint of heart, as there is a 1200-page book written on the topic.

webmin


Webmin is a web-based system administration tool. It removes the need to manually edit unix configuration files and lets you manage the system remotely if need be. It has a couple of monitoring modules that you can attach to it.

stat


Stat is a built-in tool for displaying status information of files and file systems. It will give you information such as when the file was modified, accessed or changed.

ifconfig


ifconfig is a built-in tool used to configure the network interfaces. Behind the scenes, network monitoring tools use ifconfig to put an interface into promiscuous mode to capture all packets. You can do it yourself with `ifconfig eth0 promisc` and return to normal mode with `ifconfig eth0 -promisc`.

ulimit


ulimit is a shell built-in that sets limits on system resources so that none of the limited resources can be exhausted. For instance, a fork bomb launched where a properly configured ulimit is in place does no harm.
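A harmless way to see ulimit in action is to lower a limit inside a subshell. The sketch below uses the open-files limit (-n) and an arbitrary value of 64; the function name is made up for the example.

```shell
# limited_fd_count: lower the open-files limit in a subshell and print it.
# The lowered limit disappears again when the subshell exits.
limited_fd_count() {
    (
        ulimit -n 64    # cap open file descriptors (sets soft and hard limit)
        ulimit -n       # print the limit now in effect
    )
}

echo "open files limit in this shell:   $(ulimit -n)"
echo "open files limit in the subshell: $(limited_fd_count)"
```

The same pattern with ulimit -u caps the number of processes per user, which is the limit that contains a fork bomb.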

cpulimit

CPUlimit is a small tool that monitors and then limits the CPU usage of a process. It’s particularly useful to make batch jobs not eat up too many CPU cycles.

lshw


lshw is a small built-in tool that extracts detailed information about the hardware configuration of the machine. It can output everything from CPU version and speed to mainboard configuration.

w

w is a built-in command that displays information about the users currently using the machine and their processes.

lsof


lsof is a built-in tool that gives you a list of all open files and network connections. From there you can narrow it down to files opened by a given process, by process name or by a specific user, or perhaps kill all processes that belong to a specific user.

Thanks for your suggestions. It’s an oversight on our part that we’ll have to go back through and renumber all the headings. In light of that, here’s a short section at the end for some of the Linux monitoring tools recommended by you:

collectd

Collectd is a Unix daemon that collects all your monitoring statistics. It uses a modular design and plugins to fill in any niche monitoring. This way collectd stays as lightweight and customizable as possible.

Observium

Observium is an auto-discovering network monitoring platform supporting a wide range of hardware platforms and operating systems. Observium focuses on providing a beautiful and powerful yet simple and intuitive interface to the health and status of your network.

Nload

It’s a command line tool that monitors network throughput. It’s neat because it visualizes the incoming and outgoing traffic using two graphs and shows some additional useful data, like the total amount of transferred data. You can install it with

yum install nload

or

sudo apt-get install nload

SmokePing

SmokePing keeps track of the latencies of your network and visualises them too. There is a wide range of latency measurement plugins developed for SmokePing. If a GUI is important to you, there is ongoing development to make that happen.

MobaXterm

If you’re working in a Windows environment day in and day out, you may feel limited by the terminal Windows provides. MobaXterm comes to the rescue and allows you to use many of the terminal commands commonly found in Linux, which will help you tremendously with your monitoring needs!

Shinken monitoring

Shinken is a monitoring framework which is a total rewrite of Nagios in Python. It aims to enhance flexibility and the management of large environments, while still keeping all your Nagios configuration and plugins.

*Source : https://blog.serverdensity.com/80-linux-monitoring-tools-know/

XAMPP for Linux Error Access from Public

If you get an error like this when you access phpMyAdmin remotely:

New XAMPP security concept:

Access to the requested object is only available from the local network.

This setting can be configured in the file “httpd-xampp.conf”.

For XAMPP/LAMPP 1.7.xx, add these lines to "/opt/lampp/etc/extra/httpd-xampp.conf":

<Directory "/opt/lampp/phpmyadmin">
      AllowOverride All
      Order allow,deny
      Allow from all

      Require all granted

</Directory>

For XAMPP/LAMPP 1.8.xx, you have to do some additional configuration in /opt/lampp/etc/extra/httpd-xampp.conf:

<LocationMatch "^/(?i:(?:xampp|security|phpmyadmin|licenses|webalizer|server-status|server-info))">
Order deny,allow
#Deny from all
Allow from ::1 127.0.0.0/8 \
fc00::/7 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 \
fe80::/10 169.254.0.0/16

Add your source IP address to the Allow from list above.

Save the file and restart your XAMPP server (/opt/lampp/lampp restart).

Everything should work now.

Install VMware Workstation 9 in Fedora 18

Installing VMWare Workstation 9.0.1 in Fedora 18 is a bit easier than in previous versions of Fedora and VMWare.

Fedora 18 includes a newer version of the Intel video driver and mesa libraries enabling 3D acceleration in VMWare Workstation.

With kernel 3.7.x, no VMWare kernel module patch is needed. However, there are a few packages that need to be installed and one file that needs to be symlinked first.

First, install the kernel development tools and the GNU C Compiler.

# yum install kernel-devel kernel-headers gcc

Next, you need to symlink or copy the linux version.h file (otherwise the module generation will fail):

# cp /usr/src/kernels/`uname -r`/include/generated/uapi/linux/version.h \
     /lib/modules/`uname -r`/build/include/linux/

Install VMWare workstation via command line

# sh VMware-Workstation-Full-9.0.1-894247.x86_64.sh

After VMWare is installed, build the kernel modules:

# vmware-modconfig --console --install-all

After the modules were built, I was able to open VMware and select a VM to start. However, when I started the VM, the license key was required and I was unable to enter my key (the dialog never came up).

There appears to be some sort of issue with the way privileges are elevated and the $DISPLAY environment variable, but the issue wasn’t easily fixable. Fortunately, there’s an easy way to register the product with your serial number:

# /usr/lib/vmware/bin/vmware-vmx --new-sn YOUR_SERIAL_NUMBER

Once you have entered your serial number, you should be good to go. The kernel module should continue to work on newer versions of 3.7.2-*, but when a new release comes out, you might need to repeat the symlink/copy step.

***Repost From http://www.andrewsorensen.net/blog/2013/01/17/vmware-workstation-9-fedora-18***

How to run spark on 64 bit Linux OS

Normally, if you download Spark for Linux from the ignite-realtime website in tar.gz format, you can extract it and start it with the sh command. But on a 64-bit Linux OS it won’t run, and shows this error:

Preparing JRE …
./Spark: bin/unpack200: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
Error unpacking jar files. Aborting.
You might need administrative priviledges for this operation.

To solve this problem, remove the “jre” folder from the directory extracted from the Spark tar.gz file. Then run Spark again using the sh command. It works!
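The error mentions /lib/ld-linux.so.2, which is the 32-bit dynamic loader, so the bundled JRE cannot start on a system that only has 64-bit libraries. A quick check before deleting anything might look like the sketch below. The helper name and its optional path argument are only for illustration, and the fallback to a system-wide Java is an assumption about your machine.

```shell
# has_32bit_loader [PATH]
# Succeeds if the 32-bit dynamic loader exists (default: /lib/ld-linux.so.2).
has_32bit_loader() {
    [ -e "${1:-/lib/ld-linux.so.2}" ]
}

if has_32bit_loader; then
    echo "32-bit loader found: the bundled jre should be able to start"
else
    # Assumes a Java runtime is installed on the system for Spark to fall back to
    echo "32-bit loader missing: remove the bundled jre directory as described above"
fi
```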

Complete Guide Install and Configure Nagios on Suse

Nagios:

Host and Service Monitoring Tool – quick overview and basic installation guide

What is Nagios ?


This is a sample screen from our installation / service problems view.

Basically Nagios (http://www.nagios.org) is an open source host, service and network monitoring tool. It lets you manage different types of services and hosts running on different operating systems like Linux, NetWare, Windows, AIX and so on. It’s flexible in configuration and can be extended as much as you want. It’s configured with text files and managed with a web browser.

When you do a basic installation you get a set of Nagios check programs which let you start monitoring your first hosts and services, beginning with the installation in less than an hour.

Based on that default configuration you can start extending the configuration to your special needs.

You can use the existing check programs or even add more check programs if you take a look at http://www.nagiosexchange.org where other developers put a lot of their check programs for download. Or just write your own in any programming language that is available on Linux. Based on the services, you can set up time frames for the monitoring process as well as notifications when alarms arise. You can, but do not have to. It’s always your choice and that’s what is great about Nagios. If you do not want to set up any notifications, ignore that part of the configuration; you can always watch the service problems in the web browser. That is what I currently do. It’s good to come to the office in the morning, take one look at the Nagios service problems view and know what is going on. And when I do an eDirectory migration in our tree of 260+ NetWare servers, Nagios is open in the background so I can see which servers aren’t reachable.

Here is a list of check programs in the base system:

check_breeze    check_http          check_nt       check_ssh
check_by_ssh    check_icmp          check_ntp      check_swap
check_dhcp      check_ifoperstatus  check_nwstat   check_tcp
check_dig       check_ifstatus      check_oracle   check_time
check_disk      check_imap          check_overcr   check_udp
check_disk_smb  check_ircd          check_ping     check_udp2
check_dns       check_ldaps         check_pop      check_ups
check_dummy     check_load          check_procs    check_users
check_file_age  check_log           check_radius   check_wave
check_flexlm    check_mailq         check_real     negate
check_fping     check_mrtg          check_rpc      urlize
check_ftp       check_mrtgtraf      check_sensors  utils.pm
check_hpjd      check_nagios        check_smtp     utils.sh
                check_nntp          check_snmp

At http://www.nagiosexchange.org you’ll find a lot more if you are missing something here.

Here are some things we wanted to solve with Nagios:

We are running a lot of different management solutions (ZEN for server, IBM Tivoli, HP / Compaq System Insight Manager, McAfee ePolicy Orchestrator, …) and all of them are very strong and powerful in some areas. The problem is that some of them are really time consuming to install, configure and administer. It’s difficult to add your own services and hard to get a single view of all current services and the problems that exist.

So I went looking for a solution that is quite easy to deploy but as powerful as possible, to handle all our requirements. Of course I will try to replace some of the existing management solutions in our company with it, but I am sure it’s not possible to drop them all for this one. A combination of them will be best for us.

Here are some samples (besides a lot of default monitoring requirements like CPU, filesystem, …) that I had to deal with during day-to-day administration and that it should be possible to automate with Nagios:

  • check Bordermanager SurfControl logfiles to see if the update happened
  • check ftp/tls and ftp/ssh server functionality with a real ftp user connect
  • check time synchronization on oes/linux server
  • check running virus scanner processes (like LinuxShield from McAfee)
  • check HP Proliant server status from the HP Proliant support pack agents
  • check DNS round robin functionality
  • check Vmware remote connection and web interface ports
  • check ZEN linux management mirror process for ZLM and RCD targets
  • check memory / swap usage on linux servers
  • check linux drbd synchronization status
  • check server availability on one screen during eg. eDirectory migrations
  • check ldap functionality
  • check web applications
  • check oes/linux cluster resources
  • ….

So how does it work ?

To put it simply, you define a host by its name and IP address and a few other parameters, and assign it some services. Behind a service is nothing other than a configured check program that runs some tests at a predefined interval. For example, the ftp service uses the check_ftp program to make some ftp connections to the server. It reports the result back to Nagios with an exit code. Exit code 0 means the test was okay = green, exit code 1 means warning = yellow, exit code 2 means critical = red and exit code 3 means unknown = orange.
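The exit code convention is easy to try out in a shell. The helper below is a made-up illustration (nagios_status is not part of Nagios); it only translates the four exit codes described above into their status names and colours.

```shell
# nagios_status CODE: translate a check program's exit code into its status
nagios_status() {
    case "$1" in
        0) echo "OK (green)" ;;
        1) echo "WARNING (yellow)" ;;
        2) echo "CRITICAL (red)" ;;
        3) echo "UNKNOWN (orange)" ;;
        *) echo "not a standard check exit code: $1" ;;
    esac
}

# Example: pretend a check program exited with code 2
rc=0
sh -c 'exit 2' || rc=$?
nagios_status "$rc"
```

Any program that prints one line of text and exits with one of these four codes can serve as a check program.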

In the service problems view, Nagios shows you the problems that currently exist (hopefully this list is short or even empty), and in the service details it shows a list of all configured services and their status.

Depending on your notification configuration, it also sends you the relevant information.

If you are interested in such a system monitoring tool, it is well worth the time to do a test installation and spend a few hours with it. We are currently monitoring 308 hosts and about 1058 services.

Additional configurations / extensions:

There are a lot of configurations / extensions that are not covered in this document. As you discover more of the possible configurations, you will find things like how to put a web application link behind a host’s extra notes. For example, if a host is down, just select its extra host notes and you are forwarded directly to, e.g., the HP remote insight board. Or if a server shows some warnings from the HP Proliant agents, you can select the additional service tasks and go directly to the Proliant web management interface on the server. There is no need for us to enter the URL for that web page anymore or even take a look at the HP system insight manager for server status. That’s all in Nagios now.

I’ll try to show you how you can setup a Nagios installation with a little linux knowledge in less than an hour or two. I think that’s the best way to take a look at it. After that I’ll show how to add a new check program for Nagios to monitor if a single file exists.

So here is a basic installation guide:

Needless to say, you should do it on a test server first. If you use, e.g., PuTTY for the ssh connection to your server, you can copy / paste the commands from this document directly into the shell, so you do not have to type them yourself.

  1. Do a default OES/Linux or SLES 9 – 32 bit installation.

    Nagios does not need very much memory (less than 32mb) or disk space (less than 50mb). So it could be a "small" server or even a virtual machine.

  2. Install / remove some packages of your installation

    OES and SLES contain Nagios version 1.2 in their distribution, but I decided to use the current version 2.0 rc1 from http://www.nagios.org. So I had to remove and install a few packages with yast:

    remove if installed: nagios, nagios-nsca and nagios-plugins

    install if not yet done: gd-devel and libpng-devel packages and the whole Simple Webserver selection

  3. Download the Nagios and the plugin tarball

    Download the most recent Nagios and the official plugin tarball from the current Nagios version from http://www.nagios.org/download and copy it to /tmp on your test server.

    Right now Nagios version 2.0rc1 is available. Normally I prefer to use only rpms for installation, but this time I use the tarball so I can configure some additional parts during compilation and installation. I tried this installation procedure with other 2.0x versions and it worked well.

    current nagios tarball: nagios-2.0rc1.tar.gz
    current plugin tarball: nagios-plugins-1.4.1.tar.gz

  4. Create the Nagios user and group

    You’re probably going to want to run Nagios under a normal user account, so add a new user and group to your system with the following commands:

    # useradd -m nagios
    # groupadd nagios

  5. Create the installation directory

    Create the base directory where you would like to install Nagios as follows…

    # mkdir /usr/local/nagios

    Change the owner of the base installation directory to be the Nagios user and group you added earlier as follows:

    # chown nagios.nagios /usr/local/nagios

  6. Identify the web server user

    You’re probably going to want to issue external commands (like acknowledgements and scheduled downtime) from the web interface. To do so, you need to identify the user your web server runs as (typically wwwrun). This setting is found in your web server configuration file. The following command can be used to determine quickly what user Apache is running as:

    # grep -R "^User" /etc/apache2/*

    Normally the user is wwwrun. We will add it to the new command group in the next step. If the user differs, be sure to use the right one in step 7.

  7. Add a command file group

    Next we’re going to create a new group whose members include the user your web server is running as and the user Nagios is running as. Let’s say we call this new group 'nagcmd':

    # groupadd nagcmd

    Next, add the users that your web server and Nagios run as to the newly created group with the following commands:

    # usermod -G nagcmd wwwrun
    # usermod -G nagcmd nagios

  8. Extract the Nagios tarball

    # cd /tmp
    # tar -xvzf nagios-2.0rc1.tar.gz
    # cd nagios-2.0rc1

  9. Compile the Nagios package

    Run the configure script to initialize variables and create a Makefile as follows:

    # ./configure --prefix=/usr/local/nagios --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagcmd

  10. Compile the binaries

    Compile Nagios and the CGIs with the following command:

    # make all

  11. Installing the binaries and HTML files

    Install the binaries and HTML files (documentation and main web page) with the following command:

    # make install

  12. Installing an init script

    If you want, you can also install the init script /etc/init.d/nagios with the following command:

    # make install-init

  13. Installing command mode

    If you want, you can also install the command mode environment with the following command:

    # make install-commandmode

  14. Installing sample config files

    Now we install some default configuration files, which have to be changed a little later:

    # make install-config

  15. Installing the plugins

    Plugins are usually installed in the libexec/ directory of your Nagios installation (i.e. /usr/local/nagios/libexec). Plugins are scripts or binaries which perform all the service and host checks that constitute monitoring.

    # cd /tmp
    # tar -xvzf nagios-plugins-1.4.1.tar.gz
    # cd nagios-plugins-1.4.1
    # ./configure
    # make
    # make install

  16. Setup the Apache web interface

    To make Nagios accessible through the apache web server we have to setup a config file for it. Create the config file as follows:

    # vi /etc/apache2/conf.d/nagios.conf

    Insert these elements into that new file:

    ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
    <Directory "/usr/local/nagios/sbin">
    AllowOverride AuthConfig
    Options ExecCGI
    Order allow,deny
    Allow from all
    </Directory>

    Alias /nagios /usr/local/nagios/share
    <Directory "/usr/local/nagios/share">
    Options None
    AllowOverride AuthConfig
    Order allow,deny
    Allow from all
    </Directory>

    After that restart the apache web server with the following command:

    # rcapache2 restart

  17. Setup a minimum Nagios configuration

    # cd /usr/local/nagios/etc
    # cp cgi.cfg-sample cgi.cfg
    # cp nagios.cfg-sample nagios.cfg
    # cp minimal.cfg-sample minimal.cfg
    # cp resource.cfg-sample resource.cfg

    For this Nagios configuration to work properly we have to create two additional, empty config files. Do not copy the sample files for these two, otherwise there would be duplicate command definitions.

    # touch checkcommands.cfg
    # touch misccommands.cfg

  18. The last configuration steps …

    Set the Nagios user and group as owner of the Nagios installation:

    # chown -R nagios.nagios /usr/local/nagios

    Deactivate the authentication for the cgi’s:

    # vi cgi.cfg

    Search for the line "use_authentication=1" and change it to "use_authentication=0". That makes testing easier to handle, but not all functions are available while authentication is disabled.
    For production use it should be activated again later, but then you have to configure some other parts of Nagios as well.

  19. Verify the Nagios configuration and start it

    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

    The "Total Warnings" and "Total Errors" should be 0 if you have done everything correctly.
    If so, start it for the first time:

    # /etc/init.d/nagios start

    Enable Nagios to start automatically with the runlevel scripts:

    # insserv nagios

  20. Test Nagios access with a web browser

    http://<servername or ip address>/nagios

    You should see the following:


Now you can start exploring Nagios yourself.

Nice to know: All screens refresh every 30 seconds, no need to reload them.
That time can be changed even to lower values in the Nagios configuration files.

Just let me show you the most important views:


1. Tactical Overview:

This screen gives you an overview of the current status of the monitored services and hosts. Take a look at “Hosts” and you see that you are currently monitoring only one host. In the “Services” line you see that you are monitoring 5 services, and probably all of them report the status “OK”. As a summary, the “Network Health” host and service health bars are filled completely with green, indicating that all configured hosts and services are OK.

Most fields on this page are links to more specific views. If you want to know more about your 5 monitored services you can either click on the “5 OK” field, or choose “Service Detail” on the left.


2. Service Detail

This screen shows you all the services configured to be monitored and their current status.
As we saw in the tactical screen, here are the five services we monitor right now.

You can see basic information about each service on this page:

Host: the host on which these services are configured. If this field is marked red, the host itself is down; if it’s just grey, the server is up and reachable with ping.
Status: the current status of the service (OK = green, Warning = yellow, Critical = red, Unknown = orange).
Last Check: date and time of the most recent check.
Duration: how long the service has been in this status.
Attempt: how many attempts were needed for the check.
Status Information: the output from the check program.

Again if you want to know more about a single service, select it by its name and you are redirected to a more detailed page about it.

3. Host detail

This is the same kind of view as the service detail, showing the details of the monitored hosts, so I have no screen shot of it. You would see all configured hosts and again have the choice to select one to get more information about it.


4. Service Problems

Hopefully this screen will stay empty as long as possible. This is the screen that I keep open all day. At the top of this document is an actual screen shot of our system. There are some service problems. Hmm.. something for me to do … Whenever a service reports a failure you will get the information on this page. The browser also refreshes this page every 30 seconds, so you always get the current list of failed services.

When a service reports a failure, its line is shown here. The interval at which a service is checked can be configured in the Nagios configuration. The minimum interval is 1 minute. When the next check reports everything is okay for that service, it disappears from this list. So this is the page where you can see the current failures of your monitored services.

Last but not least for this article I want to show you how you can add another service with a new check program. I think this is the best way to understand the configuration for the hosts and services.

To show you how you could add new check programs on your own, here is an easy sample:

We add a simple bash script that checks whether the file /tmp/nagios.chk exists. If it is there and it’s executable the service goes to critical, if it is there and not executable it goes to warning, and if it doesn’t exist the service is ok.

  1. Create the executable check file

    # vi /usr/local/nagios/libexec/check_file_exist.sh

    Add the following to that file:

    #!/bin/bash
    #
    # Check if a local file exists
    #
    while getopts F: VAR
    do
        case "$VAR" in
            F ) LOGFILE=$OPTARG ;;
            * ) echo "wrong syntax: use $0 -F <file to check>"
                exit 3 ;;
        esac
    done

    if test "$LOGFILE" = ""
    then
        echo "wrong syntax: use $0 -F <file to check>"
        # Nagios exit code 3 = status UNKNOWN = orange
        exit 3
    fi

    if test -e "$LOGFILE"
    then
        if test -x "$LOGFILE"
        then
            echo "Critical $LOGFILE is executable !"
            # Nagios exit code 2 = status CRITICAL = red
            exit 2
        else
            echo "Warning $LOGFILE exists !"
            # Nagios exit code 1 = status WARNING = yellow
            exit 1
        fi
    else
        echo "OK: $LOGFILE does not exist !"
        # Nagios exit code 0 = status OK = green
        exit 0
    fi

    Now set the file attributes:

    # chown nagios.nagios /usr/local/nagios/libexec/check_file_exist.sh
    # chmod +x /usr/local/nagios/libexec/check_file_exist.sh

  2. Add the check program to the nagios configuration

    Each new check command has to be defined once in the global Nagios configuration:

    # vi /usr/local/nagios/etc/minimal.cfg

    Add the following block at the end of the file:

    define command{
    command_name check_file_exist
    command_line $USER1$/check_file_exist.sh -F /tmp/nagios.chk
    }

  3. Add a new service to the localhost

    Each new service has to be defined once in the Nagios configuration and can be assigned to a single host, multiple hosts or even a host group. We assign it only to the localhost that is already defined in this base configuration:

    # vi /usr/local/nagios/etc/minimal.cfg

    Add the following block at the end of the file:

    define service{
    use generic-service
    host_name localhost
    service_description File check
    is_volatile 0
    check_period 24x7
    max_check_attempts 4
    normal_check_interval 5
    retry_check_interval 1
    contact_groups admins
    notification_options w,u,c,r
    notification_interval 960
    notification_period 24x7
    check_command check_file_exist
    }

  4. Verify Nagios configuration and restart it

    After all changes of the config files you should check the Nagios configuration and you have to restart Nagios after that:

    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

    The Total Warnings and Total Errors should be 0 if you have done everything correctly.
    So restart it with:

    # /etc/init.d/nagios restart

  5. Check if the new program is working

    First take a look at the tactical screen and you should see that one service is in status pending.
    That means no check has been done for this service yet.
    Wait a few minutes and it should no longer be pending, and the number of OKs should increase from 5 to 6.

    Now create the file and watch the tactical screen, the service detail screen or the service problems screen.

    # touch /tmp/nagios.chk

    As we set the normal_check_interval to 5 minutes in the service definition, you should get the warning message during that time. Now add the executable attribute and watch:

    # chmod +x /tmp/nagios.chk

    The status should change during the check interval to critical.
    When you delete the file the service should return to status ok.

    # rm /tmp/nagios.chk
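The three states from step 5 can also be tried out by hand, without waiting for the check interval. The sketch below repeats the test logic of the script as a shell function (the name check_file_state is made up for this example) and runs it against a temporary file, so it works on a machine without Nagios.

```shell
# check_file_state FILE: same decision logic as check_file_exist.sh,
# returning the Nagios exit code instead of exiting the shell
check_file_state() {
    if [ -e "$1" ]; then
        if [ -x "$1" ]; then
            echo "Critical $1 is executable !"
            return 2
        fi
        echo "Warning $1 exists !"
        return 1
    fi
    echo "OK: $1 does not exist !"
    return 0
}

dir=$(mktemp -d)
f="$dir/nagios.chk"

rc=0; check_file_state "$f" || rc=$?; echo "exit code: $rc"   # file missing
touch "$f"
rc=0; check_file_state "$f" || rc=$?; echo "exit code: $rc"   # file exists
chmod +x "$f"
rc=0; check_file_state "$f" || rc=$?; echo "exit code: $rc"   # file executable

rm -rf "$dir"
```

On the Nagios server itself you can get the same answer by running /usr/local/nagios/libexec/check_file_exist.sh -F /tmp/nagios.chk and printing $? afterwards.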

So that’s all for the moment. I hope I have shown you a little bit about Nagios and how it works.

For me it’s a great tool and it saves me a lot of time during the day-to-day business.

If you continue to work with it, there are a lot of things that could be made better with the configuration files. Please remember this is only a simple installation of it. If you would like I can write some more articles about it and how we manage our config files and what other check programs we added to the system. We even added user authentication for Nagios access with ldap to our eDirectory and so on.

Rainer Brunold

Repost From : http://www.novell.com/coolsolutions/feature/16723.html

Install and Configure Nagios on Suse Linux

Nagios is a popular host and service monitoring tool used by many administrators to keep an eye on their systems.

Since I wrote a basic installation guide in Jan 2006 on Cool Solutions, many new versions have been published and many Nagios plugins are now available. Because of that I think it’s time to write a series of articles here that show you some very interesting solutions. I hope that you find them helpful and that you can use them in your environment. If you are not yet a Nagios user, I hope that I can inspire you to give it a try.

I don’t want to write here a full documentation about Nagios, I prefer to give you a basic installation guide so you can set it up very easy and play with it yourself. The installation guide will show you how to install Nagios as well as some interesting extensions and how they integrate into each other. During this installation you will make many modifications to the installation that will help to understand how it works, how you can integrate systems and different services. I will also provide some articles about monitoring special services where I describe what they do and what configuration changes are needed. All together should give you a very good overview and documentation on how you can enhance the Nagios installation yourself.

If you would like to read some detailed information about Nagios visit the documentation at the project homepage at http://www.nagios.org/docs or go through my short article from Jan 2006 at http://www.novell.com/coolsolutions/feature/16723.html

These are the Nagios extensions that I would like to explain for you:

Nagios 3.0: This version is currently in beta 4 but the installation and configuration should not differ very much from the released version later this year.
Nagios Plugins 1.4.10: This is a default set of Nagios check programs that do all the work for you by checking file systems, memory usage, cpu utilization and so on. Many more check programs are available at the Nagios Exchange web site (http://www.nagiosexchange.org)
Nagiosgraph 0.9: This is a perl enhancement that allows you to write Nagios check results into round robin databases and create graphs based on those values.
NDOUtils 1.4: The ndoutils are also in beta state right now. They allow you to write the whole Nagios configuration and check results into a database from where it can be used by other Nagios extensions.
NSCA 2.7.2: The NSCA package allows you to build distributed Nagios installations that report all their check results to a single central Nagios installation. From that one you can keep an overview of your whole network.
NagVis 1.1: Based on the database filled by the NDOUtils, NagVis allows you to draw some very nice maps of your network and infrastructure.

Please note that many more plugins are available for Nagios. I focus on this list because these are the ones we mainly use. Many administrators use other ones. Please feel free to write an article about your favorites.

Here are screen shots of some of the components so you can see where I would like to guide you:

Nagios:

This is a screen shot of our Nagios production system with the service details of a Linux server. The screen shot is from a Nagios version 2.5 installation because our production system is on that version right now. After the release of version 3.0 we will migrate to it. Apart from a few changes in the menu on the left side, version 3.0 looks nearly the same.

Nagiosgraph:

This is a screen shot from a production system where we graph the number of ldap requests per second.

This screen shot is from a slightly older Nagiosgraph version. The new one formats the diagrams better.

NagVis:

This is a sample screen shot from the project homepage at http://sourceforge.net/projects/nagvis

Chapter 1 — Basic Nagios installation and configuration

Nagios is included in all SUSE Linux Enterprise distributions, in different versions. The latest service pack 1 of SLES 10 includes Nagios 2.6, but because I would like to show you the integration of different components like the NDOUtils, I have to use the most recent version, 3.0 beta 4, here.

1. Server Preparation

I use a SLES 10 sp1 server for this article. The installation should work the same way on a SLES 9 or OES server.

Nagios does not need very much memory, nor does it use a lot of disk space. I would say 256MB of RAM and about 100MB of disk space are enough to monitor a few hundred services. If you start using nagiosgraph to draw graphs, the requirements increase a little, but not very much.

So I did a default installation of SLES 10 sp1 with the additional patterns “Web and LAMP Server” and “C/C++ Compiler and Tools”. The LAMP packages bring us apache and the mysql database for the NDOUtils, and the compiler packages are required for building the software binaries.

After the installation please check that the following packages are not installed, because they would conflict with the ones we are installing. If they are installed, remove them with YaST.

nagios
nagios-nsca
nagios-nsca-client
nagios-plugins
nagios-plugins-extras
nagios-www

For several Nagios functions we require some additional packages that have to be installed. Please check that the following are installed:

gd-devel
libpng-devel
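
Both package lists can also be checked in one pass from the shell. The following is only a minimal sketch, assuming an RPM-based system such as SLES where "rpm -q <name>" exits with status 0 only when the package is installed. The function names pkg_installed and report_packages are made up for this example.

```shell
# Packages that conflict with the source build (from the first list above).
CONFLICTING="nagios nagios-nsca nagios-nsca-client nagios-plugins nagios-plugins-extras nagios-www"
# Packages the build needs (from the second list above).
REQUIRED="gd-devel libpng-devel"

# Returns 0 if the named package is installed according to rpm.
pkg_installed() {
    rpm -q "$1" >/dev/null 2>&1
}

# Prints one line per problem and returns the number of problems found.
report_packages() {
    local problems=0 pkg
    for pkg in $CONFLICTING; do
        if pkg_installed "$pkg"; then
            echo "REMOVE: $pkg is installed and conflicts with this guide"
            problems=$((problems + 1))
        fi
    done
    for pkg in $REQUIRED; do
        if ! pkg_installed "$pkg"; then
            echo "INSTALL: $pkg is missing"
            problems=$((problems + 1))
        fi
    done
    return "$problems"
}

# Only run the report where rpm is actually available.
if command -v rpm >/dev/null 2>&1; then
    if report_packages; then
        echo "Package check passed"
    fi
fi
```

Every REMOVE or INSTALL line points at a package to fix before continuing with the download.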

2. Software Download and Extraction

For this first section of the Nagios basic installation we require the following two packages:

Software                 Download Link                     Current File Name (as of 05/10/2007)
Nagios 3.0 Beta 4        http://www.nagios.org/download    nagios-3.0b4.tar.gz
Nagios Plugins 1.4.10    http://www.nagios.org/download    nagios-plugins-1.4.10.tar.gz

Download those two packages and copy them to a temporary installation directory. I use /images for those steps.

# mkdir /images
# cp <nagios-3.0b4.tar.gz> /images
# cp <nagios-plugins-1.4.10.tar.gz> /images
# cd /images
# tar -xvzf nagios-3.0b4.tar.gz
# tar -xvzf nagios-plugins-1.4.10.tar.gz

3. Security Preparation

Nagios itself does not require root permission to run on the system.

In a normal installation there is a dedicated nagios user and a nagios group for that. Sometimes Nagios runs check programs that require root permissions; for those we can use sudo.

As apache presents the Nagios front end, we have the choice to submit some commands to Nagios through it.

For this we have to prepare another local Linux group called nagcmd that has permission to write to a named pipe; Nagios reads the commands at the other end. Examples of such commands are rescheduling a service check to run right now instead of waiting for the normal check interval, or defining a service downtime during which no service-down notifications are sent.

NOTE: On SLES systems apache runs as user wwwrun. If you use a different distribution, add the appropriate user to the nagcmd group.

# useradd -m nagios
# groupadd nagios
# groupadd nagcmd
# usermod -G nagios,nagcmd nagios
# usermod -G nagcmd wwwrun
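
To confirm that the accounts ended up in the right groups, the memberships can be listed with id. This is just a sketch using the user and group names from the commands above. The helper name in_group is made up for this example.

```shell
# Returns 0 if user $1 is a member of group $2 (primary or supplementary).
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Report the memberships this guide relies on.
# wwwrun is the apache user on SLES; other distributions use another name.
for pair in nagios:nagios nagios:nagcmd wwwrun:nagcmd; do
    user=${pair%%:*}
    group=${pair##*:}
    if in_group "$user" "$group"; then
        echo "OK: $user is in $group"
    else
        echo "MISSING: $user is not in $group"
    fi
done
```

A MISSING line means the corresponding usermod command has to be repeated.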

4. Software Compilation and Installation of Nagios 3.0 beta 4

Never compiled a software package? Don’t worry, it’s quite easy.

The only thing is that packages sometimes require extra parameters when the compilation is prepared.

Nagios offers the choice to define the directory structure we would like to have when Nagios gets installed. To do so we pass the configure command parameters that specify where the files should end up after the installation. I deviate here from a default Nagios installation because I would like to follow the LSB (Linux Standard Base) rules, which define where each kind of file should be placed. For example, variable data (log files, databases, …) should be placed below /var. Because of this we have to make a few more modifications after the installation.

NOTE: LSB (Linux Standard Base) tries to standardize Linux distributions. When a developer creates a software package according to the LSB rules, it is guaranteed that the package can be installed on all LSB-certified distributions like SUSE, Red Hat, … LSB does not only give you rules for where to place the different file types, it also gives developers rules for which libraries and functions they can use to be LSB compliant. For more information please visit http://www.linuxbase.org/en

Here are my configuration options:

--prefix		/opt/nagios	defines where to put the Nagios files
--with-cgiurl		/nagios/cgi-bin	defines the web server url where the cgi's will be available
--with-htmurl		/nagios		defines the web server url where nagios will be available
--with-nagios-user	nagios		user account under which Nagios will run
--with-nagios-group	nagios		group account under which Nagios will run
--with-command-group	nagcmd		group account which will allow the apache user to submit 
						commands to Nagios

In the following command section, the “configure” will prepare the compilation and set the provided parameters, the “make all” will do the compilation and the “make install” will do the installation itself.

# cd /images/nagios-3.0b4
# ./configure --prefix=/opt/nagios --with-cgiurl=/nagios/cgi-bin \
              --with-htmurl=/nagios --with-nagios-user=nagios \
              --with-nagios-group=nagios --with-command-group=nagcmd
# make all

The “make all” compilation should finish without any errors, and you should get a description of the next steps. If there were errors, you have to correct them and rerun the configure command before continuing. In that case, make sure the software packages listed in “1. Server Preparation” are installed.

# make install
# make install-init
# make install-commandmode
# make install-config
# make install-webconf

Nagios is now installed, but before starting it we have to install some further components and make some changes to the default configuration. First I describe the installation of those components and then the configuration.

5. Software Compilation and Installation of the Nagios Plugins 1.4.10

This works the same way as with the Nagios package. First we prepare the compilation with the configure command, where we have to provide options matching those of the Nagios package, and then we do the compilation and installation.

# cd /images/nagios-plugins-1.4.10
# ./configure --prefix=/opt/nagios --with-nagios-user=nagios \
              --with-nagios-group=nagios

If the configure run has shown no errors, continue with the compilation and installation; otherwise correct the problems and rerun the configure command.

# make
# make install

That’s all for the plugins. They need nothing else.

6. Configuration of Nagios 3.0 beta 4

The Nagios package provides us with a default set of configuration files. In general the configuration files are split into two categories. The first one, with files in /opt/nagios/etc, controls how Nagios itself behaves. The second group, in the objects sub directory, holds all host and service definitions.

Here is an overview of the default configuration files in the /opt/nagios/etc directory:

nagios.cfg This is the main Nagios configuration file that controls the behavior of the whole application. Here you define a lot of global parameters as well as the locations of all the other configuration files. The file contains a lot of documentation if you would like to go through it.
cgi.cfg This file controls the web interface. Settings like user authentication are stored here.
resource.cfg This file should hold sensitive data that is used in the check command definitions. Imagine you configure an ftp service to be monitored for which you need a user account and password. All the other Nagios configuration files are readable by everybody. Only resource.cfg is readable solely by the nagios user and the nagios group. So place the username and password into this file and refer to them from the other configuration files by using the $USERx$ variables.
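
The split between resource.cfg and the world-readable files can be sketched as follows. This writes two sample files into a scratch directory rather than into /opt/nagios/etc. The $USER3$ and $USER4$ slots, the account name, the password and the check_ftp_login plugin with its -u and -p options are all hypothetical and only illustrate the idea.

```shell
# Work in a scratch directory so nothing under /opt/nagios/etc is touched.
demo_dir=$(mktemp -d)

# resource.cfg: only this file holds the secrets.
# $USER1$ is the plugin directory; $USER3$/$USER4$ are slots chosen here.
cat > "$demo_dir/resource.cfg" <<'EOF'
$USER1$=/opt/nagios/libexec
$USER3$=ftpmonitor
$USER4$=example-password
EOF
# Readable by owner and group only (nagios:nagios on a real system).
chmod 640 "$demo_dir/resource.cfg"

# commands.cfg stays readable by everybody and contains only macro names.
# check_ftp_login and its -u/-p options are hypothetical.
cat > "$demo_dir/commands.cfg" <<'EOF'
define command{
        command_name    check_ftp_login
        command_line    $USER1$/check_ftp_login -H $HOSTADDRESS$ -u $USER3$ -p $USER4$
        }
EOF

echo "Sample files written to $demo_dir"
```

Anyone who can read commands.cfg sees only $USER3$ and $USER4$; the values behind them stay in the restricted file.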

And these are the files in the objects sub directory:

timeperiods.cfg All time periods for Nagios are configured here, e.g. a time period called 24×7 or another one named workhours. When you define in the other configuration files when, for example, notifications about a service being down should be sent, you refer to a definition here.
contacts.cfg Every person who needs system notifications from Nagios has to be defined here. Some parameters are the email address, the type of notification (email, sms, …), the time periods for notification (24×7, workhours) and the groups to which the user belongs.
commands.cfg Every external action (host check, service check, email notification, …) is just a command that Nagios launches with the appropriate parameters. All those command definitions are in this file. Nagios provides a lot of so-called macros which help us define those commands and pass parameters to them. If you define a host with its IP address, Nagios provides the $HOSTNAME$ and $HOSTADDRESS$ macros during command execution.
templates.cfg When you define several hosts or services it is easier to collect similar parameters together, assign them to a template and assign that template to the host and service definitions. This is a very easy way to keep the host and service definitions small and clean. Use templates whenever possible!
localhost.cfg This is a sample host and service definition for the localhost.
printer.cfg This is a sample host and service definition for the printer.
switch.cfg This is a sample host and service definition for the switch.
windows.cfg This is a sample host and service definition for the windows machine.
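
To show what the template advice means in practice, here is a small sketch that writes one template and two hosts into a scratch file. The names demo-linux-host, web01 and web02 and the addresses are invented for this example. It also assumes that the check-host-alive command, the 24x7 time period and the admins contact group exist in the sample configuration.

```shell
demo_dir=$(mktemp -d)

cat > "$demo_dir/webservers.cfg" <<'EOF'
# Template: register 0 marks this as a set of defaults, not a real host.
define host{
        name                    demo-linux-host
        check_command           check-host-alive
        max_check_attempts      5
        check_period            24x7
        notification_interval   60
        notification_period     24x7
        contact_groups          admins
        register                0
        }

# Real hosts inherit everything above and add only what differs.
define host{
        use                     demo-linux-host
        host_name               web01
        alias                   Web server 1
        address                 192.0.2.11
        }

define host{
        use                     demo-linux-host
        host_name               web02
        alias                   Web server 2
        address                 192.0.2.12
        }
EOF

echo "Template sketch written to $demo_dir/webservers.cfg"
```

Such a file only becomes active once it is copied into the objects directory and referenced with a cfg_file= line in nagios.cfg.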

Because I changed the default directory structure we have to do some modifications to several files.

First there is the main Nagios configuration file nagios.cfg which holds most of the Nagios configuration itself.

# vi /opt/nagios/etc/nagios.cfg
...
log_file=/var/opt/nagios/nagios.log
...
object_cache_file=/var/opt/nagios/objects.cache
...
precached_object_file=/var/opt/nagios/objects.precache
...
status_file=/var/opt/nagios/status.dat
...
command_file=/var/opt/nagios/rw/nagios.cmd
...
lock_file=/var/opt/nagios/nagios.lock
...
temp_file=/var/opt/nagios/nagios.tmp
...
log_archive_path=/var/opt/nagios/archives
...
check_result_path=/var/opt/nagios/spool/checkresults
...
state_retention_file=/var/opt/nagios/retention.dat
...
debug_file=/var/opt/nagios/nagios.debug
...
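
If you prefer not to edit every line by hand, the same changes can be scripted. This is a minimal sketch, not a tested migration tool: the helper name set_cfg is made up, and the three "default" lines in the scratch file are only an assumed excerpt of what nagios.cfg contains after installation. On a real system, back up nagios.cfg first and check its ownership afterwards.

```shell
# set_cfg FILE KEY VALUE: replace the value of an existing "KEY=..." line.
# Lines for other keys and comment lines are left untouched.
set_cfg() {
    sed "s|^$2=.*|$2=$3|" "$1" > "$1.new" && mv "$1.new" "$1"
}

# Demonstrate on a scratch file instead of /opt/nagios/etc/nagios.cfg.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# sample excerpt with paths below the installation prefix
log_file=/opt/nagios/var/nagios.log
status_file=/opt/nagios/var/status.dat
command_file=/opt/nagios/var/rw/nagios.cmd
EOF

set_cfg "$cfg" log_file     /var/opt/nagios/nagios.log
set_cfg "$cfg" status_file  /var/opt/nagios/status.dat
set_cfg "$cfg" command_file /var/opt/nagios/rw/nagios.cmd

cat "$cfg"
```

The remaining parameters from the listing above follow the same pattern, one set_cfg call per line.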

Next create the required directories below /var to hold the above configured files:

# mkdir -p /var/opt/nagios/rw
# mkdir -p /var/opt/nagios/spool/checkresults
# mkdir -p /var/opt/nagios/archives
# chown -R nagios:nagios /var/opt/nagios
# chown -R nagios:nagcmd /var/opt/nagios/rw
# chmod 2775 /var/opt/nagios/rw

Also the Nagios runlevel script has to be modified:

# vi /etc/init.d/nagios
...
NagiosStatusFile=/var${prefix}/status.dat
NagiosRetentionFile=/var${prefix}/retention.dat
NagiosCommandFile=/var${prefix}/rw/nagios.cmd
NagiosVarDir=/var${prefix}
NagiosRunFile=/var${prefix}/nagios.lock
...
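
With the command file now located at /var/opt/nagios/rw/nagios.cmd, the named pipe mechanism described in “3. Security Preparation” can be tried from the shell once Nagios is running. The sketch below writes one line in the external command format "[timestamp] COMMAND;arguments". The helper name submit_command is made up, and localhost and PING are placeholders for a host and service that exist in your configuration.

```shell
# submit_command FILE "COMMAND;arg1;arg2..."
# Writes one line in the external command format "[epoch] COMMAND;args".
submit_command() {
    local cmdfile=$1 command=$2
    if [ ! -w "$cmdfile" ]; then
        echo "cannot write to $cmdfile" >&2
        return 1
    fi
    printf '[%s] %s\n' "$(date +%s)" "$command" > "$cmdfile"
}

# Force an immediate check of a service. The last field is the time at
# which the check should run (here: now).
CMDFILE=/var/opt/nagios/rw/nagios.cmd
if [ -p "$CMDFILE" ]; then
    submit_command "$CMDFILE" \
        "SCHEDULE_FORCED_SVC_CHECK;localhost;PING;$(date +%s)"
fi
```

The web interface does the same thing in the background, which is why the apache user has to be a member of the nagcmd group.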

7. Apache Security Preparation

Nagios has authentication for the web interface activated by default.

That means that after Nagios has been started and you try to access the web interface, a login window appears. Nagios already has a default user defined in contacts.cfg (user: nagiosadmin), so we just have to create an apache password file where we store the password for it.

Do this with the following command and set the password to nagios:

# htpasswd2 -c /opt/nagios/etc/htpasswd.users nagiosadmin
Password: nagios

NOTE: LDAP can also be used for user authentication and requires just a few changes to the apache configuration. But the users still have to be defined in contacts.cfg. LDAP is used just for verifying the password. I will cover this later.

NOTE: If you do not want to define every user in contacts.cfg and security is not a concern for you, you can modify cgi.cfg and change all “authorized_for…” parameters to “authorized_for…=*”. That gives all users who authenticate to the web server (or LDAP) all permissions, even if they do not exist in contacts.cfg. After any modification to cgi.cfg restart Nagios.
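
The change to the “authorized_for…” parameters can be made in one go with sed. This is a sketch on a scratch file with a shortened, assumed excerpt of cgi.cfg. The helper name open_cgi_auth is made up. Keep in mind that it opens the web interface to every authenticated user, so only use it where that is acceptable.

```shell
# open_cgi_auth FILE: set every authorized_for_* parameter to "*".
# Commented-out lines and all other parameters are left untouched.
open_cgi_auth() {
    sed 's|^\(authorized_for_[a-z_]*\)=.*|\1=*|' "$1" > "$1.new" &&
        mv "$1.new" "$1"
}

# Demonstrate on a scratch file instead of /opt/nagios/etc/cgi.cfg.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
EOF

open_cgi_auth "$cfg"
cat "$cfg"
```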

8. Apache and Nagios Startup

So now all configuration is done and we can start up the applications.

During the Nagios installation an apache config file (nagios.conf) was placed in /etc/apache2/conf.d. So we first have to restart apache to activate that configuration.

# rcapache2 restart

After that we can start Nagios for the first time. In earlier versions it was always a very good idea to verify the configuration before restarting; otherwise Nagios could be interrupted if the configuration contained errors. In this new version Nagios does this automatically when it is started or reloaded. If configuration errors are found, the restart / reload operation is aborted and you are told where the error is. Correct it and retry the operation.
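
The check can still be run by hand, which is handy after larger configuration changes. The sketch below wraps "nagios -v" in two small helpers. The function names and the NAGIOS_BIN, NAGIOS_CFG and NAGIOS_INIT variables are made up for this example, and the default paths assume the /opt/nagios prefix used in this guide.

```shell
NAGIOS_BIN=${NAGIOS_BIN:-/opt/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/opt/nagios/etc/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

# Runs the configuration check (nagios -v) and returns its exit status.
verify_config() {
    "$NAGIOS_BIN" -v "$NAGIOS_CFG"
}

# Reloads Nagios only if the configuration passes the check.
safe_reload() {
    if verify_config >/dev/null 2>&1; then
        "$NAGIOS_INIT" reload
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}

# Show the full check output where Nagios is installed.
if [ -x "$NAGIOS_BIN" ]; then
    verify_config || echo "fix the reported errors before starting Nagios"
fi
```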

As we use the default configuration and we made no mistakes Nagios should start up:

# /etc/init.d/nagios start

If you would like, add Nagios for automatic startup at system boot time:

# insserv nagios

9. Nagios Test

Now Nagios should be available at the following URL: http://<nagios server>/nagios

Because authentication is enabled, you should get an authentication window where you enter the user nagiosadmin with the password nagios.

Repost From : http://www.novell.com/coolsolutions/feature/19807.html