Nagios XI
Best Practices
By Troy Lea
tlea@nagios.com

Nagios XI Best Practices

Best Practices? That’s like asking how long is a piece of string! While every environment is different, there are a number of configurations, tweaks and methods that can greatly benefit your Nagios XI environment. This talk will cover a variety of Best Practice topics for Nagios XI, ranging from flexible object configurations through to back end performance enhancements.
Published on: Mar 3, 2016
Published in: Presentations & Public Speaking      
Source: www.slideshare.net


Transcripts - Nagios XI Best Practices

  • 1. Nagios XI Best Practices By Troy Lea tlea@nagios.com
  • 2. About Me •Tech Support Contractor for Nagios Enterprises •Based in Australia •Typically cover UTC+10 from 9am to 5pm •Nagios & XI Dev (Box293) •Nagios MVP3
  • 3. What's Covered In This Talk •Getting the most from Nagios XI •Time saving information •Configuration practices •Object definitions •Backend setup •Performance enhancements
  • 4. Nagios XI Server Internals
  • 5. Nagios XI License Entitlements •XI license entitles 3 instances: •Production •Test & Dev (T&D) •Disaster Recovery (DR) •License activation is tied to IP Address of each XI host
  • 6. What's Monitoring Nagios XI? •How would you know your XI server died? •“Nagios XI Server” Monitoring Wizard •DR instance monitors production instance •Production instance is UP & HEALTHY •Production Instance monitors DR instance •DR instance is UP & HEALTHY
  • 7. localhost services •Do you know how your XI server is performing? •Basic local services are included in XI base •You should ideally be monitoring: •Service Status (check_init_service) •crond, httpd, mysql, ndo2db, npcd, ntpd, postgresql, snmptrapd, snmptt
  • 8. localhost services •File Counts (check_file_count) •NPCD Perfdata spool directory •xidpe spool directory •Check results folder •snmptt spool folder •nagios user account has not expired •(check_pass_expire.pl)
  • 9. localhost services •root mailbox size •(box293_check_mbox) •MySQL / MariaDB •Database tables crashed? •(box293_check_mysql_table_status) •Date/Time correct? •(box293_check_mysql_date)
  • 10. localhost services •Overall Load (check_load) •Memory Free – Physical (check_memory) •Swap Usage (check_swap) •Disk Free (check_disk)
  • 11. Date and Timezone! •Configure Timezone •Admin > Manage System Config •Sync with trusted time source •VM? Don’t sync with hypervisor! •Can be the source of confusing problems
  • 12. CPU •CPU Cores vs Speed! •Not everything is multi-threaded •3.4 GHz vs 2.2 GHz •Number of cores is still important •Refer to XI hardware requirements
  • 13. Memory •Enough memory to cope in a major outage •Event handlers consume memory quickly (+GB in a matter of minutes in a major outage) •Have at least 50% more memory than needed •Refer to XI hardware requirements
  • 14. RAM Disk •Lots of little files created/deleted/updated •Using a RAM Disk: •Reduces disk I/O & load •Speeds up processing of performance data •Speeds up processing of spooled check results •Speeds up nagios restarts •Refer to official procedure
  • 15. Solid State Disk (SSD) •Greatly improves overall performance •Complements RAM Disk •Helps read/writes with: •Logs •Database •Performance Graphs •Reports
  • 16. SSD vs RAID ? •SSD beats* a spinning disk RAID set •*Depends on how much money you have •Still need to RAID1 SSD for redundancy! •SSD may not give you the required capacity •3.8TB SAS SSD now available !!!
  • 17. rrdcached •Enabling rrdcached accumulates the spooled performance data, after x amount of time it is processed into backend RRD files •Reduces Disk I/O •Can be a delay in data appearing in graphs •Refer to official procedure
  • 18. Offloaded MySQL / MariaDB •Data constantly written to databases •Historical and Configuration •Offload to separate server to reduce load •Don't forget to monitor offloaded server!!! •Disk/CPU/Memory/Tables/Service •Refer to earlier slides •Refer to official procedure
  • 19. Mod-Gearman •Used for offloading plugins to workers •Plugins need to be installed on all workers •Be aware of plugins that use /tmp files! •XI 2014 onwards uses Core 4 •Core 4 has its own workers (only local workers) •nagios.cfg “check_workers” option •Refer to official procedure
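As a rough illustration of the “check_workers” bullet, the number of local Core 4 workers is set in nagios.cfg. The value shown is an assumption chosen for illustration, not a figure from the talk.

```cfg
# nagios.cfg (fragment)
# Number of local Core 4 worker processes that execute checks.
# The value 6 is illustrative only; size it for your own hardware.
check_workers=6
```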
  • 20. Disaster Recovery •Failover and High Availability Solutions for Nagios XI •Andy Brist - NWC2014 – Failover & HA •What is really important in disaster? •Plan and test
  • 21. Backups!!! •Admin > System Backups •Schedule backups of XI •Location can be local, FTP, SSH •Remote location recommended •Manual Backups •Local Backup Archives via Admin menu •/usr/local/nagiosxi/scripts/backup_xi.sh
  • 22. Restoring Backups •Official Backup and Restore procedure •Brings system back online with ease •Great for migrating from old XI to new XI •Also good for: •DR •Test & Dev
  • 23. Configuration
  • 24. Intervals - Host vs Services •Host down HARD = service notifications suppressed •What happens when host and services use the same check intervals? •Unnecessary Notifications get sent :( •Make host go down HARD quicker than its services!
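A minimal sketch of the host-versus-service timing, assuming hypothetical object names (web01, the generic-host and generic-service templates, a check_nrpe command) and illustrative timings. The point is only that the host gets fewer, faster retries than its services, so it reaches a HARD DOWN state first.

```cfg
# Host: 3 attempts, retried every minute, so roughly 2 minutes
# from the first failed check to a HARD DOWN state.
define host {
    use                  generic-host      ; assumed existing template
    host_name            web01             ; hypothetical host
    address              192.0.2.10
    check_interval       5
    retry_interval       1
    max_check_attempts   3
}

# Service: 5 attempts, retried every 2 minutes, so roughly 8 minutes
# to a HARD state. By then the host is already HARD DOWN.
define service {
    use                  generic-service   ; assumed existing template
    host_name            web01
    service_description  Disk Free
    check_command        check_nrpe!check_disk   ; assumed command
    check_interval       5
    retry_interval       2
    max_check_attempts   5
}
```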
  • 25. Service Dependencies •When a master service goes down: •Prevents notifications from being sent •Prevents service checks from executing •Make master service go down HARD quicker than dependent services! •Otherwise dependencies are pointless •Master service e.g. - Ping or NRPE Version
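One way such a dependency might be written, assuming a hypothetical host web01 with an “NRPE Version” master service and a “Disk Free” dependent service. The failure criteria are an illustrative choice, not taken from the talk.

```cfg
define servicedependency {
    host_name                      web01          ; master host (hypothetical)
    service_description            NRPE Version   ; master service
    dependent_host_name            web01
    dependent_service_description  Disk Free      ; dependent service
    execution_failure_criteria     c,u   ; skip dependent checks while master is CRITICAL or UNKNOWN
    notification_failure_criteria  c,u   ; suppress dependent notifications in the same states
}
```

For this to help, the master service still needs shorter retry settings than the dependent service, as the slide notes.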
  • 26. Disable Service Checks ? •host_down_disable_service_checks •Nagios Core 4.1.x feature (XI 5) •System wide setting •Reduces load on XI host •Think of it as automatic service dependencies on their own hosts •Service dependencies ignored if host is down
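A sketch of how this setting might look in nagios.cfg. The directive name comes from the slide; the value 1 is assumed here to mean "enabled".

```cfg
# nagios.cfg (fragment), Nagios Core 4.1.x and later
# When a host is DOWN, stop executing checks for its services.
# This applies system wide, not per host.
host_down_disable_service_checks=1
```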
  • 27. Check Intervals - Be Realistic •Does it need to be checked every 5 minutes? •Disk Free Space – every 60 minutes perhaps? •Too long = no performance data •Different intervals to spread the load •3, 5, 7 minute intervals •58, 60, 62 minute intervals
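As an illustration of spreading the load, two slow-moving checks could use slightly different hourly intervals so they do not line up. The hostgroup, template and check commands named here are hypothetical.

```cfg
define service {
    use                  generic-service          ; assumed existing template
    hostgroup_name       linux-servers            ; hypothetical hostgroup
    service_description  Disk Free
    check_command        check_nrpe!check_disk    ; assumed command
    check_interval       58    ; minutes
}

define service {
    use                  generic-service
    hostgroup_name       linux-servers
    service_description  Installed Updates
    check_command        check_nrpe!check_updates ; assumed command
    check_interval       62    ; minutes, offset from the check above
}
```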
  • 28. Notification & Check Intervals •Nagios determines if it is allowed to send a notification at every check of a service in a HARD state •e.g. 15 minute check and 60 minute notification •Internal scheduling may cause 14min 55sec to pass, 4 x 14:55 = 59min 40sec … it’s < 60min! •Notification not sent until 75min! •Scheduling is geared +/- to reduce load!
  • 29. Use Hostgroups! •Assign ONE service to a hostgroup of common servers •Windows Servers •Linux Servers •Consistent monitoring, standards enforced! •Directive changes - all hosts get updated •Reduces management overhead
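A minimal sketch of one service assigned to a hostgroup, assuming hypothetical host names, a generic-service template and a check_nrpe command. Every member of the group gets the service, and a later change to any directive applies to all of them.

```cfg
define hostgroup {
    hostgroup_name  linux-servers     ; hypothetical group
    alias           Linux Servers
    members         web01,db01        ; hypothetical hosts
}

# ONE service definition covers every member of the hostgroup.
define service {
    use                  generic-service          ; assumed existing template
    hostgroup_name       linux-servers
    service_description  Overall Load
    check_command        check_nrpe!check_load    ; assumed command
}
```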
  • 30. Use Contact Groups! •Use contact groups in all definitions •Makes it easy when staff join/leave •Just add/remove the contact from groups •Reduces administrative overhead •Enforces your company policy •Similar principle to host groups
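The same idea sketched for contacts, assuming hypothetical contacts alice and bob that are already defined. Object definitions reference only the group, so staff changes touch a single members line.

```cfg
define contactgroup {
    contactgroup_name  linux-admins        ; hypothetical group
    alias              Linux Administrators
    members            alice,bob           ; add or remove staff here only
}

define service {
    use                  generic-service   ; assumed existing template
    hostgroup_name       linux-servers     ; hypothetical hostgroup
    service_description  Swap Usage
    check_command        check_nrpe!check_swap   ; assumed command
    contact_groups       linux-admins      ; reference the group, not individuals
}
```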
  • 31. Configuration Wizards •Pros •Great for getting up and running quickly •No need to learn how a plugin works •Cons •Creates individual services •More work later when enforcing “standards”
  • 32. Templates •Common settings applied to objects •Helps enforce standards •Reduces administrative overhead •Layer multiple templates •Can be additive or ignore inheritance •XI Config Wizard objects use templates •Example of common icmp check
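A sketch of layered templates for a common ping check. All template, group and timeperiod names are hypothetical, and the check_ping command is assumed to be defined as in the Nagios Core sample configuration.

```cfg
# Base template: standards shared by every service.
define service {
    name                   std-service
    register               0                ; template only, not a real service
    check_period           24x7
    check_interval         5
    retry_interval         1
    max_check_attempts     5
    notification_period    24x7
    notification_interval  60
    contact_groups         linux-admins
    notes_url              https://wiki.example.com/monitoring
}

# Second layer: the common ping check, built on the base template.
define service {
    name                 std-ping
    use                  std-service
    register             0
    service_description  Ping
    check_command        check_ping!100.0,20%!500.0,60%
}

# Independent template that only changes the time periods.
define service {
    name                 business-hours
    register             0
    check_period         workhours          ; assumed timeperiod
    notification_period  workhours
}

# Real service: templates listed earlier take precedence on conflicts.
define service {
    use             business-hours,std-ping
    hostgroup_name  linux-servers
    contact_groups  +network-admins   ; additive: linux-admins plus network-admins
    notes_url       null              ; ignore the inherited value
}
```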
  • 33. User Macros – resource.cfg •$USERx$ macros are good for common items like a username or password •Allows passwords with a ! exclamation mark •Values not visible in object definitions •$USER1$ •/usr/local/nagios/libexec
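A rough example of a shared password kept in the resource file (resource.cfg on a stock install). The $USER5$ slot, the password and the command name are made up for illustration. Because `!` separates arguments in a check_command, a password containing it is easier to carry in a $USERx$ macro than as an argument.

```cfg
# resource.cfg (fragment)
$USER1$=/usr/local/nagios/libexec
# Hypothetical shared check_nt password; note the exclamation mark.
$USER5$=Secret!Pass

# Object configuration (fragment): the password never appears here.
define command {
    command_name  check_nt_shared     ; hypothetical command name
    command_line  $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s '$USER5$' -v $ARG1$ $ARG2$
}
```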
  • 34. Custom Object Variables •Allows you to create your own variables •Can be defined in host or service objects •e.g. hosts have their own check_nt password •Define _CHECK_NT_PASSWORD in host object •In command definitions reference it as: •$_HOSTCHECK_NT_PASSWORD$ •VERY POWERFUL!
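The check_nt example from the slide might look like this. The variable and macro names come from the slide; the host name, address, template, password value and command name are assumptions.

```cfg
define host {
    use                 generic-host      ; assumed existing template
    host_name           win01             ; hypothetical host
    address             192.0.2.20
    _CHECK_NT_PASSWORD  PerHostSecret     ; custom variable, hypothetical value
}

# One command serves every host; each host supplies its own password.
define command {
    command_name  check_nt_per_host       ; hypothetical command name
    command_line  $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s '$_HOSTCHECK_NT_PASSWORD$' -v $ARG1$ $ARG2$
}
```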
  • 35. Other
  • 36. MRTG Clean Configs •Your MRTG configs may be collecting more than what you think •/etc/mrtg/conf.d/*.cfg files •Created by Network Switch / Router Wizard •Comment out unused ports •About 37 lines per port •Comment out unused non-interfaces (VLANs)
  • 37. Plugins – Compiled vs Scripts •Compiled runs quicker •Official nagios-plugins are compiled •“Custom modifications” require re-compiling •Scripts run slower, consume more resources •Perl plugins known to consume +CPU +RAM •“nice” can reduce impact of plugins •Check Profiler component by box293
  • 38. Backend API - Read Only User •API provides you with URLs for use in third party products without needing user/pass •Requires a user account to be created •Account should be READ ONLY
  • 39. Performance Data Tool •Component developed by box293 •Allows you to manipulate RRD files •Great for merging RRD data •Can also delete old RRD files for old services •View raw data in tables •Find it in the Nagios Exchange
  • 40. Thank you! What Is Your Best Practice? Any Questions?
