Real World Uses for Nagios APIs
Janice Singh
janice.s.singh@nasa.gov

Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIs

Janice Singh's presentation on Real World Uses for Nagios APIs. The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference
Published on: Mar 3, 2016
Published in: Technology      
Source: www.slideshare.net


Transcripts - Nagios Conference 2014 - Janice Singh - Real World Uses for Nagios APIs

  • 1. Real World Uses for Nagios APIs Janice Singh janice.s.singh@nasa.gov
  • 2. Agenda
      This presentation describes the Nagios 4 APIs, shows how the NASA Advanced Supercomputing (NAS) Division at Ames Research Center is using them to upgrade its graphical status display (the HUD), and explains why they are worth trying yourself.
  • 3. The HUD: Visualization of the Center Status
  • 4. Monitored Resources
      • Pleiades
          – 11,176-node SGI ICE supercluster
          – 184,800 cores (plus 32,768 GPU cores)
      • Frontend systems
      • Hyperwall visualization cluster
      • Tape Storage - pDMF cluster
      • NFS servers for /home on computing systems
      • Lustre scratch filesystems with multiple servers
      • PBS (Portable Batch System) job scheduler
      Ref: http://www.nas.nasa.gov/hecc/
  • 5. Nagios 4 Application Programming Interface
      • No additional setup required
      • Returns JSON output – multi-language support …
      • Three kinds of APIs
          – Archive
          – Object
          – Status
      • Run from the cgi-bin directory
      • Each API has a help query
          – domain.com/nagios/cgi-bin/statusjson.cgi?query=help
          – Help is also returned if there is an error in the query
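As a minimal sketch of how these endpoints are addressed, the Python below assembles the help query for each of the three CGIs. The `http://` scheme and the `BASE_URL` value are assumptions modeled on the slide's `domain.com/nagios/cgi-bin` example, and `build_query_url` is a hypothetical helper, not part of Nagios.

```python
from urllib.parse import urlencode

# Assumed base URL, modeled on the slide's example; substitute your own host.
BASE_URL = "http://domain.com/nagios/cgi-bin"

# The three JSON CGIs named on the slide.
CGIS = ("archivejson.cgi", "objectjson.cgi", "statusjson.cgi")


def build_query_url(cgi, **params):
    """Return the full URL for one JSON CGI with the given query parameters."""
    if cgi not in CGIS:
        raise ValueError(f"unknown CGI: {cgi}")
    return f"{BASE_URL}/{cgi}?{urlencode(params)}"


# One help URL per API type.
help_urls = [build_query_url(cgi, query="help") for cgi in CGIS]
```

Fetching any of these URLs (for example with `urllib.request.urlopen`) should return the help text as JSON, subject to whatever authentication the web server enforces.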
  • 6. JSON example
      http://lnxsrv78/nagios4/cgi-bin/objectjson.cgi?query=hostgroup&hostgroup=tools

      "data": {
        "hostgroup": {
          "group_name": "tools",
          "alias": "Tools Group",
          "members": [
            "lamsdb",
            "lamsweb",
            "lnxsrv107",
            "nasrunner",
            "remedy",
            "reports"
          ],
          "notes": "",
          "notes_url": "",
          "action_url": ""
        }
      }
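To illustrate why JSON output is convenient, here is a short sketch that parses the hostgroup response shown on the slide and pulls out the member list. The slide shows only the `"data"` fragment, so the sample below wraps it in an outer object; a real response likely carries additional top-level fields that are not shown in the slides. `hostgroup_members` is a hypothetical helper.

```python
import json

# The "data" fragment from the slide, wrapped in braces to form valid JSON.
SAMPLE_RESPONSE = """
{
  "data": {
    "hostgroup": {
      "group_name": "tools",
      "alias": "Tools Group",
      "members": ["lamsdb", "lamsweb", "lnxsrv107",
                  "nasrunner", "remedy", "reports"],
      "notes": "",
      "notes_url": "",
      "action_url": ""
    }
  }
}
"""


def hostgroup_members(payload):
    """Return the list of host names in a parsed hostgroup response."""
    return payload["data"]["hostgroup"]["members"]


members = hostgroup_members(json.loads(SAMPLE_RESPONSE))
```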
  • 7. Original Data Flow
      [Diagram. Components shown: network firewall (The Enclave), Cluster, Compute Node, Remote Node, Dedicated Nagios Node, Web Server, nagios, nsca, nagios.cmd, nagios2.cmd, datagg, HUD format, HUD buffer, downtime.log, nagios web interface, HUD. Transports shown: nrpe, nsca, ssh. Legend: orange = pipe file, green = text file, purple = web site.]
  • 8. Nagios 4 Benefits
      • Upgrading simplified the configuration file
          – Frequent system configuration changes
          – Error prone
          – Time consuming
      • Was one file of 17,835 lines; now 23 files totaling 9,121 lines
      • The majority of the cleanup came from using hostgroups
      • APIs eliminate the datagg configuration file
  • 9. Modified Data Flow
      [Diagram. Components shown: network firewall (The Enclave), Cluster, Compute Node, Remote Node, Dedicated Nagios Node, Web Server, nagios, nagpopd, HUD buffer, nagios web interface, HUD. Transports shown: nrpe, nrdp, ssh. Legend: green = flat file, purple = web site.]
  • 10. Data Transfer with NRDP vs NSCA
      • Using only one pipe allows use of nrdp
      • Removing the datagg layer allows using nagios as it was intended
      • nrdp’s larger file transfer simplifies the process
          – Previously had to split/reassemble
          – Kernel limit may cause split/reassemble
      • No longer need to overload the perfdata
  • 11. API Type - Archive
      • Gives historical information based on var/archives
          – Availability
          – Alerts
          – Notifications
      • Based on timestamps that you give it
      http://lnxsrv78/nagios4/cgi-bin/archivejson.cgi?query=availability&availabilityobjecttype=hosts&hostname=pbspl233b&starttime=-604800&endtime=-0
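The availability query above can be assembled programmatically. In this sketch the negative `starttime` is read as an offset in seconds before the present (604800 seconds is seven days), which is how the slide's example appears to use it; treat that reading as an assumption and confirm it with the CGI's `query=help` output. `availability_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Host and path taken from the slide's example URL.
BASE_URL = "http://lnxsrv78/nagios4/cgi-bin"
SECONDS_PER_DAY = 24 * 60 * 60


def availability_url(hostname, days_back):
    """Build a host availability query covering the last `days_back` days."""
    params = {
        "query": "availability",
        "availabilityobjecttype": "hosts",
        "hostname": hostname,
        # Negative values are assumed to mean "seconds before now".
        "starttime": -days_back * SECONDS_PER_DAY,
        "endtime": "-0",
    }
    return f"{BASE_URL}/archivejson.cgi?{urlencode(params)}"


# Reproduces the one-week query shown on the slide.
week_url = availability_url("pbspl233b", 7)
```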
  • 12. API Type - Object
      Mirrors your nagios configuration
      • Hosts
      • Services
      • Contacts
      • Commands
      • Dependencies
      • etc.
      http://lnxsrv78/nagios4/cgi-bin/objectjson.cgi?query=hostgroup&hostgroup=tools
  • 13. API Type - Status
      Gives the current state of nagios checks
      • Host
      • Service
      • Comment
      • Downtime
      http://lnxsrv78/nagios4/cgi-bin/statusjson.cgi?query=hostlist&formatoptions=enumerate&hostgroup=tools
  • 14. Status API Post Processing
      • The status codes returned by the API differ from the codes nagios itself uses
      • nagpopd converts them for the HUD
      Status code (from Nagios API to HUD):
          Pending:   1 => 6
          Ok:        2 => 0
          Warning:   4 => 1
          Unknown:   8 => 3
          Critical: 16 => 2
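The conversion on this slide amounts to a lookup table. The sketch below shows one way to express it in Python; the slides say nagpopd itself is written in Perl, so this is an illustration of the mapping rather than nagpopd's actual code, and the names `API_TO_HUD` and `to_hud_code` are hypothetical.

```python
# Status code mapping from the slide: API value => HUD value.
API_TO_HUD = {
    1: 6,   # Pending
    2: 0,   # Ok
    4: 1,   # Warning
    8: 3,   # Unknown
    16: 2,  # Critical
}


def to_hud_code(api_code):
    """Translate a status code from the JSON API into the HUD's code."""
    try:
        return API_TO_HUD[api_code]
    except KeyError:
        raise ValueError(f"unexpected API status code: {api_code}") from None
```

Raising on an unmapped value, rather than passing it through, keeps an unexpected code from being displayed as a valid HUD state.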
  • 15. API GUI Tool
      Tool to figure out the variables for the APIs
      • Display builds the query
          – Dropdowns provide only relevant variables
          – Displays and executes the query
          – Displays the resulting JSON
          – Hovering over the input gives you help tips
      • domain.com/nagios/jsonquery.html
  • 16. API GUI Tool Screenshot
  • 17. API GUI Tool Hover Example
  • 18. NAS Use of APIs
      • nagpopd
          – datagg replacement
          – API for object model
          – API for status
      • Scheduled downtime handling
  • 19. Using API for nagpopd
      Uses objectJSON:
      • Gets the structure directly from the API
      • Eliminates the separate HUD config file
          – Duplicate effort
          – Human errors
          – Inertia (resistance to making changes)
      • HUD configuration put into the nagios config
      • HUD content uses custom variables
  • 20. NAS Local Process (nagpopd)
      Prepares the HUD interfacing file:
      • Object Model
          – Loaded at startup from API queries
          – Perl, but could be any OO language
          – Can apply to other processing needs
          – Specific processing via Service subclassing
      • Some objects created from custom variables
          – Some hosts form Domains
          – MultiServiceGroup for shared filesystem servers
  • 21. Object Model
      [Diagram. Modules shown:]
      • NII
      • System::Main, System::Config, System::Encode, System::Log, System::Query, System::Service2Object
      • Objects::Domain, Objects::Host, Objects::HostGroup, Objects::MultiServiceGroup, Objects::Service
      • Objects::A_Service, Objects::B_Service, Objects::Z_Service, …
  • 22. API Queries
      • Object JSON used on startup to create the layout:
          – objectjson.cgi?query=hostlist&details=true
          – objectjson.cgi?query=hostgrouplist&details=true
          – objectjson.cgi?query=servicelist&details=true
          – objectjson.cgi?query=servicegrouplist&details=true
      • Status JSON queried in a loop to get the latest data
          – statusjson.cgi?query=servicelist&details=true
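The load-once, poll-forever pattern described on this slide might look like the following Python sketch. The query strings are taken from the slide and the host from the earlier example URLs; the function names, the 60-second interval, and the absence of authentication handling are all assumptions, and nagpopd's real Perl implementation is not shown in the slides.

```python
import json
import time
from urllib.request import urlopen

# Host and path taken from the example URLs on earlier slides.
BASE_URL = "http://lnxsrv78/nagios4/cgi-bin"

# Object queries issued once at startup to build the layout.
STARTUP_QUERIES = [
    "objectjson.cgi?query=hostlist&details=true",
    "objectjson.cgi?query=hostgrouplist&details=true",
    "objectjson.cgi?query=servicelist&details=true",
    "objectjson.cgi?query=servicegrouplist&details=true",
]

# Status query repeated in a loop to pick up the latest data.
STATUS_QUERY = "statusjson.cgi?query=servicelist&details=true"


def fetch_json(query):
    """Fetch one query and return the parsed JSON (no authentication)."""
    with urlopen(f"{BASE_URL}/{query}") as response:
        return json.load(response)


def load_layout(fetch=fetch_json):
    """Run every startup query and return the results keyed by query."""
    return {query: fetch(query) for query in STARTUP_QUERIES}


def poll_status(handle, fetch=fetch_json, interval=60,
                iterations=None, sleep=time.sleep):
    """Pass each status result to `handle`, pausing `interval` seconds.

    `iterations=None` loops forever; a number stops after that many polls.
    """
    count = 0
    while iterations is None or count < iterations:
        handle(fetch(STATUS_QUERY))
        count += 1
        if iterations is None or count < iterations:
            sleep(interval)
```

Passing `fetch` and `sleep` as parameters lets the loop be exercised with stand-ins, without a live Nagios server.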
  • 23. Processing Status Information
      • Generic Service object:
          – Default process ::setStatus (no changes)
          – Default output ::writeHUDb (reformat for HUD)
          – Other output methods easily added
              • ::writeJSON (planned)
              • ::writeHTML (later version)
              • others: MySQL commands, etc.
      • Service subclass overrides methods:
          – Handles service-unique processing or output
          – One array maps service name to object.pm
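A rough Python analogue of the generic Service object and its subclasses is sketched below. Only the method roles (`setStatus`, `writeHUDb`) and the name-to-class mapping come from the slide; the HUD line layout, the `LenientService` subclass and its rule, and the `lustre_check` service name are invented for illustration.

```python
class Service:
    """Generic service: stores the status unchanged and emits one HUD line."""

    def __init__(self, host, name):
        self.host = host
        self.name = name
        self.status = None

    def set_status(self, status):
        # Default processing: keep the status exactly as received.
        self.status = status

    def write_hudb(self):
        # The real HUD buffer format is not given in the slides;
        # this space-separated layout is invented for illustration.
        return f"{self.host} {self.name} {self.status}"


class LenientService(Service):
    """Hypothetical subclass: reports HUD Unknown (3) as Warning (1)."""

    def set_status(self, status):
        super().set_status(1 if status == 3 else status)


# Maps a service name to the class that handles it; unlisted names
# fall back to the generic Service. "lustre_check" is a made-up name.
SERVICE_CLASSES = {"lustre_check": LenientService}


def make_service(host, name):
    """Create the handler object registered for this service name."""
    return SERVICE_CLASSES.get(name, Service)(host, name)
```

Adding a new output format then means adding one method to `Service`, and special cases stay confined to their own subclass.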
  • 24. Scheduled Downtime Handling
      • Old solution edited downtime.log
      • When a host is down, nagios stops checking it
      • Used to sync with external program (schedule) …
          – Previous solution required a shadow host
              • pleiades – actual host, could be down
              • Pleiades – shadow, never down
          – Now able to use APIs…
      [Diagram labels: Host_a, host_a]
  • 25. External Program Use
      • External program (command line interface)
          $ schedule all
          ALEX 10/06/2014 10:00-10:25 10/06/2014 Raid Maintenance
          SUSAN 10/06/2014 10:00-10:25 10/06/2014 RAID maintenance
          REMEDY 10/06/2014 12:30-12:40 10/06/2014 Restart to resolve issue.
          $
      • query=downtimelist&formatoptions=enumerate&details=true
      • Merges and updates nagios downtimelist …
  • 26. Updating downtimelist
      • Use the nagios external command feature
          – SCHEDULE_HOST_DOWNTIME;<host_name>;<start_time>;<end_time>;<fixed>;<trigger_id>;<duration>;<author>;<comment>
          – SCHEDULE_HOST_DOWNTIME;pioneer;1412626315;1412626233;1;0;7200;janice;just a test
      • Documented at: http://old.nagios.org/developerinfo/externalcommands/commandlist.
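The command format above can be generated with a small helper. This sketch follows the field order on the slide and adds the bracketed submission timestamp that Nagios expects in front of every external command line. The function names are hypothetical, and the command file location is whatever `command_file` is set to in your nagios.cfg. The helper rejects an end time that is not later than the start time; note that in the slide's sample values the end time is 82 seconds earlier than the start time.

```python
import time


def schedule_host_downtime(host_name, start_time, end_time, author, comment,
                           fixed=True, trigger_id=0, duration=None, now=None):
    """Format one SCHEDULE_HOST_DOWNTIME external command line."""
    if end_time <= start_time:
        raise ValueError("end_time must be later than start_time")
    if "\n" in comment:
        raise ValueError("comment must be a single line")
    if duration is None:
        # Default the duration to the length of the window, in seconds.
        duration = end_time - start_time
    now = int(time.time()) if now is None else int(now)
    fields = [host_name, start_time, end_time, int(fixed),
              trigger_id, duration, author, comment]
    return f"[{now}] SCHEDULE_HOST_DOWNTIME;" + ";".join(str(f) for f in fields)


def submit(command_line, command_file):
    """Append the command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(command_line + "\n")
```

For example, `schedule_host_downtime("pioneer", 1412626315, 1412633515, "janice", "just a test")` yields a two-hour fixed downtime with a duration of 7200 seconds.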
  • 27. Hiccups
      Fixed by Nagios support
      • Custom variables didn’t show up in JSON output
      • Percent signs broke the JSON … sometimes fatally
      • JSON output was limited to 8k
      • Newlines didn’t show up in output
  • 28. Hiccups
      • We have one plugin that outputs so much data it can’t be passed on the command line, so nrdp breaks.
          – Kernel limitation
          – Will have to send in packets
      • Having to have nsca and nrdp work at the same time
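One generic way to "send in packets" is to cut the plugin output into numbered chunks that each fit under the size limit and rebuild the text on the receiving side. The sketch below shows that idea only; it is not a feature of nrdp and not the approach the slides describe in any detail, and the packet layout (index, total, bytes) is an assumption.

```python
def split_output(text, max_bytes):
    """Split text into (index, total, chunk) packets of at most max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    data = text.encode("utf-8")
    chunks = [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]
    if not chunks:
        chunks = [b""]  # empty output still produces one packet
    total = len(chunks)
    return [(index, total, chunk) for index, chunk in enumerate(chunks, start=1)]


def reassemble(packets):
    """Rebuild the original text from packets received in any order."""
    ordered = sorted(packets)
    total = ordered[0][1]
    if [index for index, _, _ in ordered] != list(range(1, total + 1)):
        raise ValueError("missing or duplicate packets")
    # Join the bytes first, so a character split across chunks decodes cleanly.
    return b"".join(chunk for _, _, chunk in ordered).decode("utf-8")
```

Numbering each packet with both its index and the total lets the receiver detect a missing piece before handing incomplete output to Nagios.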
  • 29. Future Plans
      • AJAX-style updates to only update the part of the page that needs it
      • Use the other information we get from the APIs
          – When a service is acknowledged
          – Use archive data to display alerts based on trends
  • 30. Conclusion
      Using the nagios 4 APIs has made our process much easier and will make it even easier in the future
      • Simplified configurations
      • Enabled object model
      • Improved the flow
      • Can communicate with external processes
      • Good customer support
  • 31. Questions?
  • 32. Thank You Janice Singh janice.s.singh@nasa.gov
