Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza Databases

Frank Pantaleo's presentation on Nagios Monitoring of Netezza Databases. The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference
Published on: Mar 3, 2016
Published in: Technology      
Source: www.slideshare.net


Transcripts - Nagios Conference 2014 - Frank Pantaleo - Nagios Monitoring of Netezza Databases

  • 1. Monitoring Netezza database with Nagios Frank Pantaleo fpantaleo@brightlightconsulting.com
  • 2. Introduction & Agenda
       • A couple of W’s
       • State of monitoring Netezza
       • Monitoring Netezza with Nagios
       • Future direction
  • 3. A couple of W’s - Why
       Why are we monitoring Netezza?
       • How much $ does your business lose when IT is down?
       • 7 million each year from IT downtime
       • Gartner (2005) pegs the hourly cost of downtime for computer networks at $42,000
       • A data center outage by itself can cost an average of $5,600 per minute
       • Outages damage a company’s reputation
       • Now take this to cloud scale: for every hour it is not up and running, Amazon.com takes a hit of almost $5 million
       • Monitoring allows you to be more proactive
       • Monitoring allows upper management to plan for DB growth (including secondary effects, e.g. DR, tape, and disk for backup)
  • 4. A couple of W’s - What
       What are we looking for in a monitor?
       • Universal monitoring
       • Efficient alert notifications (also allows your IT staff to tell each other when something is being worked on)
       • Web dashboard (one-stop shopping!)
       • Issue escalation (separate lists for warning, high)
       • Distributed monitoring and scalability (high availability)
  • 5. A couple of W’s - What
       What are we looking for in a monitor? (cont)
       • Reporting (how many times was this service down?)
       • External application integration (can I enable my current applications to allow for early issue notification?)
       • Open source solution
  • 6. State of Netezza monitoring
       Monitoring systems available for Netezza
       • Netezza event monitor – comes stock with tool
       • Netezza portal – comes stock with tool
       • Commercial offerings – Brightlight Consulting Observation Deck
  • 7. State of Netezza monitoring
       Netezza comes with 34 alerts
       Alert actions have limited responses
       • Email
       • Script execution
       • In version 7.1, can auto-create a support ticket
       • Configuration can be done through the NPS client or the command line interface on the Netezza server
  • 8. State of Netezza monitoring
       Examples of Netezza 7.1 stock sample alerts
       • Disk Full
       • SPU Full
       • Hardware Failed
       • Hardware needs attention
       • Hardware restarted
       • Hardware service requested
       • Heat threshold exceeded
       • History capture event
       • History load event
       • HwvoltageFaultAuto
       • NPSNoLongerOnline
       • RegenFault
       • RunAwayQuery
       No custom events allowed
  • 9. State of Netezza monitoring
       Netezza Portal
       • Face on glass monitoring
       • Custom queries can be added to the monitor
       • All queries can be seen as numeric or graphic
       • No alerting
       • Tool can also be used for maintaining database objects, users, events, and sessions
       • If you are using LDAP, the portal can’t take advantage of it; when you log in to the portal you use your DB username/password
  • 10. Netezza monitoring using Nagios
       What are we monitoring in Netezza?
       • Table locks by non-EDW statements during the EDW batch cycle
       • User queries exceeding 1 hour (90% of the time these are poorly formed queries)
       • User queries during the EDW batch cycle (depends on SLA)
       • Age of backup older than SLA
       • LDAP server available for SSO
  • 11. Netezza monitoring using Nagios
       What are we monitoring in Netezza? (cont)
       • SPU space unbalanced (generally a side effect of poor distribution)
       • State of the EDW, e.g. loading files, file processing complete
       • Late arrival of files preventing the EDW from meeting SLAs
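One way to turn "SPU space unbalanced" into a check is to compare the fullest SPU against the average. The sketch below is in Python, which the deck lists as an alternative to its Perl sensors. The function name, the input dictionary, and the 1.2/1.5 ratio thresholds are illustrative assumptions, not values from the presentation.

```python
def spu_skew_status(used_by_spu, warn_ratio=1.2, crit_ratio=1.5):
    """Return (nagios_code, message) for per-SPU space usage.

    used_by_spu maps a SPU id to used space in any consistent unit.
    The thresholds are hypothetical defaults, not Netezza recommendations.
    """
    if not used_by_spu:
        return 3, "UNKNOWN - no SPU usage data"
    average = sum(used_by_spu.values()) / len(used_by_spu)
    if average == 0:
        return 0, "OK - SPUs are empty"
    # Find the fullest SPU and express it as a multiple of the average.
    worst_spu, worst_used = max(used_by_spu.items(), key=lambda kv: kv[1])
    ratio = worst_used / average
    message = f"SPU {worst_spu} holds {ratio:.2f}x the average"
    if ratio >= crit_ratio:
        return 2, "CRITICAL - " + message
    if ratio >= warn_ratio:
        return 1, "WARNING - " + message
    return 0, "OK - " + message
```

How the per-SPU numbers are obtained is left open here; a real sensor would fill the dictionary from a query against the database.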
  • 12. Netezza monitoring using Nagios
       Architecture options with Nagios
       • Sensors live on the Nagios monitoring server
       • Sensors live on the database server and are controlled by NRPE. This is what we went with, based on customer security rules.
       • Scripting language is Perl. It could really be any language that can query the database and deal with the responses. Other options include Bash, Java, Python, and C.
  • 13. Netezza monitoring using Nagios
       Architecture options with Nagios (cont)
       • Active – NRPE is an intermediary for running scripts and bringing results back to Nagios.
       • Passive – SNMP is an option, but the currently provided alerts need to be tied into an SNMP agent that reports status. Netezza doesn’t raise SNMP alerts out of the box.
  • 14. Netezza monitoring using Nagios
       Passive alerts require SNMP trap software
       • Nagios server must be enabled to receive alerts
           – http://hyper-choi.blogspot.com/2012/12/nagios-snmp-trap-part-1-snmptt.html
           – http://hyper-choi.blogspot.com/2013/01/nagios-snmp-trap-part-2-configuration.html
       • Once Nagios is enabled, Netezza events must be changed to make Nagios aware there is an issue
           – http://netezzaadmin.wordpress.com/2011/10/07/using-netezzas-event-manager-to-generate-snmp-traps
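On the Nagios side, a passive result is ultimately submitted as a PROCESS_SERVICE_CHECK_RESULT external command. The Python sketch below only formats that line. The helper name is hypothetical, and the host, service, and message in the usage note are made-up examples.

```python
import time

def passive_result_line(host, service, code, output, now=None):
    """Format a Nagios PROCESS_SERVICE_CHECK_RESULT external command line."""
    if code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0, 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    # A newline ends the command, so keep the plugin output on one line.
    clean = output.replace("\n", " ")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{clean}\n"
```

For example, `passive_result_line("proddb", "NZ hardware", 2, "Hardware Failed")` yields a line that a trap handler could append to the Nagios command file. That file's location is set by `command_file` in nagios.cfg and varies by install.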
  • 15. Netezza monitoring using Nagios Passive alerts architecture
  • 16. Netezza monitoring using Nagios
       Active alerts require NRPE to be installed
       • Checking is done using shell script and Perl
       • Perl DBI ODBC
           – Downside: you have to have an exposed user/password. In this case that was against IT policy, so I stopped using this option.
           – If we used this, though, all agents could live on the Nagios server
       • Perl package supplied by Netezza
           – Downside: this is the equivalent of admin, so you can do anything
           – Upside: no username/password configuration
           – Agents must live on the database server
  • 17. Netezza monitoring using Nagios Active Alert architecture
  • 18. Netezza monitoring using Nagios
       Active Alert agent writing (interface requirements)
       • MUST set a return code, e.g.
           # 0 OK
           # 1 WARNING
           # 2 CRITICAL
           # 3 UNKNOWN
       • Nagios dashboard displays the associated text
           if (some logic here) { print "Ok\n"; }
           else { print "Error please look at tablexyz\n"; }
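The interface contract above (one line of text plus an exit code of 0 to 3) can be sketched in a few lines. This is Python rather than the deck's Perl, and the function names and the warn/crit thresholds are assumptions for illustration.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def evaluate(count, warn=1, crit=5):
    """Map a count of offending rows to (exit code, status text).

    The thresholds are hypothetical; pick values that match your SLA.
    """
    if count is None or count < 0:
        return UNKNOWN, "UNKNOWN - could not read query count"
    if count >= crit:
        return CRITICAL, f"CRITICAL - {count} queries running longer than 1 hour"
    if count >= warn:
        return WARNING, f"WARNING - {count} queries running longer than 1 hour"
    return OK, "OK - no long-running queries"

def run(count):
    """Print the text Nagios will display and return the exit code."""
    code, text = evaluate(count)
    print(text)
    return code
```

A sensor script would finish with `sys.exit(run(count))`, so that NRPE passes both the printed text and the return code back to Nagios.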
  • 19. Netezza monitoring using Nagios
       Active alerts - NRPE configuration on Netezza server
       • If using the Perl package, commands must run as the nz user, so /etc/nagios/nrpe.cfg must use the following
           – nrpe_user=nz
           – nrpe_group=nz
       • Once a sensor (Perl script) is written and tested, it must be added to the nrpe.cfg file:
           command[check_nz_longqry]=/export/home/nz/scripts/check_nz_longqry.pl
       • Best practice: request that /etc/nagios/nrpe.cfg be open to read/write by the nz user
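A quick sanity check of these settings can be scripted. The Python sketch below parses nrpe.cfg-style text into plain settings and `command[...]` entries. The parser and its names are illustrative assumptions, and it ignores NRPE features such as include files.

```python
import re

# Matches lines such as: command[check_nz_longqry]=/path/to/script
COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]\s*=\s*(?P<line>.+)$")

def parse_nrpe_cfg(text):
    """Return (settings, commands) parsed from nrpe.cfg-style text."""
    settings, commands = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group("name")] = match.group("line").strip()
        elif "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings, commands
```

With the result in hand, a deployment script could confirm that `settings["nrpe_user"]` is `nz` and that the new sensor appears in `commands` before restarting NRPE.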
  • 20. Netezza monitoring using Nagios
       Active alerts - How does NRPE work on the Nagios server?
           define command{
               command_name    check_nrpe
               command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 300
           }
           define service{
               use                     generic-service
               host_name               proddb
               service_description     NZSQL Long query
               check_command           check_nrpe!check_nz_longqry!
               notifications_enabled   0
           }
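To see how the two definitions fit together, the Python sketch below imitates the macro substitution: the text after each `!` in `check_command` becomes `$ARG1$`, `$ARG2$`, and so on in `command_line`. This is a simplified illustration, not Nagios's actual implementation, and the `$USER1$` path and host address in the usage note are assumed values.

```python
def expand_check_command(check_command, command_line, macros):
    """Expand $ARGn$ and the given macros in a command_line.

    Simplified: real Nagios supports many more macros and escaping rules.
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values[f"ARG{index}"] = arg
    expanded = command_line
    for key, value in values.items():
        expanded = expanded.replace(f"${key}$", value)
    return name, expanded
```

Calling it with `"check_nrpe!check_nz_longqry!"`, the `command_line` from the slide, and `{"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "10.0.0.5"}` gives `/usr/local/nagios/libexec/check_nrpe -H 10.0.0.5 -c check_nz_longqry -t 300`. NRPE on the database server then looks up `check_nz_longqry` in its nrpe.cfg.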
  • 21. Netezza monitoring using Nagios
       Active Alerts - Perl programming using SQL.pm package
       • Invocation
           use lib "/nz/kit/share/perl";
           use nz::SQL;
       • Package can only be used by the nz owner
       • NO username & password
           my ($KITDIR, $DATADIR);
           $DATADIR = "/nz/data.1.0";
           $KITDIR = "/nz/kit";
           nz::SQL::config(KITDIR => $KITDIR, DATADIR => $DATADIR);
       • Best practice: use alarm timers around SQL statements
       • Handy variables after each SQL execution: $qresp->{nrows}, ncols, colid, qtype
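The "alarm timers around SQL statements" practice can be sketched with Python's `signal` module, the closest analogue of Perl's `alarm`. This is a Unix-only sketch that must run in the main thread. `run_with_alarm`, `QueryTimeout`, and `slow_query` are hypothetical names, and `slow_query` merely stands in for a database call.

```python
import signal
import time

class QueryTimeout(Exception):
    """Raised when the wrapped call exceeds its time budget."""

def run_with_alarm(func, seconds):
    """Call func(), aborting with QueryTimeout after `seconds` seconds."""
    def on_alarm(signum, frame):
        raise QueryTimeout(f"no answer after {seconds} seconds")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler

def slow_query():
    """Stand-in for a database call that hangs (hypothetical)."""
    time.sleep(2)
    return 0
```

A sensor would typically catch `QueryTimeout` and exit with UNKNOWN (3), with the timer set below the `-t 300` that `check_nrpe` allows, so Nagios gets a clear message instead of a plugin timeout.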
  • 22. Netezza monitoring using Nagios
       Perl programming using SQL.pm package (continued)
       • Interface example … nz::SQL::query($dbname, $sql). Unlike DBI, the database must be named every time you query.
       • Result sets are not active in the database (unlike DBI); they are held in Perl memory
       • Result set traversal is done using Perl foreach, e.g.
           foreach my $row (@{$qresp->{data}}) {
               ($blocker_username,$blocker_sql,$blockee_username,$blockee_sql) = @$row;
           }
       • Best practice: if you can, avoid dealing with the result set and deal only with counts (e.g. nrows). This is the most efficient approach, especially for a Nagios alert check that is going to run several times a day.
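The "counts first, rows only when needed" advice might look like the following Python sketch. It assumes the rows are already in memory, in the same column order the slide unpacks. `lock_status` and its message wording are illustrative, not part of the nz::SQL package.

```python
def lock_status(rows):
    """Build (exit code, one-line text) from blocker/blockee rows.

    Each row is (blocker_username, blocker_sql, blockee_username, blockee_sql).
    """
    if not rows:
        # Count-only fast path: nothing to traverse or format.
        return 0, "OK - no blocking locks"
    # Collapse the rows to distinct (blocker, blockee) user pairs.
    pairs = sorted({(blocker, blockee) for blocker, _, blockee, _ in rows})
    detail = ", ".join(f"{blocker} blocks {blockee}" for blocker, blockee in pairs)
    return 2, f"CRITICAL - {len(rows)} blocked statements: {detail}"
```

The OK case never touches the row data, which keeps a check that runs many times a day cheap. The detail line is only assembled when someone needs to act on it.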
  • 23. Future direction
       • Data graphing
       • Expand the areas that we are monitoring in Netezza
       • Integrate into a product offering (Observation Deck) from Brightlight that collects NZHIST for the customer
       • Predict when we are going to outgrow our current processing and database needs
  • 24. Conclusion
       Key takeaways
       • Using Nagios can help your company have an extensible event monitor.
       • Understanding Nagios architecture is important to a stable and working monitoring setup.
       • Once you understand the architecture, writing an agent is trivial. If you can write SQL to detect an event, then you can write an agent.
       Other reading materials
       • The URLs provided in this document have the recipes for setting up Nagios, SNMP traps, and Netezza. Please visit those sites for that information.
  • 25. Questions? Any questions? Thanks!
  • 26. Reference
       • http://www.thegeekstuff.com/2010/08/monitoring-software-criteria/
       • http://exchange.nagios.org/directory/Tutorials/Install-and-Configure-NRPE-in-CentOS-and-Red-Hat/details
       • http://www-01.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.portal.doc/c_portal_welcome.html
       • http://www.networkworld.com/article/2329877/infrastructure-management/how-to-quantify-downtime.html
  • 27. The End Frank Pantaleo fpantaleo@brightlightconsulting.com
