Performance tuning
Hardware considerations
On systems that collect a lot of RRD data, probably the biggest single performance improvement is to move PostgreSQL and Tomcat to a separate system from the OpenNMS daemons. The difference can be huge.
On a server with hardware RAID, consider investing in a battery-backed write cache (BBWC). On an HP DL380 G4, the server's I/O wait dropped from an average of 15% to almost nil after adding a 128 MB BBWC. Also make sure the system has ample memory. On a single-processor HP G4 with 4 GB of memory, monitoring about 300 devices with 700 interfaces, our I/O wait time climbed steadily until it consumed nearly all of the processor time and OpenNMS slowed to a crawl. Upgrading to 12 GB of memory brought the wait time back down to 1%.
For a small collection of monitored nodes, moving the RRD data area into a tmpfs / RAM drive may also alleviate the I/O wait caused by all of the writing required by the RRD data. The trade-off is that a server crash or power-down will cause the RRD files to be lost, unless you implement a sync tool to sync the RAM drive to a disk backup.
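A sync tool of the kind mentioned above can be as simple as a periodic copy from the RAM drive to a directory on disk. The following Python sketch shows one minimal way to do it. The function name and any paths you pass in are illustrative assumptions, and a copy taken while collection is running may catch an RRD file in the middle of a write.

```python
import shutil
from pathlib import Path


def sync_rrd_backup(ram_dir, backup_dir):
    """Copy every file under ram_dir into backup_dir, overwriting older copies.

    Returns the number of regular files present in the backup afterwards.
    """
    src = Path(ram_dir)
    dst = Path(backup_dir)
    # dirs_exist_ok (Python 3.8+) lets repeated runs refresh an existing backup.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())
```

Run something like this from cron every few minutes, and copy in the opposite direction at boot before OpenNMS starts, so the RAM drive is repopulated from the last backup.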
Operating system
- Don't run in a VM.
- Don't put DB or RRD data on file systems managed by LVM.
- Don't put DB or RRD data on file systems on RAID-5.
- Do put OpenNMS logs and RRDs and PostgreSQL data on separate spindles or separate RAID sets. Read details for postgres and RRD below.
- Do run on a modern kernel. Linux 2.6 and later as well as Solaris 10 or newer are good. Stay away from Linux 2.4, in particular.
- Set the noatime mount flag on the file systems that host the OpenNMS logs, RRDs and PostgreSQL data (see the "separate spindles" item above).
- Solaris 10 systems may require increasing ICMP buffer size if polling large numbers of systems (ndd -set /dev/icmp icmp_max_buf 2097152). Use 'netstat -s -P icmp' and check the value of 'icmpInOverflows' to determine if you're overflowing the ICMP buffer.
Java Virtual Machine (JVM)
Parameters for tuning Java may be added in $OPENNMS_HOME/etc/opennms.conf.
One important parameter is the Java heap size:
JAVA_HEAP_SIZE=size_in_MBytes
The default value is 256.
You can get a rough measure of the performance improvement by opening the event list in OpenNMS, adding ?limit=250 to the URL and pressing Return:
http://opennms:8980/opennms/event/list?limit=250
Now there should be 250 events in your list. Press F5 (the reload key, at least in Firefox and IE) and time how long the page takes to refresh. Repeat this several times to get a good mean value. Now stop OpenNMS, change the heap size as described above, restart OpenNMS and wait about 10 minutes for it to settle down. Repeat the measurements, then increase the heap size again. You will end up with a table like this:
heap (MB)   refresh time
1536        5-7 sec.
2048        3-4 sec.
3072        1-2 sec.
Watch memory and swap usage on your system (for example with top) and decide which value to keep in the config file.
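The manual reload-and-time procedure above can be approximated with a small script. This Python sketch is only an illustration: the function names are made up here, it measures server response time rather than browser rendering, and it assumes the page can be fetched without logging in, which is normally not the case for the OpenNMS web UI (pass your own authenticated fetch function if needed).

```python
import statistics
import time
import urllib.request


def fetch_page(url):
    """Download url completely and discard the body."""
    with urllib.request.urlopen(url) as response:
        response.read()


def mean_refresh_time(url, repeats=5, fetch=fetch_page):
    """Return the mean wall-clock seconds taken by `repeats` fetches of url."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)
```

Call it once per heap size, for example mean_refresh_time("http://opennms:8980/opennms/event/list?limit=250"), and record the results in a table like the one above.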
To speed up the start phase of the java virtual machine you might want to add
ADDITIONAL_MANAGER_OPTIONS="-Xms"$JAVA_HEAP_SIZE"m"
though speeding up the startup time in most cases is not a big problem and the parameter sometimes doesn't help at all.
If you have a system with a lot of cores and threads, like Sun's Niagara CPU, you might run into the limits described by Amdahl's Law, see http://en.wikipedia.org/wiki/Amdahl%27s_law. You can try to optimize garbage collection using different garbage collectors, see http://java.sun.com/docs/hotspot/gc1.4.2/#3.%20Sizing%20the%20Generations.
Using
ADDITIONAL_MANAGER_OPTIONS="-XX:+UseParallelGC \
-verbose:gc \
-XX:+PrintGCDetails \
-XX:+PrintTenuringDistribution \
-XX:+PrintGCTimeStamps"
you will get a lot of timing information about garbage collection in the output.log of OpenNMS. The default garbage collector used by OpenNMS is incgc (i.e. -XX:+incgc); others to try are ConcMarkSweepGC (-XX:+UseConcMarkSweepGC) and ParallelGC (-XX:+UseParallelGC), which might be the best if you have a lot of cores/threads. Once you have settled on your configuration, remove the lines containing verbose and Print from the options:
ADDITIONAL_MANAGER_OPTIONS="-Xms"$JAVA_HEAP_SIZE"m -XX:+UseParallelGC"
On Solaris systems
On Solaris 10 it is also useful to use libumem instead of the standard libc memory allocator. If you want to enable libumem on an existing application, you can use the LD_PRELOAD environment variable (or LD_PRELOAD_64 for 64-bit applications) to interpose the library on the application and cause it to use the malloc() family of functions from libumem instead of libc.
LD_PRELOAD=libumem.so opennms start
LD_PRELOAD_64=libumem.so opennms start
To confirm that you are using libumem, you can use the pldd(1) command to list the dynamic libraries being used by your application. For example:
$ pgrep -l opennms
2239 opennms
$ pldd 2239
2239: opennms
/lib/libumem.so.1
/usr/lib/libc/libc_hwcap2.so.1
PostgreSQL
The default shared_buffers parameter in postgresql.conf is extremely conservative. On most modern servers it can be raised significantly for a big performance boost and a drop in I/O wait time. This change needs to go hand in hand with a change to the shmmax kernel parameter. See this PostgreSQL performance page for recommendations on this and other PostgreSQL settings.
If you want to put PostgreSQL on a different box, change the SQL host in opennms-datasources.xml. The PostgreSQL server will also need iplike installed and configured.
To clean extra events out of the database, see Event_Configuration_How-To#The_Database.
PostgreSQL 8.1 and later
These changes to postgresql.conf will probably improve your DB performance if you have enough RAM (about 2 GB installed RAM for a dedicated server) to support the changes. (YMMV) You'll probably need to make adjustments to the shmmax kernel attribute on your system.
shared_buffers = 20000
work_mem = 16348
maintenance_work_mem = 65536
vacuum_cost_delay = 50
checkpoint_segments = 20
checkpoint_timeout = 900
wal_buffers = 64
stats_start_collector = on
stats_row_level = on
autovacuum = on
I've also set these higher values on *bigger* systems:
wal_buffers = 256
work_mem = 32768
maintenance_work_mem = 524288
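The plain numbers in the PostgreSQL 8.1 settings above are in each parameter's native unit: shared_buffers and wal_buffers count buffers of one block each, while work_mem and maintenance_work_mem are in kilobytes. To see roughly how much memory a value represents, the following Python sketch converts them to MiB. It assumes the default 8 kB block size; the function and table names are illustrative only.

```python
BLOCK_SIZE = 8192  # bytes; PostgreSQL's default block size (assumed unchanged)

# Native unit, in bytes, of each plain-number setting in PostgreSQL 8.1.
UNIT_BYTES = {
    "shared_buffers": BLOCK_SIZE,     # counted in buffers
    "wal_buffers": BLOCK_SIZE,        # counted in buffers
    "work_mem": 1024,                 # counted in kB
    "maintenance_work_mem": 1024,     # counted in kB
}


def setting_in_mib(name, value):
    """Convert a plain-number setting to MiB."""
    return value * UNIT_BYTES[name] / (1024 * 1024)
```

For example, setting_in_mib("shared_buffers", 20000) gives 156.25, so the suggested shared_buffers value needs a shared memory segment of somewhat more than 156 MiB. Keep in mind that work_mem applies per sort or hash operation, so several multiples of it can be in use at once.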
Systems with lots of RAM and PostgreSQL 8.2
Recently, we've found that increasing max_fsm_pages and max_fsm_relations tenfold on systems with plenty of memory (4G+) improves performance dramatically.
#max_fsm_pages = 204800 # min max_fsm_relations*16, 6 bytes each
max_fsm_pages = 2048000
#max_fsm_relations = 1000 # min 100, ~70 bytes each
max_fsm_relations = 10000
(Note that the free space map has been reimplemented in PostgreSQL 8.4 and is now self-maintaining, so the max_fsm_* settings above are not necessary if you're running PostgreSQL 8.4.1 or later - note that 8.4.0 is not supported due to a nasty bug.)
As well as really bumping these:
work_mem = 100MB
maintenance_work_mem = 128MB
Note: To make adjustments to shmmax, do the following:
Start postgresql from the command line:
sudo -u postgres pg_ctl -D /var/lib/pgsql/data start
(adjusting paths as necessary) and look at the error message:
FATAL: could not create shared memory segment: Invalid argument
DETAIL: Failed system call was shmget(key=5432001, size=170639360, 03600).
HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 170639360 bytes), reduce PostgreSQL's shared_buffers parameter (currently 20000) and/or its max_connections parameter (currently 100).
Notice the value of "size".
Then up the value of shmmax:
sysctl -w kernel.shmmax=170639360
And restart postgresql (using the normal method such as "service postgresql start")
Finally, edit /etc/sysctl.conf and add the line
kernel.shmmax=170639360
so it will survive a reboot.
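If you capture the startup error in a file, the "size" value can be pulled out automatically. This Python sketch assumes the message has the same shape as the example above; the function names are illustrative.

```python
import re


def shmmax_from_error(message):
    """Return the byte size PostgreSQL asked shmget() for, or None if absent."""
    match = re.search(r"shmget\(key=\d+, size=(\d+),", message)
    return int(match.group(1)) if match else None


def sysctl_line(size_bytes):
    """Format the kernel.shmmax line for the sysctl configuration file."""
    return f"kernel.shmmax={size_bytes}"
```

Feeding it the DETAIL line from the example yields 170639360, and sysctl_line() turns that into the kernel.shmmax=170639360 line shown above.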
PostgreSQL *any* Version
One additional configuration change that seems to bring a tremendous performance improvement is putting the write-ahead logs on a separate spindle (even better, a separate disk controller/channel). The way to do this is:
- shutdown opennms / tomcat
- shutdown postgresql
- cd to $PG_DATA
- mv pg_xlog <file system on different spindle>
- ln -s <file system on different spindle>/pg_xlog pg_xlog
- restart postgresql
Make sure postgres data and write-ahead logs do not live on a RAID-5 disk subsystem.
iplike stored procedure
See the documentation in iplike to be sure you have the best version of iplike running
RRDTool/JRobin
Writing all the snmp-collected data and the results from polling the service (response times) to rrd files produces a lot of disk I/O, so look for disk tuning below. For further tuning see the fundamentals and some more detailed pages like
Disk Tuning
Because OpenNMS is well-equipped for gathering and recording details regarding network and systems performance and behavior, it tends to be a write-heavy application. If your environment offers a very large number of data points to be managed, it would serve you well to ensure that a large degree of spindle separation exists. In particular and where possible, ensure that:
- OpenNMS SNMP Collection
- OpenNMS Response Time Collection
- OpenNMS (and system) logging
- PostgreSQL Database
- PostgreSQL write-ahead logging
...occur on separate spindles, and in some cases separate drives or separate devices. Further, in a *nix environment, it pays to put the RRDs on their own mounts, so that you can mount them with the noatime and nodiratime options without compromising other aspects of the system configuration.
The defaults for the opennms directories mentioned above are
/opt/opennms/share/rrd/snmp
/opt/opennms/share/rrd/response
/opt/opennms/logs or /var/log/opennms
but watch out for symbolic links!
As a filesystem, XFS gives the best performance. EXT(2,3) have built-in limits on the number of entries (in particular subdirectories) per directory and cannot be used on larger installations.
Data storage is the critical factor, so the capacity of the storage must match the size of the installation. The best performance is achieved with SANs (FibreChannel + NetApp or EMC or ..). The important point is that the I/O queue is kept on the "other" device and not on the OpenNMS server.
Recently good results for smaller systems have been reported with SSD Drives.
Tomcat (if not using built-in Jetty server)
Note that there's no need to use Tomcat since OpenNMS version 1.3.7 unless you have a specific requirement that the built-in Jetty server in OpenNMS cannot meet.
If not already done at installation time, allow Tomcat to access more memory than the default. The easiest way to do this is via the CATALINA_OPTS environment variable. If your Tomcat package has its own configuration file, add the variable there; otherwise it is best just to add it to catalina.sh:
CATALINA_OPTS="-Xmx1024m"
The -Xmx option allows Tomcat to access up to 1 GB of memory. Of course, this assumes that there is 1 GB of available memory on the system. It will need to be tuned to the particular server in use.
OpenNMS daemon
OpenNMS webapp
Logging
By default the daemons and webapp log at DEBUG level. This causes a lot of extra disk I/O. You can reduce the logging substantially by setting the level to WARN in /opt/opennms/etc/log4j.properties and /opt/opennms/webapps/opennms/WEB-INF/log4j.properties. Just add this line:
log4j.threshold=WARN
There is also /opt/opennms/jetty-webapps/opennms/WEB-INF/log4j.properties, but even though this file is read on startup, it seems not to matter; I didn't need to modify it.
After restarting, you should no longer see messages labelled DEBUG or INFO in /opt/opennms/logs/daemon/* and /opt/opennms/logs/webapp/*, except for the startup log (/opt/opennms/logs/daemon/output.log).
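If you manage several installations, the threshold line can be added with a script instead of by hand. The Python sketch below is one possible approach, not an official tool: the function name is made up, and it simply drops any existing log4j.threshold line and writes the new one at the top of the file. Back up the properties file before trying it.

```python
from pathlib import Path


def set_log4j_threshold(properties_path, level="WARN"):
    """Set log4j.threshold in a properties file, replacing any existing value."""
    path = Path(properties_path)
    lines = path.read_text().splitlines()
    # Drop any existing (uncommented) threshold line, then add the new one first.
    kept = [line for line in lines
            if not line.strip().startswith("log4j.threshold")]
    path.write_text("\n".join([f"log4j.threshold={level}"] + kept) + "\n")
```

For example, set_log4j_threshold("/opt/opennms/etc/log4j.properties") sets the level to WARN; pass level="DEBUG" to switch back while troubleshooting.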
Capsd service discovery / rescan
If discovery or rescanning of a node takes a long time, you can turn up the maximum number of threads for initial discovery of services (max-suspect-thread-pool-size) or rescans (max-rescan-thread-pool-size) at the top of capsd-configuration.xml.
Change logging for capsd in log4j.properties from WARN to DEBUG and check the capsd.log file for the number after "Pool-fibern". If n is most of the time the same as the maximum number of threads configured you should increase the maximum number of threads. Most servers will easily handle 50 threads or even more as the threads are most of the time waiting for services that don't answer. Don't forget to change logging back to WARN.
Capsd will check every service defined in capsd-configuration.xml for every interface of the device during a rescan. For every service you can define the number of retries and the timeout value. If you have a device with a lot of interfaces (a hundred or more) and the default capsd configuration, it has to check about 30 services (the default for OpenNMS 1.6.x) on every interface. If the interfaces are just "IP interfaces" with no other service like DNS, DHCP, HTTP etc., about 30 services have to time out on every interface, and there are probably retries, too.
To estimate how long this takes, use
time = number of interfaces * number of services * ((number of retries)+1) * (timeout value/1000)
Note: the timeout is defined in milliseconds!
For example
time = 100 [interfaces] * 30 [services] * (1 [retry] + 1) * (2000 [timeout in ms]/1000)
= 12,000 seconds
= 200 min.
= 3.3 hours
Try to reduce the ip-ranges, the number of services to check, the timeout- and retry-values to something reasonable for your environment.
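The estimate above is easy to wrap in a small function so you can compare different retry and timeout settings. This Python sketch implements the same formula; the function name is illustrative, and like the formula it assumes the worst case in which every service on every interface times out one after another.

```python
def capsd_rescan_seconds(interfaces, services, retries, timeout_ms):
    """Worst-case rescan time when every service on every interface times out."""
    return interfaces * services * (retries + 1) * (timeout_ms / 1000)
```

capsd_rescan_seconds(100, 30, 1, 2000) reproduces the 12,000 seconds from the example. With no retries and a 500 ms timeout, the same device would take 1,500 seconds, or 25 minutes.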
Poller threads
If you have good hardware and find your pollers are not completing in time, you can turn up the maximum number of poller threads at the top of poller-configuration.xml.
To find out how many threads are actually being used, make sure DEBUG level logging is enabled for daemon/poller.log, then run:
$ tail -f poller.log | egrep 'PollerScheduler.*adjust:'
...
2007-09-05 10:30:32,755 DEBUG [PollerScheduler-45 Pool] RunnableConsumerThreadPool$SizingFifoQueue:
adjust: started fiber PollerScheduler-45 Pool-fiber2 ratio = 1.0227273, alive = 44
...
2007-09-05 10:30:12,783 DEBUG [PollerScheduler-45 Pool-fiber29] RunnableConsumerThreadPool$SizingFifoQueue:
adjust: calling stop on fiber PollerScheduler-45 Pool-fiber3
Watch the output for a while after startup. The "alive" count shows the number of active poller threads (minus one -- the new thread isn't counted). If the number of threads is continually pegged at the maximum (default 30), you might want to add more threads.
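Instead of watching the tail output by eye, you can collect the "alive" values and check how often the pool is at its limit. This Python sketch assumes log lines in the format shown above; the function names are illustrative.

```python
import re

ALIVE = re.compile(r"alive = (\d+)")


def alive_counts(log_lines):
    """Extract the 'alive' thread count from each log line that reports one."""
    return [int(m.group(1)) for line in log_lines if (m := ALIVE.search(line))]


def fraction_at_max(counts, max_threads=30):
    """Fraction of samples where alive + 1 reached the configured maximum."""
    if not counts:
        return 0.0
    # The log reports one less than the real count (the new thread isn't counted).
    return sum(1 for c in counts if c + 1 >= max_threads) / len(counts)
```

Pass the lines of poller.log to alive_counts() and the result, together with your configured maximum, to fraction_at_max(). A value close to 1.0 over a longer period suggests the pool is pegged and more threads may help.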
Event Handling
All incoming events have to be checked against the configured events to classify them and to handle their parameters correctly. There are a lot of predefined events in OpenNMS. Incoming events are compared to the list of configured events until the first match is found. If you have a lot of incoming events, consider making the following changes in $OPENNMS_HOME/etc/eventconf.xml
- comment out vendor events that you don't need
- put the vendor events that make most of your incoming events on top of the list
- Take care that standard, default and programmatic events keep their place at the end of the list.
As a lot of events will probably match the standard or default events configured at the end of the list, re-sorting the event list won't help as much as commenting out unneeded events.
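To see why commenting out helps more than re-sorting, it can be useful to model the first-match search as a simple linear scan. The Python sketch below is a deliberately simplified model, not how OpenNMS is implemented internally; the event names and traffic shares in the usage note are invented for illustration.

```python
def mean_comparisons(definitions, traffic):
    """Average number of definitions checked per incoming event.

    definitions: event names in eventconf order; the last one is the catch-all.
    traffic: mapping of event name -> share of incoming events (sums to 1).
    Events not listed in definitions fall through to the catch-all.
    """
    position = {name: i + 1 for i, name in enumerate(definitions)}
    catch_all = len(definitions)
    return sum(share * position.get(name, catch_all)
               for name, share in traffic.items())
```

Suppose 30% of events match "vendorC" and 70% fall through to "default". With the order vendorA, vendorB, vendorC, default the average is about 3.7 comparisons. Moving vendorC to the top gives about 3.1, while commenting out the unused vendorA and vendorB gives about 1.7, because the events that reach the end of the list also get a shorter path.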
Event Archiving
In the OpenNMS "contrib" directory, we have a small script for helping performance by archiving events into a historical event table and updating the references to the archived event to an event place holder.
You can download the latest version of the script here.
It is recommended that you run this script by passing in a timestamp argument such that you archive one day's worth of events beginning with the oldest day up to the point you want to keep live events (default is 9 weeks). Then run this script without a timestamp parameter, from cron as often as you like from there out.
./maint_events.sh "2008/01/01"
To analyze why your event table is so large, have a look at Event_Maintenance.
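The day-by-day archiving run recommended above can be prepared with a short script that generates one command per day. This Python sketch only builds the command lines; it does not run them. The function name is illustrative, and it assumes the script accepts dates in the "YYYY/MM/DD" form shown in the example.

```python
from datetime import date, timedelta


def archive_commands(oldest, today, keep_weeks=9, script="./maint_events.sh"):
    """Build one command line per day, oldest first, up to the keep window."""
    cutoff = today - timedelta(weeks=keep_weeks)
    commands = []
    day = oldest
    while day <= cutoff:
        commands.append(f'{script} "{day:%Y/%m/%d}"')
        day += timedelta(days=1)
    return commands
```

For example, archive_commands(date(2008, 1, 1), date.today()) returns the commands to run in order, starting with ./maint_events.sh "2008/01/01". Once they have completed, the cron job without a timestamp parameter keeps the table trimmed.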
Data Collection
If you try to collect a lot of data from nodes which don't provide those values you will get a lot of threads waiting for timeouts or getting errors. If you have specific nodes with problems look in your $OPENNMS_HOME/share/rrd/snmp/[nodeid] directory for the node(s) in question and note all the mib objects that are actually being collected.
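Listing what is actually stored for a node can be scripted. The Python sketch below walks a node's directory and returns the base names of the RRD files it finds. It assumes the file names correspond to the collected MIB object names and that files end in .rrd (RRDtool) or .jrb (JRobin); the function name is illustrative.

```python
from pathlib import Path


def collected_objects(rrd_snmp_dir, node_id):
    """List data source names stored for a node, from its .rrd/.jrb file names."""
    node_dir = Path(rrd_snmp_dir) / str(node_id)
    return sorted({p.stem for p in node_dir.rglob("*")
                   if p.suffix in (".rrd", ".jrb")})
```

Compare the result of, say, collected_objects("/opt/opennms/share/rrd/snmp", 42) with the MIB objects configured for that node to spot the ones that never produce data.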
Another possibility is to change the logging for collectd from WARN to DEBUG:
$OPENNMS_HOME/etc/log4j.properties:
# Collectd
log4j.category.OpenNMS.Collectd=DEBUG, COLLECTD
and then fgrep for "node[your_nodeid]" in collectd.log.
There you should see which variables collectd tries to collect and which ones are collected successfully. The successful ones normally end up in the JRobin/RRD files.
If there are too many unsuccessful attempts, change your datacollection-config.xml. You may omit those values for all devices or create new collection groups that contain only those MIB objects the node(s) provide values for. Add a systemDef for your node(s) providing the same values. In collectd-configuration.xml define a separate package for your node and reference the snmp-collection you just created in datacollection-config.xml. Make sure the node is only in this one package. This gives you an environment to work in that is free of any extra clutter and avoids requesting extraneous MIB objects that you won't get a response for. Then experiment with different values for max-vars-per-pdu and timeout, and also with SNMP v1 or v2c.
Don't forget to change logging back to WARN once you have finished debugging.








