CPU Usage from walking interfaces on Juniper switches #694
I've got two Juniper Virtual Chassis switch stacks whose CPU usage gets really high whenever SNMP Exporter walks the interfaces for the typical throughput, error, and counter OIDs. I think part of the problem is that I have a redundant pair of Prometheus servers querying SNMP Exporter, so SNMP Exporter may be conducting multiple walks simultaneously.

I was able to manage this reasonably well on an EX3400 stack just by reducing the scrape frequency to once every 2 minutes. On those, CPU usage jumps from about 35% to 70% while queries are happening. On a new stack of EX4300 switches, the CPU is pegged above 95% almost constantly when I'm querying for this data, and drops to around 40% when I'm not querying for interface metrics.

I've been tempted to put some kind of cache in front of SNMP Exporter that just replies with cached data from the last 2 minutes when asked. I'm sure that would help. Has anyone seen similar behavior and found ways to mitigate or work around it?
Replies: 1 comment 1 reply
Typically, what I do for Juniper is enable caching in the JunOS config. I set the JunOS SNMP cache lifetime to 1 second less than my scrape interval.
I also broke my walks up into smaller modules so that JunOS does less work per scrape, and dropped the ifDescr lookup since it wasn't necessary for my setup.

Here's my JunOS generator config:

---
modules:
  # Trimmed down if_mib for slow JunOS devices.
  if_mib_junos:
    walk:
      - sysUpTime
      # ifXTable
      - ifHCInOctets
      - ifHCInUcastPkts
      - ifHCInBroadcastPkts
      - ifHCOutOctets
      - ifHCOutUcastPkts
      - ifHCOutBroadcastPkts
    lookups:
      - source_indexes: [ifIndex]
        lookup: ifAlias
      #- source_indexes: [ifIndex]
      #  # Use OID to avoid conflict with PaloAlto PAN-COMMON-MIB.
      #  lookup: 1.3.6.1.2.1.2.2.1.2 # ifDescr
      - source_indexes: [ifIndex]
        # Use OID to avoid conflict with Netscaler NS-ROOT-MIB.
        lookup: 1.3.6.1.2.1.31.1.1.1.1 # ifName
    overrides:
      ifAlias:
        ignore: true # Lookup metric
      ifName:
        ignore: true # Lookup metric
  # Trimmed down if_mib for slow JunOS devices.
  if_mib_junos_errors:
    walk:
      # ifTable
      - ifAdminStatus
      - ifOperStatus
      - ifInDiscards
      - ifInErrors
      - ifOutDiscards
      - ifOutErrors
      # ifXTable
      - ifHighSpeed
    lookups:
      - source_indexes: [ifIndex]
        lookup: ifAlias
      #- source_indexes: [ifIndex]
      #  # Use OID to avoid conflict with PaloAlto PAN-COMMON-MIB.
      #  lookup: 1.3.6.1.2.1.2.2.1.2 # ifDescr
      - source_indexes: [ifIndex]
        # Use OID to avoid conflict with Netscaler NS-ROOT-MIB.
        lookup: 1.3.6.1.2.1.31.1.1.1.1 # ifName
    overrides:
      ifAdminStatus:
        type: EnumAsStateSet
      ifAlias:
        ignore: true # Lookup metric
      ifName:
        ignore: true # Lookup metric
      ifOperStatus:
        type: EnumAsStateSet
      ifType:
        type: EnumAsInfo

This allows me to scrape every 30s without much trouble.
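For anyone wiring this up, here is a rough sketch of how the two modules might be scraped as separate Prometheus jobs, so that each scrape triggers only one of the smaller walks. This is not taken from the setup described above: the job names, the target hostname, the `scrape_timeout` value, and the exporter address `127.0.0.1:9116` are all assumptions to adjust for your environment.

```yaml
scrape_configs:
  # Traffic counters (module if_mib_junos from the generator config).
  - job_name: snmp_junos_traffic            # hypothetical job name
    scrape_interval: 30s
    scrape_timeout: 25s                     # assumed; must not exceed scrape_interval
    metrics_path: /snmp
    params:
      module: [if_mib_junos]
    static_configs:
      - targets:
          - ex4300-stack.example.net        # hypothetical switch hostname
    relabel_configs:
      # Pass the switch address to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the switch address as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the SNMP exporter itself.
      - target_label: __address__
        replacement: 127.0.0.1:9116         # assumed exporter address

  # Status and error counters (module if_mib_junos_errors).
  - job_name: snmp_junos_errors             # hypothetical job name
    scrape_interval: 30s
    scrape_timeout: 25s
    metrics_path: /snmp
    params:
      module: [if_mib_junos_errors]
    static_configs:
      - targets:
          - ex4300-stack.example.net        # hypothetical switch hostname
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9116         # assumed exporter address
```

With a 30s interval, the "1 second less than my scrape interval" rule mentioned earlier would put the JunOS SNMP cache at 29 seconds. Note that a redundant Prometheus pair still doubles the number of walks unless the device-side cache absorbs the second one.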