You need the OpenManage agents installed and SNMP (not just traps) enabled on each server you want to monitor. You will also need the official, model-specific MIB(s), which you copy to the (InstallDir)\Data\MIB-Directory (restart all WhatsUp-related services after the import).
I normally start by scanning the MIB with the SNMP MIB File Explorer, which is integrated into the WhatsUp web interface (Search by: MIB Module). As soon as you have found the SNMP "branch" where you expect to find the values you need, select it and switch to the SNMP MIB Browser.
There, select your "reference device" and the valid SNMP credentials, then start the lookup.
If you have further questions, just ask.
Best regards
Stefan

23 Replies
Hi,
You should take a look at the WUG Passive and Performance Monitors; I think they may help you.
@Ahmed: I never recommend using Passive Monitors.
Passive Monitors are impractical and not really customizable, and you cannot be sure they will actually alert you when something happens. If you poll a system state using Active Monitors, you notice any disruption immediately (even when something makes successful polling impossible, such as a changed firewall rule or a static SNMP manager IP address).
The main reason I prefer Active Monitors:
You always have an up-to-date report as proof that everything is working fine.
Best regards
Stefan
Although polling is more accurate than relying on a device to send a message that is never acknowledged, some notifications are only available via an SNMP trap.
You can poll for fan, power supply, and temperature status, but you need to configure Passive Monitors to receive traps for things like pre-failure notifications, correctable memory parity / CPU errors, and chassis intrusion.
The best monitoring strategy would be to use Active Monitors to poll for anything you can, as Stefan said, AND to set up Passive Monitors to listen for traps as well.
Thanks,
-Rich
Thanks for the help. I will try out your suggestions.
I have over 300 Dell servers. The main things I want to check are hard drives and RAID containers, because those seem to fail the most.
I tried Dell IT Assistant. I had no problems setting it up, but I'm not getting the alerts: I get a test alert, but when I change a temperature threshold to force an alert, it never comes.
Both of these values should be available for polling with SNMP Active Monitors.
Just set up one device as a reference and create a new Role in the Discovery Console that matches the configuration of the reference device. Then scan the IP range containing all your Dell servers (make sure SNMP is enabled and the same SNMP community is set on all of them), and they will all be configured automatically, the way you want them.
Best regards
Stefan
Thanks, I had a quick question. I have many different models; I'm guessing each model needs a specific active monitor?
Also, while creating a new rule I found the virtual disk. Next to the OID there is a field called Instance, which starts at 0. I tried creating a monitor for a RAID 5 container using 0 for the instance and it went to failed; when I changed it to 1, it went to normal. I have some servers with several RAID containers; do you increment the instance number to monitor each virtual disk?
Here's what I found when doing an SNMP walk:
Type: Enumeration
Access: Read Only
Label: iso.org.dod.internet.private.enterprises.dell.storage.software.storageManagement.logicalDevices.virtualDiskTable.virtualDiskEntry.virtualDiskState
Description: The current condition of this virtual disk (which includes any member array disks.) Possible states: 0: Unknown 1: Ready - The disk is accessible and has no known problems. 2: Failed - Access has been lost to the data or is about to be lost. 3: Online 4: Offline - The disk is not accessible. The disk may be corrupted or intermittently unavailable. 6: Degraded - The data on the virtual disk is no longer fault tolerant because one of the underlying disks is not online. 15: Resynching 16: Regenerating 24: Rebuilding 26: Formatting 32: Reconstructing 35: Initializing 36: Background Initialization 38: Resynching Paused 52: Permanently Degraded 54: Degraded Redundancy
Enumerations: 1 - ready
2 - failed
3 - online
4 - offline
6 - degraded
7 - verifying
15 - resynching
16 - regenerating
18 - failedRedundancy
24 - rebuilding
26 - formatting
32 - reconstructing
35 - initializing
36 - backgroundInit
52 - permanentlyDegraded
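The enumeration above can be turned into a small lookup table, for example when post-processing polled values in a script outside WhatsUp. This is only a minimal Python sketch: the names `VIRTUAL_DISK_STATES` and `describe_state` are made up for illustration and are not part of WhatsUp Gold or the Dell MIB, and only the codes from the Enumerations list are included.

```python
# Codes copied from the Enumerations list of virtualDiskState above.
# The dictionary and function names are hypothetical, for illustration only.
VIRTUAL_DISK_STATES = {
    1: "ready",
    2: "failed",
    3: "online",
    4: "offline",
    6: "degraded",
    7: "verifying",
    15: "resynching",
    16: "regenerating",
    18: "failedRedundancy",
    24: "rebuilding",
    26: "formatting",
    32: "reconstructing",
    35: "initializing",
    36: "backgroundInit",
    52: "permanentlyDegraded",
}


def describe_state(code: int) -> str:
    """Return the enumeration label for a polled value, or a fallback."""
    return VIRTUAL_DISK_STATES.get(code, f"unknown ({code})")


print(describe_state(1))
print(describe_state(6))
```

Any code that is not in the table falls through to the `unknown (...)` label instead of raising an error, which is convenient when a firmware version reports a state the MIB copy does not list.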
I think I found it :) It's amazing what you find when you keep looking...
So far I've created some for the RAID container, and one for each individual disk, so I know which drive has failed.
Nice that you found what you needed :)
As you may have noticed, each instance corresponds to one element (e.g. Logical Disk 1, Logical Disk 2, Logical Disk 3, etc.) below the named branch, such as virtualDiskState. Start with 1 for the first, 2 for the second, and so on; 0 is always "empty", because it refers to the branch that contains your instances.
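To illustrate the instance numbering, here is a rough Python sketch that builds one full OID per virtual disk by appending the instance number to the column OID. The helper `instance_oids` is hypothetical, and the numeric OID is my reading of the label path from the walk above; please verify it against your own MIB copy before relying on it. The sketch also assumes the instances are numbered 1, 2, 3, ... without gaps, which an SNMP walk will confirm or refute for your servers.

```python
from typing import List

# Assumed numeric form of ...virtualDiskTable.virtualDiskEntry.virtualDiskState.
# Verify this against your own MIB before using it.
VIRTUAL_DISK_STATE_OID = "1.3.6.1.4.1.674.10893.1.20.140.1.1.4"


def instance_oids(base_oid: str, count: int) -> List[str]:
    """Build the OIDs for instances 1..count of a table column.

    Instance 0 is skipped on purpose: in this table the elements start at 1.
    """
    if count < 0:
        raise ValueError("count must not be negative")
    return [f"{base_oid}.{instance}" for instance in range(1, count + 1)]


# A server with three virtual disks would need three monitors:
for oid in instance_oids(VIRTUAL_DISK_STATE_OID, 3):
    print(oid)
```

Each printed OID corresponds to one Active Monitor (same OID, Instance field 1, 2 and 3).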
How did you define the "must be" value of your Active Monitor in the example above? Up value = 1?
Best regards
Stefan
Yes I used 1, thanks.
I've noticed that the polls for some of these fail quite often; I had to tweak the alerts so I wasn't flooded with emails.
One that fails quite often is the RAID controller battery; I'm not sure if it's failing because it's charging. I was also hoping to monitor the power supplies, but when I do an SNMP walk on the power supplies, it can't poll the status; it's always unknown.
Right now I'm polling RAID containers, the RAID controller, individual hard drives, and the RAID battery.
You could set up a Performance Monitor to track the values of the RAID controller battery. The resulting graph could help you see what is happening with the status. You should also find the polled value (and thus the reason the monitor goes Down) in the State Change Timeline of the device (column: Message).
If the possible values are sorted conveniently, you could then set the Active Monitor to treat a range of values (1 [Ready] to 2 [Charging]) as UP instead of a single constant value.
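The difference between a constant "must be" value and a range of values can be sketched in a few lines of Python. This is not how WhatsUp implements it, just the logic; the function `is_up` is hypothetical, and the values 1 and 2 are the example values from this post, not confirmed battery states from the Dell MIB.

```python
def is_up(polled_value: int, low: int = 1, high: int = 2) -> bool:
    """Treat every value in the closed range [low, high] as UP."""
    return low <= polled_value <= high


# A constant "must be" value is just a range where low == high.
print(is_up(1, low=1, high=1))  # constant check, value matches
print(is_up(2, low=1, high=1))  # constant check, value differs
print(is_up(2))                 # range check 1..2
```

A range only works if all "good" states are adjacent numbers, which is why the ordering of the possible values matters.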
You could attach the MIB files (or the relevant sections) you are using, so that we can find a better solution.
Which of the other monitors are also failing?
Thanks, good idea. Sometimes they fail because of timeouts, so I extended the timeout from 2 to 4 and raised the retry count from 1 to 2.
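For anyone wondering how timeout and retry interact, here is a small Python sketch of the general idea. It is not WhatsUp's actual polling code: `poll_with_retries` and `make_flaky_poll` are hypothetical helpers, the timeout is assumed to be in seconds, and "retry" is assumed to mean extra attempts after the first one.

```python
from typing import Callable, Optional


def poll_with_retries(
    poll: Callable[[float], int],
    timeout: float = 4.0,
    retries: int = 2,
) -> Optional[int]:
    """Call poll(timeout) up to 1 + retries times; return None if all time out."""
    for _ in range(1 + retries):
        try:
            return poll(timeout)
        except TimeoutError:
            continue  # this attempt timed out, try again
    return None


def make_flaky_poll(failures: int, value: int) -> Callable[[float], int]:
    """Simulated device: times out `failures` times, then answers `value`."""
    remaining = {"count": failures}

    def poll(timeout: float) -> int:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise TimeoutError("no response within %.1f s" % timeout)
        return value

    return poll


print(poll_with_retries(make_flaky_poll(2, 1), retries=2))
print(poll_with_retries(make_flaky_poll(2, 1), retries=1))
```

With two retries the simulated device gets a third chance and answers; with one retry the same device is reported as not responding. Keep in mind that the worst-case time per poll grows to roughly timeout × (1 + retries).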
I notice that the RAID controller status fluctuates a lot as well. Also, when I do an SNMP walk on the power supply status and temperature probes, I am not getting values.