Operations Bridge Idea Exchange

Primary HeartBeat - Agent Ping Monitoring from OMi

The agent health monitoring done in OMi is not reactive enough to outages for us, and we needed to add an extra layer of monitoring for agents (called primary heartbeat in HPOM). Basically a simple ping on port 383 every 5 min.

We could script a ping monitor or use SiteScope with automatic template deployment, but it is a shame to do it ourselves when OMi could do it OOTB.

thanks,

Anthony
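
For readers who want to see what the "simple ping on port 383" from the post above amounts to, here is a minimal sketch in Python. It is not how OMi or HPOM implements the heartbeat; the function names, the 3-second timeout, and the idea of driving it from a 5-minute scheduler are assumptions made for illustration only.

```python
import socket


def tcp_port_open(host: str, port: int = 383, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 383 is the agent port mentioned in the thread; the timeout value
    is an arbitrary assumption.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_nodes(hosts, port: int = 383, timeout: float = 3.0) -> dict:
    """Probe every host once and return {host: reachable} (hypothetical helper)."""
    return {host: tcp_port_open(host, port, timeout) for host in hosts}
```

A scheduler (cron, or a loop sleeping 300 seconds) could call `check_nodes` every 5 minutes and raise an event for each host that maps to `False`.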

15 Comments
Outstanding Contributor...

Hi Anthony, thanks for the idea. Have you looked at the Agent Health View functionality and if yes, do you think adding that in OMi will address this need?

Regards,

Mamta

 Super Contributor...

Hi Anthony,

could you explain why the server health check is not usable for you, see

https://docs.microfocus.com/itom/Operations_Bridge_Manager:10.64/OMi/AdminGuide/HealthCheck/health_check ?

Agent & server health check

If the server does not receive an event from an agent within the configured interval, the server can generate an agent problem event or, if Agent & Server checking is configured, can actively check the status of the agent before generating an agent problem event.

The server attempts to make two checks using HTTP connections to the agent. The first check opens a socket connection to verify that the control daemon (ovcd) and communication broker (ovbbccb) are running.

It actually does a ping to 383 as you requested.

Volker

Acclaimed Contributor.
Volker, the Agent & Server health check is not reliable for this purpose because it still works by pinging the OA agent on port 383. If the OA agent is not running, it will certainly alert, but that alert cannot be used as a source for server availability, since the server itself is still running. Basically, we need a ping and system uptime monitor like the one in NNMi to determine server availability, which is not provided even in the latest version of OMi.
 Honored Contributor...
@volker Agent health is really not reliable when we need real server availability. Usually we want alerts sent to the respective server owners when their server is not reachable (a basic ping does the job). The same alerts will be sent if the agent stops working (and we all know that agents are really hard to maintain). Out of 1000 agents, at least 100 will have some kind of issue. To be honest, we need a simple ping monitor that OMi can run itself against all monitored nodes, reporting something similar to SiteScope.
 Honored Contributor..

Hello,
Agent Health View is great, but it is not in OMi by default and does not serve the same purpose.
We basically define two types of heartbeat: a primary heartbeat (ping; some teams are sensitive to this for critical environments) and a secondary heartbeat (from the agent, checking port 383). The secondary one is less critical as it is "only" an agent issue, so we allow some delay before generating this event.

So today we build the primary heartbeat ourselves with scripts, but having a small NNM, SiteScope, or internal script built into OMi would do a much better job (and be supported :)
cheers,
Anthony

Outstanding Contributor...
Status changed to: Waiting for Votes
 
 Frequent Contributor..

The ping (availability) check is required.

This will also help in getting an availability dashboard in BvD without using SiteScope or a custom script.

Trusted Contributor.

Yes, indeed, we need 2 types of check:

(1). Agent not working

(2). Server not pingable.

A generic statement that either the server or the agent may be down does not help pinpoint the exact issue.

Hope to get a solution on this
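
The two-way distinction requested above can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `NodeState` and `classify` are hypothetical, and the two inputs stand for the results of whatever probes are in use (for example an ICMP ping and a TCP connect to port 383). Nothing here reflects an existing OMi feature.

```python
from enum import Enum


class NodeState(Enum):
    OK = "ok"
    AGENT_DOWN = "agent not working"
    SERVER_UNREACHABLE = "server not pingable"


def classify(host_reachable: bool, agent_port_open: bool) -> NodeState:
    """Map two probe results to one of the states from the list above."""
    if agent_port_open:
        # A successful connect to the agent port implies the host is
        # reachable, even if ICMP happens to be filtered.
        return NodeState.OK
    if host_reachable:
        # The server answers, only the agent port does not.
        return NodeState.AGENT_DOWN
    return NodeState.SERVER_UNREACHABLE
```

With this split, an event for `AGENT_DOWN` could go to the monitoring team while `SERVER_UNREACHABLE` goes to the server owner.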

Super Contributor.

I support the proposal for server and agent differentiation.

But for agent health, I think the best idea is to add Agent Health View to OMi (OBM). At the moment it is not integrated and does not support SSO.
A second idea for AHV is to change the agent connection from REST to a BBC connection. AHV does not support RCP environments correctly.

Super Contributor.

We would like to expand on this thought a bit more. TCP connections to 383 are important, although I would like to know that the 383 request is actually processed in some way. The old method of using ICMP to back up the agent health ping is no longer a viable verification, since virtualization and/or offloading of network functions to secondary processors can cause the ping to respond even when the server can't process data.
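
One way to go a step beyond a bare TCP connect is to require the remote process to complete a TLS handshake, which the kernel or a network offload device cannot do on its behalf. The sketch below is an assumption-laden illustration, not the agent's own check: it assumes the port speaks TLS and that a handshake without a client certificate can complete, which may not hold for every agent configuration. The function name and timeout are hypothetical.

```python
import socket
import ssl


def tls_handshake_ok(host: str, port: int = 383, timeout: float = 5.0) -> bool:
    """Return True only if the remote process completes a TLS handshake.

    Certificate verification is disabled on purpose: this probes liveness,
    not identity.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake; a hung process that only
            # accepts connections at kernel level will time out here.
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:
        # ssl.SSLError and socket timeouts are both OSError subclasses.
        return False
```

A listener that accepts the connection but never answers makes this return `False`, whereas a plain connect test would report the port as open.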

We would also like to suggest that when a monitored server is not found, or the agent sends a failed agent component message, a series of diagnostic steps is automatically initiated to diagnose the agent issue: first restart the component and, if that doesn't work, restart the agent. We have created an OO flow that does some of this, and if we can't talk to the agent, we do an SSH or RDP to the server to ensure that it is still up. Ops Bridge really needs a definitive way to show the UP/DOWN status of a server. UP means the server is processing data and responding to new tasks. DOWN means that it is hung, not responding, or can't spawn new tasks. We understand this is a very difficult thing to ask for, but I know Micro Focus has folks that can figure this out. :) I have faith in you!
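
The escalation described above can be sketched as a short routine. This is not the OO flow mentioned in the post, whose details are not given here. Every callable passed in is a hypothetical placeholder for a site-specific action (an agent check, a component restart, an agent restart, an SSH or RDP reachability test), and the ordering of the steps is an assumption.

```python
from typing import Callable


def diagnose(
    agent_ok: Callable[[], bool],
    restart_component: Callable[[], None],
    restart_agent: Callable[[], None],
    server_reachable: Callable[[], bool],
) -> str:
    """Run the escalation steps in order and return a short verdict."""
    if agent_ok():
        return "healthy"
    # Check the server out of band (e.g. SSH or RDP) before touching the agent.
    if not server_reachable():
        return "server down"
    restart_component()
    if agent_ok():
        return "recovered after component restart"
    restart_agent()
    if agent_ok():
        return "recovered after agent restart"
    return "agent down, server up"
```

Injecting the actions as callables keeps the decision logic testable without touching real servers.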