Every time I apply a SCOM 2012 R2 update, I go to the best source
on the web: Kevin Holman’s Step by Step series.
This post covers my own experience: the issues I encountered and, of
course, the solution.
After applying all the SCOM server-side updates as per Kevin’s
document, I was ready to update the managed agents waiting in
the Pending Management pane.
Half of them updated with no issues, but the other half (200+
agents) failed with the following error:
The Agent Management Operation Agent Install failed for remote computer xxxx
Install account: xxx\ScomAction
Error Code: 8007041D
Error Description: The service did not respond to the start or control request in a timely fashion.
Microsoft Installer Error Description:
For more information, see Windows Installer log file "C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\AgentInstall.LOG
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\AgentPatch.LOG
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\MOMAgentMgmt.log" on the Management Server.
The key phrase here is in the error description: “The service
did not respond to the start or control request in a timely fashion.”
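As a quick sanity check, the error code can be split into its standard HRESULT fields. The sketch below is only an illustration (the function name decode_hresult is my own, not part of SCOM): it shows that 8007041D is a failure code in the Win32 facility (7) wrapping Win32 error 1053, which is the service request timeout that the error description spells out.

```python
def decode_hresult(code_hex):
    """Split a 32-bit HRESULT, given as a hex string, into its standard fields."""
    value = int(code_hex, 16)
    return {
        "failure": bool(value >> 31),        # top bit set means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility; 7 is FACILITY_WIN32
        "code": value & 0xFFFF,              # low 16 bits carry the Win32 error code
    }


parts = decode_hresult("8007041D")
print(parts)  # the code field is 1053 (ERROR_SERVICE_REQUEST_TIMEOUT)
```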
When I opened the Services console on an affected agent, the
Microsoft Monitoring Agent (HealthService) service was indeed stopped.
That, of course, generated a “Health Service Heartbeat Failure”
alert, and if the agent was a Cluster node or a Domain Controller, a plethora
of other Critical alerts came with it.
The funny part is that SCOM had successfully updated the
agent but failed to restart the Health Service. That presented a challenge,
since there is nothing that can be done from within the SCOM console to resuscitate
the now greyed-out agent.
A quick solution is to remotely start the Microsoft
Monitoring Agent service, but that is impractical on a 400+ agent population.
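For a handful of servers, the remote start can be scripted. The following is a minimal sketch, not what I used in production: it assumes it runs on a Windows machine under an account with rights to control services on the targets, and it relies on the built-in sc.exe tool. The helper names build_start_command and start_agents are hypothetical.

```python
import subprocess

SERVICE_NAME = "HealthService"  # service name of the Microsoft Monitoring Agent


def build_start_command(host):
    """Return the sc.exe command line that asks a remote host to start the service."""
    # sc.exe expects the target as \\hostname
    return ["sc.exe", "\\\\" + host, "start", SERVICE_NAME]


def start_agents(hosts, runner=subprocess.run):
    """Issue the start request to each host; return the hosts where sc.exe failed."""
    failed = []
    for host in hosts:
        result = runner(build_start_command(host), capture_output=True, text=True)
        # A zero exit code only means the start request was accepted,
        # not that the service has finished starting.
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling start_agents(["server01", "server02"]) (placeholder host names) would return the list of servers that still need attention. It works, but it does not check whether the server is reachable or tidy up the alerts, which is why I went with a Runbook instead.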
The Solution:
I created a SCORCH 2012 R2 Runbook to start the Microsoft
Monitoring Agent service.
Under the hood:
The Runbook listens for the ‘Health Service Heartbeat
Failure’ alert.
It then pings the server to make sure it has not been shut down or
rebooted.
If the ping fails, an Information
alert is created, mainly so it won’t interfere with the ‘Failed to Connect to
Computer’ Critical alert that is generated immediately after.
If the ping is successful, the information is passed to the
next activity, which starts the Microsoft Monitoring Agent service.
As the last step, the Runbook closes the ‘Health Service Heartbeat Failure’
alert and writes ‘Closed by SCORCH’ in custom field 1 as a success stamp.
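The Runbook itself is built from SCORCH activities, not code, but its decision flow can be sketched in a few lines. The Python below only illustrates the logic described above; the Alert class and the ping, start_service and raise_info_alert callables are hypothetical stand-ins for the corresponding Runbook activities, not SCOM or SCORCH APIs.

```python
from dataclasses import dataclass

HEARTBEAT_ALERT = "Health Service Heartbeat Failure"


@dataclass
class Alert:
    """Hypothetical stand-in for the SCOM alert the Runbook receives."""
    name: str
    computer: str
    resolved: bool = False
    custom_field_1: str = ""


def handle_alert(alert, ping, start_service, raise_info_alert):
    """Mirror the Runbook flow; the three callables stand in for its activities."""
    if alert.name != HEARTBEAT_ALERT:
        return "ignored"                      # the Runbook only reacts to this alert
    if not ping(alert.computer):
        raise_info_alert(alert.computer)      # server is down or rebooting
        return "unreachable"
    start_service(alert.computer)             # start the Microsoft Monitoring Agent
    alert.resolved = True                     # close the heartbeat alert
    alert.custom_field_1 = "Closed by SCORCH"
    return "restarted"
```

Keeping the ping check ahead of the service start is the important design choice: a server that is genuinely down gets only an Information alert, and the heartbeat alert is left open for a human to look at.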
Disclaimer:
All software and information is provided “AS IS” with no warranties. Use at your own risk! Please test it in a Lab environment first!