Friday, October 19, 2018

SCOM and Orchestrator Voice Notification Solution with Twilio and Automys




Cherry Picking SCOM alerts…

Problem:

Issue # 1 : Spam (and I don’t mean the canned pork meat)

Alert spam has been a challenge for anyone who uses SCOM. Finding the right balance between notifying IT folks about important issues and overwhelming them with alerts has proven difficult.

However, a large number of alerts generally means a big problem is brewing in your network.

Issue # 2 : Timing

Email and SMS text notifications can be very effective during working hours. After hours is a different matter altogether: most people put their devices down (as they should), so a large number of alerts or a network outage could be missed.

That is where Voice notification comes into play.

However, we wouldn’t want to wake up our IT Admin at 3:00 am for just any issue. That’s totally fine if you’re testing, or if you dislike the person. You know the one: always complaining about the number of alerts they get or, when issues do arise, complaining about why they didn’t get more alerts and when... (ahem), excuse me, back to the subject.


Solution: Voice notification, appropriately named 'Alert Cherry Picker'


The solution provided below is a combination of SCOM, Orchestrator and the third-party solutions Twilio and Automys.

Logic:

Orchestrator side:


  • A SCORCH Runbook monitors SCOM alerts and writes each alert’s details to a database.
  • A PowerShell script converts the alert’s UTC time to local time.
  • A SQL query is then run to check whether the same alert has been written to the database in the last x minutes. (Here you can customize it to monitor only critical alerts, alerts from a specific group, etc.)
  • If the alert meets the time and frequency criteria, an event is written to the Orchestrator server event log.


SCOM side:


  • A SCOM rule reads the Orchestrator server event log and generates another alert (Voice Notification Alert Trigger).
  • A SCOM Command notification channel, configured to listen for the above alert, is executed and places the phone call.


This is where the third party comes in.

Noah Stahl has done an amazing job integrating SCOM with the Twilio communication platform.

Follow the steps in the guide below to integrate SCOM with Twilio.

https://automys.com/library/asset/sms-voice-notifications-with-powershell-system-center-operations-manager-and-t

My contribution:

My contribution here is in the form of the Orchestrator Runbooks and the custom SCOM Management Pack, which perform the steps described in the logic section above.

Runbooks

The Trimmer Runbook




Activities:

Monitor Alert:
Part of the Operations Manager Integration Pack, this activity monitors all SCOM alerts.



‘The Times They Are a-Changin’ script:
Simple script to change the alert’s UTC time to local time:
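
The runbook's own script is PowerShell. Purely as an illustration of the same conversion, here is a minimal Python sketch. It assumes the alert time arrives as a 'YYYY-MM-DD HH:MM:SS' string in UTC; the function name and that input format are hypothetical and not taken from the runbook.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def utc_to_local(utc_text: str, tz: Optional[tzinfo] = None) -> datetime:
    """Convert a UTC timestamp string to local time.

    The 'YYYY-MM-DD HH:MM:SS' input format is an assumption of this sketch.
    When tz is None, the system's local time zone is used.
    """
    parsed = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S")
    aware_utc = parsed.replace(tzinfo=timezone.utc)  # mark the value as UTC
    return aware_utc.astimezone(tz)


# Example with a fixed UTC-4 offset so the result does not depend on the host.
utc_minus_4 = timezone(timedelta(hours=-4))
print(utc_to_local("2018-10-19 07:00:00", utc_minus_4))
```

With the fixed UTC-4 offset, 07:00 UTC becomes 03:00 local time, which is exactly the kind of hour the after-hours logic cares about.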




Write to Database:
This activity writes the alert’s details to the custom database. The AlertCherryPickerTable.sql file included in this solution creates the SQL table, which exactly matches the activity parameters.




To the Picker Invoke Runbook Control:
Passes alert’s details to the Picker Runbook, which determines if our thresholds are met.



The Picker Runbook




Activities:

Query Database if Alert Exists:
Feel free to modify the details. Here we query SQL to find another alert with the same name in the last 5 minutes.



SQL Query Result Greater Than or Equal to X Link:
This link determines whether the query results are passed to the next activity, which creates an event.
Add the minimum number of rows in the query result as the condition for passing to the next activity.
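
The query and the link condition together amount to "count matching rows in a time window, then compare against a minimum". Below is a rough Python sketch of that idea, using an in-memory SQLite database as a stand-in for the SQL Server database. The table name, the column names and the threshold of 3 are hypothetical; the real schema is the one in AlertCherryPickerTable.sql.

```python
import sqlite3
from datetime import datetime, timedelta


def count_recent_alerts(conn, alert_name, now, window_minutes=5):
    """Count rows with the same alert name raised inside the time window."""
    cutoff = now - timedelta(minutes=window_minutes)
    row = conn.execute(
        "SELECT COUNT(*) FROM AlertCherryPicker "
        "WHERE AlertName = ? AND TimeRaised >= ?",
        (alert_name, cutoff.isoformat(sep=" ")),
    ).fetchone()
    return row[0]


def should_call(conn, alert_name, now, min_rows=3, window_minutes=5):
    """Mirror the link condition: pass only when the count reaches min_rows."""
    return count_recent_alerts(conn, alert_name, now, window_minutes) >= min_rows


# Hypothetical table and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AlertCherryPicker (AlertName TEXT, TimeRaised TEXT)")
now = datetime(2018, 10, 19, 3, 0, 0)
sample = [
    ("Health Service Heartbeat Failure", now - timedelta(minutes=1)),
    ("Health Service Heartbeat Failure", now - timedelta(minutes=2)),
    ("Health Service Heartbeat Failure", now - timedelta(minutes=4)),
    ("Health Service Heartbeat Failure", now - timedelta(minutes=10)),
    ("Logical Disk Free Space is low", now - timedelta(minutes=1)),
]
conn.executemany(
    "INSERT INTO AlertCherryPicker VALUES (?, ?)",
    [(name, raised.isoformat(sep=" ")) for name, raised in sample],
)

print(should_call(conn, "Health Service Heartbeat Failure", now))
print(should_call(conn, "Logical Disk Free Space is low", now))
```

Timestamps are stored as text in one fixed format so that string comparison matches chronological order; a real SQL Server table would more likely use a datetime column.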




Create an Event:
An event is created in a Custom EventLog on the Orchestrator server.




SCOM Management Pack:

Management Pack ‘Mundo SCOM SCORH Runbook Event Log Monitoring’ contains the rule “Voice Notification Alert Trigger”.

The rule is disabled. Enable it for the Orchestrator server via an override.



Command Channel
Follow Noah’s steps on how to set up a channel. Below is what the end product looks like. Note the message type is Voice.



Alert Subscription
This is what triggers the command channel, which executes the script (Noah’s solution), which makes the call. That’s a lot…

It is very important to include the dummy subscriber, and I don’t mean your IT Admin… ba-dum-pump chsh!




Download: AlertCherryPicker.zip


Friday, October 21, 2016

SCOM 2012 R2 Console crashes due to Windows Updates



Issue:

After installing the Windows Updates for October 2016, the SCOM 2012 R2 Console started to crash.

Error message in Application Log


Log Name:      Application
Source:        Application Hang
Date:          10/21/2016 9:59:19 AM
Event ID:      1002
Task Category: (101)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      xxxxxxxxx
Description:
The program vsjitdebugger.exe version 10.0.30319.1 stopped interacting with Windows and was closed. To see if more information about the problem is available, check the problem history in the Action Center control panel.
 Process ID: 2758
 Start Time: 01d22ba33b5f3b2a
 Termination Time: 2
 Application Path: C:\Windows\system32\vsjitdebugger.exe
 Report Id: 8dce1c7e-9796-11e6-9422-d8d3856142dc
 Faulting package full name:
 Faulting package-relative application ID:

Solution:

Uninstall the following Windows Updates:

For Windows Server 2012, remove:

KB3185332 and KB3192393


For Windows Server 2012 R2, remove:

KB3185331 and KB3192392

Important Information from Microsoft

If you use update management processes other than Windows Update and automatically approve all Security updates classifications for deployment, note that both the Security Only Quality Update 3185332 and the Security Monthly Quality Rollup for the month 3192393 will be deployed. We recommend that you review your update deployment rules to ensure the desired updates are deployed. 

 

If you use update management processes other than Windows Update and automatically approve all Security updates classifications for deployment, note that both the Security Only Quality Update 3185331 and the Security Monthly Quality Rollup for the month 3192392 will be deployed. We recommend that you review your update deployment rules to ensure the desired updates are deployed.

Monday, June 22, 2015

Health Service Failure Runbook



Every time I apply a SCOM 2012 R2 update I go to the best source on the web, Kevin Holman’s Step by Step series.



Here is my experience: the issues I encountered and, of course, the solution.

After running all SCOM server-related updates as per Kevin’s document I was now ready to update all Managed Agents waiting to be updated in the Pending Management pane.

Half updated with no issues, but the other half (200+ agents) would generate the following error:


The Agent Management Operation Agent Install failed for remote computer xxxx
Install account: xxx\ScomAction
Error Code: 8007041D
Error Description: The service did not respond to the start or control request in a timely fashion.
Microsoft Installer Error Description:
For more information, see Windows Installer log file "C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\AgentInstall.LOG
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\AgentPatch.LOG
C:\Program Files\Microsoft System Center 2012 R2\Operations Manager\Server\AgentManagement\AgentLogs\MOMAgentMgmt.log" on the Management Server.




The key words here are in the error description: “The service did not respond to the start or control request in a timely fashion”.

When I opened the services on an affected agent, the Microsoft Monitoring Agent (HealthService) was indeed stopped.

That of course generated a “Health Service Heartbeat Failure” alert, and if the agent was a Cluster node or a Domain Controller, a plethora of other Critical alerts came with it.

The funny part is that SCOM had successfully updated the agent but failed to restart the Health Service. That presented a challenge, since nothing can be done from within the SCOM console to resuscitate the now greyed-out agent.

A quick fix is to start the Microsoft Monitoring Agent service remotely, but that is impractical across a 400+ agent population.



The Solution:


I created a SCORCH 2012 R2 Runbook to start the Microsoft Monitoring Agent service.




Under the hood:

1. The Runbook listens for the ‘Health Service Heartbeat Failure’ alert.
2. It pings the server to ensure it has not been shut down or rebooted.
3. If the ping fails, an Information alert is created, mainly so it won’t interfere with the ‘Failed to Connect to Computer’ Critical alert that is generated immediately after.
4. If the ping is successful, the information is passed to the next activity, which starts the Microsoft Monitoring Agent service.
5. As the last step, the Runbook closes the ‘Health Service Heartbeat Failure’ alert and writes ‘Closed by SCORCH’ in Custom Field 1 as a success stamp.
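
The branching in the Runbook can be sketched in a few lines of Python. This is only an illustration of the decision flow, not the Runbook itself: the ping and service-start steps are passed in as plain functions (in Orchestrator they are activities), and the function name, server names and result fields are all hypothetical.

```python
from typing import Callable

SUCCESS_STAMP = "Closed by SCORCH"


def handle_heartbeat_failure(
    server: str,
    ping: Callable[[str], bool],
    start_service: Callable[[str], None],
) -> dict:
    """Decide what to do with one 'Health Service Heartbeat Failure' alert."""
    if not ping(server):
        # Server is down or rebooting: raise an Information alert and stop.
        return {"server": server, "action": "information_alert", "custom_field_1": None}
    # Server answers ping: restart the agent service, then close the alert.
    start_service(server)
    return {"server": server, "action": "alert_closed", "custom_field_1": SUCCESS_STAMP}


# Stubs standing in for the real ping and service-start activities.
reachable = {"dc01": True, "sql02": False}
started = []

result_up = handle_heartbeat_failure("dc01", lambda name: reachable[name], started.append)
result_down = handle_heartbeat_failure("sql02", lambda name: reachable[name], started.append)
print(result_up)
print(result_down)
```

Keeping the ping and start steps as injected functions makes the flow easy to test in a lab before pointing it at real servers.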



Disclaimer: 
All software and information is provided “AS IS” with no warranties. Use at your own risk! Please test it in a Lab environment first!






Tuesday, June 9, 2015

AD User Attribute Changes Audit Report



The following SCOM 2012 R2 ACS report details attribute changes made to any Active Directory user.

The Challenge

Many of the reports included in the SCOM Audit Reports provide similar information. One example is located in Reporting>>Audit Reports>>DAC_-_Object_Attribute_Changes

However, there are thousands of AD attributes, including hundreds of user-related ones, which makes the above-mentioned report very convoluted. The sometimes cryptic attribute names, values and operation descriptions add to the complexity, making the report hard to read, especially for the non-technical people who are, most of the time, the recipients of SCOM reports.

Sample out-of-the-box report



The Solution:

The attached report focuses on AD User attributes displayed via Outlook which are a representation of LDAP fields and are by far the most commonly modified.

Most Common User Attributes




In my report I have replaced all AD User attributes with user-friendly names.


AD User Attribute Name        Friendly Name
displayname                   Display Name
givenname                     First Name
initials                      Initials
sn                            Last Name
mailNickname                  Email Alias
streetAddress                 Address
description                   Description
title                         Title
company                       Company
department                    Department
physicalDeliveryOfficeName    Office
msExchAssistantName           Assistant
telephoneNumber               Phone Number
l                             City
st                            State/Province
postalCode                    Zip/Postal Code
co                            Country/Region
thumbnailPhoto                Photo
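
As an illustration of the substitution, the mapping above can be written as a simple lookup table. The Python sketch below is not how the report itself is built (its query is not shown here); it only shows the idea. It assumes `l` and `postalCode` are the attribute names behind the City and Zip/Postal Code rows, and the function name is hypothetical.

```python
# Keys are lower-cased because LDAP attribute names are case-insensitive.
FRIENDLY_NAMES = {
    "displayname": "Display Name",
    "givenname": "First Name",
    "initials": "Initials",
    "sn": "Last Name",
    "mailnickname": "Email Alias",
    "streetaddress": "Address",
    "description": "Description",
    "title": "Title",
    "company": "Company",
    "department": "Department",
    "physicaldeliveryofficename": "Office",
    "msexchassistantname": "Assistant",
    "telephonenumber": "Phone Number",
    "l": "City",
    "st": "State/Province",
    "postalcode": "Zip/Postal Code",
    "co": "Country/Region",
    "thumbnailphoto": "Photo",
}


def friendly_name(attribute: str) -> str:
    """Return the friendly name, or the raw attribute name if unmapped."""
    return FRIENDLY_NAMES.get(attribute.lower(), attribute)


print(friendly_name("physicalDeliveryOfficeName"))
print(friendly_name("sn"))
```

Falling back to the raw name means an unmapped attribute still appears, just without a friendly label.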


Sample Report

Mundo SCOM AD User Attribute Changes Report

The report takes two parameters, each entered between two % signs: ‘User Name Contains’ (Affected User) and/or ‘Attribute Name Contains’ (Changed Attribute). Enter just %% to get all possible results.
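
The %value% syntax suggests SQL LIKE pattern matching, although the report's actual query is not shown here. Under that assumption, this small Python sketch with an in-memory SQLite table shows how the two filters would behave. The table, columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AttributeChanges (UserName TEXT, AttributeName TEXT)")
conn.executemany(
    "INSERT INTO AttributeChanges VALUES (?, ?)",
    [("jsmith", "title"), ("jsmith", "department"), ("mjones", "title")],
)


def find_changes(conn, user_contains="%%", attribute_contains="%%"):
    """Filter changes with LIKE patterns; '%%' alone matches every row."""
    return conn.execute(
        "SELECT UserName, AttributeName FROM AttributeChanges "
        "WHERE UserName LIKE ? AND AttributeName LIKE ? "
        "ORDER BY UserName, AttributeName",
        (user_contains, attribute_contains),
    ).fetchall()


print(find_changes(conn, "%smith%"))            # changes for one user
print(find_changes(conn, "%%", "%title%"))      # one attribute, all users
print(find_changes(conn))                       # everything
```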



Preparing the AD environment:

How to enable AD Object Auditing, Audit Policies or Advanced Audit Policies setup is out of the scope of this post. 

However, here is a quick description of what is needed in order to produce the report:
On your Domain Controllers, enable the ‘Directory Service Changes’ Audit Policy Subcategory, which is part of the Directory Service Audit Policy Category. Make sure to enable both.

AD object attribute changes are captured in Event ID 5136 (“A directory service object was modified”), which is part of the above Subcategory.

Enable auditing for all users via GPO, or manually for a small number of users.

For a single user go to ‘Advanced’ security setting, Auditing. Add ‘Write all properties’.


Disclaimer: 
All software and the information is provided “AS IS” with no warranties. Use at your own risk! Please test it in a Lab environment first!




Monday, April 13, 2015

BlackBerry BES 12 Management Pack for SCOM 2012 R2


Update:

Thanks all for your feedback.

For those asking for a customized MP: you can email me directly if you wish to "brand" this or any MP with your company name instead of "MundoSCOM".


This management pack is for monitoring BlackBerry Enterprise Server version 12.

This MP is designed for BES 12 servers that have been upgraded from BES version 5. This setup is done in order to manage BlackBerry legacy and BB10 devices.

(Link to download xml file below)

Console View:



Discoveries:

This management pack uses a seed class that searches for the following registry key:
HKLM\SOFTWARE\Wow6432Node\BlackBerry\BES12.


Monitored BES and BES12 Services:



NOTE: All Monitors are enabled by default. In a Cluster setup, monitors for services set to ‘Manual’ could be disabled in order to avoid alerts from the passive node when servers are rebooted.


Distributed Application:



Disclaimer: 
The Management Pack and the information is provided “AS IS” with no warranties. Use at your own risk!

Link:



Thursday, April 9, 2015

SCOM 2012 R2 Command Notification Channel using PowerShell Fails


This issue may be common, but I couldn't find any information about it on the various SCOM blogs, so I decided to post it.

Problem:
I added a second Management Server (MS) to my lab management group.
I have some Command channels, triggered by an event rule, that execute PowerShell scripts stored locally on the RMSe.
The triggering alert was generated successfully, but the PowerShell scripts were not executed.

Solution:

1. The new MS is automatically added to the “Notification Resource Pool”, so it needs the same scripts copied locally. (Use the same folder structure on both MS servers.)





2. The Notifications account needs appropriate permission on the folder/share where these scripts are stored to execute them.




3. To keep the script folders on both servers in sync, I created a batch file with a Robocopy command, which a Scheduled Task runs weekly.




       Command in Batch file (a single line):
       robocopy "C:\SCOM\ScriptFolder" "\\MS2\C$\SCOM\ScriptFolder" /E /ZB /X /PURGE /COPYALL /TEE /LOG:E:\Copy_from_HD_to_Ext_HD.log

       Meaning of the switches used in the above command:

       /E :: copy subdirectories, including Empty ones.

       /ZB :: use restartable mode; if access denied use Backup mode.

       /X :: report all eXtra files, not just those selected.

       /PURGE :: delete dest files/dirs that no longer exist in source.

       /COPYALL :: COPY ALL file info (equivalent to /COPY:DATSOU).

       /TEE :: output to console window, as well as the log file.

       /LOG:file :: output status to LOG file (overwrite existing log).



      Thanks to ITBloggerTips for posting this very useful Robocopy command!!

     http://itbloggertips.com/2013/05/robocopy-command-copy-only-new-changed-files-sync-both-the-drive/


Disclaimer:
The information is provided “AS IS” with no warranties. Use at your own risk!

