Problem management is focused on reducing incidents and their impact on an organization's operations. Problem management and incident management, although tightly coupled, differ in several ways:
- A problem is the underlying cause for multiple disruptions; an incident is one of those disruptions.
- Problem management addresses the underlying cause of multiple incidents; incident management entails responding to an instance of disrupted operations caused by a problem.
- Problem management attempts to detect and address root causes of problems; incident management attempts to restore normal operating functions, possibly without fully correcting the underlying cause.
Problem management depends on data from multiple incidents, so a CMDB and incident repository can support the investigation and analysis of root causes. For example, if an end-user application repeatedly crashes on some but not all client devices, the CMDB can be used to determine what the affected devices have in common that the unaffected devices lack.
Figure 5.3: CMDBs can help to rapidly identify common characteristics of devices affected by an incident, thus supporting root cause analysis and problem management.
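The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CMDB is modeled as a plain dictionary of device attributes, and the device names, attribute names, and values are hypothetical rather than taken from any real CMDB product.

```python
# Hypothetical CMDB extract: device ID -> configuration attributes.
# All names and values here are invented for illustration.
CMDB = {
    "pc-01": {"os": "Windows XP SP2", "video_driver": "2.1", "antivirus": "9.0"},
    "pc-02": {"os": "Windows XP SP2", "video_driver": "2.1", "antivirus": "9.1"},
    "pc-03": {"os": "Windows XP SP2", "video_driver": "1.8", "antivirus": "9.0"},
    "pc-04": {"os": "Windows 2000", "video_driver": "1.8", "antivirus": "9.1"},
}

# Devices on which the application crashes (taken from incident records).
AFFECTED = {"pc-01", "pc-02"}


def distinguishing_attributes(cmdb, affected_ids):
    """Return (attribute, value) pairs shared by every affected device
    and not present on any unaffected device."""
    affected = [set(attrs.items()) for dev, attrs in cmdb.items() if dev in affected_ids]
    unaffected = [set(attrs.items()) for dev, attrs in cmdb.items() if dev not in affected_ids]
    if not affected:
        return set()
    # Attributes that every affected device has in common.
    common = set.intersection(*affected)
    # Attributes that appear on at least one unaffected device.
    seen_elsewhere = set().union(*unaffected)
    return common - seen_elsewhere


if __name__ == "__main__":
    print(distinguishing_attributes(CMDB, AFFECTED))
```

With this sample data, the only pair that survives is the video driver version, which would make it a candidate for root cause investigation. A shared attribute is a lead rather than proof, so the result still needs to be confirmed by testing.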
Once the cause of a problem is identified and a solution developed, the problem and solution should be documented for future reference. Even if the identical problem is not likely to occur again -- for example, all servers are patched for a known vulnerability -- the solution description may help to solve other somewhat similar problems.
Another part of problem management, and closely related to incident management, is trend analysis. The function of trend analysis is to determine the frequency of particular types of problems and determine which, if any, incident types are increasing. Trend analysis can lead to introducing new methods or devices. For example:
- An increasing number of password resets, coupled with the cost of staffing help desks, can create a cost justification for a self-service password reset system.
- Rapid growth in email storage requirements may justify the use of a network appliance to filter spam.
- Discovery of an increasing number of conflicts between newly deployed applications and legacy applications can lead to changes in software testing methodology.
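As a rough illustration of the trend analysis described above, the following Python sketch counts incidents per category per month and flags categories whose counts are rising. The incident records, category names, and the `min_slope` threshold are all hypothetical. A least-squares slope is used here as one simple measure of growth, not as a prescribed method.

```python
from collections import Counter, defaultdict

# Hypothetical reporting periods and incident log of (month, category) pairs.
MONTHS = ["2007-01", "2007-02", "2007-03"]
INCIDENTS = (
    [("2007-01", "password_reset")] * 2
    + [("2007-02", "password_reset")] * 4
    + [("2007-03", "password_reset")] * 6
    + [(month, "printer_jam") for month in MONTHS for _ in range(3)]
)


def monthly_counts(incidents):
    """Group incidents into per-category counters keyed by month."""
    counts = defaultdict(Counter)
    for month, category in incidents:
        counts[category][month] += 1
    return counts


def slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def increasing_categories(incidents, months, min_slope=1.0):
    """Return {category: slope} for categories growing by at least
    min_slope incidents per month (an assumed, tunable threshold)."""
    trends = {}
    for category, by_month in monthly_counts(incidents).items():
        series = [by_month[m] for m in months]  # missing months count as 0
        growth = slope(series)
        if growth >= min_slope:
            trends[category] = growth
    return trends


if __name__ == "__main__":
    print(increasing_categories(INCIDENTS, MONTHS))
```

In this sample, password resets grow by two incidents per month while printer jams stay flat, so only the former is flagged. In practice the threshold and the length of the reporting window would need tuning to the organization's incident volume.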
Trend analysis in itself does not solve problems, but it identifies categories of problems that are growing in severity or frequency. One general problem that can have ripple effects throughout an IT infrastructure is configuration management errors.
The above tip is excerpted from Chapter 5, "Implementing System Management Services, Part 1: Deploying Service Support" of The Definitive Guide to Service-Oriented Systems Management by Dan Sullivan. Get a copy of this ebook at Realtime Publishers.
About the author: Dan Sullivan is Chief Technology Officer of Redmont Corporation. Dan's 17 years of IT experience include engagements in enterprise content management, data warehousing, database design, natural language processing and artificial intelligence. Dan has developed significant expertise in all phases of the system development lifecycle and in a broad range of industries, including financial services, manufacturing, government, retail, gas and oil production, power generation, and education. In addition to authoring various books, articles and columns, Dan is the leader of The Realtime Messaging and Web Security Community, where he posts to his Messaging and Web Security weblog and produces his expert podcast.
This was first published in February 2007