Service support is about responding to change. The needs of users change. Configurations change. Unexpected events occur. The specific details of these changes will vary, but how systems managers respond should not. A set of well-defined processes is at the core of service-oriented systems management. Those processes -- incident management, problem management, configuration management, change management, and release management -- are discussed in detail later in this chapter. In this section, the focus is on the shared attributes and interdependencies of these processes.
Interdependent Service Support Processes
Service support processes have different specific goals depending on the area of service delivery they address. For example, in incident management, something has disrupted the normal flow of operations, and the goal is to restore normal services as soon as possible. Problem management, in contrast, takes a more holistic approach and attempts to prevent incidents by detecting patterns in incidents and identifying root causes. Even with their different goals, service support processes are highly interdependent.
In some cases, those root causes could be the result of improperly configured devices, in which case, configuration management processes must be examined. Were they applied properly? Is there a deficiency in the process that allowed a flawed configuration to enter production service? Perhaps the correct configuration specifications had been defined but they were not properly installed; in that case, the release management procedures require review.
Figure 5.1: Service support processes are interdependent and can support or trigger each other.
It is clear from these simple examples that information relevant to one process can be vital to the proper implementation of other processes. It is also clear that errors in one set of procedures can have ripple effects that cause other processes to be activated. For both of these reasons, automated configuration management services can improve service support.
Automated Configuration Management
Automated configuration management has two objectives. The first is to obtain and maintain information about the state of devices and applications deployed throughout the IT infrastructure. The second is to support multiple IT operations, especially service support.
To meet these objectives, automated configuration management applications use several modules, including:
- Agents for collecting configuration and other status information
- Centralized data repository
- Process flow support
- Information retrieval
Together, these modules provide the core services of automated configuration management.
Data Collection Procedures
Data is gathered from devices using agents, which are applications that collect information locally and transmit it to a central repository. These agents should be relatively lightweight and autonomous; once installed and configured, they should require little systems manager intervention.
The configuration processes entail setting a number of characteristics, including:
- Data collection policies
- Frequency of data collection
- Data transmission information
- Authentication mechanism
The data collection policies define what information is gathered and how frequently it is transmitted. The information gathered can include local security policy settings, storage utilization, significant system events, and other audit-related details.
The frequency of data collection will determine how often the agent sends information to the central repository. There is a tradeoff with this setting. Devices that frequently update the central repository are less likely to have outdated data, but the data collection process places additional demands that can adversely impact the performance of other applications.
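The four agent settings listed above can be pictured as a small configuration record. The following is a minimal sketch only; the class name, field names, and example values are assumptions made for illustration and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical settings for one collection agent (illustrative names)."""
    collect: Tuple[str, ...]   # data collection policy: what to gather
    interval_seconds: int      # frequency of data collection
    repository_url: str        # data transmission information
    auth_key_id: str           # authentication mechanism (reference to a key)

    def reports_per_day(self) -> int:
        """Load side of the tradeoff: transmissions per device per day."""
        return 86_400 // self.interval_seconds

    def max_staleness_seconds(self) -> int:
        """Freshness side: worst-case age of repository data for this device."""
        return self.interval_seconds


# Example values are assumptions, not recommendations.
config = AgentConfig(
    collect=("security_policy", "storage_utilization", "system_events"),
    interval_seconds=900,
    repository_url="https://cmdb.example.internal/ingest",
    auth_key_id="agent-key-01",
)
```

With a 15-minute interval, each device reports 96 times a day and its repository record is at most 15 minutes old. Halving the interval halves the staleness and doubles the collection load.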
When depending on agents, it is important for the central repository to accept data only from authenticated agents. Distributed applications such as these are vulnerable to spoofing -- that is, an attacker or an attacker's program pretending to be the real agent. An attacker, for example, might want to cover his or her tracks by sending false information about failed login attempts or the amount of disk space in use. By using cryptographic techniques, such as digitally signing all transmissions, the repository can detect forged or altered reports and significantly reduce the chance of a successful spoofing attack.
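The idea of rejecting unauthenticated reports can be sketched as follows. Note the substitution: this example uses an HMAC with a shared secret from the Python standard library rather than a true digital signature, because asymmetric signatures would need a third-party library. The function names and report fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_report(key: bytes, report: dict) -> dict:
    """Agent side: serialize the report and attach an HMAC-SHA256 tag."""
    payload = json.dumps(report, sort_keys=True)
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_report(key: bytes, message: dict) -> bool:
    """Repository side: recompute the tag and compare in constant time."""
    expected = hmac.new(
        key, message["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


# Illustrative use with a made-up key and report.
shared_key = b"example-shared-secret"
message = sign_report(shared_key, {"device": "pc-01", "failed_logins": 3})
accepted = verify_report(shared_key, message)
```

A report whose payload is altered in transit, or one produced without the key, fails verification and can be discarded. Where agents should not hold a secret that the repository also holds, an asymmetric signature scheme is the closer match to the digital signing described above.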
Centralized Data Repository
A centralized data repository for configuration management is one that supports multiple functions related to service delivery, including managing configurations, which, in turn, support both service delivery and security enforcement. Although the basic role of the repository is to answer queries about the state of devices, it must be designed to support queries from multiple domains. For example, from an incident management perspective, the database might be queried about the software installed on a particular device and the dependencies between those applications. This is useful in cases in which a newly installed application is not working properly but works correctly on other similarly configured clients. In such a case, one of the first questions to answer is: What are the differences between clients with a working installation of the application and clients without?
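The question of what differs between a working and a non-working client can be sketched as a comparison of two software inventories. This is a minimal illustration that assumes the repository can return each device's installed software as a name-to-version mapping; the function name and sample data are hypothetical.

```python
def diff_installed(working: dict, failing: dict) -> dict:
    """Compare two {package_name: version} inventories."""
    shared = set(working) & set(failing)
    return {
        "only_on_working": sorted(set(working) - set(failing)),
        "only_on_failing": sorted(set(failing) - set(working)),
        "version_mismatch": sorted(
            name for name in shared if working[name] != failing[name]
        ),
    }


# Made-up inventories for two similarly configured clients.
working_client = {"browser": "7.0", "jre": "1.5.0", "plugin": "2.1"}
failing_client = {"browser": "7.0", "jre": "1.4.2", "toolbar": "1.0"}
differences = diff_installed(working_client, failing_client)
```

Each of the three lists is a candidate explanation for the failure: a missing component, an extra component, or a component at a different revision level.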
In the case of problem management, support personnel may discover that a particular version of a browser add-in causes parts of an application interface to fail. They may also find that rolling back to an earlier version of the add-in resolves the problem. In this case, the configuration database could be used to identify all devices that both have the problematic add-in installed and need to run the thin-client application. After the correct add-in version has been deployed, the database can be queried to verify installation (assuming agents have updated the repository). These are relatively simple examples; other more complex issues may require multi-step procedures.
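The problem management query described above, and the follow-up verification, might look like this in outline. The inventory layout, function name, and device names are assumptions made for illustration; a real configuration database would expose this through its own query interface.

```python
def affected_devices(inventory, component, bad_version, required_app):
    """Return devices that have the bad component version AND need the app."""
    return sorted(
        device
        for device, record in inventory.items()
        if record["software"].get(component) == bad_version
        and required_app in record["applications"]
    )


# Hypothetical repository contents, keyed by device name.
inventory = {
    "pc-01": {"software": {"addin": "2.0"}, "applications": {"thin-client"}},
    "pc-02": {"software": {"addin": "2.0"}, "applications": {"mail"}},
    "pc-03": {"software": {"addin": "1.9"}, "applications": {"thin-client"}},
}
targets = affected_devices(inventory, "addin", "2.0", "thin-client")

# After the rollback, agents update the repository and the same query verifies it.
inventory["pc-01"]["software"]["addin"] = "1.9"
remaining = affected_devices(inventory, "addin", "2.0", "thin-client")
```

Running the same query before and after deployment turns verification into a check that the result set is empty.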
Process Flow Support
Configuring IT devices often entails dependencies between components. Mechanisms for supporting process flow can help control procedures that must be aware of those dependencies.
Consider requirements for rolling out a Web-based application that uses Java-based technologies in clients' browsers. In addition to updating client browsers with the latest security patches, the release requires that the JRE is installed to a particular revision level. Once the browser is patched and the JRE installed, a plug-in must be installed within the browser as well. Each of these steps must be done in sequence and if one step fails, the succeeding steps should not occur. The results of the installation must be verifiable.
A process flow engine within a configuration management system could meet these requirements if it supports:
- Ordered deployment of modules
- Tests for success of each step
- Conditional processing -- for example, if the browser does not contain a particular patch, it is installed; otherwise, it is not
- Detailed logs of each step
Logged information about the deployment process should be available by querying the configuration management database (CMDB).
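The four capabilities listed above can be sketched as a very small process flow engine. This is an illustration under stated assumptions, not a description of any product: the Step structure, the device dictionary, and the JRE revision string are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple


class Step(NamedTuple):
    name: str
    needed: Callable[[dict], bool]   # conditional processing
    apply: Callable[[dict], None]    # deployment action
    verify: Callable[[dict], bool]   # test for success of the step


def run_flow(steps: List[Step], device: dict) -> List[Tuple[str, str]]:
    """Run steps in order, stop at the first failure, and log every outcome."""
    log = []
    for step in steps:
        if not step.needed(device):
            log.append((step.name, "skipped"))
            continue
        step.apply(device)
        if not step.verify(device):
            log.append((step.name, "failed"))
            break  # succeeding steps must not run
        log.append((step.name, "ok"))
    return log


REQUIRED_JRE = "1.5.0"  # assumed revision level, for illustration only

ROLLOUT = [
    Step("browser patch",
         lambda d: not d["browser_patched"],
         lambda d: d.update(browser_patched=True),
         lambda d: d["browser_patched"]),
    Step("JRE",
         lambda d: d["jre"] != REQUIRED_JRE,
         lambda d: d.update(jre=REQUIRED_JRE),
         lambda d: d["jre"] == REQUIRED_JRE),
    Step("plug-in",
         lambda d: not d["plugin"],
         lambda d: d.update(plugin=True),
         lambda d: d["plugin"]),
]
```

Calling run_flow(ROLLOUT, device) on a client whose browser is already patched skips the first step, installs the JRE and the plug-in in order, and returns a log that could be stored in the repository for later queries.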
Information Retrieval
Information retrieval sounds trivial -- you simply want to display data that is stored in a database. What is not trivial is precisely specifying what data it is that you want displayed. At one end of the information retrieval spectrum, there are query languages used by database developers and the occasional power user. Even for relatively simple queries, this is not a reasonable tool for most users. Consider the following query: a systems manager wants to list all resource associations, the associated resource type, the name of the resource, and a brief description, sorted by resource type. The corresponding database query would look something like (the details depend on the database structure, but the example holds for a typical normalized relational database):
SELECT ra.resource_assoc_name,
       rt.resource_assoc_type_name,
       rt.resource_type_name,
       r.resource_name,
       r.resource_descr
FROM resources r, resource_type rt, resource_associations ra
WHERE r.resource_id = ra.resource_id
  AND r.resource_type_id = rt.resource_type_id
ORDER BY rt.resource_type_name
Query languages are not practical tools for most users working with CMDBs -- they require an understanding of the underlying data model and knowledge of the database query language, typically a variation on ANSI standard SQL. However, query languages are quite flexible, and with the right query, one can find anything that is in the database. Static reports lie at the other end of the information retrieval spectrum. They require no knowledge of the implementation details of the database, but they are limited in their usefulness. Static reports provide information about a limited amount of data and typically represent designers' and developers' best guess at what information a systems manager will need.
Between the two extremes lie parameterized reports. They provide some of the flexibility of query languages along with some of the ease of use of static reports. Properly configured, these reports can help guide users to the information they need (see Figure 5.2 for an example).
Figure 5.2: Information retrieval from complex data structures should use a combination of search and guided querying.
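A parameterized report can be sketched as a fixed query with one user-supplied value. The example below uses Python's built-in sqlite3 module and a tiny in-memory database; the table layout is an assumed simplification of the schema implied by the earlier query, and the function names and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE resource_type (
            resource_type_id INTEGER PRIMARY KEY,
            resource_type_name TEXT
        );
        CREATE TABLE resources (
            resource_id INTEGER PRIMARY KEY,
            resource_type_id INTEGER,
            resource_name TEXT,
            resource_descr TEXT
        );
        INSERT INTO resource_type VALUES (1, 'server');
        INSERT INTO resource_type VALUES (2, 'router');
        INSERT INTO resources VALUES (10, 1, 'db01', 'Database host');
        INSERT INTO resources VALUES (11, 2, 'edge01', 'Edge router');
        INSERT INTO resources VALUES (12, 1, 'app01', 'Application host');
    """)
    return conn


def resources_of_type(conn: sqlite3.Connection, type_name: str) -> list:
    """Parameterized report: the user picks a type, the query stays fixed."""
    cursor = conn.execute(
        """
        SELECT r.resource_name, r.resource_descr
        FROM resources r
        JOIN resource_type rt ON r.resource_type_id = rt.resource_type_id
        WHERE rt.resource_type_name = ?
        ORDER BY r.resource_name
        """,
        (type_name,),  # bound as a value, never spliced into the SQL text
    )
    return cursor.fetchall()


conn = build_demo_db()
servers = resources_of_type(conn, "server")
```

The user supplies only the resource type, typically from a drop-down list, and never sees the join or the data model. Binding the value through a placeholder also keeps user input out of the SQL text.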
Automated configuration management tools provide several mechanisms important for efficient service support, including a centralized data repository, automated data collection, support for process flow, and flexible reporting. The following sections describe how automated configuration management can support the particular requirements of several service support areas.
The above tip is excerpted from Chapter 5, "Implementing System Management Services, Part 1: Deploying Service Support" of The Definitive Guide to Service-Oriented Systems Management by Dan Sullivan. Get a copy of this ebook at Realtime Publishers.
About the author: Dan Sullivan is Chief Technology Officer of Redmont Corporation. Dan's 17 years of IT experience include engagements in enterprise content management, data warehousing, database design, natural language processing and artificial intelligence. Dan has developed significant expertise in all phases of the system development lifecycle and in a broad range of industries, including financial services, manufacturing, government, retail, gas and oil production, power generation, and education. In addition to authoring various books, articles and columns, Dan is the leader of The Realtime Messaging and Web Security Community where he posts to his Messaging and Web Security weblog and produces his expert podcast.