With the advent of solid-state drives (SSDs) and their lightning speeds, storage system performance is becoming an increasingly important topic. While SSDs will likely improve storage performance in almost every situation, to get the full benefit of the technology, everything else between the server and the storage must run at peak efficiency as well.
The first step in diagnosing storage network performance problems is to understand the I/O capabilities of the interface card installed in the performance-challenged host. To do this, measure the storage bandwidth—both peak and average—that the server is using and compare it with what the card is capable of delivering. If peak utilization comes anywhere close to the card's theoretical bandwidth, the storage network is a candidate for a hardware upgrade.
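As a rough illustration, that comparison reduces to a couple of lines of arithmetic. The function names and the 80% threshold below are our own illustrative choices, not part of any vendor tool; in practice the byte counters would come from the operating system's interface statistics.

```python
def utilization(bytes_start, bytes_end, seconds, link_gbps):
    """Average link utilization (0.0-1.0) over a sampling interval,
    computed from two raw byte-counter readings."""
    bits_sent = (bytes_end - bytes_start) * 8
    return bits_sent / (seconds * link_gbps * 1e9)

def needs_upgrade(peak_util, threshold=0.8):
    """Illustrative rule of thumb: peak utilization near line rate
    suggests the card (and network) are candidates for an upgrade."""
    return peak_util >= threshold
```

For example, moving 125 MB in one second over a 1 Gbps link works out to 100% utilization, which would clearly flag the card.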
This upgrade can be done in one of two ways. The most obvious and traditional option is to install a faster card and a faster network infrastructure, at least for the servers that have the performance problem.
Another, newer option is to reduce the amount of traffic on the network with server-based solid-state caching. This technology—as we discuss in our article “What is server-based solid-state caching?”—lowers the amount of data transmitted across the network by keeping the most active segments of that server’s data on high-speed SSD in the server. Since it is a cache, the flash management is handled in an automated way, without user interaction. For environments with just a few servers to accelerate, this option may be less expensive than an entire network upgrade, and it may deliver better performance.
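To illustrate the principle (hot blocks served from local flash, cold blocks fetched over the network), here is a toy least-recently-used cache sketch. Real caching products are far more sophisticated, and every name in this example is invented for illustration.

```python
from collections import OrderedDict

class HotBlockCache:
    """Toy LRU cache: recently used blocks stay on fast local media,
    the coldest block is evicted when capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, block_id, fetch_from_array):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # hit: served from local SSD
            return self.blocks[block_id]
        data = fetch_from_array(block_id)       # miss: cross the network
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the coldest block
        return data
```

The point of the sketch is the automation the article mentions: the cache decides what lives on flash, with no user interaction, and every hit is traffic that never touches the storage network.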
Storage network performance analysis becomes more challenging if the network card doesn't appear to be the bottleneck. While the default next step is to blame the storage system, there are more network stones to overturn in your pursuit of the source of the problem. To confirm that you really do have a network problem (or, conversely, to prove that the problem stems from the storage system), look at the disk queues on the storage system. If disk queues are low (4 or below), disk IOPS are not consistently high, and server CPU utilization is low (less than 50%), the performance problem more than likely stems from the storage network. The CPU is waiting on something; it just is not the storage system.
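Those rules of thumb can be captured in a few lines. The thresholds are the ones quoted above; the function itself is only an illustrative sketch, not a diagnostic tool.

```python
def likely_bottleneck(disk_queue_depth, cpu_util_pct, iops_high):
    """Apply the article's heuristics: low disk queues (<= 4), IOPS not
    consistently high, and CPU under 50% point away from the storage
    system and toward the storage network."""
    if disk_queue_depth <= 4 and not iops_high and cpu_util_pct < 50:
        return "storage network"
    if disk_queue_depth > 4 or iops_high:
        return "storage system"
    return "inconclusive"
```

A host showing a queue depth of 2, 30% CPU, and modest IOPS would be classified as network-bound; deep queues or sustained high IOPS shift suspicion back to the array.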
If the above process indicates a storage network performance problem, it’s time to turn your attention to the adapter type. The use of software iSCSI—iSCSI used in conjunction with a standard (and often cheap) Ethernet card—can cause the server to spend too much time processing IP-to-SCSI conversions.
To address this problem, there are three options: upgrade the server's processor so that it can perform the iSCSI conversion more quickly; upgrade to a card that offloads the iSCSI traffic; or switch to a technology, such as ATA over Ethernet (AoE) or Fibre Channel over Ethernet (FCoE), that does not require the IP-to-SCSI conversion yet still works with standard high-quality Ethernet cabling.
Another area to examine in storage network performance analysis is the quality of the cabling. In some cases, faulty cables or cables of improper length can hinder performance, typically by introducing errors that force transmission retries. The best way to measure this is on-the-wire physical analysis through some sort of network tap solution that can report packet loss in real time. While it may seem like a lot of extra effort, it can pay off by saving days' worth of troubleshooting.
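Short of a dedicated tap, a rough first look on a Linux host is the TCP retransmission counters (the `OutSegs` and `RetransSegs` fields on the `Tcp:` line of `/proc/net/snmp`). The 1% threshold in this sketch is our own rule of thumb, not a standard.

```python
def retrans_ratio(out_segs, retrans_segs):
    """Fraction of sent TCP segments that were retransmissions."""
    return retrans_segs / out_segs if out_segs else 0.0

def looks_lossy(out_segs, retrans_segs, threshold=0.01):
    """Illustrative heuristic: a sustained retransmit rate above the
    threshold hints at cabling or switch trouble, not a slow array."""
    return retrans_ratio(out_segs, retrans_segs) > threshold
```

A counter snapshot showing 500 retransmits against 100,000 sent segments (0.5%) would pass; 2,000 against the same total (2%) would justify breaking out the tap.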
The switch itself also requires evaluation during storage network performance analysis. While it may seem somewhat obvious, make sure the customer's ports are all set to their maximum speed. In our experience, the inter-switch links (ISLs) are often not set to maximum speed. This is typically done to fulfill some backward compatibility requirement, since most infrastructures are not converted from one speed to the next overnight. Simply setting the maximum port speeds on the switches can greatly improve performance at no additional cost.
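A quick audit of negotiated versus maximum port speeds is easy to script. The data structure below is hypothetical (in practice it would be populated from the switch's CLI or a management interface), and the function simply flags any port, such as an ISL, that negotiated below its capability.

```python
def find_slow_ports(ports):
    """ports: mapping of port name -> (negotiated_mbps, max_mbps).
    Return the names of ports running below their maximum speed,
    sorted for stable output."""
    return sorted(name for name, (negotiated, maximum) in ports.items()
                  if negotiated < maximum)
```

Running this against an inventory would immediately surface the ISLs left at a backward-compatible speed after a partial infrastructure upgrade.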
The final area for consideration is an upgrade to modern switches and storage systems that support the Data Center Bridging (DCB) standard, which provides for a lossless Ethernet infrastructure. As long as the host card and the storage system support it, a lossless environment will perform better because performance won't be lost to transmission retries. Dell has already announced improved performance with its recently added DCB support in EqualLogic storage systems. And you should expect to see several more vendors announce DCB test results in the near future.
George Crump is president of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments.