Sizing link bandwidth for long-distance storage networking applications
As described previously in this chapter, there are two primary types of data mirroring: synchronous and asynchronous. In determining which to choose, you must first answer the question, "Should the storage in the primary data center become completely inaccessible, how much data can we afford to lose?" If the answer is none, then synchronous mirroring must be used; otherwise, either asynchronous or synchronous mirroring can be used.
Synchronous mirroring requires that each write be completed successfully on both the primary and the secondary disk array before the servers or initiators can write more data. That ensures that both sets of data are identical at all times. In general, increasing the network bandwidth available to synchronous mirroring applications improves write performance on both the primary and the secondary disk arrays, because the pacing factor is the time it takes to transmit the backup data across the network and receive a response from the secondary array. But even with ample network bandwidth, other factors, such as network latency, may limit performance.
Given enough network bandwidth, disk arrays that are mirrored asynchronously also can be kept in lockstep, but the asynchronous mirroring application provides no guarantee of that. Instead, only a best effort is made to keep the secondary disk array updated by the primary array. Some IT managers may elect to perform batch updates only periodically -- say, once per hour or once per day -- rather than allow continuous updates.
For continuous updates, increasing the network bandwidth available to asynchronous mirroring applications reduces the lag between updates to the primary and the secondary arrays, so less data is lost should the primary array go offline. For periodic updates, additional bandwidth simply shortens the backup time. That has become increasingly important as overnight backup windows shrink due to the globalization of commerce and the resulting need for around-the-clock access to data.
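For periodic (batch) updates, the effect of bandwidth on the backup window is simple arithmetic. The Python sketch below is illustrative only: it assumes decimal units, a fixed 10% framing overhead (the allowance used later in this section), and a link dedicated to the transfer. `batch_transfer_hours` is a hypothetical helper, not part of any product.

```python
def batch_transfer_hours(data_gb: float, link_mbps: float,
                         framing_overhead: float = 0.10) -> float:
    """Hours needed to send one periodic batch of changed data.

    Assumptions (not from the text): 1 GB = 1000 MB, 1 MB = 8 Mb,
    a fixed framing overhead, and no competing traffic on the link.
    """
    megabits = data_gb * 1000 * 8 * (1 + framing_overhead)
    seconds = megabits / link_mbps
    return seconds / 3600


if __name__ == "__main__":
    # A hypothetical 600 GB batch over two candidate link speeds.
    for mbps in (155, 622):
        print(f"{mbps} Mbps: {batch_transfer_hours(600, mbps):.1f} h")
```

Under these assumptions the 600 GB batch takes roughly 9.5 hours at 155 Mbps and about 2.4 hours at 622 Mbps, which shows how directly link speed sets the length of the backup window.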
There are many ways to estimate the amount of bandwidth needed for metropolitan- and wide-area asynchronous mirroring, but a fairly good starting point is to size the network links for the peak hour. At times within that hour, there may be insufficient bandwidth to handle sporadic bursts of data, but for most applications, the bursts last only a few seconds, after which there is ample bandwidth again and the secondary array can catch up.
Assuming that the bursts are not sustained, the peak hour estimate can be used. This approach also can be used for applications in which there are sustained bursts, but which can tolerate the loss of a few minutes of data. In this case, it is okay for the secondary array to fall behind for extended periods of time.
If historical traffic patterns are not available for the primary array, then the activity rate for the peak hour can be estimated by using the procedure shown in Figure 10-3.
|Determine the amount of data stored on the primary storage array.|
|Estimate the portion of the primary storage that is changed on a peak day. For example, the peak day might be a certain day of the week or the last day of each fiscal quarter.|
|If the activity on that day is concentrated mainly into, say, eight or 10 hours, then an average hourly data rate for the peak day should be calculated over that amount of time rather than over 24 hours.|
|If there is a peak hour during the peak business day, it should be used for the estimate rather than simply taking the average rate. For example, if from 1:00 to 2:00 p.m. the data rate is two or three times the average, the circuit should be sized for that rate.|
Figure 10-3: Estimating the amount of data to be mirrored during the peak hour.
Figure 10-4 shows a numerical example of an estimate of the activity rate for the peak hour.
|The usable capacity of the primary storage array is 12 TB (terabytes). However, the array is only about one-third full, so assume that 4 TB (4000 GB) of actual data is stored.|
|On the peak day, approximately 15% of the data is changed: 4000 GB x 0.15 = 600 GB|
|During the day, most of the activity occurs between 9:00 a.m. and 7:00 p.m. (10 hours), so the average amount of data written per hour during that period is approximately 600 GB/10 hours = 60 GB/hour.|
|The amount of increase over the average for the peak hour of the day is not known, so a peak-to-average ratio of 2.5 is used: 60 GB/hour x 2.5 = 150 GB/hour.|
Figure 10-4: Numerical example of a peak hour data activity estimate.
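The steps of Figures 10-3 and 10-4 reduce to a short calculation. The sketch below reproduces the Figure 10-4 numbers; the function name and parameters are illustrative choices, and the peak-to-average ratio remains an assumption exactly as it is in the figure.

```python
def peak_hour_gb(stored_gb: float, daily_change_fraction: float,
                 active_hours: float, peak_to_average: float) -> float:
    """Estimate data written during the peak hour (GB/hour)."""
    if not 0 < active_hours <= 24:
        raise ValueError("active_hours must be in (0, 24]")
    # Data changed on the peak day.
    changed_gb = stored_gb * daily_change_fraction
    # Average over the busy part of the day, not over 24 hours.
    average_gb_per_hour = changed_gb / active_hours
    # Scale the average up to the (estimated) peak hour.
    return average_gb_per_hour * peak_to_average


if __name__ == "__main__":
    # Figure 10-4: 4000 GB stored, 15% changed, 10 busy hours, ratio 2.5.
    print(peak_hour_gb(4000, 0.15, 10, 2.5), "GB/hour")
```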
Once the data rate for the peak hour has been measured or estimated, the network bandwidth requirements can be calculated. Network bandwidth is measured in bits per second, not bytes per second, so care is needed to ensure that the correct units are used. Some framing overhead is also required to transport the data over the network, and that must be added to the bandwidth calculation, as shown in Figure 10-5.
|Convert the peak hour data estimate from GB/hour to MB/second.|
|Add approximately 10% for network protocol framing overhead.|
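|The amount of data written to the primary storage array during the peak hour is 150 GB. That can be converted to megabytes per second (MBps) as follows: 150 GB/hour x 1000 MB/GB = 150,000 MB/hour; 150,000 MB/hour x 1 hour/3600 seconds = 41.7 MBps|
|Adding approximately 10% for network protocol framing overhead: 41.7 MBps x 1.1 = 45.8 MBps|
|In bits per second (bps), the estimated WAN bandwidth required for this application is 45.8 MBps x 8.39 Mb/MB = 384 Mbps (the 8.39 factor treats a megabyte as 2^20 bytes, or 8,388,608 bits, which errs slightly on the high side)|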
Figure 10-6: Numerical example of a network link traffic estimate.
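The unit conversions of Figures 10-5 and 10-6 can be sketched as follows. The default `mb_to_mbit=8.39` mirrors the factor used in Figure 10-6; passing `mb_to_mbit=8` gives a strictly decimal result. The function name and the 10% default overhead are illustrative parameters, not a standard formula.

```python
def required_link_mbps(peak_gb_per_hour: float,
                       framing_overhead: float = 0.10,
                       mb_to_mbit: float = 8.39) -> float:
    """Convert a peak-hour write volume into a link rate in Mbps."""
    # GB/hour -> MB/hour -> MB/second (decimal: 1 GB = 1000 MB).
    mbytes_per_sec = peak_gb_per_hour * 1000 / 3600
    # Allow for protocol framing overhead on the wire.
    with_overhead = mbytes_per_sec * (1 + framing_overhead)
    # Bytes to bits.
    return with_overhead * mb_to_mbit


if __name__ == "__main__":
    print(f"{required_link_mbps(150):.1f} Mbps (8.39 Mb/MB)")
    print(f"{required_link_mbps(150, mb_to_mbit=8):.1f} Mbps (8 Mb/MB)")
```

Without rounding the intermediate values, the first result comes out near 384.5 Mbps rather than exactly 384; the strictly decimal variant gives about 367 Mbps. Either way the conclusions that follow are unchanged.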
Some of the new multiprotocol storage switches and IP routers have a data compression facility, which reduces the amount of network bandwidth needed. For highly repetitive data, the compression ratio can be better than 10:1. Ratios for mixed storage data typically range from about 2:1 to 5:1, so an estimate of 3:1 is fairly conservative.
For the example shown in Figure 10-6, the assumption of a 3:1 data compression capability would reduce the bandwidth requirement from 384 Mbps to 128 Mbps, which could be handled safely by a standard OC-3c (155 Mbps) network link. Without data compression, a much more costly OC-12c (622 Mbps) network link would have been required for this application.
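The compression and link-selection reasoning can be sketched as picking the smallest standard circuit whose rate covers the compressed requirement. The table uses nominal SONET line rates; the usable payload is somewhat lower (about 150 Mbps for OC-3c). The `max_utilization` parameter is an added, hypothetical knob for keeping headroom and is not part of the original example.

```python
# Nominal SONET line rates in Mbps (payload capacity is somewhat lower).
SONET_LINKS_MBPS = [("OC-3c", 155.52), ("OC-12c", 622.08), ("OC-48c", 2488.32)]


def choose_link(required_mbps: float, compression_ratio: float = 1.0,
                max_utilization: float = 1.0) -> tuple[str, float]:
    """Return (link name, compressed Mbps) for the smallest adequate link."""
    if compression_ratio <= 0 or not 0 < max_utilization <= 1:
        raise ValueError("invalid compression ratio or utilization")
    compressed_mbps = required_mbps / compression_ratio
    for name, line_rate in SONET_LINKS_MBPS:
        # Accept the first (smallest) link with enough usable capacity.
        if compressed_mbps <= line_rate * max_utilization:
            return name, compressed_mbps
    raise ValueError("requirement exceeds the largest link in the table")


if __name__ == "__main__":
    print(choose_link(384, compression_ratio=3.0))  # compressed
    print(choose_link(384))                          # uncompressed
```

With a 3:1 ratio the 128 Mbps requirement fits an OC-3c, and without compression it needs an OC-12c, matching the text. Note that if the link is not allowed to run above 80% utilization, the compressed requirement no longer fits an OC-3c, so the headroom policy matters as much as the compression ratio.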
If, during the peak hour, there are sustained bursts of data, and the loss of a few minutes of data is unacceptable, then a better network bandwidth estimate may be needed. Fortunately, when network designers know that sustained bursts occur, that knowledge most likely came from historical traffic patterns on the primary disk array, and the same traffic data can be analyzed to produce a better bandwidth requirement estimate.
For example, if the goal were to limit the data loss to no more than 10 seconds, the historical traffic pattern observations could be sliced into 10-second samples and the network link could be sized to handle the largest observed value. That may call for a link that has much more bandwidth and costs much more than the one that was derived from the peak hour calculations.
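Slicing historical traffic into fixed intervals and sizing for the largest one can be sketched as below. The input format (one byte count per second of write traffic) is a hypothetical assumption; real array statistics may be reported at other granularities.

```python
def interval_rates_mbps(bytes_per_second: list[int],
                        interval_s: int = 10) -> list[float]:
    """Average rate (Mbps) of each complete, non-overlapping interval.

    A trailing partial interval is dropped rather than padded.
    """
    rates = []
    for start in range(0, len(bytes_per_second) - interval_s + 1, interval_s):
        window = bytes_per_second[start:start + interval_s]
        rates.append(sum(window) * 8 / interval_s / 1e6)
    return rates


def peak_interval_mbps(bytes_per_second: list[int],
                       interval_s: int = 10) -> float:
    """Link rate needed to keep up with the busiest observed interval."""
    rates = interval_rates_mbps(bytes_per_second, interval_s)
    if not rates:
        raise ValueError("need at least one complete interval of samples")
    return max(rates)


if __name__ == "__main__":
    # Hypothetical samples: 10 s at 1 MB/s, 10 s at 5 MB/s, 5 s leftover.
    samples = [1_000_000] * 10 + [5_000_000] * 10 + [2_000_000] * 5
    print(interval_rates_mbps(samples), peak_interval_mbps(samples))
```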
If the cost of the faster link is too great, a calculated risk can be taken as a compromise. More sophisticated statistical analysis could be used to determine, say, the 95th percentile of the observed 10-second data rates, and the necessary network bandwidth could be derived from that. In that case, the link would be expected to keep up during 95 percent of the 10-second intervals and fall behind only during the busiest 5 percent. Depending on the variability of the data rate, the 95th percentile of the 10-second rates may even be covered by the bandwidth needed for the peak hour.
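A percentile can be taken directly from the list of 10-second rates. The sketch uses the nearest-rank definition, which is one of several common percentile conventions and is chosen here only for simplicity; the sample rates are hypothetical.

```python
import math


def nearest_rank_percentile(values: list[float], pct: float) -> float:
    """Smallest value such that at least pct% of the values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical 10-second rates (Mbps): mostly steady, one short spike.
    rates = [90.0] * 19 + [400.0]
    print("peak:", max(rates))
    print("95th percentile:", nearest_rank_percentile(rates, 95))
```

Here a single spike drives the peak to 400 Mbps, while the 95th percentile stays at 90 Mbps. This is exactly the trade-off described above: a much cheaper link that falls behind only during the rare busiest intervals.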
Other analysis tools can be used to produce bandwidth requirement estimates. For example, if the raw traffic data is available for the primary array, it is relatively easy to use time series analysis to estimate the peak loading for various intervals of time. A basic moving average model can be used, with the averaging period set to the maximum amount of data loss that can be tolerated; the peak values produced by the model then indicate the bandwidth required.
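A basic moving average model of this kind can be sketched with a sliding window. Unlike fixed, non-overlapping intervals, a sliding window also catches bursts that straddle an interval boundary. The sketch assumes per-second rate samples in Mbps; the window length in samples stands in for the tolerable data-loss period.

```python
def peak_moving_average(rates: list[float], window: int) -> float:
    """Largest average over any run of `window` consecutive samples."""
    if window < 1 or len(rates) < window:
        raise ValueError("window must be >= 1 and <= number of samples")
    total = sum(rates[:window])
    peak = total
    # Slide the window one sample at a time, updating the running sum.
    for i in range(window, len(rates)):
        total += rates[i] - rates[i - window]
        peak = max(peak, total)
    return peak / window


if __name__ == "__main__":
    # Hypothetical per-second rates (Mbps) with a 2-second burst.
    rates = [10, 10, 10, 50, 50, 10, 10]
    for w in (1, 2, 3, 7):
        print(w, "s window:", round(peak_moving_average(rates, w), 1), "Mbps")
```

Longer windows (more tolerable data loss) smooth the burst and lower the required rate, and a window spanning the whole sample reduces to the plain average.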
In general, as the sample intervals become smaller (one second or subsecond), bandwidth models for asynchronous mirroring become virtually identical to bandwidth models for synchronous mirroring. However, since there is no possibility of data loss with synchronous mirroring, the amount of bandwidth allocated to those applications instead affects how long the servers must wait for data to be written remotely before they can proceed to the next write operation. That waiting time is also affected by network latency, which is examined in the next section.
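The bandwidth-dependent part of that wait is the time needed to clock one write onto the link. The sketch computes only this serialization time. It deliberately leaves out propagation latency and the secondary array's response time, and the 64 KB write size and 10% overhead are hypothetical inputs.

```python
def serialization_ms(write_bytes: int, link_mbps: float,
                     framing_overhead: float = 0.10) -> float:
    """Milliseconds to transmit one write, excluding latency and array time."""
    bits_on_wire = write_bytes * 8 * (1 + framing_overhead)
    return bits_on_wire / (link_mbps * 1e6) * 1000


if __name__ == "__main__":
    # A hypothetical 64 KB (65,536-byte) write over two link speeds.
    for mbps in (155, 622):
        print(f"{mbps} Mbps: {serialization_ms(65536, mbps):.2f} ms")
```

At these speeds the serialization component is a few milliseconds or less per write. Once bandwidth is ample, it shrinks further and the round-trip latency becomes the dominant term.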
The above tip was excerpted from IP Storage Networking: Straight to the Core.
About the book: Whether you're a technical or business professional, IP Storage Networking: Straight to the Core will help you develop storage action plans that leverage innovation to maximize value on every dime you invest.
About the author: Gary Orenstein has been active in the IP storage networking industry since its inception with a career spanning multiple network storage companies and industry efforts. He was an initial governing board member of the Storage Networking Industry Association (SNIA) IP Storage Forum where he helped develop, promote, and deliver educational information furthering market growth. Gary is currently vice president of marketing at Compellent Technologies, a network storage company.