Like most of you, one of the reasons I'm in the field of technology is that I get to learn new things on every project and take those lessons with me to the next one. For instance, I recently drew on my knowledge of network-attached storage (NAS) from past projects for a large-scale global NAS implementation. At the time, I was a principal consultant for a large manufacturing company's NAS filer implementation. The NAS filers were intended to address several problems the customer had: legacy software (think Windows NT 4.0 -- very legacy!) as file servers in hundreds of remote branches; limited IT staff in those remote branches, making nightly tape backups and off-site shipping challenging; and high support fees for legacy operating systems.
To address these issues, a NAS solution was proposed where the remote branch NT file servers would be replaced with small filers and their source volumes would replicate over the WAN back to hub site filer locations (there were several hundred) as a secondary backup. The plan included having large cluster filers in the main data center, which would be a repository for the branch data replication volumes as well as a primary NAS filer for users in the hub site locations. The solution was great on paper, but there were many problems with the actual implementation.
Problem 1: Baseline data replication
The NAS solution called for the source volume data to replicate back to the main data centers as a secondary backup. The initial baseline data replication would require a full transfer of the source volume data. Once that completed, the NAS systems would send only the 4 KB blocks that had changed. The initial baseline replication was a challenge for a number of reasons: There was limited bandwidth on the WAN; users used the WAN for daily retrieval of mail and other data; and there was a lot of source volume data that needed to be replicated.
When initiating the baseline replication over the WAN, I noticed multiple problems. First of all, the replication technology for the NAS solution was so aggressive that it took all the WAN bandwidth, showing up on the WAN engineers' radar as the top talker on the circuit. We tried addressing the problem with QoS controls: I asked the WAN engineers to place the TCP port used for data replication in a lower queue on the WAN. This should have throttled the NAS data replication down when higher-priority traffic traversed the WAN and ramped it back up when there wasn't any traffic of higher priority. But in testing, it was clear that the QoS did not work. The NAS data replication kept taking up all the WAN bandwidth (the reason why the approach didn't work was and remains a mystery).
After abandoning the idea of QoS controls for baseline replication, I began focusing on tape-based migration while figuring out what to do with WAN-based migrations. In fact, when the project began, I'd questioned why we couldn't simply back up the data to tape and restore from tape in the hub site locations for the initial baseline replication.
Unfortunately, there were many inherent problems with this approach. The main issue was that the file system of the NAS solution used snapshots as a means of establishing a common point in time for the source and the destination NAS; with standard tape-based migration, we wouldn't have snapshots and therefore no common point in time would be established. Without that common baseline, even if the customer retrieved all the data from tape onto the destination system, the NAS system would not know where to pick up the replication properly for incremental backups.
To address this problem, we decided to use a function on the filer to back up the source data and during the process the filer would create the snapshot -- the process is called "snapmirror to tape." I then shipped small tape libraries to the remote sites. By means of a Web GUI, I was able to control the robot to move the tapes in and out of the drive remotely. On the filer console, I backed up the filer using the snapmirror-to-tape function, restored the volume in the hub site location and established the replication relationship; the NAS filers sent only 4 KB block-level changes from that point onward. This worked surprisingly well in the United States. However, as we got into South America, Europe, Africa and Asia Pacific, I found that moving tapes and tape libraries across several country borders was a challenge because of customs. The mail carriers in other countries were not as prompt as in the United States, and it was taking too long for the tapes and the tape libraries to reach their respective locations. I had to come up with a faster solution.
This time, I turned to Perl throttling in combination with WAN-based replication. There was another application written by the vendor that could perform the throttling, but it presented some major risks. For instance, it would take the replication schedule files, which were plain text files, and put them in its internal database. If there was a problem with the application, it was possible that all global replication could stop; I didn't have enough time to do a failure and restore test. The application also created the destination volume names in its proprietary format as opposed to the standard naming convention that our team decided on. The naming convention was key in quickly understanding which volume belonged to which filer and would be important in case of an emergency or an outage.
Beyond that, with the NAS system's native throttling capability, the replication would remain in the throttled-down state even during non-business hours, when we needed it at full speed. If we allowed the replication to stay in a throttled-down state, the project timeline would get pushed out too far. We needed a way to throttle the data replication so that it wouldn't take bandwidth away from the business users and then unthrottle at night and on weekends to take full advantage of the WAN circuit when business users were not using it.
So I wrote a Perl script that would throttle the replication during the day (that is, 6 a.m. to 6 p.m.) and unthrottle the data transfer on nights and weekends. The Perl script would take into consideration the time local to the source filers; this was important because I was dealing with filers in multiple time zones.
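The scheduling decision described above can be sketched as follows. This is an illustration in Python, not the original Perl script. The function names and the fixed UTC offsets are assumptions made for the example; a production version would more likely use a time-zone database so that daylight saving time is handled, and it would then pass the resulting limit to whatever throttle setting the filer exposes.

```python
from datetime import datetime, timedelta, timezone

BUSINESS_START_HOUR = 6    # 6 a.m. local time, as in the article
BUSINESS_END_HOUR = 18     # 6 p.m. local time, as in the article


def should_throttle(now_utc, utc_offset_hours):
    """Return True on weekdays between 6 a.m. and 6 p.m. at the source filer.

    utc_offset_hours is a hypothetical per-filer setting (for example -5 or +9).
    """
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    is_weekday = local.weekday() < 5   # Monday=0 ... Friday=4
    in_business_hours = BUSINESS_START_HOUR <= local.hour < BUSINESS_END_HOUR
    return is_weekday and in_business_hours


def throttle_limit(now_utc, utc_offset_hours, daytime_limit_kbps):
    """Return the daytime limit in kbps, or None to mean 'run unthrottled'."""
    if should_throttle(now_utc, utc_offset_hours):
        return daytime_limit_kbps
    return None
```

Evaluating the same instant against two filers shows why local time matters: at 15:00 UTC on a Wednesday, a filer at UTC-5 is in its business day and gets the daytime limit, while a filer at UTC+9 is already past midnight and runs unthrottled.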
The most difficult part of using the Perl script was figuring out what throttle values would keep the data replication off the WAN team's radar while still replicating the data fast enough to move the project along. I sat with a WAN engineer and literally went through hundreds of circuits to determine whether we needed to upgrade the WAN link to make a WAN-based migration possible and to determine what the throttle values should be for business hours. For WAN circuits that needed to be upgraded, we checked to see whether that was even possible in that area of the world. For some sites, we determined how long the WAN-based migration would take compared with tape-based migration given the delay times for shipping to other countries. In some cases, the tape-based migration process that we'd rejected as too time-consuming for non-U.S. sites ended up as a better choice for a particular site.
The Perl script ran every 15 minutes in the environment across all the NAS data replication repositories to throttle and unthrottle data replication. The initial intent was to run the script until the baseline replication completed, but we decided to keep it running after the initial replication completed to catch any nightly updates that ran into business hours. This Perl solution kept the project on time and on target, and it became part of the infrastructure.
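The per-site comparison between WAN-based and tape-based migration comes down to simple arithmetic. The sketch below is a rough estimate in Python under stated assumptions: business hours are 12 hours on each of five weekdays (60 hours per week), gigabytes are decimal, and protocol overhead and competing traffic are ignored. The function names and all input figures are hypothetical.

```python
HOURS_PER_WEEK = 168.0


def wan_transfer_days(data_gb, business_kbps, offhours_kbps,
                      business_hours_per_week=60.0):
    """Estimate the days needed to replicate data_gb over the WAN.

    business_kbps is the throttled daytime rate; offhours_kbps is the rate
    available on nights and weekends.
    """
    off_hours = HOURS_PER_WEEK - business_hours_per_week
    # Kilobits that can be moved in one full week at the two rates.
    kbits_per_week = (business_kbps * business_hours_per_week
                      + offhours_kbps * off_hours) * 3600
    data_kbits = data_gb * 8 * 1000 * 1000   # decimal GB -> kilobits
    return data_kbits / kbits_per_week * 7


def faster_option(wan_days, tape_days):
    """Pick whichever migration path finishes first."""
    return "wan" if wan_days <= tape_days else "tape"
```

With made-up inputs of 500 GB, a 256 kbps daytime throttle and 1,544 kbps off hours, the estimate is roughly 43 days, so a tape shipment that takes three weeks door to door would still win for that site.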
Problem 2: Snapshots
Our NAS solution used an "allocate on write" operation for snapshots as opposed to "copy on first write." What do I mean by that? With snapshots that allocate on write, when the snapshot is taken, there are pointers to the 4 KB blocks of data; those blocks of data are essentially locked down until no snapshot references them any longer. Copy-on-first-write snapshot technology, on the other hand, physically moves the data blocks to a copy-out area, then overwrites the data block in the active file system.
Which of these approaches is better? Each has its advantages and disadvantages, but as we found out, allocate on write was a problem during our implementation. The technique allows the product to take a snapshot instantaneously and keep as many as 255 snapshots, with absolutely no performance penalty. Copy on first write penalizes performance on write operations because it first has to copy the block and then overwrite the block. And if the copy-out area gets full, you could lose all your snapshots. The advantage is that the data space is protected since the snapshot data resides in a different area than the active data. In an allocate-on-write scenario, the snapshot could grow larger than the data in the active file system, and since the active data and snapshot reside in the same space, your file system could run out of space -- not because of active data but because of snapshot data.
So why was allocate on write problematic for us? During deployment of the NAS filers, we learned that the migration team had a migration tool that wasn't too smart. It would begin copying the source data to the destination NAS filer and would have several iterations before the final cutover. If the migration tool lost contact with the original Windows server because of DNS, IP or other issues, the tool thought the data was no longer on the source system and would begin deleting data on the destination NAS file system. Snapshots of the file system were being taken every day, and when the migration tool deleted all the data, the active file system was empty but all the blocks pointing to the data were still locked down in the snapshot. The snapshot became several hundred gigabytes in size while the data on the filer was zero gigabytes!
Fortunately, the NAS filer had the ability to revert the entire file system to a point in time. In fact, it literally took just seconds to revert the file system back because the file system was based on pointers and never really moved any data -- impressive. We reverted the file system back to a time when all the data was in the active file system and began the next iteration of the data migration; granted, we had to use the same data migration tool but were able to restart the process from a point in time before the error occurred.
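The behavior in this story, where an empty active file system still consumes space and a revert completes almost instantly, follows from snapshots being nothing more than saved pointers. The toy model below is a Python sketch of that idea only; it is not how any vendor's file system is implemented, and the class and method names are invented for illustration. Each file occupies a single block to keep the model small.

```python
class PointerFileSystem:
    """Toy allocate-on-write file system: file name -> block id."""

    def __init__(self):
        self.active = {}       # pointers for the active file system
        self.snapshots = {}    # snapshot label -> frozen pointer map
        self._next_block = 0

    def write(self, name):
        # New data always lands in a freshly allocated block;
        # blocks referenced by a snapshot are never overwritten.
        self.active[name] = self._next_block
        self._next_block += 1

    def delete(self, name):
        # Only the active pointer goes away; the block itself stays
        # allocated if any snapshot still references it.
        del self.active[name]

    def snapshot(self, label):
        # Copies pointers, not data, so it is nearly instantaneous.
        self.snapshots[label] = dict(self.active)

    def revert(self, label):
        # Restores the saved pointers; no data blocks are moved.
        self.active = dict(self.snapshots[label])

    def blocks_in_use(self):
        used = set(self.active.values())
        for pointers in self.snapshots.values():
            used.update(pointers.values())
        return len(used)


fs = PointerFileSystem()
for name in ("a.doc", "b.xls", "c.ppt"):
    fs.write(name)
fs.snapshot("daily")
for name in list(fs.active):
    fs.delete(name)            # the migration tool wipes the destination
held_after_delete = fs.blocks_in_use()   # blocks still pinned by "daily"
fs.revert("daily")
```

After the deletes, the active file system is empty but all three blocks are still in use because the "daily" snapshot references them. The revert then brings back all three files by swapping pointer maps.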
Problem 3: Building the filers
Another problem I saw when starting on the project was that all filers were being built manually. I am a firm believer in automation and would have done well in the Industrial Revolution. Doing things manually introduces problems of quality control, takes too much time and costs too much money. Think of a car manufacturer. Can you imagine each car being built by hand? You'd have to hope the person who built your car was detail-oriented and really paying attention during the construction. Plus, building it by hand would take too long and carry an astronomical cost. Think of an Aston Martin. Great car, but only Simon Cowell and James Bond can afford them.
The same is true of building out NAS filers for hundreds and hundreds of branches. Having a highly salaried NAS engineer build filers by hand is not cost-effective, the quality could be questionable, and the time to build is too long. To automate the process, I scripted it with Perl. The script was designed to assign the right disks to the right filer head if it was a cluster, leave the appropriate disks for hot spares, create the aggregate, resize the root volume, set the security, create the volumes with the correct naming convention, and set the correct RAID size and RAID level. I compressed the time to build a filer from two days down to literally seconds, and since the filers were being put through my Perl automation process, the result was the same every single time. (I made sure to thoroughly test the script before implementation, since if the script was wrong on one system, it would be wrong on all of them.) The script saved roughly 4,000 man-hours.
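To illustrate why a scripted build gives the same result every time, here is a small Python sketch of one of the steps listed above: assigning disks to the heads of a cluster and holding back hot spares. It is not the original Perl script. The round-robin rule, the function name and the disk and head labels are all assumptions chosen for the example, and the real assignment rules for a given filer model may differ.

```python
def plan_disks(disk_ids, heads, spares_per_head):
    """Return a per-head plan of data disks and hot spares.

    Disks are dealt out round-robin in sorted order, so the same inputs
    always produce the same plan.
    """
    if not heads:
        raise ValueError("at least one filer head is required")
    ordered = sorted(disk_ids)
    plan = {}
    for index, head in enumerate(heads):
        owned = ordered[index::len(heads)]      # every Nth disk
        if len(owned) <= spares_per_head:
            raise ValueError(f"not enough disks for head {head}")
        split = len(owned) - spares_per_head
        plan[head] = {
            "data": owned[:split],              # disks for the aggregate
            "spares": owned[split:],            # held back as hot spares
        }
    return plan


example = plan_disks([f"d{i}" for i in range(8)], ["head_a", "head_b"], 1)
```

Because the plan is a pure function of its inputs, running it for the hundredth branch gives exactly what it gave for the first, which is the quality-control property the manual builds lacked.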
In the end the NAS solution worked well, although we faced problems during implementation. And in retrospect, we should have deduped the data on the source to cut down the data footprint in the branches. The networking team did research putting in a caching solution for the WAN to dedupe the packet traffic, but for one reason or another, the solutions reviewed were never accepted by the customer. However, the project was a success because it accomplished multiple goals for the customer: It broke the OS upgrade and patch management cycle, and it provided offsite backups without the cost and maintenance of tapes; beyond that, the NAS vendor that was chosen has a global presence and can handle break-fix issues, and the NAS filer provided highly resilient RAID technology in remote sites that had little or no IT resources.
About the author
Seiji Shintaku is a principal consultant at RTP Technology. Previously, he was global NetApp engineer at Lehman Brothers, Celerra and DMX engineer at Credit Suisse First Boston, principal consultant at IBM and global Windows engineer at Morgan Stanley. He can be reached at [email protected]. RTP Technology, based in Fairlawn, N.J., is a reseller of products from NetApp, EMC, VMware, F5 and Quantum. The company also provides professional services for storage-based solutions. It can be reached at (201) 796-2266.