Technology is fine, but its deployment (or not) is a business decision that must be made using the same sort of hard-headed business criteria as are applied to other business issues. In this chapter we'll learn about some of the criteria that come into play, strategies that companies apply in deploying cloud-based applications, and what a cloud application can mean for your organization. We'll discuss:
- Can you even use a cloud?—We've talked a bit about regulatory issues, but what are the other issues, and is this really the next step?
- Do you have enough Internet feed into your organization to use clouds instead of local infrastructure?—Moving to cloud desktops might sound great, but you don't get something for nothing. We'll look at where the costs might potentially shift.
- Load balancing—What is it? How is it going to help us? How does it work with clouds?
- Global load balancing and auto provisioning—How can you apply global load balancing to use clouds for on-demand capacity?
- Computing on demand—Do you really have to upgrade your computing infrastructure for that special project, only to let it rot after that project is done? Why not use the cloud for special projects instead of building more asset liability?
- Clouds as the DMZ for partnerships—Why are clouds becoming the neutral territory for a growing number of businesses? Why did the authors decide against setting up a server to host our writing efforts?
- Federation—Are clouds going to be the key technology that finally makes federated computing a reality? Why does it make sense, and are we already starting to see the beginnings?
Business Concerns About IT
Let's begin with a quick review of the basic concerns of business about IT. It's all about return on investment (ROI), and about the data center: a huge corporate investment that can act like a black hole. The care and feeding of a modern data center is a nontrivial affair with a decision-making process akin to dancing a polka through a minefield. While the business concern is about ROI, the biggest fights tend to be over control: who gets it and who wants it.
The data centers and switch closets of companies are filled with departmental servers that are there just because a couple of personalities argued about things such as remote access, operating system support (or lack thereof), who has root access, who can add/edit/delete users, and so on. It's often just easier to buy an additional server than to fight these battles up and down through the organization. For those who do decide to battle it out, it can feel like fight night at every budget meeting; meanwhile, those servers suck up power, add heat to the office, add noise to the office, and prevent facilities from being able to shut down the office on holidays.
On the flip side, new environments lead to new IT training and personnel costs, and with shrinking budgets, saying "No!" has become a fashionable knee-jerk reaction. So, while on-demand clouds or clouds in general might sound like a magic solution, business decision processes demand that we know just where the hidden costs lie.
That's the environment in which cloud computing is being considered and in which decisions are being made. Is the cloud decision just about numbers, or are there issues to be considered that are more difficult to quantify? What kinds of numbers will you need to consider when making cloud decisions?
Can Your Business Cloud?
The first question is the most basic: Can you use a cloud? This is far from being a technology-only question. In some cases, regulatory issues mandate that your data stay within a particular country; with today's global load balancing, that can't always be put into a service agreement. "It's 10:00—Do you know where your data is?" isn't just a clever take on an old TV ad. The abstraction layers that were so exciting when we were talking about the technology can be incredibly complicating when it comes to policy. We know that the federal courts wanted to use some of the emerging cloud backup solutions, but proxied Internet access combined with out-of-country storage prevented at least one try at adoption.
Second, does the cloud service support your existing applications, or are you looking at migration costs on top of IT retooling costs? The phrase "total cost of ownership" has been greatly abused in the last decade, but when considering a substantial shift in technology, customers must think about training costs, temporary productivity disruptions, and support costs in excess of normal run-rate expenses. You also have to remember to extend your search for application support all the way out to the edge and, in some cases, out to your business partners. Consider a company like Walmart, for example: Some of their applications directly affect communications paths with their supply chain. If they were forced to push a major process like supply chain management into the cloud, would they also be forcing their suppliers to upgrade similarly? The answer is almost certainly "Yes," and while Walmart has the market muscle to ensure that suppliers follow along, most companies don't have that much clout. Understanding how far the ramifications of a shift to the cloud will spread is another key consideration for executives pondering the change.
A commonly overlooked application with organization-wide ramifications is the email and calendaring combo, especially as it connects to the enterprise directory infrastructure. When we reviewed Microsoft's online services, some of the key questions were about the costs and mechanisms required for the migration. We looked at whether it was better to migrate completely or to try to make a cloud application platform coexist with a large legacy Active Directory infrastructure. Microsoft's online services had migration tools for Active Directory infrastructure, but other cloud service providers may not.
In Chapter 4 we talked about the analogy of using the cloud like a rental car: taking the technology for a test drive before buying something you'll have to live with for years. If you're serious about considering Microsoft Exchange for your business, take it for a test drive using Microsoft Office Online services for a representative segment of your user community. Live with it, learn it, and make sure you find all the warts. While the trial is going on, make sure someone keeps track of what the hidden costs are. How much time is it taking to manage? Did someone have to go out and buy a whole bunch of books to learn how the pieces fit together? Can you realistically support this if you decide to move forward? Think of all the pieces you already have to fund, and imagine the increase or decrease in support costs when and if the program is expanded.
It should also be reiterated that clouds are great because they're normally pretty easy to walk away from. Instead of holding the pink slip on a new data center, you can just walk away if the project turns out to be a bust.
Bandwidth and Business Limits
Next under the microscope is the question of external versus internal bandwidth. A decade ago some people thought we were about to enter an era in which bandwidth would be the cheapest possible commodity. In 2009, bandwidth costs were carefully watched and considered by every company. Moving application bandwidth from LAN links that aren't metered to WAN links that are is another of those costs that must be carefully considered when a move to the cloud is proposed. In addition to the dollars to move bits, there are the dollars represented by application performance to consider. Those critical enterprise applications that were so snappy when they had to travel only through internal gigabit pathways now have to make it through to a cloud, a pathway that includes the corporate firewall and the rest of the security infrastructure. Now, the list of factors to take into account includes pieces of the network infrastructure. Is that firewall even capable of handling the new aggregate throughput of shoving that application into the cloud? Is your external Internet feed even big enough for your internal users? The impact of network bandwidth and infrastructure is dramatic, but it is only one of the technology issues that need to be taken into account when working toward the decision to expand enterprise applications into the cloud.
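One way to frame the firewall and feed questions is as simple arithmetic: add up the peak traffic each application would send across the external link and compare the total, plus the traffic already on the link, against the link's capacity with some margin left over. The Python sketch below does that. The application names, user counts, per-user rates, link size, and the 70 percent utilization ceiling are all hypothetical placeholders; real figures should come from your own traffic measurements.

```python
from dataclasses import dataclass


@dataclass
class AppLoad:
    """Peak traffic an application would add to the external link (hypothetical figures)."""
    name: str
    users: int
    kbps_per_user: float  # average peak-hour rate per active user


def aggregate_mbps(apps: list[AppLoad]) -> float:
    """Sum the peak demand of every application moved to the cloud, in Mbit/s."""
    return sum(app.users * app.kbps_per_user for app in apps) / 1000.0


def fits_link(apps: list[AppLoad], link_mbps: float, existing_mbps: float,
              max_utilization: float = 0.7) -> bool:
    """True if existing traffic plus the cloud apps stays under the utilization ceiling.

    The 0.7 default is an assumed planning margin, not an industry standard.
    """
    return existing_mbps + aggregate_mbps(apps) <= link_mbps * max_utilization


if __name__ == "__main__":
    moved = [
        AppLoad("email", users=400, kbps_per_user=20),
        AppLoad("crm", users=150, kbps_per_user=60),
    ]
    print(f"added load: {aggregate_mbps(moved):.1f} Mbit/s")
    print("fits a 45 Mbit/s feed:", fits_link(moved, link_mbps=45, existing_mbps=12))
```

The same check applies to the firewall: substitute its rated throughput for `link_mbps`.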
Testing for Clouds
Determining whether you have the necessary bandwidth can run the gamut from simple to extremely complex, though as the complexity increases, so does the accuracy of the model. On the simple side, you can use a site such as Speedtest.net and choose a server target that's fairly close to your cloud provider. Speedtest.net will toss a bunch of files back and forth to give you a thumbnail of the throughput possible between your two sites. However, this simplistic view of the world uses fixed packet sizes over a short duration, and it measures the throughput at only a single point in time. You might consider using Iperf instead, which lets you vary the packet size and duration of the throughput test. Iperf runs under Linux or Windows and is still fairly simplistic, but at least it takes into account that network traffic isn't all made up of single-sized packets.
At the complex end of the spectrum is the Chariot application throughput test tool, now owned by Ixia Communications. This piece of software consists of endpoints and a management console. The management console allows you to set up synthetic traffic patterns between the endpoints that can consist of varying amounts of different traffic types. For instance, suppose you use a protocol analyzer and a network tap to look at the traffic exiting your firewall, and you find a mix of HTTP, SSL, IMAP, POP, SMTP, FTP, and some miscellaneous stuff. The Chariot console can set up synthetic data streams that simulate a variable number of users doing different types of network functions. Since Chariot typically has access to all the resources on those endpoints, a single modern computer can easily simulate several users' worth of data. This gives you the ability to run an after-hours simulation of your entire company. The synthetic traffic you can generate covers a pretty big collection of data streams, such as
- YouTube video
- Skype VoIP traffic
- Real streaming video
- SIP trunks
- SIP conversations
- Web traffic
- And many others
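The difference between a single fixed-size transfer and an Iperf-style test can be illustrated with a small Python sketch that pushes data over a TCP connection using several different write sizes and reports the throughput for each. This is neither Iperf nor Chariot: by default it runs against a local receiver on the loopback interface, so its numbers only demonstrate the method. Pointing the sender at a receiver near your cloud provider would be an assumed extension, not something this sketch does.

```python
import socket
import threading
import time


def _drain(server: socket.socket) -> None:
    """Receiver side: accept one connection and read until the sender closes."""
    conn, _ = server.accept()
    with conn:
        while conn.recv(65536):
            pass


def measure_throughput(write_size: int, total_bytes: int,
                       host: str = "127.0.0.1") -> float:
    """Send at least total_bytes in write_size chunks to a local sink; return Mbit/s.

    For TCP the kernel, not the application, decides the actual segment size
    on the wire; write_size only shapes how the sender feeds the socket.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))              # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        receiver = threading.Thread(target=_drain, args=(server,))
        receiver.start()

        chunk = b"x" * write_size
        sent = 0
        start = time.perf_counter()
        with socket.create_connection((host, port)) as client:
            while sent < total_bytes:
                client.sendall(chunk)
                sent += write_size
            client.shutdown(socket.SHUT_WR)  # tell the receiver no more data is coming
            receiver.join()                  # stop the clock only after everything is read
        elapsed = time.perf_counter() - start
    return sent * 8 / elapsed / 1_000_000


if __name__ == "__main__":
    # Vary the write size, as Iperf lets you do, rather than using one fixed size.
    for size in (64, 1460, 65536):
        mbps = measure_throughput(size, total_bytes=5_000_000)
        print(f"{size:>6}-byte writes: {mbps:,.0f} Mbit/s")
```

Small writes usually come out slower because of per-call overhead, which is exactly the kind of effect a fixed-size test hides.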
The power of this system is the ability to put endpoints on just about any workstation or server technology on the market and even on some switch blades from various network equipment manufacturers. Being able to run "what if" scenarios on your network during off-hours is extremely powerful, and the tool is easy enough to use that you could run a whole series of them: "If I moved my key applications to the cloud, would I have enough bandwidth for those specific applications?" "If there is enough bandwidth, are the link jitter and latency low enough to support voice-over-IP?"
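For the voice-over-IP question, one widely used yardstick is the interarrival jitter estimator that RFC 3550 defines for RTP: for each packet it takes the change in one-way transit time relative to the previous packet and moves a running estimate 1/16 of the way toward that change. The Python sketch below applies that formula to send and receive timestamps in milliseconds. The 30 ms jitter and 150 ms one-way latency limits are commonly quoted rules of thumb, used here as assumptions rather than figures from the authors.

```python
def rfc3550_jitter(send_ms: list[float], recv_ms: list[float]) -> float:
    """Interarrival jitter as defined in RFC 3550, section 6.4.1.

    D is the change in one-way transit time between consecutive packets;
    the running estimate J moves 1/16 of the way toward |D| on each packet.
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("timestamp lists must be the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        transit_prev = recv_ms[i - 1] - send_ms[i - 1]
        transit_cur = recv_ms[i] - send_ms[i]
        d = transit_cur - transit_prev
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def voip_ready(jitter_ms: float, one_way_latency_ms: float,
               max_jitter_ms: float = 30.0, max_latency_ms: float = 150.0) -> bool:
    """Rule-of-thumb check; both thresholds are assumptions to tune per codec."""
    return jitter_ms <= max_jitter_ms and one_way_latency_ms <= max_latency_ms


if __name__ == "__main__":
    sent = [0, 20, 40, 60]        # one voice packet every 20 ms
    received = [50, 70, 95, 110]  # hypothetical arrival times
    j = rfc3550_jitter(sent, received)
    print(f"jitter {j:.2f} ms, VoIP-ready: {voip_ready(j, one_way_latency_ms=50)}")
```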
Let's assume you've done a bunch of testing, and so far it seems the answer is a slow migration to the cloud. The first step is to get a handle on what's out there and exactly where it is.
Remote Access and the Long March to the Clouds
Not long ago, IT expansion meant more racks in the data center, more power to feed those racks, more air conditioning to cool them, expanding backbone connections to link them, and perhaps more IT staff for the care and feeding of those new servers. Those new racks meant capital expenses, physical assets, human resources, and recurring costs, all of which affect the bottom line. The question we've heard from CFOs around the world has always revolved around, "Is there a way to make that data center cost less?" The question has never been asked with more urgency than in the most recent two or three years, and the answers have never been more critical to the health of the organization. Cloud computing seems to offer an ideal way of reducing the capital costs and many of the recurring expenses, though we've seen that there are other costs that may limit the immediate impact of a migration into the cloud. While we're still thinking about the costs of cloud computing, we should consider a few additional items that can weigh on the pro or con side of the decision.
Just what, for example, is the life cycle of the project you're considering? Consider the New York Times indexing project described in Chapter 4 (http://open.blogs.nytimes.com/tag/aws). The Times was looking at several racks of blades, server licenses, Adobe Acrobat licenses, power, cooling, and personnel for a project that more than likely would have to be done only once. Afterward, all those assets would have to be either sold or re-tasked within the organization. This is where our CFO asks how much of our original investment can be recovered if we can return or sell these temporary assets. "Can't you just rent that gear?" is a CFO war cry heard all over the world. What cloud computing gives us is the ability to give it all back, for a small fraction of the long-term asset cost.
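The CFO's question lends itself to simple arithmetic. The Python sketch below compares buying hardware for a one-time project (and recovering part of its value by reselling it afterward) with renting equivalent capacity by the hour. Every figure in it, including the resale fraction and the assumption that licenses can't be resold, is a hypothetical placeholder and not a number from the New York Times project.

```python
def buy_cost(hardware: float, licenses: float, monthly_power_cooling: float,
             monthly_staff: float, months: int, resale_fraction: float) -> float:
    """Net cost of buying gear for a project, then selling it off afterward."""
    running = (monthly_power_cooling + monthly_staff) * months
    recovered = hardware * resale_fraction  # licenses assumed non-transferable
    return hardware + licenses + running - recovered


def rent_cost(instances: int, hourly_rate: float, hours: float,
              data_transfer: float) -> float:
    """Cost of renting comparable capacity only for the hours the job runs."""
    return instances * hourly_rate * hours + data_transfer


if __name__ == "__main__":
    buy = buy_cost(hardware=200_000, licenses=40_000, monthly_power_cooling=3_000,
                   monthly_staff=8_000, months=3, resale_fraction=0.3)
    rent = rent_cost(instances=100, hourly_rate=0.10, hours=24, data_transfer=500)
    print(f"buy: ${buy:,.0f}  rent: ${rent:,.0f}")
```

The comparison tilts toward buying only when the gear would stay busy long after the project ends, which is the life-cycle question raised above.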
With all the issues we've provided to think about, it's possible that we've not yet considered the most important question: How, precisely, will you use the cloud? To begin answering this question, it's useful to think in terms of models.
One of the models we most often hear about is "local prototyping, remote production." This model had its roots in behavior that started in software development groups before cloud computing began. Programmers began installing VMWare or Virtual Server onto their workstations simply to provide prototyping for new systems. The reasons were fairly straightforward: Virtual machines were far less expensive than actual banks of new computers, and virtual operating system images that are hosting still-buggy applications in development can be blown away and regenerated much more quickly than similar images running on dedicated hardware.
So far we've talked only about savings on physical infrastructure. How about demand-based expansion? An application or set of information that is unavailable because the server can't keep up with demand is just as useless as an application that is bug-ridden. While excess demand can be considered a "high-class problem," it is a problem, and it can come from a variety of sources. Depending on your target market, your company might be Slashdotted or covered by CNN and get a massive surge in Web traffic. If you had the foresight to plan for this, what would it cost you? We'll look next at a couple of ways to implement a strategy for this situation.
Traditional Server Load Balancing
The first server load balancing systems were simple: They just divided up the incoming Web requests among several physical servers, using simple algorithms based on basic round-robin scheduling or elementary demand-feedback routines. These load balancers also made it possible to perform maintenance on a server by shifting its load to the other servers in the group. These Layer 4 devices (the layer number refers to the ISO seven-layer networking model) had a single public IP address for each service (FTP, HTTP, etc.) and were configured to split the incoming traffic among two or more physical servers behind them. Coyote Point has a great demonstration on their website: http://support.coyotepoint.com/docs/dropin_nav.htm.
A typical load balancer configuration would go something like this:
- The DNS name for the server cluster is set up to point to the outside or public address for the load balancer.
- Inside or private addresses are assigned to various servers behind the load balancer.
- The load balancer is then told which private addresses are serving which type of network service (e.g., Web, FTP, email) and whether larger, faster servers in the collection should be assigned a heavier weight.
- Then a choice is made as to what kind of load balancing should be used: round-robin, Gaussian distribution, weighted average, etc.
- If a machine needs servicing of some sort, the system administrator declares a machine to be out of service, and the load balancer shifts load to the remaining servers.
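The configuration steps above can be sketched in a few lines of Python. This is a simplified, hypothetical model, not any vendor's implementation: each private address gets an integer weight, a weighted round-robin cycle hands out requests in proportion to those weights, and declaring a server out of service simply removes it from the rotation. The addresses and weights are placeholders.

```python
import itertools


class WeightedRoundRobin:
    """Minimal weighted round-robin pool with an out-of-service switch."""

    def __init__(self, weights: dict[str, int]) -> None:
        self.weights = dict(weights)          # private address -> relative weight
        self.out_of_service: set[str] = set()
        self._rebuild()

    def _rebuild(self) -> None:
        # Expand each server into `weight` slots so faster boxes get more turns.
        slots = [addr for addr, w in self.weights.items()
                 if addr not in self.out_of_service for _ in range(w)]
        if not slots:
            raise RuntimeError("no servers in service")
        self._cycle = itertools.cycle(slots)

    def next_server(self) -> str:
        """Pick the server that should receive the next incoming request."""
        return next(self._cycle)

    def take_out_of_service(self, addr: str) -> None:
        """Administrator declares a machine down; its load shifts to the rest."""
        self.out_of_service.add(addr)
        try:
            self._rebuild()
        except RuntimeError:
            self.out_of_service.discard(addr)  # refuse to drain the last server
            raise

    def return_to_service(self, addr: str) -> None:
        self.out_of_service.discard(addr)
        self._rebuild()


if __name__ == "__main__":
    pool = WeightedRoundRobin({"10.0.0.11": 2, "10.0.0.12": 1})
    print([pool.next_server() for _ in range(6)])
    pool.take_out_of_service("10.0.0.11")   # e.g., for a backup or patching
    print([pool.next_server() for _ in range(3)])
```

Expanding weights into repeated slots is the simplest weighted scheme; it sends requests to a heavy server in bursts, which smoother variants avoid by interleaving.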
Key to this whole arrangement working is that each collection of servers has access to some sort of common storage system (e.g., NFS). Load balancing in many cases came in the back door as a method to extend the backup window for many critical services. By shifting the load off a primary server, administrators could freeze it in time and make a full backup without worrying about open files and such. In many cases backups were taking longer than the system administrator's window of opportunity, forcing the migration to some sort of load balancing.
The downside to this plan was that adding servers to respond to larger-than-anticipated loads was a long and expensive process, and the process was inherently reactive: In most cases, capacity couldn't be added until after the traffic surge had passed. More critically, the servers that were added were static, dedicated to a single purpose when deployed. Load balancing wasn't really dynamic, in that FTP servers, for example, couldn't be reallocated to handle HTTP traffic without large amounts of human intervention. There's a way to balance in genuinely dynamic ways, but financial officers don't like it.
The workaround is to deploy a series of new servers, put all the necessary services for all applications on each server, but not route traffic to them until needed. This way a system administrator can quickly alter the load balancer's configuration to add additional Web servers to handle an unanticipated load spike. Once again, though, this requires encumbered resources sitting idle (and sucking up power and cooling) to handle a load spike that may never occur. This was all a guessing game played by IT groups all over the world, and a boon to hardware and software vendors worldwide. Now, this was not necessarily a bad thing, since backup facilities of some sort are part of everyone's business continuity plans. With virtualization and cloud computing, though, there may be a better way.
The Virtualization Load Response
Scyld Software (part of Penguin Computing) was the first company we know of to deliver products that saw computing clusters change from Beowulf scientific-style cluster computing to business clusters. In the Scyld system, a virtualization kernel was installed on each server in the cluster and applications were distributed across these servers. The distinctive feature of Scyld's software wasn't in the virtualization cluster, though, but in how this system could detect incoming application loads and apply business rules to the problem of how to handle unanticipated loads. The example the company gave was how they handled a massive spike in Web traffic. Their system would move the Apache Web server from a shared system (multiple applications all sharing a single physical server) to a dedicated server. If the setup was configured correctly, this happened automatically. An added benefit was that it was not bound to a single type or model of server, but rather could be run on a heterogeneous collection of boxes with weights assigned to them to vary the load.
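A business rule of this kind, moving a Web server from a shared host to a dedicated one when load spikes and back again when it subsides, can be modeled as a tiny state machine. The Python sketch below is a hypothetical illustration, not Scyld's (or anyone's) actual rule language. The thresholds, in requests per second, are assumptions; the gap between them provides hysteresis, so the server doesn't bounce between hosts when load hovers near a single cutoff.

```python
from enum import Enum


class Placement(Enum):
    SHARED = "shared host"
    DEDICATED = "dedicated host"


def next_placement(current: Placement, requests_per_sec: float,
                   promote_above: float = 800.0,
                   demote_below: float = 300.0) -> Placement:
    """Decide where the Web server should run for the next interval."""
    if current is Placement.SHARED and requests_per_sec > promote_above:
        return Placement.DEDICATED   # spike: give Apache a box of its own
    if current is Placement.DEDICATED and requests_per_sec < demote_below:
        return Placement.SHARED      # load gone: consolidate again
    return current                   # in between: stay put (hysteresis)


if __name__ == "__main__":
    state = Placement.SHARED
    for load in (200, 900, 500, 250):
        state = next_placement(state, load)
        print(f"{load:>4} req/s -> {state.value}")
```

In this run the server moves to a dedicated host at 900 requests per second, stays there at 500 because that is still above the lower threshold, and returns to the shared host at 250.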
A few years later, VMWare started offering a system called VMotion (www.vmware.com/products/vi/vc/vmotion.html), which took this idea quite a bit further. The VMotion concept was to have a collection of servers all running the VMWare infrastructure. Under normal circumstances, machine #1 could be running a collection of virtual servers that might consist of Apache Web servers and email services. Machine #2 might be running SugarCRM, and machine #3 might be running billing software. Let's imagine a case in which Company X has decided that if a huge surge in Web traffic occurs, the business won't be hurt if billing is delayed by a day. So their IT group has set up business rules that allow VMotion to shift the Apache Web server to a dedicated server if a huge load starts up. When the load disappears, the Web server will move back to a shared server and billing will be resumed. Those rules could be modified to also handle automatic migration of running servers to another physical server if a hardware failure should occur. This takes virtualization much of the way to the scenario that might exist when a company deploys a "private cloud."
What's missing from this scenario is a way to detect when an application like Apache has crashed even though the virtual server is still up. Previously, IT professionals would write custom scripts for UniCenter or OpenView that periodically probed to see whether applications were running on the target machine and, if not, sent a reset script to the system in question. Early efforts were more of a "Hail Mary," in that they would keep sending the reset over and over again if the application had crashed badly and restarting the system wasn't fixing it. More sophisticated scripts started appearing, and as the Microsoft PowerShell interface documentation became widely known, testing at the application level and then restarting more intelligently became commonplace.
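The difference between a "Hail Mary" script and a more careful one is mostly about knowing when to stop. The Python sketch below is a hypothetical supervisor: it probes an application, restarts it on failure, waits longer after each unsuccessful attempt, and escalates to a human after a fixed number of tries instead of resetting forever. The `probe`, `restart`, and `escalate` callables are placeholders for whatever your monitoring tool or management script actually does, and the retry count and delays are assumed defaults.

```python
import time
from typing import Callable


def supervise_once(probe: Callable[[], bool],
                   restart: Callable[[], None],
                   escalate: Callable[[str], None],
                   max_restarts: int = 3,
                   base_delay_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Check an application and attempt a bounded number of restarts.

    Returns True if the application is healthy at the end, or False after
    escalating because restarts did not fix it.
    """
    if probe():
        return True
    for attempt in range(1, max_restarts + 1):
        restart()
        # Back off exponentially (5 s, 10 s, 20 s, ...) so a badly
        # crashed application isn't hammered with resets.
        sleep(base_delay_s * 2 ** (attempt - 1))
        if probe():
            return True
    escalate(f"application still down after {max_restarts} restarts")
    return False
```

Passing `sleep` as a parameter lets the loop be exercised without real delays; in production the default `time.sleep` applies.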
Taking this knowledge base quite a bit further, Coyote Point has extended its application load balancer into the VMWare world to the extent that rules can be set up for spawning additional machines from prestored images. This generation of load balancers is able to probe higher in the ISO stack, and it has the ability to detect if a Layer 7 application like Apache has crashed and then do something about it. According to Sergey Katsev, Engineering Project Manager at Coyote Point Systems:
Actually, we have a few customers who have a few applications "in the cloud" and still have a minimal datacenter "since they have control of it." Either way, app load balancing is needed since otherwise you don't know when your application has failed. . . . Amazon or whatever will guarantee that the "server" remains up, but they have no way of guaranteeing that your Apache Web server hasn't crashed.
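Katsev's point, that a provider can keep the "server" up while the Apache Web server on it has failed, is the difference between a Layer 4 check (will something accept a TCP connection?) and a Layer 7 check (does the application answer correctly?). The Python sketch below shows both using only the standard library. Treating anything other than HTTP 200 as unhealthy, the 2-second timeout, and bypassing any configured proxy are assumptions; commercial load balancers typically also match on response content.

```python
import http.client
import socket
import urllib.request


def tcp_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Layer 4 probe: True if anything accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_healthy(url: str, timeout: float = 2.0) -> bool:
    """Layer 7 probe: True only if the Web application answers with HTTP 200."""
    # An empty ProxyHandler makes the probe talk to the server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (e.g., a 503 from a dead app) subclass OSError.
        return False


if __name__ == "__main__":
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class BrokenApp(BaseHTTPRequestHandler):
        """Simulates a front end that accepts connections but whose app has died."""

        def do_GET(self) -> None:
            self.send_response(503)
            self.end_headers()

        def log_message(self, *args) -> None:  # keep the demo quiet
            pass

    server = HTTPServer(("127.0.0.1", 0), BrokenApp)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print("Layer 4 says up:", tcp_alive("127.0.0.1", port))
    print("Layer 7 says healthy:", http_healthy(f"http://127.0.0.1:{port}/"))
    server.shutdown()
    server.server_close()
```

Against the simulated failure, the TCP probe reports the server as up while the HTTP probe correctly reports the application as unhealthy.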
With technology and deployment moving toward cloud capability, the next big question is where the servers and applications will live. This is the point at which the cloud begins to separate from simple virtualization, and decisions we've discussed earlier—decisions about bandwidth and networking infrastructure—are joined with business strategy concerns to determine whether it's time to move data and apps out of the local network. Now an IT professional has the choice to have apps live both in a local data center and in the cloud. It isn't a hard stretch to imagine that most of the time a key e-commerce app will live in a small but adequate data center in Corporation Y. Suppose, however, that a CNN reporter stumbles across their newest widget and highlights it every half-hour all over the world. Suddenly the Web load on this tiny little e-commerce app skyrockets, and if nothing is done the server in question will die a horrible death. However, preplanning has paid off, the meat-and-potatoes apps have already been set up in the clouds, and the load balancer is spinning up the cloud apps in a big hurry. Now, with the business surge spread across the entire North American continent (and the small but adequate data center), Corporation Y can reap the benefits of the CNN report.
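The decision the load balancer makes in this scenario can be reduced to a capacity calculation: how many cloud instances are needed to absorb whatever the local data center can't handle. The Python sketch below assumes each cloud instance serves a fixed number of requests per second, keeps a 20 percent safety margin, and caps the count at a budget limit. All of these figures are hypothetical, and a real deployment would also have to allow for the minutes an instance takes to boot.

```python
import math


def cloud_instances_needed(demand_rps: float, local_capacity_rps: float,
                           per_instance_rps: float, headroom: float = 0.2,
                           max_instances: int = 50) -> int:
    """Number of cloud instances to spin up for the overflow traffic.

    headroom: extra fraction of demand to provision for (assumed 20%).
    max_instances: spending cap agreed with finance (hypothetical).
    """
    overflow = demand_rps * (1 + headroom) - local_capacity_rps
    if overflow <= 0:
        return 0  # the small but adequate data center can cope on its own
    return min(math.ceil(overflow / per_instance_rps), max_instances)


if __name__ == "__main__":
    for demand in (1_000, 6_000, 50_000):
        n = cloud_instances_needed(demand, local_capacity_rps=1_500, per_instance_rps=400)
        print(f"{demand:>6} req/s -> {n} cloud instances")
```

Re-evaluating this every monitoring interval also handles the return trip: once the surge fades, the answer drops back to zero and the cloud instances can be released.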
Brian J. S. Chee is a senior contributing editor at InfoWorld magazine. Chee currently does research for the University of Hawaii School of Ocean and Earth Sciences and Technology and has a unique insight into networking trends and the emergence of new technology.
Curtis Franklin Jr. is a senior writer at NetWitness and has been an information technology writer for more than 25 years.
Printed with permission from CRC Press. Copyright 2010. Cloud Computing: Technologies and Strategies of the Ubiquitous Data Center by Brian J. S. Chee and Curtis Franklin Jr. For more information about this title and other similar books, please visit http://www.crcpress.com/.