Managing Xen shared resources: Credit scheduler and Xen scheduler

This chapter excerpt examines the advantages of Xen and the best ways to manage Xen shared resources, such as physical and virtual CPUs, the credit scheduler and Xen scheduler.

Solutions Provider Takeaway: Solutions providers can use this chapter excerpt to learn about the advantages of Xen. You'll also find information on managing Xen shared resources, such as physical and virtual CPUs, and how to use the credit scheduler and Xen scheduler.

About the book:
This chapter excerpt on Hosting untrusted users under Xen: Lessons from the trenches is taken from The Book of Xen: A practical guide for the system administrator. This book advises solutions providers on the best practices for Xen installation, networking, memory management and virtualized storage. You'll also find information on virtual hosting, installing and managing multiple guests, easily migrating systems and troubleshooting common Xen issues.

Now that we've gone over the basics of Xen administration -- storage, networking, provisioning, and management -- let's look at applying these basics in practice. This chapter is mostly a case study of our VPS hosting firm and the lessons we've learned from renting Xen instances to the public.

The most important lesson of public Xen hosting is that the users can't be trusted to cooperate with you or each other. Some people will always try to seize as much as they can. Our focus will be on preventing this tragedy of the commons.

Advantages for the Users

There's exactly one basic reason that a user would want to use a Xen VPS rather than paying to colocate a box in your data center: it's cheap, especially for someone who's just interested in some basic services, rather than massive raw performance.

Note: Grid Computing and Virtualization
One term that you hear fairly often in connection with Xen is grid computing. The basic idea behind grid computing is that you can quickly and automatically provision and destroy nodes. Amazon's EC2 service is a good example of a grid computing platform that allows you to rent Linux servers by the hour.

Grid computing doesn't require virtualization, but the two concepts are fairly closely linked. One could design a system using physical machines and PXEboot for fast, easy, automated provisioning without using Xen, but a virtualization system would make the setup more lightweight, agile, and efficient.

There are several open source projects that are attempting to create a standard and open interface to provision "grid computing" resources. One such project is Eucalyptus. We feel that standard frameworks like this -- that allow you to easily switch between grid computing providers -- are essential if "the grid" is to survive.

Xen also gives users nearly all the advantages they'd get from colocating a box: their own publicly routed network interface, their own disk, root access, and so forth. With a 128MB VM, they can run DNS, light mail service, a web server, IRC, SSH, and so on. For lightweight services like these, the power of the box is much less important than its basic existence -- just having something available and publicly accessible makes life more convenient.

You also have the basic advantages of virtualization, namely, that hosting one server with 32GB of RAM is a whole lot cheaper than hosting 32 servers with 1GB of RAM each (or even 4 servers with 8GB RAM each). In fact, the price of RAM being what it is, I would argue that it's difficult to even economically justify hosting a general-purpose server with less than 32GB of RAM.

The last important feature of Xen is that, relative to other virtualization systems, it's got a good combination of light weight, strong partitioning, and robust resource controls. Unlike some other virtualization options, it's consistent -- a user can rely on getting exactly the amount of memory, disk space, and network bandwidth that he's signed up for and approximately as much CPU and disk bandwidth.

Shared Resources and Protecting Them from the Users

Xen's design is congruent to good security.
-- Tavis Ormandy

It's a ringing endorsement, by security-boffin standards. By and large, with Xen, we're not worried about keeping people from breaking out of their virtual machines -- Xen itself is supposed to provide an appropriate level of isolation. In paravirtualized mode, Xen doesn't expose hardware drivers to domUs, which eliminates one major attack vector. For the most part, securing a dom0 is exactly like securing any other server, except in one area.

That area of possible concern is in the access controls for shared resources, which are not entirely foolproof. The primary worry is that malicious users could gain more resources than they're entitled to, or in extreme cases cause denial-of-service attacks by exploiting flaws in Xen's accounting. In other words, we are in the business of enforcing performance isolation, rather than specifically trying to protect the dom0 from attacks via the domUs.

Most of the resource controls that we present here are aimed at users who aren't necessarily malicious -- just, perhaps, exuberant.

Tuning CPU Usage

The first shared resource of interest is the CPU. While memory and disk size are easy to tune -- you can just specify memory in the config file, while disk size is determined by the size of the backing device -- fine-grained CPU allocation requires you to adjust the scheduler.

Scheduler Basics

The Xen scheduler acts as a referee between the running domains. In some ways it's a lot like the Linux scheduler: It can preempt processes as needed, it tries its best to ensure fair allocation, and it ensures that the CPU wastes as few cycles as possible. As the name suggests, Xen's scheduler schedules domains to run on the physical CPU. These domains, in turn, schedule and run processes from their internal run queues.

Because the dom0 is just another domain as far as Xen's concerned, it's subject to the same scheduling algorithm as the domUs. This can lead to trouble if it's not assigned a high enough weight because the dom0 has to be able to respond to I/O requests. We'll go into more detail on that topic a bit later, after we describe the general procedures for adjusting domain weights.

Xen can use a variety of scheduling algorithms, ranging from the simple to the baroque. Although Xen has shipped with a number of schedulers in the past, we're going to concentrate on the credit scheduler; it's the current default and recommended choice and the only one that the Xen team has indicated any interest in keeping.

The xm dmesg command will tell you, among other things, what scheduler Xen is using.

# xm dmesg | grep scheduler
(XEN) Using scheduler: SMP Credit Scheduler (credit)

If you want to change the scheduler, you can set it as a boot parameter -- to change to the SEDF scheduler, for example, append sched=sedf to the kernel line in GRUB. (That's the Xen kernel, not the dom0 Linux kernel loaded by the first module line.)

VCPUs and Physical CPUs

For convenience, we consider each Xen domain to have one or more virtual CPUs (VCPUs), which periodically run on the physical CPUs. These are the entities that consume credits when run. To examine VCPUs, use xm vcpu-list <domain>:

# xm vcpu-list horatio
Name                  ID  VCPUs  CPU  State   Time(s)  CPU Affinity
horatio               16      0    0    ---  140005.6  any cpu
horatio               16      1    2    r--  139968.3  any cpu

In this case, the domain has two VCPUs, 0 and 1. VCPU 1 is in the running state on (physical) CPU 2. Note that Xen will try to spread VCPUs across CPUs as much as possible. Unless you've pinned them manually, VCPUs can occasionally switch CPUs, depending on which physical CPUs are available.
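If you want to check VCPU placement from a script, the listing can be parsed mechanically. The following is a minimal Python sketch, assuming the column layout shown in the listing (name, ID, VCPU, CPU, state, time, affinity); the function name parse_vcpu_list is ours rather than anything Xen provides, and the exact output format may vary between Xen versions.

```python
def parse_vcpu_list(text):
    """Parse `xm vcpu-list` style output into one dict per VCPU."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # ignore blank or malformed lines
        name, dom_id, vcpu, cpu, state, seconds = fields[:6]
        rows.append({
            "name": name,
            "id": int(dom_id),
            "vcpu": int(vcpu),
            # Assumption: a non-numeric CPU field means "not placed on a CPU".
            "cpu": int(cpu) if cpu.isdigit() else None,
            "running": state.startswith("r"),
            "time_s": float(seconds),
            # Everything after the time column is the affinity description.
            "affinity": " ".join(fields[6:]),
        })
    return rows


# Sample input copied from the listing for the domain horatio.
SAMPLE = """\
Name      ID  VCPUs  CPU  State   Time(s)  CPU Affinity
horatio   16      0    0    ---  140005.6  any cpu
horatio   16      1    2    r--  139968.3  any cpu
"""

vcpus = parse_vcpu_list(SAMPLE)
running = [(v["vcpu"], v["cpu"]) for v in vcpus if v["running"]]
print(running)  # (VCPU, physical CPU) pairs that are currently running
```

In practice you would feed the function the captured output of xm vcpu-list rather than a hardcoded sample.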

To specify the number of VCPUs for a domain, specify the vcpus= directive in the config file. You can also change the number of VCPUs while a domain is running using xm vcpu-set. However, note that you can decrease the number of VCPUs this way, but you can't increase the number of VCPUs beyond the initial count.

About the authors:
Luke S. Crawford is a Xen consultant, working on corporate server consolidation in a Fortune 100 corporate environment. Crawford also works on a Xen hosting venture.

Chris Takemura is a recent graduate and occasional Xen consultant. He is currently working on a Xen hosting venture.

To set the CPU affinity, use xm vcpu-pin <domain> <vcpu> <pcpu>. For example, to switch the CPU assignment in the domain horatio, so that VCPU0 runs on CPU2 and VCPU1 runs on CPU0:

# xm vcpu-pin horatio 0 2

# xm vcpu-pin horatio 1 0

Equivalently, you can pin VCPUs in the domain config file (/etc/xen/horatio, if you're using our standard naming convention) like this:

vcpus = 2
cpus = ["0", "2"]

This gives the domain two VCPUs, pins the first VCPU to the first physical CPU, and pins the second VCPU to the third physical CPU.

Credit Scheduler

The Xen team designed the credit scheduler to minimize wasted CPU time. This makes it a work-conserving scheduler, in that it tries to ensure that the CPU will always be working whenever there is work for it to do.

As a consequence, if there is more real CPU available than the domUs are demanding, all domUs get all the CPU they want. When there is contention -- that is, when the domUs in aggregate want more CPU than actually exists -- then the scheduler arbitrates fairly between the domains that want CPU.

Xen does its best to do a fair division, but the scheduling isn't perfect by any stretch of the imagination. In particular, cycles spent servicing I/O by domain 0 are not charged to the responsible domain, leading to situations where I/O-intensive clients get a disproportionate share of CPU usage.

Nonetheless, you can get pretty good allocation in nonpathological cases. (Also, in our experience, the CPU sits idle most of the time anyway.)

The credit scheduler assigns each domain a weight and, optionally, a cap. The weight indicates the relative CPU allocation of a domain -- if the CPU is scarce, a domain with a weight of 512 will receive twice as much CPU time as a domain with a weight of 256 (the default). The cap sets an absolute limit on the amount of CPU time a domain can use, expressed in hundredths of a CPU. Note that the CPU cap can exceed 100 on multiprocessor hosts.
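To make the weight and cap arithmetic concrete, here is a small Python sketch of the idealized allocation under full contention. It is a back-of-the-envelope model, not Xen's implementation: it assumes every domain wants as much CPU as it can get, ignores the number of VCPUs each domain has, and treats a cap of 0 as "no cap" (as in the xm sched-credit output). The function name contended_shares and the second domain name, ophelia, are made up for illustration.

```python
def contended_shares(weights, total_cpus=1.0, caps=None):
    """Estimate each domain's CPU allotment (in CPUs) when all are busy.

    weights: {domain: weight}
    caps:    {domain: cap in hundredths of a CPU}; 0 or missing means uncapped
    """
    caps = caps or {}
    shares = {}
    active = dict(weights)
    remaining = total_cpus
    while active:
        total_w = sum(active.values())
        # Domains whose weight-proportional share would exceed their cap.
        capped = [d for d, w in active.items()
                  if caps.get(d, 0) and remaining * w / total_w > caps[d] / 100.0]
        if not capped:
            # Nobody is limited by a cap: split what is left by weight.
            for d, w in active.items():
                shares[d] = remaining * w / total_w
            break
        # Pin capped domains at their cap and redistribute the leftover CPU.
        for d in capped:
            shares[d] = caps[d] / 100.0
            remaining -= shares[d]
            del active[d]
    return shares


by_weight = contended_shares({"horatio": 512, "ophelia": 256})
print({d: round(s, 3) for d, s in by_weight.items()})
# {'horatio': 0.667, 'ophelia': 0.333}

with_cap = contended_shares({"horatio": 512, "ophelia": 256},
                            caps={"horatio": 50})
print({d: round(s, 3) for d, s in with_cap.items()})
# {'horatio': 0.5, 'ophelia': 0.5}
```

The first call shows the 2:1 split described above; the second shows a cap of 50 holding horatio to half a CPU even though its weight would entitle it to more.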

The scheduler transforms the weight into a credit allocation for each VCPU, using a separate accounting thread. As a VCPU runs, it consumes credits. If a VCPU runs out of credits, it only runs when other, more thrifty VCPUs have finished executing, as shown in Figure 7-1. Periodically, the accounting thread goes through and gives everybody more credits.

Figure 7-1: VCPUs wait in two queues: one for VCPUs with credits and the other for those that are over their allotment. Once the first queue is exhausted, the CPU will pull from the second.
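The two-queue behavior can be illustrated with a toy simulation. This Python sketch is deliberately simplified and is not how Xen implements the credit scheduler: it models one physical CPU, assumes every VCPU always wants to run, and uses arbitrary numbers for the accounting period, the credits handed out, and the cost of a tick. All names (simulate, heavy, light) are hypothetical.

```python
from collections import deque


def simulate(weights, ticks=3000, period=30, credits_per_period=300, cost=10):
    """Toy single-CPU credit model; returns the number of ticks each VCPU ran."""
    total_w = sum(weights.values())
    credits = {v: 0.0 for v in weights}
    ran = {v: 0 for v in weights}
    order = deque(weights)  # round-robin order of runnable VCPUs
    for t in range(ticks):
        if t % period == 0:
            # Accounting pass: hand out credits in proportion to weight.
            for v, w in weights.items():
                credits[v] += credits_per_period * w / total_w
        # First queue: VCPUs that still have credits.
        under = [v for v in order if credits[v] > 0]
        # Second queue is used only when the first is empty, so the CPU
        # never idles while something wants to run (work conserving).
        pick = under[0] if under else order[0]
        credits[pick] -= cost
        ran[pick] += 1
        order.remove(pick)
        order.append(pick)  # move the VCPU that just ran to the back
    return ran


ran = simulate({"heavy": 512, "light": 256})
print(ran)
```

With these settings the VCPU weighted 512 ends up with roughly twice the run time of the one weighted 256, which is the proportional behavior the weights are meant to produce.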

In this case, the details are probably less important than the practical application. Using the xm sched-credit commands, we can adjust CPU allocation on a per-domain basis. For example, here we'll increase a domain's CPU allocation. First, to list the weight and cap for the domain horatio:

# xm sched-credit -d horatio

{'cap': 0, 'weight': 256}

Then, to modify the scheduler's parameters:

# xm sched-credit -d horatio -w 512

# xm sched-credit -d horatio

{'cap': 0, 'weight': 512}

Of course, the value "512" only has meaning relative to the other domains that are running on the machine. Make sure to set all the domains' weights appropriately.

To set the cap for a domain:

# xm sched-credit -d <domain> -c <cap>

Scheduling for Providers

We decided to divide the CPU along the same lines as the available RAM -- it stands to reason that a user paying for half the RAM in a box will want more CPU than someone with a 64MB domain. Thus, in our setup, a customer with 25 percent of the RAM also has a minimum share of 25 percent of the CPU cycles.

The simple way to do this is to assign each domain a weight equal to the number of megabytes of memory it has and leave the cap empty. The scheduler will then handle converting that into fair proportions. For example, our aforementioned user with half the RAM will get about as much CPU time as the rest of the users put together.

Of course, that's the worst case; that is what the user will get in an environment of constant struggle for the CPU. Idle domains will automatically yield the CPU. If all domains but one are idle, that one can have the entire CPU to itself.

Note: It's essential to make sure that the dom0 has sufficient CPU to service I/O requests. You can handle this by dedicating a CPU to the dom0 or by giving the dom0 a very high weight -- high enough to ensure that it never runs out of credits. We handle the problem by weighting each domU with its RAM amount and weighting the dom0 at 6000.
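The weight = memory policy is easy to script. The sketch below is one way to do it in Python, under some assumptions: the memory figures and the domain names other than horatio are invented, the dom0 is addressed as Domain-0, and the script only builds the xm sched-credit command lines as strings rather than running them. The 1 to 65535 range check reflects the weight limits documented for the credit scheduler.

```python
def weight_commands(domu_memory_mb, dom0_weight=6000, dom0_name="Domain-0"):
    """Build xm sched-credit commands that set weight = memory in MB."""
    cmds = [f"xm sched-credit -d {dom0_name} -w {dom0_weight}"]
    for name, mb in sorted(domu_memory_mb.items()):
        if not 1 <= mb <= 65535:
            # Outside the credit scheduler's accepted weight range.
            raise ValueError(f"weight {mb} for {name} is out of range")
        cmds.append(f"xm sched-credit -d {name} -w {mb}")
    return cmds


# Hypothetical box: one large customer and two small ones.
domains = {"horatio": 2048, "ophelia": 1024, "laertes": 1024}
commands = weight_commands(domains)
for cmd in commands:
    print(cmd)
```

Printing the commands first makes it easy to review them before piping them to a shell on the dom0.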

This simple weight = memory formula becomes a bit more complex when dealing with multiprocessor systems because independent systems of CPU allocation come into play. A good rule would be to allocate VCPUs in proportion to memory (and therefore in proportion to weight). For example, a domain with half the RAM on a box with four cores (and hyperthreading turned off) should have at least two VCPUs. Another solution would be to give all domains as many VCPUs as physical processors in the box -- this would allow all domains to burst to the full CPU capacity of the physical machine but might lead to increased overhead from context switches.
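As a worked version of the first rule, this Python sketch computes a minimum VCPU count from a domain's share of memory. It is only one reasonable reading of "in proportion to memory": it rounds up, never goes below one VCPU, and never exceeds the number of physical CPUs. The function name vcpus_for and the memory sizes are illustrative assumptions.

```python
import math


def vcpus_for(domain_mb, total_mb, physical_cpus):
    """Minimum VCPUs for a domain, proportional to its share of memory."""
    fraction = domain_mb / total_mb
    # Round up so a domain can actually use its proportional CPU share,
    # and clamp to the range 1..physical_cpus.
    return max(1, min(physical_cpus, math.ceil(fraction * physical_cpus)))


half_box = vcpus_for(2048, 4096, 4)   # half the RAM on a four-core box
small = vcpus_for(64, 4096, 4)        # a 64MB domain on the same box
print(half_box, small)  # 2 1
```

The result can go straight into the vcpus= directive in the domain's config file.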


Printed with permission from No Starch Press Inc. Copyright 2009. The Book of Xen: A Practical Guide for the System Administrator by Chris Takemura and Luke S. Crawford. For more information about this title and other similar books, please visit No Starch Press Inc.
