“So, there is a problem with accounting. Here's the long and short of it.
The host runs at HZ ticks per second. Unless you changed it (and
you likely did not), that’s 100 ticks per second. This means the
smallest granularity of time that can be quantized inside the VM
is “slightly less” than 100 ticks per second. And of course, you
can’t fire 100 ticks into the guest (for each guest) if the host
itself is only receiving 100 opportunities to do that per second.
Similarly, each RTC in the guest expects a periodic timer firing at 128 Hz. Clearly that can’t be accomplished if the host itself is only scheduling at 100 Hz.
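The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how vmm(4) or vmd(8) actually schedule timers: it assumes the host can inject at most one pending guest timer tick at each of its own ticks, and that expirations piling up between two host ticks are coalesced into one. The function name `delivered_ticks` is made up for this illustration.

```python
from fractions import Fraction


def delivered_ticks(host_hz, guest_hz, seconds=1):
    """Count guest timer ticks deliverable in `seconds` (toy model).

    Assumption: the guest timer expires every 1/guest_hz seconds, and
    the host gets one chance to inject a tick every 1/host_hz seconds.
    Several expirations between two host ticks collapse into one.
    """
    delivered = 0
    accounted = 0  # guest expirations already covered by an injection
    for n in range(1, host_hz * seconds + 1):
        now = Fraction(n, host_hz)      # time of the n-th host tick
        expired = int(now * guest_hz)   # guest expirations so far (floor)
        if expired > accounted:
            delivered += 1              # one injection, however many piled up
            accounted = expired
    return delivered


# A 128 Hz guest RTC on a 100 Hz host loses ticks in this model ...
print(delivered_ticks(100, 128))
# ... while a 1000 Hz host has room for all of them.
print(delivered_ticks(1000, 128))
```

In this idealized model the 100 Hz host delivers only 100 of the 128 expected ticks per second, while the 1000 Hz host delivers all 128. Real behaviour also depends on scheduling overhead and on how many guests share the host, which the model ignores.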
What this leads to is errors in accounting since the guest kernels
don’t have a consistent view of time. What you’ll likely see is odd
numbers for the “rate” column in
This is fairly harmless, but could be contributing to the “87%” interrupt number. I bet you actually aren’t running 87% interrupt load there, but the accounting is confused because of the skewed timer.
In -current, I fixed some of the egregious problems in
asserted PIC lines in Slovenia last month. (There are still more
to fix though).
As an experiment on a spare machine (if you have one), build
a HOST kernel using HZ=1000 (this can be set in sys/conf/param.c
before make config), or use the -DHZ=xxx build parameter.
Boot to that kernel on the host and verify you’re getting
1000 * ncpus “clock” interrupts via
vmstat -zi (again, on the HOST):
irq0/clock 14398951 3999
Example for my 4 CPU machine
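The check above is simple arithmetic: the rate column for the clock interrupt should be close to HZ times the number of CPUs. A minimal sketch of that comparison follows; the helper names and the 1% tolerance are assumptions made for illustration, and the rate is typed in by hand rather than parsed from vmstat output.

```python
def expected_clock_rate(hz, ncpus):
    """Clock interrupts per second expected on the host: HZ * ncpus."""
    return hz * ncpus


def rate_matches(observed, hz, ncpus, tolerance=0.01):
    """True if the observed rate is within `tolerance` of HZ * ncpus.

    The 1% default tolerance is an arbitrary choice for this sketch.
    """
    expected = expected_clock_rate(hz, ncpus)
    return abs(observed - expected) <= tolerance * expected


# Rate taken from the vmstat line quoted above (4 CPU machine).
print(expected_clock_rate(1000, 4))      # 4000
print(rate_matches(3999, 1000, 4))       # True: the HZ=1000 kernel is running
print(rate_matches(3999, 100, 4))        # False: not what a default HZ=100 host shows
```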
Use regular 100 Hz kernels in the guest, and you should see some of these issues go away.
So, why don’t we make 1,000 Hz the default? Well, consider you run a 1,000 Hz host and a 1,000 Hz guest … you’re back in the same problem again. What really is needed here is a deadline scheduler (something like Linux’s tickless model) to handle arbitrary guest timer granularities. This is hard and is something dlg@ and I have been working on but it’s not ready. A 1,000 Hz (or even 2,000 Hz) host is the stopgap measure until that’s done.
Again, aside from cosmetic issues, I don’t think this is causing you or the users any real pain, just wanted to explain what I think is going on.”
See also vmstat(8), vmm(4), vmd(8)