Clock

“So, there is a problem with accounting. Here's the long and short of it.

The host runs at HZ ticks per second. Unless you changed it (and you likely did not), that’s 100 ticks per second. This means the finest time granularity the VM can resolve is “slightly less” than 100 ticks per second. And of course, you can’t fire 100 ticks per second into each guest if the host itself is only receiving 100 opportunities to do that per second.

Similarly, each RTC in the guest expects a periodic timer firing at 128 Hz. Clearly that can’t be accomplished if the host itself is only scheduling at 100 Hz.
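To make the arithmetic concrete, here is a minimal sketch of an idealized model, not how vmm actually schedules timers. It assumes the host can inject at most one guest timer interrupt per host tick and that guest deadlines falling inside the same host tick are coalesced into a single injection. The function name `delivered_ticks` and the model itself are illustrative assumptions; the real result is “slightly less” than this upper bound because the model ignores all scheduling overhead.

```python
def delivered_ticks(host_hz: int, guest_hz: int, seconds: int = 1) -> int:
    """Upper bound on guest timer interrupts delivered in `seconds`.

    Idealized assumption: the host may inject at most one interrupt per
    host tick, and only if at least one new guest deadline has elapsed
    since the previous host tick.
    """
    delivered = 0
    last_due = 0  # guest deadlines already accounted for
    for tick in range(1, host_hz * seconds + 1):
        # Guest deadlines that have elapsed by this host tick (integer math).
        due = (tick * guest_hz) // host_hz
        if due > last_due:
            delivered += 1  # several elapsed deadlines collapse into one
            last_due = due
    return delivered


# A 128 Hz guest RTC on a 100 Hz host gets at most 100 interrupts per second.
rtc_on_100hz_host = delivered_ticks(100, 128)    # 100
# The same RTC on a 1000 Hz host can receive all 128.
rtc_on_1000hz_host = delivered_ticks(1000, 128)  # 128
```

Under this model the guest loses 28 of every 128 expected RTC interrupts on a 100 Hz host, which is the kind of shortfall that skews its view of time.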

This leads to accounting errors, because the guest kernels don’t have a consistent view of time. You’ll likely see odd numbers in the “rate” column of vmstat -zi.

This is fairly harmless, but could be contributing to the “87%” interrupt number. I bet you actually aren’t running 87% interrupt load there, but the accounting is confused because of the skewed timer.

Some ideas:

Update to -current; I fixed some of the egregious problems in asserted PIC lines in Slovenia last month. (There are still more to fix, though.)
As an experiment on a spare machine (if you have one), build a HOST kernel using HZ=1000 (this can be set in sys/conf/param.c before make config), or use the -DHZ=xxx build parameter.
Boot to that kernel on the host and verify you’re getting 1000 * ncpus “clock” interrupts via vmstat -zi (again, on the HOST):
```
irq0/clock      14398951    3999
```
Example for my 4 CPU machine
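The check above can be sketched as a small script. This is only an illustration: it assumes the line has the three whitespace-separated fields shown in the example (name, total, rate), and the helper names `parse_vmstat_line` and `clock_rate_matches` as well as the 2% tolerance are hypothetical choices, not part of vmstat.

```python
def parse_vmstat_line(line: str) -> tuple[str, int, int]:
    """Split one `vmstat -zi` line into (name, total, rate).

    Assumes the last two fields are integers, as in the example output.
    """
    name, total, rate = line.rsplit(None, 2)
    return name, int(total), int(rate)


def clock_rate_matches(line: str, hz: int, ncpus: int,
                       tolerance: float = 0.02) -> bool:
    """True if the reported rate is within `tolerance` of hz * ncpus."""
    _, _, rate = parse_vmstat_line(line)
    expected = hz * ncpus
    return abs(rate - expected) <= tolerance * expected


sample = "irq0/clock      14398951    3999"
# 3999 is within 2% of 1000 * 4, so this is True for a 1000 Hz, 4 CPU host.
host_ok = clock_rate_matches(sample, hz=1000, ncpus=4)
```

The small tolerance is there because the reported rate is an average over uptime and will rarely equal HZ times the CPU count exactly.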

Use regular 100 HZ kernels in the guest, and you should see some of these issues go away.

So, why don’t we make 1,000 Hz the default? Well, consider you run a 1,000 Hz host and a 1,000 Hz guest … you’re back in the same problem again. What really is needed here is a deadline scheduler (something like Linux’s tickless model) to handle arbitrary guest timer granularities. This is hard and is something dlg@ and I have been working on but it’s not ready. A 1,000 Hz (or even 2,000 Hz) host is the stopgap measure until that’s done.

Again, aside from cosmetic issues, I don’t think this is causing you or the users any real pain, just wanted to explain what I think is going on.”

Mike Larkin
OpenBSD Developer
@mlarkin2012