In a multi-CPU virtual machine (VM), virtual CPUs (VCPUs) are not guaranteed to be scheduled simultaneously. Operating system (OS) constructs that busy-wait (e.g., spin locks) are written with the assumption that all CPUs run concurrently, as they do on bare metal; inside a VM, where the VCPU being waited on may not be running, they waste a lot of CPU time. The hardware-assisted Pause Loop Exit (PLE) feature detects unnecessary busy-loop constructs in guest VMs and traps to the VCPU scheduler, a.k.a. the PLE handler, which chooses the best VCPU candidate to run. The existing approach (before the optimizations described in this paper) does a directed yield (a task giving away its CPU time to another task) to a random VCPU and needs more intelligence. The over-commit ratio (the ratio of total virtual CPUs to physical CPUs) must also be considered carefully when designing the VCPU scheduling algorithm. For example, trapping to the PLE handler is pure overhead in under-committed cases, yet the existing approach has no awareness of the over-commit ratio. Effective VCPU scheduling is therefore needed to boost VM performance. We present three major improvements to the existing VCPU scheduling technique, which include choosing a better VCPU for directed yield and optimizing for under-committed cases. All of these changes have been accepted into the Linux kernel. They potentially bring around 300-400% improvement to I/O-intensive cloud VMs (large under-committed guests) and up to 25% improvement to over-committed, CPU-intensive VMs.
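To make the two ideas in the abstract concrete, the following is a minimal Python sketch of an over-commit-aware choice of directed-yield target. It is an illustration only, not the kernel implementation and not necessarily the heuristics the paper adopts. Everything in it is an assumption made for the example: the `VCPU` fields, the round-robin scan starting after `last_boosted`, the rule that a good candidate is one that was preempted and is not itself spinning, and the threshold of 1.0 for "under-committed".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VCPU:
    """Hypothetical per-VCPU state used only for this illustration."""
    preempted: bool  # scheduled out by the host while it still had work
    spinning: bool   # itself busy-waiting (for example, it recently PLE-exited)


def overcommit_ratio(total_vcpus: int, physical_cpus: int) -> float:
    """Ratio of total virtual CPUs to physical CPUs, as defined in the abstract."""
    return total_vcpus / physical_cpus


def pick_yield_target(vcpus: List[VCPU], me: int, last_boosted: int,
                      ratio: float) -> Optional[int]:
    """Return the index of the VCPU to yield to, or None to skip the yield.

    Assumed policy (not taken from the paper):
      * when under-committed (ratio <= 1.0), do not yield at all;
      * otherwise scan round-robin, starting after the last boosted VCPU,
        and pick the first VCPU that was preempted and is not spinning.
    """
    if ratio <= 1.0:
        return None  # under-commit: a directed yield would only add overhead

    n = len(vcpus)
    for step in range(1, n + 1):
        idx = (last_boosted + step) % n
        if idx == me:
            continue  # never yield to ourselves
        cand = vcpus[idx]
        if cand.preempted and not cand.spinning:
            return idx
    return None  # no suitable candidate found


if __name__ == "__main__":
    guest = [
        VCPU(preempted=False, spinning=True),   # 0: the spinning caller
        VCPU(preempted=True, spinning=True),    # 1: preempted, but also spinning
        VCPU(preempted=True, spinning=False),   # 2: preempted with useful work
        VCPU(preempted=False, spinning=False),  # 3: currently running
    ]
    ratio = overcommit_ratio(total_vcpus=8, physical_cpus=4)
    print(pick_yield_target(guest, me=0, last_boosted=0, ratio=ratio))
```

Compared with yielding to a random VCPU, a scan of this kind avoids handing CPU time to a VCPU that would only spin again, and the early return captures the point that trapping and yielding is wasted work when every VCPU can already run.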