takes place due to a lock-scheduling conflict that occurs when multiple vCPUs run on a single host CPU. That is, a vCPU (vCPU1) trying to acquire a lock is scheduled before the vCPU (vCPU0) that already holds the lock on the same host CPU.
Embedded systems require efficient memory handling, so the overhead incurred by memory management is an important consideration for embedded hypervisors. The ARM architecture provides two-stage translation tables (or nested page tables) for guest memory virtualization. Fig. 13 shows the two-stage MMU on ARM. The guest OS is responsible for programming the stage1 translation table, which translates a guest virtual address (GVA) to an intermediate physical address (IPA). ARM hypervisors are responsible for programming the stage2 translation table, which translates an intermediate physical address (IPA) to the actual physical address (PA).
Translation table walks are required on TLB misses. The number of stage2 translation table levels accessed during such a walk affects the memory bandwidth and overall performance of the virtualized system: with N levels in the stage1 translation table and M levels in the stage2 translation table, a walk incurs NxM memory accesses in the worst case. Clearly, the TLB-miss penalty is very expensive for guests on any virtualized system. To reduce the TLB-miss penalty of the two-stage MMU, ARM hypervisors create bigger pages in the stage2 translation table.
Xvisor ARM pre-allocates contiguous host memory as guest RAM at guest creation time. It creates a separate three-level stage2 translation table for each guest. Xvisor ARM can create 4 KB, 2 MB, or 1 GB translation table entries in stage2, and it always creates the biggest possible stage2 entry permitted by the IPA and PA alignment. Finally, because the guest RAM is flat/contiguous (unlike in other hypervisors), it helps speculative cache accesses, which further improves memory access performance for guests.