.. SPDX-License-Identifier: GPL-2.0

=================================
The PPC KVM paravirtual interface
=================================

The basic execution principle by which KVM on PowerPC works is to run all kernel
space code in PR=1 which is user space. This way we trap all privileged
instructions and can emulate them accordingly.

Unfortunately that is also the downfall. There are quite a few privileged
instructions that needlessly return us to the hypervisor even though they
could be handled differently.

This is what the PPC PV interface helps with. It takes privileged instructions
and transforms them into unprivileged ones with some help from the hypervisor.
This cuts down virtualization costs by about 50% on some of my benchmarks.

The code for that interface can be found in arch/powerpc/kernel/kvm*

Querying for existence
======================

To find out if we're running on KVM or not, we leverage the device tree. When
Linux is running on KVM, a node /hypervisor exists. That node contains a
compatible property with the value "linux,kvm".

Once you have determined that you're running under a PV capable KVM, you can
use hypercalls as described below.

KVM hypercalls
==============

Inside the device tree's /hypervisor node there's a property called
'hypercall-instructions'.
This property contains at most 4 opcodes that make
up the hypercall. To issue a hypercall, just execute these instructions.

The parameters are as follows:

        ======== ================ ================
        Register IN               OUT
        ======== ================ ================
        r0       -                volatile
        r3       1st parameter    Return code
        r4       2nd parameter    1st output value
        r5       3rd parameter    2nd output value
        r6       4th parameter    3rd output value
        r7       5th parameter    4th output value
        r8       6th parameter    5th output value
        r9       7th parameter    6th output value
        r10      8th parameter    7th output value
        r11      hypercall number 8th output value
        r12      -                volatile
        ======== ================ ================

Hypercall definitions are shared in generic code, so the same hypercall numbers
apply for x86 and powerpc alike with the exception that each KVM hypercall
also needs to be ORed with the KVM vendor code which is (42 << 16).
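
As a rough illustration of the numbering rule above, the following C sketch
builds the value a guest would place in r11. The macro and function names are
made up for this example and are not taken from the kernel sources; only the
vendor code 42 and the shift by 16 come from the text.

```c
#include <stdint.h>

/* Vendor code from the text above; the name is illustrative only. */
#define EXAMPLE_KVM_VENDOR_ID 42u

/*
 * OR the vendor code, shifted into bits 16 and up, with a generic
 * hypercall number. The result is what would be loaded into r11.
 */
static uint32_t example_hcall_token(uint32_t generic_nr)
{
	return (EXAMPLE_KVM_VENDOR_ID << 16) | generic_nr;
}
```

For a generic hypercall number of 4 (an arbitrary value picked for the
example), this yields 0x002a0004.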

Return codes can be as follows:

        ==== =========================
        Code Meaning
        ==== =========================
        0    Success
        12   Hypercall not implemented
        <0   Error
        ==== =========================

The magic page
==============

To enable communication between the hypervisor and guest there is a new shared
page that contains parts of supervisor visible register state. The guest can
map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.

With this hypercall issued the guest always gets the magic page mapped at the
desired location. The first parameter indicates the effective address when the
MMU is enabled. The second parameter indicates the address in real mode, if
applicable to the target. For now, we always map the page to -4096. This way we
can access it using absolute load and store instructions. The following
instruction reads the first field of the magic page::

        ld      rX, -4096(0)

The interface is designed to be extensible should there be need later to add
additional registers to the magic page. If you add fields to the magic page,
also define a new hypercall feature to indicate that the host can give you more
registers. Only if the host supports the additional features, make use of them.
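
To make the choice of -4096 more concrete, here is a small C sketch of the
address arithmetic behind an access such as ``ld rX, -4096(0)``. It relies on
the Power ISA rule that a base register field of 0 in such a load or store
reads as the value zero, so the sign-extended 16-bit displacement alone forms
the effective address. All names are hypothetical and exist only for this
example.

```c
#include <stdint.h>

/* Effective address of the magic page as stated in the text. */
#define EXAMPLE_MAGIC_PAGE_EA (-4096)

/*
 * Displacement to encode in "ld rX, disp(0)" for a field at byte
 * `offset` inside the page. Returns 0 (never a valid magic page
 * displacement) if the offset lies outside the 4096-byte page.
 */
static int16_t example_magic_disp(unsigned int offset)
{
	if (offset >= 4096)
		return 0;
	return (int16_t)(EXAMPLE_MAGIC_PAGE_EA + (int)offset);
}

/* Address formed from a zero base plus the sign-extended displacement. */
static uint64_t example_effective_address(int16_t disp)
{
	return (uint64_t)(int64_t)disp;
}
```

Every byte of the page therefore has a displacement between -4096 and -1,
which fits a signed 16-bit field, so no base register has to be set up before
touching the page.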

The magic page layout is described by struct kvm_vcpu_arch_shared
in arch/powerpc/include/asm/kvm_para.h.

Magic page features
===================

When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
a second return value is passed to the guest. This second return value contains
a bitmap of available features inside the magic page.

The following enhancements to the magic page are currently available:

  ============================  =======================================
  KVM_MAGIC_FEAT_SR             Maps SR registers r/w in the magic page
  KVM_MAGIC_FEAT_MAS0_TO_SPRG7  Maps MASn, ESR, PIR and high SPRGs
  ============================  =======================================

For enhanced features in the magic page, please check for the existence of the
feature before using them!

Magic page flags
================

In addition to features that indicate whether a host is capable of a particular
feature we also have a channel for a guest to tell the host whether it's capable
of something. This is what we call "flags".

Flags are passed to the host in the low 12 bits of the Effective Address.
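
The two directions of negotiation can be sketched in C as follows: the guest
folds its flags into the low 12 bits of the page-aligned effective address,
and afterwards tests the returned feature bitmap before touching an enhanced
field. The names and bit positions are assumptions made for the example; the
real definitions live in the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAGIC_PAGE_EA ((uint64_t)-4096) /* page aligned, low 12 bits zero */
#define EXAMPLE_FLAGS_MASK    0xfffu            /* low 12 bits carry guest flags */
#define EXAMPLE_GUEST_FLAG    (1u << 0)         /* hypothetical flag bit */
#define EXAMPLE_HOST_FEATURE  (1u << 0)         /* hypothetical feature bit */

/* First hypercall parameter: effective address with flags in the low 12 bits. */
static uint64_t example_map_param(uint64_t ea, uint32_t flags)
{
	return (ea & ~(uint64_t)EXAMPLE_FLAGS_MASK) | (flags & EXAMPLE_FLAGS_MASK);
}

/* Host-side view: split the parameter back into address and flags. */
static uint64_t example_param_ea(uint64_t param)
{
	return param & ~(uint64_t)EXAMPLE_FLAGS_MASK;
}

static uint32_t example_param_flags(uint64_t param)
{
	return (uint32_t)(param & EXAMPLE_FLAGS_MASK);
}

/* Use an enhanced field only if the returned bitmap advertises it. */
static bool example_has_feature(uint64_t features, uint32_t feature)
{
	return (features & feature) != 0;
}
```

Because the page is 4096-byte aligned, the low 12 bits of its address are
always zero, which is why they are free to carry flags without a separate
parameter.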

The following flags are currently available for a guest to expose:

  MAGIC_PAGE_FLAG_NOT_MAPPED_NX  Guest handles NX bits correctly wrt magic page

MSR bits
========

The MSR contains bits that require hypervisor intervention and bits that do
not require direct hypervisor intervention because they only get interpreted
when entering the guest or don't have any impact on the hypervisor's behavior.

The following bits are safe to be set inside the guest:

  - MSR_EE
  - MSR_RI

If any other bit changes in the MSR, please still use mtmsr(d).

Patched instructions
====================

The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
respectively on 32 bit systems with an added offset of 4 to account for big
endianness.

The following is a list of mappings the Linux kernel performs when running as
guest. Implementing any of those mappings is optional, as the instruction traps
also act on the shared page. So calling privileged instructions still works as
before.

======================= ================================
From                    To
======================= ================================
mfmsr   rX              ld      rX, magic_page->msr
mfsprg  rX, 0           ld      rX, magic_page->sprg0
mfsprg  rX, 1           ld      rX, magic_page->sprg1
mfsprg  rX, 2           ld      rX, magic_page->sprg2
mfsprg  rX, 3           ld      rX, magic_page->sprg3
mfsrr0  rX              ld      rX, magic_page->srr0
mfsrr1  rX              ld      rX, magic_page->srr1
mfdar   rX              ld      rX, magic_page->dar
mfdsisr rX              lwz     rX, magic_page->dsisr

mtmsr   rX              std     rX, magic_page->msr
mtsprg  0, rX           std     rX, magic_page->sprg0
mtsprg  1, rX           std     rX, magic_page->sprg1
mtsprg  2, rX           std     rX, magic_page->sprg2
mtsprg  3, rX           std     rX, magic_page->sprg3
mtsrr0  rX              std     rX, magic_page->srr0
mtsrr1  rX              std     rX, magic_page->srr1
mtdar   rX              std     rX, magic_page->dar
mtdsisr rX              stw     rX, magic_page->dsisr

tlbsync                 nop

mtmsrd  rX, 0           b       <special mtmsr section>
mtmsr   rX              b       <special mtmsr section>

mtmsrd  rX, 1           b       <special mtmsrd section>

[Book3S only]
mtsrin  rX, rY          b       <special mtsrin section>

[BookE only]
wrteei  [0|1]           b       <special wrteei section>
======================= ================================

Some instructions require more logic to determine what's going on than a load
or store
instruction can deliver. To enable patching of those, we keep some
RAM around into which we can live-translate instructions. What happens is the
following:

        1) copy emulation code to memory
        2) patch that code to fit the emulated instruction
        3) patch that code to return to the original pc + 4
        4) patch the original instruction to branch to the new code

That way we can inject an arbitrary amount of code as replacement for a single
instruction. This allows us to check for pending interrupts when setting EE=1
for example.

Hypercall ABIs in KVM on PowerPC
=================================

1) KVM hypercalls (ePAPR)

This is the ePAPR compliant hypercall implementation (mentioned above). Even
generic hypercalls are implemented here, like the ePAPR idle hcall. These are
available on all targets.

2) PAPR hypercalls

PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in QEMU).
These are the same hypercalls that pHyp, the POWER hypervisor, implements. Some of
them are handled in the kernel, some are handled in user space. This is only
available on book3s_64.

3) OSI hypercalls

Mac-on-Linux is another user of KVM on PowerPC, which has its own hypercalls
(dating from long before KVM). These are supported to maintain compatibility.
All these hypercalls get forwarded to user space.
This is only useful on book3s_32, but can be used with
book3s_64 as well.