.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host and the guest OS should have support for the
XIVE native exploitation interrupt mode. If not, it should run using
the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

     Each thread has an associated Thread Interrupt Management context
     composed of a set of registers. These registers let the thread
     handle priority management and interrupt acknowledgment.
     The most important are:

     - Interrupt Pending Buffer     (IPB)
     - Current Processor Priority   (CPPR)
     - Notification Source Register (NSR)

     They are exposed to software in four different pages, each proposing
     a view with a different privilege. The first page is for the
     physical thread context and the second for the hypervisor. Only the
     third (operating system) and the fourth (user level) are exposed to
     the guest.

  2. Event State Buffer (ESB)

     Each source is associated with an Event State Buffer (ESB) with
     a pair of even/odd pages which provide commands to manage the
     source, for instance to trigger, to EOI, or to turn off the source.

  3. Device pass-through

     When a device is passed-through into the guest, the source
     interrupts are from a different HW controller (PHB4) and the ESB
     pages exposed to the guest should accommodate this change.

     The passthru_irq helpers, kvmppc_xive_set_mapped() and
     kvmppc_xive_clr_mapped(), are called when the device HW irqs are
     mapped into or unmapped from the guest IRQ number space. The KVM
     device extends these helpers to clear the ESB pages of the guest IRQ
     number being mapped and then lets the VM fault handler repopulate.
     The handler will insert the ESB page corresponding to the HW
     interrupt of the device being passed-through or the initial IPI ESB
     page if the device has been removed.

     The ESB remapping is fully transparent to the guest and the OS
     device driver. All handling is done within VFIO and the above
     helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Sync all the sources and queues and mark the EQ pages dirty. This
    is to make sure that a consistent memory state is captured when
    migrating the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_ID.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================

3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |    eisn       | mask |  server | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================

4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |    unused     |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
	__u32 flags;
	__u32 qshift;
	__u64 qaddr;
	__u32 qtoggle;
	__u32 qindex;
	__u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
	forces notification without using the coalescing mechanism
	provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================

5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronize the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Not initialized source number
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |

* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
     flush any in-flight event notification and to stabilize the EQs. At
     this stage, the EQ pages are marked dirty to make sure they are
     transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQs configuration
     and the state of thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run
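
* Example: encoding attribute values

  The bit layouts given in the Groups section for
  KVM_DEV_XIVE_GRP_SOURCE, KVM_DEV_XIVE_GRP_SOURCE_CONFIG and the
  KVM_DEV_XIVE_GRP_EQ_CONFIG identifier can be packed with a few shifts
  and masks. The following is only a minimal sketch: the helper names
  (xive_source_value(), xive_source_config_value(), xive_eq_id()) are
  hypothetical and are not part of the KVM interface. Only the bit
  positions are taken from the layouts documented above.

```c
#include <stdint.h>

/*
 * Illustrative helpers only: these names are assumptions made for this
 * example and do not exist in the kernel. Bit positions follow the
 * layouts documented in the Groups section.
 */

/* KVM_DEV_XIVE_GRP_SOURCE value: bit 0 = type (0:MSI, 1:LSI), bit 1 = level. */
static inline uint64_t xive_source_value(int is_lsi, int asserted)
{
	uint64_t v = 0;

	if (is_lsi) {
		v |= 1ULL << 0;
		/* The level bit is only meaningful for an LSI. */
		if (asserted)
			v |= 1ULL << 1;
	}
	return v;
}

/*
 * KVM_DEV_XIVE_GRP_SOURCE_CONFIG value:
 * bits 2..0 priority, bits 31..3 server, bit 32 mask, bits 63..33 eisn.
 */
static inline uint64_t xive_source_config_value(uint32_t priority,
						uint32_t server,
						int masked, uint32_t eisn)
{
	return ((uint64_t)(priority & 0x7)) |
	       ((uint64_t)(server & 0x1fffffff) << 3) |
	       ((uint64_t)(masked ? 1 : 0) << 32) |
	       ((uint64_t)(eisn & 0x7fffffff) << 33);
}

/* EQ descriptor identifier: bits 2..0 priority, bits 31..3 server. */
static inline uint64_t xive_eq_id(uint32_t server, uint32_t priority)
{
	return ((uint64_t)(priority & 0x7)) |
	       ((uint64_t)(server & 0x1fffffff) << 3);
}
```

  As described above, the SOURCE and SOURCE_CONFIG values would be
  stored in the __u64 that kvm_device_attr.addr points to, while the
  EQ identifier is the attribute of the EQ_CONFIG group. The helpers
  silently truncate out-of-range inputs, so a real caller would likely
  want to validate priority and server before packing them.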