.. SPDX-License-Identifier: GPL-2.0

===========================================================
POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
===========================================================

Device types supported:
  - KVM_DEV_TYPE_XIVE     POWER9 XIVE Interrupt Controller generation 1

This device acts as a VM interrupt controller. It provides the KVM
interface to configure the interrupt sources of a VM in the underlying
POWER9 XIVE interrupt controller.

Only one XIVE instance may be instantiated. A guest XIVE device
requires a POWER9 host, and the guest OS should support the
XIVE native exploitation interrupt mode. If it does not, it should run
using the legacy interrupt mode, referred to as XICS (POWER7/8).

* Device Mappings

  The KVM device exposes different MMIO ranges of the XIVE HW which
  are required for interrupt management. These are exposed to the
  guest in VMAs populated with a custom VM fault handler.

  1. Thread Interrupt Management Area (TIMA)

  Each thread has an associated Thread Interrupt Management context
  composed of a set of registers. These registers let the thread
  handle priority management and interrupt acknowledgment. The most
  important are:

      - Interrupt Pending Buffer     (IPB)
      - Current Processor Priority   (CPPR)
      - Notification Source Register (NSR)

  They are exposed to software in four different pages, each providing
  a view with a different privilege level. The first page is for the
  physical thread context and the second for the hypervisor. Only the
  third (operating system) and the fourth (user level) are exposed to
  the guest.

  2. Event State Buffer (ESB)

  Each source is associated with an Event State Buffer (ESB) backed by
  a pair of even/odd pages which provide commands to manage the
  source, for instance to trigger it, to EOI it, or to turn it off.

  3. Device pass-through

  When a device is passed through to the guest, the source
  interrupts come from a different HW controller (PHB4) and the ESB
  pages exposed to the guest should accommodate this change.

  The passthru_irq helpers, kvmppc_xive_set_mapped() and
  kvmppc_xive_clr_mapped(), are called when the device HW irqs are
  mapped into or unmapped from the guest IRQ number space. The KVM
  device extends these helpers to clear the ESB pages of the guest IRQ
  number being mapped and then lets the VM fault handler repopulate
  them. The handler will insert the ESB page corresponding to the HW
  interrupt of the device being passed through, or the initial IPI ESB
  page if the device has been removed.

  The ESB remapping is fully transparent to the guest and the OS
  device driver. All handling is done within VFIO and the above
  helpers in KVM-PPC.

* Groups:

1. KVM_DEV_XIVE_GRP_CTRL
     Provides global controls on the device

  Attributes:
    1.1 KVM_DEV_XIVE_RESET (write only)
    Resets the interrupt controller configuration for sources and event
    queues. To be used by kexec and kdump.

    Errors: none

    1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
    Syncs all the sources and queues and marks the EQ pages dirty. This
    ensures that a consistent memory state is captured when
    migrating the VM.

    Errors: none

    1.3 KVM_DEV_XIVE_NR_SERVERS (write only)
    The kvm_device_attr.addr points to a __u32 value which is the number of
    interrupt server numbers (i.e., highest possible vcpu id plus one).

    Errors:

      =======  ==========================================
      -EINVAL  Value greater than KVM_MAX_VCPU_ID.
      -EFAULT  Invalid user pointer for attr->addr.
      -EBUSY   A vCPU is already connected to the device.
      =======  ==========================================

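As an illustration of how a control attribute might be written from
userspace, the sketch below sets KVM_DEV_XIVE_NR_SERVERS through the
generic KVM_SET_DEVICE_ATTR ioctl. The helper name
xive_set_nr_servers() and the xive_fd parameter (a file descriptor
assumed to come from KVM_CREATE_DEVICE) are hypothetical. The fallback
constant values are assumed to match the powerpc uapi header and should
be checked against the kernel tree in use.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Fallbacks for builds on non-powerpc hosts, where the XIVE constants
 * are not provided by <asm/kvm.h>. Assumed values; verify them against
 * arch/powerpc/include/uapi/asm/kvm.h.
 */
#ifndef KVM_DEV_XIVE_GRP_CTRL
#define KVM_DEV_XIVE_GRP_CTRL   1
#endif
#ifndef KVM_DEV_XIVE_NR_SERVERS
#define KVM_DEV_XIVE_NR_SERVERS 3
#endif

/*
 * Hypothetical helper: write the number of interrupt servers to an
 * existing XIVE device fd. Returns 0 on success or a negative errno.
 */
static inline int xive_set_nr_servers(int xive_fd, uint32_t nr_servers)
{
    struct kvm_device_attr attr = {
        .group = KVM_DEV_XIVE_GRP_CTRL,
        .attr  = KVM_DEV_XIVE_NR_SERVERS,
        /* attr.addr carries a userspace pointer to the __u32 value */
        .addr  = (uint64_t)(uintptr_t)&nr_servers,
    };

    if (ioctl(xive_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
        return -errno;
    return 0;
}
```

Given the -EBUSY case in the table above, such a call would have to be
made before any vCPU is connected to the device.
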
2. KVM_DEV_XIVE_GRP_SOURCE (write only)
     Initializes a new source in the XIVE device and masks it.

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  2 |   1   |   0
    values:   |    unused    | level | type

  - type:  0:MSI 1:LSI
  - level: assertion level in case of an LSI.

  Errors:

    =======  ==========================================
    -E2BIG   Interrupt source number is out of range
    -ENOMEM  Could not create a new source block
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   Could not allocate underlying HW interrupt
    =======  ==========================================

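The __u64 value layout above can be sketched as a small encoder. The
macro and function names below are local to this example and are not
taken from the kernel headers. Leaving the level bit clear for an MSI is
an assumption based on the "in case of an LSI" wording.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the layout above; names are local to this sketch. */
#define XIVE_SRC_TYPE_LSI  (UINT64_C(1) << 0)  /* type: 0 = MSI, 1 = LSI */
#define XIVE_SRC_LEVEL     (UINT64_C(1) << 1)  /* assertion level, LSI only */

/* Build the __u64 value that kvm_device_attr.addr points to. */
static inline uint64_t xive_source_value(bool is_lsi, bool asserted)
{
    uint64_t val = 0;

    if (is_lsi) {
        val |= XIVE_SRC_TYPE_LSI;
        /* The level bit is only meaningful for an LSI. */
        if (asserted)
            val |= XIVE_SRC_LEVEL;
    }
    return val;
}
```
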
3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
     Configures source targeting

  Attributes:
    Interrupt source number  (64-bit)

  The kvm_device_attr.addr points to a __u64 value::

    bits:     | 63   ....  33 |  32  | 31 .. 3 |  2 .. 0
    values:   |    eisn       | mask |  server | priority

  - priority: 0-7 interrupt priority level
  - server: CPU number chosen to handle the interrupt
  - mask: mask flag (unused)
  - eisn: Effective Interrupt Source Number

  Errors:

    =======  =======================================================
    -ENOENT  Unknown source number
    -EINVAL  Uninitialized source number
    -EINVAL  Invalid priority
    -EINVAL  Invalid CPU number.
    -EFAULT  Invalid user pointer for attr->addr.
    -ENXIO   CPU event queues not configured or configuration of the
             underlying HW interrupt failed
    -EBUSY   No CPU available to serve interrupt
    =======  =======================================================

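A minimal sketch of packing and unpacking the source configuration word
follows, using only the bit ranges given in the layout above. All names
are hypothetical. Values wider than their field are silently truncated
by the masks, which a real caller would probably want to reject instead.

```c
#include <stdint.h>

/* Field positions from the layout above; names are local to this sketch. */
#define XIVE_CFG_PRIORITY_SHIFT  0
#define XIVE_CFG_PRIORITY_MASK   UINT64_C(0x0000000000000007)  /* bits 2..0   */
#define XIVE_CFG_SERVER_SHIFT    3
#define XIVE_CFG_SERVER_MASK     UINT64_C(0x00000000fffffff8)  /* bits 31..3  */
#define XIVE_CFG_MASKED_SHIFT    32
#define XIVE_CFG_MASKED_MASK     UINT64_C(0x0000000100000000)  /* bit 32      */
#define XIVE_CFG_EISN_SHIFT      33
#define XIVE_CFG_EISN_MASK       UINT64_C(0xfffffffe00000000)  /* bits 63..33 */

/* Pack (server, priority, mask flag, eisn) into the __u64 value. */
static inline uint64_t xive_source_config(uint32_t server, uint8_t priority,
                                          int masked, uint32_t eisn)
{
    uint64_t val = 0;

    val |= ((uint64_t)priority << XIVE_CFG_PRIORITY_SHIFT) & XIVE_CFG_PRIORITY_MASK;
    val |= ((uint64_t)server << XIVE_CFG_SERVER_SHIFT) & XIVE_CFG_SERVER_MASK;
    val |= ((uint64_t)(masked ? 1 : 0) << XIVE_CFG_MASKED_SHIFT) & XIVE_CFG_MASKED_MASK;
    val |= ((uint64_t)eisn << XIVE_CFG_EISN_SHIFT) & XIVE_CFG_EISN_MASK;
    return val;
}

/* Extract individual fields from a packed value. */
static inline uint32_t xive_config_server(uint64_t val)
{
    return (uint32_t)((val & XIVE_CFG_SERVER_MASK) >> XIVE_CFG_SERVER_SHIFT);
}

static inline uint8_t xive_config_priority(uint64_t val)
{
    return (uint8_t)((val & XIVE_CFG_PRIORITY_MASK) >> XIVE_CFG_PRIORITY_SHIFT);
}

static inline uint32_t xive_config_eisn(uint64_t val)
{
    return (uint32_t)((val & XIVE_CFG_EISN_MASK) >> XIVE_CFG_EISN_SHIFT);
}
```
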
4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
     Configures an event queue of a CPU

  Attributes:
    EQ descriptor identifier (64-bit)

  The EQ descriptor identifier is a tuple (server, priority)::

    bits:     | 63   ....  32 | 31 .. 3 |  2 .. 0
    values:   |    unused     |  server | priority

  The kvm_device_attr.addr points to::

    struct kvm_ppc_xive_eq {
        __u32 flags;
        __u32 qshift;
        __u64 qaddr;
        __u32 qtoggle;
        __u32 qindex;
        __u8  pad[40];
    };

  - flags: queue flags
      KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
        forces notification without using the coalescing mechanism
        provided by the XIVE END ESBs.
  - qshift: queue size (power of 2)
  - qaddr: real address of queue
  - qtoggle: current queue toggle bit
  - qindex: current queue index
  - pad: reserved for future use

  Errors:

    =======  =========================================
    -ENOENT  Invalid CPU number
    -EINVAL  Invalid priority
    -EINVAL  Invalid flags
    -EINVAL  Invalid queue size
    -EINVAL  Invalid queue address
    -EFAULT  Invalid user pointer for attr->addr.
    -EIO     Configuration of the underlying HW failed
    =======  =========================================

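To make the identifier tuple and the descriptor size concrete, the
sketch below mirrors the structure above with fixed-width C types and
packs a (server, priority) pair into the 64-bit EQ identifier. The names
xive_eq_sketch and xive_eq_id() are hypothetical. The size check only
confirms that the fields as listed add up to 64 bytes with no implicit
padding.

```c
#include <stdint.h>

/* Mirror of the structure shown above; the name is local to this sketch. */
struct xive_eq_sketch {
    uint32_t flags;
    uint32_t qshift;
    uint64_t qaddr;
    uint32_t qtoggle;
    uint32_t qindex;
    uint8_t  pad[40];
};

/* 4 + 4 + 8 + 4 + 4 + 40 bytes, naturally aligned. */
_Static_assert(sizeof(struct xive_eq_sketch) == 64,
               "EQ descriptor is expected to be 64 bytes");

/* Pack (server, priority) into the EQ descriptor identifier. */
static inline uint64_t xive_eq_id(uint32_t server, uint8_t priority)
{
    uint64_t id = 0;

    id |= (uint64_t)priority & UINT64_C(0x7);                  /* bits 2..0  */
    id |= ((uint64_t)server << 3) & UINT64_C(0xfffffff8);      /* bits 31..3 */
    return id;
}
```

For example, server 2 at priority 7 yields the identifier
(2 << 3) | 7 = 23.
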
5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
     Synchronizes the source to flush event notifications

  Attributes:
    Interrupt source number  (64-bit)

  Errors:

    =======  =============================
    -ENOENT  Unknown source number
    -EINVAL  Uninitialized source number
    =======  =============================

* VCPU state

  The XIVE IC maintains VP interrupt state in an internal structure
  called the NVT. When a VP is not dispatched on a HW processor
  thread, this structure can be updated by HW if the VP is the target
  of an event notification.

  It is important for migration to capture the cached IPB from the NVT
  as it synthesizes the priorities of the pending interrupts. We
  capture a bit more to report debug information.

  KVM_REG_PPC_VP_STATE (2 * 64 bits)::

    bits:     |  63  ....  32  |  31  ....  0  |
    values:   |   TIMA word0   |   TIMA word1  |
    bits:     | 127       ..........       64  |
    values:   |            unused              |

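The register layout above can be sketched as a pair of 64-bit values.
The struct and helper names are hypothetical, and the sketch assumes
that the first 64-bit element carries bits 63..0 and the second carries
the unused bits 127..64.

```c
#include <stdint.h>

/* Local stand-in for the 2 * 64-bit register value. */
struct xive_vp_state {
    uint64_t val[2];
};

/* Place TIMA word0 in bits 63..32 and TIMA word1 in bits 31..0. */
static inline struct xive_vp_state xive_vp_state_pack(uint32_t word0,
                                                      uint32_t word1)
{
    struct xive_vp_state s = {
        .val = { ((uint64_t)word0 << 32) | word1, 0 /* unused */ },
    };
    return s;
}

/* Recover the two TIMA words from a captured state. */
static inline uint32_t xive_vp_state_word0(const struct xive_vp_state *s)
{
    return (uint32_t)(s->val[0] >> 32);
}

static inline uint32_t xive_vp_state_word1(const struct xive_vp_state *s)
{
    return (uint32_t)(s->val[0] & UINT64_C(0xffffffff));
}
```
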
* Migration:

  Saving the state of a VM using the XIVE native exploitation mode
  should follow a specific sequence. When the VM is stopped:

  1. Mask all sources (PQ=01) to stop the flow of events.

  2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
  flush any in-flight event notification and to stabilize the EQs. At
  this stage, the EQ pages are marked dirty to make sure they are
  transferred in the migration sequence.

  3. Capture the state of the source targeting, the EQ configuration
  and the state of the thread interrupt context registers.

  Restore is similar:

  1. Restore the EQ configuration, as targeting depends on it.
  2. Restore targeting
  3. Restore the thread interrupt contexts
  4. Restore the source states
  5. Let the vCPU run