==================================
vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. The motivation for vfio-ccw is to pass subchannels through
to a virtual machine; vfio is the means.

Unlike other hardware architectures, s390 defines a unified I/O access
method, called Channel I/O. It has its own access patterns:

- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem accesses any memory designated by the caller
  in the channel program directly, i.e. there is no IOMMU involved.

Thus, when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev is added to
an IOMMU group so that it can be managed by the vfio framework. We also
add read/write callbacks for special vfio I/O regions to pass the
channel programs from the mdev to its parent device (the real I/O
subchannel device), which does further address translation and performs
the I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information can be found here:

- A good starting point for Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing QEMU code, which implements a simple emulated channel
  subsystem, is also a good reference and makes the flow easier to
  follow:
  qemu/hw/s390x/css.c

For the vfio mediated device framework:

- Documentation/driver-api/vfio-mediated-device.rst

Motivation of vfio-ccw
----------------------

Typically, a guest virtualized via QEMU/KVM on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to be able to
pass them through to a QEMU virtual machine. This includes devices that
don't have a virtio counterpart (e.g. tape drives) or that have specific
characteristics which guests want to exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. We implement this vfio support for channel
devices via the vfio mediated device framework and the subchannel device
driver "vfio_ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem, which
provides a unified view of the devices physically attached to the
system. Although the s390 hardware platform knows about a huge variety
of different peripheral attachments like disk devices (aka DASDs),
tapes, communication controllers, etc., they can all be accessed by a
well-defined access method and they all present I/O completion in a
unified way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which specifies the format of
the CCWs and other control information. The operating system signals
the I/O channel subsystem to begin executing the channel program with
an SSCH (start subchannel) instruction. The central processor is then
free to proceed with non-I/O instructions until interrupted. The I/O
completion result is received by the interrupt handler in the form of
an interrupt response block (IRB).
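
To make the terms above more concrete, the following sketch models a
format-1 CCW in plain C and walks a small channel program by following
its chaining flags. The field layout and the two flag values follow the
architected format-1 CCW; the helper name ``ccw_chain_len`` is an
assumption made for illustration only, and transfer-in-channel (TIC)
branches are deliberately not followed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Format-1 CCW: command code, flags, byte count, 31-bit data address. */
struct ccw1 {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;
};

static_assert(sizeof(struct ccw1) == 8, "a CCW occupies one doubleword");

#define CCW_FLAG_DC 0x80 /* chain data: next CCW continues this transfer */
#define CCW_FLAG_CC 0x40 /* chain command: next CCW is a new command */

/*
 * Count the CCWs in a contiguous chain (illustrative helper, not a
 * kernel API). A CCW with neither chaining flag set ends the chain.
 * Returns -1 if no end is found within max entries.
 */
static int ccw_chain_len(const struct ccw1 *chain, size_t max)
{
    for (size_t i = 0; i < max; i++) {
        if (!(chain[i].flags & (CCW_FLAG_CC | CCW_FLAG_DC)))
            return (int)(i + 1);
    }
    return -1;
}
```

For a three-CCW program whose first two CCWs set the command-chaining
flag and whose last CCW sets neither flag, ``ccw_chain_len(prog, 3)``
returns 3.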

Back to vfio-ccw, in short:

- ORBs and channel programs are built in the guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- The host kernel translates the guest physical addresses to real
  addresses and starts the I/O by issuing a privileged Channel I/O
  instruction (e.g. SSCH).
- Channel programs run asynchronously on a separate processor.
- I/O completion is signaled to the host with I/O interruptions. The
  result is copied as an IRB to user space, which passes it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with an mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have IOMMU-level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling the I/O instruction interception, vfio-ccw polices and
translates the channel program in software before it gets sent to
hardware.

Within this implementation, we have two drivers for two types of
devices:

- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device. It
  implements a group of callbacks and registers with the mdev framework
  as a parent (physical) device. As a consequence, mdev provides
  vfio_ccw with a generic interface (sysfs) to create mdev devices. A
  vfio mdev can then be created by vfio_ccw and added to the mediated
  bus. It is this vfio device that is added to an IOMMU group and a vfio
  group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and to store I/O interrupt results for user
  space to retrieve. To notify user space of an I/O completion, it
  offers an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that is created by vfio_ccw.
  It implements a group of vfio device driver callbacks, adds itself to
  a vfio group, and registers itself with the mdev framework as an mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to a process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For an mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. The mdev framework only maintains a database
  of the iova<->vaddr mappings in this operation. The vfio iommu backend
  exports the vfio_pin_pages and vfio_unpin_pages interfaces for the
  physical devices to pin and unpin pages on demand.
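
The idea of "store the translation now, resolve it later" can be
sketched in user-space C as a small lookup table. This is only an
illustration of the iova<->vaddr database described above; the
``dma_mapping`` structure and ``iova_to_vaddr`` helper are hypothetical
names and do not correspond to the actual vfio iommu backend code, which
additionally pins the page and resolves the host physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One recorded mapping: an iova range and the process virtual address
 * it was mapped to (hypothetical structure for illustration). */
struct dma_mapping {
    uint64_t iova;
    uint64_t size;
    uint64_t vaddr;
};

/*
 * Resolve an iova against the recorded mappings. Returns the matching
 * process virtual address, or 0 if no mapping covers the iova.
 */
static uint64_t iova_to_vaddr(const struct dma_mapping *tbl, size_t n,
                              uint64_t iova)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= tbl[i].iova && iova - tbl[i].iova < tbl[i].size)
            return tbl[i].vaddr + (iova - tbl[i].iova);
    }
    return 0;
}
```

In this model, the map ioctl would only append an entry to the table,
and the lookup would happen later, when a channel program actually
references the address.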

Below is a high-level block diagram::

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_device() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together:

1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) with the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks with the mdev framework. Mdev-related file
   nodes are then created under the device node in sysfs for the
   subchannel device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we need to manually create one
   (and only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It probes the mdev and
   adds it to an iommu_group and a vfio_group. Then we can pass the
   mdev through to a guest.


VFIO-CCW Regions
----------------

The vfio-ccw driver exposes MMIO regions to accept requests from and return
results to userspace.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and to store I/O interrupt results for user space to retrieve. The
definition of the region is::

  struct ccw_io_region {
  #define ORB_AREA_SIZE 12
          __u8    orb_area[ORB_AREA_SIZE];
  #define SCSW_AREA_SIZE 12
          __u8    scsw_area[SCSW_AREA_SIZE];
  #define IRB_AREA_SIZE 96
          __u8    irb_area[IRB_AREA_SIZE];
          __u32   ret_code;
  } __packed;

This region is always available.

When starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the virtual
subchannel.

irb_area stores the I/O result.
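
As a rough user-space illustration of this layout, the sketch below
restates the structure with fixed-width C types and checks the offsets
that follow from the sizes given above (12 + 12 + 96 + 4 = 124 bytes).
The use of ``uint8_t``/``uint32_t`` and ``__attribute__((packed))`` in
place of ``__u8``/``__u32`` and ``__packed`` is a stand-in so that the
example compiles outside the kernel, and ``io_region_prepare`` is a
hypothetical helper, not part of any existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* User-space restatement of the region definition shown above. */
struct ccw_io_region {
    uint8_t  orb_area[ORB_AREA_SIZE];
    uint8_t  scsw_area[SCSW_AREA_SIZE];
    uint8_t  irb_area[IRB_AREA_SIZE];
    uint32_t ret_code;
} __attribute__((packed));

static_assert(offsetof(struct ccw_io_region, scsw_area) == 12, "scsw follows orb");
static_assert(offsetof(struct ccw_io_region, irb_area) == 24, "irb follows scsw");
static_assert(offsetof(struct ccw_io_region, ret_code) == 120, "ret_code is last");
static_assert(sizeof(struct ccw_io_region) == 124, "region is 124 bytes");

/*
 * Prepare a region for a start request (hypothetical helper): clear it,
 * then copy in the guest ORB and the SCSW of the virtual subchannel.
 */
static void io_region_prepare(struct ccw_io_region *r,
                              const uint8_t orb[ORB_AREA_SIZE],
                              const uint8_t scsw[SCSW_AREA_SIZE])
{
    memset(r, 0, sizeof(*r));
    memcpy(r->orb_area, orb, ORB_AREA_SIZE);
    memcpy(r->scsw_area, scsw, SCSW_AREA_SIZE);
}
```

A caller would fill the region this way and then write it to the device
at the offset reported for the I/O region.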

ret_code stores a return code for each access of the region. The following
values may occur:

``0``
  The operation was successful.

``-EOPNOTSUPP``
  The ORB specified transport mode or an unidentified IDAW format, or the
  SCSW specified a function other than the start function.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests, or an internal error occurred.

``-EBUSY``
  The subchannel was status pending or busy, or a request is already active.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EACCES``
  The channel path(s) used for the I/O were found to be not operational.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  The ORB specified a chain longer than 255 CCWs, or an internal error
  occurred.
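
A caller has to inspect ret_code after each access. The sketch below
shows one possible way to classify the value, treating only ``-EAGAIN``
as retryable because it is the only code in the list above that asks the
caller to retry. The ``io_action`` enum and ``io_classify`` function are
hypothetical names; how the remaining errors are reflected back to a
guest is left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* What a caller might do next (hypothetical classification). */
enum io_action { IO_DONE, IO_RETRY, IO_FAIL };

/*
 * Classify the ret_code read back from the region. The field is a
 * 32-bit value holding 0 or a negative errno, so it is interpreted as
 * a signed integer here.
 */
static enum io_action io_classify(int32_t ret_code)
{
    switch (ret_code) {
    case 0:
        return IO_DONE;
    case -EAGAIN:
        return IO_RETRY;
    default:
        return IO_FAIL;
    }
}
```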


vfio-ccw cmd region
-------------------

The vfio-ccw cmd region is used to accept asynchronous instructions
from userspace::

  #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
  #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
  struct ccw_cmd_region {
         __u32 command;
         __u32 ret_code;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.

Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.

command specifies the command to be issued; ret_code stores a return code
for each access of the region. The following values may occur:

``0``
  The operation was successful.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  A command other than halt or clear was specified.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EBUSY``
  The subchannel was status pending or busy while processing a halt request.
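
The following sketch restates the command region in user-space C and
applies the rule documented for ``-EINVAL``: only a halt or a clear is
accepted. The ``cmd_region_check`` helper is hypothetical, and rejecting
a value with both bits set is an assumption of this sketch rather than
something the text above states.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)

/* User-space restatement of the region definition shown above. */
struct ccw_cmd_region {
    uint32_t command;
    uint32_t ret_code;
} __attribute__((packed));

static_assert(sizeof(struct ccw_cmd_region) == 8, "two 32-bit fields");

/*
 * Validate a command before it is written to the region (hypothetical
 * helper). Assumption: exactly one of halt or clear must be requested.
 */
static int cmd_region_check(const struct ccw_cmd_region *r)
{
    if (r->command == VFIO_CCW_ASYNC_CMD_HSCH ||
        r->command == VFIO_CCW_ASYNC_CMD_CSCH)
        return 0;
    return -EINVAL;
}
```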

vfio-ccw schib region
---------------------

The vfio-ccw schib region is used to return Subchannel-Information
Block (SCHIB) data to userspace::

  struct ccw_schib_region {
  #define SCHIB_AREA_SIZE 52
         __u8 schib_area[SCHIB_AREA_SIZE];
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_SCHIB.

Reading this region triggers a STORE SUBCHANNEL to be issued to the
associated hardware.

vfio-ccw crw region
-------------------

The vfio-ccw crw region is used to return Channel Report Word (CRW)
data to userspace::

  struct ccw_crw_region {
         __u32 crw;
         __u32 pad;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_CRW.

Reading this region returns a CRW if one that is relevant for this
subchannel (e.g. one reporting changes in channel path state) is
pending, or all zeroes if not. If multiple CRWs are pending (including
possibly chained CRWs), reading this region again will return the next
one, until no more CRWs are pending and zeroes are returned. This is
similar to how STORE CHANNEL REPORT WORD works.
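
The "read until zeroes" behaviour can be sketched as a small drain
loop. The region access is abstracted behind a callback so that the
example runs without a device; ``crw_read_fn``, ``crw_drain`` and the
``fake_crw_queue`` stand-in are hypothetical names introduced for this
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space restatement of the region definition shown above. */
struct ccw_crw_region {
    uint32_t crw;
    uint32_t pad;
} __attribute__((packed));

/* Reads the region once into *r; returns 0 on success (hypothetical). */
typedef int (*crw_read_fn)(void *ctx, struct ccw_crw_region *r);

/*
 * Read CRWs until an all-zero CRW is returned or out[] is full.
 * Returns the number of CRWs stored in out[].
 */
static size_t crw_drain(crw_read_fn read_region, void *ctx,
                        uint32_t *out, size_t max)
{
    struct ccw_crw_region r;
    size_t n = 0;

    while (n < max && read_region(ctx, &r) == 0 && r.crw != 0)
        out[n++] = r.crw;
    return n;
}

/* Stand-in for the device: hands out queued CRWs, then zeroes. */
struct fake_crw_queue {
    const uint32_t *crws;
    size_t len;
    size_t pos;
};

static int fake_crw_read(void *ctx, struct ccw_crw_region *r)
{
    struct fake_crw_queue *q = ctx;

    r->crw = q->pos < q->len ? q->crws[q->pos++] : 0;
    r->pad = 0;
    return 0;
}
```

With a real device, the callback would read the crw region from the
vfio device file descriptor instead of a queue in memory.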

vfio-ccw operation details
--------------------------

vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend.

* CCW translation APIs
  A group of APIs (starting with `cp_`) to do CCW translation. The CCWs
  passed in by a user space program are organized with their guest
  physical memory addresses. These APIs copy the CCWs into kernel
  space, and assemble a runnable kernel channel program by replacing the
  guest physical addresses with their corresponding host physical addresses.
  Note that we have to use IDALs even for direct-access CCWs, as the
  referenced memory can be located anywhere, including above 2G.

* vfio_ccw device driver
  This driver utilizes the CCW translation APIs and introduces
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls::

    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS

  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, which does further CCW translation
  before issuing it to a real device.
  This also provides the SET_IRQS ioctl to set up an event notifier that
  notifies the user space program of the I/O completion asynchronously.
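
To illustrate why an IDAL (a list of indirect data address words) is
needed, the sketch below splits a data area into one 64-bit IDAW per
4K block. The block size, the helper names ``idal_nr_words`` and
``idal_build``, and the assumption that the buffer is contiguous are
all simplifications made for this example; in vfio-ccw each IDAW would
instead carry the host address of the individually pinned page backing
that part of the guest buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IDA_BLOCK_SIZE 4096ULL /* assumed: 4K blocks, 64-bit IDAWs */

/* Number of IDAWs needed to describe len bytes starting at addr. */
static size_t idal_nr_words(uint64_t addr, uint32_t len)
{
    if (len == 0)
        return 0;
    return (size_t)(((addr & (IDA_BLOCK_SIZE - 1)) + len +
                     IDA_BLOCK_SIZE - 1) / IDA_BLOCK_SIZE);
}

/*
 * Fill idaws[]: the first entry is the start address, each following
 * entry is the next block boundary. Returns the number of entries
 * written, or 0 if the buffer is empty or idaws[] is too small.
 */
static size_t idal_build(uint64_t *idaws, size_t max,
                         uint64_t addr, uint32_t len)
{
    size_t n = idal_nr_words(addr, len);

    if (n == 0 || n > max)
        return 0;
    idaws[0] = addr;
    for (size_t i = 1; i < n; i++)
        idaws[i] = (addr & ~(IDA_BLOCK_SIZE - 1)) + i * IDA_BLOCK_SIZE;
    return n;
}
```

A 4K buffer that starts in the middle of a block therefore needs two
IDAWs, one for the tail of the first block and one for the head of the
next.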

The use of vfio-ccw is not limited to QEMU, but QEMU is definitely a
good example for understanding how these patches work. Here is a little
more detail on how an I/O request triggered by the QEMU guest is
handled (without error handling).

Explanation:

- Q1-Q7: QEMU side process.
- K1-K6: Kernel side process.

Q1.
    Get I/O region info during initialization.

Q2.
    Set up an event notifier and handler to handle I/O completion.

... ...

Q3.
    Intercept an ssch instruction.
Q4.
    Write the guest channel program and ORB to the I/O region.

    K1.
        Copy from guest to kernel.
    K2.
        Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3.
        With the necessary information contained in the ORB passed in
        by QEMU, issue the ccwchain to the device.
    K4.
        Return the ssch CC code.
Q5.
    Return the CC code to the guest.

... ...

    K5.
        The interrupt handler gets the I/O result and writes the result
        to the I/O region.
    K6.
        Signal QEMU to retrieve the result.

Q6.
    Get the signal; the event handler reads the result out of the I/O
    region.
Q7.
    Update the IRB for the guest.
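
The notification part of this flow (Q2, K6 and Q6) relies on an
eventfd. The sketch below shows only the eventfd mechanics using the
Linux ``eventfd(2)`` interface, with both sides living in one process;
``signal_completion`` and ``wait_completion`` are hypothetical helper
names, and registering the eventfd with the device via
VFIO_DEVICE_SET_IRQS is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* K6 stand-in: signal one completion. Returns 0 on success. */
static int signal_completion(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Q6 stand-in: block until signaled. Returns the number of signals
 * accumulated since the last read, or 0 on error.
 */
static uint64_t wait_completion(int efd)
{
    uint64_t count = 0;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

After ``wait_completion`` returns, the handler would read the I/O
region to fetch the IRB, as described in Q6.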

Limitations
-----------

The current vfio-ccw implementation focuses on supporting the basic
commands needed to implement block device functionality (read/write) of
DASD/ECKD devices only. Some commands may need special handling in the
future, for example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information on DASD and ECKD can be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in QEMU, we can now bring the
passed-through DASD/ECKD device online in a guest and use it as a block
device.

The current code allows the guest to start channel programs via
START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL,
and STORE SUBCHANNEL.

Currently all channel programs are prefetched, regardless of the
p-bit setting in the ORB. As a result, self-modifying channel
programs are not supported. For this reason, IPL has to be handled as
a special case by a userspace/guest program; this has been implemented
in QEMU's s390-ccw bios as of QEMU 4.1.

vfio-ccw supports classic (command mode) channel I/O only. Transport
mode (HPF) is not supported.

QDIO subchannels are currently not supported. Classic devices other than
DASD/ECKD might work, but have not been tested.

Reference
---------

1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/s390/cds.rst
5. Documentation/driver-api/vfio.rst
6. Documentation/driver-api/vfio-mediated-device.rst