==================================
vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. The motivation for vfio-ccw is to pass subchannels through
to a virtual machine, with vfio as the means.

Unlike other hardware architectures, s390 defines a unified
I/O access method, the so-called Channel I/O. It has its own access
patterns:

- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem will access any memory designated by the caller
  in the channel program directly, i.e. there is no iommu involved.

Thus when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev will be
added to an iommu group, so that it can be managed by the
vfio framework. We also add read/write callbacks for special vfio I/O
regions to pass the channel programs from the mdev to its parent device
(the real I/O subchannel device) to do further address translation and
to perform I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail.
More information and references can be found here:

- A good start to know Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing QEMU code which implements a simple emulated channel
  subsystem could also be a good reference. It makes it easier to follow
  the flow.
  qemu/hw/s390x/css.c

For vfio mediated device framework:

- Documentation/driver-api/vfio-mediated-device.rst

Motivation of vfio-ccw
----------------------

Typically, a guest virtualized via QEMU/KVM on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, for the majority of devices, which
use the standard Channel I/O based mechanism, we also need to provide
the functionality of passing them through to a QEMU virtual machine.
This includes devices that don't have a virtio counterpart (e.g. tape
drives) or that have specific characteristics which guests want to
exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio.
We implement this vfio support for channel
devices via the vfio mediated device framework and the subchannel device
driver "vfio_ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem, which
provides a unified view of the devices physically attached to the
systems. Though the s390 hardware platform knows about a huge variety of
different peripheral attachments like disk devices (aka. DASDs), tapes,
communication controllers, etc., they can all be accessed by a well
defined access method and they present I/O completion in a unified
way: I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, it is required to
build an operation request block (ORB), which can be used to point out
the format of the CCW and other control information to the system. The
operating system signals the I/O channel subsystem to begin executing
the channel program with a SSCH (start sub-channel) instruction. The
central processor is then free to proceed with non-I/O instructions
until interrupted. The I/O completion result is received by the
interrupt handler in the form of an interrupt response block (IRB).
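
As a rough illustration of the terms above, the following sketch mirrors
the 8-byte layout of a format-1 CCW and walks a linear channel program.
The type and helper names (``ccw_sketch``, ``ccw_sketch_chain_len``) are
hypothetical and not taken from the kernel. The sketch assumes a program
without transfer-in-channel (TIC) CCWs, so branches are not followed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of a format-1 CCW (8 bytes); names are hypothetical. */
struct ccw_sketch {
	uint8_t  cmd_code; /* device specific command */
	uint8_t  flags;    /* chaining and control flags */
	uint16_t count;    /* number of bytes to transfer */
	uint32_t cda;      /* data address (31 bit in format 1) */
};

#define CCW_SKETCH_FLAG_DC 0x80 /* data chaining: next CCW continues the transfer */
#define CCW_SKETCH_FLAG_CC 0x40 /* command chaining: next CCW is a new command */

/*
 * Count the CCWs of a linear channel program. The chain continues as
 * long as a CCW has the data- or command-chaining flag set.
 * Returns 0 if no final CCW is found within max entries.
 */
static size_t ccw_sketch_chain_len(const struct ccw_sketch *ccw, size_t max)
{
	size_t i;

	for (i = 0; i < max; i++) {
		if (!(ccw[i].flags & (CCW_SKETCH_FLAG_DC | CCW_SKETCH_FLAG_CC)))
			return i + 1;
	}
	return 0;
}
```

A caller would pass a pointer to the first CCW together with an upper
bound on the chain length it is willing to accept.
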

Back to vfio-ccw, in short:

- ORBs and channel programs are built in the guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- The host kernel translates the guest physical addresses to real
  addresses and starts the I/O by issuing a privileged Channel I/O
  instruction (e.g. SSCH).
- Channel programs run asynchronously on a separate processor.
- I/O completion will be signaled to the host with I/O interruptions.
  It will then be copied as an IRB to user space to pass it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with a mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device does not have an IOMMU level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling the I/O instruction interception, vfio-ccw performs software
policing and translation of the channel program before
it gets sent to hardware.

Within this implementation, we have two drivers for two types of
devices:

- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device.
  It realizes a group of callbacks and registers to the mdev framework as a
  parent (physical) device. As a consequence, mdev provides vfio_ccw a
  generic interface (sysfs) to create mdev devices. A vfio mdev could be
  created by vfio_ccw then and added to the mediated bus. It is the vfio
  device that is added to an IOMMU group and a vfio group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and store I/O interrupt results for user
  space to retrieve. To notify user space of an I/O completion, it offers
  an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev that is created by vfio_ccw.
  It realizes a group of vfio device driver callbacks, adds itself to a
  vfio group, and registers itself to the mdev framework as a mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls, but rather than programming them into an IOMMU for a device,
  it simply stores the translations for use by later requests. This
  means that a device programmed in a VM with guest physical addresses
  can have the vfio kernel convert that address to process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For a mdev, the vfio iommu backend will not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl.
  The mdev framework will only maintain a database
  of the iova<->vaddr mappings in this operation. The vfio iommu
  backend exports the vfio_pin_pages and vfio_unpin_pages interfaces
  for the physical devices to pin and unpin pages on demand.

Below is a high-level block diagram::

 +-------------+
 |             |
 | +---------+ | mdev_register_driver() +--------------+
 | |  Mdev   | +<-----------------------+              |
 | |  bus    | |                        | vfio_mdev.ko |
 | | driver  | +----------------------->+              |<-> VFIO user
 | +---------+ |    probe()/remove()    +--------------+    APIs
 |             |
 |  MDEV CORE  |
 |   MODULE    |
 |   mdev.ko   |
 | +---------+ | mdev_register_device() +--------------+
 | |Physical | +<-----------------------+              |
 | | device  | |                        |  vfio_ccw.ko |<-> subchannel
 | |interface| +----------------------->+              |     device
 | +---------+ |       callback         +--------------+
 +-------------+

The process of how these work together.

1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
   physical device (with callbacks) to the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks with the mdev framework. Mdev related file nodes
   under the device node in sysfs would be created for the subchannel
   device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we need to manually create one (and
   only one for our case) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It will probe the mdev and
   add it to an iommu_group and a vfio_group. Then we could pass through
   the mdev to a guest.


VFIO-CCW Regions
----------------

The vfio-ccw driver exposes MMIO regions to accept requests from and return
results to userspace.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and store I/O interrupt results for user space to retrieve. The
definition of the region is::

  struct ccw_io_region {
  #define ORB_AREA_SIZE 12
          __u8    orb_area[ORB_AREA_SIZE];
  #define SCSW_AREA_SIZE 12
          __u8    scsw_area[SCSW_AREA_SIZE];
  #define IRB_AREA_SIZE 96
          __u8    irb_area[IRB_AREA_SIZE];
          __u32   ret_code;
  } __packed;

This region is always available.

While starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the Virtual
Subchannel.

irb_area stores the I/O result.

ret_code stores a return code for each access of the region. The following
values may occur:

``0``
  The operation was successful.

``-EOPNOTSUPP``
  The orb specified transport mode or an unidentified IDAW format, or the
  scsw specified a function other than the start function.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests, or an internal error occurred.

``-EBUSY``
  The subchannel was status pending or busy, or a request is already active.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EACCES``
  The channel path(s) used for the I/O were found to be not operational.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  The orb specified a chain longer than 255 ccws, or an internal error
  occurred.
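
A userspace program could drive the I/O region roughly as follows. This
is a minimal sketch, not the QEMU implementation: the helper names are
hypothetical, the structure is redeclared with ``uint8_t``/``uint32_t``
in place of the kernel types, and ``fd`` and ``offset`` are assumed to be
the vfio device file descriptor and the region offset reported by
VFIO_DEVICE_GET_REGION_INFO. It also assumes that a rejected request
makes the write fail with the corresponding errno.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* Userspace mirror of the region layout shown above. */
struct ccw_io_region {
	uint8_t  orb_area[ORB_AREA_SIZE];
	uint8_t  scsw_area[SCSW_AREA_SIZE];
	uint8_t  irb_area[IRB_AREA_SIZE];
	uint32_t ret_code;
} __attribute__((packed));

/* Fill a request: guest ORB plus the SCSW of the virtual subchannel. */
static void io_sketch_prepare(struct ccw_io_region *r,
			      const uint8_t orb[ORB_AREA_SIZE],
			      const uint8_t scsw[SCSW_AREA_SIZE])
{
	memset(r, 0, sizeof(*r));
	memcpy(r->orb_area, orb, ORB_AREA_SIZE);
	memcpy(r->scsw_area, scsw, SCSW_AREA_SIZE);
}

/*
 * Write the request to the device fd at the region offset.
 * Retries on EAGAIN ("the caller should retry"); returns 0 on
 * success or a negative errno value otherwise.
 */
static int io_sketch_submit(int fd, off_t offset,
			    const struct ccw_io_region *r)
{
	for (;;) {
		ssize_t n = pwrite(fd, r, sizeof(*r), offset);

		if (n == (ssize_t)sizeof(*r))
			return 0;
		if (n < 0 && errno == EAGAIN)
			continue;
		return n < 0 ? -errno : -EIO;
	}
}
```

After a successful submit, the caller would wait for the I/O completion
notification and then read the region back to obtain irb_area.
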


vfio-ccw cmd region
-------------------

The vfio-ccw cmd region is used to accept asynchronous instructions
from userspace::

  #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
  #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
  struct ccw_cmd_region {
         __u32 command;
         __u32 ret_code;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.

Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.

command specifies the command to be issued; ret_code stores a return code
for each access of the region. The following values may occur:

``0``
  The operation was successful.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  A command other than halt or clear was specified.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EBUSY``
  The subchannel was status pending or busy while processing a halt request.
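
The command field can be prepared on the userspace side as sketched
below. The helper ``cmd_sketch_prepare`` is hypothetical; it only mirrors
the documented ``-EINVAL`` case locally before anything is written to the
region, and the structure is redeclared with ``uint32_t`` in place of the
kernel types. Treating a combination of both bits as invalid is an
assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)

/* Userspace mirror of the region layout shown above. */
struct ccw_cmd_region {
	uint32_t command;
	uint32_t ret_code;
} __attribute__((packed));

/*
 * Fill the region for a halt or clear request.
 * Returns 0 on success, or -EINVAL for any other command value.
 */
static int cmd_sketch_prepare(struct ccw_cmd_region *r, uint32_t command)
{
	if (command != VFIO_CCW_ASYNC_CMD_HSCH &&
	    command != VFIO_CCW_ASYNC_CMD_CSCH)
		return -EINVAL;

	r->command = command;
	r->ret_code = 0;
	return 0;
}
```

The prepared structure would then be written to the device file
descriptor at the offset of the cmd region.
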

vfio-ccw schib region
---------------------

The vfio-ccw schib region is used to return Subchannel-Information
Block (SCHIB) data to userspace::

  struct ccw_schib_region {
  #define SCHIB_AREA_SIZE 52
         __u8 schib_area[SCHIB_AREA_SIZE];
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_SCHIB.

Reading this region triggers a STORE SUBCHANNEL to be issued to the
associated hardware.

vfio-ccw crw region
-------------------

The vfio-ccw crw region is used to return Channel Report Word (CRW)
data to userspace::

  struct ccw_crw_region {
         __u32 crw;
         __u32 pad;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_CRW.

Reading this region returns a CRW if one that is relevant for this
subchannel (e.g. one reporting changes in channel path state) is
pending, or all zeroes if not. If multiple CRWs are pending (including
possibly chained CRWs), reading this region again will return the next
one, until no more CRWs are pending and zeroes are returned. This is
similar to how STORE CHANNEL REPORT WORD works.
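
The read-until-zero behaviour of the crw region can be sketched as a
small loop. Everything named ``crw_sketch_*`` is hypothetical: the region
read is abstracted behind a callback, and an in-memory queue stands in
for the device so that the loop can be followed without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the region layout shown above. */
struct ccw_crw_region {
	uint32_t crw;
	uint32_t pad;
} __attribute__((packed));

/* Reads the region once; returns 0 on success, nonzero on failure. */
typedef int (*crw_sketch_read_fn)(void *ctx, struct ccw_crw_region *out);

/*
 * Collect pending CRWs until the region reads back as zero.
 * Returns the number of CRWs stored, or -1 if a read failed.
 */
static int crw_sketch_drain(crw_sketch_read_fn read_region, void *ctx,
			    uint32_t *crws, int max)
{
	int n = 0;

	while (n < max) {
		struct ccw_crw_region r;

		if (read_region(ctx, &r) != 0)
			return -1;
		if (r.crw == 0)
			break; /* nothing pending any more */
		crws[n++] = r.crw;
	}
	return n;
}

/* Stand-in for the real region read: pops values from a queue in memory. */
struct crw_sketch_queue {
	const uint32_t *pending;
	size_t count;
	size_t next;
};

static int crw_sketch_fake_read(void *ctx, struct ccw_crw_region *out)
{
	struct crw_sketch_queue *q = ctx;

	out->crw = q->next < q->count ? q->pending[q->next++] : 0;
	out->pad = 0;
	return 0;
}
```

In a real program the callback would read ``sizeof(struct
ccw_crw_region)`` bytes from the device file descriptor at the offset of
the crw region.
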

vfio-ccw operation details
--------------------------

vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend.

* CCW translation APIs
  A group of APIs (start with `cp_`) to do CCW translation. The CCWs
  passed in by a user space program are organized with their guest
  physical memory addresses. These APIs will copy the CCWs into kernel
  space, and assemble a runnable kernel channel program by updating the
  guest physical addresses with their corresponding host physical addresses.
  Note that we have to use IDALs even for direct-access CCWs, as the
  referenced memory can be located anywhere, including above 2G.

* vfio_ccw device driver
  This driver utilizes the CCW translation APIs and introduces
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls::

    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS

  This provides an I/O region, so that the user space program can pass a
  channel program to the kernel, to do further CCW translation before
  issuing them to a real device.
  This also provides the SET_IRQ ioctl to set up an event notifier to
  notify the user space program of the I/O completion in an asynchronous
  way.

The use of vfio-ccw is not limited to QEMU, while QEMU is definitely a
good example to understand how these patches work. Here is a little
bit more detail on how an I/O request triggered by the QEMU guest will be
handled (without error handling).

Explanation:

- Q1-Q7: QEMU side process.
- K1-K6: Kernel side process.

Q1.
    Get I/O region info during initialization.

Q2.
    Set up event notifier and handler to handle I/O completion.

... ...

Q3.
    Intercept a ssch instruction.
Q4.
    Write the guest channel program and ORB to the I/O region.

    K1.
        Copy from guest to kernel.
    K2.
        Translate the guest channel program to a host kernel space
        channel program, which becomes runnable for a real device.
    K3.
        With the necessary information contained in the orb passed in
        by QEMU, issue the ccwchain to the device.
    K4.
        Return the ssch CC code.
Q5.
    Return the CC code to the guest.

... ...

    K5.
        Interrupt handler gets the I/O result and writes the result to
        the I/O region.
    K6.
        Signal QEMU to retrieve the result.

Q6.
    Get the signal and event handler reads out the result from the I/O
    region.
Q7.
    Update the irb for the guest.

Limitations
-----------

The current vfio-ccw implementation focuses on supporting basic commands
needed to implement block device functionality (read/write) of DASD/ECKD
device only. Some commands may need special handling in the future, for
example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information for DASD and ECKD could be found here:
https://en.wikipedia.org/wiki/Direct-access_storage_device
https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in QEMU, we can bring the passed
through DASD/ECKD device online in a guest now and use it as a block
device.

The current code allows the guest to start channel programs via
START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL,
and STORE SUBCHANNEL.

Currently all channel programs are prefetched, regardless of the
p-bit setting in the ORB. As a result, self modifying channel
programs are not supported.
For this reason, IPL has to be handled as
a special case by a userspace/guest program; this has been implemented
in QEMU's s390-ccw bios as of QEMU 4.1.

vfio-ccw supports classic (command mode) channel I/O only. Transport
mode (HPF) is not supported.

QDIO subchannels are currently not supported. Classic devices other than
DASD/ECKD might work, but have not been tested.

Reference
---------
1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/s390/cds.rst
5. Documentation/driver-api/vfio.rst
6. Documentation/driver-api/vfio-mediated-device.rst