.. SPDX-License-Identifier: GPL-2.0
.. iommu:

=====================================
IOMMU Userspace API
=====================================

IOMMU UAPI is used for virtualization cases where communication is
needed between physical and virtual IOMMU drivers. For baremetal
usage, the IOMMU is a system device which does not need to communicate
with userspace directly.

The primary use cases are guest Shared Virtual Address (SVA) and
guest IO virtual address (IOVA), wherein the vIOMMU implementation
relies on the physical IOMMU and for this reason requires interactions
with the host driver.

.. contents:: :local:

Functionalities
===============
Communication between user and kernel goes in both directions. The
supported user-kernel APIs are as follows:

1. Bind/Unbind guest PASID (e.g. Intel VT-d)
2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
3. Invalidate IOMMU caches upon guest requests
4. Report errors to the guest and serve page requests

Requirements
============
The IOMMU UAPIs are generic and extensible to meet the following
requirements:

1. Emulated and para-virtualised vIOMMUs
2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
3. Extensions to the UAPI shall not break existing userspace

Interfaces
==========
Although the data structures defined in IOMMU UAPI are self-contained,
no user API functions are introduced. Instead, IOMMU UAPI is
designed to work with existing user driver frameworks such as VFIO.

Extension Rules & Precautions
-----------------------------
When IOMMU UAPI gets extended, the data structures can *only* be
modified in two ways:

1. Adding new fields by re-purposing the padding[] field. No size change.
2. Adding new union members at the end. May increase the structure sizes.

No new fields can be added *after* the variable-sized union, because
doing so would break backward compatibility when the offsets move. A
new flag must be introduced whenever a change affects the structure
using either method. The IOMMU driver processes the data based on
flags, which ensures backward compatibility.

The version field is reserved only for the unlikely event of a UAPI
upgrade in its entirety.

It's *always* the caller's responsibility to indicate the size of the
structure passed by setting argsz appropriately.
At the same time, argsz is user-provided data and is not
trusted. The argsz field allows the user app to indicate how much data
it is providing; it's still the kernel's responsibility to validate
whether it's correct and sufficient for the requested operation.
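
The extension rules and the argsz handling described above can be
sketched in plain C. This is a userspace illustration only, not kernel
code: ``struct demo_uapi_data``, ``DEMO_FLAG_EXTRA`` and
``demo_check_argsz()`` are hypothetical names, and the buffer pointer
stands in for data that a kernel would first copy from userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical UAPI-style structure; not a real kernel definition. */
struct demo_uapi_data {
    uint32_t argsz;        /* size the caller claims to provide */
    uint32_t flags;        /* selects which optional fields are valid */
#define DEMO_FLAG_EXTRA (1u << 0)  /* 'extra' was carved out of padding */
    uint8_t  extra;        /* new field, re-purposed from padding */
    uint8_t  padding[7];   /* reserved, must be zero */
    union {
        uint64_t addr;     /* new members may only be appended here */
    } u;
};

/* Smallest valid layout: everything before the variable-sized union. */
#define DEMO_MIN_SIZE offsetof(struct demo_uapi_data, u)

/* Returns 0 if the caller-supplied data is acceptable, -1 otherwise. */
int demo_check_argsz(const void *buf, size_t buf_len)
{
    struct demo_uapi_data d;
    uint32_t argsz;
    size_t i;

    assert(buf != NULL);
    if (buf_len < sizeof(argsz))
        return -1;
    memcpy(&argsz, buf, sizeof(argsz));

    /* argsz is untrusted: it must cover the header and fit the buffer. */
    if (argsz < DEMO_MIN_SIZE || argsz > buf_len)
        return -1;

    /* Copy only what this side understands; the rest stays zeroed. */
    memset(&d, 0, sizeof(d));
    memcpy(&d, buf, argsz < sizeof(d) ? argsz : sizeof(d));

    /* Unknown flags and non-zero reserved bytes are rejected. */
    if (d.flags & ~(uint32_t)DEMO_FLAG_EXTRA)
        return -1;
    for (i = 0; i < sizeof(d.padding); i++)
        if (d.padding[i])
            return -1;

    /* A field re-purposed from padding is only valid with its flag. */
    if (!(d.flags & DEMO_FLAG_EXTRA) && d.extra)
        return -1;
    return 0;
}
```

Rejecting non-zero reserved bytes today is what makes it possible to
give them a meaning later without changing the structure size.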

Compatibility Checking
----------------------
When an IOMMU UAPI extension results in a structure size increase,
IOMMU UAPI code shall handle the following cases:

1. User and kernel have an exact size match
2. An older user with older kernel header (smaller UAPI size) running on a
   newer kernel (larger UAPI size)
3. A newer user with newer kernel header (larger UAPI size) running
   on an older kernel.
4. A malicious/misbehaving user passing illegal/invalid size but within
   range. The data may contain garbage.

Feature Checking
----------------
While launching a guest with vIOMMU, it is strongly advised to check
the compatibility upfront, as some subsequent errors happening during
vIOMMU operation, such as cache invalidation failures, cannot be nicely
escalated to the guest due to IOMMU specifications. This can lead to
catastrophic failures for the users.

User applications such as QEMU are expected to import kernel UAPI
headers. Backward compatibility is supported per feature flags.
For example, an older QEMU (with older kernel header) can run on a newer
kernel. A newer QEMU (with new kernel header) may refuse to initialize
on an older kernel if new feature flags are not supported by the older
kernel. Simply recompiling existing code with a newer kernel header should
not be an issue, because only existing flags are used.
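
The per-flag compatibility check described above can be sketched from
the userspace side. The feature bits and both helper functions below
are hypothetical and carry assumed values; how the supported-feature
mask is obtained from the kernel is outside the scope of this sketch.

```c
#include <stdint.h>

/* Hypothetical feature bits; real values come from the kernel header. */
#define DEMO_FEAT_BIND_PGTBL   (1u << 0)
#define DEMO_FEAT_CACHE_INVLD  (1u << 1)
#define DEMO_FEAT_PAGE_REQUEST (1u << 2)

/*
 * Return the subset of 'required' features that is absent from
 * 'supported'. A result of 0 means nothing is missing.
 */
uint32_t demo_missing_features(uint32_t supported, uint32_t required)
{
    return required & ~supported;
}

/* Return 1 if the guest may be launched, 0 if launch should be refused. */
int demo_can_launch(uint32_t supported, uint32_t required)
{
    return demo_missing_features(supported, required) == 0;
}
```

Under these assumptions, an older application that requires only
``DEMO_FEAT_BIND_PGTBL`` keeps working on a kernel that reports more
bits, while a newer application that also requires
``DEMO_FEAT_PAGE_REQUEST`` can refuse to start before the guest runs.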

The IOMMU vendor driver should report the features below to IOMMU UAPI
consumers (e.g. via VFIO).

1. IOMMU_NESTING_FEAT_SYSWIDE_PASID
2. IOMMU_NESTING_FEAT_BIND_PGTBL
3. IOMMU_NESTING_FEAT_BIND_PASID_TABLE
4. IOMMU_NESTING_FEAT_CACHE_INVLD
5. IOMMU_NESTING_FEAT_PAGE_REQUEST

Taking VFIO as an example, upon request from VFIO userspace (e.g. QEMU),
VFIO kernel code shall query the IOMMU vendor driver for support of
the above features. The query result can then be reported back to the
userspace caller. Details can be found in
Documentation/driver-api/vfio.rst.


Data Passing Example with VFIO
------------------------------
As the ubiquitous userspace driver framework, VFIO is already IOMMU
aware and shares many key concepts such as device model, group, and
protection domain. Other user driver frameworks can also be extended
to support IOMMU UAPI but it is outside the scope of this document.

In this tight-knit VFIO-IOMMU interface, the ultimate consumer of the
IOMMU UAPI data is the host IOMMU driver. VFIO facilitates user-kernel
transport, capability checking, security, and life cycle management of
process address space ID (PASID).

The VFIO layer conveys the data structures down to the IOMMU driver.
It follows the pattern below::

   struct {
	__u32 argsz;
	__u32 flags;
	__u8  data[];
   };

Here data[] contains the IOMMU UAPI data structures. VFIO has the
freedom to bundle the data as well as parse data size based on its own flags.

In order to determine the size and feature set of the user data, argsz
and flags (or the equivalent) are also embedded in the IOMMU UAPI data
structures.

A "__u32 argsz" field is *always* at the beginning of each structure.

For example::

   struct iommu_cache_invalidate_info {
	__u32	argsz;
	#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
	__u32	version;
	/* IOMMU paging structure cache */
	#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
	#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
	#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
	#define IOMMU_CACHE_INV_TYPE_NR		(3)
	__u8	cache;
	__u8	granularity;
	__u8	padding[6];
	union {
		struct iommu_inv_pasid_info pasid_info;
		struct iommu_inv_addr_info addr_info;
	} granu;
   };

VFIO is responsible for checking its own argsz and flags. It then
invokes appropriate IOMMU UAPI functions.
The user pointers are passed
to the IOMMU layer for further processing. The responsibilities are
divided as follows:

- The generic IOMMU layer checks the argsz range based on UAPI data in
  the current kernel version.

- The generic IOMMU layer checks the content of the UAPI data for
  non-zero reserved bits in flags, padding fields, and unsupported
  version. This ensures that userspace is not broken in the future
  when these fields or flags are put to use.

- The vendor IOMMU driver checks argsz based on vendor flags. UAPI data
  is consumed based on flags. The vendor driver has access to the
  unmodified argsz value in case of vendor-specific future
  extensions. Currently, it does not perform the copy_from_user()
  itself. A __user pointer can be provided in some future scenarios
  where there is vendor data outside of the structure definition.

IOMMU code treats UAPI data in two categories:

- structure contains vendor data
  (Example: iommu_uapi_cache_invalidate())

- structure contains only generic data
  (Example: iommu_uapi_sva_bind_gpasid())



Sharing UAPI with in-kernel users
---------------------------------
For UAPIs that are shared with in-kernel users, a wrapper function is
provided to distinguish the callers.
For example,

Userspace caller ::

  int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
                                   struct device *dev,
                                   void __user *udata);

In-kernel caller ::

  int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
                              struct device *dev, ioasid_t ioasid);
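
The split between the two entry points can be illustrated with a small
userspace simulation. Everything in it is hypothetical:
``struct demo_unbind_data``, ``demo_unbind_core()`` and
``demo_uapi_unbind()`` are illustrative names, the 20-bit ID limit is an
assumption, and ``memcpy()`` stands in for the kernel's
``copy_from_user()``.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument block supplied by an untrusted caller. */
struct demo_unbind_data {
    uint32_t argsz;   /* size claimed by the caller */
    uint32_t flags;   /* no flags defined yet, must be zero */
    uint32_t pasid;   /* ID to unbind */
    uint32_t padding; /* reserved, must be zero */
};

#define DEMO_MAX_PASID 0xFFFFFu  /* assumed 20-bit ID space */

/* In-kernel style entry point: arguments are already typed and trusted. */
int demo_unbind_core(uint32_t pasid)
{
    return pasid <= DEMO_MAX_PASID ? 0 : -1;
}

/* Userspace style entry point: copy, validate, then call the core. */
int demo_uapi_unbind(const void *udata, size_t len)
{
    struct demo_unbind_data d;

    if (udata == NULL || len < sizeof(d))
        return -1;
    memcpy(&d, udata, sizeof(d));  /* stands in for copy_from_user() */

    /* Size, flags and reserved fields are checked before any use. */
    if (d.argsz < sizeof(d) || d.flags != 0 || d.padding != 0)
        return -1;
    return demo_unbind_core(d.pasid);
}
```

Keeping the validation in the wrapper means in-kernel callers reach the
core function directly, without paying for checks that only make sense
for untrusted data.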