.. SPDX-License-Identifier: GPL-2.0
.. _VAS-API:

===================================================
Virtual Accelerator Switchboard (VAS) userspace API
===================================================

Introduction
============

The Power9 processor introduced the Virtual Accelerator Switchboard (VAS),
which allows both userspace and the kernel to communicate with a
co-processor (hardware accelerator) referred to as the Nest Accelerator
(NX). The NX unit comprises one or more hardware engines or co-processor
types such as 842 compression, GZIP compression and encryption. On Power9,
userspace applications have access only to the GZIP compression engine,
which supports the ZLIB and GZIP compression algorithms in hardware.

To communicate with NX, the kernel has to establish a channel or window;
requests can then be submitted directly without kernel involvement.
Requests to the GZIP engine must be formatted as a Co-processor Request
Block (CRB), and these CRBs must be submitted to the NX using COPY/PASTE
instructions to paste the CRB to the hardware address that is associated
with the engine's request queue.

The GZIP engine provides two priority levels of requests: Normal and
High. Only Normal requests are supported from userspace right now.


This document explains the userspace API that is used to interact with the
kernel to set up a channel / window which can be used to send compression
requests directly to the NX accelerator.


Overview
========

Application access to the GZIP engine is provided through the
/dev/crypto/nx-gzip device node implemented by the VAS/NX device driver.
An application must open the /dev/crypto/nx-gzip device to obtain a file
descriptor (fd). It should then issue the VAS_TX_WIN_OPEN ioctl with this
fd to establish a connection to the engine, which opens a send window on
the GZIP engine for this process. Once a connection is established, the
application should use the mmap() system call to map the hardware address
of the engine's request queue into the application's virtual address space.

The application can then submit one or more requests to the engine by
using copy/paste instructions and pasting the CRBs to the virtual address
(aka paste_address) returned by mmap(). User space can close the
established connection or send window by closing the file descriptor
(close(fd)); the window is also closed when the process exits.

Note that applications can send several requests with the same window or
can establish multiple windows, but one window for each file descriptor.

The following sections provide additional details and references about the
individual steps.


NX-GZIP Device Node
===================

There is one /dev/crypto/nx-gzip node in the system and it provides
access to all GZIP engines in the system. The only valid operations on
/dev/crypto/nx-gzip are:

	* open() the device for read and write.
	* issue the VAS_TX_WIN_OPEN ioctl.
	* mmap() the engine's request queue into the application's virtual
	  address space (i.e. get a paste_address for the co-processor
	  engine).
	* close the device node.

Other file operations on this device node are undefined.

Note that the copy and paste operations go directly to the hardware and
do not go through this device. Refer to the COPY/PASTE documentation for
more details.

Although a system may have several instances of the NX co-processor
engines (typically, one per P9 chip), there is just one
/dev/crypto/nx-gzip device node in the system. When the nx-gzip device
node is opened, the kernel opens a send window on a suitable instance of
the NX accelerator. It finds the CPU on which the user process is
executing and determines the NX instance for the chip to which this CPU
belongs.

Applications may choose a specific instance of the NX co-processor using
the vas_id field in the VAS_TX_WIN_OPEN ioctl as detailed below.

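
A vas_id value for a specific instance comes from the ibm,vas-id device
tree property described under "Discovery of available VAS engines". The
following is a minimal sketch of reading such a property, assuming it
holds a single 32-bit cell in the usual big-endian device tree encoding.
The helper names dt_cell_to_u32() and read_vas_id() are hypothetical, and
the caller supplies the property path.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode one big-endian 32-bit device tree cell into a host integer. */
static uint32_t dt_cell_to_u32(const unsigned char cell[4])
{
	return ((uint32_t)cell[0] << 24) | ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8) | (uint32_t)cell[3];
}

/*
 * Hypothetical helper: read an ibm,vas-id property file.
 * Assumes the property is a single 32-bit cell.
 * Returns the ID on success, or -1 if the file cannot be read
 * or holds fewer than four bytes.
 */
static int read_vas_id(const char *prop_path)
{
	unsigned char cell[4];
	size_t n;
	FILE *f = fopen(prop_path, "rb");

	if (!f)
		return -1;
	n = fread(cell, 1, sizeof(cell), f);
	fclose(f);
	if (n != sizeof(cell))
		return -1;
	return (int)dt_cell_to_u32(cell);
}
```

The returned value would be stored in the vas_id field before issuing the
ioctl. A return of -1 conveniently matches the "default instance" value,
so a failed lookup falls back to the kernel's own choice.
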

A userspace library, libnxz, is available here but is still in development:

	 https://github.com/abalib/power-gzip

Applications that use inflate / deflate calls can link with libnxz
instead of libz and use NX GZIP compression without any modification.

Open /dev/crypto/nx-gzip
========================

The nx-gzip device should be opened for read and write. No special
privileges are needed to open the device. Each window corresponds to one
file descriptor, so if the userspace process needs multiple windows,
several open calls have to be issued.

See the open(2) man page for other details such as return values,
error codes and restrictions.

VAS_TX_WIN_OPEN ioctl
=====================

Applications should use the VAS_TX_WIN_OPEN ioctl as follows to establish
a connection with the NX co-processor engine:

	::

		struct vas_tx_win_open_attr {
			__u32   version;
			__s16   vas_id; /* specific instance of vas or -1
					   for default */
			__u16   reserved1;
			__u64   flags;  /* For future use */
			__u64   reserved2[6];
		};

	version:
		The version field must currently be set to 1.
	vas_id:
		If '-1' is passed, the kernel will make a best-effort attempt
		to assign an optimal instance of NX for the process. To
		select a specific VAS instance, refer to the
		"Discovery of available VAS engines" section below.

	The flags, reserved1 and reserved2[6] fields are for future
	extension and must be set to 0.

	The attributes attr for the VAS_TX_WIN_OPEN ioctl are defined as
	follows::

		#define VAS_MAGIC 'v'
		#define VAS_TX_WIN_OPEN _IOW(VAS_MAGIC, 1, \
						struct vas_tx_win_open_attr)

		struct vas_tx_win_open_attr attr;
		rc = ioctl(fd, VAS_TX_WIN_OPEN, &attr);

	The VAS_TX_WIN_OPEN ioctl returns 0 on success. On errors, it
	returns -1 and sets the errno variable to indicate the error.

	Error conditions:

		======	=======================================================
		EINVAL	fd does not refer to a valid VAS device.
		EINVAL	Invalid vas ID.
		EINVAL	version is not set to a valid value.
		EEXIST	Window is already opened for the given fd.
		ENOMEM	Memory is not available to allocate the window.
		ENOSPC	System has too many active windows (connections) opened.
		EINVAL	Reserved fields are not set to 0.
		======	=======================================================

	See the ioctl(2) man page for more details, error codes and
	restrictions.

mmap() NX-GZIP device
=====================

The mmap() system call for an NX-GZIP device fd returns a paste_address
that the application can use to copy/paste its CRB to the hardware engines.

	::

		paste_addr = mmap(addr, size, prot, flags, fd, offset);

	The only restrictions on mmap() for an NX-GZIP device fd are:

		* size should be PAGE_SIZE
		* offset parameter should be 0ULL

	Refer to the mmap(2) man page for additional details/restrictions.
	In addition to the error conditions listed on the mmap(2) man
	page, mmap() can also fail with one of the following error codes:

		======	=============================================
		EINVAL	fd is not associated with an open window
			(i.e. mmap() does not follow a successful call
			to the VAS_TX_WIN_OPEN ioctl).
		EINVAL	offset field is not 0ULL.
		======	=============================================

Discovery of available VAS engines
==================================

Each available VAS instance in the system will have a device tree node
like /proc/device-tree/vas@* or /proc/device-tree/xscom@*/vas@*.
Determine the chip or VAS instance and use the corresponding ibm,vas-id
property value in this node to select a specific VAS instance.

Copy/Paste operations
=====================

Applications should use the copy and paste instructions to send a CRB to NX.
Refer to section 4.4 in the PowerISA for the Copy/Paste instructions:
https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

CRB Specification and use NX
============================

Applications should format requests to the co-processor using the
Co-processor Request Block (CRB). Refer to the NX-GZIP user's manual for
the format of the CRB and for how to use NX from userspace, such as sending
requests and checking request status.

NX Fault handling
=================

Applications send requests to NX and wait for the status by polling on
the Co-processor Status Block (CSB) flags. NX updates the status in the CSB
after each request is processed. Refer to the NX-GZIP user's manual for the
format of the CSB and the status flags.

If NX encounters a translation error (called an NX page fault) on the CSB
address or any request buffer, it raises an interrupt on the CPU to handle
the fault. A page fault can happen if an application passes invalid
addresses or if request buffers are not in memory.
The operating system handles the fault by updating the CSB with the
following data::

	csb.flags = CSB_V;
	csb.cc = CSB_CC_FAULT_ADDRESS;
	csb.ce = CSB_CE_TERMINATION;
	csb.address = fault_address;

When an application receives a translation error, it can touch or access
the page that contains the fault address so that this page will be in
memory. The application can then resend the request to NX.

If the OS cannot update the CSB because the CSB address is invalid, it
sends a SEGV signal to the process that opened the send window on which the
original request was issued. This signal is delivered with the following
siginfo struct::

	siginfo.si_signo = SIGSEGV;
	siginfo.si_errno = EFAULT;
	siginfo.si_code = SEGV_MAPERR;
	siginfo.si_addr = CSB address;

In the case of multi-threaded applications, NX send windows can be shared
across all threads. For example, a child thread can open a send window,
but other threads can send requests to NX using this window. These
requests will be successful even when the OS handles faults, as long as
the CSB address is valid. If the NX request contains an invalid CSB address,
the signal will be sent to the child thread that opened the window. But if
that thread has exited without closing the window and a request is issued
using this window, the signal will be sent to the thread group leader
(tgid).
It is up to the application whether to ignore or handle these
signals.

NX-GZIP User's Manual:
https://github.com/libnxz/power-gzip/blob/master/power_nx_gzip_um.pdf

Simple example
==============

	::

		int use_nx_gzip()
		{
			int rc, fd;
			void *addr;
			struct vas_tx_win_open_attr txattr;

			fd = open("/dev/crypto/nx-gzip", O_RDWR);
			if (fd < 0) {
				fprintf(stderr, "open nx-gzip failed\n");
				return -1;
			}
			/* Reserved fields and flags must be 0 */
			memset(&txattr, 0, sizeof(txattr));
			txattr.version = 1;
			txattr.vas_id = -1;	/* let the kernel choose */
			rc = ioctl(fd, VAS_TX_WIN_OPEN,
					(unsigned long)&txattr);
			if (rc < 0) {
				fprintf(stderr, "ioctl() failed, rc %d, errno %d\n",
						rc, errno);
				return rc;
			}
			addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE,
					MAP_SHARED, fd, 0ULL);
			if (addr == MAP_FAILED) {
				fprintf(stderr, "mmap() failed, errno %d\n",
						errno);
				return -errno;
			}
			do {
				// Format the CRB request for compression or
				// decompression.
				// Refer to the tests for vas_copy/vas_paste.
				vas_copy(&crb, 0, 1);
				vas_paste(addr, 0, 1);
				// Poll on csb.flags with a timeout;
				// the CSB address is listed in the CRB.
			} while (true);
			// The window is also closed upon process exit.
			close(fd);
			return 0;
		}

	Refer to https://github.com/abalib/power-gzip for tests or more
	use cases.
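
The retry step described under "NX Fault handling" can be sketched as
follows. This is a minimal sketch and not the driver's or libnxz's
implementation: the struct nx_csb layout, the CSB_V and
CSB_CC_FAULT_ADDRESS values, and the big-endian encoding of the address
field are assumptions that should be checked against the NX-GZIP user's
manual. The helper name nx_touch_fault_page() is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values; verify against the NX-GZIP user's manual. */
#define CSB_V			0x80
#define CSB_CC_FAULT_ADDRESS	250

/* Assumed CSB layout; multi-byte fields are assumed big-endian. */
struct nx_csb {
	uint8_t  flags;
	uint8_t  cs;
	uint8_t  cc;
	uint8_t  ce;
	uint32_t count;
	uint64_t address;
};

/* Convert a big-endian 64-bit field to host byte order. */
static uint64_t be64_field_to_host(uint64_t be)
{
	unsigned char b[8];
	uint64_t v = 0;
	int i;

	memcpy(b, &be, sizeof(b));
	for (i = 0; i < 8; i++)
		v = (v << 8) | b[i];
	return v;
}

/*
 * If the CSB reports an NX page fault, read one byte at the fault
 * address so the page is brought into memory, and return 1 to tell
 * the caller to resubmit the request. Return 0 otherwise.
 */
static int nx_touch_fault_page(const struct nx_csb *csb)
{
	volatile const unsigned char *p;

	if (!(csb->flags & CSB_V) || csb->cc != CSB_CC_FAULT_ADDRESS)
		return 0;
	p = (volatile const unsigned char *)
		(uintptr_t)be64_field_to_host(csb->address);
	(void)*p;	/* volatile read forces the access */
	return 1;
}
```

A caller would invoke nx_touch_fault_page() after polling shows a valid
CSB, and paste the same CRB again when it returns 1. The sketch only
reads the page; an output buffer may need a write access instead, and an
application should bound the number of retries.
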