.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S Platforms is based on the PAPR
specification [1]_ which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions
  or LPARs). It supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host,
  though it only implements a subset of the PAPR specification called
  LoPAPR [2]_.

On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is called
a *pSeries guest*. A pSeries guest runs in supervisor mode (HV=0) and must
issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or for other services managed by the
hypervisor.

Hence a Hypercall (hcall) is essentially a request by the pSeries guest
asking the hypervisor to perform a privileged operation on behalf of the guest.
The guest issues the hcall with the necessary input operands.
The hypervisor, after performing
the privileged operation, returns a status code and output operands back to the
guest.

HCALL ABI
=========
The ABI specification for a hcall between a pSeries guest and PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context is
done via the instruction **HVSC**, which expects the opcode for the hcall to be
set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
'HVSC' instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments,
any out values stored in a memory buffer will be in big-endian byte order.

The powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**,
defined in an arch-specific header [4]_, to issue hcalls from the Linux kernel
running as a pSeries guest.

Register Conventions
====================

Any hcall should follow the same register conventions as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_.
The table below
summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register | Volatile | Purpose                                   |
| Range    | (Y/N)    |                                           |
+==========+==========+===========================================+
| r0       | Y        | Optional-usage                            |
+----------+----------+-------------------------------------------+
| r1       | N        | Stack Pointer                             |
+----------+----------+-------------------------------------------+
| r2       | N        | TOC                                       |
+----------+----------+-------------------------------------------+
| r3       | Y        | hcall opcode/return value                 |
+----------+----------+-------------------------------------------+
| r4-r10   | Y        | in and out values                         |
+----------+----------+-------------------------------------------+
| r11      | Y        | Optional-usage/Environmental pointer      |
+----------+----------+-------------------------------------------+
| r12      | Y        | Optional-usage/Function entry address at  |
|          |          | global entry point                        |
+----------+----------+-------------------------------------------+
| r13      | N        | Thread-Pointer                            |
+----------+----------+-------------------------------------------+
| r14-r31  | N        | Local Variables                           |
+----------+----------+-------------------------------------------+
| LR       | Y        | Link Register                             |
+----------+----------+-------------------------------------------+
| CTR      | Y        | Loop Counter                              |
+----------+----------+-------------------------------------------+
| XER      | Y        | Fixed-point exception register.           |
+----------+----------+-------------------------------------------+
| CR0-1    | Y        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| CR2-4    | N        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| CR5-7    | Y        | Condition register fields.                |
+----------+----------+-------------------------------------------+
| Others   | N        |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

    DR1                                  Guest
    +--+        +------------+         +---------+
    |  | <----> |            |         |  User   |
    +--+  DRC1  |            |   DRC   |  Space  |
                |    PAPR    |  Index  +---------+
    DR2         | Hypervisor |         |         |
    +--+        |            | <-----> |  Kernel |
    |  | <----> |            |  Hcall  |         |
    +--+  DRC2  +------------+         +---------+

The PAPR hypervisor refers to shared hardware resources such as PCI devices,
NVDIMMs etc. that are available for use by LPARs as Dynamic Resources (DR).
When a DR is allocated to an LPAR, PHYP creates a data-structure called a
Dynamic Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC
via an opaque 32-bit number called the DRC-Index. The DRC-Index value is
provided to the LPAR via the device-tree, where it is present as an attribute
in the device-tree node associated with the DR.
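
Because the DRC-Index reaches the guest as a device-tree attribute, and
device-tree property values are stored as big-endian 32-bit cells, a guest has
to decode the raw bytes before passing the index to an hcall. The following is
a minimal sketch of that decoding, not the kernel's implementation: the helper
names ``dt_cell_to_u32`` and ``drc_index_from_prop`` are hypothetical, and the
assumption that the attribute is a single cell (for example a property such as
``ibm,my-drc-index``) should be checked against the platform documentation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one 32-bit device-tree cell. The bytes are assembled explicitly
 * (most significant byte first) instead of casting the buffer, so the
 * result is the same on big-endian and little-endian guests.
 */
uint32_t dt_cell_to_u32(const uint8_t cell[4])
{
    return ((uint32_t)cell[0] << 24) |
           ((uint32_t)cell[1] << 16) |
           ((uint32_t)cell[2] << 8)  |
            (uint32_t)cell[3];
}

/*
 * Hypothetical helper: extract a DRC-Index from a raw property value.
 * Returns 0 on success, -1 if the property is not exactly one cell long.
 */
int drc_index_from_prop(const uint8_t *prop, size_t len, uint32_t *drc_index)
{
    if (prop == NULL || drc_index == NULL || len != 4)
        return -1;

    *drc_index = dt_cell_to_u32(prop);
    return 0;
}
```

In-kernel code would normally rely on the device-tree accessors rather than
decoding bytes by hand; the sketch only illustrates why the byte order matters.
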

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return-value in *r3*
indicating success or failure of the hcall. In case of a failure, an error code
indicates the cause of the error. These codes are defined and documented in the
arch-specific header [4]_.

In some cases a hcall can potentially take a long time and needs to be issued
multiple times in order to be completely serviced. These hcalls will usually
accept an opaque value *continue-token* within their argument list, and a
return value of *H_CONTINUE* indicates that the hypervisor has not yet finished
servicing the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a non *H_CONTINUE*
return value.

HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP.
For the
corresponding opcode values please look into the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N-bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided buffer.
The metadata area stores configuration information such as label information,
bad-blocks etc. The metadata area is located out-of-band of the NVDIMM storage
area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N-bytes to the metadata area
associated with it, at the specified offset and from the provided buffer.

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
|               *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
|               *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF* then the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
|               *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.
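
*H_SCM_BIND_MEM* takes a *continue-token*, so it follows the retry protocol
described under "HCALL Return-values": start with a token of 0 and feed the
returned token back until the status is no longer *H_CONTINUE*. The sketch
below shows only that loop, under stated assumptions: ``hcall_step_fn`` is a
hypothetical callback standing in for the real hcall wrapper,
``mock_bind_step`` is a made-up stand-in for the hypervisor, and the two status
values are illustrative (the arch-specific header [4]_ is authoritative).

```c
#include <stdint.h>

/* Illustrative status values; hvcall.h holds the authoritative definitions. */
#define EX_H_SUCCESS   0L
#define EX_H_CONTINUE  18L

/*
 * Hypothetical callback standing in for the real hcall wrapper: receives
 * the current continue-token, stores the token to use on the next call,
 * and returns the hcall status.
 */
typedef long (*hcall_step_fn)(uint64_t token_in, uint64_t *token_out, void *ctx);

/*
 * Drive a long-running hcall to completion: the first call uses token 0,
 * every later call uses the token returned by the previous one, and the
 * loop ends on the first status that is not H_CONTINUE.
 */
long hcall_until_done(hcall_step_fn step, void *ctx)
{
    uint64_t token = 0;    /* must be 0 for the initial call */
    long rc;

    do {
        uint64_t next = 0;

        rc = step(token, &next, ctx);
        token = next;
    } while (rc == EX_H_CONTINUE);

    return rc;
}

/* Mock hypervisor: asks for *ctx further calls before reporting success. */
long mock_bind_step(uint64_t token_in, uint64_t *token_out, void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        *token_out = token_in + 1;    /* opaque from the guest's viewpoint */
        return EX_H_CONTINUE;
    }

    *token_out = 0;
    return EX_H_SUCCESS;
}
```

The loop treats the token as opaque and never inspects it. A real caller would
also need a policy for the busy return values listed above (*H_Busy*,
*H_LongBusyOrder1mSec*, ...), which this sketch leaves out.
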

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM Block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return which DRC Index and SCM block is mapped
to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
|               *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap all SCM blocks belonging to all NVDIMMs
or all SCM blocks belonging to a single NVDIMM identified by its drcIndex
from the LPAR memory.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return the info on predictive failure and overall health of
the PMEM device.
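
*H_SCM_HEALTH* reports both of its bitmaps in reverse (MSB-first) bit
numbering, so bit 0 is the most significant bit of the 64-bit register value.
The following is a minimal sketch of testing a flag under that convention; the
helper names ``msb0_bit`` and ``health_flag_asserted`` are hypothetical and are
not kernel APIs, and the bit number is assumed to be in the range 0 to 63.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask for bit n when bit 0 is the most significant bit of a 64-bit value.
 * n must be in the range 0..63.
 */
uint64_t msb0_bit(unsigned int n)
{
    return 1ULL << (63 - n);
}

/*
 * A health flag carries meaning only if the same bit is set in the valid
 * bitmap, so report it as asserted only when both bits are set.
 */
bool health_flag_asserted(uint64_t health, uint64_t valid, unsigned int n)
{
    uint64_t mask = msb0_bit(n);

    return (valid & mask) && (health & mask);
}
```

With this numbering, the masks for bits 0, 1 and 5 combine to
0xC400000000000000, the value quoted in this section for the valid bitmap.
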

The asserted bits in the health-bitmap indicate one or more states
(described in the table below) of the PMEM device, and health-bit-valid-bitmap
indicates which bits in health-bitmap are valid. The bits are reported in
reverse bit ordering; for example, a value of 0xC400000000000000
indicates bits 0, 1, and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
|  Bit | Definition                                                            |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+

**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture