.. SPDX-License-Identifier: GPL-2.0

===========================
Hypercall Op-codes (hcalls)
===========================

Overview
=========

Virtualization on 64-bit Power Book3S platforms is based on the PAPR
specification [1]_, which describes the run-time environment for a guest
operating system and how it should interact with the hypervisor for
privileged operations. Currently there are two PAPR-compliant hypervisors:

- **IBM PowerVM (PHYP)**: IBM's proprietary hypervisor that supports AIX,
  IBM-i and Linux as guests (termed Logical Partitions or LPARs). It
  supports the full PAPR specification.

- **Qemu/KVM**: Supports PPC64 Linux guests running on a PPC64 Linux host.
  It implements only a subset of the PAPR specification, called LoPAPR [2]_.

On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is
called a *pSeries guest*. A pseries guest runs in supervisor mode (HV=0) and
must issue hypercalls to the hypervisor whenever it needs to perform an action
that is hypervisor privileged [3]_ or needs other services managed by the
hypervisor.

Hence a hypercall (hcall) is essentially a request by the pseries guest
asking the hypervisor to perform a privileged operation on its behalf. The
guest issues a hcall with the necessary input operands. After performing
the privileged operation, the hypervisor returns a status code and output
operands to the guest.

HCALL ABI
=========
The ABI specification for a hcall between a pseries guest and a PAPR hypervisor
is covered in section 14.5.3 of ref [2]_. The switch to the hypervisor context
is done via the **HVSC** instruction, which expects the opcode for the hcall to
be set in *r3* and any in-arguments for the hcall to be provided in registers
*r4-r12*. If values have to be passed through a memory buffer, the data stored
in that buffer should be in big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the
**HVSC** instruction, the return value of the hcall is available in *r3* and any
out values are returned in registers *r4-r12*. As with in-arguments, any out
values stored in a memory buffer will be in big-endian byte order.

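The big-endian requirement for memory buffers can be illustrated with a small,
self-contained sketch. The helper names ``put_be64`` and ``get_be64`` are
hypothetical and only model the byte layout; kernel code would normally use its
own endian-conversion helpers such as ``cpu_to_be64()`` instead.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit value at buf[0..7] in big-endian
 * byte order (most significant byte first), regardless of host endianness.
 */
static void put_be64(uint8_t *buf, uint64_t val)
{
	for (size_t i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}

/*
 * Hypothetical helper: read back a 64-bit value that was stored in
 * big-endian byte order at buf[0..7].
 */
static uint64_t get_be64(const uint8_t *buf)
{
	uint64_t val = 0;

	for (size_t i = 0; i < 8; i++)
		val = (val << 8) | buf[i];
	return val;
}
```

Shifting explicitly, rather than copying the in-memory representation, keeps
the result identical on little-endian and big-endian guests.
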
Powerpc arch code provides convenient wrappers named **plpar_hcall_xxx**, defined
in an arch-specific header [4]_, to issue hcalls from the Linux kernel
running as a pseries guest.

Register Conventions
====================

Any hcall should follow the same register convention as described in section
2.2.1.1 of "64-Bit ELF V2 ABI Specification: Power Architecture" [5]_. The
table below summarizes these conventions:

+----------+----------+-------------------------------------------+
| Register |Volatile  |  Purpose                                  |
| Range    |(Y/N)     |                                           |
+==========+==========+===========================================+
|   r0     |    Y     |  Optional-usage                           |
+----------+----------+-------------------------------------------+
|   r1     |    N     |  Stack Pointer                            |
+----------+----------+-------------------------------------------+
|   r2     |    N     |  TOC                                      |
+----------+----------+-------------------------------------------+
|   r3     |    Y     |  hcall opcode/return value                |
+----------+----------+-------------------------------------------+
|  r4-r10  |    Y     |  in and out values                        |
+----------+----------+-------------------------------------------+
|   r11    |    Y     |  Optional-usage/Environmental pointer     |
+----------+----------+-------------------------------------------+
|   r12    |    Y     |  Optional-usage/Function entry address at |
|          |          |  global entry point                       |
+----------+----------+-------------------------------------------+
|   r13    |    N     |  Thread-Pointer                           |
+----------+----------+-------------------------------------------+
|  r14-r31 |    N     |  Local Variables                          |
+----------+----------+-------------------------------------------+
|    LR    |    Y     |  Link Register                            |
+----------+----------+-------------------------------------------+
|   CTR    |    Y     |  Loop Counter                             |
+----------+----------+-------------------------------------------+
|   XER    |    Y     |  Fixed-point exception register.          |
+----------+----------+-------------------------------------------+
|  CR0-1   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR2-4   |    N     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  CR5-7   |    Y     |  Condition register fields.               |
+----------+----------+-------------------------------------------+
|  Others  |    N     |                                           |
+----------+----------+-------------------------------------------+

DRC & DRC Indexes
=================
::

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources such as PCI devices and
NVDIMMs that are available for use by LPARs as Dynamic Resources (DR). When a
DR is allocated to an LPAR, PHYP creates a data structure called a Dynamic
Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an
opaque 32-bit number called the DRC-Index. The DRC-Index value is provided to
the LPAR via the device tree, where it is present as an attribute in the
device tree node associated with the DR.

HCALL Return-values
===================

After servicing the hcall, the hypervisor sets the return value in *r3*,
indicating success or failure of the hcall. In case of a failure, an error code
indicates the cause of the error. These codes are defined and documented in the
arch-specific header [4]_.

In some cases a hcall can take a long time and needs to be issued multiple
times in order to be completely serviced. These hcalls will usually accept an
opaque value *continue-token* within their argument list, and a return value
of *H_CONTINUE* indicates that the hypervisor has not yet finished servicing
the hcall.

To make such hcalls the guest needs to set *continue-token == 0* for the
initial call and use the hypervisor-returned value of *continue-token*
for each subsequent hcall until the hypervisor returns a value other than
*H_CONTINUE*.

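The continue-token protocol described above can be sketched in plain C as
follows. This is only an illustration under stated assumptions: the
``long_hcall_fn`` callback type, ``issue_until_done()`` and the mock hypervisor
are hypothetical names, and the ``SKETCH_*`` status values are local to the
example and should be checked against arch/powerpc/include/asm/hvcall.h.

```c
#include <stdint.h>

/* Status codes local to this sketch; hvcall.h holds the authoritative ones. */
#define SKETCH_H_SUCCESS	0
#define SKETCH_H_PARAMETER	(-4)
#define SKETCH_H_CONTINUE	18

/*
 * Hypothetical shape of a long-running hcall: it receives the token from
 * the previous call and writes the token to use for the next call.
 */
typedef long (*long_hcall_fn)(uint64_t token_in, uint64_t *token_out,
			      void *ctx);

/* Re-issue the hcall until the status is no longer SKETCH_H_CONTINUE. */
static long issue_until_done(long_hcall_fn hcall, void *ctx)
{
	uint64_t token = 0;	/* initial call: continue-token == 0 */
	long rc;

	do {
		uint64_t next = 0;

		rc = hcall(token, &next, ctx);
		token = next;	/* feed the returned token back in */
	} while (rc == SKETCH_H_CONTINUE);

	return rc;
}

/* Mock hypervisor state: completes after 'steps_left' calls. */
struct mock_state {
	unsigned int steps_left;
	unsigned int calls;
};

static long mock_hcall(uint64_t token_in, uint64_t *token_out, void *ctx)
{
	struct mock_state *s = ctx;

	/* The mock uses the number of earlier calls as its token. */
	if (token_in != s->calls)
		return SKETCH_H_PARAMETER;
	s->calls++;

	if (s->steps_left > 1) {
		s->steps_left--;
		*token_out = s->calls;
		return SKETCH_H_CONTINUE;
	}
	*token_out = 0;
	return SKETCH_H_SUCCESS;
}
```

The token is treated as opaque by the caller: it is never inspected or
modified, only passed back unchanged on the next call.
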
HCALL Op-codes
==============

Below is a partial list of HCALLs that are supported by PHYP. For the
corresponding opcode values, see the arch-specific header [4]_:

**H_SCM_READ_METADATA**

| Input: *drcIndex, offset, buffer-address, numBytesToRead*
| Out: *numBytesRead*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_Hardware*

Given a DRC Index of an NVDIMM, read N bytes from the metadata area
associated with it, at a specified offset, and copy them to the provided buffer.
The metadata area stores configuration information such as label information,
bad blocks etc. The metadata area is located out-of-band of the NVDIMM storage
area, hence separate access semantics are provided.

**H_SCM_WRITE_METADATA**

| Input: *drcIndex, offset, data, numBytesToWrite*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P4, H_Hardware*

Given a DRC Index of an NVDIMM, write N bytes from the provided buffer to the
metadata area associated with it, at the specified offset.

**H_SCM_BIND_MEM**

| Input: *drcIndex, startingScmBlockIndex, numScmBlocksToBind,*
| *targetLogicalMemoryAddress, continue-token*
| Out: *continue-token, targetLogicalMemoryAddress, numScmBlocksToBound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap,*
| *H_Too_Big, H_P5, H_Busy*

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks
*(startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind)* to the guest
at *targetLogicalMemoryAddress* within the guest physical address space. If
*targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF*, the hypervisor
assigns a target address to the guest. The HCALL can fail if the guest has
an active PTE entry to the SCM block being bound.

**H_SCM_UNBIND_MEM**

| Input: *drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind*
| Out: *numScmBlocksUnbound*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap,*
| *H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Given a DRC-Index of an NVDIMM, unmap *numScmBlocksToUnbind* SCM blocks starting
at *startingScmLogicalMemoryAddress* from the guest physical address space. The
HCALL can fail if the guest has an active PTE entry to the SCM block being
unbound.

**H_SCM_QUERY_BLOCK_MEM_BINDING**

| Input: *drcIndex, scmBlockIndex*
| Out: *Guest-Physical-Address*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a DRC-Index and an SCM block index, return the guest physical address to
which the SCM block is mapped.

**H_SCM_QUERY_LOGICAL_MEM_BINDING**

| Input: *Guest-Physical-Address*
| Out: *drcIndex, scmBlockIndex*
| Return Value: *H_Success, H_Parameter, H_P2, H_NotFound*

Given a guest physical address, return the DRC Index and SCM block that are
mapped to that address.

**H_SCM_UNBIND_ALL**

| Input: *scmTargetScope, drcIndex*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy,*
| *H_LongBusyOrder1mSec, H_LongBusyOrder10mSec*

Depending on the target scope, unmap from the LPAR memory either all SCM blocks
belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM
identified by its drcIndex.

**H_SCM_HEALTH**

| Input: *drcIndex*
| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*

Given a DRC Index, return information on the predictive failure and overall
health of the PMEM device. The asserted bits in the health-bitmap indicate one
or more states (described in the table below) of the PMEM device, and the
health-bit-valid-bitmap indicates which bits in the health-bitmap are valid.
The bits are reported in reverse bit ordering; for example, a value of
0xC400000000000000 indicates that bits 0, 1, and 5 are valid.

Health Bitmap Flags:

+------+-----------------------------------------------------------------------+
|  Bit |               Definition                                              |
+======+=======================================================================+
|  00  | PMEM device is unable to persist memory contents.                     |
|      | If the system is powered down, nothing will be saved.                 |
+------+-----------------------------------------------------------------------+
|  01  | PMEM device failed to persist memory contents. Either contents were   |
|      | not saved successfully on power down or were not restored properly on |
|      | power up.                                                             |
+------+-----------------------------------------------------------------------+
|  02  | PMEM device contents are persisted from previous IPL. The data from   |
|      | the last boot were successfully restored.                             |
+------+-----------------------------------------------------------------------+
|  03  | PMEM device contents are not persisted from previous IPL. There was no|
|      | data to restore from the last boot.                                   |
+------+-----------------------------------------------------------------------+
|  04  | PMEM device memory life remaining is critically low                   |
+------+-----------------------------------------------------------------------+
|  05  | PMEM device will be garded off next IPL due to failure                |
+------+-----------------------------------------------------------------------+
|  06  | PMEM device contents cannot persist due to current platform health    |
|      | status. A hardware failure may prevent data from being saved or       |
|      | restored.                                                             |
+------+-----------------------------------------------------------------------+
|  07  | PMEM device is unable to persist memory contents in certain conditions|
+------+-----------------------------------------------------------------------+
|  08  | PMEM device is encrypted                                              |
+------+-----------------------------------------------------------------------+
|  09  | PMEM device has successfully completed a requested erase or secure    |
|      | erase procedure.                                                      |
+------+-----------------------------------------------------------------------+
|10:63 | Reserved / Unused                                                     |
+------+-----------------------------------------------------------------------+

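The reverse bit ordering used by the health bitmaps, where bit 0 is the most
significant bit of the 64-bit value, can be made concrete with a short sketch.
The names ``HEALTH_BIT`` and ``health_flag_set`` are hypothetical and chosen
for this example; the kernel's ``PPC_BIT()`` macro follows the same numbering
convention.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask for bit 'n' of a 64-bit word in MSB-0 numbering, i.e. bit 0 is the
 * most significant bit and bit 63 the least significant one.
 */
#define HEALTH_BIT(n)	(UINT64_C(1) << (63 - (n)))

/*
 * Hypothetical helper: a health flag counts as asserted only if the
 * hypervisor marks it valid and it is set in the health bitmap.
 */
static bool health_flag_set(uint64_t bitmap, uint64_t valid, unsigned int bit)
{
	uint64_t mask = HEALTH_BIT(bit);

	return (valid & mask) && (bitmap & mask);
}
```

With this numbering, ``HEALTH_BIT(0) | HEALTH_BIT(1) | HEALTH_BIT(5)`` equals
0xC400000000000000, matching the example value given for the valid bitmap.
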
**H_SCM_PERFORMANCE_STATS**

| Input: *drcIndex, resultBuffer Addr*
| Out: *None*
| Return Value: *H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege*

Given a DRC Index, collect the performance statistics for the NVDIMM and copy
them to the resultBuffer.

References
==========
.. [1] "Power Architecture Platform Reference"
       https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
.. [2] "Linux on Power Architecture Platform Reference"
       https://members.openpowerfoundation.org/document/dl/469
.. [3] "Definitions and Notation" Book III-Section 14.5.3
       https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
.. [4] arch/powerpc/include/asm/hvcall.h
.. [5] "64-Bit ELF V2 ABI Specification: Power Architecture"
       https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture