.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
found on Intel's Skylake (and later) "Scalable Processor" Server CPUs.
It will be available in future non-server Intel parts and future AMD
processors.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.
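
To check whether a given x86 machine (an EC2 instance or otherwise)
exposes the feature, one option is to look for the "pku" (CPU support)
and "ospke" (enabled by the OS) flags in /proc/cpuinfo.  The following
is a minimal sketch assuming the x86 /proc/cpuinfo layout with a
"flags" line; the helper names flag_present() and cpu_has_pkeys() are
hypothetical::

	#include <stdbool.h>
	#include <stdio.h>
	#include <string.h>

	/* Hypothetical helper: true if 'name' occurs in 'flags' as a whole,
	 * whitespace-delimited word ("pku" must not match "pkux"). */
	static bool flag_present(const char *flags, const char *name)
	{
		size_t n = strlen(name);
		const char *p = flags;

		while ((p = strstr(p, name)) != NULL) {
			bool start_ok = p == flags || p[-1] == ' ' || p[-1] == '\t';
			bool end_ok = p[n] == '\0' || p[n] == ' ' ||
				      p[n] == '\t' || p[n] == '\n';

			if (start_ok && end_ok)
				return true;
			p += n;		/* keep searching after a partial match */
		}
		return false;
	}

	/* Hypothetical helper (assumes x86 cpuinfo format): true if the first
	 * "flags" line of /proc/cpuinfo lists both "pku" and "ospke". */
	static bool cpu_has_pkeys(void)
	{
		char line[16384];	/* x86 flags lines are long; read them whole */
		bool found = false;
		FILE *f = fopen("/proc/cpuinfo", "r");

		if (!f)
			return false;
		while (fgets(line, sizeof(line), f)) {
			if (strncmp(line, "flags", 5) == 0) {
				found = flag_present(line, "pku") &&
					flag_present(line, "ospke");
				break;
			}
		}
		fclose(f);
		return found;
	}

If no "flags" line is found, cpu_has_pkeys() simply returns false.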

Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains.  It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.
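
On x86-64, the four key bits are PTE bits 59 to 62 (named
_PAGE_BIT_PKEY_BIT0 to _PAGE_BIT_PKEY_BIT3 in the kernel).  The sketch
below packs and unpacks that field in a raw 64-bit PTE value purely to
illustrate the encoding; userspace never edits its page tables
directly, and the helper names pte_get_pkey() and pte_set_pkey() are
hypothetical::

	#include <stdint.h>

	#define PTE_PKEY_SHIFT	59	/* x86-64: lowest protection-key bit */
	#define PTE_PKEY_MASK	(UINT64_C(0xf) << PTE_PKEY_SHIFT)	/* 4 bits: keys 0-15 */

	/* Hypothetical helper: extract the protection key stored in a PTE. */
	static inline unsigned int pte_get_pkey(uint64_t pte)
	{
		return (unsigned int)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
	}

	/* Hypothetical helper: return 'pte' with its key field set to 'pkey'. */
	static inline uint64_t pte_set_pkey(uint64_t pte, unsigned int pkey)
	{
		return (pte & ~PTE_PKEY_MASK) |
		       (((uint64_t)pkey & 0xf) << PTE_PKEY_SHIFT);
	}

For example, pte_set_pkey(pte, 5) stores binary 0101 in bits 59 to 62
and leaves every other bit of 'pte' unchanged.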

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key.  Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.
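
Key k owns two adjacent PKRU bits: Access Disable at bit 2k and Write
Disable at bit 2k+1.  The Linux PKEY_DISABLE_ACCESS (0x1) and
PKEY_DISABLE_WRITE (0x2) values use the same order, so a key's rights
can be placed into a PKRU value with a single shift.  The sketch below
only computes values and never touches the real register; the helper
names are hypothetical, and the two rights bits are defined locally
(with the Linux values) so that it does not depend on a particular C
library::

	#include <stdint.h>

	#define PKEY_DISABLE_ACCESS_BIT	0x1u	/* same value as PKEY_DISABLE_ACCESS */
	#define PKEY_DISABLE_WRITE_BIT	0x2u	/* same value as PKEY_DISABLE_WRITE */
	#define PKRU_BITS_PER_PKEY	2

	/* Hypothetical helper: return 'pkru' with the AD/WD bits of 'pkey'
	 * (0-15) replaced by 'rights'. */
	static inline uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
	{
		int shift = pkey * PKRU_BITS_PER_PKEY;

		pkru &= ~(UINT32_C(0x3) << shift);		/* clear AD and WD */
		return pkru | ((rights & 0x3u) << shift);	/* install new rights */
	}

	/* Hypothetical helper: return the AD/WD bits of 'pkey' held in 'pkru'. */
	static inline uint32_t pkru_get_rights(uint32_t pkru, int pkey)
	{
		return (pkru >> (pkey * PKRU_BITS_PER_PKEY)) & 0x3u;
	}

For example, denying writes through key 1 sets only its WD bit (bit 3),
so pkru_set_rights(0, 1, PKEY_DISABLE_WRITE_BIT) returns 0x8.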

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register.  The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs.  These
permissions are enforced on data access only and have no effect on
instruction fetches.
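
Executing RDPKRU or WRPKRU before the OS has enabled protection keys
raises an invalid-opcode exception (SIGILL in userspace), so a program
should first check CPUID leaf 7, ECX bit 4 (OSPKE).  A minimal sketch,
assuming GCC or Clang on x86-64 (the <cpuid.h> helpers are compiler
specific); the function names are hypothetical, and the instructions
are emitted as raw opcode bytes so that assemblers lacking the
mnemonics can still build it::

	#include <stdbool.h>
	#include <stdint.h>

	#if defined(__x86_64__)
	#include <cpuid.h>		/* GCC/Clang-specific CPUID helpers */

	/* Hypothetical helper: CPUID.(EAX=7,ECX=0):ECX bit 4 (OSPKE) is set
	 * only when the CPU has PKU and the OS has enabled it. */
	static bool ospke_enabled(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (__get_cpuid_max(0, 0) < 7)
			return false;		/* CPUID leaf 7 not available */
		__cpuid_count(7, 0, eax, ebx, ecx, edx);
		(void)eax; (void)ebx; (void)edx;
		return (ecx & (1u << 4)) != 0;
	}

	/* Hypothetical wrapper for RDPKRU (0f 01 ee): needs ECX = 0; returns
	 * PKRU in EAX and zeroes EDX. */
	static uint32_t rdpkru(void)
	{
		uint32_t eax, edx;

		__asm__ volatile(".byte 0x0f, 0x01, 0xee"
				 : "=a" (eax), "=d" (edx)
				 : "c" (0));
		(void)edx;
		return eax;
	}

	/* Hypothetical wrapper for WRPKRU (0f 01 ef): loads PKRU from EAX;
	 * needs ECX = 0 and EDX = 0. */
	static void wrpkru(uint32_t pkru)
	{
		__asm__ volatile(".byte 0x0f, 0x01, 0xef"
				 : : "a" (pkru), "c" (0), "d" (0)
				 : "memory");
	}
	#endif

Because PKRU is per-thread state, a value written with wrpkru() changes
the protections seen by the calling thread only.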

Syscalls
========

There are 3 system calls which directly interact with pkeys::

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);
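
C libraries that predate these calls do not provide wrappers for them
(glibc added pkey_alloc(), pkey_free() and pkey_mprotect() in version
2.27).  A minimal sketch of invoking them through syscall(2) instead,
assuming the installed <sys/syscall.h> defines SYS_pkey_alloc and
friends; the sys_pkey_*() names are hypothetical.  As with any system
call, failure is reported as -1 with errno set, which is also what
pkey_alloc() returns when no key can be handed out, for example on
hardware without protection keys::

	#ifndef _GNU_SOURCE
	#define _GNU_SOURCE		/* syscall() declaration */
	#endif
	#include <errno.h>
	#include <stddef.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#ifdef SYS_pkey_alloc
	/* Hypothetical thin wrappers; int arguments are cast to long before
	 * going through syscall()'s variadic interface. */
	static inline int sys_pkey_alloc(unsigned long flags,
					 unsigned long init_access_rights)
	{
		return (int)syscall(SYS_pkey_alloc, flags, init_access_rights);
	}

	static inline int sys_pkey_free(int pkey)
	{
		return (int)syscall(SYS_pkey_free, (long)pkey);
	}

	static inline int sys_pkey_mprotect(void *start, size_t len,
					    unsigned long prot, int pkey)
	{
		return (int)syscall(SYS_pkey_mprotect, start, len, prot, (long)pkey);
	}
	#endif

The flags argument of pkey_alloc() is reserved and must be 0.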

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
by a key.  In this example, WRPKRU is wrapped by a C function
called pkey_set().
::

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);
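
Put together, the walk-through above corresponds roughly to the
following sketch.  It assumes glibc 2.27 or later, whose <sys/mman.h>
declares pkey_alloc(), pkey_mprotect(), pkey_set() and pkey_free() when
_GNU_SOURCE is defined; glibc's pkey_set() stands in for the pkey_set()
wrapper described above.  The function pkey_demo() is hypothetical and
returns -1 when protection keys are unavailable::

	#ifndef _GNU_SOURCE
	#define _GNU_SOURCE		/* glibc >= 2.27 pkey_*() declarations */
	#endif
	#include <stddef.h>
	#include <sys/mman.h>
	#include <unistd.h>

	/* Hypothetical demo: returns the value written, or -1 on failure. */
	static int pkey_demo(int foo)
	{
		size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
		int *ptr;
		int pkey, result;

		pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);	/* writes denied from the start */
		if (pkey < 0)
			return -1;	/* no CPU/kernel support, or no free key */

		ptr = mmap(NULL, page_size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
		if (ptr == MAP_FAILED) {
			pkey_free(pkey);
			return -1;
		}
		if (pkey_mprotect(ptr, page_size, PROT_READ | PROT_WRITE, pkey) != 0) {
			munmap(ptr, page_size);
			pkey_free(pkey);
			return -1;
		}

		pkey_set(pkey, 0);			/* clear PKEY_DISABLE_WRITE */
		*ptr = foo;				/* the update is now allowed */
		pkey_set(pkey, PKEY_DISABLE_WRITE);	/* deny writes again */
		result = *ptr;				/* reads stay allowed */

		munmap(ptr, page_size);	/* unmap the memory first... */
		pkey_free(pkey);	/* ...then release the key */
		return result;
	}

With protection keys available, pkey_demo(42) returns 42; otherwise it
returns -1.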

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.
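
A minimal sketch of such a wrapper follows, assuming GCC or Clang on
x86-64 and a key previously returned by pkey_alloc() (on x86 a
successful allocation implies the OS has enabled the feature, so the
instructions will not fault).  Because WRPKRU rewrites the bits of all
16 keys at once, the wrapper reads PKRU, changes only the two bits of
the requested key and writes the result back.  The names my_pkey_set()
and my_pkey_get() are hypothetical, chosen to avoid clashing with the
C library, and 'rights' takes the PKEY_DISABLE_ACCESS (0x1) and
PKEY_DISABLE_WRITE (0x2) bits::

	#include <errno.h>
	#include <stdint.h>

	#if defined(__x86_64__)		/* assumes GCC/Clang inline asm */
	static inline uint32_t rdpkru_raw(void)
	{
		uint32_t eax, edx;

		/* RDPKRU: needs ECX == 0; returns PKRU in EAX and zeroes EDX. */
		__asm__ volatile(".byte 0x0f, 0x01, 0xee" : "=a" (eax), "=d" (edx) : "c" (0));
		(void)edx;
		return eax;
	}

	static inline void wrpkru_raw(uint32_t pkru)
	{
		/* WRPKRU: needs ECX == 0 and EDX == 0; loads PKRU from EAX. */
		__asm__ volatile(".byte 0x0f, 0x01, 0xef" : : "a" (pkru), "c" (0), "d" (0) : "memory");
	}

	/* Hypothetical wrapper: replace the AD/WD bits of 'pkey' in this
	 * thread's PKRU, leaving the other 15 keys untouched. */
	static int my_pkey_set(int pkey, unsigned int rights)
	{
		unsigned int shift;
		uint32_t pkru;

		if (pkey < 0 || pkey > 15 || rights > 0x3) {
			errno = EINVAL;
			return -1;
		}
		shift = 2u * (unsigned int)pkey;	/* AD = bit 2k, WD = bit 2k+1 */
		pkru = rdpkru_raw();
		pkru &= ~(UINT32_C(0x3) << shift);	/* forget the old rights */
		pkru |= (uint32_t)rights << shift;	/* install the new ones */
		wrpkru_raw(pkru);
		return 0;
	}

	/* Hypothetical companion: current rights of 'pkey', or -1 on bad input. */
	static int my_pkey_get(int pkey)
	{
		if (pkey < 0 || pkey > 15) {
			errno = EINVAL;
			return -1;
		}
		return (int)((rdpkru_raw() >> (2u * (unsigned int)pkey)) & 0x3);
	}
	#endif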

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance, if you do this::

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this::

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKUERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
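
The si_code difference can be observed from a SA_SIGINFO handler.  A
minimal sketch, covering direct accesses only and assuming Linux with
glibc 2.27 or later for the pkey_*() wrappers; fault_code() and
compare_codes() are hypothetical helpers, and SEGV_PKUERR is given its
Linux value (4) in case the C library headers lack it::

	#ifndef _GNU_SOURCE
	#define _GNU_SOURCE		/* glibc >= 2.27 pkey_*(), SA_SIGINFO */
	#endif
	#include <setjmp.h>
	#include <signal.h>
	#include <stddef.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#ifndef SEGV_PKUERR
	#define SEGV_PKUERR 4		/* Linux value: protection key check failed */
	#endif

	static sigjmp_buf fault_env;
	static volatile sig_atomic_t fault_si_code;

	static void segv_handler(int sig, siginfo_t *si, void *ctx)
	{
		(void)sig;
		(void)ctx;
		fault_si_code = si->si_code;	/* SEGV_ACCERR, SEGV_PKUERR, ... */
		siglongjmp(fault_env, 1);	/* back to sigsetjmp(), skipping the store */
	}

	/* Hypothetical helper: store to 'p' and return the si_code of the
	 * resulting SIGSEGV, or 0 if the store did not fault. */
	static int fault_code(volatile int *p)
	{
		struct sigaction sa, old;

		memset(&sa, 0, sizeof(sa));
		sa.sa_sigaction = segv_handler;
		sa.sa_flags = SA_SIGINFO;
		sigemptyset(&sa.sa_mask);
		sigaction(SIGSEGV, &sa, &old);

		fault_si_code = 0;
		if (sigsetjmp(fault_env, 1) == 0)
			*p = 1;			/* expected to fault */
		sigaction(SIGSEGV, &old, NULL);
		return fault_si_code;
	}

	/* Hypothetical helper: si_code for a write to a PROT_NONE page and to a
	 * page whose key denies access; -1 means that case could not be set up. */
	static void compare_codes(int *plain_code, int *pkey_code)
	{
		size_t sz = (size_t)sysconf(_SC_PAGESIZE);
		int *a = mmap(NULL, sz, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
		int *b = mmap(NULL, sz, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
		int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);

		*plain_code = (a == MAP_FAILED) ? -1 : fault_code(a);
		*pkey_code = -1;
		if (b != MAP_FAILED && pkey >= 0 &&
		    pkey_mprotect(b, sz, PROT_READ | PROT_WRITE, pkey) == 0)
			*pkey_code = fault_code(b);

		if (a != MAP_FAILED)
			munmap(a, sz);
		if (b != MAP_FAILED)
			munmap(b, sz);
		if (pkey >= 0)
			pkey_free(pkey);
	}

On a machine without protection keys, compare_codes() still reports
SEGV_ACCERR for the PROT_NONE page and leaves the protection-key result
at -1.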