.. SPDX-License-Identifier: GPL-2.0

==============
Nitro Enclaves
==============

Overview
========

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM
can be separated from other applications running in the same VM. This
application then runs in a VM separate from the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup suits the needs of
low-latency applications. The resources that are allocated for the enclave,
such as memory and CPUs, are carved out of the primary VM. Each enclave is
mapped to a process running in the primary VM that communicates with the NE
driver via an ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
   VM guest that uses the provided ioctl interface of the NE driver to spawn an
   enclave VM (that's 2 below).

   An NE emulated PCI device is exposed to the primary VM. The driver for this
   new PCI device is included in the NE driver.

   The ioctl logic is mapped to PCI device commands, e.g. the NE_START_ENCLAVE
   ioctl maps to an enclave start PCI command. The PCI device commands are then
   translated into actions taken on the hypervisor side, that is, by the Nitro
   hypervisor running on the host where the primary VM is running. The Nitro
   hypervisor is based on core KVM technology.

2. The enclave itself - a VM running on the same host as the primary VM that
   spawned it. Memory and CPUs are carved out of the primary VM and are
   dedicated to the enclave VM. An enclave does not have persistent storage
   attached.

The memory regions carved out of the primary VM and given to an enclave need to
be physically contiguous regions aligned to 2 MiB or 1 GiB (or a multiple of
these sizes, e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs
from user space [2][3]. The memory size for an enclave needs to be at least
64 MiB. The enclave memory and CPUs need to be from the same NUMA node.

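The size and alignment rules above can be illustrated with a small check. This
is only a sketch of the stated constraints, not the NE driver's validation
logic: the function names and the ``(phys_addr, size)`` tuples are hypothetical,
user space normally does not see physical addresses, and the NUMA node
requirement is not modelled.

```python
MIB = 1024 * 1024
MIN_ENCLAVE_MEMORY = 64 * MIB  # minimum total enclave memory stated above
ALIGNMENT = 2 * MIB            # 1 GiB regions are also multiples of 2 MiB


def region_is_valid(phys_addr, size):
    """Return True if the region starts on, and spans a multiple of, 2 MiB."""
    if size <= 0:
        return False
    return phys_addr % ALIGNMENT == 0 and size % ALIGNMENT == 0


def enclave_memory_is_valid(regions):
    """Check every (phys_addr, size) pair and the 64 MiB minimum total.

    `regions` is a hypothetical list of tuples used only for illustration.
    """
    total = sum(size for _addr, size in regions)
    if total < MIN_ENCLAVE_MEMORY:
        return False
    return all(region_is_valid(addr, size) for addr, size in regions)
```

For instance, two 32 MiB regions that each start on a 2 MiB boundary pass the
check, while a single 32 MiB region fails the 64 MiB minimum.
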
An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by a
user with admin capability. See the cpu list section from the kernel
documentation [4] for the format of a CPU pool.

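As an illustration of the CPU pool rule above, the sketch below parses a
simplified subset of the cpu list format (comma-separated CPU numbers and
``N-M`` ranges only) and rejects pools that contain CPU 0 or its siblings. The
function names are hypothetical, and the sibling set is passed in by the caller
as an assumption rather than read from the system topology.

```python
def parse_cpu_list(cpu_list):
    """Parse a simplified cpu list such as "1-3,5" into a set of CPU numbers."""
    cpus = set()
    for part in cpu_list.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(value) for value in part.split("-", 1))
            if first > last:
                raise ValueError("invalid CPU range: " + part)
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def validate_ne_cpu_pool(cpu_list, cpu0_siblings=(0,)):
    """Return the pool as a set, or raise if it takes CPU 0 or its siblings.

    `cpu0_siblings` is supplied by the caller; on a real system it would be
    derived from the CPU topology, which this sketch does not do.
    """
    pool = parse_cpu_list(cpu_list)
    reserved = pool & set(cpu0_siblings)
    if reserved:
        raise ValueError("CPUs reserved for the primary VM: %s" % sorted(reserved))
    return pool
```

With CPU 4 assumed to be the sibling of CPU 0, a pool of ``2,3,6-7`` is
accepted, while ``0-1`` or ``4-5`` is rejected.
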
An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [5]. The primary VM has a virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from the virtio-vsock device. The
virtio-mmio device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS (e.g. kernel, ramdisk, init) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol [6].

The kernel bzImage, the kernel command line and the ramdisk(s) are part of the
Enclave Image Format (EIF), together with an EIF header that includes metadata
such as the magic number, EIF version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). These hashes are used, for example, to check that the enclave image
that is loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of a service that NE is integrated with and that
checks the attestation document.

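The idea of measuring an image and comparing it with the intended one can be
sketched as follows. This is a minimal illustration, not the actual measurement
scheme: the text above does not name the hash algorithm or say how multiple
ramdisks are combined, so SHA-384 and simple concatenation are assumptions, and
all function names are hypothetical.

```python
import hashlib
import hmac


def measure(data):
    """Return a hex digest of `data` (SHA-384 is an assumption of this sketch)."""
    return hashlib.sha384(data).hexdigest()


def measure_image(eif, kernel, ramdisks):
    """Compute one hash each for the whole image, the kernel and the ramdisk(s).

    Concatenating the ramdisks before hashing is an assumption made here.
    """
    return {
        "image": measure(eif),
        "kernel": measure(kernel),
        "ramdisks": measure(b"".join(ramdisks)),
    }


def image_is_intended(eif, expected_hex):
    """Compare the image hash with the expected value in constant time."""
    return hmac.compare_digest(measure(eif), expected_hex)
```

A caller would record ``measure(eif)`` when the image is built and later pass
that value as ``expected_hex`` to confirm that the loaded image is unchanged.
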
The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM on a
predefined port, 9000, and sends a heartbeat value, 0xb7. The primary VM uses
this mechanism to check that the enclave has booted. The CID of the primary VM
is 3.

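The heartbeat exchange described above can be sketched with the ``AF_VSOCK``
support in Python's ``socket`` module on Linux. The port, heartbeat value and
CID come from the text; the function names, the one-byte framing and the
timeout are assumptions of this sketch, which is not the code used by the
enclave init process or by the NE tooling.

```python
import socket

HEARTBEAT_PORT = 9000   # predefined port from the text above
HEARTBEAT_VALUE = 0xB7  # heartbeat value from the text above
PRIMARY_VM_CID = 3      # CID of the primary VM from the text above


def send_heartbeat(conn):
    """Enclave side: send the heartbeat as a single byte (assumed framing)."""
    conn.sendall(bytes([HEARTBEAT_VALUE]))


def heartbeat_received(conn):
    """Primary VM side: read one byte and compare it with the heartbeat value."""
    return conn.recv(1) == bytes([HEARTBEAT_VALUE])


def enclave_init_heartbeat():
    """Enclave side: connect to the primary VM and send the heartbeat."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as conn:
        conn.connect((PRIMARY_VM_CID, HEARTBEAT_PORT))
        send_heartbeat(conn)


def wait_for_enclave_boot(timeout_s=30.0):
    """Primary VM side: listen on the heartbeat port for one connection."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as listener:
        listener.bind((socket.VMADDR_CID_ANY, HEARTBEAT_PORT))
        listener.listen(1)
        listener.settimeout(timeout_s)
        try:
            conn, _peer = listener.accept()
            with conn:
                conn.settimeout(timeout_s)
                return heartbeat_received(conn)
        except socket.timeout:
            # No enclave connected, or no byte arrived, within the timeout.
            return False
```

The two helpers that take an already connected socket work with any stream
socket, so the comparison logic can be tried out locally without a vsock device.
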
If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is forwarded to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

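The user space side of this poll notification can be sketched as below. The
text only says that a poll mechanism is used, so treating ``POLLHUP`` on the
enclave file descriptor as the exit signal is an assumption of this sketch, and
``wait_for_enclave_exit`` is a hypothetical helper. How the descriptor is
obtained from the NE driver is not shown.

```python
import select


def wait_for_enclave_exit(enclave_fd, timeout_ms):
    """Poll `enclave_fd` and return True if a hang-up event is reported.

    Returns False if the timeout expires without a hang-up. Using POLLHUP as
    the exit notification is an assumption made for illustration.
    """
    poller = select.poll()
    poller.register(enclave_fd, select.POLLIN | select.POLLHUP)
    for _fd, events in poller.poll(timeout_ms):
        if events & select.POLLHUP:
            return True
    return False
```

The enclave process would typically call such a helper in a loop and exit once
it returns True.
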
[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] https://www.kernel.org/doc/html/latest/admin-guide/mm/hugetlbpage.html
[3] https://lwn.net/Articles/807108/
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html