Microarchitectural Data Sampling (MDS) mitigation
=================================================

.. _mds:

Overview
--------

Microarchitectural Data Sampling (MDS) is a family of side channel attacks
on internal buffers in Intel CPUs. The variants are:

 - Microarchitectural Store Buffer Data Sampling (MSBDS) (CVE-2018-12126)
 - Microarchitectural Fill Buffer Data Sampling (MFBDS) (CVE-2018-12130)
 - Microarchitectural Load Port Data Sampling (MLPDS) (CVE-2018-12127)
 - Microarchitectural Data Sampling Uncacheable Memory (MDSUM) (CVE-2019-11091)

MSBDS leaks Store Buffer Entries which can be speculatively forwarded to a
dependent load (store-to-load forwarding) as an optimization. The forward
can also happen to a faulting or assisting load operation for a different
memory address, which can be exploited under certain conditions. Store
buffers are partitioned between Hyper-Threads so cross thread forwarding is
not possible. But if a thread enters or exits a sleep state the store
buffer is repartitioned, which can expose data from one thread to the
other.

MFBDS leaks Fill Buffer Entries. Fill buffers are used internally to manage
L1 miss situations and to hold data which is returned or sent in response
to a memory or I/O operation. Fill buffers can forward data to a load
operation and also write data to the cache.
When the fill buffer is
deallocated it can retain the stale data of the preceding operations which
can then be forwarded to a faulting or assisting load operation, which can
be exploited under certain conditions. Fill buffers are shared between
Hyper-Threads so cross thread leakage is possible.

MLPDS leaks Load Port Data. Load ports are used to perform load operations
from memory or I/O. The received data is then forwarded to the register
file or a subsequent operation. In some implementations the Load Port can
contain stale data from a previous operation which can be forwarded to
faulting or assisting loads under certain conditions, which again can be
exploited eventually. Load ports are shared between Hyper-Threads so cross
thread leakage is possible.

MDSUM is a special case of MSBDS, MFBDS and MLPDS. An uncacheable load from
memory that takes a fault or assist can leave data in a microarchitectural
structure that may later be observed using one of the same methods used by
MSBDS, MFBDS or MLPDS.

Exposure assumptions
--------------------

It is assumed that attack code resides in user space or in a guest with one
exception. The rationale behind this assumption is that the code construct
needed for exploiting MDS requires:

 - to control the load to trigger a fault or assist

 - to have a disclosure gadget which exposes the speculatively accessed
   data for consumption through a side channel.

 - to control the pointer through which the disclosure gadget exposes the
   data

The existence of such a construct in the kernel cannot be excluded with
100% certainty, but the complexity involved makes it extremely unlikely.

There is one exception, which is untrusted BPF. The functionality of
untrusted BPF is limited, but it needs to be thoroughly investigated
whether it can be used to create such a construct.


Mitigation strategy
-------------------

All variants have the same mitigation strategy, at least for the single CPU
thread case (SMT off): force the CPU to clear the affected buffers.

This is achieved by using the otherwise unused and obsolete VERW
instruction in combination with a microcode update. The microcode clears
the affected CPU buffers when the VERW instruction is executed.

For virtualization there are two ways to achieve CPU buffer clearing:
either the modified VERW instruction or the L1D Flush command. The latter
is issued when L1TF mitigation is enabled, so the extra VERW can be
avoided. If the CPU is not affected by L1TF then VERW needs to be issued.

If the VERW instruction with the supplied segment selector argument is
executed on a CPU without the microcode update there is no side effect
other than a small number of pointlessly wasted CPU cycles.

This does not protect against cross Hyper-Thread attacks except for MSBDS,
which is only exploitable cross Hyper-Thread when one of the Hyper-Threads
enters a C-state.

The kernel provides a function to invoke the buffer clearing:

  mds_clear_cpu_buffers()

The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
(idle) transitions.

As a special quirk to address virtualization scenarios where the host has
the microcode updated, but the hypervisor does not (yet) expose the
MD_CLEAR CPUID bit to guests, the kernel issues the VERW instruction in the
hope that it might actually clear the buffers. The state is reflected
accordingly.

According to current knowledge additional mitigations inside the kernel
itself are not required because the necessary gadgets to expose the leaked
data cannot be controlled in a way which allows exploitation from malicious
user space or VM guests.

Kernel internal mitigation modes
--------------------------------

 ======= ============================================================
 off     Mitigation is disabled. Either the CPU is not affected or
         mds=off is supplied on the kernel command line.

 full    Mitigation is enabled. CPU is affected and MD_CLEAR is
         advertised in CPUID.

 vmwerv  Mitigation is enabled.
         CPU is affected and MD_CLEAR is not
         advertised in CPUID. That is mainly for virtualization
         scenarios where the host has the updated microcode but the
         hypervisor does not expose MD_CLEAR in CPUID. It's a best
         effort approach without guarantee.
 ======= ============================================================

If the CPU is affected and mds=off is not supplied on the kernel command
line then the kernel selects the appropriate mitigation mode depending on
the availability of the MD_CLEAR CPUID bit.

Mitigation points
-----------------

1. Return to user space
^^^^^^^^^^^^^^^^^^^^^^^

   When transitioning from kernel to user space the CPU buffers are flushed
   on affected CPUs when the mitigation is not disabled on the kernel
   command line. The mitigation is enabled through the static key
   mds_user_clear.

   The mitigation is invoked in prepare_exit_to_usermode() which covers
   all but one of the kernel to user space transitions. The exception
   is the return from a Non Maskable Interrupt (NMI), which is
   handled directly in do_nmi().

   (The reason that NMI is special is that prepare_exit_to_usermode() can
   enable IRQs. In NMI context, NMIs are blocked, and we don't want to
   enable IRQs with NMIs blocked.)


2. C-State transition
^^^^^^^^^^^^^^^^^^^^^

   When a CPU goes idle and enters a C-State the CPU buffers need to be
   cleared on affected CPUs when SMT is active. This addresses the
   repartitioning of the store buffer when one of the Hyper-Threads enters
   a C-State.

   When SMT is inactive, i.e. either the CPU does not support it or all
   sibling threads are offline, CPU buffer clearing is not required.

   The idle clearing is enabled on CPUs which are only affected by MSBDS
   and not by any other MDS variant. The other MDS variants cannot be
   protected against cross Hyper-Thread attacks because the Fill Buffers
   and the Load Ports are shared. So on CPUs affected by other variants the
   idle clearing would be a window dressing exercise and is therefore not
   activated.

   The invocation is controlled by the static key mds_idle_clear which is
   switched depending on the chosen mitigation mode and the SMT state of
   the system.

   The buffer clear is only invoked before entering the C-State to prevent
   stale data from the idling CPU from spilling to the Hyper-Thread sibling
   after the store buffer is repartitioned and all entries become available
   to the non-idle sibling.

   When coming out of idle the store buffer is partitioned again so each
   sibling has half of it available. The CPU returning from idle could then
   be speculatively exposed to contents left by the sibling.
   The buffers are flushed either on exit to user space or on VMENTER, so
   malicious code in user space or the guest cannot speculatively access
   them.

   The mitigation is hooked into all variants of halt()/mwait(), but does
   not cover the legacy ACPI IO-Port mechanism because the ACPI idle driver
   was superseded by the intel_idle driver around 2010 and is preferred on
   all affected CPUs which are expected to gain the MD_CLEAR functionality
   in microcode. Aside from that, the IO-Port mechanism is a legacy
   interface which is only used on older systems which are either not
   affected or do not receive microcode updates anymore.