MDS - Microarchitectural Data Sampling
======================================

Microarchitectural Data Sampling is a hardware vulnerability which allows
unprivileged speculative access to data which is available in various CPU
internal buffers.

Affected processors
-------------------

This vulnerability affects a wide range of Intel processors. The
vulnerability is not present on:

   - Processors from AMD, Centaur and other non-Intel vendors

   - Older processor models, where the CPU family is < 6

   - Some Atoms (Bonnell, Saltwell, Goldmont, GoldmontPlus)

   - Intel processors which have the ARCH_CAP_MDS_NO bit set in the
     IA32_ARCH_CAPABILITIES MSR.
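
The last check in the list can be sketched in Python. This is a minimal
illustration, not the kernel's implementation: the MSR index ``0x10a`` and
the bit position 5 are assumptions taken from the kernel's ``msr-index.h``
and should be verified against the sources in use, and ``read_msr()`` is a
hypothetical helper which assumes the ``msr`` module is loaded and the
caller has root privileges.

.. code-block:: python

   import os
   import struct

   # Assumed values (from the kernel's msr-index.h); verify before relying on them.
   MSR_IA32_ARCH_CAPABILITIES = 0x10A
   ARCH_CAP_MDS_NO = 1 << 5


   def mds_no_is_set(arch_capabilities: int) -> bool:
       """Return True if a raw IA32_ARCH_CAPABILITIES value advertises MDS_NO."""
       return bool(arch_capabilities & ARCH_CAP_MDS_NO)


   def read_msr(msr: int, cpu: int = 0) -> int:
       """Hypothetical helper: read a 64-bit MSR via /dev/cpu/<cpu>/msr."""
       fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
       try:
           # The msr device uses the file offset as the MSR index.
           return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
       finally:
           os.close(fd)

On a suitable system, ``mds_no_is_set(read_msr(MSR_IA32_ARCH_CAPABILITIES))``
would report whether the bit is set. Reading the sysfs file described in
:ref:`mds_sys_info` is the simpler and preferred way to query the state.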

Whether a processor is affected or not can be read out from the MDS
vulnerability file in sysfs. See :ref:`mds_sys_info`.

Not all processors are affected by all variants of MDS, but the mitigation
is identical for all of them so the kernel treats them as a single
vulnerability.

Related CVEs
------------

The following CVE entries are related to the MDS vulnerability:

   ==============  =====  ===================================================
   CVE-2018-12126  MSBDS  Microarchitectural Store Buffer Data Sampling
   CVE-2018-12130  MFBDS  Microarchitectural Fill Buffer Data Sampling
   CVE-2018-12127  MLPDS  Microarchitectural Load Port Data Sampling
   CVE-2019-11091  MDSUM  Microarchitectural Data Sampling Uncacheable Memory
   ==============  =====  ===================================================

Problem
-------

When performing store, load, or L1 refill operations, processors write data
into temporary microarchitectural structures (buffers). The data in the
buffer can be forwarded to load operations as an optimization.

Under certain conditions, usually a fault/assist caused by a load
operation, data unrelated to the load memory address can be speculatively
forwarded from the buffers. Because the load operation causes a fault or
assist and its result will be discarded, the forwarded data will not cause
incorrect program execution or state changes. But a malicious operation
may be able to forward this speculative data to a disclosure gadget, which
in turn allows the value to be inferred via a cache side channel attack.

Because the buffers are potentially shared between Hyper-Threads, cross
Hyper-Thread attacks are possible.

Deeper technical information is available in the MDS specific x86
architecture section: :ref:`Documentation/x86/mds.rst <mds>`.


Attack scenarios
----------------

Attacks against the MDS vulnerabilities can be mounted from malicious
unprivileged user space applications running on hosts or guests. Malicious
guest OSes can obviously mount attacks as well.

Contrary to other speculation based vulnerabilities, the MDS vulnerability
does not allow the attacker to control the memory target address. As a
consequence the attacks are purely sampling based, but as demonstrated with
the TLBleed attack, samples can be postprocessed successfully.

Web-Browsers
^^^^^^^^^^^^

  It's unclear whether attacks through Web-Browsers are possible at
  all. The exploitation through JavaScript is considered very unlikely,
  but other widely used web technologies like WebAssembly could possibly be
  abused.


.. _mds_sys_info:

MDS system information
----------------------

The Linux kernel provides a sysfs interface to enumerate the current MDS
status of the system: whether the system is vulnerable, and which
mitigations are active. The relevant sysfs file is:

/sys/devices/system/cpu/vulnerabilities/mds

The possible values in this file are:

  .. list-table::

     * - 'Not affected'
       - The processor is not vulnerable
     * - 'Vulnerable'
       - The processor is vulnerable, but no mitigation is enabled
     * - 'Vulnerable: Clear CPU buffers attempted, no microcode'
       - The processor is vulnerable but microcode is not updated.

         The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
     * - 'Mitigation: Clear CPU buffers'
       - The processor is vulnerable and the CPU buffer clearing mitigation is
         enabled.

If the processor is vulnerable then the following information is appended
to the above information:

    ========================  ============================================
    'SMT vulnerable'          SMT is enabled
    'SMT mitigated'           SMT is enabled and mitigated
    'SMT disabled'            SMT is disabled
    'SMT Host state unknown'  Kernel runs in a VM, Host SMT state unknown
    ========================  ============================================
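
As an illustration, the file can be read and split into its two parts with
a few lines of Python. This is a minimal sketch which assumes that the SMT
information is appended after a semicolon (for example
``Mitigation: Clear CPU buffers; SMT vulnerable``); the function names are
hypothetical and the separator should be checked against the running
kernel.

.. code-block:: python

   from pathlib import Path

   MDS_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/mds")


   def parse_mds_status(text):
       """Split a status line into (state, smt_state).

       smt_state is None when no SMT information is appended, which is
       expected for 'Not affected'. The ';' separator is an assumption.
       """
       state, _, smt = text.strip().partition(";")
       return state.strip(), (smt.strip() or None)


   def read_mds_status(path=MDS_SYSFS):
       """Read and parse the sysfs file of the running system."""
       return parse_mds_status(path.read_text())

For instance, ``parse_mds_status("Not affected\n")`` yields
``("Not affected", None)``.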

.. _vmwerv:

Best effort mitigation mode
^^^^^^^^^^^^^^^^^^^^^^^^^^^

  If the processor is vulnerable, but the availability of the microcode based
  mitigation mechanism is not advertised via CPUID, the kernel selects a best
  effort mitigation mode.  This mode invokes the mitigation instructions
  without a guarantee that they clear the CPU buffers.

  This is done to address virtualization scenarios where the host has the
  microcode update applied, but the hypervisor is not yet updated to expose
  the CPUID to the guest. If the host has updated microcode the protection
  takes effect, otherwise a few CPU cycles are wasted pointlessly.

  The state in the mds sysfs file reflects this situation accordingly.


Mitigation mechanism
--------------------

The kernel detects the affected CPUs and the presence of the microcode
which is required.

If a CPU is affected and the microcode is available, then the kernel
enables the mitigation by default. The mitigation can be controlled at boot
time via a kernel command line option. See
:ref:`mds_mitigation_control_command_line`.

.. _cpu_buffer_clear:

CPU buffer clearing
^^^^^^^^^^^^^^^^^^^

  The mitigation for MDS clears the affected CPU buffers on return to user
  space and when entering a guest.

  If SMT is enabled it also clears the buffers on idle entry when the CPU
  is only affected by MSBDS and not any other MDS variant, because the
  other variants cannot be protected against cross Hyper-Thread attacks.

  For CPUs which are only affected by MSBDS the user space, guest and idle
  transition mitigations are sufficient and SMT is not affected.
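
  The rules in this section can be restated as a small Python sketch. It
  only models the description given here and is not the kernel's code; the
  function name, parameters and transition labels are hypothetical.

  .. code-block:: python

     def buffer_clear_points(mitigation_enabled, smt_enabled, msbds_only):
         """Return the transitions on which CPU buffers are cleared."""
         points = set()
         if not mitigation_enabled:
             return points
         # Always cleared when the mitigation is active.
         points.update({"return to user space", "guest entry"})
         # Idle entry is only cleared with SMT on an MSBDS-only CPU; the
         # other variants cannot be protected across Hyper-Threads this way.
         if smt_enabled and msbds_only:
             points.add("idle entry")
         return points

  For example, an MSBDS-only CPU with SMT enabled gets all three clearing
  points, while a CPU affected by further variants gets only the first two.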

.. _virt_mechanism:

Virtualization mitigation
^^^^^^^^^^^^^^^^^^^^^^^^^

  The protection for host to guest transition depends on the L1TF
  vulnerability of the CPU:

  - CPU is affected by L1TF:

    If the L1D flush mitigation is enabled and up to date microcode is
    available, the L1D flush mitigation is automatically protecting the
    guest transition.

    If the L1D flush mitigation is disabled then the MDS mitigation is
    invoked explicitly when the host MDS mitigation is enabled.

    For details on L1TF and virtualization see:
    :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <mitigation_control_kvm>`.

  - CPU is not affected by L1TF:

    CPU buffers are flushed before entering the guest when the host MDS
    mitigation is enabled.

  The resulting MDS protection matrix for the host to guest transition:

  ============ ===== ============= ============ =================
   L1TF         MDS   VMX-L1FLUSH   Host MDS     MDS-State

   Don't care   No    Don't care    N/A          Not affected

   Yes          Yes   Disabled      Off          Vulnerable

   Yes          Yes   Disabled      Full         Mitigated

   Yes          Yes   Enabled       Don't care   Mitigated

   No           Yes   N/A           Off          Vulnerable

   No           Yes   N/A           Full         Mitigated
  ============ ===== ============= ============ =================

  This only covers the host to guest transition, i.e. prevents leakage from
  host to guest, but does not protect the guest internally. Guests need to
  have their own protections.
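
  The protection matrix can also be expressed as a short decision function.
  The following Python sketch is only a restatement of the table; the
  function and parameter names are hypothetical, and the "Enabled" L1D
  flush case assumes up to date microcode as described above.

  .. code-block:: python

     def guest_transition_state(l1tf, mds, vmx_l1flush, host_mds):
         """Return the MDS state for the host to guest transition.

         l1tf, mds:    True if the CPU is affected by that vulnerability
         vmx_l1flush:  True if the L1D flush mitigation is enabled
         host_mds:     True if the host MDS mitigation is enabled ("Full")
         """
         if not mds:
             return "Not affected"
         # On L1TF affected CPUs an enabled L1D flush covers the transition.
         if l1tf and vmx_l1flush:
             return "Mitigated"
         # Otherwise the explicit host MDS mitigation decides.
         return "Mitigated" if host_mds else "Vulnerable"

  Each row of the table maps to one call, for example
  ``guest_transition_state(True, True, False, False)`` for the second row.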

.. _xeon_phi:

XEON PHI specific considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  The XEON PHI processor family is affected by MSBDS which can be exploited
  cross Hyper-Threads when entering idle states. Some XEON PHI variants allow
  MWAIT to be used in user space (Ring 3), which opens a potential attack
  vector for malicious user space. The exposure can be disabled on the kernel
  command line with the 'ring3mwait=disable' command line option.

  XEON PHI is not affected by the other MDS variants and MSBDS is mitigated
  before the CPU enters an idle state. As XEON PHI is not affected by L1TF
  either, disabling SMT is not required for full protection.

.. _mds_smt_control:

SMT control
^^^^^^^^^^^

  All MDS variants except MSBDS can be attacked cross Hyper-Threads. That
  means on CPUs which are affected by MFBDS or MLPDS it is necessary to
  disable SMT for full protection. These are most of the affected CPUs; the
  exception is XEON PHI, see :ref:`xeon_phi`.

  Disabling SMT can have a significant performance impact, but the impact
  depends on the type of workloads.

  See the relevant chapter in the L1TF mitigation documentation for details:
  :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
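
  As a rough illustration of the rule stated in this section, the following
  Python sketch decides whether SMT has to be disabled for full protection,
  given the set of MDS variants a CPU is affected by. The function name is
  hypothetical and the variant names are the abbreviations from the CVE
  table.

  .. code-block:: python

     def smt_disable_needed(variants):
         """Return True if full protection requires disabling SMT.

         Every variant other than MSBDS can be attacked across
         Hyper-Threads, so any such variant makes disabling SMT necessary.
         """
         return any(variant != "MSBDS" for variant in variants)

  An MSBDS-only CPU such as XEON PHI yields ``False``; a CPU affected by
  MSBDS and MFBDS yields ``True``.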


.. _mds_mitigation_control_command_line:

Mitigation control on the kernel command line
---------------------------------------------

The kernel command line allows controlling the MDS mitigations at boot
time with the option "mds=". The valid arguments for this option are:

  ============  =============================================================
  full          If the CPU is vulnerable, enable all available mitigations
                for the MDS vulnerability, CPU buffer clearing on exit to
                userspace and when entering a VM. Idle transitions are
                protected as well if SMT is enabled.

                It does not automatically disable SMT.

  full,nosmt    The same as mds=full, with SMT disabled on vulnerable
                CPUs.  This is the complete mitigation.

  off           Disables MDS mitigations completely.

  ============  =============================================================

Not specifying this option is equivalent to "mds=full". For processors
that are affected by both TAA (TSX Asynchronous Abort) and MDS,
specifying just "mds=off" without an accompanying "tsx_async_abort=off"
will have no effect as the same mitigation is used for both
vulnerabilities.
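
The option semantics can be sketched in Python as follows. This is a
minimal illustration, not the kernel's parser: the function names are
hypothetical, raising an error for an unknown argument is a choice made
for this sketch only, and the TAA helper merely restates the paragraph
above.

.. code-block:: python

   # Maps each documented argument to (buffer clearing enabled, SMT disabled).
   MDS_OPTIONS = {
       "full": (True, False),
       "full,nosmt": (True, True),
       "off": (False, False),
   }


   def parse_mds_option(cmdline):
       """Return (clearing_enabled, smt_disabled) for a kernel command line."""
       value = "full"  # not specifying mds= is equivalent to mds=full
       for token in cmdline.split():
           if token.startswith("mds="):
               value = token[len("mds="):]
       if value not in MDS_OPTIONS:
           raise ValueError(f"unknown mds= argument: {value!r}")
       return MDS_OPTIONS[value]


   def buffer_clearing_active(mds_enabled, taa_enabled, affected_by_taa):
       """Buffer clearing stays active if either mitigation requests it."""
       return mds_enabled or (affected_by_taa and taa_enabled)

The command line of the running system can be obtained from
``/proc/cmdline`` and passed to ``parse_mds_option()``.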

Mitigation selection guide
--------------------------

1. Trusted userspace
^^^^^^^^^^^^^^^^^^^^

   If all userspace applications are from a trusted source and do not
   execute untrusted code which is supplied externally, then the mitigation
   can be disabled.


2. Virtualization with trusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The same considerations as above versus trusted user space apply.

3. Virtualization with untrusted guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   The protection depends on the state of the L1TF mitigations.
   See :ref:`virt_mechanism`.

   If the MDS mitigation is enabled and SMT is disabled, guest to host and
   guest to guest attacks are prevented.

.. _mds_default_mitigations:

Default mitigations
-------------------

  The kernel default mitigations for vulnerable processors are:

  - Enable CPU buffer clearing

  The kernel does not by default enforce the disabling of SMT, which leaves
  SMT systems vulnerable when running untrusted code. The same rationale as
  for L1TF applies.
  See :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <default_mitigations>`.