.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``intel_pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2017 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


General Information
===================

``intel_pstate`` is a part of the
:doc:`CPU performance scaling subsystem <cpufreq>` in the Linux kernel
(``CPUFreq``).  It is a scaling driver for the Sandy Bridge and later
generations of Intel processors.  Note, however, that some of those processors
may not be supported.  [To understand ``intel_pstate`` it is necessary to know
how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if
you have not done that yet.]

For the processors supported by ``intel_pstate``, the P-state concept is broader
than just an operating frequency or an operating performance point (see the
LinuxCon Europe 2015 presentation by Kristen Accardi [1]_ for more
information about that).  For this reason, the representation of P-states used
by ``intel_pstate`` internally follows the hardware specification (for details
refer to Intel Software Developer’s Manual [2]_).  However, the ``CPUFreq`` core
uses frequencies for identifying operating performance points of CPUs and
frequencies are involved in the user space interface exposed by it, so
``intel_pstate`` maps its internal representation of P-states to frequencies too
(fortunately, that mapping is unambiguous).  At the same time, it would not be
practical for ``intel_pstate`` to supply the ``CPUFreq`` core with a table of
available frequencies due to the possible size of it, so the driver does not do
that.  Some functionality of the core is limited by that.

Since the hardware P-state selection interface used by ``intel_pstate`` is
available at the logical CPU level, the driver always works with individual
CPUs.  Consequently, if ``intel_pstate`` is in use, every ``CPUFreq`` policy
object corresponds to one logical CPU and ``CPUFreq`` policies are effectively
equivalent to CPUs.  In particular, this means that they become "inactive" every
time the corresponding CPU is taken offline and need to be re-initialized when
it goes back online.

``intel_pstate`` is not modular, so it cannot be unloaded, which means that the
only way to pass early-configuration-time parameters to it is via the kernel
command line.  However, its configuration can be adjusted via ``sysfs`` to a
great extent.  In some configurations it is even possible to unregister it via
``sysfs``, which allows another ``CPUFreq`` scaling driver to be loaded and
registered (see `below <status_attr_>`_).


Operation Modes
===============

``intel_pstate`` can operate in two different modes, active or passive.  In the
active mode, it uses its own internal performance scaling governor algorithm or
allows the hardware to do performance scaling by itself, while in the passive
mode it responds to requests made by a generic ``CPUFreq`` governor implementing
a certain performance scaling algorithm.  Which of them will be in effect
depends on what kernel command line options are used and on the capabilities of
the processor.

Active Mode
-----------

This is the default operation mode of ``intel_pstate`` for processors with
hardware-managed P-states (HWP) support.  If it works in this mode, the
``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
contains the string "intel_pstate".

In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection.  Those algorithms
can be applied to ``CPUFreq`` policies in the same way as generic scaling
governors (that is, through the ``scaling_governor`` policy attribute in
``sysfs``).  [Note that different P-state selection algorithms may be chosen for
different policies, but that is not recommended.]

They are not generic scaling governors, but their names are the same as the
names of some of those governors.  Moreover, confusingly enough, they generally
do not work in the same way as the generic governors they share the names with.
For example, the ``powersave`` P-state selection algorithm provided by
``intel_pstate`` is not a counterpart of the generic ``powersave`` governor
(roughly, it corresponds to the ``schedutil`` and ``ondemand`` governors).

There are two P-state selection algorithms provided by ``intel_pstate`` in the
active mode: ``powersave`` and ``performance``.  The way they both operate
depends on whether or not the hardware-managed P-states (HWP) feature has been
enabled in the processor and possibly on the processor model.

Which of the P-state selection algorithms is used by default depends on the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option.
Namely, if that option is set, the ``performance`` algorithm will be used by
default, and the other one will be used by default if it is not set.

Active Mode With HWP
~~~~~~~~~~~~~~~~~~~~

If the processor supports the HWP feature, it will be enabled during the
processor initialization and cannot be disabled after that.  It is possible
to avoid enabling it by passing the ``intel_pstate=no_hwp`` argument to the
kernel in the command line.

If the HWP feature has been enabled, ``intel_pstate`` relies on the processor to
select P-states by itself, but still it can give hints to the processor's
internal P-state selection logic.  What those hints are depends on which P-state
selection algorithm has been applied to the given policy (or to the CPU it
corresponds to).

Even though the P-state selection is carried out by the processor automatically,
``intel_pstate`` registers utilization update callbacks with the CPU scheduler
in this mode.  However, they are not used for running a P-state selection
algorithm, but for periodic updates of the current CPU frequency information to
be made available from the ``scaling_cur_freq`` policy attribute in ``sysfs``.

HWP + ``performance``
.....................

In this configuration ``intel_pstate`` will write 0 to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.

This will override the EPP/EPB setting coming from the ``sysfs`` interface
(see `Energy vs Performance Hints`_ below).  Moreover, any attempts to change
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
configuration will be rejected.

Also, in this configuration the range of P-states available to the processor's
internal P-state selection logic is always restricted to the upper boundary
(that is, the maximum P-state that the driver is allowed to use).

HWP + ``powersave``
...................

In this configuration ``intel_pstate`` will set the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise) to whatever value it was
previously set to via ``sysfs`` (or whatever default value it was
set to by the platform firmware).  This usually causes the processor's
internal P-state selection logic to be less performance-focused.

Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~

This operation mode is optional for processors that do not support the HWP
feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
the command line.  The active mode is used in those cases if the
``intel_pstate=active`` argument is passed to the kernel in the command line.
In this mode ``intel_pstate`` may refuse to work with processors that are not
recognized by it.  [Note that ``intel_pstate`` will never refuse to work with
any processor with the HWP feature enabled.]

In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
``powersave`` or ``performance``, depending on the ``scaling_governor`` policy
setting in ``sysfs``.  The current CPU frequency information to be made
available from the ``scaling_cur_freq`` policy attribute in ``sysfs`` is
periodically updated by those utilization update callbacks too.

``performance``
...............

Without HWP, this P-state selection algorithm is always the same regardless of
the processor model and platform configuration.

It selects the maximum P-state it is allowed to use, subject to limits set via
``sysfs``, every time the driver configuration for the given CPU is updated
(e.g. via ``sysfs``).

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is set.

``powersave``
.............

Without HWP, this P-state selection algorithm is similar to the algorithm
implemented by the generic ``schedutil`` scaling governor except that the
utilization metric used by it is based on numbers coming from feedback
registers of the CPU.  It generally selects P-states proportional to the
current CPU utilization.

This algorithm is run by the driver's utilization update callback for the
given CPU when it is invoked by the CPU scheduler, but not more often than
every 10 ms.  Like in the ``performance`` case, the hardware configuration
is not touched if the new P-state turns out to be the same as the current
one.

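
The two properties described above (selection roughly proportional to
utilization, and a run interval of at least 10 ms) can be illustrated with a
small Python model.  This is only a sketch and not the driver's actual formula:
the function name, its arguments and the "scale the maximum P-state by the
utilization" rule are assumptions made for illustration.

.. code-block:: python

    MIN_INTERVAL_MS = 10  # the algorithm runs at most once per 10 ms

    def powersave_step(now_ms, last_run_ms, util, min_pstate, max_pstate, current):
        """Return (pstate, last_run_ms) after one utilization update callback.

        `util` is the CPU utilization as a fraction between 0.0 and 1.0.
        """
        if now_ms - last_run_ms < MIN_INTERVAL_MS:
            # Invoked too soon after the previous run: keep the current P-state.
            return current, last_run_ms
        # Assumed rule: scale the maximum P-state by the utilization and
        # clamp the result to the allowed range.
        target = round(max_pstate * util)
        target = max(min_pstate, min(max_pstate, target))
        return target, now_ms

A caller would keep ``last_run_ms`` between invocations and write to the
hardware only when the returned P-state differs from the current one.
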
This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is not set.

Passive Mode
------------

This is the default operation mode of ``intel_pstate`` for processors without
hardware-managed P-states (HWP) support.  It is always used if the
``intel_pstate=passive`` argument is passed to the kernel in the command line
regardless of whether or not the given processor supports HWP.  [Note that the
``intel_pstate=no_hwp`` setting causes the driver to start in the passive mode
if it is not combined with ``intel_pstate=active``.]  Like in the active mode
without HWP support, in this mode ``intel_pstate`` may refuse to work with
processors that are not recognized by it if HWP is prevented from being enabled
through the kernel command line.

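
The rules for choosing the initial operation mode can be summarized in a short
Python sketch.  The function name ``startup_mode`` and its arguments are
hypothetical and only restate what this document says; the sketch does not
model the cases in which the driver refuses to work with an unrecognized
processor.

.. code-block:: python

    def startup_mode(hwp_supported, options=()):
        """Return "active" or "passive" for a set of intel_pstate= options.

        `options` holds the values passed as intel_pstate=<value> on the
        kernel command line, for example {"no_hwp", "active"}.
        """
        opts = set(options)
        if "passive" in opts:
            # intel_pstate=passive always selects the passive mode.
            return "passive"
        if "active" in opts:
            # intel_pstate=active selects the active mode, with or without HWP.
            return "active"
        if hwp_supported and "no_hwp" not in opts:
            # Default for processors with HWP support.
            return "active"
        # No HWP support, or HWP prevented from being enabled via no_hwp.
        return "passive"

For example, ``startup_mode(True, {"no_hwp"})`` yields ``"passive"``, while
adding ``"active"`` to the same set yields ``"active"``.
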
If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
Then, the driver behaves like a regular ``CPUFreq`` scaling driver.  That is,
it is invoked by generic scaling governors when necessary to talk to the
hardware in order to change the P-state of a CPU (in particular, the
``schedutil`` governor can invoke it directly from scheduler context).

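
Because the ``scaling_driver`` string differs between the two modes, user space
can use it to tell them apart.  The following Python sketch assumes that the
caller passes a policy directory (for instance
``/sys/devices/system/cpu/cpufreq/policy0``, which is an assumption about the
local system); the helper name is hypothetical.

.. code-block:: python

    from pathlib import Path

    # Strings documented for the active and passive modes, respectively.
    DRIVER_TO_MODE = {"intel_pstate": "active", "intel_cpufreq": "passive"}

    def mode_from_scaling_driver(policy_dir):
        """Map the scaling_driver string of one policy to an operation mode."""
        name = (Path(policy_dir) / "scaling_driver").read_text().strip()
        # Any other string means that a different scaling driver is in use.
        return DRIVER_TO_MODE.get(name)

The function returns ``None`` when neither string is found, which is the case
if another scaling driver has been registered.
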
While in this mode, ``intel_pstate`` can be used with all of the (generic)
scaling governors listed by the ``scaling_available_governors`` policy attribute
in ``sysfs`` (and the P-state selection algorithms described above are not
used).  Then, it is responsible for the configuration of policy objects
corresponding to CPUs and provides the ``CPUFreq`` core (and the scaling
governors attached to the policy objects) with accurate information on the
maximum and minimum operating frequencies supported by the hardware (including
the so-called "turbo" frequency ranges).  In other words, in the passive mode
the entire range of available P-states is exposed by ``intel_pstate`` to the
``CPUFreq`` core.  However, in this mode the driver does not register
utilization update callbacks with the CPU scheduler and the ``scaling_cur_freq``
information comes from the ``CPUFreq`` core (and is the last frequency selected
by the current scaling governor for the given policy).


.. _turbo:

Turbo P-states Support
======================

In the majority of cases, the entire range of P-states available to
``intel_pstate`` can be divided into two sub-ranges that correspond to
different types of processor behavior, above and below a boundary that
will be referred to as the "turbo threshold" in what follows.

The P-states above the turbo threshold are referred to as "turbo P-states" and
the whole sub-range of P-states they belong to is referred to as the "turbo
range".  These names are related to the Turbo Boost technology allowing a
multicore processor to opportunistically increase the P-state of one or more
cores if there is enough power to do that and if that is not going to cause the
thermal envelope of the processor package to be exceeded.

Specifically, if software sets the P-state of a CPU core within the turbo range
(that is, above the turbo threshold), the processor is permitted to take over
performance scaling control for that core and put it into turbo P-states of its
choice going forward.  However, that permission is interpreted differently by
different processor generations.  Namely, the Sandy Bridge generation of
processors will never use any P-states above the last one set by software for
the given core, even if it is within the turbo range, whereas all of the later
processor generations will take it as a license to use any P-states from the
turbo range, even above the one set by software.  In other words, on those
processors setting any P-state from the turbo range will enable the processor
to put the given core into all turbo P-states up to and including the maximum
supported one as it sees fit.

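
The difference between Sandy Bridge and later generations can be expressed as
a small Python model.  The function name and the plain integer representation
of P-states are assumptions made for illustration; the sketch only covers
requests that fall within the turbo range.

.. code-block:: python

    def turbo_ceiling(requested, turbo_threshold, max_turbo, sandy_bridge):
        """Highest P-state the processor may choose after a turbo request."""
        if not turbo_threshold < requested <= max_turbo:
            raise ValueError("requested P-state is not in the turbo range")
        if sandy_bridge:
            # Sandy Bridge never goes above the last P-state set by software.
            return requested
        # Later generations may use any turbo P-state up to the maximum.
        return max_turbo

With a turbo threshold of 28 and a maximum turbo P-state of 36, a request for
P-state 30 gives a ceiling of 30 on Sandy Bridge and 36 on later processors.
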
One important property of turbo P-states is that they are not sustainable.  More
precisely, there is no guarantee that any CPUs will be able to stay in any of
those states indefinitely, because the power distribution within the processor
package may change over time or the thermal envelope it was designed for might
be exceeded if a turbo P-state was used for too long.

In turn, the P-states below the turbo threshold generally are sustainable.  In
fact, if one of them is set by software, the processor is not expected to change
it to a lower one unless in a thermal stress or a power limit violation
situation (a higher P-state may still be used if it is set for another CPU in
the same package at the same time, for example).

Some processors allow multiple cores to be in turbo P-states at the same time,
but the maximum P-state that can be set for them generally depends on the number
of cores running concurrently.  The maximum turbo P-state that can be set for 3
cores at the same time usually is lower than the analogous maximum P-state for
2 cores, which in turn usually is lower than the maximum turbo P-state that can
be set for 1 core.  The one-core maximum turbo P-state is thus the maximum
supported one overall.

The maximum supported turbo P-state, the turbo threshold (the maximum supported
non-turbo P-state) and the minimum supported P-state are specific to the
processor model and can be determined by reading the processor's model-specific
registers (MSRs).  Moreover, some processors support the Configurable TDP
(Thermal Design Power) feature and, when that feature is enabled, the turbo
threshold effectively becomes a configurable value that can be set by the
platform firmware.

Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
the entire range of available P-states, including the whole turbo range, to the
``CPUFreq`` core and (in the passive mode) to generic scaling governors.  This
generally causes turbo P-states to be set more often when ``intel_pstate`` is
used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
for more information).

Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
(even if the Configurable TDP feature is enabled in the processor), its
``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
work as expected in all cases (that is, if set to disable turbo P-states, it
always should prevent ``intel_pstate`` from using them).


Processor Support
=================

To handle a given processor ``intel_pstate`` requires a number of different
pieces of information on it to be known, including:

 * The minimum supported P-state.

 * The maximum supported `non-turbo P-state <turbo_>`_.

 * Whether or not turbo P-states are supported at all.

 * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
   are supported).

 * The scaling formula to translate the driver's internal representation
   of P-states into frequencies and the other way around.

Generally, ways to obtain that information are specific to the processor model
or family.  Although it often is possible to obtain all of it from the processor
itself (using model-specific registers), there are cases in which hardware
manuals need to be consulted to get to it too.

For this reason, there is a list of supported processors in ``intel_pstate`` and
the driver initialization will fail if the detected processor is not in that
list, unless it supports the HWP feature.  [The interface to obtain all of the
information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]


User Space Interface in ``sysfs``
=================================

Global Attributes
-----------------

``intel_pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level.  They are located in the
``/sys/devices/system/cpu/intel_pstate/`` directory and affect all CPUs.

Some of them are not present if the ``intel_pstate=per_cpu_perf_limits``
argument is passed to the kernel in the command line.

``max_perf_pct``
	Maximum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
	command line.

``min_perf_pct``
	Minimum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
	command line.

``num_pstates``
	Number of P-states supported by the processor (between 0 and 255
	inclusive) including both turbo and non-turbo P-states (see
	`Turbo P-states Support`_).

	The value of this attribute is not affected by the ``no_turbo``
	setting described `below <no_turbo_attr_>`_.

	This attribute is read-only.

``turbo_pct``
	Ratio of the `turbo range <turbo_>`_ size to the size of the entire
	range of supported P-states, in percent.

	This attribute is read-only.

.. _no_turbo_attr:

``no_turbo``
	If set (equal to 1), the driver is not allowed to set any turbo P-states
	(see `Turbo P-states Support`_).  If unset (equal to 0, which is the
	default), turbo P-states can be set by the driver.
	[Note that ``intel_pstate`` does not support the general ``boost``
	attribute (supported by some other scaling drivers) which is replaced
	by this one.]

	This attribute does not affect the maximum supported frequency value
	supplied to the ``CPUFreq`` core and exposed via the policy interface,
	but it affects the maximum possible value of per-policy P-state limits
	(see `Interpretation of Policy Attributes`_ below for details).

``hwp_dynamic_boost``
	This attribute is only present if ``intel_pstate`` works in the
	`active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
	the processor.  If set (equal to 1), it causes the minimum P-state limit
	to be increased dynamically for a short time whenever a task previously
	waiting on I/O is selected to run on a given logical CPU (the purpose
	of this mechanism is to improve performance).

	This setting has no effect on logical CPUs whose minimum P-state limit
	is directly set to the highest non-turbo P-state or above it.

.. _status_attr:

``status``
	Operation mode of the driver: "active", "passive" or "off".

	"active"
		The driver is functional and in the `active mode
		<Active Mode_>`_.

	"passive"
		The driver is functional and in the `passive mode
		<Passive Mode_>`_.

	"off"
		The driver is not functional (it is not registered as a scaling
		driver with the ``CPUFreq`` core).

	This attribute can be written to in order to change the driver's
	operation mode or to unregister it.  The string written to it must be
	one of the possible values of it and, if successful, the write will
	cause the driver to switch over to the operation mode represented by
	that string - or to be unregistered in the "off" case.  [Actually,
	switching over from the active mode to the passive mode or the other
	way around causes the driver to be unregistered and registered again
	with a different set of callbacks, so all of its settings (the global
	as well as the per-policy ones) are then reset to their default
	values, possibly depending on the target operation mode.]

``energy_efficiency``
	This attribute is only present on platforms with CPUs matching the Kaby
	Lake or Coffee Lake desktop CPU model. By default, energy-efficiency
	optimizations are disabled on these CPU models if HWP is enabled.
	Enabling energy-efficiency optimizations may limit maximum operating
	frequency with or without the HWP feature.  With HWP enabled, the
	optimizations are done only in the turbo frequency range.  Without it,
	they are done in the entire available frequency range.  Setting this
	attribute to "1" enables the energy-efficiency optimizations and setting
	to "0" disables them.

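
As an illustration, the global attributes listed above can be collected from
user space with a few lines of Python.  This is a minimal sketch: the helper
name is hypothetical, and the conversion of purely numeric values to ``int`` is
a convenience chosen here, not something the driver prescribes.

.. code-block:: python

    from pathlib import Path

    GLOBAL_DIR = "/sys/devices/system/cpu/intel_pstate"

    def read_global_attrs(directory=GLOBAL_DIR):
        """Return {attribute name: value} for the attributes that are present."""
        attrs = {}
        for entry in sorted(Path(directory).iterdir()):
            if not entry.is_file():
                continue
            try:
                text = entry.read_text().strip()
            except OSError:
                # Skip attributes that cannot be read by the current user.
                continue
            # Numeric attributes become int; "status" stays a string.
            attrs[entry.name] = int(text) if text.isdigit() else text
        return attrs

Which keys appear in the result depends on the configuration, because some of
the attributes are only exposed under the conditions described above.
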
Interpretation of Policy Attributes
-----------------------------------

The interpretation of some ``CPUFreq`` policy attributes described in
:doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver
and it generally depends on the driver's `operation mode <Operation Modes_>`_.

First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
multiplier to the internal P-state representation used by ``intel_pstate``.
Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
attributes are capped by the frequency corresponding to the maximum P-state that
the driver is allowed to set.

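
The P-state to frequency mapping mentioned above amounts to a multiplication.
The sketch below is illustrative only: the multiplier is processor-specific, so
the value of 100000 kHz used in the example is a placeholder assumption, and
rounding down in the reverse direction is a choice made for this sketch rather
than documented driver behavior.

.. code-block:: python

    def pstate_to_khz(pstate, scaling_khz):
        """Frequency in kHz that corresponds to an internal P-state value."""
        return pstate * scaling_khz

    def khz_to_pstate(freq_khz, scaling_khz):
        """P-state for a frequency in kHz (assumption: round down)."""
        return freq_khz // scaling_khz

    # Hypothetical multiplier, for illustration only.
    EXAMPLE_SCALING_KHZ = 100000
    example_max_khz = pstate_to_khz(36, EXAMPLE_SCALING_KHZ)

With the placeholder multiplier, P-state 36 maps to 3600000 kHz, the kind of
value reported by ``cpuinfo_max_freq``.
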
458*4882a593SmuzhiyunIf the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
459*4882a593Smuzhiyunnot allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
460*4882a593Smuzhiyunand ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
461*4882a593SmuzhiyunAccordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
462*4882a593Smuzhiyun``scaling_min_freq`` to go down to that value if they were above it before.
463*4882a593SmuzhiyunHowever, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
464*4882a593Smuzhiyunrestored after unsetting ``no_turbo``, unless these attributes have been written
465*4882a593Smuzhiyunto after ``no_turbo`` was set.
466*4882a593Smuzhiyun
467*4882a593SmuzhiyunIf ``no_turbo`` is not set, the maximum possible value of ``scaling_max_freq``
468*4882a593Smuzhiyunand ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
469*4882a593Smuzhiyunwhich also is the value of ``cpuinfo_max_freq`` in either case.

Next, the following policy attributes have special meaning if
``intel_pstate`` works in the `active mode <Active Mode_>`_:

``scaling_available_governors``
	List of P-state selection algorithms provided by ``intel_pstate``.

``scaling_governor``
	P-state selection algorithm provided by ``intel_pstate`` currently in
	use with the given policy.

``scaling_cur_freq``
	Frequency of the average P-state of the CPU represented by the given
	policy for the time interval between the last two invocations of the
	driver's utilization update callback by the CPU scheduler for that CPU.

One more policy attribute is present if the HWP feature is enabled in the
processor:

``base_frequency``
	Shows the base frequency of the CPU. Any frequency above this will be
	in the turbo frequency range.

The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
same as for other scaling drivers.

Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
depends on the operation mode of the driver.  Namely, it is either
"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
`passive mode <Passive Mode_>`_).
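
As a minimal sketch, the operation mode can be inferred from the value of
``scaling_driver`` as described above.  The helper name ``pstate_mode`` and the
choice of ``policy0`` are illustrative assumptions, not part of the driver::

 # Hypothetical helper: map a scaling_driver value to the operation mode.
 pstate_mode() {
     case "$1" in
     intel_pstate)  echo active ;;
     intel_cpufreq) echo passive ;;
     *)             echo other ;;   # some other scaling driver is in use
     esac
 }

 # On a live system (policy0 is only an example policy):
 # pstate_mode "$(cat /sys/devices/system/cpu/cpufreq/policy0/scaling_driver)"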

Coordination of P-State Limits
------------------------------

``intel_pstate`` allows P-state limits to be set in two ways: with the help of
the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
``CPUFreq`` policy attributes.  The coordination between those limits is based
on the following rules, regardless of the current operation mode of the driver:

 1. All CPUs are affected by the global limits (that is, none of them can be
    requested to run faster than the global maximum and none of them can be
    requested to run slower than the global minimum).

 2. Each individual CPU is affected by its own per-policy limits (that is, it
    cannot be requested to run faster than its own per-policy maximum and it
    cannot be requested to run slower than its own per-policy minimum). The
    effective performance depends on whether the platform supports per-core
    P-states, on whether hyper-threading is enabled, and on the current
    performance requests from other CPUs. When the platform does not support
    per-core P-states, the effective performance can be more than the policy
    limits set on a CPU, if other CPUs are requesting higher performance at
    that moment. Even with per-core P-states support, when hyper-threading is
    enabled, if the sibling CPU is requesting higher performance, the other
    siblings will get higher performance than their policy limits.

 3. The global and per-policy limits can be set independently.

In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
resulting effective values are written into hardware registers whenever the
limits change in order to request its internal P-state selection logic to always
set P-states within these limits.  Otherwise, the limits are taken into account
by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
every time before setting a new P-state for a CPU.

Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
at all and the only way to set the limits is by using the policy attributes.
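
Rules 1 and 2 can be illustrated with a simplified calculation in which a CPU's
request is bounded by the lower of the global and per-policy maximums.  The
sketch below assumes that ``max_perf_pct`` is taken as a percentage of
``cpuinfo_max_freq`` and ignores rounding to whole P-states;
``effective_max_khz`` is a hypothetical helper name, not a driver interface::

 # effective_max_khz PCT CPUINFO_MAX_KHZ SCALING_MAX_KHZ
 # Print the lower of the global limit (PCT percent of CPUINFO_MAX_KHZ)
 # and the per-policy limit (SCALING_MAX_KHZ).
 effective_max_khz() {
     global_max=$(( $2 * $1 / 100 ))
     if [ "$global_max" -lt "$3" ]; then
         echo "$global_max"
     else
         echo "$3"
     fi
 }

 effective_max_khz 80 3000000 2800000     # prints 2400000 (global limit wins)
 effective_max_khz 100 3000000 2800000    # prints 2800000 (policy limit wins)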


Energy vs Performance Hints
---------------------------

If the hardware-managed P-states (HWP) feature is enabled in the processor,
additional attributes, intended to allow user space to help ``intel_pstate`` to
adjust the processor's internal P-state selection logic by focusing it on
performance or on energy-efficiency, or somewhere between the two extremes, are
present in every ``CPUFreq`` policy directory in ``sysfs``.  They are:

``energy_performance_preference``
	Current value of the energy vs performance hint for the given policy
	(or the CPU represented by it).

	The hint can be changed by writing to this attribute.

``energy_performance_available_preferences``
	List of strings that can be written to the
	``energy_performance_preference`` attribute.

	They represent different energy vs performance hints and should be
	self-explanatory, except that ``default`` represents whatever hint
	value was set by the platform firmware.

Strings written to the ``energy_performance_preference`` attribute are
internally translated to integer values written to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob. It is also possible to write an integer
value between 0 and 255, if the EPP feature is present. If the EPP feature is
not present, writing an integer value to this attribute is not supported. In
this case, the user can use the
``/sys/devices/system/cpu/cpu*/power/energy_perf_bias`` interface.

[Note that tasks may be migrated from one CPU to another by the scheduler's
load-balancing algorithm and if different energy vs performance hints are
set for those CPUs, that may lead to undesirable outcomes.  To avoid such
issues it is better to set the same energy vs performance hint for all CPUs
or to pin every task potentially sensitive to them to a specific CPU.]
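
One way to follow that advice is to write the same hint to every policy
directory.  The following is only a sketch: ``set_epp_all`` is a hypothetical
helper, it assumes that the policy directories live under
``/sys/devices/system/cpu/cpufreq/``, and the hint passed to it has to be one
of the strings listed in ``energy_performance_available_preferences``::

 # set_epp_all HINT [CPUFREQ_DIR]
 # Write HINT to energy_performance_preference in every policy directory,
 # skipping policies where the attribute is missing or not writable.
 set_epp_all() {
     hint=$1
     dir=${2:-/sys/devices/system/cpu/cpufreq}
     for f in "$dir"/policy*/energy_performance_preference; do
         [ -w "$f" ] || continue
         echo "$hint" > "$f"
     done
 }

 # Example (as root): set_epp_all balance_performance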

.. _acpi-cpufreq:

``intel_pstate`` vs ``acpi-cpufreq``
====================================

On the majority of systems supported by ``intel_pstate``, the ACPI tables
provided by the platform firmware contain ``_PSS`` objects returning information
that can be used for CPU performance scaling (refer to the ACPI specification
[3]_ for details on the ``_PSS`` objects and the format of the information
returned by them).

The information returned by the ACPI ``_PSS`` objects is used by the
``acpi-cpufreq`` scaling driver.  On systems supported by ``intel_pstate``
the ``acpi-cpufreq`` driver uses the same hardware CPU performance scaling
interface, but the set of P-states it can use is limited by the ``_PSS``
output.

On those systems each ``_PSS`` object returns a list of P-states supported by
the corresponding CPU which basically is a subset of the P-states range that can
be used by ``intel_pstate`` on the same system, with one exception: the whole
`turbo range <turbo_>`_ is represented by one item in it (the topmost one).  By
convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
than the frequency of the highest non-turbo P-state listed by it, but the
corresponding P-state representation (following the hardware specification)
returned for it matches the maximum supported turbo P-state (or is the
special value 255 meaning essentially "go as high as you can get").

The list of P-states returned by ``_PSS`` is reflected by the table of
available frequencies supplied by ``acpi-cpufreq`` to the ``CPUFreq`` core and
scaling governors and the minimum and maximum supported frequencies reported by
it come from that list as well.  In particular, given the special representation
of the turbo range described above, this means that the maximum supported
frequency reported by ``acpi-cpufreq`` is higher by 1 MHz than the frequency
of the highest supported non-turbo P-state listed by ``_PSS`` which, of course,
affects decisions made by the scaling governors, except for ``powersave`` and
``performance``.

For example, if a given governor attempts to select a frequency proportional to
estimated CPU load and maps the load of 100% to the maximum supported frequency
(possibly multiplied by a constant), then it will tend to choose P-states below
the turbo threshold if ``acpi-cpufreq`` is used as the scaling driver, because
in that case the turbo range corresponds to a small fraction of the frequency
band it can use (1 MHz vs 1 GHz or more).  In consequence, it will only go to
the turbo range for the highest loads and the other loads above 50% that might
benefit from running at turbo frequencies will be given non-turbo P-states
instead.
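
The arithmetic behind this example can be sketched as follows.  It assumes a
hypothetical CPU whose highest non-turbo frequency is 2900000 kHz, so that
``acpi-cpufreq`` reports 2901000 kHz as the maximum, and a governor that simply
picks ``load * max / 100`` (real governors are more elaborate)::

 # proportional_khz LOAD_PCT MAX_KHZ
 # Print the frequency that is proportional to the given load.
 proportional_khz() {
     echo $(( $2 * $1 / 100 ))
 }

 proportional_khz 90 2901000     # prints 2610900, below the turbo threshold
 proportional_khz 100 2901000    # prints 2901000, the single turbo entry

With these numbers, only requests above 2900000 kHz (that is, loads very close
to 100%) land on the item that represents the turbo range.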

One more issue related to that may appear on systems supporting the
`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
turbo threshold.  Namely, if that is not coordinated with the lists of P-states
returned by ``_PSS`` properly, there may be more than one item corresponding to
a turbo P-state in those lists and there may be a problem with avoiding the
turbo range (if desirable or necessary).  Usually, to avoid using turbo
P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
by ``_PSS``, but that is not sufficient when there are other turbo P-states in
the list returned by it.

Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
`passive mode <Passive Mode_>`_, except that the set of P-states it can use
is limited to the ones listed by the ACPI ``_PSS`` objects.


Kernel Command Line Options for ``intel_pstate``
================================================

Several kernel command line options can be used to pass early-configuration-time
parameters to ``intel_pstate`` in order to enforce specific behavior.  All of
them have to be prefixed with ``intel_pstate=``.

``disable``
	Do not register ``intel_pstate`` as the scaling driver even if the
	processor is supported by it.

``active``
	Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
	with.

``passive``
	Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
	start with.

``force``
	Register ``intel_pstate`` as the scaling driver instead of
	``acpi-cpufreq`` even if the latter is preferred on the given system.

	This may prevent some platform features (such as thermal controls and
	power capping) that rely on the availability of ACPI P-states
	information from functioning as expected, so it should be used with
	caution.

	This option does not work with processors that are not supported by
	``intel_pstate`` and on platforms where the ``pcc-cpufreq`` scaling
	driver is used instead of ``acpi-cpufreq``.

``no_hwp``
	Do not enable the hardware-managed P-states (HWP) feature even if it is
	supported by the processor.

``hwp_only``
	Register ``intel_pstate`` as the scaling driver only if the
	hardware-managed P-states (HWP) feature is supported by the processor.

``support_acpi_ppc``
	Take ACPI ``_PPC`` performance limits into account.

	If the preferred power management profile in the FADT (Fixed ACPI
	Description Table) is set to "Enterprise Server" or "Performance
	Server", the ACPI ``_PPC`` limits are taken into account by default
	and this option has no effect.

``per_cpu_perf_limits``
	Use per-logical-CPU P-State limits (see `Coordination of P-State
	Limits`_ for details).
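
To check which of these options a running kernel was booted with, the command
line can be scanned for ``intel_pstate=`` arguments.  This is a minimal sketch;
``pstate_cmdline_opts`` is a hypothetical helper and the sample command line is
made up for illustration::

 # pstate_cmdline_opts CMDLINE
 # Print the value of each intel_pstate= argument, one per line.
 pstate_cmdline_opts() {
     for arg in $1; do
         case "$arg" in
         intel_pstate=*) echo "${arg#intel_pstate=}" ;;
         esac
     done
 }

 # On a running system: pstate_cmdline_opts "$(cat /proc/cmdline)"
 pstate_cmdline_opts "quiet intel_pstate=passive intel_pstate=no_hwp"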


Diagnostics and Tuning
======================

Trace Events
------------

There are two static trace events that can be used for ``intel_pstate``
diagnostics.  One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``.  Both of them are triggered by ``intel_pstate`` only if
it works in the `active mode <Active Mode_>`_.

The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::

 # cd /sys/kernel/debug/tracing/
 # echo 1 > events/power/pstate_sample/enable
 # echo 1 > events/power/cpu_frequency/enable
 # cat trace
 gnome-terminal--4510  [001] ..s.  1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
 cat-5235  [002] ..s.  1177.681723: cpu_frequency: state=2900000 cpu_id=2

If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).

``ftrace``
----------

The ``ftrace`` interface can be used for low-level diagnostics of
``intel_pstate``.  For example, to check how often the function to set a
P-state is called, the ``ftrace`` filter can be set to
:c:func:`intel_pstate_set_pstate`::

 # cd /sys/kernel/debug/tracing/
 # cat available_filter_functions | grep -i pstate
 intel_pstate_set_pstate
 intel_pstate_cpu_init
 ...
 # echo intel_pstate_set_pstate > set_ftrace_filter
 # echo function > current_tracer
 # cat trace | head -15
 # tracer: function
 #
 # entries-in-buffer/entries-written: 80/80   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
             Xorg-3129  [000] ..s.  2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
  gnome-terminal--4510  [002] ..s.  2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
      gnome-shell-3409  [001] ..s.  2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
           <idle>-0     [000] ..s.  2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func


References
==========

.. [1] Kristen Accardi, *Balancing Power and Performance in the Linux Kernel*,
       https://events.static.linuxfound.org/sites/events/files/slides/LinuxConEurope_2015.pdf

.. [2] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide*,
       https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html

.. [3] *Advanced Configuration and Power Interface Specification*,
       https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf