.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
``intel_pstate`` CPU Performance Scaling Driver
===============================================

:Copyright: |copy| 2017 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


General Information
===================

``intel_pstate`` is a part of the
:doc:`CPU performance scaling subsystem <cpufreq>` in the Linux kernel
(``CPUFreq``). It is a scaling driver for the Sandy Bridge and later
generations of Intel processors. Note, however, that some of those processors
may not be supported. [To understand ``intel_pstate`` it is necessary to know
how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if
you have not done that yet.]

For the processors supported by ``intel_pstate``, the P-state concept is broader
than just an operating frequency or an operating performance point (see the
LinuxCon Europe 2015 presentation by Kristen Accardi [1]_ for more
information about that). For this reason, the representation of P-states used
by ``intel_pstate`` internally follows the hardware specification (for details
refer to Intel Software Developer’s Manual [2]_).
However, the ``CPUFreq`` core
uses frequencies for identifying operating performance points of CPUs and
frequencies are involved in the user space interface exposed by it, so
``intel_pstate`` maps its internal representation of P-states to frequencies too
(fortunately, that mapping is unambiguous). At the same time, it would not be
practical for ``intel_pstate`` to supply the ``CPUFreq`` core with a table of
available frequencies due to the possible size of it, so the driver does not do
that. Some functionality of the core is limited by that.

Since the hardware P-state selection interface used by ``intel_pstate`` is
available at the logical CPU level, the driver always works with individual
CPUs. Consequently, if ``intel_pstate`` is in use, every ``CPUFreq`` policy
object corresponds to one logical CPU and ``CPUFreq`` policies are effectively
equivalent to CPUs. In particular, this means that they become "inactive" every
time the corresponding CPU is taken offline and need to be re-initialized when
it goes back online.

``intel_pstate`` is not modular, so it cannot be unloaded, which means that the
only way to pass early-configuration-time parameters to it is via the kernel
command line. However, its configuration can be adjusted via ``sysfs`` to a
great extent. In some configurations it even is possible to unregister it via
``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
registered (see `below <status_attr_>`_).
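
Because early parameters can only come from the kernel command line, it can be
useful to check which ``intel_pstate=`` arguments a system was booted with. The
following is a minimal sketch, assuming that the command line is readable from
``/proc/cmdline`` and that each option is passed as a separate
``intel_pstate=<value>`` token. The function names are hypothetical and not
part of any kernel interface.

```python
from typing import List


def intel_pstate_options(cmdline: str) -> List[str]:
    """Return every value passed as ``intel_pstate=<value>`` in a command line."""
    prefix = "intel_pstate="
    # Command-line arguments are whitespace-separated tokens.
    return [tok[len(prefix):] for tok in cmdline.split() if tok.startswith(prefix)]


def read_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (assumed path)."""
    with open(path, encoding="ascii", errors="replace") as f:
        return f.read().strip()


if __name__ == "__main__":
    print(intel_pstate_options(read_kernel_cmdline()))
```

The parsing is kept separate from the file access so that it can be exercised
with any string, for example one copied from a boot loader configuration.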


Operation Modes
===============

``intel_pstate`` can operate in two different modes, active or passive. In the
active mode, it uses its own internal performance scaling governor algorithm or
allows the hardware to do performance scaling by itself, while in the passive
mode it responds to requests made by a generic ``CPUFreq`` governor implementing
a certain performance scaling algorithm. Which of them will be in effect
depends on what kernel command line options are used and on the capabilities of
the processor.

Active Mode
-----------

This is the default operation mode of ``intel_pstate`` for processors with
hardware-managed P-states (HWP) support. If it works in this mode, the
``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
contains the string "intel_pstate".

In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection. Those algorithms
can be applied to ``CPUFreq`` policies in the same way as generic scaling
governors (that is, through the ``scaling_governor`` policy attribute in
``sysfs``). [Note that different P-state selection algorithms may be chosen for
different policies, but that is not recommended.]

They are not generic scaling governors, but their names are the same as the
names of some of those governors.
Moreover, confusingly enough, they generally
do not work in the same way as the generic governors they share the names with.
For example, the ``powersave`` P-state selection algorithm provided by
``intel_pstate`` is not a counterpart of the generic ``powersave`` governor
(roughly, it corresponds to the ``schedutil`` and ``ondemand`` governors).

There are two P-state selection algorithms provided by ``intel_pstate`` in the
active mode: ``powersave`` and ``performance``. The way they both operate
depends on whether or not the hardware-managed P-states (HWP) feature has been
enabled in the processor and possibly on the processor model.

Which of the P-state selection algorithms is used by default depends on the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option.
Namely, if that option is set, the ``performance`` algorithm will be used by
default, and the other one will be used by default if it is not set.

Active Mode With HWP
~~~~~~~~~~~~~~~~~~~~

If the processor supports the HWP feature, it will be enabled during the
processor initialization and cannot be disabled after that. It is possible
to avoid enabling it by passing the ``intel_pstate=no_hwp`` argument to the
kernel in the command line.

If the HWP feature has been enabled, ``intel_pstate`` relies on the processor to
select P-states by itself, but still it can give hints to the processor's
internal P-state selection logic.
What those hints are depends on which P-state
selection algorithm has been applied to the given policy (or to the CPU it
corresponds to).

Even though the P-state selection is carried out by the processor automatically,
``intel_pstate`` registers utilization update callbacks with the CPU scheduler
in this mode. However, they are not used for running a P-state selection
algorithm, but for periodic updates of the current CPU frequency information to
be made available from the ``scaling_cur_freq`` policy attribute in ``sysfs``.

HWP + ``performance``
.....................

In this configuration ``intel_pstate`` will write 0 to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.

This will override the EPP/EPB setting coming from the ``sysfs`` interface
(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
configuration will be rejected.

Also, in this configuration the range of P-states available to the processor's
internal P-state selection logic is always restricted to the upper boundary
(that is, the maximum P-state that the driver is allowed to use).

HWP + ``powersave``
...................

In this configuration ``intel_pstate`` will set the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob (otherwise) to whatever value it was
previously set to via ``sysfs`` (or whatever default value it was
set to by the platform firmware). This usually causes the processor's
internal P-state selection logic to be less performance-focused.

Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~

This operation mode is optional for processors that do not support the HWP
feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
the command line. The active mode is used in those cases if the
``intel_pstate=active`` argument is passed to the kernel in the command line.
In this mode ``intel_pstate`` may refuse to work with processors that are not
recognized by it. [Note that ``intel_pstate`` will never refuse to work with
any processor with the HWP feature enabled.]

In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
``powersave`` or ``performance``, depending on the ``scaling_governor`` policy
setting in ``sysfs``. The current CPU frequency information to be made
available from the ``scaling_cur_freq`` policy attribute in ``sysfs`` is
periodically updated by those utilization update callbacks too.

``performance``
...............

Without HWP, this P-state selection algorithm is always the same regardless of
the processor model and platform configuration.

It selects the maximum P-state it is allowed to use, subject to limits set via
``sysfs``, every time the driver configuration for the given CPU is updated
(e.g. via ``sysfs``).

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is set.

``powersave``
.............

Without HWP, this P-state selection algorithm is similar to the algorithm
implemented by the generic ``schedutil`` scaling governor except that the
utilization metric used by it is based on numbers coming from feedback
registers of the CPU. It generally selects P-states proportional to the
current CPU utilization.

This algorithm is run by the driver's utilization update callback for the
given CPU when it is invoked by the CPU scheduler, but not more often than
every 10 ms. Like in the ``performance`` case, the hardware configuration
is not touched if the new P-state turns out to be the same as the current
one.

This is the default P-state selection algorithm if the
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
is not set.
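
The three properties of the non-HWP ``powersave`` algorithm described above
(proportional selection, the 10 ms rate limit and skipping redundant hardware
updates) can be illustrated with a toy model. This is only a sketch under
simplifying assumptions, not the driver's code: the class name, the
``utilization`` input (a fraction between 0 and 1) and the exact proportional
rule are hypothetical, and the real utilization metric comes from CPU feedback
registers.

```python
import math


class PowersaveSketch:
    """Toy model of a utilization-proportional P-state selector."""

    MIN_INTERVAL_MS = 10  # from the text: run no more often than every 10 ms

    def __init__(self, min_pstate: int, max_pstate: int):
        self.min_pstate = min_pstate
        self.max_pstate = max_pstate
        self.current = min_pstate
        self.last_run_ms = None
        self.hw_writes = 0  # counts simulated hardware updates

    def update(self, now_ms: int, utilization: float) -> int:
        """Stand-in for the scheduler's utilization update callback."""
        if (self.last_run_ms is not None
                and now_ms - self.last_run_ms < self.MIN_INTERVAL_MS):
            return self.current  # rate-limited: too soon since the last run
        self.last_run_ms = now_ms
        # Hypothetical proportional rule: scale the maximum P-state by the
        # utilization, then clamp to the allowed range.
        target = math.ceil(utilization * self.max_pstate)
        target = max(self.min_pstate, min(self.max_pstate, target))
        if target != self.current:
            self.current = target
            self.hw_writes += 1  # hardware is touched only on a real change
        return self.current
```

Calling ``update()`` twice within 10 ms returns the previously selected P-state
unchanged, and repeated calls with the same utilization do not increase
``hw_writes``.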

Passive Mode
------------

This is the default operation mode of ``intel_pstate`` for processors without
hardware-managed P-states (HWP) support. It is always used if the
``intel_pstate=passive`` argument is passed to the kernel in the command line
regardless of whether or not the given processor supports HWP. [Note that the
``intel_pstate=no_hwp`` setting causes the driver to start in the passive mode
if it is not combined with ``intel_pstate=active``.] Like in the active mode
without HWP support, in this mode ``intel_pstate`` may refuse to work with
processors that are not recognized by it if HWP is prevented from being enabled
through the kernel command line.

If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
Then, the driver behaves like a regular ``CPUFreq`` scaling driver. That is,
it is invoked by generic scaling governors when necessary to talk to the
hardware in order to change the P-state of a CPU (in particular, the
``schedutil`` governor can invoke it directly from scheduler context).

While in this mode, ``intel_pstate`` can be used with all of the (generic)
scaling governors listed by the ``scaling_available_governors`` policy attribute
in ``sysfs`` (and the P-state selection algorithms described above are not
used).
Then, it is responsible for the configuration of policy objects
corresponding to CPUs and provides the ``CPUFreq`` core (and the scaling
governors attached to the policy objects) with accurate information on the
maximum and minimum operating frequencies supported by the hardware (including
the so-called "turbo" frequency ranges). In other words, in the passive mode
the entire range of available P-states is exposed by ``intel_pstate`` to the
``CPUFreq`` core. However, in this mode the driver does not register
utilization update callbacks with the CPU scheduler and the ``scaling_cur_freq``
information comes from the ``CPUFreq`` core (and is the last frequency selected
by the current scaling governor for the given policy).


.. _turbo:

Turbo P-states Support
======================

In the majority of cases, the entire range of P-states available to
``intel_pstate`` can be divided into two sub-ranges that correspond to
different types of processor behavior, above and below a boundary that
will be referred to as the "turbo threshold" in what follows.

The P-states above the turbo threshold are referred to as "turbo P-states" and
the whole sub-range of P-states they belong to is referred to as the "turbo
range".
These names are related to the Turbo Boost technology allowing a
multicore processor to opportunistically increase the P-state of one or more
cores if there is enough power to do that and if that is not going to cause the
thermal envelope of the processor package to be exceeded.

Specifically, if software sets the P-state of a CPU core within the turbo range
(that is, above the turbo threshold), the processor is permitted to take over
performance scaling control for that core and put it into turbo P-states of its
choice going forward. However, that permission is interpreted differently by
different processor generations. Namely, the Sandy Bridge generation of
processors will never use any P-states above the last one set by software for
the given core, even if it is within the turbo range, whereas all of the later
processor generations will take it as a license to use any P-states from the
turbo range, even above the one set by software. In other words, on those
processors setting any P-state from the turbo range will enable the processor
to put the given core into all turbo P-states up to and including the maximum
supported one as it sees fit.

One important property of turbo P-states is that they are not sustainable. More
precisely, there is no guarantee that any CPUs will be able to stay in any of
those states indefinitely, because the power distribution within the processor
package may change over time or the thermal envelope it was designed for might
be exceeded if a turbo P-state was used for too long.
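
The generation-dependent interpretation of a turbo-range request described
above can be summarized in a few lines. The function below is an illustrative
sketch only: its name and parameters are hypothetical, P-states are modeled as
plain integers, and package-level effects such as power or thermal limits and
requests made for other CPUs are deliberately ignored.

```python
def hardware_pstate_ceiling(requested: int, turbo_threshold: int,
                            max_turbo: int, sandy_bridge: bool) -> int:
    """Highest P-state the processor may choose by itself for one core."""
    if requested <= turbo_threshold:
        # Non-turbo request: the processor is not given turbo control.
        return requested
    if sandy_bridge:
        # Sandy Bridge never goes above the last P-state set by software.
        return requested
    # Later generations may use the whole turbo range once any turbo
    # P-state has been requested.
    return max_turbo
```

For example, with a turbo threshold of 24 and a maximum turbo P-state of 36, a
request for P-state 30 yields a ceiling of 30 on Sandy Bridge and 36 on later
generations.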

In turn, the P-states below the turbo threshold generally are sustainable. In
fact, if one of them is set by software, the processor is not expected to change
it to a lower one unless in a thermal stress or a power limit violation
situation (a higher P-state may still be used if it is set for another CPU in
the same package at the same time, for example).

Some processors allow multiple cores to be in turbo P-states at the same time,
but the maximum P-state that can be set for them generally depends on the number
of cores running concurrently. The maximum turbo P-state that can be set for 3
cores at the same time usually is lower than the analogous maximum P-state for
2 cores, which in turn usually is lower than the maximum turbo P-state that can
be set for 1 core. The one-core maximum turbo P-state is thus the maximum
supported one overall.

The maximum supported turbo P-state, the turbo threshold (the maximum supported
non-turbo P-state) and the minimum supported P-state are specific to the
processor model and can be determined by reading the processor's model-specific
registers (MSRs). Moreover, some processors support the Configurable TDP
(Thermal Design Power) feature and, when that feature is enabled, the turbo
threshold effectively becomes a configurable value that can be set by the
platform firmware.
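
As an illustration of how such values can be extracted from MSR contents, the
sketch below decodes two raw register values. It assumes the bit layout
commonly used by Core-family processors (``MSR_PLATFORM_INFO`` at ``0xCE`` and
``MSR_TURBO_RATIO_LIMIT`` at ``0x1AD``). Those addresses and field positions
are assumptions that are not stated in this document and should be checked
against the Intel Software Developer's Manual for the processor at hand. Other
processor families and the Configurable TDP case are not covered, and the
function name is hypothetical.

```python
MSR_PLATFORM_INFO = 0xCE       # assumed address (Core-family layout)
MSR_TURBO_RATIO_LIMIT = 0x1AD  # assumed address (Core-family layout)


def decode_core_pstates(platform_info: int, turbo_ratio_limit: int) -> dict:
    """Decode P-state boundaries from two raw 64-bit MSR values."""
    min_pstate = (platform_info >> 40) & 0xFF     # assumed bits 47:40
    max_non_turbo = (platform_info >> 8) & 0xFF   # assumed bits 15:8
    one_core_turbo = turbo_ratio_limit & 0xFF     # assumed bits 7:0
    # If no turbo ratio above the threshold is reported, treat the turbo
    # threshold as the overall maximum.
    max_turbo = max(one_core_turbo, max_non_turbo)
    return {
        "min_pstate": min_pstate,
        "turbo_threshold": max_non_turbo,
        "max_turbo": max_turbo,
    }
```

The function takes the register values as arguments, so it does not depend on
any particular way of reading MSRs from user space.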

Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
the entire range of available P-states, including the whole turbo range, to the
``CPUFreq`` core and (in the passive mode) to generic scaling governors. This
generally causes turbo P-states to be set more often when ``intel_pstate`` is
used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
for more information).

Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
(even if the Configurable TDP feature is enabled in the processor), its
``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
work as expected in all cases (that is, if set to disable turbo P-states, it
always should prevent ``intel_pstate`` from using them).


Processor Support
=================

To handle a given processor ``intel_pstate`` requires a number of different
pieces of information on it to be known, including:

 * The minimum supported P-state.

 * The maximum supported `non-turbo P-state <turbo_>`_.

 * Whether or not turbo P-states are supported at all.

 * The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
   are supported).

 * The scaling formula to translate the driver's internal representation
   of P-states into frequencies and the other way around.
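
The items listed above can be pictured as one small record per processor. The
sketch below is hypothetical: the class and field names are made up for
illustration, and the scaling formula is assumed to be a simple multiplication
by a fixed step of 100000 kHz per P-state, which is not necessarily how every
supported processor maps P-states to frequencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PStateInfo:
    """Per-processor data needed to handle P-states (illustrative only)."""
    min_pstate: int
    max_non_turbo: int
    max_turbo: int
    scaling_khz: int = 100_000  # assumed frequency step per P-state

    @property
    def turbo_supported(self) -> bool:
        # Turbo P-states exist only if something lies above the threshold.
        return self.max_turbo > self.max_non_turbo

    def to_khz(self, pstate: int) -> int:
        """Translate an internal P-state value into a frequency in kHz."""
        return pstate * self.scaling_khz

    def from_khz(self, khz: int) -> int:
        """Translate a frequency in kHz back into a supported P-state."""
        pstate = khz // self.scaling_khz
        return max(self.min_pstate, min(self.max_turbo, pstate))
```

With ``PStateInfo(8, 24, 36)``, P-state 24 corresponds to 2400000 kHz under the
assumed step, and frequencies outside the supported range are clamped to the
nearest supported P-state.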

Generally, ways to obtain that information are specific to the processor model
or family. Although it often is possible to obtain all of it from the processor
itself (using model-specific registers), there are cases in which hardware
manuals need to be consulted to get to it too.

For this reason, there is a list of supported processors in ``intel_pstate`` and
the driver initialization will fail if the detected processor is not in that
list, unless it supports the HWP feature. [The interface to obtain all of the
information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]


User Space Interface in ``sysfs``
=================================

Global Attributes
-----------------

``intel_pstate`` exposes several global attributes (files) in ``sysfs`` to
control its functionality at the system level. They are located in the
``/sys/devices/system/cpu/intel_pstate/`` directory and affect all CPUs.

Some of them are not present if the ``intel_pstate=per_cpu_perf_limits``
argument is passed to the kernel in the command line.

``max_perf_pct``
	Maximum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
	command line.

``min_perf_pct``
	Minimum P-state the driver is allowed to set in percent of the
	maximum supported performance level (the highest supported `turbo
	P-state <turbo_>`_).

	This attribute will not be exposed if the
	``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
	command line.

``num_pstates``
	Number of P-states supported by the processor (between 0 and 255
	inclusive) including both turbo and non-turbo P-states (see
	`Turbo P-states Support`_).

	The value of this attribute is not affected by the ``no_turbo``
	setting described `below <no_turbo_attr_>`_.

	This attribute is read-only.

``turbo_pct``
	Ratio of the `turbo range <turbo_>`_ size to the size of the entire
	range of supported P-states, in percent.

	This attribute is read-only.

.. _no_turbo_attr:

``no_turbo``
	If set (equal to 1), the driver is not allowed to set any turbo P-states
	(see `Turbo P-states Support`_). If unset (equal to 0, which is the
	default), turbo P-states can be set by the driver.
	[Note that ``intel_pstate`` does not support the general ``boost``
	attribute (supported by some other scaling drivers) which is replaced
	by this one.]

	This attribute does not affect the maximum supported frequency value
	supplied to the ``CPUFreq`` core and exposed via the policy interface,
	but it affects the maximum possible value of per-policy P-state limits
	(see `Interpretation of Policy Attributes`_ below for details).

``hwp_dynamic_boost``
	This attribute is only present if ``intel_pstate`` works in the
	`active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
	the processor. If set (equal to 1), it causes the minimum P-state limit
	to be increased dynamically for a short time whenever a task previously
	waiting on I/O is selected to run on a given logical CPU (the purpose
	of this mechanism is to improve performance).

	This setting has no effect on logical CPUs whose minimum P-state limit
	is directly set to the highest non-turbo P-state or above it.

.. _status_attr:

``status``
	Operation mode of the driver: "active", "passive" or "off".

	"active"
		The driver is functional and in the `active mode
		<Active Mode_>`_.

	"passive"
		The driver is functional and in the `passive mode
		<Passive Mode_>`_.

	"off"
		The driver is not functional (it is not registered as a scaling
		driver with the ``CPUFreq`` core).

	This attribute can be written to in order to change the driver's
	operation mode or to unregister it. The string written to it must be
	one of the possible values of it and, if successful, the write will
	cause the driver to switch over to the operation mode represented by
	that string - or to be unregistered in the "off" case. [Actually,
	switching over from the active mode to the passive mode or the other
	way around causes the driver to be unregistered and registered again
	with a different set of callbacks, so all of its settings (the global
	as well as the per-policy ones) are then reset to their default
	values, possibly depending on the target operation mode.]

``energy_efficiency``
	This attribute is only present on platforms with CPUs matching the Kaby
	Lake or Coffee Lake desktop CPU model. By default, energy-efficiency
	optimizations are disabled on these CPU models if HWP is enabled.
	Enabling energy-efficiency optimizations may limit maximum operating
	frequency with or without the HWP feature. With HWP enabled, the
	optimizations are done only in the turbo frequency range. Without it,
	they are done in the entire available frequency range. Setting this
	attribute to "1" enables the energy-efficiency optimizations and setting
	it to "0" disables them.
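
Since several of the global attributes are present only in some
configurations, a tool inspecting them has to tolerate missing files. The
following is a minimal sketch of such a reader. The directory and attribute
names are the ones given in this section, while the function name and the
choice to silently skip unreadable files are assumptions made for the example.

```python
import os
from typing import Dict

GLOBAL_DIR = "/sys/devices/system/cpu/intel_pstate"
GLOBAL_ATTRS = (
    "max_perf_pct", "min_perf_pct", "num_pstates", "turbo_pct",
    "no_turbo", "hwp_dynamic_boost", "status", "energy_efficiency",
)


def read_global_attrs(base: str = GLOBAL_DIR) -> Dict[str, str]:
    """Return the contents of the global attributes that are present."""
    values = {}
    for name in GLOBAL_ATTRS:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = f.read().strip()
        except OSError:
            # Attribute not exposed in this configuration (or not readable).
            continue
    return values


if __name__ == "__main__":
    for name, value in read_global_attrs().items():
        print(f"{name}: {value}")
```

On a system where ``intel_pstate`` is not in use the directory does not exist
and the function simply returns an empty dictionary.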

Interpretation of Policy Attributes
-----------------------------------

The interpretation of some ``CPUFreq`` policy attributes described in
:doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver
and it generally depends on the driver's `operation mode <Operation Modes_>`_.

First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
multiplier to the internal P-state representation used by ``intel_pstate``.
Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
attributes are capped by the frequency corresponding to the maximum P-state that
the driver is allowed to set.

If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
``scaling_min_freq`` to go down to that value if they were above it before.
However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
restored after unsetting ``no_turbo``, unless these attributes have been written
to after ``no_turbo`` was set.
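
The interaction between ``no_turbo`` and a per-policy limit described above can
be modeled by keeping the last written value separately from the value that is
reported. The class below is a sketch of that idea for ``scaling_max_freq``
only. Its name and structure are hypothetical and do not reflect how the
driver or the ``CPUFreq`` core store these values.

```python
class PolicyMaxLimit:
    """Toy model of ``scaling_max_freq`` under the ``no_turbo`` setting."""

    def __init__(self, max_non_turbo_khz: int, max_turbo_khz: int):
        self.max_non_turbo_khz = max_non_turbo_khz
        self.max_turbo_khz = max_turbo_khz
        self.no_turbo = False
        self._requested_khz = max_turbo_khz  # last value written by the user

    def _cap(self) -> int:
        # The highest value the limit may take in the current state.
        return self.max_non_turbo_khz if self.no_turbo else self.max_turbo_khz

    def write(self, khz: int) -> None:
        # A write made while no_turbo is set replaces the remembered value,
        # so nothing above the cap is restored later.
        self._requested_khz = min(khz, self._cap())

    @property
    def scaling_max_freq(self) -> int:
        # The reported value goes down to the cap while no_turbo is set and
        # returns to the remembered value once it is unset.
        return min(self._requested_khz, self._cap())
```

Setting ``no_turbo`` on such an object lowers the reported limit without
forgetting the earlier request, whereas calling ``write()`` while ``no_turbo``
is set makes the lowered value permanent.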

If ``no_turbo`` is not set, the maximum possible value of ``scaling_max_freq``
and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
which also is the value of ``cpuinfo_max_freq`` in either case.

Next, the following policy attributes have special meaning if
``intel_pstate`` works in the `active mode <Active Mode_>`_:

``scaling_available_governors``
	List of P-state selection algorithms provided by ``intel_pstate``.

``scaling_governor``
	P-state selection algorithm provided by ``intel_pstate`` currently in
	use with the given policy.

``scaling_cur_freq``
	Frequency of the average P-state of the CPU represented by the given
	policy for the time interval between the last two invocations of the
	driver's utilization update callback by the CPU scheduler for that CPU.

One more policy attribute is present if the HWP feature is enabled in the
processor:

``base_frequency``
	Shows the base frequency of the CPU. Any frequency above this will be
	in the turbo frequency range.

The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
same as for other scaling drivers.

Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
depends on the operation mode of the driver.
Namely, it is either
"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
`passive mode <Passive Mode_>`_).

Coordination of P-State Limits
------------------------------

``intel_pstate`` allows P-state limits to be set in two ways: with the help of
the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
``CPUFreq`` policy attributes. The coordination between those limits is based
on the following rules, regardless of the current operation mode of the driver:

 1. All CPUs are affected by the global limits (that is, none of them can be
    requested to run faster than the global maximum and none of them can be
    requested to run slower than the global minimum).

 2. Each individual CPU is affected by its own per-policy limits (that is, it
    cannot be requested to run faster than its own per-policy maximum and it
    cannot be requested to run slower than its own per-policy minimum). The
    effective performance depends on whether the platform supports per-core
    P-states, on whether hyper-threading is enabled and on the current
    performance requests from other CPUs. When the platform doesn't support
    per-core P-states, the effective performance can be more than the policy
    limits set on a CPU, if other CPUs are requesting higher performance at
    that moment.
    Even with per-core P-states support, when hyper-threading is enabled, if
    the sibling CPU is requesting higher performance, the other siblings will
    get higher performance than their policy limits.

 3. The global and per-policy limits can be set independently.

In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
resulting effective values are written into hardware registers whenever the
limits change in order to request its internal P-state selection logic to always
set P-states within these limits. Otherwise, the limits are taken into account
by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
every time before setting a new P-state for a CPU.

Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
at all and the only way to set the limits is by using the policy attributes.


Energy vs Performance Hints
---------------------------

If the hardware-managed P-states (HWP) feature is enabled in the processor,
additional attributes, intended to allow user space to help ``intel_pstate`` to
adjust the processor's internal P-state selection logic by focusing it on
performance or on energy-efficiency, or somewhere between the two extremes, are
present in every ``CPUFreq`` policy directory in ``sysfs``.
They are:

``energy_performance_preference``
	Current value of the energy vs performance hint for the given policy
	(or the CPU represented by it).

	The hint can be changed by writing to this attribute.

``energy_performance_available_preferences``
	List of strings that can be written to the
	``energy_performance_preference`` attribute.

	They represent different energy vs performance hints and should be
	self-explanatory, except that ``default`` represents whatever hint
	value was set by the platform firmware.

Strings written to the ``energy_performance_preference`` attribute are
internally translated to integer values written to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob. It is also possible to write an
integer value between 0 and 255, if the EPP feature is present. If the EPP
feature is not present, writing an integer value to this attribute is not
supported. In this case, the user can use the
"/sys/devices/system/cpu/cpu*/power/energy_perf_bias" interface.

[Note that tasks may be migrated from one CPU to another by the scheduler's
load-balancing algorithm and if different energy vs performance hints are
set for those CPUs, that may lead to undesirable outcomes.
To avoid such
issues it is better to set the same energy vs performance hint for all CPUs
or to pin every task potentially sensitive to them to a specific CPU.]

.. _acpi-cpufreq:

``intel_pstate`` vs ``acpi-cpufreq``
====================================

On the majority of systems supported by ``intel_pstate``, the ACPI tables
provided by the platform firmware contain ``_PSS`` objects returning information
that can be used for CPU performance scaling (refer to the ACPI specification
[3]_ for details on the ``_PSS`` objects and the format of the information
returned by them).

The information returned by the ACPI ``_PSS`` objects is used by the
``acpi-cpufreq`` scaling driver. On systems supported by ``intel_pstate``
the ``acpi-cpufreq`` driver uses the same hardware CPU performance scaling
interface, but the set of P-states it can use is limited by the ``_PSS``
output.

On those systems each ``_PSS`` object returns a list of P-states supported by
the corresponding CPU which basically is a subset of the P-states range that can
be used by ``intel_pstate`` on the same system, with one exception: the whole
`turbo range <turbo_>`_ is represented by one item in it (the topmost one).
By
convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
than the frequency of the highest non-turbo P-state listed by it, but the
corresponding P-state representation (following the hardware specification)
returned for it matches the maximum supported turbo P-state (or is the
special value 255 meaning essentially "go as high as you can get").

The list of P-states returned by ``_PSS`` is reflected by the table of
available frequencies supplied by ``acpi-cpufreq`` to the ``CPUFreq`` core and
scaling governors and the minimum and maximum supported frequencies reported by
it come from that list as well. In particular, given the special representation
of the turbo range described above, this means that the maximum supported
frequency reported by ``acpi-cpufreq`` is higher by 1 MHz than the frequency
of the highest supported non-turbo P-state listed by ``_PSS`` which, of course,
affects decisions made by the scaling governors, except for ``powersave`` and
``performance``.

For example, if a given governor attempts to select a frequency proportional to
estimated CPU load and maps the load of 100% to the maximum supported frequency
(possibly multiplied by a constant), then it will tend to choose P-states below
the turbo threshold if ``acpi-cpufreq`` is used as the scaling driver, because
in that case the turbo range corresponds to a small fraction of the frequency
band it can use (1 MHz vs 1 GHz or more).
In consequence, it will only go to
the turbo range for the highest loads and the other loads above 50% that might
benefit from running at turbo frequencies will be given non-turbo P-states
instead.

One more issue related to that may appear on systems supporting the
`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
turbo threshold. Namely, if that is not coordinated with the lists of P-states
returned by ``_PSS`` properly, there may be more than one item corresponding to
a turbo P-state in those lists and there may be a problem with avoiding the
turbo range (if desirable or necessary). Usually, to avoid using turbo
P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
by ``_PSS``, but that is not sufficient when there are other turbo P-states in
the list returned by it.

Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
`passive mode <Passive Mode_>`_, except that the number of P-states it can set
is limited to the ones listed by the ACPI ``_PSS`` objects.


Kernel Command Line Options for ``intel_pstate``
================================================

Several kernel command line options can be used to pass early-configuration-time
parameters to ``intel_pstate`` in order to enforce specific behavior of it. All
of them have to be prepended with the ``intel_pstate=`` prefix.
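
To see which ``intel_pstate=`` options a kernel was booted with, the command
line can be inspected from a shell. The following is a minimal sketch; the
helper name ``pstate_cmdline_opts`` is hypothetical, and the function simply
extracts every word carrying the prefix without checking whether the driver
recognizes it::

    # Print each intel_pstate= option in a command line string, one per line.
    # Without an argument, the command line of the running kernel is used.
    pstate_cmdline_opts() (
        set -f    # no pathname expansion while splitting into words
        cmdline=${1:-$(cat /proc/cmdline)}
        for word in $cmdline; do
            case "$word" in
            intel_pstate=*)
                printf '%s\n' "${word#intel_pstate=}"
                ;;
            esac
        done
    )

For example, ``pstate_cmdline_opts "quiet intel_pstate=passive"`` prints
``passive``.
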

``disable``
	Do not register ``intel_pstate`` as the scaling driver even if the
	processor is supported by it.

``active``
	Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
	with.

``passive``
	Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
	start with.

``force``
	Register ``intel_pstate`` as the scaling driver instead of
	``acpi-cpufreq`` even if the latter is preferred on the given system.

	This may prevent some platform features (such as thermal controls and
	power capping) that rely on the availability of ACPI P-states
	information from functioning as expected, so it should be used with
	caution.

	This option does not work with processors that are not supported by
	``intel_pstate`` and on platforms where the ``pcc-cpufreq`` scaling
	driver is used instead of ``acpi-cpufreq``.

``no_hwp``
	Do not enable the hardware-managed P-states (HWP) feature even if it is
	supported by the processor.

``hwp_only``
	Register ``intel_pstate`` as the scaling driver only if the
	hardware-managed P-states (HWP) feature is supported by the processor.

``support_acpi_ppc``
	Take ACPI ``_PPC`` performance limits into account.

	If the preferred power management profile in the FADT (Fixed ACPI
	Description Table) is set to "Enterprise Server" or "Performance
	Server", the ACPI ``_PPC`` limits are taken into account by default
	and this option has no effect.

``per_cpu_perf_limits``
	Use per-logical-CPU P-State limits (see `Coordination of P-state
	Limits`_ for details).


Diagnostics and Tuning
======================

Trace Events
------------

There are two static trace events that can be used for ``intel_pstate``
diagnostics. One of them is the ``cpu_frequency`` trace event generally used
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
to ``intel_pstate``. Both of them are triggered by ``intel_pstate`` only if
it works in the `active mode <Active Mode_>`_.

The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::

 # cd /sys/kernel/debug/tracing/
 # echo 1 > events/power/pstate_sample/enable
 # echo 1 > events/power/cpu_frequency/enable
 # cat trace
 gnome-terminal--4510  [001] ..s.  1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
 cat-5235  [002] ..s.  1177.681723: cpu_frequency: state=2900000 cpu_id=2

If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
core (for the policies with other scaling governors).

``ftrace``
----------

The ``ftrace`` interface can be used for low-level diagnostics of
``intel_pstate``. For example, to check how often the function to set a
P-state is called, the ``ftrace`` filter can be set to
:c:func:`intel_pstate_set_pstate`::

 # cd /sys/kernel/debug/tracing/
 # cat available_filter_functions | grep -i pstate
 intel_pstate_set_pstate
 intel_pstate_cpu_init
 ...
 # echo intel_pstate_set_pstate > set_ftrace_filter
 # echo function > current_tracer
 # cat trace | head -15
 # tracer: function
 #
 # entries-in-buffer/entries-written: 80/80   #P:4
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
             Xorg-3129  [000] ..s.  2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func
  gnome-terminal--4510  [002] ..s.  2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func
      gnome-shell-3409  [001] ..s.  2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func
           <idle>-0     [000] ..s.  2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func


References
==========

.. [1] Kristen Accardi, *Balancing Power and Performance in the Linux Kernel*,
       https://events.static.linuxfound.org/sites/events/files/slides/LinuxConEurope_2015.pdf

.. [2] *Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3: System Programming Guide*,
       https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-system-programming-manual-325384.html

.. [3] *Advanced Configuration and Power Interface Specification*,
       https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf