.. SPDX-License-Identifier: GPL-2.0

===========================
The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, because it saves at least a trip through the scheduler, which
normally costs on the order of a few microseconds, although the performance
benefits are workload dependent. If no wakeup source arrives during the polling
interval, or some other task on the runqueue is runnable, the scheduler is
invoked. Halt polling is thus especially useful for workloads with very short
wakeup periods, where the time spent halt polling is minimised and the time
saved by not invoking the scheduler is noticeable.

The generic halt polling code is implemented in:

	virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

	arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

	kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

	kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

If a wakeup source is received within the halt polling interval, the interval
is left unchanged. If a wakeup source isn't received during the polling
interval (and thus schedule is invoked), there are two cases: either the
polling interval and total block time[0] were both less than the global max
polling interval (see module params below), or the total block time was
greater than the global max polling interval.
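
The cases above can be summarised as a small decision function. This is a
minimal user-space sketch that models the prose, not the kernel source: the
names classify_block() and poll_adjust are hypothetical, and treating
"block time no longer than the polling interval" as "the wakeup arrived while
polling" is an assumption made for illustration.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical names: this models the prose above, not the kernel source. */
    enum poll_adjust { POLL_KEEP, POLL_GROW, POLL_SHRINK };

    /*
     * poll_ns:     current per-vcpu (or per-vcore) polling interval
     * block_ns:    total block time of the halt that just ended
     * max_poll_ns: global max polling interval (halt_poll_ns)
     */
    enum poll_adjust classify_block(uint64_t poll_ns, uint64_t block_ns,
                                    uint64_t max_poll_ns)
    {
        /* Assumption: the wakeup arrived while polling, so keep the interval. */
        if (block_ns <= poll_ns)
            return POLL_KEEP;
        /* Blocked for longer than the ceiling: polling could not have helped. */
        if (block_ns > max_poll_ns)
            return POLL_SHRINK;
        /* Interval and block time both below the ceiling: try a longer poll. */
        if (poll_ns < max_poll_ns && block_ns < max_poll_ns)
            return POLL_GROW;
        /* Boundary cases the text does not cover: assume no change. */
        return POLL_KEEP;
    }

For example, with an illustrative ceiling of 200000 ns, a vcpu polling for
10000 ns that blocked for 50000 ns would be classified as POLL_GROW, while one
that blocked for 300000 ns would be classified as POLL_SHRINK.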

In the event that both the polling interval and total block time were less than
the global max polling interval, the polling interval can be increased in the
hope that next time, during the longer polling interval, the wakeup source will
be received while the host is polling and the latency benefit will be realised.
The polling interval is grown in the function grow_halt_poll_ns(): it is
multiplied by the module parameter halt_poll_ns_grow, or, if it is currently
zero, set to the module parameter halt_poll_ns_grow_start.
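
The grow step can be sketched as follows. This is a simplified user-space
model rather than the kernel's grow_halt_poll_ns(): the function name
grow_poll_ns() and its signature are hypothetical, and clamping the result is
an assumption based on halt_poll_ns being the ceiling for each vcpu's interval.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper, not the kernel function.
     * poll_ns:     current polling interval
     * grow:        multiplier (halt_poll_ns_grow)
     * grow_start:  first non-zero interval (halt_poll_ns_grow_start)
     * max_poll_ns: global max polling interval (halt_poll_ns)
     */
    uint64_t grow_poll_ns(uint64_t poll_ns, uint64_t grow,
                          uint64_t grow_start, uint64_t max_poll_ns)
    {
        if (poll_ns == 0)
            poll_ns = grow_start;   /* growing from zero */
        else
            poll_ns *= grow;        /* multiplicative growth */

        /* Assumption: the interval never exceeds the global ceiling. */
        if (poll_ns > max_poll_ns)
            poll_ns = max_poll_ns;
        return poll_ns;
    }

With halt_poll_ns_grow = 2 and halt_poll_ns_grow_start = 10000, successive
calls in this model take an interval from 0 to 10000, 20000, 40000 and so on
until the ceiling is reached.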

In the event that the total block time was greater than the global max polling
interval, the host will never poll for long enough (limited by the global max)
to be woken during the polling interval, so the interval may as well be shrunk
in order to avoid pointless polling. The polling interval is shrunk in the
function shrink_halt_poll_ns(): it is divided by the module parameter
halt_poll_ns_shrink, or set to 0 if halt_poll_ns_shrink == 0.
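
The shrink step can be sketched in the same way. Again this is a simplified
model, not the kernel's shrink_halt_poll_ns(); the name shrink_poll_ns() and
its signature are hypothetical.

.. code-block:: c

    #include <stdint.h>

    /*
     * Hypothetical helper, not the kernel function.
     * poll_ns: current polling interval
     * shrink:  divisor (halt_poll_ns_shrink); 0 means "reset to zero"
     */
    uint64_t shrink_poll_ns(uint64_t poll_ns, uint64_t shrink)
    {
        if (shrink == 0)
            return 0;               /* stop polling until it is grown again */
        return poll_ns / shrink;    /* integer division */
    }

With halt_poll_ns_shrink = 0, a single over-long block resets the interval to
zero in this model, after which it has to be grown again from
halt_poll_ns_grow_start.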

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval, but it will only really do a good job for
wakeups which come at an approximately constant rate; otherwise there will be
constant adjustment of the polling interval.

[0] total block time:
		      the time between when the halt polling function is
		      invoked and a wakeup source received (irrespective of
		      whether the scheduler is invoked within that function).

Module Parameters
=================

The kvm module has 4 tunable module parameters to adjust the global max
polling interval, the initial value to grow to from zero, and the rate at which
the polling interval is grown and shrunk. These variables are defined in
include/linux/kvm_host.h and as module parameters in virt/kvm/kvm_main.c, or
arch/powerpc/kvm/book3s_hv.c in the powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter       |   Description             |        Default Value    |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns           | The global max polling    | KVM_HALT_POLL_NS_DEFAULT|
|                       | interval which defines    |                         |
|                       | the ceiling value of the  |                         |
|                       | polling interval for      | (per arch value)        |
|                       | each vcpu.                |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow      | The value by which the    | 2                       |
|                       | halt polling interval is  |                         |
|                       | multiplied in the         |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow_start| The initial value to grow | 10000                   |
|                       | to from zero in the       |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink    | The value by which the    | 0                       |
|                       | halt polling interval is  |                         |
|                       | divided in the            |                         |
|                       | shrink_halt_poll_ns()     |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+

These module parameters can be set from the sysfs files in:

	/sys/module/kvm/parameters/

Note: these module parameters are system-wide values and cannot be tuned on a
per-vm basis.
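
As an illustration, a program could read the current value of one of these
parameters as sketched below. The helper read_kvm_param() is hypothetical, and
the sketch assumes that each parameter file contains a single unsigned decimal
integer.

.. code-block:: c

    #include <stddef.h>
    #include <stdio.h>

    /*
     * Hypothetical helper: read the unsigned integer stored in <dir>/<name>.
     * Returns 0 on success, -1 if the file cannot be opened or parsed.
     */
    int read_kvm_param(const char *dir, const char *name,
                       unsigned long long *out)
    {
        char path[256];
        FILE *f;
        int n, ok;

        n = snprintf(path, sizeof(path), "%s/%s", dir, name);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;              /* path did not fit in the buffer */

        f = fopen(path, "r");
        if (!f)
            return -1;              /* e.g. the kvm module is not loaded */

        ok = (fscanf(f, "%llu", out) == 1);
        fclose(f);
        return ok ? 0 : -1;
    }

A caller would pass "/sys/module/kvm/parameters" as the directory and, for
example, "halt_poll_ns" as the name. Writing a new value works the same way in
reverse but normally requires root privileges.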

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter, as a large value
  has the potential to drive the cpu usage to 100% on a machine which would be almost
  entirely idle otherwise. This is because even if a guest has wakeups during which very
  little work is done and which are quite far apart, if the period is shorter than the
  global max polling interval (halt_poll_ns) then the host will always poll for the
  entire block time and thus cpu utilisation will go to 100%.

- Halt polling essentially presents a trade-off between power usage and latency, and
  the module parameters should be used to tune the balance between the two. Idle cpu
  time is essentially converted to host kernel time with the aim of decreasing latency
  when entering the guest.

- Halt polling will only be conducted by the host when no other tasks are runnable on
  that cpu; otherwise the polling will cease immediately and schedule will be invoked to
  allow that other task to run. Thus halt polling does not allow a guest to mount a
  denial-of-service attack on the cpu.
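
The first note above can be made concrete with a little arithmetic. The sketch
below is a deliberately simplified model, not a measurement of KVM: the
function busy_fraction() is hypothetical, and it assumes a guest that wakes at
a fixed period, does a fixed amount of work per wakeup, and is polled for with
a fixed interval that has already settled.

.. code-block:: c

    /*
     * Hypothetical model of host cpu utilisation for one periodic guest.
     * period_ns: time between consecutive wakeups
     * work_ns:   time the guest runs per wakeup
     * poll_ns:   polling interval in effect (assumed constant here)
     * Returns the fraction of each period the host cpu is busy.
     */
    double busy_fraction(double period_ns, double work_ns, double poll_ns)
    {
        double idle_ns = period_ns - work_ns;
        /* The host polls until the wakeup arrives or the interval expires. */
        double polled_ns = idle_ns < poll_ns ? idle_ns : poll_ns;

        return (work_ns + polled_ns) / period_ns;
    }

In this model a guest that wakes every 100000 ns and runs for 5000 ns keeps
the cpu 5% busy without polling, but 100% busy once the polling interval
covers the 95000 ns idle gap.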