=========================
Hardware Latency Detector
=========================

Introduction
-------------

The tracer hwlat_detector is a special-purpose tracer that is used to
detect large system latencies induced by the behavior of certain underlying
hardware or firmware, independent of Linux itself. The code was developed
originally to detect SMIs (System Management Interrupts) on x86 systems;
however, there is nothing x86-specific about this patchset. It was
originally written for use by the "RT" patch since the Real Time
kernel is highly latency sensitive.

SMIs are not serviced by the Linux kernel, which means that it does not
even know that they are occurring. SMIs are instead set up by BIOS code
and are serviced by BIOS code, usually for "critical" events such as
management of thermal sensors and fans. Sometimes though, SMIs are used for
other tasks and those tasks can spend an inordinate amount of time in the
handler (sometimes measured in milliseconds). Obviously this is a problem if
you are trying to keep event service latencies down in the microsecond range.

The hardware latency detector works by hogging one of the CPUs for configurable
amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter
(TSC) for some period, then looking for gaps in the TSC data. Any gap indicates
a time when the polling was interrupted. Since interrupts are disabled,
the only thing that could do that would be an SMI or other hardware hiccup
(or an NMI, but those can be tracked).

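The gap search described above can be illustrated outside the kernel. The
following is a minimal sketch, not the tracer's implementation: it assumes a
list of timestamps in microseconds on standard input, one per line, and reports
every pair of consecutive samples that are further apart than a threshold. The
function name find_gaps and the input format are assumptions made for
illustration::

	# find_gaps THRESHOLD_US
	# Read one timestamp (usecs) per line from stdin and print
	# "<previous> <current> <gap>" for each gap above the threshold.
	find_gaps() {
		awk -v thresh="$1" '
			NR > 1 && $1 - prev > thresh { print prev, $1, $1 - prev }
			{ prev = $1 }
		'
	}

	# Samples about 1 usec apart, with one 150 usec hole.
	printf '%s\n' 1000 1001 1002 1152 1153 | find_gaps 10

With a threshold of 10 usecs, only the step from 1002 to 1152 is reported.
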
Note that the hwlat detector should *NEVER* be used in a production environment.
It is intended to be run manually to determine if the hardware platform has a
problem with long system firmware service routines.

Usage
------

Write the ASCII text "hwlat" into the current_tracer file of the tracing system
(mounted at /sys/kernel/tracing or /sys/kernel/debug/tracing). It is possible to
redefine the threshold in microseconds (us) above which latency spikes will
be taken into account.

Example::

	# echo hwlat > /sys/kernel/tracing/current_tracer
	# echo 100 > /sys/kernel/tracing/tracing_thresh

The /sys/kernel/tracing/hwlat_detector interface contains the following files:

  - width - time period to sample with CPUs held (usecs)
            must be less than the total window size (enforced)
  - window - total period of sampling, width being inside (usecs)

By default the width is set to 500,000 and window to 1,000,000, meaning that
for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs
(0.5s). If tracing_thresh contains zero when the hwlat tracer is enabled, it will
change to a default of 10 usecs. If any latency that exceeds the threshold is
observed, the data will be written to the tracing ring buffer.

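As a rough sketch of how these values might be adjusted from a script, the
helper below writes a new width and window after checking that width is smaller
than window. The function name hwlat_set_timing and the TRACING variable are
assumptions for illustration; TRACING defaults to /sys/kernel/tracing and the
writes need root. The write order is chosen so that width stays below window
after every step; that an intermediate state violating this would be rejected
is an assumption based on the "enforced" note above::

	TRACING=${TRACING:-/sys/kernel/tracing}

	# hwlat_set_timing WIDTH_US WINDOW_US
	hwlat_set_timing() {
		new_width=$1
		new_window=$2
		dir=$TRACING/hwlat_detector

		if [ "$new_width" -ge "$new_window" ]; then
			echo "width must be less than window" >&2
			return 1
		fi

		cur_width=$(cat "$dir/width") || return 1
		if [ "$new_window" -gt "$cur_width" ]; then
			# New window is above the current width: write it first.
			echo "$new_window" > "$dir/window" &&
			echo "$new_width" > "$dir/width"
		else
			# Otherwise lower width first so it stays below window.
			echo "$new_width" > "$dir/width" &&
			echo "$new_window" > "$dir/window"
		fi
	}

For example, calling "hwlat_set_timing 250000 1000000" would ask the detector
to spin for 0.25s out of every 1s.
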
The minimum sleep time between periods is 1 millisecond, even if width
is less than 1 millisecond apart from window. This keeps the system from
being totally starved.

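The relation between width, window and the sleep between sampling periods can
be written out as simple arithmetic. This is a sketch of the rule described
above, not the kernel's code; the function name hwlat_sleep_us is hypothetical::

	# hwlat_sleep_us WIDTH_US WINDOW_US
	# Print the time (usecs) left in the window after sampling,
	# never less than 1000 usecs (1 millisecond).
	hwlat_sleep_us() {
		sleep_us=$(( $2 - $1 ))
		if [ "$sleep_us" -lt 1000 ]; then
			sleep_us=1000
		fi
		echo "$sleep_us"
	}

	hwlat_sleep_us 500000 1000000	# defaults: prints 500000
	hwlat_sleep_us 999900 1000000	# only 100 usecs left: prints 1000
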
If tracing_thresh was zero when the hwlat detector was started, it will be set
back to zero if another tracer is loaded. Note that the last value in
tracing_thresh that the hwlat detector had will be saved, and this value will
be restored in tracing_thresh if it is still zero when the hwlat detector is
started again.

The following tracing directory files are used by the hwlat_detector:

in /sys/kernel/tracing:

 - tracing_thresh	- minimum latency value to be considered (usecs)
 - tracing_max_latency	- maximum hardware latency actually observed (usecs)
 - tracing_cpumask	- the CPUs to move the hwlat thread across
 - hwlat_detector/width	- specified amount of time to spin within window (usecs)
 - hwlat_detector/window	- amount of time between (width) runs (usecs)

The hwlat detector's kernel thread will migrate across each CPU specified in
tracing_cpumask between each window. To limit the migration, either modify
tracing_cpumask, or modify the CPU affinity of the hwlat kernel thread
(named [hwlatd]) directly, and the migration will stop.
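
A minimal sketch of the first option, assuming tracefs is mounted at
/sys/kernel/tracing and that the commands run as root. tracing_cpumask takes a
hexadecimal CPU mask, so 3 selects CPU 0 and CPU 1. The helper name
hwlat_limit_cpus and the TRACING variable are assumptions for illustration::

	TRACING=${TRACING:-/sys/kernel/tracing}

	# hwlat_limit_cpus HEX_MASK
	# Restrict the CPUs the hwlat thread moves across, then print
	# the mask as it now reads back.
	hwlat_limit_cpus() {
		echo "$1" > "$TRACING/tracing_cpumask" || return 1
		cat "$TRACING/tracing_cpumask"
	}

Calling "hwlat_limit_cpus 3" would keep the thread on the first two CPUs.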