.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by the Nest IMC microcode, which runs in the
OCC (On-Chip Controller) complex. The microcode collects the counter data and
moves the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core-level PMU
counters provide the IMC counter data per core, and thread-level PMU counters
provide the IMC counter data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree. The event information
contains:

- Event name
- Event Offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties for those events must be
inherited from the PMU.

The event offset is the location in memory where the counter data is
accumulated.
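
The event properties listed above, including the inheritance of scale and unit
from the PMU, can be modelled with a short Python sketch. It is only an
illustration: the class names, the `resolve_scale_unit()` helper and the sample
values in the usage note are hypothetical and are not taken from the kernel or
from the IMC catalog.

.. code-block:: python

  from dataclasses import dataclass
  from typing import Optional, Tuple


  @dataclass
  class Pmu:
      """Hypothetical model of a PMU with optional common scale/unit."""
      name: str
      scale: Optional[str] = None
      unit: Optional[str] = None


  @dataclass
  class Event:
      """Hypothetical model of one event as described above."""
      name: str
      offset: int          # where the counter data is accumulated
      description: str
      scale: Optional[str] = None
      unit: Optional[str] = None


  def resolve_scale_unit(pmu: Pmu, event: Event) -> Tuple[Optional[str], Optional[str]]:
      """Return (scale, unit), inheriting from the PMU when the event has none."""
      scale = event.scale if event.scale is not None else pmu.scale
      unit = event.unit if event.unit is not None else pmu.unit
      return scale, unit

For example, with a made-up PMU `Pmu("example_pmu", scale="0.5", unit="MiB")`
and an event that carries no scale or unit of its own, `resolve_scale_unit()`
returns the PMU-wide values; an event with its own scale keeps it.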

The IMC catalog is available at:
	https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node, which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their event information and registers each PMU and its attributes in the
kernel.
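
Once a PMU is registered, its events are normally visible to userspace through
the generic perf sysfs layout (`/sys/bus/event_source/devices/<pmu>/events/`).
The following Python sketch reads such a directory. It assumes that the IMC
PMUs follow this generic layout, with optional `<event>.scale` and
`<event>.unit` files next to each event file; the function name and the
returned dictionary shape are illustrative choices, not a kernel interface.

.. code-block:: python

  import os


  def read_pmu_events(pmu_dir):
      """Collect event attributes from <pmu_dir>/events.

      Returns {event_name: {"config": ..., "scale": ..., "unit": ...}},
      where "scale" and "unit" are present only if the files exist.
      """
      events = {}
      events_dir = os.path.join(pmu_dir, "events")
      for entry in sorted(os.listdir(events_dir)):
          with open(os.path.join(events_dir, entry)) as f:
              value = f.read().strip()
          base, dot, suffix = entry.rpartition(".")
          if dot and suffix in ("scale", "unit"):
              # Companion file such as "<event>.scale" or "<event>.unit".
              events.setdefault(base, {})[suffix] = value
          else:
              # The event file itself holds the event encoding string.
              events.setdefault(entry, {})["config"] = value
      return events

On a system with IMC support this could be called as
`read_pmu_events("/sys/bus/event_source/devices/core_imc")`, assuming that
directory exists.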

IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]

To see per-chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
==============

POWER9 supports two IMC modes: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration. On each overflow in the
CPMCxSEL, hardware snapshots the program counter along with event counts and
writes them into the memory pointed to by LDBAR.

LDBAR is a 64-bit special-purpose per-thread register. It has bits to indicate
whether hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+
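
As an illustration of the layout above, the following Python sketch composes an
LDBAR value from its fields. It assumes the usual Power convention in which
bit 0 is the most significant bit of the 64-bit register. The helper names are
hypothetical, and `counter_field` is the raw value placed in bits 8:50; how a
physical address maps onto that field is not covered here.

.. code-block:: python

  def ibm_field(value, first_bit, last_bit, width=64):
      """Shift value into bits first_bit:last_bit, where bit 0 is the MSB."""
      nbits = last_bit - first_bit + 1
      if not 0 <= value < (1 << nbits):
          raise ValueError("value does not fit in %d bits" % nbits)
      return value << (width - 1 - last_bit)


  def make_ldbar(enable, trace_mode, pb_scope=0, counter_field=0):
      """Compose a 64-bit LDBAR value from the fields in the table above."""
      return (ibm_field(int(enable), 0, 0)             # bit 0: Enable/Disable
              | ibm_field(int(trace_mode), 1, 1)       # bit 1: 0 = accumulation, 1 = trace
              | ibm_field(pb_scope, 4, 6)              # bits 4:6: PB scope
              | ibm_field(counter_field, 8, 50))       # bits 8:50: Counter Address

Under these assumptions, `hex(make_ldbar(True, True))` gives
`0xc000000000000000` (enabled, trace mode) and `hex(make_ldbar(True, False))`
gives `0x8000000000000000` (enabled, accumulation mode).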

TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+
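
The bit ranges in this table can be turned into a small pack/unpack helper.
The Python sketch below is only an illustration: it assumes bit 0 is the most
significant bit of the 64-bit value, the function names are hypothetical, and
the field values in the usage note are arbitrary rather than real event
encodings.

.. code-block:: python

  # (first_bit, last_bit) per field, copied from the table above.
  TRACE_IMC_SCOM_FIELDS = {
      "SAMPSEL": (0, 1),
      "CPMC_LOAD": (2, 33),
      "CPMC1SEL": (34, 40),
      "CPMC2SEL": (41, 47),
      "BUFFERSIZE": (48, 50),
  }


  def pack_scom(**values):
      """Build a 64-bit value from named fields; unnamed fields stay zero."""
      word = 0
      for name, value in values.items():
          first, last = TRACE_IMC_SCOM_FIELDS[name]
          nbits = last - first + 1
          if not 0 <= value < (1 << nbits):
              raise ValueError("%s does not fit in %d bits" % (name, nbits))
          word |= value << (63 - last)
      return word


  def unpack_scom(word):
      """Split a 64-bit value back into its named fields."""
      fields = {}
      for name, (first, last) in TRACE_IMC_SCOM_FIELDS.items():
          nbits = last - first + 1
          fields[name] = (word >> (63 - last)) & ((1 << nbits) - 1)
      return fields

For instance, `unpack_scom(pack_scom(SAMPSEL=1, CPMC_LOAD=0x1234, BUFFERSIZE=3))`
returns the same three values, with `CPMC1SEL` and `CPMC2SEL` reported as zero.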

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around when the
memory buffer reaches its end.
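
The quiet wrap-around has a practical consequence: records that are not read in
time are overwritten without any notification. The Python sketch below models
only that behaviour. The class name is hypothetical, and treating the buffer as
a fixed number of equally sized record slots is an assumption made for the
illustration, not a description of the hardware record format.

.. code-block:: python

  class WrappingTraceBuffer:
      """Toy model of a trace buffer that wraps silently when full."""

      def __init__(self, num_records):
          self.slots = [None] * num_records
          self.next_slot = 0

      def write(self, record):
          # No exception or overflow signal: the oldest slot is overwritten.
          self.slots[self.next_slot] = record
          self.next_slot = (self.next_slot + 1) % len(self.slots)

Writing six records into a four-slot buffer leaves the two newest records in
slots 0 and 1 and the next write position at slot 2, so a consumer has to drain
the buffer faster than the hardware fills it.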

*Currently, the event monitored in trace mode is fixed as cycles.*

Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [...]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with the trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using `perf report`.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
mode snapshots the program counter and writes it to memory. This also provides
a way for the operating system to do instruction sampling in real time without
PMI processing overhead.

The following compares the PMI counts in `/proc/interrupts` when `perf top` is
run first without and then with the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI counts do not increment when using the `trace_imc` event.
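
The before/after comparison can also be done programmatically. The Python
sketch below parses the `PMI:` line from text in the `/proc/interrupts` format
shown above and returns how many PMIs occurred between two snapshots. The
function names are illustrative, and the parser assumes the per-CPU counts are
the leading numeric columns of that line.

.. code-block:: python

  def pmi_counts(interrupts_text):
      """Return the per-CPU PMI counts found in /proc/interrupts text."""
      for line in interrupts_text.splitlines():
          fields = line.split()
          if fields and fields[0] == "PMI:":
              counts = []
              for field in fields[1:]:
                  if not field.isdigit():
                      break  # reached the textual description
                  counts.append(int(field))
              return counts
      return []


  def pmi_delta(before_text, after_text):
      """Total number of PMIs raised between two snapshots."""
      return sum(pmi_counts(after_text)) - sum(pmi_counts(before_text))

Applied to two identical snapshots, such as the last two `grep` outputs above,
`pmi_delta()` returns 0.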