Overhead calculation
--------------------
The overhead can be shown in two columns, 'Children' and 'Self', when
perf collects callchains.  The 'self' overhead is calculated by adding
all period values of the entry, which is usually a function (symbol).
This is the value that perf has traditionally shown, and the sum of all
the 'self' overhead values should be 100%.

The 'children' overhead is calculated by adding all period values of
the child functions to the entry's own 'self' overhead, so that it
shows the total overhead of higher-level functions even if they don't
directly execute much.  'Children' here means functions that are called
from another (parent) function.

It might be confusing that the sum of all the 'children' overhead
values exceeds 100%.  This happens because each of them is already an
accumulation of the 'self' overhead of its child functions, so the same
period is counted once for every function in the callchain.  But with
this enabled, users can find which function has the most overhead even
if samples are spread over the children.

Consider the following example with three functions:

-----------------------
void foo(void) {
    /* do something */
}

void bar(void) {
    /* do something */
    foo();
}

int main(void) {
    bar();
    return 0;
}
-----------------------

In this case 'foo' is a child of 'bar', and 'bar' is an immediate
child of 'main', so 'foo' is also a child of 'main'.  In other words,
'main' is a parent of 'foo' and 'bar', and 'bar' is a parent of 'foo'.

Suppose all samples are recorded in 'foo' and 'bar' only.  When the
program is recorded with callchains, the usual (self-overhead-only)
output of perf report will look something like this:

----------------------------------
Overhead  Symbol
........  .....................
  60.00%  foo
          |
          --- foo
              bar
              main
              __libc_start_main

  40.00%  bar
          |
          --- bar
              main
              __libc_start_main
----------------------------------

When the --children option is enabled, the 'self' overhead values of
child functions (i.e. 'foo' and 'bar') are added to the parents to
calculate the 'children' overhead.  In this case the report could be
displayed as:

-------------------------------------------
Children      Self  Symbol
........  ........  ....................
 100.00%     0.00%  __libc_start_main
          |
          --- __libc_start_main

 100.00%     0.00%  main
          |
          --- main
              __libc_start_main

 100.00%    40.00%  bar
          |
          --- bar
              main
              __libc_start_main

  60.00%    60.00%  foo
          |
          --- foo
              bar
              main
              __libc_start_main
-------------------------------------------

In the above output, the 'self' overhead of 'foo' (60%) was added to the
'children' overhead of 'bar', 'main' and '\_\_libc_start_main'.
Likewise, the 'self' overhead of 'bar' (40%) was added to the
'children' overhead of 'main' and '\_\_libc_start_main'.

So '\_\_libc_start_main' and 'main' are shown first since they have the
same (100%) 'children' overhead (even though they have zero 'self'
overhead) and they are the parents of 'foo' and 'bar'.
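
The accumulation described above can be sketched in C as follows.  This
is only an illustration, not perf's actual implementation: the
`struct sample` and `struct entry` types, the helper names and the
fixed-size table are hypothetical, and it assumes that a symbol appears
at most once in each callchain (no recursion).  Each sample's period is
credited to the sampled function as 'self' and to every function in its
callchain as 'children'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 8   /* hypothetical limit on callchain depth */
#define MAX_SYMS  16  /* hypothetical table size, not bounds-checked */

/* One sample: its period and its callchain, sampled function first. */
struct sample {
    unsigned long period;
    int depth;
    const char *chain[MAX_DEPTH];
};

/* Per-symbol totals, in period units. */
struct entry {
    const char *sym;
    unsigned long self;      /* periods sampled in this symbol itself */
    unsigned long children;  /* self plus periods of everything it called */
};

/* Find the entry for sym, creating a zeroed one if it does not exist. */
static struct entry *lookup(struct entry *tab, int *n, const char *sym)
{
    for (int i = 0; i < *n; i++)
        if (strcmp(tab[i].sym, sym) == 0)
            return &tab[i];
    tab[*n] = (struct entry){ .sym = sym };
    return &tab[(*n)++];
}

/*
 * Credit each period to the sampled function as 'self' and to every
 * symbol in the callchain (sampled function included) as 'children'.
 * Returns the total period, which is the 100% reference value.
 */
static unsigned long accumulate(const struct sample *s, int ns,
                                struct entry *tab, int *n)
{
    unsigned long total = 0;

    for (int i = 0; i < ns; i++) {
        total += s[i].period;
        lookup(tab, n, s[i].chain[0])->self += s[i].period;
        for (int d = 0; d < s[i].depth; d++)
            lookup(tab, n, s[i].chain[d])->children += s[i].period;
    }
    return total;
}

/* Order entries by 'children', largest first. */
static int by_children_desc(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;

    return (x->children < y->children) - (x->children > y->children);
}

/* Build the two callchains from the example and print both columns. */
void print_example(void)
{
    const struct sample samples[] = {
        { 60, 4, { "foo", "bar", "main", "__libc_start_main" } },
        { 40, 3, { "bar", "main", "__libc_start_main" } },
    };
    struct entry tab[MAX_SYMS];
    int n = 0;
    unsigned long total = accumulate(samples, 2, tab, &n);

    qsort(tab, n, sizeof(tab[0]), by_children_desc);
    printf("Children      Self  Symbol\n");
    for (int i = 0; i < n; i++)
        printf("%7.2f%%  %7.2f%%  %s\n",
               100.0 * tab[i].children / total,
               100.0 * tab[i].self / total, tab[i].sym);
}
```

Calling print_example() from a main() function should give a 'Children'
value of 100.00% for '\_\_libc_start_main', 'main' and 'bar' and 60.00%
for 'foo', with 'Self' values of 40.00% for 'bar' and 60.00% for 'foo'.
Rows with equal 'children' values may come out in any order, since
qsort() is not a stable sort.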

Since v3.16, the 'children' overhead is shown by default and the output
is sorted by its values.  It can be disabled by specifying the
--no-children option on the command line or by adding
'report.children = false' or 'top.children = false' to the perf config
file.
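
As a sketch, the two settings named above would look like this in a
perf config file.  The choice of file is an assumption here (for
example a per-user ~/.perfconfig); use whichever config file your perf
installation reads.

```ini
[report]
	children = false

[top]
	children = false
```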