1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2<html>
3<!-- This file documents the gprof profiler of the GNU system.
4
5Copyright (C) 1988-2021 Free Software Foundation, Inc.
6
7Permission is granted to copy, distribute and/or modify this document
8under the terms of the GNU Free Documentation License, Version 1.3
9or any later version published by the Free Software Foundation;
10with no Invariant Sections, with no Front-Cover Texts, and with no
11Back-Cover Texts.  A copy of the license is included in the
12section entitled "GNU Free Documentation License".
13 -->
14<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
15<head>
16<title>GNU gprof: Internals</title>
17
18<meta name="description" content="GNU gprof: Internals">
19<meta name="keywords" content="GNU gprof: Internals">
20<meta name="resource-type" content="document">
21<meta name="distribution" content="global">
22<meta name="Generator" content="makeinfo">
23<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
24<link href="index.html#Top" rel="start" title="Top">
25<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
26<link href="Details.html#Details" rel="up" title="Details">
27<link href="Debugging.html#Debugging" rel="next" title="Debugging">
28<link href="File-Format.html#File-Format" rel="previous" title="File Format">
29<style type="text/css">
30<!--
31a.summary-letter {text-decoration: none}
32blockquote.smallquotation {font-size: smaller}
33div.display {margin-left: 3.2em}
34div.example {margin-left: 3.2em}
35div.indentedblock {margin-left: 3.2em}
36div.lisp {margin-left: 3.2em}
37div.smalldisplay {margin-left: 3.2em}
38div.smallexample {margin-left: 3.2em}
39div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
40div.smalllisp {margin-left: 3.2em}
41kbd {font-style:oblique}
42pre.display {font-family: inherit}
43pre.format {font-family: inherit}
44pre.menu-comment {font-family: serif}
45pre.menu-preformatted {font-family: serif}
46pre.smalldisplay {font-family: inherit; font-size: smaller}
47pre.smallexample {font-size: smaller}
48pre.smallformat {font-family: inherit; font-size: smaller}
49pre.smalllisp {font-size: smaller}
50span.nocodebreak {white-space:nowrap}
51span.nolinebreak {white-space:nowrap}
52span.roman {font-family:serif; font-weight:normal}
53span.sansserif {font-family:sans-serif; font-weight:normal}
54ul.no-bullet {list-style: none}
55-->
56</style>
57
58
59</head>
60
61<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
62<a name="Internals"></a>
63<div class="header">
64<p>
65Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="previous">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
66</div>
67<hr>
68<a name="gprof_0027s-Internal-Operation"></a>
69<h3 class="section">9.3 <code>gprof</code>&rsquo;s Internal Operation</h3>
70
71<p>Like most programs, <code>gprof</code> begins by processing its options.
72During this stage, it may building its symspec list
73(<code>sym_ids.c:sym_id_add</code>), if
74options are specified which use symspecs.
75<code>gprof</code> maintains a single linked list of symspecs,
76which will eventually get turned into 12 symbol tables,
77organized into six include/exclude pairs&mdash;one
78pair each for the flat profile (INCL_FLAT/EXCL_FLAT),
79the call graph arcs (INCL_ARCS/EXCL_ARCS),
80printing in the call graph (INCL_GRAPH/EXCL_GRAPH),
81timing propagation in the call graph (INCL_TIME/EXCL_TIME),
82the annotated source listing (INCL_ANNO/EXCL_ANNO),
83and the execution count listing (INCL_EXEC/EXCL_EXEC).
84</p>
85<p>After option processing, <code>gprof</code> finishes
86building the symspec list by adding all the symspecs in
87<code>default_excluded_list</code> to the exclude lists
88EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is specified,
89EXCL_FLAT as well.
90These default excludes are not added to EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
91</p>
92<p>Next, the BFD library is called to open the object file,
93verify that it is an object file,
94and read its symbol table (<code>core.c:core_init</code>),
95using <code>bfd_canonicalize_symtab</code> after mallocing
96an appropriately sized array of symbols.  At this point,
97function mappings are read (if the &lsquo;<samp>--file-ordering</samp>&rsquo; option
98has been specified), and the core text space is read into
99memory (if the &lsquo;<samp>-c</samp>&rsquo; option was given).
100</p>
101<p><code>gprof</code>&rsquo;s own symbol table, an array of Sym structures,
102is now built.
103This is done in one of two ways, by one of two routines, depending
104on whether line-by-line profiling (&lsquo;<samp>-l</samp>&rsquo; option) has been
105enabled.
106For normal profiling, the BFD canonical symbol table is scanned.
107For line-by-line profiling, every
108text space address is examined, and a new symbol table entry
109gets created every time the line number changes.
110In either case, two passes are made through the symbol
111table&mdash;one to count the size of the symbol table required,
112and the other to actually read the symbols.  In between the
113two passes, a single array of type <code>Sym</code> is created of
114the appropriate length.
115Finally, <code>symtab.c:symtab_finalize</code>
116is called to sort the symbol table and remove duplicate entries
117(entries with the same memory address).
118</p>
119<p>The symbol table must be a contiguous array for two reasons.
120First, the <code>qsort</code> library function (which sorts an array)
121will be used to sort the symbol table.
122Also, the symbol lookup routine (<code>symtab.c:sym_lookup</code>),
123which finds symbols
124based on memory address, uses a binary search algorithm
125which requires the symbol table to be a sorted array.
126Function symbols are indicated with an <code>is_func</code> flag.
127Line number symbols have no special flags set.
128Additionally, a symbol can have an <code>is_static</code> flag
129to indicate that it is a local symbol.
130</p>
131<p>With the symbol table read, the symspecs can now be translated
132into Syms (<code>sym_ids.c:sym_id_parse</code>).  Remember that a single
133symspec can match multiple symbols.
134An array of symbol tables
135(<code>syms</code>) is created, each entry of which is a symbol table
136of Syms to be included or excluded from a particular listing.
137The master symbol table and the symspecs are examined by nested
138loops, and every symbol that matches a symspec is inserted
139into the appropriate syms table.  This is done twice, once to
140count the size of each required symbol table, and again to build
141the tables, which have been malloced between passes.
142From now on, to determine whether a symbol is on an include
143or exclude symspec list, <code>gprof</code> simply uses its
144standard symbol lookup routine on the appropriate table
145in the <code>syms</code> array.
146</p>
147<p>Now the profile data file(s) themselves are read
148(<code>gmon_io.c:gmon_out_read</code>),
149first by checking for a new-style &lsquo;<samp>gmon.out</samp>&rsquo; header,
150then assuming this is an old-style BSD &lsquo;<samp>gmon.out</samp>&rsquo;
151if the magic number test failed.
152</p>
153<p>New-style histogram records are read by <code>hist.c:hist_read_rec</code>.
154For the first histogram record, allocate a memory array to hold
155all the bins, and read them in.
156When multiple profile data files (or files with multiple histogram
157records) are read, the memory ranges of each pair of histogram records
158must be either equal, or non-overlapping.  For each pair of histogram
159records, the resolution (memory region size divided by the number of
160bins) must be the same.  The time unit must be the same for all
161histogram records. If the above containts are met, all histograms
162for the same memory range are merged.
163</p>
164<p>As each call graph record is read (<code>call_graph.c:cg_read_rec</code>),
165the parent and child addresses
166are matched to symbol table entries, and a call graph arc is
167created by <code>cg_arcs.c:arc_add</code>, unless the arc fails a symspec
168check against INCL_ARCS/EXCL_ARCS.  As each arc is added,
169a linked list is maintained of the parent&rsquo;s child arcs, and of the child&rsquo;s
170parent arcs.
171Both the child&rsquo;s call count and the arc&rsquo;s call count are
172incremented by the record&rsquo;s call count.
173</p>
174<p>Basic-block records are read (<code>basic_blocks.c:bb_read_rec</code>),
175but only if line-by-line profiling has been selected.
176Each basic-block address is matched to a corresponding line
177symbol in the symbol table, and an entry made in the symbol&rsquo;s
178bb_addr and bb_calls arrays.  Again, if multiple basic-block
179records are present for the same address, the call counts
180are cumulative.
181</p>
182<p>A gmon.sum file is dumped, if requested (<code>gmon_io.c:gmon_out_write</code>).
183</p>
184<p>If histograms were present in the data files, assign them to symbols
185(<code>hist.c:hist_assign_samples</code>) by iterating over all the sample
186bins and assigning them to symbols.  Since the symbol table
187is sorted in order of ascending memory addresses, we can
188simple follow along in the symbol table as we make our pass
189over the sample bins.
190This step includes a symspec check against INCL_FLAT/EXCL_FLAT.
191Depending on the histogram
192scale factor, a sample bin may span multiple symbols,
193in which case a fraction of the sample count is allocated
194to each symbol, proportional to the degree of overlap.
195This effect is rare for normal profiling, but overlaps
196are more common during line-by-line profiling, and can
197cause each of two adjacent lines to be credited with half
198a hit, for example.
199</p>
200<p>If call graph data is present, <code>cg_arcs.c:cg_assemble</code> is called.
201First, if &lsquo;<samp>-c</samp>&rsquo; was specified, a machine-dependent
202routine (<code>find_call</code>) scans through each symbol&rsquo;s machine code,
203looking for subroutine call instructions, and adding them
204to the call graph with a zero call count.
205A topological sort is performed by depth-first numbering
206all the symbols (<code>cg_dfn.c:cg_dfn</code>), so that
207children are always numbered less than their parents,
208then making a array of pointers into the symbol table and sorting it into
209numerical order, which is reverse topological
210order (children appear before parents).
211Cycles are also detected at this point, all members
212of which are assigned the same topological number.
213Two passes are now made through this sorted array of symbol pointers.
214The first pass, from end to beginning (parents to children),
215computes the fraction of child time to propagate to each parent
216and a print flag.
217The print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH,
218with a parent&rsquo;s include or exclude (print or no print) property
219being propagated to its children, unless they themselves explicitly appear
220in INCL_GRAPH or EXCL_GRAPH.
221A second pass, from beginning to end (children to parents) actually
222propagates the timings along the call graph, subject
223to a check against INCL_TIME/EXCL_TIME.
224With the print flag, fractions, and timings now stored in the symbol
225structures, the topological sort array is now discarded, and a
226new array of pointers is assembled, this time sorted by propagated time.
227</p>
228<p>Finally, print the various outputs the user requested, which is now fairly
229straightforward.  The call graph (<code>cg_print.c:cg_print</code>) and
230flat profile (<code>hist.c:hist_print</code>) are regurgitations of values
231already computed.  The annotated source listing
232(<code>basic_blocks.c:print_annotated_source</code>) uses basic-block
233information, if present, to label each line of code with call counts,
234otherwise only the function call counts are presented.
235</p>
236<p>The function ordering code is marginally well documented
237in the source code itself (<code>cg_print.c</code>).  Basically,
238the functions with the most use and the most parents are
239placed first, followed by other functions with the most use,
240followed by lower use functions, followed by unused functions
241at the end.
242</p>
243<hr>
244<div class="header">
245<p>
246Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="previous">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
247</div>
248
249
250
251</body>
252</html>
253