1<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> 2<html> 3<!-- This file documents the gprof profiler of the GNU system. 4 5Copyright (C) 1988-2021 Free Software Foundation, Inc. 6 7Permission is granted to copy, distribute and/or modify this document 8under the terms of the GNU Free Documentation License, Version 1.3 9or any later version published by the Free Software Foundation; 10with no Invariant Sections, with no Front-Cover Texts, and with no 11Back-Cover Texts. A copy of the license is included in the 12section entitled "GNU Free Documentation License". 13 --> 14<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ --> 15<head> 16<title>GNU gprof: Internals</title> 17 18<meta name="description" content="GNU gprof: Internals"> 19<meta name="keywords" content="GNU gprof: Internals"> 20<meta name="resource-type" content="document"> 21<meta name="distribution" content="global"> 22<meta name="Generator" content="makeinfo"> 23<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 24<link href="index.html#Top" rel="start" title="Top"> 25<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents"> 26<link href="Details.html#Details" rel="up" title="Details"> 27<link href="Debugging.html#Debugging" rel="next" title="Debugging"> 28<link href="File-Format.html#File-Format" rel="previous" title="File Format"> 29<style type="text/css"> 30<!-- 31a.summary-letter {text-decoration: none} 32blockquote.smallquotation {font-size: smaller} 33div.display {margin-left: 3.2em} 34div.example {margin-left: 3.2em} 35div.indentedblock {margin-left: 3.2em} 36div.lisp {margin-left: 3.2em} 37div.smalldisplay {margin-left: 3.2em} 38div.smallexample {margin-left: 3.2em} 39div.smallindentedblock {margin-left: 3.2em; font-size: smaller} 40div.smalllisp {margin-left: 3.2em} 41kbd {font-style:oblique} 42pre.display {font-family: inherit} 43pre.format {font-family: inherit} 44pre.menu-comment {font-family: serif} 45pre.menu-preformatted {font-family: serif} 46pre.smalldisplay {font-family: inherit; font-size: smaller} 47pre.smallexample {font-size: smaller} 48pre.smallformat {font-family: inherit; font-size: smaller} 49pre.smalllisp {font-size: smaller} 50span.nocodebreak {white-space:nowrap} 51span.nolinebreak {white-space:nowrap} 52span.roman {font-family:serif; font-weight:normal} 53span.sansserif {font-family:sans-serif; font-weight:normal} 54ul.no-bullet {list-style: none} 55--> 56</style> 57 58 59</head> 60 61<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000"> 62<a name="Internals"></a> 63<div class="header"> 64<p> 65Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="previous">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p> 66</div> 67<hr> 68<a name="gprof_0027s-Internal-Operation"></a> 69<h3 class="section">9.3 <code>gprof</code>’s Internal Operation</h3> 70 71<p>Like most programs, <code>gprof</code> begins by processing its options. 72During this stage, it may building its symspec list 73(<code>sym_ids.c:sym_id_add</code>), if 74options are specified which use symspecs. 75<code>gprof</code> maintains a single linked list of symspecs, 76which will eventually get turned into 12 symbol tables, 77organized into six include/exclude pairs—one 78pair each for the flat profile (INCL_FLAT/EXCL_FLAT), 79the call graph arcs (INCL_ARCS/EXCL_ARCS), 80printing in the call graph (INCL_GRAPH/EXCL_GRAPH), 81timing propagation in the call graph (INCL_TIME/EXCL_TIME), 82the annotated source listing (INCL_ANNO/EXCL_ANNO), 83and the execution count listing (INCL_EXEC/EXCL_EXEC). 84</p> 85<p>After option processing, <code>gprof</code> finishes 86building the symspec list by adding all the symspecs in 87<code>default_excluded_list</code> to the exclude lists 88EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is specified, 89EXCL_FLAT as well. 90These default excludes are not added to EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC. 91</p> 92<p>Next, the BFD library is called to open the object file, 93verify that it is an object file, 94and read its symbol table (<code>core.c:core_init</code>), 95using <code>bfd_canonicalize_symtab</code> after mallocing 96an appropriately sized array of symbols. At this point, 97function mappings are read (if the ‘<samp>--file-ordering</samp>’ option 98has been specified), and the core text space is read into 99memory (if the ‘<samp>-c</samp>’ option was given). 100</p> 101<p><code>gprof</code>’s own symbol table, an array of Sym structures, 102is now built. 103This is done in one of two ways, by one of two routines, depending 104on whether line-by-line profiling (‘<samp>-l</samp>’ option) has been 105enabled. 106For normal profiling, the BFD canonical symbol table is scanned. 107For line-by-line profiling, every 108text space address is examined, and a new symbol table entry 109gets created every time the line number changes. 110In either case, two passes are made through the symbol 111table—one to count the size of the symbol table required, 112and the other to actually read the symbols. In between the 113two passes, a single array of type <code>Sym</code> is created of 114the appropriate length. 115Finally, <code>symtab.c:symtab_finalize</code> 116is called to sort the symbol table and remove duplicate entries 117(entries with the same memory address). 118</p> 119<p>The symbol table must be a contiguous array for two reasons. 120First, the <code>qsort</code> library function (which sorts an array) 121will be used to sort the symbol table. 122Also, the symbol lookup routine (<code>symtab.c:sym_lookup</code>), 123which finds symbols 124based on memory address, uses a binary search algorithm 125which requires the symbol table to be a sorted array. 126Function symbols are indicated with an <code>is_func</code> flag. 127Line number symbols have no special flags set. 128Additionally, a symbol can have an <code>is_static</code> flag 129to indicate that it is a local symbol. 130</p> 131<p>With the symbol table read, the symspecs can now be translated 132into Syms (<code>sym_ids.c:sym_id_parse</code>). Remember that a single 133symspec can match multiple symbols. 134An array of symbol tables 135(<code>syms</code>) is created, each entry of which is a symbol table 136of Syms to be included or excluded from a particular listing. 137The master symbol table and the symspecs are examined by nested 138loops, and every symbol that matches a symspec is inserted 139into the appropriate syms table. This is done twice, once to 140count the size of each required symbol table, and again to build 141the tables, which have been malloced between passes. 142From now on, to determine whether a symbol is on an include 143or exclude symspec list, <code>gprof</code> simply uses its 144standard symbol lookup routine on the appropriate table 145in the <code>syms</code> array. 146</p> 147<p>Now the profile data file(s) themselves are read 148(<code>gmon_io.c:gmon_out_read</code>), 149first by checking for a new-style ‘<samp>gmon.out</samp>’ header, 150then assuming this is an old-style BSD ‘<samp>gmon.out</samp>’ 151if the magic number test failed. 152</p> 153<p>New-style histogram records are read by <code>hist.c:hist_read_rec</code>. 154For the first histogram record, allocate a memory array to hold 155all the bins, and read them in. 156When multiple profile data files (or files with multiple histogram 157records) are read, the memory ranges of each pair of histogram records 158must be either equal, or non-overlapping. For each pair of histogram 159records, the resolution (memory region size divided by the number of 160bins) must be the same. The time unit must be the same for all 161histogram records. If the above containts are met, all histograms 162for the same memory range are merged. 163</p> 164<p>As each call graph record is read (<code>call_graph.c:cg_read_rec</code>), 165the parent and child addresses 166are matched to symbol table entries, and a call graph arc is 167created by <code>cg_arcs.c:arc_add</code>, unless the arc fails a symspec 168check against INCL_ARCS/EXCL_ARCS. As each arc is added, 169a linked list is maintained of the parent’s child arcs, and of the child’s 170parent arcs. 171Both the child’s call count and the arc’s call count are 172incremented by the record’s call count. 173</p> 174<p>Basic-block records are read (<code>basic_blocks.c:bb_read_rec</code>), 175but only if line-by-line profiling has been selected. 176Each basic-block address is matched to a corresponding line 177symbol in the symbol table, and an entry made in the symbol’s 178bb_addr and bb_calls arrays. Again, if multiple basic-block 179records are present for the same address, the call counts 180are cumulative. 181</p> 182<p>A gmon.sum file is dumped, if requested (<code>gmon_io.c:gmon_out_write</code>). 183</p> 184<p>If histograms were present in the data files, assign them to symbols 185(<code>hist.c:hist_assign_samples</code>) by iterating over all the sample 186bins and assigning them to symbols. Since the symbol table 187is sorted in order of ascending memory addresses, we can 188simple follow along in the symbol table as we make our pass 189over the sample bins. 190This step includes a symspec check against INCL_FLAT/EXCL_FLAT. 191Depending on the histogram 192scale factor, a sample bin may span multiple symbols, 193in which case a fraction of the sample count is allocated 194to each symbol, proportional to the degree of overlap. 195This effect is rare for normal profiling, but overlaps 196are more common during line-by-line profiling, and can 197cause each of two adjacent lines to be credited with half 198a hit, for example. 199</p> 200<p>If call graph data is present, <code>cg_arcs.c:cg_assemble</code> is called. 201First, if ‘<samp>-c</samp>’ was specified, a machine-dependent 202routine (<code>find_call</code>) scans through each symbol’s machine code, 203looking for subroutine call instructions, and adding them 204to the call graph with a zero call count. 205A topological sort is performed by depth-first numbering 206all the symbols (<code>cg_dfn.c:cg_dfn</code>), so that 207children are always numbered less than their parents, 208then making a array of pointers into the symbol table and sorting it into 209numerical order, which is reverse topological 210order (children appear before parents). 211Cycles are also detected at this point, all members 212of which are assigned the same topological number. 213Two passes are now made through this sorted array of symbol pointers. 214The first pass, from end to beginning (parents to children), 215computes the fraction of child time to propagate to each parent 216and a print flag. 217The print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH, 218with a parent’s include or exclude (print or no print) property 219being propagated to its children, unless they themselves explicitly appear 220in INCL_GRAPH or EXCL_GRAPH. 221A second pass, from beginning to end (children to parents) actually 222propagates the timings along the call graph, subject 223to a check against INCL_TIME/EXCL_TIME. 224With the print flag, fractions, and timings now stored in the symbol 225structures, the topological sort array is now discarded, and a 226new array of pointers is assembled, this time sorted by propagated time. 227</p> 228<p>Finally, print the various outputs the user requested, which is now fairly 229straightforward. The call graph (<code>cg_print.c:cg_print</code>) and 230flat profile (<code>hist.c:hist_print</code>) are regurgitations of values 231already computed. The annotated source listing 232(<code>basic_blocks.c:print_annotated_source</code>) uses basic-block 233information, if present, to label each line of code with call counts, 234otherwise only the function call counts are presented. 235</p> 236<p>The function ordering code is marginally well documented 237in the source code itself (<code>cg_print.c</code>). Basically, 238the functions with the most use and the most parents are 239placed first, followed by other functions with the most use, 240followed by lower use functions, followed by unused functions 241at the end. 242</p> 243<hr> 244<div class="header"> 245<p> 246Next: <a href="Debugging.html#Debugging" accesskey="n" rel="next">Debugging</a>, Previous: <a href="File-Format.html#File-Format" accesskey="p" rel="previous">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p> 247</div> 248 249 250 251</body> 252</html> 253