<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This file documents the gprof profiler of the GNU system.

Copyright (C) 1988-2021 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and with no
Back-Cover Texts.  A copy of the license is included in the
section entitled "GNU Free Documentation License".
 -->
<!-- Created by GNU Texinfo 5.1, http://www.gnu.org/software/texinfo/ -->
<head>
<title>GNU gprof: Implementation</title>

<meta name="description" content="GNU gprof: Implementation">
<meta name="keywords" content="GNU gprof: Implementation">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Details.html#Details" rel="up" title="Details">
<link href="File-Format.html#File-Format" rel="next" title="File Format">
<link href="Details.html#Details" rel="previous" title="Details">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.indentedblock {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smallindentedblock {margin-left: 3.2em; font-size: smaller}
div.smalllisp {margin-left: 3.2em}
kbd {font-style:oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space:nowrap}
span.nolinebreak {white-space:nowrap}
span.roman {font-family:serif; font-weight:normal}
span.sansserif {font-family:sans-serif; font-weight:normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en" bgcolor="#FFFFFF" text="#000000" link="#0000FF" vlink="#800080" alink="#FF0000">
<a name="Implementation"></a>
<div class="header">
<p>
Next: <a href="File-Format.html#File-Format" accesskey="n" rel="next">File Format</a>, Up: <a href="Details.html#Details" accesskey="u" rel="up">Details</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>]</p>
</div>
<hr>
<a name="Implementation-of-Profiling"></a>
<h3 class="section">9.1 Implementation of Profiling</h3>

<p>Profiling works by changing how every function in your program is compiled
so that, when it is called, it records information about where it was
called from.  From this, the profiler can determine which function
called it and count how many times it was called.  This change is made
by the compiler when your program is compiled with the &lsquo;<samp>-pg</samp>&rsquo; option,
which causes every function to call <code>mcount</code>
(or <code>_mcount</code>, or <code>__mcount</code>, depending on the OS and compiler)
as one of its first operations.
</p>
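<p>As a rough illustration of what &lsquo;<samp>-pg</samp>&rsquo; arranges, the sketch below writes the hook call out by hand. It is hypothetical: <code>fake_mcount</code> and the per-function counter are invented for the example, and the real <code>mcount</code> is inserted by the compiler and does not receive this information as arguments; as described below, it examines the stack frame instead.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mcount: counts entries into one function.
   The real mcount finds its information on the stack instead. */
static void fake_mcount(unsigned long *call_count)
{
    assert(call_count != NULL);
    ++*call_count;
}

static unsigned long square_calls; /* hypothetical per-function counter */

/* An "instrumented" function: the hook runs first, before any of the
   function's own work, mimicking the call that -pg inserts. */
int square(int x)
{
    fake_mcount(&square_calls);
    return x * x;
}

unsigned long square_call_count(void)
{
    return square_calls;
}
```

<p>Calling <code>square</code> twice leaves <code>square_call_count()</code> at 2. That is only the number-of-calls figure; recording who made each call needs the caller information that <code>mcount</code> extracts.</p>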
<p>The <code>mcount</code> routine, included in the profiling library,
is responsible for recording in an in-memory call graph table
both its own caller (the function being profiled, or child) and that
function&rsquo;s caller (the parent).  This is
typically done by examining the stack frame to find both
the address of the child and the return address in the original parent.
Since this is a very machine-dependent operation, <code>mcount</code>
itself is typically a short assembly-language stub routine
that extracts the required
information and then calls <code>__mcount_internal</code>
(a normal C function) with two arguments, <code>frompc</code> and <code>selfpc</code>.
<code>__mcount_internal</code> is responsible for maintaining
the in-memory call graph, which records <code>frompc</code>, <code>selfpc</code>,
and the number of times each of these call arcs was traversed.
</p>
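<p>A minimal sketch of the bookkeeping behind <code>__mcount_internal</code>, assuming a fixed-size open-addressing hash table keyed on the (<code>frompc</code>, <code>selfpc</code>) pair. The names <code>record_arc</code>, <code>arc_count</code>, <code>struct arc</code> and the table size are hypothetical; the C library&rsquo;s own implementation uses its own data structures.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARC_TABLE_SIZE 1024 /* hypothetical fixed capacity */
static_assert((ARC_TABLE_SIZE & (ARC_TABLE_SIZE - 1)) == 0,
              "table size must be a power of two");

/* One call-graph arc: call site in the parent, entry of the child, count. */
struct arc {
    uintptr_t frompc;
    uintptr_t selfpc;
    unsigned long count; /* 0 marks an empty slot */
};

static struct arc arcs[ARC_TABLE_SIZE];

static size_t arc_hash(uintptr_t frompc, uintptr_t selfpc)
{
    return (size_t)((frompc * 31u) ^ selfpc) & (ARC_TABLE_SIZE - 1);
}

/* Record one traversal of the arc frompc -> selfpc.
   Returns 0 on success, -1 if the table is full. */
int record_arc(uintptr_t frompc, uintptr_t selfpc)
{
    size_t i = arc_hash(frompc, selfpc);
    for (size_t probes = 0; probes < ARC_TABLE_SIZE; ++probes) {
        struct arc *a = &arcs[i];
        if (a->count == 0) {               /* empty slot: new arc */
            a->frompc = frompc;
            a->selfpc = selfpc;
            a->count = 1;
            return 0;
        }
        if (a->frompc == frompc && a->selfpc == selfpc) {
            ++a->count;                    /* arc seen before */
            return 0;
        }
        i = (i + 1) & (ARC_TABLE_SIZE - 1); /* linear probing */
    }
    return -1;
}

/* Number of times frompc -> selfpc was traversed (0 if never). */
unsigned long arc_count(uintptr_t frompc, uintptr_t selfpc)
{
    size_t i = arc_hash(frompc, selfpc);
    for (size_t probes = 0; probes < ARC_TABLE_SIZE; ++probes) {
        const struct arc *a = &arcs[i];
        if (a->count == 0)
            return 0;
        if (a->frompc == frompc && a->selfpc == selfpc)
            return a->count;
        i = (i + 1) & (ARC_TABLE_SIZE - 1);
    }
    return 0;
}
```

<p>Linear probing keeps the sketch short; it is one possible design, not a description of any particular C library.</p>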
<p>GCC Version 2 provides a built-in function (<code>__builtin_return_address</code>)
which allows a generic <code>mcount</code> function to extract the
required information from the stack frame.  However, on some
architectures, most notably the SPARC, using this builtin can be
very computationally expensive, and an assembly language version
of <code>mcount</code> is used for performance reasons.
</p>
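<p>A small sketch of how a generic, C-level hook can use the builtin with GCC or Clang. The names <code>generic_hook</code>, <code>triple</code> and <code>hook_last_selfpc</code> are hypothetical. Level 0 gives the address just after the call to the hook, a point inside the instrumented function; a full generic <code>mcount</code> would also need level 1 for <code>frompc</code>, and the GCC manual warns that nonzero levels can have unpredictable effects on some targets.</p>

```c
#include <assert.h>
#include <stddef.h>

static void *last_selfpc; /* most recent address seen by the hook */

/* Hypothetical generic hook. noinline keeps a real call, so level 0
   is the return address inside the instrumented function. */
__attribute__((noinline)) void generic_hook(void)
{
    last_selfpc = __builtin_return_address(0);
}

/* An "instrumented" function that calls the hook first. */
__attribute__((noinline)) int triple(int x)
{
    generic_hook();
    return 3 * x;
}

void *hook_last_selfpc(void)
{
    return last_selfpc;
}
```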
<p>Number-of-calls information for library routines is collected by using a
special version of the C library.  The routines in it are the same as in
the usual C library, but they were compiled with &lsquo;<samp>-pg</samp>&rsquo;.  If you
link your program with &lsquo;<samp>gcc &hellip; -pg</samp>&rsquo;, it automatically uses the
profiling version of the library.
</p>
<p>Profiling also involves watching your program as it runs, and keeping a
histogram of where the program counter happens to be every now and then.
Typically the program counter is looked at around 100 times per second of
run time, but the exact frequency may vary from system to system.
</p>
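<p>With a fixed sampling rate, each sample stands for one sampling interval, so a function&rsquo;s estimated time is its sample count divided by the rate. A small sketch of that arithmetic, assuming the typical 100 Hz rate; the function name is hypothetical.</p>

```c
#include <assert.h>

/* Convert histogram samples to estimated seconds of run time.
   At hz samples per second, each sample represents 1/hz seconds. */
double samples_to_seconds(unsigned long samples, double hz)
{
    assert(hz > 0.0);
    return (double)samples / hz;
}
```

<p>For example, 250 samples at 100 Hz correspond to about 2.5 seconds, and a single sample represents 10 milliseconds, so shorter intervals cannot be resolved.</p>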
<p>This is done in one of two ways.  Most UNIX-like operating systems
provide a <code>profil()</code> system call, which registers a memory
array with the kernel, along with a scale
factor that determines how the program&rsquo;s address space maps
into the array.
Typical scaling values cause every 2 to 8 bytes of address space
to map into a single array slot.
On every tick of the system clock
(assuming the profiled program is running), the value of the
program counter is examined and the corresponding slot in
the memory array is incremented.  Since this is done in the kernel,
which had to interrupt the process anyway to handle the clock
interrupt, very little additional system overhead is required.
</p>
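<p>The address-to-slot mapping can be sketched as follows. This is a simplified, hypothetical model that expresses the scale as a number of bytes per slot; the real <code>profil()</code> takes a 16-bit fixed-point scale factor instead, so its arithmetic differs in detail. The structure and function names are made up.</p>

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical histogram: one counter per bytes_per_slot bytes of
   program text, starting at lowpc. */
struct pc_histogram {
    uintptr_t lowpc;
    unsigned bytes_per_slot;   /* e.g. 2 to 8, as described above */
    size_t nslots;
    unsigned short *counts;
};

/* Called once per clock tick with the sampled program counter.
   Samples outside the covered range are dropped; counters saturate
   instead of wrapping (a choice made for this sketch). */
void histogram_tick(struct pc_histogram *h, uintptr_t pc)
{
    assert(h->bytes_per_slot > 0);
    if (pc < h->lowpc)
        return;
    size_t slot = (pc - h->lowpc) / h->bytes_per_slot;
    if (slot < h->nslots && h->counts[slot] != USHRT_MAX)
        ++h->counts[slot];
}
```

<p>With <code>lowpc</code> at 0x1000 and 4 bytes per slot, samples at 0x1000 and 0x1003 both land in slot 0, while 0x1004 lands in slot 1.</p>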
<p>However, some operating systems, most notably Linux 2.0 (and earlier),
do not provide a <code>profil()</code> system call.  On such a system,
arrangements are made for the kernel to periodically deliver
a signal to the process (typically via <code>setitimer()</code>),
which then performs the same operation of examining the
program counter and incrementing a slot in the memory array.
Since this method requires a signal to be delivered to
user space every time a sample is taken, it incurs considerably
more overhead than kernel-based profiling.  Because of the
added delay in delivering the signal, this method is
also less accurate.
</p>
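<p>A minimal sketch of the signal-based method, assuming a POSIX system with <code>setitimer()</code> and <code>ITIMER_PROF</code>. To stay portable it only counts <code>SIGPROF</code> deliveries; a real handler would also read the interrupted program counter from the signal context, which is architecture-specific and omitted here. The function name <code>count_prof_ticks</code> is hypothetical.</p>

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t prof_ticks;

/* SIGPROF handler: only async-signal-safe work (a counter bump).
   A real profiler would map the interrupted PC to a histogram slot. */
static void on_sigprof(int signo)
{
    (void)signo;
    prof_ticks = prof_ticks + 1;
}

/* Burn roughly cpu_seconds of CPU time while a 10 ms ITIMER_PROF timer
   (about 100 samples per second) is armed, then disarm it and return
   the number of SIGPROF signals received, or -1 on error. The handler
   stays installed so a signal already in flight is harmless. */
long count_prof_ticks(double cpu_seconds)
{
    assert(cpu_seconds > 0.0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_interval.tv_usec = 10000;   /* 10 ms between samples */
    tv.it_value.tv_usec = 10000;      /* first sample after 10 ms */
    prof_ticks = 0;
    if (setitimer(ITIMER_PROF, &tv, NULL) != 0)
        return -1;

    /* ITIMER_PROF measures CPU time, so spin instead of sleeping. */
    volatile unsigned long sink = 0;
    clock_t end = clock() + (clock_t)(cpu_seconds * CLOCKS_PER_SEC);
    while (clock() < end)
        sink++;

    memset(&tv, 0, sizeof tv);        /* zero it_value: disarm */
    setitimer(ITIMER_PROF, &tv, NULL);
    return (long)prof_ticks;
}
```

<p>Each tick costs a trip into a user-space handler, which is the extra overhead the text describes; with <code>profil()</code> the kernel updates the array itself.</p>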
<p>A special startup routine allocates memory for the histogram and
either calls <code>profil()</code> or sets up
a clock signal handler.
This routine (<code>monstartup</code>) can be invoked in several ways.
On Linux systems, a special profiling startup file <code>gcrt0.o</code>,
which invokes <code>monstartup</code> before <code>main</code>,
is used instead of the default <code>crt0.o</code>.
Use of this special startup file is one of the effects
of using &lsquo;<samp>gcc &hellip; -pg</samp>&rsquo; to link.
On SPARC systems, no special startup files are used.
Rather, the <code>mcount</code> routine, when it is invoked for
the first time (typically when <code>main</code> is called),
calls <code>monstartup</code>.
</p>
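<p>The SPARC-style lazy start can be sketched with a flag that the hook checks on every call. The names <code>hook_entry</code> and <code>sketch_startup</code> and the counters are hypothetical, and the sketch is single-threaded.</p>

```c
#include <assert.h>
#include <stdbool.h>

static bool profiling_started;
static unsigned startup_runs;      /* how many times startup ran */
static unsigned long hook_calls;   /* calls seen by the hook */

/* Hypothetical stand-in for monstartup: allocating the histogram and
   starting the sampling clock are elided in this sketch. */
static void sketch_startup(void)
{
    assert(startup_runs == 0 && "startup must run only once");
    ++startup_runs;
}

/* Hypothetical mcount-like hook: the first call performs startup,
   later calls only do the normal per-call bookkeeping. */
void hook_entry(void)
{
    if (!profiling_started) {
        profiling_started = true;  /* set first so startup is not re-entered */
        sketch_startup();
    }
    ++hook_calls;
}

unsigned startup_count(void) { return startup_runs; }
unsigned long hook_call_count(void) { return hook_calls; }
```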
<p>If the compiler&rsquo;s &lsquo;<samp>-a</samp>&rsquo; option was used, basic-block counting
is also enabled.  Each object file is then compiled with a static array
of counts, initially zero.
In the executable code, every time a new basic-block begins
(for example, at the start of each branch of an <code>if</code> statement), an extra instruction
is inserted to increment the corresponding count in the array.
At compile time, a paired array is also constructed that records
the starting address of each basic-block.  Taken together,
the two arrays record the starting address of every basic-block,
along with the number of times it was executed.
</p>
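<p>The paired arrays can be imitated by hand. In this hypothetical sketch the counting increments are written out explicitly, and the names <code>bb_counts</code> and <code>bb_ids</code> are made up; the second array holds block names rather than the starting addresses a compiler would record.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file tables: bb_ids[i] identifies basic block i
   (standing in for its starting address), and bb_counts[i] counts how
   often that block ran. Static storage means the counts start at 0. */
enum { BB_ENTRY, BB_THEN, BB_ELSE, BB_COUNT };
static const char *const bb_ids[BB_COUNT] = { "entry", "then", "else" };
static unsigned long bb_counts[BB_COUNT];

/* A function with one increment at the start of each basic block. */
int clamp_nonnegative(int x)
{
    ++bb_counts[BB_ENTRY];
    if (x < 0) {
        ++bb_counts[BB_THEN];
        return 0;
    } else {
        ++bb_counts[BB_ELSE];
        return x;
    }
}

unsigned long bb_count(size_t block)
{
    assert(block < BB_COUNT);
    return bb_counts[block];
}

const char *bb_name(size_t block)
{
    assert(block < BB_COUNT);
    return bb_ids[block];
}
```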
<p>The profiling library also includes a function (<code>mcleanup</code>) which is
typically registered using <code>atexit()</code> to be called as the
program exits, and is responsible for writing the file <samp>gmon.out</samp>.
Profiling is turned off, various headers are output, and the histogram
is written, followed by the call-graph arcs and the basic-block counts.
</p>
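<p>The exit-time write can be sketched with <code>atexit()</code>. Everything here is hypothetical and deliberately simplified: the output uses a toy layout (an element count followed by raw values for each section) written in the order described above, not the real <samp>gmon.out</samp> format, and the function and file names are made up.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy profile data; a real profiler fills these in while running. */
static unsigned short hist[4] = { 5, 0, 2, 1 };
static unsigned long arc_counts[2] = { 10, 3 };
static unsigned long block_counts[3] = { 7, 7, 0 };

/* Write one section as: element count, then the raw elements. */
static int write_section(FILE *out, const void *data, size_t size, size_t n)
{
    return fwrite(&n, sizeof n, 1, out) == 1 &&
           fwrite(data, size, n, out) == n;
}

/* Histogram first, then arcs, then basic-block counts.
   Returns 1 on success, 0 on any write error. */
int write_profile(FILE *out)
{
    assert(out != NULL);
    return write_section(out, hist, sizeof hist[0],
                         sizeof hist / sizeof hist[0]) &&
           write_section(out, arc_counts, sizeof arc_counts[0],
                         sizeof arc_counts / sizeof arc_counts[0]) &&
           write_section(out, block_counts, sizeof block_counts[0],
                         sizeof block_counts / sizeof block_counts[0]);
}

/* Hypothetical mcleanup stand-in, run at normal exit. A real one would
   stop sampling first; this sketch has no sampling to stop. */
static void sketch_cleanup(void)
{
    FILE *out = fopen("gmon-sketch.out", "wb");
    if (out != NULL) {
        write_profile(out);
        fclose(out);
    }
}

/* Register the cleanup once, e.g. from a startup routine. */
int install_cleanup(void)
{
    return atexit(sketch_cleanup) == 0;
}
```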
<p>The output from <code>gprof</code> gives no indication of parts of your program that
are limited by I/O or swapping bandwidth.  This is because samples of the
program counter are taken at fixed intervals of the program&rsquo;s run time.
Therefore, the
time measurements in <code>gprof</code> output say nothing about time that your
program was not running.  For example, a part of the program that creates
so much data that it cannot all fit in physical memory at once may run very
slowly due to thrashing, but <code>gprof</code> will say it uses little time.  On
the other hand, sampling by run time has the advantage that the amount of
load due to other users won&rsquo;t directly affect the output you get.
</p>
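<p>The blind spot can be demonstrated by comparing CPU time with elapsed wall-clock time across a blocking call. This sketch assumes a POSIX system with <code>nanosleep()</code> and <code>clock_gettime(CLOCK_MONOTONIC)</code>, and uses a sleep as a stand-in for waiting on I/O; the names <code>measure_sleep</code> and <code>struct time_pair</code> are hypothetical.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

struct time_pair {
    double wall_seconds;  /* elapsed real time */
    double cpu_seconds;   /* CPU time charged to this process */
};

static double monotonic_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleep for about millis milliseconds and report how much wall time
   and how much CPU time passed meanwhile. */
struct time_pair measure_sleep(long millis)
{
    assert(millis >= 0);
    struct timespec req = { .tv_sec = millis / 1000,
                            .tv_nsec = (millis % 1000) * 1000000L };
    double wall0 = monotonic_now();
    clock_t cpu0 = clock();
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;                              /* resume if interrupted */
    struct time_pair r;
    r.wall_seconds = monotonic_now() - wall0;
    r.cpu_seconds = (double)(clock() - cpu0) / CLOCKS_PER_SEC;
    return r;
}
```

<p>A 200 ms sleep shows roughly 0.2 s of wall time but almost no CPU time, so run-time sampling would attribute nearly nothing to the code that waited.</p>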



</body>
</html>
