1*4882a593Smuzhiyun# 2*4882a593Smuzhiyun# Copyright (c) 2013 The Chromium OS Authors. 3*4882a593Smuzhiyun# 4*4882a593Smuzhiyun# SPDX-License-Identifier: GPL-2.0+ 5*4882a593Smuzhiyun# 6*4882a593Smuzhiyun 7*4882a593SmuzhiyunTracing in U-Boot 8*4882a593Smuzhiyun================= 9*4882a593Smuzhiyun 10*4882a593SmuzhiyunU-Boot supports a simple tracing feature which allows a record of excecution 11*4882a593Smuzhiyunto be collected and sent to a host machine for analysis. At present the 12*4882a593Smuzhiyunmain use for this is to profile boot time. 13*4882a593Smuzhiyun 14*4882a593Smuzhiyun 15*4882a593SmuzhiyunOverview 16*4882a593Smuzhiyun-------- 17*4882a593Smuzhiyun 18*4882a593SmuzhiyunThe trace feature uses GCC's instrument-functions feature to trace all 19*4882a593Smuzhiyunfunction entry/exit points. These are then recorded in a memory buffer. 20*4882a593SmuzhiyunThe memory buffer can be saved to the host over a network link using 21*4882a593Smuzhiyuntftpput or by writing to an attached memory device such as MMC. 22*4882a593Smuzhiyun 23*4882a593SmuzhiyunOn the host, the file is first converted with a tool called 'proftool', 24*4882a593Smuzhiyunwhich extracts useful information from it. The resulting trace output 25*4882a593Smuzhiyunresembles that emitted by Linux's ftrace feature, so can be visually 26*4882a593Smuzhiyundisplayed by pytimechart. 27*4882a593Smuzhiyun 28*4882a593Smuzhiyun 29*4882a593SmuzhiyunQuick-start using Sandbox 30*4882a593Smuzhiyun------------------------- 31*4882a593Smuzhiyun 32*4882a593SmuzhiyunSandbox is a build of U-Boot that can run under Linux so it is a convenient 33*4882a593Smuzhiyunway of trying out tracing before you use it on your actual board. To do 34*4882a593Smuzhiyunthis, follow these steps: 35*4882a593Smuzhiyun 36*4882a593SmuzhiyunAdd the following to include/configs/sandbox.h (if not already there) 37*4882a593Smuzhiyun 38*4882a593Smuzhiyun#define CONFIG_TRACE 39*4882a593Smuzhiyun#define CONFIG_CMD_TRACE 40*4882a593Smuzhiyun#define CONFIG_TRACE_BUFFER_SIZE (16 << 20) 41*4882a593Smuzhiyun#define CONFIG_TRACE_EARLY_SIZE (8 << 20) 42*4882a593Smuzhiyun#define CONFIG_TRACE_EARLY 43*4882a593Smuzhiyun#define CONFIG_TRACE_EARLY_ADDR 0x00100000 44*4882a593Smuzhiyun 45*4882a593SmuzhiyunBuild sandbox U-Boot with tracing enabled: 46*4882a593Smuzhiyun 47*4882a593Smuzhiyun$ make FTRACE=1 O=sandbox sandbox_config 48*4882a593Smuzhiyun$ make FTRACE=1 O=sandbox 49*4882a593Smuzhiyun 50*4882a593SmuzhiyunRun sandbox, wait for a bit of trace information to appear, and then capture 51*4882a593Smuzhiyuna trace: 52*4882a593Smuzhiyun 53*4882a593Smuzhiyun$ ./sandbox/u-boot 54*4882a593Smuzhiyun 55*4882a593Smuzhiyun 56*4882a593SmuzhiyunU-Boot 2013.04-rc2-00100-ga72fcef (Apr 17 2013 - 19:25:24) 57*4882a593Smuzhiyun 58*4882a593SmuzhiyunDRAM: 128 MiB 59*4882a593Smuzhiyuntrace: enabled 60*4882a593SmuzhiyunUsing default environment 61*4882a593Smuzhiyun 62*4882a593SmuzhiyunIn: serial 63*4882a593SmuzhiyunOut: serial 64*4882a593SmuzhiyunErr: serial 65*4882a593Smuzhiyun=>trace stats 66*4882a593Smuzhiyun 671,406 function sites 67*4882a593Smuzhiyun 69,712 function calls 68*4882a593Smuzhiyun 0 untracked function calls 69*4882a593Smuzhiyun 73,373 traced function calls 70*4882a593Smuzhiyun 16 maximum observed call depth 71*4882a593Smuzhiyun 15 call depth limit 72*4882a593Smuzhiyun 66,491 calls not traced due to depth 73*4882a593Smuzhiyun=>trace stats 74*4882a593Smuzhiyun 671,406 function sites 75*4882a593Smuzhiyun 1,279,450 function calls 76*4882a593Smuzhiyun 0 untracked function calls 77*4882a593Smuzhiyun 950,490 traced function calls (333217 dropped due to overflow) 78*4882a593Smuzhiyun 16 maximum observed call depth 79*4882a593Smuzhiyun 15 call depth limit 80*4882a593Smuzhiyun 1,275,767 calls not traced due to depth 81*4882a593Smuzhiyun=>trace calls 0 e00000 82*4882a593SmuzhiyunCall list dumped to 00000000, size 0xae0a40 83*4882a593Smuzhiyun=>print 84*4882a593Smuzhiyunbaudrate=115200 85*4882a593Smuzhiyunprofbase=0 86*4882a593Smuzhiyunprofoffset=ae0a40 87*4882a593Smuzhiyunprofsize=e00000 88*4882a593Smuzhiyunstderr=serial 89*4882a593Smuzhiyunstdin=serial 90*4882a593Smuzhiyunstdout=serial 91*4882a593Smuzhiyun 92*4882a593SmuzhiyunEnvironment size: 117/8188 bytes 93*4882a593Smuzhiyun=>sb save host 0 trace 0 ${profoffset} 94*4882a593Smuzhiyun11405888 bytes written in 10 ms (1.1 GiB/s) 95*4882a593Smuzhiyun=>reset 96*4882a593Smuzhiyun 97*4882a593Smuzhiyun 98*4882a593SmuzhiyunThen run proftool to convert the trace information to ftrace format. 99*4882a593Smuzhiyun 100*4882a593Smuzhiyun$ ./sandbox/tools/proftool -m sandbox/System.map -p trace dump-ftrace >trace.txt 101*4882a593Smuzhiyun 102*4882a593SmuzhiyunFinally run pytimechart to display it: 103*4882a593Smuzhiyun 104*4882a593Smuzhiyun$ pytimechart trace.txt 105*4882a593Smuzhiyun 106*4882a593SmuzhiyunUsing this tool you can zoom and pan across the trace, with the function 107*4882a593Smuzhiyuncalls on the left and little marks representing the start and end of each 108*4882a593Smuzhiyunfunction. 109*4882a593Smuzhiyun 110*4882a593Smuzhiyun 111*4882a593SmuzhiyunCONFIG Options 112*4882a593Smuzhiyun-------------- 113*4882a593Smuzhiyun 114*4882a593Smuzhiyun- CONFIG_TRACE 115*4882a593Smuzhiyun Enables the trace feature in U-Boot. 116*4882a593Smuzhiyun 117*4882a593Smuzhiyun- CONFIG_CMD_TRACE 118*4882a593Smuzhiyun Enables the trace command. 119*4882a593Smuzhiyun 120*4882a593Smuzhiyun- CONFIG_TRACE_BUFFER_SIZE 121*4882a593Smuzhiyun Size of trace buffer to allocate for U-Boot. This buffer is 122*4882a593Smuzhiyun used after relocation, as a place to put function tracing 123*4882a593Smuzhiyun information. The address of the buffer is determined by 124*4882a593Smuzhiyun the relocation code. 125*4882a593Smuzhiyun 126*4882a593Smuzhiyun- CONFIG_TRACE_EARLY 127*4882a593Smuzhiyun Define this to start tracing early, before relocation. 128*4882a593Smuzhiyun 129*4882a593Smuzhiyun- CONFIG_TRACE_EARLY_SIZE 130*4882a593Smuzhiyun Size of 'early' trace buffer. Before U-Boot has relocated 131*4882a593Smuzhiyun it doesn't have a proper trace buffer. On many boards 132*4882a593Smuzhiyun you can define an area of memory to use for the trace 133*4882a593Smuzhiyun buffer until the 'real' trace buffer is available after 134*4882a593Smuzhiyun relocation. The contents of this buffer are then copied to 135*4882a593Smuzhiyun the real buffer. 136*4882a593Smuzhiyun 137*4882a593Smuzhiyun- CONFIG_TRACE_EARLY_ADDR 138*4882a593Smuzhiyun Address of early trace buffer 139*4882a593Smuzhiyun 140*4882a593Smuzhiyun 141*4882a593SmuzhiyunBuilding U-Boot with Tracing Enabled 142*4882a593Smuzhiyun------------------------------------ 143*4882a593Smuzhiyun 144*4882a593SmuzhiyunPass 'FTRACE=1' to the U-Boot Makefile to actually instrument the code. 145*4882a593SmuzhiyunThis is kept as a separate option so that it is easy to enable/disable 146*4882a593Smuzhiyuninstrumenting from the command line instead of having to change board 147*4882a593Smuzhiyunconfig files. 148*4882a593Smuzhiyun 149*4882a593Smuzhiyun 150*4882a593SmuzhiyunCollecting Trace Data 151*4882a593Smuzhiyun--------------------- 152*4882a593Smuzhiyun 153*4882a593SmuzhiyunWhen you run U-Boot on your board it will collect trace data up to the 154*4882a593Smuzhiyunlimit of the trace buffer size you have specified. Once that is exhausted 155*4882a593Smuzhiyunno more data will be collected. 156*4882a593Smuzhiyun 157*4882a593SmuzhiyunCollecting trace data has an affect on execution time/performance. You 158*4882a593Smuzhiyunwill notice this particularly with trvial functions - the overhead of 159*4882a593Smuzhiyunrecording their execution may even exceed their normal execution time. 160*4882a593SmuzhiyunIn practice this doesn't matter much so long as you are aware of the 161*4882a593Smuzhiyuneffect. Once you have done your optimisations, turn off tracing before 162*4882a593Smuzhiyundoing end-to-end timing. 163*4882a593Smuzhiyun 164*4882a593SmuzhiyunThe best time to start tracing is right at the beginning of U-Boot. The 165*4882a593Smuzhiyunbest time to stop tracing is right at the end. In practice it is hard 166*4882a593Smuzhiyunto achieve these ideals. 167*4882a593Smuzhiyun 168*4882a593SmuzhiyunThis implementation enables tracing early in board_init_f(). This means 169*4882a593Smuzhiyunthat it captures most of the board init process, missing only the 170*4882a593Smuzhiyunearly architecture-specific init. However, it also misses the entire 171*4882a593SmuzhiyunSPL stage if there is one. 172*4882a593Smuzhiyun 173*4882a593SmuzhiyunU-Boot typically ends with a 'bootm' command which loads and runs an 174*4882a593SmuzhiyunOS. There is useful trace data in the execution of that bootm 175*4882a593Smuzhiyuncommand. Therefore this implementation provides a way to collect trace 176*4882a593Smuzhiyundata after bootm has finished processing, but just before it jumps to 177*4882a593Smuzhiyunthe OS. In practical terms, U-Boot runs the 'fakegocmd' environment 178*4882a593Smuzhiyunvariable at this point. This variable should have a short script which 179*4882a593Smuzhiyuncollects the trace data and writes it somewhere. 180*4882a593Smuzhiyun 181*4882a593SmuzhiyunTrace data collection relies on a microsecond timer, accesed through 182*4882a593Smuzhiyuntimer_get_us(). So the first think you should do is make sure that 183*4882a593Smuzhiyunthis produces sensible results for your board. Suitable sources for 184*4882a593Smuzhiyunthis timer include high resolution timers, PWMs or profile timers if 185*4882a593Smuzhiyunavailable. Most modern SOCs have a suitable timer for this. Make sure 186*4882a593Smuzhiyunthat you mark this timer (and anything it calls) with 187*4882a593Smuzhiyun__attribute__((no_instrument_function)) so that the trace library can 188*4882a593Smuzhiyunuse it without causing an infinite loop. 189*4882a593Smuzhiyun 190*4882a593Smuzhiyun 191*4882a593SmuzhiyunCommands 192*4882a593Smuzhiyun-------- 193*4882a593Smuzhiyun 194*4882a593SmuzhiyunThe trace command has variable sub-commands: 195*4882a593Smuzhiyun 196*4882a593Smuzhiyun- stats 197*4882a593Smuzhiyun Display tracing statistics 198*4882a593Smuzhiyun 199*4882a593Smuzhiyun- pause 200*4882a593Smuzhiyun Pause tracing 201*4882a593Smuzhiyun 202*4882a593Smuzhiyun- resume 203*4882a593Smuzhiyun Resume tracing 204*4882a593Smuzhiyun 205*4882a593Smuzhiyun- funclist [<addr> <size>] 206*4882a593Smuzhiyun Dump a list of functions into the buffer 207*4882a593Smuzhiyun 208*4882a593Smuzhiyun- calls [<addr> <size>] 209*4882a593Smuzhiyun Dump function call trace into buffer 210*4882a593Smuzhiyun 211*4882a593SmuzhiyunIf the address and size are not given, these are obtained from environment 212*4882a593Smuzhiyunvariables (see below). In any case the environment variables are updated 213*4882a593Smuzhiyunafter the command runs. 214*4882a593Smuzhiyun 215*4882a593Smuzhiyun 216*4882a593SmuzhiyunEnvironment Variables 217*4882a593Smuzhiyun--------------------- 218*4882a593Smuzhiyun 219*4882a593SmuzhiyunThe following are used: 220*4882a593Smuzhiyun 221*4882a593Smuzhiyun- profbase 222*4882a593Smuzhiyun Base address of trace output buffer 223*4882a593Smuzhiyun 224*4882a593Smuzhiyun- profoffset 225*4882a593Smuzhiyun Offset of first unwritten byte in trace output buffer 226*4882a593Smuzhiyun 227*4882a593Smuzhiyun- profsize 228*4882a593Smuzhiyun Size of trace output buffer 229*4882a593Smuzhiyun 230*4882a593SmuzhiyunAll of these are set by the 'trace calls' command. 231*4882a593Smuzhiyun 232*4882a593SmuzhiyunThese variables keep track of the amount of data written to the trace 233*4882a593Smuzhiyunoutput buffer by the 'trace' command. The trace commands which write data 234*4882a593Smuzhiyunto the output buffer can use these to specify the buffer to write to, and 235*4882a593Smuzhiyunupdate profoffset each time. This allows successive commands to append data 236*4882a593Smuzhiyunto the same buffer, for example: 237*4882a593Smuzhiyun 238*4882a593Smuzhiyun trace funclist 10000 e00000 239*4882a593Smuzhiyun trace calls 240*4882a593Smuzhiyun 241*4882a593Smuzhiyun(the latter command appends more data to the buffer). 242*4882a593Smuzhiyun 243*4882a593Smuzhiyun 244*4882a593Smuzhiyun- fakegocmd 245*4882a593Smuzhiyun Specifies commands to run just before booting the OS. This 246*4882a593Smuzhiyun is a useful time to write the trace data to the host for 247*4882a593Smuzhiyun processing. 248*4882a593Smuzhiyun 249*4882a593Smuzhiyun 250*4882a593SmuzhiyunWriting Out Trace Data 251*4882a593Smuzhiyun---------------------- 252*4882a593Smuzhiyun 253*4882a593SmuzhiyunOnce the trace data is in an output buffer in memory there are various ways 254*4882a593Smuzhiyunto transmit it to the host. Notably you can use tftput to send the data 255*4882a593Smuzhiyunover a network link: 256*4882a593Smuzhiyun 257*4882a593Smuzhiyunfakegocmd=trace pause; usb start; set autoload n; bootp; 258*4882a593Smuzhiyun trace calls 10000000 1000000; 259*4882a593Smuzhiyun tftpput ${profbase} ${profoffset} 192.168.1.4:/tftpboot/calls 260*4882a593Smuzhiyun 261*4882a593SmuzhiyunThis starts up USB (to talk to an attached USB Ethernet dongle), writes 262*4882a593Smuzhiyuna trace log to address 10000000 and sends it to a host machine using 263*4882a593SmuzhiyunTFTP. After this, U-Boot will boot the OS normally, albeit a little 264*4882a593Smuzhiyunlater. 265*4882a593Smuzhiyun 266*4882a593Smuzhiyun 267*4882a593SmuzhiyunConverting Trace Output Data 268*4882a593Smuzhiyun---------------------------- 269*4882a593Smuzhiyun 270*4882a593SmuzhiyunThe trace output data is kept in a binary format which is not documented 271*4882a593Smuzhiyunhere. To convert it into something useful, you can use proftool. 272*4882a593Smuzhiyun 273*4882a593SmuzhiyunThis tool must be given the U-Boot map file and the trace data received 274*4882a593Smuzhiyunfrom running that U-Boot. It produces a text output file. 275*4882a593Smuzhiyun 276*4882a593SmuzhiyunOptions 277*4882a593Smuzhiyun -m <map_file> 278*4882a593Smuzhiyun Specify U-Boot map file 279*4882a593Smuzhiyun 280*4882a593Smuzhiyun -p <trace_file> 281*4882a593Smuzhiyun Specifiy profile/trace file 282*4882a593Smuzhiyun 283*4882a593SmuzhiyunCommands: 284*4882a593Smuzhiyun 285*4882a593Smuzhiyun- dump-ftrace 286*4882a593Smuzhiyun Write a text dump of the file in Linux ftrace format to stdout 287*4882a593Smuzhiyun 288*4882a593Smuzhiyun 289*4882a593SmuzhiyunViewing the Trace Data 290*4882a593Smuzhiyun---------------------- 291*4882a593Smuzhiyun 292*4882a593SmuzhiyunYou can use pytimechart for this (sudo apt-get pytimechart might work on 293*4882a593Smuzhiyunyour Debian-style machine, and use your favourite search engine to obtain 294*4882a593Smuzhiyundocumentation). It expects the file to have a .txt extension. The program 295*4882a593Smuzhiyunhas terse user interface but is very convenient for viewing U-Boot 296*4882a593Smuzhiyunprofile information. 297*4882a593Smuzhiyun 298*4882a593Smuzhiyun 299*4882a593SmuzhiyunWorkflow Suggestions 300*4882a593Smuzhiyun-------------------- 301*4882a593Smuzhiyun 302*4882a593SmuzhiyunThe following suggestions may be helpful if you are trying to reduce boot 303*4882a593Smuzhiyuntime: 304*4882a593Smuzhiyun 305*4882a593Smuzhiyun1. Enable CONFIG_BOOTSTAGE and CONFIG_BOOTSTAGE_REPORT. This should get 306*4882a593Smuzhiyunyou are helpful overall snapshot of the boot time. 307*4882a593Smuzhiyun 308*4882a593Smuzhiyun2. Build U-Boot with tracing and run it. Note the difference in boot time 309*4882a593Smuzhiyun(it is common for tracing to add 10% to the time) 310*4882a593Smuzhiyun 311*4882a593Smuzhiyun3. Collect the trace information as descibed above. Use this to find where 312*4882a593Smuzhiyunall the time is being spent. 313*4882a593Smuzhiyun 314*4882a593Smuzhiyun4. Take a look at that code and see if you can optimise it. Perhaps it is 315*4882a593Smuzhiyunpossible to speed up the initialisation of a device, or remove an unused 316*4882a593Smuzhiyunfeature. 317*4882a593Smuzhiyun 318*4882a593Smuzhiyun5. Rebuild, run and collect again. Compare your results. 319*4882a593Smuzhiyun 320*4882a593Smuzhiyun6. Keep going until you run out of steam, or your boot is fast enough. 321*4882a593Smuzhiyun 322*4882a593Smuzhiyun 323*4882a593SmuzhiyunConfiguring Trace 324*4882a593Smuzhiyun----------------- 325*4882a593Smuzhiyun 326*4882a593SmuzhiyunThere are a few parameters in the code that you may want to consider. 327*4882a593SmuzhiyunThere is a function call depth limit (set to 15 by default). When the 328*4882a593Smuzhiyunstack depth goes above this then no tracing information is recorded. 329*4882a593SmuzhiyunThe maximum depth reached is recorded and displayed by the 'trace stats' 330*4882a593Smuzhiyuncommand. 331*4882a593Smuzhiyun 332*4882a593Smuzhiyun 333*4882a593SmuzhiyunFuture Work 334*4882a593Smuzhiyun----------- 335*4882a593Smuzhiyun 336*4882a593SmuzhiyunTracing could be a little tidier in some areas, for example providing 337*4882a593Smuzhiyunrun-time configuration options for trace. 338*4882a593Smuzhiyun 339*4882a593SmuzhiyunSome other features that might be useful: 340*4882a593Smuzhiyun 341*4882a593Smuzhiyun- Trace filter to select which functions are recorded 342*4882a593Smuzhiyun- Sample-based profiling using a timer interrupt 343*4882a593Smuzhiyun- Better control over trace depth 344*4882a593Smuzhiyun- Compression of trace information 345*4882a593Smuzhiyun 346*4882a593Smuzhiyun 347*4882a593SmuzhiyunSimon Glass <sjg@chromium.org> 348*4882a593SmuzhiyunApril 2013 349