.. SPDX-License-Identifier: GPL-2.0

============
ORC unwinder
============

Overview
========

The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is
similar in concept to a DWARF unwinder. The difference is that the
format of the ORC data is much simpler than DWARF, which in turn allows
the ORC unwinder to be much simpler and faster.

The ORC data consists of unwind tables which are generated by objtool.
They contain out-of-band data which is used by the in-kernel ORC
unwinder. Objtool generates the ORC data by first doing compile-time
stack metadata validation (CONFIG_STACK_VALIDATION). After analyzing
all the code paths of a .o file, it determines information about the
stack state at each instruction address in the file and outputs that
information to the .orc_unwind and .orc_unwind_ip sections.

The per-object ORC sections are combined at link time and are sorted and
post-processed at boot time. The unwinder uses the resulting data to
correlate instruction addresses with their stack states at run time.


ORC vs frame pointers
=====================

With frame pointers enabled, GCC adds instrumentation code to every
function in the kernel. The kernel's .text size increases by about
3.2%, resulting in a broad kernel-wide slowdown.
Measurements by Mel Gorman [1]_ have shown a slowdown of 5-10% for some
workloads.

In contrast, the ORC unwinder has no effect on text size or runtime
performance, because the debuginfo is out of band. So if you disable
frame pointers and enable the ORC unwinder, you get a nice performance
improvement across the board, and still have reliable stack traces.

Ingo Molnar says:

    "Note that it's not just a performance improvement, but also an
    instruction cache locality improvement: 3.2% .text savings almost
    directly transform into a similarly sized reduction in cache
    footprint. That can transform to even higher speedups for workloads
    whose cache locality is borderline."

Another benefit of ORC compared to frame pointers is that it can
reliably unwind across interrupts and exceptions. Frame pointer based
unwinds can sometimes skip the caller of the interrupted function, if it
was a leaf function or if the interrupt hit before the frame pointer was
saved.

The main disadvantage of the ORC unwinder compared to frame pointers is
that it needs more memory to store the ORC unwind tables: roughly 2-4MB
depending on the kernel config.


ORC vs DWARF
============

ORC debuginfo's advantage over DWARF itself is that it's much simpler.
It gets rid of the complex DWARF CFI state machine and also gets rid of
the tracking of unnecessary registers.
This allows the unwinder to be much simpler, meaning fewer bugs, which
is especially important for mission-critical oops code.

The simpler debuginfo format also enables the unwinder to be much faster
than DWARF, which is important for perf and lockdep. In a basic
performance test by Jiri Slaby [2]_, the ORC unwinder was about 20x
faster than an out-of-tree DWARF unwinder. (Note: That measurement was
taken before some performance tweaks were added, which doubled
performance, so the speedup over DWARF may be closer to 40x.)

The ORC data format does have a few downsides compared to DWARF. ORC
unwind tables take up ~50% more RAM (+1.3MB on an x86 defconfig kernel)
than DWARF-based eh_frame tables.

Another potential downside is that, as GCC evolves, it's conceivable
that the ORC data may end up being *too* simple to describe the state of
the stack for certain optimizations. But IMO this is unlikely because
GCC saves the frame pointer for any unusual stack adjustments it does,
so I suspect we'll really only ever need to keep track of the stack
pointer and the frame pointer between call frames. But even if we do
end up having to track all the registers DWARF tracks, at least we will
still be able to control the format, e.g. no complex state machines.


ORC unwind table generation
===========================

The ORC data is generated by objtool.
With the existing compile-time stack metadata validation feature,
objtool already follows all code paths, and so it already has all the
information it needs to be able to generate ORC data from scratch. So
it's an easy step to go from stack validation to ORC data generation.

It should be possible to instead generate the ORC data with a simple
tool which converts DWARF to ORC data. However, such a solution would
be incomplete due to the kernel's extensive use of asm, inline asm, and
special sections like exception tables.

That could be rectified by manually annotating those special code paths
using GNU assembler .cfi annotations in .S files, and homegrown
annotations for inline asm in .c files. But asm annotations were tried
in the past and were found to be unmaintainable. They were often
incorrect/incomplete and made the code harder to read and keep updated.
And based on looking at glibc code, annotating inline asm in .c files
might be even worse.

Objtool still needs a few annotations, but only in code which does
unusual things to the stack like entry code. And even then, far fewer
annotations are needed than what DWARF would need, so they're much more
maintainable than DWARF CFI annotations.

So the advantages of using objtool to generate ORC data are that it
gives more accurate debuginfo, with very few annotations.
It also insulates the kernel from toolchain bugs, which can be very
painful to deal with in the kernel since we often have to work around
issues in older versions of the toolchain for years.

The downside is that the unwinder now becomes dependent on objtool's
ability to reverse engineer GCC code flow. If GCC optimizations become
too complicated for objtool to follow, the ORC data generation might
stop working or become incomplete. (It's worth noting that livepatch
already has such a dependency on objtool's ability to follow GCC code
flow.)

If newer versions of GCC come up with some optimizations which break
objtool, we may need to revisit the current implementation. Some
possible solutions would be asking GCC to make the optimizations more
palatable, or having objtool use DWARF as an additional input, or
creating a GCC plugin to assist objtool with its analysis. But for now,
objtool follows GCC code quite well.


Unwinder implementation details
===============================

Objtool generates the ORC data by integrating with the compile-time
stack metadata validation feature, which is described in detail in
tools/objtool/Documentation/stack-validation.txt.
After analyzing all the code paths of a .o file, it creates an array of
orc_entry structs, and a parallel array of instruction addresses
associated with those structs, and writes them to the .orc_unwind and
.orc_unwind_ip sections respectively.

The ORC data is split into the two arrays for performance reasons, to
make the searchable part of the data (.orc_unwind_ip) more compact. The
arrays are sorted in parallel at boot time.

Performance is further improved by the use of a fast lookup table which
is created at runtime. The fast lookup table associates a given address
with a range of indices for the .orc_unwind table, so that only a small
subset of the table needs to be searched.


Etymology
=========

Orcs, fearsome creatures of medieval folklore, are the Dwarves' natural
enemies. Similarly, the ORC unwinder was created in opposition to the
complexity and slowness of DWARF.

"Although Orcs rarely consider multiple solutions to a problem, they do
excel at getting things done because they are creatures of action, not
thought." [3]_ Similarly, unlike the esoteric DWARF unwinder, the
veracious ORC unwinder wastes no time or silicon effort decoding
variable-length zero-extended unsigned-integer byte-coded
state-machine-based debug information entries.

Similar to how Orcs frequently unravel the well-intentioned plans of
their adversaries, the ORC unwinder frequently unravels stacks with
brutal, unyielding efficiency.

ORC stands for Oops Rewind Capability.


.. [1] https://lkml.kernel.org/r/20170602104048.jkkzssljsompjdwy@suse.de
.. [2] https://lkml.kernel.org/r/d2ca5435-6386-29b8-db87-7f227c2b713a@suse.cz
.. [3] http://dustin.wikidot.com/half-orcs-and-orcs
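
As a rough illustration of the lookup described under "Unwinder
implementation details", the following user-space C sketch searches two
parallel arrays with the help of a coarse lookup table. It is not the
kernel's implementation: every ``demo_`` name, the block size, the
sample addresses and the single ``sp_offset`` field are assumptions made
up for this example, and the real orc_entry struct and lookup code
differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an ORC entry. */
struct demo_orc_entry {
	int16_t sp_offset;
};

/*
 * Parallel arrays sorted by address: demo_ip[i] is the first instruction
 * address described by demo_orc[i]. Only demo_ip[] is searched, which is
 * why keeping it separate and compact helps.
 */
static const uint64_t demo_ip[] = { 0x1000, 0x1004, 0x1010, 0x1040, 0x1080 };
static const struct demo_orc_entry demo_orc[] = {
	{ 8 }, { 16 }, { 48 }, { 8 }, { 24 }
};

#define DEMO_NUM_ENTRIES (sizeof(demo_ip) / sizeof(demo_ip[0]))
#define DEMO_TEXT_START  UINT64_C(0x1000)
#define DEMO_TEXT_END    UINT64_C(0x1100)
#define DEMO_BLOCK_SIZE  UINT64_C(0x40)
#define DEMO_NUM_BLOCKS  ((DEMO_TEXT_END - DEMO_TEXT_START) / DEMO_BLOCK_SIZE)

/* Fast lookup table: slot b holds the index of the entry covering block b. */
static size_t demo_lookup[DEMO_NUM_BLOCKS + 1];

/* Return the last index in [lo, hi] whose address is <= ip.
 * The caller guarantees demo_ip[lo] <= ip. */
static size_t demo_search(uint64_t ip, size_t lo, size_t hi)
{
	while (lo < hi) {
		size_t mid = lo + (hi - lo + 1) / 2;

		if (demo_ip[mid] <= ip)
			lo = mid;
		else
			hi = mid - 1;
	}
	return lo;
}

/* Build the lookup table once, after the arrays have been sorted. */
static void demo_build_lookup(void)
{
	size_t b;

	/* The first entry must cover the start of the text range. */
	assert(demo_ip[0] <= DEMO_TEXT_START);

	for (b = 0; b < DEMO_NUM_BLOCKS; b++)
		demo_lookup[b] = demo_search(DEMO_TEXT_START + b * DEMO_BLOCK_SIZE,
					     0, DEMO_NUM_ENTRIES - 1);
	demo_lookup[DEMO_NUM_BLOCKS] = DEMO_NUM_ENTRIES - 1;
}

/* Map an address to its entry, searching only one block's index range. */
static const struct demo_orc_entry *demo_find(uint64_t ip)
{
	size_t b;

	if (ip < DEMO_TEXT_START || ip >= DEMO_TEXT_END)
		return NULL;

	b = (size_t)((ip - DEMO_TEXT_START) / DEMO_BLOCK_SIZE);
	return &demo_orc[demo_search(ip, demo_lookup[b], demo_lookup[b + 1])];
}
```

After one call to ``demo_build_lookup()``, ``demo_find(0x1008)`` returns
the entry for the range starting at 0x1004, and an address outside the
sample text range returns NULL. Restricting the binary search to the
index range of a single block is what keeps the searched subset small as
the table grows.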