1*4882a593Smuzhiyun.. SPDX-License-Identifier: GPL-2.0 2*4882a593Smuzhiyun 3*4882a593Smuzhiyun============================ 4*4882a593SmuzhiyunLC-trie implementation notes 5*4882a593Smuzhiyun============================ 6*4882a593Smuzhiyun 7*4882a593SmuzhiyunNode types 8*4882a593Smuzhiyun---------- 9*4882a593Smuzhiyunleaf 10*4882a593Smuzhiyun An end node with data. This has a copy of the relevant key, along 11*4882a593Smuzhiyun with 'hlist' with routing table entries sorted by prefix length. 12*4882a593Smuzhiyun See struct leaf and struct leaf_info. 13*4882a593Smuzhiyun 14*4882a593Smuzhiyuntrie node or tnode 15*4882a593Smuzhiyun An internal node, holding an array of child (leaf or tnode) pointers, 16*4882a593Smuzhiyun indexed through a subset of the key. See Level Compression. 17*4882a593Smuzhiyun 18*4882a593SmuzhiyunA few concepts explained 19*4882a593Smuzhiyun------------------------ 20*4882a593SmuzhiyunBits (tnode) 21*4882a593Smuzhiyun The number of bits in the key segment used for indexing into the 22*4882a593Smuzhiyun child array - the "child index". See Level Compression. 23*4882a593Smuzhiyun 24*4882a593SmuzhiyunPos (tnode) 25*4882a593Smuzhiyun The position (in the key) of the key segment used for indexing into 26*4882a593Smuzhiyun the child array. See Path Compression. 27*4882a593Smuzhiyun 28*4882a593SmuzhiyunPath Compression / skipped bits 29*4882a593Smuzhiyun Any given tnode is linked to from the child array of its parent, using 30*4882a593Smuzhiyun a segment of the key specified by the parent's "pos" and "bits" 31*4882a593Smuzhiyun In certain cases, this tnode's own "pos" will not be immediately 32*4882a593Smuzhiyun adjacent to the parent (pos+bits), but there will be some bits 33*4882a593Smuzhiyun in the key skipped over because they represent a single path with no 34*4882a593Smuzhiyun deviations. These "skipped bits" constitute Path Compression. 35*4882a593Smuzhiyun Note that the search algorithm will simply skip over these bits when 36*4882a593Smuzhiyun searching, making it necessary to save the keys in the leaves to 37*4882a593Smuzhiyun verify that they actually do match the key we are searching for. 38*4882a593Smuzhiyun 39*4882a593SmuzhiyunLevel Compression / child arrays 40*4882a593Smuzhiyun the trie is kept level balanced moving, under certain conditions, the 41*4882a593Smuzhiyun children of a full child (see "full_children") up one level, so that 42*4882a593Smuzhiyun instead of a pure binary tree, each internal node ("tnode") may 43*4882a593Smuzhiyun contain an arbitrarily large array of links to several children. 44*4882a593Smuzhiyun Conversely, a tnode with a mostly empty child array (see empty_children) 45*4882a593Smuzhiyun may be "halved", having some of its children moved downwards one level, 46*4882a593Smuzhiyun in order to avoid ever-increasing child arrays. 47*4882a593Smuzhiyun 48*4882a593Smuzhiyunempty_children 49*4882a593Smuzhiyun the number of positions in the child array of a given tnode that are 50*4882a593Smuzhiyun NULL. 51*4882a593Smuzhiyun 52*4882a593Smuzhiyunfull_children 53*4882a593Smuzhiyun the number of children of a given tnode that aren't path compressed. 54*4882a593Smuzhiyun (in other words, they aren't NULL or leaves and their "pos" is equal 55*4882a593Smuzhiyun to this tnode's "pos"+"bits"). 56*4882a593Smuzhiyun 57*4882a593Smuzhiyun (The word "full" here is used more in the sense of "complete" than 58*4882a593Smuzhiyun as the opposite of "empty", which might be a tad confusing.) 59*4882a593Smuzhiyun 60*4882a593SmuzhiyunComments 61*4882a593Smuzhiyun--------- 62*4882a593Smuzhiyun 63*4882a593SmuzhiyunWe have tried to keep the structure of the code as close to fib_hash as 64*4882a593Smuzhiyunpossible to allow verification and help up reviewing. 65*4882a593Smuzhiyun 66*4882a593Smuzhiyunfib_find_node() 67*4882a593Smuzhiyun A good start for understanding this code. This function implements a 68*4882a593Smuzhiyun straightforward trie lookup. 69*4882a593Smuzhiyun 70*4882a593Smuzhiyunfib_insert_node() 71*4882a593Smuzhiyun Inserts a new leaf node in the trie. This is bit more complicated than 72*4882a593Smuzhiyun fib_find_node(). Inserting a new node means we might have to run the 73*4882a593Smuzhiyun level compression algorithm on part of the trie. 74*4882a593Smuzhiyun 75*4882a593Smuzhiyuntrie_leaf_remove() 76*4882a593Smuzhiyun Looks up a key, deletes it and runs the level compression algorithm. 77*4882a593Smuzhiyun 78*4882a593Smuzhiyuntrie_rebalance() 79*4882a593Smuzhiyun The key function for the dynamic trie after any change in the trie 80*4882a593Smuzhiyun it is run to optimize and reorganize. It will walk the trie upwards 81*4882a593Smuzhiyun towards the root from a given tnode, doing a resize() at each step 82*4882a593Smuzhiyun to implement level compression. 83*4882a593Smuzhiyun 84*4882a593Smuzhiyunresize() 85*4882a593Smuzhiyun Analyzes a tnode and optimizes the child array size by either inflating 86*4882a593Smuzhiyun or shrinking it repeatedly until it fulfills the criteria for optimal 87*4882a593Smuzhiyun level compression. This part follows the original paper pretty closely 88*4882a593Smuzhiyun and there may be some room for experimentation here. 89*4882a593Smuzhiyun 90*4882a593Smuzhiyuninflate() 91*4882a593Smuzhiyun Doubles the size of the child array within a tnode. Used by resize(). 92*4882a593Smuzhiyun 93*4882a593Smuzhiyunhalve() 94*4882a593Smuzhiyun Halves the size of the child array within a tnode - the inverse of 95*4882a593Smuzhiyun inflate(). Used by resize(); 96*4882a593Smuzhiyun 97*4882a593Smuzhiyunfn_trie_insert(), fn_trie_delete(), fn_trie_select_default() 98*4882a593Smuzhiyun The route manipulation functions. Should conform pretty closely to the 99*4882a593Smuzhiyun corresponding functions in fib_hash. 100*4882a593Smuzhiyun 101*4882a593Smuzhiyunfn_trie_flush() 102*4882a593Smuzhiyun This walks the full trie (using nextleaf()) and searches for empty 103*4882a593Smuzhiyun leaves which have to be removed. 104*4882a593Smuzhiyun 105*4882a593Smuzhiyunfn_trie_dump() 106*4882a593Smuzhiyun Dumps the routing table ordered by prefix length. This is somewhat 107*4882a593Smuzhiyun slower than the corresponding fib_hash function, as we have to walk the 108*4882a593Smuzhiyun entire trie for each prefix length. In comparison, fib_hash is organized 109*4882a593Smuzhiyun as one "zone"/hash per prefix length. 110*4882a593Smuzhiyun 111*4882a593SmuzhiyunLocking 112*4882a593Smuzhiyun------- 113*4882a593Smuzhiyun 114*4882a593Smuzhiyunfib_lock is used for an RW-lock in the same way that this is done in fib_hash. 115*4882a593SmuzhiyunHowever, the functions are somewhat separated for other possible locking 116*4882a593Smuzhiyunscenarios. It might conceivably be possible to run trie_rebalance via RCU 117*4882a593Smuzhiyunto avoid read_lock in the fn_trie_lookup() function. 118*4882a593Smuzhiyun 119*4882a593SmuzhiyunMain lookup mechanism 120*4882a593Smuzhiyun--------------------- 121*4882a593Smuzhiyunfn_trie_lookup() is the main lookup function. 122*4882a593Smuzhiyun 123*4882a593SmuzhiyunThe lookup is in its simplest form just like fib_find_node(). We descend the 124*4882a593Smuzhiyuntrie, key segment by key segment, until we find a leaf. check_leaf() does 125*4882a593Smuzhiyunthe fib_semantic_match in the leaf's sorted prefix hlist. 126*4882a593Smuzhiyun 127*4882a593SmuzhiyunIf we find a match, we are done. 128*4882a593Smuzhiyun 129*4882a593SmuzhiyunIf we don't find a match, we enter prefix matching mode. The prefix length, 130*4882a593Smuzhiyunstarting out at the same as the key length, is reduced one step at a time, 131*4882a593Smuzhiyunand we backtrack upwards through the trie trying to find a longest matching 132*4882a593Smuzhiyunprefix. The goal is always to reach a leaf and get a positive result from the 133*4882a593Smuzhiyunfib_semantic_match mechanism. 134*4882a593Smuzhiyun 135*4882a593SmuzhiyunInside each tnode, the search for longest matching prefix consists of searching 136*4882a593Smuzhiyunthrough the child array, chopping off (zeroing) the least significant "1" of 137*4882a593Smuzhiyunthe child index until we find a match or the child index consists of nothing but 138*4882a593Smuzhiyunzeros. 139*4882a593Smuzhiyun 140*4882a593SmuzhiyunAt this point we backtrack (t->stats.backtrack++) up the trie, continuing to 141*4882a593Smuzhiyunchop off part of the key in order to find the longest matching prefix. 142*4882a593Smuzhiyun 143*4882a593SmuzhiyunAt this point we will repeatedly descend subtries to look for a match, and there 144*4882a593Smuzhiyunare some optimizations available that can provide us with "shortcuts" to avoid 145*4882a593Smuzhiyundescending into dead ends. Look for "HL_OPTIMIZE" sections in the code. 146*4882a593Smuzhiyun 147*4882a593SmuzhiyunTo alleviate any doubts about the correctness of the route selection process, 148*4882a593Smuzhiyuna new netlink operation has been added. Look for NETLINK_FIB_LOOKUP, which 149*4882a593Smuzhiyungives userland access to fib_lookup(). 150