======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.
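
As a rough illustration of the iterator described above, the following
userspace C sketch models a biovec array and an iterator that tracks progress
without touching the array. The model_* names and the simplified structs are
assumptions made for this example; they are not the kernel's definitions (there
is no struct page here, for instance), and the real helpers handle more cases.

```c
#include <assert.h>

/* Simplified userspace stand-ins; the kernel's real structs differ. */
struct model_bvec {
	unsigned int bv_offset;	/* start of the data within the page */
	unsigned int bv_len;	/* bytes of data in this entry */
};

struct model_iter {
	unsigned long long bi_sector;	/* current 512-byte sector */
	unsigned int bi_size;		/* bytes remaining in the range */
	unsigned int bi_idx;		/* index of the current biovec */
	unsigned int bi_bvec_done;	/* bytes completed in the current biovec */
};

/* Build the "partially completed" view of the current biovec, by value. */
static struct model_bvec model_iter_bvec(const struct model_bvec *bv,
					 struct model_iter it)
{
	struct model_bvec view;
	unsigned int left = bv[it.bi_idx].bv_len - it.bi_bvec_done;

	view.bv_offset = bv[it.bi_idx].bv_offset + it.bi_bvec_done;
	view.bv_len = left < it.bi_size ? left : it.bi_size;
	return view;
}

/* Advance the iterator by `bytes`; the biovec array is only read. */
static void model_iter_advance(const struct model_bvec *bv,
			       struct model_iter *it, unsigned int bytes)
{
	assert(bytes <= it->bi_size);
	it->bi_sector += bytes >> 9;
	it->bi_size -= bytes;
	bytes += it->bi_bvec_done;	/* count from the start of the entry */
	while (bytes && bytes >= bv[it->bi_idx].bv_len) {
		bytes -= bv[it->bi_idx].bv_len;
		it->bi_idx++;
	}
	it->bi_bvec_done = bytes;
}
```

In this model, advancing by 1024 bytes into a 4096-byte entry leaves the entry
itself unchanged, while the view returned by model_iter_bvec() starts 1024
bytes further in and reports 3072 bytes remaining.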

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was tricky and
   convoluted - it had to walk two different bios at the same time, keeping
   both bi_idx and an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array
   - again, this was being done in a fair number of places.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken to ensure the biovec isn't freed while the split bio is
   still using it, in case the original bio completes first, though. Using
   bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they are - submitting partially completed bios is
   perfectly fine.

Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.

Usage of helpers:
=================

* The following helpers, whose names have the suffix `_all`, can only be used
  on non-BIO_CLONED bios. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

	bio_for_each_segment_all()
	bio_for_each_bvec_all()
	bio_first_bvec_all()
	bio_first_page_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segments.
  The passed 'struct bio_vec' will contain a single-page IO vector during the
  iteration::

	bio_for_each_segment()
	bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

	bio_for_each_bvec()
	bio_for_each_bvec_all()
	rq_for_each_bvec()
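
The difference between the single-page and multi-page helpers can be sketched
in userspace C. The model below splits one multi-page vector into single-page
pieces; the model_* names, the page frame number field and the fixed 4096-byte
page size are assumptions made for illustration, not kernel definitions.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Simplified stand-in: a page frame number replaces the struct page pointer. */
struct model_mp_bvec {
	unsigned long pfn;	/* first page of the (possibly multi-page) vector */
	unsigned int bv_offset;	/* byte offset from the start of that page */
	unsigned int bv_len;	/* total bytes, may span several pages */
};

/*
 * Return the single-page segment that starts `done` bytes into `mp`,
 * clipped so that it never crosses a page boundary.
 */
static struct model_mp_bvec model_single_page(struct model_mp_bvec mp,
					      unsigned int done)
{
	struct model_mp_bvec seg;
	unsigned int off = mp.bv_offset + done;
	unsigned int left = mp.bv_len - done;
	unsigned int room;

	seg.pfn = mp.pfn + off / MODEL_PAGE_SIZE;
	seg.bv_offset = off % MODEL_PAGE_SIZE;
	room = MODEL_PAGE_SIZE - seg.bv_offset;
	seg.bv_len = left < room ? left : room;
	return seg;
}

/* Count how many single-page segments one multi-page vector yields. */
static unsigned int model_count_segments(struct model_mp_bvec mp)
{
	unsigned int done = 0, n = 0;

	while (done < mp.bv_len) {
		done += model_single_page(mp, done).bv_len;
		n++;
	}
	return n;
}
```

Under these assumptions, a vector that starts 1024 bytes into page 100 and
covers 8192 bytes yields three single-page segments (3072, 4096 and 1024
bytes), whereas a multi-page iteration would visit it once.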