======================================
Immutable biovecs and biovec iterators
======================================

Kent Overstreet <kmo@daterainc.com>

As of 3.13, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec -
the iterator will be modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

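To make the split between immutable data and mutable state concrete, here is a
minimal userspace sketch in C. It is not the kernel code: struct bio_vec is
reduced to a hypothetical bv_data pointer standing in for bv_page, sector
accounting is left out, and model_iter_advance() is an illustrative name for
logic along the lines of bvec_iter_advance().

```c
#include <assert.h>

/* Simplified userspace stand-in (assumption): the kernel's struct bio_vec
 * refers to a struct page, modelled here by a plain data pointer. */
struct bio_vec {
	const char  *bv_data;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* The four fields the text says were moved into the iterator. */
struct bvec_iter {
	unsigned long long bi_sector;    /* device address, in sectors */
	unsigned int       bi_size;      /* bytes remaining */
	unsigned int       bi_idx;       /* index of the current bvec */
	unsigned int       bi_bvec_done; /* bytes completed in that bvec */
};

/* Advance the iterator by bytes. The bvec array is const: only the
 * iterator changes. bi_sector is deliberately not updated here. */
static inline void model_iter_advance(const struct bio_vec *bv,
				      struct bvec_iter *iter,
				      unsigned int bytes)
{
	if (bytes > iter->bi_size)
		bytes = iter->bi_size;

	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;

	/* Step over every bvec that is now fully completed. */
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len;
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;
}
```

In this model, completing 5 bytes of an 8-byte bvec leaves bv_len and
bv_offset untouched and records the progress as bi_bvec_done = 5.
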
There are a bunch of new helper macros for hiding the gory details - in
particular, presenting the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

 * Driver code should no longer refer to biovecs directly; we now have
   bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
   constructed from the raw biovecs but taking into account bi_bvec_done and
   bi_size.

   bio_for_each_segment() has been updated to take a bvec_iter argument
   instead of an integer (that corresponded to bi_idx); for a lot of code the
   conversion just required changing the types of the arguments to
   bio_for_each_segment().

 * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
   advances the bio integrity's iter if present.

   There is a lower level advance function - bvec_iter_advance() - which takes
   a pointer to a biovec, not a bio; this is used by the bio integrity code.

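The helpers above can be approximated in userspace as follows. This is a
sketch under stated assumptions, not the kernel macros: model_iter_iovec(),
model_advance_iter() and model_gather() are hypothetical names, bv_data stands
in for bv_page, and no clamping to page boundaries is attempted.

```c
#include <assert.h>
#include <string.h>

/* Simplified userspace stand-ins (assumption): bv_data replaces the
 * kernel's bv_page, and the iterator is used without a struct bio. */
struct bio_vec {
	const char  *bv_data;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned int bi_size;      /* bytes remaining in the range */
	unsigned int bi_idx;       /* index of the current raw bvec */
	unsigned int bi_bvec_done; /* bytes completed in that bvec */
};

/* Build a bio_vec by value for the iterator's current position, in the
 * spirit of bio_iter_iovec(). The raw array is never written. */
static inline struct bio_vec model_iter_iovec(const struct bio_vec *bv,
					      struct bvec_iter iter)
{
	struct bio_vec cur = bv[iter.bi_idx];

	cur.bv_offset += iter.bi_bvec_done;
	cur.bv_len -= iter.bi_bvec_done;
	if (cur.bv_len > iter.bi_size)
		cur.bv_len = iter.bi_size; /* the range may end mid-bvec */
	return cur;
}

/* Step the iterator forward, in the spirit of bio_advance_iter(). */
static inline void model_advance_iter(const struct bio_vec *bv,
				      struct bvec_iter *iter,
				      unsigned int bytes)
{
	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len;
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;
}

/* Walk the range the way a bio_for_each_segment() loop would, copying
 * each constructed bio_vec into out. Returns the bytes copied. */
static inline unsigned int model_gather(const struct bio_vec *bv,
					struct bvec_iter iter, char *out)
{
	unsigned int copied = 0;

	while (iter.bi_size) {
		struct bio_vec cur = model_iter_iovec(bv, iter);

		memcpy(out + copied, cur.bv_data + cur.bv_offset, cur.bv_len);
		copied += cur.bv_len;
		model_advance_iter(bv, &iter, cur.bv_len);
	}
	return copied;
}
```

The iterator is passed by value to model_gather(), so the caller's copy still
describes the whole range afterwards; nothing the loop does is visible in the
bvec array itself.
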
What's all this get us?
=======================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

 * Before, iterating over bios was very awkward when you weren't processing
   exactly one bvec at a time - for example, bio_copy_data() in block/bio.c,
   which copies the contents of one bio into another. Because the biovecs
   wouldn't necessarily be the same size, the old code was convoluted - it
   had to walk two different bios at the same time, keeping both bi_idx and
   an offset into the current biovec for each.

   The new code is much more straightforward - have a look. This sort of
   pattern comes up in a lot of places; a lot of drivers were essentially open
   coding bvec iterators before, and having a common implementation
   considerably simplifies a lot of code.

 * Before, any code that might need to use the biovec after the bio had been
   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
   it somewhere else if there was an error) had to save the entire bvec array;
   again, this was being done in a fair number of places. Now that the biovec
   is never modified, saving a copy of the bvec_iter is enough.

 * Biovecs can be shared between multiple bios - a bvec iter can represent an
   arbitrary range of an existing biovec, both starting and ending midway
   through biovecs. This is what enables efficient splitting of arbitrary
   bios. Note that this means we _only_ use bi_size to determine when we've
   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
   bi_size into account when constructing biovecs.

 * Splitting bios is now much simpler. The old bio_split() didn't even work on
   bios with more than a single bvec! Now, we can efficiently split arbitrary
   size bios - because the new bio can share the old bio's biovec.

   Care must be taken to ensure the biovec isn't freed while the split bio is
   still using it, in case the original bio completes first, though. Using
   bio_chain() when splitting bios helps with this.

 * Submitting partially completed bios is now perfectly fine - this comes up
   occasionally in stacking block drivers, and various code (e.g. md and
   bcache) had some ugly workarounds for this.

   It used to be the case that submitting a partially completed bio would work
   fine to _most_ devices, but since accessing the raw bvec array was the
   norm, not all drivers would respect bi_idx and those would break. Now,
   since all drivers _must_ go through the bvec iterator - and have been
   audited to make sure they do - submitting partially completed bios is
   perfectly fine.

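The sharing that makes splitting cheap can be sketched in userspace C. The
struct model_bio type and both function names are hypothetical; the sketch
only models the iterator arithmetic (copy the iterator, truncate the front
part, advance the remainder) and ignores the lifetime question raised above,
which bio_chain() helps with in the kernel.

```c
#include <assert.h>

/* Simplified userspace stand-ins (assumption): bv_data replaces bv_page. */
struct bio_vec {
	const char  *bv_data;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct bvec_iter {
	unsigned long long bi_sector;    /* device address, 512-byte sectors */
	unsigned int       bi_size;      /* bytes remaining */
	unsigned int       bi_idx;       /* index of the current bvec */
	unsigned int       bi_bvec_done; /* bytes completed in that bvec */
};

/* Hypothetical miniature bio: a pointer to a shared, read-only bvec
 * array plus a private iterator describing this bio's range of it. */
struct model_bio {
	const struct bio_vec *bi_io_vec;
	struct bvec_iter      bi_iter;
};

/* Advance the bio's own iterator; bytes is assumed sector-aligned. */
static inline void model_bio_advance(struct model_bio *bio,
				     unsigned int bytes)
{
	const struct bio_vec *bv = bio->bi_io_vec;
	struct bvec_iter *iter = &bio->bi_iter;

	iter->bi_sector += bytes >> 9;
	iter->bi_size -= bytes;
	bytes += iter->bi_bvec_done;
	while (bytes && bytes >= bv[iter->bi_idx].bv_len) {
		bytes -= bv[iter->bi_idx].bv_len;
		iter->bi_idx++;
	}
	iter->bi_bvec_done = bytes;
}

/* Split off the first sectors of bio. The returned front part and the
 * remainder left in bio both point at the same bvec array. */
static inline struct model_bio model_bio_split(struct model_bio *bio,
					       unsigned int sectors)
{
	struct model_bio front = *bio; /* copies the iterator, not the bvecs */

	front.bi_iter.bi_size = sectors << 9;
	model_bio_advance(bio, front.bi_iter.bi_size);
	return front;
}
```

Splitting an 8192-byte bio backed by two 4096-byte bvecs at 10 sectors leaves
the remainder starting 1024 bytes into the second bvec, with no bvec copied or
modified.
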
Other implications:
===================

 * Almost all usage of bi_idx is now incorrect and has been removed; instead,
   where previously you would have used bi_idx you'd now use a bvec_iter,
   probably passing it to one of the helper macros.

   I.e. instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you
   now use bio_iter_iovec(), which takes a bvec_iter and returns a
   literal struct bio_vec - constructed on the fly from the raw biovec but
   taking into account bi_bvec_done (and bi_size).

 * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
   doesn't actually own the bio. The reason is twofold: firstly, it's not
   actually needed for iterating over the bio anymore - we only use bi_size.
   Secondly, when cloning a bio and reusing (a portion of) the original bio's
   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
   over all the biovecs in the new bio - which is silly as it's not needed.

   So, don't use bi_vcnt anymore.

 * The current interface allows the block layer to split bios as needed, so we
   could eliminate a lot of complexity particularly in stacked drivers. Code
   that creates bios can then create whatever size bios are convenient, and
   more importantly stacked drivers don't have to deal with both their own bio
   size limitations and the limitations of the underlying devices. Thus
   there's no need to define ->merge_bvec_fn() callbacks for individual block
   drivers.

Usage of helpers:
=================

* The following helpers, whose names have the suffix ``_all``, can only be used
  on a non-BIO_CLONED bio. They are usually used by filesystem code. Drivers
  shouldn't use them because the bio may have been split before it reached the
  driver.

::

	bio_for_each_segment_all()
	bio_for_each_bvec_all()
	bio_first_bvec_all()
	bio_first_page_all()
	bio_last_bvec_all()

* The following helpers iterate over single-page segments. The passed 'struct
  bio_vec' will contain a single-page IO vector during the iteration::

	bio_for_each_segment()
	bio_for_each_segment_all()

* The following helpers iterate over multi-page bvecs. The passed 'struct
  bio_vec' will contain a multi-page IO vector during the iteration::

	bio_for_each_bvec()
	bio_for_each_bvec_all()
	rq_for_each_bvec()
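
The difference between the two families is only in how a bvec is chopped up.
As a rough arithmetic model (assuming 4 KiB pages, with a hypothetical
struct model_bvec that keeps just an offset and a length), the number of
single-page segments produced from a set of multi-page bvecs can be counted
like this:

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical reduced bvec: the offset into its first page and its
 * total length, which may span several pages. */
struct model_bvec {
	unsigned int bv_offset;
	unsigned int bv_len;
};

/* Count the single-page segments a segment-wise walk would visit for
 * nr multi-page bvecs. A bvec-wise walk would visit exactly nr. */
static inline unsigned int model_count_segments(const struct model_bvec *bv,
						unsigned int nr)
{
	unsigned int i, segs = 0;

	for (i = 0; i < nr; i++) {
		unsigned int off = bv[i].bv_offset;
		unsigned int left = bv[i].bv_len;

		while (left) {
			/* A segment never crosses a page boundary. */
			unsigned int in_page =
				MODEL_PAGE_SIZE - off % MODEL_PAGE_SIZE;
			unsigned int len = left < in_page ? left : in_page;

			left -= len;
			off += len;
			segs++;
		}
	}
	return segs;
}
```

For two bvecs, one of 8192 bytes starting 512 bytes into a page and one of
4096 page-aligned bytes, a bvec-wise walk sees 2 entries while this model
counts 4 single-page segments.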