.. SPDX-License-Identifier: GPL-2.0

====================================================
In-Kernel Cache Object Representation and Management
====================================================

By: David Howells <dhowells@redhat.com>

.. Contents:

 (*) Representation

 (*) Object management state machine.

     - Provision of CPU time.
     - Locking simplification.

 (*) The set of states.

 (*) The set of events.


Representation
==============

FS-Cache maintains an in-kernel representation of each object that a netfs is
currently interested in. Such objects are represented by the fscache_cookie
struct and are referred to as cookies.

FS-Cache also maintains a separate in-kernel representation of the objects that
a cache backend is currently actively caching. Such objects are represented by
the fscache_object struct. The cache backends allocate these upon request, and
are expected to embed them in their own representations. These are referred to
as objects.

There is a 1:N relationship between cookies and objects. A cookie may be
represented by multiple objects - an index may exist in more than one cache -
or even by no objects (it may not be cached).
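
The 1:N relationship between a cookie and its objects can be pictured with a
minimal sketch. Everything named below (sketch_cookie, sketch_object,
sketch_attach(), sketch_count_objects() and the cache_id field) is
hypothetical and chosen for illustration only; these are not the layouts of
the kernel's fscache_cookie or fscache_object structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's struct layouts. */
struct sketch_cookie;

struct sketch_object {
	struct sketch_cookie *cookie;	/* the one cookie this object backs */
	struct sketch_object *next;	/* next object backing the same cookie */
	int cache_id;			/* which cache holds this object */
};

struct sketch_cookie {
	struct sketch_object *objects;	/* NULL means "not cached anywhere" */
};

/* Attach an object to a cookie: one cookie, any number of objects. */
static void sketch_attach(struct sketch_cookie *cookie,
			  struct sketch_object *object, int cache_id)
{
	assert(object->cookie == NULL);	/* an object backs exactly one cookie */
	object->cookie = cookie;
	object->cache_id = cache_id;
	object->next = cookie->objects;
	cookie->objects = object;
}

/* Count the objects currently backing a cookie (zero or more). */
static size_t sketch_count_objects(const struct sketch_cookie *cookie)
{
	size_t n = 0;
	const struct sketch_object *o;

	for (o = cookie->objects; o != NULL; o = o->next)
		n++;
	return n;
}
```

A cookie whose list is empty corresponds to the uncached case, and a cookie
with two attached objects corresponds to an index that exists in two caches.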

Furthermore, both cookies and objects are hierarchical. The two hierarchies
correspond, but the cookies tree is a superset of the union of the object trees
of multiple caches::

        NETFS INDEX TREE               :      CACHE 1     :      CACHE 2
                                       :                  :
                                       :   +-----------+  :
                              +----------->|  IObject  |  :
          +-----------+       |        :   +-----------+  :
          |  ICookie  |-------+        :         |        :
          +-----------+       |        :         |        :   +-----------+
                |             +------------------------------>|  IObject  |
                |                      :         |        :   +-----------+
                |                      :         V        :         |
                |                      :   +-----------+  :         |
                V             +----------->|  IObject  |  :         |
          +-----------+       |        :   +-----------+  :         |
          |  ICookie  |-------+        :         |        :         V
          +-----------+       |        :         |        :   +-----------+
                |             +------------------------------>|  IObject  |
          +-----+-----+                :         |        :   +-----------+
          |           |                :         |        :         |
          V           |                :         V        :         |
    +-----------+     |                :   +-----------+  :         |
    |  ICookie  |------------------------->|  IObject  |  :         |
    +-----------+     |                :   +-----------+  :         |
          |           V                :         |        :         V
          |     +-----------+          :         |        :   +-----------+
          |     |  ICookie  |-------------------------------->|  IObject  |
          |     +-----------+          :         |        :   +-----------+
          V           |                :         V        :         |
    +-----------+     |                :   +-----------+  :         |
    |  DCookie  |------------------------->|  DObject  |  :         |
    +-----------+     |                :   +-----------+  :         |
                      |                :                  :         |
              +-------+-------+        :                  :         |
              |               |        :                  :         |
              V               V        :                  :         V
        +-----------+   +-----------+  :                  :   +-----------+
        |  DCookie  |   |  DCookie  |------------------------>|  DObject  |
        +-----------+   +-----------+  :                  :   +-----------+
                                       :                  :

In the above illustration, ICookie and IObject represent indices and DCookie
and DObject represent data storage objects. Indices may have representation in
multiple caches, but currently, non-index objects may not. Objects of any type
may also be entirely unrepresented.

As far as the netfs API goes, the netfs is only actually permitted to see
pointers to the cookies. The cookies themselves and any objects attached to
those cookies are hidden from it.


Object Management State Machine
===============================

Within FS-Cache, each active object is managed by its own individual state
machine. The state for an object is kept in the fscache_object struct, in
object->state. A cookie may point to a set of objects that are in different
states.

Each state has an action associated with it that is invoked when the machine
wakes up in that state. There are four logical sets of states:

 (1) Preparation: states that wait for the parent objects to become ready. The
     representations are hierarchical, and it is expected that an object must
     be created or accessed with respect to its parent object.

 (2) Initialisation: states that perform lookups in the cache and validate
     what's found and that create on disk any missing metadata.

 (3) Normal running: states that allow netfs operations on objects to proceed
     and that update the state of objects.

 (4) Termination: states that detach objects from their netfs cookies, that
     delete objects from disk, that handle disk and system errors and that free
     up in-memory resources.


In most cases, transitioning between states is in response to signalled events.
When a state has finished processing, it will usually set the mask of events in
which it is interested (object->event_mask) and relinquish the worker thread.
Then when an event is raised (by calling fscache_raise_event()), if the event
is not masked, the object will be queued for processing (by calling
fscache_enqueue_object()).


Provision of CPU Time
---------------------

The work to be done by the various states was given CPU time by the threads of
the slow work facility. This was used in preference to the workqueue facility
because:

 (1) Threads may be completely occupied for very long periods of time by a
     particular work item. These state actions may be doing sequences of
     synchronous, journalled disk accesses (lookup, mkdir, create, setxattr,
     getxattr, truncate, unlink, rmdir, rename).

 (2) Threads may do little actual work, but may rather spend a lot of time
     sleeping on I/O. This means that single-threaded and 1-per-CPU-threaded
     workqueues don't necessarily have the right numbers of threads.


Locking Simplification
----------------------

Because only one worker thread may be operating on any particular object's
state machine at once, this simplifies the locking, particularly with respect
to disconnecting the netfs's representation of a cache object (fscache_cookie)
from the cache backend's representation (fscache_object) - which may be
requested from either end.


The Set of States
=================

The object state machine has a set of states that it can be in. There are
preparation states in which the object sets itself up and waits for its parent
object to transit to a state that allows access to its children:

 (1) State FSCACHE_OBJECT_INIT.

     Initialise the object and wait for the parent object to become active. In
     the cache, it is expected that it will not be possible to look an object
     up from the parent object, until that parent object itself has been looked
     up.

There are initialisation states in which the object sets itself up and accesses
disk for the object metadata:

 (2) State FSCACHE_OBJECT_LOOKING_UP.

     Look up the object on disk, using the parent as a starting point.
     FS-Cache expects the cache backend to probe the cache to see whether this
     object is represented there, and if it is, to see if it's valid (coherency
     management).

     The cache should call fscache_object_lookup_negative() to indicate lookup
     failure for whatever reason, and should call fscache_obtained_object() to
     indicate success.

     At the completion of lookup, FS-Cache will let the netfs go ahead with
     read operations, no matter whether the file is yet cached. If not yet
     cached, read operations will be immediately rejected with ENODATA until
     the first known page is uncached, since up to that point there can be no
     data to be read out of the cache for that file that isn't currently also
     held in the pagecache.

 (3) State FSCACHE_OBJECT_CREATING.

     Create an object on disk, using the parent as a starting point. This
     happens if the lookup failed to find the object, or if the object's
     coherency data indicated what's on disk is out of date. In this state,
     FS-Cache expects the cache to create the object.

     The cache should call fscache_obtained_object() if creation completes
     successfully, fscache_object_lookup_negative() otherwise.

     At the completion of creation, FS-Cache will start processing write
     operations the netfs has queued for an object. If creation failed, the
     write ops will be transparently discarded, and nothing recorded in the
     cache.

There are some normal running states in which the object spends its time
servicing netfs requests:

 (4) State FSCACHE_OBJECT_AVAILABLE.

     A transient state in which pending operations are started, child objects
     are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary
     lookup data is freed.

 (5) State FSCACHE_OBJECT_ACTIVE.

     The normal running state. In this state, requests the netfs makes will be
     passed on to the cache.

 (6) State FSCACHE_OBJECT_INVALIDATING.

     The object is undergoing invalidation. When the state machine comes here,
     it discards all pending read, write and attribute change operations as it
     is going to clear out the cache entirely and reinitialise it. It will
     then continue to the FSCACHE_OBJECT_UPDATING state.

 (7) State FSCACHE_OBJECT_UPDATING.

     The state machine comes here to update the object in the cache from the
     netfs's records. This involves updating the auxiliary data that is used
     to maintain coherency.

And there are terminal states in which an object cleans itself up, deallocates
memory and potentially deletes data from disk:

 (8) State FSCACHE_OBJECT_LC_DYING.

     The object comes here if it is dying because of a lookup or creation
     error. This would be due to a disk error or system error of some sort.
     Temporary data is cleaned up, and the parent is released.

 (9) State FSCACHE_OBJECT_DYING.

     The object comes here if it is dying due to an error, because its parent
     cookie has been relinquished by the netfs or because the cache is being
     withdrawn.

     Any child objects waiting on this one are given CPU time so that they too
     can destroy themselves. This object waits for all its children to go away
     before advancing to the next state.

(10) State FSCACHE_OBJECT_ABORT_INIT.

     The object comes to this state if it was waiting on its parent in
     FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself
     so that the parent may proceed from the FSCACHE_OBJECT_DYING state.

(11) State FSCACHE_OBJECT_RELEASING.
(12) State FSCACHE_OBJECT_RECYCLING.

     The object comes to one of these two states when dying once it is rid of
     all its children, if it is dying because the netfs relinquished its
     cookie. In the first state, the cached data is expected to persist, and
     in the second it will be deleted.

(13) State FSCACHE_OBJECT_WITHDRAWING.

     The object transits to this state if the cache decides it wants to
     withdraw the object from service, perhaps to make space, but also due to
     error or just because the whole cache is being withdrawn.

(14) State FSCACHE_OBJECT_DEAD.

     The object transits to this state when the in-memory object record is
     ready to be deleted. The object processor shouldn't ever see an object in
     this state.


The Set of Events
=================

There are a number of events that can be raised to an object state machine:

 FSCACHE_OBJECT_EV_UPDATE
     The netfs requested that an object be updated. The state machine will ask
     the cache backend to update the object, and the cache backend will ask the
     netfs for details of the change through its cookie definition ops.

 FSCACHE_OBJECT_EV_CLEARED
     This is signalled in two circumstances:

     (a) when an object's last child object is dropped and

     (b) when the last operation outstanding on an object is completed.

     This is used to proceed from the dying state.

 FSCACHE_OBJECT_EV_ERROR
     This is signalled when an I/O error occurs during the processing of some
     object.

 FSCACHE_OBJECT_EV_RELEASE, FSCACHE_OBJECT_EV_RETIRE
     These are signalled when the netfs relinquishes a cookie it was using.
     The event selected depends on whether the netfs asks for the backing
     object to be retired (deleted) or retained.

 FSCACHE_OBJECT_EV_WITHDRAW
     This is signalled when the cache backend wants to withdraw an object.
     This means that the object will have to be detached from the netfs's
     cookie.

Because the withdrawing and releasing/retiring events are all handled by the
object state machine, it doesn't matter if there's a collision with both ends
trying to sever the connection at the same time. The state machine can just
pick which one it wants to honour, and honouring that one also brings about
the other.
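
As a rough illustration of the event handling described in this document, the
sketch below models an event mask and the queueing decision made when an event
is raised, plus one possible way of resolving a withdraw/release collision.
All names (sketch_machine, sketch_raise_event(), sketch_pick_teardown()) and
the bit numbering are hypothetical, and the priority order used to resolve the
collision is an assumption made for the example; none of this is the kernel's
fscache_raise_event() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event numbers; only the names echo the events listed above. */
enum sketch_event {
	SKETCH_EV_UPDATE,
	SKETCH_EV_CLEARED,
	SKETCH_EV_ERROR,
	SKETCH_EV_RELEASE,
	SKETCH_EV_RETIRE,
	SKETCH_EV_WITHDRAW,
	SKETCH_NR_EVENTS
};

struct sketch_machine {
	unsigned long events;		/* events raised and not yet handled */
	unsigned long event_mask;	/* events the current state wants */
	unsigned int queued;		/* times queued for a worker thread */
};

/* Raise an event; returns true if this queued the object for processing. */
static bool sketch_raise_event(struct sketch_machine *m, enum sketch_event ev)
{
	unsigned long bit;
	bool already_pending;

	assert(ev < SKETCH_NR_EVENTS);
	bit = 1UL << ev;
	already_pending = (m->events & bit) != 0;
	m->events |= bit;		/* always record the event */
	if (already_pending || !(m->event_mask & bit))
		return false;		/* duplicate, or not of interest */
	m->queued++;
	return true;
}

/*
 * Resolve a collision between the two ends.  The order used here (withdraw,
 * then retire, then release) is an arbitrary assumption; the point is only
 * that one event is honoured and the others are dropped.
 */
static int sketch_pick_teardown(struct sketch_machine *m)
{
	static const enum sketch_event order[] = {
		SKETCH_EV_WITHDRAW, SKETCH_EV_RETIRE, SKETCH_EV_RELEASE,
	};
	unsigned long all = 0;
	int picked = -1;
	size_t i;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		unsigned long bit = 1UL << order[i];

		all |= bit;
		if (picked < 0 && (m->events & bit))
			picked = (int)order[i];
	}
	m->events &= ~all;	/* the other requests need no separate handling */
	return picked;		/* -1 if no teardown event was pending */
}
```

In this model an event that the current state has not asked for is still
recorded, but it does not cause the object to be queued, and a second raise of
an event that is already pending is a no-op.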