==========================================
Explicit volatile write back cache control
==========================================

Introduction
------------

Many storage devices, especially in the consumer market, come with volatile
write back caches. That means the devices signal I/O completion to the
operating system before the data has actually reached non-volatile storage.
This behavior speeds up various workloads, but it means the operating
system needs to force data out to non-volatile storage when it performs
a data integrity operation like fsync, sync or an unmount.

The Linux block layer provides two simple mechanisms that let filesystems
control the caching behavior of the storage device. These mechanisms are
a forced cache flush, and the Force Unit Access (FUA) flag for requests.


Explicit cache flushes
----------------------

The REQ_PREFLUSH flag can be ORed into the r/w flags of a bio submitted from
the filesystem and will make sure the volatile cache of the storage device
has been flushed before the actual I/O operation is started. This explicitly
guarantees that previously completed write requests are on non-volatile
storage before the flagged bio starts. In addition, the REQ_PREFLUSH flag can
be set on an otherwise empty bio structure, which causes only an explicit
cache flush without any dependent I/O.
It is recommended to use the blkdev_issue_flush() helper for a pure cache
flush.


Force Unit Access
-----------------

The REQ_FUA flag can be ORed into the r/w flags of a bio submitted from the
filesystem and will make sure that I/O completion for this request is only
signaled after the data has been committed to non-volatile storage.


Implementation details for filesystems
--------------------------------------

Filesystems can simply set the REQ_PREFLUSH and REQ_FUA bits and do not have
to worry about whether the underlying devices need any explicit cache
flushing or how Force Unit Access is implemented. The REQ_PREFLUSH and
REQ_FUA flags may both be set on a single bio.


Implementation details for bio based block drivers
--------------------------------------------------

These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
directly below the submit_bio interface. For remapping drivers the REQ_FUA
bits need to be propagated to underlying devices, and a global flush needs
to be implemented for bios with the REQ_PREFLUSH bit set. For real device
drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
data can be completed successfully without doing any work.
Drivers for devices with volatile caches need to implement the support for
these flags themselves without any help from the block layer.


Implementation details for request_fn based block drivers
---------------------------------------------------------

For devices that do not support volatile write caches no driver support is
required. The block layer completes empty REQ_PREFLUSH requests before
entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
requests that have a payload. For devices with volatile write caches the
driver needs to tell the block layer that it supports flushing caches by
doing::

	blk_queue_write_cache(sdkp->disk->queue, true, false);

and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
REQ_PREFLUSH requests with a payload are automatically turned by the block
layer into a sequence of an empty REQ_OP_FLUSH request followed by the actual
write. For devices that also support the FUA bit the block layer needs
to be told to pass through the REQ_FUA bit using::

	blk_queue_write_cache(sdkp->disk->queue, true, true);

and the driver must handle write requests that have the REQ_FUA bit set
in prep_fn/request_fn. If the FUA bit is not natively supported the block
layer turns it into an empty REQ_OP_FLUSH request after the actual write.
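
The request handling described above for request_fn based drivers can be
summarized with a small userspace model. This is only a sketch of the rules
stated in this section, not kernel code: the MODEL_* flag values, the
model_queue structure and the model_plan() function are hypothetical names
invented for illustration, and the two booleans merely stand in for the
arguments given to blk_queue_write_cache().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; NOT the kernel's definitions. */
enum {
	MODEL_PREFLUSH = 1u << 0,	/* stands in for REQ_PREFLUSH */
	MODEL_FUA      = 1u << 1,	/* stands in for REQ_FUA */
};

/* What the modelled block layer hands to the driver. */
enum model_step {
	STEP_FLUSH,	/* empty REQ_OP_FLUSH request */
	STEP_WRITE,	/* data write, FUA bit not set */
	STEP_WRITE_FUA,	/* data write, FUA bit passed through */
};

/* Mirrors the two booleans passed to blk_queue_write_cache(). */
struct model_queue {
	bool write_cache;	/* device has a volatile write cache */
	bool fua;		/* device natively supports FUA writes */
};

/*
 * Fill steps[] (room for three entries) with the requests the driver
 * would see and return how many there are. Zero means the request is
 * completed without ever entering the driver.
 */
static size_t model_plan(const struct model_queue *q, unsigned int flags,
			 bool has_data, enum model_step steps[3])
{
	size_t n = 0;

	assert(q != NULL && steps != NULL);

	if (!q->write_cache) {
		/* No volatile cache: both bits are stripped, and an
		 * empty flush completes before reaching the driver. */
		if (has_data)
			steps[n++] = STEP_WRITE;
		return n;
	}

	/* A flush is issued ahead of any payload. */
	if (flags & MODEL_PREFLUSH)
		steps[n++] = STEP_FLUSH;

	if (has_data) {
		bool want_fua = (flags & MODEL_FUA) != 0;

		if (want_fua && q->fua) {
			steps[n++] = STEP_WRITE_FUA;
		} else {
			steps[n++] = STEP_WRITE;
			/* FUA not native: emulate with a flush afterwards. */
			if (want_fua)
				steps[n++] = STEP_FLUSH;
		}
	}
	return n;
}
```

In this model, a write carrying both flags on a queue with a write cache but
no native FUA support becomes three steps: a flush, the write, and a second
flush. On a queue that also supports FUA the same write becomes a flush
followed by a single FUA write.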