.. SPDX-License-Identifier: GPL-2.0

====
FUSE
====

Definitions
===========

Userspace filesystem:
  A filesystem in which data and metadata are provided by an ordinary
  userspace process.  The filesystem can be accessed normally through
  the kernel interface.

Filesystem daemon:
  The process(es) providing the data and metadata of the filesystem.

Non-privileged mount (or user mount):
  A userspace filesystem mounted by a non-privileged (non-root) user.
  The filesystem daemon runs with the privileges of the mounting
  user.  NOTE: this is not the same as mounts allowed with the "user"
  option in /etc/fstab, which is not discussed here.

Filesystem connection:
  A connection between the filesystem daemon and the kernel.  The
  connection exists until either the daemon dies, or the filesystem is
  umounted.  Note that detaching (or lazy umounting) the filesystem
  does *not* break the connection; in this case it will exist until
  the last reference to the filesystem is released.

Mount owner:
  The user who does the mounting.

User:
  The user who is performing filesystem operations.

What is FUSE?
=============

FUSE is a userspace filesystem framework.
It consists of a kernel
module (fuse.ko), a userspace library (libfuse.*) and a mount utility
(fusermount).

One of the most important features of FUSE is allowing secure,
non-privileged mounts.  This opens up new possibilities for the use of
filesystems.  A good example is sshfs: a secure network filesystem
using the sftp protocol.

The userspace library and utilities are available from the
`FUSE homepage: <https://github.com/libfuse/>`_

Filesystem type
===============

The filesystem type given to mount(2) can be one of the following:

    fuse
      This is the usual way to mount a FUSE filesystem.  The first
      argument of the mount system call may contain an arbitrary string,
      which is not interpreted by the kernel.

    fuseblk
      The filesystem is block device based.  The first argument of the
      mount system call is interpreted as the name of the device.

Mount options
=============

fd=N
  The file descriptor to use for communication between the userspace
  filesystem and the kernel.  The file descriptor must have been
  obtained by opening the FUSE device ('/dev/fuse').

rootmode=M
  The file mode of the filesystem's root in octal representation.

user_id=N
  The numeric user id of the mount owner.

group_id=N
  The numeric group id of the mount owner.

default_permissions
  By default FUSE doesn't check file access permissions; the
  filesystem is free to implement its access policy or leave it to
  the underlying file access mechanism (e.g. in case of network
  filesystems).  This option enables permission checking, restricting
  access based on file mode.  It is usually useful together with the
  'allow_other' mount option.

allow_other
  This option overrides the security measure restricting file access
  to the user mounting the filesystem.  This option is by default only
  allowed to root, but this restriction can be removed with a
  (userspace) configuration option.

max_read=N
  With this option the maximum size of read operations can be set.
  The default is infinite.  Note that the size of read requests is
  limited anyway to 32 pages (which is 128 KiB on i386).

blksize=N
  Set the block size for the filesystem.  The default is 512.  This
  option is only valid for 'fuseblk' type mounts.
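
As an illustration, the options listed above could be assembled into the
data string for mount(2) roughly as in the following Python sketch.  The
helper name ``build_fuse_mount_options`` and its parameters are
hypothetical and not part of any FUSE interface; only the option names are
taken from the list above, and the comma-separated ``key=value`` layout is
assumed to follow the usual mount option convention.

```python
def build_fuse_mount_options(fd, rootmode, user_id, group_id, *,
                             fstype="fuse", default_permissions=False,
                             allow_other=False, max_read=None, blksize=None):
    """Assemble a FUSE mount option string (hypothetical helper)."""
    # blksize is documented as valid only for 'fuseblk' type mounts.
    if blksize is not None and fstype != "fuseblk":
        raise ValueError("blksize is only valid for 'fuseblk' type mounts")

    opts = [
        f"fd={fd}",                # descriptor from opening /dev/fuse
        f"rootmode={rootmode:o}",  # root file mode, written in octal
        f"user_id={user_id}",      # numeric user id of the mount owner
        f"group_id={group_id}",    # numeric group id of the mount owner
    ]
    if default_permissions:
        opts.append("default_permissions")
    if allow_other:
        opts.append("allow_other")
    if max_read is not None:
        opts.append(f"max_read={max_read}")
    if blksize is not None:
        opts.append(f"blksize={blksize}")
    return ",".join(opts)
```

For example, ``build_fuse_mount_options(3, 0o40000, 1000, 1000)`` yields
``fd=3,rootmode=40000,user_id=1000,group_id=1000``, where 3 stands in for
a descriptor obtained by opening '/dev/fuse'.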

Control filesystem
==================

There's a control filesystem for FUSE, which can be mounted by::

  mount -t fusectl none /sys/fs/fuse/connections

Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.

Under the fuse control filesystem each connection has a directory
named by a unique number.

For each connection the following files exist within this directory:

    waiting
      The number of requests which are waiting to be transferred to
      userspace or being processed by the filesystem daemon.  If there is
      no filesystem activity and 'waiting' is non-zero, then the
      filesystem is hung or deadlocked.

    abort
      Writing anything into this file will abort the filesystem
      connection.  This means that all waiting requests will be aborted and
      an error returned for all aborted and new requests.

Only the owner of the mount may read or write these files.
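
The two control files can be used from a script.  The following Python
sketch is only an illustration: the function names ``read_waiting`` and
``abort_connection`` are hypothetical, and the default path assumes the
control filesystem is mounted under '/sys/fs/fuse/connections' as shown
above.

```python
import os


def read_waiting(root="/sys/fs/fuse/connections"):
    """Return {connection_id: waiting_count} for readable connections."""
    counts = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "waiting")
        try:
            with open(path) as f:
                counts[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Not a connection directory, or not readable by this user
            # (only the owner of the mount may read these files).
            continue
    return counts


def abort_connection(conn_id, root="/sys/fs/fuse/connections"):
    """Abort one connection by writing to its 'abort' file."""
    with open(os.path.join(root, str(conn_id), "abort"), "w") as f:
        f.write("1\n")  # the content is irrelevant; any write aborts
```

A connection whose 'waiting' count stays non-zero while there is no
filesystem activity is a candidate for ``abort_connection()``.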

Interrupting filesystem operations
==================================

If a process issuing a FUSE filesystem request is interrupted, the
following will happen:

  -  If the request is not yet sent to userspace AND the signal is
     fatal (SIGKILL or unhandled fatal signal), then the request is
     dequeued and returns immediately.

  -  If the request is not yet sent to userspace AND the signal is not
     fatal, then an interrupted flag is set for the request.  When
     the request has been successfully transferred to userspace and
     this flag is set, an INTERRUPT request is queued.

  -  If the request is already sent to userspace, then an INTERRUPT
     request is queued.

INTERRUPT requests take precedence over other requests, so the
userspace filesystem will receive queued INTERRUPTs before any others.

The userspace filesystem may ignore the INTERRUPT requests entirely,
or may honor them by sending a reply to the *original* request, with
the error set to EINTR.

It is also possible that there's a race between processing the
original request and its INTERRUPT request.  There are two possibilities:

  1. The INTERRUPT request is processed before the original request is
     processed

  2. The INTERRUPT request is processed after the original request has
     been answered

If the filesystem cannot find the original request, it should wait for
some timeout and/or a number of new requests to arrive, after which it
should reply to the INTERRUPT request with an EAGAIN error.  In case
1) the INTERRUPT request will be requeued.  In case 2) the INTERRUPT
reply will be ignored.

Aborting a filesystem connection
================================

It is possible to get into certain situations where the filesystem is
not responding.  Reasons for this may be:

  a) Broken userspace filesystem implementation

  b) Network connection down

  c) Accidental deadlock

  d) Malicious deadlock

(For more on c) and d) see later sections)

In any of these cases it may be useful to abort the connection to
the filesystem.  There are several ways to do this:

  - Kill the filesystem daemon.  Works in case of a) and b)

  - Kill the filesystem daemon and all users of the filesystem.  Works
    in all cases except some malicious deadlocks

  - Use forced umount (umount -f).  Works in all cases but only if
    filesystem is still attached (it hasn't been lazy unmounted)

  - Abort filesystem through the FUSE control filesystem.  Most
    powerful method, always works.

How do non-privileged mounts work?
==================================

Since the mount() system call is a privileged operation, a helper
program (fusermount) is needed, which is installed setuid root.

The implication of providing non-privileged mounts is that the mount
owner must not be able to use this capability to compromise the
system.  Obvious requirements arising from this are:

 A) mount owner should not be able to get elevated privileges with the
    help of the mounted filesystem

 B) mount owner should not get illegitimate access to information from
    other users' and the super user's processes

 C) mount owner should not be able to induce undesired behavior in
    other users' or the super user's processes

How are requirements fulfilled?
===============================

 A) The mount owner could gain elevated privileges by either:

    1. creating a filesystem containing a device file, then opening this device

    2. creating a filesystem containing a suid or sgid application, then executing this application

    The solution is not to allow opening device files and to ignore
    setuid and setgid bits when executing programs.  To ensure this
    fusermount always adds "nosuid" and "nodev" to the mount options
    for non-privileged mounts.

 B) If another user is accessing files or directories in the
    filesystem, the filesystem daemon serving requests can record the
    exact sequence and timing of operations performed.  This
    information is otherwise inaccessible to the mount owner, so this
    counts as an information leak.

    The solution to this problem will be presented in point 2) of C).

 C) There are several ways in which the mount owner can induce
    undesired behavior in other users' processes, such as:

     1) mounting a filesystem over a file or directory which the mount
        owner would otherwise not be able to modify (or could only
        make limited modifications).

        This is solved in fusermount, by checking the access
        permissions on the mountpoint and only allowing the mount if
        the mount owner can do unlimited modification (has write
        access to the mountpoint, and mountpoint is not a "sticky"
        directory)

     2) Even if 1) is solved the mount owner can change the behavior
        of other users' processes.

        i) It can slow down or indefinitely delay the execution of a
           filesystem operation creating a DoS against the user or the
           whole system.  For example a suid application locking a
           system file, and then accessing a file on the mount owner's
           filesystem could be stopped, thus causing the system
           file to be locked forever.

        ii) It can present files or directories of unlimited length, or
            directory structures of unlimited depth, possibly causing a
            system process to eat up diskspace, memory or other
            resources, again causing *DoS*.

        The solution to this as well as B) is not to allow processes
        to access the filesystem, which could otherwise not be
        monitored or manipulated by the mount owner.  Since a mount
        owner who can ptrace a process can do all of the above
        without using a FUSE mount, the same criteria as used in
        ptrace can be used to check if a process is allowed to access
        the filesystem or not.

        Note that the *ptrace* check is not strictly necessary to
        prevent C/2/i; it is enough to check if the mount owner has
        enough privilege to send a signal to the process accessing the
        filesystem, since *SIGSTOP* can be used to get a similar effect.

I think these limitations are unacceptable?
===========================================

If a sysadmin trusts the users enough, or can ensure through other
measures, that system processes will never enter non-privileged
mounts, they can relax the last limitation with a 'user_allow_other'
config option.  If this config option is set, the mounting user can
add the 'allow_other' mount option which disables the check for other
users' processes.
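
The rule above can be sketched in Python as follows.  This is an
illustration under stated assumptions, not the logic of any particular
mount helper: both function names are hypothetical, and the configuration
is assumed to be plain text with one keyword per line and '#' starting a
comment.  Only the 'user_allow_other' keyword itself comes from the text
above.

```python
def user_allow_other_enabled(conf_text):
    """Return True if a 'user_allow_other' line is present."""
    for raw in conf_text.splitlines():
        # Assumed syntax: '#' starts a comment, whitespace is ignored.
        line = raw.split("#", 1)[0].strip()
        if line == "user_allow_other":
            return True
    return False


def may_use_allow_other(uid, conf_text):
    """Decide whether a mount owner may add the 'allow_other' option.

    By default only root (uid 0) is allowed to; the config option
    lifts that restriction for other users.
    """
    return uid == 0 or user_allow_other_enabled(conf_text)
```

With an empty configuration only root passes the check; adding a
``user_allow_other`` line lets any mounting user request 'allow_other'.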

Kernel - userspace interface
============================

The following diagram shows how a filesystem operation (in this
example unlink) is performed in FUSE. ::

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |                                    |  >sys_read()
 |                                    |    >fuse_dev_read()
 |                                    |      >request_wait()
 |                                    |        [sleep on fc->waitq]
 |                                    |
 |  >sys_unlink()                     |
 |    >fuse_unlink()                  |
 |      [get request from             |
 |       fc->unused_list]             |
 |      >request_send()               |
 |        [queue req on fc->pending]  |
 |        [wake up fc->waitq]         |    [woken up]
 |        >request_wait_answer()      |
 |          [sleep on req->waitq]     |
 |                                    |      <request_wait()
 |                                    |      [remove req from fc->pending]
 |                                    |      [copy req to read buffer]
 |                                    |      [add req to fc->processing]
 |                                    |    <fuse_dev_read()
 |                                    |  <sys_read()
 |                                    |
 |                                    |  [perform unlink]
 |                                    |
 |                                    |  >sys_write()
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |          [woken up]                |      [wake up req->waitq]
 |                                    |    <fuse_dev_write()
 |                                    |  <sys_write()
 |        <request_wait_answer()      |
 |      <request_send()               |
 |      [add request to               |
 |       fc->unused_list]             |
 |    <fuse_unlink()                  |
 |  <sys_unlink()                     |

.. note:: Everything in the description above is greatly simplified

There are a couple of ways in which to deadlock a FUSE filesystem.
Since we are talking about unprivileged userspace programs,
something must be done about these.

**Scenario 1 - Simple deadlock**::

 |  "rm /mnt/fuse/file"               |  FUSE filesystem daemon
 |                                    |
 |  >sys_unlink("/mnt/fuse/file")     |
 |    [acquire inode semaphore        |
 |     for "file"]                    |
 |    >fuse_unlink()                  |
 |      [sleep on req->waitq]         |
 |                                    |  <sys_read()
 |                                    |  >sys_unlink("/mnt/fuse/file")
 |                                    |    [acquire inode semaphore
 |                                    |     for "file"]
 |                                    |    *DEADLOCK*

The solution for this is to allow the filesystem to be aborted.

**Scenario 2 - Tricky deadlock**


This one needs a carefully crafted filesystem.  It's a variation on
the above, only the call back to the filesystem is not explicit,
but is caused by a pagefault.
::

 |  Kamikaze filesystem thread 1      |  Kamikaze filesystem thread 2
 |                                    |
 |  [fd = open("/mnt/fuse/file")]     |  [request served normally]
 |  [mmap fd to 'addr']               |
 |  [close fd]                        |  [FLUSH triggers 'magic' flag]
 |  [read a byte from addr]           |
 |    >do_page_fault()                |
 |      [find or create page]         |
 |      [lock page]                   |
 |      >fuse_readpage()              |
 |        [queue READ request]        |
 |        [sleep on req->waitq]       |
 |                                    |  [read request to buffer]
 |                                    |  [create reply header before addr]
 |                                    |  >sys_write(addr - headerlength)
 |                                    |    >fuse_dev_write()
 |                                    |      [look up req in fc->processing]
 |                                    |      [remove from fc->processing]
 |                                    |      [copy write buffer to req]
 |                                    |        >do_page_fault()
 |                                    |          [find or create page]
 |                                    |          [lock page]
 |                                    |          * DEADLOCK *

The solution is basically the same as above.

An additional problem is that while the write buffer is being copied
to the request, the request must not be interrupted/aborted.  This is
because the destination address of the copy may not be valid after the
request has returned.

This is solved by doing the copy atomically, and allowing abort
while the page(s) belonging to the write buffer are faulted with
get_user_pages().  The 'req->locked' flag indicates when the copy is
taking place, and abort is delayed until this flag is unset.