======================
ioctl based interfaces
======================

ioctl() is the most common way for applications to interface
with device drivers. It is flexible and easily extended by adding new
commands and can be passed through character devices, block devices as
well as sockets and other special file descriptors.

However, it is also very easy to get ioctl command definitions wrong,
and hard to fix them later without breaking existing applications,
so this documentation tries to help developers get it right.

Command number definitions
==========================

The command number, or request number, is the second argument passed to
the ioctl system call. While this can be any 32-bit number that uniquely
identifies an action for a particular driver, there are a number of
conventions around defining them.

``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
``_IOW``, and ``_IOWR``. These should be used for all new commands,
with the correct parameters:

_IO/_IOR/_IOW/_IOWR
    The macro name specifies how the argument will be used. It may be a
    pointer to data to be passed into the kernel (_IOW), out of the kernel
    (_IOR), or both (_IOWR). _IO can indicate either commands with no
    argument or those passing an integer value instead of a pointer.
    It is recommended to only use _IO for commands without arguments,
    and use pointers for passing data.

type
    An 8-bit number, often a character literal, specific to a subsystem
    or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`.

nr
    An 8-bit number identifying the specific command, unique for a given
    value of 'type'.

data_type
    The name of the data type pointed to by the argument. The command number
    encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
    depending on the architecture, leading to a portable limit of 8191 bytes
    for the maximum size of the argument.
    Note: do not pass sizeof(data_type) into _IOR/_IOW/_IOWR, as that
    will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
    _IO does not have a data_type parameter.


Interface versions
==================

Some subsystems use version numbers in data structures to overload
commands with different interpretations of the argument.

This is generally a bad idea, since changes to existing commands tend
to break existing applications.

A better approach is to add a new ioctl command with a new number. The
old command still needs to be implemented in the kernel for compatibility,
but this can be a wrapper around the new implementation.
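
As a minimal sketch of this approach, the definitions below add a second
command for an extended argument structure instead of changing the first
one. The driver name ``foo``, the type letter ``'F'``, the structure members
and the helper ``foo_config_from_v1()`` are all hypothetical and only
illustrate the pattern. In a real driver the conversion would live in the
kernel's handler for the old command; here it is written as plain C so that
it compiles against the uapi headers in user space.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical original argument of the invented "foo" driver. */
struct foo_config_v1 {
    __u32 mode;
    __u32 flags;
};

/* Hypothetical extended argument; members stay naturally aligned. */
struct foo_config_v2 {
    __u32 mode;
    __u32 flags;
    __u64 timeout_ns;
};

/* Pass the type name, not sizeof(type), as the third macro argument. */
#define FOO_SET_CONFIG    _IOW('F', 1, struct foo_config_v1)
#define FOO_SET_CONFIG_V2 _IOW('F', 2, struct foo_config_v2)

/* Both layouts are fixed by construction, independent of the ABI. */
_Static_assert(sizeof(struct foo_config_v1) == 8, "unexpected v1 layout");
_Static_assert(sizeof(struct foo_config_v2) == 16, "unexpected v2 layout");

/*
 * Widen an old-style argument into the new layout, the way a wrapper for
 * the old command could do before calling the new implementation.
 * A zero timeout is an assumed default, not a documented convention.
 */
static inline struct foo_config_v2 foo_config_from_v1(struct foo_config_v1 old)
{
    struct foo_config_v2 cfg = {
        .mode = old.mode,
        .flags = old.flags,
        .timeout_ns = 0,
    };
    return cfg;
}
```

Because the two commands differ in both ``nr`` and encoded size, existing
applications that use ``FOO_SET_CONFIG`` keep working unchanged.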

Return code
===========

ioctl commands can return negative error codes as documented in errno(3);
these get turned into errno values in user space. On success, the return
code should be zero. It is also possible but not recommended to return
a positive 'long' value.

When the ioctl callback is called with an unknown command number, the
handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
-ENOTTY being returned from the system call. Some subsystems return
-ENOSYS or -EINVAL here for historic reasons, but this is wrong.

Prior to Linux 5.5, compat_ioctl handlers were required to return
-ENOIOCTLCMD in order to use the fallback conversion into native
commands. As all subsystems are now responsible for handling compat
mode themselves, this is no longer needed, but it may be important to
consider when backporting bug fixes to older kernels.

Timestamps
==========

Traditionally, timestamps and timeout values are passed as ``struct
timespec`` or ``struct timeval``, but these are problematic because of
incompatible definitions of these structures in user space after the
move to 64-bit time_t.

The ``struct __kernel_timespec`` type can be used instead to be embedded
in other data structures when separate second/nanosecond values are
desired, or passed to user space directly.
This is still not ideal though,
as the structure matches neither the kernel's timespec64 nor the user
space timespec exactly. The get_timespec64() and put_timespec64() helper
functions can be used to ensure that the layout remains compatible with
user space and the padding is treated correctly.

As it is cheap to convert seconds to nanoseconds, but the opposite
requires an expensive 64-bit division, a simple __u64 nanosecond value
can be simpler and more efficient.

Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
as returned by ktime_get_ns() or ktime_get_ts64(). Unlike
CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
or forwards due to leap second adjustments and clock_settime() calls.

ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
need to be persistent across a reboot or between multiple machines.

32-bit compat mode
==================

In order to support 32-bit user space running on a 64-bit machine, each
subsystem or driver that implements an ioctl callback handler must also
implement the corresponding compat_ioctl handler.

As long as all the rules for data structures are followed, this is as
easy as setting the .compat_ioctl pointer to a helper function such as
compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
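
As a rough illustration, an argument structure that follows those rules
(they are spelled out under "Structure layout") might look like the sketch
below. The names ``foo_xfer``, ``FOO_XFER`` and ``foo_xfer_init()`` and the
type letter ``'F'`` are invented for this example and do not belong to any
real driver. The code only shows the user space side and assumes the uapi
headers are installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument structure with the same layout on every ABI. */
struct foo_xfer {
    __u32 len;      /* length of the user buffer in bytes */
    __u32 reserved; /* explicit padding instead of implicit padding */
    __u64 buf;      /* user pointer carried as a 64-bit integer */
};

#define FOO_XFER _IOWR('F', 3, struct foo_xfer)

/* All members are naturally aligned, so no implicit padding exists. */
_Static_assert(offsetof(struct foo_xfer, buf) == 8, "buf is not 8-byte aligned");
_Static_assert(sizeof(struct foo_xfer) == 16, "unexpected implicit padding");

/*
 * User space side: go through uintptr_t when storing the pointer, so the
 * upper half of 'buf' is zero on 32-bit systems. The designated
 * initializer leaves 'reserved' set to zero.
 */
static inline struct foo_xfer foo_xfer_init(void *buf, __u32 len)
{
    struct foo_xfer x = {
        .len = len,
        .buf = (__u64)(uintptr_t)buf,
    };
    return x;
}
```

With a layout like this, 32-bit and 64-bit user space pass identical bytes,
which is the precondition for using a generic helper rather than a custom
conversion handler.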

compat_ptr()
------------

On the s390 architecture, 31-bit user space has ambiguous representations
for data pointers, with the upper bit being ignored. When running such
a process in compat mode, the compat_ptr() helper must be used to
clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
pointer. On other architectures, this macro only performs a cast to a
``void __user *`` pointer.

In a compat_ioctl() callback, the last argument is an unsigned long,
which can be interpreted as either a pointer or a scalar depending on
the command. If it is a scalar, then compat_ptr() must not be used, to
ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
for arguments with the upper bit set.

The compat_ptr_ioctl() helper can be used in place of a custom
compat_ioctl file operation for drivers that only take arguments that
are pointers to compatible data structures.

Structure layout
----------------

Compatible data structures have the same layout on all architectures,
avoiding all problematic members:

* ``long`` and ``unsigned long`` are the size of a register, so
  they can be either 32-bit or 64-bit wide and cannot be used in portable
  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
  ``__s64`` and ``__u64``.

* Pointers have the same problem, in addition to requiring the
  use of compat_ptr(). The best workaround is to use ``__u64``
  in place of pointers, which requires a cast to ``uintptr_t`` in user
  space, and the use of u64_to_user_ptr() in the kernel to convert
  it back into a user pointer.

* On the x86-32 (i386) architecture, the alignment of 64-bit variables
  is only 32-bit, but they are naturally aligned on most other
  architectures including x86-64. This means a structure like::

    struct foo {
        __u32 a;
        __u64 b;
        __u32 c;
    };

  has four bytes of padding between a and b on x86-64, plus another four
  bytes of padding at the end, but no padding on i386, and it needs a
  compat_ioctl conversion handler to translate between the two formats.

  To avoid this problem, all structures should have their members
  naturally aligned, or explicit reserved fields added in place of the
  implicit padding. The ``pahole`` tool can be used for checking the
  alignment.

* On ARM OABI user space, structures are padded to multiples of 32-bit,
  making some structs incompatible with modern EABI kernels if they
  do not end on a 32-bit boundary.

* On the m68k architecture, struct members are not guaranteed to have an
  alignment greater than 16-bit, which is a problem when relying on
  implicit padding.

* Bitfields and enums generally work as one would expect them to,
  but some properties of them are implementation-defined, so it is better
  to avoid them completely in ioctl interfaces.

* ``char`` members can be either signed or unsigned, depending on
  the architecture, so the __u8 and __s8 types should be used for 8-bit
  integer values, though char arrays are clearer for fixed-length strings.

Information leaks
=================

Uninitialized data must not be copied back to user space, as this can
cause an information leak, which can be used to defeat kernel address
space layout randomization (KASLR), helping in an attack.

For this reason (and for compat support) it is best to avoid any
implicit padding in data structures. Where there is implicit padding
in an existing structure, kernel drivers must be careful to fully
initialize an instance of the structure before copying it to user
space. This is usually done by calling memset() before assigning to
individual members.

Subsystem abstractions
======================

While some device drivers implement their own ioctl function, most
subsystems implement the same command for multiple drivers. Ideally the
subsystem has an .ioctl() handler that copies the arguments from and
to user space, passing them into subsystem specific callback functions
through normal kernel pointers.

This helps in various ways:

* Applications written for one driver are more likely to work for
  another one in the same subsystem if there are no subtle differences
  in the user space ABI.

* The complexity of user space access and data structure layout is handled
  in one place, reducing the potential for implementation bugs.

* It is more likely to be reviewed by experienced developers
  who can spot problems in the interface when the ioctl is shared
  between multiple drivers than when it is only used in a single driver.

Alternatives to ioctl
=====================

There are many cases in which ioctl is not the best solution for a
problem. Alternatives include:

* System calls are a better choice for a system-wide feature that
  is not tied to a physical device or constrained by the file system
  permissions of a character device node.

* netlink is the preferred way of configuring any network related
  objects through sockets.

* debugfs is used for ad-hoc interfaces for debugging functionality
  that does not need to be exposed as a stable interface to applications.

* sysfs is a good way to expose the state of an in-kernel object
  that is not tied to a file descriptor.

* configfs can be used for more complex configuration than sysfs.

* A custom file system can provide extra flexibility with a simple
  user interface but adds a lot of complexity to the implementation.