======================
ioctl based interfaces
======================

ioctl() is the most common way for applications to interface
with device drivers. It is flexible and easily extended by adding new
commands and can be passed through character devices, block devices as
well as sockets and other special file descriptors.

However, it is also very easy to get ioctl command definitions wrong,
and hard to fix them later without breaking existing applications,
so this documentation tries to help developers get it right.

Command number definitions
==========================

The command number, or request number, is the second argument passed to
the ioctl system call. While this can be any 32-bit number that uniquely
identifies an action for a particular driver, there are a number of
conventions around defining them.

``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
``_IOW``, and ``_IOWR``. These should be used for all new commands,
with the correct parameters:

_IO/_IOR/_IOW/_IOWR
   The macro name specifies how the argument will be used.  It may be a
   pointer to data to be passed into the kernel (_IOW), out of the kernel
   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
   argument or those passing an integer value instead of a pointer.
   It is recommended to only use _IO for commands without arguments,
   and use pointers for passing data.

type
   An 8-bit number, often a character literal, specific to a subsystem
   or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`.

nr
  An 8-bit number identifying the specific command, unique for a given
  value of 'type'.

data_type
  The name of the data type pointed to by the argument. The command number
  encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
  leading to a limit of 8191 bytes for the maximum size of the argument.
  Note: do not pass ``sizeof(data_type)`` into _IOR/_IOW/_IOWR, as that
  will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
  _IO does not have a data_type parameter.
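
The parameters above can be illustrated with a minimal user-space sketch.
The ``foo_config`` structure, the ``FOO_*`` command names and the ``'f'``
type value are hypothetical placeholders chosen for this example; a real
driver would pick an unused type value from the ioctl-number list.

```c
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */
#include <linux/types.h>   /* __u32, __u64 */

/* Hypothetical argument structure: naturally aligned, no implicit padding. */
struct foo_config {
	__u32 mode;
	__u32 flags;
	__u64 buffer_size;
};

/* Placeholder 'type' value for this sketch only. */
#define FOO_IOC_MAGIC 'f'

/* No argument at all. */
#define FOO_RESET       _IO(FOO_IOC_MAGIC, 0)
/* Data flows out of the kernel. */
#define FOO_GET_CONFIG  _IOR(FOO_IOC_MAGIC, 1, struct foo_config)
/* Data flows into the kernel. */
#define FOO_SET_CONFIG  _IOW(FOO_IOC_MAGIC, 2, struct foo_config)
/* Data flows in both directions. */
#define FOO_XCHG_CONFIG _IOWR(FOO_IOC_MAGIC, 3, struct foo_config)

/*
 * Wrong on purpose: passing sizeof(...) instead of the type name encodes
 * sizeof(sizeof(struct foo_config)), i.e. the size of a size_t.
 */
#define FOO_SET_CONFIG_BAD _IOW(FOO_IOC_MAGIC, 4, sizeof(struct foo_config))
```

The ``_IOC_DIR()``, ``_IOC_TYPE()``, ``_IOC_NR()`` and ``_IOC_SIZE()``
macros from the same header decode the fields again, which is a convenient
way to check that a definition encodes what was intended.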


Interface versions
==================

Some subsystems use version numbers in data structures to overload
commands with different interpretations of the argument.

This is generally a bad idea, since changes to existing commands tend
to break existing applications.

A better approach is to add a new ioctl command with a new number. The
old command still needs to be implemented in the kernel for compatibility,
but this can be a wrapper around the new implementation.
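
One way to picture the wrapper approach is the following sketch in plain C.
All names (``foo_info``, ``foo_info2``, ``FOO_GET_INFO``, ``FOO_GET_INFO2``)
and the returned values are invented for illustration, and the functions
operate on ordinary pointers rather than user-space memory.

```c
#include <string.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Original argument: its layout can never change once released. */
struct foo_info {
	__u32 width;
	__u32 height;
};

/* Extended argument used by the new command. */
struct foo_info2 {
	__u32 width;
	__u32 height;
	__u32 depth;
	__u32 reserved;   /* explicit padding, always zero */
};

/* The old command keeps its number, the new one gets a fresh number. */
#define FOO_GET_INFO  _IOR('f', 0x10, struct foo_info)
#define FOO_GET_INFO2 _IOR('f', 0x11, struct foo_info2)

/* New implementation: the only place that knows the real values. */
static int foo_get_info2(struct foo_info2 *out)
{
	memset(out, 0, sizeof(*out));
	out->width = 640;    /* made-up sample values */
	out->height = 480;
	out->depth = 24;
	return 0;
}

/* Old command: a thin wrapper that narrows the new result. */
static int foo_get_info(struct foo_info *out)
{
	struct foo_info2 info2;
	int ret = foo_get_info2(&info2);

	if (ret)
		return ret;
	out->width = info2.width;
	out->height = info2.height;
	return 0;
}
```

Existing applications keep calling ``FOO_GET_INFO`` and see no change,
while new applications can opt in to ``FOO_GET_INFO2``.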

Return code
===========

ioctl commands can return negative error codes as documented in errno(3);
these get turned into errno values in user space. On success, the return
code should be zero. It is also possible but not recommended to return
a positive 'long' value.

When the ioctl callback is called with an unknown command number, the
handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
-ENOTTY being returned from the system call. Some subsystems return
-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
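
The convention can be sketched as a plain C model of a handler. The
function name ``foo_ioctl_model`` and the two commands are hypothetical,
and the signature is simplified compared to a real ``unlocked_ioctl``
callback, which also receives a ``struct file`` pointer.

```c
#include <errno.h>
#include <linux/ioctl.h>

/* Hypothetical argument-less commands. */
#define FOO_RESET _IO('f', 0)
#define FOO_START _IO('f', 1)

/* Simplified model of an ioctl handler's return value handling. */
static long foo_ioctl_model(unsigned int cmd, unsigned long arg)
{
	(void)arg;   /* these commands take no argument */

	switch (cmd) {
	case FOO_RESET:
	case FOO_START:
		return 0;         /* success is reported as zero */
	default:
		return -ENOTTY;   /* unknown command: not -EINVAL or -ENOSYS */
	}
}
```

In user space, the failing case shows up as a return value of -1 from
``ioctl()`` with ``errno`` set to ``ENOTTY``.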

Prior to Linux 5.5, compat_ioctl handlers were required to return
-ENOIOCTLCMD in order to use the fallback conversion into native
commands. As all subsystems are now responsible for handling compat
mode themselves, this is no longer needed, but it may be important to
consider when backporting bug fixes to older kernels.

Timestamps
==========

Traditionally, timestamps and timeout values are passed as ``struct
timespec`` or ``struct timeval``, but these are problematic because of
incompatible definitions of these structures in user space after the
move to 64-bit time_t.

The ``struct __kernel_timespec`` type can be used instead to be embedded
in other data structures when separate second/nanosecond values are
desired, or passed to user space directly. This is still not ideal though,
as the structure matches neither the kernel's timespec64 nor the user
space timespec exactly. The get_timespec64() and put_timespec64() helper
functions can be used to ensure that the layout remains compatible with
user space and the padding is treated correctly.

As it is cheap to convert seconds to nanoseconds, but the opposite
requires an expensive 64-bit division, a plain __u64 nanosecond value
can be simpler and more efficient.
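
The asymmetry can be seen in a small user-space sketch. The helper names
``demo_ts_to_ns`` and ``demo_ns_to_ts`` are invented for this example,
``NSEC_PER_SEC`` is defined locally so that the block stands alone, and
the helpers assume non-negative time values.

```c
#include <linux/types.h>        /* __u64 */
#include <linux/time_types.h>   /* struct __kernel_timespec */

/* Defined locally for this sketch; one second in nanoseconds. */
#define NSEC_PER_SEC 1000000000ULL

/* Seconds to nanoseconds: one multiplication and one addition. */
static inline __u64 demo_ts_to_ns(const struct __kernel_timespec *ts)
{
	return (__u64)ts->tv_sec * NSEC_PER_SEC + (__u64)ts->tv_nsec;
}

/* Nanoseconds to seconds: needs a 64-bit division and remainder. */
static inline struct __kernel_timespec demo_ns_to_ts(__u64 ns)
{
	struct __kernel_timespec ts;

	ts.tv_sec = (long long)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long long)(ns % NSEC_PER_SEC);
	return ts;
}
```

An interface that carries a single ``__u64`` nanosecond field lets both
sides skip the division unless they actually need split values.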

Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
as returned by ktime_get_ns() or ktime_get_ts64().  Unlike
CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
or forwards due to leap second adjustments and clock_settime() calls.

ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
need to be persistent across a reboot or between multiple machines.

32-bit compat mode
==================

In order to support 32-bit user space running on a 64-bit machine, each
subsystem or driver that implements an ioctl callback handler must also
implement the corresponding compat_ioctl handler.

As long as all the rules for data structures are followed, this is as
easy as setting the .compat_ioctl pointer to a helper function such as
compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().

compat_ptr()
------------

On the s390 architecture, 31-bit user space has ambiguous representations
for data pointers, with the upper bit being ignored. When running such
a process in compat mode, the compat_ptr() helper must be used to
clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
pointer.  On other architectures, this macro only performs a cast to a
``void __user *`` pointer.

In a compat_ioctl() callback, the last argument is an unsigned long,
which can be interpreted as either a pointer or a scalar depending on
the command. If it is a scalar, then compat_ptr() must not be used, to
ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
for arguments with the upper bit set.

The compat_ptr_ioctl() helper can be used in place of a custom
compat_ioctl file operation for drivers that only take arguments that
are pointers to compatible data structures.
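
The effect of compat_ptr() can be modelled in user space with plain
integers. This is only a sketch of the behaviour described above, not the
kernel implementation: the ``model_*`` names are invented, and a 64-bit
integer stands in for the resulting ``void __user *`` pointer.

```c
#include <stdint.h>

/* Stands in for the kernel's compat_uptr_t in this model. */
typedef uint32_t model_compat_uptr_t;

/* s390 model: the upper bit of a 31-bit pointer is ignored, so clear it. */
static inline uint64_t model_compat_ptr_s390(model_compat_uptr_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffU);
}

/* Model for other architectures: only a widening conversion. */
static inline uint64_t model_compat_ptr_generic(model_compat_uptr_t uptr)
{
	return (uint64_t)uptr;
}
```

The model also shows why a scalar argument must not go through
compat_ptr(): on s390 a value such as 0x80001000 would silently lose its
upper bit and no longer match what a 32-bit kernel would have seen.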

Structure layout
----------------

Compatible data structures have the same layout on all architectures,
avoiding all problematic members:

* ``long`` and ``unsigned long`` are the size of a register, so
  they can be either 32-bit or 64-bit wide and cannot be used in portable
  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
  ``__s64`` and ``__u64``.

* Pointers have the same problem, in addition to requiring the
  use of compat_ptr(). The best workaround is to use ``__u64``
  in place of pointers, which requires a cast to ``uintptr_t`` in user
  space, and the use of u64_to_user_ptr() in the kernel to convert
  it back into a user pointer.

* On the x86-32 (i386) architecture, the alignment of 64-bit variables
  is only 32-bit, but they are naturally aligned on most other
  architectures including x86-64. This means a structure like::

    struct foo {
        __u32 a;
        __u64 b;
        __u32 c;
    };

  has four bytes of padding between a and b on x86-64, plus another four
  bytes of padding at the end, but no padding on i386, and it needs a
  compat_ioctl conversion handler to translate between the two formats.

  To avoid this problem, all structures should have their members
  naturally aligned, or explicit reserved fields added in place of the
  implicit padding. The ``pahole`` tool can be used for checking the
  alignment.

* On ARM OABI user space, structures are padded to multiples of 32-bit,
  making some structs incompatible with modern EABI kernels if they
  do not end on a 32-bit boundary.

* On the m68k architecture, struct members are not guaranteed to have an
  alignment greater than 16-bit, which is a problem when relying on
  implicit padding.

* Bitfields and enums generally work as one would expect them to,
  but some properties of them are implementation-defined, so it is better
  to avoid them completely in ioctl interfaces.

* ``char`` members can be either signed or unsigned, depending on
  the architecture, so the __u8 and __s8 types should be used for 8-bit
  integer values, though char arrays are clearer for fixed-length strings.
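
A layout that follows these rules might look like the sketch below. The
structure ``foo_fixed`` and the helper ``demo_ptr_to_u64`` are hypothetical
names for this example; the reserved fields take the place of the implicit
padding discussed above, and the user pointer is carried as a ``__u64``.

```c
#include <stddef.h>        /* offsetof */
#include <stdint.h>        /* uintptr_t */
#include <linux/types.h>   /* __u32, __u64 */

struct foo_fixed {
	__u32 a;
	__u32 reserved1;   /* replaces the implicit padding before b */
	__u64 b;
	__u32 c;
	__u32 reserved2;   /* replaces the implicit tail padding */
	__u64 buf;         /* user pointer carried as a 64-bit integer */
};

/* Every member is naturally aligned, so these hold with or without
 * 64-bit alignment requirements. */
_Static_assert(offsetof(struct foo_fixed, b) == 8, "b must be at offset 8");
_Static_assert(offsetof(struct foo_fixed, buf) == 24, "buf must be at offset 24");
_Static_assert(sizeof(struct foo_fixed) == 32, "no implicit padding expected");

/* User-space side: widen a pointer through uintptr_t into a __u64.
 * The kernel side would turn it back with u64_to_user_ptr(). */
static inline __u64 demo_ptr_to_u64(const void *p)
{
	return (__u64)(uintptr_t)p;
}
```

The static assertions are a cheap complement to ``pahole``: they fail the
build as soon as someone adds a member that reintroduces implicit padding.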

Information leaks
=================

Uninitialized data must not be copied back to user space, as this can
cause an information leak, which can be used to defeat kernel address
space layout randomization (KASLR), helping in an attack.

For this reason (and for compat support) it is best to avoid any
implicit padding in data structures.  Where there is implicit padding
in an existing structure, kernel drivers must be careful to fully
initialize an instance of the structure before copying it to user
space.  This is usually done by calling memset() before assigning to
individual members.
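
The memset() pattern can be sketched in plain C as follows. The structure
``foo_status`` and the function ``foo_fill_status`` are hypothetical; in a
driver, the filled structure would then be handed to copy_to_user(), which
is only mentioned in a comment here so that the block stays runnable in
user space.

```c
#include <stddef.h>
#include <string.h>
#include <linux/types.h>

/* Hypothetical legacy structure with implicit padding after 'state'
 * on most architectures. */
struct foo_status {
	__u8  state;
	__u32 count;
};

static void foo_fill_status(struct foo_status *st)
{
	/* Clear everything first, including the padding bytes that
	 * member assignments never touch. */
	memset(st, 0, sizeof(*st));

	st->state = 1;    /* made-up sample values */
	st->count = 42;

	/* A driver would now copy_to_user() the whole structure. */
}
```

Initializing the members one by one without the memset() would leave the
padding bytes holding whatever was previously on the stack.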

Subsystem abstractions
======================

While some device drivers implement their own ioctl function, most
subsystems implement the same command for multiple drivers.  Ideally the
subsystem has an .ioctl() handler that copies the arguments from and
to user space, passing them into subsystem specific callback functions
through normal kernel pointers.

This helps in various ways:

* Applications written for one driver are more likely to work for
  another one in the same subsystem if there are no subtle differences
  in the user space ABI.

* The complexity of user space access and data structure layout is
  handled in one place, reducing the potential for implementation bugs.

* It is more likely to be reviewed by experienced developers
  who can spot problems in the interface when the ioctl is shared
  between multiple drivers than when it is only used in a single driver.
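
The split between a subsystem handler and driver callbacks can be modelled
in plain C. Everything here is hypothetical (the ``bar_*`` and ``demo_*``
names, the ``'b'`` type value, the 48000 limit), and ``memcpy()`` stands in
for copy_from_user() and copy_to_user() so that the sketch runs in user
space.

```c
#include <errno.h>
#include <string.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument structure shared by all drivers. */
struct bar_params {
	__u32 rate;
	__u32 channels;
};

#define BAR_SET_PARAMS _IOWR('b', 1, struct bar_params)

/* Per-driver callbacks only ever see normal pointers. */
struct bar_ops {
	int (*set_params)(struct bar_params *p);
};

/* Subsystem handler: the one place that touches the caller's memory. */
static long bar_core_ioctl(const struct bar_ops *ops, unsigned int cmd,
			   void *uarg)
{
	struct bar_params p;
	int ret;

	switch (cmd) {
	case BAR_SET_PARAMS:
		memcpy(&p, uarg, sizeof(p));   /* stand-in for copy_from_user() */
		ret = ops->set_params(&p);
		if (ret)
			return ret;
		memcpy(uarg, &p, sizeof(p));   /* stand-in for copy_to_user() */
		return 0;
	default:
		return -ENOTTY;
	}
}

/* One example driver: validates and clamps, no user access at all. */
static int demo_set_params(struct bar_params *p)
{
	if (p->channels == 0)
		return -EINVAL;
	if (p->rate > 48000)
		p->rate = 48000;
	return 0;
}

static const struct bar_ops demo_ops = {
	.set_params = demo_set_params,
};
```

A second driver would only supply another ``bar_ops`` instance, so the
copying logic and the structure layout stay identical across drivers.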

Alternatives to ioctl
=====================

There are many cases in which ioctl is not the best solution for a
problem. Alternatives include:

* System calls are a better choice for a system-wide feature that
  is not tied to a physical device or constrained by the file system
  permissions of a character device node.

* netlink is the preferred way of configuring any network related
  objects through sockets.

* debugfs is used for ad-hoc interfaces for debugging functionality
  that does not need to be exposed as a stable interface to applications.

* sysfs is a good way to expose the state of an in-kernel object
  that is not tied to a file descriptor.

* configfs can be used for more complex configuration than sysfs.

* A custom file system can provide extra flexibility with a simple
  user interface but adds a lot of complexity to the implementation.