.. SPDX-License-Identifier: GPL-2.0

=====================================
Network Devices, the Kernel, and You!
=====================================


Introduction
============
The following is a random collection of documentation regarding
network devices.

struct net_device lifetime rules
================================
Network device structures need to persist even after the module is unloaded
and must be allocated with alloc_netdev_mqs() and friends.
If the device has registered successfully, it will be freed on last use
by free_netdev(). This is required to handle the pathological case cleanly
(example: ``rmmod mydriver </sys/class/net/myeth/mtu``).

alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver
private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv()) then it is up to the module exit handler to free that.

There are two groups of APIs for registering struct net_device.
The first group can be used in normal contexts where ``rtnl_lock`` is not
already held: register_netdev(), unregister_netdev().
The second group can be used when ``rtnl_lock`` is already held:
register_netdevice(), unregister_netdevice(), free_netdev().

Simple drivers
--------------

Most drivers (especially device drivers) handle the lifetime of struct
net_device in a context where ``rtnl_lock`` is not held (e.g. driver probe and
remove paths).

In that case the struct net_device registration is done using
the register_netdev() and unregister_netdev() functions:

.. code-block:: c

  int probe()
  {
    struct my_device_priv *priv;
    struct net_device *dev;
    int err;

    dev = alloc_netdev_mqs(...);
    if (!dev)
      return -ENOMEM;
    priv = netdev_priv(dev);

    /* ... do all device setup before calling register_netdev() ...
     */

    err = register_netdev(dev);
    if (err)
      goto err_undo;

    /* net_device is visible to the user! */

    return 0;

  err_undo:
    /* ... undo the device setup ... */
    free_netdev(dev);
    return err;
  }

  void remove()
  {
    unregister_netdev(dev);
    free_netdev(dev);
  }

Note that after calling register_netdev() the device is visible in the system.
Users can open it and start sending / receiving traffic immediately,
or run any other callback, so all initialization must be done prior to
registration.

unregister_netdev() closes the device and waits for all users to be done
with it. The memory of struct net_device itself may still be referenced
by sysfs but all operations on that device will fail.

free_netdev() can be called after unregister_netdev() returns or when
register_netdev() failed.

Device management under RTNL
----------------------------

Registering struct net_device while in a context which already holds
the ``rtnl_lock`` requires extra care. In those scenarios most drivers
will want to make use of struct net_device's ``needs_free_netdev``
and ``priv_destructor`` members for freeing of state.

Example flow of netdev handling under ``rtnl_lock``:

.. code-block:: c

  static void my_setup(struct net_device *dev)
  {
    dev->needs_free_netdev = true;
  }

  static void my_destructor(struct net_device *dev)
  {
    struct my_device_priv *priv = netdev_priv(dev);

    some_obj_destroy(priv->obj);
    some_uninit(priv);
  }

  int create_link()
  {
    struct my_device_priv *priv;
    struct net_device *dev;
    int err;

    ASSERT_RTNL();

    dev = alloc_netdev(sizeof(*priv), "net%d", NET_NAME_UNKNOWN, my_setup);
    if (!dev)
      return -ENOMEM;
    priv = netdev_priv(dev);

    /* Implicit constructor */
    err = some_init(priv);
    if (err)
      goto err_free_dev;

    priv->obj = some_obj_create();
    if (!priv->obj) {
      err = -ENOMEM;
      goto err_some_uninit;
    }
    /* End of constructor, set the destructor: */
    dev->priv_destructor = my_destructor;

    err = register_netdevice(dev);
    if (err)
      /* register_netdevice() calls destructor on failure */
      goto err_free_dev;

    /* If anything fails now unregister_netdevice() (or unregister_netdev())
     * will take care of calling my_destructor and free_netdev().
     */

    return 0;

  err_some_uninit:
    some_uninit(priv);
  err_free_dev:
    free_netdev(dev);
    return err;
  }

If struct net_device.priv_destructor is set it will be called by the core
some time after unregister_netdevice(); it will also be called if
register_netdevice() fails. The callback may be invoked with or without
``rtnl_lock`` held.

There is no explicit constructor callback; the driver "constructs" the private
netdev state after allocating it and before registration.

Setting struct net_device.needs_free_netdev makes the core call free_netdev()
automatically after unregister_netdevice() when all references to the device
are gone. It only takes effect after a successful call to register_netdevice(),
so if register_netdevice() fails the driver is responsible for calling
free_netdev().

free_netdev() is safe to call on error paths right after unregister_netdevice()
or when register_netdevice() fails. Parts of the netdev (de)registration process
happen after ``rtnl_lock`` is released, therefore in those cases free_netdev()
will defer some of the processing until ``rtnl_lock`` is released.

Devices spawned from struct rtnl_link_ops should never free the
struct net_device directly.

.ndo_init and .ndo_uninit
~~~~~~~~~~~~~~~~~~~~~~~~~

``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device
registration and de-registration, under ``rtnl_lock``. Drivers can use
those e.g. when parts of their init process need to run under ``rtnl_lock``.

``.ndo_init`` runs before the device is visible in the system, ``.ndo_uninit``
runs during de-registration after the device is closed but other subsystems
may still have outstanding references to the netdevice.

MTU
===
Each network device has a Maximum Transfer Unit. The MTU does not
include any link layer protocol overhead. Upper layer protocols must
not pass a socket buffer (skb) to a device to transmit with more data
than the MTU. For example, on Ethernet with the standard MTU of 1500
bytes, the actual skb will contain up to 1514 bytes because of the
Ethernet header. Devices should allow for the 4 byte VLAN header as well.

Segmentation Offload (GSO, TSO) is an exception to this rule.  The
upper layer protocol may pass a large socket buffer to the device
transmit routine, and the device will break that up into separate
packets based on the current MTU.

MTU is symmetrical and applies both to receive and transmit. A device
must be able to receive at least the maximum size packet allowed by
the MTU. A network device may use the MTU as a mechanism to size receive
buffers, but the device should allow packets with a VLAN header. With
the standard Ethernet MTU of 1500 bytes, the device should allow up to
1518 byte packets (1500 + 14 header + 4 tag).  The device may drop,
truncate, or pass up oversize packets, but dropping oversize
packets is preferred.
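The Ethernet arithmetic in this section can be restated as a small user-space
C sketch. It is illustrative only and is not kernel code: the helper names
``max_tx_skb_len()`` and ``max_rx_frame_len()`` are hypothetical, and the
constants simply repeat the 14 byte header and 4 byte VLAN tag quoted in the
text.

```c
#include <assert.h>
#include <limits.h>

/* Values taken from the text: 14 byte Ethernet header, 4 byte VLAN tag. */
#define ETH_HDR_LEN  14u
#define VLAN_TAG_LEN  4u

/* Hypothetical helper: largest skb length handed to an Ethernet device
 * for a given MTU (payload plus link layer header, no segmentation offload).
 */
static unsigned int max_tx_skb_len(unsigned int mtu)
{
	assert(mtu <= UINT_MAX - ETH_HDR_LEN);	/* guard against wraparound */
	return mtu + ETH_HDR_LEN;
}

/* Hypothetical helper: frame length a device should still accept on
 * receive, allowing for one VLAN tag on top of the header.
 */
static unsigned int max_rx_frame_len(unsigned int mtu)
{
	assert(mtu <= UINT_MAX - ETH_HDR_LEN - VLAN_TAG_LEN);
	return mtu + ETH_HDR_LEN + VLAN_TAG_LEN;
}
```

With an MTU of 1500 the two helpers give 1514 and 1518, matching the figures
above. A driver sizing receive buffers from the MTU would use the larger value.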


struct net_device synchronization rules
=======================================
ndo_open:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_stop:
	Synchronization: rtnl_lock() semaphore.
	Context: process
	Note: netif_running() is guaranteed false

ndo_do_ioctl:
	Synchronization: rtnl_lock() semaphore.
	Context: process

ndo_get_stats:
	Synchronization: dev_base_lock rwlock.
	Context: nominally process, but don't sleep inside an rwlock

ndo_start_xmit:
	Synchronization: __netif_tx_lock spinlock.

	When the driver sets NETIF_F_LLTX in dev->features this will be
	called without holding netif_tx_lock. In this case the driver
	has to lock by itself when needed.
	The locking there should also properly protect against
	set_rx_mode. WARNING: use of NETIF_F_LLTX is deprecated.
	Don't use it for new drivers.

	Context: Process with BHs disabled or BH (timer),
		 will be called with interrupts disabled by netconsole.

	Return codes:

	* NETDEV_TX_OK everything ok.
	* NETDEV_TX_BUSY Cannot transmit packet, try later
	  Usually a bug, means queue start/stop flow control is broken in
	  the driver. Note: the driver must NOT put the skb in its DMA ring.

ndo_tx_timeout:
	Synchronization: netif_tx_lock spinlock; all TX queues frozen.
	Context: BHs disabled
	Notes: netif_queue_stopped() is guaranteed true

ndo_set_rx_mode:
	Synchronization: netif_addr_lock spinlock.
	Context: BHs disabled
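The remark that ``NETDEV_TX_BUSY`` usually points at broken start/stop flow
control can be illustrated with a toy user-space model. This is a sketch under
stated assumptions, not driver code: ``struct tx_ring``, ``model_xmit()``,
``model_tx_complete()`` and ``stack_try_send()`` are hypothetical names, the
ring size is arbitrary, and the ``queue_stopped`` flag merely stands in for the
stopped/woken state of a TX queue. The model stops the queue as soon as the
last slot is used, which is one common way to arrange that the transmit routine
is never entered with a full ring.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4u			/* hypothetical descriptor count */

/* Stand-ins for NETDEV_TX_OK / NETDEV_TX_BUSY in this model. */
enum tx_status { TX_OK, TX_BUSY };

struct tx_ring {
	unsigned int head;		/* next slot the "driver" fills */
	unsigned int tail;		/* next slot the "hardware" completes */
	bool queue_stopped;		/* models the stopped state of the queue */
};

static unsigned int ring_free(const struct tx_ring *r)
{
	return RING_SIZE - (r->head - r->tail);
}

/* Models the transmit routine: queue one packet, then stop the queue
 * if no slot is left for the next one.
 */
static enum tx_status model_xmit(struct tx_ring *r)
{
	if (ring_free(r) == 0)
		return TX_BUSY;		/* packet is NOT placed in the ring */

	r->head++;
	if (ring_free(r) == 0)
		r->queue_stopped = true;
	return TX_OK;
}

/* Models TX completion: reclaim one slot and wake the queue. */
static void model_tx_complete(struct tx_ring *r)
{
	assert(r->head != r->tail);	/* something must be in flight */
	r->tail++;
	r->queue_stopped = false;
}

/* Models the stack: it only calls the transmit routine while the
 * queue is not stopped. Returns false if the packet was held back.
 */
static bool stack_try_send(struct tx_ring *r, enum tx_status *status)
{
	if (r->queue_stopped)
		return false;
	*status = model_xmit(r);
	return true;
}
```

As long as every caller goes through ``stack_try_send()``, ``model_xmit()``
never returns ``TX_BUSY``; the busy path is only reached when something calls
it while the ring is full, which mirrors the "usually a bug" wording above.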

struct napi_struct synchronization rules
========================================
napi->poll:
	Synchronization:
		NAPI_STATE_SCHED bit in napi->state.  Device
		driver's ndo_stop method will invoke napi_disable() on
		all NAPI instances which will do a sleeping poll on the
		NAPI_STATE_SCHED napi->state bit, waiting for all pending
		NAPI activity to cease.

	Context:
		 softirq
		 will be called with interrupts disabled by netconsole.