.. SPDX-License-Identifier: GPL-2.0

===========
Packet MMAP
===========

Abstract
========

This file documents the mmap() facility available with the PACKET
socket interface on 2.4/2.6/3.x kernels. This type of socket is used to

i) capture network traffic with utilities like tcpdump,
ii) transmit network traffic, or serve any other purpose that needs raw
    access to a network interface.

A howto can be found at:

    https://sites.google.com/site/packetmmap/

Please send your comments to
    - Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
    - Johann Baudy

Why use PACKET_MMAP
===================

In Linux 2.4/2.6/3.x, if PACKET_MMAP is not enabled, the capture process is very
inefficient. It uses very limited buffers and requires one system call to
capture each packet; it requires two if you want to get the packet's timestamp
(as libpcap always does).

On the other hand, PACKET_MMAP is very efficient. PACKET_MMAP provides a
size-configurable circular buffer mapped in user space that can be used to either
send or receive packets. This way, reading packets just requires waiting for them;
most of the time there is no need to issue a single system call. Concerning
transmission, multiple packets can be sent through one system call to get the
highest bandwidth. Using a shared buffer between the kernel and the user
also has the benefit of minimizing packet copies.

It's fine to use PACKET_MMAP to improve the performance of the capture and
transmission process, but it isn't everything. At least, if you are capturing
at high speeds (this is relative to the CPU speed), you should check if the
device driver of your network interface card supports some sort of interrupt
load mitigation or (even better) if it supports NAPI; also make sure it is
enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
supported by devices of your network. CPU IRQ pinning of your network interface
card can also be an advantage.

How to use mmap() to improve capture process
============================================

From the user standpoint, you should use the higher level libpcap library, which
is a de facto standard, portable across nearly all operating systems
including Win32.

Packet MMAP support was integrated into libpcap around the time of version 1.3.0;
TPACKET_V3 support was added in version 1.5.0.

How to use mmap() directly to improve capture process
=====================================================

From the system calls standpoint, the use of PACKET_MMAP involves
the following process::


    [setup]     socket() -------> creation of the capture socket
                setsockopt() ---> allocation of the circular buffer (ring)
                                  option: PACKET_RX_RING
                mmap() ---------> mapping of the allocated buffer to the
                                  user process

    [capture]   poll() ---------> to wait for incoming packets

    [shutdown]  close() --------> destruction of the capture socket and
                                  deallocation of all associated
                                  resources.


Socket creation and destruction are straightforward and are done
the same way with or without PACKET_MMAP::

 int fd = socket(PF_PACKET, mode, htons(ETH_P_ALL));

where mode is SOCK_RAW for the raw interface, where link level
information can be captured, or SOCK_DGRAM for the cooked
interface, where link level information capture is not
supported and a link level pseudo-header is provided
by the kernel.

The destruction of the socket and all associated resources
is done by a simple call to close(fd).

As without PACKET_MMAP, it is possible to use one socket
for capture and transmission. This can be done by mapping the
allocated RX and TX buffer ring with a single mmap() call.
See "Mapping and use of the circular buffer (ring)".

The following sections describe the PACKET_MMAP settings and their
constraints, the mapping of the circular buffer in the user process, and
the use of this buffer.

How to use mmap() directly to improve transmission process
==========================================================

The transmission process is similar to capture, as shown below::

    [setup]         socket() -------> creation of the transmission socket
                    setsockopt() ---> allocation of the circular buffer (ring)
                                      option: PACKET_TX_RING
                    bind() ---------> bind transmission socket with a network interface
                    mmap() ---------> mapping of the allocated buffer to the
                                      user process

    [transmission]  poll() ---------> wait for free packets (optional)
                    send() ---------> send all packets that are set as ready in
                                      the ring
                                      The flag MSG_DONTWAIT can be used to return
                                      before end of transfer.

    [shutdown]      close() --------> destruction of the transmission socket and
                                      deallocation of all associated resources.

Socket creation and destruction are also straightforward and are done
the same way as for capture, described in the previous section::

 int fd = socket(PF_PACKET, mode, 0);

The protocol can optionally be 0 in case we only want to transmit
via this socket, which avoids an expensive call to packet_rcv().
In this case, you also need to bind(2) the TX_RING with sll_protocol = 0
set. Otherwise, use htons(ETH_P_ALL) or any other protocol, for example.

Binding the socket to your network interface is mandatory (with zero copy) to
know the header size of frames used in the circular buffer.

As in capture, each frame contains two parts::

     --------------------
    | struct tpacket_hdr | Header. It contains the status
    |                    | of this frame
    |--------------------|
    | data buffer        |
    .                    .  Data that will be sent over the network interface.
    .                    .
     --------------------

 bind() associates the socket to your network interface thanks to
 the sll_ifindex parameter of struct sockaddr_ll.

 Initialization example::

    struct sockaddr_ll my_addr;
    struct ifreq s_ifr;
    ...

    /* zero both structures so that unused fields are well defined */
    memset(&my_addr, 0, sizeof(my_addr));
    memset(&s_ifr, 0, sizeof(s_ifr));
    strncpy(s_ifr.ifr_name, "eth0", sizeof(s_ifr.ifr_name) - 1);

    /* get interface index of eth0 */
    ioctl(fd, SIOCGIFINDEX, &s_ifr);

    /* fill sockaddr_ll struct to prepare binding */
    my_addr.sll_family = AF_PACKET;
    my_addr.sll_protocol = htons(ETH_P_ALL);
    my_addr.sll_ifindex = s_ifr.ifr_ifindex;

    /* bind socket to eth0 */
    bind(fd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));

 A complete tutorial is available at: https://sites.google.com/site/packetmmap/

By default, the user should put data at::

 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)

So, whatever you choose for the socket mode (SOCK_DGRAM or SOCK_RAW),
the beginning of the user data will be at::

 frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))

If you wish to put user data at a custom offset from the beginning of
the frame (for payload alignment with SOCK_RAW mode for instance) you
can set tp_net (with SOCK_DGRAM) or tp_mac (with SOCK_RAW). In order
to make this work it must be enabled previously with setsockopt()
and the PACKET_TX_HAS_OFF option.
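
As a minimal sketch, the default data offset for TPACKET_V1 can be computed
from the macros in linux/if_packet.h. The helper names
tx_default_data_offset() and tx_frame_data() are hypothetical and exist only
for this illustration; the assertion checks that the two expressions given
above describe the same location::

    #include <assert.h>
    #include <stddef.h>
    #include <linux/if_packet.h>

    /* Default offset of the user data inside a TX frame (TPACKET_V1). */
    static size_t tx_default_data_offset(void)
    {
        return TPACKET_HDRLEN - sizeof(struct sockaddr_ll);
    }

    /* Pointer to the default data area of the frame at frame_base. */
    static void *tx_frame_data(void *frame_base)
    {
        size_t off = tx_default_data_offset();

        /* both expressions from the text name the same offset */
        assert(off == TPACKET_ALIGN(sizeof(struct tpacket_hdr)));
        return (char *)frame_base + off;
    }

The sketch does not cover the PACKET_TX_HAS_OFF case, where the offset is
taken from tp_net or tp_mac instead.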

PACKET_MMAP settings
====================

Setting up PACKET_MMAP from user level code is done with a call like

 - Capture process::

     setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &req, sizeof(req))

 - Transmission process::

     setsockopt(fd, SOL_PACKET, PACKET_TX_RING, (void *) &req, sizeof(req))

The most significant argument in the previous call is the req parameter;
this parameter must have the following structure::

    struct tpacket_req
    {
        unsigned int    tp_block_size;  /* Minimal size of contiguous block */
        unsigned int    tp_block_nr;    /* Number of blocks */
        unsigned int    tp_frame_size;  /* Size of frame */
        unsigned int    tp_frame_nr;    /* Total number of frames */
    };

This structure is defined in /usr/include/linux/if_packet.h and establishes a
circular buffer (ring) of unswappable memory.
Because the ring is mapped into the capture process, captured frames and
related meta-information like timestamps can be read without requiring a
system call.

Frames are grouped in blocks. Each block is a physically contiguous
region of memory and holds tp_block_size/tp_frame_size frames. The total number
of blocks is tp_block_nr. Note that tp_frame_nr is a redundant parameter because::

    frames_per_block = tp_block_size/tp_frame_size

Indeed, packet_set_ring checks that the following condition is true::

    frames_per_block * tp_block_nr == tp_frame_nr

Let's see an example, with the following values::

     tp_block_size= 4096
     tp_frame_size= 2048
     tp_block_nr  = 4
     tp_frame_nr  = 8

We will get the following buffer structure::

            block #1                 block #2
    +---------+---------+    +---------+---------+
    | frame 1 | frame 2 |    | frame 3 | frame 4 |
    +---------+---------+    +---------+---------+

            block #3                 block #4
    +---------+---------+    +---------+---------+
    | frame 5 | frame 6 |    | frame 7 | frame 8 |
    +---------+---------+    +---------+---------+

A frame can be of any size with the only condition that it fits in a block. A block
can only hold an integer number of frames, or in other words, a frame cannot
span two blocks, so there are some details you have to take into
account when choosing the frame_size. See "Mapping and use of the circular
buffer (ring)".
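
Since tp_frame_nr is redundant, it can be derived rather than typed in by
hand. The following is a minimal sketch of that idea; ring_req_init() is a
hypothetical helper, not part of any kernel or libc API::

    #include <assert.h>
    #include <string.h>
    #include <linux/if_packet.h>

    /*
     * Fill a struct tpacket_req from the three independent values and
     * derive tp_frame_nr. Returns 0 on success, -1 if no frame fits in
     * a block.
     */
    static int ring_req_init(struct tpacket_req *req, unsigned int block_size,
                             unsigned int block_nr, unsigned int frame_size)
    {
        unsigned int frames_per_block;

        assert(req != NULL);

        if (frame_size == 0 || frame_size > block_size)
            return -1;

        frames_per_block = block_size / frame_size;

        memset(req, 0, sizeof(*req));
        req->tp_block_size = block_size;
        req->tp_block_nr = block_nr;
        req->tp_frame_size = frame_size;
        req->tp_frame_nr = frames_per_block * block_nr;
        return 0;
    }

With the values of the example above (block size 4096, 4 blocks, frame size
2048) the helper yields tp_frame_nr = 8.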

PACKET_MMAP setting constraints
===============================

In kernel versions prior to 2.4.26 (for the 2.4 branch) and 2.6.5 (2.6 branch),
the PACKET_MMAP buffer could hold only 32768 frames in a 32 bit architecture or
16384 in a 64 bit architecture. For information on these kernel versions
see http://pusa.uv.es/~ulisses/packet_mmap/packet_mmap.pre-2.4.26_2.6.5.txt

Block size limit
----------------

As stated earlier, each block is a contiguous physical region of memory. These
memory regions are allocated with calls to the __get_free_pages() function. As
the name indicates, this function allocates pages of memory, and the second
argument is "order", the base-2 logarithm of the number of pages, that is
(for PAGE_SIZE == 4096) order=0 ==> 4096 bytes, order=1 ==> 8192 bytes,
order=2 ==> 16384 bytes, etc. The maximum size of a
region allocated by __get_free_pages is determined by the MAX_ORDER macro. More
precisely, the limit can be calculated as::

   PAGE_SIZE << MAX_ORDER

   In an i386 architecture PAGE_SIZE is 4096 bytes
   In a 2.4/i386 kernel MAX_ORDER is 10
   In a 2.6/i386 kernel MAX_ORDER is 11

So __get_free_pages can allocate as much as 4MB or 8MB in a 2.4/2.6 kernel
respectively, with an i386 architecture.

User space programs can include /usr/include/sys/user.h and
/usr/include/linux/mmzone.h to get PAGE_SIZE and MAX_ORDER declarations.

The page size can also be determined dynamically with the getpagesize (2)
system call.
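
The limit above can be reproduced with a few lines of C. This is only a
sketch: the function names are hypothetical, the value of MAX_ORDER is passed
in by the caller as an assumption about the running kernel, and the page size
is queried with sysconf(_SC_PAGESIZE) instead of getpagesize()::

    #include <assert.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Upper bound for tp_block_size: PAGE_SIZE << MAX_ORDER. */
    static uint64_t max_block_size(uint64_t page_size, unsigned int max_order)
    {
        assert(max_order < 32);   /* keep the shift well defined here */
        return page_size << max_order;
    }

    /* Same bound, using the page size of the running system. */
    static uint64_t max_block_size_here(unsigned int max_order)
    {
        long page_size = sysconf(_SC_PAGESIZE);

        if (page_size <= 0)
            return 0;             /* page size could not be determined */
        return max_block_size((uint64_t)page_size, max_order);
    }

For a page size of 4096 bytes, max_block_size(4096, 10) gives 4 MiB and
max_block_size(4096, 11) gives 8 MiB, matching the 2.4 and 2.6 figures quoted
above.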

Block number limit
------------------

To understand the constraints of PACKET_MMAP, we have to see the structure
used to hold the pointers to each block.

Currently, this structure is a vector called pg_vec, dynamically allocated
with kmalloc; its size limits the number of blocks that can be allocated::

    +---+---+---+---+
    | x | x | x | x |
    +---+---+---+---+
      |   |   |   |
      |   |   |   v
      |   |   v  block #4
      |   v  block #3
      v  block #2
     block #1

kmalloc allocates any number of bytes of physically contiguous memory from
a pool of pre-determined sizes. This pool of memory is maintained by the slab
allocator, which is ultimately responsible for doing the allocation and
hence imposes the maximum memory that kmalloc can allocate.

In a 2.4/2.6 kernel and the i386 architecture, the limit is 131072 bytes. The
predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
entries of /proc/slabinfo

In a 32 bit architecture, pointers are 4 bytes long, so the total number of
pointers to blocks is::

     131072/4 = 32768 blocks

PACKET_MMAP buffer size calculator
==================================

Definitions:

==============  ================================================================
<size-max>      is the maximum size allocable with kmalloc
                (see /proc/slabinfo)
<pointer size>  depends on the architecture -- ``sizeof(void *)``
<page size>     depends on the architecture -- PAGE_SIZE or getpagesize (2)
<max-order>     is the value defined with MAX_ORDER
<frame size>    is an upper bound of a frame's capture size (more on this later)
==============  ================================================================

From these definitions we will derive::

        <block number> = <size-max>/<pointer size>
        <block size> = <page size> << <max-order>

So, the max buffer size is::

        <block number> * <block size>

and the number of frames is::

        <block number> * <block size> / <frame size>

Suppose the following parameters, which apply for a 2.6 kernel and an
i386 architecture::

        <size-max> = 131072 bytes
        <pointer size> = 4 bytes
        <page size> = 4096 bytes
        <max-order> = 11

and a value for <frame size> of 2048 bytes. These parameters will yield::

        <block number> = 131072/4 = 32768 blocks
        <block size> = 4096 << 11 = 8 MiB.

and hence the buffer will have a 262144 MiB size. So it can hold
262144 MiB / 2048 bytes = 134217728 frames

Actually, this buffer size is not possible with an i386 architecture.
Remember that the memory is allocated in kernel space; in the case of
an i386 kernel, memory size is limited to 1GiB.

No memory allocation is freed until the socket is closed. The memory
allocations are done with GFP_KERNEL priority; this basically means that
the allocation can wait and swap other processes' memory in order to allocate
the necessary memory, so normally limits can be reached.
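
The calculator above translates directly into code. The sketch below only
reproduces the arithmetic of this section; struct ring_limits and
ring_limits_calc() are hypothetical names, and all inputs (including
<size-max> and <max-order>) are supplied by the caller rather than read from
the system::

    #include <assert.h>
    #include <stdint.h>

    struct ring_limits {
        uint64_t block_nr;     /* <block number> */
        uint64_t block_size;   /* <block size>, in bytes */
        uint64_t buffer_size;  /* max buffer size, in bytes */
        uint64_t frame_nr;     /* frames that fit in the buffer */
    };

    static struct ring_limits ring_limits_calc(uint64_t size_max,
                                               uint64_t pointer_size,
                                               uint64_t page_size,
                                               unsigned int max_order,
                                               uint64_t frame_size)
    {
        struct ring_limits l;

        assert(pointer_size > 0 && frame_size > 0 && max_order < 32);

        l.block_nr = size_max / pointer_size;
        l.block_size = page_size << max_order;
        l.buffer_size = l.block_nr * l.block_size;
        l.frame_nr = l.buffer_size / frame_size;
        return l;
    }

Called as ring_limits_calc(131072, 4, 4096, 11, 2048), it returns 32768
blocks of 8 MiB each, a buffer of 262144 MiB and 134217728 frames, the same
figures as in the worked example. 64 bit integers are used because the
buffer size does not fit in 32 bits.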

Other constraints
-----------------

If you check the source code you will see that what is drawn here as a frame
is not only the link level frame. At the beginning of each frame there is a
header called struct tpacket_hdr, used in PACKET_MMAP to hold the link level
frame's meta information, like the timestamp. So what is drawn here as a frame
is really the following (from include/linux/if_packet.h)::

 /*
   Frame structure:

   - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
   - struct tpacket_hdr
   - pad to TPACKET_ALIGNMENT=16
   - struct sockaddr_ll
   - Gap, chosen so that packet data (Start+tp_net) aligns to
     TPACKET_ALIGNMENT=16
   - Start+tp_mac: [ Optional MAC header ]
   - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
   - Pad to align to TPACKET_ALIGNMENT=16
 */

The following are conditions that are checked in packet_set_ring

   - tp_block_size must be a multiple of PAGE_SIZE (1)
   - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
   - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
   - tp_frame_nr   must be exactly frames_per_block*tp_block_nr

Note that tp_block_size should be chosen to be a power of two or there will
be a waste of memory.
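
A user space program can run the same checks before calling setsockopt(), to
get a clearer diagnostic than a failed system call. The sketch below mirrors
only the four conditions listed above for TPACKET_V1; ring_req_valid() is a
hypothetical helper, and the kernel may apply further checks that are not
reproduced here::

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <linux/if_packet.h>

    static bool ring_req_valid(const struct tpacket_req *req,
                               unsigned int page_size)
    {
        unsigned int frames_per_block;

        assert(req != NULL && page_size > 0);

        if (req->tp_block_size == 0 || req->tp_block_size % page_size)
            return false;          /* must be a multiple of PAGE_SIZE */
        if (req->tp_frame_size <= TPACKET_HDRLEN)
            return false;          /* must leave room after the header */
        if (req->tp_frame_size % TPACKET_ALIGNMENT)
            return false;          /* must be a multiple of the alignment */

        frames_per_block = req->tp_block_size / req->tp_frame_size;
        if (frames_per_block == 0)
            return false;          /* a frame must fit in a block */

        return req->tp_frame_nr == frames_per_block * req->tp_block_nr;
    }

The page size argument would typically come from getpagesize() or
sysconf(_SC_PAGESIZE).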

Mapping and use of the circular buffer (ring)
---------------------------------------------

The mapping of the buffer in the user process is done with the conventional
mmap function. Even though the circular buffer is composed of several physically
discontiguous blocks of memory, they are contiguous in user space, hence
just one call to mmap is needed::

    mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);

If tp_frame_size is a divisor of tp_block_size, frames will be
contiguously spaced by tp_frame_size bytes. If not, every
tp_block_size/tp_frame_size frames there will be a gap between
the frames. This is because a frame cannot span two
blocks.
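
The position of a frame inside the mapping therefore depends on the block it
belongs to. The following sketch computes it for a 0-based frame index and
handles the gap at the end of each block; ring_frame_offset() and
ring_frame_at() are hypothetical helper names::

    #include <assert.h>
    #include <stddef.h>
    #include <linux/if_packet.h>

    /* Byte offset of frame idx (0-based) from the start of the mapping. */
    static size_t ring_frame_offset(const struct tpacket_req *req,
                                    unsigned int idx)
    {
        unsigned int frames_per_block =
            req->tp_block_size / req->tp_frame_size;

        assert(frames_per_block > 0 && idx < req->tp_frame_nr);

        return (size_t)(idx / frames_per_block) * req->tp_block_size +
               (size_t)(idx % frames_per_block) * req->tp_frame_size;
    }

    /* Address of frame idx inside the mmap()ed ring. */
    static void *ring_frame_at(void *ring, const struct tpacket_req *req,
                               unsigned int idx)
    {
        return (char *)ring + ring_frame_offset(req, idx);
    }

For example, with tp_block_size = 4096 and tp_frame_size = 1536, each block
holds two frames, so frame 2 starts at offset 4096 and not at 3072.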

To use one socket for capture and transmission, the mapping of both the
RX and TX buffer ring has to be done with one call to mmap::

    ...
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &foo, sizeof(foo));
    setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &bar, sizeof(bar));
    ...
    rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    tx_ring = rx_ring + size;

RX must come first, as the kernel maps the TX ring memory right
after the RX one.
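
The snippet above uses the same size for both rings. As a sketch of the more
general case, and assuming that each ring occupies tp_block_size * tp_block_nr
bytes of the mapping, the length to map and the start of the TX ring can be
derived from the two requests. struct ring_layout and ring_layout_calc() are
hypothetical names::

    #include <assert.h>
    #include <stddef.h>
    #include <linux/if_packet.h>

    struct ring_layout {
        size_t rx_len;     /* bytes occupied by the RX ring */
        size_t tx_offset;  /* where the TX ring starts in the mapping */
        size_t total_len;  /* length to pass to mmap() */
    };

    static struct ring_layout ring_layout_calc(const struct tpacket_req *rx,
                                               const struct tpacket_req *tx)
    {
        struct ring_layout l;

        assert(rx != NULL && tx != NULL);

        l.rx_len = (size_t)rx->tp_block_size * rx->tp_block_nr;
        l.tx_offset = l.rx_len;   /* TX follows RX directly */
        l.total_len = l.rx_len +
                      (size_t)tx->tp_block_size * tx->tp_block_nr;
        return l;
    }

The TX ring pointer is then (char *)rx_ring + tx_offset, which reduces to
rx_ring + size when both rings have the same size.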

At the beginning of each frame there is a status field (see
struct tpacket_hdr). If this field is 0, the frame is ready
to be used by the kernel. If not, there is a frame the user can read
and the following flags apply:

Capture process
^^^^^^^^^^^^^^^

From include/linux/if_packet.h::

     #define TP_STATUS_COPY          (1 << 1)
     #define TP_STATUS_LOSING        (1 << 2)
     #define TP_STATUS_CSUMNOTREADY  (1 << 3)
     #define TP_STATUS_CSUM_VALID    (1 << 7)

======================  =======================================================
TP_STATUS_COPY          This flag indicates that the frame (and associated
                        meta information) has been truncated because it's
                        larger than tp_frame_size. This packet can be
                        read entirely with recvfrom().

                        In order to make this work it must be
                        enabled previously with setsockopt() and
                        the PACKET_COPY_THRESH option.

                        The number of frames that can be buffered to
                        be read with recvfrom is limited like a normal socket.
                        See the SO_RCVBUF option in the socket (7) man page.

TP_STATUS_LOSING        indicates there were packet drops since the last time
                        statistics were checked with getsockopt() and
                        the PACKET_STATISTICS option.

TP_STATUS_CSUMNOTREADY  currently it's used for outgoing IP packets whose
                        checksum will be done in hardware. So while
                        reading the packet we should not try to check the
                        checksum.

TP_STATUS_CSUM_VALID    This flag indicates that at least the transport
                        header checksum of the packet has been already
                        validated on the kernel side. If the flag is not set
                        then we are free to check the checksum by ourselves
                        provided that TP_STATUS_CSUMNOTREADY is also not set.
======================  =======================================================
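
As an illustration of how these flags combine, the sketch below decodes the
status word of a frame that is owned by user space. struct rx_frame_info and
rx_status_decode() are hypothetical names; the fallback define only covers
older user space headers and uses the value listed above::

    #include <assert.h>
    #include <stdbool.h>
    #include <linux/if_packet.h>

    #ifndef TP_STATUS_CSUM_VALID
    #define TP_STATUS_CSUM_VALID    (1 << 7)
    #endif

    struct rx_frame_info {
        bool truncated;        /* TP_STATUS_COPY */
        bool drops_seen;       /* TP_STATUS_LOSING */
        bool csum_validated;   /* TP_STATUS_CSUM_VALID */
        bool may_verify_csum;  /* user may check the checksum itself */
    };

    static struct rx_frame_info rx_status_decode(unsigned long status)
    {
        struct rx_frame_info info;

        /* the flags only apply to frames handed over to the user */
        assert(status != TP_STATUS_KERNEL);

        info.truncated = (status & TP_STATUS_COPY) != 0;
        info.drops_seen = (status & TP_STATUS_LOSING) != 0;
        info.csum_validated = (status & TP_STATUS_CSUM_VALID) != 0;
        info.may_verify_csum = !info.csum_validated &&
                               (status & TP_STATUS_CSUMNOTREADY) == 0;
        return info;
    }

The may_verify_csum field encodes the rule from the last table row: the
checksum may be checked by the reader only when neither TP_STATUS_CSUM_VALID
nor TP_STATUS_CSUMNOTREADY is set.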

For convenience there are also the following defines::

     #define TP_STATUS_KERNEL        0
     #define TP_STATUS_USER          1

The kernel initializes all frames to TP_STATUS_KERNEL. When the kernel
receives a packet it puts it in the buffer and updates the status with
at least the TP_STATUS_USER flag. Then the user can read the packet;
once the packet is read the user must zero the status field, so the kernel
can use that frame buffer again.

The user can use poll (any other variant should apply too) to check if new
packets are in the ring::

    struct pollfd pfd;

    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLIN|POLLRDNORM|POLLERR;

    if (status == TP_STATUS_KERNEL)
        retval = poll(&pfd, 1, timeout);

Checking the status value first and then polling for frames does not
incur a race condition.
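
The hand-over protocol described above can be sketched as a loop over the
ring. This is a simplified illustration, not a complete capture program: it
assumes TPACKET_V1 headers, assumes that tp_frame_size divides tp_block_size
so that frames are laid out back to back, omits memory barriers, and merely
sums the captured lengths instead of processing packets. rx_ring_drain() is a
hypothetical name::

    #include <assert.h>
    #include <stddef.h>
    #include <linux/if_packet.h>

    /*
     * Consume every frame currently owned by user space, starting at
     * *next. Returns the number of frames consumed; *next is left at
     * the first frame still owned by the kernel.
     */
    static unsigned int rx_ring_drain(void *ring,
                                      const struct tpacket_req *req,
                                      unsigned int *next,
                                      unsigned long *bytes)
    {
        unsigned int done = 0;

        assert(req->tp_frame_size > 0 &&
               req->tp_block_size % req->tp_frame_size == 0);

        while (done < req->tp_frame_nr) {
            volatile struct tpacket_hdr *hdr =
                (volatile struct tpacket_hdr *)
                ((char *)ring + (size_t)*next * req->tp_frame_size);

            if (!(hdr->tp_status & TP_STATUS_USER))
                break;            /* nothing more to read, go poll() */

            /* packet bytes start at (char *)hdr + hdr->tp_mac */
            *bytes += hdr->tp_snaplen;

            hdr->tp_status = TP_STATUS_KERNEL;   /* give the frame back */
            *next = (*next + 1) % req->tp_frame_nr;
            done++;
        }
        return done;
    }

When the function returns 0, the frame at \*next still has status
TP_STATUS_KERNEL and the caller can block in poll() as shown above.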

Transmission process
^^^^^^^^^^^^^^^^^^^^

These defines are also used for transmission::

     #define TP_STATUS_AVAILABLE        0 // Frame is available
     #define TP_STATUS_SEND_REQUEST     1 // Frame will be sent on next send()
     #define TP_STATUS_SENDING          2 // Frame is currently in transmission
     #define TP_STATUS_WRONG_FORMAT     4 // Frame format is not correct

First, the kernel initializes all frames to TP_STATUS_AVAILABLE. To send a
packet, the user fills a data buffer of an available frame, sets tp_len to
the current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
This can be done on multiple frames. Once the user is ready to transmit, it
calls send(). Then all buffers with status equal to TP_STATUS_SEND_REQUEST are
forwarded to the network device. The kernel updates the status of each sent
frame to TP_STATUS_SENDING until the end of transfer.

At the end of each transfer, buffer status returns to TP_STATUS_AVAILABLE.

::

    header->tp_len = in_i_size;
    header->tp_status = TP_STATUS_SEND_REQUEST;
    retval = send(fd, NULL, 0, 0);

The user can also use poll() to wait for a buffer to become available,
that is, while status == TP_STATUS_SENDING::

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.revents = 0;
    pfd.events = POLLOUT;
    retval = poll(&pfd, 1, timeout);
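
The user side of the transmission steps can be sketched as a small helper
that queues one packet in a frame. It assumes TPACKET_V1, the default data
offset described earlier (no PACKET_TX_HAS_OFF), and omits memory barriers;
tx_frame_queue() is a hypothetical name::

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>
    #include <linux/if_packet.h>

    /*
     * Copy one packet into the TX frame at `frame` and mark it for
     * sending. Returns 0 on success, -1 if the frame is not available
     * or the packet does not fit.
     */
    static int tx_frame_queue(void *frame, unsigned int frame_size,
                              const void *pkt, unsigned int len)
    {
        struct tpacket_hdr *hdr = frame;
        size_t off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

        assert(frame != NULL && pkt != NULL);

        if (hdr->tp_status != TP_STATUS_AVAILABLE)
            return -1;
        if (frame_size < off || len > frame_size - off)
            return -1;

        memcpy((char *)frame + off, pkt, len);
        hdr->tp_len = len;
        hdr->tp_status = TP_STATUS_SEND_REQUEST;  /* sent by next send() */
        return 0;
    }

After one or more frames have been queued this way, a single
send(fd, NULL, 0, 0) asks the kernel to transmit all of them.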
540*4882a593Smuzhiyun
541*4882a593SmuzhiyunWhat TPACKET versions are available and when to use them?
542*4882a593Smuzhiyun=========================================================
543*4882a593Smuzhiyun
544*4882a593Smuzhiyun::
545*4882a593Smuzhiyun
546*4882a593Smuzhiyun int val = tpacket_version;
547*4882a593Smuzhiyun setsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
548*4882a593Smuzhiyun getsockopt(fd, SOL_PACKET, PACKET_VERSION, &val, sizeof(val));
549*4882a593Smuzhiyun
550*4882a593Smuzhiyunwhere 'tpacket_version' can be TPACKET_V1 (default), TPACKET_V2, TPACKET_V3.
551*4882a593Smuzhiyun
552*4882a593SmuzhiyunTPACKET_V1:
553*4882a593Smuzhiyun	- Default if not otherwise specified by setsockopt(2)
554*4882a593Smuzhiyun	- RX_RING, TX_RING available
555*4882a593Smuzhiyun
556*4882a593SmuzhiyunTPACKET_V1 --> TPACKET_V2:
	- Made 64-bit clean (TPACKET_V1 structures use unsigned long), so
	  this also works with a 64-bit kernel and 32-bit user space and
	  the like
560*4882a593Smuzhiyun	- Timestamp resolution in nanoseconds instead of microseconds
561*4882a593Smuzhiyun	- RX_RING, TX_RING available
562*4882a593Smuzhiyun	- VLAN metadata information available for packets
563*4882a593Smuzhiyun	  (TP_STATUS_VLAN_VALID, TP_STATUS_VLAN_TPID_VALID),
564*4882a593Smuzhiyun	  in the tpacket2_hdr structure:
565*4882a593Smuzhiyun
		- The TP_STATUS_VLAN_VALID bit set in the tp_status field indicates
		  that the tp_vlan_tci field holds a valid VLAN TCI value
		- The TP_STATUS_VLAN_TPID_VALID bit set in the tp_status field
		  indicates that the tp_vlan_tpid field holds a valid VLAN TPID value
570*4882a593Smuzhiyun
571*4882a593Smuzhiyun	- How to switch to TPACKET_V2:
572*4882a593Smuzhiyun
573*4882a593Smuzhiyun		1. Replace struct tpacket_hdr by struct tpacket2_hdr
574*4882a593Smuzhiyun		2. Query header len and save
575*4882a593Smuzhiyun		3. Set protocol version to 2, set up ring as usual
576*4882a593Smuzhiyun		4. For getting the sockaddr_ll,
577*4882a593Smuzhiyun		   use ``(void *)hdr + TPACKET_ALIGN(hdrlen)`` instead of
578*4882a593Smuzhiyun		   ``(void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))``
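
Step 4 and the VLAN status bits described above can be illustrated with two
small helpers. This is a sketch, not code taken from the kernel tree:
``frame_sll()`` and ``frame_vlan_id()`` are hypothetical names, and ``hdrlen``
is assumed to be the value saved in step 2. The 0x0fff mask selects the
12-bit VLAN ID from the TCI.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Step 4: locate the sockaddr_ll that follows the aligned frame header. */
static struct sockaddr_ll *frame_sll(void *hdr, unsigned int hdrlen)
{
	assert(hdr);
	return (struct sockaddr_ll *)((uint8_t *)hdr + TPACKET_ALIGN(hdrlen));
}

/*
 * Return the 12-bit VLAN ID of a TPACKET_V2 frame, or -1 if tp_vlan_tci
 * is not flagged as valid in tp_status.
 */
static int frame_vlan_id(const struct tpacket2_hdr *hdr)
{
	assert(hdr);
	if (!(hdr->tp_status & TP_STATUS_VLAN_VALID))
		return -1;
	return hdr->tp_vlan_tci & 0x0fff;
}
```

Checking the status bit first matters because tp_vlan_tci is only meaningful
when TP_STATUS_VLAN_VALID is set.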
579*4882a593Smuzhiyun
580*4882a593SmuzhiyunTPACKET_V2 --> TPACKET_V3:
581*4882a593Smuzhiyun	- Flexible buffer implementation for RX_RING:
582*4882a593Smuzhiyun		1. Blocks can be configured with non-static frame-size
583*4882a593Smuzhiyun		2. Read/poll is at a block-level (as opposed to packet-level)
584*4882a593Smuzhiyun		3. Added poll timeout to avoid indefinite user-space wait
585*4882a593Smuzhiyun		   on idle links
586*4882a593Smuzhiyun		4. Added user-configurable knobs:
587*4882a593Smuzhiyun
588*4882a593Smuzhiyun			4.1 block::timeout
589*4882a593Smuzhiyun			4.2 tpkt_hdr::sk_rxhash
590*4882a593Smuzhiyun
591*4882a593Smuzhiyun	- RX Hash data available in user space
592*4882a593Smuzhiyun	- TX_RING semantics are conceptually similar to TPACKET_V2;
593*4882a593Smuzhiyun	  use tpacket3_hdr instead of tpacket2_hdr, and TPACKET3_HDRLEN
594*4882a593Smuzhiyun	  instead of TPACKET2_HDRLEN. In the current implementation,
595*4882a593Smuzhiyun	  the tp_next_offset field in the tpacket3_hdr MUST be set to
596*4882a593Smuzhiyun	  zero, indicating that the ring does not hold variable sized frames.
597*4882a593Smuzhiyun	  Packets with non-zero values of tp_next_offset will be dropped.
598*4882a593Smuzhiyun
599*4882a593SmuzhiyunAF_PACKET fanout mode
600*4882a593Smuzhiyun=====================
601*4882a593Smuzhiyun
602*4882a593SmuzhiyunIn the AF_PACKET fanout mode, packet reception can be load balanced among
603*4882a593Smuzhiyunprocesses. This also works in combination with mmap(2) on packet sockets.
604*4882a593Smuzhiyun
605*4882a593SmuzhiyunCurrently implemented fanout policies are:
606*4882a593Smuzhiyun
607*4882a593Smuzhiyun  - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
608*4882a593Smuzhiyun  - PACKET_FANOUT_LB: schedule to socket by round-robin
  - PACKET_FANOUT_CPU: schedule to socket by the CPU the packet arrives on
610*4882a593Smuzhiyun  - PACKET_FANOUT_RND: schedule to socket by random selection
611*4882a593Smuzhiyun  - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
  - PACKET_FANOUT_QM: schedule to socket by the skb's recorded queue_mapping
613*4882a593Smuzhiyun
614*4882a593SmuzhiyunMinimal example code by David S. Miller (try things like "./test eth0 hash",
615*4882a593Smuzhiyun"./test eth0 lb", etc.)::
616*4882a593Smuzhiyun
617*4882a593Smuzhiyun    #include <stddef.h>
618*4882a593Smuzhiyun    #include <stdlib.h>
619*4882a593Smuzhiyun    #include <stdio.h>
620*4882a593Smuzhiyun    #include <string.h>
621*4882a593Smuzhiyun
622*4882a593Smuzhiyun    #include <sys/types.h>
623*4882a593Smuzhiyun    #include <sys/wait.h>
624*4882a593Smuzhiyun    #include <sys/socket.h>
625*4882a593Smuzhiyun    #include <sys/ioctl.h>
626*4882a593Smuzhiyun
627*4882a593Smuzhiyun    #include <unistd.h>
628*4882a593Smuzhiyun
629*4882a593Smuzhiyun    #include <linux/if_ether.h>
630*4882a593Smuzhiyun    #include <linux/if_packet.h>
631*4882a593Smuzhiyun
    #include <net/if.h>
    #include <arpa/inet.h>	/* htons() */
633*4882a593Smuzhiyun
634*4882a593Smuzhiyun    static const char *device_name;
635*4882a593Smuzhiyun    static int fanout_type;
636*4882a593Smuzhiyun    static int fanout_id;
637*4882a593Smuzhiyun
638*4882a593Smuzhiyun    #ifndef PACKET_FANOUT
639*4882a593Smuzhiyun    # define PACKET_FANOUT			18
640*4882a593Smuzhiyun    # define PACKET_FANOUT_HASH		0
641*4882a593Smuzhiyun    # define PACKET_FANOUT_LB		1
642*4882a593Smuzhiyun    #endif
643*4882a593Smuzhiyun
644*4882a593Smuzhiyun    static int setup_socket(void)
645*4882a593Smuzhiyun    {
646*4882a593Smuzhiyun	    int err, fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_IP));
647*4882a593Smuzhiyun	    struct sockaddr_ll ll;
648*4882a593Smuzhiyun	    struct ifreq ifr;
649*4882a593Smuzhiyun	    int fanout_arg;
650*4882a593Smuzhiyun
	    if (fd < 0) {
		    perror("socket");
		    return -1;	/* callers test for fd < 0 */
	    }

	    memset(&ifr, 0, sizeof(ifr));
	    /* ifr was zeroed above, so the name stays NUL-terminated */
	    strncpy(ifr.ifr_name, device_name, IFNAMSIZ - 1);
	    err = ioctl(fd, SIOCGIFINDEX, &ifr);
	    if (err < 0) {
		    perror("SIOCGIFINDEX");
		    return -1;
	    }

	    memset(&ll, 0, sizeof(ll));
	    ll.sll_family = AF_PACKET;
	    ll.sll_ifindex = ifr.ifr_ifindex;
	    err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
	    if (err < 0) {
		    perror("bind");
		    return -1;
	    }

	    /* low 16 bits: fanout group id, high 16 bits: fanout type */
	    fanout_arg = (fanout_id | (fanout_type << 16));
	    err = setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
			    &fanout_arg, sizeof(fanout_arg));
	    if (err) {
		    perror("setsockopt");
		    return -1;
	    }
680*4882a593Smuzhiyun
681*4882a593Smuzhiyun	    return fd;
682*4882a593Smuzhiyun    }
683*4882a593Smuzhiyun
684*4882a593Smuzhiyun    static void fanout_thread(void)
685*4882a593Smuzhiyun    {
686*4882a593Smuzhiyun	    int fd = setup_socket();
687*4882a593Smuzhiyun	    int limit = 10000;
688*4882a593Smuzhiyun
689*4882a593Smuzhiyun	    if (fd < 0)
690*4882a593Smuzhiyun		    exit(fd);
691*4882a593Smuzhiyun
692*4882a593Smuzhiyun	    while (limit-- > 0) {
693*4882a593Smuzhiyun		    char buf[1600];
694*4882a593Smuzhiyun		    int err;
695*4882a593Smuzhiyun
696*4882a593Smuzhiyun		    err = read(fd, buf, sizeof(buf));
697*4882a593Smuzhiyun		    if (err < 0) {
698*4882a593Smuzhiyun			    perror("read");
699*4882a593Smuzhiyun			    exit(EXIT_FAILURE);
700*4882a593Smuzhiyun		    }
701*4882a593Smuzhiyun		    if ((limit % 10) == 0)
702*4882a593Smuzhiyun			    fprintf(stdout, "(%d) \n", getpid());
703*4882a593Smuzhiyun	    }
704*4882a593Smuzhiyun
705*4882a593Smuzhiyun	    fprintf(stdout, "%d: Received 10000 packets\n", getpid());
706*4882a593Smuzhiyun
707*4882a593Smuzhiyun	    close(fd);
708*4882a593Smuzhiyun	    exit(0);
709*4882a593Smuzhiyun    }
710*4882a593Smuzhiyun
711*4882a593Smuzhiyun    int main(int argc, char **argp)
712*4882a593Smuzhiyun    {
713*4882a593Smuzhiyun	    int fd, err;
714*4882a593Smuzhiyun	    int i;
715*4882a593Smuzhiyun
716*4882a593Smuzhiyun	    if (argc != 3) {
717*4882a593Smuzhiyun		    fprintf(stderr, "Usage: %s INTERFACE {hash|lb}\n", argp[0]);
718*4882a593Smuzhiyun		    return EXIT_FAILURE;
719*4882a593Smuzhiyun	    }
720*4882a593Smuzhiyun
721*4882a593Smuzhiyun	    if (!strcmp(argp[2], "hash"))
722*4882a593Smuzhiyun		    fanout_type = PACKET_FANOUT_HASH;
723*4882a593Smuzhiyun	    else if (!strcmp(argp[2], "lb"))
724*4882a593Smuzhiyun		    fanout_type = PACKET_FANOUT_LB;
725*4882a593Smuzhiyun	    else {
726*4882a593Smuzhiyun		    fprintf(stderr, "Unknown fanout type [%s]\n", argp[2]);
727*4882a593Smuzhiyun		    exit(EXIT_FAILURE);
728*4882a593Smuzhiyun	    }
729*4882a593Smuzhiyun
730*4882a593Smuzhiyun	    device_name = argp[1];
731*4882a593Smuzhiyun	    fanout_id = getpid() & 0xffff;
732*4882a593Smuzhiyun
733*4882a593Smuzhiyun	    for (i = 0; i < 4; i++) {
734*4882a593Smuzhiyun		    pid_t pid = fork();
735*4882a593Smuzhiyun
736*4882a593Smuzhiyun		    switch (pid) {
737*4882a593Smuzhiyun		    case 0:
738*4882a593Smuzhiyun			    fanout_thread();
739*4882a593Smuzhiyun
740*4882a593Smuzhiyun		    case -1:
741*4882a593Smuzhiyun			    perror("fork");
742*4882a593Smuzhiyun			    exit(EXIT_FAILURE);
743*4882a593Smuzhiyun		    }
744*4882a593Smuzhiyun	    }
745*4882a593Smuzhiyun
746*4882a593Smuzhiyun	    for (i = 0; i < 4; i++) {
747*4882a593Smuzhiyun		    int status;
748*4882a593Smuzhiyun
749*4882a593Smuzhiyun		    wait(&status);
750*4882a593Smuzhiyun	    }
751*4882a593Smuzhiyun
752*4882a593Smuzhiyun	    return 0;
753*4882a593Smuzhiyun    }
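
The fanout argument used in the example packs the group id into the low 16
bits and the fanout type into the high 16 bits. The following sketch spells
that encoding out. The helper names are hypothetical, and the flag bit
PACKET_FANOUT_FLAG_DEFRAG is not described in this document: it is taken from
``<linux/if_packet.h>`` and is assumed here to share the upper 16 bits with
the type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_packet.h>

/* Pack a fanout group id and a type (plus optional flags) into one word. */
static uint32_t fanout_pack(uint16_t group_id, uint16_t type_and_flags)
{
	return (uint32_t)group_id | ((uint32_t)type_and_flags << 16);
}

/* Extract the group id (low 16 bits). */
static uint16_t fanout_group(uint32_t arg)
{
	return arg & 0xffff;
}

/* Extract the fanout type (low byte of the high 16 bits). */
static uint8_t fanout_mode(uint32_t arg)
{
	return (arg >> 16) & 0xff;
}

/* Convert the packed word to the int that setsockopt(2) is given. */
static int fanout_arg(uint16_t group_id, uint16_t type_and_flags)
{
	uint32_t packed = fanout_pack(group_id, type_and_flags);
	int arg;

	assert(sizeof(arg) == sizeof(packed));
	memcpy(&arg, &packed, sizeof(arg));
	return arg;
}
```

All sockets that should share traffic must pass the same group id and type;
the example derives the id from ``getpid() & 0xffff`` in the parent so that
every forked child joins the same group.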
754*4882a593Smuzhiyun
755*4882a593SmuzhiyunAF_PACKET TPACKET_V3 example
756*4882a593Smuzhiyun============================
757*4882a593Smuzhiyun
AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
sizes by doing its own memory management. It is based on blocks, where polling
works on a per-block basis instead of per packet as in TPACKET_V2 and its
predecessor.
761*4882a593Smuzhiyun
762*4882a593SmuzhiyunIt is said that TPACKET_V3 brings the following benefits:
763*4882a593Smuzhiyun
764*4882a593Smuzhiyun * ~15% - 20% reduction in CPU-usage
765*4882a593Smuzhiyun * ~20% increase in packet capture rate
766*4882a593Smuzhiyun * ~2x increase in packet density
767*4882a593Smuzhiyun * Port aggregation analysis
768*4882a593Smuzhiyun * Non static frame size to capture entire packet payload
769*4882a593Smuzhiyun
770*4882a593SmuzhiyunSo it seems to be a good candidate to be used with packet fanout.
771*4882a593Smuzhiyun
772*4882a593SmuzhiyunMinimal example code by Daniel Borkmann based on Chetan Loke's lolpcap (compile
773*4882a593Smuzhiyunit with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::
774*4882a593Smuzhiyun
775*4882a593Smuzhiyun    /* Written from scratch, but kernel-to-user space API usage
776*4882a593Smuzhiyun    * dissected from lolpcap:
777*4882a593Smuzhiyun    *  Copyright 2011, Chetan Loke <loke.chetan@gmail.com>
778*4882a593Smuzhiyun    *  License: GPL, version 2.0
779*4882a593Smuzhiyun    */
780*4882a593Smuzhiyun
781*4882a593Smuzhiyun    #include <stdio.h>
782*4882a593Smuzhiyun    #include <stdlib.h>
783*4882a593Smuzhiyun    #include <stdint.h>
784*4882a593Smuzhiyun    #include <string.h>
785*4882a593Smuzhiyun    #include <assert.h>
786*4882a593Smuzhiyun    #include <net/if.h>
787*4882a593Smuzhiyun    #include <arpa/inet.h>
788*4882a593Smuzhiyun    #include <netdb.h>
789*4882a593Smuzhiyun    #include <poll.h>
790*4882a593Smuzhiyun    #include <unistd.h>
791*4882a593Smuzhiyun    #include <signal.h>
792*4882a593Smuzhiyun    #include <inttypes.h>
793*4882a593Smuzhiyun    #include <sys/socket.h>
794*4882a593Smuzhiyun    #include <sys/mman.h>
795*4882a593Smuzhiyun    #include <linux/if_packet.h>
796*4882a593Smuzhiyun    #include <linux/if_ether.h>
797*4882a593Smuzhiyun    #include <linux/ip.h>
798*4882a593Smuzhiyun
799*4882a593Smuzhiyun    #ifndef likely
800*4882a593Smuzhiyun    # define likely(x)		__builtin_expect(!!(x), 1)
801*4882a593Smuzhiyun    #endif
802*4882a593Smuzhiyun    #ifndef unlikely
803*4882a593Smuzhiyun    # define unlikely(x)		__builtin_expect(!!(x), 0)
804*4882a593Smuzhiyun    #endif
805*4882a593Smuzhiyun
806*4882a593Smuzhiyun    struct block_desc {
807*4882a593Smuzhiyun	    uint32_t version;
808*4882a593Smuzhiyun	    uint32_t offset_to_priv;
809*4882a593Smuzhiyun	    struct tpacket_hdr_v1 h1;
810*4882a593Smuzhiyun    };
811*4882a593Smuzhiyun
812*4882a593Smuzhiyun    struct ring {
813*4882a593Smuzhiyun	    struct iovec *rd;
814*4882a593Smuzhiyun	    uint8_t *map;
815*4882a593Smuzhiyun	    struct tpacket_req3 req;
816*4882a593Smuzhiyun    };
817*4882a593Smuzhiyun
818*4882a593Smuzhiyun    static unsigned long packets_total = 0, bytes_total = 0;
    static volatile sig_atomic_t sigint = 0;
820*4882a593Smuzhiyun
821*4882a593Smuzhiyun    static void sighandler(int num)
822*4882a593Smuzhiyun    {
823*4882a593Smuzhiyun	    sigint = 1;
824*4882a593Smuzhiyun    }
825*4882a593Smuzhiyun
826*4882a593Smuzhiyun    static int setup_socket(struct ring *ring, char *netdev)
827*4882a593Smuzhiyun    {
828*4882a593Smuzhiyun	    int err, i, fd, v = TPACKET_V3;
829*4882a593Smuzhiyun	    struct sockaddr_ll ll;
830*4882a593Smuzhiyun	    unsigned int blocksiz = 1 << 22, framesiz = 1 << 11;
831*4882a593Smuzhiyun	    unsigned int blocknum = 64;
832*4882a593Smuzhiyun
833*4882a593Smuzhiyun	    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
834*4882a593Smuzhiyun	    if (fd < 0) {
835*4882a593Smuzhiyun		    perror("socket");
836*4882a593Smuzhiyun		    exit(1);
837*4882a593Smuzhiyun	    }
838*4882a593Smuzhiyun
839*4882a593Smuzhiyun	    err = setsockopt(fd, SOL_PACKET, PACKET_VERSION, &v, sizeof(v));
840*4882a593Smuzhiyun	    if (err < 0) {
841*4882a593Smuzhiyun		    perror("setsockopt");
842*4882a593Smuzhiyun		    exit(1);
843*4882a593Smuzhiyun	    }
844*4882a593Smuzhiyun
845*4882a593Smuzhiyun	    memset(&ring->req, 0, sizeof(ring->req));
846*4882a593Smuzhiyun	    ring->req.tp_block_size = blocksiz;
847*4882a593Smuzhiyun	    ring->req.tp_frame_size = framesiz;
848*4882a593Smuzhiyun	    ring->req.tp_block_nr = blocknum;
849*4882a593Smuzhiyun	    ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
850*4882a593Smuzhiyun	    ring->req.tp_retire_blk_tov = 60;
851*4882a593Smuzhiyun	    ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
852*4882a593Smuzhiyun
853*4882a593Smuzhiyun	    err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
854*4882a593Smuzhiyun			    sizeof(ring->req));
855*4882a593Smuzhiyun	    if (err < 0) {
856*4882a593Smuzhiyun		    perror("setsockopt");
857*4882a593Smuzhiyun		    exit(1);
858*4882a593Smuzhiyun	    }
859*4882a593Smuzhiyun
860*4882a593Smuzhiyun	    ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
861*4882a593Smuzhiyun			    PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
862*4882a593Smuzhiyun	    if (ring->map == MAP_FAILED) {
863*4882a593Smuzhiyun		    perror("mmap");
864*4882a593Smuzhiyun		    exit(1);
865*4882a593Smuzhiyun	    }
866*4882a593Smuzhiyun
867*4882a593Smuzhiyun	    ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
868*4882a593Smuzhiyun	    assert(ring->rd);
869*4882a593Smuzhiyun	    for (i = 0; i < ring->req.tp_block_nr; ++i) {
870*4882a593Smuzhiyun		    ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
871*4882a593Smuzhiyun		    ring->rd[i].iov_len = ring->req.tp_block_size;
872*4882a593Smuzhiyun	    }
873*4882a593Smuzhiyun
874*4882a593Smuzhiyun	    memset(&ll, 0, sizeof(ll));
875*4882a593Smuzhiyun	    ll.sll_family = PF_PACKET;
876*4882a593Smuzhiyun	    ll.sll_protocol = htons(ETH_P_ALL);
877*4882a593Smuzhiyun	    ll.sll_ifindex = if_nametoindex(netdev);
878*4882a593Smuzhiyun	    ll.sll_hatype = 0;
879*4882a593Smuzhiyun	    ll.sll_pkttype = 0;
880*4882a593Smuzhiyun	    ll.sll_halen = 0;
881*4882a593Smuzhiyun
882*4882a593Smuzhiyun	    err = bind(fd, (struct sockaddr *) &ll, sizeof(ll));
883*4882a593Smuzhiyun	    if (err < 0) {
884*4882a593Smuzhiyun		    perror("bind");
885*4882a593Smuzhiyun		    exit(1);
886*4882a593Smuzhiyun	    }
887*4882a593Smuzhiyun
888*4882a593Smuzhiyun	    return fd;
889*4882a593Smuzhiyun    }
890*4882a593Smuzhiyun
891*4882a593Smuzhiyun    static void display(struct tpacket3_hdr *ppd)
892*4882a593Smuzhiyun    {
893*4882a593Smuzhiyun	    struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
894*4882a593Smuzhiyun	    struct iphdr *ip = (struct iphdr *) ((uint8_t *) eth + ETH_HLEN);
895*4882a593Smuzhiyun
896*4882a593Smuzhiyun	    if (eth->h_proto == htons(ETH_P_IP)) {
897*4882a593Smuzhiyun		    struct sockaddr_in ss, sd;
898*4882a593Smuzhiyun		    char sbuff[NI_MAXHOST], dbuff[NI_MAXHOST];
899*4882a593Smuzhiyun
900*4882a593Smuzhiyun		    memset(&ss, 0, sizeof(ss));
901*4882a593Smuzhiyun		    ss.sin_family = PF_INET;
902*4882a593Smuzhiyun		    ss.sin_addr.s_addr = ip->saddr;
903*4882a593Smuzhiyun		    getnameinfo((struct sockaddr *) &ss, sizeof(ss),
904*4882a593Smuzhiyun				sbuff, sizeof(sbuff), NULL, 0, NI_NUMERICHOST);
905*4882a593Smuzhiyun
906*4882a593Smuzhiyun		    memset(&sd, 0, sizeof(sd));
907*4882a593Smuzhiyun		    sd.sin_family = PF_INET;
908*4882a593Smuzhiyun		    sd.sin_addr.s_addr = ip->daddr;
909*4882a593Smuzhiyun		    getnameinfo((struct sockaddr *) &sd, sizeof(sd),
910*4882a593Smuzhiyun				dbuff, sizeof(dbuff), NULL, 0, NI_NUMERICHOST);
911*4882a593Smuzhiyun
912*4882a593Smuzhiyun		    printf("%s -> %s, ", sbuff, dbuff);
913*4882a593Smuzhiyun	    }
914*4882a593Smuzhiyun
915*4882a593Smuzhiyun	    printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
916*4882a593Smuzhiyun    }
917*4882a593Smuzhiyun
918*4882a593Smuzhiyun    static void walk_block(struct block_desc *pbd, const int block_num)
919*4882a593Smuzhiyun    {
920*4882a593Smuzhiyun	    int num_pkts = pbd->h1.num_pkts, i;
921*4882a593Smuzhiyun	    unsigned long bytes = 0;
922*4882a593Smuzhiyun	    struct tpacket3_hdr *ppd;
923*4882a593Smuzhiyun
924*4882a593Smuzhiyun	    ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd +
925*4882a593Smuzhiyun					pbd->h1.offset_to_first_pkt);
926*4882a593Smuzhiyun	    for (i = 0; i < num_pkts; ++i) {
927*4882a593Smuzhiyun		    bytes += ppd->tp_snaplen;
928*4882a593Smuzhiyun		    display(ppd);
929*4882a593Smuzhiyun
930*4882a593Smuzhiyun		    ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd +
931*4882a593Smuzhiyun						ppd->tp_next_offset);
932*4882a593Smuzhiyun	    }
933*4882a593Smuzhiyun
934*4882a593Smuzhiyun	    packets_total += num_pkts;
935*4882a593Smuzhiyun	    bytes_total += bytes;
936*4882a593Smuzhiyun    }
937*4882a593Smuzhiyun
938*4882a593Smuzhiyun    static void flush_block(struct block_desc *pbd)
939*4882a593Smuzhiyun    {
940*4882a593Smuzhiyun	    pbd->h1.block_status = TP_STATUS_KERNEL;
941*4882a593Smuzhiyun    }
942*4882a593Smuzhiyun
943*4882a593Smuzhiyun    static void teardown_socket(struct ring *ring, int fd)
944*4882a593Smuzhiyun    {
945*4882a593Smuzhiyun	    munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
946*4882a593Smuzhiyun	    free(ring->rd);
947*4882a593Smuzhiyun	    close(fd);
948*4882a593Smuzhiyun    }
949*4882a593Smuzhiyun
950*4882a593Smuzhiyun    int main(int argc, char **argp)
951*4882a593Smuzhiyun    {
952*4882a593Smuzhiyun	    int fd, err;
953*4882a593Smuzhiyun	    socklen_t len;
954*4882a593Smuzhiyun	    struct ring ring;
955*4882a593Smuzhiyun	    struct pollfd pfd;
956*4882a593Smuzhiyun	    unsigned int block_num = 0, blocks = 64;
957*4882a593Smuzhiyun	    struct block_desc *pbd;
958*4882a593Smuzhiyun	    struct tpacket_stats_v3 stats;
959*4882a593Smuzhiyun
960*4882a593Smuzhiyun	    if (argc != 2) {
961*4882a593Smuzhiyun		    fprintf(stderr, "Usage: %s INTERFACE\n", argp[0]);
962*4882a593Smuzhiyun		    return EXIT_FAILURE;
963*4882a593Smuzhiyun	    }
964*4882a593Smuzhiyun
965*4882a593Smuzhiyun	    signal(SIGINT, sighandler);
966*4882a593Smuzhiyun
967*4882a593Smuzhiyun	    memset(&ring, 0, sizeof(ring));
968*4882a593Smuzhiyun	    fd = setup_socket(&ring, argp[argc - 1]);
969*4882a593Smuzhiyun	    assert(fd > 0);
970*4882a593Smuzhiyun
971*4882a593Smuzhiyun	    memset(&pfd, 0, sizeof(pfd));
972*4882a593Smuzhiyun	    pfd.fd = fd;
973*4882a593Smuzhiyun	    pfd.events = POLLIN | POLLERR;
974*4882a593Smuzhiyun	    pfd.revents = 0;
975*4882a593Smuzhiyun
976*4882a593Smuzhiyun	    while (likely(!sigint)) {
977*4882a593Smuzhiyun		    pbd = (struct block_desc *) ring.rd[block_num].iov_base;
978*4882a593Smuzhiyun
979*4882a593Smuzhiyun		    if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
980*4882a593Smuzhiyun			    poll(&pfd, 1, -1);
981*4882a593Smuzhiyun			    continue;
982*4882a593Smuzhiyun		    }
983*4882a593Smuzhiyun
984*4882a593Smuzhiyun		    walk_block(pbd, block_num);
985*4882a593Smuzhiyun		    flush_block(pbd);
986*4882a593Smuzhiyun		    block_num = (block_num + 1) % blocks;
987*4882a593Smuzhiyun	    }
988*4882a593Smuzhiyun
989*4882a593Smuzhiyun	    len = sizeof(stats);
990*4882a593Smuzhiyun	    err = getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, &stats, &len);
991*4882a593Smuzhiyun	    if (err < 0) {
992*4882a593Smuzhiyun		    perror("getsockopt");
993*4882a593Smuzhiyun		    exit(1);
994*4882a593Smuzhiyun	    }
995*4882a593Smuzhiyun
996*4882a593Smuzhiyun	    fflush(stdout);
997*4882a593Smuzhiyun	    printf("\nReceived %u packets, %lu bytes, %u dropped, freeze_q_cnt: %u\n",
998*4882a593Smuzhiyun		stats.tp_packets, bytes_total, stats.tp_drops,
999*4882a593Smuzhiyun		stats.tp_freeze_q_cnt);
1000*4882a593Smuzhiyun
1001*4882a593Smuzhiyun	    teardown_socket(&ring, fd);
1002*4882a593Smuzhiyun	    return 0;
1003*4882a593Smuzhiyun    }
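
The ring geometry chosen in setup_socket() has to be internally consistent,
otherwise the PACKET_RX_RING setsockopt() fails. The sketch below fills a
``struct tpacket_req3`` and rejects obviously inconsistent values up front.
The helper name ``fill_req3()`` is hypothetical, and the checks are an
approximation of what the kernel is assumed to enforce (page-aligned blocks,
aligned frames that hold at least a header, and a frame count that matches
frames per block times the number of blocks); the kernel remains the
authority.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: fill a tpacket_req3 for a TPACKET_V3 RX ring.
 * Returns 0 on success, -1 if the geometry looks inconsistent.
 */
static int fill_req3(struct tpacket_req3 *req, unsigned int block_size,
		     unsigned int frame_size, unsigned int block_nr,
		     unsigned int retire_ms)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long long frames;

	assert(req);

	if (page <= 0 || block_size == 0 || block_nr == 0)
		return -1;
	if (block_size % (unsigned long)page)	/* blocks are whole pages */
		return -1;
	if (frame_size < TPACKET3_HDRLEN || frame_size % TPACKET_ALIGNMENT)
		return -1;
	if (frame_size > block_size)
		return -1;

	/* tp_frame_nr must equal frames per block times number of blocks. */
	frames = (unsigned long long)(block_size / frame_size) * block_nr;
	if (frames > 0xffffffffULL)
		return -1;

	memset(req, 0, sizeof(*req));
	req->tp_block_size = block_size;
	req->tp_frame_size = frame_size;
	req->tp_block_nr = block_nr;
	req->tp_frame_nr = (unsigned int)frames;
	req->tp_retire_blk_tov = retire_ms;	/* block timeout in msec */
	req->tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
	return 0;
}
```

With the values from the example (4 MiB blocks, 2 KiB frames, 64 blocks) this
yields the same tp_frame_nr as the expression
``(blocksiz * blocknum) / framesiz``.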
1004*4882a593Smuzhiyun
1005*4882a593SmuzhiyunPACKET_QDISC_BYPASS
1006*4882a593Smuzhiyun===================
1007*4882a593Smuzhiyun
1008*4882a593SmuzhiyunIf there is a requirement to load the network with many packets in a similar
1009*4882a593Smuzhiyunfashion as pktgen does, you might set the following option after socket
1010*4882a593Smuzhiyuncreation::
1011*4882a593Smuzhiyun
1012*4882a593Smuzhiyun    int one = 1;
1013*4882a593Smuzhiyun    setsockopt(fd, SOL_PACKET, PACKET_QDISC_BYPASS, &one, sizeof(one));
1014*4882a593Smuzhiyun
This has the side effect that packets sent through PF_PACKET bypass the
kernel's qdisc layer and are pushed to the driver directly. As a result,
packets are not buffered, tc disciplines are ignored, increased loss can occur,
and such packets are no longer visible to other PF_PACKET sockets. You have
been warned; in general, this can be useful for stress testing various
components of a system.
1021*4882a593Smuzhiyun
By default, PACKET_QDISC_BYPASS is disabled and needs to be explicitly enabled
on PF_PACKET sockets.
1024*4882a593Smuzhiyun
1025*4882a593SmuzhiyunPACKET_TIMESTAMP
1026*4882a593Smuzhiyun================
1027*4882a593Smuzhiyun
1028*4882a593SmuzhiyunThe PACKET_TIMESTAMP setting determines the source of the timestamp in
1029*4882a593Smuzhiyunthe packet meta information for mmap(2)ed RX_RING and TX_RINGs.  If your
1030*4882a593SmuzhiyunNIC is capable of timestamping packets in hardware, you can request those
1031*4882a593Smuzhiyunhardware timestamps to be used. Note: you may need to enable the generation
1032*4882a593Smuzhiyunof hardware timestamps with SIOCSHWTSTAMP (see related information from
1033*4882a593SmuzhiyunDocumentation/networking/timestamping.rst).
1034*4882a593Smuzhiyun
1035*4882a593SmuzhiyunPACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING::
1036*4882a593Smuzhiyun
1037*4882a593Smuzhiyun    int req = SOF_TIMESTAMPING_RAW_HARDWARE;
    setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req));
1039*4882a593Smuzhiyun
1040*4882a593SmuzhiyunFor the mmap(2)ed ring buffers, such timestamps are stored in the
1041*4882a593Smuzhiyun``tpacket{,2,3}_hdr`` structure's tp_sec and ``tp_{n,u}sec`` members.
1042*4882a593SmuzhiyunTo determine what kind of timestamp has been reported, the tp_status field
1043*4882a593Smuzhiyunis binary or'ed with the following possible bits ...
1044*4882a593Smuzhiyun
1045*4882a593Smuzhiyun::
1046*4882a593Smuzhiyun
1047*4882a593Smuzhiyun    TP_STATUS_TS_RAW_HARDWARE
1048*4882a593Smuzhiyun    TP_STATUS_TS_SOFTWARE
1049*4882a593Smuzhiyun
1050*4882a593Smuzhiyun... that are equivalent to its ``SOF_TIMESTAMPING_*`` counterparts. For the
1051*4882a593SmuzhiyunRX_RING, if neither is set (i.e. PACKET_TIMESTAMP is not set), then a
1052*4882a593Smuzhiyunsoftware fallback was invoked *within* PF_PACKET's processing code (less
1053*4882a593Smuzhiyunprecise).
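
For an RX_RING frame, the classification described above could be sketched as
follows. The enum and function names are hypothetical, and the conversion
helper assumes a TPACKET_V2 frame, whose header stores the timestamp in
tp_sec and tp_nsec.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Hypothetical classification of the timestamp source of a frame. */
enum ts_kind { TS_FALLBACK, TS_SOFTWARE, TS_RAW_HARDWARE };

/* Derive the timestamp source from the tp_status bits of an RX frame. */
static enum ts_kind frame_ts_kind(uint32_t tp_status)
{
	if (tp_status & TP_STATUS_TS_RAW_HARDWARE)
		return TS_RAW_HARDWARE;
	if (tp_status & TP_STATUS_TS_SOFTWARE)
		return TS_SOFTWARE;
	/* Neither bit set: software fallback inside PF_PACKET. */
	return TS_FALLBACK;
}

/* Combine tp_sec and tp_nsec of a TPACKET_V2 frame into nanoseconds. */
static uint64_t frame_ts_ns(const struct tpacket2_hdr *hdr)
{
	assert(hdr);
	return (uint64_t)hdr->tp_sec * 1000000000ULL + hdr->tp_nsec;
}
```

A capture loop would typically call ``frame_ts_kind()`` once per frame to
decide how much precision to attribute to the value returned by
``frame_ts_ns()``.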
1054*4882a593Smuzhiyun
Getting timestamps for the TX_RING works as follows: i) fill the ring frames,
ii) call sendto(), e.g. in blocking mode, iii) wait for the status of the
relevant frames to be updated, i.e. for the frames to be handed back to the
application, iv) walk through the frames to pick up the individual hw/sw
timestamps.
1059*4882a593Smuzhiyun
Only if transmit timestamping is enabled are these bits combined with
TP_STATUS_AVAILABLE using a binary OR, so your application must account for
that: first check ``!(tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING))``
to see whether the frame belongs to the application, and then extract the
type of timestamp from tp_status in a second step.
1065*4882a593Smuzhiyun
If you do not need timestamps and leave transmit timestamping disabled,
checking for TP_STATUS_AVAILABLE or TP_STATUS_WRONG_FORMAT is sufficient. If
in the TX_RING only TP_STATUS_AVAILABLE is set, then the tp_sec and
tp_{n,u}sec members do not contain a valid value. For TX_RINGs, no timestamp
is generated by default!
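
The two-step check for TX_RING frames might be written as below. Both helper
names are hypothetical; the sketch only restates the status-bit logic from
the text and makes no assumption about how the frames are walked.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Step 1: true once the kernel has handed the TX frame back to us. */
static int tx_frame_is_ours(uint32_t tp_status)
{
	return !(tp_status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING));
}

/*
 * Step 2: true if a reclaimed TX frame carries a timestamp, i.e. the
 * tp_sec and tp_{n,u}sec members hold a valid value.
 */
static int tx_frame_has_ts(uint32_t tp_status)
{
	assert(tx_frame_is_ours(tp_status));
	return (tp_status &
		(TP_STATUS_TS_SOFTWARE | TP_STATUS_TS_RAW_HARDWARE)) != 0;
}
```

Comparing tp_status with TP_STATUS_AVAILABLE for equality would miss frames
whose status also carries a timestamp bit, which is why the first step masks
the two "in flight" bits instead.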
1071*4882a593Smuzhiyun
1072*4882a593SmuzhiyunSee include/linux/net_tstamp.h and Documentation/networking/timestamping.rst
1073*4882a593Smuzhiyunfor more information on hardware timestamps.
1074*4882a593Smuzhiyun
1075*4882a593SmuzhiyunMiscellaneous bits
1076*4882a593Smuzhiyun==================
1077*4882a593Smuzhiyun
1078*4882a593Smuzhiyun- Packet sockets work well together with Linux socket filters, thus you also
1079*4882a593Smuzhiyun  might want to have a look at Documentation/networking/filter.rst
1080*4882a593Smuzhiyun
1081*4882a593SmuzhiyunTHANKS
1082*4882a593Smuzhiyun======
1083*4882a593Smuzhiyun
   Jesse Brandeburg, for fixing my grammatical/spelling errors
1085