=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

The Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
feature supports Ethernet functionality over the Omni-Path fabric by
encapsulating the Ethernet packets between HFI nodes.

Architecture
=============
The exchange of Omni-Path encapsulated Ethernet packets involves one or
more virtual Ethernet switches overlaid on the Omni-Path fabric
topology. A subset of HFI nodes on the Omni-Path fabric is permitted to
exchange encapsulated Ethernet packets across a particular virtual
Ethernet switch. The virtual Ethernet switches are logical abstractions
achieved by configuring the HFI nodes on the fabric for header
generation and processing. In the simplest configuration, all HFI nodes
across the fabric exchange encapsulated Ethernet packets over a single
virtual Ethernet switch. A virtual Ethernet switch is effectively an
independent Ethernet network. The configuration is performed by an
Ethernet Manager (EM), which is part of the trusted Fabric Manager (FM)
application. HFI nodes can have multiple VNICs, each connected to a
different virtual Ethernet switch. The diagram below presents a case
of two virtual Ethernet switches with two HFI nodes::

                               +-------------------+
                               |      Subnet/      |
                               |     Ethernet      |
                               |      Manager      |
                               +-------------------+
                                  /          /
                                /           /
                              /            /
                            /             /
  +-----------------------------+  +------------------------------+
  |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
  |  +---------+    +---------+ |  | +---------+    +---------+   |
  |  | VPORT   |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
  +--+---------+----+---------+-+  +-+---------+----+---------+---+
           |                 \        /                 |
           |                   \    /                   |
           |                     \/                     |
           |                    /  \                    |
           |                  /      \                  |
       +-----------+------------+  +-----------+------------+
       |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
       +-----------+------------+  +-----------+------------+
       |          HFI           |  |          HFI           |
       +------------------------+  +------------------------+

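The membership rules above can be sketched as a small model. This is only
an illustration: the ``Fabric`` class, the node names and the numeric
switch ids are hypothetical, and the third node is added purely to show a
subset of nodes sharing a switch. It assumes, following the text, that a
node attaches at most one VNIC to any given virtual Ethernet switch.

```python
class Fabric:
    """Hypothetical model of virtual Ethernet switch membership."""

    def __init__(self):
        # vesw_id -> set of HFI node names that have a VNIC on that switch
        self.members = {}

    def add_vnic(self, node, vesw_id):
        """Attach one VNIC on `node` to virtual Ethernet switch `vesw_id`."""
        nodes = self.members.setdefault(vesw_id, set())
        if node in nodes:
            # Assumption: each VNIC of a node goes to a different switch.
            raise ValueError(f"{node} already has a VNIC on switch {vesw_id}")
        nodes.add(node)

    def can_exchange(self, a, b, vesw_id):
        """True if both nodes have a VNIC on the given virtual switch."""
        return {a, b} <= self.members.get(vesw_id, set())


# Two HFI nodes, each with one VNIC on each of two virtual switches,
# as in the diagram, plus a third node that joins only switch 2.
fabric = Fabric()
for node in ("hfi-a", "hfi-b"):
    for vesw_id in (1, 2):
        fabric.add_vnic(node, vesw_id)
fabric.add_vnic("hfi-c", 2)
```

In this model ``hfi-a`` and ``hfi-c`` can exchange packets only over
switch 2, which mirrors the idea that each virtual Ethernet switch behaves
as an independent Ethernet network.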

The Omni-Path encapsulated Ethernet packet format is as described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
31                   BECN bit
32-51                DLID (lower 20 bits)
52-56                SC (Service Class)
57-59                RC (Routing Control)
60                   FECN bit
61-62                L2 (=10, 16B format)
63                   LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                Entropy
48-63                Reserved

Quad Word 2:
0-15                 Reserved
16-31                L4 header
32-63                Ethernet Packet

Quad Words 3 to N-1:
0-63                 Ethernet packet (pad extended)

Quad Word N (last):
0-23                 Ethernet packet (pad extended)
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================

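As an illustration of the bit positions in the table, the first two quad
words can be packed into 64-bit integers as sketched below. The function
names are hypothetical, and the sketch assumes that bit 0 is the least
significant bit of each quad word; it does not address byte order on the
wire.

```python
L2_16B = 0b10        # L2 field value for the 16B format
LT_HEAD = 1          # Link Transfer Head Flit
L4_ETHERNET = 0x78   # L4 type for encapsulated Ethernet


def pack_qw0(slid, dlid, length_qw, sc, rc=0, becn=0, fecn=0):
    """Pack Quad Word 0, assuming bit 0 is the least significant bit."""
    qw = slid & 0xFFFFF                 # bits 0-19:  SLID (lower 20 bits)
    qw |= (length_qw & 0x7FF) << 20     # bits 20-30: length in quad words
    qw |= (becn & 0x1) << 31            # bit 31:     BECN
    qw |= (dlid & 0xFFFFF) << 32        # bits 32-51: DLID (lower 20 bits)
    qw |= (sc & 0x1F) << 52             # bits 52-56: SC
    qw |= (rc & 0x7) << 57              # bits 57-59: RC
    qw |= (fecn & 0x1) << 60            # bit 60:     FECN
    qw |= L2_16B << 61                  # bits 61-62: L2
    qw |= LT_HEAD << 63                 # bit 63:     LT
    return qw


def pack_qw1(slid, dlid, pkey, entropy):
    """Pack Quad Word 1; bits 48-63 are reserved and left as zero."""
    qw = L4_ETHERNET                    # bits 0-7:   L4 type
    qw |= ((slid >> 20) & 0xF) << 8     # bits 8-11:  SLID[23:20]
    qw |= ((dlid >> 20) & 0xF) << 12    # bits 12-15: DLID[23:20]
    qw |= (pkey & 0xFFFF) << 16         # bits 16-31: PKEY
    qw |= (entropy & 0xFFFF) << 32      # bits 32-47: Entropy
    return qw
```

The 24-bit SLID and DLID values are split across the two quad words: the
lower 20 bits go into Quad Word 0 and bits [23:20] into Quad Word 1.
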
The Ethernet packet is padded on the transmit side to ensure that the
VNIC OPA packet is quad-word aligned. The 'Tail' field contains the
number of padding bytes. On the receive side, the 'Tail' field is read
and the padding is removed (along with the ICRC, Tail and OPA header)
before the packet is passed up the network stack.

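The padding arithmetic can be sketched as follows. The byte counts are
derived from the field widths in the packet format table (20 bytes ahead
of the Ethernet packet, 5 bytes of ICRC, Tail and LT after it); the
constant and function names are hypothetical and are not taken from the
driver source.

```python
QW_BYTES = 8
HDR_BYTES = 20        # QW0 + QW1 + Reserved and L4 header fields of QW2
ICRC_TAIL_BYTES = 5   # 32-bit ICRC + 6-bit Tail + 2-bit LT


def pad_len(eth_len):
    """Padding bytes needed so the whole OPA packet is quad-word aligned."""
    return -(HDR_BYTES + eth_len + ICRC_TAIL_BYTES) % QW_BYTES


def total_quad_words(eth_len):
    """Length of the padded OPA packet in quad words."""
    total = HDR_BYTES + eth_len + pad_len(eth_len) + ICRC_TAIL_BYTES
    return total // QW_BYTES


def rx_eth_len(total_bytes, tail):
    """Receive side: recover the Ethernet length from the Tail value."""
    return total_bytes - HDR_BYTES - ICRC_TAIL_BYTES - tail
```

For a 60-byte Ethernet packet this gives 3 bytes of padding and an
11 quad word OPA packet. The padding never exceeds 7 bytes, so it fits
in the 6-bit Tail field.
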
The L4 header field contains the ID of the virtual Ethernet switch that
the VNIC port belongs to. On the receive side, this field is used to
de-multiplex the received VNIC packets to different VNIC ports.

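A minimal sketch of this de-multiplexing step is shown below. It assumes
that bit 0 of Quad Word 2 is the least significant bit and treats the
whole 16-bit L4 header field as the virtual Ethernet switch id; the
``VnicDemux`` class and the port names are hypothetical.

```python
L4_HDR_SHIFT = 16     # L4 header occupies bits 16-31 of Quad Word 2
L4_HDR_MASK = 0xFFFF


def vesw_id_of(qw2):
    """Extract the virtual Ethernet switch id from Quad Word 2."""
    return (qw2 >> L4_HDR_SHIFT) & L4_HDR_MASK


class VnicDemux:
    """Maps a virtual Ethernet switch id to the local VNIC port."""

    def __init__(self):
        self.ports = {}

    def register(self, vesw_id, port):
        self.ports[vesw_id] = port

    def lookup(self, qw2):
        """Return the VNIC port for a received packet, or None if unknown."""
        return self.ports.get(vesw_id_of(qw2))


demux = VnicDemux()
demux.register(1, "vnic0")
demux.register(2, "vnic1")
```

A packet whose L4 header carries an id with no registered port yields
``None`` here; what a real receiver does with such a packet is not
covered by this document.
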
Driver Design
==============
The Intel OPA VNIC software design is presented in the diagram below.
OPA VNIC functionality has a HW dependent component and a HW
independent component.

Support has been added for IB devices to allocate and free RDMA netdev
devices. The RDMA netdev interfaces with the network stack, thus
creating standard network interfaces. OPA_VNIC is an RDMA netdev
device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It involves HW resource allocation/management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with IB core as an IB client and interfaces with the
IB MAD stack. It exchanges management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any
control operation. It also handles the encapsulation of Ethernet packets
with an Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the
VEMA MAD interface. The VNIC netdev part also passes any control
information to the HW dependent driver by invoking the RDMA netdev
control operations::

        +-------------------+ +----------------------+
        |                   | |       Linux          |
        |     IB MAD        | |      Network         |
        |                   | |       Stack          |
        +-------------------+ +----------------------+
                 |               |          |
                 |               |          |
        +----------------------------+      |
        |                            |      |
        |      OPA VNIC Module       |      |
        |  (OPA VNIC RDMA Netdev     |      |
        |     & EMA functions)       |      |
        |                            |      |
        +----------------------------+      |
                    |                       |
                    |                       |
           +------------------+             |
           |     IB core      |             |
           +------------------+             |
                    |                       |
                    |                       |
        +--------------------------------------------+
        |                                            |
        |      HFI1 Driver with VNIC support         |
        |                                            |
        +--------------------------------------------+
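
The split of responsibilities on the transmit path can be sketched as
two layered objects. This is a conceptual model only: the class names,
method names and the tuple used as a stand-in for the Omni-Path header
are hypothetical and do not correspond to the kernel's data structures.

```python
class HwVnicNetdev:
    """Stand-in for the HW dependent part (HFI1 driver)."""

    def __init__(self):
        self.sent = []
        self.vesw_id = None

    def xmit(self, opa_packet):
        # Expects an already encapsulated packet and hands it to hardware.
        self.sent.append(opa_packet)

    def set_vesw_id(self, vesw_id):
        # Hypothetical RDMA netdev control operation.
        self.vesw_id = vesw_id


class OpaVnicNetdev:
    """Stand-in for the HW independent VNIC netdev part."""

    def __init__(self, hw):
        self.hw = hw
        self.encap_info = None

    def configure(self, vesw_id, dlid):
        # Encapsulation information as it would be configured by the EM
        # through the VEMA; control information is passed down to the
        # HW dependent driver.
        self.encap_info = (dlid, vesw_id)
        self.hw.set_vesw_id(vesw_id)

    def xmit(self, eth_frame):
        # Overrides the HW driver's transmit entry point: encapsulate
        # first, then call down into the HW dependent part.
        if self.encap_info is None:
            raise RuntimeError("VNIC not configured by the EM yet")
        self.hw.xmit((self.encap_info, eth_frame))
```

In this model the network stack only ever calls ``OpaVnicNetdev.xmit``
with a plain Ethernet frame, while ``HwVnicNetdev.xmit`` only ever sees
encapsulated packets.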