=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

The Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
feature supports Ethernet functionality over an Omni-Path fabric by
encapsulating Ethernet packets exchanged between HFI nodes.

Architecture
=============
The exchange of Omni-Path encapsulated Ethernet packets involves one
or more virtual Ethernet switches overlaid on the Omni-Path fabric
topology. A subset of HFI nodes on the Omni-Path fabric is permitted
to exchange encapsulated Ethernet packets across a particular virtual
Ethernet switch. The virtual Ethernet switches are logical
abstractions achieved by configuring the HFI nodes on the fabric for
header generation and processing. In the simplest configuration, all
HFI nodes across the fabric exchange encapsulated Ethernet packets
over a single virtual Ethernet switch. A virtual Ethernet switch is
effectively an independent Ethernet network. The configuration is
performed by an Ethernet Manager (EM), which is part of the trusted
Fabric Manager (FM) application. HFI nodes can have multiple VNICs,
each connected to a different virtual Ethernet switch.
The diagram below shows a case of two virtual Ethernet switches with
two HFI nodes::

                          +-------------------+
                          | Subnet/           |
                          | Ethernet          |
                          | Manager           |
                          +-------------------+
                            /               \
                          /                   \
                        /                       \
                      /                           \
    +-----------------------------+  +------------------------------+
    |  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
    |  +---------+    +---------+ |  | +---------+    +---------+   |
    |  |  VPORT  |    |  VPORT  | |  | |  VPORT  |    |  VPORT  |   |
    +--+---------+----+---------+-+  +-+---------+----+---------+---+
            |                \        /                    |
            |                  \    /                      |
            |                    \/                        |
            |                   /  \                       |
            |                 /      \                     |
      +-----------+------------+     +-----------+------------+
      |   VNIC    |    VNIC    |     |   VNIC    |    VNIC    |
      +-----------+------------+     +-----------+------------+
      |          HFI           |     |          HFI           |
      +------------------------+     +------------------------+


The Omni-Path encapsulated Ethernet packet format is described below.

==================== ================================
Bits                 Field
==================== ================================
Quad Word 0:
0-19                 SLID (lower 20 bits)
20-30                Length (in Quad Words)
31                   BECN bit
32-51                DLID (lower 20 bits)
52-56                SC (Service Class)
57-59                RC (Routing Control)
60                   FECN bit
61-62                L2 (=10, 16B format)
63                   LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7                  L4 type (=0x78 ETHERNET)
8-11                 SLID[23:20]
12-15                DLID[23:20]
16-31                PKEY
32-47                Entropy
48-63                Reserved

Quad Word 2:
0-15                 Reserved
16-31                L4 header
32-63                Ethernet packet

Quad Words 3 to N-1:
0-63                 Ethernet packet (pad extended)

Quad Word N (last):
0-23                 Ethernet packet (pad extended)
24-55                ICRC
56-61                Tail
62-63                LT (=01, Link Transfer Tail Flit)
==================== ================================

The Ethernet packet is padded on the transmit side to ensure that the
VNIC OPA packet is quad-word aligned. The 'Tail' field contains the
number of bytes padded. On the receive side, the 'Tail' field is read
and the padding is removed (along with the ICRC, Tail and OPA header)
before the packet is passed up the network stack.
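
As a rough illustration of the table and padding rule above, the
following user-space C sketch packs the first two header quad words and
computes the pad length. It is not taken from the HFI1 driver or the
OPA VNIC module. It assumes that bit 0 is the least significant bit of
each 64-bit quad word, that the header ahead of the Ethernet packet
occupies 20 bytes (Quad Words 0 and 1 plus the reserved and L4 header
bytes of Quad Word 2), and that the ICRC and the byte carrying the Tail
field take 5 bytes at the end. The names struct vnic_hdr_fields,
vnic_pad_bytes(), vnic_len_qw(), vnic_qw0() and vnic_qw1() are
hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch only: bit positions follow the packet format table, with
     * bit 0 assumed to be the LSB of each quad word. Names are hypothetical. */
    #define VNIC_L4_ETHERNET 0x78u /* QW1 bits 0-7 */
    #define VNIC_HDR_BYTES   20u   /* QW0 + QW1 + 2 reserved + 2-byte L4 header */
    #define VNIC_ICRC_TAIL   5u    /* 4-byte ICRC + byte holding the Tail field */

    struct vnic_hdr_fields {
        uint32_t slid;      /* 24-bit source LID */
        uint32_t dlid;      /* 24-bit destination LID */
        uint8_t  sc;        /* 5-bit service class */
        uint8_t  rc;        /* 3-bit routing control */
        uint8_t  becn;
        uint8_t  fecn;
        uint16_t pkey;
        uint16_t entropy;
    };

    /* Pad bytes so that header + frame + pad + ICRC + tail byte
     * fills a whole number of 8-byte quad words. */
    static unsigned vnic_pad_bytes(size_t eth_len)
    {
        return (unsigned)((8 - (VNIC_HDR_BYTES + eth_len + VNIC_ICRC_TAIL) % 8) % 8);
    }

    /* Total packet length in quad words (QW0 bits 20-30). */
    static unsigned vnic_len_qw(size_t eth_len)
    {
        return (unsigned)((VNIC_HDR_BYTES + eth_len + vnic_pad_bytes(eth_len) +
                           VNIC_ICRC_TAIL) / 8);
    }

    static uint64_t vnic_qw0(const struct vnic_hdr_fields *f, unsigned len_qw)
    {
        uint64_t qw = 0;

        qw |= (uint64_t)(f->slid & 0xFFFFFu);        /* bits 0-19: SLID[19:0]  */
        qw |= (uint64_t)(len_qw & 0x7FFu) << 20;     /* bits 20-30: length     */
        qw |= (uint64_t)(f->becn & 1u) << 31;        /* bit 31: BECN           */
        qw |= (uint64_t)(f->dlid & 0xFFFFFu) << 32;  /* bits 32-51: DLID[19:0] */
        qw |= (uint64_t)(f->sc & 0x1Fu) << 52;       /* bits 52-56: SC         */
        qw |= (uint64_t)(f->rc & 0x7u) << 57;        /* bits 57-59: RC         */
        qw |= (uint64_t)(f->fecn & 1u) << 60;        /* bit 60: FECN           */
        qw |= (uint64_t)0x2u << 61;                  /* bits 61-62: L2 = 10b   */
        qw |= (uint64_t)1u << 63;                    /* bit 63: LT = 1         */
        return qw;
    }

    static uint64_t vnic_qw1(const struct vnic_hdr_fields *f)
    {
        uint64_t qw = VNIC_L4_ETHERNET;                  /* bits 0-7: L4 type       */

        qw |= (uint64_t)((f->slid >> 20) & 0xFu) << 8;   /* bits 8-11: SLID[23:20]  */
        qw |= (uint64_t)((f->dlid >> 20) & 0xFu) << 12;  /* bits 12-15: DLID[23:20] */
        qw |= (uint64_t)f->pkey << 16;                   /* bits 16-31: PKEY        */
        qw |= (uint64_t)f->entropy << 32;                /* bits 32-47: entropy     */
        return qw;                                       /* bits 48-63 reserved (0) */
    }

Under these assumptions, a 60-byte Ethernet frame needs 3 pad bytes,
giving an 88-byte (11 quad word) VNIC packet.
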

The L4 header field contains the ID of the virtual Ethernet switch to
which the VNIC port belongs. On the receive side, this field is used
to demultiplex received VNIC packets to the appropriate VNIC ports.

Driver Design
==============
The Intel OPA VNIC software design is presented in the diagram below.
OPA VNIC functionality has a HW-dependent component and a
HW-independent component.

Support has been added for IB devices to allocate and free RDMA netdev
devices. The RDMA netdev supports interfacing with the network stack,
thus creating standard network interfaces. OPA_VNIC is an RDMA netdev
device type.

The HW-dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It handles HW resource allocation and management for VNIC
functionality. It interfaces with the network stack and implements the
required net_device_ops functions. It expects Omni-Path encapsulated
Ethernet packets in the transmit path and provides HW access to them.
It strips the Omni-Path header from received packets before passing
them up the network stack. It also implements the RDMA netdev control
operations.

The OPA VNIC module implements the HW-independent VNIC functionality.
It consists of two parts: the VNIC Ethernet Management Agent (VEMA)
and the VNIC netdev. The VEMA registers itself with IB core as an IB
client and interfaces with the IB MAD stack.
It exchanges management information with the Ethernet Manager (EM) and
the VNIC netdev. The VNIC netdev part allocates and frees the OPA_VNIC
RDMA netdev devices. It overrides the net_device_ops functions set by
the HW-dependent VNIC driver where required to accommodate any control
operation. It also handles the encapsulation of Ethernet packets with
an Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the
VEMA MAD interface. It also passes any control information to the
HW-dependent driver by invoking the RDMA netdev control operations::

    +-------------------+   +----------------------+
    |                   |   |        Linux         |
    |      IB MAD       |   |       Network        |
    |                   |   |        Stack         |
    +-------------------+   +----------------------+
              |               |             |
              |               |             |
    +----------------------------+          |
    |                            |          |
    |      OPA VNIC Module       |          |
    |   (OPA VNIC RDMA Netdev    |          |
    |     & EMA functions)       |          |
    |                            |          |
    +----------------------------+          |
                  |                         |
                  |                         |
         +------------------+               |
         |     IB core      |               |
         +------------------+               |
                  |                         |
                  |                         |
    +--------------------------------------------+
    |                                            |
    |       HFI1 Driver with VNIC support        |
    |                                            |
    +--------------------------------------------+
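
The receive-side steps described in this document (reading the Tail
field, removing the OPA header, padding, ICRC and Tail, and using the
L4 header to select a VNIC port) can be sketched in user-space C as
follows. This is not the HFI1 driver code. The sketch assumes that each
quad word is serialized least significant byte first and treats the
whole 16-bit L4 header as the virtual Ethernet switch id, which may not
match how the real driver interprets that field; vnic_rx_demux() and
struct vnic_port are hypothetical names.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch only: offsets assume the packet format table and
     * little-endian serialization of each quad word. Names are hypothetical. */
    #define VNIC_HDR_BYTES   20u  /* QW0, QW1, 2 reserved bytes, 2-byte L4 header */
    #define VNIC_ICRC_TAIL   5u   /* 4-byte ICRC + byte holding the Tail field */
    #define VNIC_L4_ETHERNET 0x78u

    struct vnic_port {            /* hypothetical per-VNIC receive state */
        uint16_t vesw_id;         /* virtual Ethernet switch of this port */
        int      in_use;
        unsigned rx_frames;
    };

    /* Returns the VNIC port that should receive the packet, or NULL if it is
     * not VNIC traffic or no local port belongs to its virtual Ethernet switch.
     * On success, *eth and *eth_len describe the de-encapsulated frame. */
    static struct vnic_port *vnic_rx_demux(struct vnic_port *ports, size_t nports,
                                           const uint8_t *pkt, size_t len,
                                           const uint8_t **eth, size_t *eth_len)
    {
        uint16_t vesw_id;
        size_t pad;
        size_t i;

        if (len < VNIC_HDR_BYTES + VNIC_ICRC_TAIL || len % 8 != 0)
            return NULL;
        if (pkt[8] != VNIC_L4_ETHERNET)        /* QW1 bits 0-7: L4 type */
            return NULL;

        /* QW2 bits 16-31 (bytes 18-19): L4 header, read here as the switch id */
        vesw_id = (uint16_t)(pkt[18] | (pkt[19] << 8));

        /* Tail: bits 56-61 of the last QW = low 6 bits of the last byte */
        pad = pkt[len - 1] & 0x3Fu;
        if (pad > len - VNIC_HDR_BYTES - VNIC_ICRC_TAIL)
            return NULL;

        for (i = 0; i < nports; i++) {
            if (ports[i].in_use && ports[i].vesw_id == vesw_id) {
                /* Drop the OPA header, padding, ICRC and Tail */
                *eth = pkt + VNIC_HDR_BYTES;
                *eth_len = len - VNIC_HDR_BYTES - VNIC_ICRC_TAIL - pad;
                ports[i].rx_frames++;
                return &ports[i];
            }
        }
        return NULL;
    }

A linear scan keeps the sketch short; with many VNICs per node,
indexing the ports by switch id would avoid the scan.
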