.. SPDX-License-Identifier: GPL-2.0

===================================
Linux Ethernet Bonding Driver HOWTO
===================================

Latest update: 27 April 2011

Initial release: Thomas Davis <tadavis at lbl.gov>

Corrections, HA extensions: 2000/10/03-15:

  - Willy Tarreau <willy at meta-x.org>
  - Constantine Gavrilov <const-g at xpert.com>
  - Chad N. Tindel <ctindel at ieee dot org>
  - Janice Girouard <girouard at us dot ibm dot com>
  - Jay Vosburgh <fubar at us dot ibm dot com>

Reorganized and updated Feb 2005 by Jay Vosburgh
Added Sysfs information: 2006/04/24

  - Mitch Williams <mitch.a.williams at intel.com>

Introduction
============

The Linux bonding driver provides a method for aggregating
multiple network interfaces into a single logical "bonded" interface.
The behavior of the bonded interfaces depends upon the mode; generally
speaking, modes provide either hot standby or load balancing services.
Additionally, link integrity monitoring may be performed.

The bonding driver originally came from Donald Becker's
beowulf patches for kernel 2.0. It has changed quite a bit since, and
the original tools from extreme-linux and beowulf sites will not work
with this version of the driver.

For new versions of the driver, updated userspace tools, and
who to ask for help, please follow the links at the end of this file.

.. Table of Contents

   1. Bonding Driver Installation

   2. Bonding Driver Options

   3. Configuring Bonding Devices
   3.1 Configuration with Sysconfig Support
   3.1.1 Using DHCP with Sysconfig
   3.1.2 Configuring Multiple Bonds with Sysconfig
   3.2 Configuration with Initscripts Support
   3.2.1 Using DHCP with Initscripts
   3.2.2 Configuring Multiple Bonds with Initscripts
   3.3 Configuring Bonding Manually with Ifenslave
   3.3.1 Configuring Multiple Bonds Manually
   3.4 Configuring Bonding Manually via Sysfs
   3.5 Configuration with Interfaces Support
   3.6 Overriding Configuration for Special Cases
   3.7 Configuring LACP for 802.3ad mode in a more secure way

   4. Querying Bonding Configuration
   4.1 Bonding Configuration
   4.2 Network Configuration

   5. Switch Configuration

   6. 802.1q VLAN Support

   7. Link Monitoring
   7.1 ARP Monitor Operation
   7.2 Configuring Multiple ARP Targets
   7.3 MII Monitor Operation

   8. Potential Trouble Sources
   8.1 Adventures in Routing
   8.2 Ethernet Device Renaming
   8.3 Painfully Slow Or No Failed Link Detection By Miimon

   9. SNMP agents

   10. Promiscuous mode

   11. Configuring Bonding for High Availability
   11.1 High Availability in a Single Switch Topology
   11.2 High Availability in a Multiple Switch Topology
   11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
   11.2.2 HA Link Monitoring for Multiple Switch Topology

   12. Configuring Bonding for Maximum Throughput
   12.1 Maximum Throughput in a Single Switch Topology
   12.1.1 MT Bonding Mode Selection for Single Switch Topology
   12.1.2 MT Link Monitoring for Single Switch Topology
   12.2 Maximum Throughput in a Multiple Switch Topology
   12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
   12.2.2 MT Link Monitoring for Multiple Switch Topology

   13. Switch Behavior Issues
   13.1 Link Establishment and Failover Delays
   13.2 Duplicated Incoming Packets

   14. Hardware Specific Considerations
   14.1 IBM BladeCenter

   15. Frequently Asked Questions

   16. Resources and Links


1. Bonding Driver Installation
==============================

Most popular distro kernels ship with the bonding driver
already available as a module. If your distro does not, or you
need to compile bonding from source (e.g., when configuring and
installing a mainline kernel from kernel.org), you'll need to perform
the following steps:

1.1 Configure and build the kernel with bonding
-----------------------------------------------

The current version of the bonding driver is available in the
drivers/net/bonding subdirectory of the most recent kernel source
(which is available on http://kernel.org). Most users "rolling their
own" will want to use the most recent kernel from kernel.org.

Configure the kernel with "make menuconfig" (or "make xconfig" or
"make config"), then select "Bonding driver support" in the "Network
device support" section. It is recommended that you configure the
driver as a module since it is currently the only way to pass parameters
to the driver or configure more than one bonding device.

Build and install the new kernel and modules.

1.2 Bonding Control Utility
---------------------------

It is recommended to configure bonding via iproute2 (netlink)
or sysfs; the old ifenslave control utility is obsolete.

2. Bonding Driver Options
=========================

Options for the bonding driver are supplied as parameters to the
bonding module at load time, or are specified via sysfs.

Module options may be given as command line arguments to the
insmod or modprobe command, but are usually specified in either the
``/etc/modprobe.d/*.conf`` configuration files, or in a distro-specific
configuration file (some of which are detailed in the next section).

Details on bonding support for sysfs are provided in the
"Configuring Bonding Manually via Sysfs" section, below.

The available bonding driver parameters are listed below. If a
parameter is not specified, the default value is used. When initially
configuring a bond, it is recommended that "tail -f /var/log/messages" be
run in a separate window to watch for bonding driver error messages.

It is critical that either the miimon or the arp_interval and
arp_ip_target parameters be specified; otherwise serious network
degradation will occur during link failures. Very few devices do not
support at least miimon, so there is really no reason not to use it.

Options with textual values will accept either the text name
or, for backwards compatibility, the option value. E.g.,
"mode=802.3ad" and "mode=4" set the same mode.
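
The two rules stated above (textual and numeric option values are
interchangeable, and some form of link monitoring should always be
configured) can be sketched as a small pre-flight check. This is only an
illustration: the helper names ``normalize_mode`` and
``has_link_monitoring`` are hypothetical and are not part of the driver
or of any bonding tool, and the option set is assumed to be a plain
dictionary of strings.

```python
# Hypothetical helpers for sanity-checking a set of bonding options.
# The name/number pairs are the ones this document lists for "mode".
MODE_NUMBERS = {
    "balance-rr": 0,
    "active-backup": 1,
    "balance-xor": 2,
    "broadcast": 3,
    "802.3ad": 4,
    "balance-tlb": 5,
    "balance-alb": 6,
}
_NAMES_BY_NUMBER = {num: name for name, num in MODE_NUMBERS.items()}


def normalize_mode(value):
    """Return the textual mode name for either spelling ("4" -> "802.3ad")."""
    text = str(value).strip()
    if text in MODE_NUMBERS:
        return text
    if text.isdigit() and int(text) in _NAMES_BY_NUMBER:
        return _NAMES_BY_NUMBER[int(text)]
    raise ValueError(f"unknown bonding mode: {value!r}")


def has_link_monitoring(options):
    """True if miimon, or arp_interval together with arp_ip_target, is set."""
    miimon = int(options.get("miimon", 0))
    arp_interval = int(options.get("arp_interval", 0))
    arp_targets = options.get("arp_ip_target", "")
    return miimon > 0 or (arp_interval > 0 and bool(arp_targets))
```

For example, ``normalize_mode("4")`` and ``normalize_mode("802.3ad")``
both yield ``"802.3ad"``, while ``has_link_monitoring({"mode": "1"})``
is false because neither monitor is configured.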

The parameters are as follows:

active_slave

        Specifies the new active slave for modes that support it
        (active-backup, balance-alb and balance-tlb). Possible values
        are the name of any currently enslaved interface, or an empty
        string. If a name is given, the slave and its link must be up in order
        to be selected as the new active slave. If an empty string is
        specified, the current active slave is cleared, and a new active
        slave is selected automatically.

        Note that this is only available through the sysfs interface. No module
        parameter by this name exists.

        The normal value of this option is the name of the currently
        active slave, or the empty string if there is no active slave or
        the current mode does not use an active slave.

ad_actor_sys_prio

        In an AD system, this specifies the system priority. The allowed range
        is 1 - 65535. If the value is not specified, it takes 65535 as the
        default value.

        This parameter has effect only in 802.3ad mode and is available through
        the sysfs interface.

ad_actor_system

        In an AD system, this specifies the MAC address for the actor in
        protocol packet exchanges (LACPDUs). The value cannot be a multicast
        address.
        If the all-zeroes MAC is specified, bonding will internally
        use the MAC of the bond itself. It is preferred to have the
        local-admin bit set for this MAC, but the driver does not enforce it. If
        the value is not given, the system defaults to using the master's
        MAC address as the actor's system address.

        This parameter has effect only in 802.3ad mode and is available through
        the sysfs interface.

ad_select

        Specifies the 802.3ad aggregation selection logic to use. The
        possible values and their effects are:

        stable or 0

                The active aggregator is chosen by largest aggregate
                bandwidth.

                Reselection of the active aggregator occurs only when all
                slaves of the active aggregator are down or the active
                aggregator has no slaves.

                This is the default value.

        bandwidth or 1

                The active aggregator is chosen by largest aggregate
                bandwidth. Reselection occurs if:

                - A slave is added to or removed from the bond

                - Any slave's link state changes

                - Any slave's 802.3ad association state changes

                - The bond's administrative state changes to up

        count or 2

                The active aggregator is chosen by the largest number of
                ports (slaves).
                Reselection occurs as described under the
                "bandwidth" setting, above.

        The bandwidth and count selection policies permit failover of
        802.3ad aggregations when partial failure of the active aggregator
        occurs. This keeps the aggregator with the highest availability
        (either in bandwidth or in number of ports) active at all times.

        This option was added in bonding version 3.4.0.

ad_user_port_key

        In an AD system, the port-key has three parts as shown below:

           =====  ============
           Bits   Use
           =====  ============
           00     Duplex
           01-05  Speed
           06-15  User-defined
           =====  ============

        This defines the upper 10 bits of the port key. The values can be
        from 0 - 1023. If not given, the system defaults to 0.

        This parameter has effect only in 802.3ad mode and is available through
        the sysfs interface.

all_slaves_active

        Specifies that duplicate frames (received on inactive ports) should be
        dropped (0) or delivered (1).

        Normally, bonding will drop duplicate frames (received on inactive
        ports), which is desirable for most users. However, it is sometimes
        useful to allow duplicate frames to be delivered.

        The default value is 0 (drop duplicate frames received on inactive
        ports).
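
The port-key bit layout given under ad_user_port_key above can be
illustrated with a few lines of bit arithmetic. This is a sketch of the
table only, not the driver's code: the function names are hypothetical,
and the 5-bit speed field is treated as an opaque number because its
encoding is not described here.

```python
def compose_port_key(user_key, speed, duplex):
    """Pack the three fields into a 16-bit port key.

    Bit 0 is duplex, bits 1-5 are speed, bits 6-15 are the
    user-defined part (the value set via ad_user_port_key).
    """
    if not 0 <= user_key <= 1023:
        raise ValueError("user-defined part must be in 0..1023 (10 bits)")
    if not 0 <= speed <= 0x1F:
        raise ValueError("speed field must fit in 5 bits")
    if duplex not in (0, 1):
        raise ValueError("duplex field is a single bit")
    return (user_key << 6) | (speed << 1) | duplex


def split_port_key(key):
    """Unpack a 16-bit port key into its three fields."""
    return {
        "user": (key >> 6) & 0x3FF,
        "speed": (key >> 1) & 0x1F,
        "duplex": key & 0x1,
    }
```

With the largest user-defined value, ``compose_port_key(1023, 0, 0)``
gives ``0xFFC0``, which shows that the option occupies exactly the upper
10 bits of the key.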

arp_interval

        Specifies the ARP link monitoring frequency in milliseconds.

        The ARP monitor works by periodically checking the slave
        devices to determine whether they have sent or received
        traffic recently (the precise criteria depend upon the
        bonding mode, and the state of the slave). Regular traffic is
        generated via ARP probes issued for the addresses specified by
        the arp_ip_target option.

        This behavior can be modified by the arp_validate option,
        below.

        If ARP monitoring is used in an etherchannel compatible mode
        (modes 0 and 2), the switch should be configured in a mode
        that evenly distributes packets across all links. If the
        switch is configured to distribute the packets in an XOR
        fashion, all replies from the ARP targets will be received on
        the same link which could cause the other team members to
        fail. ARP monitoring should not be used in conjunction with
        miimon. A value of 0 disables ARP monitoring. The default
        value is 0.

arp_ip_target

        Specifies the IP addresses to use as ARP monitoring peers when
        arp_interval is > 0. These are the targets of the ARP request
        sent to determine the health of the link to the targets.
        Specify these values in ddd.ddd.ddd.ddd format. Multiple IP
        addresses must be separated by a comma.
        At least one IP
        address must be given for ARP monitoring to function. The
        maximum number of targets that can be specified is 16. The
        default value is no IP addresses.

arp_validate

        Specifies whether or not ARP probes and replies should be
        validated in any mode that supports ARP monitoring, or whether
        non-ARP traffic should be filtered (disregarded) for link
        monitoring purposes.

        Possible values are:

        none or 0

                No validation or filtering is performed.

        active or 1

                Validation is performed only for the active slave.

        backup or 2

                Validation is performed only for backup slaves.

        all or 3

                Validation is performed for all slaves.

        filter or 4

                Filtering is applied to all slaves. No validation is
                performed.

        filter_active or 5

                Filtering is applied to all slaves; validation is performed
                only for the active slave.

        filter_backup or 6

                Filtering is applied to all slaves; validation is performed
                only for backup slaves.

        Validation:

                Enabling validation causes the ARP monitor to examine the incoming
                ARP requests and replies, and only consider a slave to be up if it
                is receiving the appropriate ARP traffic.

                For an active slave, the validation checks ARP replies to confirm
                that they were generated by an arp_ip_target. Since backup slaves
                do not typically receive these replies, the validation performed
                for backup slaves is on the broadcast ARP request sent out via the
                active slave. It is possible that some switch or network
                configurations may result in situations wherein the backup slaves
                do not receive the ARP requests; in such a situation, validation
                of backup slaves must be disabled.

                The validation of ARP requests on backup slaves mainly helps
                bonding decide which slaves are more likely to work if the
                active slave fails; it does not guarantee that the
                backup slave will work if it is selected as the next active slave.

                Validation is useful in network configurations in which multiple
                bonding hosts are concurrently issuing ARPs to one or more targets
                beyond a common switch. Should the link between the switch and
                target fail (but not the switch itself), the probe traffic
                generated by the multiple bonding instances will fool the standard
                ARP monitor into considering the links as still up.
                Use of
                validation can resolve this, as the ARP monitor will only consider
                ARP requests and replies associated with its own instance of
                bonding.

        Filtering:

                Enabling filtering causes the ARP monitor to only use incoming ARP
                packets for link availability purposes. Arriving packets that are
                not ARPs are delivered normally, but do not count when determining
                if a slave is available.

                Filtering operates by only considering the reception of ARP
                packets (any ARP packet, regardless of source or destination) when
                determining if a slave has received traffic for link availability
                purposes.

                Filtering is useful in network configurations in which significant
                levels of third party broadcast traffic would fool the standard
                ARP monitor into considering the links as still up. Use of
                filtering can resolve this, as only ARP traffic is considered for
                link availability purposes.

        This option was added in bonding version 3.1.0.

arp_all_targets

        Specifies the quantity of arp_ip_targets that must be reachable
        in order for the ARP monitor to consider a slave as being up.
        This option affects only active-backup mode for slaves with
        arp_validate enabled.

        Possible values are:

        any or 0

                Consider the slave up when at least one of the
                arp_ip_targets is reachable.

        all or 1

                Consider the slave up only when all of the arp_ip_targets
                are reachable.

downdelay

        Specifies the time, in milliseconds, to wait before disabling
        a slave after a link failure has been detected. This option
        is only valid for the miimon link monitor. The downdelay
        value should be a multiple of the miimon value; if not, it
        will be rounded down to the nearest multiple. The default
        value is 0.

fail_over_mac

        Specifies whether active-backup mode should set all slaves to
        the same MAC address at enslavement (the traditional
        behavior), or, when enabled, perform special handling of the
        bond's MAC address in accordance with the selected policy.

        Possible values are:

        none or 0

                This setting disables fail_over_mac, and causes
                bonding to set all slaves of an active-backup bond to
                the same MAC address at enslavement time. This is the
                default.

        active or 1

                The "active" fail_over_mac policy indicates that the
                MAC address of the bond should always be the MAC
                address of the currently active slave. The MAC
                address of the slaves is not changed; instead, the MAC
                address of the bond changes during a failover.

                This policy is useful for devices that cannot ever
                alter their MAC address, or for devices that refuse
                incoming broadcasts with their own source MAC (which
                interferes with the ARP monitor).

                The downside of this policy is that every device on
                the network must be updated via gratuitous ARP,
                vs. just updating a switch or set of switches (which
                often takes place for any traffic, not just ARP
                traffic, if the switch snoops incoming traffic to
                update its tables) for the traditional method. If the
                gratuitous ARP is lost, communication may be
                disrupted.

                When this policy is used in conjunction with the MII
                monitor, devices which assert link up prior to being
                able to actually transmit and receive are particularly
                susceptible to loss of the gratuitous ARP, and an
                appropriate updelay setting may be required.

        follow or 2

                The "follow" fail_over_mac policy causes the MAC
                address of the bond to be selected normally (normally
                the MAC address of the first slave added to the bond).
                However, the second and subsequent slaves are not set
                to this MAC address while they are in a backup role; a
                slave is programmed with the bond's MAC address at
                failover time (and the formerly active slave receives
                the newly active slave's MAC address).

                This policy is useful for multiport devices that
                either become confused or incur a performance penalty
                when multiple ports are programmed with the same MAC
                address.


        The default policy is none, unless the first slave cannot
        change its MAC address, in which case the active policy is
        selected by default.

        This option may be modified via sysfs only when no slaves are
        present in the bond.

        This option was added in bonding version 3.2.0. The "follow"
        policy was added in bonding version 3.3.0.

lacp_rate

        Option specifying the rate at which the link partner is asked
        to transmit LACPDU packets in 802.3ad mode. Possible values
        are:

        slow or 0
                Request partner to transmit LACPDUs every 30 seconds

        fast or 1
                Request partner to transmit LACPDUs every 1 second

        The default is slow.
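
The three fail_over_mac policies described above differ only in which
MAC addresses move at enslavement and at failover. The toy model below
is a sketch of that description, not driver code: the function names are
hypothetical, and it assumes that the first enslaved interface is the
initially active slave and supplies the bond's MAC address.

```python
def enslave_all(policy, hw_macs):
    """Return (bond_mac, slave_macs) right after all slaves are added.

    hw_macs maps slave name -> its own MAC, in enslavement order.
    Assumption: the first slave is the initially active one.
    """
    names = list(hw_macs)
    bond_mac = hw_macs[names[0]]
    if policy == "none":
        # Traditional behavior: every slave is set to the bond's MAC.
        slave_macs = {name: bond_mac for name in names}
    elif policy in ("active", "follow"):
        # Slaves keep their own MACs while in a backup role.
        slave_macs = dict(hw_macs)
    else:
        raise ValueError(f"unknown fail_over_mac policy: {policy!r}")
    return bond_mac, slave_macs


def fail_over(policy, bond_mac, slave_macs, old_active, new_active):
    """Return (bond_mac, slave_macs) after the active slave changes."""
    macs = dict(slave_macs)
    if policy == "none":
        pass  # all slaves already carry the bond's MAC
    elif policy == "active":
        # The bond takes the MAC of the newly active slave.
        bond_mac = macs[new_active]
    elif policy == "follow":
        # The new active slave gets the bond's MAC; the formerly
        # active slave receives the newly active slave's old MAC.
        macs[old_active], macs[new_active] = macs[new_active], bond_mac
    else:
        raise ValueError(f"unknown fail_over_mac policy: {policy!r}")
    return bond_mac, macs
```

Under "active" the address seen by the rest of the network changes at
failover, which is why that policy depends on gratuitous ARP reaching
every peer; under "none" and "follow" the bond's address stays put.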

max_bonds

        Specifies the number of bonding devices to create for this
        instance of the bonding driver. E.g., if max_bonds is 3, and
        the bonding driver is not already loaded, then bond0, bond1
        and bond2 will be created. The default value is 1. Specifying
        a value of 0 will load bonding, but will not create any devices.

miimon

        Specifies the MII link monitoring frequency in milliseconds.
        This determines how often the link state of each slave is
        inspected for link failures. A value of zero disables MII
        link monitoring. A value of 100 is a good starting point.
        The use_carrier option, below, affects how the link state is
        determined. See the High Availability section for additional
        information. The default value is 0.

min_links

        Specifies the minimum number of links that must be active before
        asserting carrier. It is similar to the Cisco EtherChannel min-links
        feature. This allows setting the minimum number of member ports that
        must be up (link-up state) before marking the bond device as up
        (carrier on). This is useful for situations where higher level services
        such as clustering want to ensure a minimum number of low bandwidth
        links are active before switchover. This option only affects 802.3ad
        mode.

        The default value is 0.
        This will cause carrier to be asserted (for
        802.3ad mode) whenever there is an active aggregator, regardless of the
        number of available links in that aggregator. Note that, because an
        aggregator cannot be active without at least one available link,
        setting this option to 0 or to 1 has the exact same effect.

mode

        Specifies one of the bonding policies. The default is
        balance-rr (round robin). Possible values are:

        balance-rr or 0

                Round-robin policy: Transmit packets in sequential
                order from the first available slave through the
                last. This mode provides load balancing and fault
                tolerance.

        active-backup or 1

                Active-backup policy: Only one slave in the bond is
                active. A different slave becomes active if, and only
                if, the active slave fails. The bond's MAC address is
                externally visible on only one port (network adapter)
                to avoid confusing the switch.

                In bonding version 2.6.2 or later, when a failover
                occurs in active-backup mode, bonding will issue one
                or more gratuitous ARPs on the newly active slave.
                One gratuitous ARP is issued for the bonding master
                interface and each VLAN interface configured above
                it, provided that the interface has at least one IP
                address configured. Gratuitous ARPs issued for VLAN
                interfaces are tagged with the appropriate VLAN id.

                This mode provides fault tolerance. The primary
                option, documented below, affects the behavior of this
                mode.

        balance-xor or 2

                XOR policy: Transmit based on the selected transmit
                hash policy. The default policy is a simple [(source
                MAC address XOR'd with destination MAC address XOR
                packet type ID) modulo slave count]. Alternate transmit
                policies may be selected via the xmit_hash_policy option,
                described below.

                This mode provides load balancing and fault tolerance.

        broadcast or 3

                Broadcast policy: transmits everything on all slave
                interfaces. This mode provides fault tolerance.

        802.3ad or 4

                IEEE 802.3ad Dynamic link aggregation. Creates
                aggregation groups that share the same speed and
                duplex settings. Utilizes all slaves in the active
                aggregator according to the 802.3ad specification.

                Slave selection for outgoing traffic is done according
                to the transmit hash policy, which may be changed from
                the default simple XOR policy via the xmit_hash_policy
                option, documented below. Note that not all transmit
                policies may be 802.3ad compliant, particularly with
                regard to the packet mis-ordering requirements of
                section 43.2.4 of the 802.3ad standard.
                Differing peer implementations will have varying
                tolerances for noncompliance.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving
                   the speed and duplex of each slave.

                2. A switch that supports IEEE 802.3ad Dynamic link
                   aggregation.

                Most switches will require some type of configuration
                to enable 802.3ad mode.

        balance-tlb or 5

                Adaptive transmit load balancing: channel bonding that
                does not require any special switch support.

                In tlb_dynamic_lb=1 mode, the outgoing traffic is
                distributed according to the current load (computed
                relative to the speed) on each slave.

                In tlb_dynamic_lb=0 mode, load balancing based on
                current load is disabled and the load is distributed
                using only the hash distribution.

                Incoming traffic is received by the current slave.
                If the receiving slave fails, another slave takes over
                the MAC address of the failed receiving slave.

                Prerequisite:

                Ethtool support in the base drivers for retrieving the
                speed of each slave.

        balance-alb or 6

                Adaptive load balancing: includes balance-tlb plus
                receive load balancing (rlb) for IPv4 traffic, and
                does not require any special switch support. The
                receive load balancing is achieved by ARP negotiation.
                The bonding driver intercepts the ARP Replies sent by
                the local system on their way out and overwrites the
                source hardware address with the unique hardware
                address of one of the slaves in the bond such that
                different peers use different hardware addresses for
                the server.

                Receive traffic from connections created by the server
                is also balanced. When the local system sends an ARP
                Request the bonding driver copies and saves the peer's
                IP information from the ARP packet. When the ARP
                Reply arrives from the peer, its hardware address is
                retrieved and the bonding driver initiates an ARP
                reply to this peer assigning it to one of the slaves
                in the bond. A problematic outcome of using ARP
                negotiation for balancing is that each time that an
                ARP request is broadcast it uses the hardware address
                of the bond. Hence, peers learn the hardware address
                of the bond and the balancing of receive traffic
                collapses to the current slave. This is handled by
                sending updates (ARP Replies) to all the peers with
                their individually assigned hardware address such that
                the traffic is redistributed.
                Receive traffic is also
                redistributed when a new slave is added to the bond
                and when an inactive slave is re-activated. The
                receive load is distributed sequentially (round robin)
                among the group of highest speed slaves in the bond.

                When a link is reconnected or a new slave joins the
                bond the receive traffic is redistributed among all
                active slaves in the bond by initiating ARP Replies
                with the selected MAC address to each of the
                clients. The updelay parameter (detailed below) must
                be set to a value equal to or greater than the switch's
                forwarding delay so that the ARP Replies sent to the
                peers will not be blocked by the switch.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving
                   the speed of each slave.

                2. Base driver support for setting the hardware
                   address of a device while it is open. This is
                   required so that there will always be one slave in the
                   team using the bond hardware address (the
                   curr_active_slave) while having a unique hardware
                   address for each slave in the bond. If the
                   curr_active_slave fails its hardware address is
                   swapped with the new curr_active_slave that was
                   chosen.
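
        The default XOR policy described under balance-xor above can be
        sketched in Python as follows. This is only an illustration of
        the documented formula, not the driver's code: the function and
        parameter names are hypothetical, and treating each MAC address
        as a whole 48-bit integer is an assumption about how the formula
        is to be read::

            def mac_to_int(mac: str) -> int:
                """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
                return int(mac.replace(":", ""), 16)


            def xor_policy_slave(src_mac: str, dst_mac: str,
                                 packet_type: int, slave_count: int) -> int:
                """Return a slave index using the documented formula:
                (source MAC XOR destination MAC XOR packet type ID)
                modulo slave count."""
                if slave_count < 1:
                    raise ValueError("slave_count must be at least 1")
                value = mac_to_int(src_mac) ^ mac_to_int(dst_mac) ^ packet_type
                return value % slave_count


            # Hypothetical addresses; 0x0800 is the IPv4 packet type ID.
            index = xor_policy_slave("00:11:22:33:44:55", "66:77:88:99:aa:bb",
                                     0x0800, 2)
            print(index)

        Because XOR is symmetric, swapping the source and destination
        addresses selects the same index in this sketch, and all traffic
        between one pair of addresses with one packet type stays on a
        single slave.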

num_grat_arp,
num_unsol_na

        Specify the number of peer notifications (gratuitous ARPs and
        unsolicited IPv6 Neighbor Advertisements) to be issued after a
        failover event. As soon as the link is up on the new slave
        (possibly immediately) a peer notification is sent on the
        bonding device and each VLAN sub-device. This is repeated at
        the rate specified by peer_notif_delay if the number is
        greater than 1.

        The valid range is 0 - 255; the default value is 1. These options
        affect only the active-backup mode. These options were added for
        bonding versions 3.3.0 and 3.4.0 respectively.

        From Linux 3.0 and bonding version 3.7.1, these notifications
        are generated by the ipv4 and ipv6 code and the numbers of
        repetitions cannot be set independently.

packets_per_slave

        Specify the number of packets to transmit through a slave before
        moving to the next one. When set to 0, a slave is chosen at
        random.

        The valid range is 0 - 65535; the default value is 1. This option
        has effect only in balance-rr mode.

peer_notif_delay

        Specify the delay, in milliseconds, between each peer
        notification (gratuitous ARP and unsolicited IPv6 Neighbor
        Advertisement) when they are issued after a failover event.
        This delay should be a multiple of the link monitor interval
        (arp_interval or miimon, whichever is active). The default
        value is 0, which means to match the value of the link monitor
        interval.

primary

        A string (eth0, eth2, etc.) specifying which slave is the
        primary device. The specified device will always be the
        active slave while it is available. Only when the primary is
        off-line will alternate devices be used. This is useful when
        one slave is preferred over another, e.g., when one slave has
        higher throughput than another.

        The primary option is only valid for the active-backup (1),
        balance-tlb (5) and balance-alb (6) modes.

primary_reselect

        Specifies the reselection policy for the primary slave. This
        affects how the primary slave is chosen to become the active slave
        when failure of the active slave or recovery of the primary slave
        occurs. This option is designed to prevent flip-flopping between
        the primary slave and other slaves. Possible values are:

        always or 0 (default)

                The primary slave becomes the active slave whenever it
                comes back up.

        better or 1

                The primary slave becomes the active slave when it comes
                back up, if the speed and duplex of the primary slave are
                better than the speed and duplex of the current active
                slave.

        failure or 2

                The primary slave becomes the active slave only if the
                current active slave fails and the primary slave is up.

        The primary_reselect setting is ignored in two cases:

                If no slaves are active, the first slave to recover is
                made the active slave.

                When initially enslaved, the primary slave is always made
                the active slave.

        Changing the primary_reselect policy via sysfs will cause an
        immediate selection of the best active slave according to the new
        policy. This may or may not result in a change of the active
        slave, depending upon the circumstances.

        This option was added for bonding version 3.6.0.

tlb_dynamic_lb

        Specifies whether dynamic shuffling of flows is enabled in tlb
        mode. The value has no effect on any other modes.

        The default behavior of tlb mode is to shuffle active flows across
        slaves based on the load in that interval. This gives good
        load-balancing characteristics but can cause packet reordering.
        If re-ordering is
        a concern, use this variable to disable flow shuffling and rely on
        load balancing provided solely by the hash distribution.
        xmit_hash_policy can be used to select the appropriate hashing for
        the setup.

        The sysfs entry can be used to change the setting per bond device
        and the initial value is derived from the module parameter. The
        sysfs entry is allowed to be changed only if the bond device is
        down.

        The default value is 1, which enables flow shuffling; a value of 0
        disables it. This option was added in bonding driver 3.7.1.

updelay

        Specifies the time, in milliseconds, to wait before enabling a
        slave after a link recovery has been detected. This option is
        only valid for the miimon link monitor. The updelay value
        should be a multiple of the miimon value; if not, it will be
        rounded down to the nearest multiple. The default value is 0.

use_carrier

        Specifies whether or not miimon should use MII or ETHTOOL
        ioctls vs. netif_carrier_ok() to determine the link
        status. The MII or ETHTOOL ioctls are less efficient and
        utilize a deprecated calling sequence within the kernel. The
        netif_carrier_ok() relies on the device driver to maintain its
        state with netif_carrier_on/off; at this writing, most, but
        not all, device drivers support this facility.

        If bonding insists that the link is up when it should not be,
        it may be that your network device driver does not support
        netif_carrier_on/off. The default state for netif_carrier is
        "carrier on," so if a driver does not support netif_carrier,
        it will appear as if the link is always up. In this case,
        setting use_carrier to 0 will cause bonding to revert to the
        MII / ETHTOOL ioctl method to determine the link state.

        A value of 1 enables the use of netif_carrier_ok(); a value of
        0 uses the deprecated MII / ETHTOOL ioctls. The default
        value is 1.

xmit_hash_policy

        Selects the transmit hash policy to use for slave selection in
        balance-xor, 802.3ad, and tlb modes. Possible values are:

        layer2

                Uses XOR of hardware MAC addresses and packet type ID
                field to generate the hash. The formula is

                hash = source MAC XOR destination MAC XOR packet type ID
                slave number = hash modulo slave count

                This algorithm will place all traffic to a particular
                network peer on the same slave.

                This algorithm is 802.3ad compliant.

        layer2+3

                This policy uses a combination of layer2 and layer3
                protocol information to generate the hash.

                Uses XOR of hardware MAC addresses and IP addresses to
                generate the hash. The formula is

                hash = source MAC XOR destination MAC XOR packet type ID
                hash = hash XOR source IP XOR destination IP
                hash = hash XOR (hash RSHIFT 16)
                hash = hash XOR (hash RSHIFT 8)
                And then hash is reduced modulo slave count.

                If the protocol is IPv6 then the source and destination
                addresses are first hashed using ipv6_addr_hash.

                This algorithm will place all traffic to a particular
                network peer on the same slave. For non-IP traffic,
                the formula is the same as for the layer2 transmit
                hash policy.

                This policy is intended to provide a more balanced
                distribution of traffic than layer2 alone, especially
                in environments where a layer3 gateway device is
                required to reach most destinations.

                This algorithm is 802.3ad compliant.

        layer3+4

                This policy uses upper layer protocol information,
                when available, to generate the hash. This allows for
                traffic to a particular network peer to span multiple
                slaves, although a single connection will not span
                multiple slaves.

                The formula for unfragmented TCP and UDP packets is

                hash = source port, destination port (as in the header)
                hash = hash XOR source IP XOR destination IP
                hash = hash XOR (hash RSHIFT 16)
                hash = hash XOR (hash RSHIFT 8)
                And then hash is reduced modulo slave count.

                If the protocol is IPv6 then the source and destination
                addresses are first hashed using ipv6_addr_hash.

                For fragmented TCP or UDP packets and all other IPv4 and
                IPv6 protocol traffic, the source and destination port
                information is omitted. For non-IP traffic, the
                formula is the same as for the layer2 transmit hash
                policy.

                This algorithm is not fully 802.3ad compliant. A
                single TCP or UDP conversation containing both
                fragmented and unfragmented packets will see packets
                striped across two interfaces. This may result in
                out-of-order delivery. Most traffic types will not meet
                these criteria, as TCP rarely fragments traffic, and
                most UDP traffic is not involved in extended
                conversations. Other implementations of 802.3ad may
                or may not tolerate this noncompliance.

        encap2+3

                This policy uses the same formula as layer2+3 but it
                relies on skb_flow_dissect to obtain the header fields
                which might result in the use of inner headers if an
                encapsulation protocol is used.
                For example, this will
                improve the performance for tunnel users because the
                packets will be distributed according to the encapsulated
                flows.

        encap3+4

                This policy uses the same formula as layer3+4 but it
                relies on skb_flow_dissect to obtain the header fields
                which might result in the use of inner headers if an
                encapsulation protocol is used. For example, this will
                improve the performance for tunnel users because the
                packets will be distributed according to the encapsulated
                flows.

        The default value is layer2. This option was added in bonding
        version 2.6.3. In earlier versions of bonding, this parameter
        does not exist, and the layer2 policy is the only policy. The
        layer2+3 value was added for bonding version 3.2.2.

resend_igmp

        Specifies the number of IGMP membership reports to be issued after
        a failover event. One membership report is issued immediately after
        the failover; subsequent reports are sent at 200 ms intervals.

        The valid range is 0 - 255; the default value is 1. A value of 0
        prevents the IGMP membership report from being issued in response
        to the failover event.

        This option is useful for bonding modes balance-rr (0), active-backup
        (1), balance-tlb (5) and balance-alb (6), in which a failover can
        switch the IGMP traffic from one slave to another.
        Therefore a fresh
        IGMP report must be issued to cause the switch to forward the incoming
        IGMP traffic over the newly selected slave.

        This option was added for bonding version 3.7.0.

lp_interval

        Specifies the number of seconds between instances where the bonding
        driver sends learning packets to each slave's peer switch.

        The valid range is 1 - 0x7fffffff; the default value is 1. This option
        has effect only in balance-tlb and balance-alb modes.

3. Configuring Bonding Devices
==============================

You can configure bonding using either your distro's network
initialization scripts, or manually using either iproute2 or the
sysfs interface. Distros generally use one of three packages for the
network initialization scripts: initscripts, sysconfig or interfaces.
Recent versions of these packages have support for bonding, while older
versions do not.

We will first describe the options for configuring bonding for
distros using versions of initscripts, sysconfig and interfaces with full
or partial support for bonding, then provide information on enabling
bonding without support from the network initialization scripts (i.e.,
older versions of initscripts or sysconfig).

If you're unsure whether your distro uses sysconfig,
initscripts or interfaces, or don't know if it's new enough, have no fear.
Determining this is fairly straightforward.

First, look for a file called interfaces in the /etc/network directory.
If this file is present on your system, then your system uses interfaces.
See section 3.5, Configuration with Interfaces Support.

Otherwise, issue the command::

        $ rpm -qf /sbin/ifup

It will respond with a line of text starting with either
"initscripts" or "sysconfig," followed by some numbers. This is the
package that provides your network initialization scripts.

Next, to determine if your installation supports bonding,
issue the command::

        $ grep ifenslave /sbin/ifup

If this returns any matches, then your initscripts or
sysconfig has support for bonding.

3.1 Configuration with Sysconfig Support
----------------------------------------

This section applies to distros using a version of sysconfig
with bonding support, for example, SuSE Linux Enterprise Server 9.

SuSE SLES 9's networking configuration system does support
bonding; however, at this writing, the YaST system configuration
front end does not provide any means to work with bonding devices.
Bonding devices can be managed by hand, however, as follows.

First, if they have not already been configured, configure the
slave devices.
On SLES 9, this is most easily done by running the
yast2 sysconfig configuration utility. The goal is to create an
ifcfg-id file for each slave device. The simplest way to accomplish
this is to configure the devices for DHCP (this is only to get the
ifcfg-id file created; see below for some issues with DHCP). The
name of the configuration file for each device will be of the form::

        ifcfg-id-xx:xx:xx:xx:xx:xx

Where the "xx" portion will be replaced with the digits from
the device's permanent MAC address.

Once the set of ifcfg-id-xx:xx:xx:xx:xx:xx files has been
created, it is necessary to edit the configuration files for the slave
devices (the MAC addresses correspond to those of the slave devices).
Before editing, the file will contain multiple lines, and will look
something like this::

        BOOTPROTO='dhcp'
        STARTMODE='on'
        USERCTL='no'
        UNIQUE='XNzu.WeZGOGF+4wE'
        _nm_name='bus-pci-0001:61:01.0'

Change the BOOTPROTO and STARTMODE lines to the following::

        BOOTPROTO='none'
        STARTMODE='off'

Do not alter the UNIQUE or _nm_name lines. Remove any other
lines (USERCTL, etc).

Once the ifcfg-id-xx:xx:xx:xx:xx:xx files have been modified,
it's time to create the configuration file for the bonding device
itself.
This file is named ifcfg-bondX, where X is the number of the
bonding device to create, starting at 0. The first such file is
ifcfg-bond0, the second is ifcfg-bond1, and so on. The sysconfig
network configuration system will correctly start multiple instances
of bonding.

The contents of the ifcfg-bondX file are as follows::

        BOOTPROTO="static"
        BROADCAST="10.0.2.255"
        IPADDR="10.0.2.10"
        NETMASK="255.255.255.0"
        NETWORK="10.0.2.0"
        REMOTE_IPADDR=""
        STARTMODE="onboot"
        BONDING_MASTER="yes"
        BONDING_MODULE_OPTS="mode=active-backup miimon=100"
        BONDING_SLAVE0="eth0"
        BONDING_SLAVE1="bus-pci-0000:06:08.1"

Replace the sample BROADCAST, IPADDR, NETMASK and NETWORK
values with the appropriate values for your network.

The STARTMODE specifies when the device is brought online.
The possible values are:

        ============= ======================================================
        onboot        The device is started at boot time. If you're not
                      sure, this is probably what you want.

        manual        The device is started only when ifup is called
                      manually. Bonding devices may be configured this
                      way if you do not wish them to start automatically
                      at boot for some reason.

        hotplug       The device is started by a hotplug event. This is not
                      a valid choice for a bonding device.

        off or ignore The device configuration is ignored.
        ============= ======================================================

The line BONDING_MASTER='yes' indicates that the device is a
bonding master device. The only useful value is "yes."

The contents of BONDING_MODULE_OPTS are supplied to the
instance of the bonding module for this device. Specify the options
for the bonding mode, link monitoring, and so on here. Do not include
the max_bonds bonding parameter; this will confuse the configuration
system if you have multiple bonding devices.

Finally, supply one BONDING_SLAVEn="slave device" for each
slave, where "n" is an increasing value, one for each slave. The
"slave device" is either an interface name, e.g., "eth0", or a device
specifier for the network device. The interface name is easier to
find, but the ethN names are subject to change at boot time if, e.g.,
a device early in the sequence has failed. The device specifiers
(bus-pci-0000:06:08.1 in the example above) specify the physical
network device, and will not change unless the device's bus location
changes (for example, it is moved from one PCI slot to another). The
example above uses one of each type for demonstration purposes; most
configurations will choose one or the other for all slave devices.
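
The ifcfg-bondX layout described above can also be produced by a small
script. The following Python sketch is an illustration only: the function
name render_ifcfg_bond and its parameters are hypothetical, and deriving
NETWORK and BROADCAST from IPADDR and NETMASK with Python's ipaddress
module is a choice made for this sketch, not something sysconfig does
itself::

    import ipaddress


    def render_ifcfg_bond(ipaddr, netmask, module_opts, slaves,
                          startmode="onboot"):
        """Return the text of an ifcfg-bondX file as a string."""
        if not slaves:
            raise ValueError("at least one slave device is required")
        # The text above says not to put max_bonds in BONDING_MODULE_OPTS.
        for opt in module_opts.split():
            if opt.split("=", 1)[0] == "max_bonds":
                raise ValueError("max_bonds must not be in module_opts")
        # Derive the network and broadcast addresses so they always
        # agree with the address and netmask.
        iface = ipaddress.ip_interface(f"{ipaddr}/{netmask}")
        net = iface.network
        lines = [
            'BOOTPROTO="static"',
            f'BROADCAST="{net.broadcast_address}"',
            f'IPADDR="{iface.ip}"',
            f'NETMASK="{net.netmask}"',
            f'NETWORK="{net.network_address}"',
            'REMOTE_IPADDR=""',
            f'STARTMODE="{startmode}"',
            'BONDING_MASTER="yes"',
            f'BONDING_MODULE_OPTS="{module_opts}"',
        ]
        # One BONDING_SLAVEn entry per slave, with n increasing from 0.
        lines += [f'BONDING_SLAVE{n}="{dev}"'
                  for n, dev in enumerate(slaves)]
        return "\n".join(lines) + "\n"


    # Example values only; substitute the values for your network.
    print(render_ifcfg_bond("10.0.2.10", "255.255.255.0",
                            "mode=active-backup miimon=100",
                            ["eth0", "bus-pci-0000:06:08.1"]))

The output would be written to a file such as ifcfg-bond0 by whatever
means suits your setup; the sketch deliberately leaves file handling out.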

When all configuration files have been modified or created,
networking must be restarted for the configuration changes to take
effect. This can be accomplished via the following::

    # /etc/init.d/network restart

Note that the network control script (/sbin/ifdown) will
remove the bonding module as part of the network shutdown processing,
so it is not necessary to remove the module by hand if, e.g., the
module parameters have changed.

Also, at this writing, YaST/YaST2 will not manage bonding
devices (they do not show bonding interfaces on their list of network
devices). It is necessary to edit the configuration file by hand to
change the bonding configuration.

Additional general options and details of the ifcfg file
format can be found in an example ifcfg template file::

    /etc/sysconfig/network/ifcfg.template

Note that the template does not document the various ``BONDING_*``
settings described above, but does describe many of the other options.

3.1.1 Using DHCP with Sysconfig
-------------------------------

Under sysconfig, configuring a device with BOOTPROTO='dhcp'
will cause it to query DHCP for its IP address information.
At this writing, this does not function for bonding devices; the
scripts attempt to obtain the device address from DHCP prior to adding
any of the slave devices. Without active slaves, the DHCP requests are
not sent to the network.

3.1.2 Configuring Multiple Bonds with Sysconfig
-----------------------------------------------

The sysconfig network initialization system is capable of
handling multiple bonding devices. All that is necessary is for each
bonding instance to have an appropriately configured ifcfg-bondX file
(as described above). Do not specify the "max_bonds" parameter to any
instance of bonding, as this will confuse sysconfig. If you require
multiple bonding devices with identical parameters, create multiple
ifcfg-bondX files.

Because the sysconfig scripts supply the bonding module
options in the ifcfg-bondX file, it is not necessary to add them to
the system ``/etc/modules.d/*.conf`` configuration files.

3.2 Configuration with Initscripts Support
------------------------------------------

This section applies to distros using a recent version of
initscripts with bonding support, for example, Red Hat Enterprise Linux
version 3 or later, Fedora, etc. On these systems, the network
initialization scripts have knowledge of bonding, and can be configured to
control bonding devices.
Note that older versions of the initscripts package have lower levels
of support for bonding; this will be noted where applicable.

These distros will not automatically load the network adapter
driver unless the ethX device is configured with an IP address.
Because of this constraint, users must manually configure a
network-script file for all physical adapters that will be members of
a bondX link. Network script files are located in the directory:

/etc/sysconfig/network-scripts

The file name must be prefixed with "ifcfg-eth" and suffixed
with the adapter's physical adapter number. For example, the script
for eth0 would be named /etc/sysconfig/network-scripts/ifcfg-eth0.
Place the following text in the file::

    DEVICE=eth0
    USERCTL=no
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none

The DEVICE= line will be different for every ethX device and
must correspond with the name of the file, i.e., ifcfg-eth1 must have
a device line of DEVICE=eth1. The setting of the MASTER= line will
also depend on the final bonding interface name chosen for your bond.
As with other network devices, these typically start at 0, and go up
one for each device, i.e., the first bonding instance is bond0, the
second is bond1, and so on.

Next, create a bond network script.
The file name for this script will be
/etc/sysconfig/network-scripts/ifcfg-bondX where X is the number of
the bond. For bond0 the file is named "ifcfg-bond0", for bond1 it is
named "ifcfg-bond1", and so on. Within that file, place the following
text::

    DEVICE=bond0
    IPADDR=192.168.1.1
    NETMASK=255.255.255.0
    NETWORK=192.168.1.0
    BROADCAST=192.168.1.255
    ONBOOT=yes
    BOOTPROTO=none
    USERCTL=no

Be sure to change the networking specific lines (IPADDR,
NETMASK, NETWORK and BROADCAST) to match your network configuration.

For later versions of initscripts, such as that found with Fedora
7 (or later) and Red Hat Enterprise Linux version 5 (or later), it is possible,
and, indeed, preferable, to specify the bonding options in the ifcfg-bond0
file, e.g. a line of the format::

    BONDING_OPTS="mode=active-backup arp_interval=60 arp_ip_target=192.168.1.254"

will configure the bond with the specified options. The options
specified in BONDING_OPTS are identical to the bonding module parameters
except for the arp_ip_target field when using versions of initscripts older
than 8.57 (Fedora 8) and 8.45.19 (Red Hat Enterprise Linux 5.2).
When using older versions, each target should be included as a separate
option and should be preceded by a '+' to indicate it should be added to
the list of queried targets, e.g.,::

    arp_ip_target=+192.168.1.1 arp_ip_target=+192.168.1.2

is the proper syntax to specify multiple targets. When specifying
options via BONDING_OPTS, it is not necessary to edit
``/etc/modprobe.d/*.conf``.

For even older versions of initscripts that do not support
BONDING_OPTS, it is necessary to edit ``/etc/modprobe.d/*.conf``
(depending upon your distro) to load the bonding module with your desired
options when the bond0 interface is brought up. The following lines in
``/etc/modprobe.d/*.conf`` will load the bonding module, and select its
options::

    alias bond0 bonding
    options bond0 mode=balance-alb miimon=100

Replace the sample parameters with the appropriate set of
options for your configuration.

Finally, run "/etc/rc.d/init.d/network restart" as root. This
will restart the networking subsystem and your bond link should now be
up and running.
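
For the older initscripts syntax described above, in which every ARP
target is its own ``arp_ip_target=+ADDR`` option, the option string can be
assembled mechanically. The sketch below is only an illustration; the
function name ``arp_targets_opts`` is hypothetical and is not provided by
initscripts::

    # Hypothetical helper, not part of initscripts: build one
    # "arp_ip_target=+ADDR" option per address given.
    # usage: arp_targets_opts ADDR...
    arp_targets_opts() {
        out=
        for addr in "$@"; do
            # Separate options with a single space, no leading space.
            out="${out:+$out }arp_ip_target=+$addr"
        done
        printf '%s\n' "$out"
    }

The result can then be placed inside BONDING_OPTS together with the mode
and arp_interval settings.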

3.2.1 Using DHCP with Initscripts
---------------------------------

Recent versions of initscripts (the versions supplied with Fedora
Core 3 and Red Hat Enterprise Linux 4, or later versions, are reported to
work) have support for assigning IP information to bonding devices via
DHCP.

To configure bonding for DHCP, configure it as described
above, except replace the line "BOOTPROTO=none" with "BOOTPROTO=dhcp"
and add a line consisting of "TYPE=Bonding". Note that the TYPE value
is case sensitive.

3.2.2 Configuring Multiple Bonds with Initscripts
-------------------------------------------------

Initscripts packages that are included with Fedora 7 and Red Hat
Enterprise Linux 5 support multiple bonding interfaces by simply
specifying the appropriate BONDING_OPTS= in ifcfg-bondX where X is the
number of the bond. This support requires sysfs support in the kernel,
and a bonding driver of version 3.0.0 or later. Other configurations may
not support this method for specifying multiple bonding interfaces; for
those instances, see the "Configuring Multiple Bonds Manually" section,
below.

3.3 Configuring Bonding Manually with iproute2
-----------------------------------------------

This section applies to distros whose network initialization
scripts (the sysconfig or initscripts package) do not have specific
knowledge of bonding. One such distro is SuSE Linux Enterprise Server
version 8.

The general method for these systems is to place the bonding
module parameters into a config file in /etc/modprobe.d/ (as
appropriate for the installed distro), then add modprobe and/or
``ip link`` commands to the system's global init script. The name of
the global init script differs; for sysconfig, it is
/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.

For example, if you wanted to make a simple bond of two e100
devices (presumed to be eth0 and eth1), and have it persist across
reboots, edit the appropriate file (/etc/init.d/boot.local or
/etc/rc.d/rc.local), and add the following::

    modprobe bonding mode=balance-alb miimon=100
    modprobe e100
    ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
    ip link set eth0 master bond0
    ip link set eth1 master bond0

Replace the example bonding module parameters and bond0
network configuration (IP address, netmask, etc) with the appropriate
values for your configuration.

Unfortunately, this method will not provide support for the
ifup and ifdown scripts on the bond devices. To reload the bonding
configuration, it is necessary to run the initialization script, e.g.,::

    # /etc/init.d/boot.local

or::

    # /etc/rc.d/rc.local

It may be desirable in such a case to create a separate script
which only initializes the bonding configuration, then call that
separate script from within boot.local. This allows for bonding to be
enabled without re-running the entire global init script.

To shut down the bonding devices, it is necessary to first
mark the bonding device itself as being down, then remove the
appropriate device driver modules. For our example above, you can do
the following::

    # ifconfig bond0 down
    # rmmod bonding
    # rmmod e100

Again, for convenience, it may be desirable to create a script
with these commands.


3.3.1 Configuring Multiple Bonds Manually
-----------------------------------------

This section contains information on configuring multiple
bonding devices with differing options for those systems whose network
initialization scripts lack support for configuring multiple bonds.

If you require multiple bonding devices, but all with the same
options, you may wish to use the "max_bonds" module parameter,
documented above.

To create multiple bonding devices with differing options, it is
preferable to use bonding parameters exported by sysfs, documented in the
section below.

For versions of bonding without sysfs support, the only means to
provide multiple instances of bonding with differing options is to load
the bonding driver multiple times. Note that current versions of the
sysconfig network initialization scripts handle this automatically; if
your distro uses these scripts, no special action is needed. See the
section Configuring Bonding Devices, above, if you're not sure about your
network initialization scripts.

To load multiple instances of the module, it is necessary to
specify a different name for each instance (the module loading system
requires that every loaded module, even multiple instances of the same
module, have a unique name). This is accomplished by supplying multiple
sets of bonding options in ``/etc/modprobe.d/*.conf``, for example::

    alias bond0 bonding
    options bond0 -o bond0 mode=balance-rr miimon=100

    alias bond1 bonding
    options bond1 -o bond1 mode=balance-alb miimon=50

will load the bonding module two times.
The first instance is named "bond0" and creates the bond0 device in
balance-rr mode with an miimon of 100. The second instance is named
"bond1" and creates the bond1 device in balance-alb mode with an miimon
of 50.

In some circumstances (typically with older distributions),
the above does not work, and the second bonding instance never sees
its options. In that case, the second options line can be substituted
as follows::

    install bond1 /sbin/modprobe --ignore-install bonding -o bond1 \
        mode=balance-alb miimon=50

This may be repeated any number of times, specifying a new and
unique name in place of bond1 for each subsequent instance.

It has been observed that some Red Hat supplied kernels are unable
to rename modules at load time (the "-o bond1" part). Attempts to pass
that option to modprobe will produce an "Operation not permitted" error.
This has been reported on some Fedora Core kernels, and has been seen on
RHEL 4 as well. On kernels exhibiting this problem, it will be impossible
to configure multiple bonds with differing parameters (as they are older
kernels, and also lack sysfs support).

3.4 Configuring Bonding Manually via Sysfs
------------------------------------------

Starting with version 3.0.0, Channel Bonding may be configured
via the sysfs interface.
This interface allows dynamic configuration of all bonds in the system
without unloading the module. It also allows for adding and removing
bonds at runtime. Ifenslave is no longer required, though it is still
supported.

Use of the sysfs interface allows you to use multiple bonds
with different configurations without having to reload the module.
It also allows you to use multiple, differently configured bonds when
bonding is compiled into the kernel.

You must have the sysfs filesystem mounted to configure
bonding this way. The examples in this document assume that you
are using the standard mount point for sysfs, i.e., /sys. If your
sysfs filesystem is mounted elsewhere, you will need to adjust the
example paths accordingly.

Creating and Destroying Bonds
-----------------------------
To add a new bond foo::

    # echo +foo > /sys/class/net/bonding_masters

To remove an existing bond bar::

    # echo -bar > /sys/class/net/bonding_masters

To show all existing bonds::

    # cat /sys/class/net/bonding_masters

.. note::

   Due to the 4K size limitation of sysfs files, this list may be
   truncated if you have more than a few hundred bonds. This is unlikely
   to occur under normal operating conditions.
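
A script that creates bonds may want to check the bonding_masters list
first. The following is a minimal sketch, assuming the file holds the bond
names separated by spaces on a single line; the function name
``bond_exists`` and the ``BONDING_MASTERS`` override variable are
hypothetical and exist only so the function can be tried against a scratch
file::

    # Hypothetical helper: succeed if the named bond is listed in
    # bonding_masters. BONDING_MASTERS is an illustrative override;
    # by default the standard sysfs path is read.
    bond_exists() {
        tr ' ' '\n' < "${BONDING_MASTERS:-/sys/class/net/bonding_masters}" |
            grep -qxF -- "$1"
    }

With it, an init script could run
``bond_exists bond1 || echo +bond1 > /sys/class/net/bonding_masters``.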

Adding and Removing Slaves
--------------------------
Interfaces may be enslaved to a bond using the file
/sys/class/net/<bond>/bonding/slaves. The semantics for this file
are the same as for the bonding_masters file.

To enslave interface eth0 to bond bond0::

    # ifconfig bond0 up
    # echo +eth0 > /sys/class/net/bond0/bonding/slaves

To free slave eth0 from bond bond0::

    # echo -eth0 > /sys/class/net/bond0/bonding/slaves

When an interface is enslaved to a bond, symlinks between the
two are created in the sysfs filesystem. In this case, you would get
/sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and
/sys/class/net/eth0/master pointing to /sys/class/net/bond0.

This means that you can tell quickly whether or not an
interface is enslaved by looking for the master symlink. Thus::

    # echo -eth0 > /sys/class/net/eth0/master/bonding/slaves

will free eth0 from whatever bond it is enslaved to, regardless of
the name of the bond interface.
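
The master symlink check can be wrapped in a small shell function. This is
a sketch only: the function name ``bond_master_of`` and the ``SYSFS_NET``
override variable are hypothetical, the latter added so the function can be
exercised against a scratch directory instead of the real /sys::

    # Hypothetical helper: print the name of the bond an interface is
    # enslaved to, or return non-zero if it has no master symlink.
    # SYSFS_NET is an illustrative override; the default is the
    # standard sysfs location.
    bond_master_of() {
        link="${SYSFS_NET:-/sys/class/net}/$1/master"
        [ -L "$link" ] || return 1
        # The symlink target ends in the master's interface name.
        basename "$(readlink "$link")"
    }

For example, ``bond_master_of eth0`` would print ``bond0`` for the
configuration above, and would fail for an interface that is not enslaved.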

Changing a Bond's Configuration
-------------------------------
Each bond may be configured individually by manipulating the
files located in /sys/class/net/<bond name>/bonding.

The names of these files correspond directly with the command-line
parameters described elsewhere in this file, and, with the
exception of arp_ip_target, they accept the same values. To see the
current setting, simply cat the appropriate file.

A few examples will be given here; for specific usage
guidelines for each parameter, see the appropriate section in this
document.

To configure bond0 for balance-alb mode::

    # ifconfig bond0 down
    # echo 6 > /sys/class/net/bond0/bonding/mode
    - or -
    # echo balance-alb > /sys/class/net/bond0/bonding/mode

.. note::

   The bond interface must be down before the mode can be changed.

To enable MII monitoring on bond0 with a 1 second interval::

    # echo 1000 > /sys/class/net/bond0/bonding/miimon

.. note::

   If ARP monitoring is enabled, it will be disabled when MII
   monitoring is enabled, and vice-versa.

To add ARP targets::

    # echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
    # echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target

.. note::

   Up to 16 target addresses may be specified.

To remove an ARP target::

    # echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target

To configure the interval between learning packet transmits::

    # echo 12 > /sys/class/net/bond0/bonding/lp_interval

.. note::

   The lp_interval is the number of seconds between instances where
   the bonding driver sends learning packets to each slave's peer switch.
   The default interval is 1 second.

Example Configuration
---------------------
We begin with the same example that is shown in section 3.3,
executed with sysfs, and without using ifenslave.

To make a simple bond of two e100 devices (presumed to be eth0
and eth1), and have it persist across reboots, edit the appropriate
file (/etc/init.d/boot.local or /etc/rc.d/rc.local), and add the
following::

    modprobe bonding
    modprobe e100
    echo balance-alb > /sys/class/net/bond0/bonding/mode
    ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
    echo 100 > /sys/class/net/bond0/bonding/miimon
    echo +eth0 > /sys/class/net/bond0/bonding/slaves
    echo +eth1 > /sys/class/net/bond0/bonding/slaves

To add a second bond, with two e1000 interfaces in
active-backup mode, using ARP monitoring, add the following lines to
your init script::

    modprobe e1000
    echo +bond1 > /sys/class/net/bonding_masters
    echo active-backup > /sys/class/net/bond1/bonding/mode
    ifconfig bond1 192.168.2.1 netmask 255.255.255.0 up
    echo +192.168.2.100 > /sys/class/net/bond1/bonding/arp_ip_target
    echo 2000 > /sys/class/net/bond1/bonding/arp_interval
    echo +eth2 > /sys/class/net/bond1/bonding/slaves
    echo +eth3 > /sys/class/net/bond1/bonding/slaves

3.5 Configuration with Interfaces Support
-----------------------------------------

This section applies to distros which use the /etc/network/interfaces file
to describe network interface configuration, most notably Debian and its
derivatives.

The ifup and ifdown commands on Debian don't support bonding out of
the box. The ifenslave-2.6 package should be installed to provide bonding
support. Once installed, this package will provide ``bond-*`` options
for use in /etc/network/interfaces.

Note that the ifenslave-2.6 package will load the bonding module and use
the ifenslave command when appropriate.

Example Configurations
----------------------

In /etc/network/interfaces, the following stanza will configure bond0, in
active-backup mode, with eth0 and eth1 as slaves::

    auto bond0
    iface bond0 inet dhcp
        bond-slaves eth0 eth1
        bond-mode active-backup
        bond-miimon 100
        bond-primary eth0 eth1

If the above configuration doesn't work, you might have a system using
upstart for system startup. This is most notably true for recent
Ubuntu versions.
The following stanza in /etc/network/interfaces will
produce the same result on those systems::

    auto bond0
    iface bond0 inet dhcp
        bond-slaves none
        bond-mode active-backup
        bond-miimon 100

    auto eth0
    iface eth0 inet manual
        bond-master bond0
        bond-primary eth0 eth1

    auto eth1
    iface eth1 inet manual
        bond-master bond0
        bond-primary eth0 eth1

For a full list of ``bond-*`` supported options in /etc/network/interfaces and
some more advanced examples tailored to your particular distro, see the files in
/usr/share/doc/ifenslave-2.6.

3.6 Overriding Configuration for Special Cases
----------------------------------------------

When using the bonding driver, the physical port which transmits a frame is
typically selected by the bonding driver, and is not relevant to the user or
system administrator. The output port is simply selected using the policies of
the selected bonding mode. On occasion, however, it is helpful to direct certain
classes of traffic to certain physical interfaces on output to implement
slightly more complex policies.
For example, to reach a web server over a
bonded interface in which eth0 connects to a private network, while eth1
connects via a public network, it may be desirable to bias the bond to send said
traffic over eth0 first, using eth1 only as a fall back, while all other traffic
can safely be sent over either interface. Such configurations may be achieved
using the traffic control utilities in Linux.

By default the bonding driver is multiqueue aware and 16 queues are created
when the driver initializes (see Documentation/networking/multiqueue.rst
for details). If more or fewer queues are desired, the module parameter
tx_queues can be used to change this value. There is no sysfs parameter
available as the allocation is done at module init time.

The output of the file /proc/net/bonding/bondX has changed so the output Queue
ID is now printed for each slave::

    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: None
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eth0
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:1a:a0:12:8f:cb
    Slave queue ID: 0

    Slave Interface: eth1
    MII Status: up
    Link Failure Count: 0
    Permanent HW addr: 00:1a:a0:12:8f:cc
    Slave queue ID: 2

The queue_id for a slave can be set using the command::

    # echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id

For each interface that needs a queue_id, repeat a command like the one
above until the proper priorities are set for all interfaces. On
distributions that allow configuration via initscripts, multiple 'queue_id'
arguments can be added to BONDING_OPTS to set all needed slave queues.

These queue IDs can be used in conjunction with the tc utility to configure
a multiqueue qdisc and filters to bias certain traffic to transmit on certain
slave devices. For instance, suppose that, in the above configuration, we want
to force all traffic bound for 192.168.1.100 to use eth1 in the bond as its
output device. The following commands would accomplish this::

    # tc qdisc add dev bond0 handle 1 root multiq

    # tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip \
        dst 192.168.1.100 action skbedit queue_mapping 2

These commands tell the kernel to attach a multiqueue queue discipline to the
bond0 interface and filter traffic enqueued to it, such that packets with a dst
ip of 192.168.1.100 have their output queue mapping value overwritten to 2.
This value is then passed into the driver, causing the normal output path
selection policy to be overridden, selecting instead qid 2, which maps to eth1.
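
The queue_id and tc steps above can be tied together in a small dry-run
helper. This is only a sketch: the function name bond_pin_cmds and its
argument order are hypothetical, and it prints the commands rather than
running them so they can be reviewed before being executed as root.

```shell
#!/bin/sh
# Print (do not execute) the commands that pin traffic for one destination
# address to one slave. The function name and arguments are illustrative
# assumptions, not part of the bonding driver or of tc.
#   $1 bond device, $2 "slave:queue_id" pair, $3 destination IPv4 address
bond_pin_cmds() {
    bond=$1
    pair=$2
    dst=$3
    slave=${pair%%:*}
    qid=${pair##*:}
    # Queue ID 0 means "use the normal mode policy", so it cannot pin traffic.
    case $qid in
        ''|*[!0-9]*|0) echo "invalid queue id: $qid" >&2; return 1 ;;
    esac
    echo "echo \"$slave:$qid\" > /sys/class/net/$bond/bonding/queue_id"
    echo "tc qdisc add dev $bond handle 1 root multiq"
    echo "tc filter add dev $bond protocol ip parent 1: prio 1 u32 match ip dst $dst action skbedit queue_mapping $qid"
}

# Reproduces the commands from the example above.
bond_pin_cmds bond0 eth1:2 192.168.1.100
```

Printing instead of executing keeps the sketch safe to run on any machine;
the emitted lines are the same three commands shown in this section.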

Note that qid values begin at 1. Qid 0 is reserved to indicate to the driver
that normal output policy selection should take place. One benefit to simply
leaving the qid for a slave at 0 is the multiqueue awareness in the bonding
driver that is now present. This awareness allows tc filters to be placed on
slave devices as well as bond devices and the bonding driver will simply act as
a pass-through for selecting output queues on the slave device rather than
output port selection.

This feature first appeared in bonding driver version 3.7.0 and support for
output slave selection was limited to round-robin and active-backup modes.

3.7 Configuring LACP for 802.3ad mode in a more secure way
----------------------------------------------------------

When using 802.3ad bonding mode, the Actor (host) and Partner (switch)
exchange LACPDUs. These LACPDUs cannot be sniffed, because they are
destined to link-local MAC addresses (which switches/bridges are not
supposed to forward). However, most of the values are easily predictable
or are simply the machine's MAC address (which is trivially known to all
other hosts in the same L2).
This implies that other machines in the L2
domain can spoof LACPDU packets from other hosts to the switch and potentially
cause mayhem by joining (from the point of view of the switch) another
machine's aggregate, thus receiving a portion of that host's incoming
traffic and / or spoofing traffic from that machine themselves (potentially
even successfully terminating some portion of flows). Though this is not
a likely scenario, one could avoid this possibility by simply configuring
a few bonding parameters:

   (a) ad_actor_system : You can set a random mac-address that can be used for
       these LACPDU exchanges. The value cannot be either NULL or multicast.
       Also, it's preferable to set the local-admin bit. The following shell
       code generates a random mac-address as described above::

           # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
                                    $(( (RANDOM & 0xFE) | 0x02 )) \
                                    $(( RANDOM & 0xFF )) \
                                    $(( RANDOM & 0xFF )) \
                                    $(( RANDOM & 0xFF )) \
                                    $(( RANDOM & 0xFF )) \
                                    $(( RANDOM & 0xFF )))
           # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system

   (b) ad_actor_sys_prio : Randomize the system priority. The default value
       is 65535, but the system priority can take any value from 1 to 65535.
       The following shell code generates a random priority and sets it::

           # sys_prio=$(( 1 + RANDOM + RANDOM ))
           # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio

   (c) ad_user_port_key : Use the user portion of the port-key. The default
       keeps this empty. These are the upper 10 bits of the port-key and the
       value ranges from 0 to 1023. The following shell code generates these
       10 bits and sets them::

           # usr_port_key=$(( RANDOM & 0x3FF ))
           # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key


4. Querying Bonding Configuration
=================================

4.1 Bonding Configuration
-------------------------

Each bonding device has a read-only file residing in the
/proc/net/bonding directory. The file contents include information
about the bonding configuration, options and state of each slave.
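
Because the file is plain text, per-slave state can be pulled out with
standard tools. The following is a minimal sketch, assuming the "Slave
Interface:" and "MII Status:" labels that appear in this document's examples;
the function name bond_slave_status is hypothetical, and other driver
versions may print different fields.

```shell
#!/bin/sh
# Summarize each slave's MII status from a /proc/net/bonding/bondX style file.
# Assumes the "Slave Interface:" and "MII Status:" labels used in this
# document's examples; the function name is an illustrative assumption.
#   $1 path to the file to read
bond_slave_status() {
    awk -F': ' '
        /^[ \t]*Slave Interface:/ { slave = $2; next }
        # Only report an MII Status line that follows a slave header;
        # the first MII Status line in the file describes the bond itself.
        /^[ \t]*MII Status:/ && slave != "" { print slave, $2; slave = "" }
    ' "$1"
}

# Demonstration on a scratch copy of sample contents.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Bonding Mode: load balancing (round-robin)
MII Status: up

Slave Interface: eth1
MII Status: up
Link Failure Count: 1

Slave Interface: eth0
MII Status: down
Link Failure Count: 1
EOF
bond_slave_status "$sample"
rm -f "$sample"
```

On a live system the same function could be pointed at a path such as
/proc/net/bonding/bond0.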

For example, the contents of /proc/net/bonding/bond0 after the
driver is loaded with parameters of mode=0 and miimon=1000 are
generally as follows::

    Ethernet Channel Bonding Driver: 2.6.1 (October 29, 2004)
    Bonding Mode: load balancing (round-robin)
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 1000
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eth1
    MII Status: up
    Link Failure Count: 1

    Slave Interface: eth0
    MII Status: up
    Link Failure Count: 1

The precise format and contents will change depending upon the
bonding configuration, state, and version of the bonding driver.

4.2 Network Configuration
-------------------------

The network configuration can be inspected using the ifconfig
command. Bonding devices will have the MASTER flag set; bonding slave
devices will have the SLAVE flag set. The ifconfig output does not
contain information on which slaves are associated with which masters.

In the example below, the bond0 interface is the master
(MASTER) while eth0 and eth1 are slaves (SLAVE).
Notice all slaves of
bond0 have the same MAC address (HWaddr) as bond0 for all modes except
TLB and ALB, which require a unique MAC address for each slave::

    # /sbin/ifconfig
    bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
              inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
              TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
              collisions:0 txqueuelen:0

    eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
              collisions:0 txqueuelen:100
              Interrupt:10 Base address:0x1080

    eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
              UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
              RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:100
              Interrupt:9 Base address:0x1400

5. Switch Configuration
=======================

For this section, "switch" refers to whatever system the
bonded devices are directly connected to (i.e., where the other end of
the cable plugs into).
This may be an actual dedicated switch device,
or it may be another regular system (e.g., another computer running
Linux).

The active-backup, balance-tlb and balance-alb modes do not
require any specific configuration of the switch.

The 802.3ad mode requires that the switch have the appropriate
ports configured as an 802.3ad aggregation. The precise method used
to configure this varies from switch to switch, but, for example, a
Cisco 3550 series switch requires that the appropriate ports first be
grouped together in a single etherchannel instance, then that
etherchannel is set to mode "lacp" to enable 802.3ad (instead of
standard EtherChannel).

The balance-rr, balance-xor and broadcast modes generally
require that the switch have the appropriate ports grouped together.
The nomenclature for such a group differs between switches; it may be
called an "etherchannel" (as in the Cisco example, above), a "trunk
group" or some other similar variation. For these modes, each switch
will also have its own configuration options for the switch's transmit
policy to the bond. Typical choices include XOR of either the MAC or
IP addresses. The transmit policy of the two peers does not need to
match. For these three modes, the bonding mode really selects a
transmit policy for an EtherChannel group; all three will interoperate
with another EtherChannel group.


6. 802.1q VLAN Support
======================

It is possible to configure VLAN devices over a bond interface
using the 8021q driver. However, only packets coming from the 8021q
driver and passing through bonding will be tagged by default. Self-generated
packets, for example, bonding's learning packets or ARP
packets generated by either ALB mode or the ARP monitor mechanism, are
tagged internally by bonding itself. As a result, bonding must
"learn" the VLAN IDs configured above it, and use those IDs to tag
self-generated packets.

For reasons of simplicity, and to support the use of adapters
that can do VLAN hardware acceleration offloading, the bonding
interface declares itself as fully hardware offloading capable; it gets
the add_vid/kill_vid notifications to gather the necessary
information, and it propagates those actions to the slaves. In case
of mixed adapter types, hardware accelerated tagged packets that
should go through an adapter that is not offloading capable are
"un-accelerated" by the bonding driver so the VLAN tag sits in the
regular location.

VLAN interfaces *must* be added on top of a bonding interface
only after enslaving at least one slave. The bonding interface has a
hardware address of 00:00:00:00:00:00 until the first slave is added.
If the VLAN interface is created prior to the first enslavement, it
would pick up the all-zeroes hardware address.
Once the first slave
is attached to the bond, the bond device itself will pick up the
slave's hardware address, which is then available for the VLAN device.

Also, be aware that a similar problem can occur if all slaves
are released from a bond that still has one or more VLAN interfaces on
top of it. When a new slave is added, the bonding interface will
obtain its hardware address from the first slave, which might not
match the hardware address of the VLAN interfaces (which was
ultimately copied from an earlier slave).

There are two methods to ensure that the VLAN device operates
with the correct hardware address if all slaves are removed from a
bond interface:

1. Remove all VLAN interfaces then recreate them

2. Set the bonding interface's hardware address so that it
   matches the hardware address of the VLAN interfaces.

Note that changing a VLAN interface's HW address would set the
underlying device -- i.e. the bonding interface -- to promiscuous
mode, which might not be what you want.


7. Link Monitoring
==================

The bonding driver at present supports two schemes for
monitoring a slave device's link state: the ARP monitor and the MII
monitor.
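
To see which of the two monitors a given bond is using, the miimon and
arp_interval values under its sysfs bonding directory can be inspected. The
sketch below assumes that a value of 0 means the corresponding monitor is
disabled; the function name bond_monitor is hypothetical, and the directory
is passed in so the logic can be tried against a scratch directory.

```shell
#!/bin/sh
# Report which link monitor a bond appears to use, based on the miimon and
# arp_interval values in its sysfs "bonding" directory. Treating 0 (or a
# missing file) as "disabled" is an assumption of this sketch.
#   $1 directory holding those files, e.g. /sys/class/net/bond0/bonding
bond_monitor() {
    dir=$1
    miimon=$(cat "$dir/miimon" 2>/dev/null)
    arp=$(cat "$dir/arp_interval" 2>/dev/null)
    if [ "${miimon:-0}" -gt 0 ]; then
        echo mii
    elif [ "${arp:-0}" -gt 0 ]; then
        echo arp
    else
        echo none
    fi
}

# Demonstration against a scratch directory standing in for sysfs.
demo=$(mktemp -d)
echo 100 > "$demo/miimon"
echo 0 > "$demo/arp_interval"
bond_monitor "$demo"
rm -rf "$demo"
```

A result of "none" suggests that no link monitoring is configured for that
bond.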

At the present time, due to implementation restrictions in the
bonding driver itself, it is not possible to enable both ARP and MII
monitoring simultaneously.

7.1 ARP Monitor Operation
-------------------------

The ARP monitor operates as its name suggests: it sends ARP
queries to one or more designated peer systems on the network, and
uses the response as an indication that the link is operating. This
gives some assurance that traffic is actually flowing to and from one
or more peers on the local network.

The ARP monitor relies on the device driver itself to verify
that traffic is flowing. In particular, the driver must keep the last
receive time, dev->last_rx, up to date. Drivers that use the NETIF_F_LLTX
flag must also update netdev_queue->trans_start. If they do not, then the
ARP monitor will immediately fail any slaves using that driver, and
those slaves will stay down. If network monitoring (tcpdump, etc.)
shows the ARP requests and replies on the network, then it may be that
your device driver is not updating last_rx and trans_start.

7.2 Configuring Multiple ARP Targets
------------------------------------

While ARP monitoring can be done with just one target, it can
be useful in a High Availability setup to have several targets to
monitor.
In the case of just one target, the target itself may go
down or have a problem making it unresponsive to ARP requests. Having
an additional target (or several) increases the reliability of the ARP
monitoring.

Multiple ARP targets must be separated by commas as follows::

    # example options for ARP monitoring with three targets
    alias bond0 bonding
    options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target the options would resemble::

    # example options for ARP monitoring with one target
    alias bond0 bonding
    options bond0 arp_interval=60 arp_ip_target=192.168.0.100


7.3 MII Monitor Operation
-------------------------

The MII monitor monitors only the carrier state of the local
network interface. It accomplishes this in one of three ways: by
depending upon the device driver to maintain its carrier state, by
querying the device's MII registers, or by making an ethtool query to
the device.

If the use_carrier module parameter is 1 (the default value),
then the MII monitor will rely on the driver for carrier state
information (via the netif_carrier subsystem).
As explained in the
use_carrier parameter information, above, if the MII monitor fails to
detect carrier loss on the device (e.g., when the cable is physically
disconnected), it may be that the driver does not support
netif_carrier.

If use_carrier is 0, then the MII monitor will first query the
device's MII registers (via ioctl) and check the link state. If that
request fails (not just that it returns carrier down), then the MII
monitor will make an ethtool ETHTOOL_GLINK request to attempt to obtain
the same information. If both methods fail (i.e., the driver either
does not support or had some error in processing both the MII register
and ethtool requests), then the MII monitor will assume the link is
up.

8. Potential Sources of Trouble
===============================

8.1 Adventures in Routing
-------------------------

When bonding is configured, it is important that the slave
devices not have routes that supersede routes of the master (or,
generally, not have routes at all).
For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
as follows::

    Kernel IP routing table
    Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
    10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
    10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
    10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
    127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo

This routing configuration will likely still update the
receive/transmit times in the driver (needed by the ARP monitor), but
may bypass the bonding driver (because outgoing traffic to, in this
case, another host on network 10 would use eth0 or eth1 before bond0).

The ARP monitor (and ARP itself) may become confused by this
configuration, because ARP requests (generated by the ARP monitor)
will be sent on one interface (bond0), but the corresponding reply
will arrive on a different interface (eth0). This reply looks to ARP
as an unsolicited ARP reply (because ARP matches replies on an
interface basis), and is discarded. The MII monitor is not affected
by the state of the routing table.

The solution here is simply to ensure that slaves do not have
routes of their own, and if for some reason they must, those routes do
not supersede routes of their master. This should generally be the
case, but unusual configurations or errant manual or automatic static
route additions may cause trouble.
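
A quick way to spot the situation described above is to scan the routing
table for entries whose interface is a slave. The following sketch reads a
table in the layout shown in this section (interface name in the last column)
on standard input; the function name slave_routes is hypothetical, and the
slave names are supplied by the caller.

```shell
#!/bin/sh
# List routes whose output interface is a bond slave.
# Reads a routing table with the interface in the last column on stdin;
# the slave names are passed as arguments. The function name is an
# illustrative assumption.
slave_routes() {
    awk -v slaves="$*" '
        BEGIN {
            n = split(slaves, s, " ")
            for (i = 1; i <= n; i++) is_slave[s[i]] = 1
        }
        # Print destination and interface for every route bound to a slave.
        ($NF in is_slave) { print $1, $NF }
    '
}

# Demonstration using part of the table from the example above.
slave_routes eth0 eth1 <<'EOF'
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo
EOF
```

Any line printed names a route that should probably be removed or moved to
the bond device. On a live system the input could come from a command such as
"route -n".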

8.2 Ethernet Device Renaming
----------------------------

On systems with network configuration scripts that do not
associate physical devices directly with network interface names (so
that the same physical device always has the same "ethX" name), it may
be necessary to add some special logic to config files in
/etc/modprobe.d/.

For example, given a modules.conf containing the following::

    alias bond0 bonding
    options bond0 mode=some-mode miimon=50
    alias eth0 tg3
    alias eth1 tg3
    alias eth2 e1000
    alias eth3 e1000

If neither eth0 nor eth1 is a slave of bond0, then when the
bond0 interface comes up, the devices may end up reordered. This
happens because bonding is loaded first, then its slave devices'
drivers are loaded next. Since no other drivers have been loaded,
when the e1000 driver loads, it will receive eth0 and eth1 for its
devices, but the bonding configuration tries to enslave eth2 and eth3
(which may later be assigned to the tg3 devices).

Adding the following::

    add above bonding e1000 tg3

causes modprobe to load e1000 then tg3, in that order, when
bonding is loaded. This command is fully documented in the
modules.conf manual page.

On systems utilizing modprobe an equivalent problem can occur.
In this case, the following can be added to config files in
/etc/modprobe.d/::

    softdep bonding pre: tg3 e1000

This will load the tg3 and e1000 modules before loading the bonding one.
Full documentation on this can be found in the modprobe.d and modprobe
manual pages.

8.3 Painfully Slow Or No Failed Link Detection By Miimon
--------------------------------------------------------

By default, bonding enables the use_carrier option, which
instructs bonding to trust the driver to maintain carrier state.

As discussed in the options section, above, some drivers do
not support the netif_carrier_on/_off link state tracking system.
With use_carrier enabled, bonding will always see these links as up,
regardless of their actual state.

Additionally, other drivers do support netif_carrier, but do
not maintain it in real time, e.g., only polling the link state at
some fixed interval. In this case, miimon will detect failures, but
only after some long period of time has expired. If it appears that
miimon is very slow in detecting link failures, try specifying
use_carrier=0 to see if that improves the failure detection time.
If
it does, then it may be that the driver checks the carrier state at a
fixed interval, but does not cache the MII register values (so the
use_carrier=0 method of querying the registers directly works). If
use_carrier=0 does not improve the failover, then the driver may cache
the registers, or the problem may be elsewhere.

Also, remember that miimon only checks for the device's
carrier state. It has no way to determine the state of devices on or
beyond other ports of a switch, or if a switch is refusing to pass
traffic while still maintaining carrier on.

9. SNMP agents
==============

If running SNMP agents, the bonding driver should be loaded
before any network drivers participating in a bond. This requirement
is due to the interface index (ipAdEntIfIndex) being associated to
the first interface found with a given IP address. That is, there is
only one ipAdEntIfIndex for each IP address. For example, if eth0 and
eth1 are slaves of bond0 and the driver for eth0 is loaded before the
bonding driver, the interface for the IP address will be associated
with the eth0 interface. This configuration is shown below; the IP
address 192.168.1.1 has an interface index of 2 which indexes to eth0
in the ifDescr table (ifDescr.2).

::

    interfaces.ifTable.ifEntry.ifDescr.1 = lo
    interfaces.ifTable.ifEntry.ifDescr.2 = eth0
    interfaces.ifTable.ifEntry.ifDescr.3 = eth1
    interfaces.ifTable.ifEntry.ifDescr.4 = eth2
    interfaces.ifTable.ifEntry.ifDescr.5 = eth3
    interfaces.ifTable.ifEntry.ifDescr.6 = bond0
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before
any network drivers participating in a bond. Below is an example of
loading the bonding driver first; the IP address 192.168.1.1 is
correctly associated with ifDescr.2.

::

    interfaces.ifTable.ifEntry.ifDescr.1 = lo
    interfaces.ifTable.ifEntry.ifDescr.2 = bond0
    interfaces.ifTable.ifEntry.ifDescr.3 = eth0
    interfaces.ifTable.ifEntry.ifDescr.4 = eth1
    interfaces.ifTable.ifEntry.ifDescr.5 = eth2
    interfaces.ifTable.ifEntry.ifDescr.6 = eth3
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
    ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in
ifDescr, the association between the IP address and IfIndex remains
and SNMP functions such as Interface_Scan_Next will report that
association.

10. Promiscuous mode
====================

When running network monitoring tools, e.g., tcpdump, it is
common to enable promiscuous mode on the device, so that all traffic
is seen (instead of seeing only traffic destined for the local host).
The bonding driver handles promiscuous mode changes to the bonding
master device (e.g., bond0), and propagates the setting to the slave
devices.

For the balance-rr, balance-xor, broadcast, and 802.3ad modes,
the promiscuous mode setting is propagated to all slaves.

For the active-backup, balance-tlb and balance-alb modes, the
promiscuous mode setting is propagated only to the active slave.

For balance-tlb mode, the active slave is the slave currently
receiving inbound traffic.

For balance-alb mode, the active slave is the slave used as a
"primary." This slave is used for mode-specific control traffic, for
sending to peers that are unassigned or if the load is unbalanced.

For the active-backup, balance-tlb and balance-alb modes, when
the active slave changes (e.g., due to a link failure), the
promiscuous setting will be propagated to the new active slave.

11. Configuring Bonding for High Availability
=============================================

High Availability refers to configurations that provide
maximum network availability by having redundant or backup devices,
links or switches between the host and the rest of the world. The
goal is to provide the maximum availability of network connectivity
(i.e., the network always works), even though other configurations
could provide higher throughput.

11.1 High Availability in a Single Switch Topology
--------------------------------------------------

If two hosts (or a host and a single switch) are directly
connected via multiple physical links, then there is no availability
penalty to optimizing for maximum bandwidth. In this case, there is
only one switch (or peer), so if it fails, there is no alternative
access to fail over to. Additionally, the bonding load balance modes
support link monitoring of their members, so if individual links fail,
the load will be rebalanced across the remaining devices.

See Section 12, "Configuring Bonding for Maximum Throughput"
for information on configuring bonding with one peer device.

11.2 High Availability in a Multiple Switch Topology
----------------------------------------------------

With multiple switches, the configuration of bonding and the
network changes dramatically. In multiple switch topologies, there is
a trade off between network availability and usable bandwidth.

Below is a sample network, configured to maximize the
availability of the network::

          |                                     |
          |port3                           port3|
    +-----+----+                          +-----+----+
    |          |port2       ISL      port2|          |
    | switch A +--------------------------+ switch B |
    |          |                          |          |
    +-----+----+                          +-----+----+
          |port1                           port1|
          |             +-------+               |
          +-------------+ host1 +---------------+
                   eth0 +-------+ eth1

In this configuration, there is a link between the two
switches (ISL, or inter switch link), and multiple ports connecting to
the outside world ("port3" on each switch). There is no technical
reason that this could not be extended to a third switch.

11.2.1 HA Bonding Mode Selection for Multiple Switch Topology
-------------------------------------------------------------

In a topology such as the example above, the active-backup and
broadcast modes are the only useful bonding modes when optimizing for
availability; the other modes require all links to terminate on the
same peer for them to behave rationally.

active-backup:
    This is generally the preferred mode, particularly if
    the switches have an ISL and play together well.
    If the network configuration is such that one switch is
    specifically a backup switch (e.g., has lower capacity, higher
    cost, etc.), then the primary option can be used to ensure that
    the preferred link is always used when it is available.

broadcast:
    This mode is really a special purpose mode, and is suitable
    only for very specific needs. One example is when the two
    switches are not connected (no ISL) and the networks beyond
    them are totally independent. In this case, if it is
    necessary for some specific one-way traffic to reach both
    independent networks, then the broadcast mode may be suitable.

11.2.2 HA Link Monitoring Selection for Multiple Switch Topology
----------------------------------------------------------------

The choice of link monitoring ultimately depends upon your
switch. If the switch can reliably fail ports in response to other
failures, then either the MII or ARP monitors should work. Without
such switch support, the choice matters: in the example above, if the
"port3" link fails at the remote end, the MII monitor has no direct
means to detect this. The ARP monitor could be configured with a
target at the remote end of port3, thus detecting that failure without
switch support.
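
As a rough illustration of the ARP monitor setup described above, the
following Python sketch assembles a bonding option string with an ARP
target. The option names (mode, arp_interval, arp_ip_target) are bonding
driver options; the helper name, the 1000 ms interval and the addresses
are assumptions made for this example, not values taken from this
document.

```python
def arp_monitor_options(targets, interval_ms=1000, mode="active-backup"):
    """Return a bonding option string that enables the ARP monitor.

    Illustrative helper only; the interval and addresses are assumed values.
    """
    if not targets:
        raise ValueError("the ARP monitor needs at least one target")
    # Multiple targets are passed as one comma-separated list.
    return "mode={} arp_interval={} arp_ip_target={}".format(
        mode, interval_ms, ",".join(targets)
    )


# Hypothetical address of the device at the remote end of port3.
print(arp_monitor_options(["192.168.0.1"]))
```

A string of this form could be supplied as the options when the bonding
module is loaded; further targets are simply additional list entries.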

In general, however, in a multiple switch topology, the ARP
monitor can provide a higher level of reliability in detecting end to
end connectivity failures (which may be caused by the failure of any
individual component to pass traffic for any reason). Additionally,
the ARP monitor should be configured with multiple targets (at least
one for each switch in the network). This will ensure that,
regardless of which switch is active, the ARP monitor has a suitable
target to query.

Note also that many switches now support a functionality
generally referred to as "trunk failover." This is a feature of the
switch that causes the link state of a particular switch port to be set
down (or up) when the state of another switch port goes down (or up).
Its purpose is to propagate link failures from logically "exterior" ports
to the logically "interior" ports that bonding is able to monitor via
miimon. Availability and configuration for trunk failover vary by
switch, but this can be a viable alternative to the ARP monitor when using
suitable switches.

12. Configuring Bonding for Maximum Throughput
==============================================

12.1 Maximizing Throughput in a Single Switch Topology
------------------------------------------------------

In a single switch configuration, the best method to maximize
throughput depends upon the application and network environment. The
various load balancing modes each have strengths and weaknesses in
different environments, as detailed below.

For this discussion, we will break down the topologies into
two categories. Depending upon the destination of most traffic, we
categorize them into either "gatewayed" or "local" configurations.

In a gatewayed configuration, the "switch" is acting primarily
as a router, and the majority of traffic passes through this router to
other networks. An example would be the following::

    +----------+                     +----------+
    |          |eth0            port1|          | to other networks
    |  Host A  +---------------------+  router  +------------------->
    |          +---------------------+          | Hosts B and C are out
    |          |eth1            port2|          | here somewhere
    +----------+                     +----------+

The router may be a dedicated router device, or another host
acting as a gateway. For our discussion, the important point is that
the majority of traffic from Host A will pass through the router to
some other network before reaching its final destination.

In a gatewayed network configuration, although Host A may
communicate with many other systems, all of its traffic will be sent
and received via one other peer on the local network, the router.

Note that the case of two systems connected directly via
multiple physical links is, for purposes of configuring bonding, the
same as a gatewayed configuration. In that case, it happens that all
traffic is destined for the "gateway" itself, not some other network
beyond the gateway.

In a local configuration, the "switch" is acting primarily as
a switch, and the majority of traffic passes through this switch to
reach other stations on the same network. An example would be the
following::

    +----------+            +----------+       +--------+
    |          |eth0   port1|          +-------+ Host B |
    |  Host A  +------------+  switch  |port3  +--------+
    |          +------------+          |                  +--------+
    |          |eth1   port2|          +------------------+ Host C |
    +----------+            +----------+port4             +--------+


Again, the switch may be a dedicated switch device, or another
host acting as a gateway. For our discussion, the important point is
that the majority of traffic from Host A is destined for other hosts
on the same local network (Hosts B and C in the above example).
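
The practical difference between the two configurations can be seen
with a deliberately simplified slave selection that XORs the last octet
of the source and destination MAC addresses and takes the result modulo
the number of slaves. This Python sketch is an illustration only, not
the driver's code; the function names and all MAC addresses are
hypothetical.

```python
def last_octet(mac):
    """Return the final octet of a colon-separated MAC address as an int."""
    return int(mac.split(":")[-1], 16)


def pick_slave(src_mac, dst_mac, slave_count):
    """Simplified MAC-based choice: XOR of the last octets, modulo slaves."""
    return (last_octet(src_mac) ^ last_octet(dst_mac)) % slave_count


# Hypothetical, locally administered addresses used only for this example.
HOST_A = "02:00:00:00:00:0a"
ROUTER = "02:00:00:00:00:01"
HOST_B = "02:00:00:00:00:0b"
HOST_C = "02:00:00:00:00:0c"

# Gatewayed: frames for Host B and Host C are both addressed to the router.
gatewayed = {pick_slave(HOST_A, ROUTER, 2) for _final in ("Host B", "Host C")}

# Local: each frame is addressed to the destination host's own MAC.
local = {pick_slave(HOST_A, dst, 2) for dst in (HOST_B, HOST_C)}

print(sorted(gatewayed), sorted(local))  # prints: [1] [0, 1]
```

With these example addresses, the gatewayed case always selects the
same slave, while the local case spreads the two peers over both
slaves.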

In summary, in a gatewayed configuration, traffic to and from
the bonded device will be to the same MAC level peer on the network
(the gateway itself, i.e., the router), regardless of its final
destination. In a local configuration, traffic flows directly to and
from the final destinations, so each destination (Host B, Host C)
is addressed directly by its own MAC address.

This distinction between a gatewayed and a local network
configuration is important because many of the load balancing modes
available use the MAC addresses of the local network source and
destination to make load balancing decisions. The behavior of each
mode is described below.


12.1.1 MT Bonding Mode Selection for Single Switch Topology
-----------------------------------------------------------

This configuration is the easiest to set up and to understand,
although you will have to decide which bonding mode best suits your
needs. The trade offs for each mode are detailed below:

balance-rr:
    This mode is the only mode that will permit a single
    TCP/IP connection to stripe traffic across multiple
    interfaces. It is therefore the only mode that will allow a
    single TCP/IP stream to utilize more than one interface's
    worth of throughput.
    This comes at a cost, however: the
    striping generally results in peer systems receiving packets out
    of order, causing TCP/IP's congestion control system to kick
    in, often by retransmitting segments.

    It is possible to adjust TCP/IP's congestion limits by
    altering the net.ipv4.tcp_reordering sysctl parameter. The
    usual default value is 3. Keep in mind, however, that the TCP
    stack is able to increase this value automatically when it
    detects reordering.

    Note that the fraction of packets that will be delivered out of
    order is highly variable, and is unlikely to be zero. The level
    of reordering depends upon a variety of factors, including the
    networking interfaces, the switch, and the topology of the
    configuration. Speaking in general terms, higher speed network
    cards produce more reordering (due to factors such as packet
    coalescing), and a "many to many" topology will reorder at a
    higher rate than a "many slow to one fast" configuration.

    Many switches do not support any modes that stripe traffic
    (instead choosing a port based upon IP or MAC level addresses);
    for those devices, traffic for a particular connection flowing
    through the switch to a balance-rr bond will not utilize greater
    than one interface's worth of bandwidth.

    If you are utilizing protocols other than TCP/IP, UDP for
    example, and your application can tolerate out of order
    delivery, then this mode can allow for single stream datagram
    performance that scales near linearly as interfaces are added
    to the bond.

    This mode requires the switch to have the appropriate ports
    configured for "etherchannel" or "trunking."

active-backup:
    There is not much advantage in this network topology to
    the active-backup mode, as the inactive backup devices are all
    connected to the same peer as the primary. In this case, a
    load balancing mode (with link monitoring) will provide the
    same level of network availability, but with increased
    available bandwidth. On the plus side, active-backup mode
    does not require any configuration of the switch, so it may
    have value if the hardware available does not support any of
    the load balance modes.

balance-xor:
    This mode will limit traffic such that packets destined
    for specific peers will always be sent over the same
    interface. Since the destination is determined by the MAC
    addresses involved, this mode works best in a "local" network
    configuration (as described above), with destinations all on
    the same local network.
    This mode is likely to be suboptimal
    if all your traffic is passed through a single router (i.e., a
    "gatewayed" network configuration, as described above).

    As with balance-rr, the switch ports need to be configured for
    "etherchannel" or "trunking."

broadcast:
    Like active-backup, there is not much advantage to this
    mode in this type of network topology.

802.3ad:
    This mode can be a good choice for this type of network
    topology. The 802.3ad mode is an IEEE standard, so all peers
    that implement 802.3ad should interoperate well. The 802.3ad
    protocol includes automatic configuration of the aggregates,
    so minimal manual configuration of the switch is needed
    (typically only to designate that some set of devices is
    available for 802.3ad). The 802.3ad standard also mandates
    that frames be delivered in order (within certain limits), so
    in general single connections will not see misordering of
    packets. The 802.3ad mode does have some drawbacks: the
    standard mandates that all devices in the aggregate operate at
    the same speed and duplex. Also, as with all bonding load
    balance modes other than balance-rr, no single connection will
    be able to utilize more than a single interface's worth of
    bandwidth.

    Additionally, the Linux bonding 802.3ad implementation
    distributes traffic by peer (using an XOR of MAC addresses
    and packet type ID), so in a "gatewayed" configuration, all
    outgoing traffic will generally use the same device. Incoming
    traffic may also end up on a single device, but that is
    dependent upon the balancing policy of the peer's 802.3ad
    implementation. In a "local" configuration, traffic will be
    distributed across the devices in the bond.

    Finally, the 802.3ad mode mandates the use of the MII monitor,
    so the ARP monitor is not available in this mode.

balance-tlb:
    The balance-tlb mode balances outgoing traffic by peer.
    Since the balancing is done according to MAC address, in a
    "gatewayed" configuration (as described above), this mode will
    send all traffic across a single device. However, in a
    "local" network configuration, this mode balances multiple
    local network peers across devices in a vaguely intelligent
    manner (not a simple XOR as in balance-xor or 802.3ad mode),
    so that mathematically unlucky MAC addresses (i.e., ones that
    XOR to the same value) will not all "bunch up" on a single
    interface.

    Unlike 802.3ad, interfaces may be of differing speeds, and no
    special switch configuration is required.
    On the down side,
    in this mode all incoming traffic arrives over a single
    interface, this mode requires certain ethtool support in the
    network device driver of the slave interfaces, and the ARP
    monitor is not available.

balance-alb:
    This mode is everything that balance-tlb is, and more.
    It has all of the features (and restrictions) of balance-tlb,
    and will also balance incoming traffic from local network
    peers (as described in the Bonding Module Options section,
    above).

    The only additional down side to this mode is that the network
    device driver must support changing the hardware address while
    the device is open.

12.1.2 MT Link Monitoring for Single Switch Topology
----------------------------------------------------

The choice of link monitoring may largely depend upon which
mode you choose to use. The more advanced load balancing modes do not
support the use of the ARP monitor, and are thus restricted to using
the MII monitor (which does not provide as high a level of end to end
assurance as the ARP monitor).
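
The monitor restrictions stated in the mode descriptions above can be
summarized in a small lookup. The Python sketch below only restates
this section (802.3ad, balance-tlb and balance-alb cannot use the ARP
monitor); the names are hypothetical and nothing here queries the
driver.

```python
# Modes that this section describes as unable to use the ARP monitor.
ARP_MONITOR_UNAVAILABLE = {"802.3ad", "balance-tlb", "balance-alb"}

ALL_MODES = {
    "balance-rr", "active-backup", "balance-xor", "broadcast",
    "802.3ad", "balance-tlb", "balance-alb",
}


def available_monitors(mode):
    """Return the link monitors usable with the given bonding mode."""
    if mode not in ALL_MODES:
        raise ValueError("unknown bonding mode: {}".format(mode))
    monitors = ["mii"]
    if mode not in ARP_MONITOR_UNAVAILABLE:
        monitors.append("arp")
    return monitors


for mode in sorted(ALL_MODES):
    print(mode, available_monitors(mode))
```

A configuration tool could use such a table to reject, for example, a
request to combine balance-alb with ARP monitoring before anything is
applied.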

12.2 Maximum Throughput in a Multiple Switch Topology
-----------------------------------------------------

Multiple switches may be utilized to optimize for throughput
when they are configured in parallel as part of an isolated network
between two or more systems, for example::

                  +-----------+
                  |  Host A   |
                  +-+---+---+-+
                    |   |   |
           +--------+   |   +---------+
           |            |             |
    +------+---+  +-----+----+  +-----+----+
    | Switch A |  | Switch B |  | Switch C |
    +------+---+  +-----+----+  +-----+----+
           |            |             |
           +--------+   |   +---------+
                    |   |   |
                  +-+---+---+-+
                  |  Host B   |
                  +-----------+

In this configuration, the switches are isolated from one
another. One reason to employ a topology such as this is an
isolated network with many hosts (a cluster configured for high
performance, for example), where using multiple smaller switches can
be more cost effective than a single larger switch; e.g., on a network
with 24 hosts, three 24 port switches can be significantly less
expensive than a single 72 port switch.

If access beyond the network is required, an individual host
can be equipped with an additional network device connected to an
external network; this host then additionally acts as a gateway.

12.2.1 MT Bonding Mode Selection for Multiple Switch Topology
-------------------------------------------------------------

In actual practice, the bonding mode typically employed in
configurations of this type is balance-rr. Historically, in this
network configuration, the usual caveats about out of order packet
delivery are mitigated by the use of network adapters that do not do
any kind of packet coalescing (via the use of NAPI, or because the
device itself does not generate interrupts until some number of
packets has arrived). When employed in this fashion, the balance-rr
mode allows individual connections between two hosts to effectively
utilize greater than one interface's bandwidth.

12.2.2 MT Link Monitoring for Multiple Switch Topology
------------------------------------------------------

Again, in actual practice, the MII monitor is most often used
in this configuration, as performance is given preference over
availability. The ARP monitor will function in this topology, but its
advantages over the MII monitor are mitigated by the volume of probes
needed as the number of systems involved grows (remember that each
host in the network is configured with bonding).

13. Switch Behavior Issues
==========================

13.1 Link Establishment and Failover Delays
-------------------------------------------

Some switches exhibit undesirable behavior with regard to the
timing of link up and down reporting by the switch.

First, when a link comes up, some switches may indicate that
the link is up (carrier available), but not pass traffic over the
interface for some period of time. This delay is typically due to
some type of autonegotiation or routing protocol, but may also occur
during switch initialization (e.g., during recovery after a switch
failure). If you find this to be a problem, specify an appropriate
value to the updelay bonding module option to delay the use of the
relevant interface(s).

Second, some switches may "bounce" the link state one or more
times while a link is changing state. This occurs most commonly while
the switch is initializing. Again, an appropriate updelay value may
help.

Note that when a bonding interface has no active links, the
driver will immediately reuse the first link that goes up, even if the
updelay parameter has been specified (the updelay is ignored in this
case). If there are slave interfaces waiting for the updelay timeout
to expire, the interface that first went into that state will be
immediately reused.
This reduces down time of the network if the
value of updelay has been overestimated, and since this occurs only in
cases with no connectivity, there is no additional penalty for
ignoring the updelay.

In addition to the concerns about switch timings, if your
switches take a long time to go into backup mode, it may be desirable
to not activate a backup interface immediately after a link goes down.
Failover may be delayed via the downdelay bonding module option.

13.2 Duplicated Incoming Packets
--------------------------------

NOTE: Starting with version 3.0.2, the bonding driver has logic to
suppress duplicate packets, which should largely eliminate this problem.
The following description is kept for reference.

It is not uncommon to observe a short burst of duplicated
traffic when the bonding device is first used, or after it has been
idle for some period of time. This is most easily observed by issuing
a "ping" to some other host on the network, and noticing that the
output from ping flags duplicates (typically one per slave).

For example, on a bond in active-backup mode with five slaves
all connected to one switch, the output may appear as follows::

    # ping -n 10.0.4.2
    PING 10.0.4.2 (10.0.4.2) from 10.0.3.10 : 56(84) bytes of data.
    64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.7 ms
    64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
    64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
    64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
    64 bytes from 10.0.4.2: icmp_seq=1 ttl=64 time=13.8 ms (DUP!)
    64 bytes from 10.0.4.2: icmp_seq=2 ttl=64 time=0.216 ms
    64 bytes from 10.0.4.2: icmp_seq=3 ttl=64 time=0.267 ms
    64 bytes from 10.0.4.2: icmp_seq=4 ttl=64 time=0.222 ms

This is not due to an error in the bonding driver; rather, it
is a side effect of how many switches update their MAC forwarding
tables. Initially, the switch does not associate the MAC address in
the packet with a particular switch port, and so it may send the
traffic to all ports until its MAC forwarding table is updated. Since
the interfaces attached to the bond may occupy multiple ports on a
single switch, when the switch (temporarily) floods the traffic to all
ports, the bond device receives multiple copies of the same packet
(one per slave device).

The duplicated packet behavior is switch dependent: some
switches exhibit this, and some do not. On switches that display this
behavior, it can be induced by clearing the MAC forwarding table (on
most Cisco switches, the privileged command "clear mac address-table
dynamic" will accomplish this).

14. Hardware Specific Considerations
====================================

This section contains additional information for configuring
bonding on specific hardware platforms, or for interfacing bonding
with particular switches or other devices.

14.1 IBM BladeCenter
--------------------

This applies to the JS20 and similar systems.

On the JS20 blades, the bonding driver supports only
balance-rr, active-backup, balance-tlb and balance-alb modes. This is
largely due to the network topology inside the BladeCenter, detailed
below.

JS20 network adapter information
--------------------------------

All JS20s come with two Broadcom Gigabit Ethernet ports
integrated on the planar (that's "motherboard" in IBM-speak). In the
BladeCenter chassis, the eth0 port of all JS20 blades is hard wired to
I/O Module #1; similarly, all eth1 ports are wired to I/O Module #2.
An add-on Broadcom daughter card can be installed on a JS20 to provide
two more Gigabit Ethernet ports. These ports, eth2 and eth3, are
wired to I/O Modules 3 and 4, respectively.

Each I/O Module may contain either a switch or a passthrough
module (which allows ports to be directly connected to an external
switch). Some bonding modes require a specific BladeCenter internal
network topology in order to function; these are detailed below.

Additional BladeCenter-specific networking information can be
found in two IBM Redbooks (www.ibm.com/redbooks):

- "IBM eServer BladeCenter Networking Options"
- "IBM eServer BladeCenter Layer 2-7 Network Switching"

BladeCenter networking configuration
------------------------------------

Because a BladeCenter can be configured in a very large number
of ways, this discussion will be confined to describing basic
configurations.

Normally, Ethernet Switch Modules (ESMs) are used in I/O
modules 1 and 2. In this configuration, the eth0 and eth1 ports of a
JS20 will be connected to different internal switches (in the
respective I/O modules).

A passthrough module (OPM or CPM, optical or copper,
passthrough module) connects the I/O module directly to an external
switch. By using PMs in I/O module #1 and #2, the eth0 and eth1
interfaces of a JS20 can be redirected to the outside world and
connected to a common external switch.

Depending upon the mix of ESMs and PMs, the network will
appear to bonding as either a single switch topology (all PMs) or as a
multiple switch topology (one or more ESMs, zero or more PMs). It is
also possible to connect ESMs together, resulting in a configuration
much like the example in "High Availability in a Multiple Switch
Topology," above.

Requirements for specific modes
-------------------------------

The balance-rr mode requires the use of passthrough modules
for devices in the bond, all connected to a common external switch.
That switch must be configured for "etherchannel" or "trunking" on the
appropriate ports, as is usual for balance-rr.

The balance-alb and balance-tlb modes will function with
either switch modules or passthrough modules (or a mix). The only
specific requirement for these modes is that all network interfaces
must be able to reach all destinations for traffic sent over the
bonding device (i.e., the network must converge at some point outside
the BladeCenter).

The active-backup mode has no additional requirements.

Link monitoring issues
----------------------

When an Ethernet Switch Module is in place, only the ARP
monitor will reliably detect link loss to an external switch. This is
nothing unusual, but examination of the BladeCenter cabinet would
suggest that the "external" network ports are the ethernet ports for
the system, when in fact there is a switch between these "external"
ports and the devices on the JS20 system itself. The MII monitor is
only able to detect link failures between the ESM and the JS20 system.

When a passthrough module is in place, the MII monitor does
detect failures to the "external" port, which is then directly
connected to the JS20 system.

Other concerns
--------------

The Serial Over LAN (SoL) link is established over the primary
ethernet (eth0) only; therefore, any loss of link to eth0 will result
in losing your SoL connection. It will not fail over with other
network traffic, as the SoL system is beyond the control of the
bonding driver.

It may be desirable to disable spanning tree on the switch
(either the internal Ethernet Switch Module, or an external switch) to
avoid fail-over delay issues when using bonding.


15. Frequently Asked Questions
==============================

1. Is it SMP safe?
-------------------

Yes. The old 2.0.xx channel bonding patch was not SMP safe.
The new driver was designed to be SMP safe from the start.

2. What type of cards will work with it?
-----------------------------------------

Any Ethernet type cards (you can even mix cards - an Intel
EtherExpress PRO/100 and a 3Com 3c905b, for example). For most modes,
devices need not be of the same speed.

Starting with version 3.2.1, bonding also supports InfiniBand
slaves in active-backup mode.

3. How many bonding devices can I have?
----------------------------------------

There is no limit.

4. How many slaves can a bonding device have?
----------------------------------------------

This is limited only by the number of network interfaces Linux
supports and/or the number of network cards you can place in your
system.

5. What happens when a slave link dies?
----------------------------------------

If link monitoring is enabled, then the failing device will be
disabled. The active-backup mode will fail over to a backup link, and
other modes will ignore the failed link. The link will continue to be
monitored, and should it recover, it will rejoin the bond (in whatever
manner is appropriate for the mode). See the sections on High
Availability and the documentation for each mode for additional
information.

Link monitoring can be enabled via either the miimon or
arp_interval parameters (described in the module parameters section,
above). In general, miimon monitors the carrier state as sensed by
the underlying network device, and the arp monitor (arp_interval)
monitors connectivity to another host on the local network.
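
As a rough illustration of the two monitors, the sketch below reports
which one a bond currently has enabled by reading the ``miimon`` and
``arp_interval`` attributes that the driver exposes under sysfs (see
"Configuring Bonding Manually via Sysfs"). The function name
``bond_monitor_kind`` and the ``SYSFS_NET`` override are assumptions
made for this example and are not part of the driver; a value of 0 in
either attribute is taken to mean that the monitor is disabled.

```shell
# Hypothetical helper, not part of the bonding driver or ifenslave.
# SYSFS_NET is an assumed override so the function can be pointed at a
# scratch directory; it defaults to the real sysfs location.
bond_monitor_kind() {
    bond=$1
    dir="${SYSFS_NET:-/sys/class/net}/$bond/bonding"

    # Each attribute holds an interval in milliseconds; 0 means disabled.
    # A missing or unreadable attribute is treated as 0.
    miimon=$(cat "$dir/miimon" 2>/dev/null || echo 0)
    arp=$(cat "$dir/arp_interval" 2>/dev/null || echo 0)

    if [ "${miimon:-0}" -gt 0 ]; then
        echo "miimon"
    elif [ "${arp:-0}" -gt 0 ]; then
        echo "arp"
    else
        echo "none"
    fi
}
```

Calling ``bond_monitor_kind bond0`` prints ``miimon``, ``arp`` or
``none``. The last case corresponds to a bond with no link monitoring
configured.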

If no link monitoring is configured, the bonding driver will
be unable to detect link failures, and will assume that all links are
always available. This will likely result in lost packets, and a
resulting degradation of performance. The precise performance loss
depends upon the bonding mode and network configuration.

6. Can bonding be used for High Availability?
----------------------------------------------

Yes. See the section on High Availability for details.

7. Which switches/systems does it work with?
---------------------------------------------

The full answer to this depends upon the desired mode.

In the basic balance modes (balance-rr and balance-xor), it
works with any system that supports etherchannel (also called
trunking). Most managed switches currently available have such
support, and many unmanaged switches as well.

The advanced balance modes (balance-tlb and balance-alb) do
not have special switch requirements, but do need device drivers that
support specific features (described in the appropriate section under
module parameters, above).

In 802.3ad mode, it works with systems that support IEEE
802.3ad Dynamic Link Aggregation. Most managed and many unmanaged
switches currently available support 802.3ad.

The active-backup mode should work with any Layer-II switch.

8. Where does a bonding device get its MAC address from?
---------------------------------------------------------

When using slave devices that have fixed MAC addresses, or when
the fail_over_mac option is enabled, the bonding device's MAC address is
the MAC address of the active slave.

For other configurations, if not explicitly configured (with
ifconfig or ip link), the MAC address of the bonding device is taken from
its first slave device. This MAC address is then passed to all following
slaves and remains persistent (even if the first slave is removed) until
the bonding device is brought down or reconfigured.

If you wish to change the MAC address, you can set it with
ifconfig or ip link::

    # ifconfig bond0 hw ether 00:11:22:33:44:55

    # ip link set bond0 address 66:77:88:99:aa:bb

The MAC address can also be changed by bringing down/up the
device and then changing its slaves (or their order)::

    # ifconfig bond0 down ; modprobe -r bonding
    # ifconfig bond0 .... up
    # ifenslave bond0 eth...

This method will automatically take the address from the next
slave that is added.
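
To observe the behavior described above, one can compare the bond's
address with those of its slaves. The following is a minimal sketch,
assuming the standard sysfs ``address`` attribute of each network
device and the ``bonding/slaves`` attribute that lists the slave
names. The function name ``bond_show_macs`` and the ``SYSFS_NET``
override are hypothetical and exist only for illustration.

```shell
# Hypothetical helper, not part of the bonding driver or ifenslave.
# SYSFS_NET is an assumed override for pointing the function at a
# scratch directory; it defaults to the real sysfs location.
bond_show_macs() {
    bond=$1
    net="${SYSFS_NET:-/sys/class/net}"

    # The bond's own MAC address.
    echo "$bond $(cat "$net/$bond/address")"

    # bonding/slaves holds a space-separated list of slave names.
    for slave in $(cat "$net/$bond/bonding/slaves"); do
        echo "$slave $(cat "$net/$slave/address")"
    done
}
```

Running ``bond_show_macs bond0`` prints one "name address" pair per
line. In configurations where the bond's MAC address is passed to all
slaves, every line should show the same address; with fail_over_mac
enabled, the slaves may differ.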

To restore your slaves' MAC addresses, you need to detach them
from the bond (``ifenslave -d bond0 eth0``). The bonding driver will
then restore the MAC addresses that the slaves had before they were
enslaved.

16. Resources and Links
=======================

The latest version of the bonding driver can be found in the latest
version of the Linux kernel, found on http://kernel.org

The latest version of this document can be found in the latest kernel
source (named Documentation/networking/bonding.rst).

Discussions regarding the development of the bonding driver take place
on the main Linux network mailing list, hosted at vger.kernel.org. The list
address is:

netdev@vger.kernel.org

The administrative interface (to subscribe or unsubscribe) can
be found at:

http://vger.kernel.org/vger-lists.html#netdev