.. Blame metadata: Wailok Shum, commit bb7408b, 2021-09-30 22:41:32 +0800

Dual Homing
===========

Overview
--------

.. image:: ../../images/config-dh.png

The dual-homing feature includes several sub-components:

- **Use of "paired" ToRs**: Each rack of compute nodes has exactly two
  Top-of-Rack switches (ToRs) that are linked to each other via a single link.
  Such a link is referred to as a **pair link**. This pairing should NOT be
  omitted.

  Currently there is support for only a single link between paired ToRs. In
  future releases, we may include dual pair links. Note that the pair link is
  only used in failure scenarios, and not in normal operation.

- **Dual-homed servers (compute-nodes)**: Each server is connected to both
  ToRs. The links to the paired ToRs are (Linux) bonded.

- **Dual-homed upstream routers**: The upstream routers MUST be connected to
  the two ToRs that are part of a leaf-pair. You cannot connect them to leaves
  that are not paired. This feature also requires two Quagga instances.

- **Dual-homed access devices**: This component will be added in the future.

Paired ToRs
-----------
The reasoning behind two ToR (leaf) switches is simple. If you only have a
single ToR switch, and you lose it, the entire rack goes down. Using two ToR
switches increases your odds of continued connectivity for dual-homed servers.
The reasoning behind pairing the two ToR switches is more involved, as
explained in the Usage section below.

Configure paired ToRs
^^^^^^^^^^^^^^^^^^^^^
Configuring paired ToRs involves device configuration. Assume switches of:205
and of:206 are paired ToRs.

.. code-block:: json

    {
        "devices" : {
            "of:0000000000000205" : {
                "segmentrouting" : {
                    "name" : "Leaf1-R2",
                    "ipv4NodeSid" : 205,
                    "ipv4Loopback" : "192.168.0.205",
                    "ipv6NodeSid" : 205,
                    "ipv6Loopback" : "2000::c0a8:0205",
                    "routerMac" : "00:00:02:05:00:01",
                    "pairDeviceId" : "of:0000000000000206",
                    "pairLocalPort" : 20,
                    "isEdgeRouter" : true,
                    "adjacencySids" : []
                }
            },
            "of:0000000000000206" : {
                "segmentrouting" : {
                    "name" : "Leaf2-R2",
                    "ipv4NodeSid" : 206,
                    "ipv4Loopback" : "192.168.0.206",
                    "ipv6NodeSid" : 206,
                    "ipv6Loopback" : "2000::c0a8:0206",
                    "routerMac" : "00:00:02:05:00:01",
                    "pairDeviceId" : "of:0000000000000205",
                    "pairLocalPort" : 30,
                    "isEdgeRouter" : true,
                    "adjacencySids" : []
                }
            }
        }
    }

There are two new pieces of device configuration:

- Each device in the ToR pair needs to specify the **deviceId of the leaf it is
  paired to** in the ``pairDeviceId`` field. For example, in the ``of:205``
  configuration the ``pairDeviceId`` is ``of:206``, and similarly in the
  ``of:206`` configuration the ``pairDeviceId`` is ``of:205``.

- Each device in the ToR pair needs to specify the **port on the device used
  for the pair link** in the ``pairLocalPort`` field. For example, the config
  above shows that port 20 on of:205 is connected to port 30 on of:206.

In addition, there is one crucial piece of config that needs to **match for
both ToRs**: the ``routerMac`` address. The paired ToRs MUST have the same
``routerMac``. In the example above, both use ``00:00:02:05:00:01``.

All other fields are the same as before, as explained in the :doc:`Device
Configuration <../../configuration/network>` section.


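The pairing rules above can be checked mechanically before the config is
pushed. The following is a minimal sketch in Python, not part of ONOS or
SD-Fabric: the function name ``check_tor_pairs`` is hypothetical, and it
assumes the device config has already been loaded into a dict (for example
with ``json.load``).

```python
def check_tor_pairs(netcfg):
    """Return a list of problems found in the paired-ToR device config.

    Hypothetical helper: checks that pairDeviceId is reciprocal, that
    pairLocalPort is present, and that both ToRs share one routerMac.
    """
    problems = []
    devices = netcfg.get("devices", {})
    for dev_id, dev in devices.items():
        sr = dev.get("segmentrouting", {})
        pair_id = sr.get("pairDeviceId")
        if pair_id is None:
            continue  # device is not part of a ToR pair
        if "pairLocalPort" not in sr:
            problems.append(f"{dev_id}: pairDeviceId set but pairLocalPort missing")
        peer = devices.get(pair_id, {}).get("segmentrouting")
        if peer is None:
            problems.append(f"{dev_id}: pair device {pair_id} is not configured")
            continue
        if peer.get("pairDeviceId") != dev_id:
            problems.append(f"{dev_id}: {pair_id} does not point back to it")
        if peer.get("routerMac") != sr.get("routerMac"):
            problems.append(f"{dev_id}: routerMac differs from {pair_id}")
    return problems


# Trimmed-down version of the example config above.
example = {
    "devices": {
        "of:0000000000000205": {"segmentrouting": {
            "routerMac": "00:00:02:05:00:01",
            "pairDeviceId": "of:0000000000000206",
            "pairLocalPort": 20}},
        "of:0000000000000206": {"segmentrouting": {
            "routerMac": "00:00:02:05:00:01",
            "pairDeviceId": "of:0000000000000205",
            "pairLocalPort": 30}},
    }
}
print(check_tor_pairs(example))  # prints []
```

An empty list means the pair is consistent. Each mismatch is reported once per
affected device, so a ``routerMac`` difference shows up twice.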
Usage of pair link
^^^^^^^^^^^^^^^^^^

.. image:: ../../images/config-dh-pair-link.png


Dual-Homed Servers
------------------

There are a number of things to note when connecting dual-homed servers to paired-ToRs.

- The switch ports on the two ToRs have to be configured the same way when
  connecting a dual-homed server to the two ToRs.

- The server ports have to be Linux-bonded in a particular mode.

Configure Switch Ports
^^^^^^^^^^^^^^^^^^^^^^

Switch ports are configured in much the same way as described in :doc:`Bridging and
Unicast <../../configuration/network>`. However, there are a couple of things to note.

**First**, dual-homed servers should have **identical configuration on each
switch port they connect to on the ToR pair**. The example below shows that
the ``vlans`` and ``ips`` configured are the same on both switch ports
``of:205/12`` and ``of:206/29``. They are both configured to be access ports
in ``VLAN 20``, the subnet ``10.0.2.0/24`` is assigned to these ports, and the
gateway IP is ``10.0.2.254``.

.. code-block:: json

    {
        "ports" : {
            "of:0000000000000205/12" : {
                "interfaces" : [{
                    "name" : "h3-intf-1",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-untagged": 20
                }]
            },
            "of:0000000000000206/29" : {
                "interfaces" : [{
                    "name" : "h3-intf-2",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-untagged": 20
                }]
            }
        }
    }

It is worth noting the meaning behind the configuration above from a routing
perspective. Simply put, by configuring the same subnets on these switch
ports, the fabric now believes that the entire subnet ``10.0.2.0/24`` is
reachable via BOTH ToR switches ``of:205`` and ``of:206``.

.. caution::
    Configuring different VLANs, or different subnets, or mismatches like
    ``vlan-untagged`` on one switch port and ``vlan-tagged`` on the corresponding
    switch port facing the dual-homed server, will result in incorrect
    behavior.

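A mismatch of this kind is easy to miss by eye. As a rough sketch, the check
below compares the two switch ports facing one dual-homed server while
ignoring the interface ``name``. It is plain Python over the config loaded as
a dict. The helper names ``interface_signature`` and ``ports_match`` are
hypothetical and not part of ONOS.

```python
def interface_signature(port_cfg):
    """Collect the settings that must agree across a dual-homed port pair."""
    signature = set()
    for intf in port_cfg.get("interfaces", []):
        signature.add((
            tuple(sorted(intf.get("ips", []))),
            intf.get("vlan-untagged"),                 # None if not an access port
            tuple(sorted(intf.get("vlan-tagged", []))),
        ))
    return signature


def ports_match(netcfg, port_a, port_b):
    """True if both ports carry the same ips and VLAN settings."""
    ports = netcfg["ports"]
    return interface_signature(ports[port_a]) == interface_signature(ports[port_b])


example = {
    "ports": {
        "of:0000000000000205/12": {"interfaces": [{
            "name": "h3-intf-1", "ips": ["10.0.2.254/24"], "vlan-untagged": 20}]},
        "of:0000000000000206/29": {"interfaces": [{
            "name": "h3-intf-2", "ips": ["10.0.2.254/24"], "vlan-untagged": 20}]},
    }
}
print(ports_match(example, "of:0000000000000205/12", "of:0000000000000206/29"))  # prints True
```

Which two ports belong to the same server is not derivable from the config
itself, so the caller has to supply the pair.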
**Second**, we need to configure the **pair link ports on both ToR switches** as
trunk (``vlan-tagged``) ports that **contain all dual-homed VLANs and subnets**.
This is an extra piece of configuration, the need for which will be removed in
future releases. In the example above, a dual-homed server connects to the ToR
pair on port 12 on of:205 and port 29 on of:206. Assume that the pair link
between the two ToRs is connected to port 5 of both of:205 and of:206. The
config for these switch ports is shown below:

.. code-block:: json

    {
        "ports": {
            "of:0000000000000205/5" : {
                "interfaces" : [{
                    "name" : "205-pair-port",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-tagged": [20]
                }]
            },
            "of:0000000000000206/5" : {
                "interfaces" : [{
                    "name" : "206-pair-port",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-tagged": [20]
                }]
            }
        }
    }

.. note::
    Even though the ports ``of:205/12`` and ``of:206/29`` facing the dual-homed
    server are configured as ``vlan-untagged``, the same VLAN MUST be
    configured as ``vlan-tagged`` on the pair-ports.

    If additional subnets and VLANs are configured facing other dual-homed
    servers, they need to be similarly added to the ``ips`` and ``vlan-tagged``
    arrays in the pair port config.


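Since the pair port config is derived from the server-facing ports, it can be
generated rather than maintained by hand. The sketch below is one way to do
that in Python, assuming each server-facing port config is available as a
dict. The function name ``build_pair_port`` is hypothetical and ONOS does not
provide it.

```python
def build_pair_port(name, dual_homed_ports):
    """Build a pair-port config carrying every dual-homed VLAN and subnet.

    `dual_homed_ports` is a list of port configs (dicts with an "interfaces"
    list) for the server-facing ports on one ToR.
    """
    ips, vlans = [], []
    for port_cfg in dual_homed_ports:
        for intf in port_cfg.get("interfaces", []):
            for ip in intf.get("ips", []):
                if ip not in ips:
                    ips.append(ip)
            vlan_ids = list(intf.get("vlan-tagged", []))
            if "vlan-untagged" in intf:
                # Untagged on the server side, but tagged on the pair port.
                vlan_ids.append(intf["vlan-untagged"])
            for vlan in vlan_ids:
                if vlan not in vlans:
                    vlans.append(vlan)
    return {"interfaces": [{"name": name, "ips": ips, "vlan-tagged": sorted(vlans)}]}


server_port = {"interfaces": [{
    "name": "h3-intf-1", "ips": ["10.0.2.254/24"], "vlan-untagged": 20}]}
pair_port = build_pair_port("205-pair-port", [server_port])
print(pair_port["interfaces"][0]["vlan-tagged"])  # prints [20]
```

The result for the single server port above matches the ``of:205/5`` entry
shown earlier. Passing more server-facing ports extends both arrays.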
Configure Servers
^^^^^^^^^^^^^^^^^

Assume the interfaces used for bonding are ``eth1`` and ``eth2``.

- Bring down interfaces

  .. code-block:: console

    $ sudo ifdown eth1
    $ sudo ifdown eth2

- Modify ``/etc/network/interfaces``

  .. code-block:: text

    auto bond0
    iface bond0 inet dhcp
        bond-mode balance-xor
        bond-xmit_hash_policy layer2+3
        bond-slaves none

    auto eth1
    iface eth1 inet manual
        bond-master bond0

    auto eth2
    iface eth2 inet manual
        bond-master bond0


- Start interfaces

  .. code-block:: console

    $ sudo ifup bond0
    $ sudo ifup eth1
    $ sudo ifup eth2

- Useful command to check bonding status

  .. code-block:: console

    # cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

    Bonding Mode: load balancing (xor)
    Transmit Hash Policy: layer2+3 (2)
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0

    Slave Interface: eth1
    MII Status: up
    Speed: 1000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:1c:42:5b:07:6a
    Slave queue ID: 0

    Slave Interface: eth2
    MII Status: up
    Speed: Unknown
    Duplex: Unknown
    Link Failure Count: 0
    Permanent HW addr: 00:1c:42:1c:a1:7c
    Slave queue ID: 0

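When many servers need checking, the bonding status output can be parsed
instead of read by eye. The following Python sketch assumes the
``key: value`` layout shown in the sample output above. The function names
``parse_bonding`` and ``bond_healthy`` are hypothetical, and the expected mode
string is taken from that sample.

```python
def parse_bonding(text):
    """Split bonding status text into bond-level fields and per-slave fields."""
    bond, slaves, current = {}, [], None
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank lines and the shell prompt line
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif current is None:
            bond[key] = value      # fields before the first slave section
        else:
            current[key] = value   # fields inside a slave section
    return bond, slaves


def bond_healthy(text, expected_mode="load balancing (xor)"):
    """True if the bond uses the expected mode and at least two slaves are up."""
    bond, slaves = parse_bonding(text)
    return (
        bond.get("Bonding Mode") == expected_mode
        and len(slaves) >= 2
        and all(s.get("MII Status") == "up" for s in slaves)
    )
```

On a server, the input would typically come from reading
``/proc/net/bonding/bond0`` and passing its contents to ``bond_healthy``.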
.. caution::
    **Dual-homed hosts should not be statically configured.**

    Currently in ONOS, configured hosts are not updated when the ``connectPoint``
    is lost. This is not a problem with single-homed hosts because there is no
    other way to reach them anyway if their ``connectPoint`` goes down. But in
    dual-homed scenarios, the controller should take corrective action if one
    of the connect points goes down. The trigger for this event does not fire
    when the dual-homed host's connect points are configured (not discovered).

.. note::
    We also support static routes with a dual-homed next hop. The way to
    configure it is exactly the same as for a regular single-homed next hop, as
    described in :doc:`External Connectivity <external-connectivity>`.

    ONOS will automatically recognize when the next-hop IP resolves to a
    dual-homed host and program both switches (that the host connects to)
    accordingly.

    The failure recovery mechanism for dual-homed hosts also applies to static
    routes that point to the host as their next hop.

Dual External Routers
---------------------

.. image:: ../../images/config-dh-vr.png

.. image:: ../../images/config-dh-vr-logical.png
    :width: 200px

In addition to what we describe in :doc:`External Connectivity
<external-connectivity>`, SD-Fabric also supports dual external routers, which
view the SD-Fabric as two individual routers, as shown above.

As before, the vRouter control plane is implemented as a combination of Quagga,
which peers with the upstream routers, and ONOS, which listens to Quagga (via
FPM) and programs the underlying fabric. **In dual-router scenarios, two
instances of Quagga are required**.

As before, the hardware fabric serves as the data plane of vRouter. In
dual-router scenarios, the **external routers MUST be connected to
paired-ToRs**.

ToR connects to one upstream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's consider the simpler case where the external routers are each connected to
a single leaf in a ToR pair. The figure on the left below shows the logical
view. The figure on the right shows the physical connectivity.

.. image:: ../../images/config-dh-vr-logical-simple.png
    :width: 200px

.. image:: ../../images/config-dh-vr-physical-simple.png
    :width: 400px

One of the upstream routers is connected to ``of:205`` and the other is
connected to ``of:206``. Note that ``of:205`` and ``of:206`` are paired ToRs.

The ToRs are connected via a physical port to separate Quagga VMs or
containers. These Quagga instances can be placed in any compute node. They do
not need to be in the same server, and are only shown to be co-located for
simplicity.

The two Quagga instances do NOT talk to each other.

Switch port configuration
"""""""""""""""""""""""""

The ToRs follow the same rules as the single-router case described in
:doc:`External Connectivity <external-connectivity>`. In the example shown
above, the switch port config would look like this:

.. code-block:: json

    {
        "ports": {
            "of:0000000000000205/1" : {
                "interfaces" : [{
                    "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
                    "vlan-untagged": 100,
                    "name" : "internet-router-1"
                }]
            },

            "of:0000000000000205/48" : {
                "interfaces" : [{
                    "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
                    "vlan-untagged": 100,
                    "name" : "quagga-1"
                }]
            },

            "of:0000000000000206/1" : {
                "interfaces" : [{
                    "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
                    "vlan-untagged": 200,
                    "name" : "internet-router-2"
                }]
            },

            "of:0000000000000206/48" : {
                "interfaces" : [{
                    "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
                    "vlan-untagged": 200,
                    "name" : "quagga2"
                }]
            }
        }
    }

.. note::
    In the example shown above, switch ``of:205`` uses ``VLAN 100`` for
    bridging the peering session between Quagga1 and ExtRouter1, while switch
    ``of:206`` uses ``VLAN 200`` to do the same for the other peering session.
    But since these VLANs and bridging domains are defined on different
    switches, the VLAN ids could have been the same.

    This philosophy is consistent with the fabric's use of :doc:`bridging
    <../../configuration/network>`.


Quagga configuration
""""""""""""""""""""
Configuring Quagga for dual external routers is similar to what we described
in :doc:`External Connectivity <external-connectivity>`. However, it is worth
noting that:

- The two Zebra instances **should point to two different ONOS instances** for
  their FPM connections. For example, Zebra in Quagga1 could point to an ONOS
  instance with ``fpm connection ip 10.6.0.1 port 2620``, while the other Zebra
  should point to a different ONOS instance with ``fpm connection ip 10.6.0.2
  port 2620``. It does not matter which ONOS instances they point to as long
  as they are different.

- The two Quagga BGP sessions should appear to come from different routers but
  still use the same AS number, i.e. the two Quagga instances belong to the
  same AS, the one used to represent the entire SD-Fabric.

- The two upstream routers can belong to the same or different ASes, but these AS
  numbers should be different from the one used to represent the SD-Fabric.

- Typically both Quagga instances advertise the same routes to the upstream.
  These prefixes, belonging to various infrastructure nodes in the deployment,
  should be reachable from either of the leaf switches connected to the
  upstream routers.

- The upstream routers may or may not advertise the same routes. SD-Fabric will
  ensure that traffic directed to a route reachable via only one upstream router is
  directed to the appropriate leaf.

ToR connects to both upstream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now let's consider the **more complicated but more fault-tolerant** case of each
Quagga instance peering with BOTH external routers. Again the logical view is
shown on the left and the physical view on the right.

.. image:: ../../images/config-dh-vr-logical.png
    :width: 200px

.. image:: ../../images/config-dh-vr-physical.png
    :width: 500px

First, let's look at the physical connectivity:

- Quagga instance 1 peers with external router R1 via port 1 on switch of:205
- Quagga instance 1 peers with external router R2 via port 2 on switch of:205

Similarly:

- Quagga instance 2 peers with external router R1 via port 2 on switch of:206
- Quagga instance 2 peers with external router R2 via port 1 on switch of:206

To distinguish between the two peering sessions on the same physical switch,
say of:205, the physical ports 1 and 2 need to be configured in **different
VLANs and subnets**. For example, port 1 on of:205 is (untagged) in VLAN 100,
while port 2 is in VLAN 101.

Note that peering for **Quagga1 and R1** happens with IPs in the
``10.0.100.0/29`` subnet, and for **Quagga1 and R2** in the ``10.0.101.0/29``
subnet.

Furthermore, the **Quagga-facing port** (port 48) on of:205 carries both peering
sessions to Quagga1. Thus port 48 should now be configured as a **trunk port
(vlan-tagged) with both VLANs and both subnets**.

Finally, the **Quagga interface** on the VM now needs **sub-interface
configuration for each VLAN ID**.

Similar configuration concepts apply to IPv6 as well. Here is a look at the
switch port config in ONOS for of:205:

.. code-block:: json

    {
        "ports": {
            "of:0000000000000205/1" : {
                "interfaces" : [{
                    "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
                    "vlan-untagged": 100,
                    "name" : "internet-router1"
                }]
            },

            "of:0000000000000205/2" : {
                "interfaces" : [{
                    "ips" : [ "10.0.101.3/29", "2000::7403/125" ],
                    "vlan-untagged": 101,
                    "name" : "internet-router2"
                }]
            },
            "of:0000000000000205/48" : {
                "interfaces" : [{
                    "ips" : [ "10.0.100.3/29", "2000::6403/125", "10.0.101.3/29", "2000::7403/125" ],
                    "vlan-tagged": [100, 101],
                    "name" : "quagga1"
                }]
            }
        }
    }
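
The requirement that the two router-facing ports on one switch use different
VLANs and subnets can be verified with Python's standard ``ipaddress`` module.
This is only a sketch over the config loaded as a dict. The helper names
``vlans_and_subnets`` and ``peering_conflicts`` are hypothetical. Only the
router-facing ports should be passed in, because the Quagga-facing trunk port
shares their VLANs by design.

```python
import ipaddress
from itertools import combinations


def vlans_and_subnets(port_cfg):
    """Return the set of VLAN ids and the list of subnets configured on a port."""
    vlans, subnets = set(), []
    for intf in port_cfg.get("interfaces", []):
        if "vlan-untagged" in intf:
            vlans.add(intf["vlan-untagged"])
        vlans.update(intf.get("vlan-tagged", []))
        for ip in intf.get("ips", []):
            # ip_interface accepts host bits; .network gives the enclosing subnet
            subnets.append(ipaddress.ip_interface(ip).network)
    return vlans, subnets


def peering_conflicts(netcfg, port_ids):
    """List VLAN or subnet clashes between any two of the given ports."""
    conflicts = []
    for a, b in combinations(port_ids, 2):
        vlans_a, nets_a = vlans_and_subnets(netcfg["ports"][a])
        vlans_b, nets_b = vlans_and_subnets(netcfg["ports"][b])
        for vlan in sorted(vlans_a & vlans_b):
            conflicts.append(f"{a} and {b} share VLAN {vlan}")
        for net_a in nets_a:
            for net_b in nets_b:
                # overlaps() only accepts networks of the same IP version
                if net_a.version == net_b.version and net_a.overlaps(net_b):
                    conflicts.append(f"{a} and {b} overlap: {net_a} and {net_b}")
    return conflicts


example = {
    "ports": {
        "of:0000000000000205/1": {"interfaces": [{
            "ips": ["10.0.100.3/29", "2000::6403/125"], "vlan-untagged": 100}]},
        "of:0000000000000205/2": {"interfaces": [{
            "ips": ["10.0.101.3/29", "2000::7403/125"], "vlan-untagged": 101}]},
    }
}
print(peering_conflicts(example, list(example["ports"])))  # prints []
```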