.. SPDX-FileCopyrightText: 2021 Open Networking Foundation <info@opennetworking.org>
.. SPDX-License-Identifier: Apache-2.0

Dual Homing
===========

Overview
--------

.. image:: ../../images/config-dh.png

The dual-homing feature includes several sub-components:

- **Use of "paired" ToRs**: Each rack of compute nodes has exactly two
  Top-of-Rack switches (ToRs), which are linked to each other via a single
  link. Such a link is referred to as a **pair link**. This pairing should NOT
  be omitted.

  Currently there is support for only a single link between paired ToRs. In
  future releases, we may include dual pair links. Note that the pair link is
  only used in failure scenarios, and not in normal operation.

- **Dual-homed servers (compute nodes)**: Each server is connected to both
  ToRs. The links to the paired ToRs are (Linux) bonded.

- **Dual-homed upstream routers**: The upstream routers MUST be connected to
  the two ToRs that are part of a leaf pair. You cannot connect them to leaf
  switches that are not paired. This feature also requires two Quagga
  instances.

- **Dual-homed access devices**: This component will be added in the future.

Paired ToRs
-----------

The reasoning behind two ToR (leaf) switches is simple. If you only have a
single ToR switch, and you lose it, the entire rack goes down. Using two ToR
switches increases your odds for continued connectivity for dual-homed servers.
The reasoning behind pairing the two ToR switches is more involved, as is
explained in the Usage section below.

Configure pair ToRs
^^^^^^^^^^^^^^^^^^^

Configuring paired-ToRs involves device configuration. Assume switches of:205
and of:206 are paired ToRs.

.. code-block:: json

    {
      "devices" : {
        "of:0000000000000205" : {
          "segmentrouting" : {
            "name" : "Leaf1-R2",
            "ipv4NodeSid" : 205,
            "ipv4Loopback" : "192.168.0.205",
            "ipv6NodeSid" : 205,
            "ipv6Loopback" : "2000::c0a8:0205",
            "routerMac" : "00:00:02:05:00:01",
            "pairDeviceId" : "of:0000000000000206",
            "pairLocalPort" : 20,
            "isEdgeRouter" : true,
            "adjacencySids" : []
          }
        },
        "of:0000000000000206" : {
          "segmentrouting" : {
            "name" : "Leaf2-R2",
            "ipv4NodeSid" : 206,
            "ipv4Loopback" : "192.168.0.206",
            "ipv6NodeSid" : 206,
            "ipv6Loopback" : "2000::c0a8:0206",
            "routerMac" : "00:00:02:05:00:01",
            "pairDeviceId" : "of:0000000000000205",
            "pairLocalPort" : 30,
            "isEdgeRouter" : true,
            "adjacencySids" : []
          }
        }
      }
    }

There are two new pieces of device configuration:

- Each device in the ToR pair needs to specify the **deviceId of the leaf it is
  paired to** in the ``pairDeviceId`` field. For example, in the ``of:205``
  configuration the ``pairDeviceId`` is ``of:206``, and similarly in the
  ``of:206`` configuration the ``pairDeviceId`` is ``of:205``.

- Each device in the ToR pair needs to specify the **port on the device used
  for the pair link** in the ``pairLocalPort`` field. For example, the config
  above shows that port 20 on ``of:205`` is connected to port 30 on ``of:206``.

In addition, there is one crucial piece of config that needs to **match for
both ToRs**: the ``routerMac`` address. The paired ToRs MUST have the same
``routerMac``. In the example above, both use ``00:00:02:05:00:01``.

All other fields are the same as before, as explained in the :doc:`Device
Configuration <../../configuration/network>` section.

Usage of pair link
^^^^^^^^^^^^^^^^^^

.. image:: ../../images/config-dh-pair-link.png

Dual-Homed Servers
------------------

There are a number of things to note when connecting dual-homed servers to paired-ToRs.

- The switch ports on the two ToRs have to be configured the same way, when
  connecting a dual-homed server to the two ToRs.

- The server ports have to be Linux-bonded in a particular mode.

Configure Switch Ports
^^^^^^^^^^^^^^^^^^^^^^

Ports are configured in much the same way as described in :doc:`Bridging and
Unicast <../../configuration/network>`. However, there are a couple of things
to note.

**First**, dual-homed servers should have **identical configuration on each
switch port they connect to on the ToR pair**. The example below shows that
the ``vlans`` and ``ips`` configured are the same on both switch ports
``of:205/12`` and ``of:206/29``. They are both configured to be access ports
in ``VLAN 20``, the subnet ``10.0.2.0/24`` is assigned to these ports, and the
gateway IP is ``10.0.2.254/32``.

.. code-block:: json

    {
      "ports" : {
        "of:0000000000000205/12" : {
          "interfaces" : [{
            "name" : "h3-intf-1",
            "ips" : [ "10.0.2.254/24"],
            "vlan-untagged": 20
          }]
        },
        "of:0000000000000206/29" : {
          "interfaces" : [{
            "name" : "h3-intf-2",
            "ips" : [ "10.0.2.254/24"],
            "vlan-untagged": 20
          }]
        }
      }
    }

It is worth noting the meaning behind the configuration above from a routing
perspective. Simply put, by configuring the same subnets on these switch
ports, the fabric now believes that the entire subnet ``10.0.2.0/24`` is
reachable by BOTH ToR switches ``of:205`` and ``of:206``.

.. caution::
    Configuring different VLANs, or different subnets, or mismatches like
    ``vlan-untagged`` in one switch port and ``vlan-tagged`` in the corresponding
    switch port facing the dual-homed server, will result in incorrect
    behavior.

**Second**, we need to configure the **pair link ports on both ToR switches**
to be trunk (``vlan-tagged``) ports that contain **all dual-homed VLANs and
subnets**. This is an extra piece of configuration, the need for which will be
removed in future releases. In the example above, a dual-homed server connects
to the ToR pair on port 12 on ``of:205`` and port 29 on ``of:206``. Assume that
the pair link between the two ToRs is connected to port 5 of both ``of:205``
and ``of:206``. The config for these switch ports is shown below:

.. code-block:: json

    {
      "ports": {
        "of:0000000000000205/5" : {
          "interfaces" : [{
            "name" : "205-pair-port",
            "ips" : [ "10.0.2.254/24"],
            "vlan-tagged": [20]
          }]
        },
        "of:0000000000000206/5" : {
          "interfaces" : [{
            "name" : "206-pair-port",
            "ips" : [ "10.0.2.254/24"],
            "vlan-tagged": [20]
          }]
        }
      }
    }

.. note::
    Even though the ports ``of:205/12`` and ``of:206/29`` facing the dual-homed
    server are configured as ``vlan-untagged``, the same VLAN MUST be
    configured as ``vlan-tagged`` on the pair-ports.

    If additional subnets and VLANs are configured facing other dual-homed
    servers, they need to be similarly added to the ``ips`` and ``vlan-tagged``
    arrays in the pair port config.

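
As a minimal sketch of the second paragraph of the note above, suppose a second
group of dual-homed servers uses ``VLAN 30`` with the subnet ``10.0.3.0/24``
and the gateway ``10.0.3.254``. These values are hypothetical and are not part
of the example deployment. The pair port on ``of:205`` would then plausibly
carry both VLANs and both gateway addresses, and the pair port on ``of:206``
would be extended in the same way:

.. code-block:: json

    {
      "ports": {
        "of:0000000000000205/5" : {
          "interfaces" : [{
            "name" : "205-pair-port",
            "ips" : [ "10.0.2.254/24", "10.0.3.254/24" ],
            "vlan-tagged": [20, 30]
          }]
        }
      }
    }
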
Configure Servers
^^^^^^^^^^^^^^^^^

Assume the interfaces used for bonding are ``eth1`` and ``eth2``.

- Bring down interfaces

  .. code-block:: console

      $ sudo ifdown eth1
      $ sudo ifdown eth2

- Modify ``/etc/network/interfaces``

  .. code-block:: text

      auto bond0
      iface bond0 inet dhcp
          bond-mode balance-xor
          bond-xmit_hash_policy layer2+3
          bond-slaves none

      auto eth1
      iface eth1 inet manual
          bond-master bond0

      auto eth2
      iface eth2 inet manual
          bond-master bond0

- Start interfaces

  .. code-block:: console

      $ sudo ifup bond0
      $ sudo ifup eth1
      $ sudo ifup eth2

- Useful command to check bonding status

  .. code-block:: console

      # cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

      Bonding Mode: load balancing (xor)
      Transmit Hash Policy: layer2+3 (2)
      MII Status: up
      MII Polling Interval (ms): 0
      Up Delay (ms): 0
      Down Delay (ms): 0

      Slave Interface: eth1
      MII Status: up
      Speed: 1000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:1c:42:5b:07:6a
      Slave queue ID: 0

      Slave Interface: eth2
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr: 00:1c:42:1c:a1:7c
      Slave queue ID: 0

.. caution::
    **Dual-homed hosts should not be statically configured.**

    Currently in ONOS, configured hosts are not updated when the
    ``connectPoint`` is lost. This is not a problem with single-homed hosts
    because there is no other way to reach them anyway if their
    ``connectPoint`` goes down. But in dual-homed scenarios, the controller
    should take corrective action if one of the connect points goes down. The
    trigger for this event does not happen when the dual-homed host's connect
    points are configured (not discovered).

.. note::
    We also support static routes with a dual-homed next hop. The way to
    configure it is exactly the same as for a regular single-homed next hop,
    as described in :doc:`External Connectivity <external-connectivity>`.

    ONOS will automatically recognize when the next-hop IP resolves to a
    dual-homed host and program both switches that the host connects to
    accordingly.

    The failure recovery mechanism for dual-homed hosts also applies to static
    routes that point to the host as their next hop.

Dual External Routers
---------------------

.. image:: ../../images/config-dh-vr.png

.. image:: ../../images/config-dh-vr-logical.png
    :width: 200px

In addition to what we describe in :doc:`External Connectivity
<external-connectivity>`, SD-Fabric also supports dual external routers, which
view the SD-Fabric as two individual routers, as shown above.

As before, the vRouter control plane is implemented as a combination of Quagga,
which peers with the upstream routers, and ONOS, which listens to Quagga (via
FPM) and programs the underlying fabric. **In dual-router scenarios, two
instances of Quagga are required**.

As before, the hardware fabric serves as the data plane of vRouter. In
dual-router scenarios, the **external routers MUST be connected to
paired ToRs**.

ToR connects to one upstream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's consider the simpler case where the external routers are each connected
to a single leaf in a ToR pair. The figure on the left below shows the logical
view. The figure on the right shows the physical connectivity.

.. image:: ../../images/config-dh-vr-logical-simple.png
    :width: 200px

.. image:: ../../images/config-dh-vr-physical-simple.png
    :width: 400px

One of the upstream routers is connected to ``of:205`` and the other is
connected to ``of:206``. Note that ``of:205`` and ``of:206`` are paired ToRs.

The ToRs are connected via a physical port to separate Quagga VMs or
containers. These Quagga instances can be placed in any compute node. They do
not need to be in the same server, and are only shown to be co-located for
simplicity.

The two Quagga instances do NOT talk to each other.

Switch port configuration
"""""""""""""""""""""""""

The ToRs follow the same rules as the single-router case described in
:doc:`External Connectivity <external-connectivity>`. In the example shown
above, the switch port config would look like this:

.. code-block:: json

    {
      "ports": {
        "of:0000000000000205/1" : {
          "interfaces" : [{
            "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
            "vlan-untagged": 100,
            "name" : "internet-router-1"
          }]
        },

        "of:0000000000000205/48" : {
          "interfaces" : [{
            "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
            "vlan-untagged": 100,
            "name" : "quagga-1"
          }]
        },

        "of:0000000000000206/1" : {
          "interfaces" : [{
            "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
            "vlan-untagged": 200,
            "name" : "internet-router-2"
          }]
        },

        "of:0000000000000206/48" : {
          "interfaces" : [{
            "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
            "vlan-untagged": 200,
            "name" : "quagga2"
          }]
        }
      }
    }

.. note::
    In the example shown above, switch ``of:205`` uses ``VLAN 100`` for
    bridging the peering session between Quagga1 and ExtRouter1, while switch
    ``of:206`` uses ``VLAN 200`` to do the same for the other peering session.
    But since these VLANs and bridging domains are defined on different
    switches, the VLAN IDs could have been the same.

    This philosophy is consistent with the fabric's use of :doc:`bridging
    <../../configuration/network>`.

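
To illustrate the note above, the following sketch reuses ``VLAN 100`` on
``of:206`` while keeping that switch's own subnets from the example. Everything
except the VLAN ID is copied from the config shown earlier. It is meant only to
show that the VLAN scope is local to each switch, not to suggest a numbering
plan:

.. code-block:: json

    {
      "ports": {
        "of:0000000000000206/1" : {
          "interfaces" : [{
            "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
            "vlan-untagged": 100,
            "name" : "internet-router-2"
          }]
        },
        "of:0000000000000206/48" : {
          "interfaces" : [{
            "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
            "vlan-untagged": 100,
            "name" : "quagga2"
          }]
        }
      }
    }
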
Quagga configuration
""""""""""""""""""""

Configuring Quagga for dual external routers is similar to what we described
in :doc:`External Connectivity <external-connectivity>`. However, it is worth
noting that:

- The two Zebra instances **should point to two different ONOS instances** for
  their FPM connections. For example, Zebra in Quagga1 could point to one ONOS
  instance with ``fpm connection ip 10.6.0.1 port 2620``, while the other Zebra
  should point to a different ONOS instance with ``fpm connection ip 10.6.0.2
  port 2620``. It does not matter which ONOS instances they point to as long
  as they are different.

- The two Quagga BGP sessions should appear to come from different routers but
  still use the same AS number, i.e. the two Quagga instances belong to the
  same AS, the one used to represent the entire SD-Fabric.

- The two upstream routers can belong to the same AS or to different ASes, but
  these AS numbers should be different from the one used to represent the
  SD-Fabric.

- Typically both Quagga instances advertise the same routes to the upstream.
  These prefixes, which belong to various infrastructure nodes in the
  deployment, should be reachable from either of the leaf switches connected
  to the upstream routers.

- The upstream routers may or may not advertise the same routes. SD-Fabric will
  ensure that traffic directed to a route reachable via only one upstream
  router is directed to the appropriate leaf.

ToR connects to both upstream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now let's consider the **more complicated but more fault-tolerant** case of
each Quagga instance peering with BOTH external routers. Again the logical view
is shown on the left and the physical view on the right.

.. image:: ../../images/config-dh-vr-logical.png
    :width: 200px

.. image:: ../../images/config-dh-vr-physical.png
    :width: 500px

First, let's look at the physical connectivity:

- Quagga instance 1 peers with external router R1 via port 1 on switch ``of:205``
- Quagga instance 1 peers with external router R2 via port 2 on switch ``of:205``

Similarly:

- Quagga instance 2 peers with external router R1 via port 2 on switch ``of:206``
- Quagga instance 2 peers with external router R2 via port 1 on switch ``of:206``

To distinguish between the two peering sessions in the same physical switch,
say ``of:205``, the physical ports 1 and 2 need to be configured in **different
VLANs and subnets**. For example, port 1 on ``of:205`` is (untagged) in VLAN
100, while port 2 is in VLAN 101.

Note that peering for **Quagga1 and R1** happens with IPs in the
``10.0.100.0/29`` subnet, and for **Quagga1 and R2** in the ``10.0.101.0/29``
subnet.

Furthermore, the **port facing Quagga1** (port 48) on ``of:205`` carries both
peering sessions to Quagga1. Thus port 48 should now be configured as a
**trunk port (vlan-tagged) with both VLANs and both subnets**.

Finally, the **Quagga interface** on the VM now needs **sub-interface
configuration for each VLAN ID**.

Similar configuration concepts apply to IPv6 as well. Here is a look at the
switch port config in ONOS for ``of:205``:

.. code-block:: json

    {
      "ports": {
        "of:0000000000000205/1" : {
          "interfaces" : [{
            "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
            "vlan-untagged": 100,
            "name" : "internet-router1"
          }]
        },

        "of:0000000000000205/2" : {
          "interfaces" : [{
            "ips" : [ "10.0.101.3/29", "2000::7403/125" ],
            "vlan-untagged": 101,
            "name" : "internet-router2"
          }]
        },

        "of:0000000000000205/48" : {
          "interfaces" : [{
            "ips" : [ "10.0.100.3/29", "2000::6403/125", "10.0.101.3/29", "2000::7403/125" ],
            "vlan-tagged": [100, 101],
            "name" : "quagga1"
          }]
        }
      }
    }
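
Only the ``of:205`` side is shown above. As a rough sketch of what the
corresponding config on ``of:206`` might look like, the block below follows the
port assignments listed earlier (R2 on port 1, R1 on port 2) and assumes that
Quagga2 is attached to port 48, as in the single-upstream example. VLAN 200 and
its subnets are borrowed from that example. VLAN 201 and the addresses
``10.0.201.3/29`` and ``2000::7503/125`` are hypothetical placeholders and
should be replaced with the values used in the actual deployment:

.. code-block:: json

    {
      "ports": {
        "of:0000000000000206/1" : {
          "interfaces" : [{
            "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
            "vlan-untagged": 200,
            "name" : "internet-router2"
          }]
        },

        "of:0000000000000206/2" : {
          "interfaces" : [{
            "ips" : [ "10.0.201.3/29", "2000::7503/125" ],
            "vlan-untagged": 201,
            "name" : "internet-router1"
          }]
        },

        "of:0000000000000206/48" : {
          "interfaces" : [{
            "ips" : [ "10.0.200.3/29", "2000::6503/125", "10.0.201.3/29", "2000::7503/125" ],
            "vlan-tagged": [200, 201],
            "name" : "quagga2"
          }]
        }
      }
    }

As on ``of:205``, the two router-facing ports sit in different VLANs and
subnets, and the Quagga-facing port is a trunk that carries both.
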