Dual Homing
===========

Overview
--------

.. image:: ../../images/config-dh.png

The dual-homing feature includes several sub-components:

- **Use of "paired" ToRs**: Each rack of compute nodes has exactly two
  Top-of-Rack switches (ToRs) that are linked to each other via a single
  link. Such a link is referred to as a **pair link**. This pairing should NOT
  be omitted.

  Currently there is support for only a single link between paired ToRs. In
  future releases, we may include dual pair links. Note that the pair link is
  only used in failure scenarios, and not in normal operation.

- **Dual-homed servers (compute nodes)**: Each server is connected to both
  ToRs. The links to the paired ToRs are (Linux) bonded.

- **Dual-homed upstream routers**: The upstream routers MUST be connected to
  the two ToRs that are part of a leaf pair. You cannot connect them to leaves
  that are not paired. This feature also requires two Quagga instances.

- **Dual-homed access devices**: This component will be added in the future.

Paired ToRs
-----------
The reasoning behind two ToR (leaf) switches is simple. If you only have a
single ToR switch and you lose it, the entire rack goes down. Using two ToR
switches improves the odds of continued connectivity for dual-homed servers.
The reasoning behind pairing the two ToR switches is more involved, and is
explained in the "Usage of pair link" section below.

Configure pair ToRs
^^^^^^^^^^^^^^^^^^^
Configuring paired ToRs involves device configuration. Assume switches
``of:205`` and ``of:206`` are paired ToRs.

.. code-block:: json

    {
        "devices" : {
            "of:0000000000000205" : {
                "segmentrouting" : {
                    "name" : "Leaf1-R2",
                    "ipv4NodeSid" : 205,
                    "ipv4Loopback" : "192.168.0.205",
                    "ipv6NodeSid" : 205,
                    "ipv6Loopback" : "2000::c0a8:0205",
                    "routerMac" : "00:00:02:05:00:01",
                    "pairDeviceId" : "of:0000000000000206",
                    "pairLocalPort" : 20,
                    "isEdgeRouter" : true,
                    "adjacencySids" : []
                }
            },
            "of:0000000000000206" : {
                "segmentrouting" : {
                    "name" : "Leaf2-R2",
                    "ipv4NodeSid" : 206,
                    "ipv4Loopback" : "192.168.0.206",
                    "ipv6NodeSid" : 206,
                    "ipv6Loopback" : "2000::c0a8:0206",
                    "routerMac" : "00:00:02:05:00:01",
                    "pairDeviceId" : "of:0000000000000205",
                    "pairLocalPort" : 30,
                    "isEdgeRouter" : true,
                    "adjacencySids" : []
                }
            }
        }
    }

There are two new pieces of device configuration.

- Each device in the ToR pair needs to specify the **deviceId of the leaf it
  is paired to** in the ``pairDeviceId`` field. For example, in the ``of:205``
  configuration the ``pairDeviceId`` is specified as ``of:206``, and similarly
  in the ``of:206`` configuration the ``pairDeviceId`` is ``of:205``.

- Each device in the ToR pair needs to specify the **port on the device used
  for the pair link** in the ``pairLocalPort`` field. For example, the config
  above shows that port 20 on ``of:205`` is connected to port 30 on
  ``of:206``.

In addition, there is one crucial piece of config that needs to **match for
both ToRs**: the ``routerMac`` address. The paired ToRs MUST have the same
``routerMac``. In the example above, both use ``00:00:02:05:00:01``.

All other fields are the same as before, as explained in the :doc:`Device
Configuration <../../configuration/network>` section.

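The pairing rules above lend themselves to an automated check before the
configuration is pushed. The following is a minimal sketch, not something that
ships with SD-Fabric or ONOS: ``check_pairing`` is a hypothetical helper name,
and the embedded JSON is a trimmed copy of the example above that keeps only
the fields the pairing rules depend on.

.. code-block:: python

    import json

    # Trimmed copy of the device config above (assumption: only the
    # pairing-related fields matter for this check).
    NETCFG = json.loads("""
    {
      "devices": {
        "of:0000000000000205": {"segmentrouting": {
          "routerMac": "00:00:02:05:00:01",
          "pairDeviceId": "of:0000000000000206",
          "pairLocalPort": 20}},
        "of:0000000000000206": {"segmentrouting": {
          "routerMac": "00:00:02:05:00:01",
          "pairDeviceId": "of:0000000000000205",
          "pairLocalPort": 30}}
      }
    }
    """)


    def check_pairing(netcfg):
        """Return a list of problems; an empty list means the pairs are consistent."""
        problems = []
        devices = netcfg.get("devices", {})
        for dev_id, dev in devices.items():
            sr = dev.get("segmentrouting", {})
            peer_id = sr.get("pairDeviceId")
            if peer_id is None:
                continue  # this device is not part of a ToR pair
            if "pairLocalPort" not in sr:
                problems.append(f"{dev_id}: pairLocalPort is missing")
            peer = devices.get(peer_id, {}).get("segmentrouting")
            if peer is None:
                problems.append(f"{dev_id}: pair device {peer_id} is not configured")
                continue
            if peer.get("pairDeviceId") != dev_id:
                problems.append(f"{dev_id}: {peer_id} does not point back to it")
            if peer.get("routerMac") != sr.get("routerMac"):
                problems.append(f"{dev_id}: routerMac differs from {peer_id}")
        return problems


    print(check_pairing(NETCFG))

A script like this could be run over the full network configuration file
before it is loaded; any non-empty result points at a pair whose
``pairDeviceId`` or ``routerMac`` values disagree.
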
Usage of pair link
^^^^^^^^^^^^^^^^^^

.. image:: ../../images/config-dh-pair-link.png

Dual-Homed Servers
------------------

There are a number of things to note when connecting dual-homed servers to paired ToRs.

- The switch ports on the two ToRs have to be configured the same way when
  connecting a dual-homed server to the two ToRs.

- The server ports have to be Linux-bonded in a particular mode.

Configure Switch Ports
^^^^^^^^^^^^^^^^^^^^^^

Ports are configured in much the same way as described in :doc:`Bridging and
Unicast <../../configuration/network>`. However, there are a couple of things to note.

**First**, dual-homed servers should have **identical configuration on each
switch port they connect to on the ToR pair**. The example below shows that
the VLAN and ``ips`` configured are the same on both switch ports
``of:205/12`` and ``of:206/29``. They are both configured to be access ports
in ``VLAN 20``, the subnet ``10.0.2.0/24`` is assigned to these ports, and the
gateway IP is ``10.0.2.254/32``.

.. code-block:: json

    {
        "ports" : {
            "of:0000000000000205/12" : {
                "interfaces" : [{
                    "name" : "h3-intf-1",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-untagged": 20
                }]
            },
            "of:0000000000000206/29" : {
                "interfaces" : [{
                    "name" : "h3-intf-2",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-untagged": 20
                }]
            }
        }
    }

It is worth noting the meaning behind the configuration above from a routing
perspective. Simply put, by configuring the same subnets on these switch
ports, the fabric now believes that the entire subnet ``10.0.2.0/24`` is
reachable by BOTH ToR switches ``of:205`` and ``of:206``.

.. caution::
   Configuring different VLANs, or different subnets, or mismatches like
   ``vlan-untagged`` in one switch port and ``vlan-tagged`` in the corresponding
   switch port facing the dual-homed server, will result in incorrect
   behavior.

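Because a mismatch between the two server-facing ports is easy to introduce
by hand, it can help to compare them mechanically. The sketch below is an
illustration under stated assumptions: ``mismatched_fields`` is a hypothetical
helper, each port is assumed to carry a single interface entry, and only the
``ips``, ``vlan-untagged`` and ``vlan-tagged`` fields used on this page are
compared.

.. code-block:: python

    # The two server-facing ports from the example above.
    PORTS = {
        "of:0000000000000205/12": {"interfaces": [{
            "name": "h3-intf-1", "ips": ["10.0.2.254/24"], "vlan-untagged": 20}]},
        "of:0000000000000206/29": {"interfaces": [{
            "name": "h3-intf-2", "ips": ["10.0.2.254/24"], "vlan-untagged": 20}]},
    }


    def _normalized(interface):
        """Keep only the fields that must match, in an order-independent form."""
        tagged = interface.get("vlan-tagged")
        return {
            "ips": sorted(interface.get("ips", [])),
            "vlan-untagged": interface.get("vlan-untagged"),
            "vlan-tagged": sorted(tagged) if tagged is not None else None,
        }


    def mismatched_fields(ports, port_a, port_b):
        """Return the names of the fields that differ between the two ports."""
        a = _normalized(ports[port_a]["interfaces"][0])
        b = _normalized(ports[port_b]["interfaces"][0])
        return [field for field in a if a[field] != b[field]]


    print(mismatched_fields(PORTS, "of:0000000000000205/12", "of:0000000000000206/29"))

The interface ``name`` is deliberately left out of the comparison, since the
two ports in the example carry different names.
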
**Second**, we need to configure the **pair link ports on both ToR switches to
be trunk** (``vlan-tagged``) **ports that contain all dual-homed VLANs and
subnets**. This is an extra piece of configuration, the need for which will be
removed in future releases. In the example above, a dual-homed server connects
to the ToR pair on port 12 on ``of:205`` and port 29 on ``of:206``. Assume that
the pair link between the two ToRs is connected to port 5 of both ``of:205``
and ``of:206``. The config for these switch ports is shown below:

.. code-block:: json

    {
        "ports": {
            "of:0000000000000205/5" : {
                "interfaces" : [{
                    "name" : "205-pair-port",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-tagged": [20]
                }]
            },
            "of:0000000000000206/5" : {
                "interfaces" : [{
                    "name" : "206-pair-port",
                    "ips" : [ "10.0.2.254/24"],
                    "vlan-tagged": [20]
                }]
            }
        }
    }

.. note::
   Even though the ports ``of:205/12`` and ``of:206/29`` facing the dual-homed
   server are configured as ``vlan-untagged``, the same VLAN MUST be
   configured as ``vlan-tagged`` on the pair ports.

   If additional subnets and VLANs are configured facing other dual-homed
   servers, they need to be similarly added to the ``ips`` and ``vlan-tagged``
   arrays in the pair port config.

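Since the pair port config is fully determined by the server-facing ports, it
can be generated rather than maintained by hand. The following is a rough
sketch under assumptions: ``pair_port_interface`` is a hypothetical helper,
and the second server port (``of:0000000000000205/13`` in VLAN 30 with subnet
``10.0.3.0/24``) is invented purely to show how additional VLANs and subnets
accumulate.

.. code-block:: python

    # Server-facing ports on of:205. Port 12 is from the example above;
    # port 13 is a made-up second dual-homed server used for illustration.
    SERVER_PORTS = {
        "of:0000000000000205/12": {"interfaces": [{
            "name": "h3-intf-1", "ips": ["10.0.2.254/24"], "vlan-untagged": 20}]},
        "of:0000000000000205/13": {"interfaces": [{
            "name": "h4-intf-1", "ips": ["10.0.3.254/24"], "vlan-untagged": 30}]},
    }


    def pair_port_interface(ports, name):
        """Build the trunk interface entry for the pair port of one ToR."""
        vlans, ips = set(), set()
        for port in ports.values():
            for intf in port["interfaces"]:
                ips.update(intf.get("ips", []))
                if "vlan-untagged" in intf:
                    vlans.add(intf["vlan-untagged"])
                vlans.update(intf.get("vlan-tagged", []))
        # Untagged server VLANs are carried as tagged VLANs on the pair port.
        return {"name": name, "ips": sorted(ips), "vlan-tagged": sorted(vlans)}


    print(pair_port_interface(SERVER_PORTS, "205-pair-port"))

With only port 12 as input, the result matches the ``205-pair-port`` entry
shown above; each further dual-homed VLAN and subnet simply extends the two
arrays.
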
Configure Servers
^^^^^^^^^^^^^^^^^

Assume the interfaces used for bonding are ``eth1`` and ``eth2``.

- Bring down interfaces

  .. code-block:: console

      $ sudo ifdown eth1
      $ sudo ifdown eth2

- Modify ``/etc/network/interfaces``

  .. code-block:: text

      auto bond0
      iface bond0 inet dhcp
          bond-mode balance-xor
          bond-xmit_hash_policy layer2+3
          bond-slaves none

      auto eth1
      iface eth1 inet manual
          bond-master bond0

      auto eth2
      iface eth2 inet manual
          bond-master bond0


- Start interfaces

  .. code-block:: console

      $ sudo ifup bond0
      $ sudo ifup eth1
      $ sudo ifup eth2

- Useful command to check bonding status

  .. code-block:: console

      # cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

      Bonding Mode: load balancing (xor)
      Transmit Hash Policy: layer2+3 (2)
      MII Status: up
      MII Polling Interval (ms): 0
      Up Delay (ms): 0
      Down Delay (ms): 0

      Slave Interface: eth1
      MII Status: up
      Speed: 1000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:1c:42:5b:07:6a
      Slave queue ID: 0

      Slave Interface: eth2
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr: 00:1c:42:1c:a1:7c
      Slave queue ID: 0

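The bonding status output is a simple ``key: value`` listing, so it can be
checked from a script, for example to confirm that both links of a dual-homed
server are up. The sketch below is an illustration only: ``parse_bonding`` and
``unhealthy_slaves`` are hypothetical helper names, and ``SAMPLE`` is an
abbreviated copy of the output shown above.

.. code-block:: python

    # Abbreviated copy of the /proc/net/bonding/bond0 output shown above.
    SAMPLE = """\
    Bonding Mode: load balancing (xor)
    Transmit Hash Policy: layer2+3 (2)
    MII Status: up

    Slave Interface: eth1
    MII Status: up
    Link Failure Count: 0

    Slave Interface: eth2
    MII Status: up
    Link Failure Count: 0
    """


    def parse_bonding(text):
        """Split the status text into bond-level fields and per-slave fields."""
        bond = {"slaves": []}
        current = bond
        for line in text.splitlines():
            if ":" not in line:
                continue  # blank line between sections
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "Slave Interface":
                # Every following field belongs to this slave.
                current = {"name": value}
                bond["slaves"].append(current)
            else:
                current[key] = value
        return bond


    def unhealthy_slaves(bond):
        """Return the names of slave interfaces whose MII status is not up."""
        return [s["name"] for s in bond["slaves"] if s.get("MII Status") != "up"]


    status = parse_bonding(SAMPLE)
    print(status["Bonding Mode"], unhealthy_slaves(status))

On a server, the same functions could be fed the contents of
``/proc/net/bonding/bond0`` instead of ``SAMPLE``.
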
.. caution::
   **Dual-homed hosts should not be statically configured.**

   Currently in ONOS, configured hosts are not updated when the ``connectPoint``
   is lost. This is not a problem with single-homed hosts because there is no
   other way to reach them anyway if their ``connectPoint`` goes down. But in
   dual-homed scenarios, the controller should take corrective action if one
   of the connect points goes down. The trigger for this event does not happen
   when the dual-homed host's connect points are configured (not discovered).

.. note::
   We also support static routes with a dual-homed next hop. The way to
   configure it is exactly the same as for a regular single-homed next hop, as
   described in :doc:`External Connectivity <external-connectivity>`.

   ONOS will automatically recognize when the next-hop IP resolves to a
   dual-homed host and program both switches the host connects to
   accordingly.

   The failure recovery mechanism for dual-homed hosts also applies to static
   routes that point to the host as their next hop.

Dual External Routers
---------------------

.. image:: ../../images/config-dh-vr.png

.. image:: ../../images/config-dh-vr-logical.png
   :width: 200px

In addition to what we describe in :doc:`External Connectivity
<external-connectivity>`, SD-Fabric also supports dual external routers, which
view the SD-Fabric as two individual routers, as shown above.

As before, the vRouter control plane is implemented as a combination of Quagga,
which peers with the upstream routers, and ONOS, which listens to Quagga (via
FPM) and programs the underlying fabric. **In dual-router scenarios, two
instances of Quagga are required**.

As before, the hardware fabric serves as the data plane of vRouter. In
dual-router scenarios, the **external routers MUST be connected to
paired ToRs**.

ToR connects to one upstream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's consider the simpler case where the external routers are each connected to
a single leaf in a ToR pair. The figure on the left below shows the logical
view. The figure on the right shows the physical connectivity.

.. image:: ../../images/config-dh-vr-logical-simple.png
   :width: 200px

.. image:: ../../images/config-dh-vr-physical-simple.png
   :width: 400px

One of the upstream routers is connected to ``of:205`` and the other is
connected to ``of:206``. Note that ``of:205`` and ``of:206`` are paired ToRs.

The ToRs are connected via a physical port to separate Quagga VMs or
containers. These Quagga instances can be placed in any compute node. They do
not need to be in the same server, and are only shown to be co-located for
simplicity.

The two Quagga instances do NOT talk to each other.

Switch port configuration
"""""""""""""""""""""""""

The ToRs follow the same rules as the single-router case described in
:doc:`External Connectivity <external-connectivity>`. In the example shown
above, the switch port config would look like this:

.. code-block:: json

    {
        "ports": {
            "of:0000000000000205/1" : {
                "interfaces" : [{
                    "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
                    "vlan-untagged": 100,
                    "name" : "internet-router-1"
                }]
            },

            "of:0000000000000205/48" : {
                "interfaces" : [{
                    "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
                    "vlan-untagged": 100,
                    "name" : "quagga-1"
                }]
            },

            "of:0000000000000206/1" : {
                "interfaces" : [{
                    "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
                    "vlan-untagged": 200,
                    "name" : "internet-router-2"
                }]
            },

            "of:0000000000000206/48" : {
                "interfaces" : [{
                    "ips" : [ "10.0.200.3/29", "2000::6503/125" ],
                    "vlan-untagged": 200,
                    "name" : "quagga2"
                }]
            }
        }
    }

.. note::
   In the example shown above, switch ``of:205`` uses ``VLAN 100`` for
   bridging the peering session between Quagga1 and ExtRouter1, while switch
   ``of:206`` uses ``VLAN 200`` to do the same for the other peering session.
   But since these VLANs and bridging domains are defined on different
   switches, the VLAN IDs could have been the same.

   This philosophy is consistent with the fabric's use of :doc:`bridging
   <../../configuration/network>`.

Quagga configuration
""""""""""""""""""""
Configuring Quagga for dual external routers is similar to what we described
in :doc:`External Connectivity <external-connectivity>`. However, it is worth
noting that:

- The two Zebra instances **should point to two different ONOS instances** for
  their FPM connections. For example, Zebra in Quagga1 could point to an ONOS
  instance with ``fpm connection ip 10.6.0.1 port 2620``, while the other Zebra
  should point to a different ONOS instance with ``fpm connection ip 10.6.0.2
  port 2620``. It does not matter which ONOS instances they point to as long
  as they are different.

- The two Quagga BGP sessions should appear to come from different routers but
  still use the same AS number, i.e. the two Quagga instances belong to the
  same AS, the one used to represent the entire SD-Fabric.

- The two upstream routers can belong to the same or different AS, but these AS
  numbers should be different from the one used to represent the SD-Fabric AS.

- Typically both Quagga instances advertise the same routes to the upstream.
  These prefixes, which belong to various infrastructure nodes in the
  deployment, should be reachable from either of the leaf switches connected
  to the upstream routers.

- The upstream routers may or may not advertise the same routes. SD-Fabric will
  ensure that traffic directed to a route reachable via only one upstream
  router is directed to the appropriate leaf.

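The first three rules in the list above can be expressed as a small
consistency check. This is only a sketch: the dictionaries below are a made-up
summary format rather than anything Quagga or ONOS produces, and the AS numbers
and router IDs are placeholder values chosen for illustration. Only the two FPM
addresses come from the text above.

.. code-block:: python

    # Hypothetical per-instance summaries. The AS numbers and router IDs are
    # placeholders; the FPM targets are the ones mentioned in the list above.
    QUAGGA1 = {"fpm": ("10.6.0.1", 2620), "local_as": 65001,
               "router_id": "10.0.100.1", "upstream_as": 65002}
    QUAGGA2 = {"fpm": ("10.6.0.2", 2620), "local_as": 65001,
               "router_id": "10.0.200.1", "upstream_as": 65003}


    def check_quagga_pair(q1, q2):
        """Return a list of rule violations for the two Quagga instances."""
        problems = []
        if q1["fpm"] == q2["fpm"]:
            problems.append("both Zebra instances point to the same ONOS instance")
        if q1["local_as"] != q2["local_as"]:
            problems.append("the Quagga instances use different AS numbers")
        if q1["router_id"] == q2["router_id"]:
            problems.append("the BGP sessions appear to come from the same router")
        fabric_as = {q1["local_as"], q2["local_as"]}
        for q in (q1, q2):
            if q["upstream_as"] in fabric_as:
                problems.append(f"upstream AS {q['upstream_as']} collides with the fabric AS")
        return problems


    print(check_quagga_pair(QUAGGA1, QUAGGA2))

The route-advertisement rules in the last two bullets depend on live BGP state
and are therefore outside the scope of a static check like this one.
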
ToR connects to both upstream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now let's consider the **more complicated but more fault-tolerant** case of each
Quagga instance peering with BOTH external routers. Again the logical view is
shown on the left and the physical view on the right.

.. image:: ../../images/config-dh-vr-logical.png
   :width: 200px

.. image:: ../../images/config-dh-vr-physical.png
   :width: 500px

First, let's talk about the physical connectivity:

- Quagga instance 1 peers with external router R1 via port 1 on switch of:205
- Quagga instance 1 peers with external router R2 via port 2 on switch of:205

Similarly:

- Quagga instance 2 peers with external router R1 via port 2 on switch of:206
- Quagga instance 2 peers with external router R2 via port 1 on switch of:206

To distinguish between the two peering sessions in the same physical switch,
say of:205, the physical ports 1 and 2 need to be configured in **different
VLANs and subnets**. For example, port 1 on of:205 is (untagged) in VLAN 100,
while port 2 is in VLAN 101.

Note that peering for **Quagga1 and R1** happens with IPs in the
``10.0.100.0/29`` subnet, and for **Quagga1 and R2** in the ``10.0.101.0/29``
subnet.

447
448Furthermore, **pair link** (port 48) on of:205 carries both peering sessions to
449Quagga1. Thus port 48 should now be configured as a **trunk port (vlan-tagged)
450with both VLANs and both subnets**.
451
452Finally the **Quagga interface** on the VM now needs **sub-interface
453configuration for each VLAN ID**.
454
455Similar configuration concepts apply to IPv6 as well. Here is a look at the
456switch port config in ONOS for of:205
457
458.. code-block:: json
459
460 {
461 "ports": {
462 "of:0000000000000205/1" : {
463 "interfaces" : [{
464 "ips" : [ "10.0.100.3/29", "2000::6403/125" ],
465 "vlan-untagged": 100,
466 "name" : "internet-router1"
467 }]
468 },
469
470
471 "of:0000000000000205/2" : {
472 "interfaces" : [{
473 "ips" : [ "10.0.101.3/29", "2000::7403/125" ],
474 "vlan-untagged": 101,
475 "name" : "internet-router2"
476 }]
477 },
478 "of:0000000000000205/48" : {
479 "interfaces" : [{
480 "ips" : [ "10.0.100.3/29", "2000::6403/125", "10.0.101.3/29", "2000::7403/125" ],
481 "vlan-tagged": [100, 101],
482 "name" : "quagga1"
483 }]
484
485 }
486 }
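The constraints in this section (distinct VLANs and subnets per peering port,
and a trunk port that carries all of them) can be verified with Python's
standard ``ipaddress`` module. The following is a minimal sketch under stated
assumptions: ``check_peering_ports`` is a hypothetical helper, each port is
assumed to have a single interface entry, and ``PORTS`` restates the of:205
config from the example above.

.. code-block:: python

    import ipaddress

    # The of:205 port config from the example above.
    PORTS = {
        "of:0000000000000205/1": {"interfaces": [{
            "ips": ["10.0.100.3/29", "2000::6403/125"], "vlan-untagged": 100}]},
        "of:0000000000000205/2": {"interfaces": [{
            "ips": ["10.0.101.3/29", "2000::7403/125"], "vlan-untagged": 101}]},
        "of:0000000000000205/48": {"interfaces": [{
            "ips": ["10.0.100.3/29", "2000::6403/125",
                    "10.0.101.3/29", "2000::7403/125"],
            "vlan-tagged": [100, 101]}]},
    }


    def check_peering_ports(ports, peering_ports, trunk_port):
        """Return a list of problems with the peering and trunk port config."""
        problems = []
        trunk = ports[trunk_port]["interfaces"][0]
        seen_vlans = set()
        seen_nets = []  # (port id, network) pairs from ports already visited
        for port_id in peering_ports:
            intf = ports[port_id]["interfaces"][0]
            vlan = intf["vlan-untagged"]
            if vlan in seen_vlans:
                problems.append(f"{port_id}: VLAN {vlan} is reused")
            seen_vlans.add(vlan)
            if vlan not in trunk.get("vlan-tagged", []):
                problems.append(f"{port_id}: VLAN {vlan} missing on trunk port")
            for ip in intf["ips"]:
                net = ipaddress.ip_interface(ip).network
                for other_port, other in seen_nets:
                    same_family = net.version == other.version
                    if other_port != port_id and same_family and net.overlaps(other):
                        problems.append(f"{port_id}: {net} overlaps {other}")
                seen_nets.append((port_id, net))
                if ip not in trunk.get("ips", []):
                    problems.append(f"{port_id}: {ip} missing on trunk port")
        return problems


    print(check_peering_ports(
        PORTS,
        ["of:0000000000000205/1", "of:0000000000000205/2"],
        "of:0000000000000205/48",
    ))

The same check would apply to of:206 once its port config is written out in
the same way.
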
487 }