.. SPDX-FileCopyrightText: 2021 Open Networking Foundation <info@opennetworking.org>
.. SPDX-License-Identifier: Apache-2.0

.. _deployment_guide:

Deployment Guide
================

Switch Hardware Selection
-------------------------
We have verified, and therefore recommend, the switch model listed in :ref:`verified_switch`.
Other Stratum-enabled switches listed in :ref:`all_switch` should also work in theory,
but may require more integration work.

To use the P4 UPF, you must use fabric switches based on the `Intel (formerly Barefoot) Tofino chipset
<https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernet-switch/tofino-series.html>`_.
There are two variants of this switching chipset, with different resources and capabilities.
The **Dual Pipe** Tofino ASIC is less expensive,
while the **Quad Pipe** Tofino ASIC has more chip resources and a faster embedded system with more memory and storage.

The P4 UPF and SD-Fabric features fit within the resource constraints of the Dual Pipe
system for production deployments, but the larger capacity of the Quad Pipe is
preferable when developing new P4 features.

These switches feature 32 QSFP28 ports capable of running in 100GbE, 40GbE, or
4x 10GbE mode (using a breakout DAC or fiber cable) and have a 1GbE management
network interface.

See also :ref:`Rackmount of Equipment
<aether:edge_deployment/site_planning:rackmount of equipment>` for how the fabric
switches should be rack-mounted to ensure proper airflow within a rack.

Deployment Overview
-------------------
SD-Fabric is released as a Helm chart and a set of container images.
We recommend using **Kubernetes** and **Helm** to deploy SD-Fabric.
Here is a list of the high-level steps required to deploy SD-Fabric:

1. **Provision switches**

   Install an operating system with Docker and Kubernetes on the bare-metal switches.

2. **Prepare switches as special Kubernetes nodes**

   Kubernetes ``label`` and ``taint`` are used to configure switches as special Kubernetes worker nodes.
   This ensures that Stratum (and only Stratum) is deployed on the switches.

3. **Prepare ONOS network configuration**

   The network configuration defines properties such as the switch pipeconf, subnets, and VLANs.

4. **Prepare Stratum chassis configuration for each switch**

   The chassis configuration defines switch properties such as port speed and breakout.

5. **Install SD-Fabric** using Helm

   Finally, install SD-Fabric with the information prepared in Steps 1 to 4.

Step 1: Provision Switches
--------------------------

We use the Open Network Install Environment (ONIE) to install the Open Network Linux (ONL) image on the switches.
To work with SD-Fabric, we have customized the ONL image to include the required packages and dependencies.

The image source code is available in the ONF repository `opennetworkinglab/OpenNetworkLinux <https://github.com/opennetworkinglab/OpenNetworkLinux>`_.
You can also download pre-compiled artifacts from the `GitHub Release page <https://github.com/opennetworkinglab/OpenNetworkLinux/releases>`_.

.. note::
   If you are not familiar with the ONIE/ONL environment, see `Getting Started <https://github.com/opencomputeproject/OpenNetworkLinux/blob/master/docs/GettingStarted.md>`_
   for how to install an ONL image on an ONIE-supported switch.

The following example shows how to install the ONL image.

1. Prepare a server that is reachable from the switch, download the
   pre-compiled installer from the release page, and serve it over HTTP.

   .. code-block::

      # Save the installer as "onie-installer", the file name requested by
      # onie-nos-install in step 3 below (-O sets the output file; -o would set the log file)
      wget https://github.com/opennetworkinglab/OpenNetworkLinux/releases/download/v1.4.3/ONL-onf-ONLPv2_ONL-OS_2021-07-16.2159-5195444_AMD64_INSTALLED_INSTALLER -O onie-installer
      # Serve the current directory over HTTP on port 80
      sudo python3 -m http.server 80

2. Reboot the switch to enter ONIE installation mode.

   To reinstall an ONL image, you must switch the ONIE bootloader to
   "Rescue Mode".

   Once the switch is powered on, its OpenBMC interface should obtain an IP
   address via DHCP. Here we use ``10.0.0.131`` as an example.
   OpenBMC uses these default credentials:

   .. code-block::

      username: root
      password: 0penBmc

   Log in to OpenBMC with SSH:

   .. code-block::

      $ ssh root@10.0.0.131
      The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
      ECDSA key fingerprint is SHA256:...
      Are you sure you want to continue connecting (yes/no)? yes
      Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
      root@10.0.0.131's password:
      root@bmc:~#

   Use the Serial-over-LAN (SOL) console to enter ONL:

   .. code-block::

      root@bmc:~# /usr/local/bin/sol.sh
      You are in SOL session.
      Use ctrl-x to quit.
      -----------------------

      root@onl:~#

   .. note::

      If ``sol.sh`` is unresponsive, try restarting the mainboard from the OpenBMC shell:

      .. code-block::

         root@bmc:~# wedge_power.sh reset

   Change the boot mode to rescue mode and reboot:

   .. code-block::

      root@onl:~# onl-onie-boot-mode rescue
      [1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
      [1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
      [1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
      The system will boot into ONIE rescue mode at the next restart.

      root@onl:~# reboot

   At this point, ONL will go through its shutdown sequence and ONIE will start.
   If ONIE does not start right away, press the Enter/Return key a few times; it
   may show a boot selection screen. Pick ``ONIE`` and ``Rescue`` if given a
   choice.

3. Run the ONL installer

   Now that the switch is in Rescue mode, run the ``onie-nos-install`` command
   with the URL of the management server on the management network segment
   (here we use ``10.0.0.129`` as an example):

   .. code-block::

      ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
      discover: Rescue mode detected. No discover stopped.
      ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
      Info: Fetching http://10.0.0.129/onie-installer ...
      Connecting to 10.0.0.129 (10.0.0.129:80)
      installer            100% |*******************************|   322M  0:00:00 ETA
      ONIE: Executing installer: http://10.0.0.129/onie-installer
      installer: computing checksum of original archive
      installer: checksum is OK
      ...

   The installation starts, and ONL then boots, ending with a login prompt:

   .. code-block::

      Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9

      localhost login:

   The default ONL login is::

      username: root
      password: onl

   After logging in, you can verify that the switch obtained its IP address via DHCP:

   .. code-block::

      root@localhost:~# ip addr
      ...
      3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
          link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
          inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
      ...

4. (Optional) Set up the switch IP address and hostname after the installation if DHCP is not available.

.. warning::

   If you came here from the Aether documentation, stop and return to
   :ref:`Post-ONL configuration <aether:edge_deployment/fabric_switch_bootstrap:post-onl configuration>`
   to continue the remaining steps there.
   Otherwise, continue with the rest of this page.
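
If DHCP is not available on the management network (item 4 above), the management
address and hostname have to be set by hand on the switch. The following is a minimal
sketch, assuming the management interface is ``ma1`` as in the ``ip addr`` output above.
The helper names ``mgmt_addr`` and ``set_static_mgmt`` are hypothetical, and the
changes they make with ``ip`` and ``hostname`` do not survive a reboot, so persist
them with your usual ONL/Debian network configuration once they work.

.. code-block:: shell

   # Hypothetical helpers; source this file in a root shell on the switch.
   # NOTE: changes made with `ip` and `hostname` are not persistent across reboots.

   # Print the first IPv4 ADDR/PREFIX assigned to an interface (default: ma1),
   # or nothing if the interface has no IPv4 address.
   mgmt_addr() {
     local dev="${1:-ma1}"
     # With -o, each address is printed on one line; field 4 is ADDR/PREFIX
     ip -4 -o addr show dev "$dev" | awk '{ print $4; exit }'
   }

   # Assign a static address, a default route, and a hostname.
   # Arguments: CIDR GATEWAY HOSTNAME [DEVICE]
   set_static_mgmt() {
     local cidr="$1" gateway="$2" name="$3" dev="${4:-ma1}"
     ip addr add "$cidr" dev "$dev"
     ip link set dev "$dev" up
     ip route replace default via "$gateway" dev "$dev"
     hostname "$name"
   }

For example, ``set_static_mgmt 10.0.0.130/25 10.0.0.254 leaf1`` would assign the
address shown in the DHCP example above; the gateway ``10.0.0.254`` and the hostname
``leaf1`` are placeholder values. Running ``mgmt_addr`` afterwards should print
``10.0.0.130/25``.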

Step 2: Configure switches as special Kubernetes nodes
------------------------------------------------------

Our `ONL <https://github.com/opennetworkinglab/OpenNetworkLinux>`_ build
includes all the packages required to run Kubernetes on the switch.
Once Kubernetes is ready, the `Stratum <https://opennetworking.org/stratum/>`_ application is deployed to each switch to manage it.

Unlike servers, switches have limited CPU and memory resources, so we should avoid
deploying unnecessary workloads on them.
In addition, the Stratum application should be deployed to every switch, and only to switches.

To achieve these goals, apply the following to your Kubernetes cluster:

1. Add a label to every switch node, e.g. ``node-role.kubernetes.io=switch``.
2. Add a taint with the ``NoSchedule`` effect to every switch node, e.g. ``node-role.kubernetes.io=switch:NoSchedule``.
3. Configure the ``nodeSelector`` and ``tolerations`` properly when deploying Stratum via a DaemonSet.
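
Items 1 and 2 can be applied with standard ``kubectl`` commands. A minimal sketch,
assuming the switches are registered as Kubernetes nodes named ``leaf1`` and ``leaf2``
as in the example below; the helper name ``mark_switch_node`` is hypothetical.
``--overwrite`` lets the commands be re-run safely.

.. code-block:: shell

   # Hypothetical helper: label and taint one switch node.
   mark_switch_node() {
     local node="$1"
     # Item 1: label that the Stratum DaemonSet selects with its nodeSelector
     kubectl label node "$node" node-role.kubernetes.io=switch --overwrite
     # Item 2: taint that keeps away every pod without a matching toleration
     kubectl taint node "$node" node-role.kubernetes.io=switch:NoSchedule --overwrite
   }

   # Usage (assumed node names):
   # for n in leaf1 leaf2; do mark_switch_node "$n"; done

For item 3, the Stratum pod template needs a ``nodeSelector`` of
``node-role.kubernetes.io: switch`` and a toleration with ``key: node-role.kubernetes.io``,
``operator: Equal``, ``value: switch``, and ``effect: NoSchedule``. Check whether the
SD-Fabric Helm chart already sets these before adding them yourself.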

Example of a five-node Kubernetes cluster with two switches and three servers:

.. code-block::

   $ kubectl get node -o custom-columns=NAME:.metadata.name,TAINT:.spec.taints
   NAME       TAINT
   compute1   <none>
   compute2   <none>
   compute3   <none>
   leaf1      [map[effect:NoSchedule key:node-role.kubernetes.io value:switch]]
   leaf2      [map[effect:NoSchedule key:node-role.kubernetes.io value:switch]]
   $ kubectl get nodes -l node-role.kubernetes.io=switch
   NAME    STATUS   ROLES    AGE   VERSION
   leaf1   Ready    worker   27d   v1.18.8
   leaf2   Ready    worker   27d   v1.18.8
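
Rather than checking this output by eye, you can script the same check. The sketch
below uses only ``kubectl get`` with a label selector and ``jsonpath`` output; the
function name ``check_switch_taints`` is hypothetical. It prints one line per labeled
switch node and returns non-zero if any of them lacks the ``NoSchedule`` taint.

.. code-block:: shell

   # Hypothetical check: every node labeled node-role.kubernetes.io=switch must
   # carry the node-role.kubernetes.io=switch:NoSchedule taint.
   check_switch_taints() {
     local rc=0 node taints
     for node in $(kubectl get nodes -l node-role.kubernetes.io=switch \
                     -o jsonpath='{.items[*].metadata.name}'); do
       # One "key=value:effect" line per taint on the node
       taints=$(kubectl get node "$node" \
         -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}')
       if printf '%s\n' "$taints" | grep -qxF 'node-role.kubernetes.io=switch:NoSchedule'; then
         echo "OK      $node"
       else
         echo "MISSING $node"
         rc=1
       fi
     done
     return "$rc"
   }

On the cluster shown above, this would print an ``OK`` line for both ``leaf1`` and
``leaf2`` and return zero.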

Step 3: Prepare ONOS network configuration
------------------------------------------
See :ref:`onos_network_config` for instructions.

Step 4: Prepare Stratum chassis configuration
---------------------------------------------
See :ref:`stratum_chassis_config` for instructions.

.. _install_sd_fabric:

Step 5: Install SD-Fabric with Helm
-----------------------------------

To install SD-Fabric into your Kubernetes cluster, follow the instructions
in the `SD-Fabric Helm Chart README <https://gerrit.opencord.org/plugins/gitiles/sdfabric-helm-charts/+/HEAD/sdfabric/README.md>`_.