..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Bootstrapping
=============

.. _switch-install:

OS Installation - Switches
--------------------------

The installation of the ONL OS image on the fabric switches uses the DHCP and
HTTP server set up on the management server.

The default image is downloaded during that installation process by the
``onieboot`` role. Make changes to that role and rerun the management playbook
to download a newer switch image.

Preparation
"""""""""""

The switches have a single Ethernet port that is shared between OpenBMC and
ONL. Find the MAC addresses of both the OpenBMC and ONL interfaces and enter
them into NetBox.

Change boot mode to ONIE Rescue mode
""""""""""""""""""""""""""""""""""""

In order to reinstall an ONL image, you must switch the ONIE bootloader to
"Rescue Mode".

Once the switch is powered on, it should retrieve an IP address on the OpenBMC
interface with DHCP. OpenBMC uses these default credentials::

  username: root
  password: 0penBmc

Log in to OpenBMC with SSH::

  $ ssh root@10.0.0.131
  The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
  ECDSA key fingerprint is SHA256:...
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
  root@10.0.0.131's password:
  root@bmc:~#

Using the Serial-over-LAN Console, enter ONL::

  root@bmc:~# /usr/local/bin/sol.sh
  You are in SOL session.
  Use ctrl-x to quit.
  -----------------------

  root@onl:~#

.. note::
   If ``sol.sh`` is unresponsive, try restarting the mainboard from the
   OpenBMC shell with::

     root@bmc:~# wedge_power.sh restart

Change the boot mode to rescue mode with the command ``onl-onie-boot-mode
rescue``, and reboot::

  root@onl:~# onl-onie-boot-mode rescue
  [1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
  [1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
  [1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
  The system will boot into ONIE rescue mode at the next restart.
  root@onl:~# reboot

At this point, ONL will go through its shutdown sequence and ONIE will start.
If it does not start right away, press the Enter/Return key a few times - it
may show you a boot selection screen. Pick ``ONIE`` and ``Rescue`` if given a
choice.

Installing an ONL image over HTTP
"""""""""""""""""""""""""""""""""

Now that the switch is in Rescue mode, activate the console by pressing Enter::

  discover: Rescue mode detected. Installer disabled.

  Please press Enter to activate this console.
  To check the install status inspect /var/log/onie.log.
  Try this: tail -f /var/log/onie.log

  ** Rescue Mode Enabled **
  ONIE:/ #

Then run the ``onie-nos-install`` command, with the URL of the management
server on the management network segment::

  ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
  discover: Rescue mode detected. No discover stopped.
  ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
  Info: Fetching http://10.0.0.129/onie-installer ...
  Connecting to 10.0.0.129 (10.0.0.129:80)
  installer            100% |*******************************|   322M  0:00:00 ETA
  ONIE: Executing installer: http://10.0.0.129/onie-installer
  installer: computing checksum of original archive
  installer: checksum is OK
  ...

The installation will now start, and then ONL will boot, culminating in::

  Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9

  localhost login:

The default ONL login is::

  username: root
  password: onl

If you log in, you can verify that the switch is getting its IP address via
DHCP::

  root@localhost:~# ip addr
  ...
  3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
      inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
  ...

Post-ONL Configuration
""""""""""""""""""""""

A ``terraform`` user must be created on the switches to allow them to be
configured.

This is done using Ansible. Verify that your inventory (created earlier from the
``inventory/example-aether.ini`` file) includes an ``[aetherfabric]`` section
that lists the names and IP addresses of all the fabric switches.

Then run a ping test::

  ansible -i inventory/sitename.ini -m ping aetherfabric

This may fail with the error::

  "msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this. Please add this host's fingerprint to your known_hosts file to manage this host."

Comment out the ``ansible_ssh_pass="onl"`` line, then rerun the ping test. It
may ask you to confirm the host keys - answer ``yes`` for each host to trust the
keys::

  The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
  ECDSA key fingerprint is SHA256:...
  Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

Once you've trusted the host keys, the ping test should succeed::

  spine1.role1.site | SUCCESS => {
      "changed": false,
      "ping": "pong"
  }
  leaf1.role1.site | SUCCESS => {
      "changed": false,
      "ping": "pong"
  }
  ...

Then run the playbook to create the ``terraform`` user::

  ansible-playbook -i inventory/sitename.ini playbooks/aetherfabric-playbook.yml

Once completed, the switch should now be ready for TOST runtime install.

VPN
---

This section walks you through how to set up a VPN between ACE and Aether Central in GCP.
We will use the GitOps-based Aether CD pipeline for this,
so we only need to create a patch to the **aether-pod-configs** repository.
Note that some of the steps described here are not directly related to setting up a VPN,
but rather are prerequisites for adding a new ACE.

.. attention::

   If you are adding another ACE to an existing VPN connection, go to
   :ref:`Add ACE to an existing VPN connection <add_ace_to_vpn>`

Before you begin
""""""""""""""""

* Make sure the firewall in front of ACE allows UDP port 500, UDP port 4500, and ESP packets
  from **gcpvpn1.infra.aetherproject.net (35.242.47.15)** and **gcpvpn2.infra.aetherproject.net (34.104.68.78)**
* Make sure that the external IP on the ACE side is owned by or routed to the management node
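
The firewall requirement above can be sketched as a set of ``iptables`` rules.
This is only an illustration: it assumes the firewall is a Linux host managed
with ``iptables``, and the function name ``print_gcp_vpn_rules`` and the default
``INPUT`` chain are hypothetical choices, not part of the Aether tooling. The
function prints the rules for review instead of applying them.

.. code-block:: shell

   # Print (do not apply) rules admitting IKE (UDP 500), NAT-T (UDP 4500),
   # and ESP traffic from the two GCP VPN gateways listed above.
   print_gcp_vpn_rules() {
     chain="${1:-INPUT}"  # assumption: pass FORWARD for a routing firewall
     for peer in 35.242.47.15 34.104.68.78; do
       echo "iptables -A $chain -s $peer -p udp --dport 500 -j ACCEPT"
       echo "iptables -A $chain -s $peer -p udp --dport 4500 -j ACCEPT"
       echo "iptables -A $chain -s $peer -p esp -j ACCEPT"
     done
   }

   print_gcp_vpn_rules

Review the printed rules and adapt them to the firewall product actually in use
before applying anything.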

To help your understanding, the following sample ACE environment will be used in the rest of this section.
Make sure to replace the sample values when you actually create a review request.

+-----------------------------+----------------------------------+
| Management node external IP | 128.105.144.189                  |
+-----------------------------+----------------------------------+
| ASN                         | 65003                            |
+-----------------------------+----------------------------------+
| GCP BGP IP address          | Tunnel 1: 169.254.0.9/30         |
|                             +----------------------------------+
|                             | Tunnel 2: 169.254.1.9/30         |
+-----------------------------+----------------------------------+
| ACE BGP IP address          | Tunnel 1: 169.254.0.10/30        |
|                             +----------------------------------+
|                             | Tunnel 2: 169.254.1.10/30        |
+-----------------------------+----------------------------------+
| PSK                         | UMAoZA7blv6gd3IaArDqgK2s0sDB8mlI |
+-----------------------------+----------------------------------+
| Management Subnet           | 10.91.0.0/24                     |
+-----------------------------+----------------------------------+
| K8S Subnet                  | Pod IP: 10.66.0.0/17             |
|                             +----------------------------------+
|                             | Cluster IP: 10.66.128.0/17       |
+-----------------------------+----------------------------------+
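
Before submitting your own values, it can help to check that they are
consistent with each other, for example that each ACE BGP address falls inside
the same /30 as its GCP counterpart. The following is a minimal sketch using
plain shell arithmetic; the helper names ``ip_to_int`` and ``in_cidr`` are
hypothetical and not part of the Aether tooling.

.. code-block:: shell

   # Convert a dotted-quad IPv4 address to an integer.
   ip_to_int() {
     old_ifs="$IFS"; IFS=.
     set -- $1            # split the address into four octets
     IFS="$old_ifs"
     echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
   }

   # Succeed if address $1 lies inside the block given by $2 (address/prefix).
   in_cidr() {
     net="${2%/*}"; bits="${2#*/}"
     mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
     [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
   }

   # Checks against the sample values in the table above
   in_cidr 169.254.0.10 169.254.0.9/30 && echo "tunnel 1 BGP addresses share a /30"
   in_cidr 169.254.1.10 169.254.1.9/30 && echo "tunnel 2 BGP addresses share a /30"
   in_cidr 10.66.0.1 10.66.128.0/17 || echo "pod and cluster IP ranges are distinct"

Replace the sample addresses with your own before running the checks.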

Download aether-pod-configs repository
""""""""""""""""""""""""""""""""""""""

.. code-block:: shell

   $ cd $WORKDIR
   $ git clone "ssh://[username]@gerrit.opencord.org:29418/aether-pod-configs"

.. _update_global_resource:

Update global resource maps
"""""""""""""""""""""""""""

Add the new ACE information at the end of the following global resource maps.

* user_map.tfvars
* cluster_map.tfvars
* vpn_map.tfvars

As a note, you can find several other global resource maps under the `production` directory.
Resource definitions that need to be shared among clusters or are better managed in a
single file to avoid configuration conflicts are maintained in this way.

.. code-block:: diff

   $ cd $WORKDIR/aether-pod-configs/production
   $ vi user_map.tfvars

   # Add the new cluster admin user at the end of the map
   $ git diff user_map.tfvars
   --- a/production/user_map.tfvars
   +++ b/production/user_map.tfvars
   @@ user_map = {
        username = "menlo"
        password = "changeme"
        global_roles = ["user-base", "catalogs-use"]
   +  },
   +  test_admin = {
   +    username = "test"
   +    password = "changeme"
   +    global_roles = ["user-base", "catalogs-use"]
      }
    }

.. code-block:: diff

   $ cd $WORKDIR/aether-pod-configs/production
   $ vi cluster_map.tfvars

   # Add the new K8S cluster information at the end of the map
   $ git diff cluster_map.tfvars
   --- a/production/cluster_map.tfvars
   +++ b/production/cluster_map.tfvars
   @@ cluster_map = {
          kube_dns_cluster_ip = "10.53.128.10"
          cluster_domain = "prd.menlo.aetherproject.net"
          calico_ip_detect_method = "can-reach=www.google.com"
   +    },
   +    ace-test = {
   +      cluster_name = "ace-test"
   +      management_subnets = ["10.91.0.0/24"]
   +      k8s_version = "v1.18.8-rancher1-1"
   +      k8s_pod_range = "10.66.0.0/17"
   +      k8s_cluster_ip_range = "10.66.128.0/17"
   +      kube_dns_cluster_ip = "10.66.128.10"
   +      cluster_domain = "prd.test.aetherproject.net"
   +      calico_ip_detect_method = "can-reach=www.google.com"
        }
      }
    }

.. code-block:: diff

   $ cd $WORKDIR/aether-pod-configs/production
   $ vi vpn_map.tfvars

   # Add VPN and tunnel information at the end of the map
   $ git diff vpn_map.tfvars
   --- a/production/vpn_map.tfvars
   +++ b/production/vpn_map.tfvars
   @@ vpn_map = {
        bgp_peer_ip_address_1 = "169.254.0.6"
        bgp_peer_ip_range_2 = "169.254.1.5/30"
        bgp_peer_ip_address_2 = "169.254.1.6"
   +  },
   +  ace-test = {
   +    peer_name = "production-ace-test"
   +    peer_vpn_gateway_address = "128.105.144.189"
   +    tunnel_shared_secret = "UMAoZA7blv6gd3IaArDqgK2s0sDB8mlI"
   +    bgp_peer_asn = "65003"
   +    bgp_peer_ip_range_1 = "169.254.0.9/30"
   +    bgp_peer_ip_address_1 = "169.254.0.10"
   +    bgp_peer_ip_range_2 = "169.254.1.9/30"
   +    bgp_peer_ip_address_2 = "169.254.1.10"
      }
    }

.. note::
   Unless you have a specific requirement, set ASN and BGP addresses to the next available values in the map.
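
Picking the "next available" BGP addresses can be sketched as follows. This
assumes the map follows the pattern visible in the sample diff, where each
entry takes the next /30 block in the same /24 (``169.254.0.5/30``, then
``169.254.0.9/30``) and the peer address is the range address plus one. The
function name ``next_bgp_range`` is hypothetical; always confirm the result
against the actual contents of ``vpn_map.tfvars``.

.. code-block:: shell

   # Given the last bgp_peer_ip_range in the map (e.g. 169.254.0.9/30),
   # print the next /30 range followed by its peer address.
   next_bgp_range() {
     addr="${1%/*}"
     prefix="${addr%.*}"   # first three octets
     last="${addr##*.}"    # fourth octet of the last range in the map
     next=$(( last + 4 ))  # a /30 block spans four addresses
     if [ "$next" -gt 253 ]; then
       echo "no /30 block left in ${prefix}.0/24" >&2
       return 1
     fi
     echo "${prefix}.${next}/30 ${prefix}.$(( next + 1 ))"
   }

   next_bgp_range 169.254.0.9/30
   next_bgp_range 169.254.1.9/30

The first printed field is a candidate for ``bgp_peer_ip_range_N`` and the
second for ``bgp_peer_ip_address_N``.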

Create ACE specific configurations
""""""""""""""""""""""""""""""""""

In this step, we will create a directory under `production` with the same name as the ACE,
and add the Terraform configurations and Ansible inventory needed to configure a VPN connection.
Throughout the deployment procedure, this directory will contain all ACE specific configurations.

Run the following commands to auto-generate the necessary files under the target ACE directory.

.. code-block:: shell

   $ cd $WORKDIR/aether-pod-configs/tools
   $ cp ace_env /tmp/ace_env
   $ vi /tmp/ace_env
   # Set environment variables

   $ source /tmp/ace_env
   $ make vpn
   Created ../production/ace-test
   Created ../production/ace-test/main.tf
   Created ../production/ace-test/variables.tf
   Created ../production/ace-test/gcp_fw.tf
   Created ../production/ace-test/gcp_ha_vpn.tf
   Created ../production/ace-test/ansible
   Created ../production/ace-test/backend.tf
   Created ../production/ace-test/cluster_val.tfvars
   Created ../production/ace-test/ansible/hosts.ini
   Created ../production/ace-test/ansible/extra_vars.yml

.. attention::
   The predefined templates are tailored to Pronto BOM. You'll need to fix `cluster_val.tfvars` and `ansible/extra_vars.yml`
   when using a different BOM.

Create a review request
"""""""""""""""""""""""

.. code-block:: shell

   $ cd $WORKDIR/aether-pod-configs/production
   $ git status
   On branch tools
   Changes not staged for commit:

     modified:   cluster_map.tfvars
     modified:   user_map.tfvars
     modified:   vpn_map.tfvars

   Untracked files:
     (use "git add <file>..." to include in what will be committed)

     ace-test/

   $ git add .
   $ git commit -m "Add test ACE"
   $ git review

Once the review request is accepted and merged,
the CD pipeline will create VPN tunnels on both GCP and the management node.

Verify VPN connection
"""""""""""""""""""""

You can verify the VPN connections after a successful post-merge job
by checking the routing table on the management node and pinging one of the central cluster VMs.
Make sure that two tunnel interfaces, `gcp_tunnel1` and `gcp_tunnel2`, exist
and that there are three additional routing entries via one of the tunnel interfaces.

.. code-block:: shell

   # Verify routes
   $ netstat -rn
   Kernel IP routing table
   Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
   0.0.0.0         128.105.144.1   0.0.0.0         UG        0 0          0 eno1
   10.45.128.0     169.254.0.9     255.255.128.0   UG        0 0          0 gcp_tunnel1
   10.52.128.0     169.254.0.9     255.255.128.0   UG        0 0          0 gcp_tunnel1
   10.66.128.0     10.91.0.8       255.255.128.0   UG        0 0          0 eno1
   10.91.0.0       0.0.0.0         255.255.255.0   U         0 0          0 eno1
   10.168.0.0      169.254.0.9     255.255.240.0   UG        0 0          0 gcp_tunnel1
   128.105.144.0   0.0.0.0         255.255.252.0   U         0 0          0 eno1
   169.254.0.8     0.0.0.0         255.255.255.252 U         0 0          0 gcp_tunnel1
   169.254.1.8     0.0.0.0         255.255.255.252 U         0 0          0 gcp_tunnel2

   # Verify ACC VM access
   $ ping 10.168.0.6

   # Verify ACC K8S cluster access
   $ nslookup kube-dns.kube-system.svc.prd.acc.gcp.aetherproject.net 10.52.128.10

You can further verify that the ACE routes are propagated correctly to GCP
by checking the GCP dashboard under **VPC Network > Routes > Dynamic**.
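
The routing-table check above can also be scripted. The following is a small
sketch that counts the gateway routes using either tunnel interface in
``netstat -rn`` output; the function name ``count_tunnel_routes`` is
hypothetical, and the column positions assume the Linux ``netstat`` layout
shown above.

.. code-block:: shell

   # Read `netstat -rn` output on stdin and count the routes that go through
   # a gateway (flag G) on gcp_tunnel1 or gcp_tunnel2.
   count_tunnel_routes() {
     awk '$NF ~ /^gcp_tunnel[12]$/ && $4 ~ /G/ { n++ } END { print n + 0 }'
   }

   # Usage on the management node:
   #   netstat -rn | count_tunnel_routes

With the sample routing table above, the three ``UG`` entries on `gcp_tunnel1`
are the ones counted; the directly connected /30 entries are ignored.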

Post VPN setup
""""""""""""""

Once you have verified the VPN connections, rename the `ansible` directory to `_ansible` to prevent
the Ansible playbook from running again.
Re-running the playbook does no harm, but it is not recommended.

.. code-block:: shell

   $ cd $WORKDIR/aether-pod-configs/production/$ACE_NAME
   $ mv ansible _ansible
   $ git add .
   $ git commit -m "Mark ansible done for test ACE"
   $ git review

.. _add_ace_to_vpn:

Add another ACE to an existing VPN connection
"""""""""""""""""""""""""""""""""""""""""""""

VPN connections can be shared when there are multiple ACE clusters in a site.
To add an ACE to an existing VPN connection,
you'll have to SSH into the management node and manually update the BIRD configuration.

.. note::

   This step needs improvements in the future.

.. code-block:: shell

   $ sudo vi /etc/bird/bird.conf
   protocol static {
     ...
     route 10.66.128.0/17 via 10.91.0.10;

     # Add routes for the new ACE's K8S cluster IP range via cluster nodes
     # TODO: Configure iBGP peering with Calico nodes and dynamically learn these routes
     route <NEW-ACE-CLUSTER-IP> via <SERVER1>;
     route <NEW-ACE-CLUSTER-IP> via <SERVER2>;
     route <NEW-ACE-CLUSTER-IP> via <SERVER3>;
   }

   filter gcp_tunnel_out {
     # Add the new ACE's K8S cluster IP range and the management subnet if required to the list
     if (net ~ [ 10.91.0.0/24, 10.66.128.0/17, <NEW-ACE-CLUSTER-IP-RANGE> ]) then accept;
     else reject;
   }
   # Save and exit

   $ sudo birdc configure

   # Confirm the static routes are added
   $ sudo birdc show route
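
To avoid typos when filling in the placeholders above, the static route lines
can be generated rather than typed by hand. This is a minimal sketch only: the
function name ``print_bird_routes`` is hypothetical, and the cluster IP range
and server addresses in the example call are made-up values that must be
replaced with those of the new ACE.

.. code-block:: shell

   # Print one BIRD static route per cluster node.
   # $1 is the new ACE's K8S cluster IP range; the rest are node addresses.
   print_bird_routes() {
     range="$1"; shift
     for server in "$@"; do
       echo "route ${range} via ${server};"
     done
   }

   # Example with made-up values
   print_bird_routes 10.67.128.0/17 10.92.0.8 10.92.0.9 10.92.0.10

Paste the printed lines into the ``protocol static`` block, then run
``sudo birdc configure`` as shown above.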