..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Fabric Switch Bootstrap
=======================

The installation of the ONL OS image on the fabric switches uses the DHCP and
HTTP server set up on the management server.

The default image is downloaded during that installation process by the
``onieboot`` role. Make changes to that role and rerun the management playbook
to download a newer switch image.

Preparation
-----------

The switches have a single Ethernet port that is shared between OpenBMC and
ONL. Find out the MAC addresses of both interfaces (OpenBMC and ONL) and enter
them into NetBox.

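As a rough sketch of collecting the ONL-side address: on a switch that already
runs ONL, the MAC address of the management interface can be read with ``ip``.
The helper name ``mac_of`` below is hypothetical, and ``ma1`` is assumed to be
the management interface name (it is the name that appears in the ``ip addr``
output further down this page). The OpenBMC side is not covered by this
sketch::

  # Hypothetical helper: print the MAC address found in `ip link` output
  # that is read from standard input.
  mac_of() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "link/ether") print $(i + 1) }'
  }

  # Example use on the switch (assumes the interface is named ma1):
  #   ip -o link show ma1 | mac_of
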
Change boot mode to ONIE Rescue mode
------------------------------------

To reinstall an ONL image, you must first change the ONIE bootloader to
"Rescue Mode".

Once the switch is powered on, it should retrieve an IP address on the OpenBMC
interface with DHCP. OpenBMC uses these default credentials::

  username: root
  password: 0penBmc

Log in to OpenBMC with SSH::

  $ ssh root@10.0.0.131
  The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
  ECDSA key fingerprint is SHA256:...
  Are you sure you want to continue connecting (yes/no)? yes
  Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
  root@10.0.0.131's password:
  root@bmc:~#

Using the Serial-over-LAN Console, enter ONL::

  root@bmc:~# /usr/local/bin/sol.sh
  You are in SOL session.
  Use ctrl-x to quit.
  -----------------------

  root@onl:~#

.. note::
   If ``sol.sh`` is unresponsive, try restarting the mainboard with::

     root@onl:~# wedge_power.sh restart


Change the boot mode to rescue mode with the command ``onl-onie-boot-mode
rescue``, and reboot::

  root@onl:~# onl-onie-boot-mode rescue
  [1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
  [1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
  [1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
  The system will boot into ONIE rescue mode at the next restart.
  root@onl:~# reboot

At this point, ONL will go through its shutdown sequence and ONIE will start.
If it does not start right away, press the Enter/Return key a few times; it
may show you a boot selection screen. Pick ``ONIE`` and ``Rescue`` if given a
choice.

Installing an ONL image over HTTP
---------------------------------

Now that the switch is in Rescue mode, first activate the console by pressing
Enter::

  discover: Rescue mode detected. Installer disabled.

  Please press Enter to activate this console.
  To check the install status inspect /var/log/onie.log.
  Try this: tail -f /var/log/onie.log

  ** Rescue Mode Enabled **
  ONIE:/ #

Then run the ``onie-nos-install`` command, with the URL of the management
server on the management network segment::

  ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
  discover: Rescue mode detected. No discover stopped.
  ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
  Info: Fetching http://10.0.0.129/onie-installer ...
  Connecting to 10.0.0.129 (10.0.0.129:80)
  installer 100% |*******************************| 322M 0:00:00 ETA
  ONIE: Executing installer: http://10.0.0.129/onie-installer
  installer: computing checksum of original archive
  installer: checksum is OK
  ...

The installation will now start, and ONL will then boot, ending at the login
prompt::

  Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9

  localhost login:

The default ONL login is::

  username: root
  password: onl

If you log in, you can verify that the switch is getting its IP address via
DHCP::

  root@localhost:~# ip addr
  ...
  3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
      link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
      inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
  ...


Post-ONL Configuration
----------------------

A ``terraform`` user must be created on the switches to allow them to be
configured.

This is done using Ansible. Verify that your inventory (created earlier from the
``inventory/example-aether.ini`` file) includes an ``[aetherfabric]`` section
that has all the names and IP addresses of the fabric switches in it.

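One way to check the inventory before running Ansible is to print the entries
of the ``[aetherfabric]`` section. This is a minimal sketch, not part of the
Aether tooling: the helper name ``inventory_section`` is hypothetical, and it
assumes a plain INI-style inventory in which each host is listed on its own
line::

  # Hypothetical helper: print the non-empty, non-comment lines of one
  # section of an INI-style inventory file.
  # Usage: inventory_section <file> <section>
  inventory_section() {
    awk -v want="[$2]" '
      /^\[/ { in_section = ($0 == want); next }    # header: start or stop printing
      in_section && NF && $0 !~ /^[#;]/ { print }  # skip blank lines and comments
    ' "$1"
  }

  # Example use (file name taken from the ping test on this page):
  #   inventory_section inventory/sitename.ini aetherfabric

An empty result means the section is missing or lists no hosts.
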
Then run a ping test::

  ansible -i inventory/sitename.ini -m ping aetherfabric

This may fail with the error::

  "msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this. Please add this host's fingerprint to your known_hosts file to manage this host."

Comment out the ``ansible_ssh_pass="onl"`` line in the inventory, then rerun
the ping test. It may ask you to confirm each host's key; answer ``yes`` for
each host to trust the keys::

  The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
  ECDSA key fingerprint is SHA256:...
  Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

Once you've trusted the host keys, the ping test should succeed::

  spine1.role1.site | SUCCESS => {
      "changed": false,
      "ping": "pong"
  }
  leaf1.role1.site | SUCCESS => {
      "changed": false,
      "ping": "pong"
  }
  ...

Then run the playbook to create the ``terraform`` user::

  ansible-playbook -i inventory/sitename.ini playbooks/aetherfabric-playbook.yml

Once completed, the switches should be ready for the TOST runtime install.