..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Server Bootstrap
================

Management Router Bootstrap
"""""""""""""""""""""""""""

The Management Router is bootstrapped into a customized version of the standard
Ubuntu 18.04 OS installer.

The `iPXE boot firmware <https://ipxe.org/>`_ is used to start this process
and is built using the steps detailed in the `ipxe-build
<https://gerrit.opencord.org/plugins/gitiles/ipxe-build>`_ repo, which
generates both USB and PXE chainloadable boot images.

Once a system has been started using one of these images, iPXE will
download a customized script from an external webserver to continue the boot
process. This iPXE to webserver connection is secured with mutual TLS
authentication, enforced by the nginx webserver.

The iPXE scripts are created by the `pxeboot
<https://gerrit.opencord.org/plugins/gitiles/ansible/role/pxeboot>`_ role,
which creates a boot menu, downloads the appropriate binaries for
bootstrapping an OS installation, and creates per-node installation preseed files.

The preseed files contain configuration steps to install the OS from the
upstream Ubuntu repos, as well as customizing packages and creating the
``onfadmin`` user.

Creating a bootable USB drive
'''''''''''''''''''''''''''''

1. Get a USB key. It can be tiny, as the uncompressed image is floppy sized
   (1.4MB). Download the USB image file (``<date>_onf_ipxe.usb.zip``) on the
   system you're using to write the USB key, and unzip it.

2. Put the USB key in the system you're using to write it, then determine
   which device file in ``/dev`` it appears as. You might look at the end of
   the ``dmesg`` output on Linux/Unix or the output of ``diskutil list`` on
   macOS.

   Be very careful about this, as accidentally overwriting some other disk in
   your system would destroy its data.

3. Write the image to the device::

      $ dd if=/path/to/20201116_onf_ipxe.usb of=/dev/sdg
      2752+0 records in
      2752+0 records out
      1409024 bytes (1.4 MB, 1.3 MiB) copied, 2.0272 s, 695 kB/s

   You may need to use ``sudo`` for this.
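
Before booting from the key, it can be worth confirming that the write
succeeded. The following is a minimal sketch, not part of the ONF tooling:
``verify_usb_image`` is a hypothetical helper that compares the image file with
the same number of bytes read back from the device. The device is larger than
the image, so only its leading bytes are compared::

   # Succeeds when the first bytes of DEVICE are identical to IMAGE.
   # Usage: verify_usb_image IMAGE DEVICE
   verify_usb_image() {
       image=$1
       device=$2
       # Image size in bytes; the arithmetic expansion strips any padding
       # that some wc implementations print around the number.
       size=$(($(wc -c < "$image")))
       # Read back exactly that many bytes and compare them with the image.
       head -c "$size" "$device" | cmp -s - "$image"
   }

For example, ``verify_usb_image /path/to/20201116_onf_ipxe.usb /dev/sdg && echo
verified`` prints ``verified`` only when the contents match. Reading the raw
device usually needs the same privileges as writing it, so run this from a root
shell if ``dd`` needed ``sudo``.
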

Boot and Image management router
''''''''''''''''''''''''''''''''

1. Connect a USB keyboard and VGA monitor to the management router. Put the USB
   key in one of the management router's USB ports (port 2 or 3):

   .. image:: images/mgmtsrv-000.png
      :alt: management router Ports
      :scale: 50%

2. Turn on the management router, and press the F11 key as it starts to get
   into the Boot Menu:

   .. image:: images/mgmtsrv-001.png
      :alt: management router Boot Menu
      :scale: 50%

3. Select the USB key (in this case "PNY USB 2.0", your options may vary) and
   press return. You should see iPXE load:

   .. image:: images/mgmtsrv-002.png
      :alt: iPXE load
      :scale: 50%

4. A menu will appear which displays the system information and DHCP discovered
   network settings (your network must provide the IP address to the management
   router via DHCP).

   Use the arrow keys to select "Ubuntu 18.04 Installer (fully automatic)":

   .. image:: images/mgmtsrv-003.png
      :alt: iPXE Menu
      :scale: 50%

   If left untouched, the menu times out after 10 seconds and continues the
   normal system boot process, so restart the system if you miss the 10 second
   window.

5. The Ubuntu 18.04 installer will be downloaded and booted:

   .. image:: images/mgmtsrv-004.png
      :alt: Ubuntu Boot
      :scale: 50%

6. The installer then starts and takes around 10 minutes to complete, depending
   on your connection speed:

   .. image:: images/mgmtsrv-005.png
      :alt: Ubuntu Install
      :scale: 50%

7. At the end of the install, the system will restart and present you with a
   login prompt:

   .. image:: images/mgmtsrv-006.png
      :alt: Ubuntu Install Complete
      :scale: 50%
113
114
Zack Williamsb7d45152022-03-11 09:37:34 -0700115management router Configuration
Zack Williams794532a2021-03-18 17:38:36 -0700116'''''''''''''''''''''''''''''''
117
Zack Williamsb7d45152022-03-11 09:37:34 -0700118Once the OS is installed on the management router, Ansible is used to remotely
119install software on the management router.
Zack Williams794532a2021-03-18 17:38:36 -0700120
121To checkout the ONF ansible repo and enter the virtualenv with the tooling::
122
123 mkdir infra
124 cd infra
125 repo init -u ssh://<your gerrit username>@gerrit.opencord.org:29418/infra-manifest
126 repo sync
127 cd ansible
128 make galaxy
129 source venv_onfansible/bin/activate

Obtain the ``undionly.kpxe`` iPXE artifact for bootstrapping the compute
servers, and put it in the ``playbooks/files`` directory.

Next, create an inventory file to access the NetBox API. An example is given
in ``inventory/example-netbox.yml``; duplicate this file and modify it. Fill
in the ``api_endpoint`` address, and set ``token`` to an API key obtained from
the NetBox instance. List the IP prefixes used by the site in the
``ip_prefixes`` list.

Next, run ``scripts/edgeconfig.py`` to generate a host variables file in
``inventory/host_vars/<device name>.yaml`` for the management router and the
compute servers::

   python scripts/edgeconfig.py inventory/staging-netbox.yml

The script uses the **Tenant** as the key to look up data, and writes a
configuration file for each host. Configuration files are only generated for
devices with the role **Router** or **Server**.

In the case of a Fabric that has two leaves, each with its own IP range, add
the management router IP address in the range of the leaf that it is connected
to. Then add a route to the other leaf's IP range (the leaf the management
router is not connected to) via the Fabric router address in the connected
leaf's range.

Using ``inventory/example-aether.ini`` as a template, create an
:doc:`ansible inventory <ansible:user_guide/intro_inventory>` file for the
site. Change the device names, IP addresses, and ``onfadmin`` password to match
the ones for this site. The management router's configuration is in the
``[aethermgmt]`` and corresponding ``[aethermgmt:vars]`` sections.

Then, to configure a management router, run::

   ansible-playbook -i inventory/sitename.ini playbooks/aethermgmt-playbook.yml

This installs software with the following functionality:

- VLANs on the second Ethernet port to provide connectivity to the rest of the pod
- Firewall with NAT for routing traffic
- DHCP and TFTP for bootstrapping servers and switches
- DNS for host naming and identification
- HTTP server for serving files used for bootstrapping switches
- Download of the Tofino switch image
- User accounts for administrative access

Compute Server Bootstrap
""""""""""""""""""""""""

Once the management router has finished installation, it will be set to offer
the same iPXE bootstrap file to the compute servers.

Boot each node and, when iPXE loads, select the ``Ubuntu 18.04
Installer (fully automatic)`` option.

The nodes can be controlled remotely via their BMC management interfaces. If
the BMC is at ``10.0.0.3``, a remote user can open an SSH tunnel to it through
the management router with::

   ssh -L 2443:10.0.0.3:443 onfadmin@<management router IP>

And then use their web browser to access the BMC at::

   https://localhost:2443
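
When several BMCs need to be reached at once, each one needs its own local
port. As a sketch (``bmc_tunnel_command`` is a hypothetical helper, and
``10.0.0.4`` below is an assumed address for a second BMC), the following shell
function prints a single ``ssh`` command with one ``-L`` forward per BMC,
numbering the local ports upward from 2443. The ``-N`` flag tells ``ssh`` not
to run a remote command, so the session only carries the tunnels::

   # Print an ssh command that forwards one local port per BMC.
   # Usage: bmc_tunnel_command <management router IP> <BMC IP>...
   bmc_tunnel_command() {
       router=$1
       shift
       port=2443          # first local port, matching the example above
       forwards=""
       for bmc in "$@"; do
           # Each BMC's HTTPS port (443) is mapped to the next local port.
           forwards="$forwards -L $port:$bmc:443"
           port=$((port + 1))
       done
       echo "ssh -N$forwards onfadmin@$router"
   }

Running ``bmc_tunnel_command <management router IP> 10.0.0.3 10.0.0.4`` prints
a command that makes the first BMC available at ``https://localhost:2443`` and
the second at ``https://localhost:2444``.
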

The default BMC credentials for the Pronto nodes are::

   login: ADMIN
   password: Admin123

The BMC will also list all of the MAC addresses for the network interfaces
(including BMC) that are built into the logic board of the system. Add-in
network cards like the 40GbE ones used in compute servers aren't listed.

To prepare the compute nodes, software must be installed on them. As they
can't be accessed directly from your local system, a :ref:`jump host
<ansible:use_ssh_jump_hosts>` configuration is added, so the SSH connection
goes through the management router to the compute systems behind it. Doing this
requires a few steps:

First, configure SSH to use agent forwarding: create or edit your
``~/.ssh/config`` file and add the following lines::

   Host <management router IP>
     ForwardAgent yes

Then try to log in to the management router, and from there to a compute node::

   $ ssh onfadmin@<management router IP>
   Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
   ...
   onfadmin@mgmtserver1:~$ ssh onfadmin@10.0.0.138
   Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
   ...
   onfadmin@node2:~$

Being able to log in to the compute nodes from the management router means that
SSH agent forwarding is working correctly.
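
If the second login asks for a password or is refused, one common cause is that
the local SSH agent has no key loaded to forward. The following is a minimal
sketch of a check to run on your local system, assuming OpenSSH's ``ssh-add``
is installed (``check_agent`` is a hypothetical helper name)::

   # Report whether the local SSH agent holds a key that can be forwarded.
   # `ssh-add -l` exits 0 when keys are listed, 1 when the agent holds no
   # keys, and 2 when no agent can be contacted.
   check_agent() {
       status=0
       ssh-add -l >/dev/null 2>&1 || status=$?
       case $status in
           0) echo "agent has at least one key loaded" ;;
           1) echo "agent is running but has no keys; load one with ssh-add" ;;
           *) echo "no agent reachable; start one with ssh-agent, then run ssh-add" ;;
       esac
   }

Run ``check_agent`` before retrying the two logins. Once it reports a loaded
key, the login from the management router to the compute node should no longer
need a key stored on the management router itself.
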

Verify that your inventory (created earlier from the
``inventory/example-aether.ini`` file) includes an ``[aethercompute]`` section
that has all the names and IP addresses of the compute nodes in it.

Then run a ping test::

   ansible -i inventory/sitename.ini -m ping aethercompute

It may ask you to confirm the SSH host keys. Answer ``yes`` for each host to
trust its key::

   The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
   ECDSA key fingerprint is SHA256:...
   Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

You should then see a success message for each host::

   node1.stage1.menlo | SUCCESS => {
       "changed": false,
       "ping": "pong"
   }
   node2.stage1.menlo | SUCCESS => {
       "changed": false,
       "ping": "pong"
   }
   ...

Once you've seen this, run the playbook to install the prerequisites (Terraform
user, Docker)::

   ansible-playbook -i inventory/sitename.ini playbooks/aethercompute-playbook.yml

Note that Docker is quite large and may take a few minutes to install,
depending on internet connectivity.

Now that these compute nodes have been brought up, the rest of the installation
can continue.