..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Hardware Installation
=====================

Once the hardware has been ordered, the installation can be planned and
implemented. This document describes the installation of the servers and
software.

Installation of the fabric switch hardware is covered in :ref:`OS Installation
- Switches <switch-install>`.

Installation of the radio hardware is covered in :ref:`eNB Installation
<enb-installation>`.

Site Bookkeeping
----------------

The following items need to be added to `NetBox
<https://netbox.readthedocs.io/en/stable>`_ to describe each edge site:

1. Add a Site for the edge (if one doesn't already exist), which has the
   physical location and contact information for the edge.

2. Add equipment Racks to the Site (if they don't already exist).

3. Add a Tenant for the edge (who owns/manages it), assigned to the ``Pronto``
   or ``Aether`` Tenant Group.

4. Add a VRF (Routing Table) for the edge site. This is usually just the name
   of the site. Make sure that ``Enforce unique space`` is checked, so that IP
   addresses within the VRF are forced to be unique, and that the Tenant Group
   and Tenant are set.

5. Add a VLAN Group to the edge site, which groups the site's VLANs and
   requires that they have a unique VLAN number.

6. Add VLANs for the edge site. These should be assigned a VLAN Group, the
   Site, and Tenant.

   The same VLAN number can exist more than once in NetBox (VLANs are layer 2,
   and local to the site), but not within the same VLAN Group.

   The minimal list of VLANs:

   * ADMIN 1
   * UPLINK 10
   * MGMT 800
   * FAB 801

   If you have multiple deployments at a site using the same management server,
   add additional VLANs incremented by 10 for the MGMT/FAB - for example:

   * DEVMGMT 810
   * DEVFAB 811

7. Add IP Prefixes for the site. This should have the Tenant and VRF assigned.

   All edge IP prefixes fit into a ``/22`` sized block.

   The description of the Prefix contains the DNS suffix for all Devices that
   have IP addresses within this Prefix. The full DNS names are generated by
   combining the first ``<devname>`` component of the Device names with this
   suffix.

   An example using the ``10.0.0.0/22`` block follows. There are 4 edge
   prefixes, with the following purposes:

   * ``10.0.0.0/25``

     * Has the Server BMC/LOM and Management Switch
     * Assign the ADMIN 1 VLAN
     * Set the description to ``admin.<deployment>.<site>.aetherproject.net`` (or
       ``prontoproject.net``).

   * ``10.0.0.128/25``

     * Has the Server Management plane, Fabric Switch Management/BMC
     * Assign MGMT 800 VLAN
     * Set the description to ``<deployment>.<site>.aetherproject.net`` (or
       ``prontoproject.net``).

   * ``10.0.1.0/25``

     * IP addresses of the qsfp0 port of the Compute Nodes to Fabric switches,
       and of devices connected to the Fabric like the eNB
     * Assign FAB 801 VLAN
     * Set the description to ``fab1.<deployment>.<site>.aetherproject.net`` (or
       ``prontoproject.net``).

   * ``10.0.1.128/25``

     * IP addresses of the qsfp1 port of the Compute Nodes to fabric switches
     * Assign FAB 801 VLAN
     * Set the description to ``fab2.<deployment>.<site>.aetherproject.net`` (or
       ``prontoproject.net``).

   A parent range that covers the two fabric ranges also needs to be added:

   * ``10.0.1.0/24``

     * This is used to configure the correct routes, DNS, and TFTP servers
       provided by DHCP to the equipment that is connected to the fabric
       leaf switch that the management server (which provides those
       services) is not connected to.

   Additionally, these edge prefixes are used for Kubernetes but don't need to
   be created in NetBox:

   * ``10.0.2.0/24``

     * Kubernetes Pod IPs

   * ``10.0.3.0/24``

     * Kubernetes Cluster IPs

8. Add Devices to the site, for each piece of equipment. These are named with a
   scheme similar to the DNS names used for the pod, given in this format::

     <devname>.<deployment>.<site>

   Examples::

     mgmtserver1.ops1.tucson
     node1.stage1.menlo

   Note that these names are transformed into DNS names using the Prefixes, and
   may have additional components - ``admin`` or ``fabric`` may be added after
   the ``<devname>`` for devices on those networks.

   Set the following fields when creating a device:

   * Site
   * Tenant
   * Rack & Rack Position
   * Serial number

   If a specific Device Type doesn't exist for the device, it must be created.
   This is detailed in the NetBox documentation, or you can ask the Ops team
   for help.

   See `Rackmount of Equipment`_ below for guidance on how equipment should be
   mounted in the Rack.

9. Add Services to the management server:

   * name: ``dns``
     protocol: UDP
     port: 53

   * name: ``tftp``
     protocol: UDP
     port: 69

   These are used by the DHCP and DNS config to know which servers offer
   DNS or TFTP service.

10. Set the MAC address for the physical interfaces on the device.

    You may also need to add physical network interfaces if they aren't already
    created by the Device Type. An example would be if additional add-in
    network cards were installed.

11. Add any virtual interfaces to the Devices. When creating a virtual
    interface, it should have its ``label`` field set to the physical network
    interface that it is assigned to.

    These are needed in two cases for the Pronto deployment:

    1. On the Management Server, there should be (at least) two VLAN
       interfaces created attached to the ``eno2`` network port, which
       are used to provide connectivity to the management plane and fabric.
       These should be named ``<name of vlan><vlan ID>``, so the MGMT 800 VLAN
       would become a virtual interface named ``mgmt800``, with the label
       ``eno2``.

    2. On the Fabric switches, the ``eth0`` port is shared between the OpenBMC
       interface and the ONIE/ONL installation. Add a ``bmc`` virtual
       interface with a label of ``eth0`` on each fabric switch, and check the
       ``OOB Management`` checkbox.

12. Create IP addresses for the physical and virtual interfaces. These should
    have the Tenant and VRF set.

    The Management Server should always have the first IP address in each
    range, and the other devices should be assigned addresses incrementally,
    in the order below. Examples are given as if there was a single instance of
    each device - adding additional devices would increment the later IP
    addresses.

    * Management Server

      * ``eno1`` - site provided public IP address, or blank if DHCP
        provided

      * ``eno2`` - 10.0.0.1/25 (first of ADMIN) - set as primary IP
      * ``bmc`` - 10.0.0.2/25 (next of ADMIN)
      * ``mgmt800`` - 10.0.0.129/25 (first of MGMT)
      * ``fab801`` - 10.0.1.1/25 (first of FAB)

    * Management Switch

      * ``gbe1`` - 10.0.0.3/25 (next of ADMIN) - set as primary IP

    * Fabric Switch

      * ``eth0`` - 10.0.0.130/25 (next of MGMT), set as primary IP
      * ``bmc`` - 10.0.0.131/25

    * Compute Server

      * ``eth0`` - 10.0.0.132/25 (next of MGMT), set as primary IP
      * ``bmc`` - 10.0.0.4/25 (next of ADMIN)
      * ``qsfp0`` - 10.0.1.2/25 (next of FAB)
      * ``qsfp1`` - 10.0.1.3/25

    * Other Fabric devices (eNB, etc.)

      * ``eth0`` or other primary interface - 10.0.1.4/25 (next of FAB)

13. Add DHCP ranges to the IP Prefixes for IPs that aren't reserved. These are
    created like any other IP Address, but with the ``Status`` field set to
    ``DHCP``, and they'll consume the entire range of IP addresses given in the
    CIDR mask.

    For example ``10.0.0.32/27`` as a DHCP block would take up 1/4 of the ADMIN
    prefix.

14. Add router IP reservations to the IP Prefix for both Fabric prefixes. These
    are IP addresses used by ONOS to route traffic to the other leaf, and have
    the following attributes:

    - Have the last usable address in range (in the ``/25`` fabric examples
      above, these would be ``10.0.1.126/25`` and ``10.0.1.254/25``)

    - Have a ``Status`` of ``Reserved``, and the VRF, Tenant Group, and Tenant
      set.

    - The Description must start with the word ``router``, such as: ``router
      for leaf1 Fabric``

    - A custom field named ``RFC3442 Routes`` is set to the CIDR IP address of
      the opposite leaf - if the leaf's prefix is ``10.0.1.0/25`` and the
      router IP is ``10.0.1.126/25`` then ``RFC3442 Routes`` should be set to
      ``10.0.1.128/25`` (and the reverse - on ``10.0.1.254/25`` the ``RFC3442
      Routes`` would be set to ``10.0.1.0/25``). This creates an `RFC3442
      Classless Static Route Option <https://tools.ietf.org/html/rfc3442>`_
      for the subnet in DHCP.

15. Add Cables between physical interfaces on the devices.

    The topology needs to match the logical diagram presented in the
    :ref:`network_cable_plan`. Note that many of the management interfaces
    need to be located either on the MGMT or ADMIN VLANs, and the management
    switch is used to provide that separation.

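The addressing and naming conventions in the list above can be checked with a
short script before entering anything in NetBox. The following is only a
sketch that uses Python's standard ``ipaddress`` module. The function names
(``edge_plan``, ``router_reservation``, ``dns_name``, ``vlan_interface_name``)
are hypothetical helpers made up for this illustration and are not part of
NetBox or the ONF tooling. It assumes the same ``/22`` layout as the
``10.0.0.0/22`` example.

.. code-block:: python

   import ipaddress


   def edge_plan(block_cidr):
       """Split a /22 edge block following the 10.0.0.0/22 example layout."""
       block = ipaddress.ip_network(block_cidr)
       if block.prefixlen != 22:
           raise ValueError("the example layout assumes a /22 block")
       first, second, pods, cluster = block.subnets(new_prefix=24)
       admin, mgmt = first.subnets(new_prefix=25)
       fab1, fab2 = second.subnets(new_prefix=25)
       return {
           "admin": admin,
           "mgmt": mgmt,
           "fab1": fab1,
           "fab2": fab2,
           "fabric_parent": second,
           "k8s_pods": pods,
           "k8s_cluster": cluster,
       }


   def router_reservation(leaf, other_leaf):
       """Return (router address, RFC3442 Routes value) for one fabric leaf."""
       router = leaf.broadcast_address - 1  # last usable address in the range
       return f"{router}/{leaf.prefixlen}", str(other_leaf)


   def dns_name(device_name, prefix_description):
       """Join the <devname> component of a Device name with a Prefix suffix."""
       devname = device_name.split(".", 1)[0]
       return f"{devname}.{prefix_description}"


   def vlan_interface_name(vlan_name, vlan_id):
       """Name a virtual interface as <name of vlan><vlan ID>."""
       return f"{vlan_name.lower()}{vlan_id}"


   plan = edge_plan("10.0.0.0/22")
   for name, prefix in plan.items():
       print(f"{name:14} {prefix}")

   # Router reservations for both leaves, with the opposite leaf as the route.
   print(router_reservation(plan["fab1"], plan["fab2"]))
   print(router_reservation(plan["fab2"], plan["fab1"]))

   # A /27 DHCP block inside the ADMIN prefix, and the share of it that it uses.
   dhcp = ipaddress.ip_network("10.0.0.32/27")
   print(dhcp.subnet_of(plan["admin"]), dhcp.num_addresses / plan["admin"].num_addresses)

   print(dns_name("node1.stage1.menlo", "stage1.menlo.aetherproject.net"))
   print(vlan_interface_name("MGMT", 800))

For the example block this reproduces the values used in the steps above: the
router reservations are ``10.0.1.126/25`` and ``10.0.1.254/25``, each routed
to the opposite ``/25``, and the ``/27`` DHCP block covers a quarter of the
ADMIN prefix.
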
Rackmount of Equipment
----------------------

Most of the Pronto equipment has a 19" rackmount form factor.

Guidelines for mounting this equipment:

- The EdgeCore Wedge Switches have a front-to-back (aka "port-to-power") fan
  configuration, so hot air exhaust is out the back of the switch near the
  power inlets, away from the 32 QSFP network ports on the front of the switch.

- The full-depth 1U and 2U Supermicro servers also have front-to-back airflow
  but have most of their ports on the rear of the device.

- Airflow through the rack should be in one direction to avoid heat being
  pulled from one device into another. This means that to connect the QSFP
  network ports from the servers to the switches, cabling should be routed
  through the rack from front (switch) to back (server). Empty rack spaces
  should be reserved for this purpose.

- The short-depth management HP Switch and 1U Supermicro servers should be
  mounted on the rear of the rack. Neither generates an appreciable
  amount of heat, so the airflow direction isn't a significant factor in
  racking them.

Inventory
---------

Once equipment arrives, any device needs to be recorded in inventory if it:

1. Connects to the network (has a MAC address)
2. Has a serial number
3. Isn't a subcomponent (disk, add-in card, linecard, etc.) of a larger device.

The following information should be recorded for every device:

- Manufacturer
- Model
- Serial Number
- MAC address (for the primary and any management/BMC/IPMI interfaces)

This information should be added to the corresponding Devices in the ONF NetBox
instance. The accuracy of this information is very important as it is used in
bootstrapping the systems.

Once inventory has been completed, let the Infra team know, and the pxeboot
configuration will be generated to have the OS preseed files corresponding to the
new servers based on their serial numbers.

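As an illustration only, the inventory criteria above can be written as a
small predicate. The ``Equipment`` class and ``needs_inventory`` function are
hypothetical names invented for this sketch, and it assumes that all three
criteria must hold together, which is one reading of the list. The example
values are placeholders, not real hardware.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Equipment:
       """The fields recorded for every device (hypothetical structure)."""
       manufacturer: str
       model: str
       serial_number: str = ""
       mac_addresses: tuple = ()
       is_subcomponent: bool = False


   def needs_inventory(item):
       """True if the item has a MAC, a serial number, and is a whole device."""
       return (
           bool(item.mac_addresses)
           and bool(item.serial_number)
           and not item.is_subcomponent
       )


   # Placeholder entries: a server, and an add-in card installed inside it.
   server = Equipment("ExampleCo", "example-1u", "SN0001", ("00:00:5e:00:53:01",))
   nic = Equipment("ExampleCo", "example-nic", "SN0002", ("00:00:5e:00:53:02",), True)

   print(needs_inventory(server))
   print(needs_inventory(nic))

The add-in card is excluded even though it has its own MAC address and serial
number, because it is recorded as part of the server that contains it.
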
Management Switch Bootstrap
---------------------------

The current Pronto deployment uses an HP/Aruba 2540 24G PoE+ 4SFP+ JL356A
switch to run the management network and other VLANs that are used internally.

By default the switch will pull an IP address via DHCP and ``http://<switch IP>``
will display a management webpage for the switch. You need to be able to access
this webpage before you can update the configuration.

Loading the Management Switch Configuration
"""""""""""""""""""""""""""""""""""""""""""

1. Obtain a copy of the Management switch configuration file (this ends in ``.pcc``).

2. Open the switch web interface at ``http://<switch IP>``. You may be
   prompted to log in - the default username and password are both ``admin``:

   .. image:: images/pswi-000.png
       :alt: User Login for switch
       :scale: 50%

3. Go to the "Management" section at bottom left:

   .. image:: images/pswi-001.png
       :alt: Update upload
       :scale: 50%

   In the "Update" section at left, drag the configuration file into the upload
   area, or click Browse and select it.

4. In the "Select switch configuration file to update" section, select
   "config1", so it overwrites the default configuration.

5. Click "Update". You'll be prompted to reboot the switch, which you can do
   with the power symbol at top right. You may be prompted to select an image
   used to reboot - the "Previously Selected" is the correct one to use:

   .. image:: images/pswi-003.png
       :alt: Switch Image Select
       :scale: 30%

6. Wait for the switch to reboot:

   .. image:: images/pswi-004.png
       :alt: Switch Reboot
       :scale: 50%

   The switch is now configured with the correct VLANs for Pronto use. If you
   go to Interfaces > VLANs, you should see a list of VLANs configured on the
   switch:

   .. image:: images/pswi-005.png
       :alt: Mgmt VLANs
       :scale: 50%

Software Bootstrap
------------------

Management Server Bootstrap
"""""""""""""""""""""""""""

The management server is bootstrapped into a customized version of the standard
Ubuntu 18.04 OS installer.

The `iPXE boot firmware <https://ipxe.org/>`_ is used to start this process
and is built using the steps detailed in the `ipxe-build
<https://gerrit.opencord.org/plugins/gitiles/ipxe-build>`_ repo, which
generates both USB and PXE chainloadable boot images.

Once a system has been started using these images, it will download a
customized script from an external webserver to continue the boot
process. This iPXE to webserver connection is secured with mutual TLS
authentication, enforced by the nginx webserver.

The iPXE scripts are created by the `pxeboot
<https://gerrit.opencord.org/plugins/gitiles/ansible/role/pxeboot>`_ role,
which creates a boot menu, downloads the appropriate binaries for
bootstrapping an OS installation, and creates per-node installation preseed files.

The preseed files contain configuration steps to install the OS from the
upstream Ubuntu repos, as well as customization of packages and creating the
``onfadmin`` user.

Creating a bootable USB drive
'''''''''''''''''''''''''''''

1. Get a USB key. It can be tiny, as the uncompressed image is floppy sized
   (1.4MB). Download the USB image file (``<date>_onf_ipxe.usb.zip``) on the
   system you're using to write the USB key, and unzip it.

2. Put a USB key in the system you're using to create the USB key, then
   determine which USB device file it's at in ``/dev``. You might look at the
   end of the ``dmesg`` output on Linux/Unix or the output of ``diskutil
   list`` on macOS.

   Be very careful about this: writing to the wrong device will overwrite
   some other disk in your system and destroy the data on it.

3. Write the image to the device::

      $ dd if=/path/to/20201116_onf_ipxe.usb of=/dev/sdg
      2752+0 records in
      2752+0 records out
      1409024 bytes (1.4 MB, 1.3 MiB) copied, 2.0272 s, 695 kB/s

   You may need to use ``sudo`` for this.

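One possible extra safeguard before running ``dd`` on Linux is to check
whether the kernel reports the target as removable. This sketch reads the
``removable`` attribute under ``/sys/block``. ``is_removable`` is a
hypothetical helper written for this example, and ``/dev/sdg`` is simply the
device from the example command. The flag is only a hint: some USB-attached
disks report ``0`` and some built-in card readers report ``1``, so it does not
replace checking ``dmesg`` as described above.

.. code-block:: python

   from pathlib import Path


   def is_removable(device, sysfs_root="/sys/block"):
       """Return True if Linux marks the whole-disk device as removable."""
       name = Path(device).name  # "/dev/sdg" -> "sdg"
       flag = Path(sysfs_root) / name / "removable"
       try:
           return flag.read_text().strip() == "1"
       except OSError:
           # Unknown device, a partition such as sdg1, or a non-Linux system.
           return False


   if __name__ == "__main__":
       target = "/dev/sdg"  # the example device from the dd command
       if is_removable(target):
           print(f"{target} is reported as removable")
       else:
           print(f"{target} is not reported as removable, double-check before writing")

The ``sysfs_root`` parameter exists only so the function can be exercised
against a temporary directory instead of the real ``/sys/block``.
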
Boot and Image Management Server
''''''''''''''''''''''''''''''''

1. Connect a USB keyboard and VGA monitor to the management node. Put the USB
   Key in one of the management node's USB ports (port 2 or 3):

   .. image:: images/mgmtsrv-000.png
       :alt: Management Server Ports
       :scale: 50%

2. Turn on the management node, and press the F11 key as it starts to get into
   the Boot Menu:

   .. image:: images/mgmtsrv-001.png
       :alt: Management Server Boot Menu
       :scale: 50%

3. Select the USB key (in this case "PNY USB 2.0", your options may vary) and
   press return. You should see iPXE load:

   .. image:: images/mgmtsrv-002.png
       :alt: iPXE load
       :scale: 50%

4. A menu will appear which displays the system information and DHCP discovered
   network settings (your network must provide the IP address to the management
   server via DHCP).

   Use the arrow keys to select "Ubuntu 18.04 Installer (fully automatic)":

   .. image:: images/mgmtsrv-003.png
       :alt: iPXE Menu
       :scale: 50%

   There is a 10 second default timeout if left untouched (it will continue the
   system boot process) so restart the system if you miss the 10 second window.

5. The Ubuntu 18.04 installer will be downloaded and booted:

   .. image:: images/mgmtsrv-004.png
       :alt: Ubuntu Boot
       :scale: 50%

6. Then the installer starts and takes around 10 minutes to install (depends on
   your connection speed):

   .. image:: images/mgmtsrv-005.png
       :alt: Ubuntu Install
       :scale: 50%

7. At the end of the install, the system will restart and present you with a
   login prompt:

   .. image:: images/mgmtsrv-006.png
       :alt: Ubuntu Install Complete
       :scale: 50%

Management Server Configuration
'''''''''''''''''''''''''''''''

Once the OS is installed on the management server, Ansible is used to remotely
install software on the management server.

To check out the ONF ansible repo and enter the virtualenv with the tooling::

  mkdir infra
  cd infra
  repo init -u ssh://<your gerrit username>@gerrit.opencord.org:29418/infra-manifest
  repo sync
  cd ansible
  make galaxy
  source venv_onfansible/bin/activate

Obtain the ``undionly.kpxe`` iPXE artifact for bootstrapping the compute
servers, and put it in the ``playbook/files`` directory.

Next, create an inventory file to access the NetBox API. An example is given
in ``inventory/example-netbox.yml`` - duplicate this file and modify it. Fill
in the ``api_endpoint`` address and ``token`` with an API key you get out of
the NetBox instance. List the IP Prefixes used by the site in the
``ip_prefixes`` list.

Next, run the ``scripts/netbox_edgeconfig.py`` script to generate a host_vars
file for the management server. Assuming that the management server in the edge
is named ``mgmtserver1.stage1.menlo``, you'd run::

  python scripts/netbox_edgeconfig.py inventory/my-netbox.yml > inventory/host_vars/mgmtserver1.stage1.menlo.yml

One manual change needs to be made to this output - edit the
``inventory/host_vars/mgmtserver1.stage1.menlo.yml`` file and add the following
to the bottom of the file, replacing the IP addresses with the management
server IP address for each segment.

If the Fabric has two leaves, each with its own IP range, add the management
server IP address for the leaf that it is connected to, and then add a route to
the IP range of the other leaf (the one the management server is not connected
to) via the Fabric router address in the connected leaf's range.

This configures `netplan <https://netplan.io>`_ on the management server
and creates a SNAT rule for the UE range route. This manual step will be
automated away soon::

  # added manually
  netprep_netplan:
    ethernets:
      eno2:
        addresses:
          - 10.0.0.1/25
    vlans:
      mgmt800:
        id: 800
        link: eno2
        addresses:
          - 10.0.0.129/25
      fabr801:
        id: 801
        link: eno2
        addresses:
          - 10.0.1.129/25
        routes:
          - to: 10.0.1.0/25
            via: 10.0.1.254
            metric: 100

  netprep_nftables_nat_postrouting: >
    ip saddr 10.0.1.0/25 ip daddr 10.168.0.0/20 counter snat to 10.0.1.129;

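Because these addresses are edited by hand, it can help to sanity-check the
fabric route before running the playbook. The sketch below uses Python's
standard ``ipaddress`` module with the values from the example above.
``check_leaf_route`` is a hypothetical helper written for this illustration,
not part of the ONF ansible repo, and the three rules it checks are an
interpretation of the text: the gateway is on the connected leaf's range, the
destination is the other leaf's range, and the gateway is the router
reservation (last usable address).

.. code-block:: python

   import ipaddress


   def check_leaf_route(address, route_to, route_via):
       """Return a list of problems with a fabric VLAN route; empty means OK."""
       iface = ipaddress.ip_interface(address)
       to_net = ipaddress.ip_network(route_to)
       via = ipaddress.ip_address(route_via)
       problems = []
       if via not in iface.network:
           problems.append(f"{via} is not on the connected range {iface.network}")
       if to_net.overlaps(iface.network):
           problems.append(f"{to_net} overlaps the connected range {iface.network}")
       if via != iface.network.broadcast_address - 1:
           problems.append(f"{via} is not the last usable address of {iface.network}")
       return problems


   # Values from the fabric VLAN in the netplan example: address, route "to",
   # and route "via".
   print(check_leaf_route("10.0.1.129/25", "10.0.1.0/25", "10.0.1.254"))

An empty list means the route is consistent with the two-leaf layout. Using
the other leaf's router address (``10.0.1.126``) as the gateway by mistake
would be reported as a problem.
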
Using the ``inventory/example-aether.ini`` as a template, create an
:doc:`ansible inventory <ansible:user_guide/intro_inventory>` file for the
site. Change the device names, IP addresses, and ``onfadmin`` password to match
the ones for this site. The management server's configuration is in the
``[aethermgmt]`` and corresponding ``[aethermgmt:vars]`` section.

Then, to configure a management server, run::

  ansible-playbook -i inventory/sitename.ini playbooks/aethermgmt-playbook.yml

This installs software with the following functionality:

- VLANs on second Ethernet port to provide connectivity to the rest of the pod.
- Firewall with NAT for routing traffic
- DHCP and TFTP for bootstrapping servers and switches
- DNS for host naming and identification
- HTTP server for serving files used for bootstrapping switches
- Downloads the Tofino switch image
- Creates user accounts for administrative access

Compute Server Bootstrap
""""""""""""""""""""""""

Once the management server has finished installation, it will be set to offer
the same iPXE bootstrap file to the compute servers.

Boot each node, and when iPXE loads select the ``Ubuntu 18.04
Installer (fully automatic)`` option.

The nodes can be controlled remotely via their BMC management interfaces - if
the BMC is at ``10.0.0.3`` a remote user can open an SSH tunnel to it through
the management server with::

  ssh -L 2443:10.0.0.3:443 onfadmin@<mgmt server ip>

And then use their web browser to access the BMC at::

  https://localhost:2443

The default BMC credentials for the Pronto nodes are::

  login: ADMIN
  password: Admin123

The BMC will also list all of the MAC addresses for the network interfaces
(including BMC) that are built into the logic board of the system. Add-in
network cards like the 40GbE ones used in compute servers aren't listed.

To prepare the compute nodes, software must be installed on them. As they
can't be accessed directly from your local system, a :ref:`jump host
<ansible:use_ssh_jump_hosts>` configuration is added, so the SSH connection
goes through the management server to the compute systems behind it. Doing this
requires a few steps:

First, configure SSH to use Agent forwarding - create or edit your
``~/.ssh/config`` file and add the following lines::

  Host <management server IP>
    ForwardAgent yes

Then try to log in to the management server, then the compute node::

  $ ssh onfadmin@<management server IP>
  Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
  ...
  onfadmin@mgmtserver1:~$ ssh onfadmin@10.0.0.138
  Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
  ...
  onfadmin@node2:~$

Being able to log in to the compute nodes from the management node means that
SSH Agent forwarding is working correctly.

Verify that your inventory (created earlier from the
``inventory/example-aether.ini`` file) includes an ``[aethercompute]`` section
that has all the names and IP addresses of the compute nodes in it.

Then run a ping test::

  ansible -i inventory/sitename.ini -m ping aethercompute

It may ask you to confirm each host's SSH key - answer ``yes`` for each host to
trust the keys::

  The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
  ECDSA key fingerprint is SHA256:...
  Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

You should then see a success message for each host::

  node1.stage1.menlo | SUCCESS => {
      "changed": false,
      "ping": "pong"
  }
  node2.stage1.menlo | SUCCESS => {
      "changed": false,
      "ping": "pong"
  }
  ...

Once you've seen this, run the playbook to install the prerequisites (Terraform
user, Docker)::

  ansible-playbook -i inventory/sitename.ini playbooks/aethercompute-playbook.yml

Note that Docker is quite large and may take a few minutes for installation
depending on internet connectivity.

Now that these compute nodes have been brought up, the rest of the installation
can continue.