 ..
    SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
    SPDX-License-Identifier: Apache-2.0

 Hardware Installation
 =====================

 Once the hardware has been ordered, the installation can be planned and
 implemented. This document describes the installation of the servers and
 software.

 Installation of the fabric switch hardware is covered in :ref:`OS Installation
 - Switches <switch-install>`.

 Installation of the radio hardware is covered in :ref:`eNB Installation
 <enb-installation>`.

 Site Bookkeeping
 ----------------

 The following items need to be added to `NetBox
 <https://netbox.readthedocs.io/en/stable>`_ to describe each edge site:

 1. Add a Site for the edge (if one doesn't already exist), which has the
    physical location and contact information for the edge.

 2. Add equipment Racks to the Site (if they don't already exist).

 3. Add a Tenant for the edge (who owns/manages it), assigned to the ``Pronto``
    or ``Aether`` Tenant Group.

 4. Add a VRF (Routing Table) for the edge site. This is usually just the name
    of the site.  Make sure that ``Enforce unique space`` is checked, so that IP
    addresses within the VRF are forced to be unique, and that the Tenant Group
    and Tenant are set.

 5. Add a VLAN Group to the edge site, which groups the site's VLANs and
    requires that they have a unique VLAN number.

 6. Add VLANs for the edge site.  These should be assigned a VLAN Group, the
    Site, and Tenant.

    The same VLAN ID can exist more than once in NetBox (VLANs are layer 2, and
    local to the site), but only once within a VLAN Group.

    The minimal list of VLANs:

      * ADMIN 1
      * UPLINK 10
      * MGMT 800
      * FAB 801

    If you have multiple deployments at a site using the same management server,
    add additional VLANs incremented by 10 for the MGMT/FAB - for example:

      * DEVMGMT 810
      * DEVFAB 811

 7. Add IP Prefixes for the site. These should have the Tenant and VRF assigned.

    All edge IP prefixes fit into a ``/22`` sized block.

    The description of the Prefix contains the DNS suffix for all Devices that
    have IP addresses within this Prefix. The full DNS names are generated by
    combining the first ``<devname>`` component of the Device names with this
    suffix.

    An example using the ``10.0.0.0/22`` block, which contains 4 edge
    prefixes with the following purposes:

      * ``10.0.0.0/25``

         * Has the Server BMC/LOM and Management Switch
         * Assign the ADMIN 1 VLAN
         * Set the description to ``admin.<deployment>.<site>.aetherproject.net`` (or
           ``prontoproject.net``).

      * ``10.0.0.128/25``

         * Has the Server Management plane, Fabric Switch Management/BMC
         * Assign MGMT 800 VLAN
         * Set the description to ``<deployment>.<site>.aetherproject.net`` (or
           ``prontoproject.net``).

      * ``10.0.1.0/25``

         * Has the IP addresses of the ``qsfp0`` ports that connect the Compute
           Nodes to the Fabric switches, and of other devices connected to the
           Fabric, such as the eNB
         * Assign FAB 801 VLAN
         * Set the description to ``fab1.<deployment>.<site>.aetherproject.net`` (or
           ``prontoproject.net``).

      * ``10.0.1.128/25``

         * IP addresses of the qsfp1 port of the Compute Nodes to fabric switches
         * Assign FAB 801 VLAN
         * Set the description to ``fab2.<deployment>.<site>.aetherproject.net`` (or
           ``prontoproject.net``).

    Additionally, these edge prefixes are used for Kubernetes but don't need to
    be created in NetBox:

      * ``10.0.2.0/24``

         * Kubernetes Pod IPs

      * ``10.0.3.0/24``

         * Kubernetes Cluster IPs

 8. Add Devices to the site, for each piece of equipment. These are named with a
    scheme similar to the DNS names used for the pod, given in this format::

      <devname>.<deployment>.<site>

    Examples::

      mgmtserver1.ops1.tucson
      node1.stage1.menlo

    Note that these names are transformed into DNS names using the Prefixes, and
    may have additional components - ``admin`` or ``fabric`` may be added after
    the ``<devname>`` for devices on those networks.

    Set the following fields when creating a device:

      * Site
      * Tenant
      * Rack & Rack Position
      * Serial number

    If a specific Device Type doesn't exist for the device, it must be created.
    This is detailed in the NetBox documentation, or you can ask the Ops team
    for help.

    See `Rackmount of Equipment`_ below for guidance on how equipment should be
    mounted in the Rack.

 9. Add Services to the management server:

     * name: ``dns``
       protocol: UDP
       port: 53

     * name: ``tftp``
       protocol: UDP
       port: 69

    These are used by the DHCP and DNS config to know which servers offer
    DNS or TFTP service.

 10. Set the MAC address for the physical interfaces on the device.

     You may also need to add physical network interfaces if they aren't
     already created by the Device Type.  An example would be if additional
     add-in network cards were installed.

 11. Add any virtual interfaces to the Devices. When creating a virtual
     interface, it should have its ``label`` field set to the physical network
     interface that it is assigned to.

     These are needed in two cases for the Pronto deployment:

      1. On the Management Server, there should be (at least) two VLAN
         interfaces created attached to the ``eno2`` network port, which
         are used to provide connectivity to the management plane and fabric.
         These should be named ``<name of vlan><vlan ID>``, so the MGMT 800 VLAN
         would become a virtual interface named ``mgmt800``, with the label
         ``eno2``.

      2. On the Fabric switches, the ``eth0`` port is shared between the OpenBMC
         interface and the ONIE/ONL installation.  Add a ``bmc`` virtual
         interface with a label of ``eth0`` on each fabric switch, and check the
         ``OOB Management`` checkbox.

 12. Create IP addresses for the physical and virtual interfaces.  These should
     have the Tenant and VRF set.

     The Management Server should always have the first IP address in each
     range, and they should be incremental, in this order. Examples are given as
     if there was a single instance of each device - adding additional devices
     would increment the later IP addresses.

       * Management Server

           * ``eno1`` - site provided public IP address, or blank if DHCP
             provided

           * ``eno2`` - 10.0.0.1/25 (first of ADMIN) - set as primary IP
           * ``bmc`` - 10.0.0.2/25 (next of ADMIN)
           * ``mgmt800`` - 10.0.0.129/25 (first of MGMT)
           * ``fab801`` - 10.0.1.1/25 (first of FAB)

       * Management Switch

           * ``gbe1`` - 10.0.0.3/25 (next of ADMIN) - set as primary IP

       * Fabric Switch

           * ``eth0`` - 10.0.0.130/25 (next of MGMT), set as primary IP
           * ``bmc`` - 10.0.0.131/25

       * Compute Server

           * ``eth0`` - 10.0.0.132/25 (next of MGMT), set as primary IP
           * ``bmc`` - 10.0.0.4/25 (next of ADMIN)
           * ``qsfp0`` - 10.0.1.2/25 (next of FAB)
           * ``qsfp1`` - 10.0.1.3/25

       * Other Fabric devices (eNB, etc.)

           * ``eth0`` or other primary interface - 10.0.1.4/25 (next of FAB)

 13. Add DHCP ranges to the IP Prefixes for IPs that aren't reserved. These are
     done like any other IP Address, but with the ``Status`` field set to
     ``DHCP``, and they'll consume the entire range of IP addresses given in the
     CIDR mask.

     For example ``10.0.0.32/27`` as a DHCP block would take up 1/4 of the ADMIN
     prefix.

 14. Add Cables between physical interfaces on the devices.

     TODO: Explain the cabling topology
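
 The addressing plan in steps 7, 12, and 13 can be sanity-checked with Python's
 standard ``ipaddress`` module. The following is a minimal sketch, not part of
 the ONF tooling: the ``edge_plan`` and ``dns_name`` helper names and the
 ``stage1.menlo`` deployment/site values are illustrative assumptions::

   import ipaddress


   def edge_plan(block):
       """Split a /22 edge block into the prefixes described in step 7."""
       net = ipaddress.ip_network(block)
       if net.prefixlen != 22:
           raise ValueError("edge blocks are expected to be /22")
       # The first /23 holds the four /25 prefixes; the second /23 holds
       # the two Kubernetes /24 ranges.
       lower, upper = net.subnets(new_prefix=23)
       admin, mgmt, fab1, fab2 = lower.subnets(new_prefix=25)
       pods, cluster = upper.subnets(new_prefix=24)
       return {"admin": admin, "mgmt": mgmt, "fab1": fab1, "fab2": fab2,
               "k8s_pods": pods, "k8s_cluster": cluster}


   def dns_name(device, suffix):
       """Join the <devname> component of a Device name with a Prefix suffix."""
       return device.split(".")[0] + "." + suffix


   plan = edge_plan("10.0.0.0/22")
   # The first usable address of each prefix goes to the Management Server.
   first_admin = next(plan["admin"].hosts())
   first_mgmt = next(plan["mgmt"].hosts())
   # A /27 DHCP range covers 32 of the 128 addresses in a /25.
   dhcp = ipaddress.ip_network("10.0.0.32/27")
   fraction = dhcp.num_addresses / plan["admin"].num_addresses
   print(plan["fab2"], first_admin, first_mgmt, fraction)
   print(dns_name("node1.stage1.menlo", "admin.stage1.menlo.aetherproject.net"))

 Running a check like this before entering data in NetBox makes it easier to
 spot a DHCP range that spills outside its prefix.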

 Rackmount of Equipment
 ----------------------

 Most of the Pronto equipment has a 19" rackmount form factor.

 Guidelines for mounting this equipment:

 - The EdgeCore Wedge Switches have a front-to-back (aka "port-to-power") fan
   configuration, so hot air exhaust is out the back of the switch near the
   power inlets, away from the 32 QSFP network ports on the front of the switch.

 - The full-depth 1U and 2U Supermicro servers also have front-to-back airflow
   but have most of their ports on the rear of the device.

 - Airflow through the rack should be in one direction to avoid heat being
   pulled from one device into another.  This means that to connect the QSFP
   network ports from the servers to the switches, cabling should be routed
   through the rack from front (switch) to back (server).  Empty rack spaces
   should be reserved for this purpose.

 - The short-depth management HP Switch and 1U Supermicro servers should be
   mounted on the rear of the rack.  Neither generates an appreciable amount of
   heat, so the airflow direction isn't a significant factor in racking them.

 Inventory
 ---------

 Once equipment arrives, any device needs to be recorded in inventory if it:

 1. Connects to the network (has a MAC address)
 2. Has a serial number
 3. Isn't a subcomponent (disk, add-in card, linecard, etc.) of a larger device.

 The following information should be recorded for every device:

 - Manufacturer
 - Model
 - Serial Number
 - MAC address (for the primary and any management/BMC/IPMI interfaces)

 This information should be added to the corresponding Devices in the ONF NetBox
 instance.  The accuracy of this information is very important as it is used in
 bootstrapping the systems.

 Once inventory has been completed, let the Infra team know, and the pxeboot
 configuration will be generated to have the OS preseed files corresponding to the
 new servers based on their serial numbers.

 Cabling and Network Topology
 ----------------------------

 TODO: Add diagrams of network here, and cabling plan

 Management Switch Bootstrap
 ---------------------------

 TODO: Add instructions for bootstrapping management switch, from document that
 has the linked config file.

 Software Bootstrap
 ------------------

 Management Server Bootstrap
 """""""""""""""""""""""""""

 The management server is bootstrapped into a customized version of the standard
 Ubuntu 18.04 OS installer.

 The `iPXE boot firmware <https://ipxe.org/>`_ is used to start this process
 and is built using the steps detailed in the `ipxe-build
 <https://gerrit.opencord.org/plugins/gitiles/ipxe-build>`_ repo, which
 generates both USB and PXE chainloadable boot images.

 Once a system has been started using one of these images, iPXE downloads a
 customized script from an external webserver to continue the boot process.
 This iPXE to webserver connection is secured with mutual TLS authentication,
 enforced by the nginx webserver.

 The iPXE scripts are created by the `pxeboot
 <https://gerrit.opencord.org/plugins/gitiles/ansible/role/pxeboot>`_ role,
 which creates a boot menu, downloads the appropriate binaries for
 bootstrapping an OS installation, and creates per-node installation preseed
 files.

 The preseed files contain configuration steps to install the OS from the
 upstream Ubuntu repos, as well as customization of packages and creating the
 ``onfadmin`` user.

 TODO: convert instructions for bootstrapping the management server with iPXE here.

 Once the OS is installed on the management server, Ansible is used to remotely
 install software on the management server.

 To check out the ONF ansible repo and enter the virtualenv with the tooling::

   mkdir infra
   cd infra
   repo init -u ssh://<your gerrit username>@gerrit.opencord.org:29418/infra-manifest
   repo sync
   cd ansible
   make galaxy
   source venv_onfansible/bin/activate

 Obtain the ``undionly.kpxe`` iPXE artifact for bootstrapping the compute
 servers, and put it in the ``playbook/files`` directory.

 Next, create an inventory file to access the NetBox API.  An example is given
 in ``inventory/example-netbox.yml`` - duplicate this file and modify it. Fill
 in the ``api_endpoint`` address and ``token`` with an API key you get out of
 the NetBox instance.  List the IP Prefixes used by the site in the
 ``ip_prefixes`` list.
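
 Before running the tooling, it can help to confirm that the endpoint and token
 work. The sketch below builds a request against the NetBox REST API using only
 the standard library. It assumes that ``api_endpoint`` is the base URL of the
 NetBox instance; the URL and token values are placeholders, and
 ``prefix_request`` is a hypothetical helper, not part of the ONF scripts::

   import urllib.parse
   import urllib.request


   def prefix_request(api_endpoint, token, prefix):
       """Build a GET request that looks up one IP Prefix in NetBox."""
       query = urllib.parse.urlencode({"prefix": prefix})
       url = api_endpoint.rstrip("/") + "/api/ipam/prefixes/?" + query
       headers = {
           # NetBox token authentication header.
           "Authorization": "Token " + token,
           "Accept": "application/json",
       }
       return urllib.request.Request(url, headers=headers)


   req = prefix_request(
       "https://netbox.example.org",  # placeholder endpoint
       "0123456789abcdef",            # placeholder token
       "10.0.0.0/25",
   )
   print(req.full_url)
   # To send it, pass req to urllib.request.urlopen() and parse the JSON body.

 A prefix missing from the response usually means it was not created in NetBox
 or that the token lacks permission to read it.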

 Next, run the ``scripts/netbox_edgeconfig.py`` to generate a host_vars file for
 the management server.  Assuming that the management server in the edge is
 named ``mgmtserver1.stage1.menlo``, you'd run::

   python scripts/netbox_edgeconfig.py inventory/my-netbox.yml > inventory/host_vars/mgmtserver1.stage1.menlo.yml

 One manual change needs to be made to this output - edit the
 ``inventory/host_vars/mgmtserver1.stage1.menlo.yml`` file and add the following
 to the bottom of the file, replacing the IP addresses with *only the lowest
 numbered IP address* the management server has on each VLAN (if >1 IP address
 is assigned to a VLAN or Interface, the DHCP server will fail to run). This
 configures the `netplan <https://netplan.io>`_ on the management server, and
 will be automated away soon::

   # added manually
   netprep_netplan:
     ethernets:
       eno2:
         addresses:
           - 10.0.0.1/25
     vlans:
       mgmt800:
         id: 800
         link: eno2
         addresses:
           - 10.0.0.129/25
       fab801:
         id: 801
         link: eno2
         addresses:
           - 10.0.1.1/25
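
 A block of this shape can be derived from the VLAN naming convention
 (``<name of vlan><vlan ID>``) and the rule of keeping only the lowest numbered
 address. A minimal sketch, assuming hypothetical input data and a hypothetical
 ``netplan_vars`` helper (the ONF tooling may structure this differently)::

   import ipaddress
   import json


   def netplan_vars(link, link_addrs, vlans):
       """Build the netprep_netplan mapping with one address per interface.

       vlans maps (name, id) tuples to lists of addresses in CIDR notation.
       """
       def lowest(addrs):
           # Keep only the lowest numbered address on each interface.
           return str(min(ipaddress.ip_interface(a) for a in addrs))

       return {
           "netprep_netplan": {
               "ethernets": {link: {"addresses": [lowest(link_addrs)]}},
               "vlans": {
                   # Interface name is <name of vlan><vlan ID>, lowercased.
                   name.lower() + str(vid): {
                       "id": vid,
                       "link": link,
                       "addresses": [lowest(addrs)],
                   }
                   for (name, vid), addrs in vlans.items()
               },
           }
       }


   result = netplan_vars(
       "eno2",
       ["10.0.0.1/25"],
       {
           ("MGMT", 800): ["10.0.0.130/25", "10.0.0.129/25"],
           ("FAB", 801): ["10.0.1.1/25"],
       },
   )
   print(json.dumps(result, indent=2))

 The JSON printed here is also valid YAML, so it can be compared directly with
 the block added to the host_vars file.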

 Using the ``inventory/example-aether.ini`` as a template, create an
 :doc:`ansible inventory <ansible:user_guide/intro_inventory>` file for the
 site. Change the device names, IP addresses, and ``onfadmin`` password to match
 the ones for this site.  The management server's configuration is in the
 ``[aethermgmt]`` and corresponding ``[aethermgmt:vars]`` section.

 Then, to configure a management server, run::

   ansible-playbook -i inventory/sitename.ini playbooks/aethermgmt-playbook.yml

 This installs software with the following functionality:

 - VLANs on second Ethernet port to provide connectivity to the rest of the pod.
 - Firewall with NAT for routing traffic
 - DHCP and TFTP for bootstrapping servers and switches
 - DNS for host naming and identification
 - HTTP server for serving files used for bootstrapping switches
 - Downloads the Tofino switch image
 - Creates user accounts for administrative access

 Compute Server Bootstrap
 """"""""""""""""""""""""

 Once the management server has finished installation, it will be set to offer
 the same iPXE bootstrap file to the compute servers.

 Boot each node, and when iPXE loads, select the ``Ubuntu 18.04
 Installer (fully automatic)`` option.

 The nodes can be controlled remotely via their BMC management interfaces - if
 the BMC is at ``10.0.0.3``, a remote user can forward a local port to its web
 interface through the management server with::

   ssh -L 2443:10.0.0.3:443 onfadmin@<mgmt server ip>

 And then use their web browser to access the BMC at::

   https://localhost:2443
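
 When several BMCs need to be reached, the tunnel command can be generated
 rather than typed by hand. A small sketch, where ``bmc_tunnel_cmd`` and the
 management server address ``198.51.100.10`` are illustrative assumptions::

   import shlex


   def bmc_tunnel_cmd(bmc_ip, mgmt_host, local_port=2443, user="onfadmin"):
       """Return the ssh argv forwarding local_port to the BMC's HTTPS port."""
       forward = "{}:{}:443".format(local_port, bmc_ip)
       return ["ssh", "-L", forward, "{}@{}".format(user, mgmt_host)]


   cmd = bmc_tunnel_cmd("10.0.0.3", "198.51.100.10")
   # Print the command in a form that can be pasted into a shell.
   print(shlex.join(cmd))

 Using a different ``local_port`` per BMC allows several tunnels to be open at
 the same time.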

 The default BMC credentials for the Pronto nodes are::

   login: ADMIN
   password: Admin123

 The BMC will also list all of the MAC addresses for the network interfaces
 (including BMC) that are built into the logic board of the system. Add-in
 network cards like the 40GbE ones used in compute servers aren't listed.

 To prepare the compute nodes, software must be installed on them.  As they
 can't be accessed directly from your local system, a :ref:`jump host
 <ansible:use_ssh_jump_hosts>` configuration is added, so the SSH connection
 goes through the management server to the compute systems behind it. Doing this
 requires a few steps:

 First, configure SSH to use Agent forwarding - create or edit your
 ``~/.ssh/config`` file and add the following lines::

   Host <management server IP>
     ForwardAgent yes

 Then try to log in to the management server, then the compute node::

   $ ssh onfadmin@<management server IP>
   Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
   ...
   onfadmin@mgmtserver1:~$ ssh onfadmin@10.0.0.138
   Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
   ...
   onfadmin@node2:~$

 Being able to log in to the compute nodes from the management node means that
 SSH Agent forwarding is working correctly.

 Verify that your inventory (created earlier from the
 ``inventory/example-aether.ini`` file) includes an ``[aethercompute]`` section
 that has all the names and IP addresses of the compute nodes in it.
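
 This check can be scripted. The sketch below assumes a plain INI-style
 inventory with one host per line and ``ansible_host`` variables; the sample
 contents and the ``hosts_in_group`` helper are hypothetical, and the real
 ``example-aether.ini`` layout may differ::

   def hosts_in_group(inventory_text, group):
       """Return the host names listed under [group] in an INI inventory."""
       hosts, current = [], None
       for raw in inventory_text.splitlines():
           line = raw.strip()
           if not line or line.startswith(("#", ";")):
               continue  # skip blank lines and comments
           if line.startswith("[") and line.endswith("]"):
               current = line[1:-1]
           elif current == group:
               # The host name is the first whitespace-separated token.
               hosts.append(line.split()[0])
       return hosts


   sample = """
   [aethermgmt]
   mgmtserver1.stage1.menlo ansible_host=198.51.100.10

   [aethercompute]
   node1.stage1.menlo ansible_host=10.0.0.137
   node2.stage1.menlo ansible_host=10.0.0.138
   """

   compute = hosts_in_group(sample, "aethercompute")
   print(compute)

 An empty result means the ``[aethercompute]`` section is missing or has no
 hosts, and the ping test in the next step would match nothing.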

 Then run a ping test::

   ansible -i inventory/sitename.ini -m ping aethercompute

 It may ask you to confirm host keys - answer ``yes`` for each host to trust the keys::

   The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
   ECDSA key fingerprint is SHA256:...
   Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

 You should then see a success message for each host::

   node1.stage1.menlo | SUCCESS => {
       "changed": false,
       "ping": "pong"
   }
   node2.stage1.menlo | SUCCESS => {
       "changed": false,
       "ping": "pong"
   }
   ...
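
 With many nodes, the output can be summarized instead of read by eye. A
 minimal sketch that parses the ``host | STATUS => {`` lines; the
 ``ping_results`` helper and the sample text (including the ``UNREACHABLE!``
 line) are illustrative assumptions::

   import re

   RESULT_RE = re.compile(r"^(\S+) \| (\w+)")


   def ping_results(output):
       """Map each host to the status word on its 'host | STATUS' line."""
       results = {}
       for line in output.splitlines():
           match = RESULT_RE.match(line.strip())
           if match:
               results[match.group(1)] = match.group(2)
       return results


   sample = """
   node1.stage1.menlo | SUCCESS => {
       "changed": false,
       "ping": "pong"
   }
   node2.stage1.menlo | UNREACHABLE! => {
       "changed": false,
       "unreachable": true
   }
   """

   results = ping_results(sample)
   failed = sorted(h for h, status in results.items() if status != "SUCCESS")
   print(failed)

 An empty ``failed`` list corresponds to every host reporting ``SUCCESS``, as
 in the output shown above.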

 Once you've seen this, run the playbook to install the prerequisites (Terraform
 user, Docker)::

   ansible-playbook -i inventory/sitename.ini playbooks/aethercompute-playbook.yml

 Note that Docker is quite large and may take a few minutes for installation
 depending on internet connectivity.

 Now that these compute nodes have been brought up, the rest of the installation
 can continue.