Server hardware installation guide (first draft)

Change-Id: Ib4808e87f119229b7c6334e0ba6e69ee57f1230a
diff --git a/conf.py b/conf.py
index 4b3fa63..c28c9a9 100644
--- a/conf.py
+++ b/conf.py
@@ -50,6 +50,7 @@
     'sphinx.ext.coverage',
     'sphinx.ext.graphviz',
     'sphinx.ext.ifconfig',
+    'sphinx.ext.intersphinx',
     'sphinx.ext.mathjax',
     'sphinx.ext.todo',
     'sphinxcontrib.spelling',
@@ -238,10 +239,16 @@
 # -- Options for linkcheck ---------------------------------------------------
 # The link checker strips off .md from links and then complains
 linkcheck_ignore = [
-    r'https://www.sphinx-doc.org',
     r'https://jenkins\.opencord\.org/job/aether-member-only-jobs/.*'
 ]
 
+# -- options for Intersphinx extension ---------------------------------------
+
+intersphinx_mapping = {
+    'sphinx': ('https://www.sphinx-doc.org/en/master', None),
+    'trellis': ('https://docs.trellisfabric.org/master', None),
+    }
+
 def setup(app):
 
     app.add_css_file('css/rtd_theme_mods.css')
diff --git a/dict.txt b/dict.txt
index 63ca643..2b6d568 100644
--- a/dict.txt
+++ b/dict.txt
@@ -1,58 +1,64 @@
+Ansible
+ENB
+Epyc
+Grafana
+IaC
+IaaC
+Jenkins
+Kubernetes
+LTE
+Menlo
+Menlo Park
+ONOS
+PFCP
+Sercomm
+Speedtest
+SupportedTAs
+TOST
+Telegraf
+Terraform
+Tofino
+UPF
+Wireshark
+YAML
 aether
 ansible
-Ansible
+chainloadable
 config
 configs
 controlplane
 dev
 eNB
-ENB
 eNBID
 eNBs
 etcd
 ethernet
 gerrit
-Grafana
-IaC
-Jenkins
 jjb
-Kubernetes
-LTE
 macroENB
 mainboard
+makefile
 menlo
-Menlo
-Menlo Park
 namespace
+nginx
 omec
 onos
-ONOS
 patchset
 pfcp
-PFCP
+preseed
 provisioner
+pxeboot
+rackmount
+reStructuredText
 repo
 repos
 repository
-reStructuredText
 runtime
-Sercomm
-Speedtest
 subnet
-SupportedTAs
 tAC
-telegraf
-Telegraf
-Terraform
 tfvars
-Tofino
 tost
-TOST
 upf
-UPF
 virtualenv
 vpn
-Wireshark
 yaml
-YAML
-makefile
diff --git a/index.rst b/index.rst
index 29b3b24..ac4c557 100644
--- a/index.rst
+++ b/index.rst
@@ -31,3 +31,5 @@
    pronto_deployment_guide/connectivity_service_update.rst
    pronto_deployment_guide/enb_installation.rst
    pronto_deployment_guide/acceptance_test_specification.rst
+
+
diff --git a/pronto_deployment_guide/bootstrapping.rst b/pronto_deployment_guide/bootstrapping.rst
index 093eebe..fcdab9e 100644
--- a/pronto_deployment_guide/bootstrapping.rst
+++ b/pronto_deployment_guide/bootstrapping.rst
@@ -6,6 +6,8 @@
 Bootstrapping
 =============
 
+.. _switch-install:
+
 OS Installation - Switches
 ==========================
 
diff --git a/pronto_deployment_guide/enb_installation.rst b/pronto_deployment_guide/enb_installation.rst
index 393d6da..ed751c2 100644
--- a/pronto_deployment_guide/enb_installation.rst
+++ b/pronto_deployment_guide/enb_installation.rst
@@ -2,7 +2,8 @@
    SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
    SPDX-License-Identifier: Apache-2.0
 
-================
+.. _enb-installation:
+
 eNB Installation
 ================
 
@@ -59,8 +60,8 @@
 
 With the below credentials, we can log in the UI:
 
-* ID: `sc_femto`
-* Password: `scHt3pp` (or `sc_femto`)
+* ID: ``sc_femto``
+* Password: ``scHt3pp`` (or ``sc_femto``)
 
 After log-in, we can see the state page.
 
@@ -156,8 +157,13 @@
 Then, power on the Sercomm eNB device and get rid of the LAN port cable.
 
 .. note::
-   Without the LAN port cable, we can access the Sercomm eNB admin UI through `https://192.168.251.5` URL, if the laptop/PC is connected with the same network via the fabric switch.
-   For our convenience, we can add some forwarding rules into the `iptable` in the management node to get the Sercomm eNB admin UI outside. It is optional.
+   Without the LAN port cable, we can access the Sercomm eNB admin UI at
+   ``https://192.168.251.5``, if the laptop/PC is connected to the same
+   network via the fabric switch.
+
+   For our convenience, we can optionally add forwarding rules into the
+   firewall configuration on the management node to access the Sercomm eNB
+   admin UI from outside the network.
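+
+   As a minimal sketch (assuming ``iptables`` is managed directly on the
+   management node, and using an arbitrary external port of ``8443`` - adapt
+   both to the firewall tooling actually deployed), the forwarding rules would
+   look something like::
+
+      # rewrite traffic arriving on port 8443 to the eNB admin UI
+      iptables -t nat -A PREROUTING -p tcp --dport 8443 \
+          -j DNAT --to-destination 192.168.251.5:443
+      # allow the forwarded traffic, and source-NAT it so replies return
+      # through the management node
+      iptables -A FORWARD -p tcp -d 192.168.251.5 --dport 443 -j ACCEPT
+      iptables -t nat -A POSTROUTING -d 192.168.251.5 -j MASQUERADE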
 
 Troubleshooting
 ===============
diff --git a/pronto_deployment_guide/hw_installation.rst b/pronto_deployment_guide/hw_installation.rst
index 274c069..cc81564 100644
--- a/pronto_deployment_guide/hw_installation.rst
+++ b/pronto_deployment_guide/hw_installation.rst
@@ -2,7 +2,433 @@
    SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
    SPDX-License-Identifier: Apache-2.0
 
-=====================
 Hardware Installation
 =====================
 
+Hardware installation breaks down into a few steps:
+
+1. `Planning`_
+2. `Inventory`_
+3. `Rackmount of Equipment`_
+4. `Cabling and Network Topology`_
+5. `Management Switch Bootstrap`_
+6. `Management Server Bootstrap`_
+7. `Server Software Bootstrap`_
+
+Installation of the fabric switch hardware is covered in :ref:`OS Installation
+- Switches <switch-install>`.
+
+Installation of the radio hardware is covered in :ref:`eNB Installation
+<enb-installation>`.
+
+Planning
+--------
+
+Planning covers the network topology, the devices required, and the cabling
+between them.
+
+Once planning is complete, equipment is ordered to match the plan.
+
+Network Cable Plan
+""""""""""""""""""
+
+If a 2x2 TOST fabric is used, it should be configured as a :doc:`Single-Stage
+Leaf-Spine <trellis:supported-topology>`.
+
+- The links between each leaf and spine switch must be made up of two separate
+  cables.
+
+- Each compute server is dual-homed via a separate cable to two different leaf
+  switches (as in the "paired switches" diagrams).
+
+If only a single P4 switch is used, the :doc:`Simple
+<trellis:supported-topology>` topology is used, with two connections from each
+compute server to the single switch.
+
+Additionally, a non-fabric switch is required to provide a set of management
+networks.  This management switch is configured with multiple VLANs to separate
+the management plane, fabric, and the out-of-band and lights-out management
+connections on the equipment.
+
+Device Naming
+"""""""""""""
+
+Devices are named using the ``<devname>.<deployment>.<site>`` scheme described
+in step 8 of `Site Design and Bookkeeping`_ below.
+
+Site Design and Bookkeeping
+"""""""""""""""""""""""""""
+
+The following items need to be added to `NetBox
+<https://netbox.readthedocs.io/en/stable>`_ to describe each edge site:
+
+1. Add a Site for the edge (if one doesn't already exist), which has the
+   physical location and contact information for the edge.
+
+2. Add Racks to the Site (if they don't already exist)
+
+3. Add a Tenant for the edge (who owns/manages it), assigned to the ``Pronto``
+   or ``Aether`` Tenant Group.
+
+4. Add a VRF (Routing Table) for the edge site.
+
+5. Add a VLAN Group to the edge site, which groups the site's VLANs and
+   prevents duplication.
+
+6. Add VLANs for the edge site.  These should be assigned a VLAN Group, the
+   Site, and Tenant.
+
+   Multiple VLANs with the same ID can exist in NetBox (VLANs are layer 2 and
+   local to the site), but not within the same VLAN Group.
+
+   The minimal list of VLANs:
+
+     * ADMIN 1
+     * UPLINK 10
+     * MGMT 800
+     * FAB 801
+
+   If you have multiple deployments at a site using the same management server,
+   add additional VLANs incremented by 10 for the MGMT/FAB - for example:
+
+     * DEVMGMT 810
+     * DEVFAB 811
+
+7. Add IP Prefixes for the site. This should have the Tenant and VRF assigned.
+
+   All edge IP prefixes fit into a ``/22`` sized block.
+
+   The description of the Prefix contains the DNS suffix for all Devices that
+   have IP addresses within this Prefix. The full DNS names are generated by
+   combining the first ``<devname>`` component of the Device names with this
+   suffix.
+
+   The following example uses the ``10.0.0.0/22`` block. There are 5 edge
+   prefixes, with the following purposes:
+
+     * ``10.0.0.0/25``
+        * Has the Server BMC/LOM and Management Switch
+        * Assign the ADMIN 1 VLAN
+        * Set the description to ``admin.<deployment>.<site>.aetherproject.net`` (or
+          ``prontoproject.net``).
+
+     * ``10.0.0.128/25``
+        * Has the Server Management plane, Fabric Switch Management/BMC
+        * Assign MGMT 800 VLAN
+        * Set the description to ``<deployment>.<site>.aetherproject.net`` (or
+          ``prontoproject.net``).
+
+     * ``10.0.1.0/24``
+        * Has Compute Node Fabric Connections, devices connected to the Fabric like the eNB
+        * Assign FAB 801 VLAN
+        * Set the description to ``fabric.<deployment>.<site>.aetherproject.net`` (or
+          ``prontoproject.net``).
+
+     * ``10.0.2.0/24``
+        * Kubernetes Pod IPs
+
+     * ``10.0.3.0/24``
+        * Kubernetes Cluster IPs
+
+8. Add Devices to the site, for each piece of equipment. These are named with a
+   scheme similar to the DNS names used for the pod, given in this format::
+
+     <devname>.<deployment>.<site>
+
+   Examples::
+
+     mgmtserver1.ops1.tucson
+     node1.stage1.menlo
+
+   Note that these names are transformed into DNS names using the Prefixes, and
+   may have additional components - ``admin`` or ``fabric`` may be added after
+   the ``<devname>`` for devices on those networks.
+
+   Set the following fields when creating a device:
+
+     * Site
+     * Tenant
+     * Rack & Rack Position
+     * Serial number
+
+   If a specific Device Type doesn't exist for the device, it must be created;
+   this process is detailed in the NetBox documentation, or you can ask the
+   Ops team for help.
+
+9. Set the MAC address for the physical interfaces on the device.
+
+   You may also need to add physical network interfaces if they aren't already
+   created by the Device Type.  An example would be if additional add-in
+   network cards were installed.
+
+10. Add any virtual interfaces to the Devices. When creating a virtual
+    interface, it should have its ``label`` field set to the physical network
+    interface that it is assigned to.
+
+    These are needed in two cases for the Pronto deployment:
+
+     1. On the Management Server, there should be (at least) two VLAN
+        interfaces created and attached to the ``eno2`` network port, which
+        are used to provide connectivity to the management plane and fabric.
+        These should be named ``<name of vlan><vlan ID>``, so the MGMT 800 VLAN
+        would become a virtual interface named ``mgmt800``, with the label
+        ``eno2``.
+
+     2. On the Fabric switches, the ``eth0`` port is shared between the OpenBMC
+        interface and the ONIE/ONL installation.  Add a ``bmc`` virtual
+        interface with a label of ``eth0`` on each fabric switch.
+
+11. Create IP addresses for the physical and virtual interfaces.  These should
+    have the Tenant and VRF set.
+
+    The Management Server should always have the first IP address in each
+    range, and subsequent addresses should be assigned incrementally, in this
+    order. Examples are given as if there were a single instance of each
+    device - adding additional devices would increment the later IP addresses.
+
+      * Management Server
+          * ``eno1`` - site provided public IP address, or blank if DHCP
+          * ``eno2`` - 10.0.0.1/25 (first of ADMIN) - set as primary IP
+          * ``bmc`` - 10.0.0.2/25 (next of ADMIN)
+          * ``mgmt800`` - 10.0.0.129/25 (first of MGMT)
+          * ``fab801`` - 10.0.1.1/24 (first of FAB)
+
+      * Management Switch
+          * ``gbe1`` - 10.0.0.3/25 (next of ADMIN) - set as primary IP
+
+      * Fabric Switch
+          * ``eth0`` - 10.0.0.130/25 (next of MGMT), set as primary IP
+          * ``bmc`` - 10.0.0.131/25
+
+      * Compute Server
+          * ``eth0`` - 10.0.0.132/25 (next of MGMT), set as primary IP
+          * ``bmc`` - 10.0.0.4/25 (next of ADMIN)
+          * ``qsfp0`` - 10.0.1.2/24 (next of FAB)
+          * ``qsfp1`` - 10.0.1.3/24
+
+      * Other Fabric devices (eNB, etc.)
+          * ``eth0`` or other primary interface - 10.0.1.4/24 (next of FAB)
+
+12. Add Cables between physical interfaces on the devices.
+
+    TODO: Explain the cabling topology
+
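+Once the entries above exist, they can be spot-checked from the command line
+with the NetBox REST API.  A minimal sketch using ``curl`` and ``jq``,
+assuming a read-only API token in ``$NETBOX_TOKEN`` and placeholder values for
+the NetBox URL and site slug::
+
+  # list the Devices registered for the site
+  curl -s -H "Authorization: Token $NETBOX_TOKEN" \
+    "https://netbox.example.org/api/dcim/devices/?site=stage1-menlo" \
+    | jq -r '.results[].name'
+
+  # list the IP Prefixes assigned to the site
+  curl -s -H "Authorization: Token $NETBOX_TOKEN" \
+    "https://netbox.example.org/api/ipam/prefixes/?site=stage1-menlo" \
+    | jq -r '.results[].prefix'
+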
+Hardware
+""""""""
+
+Fabric Switches
+'''''''''''''''
+
+Pronto currently uses fabric switches based on the Intel (formerly Barefoot)
+Tofino chipset.  There are multiple variants of this switching chipset, with
+different speeds and capabilities.
+
+The specific hardware models in use in Pronto:
+
+* `EdgeCore Wedge100BF-32X
+  <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=335>`_
+  - a "Dual Pipe" chipset variant, used for the Spine switches
+
+* `EdgeCore Wedge100BF-32QS
+  <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=770>`_
+  - a "Quad Pipe" chipset variant, used for the Leaf switches
+
+Compute Servers
+'''''''''''''''
+
+These servers run Kubernetes and edge applications.
+
+The requirements for these servers:
+
+* AMD64 (aka x86-64) architecture
+* Sufficient resources to run Kubernetes
+* Two 40GbE or 100GbE Ethernet connections to the fabric switches
+* One management 1GbE port
+
+The specific hardware models in use in Pronto:
+
+* `Supermicro 6019U-TRTP2
+  <https://www.supermicro.com/en/products/system/1U/6019/SYS-6019U-TRTP2.cfm>`_
+  1U server
+
+* `Supermicro 6029U-TR4
+  <https://www.supermicro.com/en/products/system/2U/6029/SYS-6029U-TR4.cfm>`_
+  2U server
+
+These servers are configured with:
+
+* 2x `Intel Xeon 5220R CPUs
+  <https://ark.intel.com/content/www/us/en/ark/products/199354/intel-xeon-gold-5220r-processor-35-75m-cache-2-20-ghz.html>`_,
+  each with 24 cores, 48 threads
+* 384GB of DDR4 Memory, made up of 12x 16GB ECC DIMMs
+* 2TB of NVMe Flash Storage
+* 2x 6TB SATA Disk storage
+* 2x 40GbE ports using an XL710QDA2 NIC
+
+The 1U servers additionally have:
+
+- 2x 1GbE copper network ports
+- 2x 10GbE SFP+ network ports
+
+The 2U servers have:
+
+- 4x 1GbE copper network ports
+
+Management Server
+'''''''''''''''''
+
+One management server is required, which must have at least two 1GbE network
+ports, and runs a variety of network services to support the edge.
+
+The model used in Pronto is a `Supermicro 5019D-FTN4
+<https://www.supermicro.com/en/Aplus/system/Embedded/AS-5019D-FTN4.cfm>`_,
+which is configured with:
+
+* AMD Epyc 3251 CPU with 8 cores, 16 threads
+* 32GB of DDR4 memory, in 2x 16GB ECC DIMMs
+* 1TB of NVMe Flash storage
+* 4x 1GbE copper network ports
+
+Management Switch
+'''''''''''''''''
+
+This switch connects the configuration interfaces and management networks on
+all the servers and switches together.
+
+In the Pronto deployment this hardware is an `HP/Aruba 2540 Series JL356A
+<https://www.arubanetworks.com/products/switches/access/2540-series/>`_.
+
+Inventory
+---------
+
+Once equipment arrives, any device needs to be recorded in inventory if it:
+
+1. Connects to the network (has a MAC address)
+2. Has a serial number
+3. Isn't a subcomponent (disk, add-in card, linecard, etc.) of a larger device.
+
+The following information should be recorded for every device:
+
+- Manufacturer
+- Model
+- Serial Number
+- MAC address (for the primary and any management/BMC/IPMI interfaces)
+
+This information should be added to the corresponding Devices in the ONF
+NetBox instance.  The accuracy of this information is very important, as it is
+used in bootstrapping the systems.
+
+Once inventory has been completed, let the Infra team know, and the pxeboot
+configuration will be generated with OS preseed files corresponding to the new
+servers, based on their serial numbers.
+
+Rackmount of Equipment
+----------------------
+
+Most of the Pronto equipment is in a 19" rackmount form factor.
+
+Guidelines for mounting this equipment:
+
+- The EdgeCore Wedge Switches have a front-to-back (aka "port-to-power") fan
+  configuration, so hot air exhaust is out the back of the switch near the
+  power inlets, away from the 32 QSFP network ports on the front of the switch.
+
+- The full-depth 1U and 2U Supermicro servers also have front-to-back airflow
+  but have most of their ports on the rear of the device.
+
+- Airflow through the rack should be in one direction to avoid heat being
+  pulled from one device into another.  This means that to connect the QSFP
+  network ports from the servers to the switches, cabling should be routed
+  through the rack from front (switch) to back (server).
+
+- The short-depth management HP Switch and 1U Supermicro servers should be
+  mounted to the rear of the rack.  Neither generates an appreciable amount
+  of heat, so the airflow direction isn't a significant factor in racking
+  them.
+
+Cabling and Network Topology
+----------------------------
+
+TODO: Add diagrams of network here, and cabling plan
+
+Management Switch Bootstrap
+---------------------------
+
+TODO: Add instructions for bootstrapping management switch, from document that
+has the linked config file.
+
+Server Software Bootstrap
+-------------------------
+
+Management Server Bootstrap
+"""""""""""""""""""""""""""
+
+The management server is bootstrapped into a customized version of the standard
+Ubuntu 18.04 OS installer.
+
+The `iPXE boot firmware <https://ipxe.org/>`_ is used to start this process
+and is built using the steps detailed in the `ipxe-build
+<https://gerrit.opencord.org/plugins/gitiles/ipxe-build>`_ repo, which
+generates both USB and PXE chainloadable boot images.
+
+Once a system has been started using these images, it will download a
+customized script from an external webserver to continue the boot process.
+This iPXE to webserver connection is secured with mutual TLS authentication,
+enforced by the nginx webserver.
+
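+The mutual TLS setup can be verified from a host that holds a valid client
+certificate; a minimal sketch, assuming hypothetical certificate/key paths and
+a placeholder server URL and script name::
+
+  curl --cert /path/to/client.crt --key /path/to/client.key \
+    https://pxeboot.example.org/menu.ipxe
+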
+The iPXE scripts are created by the `pxeboot
+<https://gerrit.opencord.org/plugins/gitiles/ansible/role/pxeboot>`_ role,
+which creates a boot menu, downloads the appropriate binaries for
+bootstrapping an OS installation, and creates per-node installation preseed
+files.
+
+The preseed files contain configuration steps to install the OS from the
+upstream Ubuntu repos, as well as package customization and creation of the
+``onfadmin`` user.
+
+TODO: convert instructions for bootstrapping the management server with iPXE here.
+
+Once the OS is installed, Ansible is used to remotely install software on the
+management server.
+
+To check out the ONF ansible repo and enter the virtualenv with the tooling::
+
+  mkdir infra
+  cd infra
+  repo init -u ssh://<your gerrit username>@gerrit.opencord.org:29418/infra-manifest
+  repo sync
+  cd ansible
+  make galaxy
+  source venv_onfansible/bin/activate
+
+Next, create an inventory file to access the NetBox API.  An example is given
+in ``inventory/example-netbox.yml`` - duplicate this file and modify it. Fill
+in ``api_endpoint`` with the address of the NetBox instance and ``token`` with
+an API key obtained from it.  List the IP Prefixes used by the site in the
+``ip_prefixes`` list.
+
+Next, run the ``scripts/netbox_edgeconfig.py`` script to generate a host_vars
+file for the management server.  Assuming that the management server in the
+edge is named ``mgmtserver1.stage1.menlo``, you'd run::
+
+  python scripts/netbox_edgeconfig.py inventory/my-netbox.yml > inventory/host_vars/mgmtserver1.stage1.menlo.yml
+
+Create an inventory file for the management server in
+``inventory/menlo-staging.ini`` which contains::
+
+  [mgmt]
+  mgmtserver1.stage1.menlo ansible_host=<public ip address> ansible_user="onfadmin" ansible_become_password=<password>
+
+Then, to configure a management server, run::
+
+  ansible-playbook -i inventory/menlo-staging.ini playbooks/prontomgmt-playbook.yml
+
+This installs software with the following functionality (a quick post-install
+check is sketched after the list):
+
+- VLANs on the second Ethernet port to provide connectivity to the rest of the
+  pod.
+- Firewall with NAT for routing traffic
+- DHCP and TFTP for bootstrapping servers and switches
+- DNS for host naming and identification
+- HTTP server for serving files used for bootstrapping other equipment
+
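+After the playbook completes, the services above can be sanity-checked from
+the management server.  A minimal sketch - the exact daemons providing these
+services are deployment-specific::
+
+  # VLAN interfaces on the second Ethernet port
+  ip -d link show type vlan
+
+  # DNS (53), DHCP (67) and TFTP (69) listeners
+  sudo ss -ulpn | grep -E ':(53|67|69)\b'
+
+  # HTTP server used for bootstrapping other equipment
+  curl -I http://localhost/
+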
+Compute Server Bootstrap
+""""""""""""""""""""""""
+
+Once the management server has finished installation, it will be set to offer
+the same iPXE bootstrap file to the compute servers.
+
+Each node will be booted, and when iPXE loads, select the ``Ubuntu 18.04
+Installer (fully automatic)`` option.
diff --git a/readme.rst b/readme.rst
index 153cfd0..d77f726 100644
--- a/readme.rst
+++ b/readme.rst
@@ -4,15 +4,25 @@
 Writing Documentation
 ---------------------
 
-Docs are generated using `Sphinx <https://www.sphinx-doc.org/en/master/>`_.
+Docs are generated using :doc:`Sphinx <sphinx:usage/index>`.
 
-Documentation is written in `reStructuredText
-<https://www.sphinx-doc.org/en/master/usage/restructuredtext/>`_.
+Documentation is written in :doc:`reStructuredText
+<sphinx:usage/restructuredtext/basics>`.
 
 In reStructuredText documents, to create the section hierarchy (mapped in HTML
 to ``<h1>`` through ``<h5>``) use these characters to underline headings in the
 order given: ``=``, ``-`` ``"``, ``'``, ``^``.
 
+Referencing other Documentation
+-------------------------------
+
+Other Sphinx-built documentation, both ONF and non-ONF, can be linked to using
+:doc:`Intersphinx <sphinx:usage/extensions/intersphinx>`.
+
+You can see all link targets available in a remote Sphinx-built documentation
+site by running::
+
+  python -m sphinx.ext.intersphinx http://otherdocs/objects.inv
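+
+For example, this documentation links to the Trellis topology docs with::
+
+  :doc:`Single-Stage Leaf-Spine <trellis:supported-topology>`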
+
 Building the Docs
 ------------------
 
@@ -50,9 +60,8 @@
 diagrams within the documentation. This is preferred over images as it's easier
 to change and see changes over time as a diff.
 
-`Graphviz
-<https://www.sphinx-doc.org/en/master/usage/extensions/graphviz.html>`_
-supports many standard graph types.
+:doc:`Graphviz <sphinx:usage/extensions/graphviz>` supports many standard graph
+types.
 
 The `blockdiag <http://blockdiag.com/en/blockdiag/sphinxcontrib.html>`_,
 `nwdiag, and rackdiag <http://blockdiag.com/en/nwdiag/sphinxcontrib.html>`_,