Reorg, updates and troubleshooting guide
- Expanded hw install
- Change fabric switch bootstrap to DHCP/HTTP based ONL install
- Start of operations and troubleshooting guide
- Various grammar/spelling fixes, dictionary expansion
Change-Id: I9b30d63a97e4443ea3871ee880646e161de8969a
diff --git a/pronto_deployment_guide/bootstrapping.rst b/pronto_deployment_guide/bootstrapping.rst
index fcdab9e..5b0d895 100644
--- a/pronto_deployment_guide/bootstrapping.rst
+++ b/pronto_deployment_guide/bootstrapping.rst
@@ -2,107 +2,179 @@
SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
SPDX-License-Identifier: Apache-2.0
-=============
Bootstrapping
=============
.. _switch-install:
OS Installation - Switches
-==========================
+--------------------------
+
+The installation of the ONL OS image on the fabric switches uses the DHCP and
+HTTP server set up on the management server.
+
+The default image is downloaded during that installation process by the
+``onieboot`` role. Make changes to that role and rerun the management playbook
+to download a newer switch image.
+
+Preparation
+"""""""""""
+
+The switches have a single ethernet port that is shared between OpenBMC and
+ONL. Find out the MAC addresses for both of these ports and enter them into
+NetBox.
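+Once you can reach a shell on the device, one way to collect the MAC
+addresses is from ``ip link`` output. The following is a parsing sketch (not
+part of the official tooling; the sample output is abbreviated) that turns
+saved output into NetBox-ready interface/MAC pairs:

```python
import re

# Sample `ip link show` output, abbreviated to a single interface.
SAMPLE = """\
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
"""

def macs_from_ip_link(text):
    """Return {interface: mac} parsed from `ip link show` output."""
    pairs = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"\d+: ([^:@]+)[@:]", line)
        if m:
            current = m.group(1)
        m = re.search(r"link/ether ([0-9a-f:]{17})", line)
        if m and current:
            pairs[current] = m.group(1)
    return pairs

print(macs_from_ip_link(SAMPLE))  # {'eth0': '3c:ec:ef:4d:55:a8'}
```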
+
+Change boot mode to ONIE Rescue mode
+""""""""""""""""""""""""""""""""""""
+
+In order to reinstall an ONL image, you must change the ONIE bootloader to
+"Rescue Mode".
+
+Once the switch is powered on, it should retrieve an IP address on the OpenBMC
+interface with DHCP. OpenBMC uses these default credentials::
+
+ username: root
+ password: 0penBmc
+
+Log in to OpenBMC with SSH::
+
+ $ ssh root@10.0.0.131
+ The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
+ ECDSA key fingerprint is SHA256:...
+ Are you sure you want to continue connecting (yes/no)? yes
+ Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
+ root@10.0.0.131's password:
+ root@bmc:~#
+
+Using the Serial-over-LAN Console, enter ONL::
+
+ root@bmc:~# /usr/local/bin/sol.sh
+ You are in SOL session.
+ Use ctrl-x to quit.
+ -----------------------
+
+ root@onl:~#
.. note::
+ If `sol.sh` is unresponsive, please try to restart the mainboard with::
- This part will be done automatically once we have a DHCP and HTTP server set up in the infrastructure.
- For now, we need to download and install the ONL image manually.
+ root@onl:~# wedge_power.sh restart
-Install ONL with Docker
------------------------
-First, enter **ONIE rescue mode**.
-Set up IP and route
-^^^^^^^^^^^^^^^^^^^
-.. code-block:: console
+Change the boot mode to rescue mode with the command ``onl-onie-boot-mode
+rescue``, and reboot::
- # ip addr add 10.92.1.81/24 dev eth0
- # ip route add default via 10.92.1.1
+ root@onl:~# onl-onie-boot-mode rescue
+ [1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
+ [1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
+ [1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
+ The system will boot into ONIE rescue mode at the next restart.
+ root@onl:~# reboot
-- `10.92.1.81/24` should be replaced by the actual IP and subnet of the ONL.
-- `10.92.1.1` should be replaced by the actual default gateway.
+At this point, ONL will go through its shutdown sequence and ONIE will start.
+If it does not start right away, press the Enter/Return key a few times - it
+may show you a boot selection screen. Pick ``ONIE`` and ``Rescue`` if given a
+choice.
-Download and install ONL
-^^^^^^^^^^^^^^^^^^^^^^^^
+Installing an ONL image over HTTP
+"""""""""""""""""""""""""""""""""
-.. code-block:: console
+Now that the switch is in Rescue mode, the new OS image can be installed.
- # wget https://github.com/opennetworkinglab/OpenNetworkLinux/releases/download/v1.3.2/ONL-onf-ONLPv2_ONL-OS_2020-10-09.1741-f7428f2_AMD64_INSTALLED_INSTALLER
- # sh ONL-onf-ONLPv2_ONL-OS_2020-10-09.1741-f7428f2_AMD64_INSTALLED_INSTALLER
+First, activate the Console by pressing Enter::
-The switch will reboot automatically once the installer is done.
+ discover: Rescue mode detected. Installer disabled.
-.. note::
+ Please press Enter to activate this console.
+ To check the install status inspect /var/log/onie.log.
+ Try this: tail -f /var/log/onie.log
- Alternatively, we can `scp` the ONL installer into ONIE manually.
+ ** Rescue Mode Enabled **
+ ONIE:/ #
-Setup BMC for remote console access
------------------------------------
-Log in to the BMC from ONL by
+Then run the ``onie-nos-install`` command, with the URL of the management
+server on the management network segment::
-.. code-block:: console
+ ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
+ discover: Rescue mode detected. No discover stopped.
+ ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
+ Info: Fetching http://10.0.0.129/onie-installer ...
+ Connecting to 10.0.0.129 (10.0.0.129:80)
+ installer 100% |*******************************| 322M 0:00:00 ETA
+ ONIE: Executing installer: http://10.0.0.129/onie-installer
+ installer: computing checksum of original archive
+ installer: checksum is OK
+ ...
- # ssh root@192.168.0.1 # pass: 0penBmc
+The installation will now start, and then ONL will boot, culminating in::
-on `usb0` interface.
+ Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9
-Once you are in the BMC, run the following commands to setup IP and route (or offer a fixed IP with DHCP)
+ localhost login:
-.. code-block:: console
+The default ONL login is::
- # ip addr add 10.92.1.85/24 dev eth0
- # ip route add default via 10.92.1.1
+ username: root
+ password: onl
-- `10.92.1.85/24` should be replaced by the actual IP and subnet of the BMC.
- Note that it should be different from the ONL IP.
-- `10.92.1.1` should be replaced by the actual default gateway.
+If you log in, you can verify that the switch is getting its IP address via
+DHCP::
-BMC uses the same ethernet port as ONL management so you should give it an IP address in the same subnet.
-BMC address will preserve during ONL reboot, but won’t be preserved during power outage.
+ root@localhost:~# ip addr
+ ...
+ 3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
+ link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
+ inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
+ ...
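+If you want to script this check across several switches, Python's standard
+``ipaddress`` module can confirm the leased address sits inside the
+management subnet; the subnet below is inferred from the example output and
+may differ per site:

```python
import ipaddress

# Sanity-check sketch (not part of the deployment tooling): confirm the
# address the switch reported via `ip addr` falls inside the management
# subnet. Both values here are taken from the example output above.
switch_if = ipaddress.ip_interface("10.0.0.130/25")
mgmt_net = ipaddress.ip_network("10.0.0.128/25")
assert switch_if.ip in mgmt_net
print(f"{switch_if.ip} is in {mgmt_net}")
```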
-To log in to ONL console from BMC, run
-.. code-block:: console
+Post-ONL Configuration
+""""""""""""""""""""""
- # /usr/local/bin/sol.sh
+A ``terraform`` user must be created on the switches to allow them to be
+configured.
-If `sol.sh` is unresponsive, please try to restart the mainboard with
+This is done using Ansible. Verify that your inventory (created earlier from the
+``inventory/example-aether.ini`` file) includes an ``[aetherfabric]`` section
+that has all the names and IP addresses of the fabric switches in it.
-.. code-block:: console
+Then run a ping test::
- # wedge_power.sh restart
+ ansible -i inventory/sitename.ini -m ping aetherfabric
-Setup network and host name for ONL
------------------------------------
+This may fail with the error::
-.. code-block:: console
+ "msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this. Please add this host's fingerprint to your known_hosts file to manage this host."
- # hostnamectl set-hostname <host-name>
+Comment out the ``ansible_ssh_pass="onl"`` line, then rerun the ping test. It
+may ask you about authorized keys - answer ``yes`` for each host to trust the
+keys::
- # vim.tiny /etc/hosts # update accordingly
- # cat /etc/hosts # example
- 127.0.0.1 localhost
- 10.92.1.81 menlo-staging-spine-1
+ The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
+ ECDSA key fingerprint is SHA256:...
+ Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
- # vim.tiny /etc/network/interfaces.d/ma1 # update accordingly
- # cat /etc/network/interfaces.d/ma1 # example
- auto ma1
- iface ma1 inet static
- address 10.92.1.81
- netmask 255.255.255.0
- gateway 10.92.1.1
- dns-nameservers 8.8.8.8
+Once you've trusted the host keys, the ping test should succeed::
+
+ spine1.role1.site | SUCCESS => {
+ "changed": false,
+ "ping": "pong"
+ }
+ leaf1.role1.site | SUCCESS => {
+ "changed": false,
+ "ping": "pong"
+ }
+ ...
+
+Then run the playbook to create the ``terraform`` user::
+
+ ansible-playbook -i inventory/sitename.ini playbooks/aetherfabric-playbook.yml
+
+Once the playbook completes, the switch is ready for the TOST runtime install.
VPN
-===
+---
+
This section walks you through how to set up a VPN between ACE and Aether Central in GCP.
We will be using the GitOps-based Aether CD pipeline for this,
so we just need to create a patch to the **aether-pod-configs** repository.
@@ -115,7 +187,8 @@
:ref:`Add ACE to an existing VPN connection <add_ace_to_vpn>`
Before you begin
-----------------
+""""""""""""""""
+
* Make sure firewall in front of ACE allows UDP port 500, UDP port 4500, and ESP packets
from **gcpvpn1.infra.aetherproject.net(35.242.47.15)** and **gcpvpn2.infra.aetherproject.net(34.104.68.78)**
* Make sure that the external IP on ACE side is owned by or routed to the management node
@@ -146,7 +219,8 @@
+-----------------------------+----------------------------------+
Download aether-pod-configs repository
---------------------------------------
+""""""""""""""""""""""""""""""""""""""
+
.. code-block:: shell
$ cd $WORKDIR
@@ -155,7 +229,8 @@
.. _update_global_resource:
Update global resource maps
----------------------------
+"""""""""""""""""""""""""""
+
Add new ACE information at the end of the following global resource maps.
* user_map.tfvars
@@ -245,7 +320,8 @@
Create ACE specific configurations
-----------------------------------
+""""""""""""""""""""""""""""""""""
+
In this step, we will create a directory under `production` with the same name as ACE,
and add several Terraform configurations and Ansible inventory needed to configure a VPN connection.
Throughout the deployment procedure, this directory will contain all ACE specific configurations.
@@ -277,7 +353,8 @@
when using a different BOM.
Create a review request
------------------------
+"""""""""""""""""""""""
+
.. code-block:: shell
$ cd $WORKDIR/aether-pod-configs/production
@@ -302,7 +379,8 @@
CD pipeline will create VPN tunnels on both GCP and the management node.
Verify VPN connection
----------------------
+"""""""""""""""""""""
+
You can verify the VPN connections after successful post-merge job
by checking the routing table on the management node and trying to ping to one of the central cluster VMs.
Make sure two tunnel interfaces, `gcp_tunnel1` and `gcp_tunnel2`, exist
@@ -335,7 +413,8 @@
Post VPN setup
---------------
+""""""""""""""
+
Once you verify the VPN connections, please update `ansible` directory name to `_ansible` to prevent
the ansible playbook from running again.
Note that re-running the ansible playbook is harmless, but it is not recommended.
@@ -351,7 +430,8 @@
.. _add_ace_to_vpn:
Add another ACE to an existing VPN connection
----------------------------------------------
+"""""""""""""""""""""""""""""""""""""""""""""
+
VPN connections can be shared when there are multiple ACE clusters in a site.
In order to add ACE to an existing VPN connection,
you'll have to SSH into the management node and manually update BIRD configuration.
diff --git a/pronto_deployment_guide/hw_installation.rst b/pronto_deployment_guide/hw_installation.rst
index 46c30d6..e0543d2 100644
--- a/pronto_deployment_guide/hw_installation.rst
+++ b/pronto_deployment_guide/hw_installation.rst
@@ -5,15 +5,9 @@
Hardware Installation
=====================
-Hardware installation breaks down into a few steps:
-
-1. `Planning`_
-2. `Inventory`_
-3. `Rackmount of Equipment`_
-4. `Cabling and Network Topology`_
-5. `Management Switch Bootstrap`_
-6. `Management Server Bootstrap`_
-7. `Server Software Bootstrap`_
+Once the hardware has been ordered, the installation can be planned and
+implemented. This document describes the installation of the servers and
+software.
Installation of the fabric switch hardware is covered in :ref:`OS Installation
- Switches <switch-install>`.
@@ -21,38 +15,8 @@
Installation of the radio hardware is covered in :ref:`eNB Installation
<enb-installation>`.
-Planning
---------
-The planning of the network topology and devices, and required cabling
-
-Once planning is complete, equipment is ordered to match the plan.
-
-Network Cable Plan
-""""""""""""""""""
-
-If a 2x2 TOST fabric is used it should be configured as a :doc:`Single-Stage
-Leaf-Spine <trellis:supported-topology>`.
-
-- The links between each leaf and spine switch must be made up of two separate
- cables.
-
-- Each compute server is dual-homed via a separate cable to two different leaf
- switches (as in the "paired switches" diagrams).
-
-If only a single P4 switch is used, the :doc:`Simple
-<trellis:supported-topology>` topology is used, with two connections from each
-compute server to the single switch
-
-Additionally a non-fabric switch is required to provide a set of management
-networks. This management switch is configured with multiple VLANs to separate
-the management plane, fabric, and the out-of-band and lights out management
-connections on the equipment.
-
-Device Naming
-"""""""""""""
-
-Site Design and Bookkeeping
-"""""""""""""""""""""""""""
+Site Bookkeeping
+----------------
The following items need to be added to `NetBox
<https://netbox.readthedocs.io/en/stable>`_ to describe each edge site:
@@ -60,15 +24,18 @@
1. Add a Site for the edge (if one doesn't already exist), which has the
physical location and contact information for the edge.
-2. Add Racks to the Site (if they don't already exist)
+2. Add equipment Racks to the Site (if they don't already exist).
3. Add a Tenant for the edge (who owns/manages it), assigned to the ``Pronto``
or ``Aether`` Tenant Group.
-4. Add a VRF (Routing Table) for the edge site.
+4. Add a VRF (Routing Table) for the edge site. This is usually just the name
+ of the site. Make sure that ``Enforce unique space`` is checked, so that IP
+ addresses within the VRF are forced to be unique, and that the Tenant Group
+ and Tenant are set.
5. Add a VLAN Group to the edge site, which groups the site's VLANs and
- prevents duplication.
+ requires that they have a unique VLAN number.
6. Add VLANs for the edge site. These should be assigned a VLAN Group, the
Site, and Tenant.
@@ -165,6 +132,9 @@
If a specific Device Type doesn't exist for the device, it must be created,
which is detailed in the NetBox documentation, or ask the OPs team for help.
+ See `Rackmount of Equipment`_ below for guidance on how equipment should be
+ mounted in the Rack.
+
9. Add Services to the management server:
* name: ``dns``
@@ -175,8 +145,8 @@
protocol: UDP
port: 69
- These are used by the DHCP and DNS config to know which servers offer a
- dns service and tftp.
+ These are used by the DHCP and DNS config to know which servers offer
+ DNS or TFTP service.
10. Set the MAC address for the physical interfaces on the device.
@@ -252,90 +222,30 @@
TODO: Explain the cabling topology
-Hardware
-""""""""
+Rackmount of Equipment
+----------------------
-Fabric Switches
-'''''''''''''''
+Most of the Pronto equipment has a 19" rackmount form factor.
-Pronto currently uses fabric switches based on the Intel (was Barefoot) Tofino
-chipset. There are multiple variants of this switching chipset, with different
-speeds and capabilities.
+Guidelines for mounting this equipment:
-The specific hardware models in use in Pronto:
+- The EdgeCore Wedge Switches have a front-to-back (aka "port-to-power") fan
+ configuration, so hot air exhaust is out the back of the switch near the
+ power inlets, away from the 32 QSFP network ports on the front of the switch.
-* `EdgeCore Wedge100BF-32X
- <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=335>`_
- - a "Dual Pipe" chipset variant, used for the Spine switches
+- The full-depth 1U and 2U Supermicro servers also have front-to-back airflow
+ but have most of their ports on the rear of the device.
-* `EdgeCore Wedge100BF-32QS
- <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=770>`_
- - a "Quad Pipe" chipset variant, used for the Leaf switches
+- Airflow through the rack should be in one direction to avoid heat being
+ pulled from one device into another. This means that to connect the QSFP
+ network ports from the servers to the switches, cabling should be routed
+ through the rack from front (switch) to back (server). Empty rack spaces
+ should be reserved for this purpose.
-Compute Servers
-
-These servers run Kubernetes and edge applications.
-
-The requirements for these servers:
-
-* AMD64 (aka x86-64) architecture
-* Sufficient resources to run Kubernetes
-* Two 40GbE or 100GbE Ethernet connections to the fabric switches
-* One management 1GbE port
-
-The specific hardware models in use in Pronto:
-
-* `Supermicro 6019U-TRTP2
- <https://www.supermicro.com/en/products/system/1U/6019/SYS-6019U-TRTP2.cfm>`_
- 1U server
-
-* `Supermicro 6029U-TR4
- <https://www.supermicro.com/en/products/system/2U/6029/SYS-6029U-TR4.cfm>`_
- 2U server
-
-These servers are configured with:
-
-* 2x `Intel Xeon 5220R CPUs
- <https://ark.intel.com/content/www/us/en/ark/products/199354/intel-xeon-gold-5220r-processor-35-75m-cache-2-20-ghz.html>`_,
- each with 24 cores, 48 threads
-* 384GB of DDR4 Memory, made up with 12x 16GB ECC DIMMs
-* 2TB of nVME Flash Storage
-* 2x 6TB SATA Disk storage
-* 2x 40GbE ports using an XL710QDA2 NIC
-
-The 1U servers additionally have:
-
-- 2x 1GbE copper network ports
-- 2x 10GbE SFP+ network ports
-
-The 2U servers have:
-
-- 4x 1GbE copper network ports
-
-Management Server
-'''''''''''''''''
-
-One management server is required, which must have at least two 1GbE network
-ports, and runs a variety of network services to support the edge.
-
-The model used in Pronto is a `Supermicro 5019D-FTN4
-<https://www.supermicro.com/en/Aplus/system/Embedded/AS-5019D-FTN4.cfm>`_
-
-Which is configured with:
-
-* AMD Epyc 3251 CPU with 8 cores, 16 threads
-* 32GB of DDR4 memory, in 2x 16GB ECC DIMMs
-* 1TB of nVME Flash storage
-* 4x 1GbE copper network ports
-
-Management Switch
-'''''''''''''''''
-
-This switch connects the configuration interfaces and management networks on
-all the servers and switches together.
-
-In the Pronto deployment this hardware is a `HP/Aruba 2540 Series JL356A
-<https://www.arubanetworks.com/products/switches/access/2540-series/>`_.
+- The short-depth management HP Switch and 1U Supermicro servers should be
+  mounted on the rear of the rack. Neither generates an appreciable amount
+  of heat, so the airflow direction isn't a significant factor in racking
+  them.
Inventory
---------
@@ -361,30 +271,6 @@
configuration will be generated to have the OS preseed files corresponding to the
new servers based on their serial numbers.
-Rackmount of Equipment
-----------------------
-
-Most of the Pronto equipment is in a 19" rackmount form factor.
-
-Guidelines for mounting this equipment:
-
-- The EdgeCore Wedge Switches have a front-to-back (aka "port-to-power") fan
- configuration, so hot air exhaust is out the back of the switch near the
- power inlets, away from the 32 QSFP network ports on the front of the switch.
-
-- The full-depth 1U and 2U Supermicro servers also have front-to-back airflow
- but have most of their ports on the rear of the device.
-
-- Airflow through the rack should be in one direction to avoid heat being
- pulled from one device into another. This means that to connect the QSFP
- network ports from the servers to the switches, cabling should be routed
- through the rack from front (switch) to back (server).
-
-- The short-depth management HP Switch and 1U Supermicro servers should be
- mounted to the rear of the rack. They both don't generate an appreciable
- amount of heat, so the airflow direction isn't a significant factor in
- racking them.
-
Cabling and Network Topology
----------------------------
@@ -396,8 +282,8 @@
TODO: Add instructions for bootstrapping management switch, from document that
has the linked config file.
-Server Software Bootstrap
--------------------------
+Software Bootstrap
+------------------
Management Server Bootstrap
"""""""""""""""""""""""""""
@@ -440,7 +326,7 @@
source venv_onfansible/bin/activate
Obtain the ``undionly.kpxe`` iPXE artifact for bootstrapping the compute
-servers, and put it in the ``files`` directory.
+servers, and put it in the ``playbook/files`` directory.
Next, create an inventory file to access the NetBox API. An example is given
in ``inventory/example-netbox.yml`` - duplicate this file and modify it. Fill
@@ -456,10 +342,11 @@
One manual change needs to be made to this output - edit the
``inventory/host_vars/mgmtserver1.stage1.menlo.yml`` file and add the following
-to the bottom of the file, replacing the IP addresses with the ones that the
-management server is configured with on each VLAN. This configures the `netplan
-<https://netplan.io>`_ on the management server, and will be automated away
-soon::
+to the bottom of the file, replacing the IP addresses with *only the lowest
+numbered IP address* the management server has on each VLAN (if >1 IP address
+is assigned to a VLAN or Interface, the DHCP server will fail to run). This
+configures the `netplan <https://netplan.io>`_ on the management server, and
+will be automated away soon::
# added manually
netprep_netplan:
@@ -479,15 +366,15 @@
addresses:
- 10.0.1.1/25
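+The "lowest numbered IP address per VLAN" rule above can be sketched with
+Python's standard ``ipaddress`` module (the VLAN names and addresses here
+are hypothetical examples):

```python
import ipaddress

# For each VLAN interface, pick the lowest-numbered assigned address,
# per the netplan rule described above. Sample data only.
assigned = {
    "vlan2": ["10.0.0.140/25", "10.0.0.129/25"],
    "vlan3": ["10.0.1.1/25"],
}
chosen = {
    vlan: min(addrs, key=lambda a: ipaddress.ip_interface(a).ip)
    for vlan, addrs in assigned.items()
}
print(chosen)  # {'vlan2': '10.0.0.129/25', 'vlan3': '10.0.1.1/25'}
```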
-Create an inventory file for the management server in
-``inventory/menlo-staging.ini`` which contains::
-
- [mgmt]
- mgmtserver1.stage1.menlo ansible_host=<public ip address> ansible_user="onfadmin" ansible_become_password=<password>
+Using the ``inventory/example-aether.ini`` as a template, create an
+:doc:`ansible inventory <ansible:user_guide/intro_inventory>` file for the
+site. Change the device names, IP addresses, and ``onfadmin`` password to match
+the ones for this site. The management server's configuration is in the
+``[aethermgmt]`` and corresponding ``[aethermgmt:vars]`` section.
Then, to configure a management server, run::
- ansible-playbook -i inventory/menlo-staging.ini playbooks/aethermgmt-playbook.yml
+ ansible-playbook -i inventory/sitename.ini playbooks/aethermgmt-playbook.yml
This installs software with the following functionality:
@@ -496,6 +383,8 @@
- DHCP and TFTP for bootstrapping servers and switches
- DNS for host naming and identification
- HTTP server for serving files used for bootstrapping switches
+- Downloads the Tofino switch image
+- Creates user accounts for administrative access
Compute Server Bootstrap
""""""""""""""""""""""""
@@ -520,4 +409,68 @@
login: ADMIN
password: Admin123
-Once these nodes are brought up, the installation can continue.
+The BMC will also list all of the MAC addresses for the network interfaces
+(including BMC) that are built into the logic board of the system. Add-in
+network cards like the 40GbE ones used in compute servers aren't listed.
+
+To prepare the compute nodes, software must be installed on them. As they
+can't be accessed directly from your local system, a :ref:`jump host
+<ansible:use_ssh_jump_hosts>` configuration is added, so the SSH connection
+goes through the management server to the compute systems behind it. Doing this
+requires a few steps:
+
+First, configure SSH to use Agent forwarding - create or edit your
+``~/.ssh/config`` file and add the following lines::
+
+ Host <management server IP>
+ ForwardAgent yes
+
+Then try to login to the management server, then the compute node::
+
+ $ ssh onfadmin@<management server IP>
+ Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
+ ...
+ onfadmin@mgmtserver1:~$ ssh onfadmin@10.0.0.138
+ Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
+ ...
+ onfadmin@node2:~$
+
+Being able to login to the compute nodes from the management node means that
+SSH Agent forwarding is working correctly.
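+As an alternative to the manual two-step login, OpenSSH's ``ProxyJump``
+option can route the connection through the management server in one hop. A
+sketch of the ``~/.ssh/config`` stanza (the host aliases and addresses are
+placeholders for this site's values):

```text
Host aether-mgmt
    HostName <management server IP>
    User onfadmin
    ForwardAgent yes

Host aether-node2
    HostName 10.0.0.138
    User onfadmin
    ProxyJump aether-mgmt
```

+With this in place, ``ssh aether-node2`` reaches the compute node directly.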
+
+Verify that your inventory (created earlier from the
+``inventory/example-aether.ini`` file) includes an ``[aethercompute]`` section
+that has all the names and IP addresses of the compute nodes in it.
+
+Then run a ping test::
+
+ ansible -i inventory/sitename.ini -m ping aethercompute
+
+It may ask you about authorized keys - answer ``yes`` for each host to trust the keys::
+
+ The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
+ ECDSA key fingerprint is SHA256:...
+ Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
+
+You should then see a success message for each host::
+
+ node1.stage1.menlo | SUCCESS => {
+ "changed": false,
+ "ping": "pong"
+ }
+ node2.stage1.menlo | SUCCESS => {
+ "changed": false,
+ "ping": "pong"
+ }
+ ...
+
+Once you've seen this, run the playbook to install the prerequisites (Terraform
+user, Docker)::
+
+ ansible-playbook -i inventory/sitename.ini playbooks/aethercompute-playbook.yml
+
+Note that Docker is quite large and may take a few minutes for installation
+depending on internet connectivity.
+
+Now that these compute nodes have been brought up, the rest of the installation
+can continue.
diff --git a/pronto_deployment_guide/overview.rst b/pronto_deployment_guide/overview.rst
index 18d289a..10f2e84 100644
--- a/pronto_deployment_guide/overview.rst
+++ b/pronto_deployment_guide/overview.rst
@@ -2,6 +2,120 @@
SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
SPDX-License-Identifier: Apache-2.0
-========
Overview
========
+
+A Pronto deployment requires a detailed plan of the network topology,
+devices, and cabling before it can be assembled.
+
+Once planning is complete, equipment should be ordered to match the plan. The
+VAR we've used for most Pronto equipment is ASA (aka "RackLive").
+
+Network Cable Plan
+------------------
+
+If a 2x2 TOST fabric is used it should be configured as a :doc:`Single-Stage
+Leaf-Spine <trellis:supported-topology>`.
+
+- The links between each leaf and spine switch must be made up of two separate
+ cables.
+
+- Each compute server is dual-homed via a separate cable to two different leaf
+ switches (as in the "paired switches" diagrams).
+
+If only a single P4 switch is used, the :doc:`Simple
+<trellis:supported-topology>` topology is used, with two connections from each
+compute server to the single switch.
+
+Additionally, a non-fabric switch is required to provide a set of management
+networks. This management switch is configured with multiple VLANs to separate
+the management plane, fabric, and the out-of-band and lights out management
+connections on the equipment.
+
+
+Required Hardware
+-----------------
+
+Fabric Switches
+"""""""""""""""
+
+Pronto currently uses fabric switches based on the Intel (was Barefoot) Tofino
+chipset. There are multiple variants of this switching chipset, with different
+speeds and capabilities.
+
+The specific hardware models in use in Pronto:
+
+* `EdgeCore Wedge100BF-32X
+ <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=335>`_
+ - a "Dual Pipe" chipset variant, used for the Spine switches
+
+* `EdgeCore Wedge100BF-32QS
+ <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=770>`_
+ - a "Quad Pipe" chipset variant, used for the Leaf switches
+
+Compute Servers
+"""""""""""""""
+
+These servers run Kubernetes and edge applications.
+
+The requirements for these servers:
+
+* AMD64 (aka x86-64) architecture
+* Sufficient resources to run Kubernetes
+* Two 40GbE or 100GbE Ethernet connections to the fabric switches
+* One management 1GbE port
+
+The specific hardware models in use in Pronto:
+
+* `Supermicro 6019U-TRTP2
+ <https://www.supermicro.com/en/products/system/1U/6019/SYS-6019U-TRTP2.cfm>`_
+ 1U server
+
+* `Supermicro 6029U-TR4
+ <https://www.supermicro.com/en/products/system/2U/6029/SYS-6029U-TR4.cfm>`_
+ 2U server
+
+These servers are configured with:
+
+* 2x `Intel Xeon 5220R CPUs
+ <https://ark.intel.com/content/www/us/en/ark/products/199354/intel-xeon-gold-5220r-processor-35-75m-cache-2-20-ghz.html>`_,
+ each with 24 cores, 48 threads
+* 384GB of DDR4 memory, made up of 12x 16GB ECC DIMMs
+* 2TB of nVME Flash Storage
+* 2x 6TB SATA Disk storage
+* 2x 40GbE ports using an XL710QDA2 NIC
+
+The 1U servers additionally have:
+
+- 2x 1GbE copper network ports
+- 2x 10GbE SFP+ network ports
+
+The 2U servers have:
+
+- 4x 1GbE copper network ports
+
+Management Server
+"""""""""""""""""
+
+One management server is required, which must have at least two 1GbE network
+ports, and runs a variety of network services to support the edge.
+
+The model used in Pronto is a `Supermicro 5019D-FTN4
+<https://www.supermicro.com/en/Aplus/system/Embedded/AS-5019D-FTN4.cfm>`_
+
+Which is configured with:
+
+* AMD Epyc 3251 CPU with 8 cores, 16 threads
+* 32GB of DDR4 memory, in 2x 16GB ECC DIMMs
+* 1TB of nVME Flash storage
+* 4x 1GbE copper network ports
+
+Management Switch
+"""""""""""""""""
+
+This switch connects the configuration interfaces and management networks on
+all the servers and switches together.
+
+In the Pronto deployment this hardware is a `HP/Aruba 2540 Series JL356A
+<https://www.arubanetworks.com/products/switches/access/2540-series/>`_.
+
diff --git a/pronto_deployment_guide/troubleshooting.rst b/pronto_deployment_guide/troubleshooting.rst
new file mode 100644
index 0000000..7828b76
--- /dev/null
+++ b/pronto_deployment_guide/troubleshooting.rst
@@ -0,0 +1,55 @@
+..
+ SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
+ SPDX-License-Identifier: Apache-2.0
+
+Troubleshooting
+===============
+
+Unknown MAC addresses
+---------------------
+
+Sometimes it's hard to find out all the MAC addresses assigned to network
+cards. These can be found in a variety of ways:
+
+1. On servers, the BMC webpage will list the built-in network card MAC
+ addresses.
+
+2. If you login to a server, ``ip link`` or ``ip addr`` will show the MAC
+ address of each interface, including on add-in cards.
+
+3. If you can login to a server but don't know the BMC IP or MAC address for
+ that server, you can find it with ``sudo ipmitool lan print``.
+
+4. If you don't have a login to the server, but can get to the management
+   server, ``ip neighbor`` will show the ARP table of MAC addresses known to
+   that system. Its output is unsorted - ``ip neigh | sort`` is easier to
+   read.
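+Note that plain ``sort`` orders addresses lexically, so numerically adjacent
+hosts can end up far apart; if that becomes confusing, a numeric sort can be
+sketched in Python (the neighbor lines below are made-up samples):

```python
import ipaddress

# Sort saved `ip neigh` output numerically by IP address; lexical sort
# would place 10.0.0.19 before 10.0.0.2. Sample data only.
NEIGH = """\
10.0.0.19 dev eno1 lladdr 3c:ec:ef:4d:55:a8 REACHABLE
10.0.0.2 dev eno1 lladdr 10:4f:58:e7:d5:60 STALE
"""

lines = sorted(
    (l for l in NEIGH.splitlines() if l.strip()),
    key=lambda l: ipaddress.ip_address(l.split()[0]),
)
print(lines[0])  # 10.0.0.2 dev eno1 lladdr 10:4f:58:e7:d5:60 STALE
```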
+
+Cabling issues
+--------------
+
+The system may not come up correctly if cabling isn't connected properly.
+If you don't have hands-on with the cabling, here are some ways to check on the
+cabling remotely:
+
+1. On servers you can check which ports are connected with ``ip link show``::
+
+ $ ip link show
+ ...
+ 3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
+ link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
+ ...
+ 5: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
+ link/ether 3c:ec:ef:4d:55:a9 brd ff:ff:ff:ff:ff:ff
+
+ Ports that are up will show ``state UP``
+
+2. You can determine which remote ports are connected with LLDP, assuming that
+ the remote switch supports LLDP and has it enabled. This can be done with
+ ``networkctl lldp``, which shows both the name and the MAC address of the
+ connected switch on a per-link basis::
+
+ $ networkctl lldp
+ LINK CHASSIS ID SYSTEM NAME CAPS PORT ID PORT DESCRIPTION
+ eno1 10:4f:58:e7:d5:60 Aruba-2540-24…PP ..b........ 10 10
+ eno2 10:4f:58:e7:d5:60 Aruba-2540-24…PP ..b........ 1 1