Reorganization pass on Aether Docs
Change-Id: I0653109d6fe8d340278580ff5c7758ca264b512e
diff --git a/edge_deployment/server_bootstrap.rst b/edge_deployment/server_bootstrap.rst
new file mode 100644
index 0000000..c7fe5e1
--- /dev/null
+++ b/edge_deployment/server_bootstrap.rst
@@ -0,0 +1,293 @@
+..
+ SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
+ SPDX-License-Identifier: Apache-2.0
+
+Server Bootstrap
+================
+
+Management Server Bootstrap
+"""""""""""""""""""""""""""
+
+The management server is bootstrapped into a customized version of the standard
+Ubuntu 18.04 OS installer.
+
+The `iPXE boot firmware <https://ipxe.org/>`_ is used to start this process
+and is built using the steps detailed in the `ipxe-build
+<https://gerrit.opencord.org/plugins/gitiles/ipxe-build>`_ repo, which
+generates both USB and PXE chainloadable boot images.
+
+Once a system has been started using one of these images, iPXE will
+download a customized script from an external webserver to continue the boot
+process. This iPXE to webserver connection is secured with mutual TLS
+authentication, enforced by the nginx webserver.
+
+The iPXE scripts are created by the `pxeboot
+<https://gerrit.opencord.org/plugins/gitiles/ansible/role/pxeboot>`_ role,
+which creates a boot menu, downloads the appropriate binaries for
+bootstrapping an OS installation, and creates per-node installation preseed files.
+
+The preseed files contain configuration steps to install the OS from the
+upstream Ubuntu repos, customize the installed packages, and create the
+``onfadmin`` user.
+
+Creating a bootable USB drive
+'''''''''''''''''''''''''''''
+
+1. Get a USB key. It can be tiny, as the uncompressed image is floppy sized
+ (1.4MB). Download the USB image file (``<date>_onf_ipxe.usb.zip``) on the
+ system you're using to write the USB key, and unzip it.
+
+2. Insert the USB key into that system, then determine which device file
+ under ``/dev`` it was assigned. On Linux/Unix, look at the end of the
+ ``dmesg`` output; on macOS, look at the output of ``diskutil list``.
+
+ Be very careful here: writing the image to the wrong device will overwrite
+ another disk in your system and destroy the data on it.
+
+3. Write the image to the device::
+
+ $ dd if=/path/to/20201116_onf_ipxe.usb of=/dev/sdg
+ 2752+0 records in
+ 2752+0 records out
+ 1409024 bytes (1.4 MB, 1.3 MiB) copied, 2.0272 s, 695 kB/s
+
+ You may need to use ``sudo`` for this.
+
+Boot and Image Management Server
+''''''''''''''''''''''''''''''''
+
+1. Connect a USB keyboard and VGA monitor to the management node. Put the USB
+ Key in one of the management node's USB ports (port 2 or 3):
+
+ .. image:: images/mgmtsrv-000.png
+ :alt: Management Server Ports
+ :scale: 50%
+
+2. Turn on the management node, and press the F11 key as it starts to get into
+ the Boot Menu:
+
+ .. image:: images/mgmtsrv-001.png
+ :alt: Management Server Boot Menu
+ :scale: 50%
+
+3. Select the USB key (in this case "PNY USB 2.0"; your options may vary) and
+ press return. You should see iPXE load:
+
+ .. image:: images/mgmtsrv-002.png
+ :alt: iPXE load
+ :scale: 50%
+
+4. A menu will appear which displays the system information and DHCP-discovered
+ network settings (your network must provide the IP address to the management
+ server via DHCP).
+
+ Use the arrow keys to select "Ubuntu 18.04 Installer (fully automatic)":
+
+ .. image:: images/mgmtsrv-003.png
+ :alt: iPXE Menu
+ :scale: 50%
+
+ The menu has a 10 second default timeout, after which the system continues
+ its normal boot process. If you miss the 10 second window, restart the system.
+
+5. The Ubuntu 18.04 installer will be downloaded and booted:
+
+ .. image:: images/mgmtsrv-004.png
+ :alt: Ubuntu Boot
+ :scale: 50%
+
+6. The installer then starts and takes around 10 minutes to complete, depending
+ on your connection speed:
+
+ .. image:: images/mgmtsrv-005.png
+ :alt: Ubuntu Install
+ :scale: 50%
+
+
+7. At the end of the install, the system will restart and present you with a
+ login prompt:
+
+ .. image:: images/mgmtsrv-006.png
+ :alt: Ubuntu Install Complete
+ :scale: 50%
+
+
+Management Server Configuration
+'''''''''''''''''''''''''''''''
+
+Once the OS is installed on the management server, Ansible is used to remotely
+install software on the management server.
+
+To check out the ONF ansible repo and enter the virtualenv with the tooling::
+
+ mkdir infra
+ cd infra
+ repo init -u ssh://<your gerrit username>@gerrit.opencord.org:29418/infra-manifest
+ repo sync
+ cd ansible
+ make galaxy
+ source venv_onfansible/bin/activate
+
+Obtain the ``undionly.kpxe`` iPXE artifact for bootstrapping the compute
+servers, and put it in the ``playbook/files`` directory.
+
+Next, create an inventory file to access the NetBox API. An example is given
+in ``inventory/example-netbox.yml`` - duplicate this file and modify it. Fill
+in the ``api_endpoint`` address and set ``token`` to an API key obtained from
+the NetBox instance. List the IP Prefixes used by the site in the
+``ip_prefixes`` list.
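+
+As a rough sketch only, the edited copy might look like the following. The
+three keys are the ones named above; every value is a placeholder assumed for
+illustration (the prefixes are derived from the netplan example later in this
+section), and the real ``inventory/example-netbox.yml`` may contain additional
+keys that should be kept::
+
+ # hypothetical values - replace with the ones for your site
+ api_endpoint: <NetBox URL>
+ token: <NetBox API token>
+ ip_prefixes:
+   - 10.0.0.0/25
+   - 10.0.0.128/25
+   - 10.0.1.0/25
+   - 10.0.1.128/25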
+
+Next, run the ``scripts/netbox_edgeconfig.py`` script to generate a host_vars
+file for the management server. Assuming that the management server in the
+edge is named ``mgmtserver1.stage1.menlo``, you'd run::
+
+ python scripts/netbox_edgeconfig.py inventory/my-netbox.yml > inventory/host_vars/mgmtserver1.stage1.menlo.yml
+
+One manual change needs to be made to this output - edit the
+``inventory/host_vars/mgmtserver1.stage1.menlo.yml`` file and add the following
+to the bottom of the file, replacing the IP addresses with the management
+server IP address for each segment.
+
+If the Fabric has two leaves, each with its own IP range, add the management
+server IP address in the range of the leaf it is connected to, then add a
+route to the other leaf's IP range via the Fabric router address in the
+connected leaf's range.
+
+This configures `netplan <https://netplan.io>`_ on the management server
+and creates a SNAT rule for the UE range route. This manual step will be
+automated away soon::
+
+ # added manually
+ netprep_netplan:
+   ethernets:
+     eno2:
+       addresses:
+         - 10.0.0.1/25
+   vlans:
+     mgmt800:
+       id: 800
+       link: eno2
+       addresses:
+         - 10.0.0.129/25
+     fabr801:
+       id: 801
+       link: eno2
+       addresses:
+         - 10.0.1.129/25
+       routes:
+         - to: 10.0.1.0/25
+           via: 10.0.1.254
+           metric: 100
+
+ netprep_nftables_nat_postrouting: >
+   ip saddr 10.0.1.0/25 ip daddr 10.168.0.0/20 counter snat to 10.0.1.129;
+
+
+Using the ``inventory/example-aether.ini`` as a template, create an
+:doc:`ansible inventory <ansible:user_guide/intro_inventory>` file for the
+site. Change the device names, IP addresses, and ``onfadmin`` password to match
+the ones for this site. The management server's configuration is in the
+``[aethermgmt]`` and corresponding ``[aethermgmt:vars]`` section.
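+
+A minimal sketch of those two sections follows. The host name is the example
+used above; ``ansible_host``, ``ansible_user`` and ``ansible_become_password``
+are standard Ansible variables, but whether ``inventory/example-aether.ini``
+uses exactly these is an assumption, so treat the template itself as
+authoritative::
+
+ # hypothetical excerpt of inventory/sitename.ini
+ [aethermgmt]
+ mgmtserver1.stage1.menlo ansible_host=<management server IP>
+
+ [aethermgmt:vars]
+ ansible_user=onfadmin
+ ansible_become_password=<onfadmin password>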
+
+Then, to configure a management server, run::
+
+ ansible-playbook -i inventory/sitename.ini playbooks/aethermgmt-playbook.yml
+
+This installs software with the following functionality:
+
+- VLANs on second Ethernet port to provide connectivity to the rest of the pod.
+- Firewall with NAT for routing traffic
+- DHCP and TFTP for bootstrapping servers and switches
+- DNS for host naming and identification
+- HTTP server for serving files used for bootstrapping switches
+- Downloads the Tofino switch image
+- Creates user accounts for administrative access
+
+Compute Server Bootstrap
+""""""""""""""""""""""""
+
+Once the management server has finished installation, it will be set to offer
+the same iPXE bootstrap file to the compute servers.
+
+Boot each node, and when iPXE loads, select the ``Ubuntu 18.04
+Installer (fully automatic)`` option.
+
+The nodes can be controlled remotely via their BMC management interfaces - if
+the BMC is at ``10.0.0.3``, a remote user can forward a local port to its web
+interface through the management server with::
+
+ ssh -L 2443:10.0.0.3:443 onfadmin@<mgmt server ip>
+
+And then use their web browser to access the BMC at::
+
+ https://localhost:2443
+
+The default BMC credentials for the Pronto nodes are::
+
+ login: ADMIN
+ password: Admin123
+
+The BMC will also list all of the MAC addresses for the network interfaces
+(including BMC) that are built into the logic board of the system. Add-in
+network cards like the 40GbE ones used in compute servers aren't listed.
+
+To prepare the compute nodes, software must be installed on them. As they
+can't be accessed directly from your local system, a :ref:`jump host
+<ansible:use_ssh_jump_hosts>` configuration is added, so the SSH connection
+goes through the management server to the compute systems behind it. Doing this
+requires a few steps:
+
+First, configure SSH to use Agent forwarding - create or edit your
+``~/.ssh/config`` file and add the following lines::
+
+ Host <management server IP>
+ ForwardAgent yes
+
+Then try to log in to the management server, and from there to a compute node::
+
+ $ ssh onfadmin@<management server IP>
+ Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
+ ...
+ onfadmin@mgmtserver1:~$ ssh onfadmin@10.0.0.138
+ Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-54-generic x86_64)
+ ...
+ onfadmin@node2:~$
+
+Being able to log in to the compute nodes from the management node means that
+SSH Agent forwarding is working correctly.
+
+Verify that your inventory (created earlier from the
+``inventory/example-aether.ini`` file) includes an ``[aethercompute]`` section
+that has all the names and IP addresses of the compute nodes in it.
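+
+A sketch of such a section is shown below. The host names and the
+``10.0.0.138`` address come from the examples in this document, while the
+``node1`` address placeholder and the use of ``ansible_ssh_common_args`` with
+OpenSSH's ``ProxyJump`` option for the jump host are assumptions; the
+template may express the jump host differently::
+
+ # hypothetical excerpt of inventory/sitename.ini
+ [aethercompute]
+ node1.stage1.menlo ansible_host=<node1 IP>
+ node2.stage1.menlo ansible_host=10.0.0.138
+
+ [aethercompute:vars]
+ ansible_user=onfadmin
+ ansible_ssh_common_args='-o ProxyJump=onfadmin@<management server IP>'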
+
+Then run a ping test::
+
+ ansible -i inventory/sitename.ini -m ping aethercompute
+
+It may ask you to confirm each host's SSH host key - answer ``yes`` for each
+host to trust the keys::
+
+ The authenticity of host '10.0.0.138 (<no hostip for proxy command>)' can't be established.
+ ECDSA key fingerprint is SHA256:...
+ Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
+
+You should then see a success message for each host::
+
+ node1.stage1.menlo | SUCCESS => {
+ "changed": false,
+ "ping": "pong"
+ }
+ node2.stage1.menlo | SUCCESS => {
+ "changed": false,
+ "ping": "pong"
+ }
+ ...
+
+Once you've seen this, run the playbook to install the prerequisites (Terraform
+user, Docker)::
+
+ ansible-playbook -i inventory/sitename.ini playbooks/aethercompute-playbook.yml
+
+Note that Docker is quite large and may take a few minutes for installation
+depending on internet connectivity.
+
+Now that these compute nodes have been brought up, the rest of the installation
+can continue.