Various improvements
- Move switch spec from Aether to SD-Fabric. Update the list
- Move ONIE/ONL deployment/troubleshooting section from Aether to SD-Fabric
Change-Id: I3eaeb839f8e1fc595a775c305246c7659bb82cdb
diff --git a/conf.py b/conf.py
index 7fa7edf..e949e7a 100644
--- a/conf.py
+++ b/conf.py
@@ -265,7 +265,8 @@
'sysapproach5g': ('https://5g.systemsapproach.org/', None),
'sysapproachnet': ('https://book.systemsapproach.org/', None),
'sysapproachsdn': ('https://sdn.systemsapproach.org/', None),
- }
+ 'aether': ('https://docs.aetherproject.org/master', None),
+}
def setup(app):
diff --git a/deployment.rst b/deployment.rst
index c55d3aa..3d424c9 100644
--- a/deployment.rst
+++ b/deployment.rst
@@ -6,6 +6,30 @@
Deployment Guide
================
+Switch Hardware Selection
+-------------------------
+We have verified and therefore recommend the switch models listed in :ref:`verified_switch`.
+Other Stratum-enabled switches listed in :ref:`all_switch` should also work in theory
+but more integration work may be required.
+
+To use the P4 UPF, you must use fabric switches based on the `Intel (formerly Barefoot) Tofino chipset
+<https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernet-switch/tofino-series.html>`_.
+There are two variants of this switching chipset, with different resources and capabilities.
+The **Dual Pipe** Tofino ASIC is less expensive,
+while the **Quad Pipe** Tofino ASIC has more chip resources and a faster embedded system with more memory and storage.
+
+The P4 UPF and SD-Fabric features run within the constraints of the Dual Pipe
+system for production deployments, but for development of features in P4, the
+larger capacity of the Quad Pipe is desirable.
+
+These switches feature 32 QSFP+ ports capable of running in 100GbE, 40GbE, or
+4x 10GbE mode (using a split DAC or fiber cable) and have a 1GbE management
+network interface.
+
+See also the :ref:`Rackmount of Equipment
+<aether:edge_deployment/site_planning:rackmount of equipment>` for how the Fabric
+switches should be rack-mounted to ensure proper airflow within a rack.
+
Deployment Overview
-------------------
SD-Fabric is released with Helm chart and container images.
@@ -36,7 +60,7 @@
Step 1: Provision Switches
--------------------------
-We follow Open Network Install Environment(ONIE) way to install Open Network Linux (ONL) image to switch.
+We use the Open Network Install Environment (ONIE) to install an Open Network Linux (ONL) image on the switch.
To work with the SD-Fabric environment, we have customized the ONL image to support related packages and dependencies.
Image source file can be found on ONF repository `opennetworkinglab/OpenNetworkLinux <https://github.com/opennetworkinglab/OpenNetworkLinux>`_.
@@ -54,26 +78,122 @@
.. code-block::
- wget https://github.com/opennetworkinglab/OpenNetworkLinux/releases/download/v1.4.3/ONL-onf-ONLPv2_ONL-OS_2021-07-16.2159-5195444_AMD64_INSTALLED_INSTALLER
- python -m http.server 8080
+ wget https://github.com/opennetworkinglab/OpenNetworkLinux/releases/download/v1.4.3/ONL-onf-ONLPv2_ONL-OS_2021-07-16.2159-5195444_AMD64_INSTALLED_INSTALLER -O onie-installer
+ sudo python -m http.server 80
2. Reboot the switch to enter ONIE installation mode
-.. note::
- Please access the switch via BMC or serial console to keep connection during the installation.
+ In order to reinstall an ONL image, you must change the ONIE bootloader to
+ "Rescue Mode".
+ Once the switch is powered on, it should retrieve an IP address on the OpenBMC
+ interface with DHCP. Here we use ``10.0.0.131`` as an example.
+ OpenBMC uses these default credentials:
-.. code-block::
+ .. code-block::
- onl-onie-boot-mode rescue; reboot
+ username: root
+ password: 0penBmc
+
+ Log in to OpenBMC with SSH:
+
+ .. code-block::
+
+ $ ssh root@10.0.0.131
+ The authenticity of host '10.0.0.131 (10.0.0.131)' can't be established.
+ ECDSA key fingerprint is SHA256:...
+ Are you sure you want to continue connecting (yes/no)? yes
+ Warning: Permanently added '10.0.0.131' (ECDSA) to the list of known hosts.
+ root@10.0.0.131's password:
+ root@bmc:~#
+
+ Use the Serial-over-LAN console to enter ONL:
+
+ .. code-block::
+
+ root@bmc:~# /usr/local/bin/sol.sh
+ You are in SOL session.
+ Use ctrl-x to quit.
+ -----------------------
+
+ root@onl:~#
+
+ .. note::
+
+ If ``sol.sh`` is unresponsive, please try to restart the mainboard with
+
+ .. code-block::
+
+ root@bmc:~# wedge_power.sh reset
+
+ Change the boot mode to rescue mode and reboot:
+
+ .. code-block::
+
+ root@onl:~# onl-onie-boot-mode rescue
+ [1053033.768512] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
+ [1053033.936893] EXT4-fs (sda3): re-mounted. Opts: (null)
+ [1053033.996727] EXT4-fs (sda3): re-mounted. Opts: (null)
+ The system will boot into ONIE rescue mode at the next restart.
+
+ root@onl:~# reboot
+
+ At this point, ONL will go through its shutdown sequence and ONIE will start.
+ If it does not start right away, press the Enter/Return key a few times - it
+ may show you a boot selection screen. Pick ``ONIE`` and ``Rescue`` if given a
+ choice.
3. Install ONL installer
-.. code-block::
+ Now that the switch is in Rescue mode,
- onie-nos-install http://$SERVER_IP:8080/ONL-onf-ONLPv2_ONL-OS_2021-07-16.2159-5195444_AMD64_INSTALLED_INSTALLER
+ run the ``onie-nos-install`` command with the URL of the installer hosted on the management
+ server (here we use ``10.0.0.129`` as an example) on the management network segment:
-4. Setup switch IP and hostname after the installation.
+ .. code-block::
+
+ ONIE:/ # onie-nos-install http://10.0.0.129/onie-installer
+ discover: Rescue mode detected. No discover stopped.
+ ONIE: Unable to find 'Serial Number' TLV in EEPROM data.
+ Info: Fetching http://10.0.0.129/onie-installer ...
+ Connecting to 10.0.0.129 (10.0.0.129:80)
+ installer 100% |*******************************| 322M 0:00:00 ETA
+ ONIE: Executing installer: http://10.0.0.129/onie-installer
+ installer: computing checksum of original archive
+ installer: checksum is OK
+ ...
+
+ The installation will now start, and ONL will then boot, ending at the login prompt:
+
+ .. code-block::
+
+ Open Network Linux OS ONL-wedge100bf-32qs, 2020-11-04.19:44-64100e9
+
+ localhost login:
+
+ The default ONL login is::
+
+ username: root
+ password: onl
+
+ If you log in, you can verify that the switch is getting its IP address via DHCP:
+
+ .. code-block::
+
+ root@localhost:~# ip addr
+ ...
+ 3: ma1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
+ link/ether 00:90:fb:5c:e1:97 brd ff:ff:ff:ff:ff:ff
+ inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1
+ ...
+
+4. (Optional) Set up the switch IP and hostname after the installation if DHCP is not available
+
+.. warning::
+
+ Stop and return to :ref:`Post-ONL configuration <aether:edge_deployment/fabric_switch_bootstrap:post-onl configuration>`
+ and continue the remaining steps there if you came from Aether docs.
+ Otherwise, please continue the rest of the page here.
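
The DHCP check from step 3 can also be scripted. The following is a minimal sketch, assuming the management interface is named ``ma1`` as in the sample output above and that ``ip`` supports the one-line ``-o`` output format. The helper name ``mgmt_ipv4`` is hypothetical and not part of ONL.

```shell
#!/bin/sh
# mgmt_ipv4 (hypothetical helper): read `ip -o -4 addr show` output on stdin
# and print the IPv4 address/prefix assigned to the given interface.
# In one-line mode each address line looks like:
#   3: ma1    inet 10.0.0.130/25 brd 10.0.0.255 scope global ma1 ...
mgmt_ipv4() {
    iface="$1"
    awk -v iface="$iface" '$2 == iface && $3 == "inet" { print $4 }'
}

# Usage on the switch (assumes the management interface is ma1):
#   ip -o -4 addr show | mgmt_ipv4 ma1
```

An empty result suggests that the interface did not receive an address, in which case the optional manual setup in step 4 applies.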
Step 2: Configure switches as special Kubernetes nodes
diff --git a/dict.txt b/dict.txt
index 3e7bd66..21a62e8 100644
--- a/dict.txt
+++ b/dict.txt
@@ -13,6 +13,7 @@
DDoS
Edgecore
Fluentbit
+GbE
GPP
Gerrit
Grafana
@@ -63,6 +64,7 @@
bitrate
blackhole
blackholing
+bootloader
bursty
centric
chipset
@@ -95,6 +97,7 @@
linecard
linecards
loopback
+mainboard
microburst
microservice
misconfiguration
diff --git a/specification.rst b/specification.rst
index b4194a9..43e66fc 100644
--- a/specification.rst
+++ b/specification.rst
@@ -197,15 +197,14 @@
Controller Server Specs
-----------------------
Recommendation (per ONOS instance) based on 50K routes
-
- CPU: 32 Cores
- RAM: 128GB RAM. 64GB dedicated to ONOS JVM heap
Recommendation (per ONOS instance) for 5K UEs when enabling UPF:
-
- CPU: 1 Cores
- RAM: 4GB RAM
+.. _all_switch:
White Box Switch Hardware
-------------------------
@@ -216,6 +215,16 @@
- 1/10G, 25G, 40G, and 100G ports
- Refer to Supported Devices list in https://github.com/stratum/stratum for the most up-to-date hardware list
+.. _verified_switch:
+
+Aether-verified Switch Hardware
+-------------------------------
+ - `EdgeCore DCS800 <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=335>`_
+ with Dual Pipe Tofino ASIC (formerly Wedge100BF-32X)
+
+ - `EdgeCore DCS801 <https://www.edge-core.com/productsInfo.php?cls=1&cls2=180&cls3=181&id=770>`_
+ with Quad Pipe Tofino ASIC (formerly Wedge100BF-32QS)
+
White Box Switch Software
-------------------------
- Open source ONL, ONIE, Docker, Kubernetes
diff --git a/troubleshooting.rst b/troubleshooting.rst
index 5169dc3..981072e 100644
--- a/troubleshooting.rst
+++ b/troubleshooting.rst
@@ -11,6 +11,19 @@
control plane software and data plane software are containerized and deployed as Kubernetes services in SD-Fabric.
Please refer to :ref:`architecture_design` for further details.
+ONL troubleshooting
+-------------------
+
+Can't reboot into ONL, loops on ONIE installer mode
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Sometimes an ONL installation is incomplete or problematic, and reinstalling it
+doesn't result in a working system.
+
+If this is the case, reboot into ONIE Rescue mode and use ``parted`` to delete
+all the ``ONL-`` prefixed partitions, then reinstall with an ``onie-installer``
+image.
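
As a hedged illustration of the cleanup described above, the sketch below reads the machine-readable output of ``parted`` and prints the ``rm`` command for each ``ONL-`` prefixed partition without executing anything. It assumes that the disk is ``/dev/sda``, that ``parted -m`` and ``awk`` are available in the ONIE rescue shell, and that the partition name is the sixth colon-separated field. The helper names are hypothetical.

```shell
#!/bin/sh
# onl_partition_numbers (hypothetical helper): read `parted -m <disk> print`
# output on stdin and print the number of every partition whose name
# starts with "ONL-". Partition lines look like:
#   number:start:end:size:filesystem:name:flags;
onl_partition_numbers() {
    awk -F: '$1 ~ /^[0-9]+$/ && $6 ~ /^ONL-/ { print $1 }'
}

# DISK is an assumption; confirm the device name on your switch first.
DISK="${DISK:-/dev/sda}"

# Dry run: print the parted commands instead of executing them,
# from the highest partition number down.
print_rm_commands() {
    onl_partition_numbers | sort -rn | while read -r n; do
        echo "parted -s $DISK rm $n"
    done
}

# Usage in ONIE rescue mode:
#   parted -m "$DISK" print | print_rm_commands
```

Review the printed commands before running them, since deleting the wrong partition (for example ``ONIE-BOOT``) would remove the installer environment itself.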
+
K8s troubleshooting
-------------------