Add INT overview and expand configuration

Change-Id: Ieebfbf95f569ad6060c196070eff337a9288ea8b
diff --git a/advanced/int.rst b/advanced/int.rst
index aefc2ca..390c06f 100644
--- a/advanced/int.rst
+++ b/advanced/int.rst
@@ -1,46 +1,104 @@
 In-band Network Telemetry (INT)
 ===============================
 
-ONOS network configuration for INT application
-----------------------------------------------
+Overview
+--------
 
-.. tip::
-    Learn more about `ONOS network configuration service
-    <https://wiki.onosproject.org/display/ONOS/The+Network+Configuration+Service>`_.
+SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
+telemetry.
 
-Here's a list of fields that you can configure:
+When INT is enabled, all switches are instrumented to generate INT reports for
+all traffic, reporting per-packet metadata such as the switch ID, ingress/egress
+ports, latency, queue congestion status, etc. Report generation is handled
+entirely in the data plane, in a way that does not affect the performance
+observed by regular traffic.
 
-* ``collectorIp``: The IP address of the INT collector, Must be an IP of a host that attached to the fabric. *Required*
+.. image:: ../images/int-overview.png
+    :width: 700px
 
-* ``collectorPort``: The UDP port that the INT collector will listen on. *Required*
+We aim to achieve end-to-end visibility. For this reason, we provide an
+implementation of INT for switches as well as hosts. For switches, INT report
+generation is integrated as part of the same P4 pipeline responsible for
+bridging, routing, UPF, etc. For hosts, we provide *experimental* support for
+an eBPF-based application that can monitor packets as they are processed by the
+kernel networking stack and Kubernetes CNI plug-ins. In the following, we use
+the term INT nodes to refer to both switches and hosts.
 
-* ``minFlowHopLatencyChangeNs``: The minimum latency change to bypass the flow report
-  filter in nanosecond. *Optional, default is 0.*
+SD-Fabric is responsible for producing and delivering INT report packets to an
+external collector. The actual collection and analysis of reports are out of
+scope, but we support integration with third-party analytics platforms.
+SD-Fabric is currently being validated for integration with Intel\ :sup:`TM`
+DeepInsight, a commercial analytics platform. However, any collector compatible
+with the INT standard can be used instead.
 
-  We use this value to instruct an INT-capable device to produce reports only for packets
-  which hop latency changed by at least ``minFlowHopLatencyChangeNs`` nanosecond from
-  the previously reported value for the same flow (5-tuple).
+Supported Features
+~~~~~~~~~~~~~~~~~~
 
-  For example: produce a report only if ``(currentHopLatency - previousHopLatency) > minFlowHopLatencyChangeNs``.
-  Some pipeline implementations might support only specific intervals, e.g., powers of 2.
+* **Telemetry Report Format Specification v0.5**: report packets generated by
+  nodes adhere to this version of the standard.
+* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
+  packet, the INT collector might receive up to N reports, where N is the number
+  of INT nodes in the path.
+* **Configurable watchlist**: specify which flows to monitor. This can be all
+  traffic, entire subnets, or specific 5-tuples.
+* **Flow reports**: for a given flow (5-tuple), each node produces reports
+  periodically, allowing a collector to monitor the path and end-to-end latency,
+  as well as detect anomalies such as path loop/change.
+* **Drop reports**: when a node drops a packet, it generates a report carrying
+  the switch ID and the drop reason (e.g., routing table miss, TTL zero, or
+  queue congestion).
+* **Queue congestion reports**: when queue utilization goes above a configurable
+  threshold, switches produce reports for all packets in the queue,
+  making it possible to identify exactly which flow is causing congestion.
+* **Smart filters and triggers**: generating INT reports for each packet seen by
+  a node can lead to excessive network overhead and overloading at the
+  collector. For this reason, nodes implement logic to limit the volume of
+  reports generated in a way that doesn't cause anomalies to go undetected. For
+  flow reports and drop reports, the pipeline generates 1 report/sec for each
+  5-tuple, or more when detecting anomalies (e.g., changes in the ingress/egress
+  port, queue, or hop latency). For queue congestion reports, the number of
+  reports that can be generated for each congestion event is limited to a
+  configurable "quota".
+* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
+  switches can watch inside GTP-U tunnels, generating reports for the inner
+  headers and making it possible to troubleshoot issues at the application
+  level. In addition, when generating drop reports, we support UPF-specific drop
+  reasons to identify if drops are caused by the UPF tables (because of a
+  misconfiguration somewhere in the control stack, or simply because the
+  specific user device is not authorized).
 
-* ``watchSubnets``: List of subnets we want to watch. *Optional, default is an empty list.*
+INT Report Delivery
+~~~~~~~~~~~~~~~~~~~
 
-  * Devices will watch packets with source IP or destination IP matched in this list.
-  * To watch every packet, use ``0.0.0.0/0``.
-  * Note that the pipeline won't watch the INT report traffic.
+INT reports generated by nodes are delivered to the INT collector using the same
+fabric links. In the example below, user traffic goes through three switches.
+Each one generates an INT report packet (postcard), which is forwarded using
+the same flow rules as regular traffic.
 
-* ``queueReportLatencyThresholds``: A map that specified thresholds to trigger queue
-  report or reset the queue report quota. *Optional, default is an empty map.*
+.. image:: ../images/int-reports.png
+    :width: 700px
 
-  The key of this map is the queue ID. The devices will only report queues in this map.
+This choice has the advantage of simplifying deployment and control plane logic,
+as it doesn't require setting up a different network and handling installation
+of flow rules specific to INT reports. However, the downside is that delivery of
+INT reports can be subject to the same issues that we are trying to detect using
+INT. For example, if a user packet is getting dropped because of missing routing
+entries, the INT report generated for the drop event might also be dropped for
+the same reason.
 
-  * ``triggerNs``: The latency threshold to trigger queue report in nanosecond. **Required**
-  * ``resetNs``: The latency threshold to reset queue report quota in nanosecond. **Optional**
+In future releases, we might add support for using the management network for
+report delivery, but for now using the fabric network is the only supported
+option.
 
-    * When absent, the device will reset the queue report quota when latency is half of ``triggerNs``.
+ONOS Configuration
+------------------
 
-Below is an example of ONOS network configuration for INT application:
+To enable INT, modify the ONOS netcfg in the following way:
+
+* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
+  or ``fabric-spgw-int``);
+* in the ``apps`` section, add a config block for app ID
+  ``org.stratumproject.fabric.tna.inbandtelemetry``, like in the example below:
 
 .. code-block:: json
 
@@ -50,7 +108,7 @@
                 "report": {
                     "collectorIp": "10.32.11.2",
                     "collectorPort": 32766,
-                    "minFlowHopLatencyChangeNs": 512,
+                    "minFlowHopLatencyChangeNs": 256,
                     "watchSubnets": [
                         "10.32.11.0/24"
                     ],
@@ -63,47 +121,174 @@
         }
     }
 
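+For reference, a matching ``devices`` entry might look like the sketch below.
+The device name, management address, and the platform/SDE suffix of the
+pipeconf ID are illustrative placeholders; adapt them to your deployment.
+
+.. code-block:: json
+
+    {
+        "devices": {
+            "device:leaf1": {
+                "basic": {
+                    "managementAddress": "grpc://10.0.0.1:9339?device_id=1",
+                    "driver": "stratum-tofino",
+                    "pipeconf": "org.stratumproject.fabric-int.montara_sde_9_5_0"
+                }
+            }
+        }
+    }
+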
+Here's a reference for the fields that you can configure for the INT app:
 
-Intel DeepInsight integration
------------------------------
+* ``collectorIp``: The IP address of the INT collector. Must be an IP address
+  routable by the fabric, either that of a host directly connected to the
+  fabric and discovered by ONOS, or one reachable via an external router.
+  *Required*
+
+* ``collectorPort``: The UDP port used by the INT collector to listen for report
+  packets. *Required*
+
+* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
+  trigger flow report generation. *Optional, default is 256.*
+
+  Used by the smart filters to immediately report abnormal latency changes. In
+  normal conditions, switches generate 1 report per second for each active
+  5-tuple. During congestion, when packets experience higher latency, the
+  switch will generate a report immediately if the latency difference between
+  this packet and the previous one of the same 5-tuple is greater than
+  ``minFlowHopLatencyChangeNs``, i.e., if
+  ``(currentHopLatency - previousHopLatency) > minFlowHopLatencyChangeNs``.
+
+  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or to small
+  values (lower than the switch's normal jitter) will cause the switch to
+  generate a lot of reports. The current implementation only supports powers
+  of 2.
+
+* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
+  *Optional, default is an empty list.*
+
+  All traffic with source or destination IPv4 address included in one of these
+  prefixes will be reported (both flow and drop reports). All other packets will
+  be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U encapsulated
+  traffic, the watchlist is always applied to the inner headers. Hence, to
+  monitor UE traffic, you should provide the UE subnet.
+
+  INT traffic is always excluded from the watchlist.
+
+  The default value (empty list) implies that flow reports and drop reports
+  are disabled.
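+
+  For example, assuming UE addresses are assigned from a hypothetical
+  ``10.250.0.0/16`` pool, a watchlist covering both a server subnet and UE
+  traffic might look like this fragment of the ``report`` config block:
+
+  .. code-block:: json
+
+    "watchSubnets": [
+        "10.32.11.0/24",
+        "10.250.0.0/16"
+    ]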
+
+* ``queueReportLatencyThresholds``: A map specifying latency thresholds to
+  trigger queue reports or reset the queue report quota.
+  *Optional, default is an empty map.*
+
+  The key of this map is the queue ID. Switches will generate queue congestion
+  reports only for queue IDs in this map. Congestion detection for other queues
+  is disabled. The same thresholds are used for all devices, i.e., it's not
+  possible to configure different thresholds for the same queue on different
+  devices.
+
+  The value of this map is a tuple with the following fields:
+
+  * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
+    reports. Once a packet experiences latency **above** this threshold, all
+    subsequent packets in the same queue will be reported, independently of the
+    watchlist, up to the quota or until latency drops below ``triggerNs``.
+    **Required**
+  * ``resetNs``: The latency threshold in nanoseconds to reset the quota. When
+    packet latency goes below this threshold, the quota is reset to its original
+    non-zero value. **Optional, default is triggerNs/2**.
+
+  Currently the default quota is 1024 and cannot be configured.
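+
+  For example, the following sketch (queue IDs and threshold values are
+  illustrative) enables congestion detection for queues ``0`` and ``2``,
+  relying on the default ``resetNs`` for queue ``2``. It goes inside the same
+  ``report`` config block shown above:
+
+  .. code-block:: json
+
+    "queueReportLatencyThresholds": {
+        "0": {"triggerNs": 2000, "resetNs": 500},
+        "2": {"triggerNs": 1000}
+    }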
+
+
+Intel\ :sup:`TM` DeepInsight Integration
+----------------------------------------
 
 .. note::
-    In this chapter, we assume that you already deploy the
-    `Intel DeepInsight Network Analytics Software <https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernet-switch/network-analytics/deep-insight.html>`_
-    to your setup with a valid license.
+    In this section, we assume that you already know how to deploy DeepInsight
+    to your Kubernetes cluster with a valid license. For more information,
+    please reach out to Intel's support.
 
-    Please contact Intel to get the software package and license of the DeepInsight Network Analytics Software.
+To use DeepInsight with SD-Fabric, follow these steps:
 
-Configure the DeepInsight topology
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Modify the DeepInsight Helm chart ``values.yml`` to include the following
+  setting for ``tos_mask``:
 
-We use `SD-Fabric utiliity <https://github.com/opennetworkinglab/sdfabric-utils>`_
-to convert the ONOS topology and configurations to DeepInsight
-topology configuration and upload it.
+  .. code-block:: yaml
 
-To install the DeepInsight utility, use the following command:
+    global:
+      preprocessor:
+        params:
+          tos_mask: 0
 
-.. code-block:: bash
+* Deploy DeepInsight.
+* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
+  node where the ``preprocessor`` container is deployed. This is the address to
+  use as the ``collectorIp`` in the ONOS netcfg. This address must be routable
+  by the fabric, i.e., make sure you can ping it from any other host in the
+  fabric. Similarly, from within the preprocessor container you should be able
+  to ping the loopback IPv4 address of **all** fabric switches (``ipv4Loopback``
+  in the ONOS netcfg). If ping doesn't work, check the server's RPF
+  configuration; we recommend setting it to ``net.ipv4.conf.all.rp_filter=2``.
+* Generate a ``topology.json`` using the
+  `SD-Fabric utility scripts
+  <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
+  (includes instructions) and upload it using the DeepInsight UI. Make sure to
+  update and re-upload the ``topology.json`` whenever you modify the network
+  configuration in ONOS (e.g., when switches, links, or static routes are
+  added or removed, or when new hosts are discovered).
 
-    pip3 install git+ssh://git@github.com/opennetworkinglab/sdfabric-utils.git#subdirectory=deep-insight
+Enabling Host-INT
+-----------------
 
-use the following command to generate the topology configuration file:
+Support for INT on hosts is still experimental.
 
-.. code-block:: bash
+Please check the `int-host-reporter documentation
+<https://github.com/opennetworkinglab/int-host-reporter>`_ for instructions on
+how to install it on your servers.
 
-    di gen-topology [-s ONOS_ADDRESS] [-u ONOS_USER] [-p ONOS_PASSWORD] [-o TOPOLOGY-CONFIG-JSON]
+Drop Reasons
+------------
 
-For example, we installs ONOS to ``192.168.100.1:8181``
+We use the following reason codes when generating drop reports. Please use this
+table as a reference when debugging drop reasons in DeepInsight or other INT
+collectors.
 
-.. code-block:: bash
+.. list-table:: SD-Fabric INT Drop Reasons
+   :widths: 15 25 60
+   :header-rows: 1
 
-    di gen-topology -s 192.168.100.1:8181 -o /tmp/topology.json
+   * - Code
+     - Name
+     - Description
+   * - 0
+     - UNKNOWN
+     - Drop with unknown reason.
+   * - 26
+     - IP TTL ZERO
+     - IPv4 TTL or IPv6 hop limit is zero. There might be a forwarding loop.
+   * - 29
+     - IP MISS
+     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
+       whether the host has been discovered.
+   * - 55
+     - INGRESS PORT VLAN MAPPING MISS
+     - Ingress port VLAN table miss. Packets are being received with an
+       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
+   * - 71
+     - TRAFFIC MANAGER
+     - Packet dropped by traffic manager due to congestion (tail drop) or
+       because the port is down.
+   * - 80
+     - ACL DENY
+     - Check the ACL table rules.
+   * - 89
+     - BRIDGING MISS
+     - Missing bridging entry. Check the entries in the bridging table.
+   * - 128 (WIP)
+     - NEXT ID MISS
+     - Missing next ID from ECMP (``hashed``) or multicast table.
+   * - 129
+     - MPLS MISS
+     - MPLS table miss. Check the segment routing device config in the netcfg.
+   * - 130
+     - EGRESS NEXT MISS
+     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
+   * - 131
+     - MPLS TTL ZERO
+     - There might be a forwarding loop.
+   * - 132
+     - UPF DOWNLINK PDR MISS
+     - Missing downlink PDR rule for the UE. Check UP4 flows.
+   * - 133
+     - UPF UPLINK PDR MISS
+     - Missing uplink PDR rule. Check UP4 flows.
+   * - 134
+     - UPF FAR MISS
+     - Missing FAR rule. Check UP4 flows.
+   * - 150
+     - UPF UPLINK RECIRC DENY
+     - Missing rules for UE-to-UE communication.
 
-.. tip::
 
-    Use ``di -h`` to get more detail about commands and parameters
-
-To upload the topology configuration file, go to DeepInsight web UI.
-In ``settings`` page there is a ``Topology Settings`` section.
-
-Choose ``Upload topology.json`` and use ``Browse...`` button to upload it.
diff --git a/dict.txt b/dict.txt
index 9d79dca..8d2eddd 100644
--- a/dict.txt
+++ b/dict.txt
@@ -1,3 +1,5 @@
+# Please keep lines in ascending case-sensitive order, so it's easier to spot duplicates
+# during merge conflict resolution.
 Aether
 Analytics
 Broadcom
@@ -16,6 +18,7 @@
 Multicast
 Netburg
 Netlink
+NxM
 ONF
 ONIE
 OpenConfig
@@ -37,8 +40,10 @@
 ToR
 ToRs
 Tofino
+UE
 UPF
 Unicast
+analytics
 backdoors
 backend
 backhaul
@@ -55,6 +60,7 @@
 dev
 disaggregated
 downlink
+eBPF
 encap
 failover
 gNMI
@@ -63,8 +69,10 @@
 gatewayIP
 hostname
 hyperscalers
+inband
 instantiation
 ip
+jitter
 json
 keepalives
 lifecycle
@@ -73,6 +81,7 @@
 loopback
 microburst
 microservice
+misconfiguration
 multicast
 natively
 netcfg
@@ -83,6 +92,7 @@
 pipeconfs
 pluggable
 pre
+preprocessor
 programmability
 protobuf
 reStructuredText
@@ -104,5 +114,6 @@
 vRouter
 verifiability
 virtualenv
+watchlist
 whitebox
 whitepaper
diff --git a/images/int-overview.png b/images/int-overview.png
new file mode 100644
index 0000000..348a194
--- /dev/null
+++ b/images/int-overview.png
Binary files differ
diff --git a/images/int-reports.png b/images/int-reports.png
new file mode 100644
index 0000000..fcedb86
--- /dev/null
+++ b/images/int-reports.png
Binary files differ