Add INT overview and expand configuration
Change-Id: Ieebfbf95f569ad6060c196070eff337a9288ea8b
diff --git a/advanced/int.rst b/advanced/int.rst
index aefc2ca..390c06f 100644
--- a/advanced/int.rst
+++ b/advanced/int.rst
@@ -1,46 +1,104 @@
In-band Network Telemetry (INT)
===============================
-ONOS network configuration for INT application
-----------------------------------------------
+Overview
+--------
-.. tip::
- Learn more about `ONOS network configuration service
- <https://wiki.onosproject.org/display/ONOS/The+Network+Configuration+Service>`_.
+SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
+telemetry.
-Here's a list of fields that you can configure:
+When INT is enabled, all switches are instrumented to generate INT reports for
+all traffic, reporting per-packet metadata such as the switch ID, ingress/egress
+ports, latency, queue congestion status, etc. Report generation is handled
+entirely in the data plane, in a way that does not affect the performance
+observed by regular traffic.
-* ``collectorIp``: The IP address of the INT collector, Must be an IP of a host that attached to the fabric. *Required*
+.. image:: ../images/int-overview.png
+ :width: 700px
-* ``collectorPort``: The UDP port that the INT collector will listen on. *Required*
+We aim to achieve end-to-end visibility. For this reason, we provide an
+implementation of INT for switches as well as hosts. For switches, INT report
+generation is integrated into the same P4 pipeline responsible for bridging,
+routing, UPF, etc. For hosts, we provide *experimental* support for an
+eBPF-based application that can monitor packets as they are processed by the
+kernel networking stack and Kubernetes CNI plug-ins. In the following, we use
+the term *INT nodes* to refer to both switches and hosts.
-* ``minFlowHopLatencyChangeNs``: The minimum latency change to bypass the flow report
- filter in nanosecond. *Optional, default is 0.*
+SD-Fabric is responsible for producing and delivering INT report packets to an
+external collector. The actual collection and analysis of reports is out of
+scope, but we support integration with third-party analytics platforms.
+SD-Fabric is currently being validated for integration with Intel\ :sup:`TM`
+DeepInsight, a commercial analytics platform. However, any collector compatible
+with the INT standard can be used instead.
- We use this value to instruct an INT-capable device to produce reports only for packets
- which hop latency changed by at least ``minFlowHopLatencyChangeNs`` nanosecond from
- the previously reported value for the same flow (5-tuple).
+Supported Features
+~~~~~~~~~~~~~~~~~~
- For example: produce a report only if ``(currentHopLatency - previousHopLatency) > minFlowHopLatencyChangeNs``.
- Some pipeline implementations might support only specific intervals, e.g., powers of 2.
+* **Telemetry Report Format Specification v0.5**: report packets generated by
+ nodes adhere to this version of the standard.
+* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
+  packet, the INT collector might receive up to N reports, where N is the
+  number of INT nodes in the path.
+* **Configurable watchlist**: specify which flows to monitor. This can be all
+  traffic, entire subnets, or specific 5-tuples.
+* **Flow reports**: for a given flow (5-tuple), each node produces reports
+  periodically, allowing a collector to monitor the path and end-to-end
+  latency, as well as detect anomalies such as path loops or changes.
+* **Drop reports**: when a node drops a packet, it generates a report carrying
+ the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
+ congestion, and more).
+* **Queue congestion reports**: when queue utilization goes above a configurable
+ threshold, switches produce reports for all packets in the queue,
+ making it possible to identify exactly which flow is causing congestion.
+* **Smart filters and triggers**: generating INT reports for each packet seen by
+ a node can lead to excessive network overhead and overloading at the
+ collector. For this reason, nodes implement logic to limit the volume of
+ reports generated in a way that doesn't cause anomalies to go undetected. For
+ flow reports and drop reports, the pipeline generates 1 report/sec for each
+ 5-tuple, or more when detecting anomalies (e.g., changes in the ingress/egress
+  port, queues, hop latency, etc.). For queue congestion reports, the number of
+ reports that can be generated for each congestion event is limited to a
+ configurable "quota".
+* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
+ switches can watch inside GTP-U tunnels, generating reports for the inner
+ headers and making it possible to troubleshoot issues at the application
+  level. In addition, when generating drop reports, we support UPF-specific
+  drop reasons to identify whether drops are caused by the UPF tables (because
+  of a misconfiguration somewhere in the control stack, or simply because the
+  specific user device is not authorized).
-* ``watchSubnets``: List of subnets we want to watch. *Optional, default is an empty list.*
+INT Report Delivery
+~~~~~~~~~~~~~~~~~~~
- * Devices will watch packets with source IP or destination IP matched in this list.
- * To watch every packet, use ``0.0.0.0/0``.
- * Note that the pipeline won't watch the INT report traffic.
+INT reports generated by nodes are delivered to the INT collector using the same
+fabric links. In the example below, user traffic goes through three switches.
+Each switch generates an INT report packet (postcard), which is forwarded using
+the same flow rules as regular traffic.
-* ``queueReportLatencyThresholds``: A map that specified thresholds to trigger queue
- report or reset the queue report quota. *Optional, default is an empty map.*
+.. image:: ../images/int-reports.png
+ :width: 700px
- The key of this map is the queue ID. The devices will only report queues in this map.
+This choice has the advantage of simplifying deployment and control plane logic,
+as it doesn't require setting up a different network and handling installation
+of flow rules specific to INT reports. However, the downside is that delivery of
+INT reports can be subject to the same issues that we are trying to detect using
+INT. For example, if a user packet is getting dropped because of missing routing
+entries, the INT report generated for the drop event might also be dropped for
+the same reason.
- * ``triggerNs``: The latency threshold to trigger queue report in nanosecond. **Required**
- * ``resetNs``: The latency threshold to reset queue report quota in nanosecond. **Optional**
+In future releases, we might add support for using the management network for
+report delivery, but for now using the fabric network is the only supported
+option.
- * When absent, the device will reset the queue report quota when latency is half of ``triggerNs``.
+ONOS Configuration
+------------------
-Below is an example of ONOS network configuration for INT application:
+To enable INT, modify the ONOS netcfg in the following way:
+
+* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
+  or ``fabric-spgw-int``), as sketched after the example below;
+* in the ``apps`` section, add a config block for app ID
+  ``org.stratumproject.fabric.tna.inbandtelemetry``, as in the example below:
.. code-block:: json
@@ -50,7 +108,7 @@
"report": {
"collectorIp": "10.32.11.2",
"collectorPort": 32766,
- "minFlowHopLatencyChangeNs": 512,
+ "minFlowHopLatencyChangeNs": 256,
"watchSubnets": [
"10.32.11.0/24"
],
@@ -63,47 +121,174 @@
}
}
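+
+Note that the ``devices`` section change is not shown in the example above. As
+a minimal sketch, the pipeconf ID is set in the ``basic`` block of each device
+config; the device name, gRPC address, driver, and the platform/SDE suffix of
+the pipeconf ID below are illustrative and depend on your deployment:
+
+.. code-block:: json
+
+   {
+     "devices": {
+       "device:leaf1": {
+         "basic": {
+           "managementAddress": "grpc://10.0.0.1:9339?device_id=1",
+           "driver": "stratum-tofino",
+           "pipeconf": "org.stratumproject.fabric-int.montara_sde_9_5_0"
+         }
+       }
+     }
+   }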
+Here's a reference for the fields that you can configure for the INT app:
-Intel DeepInsight integration
------------------------------
+* ``collectorIp``: The IP address of the INT collector. Must be an IP address
+ routable by the fabric, either the IP address of a host directly connected to
+ the fabric and discovered by ONOS, or reachable via an external router.
+ *Required*
+
+* ``collectorPort``: The UDP port used by the INT collector to listen for report
+ packets. *Required*
+
+* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
+ trigger flow report generation. *Optional, default is 256.*
+
+  Used by the smart filters to immediately report abnormal latency changes.
+  Under normal conditions, switches generate 1 report per second for each
+  active 5-tuple. During congestion, when packets experience higher latency,
+  the switch will generate a report immediately if the latency difference
+  between this packet and the previous one of the same 5-tuple is greater than
+  ``minFlowHopLatencyChangeNs``.
+
+  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or to small
+  values (lower than the switch's normal jitter) will cause the switch to
+  generate a large number of reports. The current implementation only supports
+  powers of 2.
+
+* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
+ *Optional, default is an empty list.*
+
+ All traffic with source or destination IPv4 address included in one of these
+ prefixes will be reported (both flow and drop reports). All other packets will
+ be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U encapsulated
+  traffic, the watchlist is always applied to the inner headers. Hence, to
+ monitor UE traffic, you should provide the UE subnet.
+
+ INT traffic is always excluded from the watchlist.
+
+ The default value (empty list) implies that flow reports and drop reports
+ are disabled.
+
+* ``queueReportLatencyThresholds``: A map specifying latency thresholds to
+ trigger queue reports or reset the queue report quota.
+ *Optional, default is an empty map.*
+
+ The key of this map is the queue ID. Switches will generate queue congestion
+ reports only for queue IDs in this map. Congestion detection for other queues
+ is disabled. The same thresholds are used for all devices, i.e., it's not
+ possible to configure different thresholds for the same queue on different
+ devices.
+
+ The value of this map is a tuple:
+
+ * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
+ reports. Once a packet experiences latency **above** this threshold, all
+ subsequent packets in the same queue will be reported, independently of the
+ watchlist, up to the quota or until latency drops below ``triggerNs``.
+ **Required**
+  * ``resetNs``: The latency threshold in nanoseconds to reset the quota. When
+    packet latency goes below this threshold, the quota is reset to its
+    original non-zero value. **Optional, default is triggerNs/2**.
+
+  Currently, the default quota is 1024 and cannot be configured; a combined
+  example of the fields above is sketched below.
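+
+As a minimal sketch combining the fields above (queue IDs and all threshold
+values are illustrative, not recommendations):
+
+.. code-block:: json
+
+   {
+     "report": {
+       "collectorIp": "10.32.11.2",
+       "collectorPort": 32766,
+       "minFlowHopLatencyChangeNs": 256,
+       "watchSubnets": [
+         "10.32.11.0/24"
+       ],
+       "queueReportLatencyThresholds": {
+         "0": {"triggerNs": 2000, "resetNs": 500},
+         "7": {"triggerNs": 1000}
+       }
+     }
+   }
+
+With this configuration, congestion detection is enabled only for queues 0
+and 7; since ``resetNs`` is omitted for queue 7, its quota is reset when
+latency drops below 500 ns (``triggerNs/2``).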
+
+
+Intel\ :sup:`TM` DeepInsight Integration
+----------------------------------------
.. note::
- In this chapter, we assume that you already deploy the
- `Intel DeepInsight Network Analytics Software <https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernet-switch/network-analytics/deep-insight.html>`_
- to your setup with a valid license.
+   In this section, we assume that you already know how to deploy DeepInsight
+   to your Kubernetes cluster with a valid license. For more information,
+   please reach out to Intel support.
- Please contact Intel to get the software package and license of the DeepInsight Network Analytics Software.
+To use DeepInsight with SD-Fabric, follow these steps:
-Configure the DeepInsight topology
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+* Modify the DeepInsight Helm chart ``values.yml`` to include the following
+  setting for ``tos_mask``:
-We use `SD-Fabric utiliity <https://github.com/opennetworkinglab/sdfabric-utils>`_
-to convert the ONOS topology and configurations to DeepInsight
-topology configuration and upload it.
+ .. code-block:: yaml
-To install the DeepInsight utility, use the following command:
+ global:
+ preprocessor:
+ params:
+ tos_mask: 0
-.. code-block:: bash
+* Deploy DeepInsight
+* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
+ node where the ``preprocessor`` container is deployed. This is the address to
+ use as the ``collectorIp`` in the ONOS netcfg. This address must be routable
+ by the fabric, i.e., make sure you can ping that from any other host in the
+ fabric. Similarly, from within the preprocessor container you should be able
+ to ping the loopback IPv4 address of **all** fabric switches (``ipv4Loopback``
+  in the ONOS netcfg). If ping doesn't work, check the server's RPF
+  configuration; we recommend setting it to ``net.ipv4.conf.all.rp_filter=2``.
+* Generate a ``topology.json`` using the
+ `SD-Fabric utility scripts
+ <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
+  (includes instructions) and upload it using the DeepInsight UI. Make sure to
+  update and re-upload the ``topology.json`` whenever you modify the network
+  configuration in ONOS (e.g., if you add or remove switches, links, or static
+  routes, or if new hosts are discovered).
- pip3 install git+ssh://git@github.com/opennetworkinglab/sdfabric-utils.git#subdirectory=deep-insight
+Enabling Host-INT
+-----------------
-use the following command to generate the topology configuration file:
+Support for INT on hosts is still experimental.
-.. code-block:: bash
+Please check the documentation at
+https://github.com/opennetworkinglab/int-host-reporter to learn how to install
+``int-host-reporter`` on your servers.
- di gen-topology [-s ONOS_ADDRESS] [-u ONOS_USER] [-p ONOS_PASSWORD] [-o TOPOLOGY-CONFIG-JSON]
+Drop Reasons
+------------
-For example, we installs ONOS to ``192.168.100.1:8181``
+We use the following reason codes when generating drop reports. Please use this
+table as a reference when debugging drop reasons in DeepInsight or any other
+INT collector.
-.. code-block:: bash
+.. list-table:: SD-Fabric INT Drop Reasons
+ :widths: 15 25 60
+ :header-rows: 1
- di gen-topology -s 192.168.100.1:8181 -o /tmp/topology.json
+ * - Code
+ - Name
+ - Description
+ * - 0
+ - UNKNOWN
+ - Drop with unknown reason.
+ * - 26
+ - IP TTL ZERO
+ - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
+ * - 29
+ - IP MISS
+     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
+       whether the host has been discovered.
+ * - 55
+ - INGRESS PORT VLAN MAPPING MISS
+ - Ingress port VLAN table miss. Packets are being received with an
+ unexpected VLAN. Check the ``interfaces`` section of the netcfg.
+ * - 71
+ - TRAFFIC MANAGER
+ - Packet dropped by traffic manager due to congestion (tail drop) or
+ because the port is down.
+ * - 80
+ - ACL DENY
+ - Check the ACL table rules.
+ * - 89
+ - BRIDGING MISS
+     - Missing bridging entry. Check the entries in the bridging table.
+ * - 128 (WIP)
+ - NEXT ID MISS
+ - Missing next ID from ECMP (``hashed``) or multicast table.
+ * - 129
+ - MPLS MISS
+ - MPLS table miss. Check the segment routing device config in the netcfg.
+ * - 130
+ - EGRESS NEXT MISS
+ - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
+ * - 131
+ - MPLS TTL ZERO
+ - There might be a forwarding loop.
+ * - 132
+ - UPF DOWNLINK PDR MISS
+ - Missing downlink PDR rule for the UE. Check UP4 flows.
+ * - 133
+ - UPF UPLINK PDR MISS
+ - Missing uplink PDR rule. Check UP4 flows.
+ * - 134
+ - UPF FAR MISS
+ - Missing FAR rule. Check UP4 flows.
+ * - 150
+ - UPF UPLINK RECIRC DENY
+ - Missing rules for UE-to-UE communication.
-.. tip::
- Use ``di -h`` to get more detail about commands and parameters
-
-To upload the topology configuration file, go to DeepInsight web UI.
-In ``settings`` page there is a ``Topology Settings`` section.
-
-Choose ``Upload topology.json`` and use ``Browse...`` button to upload it.