.. SPDX-FileCopyrightText: 2021 Open Networking Foundation <info@opennetworking.org>
.. SPDX-License-Identifier: Apache-2.0

.. _int:

In-band Network Telemetry (INT)
===============================

Overview
--------

SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
telemetry.

When INT is enabled, all switches are instrumented to generate INT reports for
all traffic, reporting per-packet metadata such as the switch ID,
ingress/egress ports, latency, queue congestion status, etc. Report generation
is handled entirely in the data plane, in a way that does not affect the
performance observed by regular traffic.

.. image:: ../images/int-overview.png
   :width: 700px

We aim to achieve end-to-end visibility. For this reason, we provide an
implementation of INT for switches as well as hosts. For switches, INT report
generation is integrated as part of the same P4 pipeline responsible for
bridging, routing, UPF, etc. For hosts, we provide *experimental* support for
an eBPF-based application that can monitor packets as they are processed by
the kernel networking stack and Kubernetes CNI plug-ins. In the following, we
use the term *INT nodes* to refer to both switches and hosts.

SD-Fabric is responsible for producing and delivering INT report packets to an
external collector. The actual collection and analysis of reports is out of
scope, but we support integration with third-party analytics platforms.
SD-Fabric is currently being validated for integration with Intel\ :sup:`TM`
DeepInsight, a commercial analytics platform. However, any collector
compatible with the INT standard can be used instead.

Supported Features
~~~~~~~~~~~~~~~~~~

* **Telemetry Report Format Specification v0.5**: report packets generated by
  nodes adhere to this version of the standard.
* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
  packet, the INT collector might receive up to N reports, where N is the
  number of INT nodes in the path.
* **Configurable watchlist**: specify which flows to monitor. It could be all
  traffic, entire subnets, or specific 5-tuples.
* **Flow reports**: for a given flow (5-tuple), each node produces reports
  periodically, allowing a collector to monitor the path and end-to-end
  latency, as well as detect anomalies such as path loops/changes.
* **Drop reports**: when a node drops a packet, it generates a report carrying
  the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
  congestion, and more).
* **Queue congestion reports**: when queue utilization goes above a
  configurable threshold, switches produce reports for all packets in the
  queue, making it possible to identify exactly which flow is causing
  congestion.
* **Smart filters and triggers**: generating INT reports for each packet seen
  by a node can lead to excessive network overhead and overloading at the
  collector. For this reason, nodes implement logic to limit the volume of
  reports generated in a way that doesn't cause anomalies to go undetected.
  For flow reports and drop reports, the pipeline generates 1 report/sec for
  each 5-tuple, or more when detecting anomalies (e.g., changes in the
  ingress/egress port, queues, hop latency, etc.). For queue congestion
  reports, the number of reports that can be generated for each congestion
  event is limited to a configurable "quota".
* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
  switches can look inside GTP-U tunnels, generating reports for the inner
  headers and making it possible to troubleshoot issues at the application
  level. In addition, when generating drop reports, we support UPF-specific
  drop reasons to identify whether drops are caused by the UPF tables (because
  of a misconfiguration somewhere in the control stack, or simply because the
  specific user device is not authorized).

INT Report Delivery
~~~~~~~~~~~~~~~~~~~

INT reports generated by nodes are delivered to the INT collector using the
same fabric links. In the example below, user traffic goes through three
switches. Each one generates an INT report packet (postcard), which is
forwarded using the same flow rules as regular traffic.

.. image:: ../images/int-reports.png
   :width: 700px

This choice has the advantage of simplifying deployment and control plane
logic, as it doesn't require setting up a separate network and handling the
installation of flow rules specific to INT reports. However, the downside is
that the delivery of INT reports can be subject to the same issues that we are
trying to detect using INT. For example, if a user packet is dropped because
of missing routing entries, the INT report generated for the drop event might
also be dropped for the same reason.

In future releases, we might add support for using the management network for
report delivery, but for now, using the fabric network is the only supported
option.

ONOS Configuration
------------------

To enable INT, modify the ONOS netcfg in the following way:

* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
  or ``fabric-upf-int``); a sketch of such an entry is shown after the example
  below;
* in the ``apps`` section, add a config block for app ID
  ``org.stratumproject.fabric.tna.inbandtelemetry``, as in the example below:

.. code-block:: json

   {
     "apps": {
       "org.stratumproject.fabric.tna.inbandtelemetry": {
         "report": {
           "collectorIp": "10.32.11.2",
           "collectorPort": 32766,
           "minFlowHopLatencyChangeNs": 256,
           "watchSubnets": [
             "10.32.11.0/24"
           ],
           "queueReportLatencyThresholds": {
             "0": {"triggerNs": 2000, "resetNs": 500},
             "2": {"triggerNs": 1000, "resetNs": 400}
           }
         }
       }
     }
   }
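
For reference, here is a minimal sketch of an INT-enabled entry in the
``devices`` section. The device ID, management address, driver, and full
pipeconf ID below are placeholders (full pipeconf IDs embed the target and
SDE version, which depend on your deployment):

.. code-block:: json

   {
     "devices": {
       "device:leaf1": {
         "basic": {
           "managementAddress": "grpc://10.0.0.1:9559?device_id=1",
           "driver": "stratum-tofino",
           "pipeconf": "org.stratumproject.fabric-int.montara_sde_9_7_0"
         }
       }
     }
   }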

Here is a reference for the fields you can configure for the INT app:

* ``collectorIp``: The IP address of the INT collector. Must be an IP address
  routable by the fabric: either the IP address of a host directly connected
  to the fabric and discovered by ONOS, or one reachable via an external
  router. *Required*

* ``collectorPort``: The UDP port on which the INT collector listens for
  report packets. *Required*

* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
  trigger flow report generation. *Optional, default is 256.*

  Used by the smart filters to immediately report abnormal latency changes. In
  normal conditions, switches generate 1 report per second for each active
  5-tuple. During congestion, when packets experience higher latency, the
  switch will generate a report immediately if the latency difference between
  this packet and the previous one of the same 5-tuple is greater than
  ``minFlowHopLatencyChangeNs``. For example, with the default of 256 ns, a
  packet whose hop latency differs by 300 ns from the previous packet of the
  same 5-tuple is reported immediately.

  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or to small
  values (lower than the switch's normal jitter) will cause the switch to
  generate an excessive number of reports. The current implementation only
  supports powers of 2.

* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
  *Optional, default is an empty list.*

  All traffic with a source or destination IPv4 address included in one of
  these prefixes will be reported (both flow and drop reports). All other
  packets will be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U
  encapsulated traffic, the watchlist is always applied to the inner headers.
  Hence, to monitor UE traffic, you should provide the UE subnet, as in the
  sketch below.

  INT traffic is always excluded from the watchlist.

  The default value (empty list) implies that flow reports and drop reports
  are disabled.
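
  For instance, to monitor UE traffic behind GTP-U encapsulation, the
  ``report`` block shown earlier would list the UE subnet; the prefix below
  is a hypothetical example:

  .. code-block:: json

     "watchSubnets": [
         "10.250.0.0/16"
     ]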

* ``queueReportLatencyThresholds``: A map specifying latency thresholds used
  to trigger queue reports or reset the queue report quota.
  *Optional, default is an empty map.*

  The key of this map is the queue ID. Switches will generate queue congestion
  reports only for queue IDs in this map. Congestion detection for other
  queues is disabled. The same thresholds are used for all devices, i.e., it's
  not possible to configure different thresholds for the same queue on
  different devices.

  The value of this map is a tuple:

  * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
    reports. Once a packet experiences latency **above** this threshold, all
    subsequent packets in the same queue will be reported, independently of
    the watchlist, up to the quota or until latency drops below ``triggerNs``.
    **Required**
  * ``resetNs``: The latency threshold in nanoseconds to reset the quota. When
    packet latency goes below this threshold, the quota is reset to its
    original non-zero value. **Optional, default is triggerNs/2.**

  Currently, the default quota is 1024 and cannot be configured.
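
  As a sketch, the following fragment enables congestion detection only on
  queue 5, leaving ``resetNs`` to its default (``triggerNs/2``, i.e., 5000 ns
  here); the queue ID and values are purely illustrative:

  .. code-block:: json

     "queueReportLatencyThresholds": {
         "5": {"triggerNs": 10000}
     }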


Intel\ :sup:`TM` DeepInsight Integration
----------------------------------------

.. note::
   In this section, we assume that you already know how to deploy DeepInsight
   to your Kubernetes cluster with a valid license. For more information,
   please reach out to Intel support.

To use DeepInsight with SD-Fabric, follow these steps:

* Modify the DeepInsight Helm chart ``values.yml`` to include the following
  setting for ``tos_mask``:

  .. code-block:: yaml

     global:
       preprocessor:
         params:
           tos_mask: 0

* Deploy DeepInsight.
* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
  node where the ``preprocessor`` container is deployed. This is the address
  to use as the ``collectorIp`` in the ONOS netcfg. This address must be
  routable by the fabric, i.e., make sure you can ping it from any other host
  in the fabric. Similarly, from within the preprocessor container you should
  be able to ping the loopback IPv4 address of **all** fabric switches
  (``ipv4Loopback`` in the ONOS netcfg; see the netcfg sketch after this
  list). If ping doesn't work, check the server's RPF configuration; we
  recommend setting it to ``net.ipv4.conf.all.rp_filter=2``.
* Generate a ``topology.json`` using the
  `SD-Fabric utility scripts
  <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
  (includes instructions) and upload it using the DeepInsight UI. Make sure to
  update and re-upload the ``topology.json`` whenever you modify the network
  configuration in ONOS (e.g., when switches, links, or static routes are
  added or removed, or new hosts are discovered).
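
For reference, the ``ipv4Loopback`` address mentioned above is defined in each
device's ``segmentrouting`` block in the ONOS netcfg. A minimal sketch is
shown below; the device ID and all values are placeholders:

.. code-block:: json

   {
     "devices": {
       "device:leaf1": {
         "segmentrouting": {
           "name": "leaf1",
           "ipv4NodeSid": 101,
           "ipv4Loopback": "192.168.0.101",
           "routerMac": "00:00:00:00:01:01",
           "isEdgeRouter": true
         }
       }
     }
   }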

Enabling Host-INT
-----------------

Support for INT on hosts is still experimental.

To install ``int-host-reporter`` on your servers, please check the
documentation at https://github.com/opennetworkinglab/int-host-reporter.

Drop Reasons
------------

We use the following reason codes when generating drop reports. Please use
this table as a reference when debugging drop reasons in DeepInsight or any
other INT collector.

.. list-table:: SD-Fabric INT Drop Reasons
   :widths: 15 25 60
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 0
     - UNKNOWN
     - Drop with unknown reason.
   * - 26
     - IP TTL ZERO
     - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
   * - 29
     - IP MISS
     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
       whether the host has been discovered.
   * - 55
     - INGRESS PORT VLAN MAPPING MISS
     - Ingress port VLAN table miss. Packets are being received with an
       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
   * - 71
     - TRAFFIC MANAGER
     - Packet dropped by the traffic manager due to congestion (tail drop) or
       because the port is down.
   * - 80
     - ACL DENY
     - Check the ACL table rules.
   * - 89
     - BRIDGING MISS
     - Missing bridging entry. Check the entries in the bridging table.
   * - 128 (WIP)
     - NEXT ID MISS
     - Missing next ID from ECMP (``hashed``) or multicast table.
   * - 129
     - MPLS MISS
     - MPLS table miss. Check the segment routing device config in the netcfg.
   * - 130
     - EGRESS NEXT MISS
     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
   * - 131
     - MPLS TTL ZERO
     - There might be a forwarding loop.
   * - 132
     - UPF DOWNLINK PDR MISS
     - Missing downlink PDR rule for the UE. Check the UP4 flows.
   * - 133
     - UPF UPLINK PDR MISS
     - Missing uplink PDR rule. Check the UP4 flows.
   * - 134
     - UPF FAR MISS
     - Missing FAR rule. Check the UP4 flows.
   * - 150
     - UPF UPLINK RECIRC DENY
     - Missing rules for UE-to-UE communication.