In-band Network Telemetry (INT)
===============================

Overview
--------

SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
telemetry.

When INT is enabled, all switches are instrumented to generate INT reports for
all traffic, reporting per-packet metadata such as the switch ID, ingress/egress
ports, latency, queue congestion status, etc. Report generation is handled
entirely in the data plane, in a way that does not affect the performance
observed by regular traffic.

.. image:: ../images/int-overview.png
   :width: 700px

We aim to achieve end-to-end visibility. For this reason, we provide an
implementation of INT for switches as well as hosts. For switches, INT report
generation is integrated into the same P4 pipeline responsible for bridging,
routing, UPF, etc. For hosts, we provide *experimental* support for an
eBPF-based application that can monitor packets as they are processed by the
kernel networking stack and Kubernetes CNI plug-ins. In the following, we use
the term *INT nodes* to refer to both switches and hosts.

SD-Fabric is responsible for producing and delivering INT report packets to an
external collector. The actual collection and analysis of reports is out of
scope, but we support integration with third-party analytics platforms.
SD-Fabric is currently being validated for integration with
Intel\ :sup:`TM` DeepInsight, a commercial analytics platform. However, any
collector compatible with the INT standard can be used instead.

Supported Features
~~~~~~~~~~~~~~~~~~

* **Telemetry Report Format Specification v0.5**: report packets generated by
  nodes adhere to this version of the standard.
* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
  packet, the INT collector might receive up to N reports, where N is the
  number of INT nodes in the path.
* **Configurable watchlist**: specify which flows to monitor. This can be all
  traffic, entire subnets, or specific 5-tuples.
* **Flow reports**: for a given flow (5-tuple), each node produces reports
  periodically, allowing a collector to monitor the path and end-to-end
  latency, as well as detect anomalies such as path loops or changes.
* **Drop reports**: when a node drops a packet, it generates a report carrying
  the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
  congestion, and more).
* **Queue congestion reports**: when queue utilization goes above a
  configurable threshold, switches produce reports for all packets in the
  queue, making it possible to identify exactly which flow is causing
  congestion.
* **Smart filters and triggers**: generating INT reports for each packet seen
  by a node can lead to excessive network overhead and overload the collector.
  For this reason, nodes implement logic to limit the volume of reports
  generated in a way that doesn't cause anomalies to go undetected. For flow
  reports and drop reports, the pipeline generates 1 report/sec for each
  5-tuple, or more when detecting anomalies (e.g., changes in the
  ingress/egress port, queues, hop latency, etc.). For queue congestion
  reports, the number of reports that can be generated for each congestion
  event is limited to a configurable "quota".
* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
  switches can look inside GTP-U tunnels, generating reports for the inner
  headers and making it possible to troubleshoot issues at the application
  level. In addition, when generating drop reports, we support UPF-specific
  drop reasons to identify whether drops are caused by the UPF tables (because
  of a misconfiguration somewhere in the control stack, or simply because the
  specific user device is not authorized).
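
The smart-filter behavior for flow reports described above can be sketched in
Python (an illustrative model only; the real logic runs in the switch data
plane, and all names here are hypothetical):

.. code-block:: python

   import time

   # Illustrative model of the smart-filter logic for flow reports: report on
   # the first packet of a flow, then once per second, or immediately when the
   # hop latency changes by more than a configurable amount.
   class FlowReportFilter:
       def __init__(self, min_latency_change_ns=256, report_interval_s=1.0):
           self.min_latency_change_ns = min_latency_change_ns
           self.report_interval_s = report_interval_s
           self.state = {}  # 5-tuple -> (last_report_time, last_latency_ns)

       def should_report(self, five_tuple, hop_latency_ns, now=None):
           now = time.monotonic() if now is None else now
           last = self.state.get(five_tuple)
           report = (
               last is None  # first packet of the flow
               or now - last[0] >= self.report_interval_s  # periodic report
               or abs(hop_latency_ns - last[1]) >= self.min_latency_change_ns
           )
           if report:
               self.state[five_tuple] = (now, hop_latency_ns)
           return report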

INT Report Delivery
~~~~~~~~~~~~~~~~~~~

INT reports generated by nodes are delivered to the INT collector using the
same fabric links. In the example below, user traffic goes through three
switches. Each one generates an INT report packet (postcard), which is
forwarded using the same flow rules as regular traffic.

.. image:: ../images/int-reports.png
   :width: 700px

This choice has the advantage of simplifying deployment and control plane
logic, as it doesn't require setting up a separate network and handling
installation of flow rules specific to INT reports. However, the downside is
that delivery of INT reports can be subject to the same issues that we are
trying to detect using INT. For example, if a user packet is getting dropped
because of missing routing entries, the INT report generated for the drop
event might also be dropped for the same reason.

In future releases, we might add support for using the management network for
report delivery, but for now using the fabric network is the only supported
option.

ONOS Configuration
------------------

To enable INT, modify the ONOS netcfg in the following way:

* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
  or ``fabric-spgw-int``);
* in the ``apps`` section, add a config block for app ID
  ``org.stratumproject.fabric.tna.inbandtelemetry``, like in the example below:

.. code-block:: json

   {
     "apps": {
       "org.stratumproject.fabric.tna.inbandtelemetry": {
         "report": {
           "collectorIp": "10.32.11.2",
           "collectorPort": 32766,
           "minFlowHopLatencyChangeNs": 256,
           "watchSubnets": [
             "10.32.11.0/24"
           ],
           "queueReportLatencyThresholds": {
             "0": {"triggerNs": 2000, "resetNs": 500},
             "2": {"triggerNs": 1000, "resetNs": 400}
           }
         }
       }
     }
   }
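
To avoid typos in a hand-written netcfg, this block can be generated
programmatically. Below is an illustrative Python helper (not part of
SD-Fabric; the function name is hypothetical) that builds the ``report``
block and enforces the power-of-2 constraint on
``minFlowHopLatencyChangeNs``:

.. code-block:: python

   import json

   def build_int_netcfg(collector_ip, collector_port,
                        min_flow_hop_latency_change_ns=256,
                        watch_subnets=(), queue_thresholds=None):
       n = min_flow_hop_latency_change_ns
       # The current implementation only supports powers of 2 for this field.
       if n <= 0 or n & (n - 1) != 0:
           raise ValueError("minFlowHopLatencyChangeNs must be a power of 2")
       report = {
           "collectorIp": collector_ip,
           "collectorPort": collector_port,
           "minFlowHopLatencyChangeNs": n,
           "watchSubnets": list(watch_subnets),
           "queueReportLatencyThresholds": queue_thresholds or {},
       }
       return {"apps": {"org.stratumproject.fabric.tna.inbandtelemetry":
                        {"report": report}}}

   netcfg = build_int_netcfg("10.32.11.2", 32766,
                             watch_subnets=["10.32.11.0/24"],
                             queue_thresholds={"0": {"triggerNs": 2000,
                                                     "resetNs": 500}})
   print(json.dumps(netcfg, indent=2))

The resulting JSON can then be pushed to ONOS, for example with the
``onos-netcfg`` tool or a ``POST`` to the ``/onos/v1/network/configuration``
REST endpoint.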

Here's a reference of the fields that you can configure for the INT app:

* ``collectorIp``: The IP address of the INT collector. Must be an IP address
  routable by the fabric, either the IP address of a host directly connected
  to the fabric and discovered by ONOS, or reachable via an external router.
  *Required*

* ``collectorPort``: The UDP port used by the INT collector to listen for
  report packets. *Required*

* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
  trigger flow report generation. *Optional, default is 256.*

  Used by the smart filters to immediately report abnormal latency changes. In
  normal conditions, switches generate 1 report per second for each active
  5-tuple. During congestion, when packets experience higher latency, the
  switch will generate a report immediately if the latency difference between
  this packet and the previous one of the same 5-tuple is greater than
  ``minFlowHopLatencyChangeNs``.

  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or to small
  values (lower than the switch's normal jitter) will cause the switch to
  generate a lot of reports. The current implementation only supports powers
  of 2.

* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
  *Optional, default is an empty list.*

  All traffic with a source or destination IPv4 address included in one of
  these prefixes will be reported (both flow and drop reports). All other
  packets will be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U
  encapsulated traffic, the watchlist is always applied to the inner headers.
  Hence, to monitor UE traffic, you should provide the UE subnet.

  INT traffic is always excluded from the watchlist.

  The default value (empty list) implies that flow reports and drop reports
  are disabled.

* ``queueReportLatencyThresholds``: A map specifying latency thresholds used
  to trigger queue reports or reset the queue report quota.
  *Optional, default is an empty map.*

  The key of this map is the queue ID. Switches will generate queue congestion
  reports only for queue IDs in this map. Congestion detection for other
  queues is disabled. The same thresholds are used for all devices, i.e., it's
  not possible to configure different thresholds for the same queue on
  different devices.

  The value of this map is a tuple:

  * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
    reports. Once a packet experiences latency **above** this threshold, all
    subsequent packets in the same queue will be reported, independently of
    the watchlist, up to the quota or until latency drops below ``triggerNs``.
    *Required*
  * ``resetNs``: The latency threshold in nanoseconds to reset the quota. When
    packet latency goes below this threshold, the quota is reset to its
    original non-zero value. *Optional, default is* ``triggerNs/2``.

  Currently, the default quota is 1024 and cannot be configured.
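
As an illustration of the trigger/reset semantics above, the per-queue
behavior can be modeled as follows (a simplified sketch; the actual logic is
implemented in the switch data plane, and the class name is hypothetical):

.. code-block:: python

   # Sketch of queue-report triggering for a single queue: report every packet
   # whose queue latency exceeds triggerNs, consuming the quota; restore the
   # quota once latency falls below resetNs.
   class QueueReportState:
       def __init__(self, trigger_ns, reset_ns=None, quota=1024):
           self.trigger_ns = trigger_ns
           # Default resetNs is triggerNs / 2, as in the ONOS config.
           self.reset_ns = trigger_ns // 2 if reset_ns is None else reset_ns
           self.initial_quota = quota
           self.quota = quota

       def on_packet(self, queue_latency_ns):
           """Return True if this packet should produce a queue report."""
           if queue_latency_ns > self.trigger_ns and self.quota > 0:
               self.quota -= 1
               return True
           if queue_latency_ns < self.reset_ns:
               self.quota = self.initial_quota  # congestion over: restore quota
           return False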

Intel\ :sup:`TM` DeepInsight Integration
----------------------------------------

.. note::
   In this section, we assume that you already know how to deploy DeepInsight
   to your Kubernetes cluster with a valid license. For more information,
   please reach out to Intel's support.

To use DeepInsight with SD-Fabric, follow these steps:

* Modify the DeepInsight Helm chart ``values.yml`` to include the following
  setting for ``tos_mask``:

  .. code-block:: yaml

     global:
       preprocessor:
         params:
           tos_mask: 0

* Deploy DeepInsight.
* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
  node where the ``preprocessor`` container is deployed. This is the address
  to use as the ``collectorIp`` in the ONOS netcfg. This address must be
  routable by the fabric, i.e., make sure you can ping it from any other host
  in the fabric. Similarly, from within the preprocessor container you should
  be able to ping the loopback IPv4 address of **all** fabric switches
  (``ipv4Loopback`` in the ONOS netcfg). If ping doesn't work, check the
  server's RPF configuration; we recommend setting it to
  ``net.ipv4.conf.all.rp_filter=2``.
* Generate a ``topology.json`` using the
  `SD-Fabric utility scripts
  <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
  (the repository includes instructions) and upload it using the DeepInsight
  UI. Make sure to update and re-upload the ``topology.json`` whenever you
  modify the network configuration in ONOS (e.g., when you add/remove switches
  or links, change static routes, or new hosts are discovered).
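
Checking the RPF configuration can be scripted. The illustrative helper below
(names are hypothetical) takes a mapping of sysctl keys to values, e.g. parsed
from ``sysctl -a`` output, and emits the commands needed to switch to loose
mode (``2``):

.. code-block:: python

   RECOMMENDED_RP_FILTER = 2  # loose mode, as recommended above

   def rp_filter_fixes(sysctl_values):
       """Return the `sysctl -w` commands needed to reach loose RPF mode."""
       return [
           f"sysctl -w {key}={RECOMMENDED_RP_FILTER}"
           for key, value in sorted(sysctl_values.items())
           if key.endswith(".rp_filter") and value != RECOMMENDED_RP_FILTER
       ]

   print(rp_filter_fixes({"net.ipv4.conf.all.rp_filter": 1,
                          "net.ipv4.conf.eth0.rp_filter": 2}))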

Enabling Host-INT
-----------------

Support for INT on hosts is still experimental.

To install ``int-host-reporter`` on your servers, please check the
documentation in the `int-host-reporter repository
<https://github.com/opennetworkinglab/int-host-reporter>`_.

Drop Reasons
------------

We use the following reason codes when generating drop reports. Please use
this table as a reference when debugging drop reasons in DeepInsight or any
other INT collector.

.. list-table:: SD-Fabric INT Drop Reasons
   :widths: 15 25 60
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 0
     - UNKNOWN
     - Drop with unknown reason.
   * - 26
     - IP TTL ZERO
     - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
   * - 29
     - IP MISS
     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
       whether the host has been discovered.
   * - 55
     - INGRESS PORT VLAN MAPPING MISS
     - Ingress port VLAN table miss. Packets are being received with an
       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
   * - 71
     - TRAFFIC MANAGER
     - Packet dropped by the traffic manager due to congestion (tail drop) or
       because the port is down.
   * - 80
     - ACL DENY
     - Dropped by an ACL rule. Check the ACL table rules.
   * - 89
     - BRIDGING MISS
     - Missing bridging entry. Check the entries in the bridging table.
   * - 128 (WIP)
     - NEXT ID MISS
     - Missing next ID from the ECMP (``hashed``) or multicast table.
   * - 129
     - MPLS MISS
     - MPLS table miss. Check the segment routing device config in the netcfg.
   * - 130
     - EGRESS NEXT MISS
     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
   * - 131
     - MPLS TTL ZERO
     - There might be a forwarding loop.
   * - 132
     - UPF DOWNLINK PDR MISS
     - Missing downlink PDR rule for the UE. Check UP4 flows.
   * - 133
     - UPF UPLINK PDR MISS
     - Missing uplink PDR rule. Check UP4 flows.
   * - 134
     - UPF FAR MISS
     - Missing FAR rule. Check UP4 flows.
   * - 150
     - UPF UPLINK RECIRC DENY
     - Missing rules for UE-to-UE communication.
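
When working with a collector other than DeepInsight, it can be handy to
decode these codes programmatically. A minimal Python mapping of the table
above (the helper function name is our own):

.. code-block:: python

   # Drop-reason codes from the SD-Fabric INT Drop Reasons table.
   DROP_REASONS = {
       0: "UNKNOWN",
       26: "IP TTL ZERO",
       29: "IP MISS",
       55: "INGRESS PORT VLAN MAPPING MISS",
       71: "TRAFFIC MANAGER",
       80: "ACL DENY",
       89: "BRIDGING MISS",
       128: "NEXT ID MISS",  # WIP
       129: "MPLS MISS",
       130: "EGRESS NEXT MISS",
       131: "MPLS TTL ZERO",
       132: "UPF DOWNLINK PDR MISS",
       133: "UPF UPLINK PDR MISS",
       134: "UPF FAR MISS",
       150: "UPF UPLINK RECIRC DENY",
   }

   def drop_reason_name(code):
       """Translate a drop-reason code from an INT drop report to its name."""
       return DROP_REASONS.get(code, f"UNKNOWN CODE ({code})")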