.. SPDX-FileCopyrightText: 2021 Open Networking Foundation <info@opennetworking.org>
.. SPDX-License-Identifier: Apache-2.0

.. _int:

In-band Network Telemetry (INT)
===============================

Overview
--------

SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
telemetry.

When INT is enabled, all switches are instrumented to generate INT reports for
all traffic, reporting per-packet metadata such as the switch ID, ingress/egress
ports, latency, queue congestion status, etc. Report generation is handled
entirely at the data plane, in a way that does not affect the performance
observed by regular traffic.

.. image:: ../images/int-overview.png
   :width: 700px

We aim at achieving end-to-end visibility. For this reason, we provide an
implementation of INT for switches as well as hosts. For switches, INT report
generation is integrated as part of the same P4 pipeline responsible for
bridging, routing, UPF, etc. For hosts, we provide *experimental* support for
an eBPF-based application that can monitor packets as they are processed by the
kernel networking stack and Kubernetes CNI plug-ins. In the following, we use
the term INT nodes to refer to both switches and hosts.

SD-Fabric is responsible for producing and delivering INT report packets to an
external collector. The actual collection and analysis of reports are out of
scope, but we support integration with third-party analytics platforms.
SD-Fabric is currently being validated for integration with
Intel\ :sup:`TM` DeepInsight, a commercial analytics platform. However, any
collector compatible with the INT standard can be used instead.

Supported Features
~~~~~~~~~~~~~~~~~~

* **Telemetry Report Format Specification v0.5**: report packets generated by
  nodes adhere to this version of the standard.
* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
  packet, the INT collector might receive up to N reports, where N is the number
  of INT nodes in the path.
* **Configurable watchlist**: specify which flows to monitor. It could be all
  traffic, entire subnets, or specific 5-tuples.
* **Flow reports**: for a given flow (5-tuple), each node produces reports
  periodically, allowing a collector to monitor the path and end-to-end latency,
  as well as detect anomalies such as path loops/changes.
* **Drop reports**: when a node drops a packet, it generates a report carrying
  the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
  congestion, and more).
* **Queue congestion reports**: when queue utilization goes above a configurable
  threshold, switches produce reports for all packets in the queue, making it
  possible to identify exactly which flow is causing congestion.
* **Smart filters and triggers**: generating INT reports for each packet seen by
  a node can lead to excessive network overhead and overloading at the
  collector. For this reason, nodes implement logic to limit the volume of
  reports generated in a way that doesn't cause anomalies to go undetected. For
  flow reports and drop reports, the pipeline generates 1 report/sec for each
  5-tuple, or more when detecting anomalies (e.g., changes in the ingress/egress
  port, queues, hop latency, etc.). For queue congestion reports, the number of
  reports that can be generated for each congestion event is limited to a
  configurable "quota".
* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
  switches can look inside GTP-U tunnels, generating reports for the inner
  headers and making it possible to troubleshoot issues at the application
  level. In addition, when generating drop reports, we support UPF-specific drop
  reasons to identify whether drops are caused by the UPF tables (because of a
  misconfiguration somewhere in the control stack, or simply because the
  specific user device is not authorized).

INT Report Delivery
~~~~~~~~~~~~~~~~~~~

INT reports generated by nodes are delivered to the INT collector using the
same fabric links. In the example below, user traffic goes through three
switches. Each one generates an INT report packet (postcard), which is
forwarded using the same flow rules as regular traffic.

.. image:: ../images/int-reports.png
   :width: 700px

This choice has the advantage of simplifying deployment and control plane logic,
as it doesn't require setting up a different network and handling installation
of flow rules specific to INT reports. However, the downside is that delivery of
INT reports can be subject to the same issues that we are trying to detect using
INT. For example, if a user packet is getting dropped because of missing routing
entries, the INT report generated for the drop event might also be dropped for
the same reason.

In future releases, we might add support for using the management network for
report delivery, but for now using the fabric network is the only supported
option.

ONOS Configuration
------------------

To enable INT, modify the ONOS netcfg in the following way:

* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
  or ``fabric-upf-int``), as shown in the second example below;
* in the ``apps`` section, add a config block for app ID
  ``org.stratumproject.fabric.tna.inbandtelemetry``, like in the first example
  below:

.. code-block:: json

   {
     "apps": {
       "org.stratumproject.fabric.tna.inbandtelemetry": {
         "report": {
           "collectorIp": "10.32.11.2",
           "collectorPort": 32766,
           "minFlowHopLatencyChangeNs": 256,
           "watchSubnets": [
             "10.32.11.0/24"
           ],
           "queueReportLatencyThresholds": {
             "0": {"triggerNs": 2000, "resetNs": 500},
             "2": {"triggerNs": 1000, "resetNs": 400}
           }
         }
       }
     }
   }
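
For the ``devices`` section, the snippet below is a minimal sketch of what an
INT-enabled device entry might look like. The device name, management address,
driver, and full pipeconf name are placeholders: the exact values depend on
your switch platform, SDE version, and deployment.

.. code-block:: json

   {
     "devices": {
       "device:leaf1": {
         "basic": {
           "managementAddress": "grpc://10.0.0.1:9559?device_id=1",
           "driver": "stratum-tofino",
           "pipeconf": "org.stratumproject.fabric-int.montara_sde_9_7_0"
         }
       }
     }
   }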

Here's a reference of the fields that you can configure for the INT app:

* ``collectorIp``: The IP address of the INT collector. Must be an IP address
  routable by the fabric, either the IP address of a host directly connected to
  the fabric and discovered by ONOS, or reachable via an external router.
  *Required*

* ``collectorPort``: The UDP port used by the INT collector to listen for report
  packets. *Required*

* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
  trigger flow report generation. *Optional, default is 256.*

  Used by the smart filters to immediately report abnormal latency changes. In
  normal conditions, switches generate 1 report per second for each active
  5-tuple. During congestion, when packets experience higher latency, the
  switch will generate a report immediately if the latency difference between
  this packet and the previous one of the same 5-tuple is greater than
  ``minFlowHopLatencyChangeNs``. A simplified model of this behavior is
  sketched after this list.

  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or small values
  (lower than the switch's normal jitter) will cause the switch to generate a
  lot of reports. The current implementation only supports powers of 2.

* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
  *Optional, default is an empty list.*

  All traffic with source or destination IPv4 address included in one of these
  prefixes will be reported (both flow and drop reports). All other packets will
  be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U encapsulated
  traffic, the watchlist is always applied to the inner headers. Hence, to
  monitor UE traffic, you should provide the UE subnet.

  INT traffic is always excluded from the watchlist.

  The default value (empty list) implies that flow reports and drop reports
  are disabled.

* ``queueReportLatencyThresholds``: A map specifying latency thresholds used to
  trigger queue reports or reset the queue report quota.
  *Optional, default is an empty map.*

  The key of this map is the queue ID. Switches will generate queue congestion
  reports only for queue IDs in this map. Congestion detection for other queues
  is disabled. The same thresholds are used for all devices, i.e., it's not
  possible to configure different thresholds for the same queue on different
  devices.

  The value of this map is a tuple:

  * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
    reports. Once a packet experiences latency **above** this threshold, all
    subsequent packets in the same queue will be reported, independently of the
    watchlist, up to the quota or until latency drops below ``triggerNs``.
    **Required**
  * ``resetNs``: The latency threshold in nanoseconds to reset the quota. When
    packet latency goes below this threshold, the quota is reset to its original
    non-zero value. **Optional, default is triggerNs/2**.

  Currently the default quota is 1024 and cannot be configured.
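
To make the filter and trigger behavior easier to follow, here is a minimal
behavioral model in Python. It is an illustration only, not the actual P4
implementation: all names (``should_send_flow_report``, the ``state`` fields,
etc.) are invented for this sketch, and the real pipeline keeps equivalent
state in the data plane.

.. code-block:: python

   # Behavioral model of the INT report filters (illustration only).

   FLOW_REPORT_PERIOD_NS = 1_000_000_000  # 1 report/sec for each 5-tuple
   DEFAULT_QUEUE_QUOTA = 1024             # current fixed quota per congestion event

   def should_send_flow_report(now_ns, hop_latency_ns, state, min_change_ns=256):
       """state holds last_report_ns and last_latency_ns for one 5-tuple."""
       periodic = now_ns - state["last_report_ns"] >= FLOW_REPORT_PERIOD_NS
       # Report immediately if latency changed by more than minFlowHopLatencyChangeNs.
       anomaly = abs(hop_latency_ns - state["last_latency_ns"]) > min_change_ns
       if periodic or anomaly:
           state["last_report_ns"] = now_ns
           state["last_latency_ns"] = hop_latency_ns
           return True
       return False

   def should_send_queue_report(queue_latency_ns, qstate, trigger_ns, reset_ns):
       """qstate holds the remaining report quota for one queue."""
       if queue_latency_ns > trigger_ns and qstate["quota"] > 0:
           qstate["quota"] -= 1  # each report consumes one unit of quota
           return True
       if queue_latency_ns < reset_ns:
           qstate["quota"] = DEFAULT_QUEUE_QUOTA  # congestion over: restore quota
       return False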

Intel\ :sup:`TM` DeepInsight Integration
----------------------------------------

.. note::
   In this section, we assume that you already know how to deploy DeepInsight
   to your Kubernetes cluster with a valid license. For more information,
   please reach out to Intel's support.

To use DeepInsight with SD-Fabric, follow these steps:

* Modify the DeepInsight Helm Chart ``values.yml`` to include the following
  setting for ``tos_mask``:

  .. code-block:: yaml

    global:
      preprocessor:
        params:
          tos_mask: 0

* Deploy DeepInsight.
* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
  node where the ``preprocessor`` container is deployed. This is the address to
  use as the ``collectorIp`` in the ONOS netcfg. This address must be routable
  by the fabric, i.e., make sure you can ping it from any other host in the
  fabric. Similarly, from within the preprocessor container you should be able
  to ping the loopback IPv4 address of **all** fabric switches (``ipv4Loopback``
  in the ONOS netcfg). If ping doesn't work, check the server's RPF
  configuration; we recommend setting it to ``net.ipv4.conf.all.rp_filter=2``
  (see the example after this list).
* Generate a ``topology.json`` using the
  `SD-Fabric utility scripts
  <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
  (includes instructions) and upload it using the DeepInsight UI. Make sure to
  update and re-upload the ``topology.json`` frequently if you modify the
  network configuration in ONOS (e.g., if you add or remove switches, links, or
  static routes, or when new hosts are discovered).
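
One common way to make the recommended RPF setting persistent on the Kubernetes
node (assuming a standard Linux distribution that reads ``/etc/sysctl.d``) is a
drop-in file like the one below, applied with ``sysctl --system`` or a reboot:

.. code-block:: text

   # /etc/sysctl.d/99-sdfabric-rpf.conf
   net.ipv4.conf.all.rp_filter = 2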

Enabling Host-INT
-----------------

Support for INT on hosts is still experimental.

Please check the documentation at
https://github.com/opennetworkinglab/int-host-reporter to learn how to install
``int-host-reporter`` on your servers.

Drop Reasons
------------

We use the following reason codes when generating drop reports. Please use this
table as a reference when debugging drop reasons in DeepInsight or other INT
collectors.

.. list-table:: SD-Fabric INT Drop Reasons
   :widths: 15 25 60
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 0
     - UNKNOWN
     - Drop with unknown reason.
   * - 26
     - IP TTL ZERO
     - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
   * - 29
     - IP MISS
     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
       whether the host has been discovered.
   * - 55
     - INGRESS PORT VLAN MAPPING MISS
     - Ingress port VLAN table miss. Packets are being received with an
       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
   * - 71
     - TRAFFIC MANAGER
     - Packet dropped by the traffic manager due to congestion (tail drop) or
       because the port is down.
   * - 80
     - ACL DENY
     - Check the ACL table rules.
   * - 89
     - BRIDGING MISS
     - Missing bridging entry. Check the entries in the bridging table.
   * - 128 (WIP)
     - NEXT ID MISS
     - Missing next ID from ECMP (``hashed``) or multicast table.
   * - 129
     - MPLS MISS
     - MPLS table miss. Check the segment routing device config in the netcfg.
   * - 130
     - EGRESS NEXT MISS
     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
   * - 131
     - MPLS TTL ZERO
     - There might be a forwarding loop.
   * - 132
     - UPF DOWNLINK PDR MISS
     - Missing downlink PDR rule for the UE. Check UP4 flows.
   * - 133
     - UPF UPLINK PDR MISS
     - Missing uplink PDR rule. Check UP4 flows.
   * - 134
     - UPF FAR MISS
     - Missing FAR rule. Check UP4 flows.
   * - 150
     - UPF UPLINK RECIRC DENY
     - Missing rules for UE-to-UE communication.
Yi Tsengd16c4db2021-09-29 03:45:05 -0700298
Yi Tseng16ec50b2022-03-03 10:33:17 -0800299Known Issues and Limitations
300----------------------------
301
302 * Some INT collectors might not support dual-homed topology.
Yi Tsengd16c4db2021-09-29 03:45:05 -0700303