.. _int:

In-band Network Telemetry (INT)
===============================

Overview
--------

SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
telemetry.

When INT is enabled, all switches are instrumented to generate INT reports for
all traffic, reporting per-packet metadata such as the switch ID, ingress/egress
ports, latency, queue congestion status, etc. Report generation is handled
entirely in the data plane, in a way that does not affect the performance
observed by regular traffic.

.. image:: ../images/int-overview.png
   :width: 700px

We aim to achieve end-to-end visibility. For this reason, we provide an
implementation of INT for switches as well as hosts. For switches, INT report
generation is integrated into the same P4 pipeline responsible for bridging,
routing, UPF, etc. For hosts, we provide *experimental* support for an
eBPF-based application that can monitor packets as they are processed by the
kernel networking stack and Kubernetes CNI plug-ins. In the following, we use
the term INT nodes to refer to both switches and hosts.

SD-Fabric is responsible for producing and delivering INT report packets to an
external collector. The actual collection and analysis of reports is out of
scope, but we support integration with 3rd-party analytics platforms. SD-Fabric
is currently being validated for integration with Intel\ :sup:`TM` DeepInsight,
a commercial analytics platform. However, any collector compatible with the INT
standard can be used instead.

Supported Features
~~~~~~~~~~~~~~~~~~

* **Telemetry Report Format Specification v0.5**: report packets generated by
  nodes adhere to this version of the standard.
* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
  packet, the INT collector might receive up to N reports, where N is the
  number of INT nodes in the path.
* **Configurable watchlist**: specify which flows to monitor. It could be all
  traffic, entire subnets, or specific 5-tuples.
* **Flow reports**: for a given flow (5-tuple), each node produces reports
  periodically, allowing a collector to monitor the path and end-to-end
  latency, as well as detect anomalies such as path loops/changes.
* **Drop reports**: when a node drops a packet, it generates a report carrying
  the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
  congestion, and more).
* **Queue congestion reports**: when queue utilization goes above a configurable
  threshold, switches produce reports for all packets in the queue, making it
  possible to identify exactly which flow is causing congestion.
* **Smart filters and triggers**: generating INT reports for each packet seen by
  a node can lead to excessive network overhead and overloading at the
  collector. For this reason, nodes implement logic to limit the volume of
  reports generated in a way that doesn't cause anomalies to go undetected. For
  flow reports and drop reports, the pipeline generates 1 report/sec for each
  5-tuple, or more when detecting anomalies (e.g., changes in the ingress/egress
  port, queues, hop latency, etc.), as illustrated by the sketch after this
  list. For queue congestion reports, the number of reports that can be
  generated for each congestion event is limited to a configurable "quota".
* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
  switches can watch inside GTP-U tunnels, generating reports for the inner
  headers and making it possible to troubleshoot issues at the application
  level. In addition, when generating drop reports, we support UPF-specific drop
  reasons to identify whether drops are caused by the UPF tables (because of a
  misconfiguration somewhere in the control stack, or simply because the
  specific user device is not authorized).

INT Report Delivery
~~~~~~~~~~~~~~~~~~~

INT reports generated by nodes are delivered to the INT collector using the same
fabric links. In the example below, user traffic goes through three switches.
Each one generates an INT report packet (postcard), which is forwarded using the
same flow rules as regular traffic.

.. image:: ../images/int-reports.png
   :width: 700px

This choice has the advantage of simplifying deployment and control plane logic,
as it doesn't require setting up a separate network and handling installation of
flow rules specific to INT reports. However, the downside is that delivery of
INT reports can be subject to the same issues that we are trying to detect using
INT. For example, if a user packet is getting dropped because of missing routing
entries, the INT report generated for the drop event might also be dropped for
the same reason.

In future releases, we might add support for using the management network for
report delivery, but for now using the fabric network is the only supported
option.

ONOS Configuration
------------------

To enable INT, modify the ONOS netcfg in the following way:

* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
  or ``fabric-spgw-int``);
* in the ``apps`` section, add a config block for app ID
  ``org.stratumproject.fabric.tna.inbandtelemetry``, like in the example below:

.. code-block:: json

   {
     "apps": {
       "org.stratumproject.fabric.tna.inbandtelemetry": {
         "report": {
           "collectorIp": "10.32.11.2",
           "collectorPort": 32766,
           "minFlowHopLatencyChangeNs": 256,
           "watchSubnets": [
             "10.32.11.0/24"
           ],
           "queueReportLatencyThresholds": {
             "0": {"triggerNs": 2000, "resetNs": 500},
             "2": {"triggerNs": 1000, "resetNs": 400}
           }
         }
       }
     }
   }

Here's a reference of the fields that you can configure for the INT app:

* ``collectorIp``: The IP address of the INT collector. Must be an IP address
  routable by the fabric, either the IP address of a host directly connected to
  the fabric and discovered by ONOS, or reachable via an external router.
  *Required*

* ``collectorPort``: The UDP port used by the INT collector to listen for report
  packets. *Required*

* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
  trigger flow report generation. *Optional, default is 256.*

  Used by the smart filters to immediately report abnormal latency changes. In
  normal conditions, switches generate 1 report per second for each active
  5-tuple. During congestion, when packets experience higher latency, the
  switch will generate a report immediately if the latency difference between
  this packet and the previous one of the same 5-tuple is greater than
  ``minFlowHopLatencyChangeNs``.

  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or to small values
  (lower than the switch's normal jitter) will cause the switch to generate a
  lot of reports. The current implementation only supports powers of 2.

* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
  *Optional, default is an empty list.*

  All traffic with a source or destination IPv4 address included in one of
  these prefixes will be reported (both flow and drop reports). All other
  packets will be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U
  encapsulated traffic, the watchlist is always applied to the inner headers.
  Hence, to monitor UE traffic, you should provide the UE subnet.

  INT traffic is always excluded from the watchlist.

  The default value (empty list) implies that flow reports and drop reports
  are disabled.
163
164* ``queueReportLatencyThresholds``: A map specifying latency thresholds to
165 trigger queue reports or reset the queue report quota.
166 *Optional, default is an empty map.*
167
168 The key of this map is the queue ID. Switches will generate queue congestion
169 reports only for queue IDs in this map. Congestion detection for other queues
170 is disabled. The same thresholds are used for all devices, i.e., it's not
171 possible to configure different thresholds for the same queue on different
172 devices.
173
174 The value of this map is a tuple:
175
176 * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
177 reports. Once a packet experiences latency **above** this threshold, all
178 subsequent packets in the same queue will be reported, independently of the
179 watchlist, up to the quota or until latency drops below ``triggerNs``.
180 **Required**
181 * ``resetNs``: The latency threshold in nanosecond to reset the quota. When
182 packet latency goes below this threshold, the quota is reset to its original
183 non-zero value. **Optional, default is triggerNs/2**.
184
185 Currently the default quota is 1024 and cannot be configured.
186
187

Intel\ :sup:`TM` DeepInsight Integration
----------------------------------------

.. note::
   In this section, we assume that you already know how to deploy DeepInsight
   to your Kubernetes cluster with a valid license. For more information, please
   reach out to Intel's support.

To use DeepInsight with SD-Fabric, follow these steps:

* Modify the DeepInsight Helm chart ``values.yml`` to include the following
  setting for ``tos_mask``:

  .. code-block:: yaml

     global:
       preprocessor:
         params:
           tos_mask: 0

* Deploy DeepInsight.
* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
  node where the ``preprocessor`` container is deployed. This is the address to
  use as the ``collectorIp`` in the ONOS netcfg. This address must be routable
  by the fabric, i.e., make sure you can ping it from any other host in the
  fabric. Similarly, from within the preprocessor container you should be able
  to ping the loopback IPv4 address of **all** fabric switches (``ipv4Loopback``
  in the ONOS netcfg). If ping doesn't work, check the server's RPF
  configuration; we recommend setting it to ``net.ipv4.conf.all.rp_filter=2``.
216* Generate a ``topology.json`` using the
217 `SD-Fabric utility scripts
218 <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
219 (includes instructions) and upload it using the DeepInsight UI. Make sure to
220 update and re-upload the ``topology.json`` frequently if you modify the
221 network configuration in ONOS (e.g., if you add/remove switches or links,
222 static routes, new hosts are discovered, etc.).

Enabling Host-INT
-----------------

Support for INT on hosts is still experimental.

To install ``int-host-reporter`` on your servers, please check the
documentation at
`<https://github.com/opennetworkinglab/int-host-reporter>`_.

Drop Reasons
------------

We use the following reason codes when generating drop reports. Please use this
table as a reference when debugging drop reasons in DeepInsight or other INT
collectors.

.. list-table:: SD-Fabric INT Drop Reasons
   :widths: 15 25 60
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 0
     - UNKNOWN
     - Drop with unknown reason.
   * - 26
     - IP TTL ZERO
     - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
   * - 29
     - IP MISS
     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
       whether the host has been discovered.
   * - 55
     - INGRESS PORT VLAN MAPPING MISS
     - Ingress port VLAN table miss. Packets are being received with an
       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
   * - 71
     - TRAFFIC MANAGER
     - Packet dropped by the traffic manager due to congestion (tail drop) or
       because the port is down.
   * - 80
     - ACL DENY
     - Check the ACL table rules.
   * - 89
     - BRIDGING MISS
     - Missing bridging entry. Check the bridging table entries.
   * - 128 (WIP)
     - NEXT ID MISS
     - Missing next ID from ECMP (``hashed``) or multicast table.
   * - 129
     - MPLS MISS
     - MPLS table miss. Check the segment routing device config in the netcfg.
   * - 130
     - EGRESS NEXT MISS
     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
   * - 131
     - MPLS TTL ZERO
     - There might be a forwarding loop.
   * - 132
     - UPF DOWNLINK PDR MISS
     - Missing downlink PDR rule for the UE. Check UP4 flows.
   * - 133
     - UPF UPLINK PDR MISS
     - Missing uplink PDR rule. Check UP4 flows.
   * - 134
     - UPF FAR MISS
     - Missing FAR rule. Check UP4 flows.
   * - 150
     - UPF UPLINK RECIRC DENY
     - Missing rules for UE-to-UE communication.