In-band Network Telemetry (INT)
===============================

Overview
--------

SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
telemetry.

When INT is enabled, all switches are instrumented to generate INT reports for
all traffic, reporting per-packet metadata such as the switch ID, ingress/egress
ports, latency, queue congestion status, etc. Report generation is handled
entirely in the data plane, in a way that does not affect the performance
observed by regular traffic.

.. image:: ../images/int-overview.png
   :width: 700px

We aim to achieve end-to-end visibility. For this reason, we provide an
implementation of INT for switches as well as hosts. For switches, INT report
generation is integrated as part of the same P4 pipeline responsible for
bridging, routing, UPF, etc. For hosts, we provide *experimental* support for
an eBPF-based application that can monitor packets as they are processed by the
kernel networking stack and Kubernetes CNI plug-ins. In the following, we use
the term *INT nodes* to refer to both switches and hosts.

SD-Fabric is responsible for producing and delivering INT report packets to an
external collector. The actual collection and analysis of reports is out of
scope, but we support integration with third-party analytics platforms.
SD-Fabric is currently being validated for integration with Intel\ :sup:`TM`
DeepInsight, a commercial analytics platform. However, any collector compatible
with the INT standard can be used instead.

Supported Features
~~~~~~~~~~~~~~~~~~

* **Telemetry Report Format Specification v0.5**: report packets generated by
  nodes adhere to this version of the standard.
* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
  packet, the INT collector might receive up to N reports, where N is the
  number of INT nodes in the path.
* **Configurable watchlist**: specify which flows to monitor. This can be all
  traffic, entire subnets, or specific 5-tuples.
* **Flow reports**: for a given flow (5-tuple), each node produces reports
  periodically, allowing a collector to monitor the path and end-to-end
  latency, as well as detect anomalies such as path loops/changes.
* **Drop reports**: when a node drops a packet, it generates a report carrying
  the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
  congestion, and more).
* **Queue congestion reports**: when queue utilization goes above a
  configurable threshold, switches produce reports for all packets in the
  queue, making it possible to identify exactly which flow is causing
  congestion.
* **Smart filters and triggers**: generating INT reports for each packet seen
  by a node can lead to excessive network overhead and overloading of the
  collector. For this reason, nodes implement logic to limit the volume of
  reports generated in a way that doesn't cause anomalies to go undetected. For
  flow reports and drop reports, the pipeline generates 1 report/sec for each
  5-tuple, or more when detecting anomalies (e.g., changes in the
  ingress/egress port, queues, hop latency, etc.). For queue congestion
  reports, the number of reports that can be generated for each congestion
  event is limited to a configurable "quota".
* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
  switches can look inside GTP-U tunnels, generating reports for the inner
  headers and making it possible to troubleshoot issues at the application
  level. In addition, when generating drop reports, we support UPF-specific
  drop reasons to identify whether drops are caused by the UPF tables (because
  of a misconfiguration somewhere in the control stack, or simply because the
  specific user device is not authorized).
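
The smart-filter behavior for flow reports can be approximated in a few lines.
The following is a minimal Python sketch of the idea, not the actual P4
implementation; the class name, state layout, and the choice of which anomalies
to track (here, only port changes) are illustrative assumptions:

.. code-block:: python

   import time

   class FlowReportFilter:
       """Approximate the per-5-tuple smart filter: emit at most one report
       per second, unless an anomaly (e.g. a port change) is detected."""

       def __init__(self, report_interval_s=1.0):
           self.report_interval_s = report_interval_s
           self.state = {}  # 5-tuple -> (last_report_time, last_ports)

       def should_report(self, five_tuple, ingress_port, egress_port, now=None):
           now = time.monotonic() if now is None else now
           last = self.state.get(five_tuple)
           ports = (ingress_port, egress_port)
           if last is None:  # first packet of the flow: always report
               report = True
           else:
               last_time, last_ports = last
               report = (now - last_time >= self.report_interval_s  # periodic
                         or ports != last_ports)                    # path change
           if report:
               self.state[five_tuple] = (now, ports)
           return report

A collector thus sees at least one report per second per active flow, plus an
immediate report whenever the observed path changes.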

INT Report Delivery
~~~~~~~~~~~~~~~~~~~

INT reports generated by nodes are delivered to the INT collector over the same
fabric links. In the example below, user traffic goes through three switches.
Each one generates an INT report packet (postcard), which is forwarded using
the same flow rules as regular traffic.

.. image:: ../images/int-reports.png
   :width: 700px

This choice has the advantage of simplifying deployment and control plane
logic, as it doesn't require setting up a separate network or installing flow
rules specific to INT reports. However, the downside is that delivery of INT
reports can be subject to the same issues that we are trying to detect using
INT. For example, if a user packet is dropped because of missing routing
entries, the INT report generated for the drop event might also be dropped for
the same reason.

In future releases, we might add support for using the management network for
report delivery, but for now using the fabric network is the only supported
option.

ONOS Configuration
------------------

To enable INT, modify the ONOS netcfg in the following way:

* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
  or ``fabric-spgw-int``);
* in the ``apps`` section, add a config block for app ID
  ``org.stratumproject.fabric.tna.inbandtelemetry``, as in the example below:

.. code-block:: json

   {
     "apps": {
       "org.stratumproject.fabric.tna.inbandtelemetry": {
         "report": {
           "collectorIp": "10.32.11.2",
           "collectorPort": 32766,
           "minFlowHopLatencyChangeNs": 256,
           "watchSubnets": [
             "10.32.11.0/24"
           ],
           "queueReportLatencyThresholds": {
             "0": {"triggerNs": 2000, "resetNs": 500},
             "2": {"triggerNs": 1000, "resetNs": 400}
           }
         }
       }
     }
   }
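
A netcfg fragment like the one above can be pushed to a running ONOS instance
through its REST API (``POST /onos/v1/network/configuration``). The following
Python sketch assumes ONOS is reachable at ``localhost:8181`` with the default
``onos``/``rocks`` credentials; adjust host, port, and credentials for your
deployment:

.. code-block:: python

   import base64
   import json
   import urllib.request

   netcfg = {
       "apps": {
           "org.stratumproject.fabric.tna.inbandtelemetry": {
               "report": {
                   "collectorIp": "10.32.11.2",
                   "collectorPort": 32766,
                   "minFlowHopLatencyChangeNs": 256,
                   "watchSubnets": ["10.32.11.0/24"],
                   "queueReportLatencyThresholds": {
                       "0": {"triggerNs": 2000, "resetNs": 500},
                       "2": {"triggerNs": 1000, "resetNs": 400},
                   },
               }
           }
       }
   }

   def push_netcfg(cfg, host="localhost", port=8181,
                   user="onos", password="rocks"):
       """POST a netcfg fragment to the ONOS REST API."""
       auth = base64.b64encode(f"{user}:{password}".encode()).decode()
       req = urllib.request.Request(
           f"http://{host}:{port}/onos/v1/network/configuration",
           data=json.dumps(cfg).encode(),
           headers={"Content-Type": "application/json",
                    "Authorization": f"Basic {auth}"},
           method="POST",
       )
       return urllib.request.urlopen(req)

   # push_netcfg(netcfg)  # uncomment when an ONOS instance is reachable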

Here's a reference of the fields that you can configure for the INT app:

* ``collectorIp``: The IP address of the INT collector. Must be an IP address
  routable by the fabric, either the IP address of a host directly connected
  to the fabric and discovered by ONOS, or one reachable via an external
  router. *Required*

* ``collectorPort``: The UDP port on which the INT collector listens for
  report packets. *Required*

* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
  trigger flow report generation. *Optional, default is 256.*

  Used by the smart filters to immediately report abnormal latency changes. In
  normal conditions, switches generate 1 report per second for each active
  5-tuple. During congestion, when packets experience higher latency, the
  switch will generate a report immediately if the latency difference between
  this packet and the previous one of the same 5-tuple is greater than
  ``minFlowHopLatencyChangeNs``.

  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or small values
  (lower than the switch's normal jitter) will cause the switch to generate a
  lot of reports. The current implementation only supports powers of 2.

* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
  *Optional, default is an empty list.*

  All traffic with source or destination IPv4 address included in one of these
  prefixes will be reported (both flow and drop reports). All other packets
  will be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U
  encapsulated traffic, the watchlist is always applied to the inner headers.
  Hence, to monitor UE traffic, you should provide the UE subnet.

  INT traffic is always excluded from the watchlist.

  The default value (empty list) implies that flow reports and drop reports
  are disabled.

* ``queueReportLatencyThresholds``: A map specifying latency thresholds used
  to trigger queue reports or reset the queue report quota.
  *Optional, default is an empty map.*

  The key of this map is the queue ID. Switches will generate queue congestion
  reports only for queue IDs in this map. Congestion detection for other
  queues is disabled. The same thresholds are used for all devices, i.e., it's
  not possible to configure different thresholds for the same queue on
  different devices.

  The value of this map is a tuple:

  * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
    reports. Once a packet experiences latency **above** this threshold, all
    subsequent packets in the same queue will be reported, independently of
    the watchlist, up to the quota or until latency drops below ``triggerNs``.
    **Required**
  * ``resetNs``: The latency threshold in nanoseconds to reset the quota. When
    packet latency goes below this threshold, the quota is reset to its
    original non-zero value. **Optional, default is** ``triggerNs/2``.

  Currently the default quota is 1024 and cannot be configured.
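
The semantics of ``minFlowHopLatencyChangeNs`` and
``queueReportLatencyThresholds`` can be illustrated with a small Python sketch.
This is only a model of the behavior described above, not the switch pipeline
logic; in particular, the assumption that the power-of-2 constraint is
implemented by masking low-order latency bits is ours:

.. code-block:: python

   def latency_changed(prev_ns, cur_ns, min_change_ns=256):
       """True if hop latency moved by at least min_change_ns (a power of 2),
       modeled by comparing latencies with the low-order bits masked off."""
       assert min_change_ns & (min_change_ns - 1) == 0, "must be a power of 2"
       mask = ~(min_change_ns - 1)
       return (prev_ns & mask) != (cur_ns & mask)

   class QueueReportFilter:
       """Per-queue congestion reporting with a trigger threshold, a reset
       threshold, and a report quota (default 1024, as described above)."""

       def __init__(self, trigger_ns, reset_ns=None, quota=1024):
           self.trigger_ns = trigger_ns
           self.reset_ns = trigger_ns // 2 if reset_ns is None else reset_ns
           self.initial_quota = quota
           self.quota = quota

       def should_report(self, latency_ns):
           if latency_ns <= self.reset_ns:
               self.quota = self.initial_quota  # congestion over: restore quota
               return False
           if latency_ns > self.trigger_ns and self.quota > 0:
               self.quota -= 1                  # consume quota for this report
               return True
           return False

For example, with ``{"triggerNs": 2000, "resetNs": 500}``, packets are reported
while queue latency stays above 2000 ns and quota remains, and the quota is
restored once latency falls back below 500 ns.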


Intel\ :sup:`TM` DeepInsight Integration
----------------------------------------

.. note::
   In this section, we assume that you already know how to deploy DeepInsight
   to your Kubernetes cluster with a valid license. For more information,
   please reach out to Intel's support.

To use DeepInsight with SD-Fabric, follow these steps:

* Modify the DeepInsight Helm chart ``values.yml`` to include the following
  setting for ``tos_mask``:

  .. code-block:: yaml

     global:
       preprocessor:
         params:
           tos_mask: 0

* Deploy DeepInsight.
* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
  node where the ``preprocessor`` container is deployed. This is the address
  to use as the ``collectorIp`` in the ONOS netcfg. This address must be
  routable by the fabric, i.e., make sure you can ping it from any other host
  in the fabric. Similarly, from within the preprocessor container you should
  be able to ping the loopback IPv4 address of **all** fabric switches
  (``ipv4Loopback`` in the ONOS netcfg). If ping doesn't work, check the
  server's RPF configuration; we recommend setting it to
  ``net.ipv4.conf.all.rp_filter=2``.
* Generate a ``topology.json`` using the
  `SD-Fabric utility scripts
  <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
  (includes instructions) and upload it using the DeepInsight UI. Make sure to
  update and re-upload the ``topology.json`` whenever you modify the network
  configuration in ONOS (e.g., when switches, links, or static routes are
  added/removed, or new hosts are discovered).
Yi Tsengd16c4db2021-09-29 03:45:05 -0700221
Carmelo Casconedeffadd2021-10-05 18:28:58 -0700222Enabling Host-INT
223-----------------
Yi Tsengd16c4db2021-09-29 03:45:05 -0700224
Carmelo Casconedeffadd2021-10-05 18:28:58 -0700225Support for INT on hosts is still experimental.
Yi Tsengd16c4db2021-09-29 03:45:05 -0700226
Carmelo Casconedeffadd2021-10-05 18:28:58 -0700227Please check the documentation here to install ``int-host-reporter`` on your
228servers:
229``https://github.com/opennetworkinglab/int-host-reporter``
Yi Tsengd16c4db2021-09-29 03:45:05 -0700230
Drop Reasons
------------

We use the following reason codes when generating drop reports. Please use
this table as a reference when debugging drop reasons in DeepInsight or
another INT collector.

.. list-table:: SD-Fabric INT Drop Reasons
   :widths: 15 25 60
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 0
     - UNKNOWN
     - Drop with unknown reason.
   * - 26
     - IP TTL ZERO
     - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
   * - 29
     - IP MISS
     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
       whether the host has been discovered.
   * - 55
     - INGRESS PORT VLAN MAPPING MISS
     - Ingress port VLAN table miss. Packets are being received with an
       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
   * - 71
     - TRAFFIC MANAGER
     - Packet dropped by the traffic manager due to congestion (tail drop) or
       because the port is down.
   * - 80
     - ACL DENY
     - Packet denied by an ACL rule. Check the ACL table rules.
   * - 89
     - BRIDGING MISS
     - Missing bridging entry. Check the entries in the bridging table.
   * - 128 (WIP)
     - NEXT ID MISS
     - Missing next ID from ECMP (``hashed``) or multicast table.
   * - 129
     - MPLS MISS
     - MPLS table miss. Check the segment routing device config in the netcfg.
   * - 130
     - EGRESS NEXT MISS
     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
   * - 131
     - MPLS TTL ZERO
     - There might be a forwarding loop.
   * - 132
     - UPF DOWNLINK PDR MISS
     - Missing downlink PDR rule for the UE. Check UP4 flows.
   * - 133
     - UPF UPLINK PDR MISS
     - Missing uplink PDR rule. Check UP4 flows.
   * - 134
     - UPF FAR MISS
     - Missing FAR rule. Check UP4 flows.
   * - 150
     - UPF UPLINK RECIRC DENY
     - Missing rules for UE-to-UE communication.
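
When inspecting raw drop reports outside of DeepInsight (e.g., with a custom
collector or a packet capture), a lookup table of the codes above can be
handy. A minimal Python sketch, transcribing the table:

.. code-block:: python

   # Drop reason codes used in SD-Fabric INT drop reports (see table above).
   DROP_REASONS = {
       0: "UNKNOWN",
       26: "IP TTL ZERO",
       29: "IP MISS",
       55: "INGRESS PORT VLAN MAPPING MISS",
       71: "TRAFFIC MANAGER",
       80: "ACL DENY",
       89: "BRIDGING MISS",
       128: "NEXT ID MISS",
       129: "MPLS MISS",
       130: "EGRESS NEXT MISS",
       131: "MPLS TTL ZERO",
       132: "UPF DOWNLINK PDR MISS",
       133: "UPF UPLINK PDR MISS",
       134: "UPF FAR MISS",
       150: "UPF UPLINK RECIRC DENY",
   }

   def drop_reason_name(code):
       """Map a numeric drop reason code to its name."""
       return DROP_REASONS.get(code, f"UNRECOGNIZED ({code})")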