.. SPDX-FileCopyrightText: 2021 Open Networking Foundation <info@opennetworking.org>
.. SPDX-License-Identifier: Apache-2.0

.. _int:

In-band Network Telemetry (INT)
===============================

Overview
--------

SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
telemetry.

When INT is enabled, all switches are instrumented to generate INT reports for
all traffic, reporting per-packet metadata such as the switch ID,
ingress/egress ports, latency, queue congestion status, etc. Report generation
is handled entirely in the data plane, in a way that does not affect the
performance observed by regular traffic.

.. image:: ../images/int-overview.png
   :width: 700px

We aim to achieve end-to-end visibility. For this reason, we provide an
implementation of INT for switches as well as hosts. For switches, INT report
generation is integrated as part of the same P4 pipeline responsible for
bridging, routing, UPF, etc. For hosts, we provide *experimental* support for
an eBPF-based application that can monitor packets as they are processed by
the kernel networking stack and Kubernetes CNI plug-ins. In the following, we
use the term *INT nodes* to refer to both switches and hosts.

SD-Fabric is responsible for producing and delivering INT report packets to an
external collector. The actual collection and analysis of reports is out of
scope, but we support integration with third-party analytics platforms.
SD-Fabric is currently being validated for integration with Intel\ :sup:`TM`
DeepInsight, a commercial analytics platform. However, any collector
compatible with the INT standard can be used instead.

Supported Features
~~~~~~~~~~~~~~~~~~

* **Telemetry Report Format Specification v0.5**: report packets generated by
  nodes adhere to this version of the standard.
* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
  packet, the INT collector might receive up to N reports, where N is the
  number of INT nodes in the path.
* **Configurable watchlist**: specify which flows to monitor. It could be all
  traffic, entire subnets, or specific 5-tuples.
* **Flow reports**: for a given flow (5-tuple), each node produces reports
  periodically, allowing a collector to monitor the path and end-to-end
  latency, as well as detect anomalies such as path loops/changes.
* **Drop reports**: when a node drops a packet, it generates a report carrying
  the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
  congestion, and more).
* **Queue congestion reports**: when queue utilization goes above a
  configurable threshold, switches produce reports for all packets in the
  queue, making it possible to identify exactly which flow is causing
  congestion.
* **Smart filters and triggers**: generating INT reports for each packet seen
  by a node can lead to excessive network overhead and overloading at the
  collector. For this reason, nodes implement logic to limit the volume of
  reports generated in a way that doesn't cause anomalies to go undetected.
  For flow reports and drop reports, the pipeline generates 1 report/sec for
  each 5-tuple, or more when detecting anomalies (e.g., changes in the
  ingress/egress port, queues, hop latency, etc.). For queue congestion
  reports, the number of reports that can be generated for each congestion
  event is limited to a configurable "quota".
* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
  switches can look inside GTP-U tunnels, generating reports for the inner
  headers and making it possible to troubleshoot issues at the application
  level. In addition, when generating drop reports, we support UPF-specific
  drop reasons to identify whether drops are caused by the UPF tables (because
  of a misconfiguration somewhere in the control stack, or simply because the
  specific user device is not authorized).

INT Report Delivery
~~~~~~~~~~~~~~~~~~~

INT reports generated by nodes are delivered to the INT collector using the
same fabric links. In the example below, user traffic goes through three
switches. Each one generates an INT report packet (postcard), which is
forwarded using the same flow rules as regular traffic.

.. image:: ../images/int-reports.png
   :width: 700px

This choice has the advantage of simplifying deployment and control plane
logic, as it doesn't require setting up a separate network and handling the
installation of flow rules specific to INT reports. However, the downside is
that the delivery of INT reports can be subject to the same issues that we are
trying to detect using INT. For example, if a user packet is dropped because
of missing routing entries, the INT report generated for the drop event might
also be dropped for the same reason.

In future releases, we might add support for using the management network for
report delivery, but for now, using the fabric network is the only supported
option.

ONOS Configuration
------------------

To enable INT, modify the ONOS netcfg in the following way:

* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
  or ``fabric-upf-int``); a sketch of such an entry is shown after the example
  below;
* in the ``apps`` section, add a config block for app ID
  ``org.stratumproject.fabric.tna.inbandtelemetry``, as in the example below:

.. code-block:: json

   {
     "apps": {
       "org.stratumproject.fabric.tna.inbandtelemetry": {
         "report": {
           "collectorIp": "10.32.11.2",
           "collectorPort": 32766,
           "minFlowHopLatencyChangeNs": 256,
           "watchSubnets": [
             "10.32.11.0/24"
           ],
           "queueReportLatencyThresholds": {
             "0": {"triggerNs": 2000, "resetNs": 500},
             "2": {"triggerNs": 1000, "resetNs": 400}
           }
         }
       }
     }
   }
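
For reference, here is a minimal sketch of an INT-enabled entry in the
``devices`` section. The device ID, management address, driver, and full
pipeconf ID below are placeholders (full pipeconf IDs embed the target and
SDE version, which depend on your deployment):

.. code-block:: json

   {
     "devices": {
       "device:leaf1": {
         "basic": {
           "managementAddress": "grpc://10.0.0.1:9559?device_id=1",
           "driver": "stratum-tofino",
           "pipeconf": "org.stratumproject.fabric-int.montara_sde_9_7_0"
         }
       }
     }
   }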

Here is a reference for the fields you can configure for the INT app:

* ``collectorIp``: The IP address of the INT collector. Must be an IP address
  routable by the fabric: either the IP address of a host directly connected
  to the fabric and discovered by ONOS, or one reachable via an external
  router. *Required*

* ``collectorPort``: The UDP port on which the INT collector listens for
  report packets. *Required*

* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
  trigger flow report generation. *Optional, default is 256.*

  Used by the smart filters to immediately report abnormal latency changes. In
  normal conditions, switches generate 1 report per second for each active
  5-tuple. During congestion, when packets experience higher latency, the
  switch will generate a report immediately if the latency difference between
  this packet and the previous one of the same 5-tuple is greater than
  ``minFlowHopLatencyChangeNs``. For example, with the default of 256 ns, a
  packet whose hop latency differs by 300 ns from the previous packet of the
  same 5-tuple is reported immediately.

  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or to small
  values (lower than the switch's normal jitter) will cause the switch to
  generate an excessive number of reports. The current implementation only
  supports powers of 2.

* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
  *Optional, default is an empty list.*

  All traffic with a source or destination IPv4 address included in one of
  these prefixes will be reported (both flow and drop reports). All other
  packets will be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U
  encapsulated traffic, the watchlist is always applied to the inner headers.
  Hence, to monitor UE traffic, you should provide the UE subnet, as in the
  sketch below.

  INT traffic is always excluded from the watchlist.

  The default value (empty list) implies that flow reports and drop reports
  are disabled.
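
  For instance, to monitor UE traffic behind GTP-U encapsulation, the
  ``report`` block shown earlier would list the UE subnet; the prefix below
  is a hypothetical example:

  .. code-block:: json

     "watchSubnets": [
         "10.250.0.0/16"
     ]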

* ``queueReportLatencyThresholds``: A map specifying latency thresholds used
  to trigger queue reports or reset the queue report quota.
  *Optional, default is an empty map.*

  The key of this map is the queue ID. Switches will generate queue congestion
  reports only for queue IDs in this map. Congestion detection for other
  queues is disabled. The same thresholds are used for all devices, i.e., it's
  not possible to configure different thresholds for the same queue on
  different devices.

  The value of this map is a tuple:

  * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
    reports. Once a packet experiences latency **above** this threshold, all
    subsequent packets in the same queue will be reported, independently of
    the watchlist, up to the quota or until latency drops below ``triggerNs``.
    **Required**
  * ``resetNs``: The latency threshold in nanoseconds to reset the quota. When
    packet latency goes below this threshold, the quota is reset to its
    original non-zero value. **Optional, default is triggerNs/2.**

  Currently, the default quota is 1024 and cannot be configured.
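
  As a sketch, the following fragment enables congestion detection only on
  queue 5, leaving ``resetNs`` to its default (``triggerNs/2``, i.e., 5000 ns
  here); the queue ID and values are purely illustrative:

  .. code-block:: json

     "queueReportLatencyThresholds": {
         "5": {"triggerNs": 10000}
     }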


Intel\ :sup:`TM` DeepInsight Integration
----------------------------------------

.. note::
   In this section, we assume that you already know how to deploy DeepInsight
   to your Kubernetes cluster with a valid license. For more information,
   please reach out to Intel support.

To use DeepInsight with SD-Fabric, follow these steps:

* Modify the DeepInsight Helm chart ``values.yml`` to include the following
  setting for ``tos_mask``:

  .. code-block:: yaml

     global:
       preprocessor:
         params:
           tos_mask: 0

* Deploy DeepInsight.
* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
  node where the ``preprocessor`` container is deployed. This is the address
  to use as the ``collectorIp`` in the ONOS netcfg. This address must be
  routable by the fabric, i.e., make sure you can ping it from any other host
  in the fabric. Similarly, from within the preprocessor container you should
  be able to ping the loopback IPv4 address of **all** fabric switches
  (``ipv4Loopback`` in the ONOS netcfg; see the netcfg sketch after this
  list). If ping doesn't work, check the server's RPF configuration; we
  recommend setting it to ``net.ipv4.conf.all.rp_filter=2``.
* Generate a ``topology.json`` using the
  `SD-Fabric utility scripts
  <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
  (includes instructions) and upload it using the DeepInsight UI. Make sure to
  update and re-upload the ``topology.json`` whenever you modify the network
  configuration in ONOS (e.g., when switches, links, or static routes are
  added or removed, or new hosts are discovered).
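
For reference, the ``ipv4Loopback`` address mentioned above is defined in each
device's ``segmentrouting`` block in the ONOS netcfg. A minimal sketch is
shown below; the device ID and all values are placeholders:

.. code-block:: json

   {
     "devices": {
       "device:leaf1": {
         "segmentrouting": {
           "name": "leaf1",
           "ipv4NodeSid": 101,
           "ipv4Loopback": "192.168.0.101",
           "routerMac": "00:00:00:00:01:01",
           "isEdgeRouter": true
         }
       }
     }
   }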

Enabling Host-INT
-----------------

Support for INT on hosts is still experimental.

To install ``int-host-reporter`` on your servers, please check the
documentation at https://github.com/opennetworkinglab/int-host-reporter.

Drop Reasons
------------

We use the following reason codes when generating drop reports. Please use
this table as a reference when debugging drop reasons in DeepInsight or any
other INT collector.

.. list-table:: SD-Fabric INT Drop Reasons
   :widths: 15 25 60
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 0
     - UNKNOWN
     - Drop with unknown reason.
   * - 26
     - IP TTL ZERO
     - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
   * - 29
     - IP MISS
     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
       whether the host has been discovered.
   * - 55
     - INGRESS PORT VLAN MAPPING MISS
     - Ingress port VLAN table miss. Packets are being received with an
       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
   * - 71
     - TRAFFIC MANAGER
     - Packet dropped by the traffic manager due to congestion (tail drop) or
       because the port is down.
   * - 80
     - ACL DENY
     - Check the ACL table rules.
   * - 89
     - BRIDGING MISS
     - Missing bridging entry. Check the entries in the bridging table.
   * - 128 (WIP)
     - NEXT ID MISS
     - Missing next ID from ECMP (``hashed``) or multicast table.
   * - 129
     - MPLS MISS
     - MPLS table miss. Check the segment routing device config in the netcfg.
   * - 130
     - EGRESS NEXT MISS
     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
   * - 131
     - MPLS TTL ZERO
     - There might be a forwarding loop.
   * - 132
     - UPF DOWNLINK PDR MISS
     - Missing downlink PDR rule for the UE. Check the UP4 flows.
   * - 133
     - UPF UPLINK PDR MISS
     - Missing uplink PDR rule. Check the UP4 flows.
   * - 134
     - UPF FAR MISS
     - Missing FAR rule. Check the UP4 flows.
   * - 150
     - UPF UPLINK RECIRC DENY
     - Missing rules for UE-to-UE communication.