.. _int:

In-band Network Telemetry (INT)
===============================

Overview
--------

SD-Fabric supports the In-band Network Telemetry (INT) standard for data plane
telemetry.

When INT is enabled, all switches are instrumented to generate INT reports for
all traffic, reporting per-packet metadata such as the switch ID, ingress/egress
ports, latency, queue congestion status, etc. Report generation is handled
entirely in the data plane, in a way that does not affect the performance
observed by regular traffic.

.. image:: ../images/int-overview.png
   :width: 700px

We aim to achieve end-to-end visibility. For this reason, we provide an
implementation of INT for switches as well as hosts. For switches, INT report
generation is integrated into the same P4 pipeline responsible for bridging,
routing, UPF, etc. For hosts, we provide *experimental* support for an
eBPF-based application that can monitor packets as they are processed by the
kernel networking stack and Kubernetes CNI plug-ins. In the following, we use
the term INT nodes to refer to both switches and hosts.

SD-Fabric is responsible for producing and delivering INT report packets to an
external collector. The actual collection and analysis of reports is out of
scope, but we support integration with 3rd-party analytics platforms. SD-Fabric
is currently being validated for integration with Intel\ :sup:`TM` DeepInsight,
a commercial analytics platform. However, any collector compatible with the INT
standard can be used instead.

Supported Features
~~~~~~~~~~~~~~~~~~

* **Telemetry Report Format Specification v0.5**: report packets generated by
  nodes adhere to this version of the standard.
* **INT-XD mode (eXport Data)**: all nodes generate "postcards". For a given
  packet, the INT collector might receive up to N reports, where N is the
  number of INT nodes in the path.
* **Configurable watchlist**: specify which flows to monitor. It could be all
  traffic, entire subnets, or specific 5-tuples.
* **Flow reports**: for a given flow (5-tuple), each node produces reports
  periodically, allowing a collector to monitor the path and end-to-end
  latency, as well as detect anomalies such as path loops/changes.
* **Drop reports**: when a node drops a packet, it generates a report carrying
  the switch ID and the drop reason (e.g., routing table miss, TTL zero, queue
  congestion, and more).
* **Queue congestion reports**: when queue utilization goes above a configurable
  threshold, switches produce reports for all packets in the queue, making it
  possible to identify exactly which flow is causing congestion.
* **Smart filters and triggers**: generating INT reports for each packet seen by
  a node can lead to excessive network overhead and overloading at the
  collector. For this reason, nodes implement logic to limit the volume of
  reports generated in a way that doesn't cause anomalies to go undetected. For
  flow reports and drop reports, the pipeline generates 1 report/sec for each
  5-tuple, or more when detecting anomalies (e.g., changes in the ingress/egress
  port, queues, hop latency, etc.), as illustrated by the sketch after this
  list. For queue congestion reports, the number of reports that can be
  generated for each congestion event is limited to a configurable "quota".
* **Integration with P4-UPF**: when processing GTP-U encapsulated packets,
  switches can watch inside GTP-U tunnels, generating reports for the inner
  headers and making it possible to troubleshoot issues at the application
  level. In addition, when generating drop reports, we support UPF-specific drop
  reasons to identify whether drops are caused by the UPF tables (because of a
  misconfiguration somewhere in the control stack, or simply because the
  specific user device is not authorized).

INT Report Delivery
~~~~~~~~~~~~~~~~~~~

INT reports generated by nodes are delivered to the INT collector using the same
fabric links. In the example below, user traffic goes through three switches.
Each one generates an INT report packet (postcard), which is forwarded using the
same flow rules as regular traffic.

.. image:: ../images/int-reports.png
   :width: 700px

This choice has the advantage of simplifying deployment and control plane logic,
as it doesn't require setting up a separate network and handling installation of
flow rules specific to INT reports. However, the downside is that delivery of
INT reports can be subject to the same issues that we are trying to detect using
INT. For example, if a user packet is getting dropped because of missing routing
entries, the INT report generated for the drop event might also be dropped for
the same reason.

In future releases, we might add support for using the management network for
report delivery, but for now using the fabric network is the only supported
option.

ONOS Configuration
------------------

To enable INT, modify the ONOS netcfg in the following way:

* in the ``devices`` section, use an INT-enabled pipeconf ID (``fabric-int``
  or ``fabric-spgw-int``);
* in the ``apps`` section, add a config block for app ID
  ``org.stratumproject.fabric.tna.inbandtelemetry``, like in the example below:

.. code-block:: json

   {
     "apps": {
       "org.stratumproject.fabric.tna.inbandtelemetry": {
         "report": {
           "collectorIp": "10.32.11.2",
           "collectorPort": 32766,
           "minFlowHopLatencyChangeNs": 256,
           "watchSubnets": [
             "10.32.11.0/24"
           ],
           "queueReportLatencyThresholds": {
             "0": {"triggerNs": 2000, "resetNs": 500},
             "2": {"triggerNs": 1000, "resetNs": 400}
           }
         }
       }
     }
   }

Here's a reference of the fields that you can configure for the INT app:

* ``collectorIp``: The IP address of the INT collector. Must be an IP address
  routable by the fabric, either the IP address of a host directly connected to
  the fabric and discovered by ONOS, or reachable via an external router.
  *Required*

* ``collectorPort``: The UDP port used by the INT collector to listen for report
  packets. *Required*

* ``minFlowHopLatencyChangeNs``: Minimum latency difference in nanoseconds to
  trigger flow report generation. *Optional, default is 256.*

  Used by the smart filters to immediately report abnormal latency changes. In
  normal conditions, switches generate 1 report per second for each active
  5-tuple. During congestion, when packets experience higher latency, the
  switch will generate a report immediately if the latency difference between
  this packet and the previous one of the same 5-tuple is greater than
  ``minFlowHopLatencyChangeNs``.

  **Warning:** Setting ``minFlowHopLatencyChangeNs`` to ``0`` or to small values
  (lower than the switch's normal jitter) will cause the switch to generate a
  lot of reports. The current implementation only supports powers of 2.

* ``watchSubnets``: List of IPv4 prefixes to add to the watchlist.
  *Optional, default is an empty list.*

  All traffic with a source or destination IPv4 address included in one of
  these prefixes will be reported (both flow and drop reports). All other
  packets will be ignored. To watch all traffic, use ``0.0.0.0/0``. For GTP-U
  encapsulated traffic, the watchlist is always applied to the inner headers.
  Hence, to monitor UE traffic, you should provide the UE subnet.

  INT traffic is always excluded from the watchlist.

  The default value (empty list) implies that flow reports and drop reports
  are disabled.
163
164* ``queueReportLatencyThresholds``: A map specifying latency thresholds to
165 trigger queue reports or reset the queue report quota.
166 *Optional, default is an empty map.*
167
168 The key of this map is the queue ID. Switches will generate queue congestion
169 reports only for queue IDs in this map. Congestion detection for other queues
170 is disabled. The same thresholds are used for all devices, i.e., it's not
171 possible to configure different thresholds for the same queue on different
172 devices.
173
174 The value of this map is a tuple:
175
176 * ``triggerNs``: The latency threshold in nanoseconds to trigger queue
177 reports. Once a packet experiences latency **above** this threshold, all
178 subsequent packets in the same queue will be reported, independently of the
179 watchlist, up to the quota or until latency drops below ``triggerNs``.
180 **Required**
181 * ``resetNs``: The latency threshold in nanosecond to reset the quota. When
182 packet latency goes below this threshold, the quota is reset to its original
183 non-zero value. **Optional, default is triggerNs/2**.
184
185 Currently the default quota is 1024 and cannot be configured.
186
187

Intel\ :sup:`TM` DeepInsight Integration
----------------------------------------

.. note::
   In this section, we assume that you already know how to deploy DeepInsight
   to your Kubernetes cluster with a valid license. For more information, please
   reach out to Intel's support.

To use DeepInsight with SD-Fabric, follow these steps:

* Modify the DeepInsight Helm chart ``values.yml`` to include the following
  setting for ``tos_mask``:

  .. code-block:: yaml

     global:
       preprocessor:
         params:
           tos_mask: 0

* Deploy DeepInsight.
* Obtain the IPv4 address of the fabric-facing NIC interface of the Kubernetes
  node where the ``preprocessor`` container is deployed. This is the address to
  use as the ``collectorIp`` in the ONOS netcfg. This address must be routable
  by the fabric, i.e., make sure you can ping it from any other host in the
  fabric. Similarly, from within the preprocessor container you should be able
  to ping the loopback IPv4 address of **all** fabric switches (``ipv4Loopback``
  in the ONOS netcfg). If ping doesn't work, check the server's RPF
  configuration; we recommend setting it to ``net.ipv4.conf.all.rp_filter=2``.
216* Generate a ``topology.json`` using the
217 `SD-Fabric utility scripts
218 <https://github.com/opennetworkinglab/sdfabric-utils/tree/main/deep-insight>`_
219 (includes instructions) and upload it using the DeepInsight UI. Make sure to
220 update and re-upload the ``topology.json`` frequently if you modify the
221 network configuration in ONOS (e.g., if you add/remove switches or links,
222 static routes, new hosts are discovered, etc.).

Enabling Host-INT
-----------------

Support for INT on hosts is still experimental.

To install ``int-host-reporter`` on your servers, please check the
documentation at
`<https://github.com/opennetworkinglab/int-host-reporter>`_.

Drop Reasons
------------

We use the following reason codes when generating drop reports. Please use this
table as a reference when debugging drop reasons in DeepInsight or other INT
collectors.

.. list-table:: SD-Fabric INT Drop Reasons
   :widths: 15 25 60
   :header-rows: 1

   * - Code
     - Name
     - Description
   * - 0
     - UNKNOWN
     - Drop with unknown reason.
   * - 26
     - IP TTL ZERO
     - IPv4 or IPv6 TTL zero. There might be a forwarding loop.
   * - 29
     - IP MISS
     - IPv4 or IPv6 routing table miss. Check for missing routes in ONOS or
       whether the host has been discovered.
   * - 55
     - INGRESS PORT VLAN MAPPING MISS
     - Ingress port VLAN table miss. Packets are being received with an
       unexpected VLAN. Check the ``interfaces`` section of the netcfg.
   * - 71
     - TRAFFIC MANAGER
     - Packet dropped by the traffic manager due to congestion (tail drop) or
       because the port is down.
   * - 80
     - ACL DENY
     - Check the ACL table rules.
   * - 89
     - BRIDGING MISS
     - Missing bridging entry. Check the bridging table entries.
   * - 128 (WIP)
     - NEXT ID MISS
     - Missing next ID from ECMP (``hashed``) or multicast table.
   * - 129
     - MPLS MISS
     - MPLS table miss. Check the segment routing device config in the netcfg.
   * - 130
     - EGRESS NEXT MISS
     - Egress VLAN table miss. Check the ``interfaces`` section of the netcfg.
   * - 131
     - MPLS TTL ZERO
     - There might be a forwarding loop.
   * - 132
     - UPF DOWNLINK PDR MISS
     - Missing downlink PDR rule for the UE. Check UP4 flows.
   * - 133
     - UPF UPLINK PDR MISS
     - Missing uplink PDR rule. Check UP4 flows.
   * - 134
     - UPF FAR MISS
     - Missing FAR rule. Check UP4 flows.
   * - 150
     - UPF UPLINK RECIRC DENY
     - Missing rules for UE-to-UE communication.