.. SPDX-FileCopyrightText: 2021 Open Networking Foundation <info@opennetworking.org>
.. SPDX-License-Identifier: Apache-2.0

.. _architecture_design:

Architecture and Design
=======================

Architecture
------------

Classic SDN
^^^^^^^^^^^
SD-Fabric operates as a hybrid L2/L3 fabric. As a pure (or classic) SDN solution, SD-Fabric does
not use any of the traditional control protocols typically found in networking, a non-exhaustive
list of which includes: STP, MSTP, RSTP, LACP, MLAG, PIM, IGMP, OSPF, IS-IS, TRILL, RSVP, LDP,
and BGP. Instead, SD-Fabric uses an SDN controller (ONOS) decoupled from the data plane
hardware to directly program ASIC forwarding tables in a pipeline defined by a P4 program. In
this design, a set of applications running on ONOS programs all the fabric functionality and
features, such as Ethernet switching, IP routing, mobile core user plane, multicast, DHCP relay,
and more.

Topologies
^^^^^^^^^^
SD-Fabric supports a number of different topological variants. In its simplest instantiation, one
could use a single leaf or a leaf pair to connect servers, external routers, and other equipment
like access nodes or physical appliances (PNFs). Such a deployment can also be scaled
horizontally into a leaf-and-spine fabric (a 2-level folded Clos) by adding 2 or 4 spines and up to
10 leaves in single or paired configurations. Further scale can be achieved by distributing the
fabric itself across geographical regions, with spine switches in a primary central location
connected to other spines in multiple secondary (remote) locations using WDM links. Such 4-level
topologies (leaf-spine-spine-leaf) can be used for backhaul in operator networks, where
the secondary locations are deeper in the network and closer to the end user. In these
configurations, the spines in the secondary locations serve as aggregation devices that backhaul
traffic from the access nodes to the primary location, which typically has the facilities for compute
and storage for NFV applications.
See :ref:`Topology` for details.

Redundancy
^^^^^^^^^^
SD-Fabric supports redundancy at every level. A leaf-spine fabric is redundant by design in the
spine layer, with the use of ECMP hashing and multiple spines. In addition, SD-Fabric supports
leaf pairs, where servers and external routers can be dual-homed to two ToRs in an active-active
configuration. In the control plane, some SDN solutions use single-instance controllers, which are
single points of failure. Others use two controllers in active-backup mode, which is redundant
but may lack scale, as all the work is still being done by one instance at any time and scale can
never exceed the capacity of one server. In contrast, SD-Fabric is based on ONOS, an SDN
controller that offers N-way redundancy and scale. In an ONOS cluster with 3 or 5 instances, all
nodes are active and doing work simultaneously, and failure handling is fully automated and
completely handled by the ONOS platform.

.. image:: images/arch-redundancy.png
   :width: 350px

MPLS Segment Routing (SR)
^^^^^^^^^^^^^^^^^^^^^^^^^
While SR is not an externally supported feature, the SD-Fabric architecture internally uses concepts
like globally significant MPLS labels that are assigned to each leaf and spine switch. The leaf
switches push an MPLS label designating the destination ToR (leaf) onto the IPv4 or IPv6 traffic
before hashing the flows to the spines. In turn, the spines forward the traffic solely on the basis
of the MPLS labels. This design concept, popular in IP/MPLS WAN networks, has significant
advantages. Since the spines only maintain label state, it leads to significantly less programming
burden and better scale. For example, in one use case the leaf switches may each hold 100K+
IPv4/v6 routes, while the spine switches need to be programmed with only 10s of labels! As a
result, completely different ASICs can be used for the leaf and spine switches; the leaves can
have bigger routing tables and deeper buffers while sacrificing switching capacity, while the
spines can have smaller tables with high switching capacity.

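The scale asymmetry described above can be sketched in a few lines of Python. All names, labels, and table contents here are illustrative, not SD-Fabric's actual data structures: the point is that each leaf resolves a potentially large IP route table to a destination-ToR label, while a spine needs only one entry per leaf label.

```python
import ipaddress

# Each leaf switch is assigned one globally significant MPLS label.
LEAF_LABELS = {"leaf1": 101, "leaf2": 102, "leaf3": 103}

# A leaf may hold 100K+ IPv4/v6 routes, each resolving to a destination ToR...
leaf_routes = {
    ipaddress.ip_network("10.0.2.0/24"): "leaf2",
    ipaddress.ip_network("10.0.3.0/24"): "leaf3",
}

def leaf_push_label(dst_ip: str) -> int:
    """Leaf looks up the destination prefix and pushes the ToR's label."""
    addr = ipaddress.ip_address(dst_ip)
    for prefix, tor in leaf_routes.items():
        if addr in prefix:
            return LEAF_LABELS[tor]
    raise LookupError("no route")

# ...while a spine needs only one entry per leaf label (10s of entries),
# regardless of how many IP routes the leaves carry.
spine_label_table = {101: "port1", 102: "port2", 103: "port3"}

def spine_forward(label: int) -> str:
    """Spine forwards solely on the MPLS label, with no IP state."""
    return spine_label_table[label]

label = leaf_push_label("10.0.2.15")   # leaf resolves the IP to label 102
out_port = spine_forward(label)        # spine switches on the label only
```

Because the spine's table is keyed only by labels, its size grows with the number of leaves, not with the number of routes — which is why leaf and spine ASICs can be chosen independently.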
Beyond Traditional Fabrics
--------------------------

.. image:: images/arch-features.png
   :width: 700px

While SD-Fabric offers advancements that go well beyond traditional fabrics, it is first helpful to
understand that SD-Fabric provides all the features found in network fabrics from traditional
networking vendors, in order to make SD-Fabric compatible with all existing infrastructure
(servers, applications, etc.).

At its core, SD-Fabric is an L3 fabric where both IPv4 and IPv6 packets are routed across server
racks using multiple equal-cost paths via spine switches. L2 bridging and VLANs are also
supported within each server rack, and compute nodes can be dual-homed to two Top-of-Rack
(ToR) switches in an active-active configuration (M-LAG). SD-Fabric assumes that the fabric
connects to the public Internet and the public cloud (or other networks) via traditional router(s).
SD-Fabric supports a number of other router features like static routes, multicast, DHCP L3 relay,
and the use of ACLs based on layer 2/3/4 options to drop traffic at ingress or redirect traffic via
policy-based routing. But SDN control greatly simplifies the software running on each switch,
and control is moved into SDN applications running in the edge cloud.

While these traditional switching/routing features are not particularly novel, SD-Fabric's
fundamental embrace of programmable silicon offers advantages that go far beyond traditional
fabrics.

Programmable Data Planes & P4
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SD-Fabric's data plane is fully programmable. In marked contrast to traditional fabrics, features
are not prescribed by switch vendors. This is made possible by P4, a high-level programming
language used to define the switch packet processing pipeline, which can be compiled to run at
line rate on programmable ASICs like Intel Tofino (see https://opennetworking.org/p4/). P4
allows operators to continuously evolve their network infrastructure by re-programming the
existing switches, rolling out new features and services on a weekly basis. In contrast, traditional
fabrics based on fixed-function ASICs are subject to extremely long hardware development
cycles (4 years on average) and require expensive infrastructure upgrades to support new features.

SD-Fabric takes advantage of P4 programmability by extending the traditional L2/L3 pipeline for
switching and routing with specialized functions such as the 4G/5G Mobile Core User Plane Function
(UPF) and Inband Network Telemetry (INT).

4G/5G Mobile Core User Plane Function (UPF)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Switches in SD-Fabric can be programmed to perform UPF functions at line rate. The L2/L3
packet processing pipeline running on Intel Tofino switches has been extended to include
capabilities such as GTP-U tunnel termination, usage reporting, idle-mode buffering, QoS, slicing,
and more. Similar to vRouter, a new ONOS app abstracts the whole leaf-spine fabric as one big
UPF, providing integration with the mobile core control plane using a 3GPP-compliant
implementation of the Packet Forwarding Control Protocol (PFCP).

With integrated UPF processing, SD-Fabric can implement a multi-terabit, low-latency 4G/5G
local breakout for edge applications without taking away CPU processing power from
containers or VMs. In contrast to UPF solutions based on full or partial smartNIC offload,
SD-Fabric's embedded UPF does not require additional hardware beyond the same leaf and spine
switches used to interconnect servers and base stations.

At the same time, SD-Fabric can be
integrated with both CPU-based and smartNIC-based UPFs to improve scale while supporting
differentiated services on a hardware-based fast path at line rate for mission-critical 4G/5G
applications (see https://opennetworking.org/sd-core/ for more details).

Visibility with Inband Network Telemetry (INT)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SD-Fabric comes with scalable support for INT, providing unprecedented visibility into how
individual packets are processed by the fabric. To this end, the P4-defined switch pipeline has
been extended with the ability to generate INT reports for a number of packet events and
anomalies, for example:

- For each flow (5-tuple), it produces periodic reports that monitor the flow's path: which
  switches, ports, and queues it traverses, and how much latency each network hop
  (switch) introduces.
- If a packet gets dropped, it generates a report carrying the switch ID and the drop reason
  (e.g., routing table miss, TTL zero, queue congestion, and more).
- During congestion, it produces reports to reconstruct a snapshot of the queue at a given
  time, making it possible to identify exactly which flow is causing delay or drops to other flows.
- For GTP-U tunnels, it produces reports about the inner flow, thus monitoring the
  forwarding behavior and perceived QoS for individual UE flows.

SD-Fabric's INT implementation is compliant with the open source INT specification, and it has
been validated to work with Intel's DeepInsight performance monitoring solution, which acts as
the collector of INT reports generated by switches. Moreover, to avoid overloading the INT
collector and to minimize the overhead of INT reports in the fabric, SD-Fabric's data plane uses
P4 to implement smart filters and triggers that drastically reduce the number of reports
generated, for example, by filtering out duplicates and by triggering report generation only in
case of meaningful anomalies (e.g., spikes in hop latency, path changes, drops, queue congestion,
etc.). In contrast to sampling-based approaches, which often allow some anomalies to go
undetected, SD-Fabric provides precise INT-based visibility that can scale to millions of flows.

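The filtering idea can be sketched as follows. This is a minimal Python model with invented thresholds, report fields, and logic — the real smart filters run in P4 on the switch — but it shows how suppressing duplicates while always reporting drops, path changes, and latency spikes keeps the report volume low without missing anomalies.

```python
def should_report(flow_state: dict, report: dict,
                  latency_spike_ns: int = 100_000) -> bool:
    """Emit a report only on first sighting, drop, path change, or latency spike."""
    prev = flow_state.get(report["flow"])
    flow_state[report["flow"]] = report
    if prev is None:                      # first report for this flow
        return True
    if report.get("dropped"):             # drops are always reported
        return True
    if report["path"] != prev["path"]:    # path change
        return True
    # Duplicate suppression: only meaningful latency deviations get through.
    return abs(report["latency_ns"] - prev["latency_ns"]) > latency_spike_ns

state = {}
r1 = {"flow": "A", "path": ("leaf1", "spine1", "leaf2"), "latency_ns": 5_000}
first = should_report(state, r1)                            # True: new flow
jitter = should_report(state, dict(r1, latency_ns=6_000))   # False: small jitter
spike = should_report(state, dict(r1, latency_ns=500_000))  # True: latency spike
```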
Flexible ASIC Resource Allocation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The P4 program at the base of SD-Fabric's software stack defines match-action tables for
common L2/L3 features such as bridging, IPv4/IPv6 routing, MPLS termination, and ACL, as well
as specialized features like UPF, with tables that store GTP-U tunnel information and more. In
contrast to fixed-function ASICs used in traditional fabrics, table sizes are not fixed. The use of
programmable ASICs like Intel Tofino in SD-Fabric enables the P4 program to be adapted to
specific deployment requirements. For example, for routing-heavy deployments, one could
decide to increase the IPv4 routing table to take up to 90% of the total ASIC memory, with an
arbitrary ratio of longest-prefix match (LPM) entries and exact-match /32 entries, while reducing
the size of other tables. Similarly, when using SD-Fabric for UPF, one could decide to recompile
the P4 program with larger GTP-U tunnel tables, while reducing the IPv4 routing table size to
10-100 entries (since most traffic is tunneled) or entirely removing the IPv6 tables.

Closed Loop Control
^^^^^^^^^^^^^^^^^^^
With complete transparency, visibility, and verifiability, SD-Fabric can be
optimized and secured through programmatic, real-time closed loop control. By defining
acceptable tolerances for specific settings, measuring for compliance, and automatically adapting
to deviations, a closed loop network can be created that dynamically and automatically responds
to environmental changes. Closed loop control applies to a variety of use cases, including
resource optimization (traffic engineering), verification (forwarding behavior), security (DDoS
mitigation), and others. In particular, in collaboration with the Pronto™ project, a microburst
mitigation mechanism has been implemented to stop attackers from filling up switch
queues in an attack attempting to disrupt mission-critical traffic.

SDN, White Boxes, and Open Source
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SD-Fabric is based on a purist implementation of SDN in both the control and data planes. When
coupled with open source, this approach enables faster development of features and greater
flexibility for operators to deploy only what they need and customize/optimize the features the
way they want. Furthermore, SDN facilitates the centralized configuration of all network
functionality, and allows network monitoring and troubleshooting to be centralized as well. Both
are significant benefits over traditional box-by-box networking and enable faster deployments,
simplified operations, and streamlined troubleshooting.

The use of white box (bare metal) switching hardware from ODMs significantly reduces CapEx
costs when compared to products from OEM vendors. By some accounts, the cost savings can
be as high as 60%. This is typically due to the OEM vendors amortizing the cost of developing
embedded switch/router software into the price of their hardware.

Finally, open source software allows network operators to develop their own applications and
choose how they integrate with their backend systems. And open source is considered more
secure, with ‘many eyes’ making it much harder for backdoors to be intentionally or
unintentionally introduced into the network.

Such unfettered ability to control timelines, features, and costs compared to traditional network
fabrics makes SD-Fabric very attractive for operators, enterprises, and government applications.

Extensible APIs
^^^^^^^^^^^^^^^
People usually think of a network fabric as an opaque pipe where applications send packets into
the network and hope they come out the other side. Little visibility is provided to determine
where things have gone wrong when a packet doesn't make it to its destination. Network
applications have no knowledge of how the packets are handled by the fabric.

With the SD-Fabric API, network applications have full visibility and control over how their
packets are processed. For example, a delay-sensitive application has the option to be informed
of the network latency and instruct the fabric to redirect its packets when there is congestion on
the current forwarding path. Similarly, the API offers a way to associate network traffic with a
network slice, providing QoS guarantees and traffic isolation from other slices. The API also plays
a critical role in closed loop control by offering a programmatic way to dynamically change the
packet forwarding behavior.

At a high level, SD-Fabric's APIs fall into four major categories: configuration, information,
control, and OAM.

- Configuration: APIs let users set up SD-Fabric features such as VLAN information for
  bridging and subnet information for routing.
- Information: APIs allow users to obtain the operational status, metrics, and network events
  of SD-Fabric, such as link congestion, counters, and port status.
- Control: APIs enable users to dynamically change the forwarding behavior of the
  fabric, such as dropping or redirecting traffic, setting QoS classification, and applying
  network slicing policies.
- OAM: APIs expose operational and management features, such as software upgrade
  and troubleshooting, allowing SD-Fabric to be integrated with existing orchestration
  systems and workflows.

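The four categories above can be pictured as a thin client wrapper. The class and method names below are hypothetical, invented purely for illustration — they are not the actual SD-Fabric API — but they show how each category maps to a distinct kind of call.

```python
class FabricClient:
    """Hypothetical client sketch covering the four API categories."""

    def __init__(self):
        self._config, self._acls, self._slices = {}, [], {}

    # Configuration: set up bridging/routing parameters.
    def configure_subnet(self, vlan: int, subnet: str) -> None:
        self._config[vlan] = subnet

    # Information: read back operational state.
    def get_config(self, vlan: int) -> str:
        return self._config[vlan]

    # Control: change forwarding behavior dynamically.
    def drop_traffic(self, five_tuple: tuple) -> None:
        self._acls.append(("drop", five_tuple))

    def assign_slice(self, five_tuple: tuple, slice_id: int) -> None:
        self._slices[five_tuple] = slice_id

    # OAM: operational features such as software upgrade.
    def upgrade(self, version: str) -> str:
        return f"upgrading to {version}"

fabric = FabricClient()
fabric.configure_subnet(vlan=100, subnet="10.0.1.0/24")
flow = ("10.0.1.5", "10.0.2.9", 6, 5000, 80)   # src, dst, proto, sport, dport
fabric.assign_slice(flow, slice_id=1)
```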
Edge-Cloud Ready
----------------
SD-Fabric adopts cloud native technologies and methodologies that are well developed and
widely used in the computing world. Cloud native technologies make the deployment and
operation of SD-Fabric similar to other software deployed in a cloud environment.

Kubernetes Integration
^^^^^^^^^^^^^^^^^^^^^^
Both the control plane software (ONOS™ and apps) and, importantly, the data plane software
(Stratum™) are containerized and deployed as Kubernetes services in SD-Fabric. In other words,
not only the servers but also the switching hardware identify as Kubernetes ‘nodes’, and the same
processes can be used to manage the lifecycle of both control and data plane containers. For
example, Helm charts can be used for installing and configuring images for both, while Kubernetes
monitors the health of all containers and restarts failed instances on servers and switches alike.

.. image:: images/arch-k8s.png
   :width: 500px

Configuration, Logging, and Troubleshooting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SD-Fabric reads all configuration from a single repository and automatically applies the
appropriate config to the relevant components. In contrast to traditional embedded networking,
there is no need for network operators to go through the error-prone process of configuring
individual leaf and spine switches. Similarly, the logs of each component in SD-Fabric are
streamed to an EFK stack (Elasticsearch, Fluentbit, Kibana) for log preservation, filtering, and
analysis. SD-Fabric offers a single pane of glass for logging and troubleshooting network state,
which can further be integrated with an operator's backend systems.

.. image:: images/arch-logging.png
   :width: 1000px

Monitoring and Alerts
^^^^^^^^^^^^^^^^^^^^^
SD-Fabric continuously monitors system metrics such as bandwidth utilization and connectivity
health. These metrics are streamed to Prometheus and Grafana for data aggregation and
visualization. Additionally, alerts are triggered when metrics meet predefined conditions. This
allows operators to react to certain network events, such as bandwidth saturation, even before
the issue starts to disrupt user traffic.

.. image:: images/arch-monitoring.png
   :width: 1000px

Deployment Automation
^^^^^^^^^^^^^^^^^^^^^
SD-Fabric utilizes a CI/CD model to manage the lifecycle of the software, allowing developers to
iterate rapidly when introducing a new feature. New container images are generated
automatically when new versions are released. Once the hardware is in place, a complete
deployment of the entire SD-Fabric stack can be pushed fabric-wide from the public cloud with a
single click in less than two minutes.

.. image:: images/arch-deployment.png
   :width: 900px

Aether™-Ready
^^^^^^^^^^^^^
SD-Fabric fits into a variety of edge use cases. Aether is ONF's private 5G/LTE enterprise edge
cloud platform, running in a dozen sites across multiple geographies as of early 2021.

Aether consists of several edge clouds deployed at enterprise sites, controlled and managed by a
central cloud. Each Aether Edge hosts third-party or in-house edge apps that benefit from
low-latency and high-bandwidth connectivity to the local devices and systems at the enterprise
edge. Each edge also hosts O-RAN compliant private-RAN control, IoT, and AI/ML platforms, and
terminates mobile user plane traffic by providing local breakout (UPF) at the edge sites. In
contrast, the Aether management platform centrally runs the shared mobile-core control plane
that supports all edges from the public cloud. Additionally, the public cloud provides a
management portal for the operator and for each enterprise, and Runtime Operation Control (ROC)
controls and configures the entire Aether solution in a centralized manner.

SD-Fabric has been fully integrated into the Aether Edge as its underlying network infrastructure,
interconnecting all hardware equipment in each edge site, such as servers and disaggregated RAN
components, with bridging, routing, and advanced processing like local breakout. It is worth
noting that SD-Fabric can be configured and orchestrated via its configuration APIs by cloud
solutions, and therefore can be easily integrated with Aether or third-party cloud offerings from
hyperscalers. In Aether, SD-Fabric configurations are centralized, modeled, and generated by
ROC to ensure the fabric configurations are consistent with other Aether components.

In addition to connectivity, SD-Fabric supports a number of advanced services such as
hierarchical QoS, network slicing, and UPF idle-mode buffering. And given its native support for
programmability, we expect many more innovative services to take advantage of SD-Fabric over
time.

.. image:: images/arch-aether-ready.png
   :width: 800px

System Components
-----------------

.. image:: images/arch-software-stack.png
   :width: 400px

Open Network Operating System (ONOS)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SD-Fabric uses ONF's Open Network Operating System (ONOS) as the SDN controller. ONOS is
designed as a distributed system, composed of multiple instances operating in a cluster, with all
instances actively operating on the network while being functionally identical. This unique
capability of ONOS simultaneously affords high availability and horizontal scaling of the control
plane. ONOS interacts with the network devices by means of pluggable southbound interfaces.
In particular, SD-Fabric leverages P4Runtime™ for programming and gNMI for configuring
certain features (such as port speed) in the fabric switches. Like other SDN controllers, ONOS
provides several core services like topology discovery and endpoint discovery (hosts, routers,
etc. attached to the fabric). Unlike any other open source SDN controller, ONOS delivers these
core services in a distributed way over the entire cluster, such that applications running in any
instance of the controller have the same view and information.

ONOS Applications
^^^^^^^^^^^^^^^^^
SD-Fabric uses a collection of applications that run on ONOS to provide the fabric features and
services. The main application responsible for fabric operation handles connectivity features
according to the SD-Fabric architecture, while other apps like DHCP relay, AAA, UPF control, and
multicast handle more specialized features. Importantly, SD-Fabric uses the ONOS Flow Objective
API, which allows applications to program switching devices in a pipeline-agnostic
way. By using flow objectives, applications can be written without worrying about the low-level
pipeline details of various switching chips. The API is implemented by specific device drivers
that are aware of the pipelines they serve and can thus convert the application's API calls to
device-specific rules. In this way, an application can be written once and adapted to pipelines
from different ASIC vendors.

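The pattern can be sketched as a toy model. The real Flow Objective API is a Java interface in ONOS and the table names below are invented for illustration; what matters is the shape: the application states its intent once, and each per-pipeline driver translates that intent into device-specific rules.

```python
class ForwardingObjective:
    """Pipeline-agnostic intent: route traffic for a prefix out of a port."""
    def __init__(self, prefix: str, out_port: int):
        self.prefix, self.out_port = prefix, out_port

class ProgrammableAsicDriver:
    """Maps the objective onto a (hypothetical) P4 pipeline's table."""
    def translate(self, obj: ForwardingObjective) -> dict:
        return {"table": "ingress.routing_v4",
                "match": {"ipv4_dst": obj.prefix},
                "action": {"set_egress_port": obj.out_port}}

class FixedAsicDriver:
    """Maps the same objective onto a fixed-function ASIC's tables."""
    def translate(self, obj: ForwardingObjective) -> dict:
        return {"table": "L3_UNICAST",
                "match": {"dst": obj.prefix},
                "action": {"port": obj.out_port}}

# The application expresses its intent once...
objective = ForwardingObjective("10.0.2.0/24", out_port=3)
# ...and each pipeline-aware driver converts it to device-specific rules.
p4_rule = ProgrammableAsicDriver().translate(objective)
fixed_rule = FixedAsicDriver().translate(objective)
```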
Stratum
^^^^^^^
SD-Fabric integrates switch software from the ONF Stratum project. Stratum is an open source,
silicon-independent switch operating system. Stratum implements the latest SDN-centric
northbound interfaces, including P4, P4Runtime, gNMI/OpenConfig, and gNOI, thereby enabling
interchangeability of forwarding devices and programmability of forwarding behaviors. On the
southbound interface, Stratum implements silicon-dependent adapters supporting network
ASICs such as Intel Tofino, the Broadcom™ XGS® line, and others.

Leaf and Spine Switch Hardware
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The leaf and spine hardware used in SD-Fabric is typically Open
Compute Project (OCP)™ certified switches from a selection of different ODM vendors. The port
configurations and ASICs used in these switches depend on operator needs. For example,
if the need is only for traditional fabric features, a number of options are possible – e.g., Broadcom
StrataXGS ASICs in 48x1G/10G or 32x40G/100G configurations. For advanced needs that take
advantage of P4 and programmable ASICs, Intel Tofino or Broadcom Trident 4 are more
appropriate choices.

ONL and ONIE
^^^^^^^^^^^^
The SD-Fabric switch software stack includes Open Network Linux (ONL) and the Open Network
Install Environment (ONIE) from OCP. The switches ship with ONIE, a boot loader that
enables the installation of the target OS as part of the provisioning process. ONL, a Linux
distribution for bare metal switches, is used as the base operating system. It ships with a number
of additional drivers for bare metal switch hardware elements (e.g., LEDs, SFPs) that are typically
unavailable in normal Linux distributions for bare metal servers (e.g., Ubuntu).

Docker/Kubernetes, Elasticsearch/Fluentbit/Kibana, Prometheus/Grafana
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
While ONOS/Stratum instances can be deployed natively on bare metal servers/switches, there
are advantages to deploying ONOS/Stratum instances as containers and using a container
management system like Kubernetes (K8s). In particular, K8s can monitor and automatically
restart lost controller instances (container pods), which then rejoin the operating cluster
seamlessly. SD-Fabric also utilizes widely adopted cloud native technologies such as
Elasticsearch/Fluentbit/Kibana for log preservation, filtering, and analysis, and
Prometheus/Grafana for metric monitoring and alerting.