..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Runtime Operational Control (ROC)
=================================

Purpose
-------

The Aether Runtime Operational Control (ROC) is a component designed with the primary purpose of managing the
Aether Connectivity Service (ACS), including facilitating the integration of edge services with the ACS.
The Aether ROC allows enterprises to configure subscribers and profiles, as well as implement policies related
to those profiles. It also allows the Aether operations team to configure the parameters of those policies.
The ROC is one of many subsystems that make up the Aether Management Platform (AMP).

What the ROC *does* do:

- Add/Update/Delete/Query configuration

- Persist configuration

- Push configuration to services and devices

- Make observations actionable, either manually or automatically

What the ROC *does not* do:

- The ROC does not directly deploy or manage the lifecycle of containers.
  This is done using the Terraform/Rancher/Helm/Kubernetes stack.

- The ROC does not directly collect or store logging or metric information.
  This is done using the ElasticStack and Grafana/Prometheus components.

- The ROC is not a message bus used for component-to-component communication.
  If a message bus is required, then a suitable service such as Kafka could be used.

- The ROC does not implement a service dependency graph.
  This can be done through Helm charts, which are typically hierarchical in nature.

- The ROC is not a formal service mesh.
  Other tools, such as Istio, could be leveraged to provide service meshes.

- The ROC does not configure *Edge Services*.
  While the ROC's modeling support is general and could be leveraged to support an edge service, and an
  adapter could be written to configure an edge service, promoting an edge service to ROC management would
  be the exception rather than the rule. Edge services have their own GUIs and APIs, perhaps belonging to
  a 3rd-party service provider.

Although we call out the tasks that the ROC doesn't do itself, it's often still necessary for the ROC to be aware
of the actions these other components have taken.
For example, while the ROC doesn't implement a service dependency graph, the ROC is aware
of how services are related. This is necessary because some of the actions it takes affect multiple services
(e.g., a ROC-supported operation on a subscriber profile might result in the ROC making calls to SD-Core,
SD-RAN, and SD-Fabric).

Throughout the design process, the ROC design team has taken lessons learned from prior systems, such as XOS,
and applied them to create a next-generation design that solves the configuration problem in a
focused and lightweight manner.

Design and Requirements
-----------------------

- The ROC must offer an *API* that may be used by administrators, as well as external services, to configure
  Aether.

- This ROC API must support new end-to-end abstractions that cross multiple subsystems of Aether.
  For example, "give subscriber X running application Y QoS guarantee Z" is an abstraction that potentially
  spans SD-RAN and SD-Fabric.
  The ROC defines and implements such end-to-end abstractions.

- The ROC must offer an *Operations GUI* to Operations Personnel, so they may configure the Aether Connectivity
  Service.

- The ROC must offer an *Enterprise GUI* to Enterprise Personnel, so they may configure the connectivity aspects
  of their particular edge site.
  It's possible this GUI shares implementation with the Operations GUI, but the presentation, content, and
  workflow may differ.

- The ROC must support *versioning* of configuration, so changes can be rolled back as necessary, and an audit
  history of previous configurations may be retrieved.

- The ROC must support best practices of *performance*, *high availability*, *reliability*, and *security*.

- The ROC must support *role-based access controls (RBAC)*, so that different parties have different visibility
  into the data model.

- The ROC must be extensible.
  Aether will incorporate new services over time, and existing services will evolve.

Data Model
----------

An important aspect of the ROC is that it maintains a data model that represents all the abstractions it is
responsible for, such as subscribers and profiles.
The ROC's data model is based on YANG specifications.
YANG is a rich language for data modeling, with support for strong validation of the data stored in the models.
YANG allows relations between objects to be specified, adding a relational aspect that our previous approaches
(for example, protobuf) did not directly support.
YANG is agnostic as to how the data is stored, and is not directly tied to SQL/RDBMS or NoSQL paradigms.

The ROC uses tooling built around aether-config (an ONOS-based microservice) to maintain a set of YANG models.
Among other things, aether-config implements model versioning.
Migration from one version of the data model to another is supported, as is simultaneous operation of
different versions.

Architecture
------------

Below is a high-level architectural diagram of the ROC:

.. image:: images/roc-diagram-for-guide.svg
   :width: 1000

The following walks through the main stack of ROC components in a top-down manner, starting with the GUI(s) and
ending with the devices/services.

Aether Portals
""""""""""""""

One or more portals may reside above the ROC, providing a convenient user interface.
These will include an *Operations Portal* that will have a high level of technical
detail for Aether staff, as well as an *Enterprise Portal* that will have a presentation
aimed at customers.
These different perspectives can be enforced through the following:

- RBAC controls, to limit access to information that might be unsuitable for a particular party.

- Dashboards, to aggregate/present information in an intuitive manner.

- Multi-step workflows (aka Wizards) to break a complex task into smaller guided steps.

The *Portal* is an Angular-based TypeScript GUI.
The GUI uses a REST API to communicate with the ``aether-roc-api`` layer, which in turn communicates with aether-config
via gNMI.
The GUI implementation is consistent with modern GUI design: it is implemented as a single-page application and includes
a "commit list" that allows several changes to be atomically submitted together.
Views within the GUI are handcrafted, and as new models are added to Aether, the GUI must be adapted to incorporate
the new models.

The Portal is a combination of control and observation.
The control aspect relates to pushing configuration, and the observation aspect relates to viewing metrics,
logging, and alerts.
The Portal will leverage other components to do some of the heavy lifting.
For example, it would make no sense for us to implement our own graph-drawing tool or our own metrics querying
language when Grafana and Prometheus already provide both.
GUI pages can be constructed that embed the Grafana renderer.

``aether-roc-api``
""""""""""""""""""

``aether-roc-api`` is a REST API layer that sits between the portals and aether-config.
The southbound layer of ``aether-roc-api`` is gNMI.
This is how ``aether-roc-api`` talks to aether-config.
``aether-roc-api`` at this time is entirely auto-generated; developers need not spend time manually creating REST APIs
for their models.
The API layer serves multiple purposes:

- gNMI is an inconvenient interface to use for GUI design, and REST is expected for GUI development.

- The API layer is a potential location for early validation and early security checking, allowing errors to be caught
  closer to the user.
  This allows error messages to be generated in a more customary way than gNMI.

- The API layer is yet another place for semantic translation to take place.
  Although the API layer is currently auto-generated, it is possible that additional methods could be added.
  gNMI supports only "GET" and "SET", whereas the ``aether-roc-api`` natively supports "GET", "PUT", "POST", "PATCH",
  and "DELETE".
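
The verb translation mentioned in the last item can be sketched roughly as follows.
This is an illustration only: the ``VERB_TO_GNMI`` table, the ``translate`` function, and the
example path are assumptions, not code taken from the auto-generated ``aether-roc-api``.
The only gNMI facts relied upon are that gNMI offers Get and Set, and that a Set can carry
update, replace, and delete operations.

.. code-block:: python

   # Hypothetical mapping from REST verbs to a gNMI RPC and, for Set,
   # the kind of Set operation (update, replace, or delete).
   VERB_TO_GNMI = {
       "GET": ("Get", None),
       "POST": ("Set", "update"),
       "PATCH": ("Set", "update"),
       "PUT": ("Set", "replace"),
       "DELETE": ("Set", "delete"),
   }


   def translate(verb, path, body=None):
       """Describe the gNMI call a REST request could be translated into."""
       try:
           rpc, operation = VERB_TO_GNMI[verb.upper()]
       except KeyError:
           raise ValueError(f"unsupported REST verb: {verb}") from None
       call = {"rpc": rpc, "path": path}
       if operation is not None:
           call["operation"] = operation
       # Only update and replace carry a value; Get and delete do not.
       if operation in ("update", "replace"):
           call["value"] = body
       return call

For instance, ``translate("PUT", "/example/path", {"name": "x"})`` yields a Set call with a
``replace`` operation, while an unknown verb is rejected before anything reaches aether-config.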

aether-config stack
"""""""""""""""""""

*Aether-config* (an Aether-specific deployment of the "\ *onos-config*\ " microservice) is the core of the ROC's
configuration system.
Aether-config is a component that other teams may use in other contexts.
It's possible that an Aether deployment might have multiple instances of aether-config used for independent purposes.
The job of aether-config is to store and version configuration data.
Configuration is pushed to aether-config through the northbound gNMI interface, stored in an Atomix database,
then pushed to services and devices using a southbound gNMI interface.
An operator is part of the aether-config stack and assists in configuring onos-topo (not pictured),
a topology management component.

Adapters
""""""""

Not every device or service beneath the ROC supports gNMI, and in the case where it is not supported, an adapter is
written to translate between gNMI and the device's or service's native API.
For example, a gNMI → REST adapter exists to translate between the ROC's modeling and the Aether Connectivity
Control (SD-Core) components. The adapter is not necessarily only a syntactic translation, but may also be a
semantic translation. [1]_
This supports a logical decoupling of the models stored in the ROC and the interface used by the southbound
device/service, allowing the southbound device/service and the ROC to evolve independently.
It also allows for southbound devices/services to be replaced without affecting the northbound interface.
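
The difference between a syntactic and a semantic translation can be illustrated with a small sketch.
Every name here is hypothetical: ``adapt_qos``, the ``uplink-mbps``/``downlink-mbps`` model fields, and the
``uplinkBps``/``downlinkBps`` payload keys are invented for the example and are not taken from the ROC
models or from the SD-Core API.

.. code-block:: python

   # Hypothetical field names and units, chosen only to show the idea.
   BPS_PER_MBPS = 1_000_000


   def adapt_qos(roc_profile):
       """Translate a ROC-style QoS profile into a southbound REST payload."""
       return {
           # Syntactic step: the key is renamed for the target API.
           "profileName": roc_profile["id"],
           # Semantic step: the model stores Mbps, the target expects bps.
           "uplinkBps": roc_profile["uplink-mbps"] * BPS_PER_MBPS,
           "downlinkBps": roc_profile["downlink-mbps"] * BPS_PER_MBPS,
       }

Because the conversion lives in the adapter, the model could keep its own units and naming even if the
southbound service were replaced by one with a different payload format.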

Workflow Engine
"""""""""""""""

The workflow engine, to the left of the aether-config stack, is where multi-step workflows may be implemented.
It is a placeholder: workflows will be implemented in Aether as they are required.
It is expected that a workflow engine would both read and write the aether-config data model, as well as respond to
external events.

Analytics Engine
""""""""""""""""

The analytics engine, to the right of the aether-config stack, is where enrichment of analytics will be performed.
Raw metrics and events are pushed to the analytics engine through an event bus such as Kafka.
The events are processed by an event processor that enriches each event with context from multiple sources, including
the configuration system.
The enriched events are then stored in a local database.
Aether-config can query the enriched events as part of gNMI operational state.
The enriched events are also pushed through a northbound abstraction, where they may be utilized by
Grafana, or utilized directly by the Aether portals.
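
A minimal sketch of the enrichment step is shown below.
It assumes, purely for illustration, that a raw event is a dictionary carrying a ``device_id`` and that
configuration context is available as a dictionary keyed by that identifier; the ``enrich`` function and
all field names are hypothetical.

.. code-block:: python

   def enrich(event, config):
       """Return a copy of a raw event with context added from configuration."""
       context = config.get(event["device_id"], {})
       enriched = dict(event)  # leave the raw event untouched
       enriched["site"] = context.get("site", "unknown")
       enriched["enterprise"] = context.get("enterprise", "unknown")
       return enriched


   # Hypothetical configuration context and raw event.
   config = {"radio-1": {"site": "site-a", "enterprise": "acme"}}
   raw = {"device_id": "radio-1", "metric": "latency_ms", "value": 12}
   enriched = enrich(raw, config)

In this sketch an event from a device that is missing from the configuration is still passed on, labeled
``unknown``, rather than dropped; a real event processor might make a different choice.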

The analytics engine also provides an opportunity to implement access control from the telemetry API.
For example, if Prometheus is chosen as the northbound abstraction, then a solution such as
prom-label-proxy may be used for access control.

Aether Modeling
---------------

There is no fixed distinction between high-level and low-level modeling in the ROC.
There is one set of Aether modeling that might have customer-facing and internal-facing aspects.

.. image:: images/aether-highlevel.svg
   :width: 600

The above diagram is an example of how a single set of models could serve both high-level and low-level needs and
is not necessarily identical to the current implementation.
For example, *App* and *Service* are concepts that are necessarily enterprise-facing.
*UPF*\ s are concepts that are operator-facing.
A UPF might be used by a Service, but the customer need not be aware of this detail.
Similarly, some objects might be partially customer-facing and partially operator-facing.
For example, a *Radio* is a piece of hardware the customer has deployed on their premises, so they must know of it, but
the configuration details of the radio (signal strength, IP address, etc.) are operator-facing.

For further information on the set of models used in this Aether release, consult :ref:`roc-developer-guide`.

Identity Management
-------------------

The ROC leverages an external identity database (i.e., an LDAP server) to store user data such as account names and
passwords for users who are able to log in to the ROC.
This LDAP server also has the capability to associate users with groups; for example, adding ROC administrators to
ONFAetherAdmin would be a way to grant those people administrative privileges within the ROC.

An external authentication service (DEX) is used to authenticate the user, handling the mechanics of accepting the
password, validating it, and securely returning the group the user belongs to.
The group identifier is then used to grant access to resources within the ROC.

The ROC leverages Open Policy Agent (OPA) as a framework for writing access control policies.
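
The group-based decision described above can be sketched as follows.
OPA policies are written in Rego, so this Python fragment only illustrates the decision logic and is not
an OPA policy. The ``POLICY`` table, the ``is_allowed`` function, the action names, and the
``ExampleEnterpriseUser`` group are all hypothetical; only ``ONFAetherAdmin`` comes from the text.

.. code-block:: python

   # Hypothetical policy: group name -> actions that group may perform.
   POLICY = {
       "ONFAetherAdmin": {"read", "write"},
       "ExampleEnterpriseUser": {"read"},
   }


   def is_allowed(groups, action):
       """Allow the action if any of the user's groups permits it."""
       return any(action in POLICY.get(group, set()) for group in groups)

With this table, ``is_allowed(["ONFAetherAdmin"], "write")`` is true, while a user whose groups are
not listed in the policy is denied every action.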

Securing Machine-to-Machine Communications
------------------------------------------

gNMI naturally lends itself to mutual TLS for authentication, and that is the recommended way to secure
communications between components that speak gNMI.
For example, the communication between aether-config and its adapters uses gNMI and therefore uses mutual TLS.
Distributing certificates between components is a problem outside the scope of the ROC.
It's assumed that another tool will be responsible for distribution, renewing certificates before they expire, etc.

For components that speak REST, HTTPS is used to secure the connection, and authentication can take place using
mechanisms within the HTTPS protocol (basic auth, tokens, etc.).
OAuth2 and OpenID Connect are leveraged as an authorization provider when using these REST APIs.

.. [1]
   Adapters are an ad hoc approach to implementing the workflow engine,
   where they map models onto models, including the appropriate semantic
   translation. This is what we originally did in XOS, but we prefer a
   more structured approach for ROC.


Operations Portal Usage
-----------------------

The Operations Portal is available as a web application, at a location defined in the Ingress of the Cluster.

It is secured by SSL and an authentication system based on OpenID Connect. The implementation of this is through
Keycloak, with users and groups defined in LDAP. It has a Role Based Access Control (RBAC) implementation based
on Open Policy Agent (OPA).

The Operations Portal is built on the Angular 12 framework, and is compatible with the following browser versions:

.. list-table:: Browser Compatibility
   :widths: 40 60
   :header-rows: 0

   * - Google Chrome
     - latest
   * - Mozilla Firefox
     - latest and extended support release (ESR)
   * - Microsoft Edge
     - 2 most recent major versions
   * - Apple Safari
     - 2 most recent major versions
   * - Apple iOS
     - 2 most recent major versions
   * - Google Android
     - 2 most recent major versions