..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Runtime Operational Control (ROC)
=================================

Purpose
-------

The Aether Runtime Operational Control (ROC) is a component whose primary purpose is to manage the
Aether Connectivity Service (ACS), including facilitating the integration of edge services with the ACS.
The Aether ROC allows enterprises to configure subscribers and profiles, as well as implement policies related
to those profiles. It also allows the Aether operations team to configure the parameters of those policies.
The ROC is one of many subsystems that make up the Aether Management Platform (AMP).

What the ROC *does* do:

- Add/Update/Delete/Query configuration

- Persist configuration

- Push configuration to services and devices

- Make observations actionable, either manually or automatically

What the ROC *does not* do:

- The ROC does not directly deploy or manage the lifecycle of containers.
  This is done using the Terraform/Rancher/Helm/Kubernetes stack.

- The ROC does not directly collect or store logging or metric information.
  This is done using the ElasticStack and Grafana/Prometheus components.

- The ROC is not a message bus used for component-to-component communication.
  If a message bus is required, then a suitable service such as Kafka could be used.

- The ROC does not implement a service dependency graph.
  This can be done through Helm charts, which are typically hierarchical in nature.

- The ROC is not a formal service mesh.
  Other tools, such as Istio, could be leveraged to provide service meshes.

- The ROC does not configure *Edge Services*.
  While the ROC's modeling support is general and could be leveraged to support an edge service, and an
  adapter could be written to configure an edge service, promoting an edge service to ROC management would
  be the exception rather than the rule. Edge services have their own GUIs and APIs, perhaps belonging to
  a 3rd-party service provider.

Although we call out the tasks that the ROC doesn't do itself, it's often still necessary for the ROC to be aware
of the actions these other components have taken.
For example, while the ROC doesn't implement a service dependency graph, the ROC is aware
of how services are related. This is necessary because some of the actions it takes affect multiple services
(e.g., a ROC-supported operation on a subscriber profile might result in the ROC making calls to SD-Core,
SD-RAN, and SD-Fabric).

Throughout the design process, the ROC design team has taken lessons learned from prior systems, such as XOS,
and applied them to create a next-generation design that solves the configuration problem in a
focused and lightweight manner.

Design and Requirements
-----------------------

- The ROC must offer an *API* that may be used by administrators, as well as external services, to configure
  Aether.

- This ROC API must support new end-to-end abstractions that cross multiple subsystems of Aether.
  For example, "give subscriber X running application Y QoS guarantee Z" is an abstraction that potentially
  spans multiple subsystems, such as SD-RAN and SD-Fabric.
  The ROC defines and implements such end-to-end abstractions.

- The ROC must offer an *Operations GUI* to Operations Personnel, so they may configure the Aether Connectivity
  Service.

- The ROC must offer an *Enterprise GUI* to Enterprise Personnel, so they may configure the connectivity aspects
  of their particular edge site.
  It's possible this GUI shares implementation with the Operations GUI, but the presentation, content, and
  workflow may differ.

- The ROC must support *versioning* of configuration, so changes can be rolled back as necessary, and an audit
  history of previous configurations may be retrieved.

- The ROC must support best practices of *performance*, *high availability*, *reliability*, and *security*.

- The ROC must support *role-based access controls (RBAC)*, so that different parties have different visibility
  into the data model.

- The ROC must be extensible.
  Aether will incorporate new services over time, and existing services will evolve.

Data Model
----------

An important aspect of the ROC is that it maintains a data model that represents all the abstractions it is
responsible for, such as subscribers and profiles.
The ROC's data model is based on YANG specifications.
YANG is a rich language for data modeling, with support for strong validation of the data stored in the models.
YANG allows relations between objects to be specified, adding a relational aspect that our previous approaches
(for example, protobuf) did not directly support.
YANG is agnostic as to how the data is stored, and is not directly tied to SQL/RDBMS or NoSQL paradigms.

The ROC uses tooling built around aether-config (an ONOS-based microservice) to maintain a set of YANG models.
Among other things, aether-config implements model versioning.
Migration from one version of the data model to another is supported, as is simultaneous operation of
different versions.

Architecture
------------

Below is a high-level architectural diagram of the ROC:

.. image:: images/roc-diagram-for-guide.svg
    :width: 1000

The following walks through the main stack of ROC components in a top-down manner, starting with the GUI(s) and
ending with the devices/services.

Aether Portals
""""""""""""""

One or more portals may reside above the ROC, providing a convenient user interface.
These will include an *Operations Portal* that will have a high level of technical
detail for Aether staff, as well as an *Enterprise Portal* that will have a presentation
aimed at customers.
These different perspectives can be enforced through the following:

- RBAC controls, to limit access to information that might be unsuitable for a particular party.

- Dashboards, to aggregate/present information in an intuitive manner.

- Multi-step workflows (aka Wizards), to break a complex task into smaller guided steps.

The *Portal* is an Angular-based TypeScript GUI.
The GUI uses a REST API to communicate with the ``aether-roc-api`` layer, which in turn communicates with aether-config
via gNMI.
The GUI follows modern GUI design practice: it is implemented as a single-page application and includes
a "commit list" that allows several changes to be atomically submitted together.
Views within the GUI are handcrafted, and as new models are added to Aether, the GUI must be adapted to incorporate
the new models.

The Portal is a combination of control and observation.
The control aspect relates to pushing configuration, and the observation aspect relates to viewing metrics,
logging, and alerts.
The Portal will leverage other components to do some of the heavy lifting.
For example, it would make no sense for us to implement our own graph-drawing tool or our own metrics querying
language when Grafana and Prometheus already provide them.
GUI pages can be constructed that embed the Grafana renderer.

``aether-roc-api``
""""""""""""""""""

``aether-roc-api`` is a REST API layer that sits between the portals and aether-config.
The southbound layer of ``aether-roc-api`` is gNMI.
This is how ``aether-roc-api`` talks to aether-config.
``aether-roc-api`` at this time is entirely auto-generated; developers need not spend time manually creating REST APIs
for their models.
The API layer serves multiple purposes:

- gNMI is an inconvenient interface to use for GUI design, and REST is expected for GUI development.

- The API layer is a potential location for early validation and early security checking, allowing errors to be caught
  closer to the user.
  This allows error messages to be generated in a more conventional form than gNMI provides.

- The API layer is yet another place for semantic translation to take place.
  Although the API layer is currently auto-generated, it is possible that additional methods could be added.
  gNMI supports only "GET" and "SET", whereas the ``aether-roc-api`` natively supports "GET", "PUT", "POST", "PATCH",
  and "DELETE".

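The verb translation described above can be sketched as follows.
This is a minimal illustration, not the generated ``aether-roc-api`` code: the function name ``rest_to_gnmi``,
the verb-to-field mapping, and the example paths are all assumptions.
The only gNMI detail relied on is that a ``SetRequest`` carries separate ``update``, ``replace``, and ``delete`` lists.

```python
# Assumed mapping from REST verbs to gNMI RPCs and SetRequest fields.
# gNMI's SetRequest carries "update", "replace", and "delete" lists;
# which verb maps to which field here is an illustrative choice.
VERB_TO_GNMI = {
    "GET": ("Get", None),
    "POST": ("Set", "update"),
    "PATCH": ("Set", "update"),
    "PUT": ("Set", "replace"),
    "DELETE": ("Set", "delete"),
}


def rest_to_gnmi(verb, url_path, body=None):
    """Translate a REST call into a plain-dict description of a gNMI request."""
    rpc, set_field = VERB_TO_GNMI[verb.upper()]
    # Split "/a/b/c" into gNMI-style path elements.
    elems = [part for part in url_path.split("/") if part]
    if rpc == "Get":
        return {"rpc": "Get", "path": elems}
    if set_field == "delete":
        return {"rpc": "Set", "delete": [elems]}
    return {"rpc": "Set", set_field: [{"path": elems, "value": body}]}
```

For instance, ``rest_to_gnmi("PUT", "/enterprise/acme/site/hq", {"description": "HQ"})`` yields a ``Set`` request whose
``replace`` list holds one path/value pair. A real translation layer would also carry YANG list keys and typed values.
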
aether-config stack
"""""""""""""""""""

*Aether-config* (an Aether-specific deployment of the "\ *onos-config*\ " microservice) is the core of the ROC's
configuration system.
Aether-config is a component that other teams may use in other contexts.
It's possible that an Aether deployment might have multiple instances of aether-config used for independent purposes.
The job of aether-config is to store and version configuration data.
Configuration is pushed to aether-config through the northbound gNMI interface, stored in an Atomix database,
then pushed to services and devices using a southbound gNMI interface.
An operator is part of the aether-config stack and assists in configuring onos-topo (not pictured),
a topology management component.

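To make "store and version configuration data" concrete, here is a toy in-memory sketch.
It is not how aether-config or Atomix work internally; the ``VersionedConfig`` class and its methods are hypothetical,
and the sketch only shows the idea that every change yields a new version, older versions remain available for audit,
and a rollback is itself recorded as a change.

```python
import copy


class VersionedConfig:
    """Toy configuration store that keeps every committed snapshot."""

    def __init__(self):
        self._history = [{}]  # version 0 is the empty configuration

    @property
    def version(self):
        return len(self._history) - 1

    def current(self):
        return copy.deepcopy(self._history[-1])

    def set(self, changes):
        # Each commit produces a new snapshot; older ones stay for audit.
        snapshot = self.current()
        snapshot.update(changes)
        self._history.append(snapshot)
        return self.version

    def rollback(self, version):
        # Rolling back is itself recorded as a new version.
        self._history.append(copy.deepcopy(self._history[version]))
        return self.version

    def audit(self):
        return [(i, copy.deepcopy(s)) for i, s in enumerate(self._history)]
```

Recording a rollback as a new version, rather than truncating history, keeps the audit trail complete.
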
Adapters
""""""""

Not every device or service beneath the ROC supports gNMI, and in the case where it is not supported, an adapter is
written to translate between gNMI and the device's or service's native API.
For example, a gNMI → REST adapter exists to translate between the ROC's modeling and the Aether Connectivity
Control (SD-Core) components. The adapter is not necessarily only a syntactic translation, but may also be a
semantic translation. [1]_
This supports a logical decoupling of the models stored in the ROC and the interface used by the southbound
device/service, allowing the southbound device/service and the ROC to evolve independently.
It also allows for southbound devices/services to be replaced without affecting the northbound interface.

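The difference between syntactic and semantic translation can be illustrated with a small sketch.
Everything here is hypothetical: the ``slice`` model fields, the target payload layout, and the unit conversion
are invented for illustration and do not describe the actual SD-Core adapter.

```python
def slice_to_core_payload(slice_model):
    """Translate a hypothetical ROC 'slice' model into a hypothetical REST body."""
    # Semantic step: the model stores bandwidth in Mbps, while the assumed
    # southbound API expects bits per second.
    uplink_bps = slice_model["uplink-mbps"] * 1_000_000
    downlink_bps = slice_model["downlink-mbps"] * 1_000_000
    # Syntactic step: rename keys and reshape into the target layout.
    return {
        "sliceName": slice_model["id"],
        "qos": {"uplink": uplink_bps, "downlink": downlink_bps},
        "devices": sorted(slice_model.get("device-ids", [])),
    }
```

Because only the adapter knows both shapes, either side can change its representation without the other noticing,
which is the decoupling described above.
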
Workflow Engine
"""""""""""""""

The workflow engine, to the left of the aether-config stack, is a placeholder where multi-step workflows may be
implemented in Aether as they are required.
It is expected that a workflow engine would both read and write the aether-config data model, as well as respond to
external events.

Analytics Engine
""""""""""""""""

The analytics engine, to the right of the aether-config stack, is where enrichment of analytics will be performed.
Raw metrics and events are pushed to the analytics engine through an event bus such as Kafka.
The events are processed by an event processor that enriches the event with context from multiple sources, including
from the configuration system.
The enriched events are then stored in a local database.
Aether-config can query the enriched events as part of gNMI operational state.
The enriched events are also pushed through a northbound abstraction, where they may be utilized by
Grafana, or utilized directly by the Aether portals.

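The enrichment step can be sketched as a pure function that joins a raw event with configuration context.
The event fields, the shape of the configuration dictionary, and the ``enrich`` function itself are assumptions
made for illustration; the event bus and database are omitted.

```python
def enrich(event, config):
    """Return a copy of a raw event annotated with configuration context."""
    device = config["devices"].get(event["device_id"], {})
    enriched = dict(event)
    # Unknown devices are kept but labeled, so no event is silently dropped.
    enriched["site"] = device.get("site", "unknown")
    enriched["enterprise"] = device.get("enterprise", "unknown")
    return enriched
```

Labels such as ``site`` and ``enterprise`` are what make per-tenant dashboards and access control possible downstream.
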
The analytics engine also provides an opportunity to implement access control for the telemetry API.
For example, if Prometheus is chosen as the northbound abstraction, then a solution such as
prom-label-proxy may be used for access control.

Aether Modeling
---------------

There is no fixed distinction between high-level and low-level modeling in the ROC.
There is one set of Aether modeling that might have customer-facing and internal-facing aspects.

.. image:: images/aether-highlevel.svg
    :width: 600

The above diagram is an example of how a single set of models could serve both high-level and low-level needs and
is not necessarily identical to the current implementation.
For example, *App* and *Service* are concepts that are necessarily enterprise-facing.
*UPF*\ s are concepts that are operator-facing.
A UPF might be used by a Service, but the customer need not be aware of this detail.
Similarly, some objects might be partially customer-facing and partially operator-facing.
For example, a *Radio* is a piece of hardware the customer has deployed on their premises, so they must know of it, but
the configuration details of the radio (signal strength, IP address, etc.) are operator-facing.

For further information on the set of models used in this Aether release, consult :ref:`roc-developer-guide`.

Identity Management
-------------------

The ROC leverages an external identity database (i.e., an LDAP server) to store user data such as account names and
passwords for users who are able to log in to the ROC.
This LDAP server also has the capability to associate users with groups; for example, adding ROC administrators to
ONFAetherAdmin would be a way to grant those people administrative privileges within the ROC.

An external authentication service (Dex) is used to authenticate the user, handling the mechanics of accepting the
password, validating it, and securely returning the group the user belongs to.
The group identifier is then used to grant access to resources within the ROC.

The ROC leverages Open Policy Agent (OPA) as a framework for writing access control policies.

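The group-based decision can be sketched in a few lines.
In an OPA deployment such rules would be written in OPA's policy language, Rego; the Python below only illustrates the
logic. The permission table is hypothetical, and every group name other than ONFAetherAdmin is invented.

```python
# Hypothetical group-to-permission table; group names other than
# ONFAetherAdmin are invented for illustration.
GROUP_PERMISSIONS = {
    "ONFAetherAdmin": {"read", "write"},
    "EnterpriseViewer": {"read"},
}


def is_allowed(groups, action):
    """Allow the action if any of the user's groups grants it."""
    return any(action in GROUP_PERMISSIONS.get(g, set()) for g in groups)
```

Unknown groups grant nothing, so a user with no recognized group is denied by default.
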
Securing Machine-to-Machine Communications
------------------------------------------

gNMI naturally lends itself to mutual TLS for authentication, and that is the recommended way to secure
communications between components that speak gNMI.
For example, the communication between aether-config and its adapters uses gNMI and therefore uses mutual TLS.
Distributing certificates between components is a problem outside the scope of the ROC.
It's assumed that another tool will be responsible for distribution, renewing certificates before they expire, etc.

For components that speak REST, HTTPS is used to secure the connection, and authentication can take place using
mechanisms within the HTTPS protocol (basic auth, tokens, etc.).
OAuth2 and OpenID Connect are leveraged as an authorization provider when using these REST APIs.

.. [1]
   Adapters are an ad hoc approach to implementing the workflow engine,
   where they map models onto models, including the appropriate semantic
   translation. This is what we originally did in XOS, but we prefer a
   more structured approach for ROC.


Operations Portal Usage
-----------------------

The Operations Portal is available as a web application, at a location defined in the Ingress of the cluster.

It is secured by SSL and an authentication system based on OpenID Connect. This is implemented through
Keycloak, with users and groups defined in LDAP. It has a Role Based Access Control (RBAC) implementation based
on Open Policy Agent (OPA).

The Operations Portal is built on the Angular 12 framework, and is compatible with the following browser versions:

.. list-table:: Browser Compatibility
   :widths: 40 60
   :header-rows: 0

   * - Google Chrome
     - latest
   * - Mozilla Firefox
     - latest and extended support release (ESR)
   * - Microsoft Edge
     - 2 most recent major versions
   * - Apple Safari
     - 2 most recent major versions
   * - Apple iOS
     - 2 most recent major versions
   * - Google Android
     - 2 most recent major versions
