..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Runtime Operational Control (ROC)
=================================

Purpose
-------

The Aether Runtime Operational Control (ROC) manages the Aether Connectivity Service (ACS), including the
integration of edge services with the ACS.
The Aether ROC allows enterprises to configure subscribers and profiles, as well as implement policies related
to those profiles. It also allows the Aether operations team to configure the parameters of those policies.
The ROC is one of many subsystems that make up the Aether Management Platform (AMP).

What the ROC does:

- Add/Update/Delete/Query configuration

- Persist configuration

- Push configuration to services and devices

- Make observations actionable, either manually or automatically

What the ROC does not do:

- The ROC does not directly deploy or manage the lifecycle of containers.
  This is done using the Terraform/Rancher/Helm/Kubernetes stack.

- The ROC does not directly collect or store logging or metric information.
  This is done using the Elastic Stack and Grafana/Prometheus components.

- The ROC is not a message bus used for component-to-component communication.
  If a message bus is required, then a suitable service such as Kafka could be used.

- The ROC does not implement a service dependency graph.
  This can be done through Helm charts, which are typically hierarchical in nature.

- The ROC is not a formal service mesh.
  Other tools, such as Istio, could be leveraged to provide service meshes.

- The ROC does not configure Edge Services.
  While the ROC’s modeling support is general and could be leveraged to support an edge service, and an
  adapter could be written to configure an edge service, promoting an edge service to ROC management would
  be the exception rather than the rule. Edge services have their own GUIs and APIs, perhaps belonging to
  a third-party service provider.

Although we call out the tasks that the ROC doesn’t do itself, it’s often still necessary for the ROC to be
aware of the actions these other components have taken.
For example, while the ROC doesn’t implement a service dependency graph, it is aware of how services are
related. This is necessary because some of the actions it takes affect multiple services
(e.g., a ROC-supported operation on a subscriber profile might result in the ROC making calls to SD-Core,
SD-RAN, and SD-Fabric).

Throughout the design process, the ROC design team has taken lessons learned from prior systems, such as XOS,
and applied them to create a next-generation design that solves the configuration problem in a focused and
lightweight manner.

Design and Requirements
-----------------------

- The ROC must offer an API that may be used by administrators, as well as external services, to configure
  Aether.

- This ROC API must support new end-to-end abstractions that cross multiple subsystems of Aether.
  For example, “give subscriber X running application Y QoS guarantee Z” is an abstraction that potentially
  spans SD-RAN and SD-Fabric.
  The ROC defines and implements such end-to-end abstractions.

- The ROC must offer an Operations GUI to Operations Personnel, so they may configure the Aether Connectivity
  Service.

- The ROC must offer an Enterprise GUI to Enterprise Personnel, so they may configure the connectivity aspects
  of their particular edge site.
  It’s possible this GUI shares implementation with the Operations GUI, but the presentation, content, and
  workflow may differ.

- The ROC must support versioning of configuration, so changes can be rolled back as necessary, and an audit
  history of previous configurations may be retrieved.

- The ROC must support best practices of performance, high availability, reliability, and security.

- The ROC must support role-based access controls (RBAC), so that different parties have different visibility
  into the data model.

- The ROC must be extensible.
  Aether will incorporate new services over time, and existing services will evolve.

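The end-to-end abstraction named in the second requirement can be pictured as a single intent that fans out
into per-subsystem settings. The following is a minimal sketch under stated assumptions: ``QosIntent``,
``fan_out``, and every key name are hypothetical and are not taken from the Aether models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosIntent:
    """One end-to-end request: subscriber X, application Y, guarantee Z."""
    subscriber: str
    application: str
    guaranteed_mbps: int


def fan_out(intent):
    """Expand a single intent into one settings dict per subsystem."""
    common = {"subscriber": intent.subscriber, "application": intent.application}
    return {
        # Key names below are illustrative, not real SD-RAN or SD-Fabric fields.
        "sd-ran": {**common, "min_rate_mbps": intent.guaranteed_mbps},
        "sd-fabric": {**common, "queue_rate_mbps": intent.guaranteed_mbps},
    }
```

The caller states the guarantee once, and the translation into subsystem-specific terms stays in one place.
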
Data Model
----------

An important aspect of the ROC is that it maintains a data model that represents all the abstractions, such as
subscribers and profiles, it is responsible for.
The ROC’s data model is based on YANG specifications.
YANG is a rich language for data modeling, with support for strong validation of the data stored in the models.
YANG allows relations between objects to be specified, adding a relational aspect that our previous approaches
(for example, protobuf) did not directly support.
YANG is agnostic as to how the data is stored, and is not directly tied to SQL/RDBMS or NoSQL paradigms.

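To make the two YANG properties mentioned above concrete (value validation and relations between objects),
here is a rough Python analogue. It is not YANG and not the ROC's validation code; the ``profiles`` and
``subscribers`` lists and their fields are assumptions made for illustration.

```python
def validate(models):
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    profile_ids = {profile["id"] for profile in models.get("profiles", [])}
    for sub in models.get("subscribers", []):
        # Value check, similar in spirit to a YANG "range" restriction.
        if not 0 <= sub["priority"] <= 255:
            errors.append(f"{sub['id']}: priority out of range")
        # Relation check, similar in spirit to a YANG "leafref".
        if sub["profile"] not in profile_ids:
            errors.append(f"{sub['id']}: unknown profile {sub['profile']}")
    return errors
```

In YANG both rules would be declared in the model itself rather than written as procedural checks.
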
The ROC uses tooling built around aether-config (an ONOS-based microservice) to maintain a set of YANG models.
Among other things, aether-config implements model versioning.
Migration from one version of the data model to another is supported, as is simultaneous operation of
different versions.

Architecture
------------

Below is a high-level architectural diagram of the ROC:

.. image:: images/aether-architecture.svg

The following walks through the main stack of ROC components in a top-down manner, starting with the GUI(s) and
ending with the devices/services.

Operations Portal / Enterprise Portal
"""""""""""""""""""""""""""""""""""""

The code base for the Operations Portal and Enterprise Portal is shared.
They are two different perspectives of the same portal.
The Operations Portal presents a rougher, more expansive view of the breadth of the Aether modeling.
The Enterprise Portal presents a more curated view of the modeling.
These different perspectives can be enforced through the following:

- RBAC controls, to limit access to information that might be unsuitable for a particular party.

- Dashboards, to aggregate/present information in an intuitive manner.

- Multi-step workflows (aka Wizards) to break a complex task into smaller guided steps.

The Portal is an Angular-based TypeScript GUI.
The GUI uses a REST API to communicate with the aether-roc-api layer, which in turn communicates with
aether-config via gNMI.
The GUI follows modern GUI design: it is implemented as a single-page application and includes
a “commit list” that allows several changes to be atomically submitted together.
Views within the GUI are handcrafted, and as new models are added to Aether, the GUI must be adapted to incorporate
the new models.

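The all-or-nothing behavior of the “commit list” can be sketched as follows. This only illustrates the idea
and is not the Portal's code (which is TypeScript); ``submit_commit_list``, the flat path-to-value
configuration, and the ``is_valid`` callback are all assumptions.

```python
import copy


def submit_commit_list(config, commit_list, is_valid):
    """Apply every queued change or none of them.

    Returns (config, accepted). The input config is never modified.
    """
    candidate = copy.deepcopy(config)
    for path, value in commit_list:
        candidate[path] = value
    if not is_valid(candidate):
        return config, False  # reject the whole batch
    return candidate, True
```

Because the changes are applied to a copy first, a single invalid entry leaves the stored configuration untouched.
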
The Portal is a combination of control and observation.
The control aspect relates to pushing configuration, and the observation aspect relates to viewing metrics,
logging, and alerts.
The Portal will leverage other components to do some of the heavy lifting.
For example, it would make no sense for us to implement our own graph-drawing tool or our own metrics querying
language when Grafana and Prometheus already provide both.
GUI pages can be constructed that embed the Grafana renderer.

aether-roc-api
""""""""""""""

Aether-roc-api is a REST API layer that sits between the portals and aether-config.
The southbound layer of aether-roc-api is gNMI, which is how aether-roc-api talks to aether-config.
Aether-roc-api is currently entirely auto-generated; developers need not spend time manually creating REST APIs
for their models.
The API layer serves multiple purposes:

- gNMI is an inconvenient interface to use for GUI design, and REST is expected for GUI development.

- The API layer is a potential location for early validation and early security checking, allowing errors to be
  caught closer to the user.
  This allows error messages to be generated in a more customary way than gNMI.

- The API layer is yet another place for semantic translation to take place.
  Although the API layer is currently auto-generated, it is possible that additional methods could be added.
  gNMI supports only “GET” and “SET”, whereas the aether-roc-api natively supports “GET”, “PUT”, “POST”, “PATCH”,
  and “DELETE”.

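One way to picture the verb translation described in the last bullet: a gNMI Set request carries update,
replace, and delete operations, so each REST verb has to land on one of those or on Get. The mapping below
is an assumption made for illustration, not the mapping that the generated aether-roc-api code uses, and
``to_gnmi`` returns a plain dict rather than a real gNMI message.

```python
# Assumed mapping from HTTP verbs to the operation kinds of a gNMI Set request.
SET_OPERATION = {"POST": "update", "PATCH": "update", "PUT": "replace", "DELETE": "delete"}


def to_gnmi(method, url_path, body=None):
    """Describe the gNMI call that a REST request would turn into."""
    method = method.upper()
    path = [elem for elem in url_path.split("/") if elem]
    if method == "GET":
        return {"rpc": "Get", "path": path}
    if method not in SET_OPERATION:
        raise ValueError(f"unsupported method: {method}")
    request = {"rpc": "Set", "operation": SET_OPERATION[method], "path": path}
    if method != "DELETE":
        request["value"] = body
    return request
```
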
aether-config
"""""""""""""

Aether-config (an Aether-specific deployment of the “onos-config” microservice) is the core of the ROC’s
configuration system.
Aether-config is a component that other teams may use in other contexts.
It’s possible that an Aether deployment might have multiple instances of aether-config used for independent purposes.
The job of aether-config is to store and version configuration data.
Configuration is pushed to aether-config through the northbound gNMI interface, is stored in an Atomix database
(not shown in the figure), and is pushed to services and devices using a southbound gNMI interface.

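The “store and version” role can be illustrated with a toy in-memory store. This is a sketch of the concept
only: aether-config keeps its data in Atomix and exposes gNMI, and the ``VersionedConfig`` class and its
methods are hypothetical.

```python
import copy


class VersionedConfig:
    """Keeps every committed snapshot so earlier versions stay retrievable."""

    def __init__(self):
        self._history = [{}]  # version 0 is the empty configuration

    @property
    def current(self):
        return copy.deepcopy(self._history[-1])

    def commit(self, changes):
        """Store a new snapshot and return its version number."""
        snapshot = self.current
        snapshot.update(changes)
        self._history.append(snapshot)
        return len(self._history) - 1

    def rollback(self, version):
        """Re-commit an earlier snapshot so the audit trail keeps every step."""
        self._history.append(copy.deepcopy(self._history[version]))
        return len(self._history) - 1

    def history(self):
        return copy.deepcopy(self._history)
```

Recording a rollback as a new version, rather than truncating the history, is one way to keep the audit trail
complete.
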
Adapters
""""""""

Not every device or service beneath the ROC supports gNMI, and in the case where it is not supported, an adapter is
written to translate between gNMI and the device’s or service’s native API.
For example, a gNMI → REST adapter exists to translate between the ROC’s modeling and the Aether Connectivity
Control (SD-Core) components. The adapter is not necessarily only a syntactic translation, but may also be a
semantic translation. [1]_
This supports a logical decoupling of the models stored in the ROC and the interface used by the southbound
device/service, allowing the southbound device/service and the ROC to evolve independently.
It also allows for southbound devices/services to be replaced without affecting the northbound interface.

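The difference between syntactic and semantic translation can be shown with a small sketch. Everything here
is hypothetical: the URL, the field names, and the unit conversion are invented for illustration and do not
describe the real SD-Core adapter.

```python
def translate_update(update, base_url="http://sd-core.example/api"):
    """Turn one gNMI-style update into a (method, url, payload) REST call."""
    kind, name = update["path"]
    payload = {
        "name": name,
        # Semantic step: the model stores Mbps, the target is assumed to expect bps.
        "max_bitrate_bps": update["value"]["max-bitrate-mbps"] * 1_000_000,
    }
    # Syntactic step: path elements become URL segments.
    return "PUT", f"{base_url}/{kind}/{name}", payload
```
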
Workflow Engine
"""""""""""""""

The workflow engine, to the left of the aether-config stack, is where multi-step workflows may be implemented.
At this time we do not have these workflows, but from our experience with SEBA/VOLTHA we learned that workflows
became a key aspect of the implementation.
For example, SEBA had a state machine surrounding how devices were authorized, activated, and deactivated.
The workflow engine is a placeholder where workflows may be implemented in Aether as they are required.

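A device state machine of the kind mentioned for SEBA might look like the sketch below. Only the three verbs
(authorize, activate, deactivate) come from the text; the state names, the starting state, and the allowed
transitions are assumptions.

```python
# Hypothetical lifecycle; only the three verbs come from the SEBA example.
TRANSITIONS = {
    ("discovered", "authorize"): "authorized",
    ("authorized", "activate"): "activated",
    ("activated", "deactivate"): "deactivated",
    ("deactivated", "activate"): "activated",
}


def step(state, event):
    """Return the next state, or raise if the event is not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot {event} a device that is {state}") from None


def run(events, state="discovered"):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state
```
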
Another use of the workflow engine may be to translate between levels in modeling.
For example, the workflow engine may examine the high-level Enterprise modeling and make changes to the Operations
modeling to achieve the Enterprise behavior.

Previously this component was referred to as “onos-ztp”.
It is expected that a workflow engine would both read and write the aether-config data model, as well as respond to
external events.

Analytics Engine
""""""""""""""""

The analytics engine, to the right of the aether-config stack, is where enrichment of analytics will be performed.
Raw metrics and logs are collected with the open source components Grafana/Prometheus and Elastic Stack.
Those metrics might need additional transformation before they can be presented to Enterprise users, or in some
cases even before they are presented to the Ops team.
The analytics engine would be a place where those metrics could be transformed or enriched, and then written back
to Prometheus or Elastic (or forwarded as alerts).

The analytics engine is also where analytics would be related to config models in aether-config, in order for
Enterprise or Operations personnel to take action in response to data and insights received through analytics.
Action doesn’t necessarily have to involve humans.
It is expected that the combination of the analytics engine and the workflow engine could automate a response.

The analytics engine also provides an opportunity to implement access control for the telemetry API.
Prometheus itself is not multi-tenant and does not support fine-grained access controls.

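The two roles described above, enrichment and per-tenant access control, can be combined in a few lines. This
is a sketch under assumptions: the sample layout and the ``enterprise``, ``used_mbps``, and ``capacity_mbps``
fields are invented here and are not a Prometheus or Aether schema.

```python
def enrich(sample):
    """Add a derived utilization percentage to a raw throughput sample."""
    enriched = dict(sample)
    enriched["utilization_pct"] = round(100 * sample["used_mbps"] / sample["capacity_mbps"], 1)
    return enriched


def samples_for(enterprise, samples):
    """Return enriched samples, limited to those labeled with the given enterprise."""
    return [enrich(s) for s in samples if s["enterprise"] == enterprise]
```

Filtering in a layer in front of the metrics store is one way to give each enterprise a restricted view when
the store itself has no notion of tenants.
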
Aether Operator
"""""""""""""""

Not pictured in the diagram is the ONOS Operator, which is responsible for configuring the models within
aether-config. Models to load are specified by a Helm chart.
The operator compiles them on demand and incorporates them into aether-config.
This eliminates dynamic load compatibility issues that were previously a problem with building models and
aether-config separately. Operators are considered a best practice in Kubernetes.

Modules are loaded into the process primarily for performance and simplicity reasons.
The design team has had experience with other systems (for example, VOLTHA and XOS) where modules were decoupled
and message buses introduced between them, but that can lead to both complexity issues and performance bottlenecks
in those systems. The same module and operator pattern will be applied to aether-roc-api.

Aether Modeling
---------------

There is no fixed distinction between high-level and low-level modeling in the ROC.
There is one set of Aether modeling that might have customer-facing and internal-facing aspects.

.. image:: images/aether-highlevel.svg

The above diagram is an example of how a single set of models could serve both high-level and low-level needs and
is not necessarily identical to the current implementation.
For example, App and Service are concepts that are necessarily enterprise-facing.
UPFs are concepts that are operator-facing.
A UPF might be used by a Service, but the customer need not be aware of this detail.
Similarly, some objects might be partially customer-facing and partially operator-facing.
For example, a Radio is a piece of hardware the customer has deployed on their premises, so they must know of it,
but the configuration details of the radio (signal strength, IP address, etc.) are operator-facing.

An approximation of the current Aether-3.0 (Release 1.5) modeling is presented below:

.. image:: images/aether-3.0-models.svg

The key Enterprise-facing abstractions are Application, Virtual Cellular Service (VCS), and DeviceGroup.

Identity Management
-------------------

The ROC leverages an external identity database (i.e., an LDAP server) to store user data such as account names
and passwords for users who are able to log in to the ROC.
This LDAP server can also associate users with groups; for example, adding ROC administrators to
ONFAetherAdmin would be a way to grant those people administrative privileges within the ROC.

An external authentication service (Dex) is used to authenticate the user, handling the mechanics of accepting the
password, validating it, and securely returning the group the user belongs to.
The group identifier is then used to grant access to resources within the ROC.

The ROC leverages Open Policy Agent (OPA) as a framework for writing access control policies.

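The step from group identifier to access decision can be sketched as a lookup table. This is a Python
illustration of the idea, not an OPA policy and not the ROC's rules: ``ONFAetherAdmin`` is the group named
above, while ``EnterpriseViewer`` and the action names are made up.

```python
# Group-to-permission table. ONFAetherAdmin comes from the text above;
# EnterpriseViewer and the action names are made up for this sketch.
POLICY = {
    "ONFAetherAdmin": {"read", "write"},
    "EnterpriseViewer": {"read"},
}


def is_allowed(groups, action):
    """Allow the action if any of the user's groups grants it."""
    return any(action in POLICY.get(group, set()) for group in groups)
```

Unknown groups grant nothing, so a user whose groups are not listed is denied by default.
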
Securing Machine-to-Machine Communications
------------------------------------------

gNMI naturally lends itself to mutual TLS for authentication, and that is the recommended way to secure
communications between components that speak gNMI.
For example, the communication between aether-config and its adapters uses gNMI and therefore uses mutual TLS.
Distributing certificates between components is a problem outside the scope of the ROC.
It’s assumed that another tool will be responsible for distribution, renewing certificates before they expire, etc.

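What makes TLS “mutual” is that the server also demands and verifies a client certificate. The sketch below
shows that one setting using Python's standard ``ssl`` module. It is a generic illustration rather than ROC
code; the function name and its parameters are assumptions, and certificate paths are left to the caller
because distribution is out of scope here.

```python
import ssl


def mutual_tls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that rejects clients without a valid certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.verify_mode = ssl.CERT_REQUIRED  # this is what makes the TLS mutual
    if cert_file:
        context.load_cert_chain(cert_file, key_file)  # the server's own identity
    if ca_file:
        context.load_verify_locations(cafile=ca_file)  # CA that signs client certificates
    return context
```
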
For components that speak REST, HTTPS is used to secure the connection, and authentication can take place using
mechanisms within the HTTPS protocol (basic auth, tokens, etc.).
OAuth2 and OpenID Connect are leveraged as an authorization provider when using these REST APIs.

.. [1]
   Adapters are an ad hoc approach to implementing the workflow engine,
   where they map models onto models, including the appropriate semantic
   translation. This is what we originally did in XOS, but we prefer a
   more structured approach for ROC.