=============================================
VOLTHA and ONOS software update procedures
=============================================

This document describes the software upgrade procedure for VOLTHA and ONOS
in a deployed system. A distinction is made between a `minor` software upgrade,
which can be done in service, meaning with no dataplane service interruption
to existing customers, and a `major` software upgrade, which in turn requires
a full maintenance window during which service is impacted.

Changes to data structures in storage (ETCD for VOLTHA and Atomix for ONOS)
are out of scope for in-service upgrades. Such changes qualify as major
software upgrades that require a maintenance window. The KAFKA bus update
has its own section given that the procedure is different from the rest of
the components. The procedures below assume a fully working, provisioned
VOLTHA and ONOS deployment on top of a Kubernetes cluster, with exposed ONOS
REST API ports. It is also expected that new versions of the different
components are available to the operator who performs the upgrade.

Minor Software Version Update
=============================

The `minor` software upgrade qualifier refers to an upgrade that does not
involve API changes. In VOLTHA, an API change is a change to either the protos
or voltha-lib-go; in ONOS, it is a change to the Java interfaces, CLI
commands or REST APIs of either the apps or the platform. A `minor` software
update is intended for bug fixes and not for new features. A `minor` software
update is supported only for ONOS apps and VOLTHA components. No in-service
software update is supported for ETCD or Kafka.

VOLTHA services
---------------

The `minor` software upgrade of VOLTHA components leverages `helm` and `k8s`.
During this process it is expected that no subscriber provisioning call is
executed from the northbound. In-progress calls will complete thanks to
the stored data and/or the persistence of messages over KAFKA.

After changes in the code are made and verified, the following steps are needed:

#. Update the minor version of the component.
#. Build a new version of the component to update.
#. Update the component's minor version in the helm chart.
#. Issue the `helm upgrade` command. If the changes have already been upstreamed to ONF, the upstream chart
   `onf/<component name>` can be used; otherwise a local copy of the chart is required.

The following is an example of the `helm` command to upgrade the openonu adapter.
Topics, kv store paths and kafka endpoints need to be adapted to the specific deployment.

.. code:: bash

    helm upgrade --install --create-namespace \
      -n voltha1 opeonu-adapter onf/voltha-adapter-openonu \
      --set global.stack_name=voltha1 \
      --set adapter_open_onu.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set adapter_open_onu.topics.core_topic=voltha1_voltha1_rwcore \
      --set adapter_open_onu.topics.adapter_open_onu_topic=voltha1_voltha1_brcm_openomci_onu \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc
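
After the upgrade command returns, it can be useful to confirm that the new pods actually rolled out.
The following is a minimal sketch and not part of the documented procedure: it assumes `kubectl` access to the
cluster and that the components run as Kubernetes Deployments in the stack namespace (`voltha1`, as in the
example above). The function name `wait_for_rollout` is hypothetical.

.. code:: bash

    # Hypothetical helper: wait until every Deployment in a namespace has
    # finished rolling out. Assumes kubectl is configured for the cluster.
    wait_for_rollout() {
        local namespace="$1"
        local timeout="${2:-300s}"
        local deployment
        # "kubectl get -o name" prints one "deployment.apps/<name>" per line.
        for deployment in $(kubectl -n "$namespace" get deployments -o name); do
            kubectl -n "$namespace" rollout status "$deployment" --timeout="$timeout" || return 1
        done
    }

For example, `wait_for_rollout voltha1 120s` returns a non-zero status if any Deployment fails to become
available within two minutes.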

ONOS apps
---------
A `minor` software update is also available for the following ONOS apps: `sadis`, `olt`, `aaa`, `kafka`, `dhcpl2relay`,
`mac-learning`, `igmpproxy`, and `mcast`. These apps can thus be updated with no impact on the dataplane of provisioned
subscribers. The `minor` software update for the ONOS apps leverages existing ONOS REST APIs.

During this process it is expected that no subscriber provisioning call is
executed from the REST APIs. In-progress calls will complete thanks to
the flows stored in Atomix. Some metrics and/or packet processing might be
lost during this procedure; the system relies on retry mechanisms present in
the services and the dataplane protocols to converge to a stable state
(e.g. DHCP retry).

After changes in the code of ONOS apps are made and verified, the following steps are needed:

#. Obtain the `.oar` of the app, either via a local build with `mvn clean install` or, if the code has been upstreamed,
   by downloading it from `maven central <https://search.maven.org/search?q=g:org.opencord>`_ or sonatype.
#. Delete the old version of the ONOS app.
#. Upload, install and activate the new `.oar` file.

The following is an example of the `curl` commands used to upgrade the olt app. The install step assumes the `.oar`
file is present in the directory from which the command is executed.

.. code:: bash

    # download the app
    curl --fail -sSL https://oss.sonatype.org/content/groups/public/org/opencord/olt-app/4.5.0-SNAPSHOT/olt-app-4.5.0-20210504.162620-3.oar > org.opencord.olt-4.5.0.SNAPSHOT.oar
    # delete the app
    curl --fail -sSL -X DELETE http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt
    # install and activate the new version of the app
    # (the URL is quoted so that the shell does not treat '?' as a glob character)
    curl --fail -sSL -H Content-Type:application/octet-stream -X POST 'http://karaf:karaf@127.0.0.1:8181/onos/v1/applications?activate=true' --data-binary @org.opencord.olt-4.5.0.SNAPSHOT.oar 2>&1
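
Once the new `.oar` has been uploaded, the state reported by ONOS can be checked before resuming northbound
operations. The sketch below is an illustration only: it assumes the same ONOS REST endpoint and `karaf`
credentials as the example above, and that the application resource contains a `state` field. The function name
`onos_app_state` and the `ONOS_URL` variable are hypothetical.

.. code:: bash

    # Hypothetical helper: print the state (e.g. ACTIVE or INSTALLED) that ONOS
    # reports for an application. ONOS_URL is an assumed variable name.
    onos_app_state() {
        local app="$1"
        local base="${ONOS_URL:-http://127.0.0.1:8181}"
        # Extract the value of the "state" field from the JSON reply.
        curl --fail -sS -u karaf:karaf "$base/onos/v1/applications/$app" \
            | sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
    }

A check such as `[ "$(onos_app_state org.opencord.olt)" = "ACTIVE" ]` can then gate the next step of the upgrade.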

Minor Software Version Rollback Due To Failure
----------------------------------------------

A `minor` software upgrade can result in failures and broken functionality. There are two possible cases:
the container does not start, or functionality is broken during operations.

VOLTHA Component updated container does not start
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is automatically handled by Kubernetes. The old version of the pod is not
terminated until the new one is running and ready according to its readiness probe.
No system or data-plane functionality is impacted.

The operator needs to manually delete the failing pod, fix the issue, and re-deploy
the fixed `minor` version.
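
Before deleting the failing pod it usually helps to capture why it did not start. The following is a minimal
sketch using standard `kubectl` commands; the function name `inspect_pod` is hypothetical, and the namespace and
pod name must be taken from the actual deployment.

.. code:: bash

    # Hypothetical helper: collect diagnostics for a pod that fails to start.
    inspect_pod() {
        local namespace="$1"
        local pod="$2"
        # Events (image pull errors, failed probes) are listed at the end of "describe".
        kubectl -n "$namespace" describe pod "$pod"
        # Logs of the previous container instance, useful when the pod is crash-looping.
        # This command fails if the container has never been restarted.
        kubectl -n "$namespace" logs "$pod" --previous --tail=100
    }

The pod can then be removed with `kubectl -n <namespace> delete pod <pod name>`.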

VOLTHA Component Broken functionality during operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this case the container started and became `ready` in Kubernetes, but functionality of the system or data-plane
is broken, e.g. a subscriber can't be provisioned or no traffic is flowing.

The operator needs to intervene manually,
rolling back to the previous minor version of the container. The rollback operation is the same as a `minor` software
update via `helm`, except that the version number is decreased to the last known working one instead of being increased.
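
As an alternative to re-running the upgrade with an older version, `helm` itself keeps a revision history per
release and can restore an earlier revision. This is a sketch under the assumption that the previous release
revision corresponds to the last known working version; it is not the procedure described above, and the function
name `rollback_release` is hypothetical.

.. code:: bash

    # Hypothetical helper: show recent revisions of a release, then roll it back.
    rollback_release() {
        local namespace="$1"
        local release="$2"
        local revision="${3:-}"
        # Show the most recent revisions so the target can be double-checked.
        helm -n "$namespace" history "$release" --max 5
        # Without an explicit revision, helm rolls back to the previous one.
        helm -n "$namespace" rollback "$release" ${revision:+"$revision"} --wait
    }

For example, `rollback_release voltha1 opeonu-adapter` restores the revision that was deployed before the
failed update of the release used in the earlier example.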

ONOS app not starting or broken functionality
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For ONOS apps a manual intervention is always necessary, whether the app does not start or functionality is broken.
The rollback of an ONOS application is done by following the same procedure as the
update, using the previous, or last known working, version of the `.oar` file.

Inter-dependency among changes submitted in different Components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Even though a minor version upgrade is expected to be seamless,
the changes made to one component may depend on changes made to other components.
In this case the operator needs to intervene manually
and upgrade the components in the required order.

Major Software Version Update
=============================
A software update qualifies as `major` when there are changes in the APIs or in the format of the
data stored by a component.

At the moment, a major software update in VOLTHA and ONOS requires a maintenance window
during which the dataplane for the subscribers is interrupted, thus no service is provided.
There are several cases, and they can be handled differently.

VOLTHA services API or Data format changes
------------------------------------------
A `major` update is needed when the VOLTHA API between components has changed or when the format of the stored
data is different, so that a complete wipe-out needs to be performed.
In this scenario each stack can be updated independently, with no teardown required of the ONOS, ETCD and KAFKA
infrastructure.
Different versions of VOLTHA can co-exist on the same infrastructure.

The procedure is iterative on each stack and is performed as follows:

#. un-provision all the subscribers via ONOS REST API.
#. delete all the OLTs managed by the stack via VOLTHA gRPC API.
#. upgrade the stack version via the `helm upgrade` command and the correct version of the `voltha-stack` chart.
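
The first two steps can be sketched as shell helpers. This is an illustration under several assumptions, all of
which should be checked against the deployed versions: that the ONOS olt app exposes a subscriber resource at
`/onos/olt/oltapp/<device>/<port>`, that `voltctl` is configured for the stack being upgraded, and that
`voltctl device list` accepts the `--filter` and `-q` options used here. Depending on the VOLTHA version an OLT
may also need to be disabled before it can be deleted. The function names and the `ONOS_URL` variable are
hypothetical.

.. code:: bash

    # Hypothetical helper for step 1: un-provision one subscriber through ONOS.
    unprovision_subscriber() {
        local device="$1"   # ONOS device id, e.g. of:00000a0a0a0a0a0a
        local port="$2"     # UNI port number of the subscriber
        local base="${ONOS_URL:-http://127.0.0.1:8181}"
        # The ':' in the device id is percent-encoded for use in the URL path.
        curl --fail -sSL -u karaf:karaf -X DELETE \
            "$base/onos/olt/oltapp/${device//:/%3A}/$port"
    }

    # Hypothetical helper for step 2: delete every OLT known to the stack.
    delete_all_olts() {
        local id
        # "-q" is assumed to print device ids only; the filter keeps OLT devices.
        for id in $(voltctl device list --filter Type~openolt -q); do
            voltctl device delete "$id"
        done
    }

Running `unprovision_subscriber` for every provisioned port and then `delete_all_olts` leaves the stack ready for
the `helm upgrade` in step 3.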

Details on the `helm` commands can be found in the `voltha-helm-charts README file <voltha-helm-charts/README.md>`__.

If the API change is between the `openolt adapter` and the `openolt agent` on the OLT hardware, please refer to section
:ref:`OpenOLT Agent Update <openolt-update>`.


ONOS, Atomix or ONOS apps
-------------------------
A `major` update is needed when changes have been made to the interfaces (Java APIs) or REST APIs of ONOS itself or of
one of the apps, making two subsequent implementations incompatible. A `major` software update is
also needed for changes made to the data stored in Atomix or for an update of the Atomix version itself.
In this scenario all the stacks connected to an ONOS instance need to be cleaned of data before moving them
over to a new ONOS cluster.

The procedure is as follows:

#. deploy a new ONOS cluster in a new namespace `infra1`
#. un-provision all the subscribers via ONOS REST API
#. delete the OLT device (not strictly required, but best to ensure clean state)
#. redeploy the of-agent with the new ONOS cluster endpoints
#. re-provision the OLT
#. re-provision the subscribers
#. iterate over steps 2, 3, 4, 5 and 6 for each of the stacks connected to the ONOS you want to update.

The following is an example of how to deploy ONOS:

.. code:: bash

    helm install --create-namespace \
      --set replicas=3,atomix.replicas=3 \
      --set atomix.persistence.enabled=false \
      --set image.pullPolicy=Always,image.repository=voltha/voltha-onos,image.tag=5.0.0 \
      --namespace infra1 onos onos/onos-classic
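
Before any stack is moved over, it is worth confirming that the new cluster is actually up. The following is a
minimal sketch that waits for all pods in the new namespace to report `Ready`; it assumes `kubectl` access and
that the namespace (`infra1` above) contains only long-running pods, since completed job pods never become
`Ready`. The function name `wait_for_pods` is hypothetical.

.. code:: bash

    # Hypothetical helper: block until every pod in a namespace is Ready,
    # or fail once the timeout expires.
    wait_for_pods() {
        local namespace="$1"
        local timeout="${2:-300s}"
        kubectl -n "$namespace" wait --for=condition=Ready pod --all --timeout="$timeout"
    }

For example, `wait_for_pods infra1 600s` can be run right after the `helm install` above.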

The following is an example of how to re-deploy the of-agent, using the `voltha-stack` chart,
pointing to the new controller endpoints. Only the `ofagent` pod will be restarted.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack

ETCD
----
A `major` update is needed because tearing down the ETCD cluster deletes the stored data,
which the different components then have to rebuild.

The procedure is as follows:

#. deploy a new ETCD cluster.
#. un-provision all the subscribers via ONOS REST API
#. delete the OLT device (not strictly required, but best to ensure clean state)
#. redeploy the voltha stack with the `voltha-stack` `helm` chart pointing it to the new ETCD endpoints.
#. re-provision the OLT
#. re-provision the subscribers
#. iterate over steps 2, 3, 4, 5 and 6 for each stack connected to the ETCD cluster you want to update.

Details on the `helm` commands for the voltha stack can be found in the `voltha-helm-charts README file <../voltha-helm-charts/README.md>`_

The following is an example of how to deploy a new 3 node ETCD cluster:

.. code:: bash

    helm install --create-namespace --set auth.rbac.enabled=false,persistence.enabled=false,statefulset.replicaCount=3 --namespace infra etcd bitnami/etcd

KAFKA Update
============
An update of Kafka is not considered to be a `major` software upgrade because it can be performed with
no service impact to the user.

The following is an example of how to deploy a new Kafka cluster:

.. code:: bash

    helm install --create-namespace --set global.log_level=WARN --namespace infra kafka bitnami/kafka

The following is an example of how to re-deploy the stack pods, using the `voltha-stack` chart,
pointing to the new kafka (`voltha-infra-kafka-2.infra.svc`) endpoints.
Each pod will be restarted, but without dataplane interruption, because this is equivalent to a plain pod restart
that leverages the data stored in ETCD.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka-2.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka-2.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack


.. _openolt-update:

OpenOLT Agent Update
====================

The `openolt agent` on the box can be upgraded without having to tear down the whole VOLTHA stack to which the OLT is
connected. Here again a distinction is made between a minor update and a major update of the openolt agent.
A minor update happens when there is no API change between the `openolt agent` and the `openolt adapter`, meaning the
`openolt.proto` has not been updated in either of those components.
A major update is required when there are changes to the `openolt.proto` API.

Both updates of the OpenOLT agent are service impacting for the customer.

Minor Update
------------
A minor update will be seen from VOLTHA as a reboot of the OLT.
During a minor update of the openolt agent no northbound operation should be performed; in-progress provisioning
calls will reconcile upon OLT reboot. Events, metrics and performance measurement data can be lost and should not be
expected during this procedure.
The procedure is as follows:

#. place the new openolt agent `.deb` package on the desired OLT.
#. stop the running `openolt`, `dev_mgmnt_deamon` and optionally the `watchdog` processes on the OLT.
#. run the new openolt packages
#. reboot the OLT hardware.

After these steps are done VOLTHA will receive the OLT connection again and re-provision data accordingly.

Major update
------------
A major update requires the OLT to be deleted from VOLTHA to ensure no inconsistent data is stored.
During a major update of the openolt agent and adapter no northbound operation should be performed, and
in-progress calls will fail. Events, metrics and performance measurement data will be lost.
The procedure is as follows:

#. Delete the OLT device from VOLTHA (e.g. `voltctl device delete <olt_id>`)
#. Upgrade the openolt-adapter to the new version via `helm upgrade`.
#. place the new openolt agent `.deb` package on the desired OLT.
#. stop the running `openolt`, `dev_mgmnt_deamon` and optionally the `watchdog` processes on the OLT.
#. run the new openolt packages
#. reboot the OLT hardware.
#. re-provision the OLT (e.g. `voltctl device provision <ip:port>`)
#. re-enable the OLT (e.g. `voltctl device enable <olt_id>`)
#. re-provision the subscribers.

After these steps VOLTHA effectively treats the OLT as a brand new one of which it had no prior knowledge.