=============================================
VOLTHA and ONOS software update procedures
=============================================

This document describes the software upgrade procedure for VOLTHA and ONOS in a deployed system.
A distinction is made between a `minor` software upgrade, which can be done in service,
meaning with no dataplane service interruption to existing customers, and a `major` software upgrade,
which in turn requires a full maintenance window during which service is impacted.

Changes to data structures in storage (ETCD for VOLTHA and Atomix for ONOS) are out of scope for in-service upgrades.
Such changes qualify as major software upgrades that require a maintenance window.
The KAFKA bus update has its own section given that the procedure is different from the rest of the components.
The following procedures expect a fully working provisioned VOLTHA and ONOS deployment on top of a Kubernetes cluster,
with exposed ONOS REST API ports.
It is also expected that new versions of the different components are available to the operator who performs
the upgrade.

Minor Software Version Update
=============================
The `minor` software upgrade qualifier refers to an upgrade that does not involve API
changes, which in VOLTHA means a change to either the protos or voltha-lib-go,
and in ONOS a change to the Java interfaces, CLI commands or REST APIs of either the apps or the platform.
A `minor` software update is intended for bug fixes and not for new features.
A `minor` software update is supported only for ONOS apps and VOLTHA components. No in-service software update
is supported for ETCD or Kafka.

VOLTHA services
---------------
The `minor` software upgrade of VOLTHA components leverages `helm` and `k8s`.
During this process it is expected that no provision subscriber call is executed from the northbound.
In-progress calls will be completed thanks to the stored data and/or the persistence of messages over KAFKA.

After changes in the code are made and verified, the following steps are needed:

#. Update the minor version of the component.
#. Build a new version of the component to update.
#. Update the component's minor version in the helm chart.
#. | Issue the helm upgrade command. If the changes have already been upstreamed to ONF, the upstream chart
   | `onf/<component name>` can be used, otherwise a local copy of the chart is required.

Following is an example of the `helm` command to upgrade the openonu adapter.
Topics, kv store paths and kafka endpoints need to be adapted to the specific deployment.

.. code:: bash

    helm upgrade --install --create-namespace \
      -n voltha1 openonu-adapter onf/voltha-adapter-openonu \
      --set global.stack_name=voltha1 \
      --set adapter_open_onu.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set adapter_open_onu.topics.core_topic=voltha1_voltha1_rwcore \
      --set adapter_open_onu.topics.adapter_open_onu_topic=voltha1_voltha1_brcm_openomci_onu \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc

ONOS apps
---------
A `minor` software update is also available for the following ONOS apps: `sadis`, `olt`, `aaa`, `kafka`, `dhcpl2relay`,
`mac-learning`, `igmpproxy`, and `mcast`. These apps can thus be updated with no impact on the dataplane of provisioned
subscribers. The `minor` software update for the ONOS apps leverages existing ONOS REST APIs.

During this process it is expected that no provision subscriber call is executed from the REST APIs.
In-progress calls will be completed thanks to the flows stored in Atomix.
Some metrics and/or packet processing might be lost during this procedure. The system relies on retry mechanisms
present in the services and the dataplane protocols for converging to a stable state (e.g. DHCP retry).

After changes in the code of ONOS apps are made and verified, the following steps are needed:

#. | Obtain the `.oar` of the app, either via a local build with `mvn clean install` or, if the code has been upstreamed,
   | by downloading it from `maven central <https://search.maven.org/search?q=g:org.opencord>`_ or sonatype.
#. Delete the old version of the ONOS app.
#. Upload, install and activate the new `.oar` file.

Following is an example of the different `curl` commands to upgrade the olt app. This assumes the `.oar` is present in
the directory from which the commands are executed.

.. code:: bash

    # download the app
    curl --fail -sSL https://oss.sonatype.org/content/groups/public/org/opencord/olt-app/4.5.0-SNAPSHOT/olt-app-4.5.0-20210504.162620-3.oar > org.opencord.olt-4.5.0.SNAPSHOT.oar
    # delete the app
    curl --fail -sSL -X DELETE http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt
    # install and activate the new version of the app
    curl --fail -sSL -H Content-Type:application/octet-stream -X POST 'http://karaf:karaf@127.0.0.1:8181/onos/v1/applications?activate=true' --data-binary @org.opencord.olt-4.5.0.SNAPSHOT.oar 2>&1

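After the new `.oar` is uploaded, it can be useful to confirm that ONOS reports the app as active again.
The sketch below is a minimal helper and not part of the ONOS tooling. It assumes that the JSON returned by
`GET /onos/v1/applications/<app name>` carries a `state` field whose value is `ACTIVE` for a running app;
the function names `onos_app_state` and `onos_app_is_active` are hypothetical.

```shell
# Extract the value of the "state" field from an ONOS application
# JSON document read on stdin (assumed shape: {..., "state": "ACTIVE", ...}).
onos_app_state() {
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Return success only if the JSON on stdin reports the app as ACTIVE.
onos_app_is_active() {
    [ "$(onos_app_state)" = "ACTIVE" ]
}
```

With the endpoint used in the example above, the check could be run as
`curl --fail -sSL http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt | onos_app_is_active`,
which exits with a non-zero status when the app is not active.
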
Minor Software Version Rollback Due To Failure
----------------------------------------------

A `minor` software upgrade can result in failures and broken functionality. There are two possible cases: (1) the container
does not start, (2) functionality is broken during operations.

VOLTHA Component updated container does not start
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is automatically handled by Kubernetes. The old version of the pod is not
terminated until the new one is running and ready according to its readiness probe.
No system or data-plane functionality is impacted.

The operator needs to manually delete the failing pod, fix the issue and re-deploy the
corrected `minor` version.

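To spot a new pod that never becomes ready, the output of `kubectl get pods` can be filtered.
The following is a minimal sketch, assuming the default column layout (NAME, READY, STATUS, ...) printed by
`kubectl get pods --no-headers`; the helper name `not_ready_pods` is hypothetical.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names
# of pods that are not fully ready: STATUS is not "Running", or the READY
# column n/m has n different from m.
not_ready_pods() {
    awk '{
        split($2, ready, "/")
        if ($3 != "Running" || ready[1] != ready[2]) print $1
    }'
}
```

For a stack deployed in the `voltha1` namespace this could be used as
`kubectl -n voltha1 get pods --no-headers | not_ready_pods`. Pods of finished jobs (status `Completed`)
are listed as well and can be ignored.
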
VOLTHA Component Broken functionality during operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this case the container started and became `ready` in Kubernetes but functionality of the system or data-plane
is broken, e.g. a subscriber can't be provisioned or no traffic is flowing.

The operator needs to intervene manually,
rolling back to the previous minor version of the container. The rollback operation is the same as a `minor` software
update via `helm`, but instead of increasing the version number it is decremented to the last known working one.

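As an alternative to re-running `helm upgrade` with the previous version, `helm` keeps a revision history per
release that can serve the same purpose. The sketch below only prints the commands instead of running them.
Note that it swaps in `helm history` and `helm rollback` for the procedure described in this section;
the function name `helm_rollback_plan` and its arguments are hypothetical placeholders.

```shell
# Print the helm commands to inspect and roll back a release.
# Arguments: namespace, release name, optional revision number.
# Without a revision, `helm rollback` goes back to the previous release.
helm_rollback_plan() {
    ns="$1"
    release="$2"
    revision="${3:-}"
    printf 'helm history -n %s %s\n' "$ns" "$release"
    if [ -n "$revision" ]; then
        printf 'helm rollback -n %s %s %s\n' "$ns" "$release" "$revision"
    else
        printf 'helm rollback -n %s %s\n' "$ns" "$release"
    fi
}
```

Calling `helm_rollback_plan voltha1 my-release 3` prints the two commands for revision 3 of a release named
`my-release`; the output can be reviewed before it is executed.
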
ONOS app not starting or broken functionality
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For ONOS apps a manual intervention is always necessary, whether the app does not start or functionality is broken.
The rollback of an ONOS application is done by following the same procedure as the
update, using the previous, or last known working, version of the `.oar` file.

Inter-dependency among changes submitted in different Components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Even though a minor version upgrade is expected to be seamless,
there is still a chance that the changes made to one component depend on changes in other components.
In this case the operator needs to intervene manually
and upgrade the components in the required order.

Major Software Version Update
=============================
A software update qualifies as `major` when there are changes in the APIs or in the format of the
data stored by a component.

At the moment, a major software update in VOLTHA and ONOS requires a maintenance window
during which the dataplane for the subscribers is interrupted, thus no service is provided.
There are several cases, and they are handled differently.

VOLTHA services API or Data format changes
------------------------------------------
A `major` update is needed because the VOLTHA API between components has changed or because the format of the data being
stored is different, thus a complete wipe-out needs to be performed.
In such a scenario each stack can be updated independently, with no teardown required of the ONOS,
ETCD and KAFKA infrastructure.
Different versions of VOLTHA can co-exist over the same infrastructure.

The procedure is iterative on each stack and is performed as follows:

#. Un-provision all the subscribers via the ONOS REST API.
#. Delete all the OLTs managed by the stack via the VOLTHA gRPC API.
#. Upgrade the stack version via the `helm` upgrade command and the correct version of the `voltha-stack` chart.

Details on the `helm` commands can be found in the `voltha-helm-charts README file <voltha-helm-charts/README.md>`_.

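The per-stack procedure above can be sketched as a helper that prints the commands in order, so that they can be
reviewed before being run. This is only an illustration under several assumptions: the subscriber removal
endpoint `DELETE /onos/olt/oltapp/<device>/<port>` of the olt app, the use of `voltctl` as the client for the
VOLTHA gRPC API, and the function name `major_upgrade_plan` are not taken from this document and need to be
checked against the deployed versions. The `--set` values required by the deployment still have to be appended
to the printed `helm` command.

```shell
# Print the command sequence for the major upgrade of one VOLTHA stack.
# Arguments: ONOS REST base URL, OpenFlow device id known to ONOS,
#            OLT device id known to VOLTHA, stack namespace, helm release
#            name, then the list of subscriber port numbers.
major_upgrade_plan() {
    onos_url="$1"
    of_device="$2"
    olt_id="$3"
    ns="$4"
    release="$5"
    shift 5
    # Step 1: un-provision every subscriber (assumed olt app endpoint).
    for port in "$@"; do
        printf 'curl --fail -sSL -X DELETE %s/onos/olt/oltapp/%s/%s\n' \
            "$onos_url" "$of_device" "$port"
    done
    # Step 2: delete the OLT managed by the stack.
    printf 'voltctl device delete %s\n' "$olt_id"
    # Step 3: upgrade the stack with the voltha-stack chart.
    printf 'helm upgrade --install --create-namespace -n %s %s onf/voltha-stack\n' \
        "$ns" "$release"
}
```
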
If the API change is between the `openolt adapter` and the `openolt agent` on the OLT hardware, please refer to section
:ref:`OpenOLT Agent Update <openolt-update>`.

ONOS, Atomix or ONOS apps
-------------------------
A `major` update is needed because changes have been made to the interfaces (Java APIs) or REST APIs of ONOS itself or of one
of the apps, rendering two subsequent implementations incompatible. A `major` software update is
also needed for changes made to the data stored in Atomix or for an update of the Atomix version itself.
In this scenario all the stacks connected to an ONOS instance need to be cleaned of data before moving them
over to a new ONOS cluster.

The procedure is as follows:

#. Deploy a new ONOS cluster in a new namespace `infra1`.
#. Un-provision all the subscribers via the ONOS REST API.
#. Delete the OLT device (not strictly required, but best to ensure a clean state).
#. Redeploy the of-agent with the new ONOS cluster endpoints.
#. Re-provision the OLT.
#. Re-provision the subscribers.
#. Iterate over steps 2, 3, 4, 5 and 6 for each of the stacks connected to the ONOS instance you want to update.

Following is an example of how to deploy ONOS:

.. code:: bash

    helm install --create-namespace \
      --set replicas=3,atomix.replicas=3 \
      --set atomix.persistence.enabled=false \
      --set image.pullPolicy=Always,image.repository=voltha/voltha-onos,image.tag=5.0.0 \
      --namespace infra1 onos onos/onos-classic

Following is an example of how to re-deploy the of-agent, using the `voltha-stack` chart,
pointing it to the new controller endpoints. Only the `ofagent` pod will be restarted.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack

ETCD
----
A `major` update is needed because tearing down the ETCD cluster means deleting the stored data,
which then has to be rebuilt by the different components.

The procedure is as follows:

#. Deploy a new ETCD cluster.
#. Un-provision all the subscribers via the ONOS REST API.
#. Delete the OLT device (not strictly required, but best to ensure a clean state).
#. Redeploy the voltha stack with the `voltha-stack` `helm` chart, pointing it to the new ETCD endpoints.
#. Re-provision the OLT.
#. Re-provision the subscribers.
#. Iterate over steps 2, 3, 4, 5 and 6 for each stack connected to the ETCD cluster you want to update.

Details on the `helm` commands for the voltha stack can be found in the `voltha-helm-charts README file <../voltha-helm-charts/README.md>`_.

Following is an example of how to deploy a new 3-node ETCD cluster:

.. code:: bash

    helm install --create-namespace --set auth.rbac.enabled=false,persistence.enabled=false,statefulset.replicaCount=3 --namespace infra etcd bitnami/etcd

KAFKA Update
============
An update of Kafka is not considered to be a `major` software upgrade because it can be performed with
no service impact to the user.

Following is an example of how to deploy a new Kafka cluster:

.. code:: bash

    helm install --create-namespace --set global.log_level=WARN --namespace infra kafka bitnami/kafka

Following is an example of how to re-deploy the stack pods, using the `voltha-stack` chart,
pointing them to the new kafka (`voltha-infra-kafka-2.infra.svc`) endpoints.
Each pod will be restarted, but without dataplane interruption because this is equivalent to a pod restart,
which leverages the data stored in ETCD.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka-2.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka-2.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha


.. _openolt-update:

OpenOLT Agent Update
====================

The `openolt agent` on the box can be upgraded without having to tear down the whole VOLTHA stack to which the OLT is
connected. Here again a distinction is made between a minor update and a major update of the openolt agent.
A minor update happens when there is no API change between the `openolt agent` and the `openolt adapter`, meaning the
`openolt.proto` has not been updated in either of those components.
A major update is required when there are changes to the `openolt.proto` API.

Both updates of the OpenOLT agent are service impacting for the customer.

Minor Update
------------
A minor update will be seen from VOLTHA as a reboot of the OLT.
During a minor update of the openolt agent no northbound operation should be performed; in-progress provision calls will
reconcile upon OLT reboot. Events, metrics and performance measurement data can be lost and should not be expected
during this procedure.
The procedure is as follows:

#. Place the new openolt agent `.deb` package on the desired OLT.
#. Stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. Run the new openolt packages.
#. Reboot the OLT hardware.

After these steps are done VOLTHA will receive the OLT connection again and re-provision data accordingly.

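The minor update steps can be sketched as a helper that prints the commands for one OLT without running them.
Everything specific here is an assumption made for illustration: SSH access to the OLT, the `/tmp` staging
directory, stopping processes through `service <name> stop`, installing the package with `dpkg -i`, and the
function name `openolt_minor_update_plan`. The exact commands depend on the OLT platform and on how the agent
is packaged.

```shell
# Print the command sequence for a minor openolt agent update on one OLT.
# Arguments: SSH target of the OLT (user@host), path to the new .deb,
#            then the names of the processes to stop before installing.
openolt_minor_update_plan() {
    olt="$1"
    deb="$2"
    shift 2
    pkg="$(basename "$deb")"
    # Step 1: place the new package on the OLT.
    printf 'scp %s %s:/tmp/%s\n' "$deb" "$olt" "$pkg"
    # Step 2: stop the running processes, in the order given.
    for svc in "$@"; do
        printf 'ssh %s service %s stop\n' "$olt" "$svc"
    done
    # Step 3: install the new package.
    printf 'ssh %s dpkg -i /tmp/%s\n' "$olt" "$pkg"
    # Step 4: reboot the OLT hardware.
    printf 'ssh %s reboot\n' "$olt"
}
```

For example, `openolt_minor_update_plan root@192.0.2.10 ./openolt.deb openolt` prints the plan for stopping only
the `openolt` process; the other process names from step 2 can be appended as extra arguments.
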
Major update
------------
A major update requires the OLT to be deleted from VOLTHA to ensure no inconsistent data is stored.
During a major update of the openolt agent and adapter no northbound operation should be performed, and
in-progress calls will fail. Events, metrics and performance measurement data will be lost.
The procedure is as follows:

#. Delete the OLT device from VOLTHA (e.g. `voltctl device delete <olt_id>`).
#. Upgrade the openolt-adapter to the new version via `helm upgrade`.
#. Place the new openolt agent `.deb` package on the desired OLT.
#. Stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. Run the new openolt packages.
#. Reboot the OLT hardware.
#. Re-provision the OLT (e.g. `voltctl device provision <ip:port>`).
#. Re-enable the OLT (e.g. `voltctl device enable <olt_id>`).
#. Re-provision the subscribers.

After these steps VOLTHA effectively treats the OLT as a brand new one of which it had no prior knowledge.