=============================================
VOLTHA and ONOS software update procedures
=============================================

This document describes the software upgrade procedure for VOLTHA and ONOS in a deployed system.
A distinction is made between a `minor` software upgrade, which can be done in service,
meaning with no dataplane service interruption for existing customers, and a `major` software upgrade,
which in turn requires a full maintenance window during which service is impacted.

Changes to data-structures in storage (ETCD for VOLTHA and Atomix for ONOS) are out of scope for in-service upgrades.
Such changes qualify as major software upgrades that require a maintenance window.
The KAFKA bus update has its own section, given that the procedure is different from the rest of the components.
The following sections expect a fully working, provisioned VOLTHA and ONOS deployment on top of a Kubernetes cluster,
with exposed ONOS REST API ports.
It is also expected that new versions of the different components are available to the operator performing
the upgrade.

Minor Software Version Update
=============================
The `minor` software upgrade qualifier refers to an upgrade that does not involve API
changes, which in VOLTHA refers to either a change to the protos or to voltha-lib-go,
and in ONOS to a change in the Java interfaces, CLI commands or REST APIs of either the Apps or the platform.
A `minor` software update is intended for bug fixes and not for new features.
`Minor` software updates are supported only for ONOS apps and VOLTHA components. No in-service software update
is supported for ETCD or Kafka.

VOLTHA services
---------------
The `minor` software upgrade of VOLTHA components leverages `helm` and `k8s`.
During this process it is expected that no subscriber provisioning calls are executed from the northbound.
In-progress calls will complete thanks to the stored data and/or the persistence of messages over KAFKA.

After changes in the code are made and verified, the following steps are needed:

#. Update the minor version of the component.
#. Build a new version of the component that needs to be updated.
#. Update the component's minor version in the helm chart.
#. | Issue the helm upgrade command. If the changes have already been upstreamed to ONF the upstream chart
   | `onf/<component name>` can be used, otherwise a local copy of the chart is required.

Following is an example of the `helm` command to upgrade the openonu adapter.
Topics, kv store paths and kafka endpoints need to be adapted to the specific deployment.

.. code:: bash

   helm upgrade --install --create-namespace \
     -n voltha1 openonu-adapter onf/voltha-adapter-openonu \
     --set global.stack_name=voltha1 \
     --set adapter_open_onu.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
     --set adapter_open_onu.topics.core_topic=voltha1_voltha1_rwcore \
     --set adapter_open_onu.topics.adapter_open_onu_topic=voltha1_voltha1_brcm_openomci_onu \
     --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
     --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
     --set services.etcd.service=voltha-infra-etcd.infra.svc

ONOS apps
---------
`Minor` software update is also available for the following ONOS apps - `sadis`, `olt`, `aaa`, `kafka`, `dhcpl2relay`,
`mac-learning`, `igmpproxy`, and `mcast`. These apps can thus be updated with no impact on the dataplane of provisioned
subscribers. The `minor` software update for the ONOS apps leverages existing ONOS REST APIs.

During this process it is expected that no subscriber provisioning calls are executed via the REST APIs.
In-progress calls will complete thanks to the flows stored in Atomix.
Some metrics and/or packet processing might be lost during this procedure; the system relies on the retry mechanisms
present in the services and the dataplane protocols to converge to a stable state (e.g. DHCP retry).


After changes in the code of ONOS apps are made and verified, the following steps are needed:

#. | Obtain the .oar of the app, either via a local build with `mvn clean install` or, if the code has been upstreamed,
   | by downloading it from `maven central <https://search.maven.org/search?q=g:org.opencord>`_ or sonatype.
#. Delete the old version of the ONOS app.
#. Upload, install and activate the new `oar` file.

Following is an example of the different `curl` commands to upgrade the olt app. This assumes the .oar to be present in
the directory where the command is executed from.

.. code:: bash

   # download the app
   curl --fail -sSL https://oss.sonatype.org/content/groups/public/org/opencord/olt-app/4.5.0-SNAPSHOT/olt-app-4.5.0-20210504.162620-3.oar > org.opencord.olt-4.5.0.SNAPSHOT.oar
   # delete the app
   curl --fail -sSL -X DELETE http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt
   # install and activate the new version of the app
   curl --fail -sSL -H Content-Type:application/octet-stream -X POST http://karaf:karaf@127.0.0.1:8181/onos/v1/applications?activate=true --data-binary @org.opencord.olt-4.5.0.SNAPSHOT.oar 2>&1

Minor Software Version Rollback Due To Failure
----------------------------------------------

A `minor` software upgrade can incur failures and broken functionality. There are two possible cases: 1. the container
does not start, 2. functionality is broken during operations.

VOLTHA Component updated container does not start
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is automatically handled by Kubernetes. An old version of the pod does not get
terminated unless the new one is running and ready according to its readiness probe.
No system or data-plane functionality is impacted.

The operator needs to manually delete the failing pod, fix the issue and re-deploy
a fixed `minor` version.
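
If the new pod is stuck (e.g. in `CrashLoopBackOff`), it can be inspected and removed with `kubectl`.
The sketch below assumes the stack runs in namespace `voltha1`; the pod name is illustrative:

.. code:: bash

   # inspect the state of the pods in the stack namespace
   kubectl get pods -n voltha1
   # delete the failing pod; the old, healthy replica keeps serving in the meantime
   kubectl delete pod -n voltha1 voltha1-adapter-open-onu-0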

VOLTHA Component Broken functionality during operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this case the container started and became `ready` in Kubernetes, but functionality of the system or data-plane
is broken, e.g. a subscriber can't be provisioned or no traffic is flowing.

In this case the operator needs to perform a manual intervention,
rolling back to the previous minor version of the container. The rollback operation is the same as a `minor` software
update via `helm`, but instead of increasing the version number it is decremented to the last known running one.
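
For example, assuming the openonu adapter image was moved from a working tag `2.3.5` to a broken `2.3.6`
(both versions hypothetical), the rollback re-issues the upgrade with the previous tag:

.. code:: bash

   # re-issue the upgrade pointing at the last known working image tag;
   # the `images.adapter_open_onu.tag` value name is an assumption and must
   # match the values file of the chart in use
   helm upgrade --install -n voltha1 openonu-adapter onf/voltha-adapter-openonu \
     --set images.adapter_open_onu.tag=2.3.5

Alternatively, `helm rollback -n voltha1 openonu-adapter` reverts the release to its previously recorded revision.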

ONOS app not starting or broken functionality
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For ONOS apps a manual intervention is always necessary, both if the app does not start and if functionality is broken.
The rollback of an ONOS application is done by following the same procedure as the
update, using the previous, or last known working, version of the `.oar` file.
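
A minimal sketch, assuming the last known working `.oar` (here `org.opencord.olt-4.4.0.oar`, a hypothetical
version) is present in the current directory:

.. code:: bash

   # remove the broken version of the app
   curl --fail -sSL -X DELETE http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt
   # re-install and activate the last known working version
   curl --fail -sSL -H Content-Type:application/octet-stream -X POST \
     http://karaf:karaf@127.0.0.1:8181/onos/v1/applications?activate=true \
     --data-binary @org.opencord.olt-4.4.0.oar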

Major Software Version Update
=============================
A software update qualifies as `major` when there are changes in the APIs or in the format of the
data stored by a component.

A major software update in VOLTHA and ONOS currently requires a maintenance window
during which the dataplane for the subscribers is interrupted, thus no service is provided.
There are several cases and they can be handled differently.

VOLTHA services API or Data format changes
------------------------------------------
A `major` update is needed when the VOLTHA APIs between components have changed or when the format of the data being
stored is different, thus a complete wipe-out needs to be performed.
In such a scenario each stack can be updated independently, with no teardown required of the ONOS,
ETCD and KAFKA infrastructure.
Different versions of VOLTHA can co-exist over the same infrastructure.

The procedure is iterative on each stack and is performed as follows:

#. un-provision all the subscribers via the ONOS REST API.
#. delete all the OLTs managed by the stack via the VOLTHA gRPC API.
#. upgrade the stack version via the `helm` upgrade command and the correct version of the `voltha-stack` chart.
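
The steps above can be sketched as follows; the device ID, subscriber location and chart values are placeholders,
and the `olt` app REST path is an assumption that should be checked against the deployed app version:

.. code:: bash

   # 1. un-provision a subscriber (repeat for every provisioned UNI port)
   curl --fail -sSL -X DELETE http://karaf:karaf@127.0.0.1:8181/onos/olt/oltapp/of:00000a0a0a0a0a0a/16
   # 2. delete each OLT managed by the stack
   voltctl device delete <olt_id>
   # 3. upgrade the stack to the new chart version (values must match the deployment)
   helm upgrade --install -n voltha1 voltha1 onf/voltha-stack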

Details on the `helm` commands can be found in the `voltha-helm-charts README file <voltha-helm-charts/README.md>`_.

If the API change is between the `openolt adapter` and the `openolt agent` on the OLT hardware please refer to section
:ref:`OpenOLT Agent Update <openolt-update>`.


ONOS, Atomix or ONOS apps
-------------------------
A `major` update is needed when changes to the interfaces (Java APIs) or REST APIs of ONOS itself or of one
of the apps have been made, rendering the two subsequent implementations incompatible. A `major` software update is
also needed for changes made to the data stored in Atomix or for an update of the Atomix version itself.
In this scenario all the stacks connected to an ONOS instance need to be cleaned of data before moving them
over to a new ONOS cluster.

The procedure is as follows:

#. deploy a new ONOS cluster in a new namespace `infra1`.
#. un-provision all the subscribers via the ONOS REST API.
#. delete the OLT device (not strictly required, but best to ensure a clean state).
#. redeploy the of-agent with the new ONOS cluster endpoints.
#. re-provision the OLT.
#. re-provision the subscribers.
#. iterate over steps 2,3,4,5,6 for each stack connected to the ONOS cluster you want to update.

Following is an example of how to deploy ONOS:

.. code:: bash

   helm install --create-namespace \
     --set replicas=3,atomix.replicas=3 \
     --set atomix.persistence.enabled=false \
     --set image.pullPolicy=Always,image.repository=voltha/voltha-onos,image.tag=5.0.0 \
     --namespace infra1 onos onos/onos-classic

Following is an example of how to re-deploy the of-agent, using the `voltha-stack` chart,
pointing to the new controller endpoints. Only the `ofagent` pod will be restarted.

.. code:: bash

   helm upgrade --install --create-namespace \
     --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
     --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
     --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
     --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
     --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
     --set services.etcd.service=voltha-infra-etcd.infra.svc \
     --set 'voltha.services.controller[0].service=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc' \
     --set 'voltha.services.controller[0].port=6653' \
     --set 'voltha.services.controller[0].address=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
     --set 'voltha.services.controller[1].service=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc' \
     --set 'voltha.services.controller[1].port=6653' \
     --set 'voltha.services.controller[1].address=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
     --set 'voltha.services.controller[2].service=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc' \
     --set 'voltha.services.controller[2].port=6653' \
     --set 'voltha.services.controller[2].address=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
     --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack

ETCD
----
A `major` update is needed because tearing down the ETCD cluster means deleting the stored data,
which then needs to be rebuilt by the different components.

The procedure is as follows:

#. deploy a new ETCD cluster.
#. un-provision all the subscribers via the ONOS REST API.
#. delete the OLT device (not strictly required, but best to ensure a clean state).
#. redeploy the voltha stack with the `voltha-stack` `helm` chart, pointing it to the new ETCD endpoints.
#. re-provision the OLT.
#. re-provision the subscribers.
#. iterate over steps 2,3,4,5,6 for each stack connected to the ETCD cluster you want to update.

Details on the `helm` commands for the voltha stack can be found in the `voltha-helm-charts README file <../voltha-helm-charts/README.md>`_.

Following is an example of how to deploy a new 3-node ETCD cluster:

.. code:: bash

   helm install --create-namespace \
     --set auth.rbac.enabled=false,persistence.enabled=false,statefulset.replicaCount=3 \
     --namespace infra etcd bitnami/etcd

KAFKA Update
============
An update of Kafka is not considered to be a `major` software upgrade because it can be performed with
no service impact to the user.

Following is an example of how to deploy a new Kafka cluster:

.. code:: bash

   helm install --create-namespace --set global.log_level=WARN --namespace infra kafka bitnami/kafka

Following is an example of how to re-deploy the stack pods, using the `voltha-stack` chart,
pointing to the new kafka (`voltha-infra-kafka-2.infra.svc`) endpoints.
Each pod will be restarted, but without dataplane interruption, because it is the same as a pod restart,
thus leveraging the data stored in ETCD.

.. code:: bash

   helm upgrade --install --create-namespace \
     --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
     --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
     --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
     --set services.kafka.adapter.service=voltha-infra-kafka-2.infra.svc \
     --set services.kafka.cluster.service=voltha-infra-kafka-2.infra.svc \
     --set services.etcd.service=voltha-infra-etcd.infra.svc \
     --set 'voltha.services.controller[0].service=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc' \
     --set 'voltha.services.controller[0].port=6653' \
     --set 'voltha.services.controller[0].address=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc:6653' \
     --set 'voltha.services.controller[1].service=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc' \
     --set 'voltha.services.controller[1].port=6653' \
     --set 'voltha.services.controller[1].address=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc:6653' \
     --set 'voltha.services.controller[2].service=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc' \
     --set 'voltha.services.controller[2].port=6653' \
     --set 'voltha.services.controller[2].address=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc:6653' \
     --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack


.. _openolt-update:

OpenOLT Agent Update
====================

The `openolt agent` on the box can be upgraded without having to tear down the whole VOLTHA stack to which the OLT is
connected. Again, here we make the distinction between a minor update and a major update of the openolt agent.
A minor update happens when there is no API change between the `openolt agent` and the `openolt adapter`, meaning the
`openolt.proto` has not been updated in either of those components.
A major update is required when there are changes to the `openolt.proto` API.

Both updates of the OpenOLT agent are service impacting for the customer.

Minor Update
------------
A minor update will be seen from VOLTHA as a reboot of the OLT.
During a minor update of the openolt agent no northbound operations should be performed; in-progress provision calls
will reconcile upon OLT reboot. Events, metrics and performance measurements data can be lost and should not be
expected during this procedure.
The procedure is as follows:

#. place the new openolt agent `.deb` package on the desired OLT.
#. stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. install and run the new openolt package.
#. reboot the OLT hardware.
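
On the OLT itself the procedure might look as follows; the process and package names are assumptions that depend on
the specific BAL/ONL build in use:

.. code:: bash

   # stop the watchdog first so it does not restart the agent processes
   pkill -f watchdog
   pkill -f openolt
   pkill -f dev_mgmt_daemon
   # install the new agent package previously copied onto the box
   dpkg -i openolt.deb
   # restart the device; VOLTHA will see this as an OLT reboot and reconcile
   reboot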

After these steps are done VOLTHA will see the OLT re-connect and re-provision data accordingly.

Major update
------------
A major update will require the OLT to be deleted from VOLTHA to ensure no inconsistent data is stored.
During a major update of the openolt agent and adapter no northbound operations should be performed, and
in-progress calls will fail. Events, metrics and performance measurements data will be lost.
The procedure is as follows:

#. delete the OLT device from VOLTHA (e.g. `voltctl device delete <olt_id>`).
#. upgrade the openolt-adapter to the new version via `helm upgrade`.
#. place the new openolt agent `.deb` package on the desired OLT.
#. stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. install and run the new openolt package.
#. reboot the OLT hardware.
#. re-provision the OLT (e.g. `voltctl device provision <ip:port>`).
#. re-enable the OLT (e.g. `voltctl device enable <olt_id>`).
#. re-provision the subscribers.
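
The VOLTHA side of the procedure can be sketched as follows; the IDs, versions and the chart value name are
placeholders:

.. code:: bash

   # delete the OLT so no inconsistent data survives the API change
   voltctl device delete <olt_id>
   # move the adapter to the new version; the `images.adapter_open_olt.tag`
   # value name is an assumption to be checked against the chart in use
   helm upgrade --install -n voltha1 open-olt onf/voltha-adapter-openolt \
     --set images.adapter_open_olt.tag=<new_version>
   # after updating the agent on the box and rebooting it:
   voltctl device provision <olt_ip:port>
   voltctl device enable <olt_id>
   # finally, re-provision the subscribers via the ONOS REST API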

After these steps VOLTHA effectively treats the OLT as a brand-new one of which it had no prior knowledge.