=============================================
VOLTHA and ONOS software update procedures
=============================================

This document describes the software upgrade procedure for VOLTHA and ONOS
in a deployed system. A distinction is made between a `minor` software upgrade,
which can be done in service, meaning with no dataplane service interruption
to existing customers, and a `major` software upgrade, which in turn requires
a full maintenance window during which service is impacted.

Changes to data structures in storage (ETCD for VOLTHA and Atomix for ONOS)
are out of scope for in-service upgrades. Such changes qualify as `major`
software upgrades that require a maintenance window. The KAFKA bus update
has its own section given that the procedure is different from the rest of
the components. The following procedures expect a fully working, provisioned
VOLTHA and ONOS deployment on top of a Kubernetes cluster, with exposed ONOS
REST API ports. It is also expected that new versions of the different
components are available to the operator who performs the upgrade.

Minor Software Version Update
=============================

The `minor` software upgrade qualifier refers to an upgrade that does not
involve API changes. In VOLTHA, an API change is a change to either the protos
or voltha-lib-go; in ONOS, it is a change to the Java interfaces, CLI
commands or REST APIs of either the apps or the platform. A `minor` software
update is intended for bug fixes and not for new features. A `minor` software
update is supported only for ONOS apps and VOLTHA components. No in-service
software update is supported for ETCD or Kafka.

VOLTHA services
---------------

A `minor` software upgrade of VOLTHA components leverages `helm` and `k8s`.
During this process it is expected that no subscriber provisioning call is
executed from the northbound. In-progress calls will complete thanks to
the stored data and/or the persistence of messages over KAFKA.

After changes in the code are made and verified the following steps are needed:

#. Update the minor version of the component.
#. Build a new version of the component to update.
#. Update the component's minor version in the helm chart.
#. | Issue the helm upgrade command. If the changes have already been upstreamed to ONF, the upstream chart
   | `onf/<component name>` can be used; otherwise a local copy of the chart is required.

Following is an example of the `helm` command to upgrade the openonu adapter.
Topics, kv store paths and kafka endpoints need to be adapted to the specific deployment.

.. code:: bash

    helm upgrade --install --create-namespace \
      -n voltha1 opeonu-adapter onf/voltha-adapter-openonu \
      --set global.stack_name=voltha1 \
      --set adapter_open_onu.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set adapter_open_onu.topics.core_topic=voltha1_voltha1_rwcore \
      --set adapter_open_onu.topics.adapter_open_onu_topic=voltha1_voltha1_brcm_openomci_onu \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc
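
Once the upgrade command returns, it can be useful to confirm that Kubernetes has finished rolling out the new pod.
The following is a minimal sketch, assuming the adapter runs as a Kubernetes Deployment. The deployment name passed
to the function is hypothetical and has to be looked up in the target cluster, and the `KUBECTL` variable is only an
illustration aid that allows the command to be printed instead of executed.

```shell
#!/bin/sh
# Sketch only: wait until an upgraded deployment reports a completed rollout.
# KUBECTL may be overridden (e.g. KUBECTL=echo) to print the command instead.
KUBECTL="${KUBECTL:-kubectl}"

wait_for_rollout() {
    namespace="$1"        # e.g. voltha1
    deployment="$2"       # hypothetical name, check `kubectl get deployments`
    timeout="${3:-300s}"  # how long to wait before giving up
    $KUBECTL rollout status "deployment/$deployment" -n "$namespace" --timeout="$timeout"
}
```

For example, `wait_for_rollout voltha1 <deployment name> 120s` returns a non-zero exit code if the new pod does not
become ready within two minutes.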

ONOS apps
---------
A `minor` software update is also available for the following ONOS apps: `sadis`, `olt`, `aaa`, `kafka`, `dhcpl2relay`,
`mac-learning`, `igmpproxy`, and `mcast`. These apps can thus be updated with no impact on the dataplane of provisioned
subscribers. The `minor` software update for the ONOS apps leverages existing ONOS REST APIs.

During this process it is expected that no subscriber provisioning call is
executed from the REST APIs. In-progress calls will complete thanks to
the flows stored in Atomix. Some metrics and/or packet processing might be
lost during this procedure; the system relies on retry mechanisms present in
the services and the dataplane protocols to converge to a stable state
(e.g. DHCP retry).


After changes in the code of ONOS apps are made and verified the following steps are needed:

#. | Obtain the `.oar` of the app, either via a local build with `mvn clean install` or, if the code has been upstreamed,
   | by downloading it from `maven central <https://search.maven.org/search?q=g:org.opencord>`_ or sonatype.
#. Delete the old version of the ONOS app.
#. Upload, install and activate the new `.oar` file.

Following is an example of the different `curl` commands to upgrade the olt app. This assumes the `.oar` is present in
the directory from which the commands are executed.

.. code:: bash

    # download the app
    curl --fail -sSL https://oss.sonatype.org/content/groups/public/org/opencord/olt-app/4.5.0-SNAPSHOT/olt-app-4.5.0-20210504.162620-3.oar > org.opencord.olt-4.5.0.SNAPSHOT.oar
    # delete the app
    curl --fail -sSL -X DELETE http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt
    # install and activate the new version of the app (the URL is quoted so the shell does not treat `?` as a glob)
    curl --fail -sSL -H Content-Type:application/octet-stream -X POST 'http://karaf:karaf@127.0.0.1:8181/onos/v1/applications?activate=true' --data-binary @org.opencord.olt-4.5.0.SNAPSHOT.oar 2>&1
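
After the new `.oar` is uploaded, the application state can be read back from the same REST API. The sketch below
assumes that `GET /onos/v1/applications/<app name>` returns a JSON document containing a `state` field (for example
`ACTIVE` or `INSTALLED`). The helper names `app_state` and `is_app_active` and the `ONOS_URL` variable are
illustrative and not part of ONOS.

```shell
#!/bin/sh
# Sketch only: check that an ONOS app is ACTIVE after the upgrade.
# ONOS_URL and the karaf:karaf credentials mirror the example above.
ONOS_URL="${ONOS_URL:-http://karaf:karaf@127.0.0.1:8181}"

# Read application JSON on stdin and print the value of its "state" field.
app_state() {
    grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' | head -n 1 \
        | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Succeed only if the given app (e.g. org.opencord.olt) reports ACTIVE.
is_app_active() {
    state=$(curl --fail -sSL "$ONOS_URL/onos/v1/applications/$1" | app_state)
    [ "$state" = "ACTIVE" ]
}
```

A call such as `is_app_active org.opencord.olt || echo "olt app is not active"` can then be used as a quick check
before traffic is verified.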

Minor Software Version Rollback Due To Failure
----------------------------------------------

A `minor` software upgrade can result in failures and broken functionality. There are two possible cases:
(1) the container does not start, or (2) functionality is broken during operations.

VOLTHA Component updated container does not start
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is automatically handled by Kubernetes. The old version of the pod is not
terminated until the new one is running and ready according to its readiness probe.
No system or data-plane functionality is impacted.

The operator will need to manually delete the failing pod, fix the issue in the new `minor` version
and re-deploy it.

VOLTHA Component Broken functionality during operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this case the container started and became `ready` in Kubernetes but functionality of the system or data-plane
is broken, e.g. a subscriber can't be provisioned or no traffic is flowing.

The operator needs to intervene manually and
roll back to the previous minor version of the container. The rollback operation is the same as a `minor` software
update via `helm`, except that the version number is decreased to the last known working one instead of increased.
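
As a rough illustration of such a rollback, the sketch below re-runs `helm upgrade` with the chart pinned to an
earlier version. It assumes that the last known working component version maps to a chart version selectable with
`--version`, and that the values of the current release can be kept with `--reuse-values`. The version number in the
usage example is hypothetical, and the `HELM` variable only exists so the command can be printed instead of executed.

```shell
#!/bin/sh
# Sketch only: roll a VOLTHA component back by upgrading to an older chart version.
# HELM may be overridden (e.g. HELM=echo) to print the command instead.
HELM="${HELM:-helm}"

rollback_component() {
    namespace="$1"  # e.g. voltha1
    release="$2"    # helm release name used for the original upgrade
    chart="$3"      # e.g. onf/voltha-adapter-openonu, or a local chart path
    version="$4"    # last known working chart version (deployment specific)
    $HELM upgrade "$release" "$chart" -n "$namespace" \
        --version "$version" --reuse-values
}
```

For example, `rollback_component voltha1 opeonu-adapter onf/voltha-adapter-openonu 2.8.0` would return the openonu
adapter release from the earlier example to chart version 2.8.0, where 2.8.0 stands in for whatever version was
running before the upgrade.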

ONOS app not starting or broken functionality
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For ONOS apps a manual intervention is always necessary, whether the app does not start or its functionality is broken.
The rollback of an ONOS application is done by following the same procedure as the
update, using the previous, or last known working, version of the `.oar` file.

Inter-dependency among changes submitted in different Components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Even though a minor version upgrade is expected to be seamless,
the changes made to one component may depend on changes made to other components.
In this case the operator needs to intervene
and upgrade the components manually in the required order.

Major Software Version Update
=============================
A software update qualifies as `major` when there are changes in the APIs or in the format of the
data stored by a component.

A major software update in VOLTHA and ONOS currently requires a maintenance window
during which the dataplane for the subscribers is interrupted, thus no service is provided.
There are several cases, each handled differently.

VOLTHA services API or Data format changes
------------------------------------------
A `major` update is needed when the VOLTHA API between components has changed or when the format of the stored
data is different, so a complete wipe-out needs to be performed.
In this scenario each stack can be updated independently, with no teardown required of the ONOS, ETCD and KAFKA
infrastructure.
Different versions of VOLTHA can co-exist on the same infrastructure.

The procedure is iterative on each stack and is performed as follows:

#. Un-provision all the subscribers via ONOS REST API.
#. Delete all the OLTs managed by the stack via VOLTHA gRPC API.
#. Upgrade the stack version via the `helm upgrade` command and the correct version of the `voltha-stack` chart.
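
The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not a
definitive procedure: it assumes the olt app exposes `DELETE /onos/olt/oltapp/<device id>/<port>` for removing a
subscriber, and that an OLT has to be disabled with `voltctl device disable` before `voltctl device delete` is
accepted. Both should be verified against the versions actually deployed. The `CURL` and `VOLTCTL` variables are
only there so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Sketch only: un-provision a subscriber and remove an OLT before a major upgrade.
# CURL and VOLTCTL may be overridden (e.g. with echo) to print the commands.
CURL="${CURL:-curl}"
VOLTCTL="${VOLTCTL:-voltctl}"
ONOS_URL="${ONOS_URL:-http://karaf:karaf@127.0.0.1:8181}"

# $1 = OpenFlow device id of the OLT in ONOS, $2 = UNI port number.
# The endpoint path is an assumption about the olt app REST API.
unprovision_subscriber() {
    $CURL --fail -sSL -X DELETE "$ONOS_URL/onos/olt/oltapp/$1/$2"
}

# $1 = VOLTHA device id of the OLT. Disabling first is an assumption.
delete_olt() {
    $VOLTCTL device disable "$1" && $VOLTCTL device delete "$1"
}
```

Each provisioned subscriber would be passed to `unprovision_subscriber`, and each OLT managed by the stack to
`delete_olt`, before the `helm upgrade` of the stack is issued.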

Details on the `helm` commands can be found in the `voltha-helm-charts README file <voltha-helm-charts/README.md>`_.

If the API change is between the `openolt adapter` and the `openolt agent` on the OLT hardware please refer to section
:ref:`OpenOLT Agent Update <openolt-update>`.


ONOS, Atomix or ONOS apps
-------------------------
A `major` update is needed when changes to the interfaces (Java APIs) or REST APIs of ONOS itself or of one
of the apps make two subsequent implementations incompatible. A `major` software update is
also needed for changes made to the data stored in Atomix or for an update of the Atomix version itself.
In this scenario all the stacks connected to an ONOS instance need to be cleaned of data before moving them
over to a new ONOS cluster.

The procedure is as follows:

#. Deploy a new ONOS cluster in a new namespace `infra1`.
#. Un-provision all the subscribers via ONOS REST API.
#. Delete the OLT device (not strictly required, but best to ensure clean state).
#. Redeploy the of-agent with the new ONOS cluster endpoints.
#. Re-provision the OLT.
#. Re-provision the subscribers.
#. Iterate over steps 2, 3, 4, 5 and 6 for each stack connected to the ONOS instance you want to update.

Following is an example on how to deploy ONOS:

.. code:: bash

    helm install --create-namespace \
      --set replicas=3,atomix.replicas=3 \
      --set atomix.persistence.enabled=false \
      --set image.pullPolicy=Always,image.repository=voltha/voltha-onos,image.tag=5.0.0 \
      --namespace infra1 onos onos/onos-classic

Following is an example on how to re-deploy the of-agent, using the `voltha-stack` chart,
pointing to the new controller endpoints. Only the `ofagent` pod will be restarted.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack
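
The controller settings in the command above follow a regular pattern, one group of three values per ONOS replica.
The sketch below generates them for an arbitrary replica count. The function name `controller_flags` is illustrative,
and the host naming scheme (`<prefix>-<index>.<prefix>-hs.<namespace>.svc`) is inferred from the example above, so it
should be checked against the services that actually exist in the cluster.

```shell
#!/bin/sh
# Sketch only: print the --set flags for N ONOS controller endpoints,
# one flag per line, following the naming pattern of the example above.
controller_flags() {
    prefix="$1"       # e.g. voltha-infra1-onos-classic
    namespace="$2"    # e.g. infra1
    replicas="$3"     # number of ONOS instances
    port="${4:-6653}" # OpenFlow port
    i=0
    while [ "$i" -lt "$replicas" ]; do
        host="$prefix-$i.$prefix-hs.$namespace.svc"
        printf '%s voltha.services.controller[%d].service=%s\n' '--set' "$i" "$host"
        printf '%s voltha.services.controller[%d].port=%s\n' '--set' "$i" "$port"
        printf '%s voltha.services.controller[%d].address=%s:%s\n' '--set' "$i" "$host" "$port"
        i=$((i + 1))
    done
}
```

Running `controller_flags voltha-infra1-onos-classic infra1 3` prints the nine controller flags used above. When the
output is expanded unquoted on a `helm` command line, pathname expansion should be disabled first (`set -f`) so that
the square brackets are not interpreted as a glob pattern.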

ETCD
----
A `major` update is needed because tearing down the ETCD cluster means deleting the stored data,
which then has to be rebuilt by the different components.

The procedure is as follows:

#. Deploy a new ETCD cluster.
#. Un-provision all the subscribers via ONOS REST API.
#. Delete the OLT device (not strictly required, but best to ensure clean state).
#. Redeploy the VOLTHA stack with the `voltha-stack` `helm` chart, pointing it to the new ETCD endpoints.
#. Re-provision the OLT.
#. Re-provision the subscribers.
#. Iterate over steps 2, 3, 4, 5 and 6 for each stack connected to the ETCD cluster you want to update.

Details on the `helm` commands for the VOLTHA stack can be found in the `voltha-helm-charts README file <../voltha-helm-charts/README.md>`_.

Following is an example on how to deploy a new 3-node ETCD cluster:

.. code:: bash

    helm install --create-namespace --set auth.rbac.enabled=false,persistence.enabled=false,statefulset.replicaCount=3 --namespace infra etcd bitnami/etcd
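
Before pointing a stack at the new cluster, its health can be checked from inside one of the ETCD pods. The sketch
below assumes the chart creates pods named `<release>-0`, `<release>-1`, and so on (so `etcd-0` for the release
above) and that `etcdctl` is available in the container. Both are assumptions about the chart, not something this
document guarantees.

```shell
#!/bin/sh
# Sketch only: ask every member of the new ETCD cluster for its health.
# KUBECTL may be overridden (e.g. KUBECTL=echo) to print the command instead.
KUBECTL="${KUBECTL:-kubectl}"

etcd_health() {
    namespace="$1"     # e.g. infra
    pod="${2:-etcd-0}" # assumed pod name, check `kubectl get pods`
    $KUBECTL exec "$pod" -n "$namespace" -- etcdctl endpoint health --cluster
}
```

`etcd_health infra` exits with a non-zero status if the command cannot be executed or a member reports itself
unhealthy.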

KAFKA Update
============
An update of Kafka is not considered to be a `major` software upgrade because it can be performed with
no service impact to the user.

Following is an example on how to deploy a new Kafka cluster:

.. code:: bash

    helm install --create-namespace --set global.log_level=WARN --namespace infra kafka bitnami/kafka

Following is an example on how to re-deploy the stack pods, using the `voltha-stack` chart,
pointing to the new kafka (`voltha-infra-kafka-2.infra.svc`) endpoints.
Each pod will be restarted, but without dataplane interruption, because this is equivalent to a plain pod restart,
which relies on the data stored in ETCD.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka-2.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka-2.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack


.. _openolt-update:

OpenOLT Agent Update
====================

The `openolt agent` on the box can be upgraded without having to tear down the whole VOLTHA stack to which the OLT is
connected. Again, a distinction is made between a minor update and a major update of the openolt agent.
A minor update happens when there is no API change between the `openolt agent` and the `openolt adapter`, meaning the
`openolt.proto` has not been updated in either of those components.
A major update is required when there are changes to the `openolt.proto` API.

Both updates of the OpenOLT agent are service impacting for the customer.

Minor Update
------------
A minor update will be seen from VOLTHA as a reboot of the OLT.
During a minor update of the openolt agent no northbound operation should be performed; in-progress provisioning calls
will reconcile upon OLT reboot. Events, metrics and performance measurement data can be lost and should not be expected
during this procedure.
The procedure is as follows:

#. Place the new openolt agent `.deb` package on the desired OLT.
#. Stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. Run the new openolt packages.
#. Reboot the OLT hardware.

After these steps are done VOLTHA will receive the OLT connection again and re-provision data accordingly.
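
As an illustration only, the steps executed on the OLT itself might look like the sketch below. It assumes the agent
processes are managed as services named `openolt` and `dev_mgmt_daemon`, and that the new package is installed with
`dpkg -i`. The service names, the package path and the `RUN` dry-run variable are all assumptions to adapt to the
actual OLT image.

```shell
#!/bin/sh
# Sketch only: update the openolt agent on the OLT and reboot.
# Set RUN=echo to print the commands instead of executing them.
RUN="${RUN:-}"
OPENOLT_SERVICE="${OPENOLT_SERVICE:-openolt}"           # assumed service name
DEV_MGMT_SERVICE="${DEV_MGMT_SERVICE:-dev_mgmt_daemon}" # assumed service name

upgrade_openolt_agent() {
    deb="$1"  # path of the new .deb package already copied to the OLT
    $RUN service "$OPENOLT_SERVICE" stop &&
    $RUN service "$DEV_MGMT_SERVICE" stop &&
    $RUN dpkg -i "$deb" &&
    $RUN reboot
}
```

Running the function with `RUN=echo` first shows the exact commands without touching the OLT, which is a cheap way to
review them before the maintenance action.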

Major Update
------------
A major update requires the OLT to be deleted from VOLTHA to ensure no inconsistent data is stored.
During a major update of the openolt agent and adapter no northbound operation should be performed, and
in-progress calls will fail. Events, metrics and performance measurement data will be lost.
The procedure is as follows:

#. Delete the OLT device from VOLTHA (e.g. `voltctl device delete <olt_id>`).
#. Upgrade the openolt-adapter to the new version via `helm upgrade`.
#. Place the new openolt agent `.deb` package on the desired OLT.
#. Stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. Run the new openolt packages.
#. Reboot the OLT hardware.
#. Re-provision the OLT (e.g. `voltctl device provision <ip:port>`).
#. Re-enable the OLT (e.g. `voltctl device enable <olt_id>`).
#. Re-provision the subscribers.

After these steps VOLTHA effectively treats the OLT as a brand new one of which it had no prior knowledge.