Andrea Campanella | 448fbc2 | 2021-05-13 15:39:00 +0200 | [diff] [blame] | 1 | ============================================= |
| 2 | VOLTHA and ONOS software update procedures |
| 3 | ============================================= |
| 4 | |
| 5 | This document describes the software upgrade procedure for VOLTHA and ONOS in a deployed system. |
| 6 | Distinction is made between a `minor` software upgrade, which can be done in service, |
| 7 | meaning with no dataplane service interruption to existing customers, and a `major` software upgrade, |
| 8 | which in turn requires a full maintenance window during which service is impacted. |
| 9 | |
| 10 | Changes to data-structures in storage (ETCD for VOLTHA and Atomix for ONOS) are out of scope for in-service upgrades. |
| 11 | Such changes qualify as `major` software upgrades that require a maintenance window. |
| 12 | The KAFKA bus update has its own section given that the procedure is different from the rest of the components. |
| 13 | The following procedures assume a fully working, provisioned VOLTHA and ONOS deployment on top of a Kubernetes cluster, |
| 14 | with exposed ONOS REST API ports. |
| 15 | It is also expected that new versions of the different components are available to the operator that performs |
| 16 | the upgrade. |
| 17 | |
| 18 | Minor Software Version Update |
| 19 | ============================= |
| 20 | The `minor` software upgrade qualifier refers to an upgrade that does not involve API |
| 21 | changes, which in VOLTHA means a change to either the protos or voltha-lib-go, |
| 22 | and in ONOS means a change to the Java interfaces, CLI commands or REST APIs of either the apps or the platform. |
| 23 | A `minor` software update is intended for bug fixes and not for new features. |
| 24 | A `minor` software update is supported only for ONOS apps and VOLTHA components. No in-service software update |
| 25 | is supported for ETCD or Kafka. |
| 26 | |
| 27 | VOLTHA services |
| 28 | --------------- |
| 29 | The `minor` software upgrade of VOLTHA components leverages `helm` and `k8s`. |
| 30 | During this process, no subscriber provisioning call is expected to be issued from the northbound. |
| 31 | In-progress calls will be completed thanks to the stored data and/or the persistence of messages over KAFKA. |
| 32 | |
| 33 | After changes in the code are made and verified, the following steps are needed: |
| 34 | |
| 35 | #. Update the minor version of the component. |
| 36 | #. Build a new version of the component to be updated. |
| 37 | #. Update the component's minor version in the helm chart. |
| 38 | #. | Issue the `helm upgrade` command. If the changes have already been upstreamed to ONF, the upstream chart |
| 39 | | `onf/<component name>` can be used; otherwise a local copy of the chart is required. |
| 40 | |
| 41 | Following is an example of the `helm` command to upgrade the openonu adapter. |
| 42 | Topics, kv store paths and kafka endpoints need to be adapted to the specific deployment. |
| 43 | |
| 44 | .. code:: bash |
| 45 | |
| 46 | helm upgrade --install --create-namespace \ |
| 47 | -n voltha1 opeonu-adapter onf/voltha-adapter-openonu \ |
| 48 | --set global.stack_name=voltha1 \ |
| 49 | --set adapter_open_onu.kv_store_data_prefix=service/voltha/voltha1_voltha1 \ |
| 50 | --set adapter_open_onu.topics.core_topic=voltha1_voltha1_rwcore \ |
| 51 | --set adapter_open_onu.topics.adapter_open_onu_topic=voltha1_voltha1_brcm_openomci_onu \ |
| 52 | --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \ |
| 53 | --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \ |
| 54 | --set services.etcd.service=voltha-infra-etcd.infra.svc |
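
Once the upgrade command returns, it can be useful to confirm that the new pods have rolled out
before resuming northbound operations. The following is a minimal sketch, assuming the upgraded
components run as Kubernetes Deployments in the `voltha1` namespace used above; the helper name
`wait_for_rollout` and the timeout value are illustrative, not part of VOLTHA.

.. code:: bash

    # Sketch: wait for every Deployment in a namespace to finish rolling out.
    # Assumption: the components are Deployments; adapt if a component
    # is deployed as a StatefulSet in your setup.
    wait_for_rollout() {
        local namespace="$1" timeout="${2:-300s}"
        local deployment
        # "-o name" prints one "deployment.apps/<name>" entry per line
        for deployment in $(kubectl get deployments -n "$namespace" -o name); do
            kubectl rollout status "$deployment" -n "$namespace" --timeout="$timeout" || return 1
        done
    }

    # Usage:
    #   wait_for_rollout voltha1 120s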
| 55 | |
| 56 | ONOS apps |
| 57 | --------- |
| 58 | A `minor` software update is also available for the following ONOS apps: `sadis`, `olt`, `aaa`, `kafka`, `dhcpl2relay`, |
| 59 | `mac-learning`, `igmpproxy`, and `mcast`. These apps can thus be updated with no impact on the dataplane of provisioned |
| 60 | subscribers. The `minor` software update for the ONOS apps leverages existing ONOS REST APIs. |
| 61 | |
| 62 | During this process, no subscriber provisioning call is expected to be issued from the REST APIs. |
| 63 | In-progress calls will be completed thanks to the flows stored in Atomix. |
| 64 | Some metrics and/or packet processing might be lost during this procedure; the system relies on retry mechanisms |
| 65 | present in the services and the dataplane protocols to converge to a stable state (e.g. DHCP retry). |
| 66 | |
| 67 | |
| 68 | After changes in the code of ONOS apps are made and verified, the following steps are needed: |
| 69 | |
| 70 | #. | Obtain the `.oar` of the app, either via a local build with `mvn clean install` or, if the code has been upstreamed, |
| 71 | | by downloading it from `maven central <https://search.maven.org/search?q=g:org.opencord>`_ or sonatype. |
| 72 | #. Delete the old version of the ONOS app. |
| 73 | #. Upload, install and activate the new `.oar` file. |
| 74 | |
| 75 | Following is an example of the `curl` commands to upgrade the olt app. This assumes the `.oar` is present in |
| 76 | the directory from which the commands are executed. |
| 77 | |
| 78 | .. code:: bash |
| 79 | |
| 80 | # download the app |
| 81 | curl --fail -sSL https://oss.sonatype.org/content/groups/public/org/opencord/olt-app/4.5.0-SNAPSHOT/olt-app-4.5.0-20210504.162620-3.oar > org.opencord.olt-4.5.0.SNAPSHOT.oar |
| 82 | # delete the app |
| 83 | curl --fail -sSL -X DELETE http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt |
| 84 | # install and activate the new version of the app |
| 85 | curl --fail -sSL -H Content-Type:application/octet-stream -X POST 'http://karaf:karaf@127.0.0.1:8181/onos/v1/applications?activate=true' --data-binary @org.opencord.olt-4.5.0.SNAPSHOT.oar 2>&1 |
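
After the upload, the state of the app can be checked through the same REST API. The sketch below
assumes the ONOS endpoint and `karaf:karaf` credentials from the example above, and that the JSON
returned for an application carries a `state` field; the helper names `app_state` and
`get_app_state` are hypothetical.

.. code:: bash

    # Sketch: query the state of an ONOS application through the REST API.
    # ONOS_URL mirrors the endpoint used in the commands above.
    ONOS_URL="${ONOS_URL:-http://karaf:karaf@127.0.0.1:8181}"

    # Extract the value of the "state" field from application JSON on stdin.
    app_state() {
        sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
    }

    # Print the state (e.g. ACTIVE or INSTALLED) of the named application.
    get_app_state() {
        curl --fail -sSL "${ONOS_URL}/onos/v1/applications/$1" | app_state
    }

    # Usage:
    #   get_app_state org.opencord.olt

An app that reports `INSTALLED` rather than `ACTIVE` was uploaded but not activated.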
| 86 | |
Andrea Campanella | e1a64ab | 2022-01-28 14:48:37 +0100 | [diff] [blame] | 87 | Minor Software Version Rollback Due To Failure |
| 88 | ---------------------------------------------- |
| 89 | |
| 90 | A `minor` software upgrade can result in failures and broken functionality. There are two possible cases: the container |
| 91 | does not start, or functionality is broken during operations. |
| 92 | |
| 93 | VOLTHA Component updated container does not start |
| 94 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 95 | |
| 96 | This is automatically handled by Kubernetes. An old version of the pod does not get |
| 97 | terminated unless the new one is running and ready according to its readiness probe. |
| 98 | No system or data-plane functionality is impacted. |
| 99 | |
| 100 | The operator will need to manually delete the failing pod, fix the issue in the new `minor` version |
| 101 | and re-deploy it. |
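
To locate the failing pod, the READY column of `kubectl get pods` can be filtered. This is a
minimal sketch, assuming the `voltha1` namespace from the earlier example; the helper name
`not_ready_pods` is hypothetical.

.. code:: bash

    # Sketch: print the names of pods whose containers are not all ready.
    # The READY column has the form "<ready>/<total>", e.g. "0/1".
    not_ready_pods() {
        kubectl get pods -n "$1" --no-headers |
            awk '{ split($2, ready, "/"); if (ready[1] != ready[2]) print $1 }'
    }

    # Usage (pod name is a placeholder):
    #   not_ready_pods voltha1
    #   kubectl describe pod <pod-name> -n voltha1
    #   kubectl logs <pod-name> -n voltha1
    #   kubectl delete pod <pod-name> -n voltha1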
| 102 | |
| 103 | VOLTHA Component Broken functionality during operations |
| 104 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 105 | |
| 106 | In this case the container started and became `ready` in Kubernetes but functionality of the system or data-plane |
| 107 | is broken, e.g. a subscriber can't be provisioned or no traffic is flowing. |
| 108 | |
| 109 | Here the operator needs to intervene manually, |
| 110 | rolling back to the previous minor version of the container. The rollback operation is the same as a `minor` software |
| 111 | update via `helm`, except that the version number is decreased to the last known working one instead of being increased. |
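
As a rough sketch of such a rollback, the earlier chart version can be re-applied with
`helm upgrade --version`. The helper name `rollback_to_chart_version` is hypothetical, and
`--reuse-values` is an assumption that the values from the previous upgrade are still the desired ones;
if they are not, pass the `--set` options explicitly as in the upgrade example.

.. code:: bash

    # Sketch: re-apply an earlier chart version of an existing release.
    # "--version" pins the chart version; "--reuse-values" keeps the
    # values supplied on the previous upgrade.
    rollback_to_chart_version() {
        local namespace="$1" release="$2" chart="$3" version="$4"
        helm upgrade "$release" "$chart" -n "$namespace" \
            --version "$version" --reuse-values
    }

    # Usage (the version is a placeholder):
    #   helm history <release> -n voltha1      # find the last known working chart version
    #   rollback_to_chart_version voltha1 <release> onf/voltha-adapter-openonu <version>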
| 112 | |
| 113 | ONOS app not starting or broken functionality |
| 114 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 115 | For ONOS apps a manual intervention is always necessary, whether the app does not start or functionality is broken. |
| 116 | The rollback of an ONOS application is done by following the same procedure as the |
| 117 | update using the previous, or last known working, version of the `.oar` file. |
Andrea Campanella | 448fbc2 | 2021-05-13 15:39:00 +0200 | [diff] [blame] | 118 | |
| 119 | Major Software Version Update |
| 120 | ============================= |
| 121 | A software update qualifies as `major` when there are changes in the APIs or in the format of the |
| 122 | data stored by a component. |
| 123 | |
| 124 | At the moment, a major software update in VOLTHA and ONOS requires a maintenance window |
| 125 | during which the dataplane for the subscribers is interrupted, so no service is provided. |
| 126 | There are several cases, and they can be handled differently. |
| 127 | |
| 128 | VOLTHA services API or Data format changes |
| 129 | ------------------------------------------ |
| 130 | A `major` update is needed when the VOLTHA API between components has changed or when the format of the stored |
| 131 | data is different, so that a complete wipe-out needs to be performed. |
| 132 | In such a scenario each stack can be updated independently, with no teardown required of the ONOS, |
| 133 | ETCD and KAFKA infrastructure. |
| 134 | Different versions of VOLTHA can co-exist over the same infrastructure. |
| 135 | |
| 136 | The procedure is iterative on each stack and is performed as follows: |
| 137 | |
| 138 | #. un-provision all the subscribers via ONOS REST API. |
| 139 | #. delete all the OLTs managed by the stack via VOLTHA gRPC API. |
| 140 | #. upgrade the stack version via `helm` upgrade command and the correct version of the `voltha-stack` chart. |
| 141 | |
Andrea Campanella | c18d118 | 2021-09-10 12:01:38 +0200 | [diff] [blame] | 142 | Details on the `helm` commands can be found in the `voltha-helm-charts README file <voltha-helm-charts/README.md>`_. |
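
The first two steps of this procedure can be sketched as follows. The `/onos/olt/oltapp/<device>/<port>`
path is an assumption about the REST API exposed by the installed `olt` app version, and the
`voltctl device disable` and `voltctl device delete` calls assume a voltctl client configured for
the stack; the helper names and the device and port identifiers are placeholders.

.. code:: bash

    # Sketch: un-provision a subscriber and delete an OLT before a major update.
    ONOS_URL="${ONOS_URL:-http://karaf:karaf@127.0.0.1:8181}"

    # Remove the subscriber on one port of an ONOS device
    # (REST path assumed, check it against your olt app version).
    unprovision_subscriber() {
        local device_id="$1" port="$2"
        curl --fail -sSL -X DELETE "${ONOS_URL}/onos/olt/oltapp/${device_id}/${port}"
    }

    # Disable, then delete, one OLT through the VOLTHA API.
    delete_olt() {
        voltctl device disable "$1" && voltctl device delete "$1"
    }

    # Usage (identifiers are placeholders):
    #   unprovision_subscriber of:00000a0a0a0a0a0a 256
    #   delete_olt <olt_id>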
Andrea Campanella | 448fbc2 | 2021-05-13 15:39:00 +0200 | [diff] [blame] | 143 | |
| 144 | If the API change is between the `openolt adapter` and the `openolt agent` on the OLT hardware please refer to section |
| 145 | :ref:`OpenOLT Agent Update <openolt-update>`. |
| 146 | |
| 147 | |
| 148 | ONOS, Atomix or ONOS apps |
| 149 | ------------------------- |
| 150 | A `major` update is needed when changes have been made to the Java interfaces or REST APIs of ONOS itself or of one |
| 151 | of the apps, making two subsequent implementations incompatible. A `major` software update is |
| 152 | also needed for changes made to the data stored in Atomix or for an update of the Atomix version itself. |
| 153 | In this scenario all the stacks connected to an ONOS instance need to be cleaned of data before moving them |
| 154 | over to a new ONOS cluster. |
| 155 | |
| 156 | The procedure is as follows: |
| 157 | |
| 158 | #. deploy a new ONOS cluster in a new namespace `infra1` |
| 159 | #. un-provision all the subscribers via ONOS REST API |
| 160 | #. delete the OLT device (not strictly required, but best to ensure clean state) |
| 161 | #. redeploy the of-agent with the new ONOS cluster endpoints |
| 162 | #. re-provision the OLT |
| 163 | #. re-provision the subscribers |
| 164 | #. repeat steps 2 to 6 for each stack connected to the ONOS instance you want to update. |
| 165 | |
| 166 | Following is an example of how to deploy ONOS: |
| 167 | |
| 168 | .. code:: bash |
| 169 | |
| 170 | helm install --create-namespace \ |
| 171 | --set replicas=3,atomix.replicas=3 \ |
| 172 | --set atomix.persistence.enabled=false \ |
| 173 | --set image.pullPolicy=Always,image.repository=voltha/voltha-onos,image.tag=5.0.0 \ |
| 174 | --namespace infra1 onos onos/onos-classic |
| 175 | |
| 176 | Following is an example of how to re-deploy the of-agent, using the `voltha-stack` chart, |
| 177 | pointing it to the new controller endpoints. Only the `ofagent` pod will be restarted. |
| 178 | |
| 179 | .. code:: bash |
| 180 | |
| 181 | helm upgrade --install --create-namespace \ |
| 182 | --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \ |
| 183 | --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \ |
| 184 | --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \ |
| 185 | --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \ |
| 186 | --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \ |
| 187 | --set services.etcd.service=voltha-infra-etcd.infra.svc \ |
| 188 | --set 'voltha.services.controller[0].service=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc' \ |
| 189 | --set 'voltha.services.controller[0].port=6653' \ |
| 190 | --set 'voltha.services.controller[0].address=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc:6653' \ |
| 191 | --set 'voltha.services.controller[1].service=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc' \ |
| 192 | --set 'voltha.services.controller[1].port=6653' \ |
| 193 | --set 'voltha.services.controller[1].address=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc:6653' \ |
| 194 | --set 'voltha.services.controller[2].service=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc' \ |
| 195 | --set 'voltha.services.controller[2].port=6653' \ |
| 196 | --set 'voltha.services.controller[2].address=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc:6653' \ |
| 197 | --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack |
| 198 | |
| 199 | ETCD |
| 200 | ---- |
| 201 | A `major` update is needed because tearing down the ETCD cluster means deleting the data stored, |
| 202 | thus requiring a rebuild by the different components. |
| 203 | |
| 204 | The procedure is as follows: |
| 205 | |
| 206 | #. deploy a new ETCD cluster. |
| 207 | #. un-provision all the subscribers via ONOS REST API |
| 208 | #. delete the OLT device (not strictly required, but best to ensure clean state) |
| 209 | #. redeploy the voltha stack with the `voltha-stack` `helm` chart pointing it to the new ETCD endpoints. |
| 210 | #. re-provision the OLT |
| 211 | #. re-provision the subscribers |
| 212 | #. repeat steps 2 to 6 for each stack connected to the ETCD cluster you want to update. |
| 213 | |
Andrea Campanella | c18d118 | 2021-09-10 12:01:38 +0200 | [diff] [blame] | 214 | Details on the `helm` commands for the voltha stack can be found in the `voltha-helm-charts README file <../voltha-helm-charts/README.md>`_ |
Andrea Campanella | 448fbc2 | 2021-05-13 15:39:00 +0200 | [diff] [blame] | 215 | |
| 216 | Following is an example of how to deploy a new 3-node ETCD cluster: |
| 217 | |
| 218 | .. code:: bash |
| 219 | |
| 220 | helm install --create-namespace --set auth.rbac.enabled=false,persistence.enabled=false,statefulset.replicaCount=3 --namespace infra etcd bitnami/etcd |
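
Before pointing the stacks at the new cluster, its health can be checked with `etcdctl`. This is a
sketch under the assumption that the release above creates a pod named `etcd-0` in the `infra`
namespace and that `etcdctl` is available inside the container; the helper name `etcd_health` is
hypothetical.

.. code:: bash

    # Sketch: run "etcdctl endpoint health" inside one ETCD pod.
    # Namespace and pod name are assumptions, pass your own if they differ.
    etcd_health() {
        local namespace="${1:-infra}" pod="${2:-etcd-0}"
        kubectl exec -n "$namespace" "$pod" -- etcdctl endpoint health
    }

    # Usage:
    #   etcd_health infra etcd-0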
| 221 | |
| 222 | KAFKA Update |
| 223 | ============ |
| 224 | An update of Kafka is not considered to be a `major` software upgrade because it can be performed with |
| 225 | no service impact to the user. Following is an example of how to deploy a new Kafka cluster: |
| 226 | |
| 227 | .. code:: bash |
| 228 | |
| 229 | helm install --create-namespace --set global.log_level=WARN --namespace infra kafka bitnami/kafka |
| 230 | |
| 231 | Following is an example of how to re-deploy the stack pods, using the `voltha-stack` chart, |
| 232 | pointing them to the new kafka (`voltha-infra-kafka-2.infra.svc`) endpoints. |
| 233 | Each pod will be restarted, but without dataplane interruption, because this is equivalent to a regular pod restart, |
| 234 | which leverages the data stored in ETCD. |
| 235 | |
| 236 | .. code:: bash |
| 237 | |
| 238 | helm upgrade --install --create-namespace \ |
| 239 | --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \ |
| 240 | --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \ |
| 241 | --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \ |
| 242 | --set services.kafka.adapter.service=voltha-infra-kafka-2.infra.svc \ |
| 243 | --set services.kafka.cluster.service=voltha-infra-kafka-2.infra.svc \ |
| 244 | --set services.etcd.service=voltha-infra-etcd.infra.svc \ |
| 245 | --set 'voltha.services.controller[0].service=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc' \ |
| 246 | --set 'voltha.services.controller[0].port=6653' \ |
| 247 | --set 'voltha.services.controller[0].address=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc:6653' \ |
| 248 | --set 'voltha.services.controller[1].service=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc' \ |
| 249 | --set 'voltha.services.controller[1].port=6653' \ |
| 250 | --set 'voltha.services.controller[1].address=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc:6653' \ |
| 251 | --set 'voltha.services.controller[2].service=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc' \ |
| 252 | --set 'voltha.services.controller[2].port=6653' \ |
| 253 | --set 'voltha.services.controller[2].address=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc:6653' \ |
| 254 | --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack |
| 255 | |
| 256 | |
| 257 | .. _openolt-update: |
| 258 | |
| 259 | OpenOLT Agent Update |
| 260 | ==================== |
| 261 | |
| 262 | The `openolt agent` on the box can be upgraded without having to tear down the VOLTHA stack to which the OLT is |
| 263 | connected. Here again we make the distinction between a minor update and a major update of the openolt agent. |
| 264 | A minor update happens when there is no API change between the `openolt agent` and the `openolt adapter`, meaning the |
| 265 | `openolt.proto` has not been updated in either of those components. |
| 266 | A major update is required when there are changes to the `openolt.proto` API. |
| 267 | |
| 268 | Both updates of the OpenOLT agent are service impacting for the customer. |
| 269 | |
| 270 | Minor Update |
| 271 | ------------ |
| 272 | A minor update will be seen from VOLTHA as a reboot of the OLT. |
| 273 | During a minor update of the openolt agent no northbound operation should be performed; in-progress provisioning calls will be |
| 274 | reconciled upon OLT reboot. Events, metrics and performance measurement data can be lost and should not be expected |
| 275 | during this procedure. |
| 276 | The procedure is as follows: |
| 277 | |
| 278 | #. place the new openolt agent `.deb` package on the desired OLT. |
| 279 | #. stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT. |
| 280 | #. run the new openolt packages |
| 281 | #. reboot the OLT hardware. |
| 282 | |
| 283 | After these steps are done VOLTHA will reconnect to the OLT and re-provision data accordingly. |
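
The steps above can be sketched as a small helper run from an operator workstation. Everything
host-specific here is an assumption: root SSH access to the OLT, the `service` names
`openolt` and `dev_mgmt_daemon`, installation via `dpkg -i`, and the `/tmp` staging path. The helper
name `upgrade_olt_agent` is hypothetical.

.. code:: bash

    # Sketch: minor update of the openolt agent on a single OLT.
    upgrade_olt_agent() {
        local olt_host="$1" deb_file="$2"
        local remote_deb
        remote_deb="/tmp/$(basename "$deb_file")"
        # 1. place the new package on the OLT
        scp "$deb_file" "root@${olt_host}:${remote_deb}" || return 1
        # 2-4. stop the running processes, install the package, reboot
        ssh "root@${olt_host}" \
            "service openolt stop; service dev_mgmt_daemon stop; dpkg -i '${remote_deb}' && reboot"
    }

    # Usage (host and file name are placeholders):
    #   upgrade_olt_agent <olt-ip> ./openolt.deb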
| 284 | |
| 285 | Major update |
| 286 | ------------ |
| 287 | A major update will require the OLT to be deleted from VOLTHA to ensure no inconsistent data is stored. |
| 288 | During a major update of the openolt agent and adapter no northbound operation should be performed, and |
| 289 | in-progress calls will fail. Events, metrics and performance measurement data will be lost. |
| 290 | The procedure is as follows: |
| 291 | |
| 292 | #. Delete the OLT device from VOLTHA (e.g. `voltctl device delete <olt_id>`) |
| 293 | #. Upgrade the openolt-adapter to the new version via `helm upgrade`. |
| 294 | #. place the new openolt agent `.deb` package on the desired OLT. |
| 295 | #. stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT. |
| 296 | #. run the new openolt packages |
| 297 | #. reboot the OLT hardware. |
| 298 | #. re-provision the OLT (e.g. `voltctl device provision <ip:port>`) |
| 299 | #. re-enable the OLT (e.g. `voltctl device enable <olt_id>`) |
| 300 | #. re-provision the subscribers. |
| 301 | |
| 302 | After these steps VOLTHA effectively treats the OLT as a brand new one which it had no prior knowledge of. |
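
The re-provision and re-enable steps can be chained, since the device id is only known once the
OLT has been created again. The sketch below assumes a voltctl release in which pre-provisioning
is exposed as `voltctl device create -t openolt -H <ip:port>` and prints the new device id;
substitute the command your voltctl version provides if it differs. The helper name
`reprovision_olt` is hypothetical.

.. code:: bash

    # Sketch: pre-provision an OLT and enable it using the returned device id.
    reprovision_olt() {
        local olt_address="$1" olt_id
        # Assumption: the create command prints the id of the new device
        olt_id="$(voltctl device create -t openolt -H "$olt_address")" || return 1
        voltctl device enable "$olt_id"
    }

    # Usage (address is a placeholder):
    #   reprovision_olt <ip:port>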