..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

General Procedures
==================

Edge shutdown procedure
-----------------------

To gracefully shut down an Aether Edge Pod, follow these steps:

1. Shut down the fabric switches using ``shutdown -h now``.

2. Shut down the compute servers using ``shutdown -h now``.

3. Shut down the management server using ``shutdown -h now``.

4. The management switch and eNB aren't capable of a graceful shutdown, so no
   steps need to be taken for that hardware.

5. Remove power from the pod.

.. note::

   The shutdown steps can be automated with an :doc:`ad-hoc Ansible command
   <ansible:user_guide/intro_adhoc>` if you have an Ansible inventory of all
   the systems::

      ansible -i inventory/sitename.ini -b -m shutdown -a "delay=60" all

   The ``delay=60`` argument delays each shutdown by 60 seconds, so that hosts
   behind the management server can still be reached before the management
   server shuts down.

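For sites without an Ansible inventory, the ordering in the steps above can be
sketched as a small shell function. This is a minimal sketch, not a supported
tool: the host names are hypothetical placeholders, and it assumes each host
is reachable over SSH with sudo rights. Setting ``DRY_RUN=1`` only prints the
commands.

.. code-block:: shell

   #!/bin/sh
   # Sketch only: the default host names are hypothetical placeholders.
   FABRIC_SWITCHES="${FABRIC_SWITCHES:-fabric1 fabric2}"
   COMPUTE_SERVERS="${COMPUTE_SERVERS:-compute1 compute2 compute3}"
   MGMT_SERVER="${MGMT_SERVER:-mgmtserver1}"

   # Run a command, or only print it when DRY_RUN=1.
   run() {
       if [ "${DRY_RUN:-0}" = "1" ]; then
           echo "$*"
       else
           "$@"
       fi
   }

   # Shut hosts down in the documented order: fabric switches, then compute
   # servers, then the management server last, because the other hosts are
   # reached through it.
   shutdown_pod() {
       for host in $FABRIC_SWITCHES $COMPUTE_SERVERS $MGMT_SERVER; do
           run ssh "$host" sudo shutdown -h now
       done
   }

Running ``DRY_RUN=1 shutdown_pod`` first is a cheap way to check the order
before touching any hardware.
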
Edge power up procedure
-----------------------

1. Restore power to the pod equipment. The fabric and management switches will
   power on automatically.

2. Turn on the management server using the front panel power button.

3. Turn on the compute servers using the front panel power buttons.

Restore stateful application procedure
--------------------------------------

.. note::

   PersistentVolumeClaim/PersistentVolume backup and restore is currently only available for ACC and AMP clusters.

1. Download and install the Velero CLI following the `official guide <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
   You'll also need the ``kubectl`` and ``helm`` command line tools.

2. Download the K8S config of the target cluster from Rancher to your workstation.

3. Open the Rancher **Continuous Delivery** > **Clusters** dashboard,
   find the cluster the target application is running on,
   and temporarily update the cluster label used as the target application's cluster selector.
   This uninstalls the application and prevents it from being reinstalled during the restore process.
   Refer to the table below for the cluster selector labels of the Aether applications.
   It may take several minutes for the application to be uninstalled.

   +-------------+-----------------+------------------+
   | Application | Original Label  | Temporary Label  |
   +-------------+-----------------+------------------+
   | cassandra   | core4g=enabled  | core4g=disabled  |
   +-------------+-----------------+------------------+
   | mongodb     | core5g=enabled  | core5g=disabled  |
   +-------------+-----------------+------------------+
   | roc         | roc=enabled     | roc=disabled     |
   +-------------+-----------------+------------------+

   .. image:: images/rancher-fleet-cluster-label-edit1.png
      :width: 753

   .. image:: images/rancher-fleet-cluster-label-edit2.png
      :width: 753

4. Clean up the existing PVC and PV for the application. In this guide, Cassandra is used as an example.

   .. code-block:: shell

      # Assume that we lost all HSSDB data
      $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
      <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"

      # Confirm the application is uninstalled after updating the cluster label
      $ helm list -n aether-sdcore-4g
      (no result)

      # Clean up any remaining resources including PVC
      $ kubectl delete ns aether-sdcore-4g

      # Clean up released PVs, if any exist
      $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '{print $1}')

5. Find a backup to restore.

   .. code-block:: shell

      # Find the relevant backup schedule name
      $ velero schedule get
      NAME                         STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
      velero-daily-cassandra       Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=cassandra
      velero-daily-mongodb         Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app.kubernetes.io/name=mongodb
      velero-daily-opendistro-es   Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=opendistro-es
      velero-daily-prometheus      Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=prometheus

      # List the backups
      $ velero backup get --selector velero.io/schedule-name=velero-daily-cassandra
      NAME                                    STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
      velero-daily-cassandra-20211012070020   Completed   0        0          2021-10-12 00:00:41 -0700 PDT   29d       default            app=cassandra
      velero-daily-cassandra-20211011070019   Completed   0        0          2021-10-11 00:00:26 -0700 PDT   28d       default            app=cassandra
      ...

      # Confirm the backup includes all the necessary resources
      $ velero backup describe velero-daily-cassandra-20211012070020 --details
      ...
      Resource List:
        v1/PersistentVolume:
          - pvc-50ccd76e-3808-432b-882f-8858ecebf25b
          - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
        v1/PersistentVolumeClaim:
          - aether-sdcore-4g/data-cassandra-0
          - aether-sdcore-4g/data-cassandra-1
          - aether-sdcore-4g/data-cassandra-2

6. Set the backup storage location to read-only mode to prevent backup objects from being created or
   deleted in the backup location during the restore process.

   .. code-block:: shell

      $ kubectl patch backupstoragelocations default \
          --namespace velero \
          --type merge \
          --patch '{"spec":{"accessMode":"ReadOnly"}}'

7. Create a restore with the most recent backup.

   .. code-block:: shell

      # Create restore
      $ velero restore create --from-backup velero-daily-cassandra-20211012070020

      # Wait until STATUS becomes Completed
      $ velero restore get
      NAME                                                   BACKUP                                  STATUS      STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
      velero-daily-cassandra-20211012070020-20211012141850   velero-daily-cassandra-20211012070020   Completed   2021-10-12 13:11:20 -0700 PDT   <nil>       0        0          2021-10-12 13:11:20 -0700 PDT   <none>

8. Confirm that the PVCs are restored and "Bound" to the restored PVs.

   .. code-block:: shell

      $ kubectl get pvc -n aether-sdcore-4g
      NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      data-cassandra-0   Bound    pvc-50ccd76e-3808-432b-882f-8858ecebf25b   10Gi       RWO            standard       45s
      data-cassandra-1   Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
      data-cassandra-2   Bound    pvc-a7f055b2-aab1-41ce-b3f4-c4bcb83b0232   10Gi       RWO            standard       45s

9. Revert the backup storage location to read-write mode.

   .. code-block:: shell

      $ kubectl patch backupstoragelocations default \
          --namespace velero \
          --type merge \
          --patch '{"spec":{"accessMode":"ReadWrite"}}'

10. Revert the cluster label to the original and wait for Fleet to reinstall the application.
    It may take several minutes.

    .. code-block:: shell

       # Confirm the application is installed
       $ kubectl get po -n aether-sdcore-4g -l app=cassandra
       NAME          READY   STATUS    RESTARTS   AGE
       cassandra-0   1/1     Running   0          1h
       cassandra-1   1/1     Running   0          1h
       cassandra-2   1/1     Running   0          1h

       # Confirm the data is restored
       $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
       ...
       (10227 rows)
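
Two of the manual checks in the restore procedure, selecting released PVs
(step 4) and waiting for the restore to finish (step 7), could be scripted
roughly as follows. This is a sketch under stated assumptions, not part of the
Aether tooling: the function names ``released_pvs`` and ``wait_for_restore``
are hypothetical, Velero is assumed to run in the ``velero`` namespace, and
the PV filter assumes the default ``kubectl get pv`` column layout, where
STATUS is the fifth field and CLAIM the sixth.

.. code-block:: shell

   #!/bin/sh
   # Print the names of Released PVs whose claim mentions the given app.
   # Reads `kubectl get pv --no-headers` rows on stdin.
   released_pvs() {
       app="$1"
       awk -v app="$app" '$5 == "Released" && index($6, app) { print $1 }'
   }

   # Poll a Velero Restore object until its phase is Completed.
   # $1: restore name, $2: maximum number of polls (default 60).
   wait_for_restore() {
       name="$1"
       tries="${2:-60}"
       while [ "$tries" -gt 0 ]; do
           phase=$(kubectl get restores.velero.io "$name" \
               --namespace velero -o jsonpath='{.status.phase}')
           case "$phase" in
               Completed)
                   return 0 ;;
               Failed|PartiallyFailed|FailedValidation)
                   echo "restore $name ended in phase $phase" >&2
                   return 1 ;;
           esac
           tries=$((tries - 1))
           # Skip the sleep once the last poll has been used up.
           [ "$tries" -gt 0 ] && sleep "${POLL_SECONDS:-10}"
       done
       echo "timed out waiting for restore $name" >&2
       return 1
   }

For example, ``kubectl get pv --no-headers | released_pvs cassandra`` lists
the PVs that step 4 would delete. Checking that the list is non-empty before
passing it to ``kubectl delete pv`` avoids an error when nothing is released.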