..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

General Procedures
==================

Edge shutdown procedure
-----------------------

To gracefully shut down an Aether Edge Pod, follow these steps:

1. Shut down the fabric switches using ``shutdown -h now``.

2. Shut down the compute servers using ``shutdown -h now``.

3. Shut down the management server using ``shutdown -h now``.

4. The management switch and eNB aren't capable of a graceful shutdown, so no
   steps need to be taken for that hardware.

5. Remove power from the pod.

.. note::

   The shutdown steps can be automated with an :doc:`ad-hoc Ansible command
   <ansible:user_guide/intro_adhoc>` if you have an Ansible inventory of all
   the systems::

     ansible -i inventory/sitename.ini -b -m shutdown -a "delay=60" all

   The ``delay=60`` argument allows hosts behind the management server to
   be reached before the management server shuts down.

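Where Ansible is not available, the same ordering can be scripted over SSH.
The following is a minimal sketch, not part of the Aether tooling: the host
names are hypothetical placeholders, and it assumes passwordless SSH and
``sudo`` on every host. With ``DRY_RUN`` left at its default of ``1`` it only
prints the commands it would run.

.. code-block:: shell

   # Hypothetical host names; replace them with the systems at your site.
   FABRIC_SWITCHES="fabric1 fabric2"
   COMPUTE_SERVERS="compute1 compute2 compute3"
   MGMT_SERVER="mgmtserver1"

   # Print the hosts in the order the procedure shuts them down:
   # fabric switches, then compute servers, then the management server.
   shutdown_order() {
     printf '%s\n' $FABRIC_SWITCHES $COMPUTE_SERVERS $MGMT_SERVER
   }

   # Shut down every host in order. DRY_RUN=1 (the default) only prints.
   shutdown_pod() {
     local host
     for host in $(shutdown_order); do
       if [ "${DRY_RUN:-1}" = "1" ]; then
         echo "ssh $host sudo shutdown -h now"
       else
         ssh "$host" sudo shutdown -h now
       fi
     done
   }

Run ``shutdown_pod`` to review the command list, then ``DRY_RUN=0 shutdown_pod``
to execute it. The management server comes last so that hosts reached through
it stay accessible until they have been shut down.
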
Edge power up procedure
-----------------------

1. Restore power to the pod equipment. The fabric and management switches will
   power on automatically.

2. Turn on the management server using the front panel power button.

3. Turn on the compute servers using the front panel power buttons.

Restore stateful application procedure
--------------------------------------

.. note::

   PersistentVolumeClaim/PersistentVolume backup and restore is currently only available for ACC and AMP clusters.

1. Download and install the Velero CLI following the `official guide <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
   You'll also need the ``kubectl`` and ``helm`` command line tools.

2. Download the K8S config of the target cluster from Rancher to your workstation.

3. Open the Rancher **Continuous Delivery** > **Clusters** dashboard,
   find the cluster the target application is running on,
   and temporarily update the cluster label used as the target application's cluster selector
   to uninstall the application and prevent it from being reinstalled during the restore process.
   Refer to the table below for the cluster selector labels for the Aether applications.
   It may take several minutes for the application to be uninstalled.

   +-------------+-----------------+------------------+
   | Application | Original Label  | Temporary Label  |
   +-------------+-----------------+------------------+
   | cassandra   | core4g=enabled  | core4g=disabled  |
   +-------------+-----------------+------------------+
   | mongodb     | core5g=enabled  | core5g=disabled  |
   +-------------+-----------------+------------------+
   | roc         | roc=enabled     | roc=disabled     |
   +-------------+-----------------+------------------+

.. image:: images/rancher-fleet-cluster-label-edit1.png
   :width: 753

.. image:: images/rancher-fleet-cluster-label-edit2.png
   :width: 753

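The label change can also be prepared from the command line. The sketch below
only restates the label table as a lookup function and prints a
``kubectl label`` command for review. It assumes that the Fleet cluster
objects (``clusters.fleet.cattle.io``) live in the ``fleet-default`` namespace
of the Rancher management cluster, which may differ in your deployment; the
function names and the cluster name are hypothetical.

.. code-block:: shell

   # Map an application name to its temporary label from the table.
   temporary_label() {
     case "$1" in
       cassandra) echo "core4g=disabled" ;;
       mongodb)   echo "core5g=disabled" ;;
       roc)       echo "roc=disabled" ;;
       *)         echo "unknown application: $1" >&2; return 1 ;;
     esac
   }

   # Print (do not run) the command that would apply the temporary label.
   # Assumption: Fleet cluster objects are in the "fleet-default" namespace.
   print_label_command() {
     local cluster="$1" app="$2" label
     label="$(temporary_label "$app")" || return 1
     echo "kubectl label clusters.fleet.cattle.io $cluster -n fleet-default $label --overwrite"
   }

For example, ``print_label_command my-cluster cassandra`` prints the command
for the Cassandra selector so it can be checked before it is run against the
Rancher management cluster.
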
4. Clean up the existing PVC and PV for the application. In this guide, Cassandra is used as an example.

.. code-block:: shell

   # Assume that we lost all HSSDB data
   $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"

   # Confirm the application is uninstalled after updating the cluster label
   $ helm list -n aether-sdcore-4g
   (no result)

   # Clean up any remaining resources including PVC
   $ kubectl delete ns aether-sdcore-4g

   # Clean up released PVs, if any exist
   $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '{print $1}')

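When no released PV matches, the command substitution in the last command
expands to nothing and ``kubectl delete pv`` exits with an error. A slightly
more defensive variant is sketched here. It assumes that
``kubectl get pv --no-headers`` prints the status in the fifth column and the
claim in the sixth; ``released_pvs`` and ``delete_released_pvs`` are
hypothetical helper names.

.. code-block:: shell

   # Read `kubectl get pv --no-headers` output on stdin and print the names
   # of Released volumes whose claim matches the given pattern.
   released_pvs() {
     awk -v pat="$1" '$5 == "Released" && $6 ~ pat { print $1 }'
   }

   # Delete the matching released PVs, or report that there are none.
   delete_released_pvs() {
     local pvs
     pvs="$(kubectl get pv --no-headers | released_pvs "$1")"
     if [ -n "$pvs" ]; then
       # Word splitting is intended here: one argument per PV name.
       kubectl delete pv $pvs
     else
       echo "no released PVs match '$1'"
     fi
   }

Calling ``delete_released_pvs cassandra`` then replaces the last command of
the block above.
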
5. Find a backup to restore.

.. code-block:: shell

   # Find the relevant backup schedule name
   $ velero schedule get
   NAME                         STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
   velero-daily-cassandra       Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=cassandra
   velero-daily-mongodb         Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app.kubernetes.io/name=mongodb
   velero-daily-opendistro-es   Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=opendistro-es
   velero-daily-prometheus      Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=prometheus

   # List the backups
   $ velero backup get --selector velero.io/schedule-name=velero-daily-cassandra
   NAME                                    STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
   velero-daily-cassandra-20211012070020   Completed   0        0          2021-10-12 00:00:41 -0700 PDT   29d       default            app=cassandra
   velero-daily-cassandra-20211011070019   Completed   0        0          2021-10-11 00:00:26 -0700 PDT   28d       default            app=cassandra
   ...

   # Confirm the backup includes all the necessary resources
   $ velero backup describe velero-daily-cassandra-20211012070020 --details
   ...
   Resource List:
     v1/PersistentVolume:
       - pvc-50ccd76e-3808-432b-882f-8858ecebf25b
       - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
     v1/PersistentVolumeClaim:
       - aether-sdcore-4g/data-cassandra-0
       - aether-sdcore-4g/data-cassandra-1
       - aether-sdcore-4g/data-cassandra-2

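Picking the newest usable backup from the listing can be scripted. The
following sketch assumes that backup names end in a ``YYYYMMDDhhmmss``
timestamp, as in the output above, so that a plain lexical sort orders them
by age. ``latest_completed_backup`` is a hypothetical helper, not a Velero
command.

.. code-block:: shell

   # Read `velero backup get` table output on stdin and print the name of
   # the newest backup whose STATUS column is "Completed".
   latest_completed_backup() {
     awk 'NR > 1 && $2 == "Completed" { print $1 }' | sort | tail -n 1
   }

   # Example pipeline (requires access to the cluster):
   #   velero backup get --selector velero.io/schedule-name=velero-daily-cassandra \
   #     | latest_completed_backup

Fed the listing shown above, the function prints
``velero-daily-cassandra-20211012070020``. Backups in any other state are
skipped, so a failed run from the current day is never selected.
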
6. Update the backup storage location to read-only mode to prevent backup objects from being created or
   deleted in the backup location during the restore process.

.. code-block:: shell

   $ kubectl patch backupstoragelocations default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadOnly"}}'

7. Create a restore with the most recent backup.

.. code-block:: shell

   # Create restore
   $ velero restore create --from-backup velero-daily-cassandra-20211012070020

   # Wait until STATUS becomes Completed
   $ velero restore get
   NAME                                                   BACKUP                                  STATUS      STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
   velero-daily-cassandra-20211012070020-20211012141850   velero-daily-cassandra-20211012070020   Completed   2021-10-12 13:11:20 -0700 PDT   <nil>       0        0          2021-10-12 13:11:20 -0700 PDT   <none>

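Rather than re-running ``velero restore get`` by hand, the wait can be
scripted. This is a minimal sketch that polls the ``status.phase`` field of
the Velero ``Restore`` object through ``kubectl``; the function names, the
10-second default interval, and the 30-minute default timeout are assumptions
chosen for illustration.

.. code-block:: shell

   # Print the current phase of a restore, e.g. "InProgress" or "Completed".
   restore_phase() {
     kubectl get restores.velero.io "$1" --namespace velero \
       -o jsonpath='{.status.phase}'
   }

   # Poll until the restore reaches a terminal phase or the timeout expires.
   # Usage: wait_for_restore NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
   # Returns 0 on Completed, 1 on a failed phase, 2 on timeout.
   wait_for_restore() {
     local name="$1" timeout="${2:-1800}" interval="${3:-10}" waited=0 phase
     while [ "$waited" -le "$timeout" ]; do
       phase="$(restore_phase "$name")"
       case "$phase" in
         Completed)
           echo "restore $name completed"
           return 0 ;;
         Failed|PartiallyFailed|FailedValidation)
           echo "restore $name ended in phase $phase" >&2
           return 1 ;;
       esac
       sleep "$interval"
       waited=$((waited + interval))
     done
     echo "timed out waiting for restore $name" >&2
     return 2
   }

The restore name to pass is the one shown in the ``NAME`` column of
``velero restore get``.
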
8. Confirm that the PVCs are restored and ``Bound`` to the restored PVs.

.. code-block:: shell

   $ kubectl get pvc -n aether-sdcore-4g
   NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
   data-cassandra-0   Bound    pvc-50ccd76e-3808-432b-882f-8858ecebf25b   10Gi       RWO            standard       45s
   data-cassandra-1   Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
   data-cassandra-2   Bound    pvc-a7f055b2-aab1-41ce-b3f4-c4bcb83b0232   10Gi       RWO            standard       45s

9. Revert the backup storage location to read-write mode.

.. code-block:: shell

   $ kubectl patch backupstoragelocation default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadWrite"}}'

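If the restore is scripted, the switch to read-only mode and the switch back
can be paired so that the storage location is restored to read-write even
when a step in between fails. The sketch below is one way to do that in bash
with an ``EXIT`` trap inside a subshell. The helper names are hypothetical,
and it assumes the storage location is named ``default`` in the ``velero``
namespace, as in the commands above.

.. code-block:: shell

   # Build the merge patch for an access mode (ReadOnly or ReadWrite).
   bsl_patch() {
     printf '{"spec":{"accessMode":"%s"}}' "$1"
   }

   # Apply the access mode to the "default" backup storage location.
   set_bsl_access_mode() {
     kubectl patch backupstoragelocation default \
       --namespace velero \
       --type merge \
       --patch "$(bsl_patch "$1")"
   }

   # Run a command with the location read-only, then always switch it back.
   # The body runs in a subshell so the EXIT trap fires when it finishes.
   with_read_only_bsl() (
     trap 'set_bsl_access_mode ReadWrite' EXIT
     set_bsl_access_mode ReadOnly || exit 1
     "$@"
   )

A possible invocation is ``with_read_only_bsl velero restore create
--from-backup velero-daily-cassandra-20211012070020 --wait``. The ``--wait``
flag matters here: without it the command returns before the restore has
finished and the location would be switched back too early.
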
10. Revert the cluster label to the original and wait for Fleet to reinstall the application.
    It may take several minutes.

.. code-block:: shell

   # Confirm the application is installed
   $ kubectl get po -n aether-sdcore-4g -l app=cassandra
   NAME          READY   STATUS    RESTARTS   AGE
   cassandra-0   1/1     Running   0          1h
   cassandra-1   1/1     Running   0          1h
   cassandra-2   1/1     Running   0          1h

   # Confirm the data is restored
   $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   ...
   (10227 rows)