..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

General Procedures
==================

Edge shutdown procedure
-----------------------

To gracefully shut down an Aether Edge Pod, follow these steps:

1. Shut down the fabric switches using ``shutdown -h now``.

2. Shut down the compute servers using ``shutdown -h now``.

3. Shut down the management server using ``shutdown -h now``.

4. The management switch and eNB aren't capable of a graceful shutdown, so no
   steps need to be taken for that hardware.

5. Remove power from the pod.

.. note::

   The shutdown steps can be automated with an :doc:`ad-hoc Ansible command
   <ansible:user_guide/intro_adhoc>` if you have an Ansible inventory of all
   the systems::

     ansible -i inventory/sitename.ini -b -m shutdown -a "delay=60" all

   The ``delay=60`` argument allows hosts behind the management server to
   be reached before the management server shuts down.

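
If the inventory groups hosts by role, the order of the manual steps can be
kept by running the same ad-hoc command once per group. The following is a
minimal sketch, not a tested site procedure: the group names ``fabric``,
``compute`` and ``management`` and the function name ``shutdown_pod`` are
assumptions and need to match your own inventory.

.. code-block:: shell

   # Sketch: shut down one inventory group at a time, in the documented order.
   # The group names below are hypothetical; adjust them to your inventory.
   shutdown_pod() {
       inventory="$1"
       for group in fabric compute management; do
           echo "Shutting down group: ${group}"
           # Stop at the first group that reports a failure
           ansible -i "${inventory}" -b -m shutdown -a "delay=60" "${group}" || return 1
       done
   }

   # Usage: shutdown_pod inventory/sitename.ini
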
Edge power up procedure
-----------------------

1. Restore power to the pod equipment. The fabric and management switches will
   power on automatically.

2. Turn on the management server using the front panel power button.

3. Turn on the compute servers using the front panel power buttons.

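
After power-up, reachability of the hosts can be checked with the Ansible
``ping`` module and the same inventory used for the shutdown command. The
following is a sketch under assumptions: the function name
``wait_for_hosts`` and the default retry count and interval are illustrative
values, not part of the Aether tooling.

.. code-block:: shell

   # Sketch: poll until every host in the inventory answers the Ansible ping module.
   wait_for_hosts() {
       inventory="$1"
       attempts="${2:-30}"    # number of tries (assumed default)
       interval="${3:-20}"    # seconds between tries (assumed default)
       i=1
       while [ "${i}" -le "${attempts}" ]; do
           if ansible -i "${inventory}" -m ping all >/dev/null 2>&1; then
               echo "all hosts reachable"
               return 0
           fi
           echo "attempt ${i}/${attempts}: some hosts are not reachable yet"
           sleep "${interval}"
           i=$((i + 1))
       done
       return 1
   }

   # Usage: wait_for_hosts inventory/sitename.ini
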
Restore stateful application procedure
--------------------------------------

.. note::

   PersistentVolumeClaim/PersistentVolume backup and restore is currently only available for ACC and AMP clusters.

1. Download and install the Velero CLI following the `official guide <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
   You'll also need the ``kubectl`` and ``helm`` command line tools.

2. Download the K8S config of the target cluster from Rancher to your workstation.

3. Open the Rancher **Continuous Delivery** > **Clusters** dashboard,
   find the cluster the target application is running on,
   and temporarily update the cluster label used as the target application's cluster selector.
   This uninstalls the application and prevents it from being reinstalled during the restore process.
   Refer to the table below for the cluster selector labels for the Aether applications.
   It may take several minutes for the application to be uninstalled.

   +-------------+-----------------+------------------+
   | Application | Original Label  | Temporary Label  |
   +-------------+-----------------+------------------+
   | cassandra   | core4g=enabled  | core4g=disabled  |
   +-------------+-----------------+------------------+
   | mongodb     | core5g=enabled  | core5g=disabled  |
   +-------------+-----------------+------------------+
   | roc         | roc=enabled     | roc=disabled     |
   +-------------+-----------------+------------------+

.. image:: images/rancher-fleet-cluster-label-edit1.png
   :width: 753

.. image:: images/rancher-fleet-cluster-label-edit2.png
   :width: 753

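
The label change in step 3 can probably also be made from the command line,
because Rancher Continuous Delivery (Fleet) stores each registered cluster as
a ``clusters.fleet.cattle.io`` resource. The sketch below rests on several
assumptions: ``kubectl`` points at the Rancher management cluster rather than
the target cluster, the cluster is registered in the ``fleet-default``
workspace, and ``my-acc-cluster`` and ``set_fleet_cluster_label`` are
placeholder names.

.. code-block:: shell

   # Sketch: set a Fleet cluster selector label with kubectl.
   # Run this against the Rancher management cluster, not the target cluster.
   set_fleet_cluster_label() {
       cluster="$1"                       # Fleet cluster name
       label="$2"                         # e.g. core4g=disabled
       workspace="${3:-fleet-default}"    # assumed Fleet workspace namespace
       kubectl label clusters.fleet.cattle.io "${cluster}" \
           --namespace "${workspace}" \
           --overwrite \
           "${label}"
   }

   # Usage (the cluster name is a placeholder):
   #   set_fleet_cluster_label my-acc-cluster core4g=disabled   # before the restore
   #   set_fleet_cluster_label my-acc-cluster core4g=enabled    # when reverting in step 10
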
4. Clean up the existing PVCs and PVs for the application. In this guide, Cassandra is used as an example.

.. code-block:: shell

   # Assume that we lost all HSSDB data
   $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"

   # Confirm the application is uninstalled after updating the cluster label
   $ helm list -n aether-sdcore-4g
   (no result)

   # Clean up any remaining resources, including PVCs
   $ kubectl delete ns aether-sdcore-4g

   # Clean up released PVs, if any exist
   $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '$1 {print$1}')

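
The last command in step 4 calls ``kubectl delete pv`` with no arguments when
no released PV matches, which makes ``kubectl`` report an error. The
following sketch is a guarded variant that does nothing in that case. The
function name ``delete_released_pvs`` is hypothetical, and the match is made
on the claim name only.

.. code-block:: shell

   # Sketch: delete only PVs in the Released phase whose claim name contains
   # the given pattern; do nothing when there are none.
   delete_released_pvs() {
       pattern="$1"    # e.g. cassandra
       pvs="$(kubectl get pv --no-headers \
           -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,CLAIM:.spec.claimRef.name \
           | awk -v p="${pattern}" '$2 == "Released" && index($3, p) { print $1 }')"
       if [ -z "${pvs}" ]; then
           echo "no released PVs matching ${pattern}"
           return 0
       fi
       # Word splitting is intended here: one argument per PV name.
       # shellcheck disable=SC2086
       kubectl delete pv ${pvs}
   }

   # Usage: delete_released_pvs cassandra
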
5. Find a backup to restore.

.. code-block:: shell

   # Find the relevant backup schedule name
   $ velero schedule get
   NAME                     STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
   velero-daily-cassandra   Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=cassandra
   velero-daily-mongodb     Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app.kubernetes.io/name=mongodb

   # List the backups
   $ velero backup get --selector velero.io/schedule-name=velero-daily-cassandra
   NAME                                    STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
   velero-daily-cassandra-20211012070020   Completed   0        0          2021-10-12 00:00:41 -0700 PDT   29d       default            app=cassandra
   velero-daily-cassandra-20211011070019   Completed   0        0          2021-10-11 00:00:26 -0700 PDT   28d       default            app=cassandra
   ...

   # Confirm the backup includes all the necessary resources
   $ velero backup describe velero-daily-cassandra-20211012070020 --details
   ...
   Resource List:
     v1/PersistentVolume:
       - pvc-50ccd76e-3808-432b-882f-8858ecebf25b
       - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
     v1/PersistentVolumeClaim:
       - aether-sdcore-4g/data-cassandra-0
       - aether-sdcore-4g/data-cassandra-1
       - aether-sdcore-4g/data-cassandra-2

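
The name of the most recent completed backup for a schedule can also be
looked up directly from the Velero ``Backup`` resources instead of reading it
off the table. This is a sketch that assumes Velero is installed in the
``velero`` namespace, as in the ``kubectl patch`` commands in this guide.
The function name ``latest_completed_backup`` is hypothetical.

.. code-block:: shell

   # Sketch: print the newest backup of a schedule that is in the Completed phase.
   latest_completed_backup() {
       schedule="$1"    # e.g. velero-daily-cassandra
       # List backups oldest first, then keep the last one marked Completed
       kubectl get backups.velero.io \
           --namespace velero \
           --selector "velero.io/schedule-name=${schedule}" \
           --sort-by=.metadata.creationTimestamp \
           --no-headers \
           -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
           | awk '$2 == "Completed" { name = $1 } END { if (name != "") print name }'
   }

   # Usage: backup="$(latest_completed_backup velero-daily-cassandra)"

The function prints nothing when the schedule has no completed backup, so
check that the result is non-empty before passing it to ``velero restore create``.
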
6. Update the backup storage location to read-only mode to prevent backup objects from being created or
   deleted in the backup location during the restore process.

.. code-block:: shell

   $ kubectl patch backupstoragelocations default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadOnly"}}'

7. Create a restore with the most recent backup.

.. code-block:: shell

   # Create restore
   $ velero restore create --from-backup velero-daily-cassandra-20211012070020

   # Wait until STATUS becomes Completed
   $ velero restore get
   NAME                                                   BACKUP                                  STATUS      STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
   velero-daily-cassandra-20211012070020-20211012141850   velero-daily-cassandra-20211012070020   Completed   2021-10-12 13:11:20 -0700 PDT   <nil>       0        0          2021-10-12 13:11:20 -0700 PDT   <none>

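
Instead of re-running ``velero restore get`` by hand, the restore phase can be
polled from the ``Restore`` resource. The following is a minimal sketch: it
assumes Velero runs in the ``velero`` namespace, and the function name
``wait_for_restore`` and the default retry values are illustrative.

.. code-block:: shell

   # Sketch: poll a Velero restore until it reaches a terminal phase.
   wait_for_restore() {
       restore="$1"           # restore name as shown by `velero restore get`
       attempts="${2:-60}"    # number of polls (assumed default)
       interval="${3:-10}"    # seconds between polls (assumed default)
       i=1
       while [ "${i}" -le "${attempts}" ]; do
           phase="$(kubectl get restores.velero.io "${restore}" \
               --namespace velero -o jsonpath='{.status.phase}')"
           case "${phase}" in
               Completed)
                   echo "restore ${restore} completed"
                   return 0 ;;
               Failed|PartiallyFailed|FailedValidation)
                   echo "restore ${restore} ended in phase ${phase}" >&2
                   return 1 ;;
           esac
           sleep "${interval}"
           i=$((i + 1))
       done
       echo "timed out waiting for restore ${restore}" >&2
       return 1
   }

   # Usage: wait_for_restore velero-daily-cassandra-20211012070020-20211012141850
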
8. Confirm that the PVCs are restored and "Bound" to the restored PVs successfully.

.. code-block:: shell

   $ kubectl get pvc -n aether-sdcore-4g
   NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
   data-cassandra-0   Bound    pvc-50ccd76e-3808-432b-882f-8858ecebf25b   10Gi       RWO            standard       45s
   data-cassandra-1   Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
   data-cassandra-2   Bound    pvc-a7f055b2-aab1-41ce-b3f4-c4bcb83b0232   10Gi       RWO            standard       45s

9. Revert the backup storage location to read-write mode.

.. code-block:: shell

   $ kubectl patch backupstoragelocation default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadWrite"}}'

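
Because scheduled backups stay blocked while the location is read-only, it
can help to read the access mode back after each patch. The sketch below
wraps the patch commands from steps 6 and 9 in one function and verifies the
result. The function name ``set_bsl_access_mode`` is hypothetical.

.. code-block:: shell

   # Sketch: patch the default backup storage location and verify the new mode.
   set_bsl_access_mode() {
       mode="$1"    # ReadOnly or ReadWrite
       case "${mode}" in
           ReadOnly|ReadWrite) ;;
           *) echo "unsupported access mode: ${mode}" >&2; return 2 ;;
       esac
       kubectl patch backupstoragelocation default \
           --namespace velero \
           --type merge \
           --patch "{\"spec\":{\"accessMode\":\"${mode}\"}}" || return 1
       # Read the field back; the function succeeds only if it matches
       current="$(kubectl get backupstoragelocation default \
           --namespace velero -o jsonpath='{.spec.accessMode}')"
       [ "${current}" = "${mode}" ]
   }

   # Usage:
   #   set_bsl_access_mode ReadOnly     # before the restore
   #   set_bsl_access_mode ReadWrite    # after the restore
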
10. Revert the cluster label to the original value and wait for Fleet to reinstall the application.
    It may take several minutes.

.. code-block:: shell

   # Confirm the application is installed
   $ kubectl get po -n aether-sdcore-4g -l app=cassandra
   NAME          READY   STATUS    RESTARTS   AGE
   cassandra-0   1/1     Running   0          1h
   cassandra-1   1/1     Running   0          1h
   cassandra-2   1/1     Running   0          1h

   # Confirm the data is restored
   $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   ...
   (10227 rows)