..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

General Procedures
==================

Edge shutdown procedure
-----------------------

To gracefully shut down an Aether Edge Pod, follow these steps:

1. Shut down the fabric switches using ``shutdown -h now``

2. Shut down the compute servers using ``shutdown -h now``

3. Shut down the management server using ``shutdown -h now``

4. The management switch and eNB aren't capable of a graceful shutdown, so no
   steps need to be taken for that hardware.

5. Remove power from the pod.

.. note::

   The shutdown steps can be automated with an :doc:`ad-hoc ansible command
   <ansible:user_guide/intro_adhoc>` if you have an ansible inventory of all
   the systems::

      ansible -i inventory/sitename.ini -b -m shutdown -a "delay=60" all

   The ``delay=60`` argument allows hosts behind the management server to
   be reached before the management server shuts down.

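Without ansible, the same ordered shutdown can be sketched as a plain SSH
loop. The hostnames below are illustrative placeholders, not part of any real
inventory; substitute the names from your site:

.. code-block:: shell

   # Hypothetical hostnames; replace with the entries from your inventory
   FABRIC_SWITCHES="leaf1 leaf2 spine1 spine2"
   COMPUTE_SERVERS="compute1 compute2 compute3"
   MGMT_SERVER="mgmt1"

   # Fabric switches first, then compute servers
   for host in $FABRIC_SWITCHES $COMPUTE_SERVERS; do
       ssh "$host" sudo shutdown -h now || true  # connection drops as the host halts
   done

   # The management server goes last, since it provides the route
   # to the other nodes
   ssh "$MGMT_SERVER" sudo shutdown -h now || true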
Edge power up procedure
-----------------------

1. Restore power to the pod equipment. The fabric and management switches will
   power on automatically.

2. Turn on the management server using the front panel power button.

3. Turn on the compute servers using the front panel power buttons.
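After power up, it can be worth confirming from the management server that the
compute nodes rejoined the Kubernetes cluster; this check is a suggestion, not
part of the formal procedure:

.. code-block:: shell

   # Flag any node that is not yet in the "Ready" state;
   # no output means all nodes are back
   kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1 " is not ready"}'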

Restore stateful application procedure
--------------------------------------

.. note::

   PersistentVolumeClaim/PersistentVolume backup and restore is currently
   only available for ACC and AMP clusters.

1. Download and install the Velero CLI following the `official guide
   <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
   You'll also need the ``kubectl`` and ``helm`` command line tools.

2. Download the K8S config of the target cluster from Rancher to your
   workstation.

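The downloaded config can be selected via the ``KUBECONFIG`` environment
variable so that subsequent ``kubectl``, ``helm``, and ``velero`` commands
target the right cluster; the file path below is illustrative:

.. code-block:: shell

   # Point the CLI tools at the downloaded config (example path)
   export KUBECONFIG=~/Downloads/target-cluster.yaml

   # Confirm the CLI is talking to the intended cluster
   kubectl config current-context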
3. Open the Rancher **Continuous Delivery** > **Clusters** dashboard,
   find the cluster the target application is running on,
   and temporarily update the cluster label used as the target application's
   cluster selector to uninstall the application and prevent it from being
   reinstalled during the restore process.
   Refer to the table below for the cluster selector labels of the Aether
   applications.
   It may take several minutes for the application to be uninstalled.

   +-------------+-----------------+------------------+
   | Application | Original Label  | Temporary Label  |
   +-------------+-----------------+------------------+
   | cassandra   | core4g=enabled  | core4g=disabled  |
   +-------------+-----------------+------------------+
   | mongodb     | core5g=enabled  | core5g=disabled  |
   +-------------+-----------------+------------------+
   | roc         | roc=enabled     | roc=disabled     |
   +-------------+-----------------+------------------+

.. image:: images/rancher-fleet-cluster-label-edit1.png
   :width: 753

.. image:: images/rancher-fleet-cluster-label-edit2.png
   :width: 753

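If you prefer the CLI to the Rancher dashboard, the same label edit can be
sketched directly against the Fleet cluster object. The cluster name
``ace-example`` and the ``fleet-default`` namespace below are assumptions;
verify both before running:

.. code-block:: shell

   # Look up the real cluster name and namespace first:
   #   kubectl get clusters.fleet.cattle.io -A
   # Then apply the temporary label (cassandra example):
   kubectl label clusters.fleet.cattle.io ace-example \
       -n fleet-default core4g=disabled --overwrite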
4. Clean up the existing PVC and PV for the application. In this guide,
   Cassandra is used as an example.

.. code-block:: shell

   # Assume that we lost all HSS DB data
   $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"

   # Confirm the application is uninstalled after updating the cluster label
   $ helm list -n aether-sdcore
   (no result)

   # Clean up any remaining resources, including PVCs
   $ kubectl delete ns aether-sdcore

   # Clean up released PVs, if any exist
   $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '{print $1}')

5. Find a backup to restore.

.. code-block:: shell

   # Find the relevant backup schedule name
   $ velero schedule get
   NAME                      STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
   velero-daily-logging      Enabled   2021-09-25 01:35:24 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
   velero-daily-monitoring   Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
   velero-daily-roc          Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
   velero-daily-sdcore       Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>

   # List the backups
   $ velero backup get --selector velero.io/schedule-name=velero-daily-sdcore
   NAME                                 STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
   velero-daily-sdcore-20211001000013   Completed   0        0          2021-09-30 17:00:19 -0700 PDT   29d       default            <none>
   velero-daily-sdcore-20210930000013   Completed   0        0          2021-09-29 17:00:28 -0700 PDT   28d       default            <none>
   ...

   # Confirm the backup includes all the necessary resources
   $ velero backup describe velero-daily-sdcore-20211001000013 --details
   ...
   Resource List:
     v1/PersistentVolume:
       - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
       - pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93
       - pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1
     v1/PersistentVolumeClaim:
       - aether-sdcore/data-cassandra-0
       - aether-sdcore/data-cassandra-1
       - aether-sdcore/data-cassandra-2

6. Update the backup storage location to read-only mode to prevent backup
   objects from being created or deleted in the backup location during the
   restore process.

.. code-block:: shell

   $ kubectl patch backupstoragelocation default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadOnly"}}'

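You can verify the change took effect by reading the field back;
``.spec.accessMode`` is part of the Velero ``BackupStorageLocation`` API:

.. code-block:: shell

   # Should print "ReadOnly" after the patch above
   kubectl get backupstoragelocation default -n velero \
       -o jsonpath='{.spec.accessMode}'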
7. Create a restore from the most recent backup.

.. code-block:: shell

   # Create a restore
   $ velero restore create --from-backup velero-daily-sdcore-20211001000013

   # Wait for STATUS to become Completed
   $ velero restore get
   NAME                                                BACKUP                               STATUS      STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
   velero-daily-sdcore-20211001000013-20211001141850   velero-daily-sdcore-20211001000013   Completed   2021-10-01 13:11:20 -0700 PDT   <nil>       0        0          2021-10-01 13:11:20 -0700 PDT   <none>

8. Confirm that the PVCs are restored and "Bound" to the restored PVs
   successfully.

.. code-block:: shell

   $ kubectl get pvc -n aether-sdcore
   NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
   data-cassandra-0   Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
   data-cassandra-1   Bound    pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93   10Gi       RWO            standard       45s
   data-cassandra-2   Bound    pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1   10Gi       RWO            standard       45s

9. Revert the backup storage location to read-write mode.

.. code-block:: shell

   $ kubectl patch backupstoragelocation default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadWrite"}}'

10. Revert the cluster label to the original value and wait for Fleet to
    reinstall the application. It may take several minutes.

.. code-block:: shell

   # Confirm the application is installed
   $ helm list -n aether-sdcore
   NAME         NAMESPACE       REVISION   UPDATED                                   STATUS     CHART              APP VERSION
   cassandra    aether-sdcore   8          2021-10-01 22:27:18.739617668 +0000 UTC   deployed   cassandra-0.15.1   3.11.6
   sd-core-4g   aether-sdcore   26         2021-10-02 00:55:25.317693605 +0000 UTC   deployed   sd-core-0.7.3

   # Confirm the data is restored
   $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   ...
   (10227 rows)