..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

General Procedures
==================

Edge shutdown procedure
-----------------------

To gracefully shut down an Aether Edge Pod, follow these steps:

1. Shut down the fabric switches using ``shutdown -h now``

2. Shut down the compute servers using ``shutdown -h now``

3. Shut down the management server using ``shutdown -h now``

4. The management switch and eNB aren't capable of a graceful shutdown, so no
   steps need to be taken for that hardware.

5. Remove power from the pod.

.. note::

   The shutdown steps can be automated with an :doc:`ad-hoc ansible command
   <ansible:user_guide/intro_adhoc>` if you have an ansible inventory of all
   the systems::

      ansible -i inventory/sitename.ini -b -m shutdown -a "delay=60" all

   The ``delay=60`` argument allows hosts behind the management server to
   be reached before the management server shuts down.

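Without ansible, the same ordered shutdown can be sketched as a plain SSH
loop. The hostnames below are illustrative placeholders, not part of any real
inventory; substitute the names from your site:

.. code-block:: shell

   # Hypothetical hostnames; replace with the entries from your inventory
   FABRIC_SWITCHES="leaf1 leaf2 spine1 spine2"
   COMPUTE_SERVERS="compute1 compute2 compute3"
   MGMT_SERVER="mgmt1"

   # Fabric switches first, then compute servers
   for host in $FABRIC_SWITCHES $COMPUTE_SERVERS; do
       ssh "$host" sudo shutdown -h now || true  # connection drops as the host halts
   done

   # The management server goes last, since it provides the route
   # to the other nodes
   ssh "$MGMT_SERVER" sudo shutdown -h now || true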
Edge power up procedure
-----------------------

1. Restore power to the pod equipment. The fabric and management switches will
   power on automatically.

2. Turn on the management server using the front panel power button.

3. Turn on the compute servers using the front panel power buttons.
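After power up, it can be worth confirming from the management server that the
compute nodes rejoined the Kubernetes cluster; this check is a suggestion, not
part of the formal procedure:

.. code-block:: shell

   # Flag any node that is not yet in the "Ready" state;
   # no output means all nodes are back
   kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1 " is not ready"}'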

Restore stateful application procedure
--------------------------------------

.. note::

   PersistentVolumeClaim/PersistentVolume backup and restore is currently
   only available for ACC and AMP clusters.

1. Download and install the Velero CLI following the `official guide
   <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
   You'll also need the ``kubectl`` and ``helm`` command line tools.

2. Download the K8S config of the target cluster from Rancher to your
   workstation.

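The downloaded config can be selected via the ``KUBECONFIG`` environment
variable so that subsequent ``kubectl``, ``helm``, and ``velero`` commands
target the right cluster; the file path below is illustrative:

.. code-block:: shell

   # Point the CLI tools at the downloaded config (example path)
   export KUBECONFIG=~/Downloads/target-cluster.yaml

   # Confirm the CLI is talking to the intended cluster
   kubectl config current-context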
3. Open the Rancher **Continuous Delivery** > **Clusters** dashboard,
   find the cluster the target application is running on,
   and temporarily update the cluster label used as the target application's
   cluster selector to uninstall the application and prevent it from being
   reinstalled during the restore process.
   Refer to the table below for the cluster selector labels of the Aether
   applications.
   It may take several minutes for the application to be uninstalled.

   +-------------+-----------------+------------------+
   | Application | Original Label  | Temporary Label  |
   +-------------+-----------------+------------------+
   | cassandra   | core4g=enabled  | core4g=disabled  |
   +-------------+-----------------+------------------+
   | mongodb     | core5g=enabled  | core5g=disabled  |
   +-------------+-----------------+------------------+
   | roc         | roc=enabled     | roc=disabled     |
   +-------------+-----------------+------------------+

.. image:: images/rancher-fleet-cluster-label-edit1.png
   :width: 753

.. image:: images/rancher-fleet-cluster-label-edit2.png
   :width: 753

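If you prefer the CLI to the Rancher dashboard, the same label edit can be
sketched directly against the Fleet cluster object. The cluster name
``ace-example`` and the ``fleet-default`` namespace below are assumptions;
verify both before running:

.. code-block:: shell

   # Look up the real cluster name and namespace first:
   #   kubectl get clusters.fleet.cattle.io -A
   # Then apply the temporary label (cassandra example):
   kubectl label clusters.fleet.cattle.io ace-example \
       -n fleet-default core4g=disabled --overwrite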
4. Clean up the existing PVC and PV for the application. In this guide,
   Cassandra is used as an example.

.. code-block:: shell

   # Assume that we lost all HSS DB data
   $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"

   # Confirm the application is uninstalled after updating the cluster label
   $ helm list -n aether-sdcore
   (no result)

   # Clean up any remaining resources, including PVCs
   $ kubectl delete ns aether-sdcore

   # Clean up released PVs, if any exist
   $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '{print $1}')

5. Find a backup to restore.

.. code-block:: shell

   # Find the relevant backup schedule name
   $ velero schedule get
   NAME                      STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
   velero-daily-logging      Enabled   2021-09-25 01:35:24 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
   velero-daily-monitoring   Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
   velero-daily-roc          Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
   velero-daily-sdcore       Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>

   # List the backups
   $ velero backup get --selector velero.io/schedule-name=velero-daily-sdcore
   NAME                                 STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
   velero-daily-sdcore-20211001000013   Completed   0        0          2021-09-30 17:00:19 -0700 PDT   29d       default            <none>
   velero-daily-sdcore-20210930000013   Completed   0        0          2021-09-29 17:00:28 -0700 PDT   28d       default            <none>
   ...

   # Confirm the backup includes all the necessary resources
   $ velero backup describe velero-daily-sdcore-20211001000013 --details
   ...
   Resource List:
     v1/PersistentVolume:
       - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
       - pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93
       - pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1
     v1/PersistentVolumeClaim:
       - aether-sdcore/data-cassandra-0
       - aether-sdcore/data-cassandra-1
       - aether-sdcore/data-cassandra-2

6. Update the backup storage location to read-only mode to prevent backup
   objects from being created or deleted in the backup location during the
   restore process.

.. code-block:: shell

   $ kubectl patch backupstoragelocation default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadOnly"}}'

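You can verify the change took effect by reading the field back;
``.spec.accessMode`` is part of the Velero ``BackupStorageLocation`` API:

.. code-block:: shell

   # Should print "ReadOnly" after the patch above
   kubectl get backupstoragelocation default -n velero \
       -o jsonpath='{.spec.accessMode}'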
7. Create a restore from the most recent backup.

.. code-block:: shell

   # Create a restore
   $ velero restore create --from-backup velero-daily-sdcore-20211001000013

   # Wait for STATUS to become Completed
   $ velero restore get
   NAME                                                BACKUP                               STATUS      STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
   velero-daily-sdcore-20211001000013-20211001141850   velero-daily-sdcore-20211001000013   Completed   2021-10-01 13:11:20 -0700 PDT   <nil>       0        0          2021-10-01 13:11:20 -0700 PDT   <none>

8. Confirm that the PVCs are restored and "Bound" to the restored PVs
   successfully.

.. code-block:: shell

   $ kubectl get pvc -n aether-sdcore
   NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
   data-cassandra-0   Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
   data-cassandra-1   Bound    pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93   10Gi       RWO            standard       45s
   data-cassandra-2   Bound    pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1   10Gi       RWO            standard       45s

9. Revert the backup storage location to read-write mode.

.. code-block:: shell

   $ kubectl patch backupstoragelocation default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadWrite"}}'

10. Revert the cluster label to the original value and wait for Fleet to
    reinstall the application. It may take several minutes.

.. code-block:: shell

   # Confirm the application is installed
   $ helm list -n aether-sdcore
   NAME         NAMESPACE       REVISION   UPDATED                                   STATUS     CHART              APP VERSION
   cassandra    aether-sdcore   8          2021-10-01 22:27:18.739617668 +0000 UTC   deployed   cassandra-0.15.1   3.11.6
   sd-core-4g   aether-sdcore   26         2021-10-02 00:55:25.317693605 +0000 UTC   deployed   sd-core-0.7.3

   # Confirm the data is restored
   $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   ...
   (10227 rows)