..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Other Procedures
================

Edge shutdown procedure
-----------------------

To gracefully shut down an Aether Edge Pod, follow these steps:

1. Shut down the fabric switches using ``shutdown -h now``

2. Shut down the compute servers using ``shutdown -h now``

3. Shut down the management server using ``shutdown -h now``

4. The management switch and eNB aren't capable of a graceful shutdown, so no
   steps need to be taken for that hardware.

5. Remove power from the pod.

.. note::

   The shutdown steps can be automated with an :doc:`ad-hoc ansible command
   <ansible:user_guide/intro_adhoc>` if you have an ansible inventory of all
   the systems::

     ansible -i inventory/sitename.ini -b -m shutdown -a "delay=60" all

   The ``delay=60`` argument allows hosts behind the management server to
   be reached before the management server shuts down.

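Where Ansible is not available, the ordering in the steps above could also be
scripted over SSH. The following is a minimal sketch, not part of the Aether
tooling: the host names are hypothetical placeholders, and it assumes SSH access
and passwordless ``sudo`` on each host. By default it only prints the commands
it would run.

```shell
#!/usr/bin/env bash
# Hypothetical host lists; replace with the hosts of your own pod.
FABRIC_SWITCHES="${FABRIC_SWITCHES:-fabric1 fabric2}"
COMPUTE_SERVERS="${COMPUTE_SERVERS:-compute1 compute2 compute3}"
MGMT_SERVER="${MGMT_SERVER:-mgmtserver1}"

# Shut hosts down in the documented order: fabric switches, compute
# servers, then the management server last, because the other hosts
# are reached through it.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
shutdown_pod() {
    local host
    for host in $FABRIC_SWITCHES $COMPUTE_SERVERS $MGMT_SERVER; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh $host sudo shutdown -h now"
        else
            ssh "$host" sudo shutdown -h now
        fi
    done
}
```

Calling ``shutdown_pod`` prints the plan; setting ``DRY_RUN=0`` before the call
runs it for real.
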
Edge power up procedure
-----------------------

1. Restore power to the pod equipment. The fabric and management switches will
   power on automatically.

2. Turn on the management server using the front panel power button.

3. Turn on the compute servers using the front panel power buttons.

Restore stateful application procedure
--------------------------------------

.. note::

   PersistentVolumeClaim/PersistentVolume backup and restore is currently only available for ACC and AMP clusters.

1. Download and install the Velero CLI following the `official guide <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
   You'll also need the ``kubectl`` and ``helm`` command line tools.

2. Download the K8S config of the target cluster from Rancher to your workstation.

3. Open the Rancher Continuous Delivery > Clusters dashboard,
   find the cluster the target application is running on,
   and temporarily update the cluster label used as the target application's cluster selector
   to uninstall the application and prevent it from being reinstalled during the restore process.
   Refer to the table below for the cluster selector labels for the Aether applications.
   It may take several minutes for the application to be uninstalled.

+-------------+-----------------+------------------+
| Application | Original Label  | Temporary Label  |
+-------------+-----------------+------------------+
| cassandra   | core4g=enabled  | core4g=disabled  |
+-------------+-----------------+------------------+
| mongodb     | core5g=enabled  | core5g=disabled  |
+-------------+-----------------+------------------+
| roc         | roc=enabled     | roc=disabled     |
+-------------+-----------------+------------------+

.. image:: images/rancher-fleet-cluster-label-edit1.png
   :width: 753

.. image:: images/rancher-fleet-cluster-label-edit2.png
   :width: 753

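When scripting around this procedure, the label table above can be encoded as a
small lookup so that the original and temporary labels are never typed by hand.
This is only an illustrative sketch; the function name ``cluster_label`` is
hypothetical, and the label itself is still edited in Rancher as described.

```shell
# Print the cluster selector label for an Aether application.
# $1: application (cassandra, mongodb, roc)
# $2: state ("enabled" = original label, "disabled" = temporary label)
cluster_label() {
    local app="$1" state="$2" key
    case "$app" in
        cassandra) key="core4g" ;;
        mongodb)   key="core5g" ;;
        roc)       key="roc" ;;
        *) echo "unknown application: $app" >&2; return 1 ;;
    esac
    case "$state" in
        enabled|disabled) ;;
        *) echo "state must be enabled or disabled" >&2; return 1 ;;
    esac
    echo "$key=$state"
}
```

For example, ``cluster_label cassandra disabled`` prints the temporary label to
set before the restore, and ``cluster_label cassandra enabled`` prints the label
to put back afterwards.
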
4. Clean up the existing PVC and PV for the application. In this guide, Cassandra is used as an example.

.. code-block:: shell

   # Assume that we lost all HSSDB data
   $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"

   # Confirm the application is uninstalled after updating the cluster label
   $ helm list -n aether-sdcore-4g
   (no result)

   # Clean up any remaining resources including PVC
   $ kubectl delete ns aether-sdcore-4g

   # Clean up released PVs, if any exist
   $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '{print $1}')

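The chained ``grep`` calls in the last command match anywhere on the line. A
slightly stricter filter can test the status and claim columns directly. The
sketch below assumes the default ``kubectl get pv`` table layout, where STATUS
is the fifth field and CLAIM is the sixth; the function name ``released_pvs``
is hypothetical.

```shell
# Read `kubectl get pv` table output on stdin and print the names of
# volumes that are Released and whose claim matches the pattern in $1.
# Assumes the default column layout: field 5 is STATUS, field 6 is CLAIM.
released_pvs() {
    awk -v pat="$1" 'NR > 1 && $5 == "Released" && $6 ~ pat { print $1 }'
}
```

A possible use is ``kubectl get pv | released_pvs cassandra``, reviewing the
printed names before passing them to ``kubectl delete pv``.
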
5. Find a backup to restore.

.. code-block:: shell

   # Find the relevant backup schedule name
   $ velero schedule get
   NAME                         STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
   velero-daily-cassandra       Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=cassandra
   velero-daily-mongodb         Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app.kubernetes.io/name=mongodb
   velero-daily-opendistro-es   Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=opendistro-es
   velero-daily-prometheus      Enabled   2021-10-11 15:33:30 -0700 PDT   0 7 * * *   720h0m0s     11h ago       app=prometheus

   # List the backups
   $ velero backup get --selector velero.io/schedule-name=velero-daily-cassandra
   NAME                                    STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
   velero-daily-cassandra-20211012070020   Completed   0        0          2021-10-12 00:00:41 -0700 PDT   29d       default            app=cassandra
   velero-daily-cassandra-20211011070019   Completed   0        0          2021-10-11 00:00:26 -0700 PDT   28d       default            app=cassandra
   ...

   # Confirm the backup includes all the necessary resources
   $ velero backup describe velero-daily-cassandra-20211012070020 --details
   ...
   Resource List:
     v1/PersistentVolume:
       - pvc-50ccd76e-3808-432b-882f-8858ecebf25b
       - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
     v1/PersistentVolumeClaim:
       - aether-sdcore-4g/data-cassandra-0
       - aether-sdcore-4g/data-cassandra-1
       - aether-sdcore-4g/data-cassandra-2

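Picking the backup by eye works, but the choice can also be derived from the
``velero backup get`` output. The sketch below is an assumption-laden helper,
not a Velero feature: it relies on the backup names of one schedule ending in a
timestamp, as in the listing above, so that a plain sort orders them
chronologically. The function name ``latest_completed_backup`` is hypothetical.

```shell
# Read `velero backup get` table output on stdin and print the name of the
# newest backup whose STATUS column is "Completed".
# Assumes names end in a sortable timestamp, e.g. ...-20211012070020.
latest_completed_backup() {
    awk 'NR > 1 && $2 == "Completed" { print $1 }' | sort | tail -n 1
}
```

It could be used as ``velero backup get --selector
velero.io/schedule-name=velero-daily-cassandra | latest_completed_backup``.
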
6. Update the backup storage location to read-only mode to prevent backup objects from being created or
   deleted in the backup location during the restore process.

.. code-block:: shell

   $ kubectl patch backupstoragelocations default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadOnly"}}'

7. Create a restore with the most recent backup.

.. code-block:: shell

   # Create restore
   $ velero restore create --from-backup velero-daily-cassandra-20211012070020

   # Wait until STATUS becomes Completed
   $ velero restore get
   NAME                                                   BACKUP                                  STATUS      STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
   velero-daily-cassandra-20211012070020-20211012141850   velero-daily-cassandra-20211012070020   Completed   2021-10-12 13:11:20 -0700 PDT   <nil>       0        0          2021-10-12 13:11:20 -0700 PDT   <none>

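Instead of re-running ``velero restore get`` by hand, the wait can be expressed
as a polling loop. This is a minimal sketch: ``wait_for_completed`` is a
hypothetical helper that repeatedly runs whatever status command it is given
and stops when that command prints ``Completed``. It treats ``Failed`` and
``PartiallyFailed`` as terminal errors.

```shell
# Poll a status command until it prints "Completed".
# $1: maximum number of attempts
# $2: seconds to sleep between attempts
# remaining arguments: the command that prints the current status
wait_for_completed() {
    local attempts="$1" interval="$2" status="" i=0
    shift 2
    while [ "$i" -lt "$attempts" ]; do
        status="$("$@" 2>/dev/null)"
        case "$status" in
            Completed) return 0 ;;
            Failed|PartiallyFailed)
                echo "restore ended as $status" >&2
                return 1 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    echo "timed out; last status: ${status:-unknown}" >&2
    return 1
}
```

One way to supply the status, assuming the restore object lives in the
``velero`` namespace, is ``wait_for_completed 60 10 kubectl get
restores.velero.io <restore-name> -n velero -o jsonpath='{.status.phase}'``.
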
8. Confirm that PVCs are restored and "Bound" to the restored PV successfully.

.. code-block:: shell

   $ kubectl get pvc -n aether-sdcore-4g
   NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
   data-cassandra-0   Bound    pvc-50ccd76e-3808-432b-882f-8858ecebf25b   10Gi       RWO            standard       45s
   data-cassandra-1   Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
   data-cassandra-2   Bound    pvc-a7f055b2-aab1-41ce-b3f4-c4bcb83b0232   10Gi       RWO            standard       45s

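The same check can be made scriptable by inspecting the STATUS column of that
output. The helper below is a sketch with a hypothetical name,
``all_pvcs_bound``; it assumes the default table layout shown above, where
STATUS is the second field.

```shell
# Read `kubectl get pvc` table output on stdin. Succeeds only if at least
# one claim is listed and every listed claim has STATUS "Bound".
# Prints one line for each claim that is not Bound.
all_pvcs_bound() {
    awk 'NR > 1 {
             seen = 1
             if ($2 != "Bound") { print $1 " is " $2; bad = 1 }
         }
         END { exit (bad || !seen) }'
}
```

For example, ``kubectl get pvc -n aether-sdcore-4g | all_pvcs_bound`` returns a
non-zero exit status while any claim is still ``Pending`` or ``Lost``.
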
9. Revert the backup storage location to read-write mode.

.. code-block:: shell

   $ kubectl patch backupstoragelocations default \
       --namespace velero \
       --type merge \
       --patch '{"spec":{"accessMode":"ReadWrite"}}'

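Steps 6 and 9 run the same patch with a different ``accessMode`` value, so a
script could wrap them in one function that rejects any other value. This is
only a sketch: ``set_backup_access_mode`` and the ``KUBECTL`` override variable
are hypothetical names introduced here, and the location name ``default`` and
namespace ``velero`` are taken from the commands above.

```shell
# Patch the "default" backup storage location to the given access mode.
# $1: ReadOnly or ReadWrite
# Set KUBECTL=echo to preview the command instead of running it.
set_backup_access_mode() {
    local mode="$1"
    case "$mode" in
        ReadOnly|ReadWrite) ;;
        *) echo "mode must be ReadOnly or ReadWrite" >&2; return 1 ;;
    esac
    ${KUBECTL:-kubectl} patch backupstoragelocations default \
        --namespace velero \
        --type merge \
        --patch "{\"spec\":{\"accessMode\":\"$mode\"}}"
}
```

A restore script might call ``set_backup_access_mode ReadOnly`` before creating
the restore and ``set_backup_access_mode ReadWrite`` once the PVCs are bound.
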
10. Revert the cluster label to the original and wait for Fleet to reinstall the application.
    It may take several minutes.

.. code-block:: shell

   # Confirm the application is installed
   $ kubectl get po -n aether-sdcore-4g -l app=cassandra
   NAME          READY   STATUS    RESTARTS   AGE
   cassandra-0   1/1     Running   0          1h
   cassandra-1   1/1     Running   0          1h
   cassandra-2   1/1     Running   0          1h

   # Confirm the data is restored
   $ kubectl exec cassandra-0 -n aether-sdcore-4g -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
   ...
   (10227 rows)