 ..
    SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
    SPDX-License-Identifier: Apache-2.0

 General Procedures
 ==================

 Edge shutdown procedure
 -----------------------

 To gracefully shut down an Aether Edge Pod, follow these steps:

 1. Shut down the fabric switches using ``shutdown -h now``.

 2. Shut down the compute servers using ``shutdown -h now``.

 3. Shut down the management server using ``shutdown -h now``.

 4. The management switch and eNB aren't capable of a graceful shutdown, so no
    steps need to be taken for that hardware.

 5. Remove power from the pod.

 .. note::

    The shutdown steps can be automated with an :doc:`ad-hoc ansible command
    <ansible:user_guide/intro_adhoc>` if you have an Ansible inventory of all
    the systems::

       ansible -i inventory/sitename.ini -b -m shutdown -a "delay=60" all

    The ``delay=60`` argument allows hosts behind the management server to
    be reached before the management server shuts down.
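
 Where Ansible is not available, the manual shutdown order above could be
 scripted roughly as follows. This is a minimal sketch, not part of the Aether
 tooling: the host names are hypothetical placeholders, and it assumes every
 host is reachable over ``ssh`` from your workstation with ``sudo`` rights.

 .. code-block:: shell

    # Hypothetical host lists; replace them with the hosts of your pod.
    FABRIC_SWITCHES="fabric1 fabric2"
    COMPUTE_SERVERS="compute1 compute2 compute3"
    MANAGEMENT_SERVER="mgmtserver1"

    # Shut down fabric switches, then compute servers, then the management
    # server, matching the order of the manual steps.
    shutdown_pod() {
        local host
        for host in $FABRIC_SWITCHES $COMPUTE_SERVERS $MANAGEMENT_SERVER; do
            echo "shutting down $host"
            # The connection may drop while the host goes down, so a
            # non-zero exit status is reported as a warning only.
            ssh "$host" sudo shutdown -h now ||
                echo "warning: $host did not confirm the shutdown" >&2
        done
    }

 Calling ``shutdown_pod`` walks through the hosts one at a time, with the
 management server last so that hosts behind it stay reachable until then.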

 Edge power up procedure
 -----------------------

 1. Restore power to the pod equipment.  The fabric and management switches will
    power on automatically.

 2. Turn on the management server using the front panel power button.

 3. Turn on the compute servers using the front panel power buttons.

 Restore stateful application procedure
 --------------------------------------

 .. note::

    PersistentVolumeClaim/PersistentVolume backup and restore is currently only available for ACC and AMP clusters.

 1. Download and install the Velero CLI following the `official guide <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
    You'll also need the ``kubectl`` and ``helm`` command line tools.

 2. Download the Kubernetes config (kubeconfig) of the target cluster from Rancher to your workstation.

 3. Open the Rancher **Continuous Delivery** > **Clusters** dashboard and
    find the cluster the target application is running on.
    Temporarily change the cluster label used as the target application's cluster selector.
    This uninstalls the application and prevents it from being reinstalled during the restore process.
    Refer to the table below for the cluster selector labels of the Aether applications.
    It may take several minutes for the application to be uninstalled.

    +-------------+-----------------+------------------+
    | Application | Original Label  | Temporary Label  |
    +-------------+-----------------+------------------+
    | cassandra   | core4g=enabled  | core4g=disabled  |
    +-------------+-----------------+------------------+
    | mongodb     | core5g=enabled  | core5g=disabled  |
    +-------------+-----------------+------------------+
    | roc         | roc=enabled     | roc=disabled     |
    +-------------+-----------------+------------------+

 .. image:: images/rancher-fleet-cluster-label-edit1.png
     :width: 753

 .. image:: images/rancher-fleet-cluster-label-edit2.png
     :width: 753

 4. Clean up the existing PVCs and PVs of the application. In this guide, Cassandra is used as an example.

 .. code-block:: shell

    # Assume that we lost all HSSDB data
    $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
    <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"

    # Confirm the application is uninstalled after updating the cluster label
    $ helm list -n aether-sdcore
    (no result)

    # Clean up any remaining resources including PVC
    $ kubectl delete ns aether-sdcore

    # Clean up released PVs, if any exist
    $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '{print $1}')
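
 The ``grep`` pipeline above matches ``cassandra`` anywhere in a line and
 passes an empty argument list to ``kubectl delete pv`` when nothing matches.
 A slightly stricter filter could look like the sketch below. The helper name
 ``released_pvs`` is hypothetical, and the sketch assumes the default
 ``kubectl get pv`` column order, with STATUS in the fifth column and CLAIM in
 the sixth.

 .. code-block:: shell

    # Print the names of Released PVs whose claim mentions the given application.
    # Reads the output of `kubectl get pv --no-headers` on stdin.
    # Assumption: column 5 is STATUS and column 6 is CLAIM (namespace/name).
    released_pvs() {
        local app="$1"
        awk -v app="$app" '$5 == "Released" && index($6, app) { print $1 }'
    }

 It could then be used as
 ``kubectl get pv --no-headers | released_pvs cassandra | xargs -r kubectl delete pv``,
 where ``-r`` (supported by GNU ``xargs``) skips the delete when the list is empty.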

 5. Find a backup to restore.

 .. code-block:: shell

    # Find the relevant backup schedule name
    $ velero schedule get
    NAME                      STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
    velero-daily-logging      Enabled   2021-09-25 01:35:24 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
    velero-daily-monitoring   Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
    velero-daily-roc          Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
    velero-daily-sdcore       Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>

    # List the backups
    $ velero backup get --selector velero.io/schedule-name=velero-daily-sdcore
    NAME                                 STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
    velero-daily-sdcore-20211001000013   Completed   0        0          2021-09-30 17:00:19 -0700 PDT   29d       default            <none>
    velero-daily-sdcore-20210930000013   Completed   0        0          2021-09-29 17:00:28 -0700 PDT   28d       default            <none>
    ...

    # Confirm the backup includes all the necessary resources
    $ velero backup describe velero-daily-sdcore-20211001000013 --details
    ...
    Resource List:
     v1/PersistentVolume:
       - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
       - pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93
       - pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1
     v1/PersistentVolumeClaim:
       - aether-sdcore/data-cassandra-0
       - aether-sdcore/data-cassandra-1
       - aether-sdcore/data-cassandra-2
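
 If you prefer to pick the backup in a script, the lookup above can be
 sketched with ``kubectl`` alone. The function name ``latest_completed_backup``
 is hypothetical. The sketch assumes Velero runs in the ``velero`` namespace
 and relies on the timestamp suffix of the backup names, as seen in the
 listing above, so that a plain sort puts the newest backup last.

 .. code-block:: shell

    # Print the name of the newest Completed backup of a Velero schedule.
    # Assumption: backup names end in a sortable timestamp.
    latest_completed_backup() {
        local schedule="$1"
        kubectl get backups.velero.io -n velero \
            -l "velero.io/schedule-name=$schedule" \
            --no-headers \
            -o custom-columns=NAME:.metadata.name,PHASE:.status.phase |
            awk '$2 == "Completed" { print $1 }' | sort | tail -n 1
    }

 For example, ``latest_completed_backup velero-daily-sdcore`` prints a single
 backup name, which you should still check with ``velero backup describe``
 before restoring from it.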

 6. Set the backup storage location to read-only mode to prevent backup objects from being created or
    deleted in the backup location during the restore process.

 .. code-block:: shell

    $ kubectl patch backupstoragelocations default \
        --namespace velero \
        --type merge \
        --patch '{"spec":{"accessMode":"ReadOnly"}}'

 7. Create a restore with the most recent backup.

 .. code-block:: shell

    # Create restore
    $ velero restore create --from-backup velero-daily-sdcore-20211001000013

    # Wait for STATUS to become Completed
    $ velero restore get
    NAME                                                BACKUP                               STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
    velero-daily-sdcore-20211001000013-20211001141850   velero-daily-sdcore-20211001000013   Completed    2021-10-01 13:11:20 -0700 PDT   <nil>       0        0          2021-10-01 13:11:20 -0700 PDT   <none>
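
 Instead of re-running ``velero restore get`` by hand, the wait can be
 sketched as a polling loop over the ``Restore`` resource. The function name
 ``wait_for_restore`` and its default of 60 attempts at 10-second intervals
 are illustrative choices. The sketch assumes the restore lives in the
 ``velero`` namespace and reports its state in ``.status.phase``.

 .. code-block:: shell

    # Poll a Velero restore until it completes, fails, or the attempts run out.
    # Usage: wait_for_restore NAME [ATTEMPTS] [INTERVAL_SECONDS]
    # Returns 0 on Completed, 1 on a failed phase, 2 on timeout.
    wait_for_restore() {
        local name="$1" attempts="${2:-60}" interval="${3:-10}"
        local i=0 phase
        while [ "$i" -lt "$attempts" ]; do
            phase=$(kubectl get restores.velero.io "$name" -n velero \
                -o jsonpath='{.status.phase}')
            case "$phase" in
                Completed)
                    echo "restore $name completed"
                    return 0 ;;
                Failed|PartiallyFailed|FailedValidation)
                    echo "restore $name ended in phase $phase" >&2
                    return 1 ;;
            esac
            i=$((i + 1))
            sleep "$interval"
        done
        echo "timed out waiting for restore $name" >&2
        return 2
    }

 Any phase other than the ones listed is treated as still in progress, so
 the loop keeps polling until the attempts are used up.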

 8. Confirm that the PVCs are restored and "Bound" to the restored PVs.

 .. code-block:: shell

    $ kubectl get pvc -n aether-sdcore
    NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    data-cassandra-0    Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
    data-cassandra-1    Bound    pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93   10Gi       RWO            standard       45s
    data-cassandra-2    Bound    pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1   10Gi       RWO            standard       45s

 9. Revert the backup storage location to read-write mode.

 .. code-block:: shell

    $ kubectl patch backupstoragelocations default \
        --namespace velero \
        --type merge \
        --patch '{"spec":{"accessMode":"ReadWrite"}}'
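
 Because a forgotten revert leaves scheduled backups unable to write, the
 read-only and read-write patches can be paired in one helper. This is a
 sketch under the same assumptions as the commands above, namely a storage
 location called ``default`` in the ``velero`` namespace. The function names
 are hypothetical.

 .. code-block:: shell

    # Patch the access mode (ReadOnly or ReadWrite) of the default
    # backup storage location.
    set_bsl_access_mode() {
        kubectl patch backupstoragelocation default \
            --namespace velero \
            --type merge \
            --patch "{\"spec\":{\"accessMode\":\"$1\"}}"
    }

    # Run a command while the storage location is read-only, then switch it
    # back to read-write whether or not the command succeeded.
    with_readonly_bsl() {
        local rc=0
        set_bsl_access_mode ReadOnly || return 1
        "$@" || rc=$?
        set_bsl_access_mode ReadWrite || return 1
        return "$rc"
    }

 For example, ``with_readonly_bsl velero restore create --from-backup <backup name>``
 returns the exit status of the wrapped command. Note that
 ``velero restore create`` returns once the restore is submitted unless it is
 given ``--wait``, so confirm that the restore has completed before relying
 on the revert.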

 10. Revert the cluster label to the original and wait for Fleet to reinstall the application.
     It may take several minutes.

 .. code-block:: shell

    # Confirm the application is installed
    $ helm list -n aether-sdcore
    NAME      	NAMESPACE       	REVISION	UPDATED                                	STATUS  	CHART           	APP VERSION
    cassandra 	aether-sdcore     8       	2021-10-01 22:27:18.739617668 +0000 UTC	deployed	cassandra-0.15.1	3.11.6
    sd-core-4g	aether-sdcore     26      	2021-10-02 00:55:25.317693605 +0000 UTC	deployed	sd-core-0.7.3

    # Confirm the data is restored
    $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
    ...
    (10227 rows)