AETHER-1715 Add instructions to restore PVC/PV using Velero

Change-Id: I66f6ec9cff5e2f181b269640ace3848b9641ab27
diff --git a/operations/procedures.rst b/operations/procedures.rst
index 114b6bf..e7d52ff 100644
--- a/operations/procedures.rst
+++ b/operations/procedures.rst
@@ -41,3 +41,145 @@
 2. Turn on the management server using the front panel power button
 
 3. Turn on the compute servers using the front panel power buttons
+
+Restore stateful application procedure
+--------------------------------------
+
+.. note::
+
+   PersistentVolumeClaim/PersistentVolume backup and restore is currently only available for ACC and AMP clusters.
+
+1. Download and install the Velero CLI following the `official guide <https://velero.io/docs/v1.7/basic-install/#install-the-cli>`_.
+   You will also need the ``kubectl`` and ``helm`` command-line tools.
+
+2. Download the Kubernetes config (kubeconfig) of the target cluster from Rancher to your workstation.
+
+3. Open the Rancher **Continuous Delivery** > **Clusters** dashboard,
+   find the cluster the target application is running on,
+   and temporarily change the cluster label used as the target application's cluster selector.
+   This uninstalls the application and prevents it from being reinstalled during the restore process.
+   Refer to the table below for the cluster selector labels of the Aether applications.
+   It may take several minutes for the application to be uninstalled.
+
+   +-------------+-----------------+------------------+
+   | Application | Original Label  | Temporary Label  |
+   +-------------+-----------------+------------------+
+   | cassandra   | core4g=enabled  | core4g=disabled  |
+   +-------------+-----------------+------------------+
+   | mongodb     | core5g=enabled  | core5g=disabled  |
+   +-------------+-----------------+------------------+
+   | roc         | roc=enabled     | roc=disabled     |
+   +-------------+-----------------+------------------+
+
+.. image:: images/rancher-fleet-cluster-label-edit1.png
+    :width: 753
+
+.. image:: images/rancher-fleet-cluster-label-edit2.png
+    :width: 753
+
+4. Clean up the existing PVCs and PVs for the application. In this guide, Cassandra is used as an example.
+
+.. code-block:: shell
+
+   # Assume that we lost all HSSDB data
+   $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
+   <stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Keyspace vhss does not exist"
+
+   # Confirm the application is uninstalled after updating the cluster label
+   $ helm list -n aether-sdcore
+   (no result)
+
+   # Clean up any remaining resources including PVC
+   $ kubectl delete ns aether-sdcore
+
+   # Clean up released PVs, if any exist
+   $ kubectl delete pv $(kubectl get pv | grep cassandra | grep Released | awk '{print $1}')
+
+5. Find a backup to restore.
+
+.. code-block:: shell
+
+   # Find the relevant backup schedule name
+   $ velero schedule get
+   NAME                      STATUS    CREATED                         SCHEDULE    BACKUP TTL   LAST BACKUP   SELECTOR
+   velero-daily-logging      Enabled   2021-09-25 01:35:24 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
+   velero-daily-monitoring   Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
+   velero-daily-roc          Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
+   velero-daily-sdcore       Enabled   2021-09-25 01:35:25 -0700 PDT   0 0 * * *   720h0m0s     19h ago       <none>
+
+   # List the backups
+   $ velero backup get --selector velero.io/schedule-name=velero-daily-sdcore
+   NAME                                 STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
+   velero-daily-sdcore-20211001000013   Completed   0        0          2021-09-30 17:00:19 -0700 PDT   29d       default            <none>
+   velero-daily-sdcore-20210930000013   Completed   0        0          2021-09-29 17:00:28 -0700 PDT   28d       default            <none>
+   ...
+
+   # Confirm the backup includes all the necessary resources
+   $ velero backup describe velero-daily-sdcore-20211001000013 --details
+   ...
+   Resource List:
+    v1/PersistentVolume:
+      - pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411
+      - pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93
+      - pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1
+    v1/PersistentVolumeClaim:
+      - aether-sdcore/data-cassandra-0
+      - aether-sdcore/data-cassandra-1
+      - aether-sdcore/data-cassandra-2
+
+6. Set the backup storage location to read-only mode to prevent backup objects from being created or
+   deleted in the backup location during the restore process.
+
+.. code-block:: shell
+
+   $ kubectl patch backupstoragelocations default \
+       --namespace velero \
+       --type merge \
+       --patch '{"spec":{"accessMode":"ReadOnly"}}'
+
+7. Create a restore with the most recent backup.
+
+.. code-block:: shell
+
+   # Create restore
+   $ velero restore create --from-backup velero-daily-sdcore-20211001000013
+
+   # Wait until STATUS becomes Completed
+   $ velero restore get
+   NAME                                                BACKUP                               STATUS       STARTED                         COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
+   velero-daily-sdcore-20211001000013-20211001141850   velero-daily-sdcore-20211001000013   Completed    2021-10-01 13:11:20 -0700 PDT   <nil>       0        0          2021-10-01 13:11:20 -0700 PDT   <none>
+
+8. Confirm that the PVCs are restored and "Bound" to the restored PVs.
+
+.. code-block:: shell
+
+   $ kubectl get pvc -n aether-sdcore
+   NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
+   data-cassandra-0    Bound    pvc-67f82bc9-14f3-4faf-bf24-a2a3d6ccc411   10Gi       RWO            standard       45s
+   data-cassandra-1    Bound    pvc-b19d996b-cc83-4c10-9888-a55ba0eedc93   10Gi       RWO            standard       45s
+   data-cassandra-2    Bound    pvc-d2473b2e-8e6c-42d2-9d13-8fdb842d8cb1   10Gi       RWO            standard       45s
+
+9. Revert the backup storage location to read-write mode.
+
+.. code-block:: shell
+
+   $ kubectl patch backupstoragelocations default \
+       --namespace velero \
+       --type merge \
+       --patch '{"spec":{"accessMode":"ReadWrite"}}'
+
+10. Revert the cluster label to the original value and wait for Fleet to reinstall the application.
+    It may take several minutes.
+
+.. code-block:: shell
+
+   # Confirm the application is installed
+   $ helm list -n aether-sdcore
+   NAME      	NAMESPACE       	REVISION	UPDATED                                	STATUS  	CHART           	APP VERSION
+   cassandra 	aether-sdcore     8       	2021-10-01 22:27:18.739617668 +0000 UTC	deployed	cassandra-0.15.1	3.11.6
+   sd-core-4g	aether-sdcore     26      	2021-10-02 00:55:25.317693605 +0000 UTC	deployed	sd-core-0.7.3
+
+   # Confirm the data is restored
+   $ kubectl exec cassandra-0 -n aether-sdcore -- cqlsh $cassandra_ip -e 'select * from vhss.users_imsi'
+   ...
+   (10227 rows)
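
Steps 6 through 9 of the procedure above could be wrapped in a single shell helper. It puts the backup storage location into read-only mode, runs the restore, and switches the location back to read-write even if the restore fails. This is a minimal sketch and not part of the patch. The function names `set_bsl_access_mode` and `restore_from_backup` are hypothetical. The `default` location and `velero` namespace are taken from the commands in the procedure. It also assumes the installed Velero CLI supports `--wait` on `velero restore create` to block until the restore finishes.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical and not part of Velero or Aether.

# Patch the accessMode (ReadOnly or ReadWrite) of a Velero BackupStorageLocation.
set_bsl_access_mode() {
  local location="$1" namespace="$2" mode="$3"
  kubectl patch backupstoragelocations "$location" \
    --namespace "$namespace" \
    --type merge \
    --patch "{\"spec\":{\"accessMode\":\"${mode}\"}}"
}

# Restore from a backup while the storage location is read-only,
# then revert the location to read-write whether or not the restore succeeded.
restore_from_backup() {
  local backup="${1:-}" location="${2:-default}" namespace="${3:-velero}" rc=0
  if [ -z "$backup" ]; then
    echo "usage: restore_from_backup BACKUP_NAME [LOCATION] [NAMESPACE]" >&2
    return 2
  fi
  # Do not start a restore if the location could not be made read-only.
  set_bsl_access_mode "$location" "$namespace" ReadOnly || return 1
  # Assumption: --wait blocks until the restore has finished.
  velero restore create --from-backup "$backup" --wait || rc=$?
  # Always attempt the revert so scheduled backups can resume.
  set_bsl_access_mode "$location" "$namespace" ReadWrite || rc=$?
  return "$rc"
}
```

A possible invocation is `restore_from_backup velero-daily-sdcore-20211001000013`, followed by the PVC and data checks from steps 8 and 10. Reverting unconditionally is a design choice: a failed restore should not leave the backup location read-only and silently block the daily schedules.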