[SEBA-154] and [SEBA-104] Docs
Change-Id: I449edbb78c0900af0d4ce6bb6a1a80a77c625faf
diff --git a/SUMMARY.md b/SUMMARY.md
index ee4e0ec..cd55421 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -35,7 +35,8 @@
* [VTN Setup](prereqs/vtn-setup.md)
* [M-CORD](charts/mcord.md)
* [XOSSH](charts/xossh.md)
- * [MONITORING](charts/monitoring.md)
+ * [Logging and Monitoring](charts/logging-monitoring.md)
+ * [Persistent Storage](charts/storage.md)
* [Operations Guide](operating_cord/operating_cord.md)
* [General Info](operating_cord/general.md)
* [GUI](operating_cord/gui.md)
diff --git a/charts/kafka.md b/charts/kafka.md
index 4004be0..99cfe0e 100644
--- a/charts/kafka.md
+++ b/charts/kafka.md
@@ -7,22 +7,11 @@
```shell
helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
-helm install --name cord-kafka \
---set replicas=1 \
---set persistence.enabled=false \
---set zookeeper.servers=1 \
---set zookeeper.persistence.enabled=false \
-incubator/kafka
+helm install -f examples/kafka-single.yaml --version 0.8.8 -n cord-kafka incubator/kafka
+helm install -f examples/kafka-single.yaml --version 0.8.8 -n voltha-kafka incubator/kafka
```
-If you are experierencing problems with a multi instance installation of kafka,
-you can try to install a single instance of it:
-
-```shell
-helm install --name cord-kafka incubator/kafka -f examples/kafka-single.yaml
-```
-
-## Viewing events on the bus
+## Viewing events with kafkacat
As a debugging tool you can deploy a container containing `kafkacat` and use
that to listen for events:
@@ -31,21 +20,27 @@
helm install -n kafkacat xos-tools/kafkacat/
```
-Once the container is up and running you can exec into the pod and use this
-command to listen for events on a particular topic:
+Once the container is up and running you can exec into the pod and use various
+commands. For a complete reference, please refer to the [`kafkacat`
+guide](https://github.com/edenhill/kafkacat)
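+
+To get a shell in that pod first (a sketch; the exact pod name will differ, so
+look it up with `kubectl get pods`):
+
+```shell
+kubectl exec -it <kafkacat-pod-name> -- /bin/sh
+```
+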
-```shell
-kafkacat -C -b <kafka-service> -t <kafka-topic>
-```
+A few examples:
-For a complete reference, please refer to the [`kafkacat` guide](https://github.com/edenhill/kafkacat)
+- List available topics:
+ ```shell
+ kafkacat -L -b <kafka-service>
+ ```
-### Most common topics
+- Listen for events on a particular topic:
+ ```shell
+ kafkacat -C -b <kafka-service> -t <kafka-topic>
+ ```
-Here are some of the most common topic you can listen to on `cord-kafka`:
+- Some common topics to listen for on `cord-kafka` and `voltha-kafka`:
-```shell
-kafkacat -b cord-kafka -t onu.events
-kafkacat -b cord-kafka -t authentication.events
-kafkacat -b cord-kafka -t dhcp.events
-```
\ No newline at end of file
+ ```shell
+ kafkacat -b cord-kafka -t onu.events
+ kafkacat -b cord-kafka -t authentication.events
+ kafkacat -b cord-kafka -t dhcp.events
+ kafkacat -b voltha-kafka -t voltha.events
+ ```
diff --git a/charts/local-persistent-volume.md b/charts/local-persistent-volume.md
deleted file mode 100644
index b2bd8d8..0000000
--- a/charts/local-persistent-volume.md
+++ /dev/null
@@ -1,40 +0,0 @@
-# Local Persistent Volume Helm chart
-
-## Introduction
-
-The `local-persistent-volume` helm chart is a utility helm chart. It was
-created mainly to persist the `xos-core` DB data but this helm can be used
-to persist any data.
-
-It uses a relatively new kubernetes feature (it's a beta feature
-in Kubernetes 1.10.x) that allows us to define an independent persistent
-store in a kubernetes cluster.
-
-The helm chart mainly consists of the following kubernetes resources:
-
-- A storage class resource representing a local persistent volume
-- A persistent volume resource associated with the storage class and a specific directory on a specific node
-- A persistent volume claim resource that claims certain portion of the persistent volume on behalf of a pod
-
-The following variables are configurable in the helm chart:
-
-- `storageClassName`: The name of the storage class resource
-- `persistentVolumeName`: The name of the persistent volume resource
-- `pvClaimName`: The name of the persistent volume claim resource
-- `volumeHostName`: The name of the kubernetes node on which the data will be persisted
-- `hostLocalPath`: The directory or volume mount path on the chosen chosen node where data will be persisted
-- `pvStorageCapacity`: The capacity of the volume available to the persistent volume resource (e.g. 10Gi)
-
-Note: For this helm chart to work, the volume mount path or directory specified in the `hostLocalPath` variable needs to exist before the helm chart is deployed.
-
-## Standard Install
-
-```shell
-helm install -n local-store local-persistent-volume
-```
-
-## Standard Uninstall
-
-```shell
-helm delete --purge local-store
-```
diff --git a/charts/logging-monitoring.md b/charts/logging-monitoring.md
new file mode 100644
index 0000000..8b80de4
--- /dev/null
+++ b/charts/logging-monitoring.md
@@ -0,0 +1,46 @@
+# Deploy Logging and Monitoring components
+
+To read more about logging and monitoring in CORD, please refer to [the design
+document](https://docs.google.com/document/d/1hCljvKzsNW9D2Y1cbvOTNOCbTy1AgH33zXvVjbicjH8/edit).
+
+There are currently two charts that deploy logging and monitoring
+functionality, `nem-monitoring` and `logging`. Both of these charts depend on
+having [kafka](kafka.md) instances running in order to pass messages.
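+
+If they aren't already running, the kafka instances can be installed with the
+same commands given in the [kafka](kafka.md) chart documentation:
+
+```shell
+helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
+helm install -f examples/kafka-single.yaml --version 0.8.8 -n cord-kafka incubator/kafka
+helm install -f examples/kafka-single.yaml --version 0.8.8 -n voltha-kafka incubator/kafka
+```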
+
+
+## `nem-monitoring` charts
+
+```shell
+helm dep update nem-monitoring
+helm install -n nem-monitoring nem-monitoring
+```
+
+> NOTE: In order to display `voltha` KPIs, you need to have `voltha`
+> and `voltha-kafka` installed.
+
+### Monitoring Dashboards
+
+This chart exposes two dashboards:
+
+- [Grafana](http://docs.grafana.org/) on port `31300`
+- [Prometheus](https://prometheus.io/docs/) on port `31301`
+
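+To reach the dashboards from outside the cluster, point a browser at any
+cluster node on those NodePorts. A sketch (assumes the nodes' `InternalIP`
+addresses are reachable from your machine and that `kubectl` is configured for
+the cluster):
+
+```shell
+# look up the first node's internal address
+NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
+echo "Grafana:    http://${NODE_IP}:31300"
+echo "Prometheus: http://${NODE_IP}:31301"
+```
+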
+## `logging` charts
+
+```shell
+helm dep up logging
+helm install -n logging logging
+```
+
+For smaller developer/test environments without persistent storage, please use
+the `examples/logging-single.yaml` file to run the logging chart, which doesn't
+create PVCs.
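+
+A sketch of that invocation, assuming the values file path above:
+
+```shell
+helm install -f examples/logging-single.yaml -n logging logging
+```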
+
+### Logging Dashboard
+
+The [Kibana](https://www.elastic.co/guide/en/kibana/current/index.html)
+dashboard can be found on port `30601`.
+
+To start using Kibana, you must create an index under *Management > Index
+Patterns*. Create one with a name of `logstash-*`, then you can search for
+events in the *Discover* section.
diff --git a/charts/monitoring.md b/charts/monitoring.md
deleted file mode 100644
index 60af004..0000000
--- a/charts/monitoring.md
+++ /dev/null
@@ -1,20 +0,0 @@
-# Deploy Monitoring
-
-To read more about the monitoring in CORD, please refer to this [document](https://docs.google.com/document/d/1hCljvKzsNW9D2Y1cbvOTNOCbTy1AgH33zXvVjbicjH8/edit).
-
-To install the required components in you cluster:
-
-```shell
-helm dep update nem-monitoring
-helm install -n nem-monitoring nem-monitoring
-```
-
-> NOTE: In order to display `voltha` kpis you need to have `voltha`
-> and `voltha-kafka` installed.
-
-## Access the monitoring dashboard
-
-This chart exposes two dashboards:
-
-- grafana on port `31300`
-- prometheus on port `31301`
diff --git a/charts/storage.md b/charts/storage.md
new file mode 100644
index 0000000..6f8301b
--- /dev/null
+++ b/charts/storage.md
@@ -0,0 +1,339 @@
+# Persistent Storage charts
+
+These charts implement persistent storage within Kubernetes.
+
+See the Kubernetes documentation for background material on how persistent
+storage works:
+
+- [StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/)
+- [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)
+
+Using persistent storage is optional during development, but it should be
+provisioned and configured for production and realistic testing scenarios.
+
+## Local Directory
+
+The `local-directory` chart creates
+[local](https://kubernetes.io/docs/concepts/storage/volumes/#local) volumes on
+specific nodes, from directories. As there are no enforced limits on volume
+size and the node names are preconfigured, this chart is intended only for
+development and testing.
+
+Multiple directories can be specified in the `volumes` list; an example is
+given in the chart's `values.yaml` file.
+
+The `StorageClass` created for all volumes is `local-directory`.
+
+There is an ansible script that automates the creation of directories on all
+the kubernetes nodes. Make sure that the inventory name in ansible matches the
+one given as `host` in the `volumes` list, then invoke with:
+
+```shell
+ansible-playbook -i <path to ansible inventory> --extra-vars "helm_values_file:<path to values.yaml>" local-directory-playbook.yaml
+```
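+
+Once the directories exist on the nodes, install the chart itself. A sketch,
+assuming the charts are checked out under `helm-charts/storage` and that
+`my-local-dir-values.yaml` is a hypothetical edited copy of the chart's
+`values.yaml` listing your nodes and directories:
+
+```shell
+cd helm-charts/storage
+# my-local-dir-values.yaml is an example name, not a file shipped with the chart
+helm install -f my-local-dir-values.yaml -n local-directory local-directory
+```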
+
+## Local Provisioner
+
+The `local-provisioner` chart provides a
+[local](https://kubernetes.io/docs/concepts/storage/volumes/#local),
+non-distributed `PersistentVolume` that is usable on one specific node. It
+does this by running the k8s [external storage local volume
+provisioner](https://github.com/kubernetes-incubator/external-storage/tree/master/local-volume/helm/provisioner).
+
+This type of storage is useful for workloads that have their own intrinsic HA
+or redundancy strategies, and only need independent storage on individual nodes.
+
+This provisioner is not "dynamic" in the sense that it can't create a new
+`PersistentVolume` on demand from a storage pool, but it can automatically
+create volumes as disks/partitions are mounted on the nodes.
+
+To create a new PV, a disk or partition on a node has to be formatted and
+mounted in specific locations, after which the provisioner will automatically
+create a `PersistentVolume` for the mount. As these volumes can't be split or
+resized, care must be taken to ensure that the correct quantity, types, and
+sizes of mounts are created for all the `PersistentVolumeClaim`'s required to
+be bound for a specific workload.
+
+By default, two `StorageClasses` are created to differentiate between hard
+disks and SSDs:
+
+- `local-hdd`, which offers PVs on volumes mounted in `/mnt/local-storage/hdd/*`
+- `local-ssd`, which offers PVs on volumes mounted in `/mnt/local-storage/ssd/*`
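+
+The provisioner itself is deployed like the other charts in this repository. A
+minimal sketch, assuming the default values and the `helm-charts/storage`
+directory used elsewhere in this document:
+
+```shell
+cd helm-charts/storage
+helm install -n local-provisioner local-provisioner
+```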
+
+### Adding a new local volume on a node
+
+If you wanted to add a new volume to a node, you'd physically install a new
+disk in the system, then determine the device file it uses. Assuming that it's
+a hard disk and the device file is `/dev/sdb`, you might partition, format, and
+mount the disk like this:
+
+```shell
+$ sudo parted -s /dev/sdb \
+ mklabel gpt \
+ mkpart primary ext4 1MiB 100%
+$ sudo mkfs.ext4 /dev/sdb1
+$ echo "/dev/sdb1 /mnt/local-storage/hdd/sdb1 ext4 defaults 0 0" | sudo tee -a /etc/fstab
+$ sudo mount /mnt/local-storage/hdd/sdb1
+```
+
+Then check that the `PersistentVolume` is created by the `local-provisioner`:
+
+```shell
+$ kubectl get pv
+NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
+local-pv-2bfa2c43 19Gi RWO Delete Available local-hdd 6h
+
+$ kubectl describe pv local-pv-2bfa2c43
+Name: local-pv-2bfa2c43
+Labels: <none>
+Annotations: pv.kubernetes.io/provisioned-by=local-volume-provisioner-node1-...
+Finalizers: [kubernetes.io/pv-protection]
+StorageClass: local-hdd
+Status: Available
+Claim:
+Reclaim Policy: Delete
+Access Modes: RWO
+Capacity: 19Gi
+Node Affinity:
+ Required Terms:
+ Term 0: kubernetes.io/hostname in [node1]
+Message:
+Source:
+ Type: LocalVolume (a persistent volume backed by local storage on a node)
+ Path: /mnt/local-storage/hdd/sdb1
+Events: <none>
+```
+
+## Ceph deployed with Rook
+
+[Rook](https://rook.github.io/) provides an abstraction layer for Ceph and
+other distributed persistent data storage systems.
+
+There are 3 Rook charts included with CORD:
+
+- `rook-operator`, which runs the volume provisioning portion of Rook (and is a
+  thin wrapper around the upstream [rook-ceph
+  chart](https://rook.github.io/docs/rook/v0.8/helm-operator.html))
+
+- `rook-cluster`, which defines the Ceph cluster and creates these
+ `StorageClass` objects usable by other charts:
+
+  - `cord-ceph-rbd`, which dynamically creates `PersistentVolumes` when a
+    `PersistentVolumeClaim` is created. These volumes are only usable by a
+    single container at a time.
+
+  - `cord-cephfs`, a single shared filesystem which is mountable
+    `ReadWriteMany` on multiple containers via a `PersistentVolumeClaim`. Its
+    size is predetermined.
+
+- `rook-tools`, which provides a toolbox container for troubleshooting problems
+ with Rook/Ceph
+
+To create persistent volumes, you will need to load the first two charts; the
+third is only needed for troubleshooting and diagnostics.
+
+### Rook Node Prerequisites
+
+By default, all the nodes running k8s are expected to have a directory named
+`/mnt/ceph` where the Ceph data is stored (the `cephDataDir` variable can be
+used to change this path).
+
+In a production deployment, this would ideally be located on its own block
+storage device.
+
+There should be at least 3 nodes with storage available to provide data
+redundancy.
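+
+Creating the default `/mnt/ceph` directory on every node can be automated with
+ansible. A sketch, reusing the `kubespray-installer` inventory for an
+environment named `test`, as in the cleanup example later in this document:
+
+```shell
+cd cord/automation-tools/kubespray-installer
+# adjust the inventory path to match your own environment
+ansible -i inventories/test/inventory.cfg -b -m file -a "path=/mnt/ceph state=directory" all
+```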
+
+### Loading Rook Charts
+
+First, add the `rook-beta` repo to helm, then load the `rook-operator` chart
+into the `rook-ceph-system` namespace:
+
+```shell
+cd helm-charts/storage
+helm repo add rook-beta https://charts.rook.io/beta
+helm dep update rook-operator
+helm install --namespace rook-ceph-system -n rook-operator rook-operator
+```
+
+Check that it's running (it will start the `rook-ceph-agent` and
+`rook-discover` DaemonSets):
+
+```shell
+$ kubectl -n rook-ceph-system get pods
+NAME READY STATUS RESTARTS AGE
+rook-ceph-agent-4c66b 1/1 Running 0 6m
+rook-ceph-agent-dsdsr 1/1 Running 0 6m
+rook-ceph-agent-gwjlk 1/1 Running 0 6m
+rook-ceph-operator-687b7bb6ff-vzjsl 1/1 Running 0 7m
+rook-discover-9f87r 1/1 Running 0 6m
+rook-discover-lmhz9 1/1 Running 0 6m
+rook-discover-mxsr5 1/1 Running 0 6m
+```
+
+Next, load the `rook-cluster` chart, which connects the storage on the nodes to
+the Ceph pool, and the CephFS filesystem:
+
+```shell
+helm install -n rook-cluster rook-cluster
+```
+
+Check that the cluster is running; this may take a few minutes. Look for the
+`rook-ceph-mds-*` containers to start:
+
+```shell
+$ kubectl -n rook-ceph get pods
+NAME READY STATUS RESTARTS AGE
+rook-ceph-mds-cord-ceph-filesystem-7564b648cf-4wxzn 1/1 Running 0 1m
+rook-ceph-mds-cord-ceph-filesystem-7564b648cf-rcvnx 1/1 Running 0 1m
+rook-ceph-mgr-a-75654fb698-zqj67 1/1 Running 0 5m
+rook-ceph-mon0-v9d2t 1/1 Running 0 5m
+rook-ceph-mon1-4sxgc 1/1 Running 0 5m
+rook-ceph-mon2-6b6pj 1/1 Running 0 5m
+rook-ceph-osd-id-0-85d887f76c-44w9d 1/1 Running 0 4m
+rook-ceph-osd-id-1-866fb5c684-lmxfp 1/1 Running 0 4m
+rook-ceph-osd-id-2-557dd69c5c-qdnmb 1/1 Running 0 4m
+rook-ceph-osd-prepare-node1-bfzzm 0/1 Completed 0 4m
+rook-ceph-osd-prepare-node2-dt4gx 0/1 Completed 0 4m
+rook-ceph-osd-prepare-node3-t5fnn 0/1 Completed 0 4m
+
+$ kubectl -n rook-ceph get storageclass
+NAME PROVISIONER AGE
+cord-ceph-rbd ceph.rook.io/block 6m
+cord-cephfs kubernetes.io/no-provisioner 6m
+
+$ kubectl -n rook-ceph get filesystems
+NAME AGE
+cord-ceph-filesystem 6m
+
+$ kubectl -n rook-ceph get pools
+NAME AGE
+cord-ceph-pool 6m
+
+$ kubectl -n rook-ceph get persistentvolume
+NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
+cord-cephfs-pv 20Gi RWX Retain Available cord-cephfs 7m
+```
+
+At this point you can create a `PersistentVolumeClaim` on `cord-ceph-rbd` and a
+corresponding `PersistentVolume` will be created by the `rook-ceph-operator`
+acting as a volume provisioner and bound to the PVC.
+
+Creating a `PersistentVolumeClaim` on `cord-cephfs` will mount the same CephFS
+filesystem on every container that requests it. The CephFS PV implementation
+currently isn't as mature as the Ceph RBD volumes, and may not remount properly
+when used with a PVC.
+
+### Troubleshooting Rook
+
+Checking the `rook-ceph-operator` logs can be enlightening:
+
+```shell
+kubectl -n rook-ceph-system logs -f rook-ceph-operator-...
+```
+
+The [Rook toolbox container](https://rook.io/docs/rook/v0.8/toolbox.html) has
+been containerized as the `rook-tools` chart, and provides a variety of tools
+for debugging Rook and Ceph.
+
+Load the `rook-tools` chart:
+
+```shell
+helm install -n rook-tools rook-tools
+```
+
+Once the container is running (check with `kubectl -n rook-ceph get pods`),
+exec into it to run a shell to access all tools:
+
+```shell
+kubectl -n rook-ceph exec -it rook-ceph-tools bash
+```
+
+or run a one-off command:
+
+```shell
+kubectl -n rook-ceph exec rook-ceph-tools -- ceph status
+```
+
+or mount the CephFS volume:
+
+```shell
+kubectl -n rook-ceph exec -it rook-ceph-tools bash
+mkdir /mnt/cephfs
+mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
+my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')
+mount -t ceph -o name=admin,secret=$my_secret $mon_endpoints:/ /mnt/cephfs
+ls /mnt/cephfs
+```
+
+### Cleaning up after Rook
+
+The `rook-operator` chart will leave a few `DaemonSet`s behind after it's
+removed. Clean these up with the following commands:
+
+```shell
+kubectl -n rook-ceph-system delete daemonset rook-ceph-agent
+kubectl -n rook-ceph-system delete daemonset rook-discover
+helm delete --purge rook-operator
+```
+
+If you have other charts that create `PersistentVolumeClaims`, you may need to
+clean them up manually (for example, if you've changed the `StorageClass` they
+use). List them with:
+
+```shell
+kubectl get pvc --all-namespaces
+```
+
+Files may be left behind in the Ceph storage directory and/or the Rook
+configuration, and these need to be deleted before reinstalling the `rook-*`
+charts. If you've used the `automation-tools/kubespray-installer` scripts to
+set up an environment named `test`, you can delete all of these files with the
+following commands:
+
+```shell
+cd cord/automation-tools/kubespray-installer
+ansible -i inventories/test/inventory.cfg -b -m shell -a "rm -rf /var/lib/rook && rm -rf /mnt/ceph/*" all
+```
+
+The current upgrade process for Rook involves manual intervention and
+inspection using the tools container.
+
+## Using Persistent Storage
+
+The general process for using persistent storage is to create a
+[PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims)
+on the appropriate
+[StorageClass](https://kubernetes.io/docs/concepts/storage/storage-classes/)
+for the workload you're trying to run.
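+
+To see which `StorageClass` options exist in your cluster before writing a
+claim:
+
+```shell
+kubectl get storageclass
+```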
+
+### Example: XOS Database on a local directory
+
+For development and testing, it may be useful to persist the XOS database to a
+local directory on a node:
+
+```shell
+helm install -f examples/xos-db-local-dir.yaml -n xos-core xos-core
+```
+
+### Example: XOS Database on a Ceph RBD volume
+
+The XOS database (Postgres) needs a volume that persists if a node goes down or
+is taken out of service and that isn't shared with other containers running
+Postgres, so a Ceph RBD volume is a reasonable choice for it.
+
+```shell
+helm install -f examples/xos-db-ceph-rbd.yaml -n xos-core xos-core
+```
+
+### Example: Docker Registry on CephFS shared filesystem
+
+The Docker Registry needs a filesystem that is shared across all of its
+containers, so it's a suitable workload for the `cephfs` shared filesystem.
+
+There's an example values file available in `helm-charts/examples/registry-cephfs.yaml`:
+
+```shell
+helm install -f examples/registry-cephfs.yaml -n docker-registry stable/docker-registry
+```
+
diff --git a/charts/voltha.md b/charts/voltha.md
index eb7b47a..03169b5 100644
--- a/charts/voltha.md
+++ b/charts/voltha.md
@@ -1,28 +1,21 @@
# Deploy VOLTHA
+VOLTHA depends on having a [kafka message bus](kafka.md) deployed with a name
+of `voltha-kafka`, so deploy that with helm before deploying the voltha chart.
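+
+If `voltha-kafka` isn't running yet, it can be installed as described in the
+[kafka](kafka.md) chart documentation (this assumes the `incubator` repo added
+below is already available):
+
+```shell
+helm install -f examples/kafka-single.yaml --version 0.8.8 -n voltha-kafka incubator/kafka
+```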
+
+
## First Time Installation
-Download the helm charts `incubator` repository
+Add the `incubator` helm chart repository:
```shell
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/
```
-Build dependencies
+Update dependencies within the voltha chart:
```shell
-helm dep build voltha
-```
-
-Install the kafka dependency
-
-```shell
-helm install --name voltha-kafka \
---set replicas=1 \
---set persistence.enabled=false \
---set zookeeper.servers=1 \
---set zookeeper.persistence.enabled=false \
-incubator/kafka
+helm dep up voltha
```
There is an `etcd-operator` **known bug** that prevents deploying
diff --git a/prereqs/k8s-multi-node.md b/prereqs/k8s-multi-node.md
index 7b70562..40eba73 100644
--- a/prereqs/k8s-multi-node.md
+++ b/prereqs/k8s-multi-node.md
@@ -19,7 +19,8 @@
* **Operator/Developer Machine** (1x, either physical or virtual machine)
* Has Git installed
* Has Python3 installed (<https://www.python.org/downloads/>)
- * Has a stable version of Ansible installed (<http://docs.ansible.com/ansible/latest/intro_installation.html>)
+ * Has a stable version of Ansible installed (<http://docs.ansible.com/ansible/latest/intro_installation.html>), tested with version `2.5.3`
+ * Has [ansible-modules-hashivault](https://pypi.org/project/ansible-modules-hashivault/) installed where ansible can use it.
* Is able to reach the target servers (ssh into them)
* **Target/Cluster Machines** (at least 3x, either physical or virtual machines)
* Run Ubuntu 16.04 server
diff --git a/profiles/rcord/workflows/att.md b/profiles/rcord/workflows/att.md
index 0d384a1..4e889bb 100644
--- a/profiles/rcord/workflows/att.md
+++ b/profiles/rcord/workflows/att.md
@@ -159,4 +159,4 @@
### Device monitoring
-Please refer to the [monitoring](../../../charts/monitoring.md) chart.
\ No newline at end of file
+Please refer to the [logging and monitoring](../../../charts/logging-monitoring.md) chart.