[SEBA-697] Operating guide for Prometheus and ELK Stack

Change-Id: Iecd38df4596411077bab50fce8505d21bd803018
diff --git a/charts/logging-monitoring.md b/charts/logging-monitoring.md
index 81845c1..6b6fca5 100644
--- a/charts/logging-monitoring.md
+++ b/charts/logging-monitoring.md
@@ -27,6 +27,8 @@
 - [Grafana](http://docs.grafana.org/) on port *31300*
 - [Prometheus](https://prometheus.io/docs/) on port *31301*
 
+For information on how to use Prometheus, please refer to the [Prometheus Operations Guide](../operating_cord/prometheus.md).
+
 ## logging charts
 
 By default, the logging charts rely on the [Persistent Storage](storage.md)
@@ -55,6 +57,4 @@
 The [Kibana](https://www.elastic.co/guide/en/kibana/current/index.html)
 dashboard can be found on port `30601`
 
-To start using Kibana, you must create an index under *Management > Index
-Patterns*.  Create one with a name of `logstash-*`, then you can search for
-events in the *Discover* section.
+For information on how to use the Elastic Stack, please refer to the [ELK Stack Operations Guide](../operating_cord/elk_stack.md).
diff --git a/images/grafana/grafana-1.png b/images/grafana/grafana-1.png
new file mode 100644
index 0000000..40eea01
--- /dev/null
+++ b/images/grafana/grafana-1.png
Binary files differ
diff --git a/images/grafana/grafana-2.png b/images/grafana/grafana-2.png
new file mode 100644
index 0000000..8260070
--- /dev/null
+++ b/images/grafana/grafana-2.png
Binary files differ
diff --git a/images/grafana/grafana-create-1.png b/images/grafana/grafana-create-1.png
new file mode 100644
index 0000000..589bb59
--- /dev/null
+++ b/images/grafana/grafana-create-1.png
Binary files differ
diff --git a/images/grafana/grafana-create-2.png b/images/grafana/grafana-create-2.png
new file mode 100644
index 0000000..16606ba
--- /dev/null
+++ b/images/grafana/grafana-create-2.png
Binary files differ
diff --git a/images/grafana/grafana-create-3.png b/images/grafana/grafana-create-3.png
new file mode 100644
index 0000000..ec7e8cf
--- /dev/null
+++ b/images/grafana/grafana-create-3.png
Binary files differ
diff --git a/images/grafana/grafana-create-4.png b/images/grafana/grafana-create-4.png
new file mode 100644
index 0000000..a634299
--- /dev/null
+++ b/images/grafana/grafana-create-4.png
Binary files differ
diff --git a/images/grafana/grafana-create-5.png b/images/grafana/grafana-create-5.png
new file mode 100644
index 0000000..4c5bb86
--- /dev/null
+++ b/images/grafana/grafana-create-5.png
Binary files differ
diff --git a/images/grafana/grafana-export-1.png b/images/grafana/grafana-export-1.png
new file mode 100644
index 0000000..810f506
--- /dev/null
+++ b/images/grafana/grafana-export-1.png
Binary files differ
diff --git a/images/grafana/grafana-export-2.png b/images/grafana/grafana-export-2.png
new file mode 100644
index 0000000..a1bfb14
--- /dev/null
+++ b/images/grafana/grafana-export-2.png
Binary files differ
diff --git a/images/grafana/prometheus-metrics.png b/images/grafana/prometheus-metrics.png
new file mode 100644
index 0000000..90bef9a
--- /dev/null
+++ b/images/grafana/prometheus-metrics.png
Binary files differ
diff --git a/operating_cord/elk_stack.md b/operating_cord/elk_stack.md
index f0aeea4..6546452 100644
--- a/operating_cord/elk_stack.md
+++ b/operating_cord/elk_stack.md
@@ -1,54 +1,47 @@
 # ELK Stack
 
+> In order to use the Elastic Stack, the `logging` helm chart needs to be installed.
+> It is part of the `cord-platform` helm chart, but if you need to install it separately,
+> please refer to [this guide](../charts/logging-monitoring.md#logging-charts).
+
 CORD uses ELK Stack for logging information at all levels. CORD’s ELK Stack
 logger collects information from several components, including the XOS Core,
-API, and various Synchronizers. On a running physical POD, the logs can be
-accessed at `http://<head-node>/app/kibana`. For CORD-in-a-box these logs can
-be accessed at `http://<head-node>:8080/app/kibana`.
+API, and various Synchronizers. Along with logs, events and alarms are also
+collected in the ELK Stack.
 
-There is also a second way of accessing low-level logs with additional
-verbosity that do not make it into ELK Stack. This involves accessing log
-messages in various containers directly. You may do so by running the following
-command on the head node.
-
-```shell
-docker logs < container-name
-```
+On a running POD, ELK can be accessed at `http://<pod-ip>:30601`.
 
 For most purposes, the logs in ELK Stack should contain enough information
 to diagnose problems. Furthermore, these logs thread together facts across
 multiple components by using the identifiers of XOS data model objects.
 
-> Important!
->
-> Before you can start using ELK stack, you must initialize its index.  To do
-> so:
->
-> 1) Replace `logstash-*` with `*` in the text box marked "Index pattern."
->
-> 2) Pick `@timestamp` as the "Time Filter Field Name."
->
-> Configuring the default logstash- index pattern will lead to HTTP errors in
-> your browser. If you did this by accident, then delete it under Management ->
-> Index Patterns, and create another pattern as described above.
+**Important!**
+
+To start using Kibana, you must create an index pattern under *Management > Index
+Patterns*. Create one named `logstash-*`, then you can search for
+events in the *Discover* section.
+
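+If you prefer to script this step instead of using the UI, the index pattern can also be
+created through Kibana's saved objects API. This is a hedged sketch, assuming Kibana 6.x
+reachable on port `30601` and that the saved objects API is available in your Kibana version;
+adjust the host and port to your deployment:
+
+```bash
+# Create the "logstash-*" index pattern with @timestamp as the time filter field
+curl -XPOST "http://<pod-ip>:30601/api/saved_objects/index-pattern/logstash-*" \
+  -H 'kbn-xsrf: true' \
+  -H 'Content-Type: application/json' \
+  -d '{"attributes": {"title": "logstash-*", "timeFieldName": "@timestamp"}}'
+```
+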
+## Example queries
 
 More information about using
 [Kibana](https://www.elastic.co/guide/en/kibana/current/getting-started.html)
 to access ELK Stack logs is available elsewhere, but to illustrate how the
-logging system is used in CORD, consider the following example quieries.
+logging system is used in CORD, consider the following example queries.
 
+### XOS-related queries
+
-The first example query enlists log messages in the implementation of a
+The first example query lists log messages in the implementation of a
 particular service synchronizer, in a given time range:
 
 ```sql
-+synchronizer_name:vtr-synchronizer AND +@timestamp:[now-1h TO now]
++synchronizer_name:rcord-synchronizer AND +@timestamp:[now-1h TO now]
 ```
 
-A second query gets log messages that are linked to the _Network_ data model
+A second query gets log messages that are linked to the _RCordSubscriber_ data model
 across all services:
 
 ```sql
-+model_name: Network
++model_name: RCordSubscriber
 ```
 
 The same query can be refined to include the identifier of the specific
@@ -56,7 +49,7 @@
 page in the XOS GUI.
 
 ```sql
-+model_name: Network AND +pk:7
++model_name: RCordSubscriber AND +pk:68
 ```
 
 A final example lists log messages in a service synchronizer that
@@ -64,6 +57,143 @@
 execution:
 
 ```sql
-+synchronizer_name: vtr-synchronizer AND +exception
++synchronizer_name:rcord-synchronizer AND +exception
 ```
 
+## REST API-based queries
+
+The first thing you need to do in order to use the REST APIs is to find the port on which the service is exposed.
+At the moment the [official chart](https://github.com/helm/charts/tree/master/incubator/elasticsearch)
+does not let us pin the NodePort, so it will differ from one deployment to the next.
+
+To find the correct port, run this command anywhere you have the `kubectl` tool installed:
+
+```bash
+export ELK_PORT=$(kubectl get svc logging-elasticsearch-client -o json | jq .spec.ports[0].nodePort)
+```
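+
+If `jq` is not available on the machine, the same value can be obtained with `kubectl`'s
+built-in JSONPath output. A minimal alternative sketch, assuming the service is named
+`logging-elasticsearch-client` as above:
+
+```bash
+# Same NodePort lookup without jq, using kubectl's JSONPath support
+export ELK_PORT=$(kubectl get svc logging-elasticsearch-client \
+  -o jsonpath='{.spec.ports[0].nodePort}')
+echo $ELK_PORT
+```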
+
+You can then query the REST API on that port, for example:
+
+```bash
+curl -XGET "http://localhost:$ELK_PORT"
+{
+  "name" : "logging-elasticsearch-client-587599fbdc-bkfhn",
+  "cluster_name" : "elasticsearch",
+  "cluster_uuid" : "Nwc4HpeBQrOOcL4IWVyRAw",
+  "version" : {
+    "number" : "6.5.4",
+    "build_flavor" : "oss",
+    "build_type" : "tar",
+    "build_hash" : "d2ef93d",
+    "build_date" : "2018-12-17T21:17:40.758843Z",
+    "build_snapshot" : false,
+    "lucene_version" : "7.5.0",
+    "minimum_wire_compatibility_version" : "5.6.0",
+    "minimum_index_compatibility_version" : "5.0.0"
+  },
+  "tagline" : "You Know, for Search"
+}
+```
+
+Following are some SEBA-related query examples. These examples are executed from within the cluster;
+if you want to run them from a client machine, replace `localhost` with the cluster IP.
+
+### Get current authentication status for a particular ONU
+
+```bash
+curl -XGET "http://localhost:$ELK_PORT/_search" -H 'Content-Type: application/json' -d'
+{
+  "size": 1, 
+  "sort": {
+    "timestamp": "desc"
+  },
+  "query": {
+    "bool": {
+      "must": [
+        {
+          "match": {
+            "serialNumber": "PSMO12345678"
+          }
+        }
+      ],
+      "filter": {
+         "term": {
+          "kafka_topic": "authentication.events"
+        }
+      }
+    }
+  }
+}' | jq .hits.hits[0]
+```
+
+Example response:
+
+```json
+{
+  "_index": "logstash-2019.05.30",
+  "_type": "doc",
+  "_id": "Kw_bCWsBPqzdKVIdSbxC",
+  "_score": null,
+  "_source": {
+    "authenticationState": "APPROVED",
+    "deviceId": "of:0000aabbccddeeff",
+    "@version": "1",
+    "portNumber": "128",
+    "kafka_topic": "authentication.events",
+    "@timestamp": "2019-05-30T17:48:14.343Z",
+    "timestamp": "2019-05-30T17:48:14.308Z",
+    "kafka_key": "%{[@metadata][kafka][key]}",
+    "serialNumber": "PSMO12345678",
+    "type": "cord-kafka",
+    "kafka_timestamp": "1559238494311"
+  },
+  "sort": [
+    1559238494308
+  ]
+}
+```
+
+### Get all the events regarding a particular ONU
+
+```bash
+curl -XGET "http://localhost:$ELK_PORT/_search" -H 'Content-Type: application/json' -d'
+{
+  "sort": {
+    "timestamp": "desc"
+  },
+  "query": {
+    "bool": {
+      "must": [
+        {
+          "match": {
+            "serialNumber": "PSMO12345678"
+          }
+        }
+      ]
+    }
+  }
+}' | jq .hits.hits
+```
+
+### Get all the events regarding a particular action
+
+If you want to list all the authentication events, regardless of the ONU serial number:
+
+```bash
+curl -XGET "http://localhost:$ELK_PORT/_search" -H 'Content-Type: application/json' -d'
+{
+  "sort": {
+    "timestamp": "desc"
+  },
+  "query": {
+    "bool": {
+      "filter": {
+        "term": {
+          "kafka_topic": "authentication.events"
+        }
+      }
+    }
+  }
+}
+' | jq .hits.hits
+```
\ No newline at end of file
diff --git a/operating_cord/prometheus.md b/operating_cord/prometheus.md
index c387479..1818004 100644
--- a/operating_cord/prometheus.md
+++ b/operating_cord/prometheus.md
@@ -1,3 +1,246 @@
 # Prometheus
 
-This is a placeholder for information on monitoring...
+> In order to use Prometheus, the `nem-monitoring` helm chart needs to be installed.
+> It is part of the `cord-platform` helm chart, but if you need to install it separately,
+> please refer to [this guide](../charts/logging-monitoring.md#nem-monitoring-charts).
+
+CORD uses Prometheus for storing time-series metrics related to POD usage.
+Within Prometheus you'll find statistics related to traffic through the hardware, as well as
+metrics about operational requests made to the POD.
+
+On a running POD, Prometheus can be accessed at `http://<pod-ip>:31301`,
+but we suggest using Grafana, which already provides custom CORD
+dashboards and is accessible at `http://<pod-ip>:31300`.
+
+Unless you customized the chart installation, the credentials to access Grafana are:
+
+```yaml
+username: admin
+password: strongpassword
+```
+
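+If you want to change these defaults, the password is normally set through the chart's values.
+This is a hedged sketch; the exact value path (here assumed to be `grafana.adminPassword`)
+depends on how the `nem-monitoring` chart nests Grafana, so check the chart's `values.yaml` first:
+
+```bash
+# Override the Grafana admin password when installing/upgrading the chart (assumed value path)
+helm upgrade --install nem-monitoring ~/cord/helm-charts/nem-monitoring \
+  --set grafana.adminPassword=<new-password>
+```
+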
+## Visualize an existing dashboard in Grafana
+
+Once you are logged into Grafana, you can list the existing dashboards by selecting the `Home`
+drop-down menu at the top left of the window:
+
+![grafana-dropdown](../images/grafana/grafana-1.png)
+
+Then select the dashboard you are interested in simply by clicking on it:
+
+![grafana-dashboard-list](../images/grafana/grafana-2.png)
+
+## I want to create a new visualization
+
+That's **great**! Let's get started.
+
+To create a new dashboard, select the `+` sign in the Grafana sidebar and then select `Dashboard`
+
+![grafana-create-dashboard-1](../images/grafana/grafana-create-1.png)
+
+Then select the kind of visualization you want to add to that dashboard
+
+![grafana-create-dashboard-2](../images/grafana/grafana-create-2.png)
+
+Once it has been added, click on the `edit` item in the panel dropdown
+
+![grafana-create-dashboard-3](../images/grafana/grafana-create-3.png)
+
+Now it's time to create the query that will populate the graph.
+
+The first thing to do is to select the `Data Source`
+
+![grafana-create-dashboard-4](../images/grafana/grafana-create-4.png)
+
+And then create the query to display the metric you are interested in
+
+![grafana-create-dashboard-5](../images/grafana/grafana-create-5.png)
+
+> In order to write a meaningful query, we suggest consulting the
+> [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/querying/basics/)
+
+To find the list of metrics that are available in Prometheus, open the Prometheus dashboard
+at `http://<pod-ip>:31301/graph` and look them up in the drop-down
+
+![prometheus-metrics](../images/grafana/prometheus-metrics.png)
+
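+The same list can also be retrieved from the Prometheus HTTP API, which is handy when a
+browser is not at hand. A minimal sketch, assuming the Prometheus NodePort `31301` shown above:
+
+```bash
+# List all metric names known to Prometheus
+curl -s "http://<pod-ip>:31301/api/v1/label/__name__/values" | jq .
+```
+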
+## I want to save the new dashboard and make it available in CORD
+
+That's even **better**! Here's what you need to do.
+
+The CORD Grafana dashboards are made available through `configmaps` defined in the
+[`nem-monitoring`](https://github.com/opencord/helm-charts/tree/master/nem-monitoring/grafana-dashboards) helm chart.
+
+The first thing to do is to obtain the `.json` export of the dashboard.
+
+To do that, click on the `Settings` icon at the top right of the screen
+
+![grafana-export-1](../images/grafana/grafana-export-1.png)
+
+Then click on `View JSON` and copy the JSON into a file
+
+![grafana-export-2](../images/grafana/grafana-export-2.png)
+
+Then save the file in `~/cord/helm-charts/nem-monitoring/grafana-dashboards`;
+for this example we'll name the file `my_dashboard.json`.
+
+In order to load that dashboard into Grafana at startup, create a new `configmap` in
+`~/cord/helm-charts/nem-monitoring/templates/grafana-dashboard-my-dashboard-configmap.yaml`
+with the following content:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: grafana-dashboard-my-dashboard
+  labels:
+     grafana_dashboard: "1"
+data:
+  kb8s.json: |
+{{ .Files.Get "grafana-dashboards/my_dashboard.json" | indent 4 }}
+```
+
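+Once the template is in place, redeploy the chart so that Grafana picks up the new dashboard.
+A minimal sketch, assuming the chart was installed with the release name `nem-monitoring`
+from the `~/cord/helm-charts` checkout:
+
+```bash
+# Re-render the chart so the new dashboard configmap is created
+cd ~/cord/helm-charts
+helm upgrade --install nem-monitoring ./nem-monitoring
+```
+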
+## REST API-based queries
+
+In some cases you may want to fetch data from Prometheus via REST. Following are some examples of
+queries you can run, but we strongly suggest referring to the [official documentation](https://prometheus.io/docs/prometheus/latest/querying/api/).
+
+### Get the latest values of a metric
+
+```bash
+curl -X GET -G http://10.128.22.1:31301/api/v1/query \
+--data-urlencode 'query=onos_rx_bytes_total' | jq .
+```
+
+Example response:
+
+```json
+{
+  "status": "success",
+  "data": {
+    "resultType": "vector",
+    "result": [
+      {
+        "metric": {
+          "__name__": "onos_rx_bytes_total",
+          "device_id": "of:0000000000000001",
+          "instance": "kpi-exporter:8080",
+          "job": "voltha-kpi",
+          "port_id": "1"
+        },
+        "value": [
+          1559256189.732,
+          "3842"
+        ]
+      },
+      {
+        "metric": {
+          "__name__": "onos_rx_bytes_total",
+          "device_id": "of:0000000000000001",
+          "instance": "kpi-exporter:8080",
+          "job": "voltha-kpi",
+          "port_id": "2"
+        },
+        "value": [
+          1559256189.732,
+          "496584"
+        ]
+      }
+    ]
+  }
+}
+
+```
+
+### Get the latest values of a metric with a filter
+
+```bash
+curl -X GET -G http://10.128.22.1:31301/api/v1/query \
+--data-urlencode 'query=onos_rx_bytes_total{device_id="of:0000000000000001", port_id="1"}' | jq .
+```
+
+Example response:
+
+```json
+{
+  "status": "success",
+  "data": {
+    "resultType": "vector",
+    "result": [
+      {
+        "metric": {
+          "__name__": "onos_rx_bytes_total",
+          "device_id": "of:0000000000000001",
+          "instance": "kpi-exporter:8080",
+          "job": "voltha-kpi",
+          "port_id": "1"
+        },
+        "value": [
+          1559256175.475,
+          "3842"
+        ]
+      }
+    ]
+  }
+}
+
+```
+
+### Get the values of a metric with filters and a time range
+
+```bash 
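+# note: "date -d" is GNU date syntax; on BSD/macOS use e.g. date -u -v-1M +"%Y-%m-%dT%H:%M:%SZ"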
+export START=$(date -d '1 minute ago' -u +"%Y-%m-%dT%H:%M:%SZ")
+export END=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+
+
+curl -X GET -G 'http://10.128.22.1:31301/api/v1/query_range' \
+--data-urlencode 'query=onos_rx_bytes_total{device_id="of:0000000000000001", port_id="1"}' \
+--data-urlencode start=$START \
+--data-urlencode end=$END \
+--data-urlencode step=15s | jq .
+```
+
+Example response:
+
+```json
+{
+  "status": "success",
+  "data": {
+    "resultType": "matrix",
+    "result": [
+      {
+        "metric": {
+          "__name__": "onos_rx_bytes_total",
+          "device_id": "of:0000000000000001",
+          "instance": "kpi-exporter:8080",
+          "job": "voltha-kpi",
+          "port_id": "1"
+        },
+        "values": [
+          [
+            1559255865,
+            "3842"
+          ],
+          [
+            1559255880,
+            "3842"
+          ],
+          [
+            1559255895,
+            "3842"
+          ],
+          [
+            1559255910,
+            "3842"
+          ],
+          [
+            1559255925,
+            "3842"
+          ]
+        ]
+      }
+    ]
+  }
+}
+```
+
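+### Get a rate instead of a raw counter
+
+Counters such as `onos_rx_bytes_total` are often more useful as per-second rates. A hedged
+example using PromQL's `rate()` function against the same endpoint (same assumptions as above):
+
+```bash
+# Per-second receive rate over the last minute for the same device/port
+curl -X GET -G 'http://10.128.22.1:31301/api/v1/query' \
+--data-urlencode 'query=rate(onos_rx_bytes_total{device_id="of:0000000000000001", port_id="1"}[1m])' | jq .
+```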