adding operating guide Change-Id: I6b77fc6f17fe3941af2b8679a626f85d6012a284

commit: bd786deaa2dcb691edb96556fd10460006d74614 [log] [tgz]
author: llp <llp@onlab.us> Fri Jul 07 15:57:27 2017 -0700
committer: llp <llp@onlab.us> Mon Jul 10 07:08:47 2017 -0700
tree: 187be529ed4e0e088ba13703d18af33cd8bc9ec7
parent: 512389a36c8130a7992382b852a1268d324ca1fa [diff]
diff --git a/.gitignore b/.gitignore
index 774a54c..95b37f0 100644
--- a/.gitignore
+++ b/.gitignore

@@ -43,3 +43,6 @@
 
 # generated config
 genconfig/*
+
+# GitBook
+_book

diff --git a/README.md b/README.md
index f3c628a..5bb2fb9 100644
--- a/README.md
+++ b/README.md

@@ -3,7 +3,7 @@
 This is the main entry point for building and installing CORD.
 
 If this is your first encounter with CORD, we suggest you start by
-bringing up an emulated version called CORD-in-a-Box.
+bringing up an emulated version called _CORD-in-a-Box_.
 It installs CORD on a set of virtual machines running on a single
 physical server. Just follow our [CORD-in-a-Box Guide](docs/quickstart.md).
 
@@ -14,7 +14,7 @@
 
 For additional information about the CORD Project, see:
 
-* [Project Home](http://opencord.org)
+* [Website](http://opencord.org)
 * [Wiki](http://wiki.opencord.org)
 * [Jira](http://jira.opencord.org)
 * [Gerrit](http://gerrit.opencord.org)

diff --git a/SUMMARY.md b/SUMMARY.md
deleted file mode 100644
index bccca5a..0000000
--- a/SUMMARY.md
+++ /dev/null

@@ -1,6 +0,0 @@
-# Summary
-
-* [Building CORD](README.md)
-    * [CORD-in-a-Box](docs/quickstart.md)
-    * [Physical POD](docs/quickstart_physical.md)
-

diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000..a369a94
--- /dev/null
+++ b/docs/Makefile

@@ -0,0 +1,7 @@
+default: book
+
+book:
+	gitbook init; gitbook serve &
+
+clean:
+	rm -rf _book

diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..e90c928
--- /dev/null
+++ b/docs/README.md

@@ -0,0 +1,13 @@
+# Building and Installing CORD
+
+This guide describes how to build and install CORD.
+
+If this is your first encounter with CORD, we suggest you start by
+bringing up an emulated version called _CORD-in-a-Box_.
+It installs CORD on a set of virtual machines running on a single
+physical server. Just follow our [CORD-in-a-Box Guide](quickstart.md).
+
+You can also install CORD on a physical POD. This involves first assembling
+a set of servers and switches, and then pointing the build system at
+that target hardware. Just follow our
+[Physical POD Guide](quickstart_physical.md).

diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
new file mode 100644
index 0000000..ee5a7b4
--- /dev/null
+++ b/docs/SUMMARY.md

@@ -0,0 +1,9 @@
+# Summary
+
+* [Building CORD](README.md)
+    * [CORD-in-a-Box](quickstart.md)
+    * [Physical POD](quickstart_physical.md)
+* [Operating CORD](operate/README.md)
+    * [Powering Up a POD](operate/power_up.md)
+    * [ELK Stack Logs](operate/elk_stack.md)
+

diff --git a/docs/operate/README.md b/docs/operate/README.md
new file mode 100644
index 0000000..a2c2581
--- /dev/null
+++ b/docs/operate/README.md

@@ -0,0 +1,13 @@
+# Operating CORD
+
+This guide defines various processes and procedures for operating a CORD POD.
+It assumes the [build-and-install](../README.md) has already
+completed, and you want to operate and manage a running POD.
+
+
+Today, CORD most often runs for demo, development, or evaluation
+purposes, so this guide is limited to simple procedures suitable for
+such settings. We expect more realistic operational scenarios will be
+supported in the future. It is also the case that CORD's operations
+and management interface is primarily defined by its Northbound API,
+which is documented at `<head-node>/apidocs/` on a running POD.

diff --git a/docs/operate/elk_stack.md b/docs/operate/elk_stack.md
new file mode 100644
index 0000000..672eb7c
--- /dev/null
+++ b/docs/operate/elk_stack.md

@@ -0,0 +1,57 @@
+# ELK Stack Logs
+
+CORD uses ELK Stack for logging information at all levels. CORD’s
+ELK Stack logger collects information from several components,
+including the XOS Core, API, and various Synchronizers. On a running
+POD, the logs can be accessed at `http://<head-node>:8080/kibana`.
+
+There is also a second way of accessing low-level logs with additional
+verbosity that do not make it into ELK Stack. This involves accessing log
+messages in various containers directly. You may do so by running the
+following command on the head node.
+
+```
+$ docker logs <container-name>
+```
+
+For most purposes, the logs in ELK Stack should contain enough information
+to diagnose problems. Furthermore, these logs thread together facts across
+multiple components by using the identifiers of XOS data model objects.
+
+More information about using
+[Kibana](https://www.elastic.co/guide/en/kibana/current/getting-started.html)
+to access ELK Stack logs is available elsewhere, but to illustrate how the logging
+system is used in CORD, consider the following example queries.
+
+The first example query lists log messages from the implementation of a
+particular service synchronizer, in a given time range:
+
+```
++synchronizer_name:vtr-synchronizer AND +@timestamp:[now-1h TO now]
+```
+
+A second query gets log messages that are linked to the _Network_ data model
+across all services:
+
+```
++model_name: Network
+```
+
+The same query can be refined to include the identifier of the specific
+_Network_ object in question. You can obtain the object id from the object’s
+page in the XOS GUI.
+
+```
++model_name: Network AND +pk:7
+```
+
+A final example lists log messages in a service synchronizer that
+contain Python exceptions, which usually correspond to anomalous
+execution:
+
+```
++synchronizer_name: vtr-synchronizer AND +exception
+```
+
+
+
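Review note on `docs/operate/elk_stack.md`: the direct-container path described in this file could be wrapped in a small shell helper. What follows is only a sketch. The function name `recent_container_errors`, the 200-line default, and the `exception|traceback` filter pattern are assumptions made for illustration and are not part of CORD. The only Docker feature it relies on is the standard `docker logs --tail` option.

```shell
# Hypothetical helper: print recent log lines from one container that
# mention an exception or traceback. Intended to be run on the head node.
recent_container_errors() {
    container="$1"          # container name, e.g. as shown by `docker ps`
    lines="${2:-200}"       # number of trailing log lines to scan (assumed default)
    # `docker logs` replays the container's stdout and stderr; merge both
    # streams so the filter sees every line. `|| true` keeps the exit
    # status at 0 when nothing matches.
    docker logs --tail "$lines" "$container" 2>&1 \
        | grep -i -E 'exception|traceback' || true
}
```

Usage would be `recent_container_errors <container-name>`, optionally followed by a line count. An empty result only means that no matching lines were found in the scanned tail, not that the component is healthy.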

diff --git a/docs/operate/power_up.md b/docs/operate/power_up.md
new file mode 100644
index 0000000..7826963
--- /dev/null
+++ b/docs/operate/power_up.md

@@ -0,0 +1,124 @@
+# Powering Up a POD
+
+This guide describes how to power up a previously installed CORD POD that
+has been powered down (cleanly or otherwise). The end goal of the power up
+procedure is a fully functioning CORD POD.
+
+## Boot the Head Node
+
+* **Physical POD:** Power on the head node
+* **CiaB:** Bring up the prod VM:
+```
+$ cd ~/cord/build; vagrant up prod
+```
+
+## Check the Head Node Services
+
+1. Verify that `mgmtbr` and `fabric` interfaces are up and have IP addresses
+2. Verify that MAAS UI is running and accessible:
+  * **Physical POD:** `http://<head-node>/MAAS`
+  * **CiaB:** `http://<ciab-server>:8080/MAAS`
+> **Troubleshooting: MAAS UI not available on CiaB.**
+> If you are running a CiaB and there is no webserver on port 8080, it might
+> be necessary to refresh port forwarding to the prod VM.
+> Run `ps ax|grep 8080`
+> and look for an SSH command (will look something like this):
+```
+31353 pts/5    S      0:00 ssh -o User=vagrant -o Port=22 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o ForwardX11=no -o IdentityFile="/users/acb/cord/build/targets/cord-in-a-box/.vagrant/machines/prod/libvirt/private_key" -L *:8080:192.168.121.14:80 -N 192.168.121.14
+```
+> A workaround is to kill this process, and then copy and paste the command
+> above into another window on the CiaB server to set up a new SSH port forwarding connection.
+
+3. Verify that the following Docker containers are running: mavenrepo, switchq, automation, provisioner, generator, harvester, storage, allocator, registry
+
+4. Use `sudo lxc list` to ensure that juju lxc containers are running. If any are stopped, use `sudo lxc start <name>` to restart them.
+
+5. Run: `source /opt/cord_profile/admin-openrc.sh`
+
+6. Verify that the following OpenStack commands work:
+  * `$ keystone user-list`
+  * `$ nova list --all-tenants`
+  * `$ neutron net-list`
+> **Troubleshooting: OpenStack commands give SSL error.**
+> Sometimes Keystone starts up in a strange state and OpenStack
+> commands will fail with various SSL errors.
+> To fix this, it is often sufficient to run:
+`ssh ubuntu@keystone sudo service apache2 restart`
+
+
+## Power on Leaf and Spine Switches
+
+* **Physical POD:** power on the switches.  
+* **CiaB:** bring up the switch VMs:
+```
+$ cd ~/cord/build; vagrant up leaf-1 leaf-2 spine-1
+```
+
+## Check the Switches
+
+On the head node (i.e., prod VM for CiaB):
+
+1. Get switch IPs by running: `cord prov list`
+2. Verify that `ping` works for all switch IPs
+
+## Boot the Compute Nodes
+
+* **Physical POD:** Log into the MAAS UI and power on the compute node.
+* **CiaB:** Log into the MAAS UI and power on the compute node.
+
+## Check the Compute Nodes
+
+Once the compute nodes are up:
+
+1. Log in to the head node
+2. Run: `source /opt/cord_profile/admin-openrc.sh`
+3. Verify that `nova service-list` shows the compute node as “up”.
+> It may take a few minutes until the node's status is updated in Nova.
+4. Verify that you can log into the compute nodes from the head node as the ubuntu user
+
+## Check XOS
+
+Verify that XOS UI is running and accessible:
+
+* **Physical POD:** `http://<head-node>/xos`
+* **CiaB:** `http://<ciab-server>:8080/xos`
+
+If it's not working, try restarting XOS (replace `rcord` with the name of your profile):
+
+```
+$ cd /opt/cord_profile; docker-compose -p rcord restart
+```
+
+## Check VTN
+
+Verify that VTN is initialized correctly:
+
+1. Run `onos> cordvtn-nodes`
+2. Make sure the compute nodes have COMPLETE status.
+3. Prior to rebooting existing OpenStack VMs:
+  * Run `onos> cordvtn-ports`
+  * Make sure some ports show up
+  * If not, try this:
+    - `onos> cordvtn-sync-neutron-states <keystone-url> admin admin <password>`
+    - `onos> cordvtn-sync-xos-states <xos-url> xosadmin@opencord.org <password>`
+
+## Boot OpenStack VMs
+
+To bring up OpenStack VMs that were running before the POD was shut down:
+
+1. Run `source /opt/cord_profile/admin-openrc.sh`
+2. Get list of VM IDs: `nova list --all-tenants`
+3. For each VM:
+  * `$ nova start <vm-id>`
+  * `$ nova console-log <vm-id>`
+  * Inspect the console log to make sure that the network interfaces get IP addresses.
+
+To restart a vSG inside the vSG VM:
+
+1. SSH to the vSG VM
+2. Run: `sudo rm /root/network_is_setup`
+3. Save the vSG Tenant in the XOS UI
+4. Once the synchronizer has re-run, make sure you can ping 8.8.8.8 from inside the vSG container
+```
+sudo docker exec -ti vcpe-222-111 ping 8.8.8.8
+```
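Review note on `docs/operate/power_up.md`: the container check in step 3 of "Check the Head Node Services" lends itself to a small script. The following is a minimal sketch, not something taken from the CORD tree. It assumes that the container names reported by `docker ps` contain the names listed in the guide; real names on a POD may carry prefixes or suffixes, which is why it matches substrings. The function name `missing_containers` is hypothetical.

```shell
# Names copied from the list in "Check the Head Node Services".
REQUIRED_CONTAINERS="mavenrepo switchq automation provisioner generator harvester storage allocator registry"

# Hypothetical helper: read running container names on stdin (one per
# line) and print each required name that appears in none of them.
missing_containers() {
    running="$(cat)"
    for name in $REQUIRED_CONTAINERS; do
        # Substring match: assumes real names may be prefixed or suffixed.
        if ! printf '%s\n' "$running" | grep -q -F -- "$name"; then
            printf '%s\n' "$name"
        fi
    done
}
```

On the head node this could be fed with `docker ps --format '{{.Names}}' | missing_containers`. No output means every listed name was found among the running containers.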