Add operating guide
Change-Id: I6b77fc6f17fe3941af2b8679a626f85d6012a284
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000..a369a94
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,8 @@
+default: book
+
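+# Initialize the book and serve it locally in the background (requires the gitbook CLI).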
+book:
+ gitbook init; gitbook serve &
+
+clean:
+ rm -rf _book
diff --git a/docs/README.md b/docs/README.md
new file mode 100644
index 0000000..e90c928
--- /dev/null
+++ b/docs/README.md
@@ -0,0 +1,13 @@
+# Building and Installing CORD
+
+This guide describes how to build and install CORD.
+
+If this is your first encounter with CORD, we suggest you start by
+bringing up an emulated version called _CORD-in-a-Box_.
+It installs CORD on a set of virtual machines running on a single
+physical server. Just follow our [CORD-in-a-Box Guide](quickstart.md).
+
+You can also install CORD on a physical POD. This involves first assembling
+a set of servers and switches, and then pointing the build system at
+that target hardware. To do so, follow our
+[Physical POD Guide](quickstart_physical.md).
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
new file mode 100644
index 0000000..ee5a7b4
--- /dev/null
+++ b/docs/SUMMARY.md
@@ -0,0 +1,9 @@
+# Summary
+
+* [Building CORD](README.md)
+ * [CORD-in-a-Box](quickstart.md)
+ * [Physical POD](quickstart_physical.md)
+* [Operating CORD](operate/README.md)
+ * [Powering Up a POD](operate/power_up.md)
+ * [ELK Stack Logs](operate/elk_stack.md)
+
diff --git a/docs/operate/README.md b/docs/operate/README.md
new file mode 100644
index 0000000..a2c2581
--- /dev/null
+++ b/docs/operate/README.md
@@ -0,0 +1,13 @@
+# Operating CORD
+
+This guide defines various processes and procedures for operating a CORD POD.
+It assumes the [build-and-install](../README.md) process has already
+completed, and that you want to operate and manage a running POD.
+
+
+Today, CORD most often runs for demo, development, or evaluation
+purposes, so this guide is limited to simple procedures suitable for
+such settings. We expect more realistic operational scenarios to be
+supported in the future. Note also that CORD's operations and
+management interface is primarily defined by its Northbound API,
+which is documented at `<head-node>/apidocs/` on a running POD.
diff --git a/docs/operate/elk_stack.md b/docs/operate/elk_stack.md
new file mode 100644
index 0000000..672eb7c
--- /dev/null
+++ b/docs/operate/elk_stack.md
@@ -0,0 +1,63 @@
+# ELK Stack Logs
+
+CORD uses ELK Stack for logging information at all levels. CORD’s
+ELK Stack logger collects information from several components,
+including the XOS Core, API, and various Synchronizers. On a running
+POD, the logs can be accessed at `http://<head-node>:8080/kibana`.
+
+There is also a second way to access low-level, more verbose log messages
+that do not make it into the ELK Stack: reading the logs of individual
+containers directly. You can do so by running the following command on
+the head node.
+
+```
+$ docker logs <container-name>
+```
+
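+To find the container name, first list the running containers. A minimal
+sketch (the synchronizer container name below is hypothetical; actual
+names depend on the profile you deployed):
+
+```
+$ docker ps --format '{{.Names}}'
+$ docker logs --tail 100 rcord_vtr-synchronizer_1
+```
+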
+For most purposes, the logs in ELK Stack should contain enough information
+to diagnose problems. Furthermore, these logs thread together facts across
+multiple components by using the identifiers of XOS data model objects.
+
+More information about using
+[Kibana](https://www.elastic.co/guide/en/kibana/current/getting-started.html)
+to access ELK Stack logs is available elsewhere, but to illustrate how the logging
+system is used in CORD, consider the following example queries.
+
+The first example query lists log messages generated by a particular
+service synchronizer within a given time range:
+
+```
++synchronizer_name:vtr-synchronizer AND +@timestamp:[now-1h TO now]
+```
+
+A second query gets log messages that are linked to the _Network_ data model
+across all services:
+
+```
++model_name: Network
+```
+
+The same query can be refined to include the identifier of the specific
+_Network_ object in question. You can obtain the object id from the object’s
+page in the XOS GUI.
+
+```
++model_name: Network AND +pk:7
+```
+
+A final example lists log messages from a service synchronizer that
+contain Python exceptions; these usually correspond to anomalous
+execution:
+
+```
++synchronizer_name: vtr-synchronizer AND +exception
+```
diff --git a/docs/operate/power_up.md b/docs/operate/power_up.md
new file mode 100644
index 0000000..7826963
--- /dev/null
+++ b/docs/operate/power_up.md
@@ -0,0 +1,171 @@
+# Powering Up a POD
+
+This guide describes how to power up a previously installed CORD POD that
+has been powered down (cleanly or otherwise). The end goal of the power up
+procedure is a fully functioning CORD POD.
+
+## Boot the Head Node
+
+* **Physical POD:** Power on the head node
+* **CiaB:** Bring up the prod VM:
+```
+$ cd ~/cord/build; vagrant up prod
+```
+
+## Check the Head Node Services
+
+1. Verify that `mgmtbr` and `fabric` interfaces are up and have IP addresses
+2. Verify that MAAS UI is running and accessible:
+ * **Physical POD:** `http://<head-node>/MAAS`
+ * **CiaB:** `http://<ciab-server>:8080/MAAS`
+> **Troubleshooting: MAAS UI not available on CiaB.**
+> If you are running a CiaB and there is no webserver on port 8080, it might
+> be necessary to refresh port forwarding to the prod VM.
+> Run `ps ax|grep 8080`
+> and look for an SSH command (it will look something like this):
+> ```
+> 31353 pts/5 S 0:00 ssh -o User=vagrant -o Port=22 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o ForwardX11=no -o IdentityFile="/users/acb/cord/build/targets/cord-in-a-box/.vagrant/machines/prod/libvirt/private_key" -L *:8080:192.168.121.14:80 -N 192.168.121.14
+> ```
+> A workaround is to kill this process, and then copy and paste the command
+> above into another window on the CiaB server to set up a new SSH port-forwarding connection.
+
+3. Verify that the following Docker containers are running: `mavenrepo`, `switchq`, `automation`, `provisioner`, `generator`, `harvester`, `storage`, `allocator`, `registry`
+
+4. Use `sudo lxc list` to ensure that the Juju LXC containers are running. If any are stopped, use `sudo lxc start <name>` to restart them.
+
+5. Run: `source /opt/cord_profile/admin-openrc.sh`
+
+6. Verify that the following OpenStack commands work:
+ * `$ keystone user-list`
+ * `$ nova list --all-tenants`
+ * `$ neutron net-list`
+> **Troubleshooting: OpenStack commands give SSL error.**
+> Sometimes Keystone starts up in a strange state and OpenStack
+> commands will fail with various SSL errors.
+> To fix this, it is often sufficient to run:
+> `ssh ubuntu@keystone sudo service apache2 restart`
+
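+Several of these checks can be run from a shell on the head node. A minimal
+sketch (assuming the interface and container names listed above):
+
+```
+$ ip addr show mgmtbr | grep 'inet '   # management bridge is up with an IP
+$ ip addr show fabric | grep 'inet '   # fabric interface is up with an IP
+$ docker ps --format '{{.Names}}'      # head-node service containers
+$ sudo lxc list                        # Juju LXC containers
+```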
+
+## Power on Leaf and Spine Switches
+
+* **Physical POD:** power on the switches.
+* **CiaB:** bring up the switch VMs:
+```
+$ cd ~/cord/build; vagrant up leaf-1 leaf-2 spine-1
+```
+
+## Check the Switches
+
+On the head node (i.e., prod VM for CiaB):
+
+1. Get switch IPs by running: `cord prov list`
+2. Verify that ping works for all switch IPs
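+
+For example (the switch IP below is hypothetical; use the addresses
+reported by `cord prov list`):
+
+```
+$ cord prov list
+$ ping -c 3 10.6.0.1
+```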
+
+## Boot the Compute Nodes
+
+* **Physical POD:** Log into the MAAS UI and power on the compute node.
+* **CiaB:** Log into the MAAS UI and power on the compute node.
+
+## Check the Compute Nodes
+
+Once the compute nodes are up:
+
+1. Login to the head node
+2. Run: `source /opt/cord_profile/admin-openrc.sh`
+3. Verify that `nova service-list` shows the compute node as "up".
+> It may take a few minutes until the node's status is updated in Nova.
+4. Verify that you can log into the compute nodes from the head node as the ubuntu user
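+
+For example (the compute node hostname below is hypothetical; MAAS assigns
+node names automatically):
+
+```
+$ nova service-list
+$ ssh ubuntu@bony-alley.cord.lab uptime
+```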
+
+## Check XOS
+
+Verify that XOS UI is running and accessible:
+
+* **Physical POD:** `http://<head-node>/xos`
+* **CiaB:** `http://<ciab-server>:8080/xos`
+
+If it's not working, try restarting XOS (replace `rcord` with the name of your profile):
+
+```
+$ cd /opt/cord_profile; docker-compose -p rcord restart
+```
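+
+You can also check reachability from the command line; for example:
+
+```
+$ curl -sI http://<head-node>/xos/ | head -1
+```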
+
+## Check VTN
+
+Verify that VTN is initialized correctly:
+
+1. Run `onos> cordvtn-nodes`
+2. Make sure the compute nodes have `COMPLETE` status.
+3. Prior to rebooting existing OpenStack VMs:
+ * Run `onos> cordvtn-ports`
+ * Make sure some ports show up
+ * If not, try this:
+ - `onos> cordvtn-sync-neutron-states <keystone-url> admin admin <password>`
+ - `onos> cordvtn-sync-xos-states <xos-url> xosadmin@opencord.org <password>`
+
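+The `onos>` commands above are run from the ONOS CLI. A minimal sketch of
+reaching it (assuming the default ONOS CLI port; the host and credentials
+depend on your deployment):
+
+```
+$ ssh -p 8101 onos@<onos-host>
+onos> cordvtn-nodes
+```
+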
+## Boot OpenStack VMs
+
+To bring up OpenStack VMs that were running before the POD was shut down:
+
+1. Run `source /opt/cord_profile/admin-openrc.sh`
+2. Get list of VM IDs: `nova list --all-tenants`
+3. For each VM:
+ * `$ nova start <vm-id>`
+ * `$ nova console-log <vm-id>`
+ * Inspect the console log to make sure that the network interfaces get IP addresses.
+
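+For example, to confirm from the console log that an interface obtained an
+address (the grep pattern is only a starting point; output varies by image):
+
+```
+$ nova console-log <vm-id> | grep -i dhcp
+```
+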
+To restart a vSG inside the vSG VM:
+
+1. SSH to the vSG VM
+2. Run: `sudo rm /root/network_is_setup`
+3. Save the vSG Tenant in the XOS UI
+4. Once the synchronizer has re-run, make sure you can ping 8.8.8.8 from inside the vSG container, for example:
+```
+$ sudo docker exec -ti vcpe-222-111 ping 8.8.8.8
+```