Reorganization pass on Aether Docs
Change-Id: I0653109d6fe8d340278580ff5c7758ca264b512e
diff --git a/edge_deployment/tost_deployment.rst b/edge_deployment/tost_deployment.rst
new file mode 100644
index 0000000..2ac0225
--- /dev/null
+++ b/edge_deployment/tost_deployment.rst
@@ -0,0 +1,668 @@
+..
+ SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
+ SPDX-License-Identifier: Apache-2.0
+
+===============
+TOST Deployment
+===============
+
+Update aether-pod-configs
+=========================
+
+**aether-pod-configs** is a Git repository hosted on **gerrit.opencord.org**. It contains the following materials:
+
+- Terraform scripts to install TOST applications on Rancher, including ONOS, Stratum and Telegraf.
+- Customized configuration for each application (helm values).
+- Application specific configuration files, including ONOS network configuration and Stratum chassis config.
+
+Here is an example folder structure:
+
+.. code-block:: console
+
+ ╰─$ tree staging/ace-menlo/tost
+ staging/ace-menlo/tost
+ ├── app_map.tfvars
+ ├── backend.tf
+ ├── deepinsight
+ │ ├── README.md
+ │ ├── deepinsight-topo.json
+ │ └── deepinsight-topo.json.license
+ ├── main.tf -> ../../../common/tost/main.tf
+ ├── onos
+ │ ├── app_map.tfvars
+ │ ├── backend.tf
+ │ ├── main.tf -> ../../../../common/tost/apps/onos/main.tf
+ │ ├── onos-netcfg.json
+ │ ├── onos-netcfg.json.license
+ │ ├── onos.yaml
+ │ └── variables.tf -> ../../../../common/tost/apps/onos/variables.tf
+ ├── stratum
+ │ ├── app_map.tfvars
+ │ ├── backend.tf
+ │ ├── main.tf -> ../../../../common/tost/apps/stratum/main.tf
+ │ ├── menlo-staging-leaf-1-chassis-config.pb.txt
+ │ ├── menlo-staging-leaf-2-chassis-config.pb.txt
+ │ ├── menlo-staging-spine-1-chassis-config.pb.txt
+ │ ├── menlo-staging-spine-2-chassis-config.pb.txt
+ │ ├── stratum.yaml
+ │ ├── tost-dev-chassis-config.pb.txt
+ │ └── variables.tf -> ../../../../common/tost/apps/stratum/variables.tf
+ ├── telegraf
+ │ ├── app_map.tfvars
+ │ ├── backend.tf
+ │ ├── main.tf -> ../../../../common/tost/apps/telegraf/main.tf
+ │ ├── telegraf.yaml
+ │ └── variables.tf -> ../../../../common/tost/apps/telegraf/variables.tf
+ └── variables.tf -> ../../../common/tost/variables.tf
+
+There are four sets of Terraform scripts inside the **tost** directory, each responsible for managing one service.
+
+Root folder
+^^^^^^^^^^^
+Terraform reads **app_map.tfvars** to determine which applications to install on Rancher,
+which chart versions to use, and which customized values to apply.
+
+Here is an example of **app_map.tfvars**, which defines prerequisite apps for TOST
+as well as the project and namespace in which the TOST apps will be provisioned.
+Note that there are currently no prerequisites, so **app_map** is intentionally left
+empty; it can be used to specify prerequisites in the future.
+
+.. code-block::
+
+ project_name = "tost"
+ namespace_name = "tost"
+
+ app_map = {}
+
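+Jenkins normally runs these Terraform scripts for you (see the deployment job
+section below), but it helps to know what the pipeline effectively executes.
+The following is a minimal, illustrative sketch only; the real pipeline also
+configures the Terraform backend and credentials:
+
+.. code-block:: console
+
+    $ cd staging/ace-menlo/tost
+    $ terraform init
+    $ terraform apply -var-file=app_map.tfvars
+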
+ONOS folder
+^^^^^^^^^^^
+All files under the **onos** directory are related to the ONOS application.
+The **app_map.tfvars** in this folder describes the ONOS Helm chart.
+
+In this example, we pin the **onos-tost** Helm chart to version **0.1.18** and load **onos.yaml**
+as the custom values file.
+
+.. code-block::
+
+ apps = ["onos"]
+
+ app_map = {
+ onos = {
+ app_name = "onos-tost"
+ project_name = "tost"
+ target_namespace = "onos-tost"
+ catalog_name = "onos"
+ template_name = "onos-tost"
+ template_version = "0.1.18"
+ values_yaml = ["onos.yaml"]
+ }
+ }
+
+**onos.yaml** is used to customize the **onos-tost** Helm chart values. Please pay attention to the last section, **config**.
+
+.. code-block:: yaml
+
+ onos-classic:
+ image:
+ tag: master
+ pullPolicy: Always
+ replicas: 1
+ atomix:
+ replicas: 1
+ logging:
+ config: |
+ # Common pattern layout for appenders
+ log4j2.stdout.pattern = %d{RFC3339} %-5level [%c{1}] %msg%n%throwable
+
+ # Root logger
+ log4j2.rootLogger.level = INFO
+
+ # OSGi appender
+ log4j2.rootLogger.appenderRef.PaxOsgi.ref = PaxOsgi
+ log4j2.appender.osgi.type = PaxOsgi
+ log4j2.appender.osgi.name = PaxOsgi
+ log4j2.appender.osgi.filter = *
+
+ # stdout appender
+ log4j2.rootLogger.appenderRef.Console.ref = Console
+ log4j2.appender.console.type = Console
+ log4j2.appender.console.name = Console
+ log4j2.appender.console.layout.type = PatternLayout
+ log4j2.appender.console.layout.pattern = ${log4j2.stdout.pattern}
+
+ # SSHD logger
+ log4j2.logger.sshd.name = org.apache.sshd
+ log4j2.logger.sshd.level = INFO
+
+ # Spifly logger
+ log4j2.logger.spifly.name = org.apache.aries.spifly
+ log4j2.logger.spifly.level = WARN
+
+ # SegmentRouting logger
+ log4j2.logger.segmentrouting.name = org.onosproject.segmentrouting
+ log4j2.logger.segmentrouting.level = DEBUG
+
+ config:
+ server: gerrit.opencord.org
+ repo: aether-pod-configs
+ folder: staging/ace-menlo/tost/onos
+ file: onos-netcfg.json
+ netcfgUrl: http://onos-tost-onos-classic-hs.tost.svc:8181/onos/v1/network/configuration
+ clusterUrl: http://onos-tost-onos-classic-hs.tost.svc:8181/onos/v1/cluster
+
+Once the **onos-tost** containers are deployed into Kubernetes, ONOS reads the
+**onos-netcfg.json** file from the **aether-pod-configs** repository; change the
+**folder** value if your configuration lives in a different location.
+
+**onos-netcfg.json** is environment dependent, so please modify it to fit your environment.
+
+..
+ TODO: Add an example based on the recommended topology
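+Until then, the hypothetical skeleton below only illustrates the general shape
+of an ONOS network config; all device IDs, addresses, MACs, and the pipeconf
+name are placeholders, not a recommended topology:
+
+.. code-block:: json
+
+    {
+      "devices": {
+        "device:leaf-1": {
+          "basic": {
+            "managementAddress": "grpc://10.92.1.81:9339?device_id=1",
+            "driver": "stratum-tofino",
+            "pipeconf": "org.opencord.fabric-tofino.montara_sde_9_2_0"
+          },
+          "segmentrouting": {
+            "ipv4NodeSid": 101,
+            "ipv4Loopback": "10.92.1.81",
+            "routerMac": "00:00:00:00:01:01",
+            "isEdgeRouter": true,
+            "adjacencySids": []
+          }
+        }
+      }
+    }
+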
+
+Stratum folder
+^^^^^^^^^^^^^^
+Stratum uses a similar directory structure as ONOS for Terraform and its configuration files.
+
+The custom values file is named **stratum.yaml**, and **app_map.tfvars** describes the Stratum Helm chart:
+
+.. code-block::
+
+ app_map = {
+ stratum= {
+ app_name = "stratum"
+ project_name = "tost"
+ target_namespace = "stratum"
+ catalog_name = "stratum"
+ template_name = "stratum"
+ template_version = "0.1.9"
+ values_yaml = ["stratum.yaml"]
+ }
+ }
+
+As with ONOS, **stratum.yaml** is used to customize the Stratum Helm chart; please pay attention to the **config** section.
+
+.. code-block:: yaml
+
+ image:
+ registry: registry.aetherproject.org
+ repository: tost/stratum-bfrt
+ tag: 9.2.0-4.14.49
+ pullPolicy: Always
+ pullSecrets:
+ - aether-registry-credential
+
+ extraParams:
+ - "-max_log_size=0"
+ - '-write_req_log_file=""'
+ - '-read_req_log_file=""'
+ - "-v=0"
+ - "-stderrthreshold=0"
+ - "-bf_switchd_background=false"
+
+ nodeSelector:
+ node-role.aetherproject.org: switch
+
+ tolerations:
+ - effect: NoSchedule
+ value: switch
+ key: node-role.aetherproject.org
+
+ config:
+ server: gerrit.opencord.org
+ repo: aether-pod-configs
+ folder: staging/ace-onf-menlo/tost/stratum
+
+Stratum has the same deployment workflow as ONOS.
+Once it is deployed to Kubernetes, it reads switch-dependent config files from the **aether-pod-configs** repo.
+The **folder** key indicates the relative path of the configs.
+
+.. attention::
+
+ The switch-dependent config file must be named **${hostname}-chassis-config.pb.txt**.
+ For example, if the hostname of your Tofino switch is **my-leaf**, name the config file **my-leaf-chassis-config.pb.txt**.
+
+..
+ TODO: Add an example based on the recommended topology
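+Until then, the sketch below only illustrates the general shape of a Stratum
+chassis config; the platform, node, and port values are placeholders that must
+match your actual switch:
+
+.. code-block:: text
+
+    chassis {
+      platform: PLT_GENERIC_BAREFOOT_TOFINO
+      name: "my-leaf"
+    }
+    nodes {
+      id: 1
+      slot: 1
+      index: 1
+    }
+    singleton_ports {
+      id: 1
+      name: "1/0"
+      slot: 1
+      port: 1
+      channel: 1
+      speed_bps: 100000000000
+      config_params {
+        admin_state: ADMIN_STATE_ENABLED
+      }
+    }
+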
+
+Telegraf folder
+^^^^^^^^^^^^^^^
+
+The **app_map.tfvars** file specifies the Helm chart version and the filename of the custom Helm values file.
+
+.. code-block::
+
+ apps=["telegraf"]
+
+ app_map = {
+ telegraf= {
+ app_name = "telegraf"
+ project_name = "tost"
+ target_namespace = "telegraf"
+ catalog_name = "influxdata"
+ template_name = "telegraf"
+ template_version = "1.7.23"
+ values_yaml = ["telegraf.yaml"]
+ }
+ }
+
+**telegraf.yaml** is used to override the Telegraf Helm chart values and is environment dependent.
+Please pay attention to the **inputs.addresses** section:
+Telegraf reads data from Stratum, so all Tofino switch IP addresses must be listed here.
+Taking the Menlo staging pod as an example, there are four switches, so we fill in four IP addresses.
+
+.. code-block:: yaml
+
+ podAnnotations:
+ field.cattle.io/workloadMetrics: '[{"path":"/metrics","port":9273,"schema":"HTTP"}]'
+
+ config:
+ outputs:
+ - prometheus_client:
+ metric_version: 2
+ listen: ":9273"
+ inputs:
+ - cisco_telemetry_gnmi:
+ addresses:
+ - 10.92.1.81:9339
+ - 10.92.1.82:9339
+ - 10.92.1.83:9339
+ - 10.92.1.84:9339
+ redial: 10s
+ - cisco_telemetry_gnmi.subscription:
+ name: stratum_counters
+ origin: openconfig-interfaces
+ path: /interfaces/interface[name=*]/state/counters
+ sample_interval: 5000ns
+ subscription_mode: sample
+
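+Once Telegraf is running, the **prometheus_client** output exposes the collected
+metrics on port 9273. As a quick sanity check (the workload name below is
+hypothetical; use the actual Telegraf workload in your cluster):
+
+.. code-block:: shell
+
+    $ kubectl -n telegraf port-forward deploy/telegraf 9273:9273 &
+    $ curl -s http://localhost:9273/metrics | head
+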
+
+Create Your Own Configs
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The easiest way to create your own configs is to run the template script.
+
+Assume we would like to set up the **ace-example** pod in the production environment.
+
+1. Open **tools/ace_env**.
+2. Fill out all required variables.
+3. Import the environment variables from **tools/ace_env**.
+4. Run the Makefile target to generate the configuration and directory structure for TOST.
+5. Update **onos-netcfg.json** for ONOS.
+6. Update **${hostname}-chassis-config.pb.txt** for Stratum.
+7. Update all switch IPs in **telegraf.yaml**.
+8. Commit your changes and submit a Gerrit patch.
+
+.. code-block:: console
+
+ vim tools/ace_env
+ source tools/ace_env
+ make -C tools/ tost
+ vim production/ace-example/tost/onos/onos-netcfg.json
+ vim production/ace-example/tost/stratum/${hostname}-chassis-config.pb.txt
+ vim production/ace-example/tost/telegraf/telegraf.yaml
+ git add --all && git commit
+ git review
+
+
+Quick recap
+^^^^^^^^^^^
+
+To recap, most of the files in the **tost** folder can be copied from existing examples.
+However, there are a few files we need to pay extra attention to:
+
+- **onos-netcfg.json** in the **onos** folder
+- Chassis configs in the **stratum** folder.
+  There should be one chassis config for each switch, named
+  **${hostname}-chassis-config.pb.txt**
+- **telegraf.yaml** in the **telegraf** folder, which needs to be updated with all
+  switch IP addresses
+
+Double check these files and make sure they have been updated accordingly.
+
+
+Create a review request
+^^^^^^^^^^^^^^^^^^^^^^^
+We also need to create a Gerrit review request, similar to what we did in
+the **Aether Runtime Deployment**.
+
+Please refer to :doc:`Aether Runtime Deployment <runtime_deployment>` to
+create a review request.
+
+
+Create TOST deployment job in Jenkins
+=====================================
+There are three major components in the Jenkins system: the Jenkins pipeline,
+the Jenkins Job Builder (JJB), and the Jenkins job.
+
+Following the Infrastructure as Code principle, all three components are kept
+in a Git repository, ``aether-ci-management``.
+
+Download the ``aether-ci-management`` repository.
+
+.. code-block:: shell
+
+ $ cd $WORKDIR
+ $ git clone "ssh://[username]@gerrit.opencord.org:29418/aether-ci-management"
+
+
+Here is an example of the folder structure; everything related to the three
+major components lives under the **jjb** folder.
+
+.. code-block:: console
+
+ $ tree -d jjb
+ jjb
+ ├── ci-management
+ ├── global
+ │ ├── jenkins-admin -> ../../global-jjb/jenkins-admin
+ │ ├── jenkins-init-scripts -> ../../global-jjb/jenkins-init-scripts
+ │ ├── jjb -> ../../global-jjb/jjb
+ │ └── shell -> ../../global-jjb/shell
+ ├── pipeline
+ ├── repos
+ ├── shell
+ └── templates
+
+
+Jenkins pipeline
+^^^^^^^^^^^^^^^^
+Jenkins pipeline runs the Terraform scripts to install desired applications
+into the specified Kubernetes cluster.
+
+Both ONOS and Stratum read their configuration files (network config, chassis
+config) from **aether-pod-configs**.
+
+The default Git branch is master. For testing purposes, we also provide two
+parameters to specify the Gerrit review number and patchset number.
+
+We will explain more in the next section.
+
+.. note::
+
+ Currently, we don't perform incremental upgrades for the TOST applications.
+ Instead, we perform a clean installation: in the pipeline script, Terraform
+ destroys all existing resources and then creates them again.
+
+
+All pipeline scripts live under the **pipeline** directory and are written in Groovy.
+
+.. code-block:: console
+
+ $ tree pipeline
+ pipeline
+ ├── aether-in-a-box.groovy
+ ├── artifact-release.groovy
+ ├── cd-pipeline-charts-postrelease.groovy
+ ├── cd-pipeline-dockerhub-postrelease.groovy
+ ├── cd-pipeline-postrelease.groovy
+ ├── cd-pipeline-terraform.groovy
+ ├── docker-publish.groovy
+ ├── ng40-func.groovy
+ ├── ng40-scale.groovy
+ ├── reuse-scan-gerrit.groovy
+ ├── reuse-scan-github.groovy
+ ├── tost-onos.groovy
+ ├── tost-stratum.groovy
+ ├── tost-telegraf.groovy
+ └── tost.groovy
+
+Currently, there are four pipeline scripts for TOST deployment:
+
+1. tost-onos.groovy
+2. tost-stratum.groovy
+3. tost-telegraf.groovy
+4. tost.groovy
+
+**tost-[onos/stratum/telegraf].groovy** each deploy the corresponding individual
+application, while **tost.groovy** is a high-level script that deploys the whole
+TOST application by invoking the other three scripts from its own pipeline.
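+As an illustration only (the real script lives in **pipeline/tost.groovy** and
+may differ), a high-level pipeline can chain the per-application jobs with the
+Jenkins **build** step; the job names and the **edge_name** parameter below are
+hypothetical:
+
+.. code-block:: groovy
+
+    pipeline {
+      agent any
+      stages {
+        stage('Deploy ONOS')     { steps { build job: "${params.edge_name}-onos" } }
+        stage('Deploy Stratum')  { steps { build job: "${params.edge_name}-stratum" } }
+        stage('Deploy Telegraf') { steps { build job: "${params.edge_name}-telegraf" } }
+      }
+    }
+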
+
+
+Jenkins jobs
+^^^^^^^^^^^^
+
+A Jenkins job is the unit of work in the Jenkins system. A Jenkins job contains the following information:
+
+- Jenkins pipeline
+- Parameters for Jenkins pipeline
+- Build trigger
+- Source code management
+
+We created one Jenkins job for each TOST component, per Aether edge.
+
+As of today, there are four Jenkins jobs (HostPath provisioner, ONOS, Stratum
+and Telegraf) for each edge.
+
+There are more than ten parameters in each Jenkins job, and they can be divided
+into two groups: cluster level and application level.
+
+Here is an example of supported parameters.
+
+.. image:: images/jenkins-onos-params.png
+ :width: 480px
+
+Application level
+"""""""""""""""""
+
+- **GERRIT_CHANGE_NUMBER/GERRIT_PATCHSET_NUMBER**: tell the pipeline script to read
+  the **aether-pod-configs** repo from a specific Gerrit review instead of the HEAD
+  branch. This is useful for developers to test their changes before merging.
+- **onos_user**: used to log in to the ONOS controller.
+- **git_repo/git_server/git_user/git_password_env**: information about the Git
+  repository; **git_password_env** is a key in the Jenkins credential system.
+
+Cluster level
+"""""""""""""
+- **gcp_credential**: Google Cloud Platform credential for remote storage, used
+ by Terraform.
+- **terraform_dir**: The root directory of the TOST directory.
+- **rancher_cluster**: target Rancher cluster name.
+- **rancher_api_env**: Rancher credential to access Rancher, used by Terraform.
+
+.. note::
+
+ Typically, developers only need to care about **GERRIT_CHANGE_NUMBER** and **GERRIT_PATCHSET_NUMBER**; the rest are managed by the Ops team.
+
+Jenkins Job Builder (JJB)
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+We prefer to apply the IaC (Infrastructure as Code) principle to everything. We
+use JJB (Jenkins Job Builder) to create new Jenkins jobs, including the Jenkins
+pipeline, and we clone a set of Jenkins jobs whenever a new edge is deployed.
+
+To provide flexibility and avoid reinventing the wheel, jobs are declared with
+job templates. Thanks to JJB, the parameters in a job template make it easy to
+render different kinds of jobs.
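+Before pushing a change, you can check that the templates render correctly with
+the standard **jenkins-jobs** CLI, for example:
+
+.. code-block:: shell
+
+    $ pip install jenkins-job-builder
+    $ jenkins-jobs test jjb/ -o /tmp/jjb-output
+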
+
+All the template files are placed under the **templates** directory.
+
+.. code-block:: console
+
+ ╰─$ tree templates
+ templates
+ ├── aether-in-a-box.yaml
+ ├── archive-artifacts.yaml
+ ├── artifact-release.yml
+ ├── cd-pipeline-terraform.yaml
+ ├── docker-publish-github.yaml
+ ├── docker-publish.yaml
+ ├── helm-lint.yaml
+ ├── make-test.yaml
+ ├── ng40-nightly.yaml
+ ├── ng40-test.yaml
+ ├── private-docker-publish.yaml
+ ├── private-make-test.yaml
+ ├── publish-helm-repo.yaml
+ ├── reuse-gerrit.yaml
+ ├── reuse-github.yaml
+ ├── sync-dir.yaml
+ ├── tost.yaml
+ ├── verify-licensed.yaml
+ └── versioning.yaml
+
+All the job templates required by TOST are defined in **tost.yaml**; here is
+part of its content:
+
+.. code-block:: yaml
+
+ - job-template:
+ name: "{name}-onos"
+ id: "deploy-onos"
+ project-type: pipeline
+ dsl: !include-raw-escape: jjb/pipeline/tost-onos.groovy
+ triggers:
+ - onf-infra-tost-gerrit-trigger:
+ gerrit-server-name: '{gerrit-server-name}'
+ trigger_command: "apply"
+ pattern: "{terraform_dir}/tost/onos/.*"
+ logrotate:
+ daysToKeep: 7
+ numToKeep: 10
+ artifactDaysToKeep: 7
+ artifactNumToKeep: 10
+ parameters:
+ - string:
+ name: gcp_credential
+ default: "{google_bucket_access}"
+ - string:
+ name: rancher_cluster
+ default: "{rancher_cluster}"
+ - string:
+ name: rancher_api_env
+ default: "{rancher_api}"
+ - string:
+ name: git_repo
+ default: "aether-pod-configs"
+ - string:
+ name: git_server
+ default: "gerrit.opencord.org"
+ - string:
+ name: git_ssh_user
+ default: "jenkins"
+
+
+Once we have a job template, we need to tell JJB that we want to use it to
+create our own jobs. This is where the concept of a *project* comes in: a
+project defines which job templates to use and the values of all its parameters.
+
+
+All project YAML files are placed under the **repos** directory. Here is an example:
+
+.. code-block:: console
+
+ ╰─$ tree repos
+ repos
+ ├── aether-helm-charts.yaml
+ ├── aether-in-a-box.yaml
+ ├── cd-pipeline-terraform.yaml
+ ├── ng40-test.yaml
+ ├── spgw.yaml
+ └── tost.yaml
+
+
+The following is an example of the TOST projects. Three projects are defined here,
+each with its own parameters and the Jenkins jobs it uses.
+
+.. code-block:: yaml
+
+ - project:
+ name: deploy-menlo-tost-dev
+ rancher_cluster: "menlo-tost-dev"
+ terraform_dir: "testing/menlo-tost"
+ rancher_api: "{rancher_testing_access}"
+ jobs:
+ - "deploy"
+ - "deploy-onos"
+ - "deploy-stratum"
+ - "deploy-telegraf"
+ - project:
+ name: deploy-menlo-tost-staging
+ rancher_cluster: "ace-menlo"
+ terraform_dir: "staging/ace-menlo"
+ rancher_api: "{rancher_staging_access}"
+ jobs:
+ - "deploy"
+ - "deploy-onos"
+ - "deploy-stratum"
+ - "deploy-telegraf"
+ - project:
+ name: deploy-menlo-production
+ rancher_cluster: "ace-menlo"
+ terraform_dir: "production/ace-menlo"
+ rancher_api: "{rancher_production_access}"
+ jobs:
+ - "deploy"
+ - "deploy-onos"
+ - "deploy-stratum"
+ - "deploy-telegraf"
+
+
+Create Your Own Jenkins Job
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Basically, if you don't need to customize the Jenkins pipeline script or the job
+configuration, the only thing you need to do is modify **repos/tost.yaml** to add
+your project.
+
+For example, suppose we would like to deploy TOST to a production pod named **tost-example**.
+Add the following content to **repos/tost.yaml**:
+
+.. code-block:: yaml
+
+ - project:
+ name: deploy-tost-example-production
+ rancher_cluster: "ace-test-example"
+ terraform_dir: "production/tost-example"
+ rancher_api: "{rancher_production_access}"
+ jobs:
+ - "deploy"
+ - "deploy-onos"
+ - "deploy-stratum"
+ - "deploy-telegraf"
+
+
+.. note::
+
+ The **terraform_dir** indicates the directory location in the **aether-pod-configs** repo.
+ Please ensure your Terraform scripts are already there before running the Jenkins job.
+
+
+Trigger TOST deployment in Jenkins
+==================================
+Whenever a change is merged into **aether-pod-configs**,
+the Jenkins job should be triggered automatically to (re)deploy TOST.
+
+You can also type the comment **apply** on the Gerrit patch; it will trigger the Jenkins jobs to deploy TOST for you.
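+The **apply** comment can be posted from the Gerrit web UI, or via Gerrit's SSH
+interface; the change number **12345** and patchset **1** below are placeholders:
+
+.. code-block:: shell
+
+    $ ssh -p 29418 [username]@gerrit.opencord.org \
+        gerrit review --message '"apply"' 12345,1
+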
+
+
+Verification
+============
+Fabric connectivity should be fully ready at this point.
+We should verify that **all servers**, including compute nodes and the management server,
+have an IP address and are **able to reach each other via the fabric interface** before continuing to the next step.
+
+This can simply be done by running a **ping** command from one server to another server's fabric IP.
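+For example, from the management server (the fabric IP below is a placeholder):
+
+.. code-block:: shell
+
+    $ ping -c 3 10.92.1.81
+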
+
+
+Troubleshooting
+===============
+
+The deployment process involves the following steps:
+
+1. Jenkins job
+2. Jenkins pipeline
+3. Clone the Git repository
+4. Execute the Terraform scripts
+5. Rancher starts to install the applications
+6. The applications are deployed into the Kubernetes cluster
+7. ONOS/Stratum read their configuration (network config, chassis config)
+8. The pods become Running
+
+Taking ONOS as an example, here's what you can do to troubleshoot.
+
+The log messages of the first four steps are visible in the Jenkins console.
+If something goes wrong, the status of the Jenkins job will be red.
+If Jenkins doesn't report any error, the next step is to go to Rancher's portal
+and make sure the Answers match the *onos.yaml* in *aether-pod-configs*.
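+If the Answers look correct, you can dig further with **kubectl**; for example
+(assuming ONOS runs in the **tost** namespace, as the **netcfgUrl** above
+suggests; replace the pod name placeholder):
+
+.. code-block:: shell
+
+    $ kubectl -n tost get pods
+    $ kubectl -n tost logs <onos-pod-name>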