..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

===============
TOST Deployment
===============

Update aether-pod-config
========================

**aether-pod-configs** is a git project hosted on **gerrit.opencord.org**. It contains the following materials:

- Terraform scripts to install TOST applications on Rancher, including ONOS, Stratum and Telegraf.
- Customized configuration for each application (Helm values).
- Application-specific configuration files, including the ONOS network configuration and the Stratum chassis config.

Here is an example folder structure:

.. code-block:: console

   ╰─$ tree staging/ace-menlo/tost
   staging/ace-menlo/tost
   ├── app_map.tfvars
   ├── backend.tf
   ├── common
   │   ├── main.tf
   │   └── variables.tf
   ├── hostpath.yaml
   ├── main.tf
   ├── onos
   │   ├── app_map.tfvars
   │   ├── backend.tf
   │   ├── main.tf -> ../common/main.tf
   │   ├── onos-netcfg.json
   │   ├── onos-netcfg.json.license
   │   ├── onos.yaml
   │   └── variables.tf -> ../common/variables.tf
   ├── stratum
   │   ├── app_map.tfvars
   │   ├── backend.tf
   │   ├── main.tf -> ../common/main.tf
   │   ├── menlo-staging-leaf-1-chassis-config.pb.txt
   │   ├── menlo-staging-leaf-2-chassis-config.pb.txt
   │   ├── menlo-staging-spine-1-chassis-config.pb.txt
   │   ├── menlo-staging-spine-2-chassis-config.pb.txt
   │   ├── stratum.yaml
   │   ├── tost-dev-chassis-config.pb.txt
   │   └── variables.tf -> ../common/variables.tf
   ├── telegraf
   │   ├── app_map.tfvars
   │   ├── backend.tf
   │   ├── main.tf -> ../common/main.tf
   │   ├── telegraf.yaml
   │   └── variables.tf -> ../common/variables.tf
   └── variables.tf

The **tost** directory contains four sets of Terraform scripts (the root folder, **onos**, **stratum** and **telegraf**), each responsible for managing one service.

Root folder
^^^^^^^^^^^
Terraform reads **app_map.tfvars** to determine which applications to install on Rancher,
and which chart version and custom values to apply to each of them.

Here is an example of **app_map.tfvars**, which defines the prerequisite apps for TOST
as well as the project and namespace in which the TOST apps will be provisioned.
There are currently no prerequisites, so **app_map** is intentionally left empty.
It can be used to specify prerequisites in the future.

.. code-block::

   project_name = "tost"
   namespace_name = "tost"

   app_map = {}

ONOS folder
^^^^^^^^^^^
All files under the **onos** directory are related to the ONOS application.
The **app_map.tfvars** file in this folder describes the ONOS Helm chart.

In this example, we set the **onos-tost** Helm chart version to **0.1.18** and load **onos.yaml**
as the custom values file.

.. code-block::

   apps = ["onos"]

   app_map = {
     onos = {
       app_name = "onos-tost"
       project_name = "tost"
       target_namespace = "onos-tost"
       catalog_name = "onos"
       template_name = "onos-tost"
       template_version = "0.1.18"
       values_yaml = ["onos.yaml"]
     }
   }

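As a quick sanity check before submitting a change, the entry above can be mirrored as a plain Python dictionary and checked for the keys that every **app_map** example in this document uses.
This is only a sketch: the helper name ``missing_keys`` is hypothetical, and which keys Terraform actually requires is decided by **variables.tf**, which is not shown here.

.. code-block:: python

   # Keys that appear in every app_map entry shown in this document.
   # Assumption: the real list of required keys is defined in variables.tf.
   REQUIRED_KEYS = {
       "app_name",
       "project_name",
       "target_namespace",
       "catalog_name",
       "template_name",
       "template_version",
       "values_yaml",
   }


   def missing_keys(app_map):
       """Return {app: sorted list of missing keys} for incomplete entries."""
       problems = {}
       for app, entry in app_map.items():
           missing = sorted(REQUIRED_KEYS - set(entry))
           if missing:
               problems[app] = missing
       return problems


   # The ONOS entry from the example above, written as a Python dict.
   onos_entry = {
       "app_name": "onos-tost",
       "project_name": "tost",
       "target_namespace": "onos-tost",
       "catalog_name": "onos",
       "template_name": "onos-tost",
       "template_version": "0.1.18",
       "values_yaml": ["onos.yaml"],
   }

   # An empty result means no entry is missing a key.
   print(missing_keys({"onos": onos_entry}))
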
**onos.yaml** is used to customize the values of the **onos-tost** Helm chart. Pay particular attention to the last section, **config**.

.. code-block:: yaml

   onos-classic:
     image:
       tag: master
       pullPolicy: Always
     replicas: 1
     atomix:
       replicas: 1
     logging:
       config: |
         # Common pattern layout for appenders
         log4j2.stdout.pattern = %d{RFC3339} %-5level [%c{1}] %msg%n%throwable

         # Root logger
         log4j2.rootLogger.level = INFO

         # OSGi appender
         log4j2.rootLogger.appenderRef.PaxOsgi.ref = PaxOsgi
         log4j2.appender.osgi.type = PaxOsgi
         log4j2.appender.osgi.name = PaxOsgi
         log4j2.appender.osgi.filter = *

         # stdout appender
         log4j2.rootLogger.appenderRef.Console.ref = Console
         log4j2.appender.console.type = Console
         log4j2.appender.console.name = Console
         log4j2.appender.console.layout.type = PatternLayout
         log4j2.appender.console.layout.pattern = ${log4j2.stdout.pattern}

         # SSHD logger
         log4j2.logger.sshd.name = org.apache.sshd
         log4j2.logger.sshd.level = INFO

         # Spifly logger
         log4j2.logger.spifly.name = org.apache.aries.spifly
         log4j2.logger.spifly.level = WARN

         # SegmentRouting logger
         log4j2.logger.segmentrouting.name = org.onosproject.segmentrouting
         log4j2.logger.segmentrouting.level = DEBUG

   config:
     server: gerrit.opencord.org
     repo: aether-pod-configs
     folder: staging/ace-menlo/tost/onos
     file: onos-netcfg.json
     netcfgUrl: http://onos-tost-onos-classic-hs.tost.svc:8181/onos/v1/network/configuration
     clusterUrl: http://onos-tost-onos-classic-hs.tost.svc:8181/onos/v1/cluster

Once the **onos-tost** containers are deployed into Kubernetes,
they read the **onos-netcfg.json** file from **aether-pod-configs**. Change the **folder** value to point to a different location if necessary.

**onos-netcfg.json** is environment-dependent, so change it to fit your environment.

..
   TODO: Add an example based on the recommended topology

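Because ONOS only reads **onos-netcfg.json** after it has been deployed, a malformed file is cheaper to catch before the change is submitted.
The following is a minimal sketch of such a check, assuming only that the file is a JSON object at the top level. The helper name ``check_netcfg`` and the sample content are hypothetical and are not part of the TOST tooling.

.. code-block:: python

   import json


   def check_netcfg(text):
       """Parse network config text and return its sorted top-level keys.

       Raises ValueError if the text is not valid JSON or is not a JSON object.
       """
       try:
           cfg = json.loads(text)
       except json.JSONDecodeError as err:
           raise ValueError(f"onos-netcfg.json is not valid JSON: {err}") from err
       if not isinstance(cfg, dict):
           raise ValueError("onos-netcfg.json must be a JSON object at the top level")
       return sorted(cfg)


   # Illustrative content only; a real file describes your own environment.
   sample = '{"devices": {}, "ports": {}}'
   print(check_netcfg(sample))

In practice the text would be read from **staging/ace-menlo/tost/onos/onos-netcfg.json** in a local clone of the repository.
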
Stratum folder
^^^^^^^^^^^^^^
Stratum uses a directory structure similar to that of ONOS for its Terraform scripts and configuration files.

The custom values file is named **stratum.yaml**.

.. code-block::

   app_map = {
     stratum = {
       app_name = "stratum"
       project_name = "tost"
       target_namespace = "stratum"
       catalog_name = "stratum"
       template_name = "stratum"
       template_version = "0.1.9"
       values_yaml = ["stratum.yaml"]
     }
   }

As with ONOS, **stratum.yaml** is used to customize the Stratum Helm chart. Pay attention to the **config** section.

.. code-block:: yaml

   image:
     registry: registry.aetherproject.org
     repository: tost/stratum-bfrt
     tag: 9.2.0-4.14.49
     pullPolicy: Always
     pullSecrets:
       - aether-registry-credential

   extraParams:
     - "-max_log_size=0"
     - '-write_req_log_file=""'
     - '-read_req_log_file=""'
     - "-v=0"
     - "-stderrthreshold=0"
     - "-bf_switchd_background=false"

   nodeSelector:
     node-role.aetherproject.org: switch

   tolerations:
     - effect: NoSchedule
       value: switch
       key: node-role.aetherproject.org

   config:
     server: gerrit.opencord.org
     repo: aether-pod-configs
     folder: staging/ace-onf-menlo/tost/stratum

Stratum has the same deployment workflow as ONOS.
Once it is deployed to Kubernetes, it reads switch-dependent config files from the **aether-pod-configs** repo.
The **folder** key gives the relative path of the config files within that repo.

.. attention::

   The switch-dependent config file should be named **${hostname}-chassis-config.pb.txt**.
   For example, if the host name of your Tofino switch is **my-leaf**, name the config file **my-leaf-chassis-config.pb.txt**.

..
   TODO: Add an example based on the recommended topology

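The naming rule above is easy to get wrong by one word, so it can help to compute the expected file names from the switch host names and compare them with what is in the **stratum** folder.
The sketch below assumes nothing beyond the **${hostname}-chassis-config.pb.txt** pattern. Both helper names are hypothetical, and the host name **my-spine** is made up for the example.

.. code-block:: python

   def chassis_config_name(hostname):
       """Return the chassis config file name expected for a switch host name."""
       return f"{hostname}-chassis-config.pb.txt"


   def missing_chassis_configs(hostnames, filenames):
       """Return the host names that have no correctly named chassis config."""
       present = set(filenames)
       return [h for h in hostnames if chassis_config_name(h) not in present]


   # Files as they might appear in the stratum folder; the second one
   # is missing the "chassis" part and therefore does not match.
   files = ["my-leaf-chassis-config.pb.txt", "my-spine-config.pb.txt"]
   print(missing_chassis_configs(["my-leaf", "my-spine"], files))
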
Telegraf folder
^^^^^^^^^^^^^^^

The **app_map.tfvars** file specifies the Helm chart version and the file name of the custom Helm values file.

.. code-block::

   apps = ["telegraf"]

   app_map = {
     telegraf = {
       app_name = "telegraf"
       project_name = "tost"
       target_namespace = "telegraf"
       catalog_name = "influxdata"
       template_name = "telegraf"
       template_version = "1.7.23"
       values_yaml = ["telegraf.yaml"]
     }
   }

The **telegraf.yaml** file is used to override the Telegraf Helm chart values and is environment-dependent.
Pay attention to the **inputs.addresses** section.
Telegraf reads data from Stratum, so the IP addresses of all Tofino switches must be listed here.
Taking the Menlo staging pod as an example, there are four switches, so we fill in four IP addresses.

.. code-block:: yaml

   podAnnotations:
     field.cattle.io/workloadMetrics: '[{"path":"/metrics","port":9273,"schema":"HTTP"}]'

   config:
     outputs:
       - prometheus_client:
           metric_version: 2
           listen: ":9273"
     inputs:
       - cisco_telemetry_gnmi:
           addresses:
             - 10.92.1.81:9339
             - 10.92.1.82:9339
             - 10.92.1.83:9339
             - 10.92.1.84:9339
           redial: 10s
       - cisco_telemetry_gnmi.subscription:
           name: stratum_counters
           origin: openconfig-interfaces
           path: /interfaces/interface[name=*]/state/counters
           sample_interval: 5000ns
           subscription_mode: sample

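When a pod has many switches, the **addresses** list can be generated from the switch IP addresses instead of being typed by hand.
The following sketch assumes every switch exposes gNMI on port 9339, as in the example above. The helper name ``gnmi_addresses`` is hypothetical.

.. code-block:: python

   import ipaddress

   GNMI_PORT = 9339  # port used by every address in the example above


   def gnmi_addresses(switch_ips, port=GNMI_PORT):
       """Return "ip:port" strings for Telegraf's addresses list.

       ipaddress.ip_address raises ValueError for a malformed address.
       """
       return [f"{ipaddress.ip_address(ip)}:{port}" for ip in switch_ips]


   # The four switches of the Menlo staging pod from the example above.
   menlo_switches = ["10.92.1.81", "10.92.1.82", "10.92.1.83", "10.92.1.84"]
   for address in gnmi_addresses(menlo_switches):
       print(f"- {address}")
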
Quick recap
^^^^^^^^^^^

To recap, most of the files in the **tost** folder can be copied from existing examples.
However, a few files need extra attention:

- **onos-netcfg.json** in the **onos** folder.
- The chassis configs in the **stratum** folder.
  There should be one chassis config for each switch, named **${hostname}-chassis-config.pb.txt**.
- **telegraf.yaml** in the **telegraf** folder, which needs to be updated with all switch IP addresses.

Double-check these files and make sure they have been updated accordingly.


Create a review request
^^^^^^^^^^^^^^^^^^^^^^^
We also need to create a Gerrit review request, similar to what we did in **Aether Run-Time Deployment**.
Please refer to :doc:`Aether Run-Time Deployment <run_time_deployment>` to create a review request.


Create TOST deployment job in Jenkins
=====================================
There are three major components in the Jenkins system: the Jenkins pipeline, Jenkins jobs, and Jenkins Job Builder.

.. note::

   All Jenkins-related files are placed in a `temporary repository <https://github.com/hwchiu/stratum-example/tree/master/pipelines>`_ and will move to another repo once the Aether Jenkins is ready.


Jenkins pipeline
^^^^^^^^^^^^^^^^
The Jenkins pipeline runs the Terraform scripts to install the desired applications into the specified Kubernetes cluster.

Both ONOS and Stratum read their configuration files (network config, chassis config) from **aether-pod-configs**.
The default git branch is master.
For testing purposes, we also provide two parameters that specify a Gerrit review number and patchset.
We will explain more in the next section.

.. note::

   Currently, we don't perform incremental upgrades of the TOST applications.
   Instead, we perform a clean installation.
   In the pipeline script, Terraform destroys all existing resources and then creates them again.

Jenkins jobs
^^^^^^^^^^^^

A Jenkins job is the unit of work in the Jenkins system. A Jenkins job contains the following information:

- Jenkins pipeline
- Parameters for the Jenkins pipeline
- Build trigger
- Source code management

We created one Jenkins job for each TOST component, per Aether edge.
As of today, we have four Jenkins jobs (HostPath provisioner, ONOS, Stratum and Telegraf) for each edge.

Each Jenkins job takes more than ten parameters, which fall into two groups: cluster-level and application-level.
Here is an example of the supported parameters.

.. image:: images/jenkins-onos-params.png
   :width: 480px

Application level
"""""""""""""""""

- **config_review/config_patchset**: tell the pipeline script to read the ONOS config from the specified
  Gerrit review instead of the HEAD of the branch. This lets developers test a change before it is merged.
- **onos_user/onos_password**: used to log in to the ONOS controller.
  **onos_password** is a key that loads the real password from the Jenkins Credential system.
- **onos_ns**: the namespace in which the secret file for ONOS is installed (to be refactored in the future).
- **git_repo/git_server/git_user/git_password_env**: information about the git repository. **git_password_env** is a key for
  the Jenkins Credential system.

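Gerrit publishes every patchset of a review under a git ref of the form ``refs/changes/<last two digits of the review number>/<review number>/<patchset>``.
The sketch below shows how **config_review** and **config_patchset** map to such a ref. It is an illustration of the Gerrit convention only: the helper name ``gerrit_change_ref`` and the numbers used are hypothetical, and the actual pipeline script may fetch the review differently.

.. code-block:: python

   def gerrit_change_ref(config_review, config_patchset):
       """Return the Gerrit ref for a review number and patchset number."""
       review = int(config_review)
       patchset = int(config_patchset)
       if review <= 0 or patchset <= 0:
           raise ValueError("review and patchset numbers must be positive")
       # Gerrit shards refs by the last two digits of the review number.
       return f"refs/changes/{review % 100:02d}/{review}/{patchset}"


   # Hypothetical review 21345, patchset 3.
   print(gerrit_change_ref(21345, 3))

A ref built this way can be passed to ``git fetch`` to retrieve the unmerged change for testing.
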
Cluster level
"""""""""""""
- **gcp_credential**: Google Cloud Platform credential for remote storage, used by Terraform.
- **terraform_dir**: the root directory of the TOST directory.
- **rancher_cluster**: the target Rancher cluster name.
- **rancher_api_env**: Rancher credential to access Rancher, used by Terraform.
- **k8s_conifg**: Kubernetes config to access the remote Kubernetes cluster.

.. note::

   Typically, developers only need to focus on **config_review** and **config_patchset**. The rest of the parameters are managed by Ops.

Jenkins Job Builder (JJB)
^^^^^^^^^^^^^^^^^^^^^^^^^
We prefer to apply IaC (Infrastructure as Code) to everything.
We use JJB (Jenkins Job Builder) to create new Jenkins jobs, including the Jenkins pipeline.
We need to clone a set of Jenkins jobs when a new edge is deployed.

..
   TODO: Automate Jenkins job creation with JJB once the Aether Jenkins is set up

Trigger TOST deployment in Jenkins
==================================
Ideally, whenever a change is merged into **aether-pod-configs**,
the Jenkins job should be triggered automatically to (re)deploy TOST.
This is still being set up at the moment.
Therefore, we need to trigger the deployment manually by clicking the **Build** button
of each Jenkins job and providing the parameters accordingly.

..
   TODO: Update this once the gerrit trigger is implemented


Troubleshooting
===============

The deployment process involves the following steps:

1. Jenkins job
2. Jenkins pipeline
3. Clone the git repository
4. Execute the Terraform scripts
5. Rancher starts to install the applications
6. The applications are deployed into the Kubernetes cluster
7. ONOS/Stratum reads the configuration (network config, chassis config)
8. The pods become running

Taking ONOS as an example, here's what you can do to troubleshoot.

You can see the log messages of the first four steps in the Jenkins console.
If something goes wrong, the status of the Jenkins job will be shown in red.
If Jenkins doesn't report any error, the next step is to go to Rancher's portal
and ensure that the Answers are the same as *onos.yaml* in *aether-pod-configs*.