.. vim: syntax=rst

.. _aiab_troubleshooting:

Troubleshooting Aether-in-a-Box
===============================

FAQs
----

RKE2 vs. Kubespray Install
^^^^^^^^^^^^^^^^^^^^^^^^^^

The AiaB installer will bring up Kubernetes on the server where it is run. By default it
uses `RKE2 <https://docs.rke2.io>`_ as the Kubernetes platform. However, older versions of AiaB
used `Kubespray <https://kubernetes.io/docs/setup/production-environment/tools/kubespray/>`_
and that is still an option. To switch to Kubespray as the Kubernetes platform, edit the
Makefile and replace *rke2* with *kubespray* on this line::

    node0:~/aether-in-a-box$ git diff Makefile
    diff --git a/Makefile b/Makefile
    index 5f2c186..608c221 100644
    --- a/Makefile
    +++ b/Makefile
    @@ -35,7 +35,7 @@ ENABLE_GNBSIM ?= true
     ENABLE_SUBSCRIBER_PROXY ?= false
     GNBSIM_COLORS ?= true

    -K8S_INSTALL ?= rke2
    +K8S_INSTALL ?= kubespray
     CTR_CMD := sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io

     PROXY_ENABLED ?= false
    node0:~/aether-in-a-box$


You may wish to use Kubespray instead of RKE2 if you want to use locally-built images with AiaB
(e.g., if you are developing SD-CORE services). The reason is that RKE2 uses containerd instead of
Docker and so cannot access images in the local Docker registry.

How to use Local Image
^^^^^^^^^^^^^^^^^^^^^^

Note that RKE2 (the default Kubernetes installer) is based on containerd rather than Docker.
Containerd has its own local image registry that is separate from the local Docker registry. With RKE2,
if you have used ``docker build`` to build a local image, it is only in the Docker registry and so is not
available to run in AiaB without some additional steps. An easy workaround
is to use ``docker push`` to push the image to a remote repository (e.g., Docker Hub) and then modify your
Helm values file to pull in that remote image. Another option is to save the local Docker image
into a file and import the file into the containerd registry like this::

    docker save -o /tmp/lte-uesoftmodem.tar omecproject/lte-uesoftmodem:1.1.0
    sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io \
        images import /tmp/lte-uesoftmodem.tar

The above commands save the local Docker image ``omecproject/lte-uesoftmodem:1.1.0`` in a tarball, and then import
the tarball into the containerd registry where it is available for use by RKE2. Replace
``omecproject/lte-uesoftmodem:1.1.0`` with the name of your image.

If you know that you are going to be using AiaB to test locally-built images, the easiest approach is probably to
use the Kubespray installer. If you have already installed using RKE2 and you want to switch to Kubespray, first
run ``make clean`` and then follow the steps in the `RKE2 vs. Kubespray Install`_ section above.
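
The save-and-import steps for RKE2 can be wrapped in a small helper. The following is a minimal sketch, not part of AiaB: the function names and the ``/tmp`` tarball location are illustrative assumptions, while the ``ctr`` path, socket address, and namespace are the ones used in the commands above.

```shell
# Sketch: copy a locally built Docker image into the containerd store used by RKE2.
# Function names and the /tmp location are illustrative, not part of AiaB.

# Derive a tarball path from an image reference by replacing "/" and ":"
# with "_" so the reference is usable as a file name.
image_tar_path() {
    printf '/tmp/%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Save the image from Docker and import it into containerd.
import_local_image() {
    image="$1"
    tarball="$(image_tar_path "$image")"
    docker save -o "$tarball" "$image" || return 1
    sudo /var/lib/rancher/rke2/bin/ctr \
        --address /run/k3s/containerd/containerd.sock \
        --namespace k8s.io \
        images import "$tarball"
}

# Usage: import_local_image omecproject/lte-uesoftmodem:1.1.0
```

The import has to be repeated each time the image is rebuilt, because containerd does not see later changes made on the Docker side.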

Restarting the AiaB Server
^^^^^^^^^^^^^^^^^^^^^^^^^^

AiaB should come up in a mostly working state if the AiaB server is rebooted. If any pods are
stuck in an Error or CrashLoopBackOff state, they can be restarted using ``kubectl delete pod``.
It might also be necessary to power cycle the Sercomm eNodeB in order to get it to reconnect to
the SD-CORE.
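
Finding and deleting the stuck pods can be scripted. The sketch below is one possible approach, assuming the pods live in the ``omec`` namespace used elsewhere on this page; the function names are hypothetical.

```shell
# Print the names of pods whose STATUS column is Error or CrashLoopBackOff.
# Reads the output of `kubectl get pods --no-headers` on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
failed_pods() {
    awk '$3 == "Error" || $3 == "CrashLoopBackOff" { print $1 }'
}

# Delete each failing pod in the given namespace so that it is restarted.
restart_failed_pods() {
    ns="$1"
    kubectl -n "$ns" get pods --no-headers | failed_pods |
    while read -r pod; do
        kubectl -n "$ns" delete pod "$pod"
    done
}

# Usage: restart_failed_pods omec
```

Run ``kubectl -n omec get pods`` again afterwards to confirm that the pods reach the Running state.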


Enabling externalIP at MME
^^^^^^^^^^^^^^^^^^^^^^^^^^

You can enable the externalIP service in the MME by adding the following configuration to the override file::

    node0:~/aether-in-a-box$ git diff sd-core-4g-values.yaml
    diff --git a/sd-core-4g-values.yaml b/sd-core-4g-values.yaml
    index 0939739..f240f89 100644
    --- a/sd-core-4g-values.yaml
    +++ b/sd-core-4g-values.yaml
    @@ -24,6 +24,11 @@ omec-control-plane:
             bootstrap:
               users: []
               staticusers: []
    +    mme:
    +      s1ap:
    +        serviceType: ClusterIP
    +        externalIP: 10.1.1.1
    +
         spgwc:
           pfcp: true
           cfgFiles:
    node0:~/aether-in-a-box$

Enabling externalIP at AMF
^^^^^^^^^^^^^^^^^^^^^^^^^^

You can enable the externalIP service in the AMF by adding the following configuration to the override file::

102 node0:~/aether-in-a-box$ git diff sd-core-5g-values.yaml
103 diff --git a/sd-core-5g-values.yaml b/sd-core-5g-values.yaml
104 index e513e1f..fc1c684 100644
105 --- a/sd-core-5g-values.yaml
106 +++ b/sd-core-5g-values.yaml
107 @@ -34,6 +34,9 @@
108
109 amf:
110 cfgFiles:
111 + ngapp:
112 + serviceType: ClusterIP
113 + externalIp: "10.1.1.2"
114 + port: 38412
115 amfcfg.conf:
116 configuration:
117 enableDBStore: false
118 @@ -176,6 +179,7 @@ omec-user-plane:
119 cpiface:
120 dnn: "internet"
121 hostname: "upf"
122
123 5g-ran-sim:
124 enable: ${ENABLE_GNBSIM}
125
126 node0:~/aether-in-a-box$

Troubleshooting
---------------

**NOTE: Running both 4G and 5G SD-CORE simultaneously in AiaB is currently not supported.**

Proxy Issues
^^^^^^^^^^^^

When working with AiaB behind a proxy, you may run into issues caused by the proxy's
security policies. For example, the proxy may block a domain (e.g., opencord.org),
in which case you may see messages like these when trying to clone or download aether-in-a-box::

    ubuntu18:~$ git clone https://gerrit.opencord.org/aether-in-a-box
    Cloning into 'aether-in-a-box'...
    fatal: unable to access 'https://gerrit.opencord.org/aether-in-a-box/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

or::

    ubuntu18:~$ wget https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
    --2022-06-01 13:13:42-- https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
    Resolving proxy.company-xyz.com (proxy.company-xyz.com)... w.x.y.z
    Connecting to proxy.company-xyz.com (proxy.company-xyz.com)|w.x.y.z|:#... connected.
    ERROR: cannot verify gerrit.opencord.org's certificate, issued by 'emailAddress=proxy-team@company-xyz.com,... ,C=US':
    Self-signed certificate encountered.

To address this issue, talk to your company's proxy administrators and ask them to
unblock (re-classify) the opencord.org domain.


"make" fails immediately
^^^^^^^^^^^^^^^^^^^^^^^^

AiaB connects macvlan networks to ``DATA_IFACE`` so that the UPF can communicate on the network.
To do this it assumes that the *systemd-networkd* service is installed and running, that ``DATA_IFACE``
is under its control, and that the name of the systemd-networkd configuration file for ``DATA_IFACE`` ends with
``<DATA_IFACE>.network``, where ``<DATA_IFACE>`` stands for the actual interface name. It
tries to find this configuration file by looking in the standard paths. If it fails you'll see
a message like::

    FATAL: Could not find systemd-networkd config for interface foobar, exiting now!
    make: *** [Makefile:112: /users/acb/aether-in-a-box//build/milestones/interface-check] Error 1

In this case, you can specify a ``DATA_IFACE_PATH=<path to the config file>`` argument to ``make``
so that AiaB can find the systemd-networkd configuration file for ``DATA_IFACE``. It's also possible
that your system does not use systemd-networkd to configure network interfaces (more likely if you
are running in a VM), in which case AiaB is currently not able to install in your setup. You
can check that systemd-networkd is installed and running as follows::

    $ systemctl status systemd-networkd.service
    systemd-networkd.service - Network Service
         Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
         Active: active (running) since Tue 2022-07-12 13:42:18 CDT; 2h 26min ago
    TriggeredBy: systemd-networkd.socket
           Docs: man:systemd-networkd.service(8)
       Main PID: 13777 (systemd-network)
         Status: "Processing requests..."
          Tasks: 1 (limit: 193212)
         Memory: 6.4M
         CGroup: /system.slice/systemd-networkd.service
                 └─13777 /lib/systemd/systemd-networkd
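
If the service is running but AiaB still cannot find the configuration file, you can search for it yourself and pass the result as ``DATA_IFACE_PATH``. The sketch below is only an illustration: the directory list is an assumption based on common systemd-networkd locations and may differ from the paths AiaB actually checks, and the function name and the interface name ``ens1f0`` are hypothetical.

```shell
# Print the first file whose name ends with "<iface>.network".
# Directories to search may be given after the interface name; the default
# list is an assumption based on common systemd-networkd locations.
find_networkd_config() {
    iface="$1"
    shift
    [ "$#" -gt 0 ] || set -- /etc/systemd/network /run/systemd/network \
        /usr/lib/systemd/network /lib/systemd/network
    for dir in "$@"; do
        for f in "$dir"/*"$iface".network; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
    done
    return 1
}

# Usage (ens1f0 is a placeholder interface name):
#   cfg="$(find_networkd_config ens1f0)" && echo "DATA_IFACE_PATH=$cfg"
```

If the function prints nothing, the interface is probably not managed by systemd-networkd.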

.. _AiaB_fails_too_many_files_open:

AiaB fails during deployment of SD-Core network
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running AiaB on Ubuntu 22.04, the installation fails during the deployment of SD-Core with
an error message like the one shown below::

    ...
    ...
    Update Complete. Happy Helming!⎈
    NODE_IP=10.80.51.4 DATA_IFACE=data RAN_SUBNET=192.168.251.0/24 ENABLE_GNBSIM=true envsubst < /home/ubuntu/aether-in-a-box//sd-core-5g-values.yaml | \
    helm upgrade --create-namespace --install --wait \
        --namespace omec \
        --values - \
        sd-core \
        aether/sd-core
    Release "sd-core" does not exist. Installing it now.
    coalesce.go:175: warning: skipped value for kafka.config: Not a table.
    Error: timed out waiting for the condition
    make: *** [Makefile:336: /home/ubuntu/aether-in-a-box//build/milestones/5g-core] Error 1

To get more details about the issue, run the following command to see which pods have problems::

    $ kubectl -n omec get pods
    NAME                          READY   STATUS             RESTARTS         AGE
    amf-6dd746b9cd-2mk2j          0/1     CrashLoopBackOff   13 (24s ago)     42m
    ausf-6dbb7655c7-4pkmp         1/1     Running            0                42m
    gnbsim-0                      1/1     Running            0                42m
    metricfunc-7864fb8b7c-srf2l   1/1     Running            3 (41m ago)      42m
    mongodb-0                     1/1     Running            0                42m
    mongodb-1                     1/1     Running            0                41m
    mongodb-arbiter-0             1/1     Running            0                42m
    nrf-57c79d9f65-fs9qj          1/1     Running            0                42m
    nssf-5b85b8978d-q8dz5         1/1     Running            0                42m
    pcf-758d7cfb48-wjfxf          1/1     Running            0                42m
    sd-core-kafka-0               1/1     Running            0                42m
    sd-core-zookeeper-0           1/1     Running            0                42m
    simapp-6cccd6f787-sd52q       0/1     Error              13 (5m14s ago)   42m
    smf-ff667d5b8-sw5vf           1/1     Running            0                42m
    udm-768b9987b4-cqvbg          1/1     Running            0                42m
    udr-8566897d45-n8cbz          1/1     Running            0                42m
    upf-0                         5/5     Running            0                42m
    webui-5894ffd49d-bdwf4        1/1     Running            0                42m

As shown above, the AMF and SIMAPP pods are failing. To see the specifics of the
problem, check the logs of a failing pod::

    $ kubectl -n omec logs amf-6dd746b9cd-2mk2j
    ...
    ...
    } (resolver returned new addresses)
    2023/01/24 17:24:56 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
    2023/01/24 17:24:56 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
    2023/01/24 17:24:56 too many open files

As the message shows, the problem is due to "too many open files". To resolve this issue,
increase the maximum number of inotify watches and the maximum number of inotify instances
(e.g., by a factor of 10). First, check the current limits::

    $ sysctl fs.inotify.max_user_instances
    fs.inotify.max_user_instances = 128
    $ sysctl fs.inotify.max_user_watches
    fs.inotify.max_user_watches = 1048576

Then, increase these values by executing::

    sudo sysctl fs.inotify.max_user_instances=1280
    sudo sysctl fs.inotify.max_user_watches=10485760

These settings are reset to their original values when the machine is rebooted. You can make
the change permanent by creating an override file::

    sudo nano /etc/sysctl.d/90-override.conf
    fs.inotify.max_user_instances=1280
    fs.inotify.max_user_watches=10485760
    sudo sysctl --system

The first command opens the override file in an editor, the next two lines are the contents to add
to it, and the last command loads the changes without having to reboot the machine.
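
If you would rather not edit the file interactively, the same override can be generated from the current limits. This is a minimal sketch under the assumptions already stated (a 10x increase and the ``/etc/sysctl.d/90-override.conf`` file name from the example); the function name is hypothetical.

```shell
# Print sysctl override lines that scale the given inotify limits.
# Arguments: current max_user_instances, current max_user_watches,
# and an optional multiplication factor (default 10).
render_inotify_override() {
    instances="$1"
    watches="$2"
    factor="${3:-10}"
    printf 'fs.inotify.max_user_instances=%s\n' "$((instances * factor))"
    printf 'fs.inotify.max_user_watches=%s\n' "$((watches * factor))"
}

# Usage: scale the current limits 10x, persist them, and reload.
#   render_inotify_override \
#       "$(sysctl -n fs.inotify.max_user_instances)" \
#       "$(sysctl -n fs.inotify.max_user_watches)" |
#       sudo tee /etc/sysctl.d/90-override.conf
#   sudo sysctl --system
```

Run the generation step only once. Running it again after the new limits are active would multiply the already increased values.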
Arrobo, Gabriel28cebee2023-01-24 10:41:25 -0800269
Ajay Lotan Thakur2127ee92022-08-31 07:09:04 -0700270Data plane is not working
271^^^^^^^^^^^^^^^^^^^^^^^^^
272
273The first step is to read `Understanding AiaB networking`_understanding_aiab_networking, which
274gives a high level picture
275of the AiaB data plane and how the pieces fit together. In order to debug the problem you will
276need to figure out where data plane packets from the eNodeB are dropped. One way to do this is to
277run ``tcpdump`` on (1) DATA_IFACE to ensure that the data plane packets are arriving, (2) the
278``access`` interface to see that they make it to the UPF, and (3) the ``core`` to check that they
279are forwarded upstream.
280
281If the upstream packets don't make it to DATA_IFACE, you probably need to add the static route
282on the eNodeB so packets to the UPF have a next hop of DATA_IFACE. You can see these upstream
283packets by running::
284
285 tcpdump -i <data-iface> -n udp port 2152
286
287If they don't make it to ``access`` you should check that the kernel routing table is forwarding
288a packet with destination 192.158.252.3 to the ``access`` interface. You can see them by running::
289
290 tcpdump -i access -n udp port 2152
291
If packets are not forwarded from ``DATA_IFACE`` to the ``access`` interface, the following command
can be used to forward the traffic that is destined to 192.168.252.3::

    iptables -A FORWARD -d 192.168.252.3 -i <data-iface> -o access -j ACCEPT

If they don't make it to ``core`` then they are being dropped by the UPF for some reason. This
may be a configuration issue with the state loaded in the ROC / SD-CORE -- the UPF is being told
to discard these packets. You should check that the device's IMSI is part of a slice and that
the slice's policy settings allow traffic to that destination. You can view them via the following::

    tcpdump -i core -n net 172.250.0.0/16

That command will capture all packets to/from the UE subnet.
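
The three captures can be run one after another to see at which hop the packets disappear. The following is a rough sketch rather than an AiaB tool: the function names, the 5-packet limit, and the 10-second timeout are arbitrary choices, and the filters are the ones from the commands above.

```shell
# Print the tcpdump filter for a given capture point: GTP-U traffic
# (UDP port 2152) on the data and access interfaces, and the UE subnet on core.
capture_filter() {
    case "$1" in
        core) printf '%s\n' 'net 172.250.0.0/16' ;;
        *)    printf '%s\n' 'udp port 2152' ;;
    esac
}

# Capture up to 5 packets on each interface, giving up after 10 seconds each.
check_data_plane() {
    data_iface="$1"
    for iface in "$data_iface" access core; do
        echo "== $iface =="
        # The filter is intentionally unquoted so it splits into separate words.
        sudo timeout 10 tcpdump -i "$iface" -n -c 5 $(capture_filter "$iface") ||
            echo "capture on $iface ended before 5 packets were seen"
    done
}

# Usage: check_data_plane <data-iface>
```

Generate traffic from the UE while the script runs. The first interface that shows no packets marks the hop to investigate.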

If you cannot figure out the issue, see `Getting Help`_.

.. _rke2-vs-kubespray-install:

Getting Help
------------

Please introduce yourself and post your questions to the ``#aether-dev`` channel on the ONF Community Slack.
Details about how to join this channel can be found on the `ONF Wiki <https://wiki.opennetworking.org/display/COM/Aether>`_.
In your introduction please state your institution and position, and describe why you are interested in Aether
and what your end goal is.

If you need help debugging your setup, please give as much detail as possible about
your environment: the OS version you have installed, whether you are running on bare metal or in a VM,
how much CPU and memory your server has, whether you are installing behind a proxy, and so on. Also list the steps
you have performed so far, and post any error messages you have received. These details will help the community
understand where you are and how to help you make progress.