.. vim: syntax=rst

.. _aiab_troubleshooting:

Troubleshooting Aether-in-a-Box
===============================

FAQs
----

RKE2 vs. Kubespray Install
^^^^^^^^^^^^^^^^^^^^^^^^^^

The AiaB installer brings up Kubernetes on the server where it is run. By default it
uses `RKE2 <https://docs.rke2.io>`_ as the Kubernetes platform. Older versions of AiaB
used `Kubespray <https://kubernetes.io/docs/setup/production-environment/tools/kubespray/>`_,
which is still an option. To switch to Kubespray as the Kubernetes platform, edit the
Makefile and replace *rke2* with *kubespray* on the ``K8S_INSTALL`` line, as in this diff::

    node0:~/aether-in-a-box$ git diff Makefile
    diff --git a/Makefile b/Makefile
    index 5f2c186..608c221 100644
    --- a/Makefile
    +++ b/Makefile
    @@ -35,7 +35,7 @@ ENABLE_GNBSIM ?= true
     ENABLE_SUBSCRIBER_PROXY ?= false
     GNBSIM_COLORS ?= true

    -K8S_INSTALL ?= rke2
    +K8S_INSTALL ?= kubespray
     CTR_CMD := sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io

     PROXY_ENABLED ?= false
    node0:~/aether-in-a-box$


You may wish to use Kubespray instead of RKE2 if you want to use locally built images with AiaB
(e.g., if you are developing SD-CORE services). The reason is that RKE2 uses containerd instead of
Docker, so it cannot access images in the local Docker registry.
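
Because the Makefile assigns ``K8S_INSTALL`` with ``?=``, GNU make should also accept the value from
the command line, which avoids editing the file. The sketch below demonstrates that behavior on a
throwaway Makefile; the ``show`` target and the temporary directory are made up for the demonstration
and are not part of AiaB::

    # Build a tiny stand-in Makefile that uses the same "?=" assignment.
    tmp=$(mktemp -d)
    printf 'K8S_INSTALL ?= rke2\nshow:\n\t@echo $(K8S_INSTALL)\n' > "$tmp/Makefile"

    # Without an override, the default from the Makefile is used.
    default_value=$(make -s -f "$tmp/Makefile" show)
    # A command-line assignment takes precedence over "?=".
    override_value=$(make -s -f "$tmp/Makefile" show K8S_INSTALL=kubespray)

    echo "default:  $default_value"
    echo "override: $override_value"
    rm -rf "$tmp"

With AiaB itself, the equivalent would be to append ``K8S_INSTALL=kubespray`` to whichever ``make``
command you normally run.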

How to Use a Local Image
^^^^^^^^^^^^^^^^^^^^^^^^

RKE2 (the default Kubernetes installer) is based on containerd rather than Docker.
Containerd has its own local image registry that is separate from the local Docker registry. With RKE2,
an image built locally with ``docker build`` exists only in the Docker registry and is not
available to run in AiaB without some additional steps. An easy workaround
is to use ``docker push`` to push the image to a remote repository (e.g., Docker Hub) and then modify your
Helm values file to pull that remote image. Another option is to save the local Docker image
to a file and import the file into the containerd registry like this::

    docker save -o /tmp/lte-uesoftmodem.tar omecproject/lte-uesoftmodem:1.1.0
    sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io \
        images import /tmp/lte-uesoftmodem.tar

The above commands save the local Docker image ``omecproject/lte-uesoftmodem:1.1.0`` to a tarball and then import
the tarball into the containerd registry, where it is available for use by RKE2. Replace
``omecproject/lte-uesoftmodem:1.1.0`` with the name of your image.

If you know that you will be using AiaB to test locally built images, the easiest approach is probably to
use the Kubespray installer. If you have already installed using RKE2 and want to switch to Kubespray, first
run ``make clean`` and then follow the steps in the `RKE2 vs. Kubespray Install`_ section above.
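
After importing an image, it can be useful to confirm that containerd actually lists it. The helper
below is a sketch only: the function name ``image_in_containerd`` and the ``CTR`` variable are made up
for this example, and ``CTR`` simply defaults to the ``ctr`` invocation shown above::

    # ctr invocation to use; override CTR to point at a different binary or socket.
    CTR=${CTR:-"sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io"}

    # Succeed (and print the matching reference) if an image whose name contains $1
    # is present. A substring match is used because containerd may store the image
    # with a registry prefix such as docker.io/.
    image_in_containerd() {
        $CTR images ls -q | grep -F -- "$1"
    }

For example, ``image_in_containerd omecproject/lte-uesoftmodem:1.1.0`` prints the stored reference if
the import succeeded and returns a non-zero status otherwise.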

Restarting the AiaB Server
^^^^^^^^^^^^^^^^^^^^^^^^^^

AiaB should come up in a mostly working state after the AiaB server is rebooted. If any pods are
stuck in an ``Error`` or ``CrashLoopBackOff`` state, they can be restarted using ``kubectl delete pod``.
It might also be necessary to power cycle the Sercomm eNodeB to get it to reconnect to
the SD-CORE.
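
As a sketch of that restart step, the helper below filters ``kubectl get pods`` output for pods in
either state. The function name ``stuck_pods`` is made up for this example, and the ``omec`` namespace
is assumed from the pod listings elsewhere on this page::

    # Print the names of pods whose STATUS column is Error or CrashLoopBackOff.
    # Reads "kubectl get pods --no-headers" output on stdin.
    stuck_pods() {
        awk '$3 == "Error" || $3 == "CrashLoopBackOff" { print $1 }'
    }

It could be combined with the delete command as
``kubectl -n omec get pods --no-headers | stuck_pods | xargs -r kubectl -n omec delete pod``.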


Enabling externalIP at MME
^^^^^^^^^^^^^^^^^^^^^^^^^^

You can enable the externalIP service in the MME by adding the following configuration to the override file::

| 78 | node0:~/aether-in-a-box$ git diff sd-core-4g-values.yaml |
| 79 | diff --git a/sd-core-4g-values.yaml b/sd-core-4g-values.yaml |
| 80 | index 0939739..f240f89 100644 |
| 81 | --- a/sd-core-4g-values.yaml |
| 82 | +++ b/sd-core-4g-values.yaml |
| 83 | @@ -24,6 +24,11 @@ omec-control-plane: |
| 84 | bootstrap: |
| 85 | users: [] |
| 86 | staticusers: [] |
| 87 | + mme: |
| 88 | + s1ap: |
| 89 | + serviceType: ClusterIP |
| 90 | + externalIP: 10.1.1.1 |
| 91 | + |
| 92 | spgwc: |
| 93 | pfcp: true |
| 94 | cfgFiles: |
| 95 | node0:~/aether-in-a-box$ |

Enabling externalIP at AMF
^^^^^^^^^^^^^^^^^^^^^^^^^^

You can enable the externalIP service in the AMF by adding the following configuration to the override file::

| 102 | node0:~/aether-in-a-box$ git diff sd-core-5g-values.yaml |
| 103 | diff --git a/sd-core-5g-values.yaml b/sd-core-5g-values.yaml |
| 104 | index e513e1f..fc1c684 100644 |
| 105 | --- a/sd-core-5g-values.yaml |
| 106 | +++ b/sd-core-5g-values.yaml |
| 107 | @@ -34,6 +34,9 @@ |
| 108 | |
| 109 | amf: |
| 110 | cfgFiles: |
| 111 | + ngapp: |
| 112 | + serviceType: ClusterIP |
| 113 | + externalIp: "10.1.1.2" |
| 114 | + port: 38412 |
| 115 | amfcfg.conf: |
| 116 | configuration: |
| 117 | enableDBStore: false |
| 118 | @@ -176,6 +179,7 @@ omec-user-plane: |
| 119 | cpiface: |
| 120 | dnn: "internet" |
| 121 | hostname: "upf" |
| 122 | |
| 123 | 5g-ran-sim: |
| 124 | enable: ${ENABLE_GNBSIM} |
| 125 | |
| 126 | node0:~/aether-in-a-box$ |

Troubleshooting
---------------

**NOTE: Running both 4G and 5G SD-CORE simultaneously in AiaB is currently not supported.**

Proxy Issues
^^^^^^^^^^^^

When working with AiaB behind a proxy, you may experience issues
caused by security policies. For example, the proxy may block a domain (e.g., opencord.org),
in which case you may see messages like these when trying to clone or download a copy of aether-in-a-box::

    ubuntu18:~$ git clone https://gerrit.opencord.org/aether-in-a-box
    Cloning into 'aether-in-a-box'...
    fatal: unable to access 'https://gerrit.opencord.org/aether-in-a-box/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

or::

    ubuntu18:~$ wget https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
    --2022-06-01 13:13:42-- https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
    Resolving proxy.company-xyz.com (proxy.company-xyz.com)... w.x.y.z
    Connecting to proxy.company-xyz.com (proxy.company-xyz.com)|w.x.y.z|:#... connected.
    ERROR: cannot verify gerrit.opencord.org's certificate, issued by 'emailAddress=proxy-team@company-xyz.com,... ,C=US':
    Self-signed certificate encountered.

To address this issue, ask your company's proxy administrators to
unblock (re-classify) the opencord.org domain.


"make" fails immediately
^^^^^^^^^^^^^^^^^^^^^^^^

AiaB connects macvlan networks to ``DATA_IFACE`` so that the UPF can communicate on the network.
To do this, it assumes that the *systemd-networkd* service is installed and running, that ``DATA_IFACE``
is under its control, and that the name of the systemd-networkd configuration file for ``DATA_IFACE`` ends with
``<DATA_IFACE>.network``, where ``<DATA_IFACE>`` stands for the actual interface name. It
tries to find this configuration file by looking in the standard paths. If it fails, you will see
a message like::

    FATAL: Could not find systemd-networkd config for interface foobar, exiting now!
    make: *** [Makefile:112: /users/acb/aether-in-a-box//build/milestones/interface-check] Error 1

In this case, you can pass a ``DATA_IFACE_PATH=<path to the config file>`` argument to ``make``
so that AiaB can find the systemd-networkd configuration file for ``DATA_IFACE``. It is also possible
that your system does not use systemd-networkd to configure network interfaces (more likely if you
are running in a VM), in which case AiaB currently cannot be installed in your setup. You
can check that systemd-networkd is installed and running as follows::

    $ systemctl status systemd-networkd.service
    ● systemd-networkd.service - Network Service
         Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
         Active: active (running) since Tue 2022-07-12 13:42:18 CDT; 2h 26min ago
    TriggeredBy: ● systemd-networkd.socket
           Docs: man:systemd-networkd.service(8)
       Main PID: 13777 (systemd-network)
         Status: "Processing requests..."
          Tasks: 1 (limit: 193212)
         Memory: 6.4M
         CGroup: /system.slice/systemd-networkd.service
                 └─13777 /lib/systemd/systemd-networkd

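To locate a candidate file for ``DATA_IFACE_PATH``, something like the following sketch may help. The
function name ``find_networkd_config`` is made up for this example, and the default directories are the
usual systemd-networkd locations, which may differ from the paths AiaB actually searches::

    # Print systemd-networkd config files whose names end in "<iface>.network".
    # Usage: find_networkd_config IFACE [DIR...]
    find_networkd_config() {
        iface=$1
        shift
        # Fall back to the usual systemd-networkd directories when none are given.
        if [ "$#" -eq 0 ]; then
            set -- /etc/systemd/network /run/systemd/network /usr/lib/systemd/network
        fi
        for dir in "$@"; do
            [ -d "$dir" ] || continue
            find "$dir" -maxdepth 1 -name "*${iface}.network"
        done
    }

For an interface named ``ens1f0`` (a placeholder), ``find_networkd_config ens1f0`` prints any matching
paths, one of which can then be passed to ``make`` as ``DATA_IFACE_PATH``.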

.. _AiaB_fails_too_many_files_open:

AiaB fails during deployment of SD-Core network
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running AiaB on Ubuntu 22.04, the installation can fail during the deployment of the SD-Core with
an error message like the one shown below::

    ...
    ...
    Update Complete. ⎈Happy Helming!⎈
    NODE_IP=10.80.51.4 DATA_IFACE=data RAN_SUBNET=192.168.251.0/24 ENABLE_GNBSIM=true envsubst < /home/ubuntu/aether-in-a-box//sd-core-5g-values.yaml | \
    helm upgrade --create-namespace --install --wait \
        --namespace omec \
        --values - \
        sd-core \
        aether/sd-core
    Release "sd-core" does not exist. Installing it now.
    coalesce.go:175: warning: skipped value for kafka.config: Not a table.
    Error: timed out waiting for the condition
    make: *** [Makefile:336: /home/ubuntu/aether-in-a-box//build/milestones/5g-core] Error 1

To get more details about the issue, run the following command to see which pods have problems::

    $ kubectl -n omec get pods
    NAME                          READY   STATUS             RESTARTS         AGE
    amf-6dd746b9cd-2mk2j          0/1     CrashLoopBackOff   13 (24s ago)     42m
    ausf-6dbb7655c7-4pkmp         1/1     Running            0                42m
    gnbsim-0                      1/1     Running            0                42m
    metricfunc-7864fb8b7c-srf2l   1/1     Running            3 (41m ago)      42m
    mongodb-0                     1/1     Running            0                42m
    mongodb-1                     1/1     Running            0                41m
    mongodb-arbiter-0             1/1     Running            0                42m
    nrf-57c79d9f65-fs9qj          1/1     Running            0                42m
    nssf-5b85b8978d-q8dz5         1/1     Running            0                42m
    pcf-758d7cfb48-wjfxf          1/1     Running            0                42m
    sd-core-kafka-0               1/1     Running            0                42m
    sd-core-zookeeper-0           1/1     Running            0                42m
    simapp-6cccd6f787-sd52q       0/1     Error              13 (5m14s ago)   42m
    smf-ff667d5b8-sw5vf           1/1     Running            0                42m
    udm-768b9987b4-cqvbg          1/1     Running            0                42m
    udr-8566897d45-n8cbz          1/1     Running            0                42m
    upf-0                         5/5     Running            0                42m
    webui-5894ffd49d-bdwf4        1/1     Running            0                42m

As shown above, there are problems with the AMF and SIMAPP pods. To see the specifics of the
problem, check the pod logs as shown below::

    $ kubectl -n omec logs amf-6dd746b9cd-2mk2j
    ...
    ...
    } (resolver returned new addresses)
    2023/01/24 17:24:56 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
    2023/01/24 17:24:56 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
    2023/01/24 17:24:56 too many open files

As the message shows, the problem is due to "too many open files". To resolve this issue, you
can increase the maximum number of inotify watches and the maximum number of inotify instances
(e.g., by 10x). To do so, first check the current limits::

    $ sysctl fs.inotify.max_user_instances
    fs.inotify.max_user_instances = 128
    $ sysctl fs.inotify.max_user_watches
    fs.inotify.max_user_watches = 1048576

Then, increase these values by executing::

    sudo sysctl fs.inotify.max_user_instances=1280
    sudo sysctl fs.inotify.max_user_watches=10485760

These settings are reset to their original values when the machine is rebooted. You can make
the change permanent by creating an override file (for example, with
``sudo nano /etc/sysctl.d/90-override.conf``) that contains the following lines::

    fs.inotify.max_user_instances=1280
    fs.inotify.max_user_watches=10485760

Then run ``sudo sysctl --system`` to load the changes without having to reboot the machine.

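If the current limits on your machine differ from the ones shown above, the 10x values can be computed
rather than typed by hand. The helper below is a sketch; the function name ``inotify_override_lines``
is made up for this example::

    # Print sysctl override lines with both inotify limits multiplied by 10.
    # $1 = current fs.inotify.max_user_instances, $2 = current fs.inotify.max_user_watches
    inotify_override_lines() {
        printf 'fs.inotify.max_user_instances=%d\n' "$(( $1 * 10 ))"
        printf 'fs.inotify.max_user_watches=%d\n' "$(( $2 * 10 ))"
    }

One way to use it is to pass in the values reported by ``sysctl -n fs.inotify.max_user_instances`` and
``sysctl -n fs.inotify.max_user_watches``, write the output to the override file with ``sudo tee``,
and then run ``sudo sysctl --system``.
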
Data plane is not working
^^^^^^^^^^^^^^^^^^^^^^^^^

The first step is to read :ref:`Understanding AiaB networking <understanding_aiab_networking>`, which
gives a high-level picture
of the AiaB data plane and how the pieces fit together. To debug the problem, you
need to figure out where data plane packets from the eNodeB are dropped. One way to do this is to
run ``tcpdump`` on (1) ``DATA_IFACE`` to ensure that the data plane packets are arriving, (2) the
``access`` interface to see that they make it to the UPF, and (3) the ``core`` interface to check that they
are forwarded upstream.

If the upstream packets don't make it to ``DATA_IFACE``, you probably need to add a static route
on the eNodeB so that packets to the UPF have a next hop of ``DATA_IFACE``. You can see these upstream
packets by running::

    tcpdump -i <data-iface> -n udp port 2152

If they don't make it to ``access``, check that the kernel routing table forwards
packets with destination 192.168.252.3 to the ``access`` interface. You can see them by running::

    tcpdump -i access -n udp port 2152
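
To check that routing decision directly, ``ip route get`` reports which interface the kernel would pick
for a destination. The wrapper below is only a sketch; the function name ``routes_via`` is made up for
this example::

    # Succeed if the kernel would send traffic for address $1 out of interface $2.
    routes_via() {
        ip route get "$1" | grep -Eq -- "(^| )dev $2( |\$)"
    }

For example, ``routes_via 192.168.252.3 access && echo ok`` prints ``ok`` when the route points at the
``access`` interface.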

If packets are not forwarded from ``DATA_IFACE`` to the ``access`` interface, the following command
can be used to forward traffic destined for 192.168.252.3::

    iptables -A FORWARD -d 192.168.252.3 -i <data-iface> -o access -j ACCEPT

If they don't make it to ``core``, they are being dropped by the UPF for some reason. This
may be a configuration issue with the state loaded in the ROC / SD-CORE: the UPF is being told
to discard these packets. Check that the device's IMSI is part of a slice and that
the slice's policy settings allow traffic to that destination. You can view the packets with the following command::

    tcpdump -i core -n net 172.250.0.0/16

That command captures all packets to and from the UE subnet.

If you cannot figure out the issue, see `Getting Help`_.

.. _rke2-vs-kubespray-install:

Getting Help
------------

Please introduce yourself and post your questions to the ``#aether-dev`` channel on the ONF Community Slack.
Details about how to join this channel can be found on the `ONF Wiki <https://wiki.opennetworking.org/display/COM/Aether>`_.
In your introduction, please state your institution and position, and describe why you are interested in Aether
and what your end goal is.

If you need help debugging your setup, please give as much detail as possible about
your environment: the OS version you have installed, whether you are running on bare metal or in a VM,
how much CPU and memory your server has, whether you are installing behind a proxy, and so on. Also list the steps
you have performed so far, and post any error messages you have received. These details will help the community
understand where you are and how to help you make progress.
