.. vim: syntax=rst

.. _aiab_troubleshooting:

Troubleshooting Aether-in-a-Box
===============================

FAQs
----

RKE2 vs. Kubespray Install
^^^^^^^^^^^^^^^^^^^^^^^^^^

The AiaB installer will bring up Kubernetes on the server where it is run. By default it
uses `RKE2 <https://docs.rke2.io>`_ as the Kubernetes platform. However, older versions of AiaB
used `Kubespray <https://kubernetes.io/docs/setup/production-environment/tools/kubespray/>`_
and that is still an option. To switch to Kubespray as the Kubernetes platform, edit the
Makefile and replace *rke2* with *kubespray* on this line::

  node0:~/aether-in-a-box$ git diff Makefile
  diff --git a/Makefile b/Makefile
  index 5f2c186..608c221 100644
  --- a/Makefile
  +++ b/Makefile
  @@ -35,7 +35,7 @@ ENABLE_GNBSIM ?= true
   ENABLE_SUBSCRIBER_PROXY ?= false
   GNBSIM_COLORS ?= true
  -K8S_INSTALL ?= rke2
  +K8S_INSTALL ?= kubespray
   CTR_CMD := sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io
   PROXY_ENABLED ?= false
  node0:~/aether-in-a-box$


You may wish to use Kubespray instead of RKE2 if you want to use locally-built images with AiaB
(e.g., if you are developing SD-CORE services). The reason is that RKE2 uses containerd instead of
Docker, so it cannot access images that exist only in the local Docker image store.

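
If you prefer to script this change rather than edit the Makefile by hand, a minimal sketch
follows. It assumes the Makefile still contains a ``K8S_INSTALL ?= ...`` line as in the diff
above; ``set_k8s_installer`` is a hypothetical helper, not part of AiaB::

  # Hypothetical helper (not part of AiaB): rewrite the K8S_INSTALL default
  # in an AiaB Makefile. Usage: set_k8s_installer <makefile> <rke2|kubespray>
  set_k8s_installer() {
    local makefile=$1 installer=$2
    case "$installer" in
      rke2|kubespray) ;;
      *) echo "unsupported installer: $installer" >&2; return 2 ;;
    esac
    # Replace only the "K8S_INSTALL ?= ..." assignment; other lines are untouched.
    sed -i -E "s/^K8S_INSTALL \?= .*/K8S_INSTALL ?= ${installer}/" "$makefile"
    # Print the resulting line so the change can be confirmed.
    grep '^K8S_INSTALL ?=' "$makefile"
  }

For example, run ``set_k8s_installer Makefile kubespray`` from the ``aether-in-a-box`` directory.
Because the Makefile uses ``?=``, the value could also be supplied from the environment for a
single run. Editing the file instead keeps every later ``make`` invocation, including
``make clean``, on the same installer.
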

How to Use Local Images
^^^^^^^^^^^^^^^^^^^^^^^

Note that RKE2 (the default Kubernetes installer) is based on containerd rather than Docker.
Containerd has its own local image store that is separate from the local Docker image store. With RKE2,
if you have used ``docker build`` to build a local image, it exists only in the Docker image store and so is not
available to run in AiaB without some additional steps. An easy workaround
is to use ``docker push`` to push the image to a remote repository (e.g., Docker Hub) and then modify your
Helm values file to pull in that remote image. Another option is to save the local Docker image
into a file and import the file into the containerd image store like this::

  docker save -o /tmp/lte-uesoftmodem.tar omecproject/lte-uesoftmodem:1.1.0
  sudo /var/lib/rancher/rke2/bin/ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io \
      images import /tmp/lte-uesoftmodem.tar

The above commands save the local Docker image ``omecproject/lte-uesoftmodem:1.1.0`` in a tarball and then import
the tarball into the containerd image store, where it is available for use by RKE2. Of course, you should replace
``omecproject/lte-uesoftmodem:1.1.0`` with the name of your image.

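
If you do this often, the two commands can be wrapped in a small shell function. This is a
sketch, not part of AiaB: the function names ``rke2_ctr`` and ``import_to_rke2`` are hypothetical,
and it assumes the RKE2 ``ctr`` binary and containerd socket paths used above. It writes the
tarball to a fresh ``mktemp`` file and removes it afterwards::

  # Hypothetical helpers (not part of AiaB). Paths match the RKE2 defaults used above.
  rke2_ctr() {
    # Run RKE2's bundled ctr against the k8s.io namespace that Kubernetes uses.
    sudo /var/lib/rancher/rke2/bin/ctr \
      --address /run/k3s/containerd/containerd.sock --namespace k8s.io "$@"
  }

  import_to_rke2() {
    # Usage: import_to_rke2 <image>   e.g. omecproject/lte-uesoftmodem:1.1.0
    local image=$1 tarball rc
    if [ -z "$image" ]; then
      echo "usage: import_to_rke2 <image>" >&2
      return 2
    fi
    tarball=$(mktemp --suffix=.tar) || return 1
    # Export the image from Docker, then import the tarball into containerd.
    docker save -o "$tarball" "$image" && rke2_ctr images import "$tarball"
    rc=$?
    rm -f "$tarball"
    return "$rc"
  }

For example, run ``import_to_rke2 omecproject/lte-uesoftmodem:1.1.0``; afterwards,
``rke2_ctr images ls | grep lte-uesoftmodem`` should list the image. Kubernetes only uses a
locally imported image if the pod's ``imagePullPolicy`` does not force a pull, that is, if it is
not ``Always`` (which is also the default for ``:latest`` tags).
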

If you know that you are going to be using AiaB to test locally-built images, the easiest approach is probably to
use the Kubespray installer. If you have already installed using RKE2 and you want to switch to Kubespray, first
run ``make clean`` and then follow the steps in the `RKE2 vs. Kubespray Install`_ section above.


Restarting the AiaB Server
^^^^^^^^^^^^^^^^^^^^^^^^^^

AiaB should come up in a mostly working state if the AiaB server is rebooted. If any pods are
stuck in an Error or CrashLoopBackOff state, they can be restarted using ``kubectl delete pod``.
It might also be necessary to power-cycle the Sercomm eNodeB in order to get it to reconnect to
the SD-CORE.

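
To find and restart all such pods in one step, something like the following sketch can be used.
It assumes the pods live in the ``omec`` namespace (as in the examples later on this page) and
belong to a Deployment or StatefulSet, so Kubernetes recreates each deleted pod automatically.
The function names are hypothetical, and the ``STATUS`` column is read from the default
``kubectl get pods`` output::

  # Hypothetical helpers (not part of AiaB).
  stuck_pods() {
    # Read `kubectl get pods` output on stdin and print the names of pods whose
    # STATUS column (field 3) is Error or CrashLoopBackOff. The header is skipped.
    awk 'NR > 1 && ($3 == "Error" || $3 == "CrashLoopBackOff") { print $1 }'
  }

  restart_stuck_pods() {
    # Usage: restart_stuck_pods [namespace]   (defaults to omec)
    local ns=${1:-omec}
    # xargs -r skips the delete entirely when no pod matched.
    kubectl -n "$ns" get pods | stuck_pods | xargs -r kubectl -n "$ns" delete pod
  }

Running ``kubectl -n omec get pods | stuck_pods`` first shows which pods would be deleted without
touching anything.
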

Enabling externalIP at MME
^^^^^^^^^^^^^^^^^^^^^^^^^^

You can enable an external IP for the MME service by adding the following configuration to the override file::

  node0:~/aether-in-a-box$ git diff sd-core-4g-values.yaml
  diff --git a/sd-core-4g-values.yaml b/sd-core-4g-values.yaml
  index 0939739..f240f89 100644
  --- a/sd-core-4g-values.yaml
  +++ b/sd-core-4g-values.yaml
  @@ -24,6 +24,11 @@ omec-control-plane:
         bootstrap:
           users: []
           staticusers: []
  +    mme:
  +      s1ap:
  +        serviceType: ClusterIP
  +        externalIP: 10.1.1.1
  +
       spgwc:
         pfcp: true
         cfgFiles:
  node0:~/aether-in-a-box$


Enabling externalIP at AMF
^^^^^^^^^^^^^^^^^^^^^^^^^^

You can enable an external IP for the AMF service by adding the following configuration to the override file::

  node0:~/aether-in-a-box$ git diff sd-core-5g-values.yaml
  diff --git a/sd-core-5g-values.yaml b/sd-core-5g-values.yaml
  index e513e1f..fc1c684 100644
  --- a/sd-core-5g-values.yaml
  +++ b/sd-core-5g-values.yaml
  @@ -34,6 +34,9 @@
       amf:
         cfgFiles:
  +        ngapp:
  +          serviceType: ClusterIP
  +          externalIp: "10.1.1.2"
  +          port: 38412
           amfcfg.conf:
             configuration:
               enableDBStore: false
  @@ -176,6 +179,7 @@ omec-user-plane:
             cpiface:
               dnn: "internet"
               hostname: "upf"
   5g-ran-sim:
     enable: ${ENABLE_GNBSIM}
  node0:~/aether-in-a-box$

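
After redeploying SD-CORE with either override, one way to confirm that the external IP was
applied is to look at the ``EXTERNAL-IP`` column of ``kubectl -n omec get svc``. The sketch below
filters that output. It assumes the default column layout
(``NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE``), does not assume any particular service names,
and the function name is hypothetical::

  # Hypothetical helper (not part of AiaB).
  services_with_external_ip() {
    # Read `kubectl get svc` output on stdin; print NAME and EXTERNAL-IP for
    # every service whose EXTERNAL-IP column (field 4) is not "<none>".
    awk 'NR > 1 && $4 != "<none>" { print $1, $4 }'
  }

For example, ``kubectl -n omec get svc | services_with_external_ip`` should list the MME or AMF
service together with the address you configured.
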

Troubleshooting
---------------

**NOTE: Running both 4G and 5G SD-CORE simultaneously in AiaB is currently not supported.**


Proxy Issues
^^^^^^^^^^^^

When working with AiaB behind a proxy, you may run into issues caused by the proxy's security
policies. For example, the proxy may block a domain (e.g., opencord.org), and you may see
messages like these when trying to clone or otherwise download aether-in-a-box::

  ubuntu18:~$ git clone https://gerrit.opencord.org/aether-in-a-box
  Cloning into 'aether-in-a-box'...
  fatal: unable to access 'https://gerrit.opencord.org/aether-in-a-box/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

or::

  ubuntu18:~$ wget https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
  --2022-06-01 13:13:42-- https://gerrit.opencord.org/plugins/gitiles/aether-in-a-box/+archive/refs/heads/master.tar.gz
  Resolving proxy.company-xyz.com (proxy.company-xyz.com)... w.x.y.z
  Connecting to proxy.company-xyz.com (proxy.company-xyz.com)|w.x.y.z|:#... connected.
  ERROR: cannot verify gerrit.opencord.org's certificate, issued by 'emailAddress=proxy-team@company-xyz.com,... ,C=US':
    Self-signed certificate encountered.

To address this issue, ask your company's proxy administrators to
unblock (re-classify) the opencord.org domain.

"make" fails immediately
^^^^^^^^^^^^^^^^^^^^^^^^

AiaB connects macvlan networks to ``DATA_IFACE`` so that the UPF can communicate on the network.
To do this, it assumes that the *systemd-networkd* service is installed and running, ``DATA_IFACE``
is under its control, and the systemd-networkd configuration file for ``DATA_IFACE`` ends with
``<DATA_IFACE>.network``, where ``<DATA_IFACE>`` stands for the actual interface name. It
tries to find this configuration file by looking in the standard paths. If it cannot find the file,
you'll see a message like::

  FATAL: Could not find systemd-networkd config for interface foobar, exiting now!
  make: *** [Makefile:112: /users/acb/aether-in-a-box//build/milestones/interface-check] Error 1

In this case, you can pass a ``DATA_IFACE_PATH=<path to the config file>`` argument to ``make``
so that AiaB can find the systemd-networkd configuration file for ``DATA_IFACE``. It's also possible
that your system does not use systemd-networkd to configure network interfaces (more likely if you
are running in a VM), in which case AiaB cannot currently be installed on your system. You
can check that systemd-networkd is installed and running as follows::

  $ systemctl status systemd-networkd.service
  systemd-networkd.service - Network Service
       Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
       Active: active (running) since Tue 2022-07-12 13:42:18 CDT; 2h 26min ago
  TriggeredBy: systemd-networkd.socket
         Docs: man:systemd-networkd.service(8)
     Main PID: 13777 (systemd-network)
       Status: "Processing requests..."
        Tasks: 1 (limit: 193212)
       Memory: 6.4M
       CGroup: /system.slice/systemd-networkd.service
               └─13777 /lib/systemd/systemd-networkd

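
To see whether a matching file exists before rerunning ``make``, you can look for
``*<DATA_IFACE>.network`` in the standard systemd-networkd directories. The sketch below
approximates that search and is not AiaB's actual code. The function name is hypothetical, and
the directory list is the systemd.network search path (plus ``/lib/systemd/network`` for older
Ubuntu releases). On recent systemd versions, ``networkctl status <DATA_IFACE>`` also reports
the ``Network File`` in use::

  # Hypothetical helper (not part of AiaB): print the first systemd-networkd
  # config file whose name ends in "<iface>.network".
  # Usage: find_network_file <iface> [dir...]
  find_network_file() {
    local iface=$1 d f
    [ -n "$iface" ] || return 2
    shift
    local dirs=("$@")
    if [ "${#dirs[@]}" -eq 0 ]; then
      # systemd.network search path, highest precedence first.
      dirs=(/etc/systemd/network /run/systemd/network
            /usr/local/lib/systemd/network /usr/lib/systemd/network
            /lib/systemd/network)
    fi
    for d in "${dirs[@]}"; do
      for f in "$d"/*"$iface".network; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
          printf '%s\n' "$f"
          return 0
        fi
      done
    done
    return 1
  }

If it prints a path, add ``DATA_IFACE_PATH=<that path>`` to the ``make`` command you were running.
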

.. _AiaB_fails_too_many_files_open:

AiaB fails during deployment of SD-CORE network
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When running AiaB on Ubuntu 22.04, the AiaB installation can fail during the deployment of SD-CORE with
an error message like the one below::

  ...
  ...
  Update Complete. Happy Helming!⎈
  NODE_IP=10.80.51.4 DATA_IFACE=data RAN_SUBNET=192.168.251.0/24 ENABLE_GNBSIM=true envsubst < /home/ubuntu/aether-in-a-box//sd-core-5g-values.yaml | \
      helm upgrade --create-namespace --install --wait \
      --namespace omec \
      --values - \
      sd-core \
      aether/sd-core
  Release "sd-core" does not exist. Installing it now.
  coalesce.go:175: warning: skipped value for kafka.config: Not a table.
  Error: timed out waiting for the condition
  make: *** [Makefile:336: /home/ubuntu/aether-in-a-box//build/milestones/5g-core] Error 1

To find out which pods are failing, run::

  $ kubectl -n omec get pods
  NAME                          READY   STATUS             RESTARTS         AGE
  amf-6dd746b9cd-2mk2j          0/1     CrashLoopBackOff   13 (24s ago)     42m
  ausf-6dbb7655c7-4pkmp         1/1     Running            0                42m
  gnbsim-0                      1/1     Running            0                42m
  metricfunc-7864fb8b7c-srf2l   1/1     Running            3 (41m ago)      42m
  mongodb-0                     1/1     Running            0                42m
  mongodb-1                     1/1     Running            0                41m
  mongodb-arbiter-0             1/1     Running            0                42m
  nrf-57c79d9f65-fs9qj          1/1     Running            0                42m
  nssf-5b85b8978d-q8dz5         1/1     Running            0                42m
  pcf-758d7cfb48-wjfxf          1/1     Running            0                42m
  sd-core-kafka-0               1/1     Running            0                42m
  sd-core-zookeeper-0           1/1     Running            0                42m
  simapp-6cccd6f787-sd52q       0/1     Error              13 (5m14s ago)   42m
  smf-ff667d5b8-sw5vf           1/1     Running            0                42m
  udm-768b9987b4-cqvbg          1/1     Running            0                42m
  udr-8566897d45-n8cbz          1/1     Running            0                42m
  upf-0                         5/5     Running            0                42m
  webui-5894ffd49d-bdwf4        1/1     Running            0                42m


As shown above, the AMF and SIMAPP pods are failing. To see the specifics of the
problem, check the logs of the failing pods::

  $ kubectl -n omec logs amf-6dd746b9cd-2mk2j
  ...
  ...
  } (resolver returned new addresses)
  2023/01/24 17:24:56 INFO: [core] [Channel #1] Channel switches to new LB policy "pick_first"
  2023/01/24 17:24:56 INFO: [core] [Channel #1 SubChannel #2] Subchannel created
  2023/01/24 17:24:56 too many open files

As the message shows, the problem is due to "too many open files". To resolve this issue, you
can increase the maximum number of inotify instances and the maximum number of inotify watches
per user (e.g., by a factor of 10). To do so, first check the current limits::

  $ sysctl fs.inotify.max_user_instances
  fs.inotify.max_user_instances = 128
  $ sysctl fs.inotify.max_user_watches
  fs.inotify.max_user_watches = 1048576

Then increase these values by running::

  sudo sysctl fs.inotify.max_user_instances=1280
  sudo sysctl fs.inotify.max_user_watches=10485760


These settings are reset to their original values when the machine is rebooted. To make
the change permanent, create an override file in ``/etc/sysctl.d/`` (for example, with
``sudo nano /etc/sysctl.d/90-override.conf``) containing the following lines::

  fs.inotify.max_user_instances=1280
  fs.inotify.max_user_watches=10485760

Then load the changes without having to reboot the machine::

  sudo sysctl --system

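
Before or after making the change, you can check whether the running limits already meet the
suggested values of 1280 instances and 10485760 watches. The following is a minimal sketch. The
function name is hypothetical, it reads the limits from ``/proc/sys/fs/inotify`` (the same values
``sysctl`` reports), and the ``SYSCTL_ROOT`` variable exists only so the check can be pointed at
a test directory::

  # Hypothetical helper (not part of AiaB). Prints every inotify limit that is
  # below the suggested value and returns 1 if any were found, 0 otherwise.
  check_inotify_limits() {
    local root=${SYSCTL_ROOT:-/proc/sys} rc=0 entry key want have
    for entry in max_user_instances:1280 max_user_watches:10485760; do
      key=${entry%%:*}   # text before the colon
      want=${entry#*:}   # text after the colon
      have=$(cat "$root/fs/inotify/$key") || return 2
      if [ "$have" -lt "$want" ]; then
        echo "fs.inotify.$key is $have (suggested: $want)"
        rc=1
      fi
    done
    return "$rc"
  }

With the default limits shown earlier (128 and 1048576), ``check_inotify_limits`` reports both
entries and returns 1; after the override is loaded, it prints nothing and returns 0.
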

Data plane is not working
^^^^^^^^^^^^^^^^^^^^^^^^^

The first step is to read :ref:`Understanding AiaB networking <understanding_aiab_networking>`, which
gives a high-level picture of the AiaB data plane and how the pieces fit together. In order to debug
the problem, you will need to figure out where data plane packets from the eNodeB are dropped. One way
to do this is to run ``tcpdump`` on (1) ``DATA_IFACE`` to ensure that the data plane packets are arriving,
(2) the ``access`` interface to see that they make it to the UPF, and (3) the ``core`` interface to check
that they are forwarded upstream.

If the upstream packets don't make it to ``DATA_IFACE``, you probably need to add a static route
on the eNodeB so that packets to the UPF use the IP address of ``DATA_IFACE`` as their next hop.
You can see these upstream packets by running::

  tcpdump -i <data-iface> -n udp port 2152

If they don't make it to ``access``, you should check that the kernel routing table forwards
packets with destination 192.168.252.3 to the ``access`` interface. You can see them by running::

  tcpdump -i access -n udp port 2152

If packets are not forwarded from ``DATA_IFACE`` to the ``access`` interface, the following command
can be used to forward the traffic destined to 192.168.252.3::

  iptables -A FORWARD -d 192.168.252.3 -i <data-iface> -o access -j ACCEPT

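
The routing check and the ``iptables`` fix for the ``access`` interface can be combined into a
small script. This is a sketch under stated assumptions: ``192.168.252.3`` is the UPF ``access``
address used in the commands above, the function names are hypothetical, and ``DATA_IFACE`` must
be set to your actual interface name. ``ip route get`` prints the interface the kernel would use
for a destination, and ``iptables -C`` checks whether a rule already exists so that the rule is
not appended twice::

  # Hypothetical helpers (not part of AiaB).
  UPF_ACCESS_IP=192.168.252.3   # UPF access address from the commands above

  route_dev() {
    # Read `ip route get` output on stdin and print the word after "dev".
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
  }

  check_access_route() {
    # Succeeds if the kernel would send packets for the UPF out of "access".
    local dev
    dev=$(ip route get "$UPF_ACCESS_IP" | route_dev)
    echo "route to $UPF_ACCESS_IP uses: ${dev:-<none>}"
    [ "$dev" = access ]
  }

  allow_forward_to_access() {
    # Add the FORWARD rule from the text only if it is not already present.
    if [ -z "${DATA_IFACE:-}" ]; then
      echo "set DATA_IFACE first" >&2
      return 2
    fi
    local rule=(FORWARD -d "$UPF_ACCESS_IP" -i "$DATA_IFACE" -o access -j ACCEPT)
    sudo iptables -C "${rule[@]}" 2>/dev/null || sudo iptables -A "${rule[@]}"
  }

For example, ``check_access_route || DATA_IFACE=<data-iface> allow_forward_to_access`` only adds
the rule when the route check fails.
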

If they don't make it to ``core``, then they are being dropped by the UPF for some reason. This
may be a configuration issue with the state loaded in the ROC / SD-CORE: the UPF is being told
to discard these packets. You should check that the device's IMSI is part of a slice and that
the slice's policy settings allow traffic to that destination. You can watch for the packets on
the ``core`` interface with::

  tcpdump -i core -n net 172.250.0.0/16

That command captures all packets to and from the UE subnet.

If you cannot figure out the issue, see `Getting Help`_.


.. _rke2-vs-kubespray-install:

Getting Help
------------

Please introduce yourself and post your questions to the ``#aether-dev`` channel on the ONF Community Slack.
Details about how to join this channel can be found on the `ONF Wiki <https://wiki.opennetworking.org/display/COM/Aether>`_.
In your introduction, please state your institution and position, and describe why you are interested in Aether
and what your end goal is.

If you need help debugging your setup, please give as much detail as possible about
your environment: the OS version you have installed, whether you are running on bare metal or in a VM,
how much CPU and memory your server has, whether you are installing behind a proxy, and so on. Also list the steps
you have performed so far, and post any error messages you have received. These details will help the community
understand where you are and how to help you make progress.