..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Troubleshooting
===============

Firewalls and other host network issues
---------------------------------------

Unable to access a system
"""""""""""""""""""""""""

If the system sits behind another system (for example, the compute nodes behind
a management server) and you're trying to log in to it interactively, make sure
that you've enabled SSH agent forwarding in your ``~/.ssh/config`` file::

  Host mgmtserver1.prod.site.aetherproject.net
    ForwardAgent yes

If you still have problems after verifying that this is set up, run ssh with
the ``-v`` option, which prints the connection details and shows whether an
agent is used on the second ssh::

  onfadmin@mgmtserver1:~$ ssh onfadmin@node2.mgmt.prod.site.aetherproject.net
  debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
  debug1: channel 1: new [authentication agent connection]
  debug1: confirm auth-agent@openssh.com
  Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-56-generic x86_64)
  ...
  onfadmin@node2:~$

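Agent forwarding only helps if an agent is reachable and actually holds a key.
As a minimal sketch, the documented exit status of ``ssh-add -l`` can be used to
tell these cases apart. The ``agent_status`` function name is hypothetical and
is not part of any existing tooling:

```shell
# Report whether an SSH agent is reachable and holds keys.
# Uses the exit status of `ssh-add -l`:
#   0 = keys listed, 1 = agent reachable but empty, other = no agent reachable.
agent_status() {
    rc=0
    ssh-add -l >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "agent has keys" ;;
        1) echo "agent reachable but holds no keys" ;;
        *) echo "no agent reachable" ;;
    esac
}
```

Calling ``agent_status`` on your workstation and again on the management server
should give the same answer when forwarding works. If the second call reports
that no agent is reachable, the forwarding configuration is the likely cause.
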
Root/Public DNS port is blocked
"""""""""""""""""""""""""""""""

In some cases access to the public DNS root and other servers is blocked, which
prevents DNS lookups from working within the pod.

To resolve this, forwarding addresses on the local network can be provided in
the Ansible YAML ``host_vars`` file, using the ``unbound_forward_zones`` list
to configure the Unbound recursive nameserver. An example::

  unbound_forward_zones:
    - name: "."
      servers:
        - "8.8.8.8"
        - "8.8.4.4"

The items in the ``servers`` list would be the locally accessible nameservers.

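Before adding a nameserver to the ``servers`` list, it can help to confirm that
it answers queries from the management server. The following is a rough sketch
that assumes ``dig`` is installed. The ``check_forwarders`` function name and
the addresses in the usage note are placeholders, not values from this
deployment:

```shell
# Query each candidate forwarder once and report whether it returned an answer.
# Usage: check_forwarders NAME SERVER [SERVER...]
check_forwarders() {
    name=$1
    shift
    for server in "$@"; do
        # dig prints its own error messages as lines starting with ';',
        # so drop those and keep only real answers.
        answer=$(dig +short +time=2 +tries=1 "@$server" "$name" A 2>/dev/null \
            | grep -v '^;' || true)
        if [ -n "$answer" ]; then
            echo "$server ok"
        else
            echo "$server no-answer"
        fi
    done
}
```

For example, ``check_forwarders example.com 192.0.2.53 192.0.2.54`` prints one
line per server. Servers reported as ``no-answer`` are probably blocked or not
running a resolver.
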
Problems with OS installation
-----------------------------

iPXE doesn't load a menu when started
"""""""""""""""""""""""""""""""""""""

The URLs that iPXE provides when there is an error take you into its
documentation, which is of high quality and should explain the error in much
greater detail. For example, `https://ipxe.org/3e11623b
<https://ipxe.org/3e11623b>`_ explains that the DNS server address provided by
DHCP is not functional.

The most common failures are incorrect network settings, which should be shown
when the menu loads in step 4. If the menu does not load and you get an
``iPXE>`` shell prompt, type::

  config

You should then get the iPXE configuration screen, which lists all of the
configuration parameters discovered:

.. image:: images/mgmtsrv-007.png
   :alt: iPXE config menu
   :scale: 50%

Most likely there's something wrong with the network configuration provided by
DHCP. You can scroll this menu with the arrow keys to find all the settings
provided by the DHCP server, and SMBIOS information provided by the hardware.

OS installs, but doesn't boot
"""""""""""""""""""""""""""""

If you've completed the installation but the system won't start the OS, check
these BIOS settings:

- If the startup disk is NVMe, under ``Advanced -> PCIe/PCI/PnP Configuration``
  the option ``NVMe Firmware Source`` should be set to ``AMI Native Support``,
  per `Supermicro FAQ entry 28248
  <https://www.supermicro.com/support/faqs/faq.cfm?faq=28248>`_.

Unknown MAC addresses
---------------------

Sometimes it's hard to find all the MAC addresses assigned to network
cards. These can be found in a variety of ways:

1. On servers, the BMC webpage will list the built-in network card MAC
   addresses.

2. If you log in to a server, ``ip link`` or ``ip addr`` will show the MAC
   address of each interface, including on add-in cards.

3. If you can log in to a server but don't know the BMC IP or MAC address for
   that server, you can find it with ``sudo ipmitool lan print``.

4. If you don't have a login to the server, but can get to the management
   server, ``ip neighbor`` will show the ARP table of MAC addresses known to
   that system. Its output is unsorted; ``ip neigh | sort`` is easier to
   read. This can be useful for determining if there's a cabling problem:
   a device plugged into the wrong port of the management switch could show up
   in the DHCP pool range for a different segment.

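When the neighbor table is long, a small filter can make it easier to scan.
The sketch below reads ``ip neigh`` output on standard input and prints only
the entries that have a MAC address, as ``IP MAC INTERFACE``, sorted as text.
The ``neigh_table`` function name is hypothetical:

```shell
# Reduce `ip neigh` output to "IP MAC INTERFACE" lines, sorted as text.
# Entries without an "lladdr" field (for example FAILED ones) are skipped.
neigh_table() {
    awk '{
        mac = ""; dev = ""
        for (i = 1; i < NF; i++) {
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "lladdr") mac = $(i + 1)
        }
        if (mac != "") print $1, mac, dev
    }' | sort
}
```

On the management server this would be used as ``ip neigh | neigh_table``. An
address that appears in the DHCP range of an unexpected segment is a hint that
the device is plugged into the wrong switch port.
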
Cabling issues
--------------

The system may not come up correctly if cabling isn't connected properly.
If you don't have hands-on access to the cabling, here are some ways to check
it remotely:

1. On servers you can check which ports are connected with ``ip link show``::

     $ ip link show
     ...
     3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
     ...
     5: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a9 brd ff:ff:ff:ff:ff:ff

   Ports that are up will show ``state UP``.

2. You can determine which remote ports are connected with LLDP, assuming that
   the remote switch supports LLDP and has it enabled. This can be done with
   ``networkctl lldp``, which shows both the name and the MAC address of the
   connected switch on a per-link basis::

     $ networkctl lldp
     LINK  CHASSIS ID         SYSTEM NAME       CAPS         PORT ID  PORT DESCRIPTION
     eno1  10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  10       10
     eno2  10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  1        1

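On servers with many interfaces, the link check can be condensed to one line
per port. This sketch assumes the one-line output of ``ip -o link show``, where
each interface is printed on a single line, and uses a hypothetical
``link_states`` helper to extract the interface name and the value after
``state``:

```shell
# Reduce `ip -o link show` output to "INTERFACE STATE" lines.
# Prints UNKNOWN as the state if a line has no "state" field.
link_states() {
    awk '{
        name = $2
        sub(/:$/, "", name)          # strip the trailing colon from "eno1:"
        state = "UNKNOWN"
        for (i = 3; i < NF; i++) {
            if ($i == "state") state = $(i + 1)
        }
        print name, state
    }'
}
```

It would be run as ``ip -o link show | link_states``. Ports listed as ``DOWN``
that are expected to be cabled are the ones to check first.
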

Problems with ONIE installation
-------------------------------

Can't reboot into ONL, loops on ONIE installer mode
"""""""""""""""""""""""""""""""""""""""""""""""""""

Sometimes an ONL installation is incomplete or problematic, and reinstalling it
doesn't result in a working system.

If this is the case, reboot into ONIE Rescue mode and use ``parted`` to delete
all the ``ONL-`` prefixed partitions, then reinstall with an ``onie-installer``
image.

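To find which partition numbers to delete, the machine-readable output of
``parted`` can be filtered by partition name. This is only a sketch. It assumes
that the ``parted`` build in the rescue environment supports the ``-m`` option,
that the disk uses a GPT label so the sixth colon-separated field is the
partition name, and that the disk is ``/dev/sda``. The ``onl_partitions``
function name is hypothetical:

```shell
# Print the numbers of partitions whose name starts with "ONL-".
# Reads `parted -s -m DEVICE print` output on standard input, where each
# partition line looks like: number:start:end:size:filesystem:name:flags;
onl_partitions() {
    awk -F: '$1 ~ /^[0-9]+$/ && $6 ~ /^ONL-/ { print $1 }'
}
```

It would be run as ``parted -s -m /dev/sda print | onl_partitions``. Each number
printed can then be removed with ``parted /dev/sda rm`` followed by that number.
Review the list by eye first, since removing a partition is destructive.
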