..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Troubleshooting
===============

Firewalls and other host network issues
---------------------------------------

Unable to access a system
"""""""""""""""""""""""""

If the system is behind another system (for example, a compute node behind a
management server) and you're trying to log in to it interactively, make sure
that you've enabled SSH Agent Forwarding in your ``~/.ssh/config`` file::

    Host mgmtserver1.prod.site.aetherproject.net
      ForwardAgent yes

If you still have problems after verifying that this is set up, run the first
``ssh`` (to the management server) with the ``-v`` option. It prints the
connection details and, when you start the second ``ssh``, whether the
forwarded agent is used::

    onfadmin@mgmtserver1:~$ ssh onfadmin@node2.mgmt.prod.site.aetherproject.net
    debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
    debug1: channel 1: new [authentication agent connection]
    debug1: confirm auth-agent@openssh.com
    Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-56-generic x86_64)
    ...
    onfadmin@node2:~$
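
As a quick sanity check of the client configuration, the following Python
sketch reports whether ``ForwardAgent`` is enabled for a given host in
ssh_config-style text. The function name ``forward_agent_enabled`` is
hypothetical, and the parser is deliberately simplified: it assumes plain
``Host`` blocks with glob patterns and ignores ``Match``, ``Include`` and
negated patterns.

```python
from fnmatch import fnmatch


def forward_agent_enabled(config_text, hostname):
    """Return True if the first applicable ForwardAgent value is 'yes'.

    Simplified sketch: only ``Host`` blocks are understood.
    """
    applies = True  # options before the first Host line apply to all hosts
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # ssh_config allows "Keyword value" and "Keyword=value"
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip()
        if keyword == "host":
            applies = any(fnmatch(hostname, p) for p in value.split())
        elif keyword == "forwardagent" and applies:
            # ssh uses the first value it obtains for each option
            return value.lower() == "yes"
    return False  # ForwardAgent defaults to "no"


EXAMPLE_CONFIG = """\
Host mgmtserver1.prod.site.aetherproject.net
  ForwardAgent yes
"""

print(forward_agent_enabled(EXAMPLE_CONFIG,
                            "mgmtserver1.prod.site.aetherproject.net"))
```

If the OpenSSH client on your workstation supports it, ``ssh -G <hostname>``
prints the effective configuration (including the ``forwardagent`` value) and
is the more authoritative check.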

Root/Public DNS port is blocked
"""""""""""""""""""""""""""""""

In some cases access to the public DNS root servers and other public
nameservers is blocked, which prevents DNS queries from working within the pod.

To resolve this, provide forwarding addresses on the local network in the
Ansible YAML ``host_vars`` file, using the ``unbound_forward_zones`` list to
configure the Unbound recursive nameserver. An example::

    unbound_forward_zones:
      - name: "."
        servers:
          - "8.8.8.8"
          - "8.8.4.4"

The items in the ``servers`` list should be locally accessible nameservers.
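
To make the effect of this setting more concrete, the Python sketch below
renders the same data structure as ``forward-zone:`` clauses in
``unbound.conf`` syntax and rejects entries that are not valid IP addresses.
This is only an illustration: the function name ``render_forward_zones`` is
hypothetical, and the template actually used by the Ansible role is not shown
in this document and may differ.

```python
import ipaddress


def render_forward_zones(zones):
    """Render a list of {"name": ..., "servers": [...]} dicts as
    unbound.conf forward-zone clauses."""
    lines = []
    for zone in zones:
        lines.append("forward-zone:")
        lines.append(f'    name: "{zone["name"]}"')
        for server in zone["servers"]:
            # Raises ValueError if the entry is not an IPv4/IPv6 address
            ipaddress.ip_address(server)
            lines.append(f"    forward-addr: {server}")
    return "\n".join(lines)


# Same values as the host_vars example above
unbound_forward_zones = [
    {"name": ".", "servers": ["8.8.8.8", "8.8.4.4"]},
]

print(render_forward_zones(unbound_forward_zones))
```

A zone named ``"."`` forwards every query, so the listed servers must be able
to resolve all names the pod needs.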

Problems with OS installation
-----------------------------

iPXE doesn't load a Menu when started
"""""""""""""""""""""""""""""""""""""

The URLs that iPXE provides when there is an error take you into its
documentation, which is of high quality and should explain the error in much
greater detail. For example, `https://ipxe.org/3e11623b
<https://ipxe.org/3e11623b>`_ explains that the DNS server address provided by
DHCP is not functional.

The most common failures are caused by incorrect network settings, which
should be shown when the menu loads in step 4. If the menu does not load, and
you get an ``iPXE>`` shell prompt, type::

    config

You should then get the iPXE configuration screen, which lists all of the
configuration parameters discovered:

.. image:: images/mgmtsrv-007.png
   :alt: iPXE config menu
   :scale: 50%

Most likely there's something wrong with the network configuration provided by
DHCP. You can scroll this menu with the arrow keys to find all the settings
provided by the DHCP server, and the SMBIOS information provided by the
hardware.

OS installs, but doesn't boot
"""""""""""""""""""""""""""""

If you've completed the installation but the system won't start the OS, check
these BIOS settings:

- If the startup disk is NVMe, under ``Advanced -> PCIe/PCI/PnP Configuration``
  the option ``NVMe Firmware Source`` should be set to ``AMI Native Support``,
  per `Supermicro FAQ entry 28248
  <https://www.supermicro.com/support/faqs/faq.cfm?faq=28248>`_.

Unknown MAC addresses
---------------------

Sometimes it's hard to find out all the MAC addresses assigned to network
cards. These can be found in a variety of ways:

1. On servers, the BMC webpage will list the built-in network card MAC
   addresses.

2. If you log in to a server, ``ip link`` or ``ip addr`` will show the MAC
   address of each interface, including on add-in cards.

3. If you can log in to a server but don't know the BMC IP or MAC address for
   that server, you can find it with ``sudo ipmitool lan print``.

4. If you don't have a login to the server, but can get to the management
   server, ``ip neighbor`` will show the ARP table of MAC addresses known to
   that system. Its output is unsorted, so ``ip neigh | sort`` is easier to
   read. This can be useful for determining if there's a cabling problem:
   a device plugged into the wrong port of the management switch could show up
   in the DHCP pool range for a different segment.
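
A plain ``sort`` orders addresses as text, so ``10.0.0.20`` comes before
``10.0.0.3``. The following Python sketch parses ``ip neigh`` output and sorts
it numerically instead, which makes it easier to see which DHCP pool range a
device landed in. The function name ``parse_neighbors`` and the sample
addresses are made up for illustration, and the parser assumes the usual
``<ip> dev <if> lladdr <mac> <state>`` line layout.

```python
import ipaddress


def parse_neighbors(output):
    """Return (ip, mac, device) tuples from `ip neigh` text, sorted by IP."""
    entries = []
    for line in output.splitlines():
        tokens = line.split()
        if "lladdr" not in tokens or "dev" not in tokens:
            continue  # e.g. FAILED entries carry no MAC address
        ip = ipaddress.ip_address(tokens[0])
        mac = tokens[tokens.index("lladdr") + 1]
        dev = tokens[tokens.index("dev") + 1]
        entries.append((ip, mac, dev))
    # Sort IPv4 before IPv6, then numerically within each family
    entries.sort(key=lambda entry: (entry[0].version, int(entry[0])))
    return [(str(ip), mac, dev) for ip, mac, dev in entries]


# Hypothetical sample in the format printed by `ip neigh`
SAMPLE = """\
10.0.0.20 dev eno1 lladdr 3c:ec:ef:4d:55:a9 STALE
10.0.0.3 dev eno1 lladdr 3c:ec:ef:4d:55:a8 REACHABLE
10.0.0.7 dev eno1  FAILED
"""

for ip, mac, dev in parse_neighbors(SAMPLE):
    print(f"{ip:<15} {mac} {dev}")
```

On the management server, the captured output of ``ip neigh`` can be passed to
the function in place of ``SAMPLE``.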

Cabling issues
--------------

The system may not come up correctly if cabling isn't connected properly.
If you don't have hands-on access to the cabling, here are some ways to check
it remotely:

1. On servers you can check which ports are connected with ``ip link show``::

     $ ip link show
     ...
     3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
     ...
     5: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a9 brd ff:ff:ff:ff:ff:ff

   Ports that are up will show ``state UP``.

2. You can determine which remote ports are connected with LLDP, assuming that
   the remote switch supports LLDP and has it enabled. This can be done with
   ``networkctl lldp``, which shows both the name and the MAC address of the
   connected switch on a per-link basis::

     $ networkctl lldp
     LINK             CHASSIS ID        SYSTEM NAME      CAPS        PORT ID           PORT DESCRIPTION
     eno1             10:4f:58:e7:d5:60 Aruba-2540-24…PP ..b........ 10                10
     eno2             10:4f:58:e7:d5:60 Aruba-2540-24…PP ..b........ 1                 1
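
When many servers need to be checked, the ``ip link show`` output from the
first step can be summarized with a short script. The Python sketch below
extracts the state, carrier flag and MAC address per interface from the text
format shown above. The function name ``parse_ip_link`` is hypothetical, and
the regular expression only covers the header layout seen in this example.

```python
import re

# Matches header lines such as "3: eno1: <FLAGS> ... state UP ..."
HEADER = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@[^:\s]+)?:\s+"
    r"<(?P<flags>[^>]*)>.*\bstate\s+(?P<state>\S+)"
)


def parse_ip_link(output):
    """Map interface name -> {"state", "carrier", "mac"}."""
    links = {}
    current = None
    for line in output.splitlines():
        header = HEADER.match(line)
        if header:
            current = {
                "state": header["state"],
                # LOWER_UP means the physical link is detected
                "carrier": "LOWER_UP" in header["flags"].split(","),
                "mac": None,
            }
            links[header["name"]] = current
        elif current is not None and line.strip().startswith("link/ether"):
            current["mac"] = line.split()[1]
    return links


SAMPLE = """\
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
5: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 3c:ec:ef:4d:55:a9 brd ff:ff:ff:ff:ff:ff
"""

for name, info in parse_ip_link(SAMPLE).items():
    print(name, info["state"], info["mac"])
```

Newer iproute2 releases can also emit JSON (``ip -json link show``), which
avoids text parsing where it is available.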

Problems with ONIE Installation
-------------------------------

Can't reboot into ONL, loops on ONIE installer mode
"""""""""""""""""""""""""""""""""""""""""""""""""""

Sometimes an ONL installation is incomplete or problematic, and reinstalling it
doesn't result in a working system.

If this is the case, reboot into ONIE Rescue mode and use ``parted`` to delete
all the ``ONL-`` prefixed partitions, then reinstall with an ``onie-installer``
image.

Management Network Issues
-------------------------

Cycling PoE port power on an HP/Aruba Management switch
"""""""""""""""""""""""""""""""""""""""""""""""""""""""

You may need to cycle the power on a port if an eNB or monitoring device that
is powered by the PoE switch is not responding or is misbehaving.

To do this, log in to the switch and check which ports are receiving power::

    Aruba-2540-24G-PoEP-4SFPP# show power-over-ethernet brief

     Status and Configuration Information

      Available: 370 W  Used: 11 W  Remaining: 359 W

      PoE    Pwr  Pwr      Pre-std Alloc Alloc  PSE Pwr PD Pwr  PoE Port     PLC PLC
      Port   Enab Priority Detect  Cfg   Actual Rsrvd   Draw    Status       Cls Type
      ------ ---- -------- ------- ----- ------ ------- ------- ------------ --- ----
      1      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      2      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      3      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      4      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      5      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      6      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      7      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      8      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      9      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      10     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      11     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      12     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      13     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      14     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      15     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      16     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      17     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      18     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      19     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      20     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      21     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
      22     Yes  low      off     usage usage  4.9 W   4.7 W   Delivering   3   1
      23     Yes  low      off     usage usage  6.0 W   5.7 W   Delivering   3   1
      24     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -

In this example, to reset port 23, run these commands to disable the PoE power
on the port::

    Aruba-2540-24G-PoEP-4SFPP# config
    Aruba-2540-24G-PoEP-4SFPP(config)# interface 23
    Aruba-2540-24G-PoEP-4SFPP(eth-23)# no power-over-ethernet
    Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23

     Status and Configuration Information for port 23

      Power Enable      : No                  PoE Port Status   : Disabled
      PLC Class/Type    : 0/-                 Priority Config   : low
      DLC Class/Type    : 0/-                 Pre-std Detect    : off
      Alloc By Config   : usage               Configured Type   :
      Alloc By Actual   : usage               PoE Value Config  : n/a

     PoE Counter Information

      Over Current Cnt  : 0                   MPS Absent Cnt    : 0
      Power Denied Cnt  : 0                   Short Cnt         : 0

     LLDP Information

      PSE Allocated Power Value : 0.0 W       PSE TLV Configured : dot3, MED
      PD Requested Power Value  : 0.0 W       PSE TLV Sent Type  : dot3
      MED LLDP Detect           : Disabled    PD TLV Sent Type   : n/a

     Power Information

      PSE Voltage       : 0.0 V               PSE Reserved Power : 0.0 W
      PD Amperage Draw  : 0 mA                PD Power Draw      : 0.0 W

At this point, the power has been removed from the device. To re-enable it::

    Aruba-2540-24G-PoEP-4SFPP(eth-23)# power-over-ethernet
    Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23

     Status and Configuration Information for port 23

      Power Enable      : Yes                 PoE Port Status   : Delivering
      PLC Class/Type    : 3/1                 Priority Config   : low
      DLC Class/Type    : 0/-                 Pre-std Detect    : off
      Alloc By Config   : usage               Configured Type   :
      Alloc By Actual   : usage               PoE Value Config  : n/a

     PoE Counter Information

      Over Current Cnt  : 0                   MPS Absent Cnt    : 0
      Power Denied Cnt  : 0                   Short Cnt         : 0

     LLDP Information

      PSE Allocated Power Value : 0.0 W       PSE TLV Configured : dot3, MED
      PD Requested Power Value  : 0.0 W       PSE TLV Sent Type  : dot3
      MED LLDP Detect           : Disabled    PD TLV Sent Type   : n/a

     Power Information

      PSE Voltage       : 0.0 V               PSE Reserved Power : 0.1 W
      PD Amperage Draw  : 18 mA               PD Power Draw      : 0.0 W

     Refer to command's help option for field definitions

    Aruba-2540-24G-PoEP-4SFPP(eth-23)# exit
    Aruba-2540-24G-PoEP-4SFPP(config)# exit
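
The first step of this procedure, finding the ports that are currently
delivering power, can be automated once the ``show power-over-ethernet brief``
output has been captured as text. The Python sketch below is one way to do
that. The function name ``delivering_ports`` is hypothetical, and the parser
assumes the exact 13-field row layout shown above, with numeric port names and
a single-word status, which may not hold on other switch models or firmware
versions.

```python
def delivering_ports(brief_output):
    """Return {port: PD power draw in watts} for ports in Delivering state."""
    ports = {}
    for line in brief_output.splitlines():
        fields = line.split()
        # A data row splits into 13 fields, for example:
        # 22 Yes low off usage usage 4.9 W 4.7 W Delivering 3 1
        if len(fields) != 13 or not fields[0].isdigit():
            continue  # skip headers, separators and blank lines
        if fields[10] == "Delivering":
            ports[int(fields[0])] = float(fields[8])
    return ports


SAMPLE = """\
  Port   Enab Priority Detect  Cfg   Actual Rsrvd   Draw    Status       Cls Type
  ------ ---- -------- ------- ----- ------ ------- ------- ------------ --- ----
  21     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
  22     Yes  low      off     usage usage  4.9 W   4.7 W   Delivering   3   1
  23     Yes  low      off     usage usage  6.0 W   5.7 W   Delivering   3   1
  24     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
"""

print(delivering_ports(SAMPLE))
```

Ports that are expected to power a device but are missing from the result are
candidates for the power cycle described above.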