..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Troubleshooting
===============

Firewalls and other host network issues
---------------------------------------

Unable to access a system
"""""""""""""""""""""""""

If it's a system behind another system (for example, the compute nodes
behind a management router) and you're trying to log in to it interactively,
make sure that you've enabled SSH Agent Forwarding in your ``~/.ssh/config``
file::

  Host mgmtserver1.prod.site.aetherproject.net
    ForwardAgent yes

If you still have problems after verifying that this is set up, run ssh with
the ``-v`` option, which prints the connection details and shows whether an
agent is used on the second ssh::

  onfadmin@mgmtserver1:~$ ssh onfadmin@node2.mgmt.prod.site.aetherproject.net
  debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
  debug1: channel 1: new [authentication agent connection]
  debug1: confirm auth-agent@openssh.com
  Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-56-generic x86_64)
  ...
  onfadmin@node2:~$
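
To check a captured log for that agent channel instead of reading it by eye, a
small shell helper can search for the ``auth-agent@openssh.com`` channel name
shown above. This is a minimal sketch: the function name ``agent_forwarded`` is
hypothetical, and the debug message wording is taken from the sample output, so
it may differ between OpenSSH versions.

```shell
# Hypothetical helper: reads `ssh -v` debug output on stdin and reports
# whether an agent-forwarding channel was opened during the session.
agent_forwarded() {
    if grep -q 'auth-agent@openssh.com'; then
        echo "agent forwarding in use"
    else
        echo "no agent forwarding seen"
        return 1
    fi
}
```

``ssh -v`` writes its debug messages to stderr, so they could be saved by adding
``2> ssh-debug.log`` (a hypothetical file name) to the first connection and
checked afterwards with ``agent_forwarded < ssh-debug.log``.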

Root/Public DNS port is blocked
"""""""""""""""""""""""""""""""

In some cases access to the public DNS root and other servers is blocked, which
prevents DNS queries from working within the pod.

To resolve this, forwarding addresses on the local network can be provided in
the Ansible YAML ``host_vars`` file, using the ``unbound_forward_zones`` list
to configure the Unbound recursive nameserver. An example::

  unbound_forward_zones:
    - name: "."
      servers:
        - "8.8.8.8"
        - "8.8.4.4"

The items in the ``servers`` list should be locally accessible nameservers.
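
Before adding an address to the ``servers`` list, it can help to confirm that
the nameserver actually answers from the management server. The sketch below
assumes that ``dig`` is installed; the function name ``check_forwarder`` and
the address in the usage note are hypothetical.

```shell
# Hypothetical helper: ask one nameserver for the root NS records and
# report whether any answer came back. Assumes `dig` is installed.
check_forwarder() {
    server="$1"
    # +short prints one record per line; lines starting with ';' are
    # dig's own status messages (for example, timeouts), not answers.
    if dig +short +time=2 +tries=1 @"$server" . NS | grep -v '^;' | grep -q .; then
        echo "$server: answers"
    else
        echo "$server: no answer"
        return 1
    fi
}
```

For example, ``check_forwarder 10.0.0.53`` would test a nameserver at that
made-up address. A server that fails this check is unlikely to work as a
forwarder for the ``"."`` zone.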

Problems with OS installation
-----------------------------

iPXE doesn't load a Menu when started
"""""""""""""""""""""""""""""""""""""

The URLs that iPXE provides if there is an error take you into its
documentation, which is of high quality and should explain the error in much
greater detail. For example, `https://ipxe.org/3e11623b
<https://ipxe.org/3e11623b>`_ explains that the DNS server address provided by
DHCP is not functional.

The most common failures are caused by incorrect network settings, which
should be shown when the menu loads in step 4. If the menu does not load, and
you get an ``iPXE>`` shell prompt, type::

  config

You should then get the iPXE configuration screen, which lists all of the
configuration parameters discovered:

  .. image:: images/mgmtsrv-007.png
     :alt: iPXE config menu
     :scale: 50%

Most likely there's something wrong with the network configuration provided by
DHCP. You can scroll this menu with the arrow keys to find all the settings
provided by the DHCP server, and SMBIOS information provided by the hardware.

OS installs, but doesn't boot
"""""""""""""""""""""""""""""

If you've completed the installation but the system won't start the OS, check
these BIOS settings:

- If the startup disk is NVMe, under ``Advanced -> PCIe/PCI/PnP Configuration``
  the option ``NVMe Firmware Source`` should be set to ``AMI Native Support``,
  per `Supermicro FAQ entry 28248
  <https://www.supermicro.com/support/faqs/faq.cfm?faq=28248>`_.

Unknown MAC addresses
---------------------

Sometimes it's hard to find out all the MAC addresses assigned to network
cards. These can be found in a variety of ways:

1. On servers, the BMC webpage will list the built-in network card MAC
   addresses.

2. If you log in to a server, ``ip link`` or ``ip addr`` will show the MAC
   address of each interface, including on add-in cards.

3. If you can log in to a server but don't know the BMC IP or MAC address for
   that server, you can find it with ``sudo ipmitool lan print``.

4. If you don't have a login to the server, but can get to the management
   server, ``ip neighbor`` will show the ARP table of MAC addresses known to
   that system. Its output is unsorted, so ``ip neigh | sort`` is easier to
   read. This can be useful for determining if there's a cabling problem:
   a device plugged into the wrong port of the management switch could show up
   in the DHCP pool range for a different segment.
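
A plain ``sort`` orders the addresses as text, so ``10.0.0.10`` lands before
``10.0.0.2``. As a sketch of one alternative, the hypothetical helper below
sorts IPv4 entries numerically by octet, which makes it easier to spot an
address that sits in the wrong DHCP range. The addresses in this paragraph are
made up for illustration.

```shell
# Hypothetical helper: reads `ip -4 neigh` output on stdin and sorts the
# entries numerically by each octet of the leading IPv4 address.
sort_neigh() {
    # -t . splits each line on dots; each -k N,Nn key compares one octet
    # as a number, so 10.0.0.2 sorts before 10.0.0.10.
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
}
```

It could be used as ``ip -4 neigh | sort_neigh``. IPv6 entries are left out by
the ``-4`` option because they contain no dotted octets to sort on.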

Cabling issues
--------------

The system may not come up correctly if cabling isn't connected properly.
If you don't have hands-on access to the cabling, here are some ways to check
it remotely:

1. On servers you can check which ports are connected with ``ip link show``::

     $ ip link show
     ...
     3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
     ...
     5: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a9 brd ff:ff:ff:ff:ff:ff

   Ports that are up will show ``state UP``.

2. You can determine which remote ports are connected with LLDP, assuming that
   the remote switch supports LLDP and has it enabled. This can be done with
   ``networkctl lldp``, which shows both the name and the MAC address of the
   connected switch on a per-link basis::

     $ networkctl lldp
     LINK  CHASSIS ID         SYSTEM NAME       CAPS         PORT ID  PORT DESCRIPTION
     eno1  10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  10       10
     eno2  10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  1        1
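
When a server has many interfaces, the ``state UP`` check from item 1 can be
filtered with a short helper. This is a sketch only: ``links_up`` is a
hypothetical name, and it assumes the ``ip link show`` output format shown
above.

```shell
# Hypothetical helper: reads `ip link show` output on stdin and prints the
# name of every interface whose line reports "state UP".
links_up() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "state" && $(i + 1) == "UP") {
                name = $2
                sub(/:$/, "", name)   # "eno1:" -> "eno1"
                print name
                break
            }
        }
    }'
}
```

It could be used as ``ip link show | links_up``. An interface that is expected
to be cabled but is missing from the result is a candidate for a cabling check.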

Problems with SD-Fabric
-----------------------

Refer to :ref:`SD-Fabric Troubleshooting Guide <sdfabric:troubleshooting:Troubleshooting Guide>`
for SD-Fabric related issues.

Management Network Issues
-------------------------

Cycling PoE port power on a HP/Aruba Management switch
""""""""""""""""""""""""""""""""""""""""""""""""""""""

You may need to cycle the power on a port if an eNB or monitoring device that
is powered by the PoE switch is not responding or is misbehaving.

To do this, log in to the switch and check which ports are receiving power::

  Aruba-2540-24G-PoEP-4SFPP# show power-over-ethernet brief

   Status and Configuration Information

    Available: 370 W  Used: 11 W  Remaining: 359 W

    PoE    Pwr  Pwr      Pre-std Alloc Alloc  PSE Pwr PD Pwr  PoE Port     PLC PLC
    Port   Enab Priority Detect  Cfg   Actual Rsrvd   Draw    Status       Cls Type
    ------ ---- -------- ------- ----- ------ ------- ------- ------------ --- ----
    1      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    2      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    3      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    4      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    5      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    6      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    7      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    8      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    9      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    10     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    11     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    12     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    13     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    14     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    15     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    16     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    17     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    18     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    19     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    20     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    21     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    22     Yes  low      off     usage usage  4.9 W   4.7 W   Delivering   3   1
    23     Yes  low      off     usage usage  6.0 W   5.7 W   Delivering   3   1
    24     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
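
On a switch with many ports, the rows that are actually delivering power can be
picked out of a saved copy of this output. The sketch below rests on several
assumptions: ``poe_delivering`` is a hypothetical name, and the field positions
are taken from the sample output above, so other firmware versions or stacked
switches may need different positions.

```shell
# Hypothetical helper: reads saved `show power-over-ethernet brief` output
# on stdin and prints each port that is delivering power, with its PD draw.
poe_delivering() {
    # In the sample layout, field 1 is the port number, fields 9-10 are the
    # "PD Pwr Draw" value and unit, and field 11 is the port status.
    awk '$1 ~ /^[0-9]+$/ && $11 == "Delivering" { print "port " $1 ": " $9 " " $10 }'
}
```

It could be used as ``poe_delivering < poe-brief.txt``, where ``poe-brief.txt``
is a hypothetical file holding the pasted switch output.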

For this example, if we want to reset port 23, run these commands to disable
the PoE power on the port::

  Aruba-2540-24G-PoEP-4SFPP# config
  Aruba-2540-24G-PoEP-4SFPP(config)# interface 23
  Aruba-2540-24G-PoEP-4SFPP(eth-23)# no power-over-ethernet
  Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23

   Status and Configuration Information for port 23

    Power Enable      : No                  PoE Port Status   : Disabled
    PLC Class/Type    : 0/-                 Priority Config   : low
    DLC Class/Type    : 0/-                 Pre-std Detect    : off
    Alloc By Config   : usage               Configured Type   :
    Alloc By Actual   : usage               PoE Value Config  : n/a


   PoE Counter Information

    Over Current Cnt  : 0                   MPS Absent Cnt    : 0
    Power Denied Cnt  : 0                   Short Cnt         : 0


   LLDP Information

    PSE Allocated Power Value : 0.0 W      PSE TLV Configured : dot3, MED
    PD Requested Power Value  : 0.0 W      PSE TLV Sent Type  : dot3
    MED LLDP Detect           : Disabled   PD TLV Sent Type   : n/a


   Power Information

    PSE Voltage       : 0.0 V               PSE Reserved Power : 0.0 W
    PD Amperage Draw  : 0 mA                PD Power Draw      : 0.0 W

At this point, the power has been removed from the device. To reenable it::

  Aruba-2540-24G-PoEP-4SFPP(eth-23)# power-over-ethernet
  Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23

   Status and Configuration Information for port 23

    Power Enable      : Yes                 PoE Port Status   : Delivering
    PLC Class/Type    : 3/1                 Priority Config   : low
    DLC Class/Type    : 0/-                 Pre-std Detect    : off
    Alloc By Config   : usage               Configured Type   :
    Alloc By Actual   : usage               PoE Value Config  : n/a


   PoE Counter Information

    Over Current Cnt  : 0                   MPS Absent Cnt    : 0
    Power Denied Cnt  : 0                   Short Cnt         : 0


   LLDP Information

    PSE Allocated Power Value : 0.0 W      PSE TLV Configured : dot3, MED
    PD Requested Power Value  : 0.0 W      PSE TLV Sent Type  : dot3
    MED LLDP Detect           : Disabled   PD TLV Sent Type   : n/a


   Power Information

    PSE Voltage       : 0.0 V               PSE Reserved Power : 0.1 W
    PD Amperage Draw  : 18 mA               PD Power Draw      : 0.0 W



   Refer to command's help option for field definitions

  Aruba-2540-24G-PoEP-4SFPP(eth-23)# exit
  Aruba-2540-24G-PoEP-4SFPP(config)# exit