..
   SPDX-FileCopyrightText: © 2020 Open Networking Foundation <support@opennetworking.org>
   SPDX-License-Identifier: Apache-2.0

Troubleshooting
===============

Firewalls and other host network issues
---------------------------------------

Unable to access a system
"""""""""""""""""""""""""

If the system sits behind another system (for example, a compute node behind a
management server) and you're trying to log in to it interactively, make sure
that you've enabled SSH agent forwarding in your ``~/.ssh/config`` file::

  Host mgmtserver1.prod.site.aetherproject.net
    ForwardAgent yes

If you still have problems after verifying that this is set up, run ``ssh``
with the ``-v`` option on the second hop. It prints the connection details,
including whether an agent is used::

  onfadmin@mgmtserver1:~$ ssh -v onfadmin@node2.mgmt.prod.site.aetherproject.net
  debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
  debug1: channel 1: new [authentication agent connection]
  debug1: confirm auth-agent@openssh.com
  Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-56-generic x86_64)
  ...
  onfadmin@node2:~$

Root/Public DNS port is blocked
"""""""""""""""""""""""""""""""

In some cases access to the public DNS root servers and other public
nameservers is blocked, which prevents DNS queries from working within the pod.

To resolve this, provide forwarding addresses on the local network in the
Ansible YAML ``host_vars`` file, using the ``unbound_forward_zones`` list to
configure the Unbound recursive nameserver. An example::

  unbound_forward_zones:
    - name: "."
      servers:
        - "8.8.8.8"
        - "8.8.4.4"

The items in the ``servers`` list should be locally accessible nameservers.
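
Before adding an address to ``servers``, it can help to confirm that the
candidate nameserver actually answers queries from inside the pod. The
following is a minimal Python sketch, not part of the Ansible roles: the
function names, the ``example.com`` test name, and the 2-second timeout are
illustrative assumptions. It hand-builds a single DNS ``A`` query and reports
whether a reply comes back over UDP:

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label is prefixed with its length
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    )
    # Question: name, terminating zero byte, QTYPE=A (1), QCLASS=IN (1)
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def nameserver_answers(server, hostname="example.com", port=53, timeout=2.0):
    """Return True if server replies to a UDP DNS query within timeout."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable networks
            return False
    # A valid reply echoes the query ID and has the response (QR) bit set
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling ``nameserver_answers("192.0.2.53")`` (a placeholder address from the
documentation range; substitute your local nameserver) from the management
server returns ``False`` when port 53 is filtered or nothing is listening.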

Problems with OS installation
-----------------------------

iPXE doesn't load a Menu when started
"""""""""""""""""""""""""""""""""""""

The URLs that iPXE prints when there is an error lead to its documentation,
which is of high quality and explains the error in much greater detail. For
example, `https://ipxe.org/3e11623b <https://ipxe.org/3e11623b>`_ explains
that the DNS server address provided by DHCP is not functional.

The most common failures are incorrect network settings, which should be shown
when the menu loads in step 4. If the menu does not load and you get an
``iPXE>`` shell prompt, type::

  config

You should then get the iPXE configuration screen, which lists all of the
configuration parameters discovered:

  .. image:: images/mgmtsrv-007.png
     :alt: iPXE config menu
     :scale: 50%

Most likely there's something wrong with the network configuration provided by
DHCP. You can scroll this menu with the arrow keys to find all the settings
provided by the DHCP server, and the SMBIOS information provided by the
hardware.

OS installs, but doesn't boot
"""""""""""""""""""""""""""""

If you've completed the installation but the system won't start the OS, check
these BIOS settings:

- If the startup disk is NVMe, under ``Advanced -> PCIe/PCI/PnP Configuration``
  the option ``NVMe Firmware Source`` should be set to ``AMI Native Support``,
  per `Supermicro FAQ entry 28248
  <https://www.supermicro.com/support/faqs/faq.cfm?faq=28248>`_.

Unknown MAC addresses
---------------------

Sometimes it's hard to find all the MAC addresses assigned to network cards.
They can be found in a variety of ways:

1. On servers, the BMC webpage lists the built-in network card MAC
   addresses.

2. If you log in to a server, ``ip link`` or ``ip addr`` shows the MAC
   address of each interface, including those on add-in cards.

3. If you can log in to a server but don't know the BMC IP or MAC address for
   that server, you can find it with ``sudo ipmitool lan print``.

4. If you don't have a login on the server but can get to the management
   server, ``ip neighbor`` shows the ARP table of MAC addresses known to
   that system. Its output is unsorted, so ``ip neigh | sort`` is easier to
   read. This can be useful for determining whether there's a cabling problem:
   a device plugged into the wrong port of the management switch could show up
   in the DHCP pool range for a different segment.
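
A plain ``sort`` orders addresses as text, so ``10.0.0.10`` lands before
``10.0.0.2``. When scanning a large table for a device in the wrong DHCP range,
a numeric ordering can be easier to read. The sketch below is one hedged way to
do that in Python; the function name is hypothetical, and it assumes the usual
``ip neigh`` line layout of ``<ip> dev <ifname> lladdr <mac> <state>``:

```python
import ipaddress


def neighbors_by_ip(ip_neigh_output):
    """Return (ip, mac, device) tuples from `ip neigh` text, sorted numerically by IP."""
    entries = []
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        if "lladdr" not in fields:
            continue  # blank lines and FAILED/INCOMPLETE entries carry no MAC
        address = ipaddress.ip_address(fields[0])
        mac = fields[fields.index("lladdr") + 1]
        device = fields[fields.index("dev") + 1] if "dev" in fields else ""
        entries.append((address, mac, device))
    # Sort IPv4 before IPv6, then by numeric address value
    entries.sort(key=lambda entry: (entry[0].version, int(entry[0])))
    return [(str(address), mac, device) for address, mac, device in entries]
```

The text to parse can be captured on the management server with, for example,
``subprocess.run(["ip", "neigh"], capture_output=True, text=True).stdout``.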

Cabling issues
--------------

The system may not come up correctly if cabling isn't connected properly.
If you don't have hands-on access to the cabling, here are some ways to check
it remotely:

1. On servers you can check which ports are connected with ``ip link show``::

     $ ip link show
     ...
     3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a8 brd ff:ff:ff:ff:ff:ff
     ...
     5: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 3c:ec:ef:4d:55:a9 brd ff:ff:ff:ff:ff:ff

   Ports that are up will show ``state UP``.

2. You can determine which remote ports are connected with LLDP, assuming that
   the remote switch supports LLDP and has it enabled. This can be done with
   ``networkctl lldp``, which shows both the name and the MAC address of the
   connected switch on a per-link basis::

     $ networkctl lldp
     LINK    CHASSIS ID         SYSTEM NAME       CAPS         PORT ID  PORT DESCRIPTION
     eno1    10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  10       10
     eno2    10:4f:58:e7:d5:60  Aruba-2540-24…PP  ..b........  1        1
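
When checking many servers, the ``state`` field from ``ip link show`` can be
extracted with a short script rather than read by eye. The following Python
sketch is only an illustration: ``link_states`` and ``LINK_RE`` are
hypothetical names, and the pattern assumes the output layout shown in the
first item above. It also reports the ``LOWER_UP`` flag, which indicates that
the interface detects a carrier on the cable:

```python
import re

# Matches the first line of each interface stanza, for example:
# "3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ... state UP mode ..."
LINK_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)[^:]*:\s+<(?P<flags>[^>]*)>.*\bstate\s+(?P<state>\S+)"
)


def link_states(ip_link_output):
    """Return {interface: (state, has_carrier)} parsed from `ip link show` text."""
    states = {}
    for line in ip_link_output.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            flags = match.group("flags").split(",")
            states[match.group("name")] = (match.group("state"), "LOWER_UP" in flags)
    return states
```

Applied to the sample output above, this yields ``("UP", True)`` for ``eno1``
and ``("DOWN", False)`` for ``eno2``.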

Problems with ONIE Installation
-------------------------------

Can't reboot into ONL, loops on ONIE installer mode
"""""""""""""""""""""""""""""""""""""""""""""""""""

Sometimes an ONL installation is incomplete or problematic, and reinstalling it
doesn't result in a working system.

If this is the case, reboot into ONIE Rescue mode and use ``parted`` to delete
all the ``ONL-`` prefixed partitions, then reinstall with an ``onie-installer``
image.

Management Network Issues
-------------------------

Cycling PoE port power on an HP/Aruba Management switch
"""""""""""""""""""""""""""""""""""""""""""""""""""""""

You may need to cycle the power on a port if an eNB or monitoring device that
is powered by the PoE switch is not responding or is misbehaving.

To do this, log in to the switch and check which ports are receiving power::

  Aruba-2540-24G-PoEP-4SFPP# show power-over-ethernet brief

   Status and Configuration Information

    Available: 370 W  Used: 11 W  Remaining: 359 W

    PoE    Pwr  Pwr      Pre-std Alloc Alloc  PSE Pwr PD Pwr  PoE Port     PLC PLC
    Port   Enab Priority Detect  Cfg   Actual Rsrvd   Draw    Status       Cls Type
    ------ ---- -------- ------- ----- ------ ------- ------- ------------ --- ----
    1      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    2      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    3      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    4      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    5      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    6      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    7      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    8      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    9      Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    10     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    11     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    12     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    13     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    14     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    15     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    16     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    17     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    18     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    19     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    20     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    21     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
    22     Yes  low      off     usage usage  4.9 W   4.7 W   Delivering   3   1
    23     Yes  low      off     usage usage  6.0 W   5.7 W   Delivering   3   1
    24     Yes  low      off     usage usage  0.0 W   0.0 W   Searching    0   -
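
If the ``brief`` output has been captured to a file or terminal log, the ports
that are actually powering a device can be picked out with a few lines of
Python. This is a rough sketch, not a supported tool: ``delivering_ports`` is a
hypothetical name, and the field positions assume exactly the column layout
shown above, which may differ on other switch models or firmware versions:

```python
def delivering_ports(poe_brief_output):
    """Return {port: PD power draw in watts} for ports whose status is Delivering."""
    ports = {}
    for line in poe_brief_output.splitlines():
        fields = line.split()
        # Data rows start with a port number; headers and separator rules do not
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        # After whitespace splitting: fields[8] is the "PD Pwr Draw" value and
        # fields[10] is the "PoE Port Status" column
        if fields[10] == "Delivering":
            ports[int(fields[0])] = float(fields[8])
    return ports
```

For the table above this returns ports 22 and 23 with draws of 4.7 W and 5.7 W.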

For this example, to reset port 23, run these commands to disable PoE power on
the port::

  Aruba-2540-24G-PoEP-4SFPP# config
  Aruba-2540-24G-PoEP-4SFPP(config)# interface 23
  Aruba-2540-24G-PoEP-4SFPP(eth-23)# no power-over-ethernet
  Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23

   Status and Configuration Information for port 23

    Power Enable      : No                  PoE Port Status   : Disabled
    PLC Class/Type    : 0/-                 Priority Config   : low
    DLC Class/Type    : 0/-                 Pre-std Detect    : off
    Alloc By Config   : usage               Configured Type   :
    Alloc By Actual   : usage               PoE Value Config  : n/a


   PoE Counter Information

    Over Current Cnt  : 0                   MPS Absent Cnt    : 0
    Power Denied Cnt  : 0                   Short Cnt         : 0


   LLDP Information

    PSE Allocated Power Value : 0.0 W       PSE TLV Configured : dot3, MED
    PD Requested Power Value  : 0.0 W       PSE TLV Sent Type  : dot3
    MED LLDP Detect           : Disabled    PD TLV Sent Type   : n/a


   Power Information

    PSE Voltage       : 0.0 V               PSE Reserved Power : 0.0 W
    PD Amperage Draw  : 0 mA                PD Power Draw      : 0.0 W

At this point, power has been removed from the device. To re-enable it::

  Aruba-2540-24G-PoEP-4SFPP(eth-23)# power-over-ethernet
  Aruba-2540-24G-PoEP-4SFPP(eth-23)# show power-over-ethernet ethernet 23

   Status and Configuration Information for port 23

    Power Enable      : Yes                 PoE Port Status   : Delivering
    PLC Class/Type    : 3/1                 Priority Config   : low
    DLC Class/Type    : 0/-                 Pre-std Detect    : off
    Alloc By Config   : usage               Configured Type   :
    Alloc By Actual   : usage               PoE Value Config  : n/a


   PoE Counter Information

    Over Current Cnt  : 0                   MPS Absent Cnt    : 0
    Power Denied Cnt  : 0                   Short Cnt         : 0


   LLDP Information

    PSE Allocated Power Value : 0.0 W       PSE TLV Configured : dot3, MED
    PD Requested Power Value  : 0.0 W       PSE TLV Sent Type  : dot3
    MED LLDP Detect           : Disabled    PD TLV Sent Type   : n/a


   Power Information

    PSE Voltage       : 0.0 V               PSE Reserved Power : 0.1 W
    PD Amperage Draw  : 18 mA               PD Power Draw      : 0.0 W



   Refer to command's help option for field definitions

  Aruba-2540-24G-PoEP-4SFPP(eth-23)# exit
  Aruba-2540-24G-PoEP-4SFPP(config)# exit