bgpd: Fix NHT race with Connect leading to test tool issues
* The NHT change:
"bgpd, zebra: Use next hop tracking for connected routes too"
introduces a race where bgp_connect_check can be called on a peer in
Connect state before the TCP handshake has completed. If this happens,
then the SO_ERROR sockopt to check the state of the socket is undefined
or at least does not return a useful result - it returns 0, as with
a connected socket. SO_ERROR should only be called on non-block sockets
after the socket has been ready for writing.
The net result is that bgpd can then incorrectly advance the peer FSM for
the socket (also the main 'peer'), to OpenSent. As part of which, any
incoming connection from the peer will pass through collision_detect and
may be (incorrectly) closed, depending on the RIDs.
This race is reliably hit with testing tools which wait to listen for
incoming BGP connections from the RUT to know it is in Connect/Active, and
which ignore the TCP connection (no SYN|ACK, no RST), and then launch their
own connection.
The fix is to better integrate the BGP FSM and the NHT update, to ensure
connect_check is not called on peers in Connect state.
Note: There may be no need at all for NHT to tickle FSM.
* bgpd.h: Add NHT_Update FSM event for NHT valid.
* bgp_fsm.c: (bgp_fsm_nht_update) There is no need to have a separate
switch based FSM with its own event via an exported function. Have
NHT raise the NHT_Update even on the peer, instead of calling a
side-channel function into a sub-FSM in the FSM. No need to have
code for BGP_Start, FSM can call that. Actions for Connect and
Active are the same and just lead to ConnectRetry_timer_expired
event - so FSM can just call same transition func as that.
No need to call bgp_connect_check on Connect, as Connect implies no
connection.
(FSM) Handle the NHT_Update event, replacing bgp_fsm_nht_update.
Idle -> bgp_start, Connect and Active were doing the same as
ConnectRetry_timer_expired so replicate those. Rest are No-Ops.
* bgp_nht.c: (evaluate_paths) Raise NHT_Update FSM event. Always valid.
* bgp_packet.{c,h}: (bgp_connect_check) NHT change now unnecessary, revert.
6 files changed