bgpd: Fix NHT race with Connect leading to test tool issues

* The NHT change:

   "bgpd, zebra: Use next hop tracking for connected routes too"

  introduces a race where bgp_connect_check can be called on a peer in
  Connect state before the TCP handshake has completed. If this happens,
  then the SO_ERROR sockopt to check the state of the socket is undefined
  or at least does not return a useful result - it returns 0, as with
  a connected socket. SO_ERROR should only be called on non-block sockets
  after the socket has been ready for writing.

  The net result is that bgpd can then incorrectly advance the peer FSM for
  the socket (also the main 'peer'), to OpenSent. As part of which, any
  incoming connection from the peer will pass through collision_detect and
  may be (incorrectly) closed, depending on the RIDs.

  This race is reliably hit with testing tools which wait to listen for
  incoming BGP connections from the RUT to know it is in Connect/Active, and
  which ignore the TCP connection (no SYN|ACK, no RST), and then launch their
  own connection.

  The fix is to better integrate the BGP FSM and the NHT update, to ensure
  connect_check is not called on peers in Connect state.

  Note: There may be no need at all for NHT to tickle FSM.

* bgpd.h: Add NHT_Update FSM event for NHT valid.

* bgp_fsm.c: (bgp_fsm_nht_update) There is no need to have a separate
  switch based FSM with its own event via an exported function.  Have
  NHT raise the NHT_Update even on the peer, instead of calling a
  side-channel function into a sub-FSM in the FSM.  No need to have
  code for BGP_Start, FSM can call that.  Actions for Connect and
  Active are the same and just lead to ConnectRetry_timer_expired
  event - so FSM can just call same transition func as that.

  No need to call bgp_connect_check on Connect, as Connect implies no
  connection.

  (FSM) Handle the NHT_Update event, replacing bgp_fsm_nht_update.
  Idle -> bgp_start, Connect and Active were doing the same as
  ConnectRetry_timer_expired so replicate those. Rest are No-Ops.

* bgp_nht.c: (evaluate_paths) Raise NHT_Update FSM event. Always valid.
* bgp_packet.{c,h}: (bgp_connect_check) NHT change now unnecessary, revert.
6 files changed