bgpd: Fix race in clearing completion

When a peer that is Established goes down, it is moved into the Clearing
state so that the routes received from the peer can be cleared: they are
removed from the RIB, best paths are reselected, and the corresponding
updates/deletes are sent to Zebra and to other peers. At the end of this,
a Clearing_Completed event is generated to the FSM, which allows the peer
to move out of Clearing to Idle.

The issue in the code is that multiple Clearing_Completed events may be
generated for a peer, one per AFI/SAFI. Upon the first such event, the
peer moves to Idle. If other events occur (e.g., a new connection gets
established) before the last Clearing_Completed event is received, the
remaining events reach the FSM in the wrong state and can drive
incorrect transitions.
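
To illustrate the race, here is a minimal, self-contained sketch (the
type and function names are invented for the example; the real code
uses bgpd's work-queue and FSM machinery):

    #include <stdio.h>

    /* Simplified stand-ins for the peer and its FSM state. */
    enum demo_state { ESTABLISHED, CLEARING, IDLE };

    struct demo_peer
    {
      enum demo_state state;
    };

    /* Runs when one AFI/SAFI's clearing work drains.  In the old code,
     * every such completion generated a Clearing_Completed event. */
    static void
    clearing_done_for_afi_safi (struct demo_peer *peer)
    {
      if (peer->state == CLEARING)
        {
          /* First completion: the peer legitimately moves to Idle. */
          peer->state = IDLE;
          printf ("Clearing_Completed: peer now Idle\n");
          return;
        }
      /* Later completions arrive after the peer has already left
       * Clearing - e.g. a new connection came up - and the stray
       * event reaches the FSM in the wrong state. */
      printf ("stray Clearing_Completed, peer state %d\n", peer->state);
    }

    int
    main (void)
    {
      struct demo_peer peer = { CLEARING };

      clearing_done_for_afi_safi (&peer); /* IPv4 unicast drains */
      peer.state = ESTABLISHED;           /* new session comes up */
      clearing_done_for_afi_safi (&peer); /* IPv6 unicast drains: stray */
      return 0;
    }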

Fix this by ensuring that only one Clearing_Completed event is generated.
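
One way to guarantee a single event - a sketch only, with invented
names, not the actual bgpd change - is to count the outstanding
per-AFI/SAFI clears and raise Clearing_Completed only when the last
one finishes:

    #include <stdio.h>

    /* Hypothetical per-peer count of AFI/SAFI tables still clearing. */
    struct demo_peer
    {
      int clearing_count;
    };

    static void
    clearing_started (struct demo_peer *peer)
    {
      peer->clearing_count++;
    }

    static void
    clearing_finished (struct demo_peer *peer)
    {
      /* Generate Clearing_Completed only for the last outstanding
       * AFI/SAFI table. */
      if (--peer->clearing_count == 0)
        printf ("Clearing_Completed (generated exactly once)\n");
    }

    int
    main (void)
    {
      struct demo_peer peer = { 0 };

      clearing_started (&peer);  /* IPv4 unicast */
      clearing_started (&peer);  /* IPv6 unicast */
      clearing_finished (&peer); /* no event yet */
      clearing_finished (&peer); /* event fires once */
      return 0;
    }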

Signed-off-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index 02c926f..f401271 100644
--- a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -2979,8 +2979,13 @@
    *    on the process_main queue. Fast-flapping could cause that queue
    *    to grow and grow.
    */
+
+  /* Lock the peer on the assumption that the clear-node-queue will get
+   * nodes; if so, the unlock happens upon work-queue completion;
+   * otherwise, the unlock happens at the end of this function.
+   */
   if (!peer->clear_node_queue->thread)
-    peer_lock (peer); /* bgp_clear_node_complete */
+    peer_lock (peer);
 
   switch (purpose)
     {
@@ -3007,28 +3012,11 @@
       assert (0);
       break;
     }
-  
-  /* If no routes were cleared, nothing was added to workqueue, the
-   * completion function won't be run by workqueue code - call it here. 
-   * XXX: Actually, this assumption doesn't hold, see
-   * bgp_clear_route_table(), we queue all non-empty nodes.
-   *
-   * Additionally, there is a presumption in FSM that clearing is only
-   * really needed if peer state is Established - peers in
-   * pre-Established states shouldn't have any route-update state
-   * associated with them (in or out).
-   *
-   * We still can get here in pre-Established though, through
-   * peer_delete -> bgp_fsm_change_status, so this is a useful sanity
-   * check to ensure the assumption above holds.
-   *
-   * At some future point, this check could be move to the top of the
-   * function, and do a quick early-return when state is
-   * pre-Established, avoiding above list and table scans. Once we're
-   * sure it is safe..
-   */
+
+  /* Unlock if no nodes were added to the clear-node-queue. */
   if (!peer->clear_node_queue->thread)
-    bgp_clear_node_complete (peer->clear_node_queue);
+    peer_unlock (peer);
+
 }
   
 void