[bgpd] Fix 0.99 shutdown regression, introduce Clearing and Deleted states 2006-09-14 Paul Jakma <paul.jakma@sun.com> * (general) Fix some niggly issues around 'shutdown' and clearing by adding a Clearing FSM wait-state and a hidden 'Deleted' FSM state, to allow deleted peers to 'cool off' and hit 0 references. This introduces a slow memory leak of struct peer, however that's more a testament to the fragility of the reference counting than a bug in this patch, cleanup of reference counting to fix this is to follow. * bgpd.h: Add Clearing, Deleted states and Clearing_Completed and event. * bgp_debug.c: (bgp_status_msg[]) Add strings for Clearing and Deleted. * bgp_fsm.h: Don't allow timer/event threads to set anything for Deleted peers. * bgp_fsm.c: (bgp_timer_set) Add Clearing and Deleted. Deleted needs to stop everything. (bgp_stop) Remove explicit fsm_change_status call, the general framework handles the transition. (bgp_start) Log a warning if a start is attempted on a peer that should stay down, trying to start a peer. (struct .. FSM) Add Clearing_Completed events, has little influence except when in state Clearing to signal wait-state can end. Add Clearing and Deleted states, former is a wait-state, latter is a placeholder state to allow peers to disappear quietly once refcounts settle. (bgp_event) Try reduce verbosity of FSM state-change debug, changes to same state are not interesting (Established->Established) Allow NULL action functions in FSM. * bgp_packet.c: (bgp_write) Use FSM events, rather than trying to twiddle directly with FSM state behind the back of FSM. (bgp_write_notify) ditto. (bgp_read) Remove the vague ACCEPT_PEER peer_unlock, or else this patch crashes, now it leaks instead. * bgp_route.c: (bgp_clear_node_complete) Clearing_Completed event, to end clearing. (bgp_clear_route) See extensive comments. * bgpd.c: (peer_free) should only be called while in Deleted, peer refcounting controls when peer_free is called. bgp_sync_delete should be here, not in peer_delete. (peer_delete) Initiate delete. Transition to Deleted state manually. When removing peer from indices that provide visibility of it, take great care to be idempotent wrt the reference counting of struct peer through those indices. Use bgp_timer_set, rather than replicating. Call to bgp_sync_delete isn't appropriate here, sync can be referenced while shutting down and finishing deletion. (peer_group_bind) Take care to be idempotent wrt list references indexing peers.

commit: ca058a30b1ea57f83871ab4cf1c9a91ea4064d52 [log] [tgz]
author: Paul Jakma <paul.jakma@sun.com> Thu Sep 14 02:58:49 2006 +0000
committer: Paul Jakma <paul.jakma@sun.com> Thu Sep 14 02:58:49 2006 +0000
tree: ab38ab59bad607c9b41a093cb8b35bec766f30b3
parent: 2815e61ffbbf9c362896f3912d925cf78e125ee1 [diff] [blame]
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index 2ce2ef4..5dde41d 100644
--- a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c

@@ -2527,11 +2527,10 @@
 {
   struct peer *peer = wq->spec.data;
   
-  UNSET_FLAG (peer->sflags, PEER_STATUS_CLEARING);
   peer_unlock (peer); /* bgp_clear_node_complete */
   
   /* Tickle FSM to start moving again */
-  BGP_EVENT_ADD (peer, BGP_Start);
+  BGP_EVENT_ADD (peer, Clearing_Completed);
 }
 
 static void
@@ -2593,9 +2592,10 @@
   if (peer->clear_node_queue == NULL)
     bgp_clear_node_queue_init (peer);
   
-  /* bgp_fsm.c will not bring CLEARING sessions out of Idle this
-   * protects against peers which flap faster than we can we clear,
-   * which could lead to:
+  /* bgp_fsm.c keeps sessions in state Clearing, not transitioning to
+   * Idle until it receives a Clearing_Completed event. This protects
+   * against peers which flap faster than we can we clear, which could
+   * lead to:
    *
    * a) race with routes from the new session being installed before
    *    clear_route_node visits the node (to delete the route of that
@@ -2604,11 +2604,8 @@
    *    on the process_main queue. Fast-flapping could cause that queue
    *    to grow and grow.
    */
-  if (!CHECK_FLAG (peer->sflags, PEER_STATUS_CLEARING))
-    {
-      SET_FLAG (peer->sflags, PEER_STATUS_CLEARING);
-      peer_lock (peer); /* bgp_clear_node_complete */
-    }
+  if (!peer->clear_node_queue->thread)
+    peer_lock (peer); /* bgp_clear_node_complete */
   
   if (safi != SAFI_MPLS_VPN)
     bgp_clear_route_table (peer, afi, safi, NULL, NULL);
@@ -2624,11 +2621,37 @@
         bgp_clear_route_table (peer, afi, safi, NULL, rsclient);
     }
   
-  /* If no routes were cleared, nothing was added to workqueue, run the
-   * completion function now.
+  /* If no routes were cleared, nothing was added to workqueue, the
+   * completion function won't be run by workqueue code - call it here.
+   *
+   * Additionally, there is a presumption in FSM that clearing is only
+   * needed if peer state is Established - peers in pre-Established states
+   * shouldn't have any route-update state associated with them (in or out).
+   *
+   * We still get here from FSM through bgp_stop->clear_route_all in
+   * pre-Established though, so this is a useful sanity check to ensure
+   * the assumption above holds.
+   *
+   * At some future point, this check could be move to the top of the
+   * function, and do a quick early-return when state is
+   * pre-Established, avoiding above list and table scans. Once we're
+   * sure it is safe..
    */
   if (!peer->clear_node_queue->thread)
     bgp_clear_node_complete (peer->clear_node_queue);
+  else
+    {
+      /* clearing queue scheduled. Normal if in Established state
+       * (and about to transition out of it), but otherwise...
+       */
+      if (peer->status != Established)
+        {
+          plog_err (peer->log, "%s [Error] State %s is not Established,"
+                    " but routes were cleared - bug!",
+                    peer->host, LOOKUP (bgp_status_msg, peer->status));
+          assert (peer->status == Established);
+        }
+    }
 }
   
 void
commit	ca058a30b1ea57f83871ab4cf1c9a91ea4064d52	[log] [tgz]
author	Paul Jakma <paul.jakma@sun.com>	Thu Sep 14 02:58:49 2006 +0000
committer	Paul Jakma <paul.jakma@sun.com>	Thu Sep 14 02:58:49 2006 +0000
tree	ab38ab59bad607c9b41a093cb8b35bec766f30b3
parent	2815e61ffbbf9c362896f3912d925cf78e125ee1 [diff] [blame]