bgpd, doc, lib, zebra: nexthop-tracking in zebra
0. Introduction
This is the design specification for the next hop tracking feature in
Quagga.
1. Background
Recursive routes are of the form:
p/m --> n
[Ex: 1.1.0.0/16 --> 2.2.2.2]
where 'n' itself is resolved through another route as follows:
p2/m --> h, interface
[Ex: 2.2.2.0/24 --> 3.3.3.3, eth0]
Usually, BGP routes are recursive in nature and BGP nexthops get
resolved through an IGP route. IGP usually adds its routes pointing to
an interface (these are called non-recursive routes).
When BGP receives a recursive route from a peer, it needs to validate
the nexthop. The path is marked valid or invalid based on the
reachability status of the nexthop. Nexthop validation is also
important for the BGP decision process, as the metric to reach the
nexthop is a parameter to the best path selection process.
As it goes with routing, this is a dynamic process. The route to the
nexthop can change. The nexthop can become unreachable or
reachable. In the current BGP implementation, the nexthop validation
is done periodically in the scanner run. The default scanner run
interval is one minute. Every minute, the scanner task walks the
entire BGP table. It checks the validity of each nexthop with Zebra
(the routing table manager) through a request and response message
exchange between the BGP and Zebra processes. The BGP process is
blocked for that duration. The mechanism has two major drawbacks:
(1) The scanner task runs to completion. That can potentially starve
the other tasks for long periods of time, based on the BGP table
size and number of nexthops.
(2) Convergence around routing changes that affect the nexthops can be
long (around a minute with the default intervals). The interval
can be shortened to achieve faster reaction time, but it makes the
first problem worse, with the scanner task consuming most of the
CPU resources.
The "next hop tracking" feature makes this process event-driven. It
eliminates periodic nexthop validation and introduces an asynchronous
communication path between BGP and Zebra for route change notifications
that can then be acted upon.
2. Goal
The main goal is to remove the two limitations discussed in the
previous section. Stated constructively, the goals are the following:
- fairness: the scanner run should not consume a disproportionate
amount of CPU time. This should give good overall performance and
response time to other events (route changes, session events,
IO/user interface).
- convergence: BGP must react to nexthop changes instantly and provide
sub-second convergence. This may involve diverting the routes from
one nexthop to another.
3. Overview of the changes
The changes are in both the BGP and Zebra modules. The short summary is
the following:
- Zebra implements a registration mechanism by which clients can
register for next hop notification. Consequently, it maintains a
separate table, per (VRF, AF) pair, of next hops and interested
client-list per next hop.
- When the main routing table changes in Zebra, it evaluates the next
hop table: for each next hop, it checks if the route table
modifications have changed its state. If so, it notifies the
interested clients.
- BGP is one such client. It registers the next hops corresponding to
all of its received routes/paths. It also threads the paths against
each nexthop structure.
- When BGP receives a next hop notification from Zebra, it walks the
corresponding path list. It marks the paths valid or invalid depending
on the next hop notification. It then re-computes the best path for
the corresponding destinations. This may result in re-announcing those
destinations to peers.
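
The registration and notification scheme summarized above can be
sketched in a few lines of C. This is a minimal illustration, not the
code in this patch: the names nh_table, nh_register, nh_evaluate,
toy_route and the two callback types are hypothetical, addresses are
plain host-order 32-bit integers, and the routing table is reduced to
a single route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 8

/* Hypothetical client callback, run when a tracked next hop changes state. */
typedef void (*nh_notify_fn)(uint32_t nexthop, bool reachable, void *ctx);

/* Hypothetical lookup: does the routing table `rib` cover `addr`? */
typedef bool (*nh_resolve_fn)(uint32_t addr, const void *rib);

struct tracked_nh {
    uint32_t addr;        /* next hop being tracked */
    bool reachable;       /* state last reported to the client */
    nh_notify_fn notify;  /* interested client */
    void *ctx;            /* client's private data */
};

struct nh_table {
    struct tracked_nh entry[MAX_TRACKED];
    size_t count;
};

/* Register interest in a next hop. Returns 0, or -1 if the table is full. */
static int nh_register(struct nh_table *t, uint32_t addr,
                       nh_notify_fn fn, void *ctx)
{
    assert(t != NULL && fn != NULL);
    if (t->count == MAX_TRACKED)
        return -1;
    t->entry[t->count++] = (struct tracked_nh){ addr, false, fn, ctx };
    return 0;
}

/* Re-evaluate every tracked next hop after a routing table change and
 * notify only the clients whose next hop changed state.
 * Returns the number of notifications sent. */
static size_t nh_evaluate(struct nh_table *t, nh_resolve_fn resolve,
                          const void *rib)
{
    size_t sent = 0;
    for (size_t i = 0; i < t->count; i++) {
        struct tracked_nh *nh = &t->entry[i];
        bool now = resolve(nh->addr, rib);
        if (now != nh->reachable) {
            nh->reachable = now;
            nh->notify(nh->addr, now, nh->ctx);
            sent++;
        }
    }
    return sent;
}

/* Toy routing table: a single prefix that is either present or withdrawn. */
struct toy_route { uint32_t prefix; uint32_t mask; bool present; };

static bool toy_resolve(uint32_t addr, const void *rib)
{
    const struct toy_route *r = rib;
    return r->present && (addr & r->mask) == r->prefix;
}

/* Toy client: counts the notifications it receives. */
static void count_notify(uint32_t nexthop, bool reachable, void *ctx)
{
    (void)nexthop;
    (void)reachable;
    ++*(int *)ctx;
}
```

The point of the sketch is that a second nh_evaluate() call with an
unchanged routing table sends nothing, which is what removes the cost
of the periodic scan.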
4. Design
4.1. Modules
The core design introduces an "nht" (next hop tracking) module in BGP
and an "rnh" (recursive nexthop) module in Zebra. The "nht" module
provides the following APIs:
bgp_find_or_add_nexthop()  : find or add a nexthop in BGP nexthop table
bgp_find_nexthop()         : find a nexthop in BGP nexthop table
bgp_parse_nexthop_update() : parse a nexthop update message coming
                             from zebra
The "rnh" module provides the following APIs:
zebra_add_rnh()            : add a recursive nexthop
zebra_delete_rnh()         : delete a recursive nexthop
zebra_lookup_rnh()         : look up a recursive nexthop
zebra_add_rnh_client()     : register a client for nexthop notifications
                             against a recursive nexthop
zebra_remove_rnh_client()  : remove the client registration for a
                             recursive nexthop
zebra_evaluate_rnh_table() : (re)evaluate the recursive nexthop table
                             (most probably because the main routing
                             table has changed)
zebra_cleanup_rnh_client() : clean up a client from the "rnh" module
                             data structures (most probably because the
                             client is going away)
4.2. Control flow
The next hop registration control flow is the following:
                <==== BGP Process ====>|<==== Zebra Process ====>
                                       |
receive module     nht module          | zserv module         rnh module
----------------------------------------------------------------------
              |                        |                    |
bgp_update_   |                        |                    |
      main()  | bgp_find_or_add_       |                    |
              |        nexthop()       |                    |
              |                        |                    |
              |                        | zserv_nexthop_     |
              |                        |       register()   |
              |                        |                    | zebra_add_rnh()
              |                        |                    |
The next hop notification control flow is the following:
              <==== Zebra Process ====>|<==== BGP Process ====>
                                       |
rib module         rnh module          | zebra module         nht module
----------------------------------------------------------------------
              |                        |                    |
meta_queue_   |                        |                    |
    process() | zebra_evaluate_        |                    |
              |     rnh_table()        |                    |
              |                        |                    |
              |                        | bgp_read_nexthop_  |
              |                        |          update()  |
              |                        |                    | bgp_parse_
              |                        |                    |   nexthop_update()
              |                        |                    |
4.3. zclient message format
ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are
encoded in the following way:
/*
 *  0                   1                   2                   3
 *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |            AF                 |  prefix len   |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      Nexthop prefix                                           .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .                                                               .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |            AF                 |  prefix len   |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      Nexthop prefix                                           .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 */
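
As an illustration of the layout above, one (AF, prefix len, prefix)
entry could be serialized as follows. This is a sketch under stated
assumptions, not the zclient code: the function name nht_encode_entry
is hypothetical, the zserv message header is omitted, the field widths
(16-bit AF in network byte order, 8-bit prefix length) are taken from
how bgp_parse_nexthop_update() in this patch reads the same two fields,
and the prefix is assumed to occupy just enough bytes to hold
prefixlen bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one (AF, prefix len, prefix) entry to buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 * Assumption: the prefix is sent as ceil(prefixlen / 8) bytes. */
static size_t nht_encode_entry(uint8_t *buf, size_t cap, uint16_t family,
                               uint8_t prefixlen, const uint8_t *addr)
{
    size_t addr_bytes = ((size_t)prefixlen + 7) / 8;
    size_t need = 3 + addr_bytes;

    assert(buf != NULL && addr != NULL);
    if (need > cap)
        return 0;

    buf[0] = (uint8_t)(family >> 8);    /* AF, high byte first */
    buf[1] = (uint8_t)(family & 0xff);
    buf[2] = prefixlen;                 /* prefix length in bits */
    memcpy(buf + 3, addr, addr_bytes);  /* nexthop prefix */
    return need;
}
```

Since BGP registers host routes, an IPv4 nexthop would take seven bytes
under these assumptions; several entries can be appended back to back,
as the repeated block in the diagram suggests.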
ZEBRA_NEXTHOP_UPDATE message is encoded as follows:
/*
 *  0                   1                   2                   3
 *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |            AF                 |  prefix len   |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      Nexthop prefix getting resolved                          .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                            metric                             |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                           #nexthops                           |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                         nexthop type                          |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      resolving Nexthop details                                .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |                         nexthop type                          |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      resolving Nexthop details                                .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 */
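
The fixed part of the update message can be decoded as sketched below.
The field widths follow the reads in bgp_parse_nexthop_update() in this
patch (stream_getw for AF, stream_getc for the prefix length, 4 or 16
address bytes, stream_getl for the metric, stream_getc for the nexthop
count), so the count is treated as one byte even though the diagram
draws it as a full row. The names nht_update_hdr and
nht_decode_update_hdr are hypothetical, the per-nexthop records are
left out, and unlike the parser in the patch this sketch rejects
unknown address families.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

struct nht_update_hdr {
    uint16_t family;      /* AF_INET or AF_INET6 */
    uint8_t prefixlen;    /* prefix length in bits */
    uint8_t addr[16];     /* nexthop being resolved, network order */
    uint32_t metric;      /* metric to reach the nexthop */
    uint8_t nexthop_num;  /* number of resolving nexthops that follow */
};

/* Decode the fixed part of a nexthop update from buf.
 * Returns the number of bytes consumed, or 0 if the buffer is
 * truncated or the address family is not recognized. */
static size_t nht_decode_update_hdr(const uint8_t *buf, size_t len,
                                    struct nht_update_hdr *out)
{
    size_t addr_bytes;
    size_t off = 3;

    assert(buf != NULL && out != NULL);
    if (len < off)
        return 0;

    memset(out, 0, sizeof *out);
    out->family = (uint16_t)((buf[0] << 8) | buf[1]);
    out->prefixlen = buf[2];

    if (out->family == AF_INET)
        addr_bytes = 4;
    else if (out->family == AF_INET6)
        addr_bytes = 16;
    else
        return 0;

    /* address + 32-bit metric + 8-bit nexthop count must all be present */
    if (len - off < addr_bytes + 4 + 1)
        return 0;

    memcpy(out->addr, buf + off, addr_bytes);
    off += addr_bytes;
    out->metric = ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16)
                | ((uint32_t)buf[off + 2] << 8) | (uint32_t)buf[off + 3];
    off += 4;
    out->nexthop_num = buf[off++];
    return off;
}
```

A nexthop count of zero is how the parser in this patch learns that
the nexthop is unreachable; a non-zero count is followed by that many
(type, details) records.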
4.4. BGP data structure
Legend:

/\   struct bgp_node: a BGP destination/route/prefix
\/

[ ]  struct bgp_info: a BGP path (e.g. route received from a peer)

 _
(_)  struct bgp_nexthop_cache: a BGP nexthop

/\         NULL
\/--+        ^
    |        :
    +--[ ]--[ ]--[ ]--> NULL
/\           :
\/--+        :
    |        :
    +--[ ]--[ ]--> NULL
             :
 _           :
(_).............
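
The dotted chain in the figure, where each nexthop threads the paths
that depend on it, can be sketched as an intrusive list. The patch
itself uses LIST_HEAD(path_list, bgp_info) from queue.h plus a
path_count field; the sketch below hand-rolls the list so that it is
self-contained, and the names path, nh_cache, path_link, path_unlink
and nh_set_valid are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nh_cache;

/* A path (cf. struct bgp_info) threaded on the nexthop it depends on. */
struct path {
    int id;
    bool valid;
    struct nh_cache *nexthop;  /* owning nexthop, or NULL when unlinked */
    struct path *next;         /* next path on the same nexthop */
    struct path **prevp;       /* address of the pointer that points at us */
};

/* A nexthop (cf. struct bgp_nexthop_cache) with its list of paths. */
struct nh_cache {
    bool valid;
    unsigned int path_count;
    struct path *paths;
};

/* Detach a path from its nexthop, if it has one. */
static void path_unlink(struct path *p)
{
    if (p->nexthop == NULL)
        return;
    if (p->next != NULL)
        p->next->prevp = p->prevp;
    *p->prevp = p->next;
    p->nexthop->path_count--;
    p->nexthop = NULL;
    p->next = NULL;
    p->prevp = NULL;
}

/* Thread a path onto a nexthop, moving it off any previous one. */
static void path_link(struct path *p, struct nh_cache *nh)
{
    path_unlink(p);
    p->next = nh->paths;
    p->prevp = &nh->paths;
    if (nh->paths != NULL)
        nh->paths->prevp = &p->next;
    nh->paths = p;
    p->nexthop = nh;
    nh->path_count++;
}

/* Apply a nexthop state change to every dependent path.
 * Returns how many paths changed validity. */
static unsigned int nh_set_valid(struct nh_cache *nh, bool valid)
{
    unsigned int changed = 0;

    nh->valid = valid;
    for (struct path *p = nh->paths; p != NULL; p = p->next) {
        if (p->valid != valid) {
            p->valid = valid;
            changed++;
        }
    }
    return changed;
}
```

With this shape a notification touches only the paths hanging off the
affected nexthop, and a nexthop whose list becomes empty can be
unregistered, as bgp_unlink_nexthop() does in the patch.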
4.5. Zebra data structure
rnh table:

        O
       / \
      O   O
         / \
        O   O

struct rnh
{
  u_char flags;
  struct rib *state;
  struct list *client_list;
  struct route_node *node;
};
5. User interface changes
quagga# show ip nht
3.3.3.3
 resolved via kernel
 via 11.0.0.6, swp1
 Client list: bgp(fd 12)
11.0.0.10
 resolved via connected
 is directly connected, swp2
 Client list: bgp(fd 12)
11.0.0.18
 resolved via connected
 is directly connected, swp4
 Client list: bgp(fd 12)
11.11.11.11
 resolved via kernel
 via 10.0.1.2, eth0
 Client list: bgp(fd 12)

quagga# show ip bgp nexthop
Current BGP nexthop cache:
 3.3.3.3 valid [IGP metric 0], #paths 3
  Last update: Wed Oct 16 04:43:49 2013
 11.0.0.10 valid [IGP metric 1], #paths 1
  Last update: Wed Oct 16 04:43:51 2013
 11.0.0.18 valid [IGP metric 1], #paths 2
  Last update: Wed Oct 16 04:43:47 2013
 11.11.11.11 valid [IGP metric 0], #paths 1
  Last update: Wed Oct 16 04:43:47 2013

quagga# show ipv6 nht
quagga# show ip bgp nexthop detail
quagga# debug bgp nht
quagga# debug zebra nht
6. Sample test cases
  r2----r3
 /  \  /
r1----r4

- Verify that a change in IGP cost triggers NHT
  + shut down the r1-r4 and r2-r4 links
  + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back
    up
  + we should be back to the original nexthop via r4 now
- Verify that a NH becoming unreachable triggers NHT
  + shut down all links to r4
- Verify that a NH becoming reachable triggers NHT
  + no shut all links to r4
7. Future work
- route-policy for next hop validation (e.g. ignore default route)
- damping for rapid next hop changes
- prioritized handling of nexthop changes ((un)reachability vs. metric
changes)
- handling recursion loops, e.g.
    11.11.11.11/32 -> 12.12.12.12
    12.12.12.12/32 -> 11.11.11.11
    11.0.0.0/8 -> <interface>
- better statistics
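
For the recursion loop item above, one possible approach (not
implemented by this patch) is to bound the resolution depth. The sketch
below replays the three example routes against a toy longest-prefix
match; toy_route, toy_lpm and toy_resolve_nh are hypothetical names,
and the depth limit is an arbitrary choice by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_route {
    uint32_t prefix;     /* host byte order */
    uint8_t len;         /* prefix length in bits */
    bool via_interface;  /* true: non-recursive, points at an interface */
    uint32_t gateway;    /* used only when via_interface is false */
};

static uint32_t mask_of(uint8_t len)
{
    return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Longest-prefix match over a small array of routes. */
static const struct toy_route *toy_lpm(const struct toy_route *rt, size_t n,
                                       uint32_t addr)
{
    const struct toy_route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if ((addr & mask_of(rt[i].len)) != rt[i].prefix)
            continue;
        if (best == NULL || rt[i].len > best->len)
            best = &rt[i];
    }
    return best;
}

/* Resolve addr recursively.
 * Returns 1 if it resolves to an interface, 0 if some hop has no
 * covering route, and -1 if max_depth lookups were not enough, which
 * is treated here as a suspected loop. */
static int toy_resolve_nh(const struct toy_route *rt, size_t n,
                          uint32_t addr, int max_depth)
{
    for (int depth = 0; depth < max_depth; depth++) {
        const struct toy_route *r = toy_lpm(rt, n, addr);

        if (r == NULL)
            return 0;
        if (r->via_interface)
            return 1;
        addr = r->gateway;  /* follow the recursive nexthop */
    }
    return -1;
}
```

With the example routes, resolving 11.11.11.11 bounces between the two
/32 entries until the depth limit is hit, while an address that only
matches 11.0.0.0/8 resolves on the first lookup.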
Addresses upstream comments.
"show ip bgp nexthop detail" couldn't display multiple NHs due to a bug.
Fix that.
Fix reference counts for the nexthop cache entries
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Fix reference counts for the nexthop cache entries.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Edited-by: Paul Jakma <paul.jakma@hpe.com>
- Fix nexthop_ipv6_add defs in rib.h, which had not been renamed with
the rib_ prefix.
- Remove rib_lookup_and_pushup. It appears to be used only in the
!HAVE_NETLINK && HAVE_STRUCT_IFALIASREQ case of ioctl.c::if_set_prefix,
so it is not used at all on the platform where the RIB gets the most
testing.
diff --git a/bgpd/Makefile.am b/bgpd/Makefile.am
index d2775f3..fe1be32 100644
--- a/bgpd/Makefile.am
+++ b/bgpd/Makefile.am
@@ -16,7 +16,7 @@
bgp_packet.c bgp_network.c bgp_filter.c bgp_regex.c bgp_clist.c \
bgp_dump.c bgp_snmp.c bgp_ecommunity.c bgp_mplsvpn.c bgp_nexthop.c \
bgp_damp.c bgp_table.c bgp_advertise.c bgp_vty.c bgp_mpath.c \
- bgp_encap.c bgp_encap_tlv.c
+ bgp_encap.c bgp_encap_tlv.c bgp_nht.c
noinst_HEADERS = \
bgp_aspath.h bgp_attr.h bgp_community.h bgp_debug.h bgp_fsm.h \
@@ -24,7 +24,7 @@
bgpd.h bgp_filter.h bgp_clist.h bgp_dump.h bgp_zebra.h \
bgp_ecommunity.h bgp_mplsvpn.h bgp_nexthop.h bgp_damp.h bgp_table.h \
bgp_advertise.h bgp_snmp.h bgp_vty.h bgp_mpath.h \
- bgp_encap.h bgp_encap_tlv.h bgp_encap_types.h
+ bgp_encap.h bgp_encap_tlv.h bgp_encap_types.h bgp_nht.h
bgpd_SOURCES = bgp_main.c
bgpd_LDADD = libbgp.a ../lib/libzebra.la @LIBCAP@ @LIBM@
diff --git a/bgpd/bgp_debug.c b/bgpd/bgp_debug.c
index 90a378b..ba79722 100644
--- a/bgpd/bgp_debug.c
+++ b/bgpd/bgp_debug.c
@@ -47,6 +47,7 @@
unsigned long conf_bgp_debug_normal;
unsigned long conf_bgp_debug_zebra;
unsigned long conf_bgp_debug_allow_martians;
+unsigned long conf_bgp_debug_nht;
unsigned long term_bgp_debug_as4;
unsigned long term_bgp_debug_fsm;
@@ -58,6 +59,7 @@
unsigned long term_bgp_debug_normal;
unsigned long term_bgp_debug_zebra;
unsigned long term_bgp_debug_allow_martians;
+unsigned long term_bgp_debug_nht;
/* messages for BGP-4 status */
const struct message bgp_status_msg[] =
@@ -472,6 +474,48 @@
BGP_STR
"BGP events\n")
+DEFUN (debug_bgp_nht,
+ debug_bgp_nht_cmd,
+ "debug bgp nht",
+ DEBUG_STR
+ BGP_STR
+ "BGP nexthop tracking events\n")
+{
+ if (vty->node == CONFIG_NODE)
+ DEBUG_ON (nht, NHT);
+ else
+ {
+ TERM_DEBUG_ON (nht, NHT);
+ vty_out (vty, "BGP nexthop tracking debugging is on%s", VTY_NEWLINE);
+ }
+ return CMD_SUCCESS;
+}
+
+DEFUN (no_debug_bgp_nht,
+ no_debug_bgp_nht_cmd,
+ "no debug bgp nht",
+ NO_STR
+ DEBUG_STR
+ BGP_STR
+ "BGP nexthop tracking events\n")
+{
+ if (vty->node == CONFIG_NODE)
+ DEBUG_OFF (nht, NHT);
+ else
+ {
+ TERM_DEBUG_OFF (nht, NHT);
+ vty_out (vty, "BGP nexthop tracking debugging is off%s", VTY_NEWLINE);
+ }
+ return CMD_SUCCESS;
+}
+
+ALIAS (no_debug_bgp_nht,
+ undebug_bgp_nht_cmd,
+ "undebug bgp nht",
+ UNDEBUG_STR
+ BGP_STR
+ "BGP next-hop tracking updates\n")
+
DEFUN (debug_bgp_filter,
debug_bgp_filter_cmd,
"debug bgp filters",
@@ -833,6 +877,8 @@
vty_out (vty, " BGP as4 aspath segment debugging is on%s", VTY_NEWLINE);
if (BGP_DEBUG (allow_martians, ALLOW_MARTIANS))
vty_out (vty, " BGP allow martian next hop debugging is on%s", VTY_NEWLINE);
+ if (BGP_DEBUG (nht, NHT))
+ vty_out (vty, " BGP next-hop tracking debugging is on%s", VTY_NEWLINE);
vty_out (vty, "%s", VTY_NEWLINE);
return CMD_SUCCESS;
}
@@ -911,6 +957,12 @@
vty_out (vty, "debug bgp allow-martians%s", VTY_NEWLINE);
write++;
}
+
+ if (CONF_BGP_DEBUG (nht, NHT))
+ {
+ vty_out (vty, "debug bgp nht%s", VTY_NEWLINE);
+ write++;
+ }
return write;
}
@@ -938,6 +990,8 @@
install_element (CONFIG_NODE, &debug_bgp_fsm_cmd);
install_element (ENABLE_NODE, &debug_bgp_events_cmd);
install_element (CONFIG_NODE, &debug_bgp_events_cmd);
+ install_element (ENABLE_NODE, &debug_bgp_nht_cmd);
+ install_element (CONFIG_NODE, &debug_bgp_nht_cmd);
install_element (ENABLE_NODE, &debug_bgp_filter_cmd);
install_element (CONFIG_NODE, &debug_bgp_filter_cmd);
install_element (ENABLE_NODE, &debug_bgp_keepalive_cmd);
@@ -966,6 +1020,9 @@
install_element (ENABLE_NODE, &no_debug_bgp_events_cmd);
install_element (ENABLE_NODE, &undebug_bgp_events_cmd);
install_element (CONFIG_NODE, &no_debug_bgp_events_cmd);
+ install_element (ENABLE_NODE, &no_debug_bgp_nht_cmd);
+ install_element (ENABLE_NODE, &undebug_bgp_nht_cmd);
+ install_element (CONFIG_NODE, &no_debug_bgp_nht_cmd);
install_element (ENABLE_NODE, &no_debug_bgp_filter_cmd);
install_element (ENABLE_NODE, &undebug_bgp_filter_cmd);
install_element (CONFIG_NODE, &no_debug_bgp_filter_cmd);
diff --git a/bgpd/bgp_debug.h b/bgpd/bgp_debug.h
index 42cbd7e..253bd7f 100644
--- a/bgpd/bgp_debug.h
+++ b/bgpd/bgp_debug.h
@@ -68,6 +68,7 @@
extern unsigned long conf_bgp_debug_normal;
extern unsigned long conf_bgp_debug_zebra;
extern unsigned long conf_bgp_debug_allow_martians;
+extern unsigned long conf_bgp_debug_nht;
extern unsigned long term_bgp_debug_as4;
extern unsigned long term_bgp_debug_fsm;
@@ -79,6 +80,7 @@
extern unsigned long term_bgp_debug_normal;
extern unsigned long term_bgp_debug_zebra;
extern unsigned long term_bgp_debug_allow_martians;
+extern unsigned long term_bgp_debug_nht;
#define BGP_DEBUG_AS4 0x01
#define BGP_DEBUG_AS4_SEGMENT 0x02
@@ -93,6 +95,7 @@
#define BGP_DEBUG_NORMAL 0x01
#define BGP_DEBUG_ZEBRA 0x01
#define BGP_DEBUG_ALLOW_MARTIANS 0x01
+#define BGP_DEBUG_NHT 0x01
#define BGP_DEBUG_PACKET_SEND 0x01
#define BGP_DEBUG_PACKET_SEND_DETAIL 0x02
diff --git a/bgpd/bgp_nexthop.c b/bgpd/bgp_nexthop.c
index 2406814..badada7 100644
--- a/bgpd/bgp_nexthop.c
+++ b/bgpd/bgp_nexthop.c
@@ -31,17 +31,21 @@
#include "hash.h"
#include "jhash.h"
#include "filter.h"
+#include "nexthop.h"
#include "bgpd/bgpd.h"
#include "bgpd/bgp_table.h"
#include "bgpd/bgp_route.h"
#include "bgpd/bgp_attr.h"
#include "bgpd/bgp_nexthop.h"
+#include "bgpd/bgp_nht.h"
#include "bgpd/bgp_debug.h"
#include "bgpd/bgp_damp.h"
#include "zebra/rib.h"
#include "zebra/zserv.h" /* For ZEBRA_SERV_PATH. */
+extern struct zclient *zclient;
+
struct bgp_nexthop_cache *zlookup_query (struct in_addr);
struct bgp_nexthop_cache *zlookup_query_ipv6 (struct in6_addr *);
@@ -58,7 +62,7 @@
static int bgp_import_interval;
/* Route table for next-hop lookup cache. */
-static struct bgp_table *bgp_nexthop_cache_table[AFI_MAX];
+struct bgp_table *bgp_nexthop_cache_table[AFI_MAX];
static struct bgp_table *cache1_table[AFI_MAX];
static struct bgp_table *cache2_table[AFI_MAX];
@@ -68,6 +72,13 @@
/* BGP nexthop lookup query client. */
struct zclient *zlookup = NULL;
+char *
+bnc_str (struct bgp_nexthop_cache *bnc, char *buf, int size)
+{
+ prefix2str(&(bnc->node->p), buf, size);
+ return buf;
+}
+
/* Add nexthop to the end of the list. */
static void
bnc_nexthop_add (struct bgp_nexthop_cache *bnc, struct nexthop *nexthop)
@@ -83,7 +94,7 @@
nexthop->prev = last;
}
-static void
+void
bnc_nexthop_free (struct bgp_nexthop_cache *bnc)
{
struct nexthop *nexthop;
@@ -96,13 +107,17 @@
}
}
-static struct bgp_nexthop_cache *
+struct bgp_nexthop_cache *
bnc_new (void)
{
- return XCALLOC (MTYPE_BGP_NEXTHOP_CACHE, sizeof (struct bgp_nexthop_cache));
+ struct bgp_nexthop_cache *bnc;
+
+ bnc = XCALLOC (MTYPE_BGP_NEXTHOP_CACHE, sizeof (struct bgp_nexthop_cache));
+ LIST_INIT(&(bnc->paths));
+ return bnc;
}
-static void
+void
bnc_free (struct bgp_nexthop_cache *bnc)
{
bnc_nexthop_free (bnc);
@@ -110,46 +125,6 @@
}
static int
-bgp_nexthop_same (struct nexthop *next1, struct nexthop *next2)
-{
- if (next1->type != next2->type)
- return 0;
-
- switch (next1->type)
- {
- case ZEBRA_NEXTHOP_IPV4:
- if (! IPV4_ADDR_SAME (&next1->gate.ipv4, &next2->gate.ipv4))
- return 0;
- break;
- case ZEBRA_NEXTHOP_IPV4_IFINDEX:
- if (! IPV4_ADDR_SAME (&next1->gate.ipv4, &next2->gate.ipv4)
- || next1->ifindex != next2->ifindex)
- return 0;
- break;
- case ZEBRA_NEXTHOP_IFINDEX:
- case ZEBRA_NEXTHOP_IFNAME:
- if (next1->ifindex != next2->ifindex)
- return 0;
- break;
- case ZEBRA_NEXTHOP_IPV6:
- if (! IPV6_ADDR_SAME (&next1->gate.ipv6, &next2->gate.ipv6))
- return 0;
- break;
- case ZEBRA_NEXTHOP_IPV6_IFINDEX:
- case ZEBRA_NEXTHOP_IPV6_IFNAME:
- if (! IPV6_ADDR_SAME (&next1->gate.ipv6, &next2->gate.ipv6))
- return 0;
- if (next1->ifindex != next2->ifindex)
- return 0;
- break;
- default:
- /* do nothing */
- break;
- }
- return 1;
-}
-
-static int
bgp_nexthop_cache_different (struct bgp_nexthop_cache *bnc1,
struct bgp_nexthop_cache *bnc2)
{
@@ -164,7 +139,7 @@
for (i = 0; i < bnc1->nexthop_num; i++)
{
- if (! bgp_nexthop_same (next1, next2))
+ if (! nexthop_same_no_recurse (next1, next2))
return 1;
next1 = next1->next;
@@ -382,6 +357,7 @@
return bnc->valid;
}
+#if BGP_SCAN_NEXTHOP
/* Reset and free all BGP nexthop cache. */
static void
bgp_nexthop_cache_reset (struct bgp_table *table)
@@ -397,6 +373,7 @@
bgp_unlock_node (rn);
}
}
+#endif
static void
bgp_scan (afi_t afi, safi_t safi)
@@ -407,6 +384,7 @@
struct bgp_info *next;
struct peer *peer;
struct listnode *node, *nnode;
+#if BGP_SCAN_NEXTHOP
int valid;
int current;
int changed;
@@ -417,6 +395,7 @@
bgp_nexthop_cache_table[afi] = cache2_table[afi];
else
bgp_nexthop_cache_table[afi] = cache1_table[afi];
+#endif
/* Get default bgp. */
bgp = bgp_get_default ();
@@ -446,6 +425,7 @@
if (bi->type == ZEBRA_ROUTE_BGP && bi->sub_type == BGP_ROUTE_NORMAL)
{
+#if BGP_SCAN_NEXTHOP
changed = 0;
metricchanged = 0;
@@ -478,6 +458,7 @@
afi, SAFI_UNICAST);
}
}
+#endif
if (CHECK_FLAG (bgp->af_flags[afi][SAFI_UNICAST],
BGP_CONFIG_DAMPENING)
@@ -491,11 +472,13 @@
bgp_process (bgp, rn, afi, SAFI_UNICAST);
}
+#if BGP_SCAN_NEXTHOP
/* Flash old cache. */
if (bgp_nexthop_cache_table[afi] == cache1_table[afi])
bgp_nexthop_cache_reset (cache2_table[afi]);
else
bgp_nexthop_cache_reset (cache1_table[afi]);
+#endif
if (BGP_DEBUG (events, EVENTS))
{
@@ -1262,9 +1245,7 @@
show_ip_bgp_scan_tables (struct vty *vty, const char detail)
{
struct bgp_node *rn;
- struct bgp_nexthop_cache *bnc;
char buf[INET6_ADDRSTRLEN];
- u_char i;
if (bgp_scan_thread)
vty_out (vty, "BGP scan is running%s", VTY_NEWLINE);
@@ -1272,6 +1253,7 @@
vty_out (vty, "BGP scan is not running%s", VTY_NEWLINE);
vty_out (vty, "BGP scan interval is %d%s", bgp_scan_interval, VTY_NEWLINE);
+#if BGP_SCAN_NEXTHOP
vty_out (vty, "Current BGP nexthop cache:%s", VTY_NEWLINE);
for (rn = bgp_table_top (bgp_nexthop_cache_table[AFI_IP]); rn; rn = bgp_route_next (rn))
if ((bnc = rn->info) != NULL)
@@ -1281,21 +1263,21 @@
vty_out (vty, " %s valid [IGP metric %d]%s",
inet_ntop (AF_INET, &rn->p.u.prefix4, buf, INET6_ADDRSTRLEN), bnc->metric, VTY_NEWLINE);
if (detail)
- for (i = 0; i < bnc->nexthop_num; i++)
- switch (bnc->nexthop[i].type)
+ for (nexthop = bnc->nexthop; nexthop; nexthop = nexthop->next)
+ switch (nexthop->type)
{
case NEXTHOP_TYPE_IPV4:
- vty_out (vty, " gate %s%s", inet_ntop (AF_INET, &bnc->nexthop[i].gate.ipv4, buf, INET6_ADDRSTRLEN), VTY_NEWLINE);
+ vty_out (vty, " gate %s%s", inet_ntop (AF_INET, &nexthop->gate.ipv4, buf, INET6_ADDRSTRLEN), VTY_NEWLINE);
break;
case NEXTHOP_TYPE_IPV4_IFINDEX:
- vty_out (vty, " gate %s", inet_ntop (AF_INET, &bnc->nexthop[i].gate.ipv4, buf, INET6_ADDRSTRLEN));
- vty_out (vty, " ifidx %u%s", bnc->nexthop[i].ifindex, VTY_NEWLINE);
+ vty_out (vty, " gate %s", inet_ntop (AF_INET, &nexthop->gate.ipv4, buf, INET6_ADDRSTRLEN));
+ vty_out (vty, " ifidx %u%s", nexthop->ifindex, VTY_NEWLINE);
break;
case NEXTHOP_TYPE_IFINDEX:
- vty_out (vty, " ifidx %u%s", bnc->nexthop[i].ifindex, VTY_NEWLINE);
+ vty_out (vty, " ifidx %u%s", nexthop->ifindex, VTY_NEWLINE);
break;
default:
- vty_out (vty, " invalid nexthop type %u%s", bnc->nexthop[i].type, VTY_NEWLINE);
+ vty_out (vty, " invalid nexthop type %u%s", nexthop->type, VTY_NEWLINE);
}
}
else
@@ -1315,17 +1297,17 @@
inet_ntop (AF_INET6, &rn->p.u.prefix6, buf, INET6_ADDRSTRLEN),
bnc->metric, VTY_NEWLINE);
if (detail)
- for (i = 0; i < bnc->nexthop_num; i++)
- switch (bnc->nexthop[i].type)
+ for (nexthop = bnc->nexthop; nexthop; nexthop = nexthop->next)
+ switch (nexthop->type)
{
case NEXTHOP_TYPE_IPV6:
- vty_out (vty, " gate %s%s", inet_ntop (AF_INET6, &bnc->nexthop[i].gate.ipv6, buf, INET6_ADDRSTRLEN), VTY_NEWLINE);
+ vty_out (vty, " gate %s%s", inet_ntop (AF_INET6, &nexthop->gate.ipv6, buf, INET6_ADDRSTRLEN), VTY_NEWLINE);
break;
case NEXTHOP_TYPE_IFINDEX:
- vty_out (vty, " ifidx %u%s", bnc->nexthop[i].ifindex, VTY_NEWLINE);
+ vty_out (vty, " ifidx %u%s", nexthop->ifindex, VTY_NEWLINE);
break;
default:
- vty_out (vty, " invalid nexthop type %u%s", bnc->nexthop[i].type, VTY_NEWLINE);
+ vty_out (vty, " invalid nexthop type %u%s", nexthop->type, VTY_NEWLINE);
}
}
else
@@ -1334,7 +1316,9 @@
VTY_NEWLINE);
}
}
-
+#else
+ vty_out (vty, "BGP next-hop tracking is on%s", VTY_NEWLINE);
+#endif
vty_out (vty, "BGP connected route:%s", VTY_NEWLINE);
for (rn = bgp_table_top (bgp_connected_table[AFI_IP]);
rn;
@@ -1357,6 +1341,80 @@
return CMD_SUCCESS;
}
+static int
+show_ip_bgp_nexthop_table (struct vty *vty, int detail)
+{
+ struct bgp_node *rn;
+ struct bgp_nexthop_cache *bnc;
+ char buf[INET6_ADDRSTRLEN];
+ struct nexthop *nexthop;
+ time_t tbuf;
+ afi_t afi;
+
+ vty_out (vty, "Current BGP nexthop cache:%s", VTY_NEWLINE);
+ for (afi = AFI_IP ; afi < AFI_MAX ; afi++)
+ {
+ for (rn = bgp_table_top (bgp_nexthop_cache_table[afi]); rn; rn = bgp_route_next (rn))
+ {
+ if ((bnc = rn->info) != NULL)
+ {
+ if (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID))
+ {
+ vty_out (vty, " %s valid [IGP metric %d], #paths %d%s",
+ inet_ntop (rn->p.family, &rn->p.u.prefix, buf, sizeof (buf)),
+ bnc->metric, bnc->path_count, VTY_NEWLINE);
+ if (detail)
+ for (nexthop = bnc->nexthop ; nexthop; nexthop = nexthop->next)
+ switch (nexthop->type)
+ {
+ case NEXTHOP_TYPE_IPV6:
+ vty_out (vty, " gate %s%s",
+ inet_ntop (AF_INET6, &nexthop->gate.ipv6,
+ buf, INET6_ADDRSTRLEN), VTY_NEWLINE);
+ break;
+ case NEXTHOP_TYPE_IPV6_IFINDEX:
+ vty_out(vty, " gate %s, if %s%s",
+ inet_ntop(AF_INET6, &nexthop->gate.ipv6, buf,
+ INET6_ADDRSTRLEN),
+ ifindex2ifname(nexthop->ifindex),
+ VTY_NEWLINE);
+ break;
+ case NEXTHOP_TYPE_IPV4:
+ vty_out (vty, " gate %s%s",
+ inet_ntop (AF_INET, &nexthop->gate.ipv4, buf,
+ INET6_ADDRSTRLEN), VTY_NEWLINE);
+ break;
+ case NEXTHOP_TYPE_IFINDEX:
+ vty_out (vty, " if %s%s",
+ ifindex2ifname(nexthop->ifindex), VTY_NEWLINE);
+ break;
+ case NEXTHOP_TYPE_IPV4_IFINDEX:
+ vty_out (vty, " gate %s, if %s%s",
+ inet_ntop(AF_INET, &nexthop->gate.ipv4, buf,
+ INET6_ADDRSTRLEN),
+ ifindex2ifname(nexthop->ifindex), VTY_NEWLINE);
+ break;
+ default:
+ vty_out (vty, " invalid nexthop type %u%s",
+ nexthop->type, VTY_NEWLINE);
+ }
+ }
+ else
+ vty_out (vty, " %s invalid%s",
+ inet_ntop (rn->p.family, &rn->p.u.prefix, buf, sizeof (buf)), VTY_NEWLINE);
+#ifdef HAVE_CLOCK_MONOTONIC
+ tbuf = time(NULL) - (bgp_clock() - bnc->last_update);
+ vty_out (vty, " Last update: %s", ctime(&tbuf));
+#else
+ vty_out (vty, " Last update: %s", ctime(&bnc->last_update));
+#endif /* HAVE_CLOCK_MONOTONIC */
+ vty_out(vty, "%s", VTY_NEWLINE);
+ }
+ }
+ }
+ return CMD_SUCCESS;
+}
+
DEFUN (show_ip_bgp_scan,
show_ip_bgp_scan_cmd,
"show ip bgp scan",
@@ -1380,6 +1438,28 @@
return show_ip_bgp_scan_tables (vty, 1);
}
+DEFUN (show_ip_bgp_nexthop,
+ show_ip_bgp_nexthop_cmd,
+ "show ip bgp nexthop",
+ SHOW_STR
+ IP_STR
+ BGP_STR
+ "BGP nexthop table\n")
+{
+ return show_ip_bgp_nexthop_table (vty, 0);
+}
+
+DEFUN (show_ip_bgp_nexthop_detail,
+ show_ip_bgp_nexthop_detail_cmd,
+ "show ip bgp nexthop detail",
+ SHOW_STR
+ IP_STR
+ BGP_STR
+ "BGP nexthop table\n")
+{
+ return show_ip_bgp_nexthop_table (vty, 1);
+}
+
int
bgp_config_write_scan_time (struct vty *vty)
{
@@ -1420,8 +1500,12 @@
install_element (BGP_NODE, &no_bgp_scan_time_val_cmd);
install_element (VIEW_NODE, &show_ip_bgp_scan_cmd);
install_element (VIEW_NODE, &show_ip_bgp_scan_detail_cmd);
+ install_element (VIEW_NODE, &show_ip_bgp_nexthop_cmd);
+ install_element (VIEW_NODE, &show_ip_bgp_nexthop_detail_cmd);
install_element (RESTRICTED_NODE, &show_ip_bgp_scan_cmd);
install_element (ENABLE_NODE, &show_ip_bgp_scan_cmd);
+ install_element (ENABLE_NODE, &show_ip_bgp_nexthop_cmd);
+ install_element (ENABLE_NODE, &show_ip_bgp_nexthop_detail_cmd);
install_element (ENABLE_NODE, &show_ip_bgp_scan_detail_cmd);
}
diff --git a/bgpd/bgp_nexthop.h b/bgpd/bgp_nexthop.h
index 85c5a5d..a239ca0 100644
--- a/bgpd/bgp_nexthop.h
+++ b/bgpd/bgp_nexthop.h
@@ -22,6 +22,8 @@
#define _QUAGGA_BGP_NEXTHOP_H
#include "if.h"
+#include "queue.h"
+#include "prefix.h"
#define BGP_SCAN_INTERVAL_DEFAULT 60
#define BGP_IMPORT_INTERVAL_DEFAULT 15
@@ -53,6 +55,20 @@
/* Nexthop number and nexthop linked list.*/
u_char nexthop_num;
struct nexthop *nexthop;
+ time_t last_update;
+ u_int16_t flags;
+
+#define BGP_NEXTHOP_VALID (1 << 0)
+#define BGP_NEXTHOP_REGISTERED (1 << 1)
+
+ u_int16_t change_flags;
+
+#define BGP_NEXTHOP_CHANGED (1 << 0)
+#define BGP_NEXTHOP_METRIC_CHANGED (1 << 1)
+
+ struct bgp_node *node;
+ LIST_HEAD(path_list, bgp_info) paths;
+ unsigned int path_count;
};
extern void bgp_scan_init (void);
@@ -68,5 +84,9 @@
extern void bgp_address_init (void);
extern void bgp_address_destroy (void);
extern void bgp_scan_destroy (void);
+extern struct bgp_nexthop_cache *bnc_new(void);
+extern void bnc_free(struct bgp_nexthop_cache *bnc);
+extern void bnc_nexthop_free(struct bgp_nexthop_cache *bnc);
+extern char *bnc_str(struct bgp_nexthop_cache *bnc, char *buf, int size);
#endif /* _QUAGGA_BGP_NEXTHOP_H */
diff --git a/bgpd/bgp_nht.c b/bgpd/bgp_nht.c
new file mode 100644
index 0000000..21e1411
--- /dev/null
+++ b/bgpd/bgp_nht.c
@@ -0,0 +1,477 @@
+/* BGP Nexthop tracking
+ * Copyright (C) 2013 Cumulus Networks, Inc.
+ *
+ * This file is part of GNU Zebra.
+ *
+ * GNU Zebra is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * GNU Zebra is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GNU Zebra; see the file COPYING. If not, write to the Free
+ * Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+ * 02111-1307, USA.
+ */
+
+#include <zebra.h>
+
+#include "command.h"
+#include "thread.h"
+#include "prefix.h"
+#include "zclient.h"
+#include "stream.h"
+#include "network.h"
+#include "log.h"
+#include "memory.h"
+#include "nexthop.h"
+#include "filter.h"
+
+#include "bgpd/bgpd.h"
+#include "bgpd/bgp_table.h"
+#include "bgpd/bgp_route.h"
+#include "bgpd/bgp_attr.h"
+#include "bgpd/bgp_nexthop.h"
+#include "bgpd/bgp_debug.h"
+#include "bgpd/bgp_nht.h"
+
+extern struct zclient *zclient;
+extern struct bgp_table *bgp_nexthop_cache_table[AFI_MAX];
+
+static void register_nexthop(struct bgp_nexthop_cache *bnc);
+static void unregister_nexthop (struct bgp_nexthop_cache *bnc);
+static void evaluate_paths(struct bgp_nexthop_cache *bnc);
+static int make_prefix(int afi, struct bgp_info *ri, struct prefix *p);
+static void path_nh_map(struct bgp_info *path, struct bgp_nexthop_cache *bnc,
+ int keep);
+
+int
+bgp_find_nexthop (struct bgp_info *path, int *changed, int *metricchanged)
+{
+ struct bgp_nexthop_cache *bnc = path->nexthop;
+
+ if (!bnc)
+ return 0;
+
+ if (changed)
+ *changed = CHECK_FLAG(bnc->change_flags, BGP_NEXTHOP_CHANGED);
+
+ if (metricchanged)
+ *metricchanged = CHECK_FLAG(bnc->change_flags,
+ BGP_NEXTHOP_METRIC_CHANGED);
+
+ return (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID));
+}
+
+void
+bgp_unlink_nexthop (struct bgp_info *path)
+{
+ struct bgp_nexthop_cache *bnc = path->nexthop;
+
+ if (!bnc)
+ return;
+
+ path_nh_map(path, NULL, 0);
+
+ if (LIST_EMPTY(&(bnc->paths)))
+ {
+ if (BGP_DEBUG(nht, NHT))
+ {
+ char buf[INET6_ADDRSTRLEN];
+ zlog_debug("bgp_unlink_nexthop: freeing bnc %s",
+ bnc_str(bnc, buf, INET6_ADDRSTRLEN));
+ }
+ unregister_nexthop(bnc);
+ bnc->node->info = NULL;
+ bgp_unlock_node(bnc->node);
+ bnc_free(bnc);
+ }
+}
+
+int
+bgp_find_or_add_nexthop (afi_t afi, struct bgp_info *ri, int *changed,
+ int *metricchanged)
+{
+ struct bgp_node *rn;
+ struct bgp_nexthop_cache *bnc;
+ struct prefix p;
+
+ if (make_prefix(afi, ri, &p) < 0)
+ return 1;
+ rn = bgp_node_get (bgp_nexthop_cache_table[afi], &p);
+
+ if (!rn->info)
+ {
+ bnc = bnc_new();
+ rn->info = bnc;
+ bnc->node = rn;
+ bgp_lock_node(rn);
+ register_nexthop(bnc);
+ }
+ bnc = rn->info;
+ bgp_unlock_node (rn);
+ path_nh_map(ri, bnc, 1);
+
+ if (changed)
+ *changed = CHECK_FLAG(bnc->change_flags, BGP_NEXTHOP_CHANGED);
+
+ if (metricchanged)
+ *metricchanged = CHECK_FLAG(bnc->change_flags,
+ BGP_NEXTHOP_METRIC_CHANGED);
+
+ if (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID) && bnc->metric)
+ (bgp_info_extra_get(ri))->igpmetric = bnc->metric;
+ else if (ri->extra)
+ ri->extra->igpmetric = 0;
+
+ return (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID));
+}
+
+void
+bgp_parse_nexthop_update (void)
+{
+ struct stream *s;
+ struct bgp_node *rn;
+ struct bgp_nexthop_cache *bnc;
+ struct nexthop *nexthop;
+ struct nexthop *oldnh;
+ struct nexthop *nhlist_head = NULL;
+ struct nexthop *nhlist_tail = NULL;
+ uint32_t metric;
+ u_char nexthop_num;
+ struct prefix p;
+ int i;
+
+ s = zclient->ibuf;
+
+ memset(&p, 0, sizeof(struct prefix));
+ p.family = stream_getw(s);
+ p.prefixlen = stream_getc(s);
+ switch (p.family)
+ {
+ case AF_INET:
+ p.u.prefix4.s_addr = stream_get_ipv4 (s);
+ break;
+ case AF_INET6:
+ stream_get(&p.u.prefix6, s, 16);
+ break;
+ default:
+ break;
+ }
+
+ rn = bgp_node_lookup(bgp_nexthop_cache_table[family2afi(p.family)], &p);
+ if (!rn || !rn->info)
+ {
+ if (BGP_DEBUG(nht, NHT))
+ {
+ char buf[INET6_ADDRSTRLEN];
+ prefix2str(&p, buf, INET6_ADDRSTRLEN);
+ zlog_debug("parse nexthop update(%s): rn not found", buf);
+ }
+ if (rn)
+ bgp_unlock_node (rn);
+ return;
+ }
+
+ bnc = rn->info;
+ bgp_unlock_node (rn);
+ bnc->last_update = bgp_clock();
+ bnc->change_flags = 0;
+ metric = stream_getl (s);
+ nexthop_num = stream_getc (s);
+
+ /* debug print the input */
+ if (BGP_DEBUG(nht, NHT))
+ {
+ char buf[INET6_ADDRSTRLEN];
+ prefix2str(&p, buf, INET6_ADDRSTRLEN);
+ zlog_debug("parse nexthop update(%s): metric=%d, #nexthop=%d", buf,
+ metric, nexthop_num);
+ }
+
+ if (metric != bnc->metric)
+ bnc->change_flags |= BGP_NEXTHOP_METRIC_CHANGED;
+
+ if(nexthop_num != bnc->nexthop_num)
+ bnc->change_flags |= BGP_NEXTHOP_CHANGED;
+
+ if (nexthop_num)
+ {
+ bnc->flags |= BGP_NEXTHOP_VALID;
+ bnc->metric = metric;
+ bnc->nexthop_num = nexthop_num;
+
+ for (i = 0; i < nexthop_num; i++)
+ {
+ nexthop = nexthop_new();
+ nexthop->type = stream_getc (s);
+ switch (nexthop->type)
+ {
+ case ZEBRA_NEXTHOP_IPV4:
+ nexthop->gate.ipv4.s_addr = stream_get_ipv4 (s);
+ break;
+ case ZEBRA_NEXTHOP_IFINDEX:
+ case ZEBRA_NEXTHOP_IFNAME:
+ nexthop->ifindex = stream_getl (s);
+ break;
+ case ZEBRA_NEXTHOP_IPV4_IFINDEX:
+ case ZEBRA_NEXTHOP_IPV4_IFNAME:
+ nexthop->gate.ipv4.s_addr = stream_get_ipv4 (s);
+ nexthop->ifindex = stream_getl (s);
+ break;
+#ifdef HAVE_IPV6
+ case ZEBRA_NEXTHOP_IPV6:
+ stream_get (&nexthop->gate.ipv6, s, 16);
+ break;
+ case ZEBRA_NEXTHOP_IPV6_IFINDEX:
+ case ZEBRA_NEXTHOP_IPV6_IFNAME:
+ stream_get (&nexthop->gate.ipv6, s, 16);
+ nexthop->ifindex = stream_getl (s);
+ break;
+#endif
+ default:
+ /* do nothing */
+ break;
+ }
+
+ if (nhlist_tail)
+ {
+ nhlist_tail->next = nexthop;
+ nhlist_tail = nexthop;
+ }
+ else
+ {
+ nhlist_tail = nexthop;
+ nhlist_head = nexthop;
+ }
+
+ /* No need to evaluate the nexthop if we have already determined
+ * that there has been a change.
+ */
+ if (bnc->change_flags & BGP_NEXTHOP_CHANGED)
+ continue;
+
+ for (oldnh = bnc->nexthop; oldnh; oldnh = oldnh->next)
+ if (nexthop_same_no_recurse(oldnh, nexthop))
+ break;
+
+ if (!oldnh)
+ bnc->change_flags |= BGP_NEXTHOP_CHANGED;
+ }
+ bnc_nexthop_free(bnc);
+ bnc->nexthop = nhlist_head;
+ }
+ else
+ {
+ bnc->flags &= ~BGP_NEXTHOP_VALID;
+ bnc_nexthop_free(bnc);
+ bnc->nexthop = NULL;
+ }
+
+ evaluate_paths(bnc);
+}
+
+/**
+ * make_prefix - make a host prefix from the path's nexthop address.
+ * Returns -1 if the IPv6 nexthop is link-local or not 16 bytes long.
+ */
+static int
+make_prefix (int afi, struct bgp_info *ri, struct prefix *p)
+{
+ memset (p, 0, sizeof (struct prefix));
+ switch (afi)
+ {
+ case AFI_IP:
+ p->family = AF_INET;
+ p->prefixlen = IPV4_MAX_BITLEN;
+ p->u.prefix4 = ri->attr->nexthop;
+ break;
+#ifdef HAVE_IPV6
+ case AFI_IP6:
+ if (ri->attr->extra->mp_nexthop_len != 16
+ || IN6_IS_ADDR_LINKLOCAL (&ri->attr->extra->mp_nexthop_global))
+ return -1;
+
+ p->family = AF_INET6;
+ p->prefixlen = IPV6_MAX_BITLEN;
+ p->u.prefix6 = ri->attr->extra->mp_nexthop_global;
+ break;
+#endif
+ default:
+ break;
+ }
+ return 0;
+}
+
+/**
+ * sendmsg_nexthop -- Format and send a nexthop register/unregister
+ * command to Zebra.
+ * ARGUMENTS:
+ * struct bgp_nexthop_cache *bnc -- the nexthop structure.
+ * int command -- either ZEBRA_NEXTHOP_REGISTER or ZEBRA_NEXTHOP_UNREGISTER
+ * RETURNS:
+ * void.
+ */
+static void
+sendmsg_nexthop (struct bgp_nexthop_cache *bnc, int command)
+{
+ struct stream *s;
+ struct prefix *p;
+ int ret;
+
+ /* Check socket. */
+ if (!zclient || zclient->sock < 0)
+ return;
+
+ p = &(bnc->node->p);
+ s = zclient->obuf;
+ stream_reset (s);
+ zclient_create_header (s, command, VRF_DEFAULT);
+ stream_putw(s, PREFIX_FAMILY(p));
+ stream_putc(s, p->prefixlen);
+ switch (PREFIX_FAMILY(p))
+ {
+ case AF_INET:
+ stream_put_in_addr (s, &p->u.prefix4);
+ break;
+#ifdef HAVE_IPV6
+ case AF_INET6:
+ stream_put(s, &(p->u.prefix6), 16);
+ break;
+#endif
+ default:
+ break;
+ }
+ stream_putw_at (s, 0, stream_get_endp (s));
+
+ ret = zclient_send_message(zclient);
+ /* TBD: handle the failure */
+ if (ret < 0)
+ zlog_warn("sendmsg_nexthop: zclient_send_message() failed");
+ return;
+}
+
+/**
+ * register_nexthop - register a nexthop with Zebra for notification
+ * when the route to the nexthop changes.
+ * ARGUMENTS:
+ * struct bgp_nexthop_cache *bnc -- the nexthop structure.
+ * RETURNS:
+ * void.
+ */
+static void
+register_nexthop (struct bgp_nexthop_cache *bnc)
+{
+ /* Check if we have already registered */
+ if (bnc->flags & BGP_NEXTHOP_REGISTERED)
+ return;
+ sendmsg_nexthop(bnc, ZEBRA_NEXTHOP_REGISTER);
+ SET_FLAG(bnc->flags, BGP_NEXTHOP_REGISTERED);
+}
+
+/**
+ * unregister_nexthop -- Unregister the nexthop from Zebra.
+ * ARGUMENTS:
+ * struct bgp_nexthop_cache *bnc -- the nexthop structure.
+ * RETURNS:
+ * void.
+ */
+static void
+unregister_nexthop (struct bgp_nexthop_cache *bnc)
+{
+ /* Nothing to do if we are not registered */
+ if (!CHECK_FLAG(bnc->flags, BGP_NEXTHOP_REGISTERED))
+ return;
+
+ sendmsg_nexthop(bnc, ZEBRA_NEXTHOP_UNREGISTER);
+ UNSET_FLAG(bnc->flags, BGP_NEXTHOP_REGISTERED);
+}
+
+/**
+ * evaluate_paths - Evaluate the paths/nets associated with a nexthop.
+ * ARGUMENTS:
+ * struct bgp_nexthop_cache *bnc -- the nexthop structure.
+ * RETURNS:
+ * void.
+ */
+static void
+evaluate_paths (struct bgp_nexthop_cache *bnc)
+{
+ struct bgp_node *rn;
+ struct bgp_info *path;
+ struct bgp *bgp = bgp_get_default();
+ int afi;
+
+ LIST_FOREACH(path, &(bnc->paths), nh_thread)
+ {
+ if (!(path->type == ZEBRA_ROUTE_BGP &&
+ path->sub_type == BGP_ROUTE_NORMAL))
+ continue;
+
+ rn = path->net;
+ afi = family2afi(rn->p.family);
+
+ /* Path becomes valid/invalid depending on whether the nexthop is
+ * reachable/unreachable.
+ */
+ if ((CHECK_FLAG(path->flags, BGP_INFO_VALID) ? 1 : 0) !=
+ (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID) ? 1 : 0))
+ {
+ if (CHECK_FLAG (path->flags, BGP_INFO_VALID))
+ {
+ bgp_aggregate_decrement (bgp, &rn->p, path,
+ afi, SAFI_UNICAST);
+ bgp_info_unset_flag (rn, path, BGP_INFO_VALID);
+ }
+ else
+ {
+ bgp_info_set_flag (rn, path, BGP_INFO_VALID);
+ bgp_aggregate_increment (bgp, &rn->p, path,
+ afi, SAFI_UNICAST);
+ }
+ }
+
+ /* Copy the metric to the path. Will be used for bestpath computation */
+ if (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID) && bnc->metric)
+ (bgp_info_extra_get(path))->igpmetric = bnc->metric;
+ else if (path->extra)
+ path->extra->igpmetric = 0;
+
+ if (CHECK_FLAG(bnc->change_flags, BGP_NEXTHOP_METRIC_CHANGED) ||
+ CHECK_FLAG(bnc->change_flags, BGP_NEXTHOP_CHANGED))
+ SET_FLAG(path->flags, BGP_INFO_IGP_CHANGED);
+
+ bgp_process(bgp, rn, afi, SAFI_UNICAST);
+ }
+ RESET_FLAG(bnc->change_flags);
+}
+
+/**
+ * path_nh_map - make or break path-to-nexthop association.
+ * ARGUMENTS:
+ * path - pointer to the path structure
+ * bnc - pointer to the nexthop structure
+ * make - if set, make the association. if unset, just break the existing
+ * association.
+ */
+static void
+path_nh_map (struct bgp_info *path, struct bgp_nexthop_cache *bnc, int make)
+{
+ if (path->nexthop)
+ {
+ LIST_REMOVE(path, nh_thread);
+ path->nexthop->path_count--;
+ path->nexthop = NULL;
+ }
+ if (make)
+ {
+ LIST_INSERT_HEAD(&(bnc->paths), path, nh_thread);
+ path->nexthop = bnc;
+ path->nexthop->path_count++;
+ }
+}
diff --git a/bgpd/bgp_nht.h b/bgpd/bgp_nht.h
new file mode 100644
index 0000000..41c2b85
--- /dev/null
+++ b/bgpd/bgp_nht.h
@@ -0,0 +1,62 @@
+/* BGP Nexthop tracking
+ * Copyright (C) 2013 Cumulus Networks, Inc.
+ *
+ * This file is part of GNU Zebra.
+ *
+ * GNU Zebra is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2, or (at your option) any
+ * later version.
+ *
+ * GNU Zebra is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with GNU Zebra; see the file COPYING. If not, write to the Free
+ * Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+ * 02111-1307, USA.
+ */
+
+#ifndef _BGP_NHT_H
+#define _BGP_NHT_H
+
+/**
+ * bgp_parse_nexthop_update() - parse a nexthop update message from Zebra.
+ */
+extern void bgp_parse_nexthop_update(void);
+
+/**
+ * bgp_find_nexthop() - lookup the nexthop cache table for the bnc object
+ * ARGUMENTS:
+ * p - path for which the nexthop object is being looked up
+ * c - output variable that stores whether the nexthop object has changed
+ * since last time.
+ * m - output variable that stores whether the nexthop metric has changed
+ * since last time.
+ */
+extern int bgp_find_nexthop(struct bgp_info *p, int *c, int *m);
+
+/**
+ * bgp_find_or_add_nexthop() - lookup the nexthop cache table for the bnc
+ * object. If not found, create a new object and register with ZEBRA for
+ * nexthop notification.
+ * ARGUMENTS:
+ * a - afi: AFI_IP or AFI_IP6
+ * p - path for which the nexthop object is being looked up
+ * c - output variable that stores whether the nexthop object has changed
+ * since last time.
+ * m - output variable that stores whether the nexthop metric has changed
+ * since last time.
+ */
+extern int bgp_find_or_add_nexthop(afi_t a, struct bgp_info *p, int *c, int *m);
+
+/**
+ * bgp_unlink_nexthop() - Unlink the nexthop object from the path structure.
+ * ARGUMENTS:
+ * p - path structure.
+ */
+extern void bgp_unlink_nexthop(struct bgp_info *p);
+
+#endif /* _BGP_NHT_H */
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index 746c545..051dc83 100644
--- a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -55,6 +55,7 @@
#include "bgpd/bgp_zebra.h"
#include "bgpd/bgp_vty.h"
#include "bgpd/bgp_mpath.h"
+#include "bgpd/bgp_nht.h"
/* Extern from bgp_dump.c */
extern const char *bgp_origin_str[];
@@ -126,20 +127,14 @@
return ri->extra;
}
-/* Allocate new bgp info structure. */
-static struct bgp_info *
-bgp_info_new (void)
-{
- return XCALLOC (MTYPE_BGP_ROUTE, sizeof (struct bgp_info));
-}
-
/* Free bgp route information. */
static void
bgp_info_free (struct bgp_info *binfo)
{
if (binfo->attr)
bgp_attr_unintern (&binfo->attr);
-
+
+ bgp_unlink_nexthop(binfo);
bgp_info_extra_free (&binfo->extra);
bgp_info_mpath_free (&binfo->mpath);
@@ -1882,6 +1877,23 @@
bgp_rib_remove (rn, ri, peer, afi, safi);
}
+static struct bgp_info *
+info_make (int type, int sub_type, struct peer *peer, struct attr *attr,
+ struct bgp_node *rn)
+{
+ struct bgp_info *new;
+
+ /* Make new BGP info. */
+ new = XCALLOC (MTYPE_BGP_ROUTE, sizeof (struct bgp_info));
+ new->type = type;
+ new->sub_type = sub_type;
+ new->peer = peer;
+ new->attr = attr;
+ new->uptime = bgp_clock ();
+ new->net = rn;
+ return new;
+}
+
static void
bgp_update_rsclient (struct peer *rsclient, afi_t afi, safi_t safi,
struct attr *attr, struct peer *peer, struct prefix *p, int type,
@@ -2029,13 +2041,7 @@
p->prefixlen, rsclient->host);
}
- /* Make new BGP info. */
- new = bgp_info_new ();
- new->type = type;
- new->sub_type = sub_type;
- new->peer = peer;
- new->attr = attr_new;
- new->uptime = bgp_clock ();
+ new = info_make(type, sub_type, peer, attr_new, rn);
/* Update MPLS tag. */
if (safi == SAFI_MPLS_VPN)
@@ -2347,7 +2353,7 @@
|| (peer->sort == BGP_PEER_EBGP && peer->ttl != 1)
|| CHECK_FLAG (peer->flags, PEER_FLAG_DISABLE_CONNECTED_CHECK)))
{
- if (bgp_nexthop_lookup (afi, peer, ri, NULL, NULL))
+ if (bgp_find_or_add_nexthop (afi, ri, NULL, NULL))
bgp_info_set_flag (rn, ri, BGP_INFO_VALID);
else
bgp_info_unset_flag (rn, ri, BGP_INFO_VALID);
@@ -2376,12 +2382,7 @@
}
/* Make new BGP info. */
- new = bgp_info_new ();
- new->type = type;
- new->sub_type = sub_type;
- new->peer = peer;
- new->attr = attr_new;
- new->uptime = bgp_clock ();
+ new = info_make(type, sub_type, peer, attr_new, rn);
/* Update MPLS tag. */
if (safi == SAFI_MPLS_VPN)
@@ -2395,7 +2396,7 @@
|| (peer->sort == BGP_PEER_EBGP && peer->ttl != 1)
|| CHECK_FLAG (peer->flags, PEER_FLAG_DISABLE_CONNECTED_CHECK)))
{
- if (bgp_nexthop_lookup (afi, peer, new, NULL, NULL))
+ if (bgp_find_or_add_nexthop (afi, new, NULL, NULL))
bgp_info_set_flag (rn, new, BGP_INFO_VALID);
else
bgp_info_unset_flag (rn, new, BGP_INFO_VALID);
@@ -3538,15 +3539,11 @@
return;
}
}
-
+
/* Make new BGP info. */
- new = bgp_info_new ();
- new->type = ZEBRA_ROUTE_BGP;
- new->sub_type = BGP_ROUTE_STATIC;
- new->peer = bgp->peer_self;
+ new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_STATIC, bgp->peer_self,
+ attr_new, rn);
SET_FLAG (new->flags, BGP_INFO_VALID);
- new->attr = attr_new;
- new->uptime = bgp_clock ();
/* Register new BGP information. */
bgp_info_add (rn, new);
@@ -3659,13 +3656,9 @@
}
/* Make new BGP info. */
- new = bgp_info_new ();
- new->type = ZEBRA_ROUTE_BGP;
- new->sub_type = BGP_ROUTE_STATIC;
- new->peer = bgp->peer_self;
+ new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_STATIC, bgp->peer_self, attr_new,
+ rn);
SET_FLAG (new->flags, BGP_INFO_VALID);
- new->attr = attr_new;
- new->uptime = bgp_clock ();
/* Aggregate address increment. */
bgp_aggregate_increment (bgp, p, new, afi, safi);
@@ -3706,8 +3699,14 @@
{
struct bgp_node *rn;
struct bgp_info *ri;
+ struct bgp_info *new;
- rn = bgp_afi_node_get (bgp->rib[afi][safi], afi, safi, p, NULL);
+ /* Make new BGP info. */
+ rn = bgp_node_get (bgp->rib[afi][safi], p);
+ new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_STATIC, bgp->peer_self,
+ bgp_attr_default_intern(BGP_ORIGIN_IGP), rn);
+
+ SET_FLAG (new->flags, BGP_INFO_VALID);
/* Check selected route and self inserted route. */
for (ri = rn->info; ri; ri = ri->next)
@@ -3877,13 +3876,9 @@
/* Make new BGP info. */
- new = bgp_info_new ();
- new->type = ZEBRA_ROUTE_BGP;
- new->sub_type = BGP_ROUTE_STATIC;
- new->peer = bgp->peer_self;
- new->attr = attr_new;
+ new = info_make (ZEBRA_ROUTE_BGP, BGP_ROUTE_STATIC, bgp->peer_self,
+ attr_new, rn);
SET_FLAG (new->flags, BGP_INFO_VALID);
- new->uptime = bgp_clock ();
new->extra = bgp_info_extra_new();
memcpy (new->extra->tag, bgp_static->tag, 3);
@@ -4881,13 +4876,10 @@
if (aggregate->count > 0)
{
rn = bgp_node_get (table, p);
- new = bgp_info_new ();
- new->type = ZEBRA_ROUTE_BGP;
- new->sub_type = BGP_ROUTE_AGGREGATE;
- new->peer = bgp->peer_self;
+ new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, bgp->peer_self,
+ bgp_attr_aggregate_intern(bgp, origin, aspath, community,
+ aggregate->as_set), rn);
SET_FLAG (new->flags, BGP_INFO_VALID);
- new->attr = bgp_attr_aggregate_intern (bgp, origin, aspath, community, aggregate->as_set);
- new->uptime = bgp_clock ();
bgp_info_add (rn, new);
bgp_unlock_node (rn);
@@ -5065,14 +5057,10 @@
if (aggregate->count)
{
rn = bgp_node_get (table, p);
-
- new = bgp_info_new ();
- new->type = ZEBRA_ROUTE_BGP;
- new->sub_type = BGP_ROUTE_AGGREGATE;
- new->peer = bgp->peer_self;
+ new = info_make(ZEBRA_ROUTE_BGP, BGP_ROUTE_AGGREGATE, bgp->peer_self,
+ bgp_attr_aggregate_intern(bgp, origin, aspath, community,
+ aggregate->as_set), rn);
SET_FLAG (new->flags, BGP_INFO_VALID);
- new->attr = bgp_attr_aggregate_intern (bgp, origin, aspath, community, aggregate->as_set);
- new->uptime = bgp_clock ();
bgp_info_add (rn, new);
bgp_unlock_node (rn);
@@ -5715,16 +5703,12 @@
aspath_unintern (&attr.aspath);
bgp_attr_extra_free (&attr);
return;
- }
+ }
}
- new = bgp_info_new ();
- new->type = type;
- new->sub_type = BGP_ROUTE_REDISTRIBUTE;
- new->peer = bgp->peer_self;
+ new = info_make(type, BGP_ROUTE_REDISTRIBUTE, bgp->peer_self,
+ new_attr, bn);
SET_FLAG (new->flags, BGP_INFO_VALID);
- new->attr = new_attr;
- new->uptime = bgp_clock ();
bgp_aggregate_increment (bgp, p, new, afi, SAFI_UNICAST);
bgp_info_add (bn, new);
diff --git a/bgpd/bgp_route.h b/bgpd/bgp_route.h
index 8483f3d..16b6d5a 100644
--- a/bgpd/bgp_route.h
+++ b/bgpd/bgp_route.h
@@ -21,8 +21,11 @@
#ifndef _QUAGGA_BGP_ROUTE_H
#define _QUAGGA_BGP_ROUTE_H
+#include "queue.h"
#include "bgp_table.h"
+struct bgp_nexthop_cache;
+
/* Ancillary information to struct bgp_info,
* used for uncommonly used data (aggregation, MPLS, etc.)
* and lazily allocated to save memory.
@@ -47,7 +50,16 @@
/* For linked list. */
struct bgp_info *next;
struct bgp_info *prev;
-
+
+ /* For nexthop linked list */
+ LIST_ENTRY(bgp_info) nh_thread;
+
+ /* Back pointer to the prefix node */
+ struct bgp_node *net;
+
+ /* Back pointer to the nexthop structure */
+ struct bgp_nexthop_cache *nexthop;
+
/* Peer structure. */
struct peer *peer;
diff --git a/bgpd/bgp_zebra.c b/bgpd/bgp_zebra.c
index 00de6b8..484e355 100644
--- a/bgpd/bgp_zebra.c
+++ b/bgpd/bgp_zebra.c
@@ -39,6 +39,8 @@
#include "bgpd/bgp_fsm.h"
#include "bgpd/bgp_debug.h"
#include "bgpd/bgp_mpath.h"
+#include "bgpd/bgp_nexthop.h"
+#include "bgpd/bgp_nht.h"
/* All information about zebra. */
struct zclient *zclient = NULL;
@@ -70,6 +72,15 @@
return 0;
}
+/* Nexthop update message from zebra. */
+static int
+bgp_read_nexthop_update (int command, struct zclient *zclient,
+ zebra_size_t length, vrf_id_t vrf_id)
+{
+ bgp_parse_nexthop_update();
+ return 0;
+}
+
/* Inteface addition message from zebra. */
static int
bgp_interface_add (int command, struct zclient *zclient, zebra_size_t length,
@@ -1160,6 +1171,7 @@
zclient->interface_down = bgp_interface_down;
zclient->ipv6_route_add = zebra_read_ipv6;
zclient->ipv6_route_delete = zebra_read_ipv6;
+ zclient->nexthop_update = bgp_read_nexthop_update;
bgp_nexthop_buf = stream_new(BGP_NEXTHOP_BUF_SIZE);
bgp_ifindices_buf = stream_new(BGP_IFINDICES_BUF_SIZE);