Age | Commit message (Collapse) | Author |
|
For this use the validation state (vstate) in struct prefix and
struct filterstate to store both the ASPA and ROA validity.
Introduce helper functions to set and get the various states for
struct prefix and make sure struct filterstate is also setup properly.
Change the ASPA state in rde_aspath to be AFI/AID and role independent
by storing all 4 possible outcomes. Also add a ASPA generation count
which is used to update the rde_aspath ASPA state cache on reloads.
Rework the rde_aspa.c code to be AFI/AID and role independent. Doing
this for roles is trivial but AFI switch goes deep and is so unnecessary.
The reload is combined with the ROA reload logic and renamed to RPKI
softreload.
OK tb@
|
|
to the various prefix update functions.
While there fix a filterstate leak in up_generate_updates().
With and OK tb@
|
|
RDE. The actual reload logic is missing to keep the diff small.
OK tb@
|
|
- rde_filterstate_init(): initialize a filterstate to default values
- rde_filterstate_copy(): copy from a filterstate into a new state object
- rde_filterstate_prep(): set filtersate based on prefix passed as argument.
This makes the code a bit easier to read.
OK tb@
|
|
Removes vstate argument from rde_filter().
Rename prefix_vstate() to prefix_roa_vstate().
OK tb@
|
|
This implements ASPA validation based on the current draft. Implementing
this showed various weaknesses in the current ASPA draft which I hope to
fix in the near future.
Unlike the algorithm specified in the draft our version validates the
AS_PATH attribute in a single path doing one or two lookups depending on
the sessions BGP role.
The code is not yet hooked up into the RDE (see the NOTYET blocks).
Missing are reload logic, bgpctl integration and the loading of the
merged ASPA set from the rtr process.
OK tb@
|
|
any parts of his diff not taken are noted on tech
|
|
OK tb@
|
|
The generic add-path code up_generate_addpath() reevaluates everything
since this is the simplest way to select the announced paths. For add-path
all this is overkill since there is no dependency between prefixes and so
individual prefixes can be handled more efficently.
Extend rde_generate_updates() to pass the current newbest and oldbest
prefixes (for the selected best path) but now also include newpath and
oldpath (which is the prefix that is added/removed/modified).
If newpath or oldpath is set then a single prefix was altered and
up_generate_addpath_all() can just remove or add this prefix.
If newpath and oldpath are NULL than the full list based on newbest
needs to be inserted and any old path/prefix removed in the process.
This improves update generation performance on big route collectors using
add-path all substantially.
OK tb@
|
|
Use a per peer path_id_tx to assign to paths received from none add-path
enabled peers. This skips two extra walks of the RIB prefix list and is
a big speed-up when there are many regular sessions. If the session uses
add-path recv then the old way of assigning random path_ids needs to be
used.
With input and OK tb@
|
|
In some cases only a "small" part of the RIB needs to be looked at. Like
bgpctl show rib 10/8 or-longer that only needs to travers nodes under
10/8 all other RIB entries do not matter. By setting the start node to
the RB_NFIND(10/8) the all nodes below this point can be skipped.
Using prefix_compare() while walking the tree with RB_NEXT() the walker
know when it steps outside of the 10/8 subtree and stops.
With this the or-longer commands become a lot faster.
Looks good to tb@
|
|
Only the RDE used a hashtable for lookups while the session engine
switched from a list to RB tree some time ago.
Use peer_foreach() in the mrt code instead of passing the peer list
as an argument.
OK benno@ tb@
|
|
prototypes and members that were not removed in the previous RB tree
conversions.
OK benno@ tb@
|
|
OK benno@ tb@
|
|
OK benno@
|
|
struct. It uses a bit more memory but improves performance a lot on really
big systems because aspath_get() becomes a very hot function.
OK tb@
|
|
undersized hash table.
OK tb@
|
|
OK tb@
|
|
them on the per peer imsg queue. This is mainly for IMSG_SESSION_DOWN.
Delaying the session down can race against IMSG_SESSION_ADD which is
handled immediatly and as a result an establised connection may be
removed in the RDE because of it.
The various graceful restart imsgs need similar treatment for similar
reasons. In the end when a session is reset/closed the RDE needs to
stop all work and flush the per peer imsg queue.
With this only update and route refresh messages are handled via the
imsg queue.
OK tb@
|
|
|
|
|
|
In rev 1.90 of rde_decide.c the re->active cache of the best prefix was
replaced with a call to prefix_best(). This introduced a bug because the
nexthop state at that time may have changed already. As a result when
a nexthop became unreachable prefix_evaluate() had oldbest = NULL and
newbest = NULL and did not withdraw the prefix from FIB and Adj-RIB-Out.
To fix this store the nexthop state per prefix and introduce
prefix_evaluate_nexthop() which removes the prefix from the decision list,
updates the nexthop state of the prefix and reinserts the prefix. Doing
this ensures that prefix_best() always reports the same result once the
decison process is done. prefix_best() and prefix_eligible() only depend
on data stored on the prefix itself.
OK tb@
|
|
This allows to send out more then one path per perfix to a neighbor that
supports add-path receive. OpenBGPD supports a few different modes to
select which paths to send:
- all: send all valid paths (the ones with a * in bgpctl output)
- best: send out only the single best path
- ecmp: send out paths that evaluate the same up and including
the nexthop metric
- as-wide-best: send out paths that evaluete the same up but not including
the nexthop metric
Currently ecmp and as-wide-best are the same. On top of this best, ecmp
and as-wide-best allow to include extra paths (e.g. best plus 2) and
for the multipath modes there is also a maximum (e.g. ecmp plus 2 max 4)
OK tb@
|
|
Adjust prefix_adjout_update() to properly handle path_id_tx.
Move the lookup of the prefix out of prefix_adjout_update() and to
up_generate_updates(). While that code uses prefix_adjout_lookup() to
find the current prefix in the Adj-RIB-Out and add-path aware function
will use prefix_adjout_get().
In up_generate_default() just use 0 for path_id_tx since for this peer
that is the only prefix installed into the Adj-RIB-Out.
OK tb@
|
|
For add-path a unique path_id needs to be assigne to all prefixes.
Use a random number since the RFC explicitly mentions that there is no
meaning what the value means. The local path_id is inherited to all
the RIBs. Adj-RIB-Out handling is not yet down.
OK tb@
|
|
this prefix with respect to its previous one.
Currently the plan is to distinguish the best prefix (only one), ecmp
prefixes (currently the same as as-wide-multipath), as-wide-multipath
prefixes, valid prefixes and invalid prefixes.
This information will be used to implement add-path send but also for
ecmp support in bgpd.
OK tb@
|
|
only called in one spot.
rde_generate_updates() gets a enum eval_mode argument to discern
the different cases. peer_generate_update() uses the eval_mode to skip
the update if it is not needed.
While there also add an extra AID check in IMSG_REFRESH case to make sure
the requested AID is actually available for this peer.
OK tb@
|
|
With this it is possible to send a role in the OPEN message and if that
was successful the RDE will add the new OTC attribute if necessary.
OK tb@
|
|
When max-communities X is set on a filterrule the filter will match when
more than X communities are present in the path. In other words
max-communities 0 means no communities are allowed and max-communities 3
limits it up to 3 communities.
There is max-communities, max-ext-communities and max-large-communities
for each of the 3 community attributes. These three max checks can be used
together.
OK tb@ job@
|
|
First of all the detection logic was totally wrong. Then filter out
non-transitive extended communities when received from an ebgp peer.
Also cleanup the type handling of ext-communities. Mainly to not have
to handle the transitive vs non-transitive versions the type is masked
with EXT_COMMUNITY_VALUE before doing the switch case for the various
types.
With this my test using ext-communities works.
OK tb@
|
|
rib_entry. Mostly mechanical, this simplifies prefix_insert() and
prefix_remove() since the redo queue can now just use TAILQ_INSERT_TAIL().
rde_softreconfig_sync_reeval() needs to use TAILQ_CONCAT() to move
the list of prefixes over to the local TAILQ_HEAD to reapply them later.
OK tb@
|
|
and it also makes less sense to track this with ECMP or add-path.
Replace the re->active access with prefix_best(re) which does the
check on the spot.
Feedback and OK tb@
|
|
First flush all affected Adj-RIB-Out and then in a second step re-evaluate
the RIB itself. The no evaluate case becomes simpler. Fix the way
prefixes are re-evaluated, the list remove needs to be explict and not
part of prefix_evaluate() as in most other cases since this list is not
part of the rib_entry.
OK tb@
|
|
will be reused soon.
OK denis@ tb@
|
|
and to the accounting in the function.
OK tb@
|
|
|
|
Just pass the prefix to be withdrawn to the function and move the lookup
up. Adjust how the various accounting vars are updated so that the
values are decremented in the right cases. Do the same accounting dance
for prefix_adjout_destroy(). Adjust rde_up_flush_upcall() to directly
call prefix_adjout_withdraw() without calling it via up_generate_updates().
OK tb@
|
|
prefix. For this extend the RB trees of the Adj-RIB-Out to also consider
the path_id. Add functions to lookup a prefix without path_id so that
bgpctl works. Rename functions so that all Adj-RIB-Out specific functions
start with prefix_adjout_
For now the path_id_tx in the Adj-RIB-Out is forced to 0 since
up_generate_updates() is not ready to handle more than one path per prefix.
OK tb@
|
|
a few reindents.
OK florian@ tb@
|
|
side of RFC7911 and the send portion will follow.
The path-id is extracted from the NLRI encoding an put into struct
prefix. To do this the prefix_by_peer() function gets a path-id
argument. If a session is not path-id enabled this argument will
be always 0. If a session is path-id enabled the value is taken
from the NLRI and can be anything, including 0. The value has no
meaning in itself. Still to make sure the decision process is able
to break a tie the path-id is checked as the last step (this is not
part of the RFC but required).
OK benno@
|
|
that splits the normal RIB linkage vs the adjrib-out linkage. This is
done to make a bit of space to put an extra add-path related id into
the struct without blowing its size over 128 bytes.
Long run this struct should be split up but the necessary changes are
too large right now so this is the 2nd best option.
OK benno@
|
|
can be enabled with 'announce enhanced refresh yes'
Similar to graceful restart this allows to mark routes as stale, refresh
them and the flush out routes that are still stale. Enhanced route refresh
uses a begin of rr and a end of rr message to signal the various stages.
A future enhancement would be the addition of a timeout in case the EoRR
message is not sent in reasonable time.
OK denis@ job@
|
|
hopefully better names peer_has_as4byte() and peer_accept_no_as_set().
Move them to rde_peer.c where all other peer functions live.
OK sthen@
|
|
Add an extra reload barrier (IMSG_RECONF_DRAIN) to the sync of the peer
config from the session engine to the rde. Necessary to ensure that the
peer config is up to date in the RDE before hitting reconfiguration.
Store the export_type and the peer flags outside of peer->conf. Adjust all
users of these two fields so they only look at the copies in peer.
During reload check the values with the peer->conf to check for changes.
If the export_type or the rde evaluate or transparent-as flags changed
flush the Adj-RIB-Out for that peer and in a 2nd step rebuild the RIB from
scratch. This results in a lot of UPDATE churn but these configs are not
altered often.
Fix multiple issues in the rde_softreconfig_in_done handler that resulted
in multiple runs of the out stage of the softreconfig pipeline.
OK benno@
|
|
route-server environments.
By default only the best path is sent to peers and if that path is filtered
then the path is hidden for that peer. On route-servers this is sometimes
not desried. For this 'rde evaluate all' will cause the evaluation process
to fall back to alternate routes and will redistribute the first non-filtered
path to the peer. This is very similar to per-peer RIBs but accomplishes
the same effect without the massive increase in memory usage. Compared to
the default mode this requires more CPU resources but it is probably less
than what per-peer RIBs would require.
'rde evaluate all' can be set and reset globally, on groups and on idividual
neighbors. It is not limited to route-server configs but route loops are
possible if not properly used.
OK benno@
|
|
Doing the LIST_REMOVE() outside of prefix_evalute() is no longer valid.
As a benefit it is now simply possible to re-evaluate a prefix by passing
it to prefix_evaluate() for both removal and insertion. prefix_evaluate()
will then take care to ensure that a update is sent out if necessary.
Also move rde_send_kroute() call to rde_generate_updates() to make it a
bit easier to plug this module into a regress test.
OK denis@
|
|
prefixes from multiple sessions into the same table. Before a prefix
was removed from the table on the first withdraw (even though there
was an alternative around).
Requested by, tested and OK dlg@
|
|
Reported by Prof. Dr. Steffen Wendzel <wendzel @ hs-worms . de>,
thanks!
OK martijn@ sthen@
|
|
This is an easy safety switch to not leak full tables to upstreams and
peers. If the limit is hit a Cease notification is sent and the session
is closed.
This implements most of https://tools.ietf.org/html/draft-sa-idr-maxprefix-00
OK job@
|
|
struct rde_aspath define aspath_hashstart and aspath_hashend and update
all values in one call. Inspired by struct process and its ps_startcopy.
OK deraadt@
|