path: root/usr.sbin/bgpd/rde.h
Age | Commit message | Author
2023-01-24 | Implement ASPA validation and reload logic on ASPA set changes. | Claudio Jeker
For this use the validation state (vstate) in struct prefix and struct filterstate to store both the ASPA and ROA validity. Introduce helper functions to set and get the various states for struct prefix and make sure struct filterstate is also set up properly. Change the ASPA state in rde_aspath to be AFI/AID and role independent by storing all 4 possible outcomes. Also add an ASPA generation count which is used to update the rde_aspath ASPA state cache on reloads. Rework the rde_aspa.c code to be AFI/AID and role independent. Doing this for roles is trivial but the AFI switch goes deep and is so unnecessary. The reload is combined with the ROA reload logic and renamed to RPKI softreload. OK tb@
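One way to keep both validity results in a single vstate value is to pack them into separate bit fields. The following is only a minimal sketch of that idea: the bit layout, the SK_ macros and the sk_ helper names are assumptions made for illustration and are not the definitions used in rde.h.

```c
#include <stdint.h>

/*
 * Hypothetical layout (an assumption, not the rde.h one):
 * bits 0-2 hold the ROA validity, bits 3-5 hold the ASPA validity.
 */
#define SK_ROA_MASK	0x07
#define SK_ASPA_SHIFT	3
#define SK_ASPA_MASK	(0x07 << SK_ASPA_SHIFT)

/* Replace the ROA part of vstate, leaving the ASPA part untouched. */
static inline uint8_t
sk_vstate_set_roa(uint8_t vstate, uint8_t roa)
{
	return (uint8_t)((vstate & ~SK_ROA_MASK) | (roa & SK_ROA_MASK));
}

/* Replace the ASPA part of vstate, leaving the ROA part untouched. */
static inline uint8_t
sk_vstate_set_aspa(uint8_t vstate, uint8_t aspa)
{
	return (uint8_t)((vstate & ~SK_ASPA_MASK) |
	    ((aspa << SK_ASPA_SHIFT) & SK_ASPA_MASK));
}

/* Extract the ROA validity from the combined value. */
static inline uint8_t
sk_vstate_roa(uint8_t vstate)
{
	return vstate & SK_ROA_MASK;
}

/* Extract the ASPA validity from the combined value. */
static inline uint8_t
sk_vstate_aspa(uint8_t vstate)
{
	return (vstate & SK_ASPA_MASK) >> SK_ASPA_SHIFT;
}
```

With accessors like these, callers never touch the bit layout directly, so the two states can be updated independently of each other.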
2023-01-18 | Use the vstate of the filterstate struct instead of passing an extra copy | Claudio Jeker
to the various prefix update functions. While there fix a filterstate leak in up_generate_updates(). With and OK tb@
2023-01-17 | Add the needed logic to load the ASPA table from the rtr process into the | Claudio Jeker
RDE. The actual reload logic is missing to keep the diff small. OK tb@
2023-01-12 | Split rde_filterstate_prep() into three functions. | Claudio Jeker
- rde_filterstate_init(): initialize a filterstate to default values
- rde_filterstate_copy(): copy from a filterstate into a new state object
- rde_filterstate_prep(): set filterstate based on prefix passed as argument.
This makes the code a bit easier to read. OK tb@
2023-01-11 | Add the validation state to the filterstate struct. | Claudio Jeker
Removes vstate argument from rde_filter(). Rename prefix_vstate() to prefix_roa_vstate(). OK tb@
2023-01-11 | Add ASPA validation functions to the RDE. | Claudio Jeker
This implements ASPA validation based on the current draft. Implementing this showed various weaknesses in the current ASPA draft which I hope to fix in the near future. Unlike the algorithm specified in the draft our version validates the AS_PATH attribute in a single pass doing one or two lookups depending on the session's BGP role. The code is not yet hooked up into the RDE (see the NOTYET blocks). Missing are reload logic, bgpctl integration and the loading of the merged ASPA set from the rtr process. OK tb@
2022-12-28 | spelling fixes; from paul tagliamonte | Jason McIntyre
any parts of his diff not taken are noted on tech
2022-12-14 | Move some basic accessors of aspath to rde.h and make them static inline. | Claudio Jeker
OK tb@
2022-09-23 | Implement a special update generator for add-path send all. | Claudio Jeker
The generic add-path code up_generate_addpath() reevaluates everything since this is the simplest way to select the announced paths. For add-path all this is overkill since there is no dependency between prefixes and so individual prefixes can be handled more efficiently. Extend rde_generate_updates() to pass the current newbest and oldbest prefixes (for the selected best path) but now also include newpath and oldpath (which is the prefix that is added/removed/modified). If newpath or oldpath is set then a single prefix was altered and up_generate_addpath_all() can just remove or add this prefix. If newpath and oldpath are NULL then the full list based on newbest needs to be inserted and any old path/prefix removed in the process. This improves update generation performance on big route collectors using add-path all substantially. OK tb@
2022-09-21 | Adjust pathid_assign() to be much faster in the common case. | Claudio Jeker
Use a per peer path_id_tx to assign to paths received from non add-path enabled peers. This skips two extra walks of the RIB prefix list and is a big speed-up when there are many regular sessions. If the session uses add-path recv then the old way of assigning random path_ids needs to be used. With input and OK tb@
2022-09-12 | Introduce tree walkers that only walk a subtree of the RIB. | Claudio Jeker
In some cases only a "small" part of the RIB needs to be looked at. Like bgpctl show rib 10/8 or-longer that only needs to traverse nodes under 10/8, all other RIB entries do not matter. By setting the start node to the RB_NFIND(10/8) all nodes below this point can be skipped. Using prefix_compare() while walking the tree with RB_NEXT() the walker knows when it steps outside of the 10/8 subtree and stops. With this the or-longer commands become a lot faster. Looks good to tb@
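The start-at-lower-bound, stop-when-outside idea can be sketched without the RIB itself. The example below swaps the RB tree for a sorted array of IPv4 prefixes and a binary search in place of RB_NFIND(); every type and function name in it (sk_prefix, sk_walk_subtree and so on) is hypothetical, and it assumes prefixes are stored in canonical form with host bits cleared.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 prefix, host byte order, host bits assumed zero. */
struct sk_prefix {
	uint32_t	addr;
	uint8_t		len;
};

/* Total order: by address first, then by prefix length. */
static int
sk_prefix_cmp(const struct sk_prefix *a, const struct sk_prefix *b)
{
	if (a->addr != b->addr)
		return a->addr < b->addr ? -1 : 1;
	if (a->len != b->len)
		return a->len < b->len ? -1 : 1;
	return 0;
}

static uint32_t
sk_mask(uint8_t len)
{
	return len == 0 ? 0 : (uint32_t)0xffffffffU << (32 - len);
}

/* True if p is equal to net or a more specific prefix inside net. */
static int
sk_covers(const struct sk_prefix *net, const struct sk_prefix *p)
{
	return p->len >= net->len &&
	    (p->addr & sk_mask(net->len)) == net->addr;
}

/* Index of the first entry >= key; stands in for RB_NFIND(). */
static size_t
sk_lower_bound(const struct sk_prefix *tbl, size_t n,
    const struct sk_prefix *key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sk_prefix_cmp(&tbl[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Visit only the entries under net: start at the lower bound and stop
 * at the first entry that falls outside net. Returns the number of
 * entries visited; tbl must be sorted with sk_prefix_cmp().
 */
static size_t
sk_walk_subtree(const struct sk_prefix *tbl, size_t n,
    const struct sk_prefix *net)
{
	size_t i, count = 0;

	for (i = sk_lower_bound(tbl, n, net); i < n; i++) {
		if (!sk_covers(net, &tbl[i]))
			break;
		count++;
	}
	return count;
}
```

Because less specific prefixes with the same address sort before the start key, the walk touches only the matching entries plus the one that ends it, rather than the whole table.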
2022-09-01 | Switch the rde_peer hashtable and peer list to a single RB tree. | Claudio Jeker
Only the RDE used a hashtable for lookups while the session engine switched from a list to RB tree some time ago. Use peer_foreach() in the mrt code instead of passing the peer list as an argument. OK benno@ tb@
2022-09-01 | This code no longer needs siphash.h and also clean up some leftover | Claudio Jeker
prototypes and members that were not removed in the previous RB tree conversions. OK benno@ tb@
2022-08-31 | Switch the generic attribute cache to an RB tree. | Claudio Jeker
OK benno@ tb@
2022-08-30 | Switch nexthop hash to a RB tree. | Claudio Jeker
OK benno@
2022-08-29 | Instead of a global aspath cache copy the aspath attribute per rde_aspath | Claudio Jeker
struct. It uses a bit more memory but improves performance a lot on really big systems because aspath_get() becomes a very hot function. OK tb@
2022-08-29 | Switch the DB of communities collections to a RB tree instead of an | Claudio Jeker
undersized hash table. OK tb@
2022-08-29 | Switch rde_aspath to a RB tree instead of a hash table. | Claudio Jeker
OK tb@
2022-08-26 | Handle IMSG_SESSION_* messages immediately when received and do not put | Claudio Jeker
them on the per peer imsg queue. This is mainly for IMSG_SESSION_DOWN. Delaying the session down can race against IMSG_SESSION_ADD which is handled immediately and as a result an established connection may be removed in the RDE because of it. The various graceful restart imsgs need similar treatment for similar reasons. In the end when a session is reset/closed the RDE needs to stop all work and flush the per peer imsg queue. With this only update and route refresh messages are handled via the imsg queue. OK tb@
2022-08-03 | Add comment that NEXTHOP_FLAPPED is only set on oldstate of a nexthop. | Claudio Jeker
2022-07-28 | whitespace found during a read-thru; ok claudio | Theo de Raadt
2022-07-25 | Properly handle nexthop state changes in the decision process | Claudio Jeker
In rev 1.90 of rde_decide.c the re->active cache of the best prefix was replaced with a call to prefix_best(). This introduced a bug because the nexthop state at that time may have changed already. As a result when a nexthop became unreachable prefix_evaluate() had oldbest = NULL and newbest = NULL and did not withdraw the prefix from FIB and Adj-RIB-Out. To fix this store the nexthop state per prefix and introduce prefix_evaluate_nexthop() which removes the prefix from the decision list, updates the nexthop state of the prefix and reinserts the prefix. Doing this ensures that prefix_best() always reports the same result once the decision process is done. prefix_best() and prefix_eligible() only depend on data stored on the prefix itself. OK tb@
2022-07-11 | Implement send side of RFC7911 ADD-PATH | Claudio Jeker
This allows to send out more than one path per prefix to a neighbor that supports add-path receive. OpenBGPD supports a few different modes to select which paths to send:
- all: send all valid paths (the ones with a * in bgpctl output)
- best: send out only the single best path
- ecmp: send out paths that evaluate the same up to and including the nexthop metric
- as-wide-best: send out paths that evaluate the same up to but not including the nexthop metric
Currently ecmp and as-wide-best are the same. On top of this best, ecmp and as-wide-best allow to include extra paths (e.g. best plus 2) and for the multipath modes there is also a maximum (e.g. ecmp plus 2 max 4) OK tb@
2022-07-08 | Pass path_id_tx to the Adj-RIB-Out | Claudio Jeker
Adjust prefix_adjout_update() to properly handle path_id_tx. Move the lookup of the prefix out of prefix_adjout_update() and to up_generate_updates(). While that code uses prefix_adjout_lookup() to find the current prefix in the Adj-RIB-Out, an add-path aware function will use prefix_adjout_get(). In up_generate_default() just use 0 for path_id_tx since for this peer that is the only prefix installed into the Adj-RIB-Out. OK tb@
2022-07-08 | Assign a local path_id to all prefixes | Claudio Jeker
For add-path a unique path_id needs to be assigned to all prefixes. Use a random number since the RFC explicitly mentions that the value itself carries no meaning. The local path_id is inherited to all the RIBs. Adj-RIB-Out handling is not yet done. OK tb@
2022-07-07 | Introduce a decision metric (dmetric) that classifies the relation of | Claudio Jeker
this prefix with respect to its previous one. Currently the plan is to distinguish the best prefix (only one), ecmp prefixes (currently the same as as-wide-multipath), as-wide-multipath prefixes, valid prefixes and invalid prefixes. This information will be used to implement add-path send but also for ecmp support in bgpd. OK tb@
2022-07-07 | Refactor the code that generates updates so that up_generate_updates is | Claudio Jeker
only called in one spot. rde_generate_updates() gets an enum eval_mode argument to discern the different cases. peer_generate_update() uses the eval_mode to skip the update if it is not needed. While there also add an extra AID check in IMSG_REFRESH case to make sure the requested AID is actually available for this peer. OK tb@
2022-06-27 | Add support for RFC 9234 - Route Leak Prevention and Detection Using Roles | Claudio Jeker
With this it is possible to send a role in the OPEN message and if that was successful the RDE will add the new OTC attribute if necessary. OK tb@
2022-05-31 | Implement a max communities filter match | Claudio Jeker
When max-communities X is set on a filterrule the filter will match when more than X communities are present in the path. In other words max-communities 0 means no communities are allowed and max-communities 3 limits it up to 3 communities. There is max-communities, max-ext-communities and max-large-communities for each of the 3 community attributes. These three max checks can be used together. OK tb@ job@
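The "more than X" semantics described above can be written as a one-line predicate. This is a minimal sketch only: sk_max_communities_match() and its parameters are hypothetical names, and the sketch covers a single limit rather than how the three limits combine in a filter rule.

```c
/*
 * Hypothetical helper: returns 1 (rule matches) when the path carries
 * more than 'max' communities of one kind, 0 otherwise.
 * max = 0 therefore matches any path with at least one community.
 */
static int
sk_max_communities_match(unsigned int max, unsigned int count)
{
	return count > max;
}
```

For example, with max-communities 3 a path carrying exactly 3 communities does not match, while a path carrying 4 does.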
2022-05-25 | Fix non-transitive extended community handling. | Claudio Jeker
First of all the detection logic was totally wrong. Then filter out non-transitive extended communities when received from an ebgp peer. Also clean up the type handling of ext-communities. Mainly to not have to handle the transitive vs non-transitive versions the type is masked with EXT_COMMUNITY_VALUE before doing the switch case for the various types. With this my test using ext-communities works. OK tb@
2022-03-22 | Switch from a LIST to TAILQ for the structure to store prefixes on a | Claudio Jeker
rib_entry. Mostly mechanical, this simplifies prefix_insert() and prefix_remove() since the redo queue can now just use TAILQ_INSERT_TAIL(). rde_softreconfig_sync_reeval() needs to use TAILQ_CONCAT() to move the list of prefixes over to the local TAILQ_HEAD to reapply them later. OK tb@
2022-03-21 | Remove the active prefix cache in struct rib_entry. I need the space | Claudio Jeker
and it also makes less sense to track this with ECMP or add-path. Replace the re->active access with prefix_best(re) which does the check on the spot. Feedback and OK tb@
2022-03-21 | Adjust how RIBs are reloaded when their flags (esp. no evaluate) change. | Claudio Jeker
First flush all affected Adj-RIB-Out and then in a second step re-evaluate the RIB itself. The no evaluate case becomes simpler. Fix the way prefixes are re-evaluated, the list remove needs to be explicit and not part of prefix_evaluate() as in most other cases since this list is not part of the rib_entry. OK tb@
2022-03-15 | Replace the eor member of struct prefix with a flag. Saves a byte that | Claudio Jeker
will be reused soon. OK denis@ tb@
2022-03-02 | Adapt prefix_adjout_update() the same way as prefix_adjout_withdraw() | Claudio Jeker
and do the accounting in the function. OK tb@
2022-03-02 | Correct prefix_adjout_destroy() prototype | Claudio Jeker
2022-03-02 | Refactor prefix_adjout_withdraw() | Claudio Jeker
Just pass the prefix to be withdrawn to the function and move the lookup up. Adjust how the various accounting vars are updated so that the values are decremented in the right cases. Do the same accounting dance for prefix_adjout_destroy(). Adjust rde_up_flush_upcall() to directly call prefix_adjout_withdraw() without calling it via up_generate_updates(). OK tb@
2022-02-25 | For add-path send the Adj-RIB-Out needs to handle multiple paths per | Claudio Jeker
prefix. For this extend the RB trees of the Adj-RIB-Out to also consider the path_id. Add functions to lookup a prefix without path_id so that bgpctl works. Rename functions so that all Adj-RIB-Out specific functions start with prefix_adjout_. For now the path_id_tx in the Adj-RIB-Out is forced to 0 since up_generate_updates() is not ready to handle more than one path per prefix. OK tb@
2022-02-06 | Switch from u_intX_t types to stdint.h uintX_t. Mostly mechanical with | Claudio Jeker
a few reindents. OK florian@ tb@
2021-08-09 | Implement reception of multiple paths per BGP session. This is one | Claudio Jeker
side of RFC7911 and the send portion will follow. The path-id is extracted from the NLRI encoding and put into struct prefix. To do this the prefix_by_peer() function gets a path-id argument. If a session is not path-id enabled this argument will be always 0. If a session is path-id enabled the value is taken from the NLRI and can be anything, including 0. The value has no meaning in itself. Still to make sure the decision process is able to break a tie the path-id is checked as the last step (this is not part of the RFC but required). OK benno@
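Using the path-id as a final tie-breaker can be sketched as the last branch of a comparison function. In the sketch below the struct, the function name and the single local_pref step standing in for all earlier decision steps are hypothetical; preferring the lower path-id is also an assumption, since the commit message only says the path-id is checked last.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced view of a path for illustration. */
struct sk_path {
	uint32_t	local_pref;	/* stands in for all earlier steps */
	uint32_t	path_id;	/* opaque value taken from the NLRI */
};

/*
 * Returns > 0 if a is preferred, < 0 if b is preferred and 0 only
 * if both paths are identical in every compared field.
 */
static int
sk_path_cmp(const struct sk_path *a, const struct sk_path *b)
{
	if (a->local_pref != b->local_pref)
		return a->local_pref > b->local_pref ? 1 : -1;
	/* Last step: the path-id has no meaning, it only breaks the tie. */
	if (a->path_id != b->path_id)
		return a->path_id < b->path_id ? 1 : -1;
	return 0;
}
```

The direction of the final comparison does not matter for correctness; what matters is that two distinct paths from the same session never compare as equal.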
2021-07-27 | Restructure struct prefix a bit and move the rib pointer to the union | Claudio Jeker
that splits the normal RIB linkage vs the adjrib-out linkage. This is done to make a bit of space to put an extra add-path related id into the struct without blowing its size over 128 bytes. Long run this struct should be split up but the necessary changes are too large right now so this is the 2nd best option. OK benno@
2021-06-17 | Implement RFC 7313 enhanced route refresh. It is off by default and | Claudio Jeker
can be enabled with 'announce enhanced refresh yes'. Similar to graceful restart this allows to mark routes as stale, refresh them and then flush out routes that are still stale. Enhanced route refresh uses a begin of rr and an end of rr message to signal the various stages. A future enhancement would be the addition of a timeout in case the EoRR message is not sent in reasonable time. OK denis@ job@
2021-05-27 | Rename and move functions used to get per-peer settings to the | Claudio Jeker
hopefully better names peer_has_as4byte() and peer_accept_no_as_set(). Move them to rde_peer.c where all other peer functions live. OK sthen@
2021-05-06 | Improve reload behaviour of RDE peer flags and export_type. | Claudio Jeker
Add an extra reload barrier (IMSG_RECONF_DRAIN) to the sync of the peer config from the session engine to the rde. Necessary to ensure that the peer config is up to date in the RDE before hitting reconfiguration. Store the export_type and the peer flags outside of peer->conf. Adjust all users of these two fields so they only look at the copies in peer. During reload check the values with the peer->conf to check for changes. If the export_type or the rde evaluate or transparent-as flags changed flush the Adj-RIB-Out for that peer and in a 2nd step rebuild the RIB from scratch. This results in a lot of UPDATE churn but these configs are not altered often. Fix multiple issues in the rde_softreconfig_in_done handler that resulted in multiple runs of the out stage of the softreconfig pipeline. OK benno@
2021-03-02 | Introduce 'rde evaluate all' a mode to work around path hiding in IXP | Claudio Jeker
route-server environments. By default only the best path is sent to peers and if that path is filtered then the path is hidden for that peer. On route-servers this is sometimes not desired. For this 'rde evaluate all' will cause the evaluation process to fall back to alternate routes and will redistribute the first non-filtered path to the peer. This is very similar to per-peer RIBs but accomplishes the same effect without the massive increase in memory usage. Compared to the default mode this requires more CPU resources but it is probably less than what per-peer RIBs would require. 'rde evaluate all' can be set and reset globally, on groups and on individual neighbors. It is not limited to route-server configs but route loops are possible if not properly used. OK benno@
2021-01-13 | Extend prefix_evaluate() to also be used when withdrawing a prefix. | Claudio Jeker
Doing the LIST_REMOVE() outside of prefix_evaluate() is no longer valid. As a benefit it is now simply possible to re-evaluate a prefix by passing it to prefix_evaluate() for both removal and insertion. prefix_evaluate() will then take care to ensure that an update is sent out if necessary. Also move rde_send_kroute() call to rde_generate_updates() to make it a bit easier to plug this module into a regress test. OK denis@
2020-12-04 | Reference count prefixes added to a pftable. This allows to export | Claudio Jeker
prefixes from multiple sessions into the same table. Before a prefix was removed from the table on the first withdraw (even though there was an alternative around). Requested by, tested and OK dlg@
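The reference counting described above can be sketched with a small fixed-size table: the prefix is only pushed to the table on the first reference and only removed when the last reference goes away. All names here (sk_table, sk_ref, sk_unref) and the array-based storage are assumptions made for illustration and do not reflect how bgpd stores pftable entries.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_ENTRIES	16

/* Hypothetical table entry: an IPv4 prefix plus its reference count. */
struct sk_entry {
	uint32_t	addr;
	uint8_t		len;
	unsigned int	refcnt;
};

struct sk_table {
	struct sk_entry	e[SK_MAX_ENTRIES];
	size_t		n;
};

static struct sk_entry *
sk_find(struct sk_table *t, uint32_t addr, uint8_t len)
{
	size_t i;

	for (i = 0; i < t->n; i++)
		if (t->e[i].addr == addr && t->e[i].len == len)
			return &t->e[i];
	return NULL;
}

/*
 * Take a reference. Returns 1 if the prefix is new and would have to
 * be added to the real table, 0 if only the count was bumped and -1
 * if the sketch table is full.
 */
static int
sk_ref(struct sk_table *t, uint32_t addr, uint8_t len)
{
	struct sk_entry *e = sk_find(t, addr, len);

	if (e != NULL) {
		e->refcnt++;
		return 0;
	}
	if (t->n == SK_MAX_ENTRIES)
		return -1;
	t->e[t->n].addr = addr;
	t->e[t->n].len = len;
	t->e[t->n].refcnt = 1;
	t->n++;
	return 1;
}

/*
 * Drop a reference. Returns 1 if this was the last reference and the
 * prefix would have to be removed from the real table, 0 if other
 * references remain and -1 if the prefix is unknown.
 */
static int
sk_unref(struct sk_table *t, uint32_t addr, uint8_t len)
{
	struct sk_entry *e = sk_find(t, addr, len);

	if (e == NULL)
		return -1;
	if (--e->refcnt > 0)
		return 0;
	*e = t->e[--t->n];	/* move the last entry into the hole */
	return 1;
}
```

With this, a withdraw from one session returns 0 while another session still announces the same prefix, so the entry stays in place until the final withdraw.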
2020-06-05 | Remove redundant code | denis
Reported by Prof. Dr. Steffen Wendzel <wendzel @ hs-worms . de>, thanks! OK martijn@ sthen@
2020-01-24 | Implement 'max-prefix NUM out' to limit the number of announced prefixes. | Claudio Jeker
This is an easy safety switch to not leak full tables to upstreams and peers. If the limit is hit a Cease notification is sent and the session is closed. This implements most of https://tools.ietf.org/html/draft-sa-idr-maxprefix-00 OK job@
2020-01-09 | Instead of calling SipHash24_Update() in path_hash for each element of | Claudio Jeker
struct rde_aspath define aspath_hashstart and aspath_hashend and update all values in one call. Inspired by struct process and its ps_startcopy. OK deraadt@