path: root/sys/netinet/ip_ipsp.h
Age | Commit message | Author
2024-04-17 | Use struct ipsec_level within inpcb. | Alexander Bluhm
Instead of passing around u_char[4], introduce struct ipsec_level that contains 4 ipsec levels. This provides better type safety. The embedding struct inpcb is globally visible for netstat(1), so put struct ipsec_level outside of #ifdef _KERNEL. OK deraadt@ mvs@
2023-11-26 | Remove inp parameter from ip_output(). | Alexander Bluhm
ip_output() received inp as parameter. This is only used to lookup the IPsec level of the socket. Reasoning about MP locking is much easier if only relevant data is passed around. Convert ip_output() to receive constant inp_seclevel as argument and mark it as protected by net lock. OK mvs@
2023-10-11 | Prevent deref-after-free when tdb_timeout() fires on invalid new tdb. | Tobias Heider
When receiving a pfkeyv2 SADB_ADD message, a newly created tdb can fail in tdb_init(), which causes the tdb to not get added to the global tdb list and an immediate dereference. If a lifetime timeout triggers on this tdb, it will unconditionally try to remove it from the list and in the process deref once more than allowed, causing a one bit corruption in the already freed up slot in the tdb pool. We resolve this issue by moving timeout_add() after tdb_init() just before puttdb(). This means tdbs failing initialization get discarded immediately as they only hold a single reference. Valid tdbs get their timeouts activated just before we add them to the tdb list, meaning the timeout can safely assume they are linked. Feedback from mvs@ and millert@ ok mvs@ mbuhl@
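The ordering described in this commit message (initialize first, arm the timeout only once initialization has succeeded, then link) can be illustrated with a small userland sketch. This is not the kernel code: `struct sa`, `sa_init()`, `sa_add()` and `sa_timeout()` are hypothetical stand-ins, and the reference counts are simplified for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a tdb; not the kernel's struct tdb. */
struct sa {
	int  refcnt;       /* references held on this object */
	bool timer_armed;  /* a lifetime timeout is pending */
	bool linked;       /* object is on the global list */
};

/* Hypothetical initializer that can be told to fail. */
static int
sa_init(struct sa *sa, bool fail)
{
	(void)sa;
	return fail ? -1 : 0;
}

/*
 * Create an object.  The timeout is armed only after initialization
 * succeeded and right before linking, so a failed object never has a
 * pending timeout and holds nothing but the creator's reference.
 */
int
sa_add(struct sa *sa, bool init_fails)
{
	sa->refcnt = 1;
	sa->timer_armed = false;
	sa->linked = false;

	if (sa_init(sa, init_fails) != 0) {
		sa->refcnt--;          /* drop the only reference; discard */
		return -1;
	}

	sa->timer_armed = true;        /* arm after init ... */
	sa->linked = true;             /* ... just before linking */
	sa->refcnt++;                  /* reference held by the list */
	return 0;
}

/* The timeout handler may now assume the object is linked. */
void
sa_timeout(struct sa *sa)
{
	assert(sa->timer_armed && sa->linked);
	sa->linked = false;
	sa->timer_armed = false;
	sa->refcnt--;                  /* drop the list's reference */
}
```

With this ordering, the failure path never leaves a timer behind that could later touch a freed object.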
2023-08-07 | start adding support for route-based ipsec vpns. | David Gwynne
rather than use ipsec flows (aka, entries in the ipsec security policy database) to decide which traffic should be encapsulated in ipsec and sent to a peer, this tweaks security associations (SAs) so they can refer to a tunnel interface. when traffic is routed over that tunnel interface, an ipsec SA is looked up and used to encapsulate traffic before being sent to the peer on the SA. When traffic is received from a peer using an interface SA, the specified interface is looked up and the packet is handed to it so it looks like packets come out of the tunnel. to support this, SAs get a TDBF_IFACE flag and iface and iface_dir fields. When TDBF_IFACE is set the iface and dir fields are considered valid, and the tdb/SA should be used with the tunnel interface instead of the SPD. support from many including markus@ tobhe@ claudio@ sthen@ patrick@ now is a good time deraadt@
2023-07-06 | big update to pfsync to try and clean up locking in particular. | David Gwynne
moving pf forward has been a real struggle, and pfsync has been a constant source of pain. we have been papering over the problems for a while now, but it reached the point that it needed a fundamental restructure, which is what this diff is.
the big headliner changes in this diff are:
- pfsync specific locks
  this is the whole reason for this diff. rather than rely on NET_LOCK or KERNEL_LOCK or whatever, pfsync now has its own locks to protect its internal data structures. this is important because pfsync runs a bunch of timeouts and tasks to push pfsync packets out on the wire, or when it's handling requests generated by incoming pfsync packets, both of which happen outside pf itself running. having pfsync specific locks around pfsync data structures makes the mutations of these data structures a lot more explicit and auditable.
- partitioning
  to enable future parallelisation of the network stack, this rewrite includes support for pfsync to partition states into different "slices". these slices run independently, ie, the states collected by one slice are serialised into a separate packet to the states collected and serialised by another slice. states are mapped to pfsync slices based on the pf state hash, which is the same hash that the rest of the network stack and multiq hardware uses.
- no more pfsync called from netisr
  pfsync used to be called from netisr to try and bundle packets, but now that there's multiple pfsync slices this doesn't make sense. instead it uses tasks in softnet tqs.
- improved bulk transfer handling
  there's shiny new state machines around both the bulk transmit and receive handling. pfsync used to do horrible things to carp demotion counters, but now it is very predictable and returns the counters back where they started.
- better tdb handling
  the tdb handling was pretty hairy, but hrvoje has kicked this around a lot with ipsec and sasyncd and we've found and fixed a bunch of issues as a result of that testing.
- mpsafe pf state purges
  this was committed previously, but because the locks pfsync relied on weren't clear this just caused a ton of bugs. as part of this diff it's now reliable, and moves a big chunk of work out from under KERNEL_LOCK, which in turn improves the responsiveness and throughput of a firewall even if you're not using pfsync.
there's a bunch of other little changes along the way, but the above are the big ones. hrvoje has done performance testing with this diff and notes a big improvement when pfsync is not in use. performance when pfsync is enabled is about the same, but i'm hoping the slices means we can scale along with pf as it improves. lots (months) of testing by me and hrvoje on pfsync boxes tests and ok sashan@ deraadt@ says this is a good time to put it in
2022-07-14 | Use capital letters for global ipsec(4) locks description. Use 'D' | Vitaliy Makkoveev
instead of 's' for `tdb_sadb_mtx' mutex(9) because this is 'D'atabase. No functional changes. ok bluhm@
2022-04-30 | When performing ipsp_ids_free(), grab `ipsec_flows_mtx' mutex(9) before doing the | Vitaliy Makkoveev
`id_refcount' decrement. This should be consistent with `ipsp_ids_gc_list' list modifications, otherwise concurrent ipsp_ids_insert() could remove this dying `ids' from the list before it was placed there by ipsp_ids_free(). This makes atomic operations with `id_refcount' useless. Also prevent ipsp_ids_lookup() from returning dying `ids'. ok bluhm@
2022-04-21 | Introduce dedicated link entries for snapshots in pfsync(4). The purpose | Alexandr Nedvedicky
of snapshots is to allow pfsync(4) to move items from global lists to local lists (a.k.a. snapshots) under mutex protection. Snapshots are then processed without holding any mutexes. Such an idea does not fly well if the link entry is currently used for global lists as well as snapshots. Feedback by bluhm@ Credit also goes to hrvoje@ for extensive testing. OK bluhm@
2022-03-13 | Hrvoje has hit a crash with IPsec acquire while testing the parallel | Alexander Bluhm
IP forwarding diff. Add mutex and refcount to make memory management of struct ipsec_acquire MP safe. testing Hrvoje Popovski; input sashan@; OK mvs@
2022-03-08 | In IPsec policy replace integer refcount with atomic refcount. | Alexander Bluhm
OK tobhe@ mvs@
2022-03-02 | Merge two comments describing the locks into one. | Alexander Bluhm
2022-01-04 | Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and | YASUOKA Masahiko
trees. ipsp_ids_lookup() returns `ids' with bumped reference counter. original diff from mvs ok mvs
2021-12-20 | Use per-CPU counters for tunnel descriptor block (TDB) statistics. | Vitaliy Makkoveev
'tdb_data' struct became unused and was removed. Tested by Hrvoje Popovski. ok bluhm@
2021-12-19 | There are occasions where the walker function in tdb_walk() might | Alexander Bluhm
sleep. So holding the tdb_sadb_mtx() when calling walker() is not allowed. Move the TDB from the TDB-Hash to a temporary list that is protected by netlock. Then unlock tdb_sadb_mtx and traverse the list to call the walker. OK mvs@
2021-12-14 | To cache lookups, the policy ipo is linked to its SA tdb. There | Alexander Bluhm
is also a list of SAs that belong to a policy. To make it MP safe, protect these pointers with a mutex. tested by Hrvoje Popovski; OK mvs@
2021-12-11 | Protect the write access to the TDB flags field with a mutex per | Alexander Bluhm
TDB. Clearing the timeout flags just before pool put in tdb_free() does not make sense. Move this to tdb_delete(). While there make the parentheses in the flag check consistent. tested by Hrvoje Popovski; OK tobhe@
2021-12-08 | Start documenting the locking strategy of struct tdb fields. Note | Alexander Bluhm
that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in udpencap_ctlinput() to protect the access to tdb_snext. Make the braces consistent for all these TDB loops. Move NET_ASSERT_LOCKED() into the functions where the read access happens. OK mvs@
2021-12-07 | In ipo_tdb the flow contains a reference counted TDB cache. This | Alexander Bluhm
may prevent tdb_free() from being called. It is not a real leak as ipsecctl -F or termination of iked flush this cache when they remove the IPsec policy. Move the code from tdb_free() to tdb_delete(), then the kernel does the cleanup itself. OK mvs@ tobhe@
2021-12-03 | Add tdb_delete_locked() to replace duplicate tdb deletion code in | Tobias Heider
pfkey_flush(). ok bluhm@ mvs@
2021-12-01 | Reintroduce the TDBF_DELETED flag. Checking next pointer to figure | Alexander Bluhm
out whether the TDB is linked to the hash bucket does not work. This fixes removal of SAs that could not be flushed with ipsecctl -F. OK tobhe@
2021-12-01 | Let ipsp_spd_lookup() return an error instead of a TDB. The TDB | Alexander Bluhm
is not always needed, but the error value is necessary for the caller. As TDB should be refcounted, it makes no sense to always return it. Pass an output pointer for the TDB which can be NULL. OK mvs@ tobhe@
2021-11-30 | Remove unused parameter from ipsp_spd_inp(). | Alexander Bluhm
OK mvs@ yasuoka@
2021-11-26 | Replace TDBF_DELETED flag with check if tdb was already unlinked. | Tobias Heider
Protect tdb_unlink() and puttdb() for SADB_UPDATE with tdb_sadb_mutex. Tested by Hrvoje Popovski ok bluhm@ mvs@
2021-11-25 | Implement reference counting for IPsec tdbs. Not all cases are | Alexander Bluhm
covered yet, more ref counts to come. The timeouts are protected, so the racy tdb_reaper() gets retired. The tdb_policy_head, onext and inext lists are protected. All gettdb...() functions return a tdb that is ref counted and has to be unrefed later. A flag ensures that tdb_delete() is called only once. Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
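The two ideas in this commit message, lookups that hand out a counted reference the caller must drop, and a flag that makes deletion happen only once, can be sketched in userland C11 as follows. The names `struct obj`, `obj_ref()`, `obj_unref()` and `obj_delete()` are hypothetical and are not the kernel's tdb API; the `freed` field merely stands in for returning the object to its pool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not the kernel's struct tdb. */
struct obj {
	atomic_int  refcnt;   /* number of outstanding references */
	atomic_bool deleted;  /* set once by the first obj_delete() */
	bool        freed;    /* stand-in for handing memory back to a pool */
};

/* A new object starts with one reference, owned by the list it is on. */
void
obj_init(struct obj *o)
{
	atomic_init(&o->refcnt, 1);
	atomic_init(&o->deleted, false);
	o->freed = false;
}

/* Take an extra reference, as a lookup function would before returning. */
struct obj *
obj_ref(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
	return o;
}

/* Drop a reference; the last one "frees" the object. */
void
obj_unref(struct obj *o)
{
	int prev = atomic_fetch_sub(&o->refcnt, 1);

	assert(prev > 0);
	if (prev == 1)
		o->freed = true;
}

/*
 * Delete the object.  Only the first caller drops the list's reference;
 * later callers see the flag and do nothing.  Returns true if this call
 * performed the deletion.
 */
bool
obj_delete(struct obj *o)
{
	if (atomic_exchange(&o->deleted, true))
		return false;
	obj_unref(o);
	return true;
}
```

In this sketch a second `obj_delete()` is harmless, and the object stays alive until every holder of a looked-up reference has called `obj_unref()`.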
2021-11-21 | Add the new `ipsec_exctdb' ipsec(4) counter to count and expose to the | Vitaliy Makkoveev
userland the TDBs which exceeded hard limit. Also the `ipsec_notdb' counter description in header doesn't match the netstat(1) description. We never count `ipsec_notdb' and the netstat(1) description looks more appropriate so it's used to avoid confusion with the new counter. ok bluhm@
2021-11-16 | To debug IPsec and tdb refcounting it is useful to have "show tdb" | Alexander Bluhm
and "show all tdbs" in ddb. tested by Hrvoje Popovski; OK mvs@
2021-10-25 | Call a locked variant of tdb_unlink() from tdb_walk(). Fixes a | Alexander Bluhm
mutex locking against myself panic introduced by my previous commit. OK beck@ patrick@
2021-10-24 | Merge esp_input_cb() into esp_input(). | Tobias Heider
ok bluhm@
2021-10-24 | Remove code duplication by merging the v4 and v6 input functions | Alexander Bluhm
for ah, esp, and ipcomp. Move common code into ipsec_protoff() which finds the offset of the next protocol field in the previous header. OK tobhe@
2021-10-24 | Refactor ah_input() and ah_output() for new crypto API. | Tobias Heider
ok bluhm@
2021-10-24 | Refactor ipcomp_input() and ipcomp_output(). Remove obsolete code related | Tobias Heider
to old crypto API. ok bluhm@
2021-10-24 | There are more m_pullup() in IPsec input. Pass down the pointer | Alexander Bluhm
to the mbuf to update it globally. At the end it will reach ip_deliver() which expects a pointer to an mbuf. OK sashan@
2021-10-24 | Remove 'struct tdb_crypto' allocations from esp_input() and esp_output(). | Tobias Heider
This was needed to pass arguments to the callback function, but is no longer necessary after the API makeover. ok bluhm@
2021-10-23 | There is an m_pullup() down in AH input. As it may free or change | Alexander Bluhm
the mbuf, the callers must be careful. Although there is no bug, use the common pattern to handle this. Pass down an mbuf pointer mp and let m_pullup() update the pointer in all callers. It looks like the tcp signature functions should not be called. Avoid an mbuf leak and return an error. OK mvs@
2021-10-23 | Retire asynchronous crypto API as it is no longer required by any driver and | Tobias Heider
adds unnecessary complexity. Dedicated crypto offloading devices are not common anymore. Modern CPU crypto acceleration works synchronously, eliminating the need for callbacks. Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is blocking and only returns after the operation has completed or an error occurred. Invoke callback functions directly from the consumer (e.g. IPsec, softraid) instead of relying on the crypto driver to call crypto_done(). ok bluhm@ mvs@ patrick@
2021-10-13 | The function ipip_output() was registered as .xf_output() xform | Alexander Bluhm
function. But it was never called via this pointer. It would have immediately crashed as mp is always NULL when called via .xf_output(). Do not set .xf_output to ipip_output. This allows passing only the parameters which are actually needed and the control flow is clearer. OK mpi@
2021-10-05 | Cleanup the error handling in ipsec ipip_output() and consistently | Alexander Bluhm
goto drop instead of return. An ENOBUFS should be EINVAL in IPv6 case. Also use combined packet and byte counter. OK sthen@ dlg@
2021-10-05 | Move setting ipsec mtu into a function. The NULL and invalid check | Alexander Bluhm
in ipsec_common_ctlinput() is not necessary, the loop in ipsec_set_mtu() does that anyway. udpencap_ctlinput() did not work for bundled SA, this also needs the loop in ipsec_set_mtu(). OK sthen@
2021-09-29 | Global variables to track initialisation behave poorly with MP. | Alexander Bluhm
Move the tdb pool init into an init function. OK mvs@
2021-08-10 | Remove unused `ipa_pcb' from 'ipsec_acquire' structure. | mvs
ok gnezdo@
2021-07-27 | Revert "Use per-CPU counters for tunnel descriptor block" diff. | mvs
Panic reported by Hrvoje Popovski.
2021-07-26 | Use per-CPU counters for tunnel descriptor block (tdb) statistics. | mvs
'tdb_data' struct became unused and was removed. ok bluhm@
2021-07-18 | Introduce and use garbage collector for 'ipsec_ids' struct entities | mvs
destruction instead of using per-entity timeout. This fixes the races between ipsp_ids_insert(), ipsp_ids_free() and ipsp_ids_timeout(). ipsp_ids_insert() can't stop the ipsp_ids_timeout() timeout handler which is already running and awaiting netlock to be released, so reused `ids' will be silently removed in this case. ipsp_ids_free() can't determine whether the ipsp_ids_timeout() timeout handler is running because timeout_del(9) called by ipsp_ids_insert() clears its triggered state. So ipsp_ids_timeout() could be scheduled to run twice in this case. Also hrvoje@ reported that ipsec(4) throughput increased with this diff, so it seems we caught a significant number of ipsp_ids_insert() races. tests and feedback by hrvoje@ ok bluhm@
2021-07-18 | The IPsec authentication before decryption used a different replay | Alexander Bluhm
counter than after decryption. This could result in "esp_input_cb: authentication failed for packet in SA" errors. As we run crypto operations async, thousands of packets are stored in the crypto task. During the queueing the replay counter of the tdb can change. Then the higher 32 bits may increment although the lower 32 bits did not wrap. checkreplaywindow() must be called twice per packet with the same replay counter. Store the value in struct tdb_crypto while dangling in the task queue and doing crypto operations. tested by Hrvoje Popovski; joint work with tobhe@
2021-07-13 | Remove unused `PolicyHead' from 'sockaddr_encap' structure. | mvs
ok tobhe@
2021-07-08 | The xformsw array never changes. Declare struct xformsw constant | Alexander Bluhm
and map data read only. OK deraadt@ mvs@ mpi@
2021-07-08 | The properties of the crypto algorithms never change. Declare them | Alexander Bluhm
constant. Then they are mapped as read only. OK deraadt@ dlg@
2021-07-07 | Fix whitespaces in IPsec code. | Alexander Bluhm
2021-05-04 | Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that | mvs
at runtime within pfkeyv2_send(). Also set its interrupt protection level to IPL_SOFTNET. ok bluhm@ mpi@
2020-11-05 | Enable support for ASN1_DN ipsec identifiers. | Peter Hessler
Tested with multiple Windows 10 Pro (ver 2004) clients, and OpenBSD+iked as the server. OK tobhe@ sthen@ kn@