path: root/sys/netinet
Age  Author
Commit message

2022-03-22  Alexander Bluhm
For raw IP packets, rip_input() traverses the loop of all PCBs. From there it calls sbappendaddr() while holding the raw table mutex. This ends in sorwakeup(), where we finally grab the kernel lock while holding a mutex. Witness detects this misuse. Use the same solution as for PCB notify: collect the affected PCBs in a temporary list. The list is protected by the exclusive net lock. syzbot+ebe3f03a472fecf5e42e@syzkaller.appspotmail.com OK claudio@

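The commit above describes a collect-then-deliver pattern: walk the table while holding its mutex, take a reference on each matching entry and link it onto a temporary list, drop the mutex, and only then call code that may take another, sleeping lock. Below is a minimal userland sketch using POSIX threads. `struct pcb`, `pcb_table`, `collect_matches()`, `deliver()` and `input_to_matches()` are hypothetical names, not the OpenBSD kernel API, and the single-walker assumption stands in for the exclusive net lock that the kernel uses to protect the temporary list.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an inpcb. */
struct pcb {
	int		 port;
	int		 refs;		/* protected by table_mtx */
	int		 delivered;	/* touched only with table_mtx released */
	struct pcb	*tmp_next;	/* link in the temporary list */
};

#define NPCB 4
static struct pcb pcb_table[NPCB] = {
	{ .port = 53 }, { .port = 80 }, { .port = 53 }, { .port = 443 },
};
static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for sbappendaddr()/sorwakeup(): may block, so it must
 * never run while table_mtx is held. */
static void
deliver(struct pcb *p)
{
	p->delivered++;
}

/*
 * Walk the table under the mutex, reference every match and link it
 * onto the temporary list. tmp_next is shared by all walkers, so only
 * one walker may run at a time; this sketch assumes a single caller.
 */
static struct pcb *
collect_matches(int port)
{
	struct pcb *head = NULL;

	pthread_mutex_lock(&table_mtx);
	for (int i = NPCB - 1; i >= 0; i--) {
		struct pcb *p = &pcb_table[i];

		if (p->port != port)
			continue;
		p->refs++;		/* keep p alive after unlock */
		p->tmp_next = head;
		head = p;
	}
	pthread_mutex_unlock(&table_mtx);
	return head;
}

/* Deliver to each collected PCB with the mutex released, then drop the
 * reference. Returns the number of deliveries. */
static int
input_to_matches(int port)
{
	struct pcb *p, *next;
	int n = 0;

	for (p = collect_matches(port); p != NULL; p = next) {
		next = p->tmp_next;
		deliver(p);
		n++;
		pthread_mutex_lock(&table_mtx);
		p->refs--;
		pthread_mutex_unlock(&table_mtx);
	}
	return n;
}
```

The point of the design is that the only code running under `table_mtx` is list manipulation and reference counting, so a lock that can sleep is never acquired while a mutex is held.
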
2022-03-22  Alexander Bluhm
Fix whitespace.

2022-03-21  Alexander Bluhm
For multicast and broadcast packets, udp_input() traverses the loop of all UDP PCBs. From there it calls udp_sbappend() while holding the UDP table mutex. This ends in sorwakeup(), where we finally grab the kernel lock while holding a mutex. Witness detects this misuse. Use the same solution as for PCB notify: collect the affected PCBs in a temporary list. The list is protected by the exclusive net lock. Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com OK sashan@

2022-03-21  Alexander Bluhm
Fix whitespace. Wrap long lines. Adjust outdated comment.

2022-03-21  Alexander Bluhm
Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex for PCB tables. It does not break the userland build anymore. pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in the inp_notify list and protect it with the exclusive netlock. OK sashan@

2022-03-21  David Gwynne
call in_pcbselsrc from rip_output so route sourceaddr can take effect.
previously things that used sendto or similar with raw sockets would ignore any configured sourceaddr. this made it inconsistent with other traffic, which in turn makes things confusing to debug if you're using ping or traceroute (which use raw sockets) to figure out what's happening to other packets. the ipv6 equiv already does this too. ok sthen@ claudio@

2022-03-21  David Gwynne
treat 255.255.255.255 like an mcast address in in_pcbselsrc.
this allows the IP_MULTICAST_IF sockopt to specify which address you want to send a limited broadcast (255.255.255.255) packet out of. requested by and ok claudio@

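The address test this change implies, as a userland sketch: source selection takes the IP_MULTICAST_IF path when the destination is either a multicast (class D) address or the limited broadcast address. `uses_mcast_if()` is a hypothetical name and not the kernel's in_pcbselsrc() logic; IN_MULTICAST() and INADDR_BROADCAST are the standard <netinet/in.h> macros, and in userland IN_MULTICAST() expects a host byte order address, hence the ntohl().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* True if the socket's IP_MULTICAST_IF setting should choose the source
 * address for a packet to dst (dst.s_addr in network byte order). */
static bool
uses_mcast_if(struct in_addr dst)
{
	uint32_t a = ntohl(dst.s_addr);

	return IN_MULTICAST(a) || a == INADDR_BROADCAST;
}
```

A subnet-directed broadcast such as 192.0.2.255 is not matched; only the all-ones limited broadcast is.
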
2022-03-20  Alexander Bluhm
Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will be needed to make inpcb in the kernel MP safe. To build sysctl and libkvm based programs, we have to export it to userland. OK claudio@

2022-03-14  Theo Buehler
Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul
This reverts the commit protecting the list and hashes in the PCB tables with a mutex, since the build of sysctl(8) breaks, as found by kettenis. ok sthen

2022-03-14  Alexander Bluhm
pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in the inp_notify list and protect it with the exclusive netlock. OK sashan@

2022-03-13  Alexander Bluhm
Hrvoje has hit a crash with IPsec acquire while testing the parallel IP forwarding diff. Add a mutex and refcount to make memory management of struct ipsec_acquire MP safe. testing Hrvoje Popovski; input sashan@; OK mvs@

2022-03-10  Alexander Bluhm
Use atomic load and store functions to access refcnt and wait variables. Although not necessary everywhere, using atomic functions exclusively for variables marked as atomic is clearer. OK mvs@ visa@

2022-03-08  Alexander Bluhm
In IPsec policy, replace integer refcount with atomic refcount.
OK tobhe@ mvs@

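A rough userland analogue of the two refcount commits above, using C11 <stdatomic.h>. `struct policy`, `policy_take()`, `policy_rele()` and `policy_refs()` are hypothetical names; the kernel has its own refcnt(9) interface, which this does not reproduce. The property that matters is that the decrement and the "was this the last reference?" test happen in one atomic operation, so two threads cannot both see the count drop to zero, and every read also goes through an atomic load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object, e.g. an IPsec policy. */
struct policy {
	atomic_uint	refs;
	int		freed;	/* set instead of calling free(), for the demo */
};

static void
policy_init(struct policy *p)
{
	atomic_init(&p->refs, 1);	/* the creator holds one reference */
	p->freed = 0;
}

static void
policy_take(struct policy *p)
{
	unsigned old = atomic_fetch_add(&p->refs, 1);

	assert(old > 0);	/* referencing a dead object is a bug */
	(void)old;
}

/* Returns true if this call dropped the last reference. */
static bool
policy_rele(struct policy *p)
{
	unsigned old = atomic_fetch_sub(&p->refs, 1);

	assert(old > 0);
	if (old == 1) {
		p->freed = 1;	/* the last user tears the object down */
		return true;
	}
	return false;
}

/* Read the count with an atomic load, never a plain access. */
static unsigned
policy_refs(struct policy *p)
{
	return atomic_load(&p->refs);
}
```
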
2022-03-06  Alexander Bluhm
Usually we check ipsec_in_use as a shortcut to avoid IPsec lookups, but that does not work when coming from tcp_output(), as inp != NULL. This seems to be done to block packets from sockets with options in inp_seclevel. But instead of doing the route lookup, go directly to ipsp_spd_inp(), where the socket policy checks are done. Calling rtable_l2() before the shortcut also costs a bit; do it when needed. OK tobhe@

2022-03-04  Alexander Bluhm
in_addmulti() is only called from ioctl(2) or setsockopt(2). Wait for malloc(9) to make the system call reliable. OK mvs@

2022-03-04  Alexander Bluhm
in_pcbinit() is called during boot. There malloc(9) cannot fail, but would panic instead of waiting. Remove needless error handling. OK mvs@

2022-03-02  Alexander Bluhm
Use NULL instead of 0 for a pointer.

2022-03-02  Alexander Bluhm
Merge two comments describing the locks into one.

2022-03-02  Alexander Bluhm
The return value of in6_pcbnotify() is never used. Make it a void function. OK gnezdo@ mvs@ florian@ sashan@

2022-03-01  Alexander Bluhm
Remove outdated comment about v4-mapped v6 addresses. They are not supported anymore.

2022-02-25  Philip Guenther
Revert the pr_usrreqs move: syzkaller found a NULL pointer deref and I won't be available to monitor for follow-up issues for a bit. Reported-by: syzbot+1b5b209ce506db4d411d@syzkaller.appspotmail.com

2022-02-25  Philip Guenther
Move pr_attach and pr_detach to a new structure pr_usrreqs that can then be shared among protosw structures, following the same basic direction as NetBSD and FreeBSD for this. Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the proper prototype to eliminate the previously necessary casts. ok mvs@ bluhm@

2022-02-22  Philip Guenther
Delete unnecessary #includes of <netinet6/ip6protosw.h>: some never needed it and some no longer need it after moving the externs from there to <sys/protosw.h>. ok jsg@

2022-02-22  Philip Guenther
Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>. net/if_pppx.c pointed out by jsg@. ok gnezdo@ deraadt@ jsg@ mpi@ millert@

2022-02-16  David Gwynne
rewrite vxlan to better fit the current kernel infrastructure.
the big change is removing the integration with and reliance on bridge(4) for learning vxlan endpoints. we have the etherbridge layer now (which is used by veb, nvgre, bpe, etc) so vxlan can operate independently of bridge(4) (or any other driver) while still dynamically learning about other endpoints. vxlan now uses the udp socket upcall mechanism to receive packets. this means it actually creates and binds udp sockets to use rather than adding code to the udp layer for stealing packets from it. i think it's also important to note that this adds loop prevention to the code. this stops a vxlan interface being used to transmit a packet that was encapsulated in itself. i want to clear this out of my tree where it's been sitting for nearly a year. no one seems too concerned with the change either way. ok claudio@

2022-02-01  Miod Vallat
When a struct ipovly needs to be computed and checksummed in in4_cksum(), do not bother operating on its first 8 bytes, which will always be zero. ok visa@

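Why skipping those bytes is safe: the Internet checksum (RFC 1071) is a one's-complement sum of 16-bit words, and zero words add nothing to that sum. The sketch below is a plain userland checksum, not in4_cksum() itself; `cksum16()` is a hypothetical name, and the argument assumes the skipped prefix is all zero and has an even length, so the remaining bytes keep their 16-bit word alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words (RFC 1071), folded
 * to 16 bits and complemented. An odd trailing byte is zero-padded. */
static uint16_t
cksum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)buf[0] << 8 | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Summing a 20-byte buffer whose first 8 bytes are zero gives the same checksum as summing only its last 12 bytes.
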
2022-01-25  Greg Steuck
Capture a repeated pattern into a sysctl_securelevel_int function. A few variables in the kernel are only writeable before securelevel is raised. It makes sense to handle them with less code. OK sthen@ bluhm@

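The general shape of such a helper, sketched in userland C: reads always succeed, and writes are refused once the security level has been raised. `set_secure_int()` and its parameters are hypothetical and deliberately simpler than the kernel's sysctl_securelevel_int(), whose exact signature is not shown in this log; the `securelevel > 0` threshold is also an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy *valp to *oldp (if oldp is non-NULL); if
 * newp is non-NULL, store *newp into *valp only while the security
 * level has not been raised. Returns 0 or EPERM, like a sysctl handler.
 */
static int
set_secure_int(int securelevel, int *oldp, const int *newp, int *valp)
{
	if (oldp != NULL)
		*oldp = *valp;
	if (newp == NULL)
		return 0;	/* read-only request */
	if (securelevel > 0)
		return EPERM;	/* too late to change this variable */
	*valp = *newp;
	return 0;
}
```

Each variable that used to carry its own copy of this check can then be handled by one call.
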
2022-01-23  Alexander Bluhm
Define all TCP TF_ flags as unsigned numbers. They are stored in u_int t_flags. Shifting TF_TIMER with TCPT_DELACK can touch the sign bit. found by kubsan; suggested by deraadt@; OK miod@

2022-01-20  Alexander Bluhm
Shifting signed integers left by 31 is undefined behavior in C.
found by kubsan; joint work with tobhe@; OK miod@

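A small illustration of the two kubsan fixes above. In C, `1 << 31` shifts into the sign bit of an `int`, which is undefined behavior, while `1U << 31` is well defined; likewise a flag constant written without the `U` suffix is an `int`, so shifting it by a timer index can reach the sign bit. The flag names, values and timer range below are made up for the example and are not the real TF_ or TCPT_ definitions.

```c
#include <assert.h>

/* Hypothetical flag layout in an unsigned int, in the style of t_flags. */
#define XF_ACKNOW	0x00000001U
#define XF_TIMER0	0x04000000U	/* base bit for per-timer flags */
#define XF_HIGHBIT	(1U << 31)	/* well defined: unsigned shift */
/* #define XF_BAD	(1 << 31)	   undefined: int sign bit */

/* Map a timer index to its flag bit. Because XF_TIMER0 is unsigned,
 * the shift stays defined even when the result reaches bit 31. */
static unsigned int
timer_flag(unsigned int timer)
{
	assert(timer < 6);	/* bits 26 through 31 */
	return XF_TIMER0 << timer;
}
```

With `XF_TIMER0` spelled as a plain `0x04000000`, `timer_flag(5)` would be the undefined signed shift that the sanitizer reports.
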
2022-01-04  YASUOKA Masahiko
Add `ipsec_flows_mtx' mutex(9) to protect the `ipsp_ids_*' list and trees. ipsp_ids_lookup() returns `ids' with a bumped reference counter. original diff from mvs ok mvs

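A sketch of the lookup contract described here: the search and the reference bump happen under the same mutex, so an entry cannot be freed between being found and being referenced, and the caller then owns one reference that it must release. `struct ids`, `flows_mtx`, `ids_head`, `ids_lookup()` and `ids_unref()` are hypothetical userland stand-ins for ipsp_ids_lookup() and its data, using a linked list instead of the kernel's trees.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct ids {
	unsigned	 id;
	unsigned	 refs;	/* protected by flows_mtx */
	struct ids	*next;	/* protected by flows_mtx */
};

static pthread_mutex_t flows_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct ids *ids_head;	/* protected by flows_mtx */

/* Return the matching entry with its reference count already bumped,
 * or NULL. The caller must call ids_unref() when done with it. */
static struct ids *
ids_lookup(unsigned id)
{
	struct ids *p;

	pthread_mutex_lock(&flows_mtx);
	for (p = ids_head; p != NULL; p = p->next) {
		if (p->id == id) {
			p->refs++;	/* taken before the mutex is dropped */
			break;
		}
	}
	pthread_mutex_unlock(&flows_mtx);
	return p;
}

static void
ids_unref(struct ids *p)
{
	pthread_mutex_lock(&flows_mtx);
	assert(p->refs > 0);
	p->refs--;
	/* A real implementation would unlink and free the entry at zero. */
	pthread_mutex_unlock(&flows_mtx);
}
```
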
2022-01-02  Jonathan Gray
spelling
ok jmc@ reads ok tb@

2021-12-23  Alexander Bluhm
Remove unused variables and assignments in ah and esp output.
found by clang 13; OK tobhe@

2021-12-23  Alexander Bluhm
IPsec is not MP safe yet. To allow forwarding in parallel without dirty hacks, it is better to protect IPsec input and output with the kernel lock. Not much is lost, as crypto needs the kernel lock anyway. From here we can refine the lock later. Note that there is no kernel lock in the SPD lookup path. The goal is to keep that path free of the lock to allow fast forwarding of non-IPsec traffic. tested by Hrvoje Popovski; OK tobhe@

2021-12-22  Tobias Heider
Consolidate enc_getif() lookups in the IPsec input path to save one lookup per packet and improve readability. ok bluhm@

2021-12-20  Tobias Heider
Remove unused variable 'clen'.
ok bluhm@

2021-12-20  Vitaliy Makkoveev
Use per-CPU counters for tunnel descriptor block (TDB) statistics.
'tdb_data' struct became unused and was removed. Tested by Hrvoje Popovski. ok bluhm@

2021-12-20  Alexander Bluhm
Fix function name in panic string.

2021-12-19  Alexander Bluhm
There are occasions where the walker function in tdb_walk() might sleep. So holding tdb_sadb_mtx when calling walker() is not allowed. Move the TDB from the TDB hash to a temporary list that is protected by netlock. Then unlock tdb_sadb_mtx and traverse the list to call the walker. OK mvs@

2021-12-16  Alexander Bluhm
Fix a tiny race in tdb_delete() between TDBF_DELETED, tdb_unlink() and tdb_cleanspd(). gettdb...() can return a TDB before tdb_unlink(). Then ipsp_spd_lookup() could add it to tdb_policy_head after tdb_cleanspd(). There it would stay until it hits the kassert in tdb_free(). OK tobhe@

2021-12-15  Theo de Raadt
structure pads can leak uninitialized memory to userland via copyout, therefore the mandatory idiom is completely clearing structs before building them for copyout -- that means ALMOST ALL STRUCTS, because we never know when some architecture will pad a struct. In two more cases, the clearing wasn't performed. from Reno Robert ZDI ok millert bluhm

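The idiom in userland terms: zero the whole struct, padding included, before filling in its fields, then copy it out. `struct reply`, `build_reply()` and `copy_reply()` are hypothetical, and memcpy() only stands in for copyout(9). Whether and where `struct reply` has padding depends on the ABI (on common 64-bit ABIs there are holes after `proto` and after `flags`), which is exactly why the memset() covers the whole object rather than just the fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field order leaves padding holes on common ABIs. */
struct reply {
	uint8_t		proto;
	uint32_t	addr;
	uint16_t	flags;
};

/* Fill *out for an untrusted consumer. Without the memset(), padding
 * bytes would keep whatever the stack previously held. */
static void
build_reply(struct reply *out, uint8_t proto, uint32_t addr, uint16_t flags)
{
	memset(out, 0, sizeof(*out));
	out->proto = proto;
	out->addr = addr;
	out->flags = flags;
}

/* Stand-in for copyout(9): every byte of the struct, padding included,
 * reaches the destination. */
static void
copy_reply(void *dst, const struct reply *src)
{
	memcpy(dst, src, sizeof(*src));
}
```
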
2021-12-15  Alexander Bluhm
Syzkaller found a dereference in igmp_leavegroup() where inm->inm_rti is NULL. It should be set in rti_fill(), but is not if malloc(9) fails. There is no rollback after malloc failure, so the field stays uninitialized. The code is only called from ioctl, setsockopt or a task. Malloc should wait instead of failing, otherwise syscalls would be unreliable. While there, also put an M_WAIT in the init code. During init, malloc must not fail. OK mvs@ Reported-by: syzbot+e22326057ccf34908d78@syzkaller.appspotmail.com

2021-12-14  Darren Tucker
Correct value for IPTOS_DSCP_LE since it needs to allow for the preceding two ECN bits. From daisuke.higashi at gmail.com via OpenSSH bz#3373, ok claudio@, job@, djm@.

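The arithmetic behind this fix: the 8-bit TOS / Traffic Class byte carries the 6-bit DSCP in its upper bits and the 2-bit ECN field in its two lowest bits, so a DSCP codepoint has to be shifted left by two to get the byte value. The Lower-Effort codepoint is 000001 (RFC 8622), which gives a TOS byte of 0x04; a byte of 0x01 would instead mean DSCP 0 with an ECN bit set. `dscp_to_tos()` and `tos_to_dscp()` are hypothetical helpers, not part of <netinet/ip.h>.

```c
#include <assert.h>
#include <stdint.h>

#define ECN_MASK	0x03u	/* two lowest bits of the TOS byte */

/* Build a TOS / Traffic Class byte from a 6-bit DSCP and 2-bit ECN. */
static uint8_t
dscp_to_tos(unsigned dscp, unsigned ecn)
{
	assert(dscp < 64 && ecn <= ECN_MASK);
	return (uint8_t)(dscp << 2 | ecn);
}

/* Recover the DSCP by discarding the ECN bits. */
static unsigned
tos_to_dscp(uint8_t tos)
{
	return tos >> 2;
}
```

For comparison, Expedited Forwarding (DSCP 46) becomes a TOS byte of 0xb8 under the same shift.
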
2021-12-14  Alexander Bluhm
To cache lookups, the policy ipo is linked to its SA tdb. There is also a list of SAs that belong to a policy. To make it MP safe, protect these pointers with a mutex. tested by Hrvoje Popovski; OK mvs@

2021-12-11  Alexander Bluhm
Protect the write access to the TDB flags field with a mutex per TDB. Clearing the timeout flags just before pool put in tdb_free() does not make sense. Move this to tdb_delete(). While there, make the parentheses in the flag check consistent. tested by Hrvoje Popovski; OK tobhe@

2021-12-08  Alexander Bluhm
Start documenting the locking strategy of struct tdb fields. Note that gettdb_dir() is MP safe now. Add the tdb_sadb_mtx mutex in udpencap_ctlinput() to protect the access to tdb_snext. Make the braces consistent for all these TDB loops. Move NET_ASSERT_LOCKED() into the functions where the read access happens. OK mvs@

2021-12-07  Alexander Bluhm
In ipo_tdb the flow contains a reference counted TDB cache. This may prevent tdb_free() from being called. It is not a real leak, as ipsecctl -F or termination of iked flushes this cache when they remove the IPsec policy. Move the code from tdb_free() to tdb_delete(); then the kernel does the cleanup itself. OK mvs@ tobhe@

2021-12-03  Tobias Heider
Add tdb_delete_locked() to replace duplicate tdb deletion code in pfkey_flush(). ok bluhm@ mvs@

2021-12-03  Alexander Bluhm
Add TDB reference counting to ipsp_spd_lookup(). If an output pointer is passed to the function, it will return a refcounted TDB. The ref happens when ipsp_spd_inp() copies the pointer from ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after using it. tested by Hrvoje Popovski; OK mvs@ tobhe@

2021-12-02  Alexander Bluhm
ipsec_common_input_cb() extracted the inner IP header of IPsec tunnels. It is never used, so this is useless code. Remove the ipn and ip6n IP header variables and the m_copydata() that fills them. OK mvs@ kn@ sthen@

2021-12-02  Alexander Bluhm
Allow building the kernel without the IPSEC or INET6 defines.
OK mpi@ mvs@