path: root/sys/netinet
Age  Commit message  Author
2022-06-29Nullify `ipsecflowinfo' when mbuf(9) has no ipsec flowinfo data.Vitaliy Makkoveev
Otherwise we use `ipsecflowinfo' obtained from previous packet. ok claudio@
2022-06-28Use btrace(8) to debug reference counting. dt(4) provides a staticAlexander Bluhm
tracepoint for each type of refcnt we have. As a start, add inpcb and tdb refcnt. When the counter changes, btrace may print the actual object, the current counter, the change value and optionally the stack trace. discussed with visa@; OK mpi@
2022-06-27Push the kernel lock down into arpresolve(). We still need it toAlexander Bluhm
prevent concurrent access to rt_llinfo from rtrequest_delete(). But the common case, when the MAC address is already known, works without lock. tested by Hrvoje Popovski; OK mvs@
2022-06-27Instead of calling getuptime() all the time in ARP code, do it onlyAlexander Bluhm
once per function. This gives a more consistent time value. OK claudio@ miod@ mvs@
2022-06-26The "ifq_set_maxlen(..., 1);" hack we use to enforce pipex(4) relatedVitaliy Makkoveev
(*if_qstart)() be always called with netlock held doesn't work anymore with PPPOE sessions. Introduce `pipex_list_mtx' mutex(9) and use it to protect global pipex(4) lists and radix trees. Protect pipex(4) `session' dereference with reference counters, because we could sleep when accessing pipex(4) from ioctl(2) path, and this is not possible with mutex(9) held. ok bluhm@
2022-06-17The timeout for ipsec acquire does not decrement the referenceAlexander Bluhm
counter to 0 properly. We have one reference count for the lists, and one for the timeout handler. When the timeout fires, it has to decrement the reference to itself. Then the ipa is removed from the lists and decremented again. from Stefan Butz; OK tobhe@ mvs@
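The two-reference scheme in this message can be sketched in userland C. This is only an illustration, not the kernel code: `struct acquire`, `acq_create()`, `acq_unref()`, `acq_timeout()` and `acq_freed` are hypothetical names, and a plain integer stands in for the kernel's reference counting API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ipsec acquire entry. */
struct acquire {
	int  refs;     /* one reference for the lists, one for the timeout */
	bool on_list;
};

static int acq_freed;  /* counts frees, for illustration only */

static struct acquire *
acq_create(void)
{
	struct acquire *a = malloc(sizeof(*a));

	if (a == NULL)
		return NULL;
	a->refs = 2;       /* lists + pending timeout */
	a->on_list = true;
	return a;
}

/* Drop one reference; free the object when the last one goes away. */
static void
acq_unref(struct acquire *a)
{
	assert(a->refs > 0);
	if (--a->refs == 0) {
		acq_freed++;
		free(a);
	}
}

/* Timeout handler: release the timeout's own reference first,
 * then unlink the entry and release the reference held by the lists. */
static void
acq_timeout(struct acquire *a)
{
	acq_unref(a);          /* reference held by the timeout */
	a->on_list = false;    /* removed from the lists */
	acq_unref(a);          /* reference held by the lists */
}
```

If the handler skipped the first `acq_unref()`, the counter would stop at 1 and the entry would never be freed, which is the kind of leak the message describes.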
2022-06-06Simplify solock() and sounlock(). There is no reason to return a valueClaudio Jeker
for the lock operation and to pass a value to the unlock operation. sofree() still needs an extra flag to know if sounlock() should be called or not. But sofree() is called less often and mostly without keeping the lock. OK mpi@ mvs@
2022-05-25Call if_put(9) after we finish with `ia' within ip_getmoptions().Vitaliy Makkoveev
Calling if_put(9) means we are done with `ifp' and it could be destroyed. `ia' points to `in_ifaddr' data that belongs to `ifp', so we need to release the corresponding `ifp' only after we finish dealing with `ia'. `if_addrlist' list destruction and ip_getmoptions() are serialized with the kernel and net locks, so this is not critical, but it looks inconsistent. ok bluhm@
2022-05-15have in_pcbselsrc copy the selected address to memory provided by the caller.David Gwynne
having it return a pointer to something that has a lifetime managed by a lock without accounting for it or taking a reference count or anything like that is asking for trouble. copying the address to caller provided memory while still inside the lock is a lot safer. discussed with visa@ ok bluhm@ claudio@
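A minimal sketch of the copy-out pattern, assuming a userland setting: `struct iface` and `select_src()` are hypothetical names, and a pthread mutex stands in for whatever lock protects the address in the kernel. The point is only that the caller receives a copy made while the lock is held, never a pointer into lock-protected memory.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical interface state; addr may change whenever the lock is free. */
struct iface {
	pthread_mutex_t lock;
	uint32_t        addr;  /* IPv4 address, 0 means "none configured" */
};

/*
 * Copy the selected source address into caller-provided memory while
 * the lock is held. Returns 0 on success, -1 if no address is set;
 * *out is left untouched on failure.
 */
static int
select_src(struct iface *ifp, uint32_t *out)
{
	int error = 0;

	pthread_mutex_lock(&ifp->lock);
	if (ifp->addr == 0)
		error = -1;
	else
		*out = ifp->addr;
	pthread_mutex_unlock(&ifp->lock);

	return error;
}
```

After `select_src()` returns, the caller owns its copy, so a later change to `ifp->addr` cannot invalidate it.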
2022-05-09Protect sbappendaddr() in divert_packet() with kernel lock. WithAlexander Bluhm
divert-packet rules pf calls directly from IP layer to protocol layer. As the former has only shared net lock, additional protection against parallel access is needed. Kernel lock is a temporary workaround until the socket layer is MP safe. discussed with kettenis@ mvs@
2022-05-05Clean up divert_packet(). Function does not return error, make itAlexander Bluhm
void. Introduce mutex and refcounting for inp like in the other PCB functions. OK sashan@
2022-05-05Use static objects for struct rttimer_queue instead of dynamicallyClaudio Jeker
allocating them. Currently there are 6 rttimer_queues and not many more will follow. So change rt_timer_queue_create() to rt_timer_queue_init() which now takes a struct rttimer_queue * as argument which will be initialized. Since this changes the global vars from pointer to struct, adjust other callers as well. OK bluhm@
2022-05-05No longer consider IN_EXPERIMENTAL aka 240/4 as not forwardable.Claudio Jeker
We already allow 240/4 in and out so let's allow it through as well. One of many steps to make 240/4 usable. Diff by Seth David Schoen (schoen at loyalty.org) OK bluhm@ djm@
2022-05-04Move rttimer callback function from the rttimer itself to rttimer_queue.Claudio Jeker
All users use the same callback per queue so that makes sense. Also replace rt_timer_queue_destroy() with rt_timer_queue_flush(). OK bluhm@
2022-05-04In ipsp_spd_lookup() rename the parameter tdbp to tdbin as it isAlexander Bluhm
always the incoming TDB that has to be checked. from markus@
2022-05-03Retire CRYPTO_F_MPSAFE it is no longer of any use. The crypto frameworkClaudio Jeker
no longer uses a callback and so there is no need to define the callback as MPSAFE. OK bluhm@
2022-04-30When performing ipsp_ids_free(), grab `ipsec_flows_mtx' mutex(9) before doVitaliy Makkoveev
`id_refcount' decrement. This should be consistent with `ipsp_ids_gc_list' list modifications, otherwise concurrent ipsp_ids_insert() could remove this dying `ids' from the list before it was placed there by ipsp_ids_free(). This makes atomic operations with `id_refcount' useless. Also prevent ipsp_ids_lookup() from returning dying `ids'. ok bluhm@
2022-04-30Convert the 2nd rttimer callback from struct rttimer to u_int rtableid.Claudio Jeker
The callback only needs to know the rtableid; all the other info from struct rttimer is not needed. Also change the default rttimer callback to only delete routes that are RTF_HOST and RTF_DYNAMIC. This way 2 of the ICMP handlers can use NULL as the callback. OK bluhm@
2022-04-28In the multicast router code don't allocate a rt timer queue for eachClaudio Jeker
rdomain. The rttimer API is rtable/rdomain aware and so there is no need to have so many queues. Also init the two queues (one for IPv4 and one for IPv6) early on. This will allow the rttable code to become simpler. OK bluhm@
2022-04-28Decouple IP input and forwarding from protocol input. This allowsAlexander Bluhm
to have parallel IP processing while the upper layers are still not MP safe. Introduce ip_ours() that enqueues the packets and ipintr() that dequeues and processes them with an exclusive netlock. Note that we still have only one softnet task. Running IP processing on multiple CPUs will be the next step. lots of testing by Hrvoje Popovski; OK sashan@
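The enqueue/dequeue split can be pictured with a small FIFO. This is a sketch under stated assumptions, not the kernel's implementation: `struct pkt`, `struct pktq`, `enqueue_local()` and `drain_local()` are hypothetical names, and all locking is omitted (in the commit, the draining side runs with an exclusive netlock).

```c
#include <stddef.h>

/* Hypothetical packet and queue types. */
struct pkt {
	struct pkt *next;
	int         processed;
};

struct pktq {
	struct pkt  *head;
	struct pkt **tailp;  /* points at the slot for the next packet */
};

static void
pktq_init(struct pktq *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

/* Input side: only append the packet; no protocol work happens here. */
static void
enqueue_local(struct pktq *q, struct pkt *m)
{
	m->next = NULL;
	*q->tailp = m;
	q->tailp = &m->next;
}

/* Later, serialized side: take the whole queue and process each packet.
 * Returns the number of packets handled. */
static int
drain_local(struct pktq *q)
{
	struct pkt *m, *next;
	int n = 0;

	m = q->head;
	pktq_init(q);              /* queue is empty again */
	for (; m != NULL; m = next) {
		next = m->next;
		m->next = NULL;
		m->processed = 1;  /* stands in for protocol input */
		n++;
	}
	return n;
}
```

Keeping the input side down to a list append is what lets it run without the stronger lock that the upper layers still need.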
2022-04-21Introduce a dedicated link entries for snapshots in pfsync(4). The purposeAlexandr Nedvedicky
of snapshots is to allow pfsync(4) to move items from global lists to local lists (a.k.a. snapshots) under mutex protection. Snapshots are then processed without holding any mutexes. Such an idea does not fly well if the link entry is currently used for global lists as well as snapshots. Feedback by bluhm@ Credit also goes to hrvoje@ for extensive testing. OK bluhm@
2022-04-20Route timeout was a mixture of int, u_int and long. Use type intAlexander Bluhm
for timeout, add sysctl bounds checking between 0 and max int, and use time_t for absolute times. Some code assumes that the route timeout queue can be NULL and at some places this was checked. Better make sure that all queues always exist. The pool_get for struct rttimer_queue is only called from initialization and from syscall, so PR_WAITOK is possible. Keep the special hack when ip_mtudisc is set to 0. Destroy the queue and generate an empty one. If redirect timeout is 0, it should not time out. Check the value in IPv6 to make the behavior like IPv4. Sysctl net.inet6.icmp6.redirtimeout had no effect as the queue timeout was not modified. Make icmp6_sysctl() look like icmp_sysctl(). OK claudio@
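The bounds check on the timeout value amounts to rejecting anything outside 0 to max int before storing it. The helper below is a hypothetical userland sketch (`set_timeout_bounded()` is not a kernel function; the kernel has its own sysctl helpers for this):

```c
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical setter: store newval in *valp only if it lies in
 * [0, INT_MAX]. Returns 0 on success, EINVAL otherwise; *valp is
 * unchanged on failure.
 */
static int
set_timeout_bounded(int *valp, long long newval)
{
	if (newval < 0 || newval > INT_MAX)
		return EINVAL;
	*valp = (int)newval;
	return 0;
}
```

Taking the new value in a wider type than the stored `int` is what makes the upper bound checkable at all.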
2022-04-14Relax address availability check for multicast binds.Claudio Jeker
While it makes sense to limit bind(2) of unicast addresses that overlap each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53) it makes little sense for multicast. Multicast is delivered to all sockets that match so there is no risk of someone stealing traffic from someone else. This should hopefully help with mDNS as reported by robert@ OK deraadt@ bluhm@
2022-03-28if_detach() does if_remove(ifp); NET_LOCK(); rti_delete(). NewAlexander Bluhm
igmp groups may join while sleeping in interface destruction. In this case if_get() in igmp_joingroup() fails and rti_fill() is not called. Then inm->inm_rti may be NULL. This is the condition when syzkaller crashes in igmp_leavegroup(). Pass the ifp the current CPU is already holding down to igmp_joingroup() and igmp_leavegroup() to avoid half constructed igmp groups. Calling if_get() in caller and callee makes no sense anyway. Reported-by: syzbot+146823a676b7bea83649@syzkaller.appspotmail.com OK denis@
2022-03-23Move global variable ripsrc onto stack, it is only used once withinAlexander Bluhm
rip_input(). from dhill@
2022-03-22For raw IP packets rip_input() traverses the loop of all PCBs. FromAlexander Bluhm
there it calls sbappendaddr() while holding the raw table mutex. This ends in sorwakeup() where we finally grab the kernel lock while holding a mutex. Witness detects this misuse. Use the same solution as for PCB notify. Collect the affected PCBs in a temporary list. The list is protected by exclusive net lock. syzbot+ebe3f03a472fecf5e42e@syzkaller.appspotmail.com OK claudio@
2022-03-22Fix whitespace.Alexander Bluhm
2022-03-21For multicast and broadcast packets udp_input() traverses the loopAlexander Bluhm
of all UDP PCBs. From there it calls udp_sbappend() while holding the UDP table mutex. This ends in sorwakeup() where we finally grab the kernel lock while holding a mutex. Witness detects this misuse. Use the same solution as for PCB notify. Collect the affected PCBs in a temporary list. The list is protected by exclusive net lock. Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com OK sashan@
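The "collect in a temporary list, deliver afterwards" approach can be sketched as follows. All names (`struct pcb`, `struct pcb_table`, `deliver()`, `input_to_port()`) are hypothetical, a pthread mutex stands in for the table mutex, and the sketch leaves out both the reference counting that keeps PCBs alive and the exclusive net lock that protects the temporary list in the kernel.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical PCB with a second link used only for the temporary list. */
struct pcb {
	struct pcb *next;         /* link in the global table */
	struct pcb *notify_next;  /* link in the temporary list */
	int         port;
	int         delivered;
};

struct pcb_table {
	pthread_mutex_t mtx;
	struct pcb     *head;
};

/* Stands in for work that may take other locks and so must not run
 * while the table mutex is held. */
static void
deliver(struct pcb *p)
{
	p->delivered++;
}

/* Returns the number of PCBs the packet was delivered to. */
static int
input_to_port(struct pcb_table *t, int port)
{
	struct pcb *p, *list = NULL;
	int n = 0;

	/* Phase 1: walk the table under the mutex, only collecting matches. */
	pthread_mutex_lock(&t->mtx);
	for (p = t->head; p != NULL; p = p->next) {
		if (p->port == port) {
			p->notify_next = list;
			list = p;
		}
	}
	pthread_mutex_unlock(&t->mtx);

	/* Phase 2: the mutex is released; now do the real work. */
	for (p = list; p != NULL; p = p->notify_next) {
		deliver(p);
		n++;
	}
	return n;
}
```

The separate `notify_next` field matters: reusing `next` for the temporary list would corrupt the global table while other CPUs may be walking it.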
2022-03-21Fix whitespace. Wrap long lines. Adjust outdated comment.Alexander Bluhm
2022-03-21Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutexAlexander Bluhm
for PCB tables. It does not break userland build anymore. pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in inp_notify list and protect it with exclusive netlock. OK sashan@
2022-03-21call in_pcbselsrc from rip_output so route sourceaddr can take effect.David Gwynne
previously things that used sendto or similar with raw sockets would ignore any configured sourceaddr. this made it inconsistent with other traffic, which in turn makes things confusing to debug if you're using ping or traceroute (which use raw sockets) to figure out what's happening to other packets. the ipv6 equiv already does this too. ok sthen@ claudio@
2022-03-21treat 255.255.255.255 like an mcast address in in_pcbselsrc.David Gwynne
this allows the IP_MULTICAST_IF sockopt to specify which address you want to send a limited broadcast (255.255.255.255) packet out of. requested by and ok claudio@
2022-03-20Include sys/mutex.h from netinet/in_pcb.h. Struct mutex will beAlexander Bluhm
needed to make inpcb in kernel MP safe. To build sysctl and libkvm based programs, we have to export it to userland. OK claudio@
2022-03-14Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ulTheo Buehler
This reverts the commit protecting the list and hashes in the PCB tables with a mutex since the build of sysctl(8) breaks, as found by kettenis. ok sthen
2022-03-14pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. ToAlexander Bluhm
run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in inp_notify list and protect it with exclusive netlock. OK sashan@
2022-03-13Hrvoje has hit a crash with IPsec acquire while testing the parallelAlexander Bluhm
IP forwarding diff. Add mutex and refcount to make memory management of struct ipsec_acquire MP safe. testing Hrvoje Popovski; input sashan@; OK mvs@
2022-03-10Use atomic load and store functions to access refcnt and waitAlexander Bluhm
variables. Although not necessary everywhere, using atomic functions exclusively for variables marked as atomic is clearer. OK mvs@ visa@
2022-03-08In IPsec policy replace integer refcount with atomic refcount.Alexander Bluhm
OK tobhe@ mvs@
2022-03-06Usually we check ipsec_in_use as shortcut to avoid IPsec lookups,Alexander Bluhm
but that does not work when coming from tcp_output() as inp != NULL. This seems to be done to block packets from sockets with options in inp_seclevel. But instead of doing the route lookup, go directly to ipsp_spd_inp() where the socket policy checks are done. Calling rtable_l2() before the shortcut also costs a bit, do it when needed. OK tobhe@
2022-03-04in_addmulti() is only called from ioctl(2) or setsockopt(2). WaitAlexander Bluhm
for malloc(9) to make the system call reliable. OK mvs@
2022-03-04in_pcbinit() is called during boot. There malloc(9) cannot fail,Alexander Bluhm
but would panic instead of waiting. Remove needless error handling. OK mvs@
2022-03-02Use NULL instead of 0 for pointer.Alexander Bluhm
2022-03-02Merge two comments describing the locks into one.Alexander Bluhm
2022-03-02The return value of in6_pcbnotify() is never used. Make it a voidAlexander Bluhm
function. OK gnezdo@ mvs@ florian@ sashan@
2022-03-01Remove outdated comment about v4-mapped v6 addresses. They are notAlexander Bluhm
supported anymore.
2022-02-25Reported-by: syzbot+1b5b209ce506db4d411d@syzkaller.appspotmail.comPhilip Guenther
Revert the pr_usrreqs move: syzkaller found a NULL pointer deref and I won't be available to monitor for followup issues for a bit
2022-02-25Move pr_attach and pr_detach to a new structure pr_usrreqs that canPhilip Guenther
then be shared among protosw structures, following the same basic direction as NetBSD and FreeBSD for this. Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the proper prototype to eliminate the previously necessary casts. ok mvs@ bluhm@
2022-02-22Delete unnecessary #includes of <netinet6/ip6protosw.h>: some neverPhilip Guenther
needed it and some no longer need it after moving the externs from there to <sys/protosw.h> ok jsg@
2022-02-22Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>Philip Guenther
net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@
2022-02-16rewrite vxlan to better fit the current kernel infrastructure.David Gwynne
the big change is removing the integration with and reliance on bridge(4) for learning vxlan endpoints. we have the etherbridge layer now (which is used by veb, nvgre, bpe, etc) so vxlan can operate independently of bridge(4) (or any other driver) while still dynamically learning about other endpoints. vxlan now uses the udp socket upcall mechanism to receive packets. this means it actually creates and binds udp sockets to use rather than adding code in the udp layer for stealing packets from the udp layer. i think it's also important to note that this adds loop prevention to the code. this stops a vxlan interface being used to transmit a packet that was encapsulated in itself. i want to clear this out of my tree where it's been sitting for nearly a year. no one seems too concerned with the change either way. ok claudio@