Age | Commit message (Collapse) | Author |
|
Otherwise we use `ipsecflowinfo' obtained from previous packet.
ok claudio@
|
|
tracepoint for each type of refcnt we have. As a start, add inpcb
and tdb refcnt. When the counter changes, btrace may print the
actual object, the current counter, the change value and optionally
the stack trace.
discussed with visa@; OK mpi@
|
|
prevent concurrent access to rt_llinfo from rtrequest_delete().
But the common case, when the MAC address is already known, works
without lock.
tested by Hrvoje Popovski; OK mvs@
|
|
once per function. This gives a more consistent time value.
OK claudio@ miod@ mvs@
|
|
(*if_qstart)() be always called with netlock held doesn't work anymore
with PPPOE sessions.
Introduce `pipex_list_mtx' mutex(9) and use it to protect global pipex(4)
lists and radix trees.
Protect pipex(4) `session' dereference with reference counters, because we
could sleep when accessing pipex(4) from ioctl(2) path, and this is not
possible with mutex(9) held.
ok bluhm@
|
|
counter to 0 properly. We have one reference count for the lists,
and one for the timeout handler. When the timout fires, it has to
decrement the reference to itself. Then the ipa is removed from
the lists and decremented again.
from Stefan Butz; OK tobhe@ mvs@
|
|
for the lock operation and to pass a value to the unlock operation.
sofree() still needs an extra flag to know if sounlock() should be called
or not. But sofree() is called less often and mostly without keeping the lock.
OK mpi@ mvs@
|
|
if_put(9) call means we finish work with `ifp' and it could be destroyed.
`ia' is the pointer to 'in_ifaddr' data belongs to `ifp', so we need to
release corresponding `ifp' after we finish deal with `ia'.
`if_addrlist' list destruction and ip_getmoptions() are serialized with
kernel and net locks so this is not critical, but looks inconsistent.
ok bluhm@
|
|
having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.
discussed with visa@
ok bluhm@ claudio@
|
|
divert-packet rules pf calls directly from IP layer to protocol
layer. As the former has only shared net lock, additional protection
against parallel access is needed. Kernel lock is a temporary
workaround until the socket layer is MP safe.
discussed with kettenis@ mvs@
|
|
void. Introduce mutex and refcounting for inp like in the other
PCB functions.
OK sashan@
|
|
allocate them.
Currently there are 6 rttimer_queues and not many more will follow. So
change rt_timer_queue_create() to rt_timer_queue_init() which now takes
a struct rttimer_queue * as argument which will be initialized.
Since this changes the gloabl vars from pointer to struct adjust other
callers as well.
OK bluhm@
|
|
We already allow 240/4 in and out so lets allow it through as well.
One of many steps to make 240/4 useable.
Diff by Seth David Schoen (schoen at loyalty.org)
OK bluhm@ djm@
|
|
All users use the same callback per queue so that makes sense.
Also replace rt_timer_queue_destroy() with rt_timer_queue_flush().
OK bluhm@
|
|
always the incoming TDB that has to be checked.
from markus@
|
|
no longer uses a callback and so there is no need to define the
callback as MPSAFE.
OK bluhm@
|
|
`id_refcount' decrement. This should be consistent with `ipsp_ids_gc_list'
list modifications, otherwise concurrent ipsp_ids_insert() could remove
this dying `ids' from the list before if was placed there by
ipsp_ids_free(). This makes atomic operations with `id_refcount' useless.
Also prevent ipsp_ids_lookup() to return dying `ids'.
ok bluhm@
|
|
The callback only needs to know the rtableid all the other info from
struct rtableid is not needed.
Also change the default rttimer callback to only delete routes that are
RTF_HOST and RTF_DYNAMIC. This way 2 of the ICMP handlers can use NULL
as the callback.
OK bluhm@
|
|
rdomain. The rttimer API is rtable/rdomain aware and so there is no need
to have so many queues.
Also init the two queues (one for IPv4 and one for IPv6) early on. This
will allow the rttable code to become simpler.
OK bluhm@
|
|
to have parallel IP processing while the upper layers are still not
MP safe. Introduce ip_ours() that enqueues the packets and ipintr()
that dequeues and processes them with an exclusive netlock.
Note that we still have only one softnet task. Running IP processing
on multiple CPU will be the next step.
lots of testing Hrvoje Popovski; OK sashan@
|
|
of snapshots is to allow pfsync(4) to move items from global lists
to local lists (a.k.a. snapshots) under a mutex protection. Snapshots
are then processed without holding any mutexes. Such idea does not fly
well if link entry is currently used for global lists as well as snapshots.
Feedback by bluhm@ Credits also goes to hrvoje@ for extensive testing.
OK bluhm@
|
|
for timeout, add sysctl bounds checking between 0 and max int, and
use time_t for absolute times.
Some code assumes that the route timeout queue can be NULL and at
some places this was checked. Better make sure that all queues
always exist. The pool_get for struct rttimer_queue is only called
from initialization and from syscall, so PR_WAITOK is possible.
Keep the special hack when ip_mtudisc is set to 0. Destroy the
queue and generate an empty one.
If redirect timeout is 0, it should not time out. Check the value
in IPv6 to make the behavior like IPv4.
Sysctl net.inet6.icmp6.redirtimeout had no effect as the queue
timeout was not modified. Make icmp6_sysctl() look like icmp_sysctl().
OK claudio@
|
|
While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@
|
|
igmp groups may join while sleeping in interface destruction. In
this case if_get() in igmp_joingroup() fails and rti_fill() is not
called. Then inm->inm_rti may be NULL. This is the condition when
syzkaller crashes in igmp_leavegroup().
Pass the ifp the current CPU is already holding down to igmp_joingroup()
and igmp_leavegroup() to avoid half constructed igmp groups. Calling
if_get() in caller and callee makes no sense anyway.
Reported-by: syzbot+146823a676b7bea83649@syzkaller.appspotmail.com
OK denis@
|
|
rip_input().
from dhill@
|
|
there it calls sbappendaddr() while holding the raw table mutex.
This ends in sorwakeup() where we finally grab the kernel lock while
holding a mutex. Witness detects this misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
syzbot+ebe3f03a472fecf5e42e@syzkaller.appspotmail.com
OK claudio@
|
|
|
|
of all UDP PCBs. From there it calls udp_sbappend() while holding
the UDP table mutex. This ends in sorwakeup() where we finally
grab the kernel lock while holding a mutex. Witness detects this
misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
Reported-by: syzbot+7596cb96fb9f3c9d6f4f@syzkaller.appspotmail.com
OK sashan@
|
|
|
|
for PCB tables. It does not break userland build anymore.
pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@
|
|
previously things that used sendto or similar with raw sockets would
ignore any configured sourceaddr. this made it inconsistent with
other traffic, which in turn makes things confusing to debug if
you're using ping or traceroute (which use raw sockets) to figure
out what's happening to other packets.
the ipv6 equiv already does this too.
ok sthen@ claudio@
|
|
this allows the IP_MULTICAST_IF sockopt to specify which address
you want to send a limited broadcast (255.255.255.255) packet out
of.
requested by and ok claudio@
|
|
needed to make inpcb in kernel MP safe. To build sysctl and libkvm
based programs, we have to export it to userland.
OK claudio@
|
|
This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.
ok sthen
|
|
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@
|
|
IP forwarding diff. Add mutex and refcount to make memory management
of struct ipsec_acquire MP safe.
testing Hrvoje Popovski; input sashan@; OK mvs@
|
|
variables. Although not necessary everywhere, using atomic functions
exclusively for variables marked as atomic is clearer.
OK mvs@ visa@
|
|
OK tobhe@ mvs@
|
|
but that does not work when coming from tcp_output() as inp != NULL.
This seems to be done to block packets from sockets with options
in inp_seclevel. But instead of doing the route lookup, go directly
to ipsp_spd_inp() where the socket policy checks are done. Calling
rtable_l2() before the shortcut also costs a bit, do it when needed.
OK tobhe@
|
|
for malloc(9) to make the system call reliable.
OK mvs@
|
|
but would panic instead of waiting. Remove needless error handling.
OK mvs@
|
|
|
|
|
|
function.
OK gnezdo@ mvs@ florian@ sashan@
|
|
supported anymore.
|
|
Revert the pr_usrreqs move: syzkaller found a NULL pointer deref
and I won't be available to monitor for followup issues for a bit
|
|
then be shared among protosw structures, following the same basic
direction as NetBSD and FreeBSD for this.
Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the
proper prototype to eliminate the previously necessary casts.
ok mvs@ bluhm@
|
|
needed it and some no longer need it after moving the externs from
there to <sys/protosw.h>
ok jsg@
|
|
net/if_pppx.c pointed out by jsg@
ok gnezdo@ deraadt@ jsg@ mpi@ millert@
|
|
the big change is removing the integration with and reliance on
bridge(4) for learning vxlan endpoints. we have the etherbridge
layer now (which is used by veb, nvgre, bpe, etc) so vxlan can
operate independently of bridge(4) (or any other driver) while still
dynamically learning about other endpoints.
vxlan now uses the udp socket upcall mechanism to receive packets.
this means it actually creates and binds udp sockets to use rather
adding code in the udp layer for stealing packets from the udp
layer.
i think it's also important to note that this adds loop prevention
to the code. this stops a vxlan interface being used to transmit a
packet that was encapsulated in itself.
i want to clear this out of my tree where it's been sitting for
nearly a year. noone seems too concerned with the change either
way.
ok claudio@
|