|
"mutex locking against myself" panic introduced by my previous commit.
OK beck@ patrick@
|
|
OK todd@, mvs@, kn@
|
|
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
If m_pullup() fails, it looks like the tcp signature functions
should not be called. Avoid an mbuf leak and return an error.
OK mvs@
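A minimal sketch of the pattern described above (the function name
and error value are hypothetical, not from the diff):

        int
        example_input(struct mbuf **mp, int hlen)
        {
                /* m_pullup(9) frees the chain and returns NULL on
                 * failure, so never reuse the old pointer */
                if ((*mp = m_pullup(*mp, hlen)) == NULL)
                        return (ENOBUFS);  /* nothing left to free */
                /* *mp now has at least hlen contiguous bytes */
                return (0);
        }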
|
|
Reported by Peter J. Philipp.
OK mpi@
|
|
OK mpi@
|
|
algorithm is gone. Remove all LZS references from the tree. The
v42bis in isakmpd also looks unsupported.
OK mvs@ patrick@ sthen@
|
|
Also bfdset() calls pool_get(9) with the PR_WAITOK flag, so the
allocation should be done before we check the existence of this
`bfd', otherwise it could be added multiple times.
We have BFD disabled in the default kernel so this diff is for
consistency mostly.
ok mpi@
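A sketch of the ordering (helper and pool names are assumptions,
not the committed code):

        int
        bfdset_sketch(struct rtentry *rt)
        {
                struct bfd_config *bfd;

                /* pool_get(9) with PR_WAITOK may sleep, so
                 * allocate first ... */
                bfd = pool_get(&bfd_pool, PR_WAITOK | PR_ZERO);

                /* ... then check for an existing entry, or it
                 * could be added twice */
                if (bfd_lookup(rt) != NULL) {
                        pool_put(&bfd_pool, bfd);
                        return (EADDRINUSE);
                }
                /* initialize and insert bfd here */
                return (0);
        }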
|
|
Reported-by: syzbot+684597dbbb9b516e76ae@syzkaller.appspotmail.com
ok mpi@
|
|
When the dying network interface descriptor has a reference obtained
with if_get(9) and owned by a foreign thread, the if_detach() thread
will sleep just after it has removed this interface from the interface
index map. The data related to this interface is still in the routing
table, so if_get(9) called by a concurrent rtm_output() thread will
return NULL and the following "ifp != NULL" assertion will be triggered.
So remove the "ifp != NULL" assertions from rtm_output() and try to grab
`ifp' as early as possible, then hold it until we finish the work. In
the case we won the race and `ifp' is non-NULL, the concurrent
if_detach() thread will wait for us. In the case we lost, we just
return ESRCH.
The problem reported by danj@.
Diff tested by danj@.
ok mpi@
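A rough sketch of the resulting shape (not the literal diff):

        struct ifnet *ifp;

        /* grab `ifp' as early as possible and hold the reference */
        if ((ifp = if_get(rt->rt_ifidx)) == NULL)
                return (ESRCH); /* lost the race with if_detach() */

        /* ... do the work; a concurrent if_detach() thread sleeps
         * until our reference is released with if_put() ... */

        if_put(ifp);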
|
|
ok mpi@
|
|
according to 802.1Q, vlan 0 on the wire is special and should be
interpreted as if it was a packet received on the parent interface,
but you get the packet priority information encoded in the vlan
header.
historically we drop vlan tagged packets that don't have a vlan
interface configured for the received tag number. historically we
have deviated from 802.1Q by allowing for the configuration of a
vlan subinterface with the vnetid "unset". this works just like any
other vlan interface, but it uses tag 0 on the wire. however, if
you're in a situation where you're receiving vlan tagged 0 packets
that really are part of the same layer 2 ethernet domain as the
parent interface, this doesnt work well.
landry@ is in such a situation at work where the network is sending
his OpenBSD boxes packets with VLAN tag 0. sometimes. most of the
time the packets are untagged, which is expected, but sometimes
they have a VLAN header set. this causes problems, particularly
with arp.
this diff does the smallest possible change to enable reception of
these vlan 0 priority tagged packets. if an "unset" vlan interface
is not configured on the parent, then vlan 0 tagged packets get
their header stripped and continue stack processing as if they didnt
have the tag at all.
landry has been running this for months.
ok sthen@ claudio@
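a sketch of the behaviour (the lookup helper is hypothetical and the
real code differs):

        tag = EVL_VLANOFTAG(m->m_pkthdr.ether_vtag);
        ifv = vlan_lookup(ifp0, tag);   /* hypothetical lookup */
        if (ifv == NULL) {
                if (tag == 0) {
                        /* priority tagged: strip the tag, keep the
                         * priority, and continue on the parent */
                        m->m_flags &= ~M_VLANTAG;
                        return (m);     /* back to ether_input() */
                }
                m_freem(m);     /* unknown tag, drop as before */
                return (NULL);
        }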
|
|
fixing an mbuf leak with wgpka (keepalive), found the hard way by Matt P.
Diff from Matt Dunwoodie, ok claudio@
|
|
The preceding trunk_link_active() already produced an RTM_IFINFO
message when the trunk(4) state changed. In such a case we send a
duplicate RTM_IFINFO message, or we produce a false message when the
trunk(4) state was not changed.
ok florian@
|
|
Panic reported by Hrvoje Popovski.
|
|
With bluhm@'s diff for parallel forwarding pipex(4) could be accessed in
parallel through (*ifp->if_input)() -> ether_input() ->
pipex_pppoe_input(). PPPOE pipex(4) sessions are mostly immutable except
MPPE crypt.
The new per-session `pxs_mtx' mutex(9) is used to protect the
session's `ccp-id', which is incremented each time we send a CCP
reset-request. The new `pxm_mtx' mutex(9) is used to protect the MPPE
context. Each pipex(4) session has two of them: one for the input and
one for the output path. There are no lock order limitations because
those new mutex(9)es are never held together.
ok bluhm@
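A rough sketch of the layout (field placement assumed from the
description above, not copied from the tree):

        struct pipex_mppe {
                struct mutex    pxm_mtx;   /* protects MPPE state */
                /* rc4 context, keys, coherency counter */
        };

        struct pipex_session {
                /* mostly immutable after creation */
                struct mutex          pxs_mtx;    /* protects `ccp-id' */
                struct pipex_mppe     mppe_recv;  /* input path */
                struct pipex_mppe     mppe_send;  /* output path */
        };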
|
|
'tdb_data' struct became unused and was removed.
ok bluhm@
|
|
Rebased version of a diff from miod who described it as follows:
This tries to keep diffability against upstream, hence a questionable
choice of the size type for zcfree() - but all sizes should fit in 32
bits anyway.
Since all zcfree routines used in the tree cope with NULL arguments
(including the various alloc.c used by the boot blocks), I have
simplified TRY_FREE to compensate for the growth.
Reminded by and ok mpi
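The simplification, sketched from zlib's macro:

        /* before: zcfree might not accept NULL */
        /* #define TRY_FREE(s, p) {if (p) ZFREE(s, p);} */

        /* after: every zcfree in the tree copes with NULL */
        #define TRY_FREE(s, p) ZFREE(s, p)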
|
|
more compliant with bluhm@'s work on traffic forwarding parallelization.
ok yasuoka@ bluhm@
|
|
Variable nettaskqs must not change at runtime. Interface input
queues choose the thread during init with ifiq_softnet = net_tq().
So it cannot be modified after pfkeyv2_send() sets the first SA in the
kernel. Also changing the calculation in net_tq() may call task_del()
with a different taskq than task_add().
Instead of restricting the index to the first softnet task, use an
exclusive lock. For now just move the comment. We can later decide
if a write net lock or kernel lock is better.
OK mvs@
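A sketch of the invariant (the array name is hypothetical):

        struct taskq *
        net_tq(unsigned int ifindex)
        {
                /* nettaskqs must be constant here: task_del() must
                 * resolve to the same taskq that task_add() used */
                return (softnettqs[ifindex % nettaskqs]);
        }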
|
|
attempting to negotiate a large MTU.
Copy the peer's max payload size from the discovery packet with memcpy()
instead of using a pointer to this value's offset in the packet buffer.
tweak and ok visa@
additional testing and ok sthen@
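A sketch of the idea (buffer name and offset are hypothetical):

        uint16_t max_payload;

        /* don't dereference into the packet buffer, it may be
         * misaligned: max_payload = *(uint16_t *)(buf + off); */
        memcpy(&max_payload, buf + off, sizeof(max_payload));
        max_payload = ntohs(max_payload);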
|
|
This is useful for debugging replay window issues with 64 bit
sequence numbers in IPsec.
ok bluhm@
|
|
|
|
doing that at runtime within ipsp_acquire_sa().
ok bluhm@
|
|
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@
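A minimal sketch of the macro shape (the exact definition in the
tree may differ):

        #ifdef ENCDEBUG
        #define DPRINTF(fmt, args...)                           \
                do {                                            \
                        if (encdebug)                           \
                                printf("%s: " fmt "\n",         \
                                    __func__, ## args);         \
                } while (0)
        #else
        #define DPRINTF(fmt, args...)   do { } while (0)
        #endif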
|
|
because smr_read sections don't play well with sleeping locks in pf(4).
OK bluhm@
|
|
OK bluhm@
|
|
discovery issues with ESP in UDP.
ok bluhm@ sthen@ mpi@
|
|
the code tried to carry state from the quick smr based lookup through
to the actual map update under the mutex, but this led to refcnt
leaks, and logic errors. the simplification is that if the smr based
checks say the map needs updating, we prepare the update and then
forget what we learnt inside the smr critical section and redo them
under the mutex again.
entries in an etherbridge map are either in it or they aren't, so
we don't need to refcnt them. this means the thing that takes an
entry out of the map becomes directly responsible for destroying it,
so they can do the smr call or barrier directly rather than via a
refcnt.
found by hrvoje popovski while testing the stack running in parallel,
and fix tested by him too.
ok sashan@
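a sketch of the pattern (the helper names are made up):

        /* use the smr critical section only to decide that an
         * update is needed */
        smr_read_enter();
        update = (ebt_lookup(ebt, addr) == NULL);
        smr_read_leave();
        if (!update)
                return;

        nebe = ebe_alloc(addr, port);   /* prepare the update */

        /* forget what we learnt; redo the check under the mutex */
        mtx_enter(&ebt->ebt_mtx);
        if (ebt_lookup_locked(ebt, addr) == NULL) {
                ebt_insert_locked(ebt, nebe);
        } else {
                ebe_free(nebe);         /* lost the race */
        }
        mtx_leave(&ebt->ebt_mtx);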
|
|
Add asserts and comments for the locks that are necessary.
discussed with dlg@ mpi@ mvs@; tested by Hrvoje Popovski; OK mpi@
|
|
a continue in the middle of a do { } while (0) loop is effectively
a break, it doesnt restart the loop.
without the retry, the code leaked update messages which in turn
made the pool_destroy in pfsync trip over a kassert cos items
were still out.
found by and fix tested by hrvoje popovski
ok sashan@
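to illustrate (a standalone example, not code from the tree):

        do {
                if (!done)
                        continue;       /* jumps to the while (0)
                                         * test and exits the loop;
                                         * it does not restart it */
                work();
        } while (0);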
|
|
|
|
before this, things that iterated over the global list of pf states
had to take the net, pf, or pf state locks. in particular, the
ioctls that dump the state table took the net and pf state locks
before iterating over the states and using copyout to export them
to userland. when we tried replacing the use of rwlocks with mutexes
under the pf locks, this blew up because you can't sleep when holding
a mutex and there's a sleeping lock used inside copyout.
this diff introduces two locks around the global state list: a mutex
that protects the head and tail of the list, and an rwlock that
protects the links between elements in the list. inserts on the
state list only occur during packet handling and can be done by
taking the mutex and putting the state on the tail before releasing
the mutex. iterating over states is only done from thread/process
contexts, so we can take a read lock, then the mutex to get a
snapshot of the head and tail pointers, and then keep the read lock
to iterate between the head and tail pointers. because it's a read
lock we can then take other sleeping locks (eg, the one inside
copyout) without (further) gymnastics. the pf state purge code takes
the rwlock exclusively and the mutex to remove elements from the
list.
this allows the ioctls and purge code to loop over the list
concurrently and largely without blocking the creation of states
when pf is processing packets.
pfsync also iterates over the state list when doing bulk sends,
which the state purge code needs to be careful around.
ok sashan@
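a sketch of the arrangement (structure and names approximate):

        /* insert, from pf packet processing */
        mtx_enter(&state_list.mtx);
        TAILQ_INSERT_TAIL(&state_list.list, st, entry_list);
        mtx_leave(&state_list.mtx);

        /* iterate, from thread/process context */
        rw_enter_read(&state_list.rwl);
        mtx_enter(&state_list.mtx);
        head = TAILQ_FIRST(&state_list.list);
        tail = TAILQ_LAST(&state_list.list, pf_state_queue);
        mtx_leave(&state_list.mtx);
        /* walk head..tail; copyout() may sleep under the read lock */
        rw_exit_read(&state_list.rwl);

        /* remove, from the purge code */
        rw_enter_write(&state_list.rwl);
        mtx_enter(&state_list.mtx);
        TAILQ_REMOVE(&state_list.list, st, entry_list);
        mtx_leave(&state_list.mtx);
        rw_exit_write(&state_list.rwl);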
|
|
|
|
pfsync_undefer_notify uses the state keys to look up the address
family, which is used to figure out if it should call ipv4 or ipv6
functions. however, the pf state purge code can unlink a state from
the trees (ie, the state keys get removed) while the pfsync defer
code is holding a reference to it and expects to be able to send
the deferred packet in the future. we can test if the state keys
are set by checking if the timeout state is PFTM_UNLINKED or not.
this currently relies on both pf_remove_state and pfsync_undefer_notify
being called with the NET_LOCK held. this probably needs to be
rethought later but is good enough for now.
found the hard way on a production firewall at work.
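a sketch of the check:

        /* the keys are only valid while the state is linked */
        if (st->timeout == PFTM_UNLINKED)
                return;         /* purge got there first, give up */
        af = st->key[PF_SK_WIRE]->af;   /* now safe to use */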
|
|
im going to make it so pf_purge_expired_states() can gather states
largely without sharing a lock with pfsync or actual packet processing
in pf. if pf or pfsync unlink a state while pf_purge_expired_states
is looking at it, we can race with some checks and fall over a
KASSERT.
i'm fixing this by having the caller of pf_state_expires read
state->timeout first, do its checks, and then pass the value as
an argument into pf_state_expires. this means there's a consistent
view of the state->timeout variable across all the checks that
pf_purge_expired_states in particular does. if pf/pfsync does change
the timeout while pf_purge_expired_states is looking at it, the
worst thing that happens is that it doesn't get picked as a candidate
for purging in this pass and will have to wait for the next sweep.
ok sashan@ as part of a bigger diff
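a sketch of the calling pattern inside the purge loop (the purge
helper name is made up):

        uint8_t stimeout = cur->timeout;        /* read once */

        if (stimeout == PFTM_UNLINKED)
                continue;
        /* every check sees the same snapshot of cur->timeout */
        if (pf_state_expires(cur, stimeout) <= getuptime())
                pf_purge_state(cur);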
|
|
pfsync_free_deferral doesnt need to check pd_m for NULL before
calling m_freem because m_freem does that anyway.
if pf_setup_pdesc in pfsync_undefer_notify failed, the mbuf was
freed but the pd_m pointer was not cleared, which would have led
to a double free when pfsync_free_deferral tried to do the same
thing for it.
if pfsync_undefer is supposed to drop the mbuf, let pfsync_free_deferral
do it for us.
ok jmatthew@
|
|
working on a uint64_t is easier than remembering how timercmp and
timersub work.
ok jmatthew@
|
|
"that seems kind of important" jmatthew@
|
|
these functions were implemented in a bunch of places with comments
saying it should be moved to kern_tc.c when more pop up, and i was
about to add another one. i think it's time to move them to kern_tc.c.
ok cheloha@ jmatthew@
|
|
instead of having a timeout per deferred packet structure, use a
single timeout in pfsync that pulls items off the list of deferred
packets.
this avoids confusion about whether a timeout is handling the defer
or another context owns it. this way round, the context that removes
a defer from the list owns it and is responsible for completing it.
this should fix a panic we hit on the firewalls at work. there's
still another one that needs a fix, but sashan@ has been looking
at it. this might make it simpler to deal with though.
ok sashan@ jmatthew@
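a sketch of the single-timeout scheme (names approximate): whoever
takes a defer off the list owns it, and the timeout just drains the
expired entries.

        void
        pfsync_deferrals_tmo(void *arg)
        {
                struct pfsync_softc *sc = arg;
                struct pfsync_deferral *pd;
                uint64_t now = getnsecuptime();

                mtx_enter(&sc->sc_deferrals_mtx);
                while ((pd = TAILQ_FIRST(&sc->sc_deferrals)) != NULL) {
                        if (pd->pd_deadline > now) {
                                /* not due yet, rearm for it */
                                timeout_add_nsec(&sc->sc_deferrals_tmo,
                                    pd->pd_deadline - now);
                                break;
                        }
                        /* taking pd off the list makes it ours */
                        TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
                        mtx_leave(&sc->sc_deferrals_mtx);
                        pfsync_undefer(pd, 0);
                        mtx_enter(&sc->sc_deferrals_mtx);
                }
                mtx_leave(&sc->sc_deferrals_mtx);
        }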
|
|
|
|
state in pfsync(4) queue. pfsync_q_ins() takes that race into account with one
exception: the KASSERT() at line 2352. That KASSERT() needs to be removed.
2346 void
2347 pfsync_q_ins(struct pf_state *st, int q)
2348 {
2349         struct pfsync_softc *sc = pfsyncif;
2350         size_t nlen, sc_len;
2351
2352         KASSERT(st->sync_state == PFSYNC_S_NONE);
2353
2354 #if defined(PFSYNC_DEBUG)
2355         if (sc->sc_len < PFSYNC_MINPKT)
2356                 panic("pfsync pkt len is too low %zd", sc->sc_len);
2357 #endif
2358         do {
2359                 mtx_enter(&sc->sc_mtx[q]);
2360
2361                 /*
2362                  * If two threads are competing to insert the same state, then
2363                  * there must be just single winner.
2364                  */
2365                 if (st->sync_state != PFSYNC_S_NONE) {
2366                         mtx_leave(&sc->sc_mtx[q]);
2367                         break;
2368                 }
OK bluhm@
|
|
|
|
|
|
this simplifies the code a little bit.
|
|
this avoids IFF_LINK1 getting set by another cpu halfway through
tpmr_input. if LINK1 is not set when a packet enters a tpmr pair
it skips the ip/pf checks, but if it then gets set midway, only pf
is run against it. this way you either get the ip checks and pf
when the packet enters and leaves tpmr, or you dont get the ip and
pf checks at all.
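a sketch of the fix (the check helpers are made up):

        /* read the flag exactly once per packet */
        int link1 = ISSET(ifp->if_flags, IFF_LINK1);

        if (link1) {
                /* the ip checks and pf both see the same decision,
                 * even if another cpu flips IFF_LINK1 meanwhile */
                if (!tpmr_ip_check(ifp, m) || !tpmr_pf(ifp, m))
                        goto drop;
        }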
|
|
unlike bridge(4), these checks are only run when the packet is
entering the veb/tpmr topology. the assumption is that only valid
IP packets end up inside the topology so we don't have to check
them when they're leaving.
ok bluhm@ sashan@
|
|
this removes the duplication of the check code, and lets the v6
code in particular pick up a lot more sanity checks around valid
addresses on the wire.
ok bluhm@ sashan@
|
|
SS_CANTRCVMORE bits are set.
The first check is required to prevent timeout_add(9) from rescheduling
`rop_timeout', otherwise timeout_del_barrier(9) can't help us.
The second check is for the case where shutdown(2) with the SHUT_RD
argument occurred on this socket and we should not receive anything,
including RTM_DESYNC packets.
ok claudio@
|
|
am i a pf hacker now?
|