Age | Commit message (Collapse) | Author |
|
because smr_read sections don't play well with sleeping locks in pf(4).
OK bluhm@
|
|
OK @bluhm
|
|
discovery issues with ESP in UDP.
ok bluhm@ sthen@ mpi@
|
|
the code tried to carry state from the quick smr based lookup through
to the actual map update under the mutex, but this led to refcnt
leaks, and logic errors. the simplification is that if the smr based
checks say the map needs updating, we prepare the update and then
forget what we learnt inside the smr critical section and redo them
under the mutex again.
entries in an etherbridge map are either in it or they aren't, so
we don't need to refcnt them. this means the thing that takes an
entry out of the map becomes directly responsible for destroy it,
so they can do the smr call or barrier directly rather than via a
refcnt.
found by hrvoje popovski while testing the stack running in parallel,
and fix tested by him too.
ok sashan@
|
|
Add asserts and comments for the locks that are necessary.
discussed with dlg@ mpi@ mvs@; tested by Hrvoje Popovski; OK mpi@
|
|
a continue in the middle of a do { } while (0) loop is effectively
a break, it doesnt restart the loop.
without the retry, the code leaked update messages which in turn
made pool_destroy in pfsync destroy trip over a kassert cos items
were still out.
found by and fix tested by hrvoje popovski
ok sashan@
|
|
|
|
before this, things that iterated over the global list of pf states
had to take the net, pf, or pf state locks. in particular, the
ioctls that dump the state table took the net and pf state locks
before iterating over the states and using copyout to export them
to userland. when we tried replacing the use rwlocks with mutexes
under the pf locks, this blew up because you can't sleep when holding
a mutex and there's a sleeping lock used inside copyout.
this diff introduces two locks around the global state list: a mutex
that protects the head and tail of the list, and an rwlock that
protects the links between elements in the list. inserts on the
state list only occur during packet handling and can be done by
taking the mutex and putting the state on the tail before releasing
the mutex. iterating over states is only done from thread/process
contexts, so we can take a read lock, then the mutex to get a
snapshot of the head and tail pointers, and then keep the read lock
to iterate between the head and tail points. because it's a read
lock we can then take other sleeping locks (eg, the one inside
copyout) without (further) gymnastics. the pf state purge code takes
the rwlock exclusively and the mutex to remove elements from the
list.
this allows the ioctls and purge code to loop over the list
concurrently and largely without blocking the creation of states
when pf is processing packets.
pfsync also iterates over the state list when doing bulk sends,
which the state purge code needs to be careful around.
ok sashan@
|
|
|
|
pfsync_undefer_notify uses the state keys to look up the address
family, which is used to figure out if it should call ipv4 or ipv6
functions. however, the pf state purge code can unlink a state from
the trees (ie, the state keys get removed) while the pfsync defer
code is holding a reference to it and expects to be able to send
the deferred packet in the future. we can test if the state keys
are set by checking if the timeout state is PFTM_UNLINK or not.
this currently relies on both pf_remove_state and pfsync_undefer_notify
being called with the NET_LOCK held. this probably needs to be
rethought later but is good enough for now.
found the hard way on a production firewall at work.
|
|
im going to make it so pf_purge_expired_states() can gather states
largely without sharing a lock with pfsync or actual packet processing
in pf. if pf or pfsync unlink a state while pf_purge_expired_states
is looking at it, we can race with some checks and fall over a
KASSERT.
i'm fixing this by having the caller of pf_state_expires read
state->timeout first, do it's checks, and then pass the value as
an argument into pf_state_expires. this means there's a consistent
view of the state->timeout variable across all the checks that
pf_purge_expired_states in particular does. if pf/pfsync does change
the timeout while pf_purge_expired_states is looking at it, the
worst thing that happens is that it doesn't get picked as a candidate
for purging in this pass and will have to wait for the next sweep.
ok sashan@ as part of a bigger diff
|
|
pfsync_free_deferral doesnt need to check pd_m for NULL before
calling m_freem because m_freem does that anyway.
if pf_setup_pdesc in pfsync_undefer_notify failed, the mbuf was
freed but the pd_m pointer was not cleared, which would have led
to a double free when pfsync_free_deferral tried to do the same
thing for it.
if pfsync_undefer is supposed to drop the mbuf, let pfsync_free_deferral
do it for us.
ok jmatthew@
|
|
working on a uint64_t is easier than remembering how timercmp and
timersub works.
ok jmatthew@
|
|
"that seems kind of important" jmatthew@
|
|
these functions were implemented in a bunch of places with comments
saying it should be moved to kern_tc.c when more pop up, and i was
about to add another one. i think it's time to move them to kern_tc.c.
ok cheloa@ jmatthew@
|
|
instead of having a timeout per deferred packet structure, use a
single timeout in pfsync that pulls items off the list of deferred
packets.
this avoids confusion about whether a timeout is handling the defer
or another context owns it. this way round, the context that removes
a defer from the list owns it and is responsible for completing it.
this should fix a panic we hit on the firewalls at work. there's
still another one that needs a fix, but sashan@ has been looking
at it. this might make it simpler to deal with though.
ok sashan@ jmatthew@
|
|
|
|
state in pfsync(4) queue. pfsync_q_ins() takes that race into account with one
exception: the KASSERT() at line 2352. That KASSERT() needs to be removed.
2346 void
2347 pfsync_q_ins(struct pf_state *st, int q)
2348 {
2349 struct pfsync_softc *sc = pfsyncif;
2350 size_t nlen, sc_len;
2351
2352 KASSERT(st->sync_state == PFSYNC_S_NONE);
2353
2354 #if defined(PFSYNC_DEBUG)
2355 if (sc->sc_len < PFSYNC_MINPKT)
2356 panic("pfsync pkt len is too low %zd", sc->sc_len);
2357 #endif
2358 do {
2359 mtx_enter(&sc->sc_mtx[q]);
2360
2361 /*
2362 * If two threads are competing to insert the same state, then
2363 * there must be just single winner.
2364 */
2365 if (st->sync_state != PFSYNC_S_NONE) {
2366 mtx_leave(&sc->sc_mtx[q]);
2367 break;
2368 }
OK bluhm@
|
|
|
|
|
|
this simplifies the code a little bit.
|
|
this avoids IFF_LINK1 getting set by another cpu halfway through
tpmr_input. if LINK1 is not set when a packet enters a tpmr pair
it skips ip/pf checks, but if it is then set then only pf is run
against it. this way you either get the ip checks and pf when the
packet enters and leaves tpmr, or you dont get the ip and pf checks
at all.
|
|
unlike bridge(4), these checks are only run when the packet is
entering the veb/tpmr topology. the assumption is that only valid
IP packets end up inside the topology so we don't have to check
them when they're leaving.
ok bluhm@ sashan@
|
|
this removes the duplication of the check code, and lets the v6
code in particular pick up a lot more sanity checks around valid
addresses on the wire.
ok bluhm@ sashan@
|
|
SS_CANTRCVMORE bits are set.
The first check required to prevent timeout_add(9) reschedule
`rop_timeout', otherwise timeout_del_barrier(9) can't help us.
The second check is for the case when shutdown(2) with SHUT_RD argument
occurred on this socket and we should not receive anything include
RTM_DESYNC packets.
ok claudio@
|
|
am i a pf hacker now?
|
|
OK mvs@
|
|
|
|
|
|
done because we have no cases where one thread should lock two sockets
simultaneously.
tested by yasuoka@
ok bluhm@ markus@
|
|
when a divert-to rule applies to a packet, pf doesnt take the packet
away and shove it in the socket directly. pf marks the packet, and
then ip (or ipv6) input processing looks at the mark and picks the
local socket to queue it on. because tpmr operates at layer 2, ip
input has no chance to look at the packet and let the divert socket
steal it.
bridge(4) and now veb(4) handle this by checking if the packet has
the pf divert to mark set on it and calls ip input if it's set.
this copies the semantic to tpmr.
|
|
when a divert-to rule applies to a packet, pf doesnt take the packet
away and shove it in the socket directly. pf marks the packet, and
then ip (or ipv6) input processing looks at the mark and picks the
local socket to queue it on. because veb operates at layer 2, ip
input processing only occurred if the packet was destined to go
into a vport interface.
bridge(4) handles this by checking if the packet has the pf divert
to mark set on it and calls ip input if it's set. this copies the
semantic to veb.
this allows divert-to to steal (take?) packets going over a veb and
process them on a local socket.
reported by ajacatot@
|
|
are constant. Having more const makes MP review easier. More
pointers are mapped read-only in the kernel image.
OK deraadt@ mvs@
|
|
constant. Put global variable declarations into header file.
OK mvs@ mpi@
|
|
the bug has been reported by Sebastien and Olivier Cherrier.
it has turned out the pf_state_key_link_reverse() does not
grab enough references when both state keys (sk and skrev)
are identical. This makes pf to trip assert later, when
references are being dropped:
panic(ffffffff81dfbc8e) at panic+0x11d
__assert(ffffffff81e64b54,ffffffff81e0a6ee,33a,ffffffff81e03b7f)
refcnt_rele(fffffd810bf02458) at refcnt_rele+0x6f
pf_state_key_unref(fffffd810bf023f0) at pf_state_key_unref+0x21
pf_remove_state(fffffd810c0c4578) at pf_remove_state+0x1fa
pf_purge_expired_states(2) at pf_purge_expired_states+0x232
pf_purge(ffffffff82236a30) at pf_purge+0x33
taskq_thread(ffff800000032080) at taskq_thread+0x81
fixed tested by Olivier Cherrier and semarie@
OK semarie@
|
|
|
|
Additionally make the values tuneable via sysctl.
OK deraadt@ mvs@
|
|
|
|
`rtp_list' so it could be re-added by concurrent thread. Also
timeout_del(9) doesn't wait timeout proc to be finished and
timeout_del_barrier(9) should be used for that.
So use timeout_del_barrier(9) instead of timeout_del(9) and moved it
just after refcnt_finalize(9). This fixes potential use-after-free
issue in route_detach().
ok mpi@
|
|
PACKET_TAG_IPSEC_FLOWINFO to specify the IPsec flow.
ok mvs
|
|
here since it's declared in net/pfkeyv2.h.
Also kill unused RETURN_EINVAL() macro.
ok mpi@
|
|
in runtime within pfkeyv2_send(). Also set it's interrupt protection
level to IPL_SOFTNET.
ok bluhm@ mpi@
|
|
ok bluhm@ mpi@
|
|
same was done for route_attach().
Also do soisconnected() after `kp' is fully initialized. This chair
movement affects nothing for PF_KEY sockets but makes code more
consistent.
ok bluhm@ mpi@
|
|
(PF_ROUTE) sockets. This can be done because we have no cases where one
thread should lock two sockets simultaneously.
Against the previous version rtm_senddesync_timer() execution was moved
to process context.
Also this time `so_lock' used for routing sockets only but in the future
it will be used to other socket types too.
tested by claudio@
ok claudio@ bluhm@
|
|
we need to adjust assertions. at time we call pf_state_key_link_reverse()
is state_key either linked to correct reverse peer or not linked at all.
The pf_state_key_link_reverse() is being called as a reader ons tate_lock.
There might be more packets, which try to update the state key.
OK bluhm@
|
|
(PF_ROUTE) sockets. There is a locking issue with timeouts that needs
to be fixed.
Requested by deraadt@
|
|
(PF_ROUTE) sockets. This can be done because we have no cases where one
thread should lock two sockets simultaneously.
Also this time `so_lock 'used for routing sockets only but in the future
it will be used to other socket types too.
ok bluhm@
|
|
without this, something using a kevent to monitor a bpf fd on an
idle interface never has the event fire, which means it never
realises the interface goes away. with this, the read event goes
off and the next read fails with EIO, like pretty much every other
driver when the underlying device is removed.
ok claudio@ visa@ jmatthew@
|
|
when a divert socket gets a packet from userland to send back through
the kernel, it marks it as diverted so pf knows not to divert it
back to userland again. this marking stuck to the packet though,
so if it went through pf again (eg, on the way out of the network
stack) pf would skip it again. this is undesirable if you want pf
to do things to the packet on this outgoing hope, such as nat.
this has pf clear the mark once it's been used, which allows the
next run of a packet through pf to have stuff work on it.
found by some people at parta networks.
ok sashan@ lteo@ bluhm@
bluhm@ also suggested keeping my diff in the same style as the rest of pf.c
|