path: root/sys/net
Age | Commit message | Author
2021-07-07 | tell ether_input() to call pf_test() outside of smr_read sections, | Alexandr Nedvedicky
because smr_read sections don't play well with sleeping locks in pf(4). OK bluhm@
2021-07-07 | pfsync_undefer() must be called outside of PF_LOCK | Alexandr Nedvedicky
OK @bluhm
2021-07-05 | Export tdb MTU to userland via SADB_GET. This helps debug path MTU | tobhe
discovery issues with ESP in UDP. ok bluhm@ sthen@ mpi@
2021-07-05 | etherbridge_map was way too clever, so simplify it. | David Gwynne
the code tried to carry state from the quick smr based lookup through to the actual map update under the mutex, but this led to refcnt leaks and logic errors. the simplification is that if the smr based checks say the map needs updating, we prepare the update, then forget what we learnt inside the smr critical section and redo the checks under the mutex.

entries in an etherbridge map are either in it or they aren't, so we don't need to refcnt them. this means the thing that takes an entry out of the map becomes directly responsible for destroying it, so it can do the smr call or barrier directly rather than via a refcnt.

found by hrvoje popovski while testing the stack running in parallel, and fix tested by him too. ok sashan@
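The check-then-redo-under-the-mutex pattern described above can be sketched in userland C. This is a hypothetical one-slot "map", not etherbridge code. A C11 atomic pointer stands in for the SMR-protected lookup and an atomic_flag spinlock stands in for the kernel mutex. Entries that are taken out go on a retired list, which `map_reclaim` frees as a stand-in for smr_call/smr_barrier. All names here are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical one-slot map standing in for an etherbridge table entry. */
struct entry {
    int port;
    struct entry *retired_next;
};

struct map {
    _Atomic(struct entry *) slot;   /* read without the lock (SMR stand-in) */
    atomic_flag mtx;                /* serialises updates (mutex stand-in) */
    struct entry *retired;          /* taken-out entries awaiting a barrier */
};

static void map_lock(struct map *m)
{
    while (atomic_flag_test_and_set_explicit(&m->mtx, memory_order_acquire))
        ;   /* spin */
}

static void map_unlock(struct map *m)
{
    atomic_flag_clear_explicit(&m->mtx, memory_order_release);
}

static void map_init(struct map *m)
{
    atomic_init(&m->slot, NULL);
    atomic_flag_clear(&m->mtx);
    m->retired = NULL;
}

/* Returns 1 if this call changed the map. */
static int map_update(struct map *m, int port)
{
    struct entry *cur, *ne;

    /* Quick lockless check: does the map need updating at all? */
    cur = atomic_load(&m->slot);
    if (cur != NULL && cur->port == port)
        return 0;

    /* Prepare the update before taking the lock. */
    ne = malloc(sizeof(*ne));
    if (ne == NULL)
        return 0;
    ne->port = port;
    ne->retired_next = NULL;

    map_lock(m);
    /* Forget the lockless result and redo the check under the lock. */
    cur = atomic_load(&m->slot);
    if (cur != NULL && cur->port == port) {
        map_unlock(m);
        free(ne);               /* another thread already made this change */
        return 0;
    }
    atomic_store(&m->slot, ne);
    if (cur != NULL) {
        /* Whoever takes an entry out owns it: no refcount needed. */
        cur->retired_next = m->retired;
        m->retired = cur;
    }
    map_unlock(m);
    return 1;
}

/* Stand-in for smr_barrier(): call once no lockless reader can hold old entries. */
static void map_reclaim(struct map *m)
{
    struct entry *e;

    map_lock(m);
    e = m->retired;
    m->retired = NULL;
    map_unlock(m);

    while (e != NULL) {
        struct entry *next = e->retired_next;
        free(e);
        e = next;
    }
}
```

The replaced entry is never freed inside `map_update`. A concurrent lockless reader may still be looking at it, which is exactly the job SMR does in the kernel.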
2021-06-30 | Remove splnet() from ifnewlladdr(), it is not needed anymore. | Alexander Bluhm
Add asserts and comments for the locks that are necessary. discussed with dlg@ mpi@ mvs@; tested by Hrvoje Popovski; OK mpi@
2021-06-25 | let pfsync_request_update actually retry when it overfills a packet. | David Gwynne
a continue in the middle of a do { } while (0) loop is effectively a break; it doesn't restart the loop. without the retry, the code leaked update messages, which in turn made pool_destroy in pfsync destroy trip over a KASSERT because items were still out. found by and fix tested by hrvoje popovski ok sashan@
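The `continue` pitfall described above is a general C rule. Inside `do { } while (0)`, `continue` jumps to the controlling expression, and that expression is 0, so the loop ends. Below is a minimal sketch of the pitfall, followed by one way to get a real retry. `attempts_needed` is a hypothetical stand-in for "the packet overfilled, flush and try again". This is not necessarily how the pfsync fix was written.

```c
#include <assert.h>

/*
 * Counts how many times a do { } while (0) body runs when it executes
 * `continue` on the first pass. `continue` jumps to the loop condition,
 * which is 0, so the loop ends: it acts like `break`, not a retry.
 */
static int do_while_zero_passes(void)
{
    int passes = 0;

    do {
        passes++;
        if (passes == 1)
            continue;   /* meant as "retry", but falls through to while (0) */
    } while (0);

    return passes;
}

/*
 * One way to get a real retry: make the loop condition say so.
 * `attempts_needed` is hypothetical.
 */
static int retry_loop_passes(int attempts_needed)
{
    int passes = 0;
    int retry;

    do {
        passes++;
        retry = passes < attempts_needed;
    } while (retry);

    return passes;
}
```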
2021-06-23 | rtsock: revert from timeout_set_flags(9) to timeout_set_proc(9); ok mvs@ | cheloha
2021-06-23 | augment the global pf state list with its own locks. | David Gwynne
before this, things that iterated over the global list of pf states had to take the net, pf, or pf state locks. in particular, the ioctls that dump the state table took the net and pf state locks before iterating over the states and using copyout to export them to userland. when we tried replacing the use of rwlocks with mutexes under the pf locks, this blew up because you can't sleep when holding a mutex and there's a sleeping lock used inside copyout.

this diff introduces two locks around the global state list: a mutex that protects the head and tail of the list, and an rwlock that protects the links between elements in the list. inserts on the state list only occur during packet handling and can be done by taking the mutex and putting the state on the tail before releasing the mutex. iterating over states is only done from thread/process contexts, so we can take a read lock, then the mutex to get a snapshot of the head and tail pointers, and then keep the read lock to iterate between the head and tail pointers. because it's a read lock we can then take other sleeping locks (eg, the one inside copyout) without (further) gymnastics. the pf state purge code takes the rwlock exclusively and the mutex to remove elements from the list.

this allows the ioctls and purge code to loop over the list concurrently and largely without blocking the creation of states when pf is processing packets. pfsync also iterates over the state list when doing bulk sends, which the state purge code needs to be careful around. ok sashan@
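The two-lock scheme above can be sketched with POSIX threads in userland. A `pthread_mutex_t` guards head and tail, and a `pthread_rwlock_t` guards the links. Inserts take only the mutex. A walker takes the read lock, snapshots head and tail under the mutex, and walks up to the snapshot tail. Purge takes the write lock and the mutex. The list and function names are hypothetical and this is not the pf implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct state {
    struct state *next;
    int id;
};

struct state_list {
    pthread_mutex_t mtx;    /* protects head and tail */
    pthread_rwlock_t rwl;   /* protects the next links between states */
    struct state *head;
    struct state *tail;
};

static void list_init(struct state_list *sl)
{
    pthread_mutex_init(&sl->mtx, NULL);
    pthread_rwlock_init(&sl->rwl, NULL);
    sl->head = sl->tail = NULL;
}

/* Packet path: append at the tail holding only the mutex. */
static void list_insert(struct state_list *sl, struct state *st)
{
    st->next = NULL;
    pthread_mutex_lock(&sl->mtx);
    if (sl->tail == NULL)
        sl->head = st;
    else
        sl->tail->next = st;
    sl->tail = st;
    pthread_mutex_unlock(&sl->mtx);
}

/*
 * Ioctl-style walk: hold the read lock, snapshot head and tail under the
 * mutex, then walk from head to the snapshot tail. The walk never reads
 * tail->next, which a concurrent insert may be writing.
 */
static int list_walk(struct state_list *sl)
{
    struct state *st, *head, *tail;
    int n = 0;

    pthread_rwlock_rdlock(&sl->rwl);
    pthread_mutex_lock(&sl->mtx);
    head = sl->head;
    tail = sl->tail;
    pthread_mutex_unlock(&sl->mtx);

    for (st = head; st != NULL; st = st->next) {
        n++;            /* sleeping work such as copyout could go here */
        if (st == tail)
            break;
    }
    pthread_rwlock_unlock(&sl->rwl);
    return n;
}

/* Purge path: exclusive rwlock plus the mutex to unlink a state. */
static void list_remove(struct state_list *sl, struct state *st)
{
    struct state **pp, *prev = NULL;

    pthread_rwlock_wrlock(&sl->rwl);
    pthread_mutex_lock(&sl->mtx);
    for (pp = &sl->head; *pp != NULL; prev = *pp, pp = &(*pp)->next) {
        if (*pp == st) {
            *pp = st->next;
            if (sl->tail == st)
                sl->tail = prev;
            break;
        }
    }
    pthread_mutex_unlock(&sl->mtx);
    pthread_rwlock_unlock(&sl->rwl);
}
```

Inserts only ever write the current tail's `next`. A walker stops at its snapshot tail, so it never touches a link that an insert can be changing.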
2021-06-23 | pf_purge_expired_states can check the time once instead of for every state. | David Gwynne
2021-06-23 | pfsync_undefer_notify needs to be careful before dereferencing state keys. | David Gwynne
pfsync_undefer_notify uses the state keys to look up the address family, which is used to figure out if it should call ipv4 or ipv6 functions. however, the pf state purge code can unlink a state from the trees (ie, the state keys get removed) while the pfsync defer code is holding a reference to it and expects to be able to send the deferred packet in the future. we can test if the state keys are set by checking if the timeout state is PFTM_UNLINK or not. this currently relies on both pf_remove_state and pfsync_undefer_notify being called with the NET_LOCK held. this probably needs to be rethought later but is good enough for now. found the hard way on a production firewall at work.
2021-06-23 | rework pf_state_expires to avoid confusion around state->timeout. | David Gwynne
i'm going to make it so pf_purge_expired_states() can gather states largely without sharing a lock with pfsync or actual packet processing in pf. if pf or pfsync unlink a state while pf_purge_expired_states is looking at it, we can race with some checks and fall over a KASSERT.

i'm fixing this by having the caller of pf_state_expires read state->timeout first, do its checks, and then pass the value as an argument into pf_state_expires. this means there's a consistent view of the state->timeout variable across all the checks that pf_purge_expired_states in particular does. if pf/pfsync does change the timeout while pf_purge_expired_states is looking at it, the worst thing that happens is that it doesn't get picked as a candidate for purging in this pass and will have to wait for the next sweep.

ok sashan@ as part of a bigger diff
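The "read state->timeout once and pass it in" idea above can be sketched as follows. The constant names PFTM_UNLINK and PFTM_PURGE are borrowed from the log and from pf. Their values, `timeout_secs`, and the struct layout are assumptions made for illustration. The caller takes one atomic snapshot, and every later check, including the expiry calculation, uses that snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Names borrowed from pf; the values here are assumed for this sketch. */
enum { PFTM_TCP_ESTABLISHED = 1, PFTM_PURGE = 2, PFTM_UNLINK = 3 };

struct state {
    _Atomic uint8_t timeout;   /* may be changed concurrently by pf/pfsync */
    uint64_t expire;           /* time of last activity, seconds */
};

/* Hypothetical per-class timeout table, in seconds. */
static uint64_t timeout_secs(uint8_t timeout)
{
    return timeout == PFTM_TCP_ESTABLISHED ? 86400 : 0;
}

/* Takes the timeout as an argument instead of re-reading st->timeout. */
static uint64_t state_expires(const struct state *st, uint8_t timeout)
{
    assert(timeout != PFTM_UNLINK);   /* holds: the caller checked its copy */
    return st->expire + timeout_secs(timeout);
}

/* Returns 1 if the state is a purge candidate in this pass. */
static int purge_candidate(struct state *st, uint64_t now)
{
    uint8_t timeout = atomic_load(&st->timeout);   /* read exactly once */

    if (timeout == PFTM_UNLINK)
        return 0;               /* being torn down elsewhere; next sweep */
    if (timeout == PFTM_PURGE)
        return 1;
    return state_expires(st, timeout) <= now;
}
```

The assertion can no longer fire because of a concurrent write. It checks the caller's local copy, not the shared field.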
2021-06-17 | more consistently use pfsync_free_deferral to free the mbuf. | David Gwynne
pfsync_free_deferral doesn't need to check pd_m for NULL before calling m_freem because m_freem does that anyway. if pf_setup_pdesc in pfsync_undefer_notify failed, the mbuf was freed but the pd_m pointer was not cleared, which would have led to a double free when pfsync_free_deferral tried to do the same thing for it. if pfsync_undefer is supposed to drop the mbuf, let pfsync_free_deferral do it for us. ok jmatthew@
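Here is a minimal userland sketch of the "one function frees the packet and clears the pointer" rule above. `free()` stands in for `m_freem()`; both accept NULL. `struct deferral`, `undefer`, and `send_packet` are hypothetical. The failure path hands the record to `free_deferral` instead of freeing `pd_m` itself, so a later cleanup cannot free it twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical deferral record; pd_m stands in for the deferred mbuf. */
struct deferral {
    void *pd_m;
};

/* Stand-in for the output path, which consumes the packet. */
static void send_packet(void *m)
{
    free(m);
}

/*
 * The single place that drops the packet. free(NULL) is a no-op, as
 * m_freem(NULL) is, so no NULL check is needed before the call.
 */
static void free_deferral(struct deferral *pd)
{
    free(pd->pd_m);
    pd->pd_m = NULL;    /* a repeated call cannot double free */
}

/* Returns 0 if the packet was sent, -1 if it was dropped. */
static int undefer(struct deferral *pd, int setup_ok)
{
    void *m;

    if (!setup_ok) {
        free_deferral(pd);  /* don't free pd_m here and leave it dangling */
        return -1;
    }
    m = pd->pd_m;
    pd->pd_m = NULL;        /* ownership moves to the output path */
    send_packet(m);
    return 0;
}
```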
2021-06-15 | use getnsecuptime instead of getmicrouptime. | David Gwynne
working on a uint64_t is easier than remembering how timercmp and timersub work. ok jmatthew@
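To illustrate why a single uint64_t nanosecond count is easier to work with than timeval pairs, here is a userland analogue built on `clock_gettime(CLOCK_MONOTONIC)`. `nsec_uptime` and `deadline_passed` are hypothetical names; this is not the kernel's getnsecuptime. It also fetches the current time before comparing to it, as the following commit insists.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userland analogue of a nanosecond uptime counter. */
static uint64_t nsec_uptime(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* With plain integers, "has the deadline passed?" is one comparison. */
static int deadline_passed(uint64_t deadline, uint64_t now)
{
    return now >= deadline;
}
```

A deadline is then just `nsec_uptime() + interval_ns`. There is no carry between seconds and microseconds to handle, unlike timeradd or timersub.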
2021-06-15 | get the uptime before comparing to it. | David Gwynne
"that seems kind of important" jmatthew@
2021-06-15 | factor out nsecuptime and getnsecuptime. | David Gwynne
these functions were implemented in a bunch of places with comments saying it should be moved to kern_tc.c when more pop up, and i was about to add another one. i think it's time to move them to kern_tc.c. ok cheloa@ jmatthew@
2021-06-15 | rework pfsync deferral timeout handling. | David Gwynne
instead of having a timeout per deferred packet structure, use a single timeout in pfsync that pulls items off the list of deferred packets. this avoids confusion about whether a timeout is handling the defer or another context owns it. this way round, the context that removes a defer from the list owns it and is responsible for completing it. this should fix a panic we hit on the firewalls at work. there's still another one that needs a fix, but sashan@ has been looking at it. this might make it simpler to deal with though. ok sashan@ jmatthew@
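The ownership rule above ("the context that removes a defer from the list owns it") can be sketched with one list and one lock. The single timeout handler pops expired entries off the head. Any other path that wants to complete a deferral must first unlink it, and only the context that succeeds owns it. All names are hypothetical, and an atomic_flag spinlock stands in for the kernel mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deferred-packet record kept on a single list. */
struct defer {
    struct defer *next;
    uint64_t deadline;     /* when the single timeout should complete it */
};

struct defer_list {
    atomic_flag lock;      /* stand-in for a kernel mutex */
    struct defer *head;    /* oldest first, so deadlines ascend */
};

static void dl_lock(struct defer_list *dl)
{
    while (atomic_flag_test_and_set_explicit(&dl->lock, memory_order_acquire))
        ;   /* spin */
}

static void dl_unlock(struct defer_list *dl)
{
    atomic_flag_clear_explicit(&dl->lock, memory_order_release);
}

/*
 * Unlink pd if it is still listed. Returns 1 if this caller removed it
 * (and so owns and must complete it), 0 if another context got there first.
 */
static int defer_take(struct defer_list *dl, struct defer *pd)
{
    struct defer **pp;
    int taken = 0;

    dl_lock(dl);
    for (pp = &dl->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == pd) {
            *pp = pd->next;
            taken = 1;
            break;
        }
    }
    dl_unlock(dl);
    return taken;
}

/* The single timeout: pop expired entries. Returns how many it now owns. */
static int defer_timeout(struct defer_list *dl, uint64_t now)
{
    int completed = 0;

    for (;;) {
        struct defer *pd;

        dl_lock(dl);
        pd = dl->head;
        if (pd == NULL || pd->deadline > now) {
            dl_unlock(dl);
            break;
        }
        dl->head = pd->next;
        dl_unlock(dl);

        completed++;       /* this context owns pd: send and free it here */
    }
    return completed;
}
```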
2021-06-09 | whitespace tweak. no functional change. | David Gwynne
2021-06-02 | With parallel execution of pf_test() two packets may try to update the same | Alexandr Nedvedicky
state in the pfsync(4) queue. pfsync_q_ins() takes that race into account with one exception: the KASSERT() at line 2352. That KASSERT() needs to be removed.

    2346 void
    2347 pfsync_q_ins(struct pf_state *st, int q)
    2348 {
    2349         struct pfsync_softc *sc = pfsyncif;
    2350         size_t nlen, sc_len;
    2351
    2352         KASSERT(st->sync_state == PFSYNC_S_NONE);
    2353
    2354 #if defined(PFSYNC_DEBUG)
    2355         if (sc->sc_len < PFSYNC_MINPKT)
    2356                 panic("pfsync pkt len is too low %zd", sc->sc_len);
    2357 #endif
    2358         do {
    2359                 mtx_enter(&sc->sc_mtx[q]);
    2360
    2361                 /*
    2362                  * If two threads are competing to insert the same state, then
    2363                  * there must be just single winner.
    2364                  */
    2365                 if (st->sync_state != PFSYNC_S_NONE) {
    2366                         mtx_leave(&sc->sc_mtx[q]);
    2367                         break;
    2368                 }

OK bluhm@
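The "single winner" check shown in the excerpt above can be reduced to a small userland sketch. The state is tested only while the queue lock is held. The loser simply backs off, so an unlocked assertion that the state is still unqueued has no place before the lock. The names borrow from the log, but the constant values, `struct pstate`, and `struct syncq` are assumptions, and an atomic_flag spinlock stands in for `sc_mtx[q]`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Names borrowed from the log; the values are assumed for this sketch. */
enum { PFSYNC_S_NONE = 0xff, PFSYNC_S_UPD = 1 };

struct pstate {
    int sync_state;         /* which pfsync queue holds the state, if any */
};

struct syncq {
    atomic_flag mtx;        /* stand-in for sc->sc_mtx[q] */
    int len;
};

/*
 * Returns 1 if this caller queued the state, 0 if a racing caller already
 * had. sync_state is only examined under the lock.
 */
static int q_ins(struct syncq *q, struct pstate *st, int which)
{
    while (atomic_flag_test_and_set_explicit(&q->mtx, memory_order_acquire))
        ;   /* spin */

    if (st->sync_state != PFSYNC_S_NONE) {
        atomic_flag_clear_explicit(&q->mtx, memory_order_release);
        return 0;           /* the single winner already queued it */
    }
    st->sync_state = which;
    q->len++;
    atomic_flag_clear_explicit(&q->mtx, memory_order_release);
    return 1;
}
```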
2021-06-02 | whitespace tweaks, no functional change. | David Gwynne
2021-06-02 | only read the if_bpf pointer once. | David Gwynne
2021-06-02 | tpmr_input is called in an smr crit section, so it doesn't need its own. | David Gwynne
this simplifies the code a little bit.
2021-06-02 | read the tpmr if_flags once in tpmr_input so link flags apply consistently. | David Gwynne
this avoids IFF_LINK1 getting set by another cpu halfway through tpmr_input. if LINK1 is not set when a packet enters a tpmr pair, it skips the ip/pf checks, but if LINK1 then gets set before the packet leaves, only pf is run against it. this way you either get the ip checks and pf when the packet enters and leaves tpmr, or you don't get the ip and pf checks at all.
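Reading the flags once, as described above, means every decision for one packet uses the same snapshot. In this sketch, `midway` simulates another CPU setting LINK1 while the packet is in flight. With the snapshot, that change cannot produce the inconsistent "pf on the way out only" outcome. `struct tpmr`, `tpmr_process`, the `CHECK_*` bits, and the `IFF_LINK1` value are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define IFF_LINK1 0x2000u   /* flag value assumed for this sketch */

enum { CHECK_IP = 1, CHECK_PF_IN = 2, CHECK_PF_OUT = 4 };

struct tpmr {
    _Atomic unsigned int if_flags;   /* other CPUs may change this */
};

/* Simulates another CPU setting LINK1 while a packet is in flight. */
static void set_link1(struct tpmr *sc)
{
    atomic_fetch_or(&sc->if_flags, IFF_LINK1);
}

/*
 * Returns which checks ran for one packet. The flags are read once, so
 * the entering and leaving decisions always agree.
 */
static int tpmr_process(struct tpmr *sc, void (*midway)(struct tpmr *))
{
    unsigned int flags = atomic_load(&sc->if_flags);
    int ran = 0;

    if (flags & IFF_LINK1)
        ran |= CHECK_IP | CHECK_PF_IN;     /* packet entering the pair */

    if (midway != NULL)
        midway(sc);                        /* concurrent change, not seen */

    if (flags & IFF_LINK1)
        ran |= CHECK_PF_OUT;               /* packet leaving the pair */

    return ran;
}
```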
2021-06-02 | use ipv4_check and ipv6_check to, well, check ip headers before running pf. | David Gwynne
unlike bridge(4), these checks are only run when the packet is entering the veb/tpmr topology. the assumption is that only valid IP packets end up inside the topology so we don't have to check them when they're leaving. ok bluhm@ sashan@
2021-06-02 | use ipv4_check and ipv6_check provided by the network stacks. | David Gwynne
this removes the duplication of the check code, and lets the v6 code in particular pick up a lot more sanity checks around valid addresses on the wire. ok bluhm@ sashan@
2021-06-01 | Check `so_state' in rtm_senddesync() and return if SS_ISCONNECTED or | mvs
SS_CANTRCVMORE bits are set. The first check is required to prevent timeout_add(9) from rescheduling `rop_timeout', otherwise timeout_del_barrier(9) can't help us. The second check is for the case when shutdown(2) with the SHUT_RD argument occurred on this socket and we should not receive anything, including RTM_DESYNC packets. ok claudio@
2021-06-01 | a couple of minor whitespace tweaks. no functional change. | David Gwynne
am i a pf hacker now?
2021-05-30 | Declare all struct protosw as constant. | Alexander Bluhm
OK mvs@
2021-05-27 | ajacoutot says i missed copying some bits from bridge for divert-to. | David Gwynne
2021-05-27 | ajacouto says i missed copying some bits from bridge for divert-to. | David Gwynne
2021-05-26 | Use `so_lock' to protect key management (PF_KEY) sockets. This can be | mvs
done because we have no cases where one thread should lock two sockets simultaneously. tested by yasuoka@ ok bluhm@ markus@
2021-05-26 | add support for pf divert-to on tpmr, like what was done for veb(4). | David Gwynne
when a divert-to rule applies to a packet, pf doesn't take the packet away and shove it in the socket directly. pf marks the packet, and then ip (or ipv6) input processing looks at the mark and picks the local socket to queue it on. because tpmr operates at layer 2, ip input has no chance to look at the packet and let the divert socket steal it. bridge(4) and now veb(4) handle this by checking if the packet has the pf divert-to mark set on it and calling ip input if it's set. this copies the semantic to tpmr.
2021-05-26 | support divert-to when pf applies it to a packet. | David Gwynne
when a divert-to rule applies to a packet, pf doesn't take the packet away and shove it in the socket directly. pf marks the packet, and then ip (or ipv6) input processing looks at the mark and picks the local socket to queue it on. because veb operates at layer 2, ip input processing only occurred if the packet was destined to go into a vport interface. bridge(4) handles this by checking if the packet has the pf divert-to mark set on it and calling ip input if it's set. this copies the semantic to veb. this allows divert-to to steal (take?) packets going over a veb and process them on a local socket. reported by ajacatot@
2021-05-25 | As network features are not added dynamically, the domain structures | Alexander Bluhm
are constant. Having more const makes MP review easier. More pointers are mapped read-only in the kernel image. OK deraadt@ mvs@
2021-05-25 | The arrays sadb_exts_allowed_out and sadb_exts_required_out are | Alexander Bluhm
constant. Put global variable declarations into header file. OK mvs@ mpi@
2021-05-17 | fix state key reference underflow, when sk == skrev | Alexandr Nedvedicky
the bug has been reported by Sebastien and Olivier Cherrier. it turned out that pf_state_key_link_reverse() does not grab enough references when both state keys (sk and skrev) are identical. This makes pf trip an assert later, when references are being dropped:

    panic(ffffffff81dfbc8e) at panic+0x11d
    __assert(ffffffff81e64b54,ffffffff81e0a6ee,33a,ffffffff81e03b7f)
    refcnt_rele(fffffd810bf02458) at refcnt_rele+0x6f
    pf_state_key_unref(fffffd810bf023f0) at pf_state_key_unref+0x21
    pf_remove_state(fffffd810c0c4578) at pf_remove_state+0x1fa
    pf_purge_expired_states(2) at pf_purge_expired_states+0x232
    pf_purge(ffffffff82236a30) at pf_purge+0x33
    taskq_thread(ffff800000032080) at taskq_thread+0x81

fix tested by Olivier Cherrier and semarie@ OK semarie@
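The reference-counting rule behind this fix can be sketched as "each direction of the reverse link owns one reference". The sketch assumes, for illustration, that unlinking always drops two references (one per direction). Under that assumption, linking must take two references even when `sk == skrev`. Skipping the second reference in that case would leave the key one short, and the final unref would underflow as in the backtrace. This is a hypothetical model, not the pf code.

```c
#include <assert.h>
#include <stddef.h>

struct state_key {
    int refcnt;
    struct state_key *reverse;
};

static void sk_ref(struct state_key *sk)
{
    sk->refcnt++;
}

static void sk_unref(struct state_key *sk)
{
    assert(sk->refcnt > 0);     /* the underflow the backtrace shows */
    sk->refcnt--;
}

/*
 * Each direction of the link owns one reference. When sk == skrev both
 * references land on the same key, and both must still be taken,
 * because unlinking drops two.
 */
static void link_reverse(struct state_key *sk, struct state_key *skrev)
{
    sk->reverse = skrev;
    sk_ref(skrev);              /* held by sk->reverse */
    skrev->reverse = sk;
    sk_ref(sk);                 /* held by skrev->reverse */
}

static void unlink_reverse(struct state_key *sk)
{
    struct state_key *skrev = sk->reverse;

    if (skrev == NULL)
        return;
    sk->reverse = NULL;
    skrev->reverse = NULL;
    sk_unref(skrev);
    sk_unref(sk);
}
```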
2021-05-17 | Revert. Last change should not have been committed. | Claudio Jeker
2021-05-17 | Increase the default buffer space used on PF_UNIX sockets to 8k. | Claudio Jeker
Additionally make the values tuneable via sysctl. OK deraadt@ mvs@
2021-05-16 | panic does not require a \n at the end. When one is provided, it looks wrong. | Theo de Raadt
2021-05-16 | In route detach we delete `rop_timeout' while `rop' is still linked to | mvs
`rtp_list', so it could be re-added by a concurrent thread. Also, timeout_del(9) doesn't wait for the timeout proc to finish; timeout_del_barrier(9) should be used for that. So use timeout_del_barrier(9) instead of timeout_del(9) and move it to just after refcnt_finalize(9). This fixes a potential use-after-free issue in route_detach(). ok mpi@
2021-05-15 | Fix IPsec NAT-T to work with pipex(4). Introduce a new packet tag | YASUOKA Masahiko
PACKET_TAG_IPSEC_FLOWINFO to specify the IPsec flow. ok mvs
2021-05-06 | Kill pfkeyv2_parsemessage() declaration which is absolutely useless | mvs
here since it's declared in net/pfkeyv2.h. Also kill unused RETURN_EINVAL() macro. ok mpi@
2021-05-04 | Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that | mvs
in runtime within pfkeyv2_send(). Also set its interrupt protection level to IPL_SOFTNET. ok bluhm@ mpi@
2021-05-04 | Remove unused `spd_tables' declaration. | mvs
ok bluhm@ mpi@
2021-05-02 | Do soreserve() before `kp' allocation. This simplifies error path. The | mvs
same was done for route_attach(). Also do soisconnected() after `kp' is fully initialized. This chair movement affects nothing for PF_KEY sockets but makes code more consistent. ok bluhm@ mpi@
2021-05-01 | Implement per-socket `so_lock' rwlock(9) and use it to protect routing | mvs
(PF_ROUTE) sockets. This can be done because we have no cases where one thread should lock two sockets simultaneously. Against the previous version rtm_senddesync_timer() execution was moved to process context. Also this time `so_lock' used for routing sockets only but in the future it will be used to other socket types too. tested by claudio@ ok claudio@ bluhm@
2021-04-27 | pf_state_key_link_reverse() is prone to race on parallel forwarding | Alexandr Nedvedicky
we need to adjust assertions. at the time we call pf_state_key_link_reverse(), state_key is either linked to the correct reverse peer or not linked at all. pf_state_key_link_reverse() is called as a reader on state_lock, so there might be more packets trying to update the state key. OK bluhm@
2021-04-26 | Revert per-socket `so_lock' rwlock(9) and use it to protect routing | Claudio Jeker
(PF_ROUTE) sockets. There is a locking issue with timeouts that needs to be fixed. Requested by deraadt@
2021-04-25 | Implement per-socket `so_lock' rwlock(9) and use it to protect routing | mvs
(PF_ROUTE) sockets. This can be done because we have no cases where one thread should lock two sockets simultaneously. Also, this time `so_lock' is used for routing sockets only, but in the future it will be used for other socket types too. ok bluhm@
2021-04-23 | call klist_invalidate from bpfsdetach to tell kq listeners what happened. | David Gwynne
without this, something using a kevent to monitor a bpf fd on an idle interface never has the event fire, which means it never realises the interface goes away. with this, the read event goes off and the next read fails with EIO, like pretty much every other driver when the underlying device is removed. ok claudio@ visa@ jmatthew@
2021-04-23 | only skip pf once for packets that are injected by a divert-packet socket. | David Gwynne
when a divert socket gets a packet from userland to send back through the kernel, it marks it as diverted so pf knows not to divert it back to userland again. this marking stuck to the packet though, so if it went through pf again (eg, on the way out of the network stack) pf would skip it again. this is undesirable if you want pf to do things to the packet on this outgoing hop, such as nat. this has pf clear the mark once it's been used, which allows the next run of a packet through pf to have stuff work on it. found by some people at parta networks. ok sashan@ lteo@ bluhm@ bluhm@ also suggested keeping my diff in the same style as the rest of pf.c
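The "skip pf once, then clear the mark" behaviour above can be sketched as follows. The packet's diverted mark is consumed by the first pass through the filter, so a later pass, such as the outbound one where nat applies, is filtered normally. `DIVERTED_MARK`, `struct pkt`, and `pf_pass` are hypothetical names, not pf's actual tag or entry point.

```c
#include <assert.h>

#define DIVERTED_MARK 0x1u   /* hypothetical flag bit on the packet */

enum { PF_SKIPPED, PF_FILTERED };

struct pkt {
    unsigned int flags;
};

/*
 * Hypothetical pf entry point. A packet re-injected by a divert socket is
 * skipped exactly once; clearing the mark lets the next pass be filtered.
 */
static int pf_pass(struct pkt *m)
{
    if (m->flags & DIVERTED_MARK) {
        m->flags &= ~DIVERTED_MARK;     /* the mark is single-use */
        return PF_SKIPPED;
    }
    return PF_FILTERED;                 /* rules, including nat, apply */
}
```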