path: root/sys/net/if_pfsync.c
2022-04-21  Alexandr Nedvedicky
    Introduce dedicated link entries for snapshots in pfsync(4). The
    purpose of snapshots is to allow pfsync(4) to move items from global
    lists to local lists (a.k.a. snapshots) under mutex protection.
    Snapshots are then processed without holding any mutexes. That idea
    does not fly well if the same link entry is used for the global lists
    as well as for the snapshots. Feedback by bluhm@. Credit also goes to
    hrvoje@ for extensive testing. OK bluhm@
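    A minimal sketch of the pattern this enables; the struct, queue and
    link-entry names below are illustrative, not the actual pfsync(4)
    code. The point is that an item needs its own snapshot link so it can
    sit on the local list while it may be re-queued on the global one:

        #include <sys/queue.h>
        #include <sys/mutex.h>

        struct item {
            TAILQ_ENTRY(item)   i_list;   /* link while on the global list */
            TAILQ_ENTRY(item)   i_snap;   /* dedicated link for snapshots */
        };
        TAILQ_HEAD(item_queue, item);

        void
        snapshot_take(struct mutex *mtx, struct item_queue *global,
            struct item_queue *snap)
        {
            struct item *it;

            TAILQ_INIT(snap);
            mtx_enter(mtx);
            while ((it = TAILQ_FIRST(global)) != NULL) {
                TAILQ_REMOVE(global, it, i_list);
                TAILQ_INSERT_TAIL(snap, it, i_snap);
            }
            mtx_leave(mtx);
            /* snap can now be processed without holding the mutex */
        }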
2022-04-20  Alexander Bluhm
    In pfsync there were some KASSERTs hidden behind #ifdef PFSYNC_DEBUG.
    Assertions should be active and rely on #ifdef DIAGNOSTIC. Retire
    PFSYNC_DEBUG. OK sashan@ dlg@
2022-04-14  Alexander Bluhm
    pf and pfsync are running without the kernel lock, so the mutexes must
    have at least mpfloor spl protection. Fix a witness issue found by
    Hrvoje where pfsync holds a mutex and an interrupt grabs the kernel
    lock. OK sashan@
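    The fix amounts to initialising such mutexes at IPL_MPFLOOR; a minimal
    sketch (the mutex itself is illustrative):

        #include <sys/mutex.h>

        struct mutex example_mtx;

        void
        example_init(void)
        {
            /*
             * Without the kernel lock, a mutex taken at too low an spl
             * can be preempted by an interrupt that grabs the kernel
             * lock, which witness flags. IPL_MPFLOOR is the minimum
             * interrupt protection for such mutexes.
             */
            mtx_init(&example_mtx, IPL_MPFLOOR);
        }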
2022-04-07  Alexander Bluhm
    In pfsync(4), sc_len is modified with atomic operations except in one
    place. Use atomic_add_long() there. Call the local variable sclen,
    which makes it easier to grep for the struct field sc_len. OK sashan@
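    A sketch of the discipline: sc_len is an unsigned long updated
    concurrently, so every modification must go through an atomic op
    (the wrapper below is illustrative):

        #include <sys/atomic.h>

        /* account for nlen more pending bytes, safely under concurrency */
        static inline void
        sclen_add(volatile unsigned long *sc_len, unsigned long nlen)
        {
            atomic_add_long(sc_len, nlen);
        }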
2022-03-08  Alexandr Nedvedicky
    Merge the iack_mtx, upd_c_mtx, del_mtx, ins_mtx and upd_mtx mutexes
    into a single mutex, st_mtx. This simplifies pf(4) state handling in
    pfsync(4) and also makes it more reliable. OK bluhm@
2022-02-25  Alexander Bluhm
    To fix crashes seen by Hrvoje with pfsync, IPsec and parallel
    forwarding, protect tdb flags and lists in pfsync with a mutex.
    Help and OK sashan@
2021-11-25  Alexander Bluhm
    Implement reference counting for IPsec tdbs. Not all cases are covered
    yet, more ref counts to come. The timeouts are protected, so the racy
    tdb_reaper() gets retired. The tdb_policy_head, onext and inext lists
    are protected. All gettdb...() functions return a tdb that is ref
    counted and has to be unrefed later. A flag ensures that tdb_delete()
    is called only once. Tested by Hrvoje Popovski; OK sthen@ mvs@ tobhe@
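    The resulting lookup/use/unref cycle, sketched; the exact gettdb()
    parameter list here is approximate, not a verified prototype:

        #include <netinet/ip_ipsp.h>

        void
        example_tdb_use(u_int rdomain, u_int32_t spi,
            union sockaddr_union *dst, u_int8_t proto)
        {
            struct tdb *tdb;

            tdb = gettdb(rdomain, spi, dst, proto); /* returns a reference */
            if (tdb == NULL)
                return;
            /* ... use the tdb ... */
            tdb_unref(tdb);                         /* release when done */
        }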
2021-11-11  Alexandr Nedvedicky
    Allow pfi_kif_get() callers to pre-allocate a buffer for the new kif.
    If the kif object already exists, the caller must free the
    pre-allocated buffer. If the caller does not pre-allocate a buffer,
    pfi_kif_get() will get memory from the pool using the M_NOWAIT flag.
    The commit also polishes pfi_initialize() a bit so it uses a M_WAITOK
    allocation for pfi_all; there is no change in current behaviour.
    Feedback by bluhm@. OK bluhm@
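    A hypothetical caller sketch of the pattern; the prototype, the
    ownership rule and the PFI_MTYPE malloc type follow the commit message
    rather than a verified header:

        struct pfi_kif *kif, *prealloc;

        /* sleeping allocation is fine here, so the lookup cannot fail */
        prealloc = malloc(sizeof(*prealloc), PFI_MTYPE, M_WAITOK | M_ZERO);
        kif = pfi_kif_get(ifname, &prealloc);
        if (kif != prealloc) {
            /* kif existed already: caller frees the spare buffer */
            free(prealloc, PFI_MTYPE, sizeof(*prealloc));
        }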
2021-07-07  Alexandr Nedvedicky
    pfsync_undefer() must be called outside of PF_LOCK.
    OK bluhm@
2021-06-25  David Gwynne
    Let pfsync_request_update actually retry when it overfills a packet.
    A continue in the middle of a do { } while (0) loop is effectively a
    break; it doesn't restart the loop. Without the retry, the code leaked
    update messages, which in turn made pool_destroy() during pfsync
    destroy trip over a KASSERT because items were still out. Found by and
    fix tested by hrvoje popovski. OK sashan@
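    The pitfall, distilled with hypothetical helpers (packet_is_full,
    flush_packet, append_update stand in for the real logic):

        int  packet_is_full(void);
        void flush_packet(void);
        void append_update(void);

        /* BROKEN: "continue" jumps to the while (0) test, ending the loop */
        void
        broken(void)
        {
            do {
                if (packet_is_full()) {
                    flush_packet();
                    continue;   /* behaves exactly like "break" here */
                }
                append_update();
            } while (0);
        }

        /* FIXED: an explicit loop makes "continue" genuinely retry */
        void
        fixed(void)
        {
            for (;;) {
                if (packet_is_full()) {
                    flush_packet();
                    continue;   /* re-checks after flushing */
                }
                append_update();
                break;
            }
        }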
2021-06-23  David Gwynne
    Augment the global pf state list with its own locks.

    Before this, things that iterated over the global list of pf states
    had to take the net, pf, or pf state locks. In particular, the ioctls
    that dump the state table took the net and pf state locks before
    iterating over the states and using copyout to export them to
    userland. When we tried replacing the use of rwlocks with mutexes
    under the pf locks, this blew up because you can't sleep when holding
    a mutex and there's a sleeping lock used inside copyout.

    This diff introduces two locks around the global state list: a mutex
    that protects the head and tail of the list, and an rwlock that
    protects the links between elements in the list. Inserts on the state
    list only occur during packet handling and can be done by taking the
    mutex and putting the state on the tail before releasing the mutex.
    Iterating over states is only done from thread/process contexts, so we
    can take a read lock, then the mutex to get a snapshot of the head and
    tail pointers, and then keep the read lock to iterate between the head
    and tail pointers. Because it's a read lock, we can then take other
    sleeping locks (eg, the one inside copyout) without (further)
    gymnastics.

    The pf state purge code takes the rwlock exclusively and the mutex to
    remove elements from the list. This allows the ioctls and purge code
    to loop over the list concurrently and largely without blocking the
    creation of states when pf is processing packets. pfsync also iterates
    over the state list when doing bulk sends, which the state purge code
    needs to be careful around. OK sashan@
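    A sketch of the reader side described above; the lock and member names
    follow the design in the message and should be treated as
    illustrative:

        void
        example_dump_states(void)
        {
            struct pf_state *st, *head, *tail, *next;

            rw_enter_read(&pf_state_list.pfs_rwl);  /* inter-element links */

            mtx_enter(&pf_state_list.pfs_mtx);      /* head and tail */
            head = TAILQ_FIRST(&pf_state_list.pfs_list);
            tail = TAILQ_LAST(&pf_state_list.pfs_list, pf_state_queue);
            mtx_leave(&pf_state_list.pfs_mtx);

            for (st = head; st != NULL; st = next) {
                next = (st == tail) ? NULL : TAILQ_NEXT(st, entry_list);
                /* sleeping (eg, in copyout()) is fine under a read lock */
            }

            rw_exit_read(&pf_state_list.pfs_rwl);
        }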
2021-06-23  David Gwynne
    pfsync_undefer_notify needs to be careful before dereferencing state
    keys. pfsync_undefer_notify uses the state keys to look up the address
    family, which is used to figure out if it should call ipv4 or ipv6
    functions. However, the pf state purge code can unlink a state from
    the trees (ie, the state keys get removed) while the pfsync defer code
    is holding a reference to it and expects to be able to send the
    deferred packet in the future. We can test if the state keys are set
    by checking whether the timeout state is PFTM_UNLINKED or not. This
    currently relies on both pf_remove_state and pfsync_undefer_notify
    being called with the NET_LOCK held. This probably needs to be
    rethought later but is good enough for now. Found the hard way on a
    production firewall at work.
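    The check, sketched; PFTM_UNLINKED and the key access match pfvar.h
    usage, while the surrounding function is illustrative:

        void
        example_undefer_notify(struct pf_state *st)
        {
            /* purge may have unlinked the state: its keys are gone */
            if (st->timeout == PFTM_UNLINKED)
                return;

            /* safe to look at the keys now (relies on the NET_LOCK) */
            switch (st->key[PF_SK_WIRE]->af) {
            case AF_INET:
                /* ... ipv4 path ... */
                break;
            case AF_INET6:
                /* ... ipv6 path ... */
                break;
            }
        }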
2021-06-17  David Gwynne
    More consistently use pfsync_free_deferral to free the mbuf.
    pfsync_free_deferral doesn't need to check pd_m for NULL before
    calling m_freem, because m_freem does that anyway. If pf_setup_pdesc
    in pfsync_undefer_notify failed, the mbuf was freed but the pd_m
    pointer was not cleared, which would have led to a double free when
    pfsync_free_deferral tried to do the same thing for it. If
    pfsync_undefer is supposed to drop the mbuf, let pfsync_free_deferral
    do it for us. OK jmatthew@
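    The idiom this settles on, sketched (the helper is illustrative; pd_m
    is the deferral's mbuf pointer as in the message):

        /* tear down a deferral's mbuf exactly once */
        void
        deferral_drop_mbuf(struct pfsync_deferral *pd)
        {
            m_freem(pd->pd_m);  /* m_freem(9) accepts NULL */
            pd->pd_m = NULL;    /* clear so a later free is harmless */
        }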
2021-06-15  David Gwynne
    Use getnsecuptime instead of getmicrouptime. Working on a uint64_t is
    easier than remembering how timercmp and timersub work. OK jmatthew@
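    The before/after, sketched. getnsecuptime() is assumed here to be a
    local helper returning the uptime as uint64_t nanoseconds, per the
    commit message:

        /* before: struct timeval arithmetic via timercmp()/timersub() */
        int
        expired_old(const struct timeval *deadline)
        {
            struct timeval now;

            getmicrouptime(&now);
            return (timercmp(&now, deadline, >));
        }

        /* after: a plain comparison on 64-bit nanoseconds */
        int
        expired_new(uint64_t deadline_ns)
        {
            return (getnsecuptime() >= deadline_ns);
        }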
2021-06-15  David Gwynne
    Get the uptime before comparing to it.
    "that seems kind of important" jmatthew@
2021-06-15  David Gwynne
    Rework pfsync deferral timeout handling. Instead of having a timeout
    per deferred packet structure, use a single timeout in pfsync that
    pulls items off the list of deferred packets. This avoids confusion
    about whether a timeout is handling the defer or another context owns
    it. This way round, the context that removes a defer from the list
    owns it and is responsible for completing it. This should fix a panic
    we hit on the firewalls at work. There's still another one that needs
    a fix, but sashan@ has been looking at it; this might make it simpler
    to deal with though. OK sashan@ jmatthew@
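    A sketch of the single-timeout scheme; list, mutex and helper names
    are illustrative. The ownership rule is the point: whoever removes a
    deferral from the list owns it and must complete it.

        void
        defer_timeout(void *arg)
        {
            struct pfsync_softc *sc = arg;
            struct pfsync_deferral *pd;

            mtx_enter(&sc->sc_deferrals_mtx);
            while ((pd = TAILQ_FIRST(&sc->sc_deferrals)) != NULL) {
                if (!deferral_expired(pd))      /* hypothetical check */
                    break;
                /* removing pd transfers ownership to this context */
                TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
                mtx_leave(&sc->sc_deferrals_mtx);
                pfsync_undefer(pd, 0);          /* complete and free pd */
                mtx_enter(&sc->sc_deferrals_mtx);
            }
            /* re-arm for the next deferral, if any (elided) */
            mtx_leave(&sc->sc_deferrals_mtx);
        }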
2021-06-02  Alexandr Nedvedicky
    With parallel execution of pf_test(), two packets may try to update
    the same state in the pfsync(4) queue. pfsync_q_ins() takes that race
    into account with one exception: the KASSERT() at line 2352. That
    KASSERT() needs to be removed.

        2346 void
        2347 pfsync_q_ins(struct pf_state *st, int q)
        2348 {
        2349         struct pfsync_softc *sc = pfsyncif;
        2350         size_t nlen, sc_len;
        2351
        2352         KASSERT(st->sync_state == PFSYNC_S_NONE);
        2353
        2354 #if defined(PFSYNC_DEBUG)
        2355         if (sc->sc_len < PFSYNC_MINPKT)
        2356                 panic("pfsync pkt len is too low %zd", sc->sc_len);
        2357 #endif
        2358         do {
        2359                 mtx_enter(&sc->sc_mtx[q]);
        2360
        2361                 /*
        2362                  * If two threads are competing to insert the same state, then
        2363                  * there must be just single winner.
        2364                  */
        2365                 if (st->sync_state != PFSYNC_S_NONE) {
        2366                         mtx_leave(&sc->sc_mtx[q]);
        2367                         break;
        2368                 }

    OK bluhm@
2021-03-10  Jonathan Gray
    spelling
    ok gnezdo@ semarie@ mpi@
2021-02-25  David Gwynne
    We don't have to cast to caddr_t when calling m_copydata anymore.
    The first cut of this diff was made with coccinelle using this spatch:

        @rule@
        type caddr_t;
        expression m, off, len, cp;
        @@
        -m_copydata(m, off, len, (caddr_t)cp)
        +m_copydata(m, off, len, cp)

    I had to fix its opinionated idea of formatting by hand though, so I'm
    not sure it was worth it. ok deraadt@ bluhm@
2021-02-19  David Gwynne
    We don't need to wrap some short lines.
2021-02-19  David Gwynne
    Check the state for PF_ROUTE when undeferring a packet, not the rule.
2021-02-09  Patrick Wildt
    Activate use of PF_LOCK() by removing the WITH_PF_LOCK ifdefs.
    Silence from the network group. ok sashan@
2021-02-04  Alexandr Nedvedicky
    Make if_pfsync.c a better friend with PF_LOCK. The code delivered in
    this change is currently disabled. Brave souls may enable the code by
    adding -DWITH_PF_LOCK when building a customized kernel. Big thanks go
    to Hrvoje@ for providing test equipment and testing. As soon as we
    enter the next release cycle, WITH_PF_LOCK will be defined as a
    default option for MP kernels. OK dlg@
2021-02-01  David Gwynne
    Change route-to so it sends packets to IPs instead of interfaces.

    This is a significant (and breaking) reworking of the policy-based
    routing that pf can do. The intention is to make it as easy as nat/rdr
    to use, and more robust when it's operating.

    The main reasons for this change are:
    - route-to, reply-to, and dup-to do not work with pfsync. This is
      because the information about where to route-to is stored in rules;
      it is hard to have a ruleset synced between firewalls, and
      impossible to have them synced 100% of the time.
    - I can make my boxes panic in certain situations using route-to.
      Yeah...
    - The configuration and syntax for route-to rules are confusing. The
      argument to route-to and co is an interface name with an optional IP
      address. There are several problems with this. One is that people
      tend to think about routing as sending packets to peers by their
      address, not by the interface they're reachable on. Another is that
      we currently have no way to synchronise interface topology
      information between firewalls, so using an interface to say where
      packets go means we can't do failover of these states with pfsync.
      Another is that a change in routing topology means a host may become
      reachable over a different interface. Tying routing policy to
      interfaces gets in the way of failover and load balancing.

    This change does the following:
    - Stores the route info in the state instead of the pf rule. This
      allows route-to to keep working when the ruleset changes, and allows
      route-to info to be sent over pfsync. There are enough spare bits in
      pfsync messages that the protocol doesn't break. The caveat is that
      route-to becomes tied to pass rules that create state, like rdr-to
      and nat-to.
    - The argument to route-to etc is a destination IP address. It's not
      limited to a next-hop address (though a next-hop can be a
      destination address). This allows for the failover and load
      balancing referred to above.
    - Deprecates the address@interface host syntax in pfctl. Because
      routing is done entirely by IPs, the interface is derived from the
      route lookup, not pf. Any attempt to use the @interface syntax will
      now fail in all contexts.

    There's enthusiasm from proctor@ jmatthew@ and others.
    ok sashan@ bluhm@
2021-01-18  mvs
    Convert ifunit() to if_unit(9).
    ok sashan@
2021-01-04  Alexander Bluhm
    Minor refactoring in pf(4). Note that struct pfsync_state is no longer
    memcopied but assigned. Alignment should not be an issue as it is
    __packed. Part of a larger diff from dlg@; OK dlg@ sashan@
2020-12-12  jan
    Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.
    OK dlg@, bluhm@. No opinion mpi@. Not against it claudio@.
2020-08-24  kn
    Remove ptr_array from struct pf_ruleset.

    Each ruleset's rules are stored in a TAILQ called "ptr", with "rcount"
    representing the number of rules in the ruleset; "ptr_array" points to
    an array of the same length. "ptr" is backed by pool_get(9) and may
    change in size as "expired" rules get removed from the ruleset - see
    "once" in pf.conf(5). "ptr_array" is allocated momentarily through
    mallocarray(9) and gets filled with the TAILQ entries, so that the
    sole user pfsync(4) can access the list of rules by index to pick the
    n-th rule during state insertion.

    Remove "ptr_array" and make pfsync iterate over the TAILQ instead to
    get the matching rule's index. This simplifies both code and data
    structures and avoids duplicate memory management. OK sashan
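    The replacement lookup, sketched; pf_rulequeue and the entries link
    are the pf names, while the helper itself is illustrative:

        /* walk the TAILQ to find the matching rule's index */
        int
        rule_index(struct pf_rulequeue *rules, struct pf_rule *target)
        {
            struct pf_rule *r;
            int n = 0;

            TAILQ_FOREACH(r, rules, entries) {
                if (r == target)
                    return (n);
                n++;
            }
            return (-1);    /* not in this ruleset */
        }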
2020-08-21  kn
    Leave default ifq_maxlen handling to ifq_init().

    Most clonable interface drivers (except bridge, enc, loop, pppx,
    switch, trunk and vlan) initialise the send queue's length to
    IFQ_MAXLEN during *_clone_create() even though ifq_init(), which is
    eventually called through if_attach(), does the same. Remove all early
    "ifq_set_maxlen(&ifq->if_snd, IFQ_MAXLEN);" lines to leave it to
    ifq_init() and have clonable drivers a tad more in sync. OK mvs
2020-08-11  kn
    Run the start routine without KERNEL_LOCK().
    pfsyncstart() does not require the big lock, so make it use the ifq
    API. OK mvs
2020-07-29  mvs
    pfsync(4) holds a pointer to the corresponding `ifnet' as
    `sc_sync_if'. This pointer is obtained by ifunit() and its reference
    counter is not bumped, which can cause a use-after-free issue. Replace
    this pointer with the interface index. ok dlg@ sashan@
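    The if_get(9)/if_put(9) pattern that replaces the bare pointer; the
    sc_sync_ifidx field name is assumed here as the stored index:

        void
        example_use_syncdev(struct pfsync_softc *sc)
        {
            struct ifnet *ifp;

            ifp = if_get(sc->sc_sync_ifidx);    /* by index; takes a ref */
            if (ifp == NULL)
                return;                 /* interface went away: no UAF */
            /* ... use ifp safely ... */
            if_put(ifp);                /* release the reference */
        }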
2020-07-10  Patrick Wildt
    Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new"
    API. ok dlg@ tobhe@
2020-07-10  Patrick Wildt
    Change users of IFQ_PURGE() to use the "new" API.
    ok dlg@ tobhe@
2020-06-28  Alexandr Nedvedicky
    State import should accept AF_INET/AF_INET6 only.
    Reported-by: syzbot+6fef0091252d57113bfb@syzkaller.appspotmail.com
    ok kn@
2020-06-24  cheloha
    kernel: use gettime(9)/getuptime(9) in lieu of
    time_second(9)/time_uptime(9).

    time_second(9) and time_uptime(9) are widely used in the kernel to
    quickly get the system UTC or system uptime as a time_t. However,
    time_t is 64-bit everywhere, so it is not generally safe to use them
    on 32-bit platforms: you have a split-read problem if your hardware
    cannot perform atomic 64-bit reads.

    This patch replaces time_second(9) with gettime(9), a safer successor
    interface, throughout the kernel. Similarly, time_uptime(9) is
    replaced with getuptime(9).

    There is a performance cost on 32-bit platforms in exchange for
    eliminating the split-read problem: instead of two register reads you
    now have a lockless read loop to pull the values from the timehands.
    This is really not *too* bad in the grand scheme of things, but
    compared to what we were doing before it is several times slower.
    There is no performance cost on 64-bit (__LP64__) platforms.

    With input from visa@, dlg@, and tedu@. Several bugs squashed by
    visa@. ok kettenis@
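    The substitution itself is mechanical; a minimal sketch:

        void
        example_timestamps(time_t *utc, time_t *up)
        {
            *utc = gettime();   /* was time_second: no split read on 32-bit */
            *up = getuptime();  /* was time_uptime */
        }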
2020-04-12  Martin Pieuchot
    Stop processing packets under non-exclusive (read) netlock.

    Prevent concurrency in the socket layer, which is not ready for that.
    Two recent data corruptions in pfsync(4) and the socket layer pointed
    out that, at least, tun(4) was incorrectly using NET_RUNLOCK(). Until
    we find a way in software to avoid future mistakes and to make sure
    that only the softnet thread and some ioctls are safe to use a read
    version of the lock, put everything back to the exclusive version.
    ok stsp@, visa@
2020-04-11  Stefan Sperling
    Avoid triggering a KASSERT for a bogus reason in pfsync_sendout with
    PFSYNC_DEBUG. ok mpi@
2020-04-11  Stefan Sperling
    Fix build with PFSYNC_DEBUG by switching a format string from %d to
    %zd.
2019-11-07  David Gwynne
    Remove the detach and linkstate hooks when the parent is going away.
    I think this is a fix for a real bug: pfsync leaked the hooks it had
    on a parent^Wsyncdev when the parent went away. Now that there are
    KASSERTs to make sure all hooks are removed before an interface goes
    away, the leak caused the KASSERTs to fire and made the bug obvious.
    Found by hrvoje popovski.
2019-11-07  David Gwynne
    Turn the linkstate hooks into a task list, like the detach hooks.
    This is largely mechanical, except for carp. This moves the addition
    of the carp link state hook to after we're committed to using the new
    interface as a carpdev. Because the add can't fail, we avoid a
    complicated unwind dance. Also, this tweaks the carp linkstate hook so
    it only updates the relevant carp interface, not all of the carpdevs
    on the parent. hrvoje popovski has tested an early version of this
    diff and it's generally ok, but there are some splasserts that this
    diff fires that I'll fix in an upcoming diff. ok claudio@
2019-11-06  David Gwynne
    Replace the hooks used with if_detachhooks with a task list.

    The main semantic change is that things registering detach hooks have
    to allocate and set a task structure that then gets added to the list.
    This means that if the task is allocated up front (eg, as part of
    carp's softc or bridge's port structure), it avoids the possibility
    that adding a hook can fail. A lot of drivers weren't checking for
    failure, and unwinding state in the event of failure in other parts
    was error prone.

    While doing this I discovered that the list operations have to be in a
    particular order, but drivers weren't doing that consistently either.
    This diff wraps the list ops up so you have to seriously go out of
    your way to screw them up. I've also sprinkled some NET_ASSERT_LOCKED
    around the list operations so we can make sure there's no potential
    for the list to be corrupted, especially while it's being run.

    hrvoje popovski has tested this a bit, and some issues he discovered
    have been fixed. ok sashan@
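    A sketch of the new registration, with the task embedded in the softc
    so adding the hook cannot fail; the softc and handler names are
    illustrative:

        void    example_ifdetach(void *);       /* illustrative handler */

        struct example_softc {
            struct task     sc_dtask;           /* allocated up front */
        };

        void
        example_hook_parent(struct example_softc *sc, struct ifnet *ifp0)
        {
            task_set(&sc->sc_dtask, example_ifdetach, sc);
            if_detachhook_add(ifp0, &sc->sc_dtask); /* cannot fail */
        }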
2019-06-10  Martin Pieuchot
    Use mallocarray(9) & put some free(9) sizes for M_IPMOPTS allocations.
    ok semarie@, visa@
2019-06-04  Alexandr Nedvedicky
    pfsync_sendout() requires PF_LOCK().
    OK mpi@
2019-02-10"non-existant" is one of those words that don't exist, so use "non-existent"Peter Hessler
instead From Pamela Mosiejczuk, many thanks! OK phessler@ deraadt@
2018-10-03  Visa Hankala
    Fix a race condition that affects pfsync interface deletion.

    When a pfsync interface is being deleted, all its timeout handlers and
    pfsync_send_dispatch() have to stop accessing the software context
    before the context is freed. Ensure sufficient synchronization by
    acquiring NET_LOCK() and clearing `pfsyncif' inside the critical
    section in pfsync_clone_destroy(). When a timeout handler has entered
    the critical section, it has to check `pfsyncif' and bail out if the
    value is NULL. pfsync_send_dispatch() already does this check.

    Issue reported and fix tested by Hrvoje Popovski. OK mpi@ bluhm@
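    The shape of the fixed timeout handlers, sketched; pfsyncif is the
    global named in the message, the handler itself is illustrative:

        void
        example_timeout(void *arg)
        {
            struct pfsync_softc *sc;

            /* arg may point at a dying softc: consult pfsyncif instead */
            NET_LOCK();
            sc = pfsyncif;
            if (sc == NULL) {           /* interface is being destroyed */
                NET_UNLOCK();
                return;
            }
            /* ... sc stays valid for the rest of the critical section ... */
            NET_UNLOCK();
        }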
2018-10-02  Alexandr Nedvedicky
    pfsync: avoid a recursion on PF_LOCK.
    OK bluhm@
2018-09-11  Alexandr Nedvedicky
    Move state lookup outside of PF_LOCK().

    This change adds a pf_state_lock rw-lock, which protects the
    consistency of the state table in PF. The code delivered in this
    change is guarded by 'WITH_PF_LOCK', which is still undefined. People
    who are willing to experiment and want to run it must do two things:
    - compile the kernel with -DWITH_PF_LOCK
    - bump NET_TASKQ from 1 to ... the sky is the limit (just select some
      sensible value for the number of tasks your system is able to
      handle)
    OK bluhm@
2018-08-24  Alexandr Nedvedicky
    Cosmetic tweak to if_pfsync.c.
    OK bluhm@, OK mpi@, henning@, jca@
2018-02-19  Martin Pieuchot
    Remove the almost unused `flags' argument of suser().
    The accounting flag `ASU' will no longer be set, but that makes
    suser() mpsafe since it no longer messes with a per-process field.
    No objection from millert@; ok tedu@, bluhm@
2018-01-09  Alexander Bluhm
    Creating a cloned interface could return ENOMEM due to a temporary
    memory shortage. As it is invoked from a system call, it should not
    fail but wait instead. OK visa@ mpi@
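    The change amounts to letting the allocation sleep, since clone
    creation runs in syscall context; a sketch (the malloc type and softc
    are illustrative):

        int
        example_clone_create(void)
        {
            struct pfsync_softc *sc;

            /* M_WAITOK: sleeping is fine here and removes the ENOMEM case */
            sc = malloc(sizeof(*sc), M_DEVBUF, M_WAITOK | M_ZERO);
            /* ... */
            return (0);
        }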