path: root/sys/net

2021-10-25  Alexander Bluhm
    Call a locked variant of tdb_unlink() from tdb_walk(). Fixes a "mutex locking against myself" panic introduced by my previous commit.
    OK beck@ patrick@

2021-10-24  Alexandr Nedvedicky
    Let pf_table.c use the standard way to work with lists.
    OK todd@, mvs@, kn@

2021-10-23  Alexander Bluhm
    There is an m_pullup() down in AH input. As it may free or change the mbuf, the callers must be careful. Although there is no bug, use the common pattern to handle this: pass down an mbuf pointer mp and let m_pullup() update the pointer in all callers. It looks like the tcp signature functions should not be called; avoid an mbuf leak and return an error.
    OK mvs@

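The calling convention described above can be sketched in plain userland C. This is a toy illustration, not the kernel code: `toy_pullup()` and `make_mbuf()` are made-up stand-ins for m_pullup() and mbuf allocation, showing why the caller must pass a pointer to its own mbuf pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct mbuf: just a small heap buffer. */
struct mbuf { char data[64]; };

struct mbuf *
make_mbuf(const char *s)
{
	struct mbuf *m = malloc(sizeof(*m));
	memset(m->data, 0, sizeof(m->data));
	strncpy(m->data, s, sizeof(m->data) - 1);
	return m;
}

/*
 * Sketch of the m_pullup() pattern: the helper may free and replace
 * the mbuf, so it takes a pointer to the caller's pointer and updates
 * it in place.  Returns 0 on success; on failure it frees the mbuf
 * and leaves *mp NULL, so the caller never touches a stale pointer.
 */
int
toy_pullup(struct mbuf **mp, size_t len)
{
	struct mbuf *m = *mp, *n;

	if (len > sizeof(m->data)) {	/* cannot satisfy the request */
		free(m);
		*mp = NULL;
		return -1;
	}
	/* Simulate reallocation into a fresh contiguous mbuf. */
	n = malloc(sizeof(*n));
	memcpy(n->data, m->data, sizeof(n->data));
	free(m);
	*mp = n;
	return 0;
}
```

A caller that kept using its old pointer after the call would read freed memory on the reallocation path, which is exactly the hazard the commit message warns about.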
2021-10-23  Visa Hankala
    Fix double free after allocation failure in bpf(4).
    Reported by Peter J. Philipp. OK mpi@

2021-10-23  Alexandr Nedvedicky
    YIELD() in pf_table.c should preempt for ioctl() callers only.
    OK mpi@

2021-10-22  Alexander Bluhm
    After deleting hifn(4) the only provider for the LZS compression algorithm is gone. Remove all LZS references from the tree. The v42bis in isakmpd also looks unsupported.
    OK mvs@ patrick@ sthen@

2021-09-14  Vitaliy Makkoveev
    Add missing kernel lock for Bi-directional Forwarding Detection data. Also, bfdset() calls pool_get(9) with the PR_WAITOK flag, so it should be done before we check the existence of this `bfd', otherwise it could be added multiple times. We have BFD disabled in the default kernel, so this diff is mostly for consistency.
    ok mpi@

2021-09-07  Vitaliy Makkoveev
    Fix NULL pointer dereference introduced by previous commit.
    Reported-by: syzbot+684597dbbb9b516e76ae@syzkaller.appspotmail.com
    ok mpi@

2021-09-07  Vitaliy Makkoveev
    Fix the race between if_detach() and rtm_output().

    When a dying network interface descriptor has an if_get(9)-obtained reference owned by a foreign thread, the if_detach() thread will sleep just after it has removed this interface from the interface index map. The data related to this interface is still in the routing table, so if_get(9) called by a concurrent rtm_output() thread will return NULL and the following "ifp != NULL" assertion will be triggered.

    So remove the "ifp != NULL" assertions from rtm_output(), try to grab `ifp' as early as possible and hold it until we finish the work. In the case we won the race and `ifp' is non-NULL, a concurrent if_detach() thread will wait for us. In the case we lost, we just return ESRCH.

    The problem was reported by danj@. Diff tested by danj@. ok mpi@

2021-08-30  Jasper Lievisse Adriaanse
    Remove a bunch of forward-only structs that were found with ctfconv.
    ok mpi@

2021-08-19  David Gwynne
    Implement reception of "VLAN 0 priority tagged" packets.

    According to 802.1Q, VLAN 0 on the wire is special and should be interpreted as if the packet was received on the parent interface, except that you get the packet priority information encoded in the VLAN header. Historically we drop VLAN tagged packets that don't have a vlan interface configured for the received tag number.

    Historically we have also deviated from 802.1Q by allowing for the configuration of a vlan subinterface with the vnetid "unset". This works just like any other vlan interface, but it uses tag 0 on the wire. However, if you're in a situation where you're receiving VLAN 0 tagged packets that really are part of the same layer 2 Ethernet domain as the parent interface, this doesn't work well.

    landry@ is in such a situation at work, where the network is sending his OpenBSD boxes packets with VLAN tag 0, but only sometimes. Most of the time the packets are untagged, which is expected, but sometimes they have a VLAN header set. This causes problems, particularly with ARP.

    This diff makes the smallest possible change to enable reception of these VLAN 0 priority tagged packets: if an "unset" vlan interface is not configured on the parent, then VLAN 0 tagged packets get their header stripped and continue stack processing as if they didn't have the tag at all.

    landry@ has been running this for months. ok sthen@ claudio@

2021-08-05  Stuart Henderson
    m_freem() in wg_send()'s path where a peer has no endpoint address, fixing an mbuf leak with wgpka (keepalive) found the hard way by Matt P.
    Diff from Matt Dunwoodie, ok claudio@

2021-08-02  mvs
    Don't call rtm_ifchg() in trunk_port_state(). The preceding trunk_link_active() already produced an RTM_IFINFO message when the trunk(4) state was changed. In such a case we duplicate the RTM_IFINFO message, or we produce a false message when the trunk(4) state was not changed.
    ok florian@

2021-07-27  mvs
    Revert the "Use per-CPU counters for tunnel descriptor block" diff.
    Panic reported by Hrvoje Popovski.

2021-07-27  mvs
    Introduce mutex(9) to protect pipex(4) session content.

    With bluhm@'s diff for parallel forwarding, pipex(4) could be accessed in parallel through (*ifp->if_input)() -> ether_input() -> pipex_pppoe_input(). PPPoE pipex(4) sessions are mostly immutable except for MPPE crypto state. The new per-session `pxs_mtx' mutex(9) is used to protect the session's `ccp-id', which is incremented each time we send a CCP reset-request. The new `pxm_mtx' mutex(9) is used to protect the MPPE context; each pipex(4) session has two of them, one for the input and one for the output path. There are no lock order limitations because these new mutex(9)es are never held together.
    ok bluhm@

2021-07-26  mvs
    Use per-CPU counters for tunnel descriptor block (tdb) statistics. The 'tdb_data' struct became unused and was removed.
    ok bluhm@

2021-07-22  Theo Buehler
    Add sizes for free() in zlib.

    Rebased version of a diff from miod, who described it as follows: this tries to keep diffability against upstream, hence a questionable choice of the size type for zcfree() - but all sizes should fit in 32 bits anyway. Since all zcfree routines used in the tree cope with NULL arguments (including the various alloc.c used by the boot blocks), I have simplified TRY_FREE to compensate for the growth.
    Reminded by and ok mpi@

2021-07-20  mvs
    Turn pipex(4) session statistics into per-CPU counters. This makes pipex(4) more compliant with bluhm@'s work on traffic forwarding parallelization.
    ok yasuoka@ bluhm@

2021-07-20  Alexander Bluhm
    The current workaround to disable parallel IPsec did not work. The variable nettaskqs must not change at runtime: interface input queues choose their thread during init with ifiq_softnet = net_tq(), so it cannot be modified after pfkeyv2_send() sets the first SA in the kernel. Also, changing the calculation in net_tq() may call task_del() with a different taskq than task_add(). Instead of restricting the index to the first softnet task, use an exclusive lock. For now just move the comment; we can later decide whether a write net lock or the kernel lock is better.
    OK mvs@

2021-07-19  Stefan Sperling
    Fix an alignment fault observed on an octeon machine while pppoe(4) was attempting to negotiate a large MTU. Copy the peer's max payload size from the discovery packet with memcpy() instead of using a pointer to this value's offset in the packet buffer.
    tweak and ok visa@; additional testing and ok sthen@

2021-07-14  tobhe
    Export SA replay counters via pfkey and print them with ipsecctl. This is useful for debugging replay window issues with 64-bit sequence numbers in IPsec.
    ok bluhm@

2021-07-09  David Gwynne
    ifq_hdatalen() can return 0 if ifq_empty is true, which avoids taking locks.

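The lock-avoiding fast path can be sketched in userland C with a pthread mutex standing in for the kernel mutex; `toy_ifq` and `toy_hdatalen()` are made-up names, not the ifq API.

```c
#include <assert.h>
#include <pthread.h>

/* Toy queue: a mutex plus a byte count, loosely modelled on the idea
 * that an empty queue can be reported as empty without locking. */
struct toy_ifq {
	pthread_mutex_t mtx;
	int len;		/* bytes queued */
};

int
toy_hdatalen(struct toy_ifq *ifq)
{
	int len;

	if (ifq->len == 0)	/* unlocked fast path: nothing queued */
		return 0;
	pthread_mutex_lock(&ifq->mtx);	/* slow path: read consistently */
	len = ifq->len;
	pthread_mutex_unlock(&ifq->mtx);
	return len;
}
```

The unlocked check is safe because a racing answer of "empty" is acceptable to callers that only want a hint, while any non-zero answer is confirmed under the lock.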
2021-07-08  mvs
    Initialize the `ipsec_acquire_pool' pool(9) within pfkey_init() instead of doing that at runtime within ipsp_acquire_sa().
    ok bluhm@

2021-07-08  Alexander Bluhm
    Debug printfs in encdebug were inconsistent; some missing newlines produced ugly output. Move the function name and the newline into the DPRINTF macro. This simplifies the debug statements.
    OK tobhe@

2021-07-07  Alexandr Nedvedicky
    Tell ether_input() to call pf_test() outside of smr_read sections, because smr_read sections don't play well with sleeping locks in pf(4).
    OK bluhm@

2021-07-07  Alexandr Nedvedicky
    pfsync_undefer() must be called outside of the PF_LOCK.
    OK bluhm@

2021-07-05  tobhe
    Export tdb MTU to userland via SADB_GET. This helps debug path MTU discovery issues with ESP in UDP.
    ok bluhm@ sthen@ mpi@

2021-07-05  David Gwynne
    etherbridge_map was way too clever, so simplify it.

    The code tried to carry state from the quick SMR-based lookup through to the actual map update under the mutex, but this led to refcnt leaks and logic errors. The simplification is that if the SMR-based checks say the map needs updating, we prepare the update, then forget what we learnt inside the SMR critical section and redo the checks under the mutex.

    Entries in an etherbridge map are either in it or they aren't, so we don't need to refcount them. This means the thing that takes an entry out of the map becomes directly responsible for destroying it, so it can do the SMR call or barrier directly rather than via a refcnt.

    Found by Hrvoje Popovski while testing the stack running in parallel, and fix tested by him too. ok sashan@

2021-06-30  Alexander Bluhm
    Remove splnet() from ifnewlladdr(), it is not needed anymore. Add asserts and comments for the locks that are necessary.
    discussed with dlg@ mpi@ mvs@; tested by Hrvoje Popovski; OK mpi@

2021-06-25  David Gwynne
    Let pfsync_request_update() actually retry when it overfills a packet. A continue in the middle of a do { } while (0) loop is effectively a break; it doesn't restart the loop. Without the retry, the code leaked update messages, which in turn made pool_destroy in pfsync trip over a KASSERT because items were still out.
    found by and fix tested by Hrvoje Popovski; ok sashan@

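The do { } while (0) pitfall is easy to demonstrate in plain C. These functions are made-up illustrations, not the pfsync code: the first shows that continue terminates such a loop, the second shows the explicit flag needed to actually retry.

```c
#include <assert.h>

/* continue in a do { ... } while (0) jumps to the loop test, and the
 * test is the constant 0, so the loop ends -- just like break would. */
int
count_with_continue(void)
{
	int iterations = 0;

	do {
		iterations++;
		continue;	/* acts like break: while (0) is false */
	} while (0);

	return iterations;
}

/* To retry, the loop condition must actually ask for another pass. */
int
count_with_retry(int retries)
{
	int iterations = 0, again;

	do {
		again = 0;
		iterations++;
		if (iterations <= retries)
			again = 1;	/* request another pass */
	} while (again);

	return iterations;
}
```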
2021-06-23  cheloha
    rtsock: revert from timeout_set_flags(9) to timeout_set_proc(9).
    ok mvs@

2021-06-23  David Gwynne
    Augment the global pf state list with its own locks.

    Before this, things that iterated over the global list of pf states had to take the net, pf, or pf state locks. In particular, the ioctls that dump the state table took the net and pf state locks before iterating over the states and using copyout to export them to userland. When we tried replacing the use of rwlocks with mutexes under the pf locks, this blew up because you can't sleep when holding a mutex and there's a sleeping lock used inside copyout.

    This diff introduces two locks around the global state list: a mutex that protects the head and tail of the list, and an rwlock that protects the links between elements in the list. Inserts on the state list only occur during packet handling and can be done by taking the mutex and putting the state on the tail before releasing the mutex. Iterating over states is only done from thread/process contexts, so we can take a read lock, then the mutex to get a snapshot of the head and tail pointers, and then keep the read lock to iterate between the head and tail pointers. Because it's a read lock, we can then take other sleeping locks (eg, the one inside copyout) without (further) gymnastics.

    The pf state purge code takes the rwlock exclusively and the mutex to remove elements from the list. This allows the ioctls and purge code to loop over the list concurrently and largely without blocking the creation of states when pf is processing packets.

    pfsync also iterates over the state list when doing bulk sends, which the state purge code needs to be careful around.
    ok sashan@

2021-06-23  David Gwynne
    pf_purge_expired_states() can check the time once instead of for every state.

2021-06-23  David Gwynne
    pfsync_undefer_notify() needs to be careful before dereferencing state keys.

    pfsync_undefer_notify() uses the state keys to look up the address family, which is used to figure out whether it should call IPv4 or IPv6 functions. However, the pf state purge code can unlink a state from the trees (ie, the state keys get removed) while the pfsync defer code is holding a reference to it and expects to be able to send the deferred packet in the future. We can test whether the state keys are set by checking if the timeout state is PFTM_UNLINK or not. This currently relies on both pf_remove_state() and pfsync_undefer_notify() being called with the NET_LOCK held. This probably needs to be rethought later, but is good enough for now.

    Found the hard way on a production firewall at work.

2021-06-23  David Gwynne
    Rework pf_state_expires() to avoid confusion around state->timeout.

    I'm going to make it so pf_purge_expired_states() can gather states largely without sharing a lock with pfsync or actual packet processing in pf. If pf or pfsync unlink a state while pf_purge_expired_states() is looking at it, we can race with some checks and fall over a KASSERT. I'm fixing this by having the caller of pf_state_expires() read state->timeout first, do its checks, and then pass the value as an argument into pf_state_expires(). This means there's a consistent view of the state->timeout variable across all the checks that pf_purge_expired_states() in particular does. If pf/pfsync does change the timeout while pf_purge_expired_states() is looking at it, the worst thing that happens is that the state doesn't get picked as a candidate for purging in this pass and has to wait for the next sweep.
    ok sashan@ as part of a bigger diff

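The snapshot-and-pass-down idea can be illustrated with a toy in plain C. All names here are invented, not the pf code: the point is that the callee uses the caller's snapshot of the timeout rather than re-reading a field a concurrent thread may change.

```c
#include <assert.h>

#define TOY_TM_UNLINKED	0	/* made-up counterpart of an unlinked state */

struct toy_state {
	volatile int timeout;		/* may change under our feet */
	unsigned int expire[8];		/* per-timeout expiry values */
};

/*
 * The caller samples st->timeout once and passes that snapshot down,
 * so every check in a sweep sees the same value even if a concurrent
 * thread updates st->timeout mid-sweep.
 */
unsigned int
toy_state_expires(const struct toy_state *st, int snapshot)
{
	/* Uses only the snapshot; never re-reads st->timeout. */
	if (snapshot == TOY_TM_UNLINKED)
		return 0;	/* already unlinked: nothing to expire */
	return st->expire[snapshot];
}
```

A caller does `int t = st->timeout;` once, runs its checks against `t`, then calls `toy_state_expires(st, t)`; a concurrent change to `st->timeout` after the sample cannot make the checks and the expiry lookup disagree.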
2021-06-17  David Gwynne
    More consistently use pfsync_free_deferral() to free the mbuf.

    pfsync_free_deferral() doesn't need to check pd_m for NULL before calling m_freem(), because m_freem() does that anyway. If pf_setup_pdesc() in pfsync_undefer_notify() failed, the mbuf was freed but the pd_m pointer was not cleared, which would have led to a double free when pfsync_free_deferral() tried to do the same thing for it. If pfsync_undefer() is supposed to drop the mbuf, let pfsync_free_deferral() do it for us.
    ok jmatthew@

2021-06-15  David Gwynne
    Use getnsecuptime() instead of getmicrouptime(). Working on a uint64_t is easier than remembering how timercmp() and timersub() work.
    ok jmatthew@

2021-06-15  David Gwynne
    Get the uptime before comparing to it.
    "that seems kind of important" jmatthew@

2021-06-15  David Gwynne
    Factor out nsecuptime() and getnsecuptime(). These functions were implemented in a bunch of places with comments saying they should be moved to kern_tc.c when more pop up, and I was about to add another one. I think it's time to move them to kern_tc.c.
    ok cheloha@ jmatthew@

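A small sketch of why a flat nanosecond count is more convenient than struct-based time arithmetic. This is userland C, not the kernel implementation: once times are plain uint64_t nanoseconds, comparison and subtraction are ordinary integer operations instead of timercmp()/timersub() macro calls.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Collapse a timespec into a single nanosecond count. */
uint64_t
ts_to_nsec(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ULL + ts->tv_nsec;
}
```

With this, "has the deadline passed?" is `now >= deadline` and "time since" is `now - then`, with no carry handling across the seconds/nanoseconds boundary.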
2021-06-15  David Gwynne
    Rework pfsync deferral timeout handling.

    Instead of having a timeout per deferred packet structure, use a single timeout in pfsync that pulls items off the list of deferred packets. This avoids confusion about whether a timeout is handling the defer or another context owns it. This way round, the context that removes a defer from the list owns it and is responsible for completing it.

    This should fix a panic we hit on the firewalls at work. There's still another one that needs a fix, but sashan@ has been looking at it; this might make it simpler to deal with though.
    ok sashan@ jmatthew@

2021-06-09  David Gwynne
    Whitespace tweak. No functional change.

2021-06-02  Alexandr Nedvedicky
    With parallel execution of pf_test(), two packets may try to update the same state in the pfsync(4) queue. pfsync_q_ins() takes that race into account with one exception: the KASSERT() at line 2352. That KASSERT() needs to be removed.

        2346 void
        2347 pfsync_q_ins(struct pf_state *st, int q)
        2348 {
        2349 	struct pfsync_softc *sc = pfsyncif;
        2350 	size_t nlen, sc_len;
        2351
        2352 	KASSERT(st->sync_state == PFSYNC_S_NONE);
        2353
        2354 #if defined(PFSYNC_DEBUG)
        2355 	if (sc->sc_len < PFSYNC_MINPKT)
        2356 		panic("pfsync pkt len is too low %zd", sc->sc_len);
        2357 #endif
        2358 	do {
        2359 		mtx_enter(&sc->sc_mtx[q]);
        2360
        2361 		/*
        2362 		 * If two threads are competing to insert the same state, then
        2363 		 * there must be just single winner.
        2364 		 */
        2365 		if (st->sync_state != PFSYNC_S_NONE) {
        2366 			mtx_leave(&sc->sc_mtx[q]);
        2367 			break;
        2368 		}

    OK bluhm@

2021-06-02  David Gwynne
    Whitespace tweaks, no functional change.

2021-06-02  David Gwynne
    Only read the if_bpf pointer once.

2021-06-02  David Gwynne
    tpmr_input() is called in an SMR critical section, so it doesn't need its own. This simplifies the code a little bit.

2021-06-02  David Gwynne
    Read the tpmr if_flags once in tpmr_input() so link flags apply consistently. This avoids IFF_LINK1 getting set by another CPU halfway through tpmr_input(). If LINK1 is not set when a packet enters a tpmr pair, it skips the ip/pf checks, but if LINK1 is then set, only pf is run against it. This way you either get the ip checks and pf when the packet enters and leaves tpmr, or you don't get the ip and pf checks at all.

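The read-once idea can be illustrated with a toy in userland C. All names are invented: the snapshot into a local guarantees both branches see the same flags word, whereas two separate reads of the shared field could see different values.

```c
#include <assert.h>

#define TOY_IFF_LINK1	0x1

struct toy_ifp {
	volatile unsigned int if_flags;	/* may be flipped by another CPU */
};

#define DID_IP_CHECKS	0x1
#define DID_PF_ONLY	0x2

int
toy_input(const struct toy_ifp *ifp)
{
	unsigned int flags = ifp->if_flags;	/* single read, then decide */
	int did = 0;

	if (!(flags & TOY_IFF_LINK1))
		did |= DID_IP_CHECKS;	/* full ip checks plus pf */
	if (flags & TOY_IFF_LINK1)
		did |= DID_PF_ONLY;	/* pf only */
	return did;			/* always exactly one of the two */
}
```

Because both tests use the local `flags`, the result is always exactly one of the two modes, never an inconsistent mix.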
2021-06-02  David Gwynne
    Use ipv4_check() and ipv6_check() to, well, check IP headers before running pf. Unlike bridge(4), these checks are only run when the packet is entering the veb/tpmr topology. The assumption is that only valid IP packets end up inside the topology, so we don't have to check them when they're leaving.
    ok bluhm@ sashan@

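A hypothetical miniature of what such a header sanity pass does. This is not the kernel's ipv4_check(); the real code checks much more (e.g. the header checksum and addresses), but the shape is the same: reject anything whose stated lengths don't fit the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if pkt looks like a plausible IPv4 header, -1 otherwise. */
int
toy_ipv4_check(const uint8_t *pkt, size_t len)
{
	size_t hlen;
	uint16_t tot;

	if (len < 20)
		return -1;		/* shorter than a minimal header */
	if ((pkt[0] >> 4) != 4)
		return -1;		/* not IPv4 */
	hlen = (size_t)(pkt[0] & 0x0f) * 4;
	if (hlen < 20 || hlen > len)
		return -1;		/* bad header length */
	tot = (uint16_t)((pkt[2] << 8) | pkt[3]);
	if (tot < hlen || tot > len)
		return -1;		/* bad total length */
	return 0;
}
```

Running a check like this once at topology entry, as the commit describes, means interior hops can trust the header fields without re-validating them.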
2021-06-02  David Gwynne
    Use ipv4_check() and ipv6_check() provided by the network stacks. This removes the duplication of the check code, and lets the v6 code in particular pick up a lot more sanity checks around valid addresses on the wire.
    ok bluhm@ sashan@

2021-06-01  mvs
    Check `so_state' in rtm_senddesync() and return if the SS_ISCONNECTED or SS_CANTRCVMORE bits are set. The first check is required to prevent timeout_add(9) from rescheduling `rop_timeout', otherwise timeout_del_barrier(9) can't help us. The second check is for the case when shutdown(2) with the SHUT_RD argument occurred on this socket and we should not receive anything, including RTM_DESYNC packets.
    ok claudio@

2021-06-01  David Gwynne
    A couple of minor whitespace tweaks. No functional change.
    am I a pf hacker now?
