src - OpenBSD base system

Age	Commit message	Author
2021-07-07	tell ether_input() to call pf_test() outside of smr_read sections,	Alexandr Nedvedicky
	because smr_read sections don't play well with sleeping locks in pf(4). OK bluhm@
2021-07-07	pfsync_undefer() must be called outside of PF_LOCK	Alexandr Nedvedicky
	OK bluhm@
2021-07-05	Export tdb MTU to userland via SADB_GET. This helps debug path MTU	tobhe
	discovery issues with ESP in UDP. ok bluhm@ sthen@ mpi@
2021-07-05	etherbridge_map was way too clever, so simplify it.	David Gwynne
	the code tried to carry state from the quick smr based lookup through to the actual map update under the mutex, but this led to refcnt leaks and logic errors. the simplification is that if the smr based checks say the map needs updating, we prepare the update and then forget what we learnt inside the smr critical section and redo them under the mutex again. entries in an etherbridge map are either in it or they aren't, so we don't need to refcnt them. this means the thing that takes an entry out of the map becomes directly responsible for destroying it, so it can do the smr call or barrier directly rather than via a refcnt. found by hrvoje popovski while testing the stack running in parallel, and fix tested by him too. ok sashan@
2021-06-30	Remove splnet() from ifnewlladdr(), it is not needed anymore.	Alexander Bluhm
	Add asserts and comments for the locks that are necessary. discussed with dlg@ mpi@ mvs@; tested by Hrvoje Popovski; OK mpi@
2021-06-25	let pfsync_request_update actually retry when it overfills a packet.	David Gwynne
	a continue in the middle of a do { } while (0) loop is effectively a break, it doesnt restart the loop. without the retry, the code leaked update messages which in turn made pool_destroy in pfsync destroy trip over a kassert cos items were still out. found by and fix tested by hrvoje popovski ok sashan@
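The control-flow point in the pfsync_request_update entry can be reproduced in isolation. What follows is a minimal sketch in plain C, not the pfsync code: the function names and the retry flag are assumptions made for illustration, and the committed fix may be structured differently.

```c
#include <stdbool.h>

/*
 * Illustration only: count how many times each loop body runs when the
 * first pass asks for a retry. These names are hypothetical and do not
 * come from if_pfsync.c.
 */

/* `continue' jumps to the `while (0)' test, which is false: no retry. */
int
attempts_do_while_0(void)
{
	int attempts = 0;

	do {
		attempts++;
		if (attempts == 1)
			continue;	/* behaves like break here */
	} while (0);

	return attempts;
}

/* A retry flag tested by the loop condition does restart the body. */
int
attempts_with_retry(void)
{
	int attempts = 0;
	bool retry;

	do {
		retry = false;
		attempts++;
		if (attempts == 1) {
			retry = true;
			continue;	/* jumps to the test, which is now true */
		}
	} while (retry);

	return attempts;
}
```

The first function runs its body once even though it asked to go around again; the second runs it twice.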
2021-06-23	rtsock: revert from timeout_set_flags(9) to timeout_set_proc(9); ok mvs@	cheloha

2021-06-23	augment the global pf state list with its own locks.	David Gwynne
	before this, things that iterated over the global list of pf states had to take the net, pf, or pf state locks. in particular, the ioctls that dump the state table took the net and pf state locks before iterating over the states and using copyout to export them to userland. when we tried replacing the use of rwlocks with mutexes under the pf locks, this blew up because you can't sleep when holding a mutex and there's a sleeping lock used inside copyout. this diff introduces two locks around the global state list: a mutex that protects the head and tail of the list, and an rwlock that protects the links between elements in the list. inserts on the state list only occur during packet handling and can be done by taking the mutex and putting the state on the tail before releasing the mutex. iterating over states is only done from thread/process contexts, so we can take a read lock, then the mutex to get a snapshot of the head and tail pointers, and then keep the read lock to iterate between the head and tail pointers. because it's a read lock we can then take other sleeping locks (eg, the one inside copyout) without (further) gymnastics. the pf state purge code takes the rwlock exclusively and the mutex to remove elements from the list. this allows the ioctls and purge code to loop over the list concurrently and largely without blocking the creation of states when pf is processing packets. pfsync also iterates over the state list when doing bulk sends, which the state purge code needs to be careful around. ok sashan@
2021-06-23	pf_purge_expired_states can check the time once instead of for every state.	David Gwynne

2021-06-23	pfsync_undefer_notify needs to be careful before dereferencing state keys.	David Gwynne
	pfsync_undefer_notify uses the state keys to look up the address family, which is used to figure out if it should call ipv4 or ipv6 functions. however, the pf state purge code can unlink a state from the trees (ie, the state keys get removed) while the pfsync defer code is holding a reference to it and expects to be able to send the deferred packet in the future. we can test if the state keys are set by checking if the timeout state is PFTM_UNLINK or not. this currently relies on both pf_remove_state and pfsync_undefer_notify being called with the NET_LOCK held. this probably needs to be rethought later but is good enough for now. found the hard way on a production firewall at work.
2021-06-23	rework pf_state_expires to avoid confusion around state->timeout.	David Gwynne
	i'm going to make it so pf_purge_expired_states() can gather states largely without sharing a lock with pfsync or actual packet processing in pf. if pf or pfsync unlink a state while pf_purge_expired_states is looking at it, we can race with some checks and fall over a KASSERT. i'm fixing this by having the caller of pf_state_expires read state->timeout first, do its checks, and then pass the value as an argument into pf_state_expires. this means there's a consistent view of the state->timeout variable across all the checks that pf_purge_expired_states in particular does. if pf/pfsync does change the timeout while pf_purge_expired_states is looking at it, the worst thing that happens is that it doesn't get picked as a candidate for purging in this pass and will have to wait for the next sweep. ok sashan@ as part of a bigger diff
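The "read once, then pass the value along" idea from the pf_state_expires entry can be sketched as below. This is an illustration under assumptions, not the pf code: `struct state`, `TM_UNLINKED`, the `durations` table and both function names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real pf definitions. */
#define TM_UNLINKED	255

struct state {
	volatile uint8_t timeout;	/* another context may change this */
	uint64_t	 last_used;
};

/* Takes the timeout as an argument instead of re-reading st->timeout. */
uint64_t
state_expires(const struct state *st, uint8_t timeout,
    const uint64_t *durations)
{
	return st->last_used + durations[timeout];
}

/* Returns 1 if the state is a purge candidate at time `now', else 0. */
int
purge_candidate(const struct state *st, uint64_t now,
    const uint64_t *durations)
{
	uint8_t timeout;

	assert(st != NULL);
	timeout = st->timeout;		/* the only read of the shared field */

	if (timeout == TM_UNLINKED)
		return 1;

	/* The check above and the table lookup below see the same value. */
	return state_expires(st, timeout, durations) <= now;
}
```

If the field were read a second time inside `state_expires`, a concurrent change to `TM_UNLINKED` between the two reads could slip past the check and index the table out of range; with a single read the worst case is a stale but self-consistent decision.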
2021-06-17	more consistently use pfsync_free_deferral to free the mbuf.	David Gwynne
	pfsync_free_deferral doesnt need to check pd_m for NULL before calling m_freem because m_freem does that anyway. if pf_setup_pdesc in pfsync_undefer_notify failed, the mbuf was freed but the pd_m pointer was not cleared, which would have led to a double free when pfsync_free_deferral tried to do the same thing for it. if pfsync_undefer is supposed to drop the mbuf, let pfsync_free_deferral do it for us. ok jmatthew@
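The double free described in the pfsync_free_deferral entry comes from two code paths both believing they own the same buffer. A small userland sketch of the usual remedy, with one routine that frees and clears the pointer, is shown here. It uses free(3) in place of m_freem, and `struct deferral` and `deferral_free_buf` are hypothetical names, so treat it as an analogy and not as the committed change.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pfsync deferral. */
struct deferral {
	void	*pd_m;		/* stand-in for the deferred mbuf */
};

/*
 * The single place that releases the buffer. free(NULL) is a no-op, so
 * no NULL check is needed, and clearing the pointer makes a repeated
 * call harmless.
 */
void
deferral_free_buf(struct deferral *pd)
{
	assert(pd != NULL);
	free(pd->pd_m);
	pd->pd_m = NULL;
}
```

Error paths then call this helper and never free the buffer themselves, so the pointer cannot be left dangling.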
2021-06-15	use getnsecuptime instead of getmicrouptime.	David Gwynne
	working on a uint64_t is easier than remembering how timercmp and timersub work. ok jmatthew@
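To make the comparison concrete, here is a sketch of the same "has the interval elapsed?" test written both ways. `struct tv` and its helpers are hand-written stand-ins that mimic what timersub(3) and timercmp(3) have to do; they are assumptions for illustration and not the kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hand-rolled stand-in for struct timeval, to keep the sketch portable. */
struct tv {
	long long	sec;
	long		usec;
};

/* What a timersub-style subtraction must do: borrow across the fields. */
struct tv
tv_sub(struct tv a, struct tv b)
{
	struct tv r = { a.sec - b.sec, a.usec - b.usec };

	if (r.usec < 0) {
		r.sec--;
		r.usec += 1000000;
	}
	return r;
}

/* Two-field comparison, as a timercmp(a, b, >=) would need. */
int
tv_ge(struct tv a, struct tv b)
{
	return a.sec > b.sec || (a.sec == b.sec && a.usec >= b.usec);
}

int
expired_tv(struct tv now, struct tv start, struct tv limit)
{
	return tv_ge(tv_sub(now, start), limit);
}

/* The nanosecond version is plain integer arithmetic. */
int
expired_ns(uint64_t now, uint64_t start, uint64_t limit)
{
	assert(now >= start);	/* uptime does not go backwards */
	return now - start >= limit;
}
```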
2021-06-15	get the uptime before comparing to it.	David Gwynne
	"that seems kind of important" jmatthew@
2021-06-15	factor out nsecuptime and getnsecuptime.	David Gwynne
	these functions were implemented in a bunch of places with comments saying it should be moved to kern_tc.c when more pop up, and i was about to add another one. i think it's time to move them to kern_tc.c. ok cheloha@ jmatthew@
2021-06-15	rework pfsync deferral timeout handling.	David Gwynne
	instead of having a timeout per deferred packet structure, use a single timeout in pfsync that pulls items off the list of deferred packets. this avoids confusion about whether a timeout is handling the defer or another context owns it. this way round, the context that removes a defer from the list owns it and is responsible for completing it. this should fix a panic we hit on the firewalls at work. there's still another one that needs a fix, but sashan@ has been looking at it. this might make it simpler to deal with though. ok sashan@ jmatthew@
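The ownership rule in the deferral timeout entry (whoever removes a deferral from the list completes it) can be sketched with a plain queue and one handler. This is an assumption-laden illustration: the `deferrals_*` names, the `done` flag and the explicit `now` argument are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deferral record; entries are queued in deadline order. */
struct deferral {
	struct deferral	*next;
	uint64_t	 deadline;
	int		 done;
};

struct deferrals {
	struct deferral	 *head;
	struct deferral	**tailp;
};

void
deferrals_init(struct deferrals *q)
{
	q->head = NULL;
	q->tailp = &q->head;
}

void
deferrals_add(struct deferrals *q, struct deferral *pd, uint64_t deadline)
{
	assert(pd != NULL);
	pd->next = NULL;
	pd->deadline = deadline;
	pd->done = 0;
	*q->tailp = pd;
	q->tailp = &pd->next;
}

/*
 * The single timeout handler: pull every expired entry off the head.
 * Once an entry is off the list this context owns it and completes it.
 * Returns the number of entries completed.
 */
int
deferrals_timeout(struct deferrals *q, uint64_t now)
{
	struct deferral *pd;
	int n = 0;

	while ((pd = q->head) != NULL && pd->deadline <= now) {
		q->head = pd->next;
		if (q->head == NULL)
			q->tailp = &q->head;
		pd->next = NULL;
		pd->done = 1;	/* stand-in for sending the deferred packet */
		n++;
	}
	return n;
}
```

Because there is no per-entry timer, there is never a question of whether a timer or another context is about to touch an entry that has already been removed.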
2021-06-09	whitespace tweak. no functional change.	David Gwynne

2021-06-02	With parallel execution of pf_test() two packets may try to update the same	Alexandr Nedvedicky
	state in pfsync(4) queue. pfsync_q_ins() takes that race into account with one exception: the KASSERT() at line 2352. That KASSERT() needs to be removed.
	2346 void
	2347 pfsync_q_ins(struct pf_state *st, int q)
	2348 {
	2349 	struct pfsync_softc *sc = pfsyncif;
	2350 	size_t nlen, sc_len;
	2351
	2352 	KASSERT(st->sync_state == PFSYNC_S_NONE);
	2353
	2354 #if defined(PFSYNC_DEBUG)
	2355 	if (sc->sc_len < PFSYNC_MINPKT)
	2356 		panic("pfsync pkt len is too low %zd", sc->sc_len);
	2357 #endif
	2358 	do {
	2359 		mtx_enter(&sc->sc_mtx[q]);
	2360
	2361 		/*
	2362 		 * If two threads are competing to insert the same state, then
	2363 		 * there must be just single winner.
	2364 		 */
	2365 		if (st->sync_state != PFSYNC_S_NONE) {
	2366 			mtx_leave(&sc->sc_mtx[q]);
	2367 			break;
	2368 		}
	OK bluhm@
2021-06-02	whitespace tweaks, no functional change.	David Gwynne

2021-06-02	only read the if_bpf pointer once.	David Gwynne

2021-06-02	tpmr_input is called in an smr crit section, so it doesnt need its own.	David Gwynne
	this simplifies the code a little bit.
2021-06-02	read the tpmr if_flags once in tpmr_input so link flags apply consistently.	David Gwynne
	this avoids IFF_LINK1 getting set by another cpu halfway through tpmr_input. if LINK1 is not set when a packet enters a tpmr pair it skips ip/pf checks, but if it is then set then only pf is run against it. this way you either get the ip checks and pf when the packet enters and leaves tpmr, or you dont get the ip and pf checks at all.
2021-06-02	use ipv4_check and ipv6_check to, well, check ip headers before running pf.	David Gwynne
	unlike bridge(4), these checks are only run when the packet is entering the veb/tpmr topology. the assumption is that only valid IP packets end up inside the topology so we don't have to check them when they're leaving. ok bluhm@ sashan@
2021-06-02	use ipv4_check and ipv6_check provided by the network stacks.	David Gwynne
	this removes the duplication of the check code, and lets the v6 code in particular pick up a lot more sanity checks around valid addresses on the wire. ok bluhm@ sashan@
2021-06-01	Check `so_state' in rtm_senddesync() and return if SS_ISCONNECTED or	mvs
	SS_CANTRCVMORE bits are set. The first check is required to prevent timeout_add(9) from rescheduling `rop_timeout', otherwise timeout_del_barrier(9) can't help us. The second check is for the case when shutdown(2) with the SHUT_RD argument occurred on this socket and we should not receive anything, including RTM_DESYNC packets. ok claudio@
2021-06-01	a couple of minor whitespace tweaks. no functional change.	David Gwynne
	am i a pf hacker now?
2021-05-30	Declare all struct protosw as constant.	Alexander Bluhm
	OK mvs@
2021-05-27	ajacoutot says i missed copying some bits from bridge for divert-to.	David Gwynne

2021-05-27	ajacoutot says i missed copying some bits from bridge for divert-to.	David Gwynne

2021-05-26	Use `so_lock' to protect key management (PF_KEY) sockets. This can be	mvs
	done because we have no cases where one thread should lock two sockets simultaneously. tested by yasuoka@ ok bluhm@ markus@
2021-05-26	add support for pf divert-to on tpmr, like what was done for veb(4).	David Gwynne
	when a divert-to rule applies to a packet, pf doesnt take the packet away and shove it in the socket directly. pf marks the packet, and then ip (or ipv6) input processing looks at the mark and picks the local socket to queue it on. because tpmr operates at layer 2, ip input has no chance to look at the packet and let the divert socket steal it. bridge(4) and now veb(4) handle this by checking if the packet has the pf divert-to mark set on it and calling ip input if it's set. this copies the semantic to tpmr.
2021-05-26	support divert-to when pf applies it to a packet.	David Gwynne
	when a divert-to rule applies to a packet, pf doesnt take the packet away and shove it in the socket directly. pf marks the packet, and then ip (or ipv6) input processing looks at the mark and picks the local socket to queue it on. because veb operates at layer 2, ip input processing only occurred if the packet was destined to go into a vport interface. bridge(4) handles this by checking if the packet has the pf divert-to mark set on it and calling ip input if it's set. this copies the semantic to veb. this allows divert-to to steal (take?) packets going over a veb and process them on a local socket. reported by ajacoutot@
2021-05-25	As network features are not added dynamically, the domain structures	Alexander Bluhm
	are constant. Having more const makes MP review easier. More pointers are mapped read-only in the kernel image. OK deraadt@ mvs@
2021-05-25	The arrays sadb_exts_allowed_out and sadb_exts_required_out are	Alexander Bluhm
	constant. Put global variable declarations into header file. OK mvs@ mpi@
2021-05-17	fix state key reference underflow, when sk == skrev	Alexandr Nedvedicky
	the bug has been reported by Sebastien and Olivier Cherrier. it has turned out that pf_state_key_link_reverse() does not grab enough references when both state keys (sk and skrev) are identical. This makes pf trip an assert later, when references are being dropped:
	panic(ffffffff81dfbc8e) at panic+0x11d
	__assert(ffffffff81e64b54,ffffffff81e0a6ee,33a,ffffffff81e03b7f)
	refcnt_rele(fffffd810bf02458) at refcnt_rele+0x6f
	pf_state_key_unref(fffffd810bf023f0) at pf_state_key_unref+0x21
	pf_remove_state(fffffd810c0c4578) at pf_remove_state+0x1fa
	pf_purge_expired_states(2) at pf_purge_expired_states+0x232
	pf_purge(ffffffff82236a30) at pf_purge+0x33
	taskq_thread(ffff800000032080) at taskq_thread+0x81
	fix tested by Olivier Cherrier and semarie@ OK semarie@
2021-05-17	Revert. Last change should not have been committed.	Claudio Jeker

2021-05-17	Increase the default buffer space used on PF_UNIX sockets to 8k.	Claudio Jeker
	Additionally make the values tuneable via sysctl. OK deraadt@ mvs@
2021-05-16	panic does not require a \n at the end. When one is provided, it looks wrong.	Theo de Raadt

2021-05-16	In route detach we delete `rop_timeout' while `rop' is still linked to	mvs
	`rtp_list' so it could be re-added by a concurrent thread. Also timeout_del(9) doesn't wait for the timeout proc to finish and timeout_del_barrier(9) should be used for that. So use timeout_del_barrier(9) instead of timeout_del(9) and move it just after refcnt_finalize(9). This fixes a potential use-after-free issue in route_detach(). ok mpi@
2021-05-15	Fix IPsec NAT-T to work with pipex(4). Introduce a new packet tag	YASUOKA Masahiko
	PACKET_TAG_IPSEC_FLOWINFO to specify the IPsec flow. ok mvs@
2021-05-06	Kill pfkeyv2_parsemessage() declaration which is absolutely useless	mvs
	here since it's declared in net/pfkeyv2.h. Also kill unused RETURN_EINVAL() macro. ok mpi@
2021-05-04	Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that	mvs
	at runtime within pfkeyv2_send(). Also set its interrupt protection level to IPL_SOFTNET. ok bluhm@ mpi@
2021-05-04	Remove unused `spd_tables' declaration.	mvs
	ok bluhm@ mpi@
2021-05-02	Do soreserve() before `kp' allocation. This simplifies error path. The	mvs
	same was done for route_attach(). Also do soisconnected() after `kp' is fully initialized. This chair movement affects nothing for PF_KEY sockets but makes code more consistent. ok bluhm@ mpi@
2021-05-01	Implement per-socket `so_lock' rwlock(9) and use it to protect routing	mvs
	(PF_ROUTE) sockets. This can be done because we have no cases where one thread should lock two sockets simultaneously. Compared to the previous version, rtm_senddesync_timer() execution was moved to process context. Also this time `so_lock' is used for routing sockets only, but in the future it will be used for other socket types too. tested by claudio@ ok claudio@ bluhm@
2021-04-27	pf_state_key_link_reverse() is prone to race on parallel forwarding	Alexandr Nedvedicky
	we need to adjust assertions. at the time we call pf_state_key_link_reverse() the state_key is either linked to the correct reverse peer or not linked at all. pf_state_key_link_reverse() is called as a reader on state_lock. There might be more packets, which try to update the state key. OK bluhm@
2021-04-26	Revert per-socket `so_lock' rwlock(9) and use it to protect routing	Claudio Jeker
	(PF_ROUTE) sockets. There is a locking issue with timeouts that needs to be fixed. Requested by deraadt@
2021-04-25	Implement per-socket `so_lock' rwlock(9) and use it to protect routing	mvs
	(PF_ROUTE) sockets. This can be done because we have no cases where one thread should lock two sockets simultaneously. Also this time `so_lock' is used for routing sockets only, but in the future it will be used for other socket types too. ok bluhm@
2021-04-23	call klist_invalidate from bpfsdetach to tell kq listeners what happened.	David Gwynne
	without this, something using a kevent to monitor a bpf fd on an idle interface never has the event fire, which means it never realises the interface goes away. with this, the read event goes off and the next read fails with EIO, like pretty much every other driver when the underlying device is removed. ok claudio@ visa@ jmatthew@
2021-04-23	only skip pf once for packets that are injected by a divert-packet socket.	David Gwynne
	when a divert socket gets a packet from userland to send back through the kernel, it marks it as diverted so pf knows not to divert it back to userland again. this marking stuck to the packet though, so if it went through pf again (eg, on the way out of the network stack) pf would skip it again. this is undesirable if you want pf to do things to the packet on this outgoing hop, such as nat. this has pf clear the mark once it's been used, which allows the next run of a packet through pf to have stuff work on it. found by some people at parta networks. ok sashan@ lteo@ bluhm@ bluhm@ also suggested keeping my diff in the same style as the rest of pf.c
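The "skip once" behaviour in the divert-packet entry amounts to consuming a flag the first time it is honoured. A minimal sketch follows, assuming a hypothetical `struct pkt` with a `PKT_DIVERTED` bit; the real pf code keeps this mark elsewhere and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet flag and header; not the pf definitions. */
#define PKT_DIVERTED	0x01

struct pkt {
	unsigned int	flags;
};

/*
 * Returns true if the filter should skip this packet on this pass.
 * The mark is cleared as it is used, so a later pass over the same
 * packet (for example on the way out) is filtered normally.
 */
bool
filter_should_skip(struct pkt *p)
{
	assert(p != NULL);
	if (p->flags & PKT_DIVERTED) {
		p->flags &= ~PKT_DIVERTED;
		return true;
	}
	return false;
}
```

Calling the function twice on a freshly reinjected packet skips the first pass and filters the second.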