|
OK sashan@ bluhm@
|
|
OK dlg@
|
|
and then rounded before checking. Put the same check before the
calculations to avoid overflow.
Reported-by: syzbot+6f29d23eca959c5a9705@syzkaller.appspotmail.com
OK claudio@
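The shape of the fix is generic: validate the user-supplied size before rounding it, so the round-up itself cannot overflow. A minimal sketch of the idea (names like bpf_maxbufsize are used for illustration, not taken from the diff):

    /* sketch: check the size before and after rounding so the
     * round-up cannot overflow the integer type */
    u_int size = *(u_int *)addr;

    if (size > bpf_maxbufsize)
        return (EINVAL);            /* reject before any arithmetic */
    size = roundup(size, sizeof(long));
    if (size > bpf_maxbufsize)      /* re-check the rounded value */
        return (EINVAL);
    d->bd_bufsize = size;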
|
|
This avoids verb overlap with f_modify.
|
|
OK mpi@
|
|
bluhm@ hit a problem while running a regress test where a packet
generated and injected via bpf ends up being consumed by the network
stack. the stack assumes that packets are aligned properly, but bpf
was lazy and put whatever was written to it at the start of an mbuf.
ethernet has a 14 byte header, so if you put that at the start the
payload will be misaligned by 2 bytes.
bpf already has handling for different link header types, so this
handling is extended a bit to align the payload after the link
header.
while here we're fixing up a few error codes. short packets produce
EINVAL instead of EPERM, and packets larger than the biggest mbuf
the kernel supports generate EMSGSIZE.
with tweaks and ok bluhm@
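a rough sketch of the idea (not the committed diff): offset the
copied-in packet so the payload behind the link header lands on a
word boundary.

    /* sketch: shift the start of the mbuf data so the bytes after
     * the link header are word aligned */
    hlen = 14;                                  /* DLT_EN10MB header */
    align = roundup(hlen, sizeof(uint32_t)) - hlen;   /* 2 for ethernet */
    m->m_data += align;
    error = uiomove(mtod(m, caddr_t), len, uio);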
|
|
this builds on the mpsafe kq/kevent work visa has been doing.
normally kevents are notified by calling selwakeup, but selwakeup
needs the KERNEL_LOCK. because bpf runs from all sorts of contexts
that may or may not have the kernel lock, the call to selwakeup is
deferred to the systq which already has the kernel lock. while this
avoids spinning in bpf for the kernel lock, it still adds latency
between when the buffer is ready for a program and when that program
gets notified about it. now that bpf kevents are mpsafe and bpf_wakeup
is already holding the necessary locks, we can avoid that latency.
bpf_wakeup now checks if there are waiting kevents and notifies
them immediately. if there are no other things to wake up, bpf_wakeup
avoids the task_add (and associated reference counting) to defer
the selwakeup call.
selwakeup can still try to notify waiting kevents, so this uses the
hint passed to knote() to differentiate between the notifications
from bpf_wakeup and selwakeup, returning early from the latter.
ok visa@
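a sketch of the shape of this change (assuming selwakeup passes
NOTE_SUBMIT as its hint; treat that constant as an assumption here):

    /* in bpf_wakeup(), bd_mtx held: notify waiting kevents now
     * instead of waiting for the deferred selwakeup */
    if (!klist_empty(&d->bd_sel.si_note))
        KNOTE(&d->bd_sel.si_note, 0);

    /* in filt_bpfread(): the later selwakeup notification is
     * redundant, so return early from it */
    int
    filt_bpfread(struct knote *kn, long hint)
    {
        struct bpf_d *d = kn->kn_hook;

        if (hint == NOTE_SUBMIT)    /* came via selwakeup(), */
            return (0);             /* already notified above */

        kn->kn_data = d->bd_hlen;
        return (kn->kn_data > 0);
    }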
|
|
Use bd_mtx to serialize bpf knote handling. This allows calling the
event filter without the kernel lock.
OK mpi@
|
|
The lookup should not fail because the kernel lock should prevent
simultaneous detaching on the vnode layer. However, most other device
kqfilter routines check the lookup's outcome anyway, which is maybe
a bit more forgiving.
OK mpi@
|
|
|
|
Reported by Peter J. Philipp.
OK mpi@
|
|
these functions were implemented in a bunch of places with comments
saying they should be moved to kern_tc.c when more pop up, and i was
about to add another one. i think it's time to move them to kern_tc.c.
ok cheloha@ jmatthew@
|
|
without this, something using a kevent to monitor a bpf fd on an
idle interface never has the event fire, which means it never
realises the interface goes away. with this, the read event goes
off and the next read fails with EIO, like pretty much every other
driver when the underlying device is removed.
ok claudio@ visa@ jmatthew@
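a sketch of what the read filter can do once the interface is gone
(field checks assumed, not the committed diff):

    /* sketch: fire the read event on a detached descriptor so a
     * blocked kevent consumer notices the interface went away */
    if (d->bd_bif == NULL) {
        kn->kn_flags |= EV_EOF;
        return (1);         /* event fires; the next read returns EIO */
    }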
|
|
ok claudio@ mvs@
|
|
the vlan tag we're injecting into the mbuf chain is either straight
off the wire and therefore already has the vlan priority encoded,
or is straight after it's been set up by vlan(4), which also has
the prio already encoded.
ok kn@ visa@ mvs@
|
|
bpf_catchpacket had a chunk to deal with reader timeouts, but that
has largely been moved to bpfread. the vestigial code that was left
still tried to wake up a reader when a buffer got full, but there
already is a chunk of code that wakes up readers when the buffer
gets full.
bpf_wakeup now checks for readers before calling wakeup directly,
rather than pushing the wakeup to a task and calling it unconditionally.
the task_add is now only done when the bpfdesc actually has something
that needs it.
ok visa@
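the resulting shape of bpf_wakeup is roughly this (a condensed
sketch with field names assumed from bpfdesc.h; the real function
also handles poll and kevent consumers):

    void
    bpf_wakeup(struct bpf_d *d)
    {
        MUTEX_ASSERT_LOCKED(&d->bd_mtx);

        /* wake sleeping readers directly, without the task detour */
        if (d->bd_nreaders)
            wakeup(d);

        /* defer the selwakeup work to the systq only when the
         * descriptor actually has consumers that need it */
        if (d->bd_async && d->bd_sig) {
            bpf_get(d);             /* the task holds a reference */
            if (!task_add(systq, &d->bd_wake_task))
                bpf_put(d);
        }
    }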
|
|
Change bd_rtout to a uint64_t of nanoseconds. Update the code in
bpfioctl() and bpfread() accordingly.
Add a local copy of nsecuptime() to make the diff smaller. This will
need to move to kern_tc.c if/when we have another user elsewhere in
the kernel.
Prompted by mpi@. With input from dlg@.
ok dlg@ mpi@ visa@
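The local helper is tiny; a sketch of its likely shape:

    /* uptime as a 64-bit nanosecond count */
    uint64_t
    nsecuptime(void)
    {
        struct timespec now;

        nanouptime(&now);
        return (TIMESPEC_TO_NSEC(&now));
    }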
|
|
bd_rdStart is strange. It nominally represents the start of a read(2)
on a given bpf(4) descriptor, but there are several problems with it:
1. If there are multiple readers, the bd_rdStart is not set by subsequent
readers, so their timeout is screwed up. The read timeout should really
be tracked on a per-thread basis in bpfread().
2. We set bd_rdStart for poll(2), select(2), and kevent(2), even though
that makes no sense. We should not be setting bd_rdStart in bpfpoll()
or bpfkqfilter().
3. bd_rdStart is buggy. If ticks is 0 when the read starts then
bpf_catchpacket() won't wake up the reader. This is a problem
inherent to the design of bd_rdStart: it serves as both a boolean
and a scalar value, even though 0 is a valid value in the scalar
range.
So let's replace it with a better struct member. "bd_nreaders" is a
count of threads sleeping in bpfread(). It is incremented before a
thread goes to sleep in bpfread() and decremented when a thread wakes
up. If bd_nreaders is greater than zero when we reach bpf_catchpacket()
and fbuf is non-NULL we wake up all readers.
The read timeout, if any, is now tracked locally by the thread in
bpfread().
Unlike bd_rdStart, bpfpoll() and bpfkqfilter() don't touch
bd_nreaders.
Prompted by mpi@. Basic idea from dlg@. Lots of input from dlg@.
Tested by dlg@ with tcpdump(8) (blocking read) and flow-collector
(https://github.com/eait-itig/flow-collector, non-blocking read).
ok dlg@
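Sketched against the description above, the read side and the capture
side pair up like this (surrounding code assumed; shown with a
nanosecond sleep for clarity):

    /* in bpfread(): count this thread as a sleeping reader */
    d->bd_nreaders++;
    error = msleep_nsec(d, &d->bd_mtx, PRINET | PCATCH, "bpf",
        d->bd_rtout ? d->bd_rtout : INFSLP);
    d->bd_nreaders--;

    /* in bpf_catchpacket(): wake all readers when a free buffer
     * is available to rotate in */
    if (fbuf != NULL && d->bd_nreaders > 0)
        bpf_wakeup(d);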
|
|
Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
These functions assume that the caller has locked the klist. The current
state of locking remains intact because the kernel lock is still used
with all klists.
Add new functions klist_insert() and klist_remove() that lock the klist
internally. This allows some code simplification.
OK mpi@
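The new wrappers are thin; a plausible sketch, given the commit's note
that the kernel lock still covers all klists at this point:

    void
    klist_insert(struct klist *klist, struct knote *kn)
    {
        KERNEL_LOCK();
        klist_insert_locked(klist, kn);
        KERNEL_UNLOCK();
    }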
|
|
OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
|
|
this is so _bpf_mtap can look at the mbuf with packet headers on
it so it can fill in more stuff in the bpf_hdr struct.
ive been running this in production for most of a month now and
it's working well.
|
|
|
|
Reading and writing bd_rtout is not an atomic operation, so it needs
to be done under the per-descriptor mutex.
While here, start annotating locking in bpfdesc.h. There's lots more
to do on this front, but you have to start somewhere.
Tweaked by mpi@.
ok mpi@
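The pattern is the usual one for per-descriptor state; a sketch of the
ioctl side (case label real, surrounding details assumed):

    case BIOCSRTIMEOUT:
        /* ... validate the user-supplied timeout ... */
        mtx_enter(&d->bd_mtx);
        d->bd_rtout = rtout;        /* shared with bpfread() */
        mtx_leave(&d->bd_mtx);
        break;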
|
|
Unlike the other cases of sysctl_bounded_arr this one uses a dynamic limit.
OK millert@
|
|
this lets things calling bpf_mtap_hdr and related functions also
populate the extended bpf_hdr with the rcvif and prio and stuff.
|
|
the metadata is set if the mbuf is passed with an m_pkthdr, and
copies the mbuf's rcvif, priority, and flowid. it also carries the
direction of the packet.
this makes bpf_hdr a multiple of 4 bytes, which simplifies some
calculations a bit. it requires no changes in userland because
libpcap just treats the extra bytes in the header as padding and
skips over them to the payload.
this helps me verify things like whether the stack and a network
card agree about toeplitz hashes, and paves the way for doing more
interesting packet captures. being able to see where a packet came
from as it is leaving a machine is very useful.
ok mpi@
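the extended header comes out roughly like this (a sketch from the
description above; exact field names and packing belong to the commit):

    /* bpf_hdr extended with mbuf metadata, padded to a multiple
     * of 4 bytes so offset calculations stay simple */
    struct bpf_hdr {
        struct bpf_timeval bh_tstamp;   /* capture time */
        uint32_t        bh_caplen;      /* captured length */
        uint32_t        bh_datalen;     /* original packet length */
        uint16_t        bh_hdrlen;      /* length of this header */
        uint16_t        bh_ifidx;       /* rcvif interface index */
        uint16_t        bh_flowid;      /* mbuf flow id */
        uint8_t         bh_flags;       /* direction and priority */
        uint8_t         bh_drops;
    };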
|
|
If you set FIONBIO on a bpf(4) descriptor you enable non-blocking mode
and also clobber any read timeout set for the descriptor. The reverse
is also true: do BIOCSRTIMEOUT and you'll set a timeout and
simultaneously disable non-blocking status. The two are mutually
exclusive.
This relationship is undocumented and might cause a bug. At the
very least it makes reasoning about the code difficult.
This patch adds a new member to bpf_d, bd_rnonblock, to store the
non-blocking status of the descriptor. The read timeout is still
kept in bd_rtout.
With this in place, non-blocking status and the read timeout can
coexist. Setting one state does not clear the other, and vice versa.
Separating the two states also clears the way for changing the bpf(4)
read timeout to use the system clock instead of ticks. More on that
in a later patch.
With insight from dlg@ regarding the purpose of the read timeout.
ok dlg@
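A sketch of the separated states (field name from the patch,
surrounding code assumed):

    /* FIONBIO only toggles the flag and leaves bd_rtout alone */
    case FIONBIO:
        d->bd_rnonblock = *(int *)addr ? 1 : 0;
        break;

    /* in bpfread(), when no buffer is ready */
    if (d->bd_rnonblock)
        error = EWOULDBLOCK;        /* don't sleep; timeout untouched */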
|
|
for example, with locking assertions.
OK mpi@, anton@
|
|
adding more filter properties without cluttering the struct.
OK mpi@, anton@
|
|
The 3 subsystems: signal, poll/select and kqueue can now be addressed
separately.
Note that bpf(4) and audio(4) currently delay the wakeups to a separate
context in order to respect the KERNEL_LOCK() requirement. Sockets (UDP,
TCP) and pipes spin to grab the lock for the same reasons.
ok anton@, visa@
|
|
like USB only use the former and bpf_iflist was otherwise retaining
references to a freed bpf_if.
ok sashan
|
|
FIOGETOWN/SIOCGPGRP/TIOCGPGRP. Do this by determining the meaning of
the ID parameter inside the sigio code. Also add cases for FIOSETOWN
and FIOGETOWN where there have been TIOCSPGRP and TIOCGPGRP before.
These changes allow removing the ID translation from sys_fcntl() and
sys_ioctl().
Idea from NetBSD
OK mpi@, claudio@
|
|
something with csignal().
OK visa@
|
|
make the structs const so that the data are put in .rodata.
OK mpi@, deraadt@, anton@, bluhm@
|
|
BPF: remove redundant reference counting of filedescriptors
anton@ made the problem crystal clear:
I've been looking into a similar bpf panic reported by syzkaller,
which looks somewhat related. The one reported by syzkaller is caused
by issuing ioctl(SIOCIFDESTROY) on the interface which the packet filter
is attached to. This will in turn invoke the following functions
expressed as an inverted stacktrace:
1. bpfsdetach()
2. vdevgone()
3. VOP_REVOKE()
4. vop_generic_revoke()
5. vgonel()
6. vclean(DOCLOSE)
7. VOP_CLOSE()
8. bpfclose()
Note that bpfclose() is called before changing the vnode type. In
bpfclose(), the `struct bpf_d` is immediately removed from the global
bpf_d_list, and the thread might end up sleeping inside taskq_barrier(systq).
Since the bpf file descriptor (fd) is still present and valid, another
thread could perform an ioctl() on the fd only to fault since
bpfilter_lookup() will return NULL. The vnode is not locked in this path
either so it won't end up waiting on the ongoing vclean().
Steps to trigger this type of panic are straightforward; let two
processes run concurrently:
process A:
while true ; do ifconfig tun0 up ; ifconfig tun0 destroy ; done
process B:
while true ; do tcpdump -i tun0 ; done
the panic happens within a few seconds (Dell PowerEdge 710)
OK visa@, OK anton@
|
|
this is not needed now that the "public" api does not provide a way
to pass a custom copy function in for the internals to pass around.
ok claudio@ visa@
|
|
it was previously (ab)used by pflog, which has since been fixed.
apart from that nothing else used it, so we can trim the cruft.
ok kn@ claudio@ visa@
visa@ also made sure i fixed ipw(4) so i386 won't break.
|
|
pointed out by naddy@
|
|
this makes it easier to call at least, and makes it consistent with
bpf_tap_hdr.
ok stsp@ sashan@
|
|
ok anton@, sashan@
|
|
|
|
OK visa@, OK mpi@
|
|
prevent passing negative values to timeout_add().
While here, protect against unsigned wrap around during addition of
bd_rdStart and bd_rtout since it could also cause passing negative
values to timeout_add().
ok bluhm@
Reported-by: syzbot+6771e3d6d9567b3983aa@syzkaller.appspotmail.com
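The guard is simple in shape: check that the addition cannot wrap
before the result ever reaches timeout_add(). An illustrative sketch
of the idea, not the committed diff:

    /* bd_rdStart + bd_rtout feeds a tick deadline; if the sum
     * wraps it can reach timeout_add() as a negative value */
    if (d->bd_rtout > ULONG_MAX - d->bd_rdStart)
        return (EINVAL);            /* reject instead of wrapping */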
|
|
OK visa@
|
|
the timeout converted to ticks is later passed to timeout_add(); it
could cause a panic if the timeout is negative.
ok deraadt@ millert@
Reported-by: syzbot+82cb4dfe6a1fc3d8b490@syzkaller.appspotmail.com
|
|
BIOCSFILDROP could already be used as a quick and dirty
firewall, which is especially useful when you want to filter
non-ip things. however, capturing the packets you're dropping is a
lot of overhead when you just want to drop stuff. this extends
fildrop so you can tell bpf not to capture the packets it drops.
ok sthen@ mikeb@ claudio@ visa@
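the knob turns into a three-way setting; sketched from the
description (constant names and arguments shown for illustration):

    #define BPF_FILDROP_PASS    0   /* capture, don't drop */
    #define BPF_FILDROP_CAPTURE 1   /* capture and drop */
    #define BPF_FILDROP_DROP    2   /* drop without capturing */

    /* in the filter-match path */
    if (d->bd_fildrop != BPF_FILDROP_DROP)
        bpf_catchpacket(d, pkt, pktlen, slen, tv);  /* args assumed */
    if (d->bd_fildrop != BPF_FILDROP_PASS)
        drop = 1;                   /* tell the caller to drop it */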
|
|
ioctl function after the device has been pulled out. Also accept
this error code in bpf_detachd() to prevent a kernel panic. tcpdump(8)
may run while the interface is detached.
from Moritz Buhl; OK stsp@
|
|
affects the bpfioctl() and bpfclose() path.
lock assertion reported and fix tested by Pierre Emeriaud; OK visa@
|
|
The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.
No objection from millert@, ok tedu@, bluhm@
|
|
internally it uses mbufs to handle the chain of buffers, but the
caller doesn't have to deal with that or allocate a temporary buffer
with the header attached.
ok mpi@
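a hedged usage sketch (the post-change signature is assumed here):

    /* capture a packet with a driver-built header prepended,
     * without staging it in a temporary flat buffer */
    struct myhdr h;                 /* hypothetical driver header */

    if (ifp->if_bpf != NULL)
        bpf_mtap_hdr(ifp->if_bpf, &h, sizeof(h), m,
            BPF_DIRECTION_OUT);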
|