path: root/sys/kern/kern_event.c
2021-12-25  kqueue: Invalidate revoked vnodes' knotes on the fly  (Visa Hankala)
When a tty device is revoked, the associated knotes should be invalidated. Otherwise the user processes can keep on receiving events from the device. It appears tricky to do the invalidation as part of revocation in a way that does not allow unwanted event registration or clutter the tty code. For now, make the knotes invalid lazily before delivery. OK mpi@
2021-12-20  Make filt_dead() selectively inactive with EVFILT_EXCEPT  (Visa Hankala)
When a knote uses the dead event filter, the knote's file descriptor is not supposed to point to an object with pending out-of-band data. Make the knote inactive so that userspace will not receive a spurious event. However, kqueue-based poll(2) should still receive HUP notifications. This lets the system use dead_filtops with fewer strings attached relative to the filter type.
2021-12-20  Run seltrue/dead event filter in modify and process callbacks  (Visa Hankala)
Do not assume event status in the modify and process callbacks. Instead always run the event filter so that it has a chance to set knote flags. The filter can also indicate event inactivity.
2021-12-11  Clarify usage of __EV_POLL and __EV_SELECT  (Visa Hankala)
Make __EV_POLL specific to kqueue-based poll(2), to remove overlap with __EV_SELECT that only select(2) uses. OK millert@ mpi@
2021-11-29  kqueue: Revise badfd knote handling  (Visa Hankala)
When closing a file descriptor and converting the poll/select knotes into badfd knotes, keep the knotes attached to the by-fd table. This should prevent kqueue_purge() from returning before the kqueue has become quiescent. This in turn should fix a KASSERT(TAILQ_EMPTY(&kq->kq_head)) panic in KQRELE() that bluhm@ has reported.

The badfd conversion is only needed when a poll/select scan is ongoing. The system can skip the conversion if the knote is not part of the active event set. The code of this commit skips the conversion when the fd is closed by the same thread that has done the fd polling. This can be improved but should already cover typical fd usage patterns.

As badfd knotes now hold slots in the by-fd table, kqueue_register() clears them. poll/select use kqueue_register() to set up a new scan; any found fd close notification is a leftover from the previous scan. The new badfd handling should be free of accidental knote accumulation. This obsoletes kqpoll_dequeue() and lowers kqpoll_init() overhead.

Re-enable lazy removal of poll/select knotes because the panic should no longer happen.

OK mpi@
2021-11-15  Revert to eager removal of poll/select knotes  (Visa Hankala)
This should prevent a panic that bluhm@ has reported.
2021-11-13  Let filt_fileattach() run without the kernel lock  (Visa Hankala)
This makes it possible to attach pipe, socket and kqueue event filters without acquiring the kernel lock. Event filters behind vn_kqfilter() are not MP-safe yet, so vn_kqfilter() has to take KERNEL_LOCK(). dmabuf_kqfilter() can skip locking because it has no side effects. OK anton@, mpi@
2021-11-12  Keep knotes between poll/select system calls  (Visa Hankala)
Reduce the time overhead of kqueue-based poll(2) and select(2) by keeping knotes registered between the system calls. It is expected that the set of monitored file descriptors is relatively unchanged between consecutive iterations of these system calls. By keeping the knotes, the system saves the effort of repeated knote unregistering and re-registering.

To avoid receiving events from file descriptors that are no longer in the monitored set, each poll/select knote is assigned an increasing serial number. Every iteration of poll/select uses a previously unused range of serials for its knotes. In the setup stage, kqueue_register() updates the serials of any existing knotes in the currently monitored set. Function kqueue_scan() delivers only the events whose serials are recent enough; expired knotes are dropped. When the serial range is about to wrap around, all the knotes in the kqueue backend are dropped.

This change is a space-time tradeoff. Memory usage is increased somewhat because of the retained knotes. The increase is limited by the number of open file descriptors and active threads.

Idea from DragonFly BSD, initial patch by mpi@, kqueue_scan()-based approach by me.

Tested by anton@ and mpi@

OK mpi@
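A minimal userspace model of the serial-number idea described above; the struct, field and function names here are illustrative and do not match the kernel code:

    #include <stdio.h>

    /* Toy stand-in for a poll/select knote. */
    struct knote_model {
        unsigned int kn_serial;     /* stamped by the latest scan setup */
        int fd;
    };

    static unsigned int scan_serial;    /* bumped once per poll/select call */

    /* Setup stage: refresh the stamp of a knote that is still monitored. */
    static void register_knote(struct knote_model *kn)
    {
        kn->kn_serial = scan_serial;
    }

    /* Delivery stage: only knotes stamped by the current scan are reported. */
    static int deliver_event(const struct knote_model *kn)
    {
        return kn->kn_serial == scan_serial;
    }

    int main(void)
    {
        struct knote_model a = { 0, 3 }, b = { 0, 4 };

        scan_serial++;              /* first poll() iteration: fds 3 and 4 */
        register_knote(&a);
        register_knote(&b);
        printf("a=%d b=%d\n", deliver_event(&a), deliver_event(&b));

        scan_serial++;              /* next iteration monitors only fd 3 */
        register_knote(&a);
        printf("a=%d b(stale)=%d\n", deliver_event(&a), deliver_event(&b));
        return 0;
    }

In the kernel, a stale knote found during the scan would be dropped rather than merely skipped, as the message above describes.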
2021-11-06  Make kqread event filter MP-safe  (Visa Hankala)
Use the monitored kqueue's kq_lock to serialize kqueue and knote access. Typically, the "object lock" would also cover the klist, but that is not possible with kqueues. knote_activate() needs kq_lock of the monitoring kqueue, which would create lock order troubles if kq_lock was held when calling KNOTE(&kq->kq_sel.si_note). Avoid this by using a separate klist lock for kqueues.

The new klist lock is system-wide. Each kqueue instance could have a dedicated klist lock. However, the efficacy of a dedicated versus a system-wide lock is somewhat limited because the current implementation activates kqueue knotes through a single thread.

OK mpi@
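A small userland illustration of the case this locking concerns: one kqueue monitored by another through EVFILT_READ (a sketch with minimal error handling):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <err.h>
    #include <stdio.h>

    int main(void)
    {
        struct kevent kev;
        int inner, outer;

        if ((inner = kqueue()) == -1 || (outer = kqueue()) == -1)
            err(1, "kqueue");

        /* The outer kqueue watches the inner one (kqread filter). */
        EV_SET(&kev, inner, EVFILT_READ, EV_ADD, 0, 0, NULL);
        if (kevent(outer, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent: outer");

        /* Give the inner kqueue a pending event: a short one-shot timer. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 10, NULL);
        if (kevent(inner, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent: inner");

        /* When the timer fires, the inner kqueue becomes readable and the
         * outer kqueue reports it; kev.data is the pending event count. */
        if (kevent(outer, NULL, 0, &kev, 1, NULL) == -1)
            err(1, "kevent: wait");
        printf("inner kqueue %lu has %lld pending event(s)\n",
            (unsigned long)kev.ident, (long long)kev.data);
        return 0;
    }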
2021-07-24  Modifying a knote must be done with the corresponding lock held.  (Martin Pieuchot)
Assert that the KERNEL_LOCK() is held unless the filter is marked as MPSAFE. This should help find missing locks when unlocking various filters. ok visa@
2021-07-22  Make kqpoll_dequeue() usable with lazy removal of knotes  (Visa Hankala)
Adjust kqpoll_dequeue() so that it will clear only badfd knotes when called from kqpoll_init(). This is needed by kqpoll's lazy removal of knotes. Eager removal in kqpoll_dequeue() would defeat kqpoll's attempt to reuse previously established knotes under workloads where knote activation tends to occur already before the next kqpoll scan. Prompted by mpi@
2021-06-16  kqueue: kq_lock is needed when updating kn_status  (Visa Hankala)
The kn_status field of struct knote is part of kqueue's internal state. When kn_status is being updated, kq_lock has to be locked. This is true even with MP-unsafe event filters. OK mpi@
2021-06-11  Remember to lock kqueue mutex in filt_timermodify().  (Visa Hankala)
Reported-by: syzbot+c2aba7645a218ce03027@syzkaller.appspotmail.com
2021-06-10  Serialize internals of kqueue with a mutex  (Visa Hankala)
Extend struct kqueue with a mutex and use it to serialize the internals of each kqueue instance. This should make it possible to call kqueue's system call interface without the kernel lock. The event source facing side of kqueue should now be MP-safe, too, as long as the event source itself is MP-safe.

msleep() with PCATCH still requires the kernel lock. To cope with this, kqueue_scan() locks the kernel temporarily for the section that may sleep.

As a consequence of the kqueue mutex, knote_acquire() can lose a wakeup when klist_invalidate() calls it. To preserve proper nesting of mutexes, knote_acquire() has to release the kqueue mutex before it unlocks the klist. This early unlocking of the mutex lets badly timed wakeups go unnoticed. However, the system should not hang because the sleep has a timeout.

Tested by gnezdo@ and mpi@

OK mpi@
2021-06-02  Enable pool cache on knote pool  (Visa Hankala)
Use the pool cache to reduce the overhead of memory management in function kqueue_register(). When EV_ADD is given, kqueue_register() pre-allocates a knote to avoid potential sleeping in the middle of the critical section that spans from knote lookup to insertion. However, the pre-allocation is useless if the lookup finds a matching knote.

The cost of knote allocation will become significant with kqueue-based poll(2) and select(2) because the frequency of allocation will increase. Most of the cost appears to come from the locking inside the pool. The pool cache amortizes it by using CPU-local caches of free knotes as buffers.

OK dlg@ mpi@
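A kernel-side sketch of what enabling the per-CPU cache on a pool looks like with pool_cache_init(9); the wait-channel name, flags and IPL shown here are assumptions and may differ from what kern_event.c actually uses, and the init function name is made up:

    #include <sys/param.h>
    #include <sys/event.h>
    #include <sys/pool.h>

    struct pool knote_pool;

    void
    knote_pool_setup(void)      /* hypothetical init helper */
    {
        pool_init(&knote_pool, sizeof(struct knote), 0, IPL_MPFLOOR,
            PR_WAITOK, "knotepl", NULL);
        pool_cache_init(&knote_pool);   /* per-CPU lists of free items */
    }

After this, pool_get()/pool_put() on the pool can often be satisfied from the local CPU's cache without taking the pool mutex.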
2021-04-22  kqueue: Make timer re-addition reset existing timer  (Visa Hankala)
When an existing EVFILT_TIMER filter is re-added, cancel the existing timer and any pending event, and restart the timer using the new timeout period. This makes the new timeout period take effect immediately and matches the behaviour of FreeBSD. Previously, the new setting was applied only after the existing timer expired.

The timer rescheduling is done by using an f_modify callback. The reading of timer events is moved from f_event to f_process. f_event of timer_filtops becomes redundant. Unlike most other event sources, timers activate knotes directly without using a klist and knote(9).

OK mpi@
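A small userland example of the behaviour change: re-adding the same EVFILT_TIMER with a new period (milliseconds in the data field) now takes effect right away instead of after the old period expires:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <err.h>
    #include <stdio.h>

    int main(void)
    {
        struct kevent kev;
        int kq;

        if ((kq = kqueue()) == -1)
            err(1, "kqueue");

        /* Arm timer 1 with a 5000 ms period. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, 0, 5000, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent");

        /* Re-add the same timer with a 100 ms period.  With this change
         * the pending 5000 ms timeout is cancelled and the timer restarts
         * immediately with the new period. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, 0, 100, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent");

        if (kevent(kq, NULL, 0, &kev, 1, NULL) == -1)
            err(1, "kevent");
        printf("timer %lu fired\n", (unsigned long)kev.ident);
        return 0;
    }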
2021-02-27  Replace stray direct call of f_event with filter_event().  (Visa Hankala)
This does not change the current behaviour, but filterops should be invoked through filter_*() for consistency.
2021-02-24  kqueue: Revise filterops interface  (Visa Hankala)
Extend kqueue's filterops interface with new callbacks so that it becomes easier to use with fine-grained locking. The new interface delegates the serialization of kn_event access to event sources. Now kqueue uses filterops callbacks to read or write kn_event. This hides event sources' locking patterns from kqueue, and allows clean implementation of atomic read-and-clear for EV_CLEAR, for instance.

There are so many existing filterops instances that converting all of them in one go is tricky. This patch adds a wrapper mechanism that kqueue uses when the new callbacks are missing.

The new filterops interface has been influenced by XNU's kqueue.

OK mpi@ semarie@
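A sketch of what a converted filter might register under the extended interface; the filt_example* callbacks are placeholders, not functions from the tree, and the field set is an assumption based on the description above:

    /* Kernel-side fragment; not standalone code. */
    const struct filterops example_filtops = {
        .f_flags   = FILTEROP_ISFD,
        .f_attach  = filt_exampleattach,
        .f_detach  = filt_exampledetach,
        .f_event   = filt_example,          /* legacy notification path */
        .f_modify  = filt_examplemodify,    /* update kn_event under the source's lock */
        .f_process = filt_exampleprocess,   /* read (and clear) kn_event for delivery */
    };

Filters that provide only f_event keep working through the wrapper mechanism mentioned above.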
2021-01-27  kqueue: Fix termination assert  (Visa Hankala)
When a kqueue file is closed, the kqueue can still have threads scanning it. Consequently, kqueue_terminate() can see scan markers in the event queue. These markers are removed when the scanning threads leave the kqueue. Take this into account when checking the queue's state, to avoid a panic when kqueue is closed from under a thread. OK anton@ Reported-by: syzbot+757c60a2aa1125137cce@syzkaller.appspotmail.com
2021-01-17  kqueue: Revise fd close notification  (Visa Hankala)
Deliver file descriptor close notification for __EV_POLL knotes through struct kevent that kqueue_scan() returns. This replaces the previous way of returning EBADF from kqueue_scan(), making it easier to determine what exactly has changed.

When a file descriptor is closed, its __EV_POLL knotes are turned into one-shot events and queued for delivery. These knotes are "unregistered" as they are reachable only through the queue of active events. This reduces interference with the normal workings of kqueue. However, more care is needed to avoid leaking knotes. In addition, the unregistering removes a limit on the number of issued knotes. To prevent accumulation of pending fd close notifications, kqpoll_init() flushes the active queue at the start of a kqpoll scan.

OK mpi@
2021-01-08  Lock kernel before raising SPL in klist_lock()  (Visa Hankala)
This prevents unwanted spinning with interrupts disabled. At the moment, this code is only invoked through klist_invalidate() and the callers should already hold the kernel lock. Also, one could argue that in MP-unsafe contexts klist_lock() should only assert for the kernel lock.
2021-01-07  Adjust comment about klist_invalidate()  (Visa Hankala)
2020-12-25  Refactor klist insertion and removal  (Visa Hankala)
Rename klist_{insert,remove}() to klist_{insert,remove}_locked(). These functions assume that the caller has locked the klist. The current state of locking remains intact because the kernel lock is still used with all klists. Add new functions klist_insert() and klist_remove() that lock the klist internally. This allows some code simplification. OK mpi@
2020-12-23  Ensure that filt_dead() takes effect  (Visa Hankala)
Invoke dead_filtops' f_event callback in klist_invalidate() to ensure that filt_dead() modifies every invalidated knote. If a knote has EV_ONESHOT set in its event flags, kqueue_scan() will not call f_event. OK mpi@
2020-12-23  Clear error before each iteration in kqueue_scan()  (Visa Hankala)
This fixes a regression where kqueue_scan() may incorrectly return EWOULDBLOCK after a timeout. OK mpi@
2020-12-20  Introduce klistops  (Visa Hankala)
This patch extends struct klist with a callback descriptor and an argument. The main purpose of this is to let the kqueue subsystem assert when a klist should be locked, and operate the klist lock in klist_invalidate().

Access to a knote list of a kqueue-monitored object has to be serialized somehow. Because the object often has a lock for protecting its state, and because the object often acquires this lock at the latest in its f_event callback function, it makes sense to use this lock also for the knote lists. The existing uses of NOTE_SUBMIT already show a pattern that is likely to become more prevalent.

There could be an embedded lock in klist. However, such a lock would be redundant in many cases. The code cannot rely on a single lock type (mutex, rwlock, something else) because the needs of monitored objects vary. In addition, an embedded lock would introduce new lock order constraints. Note that the patch does not rule out use of dedicated klist locks.

The patch introduces a way to associate lock operations with a klist. The caller can provide a custom implementation, or use a ready-made interface with a mutex or rwlock. For compatibility with old code, the new code falls back to using the kernel lock if no specific klist initialization has been done. The existing code already relies on implicit initialization of klist.

Sadly, this change increases the size of struct klist. dlg@ thinks this is not fatal, though.

OK mpi@
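A kernel-side sketch of the ready-made mutex variant mentioned above, assuming the klist_init_mutex(9) interface; the driver structure and names are made up:

    /* Kernel-side fragment; not standalone code. */
    #include <sys/param.h>
    #include <sys/event.h>
    #include <sys/mutex.h>

    struct mydev {                      /* hypothetical softc */
        struct mutex d_mtx;             /* guards device state */
        struct klist d_klist;           /* knotes interested in this device */
    };

    void
    mydev_attach(struct mydev *d)
    {
        mtx_init(&d->d_mtx, IPL_NONE);
        /* The klist now uses d_mtx, so the kqueue subsystem can assert
         * and take the lock itself, e.g. in klist_invalidate(). */
        klist_init_mutex(&d->d_klist, &d->d_mtx);
    }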
2020-12-18  Add fd close notification for kqueue-based poll() and select()  (Visa Hankala)
When the file descriptor of an __EV_POLL-flagged knote is closed, post EBADF through the kqueue instance to the caller of kqueue_scan(). This lets kqueue-based poll() and select() preserve their current behaviour of returning EBADF when a polled file descriptor is closed concurrently. OK mpi@
2020-12-18  Make knote_{activate,remove}() internal to kern_event.c.  (Visa Hankala)
OK mpi@
2020-12-16  Remove kqueue_free() and use KQRELE() in kqpoll_exit().  (Visa Hankala)
Because kqpoll instances are now linked to the file descriptor table, the freeing of kqpoll and ordinary kqueues is similar. Suggested by mpi@
2020-12-16  Link kqpoll instances to fd_kqlist.  (Visa Hankala)
This lets the system remove kqpoll-related event registrations when a file descriptor is closed. OK mpi@
2020-12-15  Use nkev in place of count in kqueue_scan().  (Visa Hankala)
OK cheloha@, mpi@, mvs@
2020-12-09  Add kernel-only per-thread kqueue & helpers to initialize and free it.  (Martin Pieuchot)
This will soon be used by select(2) and poll(2). ok anton@, visa@
2020-12-07  Refactor kqueue_scan() so it can be used by other syscalls.  (Martin Pieuchot)
Stop iterating in the function and instead copy the returned events to userland after every call. ok visa@
2020-11-25  Change kqueue_scan() to keep track of collected events in the given context.  (Martin Pieuchot)
It is now possible to call the function multiple times to collect events. For that, the end marker has to be preserved between calls because otherwise the scan might collect an event more than once. If a collected event gets reactivated during scanning, it will be added at the tail of the queue, out of reach because of the end marker. This is required to implement select(2) and poll(2) on top of kqueue_scan(). Done & originally committed by visa@ in r1.143, in snap for more than 2 weeks. ok visa@, anton@
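A small userspace model of the end-marker behaviour (plain TAILQ code, not the kernel's): an event requeued during the scan lands behind the marker and is left for the next call:

    #include <sys/queue.h>
    #include <stdio.h>

    struct ev {
        TAILQ_ENTRY(ev) link;
        int id;                 /* 0 marks the end-of-scan marker */
    };
    TAILQ_HEAD(evq, ev);

    int main(void)
    {
        struct evq q = TAILQ_HEAD_INITIALIZER(q);
        struct ev e1 = { .id = 1 }, e2 = { .id = 2 }, marker = { .id = 0 };
        struct ev *e;

        TAILQ_INSERT_TAIL(&q, &e1, link);
        TAILQ_INSERT_TAIL(&q, &marker, link);   /* this scan covers e1 only */

        /* e2 is activated while the scan is in progress: it queues up
         * behind the marker, out of reach of the current scan. */
        TAILQ_INSERT_TAIL(&q, &e2, link);

        /* Collect up to the marker; e2 stays queued for the next call. */
        while ((e = TAILQ_FIRST(&q)) != &marker) {
            TAILQ_REMOVE(&q, e, link);
            printf("collected event %d\n", e->id);
        }
        printf("left for the next scan: %d\n", TAILQ_NEXT(&marker, link)->id);
        return 0;
    }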
2020-10-26  kevent(2): ktrace the timeout before validating it  (cheloha)
As deraadt@ has pointed out, tracing timevals and timespecs before validating them makes debugging easier.
2020-10-11  Refactor kqueue_scan() to use a context: a "kqueue_scan_state struct".  (Martin Pieuchot)
The struct keeps track of the end point of an event queue scan by persisting the end marker. This will be needed when kqueue_scan() is called repeatedly to complete a scan in a piecewise fashion. Extracted from a previous diff from visa@. ok visa@, anton@
2020-08-12  Reduce stack usage of kqueue_scan()  (Visa Hankala)
Reuse the kev[] array of sys_kevent() in kqueue_scan() to lower stack usage. The code has reset kevp, but not nkev, whenever the retry branch is taken. However, the resetting is unnecessary because retry should be taken only if no events have been collected. Make this clearer by adding KASSERTs. OK mpi@
2020-07-04  Use klist_invalidate() in knote_processexit()  (Visa Hankala)
This leaves knote_remove() for kqueue's internal use.

As a result, knote_remove() is used to drop knotes from the knlist of a single kqueue instance. klist_invalidate() clears knotes from a klist that can contain entries from different kqueue instances.

Use FILTEROP_ISFD to control how klist_invalidate() treats knotes, to preserve the current behaviour of knote_processexit(). All the existing callers of klist_invalidate() are fd-based. The existing code rewires and activates knotes to give userspace a clear indication that the state of the fd has changed. In knote_processexit(), any remaining knotes in ps_klist are non-fd-based (EVFILT_SIGNAL). Those are dropped without notifying userspace.

OK mpi@
2020-06-22  Extend kqueue interface with EVFILT_EXCEPT filter.  (Martin Pieuchot)
This filter, already implemented in macOS and DragonFly BSD, returns exceptional conditions like the reception of out-of-band data. The functionality is similar to poll(2)'s POLLPRI & POLLRDBAND and it can be used by the kqfilter-based poll & select implementation. ok millert@ on a previous version, ok visa@
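A hedged userland example of the filter: watching a TCP socket for urgent (out-of-band) data with EVFILT_EXCEPT and NOTE_OOB, roughly what POLLPRI reports with poll(2). The loopback setup is only there to make the example self-contained:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <err.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        struct sockaddr_in sin;
        socklen_t len = sizeof(sin);
        struct kevent kev;
        int lsn, cli, srv, kq;

        /* Loopback TCP pair: listen on an ephemeral port, connect to it. */
        if ((lsn = socket(AF_INET, SOCK_STREAM, 0)) == -1)
            err(1, "socket");
        memset(&sin, 0, sizeof(sin));
        sin.sin_len = sizeof(sin);
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        if (bind(lsn, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
            listen(lsn, 1) == -1 ||
            getsockname(lsn, (struct sockaddr *)&sin, &len) == -1)
            err(1, "listen setup");
        if ((cli = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
            connect(cli, (struct sockaddr *)&sin, sizeof(sin)) == -1)
            err(1, "connect");
        if ((srv = accept(lsn, NULL, NULL)) == -1)
            err(1, "accept");

        /* Watch the server side for exceptional conditions (urgent data). */
        if ((kq = kqueue()) == -1)
            err(1, "kqueue");
        EV_SET(&kev, srv, EVFILT_EXCEPT, EV_ADD, NOTE_OOB, 0, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent: register");

        /* One byte of urgent data should activate the filter. */
        if (send(cli, "!", 1, MSG_OOB) == -1)
            err(1, "send");
        if (kevent(kq, NULL, 0, &kev, 1, NULL) == -1)
            err(1, "kevent: wait");
        printf("fd %lu: out-of-band data pending\n", (unsigned long)kev.ident);
        return 0;
    }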
2020-06-15  Implement a simple kqfilter for deadfs matching its poll handler.  (Martin Pieuchot)
ok visa@, millert@
2020-06-15  Raise SPL when modifying ps_klist to prevent a race with interrupts.  (Visa Hankala)
The list can be accessed from interrupt context if a signal is sent from an interrupt handler. OK anton@ cheloha@ mpi@
2020-06-14  Remove misleading XXX about locking of ps_klist.  (Visa Hankala)
All of the kqueue subsystem and ps_klist handling still run under the kernel lock.
2020-06-12  Revert addition of double underbars for filter-specific flag.  (Martin Pieuchot)
Port breakages reported by naddy@
2020-06-11  Rename poll-compatibility flag to better reflect what it is.  (Martin Pieuchot)
While here prefix kernel-only EV flags with two underbars. Suggested by kettenis@, ok visa@
2020-05-30  Introduce kqueue_terminate() & kqueue_free(), no functional changes.  (Martin Pieuchot)
These functions will be used to manage per-thread kqueues that are not associated with a file descriptor. ok visa@
2020-05-25  Revert "Add kqueue_scan_state struct"  (Visa Hankala)
sthen@ has reported that the patch might be causing hangs with X.
2020-05-17  Add kqueue_scan_state struct  (Visa Hankala)
The struct keeps track of the end point of an event queue scan by persisting the end marker. This will be needed when kqueue_scan() is called repeatedly to complete a scan in a piecewise fashion. The end marker has to be preserved between calls because otherwise the scan might collect an event more than once. If a collected event gets reactivated during scanning, it will be added at the tail of the queue, out of reach because of the end marker. OK mpi@
2020-04-07  Abstract the head of knote lists.  (Visa Hankala)
This allows extending the lists, for example, with locking assertions. OK mpi@, anton@
2020-04-07  Defer selwakeup() from kqueue_wakeup() to kqueue_task() to prevent deep recursion.  (Visa Hankala)
This also helps to make kqueue_wakeup() free of the kernel lock because the current implementation of selwakeup() requires the lock. OK mpi@
2020-04-02  Introduce kqueue_sleep(), a wrapper around the tsleep(9) dance.  (Martin Pieuchot)
While here check for the validity of the timeout at the beginning of the syscall. ok kettenis@, cheloha@