path: root/sys/kern/kern_event.c
2021-12-25  kqueue: Invalidate revoked vnodes' knotes on the fly  (Visa Hankala)
When a tty device is revoked, the associated knotes should be invalidated. Otherwise the user processes can keep on receiving events from the device. It appears tricky to do the invalidation as part of revocation in a way that does not allow unwanted event registration or clutter the tty code. For now, make the knotes invalid lazily before delivery. OK mpi@
2021-12-20  Make filt_dead() selectively inactive with EVFILT_EXCEPT  (Visa Hankala)
When a knote uses the dead event filter, the knote's file descriptor is not supposed to point to an object with pending out-of-band data. Make the knote inactive so that userspace will not receive a spurious event. However, kqueue-based poll(2) should still receive HUP notifications. This lets the system use dead_filtops with fewer strings attached relative to the filter type.
2021-12-20  Run seltrue/dead event filter in modify and process callbacks  (Visa Hankala)
Do not assume event status in the modify and process callbacks. Instead always run the event filter so that it has a chance to set knote flags. The filter can also indicate event inactivity.
2021-12-11  Clarify usage of __EV_POLL and __EV_SELECT  (Visa Hankala)
Make __EV_POLL specific to kqueue-based poll(2), to remove overlap with __EV_SELECT that only select(2) uses. OK millert@ mpi@
2021-11-29  kqueue: Revise badfd knote handling  (Visa Hankala)
When closing a file descriptor and converting the poll/select knotes into badfd knotes, keep the knotes attached to the by-fd table. This should prevent kqueue_purge() from returning before the kqueue has become quiescent. This in turn should fix a KASSERT(TAILQ_EMPTY(&kq->kq_head)) panic in KQRELE() that bluhm@ has reported.

The badfd conversion is only needed when a poll/select scan is ongoing. The system can skip the conversion if the knote is not part of the active event set. The code of this commit skips the conversion when the fd is closed by the same thread that has done the fd polling. This can be improved but should already cover typical fd usage patterns.

As badfd knotes now hold slots in the by-fd table, kqueue_register() clears them. poll/select use kqueue_register() to set up a new scan; any found fd close notification is a leftover from the previous scan. The new badfd handling should be free of accidental knote accumulation. This obsoletes kqpoll_dequeue() and lowers kqpoll_init() overhead.

Re-enable lazy removal of poll/select knotes because the panic should no longer happen.

OK mpi@
2021-11-15  Revert to eager removal of poll/select knotes  (Visa Hankala)
This should prevent a panic that bluhm@ has reported.
2021-11-13  Let filt_fileattach() run without the kernel lock  (Visa Hankala)
This makes it possible to attach pipe, socket and kqueue event filters without acquiring the kernel lock. Event filters behind vn_kqfilter() are not MP-safe yet, so vn_kqfilter() has to take KERNEL_LOCK(). dmabuf_kqfilter() can skip locking because it has no side effects. OK anton@, mpi@
2021-11-12  Keep knotes between poll/select system calls  (Visa Hankala)
Reduce the time overhead of kqueue-based poll(2) and select(2) by keeping knotes registered between the system calls. It is expected that the set of monitored file descriptors is relatively unchanged between consecutive iterations of these system calls. By keeping the knotes, the system saves the effort of repeated knote unregistering and re-registering.

To avoid receiving events from file descriptors that are no longer in the monitored set, each poll/select knote is assigned an increasing serial number. Every iteration of poll/select uses a previously unused range of serials for its knotes. In the setup stage, kqueue_register() updates the serials of any existing knotes in the currently monitored set. Function kqueue_scan() delivers only the events whose serials are recent enough; expired knotes are dropped. When the serial range is about to wrap around, all the knotes in the kqueue backend are dropped.

This change is a space-time tradeoff. Memory usage is increased somewhat because of the retained knotes. The increase is limited by the number of open file descriptors and active threads.

Idea from DragonFly BSD, initial patch by mpi@, kqueue_scan()-based approach by me.

Tested by anton@ and mpi@

OK mpi@
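A minimal userspace model of the serial-number idea described above; the struct, field and function names here are illustrative and do not match the kernel code:

    #include <stdio.h>

    /* Toy stand-in for a poll/select knote. */
    struct knote_model {
        unsigned int kn_serial;     /* stamped by the latest scan setup */
        int fd;
    };

    static unsigned int scan_serial;    /* bumped once per poll/select call */

    /* Setup stage: refresh the stamp of a knote that is still monitored. */
    static void register_knote(struct knote_model *kn)
    {
        kn->kn_serial = scan_serial;
    }

    /* Delivery stage: only knotes stamped by the current scan are reported. */
    static int deliver_event(const struct knote_model *kn)
    {
        return kn->kn_serial == scan_serial;
    }

    int main(void)
    {
        struct knote_model a = { 0, 3 }, b = { 0, 4 };

        scan_serial++;              /* first poll() iteration: fds 3 and 4 */
        register_knote(&a);
        register_knote(&b);
        printf("a=%d b=%d\n", deliver_event(&a), deliver_event(&b));

        scan_serial++;              /* next iteration monitors only fd 3 */
        register_knote(&a);
        printf("a=%d b(stale)=%d\n", deliver_event(&a), deliver_event(&b));
        return 0;
    }

In the kernel, a stale knote found during the scan would be dropped rather than merely skipped, as the message above describes.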
2021-11-06  Make kqread event filter MP-safe  (Visa Hankala)
Use the monitored kqueue's kq_lock to serialize kqueue and knote access. Typically, the "object lock" would also cover the klist, but that is not possible with kqueues. knote_activate() needs kq_lock of the monitoring kqueue, which would create lock order troubles if kq_lock was held when calling KNOTE(&kq->kq_sel.si_note). Avoid this by using a separate klist lock for kqueues.

The new klist lock is system-wide. Each kqueue instance could have a dedicated klist lock. However, the efficacy of a dedicated versus a system-wide lock is somewhat limited because the current implementation activates kqueue knotes through a single thread.

OK mpi@
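A small userland illustration of the case this locking concerns: one kqueue monitored by another through EVFILT_READ (a sketch with minimal error handling):

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <err.h>
    #include <stdio.h>

    int main(void)
    {
        struct kevent kev;
        int inner, outer;

        if ((inner = kqueue()) == -1 || (outer = kqueue()) == -1)
            err(1, "kqueue");

        /* The outer kqueue watches the inner one (kqread filter). */
        EV_SET(&kev, inner, EVFILT_READ, EV_ADD, 0, 0, NULL);
        if (kevent(outer, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent: outer");

        /* Give the inner kqueue a pending event: a short one-shot timer. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, 0, 10, NULL);
        if (kevent(inner, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent: inner");

        /* When the timer fires, the inner kqueue becomes readable and the
         * outer kqueue reports it; kev.data is the pending event count. */
        if (kevent(outer, NULL, 0, &kev, 1, NULL) == -1)
            err(1, "kevent: wait");
        printf("inner kqueue %lu has %lld pending event(s)\n",
            (unsigned long)kev.ident, (long long)kev.data);
        return 0;
    }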
2021-07-24  Modifying a knote must be done with the corresponding lock held.  (Martin Pieuchot)
Assert that the KERNEL_LOCK() is held unless the filter is marked as MPSAFE. This should help find missing locks when unlocking various filters. ok visa@
2021-07-22  Make kqpoll_dequeue() usable with lazy removal of knotes  (Visa Hankala)
Adjust kqpoll_dequeue() so that it will clear only badfd knotes when called from kqpoll_init(). This is needed by kqpoll's lazy removal of knotes. Eager removal in kqpoll_dequeue() would defeat kqpoll's attempt to reuse previously established knotes under workloads where knote activation tends to occur already before the next kqpoll scan. Prompted by mpi@
2021-06-16  kqueue: kq_lock is needed when updating kn_status  (Visa Hankala)
The kn_status field of struct knote is part of kqueue's internal state. When kn_status is being updated, kq_lock has to be locked. This is true even with MP-unsafe event filters. OK mpi@
2021-06-11  Remember to lock kqueue mutex in filt_timermodify().  (Visa Hankala)
Reported-by: syzbot+c2aba7645a218ce03027@syzkaller.appspotmail.com
2021-06-10  Serialize internals of kqueue with a mutex  (Visa Hankala)
Extend struct kqueue with a mutex and use it to serialize the internals of each kqueue instance. This should make it possible to call kqueue's system call interface without the kernel lock. The event source facing side of kqueue should now be MP-safe, too, as long as the event source itself is MP-safe.

msleep() with PCATCH still requires the kernel lock. To cope with this, kqueue_scan() locks the kernel temporarily for the section that may sleep.

As a consequence of the kqueue mutex, knote_acquire() can lose a wakeup when klist_invalidate() calls it. To preserve proper nesting of mutexes, knote_acquire() has to release the kqueue mutex before it unlocks the klist. This early unlocking of the mutex lets badly timed wakeups go unnoticed. However, the system should not hang because the sleep has a timeout.

Tested by gnezdo@ and mpi@

OK mpi@
2021-06-02  Enable pool cache on knote pool  (Visa Hankala)
Use the pool cache to reduce the overhead of memory management in function kqueue_register(). When EV_ADD is given, kqueue_register() pre-allocates a knote to avoid potential sleeping in the middle of the critical section that spans from knote lookup to insertion. However, the pre-allocation is useless if the lookup finds a matching knote.

The cost of knote allocation will become significant with kqueue-based poll(2) and select(2) because the frequency of allocation will increase. Most of the cost appears to come from the locking inside the pool. The pool cache amortizes it by using CPU-local caches of free knotes as buffers.

OK dlg@ mpi@
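A kernel-side sketch of what enabling the per-CPU cache on a pool looks like with pool_cache_init(9); the wait-channel name, flags and IPL shown here are assumptions and may differ from what kern_event.c actually uses, and the init function name is made up:

    #include <sys/param.h>
    #include <sys/event.h>
    #include <sys/pool.h>

    struct pool knote_pool;

    void
    knote_pool_setup(void)      /* hypothetical init helper */
    {
        pool_init(&knote_pool, sizeof(struct knote), 0, IPL_MPFLOOR,
            PR_WAITOK, "knotepl", NULL);
        pool_cache_init(&knote_pool);   /* per-CPU lists of free items */
    }

After this, pool_get()/pool_put() on the pool can often be satisfied from the local CPU's cache without taking the pool mutex.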
2021-04-22  kqueue: Make timer re-addition reset existing timer  (Visa Hankala)
When an existing EVFILT_TIMER filter is re-added, cancel the existing timer and any pending event, and restart the timer using the new timeout period. This makes the new timeout period take effect immediately and matches the behaviour of FreeBSD. Previously, the new setting was applied only after the existing timer expired.

The timer rescheduling is done by using an f_modify callback. The reading of timer events is moved from f_event to f_process. f_event of timer_filtops becomes redundant. Unlike most other event sources, timers activate knotes directly without using a klist and knote(9).

OK mpi@
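A small userland example of the behaviour change: re-adding the same EVFILT_TIMER with a new period (milliseconds in the data field) now takes effect right away instead of after the old period expires:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <err.h>
    #include <stdio.h>

    int main(void)
    {
        struct kevent kev;
        int kq;

        if ((kq = kqueue()) == -1)
            err(1, "kqueue");

        /* Arm timer 1 with a 5000 ms period. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, 0, 5000, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent");

        /* Re-add the same timer with a 100 ms period.  With this change
         * the pending 5000 ms timeout is cancelled and the timer restarts
         * immediately with the new period. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, 0, 100, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent");

        if (kevent(kq, NULL, 0, &kev, 1, NULL) == -1)
            err(1, "kevent");
        printf("timer %lu fired\n", (unsigned long)kev.ident);
        return 0;
    }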
2021-02-27  Replace stray direct call of f_event with filter_event().  (Visa Hankala)
This does not change the current behaviour, but filterops should be invoked through filter_*() for consistency.
2021-02-24  kqueue: Revise filterops interface  (Visa Hankala)
Extend kqueue's filterops interface with new callbacks so that it becomes easier to use with fine-grained locking. The new interface delegates the serialization of kn_event access to event sources. Now kqueue uses filterops callbacks to read or write kn_event. This hides event sources' locking patterns from kqueue, and allows clean implementation of atomic read-and-clear for EV_CLEAR, for instance.

There are so many existing filterops instances that converting all of them in one go is tricky. This patch adds a wrapper mechanism that kqueue uses when the new callbacks are missing.

The new filterops interface has been influenced by XNU's kqueue.

OK mpi@ semarie@
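A sketch of what a converted filter might register under the extended interface; the filt_example* callbacks are placeholders, not functions from the tree, and the field set is an assumption based on the description above:

    /* Kernel-side fragment; not standalone code. */
    const struct filterops example_filtops = {
        .f_flags   = FILTEROP_ISFD,
        .f_attach  = filt_exampleattach,
        .f_detach  = filt_exampledetach,
        .f_event   = filt_example,          /* legacy notification path */
        .f_modify  = filt_examplemodify,    /* update kn_event under the source's lock */
        .f_process = filt_exampleprocess,   /* read (and clear) kn_event for delivery */
    };

Filters that provide only f_event keep working through the wrapper mechanism mentioned above.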
2021-01-27  kqueue: Fix termination assert  (Visa Hankala)
When a kqueue file is closed, the kqueue can still have threads scanning it. Consequently, kqueue_terminate() can see scan markers in the event queue. These markers are removed when the scanning threads leave the kqueue. Take this into account when checking the queue's state, to avoid a panic when kqueue is closed from under a thread. OK anton@ Reported-by: syzbot+757c60a2aa1125137cce@syzkaller.appspotmail.com
2021-01-17  kqueue: Revise fd close notification  (Visa Hankala)
Deliver file descriptor close notification for __EV_POLL knotes through struct kevent that kqueue_scan() returns. This replaces the previous way of returning EBADF from kqueue_scan(), making it easier to determine what exactly has changed.

When a file descriptor is closed, its __EV_POLL knotes are turned into one-shot events and queued for delivery. These knotes are "unregistered" as they are reachable only through the queue of active events. This reduces interference with the normal workings of kqueue. However, more care is needed to avoid leaking knotes. In addition, the unregistering removes a limit on the number of issued knotes. To prevent accumulation of pending fd close notifications, kqpoll_init() flushes the active queue at the start of a kqpoll scan.

OK mpi@
2021-01-08  Lock kernel before raising SPL in klist_lock()  (Visa Hankala)
This prevents unwanted spinning with interrupts disabled. At the moment, this code is only invoked through klist_invalidate() and the callers should already hold the kernel lock. Also, one could argue that in MP-unsafe contexts klist_lock() should only assert for the kernel lock.
2021-01-07  Adjust comment about klist_invalidate()  (Visa Hankala)
2020-12-25  Refactor klist insertion and removal  (Visa Hankala)
Rename klist_{insert,remove}() to klist_{insert,remove}_locked(). These functions assume that the caller has locked the klist. The current state of locking remains intact because the kernel lock is still used with all klists. Add new functions klist_insert() and klist_remove() that lock the klist internally. This allows some code simplification. OK mpi@
2020-12-23  Ensure that filt_dead() takes effect  (Visa Hankala)
Invoke dead_filtops' f_event callback in klist_invalidate() to ensure that filt_dead() modifies every invalidated knote. If a knote has EV_ONESHOT set in its event flags, kqueue_scan() will not call f_event. OK mpi@
2020-12-23  Clear error before each iteration in kqueue_scan()  (Visa Hankala)
This fixes a regression where kqueue_scan() may incorrectly return EWOULDBLOCK after a timeout. OK mpi@
2020-12-20  Introduce klistops  (Visa Hankala)
This patch extends struct klist with a callback descriptor and an argument. The main purpose of this is to let the kqueue subsystem assert when a klist should be locked, and operate the klist lock in klist_invalidate().

Access to a knote list of a kqueue-monitored object has to be serialized somehow. Because the object often has a lock for protecting its state, and because the object often acquires this lock at the latest in its f_event callback function, it makes sense to use this lock also for the knote lists. The existing uses of NOTE_SUBMIT already show a pattern that is likely to become more prevalent.

There could be an embedded lock in klist. However, such a lock would be redundant in many cases. The code cannot rely on a single lock type (mutex, rwlock, something else) because the needs of monitored objects vary. In addition, an embedded lock would introduce new lock order constraints. Note that the patch does not rule out use of dedicated klist locks.

The patch introduces a way to associate lock operations with a klist. The caller can provide a custom implementation, or use a ready-made interface with a mutex or rwlock. For compatibility with old code, the new code falls back to using the kernel lock if no specific klist initialization has been done. The existing code already relies on implicit initialization of klist.

Sadly, this change increases the size of struct klist. dlg@ thinks this is not fatal, though.

OK mpi@
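A kernel-side sketch of the ready-made mutex variant mentioned above, assuming the klist_init_mutex(9) interface; the driver structure and names are made up:

    /* Kernel-side fragment; not standalone code. */
    #include <sys/param.h>
    #include <sys/event.h>
    #include <sys/mutex.h>

    struct mydev {                      /* hypothetical softc */
        struct mutex d_mtx;             /* guards device state */
        struct klist d_klist;           /* knotes interested in this device */
    };

    void
    mydev_attach(struct mydev *d)
    {
        mtx_init(&d->d_mtx, IPL_NONE);
        /* The klist now uses d_mtx, so the kqueue subsystem can assert
         * and take the lock itself, e.g. in klist_invalidate(). */
        klist_init_mutex(&d->d_klist, &d->d_mtx);
    }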
2020-12-18  Add fd close notification for kqueue-based poll() and select()  (Visa Hankala)
When the file descriptor of an __EV_POLL-flagged knote is closed, post EBADF through the kqueue instance to the caller of kqueue_scan(). This lets kqueue-based poll() and select() preserve their current behaviour of returning EBADF when a polled file descriptor is closed concurrently. OK mpi@
2020-12-18  Make knote_{activate,remove}() internal to kern_event.c.  (Visa Hankala)
OK mpi@
2020-12-16  Remove kqueue_free() and use KQRELE() in kqpoll_exit().  (Visa Hankala)
Because kqpoll instances are now linked to the file descriptor table, the freeing of kqpoll and ordinary kqueues is similar. Suggested by mpi@
2020-12-16  Link kqpoll instances to fd_kqlist.  (Visa Hankala)
This lets the system remove kqpoll-related event registrations when a file descriptor is closed. OK mpi@
2020-12-15  Use nkev in place of count in kqueue_scan().  (Visa Hankala)
OK cheloha@, mpi@, mvs@
2020-12-09  Add kernel-only per-thread kqueue & helpers to initialize and free it.  (Martin Pieuchot)
This will soon be used by select(2) and poll(2). ok anton@, visa@
2020-12-07  Refactor kqueue_scan() so it can be used by other syscalls.  (Martin Pieuchot)
Stop iterating in the function and instead copy the returned events to userland after every call. ok visa@
2020-11-25  Change kqueue_scan() to keep track of collected events in the given context.  (Martin Pieuchot)
It is now possible to call the function multiple times to collect events. For that, the end marker has to be preserved between calls because otherwise the scan might collect an event more than once. If a collected event gets reactivated during scanning, it will be added at the tail of the queue, out of reach because of the end marker. This is required to implement select(2) and poll(2) on top of kqueue_scan(). Done & originally committed by visa@ in r1.143, in snap for more than 2 weeks. ok visa@, anton@
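A small userspace model of the end-marker behaviour (plain TAILQ code, not the kernel's): an event requeued during the scan lands behind the marker and is left for the next call:

    #include <sys/queue.h>
    #include <stdio.h>

    struct ev {
        TAILQ_ENTRY(ev) link;
        int id;                 /* 0 marks the end-of-scan marker */
    };
    TAILQ_HEAD(evq, ev);

    int main(void)
    {
        struct evq q = TAILQ_HEAD_INITIALIZER(q);
        struct ev e1 = { .id = 1 }, e2 = { .id = 2 }, marker = { .id = 0 };
        struct ev *e;

        TAILQ_INSERT_TAIL(&q, &e1, link);
        TAILQ_INSERT_TAIL(&q, &marker, link);   /* this scan covers e1 only */

        /* e2 is activated while the scan is in progress: it queues up
         * behind the marker, out of reach of the current scan. */
        TAILQ_INSERT_TAIL(&q, &e2, link);

        /* Collect up to the marker; e2 stays queued for the next call. */
        while ((e = TAILQ_FIRST(&q)) != &marker) {
            TAILQ_REMOVE(&q, e, link);
            printf("collected event %d\n", e->id);
        }
        printf("left for the next scan: %d\n", TAILQ_NEXT(&marker, link)->id);
        return 0;
    }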
2020-10-26  kevent(2): ktrace the timeout before validating it  (cheloha)
As deraadt@ has pointed out, tracing timevals and timespecs before validating them makes debugging easier.
2020-10-11  Refactor kqueue_scan() to use a context: a "kqueue_scan_state struct".  (Martin Pieuchot)
The struct keeps track of the end point of an event queue scan by persisting the end marker. This will be needed when kqueue_scan() is called repeatedly to complete a scan in a piecewise fashion. Extracted from a previous diff from visa@. ok visa@, anton@
2020-08-12  Reduce stack usage of kqueue_scan()  (Visa Hankala)
Reuse the kev[] array of sys_kevent() in kqueue_scan() to lower stack usage. The code has reset kevp, but not nkev, whenever the retry branch is taken. However, the resetting is unnecessary because retry should be taken only if no events have been collected. Make this clearer by adding KASSERTs. OK mpi@
2020-07-04  Use klist_invalidate() in knote_processexit()  (Visa Hankala)
This leaves knote_remove() for kqueue's internal use.

As a result, knote_remove() is used to drop knotes from the knlist of a single kqueue instance. klist_invalidate() clears knotes from a klist that can contain entries from different kqueue instances.

Use FILTEROP_ISFD to control how klist_invalidate() treats knotes, to preserve the current behaviour of knote_processexit(). All the existing callers of klist_invalidate() are fd-based. The existing code rewires and activates knotes to give userspace a clear indication that the state of the fd has changed. In knote_processexit(), any remaining knotes in ps_klist are non-fd-based (EVFILT_SIGNAL). Those are dropped without notifying userspace.

OK mpi@
2020-06-22  Extend kqueue interface with EVFILT_EXCEPT filter.  (Martin Pieuchot)
This filter, already implemented in macOS and DragonFly BSD, returns exceptional conditions like the reception of out-of-band data. The functionality is similar to poll(2)'s POLLPRI & POLLRDBAND and it can be used by the kqfilter-based poll & select implementation. ok millert@ on a previous version, ok visa@
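A hedged userland example of the filter: watching a TCP socket for urgent (out-of-band) data with EVFILT_EXCEPT and NOTE_OOB, roughly what POLLPRI reports with poll(2). The loopback setup is only there to make the example self-contained:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <err.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        struct sockaddr_in sin;
        socklen_t len = sizeof(sin);
        struct kevent kev;
        int lsn, cli, srv, kq;

        /* Loopback TCP pair: listen on an ephemeral port, connect to it. */
        if ((lsn = socket(AF_INET, SOCK_STREAM, 0)) == -1)
            err(1, "socket");
        memset(&sin, 0, sizeof(sin));
        sin.sin_len = sizeof(sin);
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        if (bind(lsn, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
            listen(lsn, 1) == -1 ||
            getsockname(lsn, (struct sockaddr *)&sin, &len) == -1)
            err(1, "listen setup");
        if ((cli = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
            connect(cli, (struct sockaddr *)&sin, sizeof(sin)) == -1)
            err(1, "connect");
        if ((srv = accept(lsn, NULL, NULL)) == -1)
            err(1, "accept");

        /* Watch the server side for exceptional conditions (urgent data). */
        if ((kq = kqueue()) == -1)
            err(1, "kqueue");
        EV_SET(&kev, srv, EVFILT_EXCEPT, EV_ADD, NOTE_OOB, 0, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
            err(1, "kevent: register");

        /* One byte of urgent data should activate the filter. */
        if (send(cli, "!", 1, MSG_OOB) == -1)
            err(1, "send");
        if (kevent(kq, NULL, 0, &kev, 1, NULL) == -1)
            err(1, "kevent: wait");
        printf("fd %lu: out-of-band data pending\n", (unsigned long)kev.ident);
        return 0;
    }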
2020-06-15  Implement a simple kqfilter for deadfs matching its poll handler.  (Martin Pieuchot)
ok visa@, millert@
2020-06-15  Raise SPL when modifying ps_klist to prevent a race with interrupts.  (Visa Hankala)
The list can be accessed from interrupt context if a signal is sent from an interrupt handler. OK anton@ cheloha@ mpi@
2020-06-14  Remove misleading XXX about locking of ps_klist.  (Visa Hankala)
All of the kqueue subsystem and ps_klist handling still run under the kernel lock.
2020-06-12  Revert addition of double underbars for filter-specific flag.  (Martin Pieuchot)
Port breakages reported by naddy@
2020-06-11  Rename poll-compatibility flag to better reflect what it is.  (Martin Pieuchot)
While here prefix kernel-only EV flags with two underbars. Suggested by kettenis@, ok visa@
2020-05-30  Introduce kqueue_terminate() & kqueue_free(), no functional changes.  (Martin Pieuchot)
These functions will be used to manage per-thread kqueues that are not associated with a file descriptor. ok visa@
2020-05-25  Revert "Add kqueue_scan_state struct"  (Visa Hankala)
sthen@ has reported that the patch might be causing hangs with X.
2020-05-17  Add kqueue_scan_state struct  (Visa Hankala)
The struct keeps track of the end point of an event queue scan by persisting the end marker. This will be needed when kqueue_scan() is called repeatedly to complete a scan in a piecewise fashion. The end marker has to be preserved between calls because otherwise the scan might collect an event more than once. If a collected event gets reactivated during scanning, it will be added at the tail of the queue, out of reach because of the end marker. OK mpi@
2020-04-07  Abstract the head of knote lists.  (Visa Hankala)
This allows extending the lists, for example, with locking assertions. OK mpi@, anton@
2020-04-07  Defer selwakeup() from kqueue_wakeup() to kqueue_task() to prevent deep recursion.  (Visa Hankala)
This also helps to make kqueue_wakeup() free of the kernel lock because the current implementation of selwakeup() requires the lock. OK mpi@
2020-04-02  Introduce kqueue_sleep(), a wrapper around the tsleep(9) dance.  (Martin Pieuchot)
While here check for the validity of the timeout at the beginning of the syscall. ok kettenis@, cheloha@