path: root/sys/kern/kern_event.c
Age  Commit message  Author
2024-08-06  Stop using KERNEL_LOCK to protect the per process kqueue list  (Claudio Jeker)
Instead of the KERNEL_LOCK, use the ps_mtx for most operations. If the ps_klist is modified, an additional global rwlock (kqueue_ps_list_lock) is required. This includes the knotes with NOTE_FORK and NOTE_EXIT, since in either case a ps_klist is changed. In the NOTE_FORK | NOTE_TRACK case the call to kqueue_register() can sleep; this is why a global rwlock is used.

Adjust the reaper() to call knote_processexit() without KERNEL_LOCK.

Double lock idea from visa@

OK mvs@
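A hedged sketch of the double-lock scheme (the lock names are taken from the commit message; the surrounding function is illustrative, not the actual kernel code):

/*
 * Illustrative only: modifiers of pr->ps_klist take the global rwlock
 * first (it may be held across a sleep, e.g. in kqueue_register()),
 * then the per-process mutex. Plain readers need only pr->ps_mtx.
 */
struct rwlock kqueue_ps_list_lock = RWLOCK_INITIALIZER("kqpsl");

void
ps_klist_change(struct process *pr)
{
        rw_enter_write(&kqueue_ps_list_lock);
        mtx_enter(&pr->ps_mtx);
        /* ... insert into or remove from pr->ps_klist ... */
        mtx_leave(&pr->ps_mtx);
        rw_exit_write(&kqueue_ps_list_lock);
}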
2024-07-29  Move the signal related kqueue filters to kern_event.c.  (Claudio Jeker)
Since proc and signal filters share the same klist, it makes sense to keep them together. OK mvs@
2023-08-20  Add kqueue1() system call  (Visa Hankala)
kqueue1() takes a flags argument. This lets the kqueue file descriptor be opened with O_CLOEXEC. Adapted from NetBSD. OK guenther@
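A minimal usage sketch, assuming the OpenBSD prototype int kqueue1(int flags) declared in <sys/event.h>:

#include <sys/types.h>
#include <sys/event.h>
#include <fcntl.h>
#include <err.h>

int
main(void)
{
        /* Like kqueue(), but close-on-exec is set atomically. */
        int kq = kqueue1(O_CLOEXEC);

        if (kq == -1)
                err(1, "kqueue1");
        return 0;
}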
2023-08-13  kevent: Add precision and abstimer flags for EVFILT_TIMER  (Visa Hankala)
Add timer precision flags NOTE_SECONDS, NOTE_MSECONDS, NOTE_USECONDS and NOTE_NSECONDS for EVFILT_TIMER. Also, add an initial implementation of NOTE_ABSTIME timers. Similar kevent(2) flags exist on FreeBSD, NetBSD and XNU. Initial diff by and OK aisha@ OK mpi@
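For illustration, a sketch that arms a one-shot 1500 ms timer with the new precision flags (standard kevent(2) usage):

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>

int
main(void)
{
        struct kevent kev, out;
        int kq = kqueue();

        if (kq == -1)
                err(1, "kqueue");

        /* One-shot 1500 ms timer using the millisecond precision flag. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD | EV_ONESHOT, NOTE_MSECONDS,
            1500, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
                err(1, "kevent");

        /* Block until the timer fires. */
        if (kevent(kq, NULL, 0, &out, 1, NULL) == -1)
                err(1, "kevent");
        return 0;
}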
2023-04-11  fix double words in comments  (Jonathan Gray)
feedback and ok jmc@ miod, ok millert@
2023-02-10  Adjust knote(9) API  (Visa Hankala)
Make knote(9) lock the knote list internally, and add knote_locked(9) for the typical situation where the list is already locked. Remove the KNOTE(9) macro to simplify the API. Manual page OK jmc@ OK mpi@ mvs@
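A hedged kernel-side sketch of the resulting API, assuming a driver whose klist is tied to its own mutex (for example via klist_init_mutex()); the driver names are illustrative:

struct mydev {
        struct mutex    d_mtx;          /* protects state and d_klist */
        struct klist    d_klist;        /* assumed klist_init_mutex'd */
};

void
mydev_wakeup(struct mydev *d)
{
        /* Caller holds no locks: knote(9) locks the list internally. */
        knote(&d->d_klist, 0);
}

void
mydev_wakeup_locked(struct mydev *d)
{
        MUTEX_ASSERT_LOCKED(&d->d_mtx);
        /* The list is already locked via d_mtx: use knote_locked(9). */
        knote_locked(&d->d_klist, 0);
}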
2022-11-09  Remove kernel lock here since msleep() with PCATCH no longer requires it.  (Claudio Jeker)
OK mpi@
2022-08-14  remove unneeded includes in sys/kern  (Jonathan Gray)
ok mpi@ miod@
2022-07-09  Unwrap klist from struct selinfo as this code no longer uses selwakeup().  (Visa Hankala)
OK jsg@
2022-06-27  kqueue: Clear task when closing kqueue  (Visa Hankala)
When closing a kqueue, block until any pending wakeup task has finished. Otherwise, if a pending task progressed slowly, the kqueue could stay alive longer than the associated file descriptor table, causing a use-after-free in KQRELE(). This also fixes a failed assertion "p->p_kq->kq_refcnt.r_refs == 1" in kqpoll_exit().

The use-after-free bug had existed since the introduction of kqueue_task() (the bug could occur if fdplock() blocked in KQRELE()). However, the issue became worse when the task was allowed to run without the kernel lock in sys/kern/kern_event.c r1.187.

Prompted by a report from Mikhail on bugs@.

OK mpi@
Reported-by: syzbot+fca7e4fa773c90886819@syzkaller.appspotmail.com
2022-06-20  Remove unused struct fileops field fo_poll and callbacks.  (Visa Hankala)
OK mpi@
2022-06-12  kqueue: Fix missing wakeup  (Visa Hankala)
While one thread is running kqueue_scan(), another thread can begin scanning the same kqueue, observe that the event queue is empty, and go to sleep. If the first thread re-inserts a knote for re-processing, the second thread can miss the newly pending event. Wake up the kqueue after a re-insert to correct this. This fixes a Go test hang that jsing@ tracked down to kqueue. Tested in snaps for a week. OK jsing@ mpi@
2022-05-12  kqueue: Fix race condition in knote_remove()  (Visa Hankala)
Always fetch the knlist array pointer at the start of every iteration in knote_remove(). This prevents the use of a stale pointer after another thread has simultaneously reallocated the kq_knlist array. Reported and tested by and OK jsing@
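A simplified sketch of the pattern behind the fix (abbreviated; the real loop drops knotes and may sleep, which is what allows the concurrent reallocation):

void
knote_remove_all_sketch(struct kqueue *kq)
{
        struct knlist *list;
        int i;

        for (i = 0; i < kq->kq_knlistsize; i++) {
                /* Re-fetch each pass: kq_knlist may have been reallocated. */
                list = &kq->kq_knlist[i];
                /* ... detach and drop the knotes on list; may sleep ... */
        }
}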
2022-05-06  Replace selwakeup() with KNOTE() in kqueue event activation.  (Visa Hankala)
The deferred activation can now run in an MP-safe task queue.
2022-03-31  Move knote_processexit() call from exit1() to the reaper().  (Todd C. Miller)
This fixes a problem where NOTE_EXIT could be received before the process was officially a zombie and thus not immediately waitable. OK deraadt@ visa@
2022-03-16  Remove an unneeded include.  (Visa Hankala)
2022-03-16  Use the refcnt API in kqueue.  (Visa Hankala)
OK dlg@ bluhm@
2022-02-22  Delete unnecessary #includes of <sys/domain.h> and/or <sys/protosw.h>  (Philip Guenther)
net/if_pppx.c pointed out by jsg@ ok gnezdo@ deraadt@ jsg@ mpi@ millert@
2022-02-13  Use knote_modify() and knote_process() in obvious places.  (Visa Hankala)
2022-02-13  Rename knote_modify() to knote_assign()  (Visa Hankala)
This avoids verb overlap with f_modify.
2022-02-11  Inline klist_empty() for more economic machine code.  (Visa Hankala)
OK mpi@
2022-02-08  poll(2): Switch to kqueue backend  (Visa Hankala)
Implement the poll(2) system call on top of the kqueue subsystem. This obsoletes the old, non-MP-safe poll backend.

On entering poll(2), the new code translates each pollfd array entry into a set of knotes. When these knotes receive events through kqueue, the events are translated back to pollfd format.

Entries in the pollfd array can refer to the same file descriptor with overlapping event masks. To allow such overlap with knotes, use an extra kn_pollid key that separates knotes of different pollfd entries.

Adapted from DragonFly BSD, initial implementation by mpi@.

Tested in snaps for three weeks.

OK mpi@
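A simplified sketch of the forward translation (illustrative only; the kernel additionally keys each knote with kn_pollid so that duplicate fds with overlapping masks stay distinct):

#include <sys/types.h>
#include <sys/event.h>
#include <poll.h>

/* Map one pollfd entry to kevent registrations; returns the count. */
static int
pollfd_to_kevents(const struct pollfd *pfd, struct kevent *kev)
{
        int n = 0;

        if (pfd->events & (POLLIN | POLLRDNORM))
                EV_SET(&kev[n++], pfd->fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
        if (pfd->events & (POLLOUT | POLLWRNORM))
                EV_SET(&kev[n++], pfd->fd, EVFILT_WRITE, EV_ADD, 0, 0, NULL);
        return n;
}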
2021-12-25  kqueue: Invalidate revoked vnodes' knotes on the fly  (Visa Hankala)
When a tty device is revoked, the associated knotes should be invalidated. Otherwise the user processes can keep on receiving events from the device. It appears tricky to do the invalidation as part of revocation in a way that does not allow unwanted event registration or clutter the tty code. For now, make the knotes invalid lazily before delivery. OK mpi@
2021-12-20  Make filt_dead() selectively inactive with EVFILT_EXCEPT  (Visa Hankala)
When a knote uses the dead event filter, the knote's file descriptor is not supposed to point to an object with pending out-of-band data. Make the knote inactive so that userspace will not receive a spurious event. However, kqueue-based poll(2) should still receive HUP notifications.

This lets the system use dead_filtops with fewer strings attached relative to the filter type.
2021-12-20  Run seltrue/dead event filter in modify and process callbacks  (Visa Hankala)
Do not assume event status in the modify and process callbacks. Instead always run the event filter so that it has a chance to set knote flags. The filter can also indicate event inactivity.
2021-12-11  Clarify usage of __EV_POLL and __EV_SELECT  (Visa Hankala)
Make __EV_POLL specific to kqueue-based poll(2), to remove overlap with __EV_SELECT that only select(2) uses. OK millert@ mpi@
2021-11-29  kqueue: Revise badfd knote handling  (Visa Hankala)
When closing a file descriptor and converting the poll/select knotes into badfd knotes, keep the knotes attached to the by-fd table. This should prevent kqueue_purge() from returning before the kqueue has become quiescent. This in turn should fix a KASSERT(TAILQ_EMPTY(&kq->kq_head)) panic in KQRELE() that bluhm@ has reported.

The badfd conversion is only needed when a poll/select scan is ongoing. The system can skip the conversion if the knote is not part of the active event set. The code of this commit skips the conversion when the fd is closed by the same thread that has done the fd polling. This can be improved but should already cover typical fd usage patterns.

As badfd knotes now hold slots in the by-fd table, kqueue_register() clears them. poll/select use kqueue_register() to set up a new scan; any found fd close notification is a leftover from the previous scan.

The new badfd handling should be free of accidental knote accumulation. This obsoletes kqpoll_dequeue() and lowers kqpoll_init() overhead.

Re-enable lazy removal of poll/select knotes because the panic should no longer happen.

OK mpi@
2021-11-15  Revert to eager removal of poll/select knotes  (Visa Hankala)
This should prevent a panic that bluhm@ has reported.
2021-11-13  Let filt_fileattach() run without the kernel lock  (Visa Hankala)
This makes it possible to attach pipe, socket and kqueue event filters without acquiring the kernel lock. Event filters behind vn_kqfilter() are not MP-safe yet, so vn_kqfilter() has to take KERNEL_LOCK(). dmabuf_kqfilter() can skip locking because it has no side effects. OK anton@, mpi@
2021-11-12  Keep knotes between poll/select system calls  (Visa Hankala)
Reduce the time overhead of kqueue-based poll(2) and select(2) by keeping knotes registered between the system calls. It is expected that the set of monitored file descriptors is relatively unchanged between consecutive iterations of these system calls. By keeping the knotes, the system saves the effort of repeated knote unregistering and re-registering.

To avoid receiving events from file descriptors that are no longer in the monitored set, each poll/select knote is assigned an increasing serial number. Every iteration of poll/select uses a previously unused range of serials for its knotes. In the setup stage, kqueue_register() updates the serials of any existing knotes in the currently monitored set. Function kqueue_scan() delivers only the events whose serials are recent enough; expired knotes are dropped. When the serial range is about to wrap around, all the knotes in the kqueue backend are dropped.

This change is a space-time tradeoff. Memory usage is increased somewhat because of the retained knotes. The increase is limited by the number of open file descriptors and active threads.

Idea from DragonFly BSD, initial patch by mpi@, kqueue_scan()-based approach by me.

Tested by anton@ and mpi@

OK mpi@
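An illustrative sketch of the expiry check (the field and variable names here are hypothetical; the real bookkeeping lives in the kqueue backend):

/*
 * Hypothetical: a knote whose serial predates the serial range claimed
 * by the current scan belongs to an earlier poll/select iteration, so
 * its event is an expired leftover and must not be delivered.
 */
int
knote_serial_is_current(unsigned long kn_serial, unsigned long scan_start)
{
        return kn_serial >= scan_start;
}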
2021-11-06  Make kqread event filter MP-safe  (Visa Hankala)
Use the monitored kqueue's kq_lock to serialize kqueue and knote access. Typically, the "object lock" would also cover the klist, but that is not possible with kqueues: knote_activate() needs the kq_lock of the monitoring kqueue, which would create lock order troubles if kq_lock were held when calling KNOTE(&kq->kq_sel.si_note). Avoid this by using a separate klist lock for kqueues.

The new klist lock is system-wide. Each kqueue instance could have a dedicated klist lock. However, the efficacy of a dedicated versus a system-wide lock is somewhat limited because the current implementation activates kqueue knotes through a single thread.

OK mpi@
2021-07-24  Modifying a knote must be done with the corresponding lock held. Assert  (Martin Pieuchot)
that the KERNEL_LOCK() is held unless the filter is marked as MPSAFE. Should help finding missing locks when unlocking various filters. ok visa@
2021-07-22  Make kqpoll_dequeue() usable with lazy removal of knotes  (Visa Hankala)
Adjust kqpoll_dequeue() so that it will clear only badfd knotes when called from kqpoll_init(). This is needed by kqpoll's lazy removal of knotes. Eager removal in kqpoll_dequeue() would defeat kqpoll's attempt to reuse previously established knotes under workloads where knote activation tends to occur already before next kqpoll scan. Prompted by mpi@
2021-06-16  kqueue: kq_lock is needed when updating kn_status  (Visa Hankala)
The kn_status field of struct knote is part of kqueue's internal state. When kn_status is being updated, kq_lock has to be locked. This is true even with MP-unsafe event filters. OK mpi@
2021-06-11  Remember to lock kqueue mutex in filt_timermodify().  (Visa Hankala)
Reported-by: syzbot+c2aba7645a218ce03027@syzkaller.appspotmail.com
2021-06-10  Serialize internals of kqueue with a mutex  (Visa Hankala)
Extend struct kqueue with a mutex and use it to serialize the internals of each kqueue instance. This should make it possible to call kqueue's system call interface without the kernel lock. The event source facing side of kqueue should now be MP-safe, too, as long as the event source itself is MP-safe.

msleep() with PCATCH still requires the kernel lock. To manage with this, kqueue_scan() locks the kernel temporarily for the section that may sleep.

As a consequence of the kqueue mutex, knote_acquire() can lose a wakeup when klist_invalidate() calls it. To preserve proper nesting of mutexes, knote_acquire() has to release the kqueue mutex before it unlocks klist. This early unlocking of the mutex lets badly timed wakeups go unnoticed. However, the system should not hang because the sleep has a timeout.

Tested by gnezdo@ and mpi@

OK mpi@
2021-06-02  Enable pool cache on knote pool  (Visa Hankala)
Use the pool cache to reduce the overhead of memory management in function kqueue_register().

When EV_ADD is given, kqueue_register() pre-allocates a knote to avoid potential sleeping in the middle of the critical section that spans from knote lookup to insertion. However, the pre-allocation is useless if the lookup finds a matching knote.

The cost of knote allocation will become significant with kqueue-based poll(2) and select(2) because the frequency of allocation will increase. Most of the cost appears to come from the locking inside the pool. The pool cache amortizes it by using CPU-local caches of free knotes as buffers.

OK dlg@ mpi@
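A minimal sketch of enabling the per-CPU cache on a pool, assuming the pool_init(9)/pool_cache_init(9) interfaces; the initialization flags shown are illustrative:

/* Kernel context; struct knote from <sys/event.h>, pools from <sys/pool.h>. */
struct pool knote_pool;

void
knote_pool_setup(void)
{
        /* Ordinary pool backing struct knote allocations. */
        pool_init(&knote_pool, sizeof(struct knote), 0, IPL_MPFLOOR,
            PR_WAITOK, "knotepl", NULL);
        /* Attach CPU-local free-item caches to amortize pool locking. */
        pool_cache_init(&knote_pool);
}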
2021-04-22  kqueue: Make timer re-addition reset existing timer  (Visa Hankala)
When an existing EVFILT_TIMER filter is re-added, cancel the existing timer and any pending event, and restart the timer using the new timeout period. This makes the new timeout period take effect immediately and matches the behaviour of FreeBSD. Previously, the new setting was applied only after the existing timer expired.

The timer rescheduling is done by using an f_modify callback. The reading of timer events is moved from f_event to f_process. f_event of timer_filtops becomes redundant. Unlike most other event sources, timers activate knotes directly without using a klist and knote(9).

OK mpi@
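A userspace sketch of the behaviour change: re-adding the same timer ident now reschedules it immediately with the new period.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <err.h>

int
main(void)
{
        struct kevent kev;
        int kq = kqueue();

        if (kq == -1)
                err(1, "kqueue");

        /* Arm a periodic 5000 ms timer. */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, 0, 5000, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
                err(1, "kevent");

        /*
         * Re-add the same ident with a 100 ms period. The old timer
         * and any pending event are cancelled; the new period takes
         * effect immediately instead of after the old timeout expires.
         */
        EV_SET(&kev, 1, EVFILT_TIMER, EV_ADD, 0, 100, NULL);
        if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
                err(1, "kevent");
        return 0;
}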
2021-02-27  Replace stray direct call of f_event with filter_event().  (Visa Hankala)
This does not change the current behaviour, but filterops should be invoked through filter_*() for consistency.
2021-02-24  kqueue: Revise filterops interface  (Visa Hankala)
Extend kqueue's filterops interface with new callbacks so that it becomes easier to use with fine-grained locking. The new interface delegates the serialization of kn_event access to event sources. Now kqueue uses filterops callbacks to read or write kn_event. This hides event sources' locking patterns from kqueue, and allows clean implementation of atomic read-and-clear for EV_CLEAR, for instance.

There are so many existing filterops instances that converting all of them in one go is tricky. This patch adds a wrapper mechanism that kqueue uses when the new callbacks are missing.

The new filterops interface has been influenced by XNU's kqueue.

OK mpi@ semarie@
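For orientation, a sketch of the extended callback set as described above (check <sys/event.h> for the authoritative prototypes):

struct filterops {
        int     f_flags;                        /* e.g. FILTEROP_ISFD */
        int     (*f_attach)(struct knote *kn);
        void    (*f_detach)(struct knote *kn);
        int     (*f_event)(struct knote *kn, long hint);
        /* New: apply a changelist entry under the event source's lock. */
        int     (*f_modify)(struct kevent *kev, struct knote *kn);
        /* New: read out kn_event; enables atomic read-and-clear for EV_CLEAR. */
        int     (*f_process)(struct knote *kn, struct kevent *kev);
};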
2021-01-27  kqueue: Fix termination assert  (Visa Hankala)
When a kqueue file is closed, the kqueue can still have threads scanning it. Consequently, kqueue_terminate() can see scan markers in the event queue. These markers are removed when the scanning threads leave the kqueue. Take this into account when checking the queue's state, to avoid a panic when kqueue is closed from under a thread.

OK anton@
Reported-by: syzbot+757c60a2aa1125137cce@syzkaller.appspotmail.com
2021-01-17  kqueue: Revise fd close notification  (Visa Hankala)
Deliver file descriptor close notification for __EV_POLL knotes through struct kevent that kqueue_scan() returns. This replaces the previous way of returning EBADF from kqueue_scan(), making it easier to determine what exactly has changed.

When a file descriptor is closed, its __EV_POLL knotes are turned into one-shot events and queued for delivery. These knotes are "unregistered" as they are reachable only through the queue of active events. This reduces interference with the normal workings of kqueue. However, more care is needed to avoid leaking knotes. In addition, the unregistering removes a limit on the number of issued knotes. To prevent accumulation of pending fd close notifications, kqpoll_init() flushes the active queue at the start of a kqpoll scan.

OK mpi@
2021-01-08  Lock kernel before raising SPL in klist_lock()  (Visa Hankala)
This prevents unwanted spinning with interrupts disabled. At the moment, this code is only invoked through klist_invalidate() and the callers should already hold the kernel lock. Also, one could argue that in MP-unsafe contexts klist_lock() should only assert for the kernel lock.
2021-01-07  Adjust comment about klist_invalidate()  (Visa Hankala)
2020-12-25  Refactor klist insertion and removal  (Visa Hankala)
Rename klist_{insert,remove}() to klist_{insert,remove}_locked(). These functions assume that the caller has locked the klist. The current state of locking remains intact because the kernel lock is still used with all klists.

Add new functions klist_insert() and klist_remove() that lock the klist internally. This allows some code simplification.

OK mpi@
2020-12-23  Ensure that filt_dead() takes effect  (Visa Hankala)
Invoke dead_filtops' f_event callback in klist_invalidate() to ensure that filt_dead() modifies every invalidated knote. If a knote has EV_ONESHOT set in its event flags, kqueue_scan() will not call f_event. OK mpi@
2020-12-23  Clear error before each iteration in kqueue_scan()  (Visa Hankala)
This fixes a regression where kqueue_scan() may incorrectly return EWOULDBLOCK after a timeout. OK mpi@
2020-12-20  Introduce klistops  (Visa Hankala)
This patch extends struct klist with a callback descriptor and an argument. The main purpose of this is to let the kqueue subsystem assert when a klist should be locked, and operate the klist lock in klist_invalidate().

Access to a knote list of a kqueue-monitored object has to be serialized somehow. Because the object often has a lock for protecting its state, and because the object often acquires this lock at the latest in its f_event callback function, it makes sense to use this lock also for the knote lists. The existing uses of NOTE_SUBMIT already show a pattern that is likely to become more prevalent.

There could be an embedded lock in klist. However, such a lock would be redundant in many cases. The code cannot rely on a single lock type (mutex, rwlock, something else) because the needs of monitored objects vary. In addition, an embedded lock would introduce new lock order constraints. Note that the patch does not rule out use of dedicated klist locks.

The patch introduces a way to associate lock operations with a klist. The caller can provide a custom implementation, or use a ready-made interface with a mutex or rwlock. For compatibility with old code, the new code falls back to using the kernel lock if no specific klist initialization has been done. The existing code already relies on implicit initialization of klist.

Sadly, this change increases the size of struct klist. dlg@ thinks this is not fatal, though.

OK mpi@
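A hedged sketch of the ready-made mutex interface mentioned above, assuming a klist_init_mutex() initializer; the driver structure and fields are illustrative:

/*
 * Sketch: associate a driver's own mutex with its klist so that the
 * kqueue subsystem can assert on and operate the correct lock.
 */
struct mydev {
        struct mutex    d_mtx;          /* protects device state and klist */
        struct klist    d_klist;        /* knotes of monitoring kqueues */
};

void
mydev_attach(struct mydev *d)
{
        mtx_init(&d->d_mtx, IPL_MPFLOOR);
        klist_init_mutex(&d->d_klist, &d->d_mtx);
}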
2020-12-18  Add fd close notification for kqueue-based poll() and select()  (Visa Hankala)
When the file descriptor of an __EV_POLL-flagged knote is closed, post EBADF through the kqueue instance to the caller of kqueue_scan(). This lets kqueue-based poll() and select() preserve their current behaviour of returning EBADF when a polled file descriptor is closed concurrently. OK mpi@
2020-12-18  Make knote_{activate,remove}() internal to kern_event.c.  (Visa Hankala)
OK mpi@