|
of the v_un pointers).
OK jsg@ mvs@
|
|
|
|
This makes it clearer why lock order traces are sometimes not displayed.
Prompted by a question from, and OK anton@
|
|
When a kqueue file is closed, the kqueue can still have threads
scanning it. Consequently, kqueue_terminate() can see scan markers
in the event queue. These markers are removed when the scanning threads
leave the kqueue. Take this into account when checking the queue's
state, to avoid a panic when kqueue is closed from under a thread.
OK anton@
Reported-by: syzbot+757c60a2aa1125137cce@syzkaller.appspotmail.com
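A minimal sketch of such a state check, assuming OpenBSD's EVFILT_MARKER
placeholder filter and the kq_head/kn_tqe queue fields; the helper name
is illustrative:

    /* Ignore scan markers left by concurrent kqueue_scan() callers. */
    static int
    kqueue_has_events(struct kqueue *kq)
    {
            struct knote *kn;

            TAILQ_FOREACH(kn, &kq->kq_head, kn_tqe) {
                    if (kn->kn_filter != EVFILT_MARKER)
                            return (1);
            }
            return (0);
    }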
|
|
message "wroute" into dmesg. Since revision 1.263 pledge "wroute"
allows to change the routing table of a socket.
OK florian@ semarie@
|
|
ok dlg@
|
|
moved option control into a sysctl.
Reminder that we can delete this is from Benjamin Baier.
|
|
|
|
ok mpi@
|
|
This allows us to unlock getppid(2).
ok mpi@
|
|
Deliver file descriptor close notification for __EV_POLL knotes through
struct kevent that kqueue_scan() returns. This replaces the previous way
of returning EBADF from kqueue_scan(), making it easier to determine
what exactly has changed.
When a file descriptor is closed, its __EV_POLL knotes are turned into
one-shot events and queued for delivery. These knotes are "unregistered"
as they are reachable only through the queue of active events. This
reduces interference with the normal workings of kqueue. However, more
care is needed to avoid leaking knotes. In addition, the unregistering
removes a limit on the number of issued knotes. To prevent accumulation
of pending fd close notifications, kqpoll_init() flushes the active
queue at the start of a kqpoll scan.
OK mpi@
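A rough sketch of the one-shot conversion, using only public kevent(2)
flag names; the surrounding kernel plumbing is omitted and the field
usage is illustrative:

    kn->kn_flags |= EV_ONESHOT;     /* deliver once, then discard */
    knote_activate(kn);             /* queue on the active list */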
|
|
OK mpi@ as part of a larger diff
|
|
The syncer_thread() uses lbolt to perform periodic execution. We can
do this without lbolt.
- Add a local wakeup(9) channel (syncer_chan) and sleep on it.
- Use a local copy of getnsecuptime() to get 1/hz resolution for time
measurements. This is much better than using gettime(9), which is
wholly unsuitable for this use case. Measure how long we spend in
the loop and use this to calculate how long to sleep until the next
execution.
NB: getnsecuptime() is probably ready to be moved to kern_tc.c and
documented.
- Use the system uptime instead of the UTC time to avoid issues with
time jumps.
ok mpi@
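A sketch of such a local getnsecuptime(), assuming it wraps
getnanouptime(9) as the low-res source:

    static uint64_t
    getnsecuptime(void)
    {
            struct timespec now;

            getnanouptime(&now);    /* cheap, 1/hz resolution */
            return TIMESPEC_TO_NSEC(&now);
    }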
|
|
The global "tickadj" variable is a remnant of the old NTP adjustment
code we used in the kernel before the current timecounter subsystem
was imported from FreeBSD circa 2004 or 2005.
Fifteen years on, it is completely vestigial and we can remove it.
We probably should have removed it long ago but I guess it slipped
through the cracks. FreeBSD removed it in 2002:
https://cgit.freebsd.org/src/commit/?id=e1d970f1811e5e1e9c912c032acdcec6521b2a6d
NetBSD and DragonflyBSD can probably remove it, too.
We export tickadj via the kern.clockrate sysctl(2), so update sysctl.2
and sysctl(8) accordingly. Hypothetically this change could break
someone's sysctl(8) parsing script. I don't think that's very likely.
ok mvs@
|
|
Should prevent use of an uninitialized value as a bogus counter index.
OK mvs@ claudio@ anton@
|
|
Original port from NetBSD by guenther@, required for upcoming amap & anon
locking.
ok kettenis@
|
|
The common code is moved to sleep_signal_check() and instead of multiple
state variables for sls_sig and sls_unwind only one sls_sigerr is set.
This simplifies the checks in sleep_finish_signal() a great deal.
Idea from and OK mpi@
|
|
Removed a rash of +/-1 and made both functions shorter and more focused.
OK millert@
|
|
This changes amd64 GENERIC.MP .text size of kern_sysctl.o from 6440 to 6400.
Surprisingly, RAMDISK grows from 1645 to 1678.
OK millert@, mglocker@
|
|
OK millert@
|
|
Makes previously explicit checking less verbose.
OK millert@
|
|
Prefer error reporting to silent clipping.
OK millert@
|
|
error, a broadcast mbuf will stay in the socket buffer forever.
This is bad as multiple mbufs can use up all the space. Better
report ELOOP, dissolve splicing, and let userland handle it.
OK anton@
|
|
|
|
|
|
|
|
This prevents unwanted spinning with interrupts disabled.
At the moment, this code is only invoked through klist_invalidate()
and the callers should already hold the kernel lock. Also, one could
argue that in MP-unsafe contexts klist_lock() should only assert for
the kernel lock.
|
|
On sparc64, initmsgbuf() is invoked before curcpu() is usable
on the boot processor. Consequently, it is unsafe to use mutexes
during the message buffer initialization. Avoid such use by skipping
log_mtx when appending a newline from initmsgbuf().
Use mbp instead of msgbufp as the buffer argument to the putchar routine
for consistency.
Bug reported and fix suggested by miod@
|
|
The use of kqueue as backend has introduced a significant regression
in the performance of select(2), so go back to using the original code.
Some additional management overhead is to be expected when using kqueue.
However, the overhead of the current implementation is too high.
Reported by bluhm@ on bugs@
|
|
|
|
ones added to malloc() and free(). Pass the struct pool pointer as argv1
since it is currently not possible to pass the pool name to btrace.
OK mpi@
|
|
Change the pool(9) timeouts to use the system uptime instead of ticks.
- Change the timeouts from variables to macros so we can use
SEC_TO_NSEC(). This means these timeouts are no longer patchable
via ddb(4). dlg@ does not think this will be a problem, as the
timeout intervals have not changed in years.
- Use low-res time to keep things fast. Add a local copy of
getnsecuptime() to subr_pool.c to keep the diff small. We will need
to move getnsecuptime() into kern_tc.c and document it later if we
ever have other users elsewhere in the kernel.
- Rename ph_tick -> ph_timestamp and pr_cache_tick -> pr_cache_timestamp.
Prompted by tedu@ some time ago, but the effort stalled (may have been
my fault). Input from kettenis@ and dlg@.
Special thanks to mpi@ for help with struct shuffling. This change
does not increase the size of struct pool_page_header or struct pool.
ok dlg@ mpi@
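A sketch of the resulting pattern; POOL_WAIT_FREE is an assumed macro
name and the reclaim call is illustrative:

    #define POOL_WAIT_FREE  SEC_TO_NSEC(1)

            /* Release the page only after it has idled long enough. */
            if (getnsecuptime() - ph->ph_timestamp > POOL_WAIT_FREE)
                    pool_p_free(pp, ph);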
|
|
|
|
via dt(4) and btrace(8).
OK mpi@ millert@
|
|
OK anton@, mpi@
|
|
devices, introduce kern.video.record for video(4) devices. By default
kern.video.record will be set to zero, blanking all data delivered
by device drivers which attach to video(4).
The idea was initially proposed by
Laurence Tratt <laurie AT tratt DOT net>.
ok mpi@
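A userland sketch of reading the knob, assuming KERN_VIDEO and
KERN_VIDEO_RECORD mib names that follow the kern.audio.record pattern:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            int mib[3] = { CTL_KERN, KERN_VIDEO, KERN_VIDEO_RECORD };
            int record;
            size_t len = sizeof(record);

            if (sysctl(mib, 3, &record, &len, NULL, 0) == -1)
                    return 1;
            printf("kern.video.record=%d\n", record);
            return 0;
    }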
|
|
ok kettenis@, dlg@
|
|
OK mpi@
|
|
Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
These functions assume that the caller has locked the klist. The current
state of locking remains intact because the kernel lock is still used
with all klists.
Add new functions klist_insert() and klist_remove() that lock the klist
internally. This allows some code simplification.
OK mpi@
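Usage sketch; sc and kn are illustrative driver variables:

    /* Old behaviour, new name: caller holds the klist lock. */
    klist_insert_locked(&sc->sc_klist, kn);

    /* New variant: locks the klist internally. */
    klist_insert(&sc->sc_klist, kn);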
|
|
Make the SMR thread maintain an explicit system-wide grace period and
make CPUs observe the current grace period when crossing a quiescent
state. This lets the SMR thread avoid a forced context switch for CPUs
that have already entered the latest grace period.
This change provides a small improvement in smr_grace_wait()'s
performance in terms of context switching.
OK mpi@, anton@
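A loose sketch of the idea, with invented names for the global period
and the per-CPU copy:

    unsigned int smr_grace_period;  /* advanced by the SMR thread */

    /* At a quiescent state, observe the current grace period; CPUs
     * already in the latest period need no forced context switch. */
    ci->ci_smr_period = READ_ONCE(smr_grace_period);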
|
|
It would be convenient if there were a channel a thread could sleep on
to indicate they do not want any wakeup(9) broadcasts. The easiest way
to do this is to add an "int nowake" to kern_synch.c and extern it in
sys/systm.h. You use it like this:
#include <sys/systm.h>
tsleep_nsec(&nowake, ...);
There is now no need to handroll a local dead channel, e.g.
int chan;
tsleep_nsec(&chan, ...);
which expands the stack. Local dead channels will be replaced with
&nowake in later patches.
One possible problem with this "one global channel" approach is sleep
queue congestion. If you have lots of threads sleeping on &nowake you
might slow down a wakeup(9) on a different channel that hashes into
the same queue. Unsure how much of a problem this actually is, if at all.
NetBSD and FreeBSD have a "pause" interface in the kernel that chooses
a suitable channel automatically. To keep things simple and avoid
adding a new interface we will start with this global channel.
Discussed with mpi@, claudio@, kettenis@, and deraadt@.
Basically designed by kettenis@, who vetoed my other proposals.
Bugs caught by deraadt@, tb@, and patrick@.
|
|
Make it obvious where the thread is blocked. "pause" is ambiguous.
Tweaked by kettenis@.
Probably ok kettenis@.
|
|
We only see 8 characters of wmesg in e.g. top(1), so shorten the
string to fit.
Indirectly prompted by kettenis@.
|
|
Invoke dead_filtops' f_event callback in klist_invalidate() to ensure
that filt_dead() modifies every invalidated knote. If a knote has
EV_ONESHOT set in its event flags, kqueue_scan() will not call f_event.
OK mpi@
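A sketch of the invalidation step this implies; the direct f_event call
covers knotes that kqueue_scan() would otherwise skip:

    kn->kn_fop = &dead_filtops;
    (void)kn->kn_fop->f_event(kn, 0);   /* mark the knote dead/EOF */
    knote_activate(kn);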
|
|
This fixes a regression where kqueue_scan() may incorrectly return
EWOULDBLOCK after a timeout.
OK mpi@
|
|
The given set of fds is converted to equivalent kevents using EV_SET(2)
and passed to the scanning internals of kevent(2): kqueue_scan().
ktrace(1) will now output the converted kevents on top of the usual set
bits, to make it possible to find errors in the conversion.
This switch implies that select(2) and pselect(2) will now query the
underlying kqfilters instead of the *_poll() routines.
Based on similar work done on DragonFlyBSD with input from visa@,
millert@, anton@, cheloha@, thanks!
ok visa@
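A sketch of the per-fd conversion using the public EV_SET() macro;
kernel-internal flags such as __EV_POLL are left out:

    struct kevent kev;

    EV_SET(&kev, fd, EVFILT_READ, EV_ADD | EV_ONESHOT, 0, 0, NULL);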
|
|
This patch extends struct klist with a callback descriptor and
an argument. The main purpose of this is to let the kqueue subsystem
assert when a klist should be locked, and operate the klist lock
in klist_invalidate().
Access to a knote list of a kqueue-monitored object has to be
serialized somehow. Because the object often has a lock for protecting
its state, and because the object often acquires this lock at the latest
in its f_event callback function, it makes sense to use this lock also
for the knote lists. The existing uses of NOTE_SUBMIT already show
a pattern that is likely to become more prevalent.
There could be an embedded lock in klist. However, such a lock would be
redundant in many cases. The code cannot rely on a single lock type
(mutex, rwlock, something else) because the needs of monitored objects
vary. In addition, an embedded lock would introduce new lock order
constraints. Note that the patch does not rule out use of dedicated
klist locks.
The patch introduces a way to associate lock operations with a klist.
The caller can provide a custom implementation, or use a ready-made
interface with a mutex or rwlock.
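For example, a sketch of the ready-made mutex interface, with
illustrative sc_* names:

    struct mutex sc_mtx;
    struct klist sc_klist;

    mtx_init(&sc_mtx, IPL_MPFLOOR);
    klist_init_mutex(&sc_klist, &sc_mtx);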
For compatibility with old code, the new code falls back to using the
kernel lock if no specific klist initialization has been done. The
existing code already relies on implicit initialization of klist.
Sadly, this change increases the size of struct klist. dlg@ thinks this
is not fatal, though.
OK mpi@
|
|
When the file descriptor of an __EV_POLL-flagged knote is closed,
post EBADF through the kqueue instance to the caller of kqueue_scan().
This lets kqueue-based poll() and select() preserve their current
behaviour of returning EBADF when a polled file descriptor is closed
concurrently.
OK mpi@
|
|
OK mpi@
|
|
Because kqpoll instances are now linked to the file descriptor table,
the freeing of kqpoll and ordinary kqueues is similar.
Suggested by mpi@
|