Age | Commit message | Author |
|
ok millert@, visa@
|
|
|
|
When we resume from a suspend we use the time from the RTC to advance
the system offset. This changes the UTC to match what the RTC has given
us while increasing the system uptime to account for the time we were
suspended.
Currently we decide whether to change to the RTC time in tc_setclock()
by comparing the new offset with the th_offset member. This is wrong.
th_offset is the *minimum* possible value for the offset, not the "real
offset". We need to perform the comparison within tc_windup() after
updating th_offset, otherwise we might rewind said offset.
Because we're now doing the comparison within tc_windup() we ought to
move naptime into the timehands. This means we now need a way to safely
read the naptime to compute the value of CLOCK_UPTIME for userspace.
Enter nanoruntime(9); it increases monotonically from boot but does not
jump forward after a resume like nanouptime(9).
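The relationship between the offset, the naptime, and the two clocks described above can be modelled in a few lines. This is a minimal sketch, not the kernel code: `struct model_timehands`, `model_resume()`, `model_uptime()` and `model_runtime()` are hypothetical names, and plain nanosecond counters stand in for the kernel's time types and for the locking the real timehands need.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's timehands. */
struct model_timehands {
	uint64_t offset_ns;	/* uptime, including time spent suspended */
	uint64_t naptime_ns;	/* total time spent suspended */
};

/*
 * Apply an RTC-derived offset on resume.  The comparison is made
 * against the current offset so that the offset is never rewound.
 */
void
model_resume(struct model_timehands *th, uint64_t new_offset_ns)
{
	if (new_offset_ns <= th->offset_ns)
		return;		/* RTC is not ahead: keep the offset */
	th->naptime_ns += new_offset_ns - th->offset_ns;
	th->offset_ns = new_offset_ns;
}

/* Jumps forward after a resume. */
uint64_t
model_uptime(const struct model_timehands *th)
{
	return th->offset_ns;
}

/* Increases monotonically but excludes the time spent suspended. */
uint64_t
model_runtime(const struct model_timehands *th)
{
	return th->offset_ns - th->naptime_ns;
}
```

In this model, resuming at offset 250 from offset 100 moves the uptime to 250 while the runtime stays at 100; a later, smaller RTC value leaves both unchanged.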
|
|
of todr_handle.
OK kettenis@
|
|
The struct keeps track of the end point of an event queue scan by
persisting the end marker. This will be needed when kqueue_scan() is
called repeatedly to complete a scan in a piecewise fashion. The end
marker has to be preserved between calls because otherwise the scan
might collect an event more than once. If a collected event gets
reactivated during scanning, it will be added at the tail of the queue,
out of reach because of the end marker.
OK mpi@
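The effect of the persistent end marker can be illustrated with a small user-space model. This is a sketch under assumptions, not the kqueue code: `struct item`, `struct scan_state`, `scan_setup()`, `scan_step()` and `scan_finish()` are hypothetical names, and the queue is a plain `TAILQ` from `<sys/queue.h>` with no locking.

```c
#include <stddef.h>
#include <sys/queue.h>

struct item {
	TAILQ_ENTRY(item) link;
	int id;			/* hypothetical event identifier */
};
TAILQ_HEAD(itemq, item);

/* Scan state that survives between calls, including the end marker. */
struct scan_state {
	struct itemq *q;
	struct item end;	/* persistent end marker */
	int queued;		/* nonzero once the marker is on the queue */
};

void
scan_setup(struct scan_state *s, struct itemq *q)
{
	s->q = q;
	s->end.id = -1;
	s->queued = 0;
}

/* Collect at most max events; may be called repeatedly. */
int
scan_step(struct scan_state *s, int *out, int max)
{
	struct item *it;
	int n = 0;

	if (!s->queued) {
		/* Insert the marker once, on the first call only. */
		TAILQ_INSERT_TAIL(s->q, &s->end, link);
		s->queued = 1;
	}
	while (n < max) {
		it = TAILQ_FIRST(s->q);
		if (it == &s->end)
			break;	/* everything before the marker is done */
		TAILQ_REMOVE(s->q, it, link);
		out[n++] = it->id;
	}
	return n;
}

/* Remove the marker when the whole scan is complete. */
void
scan_finish(struct scan_state *s)
{
	if (s->queued) {
		TAILQ_REMOVE(s->q, &s->end, link);
		s->queued = 0;
	}
}
```

An item that is re-inserted at the tail between two `scan_step()` calls lands behind the marker, so the same scan does not collect it a second time.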
|
|
ok deraadt@, mpi@, visa@
ok cheloha@ as well (would have preferred in new file for this code)
|
|
This prevents exiting processes from hanging when a slave pseudo terminal
is close(2)d before its master.
From NetBSD via anton@.
Reported-by: syzbot+2ed25b5c40d11e4c3beb@syzkaller.appspotmail.com
ok anton@, kettenis@
|
|
clear the NOCACHE flag, since if we are doing a delayed write the buffer
must be cached or it is thrown away when the "write" is done.
fixes vnd on mfs regress tests.
ok kettenis@ deraadt@
|
|
ok millert@, deraadt@
|
|
FRELE() as the last reference could be dropped which in turn will cause
soclose() to be called where the socket lock is unconditionally
acquired. Note that this is only a problem for sockets protected by the
non-recursive NET_LOCK() right now.
ok mpi@ visa@
Reported-by: syzbot+7c805a09545d997b924d@syzkaller.appspotmail.com
|
|
them in line with sbappendstream() and sbappendrecord().
Agreed by mpi@
|
|
Prevent generating events that do not correspond to how the fifo has been
opened.
ok visa@, millert@
|
|
for example, with locking assertions.
OK mpi@, anton@
|
|
deep recursion. This also helps make kqueue_wakeup() free of the
kernel lock because the current implementation of selwakeup()
requires the lock.
OK mpi@
|
|
suspend (SINGLE_SUSPEND or SINGLE_PTRACE) it needs to do this in
sleep_setup_signal(). This way the case where single_thread_clear() is
called before the sleep gets its wakeup call can be correctly handled and
the thread is put back to sleep in sleep_finish(). If the wakeup happens
before unsuspend then p_wchan is 0 and the thread will not go to sleep again.
In case of an unwind an error is returned, causing the thread to return
immediately with that error.
With and OK mpi@ kettenis@
|
|
mpi@ and I added a warning log to *sleep_nsec(9) last year to smoke
out division-to-zero bugs when converting kernel code from *sleep(9)
to the new interfaces. It whines if you tell it to sleep for zero
nanoseconds.
Now that rwsleep_nsec(9) is exposed to userspace via futex(2), though,
it is possible to get a legitimate zero-nanosecond timeout from the
user. This can cause a lot of logging, which apparently can cause
hiccups and hangs in Mesa.
As a quick fix we can round the timeout up to one nanosecond and
silence the warning. No logs, no delays, no hiccups or hangs.
--
Aside: it is unclear what we are supposed to do in the FUTEX_WAIT
zero-nanosecond timeout case: block for a tick or return ETIMEDOUT
immediately. The Linux futex(2) manpage does not mention the case.
It'd be nice to know what the proper behavior is.
--
Prompted by matthieu@. Input from kettenis@ and deraadt@.
Tested by matthieu@, ajacoutot@.
In snaps since Mar 27 2020.
ok ajacoutot@, deraadt@, kettenis@.
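The quick fix described in this message amounts to a one-line adjustment before the sleep. A minimal sketch, assuming a hypothetical helper name `model_futex_nsecs()` and a timeout already converted to nanoseconds; it is not the committed code:

```c
#include <stdint.h>

/*
 * Round a zero-nanosecond timeout up to one nanosecond so that the
 * sleep routine does not see (and warn about) a zero duration.
 */
uint64_t
model_futex_nsecs(uint64_t nsecs)
{
	return nsecs == 0 ? 1 : nsecs;
}
```

Any non-zero timeout passes through unchanged.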
|
|
OK deraadt@
|
|
panic message shows the actual code location of the assert. Do this by
moving the assert logic inside the macros.
Prompted by and OK claudio@
OK mpi@
|
|
vnode. This lets other parts of the kernel see the vnode as active for
writing. In particular, now quotaon_vnode() properly sets up quotas for
ktrace vnodes. This fixes a crash that could happen if quotas were
turned on while a process was ktraced.
ktrace vnodes are opened for writing and an initial write reference
is provided for them by vn_open(9). However, this reference is removed,
too early, when sys_ktrace() calls vn_close(9).
Crash reported and fix tested by Bryan Stenson
OK mpi@
|
|
While here check for the validity of the timeout at the beginning of the
syscall.
ok kettenis@, cheloha@
|
|
sleep_setup/finish related functions are.
OK kettenis@
|
|
Let's fix this before we put them back :o)
|
|
This will allow for future simplifications of the VFS interfaces.
Tested in a bulk by naddy@ and visa@.
ok visa@, anton@
|
|
several problems with the vnode exclusive lock implementation, I
overlooked the fact that a vnode can be in a state where the usecount is
zero while the holdcount is still positive. There could still be
threads waiting on the vnode lock in uvn_io() as long as the holdcount
is positive.
"go ahead" mpi@
Reported-by: syzbot+767d6deb1a647850a0ca@syzkaller.appspotmail.com
|
|
comes to setting a process into single thread mode. It is still wrong but
first the interaction with single_thread_set() must be corrected.
|
|
header could not be written.
OK anton@ mpi@
|
|
when called during execve(2). This was caused by initializing sls_sig
with value 0 in r1.164 of kern_synch.c. Previously, tsleep(9) returned
immediately with EINTR in similar circumstances.
The immediate return without error can cause a system hang. For example,
vwaitforio() could end up spinning if called during execve(2) because
the thread did not enter sleep and other threads were not able to finish
the I/O.
tsleep
vwaitforio
nfs_flush
nfs_close
VOP_CLOSE
vn_closefile
fdrop
closef
fdcloseexec
sys_execve
Fix the issue by checking (p->p_flag & P_SUSPSINGLE) instead of
(p->p_p->ps_single != NULL) in sleep_setup_signal(). The former is more
selective than the latter and allows the thread that invokes execve(2)
to enter sleep normally.
Bug report, change bisecting and testing help by Pavel Korovin
OK claudio@ mpi@
|
|
|
|
Make sure to release the last reference of the vnode after all
other traced processes have given up on it.
CID 1453020 Unchecked return value.
Inputs from guenther@, ok visa@
|
|
rwsleep_nsec(9) will not set a timeout if the nsecs parameter is
equal to INFSLP (UINT64_MAX). We need to limit the duration to
MAXTSLP (UINT64_MAX - 1) to ensure a timeout is set.
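The clamp described here can be sketched as follows. The two constants mirror the values quoted in the message (INFSLP as UINT64_MAX, MAXTSLP as UINT64_MAX - 1) but are redefined under hypothetical `MODEL_` names so the fragment stands alone; `model_clamp_sleep()` is likewise a made-up helper, not a kernel function.

```c
#include <stdint.h>

#define MODEL_INFSLP	UINT64_MAX		/* means "no timeout" */
#define MODEL_MAXTSLP	(UINT64_MAX - 1)	/* longest finite sleep */

/*
 * Limit a computed duration so that it can never be mistaken for
 * the "sleep without timeout" value.
 */
uint64_t
model_clamp_sleep(uint64_t nsecs)
{
	return nsecs > MODEL_MAXTSLP ? MODEL_MAXTSLP : nsecs;
}
```

Only a duration equal to the infinite-sleep value is changed; every shorter duration is returned as-is.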
|
|
tsleep_nsec(9) will not set a timeout if the nsecs parameter is
equal to INFSLP (UINT64_MAX). We need to limit the duration to
MAXTSLP (UINT64_MAX - 1) to ensure a timeout is set.
|
|
single_thread_check() safe to be called without KERNEL_LOCK().
single_thread_wait() needs to use sleep_setup() and sleep_finish()
instead of tsleep() to make sure no wakeup() is lost.
Input kettenis@, with and OK visa@
|
|
An absolute timeout T elapses when the clock has reached time T, i.e.
when T is less than or equal to the clock's current time.
But the current code thinks T elapses only when the clock is strictly
greater than T.
For example, if my absolute timeout is 1.00000000, the current code will
not return EWOULDBLOCK until the clock reaches 1.00000001. This is wrong:
my absolute timeout elapses a nanosecond prior to that point.
So the timespeccmp(3) here should be
timespeccmp(tsp, &now, <=)
and not
timespeccmp(tsp, &now, <)
as it is currently.
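The difference between the two comparisons can be checked with a small stand-alone helper. This sketch spells out the `<=` comparison by hand instead of relying on the timespeccmp(3) macro, which is not available on every platform; `timeout_elapsed()` is a hypothetical name.

```c
#include <time.h>

/*
 * Return nonzero if the absolute timeout *tsp has elapsed at time
 * *now, i.e. if *tsp <= *now.
 */
int
timeout_elapsed(const struct timespec *tsp, const struct timespec *now)
{
	if (tsp->tv_sec != now->tv_sec)
		return tsp->tv_sec < now->tv_sec;
	return tsp->tv_nsec <= now->tv_nsec;
}
```

With a timeout of 1.000000000, the helper reports the timeout as elapsed when the clock reads exactly 1.000000000, whereas a strict `<` comparison would wait until 1.000000001.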
|
|
With input from visa@.
ok visa@
|
|
With input from visa@.
ok visa@
|
|
While here, rename the wait channel so the tsleep_nsec(9) call will fit
onto a single line. It isn't a global channel so the name is arbitrary
anyway.
With input from visa@.
ok visa@
|
|
Requested by mpi@
|
|
implementation file. Pushing the assignment of ps_uvpcwd down to
unveil_add() is required but it doesn't introduce any functional change.
ok mpi@ semarie@
|
|
|
|
ok mpi@ visa@
|
|
This ensures that the conditions checked are still in force. The sleep
breaks atomicity, allowing another thread to alter the state.
single_thread_set() should return immediately after sleep when called
from dowait4() because there is no guarantee that the process pr still
exists. When called from single_thread_set(), the process is that of
the calling thread, which prevents process pr from disappearing.
OK anton@, mpi@, claudio@
|
|
reparented to a debugger process.
Also re-parent exiting traced processes to their original parent, if it
is still alive, after the debugger has seen the exit status.
Logic comes from FreeBSD pointed out by guenther@.
While here rename proc_reparent() into process_reparent() and get rid of
superfluous checks.
ok visa@
|
|
The allocation of lock stacks does not correctly handle the case where
the system-wide free list becomes empty. Consequently, the returned
stack buffer can still be on the CPU's free list.
This patch fixes the bug by simplifying the code. Lock stack buffers are
now allocated and freed one by one from the system-wide free list
instead of using batching.
The fix additionally addresses a buffer hoarding problem that can arise
under workloads where some CPUs are net acquirers and some other CPUs
net releasers of rwlocks.
Panic reported by Hrvoje Popovski
|
|
possible signal that was caught during sleep setup. It does not make sense
to have a default of 1 (SIGHUP) for this.
OK visa@ mpi@
|
|
file atomic. This also gets rid of the last kernel lock protected field
in the scope of struct file.
ok mpi@ visa@
|
|
Logic is hard, so keep only one of two logically equivalent statements.
CID 271085
ok kettenis@, deraadt@, miod@
|
|
This shows that atomic_* operations should not be necessary to write
to this field unlike with the process one.
The advantage of using a somewhat-unique prefix for struct member is
moot when multiple definitions use the same prefix :o)
From Amit Kulkarni, ok claudio@
|
|
|
|
Otherwise the write will be discarded, which would prevent use of vnd(4)
on top of an async-mounted file system.
OK beck@
|
|
sent via spliced socket.
Reported-by: syzbot+2f9616f39d3f3b281cfb@syzkaller.appspotmail.com
OK bluhm@
|