Age | Commit message (Collapse) | Author |
|
stops threads while called during the sleep transition and so there is
no need to clear P_WSLEEP.
OK mpi@
|
|
OK mpi@
|
|
If any other signal is pending the stop signal should be deferred.
Now cursig() uses ffs() to select the signal and so higher numbered
signals like SIGUSR1 would be ignored when going to sleep.
So handle default stop signals specially in the deep case, stash them
and only use them if no other signal is pending.
Fix for signal-stress regress (problem reported by anton@)
With and OK mpi@
|
|
setsigctx() now does this check and clears sig_stop in that case and
instead set sig_ignore. So the check in cursig that is based on sig_stop
can never be true.
OK mpi@
|
|
delivery in cursig() a lot since most of that is no longer needed.
On top of this properly handle sending a blocked signal from gdb to
the debugged process by putting the signal into to proc p_siglist.
OK kettenis@
|
|
the process is not modified during signal delivery.
This also unlocks psignal and prsignal since those are simple wrappers
around ptsignal.
OK mpi@
|
|
There used to be a predefined null vattr for !DIAGNOSTIC
but that was removed in vnode.h rev 1.84 in 2007.
ok semarie@ miod@
|
|
In setsigctx() set sig_stop to 1 if the process should be stopped.
In cursig() still return early if deep but then in sleep_signal_check()
use this information to call proc_stop and stop the proc.
This should fix the problem in the waitid regress test.
OK mpi@
|
|
This should be enough to be on the safe side when unlocking ptsignal
where a pr->ps_pgrp->pg_jobc == 0 check happens.
OK mpi@ kettenis@
|
|
Add deep flag as function argument which is used by the sleep API but
nowhere else. Both calls to sleep_signal_check() should skip the ugly
bits of cursig().
In cursig() if deep once it is clear a signal will be taken keep the
signal on the thread siglist and return. sleep_signal_check() will then
return EINTR or ERESTART based on the signal context. There is no reason
to do more in this special case. Especially stop/cont and the ptrace trap
must be skipped here. Once the call makes it to userret the signal will be
picked up again and handled in a safe location.
Stopping singals need some additional logic since we don't want to abort
the sleep just to stop a process. Since our SIGSTOP handling requires
a major rewrite this will be posponed until then.
OK mpi@
|
|
The checks in dowait6 and orphanpg using ps_mainproc are flawed and
fail if the mainproc called pthread_exit before the other threads.
Adding the flag in proc_stop_sweep is racy but the best we have right now.
This fixes regress/sys/kern/signal/sig-stop3.
OK mpi@
|
|
Also remove the ps_xsig handling in setrunnable() it is in the wrong spot
and causes signals to be delivered over and over again.
Attaching to an already stopped process is affected by this. The SIGSTOP
sent by ptrace is now ignored in ptsignal() and as a result gdb will hang
in wait4() until a SIGCONT is delivered to the process. After that all
works as usual.
OK mpi@
|
|
Also do not use ps_mainproc as the thread the signal is send to. Sending
a signal to ps_mainproc may not work reliably if it already exited. Use
TAILQ_FIRST(&pr->ps_threads) instead but first check that the process has
not yet entered exit1().
OK mpi@
|
|
Introduce P_TRACESINGLE flag to instruct the trapped thread to not
wakeup the other threads (via single_thread_clear). This must be done
like this since ptrace must wake just the single thread to ensure it
runs first and gets the ps_xsig value from ptrace.
Modern gdb depends on this for multi-threaded processes, when a breakpoint
is hit gdb fixes up the trapping instruction and then single steps over
it with only that thread. After that single step gdb continues with all
threads. If all threads are run like now it is possible that one of the
other threads hits a breakpoint before the single step is done which results
in an assertion in gdb (because that is not expected).
OK mpi@
|
|
|
|
Instead of the KERNEL_LOCK use the ps_mtx for most operations.
If the ps_klist is modified an additional global rwlock (kqueue_ps_list_lock)
is required. This includes the knotes with NOTE_FORK and NOTE_EXIT since
in either cases a ps_klist is changed. In the NOTE_FORK | NOTE_TRACK case
the call to kqueue_register() can sleep this is why a global rwlock is used.
Adjust the reaper() to call knote_processexit() without KERNEL_LOCK.
Double lock idea from visa@
OK mvs@
|
|
Since proc and signal filters share the same klist it makes sense
to keep them together.
OK mvs@
|
|
dowait6() can only look at per process state so switch this over.
Right now SIGCONT handling in ptsignal is recursive and not quite
right but this is a step in the right direction. It fixes dowait6()
handling for multithreaded processes where the main thread exited.
OK mpi@
|
|
Requested by kettenis@ and guenther@
|
|
that a process has been stopped so make room for that.
OK kettenis@
|
|
OK mpi@
|
|
in the order needed for future changes. No functional change.
OK mpi@
|
|
The SPL level is not tacked by the mutex and we no longer need to track
this in the callers.
OK miod@ mlarkin@ tb@ jca@
|
|
use one of the gotos. In this case goto out with mask and prop set to 0.
OK jca@
|
|
This diff adjusts how single_thread_set() accounts the threads by using
ps_threadcnt as initial value and counting all threads out that are already
parked. In single_thread_check call exit1() before decreasing ps_singlecount
this is now done in exit1().
exit1() and thread_fork() ensure that ps_threadcnt is updated with the
pr->ps_mtx held and in exit1() also account for exiting threads since
exit1() can sleep.
OK mpi@
|
|
Since we want to unlock sigsuspend, ptsignal needs to double check in the
SSLEEP case that the signal being delivered is still masked or unmasked.
Remove the early return for action SIG_HOLD so that the SSLEEP case can
properly recheck the sigmask.
On top of this update siglist only in one place at the end of ptsignal
this now includes the clearing of signals for the SA_CONT and SA_STOP
cases.
OK mpi@
|
|
One atomic_clearbits_int() hiding in SSTOP was missed when converting all
the exceptions that cleared the siglist again. Instead of clearing the bits
the mask needs to be set to 0 so that it is properly ignored.
OK mpi@
|
|
mostly dead.
This is more like belts and suspenders since a proc in exit1() will not
receive signals anymore and so proc_stop() should not be reachable. This
is even the case when sigexit() is called and a coredump() is happening.
OK mpi@
|
|
Change p_sigmask from atomic back to non-atomic updates. All changes to
p_sigmask are only allowed by curproc (the owner). There is no need for
atomic instructions here.
p_sigmask is mostly accessed by curproc with the exception of ptsignal().
In ptsignal() p_sigmask is now only read once unless a SSLEEP proc gets
the signal. In that case recheck the p_sigmask before wakeup to ensure
that no unnecessary wakeup happens.
Add some KASSERT(p == curproc) to ensure this precondition.
sigabort() is special since it is also called by ddb but apart from that
only works for curproc.
With and OK mvs@ OK mpi@
|
|
Tracepoints like "sched:enqueue" and "sched:unsleep" were called from inside
the loop iterating over sleeping threads as part of wakeup_proc(). When such
tracepoints were enabled they could result in another wakeup(9) possibly
corrupting the sleepqueue.
Rewrite wakeup(9) in two stages, first dequeue threads from the sleepqueue then
call setrunnable() and possible tracepoints for each of them.
This requires moving unsleep() outside of setrunnable() because it messes with
the sleepqueue.
ok claudio@
|
|
has occurred in the process.
ok various people
|
|
to be smaller than the mapping. Record which memory segments are backed by
vnodes while walking the uvm map and later suppress EFAULT errors caused
by the underlying file being truncated. okay miod@
|
|
channel instead of inventing an own one.
OK kettenis@ mvs@
|
|
The mode can now be or-ed with SINGLE_DEEP or SINGLE_NOWAIT to alter
the behaviour of single_thread_set(). This allows explicit control
of the SINGLE_DEEP behaviour.
If SINGLE_DEEP is set the deep flag is passed to the initial check call
and by that the check will error out instead of suspending (SINGLE_UNWIND)
or exiting (SINGLE_EXIT). The SINGLE_DEEP flag is required in calls to
single_thread_set() outside of userret. E.g. at the start of sys_execve
because the proc is not allowed to call exit1() in that location.
SINGLE_NOWAIT skips the wait at the end of single_thread_set() and therefor
returns BEFORE all threads have been parked. Currently this is only used by
the ptrace code and should not be used anywhere else. Not waiting for all
threads to settle is asking for trouble.
This solves an issue by using SINGLE_UNWIND in the coredump case where
the code should actually exit in case another thread crashed moments earlier.
Also the SINGLE_UNWIND in pledge_fail() is now marked SINGLE_DEEP since
the call to pledge_fail() is for sure not at the kernel boundary.
OK mpi@
|
|
SINGLE_UNWIND unwinds to the kernel boundary. On the other hand
SINGLE_SUSPEND will sleep inside tsleep(9) and other sleep functions.
Since the code will exit1() very soon after it is better to already unwind.
Now one could argue that for coredumps all threads should stop asap to
get a clean dump. Using SINGLE_UNWIND the sleep will fail with ERESTART
and no copyout should happen in that case.
This is a bit of a workaround since SINGLE_SUSPEND has a small race
where single_thread_wait() returns before all threads are really stopped.
When SINGLE_EXIT is called quickly after this can blow up inside
sleep_finish.
Reported-by: syzbot+3ef066fcfaf991f2ac2c@syzkaller.appspotmail.com
OK mpi@ kettenis@
|
|
The change to the single thread API results in crashes inside exit1()
as found by Syzkaller. There seems to be a race in the exit codepath.
What exactly fails is not really clear therefor revert for now.
This should fix the following Syzkaller reports:
Reported-by: syzbot+38efb425eada701ca8bb@syzkaller.appspotmail.com
Reported-by: syzbot+ecc0e8628b3db39b5b17@syzkaller.appspotmail.com
and maybe more.
Reverted commits:
----------------------------
Protect ps_single, ps_singlecnt and ps_threadcnt by the process mutex.
The single thread API needs to lock the process to enter single thread
mode and does not need to stop the scheduler.
This code changes ps_singlecount from a count down to zero to ps_singlecnt
which counts up until equal to ps_threadcnt (in which case all threads
are properly asleep).
Tested by phessler@, OK mpi@ cheloha@
----------------------------
Change how ps_threads and p_thr_link are locked away from using SCHED_LOCK.
The per process thread list can be traversed (read) by holding either
the KERNEL_LOCK or the per process ps_mtx (instead of SCHED_LOCK).
Abusing the SCHED_LOCK for this makes it impossible to split up the
scheduler lock into something more fine grained.
Tested by phessler@, ok mpi@
----------------------------
Fix SCHED_LOCK() leak in single_thread_set()
In the (q->p_flag & P_WEXIT) branch is a continue that did not release
the SCHED_LOCK. Refactor the code a bit to simplify the places SCHED_LOCK
is grabbed and released.
Reported-by: syzbot+ea26d351acfad3bb3f15@syzkaller.appspotmail.com
OK kettenis@
|
|
In the (q->p_flag & P_WEXIT) branch is a continue that did not release
the SCHED_LOCK. Refactor the code a bit to simplify the places SCHED_LOCK
is grabbed and released.
Reported-by: syzbot+ea26d351acfad3bb3f15@syzkaller.appspotmail.com
OK kettenis@
|
|
The per process thread list can be traversed (read) by holding either
the KERNEL_LOCK or the per process ps_mtx (instead of SCHED_LOCK).
Abusing the SCHED_LOCK for this makes it impossible to split up the
scheduler lock into something more fine grained.
Tested by phessler@, ok mpi@
|
|
The single thread API needs to lock the process to enter single thread
mode and does not need to stop the scheduler.
This code changes ps_singlecount from a count down to zero to ps_singlecnt
which counts up until equal to ps_threadcnt (in which case all threads
are properly asleep).
Tested by phessler@, OK mpi@ cheloha@
|
|
sleep_signal_check() is there to look for pending signals / single thread
requests which were posted before sleep_setup() finished. Once p_stat
is set to SSLEEP the wakeup and delivery of signals is taken care of
by ptsignal and single_thread_set().
Moving the SCHED_LOCK further down allows to cleanup cursig() and to
remove a SCHED_LOCK recursion in single_thread_check().
OK mpi@
|
|
When continuing a process on the sleep queue just let it switch to
p_stat = SSLEEP even when P_WSLEEP is set. Once a proc is SSTOP-ed
in sleep_finish() a valid sleep point has been reached and there is
no need to make the process runnable again (which results in some
hairy race conditions). Instead simply clear P_WSLEEP since a stopped
proc reached the sleep state and there is no race with wakeup() anymore.
OK mpi@
|
|
This way threads stopped by SINGLE_SUSPEND will check for pending
signals right after being released instead of returning to userland
first. The same order of check is already used in sleep_signal_check().
OK mpi@
|
|
Also remove the priority argument to sleep_finish() the code can use
the p_flag P_SINTR flag to know if the signal check is needed or not.
OK cheloha@ kettenis@ mpi@
|
|
between calls.
Instead of forcing an atomic operation across multiple calls use a three
step transaction.
1. setup sleep state by calling sleep_setup()
2. recheck sleep condition to ensure that the event did not fire before
sleep_setup() registered the proc onto the sleep queue
3. call sleep_finish() to either sleep or keep on running based on the
step 2 outcome and any possible signal delivery
To make this work wakeup from signals, single thread api and wakeup(9) need
to be aware if a process is between step 1 and step 3 so that the process
is not enqueued back onto the runqueue while going to sleep. Introduce
the p_flag P_WSLEEP to detect this situation.
On top of this remove the spl dance in msleep() which is no longer required.
It is ok to process interrupts between step 1 and 3.
OK mpi@ cheloha@
|
|
by passing BYPASSUNVEIL just for this vnode. The coredump() code is quite
careful, so this will be fine.
ok kn kettenis semarie
|
|
Pass the timeout and sleep priority not only to sleep_setup() but also
to sleep_finish(). With that sls_timeout and sls_catch can be removed
from struct sleep_state.
The timeout is now setup first thing in sleep_finish() and no longer as
last thing in sleep_setup(). This should not cause a noticeable difference
since the code run between sleep_setup() and sleep_finish() is minimal.
OK kettenis@
|
|
if () check which just returns.
OK mpi@
|
|
Make knote(9) lock the knote list internally, and add knote_locked(9)
for the typical situation where the list is already locked.
Remove the KNOTE(9) macro to simplify the API.
Manual page OK jmc@
OK mpi@ mvs@
|
|
against classic BROP with a range-checking wrapper in front of copyin() and
copyinstr() which ensures the userland source doesn't overlap the main program
text, ld.so text, signal tramp text (it's mapping is hard to distinguish
so it comes along for the ride), or libc.so text. ld.so tells the kernel
libc.so text range with msyscall(2). The range checking for 2-4 elements is
done without locking (because all 4 ranges are immutable!) and is inexpensive.
write(sock, &open, 400) now fails with EFAULT. No programs have been
discovered which require reading their own text segments with a system call.
On a machine without mmu enforcement, a test program reports the following:
userland kernel
ld.so readable unreadable
mmap xz unreadable unreadable
mmap x readable readable
mmap nrx readable readable
mmap nwx readable readable
mmap xnwx readable readable
main readable unreadable
libc unmapped? readable unreadable
libc mapped readable unreadable
ok kettenis, additional help from miod
|
|
the process* that it should be part of. Use that in clock_get{time,res}(),
thrkill(), and ptrace().
ok jca@ miod@ mpi@ mvs@
|