|
checking done in taskq_barrier(9) and timeout_barrier(9).
OK mpi@
|
|
We want this so that we can stop allowing readlink() on traversed
vnodes in unveil().
This includes all the kernel side and the system call.
This is not yet used in libc for realpath, so nothing calls this yet.
The libc wrapper will be committed later.
Testing by many, and ports build by naddy@
ok deraadt@
|
|
does not block the signal. If all threads block the signal, it was
delivered to the main thread. This does not conform to POSIX.
If any thread unblocks the signal, it should be delivered immediately
to this thread.
Mark such signals pending at the process instead of a single thread.
Then any thread can handle it later.
OK kettenis@ guenther@
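A minimal sketch of the delivery rule described above (OpenBSD-style
identifiers; member names such as ps_siglist are assumptions, not the
exact kern_sig.c code):

    struct proc *q;

    TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
            if ((q->p_sigmask & mask) == 0) {
                    /* a thread has the signal unblocked: deliver now */
                    atomic_setbits_int(&q->p_siglist, mask);
                    return;
            }
    }
    /* every thread blocks it: mark it pending at the process */
    atomic_setbits_int(&pr->ps_siglist, mask);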
|
|
and incorrectly return EBADF when n>curlim.
ok millert guenther tedu
|
|
|
|
encountered a wxneeded binary that attempts correct operation when started
on a nowxallowed filesystem (it tries mprotect with RWX, notices ENOTSUP
and acts in a different way). So permit execution (but of course don't
allow W^X violating mappings)
ok sthen kettenis robert
|
|
OK visa@, OK mpi@
|
|
locks.
ok jturner@ visa@
Reported-by: syzbot+f9f13034fd656af6c48f@syzkaller.appspotmail.com
|
|
ok kettenis
|
|
Reduces the worst-case error for time values retrieved via the
microtime(9) functions from 10 ticks to 2 ticks. Being interrupted
for over a tick is unlikely but possible.
While here use C99 initializers.
From FreeBSD r303383.
ok mpi@
|
|
instead of panicking
ok deraadt@, tedu@, mpi@
|
|
allocations will recover some memory from the dma_constraint range.
The allocation still fails; the intent is to ensure that the
pagedaemon will free some memory to possibly allow a subsequent
allocation to succeed.
This also adds a UVM_PLA_NOWAKE flag to allow special cases in the
buffer cache to not wake up the pagedaemon until they want to.
ok kettenis@
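A hedged usage sketch of the new flag (call shape per uvm_pglistalloc(9);
the buffer-cache context around it is illustrative):

    struct pglist pgl;
    int error;

    /* grab dma-reachable pages without waking the pagedaemon */
    TAILQ_INIT(&pgl);
    error = uvm_pglistalloc(size, dma_constraint.ucr_low,
        dma_constraint.ucr_high, 0, 0, &pgl, 1,
        UVM_PLA_NOWAIT | UVM_PLA_NOWAKE);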
|
|
clock_settime(2)/settimeofday(2) still need KERNEL_LOCK for a moment
when resetting the RTC, as that's done periodically from a task under
KERNEL_LOCK. Not quite sure how to approach that one yet.
ok visa@ mpi@, "good stuff" tedu@,
"please wait until after [tree] unlock" deraadt@
|
|
Noticed by me and otto@
ok tedu@
|
|
current status and statistics and can be exported without super-user
rights via sysctl to make it easier for tools like systat to access those.
OK deraadt@, sashan@
|
|
reported by kettenis
|
|
let's see what falls out.
ok beck deraadt kettenis mpi
|
|
|
|
default value of kern.splassert to 3, i.e. enter ddb on splassert()
failure. Will be used during fuzzing.
ok mpi@ visa@
|
|
This also modifies the backoff logic to only back off what is requested
and not a "mimimum" amount. Tested by me, benno@, tedu@ anda ports build
by naddy@.
ok tedu@
|
|
detection broke while changing the owner of a lock from struct proc to struct
filedesc/file. Instead of keeping track of the owning proc for each lock,
introduce a new list for all pending blocked locks. This list is scanned before
waiting on a blocking lock in order to determine if sleeping would cause a
deadlock.
The new implementation is serialized by the recently added locking to the same
subsystem, meaning that acquiring the kernel lock is no longer necessary.
ok visa@
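A hedged sketch of the scan described above (list and member names are
illustrative, not the exact vfs_lockf.c code):

    /*
     * Would sleeping on `block' deadlock?  It would if the owner
     * blocking us is itself waiting on a lock that we own.
     */
    TAILQ_FOREACH(pending, &lf_pending, lf_entry) {
            if (pending->lf_id != block->lf_id)
                    continue;               /* a different owner */
            if (pending->lf_blk->lf_id == lock->lf_id)
                    return (EDEADLK);       /* it waits on our lock */
    }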
|
|
Should be revisited once logwakeup() is fixed.
|
|
Use splassert_fail() instead, please set kern.splassert to 2 and report
the corresponding stack trace if you see a warning.
ok dlg@
|
|
All non-dummy implementations of VOP_ADVLOCK() rely on lf_advlock()
which is now safe to use without the kernel lock. Because VOP_ADVLOCK()
does not make the vnode dirty, it is unnecessary to keep track of
in-flight vnode lock operations and the updating of vnode->v_inflight
can be dropped from VOP_ADVLOCK(). This makes VOP_ADVLOCK() safe to use
without the kernel lock.
OK tedu@ mpi@
|
|
it obviously needs to be called with the kernel lock held, so it
makes sense to check that so we can unlock more code without
introducing bugs that shoot us in the face in the indeterminate
future.
csignal is basically a wrapper around ptsignal, so calls to that
without the kernel lock should be caught by this too.
discussed with mpi@ on bugs@
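In sketch form the check is a single assertion on entry (a minimal
illustration, not the full function):

    void
    ptsignal(struct proc *p, int signum, enum signal_type type)
    {
            KERNEL_ASSERT_LOCKED();       /* also catches csignal callers */
            /* ... signal delivery logic ... */
    }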
|
|
operations that tweak the kq_head and kq_count need to be serialised
against the kevent syscall which also fumbles with the list and
count too.
these asserts would have made it extremely obvious where the tun(4)
bug was. for half the time of the bug report about it we weren't
even sure it was tun(4)
discussed with mpi@ jmatthew@
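A minimal sketch of the kind of assert meant here, assuming the lists
are still serialised by the kernel lock as they were at the time:

    void
    kqueue_enqueue(struct kqueue *kq, struct knote *kn)
    {
            KERNEL_ASSERT_LOCKED();
            TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
            kq->kq_count++;
    }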
|
|
We ought to conform to the windup_mtx protocol and call tc_windup() even
if we aren't changing the system uptime.
|
|
if a taskq takes a lock, and something holding that lock calls
taskq_barrier, there's a potential deadlock. detect this as a lock
order problem when witness is enabled. task_del conditionally followed
by taskq_barrier is a common pattern, so add a taskq_del_barrier
wrapper for it that unconditionally checks for the deadlock, like
timeout_del_barrier.
ok visa@
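A hedged sketch of the wrapper (the real taskq_del_barrier(9) performs
the witness lock-order check even when task_del() removes the task):

    void
    taskq_del_barrier(struct taskq *tq, struct task *t)
    {
            /* the unconditional deadlock check would go here */
            if (!task_del(tq, t))
                    taskq_barrier(tq);      /* wait out a running task */
    }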
|
|
Reduce code clutter by removing the file name and line number output
from witness(4). Typically it is easy enough to locate offending locks
using the stack traces that are shown in lock order conflict reports.
Tricky cases can be tracked using sysctl kern.witness.locktrace=1.
This patch additionally removes the witness(4) wrapper for mutexes.
Now each mutex implementation has to invoke the WITNESS_*() macros
in order to utilize the checker.
Discussed with and OK dlg@, OK mpi@
|
|
Now that alpha is fixed, we can use sizeof().
|
|
|
|
only need a lockf_state pointer by now.
ok mpi@ visa@
|
|
and lf_purgelocks() without the kernel lock.
OK anton@ mpi@
|
|
condition in sbcompress(). Currently the actual cluster size might
be 9KB even if the mtu is 1500; in this case a lot of memory is
wasted, since sbcompress() doesn't compress because of the previous
condition.
ok dlg claudio
|
|
The caller of timeout_barrier() must not hold locks that could prevent
timeout handlers from making progress. The system could deadlock
otherwise.
This patch makes witness(4) able to detect barrier locking errors.
This is done by introducing a pseudo-lock that couples the lock chains
of barrier callers to the lock chains of timeout handlers.
In order to find these errors faster, this diff adds a synchronous
version of cancelling timeouts, timeout_del_barrier(9). As the
synchronous intent is explicit, this interface can check lock order
immediately instead of waiting for the potentially rare occurrence of
timeout_barrier(9).
OK dlg@ mpi@
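A hedged usage sketch (sc and sc_tmo are hypothetical): synchronously
cancel a timeout before freeing the structure its handler dereferences:

    timeout_del_barrier(&sc->sc_tmo);       /* handler has finished */
    free(sc, M_DEVBUF, sizeof(*sc));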
|
|
obvious misconfigurations that cannot work.
OK mpi@ tedu@
|
|
the kernel does not need a __stack_smash_handler function.
WARNING: You need a fairly new clang, approximately > March 31.
with mortimer
|
|
if we ever want it back, it's in the attic.
ok mpi@ visa@ kettenis@
|
|
to vfs_lockf.c. This makes the public interface clearer.
The declaration of variable lockf_debug is removed from the header
because it is not needed outside of vfs_lockf.c.
OK anton@ tedu@
|
|
|
|
No other (known) BSD-derived adjtime(2) implementation checks for overflow
when converting delta into its final denomination of fractional seconds.
This is peculiar, as the call originates in 4.3BSD.
However, glibc, uclibc, and (to an extent) musl /do/ check the input and set
EINVAL if it exceeds a certain bound, so we'll just use the errno that they
use to be consistent with extant practice.
Prompted by the comment kettenis@ left when we switched to storing the
adjustment in an int64_t like ~5 years ago (kern_time.c,v 1.87).
Positive feedback from deraadt@, manpage bits ok jmc@,
no code complaints from otto@ or tedu@.
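A self-contained sketch of the bound check described above (the exact
kernel expression may differ):

    #include <sys/time.h>
    #include <errno.h>
    #include <stdint.h>

    int
    adjtime_check(const struct timeval *delta, int64_t *usec)
    {
            /* reject deltas whose microsecond total overflows int64_t */
            if (delta->tv_sec > INT64_MAX / 1000000LL - 1 ||
                delta->tv_sec < INT64_MIN / 1000000LL + 1)
                    return (EINVAL);
            *usec = (int64_t)delta->tv_sec * 1000000LL + delta->tv_usec;
            return (0);
    }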
|
|
added aggressively today. Hopefully post-release a glorious
flensing will remove UNVEIL_INSPECT anyway.
Reported-by: syzbot+3375ce307ac7909b907b@syzkaller.appspotmail.com
|
|
Otherwise, the system can crash in smr_call_impl() if SMT is
enabled later.
Crash reported by jcs@
|
|
tc_lock allows adjfreq(2) and the kern.timecounter.hardware sysctl(2)
to read/write the active timecounter pointer and the .tc_adj_freq
member of the active timecounter safely. This eliminates any possibility
of a torn read/write for the .tc_adj_freq member when we drop the
KERNEL_LOCK from the timecounting layer. It also ensures the active
timecounter does not change in the midst of an adjfreq(2) call.
Because these are not high-traffic paths, we can get away with using
tc_lock in write-mode to ensure combination read/write adjtime(2) calls
are relatively atomic (a) to other writer adjtime(2) calls, and (b) to
settimeofday(2)/clock_settime(2) calls, which cancel ongoing adjtime(2)
adjustment.
When the KERNEL_LOCK is dropped, an unprivileged user will be able to
create some tc_lock contention via adjfreq(2); it is very unlikely to
ever be a problem. If it ever is actually a problem a lockless read
could be added to address it.
While here, reorganize sys_adjfreq()/sys_adjtime() to minimize code
under the lock. Also while here, make tc_adjfreq() void, as it cannot
fail under any circumstance. Also also while here, annotate various
globals/struct members with lock ordering details.
With lots of input from mpi@ and visa@.
ok visa@
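A hedged sketch of the write-mode pattern described above (the
tc_adjfreq() call shape is an assumption):

    rw_enter_write(&tc_lock);
    tc_adjfreq(&oldfreq, &newfreq); /* tc pointer can't change underneath */
    rw_exit_write(&tc_lock);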
|
|
UNVEIL_INSPECT is a hack we added to get chrome/glib working. It silently
adds permission for stat(2), access(2), and readlink(2) to be used on
all path components of any unveil'ed path. robert@ has now successfully
fixed chrome/glib to not require excessive TOC vs TOU stat(2) and access(2)
calls on the paths it uses, so this is no longer needed there.
readlink(2) is the sole call that is now permitted by UNVEIL_INSPECT,
and this is only needed so that realpath(3) can work. Going forward we will
likely make a realpath(2), after which we can completely deprecate
UNVEIL_INSPECT.
ok deraadt@
|
|
on spinning even if `db_active' or `panicstr' has been set. The new
mutex also disables IPIs in the critical section.
OK mpi@ patrick@
|
|
adjtimedelta is 64-bit and thus can't be read/written atomically on all
architectures. Because it can be modified from tc_windup() and
ntp_update_second() we need a way to ensure safe reads/writes for
adjtime(2) callers. One solution is to move it into the timehands and
adopt the lockless read protocol we now use for the system boot time and
uptime.
So make new_adjtimedelta an argument to tc_windup() and add a lockless
read loop to tc_adjtime(). With adjtimedelta stored in the timehands
we can now simply pass a timehands pointer to ntp_update_second(). This
makes ntp_update_second() safer as we're using the timehands' timecounter
pointer instead of the mutable global timecounter pointer.
Lots of input from mpi@ and visa@.
ok visa@
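A sketch of the lockless read loop described above (th_adjtimedelta is
an assumed member name; the generation handshake is the point):

    do {
            th = timehands;
            gen = th->th_generation;
            membar_consumer();
            delta = th->th_adjtimedelta;
            membar_consumer();
    } while (gen == 0 || gen != th->th_generation);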
|
|
This will make upcoming MP-related diffs smaller and should make the code
in kern_tc.c easier to read in general. "windup_mtx" is also a better
mnemonic: always call tc_windup() before leaving windup_mtx.
|
|
|
|
capable of detecting undefined behavior at runtime and all findings are
printed to the system console, including the offending line in the
source code.
kubsan is limited to architectures using Clang as their default compiler
and is not enabled by default.
Derived from the NetBSD implementation.
ok kettenis@ visa@
|