Age | Commit message | Author |
|
to p_siglist and so there is no need to check ps_siglist for the signal.
OK mpi@
|
|
Remove the v_inflight member and alter the ffs and ext2fs sync code to
track inflight by checking if the node is locked or not (which it already
did before but for a different reason).
OK mpi@
|
|
it is a leftover from LOCKPARENT removal in NDINIT() (in rev 1.337)
ok mpi@
|
|
It replaces spec_badop, fifo_badop, dead_badop and mfs_badop, which
are only calls to panic(9), with a single function, vop_generic_badop().
No intended behaviour changes (apart from the panic message, which isn't
the same).
ok mpi@
|
|
If the timespec is zero-valued, sys___thrsigdivert() should just do the
check for pending signals and return immediately.
OK kettenis@
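
The zero-timeout behaviour described above is the same one POSIX specifies
for sigtimedwait(). A minimal userland sketch, assuming a system that
provides sigtimedwait(); the helper names block_signal() and poll_signal()
are hypothetical, and whether libc routes this call through
__thrsigdivert is an assumption, not something the commit states:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Block signo so that it stays pending instead of being delivered. */
int
block_signal(int signo)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, signo);
	return sigprocmask(SIG_BLOCK, &set, NULL);
}

/*
 * Poll for signo without sleeping: a zero-valued timeout asks only for
 * the pending-signal check. Returns the signal number if one was
 * pending, or -1 if nothing was pending.
 */
int
poll_signal(int signo)
{
	const struct timespec zero = { 0, 0 };
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, signo);
	return sigtimedwait(&set, NULL, &zero);
}
```

A caller would block the signal first, then call poll_signal() whenever it
wants a non-blocking check; the call consumes the pending signal if there
is one.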
|
|
Bring these values in sync with the `tid' builtin which already includes
the offset. This is necessary to build scripts comparing them, like:
    tracepoint:sched:enqueue
    {
        @ts[arg0] = nsecs;
    }
    tracepoint:sched:on__cpu
    /@ts[tid]/
    {
        latency = nsecs - @ts[tid];
    }
Discussed with and ok bluhm@
|
|
ok kettenis@
|
|
|
|
Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.
Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().
Tested by many as part of a larger diff.
ok kettenis@, beck@
|
|
this allows us to dynamically trace function boundaries with btrace by patching
prologues and epilogues with a breakpoint upon which the handler records the data
and sends it back to userland for btrace to consume.
currently it's hidden behind DDBPROF, and there is still a lot to clean up and
improve, but basic scripts that observe return codes from a probed function
work.
from Tom Rollet, with various changes by me
feedback and ok mpi@
|
|
possible violation during the traversal of the path do the check at the
end. Make the code a bit easier to grok.
OK beck@ semarie@
|
|
"make tags" needs "make links" to have tags available in subdirectories and
netinet6 has been missing all the time.
OK tb
|
|
|
|
ok mlarkin
|
|
tracts of unused memory, and the empty-space RLE scanner (uvm_page_rle)
would rescan for empty space needlessly wasting excessive cpu time
16G machine, 100sec -> 9sec
40G machine, 325sec -> 28sec
with kettenis mlarkin
|
|
parent just called unveil(NULL, NULL) and nothing else.
With and OK beck@
|
|
"makes perfect sense to me" chris
ok gnezdo jca
|
|
per-socket locking.
No functional change.
|
|
ok mvs@
|
|
that the KERNEL_LOCK() is held unless the filter is marked as MPSAFE.
Should help finding missing locks when unlocking various filters.
ok visa@
|
|
Adjust kqpoll_dequeue() so that it will clear only badfd knotes when
called from kqpoll_init(). This is needed by kqpoll's lazy removal
of knotes. Eager removal in kqpoll_dequeue() would defeat kqpoll's
attempt to reuse previously established knotes under workloads where
knote activation tends to occur before the next kqpoll scan.
Prompted by mpi@
|
|
pass in the vnode to unveil_start_relative() like it is done for *at()
syscalls. This fixes an issue with fchdir() that actually did not correctly
reset this pointer when changing the working directory.
OK beck@
|
|
OK semarie@
|
|
the signal handler code. Traditionally a process would spin in
such a case, but we changed the logic in revision 1.167 trapsignal()
to receive a fatal signal. If that happens to init(8), the kernel
panics. In case of reboot, jump between init signal handler and
page fault trap until the kernel resets the machine.
reported and tested weerd@; OK deraadt@
|
|
|
|
These traversed vnodes are a leftover from early times where realpath(3)
was still all done in userland.
OK semarie@
|
|
waiting on CPUs that didn't spin up. This will allow us to spin down
CPUs in the future to save power as well.
ok mpi@
|
|
The code doesn't need it: the returned vnode is released
immediately. The string path is built from the namei() call using
REALPATH, during directory traversal.
Without LOCKLEAF, calling only vrele() is enough if namei() found a
file, instead of calling VOP_UNLOCK() + vrele().
ok claudio@ mpi@
|
|
lock. accept(2) panics, connect(2) deadlocks. Additionally, copy
in or out must not hold the net lock as it may be a memory mapped
file on NFS.
Simplify dns_portcheck(), it does not modify namelen anymore.
In doaccept() release the socket lock before calling copyaddrout().
Rearrange the checks in sys_connect() like they are in sys_bind().
OK mpi@
|
|
in crypto.c and annotate locking protection. Assert kernel lock
where needed. Remove dead code from crypto_get_driverid(). Move
crypto_init() prototype into header file.
OK mpi@
|
|
those bits.
|
|
ok drahn@
|
|
This helps unveil_add_vnode() to properly re-evaluate unveils when
"/" is added to the list.
Because of this adjust unveil_covered() to check for the root as well
so that in that case the unveil uv is returned instead of NULL. Traversing
up from the root returns the root. This check is not really needed since
namei has its own root check and shortcuts for root vnodes.
OK semarie@
|
|
ok deraadt@ kettenis@
|
|
ok matthieu@, deraadt@, jsg@
|
|
return early for simple conditions instead of navigating inside
if-branches.
with and ok claudio@
|
|
place the wrong index is used resulting in re-evaluating all unveil nodes.
Also loop over all but the last (just added vnode) -- again there is
no need to re-evaluate the cover of the just added unveil.
OK anton@ semarie@
|
|
Refactor the fraction-to-nanosecond conversion from BINTIME_TO_TIMESPEC()
into a dedicated routine, FRAC_TO_NSEC(), so we can reuse it elsewhere.
Then add a new BINTIME_TO_NSEC() function to sys/time.h to deduplicate
conversion code in nsecuptime(), getnsecuptime(), and tc_setclock().
Thread: https://marc.info/?l=openbsd-tech&m=162376993926751&w=2
ok dlg@
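
The conversion described above can be sketched in plain C. This is a
minimal illustration, not the kernel's code: frac_to_nsec(),
bintime_to_nsec() and struct bintime_sketch are hypothetical stand-ins
for FRAC_TO_NSEC(), BINTIME_TO_NSEC() and struct bintime, and dropping
the low 32 bits of the fraction before multiplying is an assumption
about how the arithmetic is kept within 64 bits:

```c
#include <stdint.h>

/* Hypothetical stand-in: seconds plus a 64-bit binary fraction of a second. */
struct bintime_sketch {
	int64_t		sec;
	uint64_t	frac;
};

/*
 * Convert a 64-bit binary fraction of a second to nanoseconds.
 * Only the upper 32 bits of the fraction are used, so the product
 * stays below 2^62 and cannot overflow.
 */
static inline uint64_t
frac_to_nsec(uint64_t frac)
{
	return ((frac >> 32) * 1000000000ULL) >> 32;
}

/* Convert a whole bintime value to a nanosecond count. */
static inline uint64_t
bintime_to_nsec(const struct bintime_sketch *bt)
{
	return (uint64_t)bt->sec * 1000000000ULL + frac_to_nsec(bt->frac);
}
```

With this layout a fraction of 2^63 (half a second) converts to
500000000 nanoseconds, and the result always rounds down.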
|
|
Move the kclock argument before the flags argument. XORing a bunch of
flags together may "sprawl", and I'd rather have any sprawl at the end
of the parameter list.
timeout_set_kclock() is undocumented and there is only one caller, so
no big refactor required.
Best to do this argument order shuffle before any bigger refactors of
e.g. timeout_set(9).
|
|
Currently setitimer(2) rejects timers larger than 100 million seconds
and sets EINVAL.
With the change to kclock timeouts there is no longer any reason to
use this arbitrary value. Kclock timeouts support the full range of a
timespec, so we can increase the upper bound without practical risk of
arithmetic overflow.
If we push the limit to UINT_MAX we can support the full input range
of alarm(3). We can then simplify the alarm.3 manpage in a separate
patch.
We can push the limit even higher in the future if we find software
that doesn't like the UINT_MAX limit. Until then, UINT_MAX seconds
(roughly 136 years) is plenty for all practical timers.
ok claudio@
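
A rough sketch of the bound check described above, written as a userland
function. itimer_check() is a hypothetical name, and the exact set of
checks is an assumption; only the UINT_MAX upper bound and the EINVAL
result come from the commit message. It also assumes a 64-bit time_t:

```c
#include <errno.h>
#include <limits.h>
#include <sys/time.h>

/*
 * Validate a timer value: reject negative or malformed fields and
 * anything above UINT_MAX seconds. Returns 0 if the value is
 * acceptable, EINVAL otherwise.
 */
int
itimer_check(const struct timeval *tv)
{
	if (tv->tv_sec < 0 || tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return EINVAL;
	/* tv_sec is known to be non-negative here, so the cast is safe. */
	if ((unsigned long long)tv->tv_sec > UINT_MAX)
		return EINVAL;
	return 0;
}
```

Under this check a value of 100000001 seconds, which the old limit
rejected, is accepted, while UINT_MAX + 1 seconds still yields EINVAL.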
|
|
The kn_status field of struct knote is part of kqueue's internal state.
When kn_status is being updated, kq_lock has to be locked. This is true
even with MP-unsafe event filters.
OK mpi@
|
|
For example uvm_objinit() becomes uvm_obj_init(). Reduce differences
between the trees and help porting new functions needed for UVM object
locking.
No functional change.
|
|
unveil_lookup() is now doing a dumb linear search. The problem with the
uvshrink logic was that ps_uvpcwd was a pointer into this array and after
compaction it pointed to the wrong element. Also future unveil caches would
suffer from the same issue.
OK semarie@
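
The hazard described above, a cached pointer into an array that is later
compacted, can be reproduced in a few lines. This is an illustrative
sketch only: struct entry, compact() and lookup() are hypothetical and do
not mirror the unveil data structures:

```c
#include <stddef.h>

/* Hypothetical table entry; in_use == 0 marks a slot to be dropped. */
struct entry {
	int	id;
	int	in_use;
};

/*
 * Compact the table in place by shifting live entries toward the
 * front. Returns the new number of entries. Any pointer into the
 * table taken before this call may now refer to a different entry.
 */
size_t
compact(struct entry *tab, size_t n)
{
	size_t i, out = 0;

	for (i = 0; i < n; i++) {
		if (tab[i].in_use)
			tab[out++] = tab[i];
	}
	return out;
}

/* Plain linear search by id; returns NULL if the id is not present. */
struct entry *
lookup(struct entry *tab, size_t n, int id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].id == id)
			return &tab[i];
	}
	return NULL;
}
```

If a caller caches &tab[1] while slot 0 is unused, compact() moves the
entries down one slot and the cached pointer silently refers to the next
entry. Looking the entry up again after compaction, or never compacting,
avoids the problem.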
|
|
these functions were implemented in a bunch of places with comments
saying it should be moved to kern_tc.c when more pop up, and i was
about to add another one. i think it's time to move them to kern_tc.c.
ok cheloa@ jmatthew@
|
|
partitions into the disklabel.
First, since the alt header is never accessed there is no need to
worry about it being inaccessible.
Second, the GPT header claiming to cover more sectors than the
device has is no reason to ignore all the partitions. The
partition actually present could still be useful.
Issues encountered in the wild by mlarkin@ while accessing some
disk images.
ok deraadt@
|
|
We can reduce latency for the first expiration of a timer if we don't
round it_value up to the minimum interval (1 tick).
While we're at it, we may as well consolidate all input validation and
adjustment into a single itimerfix() call. There are no other callers
in the kernel (nor should there be), so remove the prototype from
sys/time.h.
Discussion: https://marc.info/?l=openbsd-tech&m=162084338005502&w=2
Tested by weerd@ and claudio@.
probably ok claudio@
|
|
Reported-by: syzbot+c2aba7645a218ce03027@syzkaller.appspotmail.com
|
|
Extend struct kqueue with a mutex and use it to serialize the internals
of each kqueue instance. This should make it possible to call kqueue's
system call interface without the kernel lock. The event source facing
side of kqueue should now be MP-safe, too, as long as the event source
itself is MP-safe.
msleep() with PCATCH still requires the kernel lock. To cope with
this, kqueue_scan() locks the kernel temporarily for the section that
may sleep.
As a consequence of the kqueue mutex, knote_acquire() can lose a wakeup
when klist_invalidate() calls it. To preserve proper nesting of mutexes,
knote_acquire() has to release the kqueue mutex before it unlocks klist.
This early unlocking of the mutex lets badly timed wakeups go unnoticed.
However, the system should not hang because the sleep has a timeout.
Tested by gnezdo@ and mpi@
OK mpi@
|
|
remove two leftover checks which were used when ni_unveil was used with UNVEIL_INSPECT.
it was used by:
- readlink(2) - removed 2019-08-31
- stat(2) and access(2) - removed 2019-03-24
ok claudio@
|
|
This socket flag was redundant with the socket buffer one.
ok mvs@
|