Age | Commit message (Collapse) | Author |
|
|
|
OK mvs@
|
|
without the KERNEL_LOCK.
This moves VXLOCK and VXWANT to a mutex protected v_lflag field and also
v_lockcount is protected by this mutex.
The vn_lock() dance is overly complex and all of this should probably replaced
by a proper lock on the vnode but such a diff is a lot more complex. This
is an intermediate step so that at least some calls can be modified to grab
the KERNEL_LOCK later or not at all.
OK mpi@
|
|
It makes clearer which vop functions are real fileystem-implementations and which one are only stubs.
No functional changes are intented.
ok visa@
|
|
ok mpi@
|
|
|
|
|
|
This makes appear some redundant & racy checks.
ok semarie@
|
|
Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
These functions assume that the caller has locked the klist. The current
state of locking remains intact because the kernel lock is still used
with all klists.
Add new functions klist_insert() and klist_remove() that lock the klist
internally. This allows some code simplification.
OK mpi@
|
|
Requires sysctl_bounded_arr branch to support sysctl_rdint.
The read-only variables are marked by an empty range of [1, 0].
OK millert@
|
|
From dholland@NetBSD
ok anton@
|
|
ok kn@
|
|
This is the documented behaviour which was changed by pedro in rev 1.81
which was partially backed out in rev 1.82.
|
|
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
|
|
ncg * ipg calcualtion can overflow if signed types are used. Move
to uint32_t for the relevant values. Aligned with FreeBSD changes.
Also make sure newfs refuses to create an fs with more that 2^32-1
inodes. ok millert@
|
|
While here prefix kernel-only EV flags with two underbars.
Suggested by kettenis@, ok visa@
|
|
Adapt FS kqfilters to always return true when the flag is set and bypass
the polling mechanism of the NFS thread.
While here implement a write filter for NFS.
ok visa@
|
|
to find another, instead of first forward and then backward. The latter method
causes most full cgs to end up at the end of the partition. From FreeBSD.
ok millert@
|
|
from the range [1..UINT_MAX] initially. On inode re-use increment
and on wrap refill from the range [1..UINT_MAX-1] to avoid
assigning UINT_MAX (the original value). Zero still means uninitialized.
ok millert@
|
|
Just like most of the vop_* methods in MFS they aren't used.
ok millert@, visa@
|
|
for example, with locking assertions.
OK mpi@, anton@
|
|
If process_worklist_item() is unable to process locked vnodes,
num_on_worklist will still be non-zero, preventing the loop in
softdep_process_worklist() from exiting. This can result in a
kernel hang.
To fix this, process_worklist_item() now returns non-zero if it was
able to process a worklist item (regardless of whether it matched
the mountpoint) and takes a pointer to matchcnt as a function
argument. We now break out of the loop in softdep_process_worklist()
if process_worklist_item() is unable to make progress.
OK beck@ bluhm@
|
|
- ufs_chown() & ufs_chmod()
- ufs_reclaim()
- ext2fs_chown() & ext2fs_chmod()
- ntfs_ntget() & ntfs_ntput()
- ntfs_vgetex(), ntfs_ntlookup() & ntfs_ntlookupfile()
While here use `ap->a_p' directly when it is only required to re-enter
the VFS layer in order to help reducing the loop.
ok visa@
|
|
ok cheloha@
|
|
|
|
adding more filter properties without cluttering the struct.
OK mpi@, anton@
|
|
Do not include <sys/kthread.h> where it is not needed and stop including
<sys/proc.h> in it.
ok visa@, anton@
|
|
and to CLRSIG.
OK mpi@ visa@
|
|
OK florian@, bluhm@, visa@
|
|
Only call bread_cluster if either the previously read ffs block is
adjacent to the current block or if the current read request exceeds the
current ffs block. This effectively turns off read-ahead for random reads
that fall within one ffs block.
okay beck@, mpi@, visa@
|
|
into read-only data segment.
OK deraadt@ tedu@
|
|
ok bluhm@
|
|
within softdep_process_worklist. When this is called from the syncer
a vnode may be legtitimately locked by someone waiting for buffers
so we need to skip anything locked. FreeBSD appears to have a similar
change. This avoids a deadlock where the syncer ends up waiting for
the inode lock that his held by someone waiting for buffer space.
Found by bluhm@ and some genua folks
ok bluhm@
|
|
make the structs const so that the data are put in .rodata.
OK mpi@, deraadt@, anton@, bluhm@
|
|
OK visa@
|
|
Ensure that io to a file backing a vnd is IO_SYNC, so IO to a
vnd device is both synchronous and not cached in the buffer cache.
This allows the "mount" regress to work repeatably, and avoids
a situation where when the buffer cache cleaner runs to clear
dirty buffers while people are waiting, it actually increases the
dirty buffers when the writes to the underlying vnd are also
delayed.
ok bluhm@
|
|
ok bluhm@, cheloha@
|
|
sweep tree to correct NDIINT op and flags ahead of time. document
the requirement. This allows KERNELPATH to be used to bypass
unveil for crash dumps with nosuidcoredump=2 or 3
ok visa@ deraadt@ florian@
|
|
serializing both read/write operations using the existing file mutex.
The vnode lock still grants exclusive write access to the offset; the
mutex is only used to make the actual write atomic and prevent any
concurrent reader from observing intermediate values.
ok mpi@ visa@
|
|
|
|
|
|
|
|
https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2
ok anton@
|
|
as part of the effort to unlock the kernel. Instead of relying on the
vnode lock, introduce a dedicated lock per file. Exclusive write access
is granted using the new foffset_enter and foffset_leave API. A
convenience function foffset_get is also available for threads that only
need to read the current offset.
The lock acquisition order in vn_write has been changed to match the one
in vn_read in order to avoid a potential deadlock. This change also gets
rid of a documented race in vn_read().
Inspired by the FreeBSD implementation.
With help and ok mpi@ visa@
|
|
Based on FreeBSD r320578.
While here, rename a few macros to make the consisten and keep in sync with
Linux upstream.
ok kn@
|
|
uvm_vnp_setsize() which is not free from side-effects.
ok visa@
|
|
a reported baddir panic. Discussed with guenther tedu kettenis millert..
|
|
incorrectly placed underneath the last 4 bytes (and then overwritten)
rather than afterwards.
We got confused and followed FreeBSD's lead, which curiously increased
the leakage of kernel stack from 3 bytes to 4...
ok millert kettenis
|
|
With some gritty work up to 254 bytes can be discovered. More details at
https://svnweb.freebsd.org/changeset/base/347066
The impact on OpenBSD is very limited:
1 - such stack bytes can be found in raw-device reads, from group operator.
If you can read the raw disks you can undertake other more powerful actions.
2 - read(2) upon directory fd was disabled July 1997 because I didn't like
how grep * would display garbage and mess up the tty, and applying vis(3)
for just directory reads seemed silly. read(2) was changed to return
0 (EOF). Sep 2016 this was further changed to EISDIR, so you still cannot
see the bad bytes.
3 - In 2013 when guenther adapted the getdents(2) directory-reading system
call to 64-bit ino_t, the userland data format changed to 8-byte-alignment,
making it incompatible with the 4-byte-alignment UFS on-disk format. As
a result of code refactoring the bad bytes were not copied to userland.
Bad bytes will remain in old directories on old filesystems, but nothing makes
those bytes user visible. There will be no errata or syspatch issued. I
urge other systems which do expose the information to userland to issue
errata quickly, since this is a 254 byte infoleak of the stack which is great
for ROP-chain building to attack some other bug. Especially if the kernel
has no layout/link-order randomization ...
ok kettenis jca millert otto ...
|
|
ok millert otto kettenis
|