|
Move vfs_stall_barrier() from the fd layer into vn_lock() and the vfs layer.
In some cases it can result in a deadlock while suspending.
Discussed with mpi@ and deraadt@
|
|
ok semarie@
|
|
For now, only assert that the tree of pages is empty in uvm_obj_destroy().
This will soon be used to free the per-UVM object lock.
While here call uvm_obj_init() when new vnodes are allocated instead of
in uvn_attach(). Because vnodes and their associated UVM objects are
currently never freed, it isn't easy to know where/when to garbage
collect the associated lock. So simply check that the reference count of
a given object is 0 in uvn_attach().
Tested by many as part of a bigger diff.
ok kettenis@
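For illustration, a minimal sketch of the checks described above; the RBT
name (uvm_objtree), the page tree field (memt) and the reference count
field (uo_refs) are assumptions about the surrounding code, not part of
the committed diff:

void
uvm_obj_destroy(struct uvm_object *uobj)
{
        /* For now, only assert that no pages are left in the tree. */
        KASSERT(RBT_EMPTY(uvm_objtree, &uobj->memt));
}

/* In uvn_attach(): the object was already set up by uvm_obj_init() when
 * the vnode was allocated, so only check that it is unreferenced. */
KASSERT(uvn->u_obj.uo_refs == 0);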
|
|
This is a guard against pushing the lock too far in UVM's vnode land.
ok beck@
|
|
vfs stalling is used by suspend/resume and by vmt(4) to keep any
filesystem operation from altering the state on disk. All these
operations will call vn_lock and be stalled. Adjust vfs_stall_barrier()
to allow the lock owner to still progress so that suspend can sync
the filesystems after stalling vfs operations.
OK mpi@
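A minimal sketch of the adjustment, assuming an rwlock-based barrier; the
lock name and the rw_status() check are illustrative, not the committed
code:

struct rwlock vfs_stall_lock = RWLOCK_INITIALIZER("vfsstall");

void
vfs_stall_barrier(void)
{
        /* The suspending thread holds the write lock: let it through so
         * it can still sync the filesystems after stalling everyone else. */
        if (rw_status(&vfs_stall_lock) == RW_WRITE)
                return;

        /* All other lockers briefly take a read lock, which blocks for
         * as long as the stall is in effect. */
        rw_enter_read(&vfs_stall_lock);
        rw_exit_read(&vfs_stall_lock);
}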
|
|
(both kernel and userland bits)
GENERIC + VFSLCKDEBUG is broken with it.
|
|
This flag is currently used to mark or unmark a vnode to actively
check vnode locking semantics (when compiled with VFSLCKDEBUG).
Currently, the VLOCKSWORK flag isn't properly set for several FS
implementations which have full locking support. This commit enables
proper checking for them too (cd9660, udf, fuse, msdosfs, tmpfs).
Instead of using a particular flag, it directly checks whether
v_op->vop_islocked is nullop or not to decide whether to activate the
vnode locking checks.
ok mpi@
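A sketch of the resulting check; the ASSERT_VP_ISLOCKED name and macro
shape are illustrative, only the v_op->vop_islocked vs. nullop comparison
comes from the message:

#ifdef VFSLCKDEBUG
/* Filesystems without real locking install the nullop stub as vop_islocked. */
#define ASSERT_VP_ISLOCKED(vp) do {                                     \
        if ((vp)->v_op->vop_islocked != nullop && !VOP_ISLOCKED(vp)) { \
                VOP_PRINT(vp);                                          \
                panic("vp not locked");                                 \
        }                                                               \
} while (0)
#else
#define ASSERT_VP_ISLOCKED(vp) /* nothing */
#endif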
|
|
|
|
without the KERNEL_LOCK.
This moves VXLOCK and VXWANT to a mutex protected v_lflag field and also
v_lockcount is protected by this mutex.
The vn_lock() dance is overly complex and all of this should probably be replaced
by a proper lock on the vnode but such a diff is a lot more complex. This
is an intermediate step so that at least some calls can be modified to grab
the KERNEL_LOCK later or not at all.
OK mpi@
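A rough sketch of the intermediate step; only v_lflag, v_lockcount,
VXLOCK and VXWANT are taken from the message, the mutex field name
(v_mtx) and the sleep/wakeup details are assumptions:

/* Wait for an exclusively locked (dying) vnode without the KERNEL_LOCK:
 * the flags and the lock count are serialized by the vnode mutex. */
mtx_enter(&vp->v_mtx);
while (vp->v_lflag & VXLOCK) {
        vp->v_lflag |= VXWANT;
        msleep_nsec(vp, &vp->v_mtx, PINOD, "vnwait", INFSLP);
}
vp->v_lockcount++;
mtx_leave(&vp->v_mtx);

error = VOP_LOCK(vp, flags);

mtx_enter(&vp->v_mtx);
vp->v_lockcount--;
/* The cleaning side can wait for v_lockcount to drain before teardown. */
if (vp->v_lockcount == 0)
        wakeup_one(&vp->v_lockcount);
mtx_leave(&vp->v_mtx);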
|
|
of the v_un pointers).
OK jsg@ mvs@
|
|
"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008.
Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c
the only visible difference between used and stub ctldebug structs in the
debugvars[] array is their extern keyword, indicating that they are
defined elsewhere.
sys/sysctl.h declares all debugN members as extern upfront, but these
declarations are not needed.
Remove the unused debug sysctl, rename the only remaining one to something
meaningful and remove forward declarations from /sys/sysctl.h; this way,
adding new debug sysctls is a matter of adding extern and coming up with a
name, which is nicer to read on its own and better to grep for.
OK mpi
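For illustration, assuming the classic two-member ctldebug layout, adding
a debug sysctl now looks roughly like this (debug_my_knob and "my-knob"
are made-up names):

/* In the subsystem that owns the knob. */
int my_knob;
struct ctldebug debug_my_knob = { "my-knob", &my_knob };

/* In kern/kern_sysctl.c: one extern line plus one array slot. */
extern struct ctldebug debug_my_knob;
struct ctldebug *debugvars[] = {
        &debug_my_knob,
};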
|
|
Adding "debug.my-knob" sysctls is really helpful to select different
code paths and/or log on demand during runtime without recompile,
but as this code is under DEBUG, lots of other noise comes with it
which is often undesired, at least when looking at specific subsystems
only.
Adding globals to the kernel and breaking into DDB to change them helps,
but that does not work over SSH, hence the need for debug sysctls.
Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of
DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general
option for all of sysctl(2).
OK gnezdo
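One plausible shape of the guard, for illustration only (the exact
condition in the tree may differ):

/* kern/kern_sysctl.c: keep the "debug" MIB handler with either option. */
#if defined(DEBUG) || defined(DEBUG_SYSCTL)
        case CTL_DEBUG:
                return (debug_sysctl(name + 1, namelen - 1, oldp, oldlenp,
                    newp, newlen, p));
#endif

A kernel configured with "option DEBUG_SYSCTL" then exposes the knobs at
runtime, e.g. sysctl debug.my-knob=1.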
|
|
several problems with the vnode exclusive lock implementation, I
overlooked the fact that a vnode can be in a state where the usecount is
zero while the holdcount is still positive. There could still be
threads waiting on the vnode lock in uvn_io() as long as the holdcount
is positive.
"go ahead" mpi@
Reported-by: syzbot+767d6deb1a647850a0ca@syzkaller.appspotmail.com
|
|
VOP_LOCK with LK_DRAIN. This simplifies VOP_LOCK() a fair bit.
OK visa@
|
|
into read-only data segment.
OK deraadt@ tedu@
|
|
unmount this list is traversed and the dirty vnodes are flushed to
disk. Forced unmount expects that the list is empty after flushing,
otherwise the kernel panics with "dangling vnode". As the write
to disk can sleep, new vnodes may be inserted. If softdep is
enabled, resolving the dependencies creates new dirty vnodes and
inserts them to the list. To fix the panic, let insmntque() insert
new vnodes at the tail of the list. Then vflush() will still catch
them while traversing the list in forward direction.
OK tedu@ millert@ visa@
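A sketch of the fix based on the usual shape of insmntque(); the list and
entry field names (mnt_vnodelist, v_mntvnodes) are assumptions:

void
insmntque(struct vnode *vp, struct mount *mp)
{
        /* Delete from the old mount point's vnode list, if on one. */
        if (vp->v_mount != NULL)
                TAILQ_REMOVE(&vp->v_mount->mnt_vnodelist, vp, v_mntvnodes);
        /* Insert at the tail so a concurrent vflush(), walking the list
         * from the head, still reaches vnodes added while it slept. */
        if ((vp->v_mount = mp) != NULL)
                TAILQ_INSERT_TAIL(&mp->mnt_vnodelist, vp, v_mntvnodes);
}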
|
|
This is not necessary as the loop is restarted after vgone(). Switch
to SLIST_FOREACH without _SAFE.
OK visa@
|
|
the vnode alias code more readable.
OK visa@
|
|
|
|
ok visa@, jca@
|
|
ensure that any other thread currently trying to acquire the underlying
vnode lock has observed that the same vnode is about to be exclusively
locked. Such threads must then sleep until the exclusive lock has been
released and then try to acquire the lock again. Otherwise, exclusive
access to the vnode cannot be guaranteed.
Thanks to naddy@ and visa@ for testing; ok visa@
Reported-by: syzbot+374d0e7e2400004957f7@syzkaller.appspotmail.com
|
|
|
|
|
|
This removes a system-wide serialization point, which might help
finding timing-related bugs.
OK deraadt@ anton@
|
|
mlarkin@ noticed we would freeze while removing enormous files because
of the amount of work done to invalidate buffers on unlink. This adds
a temporary workaround to ensure we give up the lock and yield while
doing this.
The longer term answer will be to move these buffers to another list
and not do the work here.
ok deraadt@
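A rough sketch of such a workaround; the loop shape, the threshold and
the place it hooks in are invented for illustration:

int count = 0;

while ((bp = LIST_FIRST(&vp->v_dirtyblkhd)) != NULL) {
        /* invalidate bp ... */
        if (++count % 1024 == 0) {
                /* Periodically give up the vnode lock and the CPU so the
                 * rest of the system keeps making progress. */
                VOP_UNLOCK(vp);
                yield();
                vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
        }
}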
|
|
and lf_purgelocks() without the kernel lock.
OK anton@ mpi@
|
|
obvious misconfigurations that cannot work.
OK mpi@ tedu@
|
|
lead to lost errors, where a later fsync will return success. to fix this,
set a flag on the vnode indicating a past error has occurred, and return
an error for future fsync calls.
ok bluhm deraadt visa
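A sketch of the idea; the flag name is an assumption:

/* When a write-back fails, remember it on the vnode. */
if (error)
        vp->v_bioflag |= VBIOERROR;

/* Later, in the fsync path, surface the recorded failure even if the
 * current pass succeeded. */
error = VOP_FSYNC(vp, cred, MNT_WAIT, p);
if (error == 0 && (vp->v_bioflag & VBIOERROR))
        error = EIO;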
|
|
structure allows for better tracking of pending lock operations which is
essential in order to prevent a use-after-free once the underlying vnode is
gone.
Inspired by the lockf implementation in FreeBSD.
ok visa@
Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
|
|
protected properly and files without any x bit set were accidentally considered
executable when checked with access(2).
Issues found and reported by deraadt, halex, reyk, tb
ok deraadt
|
|
ok visa@
|
|
to unsigned int.
OK deraadt@
|
|
dedicated functions.
OK deraadt@ mpi@
|
|
We were using spacing after ellipses in an inconsistent way in the
installer. Standardize on using "... " everywhere and take into account
the cursor position while we are waiting for the task to complete: the
cursor is now always positioned after the last dot, and the space is
added when displaying completion confirmation.
While there, also take cursor position into account in vfs_shutdown(),
and remove the extra leading space before ticks in dhclient.
OK deraadt@
|
|
Because loadable kernel modules are no longer supported, there is no need to
register or unregister filesystem implementations at runtime. Remove
vfs_register() and vfs_unregister(), and make vfsinit() call vfs_init
routines directly. Replace the linked list of vfsconf structs with
the vfsconflist[] array.
OK mpi@ bluhm@
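Roughly, a static table replaces the registration API; the entries and
field names below are illustrative, not the committed table:

static struct vfsconf vfsconflist[] = {
#ifdef FFS
        { .vfc_name = MOUNT_FFS, .vfc_vfsops = &ffs_vfsops },
#endif
#ifdef NFSCLIENT
        { .vfc_name = MOUNT_NFS, .vfc_vfsops = &nfs_vfsops },
#endif
};

void
vfsinit(void)
{
        struct vfsconf *vfsp;
        int i;

        /* No more vfs_register(): call each filesystem's init directly. */
        for (i = 0; i < nitems(vfsconflist); i++) {
                vfsp = &vfsconflist[i];
                vfsp->vfc_vfsops->vfs_init(vfsp);
        }
}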
|
|
OK bluhm@
|
|
This brings unveil into the tree, disabled by default. Currently
this will return EPERM on all attempts to use it until we are
fully certain it is ready for people to start using, but this
now allows others to do more tweaking and experimentation.
Still needs to send the unveils across forks and execs before
fully enabling.
Many thanks to robert@ and deraadt@ for extensive testing.
ok deraadt@
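For reference, a minimal userland use of the new syscall; while the
feature is disabled it simply fails with EPERM, as noted above:

#include <err.h>
#include <unistd.h>

int
main(void)
{
        /* Restrict this process's view of the filesystem to /var/www, read-only. */
        if (unveil("/var/www", "r") == -1)
                err(1, "unveil");       /* EPERM until the feature is enabled */
        return 0;
}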
|
|
OK mpi@
|
|
to call vfs_busy() for all nested mount points. vfs_stall() called
vfs_busy() in reverse order for all mount points. Change the
direction of the latter to resolve the lock order conflict.
OK visa@
|
|
Use that in three places:
- vfs_stall()
- sys_mount()
- dounmount()'s MNT_FORCE-does-recursive-unmounts case
ok deraadt@ visa@
|
|
OK mpi@
|
|
The loop variable mp is protected by vfs_busy() so that it cannot
be unmounted. But the next mount point nmp could be unmounted while
VFS_SYNC() sleeps. As the loop in vfs_stall() does not destroy the
mount point, TAILQ_FOREACH_REVERSE without _SAFE is the correct
macro to use.
OK deraadt@ visa@
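A sketch of the loop shape (the sync work itself is elided); mp is pinned
by vfs_busy() and nothing is removed from the list, so the plain reverse
iterator is enough:

TAILQ_FOREACH_REVERSE(mp, &mountlist, mntlist, mnt_list) {
        if (vfs_busy(mp, VB_WRITE | VB_WAIT))
                continue;
        /* VFS_SYNC() may sleep here; mp itself stays busied and on the
         * list, and the previous element is re-read from mp on the next
         * iteration instead of being cached beforehand as _SAFE would. */
}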
|
|
change and this has nothing to do with it.
ok visa@, bluhm@
|
|
OK mpi@
|
|
unnecessary because curproc always does the locking.
OK mpi@
|
|
curproc that does the locking or unlocking, so the proc parameter
is pointless and can be dropped.
OK mpi@, deraadt@
|
|
are corner cases where ffs may leak blocks. So better revert and
unmount all file systems at reboot. The "init died" panic will be
fixed in a different way.
OK deraadt@
|
|
are pushed to disk. Dangling vnodes (unlinked files still in use) and
vnodes undergoing change by long-running syscalls are identified -- and
such filesystems are marked dirty on-disk while we are suspended (in case
power is lost, a fsck will be required). Filesystems without dangling or
busy vnodes are marked clean, resulting in faster boots following
"battery died" circumstances.
Tested by numerous developers, thanks for the feedback.
|
|
time; the aggressive mountpoint destruction seems to hit insane
use-after-frees when we are already far on the way down.
|
|
Change mountpoint to RDONLY a little later. Seems to improve the
rw->ro transition a bit.
|