serializing both read/write operations using the existing file mutex.
The vnode lock still grants exclusive write access to the offset; the
mutex is only used to make the actual write atomic and prevent any
concurrent reader from observing intermediate values.
ok mpi@ visa@
|
|
https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2
ok anton@
|
|
as part of the effort to unlock the kernel. Instead of relying on the
vnode lock, introduce a dedicated lock per file. Exclusive write access
is granted using the new foffset_enter and foffset_leave API. A
convenience function foffset_get is also available for threads that only
need to read the current offset.
The lock acquisition order in vn_write has been changed to match the one
in vn_read in order to avoid a potential deadlock. This change also gets
rid of a documented race in vn_read().
Inspired by the FreeBSD implementation.
With help and ok mpi@ visa@
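
The enter/leave pattern described above can be illustrated with a small userland sketch. This is an analogy only and not the kernel code: struct sketch_file, the sketch_* function names, their signatures and the use of a pthread mutex are all assumptions made for illustration; the real foffset_enter, foffset_leave and foffset_get may look quite different.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>

/* Hypothetical stand-in for a file object; not the kernel's struct file. */
struct sketch_file {
	pthread_mutex_t	sf_offlock;	/* dedicated per-file offset lock */
	off_t		sf_offset;	/* current file offset */
};

void
sketch_file_init(struct sketch_file *fp)
{
	pthread_mutex_init(&fp->sf_offlock, NULL);
	fp->sf_offset = 0;
}

/* Take exclusive access to the offset and return its current value. */
off_t
sketch_foffset_enter(struct sketch_file *fp)
{
	pthread_mutex_lock(&fp->sf_offlock);
	return fp->sf_offset;
}

/* Publish the new offset and give up exclusive access. */
void
sketch_foffset_leave(struct sketch_file *fp, off_t offset)
{
	assert(offset >= 0);
	fp->sf_offset = offset;
	pthread_mutex_unlock(&fp->sf_offlock);
}

/* Convenience accessor for threads that only need to read the offset. */
off_t
sketch_foffset_get(struct sketch_file *fp)
{
	off_t offset;

	pthread_mutex_lock(&fp->sf_offlock);
	offset = fp->sf_offset;
	pthread_mutex_unlock(&fp->sf_offlock);
	return offset;
}

/* Simulated write path: the offset is owned for the whole operation. */
off_t
sketch_write(struct sketch_file *fp, size_t len)
{
	off_t offset = sketch_foffset_enter(fp);

	/* ... the actual I/O at `offset` would happen here ... */
	offset += (off_t)len;
	sketch_foffset_leave(fp, offset);
	return offset;
}
```

In this sketch a reader and a writer would both go through enter and leave in the same order relative to any other lock they take, which is the kind of consistent ordering the message refers to for vn_read and vn_write.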
|
|
Based on FreeBSD r320578.
While here, rename a few macros to make them consistent and keep them in sync
with Linux upstream.
ok kn@
|
|
uvm_vnp_setsize() which is not free from side-effects.
ok visa@
|
|
a reported baddir panic. Discussed with guenther tedu kettenis millert.
|
|
incorrectly placed underneath the last 4 bytes (and then overwritten)
rather than afterwards.
We got confused and followed FreeBSD's lead, which curiously increased
the leakage of kernel stack from 3 bytes to 4...
ok millert kettenis
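
As a rough illustration of the intent (zero the bytes after the name, not the bytes under it), here is a minimal sketch. The struct layout, the 4-byte alignment constant and every sketch_* name are assumptions for illustration; this is not the UFS code touched by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ALIGN		4	/* assumed on-disk record alignment */
#define SKETCH_MAXNAMLEN	255

/* Hypothetical directory entry, loosely modelled on a UFS-style record. */
struct sketch_direct {
	uint32_t d_ino;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
	char     d_name[SKETCH_MAXNAMLEN + 1];
};

/* Header + name + NUL, rounded up to the record alignment. */
size_t
sketch_dirsiz(size_t namlen)
{
	size_t raw = offsetof(struct sketch_direct, d_name) + namlen + 1;

	return (raw + SKETCH_ALIGN - 1) & ~(size_t)(SKETCH_ALIGN - 1);
}

/*
 * Fill an entry and zero everything between the end of the name and the
 * end of the record, so no stale bytes remain in the padding.
 * Returns the record length, or 0 if the name is too long.
 */
size_t
sketch_fill_direct(struct sketch_direct *dp, uint32_t ino, const char *name)
{
	size_t namlen, reclen, name_room;

	assert(dp != NULL && name != NULL);
	namlen = strlen(name);
	if (namlen > SKETCH_MAXNAMLEN)
		return 0;

	reclen = sketch_dirsiz(namlen);
	name_room = reclen - offsetof(struct sketch_direct, d_name);

	dp->d_ino = ino;
	dp->d_reclen = (uint16_t)reclen;
	dp->d_type = 0;
	dp->d_namlen = (uint8_t)namlen;
	memcpy(dp->d_name, name, namlen);
	/* Zero the terminator and all padding after the name. */
	memset(dp->d_name + namlen, 0, name_room - namlen);
	return reclen;
}
```

The zeroing starts at d_name + namlen, so it covers the NUL and the padding without ever overlapping the name bytes just copied.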
|
|
With some gritty work up to 254 bytes can be discovered. More details at
https://svnweb.freebsd.org/changeset/base/347066
The impact on OpenBSD is very limited:
1 - such stack bytes can be found in raw-device reads, from group operator.
If you can read the raw disks you can undertake other more powerful actions.
2 - read(2) upon directory fd was disabled July 1997 because I didn't like
how grep * would display garbage and mess up the tty, and applying vis(3)
for just directory reads seemed silly. read(2) was changed to return
0 (EOF). Sep 2016 this was further changed to EISDIR, so you still cannot
see the bad bytes.
3 - In 2013 when guenther adapted the getdents(2) directory-reading system
call to 64-bit ino_t, the userland data format changed to 8-byte-alignment,
making it incompatible with the 4-byte-alignment UFS on-disk format. As
a result of code refactoring the bad bytes were not copied to userland.
Bad bytes will remain in old directories on old filesystems, but nothing makes
those bytes user visible. There will be no errata or syspatch issued. I
urge other systems which do expose the information to userland to issue
errata quickly, since this is a 254 byte infoleak of the stack which is great
for ROP-chain building to attack some other bug. Especially if the kernel
has no layout/link-order randomization ...
ok kettenis jca millert otto ...
|
|
ok millert otto kettenis
|
|
ok deraadt@
|
|
perhaps not enough for everyone, but we'll see what happens.
|
|
structure allows for better tracking of pending lock operations which is
essential in order to prevent a use-after-free once the underlying vnode is
gone.
Inspired by the lockf implementation in FreeBSD.
ok visa@
Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
|
|
protected properly and files without any x bit set were accidentally considered
executable when checked with access(2).
Issues found and reported by deraadt, halex, reyk, tb
ok deraadt
|
|
dedicated functions.
OK deraadt@ mpi@
|
|
does.
ok deraadt@ kettenis@
|
|
`mount -uo async,nosoftdep /mnt' would set "async" but keep "softdep"
untouched on a read/write mount.
OK deraadt krw beck bluhm
|
|
OK mpi@
|
|
implementations. Rely on the VFS layer to do the checking.
OK mpi@, helg@
|
|
of mounted on directories.
OK guenther@, mpi@
|
|
unlocking the directory vnode.
OK mpi@, helg@
|
|
just as is done in ffs_reload().
Requested by and OK bluhm@
|
|
the file allocation routine. This allows stepwise changing of the vnode
locking discipline.
OK mpi@
|
|
of the vinvalbuf() call, just like is done by other filesystems. This
prevents a kernel panic with VFSLCKDEBUG.
OK mpi@
|
|
OK mpi@
|
|
processed by the knote() hook. This ensures the vnode does not get
freed or reused too early.
OK mpi@, guenther@
|
|
unnecessary because curproc always does the locking.
OK mpi@
|
|
curproc that does the locking or unlocking, so the proc parameter
is pointless and can be dropped.
OK mpi@, deraadt@
|
|
OK millert@ visa@
|
|
Fixes softdep+UFS2. Found out the hard way by naddy@
ok visa@ naddy@ deraadt@
|
|
OK visa@
|
|
play with WITNESS.
ok visa@
|
|
safely to disk. The subsystem seems to be working as intended! :)
|
|
The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.
No objection from millert@, ok tedu@, bluhm@
|
|
are pushed to disk. Dangling vnodes (unlinked files still in use) and
vnodes undergoing change by long-running syscalls are identified -- and
such filesystems are marked dirty on-disk while we are suspended (in case
power is lost, a fsck will be required). Filesystems without dangling or
busy vnodes are marked clean, resulting in faster boots following
"battery died" circumstances.
Tested by numerous developers, thanks for the feedback.
|
|
was cleared on alloc just like we do in ffs_write().
|
|
fails, adapted from FreeBSD. Also avoid clearing the buffer if it
was cleared when allocated. OK deraadt@ otto@
|
|
ok millert@ sthen@
|
|
ok deraadt@ krw@
|
|
ok millert@ krw@
|
|
Change mountpoint to RDONLY a little later. Seems to improve the
rw->ro transition a bit.
|
|
other pending inode attribute changes. We appear to be missing UFS_UPDATE()
calls in some paths with the result that bsd.rd remounting the newly
created /mnt to RO would lose the GID changes on device inodes there.
This only affected devices, as they're the only inodes where timestamp
writes are delayed.
ok deraadt@
|
|
The deadlock happens when softdep gets the same buffer in the BMSAFEMAP case
that it already called getdirtybuf() on and made busy at the top of the loop.
When this is the case, skip the BMSAFEMAP case and simply write the buffer
out at the bottom of the loop as always. This avoids calling getdirtybuf()
a second time on the same buffer we already took for exclusive use ourselves
and have not yet written out.
While I'm in here add a KASSERT for the similar case above, which I don't
think can happen but we would deadlock in the same way if it does.
testing by and ok bluhm@
|
|
and can cause pending IO's on wd(4) to be thrown away. Still
trying to find a solution.
|
|
for blocks re-fetchable from the filesystem. However at reboot time,
filesystems are unmounted, and since processes lack backing store they
are killed. Since the scheduler is still running, in some cases init is
killed... which drops us to ddb [noted by bluhm]. Solution is to convert
filesystems to read-only [proposed by kettenis]. The tale follows:
sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which
completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT()
with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a
copyin() late... so store the sizes in vfsconflist[] and move the copyin()
to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill
legacy struct nfs_args3. Next we learn ffs_mount()'s MNT_UPDATE code is
sharp and rusty especially wrt softdep, so fix some bugs and add
~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help,
so tie them to &dead_vnops.
ffs_mount calling DIOCCACHESYNC is causing a bit of grief still but
this issue is separate and will be dealt with in time.
couple hundred reboots by bluhm and myself, advice from guenther and
others at the hut
|