path: root/sys/ufs
Age  Author  Commit message
2019-08-05  anton
Allow concurrent reads of the f_offset field of struct file by serializing both read/write operations using the existing file mutex. The vnode lock still grants exclusive write access to the offset; the mutex is only used to make the actual write atomic and prevent any concurrent reader from observing intermediate values. ok mpi@ visa@
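A minimal userland sketch of the locking idea in the entry above, assuming a pthread mutex in place of the kernel's file mutex. The names sketch_file, sketch_offset_read and sketch_offset_write are hypothetical and are not the kernel API. The caller of sketch_offset_write is assumed to already hold whatever outer lock grants exclusive write access, so the mutex only keeps a reader from seeing a half-updated 64-bit value.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct file; not the kernel definition. */
struct sketch_file {
	pthread_mutex_t	f_mtx;		/* protects f_offset loads and stores */
	int64_t		f_offset;	/* current file offset */
};

void
sketch_file_init(struct sketch_file *fp)
{
	assert(fp != NULL);
	pthread_mutex_init(&fp->f_mtx, NULL);
	fp->f_offset = 0;
}

/* Read the offset; safe to call concurrently with a writer. */
int64_t
sketch_offset_read(struct sketch_file *fp)
{
	int64_t off;

	pthread_mutex_lock(&fp->f_mtx);
	off = fp->f_offset;
	pthread_mutex_unlock(&fp->f_mtx);
	return off;
}

/*
 * Store a new offset. Assumption: the caller already holds an outer
 * exclusive lock, so writers never race each other; the mutex only
 * makes the store appear atomic to readers.
 */
void
sketch_offset_write(struct sketch_file *fp, int64_t off)
{
	pthread_mutex_lock(&fp->f_mtx);
	fp->f_offset = off;
	pthread_mutex_unlock(&fp->f_mtx);
}
```

The mutex is held only around the load or store itself, so readers never wait on the longer-held outer lock.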
2019-07-25  cheloha
vinvalbuf(9): tsleep -> tsleep_nsec(9); ok millert@
2019-07-19  cheloha
vwaitforio(9): tsleep(9) -> tsleep_nsec(9); ok visa@
2019-07-19  cheloha
getblk(9): tsleep(9) -> tsleep_nsec(9); ok visa@
2019-07-12  solene
Revert anton@ changes about read/write unlocking https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2 ok anton@
2019-07-10  anton
Make read/write of the f_offset field belonging to struct file MP-safe as part of the effort to unlock the kernel. Instead of relying on the vnode lock, introduce a dedicated lock per file. Exclusive write access is granted using the new foffset_enter and foffset_leave API. A convenience function foffset_get is also available for threads that only need to read the current offset. The lock acquisition order in vn_write has been changed to match the one in vn_read in order to avoid a potential deadlock. This change also gets rid of a documented race in vn_read(). Inspired by the FreeBSD implementation. With help and ok mpi@ visa@
2019-07-01  Kevin Lo
Add more verbose messages about unsupported ext2fs features. Based on FreeBSD r320578. While here, rename a few macros to make them consistent and keep in sync with Linux upstream. ok kn@
2019-06-18  anton
Ensure the length passed to ffs_truncate() is within bounds before calling uvm_vnp_setsize() which is not free from side-effects. ok visa@
2019-05-09  Theo de Raadt
Nope, the right byte layout is happening, but we still need to figure out a reported baddir panic. Discussed with guenther tedu kettenis millert..
2019-05-09  Theo de Raadt
For filenames which are a multiple of 4 bytes long, the zero pad is incorrectly placed underneath the last 4 bytes (and then overwritten) rather than afterwards. We got confused and followed FreeBSD's lead, which curiously increased the leakage of kernel stack from 3 bytes to 4... ok millert kettenis
2019-05-04  Theo de Raadt
3 bytes of kernel stack address space were leaked into on-disk directories. With some gritty work up to 254 bytes can be discovered. More details at https://svnweb.freebsd.org/changeset/base/347066
The impact on OpenBSD is very limited:
1 - such stack bytes can be found in raw-device reads, from group operator. If you can read the raw disks you can undertake other more powerful actions.
2 - read(2) upon directory fd was disabled July 1997 because I didn't like how grep * would display garbage and mess up the tty, and applying vis(3) for just directory reads seemed silly. read(2) was changed to return 0 (EOF). Sep 2016 this was further changed to EISDIR, so you still cannot see the bad bytes.
3 - In 2013 when guenther adapted the getdents(2) directory-reading system call to 64-bit ino_t, the userland data format changed to 8-byte-alignment, making it incompatible with the 4-byte-alignment UFS on-disk format. As a result of code refactoring the bad bytes were not copied to userland.
Bad bytes will remain in old directories on old filesystems, but nothing makes those bytes user visible. There will be no errata or syspatch issued. I urge other systems which do expose the information to userland to issue errata quickly, since this is a 254 byte infoleak of the stack which is great for ROP-chain building to attack some other bug. Especially if the kernel has no layout/link-order randomization ... ok kettenis jca millert otto ...
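A rough sketch of how the name field of a directory entry can be zero-padded out to a 4-byte boundary, so that no uninitialized bytes end up in the on-disk record. The struct layout, the sketch_ names and the value 4 for the rounding constant are assumptions for illustration only; this is not the UFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DIR_ROUNDUP	4	/* assumed record alignment */
#define SKETCH_MAXNAMLEN	255

/* Hypothetical directory entry; modelled loosely on a 4-byte-aligned format. */
struct sketch_direct {
	uint32_t	d_ino;
	uint16_t	d_reclen;
	uint8_t		d_type;
	uint8_t		d_namlen;
	char		d_name[SKETCH_MAXNAMLEN + 1];
};

/* Header + name + NUL, rounded up to the alignment. */
size_t
sketch_dirsiz(size_t namlen)
{
	size_t raw = offsetof(struct sketch_direct, d_name) + namlen + 1;

	return (raw + SKETCH_DIR_ROUNDUP - 1) &
	    ~(size_t)(SKETCH_DIR_ROUNDUP - 1);
}

/*
 * Fill an entry and explicitly zero everything between the end of the
 * name and the end of the record (the NUL plus up to 3 pad bytes).
 * Returns the record size.
 */
size_t
sketch_fill_entry(struct sketch_direct *dp, uint32_t ino, const char *name)
{
	size_t namlen = strlen(name);
	size_t recsz, namesz;

	assert(namlen <= SKETCH_MAXNAMLEN);
	recsz = sketch_dirsiz(namlen);
	namesz = recsz - offsetof(struct sketch_direct, d_name);

	dp->d_ino = ino;
	dp->d_reclen = (uint16_t)recsz;
	dp->d_type = 0;
	dp->d_namlen = (uint8_t)namlen;
	memcpy(dp->d_name, name, namlen);
	memset(dp->d_name + namlen, 0, namesz - namlen);
	return recsz;
}
```

Without the memset, whatever was previously in the buffer (for example stack contents) would be written out in the pad bytes.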
2019-05-04  Theo de Raadt
Add DIR_ROUNDUP define, from Kirk McKusick ok millert otto kettenis
2019-03-15  Kevin Lo
Remove FBSDID. ok deraadt@
2019-03-06  Ted Unangst
increase dirhash mem a bit since very tiny machines are less common. perhaps not enough for everyone, but we'll see what happens.
2019-01-21  anton
Introduce a dedicated entry point data structure for file locks. This new data structure allows for better tracking of pending lock operations which is essential in order to prevent a use-after-free once the underlying vnode is gone. Inspired by the lockf implementation in FreeBSD. ok visa@ Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
2018-12-23  Martin Natano
Rectify some issues with the noperm mount flag; the root vnode was not protected properly and files without any x bit set were accidentally considered executable when checked with access(2). Issues found and reported by deraadt, halex, reyk, tb ok deraadt
2018-09-26  Visa Hankala
Move the allocating and freeing of mount points into dedicated functions. OK deraadt@ mpi@
2018-09-06  Jonathan Gray
fix whitespace
2018-07-21  anton
Include the vnode type in the panic message in ffs_write(), just like ffs_read() does. ok deraadt@ kettenis@
2018-07-11  kn
Prevent updating async option on softdep mount. `mount -uo async,nosoftdep /mnt' would set "async" but keep "softdep" untouched on a read/write mount. OK deraadt krw beck bluhm
2018-07-02  Alexander Bluhm
Use more list macros for v_dirtyblkhd. OK mpi@
2018-06-21  Visa Hankala
Drop redundant "node == parent node" checks from VOP_RMDIR() implementations. Rely on the VFS layer to do the checking. OK mpi@, helg@
2018-06-13  Visa Hankala
Make the VFS layer responsible for preventing the deletion of mounted on directories. OK guenther@, mpi@
2018-06-07  Visa Hankala
Make callers of VOP_CREATE(9) and VOP_MKNOD(9) responsible for unlocking the directory vnode. OK mpi@, helg@
2018-05-29  Visa Hankala
Lock the device vnode when calling vinvalbuf() in ext2fs_reload(), just as is done in ffs_reload(). Requested by and OK bluhm@
2018-05-28  Visa Hankala
Call vput(dvp) in vnode operation functions instead of calling it in the file allocation routine. This allows stepwise changing of the vnode locking discipline. OK mpi@
2018-05-28  Visa Hankala
When mounting an ext2 filesystem, lock the device vnode for the duration of the vinvalbuf() call, just like is done by other filesystems. This prevents a kernel panic with VFSLCKDEBUG. OK mpi@
2018-05-27  Visa Hankala
Drop unnecessary `p' parameter from vget(9). OK mpi@
2018-05-24  Visa Hankala
Delay the vput() of the directory vnode until the vnode has been processed by the knote() hook. This ensures the vnode does not get freed or reused too early. OK mpi@, guenther@
2018-05-02  Visa Hankala
Remove proc from the parameters of vn_lock(). The parameter is unnecessary because curproc always does the locking. OK mpi@
2018-04-28  Visa Hankala
Clean up the parameters of VOP_LOCK() and VOP_UNLOCK(). It is always curproc that does the locking or unlocking, so the proc parameter is pointless and can be dropped. OK mpi@, deraadt@
2018-04-02  David Hill
Add size to free() OK millert@ visa@
2018-04-01  David Hill
Store the size of dinode contents union. Fixes softdep+UFS2. Found out the hard way by naddy@ ok visa@ naddy@ deraadt@
2018-03-30  David Hill
Add sizes to some free() calls. OK visa@
2018-03-28  Martin Pieuchot
Mark ext2fs inode recursive lock as RWL_IS_VNODE like for ffs to let it play with WITNESS. ok visa@
2018-03-15  Theo de Raadt
"force dirty" printf's are no longer required when pushing filesystems safely to disk. The subsystem seems to be working as intended! :)
2018-02-19  Martin Pieuchot
Remove almost unused `flags' argument of suser(). The account flag `ASU' will no longer be set but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
2018-02-10  Theo de Raadt
Synchronize filesystems to disk when suspending. Each mountpoint's vnodes are pushed to disk. Dangling vnodes (unlinked files still in use) and vnodes undergoing change by long-running syscalls are identified -- and such filesystems are marked dirty on-disk while we are suspended (in case power is lost, a fsck will be required). Filesystems without dangling or busy vnodes are marked clean, resulting in faster boots following "battery died" circumstances. Tested by numerous developers, thanks for the feedback.
2018-01-13  Todd C. Miller
In ext2fs_write(), clear the buffer on uiomove() failure unless it was cleared on alloc just like we do in ffs_write().
2018-01-13  Todd C. Miller
Add comment describing why we need to clear the buffer if uiomove() fails, adapted from FreeBSD. Also avoid clearing the buffer if it was cleared when allocated. OK deraadt@ otto@
2018-01-08  Todd C. Miller
Pass correct size to uvm_vnp_setsize() for large files.
2018-01-08  Todd C. Miller
Add kqueue support for ext2fs based on ffs. OK deraadt@
2018-01-02  Philip Guenther
Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined. ok millert@ sthen@
2017-12-30  Philip Guenther
Don't pull in <sys/file.h> just to get fcntl.h ok deraadt@ krw@
2017-12-30  Philip Guenther
Delete unnecessary <sys/file.h> includes ok millert@ krw@
2017-12-14  Theo de Raadt
Give vflush_vnode() a hint about vnodes we don't need to account as "busy". Change mountpoint to RDONLY a little later. Seems to improve the rw->ro transition a bit.
2017-12-14  Philip Guenther
If switching RW->RO, then stop deferring timestamp writes...and possibly other pending inode attribute changes. We appear to be missing UFS_UPDATE() calls in some paths with the result that bsd.rd remounting the newly created /mnt to RO would lose the GID changes on device inodes there. This only affected devices, as they're the only inodes where timestamp writes are delayed. ok deraadt@
2017-12-13  Bob Beck
Fix a softdep bug exposed by our recent changes to make reboot drop to read-only. The deadlock happens when softdep gets the same buffer in the BMSAFEMAP case that it already called getdirtybuf() on and made busy at the top of the loop. When this is the case, skip the BMSAFEMAP case and simply write the buffer out at the bottom of the loop as always. This avoids calling getdirtybuf() a second time on the same buffer we already took for exclusive use ourselves and have not yet written out. While I'm in here add a KASSERT for the similar case above, which I don't think can happen but we would deadlock in the same way if it does. testing by and ok bluhm@
2017-12-11  Theo de Raadt
Disable DIOCCACHESYNC code, which I believe does the opposite and can cause pending IO's on wd(4) to be thrown away. Still trying to find a solution.
2017-12-11  Theo de Raadt
In uvm Chuck decided backing store would not be allocated proactively for blocks re-fetchable from the filesystem. However at reboot time, filesystems are unmounted, and since processes lack backing store they are killed. Since the scheduler is still running, in some cases init is killed... which drops us to ddb [noted by bluhm]. Solution is to convert filesystems to read-only [proposed by kettenis]. The tale follows: sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT() with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a copyin() late... so store the sizes in vfsconflist[] and move the copyin() to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill legacy struct nfs_args3. Next we learn ffs_mount()'s MNT_UPDATE code is sharp and rusty especially wrt softdep, so fix some bugs and add ~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help, so tie them to &dead_vnops. ffs_mount calling DIOCCACHESYNC is causing a bit of grief still but this issue is separate and will be dealt with in time. couple hundred reboots by bluhm and myself, advice from guenther and others at the hut