path: root/sys/ufs
Age  Commit message  Author
2020-12-25  Refactor klist insertion and removal  Visa Hankala
Rename klist_{insert,remove}() to klist_{insert,remove}_locked(). These functions assume that the caller has locked the klist. The current state of locking remains intact because the kernel lock is still used with all klists. Add new functions klist_insert() and klist_remove() that lock the klist internally. This allows some code simplification. OK mpi@
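The locked/unlocked split described above can be sketched in userland C. This is only an illustration under stated assumptions: `toy_lock`, `toy_klist`, `toy_knote` and all `toy_*` functions are hypothetical names, and the single-threaded flag lock merely stands in for whatever lock the kernel actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a real lock; it only tracks ownership. */
struct toy_lock {
	int held;
};

static void
toy_lock_enter(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void
toy_lock_leave(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct toy_knote {
	struct toy_knote *next;
};

struct toy_klist {
	struct toy_lock lock;
	struct toy_knote *head;
	int count;
};

/* The *_locked variants assume the caller already holds the lock. */
void
toy_klist_insert_locked(struct toy_klist *kl, struct toy_knote *kn)
{
	assert(kl->lock.held);
	kn->next = kl->head;
	kl->head = kn;
	kl->count++;
}

void
toy_klist_remove_locked(struct toy_klist *kl, struct toy_knote *kn)
{
	struct toy_knote **pp;

	assert(kl->lock.held);
	for (pp = &kl->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == kn) {
			*pp = kn->next;
			kn->next = NULL;
			kl->count--;
			return;
		}
	}
}

/* The plain variants take and release the lock internally. */
void
toy_klist_insert(struct toy_klist *kl, struct toy_knote *kn)
{
	toy_lock_enter(&kl->lock);
	toy_klist_insert_locked(kl, kn);
	toy_lock_leave(&kl->lock);
}

void
toy_klist_remove(struct toy_klist *kl, struct toy_knote *kn)
{
	toy_lock_enter(&kl->lock);
	toy_klist_remove_locked(kl, kn);
	toy_lock_leave(&kl->lock);
}
```

Callers that already hold the lock use the `_locked` functions; everyone else uses the wrappers and no longer needs open-coded lock/unlock pairs.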
2020-11-07  Convert ffs_sysctl to sysctl_bounded_args  gnezdo
Requires sysctl_bounded_arr branch to support sysctl_rdint. The read-only variables are marked by an empty range of [1, 0]. OK millert@
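A rough sketch of the idea of a bounds table in which an empty range [1, 0] marks a read-only variable. The struct layout, the `toy_*` names and the error codes chosen here are assumptions made for illustration, not the kernel's actual interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table entry: an id, the backing variable and its bounds. */
struct toy_bounded_arg {
	int id;
	int *var;
	int minimum;
	int maximum;
};

/*
 * Try to set the variable named by id. An entry whose range is empty
 * (minimum > maximum, e.g. [1, 0]) is treated as read-only.
 * Returns 0 on success or an errno-style code.
 */
int
toy_bounded_set(const struct toy_bounded_arg *tbl, size_t n, int id,
    int newval)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].id != id)
			continue;
		if (tbl[i].minimum > tbl[i].maximum)
			return EPERM;	/* empty range: read-only */
		if (newval < tbl[i].minimum || newval > tbl[i].maximum)
			return EINVAL;	/* outside the allowed bounds */
		*tbl[i].var = newval;
		return 0;
	}
	return ENOENT;
}
```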
2020-10-09  Do not dereference `vp' after vput(9)ing it.  Martin Pieuchot
From dholland@NetBSD ok anton@
2020-08-10  remove #if 0'd ufs2 magic error which predates ffs2 support  Jonathan Gray
ok kn@
2020-08-10  use EROFS when attempting to mount a 4.2BSD fs without MNT_RDONLY  Jonathan Gray
This is the documented behaviour which was changed by pedro in rev 1.81 which was partially backed out in rev 1.82.
2020-06-24  kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)  cheloha
time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
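To illustrate the split-read problem and a lockless read loop, here is a simplified sketch using C11 atomics. It is not the kernel's timehands code: the `toy_timehands` layout, the generation-counter scheme and all `toy_*` names are assumptions chosen for the example. The 64-bit value is stored as two 32-bit halves, as it would be on hardware without atomic 64-bit reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical time source: a 64-bit value kept as two 32-bit halves. */
struct toy_timehands {
	atomic_uint gen;	/* 0 while an update is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

void
toy_init(struct toy_timehands *th)
{
	atomic_init(&th->gen, 1);
	atomic_init(&th->lo, 0);
	atomic_init(&th->hi, 0);
}

/* Single writer: mark the data unstable, update it, publish a new gen. */
void
toy_update(struct toy_timehands *th, int64_t now)
{
	unsigned int gen = atomic_load(&th->gen);
	uint64_t bits = (uint64_t)now;

	assert(gen != 0);	/* initialized and no other writer active */
	atomic_store(&th->gen, 0);
	atomic_store(&th->lo, (uint32_t)bits);
	atomic_store(&th->hi, (uint32_t)(bits >> 32));
	if (++gen == 0)
		gen = 1;	/* never publish the "in progress" value */
	atomic_store(&th->gen, gen);
}

/* Reader: retry until both halves were read under one stable generation. */
int64_t
toy_gettime(struct toy_timehands *th)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load(&th->gen);
		lo = atomic_load(&th->lo);
		hi = atomic_load(&th->hi);
	} while (gen == 0 || gen != atomic_load(&th->gen));
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

Reading `lo` and `hi` without the retry loop could pair the low half of one value with the high half of another; the loop costs extra loads, which matches the performance trade-off the commit message describes for 32-bit platforms.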
2020-06-20  With filesystem having many cylinder groups and many inodes per cg the  Otto Moerbeek
ncg * ipg calculation can overflow if signed types are used. Move to uint32_t for the relevant values. Aligned with FreeBSD changes. Also make sure newfs refuses to create an fs with more than 2^32-1 inodes. ok millert@
2020-06-11  Rename poll-compatibility flag to better reflect what it is.  Martin Pieuchot
While here prefix kernel-only EV flags with two underbars. Suggested by kettenis@, ok visa@
2020-06-08  Use a new EV_OLDAPI flag to match the behavior of poll(2) and select(2).  Martin Pieuchot
Adapt FS kqfilters to always return true when the flag is set and bypass the polling mechanism of the NFS thread. While here implement a write filter for NFS. ok visa@
2020-05-29  When the preferred cylinder group is full scan forward (wrapping if needed)  Otto Moerbeek
to find another, instead of first forward and then backward. The latter method causes most full cgs to end up at the end of the partition. From FreeBSD. ok millert@
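The forward scan with wrap-around can be sketched as follows. This shows only the linear scan; any other placement heuristics in the real allocator are left out, and `toy_pick_cg` with its array-of-free-counts argument is a hypothetical interface.

```c
#include <assert.h>

/*
 * Starting at the preferred cylinder group, scan forward and wrap
 * around once. Returns the first group with free space, or -1 if
 * every group is full.
 */
int
toy_pick_cg(const int *nfree, int ncg, int pref)
{
	int i, cg;

	assert(ncg > 0 && pref >= 0 && pref < ncg);
	for (i = 0; i < ncg; i++) {
		cg = (pref + i) % ncg;
		if (nfree[cg] > 0)
			return cg;
	}
	return -1;
}
```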
2020-05-28  Make generation numbers unsigned and fill them using a random number  Otto Moerbeek
from the range [1..UINT_MAX] initially. On inode re-use increment and on wrap refill from the range [1..UINT_MAX-1] to avoid assigning UINT_MAX (the original value). Zero still means uninitialized. ok millert@
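One way to read the generation-number scheme above, sketched under assumptions: `unsigned int` is taken to be 32 bits, the `toy_*` names are hypothetical, and `toy_rnd` is a deterministic xorshift placeholder standing in for a real random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder generator (xorshift32); a real system would use a CSPRNG. */
uint32_t
toy_rnd(void)
{
	static uint32_t s = 2463534242u;

	s ^= s << 13;
	s ^= s >> 17;
	s ^= s << 5;
	return s;
}

/* Initial fill: a random value in [1..UINT32_MAX]; 0 means uninitialized. */
uint32_t
toy_gen_initial(uint32_t (*rnd)(void))
{
	uint32_t g;

	assert(rnd != NULL);
	do {
		g = rnd();
	} while (g == 0);
	return g;
}

/* On inode re-use: increment; on wrap, refill from [1..UINT32_MAX-1]. */
uint32_t
toy_gen_next(uint32_t gen, uint32_t (*rnd)(void))
{
	assert(rnd != NULL);
	if (gen == 0)
		return toy_gen_initial(rnd);
	if (++gen == 0) {
		/* The value before the wrap was UINT32_MAX; avoid reusing it. */
		do {
			gen = rnd();
		} while (gen == 0 || gen == UINT32_MAX);
	}
	return gen;
}
```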
2020-05-21  Explicitly document that `vop_kqfilter' isn't missing.  Martin Pieuchot
Just like most of the vop_* methods in MFS they aren't used. ok millert@, visa@
2020-04-07  Abstract the head of knote lists. This allows extending the lists,  Visa Hankala
for example, with locking assertions. OK mpi@, anton@
2020-03-09  Avoid a tight CPU loop when no unlocked worklist items can be processed.  Todd C. Miller
If process_worklist_item() is unable to process locked vnodes, num_on_worklist will still be non-zero, preventing the loop in softdep_process_worklist() from exiting. This can result in a kernel hang. To fix this, process_worklist_item() now returns non-zero if it was able to process a worklist item (regardless of whether it matched the mountpoint) and takes a pointer to matchcnt as a function argument. We now break out of the loop in softdep_process_worklist() if process_worklist_item() is unable to make progress. OK beck@ bluhm@
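The fix amounts to leaving the loop when a pass makes no progress. A toy model of that control flow, with hypothetical `toy_*` names and a flat array in place of the real worklist:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worklist entry. */
struct toy_item {
	int locked;	/* cannot be processed right now */
	int done;
};

/* Process one pending, unlocked item. Returns 1 if progress was made. */
static int
toy_process_one(struct toy_item *items, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!items[i].done && !items[i].locked) {
			items[i].done = 1;
			return 1;
		}
	}
	return 0;
}

/* Drain the worklist; returns the number of items still pending. */
int
toy_process_worklist(struct toy_item *items, int n)
{
	int i, pending = 0;

	assert(items != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!items[i].done)
			pending++;
	while (pending > 0) {
		if (!toy_process_one(items, n))
			break;	/* only locked items remain: do not spin */
		pending--;
	}
	return pending;
}
```

Without the `break`, the loop condition `pending > 0` would stay true forever as soon as only locked items remain, which is the hang the commit describes.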
2020-02-27  Remove unused "struct proc *" argument from the following functions:  Martin Pieuchot
- ufs_chown() & ufs_chmod()
- ufs_reclaim()
- ext2fs_chown() & ext2fs_chmod()
- ntfs_ntget() & ntfs_ntput()
- ntfs_vgetex(), ntfs_ntlookup() & ntfs_ntlookupfile()
While here use `ap->a_p' directly when it is only required to re-enter the VFS layer in order to help reduce the loop. ok visa@
2020-02-21  Use proper function to get a timestamp, as time_second isn't safe;  Otto Moerbeek
ok cheloha@
2020-02-21  Handle the mount(... MNT_RELOAD) case for ffs2 as well. ok kettenis@  Otto Moerbeek
2020-02-20  Replace field f_isfd with field f_flags in struct filterops to allow  Visa Hankala
adding more filter properties without cluttering the struct. OK mpi@, anton@
2020-02-18  Cleanup <sys/kthread.h> and <sys/proc.h> includes.  Martin Pieuchot
Do not include <sys/kthread.h> where it is not needed and stop including <sys/proc.h> in it. ok visa@, anton@
2020-02-14  Call CURSIG() only once and pass that signal to the check in dounmount()  Claudio Jeker
and to CLRSIG. OK mpi@ visa@
2020-02-04  Replace TAILQ concatenation loop with TAILQ_CONCAT  bket
OK florian@, bluhm@, visa@
2020-01-24  Improve small random read ffs performance:  Kurt Miller
Only call bread_cluster if either the previously read ffs block is adjacent to the current block or if the current read request exceeds the current ffs block. This effectively turns off read-ahead for random reads that fall within one ffs block. okay beck@, mpi@, visa@
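The decision rule can be written as a small predicate. This is a paraphrase of the commit message, not the actual ffs_read() code: the function name, the parameter names and the use of -1 for "no previous read" are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether a clustered (read-ahead) read is worthwhile.
 *   last_lbn: logical block read previously, or -1 if none
 *   lbn:      logical block being read now
 *   resid:    bytes the caller still wants
 *   xfersize: bytes this request can take from the current block
 */
int
toy_want_cluster_read(int64_t last_lbn, int64_t lbn, size_t resid,
    size_t xfersize)
{
	assert(lbn >= 0);
	if (lbn - 1 == last_lbn)
		return 1;	/* sequential: previous block is adjacent */
	if (resid > xfersize)
		return 1;	/* request extends past the current block */
	return 0;		/* small random read: plain single-block read */
}
```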
2020-01-20  struct vops is not modified during runtime so use const which moves each  Claudio Jeker
into read-only data segment. OK deraadt@ tedu@
2020-01-14  Convert custom semaphores to tsleep_nsec(9).  Martin Pieuchot
ok bluhm@
2020-01-04  Call process_worklist_item with LK_NOWAIT to skip locked vnodes from  Bob Beck
within softdep_process_worklist. When this is called from the syncer a vnode may be legitimately locked by someone waiting for buffers so we need to skip anything locked. FreeBSD appears to have a similar change. This avoids a deadlock where the syncer ends up waiting for the inode lock that is held by someone waiting for buffer space. Found by bluhm@ and some genua folks ok bluhm@
2019-12-31  Use C99 designated initializers with struct filterops. In addition,  Visa Hankala
make the structs const so that the data are put in .rodata. OK mpi@, deraadt@, anton@, bluhm@
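For readers unfamiliar with the style, a const operations table with C99 designated initializers looks roughly like this. The struct below is a hypothetical miniature, not the kernel's struct filterops, and `TOY_FILTEROP_ISFD` is an invented flag name.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FILTEROP_ISFD	0x01	/* hypothetical property flag */

struct toy_knote {
	int attached;
};

/* Hypothetical miniature of an operations table. */
struct toy_filterops {
	int f_flags;
	int (*f_attach)(struct toy_knote *);
	void (*f_detach)(struct toy_knote *);
	int (*f_event)(struct toy_knote *, long);
};

static void
toy_detach(struct toy_knote *kn)
{
	assert(kn != NULL);
	kn->attached = 0;
}

static int
toy_event(struct toy_knote *kn, long hint)
{
	(void)kn;
	return hint != 0;
}

/*
 * Fields are named explicitly, so reordering or adding struct members
 * does not silently shift values. Being const, the table can be placed
 * in read-only data.
 */
const struct toy_filterops toy_read_filtops = {
	.f_flags	= TOY_FILTEROP_ISFD,
	.f_attach	= NULL,
	.f_detach	= toy_detach,
	.f_event	= toy_event,
};
```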
2019-12-26  Convert struct vfsops initializer to C99 style.  Alexander Bluhm
OK visa@
2019-11-27  Re-enable IO_NOCACHE, and use it in vnd.  Bob Beck
Ensure that io to a file backing a vnd is IO_SYNC, so IO to a vnd device is both synchronous and not cached in the buffer cache. This allows the "mount" regress to work repeatably, and avoids a situation where when the buffer cache cleaner runs to clear dirty buffers while people are waiting, it actually increases the dirty buffers when the writes to the underlying vnd are also delayed. ok bluhm@
2019-11-25  Convert infinite sleeps to tsleep_nsec(9).  Martin Pieuchot
ok bluhm@, cheloha@
2019-10-06  Fix vn_open to require an op of 0, and 0 or KERNELPATH only as flags.  Bob Beck
sweep tree to correct NDIINT op and flags ahead of time. document the requirement. This allows KERNELPATH to be used to bypass unveil for crash dumps with nosuidcoredump=2 or 3 ok visa@ deraadt@ florian@
2019-08-05  Allow concurrent reads of the f_offset field of struct file by  anton
serializing both read/write operations using the existing file mutex. The vnode lock still grants exclusive write access to the offset; the mutex is only used to make the actual write atomic and prevent any concurrent reader from observing intermediate values. ok mpi@ visa@
2019-07-25  vinvalbuf(9): tsleep(9) -> tsleep_nsec(9); ok millert@  cheloha
2019-07-19  vwaitforio(9): tsleep(9) -> tsleep_nsec(9); ok visa@  cheloha
2019-07-19  getblk(9): tsleep(9) -> tsleep_nsec(9); ok visa@  cheloha
2019-07-12  Revert anton@ changes about read/write unlocking  solene
https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2 ok anton@
2019-07-10  Make read/write of the f_offset field belonging to struct file MP-safe;  anton
as part of the effort to unlock the kernel. Instead of relying on the vnode lock, introduce a dedicated lock per file. Exclusive write access is granted using the new foffset_enter and foffset_leave API. A convenience function foffset_get is also available for threads that only need to read the current offset. The lock acquisition order in vn_write has been changed to match the one in vn_read in order to avoid a potential deadlock. This change also gets rid of a documented race in vn_read(). Inspired by the FreeBSD implementation. With help and ok mpi@ visa@
2019-07-01  Add more verbose messages about unsupported ext2fs features.  Kevin Lo
Based on FreeBSD r320578. While here, rename a few macros to make them consistent and keep in sync with Linux upstream. ok kn@
2019-06-18  Ensure the length passed to ffs_truncate() is within bounds before calling  anton
uvm_vnp_setsize() which is not free from side-effects. ok visa@
2019-05-09  Nope, the right byte layout is happening, but we still need to figure out  Theo de Raadt
a reported baddir panic. Discussed with guenther tedu kettenis millert..
2019-05-09  For filenames which are a multiple of 4 bytes long, the zero pad is  Theo de Raadt
incorrectly placed underneath the last 4 bytes (and then overwritten) rather than afterwards. We got confused and followed FreeBSD's lead, which curiously increased the leakage of kernel stack from 3 bytes to 4... ok millert kettenis
2019-05-04  3 bytes of kernel stack address space were leaked into on-disk directories.  Theo de Raadt
With some gritty work up to 254 bytes can be discovered. More details at https://svnweb.freebsd.org/changeset/base/347066
The impact on OpenBSD is very limited:
1 - such stack bytes can be found in raw-device reads, from group operator. If you can read the raw disks you can undertake other more powerful actions.
2 - read(2) upon directory fd was disabled July 1997 because I didn't like how grep * would display garbage and mess up the tty, and applying vis(3) for just directory reads seemed silly. read(2) was changed to return 0 (EOF). Sep 2016 this was further changed to EISDIR, so you still cannot see the bad bytes.
3 - In 2013 when guenther adapted the getdents(2) directory-reading system call to 64-bit ino_t, the userland data format changed to 8-byte-alignment, making it incompatible with the 4-byte-alignment UFS on-disk format. As a result of code refactoring the bad bytes were not copied to userland.
Bad bytes will remain in old directories on old filesystems, but nothing makes those bytes user visible.
There will be no errata or syspatch issued. I urge other systems which do expose the information to userland to issue errata quickly, since this is a 254 byte infoleak of the stack which is great for ROP-chain building to attack some other bug. Especially if the kernel has no layout/link-order randomization ... ok kettenis jca millert otto ...
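The class of bug described above comes from copying a name into a fixed-alignment record without clearing the padding. A minimal sketch of zero-filling a name field up to a 4-byte boundary follows; the `TOY_DIR_ROUNDUP` constant and the `toy_*` helpers are illustrative assumptions and do not reproduce the actual UFS directory code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DIR_ROUNDUP	4	/* assumed on-disk alignment of entries */

/* Size of the name field: name plus NUL, rounded up to the alignment. */
size_t
toy_padded_namelen(size_t namlen)
{
	return (namlen + 1 + (TOY_DIR_ROUNDUP - 1)) &
	    ~(size_t)(TOY_DIR_ROUNDUP - 1);
}

/*
 * Copy the name and explicitly zero the NUL terminator and every
 * padding byte, so no stale bytes from dst survive in the record.
 * Returns the number of bytes written.
 */
size_t
toy_fill_name(char *dst, size_t dstsize, const char *name, size_t namlen)
{
	size_t padded = toy_padded_namelen(namlen);

	assert(padded <= dstsize);
	memcpy(dst, name, namlen);
	memset(dst + namlen, 0, padded - namlen);
	return padded;
}
```

A 4-byte name such as "abcd" occupies 8 bytes here (4 name bytes, the NUL, and 3 bytes of padding), all of which are cleared.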
2019-05-04  Add DIR_ROUNDUP define, from Kirk McKusick  Theo de Raadt
ok millert otto kettenis
2019-03-15  Remove FBSDID.  Kevin Lo
ok deraadt@
2019-03-06  increase dirhash mem a bit since very tiny machines are less common.  Ted Unangst
perhaps not enough for everyone, but we'll see what happens.
2019-01-21  Introduce a dedicated entry point data structure for file locks. This new data  anton
structure allows for better tracking of pending lock operations which is essential in order to prevent a use-after-free once the underlying vnode is gone. Inspired by the lockf implementation in FreeBSD. ok visa@ Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
2018-12-23  Rectify some issues with the noperm mount flag; the root vnode was not  Martin Natano
protected properly and files without any x bit set were accidentally considered executable when checked with access(2). Issues found and reported by deraadt, halex, reyk, tb ok deraadt
2018-09-26  Move the allocating and freeing of mount points into  Visa Hankala
dedicated functions. OK deraadt@ mpi@
2018-09-06  fix whitespace  Jonathan Gray
2018-07-21  Include the vnode type in the panic message in ffs_write(), just like ffs_read()  anton
does. ok deraadt@ kettenis@
2018-07-11  Prevent updating async option on softdep mount  kn
`mount -uo async,nosoftdep /mnt' would set "async" but keep "softdep" untouched on a read/write mount. OK deraadt krw beck bluhm