src - OpenBSD base system

Age	Commit message	Author
2020-12-25	Refactor klist insertion and removal	Visa Hankala
	Rename klist_{insert,remove}() to klist_{insert,remove}_locked(). These functions assume that the caller has locked the klist. The current state of locking remains intact because the kernel lock is still used with all klists. Add new functions klist_insert() and klist_remove() that lock the klist internally. This allows some code simplification. OK mpi@
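The split between a `_locked` variant and a self-locking wrapper can be sketched in userland C as follows. This is only an illustration of the pattern: `struct list_sketch`, `list_lock()` and the other names are hypothetical, and the "lock" is a plain flag rather than the kernel lock or any real klist code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list head; not the kernel's struct klist. */
struct node {
	struct node *next;
};

struct list_sketch {
	int locked;		/* stand-in for a real lock */
	struct node *head;
	size_t count;
};

static void
list_lock(struct list_sketch *l)
{
	assert(!l->locked);
	l->locked = 1;
}

static void
list_unlock(struct list_sketch *l)
{
	assert(l->locked);
	l->locked = 0;
}

/* Caller must already hold the lock. */
static void
list_insert_locked(struct list_sketch *l, struct node *n)
{
	assert(l->locked);
	n->next = l->head;
	l->head = n;
	l->count++;
}

/* Takes and releases the lock internally. */
static void
list_insert(struct list_sketch *l, struct node *n)
{
	list_lock(l);
	list_insert_locked(l, n);
	list_unlock(l);
}
```

Callers that already hold the lock use the `_locked` form; everyone else uses the wrapper and no longer has to spell out the lock and unlock calls.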
2020-11-07	Convert ffs_sysctl to sysctl_bounded_args	gnezdo
	Requires sysctl_bounded_arr branch to support sysctl_rdint. The read-only variables are marked by an empty range of [1, 0]. OK millert@
2020-10-09	Do not dereference `vp' after vput(9)ing it.	Martin Pieuchot
	From dholland@NetBSD ok anton@
2020-08-10	remove #if 0'd ufs2 magic error which predates ffs2 support	Jonathan Gray
	ok kn@
2020-08-10	use EROFS when attempting to mount a 4.2BSD fs without MNT_RDONLY	Jonathan Gray
	This is the documented behaviour which was changed by pedro in rev 1.81 which was partially backed out in rev 1.82.
2020-06-24	kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)	cheloha
	time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not too bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
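A generation-checked read loop of the kind described above might look like the following sketch. It is not the kernel's timehands code: `struct th_sketch`, `th_write()` and `th_read()` are hypothetical names, C11 atomics stand in for the kernel's memory barriers, and the sketch is only exercised single-threaded.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one timehands slot. */
struct th_sketch {
	atomic_uint gen;	/* 0 while an update is in progress */
	int64_t seconds;	/* may need two loads on a 32-bit CPU */
};

/* Writer: mark the slot busy, update it, then publish a new generation. */
static void
th_write(struct th_sketch *th, int64_t now)
{
	unsigned int g = atomic_load(&th->gen);

	atomic_store(&th->gen, 0);
	th->seconds = now;
	if (++g == 0)		/* never publish generation 0 */
		g = 1;
	atomic_store(&th->gen, g);
}

/* Reader: retry until the generation is non-zero and unchanged. */
static int64_t
th_read(struct th_sketch *th)
{
	unsigned int g;
	int64_t s;

	do {
		g = atomic_load(&th->gen);
		s = th->seconds;
	} while (g == 0 || g != atomic_load(&th->gen));
	return s;
}
```

The reader never blocks; it simply repeats the read if an update overlapped it, which is where the extra cost on 32-bit platforms comes from.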
2020-06-20	With filesystem having many cylinder groups and many inodes per cg the	Otto Moerbeek
	ncg * ipg calculation can overflow if signed types are used. Move to uint32_t for the relevant values. Aligned with FreeBSD changes. Also make sure newfs refuses to create an fs with more than 2^32-1 inodes. ok millert@
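As a rough illustration of the overflow, the check below computes the product in 64 bits and rejects anything above 2^32-1. The function name `inode_count_fits()` is hypothetical and this is not the newfs or ffs code.

```c
#include <stdint.h>

/*
 * Returns 1 and stores ncg * ipg in *total if the product fits in
 * 32 bits, otherwise returns 0 and leaves *total untouched.
 */
static int
inode_count_fits(uint32_t ncg, uint32_t ipg, uint32_t *total)
{
	uint64_t n = (uint64_t)ncg * ipg;	/* widen before multiplying */

	if (n > UINT32_MAX)
		return 0;
	*total = (uint32_t)n;
	return 1;
}
```

For example, 40000 cylinder groups of 60000 inodes each give 2400000000 inodes, which exceeds INT32_MAX but still fits in a uint32_t.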
2020-06-11	Rename poll-compatibility flag to better reflect what it is.	Martin Pieuchot
	While here prefix kernel-only EV flags with two underbars. Suggested by kettenis@, ok visa@
2020-06-08	Use a new EV_OLDAPI flag to match the behavior of poll(2) and select(2).	Martin Pieuchot
	Adapt FS kqfilters to always return true when the flag is set and bypass the polling mechanism of the NFS thread. While here implement a write filter for NFS. ok visa@
2020-05-29	When the preferred cylinder group is full, scan forward (wrapping if needed)	Otto Moerbeek
	to find another, instead of first forward and then backward. The latter method causes most full cgs to end up at the end of the partition. From FreeBSD. ok millert@
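A forward scan with wrap-around can be sketched like this. The function `pick_cg()` and the `nfree` array are hypothetical; the real allocator has more stages than this and is not reproduced here.

```c
/*
 * Return the first cylinder group with free space, starting at the
 * preferred group and scanning forward with wrap-around.
 * Returns -1 if every group is full.
 */
static int
pick_cg(const int *nfree, int ncg, int pref)
{
	for (int i = 0; i < ncg; i++) {
		int cg = (pref + i) % ncg;	/* wrap past the last group */

		if (nfree[cg] > 0)
			return cg;
	}
	return -1;
}
```

Because the scan always continues forward from the preferred group, overflow from full groups no longer gravitates toward the end of the partition.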
2020-05-28	Make generation numbers unsigned and fill them using a random number	Otto Moerbeek
	from the range [1..UINT_MAX] initially. On inode re-use increment and on wrap refill from the range [1..UINT_MAX-1] to avoid assigning UINT_MAX (the original value). Zero still means uninitialized. ok millert@
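The ranges above can be illustrated with the sketch below. It assumes a 32-bit generation number and takes the random input `r` as a parameter; `gen_init()` and `gen_bump()` are hypothetical names, and the modulo reduction (which has a negligible bias) is a simplification rather than what the kernel does.

```c
#include <stdint.h>

/* Initial fill: map a random value into [1, UINT32_MAX]. */
static uint32_t
gen_init(uint32_t r)
{
	return 1 + r % UINT32_MAX;
}

/*
 * Inode re-use: increment, and on wrap refill from
 * [1, UINT32_MAX - 1] so that neither 0 (uninitialized) nor
 * UINT32_MAX (the value just used) is assigned.
 */
static uint32_t
gen_bump(uint32_t gen, uint32_t r)
{
	if (gen == UINT32_MAX)
		return 1 + r % (UINT32_MAX - 1);
	return gen + 1;
}
```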
2020-05-21	Explicitly document that `vop_kqfilter' isn't missing.	Martin Pieuchot
	Just like most of the vop_* methods in MFS they aren't used. ok millert@, visa@
2020-04-07	Abstract the head of knote lists. This allows extending the lists,	Visa Hankala
	for example, with locking assertions. OK mpi@, anton@
2020-03-09	Avoid a tight CPU loop when no unlocked worklist items can be processed.	Todd C. Miller
	If process_worklist_item() is unable to process locked vnodes, num_on_worklist will still be non-zero, preventing the loop in softdep_process_worklist() from exiting. This can result in a kernel hang. To fix this, process_worklist_item() now returns non-zero if it was able to process a worklist item (regardless of whether it matched the mountpoint) and takes a pointer to matchcnt as a function argument. We now break out of the loop in softdep_process_worklist() if process_worklist_item() is unable to make progress. OK beck@ bluhm@
2020-02-27	Remove unused "struct proc *" argument from the following functions:	Martin Pieuchot
	- ufs_chown() & ufs_chmod()
	- ufs_reclaim()
	- ext2fs_chown() & ext2fs_chmod()
	- ntfs_ntget() & ntfs_ntput()
	- ntfs_vgetex(), ntfs_ntlookup() & ntfs_ntlookupfile()
	While here use `ap->a_p' directly when it is only required to re-enter the VFS layer in order to help reducing the loop. ok visa@
2020-02-21	Use proper function to get a timestamp, as time_second isn't safe;	Otto Moerbeek
	ok cheloha@
2020-02-21	Handle the mount(... MNT_RELOAD) case for ffs2 as well. ok kettenis@	Otto Moerbeek

2020-02-20	Replace field f_isfd with field f_flags in struct filterops to allow	Visa Hankala
	adding more filter properties without cluttering the struct. OK mpi@, anton@
2020-02-18	Cleanup <sys/kthread.h> and <sys/proc.h> includes.	Martin Pieuchot
	Do not include <sys/kthread.h> where it is not needed and stop including <sys/proc.h> in it. ok visa@, anton@
2020-02-14	Call CURSIG() only once and pass that signal to the check in dounmount()	Claudio Jeker
	and to CLRSIG. OK mpi@ visa@
2020-02-04	Replace TAILQ concatenation loop with TAILQ_CONCAT	bket
	OK florian@, bluhm@, visa@
2020-01-24	Improve small random read ffs performance:	Kurt Miller
	Only call bread_cluster if either the previously read ffs block is adjacent to the current block or if the current read request exceeds the current ffs block. This effectively turns off read-ahead for random reads that fall within one ffs block. okay beck@, mpi@, visa@
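The decision described above can be written as a small predicate. This is a sketch only: `should_read_ahead()` and its parameters are hypothetical, and "adjacent" is assumed here to mean that the current block immediately follows the previously read one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether a read should use clustered read-ahead.
 *   last_lbn  logical block number of the previous read
 *   lbn       logical block number of the current read
 *   blk_off   offset of the request within the current block
 *   req_len   number of bytes requested
 *   bsize     filesystem block size
 */
static int
should_read_ahead(int64_t last_lbn, int64_t lbn, size_t blk_off,
    size_t req_len, size_t bsize)
{
	if (lbn == last_lbn + 1)	/* sequential access */
		return 1;
	if (blk_off + req_len > bsize)	/* request spills past this block */
		return 1;
	return 0;			/* small random read: no read-ahead */
}
```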
2020-01-20	struct vops is not modified during runtime so use const which moves each	Claudio Jeker
	into read-only data segment. OK deraadt@ tedu@
2020-01-14	Convert custom semaphores to tsleep_nsec(9).	Martin Pieuchot
	ok bluhm@
2020-01-04	Call process_worklist_item with LK_NOWAIT to skip locked vnodes from	Bob Beck
	within softdep_process_worklist. When this is called from the syncer a vnode may be legitimately locked by someone waiting for buffers so we need to skip anything locked. FreeBSD appears to have a similar change. This avoids a deadlock where the syncer ends up waiting for the inode lock that is held by someone waiting for buffer space. Found by bluhm@ and some genua folks ok bluhm@
2019-12-31	Use C99 designated initializers with struct filterops. In addition,	Visa Hankala
	make the structs const so that the data are put in .rodata. OK mpi@, deraadt@, anton@, bluhm@
2019-12-26	Convert struct vfsops initializer to C99 style.	Alexander Bluhm
	OK visa@
2019-11-27	Re-enable IO_NOCACHE, and use it in vnd.	Bob Beck
	Ensure that io to a file backing a vnd is IO_SYNC, so IO to a vnd device is both synchronous and not cached in the buffer cache. This allows the "mount" regress to work repeatably, and avoids a situation where when the buffer cache cleaner runs to clear dirty buffers while people are waiting, it actually increases the dirty buffers when the writes to the underlying vnd are also delayed. ok bluhm@
2019-11-25	Convert infinite sleeps to tsleep_nsec(9).	Martin Pieuchot
	ok bluhm@, cheloha@
2019-10-06	Fix vn_open to require an op of 0, and 0 or KERNELPATH only as flags.	Bob Beck
	sweep tree to correct NDIINT op and flags ahead of time. document the requirement. This allows KERNELPATH to be used to bypass unveil for crash dumps with nosuidcoredump=2 or 3 ok visa@ deraadt@ florian@
2019-08-05	Allow concurrent reads of the f_offset field of struct file by	anton
	serializing both read/write operations using the existing file mutex. The vnode lock still grants exclusive write access to the offset; the mutex is only used to make the actual write atomic and prevent any concurrent reader from observing intermediate values. ok mpi@ visa@
2019-07-25	vinvalbuf(9): tsleep(9) -> tsleep_nsec(9); ok millert@	cheloha

2019-07-19	vwaitforio(9): tsleep(9) -> tsleep_nsec(9); ok visa@	cheloha

2019-07-19	getblk(9): tsleep(9) -> tsleep_nsec(9); ok visa@	cheloha

2019-07-12	Revert anton@ changes about read/write unlocking	solene
	https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2 ok anton@
2019-07-10	Make read/write of the f_offset field belonging to struct file MP-safe;	anton
	as part of the effort to unlock the kernel. Instead of relying on the vnode lock, introduce a dedicated lock per file. Exclusive write access is granted using the new foffset_enter and foffset_leave API. A convenience function foffset_get is also available for threads that only need to read the current offset. The lock acquisition order in vn_write has been changed to match the one in vn_read in order to avoid a potential deadlock. This change also gets rid of a documented race in vn_read(). Inspired by the FreeBSD implementation. With help and ok mpi@ visa@
2019-07-01	Add more verbose messages about unsupported ext2fs features.	Kevin Lo
	Based on FreeBSD r320578. While here, rename a few macros to make them consistent and keep in sync with Linux upstream. ok kn@
2019-06-18	Ensure the length passed to ffs_truncate() is within bounds before calling	anton
	uvm_vnp_setsize() which is not free from side-effects. ok visa@
2019-05-09	Nope, the right byte layout is happening, but we still need to figure out	Theo de Raadt
	a reported baddir panic. Discussed with guenther tedu kettenis millert..
2019-05-09	For filenames which are a multiple of 4 bytes long, the zero pad is	Theo de Raadt
	incorrectly placed underneath the last 4 bytes (and then overwritten) rather than afterwards. We got confused and followed FreeBSD's lead, which curiously increased the leakage of kernel stack from 3 bytes to 4... ok millert kettenis
2019-05-04	3 bytes of kernel stack address space were leaked into on-disk directories.	Theo de Raadt
	With some gritty work up to 254 bytes can be discovered. More details at https://svnweb.freebsd.org/changeset/base/347066
	The impact on OpenBSD is very limited:
	1 - such stack bytes can be found in raw-device reads, from group operator. If you can read the raw disks you can undertake other more powerful actions.
	2 - read(2) upon directory fd was disabled July 1997 because I didn't like how grep * would display garbage and mess up the tty, and applying vis(3) for just directory reads seemed silly. read(2) was changed to return 0 (EOF). Sep 2016 this was further changed to EISDIR, so you still cannot see the bad bytes.
	3 - In 2013 when guenther adapted the getdents(2) directory-reading system call to 64-bit ino_t, the userland data format changed to 8-byte-alignment, making it incompatible with the 4-byte-alignment UFS on-disk format. As a result of code refactoring the bad bytes were not copied to userland.
	Bad bytes will remain in old directories on old filesystems, but nothing makes those bytes user visible. There will be no errata or syspatch issued. I urge other systems which do expose the information to userland to issue errata quickly, since this is a 254 byte infoleak of the stack which is great for ROP-chain building to attack some other bug. Especially if the kernel has no layout/link-order randomization ... ok kettenis jca millert otto ...
2019-05-04	Add DIR_ROUNDUP define, from Kirk McKusick	Theo de Raadt
	ok millert otto kettenis
2019-03-15	Remove FBSDID.	Kevin Lo
	ok deraadt@
2019-03-06	increase dirhash mem a bit since very tiny machines are less common.	Ted Unangst
	perhaps not enough for everyone, but we'll see what happens.
2019-01-21	Introduce a dedicated entry point data structure for file locks. This new data	anton
	structure allows for better tracking of pending lock operations which is essential in order to prevent a use-after-free once the underlying vnode is gone. Inspired by the lockf implementation in FreeBSD. ok visa@ Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
2018-12-23	Rectify some issues with the noperm mount flag; the root vnode was not	Martin Natano
	protected properly and files without any x bit set were accidentally considered executable when checked with access(2). Issues found and reported by deraadt, halex, reyk, tb ok deraadt
2018-09-26	Move the allocating and freeing of mount points into	Visa Hankala
	dedicated functions. OK deraadt@ mpi@
2018-09-06	fix whitespace	Jonathan Gray

2018-07-21	Include the vnode type in the panic message in ffs_write(), just like ffs_read()	anton
	does. ok deraadt@ kettenis@
2018-07-11	Prevent updating async option on softdep mount	kn
	`mount -uo async,nosoftdep /mnt' would set "async" but keep "softdep" untouched on a read/write mount. OK deraadt krw beck bluhm