src - OpenBSD base system

Age	Commit message	Author
2021-05-04	Reorder the integer sysctl functions. Then the traditional 4.4BSD	Alexander Bluhm
	comment 'As above...' makes sense again. Improve comments for sysctl_int_bounded() and sysctl_bounded_arr(). OK gnezdo@ mvs@
2021-05-04	As the unbounded feature in sysctl_int_bounded() is no longer used,	Alexander Bluhm
	remove it. This also fixes a defective check of the dynamic boundary in sysctl_sysvshm(). OK mvs@ gnezdo@
2021-05-01	Update the remaining SYSCTL_INT_READONLY cases	gnezdo
	OK mvs@
2021-04-30	Rearrange the implementation of bounded sysctl. The primitive	Alexander Bluhm
	functions are sysctl_int() and sysctl_rdint(). This brings us back the 4.4BSD implementation. Then sysctl_int_bounded() builds the magic for range checks on top. sysctl_bounded_arr() is a wrapper around it to support multiple variables. Introduce macros that describe the meaning of the magic boundary values. Use these macros in obvious places. input and OK gnezdo@ mvs@
2021-04-23	Remove the sysctl kern.allowdt code from kernel if dt(4) is not	Alexander Bluhm
	configured. This will result in a "value is not available" error from sysctl when trying to enable dt on a kernel without support. The variable allowdt should be in the device, not in sysctl source. We don't need #ifdef for extern and prototypes. OK mpi@
2021-02-08	Revert the conversion of the per-process thread list into a SMR_TAILQ.	Martin Pieuchot
	We did not reach a consensus about using SMR to unlock single_thread_set() so there's no point in keeping this change.
2021-01-17	Cache parent's pid as `ps_ppid' and use it instead of `ps_pptr->ps_pid'.	mvs
	This allows us to unlock getppid(2). ok mpi@
2021-01-09	Split hierarchical calls into kern_sysctl_dirs	gnezdo
	Removed a rash of +/-1 and made both functions shorter and more focused. OK millert@
2021-01-09	Reduce case duplication in kern_sysctl	gnezdo
	This changes amd64 GENERIC.MP .text size of kern_sysctl.o from 6440 to 6400. Surprisingly, RAMDISK grows from 1645 to 1678. OK millert@, mglocker@
2020-12-28	Analogous to the kern.audio.record sysctl parameter for audio(4)	Marcus Glocker
	devices, introduce kern.video.record for video(4) devices. By default kern.video.record will be set to zero, blanking all data delivered by device drivers which attach to video(4). The idea was initially proposed by Laurence Tratt <laurie AT tratt DOT net>. ok mpi@
2020-12-07	Convert the per-process thread list into a SMR_TAILQ.	Martin Pieuchot
	Currently all iterations are done under KERNEL_LOCK() and therefore use the *_LOCKED() variant. From and ok claudio@
2020-11-16	Convert hw_sysctl to sysctl_bounded_args	gnezdo
	This one is surprisingly a minor loss if one were to simply add bytes on amd64 (.text+.data+.bss+.rodata): before 0x64b0+0x40+0x14+0x338 = 0x683c, after 0x6440+0x48+0x14+0x3b8 = 0x6854
2020-11-16	Convert kern_sysctl to sysctl_bounded_args	gnezdo
	objdump -h changes in Size of kern_sysctl.o on amd64:
	section	before	after
	.text	7140	64b0
	.data	24	40
	.bss	10	14
	.rodata	50	338
2020-11-07	Convert ffs_sysctl to sysctl_bounded_args	gnezdo
	Requires sysctl_bounded_arr branch to support sysctl_rdint. The read-only variables are marked by an empty range of [1, 0]. OK millert@
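The empty-range convention described above can be sketched in userland C. This is a minimal model, not the kernel's sysctl_bounded_arr(): the struct, the function name, and the returned errno values are all assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical model of one bounded integer variable. */
struct bounded_var {
	int *valp;	/* storage being controlled */
	int minimum;	/* lowest accepted value */
	int maximum;	/* highest accepted value */
};

/*
 * Try to store newval. An empty range such as [1, 0] marks the
 * variable as read-only. The errno choices here are assumptions.
 */
static int
bounded_set(const struct bounded_var *v, int newval)
{
	if (v->minimum > v->maximum)
		return EPERM;		/* empty range: read-only */
	if (newval < v->minimum || newval > v->maximum)
		return EINVAL;		/* outside [minimum, maximum] */
	*v->valp = newval;
	return 0;
}
```

With a variable declared as { &x, 1, 0 }, every call to bounded_set() is refused and x keeps its value, so no separate read-only flag is needed in the table.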
2020-10-19	Serialize accesses to "struct vmspace" and document its refcounting.	Martin Pieuchot
	The underlying vm_space lock is used as a substitute to the KERNEL_LOCK() in uvm_grow() to make sure `vm_ssize' is not corrupted. ok anton@, kettenis@
2020-09-01	Remove unused sysctl_int_arr(9)	gnezdo

2020-08-23	Remove unused debug_syncprt, improve debug sysctl handling	kn
	"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008. Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c the only visible difference between used and stub ctldebug structs in the debugvars[] array is their extern keyword, indicating that it is defined elsewhere. sys/sysctl.h declares all debugN members as extern upfront, but these declarations are not needed. Remove the unused debug sysctl, rename the only remaining one to something meaningful and remove forward declarations from /sys/sysctl.h; this way, adding new debug sysctls is a matter of adding extern and coming up with a name, which is nicer to read on its own and better to grep for. OK mpi
2020-08-22	Move sysctl(2) CTL_DEBUG from DEBUG to new DEBUG_SYSCTL	kn
	Adding "debug.my-knob" sysctls is really helpful to select different code paths and/or log on demand during runtime without recompile, but as this code is under DEBUG, lots of other noise comes with it which is often undesired, at least when looking at specific subsystems only. Adding globals to the kernel and breaking into DDB to change them helps, but that does not work over SSH, hence the need for debug sysctls. Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general option for all of sysctl(2). OK gnezdo
2020-08-18	Style fixups from hurried commits	gnezdo
	Thanks kettenis@ for pointing out. ok kettenis@
2020-08-18	Add sysctl_bounded_arr as a replacement for sysctl_int_arr	gnezdo
	Design by deraadt@ ok deraadt@
2020-08-01	Move range check inside sysctl_int_arr	gnezdo
	Range violations are now consistently reported as EOPNOTSUPP. Previously they were mixed with ENOPROTOOPT. OK kn@
2020-06-22	there's not going to be any whole kernel wide network livelocks soon.	David Gwynne

2020-05-29	rndvar.h not needed here	Theo de Raadt

2020-03-09	Return EINVAL for KERN_PROC if the size parameter is 0.	Todd C. Miller
	Prevents a panic due to a NULL dereference; Coverity CID 1452899. Based on a diff from mpi@, OK deraadt@ kettenis@
2020-01-24	New `kern.allowdt' button must be set to open(2) /dev/dt.	Martin Pieuchot
	dt(4) exposes kernel internals, addresses and content of states to userland. As such its interface shouldn't be available without enabling it consciously. ok millert@, deraadt@
2020-01-02	Exclude offline cpus in KERN_CPTIME calculation. Without this too high	Claudio Jeker
	idle time is reported in tools like vmstat. OK visa@ benno@ krw@
2019-12-11	Replace p_xstat with ps_xexit and ps_xsig	Philip Guenther
	Convert those to a consolidated status when needed in wait4(), kevent(), and sysctl() Pass exit code and signal separately to exit1() (This also serves as prep for adding waitid(2)) ok mpi@
2019-10-22	struct proc: change ps_start from utc time to uptime	cheloha
	Allows us to determine how long a process has been running, even if the UTC clock jumps. With help from bluhm@ and millert@, who squashed several bugs. ok bluhm@ millert@
2019-08-21	sysctl(2): add kern.utc_offset: successor to the DST/TIMEZONE options(4)	cheloha
	The DST and TIMEZONE options(4) are incompatible with KARL, so we need some other way to compensate for an RTC running with a known offset. Enter kern.utc_offset, an offset in minutes East of UTC. TIMEZONE has always been minutes West, but this is inconsistent with how everyone else talks about timezones, hence the flip. TIMEZONE has the advantage of being compiled into the binary. Our new sysctl(2) has no such luck, so it needs to be set as early as possible in boot, from sysctl.conf(5), so we can correct the kernel clock from the RTC's local time to UTC before daemons like ntpd(8) and cron(8) start. To encourage this, kern.utc_offset is made immutable after the securelevel(7) is raised to 1. Prompted by yasuoka@. Discussed with deraadt@, kettenis@, yasuoka@. Additional testing by yasuoka@. ok deraadt@, yasuoka@
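The sign flip between the old TIMEZONE option (minutes West) and kern.utc_offset (minutes East) can be illustrated with a little arithmetic. The helper names below are hypothetical and this is only a sketch of the conversion, not the kernel code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert an RTC reading kept in local time to
 * UTC seconds, given an offset in minutes East of UTC.
 */
static int64_t
rtc_local_to_utc(int64_t rtc_seconds, int utc_offset_minutes_east)
{
	return rtc_seconds - (int64_t)utc_offset_minutes_east * 60;
}

/*
 * Hypothetical helper: an old TIMEZONE value (minutes West) maps to
 * the negated value in minutes East.
 */
static int
timezone_west_to_utc_offset(int minutes_west)
{
	return -minutes_west;
}
```

For example, an RTC running one hour ahead of UTC corresponds to an offset of 60, and an old TIMEZONE of 300 (five hours West) corresponds to an offset of -300.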
2019-08-05	Allow concurrent reads of the f_offset field of struct file by	anton
	serializing both read/write operations using the existing file mutex. The vnode lock still grants exclusive write access to the offset; the mutex is only used to make the actual write atomic and prevent any concurrent reader from observing intermediate values. ok mpi@ visa@
2019-07-16	Prevent integer overflow in kernel and userland when checking mbuf	Alexander Bluhm
	limits. Convert kernel variables and calculations for mbuf memory into long to allow larger values on 64 bit machines. Put a range check into the kernel sysctl. For the interface itself int is still sufficient. In netstat -m cast all multiplications to unsigned long to hold the product of two unsigned int. input and OK visa@
2019-07-12	Revert anton@ changes about read/write unlocking	solene
	https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2 ok anton@
2019-07-12	sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics.	cheloha
	With these totals one can track the throughput of the timeout(9) layer from userspace. With input from mpi@. ok mpi@
2019-07-10	Make read/write of the f_offset field belonging to struct file MP-safe;	anton
	as part of the effort to unlock the kernel. Instead of relying on the vnode lock, introduce a dedicated lock per file. Exclusive write access is granted using the new foffset_enter and foffset_leave API. A convenience function foffset_get is also available for threads that only need to read the current offset. The lock acquisition order in vn_write has been changed to match the one in vn_read in order to avoid a potential deadlock. This change also gets rid of a documented race in vn_read(). Inspired by the FreeBSD implementation. With help and ok mpi@ visa@
2019-06-16	In previous commit I forgot a net unlock if the PCB of the socket	Alexander Bluhm
	was already gone. OK mpi@
2019-06-13	When tcp_close() is running in parallel with fill_file(), the kernel	Alexander Bluhm
	could crash due to missing inp_ppcb. This happened when fstat(1) was called often and TCP was aborted with reset. Protect the sysctl path with the net lock. OK mpi@
2019-06-01	Revert to using the SCHED_LOCK() to protect time accounting.	Martin Pieuchot
	It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress. Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
2019-05-31	Use a per-process mutex to protect time accounting instead of SCHED_LOCK().	Martin Pieuchot
	Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock. ok visa@, cheloha@
2019-05-22	Read and assign the integer value only once. With this sysctl_int() will	Claudio Jeker
	do word loads and stores and so partial updates should no longer be observed. With this accessing global variables set by sysctl_int() should be mostly MP safe. OK dlg@ mpi@
2019-05-09	Add a sysctl accessor to struct pf_status. The pf_status only holds the	Claudio Jeker
	current status and statistics and can be exported without super-user rights via sysctl to make it easier for tools like systat to access those. OK deraadt@, sashan@
2019-01-29	Add a dedicated sysctl(2) node for witness(4).	Visa Hankala
	The new node contains the subsystem's main control variable, kern.witness.watch. It is aliased by the old name, kern.witnesswatch. The alias will be removed in the future. OK anton@ mpi@
2019-01-19	Move boottime into the timehands.	cheloha
	To protect the timehands we first need to protect the basis for all UTC time in the kernel: the boottime. Because the boottime can be changed at any time it needs to be versioned along with the other members of the timehands to enable safe lockless reads when using it for anything. So the global boottime timespec goes away and the static boottimebin becomes a member of the timehands. Instead of reading the global boottime you use one of two interfaces: binboottime(9) or microboottime(9). nanoboottime(9) can trivially be added later, though there are no consumers for it at the moment. This introduces one small change in behavior. We used to advance the reported boottime just before launching kernel threads from main(). This makes it look to userland like we "booted" moments before those threads were launched. Because there is no longer a boottime global we can no longer trivially do this from main(), so the boottime we report to userspace via e.g. kern.boottime will now reflect whatever the time was when we bootstrapped the timehands via inittodr(9). This is usually no more than a minute before the kernel threads are launched from main(). The prior behavior can be restored by adding a new interface to the timecounter layer in a future commit. Based on FreeBSD r303387. Discussed with mpi@ and visa@. ok visa@
2018-11-19	delete the dns jackport experiment. it has no future.	Ted Unangst

2018-11-17	Add new KERN_CPUSTATS sysctl(2) so we can identify offline CPUs.	cheloha
	Because of hw.smt we need a way to determine whether a given CPU is "online" or "offline" from userspace. KERN_CPTIME2 is an array, and so cannot be cleanly extended for this purpose, so add a new sysctl(2) KERN_CPUSTATS with an extensible struct. At the moment it's just KERN_CPTIME2 with a flags member, but it can grow as needed. KERN_CPUSTATS appears to have been defined by BSDi long ago, but there are few (if any) packages in the wild still using the symbol so breakage in ports should be near zero. No other system inherited the symbol from BSDi, either. Then, use the new sysctl(2) in systat(1) and top(1): - systat(1) draws placeholder marks ('-') instead of percentages for offline CPUs in the cpu view. - systat(1) omits offline CPU ticks when drawing the "big bar" in the vmstat view. The upshot is that the bar isn't half idle when half your logical CPUs are disabled. - top(1) does not draw lines for offline CPUs; if CPUs toggle on or offline in interactive mode we redraw the display to expand/reduce space for the new/missing CPUs. This is consistent with what some top(1) implementations do on Linux. - top(1) omits offline CPUs from the totals when CPU totals are combined into a single line (the '-1' flag). Originally prompted by deraadt@. Discussed endlessly with deraadt@, kettenis@, and sthen@. Tested by jmc@ and jca@. Earlier versions also discussed with jca@. Earlier versions tested by jmc@, tb@, and many others. docs ok jmc@, kernel bits ok kettenis@, everything ok sthen@, "Is your stuff in yet?" deraadt@
2018-10-05	Revert KERN_CPTIME2 ENODEV changes in kernel and userspace.	cheloha
	ok kettenis deraadt
2018-10-04	Revert the inpcb table mutex commit. It triggers a witness panic	Alexander Bluhm
	in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-26	KERN_CPTIME2: set ENODEV if the CPU is offline.	cheloha
	This lets userspace distinguish between idle CPUs and those that are not schedulable because hw.smt=0. A subsequent commit probably needs to add documentation for this to sysctl.2 (and perhaps elsewhere) after the dust settles. Also included here are changes to systat(1) and top(1) that account for the ENODEV case and adjust behavior accordingly: - systat(1)'s cpu view prints placeholder marks ('-') instead of percentages for each state if the given CPU is offline. - systat(1)'s vmstat view checks for offline CPUs when computing the machine state total and excludes them, so the CPU usage graph only represents the states for online CPUs. - top(1) does not draw CPU rows for offline CPUs when the view is redrawn. If CPUs "go offline", percentages for each state are replaced by placeholder marks ('-'); the view will need to be redrawn to remove these rows. If CPUs "go online" the view will need to be redrawn to show these new CPUs. In "combined CPU" mode, the count and the state totals only represent online CPUs. Ports using KERN_CPTIME2 will need to be updated. The changes described above to make systat(1) and top(1) aware of the ENODEV case and gracefully handle a changing HW_NCPUONLINE while the application is running are not necessarily appropriate for each and every port. The changes described above are so extensive in part to demonstrate one way a program might be made robust to changing CPU availability. In particular, changing hw.smt after boot is an extremely rare event, and this needs to be weighed when updating ports. The logic needed to account for the KERN_CPTIME2 ENODEV case is very roughly: if (sysctl(...) == -1) { if (errno != ENODEV) { /* Actual error occurred. */ } else { /* CPU is offline. */ } } else { /* CPU is online and CPU states were set by sysctl(2). */ } Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at length with kettenis@. Additional testing by tb@. No complaints from hackers@ after a week.
ok kettenis@, "I think you should commit [now]" deraadt@
2018-09-20	As a step towards per inpcb or socket locks, remove the net lock	Alexander Bluhm
	for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
2018-07-12	Add hw.ncpuonline to count the number of online CPUs.	cheloha
	The introduction of hw.smt means that logical CPUs can be disabled after boot and prior to suspend/resume. If hw.smt=0 (the default), there needs to be a way to count the number of hardware threads available on the system at any given time. So, import HW_NCPUONLINE/hw.ncpuonline from NetBSD and document it. hw.ncpu becomes equal to the number of CPUs given to sched_init_cpu() during boot, while hw.ncpuonline is equal to the number of CPUs available to the scheduler in the cpuset "sched_all_cpus". Set _SC_NPROCESSORS_ONLN equal to this new sysctl and keep _SC_NPROCESSORS_CONF equal to hw.ncpu. This is preferable to adding a new sysctl to count the number of configured CPUs and keeping hw.ncpu equal to the number of online CPUs because such a change would break software in the ecosystem that relies on HW_NCPU/hw.ncpu to measure CPU usage and the like. Such software in base includes top(1), systat(1), and snmpd(8), and perhaps others. We don't need additional locking to count the cardinality of a cpuset in this case because the only interfaces that can modify said cardinality are sysctl(2) and ioctl(2), both of which are under the KERNEL_LOCK. Software using HW_NCPU/hw.ncpu to determine optimal parallelism will need to be updated to use HW_NCPUONLINE/hw.ncpuonline. Until then, such software may perform suboptimally. However, most changes will be similar to the change included here for libcxx's std::thread::hardware_concurrency(): using HW_NCPUONLINE in lieu of HW_NCPU should be sufficient for determining optimal parallelism for most software if the change to _SC_NPROCESSORS_ONLN is insufficient. Prompted by deraadt. Discussed at length with kettenis, deraadt, and sthen. Lots of patch tweaks from kettenis. ok kettenis, "proceed" deraadt
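For software that sizes a thread pool, the portable route mentioned above is sysconf(3) rather than the sysctl itself. The following is a minimal sketch; the function names are hypothetical, and the fallback to 1 when sysconf(3) fails is an assumption of this example rather than anything the commit prescribes.

```c
#include <unistd.h>

/*
 * Hypothetical helper: number of workers to start, based on the CPUs
 * currently online. Falls back to 1 if the value is unavailable.
 */
static long
worker_count(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return (n < 1) ? 1 : n;
}

/*
 * Hypothetical helper: number of configured CPUs, online or not.
 * Falls back to 1 if the value is unavailable.
 */
static long
configured_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return (n < 1) ? 1 : n;
}
```

A program that reports per-CPU usage would keep iterating over configured_cpus(), while one that only wants to pick a degree of parallelism would use worker_count().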
2018-07-02	Update the file reference count field `f_count' using atomic operations	Visa Hankala
	instead of using a mutex for update serialization. Use a per-fdp mutex to manage updating of file instance pointers in the `fd_ofiles' array to let fd_getfile() acquire file references safely with concurrent file reference releases. OK mpi@