path: root/sys/kern
Age  Commit message  Author
2024-01-10  Split UDP PCB table into IPv4 and IPv6.  Alexander Bluhm
Having two hash tables instead of a common one reduces table size and contention on the per-table lock. The address family is always known in advance. The lookups and loops are more specific. OK sashan@
2024-01-07  Error out if one syscall ever takes more than 6 arguments.  Miod Vallat
This is not necessarily wrong per se, but would need special consideration, as not all platforms are currently able to process more than six syscall arguments (and upcoming diffs will rely upon reasonably-sized argument lists), so better break now and reconsider later if need be. ok deraadt@
2024-01-03  Run connect(2) in parallel within inet domain.  Alexander Bluhm
This unlocks soconnect() for UDP, rip, rip6 and divert. It takes shared net lock in combination with per socket lock. TCP and GRE still use exclusive net lock when connecting. OK mvs@
2024-01-01  copyright++;  Jonathan Gray
2023-12-21  Remove logic and comments related to INDIR now that they aren't supported anymore.  Miod Vallat
ok tb@ deraadt@, no need to regen anything
2023-12-19  Release inpcb mutex while calling sbwait().  Alexander Bluhm
As sbwait() may sleep, holding any mutex is not allowed. Call pru_unlock() before sbwait() in soreceive(). Bug spotted by sashan@; OK sashan@ mvs@
2023-12-19  sync  Theo de Raadt
2023-12-19  the 4th argument of pinsyscalls() is now "number of pin elements", not "size of the storage of the pin elements"  Theo de Raadt
2023-12-19  soreceive() must not hold mutex when calling sblock().  Alexander Bluhm
In my recent commit I missed that sblock() may sleep while soreceive() holds the inpcb mutex. Call pru_lock() after sblock(). Reported-by: syzbot+f79c896ec019553655a0@syzkaller.appspotmail.com Reported-by: syzbot+08b6f1102e429b2d4f84@syzkaller.appspotmail.com OK mvs@
2023-12-18  Run bind(2) system call in parallel.  Alexander Bluhm
For protocols that care about locking, use the shared net lock to call sobind(). Use the per socket rwlock together with the shared net lock. This affects the protocols UDP, raw IP, and divert. Move the inpcb mutex locking into soreceive(); it is only used there. Add a comment to describe the current implementation of inpcb locking. OK mvs@ sashan@
2023-12-15  provide the pieces for ktrace/kdump to observe pinsyscall violations.  Theo de Raadt
(not used yet, because the pinsyscall changes are still being worked on) ok kettenis
2023-12-14  Workaround for broken clang which has a broken -fno-zero-initialized-in-bss implementation.  Claudio Jeker
Set nkmempages to -1 by default instead of 0 so that the value ends up in the data section. This way config(8) is able to alter the value as promised. See also: https://github.com/llvm/llvm-project/issues/74632 OK miod@
2023-12-14  Bring default logic to set nkmempages into the 21st century.  Claudio Jeker
The new logic is: up to 1G of physmem, use physical memory / 4; above 1G, add an extra 16MB per 1G of memory. Clamp it down depending on the available kernel virtual address space:
- up to and including 512M -> 64MB (macppc, arm, sh)
- between 512M and 1024M -> 128MB (hppa, i386, mips, luna88k)
- over 1024M clamp to VM_KERNEL_SPACE_SIZE / 4
The result is much more malloc(9) space on 64bit archs with lots of memory and large kva space. Note: amd64 only has 4G of kva and therefore nkmempages is limited to 262144. As a side-effect NKMEMPAGES_MAX and nkmempages_max are no longer used. Tested and OK miod@
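As a rough standalone illustration of the sizing rule above (a sketch, not the kernel code; it assumes a 4K page size and the thresholds quoted in the message):

/*
 * Sketch of the nkmempages sizing rule described above.
 * Assumes a 4K page size; the real kernel code may differ in detail.
 */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE	4096ULL
#define MEG		(1024ULL * 1024)
#define GIG		(1024ULL * MEG)

static uint64_t
nkmempages_guess(uint64_t physmem, uint64_t kva)
{
	uint64_t bytes, limit;

	/* Up to 1G of physmem: physmem / 4; above that, add 16MB per extra 1G. */
	if (physmem <= GIG)
		bytes = physmem / 4;
	else
		bytes = GIG / 4 + (physmem - GIG) / GIG * 16 * MEG;

	/* Clamp according to the available kernel virtual address space. */
	if (kva <= 512 * MEG)
		limit = 64 * MEG;
	else if (kva <= 1024 * MEG)
		limit = 128 * MEG;
	else
		limit = kva / 4;
	if (bytes > limit)
		bytes = limit;

	return bytes / PAGE_SIZE;
}

int
main(void)
{
	/* amd64-like case: plenty of RAM but only 4G of kva. */
	printf("%llu\n", (unsigned long long)nkmempages_guess(64 * GIG, 4 * GIG));
	return 0;
}

With 64G of RAM and 4G of kva the clamp wins and the sketch prints 262144, matching the amd64 figure above.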
2023-12-12  put pinsyscalls(2) into the "always" group  Theo de Raadt
2023-12-12  sync  Theo de Raadt
2023-12-12  remove support for syscall(2) -- the "indirection system call"  Theo de Raadt
because it is a dangerous alternative entry point for all system calls, and thus incompatible with the precision system call entry point scheme we are heading towards. This has been a 3-year mission: First perl needed a code-generated wrapper to fake syscall(2) as a giant switch table, then all the ports were cleaned with relatively minor fixes, except for "go". "go" required two fixes -- 1) a framework issue with old library versions, and 2) like perl, a fake syscall(2) wrapper to handle ioctl(2) and sysctl(2) because "syscall(SYS_ioctl" occurs all over the place in the "go" ecosystem because the "go developers" are plan9-loving unix-hating folk who tried to build an ecosystem without allowing "ioctl". ok kettenis, jsing, afresh1, sthen
2023-12-11  Implement per-CPU caching for the page table page (vp) pool and the PTE descriptor (pted) pool in the arm64 pmap implementation.  Mark Kettenis
This significantly reduces the side-effects of lock contention on the kernel map lock that is (incorrectly) translated into excessive page daemon wakeups. This is not a perfect solution but it does lead to significant speedups on machines with many CPU cores. This requires adding a new pmap_init_percpu() function that gets called at the point where the kernel is ready to set up the per-CPU pool caches. Dummy implementations of this function are added for all non-arm64 architectures. Some other architectures can probably benefit from providing an actual implementation that sets up per-CPU caches for pmap pools as well. ok phessler@, claudio@, miod@, patrick@
2023-12-10  sync  Theo de Raadt
2023-12-10  pinsyscalls(2) 2nd argument can be "uint *" instead of "void *"  Theo de Raadt
ok kettenis
2023-12-07  sync  Theo de Raadt
2023-12-07  Add a stub pinsyscalls() system call that simply returns 0 for now, before future work where ld.so(1) will need this new system call.  Theo de Raadt
Putting this in the kernel ahead of time will save some grief. ok kettenis
2023-11-29  regen syscalls  Alexander Bluhm
2023-11-29  Unlock bind(2) syscall.  Alexander Bluhm
For internet sockets sobind() runs with exclusive net lock due to solock(). For unix domain sockets uipc_bind() grabs the kernel lock itself. So sys_bind() is MP safe. Add NOLOCK flag to avoid kernel lock. OK mvs@
2023-11-29  Cleanup kmeminit_nkmempages().  Claudio Jeker
NKMEMPAGES_MIN was removed a long time ago on all archs so there is no need to keep it. Also initialize nkmempages_max at compile time since sparc (with variable page size) is long gone as well. No objection from miod@
2023-11-28  correct spelling of FALLTHROUGH  Jonathan Gray
2023-11-24  Fix comments longer than 80 columns.  ASOU Masato
ok miod@
2023-11-21  Fix kernel build without option PTRACE, but with dt(4).  Alexander Bluhm
Since revision 1.26 dt_ioctl_get_auxbase() calls process_domem(). Build the latter function into the kernel if the pseudo device dt is enabled. from Matthias Pitzl; OK claudio@
2023-11-15  Constify disk_map()'s path argument  Klemens Nanni
The disklabel UID passed in is not modified; reflect that and allow callers to use 'const char *'. OK miod
2023-10-30  Do not truncate MSG_EOR in recvmsg().  Alexander Bluhm
The soreceive() code depends on the fact that MSG_EOR is set on the last mbuf of the chain. In sbappendcontrol() move MSG_EOR to the end like sbcompress() does. This fixes MSG_EOR handling for SOCK_SEQPACKET sockets with a control message. bug reported by Eric Wong; analysed, tested and OK claudio@
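A hypothetical userland test along these lines exercises the fixed case: send one record carrying both a control message and MSG_EOR over a SOCK_SEQPACKET socketpair, then check that recvmsg(2) still reports MSG_EOR (sketch only; the names and the choice of SCM_RIGHTS are illustrative):

/* Sketch: MSG_EOR plus a control message on a SOCK_SEQPACKET socket. */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	int sv[2], fd = 0;	/* pass stdin, just to have an SCM_RIGHTS payload */
	char buf[16] = "hello";
	char cbuf[CMSG_SPACE(sizeof(int))];
	struct iovec iov = { .iov_base = buf, .iov_len = 5 };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
		return 1;

	/* Build a record with both data and a control message, marked MSG_EOR. */
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	if (sendmsg(sv[0], &msg, MSG_EOR) == -1)
		return 1;

	/* Receive the record and check that MSG_EOR survived. */
	memset(&msg, 0, sizeof(msg));
	iov.iov_len = sizeof(buf);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf;
	msg.msg_controllen = sizeof(cbuf);
	if (recvmsg(sv[1], &msg, 0) == -1)
		return 1;
	printf("MSG_EOR %s\n", (msg.msg_flags & MSG_EOR) ? "set" : "missing");
	return 0;
}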
2023-10-30  Use ERESTART for any single_thread_set() error in sys_execve().  Claudio Jeker
If single thread is already held by another thread, just unwind to userret(), wait there and retry the system call later (if at all). OK mpi@
2023-10-24  Normally context switches happen in mi_switch() but there are 3 cases where a switch happens outside.  Claudio Jeker
Clean up these code paths and make them machine independent.
- when a process forks (fork, tfork, kthread), the new proc needs to somehow be scheduled for the first time. This is done by proc_trampoline. Since proc_trampoline is machine dependent assembler code, change the MP specific proc_trampoline_mp() to proc_trampoline_mi() and make sure it is now always called.
- cpu_hatch: when booting APs the code needs to jump to the first proc running on that CPU. This should be the idle thread for that CPU.
- sched_exit: when a proc exits it needs to switch away from itself and then instruct the reaper to clean up the rest. This is done by switching to the idle loop.
Since the last two cases require a context switch to the idle proc, factor out the common code to sched_toidle() and use it in those places. Tested by many on all archs. OK miod@ mpi@ cheloha@
2023-10-20  Avoid assertion failure when splitting mbuf cluster.  Alexander Bluhm
m_split() calls m_align() to initialize the data pointer of the newly allocated mbuf. If the new mbuf will be converted to a cluster, this is not necessary. If additionally the new mbuf is larger than MLEN, this can lead to a panic. Only call m_align() when a valid m_data is needed. This is the case if we do not reference the existing cluster, but memcpy() the data into the new mbuf. Reported-by: syzbot+0e6817f5877926f0e96a@syzkaller.appspotmail.com OK claudio@ deraadt@
2023-10-17  clockintr: move callback-specific API behaviors to "clockrequest" namespace  Scott Soule Cheloha
The API's behavior when invoked from a callback function is impossible to document. Move the special behavior into a distinct namespace, "clockrequest".
- Add a 'struct clockrequest'. Basically a stripped-down 'struct clockintr' for exclusive use during clockintr_dispatch().
- In clockintr_queue, replace the "cq_shadow" clockintr with a "cq_request" clockrequest. They serve the same purpose.
- CLST_SHADOW_PENDING -> CR_RESCHEDULE; different namespace, same meaning.
- CLST_IGNORE_SHADOW -> CLST_IGNORE_REQUEST; same meaning.
- Move shadow branch in clockintr_advance() to clockrequest_advance().
- clockintr_request_random() becomes clockrequest_advance_random().
- Delete dead shadow branches in clockintr_cancel(), clockintr_schedule().
- Callback functions now get a clockrequest pointer instead of a special clockintr pointer: update all prototypes, callers.
No functional change intended.
2023-10-12  timeout: add TIMEOUT_MPSAFE flag  Scott Soule Cheloha
Add a TIMEOUT_MPSAFE flag to signal that a timeout is safe to run without the kernel lock. Currently, TIMEOUT_MPSAFE requires TIMEOUT_PROC. When the softclock() is unlocked in the future this dependency will be removed. On MULTIPROCESSOR kernels, softclock() now shunts TIMEOUT_MPSAFE timeouts to a dedicated "timeout_proc_mp" bucket for processing by the dedicated softclock_thread_mp() kthread. Unlike softclock_thread(), softclock_thread_mp() is not pinned to any CPU and runs at IPL_NONE. Prompted by bluhm@. Lots of input from bluhm@. Joint work with mvs@. Prompt: https://marc.info/?l=openbsd-tech&m=169646019109736&w=2 Thread: https://marc.info/?l=openbsd-tech&m=169652212131109&w=2 ok mvs@
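A rough sketch of how a driver could opt into this, assuming the five-argument timeout_set_flags(9) form (timeout, fn, arg, kclock, flags) and a hypothetical mydev driver; check timeout(9) for the authoritative interface:

/*
 * Sketch only: opting a timeout into the MP-safe softclock path.
 * Assumes timeout_set_flags(9) takes (timeout, fn, arg, kclock, flags).
 */
#include <sys/param.h>
#include <sys/timeout.h>

struct mydev_softc {
	struct timeout	sc_tick;	/* hypothetical per-device timeout */
};

void
mydev_tick(void *arg)
{
	/* Runs without the kernel lock: touch only properly locked state here. */
}

void
mydev_start_tick(struct mydev_softc *sc)
{
	/* TIMEOUT_MPSAFE currently requires TIMEOUT_PROC, as noted above. */
	timeout_set_flags(&sc->sc_tick, mydev_tick, sc, KCLOCK_NONE,
	    TIMEOUT_PROC | TIMEOUT_MPSAFE);
	timeout_add_sec(&sc->sc_tick, 1);
}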
2023-10-11  kernel: expand fixed clock interrupt periods to 64-bit values  Scott Soule Cheloha
Technically, all the current fixed clock interrupt periods fit within an unsigned 32-bit value. But 32-bit multiplication is an accident waiting to happen. So, expand the fixed periods for hardclock, statclock, profclock, and roundrobin to 64-bit values. One exception: statclock_mask remains 32-bit because random(9) yields 32-bit values. Update the initclocks() comment to make it clear that this is not an accident.
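The hazard is easy to demonstrate in isolation: a product of two 32-bit values silently wraps, while widening the period to 64 bits keeps the arithmetic exact (standalone illustration, not kernel code):

/* Why 32-bit period arithmetic is an accident waiting to happen. */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint32_t period32 = 1000000000U / 100;	/* 10 ms tick period in ns */
	uint64_t period64 = period32;
	uint32_t ticks = 1000;			/* ten seconds worth of ticks */

	/* The 32-bit product wraps: 10^10 does not fit in 32 bits. */
	printf("32-bit: %u\n", period32 * ticks);
	/* The 64-bit product is exact. */
	printf("64-bit: %llu\n", (unsigned long long)(period64 * ticks));
	return 0;
}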
2023-10-11  clockintr: move clockintr_schedule() into public API  Scott Soule Cheloha
Prototype clockintr_schedule() in <sys/clockintr.h>.
2023-10-11  clockintr_stagger: rename parameters: "n" -> "numer", "count" -> "denom"  Scott Soule Cheloha
Rename these parameters to align the code with the forthcoming manpage. No functional change.
2023-10-08  clockintr: move intrclock wrappers from sys/clockintr.h to kern_clockintr.c  Scott Soule Cheloha
intrclock_rearm() and intrclock_trigger() are not part of the public API, so there's no reason to implement them in sys/clockintr.h. Move them to kern_clockintr.c.
2023-10-06  In sys___thrsigdivert() switch tsleep_nsec() to use the nowake ident channel instead of inventing its own one.  Claudio Jeker
OK kettenis@ mvs@
2023-10-01  Add sysctl hw.ucomnames to list 'fixed' paths to USB serial ports.  Kenneth R Westerback
Suggested by deraadt@, USB route idea from kettenis@. Feedback from anton@, man page improvements from deraadt@, jmc@, schwarze@. ok deraadt@ kettenis@
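Reading the new variable from userland is an ordinary two-step sysctl(2) call (size query, then fetch). The sketch below assumes the identifier is exported as HW_UCOMNAMES under CTL_HW; running "sysctl hw.ucomnames" from the shell shows the same string:

/* Sketch: read hw.ucomnames; assumes the HW_UCOMNAMES MIB constant. */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int mib[2] = { CTL_HW, HW_UCOMNAMES };
	size_t len;
	char *names;

	if (sysctl(mib, 2, NULL, &len, NULL, 0) == -1)
		return 1;
	if ((names = malloc(len)) == NULL)
		return 1;
	if (sysctl(mib, 2, names, &len, NULL, 0) == -1)
		return 1;
	printf("%s\n", names);
	free(names);
	return 0;
}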
2023-09-29  Extend single_thread_set() mode with additional flag attributes.  Claudio Jeker
The mode can now be or-ed with SINGLE_DEEP or SINGLE_NOWAIT to alter the behaviour of single_thread_set(). This allows explicit control of the SINGLE_DEEP behaviour.
If SINGLE_DEEP is set, the deep flag is passed to the initial check call and by that the check will error out instead of suspending (SINGLE_UNWIND) or exiting (SINGLE_EXIT). The SINGLE_DEEP flag is required in calls to single_thread_set() outside of userret. E.g. at the start of sys_execve because the proc is not allowed to call exit1() in that location.
SINGLE_NOWAIT skips the wait at the end of single_thread_set() and therefore returns BEFORE all threads have been parked. Currently this is only used by the ptrace code and should not be used anywhere else. Not waiting for all threads to settle is asking for trouble.
This solves an issue by using SINGLE_UNWIND in the coredump case where the code should actually exit in case another thread crashed moments earlier. Also the SINGLE_UNWIND in pledge_fail() is now marked SINGLE_DEEP since the call to pledge_fail() is for sure not at the kernel boundary. OK mpi@
2023-09-25  ddb(4): clockintr: print cl_arg address when displaying a clockintr  Scott Soule Cheloha
2023-09-24  kern_clockintr.c: remove extra newline  Scott Soule Cheloha
2023-09-23  Fix unreliable sys_setsockopt() with consistent use of M_WAIT  Jan Klemkow
Also remove useless NULL check. ok bluhm@
2023-09-22  Make `logread_filterops' MP safe.  Vitaliy Makkoveev
For that purpose use the `log_mtx' mutex(9) protecting the message buffer. ok bluhm
2023-09-21  Move code inside exit1() to better spots.  Claudio Jeker
- PS_PROFIL bit is moved into the process cleanup block where it belongs
- The proc read-only limit cache cleanup is moved up right after clearing the p->p_fd cache. lim_free() can potentially sleep and so needs to be above the line where p_stat is set to SDEAD.
With and OK jca@
2023-09-19  Improve the output of ddb "show proc" command  Claudio Jeker
Include missing fields -- like the sleep channel and message -- and show both the PID and TID of the proc. Also add '/t' as an argument that can be used to specify a proc by TID instead of by address. OK mpi@
2023-09-19  Add a KASSERT for p->p_wchan == NULL to setrunqueue()  Claudio Jeker
There is the same check in sched_chooseproc() but that is too late to know where the bad insertion into the runqueue was done. OK mpi@
2023-09-19  Before coredump or in pledge_fail use SINGLE_UNWIND to stop all threads.  Claudio Jeker
SINGLE_UNWIND unwinds to the kernel boundary. On the other hand SINGLE_SUSPEND will sleep inside tsleep(9) and other sleep functions. Since the code will exit1() very soon after, it is better to already unwind. Now one could argue that for coredumps all threads should stop asap to get a clean dump. Using SINGLE_UNWIND the sleep will fail with ERESTART and no copyout should happen in that case. This is a bit of a workaround since SINGLE_SUSPEND has a small race where single_thread_wait() returns before all threads are really stopped. When SINGLE_EXIT is called quickly after, this can blow up inside sleep_finish(). Reported-by: syzbot+3ef066fcfaf991f2ac2c@syzkaller.appspotmail.com OK mpi@ kettenis@
2023-09-17  clockintr.h: forward-declare "struct cpu_info" for clockintr_establish()  Scott Soule Cheloha
With input from claudio@ and deraadt@.