Age | Commit message | Author |
|
OK miod@
|
|
it is a dangerous alternative entry point for all system calls, and thus
incompatible with the precision system call entry point scheme we are
heading towards. This has been a 3-year mission:
First perl needed a code-generated wrapper to fake syscall(2) as a giant
switch table, then all the ports were cleaned with relatively minor fixes,
except for "go". "go" required two fixes -- 1) a framework issue with
old library versions, and 2) like perl, a fake syscall(2) wrapper to
handle ioctl(2) and sysctl(2) because "syscall(SYS_ioctl" occurs all over
the place in the "go" ecosystem because the "go developers" are plan9-loving
unix-hating folk who tried to build an ecosystem without allowing "ioctl".
ok kettenis, jsing, afresh1, sthen
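A minimal sketch of what such a fake syscall(2) wrapper looks like (all names hypothetical; only the ioctl case shown):

    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <errno.h>
    #include <stdarg.h>
    #include <unistd.h>

    /* fake_syscall: route known syscall numbers to the real libc stubs */
    int
    fake_syscall(int number, ...)
    {
        va_list ap;
        int ret;

        va_start(ap, number);
        switch (number) {
        case SYS_ioctl: {
            int fd = va_arg(ap, int);
            unsigned long req = va_arg(ap, unsigned long);
            void *arg = va_arg(ap, void *);

            ret = ioctl(fd, req, arg);  /* the real stub */
            break;
        }
        /* ... one case per syscall the ecosystem actually uses ... */
        default:
            errno = ENOSYS;
            ret = -1;
            break;
        }
        va_end(ap);
        return ret;
    }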
|
|
doesn't make sense anymore. It is better to just issue an illegal
instruction.
ok kettenis, with some misgivings about inconsistent approaches between
architectures.
In the future we could change sigreturn(2) to never return an exit code,
but always just terminate the process. We stopped this system call
from being callable ages ago with msyscall(2), and there is no stub for
it in libc... maybe that's the next step to take?
|
|
descriptor (pted) pool in the arm64 pmap implementation. This
significantly reduces the side-effects of lock contention on the kernel
map lock that is (incorrectly) translated into excessive page daemon
wakeups. This is not a perfect solution but it does lead to significant
speedups on machines with many CPU cores.
This requires adding a new pmap_init_percpu() function that gets called
at the point where the kernel is ready to set up the per-CPU pool caches.
Dummy implementations of this function are added for all non-arm64
architectures. Some other architectures can probably benefit from
providing an actual implementation that sets up per-CPU caches for
pmap pools as well.
ok phessler@, claudio@, miod@, patrick@
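A sketch of the hook's two shapes, assuming pool_cache_init(9) and the arm64 pool name:

    /* MD stub for architectures without per-CPU pmap pool caches */
    void
    pmap_init_percpu(void)
    {
        /* nothing */
    }

    /* arm64 variant (sketch): enable per-CPU caches on the pted pool */
    void
    pmap_init_percpu(void)
    {
        pool_cache_init(&pmap_pted_pool);
    }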
|
|
The Intel SDM states the vmxon/vmxoff instructions don't invalidate
any EPT state on the cpu and recommends invalidating the global
context. vmm(4) opportunistically disables and enables VMX mode as
VMs are created or terminated, so this adds a recommended
housekeeping step per the SDM.
While here, tidy up the CR4 toggling by moving it to after the MSR
feature check.
ok mlarkin@
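A sketch of the added step, assuming vmm(4)'s invept() wrapper and descriptor type:

    /* after vmxon: invalidate all EPT contexts, per SDM recommendation */
    struct vmx_invept_descriptor vid;

    memset(&vid, 0, sizeof(vid));
    invept(IA32_VMX_INVEPT_GLOBAL_CTX, &vid);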
|
|
in front of the syscall instruction. This is used to calculate the start
of the syscall for SYS_sigreturn and pinned system calls.
ok kettenis
|
|
ok mlarkin@
|
|
While vmm's use of invvpid in the vmx vcpu run loop is questionable
since we require and use EPT, the vpid value is unquestionably wrong
in these calls.
ok mlarkin@
|
|
If the vcpu thread sleeps in the kernel, like when handling a nested
page fault and calling uvm_fault(9), the thread may be rescheduled
on another host cpu. vmm(4) was only setting the GDTR and TR bases
in the VMCS once prior to first vm entry, so a thread migration can
result in restoring the wrong GDTR and TR on vm exit for the host
cpu. This results in borked interrupts and corrupted stack pointers,
causing programs to segfault or sigabort. It can also result in
missed IPIs, causing kernel deadlocks.
Use similar logic to the SVM routines and check for cpu migration
within the hot loop. Since we're letting the VMX features of the
cpu restore GDTR, we can also drop the manual store/load routines.
Reported and with much appreciated testing help from Mischa Peters.
ok mlarkin@
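Roughly the shape of the check added to the hot loop; the field and VMCS names follow vmm(4) and the SDM, but treat the details as assumptions:

    /* did we migrate to another host cpu since the last entry? */
    if (ci != vcpu->vc_last_pcpu) {
        /* refresh host GDTR/TR bases in the VMCS for this cpu */
        vmwrite(VMCS_HOST_IA32_GDTR_BASE, (uint64_t)ci->ci_gdt);
        vmwrite(VMCS_HOST_IA32_TR_BASE, (uint64_t)ci->ci_tss); /* assumption */
    }
    vcpu->vc_last_pcpu = ci;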
|
|
During boot TSC initialization could fail with panic: tsc_test_sync_ap:
cpu2: tsc_ap_name is not NULL: cpu1.
The root cause is a race between the moment the application processor
sets CPUF_IDENTIFIED in cpu_hatch() and the moment the boot processor
checks CPUF_IDENTIFIED in cpu_start_secondary() before the TSC sync
test.
The fix is to set CPUF_IDENTIFIED before clearing CPUF_IDENTIFY in
cpu_hatch().
from hshoexer@ cheloha@; OK deraadt@ mlarkin@
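The ordering fix, sketched (flag and helper names as on amd64):

    /* cpu_hatch(): publish identify results before signaling done */
    identifycpu(ci);
    atomic_setbits_int(&ci->ci_flags, CPUF_IDENTIFIED);
    atomic_clearbits_int(&ci->ci_flags, CPUF_IDENTIFY);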
|
|
svm_handle_np_fault(): clearer output
ok mlarkin@
|
|
Mostly a dummy commit so that the last llvm commit ends up in the git export.
(No idea whether it's actually/still needed but it can't hurt.)
|
|
The first one is defined as the second, so no functional change.
ok claudio
|
|
ok yasuoka@
|
|
where a switch happens outside. Clean up these code paths and make them
machine independent.
- when a process forks (fork, tfork, kthread), the new proc needs to
somehow be scheduled for the first time. This is done by proc_trampoline.
Since proc_trampoline is machine-dependent assembler code, change
the MP-specific proc_trampoline_mp() to proc_trampoline_mi() and make
sure it is now always called.
- cpu_hatch: when booting APs the code needs to jump to the first proc
running on that CPU. This should be the idle thread for that CPU.
- sched_exit: when a proc exits it needs to switch away from itself and
then instruct the reaper to clean up the rest. This is done by switching
to the idle loop.
Since the last two cases require a context switch to the idle proc,
factor out the common code into sched_toidle() and use it in those places.
Tested by many on all archs.
OK miod@ mpi@ cheloha@
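A sketch of the factored-out tail shared by sched_exit() and the AP bring-up path:

    /* sched_exit() sketch: hand the rest to the reaper, then leave */
    void
    sched_exit(struct proc *p)
    {
        /* ... stop clocks, account runtime, wake the reaper ... */
        sched_toidle();     /* switch to this CPU's idle proc */
        /* NOTREACHED */
    }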
|
|
reminded by jsg@, snapshot test build done by me
|
|
For now, only attach to PSE0/RGMII (device ID 0x4ba0), which is the
only device I have access to for testing.
There is a known problem where Tx throughput is lower than expected.
This is being looked into.
ok kettenis@
|
|
|
|
All the state initialization once done in clockintr_init() has been
moved to other parts of the kernel. It's a dead function. Remove it.
Likewise, the clockintr_flags variable no longer sports any meaningful
flags. Remove it. This frees up the CL_* flag namespace, which might
be useful to the clockintr frontend if we ever need to add behavior
flags to any of those functions.
|
|
In order to separate the statclock from the clock interrupt subsystem
we need to move all statclock state out into the broader kernel.
Start by replacing the CL_RNDSTAT flag with a new global variable,
"statclock_is_randomized", in kern_clock.c. Update all clockintr_init()
callers to set the boolean instead of passing the flag.
Thread: https://marc.info/?l=openbsd-tech&m=169428749720476&w=2
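A former caller, before and after (sketch):

    /* before: clockintr_init(CL_RNDSTAT); */
    statclock_is_randomized = 1;    /* new global in kern_clock.c */
    clockintr_init(0);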
|
|
avoids a General-Protection Exception on patch loader wrmsr with
A10-5700, TN-A1 00610f01 15-10-01
the alignment requirement is not present on at least
Ryzen 5 2600X, PiR-B2 00800f82 17-08-02
problem reported and fix tested by espie@
|
|
7.3 is long gone, you must have new bootloaders and new kernels.
Zaps both the condition and the else block; unindents and merges lines where they fit.
Feedback OK kettenis
Tests OK denis
|
|
To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.
Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.
ok mlarkin@
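A sketch of the vmd(8) side; the vrp_intr_pending field name and the helper are assumptions:

    /* run one vcpu slice; vrp_intr_pending tells vmm(4) whether more
     * guest interrupts are queued, replacing a separate ioctl */
    static int
    run_vcpu_once(int vmm_fd, struct vcpu *vcpu, struct vm_run_params *vrp)
    {
        vrp->vrp_intr_pending = vcpu_intr_more(vcpu);  /* hypothetical */
        return ioctl(vmm_fd, VMM_IOC_RUN, vrp);
    }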
|
|
no longer needed with zlib 1.3
ok tb@
|
|
dv points out that there are other bits there that imply the existence
of other MSRs, so switching this to an include list is a better idea.
|
|
On newer Ryzen/EPYC, we need to hide the HwPstate field in CPUID
0x80000007:EDX, or guests will try to access the MSRs associated
with it, and that will fail with #GP.
ok deraadt
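Sketched as a fragment of the CPUID exit handler; the mask name is an assumption:

    case 0x80000007:    /* APM / power management */
        *rdx &= ~CPUIDEDX_HWPSTATE;    /* hide HwPstate MSRs */
        break;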
|
|
To give the primary CPU an opportunity to perform clock interrupt
preparation in a machine-independent manner we need to separate the
"initialization" parts of cpu_initclocks() from the "start the clock
interrupt" parts. Currently, cpu_initclocks() does everything all at
once, so there is no space for this MI setup.
Many platforms have more-or-less already done this separation by
implementing a separate routine named "cpu_startclock()". This patch
promotes cpu_startclock() from de facto standard to mandatory API.
- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks().
The separation of responsibility between the two routines is a bit
fuzzy but the basic guidelines are as follows:
+ cpu_initclocks() must initialize hz, stathz, and profhz, and call
clockintr_init().
+ cpu_startclock() must call clockintr_cpu_init() and start the clock
interrupt cycle on the calling CPU.
These guidelines will shift in the future, but that's the way things
stand as of *this* commit.
- In initclocks(): first call cpu_initclocks(), then do MI setup, and
last call cpu_startclock().
- On platforms where cpu_startclock() already exists: don't call
cpu_startclock() from cpu_initclocks() anymore.
- On platforms where cpu_startclock() doesn't yet exist: implement it.
Usually this is as simple as dividing cpu_initclocks() in two.
Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc,
mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by
phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested
on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by
jmatthew@.
Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2
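The resulting machine-independent flow, sketched:

    /* initclocks(): MD init, then MI setup, then start the cycle */
    void
    initclocks(void)
    {
        cpu_initclocks();   /* MD: hz/stathz/profhz, clockintr_init() */

        /* ... MI clock interrupt preparation goes here ... */

        cpu_startclock();   /* MD: clockintr_cpu_init(), start clock */
    }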
|
|
mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
|
|
|
|
|
|
ok guenther@ deraadt@
|
|
or IBT enabled in the kernel, the hardware should block the attacks which
retpolines were created to prevent. In those cases, retpolines
should be a net negative for security as they are an indirect branch
gadget. They're also slower.
* use -mretpoline-external-thunk to give us control of the code
used for indirect branches
* default to using a retpoline as before, but mark it and the
other ASM kernel retpolines for code patching
* if the CPU has eIBRS, then enable it
* if the CPU has eIBRS *or* IBT, then codepatch the three different
retpolines to just indirect jumps
make clean && make config required after this
ok kettenis@
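Boot-time logic, sketched with assumed flag, tag, and snippet names:

    if (cpu_has_eibrs)              /* assumed feature flag */
        setup_eibrs();              /* hypothetical helper */
    if (cpu_has_eibrs || cpu_has_ibt) {
        /* patch the marked retpoline thunks into plain jumps */
        codepatch_replace(CPTAG_RETPOLINE_RAX, codepatch_jmp_rax,
            codepatch_jmp_rax_len);
    }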
|
|
const.
suggested by bluhm@
|
|
|
|
of code to use in codepatching. Use that for all the existing
codepatching snippets.
Similarly, add CODEPATCH_CODE_LEN() which is CODEPATCH_CODE() but also
provides a short variable holding the length of the codepatch snippet.
Use that for some snippets that will be used for retpoline replacement.
ok kettenis@ deraadt@
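Usage, sketched in assembler; the snippet names are assumptions:

    /* emit a named snippet into the codepatch section */
    CODEPATCH_CODE(codepatch_ret, ret)
    /* same, plus a codepatch_jmp_rax_len variable for the length */
    CODEPATCH_CODE_LEN(codepatch_jmp_rax, jmp *%rax)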
|
|
ok mlarkin@ kettenis@ deraadt@
|
|
ok mlarkin@
|
|
invoked using indirect branches and should have endbr64's.
ok deraadt@
|
|
indirect pointer itself and provide an initializer for it that goes
to the default "just enable interrupts and halt" path.
ok kettenis@
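The default path behind the pointer, sketched (pointer name as on amd64):

    static void
    cpu_idle_cycle_default(void)
    {
        /* enable interrupts and halt until the next one arrives */
        __asm volatile("sti; hlt");
    }

    void (*cpu_idle_cycle_fcn)(void) = cpu_idle_cycle_default;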
|
|
This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.
- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().
- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().
- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().
- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.
- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.
- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().
With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.
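The mi_switch() hook, sketched with simplified signatures:

    /* arm or disarm this CPU's profclock for the incoming thread */
    if (p->p_flag & P_PROFIL)
        clockintr_advance(spc->spc_profclock, profclock_period);
    else
        clockintr_cancel(spc->spc_profclock);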
|
|
away the calls
ok deraadt@ mpi@ miod@
|
|
not indicate bit 9 set, but they could have a firmware fix) but then block
an MSR write to bit 9 (which disables enough AVX optimizations
to prevent the exfiltration of data) with a fault. So let's also check
the HV bit before we decide to modify the bit. Hypervisors are expected
to set that bit. Tested by lucas at sexy dot is.
with jsg, ok mlarkin
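The guard, sketched; the MSR and bit names are assumptions, CPUIDECX_HV is the hypervisor CPUID bit:

    /* only touch the chicken bit on bare metal */
    if ((cpu_ecxfeature & CPUIDECX_HV) == 0)
        wrmsr(MSR_DE_CFG, rdmsr(MSR_DE_CFG) | DE_CFG_SERIALIZE_9);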
|
|
have other side-effects (not disclosed by AMD), and firmware fixes may
be better (and have other side-effects, same story). Newer processors
will probably be validated more carefully by AMD.
Issue found by Tavis Ormandy.
This is errata 7.2/033_amdcpu.patch.sig and 7.3/011_amdcpu.patch.sig
Zenbleed also blocked on select cpus by using errata
7.3/012_amdfirmware.patch.sig + 7.3/013_amdcpufirmware.patch /
7.2/034_amdfirmware.patch.sig + 7.2/035_amdcpufirmware.patch.sig
which load AMD cpu firmwares (firmware.openbsd.org is updated often to
contain the best firmwares)
ok jsg
|
|
ok deraadt@
|
|
ok deraadt@
|
|
Provide more ARCH_CAP_* defines per June 2023 SDM
ok jsg@ deraadt@
|
|
done for GENERIC already.
ok kettenis kn
|
|
to save/restore the state and enabling it at exec-time (and for
signal handling) if the PS_NOBTCFI flag isn't set.
Note: this changes the format of the sc_fpstate data in the signal
context to possibly be in compressed format: starting now we just
guarantee that the state is in a format understood by the XRSTOR
instruction of the system being executed on.
At this time, passing sigreturn a corrupt sc_fpstate now results
in the process exiting with no attempt to fix it up or send a
T_PROTFLT trap. That may change.
prodding by deraadt@
issues with my original signal handling design identified by kettenis@
lots of base and ports preparation for this by deraadt@ and the
libressl and ports teams
ok deraadt@ kettenis@
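The sigreturn-side check, sketched; the helper name and return convention are assumptions:

    /* XRSTOR the user-supplied state; on fault, the process exits */
    if (xrstor_user(sc->sc_fpstate, xsave_mask)) {
        sigexit(p, SIGILL);
        /* NOTREACHED */
    }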
|
|
amd64 and i386. This is the first step towards a machine independent
safe sleep API.
tested by yasuoka@ bluhm@
ok deraadt@ kettenis@
|
|
While UEFI 2.10 has a way of indicating that runtime services use the
appropriate ENDBR64 instructions, firmware that's out in the wild doesn't
actually use that yet. Once the landscape changes we may want to
reconsider toggling IBT off.
ok guenther@, kn@
|