Age | Commit message | Author |
|
The object sent to vmm(4) contained file paths and details the
kernel does not need for cpu virtualization as device emulation is
in userland. Effectively, "pull up" the struct members from the
vm_create_params struct to the parent vmop_create_params struct.
This allows us to clean up some of vmd(8) and simplifies switching
to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd, etc.),
letting users boot recovery ramdisk kernels.
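Roughly the shape of the split (a sketch with illustrative members,
not the committed layout):

    /* kernel-facing: only what cpu virtualization needs */
    struct vm_create_params {
            size_t  vcp_nmemranges;
            size_t  vcp_ncpus;
            /* ... memory ranges, vm id/name ... */
    };

    /* userland-only details pulled up into the parent */
    struct vmop_create_params {
            struct vm_create_params vmc_params;
            char    vmc_kernel[PATH_MAX];   /* SeaBIOS, bsd.rd, ... */
            char    vmc_disks[4][PATH_MAX];
    };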
ok mlarkin@
|
|
features: while all are appropriate for xsaves/xrstors, the
supervisor-state features aren't for xcr0 but rather for the new XSS_MSR,
making the current names kinda confusing.
Add #defines for masking bits for xcr0 vs XSS.
Add and report the new XSAVE_XFD xsave subfeature bit.
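A sketch of the split (hypothetical define names; the point is that
the two sets of bits go to different destinations):

    /* user states: may be set in XCR0 via xsetbv */
    #define XFEATURE_XCR0_MASK  (XFEATURE_X87 | XFEATURE_SSE | XFEATURE_AVX)
    /* supervisor states: only valid in the IA32_XSS MSR */
    #define XFEATURE_XSS_MASK   (XFEATURE_CET_U | XFEATURE_CET_S)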
ok mlarkin@
|
|
The IDTVEC() and KIDTVEC() macros also get an endbr64, and therefore we need
to change the way that vectors are aliased with a new IDTVEC_ALIAS() macro.
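A sketch of the macro shapes (illustrative, not the committed code):
IDTVEC() now emits the endbr64, so an alias must reuse the original
entry label rather than defining a second entry point.

    #define IDTVEC(name) \
            .text; .global X ## name; X ## name: endbr64
    #define IDTVEC_ALIAS(alias, sym) \
            .global X ## alias; X ## alias = X ## sym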
with guenther, jsg
|
|
ok deraadt@
|
|
"these are fine," mlarkin@
|
|
ever ran on, and it's unlikely to ever be implemented, so remove it.
ok jsg@
|
|
requested by and ok deraadt@
|
|
except alpha. This will put the stack at a random location in the upper
1/4th of the userland virtual address space providing up to 26 additional
bits of randomness in the address. Skip alpha for now since it currently
puts the stack at a (for a 64-bit architecture) very low address. Skip
32-bit architectures for now as well since those have a much smaller
virtual address space and we need more time to figure out what a safe
amount of extra randomization is. These architectures will continue to
use a mildly randomized stack address through the existing stackgap random
mechanism. We will revisit this after 7.3 is released.
This should make it harder for an attacker to find the stack.
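A minimal sketch of the placement, with illustrative names and
constants (not the committed code):

    /* pick a page-aligned stack base in the top quarter of the
     * user VA space; up to 26 bits of page-granular randomness */
    vaddr_t
    stack_base_pick(vaddr_t maxuser)
    {
            vaddr_t quarter = maxuser / 4;
            vaddr_t pages = arc4random() & ((1UL << 26) - 1);

            return (maxuser - quarter) + (pages << PAGE_SHIFT);
    }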
ok deraadt@, miod@
|
|
against classic BROP with a range-checking wrapper in front of copyin() and
copyinstr() which ensures the userland source doesn't overlap the main program
text, ld.so text, signal tramp text (its mapping is hard to distinguish,
so it comes along for the ride), or libc.so text. ld.so tells the kernel
the libc.so text range with msyscall(2). The range checking for 2-4 elements is
done without locking (because all 4 ranges are immutable!) and is inexpensive.
write(sock, &open, 400) now fails with EFAULT. No programs have been
discovered which require reading their own text segments with a system call.
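A sketch of the wrapper's check (illustrative names, not the
committed code):

    /* reject copyin()/copyinstr() sources overlapping immutable
     * text ranges: main program, ld.so (+ sigtramp), libc.so */
    struct text_range { vaddr_t tr_start, tr_end; }; /* end 0 = unused */
    struct text_range nocopyin[4];

    int
    copyin_range_check(const void *uaddr, size_t len)
    {
            vaddr_t s = (vaddr_t)uaddr, e = s + len;
            int i;

            for (i = 0; i < 4; i++) {
                    if (nocopyin[i].tr_end != 0 &&
                        s < nocopyin[i].tr_end && e > nocopyin[i].tr_start)
                            return EFAULT;  /* source overlaps text */
            }
            return 0;
    }

No locks are needed because the ranges, once set, never change.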
On a machine without mmu enforcement, a test program reports the following:
                userland     kernel
ld.so           readable     unreadable
mmap xz         unreadable   unreadable
mmap x          readable     readable
mmap nrx        readable     readable
mmap nwx        readable     readable
mmap xnwx       readable     readable
main            readable     unreadable
libc unmapped?  readable     unreadable
libc mapped     readable     unreadable
ok kettenis, additional help from miod
|
|
Take a simple approach for saving and restoring PKRU if the host
has PKE support enabled. Uses explicit rdpkru/wrpkru instructions
for now instead of xsave.
This functionality is still gated behind amd64 pmap checking for
operation under a hypervisor as well as vmm masking the cpuid bit
for PKU.
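The wrappers look roughly like this (a sketch; the instruction
semantics are architectural, the helper names are not):

    /* rdpkru: ECX must be 0; result in EAX, EDX cleared */
    static inline uint32_t
    rdpkru(void)
    {
            uint32_t eax, edx;
            __asm volatile("rdpkru" : "=a"(eax), "=d"(edx) : "c"(0));
            return eax;
    }

    /* wrpkru: value in EAX, ECX and EDX must be 0 */
    static inline void
    wrpkru(uint32_t pkru)
    {
            __asm volatile("wrpkru" : : "a"(pkru), "c"(0), "d"(0));
    }

Roughly: on VM entry the host value is saved with rdpkru() and the
guest value loaded with wrpkru(); the reverse happens on exit.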
"if your diff is good, then commit it" -deraadt@
|
|
Part of an ongoing effort to move userland-specific information out
of a kernel header and directly into vmd(8). No functional change.
ok mlarkin@
|
|
contain PG_XO, which is PKU key1. On every exit from kernel to userland,
force the PKU register to inhibit data read against key1 memory. On
(some) traps into the kernel if the PKU register is changed, abort the
process (processes have no reason to change the PKU register). This
provides us with viable xonly functionality on most modern intel & AMD
cpus. I started with an xsave-based diff from dv@, but discovered the
fpu save/restore logic wasn't a good fit, so I went to direct register
management.
Disabled on HV (vm) systems until we know they handle PKU correctly.
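The forced value amounts to two bits per key in PKRU (a sketch;
PKRU_XONLY is a hypothetical name):

    /* PKRU: two bits per key, AD = access disable, WD = write disable */
    #define PKRU_AD(key)    (1U << (2 * (key)))
    #define PKRU_WD(key)    (1U << (2 * (key) + 1))

    /* written on every kernel exit: userland may not read (or
     * write) memory tagged with key 1, i.e. PG_XO pages */
    #define PKRU_XONLY      (PKRU_AD(1) | PKRU_WD(1))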
ok kettenis, dv, guenther, etc
|
|
support. The current implementation doesn't handle the transition from
RWX to RW correctly. Also generalize the pmap_write_protect() function
in recognition of the fact that execute permission, write permission,
and in the future read permission on executable pages, are handled by
separate bits.
ok deraadt@, mpi@
|
|
We don't emulate or support most of the EAX=7,ECX=0 feature bits,
so restrict the mask further to just UMIP.
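A sketch of the filtering (assuming the host's SEFF0ECX_UMIP define;
illustrative, not the committed code):

    case 0x07:      /* structured extended features */
            if (subleaf == 0)
                    *ecx &= SEFF0ECX_UMIP;  /* advertise only UMIP */
            break;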
ok deraadt@
|
|
matches tom@'s i386 rev 1.47 change
|
|
This function is only ever called with PROT_NONE or PROT_READ where
PROT_NONE removes the mapping from the page tables and PROT_READ takes
away write permission. Add a KASSERT to make sure no other values are
passed. This KASSERT should be optimized away by any decent compiler.
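The check is a single assertion, roughly:

    /* only these two cases exist; with literal prot arguments a
     * decent compiler folds the check away entirely */
    KASSERT(prot == PROT_NONE || prot == PROT_READ);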
ok deraadt@, mpi@, guenther@
|
|
for execute-only, and the PKU value used by userland to use that key.
|
|
that is compatible with what FreeBSD and NetBSD have. Setting EFI
variables is only allowed at securelevel 0 and below.
Heavily based on work done by Sergii Dmytruk.
ok yasuoka@
|
|
ok deraadt@
|
|
ok deraadt@
|
|
ok deraadt@
|
|
Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid-instruction
exceptions, because VMX requires additional configuration before those
instructions work in a guest.
This also adds WAITPKG to i386 and amd64 cpu feature identification.
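A sketch of the guest-visible masking (assuming a SEFF0ECX_WAITPKG
define for the new bit; not the committed code):

    /* hide WAITPKG so guests never attempt TPAUSE/UMWAIT/UMONITOR */
    *ecx &= ~SEFF0ECX_WAITPKG;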
Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.
OK deraadt@.
|
|
and pass it to the kernel.
ok jca@, patrick@
|
|
When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.
When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)
Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate whether vmm should handle
a page fault or send it to vmd for a memory assist.
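An e820 entry handed to SeaBIOS via fw_cfg follows the classic BIOS
layout, roughly (a sketch; type values per the e820 convention):

    struct e820_entry {
            uint64_t addr;      /* base of the range */
            uint64_t size;      /* length in bytes */
            uint32_t type;      /* 1 = usable RAM, 2 = reserved, ... */
    } __packed;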
Tested by Mischa Peters and others. OK mlarkin@.
|
|
Start eliminating it.
ok mpi@ mlarkin@ krw@
|
|
locore.S to be in C in cpu.c, machdep.c, pmap.c, or bus_space.c for
better typing/debug info. Delete REALBASEMEM, REALEXTMEM, and
biosextmem as unused/ignored.
ok mpi@ krw@ mlarkin@
|
|
The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. The plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.
Only change to vmm(4) is reordering register #define's in vmmvar.h.
ok mlarkin@
|
|
Since vmm doesn't support hot-plug vcpus we can reduce complexity
by treating the vcpu list per vm as immutable after creation.
As a consequence, we can use the vm reference count to protect the
lifetime of the vcpus, removing the need for reference counting
individual vcpu objects. With an immutable list, we no longer need
an rwlock protecting it either.
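The resulting usage pattern, sketched (member names illustrative):

    /* a vm reference alone keeps every vcpu on the immutable list
     * alive; no list lock, no per-vcpu refcount */
    refcnt_take(&vm->vm_refcnt);
    SLIST_FOREACH(vcpu, &vm->vm_vcpu_list, vc_vcpu_link) {
            /* safe to use vcpu here */
    }
    refcnt_rele_wake(&vm->vm_refcnt);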
Original diff from dlg@ that I reworked and tested.
ok dlg@, mlarkin@
|
|
this records which physical cpu a vcpu is running on. this is used
by the code that marks a vcpu as having a pending interrupt to check
if the vcpu is currently running. if it thinks the vcpu is running,
it sends a nop IPI to the physical cpu it is running on to trigger
a vmexit, which in turn runs interrupt handling in the guest.
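roughly, with illustrative member and function names:

    struct cpu_info *ci;

    vcpu->vc_intr = 1;                      /* mark interrupt pending */
    ci = vcpu->vc_curcpu;                   /* physical cpu, if running */
    if (ci != NULL)
            x86_send_ipi(ci, X86_IPI_NOP);  /* force a vmexit */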
ok mlarkin@
|
|
Switch amd64 to the clockintr(9) subsystem. There are lots of little
changes, but the big ones are listed here.
When using the local apic timer:
- Run the timer in one-shot mode (see the sketch after this list).
- lapic_delay() is gone. We can't use it to delay(9) when running
the timer in one-shot mode.
- Add a randomized statclock(); stathz = hz.
- Add support for switching to profhz when profiling is enabled;
profhz = stathz * 10.
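A sketch of what one-shot mode implies (illustrative, not the
committed code): every expiry must program the count to the next
scheduled event, instead of relying on a fixed period.

    void
    lapic_timer_oneshot_rearm(uint32_t cycles)
    {
            /* writing the initial-count register arms one expiry */
            lapic_writereg(LAPIC_ICR_TIMER, cycles);
    }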
When using the i8254/mc146818:
- i8254's clockintr() no longer has a monopoly on hardclock().
- mc146818's rtcintr() no longer has a monopoly on statclock().
- In profiling mode, the statclock() will drift very slightly
because (profhz = 1024) does not divide evenly into one billion.
We could avoid this by setting (profhz = 512) instead and
programming the RTC to run at that rate.
Early revisions reviewed by mlarkin@. Extensively tested by mlarkin@
on a variety of physical and virtual hardware. Additional testing
from dv@ and jmc@.
Link: https://marc.info/?l=openbsd-tech&m=166776339203279&w=2
ok kettenis@ mlarkin@
|
|
Not all of the clocks with a delay(9) implementation necessarily keep
ticking across suspend/resume. We need a clean way to reverse
delay_init() during suspend when those clocks stop ticking.
Hence, delay_fini(). delay_fini() resets delay_func() to
i8254_delay() if the given function pointer is the active delay(9)
implementation.
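Roughly (a sketch of the behavior described above):

    void
    delay_fini(void (*fn)(int))
    {
            if (fn == delay_func)
                    delay_func = i8254_delay;   /* fall back */
    }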
ok mlarkin@
|
|
Compute the TSC frequency on AMD family 17h and 19h CPUs using the
PStateDef MSRs.
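The P0 computation from the AMD PPRs, sketched (field layout per the
documents cited in the code comments; illustrative only):

    uint64_t
    tsc_freq_pstatedef(void)
    {
            uint64_t def = rdmsr(0xc0010064);   /* PStateDef, P-state 0 */
            uint64_t fid = def & 0xff;          /* CpuFid[7:0] */
            uint64_t did = (def >> 8) & 0x3f;   /* CpuDfsId[13:8] */

            if ((def & (1ULL << 63)) == 0 || did == 0)  /* PstateEn */
                    return 0;
            /* CoreCOF = 200 MHz * CpuFid / CpuDfsId */
            return 200000000ULL * fid / did;
    }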
Link 1: https://marc.info/?l=openbsd-tech&m=166394236029484&w=2
Link 2: https://marc.info/?l=openbsd-tech&m=166446065916283&w=2
Test list: https://marc.info/?l=openbsd-tech&m=166646389821326&w=2
Reviewed by kettenis@ using the AMD documents cited in the comments.
Maybe reviewed by mlarkin@? I can't remember. He seemed supportive
of the idea at least.
ok kettenis@
|
|
in the future to implement support for things like EFI variables.
ok krw@ (a few others ok'ed earlier incarnations of this diff)
|
|
tweaks from cheloha@; ok deraadt@, sthen@, cheloha@
|
|
to a separate function that gets called after identifycpu() so that
we have the required information to handle the correct MSRs for each
cpu.
Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and
IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new
function so that they get set again after a suspend/resume cycle as
well, which in turn fixes TSC sync failures.
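A sketch of the new function's shape (illustrative, not the
committed code):

    void
    cpu_fix_msrs(struct cpu_info *ci)
    {
            /* runs after identifycpu(), on boot and on resume */
            if (strcmp(cpu_vendor, "AuthenticAMD") == 0)
                    wrmsr(MSR_DE_CFG,
                        rdmsr(MSR_DE_CFG) | DE_CFG_SERIALIZE_LFENCE);
    }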
discussed with and input from deraadt@, mlarkin@
|
|
Simplify things by sending any io exits from IN/OUT instructions
to userland instead of trying to emulate anything in the kernel.
vmm was sending most pertinent exits to vmd anyways, so this
functionally changes little.
An added benefit is this solves an issue reported by tb@ where i386
OpenBSD guests would probe for a pc keyboard repeatedly and cause
excessive vm exits. (The emulation in vmm was not properly handling
these port reads.)
While here, make the assignment of the VEI_DIR_{IN,OUT} enum values
explicit rather than assuming the integers the compiler would pick.
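I.e., roughly:

    enum vei_dir {
            VEI_DIR_OUT = 0,    /* values now spelled out explicitly */
            VEI_DIR_IN = 1,
    };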
ok mlarkin@
|
|
Provide the basic information required for a userland assist in
emulating instructions touching mmio regions, sending as much
information as is provided by the host hardware.
No decode or assist provided at the moment by vmd(8).
ok mlarkin@
|
|
c99 6.11.5:
"The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature."
ok guenther@
|
|
ok miod@ guenther@
|
|
Because the clock situation on x86 and amd64 is a terminal
clusterfuck, there are many different ways to delay(9). We need a
rudimentary mechanism for gracefully switching to progressively better
delay(9) implementations as they become available during boot without
riddling the code with ifdefs and function pointer comparisons.
This patch adds delay_init() to both amd64 and i386. If the quality
value passed to delay_init() exceeds the quality value of the current
delay_func, delay_init() changes delay_func to the given function
pointer and updates the quality value. Both platforms start with
delay_func set to i8254_delay() and a quality value of zero: all other
delay(9) implementations are preferable.
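The mechanism, sketched from the description above:

    void
    delay_init(void (*fn)(int), int quality)
    {
            /* only ever upgrade to a better implementation */
            if (quality > delay_quality) {
                    delay_func = fn;
                    delay_quality = quality;
            }
    }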
Idea and patch provided by jsg@. With tons of input, research, and
advice from jsg@.
Link: https://marc.info/?l=openbsd-tech&m=166053729104923&w=2
ok mlarkin@ jsg@
|
|
ok daniel@
|
|
Cyrix CPUs don't support amd64. These defines were probably carried
over from i386 accidentally when the amd64 code was first imported.
ok mlarkin@, jsg@
|
|
Computing a per-CPU TSC skew value is error-prone, especially on
multisocket machines and VMs. My best guess is that larger latencies
appear to the current skew measurement test as TSC desync, and so the
TSC is demoted to a kernel timecounter on these machines or marked
non-monotonic.
This patch eliminates per-CPU TSC skew values. Instead of trying to
measure and correct for TSC desync we only try to detect desync, which
is less error-prone. This approach should allow a wider variety of
machines to use the TSC as a timecounter when running OpenBSD.
In the new sync test, both CPUs repeatedly try to detect whether their
TSC is trailing the other CPU's TSC. The upside to this approach is
that it yields no false positives. The downside to this approach is
that it takes more time than the current skew measurement test. Each
test round takes 1ms, and we run up to two rounds per CPU, so this
patch slows boot down by 2ms per AP.
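One direction of the handshake, sketched (illustrative, not the
committed code):

    volatile uint64_t tsc_peer;     /* peer's latest TSC value */
    volatile int tsc_run, tsc_lagged;

    void
    tsc_test_observe(void)
    {
            while (tsc_run) {
                    /* reading behind the peer's published value
                     * means our TSC trails: desync detected */
                    if (rdtsc_lfence() < tsc_peer)
                            tsc_lagged = 1;
            }
    }

The roles are then swapped so each CPU checks against the other.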
If any CPU fails the sync test, the TSC is marked non-monotonic and a
different timecounter is activated. The TC_USER flag remains intact.
There is no middle ground where we fall back to only using the TSC in
the kernel.
Before running the test, we check for the IA32_TSC_ADJUST register and
reset it if necessary. This is a trivial way to work around firmware
bugs that desync the TSC before we reach the kernel. Unfortunately,
at the moment this register appears to only be available on Intel
processors. I cannot find an equivalent but differently-named MSR for
AMD processors.
Because there is no per-CPU skew value, there is also no concept of
TSC drift anymore.
Miscellaneous notes:
- This patch adds a new timecounter utility function, tc_reset_quality().
Used after sync test failure to mark the TSC non-monotonic.
- I have left TSC_DEBUG enabled for now. Unsure if we should leave it
enabled for release or not. If we disable it we no longer run the
sync test after failing it once. Running the test even after failure
provides information about the desync on every CPU.
- Taking 1ms per test round is fairly conservative. We can experiment
with and discuss shorter test rounds. My main goal with a relatively
long test round is ensuring VMs actually run the test. It would be
bad if a hypervisor interrupted the test for so long that it concealed
desync.
- The use of two test rounds is mostly a diagnostic tool: it would be
very strange if a CPU passed the first round but failed the second.
If we ever saw this in the wild it would indicate something odd.
- Most of the desync seen in test reports is on Ryzen CPUs. I
believe, but cannot prove, that this is due to a widespread
firmware bug on AMD motherboards. Hopefully AMD and/or the
downstream vendors fix it.
- Fixing TSC desync by writing the TSC directly with WRMSR is very
difficult. The TSC is a moving target incrementing very quickly and
compensating for WRMSR overhead is non-trivial. We can experiment
with this, but my confidence is low that we can make it work reliably.
Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by
deraadt@ throughout. Reprompted by Yuichiro Naito several times.
With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@.
Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan,
Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@,
Renato Aguiar, and Timo Myyra.
Patch v1: https://marc.info/?l=openbsd-tech&m=164330092208035&w=2
Patch v2: https://marc.info/?l=openbsd-tech&m=164558519712957&w=2
Patch v3: https://marc.info/?l=openbsd-tech&m=165698681018991&w=2
Patch v4: https://marc.info/?l=openbsd-tech&m=165835507113680&w=2
Patch v5: https://marc.info/?l=openbsd-tech&m=165923705118770&w=2
"just commit it" deraadt@
|
|
immutable/atomic/owned a la <sys/proc.h>. Move CPUF_USERSEGS and
CPUF_USERXSTATE, which really are private to the CPU, into a new
ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags
alterations via atomic_{set,clear}bits_int(), so its annotation
isn't a lie. Delete ci_info member as unused all the way from
rev 1.1
ok jsg@ mlarkin@
|
|
suggested by and ok mlarkin@
|
|
Previously for __cpu_simple_lock parts. Now only hppa and m88k use
__cpu_simple_lock (and hppa uses atomic.h for it).
ok miod@ visa@
|
|
Unlocking most of vmm last year at k2k21 exposed bugs related to
lifetime management of vm and vcpu objects.
Add reference counts to make sure we don't attempt to teardown vcpu
or vm related objects while a thread is holding a reference. This
also reduces abuse of rwlocks originally intended to protect the
linked lists, cleaning things up quite a bit. While here, also
document assumptions on how struct members are protected for the
next brave soul to wander in.
ok mlarkin@
|
|
UART found on AMD's Ryzen Embedded V1000 family) as an early console.
This requires additional parameters to be passed by the bootloader to the
kernel, so it changes the struct for the BOOTARG_CONSDEV boot argument.
The old struct will still be supported until OpenBSD 7.3 has been released
such that new kernels boot with the old bootloader.
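The extended payload looks roughly like this (a sketch; member names
illustrative):

    struct bios_consdev {
            dev_t    consdev;   /* existing */
            int      conspeed;  /* existing */
            uint64_t consaddr;  /* new: uart register base */
            int      consfreq;  /* new: uart reference clock, Hz */
    };

Old bootloaders pass the shorter struct, which the kernel can
presumably distinguish by the bootarg's size.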
ok anton@, deraadt@
|