Age | Commit message | Author |
|
If any other CPU has not finished wbinvd, the PSP command may fail. To
avoid races, call wbinvd_on_all_cpus_acked(), which waits for
acknowledgement from the IPI handler. Provide a stub so non-MP
kernels still build.
from hshoexer@; OK mlarkin@
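For illustration only (psp_do_cmd() and the softc are placeholders, not the
committed code), the intended call pattern and a plausible non-MP stub look
roughly like this:

    /* flush and wait for every CPU before the firmware touches memory */
    wbinvd_on_all_cpus_acked();
    error = psp_do_cmd(sc, cmd);        /* hypothetical PSP command path */

    /* one possible non-MP stub: a single CPU only needs its own wbinvd */
    #ifndef MULTIPROCESSOR
    #define wbinvd_on_all_cpus_acked()  wbinvd()
    #endif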
|
|
|
|
|
|
Implement wbinvd_on_all_cpus_acked() similar to pmap_tlb_shootpage().
This ensures wbinvd has been executed on all cores when the function
returns. This is needed to avoid psp(4) races.
from hshoexer@; OK mlarkin@
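A rough sketch of that shape, with placeholder names for the IPI dispatch
and the wait counter (the real code follows the pmap_tlb_shootpage()
pattern; only wbinvd(), CPU_BUSY_CYCLE() and atomic_dec_int() are meant as
the actual kernel interfaces):

    volatile unsigned int wbinvd_wait;          /* CPUs still to acknowledge */

    void
    wbinvd_on_all_cpus_acked(void)
    {
            wbinvd_wait = ncpus - 1;
            send_wbinvd_ipi_to_others();        /* placeholder for the broadcast */
            wbinvd();                           /* flush our own caches too */
            while (wbinvd_wait > 0)             /* spin until every handler ran */
                    CPU_BUSY_CYCLE();
    }

    /* IPI handler on each remote CPU */
    void
    wbinvd_ipi_handler(void)
    {
            wbinvd();
            atomic_dec_int(&wbinvd_wait);       /* acknowledge */
    }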
|
|
ok claudio@
|
|
|
|
|
|
vmm(4) doesn't use the VMX VMFUNC instruction.
ok mlarkin@
|
|
vmm(4) doesn't need this information anymore. vmd(8) is the only
consumer of this information.
ok mlarkin@
|
|
Similar to how the fast ipi for tlb flush is implemented, this adds
one for calling INVEPT to invalidate EPT caches on the cpu. This
is the first step to allowing guest memory to not be wired by UVM
and decreases the behavioral differences between Intel and AMD's
nested paging in vmm(4) and pmap(9).
This change does not hook EPT ptes into the PV list, so the ipi is
only used during address space teardown and pte removal. (With the
removal of the "mprotect" ioctl, vmm(4) no longer modifies EPT ptes
other than inserting them and removing them.)
ok mlarkin@
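For illustration, the per-CPU side of such a shootdown might look like the
following; the descriptor layout matches the SDM, but the invept() wrapper
and constant names are only assumptions about vmm(4)'s internals:

    struct invept_desc {
            uint64_t        eptp;               /* EPT pointer of the target pmap */
            uint64_t        reserved;
    };

    /* run on each CPU from the fast IPI handler */
    void
    ept_shoot_handler(struct invept_desc *desc)
    {
            /* drop EPT translations cached for this pmap on this CPU */
            invept(IA32_VMX_INVEPT_SINGLE_CTX, desc);
    }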
|
|
This old ioctl isn't used by vmd(8) and is getting in the way of some
improvements we want to do. It was used by solo5 but the person who was
helping maintain this is no longer involved with that project.
ok dv
|
|
'fine with me' hshoexer, ok bluhm@
|
|
Limit ccp ioctls to processes that pledge vmm. Specific psp device
ioctls for AMD SEV will be allowed for vmd(8).
from hshoexer@; input deraadt@ jsg@
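From the userland side the rule is simple: a process must carry the "vmm"
promise to keep using these ioctls. A purely illustrative example, where
psp_fd is an already open device descriptor and PSP_IOC_GET_PSTATUS is a
hypothetical request name:

    #include <sys/ioctl.h>
    #include <err.h>
    #include <unistd.h>

    void
    query_psp(int psp_fd, void *pst)
    {
            if (pledge("stdio vmm", NULL) == -1)        /* keep the vmm promise */
                    err(1, "pledge");
            if (ioctl(psp_fd, PSP_IOC_GET_PSTATUS, pst) == -1)
                    err(1, "ioctl");
    }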
|
|
Bring the pieces for vmm(4) to support guests with SEV memory
encryption on AMD CPUs. The corresponding vmd(8) changes will
follow.
Emulate cpuid 0x8000001f so the guest can discover SEV features.
Allow vmd(8) to enable SEV on VM creation. Inform vmd(8) about the
c-bit position and ASID assigned to each VCPU.
Note that vmd(8) has to be rebuilt with the new header files.
from hshoexer@; input dv@; OK mlarkin@
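The cpuid part is the easiest to picture. A simplified sketch of the leaf
0x8000001f answer, assuming the usual vmm cpuid-exit switch; the per-vcpu
names are assumptions, while EAX bit 1 (SEV supported) and EBX[5:0] (C-bit
position) follow the architectural definition:

    case 0x8000001f:        /* AMD encrypted memory capabilities */
            if (vcpu_has_sev) {                 /* assumption: per-vcpu SEV flag */
                    *rax = (1UL << 1);          /* SEV supported */
                    *rbx = guest_cbit & 0x3f;   /* C-bit position for the guest */
            } else
                    *rax = *rbx = 0;
            *rcx = 0;                           /* no ASID details exposed */
            *rdx = 0;
            break;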
|
|
When running as SEV guest, as indicated by variable cpu_sev_guestmode,
allocate additional pages for each segment on dma map creation.
These pages are mapped with the PMAP_NOCRYPT attribute, i.e. the
crypt bit is not set in the PTE. Thus, these pages are shared with
the hypervisor.
When the map is loaded with actual pages, the address in the
descriptor is replaced by the corresponding bounce buffer. Using
bus_dmamap_sync(), data is copied from the encrypted pages used by
guest drivers to the unencrypted bounce buffers shared with the
hypervisor, and vice versa.
If the kernel is not running in SEV guest mode, i.e. as a normal
host or a non-SEV guest, no bounce buffers are used.
from hshoexer@; based on ancient code of mickey@; OK kettenis@
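Schematically, the PREWRITE direction of that copy looks like this; only
cpu_sev_guestmode, BUS_DMASYNC_PREWRITE and the dm_segs bookkeeping are
real interfaces, the bounce/orig helpers are made up for the sketch:

    int i;

    if (cpu_sev_guestmode && (op & BUS_DMASYNC_PREWRITE)) {
            /* copy driver data into the unencrypted bounce page the
             * device (i.e. the hypervisor) will actually read */
            for (i = 0; i < map->dm_nsegs; i++)
                    memcpy(bounce_va(map, i), orig_va(map, i),
                        map->dm_segs[i].ds_len);
    }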
|
|
various Intel SoCs. The driver takes care of calling the AML methods
needed to enter low power idle states during suspend-to-idle (S0i).
The driver also implements some debug code that prints the residency of
various power states in dmesg. Based on some earlier code by jcs@
ok jcs@
|
|
Actually determine the C-bit position if we are running as a guest
with SEV enabled. Configure pg_crypt, pg_frame and pg_lgframe
accordingly, using the physical address bit reduction provided by
cpuid.
from hshoexer@; OK mlarkin@
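A simplified picture of that boot-time computation: EBX[5:0] of leaf
0x8000001f is the C-bit position, EBX[11:6] the physical address bit
reduction. The CPUID() helper and the exact mask arithmetic stand in for
the real MD code:

    uint32_t eax, ebx, ecx, edx;
    int cbit;

    CPUID(0x8000001f, eax, ebx, ecx, edx);
    cbit = ebx & 0x3f;                  /* C-bit position in the PTE */
    pg_crypt = 1ULL << cbit;
    /* drop the C-bit from the frame masks (the address-bit reduction
     * is ignored in this sketch) */
    pg_frame = PG_FRAME & ~pg_crypt;
    pg_lgframe = PG_LGFRAME & ~pg_crypt;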
|
|
Designed to let userland peek at AT_HWCAP and AT_HWCAP2 using an already
existing interface coming from FreeBSD. Header bits were snatched from
there. Input & ok kettenis@
libc bump and sets sync will follow soon
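Usage mirrors the FreeBSD interface the commit refers to, roughly (assuming
the FreeBSD-style <sys/auxv.h> header):

    #include <sys/auxv.h>
    #include <stdio.h>

    int
    main(void)
    {
            unsigned long hwcap = 0;

            /* elf_aux_info(3) returns 0 on success */
            if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) == 0)
                    printf("AT_HWCAP = %#lx\n", hwcap);
            return 0;
    }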
|
|
Since vmm handles nested page faults in the vcpu run loop, trying
to avoid trips back to userland, it's possible for the thread to
move host cpus. vmm(4) already updates some local cpu state when
this happens, but also needs to update the host cr3 in the vmcs to
allow vmx to restore the proper cr3 value on the next vm exit.
Additionally, we should be flushing the ept cache on the new cpu.
If the single context flush is available, use that instead of the
global flush.
ok mlarkin@
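A sketch of the "moved to a new cpu" path in the run loop; the field and
constant names follow vmm(4)'s conventions but should be read as
approximations, and "vid" stands for a descriptor holding this vcpu's EPTP:

    if (ci != vcpu->vc_last_pcpu) {
            /* let the next VM exit restore this CPU's kernel cr3 */
            if (vmwrite(VMCS_HOST_IA32_CR3, rcr3()))
                    return (EINVAL);

            /* flush EPT translations cached on this CPU */
            if (single_context_invept_supported)        /* assumed capability flag */
                    invept(IA32_VMX_INVEPT_SINGLE_CTX, &vid);
            else
                    invept(IA32_VMX_INVEPT_GLOBAL_CTX, &vid);
    }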
|
|
Makes as much of the core of vmd mi as possible, pushing x86-isms into separate
compilation units. Adds build logic for arm64, but no emulation
yet. (You can build vmd, but it won't have a vmm device to connect
to.)
Some more cleanup probably needed around interrupt controller
abstraction, but that can come as we implement more than the i8259.
ok mlarkin@
|
|
The C-bit in a page table entry is used by an SEV guest to specify
which pages are to be encrypted and which are not. The latter is needed
to share pages with the hypervisor for virtio(4).
The actual position of the C-bit within a PTE is CPU implementation
dependent and needs to be determined dynamically at system boot.
The position of the C-bit also determines the actual size of the page
frame mask. This will be provided by a separate change.
To be able to use the same kernel as both host and guest, the C-bit
is provided as variable similar to the NX-bit. Same holds for the
page frame masks.
Right now, pg_crypt is set to 0, and pg_frame and pg_lgframe to PG_FRAME
and PG_LGFRAME respectively. Thus the kernel works as a host system
the same as before.
Also introduce a PMAP_NOCRYPT flag. A guest will use this with
busdma to establish unencrypted mappings that can be shared with
the hypervisor.
from hshoexer@; OK mlarkin@
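Roughly how the bit ends up in a PTE (a sketch, not the committed pmap
code; protection bits are reduced to PG_V | PG_RW for brevity):

    pte = (pa & pg_frame) | PG_V | PG_RW;
    if ((flags & PMAP_NOCRYPT) == 0)
            pte |= pg_crypt;    /* 0 on a host or non-SEV guest, so a no-op there */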
|
|
To prepare for mi/md splitting vmd, need to fixup the dev/vmm/vmm.h
mi header. Move the vm_run_params struct and clean up the includes
in vmd.
"sure", mlarkin@
|
|
Enable identifycpu() to discover and show AMD SEV related information
provided by cpuid.
The "crypt bit" for page table entries is stored in amd64_pos_cbit,
although it is not used yet.
Registers ecx and edx provide the number of guests and the minimum ASID
for SEV-only guests. At least the latter value can be configured
in the BIOS, so it is useful to have this information in dmesg.
Therefore define empty bit masks for printf("%b") to get the raw
numbers.
from hshoexer@; OK mlarkin@
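For context, the kernel printf(9) "%b" conversion takes a value plus a
bit-name string whose first byte selects the output base; with no bit names
it simply prints the raw number. A sketch of the idea, not the committed
lines:

    /* "\12" = base 10 and no named bits: prints just the raw value */
    printf(", %b SEV guests, minimum SEV-only ASID %b",
        ecx, "\12", edx, "\12");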
|
|
Having differences between architectures is asking for problems. And
adding a barrier here just makes sense in most cases. This is also what
cpu_relax() provides in Linux land.
ok kettenis@ claudio@
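Roughly what the amd64 version looks like with the barrier (other
architectures use their own hint instruction, or just the barrier), plus a
typical spin loop; the "memory" clobber is what forces the compiler to
reload the tested value on every iteration:

    #define CPU_BUSY_CYCLE()    __asm volatile("pause": : : "memory")

    volatile int ready;

    void
    wait_for_ready(void)
    {
            while (ready == 0)
                    CPU_BUSY_CYCLE();
    }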
|
|
|
|
ok deraadt@, guenther@, mlarkin@, jsg@
|
|
on machines that don't support S3. In its current state it doesn't save
a lot of power, but this should improve over time. Implementation of
wakeup methods is incomplete, which means that some machines can't resume
at the moment.
ok mglocker@, mlarkin@, stsp@, deraadt@
|
|
i386 such that we can call the necessary hooks in the suspend/resume code
without adding #ifdefs. Tweak the arm64 implementation such that we can
call the hooks earlier as this is necessary to mask MSI and MSI-X
interrupts on arm64.
ok deraadt@, mlarkin@
|
|
|
|
|
|
rev 1.17 (2017-5-27) when tlbflushg() stopped using it
|
|
isaphysmem and isaphysmempgs were removed in 1998
ok kettenis@
|
|
cpuid uses into identifycpu(), as they aren't needed anywhere else.
ok kettenis@
|
|
too. This is also much more space efficient.
Reduce the cpu flag noise in dmesg by suppressing lines and registers
that are identical with the previous CPU and show -/+ info if there
are any differences.
particular feedback from deraadt@, kettenis@, jsg@, and dv@
ok deraadt@
|
|
The compiler already translates the generic code into arithmetic
byte-swap instructions or byte-swapping memory load and store
instructions if available on an architecture.
ok deraadt@ guenther@
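As an illustration of the pattern the commit relies on, a portable swap
like this compiles down to a single bswap (or a byte-swapping load/store)
on architectures that have one, so a hand-written MD version buys nothing:

    #include <stdint.h>

    static inline uint32_t
    swap32_generic(uint32_t x)
    {
            return (x >> 24) | ((x >> 8) & 0xff00) |
                ((x << 8) & 0xff0000) | (x << 24);
    }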
|
|
The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.
Each cache is composed of two magazines. The design is borrowed from Jeff
Bonwick's vmem paper and the implementation is similar to dlg@'s pool_cache.
However there is no depot layer and magazines are refilled directly by
the pmemrange allocator.
This version includes splvm()/splx() dances because the buffer cache flips
buffers in interrupt context, so we have to prevent recursive accesses to
the per-CPU magazines.
Tested by naddy@, solene@, krw@, robert@, claudio@ and Laurence Tratt.
ok claudio@, kettenis@
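An illustrative shape of the allocation fast path (the identifiers are not
the real uvm ones, and the magazine size is arbitrary here); only
splvm()/splx() are meant as the actual kernel interfaces:

    struct cpu_pcache_mag {
            struct vm_page  *pages[16];
            int              nitems;
    };
    struct cpu_pcache {
            struct cpu_pcache_mag   mag[2];     /* two magazines per CPU */
    };

    struct vm_page *
    pcache_get(struct cpu_pcache *pc)
    {
            struct vm_page *pg = NULL;
            int i, s;

            s = splvm();        /* the buffer cache may re-enter from interrupt */
            for (i = 0; i < 2; i++)
                    if (pc->mag[i].nitems > 0) {
                            pg = pc->mag[i].pages[--pc->mag[i].nitems];
                            break;
                    }
            splx(s);
            return pg;          /* NULL means: refill from the pmemrange allocator */
    }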
|
|
There's no need to distinguish the "first" time running a vcpu from
the subsequent times because vmm(4) uses in-kernel state tracking
the last vm exit reason to optimize the logic for updating vcpu
registers from userland. While here, clean up the DPRINTF's to make
the Intel VMX logic similar to the AMD SVM.
ok mlarkin@
|
|
|
|
The caches are used primarily to reduce contention on uvm_lock_fpageq() during
concurrent page faults. For the moment only uvm_pagealloc() tries to get a
page from the current CPU's cache. So on some architectures the caches are
also used by the pmap layer.
Each cache is composed of two magazines. The design is borrowed from Jeff
Bonwick's vmem paper and the implementation is similar to dlg@'s pool_cache.
However there is no depot layer and magazines are refilled directly by
the pmemrange allocator.
Tested by robert@, claudio@ and Laurence Tratt.
ok kettenis@
|
|
unused Skylake AVX-512 MDS handler and increases the ci_mds_tmp array to
64 bytes. With help from guenther@
ok deraadt@, guenther@
|
|
ok miod@ guenther@
|
|
In order to continue work on mmio and other instruction emulation,
vmd(8) needs the ability to inject exceptions (like page faults)
from userland.
Refactor the way events are injected from userland, cleaning up how
hardware (external) interrupts are injected in the process.
ok mlarkin@
|
|
level and a numeric mapping of the cpu vendor, both from CPUID(0).
Convert the general use of strcmp(cpu_vendor) to simple numeric
tests of ci_vendor. Track the minimum of all ci_cpuid_level in the
cpuid_level global and continue to use that for what vmm exposes.
AMD testing help matthieu@ krw@
ok miod@ deraadt@ cheloha@
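The effect on callers, schematically (the CPUV_AMD spelling and the callee
are illustrative, not necessarily the committed names):

    /* before: a string compare wherever the vendor matters */
    if (strcmp(cpu_vendor, "AuthenticAMD") == 0)
            setup_amd_quirks();

    /* after: one numeric test on the per-CPU field */
    if (ci->ci_vendor == CPUV_AMD)
            setup_amd_quirks();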
|
|
of later enhancements, removing the save/restore of flags, selectors,
and MSRs: flags are caller-saved and don't need restoring while
selectors and MSRs are auto-restored. The FSBASE, GSBASE, and
KERNELGSBASE MSRs just need the correct values set with vmwrite()
in the "on new CPU?" block of vcpu_run_vmx().
Also, only rdmsr(MSR_MISC_ENABLE) once in vcpu_reset_regs_vmx(),
give symbolic names to the exit-load MSR slots, eliminate
VMX_NUM_MSR_STORE, and #if 0 the vc_vmx_msr_entry_load_{va,pa} code
and definitions as unused.
ok dv@
|
|
present in Intel Atom CPUs, reordering some ASM in return-to-userspace and
start/resume-vmx-guest to reduce the number of kernel values still live in
registers when VERW is used. This mitigation requires updated firmware,
with which affected CPUs report RFDS_CLEAR in dmesg.
Firmware packaging by jsg@ and sthen@
Logic for interpreting Intel's flags by jsg@ after lots of discussion
between him, deraadt@, and me
ok deraadt@
|
|
Xsyscall32 stub and UCODE32 selector, set MSR_CSTAR to zero at CPU
startup, and rezero on ACPI resume and VM exit.
requested a while ago by deraadt@
AMD VM testing chris@
testing and ok krw@
|
|
The code has outgrown the original name for this struct. Both the
external and internal APIs have used the "clockqueue" namespace for
some time when operating on it, and that name is eyeball-consistent
with "clockintr" and "clockrequest", so "clockqueue" it is.
|
|
userspace from cross-process BTI to the kernel. Have each CPU track
the last pmap run on in userspace and the last vmm VCPU in guest-mode
and use the IBPB msr to flush predictors right before running in
userspace on a different pmap or entering guest-mode on a different
VCPU. Codepatch-nop the userspace bits and conditionalize the vmm
bits to keep working if IBPB isn't supported.
ok deraadt@ kettenis@
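A sketch of the check before returning to userspace (the guest-mode side is
analogous); MSR_PRED_CMD and PRED_CMD_IBPB are the architectural names,
while the ci_* field is illustrative, and the whole block is the part that
gets codepatch-nop'd when IBPB isn't supported:

    if (ci->ci_last_upmap != pm) {
            wrmsr(MSR_PRED_CMD, PRED_CMD_IBPB);     /* flush branch predictors */
            ci->ci_last_upmap = pm;
    }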
|
|
requires retpoline. If 0, we should do everything in our power to avoid
pure retpoline (replacing it with a simple thunk where possible), because
by its nature retpoline converts an indirect branch into a direct branch
(push to stack & ret), and therefore it is an IBT (endbr64) bypass method.
This sysctl leverages guenther's decision-making logic in the kernel, which
already uses codepatch to fix the kernel retpoline thunk.
In my opinion, the retpoline-using logic really should be flipped; ROP
execution bypassing IBT to re-enter regular control flow is more dangerous
than spectre.
ok kettenis
|
|
first six entries are in the same order as syscall arguments, such
that syscall() can just use the trapframe as the argument vector
for mi_syscall() and not need to reorder into another buffer on the
stack. This doesn't affect coredump layout or ptrace(2), but does
affect kernel crash dumps.
Possibility noted during miod@'s cleanup of the MD syscall()
implementations
ok mlarkin@ kurt@
|