path: root/sys/arch/amd64/include
2024-11-08  psp(4) waits for acknowledgement of wbinvd from other CPUs.  (Alexander Bluhm)
If any other CPU has not finished wbinvd, a PSP command may fail. To avoid races, call wbinvd_on_all_cpus_acked(), which waits for acknowledgement from the IPI handler. Provide a stub to build non-MP kernels. from hshoexer@; OK mlarkin@
2024-11-08  remove unused pmap_move()  (Jonathan Gray)
2024-11-08  remove unused VM_MAXUSER_ADDRESS32  (Jonathan Gray)
2024-11-07  Expand amd64 wbinvd_on_all_cpus() with acknowledgement.  (Alexander Bluhm)
Implement wbinvd_on_all_cpus_acked() similar to pmap_tlb_shootpage(). This ensures that wbinvd has been executed on all cores when the function returns. This is needed to avoid psp(4) races. from hshoexer@; OK mlarkin@
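A minimal sketch of what such an acknowledged cross-CPU wbinvd looks like, modeled on the pmap_tlb_shootpage() wait loop the commit cites; apart from wbinvd_on_all_cpus_acked(), wbinvd() and CPU_BUSY_CYCLE(), every name below is an illustrative assumption, not the actual kernel code:

    /* CPUs that still have to run wbinvd (hypothetical counter). */
    static volatile unsigned int wbinvd_pending;

    void
    wbinvd_ipi_handler(void)                 /* assumed IPI handler name */
    {
        wbinvd();                            /* flush this CPU's caches */
        atomic_dec_int(&wbinvd_pending);     /* acknowledge completion */
    }

    void
    wbinvd_on_all_cpus_acked(void)
    {
        wbinvd_pending = ncpus - 1;
        send_ipi_others(IPI_WBINVD);         /* stand-in for the real IPI broadcast */
        wbinvd();                            /* flush the local CPU too */
        while (wbinvd_pending > 0)
            CPU_BUSY_CYCLE();                /* spin until every CPU acks */
    }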
2024-10-22  put opening { on same line as struct name  (Jonathan Gray)
ok claudio@
2024-10-22  remove prototypes with no matching function  (Jonathan Gray)
2024-10-21  remove unused MP_PICMODE define  (Jonathan Gray)
2024-10-07  Remove VMFUNC feature detection and tracking.  (Dave Voutila)
vmm(4) doesn't use the VMX VMFUNC instruction. ok mlarkin@
2024-10-02  Move some PCI MMIO defines from vmm(4) kernel headers to userland.  (Dave Voutila)
vmm(4) doesn't need this information anymore; vmd(8) is its only consumer. ok mlarkin@
2024-09-26  Add an ipi for executing INVEPT to flush EPT on remote cpus.  (Dave Voutila)
Similar to how the fast ipi for tlb flush is implemented, this adds one for calling INVEPT to invalidate EPT caches on the cpu. This is the first step to allowing guest memory to not be wired by UVM and decreases the behavioral differences between Intel and AMD's nested paging in vmm(4) and pmap(9). This change does not hook EPT ptes into the PV list, so the ipi is only used during address space teardown and pte removal. (With the removal of the "mprotect" ioctl, vmm(4) no longer modifies EPT ptes other than inserting them and removing them.) ok mlarkin@
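The shape of such a remote EPT flush, as a hedged sketch; the IPI vector, descriptor field, and helper names are assumptions patterned on the fast TLB-shootdown IPI mentioned above:

    void
    ept_flush_all_cpus(struct vm *vm)
    {
        /* Invalidate this CPU's cached EPT translations for the guest. */
        invept(IA32_VMX_INVEPT_SINGLE_CTX, &vm->vm_invept_desc);

        /* Have every other CPU do the same via the new fast IPI. */
        x86_broadcast_ipi(IPI_INVEPT);       /* hypothetical vector name */
    }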
2024-09-21  vmm(4): remove EPT mprotect ioctl  (Mike Larkin)
This old ioctl isn't used by vmd(8) and is getting in the way of some improvements we want to do. It was used by solo5 but the person who was helping maintain this is no longer involved with that project. ok dv
2024-09-04  make psp attach to ccp as a different driver  (Jonathan Gray)
'fine with me' hshoexer, ok bluhm@
2024-09-01  Pledge "vmm" for ccp(4) ioctl(2).  (Alexander Bluhm)
Limit ccp ioctls to processes that pledge vmm. Specific psp device ioctls for AMD SEV will be allowed for vmd(8). from hshoexer@; input deraadt@ jsg@
2024-08-27  Enable AMD SEV support in vmm(4).  (Alexander Bluhm)
Bring the pieces for vmm(4) to support guests with SEV memory encryption on AMD CPUs. The corresponding vmd(8) changes will follow. Emulate cpuid 0x8000001f so the guest can discover SEV features. Allow vmd(8) to enable SEV on VM creation. Inform vmd(8) about the c-bit position and ASID assigned to each VCPU. Note that vmd(8) has to be rebuilt with the new header files. from hshoexer@; input dv@; OK mlarkin@
2024-08-14  Implement bounce buffering for AMD SEV in amd64 bus dma.  (Alexander Bluhm)
When running as SEV guest, as indicated by variable cpu_sev_guestmode, allocate additional pages for each segment on dma map creation. These pages are mapped with the PMAP_NOCRYPT attribute, i.e. the crypt bit is not set in the PTE. Thus, these pages are shared with the hypervisor. When the map is loaded with actual pages, the address in the descriptor is replaced by the corresponding bounce buffer. Using bus_dmamap_sync(), data is copied from the encrypted pages used by guest drivers to the unencrypted bounce buffers shared with the hypervisor, and vice versa. If the kernel is not running in SEV guest mode, which means as normal host or non-SEV guest, no bounce buffers are used. from hshoexer@; based on ancient code of mickey@; OK kettenis@
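A simplified sketch of the bounce-buffer copy this describes; cpu_sev_guestmode is from the commit, while the struct and its members are illustrative:

    void
    sev_bounce_sync(struct bounce_seg *bs, int op)
    {
        if (!cpu_sev_guestmode)
            return;                          /* host or non-SEV guest: no bouncing */

        if (op & BUS_DMASYNC_PREWRITE)       /* driver -> device */
            memcpy(bs->bs_shared, bs->bs_encrypted, bs->bs_len);
        if (op & BUS_DMASYNC_POSTREAD)       /* device -> driver */
            memcpy(bs->bs_encrypted, bs->bs_shared, bs->bs_len);
    }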
2024-08-04  Add intelpmc(4), a driver for the power management controller found on various Intel SoCs.  (Mark Kettenis)
The driver takes care of calling the AML methods needed to enter low power idle states during suspend-to-idle (S0i). The driver also implements some debug code that prints the residency of various power states in dmesg. Based on some earlier code by jcs@ ok jcs@
2024-07-21  For AMD SEV determine C-bit position and guest mode in locore0.  (Alexander Bluhm)
Actually determine the C-bit position if we are running as a guest with SEV enabled. Configure pg_crypt, pg_frame and pg_lgframe accordingly, using the physical address bit reduction provided by cpuid. from hshoexer@; OK mlarkin@
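The discovery step, sketched in C for clarity (the real code runs in locore0 assembly). The CPUID 0x8000001f layout is AMD's published one: EAX bit 1 = SEV, EBX[5:0] = C-bit position, EBX[11:6] = the physical address bit reduction; the surrounding code is illustrative:

    uint32_t eax, ebx, ecx, edx;

    CPUID(0x8000001f, eax, ebx, ecx, edx);
    if (eax & (1 << 1)) {                    /* running with SEV */
        int cbit = ebx & 0x3f;               /* EBX[5:0]: C-bit position */

        pg_crypt = 1ULL << cbit;
        /* The C-bit is not part of the physical frame address. */
        pg_frame = PG_FRAME & ~pg_crypt;
        pg_lgframe = PG_LGFRAME & ~pg_crypt;
    }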
2024-07-14  Add elf_aux_info(3)  (Jeremie Courreges-Anglas)
Designed to let userland peek at AT_HWCAP and AT_HWCAP2 using an already existing interface coming from FreeBSD. Header bits were snatched from there. Input & ok kettenis@ libc bump and sets sync will follow soon.
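A short userland usage example of the new interface; elf_aux_info(3) returns 0 on success and an errno value otherwise:

    #include <sys/auxv.h>
    #include <stdio.h>

    int
    main(void)
    {
        unsigned long hwcap;

        if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) == 0)
            printf("AT_HWCAP: %#lx\n", hwcap);
        return 0;
    }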
2024-07-14  vmm(4)/vmx: update host cr3, invept on cpu migration.  (Dave Voutila)
Since vmm handles nested page faults in the vcpu run loop, trying to avoid trips back to userland, it's possible for the thread to move host cpus. vmm(4) already updates some local cpu state when this happens, but also needs to update the host cr3 in the vmcs to allow vmx to restore the proper cr3 value on the next vm exit. Additionally, we should be flushing the ept cache on the new cpu. If the single context flush is available, use that instead of the global flush. ok mlarkin@
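A hedged sketch of the migration check in the vcpu run loop; the VMCS field name, flag, and struct members are assumptions, not the exact vmm(4) code:

    if (ci != vcpu->vc_last_pcpu) {          /* moved to another host cpu? */
        /* Let VMX restore the right host cr3 on the next VM exit. */
        vmwrite(VMCS_HOST_IA32_CR3, rcr3());

        if (ept_single_ctx_flush)            /* single-context INVEPT available? */
            invept(IA32_VMX_INVEPT_SINGLE_CTX, &vcpu->vc_invept_desc);
        else
            invept(IA32_VMX_INVEPT_GLOBAL_CTX, NULL);
    }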
2024-07-10  Split vmd into mi/md parts.  (Dave Voutila)
Makes as much of the core of vmd mi as possible, pushing x86-isms into separate compilation units. Adds build logic for arm64, but no emulation yet. (You can build vmd, but it won't have a vmm device to connect to.) Some more cleanup is probably needed around interrupt controller abstraction, but that can come as we implement more than the i8259. ok mlarkin@
2024-07-09  Prepare pmap for using the AMD SEV C-bit to encrypt guest memory.  (Alexander Bluhm)
The C-bit in a page table entry is used by a SEV guest to specify which pages are to be encrypted and which are not. The latter is needed to share pages with the hypervisor for virtio(4). The actual position of the C-bit within a PTE is CPU implementation dependent and needs to be determined dynamically at system boot. The position of the C-bit also determines the actual size of the page frame mask. This will be provided by a separate change. To be able to use the same kernel as both host and guest, the C-bit is provided as a variable similar to the NX-bit. The same holds for the page frame masks. Right now, pg_crypt is set to 0, pg_frame and pg_lgframe to PG_FRAME and PG_LGFRAME respectively. Thus the kernel works as a host system the same as before. Also introduce a PMAP_NOCRYPT flag. A guest will use this with busdma to establish unencrypted mappings that can be shared with the hypervisor. from hshoexer@; OK mlarkin@
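How the C-bit might be applied when a guest pmap enters a mapping, as a sketch; pg_crypt, pg_frame and PMAP_NOCRYPT are from the commit, "prot_bits" and the surrounding logic are illustrative:

    pt_entry_t pte;

    pte = (pa & pg_frame) | prot_bits;       /* frame plus protection bits */
    if ((flags & PMAP_NOCRYPT) == 0)
        pte |= pg_crypt;                     /* encrypted; pg_crypt is 0 on hosts */
    /*
     * PMAP_NOCRYPT mappings keep the C-bit clear, so the page stays
     * unencrypted and can be shared with the hypervisor, e.g. for virtio(4).
     */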
2024-07-09  vmd/vmm: move vm_run_params into mi header.  (Dave Voutila)
To prepare for mi/md splitting vmd, we need to fix up the dev/vmm/vmm.h mi header. Move the vm_run_params struct and clean up the includes in vmd. "sure", mlarkin@
2024-06-24  Show AMD SEV bits during identify CPU in dmesg.  (Alexander Bluhm)
Enable identifycpu() to discover and show AMD SEV related information provided by cpuid. The "crypt bit" for page table entries is stored in amd64_pos_cbit, although it is not used yet. Registers ecx and edx provide the number of guests and the minimum ASID for SEV-only guests. At least the latter value can be configured in the BIOS, so it is useful to have this information in dmesg. Therefore define empty bit masks for printf("%b") to get the raw numbers. from hshoexer@; OK mlarkin@
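For reference, the kernel's %b format prints the value (the leading \20 selects hexadecimal) and then decodes any named bits, so an empty bit list yields just the raw number. A generic example with invented flag names:

    printf("flags %b\n", 0x5, "\20\01ENABLE\02DEBUG\03VERBOSE");
    /* prints: flags 5<ENABLE,VERBOSE> */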
2024-06-09  Add a compiler barrier where missing in CPU_BUSY_CYCLE() implems  (Jeremie Courreges-Anglas)
Having differences between architectures is asking for problems. And adding a barrier here just makes sense in most cases. This is also what cpu_relax() provides in Linux land. ok kettenis@ claudio@
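The amd64 flavor, for illustration: the "memory" clobber is the compiler barrier that keeps spin loops re-reading memory, and PAUSE is the spin-loop hint that cpu_relax() provides on Linux:

    #define CPU_BUSY_CYCLE()    __asm volatile("pause" ::: "memory")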
2024-06-09  remove unused prototypes and pin number defines  (Jonathan Gray)
2024-06-07  Make sure we select the deepest possible C-state during suspend-to-idle.  (Mark Kettenis)
ok deraadt@, guenther@, mlarkin@, jsg@
2024-05-29  Implement the guts for "suspend-to-idle" on amd64.  (Mark Kettenis)
This enables suspend on machines that don't support S3. In its current state it doesn't save a lot of power, but this should improve over time. Implementation of wakeup methods is incomplete, which means that some machines can't resume at the moment. ok mglocker@, mlarkin@, stsp@, deraadt@
2024-05-26  Implement wakeup interrupts on amd64.  (Mark Kettenis)
Provide a dummy implementation for i386 such that we can call the necessary hooks in the suspend/resume code without adding #ifdefs. Tweak the arm64 implementation such that we can call the hooks earlier, as this is necessary to mask MSI and MSI-X interrupts on arm64. ok deraadt@, mlarkin@
2024-05-22  remove prototypes with no matching function and externs with no var  (Jonathan Gray)
2024-05-21  remove switch_exit() prototypes, replaced by sched_exit()  (Jonathan Gray)
2024-05-14  Delete the declaration of cpu_feature, which has been unused since rev 1.17 (2017-5-27) when tlbflushg() stopped using it.  (Philip Guenther)
2024-05-13  remove some unused defines and externs  (Jonathan Gray)
isaphysmem and isaphysmempgs were removed in 1998. ok kettenis@
2024-05-12  Delete the cpu_perf_e[abd]x and cpu_apmi_edx globals and move the cpuid uses into identifycpu(), as they aren't needed anywhere else.  (Philip Guenther)
ok kettenis@
2024-05-11  Use %b to format cpu flag info in dmesg, so we have the raw values too.  (Philip Guenther)
This is also much more space efficient. Reduce the cpu flag noise in dmesg by suppressing lines and registers that are identical with the previous CPU and show -/+ info if there are any differences. particular feedback from deraadt@, kettenis@, jsg@, and dv@ ok deraadt@
2024-05-07  drop the MD byte-swap micro-optimizations on clang architectures  (Christian Weisgerber)
The compiler already translates the generic code into arithmetic byte-swap instructions or byte-swapping memory load and store instructions if available on an architecture. ok deraadt@ guenther@
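The generic form that replaces the MD assembly; modern clang and gcc recognize this pattern and emit a single bswap (or a byte-swapping load/store such as movbe) where the architecture has one:

    static inline uint32_t
    swap32(uint32_t x)
    {
        return (x >> 24) | ((x >> 8) & 0xff00) |
            ((x << 8) & 0xff0000) | (x << 24);
    }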
2024-05-01  Add per-CPU caches to the pmemrange allocator.  (Martin Pieuchot)
The caches are used primarily to reduce contention on uvm_lock_fpageq() during concurrent page faults. For the moment only uvm_pagealloc() tries to get a page from the current CPU's cache. So on some architectures the caches are also used by the pmap layer. Each cache is composed of two magazines; the design is borrowed from Jeff Bonwick's vmem paper and the implementation is similar to dlg@'s pool_cache. However there is no depot layer and magazines are refilled directly by the pmemrange allocator. This version includes splvm()/splx() dances because the buffer cache flips buffers in interrupt context, so we have to prevent recursive accesses to per-CPU magazines. Tested by naddy@, solene@, krw@, robert@, claudio@ and Laurence Tratt. ok claudio@, kettenis@
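A sketch of the two-magazine allocation path described above; the struct and function names are illustrative, not the real uvm code:

    struct pmr_cache {
        struct vm_page  *pc_magz[2][MAGAZINE_SIZE];  /* two magazines */
        int              pc_nitems[2];
    };

    struct vm_page *
    pmr_cache_get(struct pmr_cache *pc)
    {
        struct vm_page *pg = NULL;
        int i, s;

        s = splvm();                         /* the buffer cache may recurse from interrupt */
        for (i = 0; i < 2; i++) {
            if (pc->pc_nitems[i] > 0) {
                pg = pc->pc_magz[i][--pc->pc_nitems[i]];
                break;
            }
        }
        splx(s);
        if (pg == NULL)                      /* no depot layer: refill */
            pg = pmr_cache_refill(pc);       /* hypothetical, hits pmemrange directly */
        return pg;
    }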
2024-04-29  vmm & vmd: drop "continue" flag to simplify running a vcpu.  (Dave Voutila)
There's no need to distinguish the "first" time running a vcpu from the subsequent times because vmm(4) uses in-kernel state tracking the last vm exit reason to optimize the logic for updating vcpu registers from userland. While here, clean up the DPRINTF's to make the Intel VMX logic similar to the AMD SVM. ok mlarkin@
2024-04-19  Revert per-CPU caches; a double-free has been found by naddy@.  (Martin Pieuchot)
2024-04-17  Add per-CPU caches to the pmemrange allocator.  (Martin Pieuchot)
The caches are used primarily to reduce contention on uvm_lock_fpageq() during concurrent page faults. For the moment only uvm_pagealloc() tries to get a page from the current CPU's cache. So on some architectures the caches are also used by the pmap layer. Each cache is composed of two magazines; the design is borrowed from Jeff Bonwick's vmem paper and the implementation is similar to dlg@'s pool_cache. However there is no depot layer and magazines are refilled directly by the pmemrange allocator. Tested by robert@, claudio@ and Laurence Tratt. ok kettenis@
2024-04-14  Implement support for AVX-512.  (Mark Kettenis)
This required some fixes to the so-far unused Skylake AVX-512 MDS handler and increases the ci_mds_tmp array to 64 bytes. With help from guenther@ ok deraadt@, guenther@
2024-04-11  correct value of XFEATURE_AMX  (Jonathan Gray)
ok miod@ guenther@
2024-04-09  vmm/vmd: add exception injection and refactor inject api.  (Dave Voutila)
In order to continue work on mmio and other instruction emulation, vmd(8) needs the ability to inject exceptions (like page faults) from userland. Refactor the way events are injected from userland, cleaning up how hardware (external) interrupts are injected in the process. ok mlarkin@
2024-04-03  Add ci_cpuid_level and ci_vendor holding the per-CPU basic cpuid level and a numeric mapping of the cpu vendor, both from CPUID(0).  (Philip Guenther)
Convert the general use of strcmp(cpu_vendor) to simple numeric tests of ci_vendor. Track the minimum of all ci_cpuid_level in the cpuid_level global and continue to use that for what vmm exposes. AMD testing help matthieu@ krw@ ok miod@ deraadt@ cheloha@
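A sketch of the numeric vendor mapping; the enum constants and hook are assumptions, the strcmp()-to-integer conversion is the commit's point:

    enum cpu_vendor { CPUV_UNKNOWN, CPUV_AMD, CPUV_INTEL, CPUV_VIA };

    /* CPUID(0) returns the 12-byte vendor string in ebx:edx:ecx. */
    if (memcmp(vendor, "AuthenticAMD", 12) == 0)
        ci->ci_vendor = CPUV_AMD;
    else if (memcmp(vendor, "GenuineIntel", 12) == 0)
        ci->ci_vendor = CPUV_INTEL;
    else
        ci->ci_vendor = CPUV_UNKNOWN;

    /* Later checks become cheap integer compares instead of strcmp(). */
    if (ci->ci_vendor == CPUV_AMD)
        amd_setup(ci);                       /* hypothetical vendor hook */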
2024-04-01  Delete 108 lines of ASM from vmx_enter_guest() that predated lots of later enhancements.  (Philip Guenther)
Remove the save/restore of flags, selectors, and MSRs: flags are caller-saved and don't need restoring, while selectors and MSRs are auto-restored. The FSBASE, GSBASE, and KERNELGSBASE MSRs just need the correct values set with vmwrite() in the "on new CPU?" block of vcpu_run_vmx(). Also, only rdmsr(MSR_MISC_ENABLE) once in vcpu_reset_regs_vmx(), give symbolic names to the exit-load MSR slots, eliminate VMX_NUM_MSR_STORE, and #if 0 the vc_vmx_msr_entry_load_{va,pa} code and definitions as unused. ok dv@
2024-03-17  Use VERW to mitigate the RFDS (Register File Data Sampling) vulnerability present in Intel Atom CPUs.  (Philip Guenther)
Reorder some ASM in return-to-userspace and start/resume-vmx-guest to reduce the number of kernel values still live in registers when VERW is used. This mitigation requires updated firmware, with which affected CPUs report RFDS_CLEAR in dmesg. Firmware packaging by jsg@ and sthen@ Logic for interpreting Intel's flags by jsg@ after lots of discussion between him, deraadt@, and me ok deraadt@
2024-02-25  We don't do compat32, so MSR_CSTAR shouldn't be set up.  (Philip Guenther)
Delete the Xsyscall32 stub and UCODE32 selector, set MSR_CSTAR to zero at CPU startup, and rezero it on ACPI resume and VM exit. requested a while ago by deraadt@ AMD VM testing chris@ testing and ok krw@
2024-02-25  clockintr: rename "struct clockintr_queue" to "struct clockqueue"  (Scott Soule Cheloha)
The code has outgrown the original name for this struct. Both the external and internal APIs have used the "clockqueue" namespace for some time when operating on it, and that name is eyeball-consistent with "clockintr" and "clockrequest", so "clockqueue" it is.
2024-02-12  Retpolines are an anti-pattern for IBT, so we need to shift protecting userspace from cross-process BTI to the kernel.  (Philip Guenther)
Have each CPU track the last pmap run on in userspace and the last vmm VCPU in guest-mode, and use the IBPB msr to flush predictors right before running in userspace on a different pmap or entering guest-mode on a different VCPU. Codepatch-nop the userspace bits and conditionalize the vmm bits to keep working if IBPB isn't supported. ok deraadt@ kettenis@
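A sketch of the predictor-flush policy; MSR_PRED_CMD and PRED_CMD_IBPB are the architectural MSR names, while the tracking fields are illustrative:

    /* Returning to userspace: last userspace pmap on this CPU differs? */
    if (ci->ci_last_upmap != pmap) {
        wrmsr(MSR_PRED_CMD, PRED_CMD_IBPB);  /* flush branch predictors */
        ci->ci_last_upmap = pmap;
    }

    /* Entering guest mode: last VCPU on this CPU differs? */
    if (ci->ci_last_vcpu != vcpu) {
        wrmsr(MSR_PRED_CMD, PRED_CMD_IBPB);
        ci->ci_last_vcpu = vcpu;
    }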
2024-02-03  Add new amd64-only sysctl machdep.retpoline which says whether the cpu requires retpoline.  (Theo de Raadt)
If 0, we should do everything in our power to avoid pure retpoline (replacing it with a simple thunk where possible), because by its nature retpoline converts an indirect branch into a direct branch (push to stack & ret), and is therefore an IBT (endbr64) bypass method. This sysctl leverages guenther's decision-making logic in the kernel, which already uses codepatch to fix the kernel retpoline thunk. In my opinion, the retpoline-using logic really should be flipped: ROP execution bypassing IBT to re-enter regular control flow is more dangerous than spectre. ok kettenis
2024-01-31  Swap the r10 and rcx registers in the amd64 trapframe.  (Philip Guenther)
This puts the first six entries in the same order as syscall arguments, so that syscall() can just use the trapframe as the argument vector for mi_syscall() and doesn't need to reorder into another buffer on the stack. This doesn't affect coredump layout or ptrace(2), but does affect kernel crash dumps. Possibility noted during miod@'s cleanup of the MD syscall() implementations. ok mlarkin@ kurt@
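A sketch of the resulting layout. The amd64 syscall convention passes arguments in rdi, rsi, rdx, r10, r8, r9 (r10 stands in for the C ABI's rcx, which the syscall instruction clobbers), so after the swap the first six slots double as the argument vector:

    struct trapframe {
        int64_t tf_rdi;                      /* syscall arg 1 */
        int64_t tf_rsi;                      /* syscall arg 2 */
        int64_t tf_rdx;                      /* syscall arg 3 */
        int64_t tf_r10;                      /* syscall arg 4 (swapped with tf_rcx) */
        int64_t tf_r8;                       /* syscall arg 5 */
        int64_t tf_r9;                       /* syscall arg 6 */
        /* ... rcx and the remaining registers follow unchanged ... */
    };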