path: root/sys/arch/amd64/include
Age  Commit message  Author
2023-04-25  vmm(4)/vmd(8): pull struct members out of vmm ioctl create struct.  (Dave Voutila)
The object sent to vmm(4) contained file paths and details the kernel does not need for cpu virtualization, as device emulation is in userland. Effectively, "pull up" the struct members from the vm_create_params struct to the parent vmop_create_params struct. This allows us to clean up some of vmd(8) and simplify things for switching to having vmctl(8) open the "kernel" file (SeaBIOS, bsd.rd, etc.) to allow users to boot recovery ramdisk kernels. ok mlarkin@
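A minimal sketch of the shape of this change, with struct and field names invented for illustration (the real vmm(4)/vmd(8) structs carry many more members):

    /*
     * Hypothetical sketch: details the kernel does not need (file
     * paths, boot options) move out of the inner struct passed to
     * the vmm(4) ioctl and live only in the outer struct that
     * vmd(8) keeps.
     */
    #include <limits.h>
    #include <stddef.h>

    struct vm_create_params {               /* sent to the kernel */
            size_t  vcp_ncpus;
            size_t  vcp_nmemranges;
    };

    struct vmop_create_params {             /* userland-only wrapper */
            struct vm_create_params vmc_params;
            char    vmc_kernel[PATH_MAX];   /* SeaBIOS, bsd.rd, ... */
    };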
2023-04-22  Rename the XCR0_* #defines to XFEATURE_* and add the new supervisor-state  (Philip Guenther)
features: while all are appropriate for xsaves/xrstors, the supervisor-state features aren't for xcr0 but rather for the new XSS_MSR, making the current names kinda confusing. Add #defines for masking bits for xcr0 vs XSS. Add and report the new XSAVE_XFD xsave subfeature bit. ok mlarkin@
2023-04-17  Add endbr64 instructions to most of the ENTRY() macros.  (Theo de Raadt)
The IDTVEC() and KIDTVEC() macros also get an endbr64, and therefore we need to change the way that vectors are aliased with a new IDTVEC_ALIAS() macro. with guenther, jsg
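A rough sketch of the aliasing problem and its fix; the macro bodies here are illustrative, not the actual asm.h definitions:

    /*
     * Sketch only.  Each vector entry now begins with an endbr64
     * landing pad, so an alias must be a second symbol bound to the
     * same address (endbr64 included) rather than a separate entry
     * sequence, which would emit its own endbr64 at a new address.
     */
    #define IDTVEC(name)    \
            .text; .globl X ## name; X ## name:; endbr64
    #define IDTVEC_ALIAS(alias, sym)        \
            .globl X ## alias; X ## alias = X ## sym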
2023-04-15  add endbr defines and control protection trap  (Jonathan Gray)
ok deraadt@
2023-04-14  add VMX/VMCS defines for amd64 endbr64 features  (Dave Voutila)
"these are fine," mlarkin@
2023-04-13  pmap_copy() has never, ever, been implemented in any of the platforms OpenBSD  (Miod Vallat)
ever ran on, and it's unlikely to ever be implemented, so remove it. ok jsg@
2023-04-10  spelling  (Jonathan Gray)
2023-03-26  amd64: identify IBT capability in cpu(4) dmesg lines  (Mike Larkin)
requested by and ok deraadt@
2023-03-19  Aggressively randomize the location of the stack on all 64-bit architectures  (Mark Kettenis)
except alpha. This will put the stack at a random location in the upper 1/4th of the userland virtual address space, providing up to 26 additional bits of randomness in the address.

Skip alpha for now since it currently puts the stack at a (for a 64-bit architecture) very low address. Skip 32-bit architectures for now as well since those have a much smaller virtual address space and we need more time to figure out what a safe amount of extra randomization is. These architectures will continue to use a mildly randomized stack address through the existing stackgap random mechanism. We will revisit this after 7.3 is released.

This should make it harder for an attacker to find the stack.

ok deraadt@, miod@
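A loose sketch of the idea under stated assumptions (the helper name and exact entropy handling are illustrative; the real logic lives in the kernel's exec/stack setup path):

    /*
     * Illustrative only: pick a page-aligned stack base at a random
     * offset below the top of the userland address space, which on
     * amd64 keeps it well inside the upper quarter.  26 bits of
     * page-granular randomness, as described above.
     */
    vaddr_t
    random_stack_base(void)
    {
            vaddr_t off = (vaddr_t)arc4random() & ((1UL << 26) - 1);

            return VM_MAXUSER_ADDRESS - (off << PAGE_SHIFT);
    }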
2023-01-31  On systems without xonly mmu hardware-enforcement, we can still mitigate  (Theo de Raadt)
against classic BROP with a range-checking wrapper in front of copyin() and copyinstr() which ensures the userland source doesn't overlap the main program text, ld.so text, signal tramp text (its mapping is hard to distinguish so it comes along for the ride), or libc.so text. ld.so tells the kernel the libc.so text range with msyscall(2). The range checking for 2-4 elements is done without locking (because all 4 ranges are immutable!) and is inexpensive.

write(sock, &open, 400) now fails with EFAULT. No programs have been discovered which require reading their own text segments with a system call.

On a machine without mmu enforcement, a test program reports the following:

                    userland    kernel
    ld.so           readable    unreadable
    mmap xz         unreadable  unreadable
    mmap x          readable    readable
    mmap nrx        readable    readable
    mmap nwx        readable    readable
    mmap xnwx       readable    readable
    main            readable    unreadable
    libc unmapped?  readable    unreadable
    libc mapped     readable    unreadable

ok kettenis, additional help from miod
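A minimal sketch of such a wrapper, assuming the immutable text ranges are recorded per-process at exec/msyscall(2) time (struct and member names invented):

    struct text_range {
            vaddr_t tr_base;
            vaddr_t tr_end;         /* [base, end) of a text segment */
    };

    /*
     * Sketch: reject copyin()/copyinstr() sources that overlap any
     * recorded text range.  No locking needed: the (at most 4)
     * ranges never change after they are established.
     */
    int
    check_copyin(const struct text_range *tr, int ntr,
        const void *uaddr, size_t len)
    {
            vaddr_t start = (vaddr_t)uaddr;
            vaddr_t end = start + len;
            int i;

            for (i = 0; i < ntr; i++)
                    if (start < tr[i].tr_end && end > tr[i].tr_base)
                            return EFAULT;
            return 0;
    }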
2023-01-30  vmm(4): save and restore guest pkru.  (Dave Voutila)
Take a simple approach for saving and restoring PKRU if the host has PKE support enabled. Uses explicit rdpkru/wrpkru instructions for now instead of xsave. This functionality is still gated behind amd64 pmap checking for operation under a hypervisor as well as vmm masking the cpuid bit for PKU. "if your diff is good, then commit it" -deraadt@
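The explicit instructions are small enough to show as the usual inline wrappers (standard encodings; the wrapper names are ours):

    /* RDPKRU: ECX must be 0; PKRU is returned in EAX, EDX is zeroed. */
    static inline uint32_t
    rdpkru(void)
    {
            uint32_t pkru, edx;

            __asm volatile("rdpkru" : "=a" (pkru), "=d" (edx) : "c" (0));
            return pkru;
    }

    /* WRPKRU: writes EAX into PKRU; ECX and EDX must be 0. */
    static inline void
    wrpkru(uint32_t pkru)
    {
            __asm volatile("wrpkru" : : "a" (pkru), "c" (0), "d" (0));
    }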
2023-01-28  Move some header definitions from vmm(4) to vmd(8).  (Dave Voutila)
Part of an ongoing effort to move userland-specific information out of a kernel header and directly into vmd(8). No functional change. ok mlarkin@
2023-01-20  On cpu with the PKU feature, prot=PROT_EXEC pages now create pte which  (Theo de Raadt)
contain PG_XO, which is PKU key1. On every exit from kernel to userland, force the PKU register to inhibit data read against key1 memory. On (some) traps into the kernel, if the PKU register is changed, abort the process (processes have no reason to change the PKU register). This provides us with viable xonly functionality on most modern intel & AMD cpus.

I started with an xsave-based diff from dv@, but discovered the fpu save/restore logic wasn't a good fit and went to direct register management. Disabled on HV (vm) systems until we know they handle PKU correctly.

ok kettenis, dv, guenther, etc
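For reference, PKRU holds two bits per protection key: access-disable (AD) at bit 2k and write-disable (WD) at bit 2k+1, and it never affects instruction fetch. Inhibiting data reads of key-1 memory while leaving execution alone therefore comes down to setting key 1's AD bit on the way out of the kernel (the define name and use of the wrpkru() wrapper sketched above are illustrative):

    #define PKRU_KEY1_AD    (1U << 2)   /* AD bit for key 1 (bit 2*1) */

    /* Sketch: on every kernel->userland transition */
    wrpkru(PKRU_KEY1_AD);   /* data reads of PG_XO pages now fault */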
2023-01-19  Revise implementation of pmap_protect(9) in preparation for execute-only  (Mark Kettenis)
support. The current implementation doesn't handle the transition from RWX to RW correctly. Also generalize the pmap_write_protect() function in recognition of the fact that execute permission, write permission, and in the future read permission on executable pages, are handled by separate bits. ok deraadt@, mpi@
2023-01-19  Restrict vmm(4) exposed cpuid extended feature flags.  (Dave Voutila)
We don't emulate or support most of the EAX=7,ECX=0 feature bits, so restrict the mask further to just UMIP. ok deraadt@
2023-01-18  change BIOSF_SMBIOS bit flag from 6 to 8  (Jonathan Gray)
matches tom@'s i386 rev 1.47 change
2023-01-17  Simplify and clarify the implementation of the pmap_page_protect(9) API.  (Mark Kettenis)
This function is only ever called with PROT_NONE or PROT_READ where PROT_NONE removes the mapping from the page tables and PROT_READ takes away write permission. Add a KASSERT to make sure no other values are passed. This KASSERT should be optimized away by any decent compiler. ok deraadt@, mpi@, guenther@
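A sketch of the shape this gives the function, following the description above:

    void
    pmap_page_protect(struct vm_page *pg, vm_prot_t prot)
    {
            /*
             * Only two cases are ever requested; the commit notes
             * any decent compiler should optimize this away.
             */
            KASSERT(prot == PROT_READ || prot == PROT_NONE);

            if (prot == PROT_READ) {
                    /* strip write permission from all mappings */
            } else {
                    /* PROT_NONE: remove the mappings entirely */
            }
    }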
2023-01-16  3 new defines: the PTE protection key mask, the specific key value we use  (Theo de Raadt)
for execute-only, and the PKU value used by userland to use that key.
2023-01-14  Implement access to EFI variables and ESRT through an ioctl(2) interface  (Mark Kettenis)
that is compatible with what FreeBSD and NetBSD have. Setting EFI variables is only allowed at securelevel 0 and below. Heavily based on work done by Sergii Dmytruk. ok yasuoka@
2023-01-14  add protection-key violation error code for page-fault exceptions  (Jonathan Gray)
ok deraadt@
2023-01-14  recognise protection keys for supervisor-mode (PKS) in cpuid  (Jonathan Gray)
ok deraadt@
2023-01-14  sync cr4 and xcr0 bits with intel dec 2022 sdm  (Jonathan Gray)
ok deraadt@
2023-01-10  Hide WAITPKG cpu feature from vmm(4) guests.  (Dave Voutila)
Alder Lake and similar-era Intel platforms introduced new userland wait instructions. Since vmm was passing this cpuid bit into guests, some would attempt TPAUSE instructions and trigger invalid instruction exceptions because VMX requires additional configuration to support emulation. This also adds WAITPKG to i386 and amd64 cpu feature identification. Input from anton@, cheloha@, and guenther@. Tested by jmatthew@. OK deraadt.
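The fix amounts to masking the bit out of the structured extended feature leaf before it reaches the guest. WAITPKG is CPUID.(EAX=7,ECX=0):ECX bit 5 per the Intel SDM; the define name follows the amd64 feature-naming style but is otherwise an assumption:

    #define SEFF0ECX_WAITPKG    (1U << 5)   /* UMONITOR/UMWAIT/TPAUSE */

    /* Sketch: filter guest-visible cpuid leaf 7 results in vmm(4). */
    void
    vmm_mask_cpuid_7(uint32_t subleaf, uint32_t *ecx)
    {
            if (subleaf == 0)
                    *ecx &= ~SEFF0ECX_WAITPKG;  /* guest must not TPAUSE */
    }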
2023-01-02  Let the EFI bootloader make a copy of the EFI System Resource Table (ESRT)  (Mark Kettenis)
and pass it to the kernel. ok jca@, patrick@
2022-12-26  vmd(8): provide a detailed e820 memory map.  (Dave Voutila)
When booting guests with SeaBIOS, vmd(8) supplied details about the available guest memory via CMOS registers. Consequently, we've been carrying some patches in the ports tree to SeaBIOS to fetch this information like it's the 1990s.

When a vm initializes memory ranges, we now track what each range represents. This information can be used to supply the e820 memory map to SeaBIOS via the fw_cfg interface allowing it to properly communicate memory ranges to a guest operating system. (This will also allow us to drop some patches from the port.)

Given the ranges can now be marked with a purpose, this also allows vmm(4) to switch from hard-coded mmio ranges and instead let the information on the memory range dictate if vmm should be handling a page fault or sending to vmd for a memory assist.

Tested by Mischa Peters and others. OK mlarkin@.
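A compact sketch of what "track what each range represents" might look like (names are assumptions, not the actual vmmvar.h definitions):

    /*
     * Each guest memory range carries a purpose, so vmd(8) can build
     * the e820 map for fw_cfg and vmm(4) can decide whether a fault
     * in the range is a memory assist for vmd or an error.
     */
    enum vm_mem_type {
            VM_MEM_RAM,
            VM_MEM_RESERVED,
            VM_MEM_MMIO,
    };

    struct vm_mem_range {
            paddr_t             vmr_gpa;    /* guest-physical base */
            size_t              vmr_size;
            enum vm_mem_type    vmr_type;
    };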
2022-12-01  _C_LABEL() is no longer useful in the "everything is ELF" world.  (Philip Guenther)
Start eliminating it. ok mpi@ mlarkin@ krw@
2022-11-29  Move the generic variable definitions from the ASM at the top of  (Philip Guenther)
locore.S to be in C in cpu.c, machdep.c, pmap.c, or bus_space.c for better typing/debug info. Delete REALBASEMEM, REALEXTMEM, and biosextmem as unused/ignored. ok mpi@ krw@ mlarkin@
2022-11-10  vmd(8): import mmio decode and emulation, disabled for now.  (Dave Voutila)
The initial mmio support for vmd adds support for only specific MOV and MOVZX instructions. Plan is to begin iterating in-tree on other missing pieces. All functionality is gated behind an #if for now. Only change to vmm(4) is reordering register #define's in vmmvar.h. ok mlarkin@
2022-11-09  vmm(4): treat vcpu lists as immutable, reducing complexity.  (Dave Voutila)
Since vmm doesn't support hot-plug vcpus we can reduce complexity by treating the vcpu list per vm as immutable after creation. As a consequence, we can use the vm reference count to protect the lifetime of the vcpus, removing the need for reference counting individual vcpu objects. With an immutable list, we no longer need a rwlock protecting it either. Original diff from dlg@ that I reworked and tested. ok dlg@, mlarkin@
2022-11-08  further speed up delivery of interrupts to a running vcpu.  (David Gwynne)
this records which physical cpu a vcpu is running on. this is used by the code that marks a vcpu as having a pending interrupt to check if the vcpu is currently running. if it thinks the vcpu is running, it sends a nop IPI to the physical cpu it is running on to trigger a vmexit, which in turn runs interrupt handling in the guest. ok mlarkin@
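A sketch of the mechanism with assumed member and wrapper names (x86_send_ipi() and X86_IPI_NOP are the existing amd64 IPI primitives):

    void
    vcpu_assert_intr(struct vcpu *vcpu)
    {
            struct cpu_info *ci;

            vcpu->vc_intr = 1;              /* mark interrupt pending */

            /*
             * If the vcpu appears to be in guest mode on another
             * physical cpu, a do-nothing IPI is enough to force a
             * vmexit there, which runs interrupt handling in the guest.
             */
            ci = vcpu->vc_curcpu;           /* assumed member */
            if (ci != NULL && ci != curcpu())
                    x86_send_ipi(ci, X86_IPI_NOP);
    }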
2022-11-08  amd64: switch to clockintr(9)  (Scott Soule Cheloha)
Switch amd64 to the clockintr(9) subsystem. There are lots of little changes, but the big ones are listed here.

When using the local apic timer:

- Run the timer in one-shot mode.
- lapic_delay() is gone. We can't use it to delay(9) when running the timer in one-shot mode.
- Add a randomized statclock(); stathz = hz.
- Add support for switching to profhz when profiling is enabled; profhz = stathz * 10.

When using the i8254/mc146818:

- i8254's clockintr() no longer has a monopoly on hardclock().
- mc146818's rtcintr() no longer has a monopoly on statclock().
- In profiling mode, the statclock() will drift very slightly because (profhz = 1024) does not divide evenly into one billion. We could avoid this by setting (profhz = 512) instead and programming the RTC to run at that rate.

Early revisions reviewed by mlarkin@. Extensively tested by mlarkin@ on a variety of physical and virtual hardware. Additional testing from dv@ and jmc@.

Link: https://marc.info/?l=openbsd-tech&m=166776339203279&w=2

ok kettenis@ mlarkin@
2022-11-08  amd64: add delay_fini()  (Scott Soule Cheloha)
Not all of the clocks with a delay(9) implementation necessarily keep ticking across suspend/resume. We need a clean way to reverse delay_init() during suspend when those clocks stop ticking. Hence, delay_fini(). delay_fini() resets delay_func() to i8254_delay() if the given function pointer is the active delay(9) implementation. ok mlarkin@
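A sketch of the described behavior (delay_quality as the companion variable is an assumption; see the delay_init() entry further down):

    void
    delay_fini(void (*fn)(int))
    {
            /* Only undo delay_init() if fn is the active implementation. */
            if (delay_func == fn) {
                    delay_func = i8254_delay;
                    delay_quality = 0;
            }
    }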
2022-10-24  tsc: AMD Family 17h, 19h: compute frequency from Core::X86::Msr:PStateDef  (Scott Soule Cheloha)
Compute the TSC frequency on AMD family 17h and 19h CPUs using the PStateDef MSRs. Link 1: https://marc.info/?l=openbsd-tech&m=166394236029484&w=2 Link 2: https://marc.info/?l=openbsd-tech&m=166446065916283&w=2 Test list: https://marc.info/?l=openbsd-tech&m=166646389821326&w=2 Reviewed by kettenis@ using the AMD documents cited in the comments. Maybe reviewed by mlarkin@? I can't remember. He seemed supportive of the idea at least. ok kettenis@
2022-10-16  Add the guts for EFI runtime services support on amd64. This will be used  (Mark Kettenis)
in the future to implement support for things like EFI variables. ok krw@ (a few others ok'ed earlier incarnations of this diff)
2022-09-22  use the always serializing RDTSCP instruction in tsc and usertc if available  (Robert Nagy)
tweaks from cheloha@; ok deraadt@, sthen@, cheloha@
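The usual inline wrapper looks like this (the name is ours; note that RDTSCP also writes IA32_TSC_AUX into %ecx, hence the clobber):

    static inline uint64_t
    rdtscp(void)
    {
            uint32_t hi, lo;

            /*
             * rdtscp waits until all previous instructions have
             * executed before reading the counter.
             */
            __asm volatile("rdtscp" : "=a" (lo), "=d" (hi) : : "ecx");
            return ((uint64_t)hi << 32) | lo;
    }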
2022-09-20  Split out handling of cpu family specific MSRs from cpu_init_msrs()  (Robert Nagy)
to a separate function that gets called after identifycpu() so that we have the required information to handle the correct MSRs for each cpu. Additionally, move the handling of the DE_CFG_SERIALIZE_LFENCE and IA32_DEBUG_INTERFACE_LOCK MSRs out of identifycpu() to the new function so that they get set again after a suspend/resume cycle as well, which fixes TSC sync failures. discussed with and input from deraadt@, mlarkin@
2022-09-01  vmm(4): send all port io emulation to userland  (Dave Voutila)
Simplify things by sending any io exits from IN/OUT instructions to userland instead of trying to emulate anything in the kernel. vmm was sending most pertinent exits to vmd anyways, so this functionally changes little. An added benefit is this solves an issue reported by tb@ where i386 OpenBSD guests would probe for a pc keyboard repeatedly and cause excessive vm exits. (The emulation in vmm was not properly handling these port reads.) While here, make the assignment of the VEI_DIR_{IN,OUT} enum values not assume the underlying integer the compiler may assign. ok mlarkin@
2022-08-30  Initial support for mmio assist for vmm(4)  (Dave Voutila)
Provide the basic information required for a userland assist in emulating instructions touching mmio regions, sending as much information as is provided by the host hardware. No decode or assist provided at the moment by vmd(8). ok mlarkin@
2022-08-30  Remove long unused WARN_REFERENCES macro; idea guenther@, ok jsg@ jca@  (Miod Vallat)
2022-08-29  static inline, not inline static  (Jonathan Gray)
c99 6.11.5: "The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature." ok guenther@
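That is, of the two spellings only the first is conformant going forward:

    static inline int add_one(int x) { return x + 1; }  /* c99-clean */
    /* inline static int add_one(int x);  -- obsolescent placement */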
2022-08-29  use ansi volatile keyword, not __volatile  (Jonathan Gray)
ok miod@ guenther@
2022-08-25  amd64, i386: add delay_init(): basic delay(9) implementation management  (Scott Soule Cheloha)
Because the clock situation on x86 and amd64 is a terminal clusterfuck, there are many different ways to delay(9). We need a rudimentary mechanism for gracefully switching to progressively better delay(9) implementations as they become available during boot without riddling the code with ifdefs and function pointer comparisons.

This patch adds delay_init() to both amd64 and i386. If the quality value passed to delay_init() exceeds the quality value of the current delay_func, delay_init() changes delay_func to the given function pointer and updates the quality value. Both platforms start with delay_func set to i8254_delay() and a quality value of zero: all other delay(9) implementations are preferable.

Idea and patch provided by jsg@. With tons of input, research, and advice from jsg@.

Link: https://marc.info/?l=openbsd-tech&m=166053729104923&w=2

ok mlarkin@ jsg@
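The mechanism itself fits in a few lines; this sketch follows the description above directly (variable names assumed):

    void (*delay_func)(int) = i8254_delay;  /* boot-time default */
    static int delay_quality = 0;           /* i8254 quality: zero */

    void
    delay_init(void (*fn)(int), int quality)
    {
            /* Adopt fn only if it beats the current implementation. */
            if (quality > delay_quality) {
                    delay_func = fn;
                    delay_quality = quality;
            }
    }

Each clock driver then registers its delay routine as it attaches, e.g. delay_init(tsc_delay, ...), and the best-quality implementation wins regardless of attach order.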
2022-08-22  remove extern for cpu var removed in 2015  (Jonathan Gray)
ok daniel@
2022-08-20  remove Cyrix 486DLC register defines from amd64  (Daniel Dickman)
Cyrix CPUs don't support amd64. These defines were probably carried over from i386 accidentally when the amd64 code was first imported. ok mlarkin@, jsg@
2022-08-12  amd64: simplify TSC synchronization testing  (Scott Soule Cheloha)
Computing a per-CPU TSC skew value is error-prone, especially on multisocket machines and VMs. My best guess is that larger latencies appear to the current skew measurement test as TSC desync, and so the TSC is demoted to a kernel timecounter on these machines or marked non-monotonic.

This patch eliminates per-CPU TSC skew values. Instead of trying to measure and correct for TSC desync we only try to detect desync, which is less error-prone. This approach should allow a wider variety of machines to use the TSC as a timecounter when running OpenBSD.

In the new sync test, both CPUs repeatedly try to detect whether their TSC is trailing the other CPU's TSC. The upside to this approach is that it yields no false positives. The downside to this approach is that it takes more time than the current skew measurement test. Each test round takes 1ms, and we run up to two rounds per CPU, so this patch slows boot down by 2ms per AP.

If any CPU fails the sync test, the TSC is marked non-monotonic and a different timecounter is activated. The TC_USER flag remains intact. There is no middle ground where we fall back to only using the TSC in the kernel.

Before running the test, we check for the IA32_TSC_ADJUST register and reset it if necessary. This is a trivial way to work around firmware bugs that desync the TSC before we reach the kernel. Unfortunately, at the moment this register appears to only be available on Intel processors. I cannot find an equivalent but differently-named MSR for AMD processors.

Because there is no per-CPU skew value, there is also no concept of TSC drift anymore.

Miscellaneous notes:

- This patch adds a new timecounter utility function, tc_reset_quality(). Used after sync test failure to mark the TSC non-monotonic.
- I have left TSC_DEBUG enabled for now. Unsure if we should leave it enabled for release or not. If we disable it we no longer run the sync test after failing it once. Running the test even after failure provides information about the desync on every CPU.
- Taking 1ms per test round is fairly conservative. We can experiment with and discuss shorter test rounds. My main goal with a relatively long test round is ensuring VMs actually run the test. It would be bad if a hypervisor interrupted the test for so long that it concealed desync.
- The use of two test rounds is mostly a diagnostic tool: it would be very strange if a CPU passed the first round but failed the second. If we ever saw this in the wild it would indicate something odd.
- Most of the desync seen in test reports is on Ryzen CPUs. I believe, but cannot prove, that this is due to a widespread firmware bug on AMD motherboards. Hopefully AMD and/or the downstream vendors fix it.
- Fixing TSC desync by writing the TSC directly with WRMSR is very difficult. The TSC is a moving target incrementing very quickly and compensating for WRMSR overhead is non-trivial. We can experiment with this, but my confidence is low that we can make it work reliably.

Prompted by deraadt@ and kettenis@ in 2021. Shepherded along by deraadt@ throughout. Reprompted by Yuichiro Naito several times. With input from Yuichiro Naito, naddy@, sthen@, dv@, and deraadt@. Tested by florian@, gnezdo@, sthen@, Josh Rickmar, dv@, Mohamed Aslan, Hrvoje Popovski, Yuichiro Naito, semarie@, mlarkin@, asou@, jmatthew@, Renato Aguiar, and Timo Myyra.

Patch v1: https://marc.info/?l=openbsd-tech&m=164330092208035&w=2
Patch v2: https://marc.info/?l=openbsd-tech&m=164558519712957&w=2
Patch v3: https://marc.info/?l=openbsd-tech&m=165698681018991&w=2
Patch v4: https://marc.info/?l=openbsd-tech&m=165835507113680&w=2
Patch v5: https://marc.info/?l=openbsd-tech&m=165923705118770&w=2

"just commit it" deraadt@
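The IA32_TSC_ADJUST workaround mentioned above is simple enough to sketch (MSR number per the SDM; the helper name is ours):

    #define MSR_TSC_ADJUST  0x03b   /* IA32_TSC_ADJUST, Intel only */

    /*
     * Zero any firmware-introduced TSC offset before running the
     * sync test; rdmsr()/wrmsr() are the usual amd64 helpers.
     */
    void
    tsc_reset_adjust(void)
    {
            if ((int64_t)rdmsr(MSR_TSC_ADJUST) != 0)
                    wrmsr(MSR_TSC_ADJUST, 0);
    }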
2022-08-07  Start to add annotations to the cpu_info members, doing I/a/o for  (Philip Guenther)
immutable/atomic/owned ala <sys/proc.h>. Move CPUF_USERSEGS and CPUF_USERXSTATE, which really are private to the CPU, into a new ci_pflags and rename s/CPUF_/CPUPF_/. Make all (remaining) ci_flags alterations via atomic_{set,clear}bits_int(), so its annotation isn't a lie. Delete ci_info member as unused all the way from rev 1.1. ok jsg@ mlarkin@
2022-07-12  remove cache parts of struct cpu_info only vmm used  (Jonathan Gray)
suggested by and ok mlarkin@
2022-07-02  remove machine/lock.h where unused  (Jonathan Gray)
Previously for __cpu_simple_lock parts. Now only hppa and m88k use __cpu_simple_lock (and hppa uses atomic.h for it). ok miod@ visa@
2022-06-30  vmm(4): reference count vm's and vcpu's  (Dave Voutila)
Unlocking most of vmm last year at k2k21 exposed bugs related to lifetime management of vm and vcpu objects. Add reference counts to make sure we don't attempt to tear down vcpu or vm related objects while a thread is holding a reference. This also reduces abuse of rwlocks originally intended to protect the linked lists, cleaning things up quite a bit. While here, also document assumptions on how struct members are protected for the next brave soul to wander in. ok mlarkin@
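A sketch of the lifetime rule using the refcnt(9) API (the vmm-side names around it are ours):

    #include <sys/refcnt.h>

    /* Creation: the vcpu list is immutable from this point on. */
    refcnt_init(&vm->vm_refcnt);

    /* Any thread touching the vm or its vcpus: */
    refcnt_take(&vm->vm_refcnt);
    /* ... walk vm->vm_vcpu_list safely, no rwlock needed ... */
    if (refcnt_rele(&vm->vm_refcnt))
            vm_teardown(vm);        /* last reference: safe to free */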
2022-06-29  Add support for using non-standard UARTs (such as the Synopsys DesignWare  (Mark Kettenis)
UART found on AMD's Ryzen Embedded V1000 family) as an early console. This requires additional parameters to be passed by the bootloader to the kernel so it changes the struct for the BOOTARG_CONSDEV boot argument. The old struct will still be supported until OpenBSD 7.3 has been released such that new kernels boot with the old bootloader. ok anton@, deraadt@