path: root/sys/arch/amd64
Age | Author | Commit message
2023-01-20 | Theo de Raadt | On cpu with the PKU feature, prot=PROT_EXEC pages now create pte which
contain PG_XO, which is PKU key1. On every exit from kernel to userland, force the PKU register to inhibit data read against key1 memory. On (some) traps into the kernel if the PKU register is changed, abort the process (processes have no reason to change the PKU register). This provides us with viable xonly functionality on most modern intel & AMD cpus. I started with a xsave-based diff from dv@, but discovered the fpu save/restore logic wasn't a good fit and went to direct register management. Disabled on HV (vm) systems until we know they handle PKU correctly. ok kettenis, dv, guenther, etc
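As a rough illustration of the mechanism this commit describes, the sketch below tags a PTE with a protection key and builds a PKRU value that blocks data reads for that key. The bit positions (key in PTE bits 62:59, two PKRU bits per key with the lower one disabling access) follow the Intel SDM. The macro and function names are made up for this example and are not the identifiers used in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit layout is taken from the Intel SDM. */
#define PTE_PKEY_SHIFT	59
#define PTE_PKEY_MASK	(0xfULL << PTE_PKEY_SHIFT)	/* PTE bits 62:59 */
#define XO_PKEY		1	/* the log says execute-only uses key1 */

/* Store a 4-bit protection key in a PTE, leaving other bits alone. */
static inline uint64_t
pte_set_pkey(uint64_t pte, unsigned int key)
{
	return (pte & ~PTE_PKEY_MASK) |
	    ((uint64_t)(key & 0xf) << PTE_PKEY_SHIFT);
}

/* Extract the protection key from a PTE. */
static inline unsigned int
pte_get_pkey(uint64_t pte)
{
	return (unsigned int)((pte & PTE_PKEY_MASK) >> PTE_PKEY_SHIFT);
}

/* PKRU bit 2*key is "access disable" (AD) for that key. */
static inline uint32_t
pkru_access_disable(unsigned int key)
{
	return 1u << (2 * key);
}

/* PKRU bit 2*key+1 is "write disable" (WD) for that key. */
static inline uint32_t
pkru_write_disable(unsigned int key)
{
	return 1u << (2 * key + 1);
}

/*
 * Model of the hardware check for a user data read.  Instruction
 * fetches are not subject to PKRU, which is what makes a page with
 * an access-disabled key execute-only.
 */
static inline bool
pkru_data_read_allowed(uint32_t pkru, uint64_t pte)
{
	return (pkru & pkru_access_disable(pte_get_pkey(pte))) == 0;
}

/* Model of the "abort if userland changed PKRU" test on kernel entry. */
static inline bool
pkru_was_changed(uint32_t observed, uint32_t expected)
{
	return observed != expected;
}
```

With `pkru_access_disable(XO_PKEY)` loaded, a data read through a PTE tagged with key 1 is refused while instruction fetch from the same page still works. The actual PKRU value the kernel forces may set more bits than this sketch does.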
2023-01-19 | Mark Kettenis | Revise implementation of pmap_protect(9) in preparation for execute-only
support. The current implementation doesn't handle the transition from RWX to RW correctly. Also generalize the pmap_write_protect() function in recognition of the fact that execute permission, write permission, and in the future read permission on executable pages, are handled by separate bits. ok deraadt@, mpi@
2023-01-19 | Dave Voutila | Restrict vmm(4) exposed cpuid extended feature flags.
We don't emulate or support most of the EAX=7,ECX=0 feature bits, so restrict the mask further to just UMIP. ok deraadt@
2023-01-18 | Jonathan Gray | change BIOSF_SMBIOS bit flag from 6 to 8
matches tom@'s i386 rev 1.47 change
2023-01-17 | Mark Kettenis | Simplify and clarify the implementation of the pmap_page_protect(9) API.
This function is only ever called with PROT_NONE or PROT_READ where PROT_NONE removes the mapping from the page tables and PROT_READ takes away write permission. Add a KASSERT to make sure no other values are passed. This KASSERT should be optimized away by any decent compiler. ok deraadt@, mpi@, guenther@
2023-01-17 | Mark Kettenis | On amd64 machines without the NX feature enabled, we can't distinguish
between page faults as a result of instruction fetches or normal data access. Handle this in the same way as we do on landisk: if handling the fault with access type PROT_READ fails, retry with PROT_EXEC. Fortunately we know whether NX is enabled or not so only do this when it isn't. Nobody should be running an amd64 machine without NX! ok deraadt@, miod@
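The retry described in this commit can be sketched as follows. This is a minimal model, not the kernel's trap code: `fault_fn`, `fault_with_retry()` and `demo_exec_only_fault()` are hypothetical names, and the stub stands in for whatever fault handler the kernel really calls.

```c
#include <errno.h>
#include <sys/mman.h>	/* PROT_READ, PROT_EXEC */

/* Hypothetical fault-handler signature: returns 0 on success. */
typedef int (*fault_fn)(unsigned long va, int access);

/*
 * Without NX the hardware cannot report that a fault came from an
 * instruction fetch, so it arrives looking like a read.  If the read
 * attempt fails and NX is off, try once more as an execute fault.
 */
static int
fault_with_retry(fault_fn fault, unsigned long va, int access,
    int nx_enabled)
{
	int error = fault(va, access);

	if (error != 0 && !nx_enabled && access == PROT_READ)
		error = fault(va, PROT_EXEC);
	return error;
}

/* Stub for illustration: a mapping that only permits execution. */
static int
demo_exec_only_fault(unsigned long va, int access)
{
	(void)va;
	return access == PROT_EXEC ? 0 : EFAULT;
}
```

With NX enabled the first error is returned as is, because the hardware already told the kernel which kind of access faulted.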
2023-01-16 | Theo de Raadt | we spent far too long debugging a weird go library problem (incorrect
arguments to mmap) because it was using syscall(2) and that callpath is invisible in ktrace. make it visible, it will now show "(via syscall)" and such. ok guenther
2023-01-16 | Theo de Raadt | export PGK_VALUE so that .S files can use it
2023-01-16 | Theo de Raadt | 3 new defines: the PTE protection key mask, the specific key value we use
for execute-only, and the PKU value used by userland to use that key.
2023-01-14 | Mark Kettenis | Implement access to EFI variables and ESRT through an ioctl(2) interface
that is compatible with what FreeBSD and NetBSD have. Setting EFI variables is only allowed at securelevel 0 and below. Heavily based on work done by Sergii Dmytruk. ok yasuoka@
2023-01-14 | Jonathan Gray | add protection-key violation error code for page-fault exceptions
ok deraadt@
2023-01-14 | Jonathan Gray | recognise protection keys for supervisor-mode (PKS) in cpuid
ok deraadt@
2023-01-14 | Jonathan Gray | sync cr4 and xcr0 bits with intel dec 2022 sdm
ok deraadt@
2023-01-13 | Dave Voutila | Retake kernel lock in error paths of vmmioctl.
From Christian Ludwig.
2023-01-10 | Dave Voutila | Hide WAITPKG cpu feature from vmm(4) guests.
Alder Lake and similar-era Intel platforms introduced new userland wait instructions. Since vmm was passing this cpuid bit into guests, some would attempt TPAUSE instructions and trigger invalid instruction exceptions because VMX requires additional configuration to support emulation. This also adds WAITPKG to i386 and amd64 cpu feature identification. Input from anton@, cheloha@, and guenther@. Tested by jmatthew@. OK deraadt.
2023-01-06 | Miod Vallat | Remove copystr(9), unless used internally by copy{in,out}str.
2023-01-02 | Mark Kettenis | Let the EFI bootloader make a copy of the EFI System Resource Table (ESRT)
and pass it to the kernel. ok jca@, patrick@
2023-01-01 | Jonathan Gray | update drm to linux 6.1.2
new hardware support includes AMD:
- Raphael, Ryzen 7000 desktop, gfx1036/GC 10.3.6
- Mendocino, Ryzen & Athlon 7020 Series mobile APU, gfx1037/GC 10.3.7
- Navi 31, gfx1100 dGPU, GC 11.0.0, Radeon RX 7900 XT/XTX
- gfx1101 dGPU
- gfx1102 dGPU
- gfx1103 APU
Thanks to the OpenBSD Foundation for sponsoring this work.
2022-12-30 | Jeremie Courreges-Anglas | Actually hide the clang-15 workaround behind the COMPILER_VERSION check
COMPILER_VERSION initially missed. I'm not sure why we still have those COMPILER_VERSION checks in sys/arch/i386 and sys/arch/amd64, when the base system doesn't ship gcc any more, but let's stay consistent.
2022-12-30 | Jeremie Courreges-Anglas | Neuter zlib fatal warnings when building kernels and bootloaders with clang 15
Disable -Wdeprecated-non-prototype instead of patching zlib. Upstream plans to drop the pre-ANSI syntax soon. ok tb@ millert@
2022-12-27 | Jeremie Courreges-Anglas | Ansify pxe_netif_close() and {,pxe}socktodesc()
To appease the clang 15 warning -Wdeprecated-non-prototype (turned on by -Wall). ok millert@
2022-12-26 | Dave Voutila | vmd(8): provide a detailed e820 memory map.
When booting guests with SeaBIOS, vmd(8) supplied details about the available guest memory via CMOS registers. Consequently, we've been carrying some patches in the ports tree to SeaBIOS to fetch this information like it's the 1990s. When a vm initializes memory ranges, we now track what each range represents. This information can be used to supply the e820 memory map to SeaBIOS via the fw_cfg interface allowing it to properly communicate memory ranges to a guest operating system. (This will also allow us to drop some patches from the port.) Given the ranges can now be marked with a purpose, this also allows vmm(4) to switch from hard-coded mmio ranges and instead let the information on the memory range dictate if vmm should be handling a page fault or sending to vmd for a memory assist. Tested by Mischa Peters and others. OK mlarkin@.
2022-12-19 | David Gwynne | rework the synchronisation around suspend/resume.
the idea is that access to vmm from userland is gated by the vmm_softc sc_status field, and then accounted for by the refcnt. you take a read lock to check the gate, and if it is open then you can take a reference and do your thing. once you've finished the work then you rele the refcnt on the way out of the ioctl handler. the suspend code takes a write lock to close the sc_status gate, and then uses refcnt_finalise to wait for things in the ioctl handler to get out. on resume, the code takes the write lock, sets the refcnt up again for userland to use, and then opens the gate. tested by and ok dv@
2022-12-08 | Philip Guenther | _C_LABEL() and _ASM_LABEL() are no longer useful in the "everything
is ELF" world. Eliminate use of them in amd64, arm64, armv7, i386, macppc, mips64, and sparc64 code. ok deraadt@ jca@ krw@
2022-12-01 | Philip Guenther | _C_LABEL() is no longer useful in the "everything is ELF" world.
Start eliminating it. ok mpi@ mlarkin@ krw@
2022-11-29 | Philip Guenther | Move the generic variable definitions from the ASM at the top of
locore.S to be in C in cpu.c, machdep.c, pmap.c, or bus_space.c for better typing/debug info. Delete REALBASEMEM, REALEXTMEM, and biosextmem as unused/ignored. ok mpi@ krw@ mlarkin@
2022-11-29 | Philip Guenther | Put the original image of the MP-startup and ACPI-suspend/hibernate
trampolines into .rodata instead of .text. While here, give types and sizes to all the global symbols and delete some superfluous directives and unrelocated symbols in the ACPI trampoline image. ok mlarkin@
2022-11-11 | Matthieu Herrb | Enable icc(4). ok anton@ patrick@
2022-11-10 | Dave Voutila | vmd(8): import mmio decode and emulation, disabled for now.
The initial mmio support for vmd adds support for only specific MOV and MOVZX instructions. Plan is to begin iterating in-tree on other missing pieces. All functionality is gated behind an #if for now. Only change to vmm(4) is reordering register #define's in vmmvar.h. ok mlarkin@
2022-11-10 | Jonathan Matthew | Convert amd64 clock and ipi event counters to per-cpu
ok kettenis@ jca@ cheloha@
2022-11-09 | Dave Voutila | vmm(4): treat vcpu lists as immutable, reducing complexity.
Since vmm doesn't support hot-plug vcpus we can reduce complexity by treating the vcpu list per vm as immutable after creation. As a consequence, we can use the vm reference count to protect the lifetime of the vcpus, removing the need for reference counting individual vcpu objects. With an immutable list, we no longer need a rwlock protecting it either. Original diff from dlg@ that I reworked and tested. ok dlg@, mlarkin@
2022-11-09 | David Gwynne | vmm on !MULTIPROCESSOR kernels should still mark vcpus with pending intrs.
the #ifdef MULTIPROCESSOR was a little broad. still grateful to anton and stsp for unbreaking the tree though.
2022-11-09 | Stefan Sperling | unbreak GENERIC build on amd64; patch by anton@
vmm.c:900:3: error: implicit declaration of function 'x86_send_ipi' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
2022-11-08 | David Gwynne | don't keep track of how many vcpus are currently running.
the number is never read anywhere, and i'm not sure what value that number has anyway. mlarkin@ agrees
2022-11-08 | David Gwynne | further speed up delivery of interrupts to a running vcpu.
this records which physical cpu a vcpu is running on. this is used by the code that marks a vcpu as having a pending interrupt to check if the vcpu is currently running. if it thinks the vcpu is running, it sends a nop IPI to the physical cpu it is running on to trigger a vmexit, which in turn runs interrupt handling in the guest. ok mlarkin@
2022-11-08 | Mike Larkin | vmm(4): remove locking in vmm_intr_pending
Removes a lock around an atomic write; this lock was causing slowdowns since the lock being requested is nearly always unavailable because it is held while the VM is running. Noticed by claudio@, help from mpi@, dlg@ and claudio@. ok dv
2022-11-08 | Scott Soule Cheloha | amd64: switch to clockintr(9)
Switch amd64 to the clockintr(9) subsystem. There are lots of little changes, but the big ones are listed here.
When using the local apic timer:
- Run the timer in one-shot mode.
- lapic_delay() is gone. We can't use it to delay(9) when running the timer in one-shot mode.
- Add a randomized statclock(); stathz = hz.
- Add support for switching to profhz when profiling is enabled; profhz = stathz * 10.
When using the i8254/mc146818:
- i8254's clockintr() no longer has a monopoly on hardclock().
- mc146818's rtcintr() no longer has a monopoly on statclock().
- In profiling mode, the statclock() will drift very slightly because (profhz = 1024) does not divide evenly into one billion. We could avoid this by setting (profhz = 512) instead and programming the RTC to run at that rate.
Early revisions reviewed by mlarkin@. Extensively tested by mlarkin@ on a variety of physical and virtual hardware. Additional testing from dv@ and jmc@.
Link: https://marc.info/?l=openbsd-tech&m=166776339203279&w=2
ok kettenis@ mlarkin@
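The drift remark about profhz = 1024 comes down to integer arithmetic on nanoseconds. The sketch below shows one way to see it, assuming the tick period is held as a whole number of nanoseconds. The helper names are invented for this example and do not come from the clockintr(9) code.

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL

/* Tick period in whole nanoseconds; any fractional part is dropped. */
static inline uint64_t
tick_period_ns(uint32_t hz)
{
	return NSEC_PER_SEC / hz;
}

/*
 * Nanoseconds per second that go unaccounted for when hz ticks of
 * the truncated period are laid end to end.
 */
static inline uint64_t
drift_ns_per_sec(uint32_t hz)
{
	return NSEC_PER_SEC - tick_period_ns(hz) * hz;
}
```

One billion is 2^9 * 5^9, so 512 divides it exactly (period 1953125 ns, no drift) while 1024 does not (period 976562.5 ns, truncated to 976562 ns, leaving 512 ns per second).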
2022-11-08 | Scott Soule Cheloha | amd64: add delay_fini()
Not all of the clocks with a delay(9) implementation necessarily keep ticking across suspend/resume. We need a clean way to reverse delay_init() during suspend when those clocks stop ticking. Hence, delay_fini(). delay_fini() resets delay_func() to i8254_delay() if the given function pointer is the active delay(9) implementation. ok mlarkin@
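A minimal sketch of the delay_init()/delay_fini() pairing described above. The quality argument to `delay_init()` and the names `delay_quality`, `i8254_delay_stub()`, `fast_delay()` and `other_delay()` are assumptions made for this illustration. Only the rule "reset to the i8254 if the given function is the active one" is taken from the commit message.

```c
typedef void (*delay_fn)(int usecs);

/* Stand-ins for real delay(9) implementations. */
static void i8254_delay_stub(int usecs) { (void)usecs; }
static void fast_delay(int usecs) { (void)usecs; }	/* hypothetical */
static void other_delay(int usecs) { (void)usecs; }	/* hypothetical */

/* Currently active delay(9) implementation and its assumed ranking. */
static delay_fn delay_func = i8254_delay_stub;
static int delay_quality = 0;

/* Install fn if it ranks higher than the current implementation. */
static void
delay_init(delay_fn fn, int quality)
{
	if (quality > delay_quality) {
		delay_func = fn;
		delay_quality = quality;
	}
}

/*
 * Undo delay_init() for a clock that is about to stop ticking.  Only
 * acts if fn is the active implementation; otherwise it is a no-op.
 */
static void
delay_fini(delay_fn fn)
{
	if (delay_func == fn) {
		delay_func = i8254_delay_stub;
		delay_quality = 0;
	}
}
```

A suspend path would call `delay_fini()` for each clock that does not survive suspend, and resume would call `delay_init()` again once the clock is running.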
2022-11-07 | Dave Voutila | vmm(4): set RAX guest register state based on VMCB
The read/write register routines for SVM didn't acknowledge RAX in the VMCB as the de facto RAX state. When writing gprs, vmm should update RAX in the VMCB. When reading, it should be setting the guest regs state based on the VMCB. Needed for proper mmio emulation in userland. ok mlarkin@
2022-11-07 | Philip Guenther | In kpageflttrap(), validate a non-NULL pcb_onfault against an array
of permitted addresses, done via .nofault* sections that end up in the linked kernel's rodata. ok deraadt@ kettenis@
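The validation idea can be sketched as a lookup in a read-only table of allowed recovery addresses. This is a simplified model: the table contents, `demo_nofault[]`, `onfault_permitted()` and `fault_recovery_target()` are hypothetical. How the real table is laid out by the linker, and what the kernel does when a pointer is rejected, are not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the linker-built table in rodata. */
static const uintptr_t demo_nofault[] = { 0x1000, 0x2000, 0x3000 };

/* True if onfault is one of the n permitted recovery addresses. */
static bool
onfault_permitted(const uintptr_t *table, size_t n, uintptr_t onfault)
{
	for (size_t i = 0; i < n; i++) {
		if (table[i] == onfault)
			return true;
	}
	return false;
}

/*
 * Return the address to resume at after a kernel page fault, or 0
 * when there is no usable handler.  A non-zero pointer that is not
 * in the table is treated the same as no handler in this sketch.
 */
static uintptr_t
fault_recovery_target(const uintptr_t *table, size_t n, uintptr_t onfault)
{
	if (onfault == 0)
		return 0;
	if (!onfault_permitted(table, n, onfault))
		return 0;
	return onfault;
}
```

Because the table lives in read-only data, a corrupted pcb_onfault value cannot redirect the fault path to an address that is not in the table.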
2022-11-06 | Dave Voutila | vmm(4): allocate reference for vm and vcpu SLISTs
Mischa Peters reported a performance regression in 7.2 when hosting numerous guests under vmm(4). While iterating through the list of vms during servicing an ioctl, vmm was triggering excessive wakeup calls due to hitting zero refcnt. Much guidance from dlg@ and testing from Mischa. OK mlarkin@.
2022-11-04 | Mark Kettenis | EFI firmware has bugs which may mean that calling EFI runtime services will
fault because it does memory accesses outside of the regions it told us to map. Try to mitigate this by installing a fault handler (using the pcb_onfault mechanism) and bail out using longjmp(9) if we encounter a page fault while executing an EFI runtime services call. Since some firmware bugs result in us executing code that isn't mapped, make kpageflttrap() handle execution faults as well as data faults. ok guenther@
2022-11-02 | Philip Guenther | Clean up more ancient history: since 2015 the libc stubs for
fork/vfork/__tfork haven't cared about the second return register. So, stop setting retval[1] in kern_fork.c and stop setting the second return register in the MD child_return() routines. With the above, we have no multi-register return values on LP64, so stop touching that register in the trapframe on those archs. testing miod@ and aoyama@ ok miod@
2022-11-01 | Mark Kettenis | Use todr_attach().
ok phessler@
2022-11-01 | Scott Soule Cheloha | vmm(4): vcpu_reset_regs_svm: allow reads of MSR_HWCR, MSR_PSTATEDEF(0)
Guests may need these MSRs to determine the TSC frequency on AMD families 17h and 19h. GP fault reported by weerd@, observed on "AMD EPYC 3201 8-Core Processor" (17-01-02). Same issue observed by Jesper Wallin on "AMD Ryzen PRO 3700U". Tested by Jesper Wallin. Link: https://marc.info/?l=openbsd-bugs&m=166721628323483&w=2 ok mlarkin@
2022-10-30 | Philip Guenther | Simplify setregs() by passing it the ps_strings and switching
sys_execve() to return EJUSTRETURN. setregs() is the MD routine used by sys_execve() to set up the thread's trapframe and PCB such that, on 'return' to userspace, it has the register values defined by the ABI and otherwise zero. It had to set the syscall retval[] values previously because the normal syscall return path overwrites a couple registers with the retval[] values. By instead returning EJUSTRETURN that and some complexity with program-counter handling on m88k and sparc64 goes away. Also, give setregs() a 'struct ps_strings *arginfo' argument so powerpc, powerpc64, and sh can directly get argc/argv/envp values for registers instead of copyin()ing the one in userspace. Improvements from miod@ and millert@ Testing assistance miod@, kettenis@, and aoyama@ ok miod@ kettenis@
2022-10-24 | Scott Soule Cheloha | tsc: AMD Family 17h, 19h: compute frequency from Core::X86::Msr:PStateDef
Compute the TSC frequency on AMD family 17h and 19h CPUs using the PStateDef MSRs. Link 1: https://marc.info/?l=openbsd-tech&m=166394236029484&w=2 Link 2: https://marc.info/?l=openbsd-tech&m=166446065916283&w=2 Test list: https://marc.info/?l=openbsd-tech&m=166646389821326&w=2 Reviewed by kettenis@ using the AMD documents cited in the comments. Maybe reviewed by mlarkin@? I can't remember. He seemed supportive of the idea at least. ok kettenis@
2022-10-20 | Mark Kettenis | Don't attempt to use EFI runtime services on UEFI versions before 2.1.
The Dell Precision T1600 has a UEFI 2.0 implementation where calling GetTime() accesses memory that isn't covered by a runtime mapping. And frankly UEFI 2.0 is so ancient that we don't really want to use it anyway. This also adds the check to the arm64 version even though UEFI versions before 2.4 don't have arm64 support. But for now I want to keep amd64 and arm64 code as similar as possible. ok kn@
2022-10-16 | Mark Kettenis | Add the guts for EFI runtime services support on amd64. This will be used
in the future to implement support for things like EFI variables. ok krw@ (a few others ok'ed earlier incarnations of this diff)
2022-10-10 | Jonathan Gray | add references to 10h 12h revision guides