path: root/sys/arch/amd64
Age  Commit message  Author
2023-12-14  NKMEMPAGES_MAX_DEFAULT is no longer used. Remove it from param.h.  (Claudio Jeker)
OK miod@
2023-12-12  remove support for syscall(2) -- the "indirection system call" because  (Theo de Raadt)
it is a dangerous alternative entry point for all system calls, and thus incompatible with the precision system call entry point scheme we are heading towards. This has been a 3-year mission: First perl needed a code-generated wrapper to fake syscall(2) as a giant switch table, then all the ports were cleaned with relatively minor fixes, except for "go". "go" required two fixes -- 1) a framework issue with old library versions, and 2) like perl, a fake syscall(2) wrapper to handle ioctl(2) and sysctl(2) because "syscall(SYS_ioctl" occurs all over the place in the "go" ecosystem because the "go developers" are plan9-loving unix-hating folk who tried to build an ecosystem without allowing "ioctl". ok kettenis, jsing, afresh1, sthen
2023-12-12  The sigtramp was calling sigreturn(2), and upon failure exit(2), which  (Theo de Raadt)
doesn't make sense anymore. It is better to just issue an illegal instruction. ok kettenis, with some misgivings about inconsistent approaches between architectures. In the future we could change sigreturn(2) to never return an exit code, but always just terminate the process. We stopped this system call from being callable ages ago with msyscall(2), and there is no stub for it in libc... maybe that's the next step to take?
2023-12-11  Implement per-CPU caching for the page table page (vp) pool and the PTE  (Mark Kettenis)
descriptor (pted) pool in the arm64 pmap implementation. This significantly reduces the side-effects of lock contention on the kernel map lock that is (incorrectly) translated into excessive page daemon wakeups. This is not a perfect solution but it does lead to significant speedups on machines with many CPU cores. This requires adding a new pmap_init_percpu() function that gets called at the point where the kernel is ready to set up the per-CPU pool caches. Dummy implementations of this function are added for all non-arm64 architectures. Some other architectures can probably benefit from providing an actual implementation that sets up per-CPU caches for pmap pools as well. ok phessler@, claudio@, miod@, patrick@
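A dummy pmap_init_percpu(), of the kind this commit adds for the non-arm64 architectures, is essentially an empty function; a minimal sketch (placement in each port's pmap code not shown):

    void
    pmap_init_percpu(void)
    {
            /* nothing to do: no per-CPU pmap pool caches on this architecture */
    }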
2023-12-10  vmm(4): flush EPTs after enabling VMX mode.  (Dave Voutila)
The Intel SDM states that the vmxon/vmxoff instructions don't invalidate any EPT states on the cpu and recommends invalidating the global context. vmm(4) opportunistically disables and enables VMX mode as vms are created or terminated, so this adds a recommended housekeeping step per the SDM. While here, tidy up the CR4 toggling by moving it to after the MSR feature check. ok mlarkin@
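For illustration only, a global EPT invalidation of the kind the SDM recommends could look roughly like the sketch below; the 128-bit descriptor layout follows the SDM, while the wrapper and constant names are placeholders rather than vmm(4)'s actual code.

    #include <stdint.h>

    struct invept_desc {
            uint64_t        eptp;           /* ignored for a global invalidation */
            uint64_t        reserved;       /* must be zero */
    };

    #define INVEPT_GLOBAL_CTX       0x2ULL  /* all-context (global) invalidation type */

    static inline void
    invept_global(void)
    {
            struct invept_desc desc = { 0, 0 };

            /* AT&T operand order: memory descriptor first, type register second */
            __asm volatile("invept %0, %1"
                : : "m"(desc), "r"(INVEPT_GLOBAL_CTX) : "memory", "cc");
    }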
2023-12-10  Add a new label "sigcodecall" inside every sigtramp definition, directly  (Theo de Raadt)
in front of the syscall instruction. This is used to calculate the start of the syscall for SYS_sigreturn and pinned system calls. ok kettenis
2023-11-28  Adapt inv{vpid,ept} to return success or failure.  (Dave Voutila)
ok mlarkin@
2023-11-26  vmm(4)/vmx: pass correct vpid value to invvpid.  (Dave Voutila)
While vmm's use of invvpid in the vmx vcpu run loop is questionable since we require and use EPT, the vpid value is unquestionably wrong in these calls. ok mlarkin@
2023-11-24  vmm(4)/vmx: fix memory scribbling by updating GDTR/TR if vcpu moves.  (Dave Voutila)
If the vcpu thread sleeps in the kernel, like when handling a nested page fault and calling uvm_fault(9), the thread may be rescheduled on another host cpu. vmm(4) was only setting the GDTR and TR bases in the VMCS once prior to first vm entry, so a thread migration can result in restoring the wrong GDTR and TR on vm exit for the host cpu. This results in borked interrupts and corrupted stack pointers, causing programs to segfault or sigabort. It can also result in missed ipi's causing kernel deadlocks. Use similar logic to the SVM routines and check for cpu migration within the hot loop. Since we're letting the VMX features of the cpu restore GDTR, we can also drop the manual store/load routines. Reported and with much appreciated testing help from Mischa Peters. ok mlarkin@
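A loose sketch of the migration check described above, assuming the kernel's curcpu() helper; the structure, field, and helper names are illustrative and the VMCS update itself is reduced to a stub:

    struct vcpu_sketch {
            struct cpu_info *vc_last_pcpu;  /* host cpu used for the last vm entry */
    };

    static void
    refresh_host_gdtr_tr(struct cpu_info *ci)
    {
            /* write this cpu's GDTR and TR bases into the VMCS host-state area */
    }

    static void
    vcpu_check_migration(struct vcpu_sketch *vcpu)
    {
            struct cpu_info *ci = curcpu();

            if (vcpu->vc_last_pcpu != ci) {
                    refresh_host_gdtr_tr(ci);
                    vcpu->vc_last_pcpu = ci;
            }
    }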
2023-11-22  Fix race when initializing TSC.  (Alexander Bluhm)
During boot TSC initialization could fail with panic: tsc_test_sync_ap: cpu2: tsc_ap_name is not NULL: cpu1. The root cause is a race between the moment the application processor sets CPUF_IDENTIFIED in cpu_hatch() and the moment the boot processor checks CPUF_IDENTIFIED in cpu_start_secondary() before the TSC sync test. The fix is to set CPUF_IDENTIFIED before clearing CPUF_IDENTIFY in cpu_hatch(). from hshoexer@ cheloha@; OK deraadt@ mlarkin@
2023-11-13  include function name in warning printf in vmx_handle_np_fault() and svm_handle_np_fault()  (Jasper Lievisse Adriaanse)
For clearer output. ok mlarkin@
2023-11-11  Fix variable name in comment  (Jeremie Courreges-Anglas)
Mostly a dummy commit so that the last llvm commit ends up in the git export. (No idea whether it's actually/still needed but it can't hurt.)
2023-10-30  Use KERNEL_ASSERT_UNLOCKED() instead of KASSERT(!_kernel_lock_held()).  (Vitaliy Makkoveev)
The former is defined as the latter, so there is no functional change. ok claudio
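As stated above, the relationship boils down to a definition along these lines (sketch; the real one lives in the kernel headers):

    #define KERNEL_ASSERT_UNLOCKED()        KASSERT(!_kernel_lock_held())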
2023-10-26  make efi_getdisklabel_cd9660() handle a block size of 512 and simplify  (Jonathan Gray)
ok yasuoka@
2023-10-24  Normally context switches happen in mi_switch() but there are 3 cases  (Claudio Jeker)
where a switch happens outside. Clean up these code paths and make them machine independent.
- when a process forks (fork, tfork, kthread), the new proc needs to somehow be scheduled for the first time. This is done by proc_trampoline. Since proc_trampoline is machine dependent assembler code, change the MP specific proc_trampoline_mp() to proc_trampoline_mi() and make sure it is now always called.
- cpu_hatch: when booting APs the code needs to jump to the first proc running on that CPU. This should be the idle thread for that CPU.
- sched_exit: when a proc exits it needs to switch away from itself and then instruct the reaper to clean up the rest. This is done by switching to the idle loop.
Since the last two cases require a context switch to the idle proc, factor out the common code to sched_toidle() and use it in those places. Tested by many on all archs. OK miod@ mpi@ cheloha@
2023-10-13  enable dwqe(4) in RAMDISK_CD  (Stefan Sperling)
reminded by jsg@, snapshot test build done by me
2023-10-11  Add initial support for Elkhart Lake ethernet to dwqe(4).  (Stefan Sperling)
For now, only attach to PSE0/RGMII (device ID 0x4ba0) which is the only device I have access to for testing. There is a known problem where Tx throughput is lower than expected. This is being looked into. ok kettenis@
2023-09-25  enable mbg(4) at pci on amd64, from Maurice Janssen  (Theo de Raadt)
2023-09-17  clockintr: remove clockintr_init(), clockintr_flags  (Scott Soule Cheloha)
All the state initialization once done in clockintr_init() has been moved to other parts of the kernel. It's a dead function. Remove it. Likewise, the clockintr_flags variable no longer sports any meaningful flags. Remove it. This frees up the CL_* flag namespace, which might be useful to the clockintr frontend if we ever need to add behavior flags to any of those functions.
2023-09-14  clockintr: replace CL_RNDSTAT with global variable statclock_is_randomized  (Scott Soule Cheloha)
In order to separate the statclock from the clock interrupt subsystem we need to move all statclock state out into the broader kernel. Start by replacing the CL_RNDSTAT flag with a new global variable, "statclock_is_randomized", in kern_clock.c. Update all clockintr_init() callers to set the boolean instead of passing the flag. Thread: https://marc.info/?l=openbsd-tech&m=169428749720476&w=2
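A caller-side sketch of the change, assuming clockintr_init() still takes a (now otherwise empty) flags argument at this point:

    /* kern_clock.c (sketch) */
    int statclock_is_randomized;            /* replaces the CL_RNDSTAT flag */

    /* a former clockintr_init(CL_RNDSTAT) caller (sketch) */
    void
    statclock_setup_example(void)
    {
            statclock_is_randomized = 1;
            clockintr_init(0);              /* remaining flags argument assumed */
    }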
2023-09-10  load amd patch into a malloc'd region to make it page aligned  (Jonathan Gray)
avoids a General-Protection Exception on patch loader wrmsr with A10-5700, TN-A1 00610f01 15-10-01.
The alignment requirement is not present on at least Ryzen 5 2600X, PiR-B2 00800f82 17-08-02.
Problem reported and fix tested by espie@
2023-09-08  Clean up old console bootargs  (Klemens Nanni)
7.3 is long gone, you must have new bootloaders and new kernels. Zap both the condition and the else block, unindent and merge lines where they fit. Feedback OK kettenis. Tests OK denis
2023-09-06  vmm(4)/vmd(8): include pending interrupt in vm_run_params.  (Dave Voutila)
To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add a flag in the vm_run_params structure to indicate if there's another interrupt pending. This reduces latency in vcpu work related to i/o as we save a trip into the kernel just to flip the interrupt pending flag on or off. Tested by phessler@, mbuhl@, stsp@, and Mischa Peters. ok mlarkin@
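Schematically, vmd(8) now folds the pending-interrupt state into the run-parameters structure it already hands to the run ioctl instead of issuing a separate ioctl for it; the struct and field names below are illustrative, not the actual ABI, and VMM_IOC_RUN comes from the vmm(4) headers.

    #include <stdint.h>
    #include <sys/ioctl.h>

    struct vm_run_params_sketch {
            uint32_t        vrp_vm_id;
            uint32_t        vrp_vcpu_id;
            uint8_t         vrp_intr_pending;       /* another interrupt queued in vmd? */
    };

    /* vmd vcpu run loop (sketch) */
    int
    vcpu_run_once(int vmm_fd, struct vm_run_params_sketch *vrp, int irq_queued)
    {
            vrp->vrp_intr_pending = irq_queued ? 1 : 0;
            return ioctl(vmm_fd, VMM_IOC_RUN, vrp);
    }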
2023-09-06  revert disabling warnings for zlib on clang >= 15  (Jonathan Gray)
no longer needed with zlib 1.3 ok tb@
2023-09-05  vmm(4): switch the APMI CPUID mask to an include mask  (Mike Larkin)
dv points out that there are other bits there that imply the existence of other MSRs, so switching this to an include list is a better idea.
2023-09-03  vmm(4): Suppress AMD HwPstate visibility to guests  (Mike Larkin)
On newer Ryzen/EPYC, we need to hide the HwPstate field in CPUID 80000007:EDX, or guests will try to access the MSRs associated with it, and that will fail with #GP. ok deraadt
2023-08-23  all platforms: separate cpu_initclocks() from cpu_startclock()  (Scott Soule Cheloha)
To give the primary CPU an opportunity to perform clock interrupt preparation in a machine-independent manner we need to separate the "initialization" parts of cpu_initclocks() from the "start the clock interrupt" parts. Currently, cpu_initclocks() does everything all at once, so there is no space for this MI setup.
Many platforms have more-or-less already done this separation by implementing a separate routine named "cpu_startclock()". This patch promotes cpu_startclock() from de facto standard to mandatory API.
- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks(). The separation of responsibility between the two routines is a bit fuzzy but the basic guidelines are as follows:
  + cpu_initclocks() must initialize hz, stathz, and profhz, and call clockintr_init().
  + cpu_startclock() must call clockintr_cpu_init() and start the clock interrupt cycle on the calling CPU.
  These guidelines will shift in the future, but that's the way things stand as of *this* commit.
- In initclocks(): first call cpu_initclocks(), then do MI setup, and last call cpu_startclock().
- On platforms where cpu_startclock() already exists: don't call cpu_startclock() from cpu_initclocks() anymore.
- On platforms where cpu_startclock() doesn't yet exist: implement it. Usually this is as simple as dividing cpu_initclocks() in two (a rough sketch of the split follows this entry).
Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc, mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by jmatthew@.
Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2
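A minimal sketch of that split; the frequencies and the clockintr_init()/clockintr_cpu_init() argument lists are assumptions, and the actual timer programming is reduced to a comment.

    void
    cpu_initclocks(void)
    {
            /* "initialization" half: pick frequencies, set up MI clockintr state */
            hz = 100;
            stathz = 128;
            profhz = stathz * 10;
            clockintr_init(0);              /* flags argument assumed */
    }

    void
    cpu_startclock(void)
    {
            /* "start the clock interrupt" half, run on each CPU */
            clockintr_cpu_init(NULL);       /* intrclock argument assumed */
            /* program the local timer hardware to begin the interrupt cycle */
    }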
2023-08-16  add Intel ARCH_CAP_GDS bits  (Jonathan Gray)
mentioned in https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
2023-08-15  Replace a bunch of (1 << 31) with (1U << 31)  (Miod Vallat)
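The motivation: 1 is a signed int, so shifting it into bit 31 overflows the signed type and is undefined behaviour, whereas the unsigned form is well defined. Illustrative example:

    #define EXAMPLE_BIT31   (1U << 31)      /* OK: 0x80000000U */
    /* (1 << 31) would shift into the sign bit of a signed int: undefined behaviour */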
2023-08-09  correct platform id mask, it is 3 bits 52:50  (Jonathan Gray)
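For illustration, extracting a 3-bit field at bits 52:50 of an MSR value looks like the sketch below; the MSR constant name is an assumption, and rdmsr() is the usual kernel accessor.

    int
    read_platform_id(void)
    {
            uint64_t msr = rdmsr(MSR_PLATFORM_ID);  /* constant name assumed */

            return (msr >> 50) & 0x7;               /* 3 bits, 52:50 */
    }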
2023-08-09  show x86 cpu patch level in dmesg  (Jonathan Gray)
ok guenther@ deraadt@
2023-07-31  On CPUs with eIBRS ("enhanced Indirect Branch Restricted Speculation")  (Philip Guenther)
or IBT enabled in the kernel, the hardware should block the attacks which retpolines were created to prevent. In those cases, retpolines should be a net negative for security as they are an indirect branch gadget. They're also slower.
* use -mretpoline-external-thunk to give us control of the code used for indirect branches
* default to using a retpoline as before, but mark it and the other ASM kernel retpolines for code patching
* if the CPU has eIBRS, then enable it
* if the CPU has eIBRS *or* IBT, then codepatch the three different retpolines to just indirect jumps
make clean && make config required after this
ok kettenis@
2023-07-31  The replacement code passed to codepatch_replace() can usefully be  (Philip Guenther)
const. suggested by bluhm@
2023-07-28  Include a newline in a DPRINTF()  (Philip Guenther)
2023-07-28  Add CODEPATCH_CODE() macro to simplify defining a symbol for a chunk  (Philip Guenther)
of code to use in codepatching. Use that for all the existing codepatching snippets. Similarly, add CODEPATCH_CODE_LEN() which is CODEPATCH_CODE() but also provides a short variable holding the length of the codepatch snippet. Use that for some snippets that will be used for retpoline replacement. ok kettenis@ deraadt@
2023-07-27  Fix off-by-one: SEFF0ECX_WAITPKG is bit 5, not bit 4.  (Philip Guenther)
ok mlarkin@ kettenis@ deraadt@
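The corrected definition amounts to (sketch):

    #define SEFF0ECX_WAITPKG        (1 << 5)        /* UMONITOR/UMWAIT/TPAUSE */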
2023-07-27  Report speculation control bits in dmesg cpu lines.  (Philip Guenther)
ok mlarkin@
2023-07-27  The interrupt resume (Xdoreti) and recurse (Xspllower) paths are  (Philip Guenther)
invoked using indirect branches and should have endbr64's. ok deraadt@
2023-07-27  Follow the lead of mips64 and make cpu_idle_cycle() just call the  (Philip Guenther)
indirect pointer itself and provide an initializer for that going to the default "just enable interrupts and halt" path. ok kettenis@
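A rough sketch of that pattern; the variable and helper names are illustrative:

    static void
    cpu_idle_cycle_default(void)
    {
            __asm volatile("sti; hlt");     /* enable interrupts and halt */
    }

    void (*cpu_idle_cycle_fcn)(void) = cpu_idle_cycle_default;

    void
    cpu_idle_cycle(void)
    {
            (*cpu_idle_cycle_fcn)();
    }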
2023-07-25  statclock: move profil(2), GPROF code to profclock(), gmonclock()  (Scott Soule Cheloha)
This patch isolates profil(2) and GPROF from statclock(). Currently, statclock() implements both profil(2) and GPROF through a complex mechanism involving both platform code (setstatclockrate) and the scheduler (pscnt, psdiv, and psratio). We have a machine-independent interface to the clock interrupt hardware now, so we no longer need to do it this way.
- Move profil(2)-specific code from statclock() to a new clock interrupt callback, profclock(), in subr_prof.c. Each schedstate_percpu has its own profclock handle. The profclock is enabled/disabled for a given CPU when it is needed by the running thread during mi_switch() and sched_exit().
- Move GPROF-specific code from statclock() to a new clock interrupt callback, gmonclock(), in subr_prof.c. Where available, each cpu_info has its own gmonclock handle. The gmonclock is enabled/disabled for a given CPU via sysctl(2) in prof_state_toggle().
- Both profclock() and gmonclock() have a fixed period, profclock_period, that is initialized during initclocks().
- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(), and clockintr_stagger() via <sys/clockintr.h>. They have external callers now.
- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete spc_pscnt and spc_psdiv. The statclock frequency is not dynamic anymore so these variables are now useless.
- Delete code/state related to the dynamic statclock frequency from kern_clockintr.c. The statclock frequency can still be pseudo-random, so move the contents of clockintr_statvar_init() into clockintr_init().
With input from miod@, deraadt@, and claudio@. Early revisions cleaned up by claudio. Early revisions tested by claudio@. Tested by cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v). Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on powerpc64 by gkoehler@.
2023-07-25  cpu_idle_{enter,leave} are no-ops on amd64 now, so just #define  (Philip Guenther)
away the calls ok deraadt@ mpi@ miod@
2023-07-25  Some hypervisors (such as Hetzner) allow msr read of DE_CFG (which does  (Theo de Raadt)
not indicate bit 9 set, but they could have a firmware fix) but then block an msr write to bit 9 (which disables enough AVX optimizations to prevent the exfiltration of data) with a fault. So let's also check the HV bit before we decide to modify the bit; hypervisors are expected to set that bit. tested by lucas at sexy dot is. with jsg, ok mlarkin
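Putting the two DE_CFG commits together, the intent is roughly the sketch below; CPUIDECX_HV, MSR_DE_CFG, and the helper name stand in for the real identifiers, and rdmsr()/wrmsr() are the usual kernel MSR accessors.

    #define CPUIDECX_HV     (1U << 31)      /* CPUID.1:ECX "running under a hypervisor" */
    #define DE_CFG_ZEN_BIT9 (1ULL << 9)     /* the Zenbleed chickenbit */

    void
    maybe_set_zenbleed_chickenbit(uint32_t cpuid_1_ecx)
    {
            uint64_t v;

            if (cpuid_1_ecx & CPUIDECX_HV)
                    return;                 /* under a hypervisor: leave the MSR alone */

            v = rdmsr(MSR_DE_CFG);          /* MSR constant name assumed */
            if ((v & DE_CFG_ZEN_BIT9) == 0)
                    wrmsr(MSR_DE_CFG, v | DE_CFG_ZEN_BIT9);
    }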
2023-07-24  Set DE_CFG[9] -- a chickenbit which stops Zenbleed. The chickenbit may  (Theo de Raadt)
have other side-effects (not disclosed by AMD), and firmware fixes may be better (and have other side-effects, same story). Newer processors will probably be validated more carefully by AMD. Issue found by Tavis Ormandy. This is errata 7.2/033_amdcpu.patch.sig and 7.3/011_amdcpu.patch.sig. Zenbleed is also blocked on select cpus by using errata 7.3/012_amdfirmware.patch.sig + 7.3/013_amdcpufirmware.patch / 7.2/034_amdfirmware.patch.sig + 7.2/035_amdcpufirmware.patch.sig, which load AMD cpu firmwares (firmware.openbsd.org is updated often to contain the best firmwares). ok jsg
2023-07-23  update AMD CPU microcode if a newer patch is available  (Jonathan Gray)
ok deraadt@
2023-07-22  BOOTARG_UCODE for AMD  (Jonathan Gray)
ok deraadt@
2023-07-21  Rename ARCH_CAPABILITIES_* #defines to ARCH_CAP_*  (Philip Guenther)
Provide more ARCH_CAP_* defines per June 2023 SDM ok jsg@ deraadt@
2023-07-20  Assign wsdisplay0 to the glass console always. The same change is  (YASUOKA Masahiko)
done for GENERIC already. ok kettenis kn
2023-07-10  Enable Indirect Branch Tracking for amd64 userland, using XSAVES/XRSTORS  (Philip Guenther)
to save/restore the state and enabling it at exec-time (and for signal handling) if the PS_NOBTCFI flag isn't set. Note: this changes the format of the sc_fpstate data in the signal context to possibly be in compressed format: starting now we just guarantee that that state is in a format understood by the XRSTOR instruction of the system that is being executed on. At this time, passing sigreturn a corrupt sc_fpstate now results in the process exiting with no attempt to fix it up or send a T_PROTFLT trap. That may change. prodding by deraadt@ issues with my original signal handling design identified by kettenis@ lots of base and ports preparation for this by deraadt@ and the libressl and ports teams ok deraadt@ kettenis@
2023-07-08  Move /dev/apm related acpi code to acpi_apm.c which is only built on  (Tobias Heider)
amd64 and i386. This is the first step towards a machine independent safe sleep API. tested by yasuoka@ bluhm@ ok deraadt@ kettenis@
2023-07-08  Toggle IBT off during EFI runtime services calls.  (Mark Kettenis)
While UEFI 2.10 has a way of indicating that runtime services use the appropriate ENDBR64 instructions, firmware that's out in the wild doesn't actually use that yet. Once the landscape changes we may want to reconsider toggling IBT off. ok guenther@, kn@