Age | Commit message | Author |
|
contain PG_XO, which is PKU key1. On every exit from kernel to userland,
force the PKU register to inhibit data read against key1 memory. On
(some) traps into the kernel if the PKU register is changed, abort the
process (processes have no reason to change the PKU register). This
provides us with viable xonly functionality on most modern Intel & AMD
cpus. I started with an xsave-based diff from dv@, but discovered the
fpu save/restore logic wasn't a good fit and went to direct register management.
Disabled on HV (vm) systems until we know they handle PKU correctly.
ok kettenis, dv, guenther, etc
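
The PKU handling described above can be sketched in a few lines. This is a minimal illustration, not the committed code: it assumes the architectural PKRU layout (protection key i owns an access-disable bit at position 2i and a write-disable bit at 2i+1), and the names PKRU_AD, PKRU_WD, XO_KEY, pkru_userland_value() and pkru_tampered() are hypothetical. The value the kernel actually loads may disable more keys than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* PKRU layout: protection key i owns two adjacent bits. */
#define PKRU_AD(key)	(UINT32_C(1) << (2 * (key)))		/* access disable */
#define PKRU_WD(key)	(UINT32_C(1) << (2 * (key) + 1))	/* write disable */

#define XO_KEY		1	/* key1 tags execute-only pages, per the text */

/* Hypothetical value forced into PKRU on every return to userland. */
static inline uint32_t
pkru_userland_value(void)
{
	/* Access-disable blocks data reads and writes, not instruction fetch. */
	return PKRU_AD(XO_KEY);
}

/* Hypothetical check on kernel entry: true means "abort the process". */
static inline bool
pkru_tampered(uint32_t pkru_seen)
{
	return pkru_seen != pkru_userland_value();
}
```

Comparing against a single expected value keeps the entry-path check cheap, which matches the message's point that processes have no reason to change the register.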
|
|
support. The current implementation doesn't handle the transition from
RWX to RW correctly. Also generalize the pmap_write_protect() function
in recognition of the fact that execute permission, write permission,
and in the future read permission on executable pages, are handled by
separate bits.
ok deraadt@, mpi@
|
|
We don't emulate or support most of the EAX=7,ECX=0 feature bits,
so restrict the mask further to just UMIP.
ok deraadt@
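
A sketch of what restricting the mask to UMIP means in practice, assuming the mask applies to the ECX output of CPUID leaf 7, subleaf 0, where UMIP is bit 2. The macro and function names are illustrative and are not taken from the commit. WAITPKG (bit 5) appears only as an example of a host feature bit that does not pass through.

```c
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):ECX feature bits. */
#define SEFF0ECX_UMIP		UINT32_C(0x00000004)	/* bit 2 */
#define SEFF0ECX_WAITPKG	UINT32_C(0x00000020)	/* bit 5 */

/* Hypothetical allow-list: only bits the hypervisor really supports. */
#define GUEST_SEFF0ECX_MASK	SEFF0ECX_UMIP

/* Filter the host's ECX value before handing it to a guest. */
static inline uint32_t
guest_seff0ecx(uint32_t host_ecx)
{
	return host_ecx & GUEST_SEFF0ECX_MASK;
}
```

An allow-list fails safe: a feature bit introduced by newer hardware stays hidden from guests until someone adds it deliberately.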
|
|
matches tom@'s i386 rev 1.47 change
|
|
This function is only ever called with PROT_NONE or PROT_READ where
PROT_NONE removes the mapping from the page tables and PROT_READ takes
away write permission. Add a KASSERT to make sure no other values are
passed. This KASSERT should be optimized away by any decent compiler.
ok deraadt@, mpi@, guenther@
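
The contract described above can be illustrated with a userland stand-in, where assert() plays the role of KASSERT. The enum and function names are hypothetical; the real function operates on page table entries rather than returning an action code.

```c
#include <assert.h>
#include <sys/mman.h>	/* PROT_NONE, PROT_READ */

enum page_action { PAGE_UNMAP, PAGE_WRITE_PROTECT };

/* Map the only two accepted protections to what happens to the page. */
static enum page_action
page_protect_action(int prot)
{
	/* Stand-in for the KASSERT: reject anything but the two known values. */
	assert(prot == PROT_NONE || prot == PROT_READ);

	return (prot == PROT_NONE) ? PAGE_UNMAP : PAGE_WRITE_PROTECT;
}
```

When every caller passes a compile-time constant and the function is visible to the optimizer, the check folds to a constant, which is presumably what the message means by "optimized away".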
|
|
between page faults as a result of instruction fetches or normal data
access. Handle this in the same way as we do on landisk: if handling
the fault with access type PROT_READ fails, retry with PROT_EXEC.
Fortunately we know whether NX is enabled or not, so only do this when
it isn't. Nobody should be running an amd64 machine without NX!
ok deraadt@, miod@
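
The retry logic reads roughly as follows. This is a sketch under assumptions: fault_fn stands in for the real fault resolver, and handle_fault() and exec_only_fault() are hypothetical names used only for demonstration.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_EXEC */

/* Stand-in for the fault resolver: returns 0 on success, errno otherwise. */
typedef int (*fault_fn)(unsigned long va, int access_type);

static int
handle_fault(fault_fn fault, unsigned long va, int access_type,
    bool nx_enabled)
{
	int error = fault(va, access_type);

	/*
	 * Without NX the hardware cannot tell an instruction fetch from
	 * a data read, so a failed PROT_READ fault may really be a fetch.
	 */
	if (error != 0 && access_type == PROT_READ && !nx_enabled)
		error = fault(va, PROT_EXEC);

	return error;
}

/* Demonstration stub: a mapping that only permits execution. */
static int
exec_only_fault(unsigned long va, int access_type)
{
	(void)va;
	return (access_type == PROT_EXEC) ? 0 : EFAULT;
}
```

With NX enabled the second attempt is skipped entirely, so the common case pays nothing for the fallback.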
|
|
arguments to mmap) because it was using syscall(2) and that callpath
is invisible in ktrace. Make it visible; it will now show "(via syscall)"
and such.
ok guenther
|
|
|
|
for execute-only, and the PKU value used by userland to use that key.
|
|
that is compatible with what FreeBSD and NetBSD have. Setting EFI
variables is only allowed at securelevel 0 and below.
Heavily based on work done by Sergii Dmytruk.
ok yasuoka@
|
|
ok deraadt@
|
|
ok deraadt@
|
|
ok deraadt@
|
|
From Christian Ludwig.
|
|
Alder Lake and similar-era Intel platforms introduced new userland
wait instructions. Since vmm was passing this cpuid bit into guests,
some would attempt TPAUSE instructions and trigger invalid instruction
exceptions because VMX requires additional configuration to support
emulation.
This also adds WAITPKG to i386 and amd64 cpu feature identification.
Input from anton@, cheloha@, and guenther@. Tested by jmatthew@.
OK deraadt.
|
|
|
|
and pass it to the kernel.
ok jca@, patrick@
|
|
new hardware support includes
AMD
Raphael, Ryzen 7000 desktop, gfx1036/GC 10.3.6
Mendocino, Ryzen & Athlon 7020 Series mobile APU, gfx1037/GC 10.3.7
Navi 31, gfx1100 dGPU, GC 11.0.0, Radeon RX 7900 XT/XTX
gfx1101 dGPU
gfx1102 dGPU
gfx1103 APU
Thanks to the OpenBSD Foundation for sponsoring this work.
|
|
COMPILER_VERSION initially missed. I'm not sure why we still have those
COMPILER_VERSION checks in sys/arch/i386 and sys/arch/amd64, when the
base system doesn't ship gcc any more, but let's stay consistent.
|
|
Disable -Wdeprecated-non-prototype instead of patching zlib. Upstream
plans to drop the pre-ANSI syntax soon. ok tb@ millert@
|
|
To appease the clang 15 warning -Wdeprecated-non-prototype (turned on
by -Wall). ok millert@
|
|
When booting guests with SeaBIOS, vmd(8) supplied details about the
available guest memory via CMOS registers. Consequently, we've been
carrying some patches in the ports tree to SeaBIOS to fetch this
information like it's the 1990s.
When a vm initializes memory ranges, we now track what each range
represents. This information can be used to supply the e820 memory
map to SeaBIOS via the fw_cfg interface allowing it to properly
communicate memory ranges to a guest operating system. (This will
also allow us to drop some patches from the port.)
Given the ranges can now be marked with a purpose, this also allows
vmm(4) to switch from hard-coded mmio ranges and instead let the
information on the memory range dictate if vmm should be handling
a page fault or sending to vmd for a memory assist.
Tested by Mischa Peters and others. OK mlarkin@.
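
A minimal sketch of the idea of tagging each memory range with a purpose and letting the tag decide how a fault is handled. The type names, the enum values and the demo layout are all hypothetical and do not reflect the actual vmm(4)/vmd(8) structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum range_type { RANGE_RAM, RANGE_RESERVED, RANGE_MMIO };

struct mem_range {
	uint64_t	gpa;	/* guest physical start address */
	uint64_t	size;	/* length in bytes */
	enum range_type	type;	/* what the range represents */
};

/* Hypothetical layout: 2GB of RAM plus an MMIO window below 4GB. */
static const struct mem_range demo_map[] = {
	{ UINT64_C(0x00000000), UINT64_C(0x80000000), RANGE_RAM },
	{ UINT64_C(0xf0000000), UINT64_C(0x10000000), RANGE_MMIO },
};

/* Find the range containing gpa, or NULL if none does. */
static const struct mem_range *
range_lookup(const struct mem_range *r, size_t n, uint64_t gpa)
{
	for (size_t i = 0; i < n; i++)
		if (gpa >= r[i].gpa && gpa - r[i].gpa < r[i].size)
			return &r[i];
	return NULL;
}

/* true: send to userland for a memory assist; false: handle in kernel. */
static bool
fault_needs_assist(const struct mem_range *r, size_t n, uint64_t gpa)
{
	const struct mem_range *m = range_lookup(r, n, gpa);

	return m != NULL && m->type == RANGE_MMIO;
}
```

The same table could be walked once more to emit firmware memory map entries, which is the other use the message describes.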
|
|
the idea is that access to vmm from userland is gated by the vmm_softc
sc_status field, and then accounted for by the refcnt. you take a read
lock to check the gate, and if it is open then you can take a reference
and do your thing. once you've finished the work then you rele the
refcnt on the way out of the ioctl handler.
the suspend code takes a write lock to close the sc_status gate,
and then uses refcnt_finalise to wait for things in the ioctl handler
to get out.
on resume, the code takes the write lock, sets the refcnt up again for
userland to use, and then opens the gate.
tested by and ok dv@
|
|
is ELF" world. Eliminate use of them in amd64, arm64, armv7, i386,
macppc, mips64, and sparc64 code.
ok deraadt@ jca@ krw@
|
|
Start eliminating it.
ok mpi@ mlarkin@ krw@
|
|
locore.S to be in C in cpu.c, machdep.c, pmap.c, or bus_space.c for
better typing/debug info. Delete REALBASEMEM, REALEXTMEM, and
biosextmem as unused/ignored.
ok mpi@ krw@ mlarkin@
|
|
trampolines into .rodata instead of .text. While here, give types
and sizes to all the global symbols and delete some superfluous
directives and unrelocated symbols in the ACPI trampoline image.
ok mlarkin@
|
|
|
|
The initial mmio support for vmd adds support for only specific MOV
and MOVZX instructions. Plan is to begin iterating in-tree on other
missing pieces. All functionality is gated behind an #if for now.
Only change to vmm(4) is reordering register #define's in vmmvar.h.
ok mlarkin@
|
|
ok kettenis@ jca@ cheloha@
|
|
Since vmm doesn't support hot-plug vcpus we can reduce complexity
by treating the vcpu list per vm as immutable after creation.
As a consequence, we can use the vm reference count to protect the
lifetime of the vcpus, removing the need for reference counting
individual vcpu objects. With an immutable list, we no longer need
a rwlock protecting it either.
Original diff from dlg@ that I reworked and tested.
ok dlg@, mlarkin@
|
|
the #ifdef MULTIPROCESSOR was a little broad.
still grateful to anton and stsp for unbreaking the tree though.
|
|
vmm.c:900:3: error: implicit declaration of function 'x86_send_ipi' is
invalid in C99 [-Werror,-Wimplicit-function-declaration]
|
|
the number is never read anywhere, and i'm not sure what value that
number has anyway.
mlarkin@ agrees
|
|
this records which physical cpu a vcpu is running on. this is used
by the code that marks a vcpu as having a pending interrupt to check
if the vcpu is currently running. if it thinks the vcpu is running,
it sends a nop IPI to the physical cpu it is running on to trigger
a vmexit, which in turn runs interrupt handling in the guest.
ok mlarkin@
|
|
Removes a lock around an atomic write; this lock was causing slowdowns
since the lock being requested is nearly always unavailable because it
is held while the VM is running.
Noticed by claudio@, help from mpi@, dlg@ and claudio@.
ok dv
|
|
Switch amd64 to the clockintr(9) subsystem. There are lots of little
changes, but the big ones are listed here.
When using the local apic timer:
- Run the timer in one-shot mode.
- lapic_delay() is gone. We can't use it to delay(9) when running
  the timer in one-shot mode.
- Add a randomized statclock(); stathz = hz.
- Add support for switching to profhz when profiling is enabled;
  profhz = stathz * 10.
When using the i8254/mc146818:
- i8254's clockintr() no longer has a monopoly on hardclock().
- mc146818's rtcintr() no longer has a monopoly on statclock().
- In profiling mode, the statclock() will drift very slightly
  because (profhz = 1024) does not divide evenly into one billion.
  We could avoid this by setting (profhz = 512) instead and
  programming the RTC to run at that rate.
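
The drift remark comes down to integer arithmetic on nanosecond periods, which this small sketch checks. The helper names are illustrative only.

```c
#include <stdint.h>

#define NSEC_PER_SEC	UINT64_C(1000000000)

/* Integer period, in nanoseconds, of a clock ticking hz times a second. */
static inline uint64_t
period_ns(uint64_t hz)
{
	return NSEC_PER_SEC / hz;
}

/* Nanoseconds lost per second when the period is truncated to an integer. */
static inline uint64_t
period_remainder_ns(uint64_t hz)
{
	return NSEC_PER_SEC % hz;
}
```

At 1024 Hz the exact period is 976562.5 ns, so an integer period of 976562 ns leaves 512 ns unaccounted for each second. At 512 Hz the period is exactly 1953125 ns, because one billion is 2^9 * 5^9.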
Early revisions reviewed by mlarkin@. Extensively tested by mlarkin@
on a variety of physical and virtual hardware. Additional testing
from dv@ and jmc@.
Link: https://marc.info/?l=openbsd-tech&m=166776339203279&w=2
ok kettenis@ mlarkin@
|
|
Not all of the clocks with a delay(9) implementation necessarily keep
ticking across suspend/resume. We need a clean way to reverse
delay_init() during suspend when those clocks stop ticking.
Hence, delay_fini(). delay_fini() resets delay_func() to
i8254_delay() if the given function pointer is the active delay(9)
implementation.
ok mlarkin@
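
The pairing of delay_init() and delay_fini() can be sketched as below. This is a userland model under assumptions: the quality-based selection in delay_init() and the delay_quality variable are guesses at how the active implementation is chosen, and tsc_delay() is a placeholder for any clock that may stop ticking during suspend.

```c
#include <stddef.h>

typedef void (*delay_fn)(int usecs);

/* Placeholder bodies; real implementations would spin on a hardware clock. */
static void i8254_delay(int usecs) { (void)usecs; }	/* always available */
static void tsc_delay(int usecs) { (void)usecs; }	/* may stop in suspend */

static delay_fn delay_func = i8254_delay;	/* active delay(9) backend */
static int delay_quality = 0;			/* assumed ranking of backends */

/* Install fn if it is better than the currently active backend. */
static void
delay_init(delay_fn fn, int quality)
{
	if (quality > delay_quality) {
		delay_func = fn;
		delay_quality = quality;
	}
}

/* Undo delay_init(), but only if fn is the backend currently in use. */
static void
delay_fini(delay_fn fn)
{
	if (fn == delay_func) {
		delay_func = i8254_delay;
		delay_quality = 0;
	}
}
```

Because delay_fini() compares function pointers, a driver can call it unconditionally on suspend without knowing whether its clock ever won the selection.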
|
|
The read/write register routines for SVM didn't acknowledge RAX in
the VMCB as the de facto RAX state. When writing gprs, vmm should
update RAX in the VMCB. When reading, it should be setting the guest
regs state based on the VMCB.
Needed for proper mmio emulation in userland.
ok mlarkin@
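
A sketch of the fix, with the VMCB treated as the authoritative home of RAX. All structure and function names here are hypothetical simplifications of the vmm(4) data structures.

```c
#include <stdint.h>

struct vmcb_sketch { uint64_t v_rax; };		/* hardware-managed RAX */
struct gprs_sketch { uint64_t rax, rbx; };	/* software register file */

struct vcpu_sketch {
	struct vmcb_sketch	vmcb;
	struct gprs_sketch	gprs;
};

/* Writing registers: mirror RAX into the VMCB so the guest sees it. */
static void
write_gprs(struct vcpu_sketch *v, const struct gprs_sketch *in)
{
	v->gprs = *in;
	v->vmcb.v_rax = in->rax;
}

/* Reading registers: RAX comes from the VMCB, the rest from software. */
static void
read_gprs(const struct vcpu_sketch *v, struct gprs_sketch *out)
{
	*out = v->gprs;
	out->rax = v->vmcb.v_rax;
}
```

On SVM the processor saves and restores guest RAX through the VMCB across VMRUN and #VMEXIT, so a software copy goes stale as soon as the guest runs.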
|
|
of permitted addresses, done via .nofault* sections that end up in
the linked kernel's rodata.
ok deraadt@ kettenis@
|
|
Mischa Peters reported a performance regression in 7.2 when hosting
numerous guests under vmm(4). While iterating through the list of
vms when servicing an ioctl, vmm was triggering excessive wakeup
calls due to hitting zero refcnt.
Much guidance from dlg@ and testing from Mischa. OK mlarkin@.
|
|
fault because it does memory accesses outside of the regions it told us to
map. Try to mitigate this by installing a fault handler (using the
pcb_onfault mechanism) and bail out using longjmp(9) if we encounter a
page fault while executing an EFI runtime services call.
Since some firmware bugs result in us executing code that isn't mapped,
make kpageflttrap() handle execution faults as well as data faults.
ok guenther@
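
The bail-out pattern can be modelled in userland with setjmp(3) and longjmp(3). This is only an analogy under stated assumptions: the onfault pointer stands in for pcb_onfault, fault_trap() stands in for the kernel's page fault path, and both service functions are invented for the demonstration.

```c
#include <errno.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf *onfault;	/* stand-in for pcb_onfault */

/* Stand-in for the trap handler: jump to the recovery point if one is set. */
static void
fault_trap(void)
{
	if (onfault != NULL)
		longjmp(*onfault, 1);
}

/* Run fn with a fault handler installed; return EFAULT if it faulted. */
static int
guarded_call(void (*fn)(void))
{
	jmp_buf env;
	int error = 0;

	onfault = &env;
	if (setjmp(env) == 0)
		fn();			/* normal path */
	else
		error = EFAULT;		/* arrived here via longjmp */
	onfault = NULL;

	return error;
}

/* Demonstration services: one well behaved, one that "faults". */
static void good_service(void) { }
static void buggy_service(void) { fault_trap(); }
```

Clearing the handler on both paths matters: a stale recovery point would turn a later, unrelated fault into a jump to a dead stack frame.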
|
|
fork/vfork/__tfork haven't cared about the second return register.
So, stop setting retval[1] in kern_fork.c and stop setting the
second return register in the MD child_return() routines.
With the above, we have no multi-register return values on LP64,
so stop touching that register in the trapframe on those archs.
testing by miod@ and aoyama@
ok miod@
|
|
ok phessler@
|
|
Guests may need these MSRs to determine the TSC frequency on AMD
families 17h and 19h.
GP fault reported by weerd@, observed on "AMD EPYC 3201 8-Core Processor"
(17-01-02). Same issue observed by Jesper Wallin on "AMD Ryzen PRO 3700U".
Tested by Jesper Wallin.
Link: https://marc.info/?l=openbsd-bugs&m=166721628323483&w=2
ok mlarkin@
|
|
sys_execve() to return EJUSTRETURN.
setregs() is the MD routine used by sys_execve() to set up the
thread's trapframe and PCB such that, on 'return' to userspace, it
has the register values defined by the ABI and otherwise zero. It
had to set the syscall retval[] values previously because the normal
syscall return path overwrites a couple registers with the retval[]
values. By instead returning EJUSTRETURN, that and some complexity
with program-counter handling on m88k and sparc64 go away.
Also, give setregs() a 'struct ps_strings *arginfo' argument
so powerpc, powerpc64, and sh can directly get argc/argv/envp
values for registers instead of copyin()ing the one in userspace.
Improvements from miod@ and millert@
Testing assistance from miod@, kettenis@, and aoyama@
ok miod@ kettenis@
|
|
Compute the TSC frequency on AMD family 17h and 19h CPUs using the
PStateDef MSRs.
Link 1: https://marc.info/?l=openbsd-tech&m=166394236029484&w=2
Link 2: https://marc.info/?l=openbsd-tech&m=166446065916283&w=2
Test list: https://marc.info/?l=openbsd-tech&m=166646389821326&w=2
Reviewed by kettenis@ using the AMD documents cited in the comments.
Maybe reviewed by mlarkin@? I can't remember. He seemed supportive
of the idea at least.
ok kettenis@
|
|
The Dell Precision T1600 has a UEFI 2.0 implementation where calling
GetTime() accesses memory that isn't covered by a runtime mapping.
And frankly UEFI 2.0 is so ancient that we don't really want to use it
anyway.
This also adds the check to the arm64 version even though UEFI versions
before 2.4 don't have arm64 support. But for now I want to keep amd64
and arm64 code as similar as possible.
ok kn@
|
|
in the future to implement support for things like EFI variables.
ok krw@ (a few others ok'ed earlier incarnations of this diff)
|
|
|