Age | Commit message | Author |
|
OK miod@
|
|
it is a dangerous alternative entry point for all system calls, and thus
incompatible with the precision system call entry point scheme we are
heading towards. This has been a 3-year mission:
First perl needed a code-generated wrapper to fake syscall(2) as a giant
switch table, then all the ports were cleaned with relatively minor fixes,
except for "go". "go" required two fixes -- 1) a framework issue with
old library versions, and 2) like perl, a fake syscall(2) wrapper to
handle ioctl(2) and sysctl(2) because "syscall(SYS_ioctl" occurs all over
the place in the "go" ecosystem because the "go developers" are plan9-loving
unix-hating folk who tried to build an ecosystem without allowing "ioctl".
ok kettenis, jsing, afresh1, sthen
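A minimal sketch of what such a fake syscall(2) wrapper looks like (all names hypothetical; only the ioctl case shown):

    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <errno.h>
    #include <stdarg.h>
    #include <unistd.h>

    /* fake_syscall: route known syscall numbers to the real libc stubs */
    int
    fake_syscall(int number, ...)
    {
        va_list ap;
        int ret;

        va_start(ap, number);
        switch (number) {
        case SYS_ioctl: {
            int fd = va_arg(ap, int);
            unsigned long req = va_arg(ap, unsigned long);
            void *arg = va_arg(ap, void *);

            ret = ioctl(fd, req, arg);  /* the real stub */
            break;
        }
        /* ... one case per syscall the ecosystem actually uses ... */
        default:
            errno = ENOSYS;
            ret = -1;
            break;
        }
        va_end(ap);
        return ret;
    }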
|
|
doesn't make sense anymore. It is better to just issue an illegal
instruction.
ok kettenis, with some misgivings about inconsistent approaches between
architectures.
In the future we could change sigreturn(2) to never return an exit code,
but always just terminate the process. We stopped this system call
from being callable ages ago with msyscall(2), and there is no stub for
it in libc... maybe that's the next step to take?
|
|
descriptor (pted) pool in the arm64 pmap implementation. This
significantly reduces the side-effects of lock contention on the kernel
map lock that is (incorrectly) translated into excessive page daemon
wakeups. This is not a perfect solution but it does lead to significant
speedups on machines with many CPU cores.
This requires adding a new pmap_init_percpu() function that gets called
at the point where the kernel is ready to set up the per-CPU pool caches.
Dummy implementations of this function are added for all non-arm64
architectures. Some other architectures can probably benefit from
providing an actual implementation that sets up per-CPU caches for
pmap pools as well.
ok phessler@, claudio@, miod@, patrick@
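A sketch of the hook's two shapes, assuming pool_cache_init(9) and the arm64 pool name:

    /* MD stub for architectures without per-CPU pmap pool caches */
    void
    pmap_init_percpu(void)
    {
        /* nothing */
    }

    /* arm64 variant (sketch): enable per-CPU caches on the pted pool */
    void
    pmap_init_percpu(void)
    {
        pool_cache_init(&pmap_pted_pool);
    }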
|
|
The Intel SDM states the vmxon/vmxoff instructions don't invalidate
any EPT state on the cpu and recommends invalidating the global
context. vmm(4) opportunistically disables and enables VMX mode as
VMs are created or terminated, so this adds a recommended
housekeeping step per the SDM.
While here, tidy up the CR4 toggling by moving it to after the MSR
feature check.
ok mlarkin@
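A sketch of the added step, assuming vmm(4)'s invept() wrapper and descriptor type:

    /* after vmxon: invalidate all EPT contexts, per SDM recommendation */
    struct vmx_invept_descriptor vid;

    memset(&vid, 0, sizeof(vid));
    invept(IA32_VMX_INVEPT_GLOBAL_CTX, &vid);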
|
|
in front of the syscall instruction. This is used to calculate the start
of the syscall for SYS_sigreturn and pinned system calls.
ok kettenis
|
|
ok mlarkin@
|
|
While vmm's use of invvpid in the vmx vcpu run loop is questionable
since we require and use EPT, the vpid value is unquestionably wrong
in these calls.
ok mlarkin@
|
|
If the vcpu thread sleeps in the kernel, like when handling a nested
page fault and calling uvm_fault(9), the thread may be rescheduled
on another host cpu. vmm(4) was only setting the GDTR and TR bases
in the VMCS once prior to first vm entry, so a thread migration can
result in restoring the wrong GDTR and TR on vm exit for the host
cpu. This results in borked interrupts and corrupted stack pointers,
causing programs to segfault or sigabort. It can also result in
missed IPIs, causing kernel deadlocks.
Use similar logic to the SVM routines and check for cpu migration
within the hot loop. Since we're letting the VMX features of the
cpu restore GDTR, we can also drop the manual store/load routines.
Reported and with much appreciated testing help from Mischa Peters.
ok mlarkin@
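Roughly the shape of the check added to the hot loop; the field and VMCS names follow vmm(4) and the SDM, but treat the details as assumptions:

    /* did we migrate to another host cpu since the last entry? */
    if (ci != vcpu->vc_last_pcpu) {
        /* refresh host GDTR/TR bases in the VMCS for this cpu */
        vmwrite(VMCS_HOST_IA32_GDTR_BASE, (uint64_t)ci->ci_gdt);
        vmwrite(VMCS_HOST_IA32_TR_BASE, (uint64_t)ci->ci_tss); /* assumption */
    }
    vcpu->vc_last_pcpu = ci;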
|
|
During boot TSC initialization could fail with panic: tsc_test_sync_ap:
cpu2: tsc_ap_name is not NULL: cpu1.
The root cause is a race between the moment the application processor
sets CPUF_IDENTIFIED in cpu_hatch() and the moment the boot processor
checks CPUF_IDENTIFIED in cpu_start_secondary() before the TSC sync
test.
The fix is to set CPUF_IDENTIFIED before clearing CPUF_IDENTIFY in
cpu_hatch().
from hshoexer@ cheloha@; OK deraadt@ mlarkin@
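The ordering fix, sketched (flag and helper names as on amd64):

    /* cpu_hatch(): publish identify results before signaling done */
    identifycpu(ci);
    atomic_setbits_int(&ci->ci_flags, CPUF_IDENTIFIED);
    atomic_clearbits_int(&ci->ci_flags, CPUF_IDENTIFY);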
|
|
svm_handle_np_fault(): clearer output
ok mlarkin@
|
|
Mostly a dummy commit so that the last llvm commit ends up in the git export.
(No idea whether it's actually/still needed but it can't hurt.)
|
|
The first one is defined as the second, so no functional change.
ok claudio
|
|
ok yasuoka@
|
|
where a switch happens outside. Clean up these code paths and make them
machine independent.
- when a process forks (fork, tfork, kthread), the new proc needs to
somehow be scheduled for the first time. This is done by proc_trampoline.
Since proc_trampoline is machine-dependent assembler code, change
the MP-specific proc_trampoline_mp() to proc_trampoline_mi() and make
sure it is now always called.
- cpu_hatch: when booting APs the code needs to jump to the first proc
running on that CPU. This should be the idle thread for that CPU.
- sched_exit: when a proc exits it needs to switch away from itself and
then instruct the reaper to clean up the rest. This is done by switching
to the idle loop.
Since the last two cases require a context switch to the idle proc,
factor out the common code into sched_toidle() and use it in those places.
Tested by many on all archs.
OK miod@ mpi@ cheloha@
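A sketch of the factored-out tail shared by sched_exit() and the AP bring-up path:

    /* sched_exit() sketch: hand the rest to the reaper, then leave */
    void
    sched_exit(struct proc *p)
    {
        /* ... stop clocks, account runtime, wake the reaper ... */
        sched_toidle();     /* switch to this CPU's idle proc */
        /* NOTREACHED */
    }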
|
|
reminded by jsg@, snapshot test build done by me
|
|
For now, only attach to PSE0/RGMII (device ID 0x4ba0), which is the
only device I have access to for testing.
There is a known problem where Tx throughput is lower than expected.
This is being looked into.
ok kettenis@
|
|
|
|
All the state initialization once done in clockintr_init() has been
moved to other parts of the kernel. It's a dead function. Remove it.
Likewise, the clockintr_flags variable no longer sports any meaningful
flags. Remove it. This frees up the CL_* flag namespace, which might
be useful to the clockintr frontend if we ever need to add behavior
flags to any of those functions.
|
|
In order to separate the statclock from the clock interrupt subsystem
we need to move all statclock state out into the broader kernel.
Start by replacing the CL_RNDSTAT flag with a new global variable,
"statclock_is_randomized", in kern_clock.c. Update all clockintr_init()
callers to set the boolean instead of passing the flag.
Thread: https://marc.info/?l=openbsd-tech&m=169428749720476&w=2
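A former caller, before and after (sketch):

    /* before: clockintr_init(CL_RNDSTAT); */
    statclock_is_randomized = 1;    /* new global in kern_clock.c */
    clockintr_init(0);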
|
|
avoids a General-Protection Exception on patch loader wrmsr with
A10-5700, TN-A1 00610f01 15-10-01
the alignment requirement is not present on at least
Ryzen 5 2600X, PiR-B2 00800f82 17-08-02
problem reported and fix tested by espie@
|
|
7.3 is long gone, you must have new bootloaders and new kernels.
Zaps both the condition and the else block; unindents and merges lines where they fit.
Feedback OK kettenis
Tests OK denis
|
|
To remove an ioctl(2) from the vcpu thread hotpath in vmd(8), add
a flag in the vm_run_params structure to indicate if there's another
interrupt pending. This reduces latency in vcpu work related to
i/o as we save a trip into the kernel just to flip the interrupt
pending flag on or off.
Tested by phessler@, mbuhl@, stsp@, and Mischa Peters.
ok mlarkin@
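A sketch of the vmd(8) side; the vrp_intr_pending field name and the helper are assumptions:

    /* run one vcpu slice; vrp_intr_pending tells vmm(4) whether more
     * guest interrupts are queued, replacing a separate ioctl */
    static int
    run_vcpu_once(int vmm_fd, struct vcpu *vcpu, struct vm_run_params *vrp)
    {
        vrp->vrp_intr_pending = vcpu_intr_more(vcpu);  /* hypothetical */
        return ioctl(vmm_fd, VMM_IOC_RUN, vrp);
    }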
|
|
no longer needed with zlib 1.3
ok tb@
|
|
dv points out that there are other bits there that imply the existence
of other MSRs, so switching this to an include list is a better idea.
|
|
On newer Ryzen/EPYC, we need to hide the HwPstate field in CPUID
0x80000007:EDX, or guests will try to access the MSRs associated
with it, and that will fail with #GP.
ok deraadt
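Sketched as a fragment of the CPUID exit handler; the mask name is an assumption:

    case 0x80000007:    /* APM / power management */
        *rdx &= ~CPUIDEDX_HWPSTATE;    /* hide HwPstate MSRs */
        break;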
|
|
To give the primary CPU an opportunity to perform clock interrupt
preparation in a machine-independent manner we need to separate the
"initialization" parts of cpu_initclocks() from the "start the clock
interrupt" parts. Currently, cpu_initclocks() does everything all at
once, so there is no space for this MI setup.
Many platforms have more-or-less already done this separation by
implementing a separate routine named "cpu_startclock()". This patch
promotes cpu_startclock() from de facto standard to mandatory API.
- Prototype cpu_startclock() in sys/systm.h alongside cpu_initclocks().
The separation of responsibility between the two routines is a bit
fuzzy but the basic guidelines are as follows:
+ cpu_initclocks() must initialize hz, stathz, and profhz, and call
clockintr_init().
+ cpu_startclock() must call clockintr_cpu_init() and start the clock
interrupt cycle on the calling CPU.
These guidelines will shift in the future, but that's the way things
stand as of *this* commit.
- In initclocks(): first call cpu_initclocks(), then do MI setup, and
last call cpu_startclock().
- On platforms where cpu_startclock() already exists: don't call
cpu_startclock() from cpu_initclocks() anymore.
- On platforms where cpu_startclock() doesn't yet exist: implement it.
Usually this is as simple as dividing cpu_initclocks() in two.
Tested on amd64 (i8254, lapic), arm64, i386 (i8254, lapic), macppc,
mips64/octeon, and sparc64. Tested on arm/armv7 (agtimer(4)) by
phessler@ and jmatthew@. Tested on m88k/luna88k by aoyama@. Tested
on powerpc64 by gkoehler@ and mlarkin@. Tested on riscv64 by
jmatthew@.
Thread: https://marc.info/?l=openbsd-tech&m=169195251322149&w=2
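The resulting machine-independent flow, sketched:

    /* initclocks(): MD init, then MI setup, then start the cycle */
    void
    initclocks(void)
    {
        cpu_initclocks();   /* MD: hz/stathz/profhz, clockintr_init() */

        /* ... MI clock interrupt preparation goes here ... */

        cpu_startclock();   /* MD: clockintr_cpu_init(), start clock */
    }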
|
|
mentioned in
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
|
|
|
|
|
|
ok guenther@ deraadt@
|
|
or IBT enabled in the kernel, the hardware should block the attacks which
retpolines were created to prevent. In those cases, retpolines
should be a net negative for security as they are an indirect branch
gadget. They're also slower.
* use -mretpoline-external-thunk to give us control of the code
used for indirect branches
* default to using a retpoline as before, but mark it and the
other ASM kernel retpolines for code patching
* if the CPU has eIBRS, then enable it
* if the CPU has eIBRS *or* IBT, then codepatch the three different
retpolines to just indirect jumps
make clean && make config required after this
ok kettenis@
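Boot-time logic, sketched with assumed flag, tag, and snippet names:

    if (cpu_has_eibrs)              /* assumed feature flag */
        setup_eibrs();              /* hypothetical helper */
    if (cpu_has_eibrs || cpu_has_ibt) {
        /* patch the marked retpoline thunks into plain jumps */
        codepatch_replace(CPTAG_RETPOLINE_RAX, codepatch_jmp_rax,
            codepatch_jmp_rax_len);
    }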
|
|
const.
suggested by bluhm@
|
|
|
|
of code to use in codepatching. Use that for all the existing
codepatching snippets.
Similarly, add CODEPATCH_CODE_LEN() which is CODEPATCH_CODE() but also
provides a short variable holding the length of the codepatch snippet.
Use that for some snippets that will be used for retpoline replacement.
ok kettenis@ deraadt@
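Usage, sketched in assembler; the snippet names are assumptions:

    /* emit a named snippet into the codepatch section */
    CODEPATCH_CODE(codepatch_ret, ret)
    /* same, plus a codepatch_jmp_rax_len variable for the length */
    CODEPATCH_CODE_LEN(codepatch_jmp_rax, jmp *%rax)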
|
|
ok mlarkin@ kettenis@ deraadt@
|
|
ok mlarkin@
|
|
invoked using indirect branches and should have endbr64's.
ok deraadt@
|
|
indirect pointer itself and provide an initializer for it that goes
to the default "just enable interrupts and halt" path.
ok kettenis@
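The default path behind the pointer, sketched (pointer name as on amd64):

    static void
    cpu_idle_cycle_default(void)
    {
        /* enable interrupts and halt until the next one arrives */
        __asm volatile("sti; hlt");
    }

    void (*cpu_idle_cycle_fcn)(void) = cpu_idle_cycle_default;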
|
|
This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.
- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().
- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().
- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().
- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.
- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.
- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().
With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.
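The mi_switch() hook, sketched with simplified signatures:

    /* arm or disarm this CPU's profclock for the incoming thread */
    if (p->p_flag & P_PROFIL)
        clockintr_advance(spc->spc_profclock, profclock_period);
    else
        clockintr_cancel(spc->spc_profclock);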
|
|
away the calls
ok deraadt@ mpi@ miod@
|
|
not indicate bit 9 set, but they could have a firmware fix) but then block
an MSR write to bit 9 (which disables enough AVX optimizations
to prevent the exfiltration of data) with a fault. So let's also check
the HV bit before we decide to modify the bit. Hypervisors are expected
to set that bit. Tested by lucas at sexy dot is.
with jsg, ok mlarkin
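The guard, sketched; the MSR and bit names are assumptions, CPUIDECX_HV is the hypervisor CPUID bit:

    /* only touch the chicken bit on bare metal */
    if ((cpu_ecxfeature & CPUIDECX_HV) == 0)
        wrmsr(MSR_DE_CFG, rdmsr(MSR_DE_CFG) | DE_CFG_SERIALIZE_9);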
|
|
have other side-effects (not disclosed by AMD), and firmware fixes may
be better (and have other side-effects, same story). Newer processors
will probably be validated more carefully by AMD.
Issue found by Tavis Ormandy.
This is errata 7.2/033_amdcpu.patch.sig and 7.3/011_amdcpu.patch.sig
Zenbleed also blocked on select cpus by using errata
7.3/012_amdfirmware.patch.sig + 7.3/013_amdcpufirmware.patch /
7.2/034_amdfirmware.patch.sig + 7.2/035_amdcpufirmware.patch.sig
which load AMD cpu firmwares (firmware.openbsd.org is updated often to
contain the best firmwares)
ok jsg
|
|
ok deraadt@
|
|
ok deraadt@
|
|
Provide more ARCH_CAP_* defines per June 2023 SDM
ok jsg@ deraadt@
|
|
done for GENERIC already.
ok kettenis kn
|
|
to save/restore the state and enabling it at exec-time (and for
signal handling) if the PS_NOBTCFI flag isn't set.
Note: this changes the format of the sc_fpstate data in the signal
context to possibly be in compressed format: starting now we just
guarantee that the state is in a format understood by the XRSTOR
instruction of the system being executed on.
At this time, passing sigreturn a corrupt sc_fpstate now results
in the process exiting with no attempt to fix it up or send a
T_PROTFLT trap. That may change.
prodding by deraadt@
issues with my original signal handling design identified by kettenis@
lots of base and ports preparation for this by deraadt@ and the
libressl and ports teams
ok deraadt@ kettenis@
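The sigreturn-side check, sketched; the helper name and return convention are assumptions:

    /* XRSTOR the user-supplied state; on fault, the process exits */
    if (xrstor_user(sc->sc_fpstate, xsave_mask)) {
        sigexit(p, SIGILL);
        /* NOTREACHED */
    }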
|
|
amd64 and i386. This is the first step towards a machine independent
safe sleep API.
tested by yasuoka@ bluhm@
ok deraadt@ kettenis@
|
|
While UEFI 2.10 has a way of indicating that runtime services use the
appropriate ENDBR64 instructions, firmware that's out in the wild doesn't
actually use that yet. Once the landscape changes we may want to
reconsider toggling IBT off.
ok guenther@, kn@
|