the binding to global (NB == "no binding"), as clang 13 is now
warning about changing the binding from global to weak.
This first pass does amd64 and sparc64 and pulls DEFS.h out of the
per-arch directory to a common directory; others to follow
ok kettenis@
doing the consistency check for locore.o; ok millert@
the com(4) devices to match the traditional order on the ISA bus.
ok patrick@, anton@
Not enabled yet. Pending firmware availability.
ok stsp@ jmatthew@
with LLVM 13.
assignment compared to the legacy one supported by com over isa.
This causes the console to halt once userland takes over, as no
interrupts are received. The actual address and irq can be read from
ACPI; kettenis@ already added support for arm64, which paved the way
for amd64.
Some consoles that previously attached over isa are now expected to
attach over acpi.
Thanks to patrick@ for testing on arm64.
ok kettenis@
invariant TSC and report that correctly in the guest's cpuid(0).eax
prompted by debug messages in a report from Josh Grosse (josh(at)jggimi.net)
ok mlarkin@
SYS_syscall as the nosys() function into the MD syscall entry
routines and the SYSCALL_DEBUG support. Adjust alpha's syscall
check to match the other archs. Also, make sysent const to get it
into .rodata.
With that, 'struct emul' is unused: delete it and all its references.
ok millert@
ok dv@
Reported-by: syzbot+c773ba1ce9b2d259d27f@syzkaller.appspotmail.com
Guests running on Intel hosts that sleep on a lock might have their
process moved to another cpu core by the scheduler. If this happens,
the VMCS needs to be remotely cleared and locally loaded otherwise
vmx instructions will fail. vmd(8) will receive a failure code and
abort the guest.
This change stores the current (last) cpu the process was on before
attempting a function call that may sleep (e.g. uvm_fault(9)). Upon
function return, perform the VMCS dance if needed.
Tested with help from Mischa Pieters.
OK mlarkin@
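
Roughly, the dance looks like this (a simplified sketch, not the
verbatim diff; the remote clear actually requires an IPI to the old
cpu, which is elided here):

    struct cpu_info *ci;
    int error;

    ci = curcpu();              /* remember the cpu before sleeping */
    error = uvm_fault(map, va, VM_FAULT_WIRE, PROT_READ | PROT_WRITE);
    if (ci != curcpu()) {
            /* the scheduler moved us: clear old VMCS, load locally */
            if (vmclear(&vcpu->vc_control_pa) ||
                vmptrld(&vcpu->vc_control_pa))
                    return (EINVAL);    /* vmd(8) aborts the guest */
    }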
Older/slower hosts could easily hit the cap under load. Set it to
the same value we use in mplock_debug.
ok mlarkin@
this hides real problems that could be found at build time
ok kettenis@ visa@, ok sashan@ on amd64/i386
Partly related to a bug reported by kn@. We should be copying out
the guest exit state (including registers) when we successfully
return from the vcpu run loop, even if we don't require an emulation
assist from userland/vmd(8). This condition was introduced when I
removed the use of yield() and instead exit the kernel if the
scheduler says we've hogged the cpu.
ok mlarkin@
semicolon
ok deraadt@
as self_reloc.c only handles the former.
ok deraadt@ kettenis@
API implemented is a dead end.
OK akoshibe@ yasuoka@ deraadt@ kn@ patrick@ sthen@
ok deraadt@
Tested by kevlo@
shows gains up to 50%) by skipping attach of irrelevant devices, which are
tagged CD_SKIPHIBERNATE in the per-driver cfdriver. In particular, usb devices
are not attached, so they don't need to detach during the suspend-unpack-resume.
New bootblocks are required (which tell the kernel its job is to
unhibernate before configure runs)
tested by various
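
For illustration, the tag is a flag in the per-driver cfdriver, which
the unhibernate path can test; a sketch (the flag check and the
"unhibernating" variable are illustrative, not the verbatim autoconf
code):

    /* tag a driver whose devices need not attach during unhibernate */
    struct cfdriver uhub_cd = {
            NULL, "uhub", DV_DULL, CD_SKIPHIBERNATE
    };

    /* skip such devices while resuming from hibernate */
    if (unhibernating && (cf->cf_driver->cd_mode & CD_SKIPHIBERNATE))
            return;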
O_RDONLY rather than using 0
ok beck
and returned the error, which made the MI crypto code set the etype for
a second time. We still have to set etype after calling the MD process
function, as the callers of crypto_invoke() still expect error handling
to be shown through the etype. But at least now the MD crypto code
does not have to worry about that anymore. Once the callers are
changed to stop looking at etype, we can get rid of it completely.
ok tobhe@
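
Schematically, inside crypto_invoke() (a sketch; the process-function
member name is illustrative):

    error = (*process)(crp);    /* MD driver just returns its error */
    crp->crp_etype = error;     /* MI code sets etype, exactly once */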
adds unnecessary complexity. Dedicated crypto offloading devices are not common
anymore. Modern CPU crypto acceleration works synchronously, eliminating the need
for callbacks.
Replace all occurrences of crypto_dispatch() with crypto_invoke(), which is
blocking and only returns after the operation has completed or an error occurred.
Invoke callback functions directly from the consumer (e.g. IPsec, softraid)
instead of relying on the crypto driver to call crypto_done().
ok bluhm@ mvs@ patrick@
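
The consumer-side pattern becomes, schematically (the IPsec callback
name is illustrative):

    crypto_invoke(crp);             /* blocks; error lands in crp_etype */
    error = ipsec_input_cb(crp);    /* called directly, no crypto_done() */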
the asynchronous crypto API, which makes progress in MP difficult.
The hardware is rarely available and supports only obsolete crypto
algorithms. Scheduling crypto tasks via PCI is probably slower
than the CPU, especially as modern CPUs have their own accelerators.
the callback was called, and sometimes both. So the caller of that
API could not release resources correctly.
A bunch of errors cannot or should not happen; replace them with
asserts. Remove redundant checks. crypto_invoke() should not return
the error, but pass it via callback.
Some old hardware drivers keep part of their inconsistency as I
cannot test them.
OK mpi@
data from struct process anymore. This changes how siginfo and onstack
are accessed and makes sendsig() more MP friendly.
With and OK semarie@ OK kettenis@
while walking the page tables.
ok mpi@, deraadt@
ok deraadt
After fixing previous syzbot issues related to lock contention, the
reproducer code managed to hit an issue where it can exhaust kernel
memory by allocating vcpus. Since each vcpu (regardless of whether
it's SVM- or VMX-capable) requires wiring some number of pages of
memory, it was possible to starve other parts of the kernel.
This change limits the total number of vcpus to 512, a conservative
number given vmm(4) only supports single-vcpu guests at the moment.
ok mlarkin@
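
A sketch of the cap check at vcpu creation time (names follow vmm(4)
conventions but are not verbatim from the diff):

    #define VMM_MAX_VCPUS   512

    /* in vm_create(): refuse vcpus beyond the global cap */
    if (vmm_softc->vcpu_ct + vcp->vcp_ncpus > VMM_MAX_VCPUS)
            return (ENOMEM);
    vmm_softc->vcpu_ct += vcp->vcp_ncpus;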
are ongoing.
This prevents possible corruption due to a concurrent access between
pmap_growkernel() & pmap_create/pmap_destroy().
Discussed with and ok kettenis@
Similar to the recent change by mpi in revision 1.288, commitid:
A4zhVhOoHAIpRGBJ, raise the ipl level of the vm_pool to IPL_MPFLOOR
to prevent lock ordering issues.
ok mpi@
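
Concretely, this is the ipl argument of pool_init(); a sketch:

    /* allocations may now happen with other MP-safe locks held */
    pool_init(&vm_pool, sizeof(struct vm), 0, IPL_MPFLOOR, PR_WAITOK,
        "vmpool", NULL);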
Syzbot found 3 issues related to the new vcpu lock. This diff adds
a write lock to vm_rwregs (needed on VMX as vmread instructions
require taking ownership of the vcpu to load the VMCS) and prevents
locking the vcpu in vm_run if we fail the cas operation for toggling
vcpu state.
In the future, we can push the locking in vm_rwregs on AMD SVM
systems.
The panics in question:
panic: rw_enter: vcpulock locking against myself
panic: lock (rwlock) vcpulock not locked
panic: vcpulock: lock not held
Reported-by: syzbot+1dab11e14aa7a159cadf@syzkaller.appspotmail.com
Reported-by: syzbot+36244e105daffa1a81b6@syzkaller.appspotmail.com
Reported-by: syzbot+c78b5644c7dc3d9b689a@syzkaller.appspotmail.com
ok mlarkin@
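
Schematically, the two changes look like this (simplified from the
actual diff; helper and field names are illustrative):

    /* vm_rwregs(): loading the VMCS means owning the vcpu on VMX */
    rw_enter_write(&vcpu->vc_lock);
    error = vcpu_readregs_vmx(vcpu, vrwp);
    rw_exit_write(&vcpu->vc_lock);

    /* vm_run(): only take the lock if we won the state transition */
    if (atomic_cas_uint(&vcpu->vc_state, VCPU_STATE_STOPPED,
        VCPU_STATE_RUNNING) != VCPU_STATE_STOPPED)
            return (EBUSY);
    rw_enter_write(&vcpu->vc_lock);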
Some pmaps (x86, hppa) and the buffer cache rely on UVM objects to allocate
and manipulate pages. These objects should not be manipulated by uvm_fault()
and do not currently require the same locking enforcement.
Use the dummy pagers to explicitly document which UVM functions are meant to
manipulate UVM objects (uobj) that do not need the upcoming `vmobjlock' and
instead still rely on the KERNEL_LOCK().
Tested by many as part of a larger diff.
ok kettenis@, beck@
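
A dummy pager is simply a uvm_pagerops with no methods, attached when
the object is initialized; a minimal sketch (the pm_obj member shown
is illustrative):

    /* no operations: uvm_fault() must never see such an object */
    const struct uvm_pagerops pmap_pager = {
            /* all NULL */
    };

    /* e.g. when creating the pmap's UVM object */
    uvm_obj_init(&pmap->pm_obj, &pmap_pager, 1);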
IBRS feature need an lfence instruction after every near ret. Place
them after all functions in the kernel which are implemented in
assembler. Change the retguard macro so that the end of the lfence
instruction is 16-byte aligned now. This prevents the ret instruction
from sitting at the end of a 32-byte boundary. The latter would
cause a performance impact on certain Intel processors which have
a microcode update to mitigate the jump conditional code erratum.
See AMD's "Software Techniques for Managing Speculation on AMD
Processors", revision 9.17.20, mitigation G-5.
See Intel's "Mitigations for Jump Conditional Code Erratum", revision
1.0, November 2019, section 2.4, software guidance and optimization
methods.
OK deraadt@ mortimer@
since amd64 is compiled with -msave-args, we have all arguments
available to print, and there's no reason to limit this to six.
discussed with kettenis@
this allows us to dynamically trace function boundaries with btrace by
patching prologues and epilogues with a breakpoint, upon which the
handler records the data and sends it back to userland for btrace to
consume.
currently it's hidden behind DDBPROF, and there is still a lot to cleanup and
improve, but basic scripts that observe return codes from a probed function
work.
from Tom Rollet, with various changes by me
feedback and ok mpi@
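
Conceptually, the patching is a one-byte write (an illustrative
sketch, not the actual DDBPROF code; the ddb helper's exact signature
may differ):

    uint8_t bkpt = 0xcc;        /* int3 on amd64 */

    /* the trap handler recognizes the address, records registers,
     * and hands the event to btrace */
    db_write_bytes(prologue_va, sizeof(bkpt), &bkpt);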
We need the kernel lock before calling some uvm functions. Fixes a
panic reported by syzbot.
Reported-by: syzbot+dd7a70eaf794705db27e@syzkaller.appspotmail.com
ok mlarkin@
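
The fix is the usual bracketing pattern (sketch; the uvm call shown
is illustrative):

    KERNEL_LOCK();
    ret = uvm_fault(map, va, VM_FAULT_WIRE, PROT_READ | PROT_WRITE);
    KERNEL_UNLOCK();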
Adds support for the Aquantia AQC1xx family of PCIe ethernet adapters. This
driver supports 1Gbps through 10Gbps modes of operation based on the
hardware and media/switch capabilities.
The initial code was ported from NetBSD, with jmatthew@ finishing up
the Tx/Rx ring support and interrupt handler routine.
The driver only supports devices using firmware V2.
This diff enables aq(4) on riscv64 and amd64, the only platforms where
I have tested the driver, but it likely works on other architectures
as well.
Syzbot might complain about "new" panics, but to help debug a recent
report it helps to have unique rw lock names.
"sounds good to me" @mlarkin
Reported-by: syzbot+c8905496cd61610f77e2@syzkaller.appspotmail.com
ok mlarkin@
to stop speculation. This seems to be necessary when the branch
predictor hits the ret for the first time. In their white paper
to mitigate speculation attacks, AMD's retpoline example has an
explicit lfence. Adjust our retpoline assembly macro in the kernel.
OK guenther@ mortimer@ deraadt@