Age | Commit message | Author |
|
This should be sufficient for identifying pivoted ROP. Doing so for other
traps is at best opportunistic for finding a straight-running ROP chain,
but the added (and rare) sleeping point has proven to be dangerous.
Discussed at length with kettenis and mortimer.
ok mortimer kettenis mpi
|
|
Apple machines. The driver attaches through acpi(4) when the HID
'APP0002' is found.
Thanks to kettenis@ for helping me sort out the PCI bits.
ok kettenis@
|
|
The fault address is read from cr2 in pageflttrap() which
gets called after this check and if the check sleeps, cr2 is likely to
be clobbered by a page fault in another process.
Fix this by reading cr2 early and passing it to pageflttrap().
ok mpi@, semarie@, deraadt@
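
A minimal sketch of the ordering problem described above, with the register modeled as a plain variable. Only cr2 and pageflttrap() come from the commit message; sim_cr2, check_that_may_sleep(), trap_sketch() and pageflttrap_sketch() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated CR2: the hardware overwrites it on every page fault. */
static uint64_t sim_cr2;

/*
 * Hypothetical stand-in for a check that may sleep. While this
 * process sleeps, another process faults and clobbers CR2.
 */
static void
check_that_may_sleep(void)
{
	sim_cr2 = 0xdeadbeef;
}

/* The handler receives the fault address instead of reading CR2. */
static uint64_t
pageflttrap_sketch(uint64_t fault_addr)
{
	return fault_addr;
}

static uint64_t
trap_sketch(void)
{
	uint64_t cr2 = sim_cr2;	/* read early, before anything can sleep */

	check_that_may_sleep();
	return pageflttrap_sketch(cr2);
}
```

With sim_cr2 preset to a fault address, trap_sketch() still hands that address to the handler even though the simulated register has been overwritten in the meantime.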
|
|
with this we can revert the recent coherency workaround in mesa
ok deraadt@ kettenis@
|
|
ok kettenis@ deraadt@
|
|
Now that asmc(4) attaches through acpi(4), unlike with isa(4), acpi(4)
could attach multiple SMC chips in theory, even though in practice there
will be only one SMC chip per machine.
Suggested and ok kettenis@
|
|
This e.g. makes the driver also work on iMac11,2.
ok kettenis@, jung@
|
|
The header is being pulled via db_machdep.h -> uvm_extern.h -> uvm_map.h
|
|
from jordan@
|
|
In preparation for running the lapic timer in oneshot mode on amd64 we
need a replacement for lapic_delay().
Using the lapic timer itself to implement delay(9) when the timer is
not running in periodic mode is complicated if not outright impossible.
Meanwhile, the i8254 provides our only other amd64 delay(9) implementation
and it is an extremely slow clock. On my 2GHz machine, gettick() takes
~20 microseconds to complete *without* mutex contention. On a VM it is
even slower, as you must exit the VM for each inb() and outb().
So, add tsc_delay() and use it when we have a constant/invariant TSC.
The TSC is a 64-bit "up-counter" so the implementation is simple.
Given how slow the i8254 is on modern machines, we may want to add an
HPET delay(9) implementation as a fallback for machines where the TSC
drifts. The HPET itself is pretty slow, but not as slow as the i8254.
Discussed with kettenis@, Mike Larkin, and naddy@.
Tweaked by kettenis@.
ok kettenis@
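
A rough sketch of why a 64-bit up-counter makes the delay loop simple: convert the requested time to cycles, then spin until that many cycles have elapsed. This is not the committed tsc_delay(); the counter is simulated so the example runs anywhere, and sim_tsc, read_counter() and tsc_delay_sketch() are assumed names.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 64-bit up-counter standing in for the TSC. */
static uint64_t sim_tsc;

static uint64_t
read_counter(void)
{
	sim_tsc += 100;		/* pretend 100 cycles pass per read */
	return sim_tsc;
}

/* Busy-wait for at least `usecs` microseconds at `freq_hz`. */
static void
tsc_delay_sketch(uint64_t freq_hz, unsigned int usecs)
{
	uint64_t start, cycles;

	assert(freq_hz >= 1000000);
	cycles = (freq_hz / 1000000) * usecs;
	start = read_counter();
	/* Unsigned subtraction stays correct across counter wraparound. */
	while (read_counter() - start < cycles)
		;	/* spin */
}
```

At 2 GHz a 5 microsecond delay works out to 10000 cycles, so the loop exits once the counter has advanced at least that far past the starting value.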
|
|
We reprogram the lapic timer by hand in three separate places.
This is error-prone and difficult to read.
To clean things up, introduce routines for reprogramming the lapic
timer in a given mode. lapic_timer_oneshot() starts a oneshot
countdown. lapic_timer_periodic() starts a repeating countdown.
Both of these routines call lapic_timer_start(), wherein we actually
write the lapic registers.
With input from dlg@.
Earlier version eyeballed by mlarkin@.
Suspend/resume tested by gnezdo@.
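
The layering described above could look roughly like this. The register block is simulated with a struct, and the struct, the SIM_ constant and the _sketch function signatures are assumptions for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Timer mode bit in the LVT timer entry (bit 17 selects periodic). */
#define SIM_LVT_PERIODIC	0x00020000u

/* Hypothetical model; the real lapic registers are memory-mapped. */
struct sim_lapic {
	uint32_t lvt_timer;
	uint32_t initial_count;
};

/* The single place where the (simulated) registers are written. */
static void
lapic_timer_start_sketch(struct sim_lapic *l, uint32_t mode,
    uint32_t vector, uint32_t cycles)
{
	l->lvt_timer = mode | vector;
	l->initial_count = cycles;	/* writing the count arms the timer */
}

/* One countdown, then the timer stops. */
static void
lapic_timer_oneshot_sketch(struct sim_lapic *l, uint32_t vector,
    uint32_t cycles)
{
	lapic_timer_start_sketch(l, 0, vector, cycles);
}

/* Countdown reloads automatically after each expiry. */
static void
lapic_timer_periodic_sketch(struct sim_lapic *l, uint32_t vector,
    uint32_t cycles)
{
	lapic_timer_start_sketch(l, SIM_LVT_PERIODIC, vector, cycles);
}
```

Funnelling both modes through one start routine means the register write order lives in a single function rather than being repeated at each call site.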
|
|
at the beginning of the loop. We need to use cr3 at the start of each
iteration for the top level page directory.
From and ok sf@
|
|
on the HP EliteBook 830 G6 we added a workaround which tries to re-map
the pages where we want to place the kernel read-write. On some machines
though this workaround causes a regression. Fix those by changing a few
things: Only set the writeable bit if it isn't set yet. Un-protect
write-protected page directories. Skip lower levels if large-page is
set, since the next level is already a page. Don't do anything at all
if paging is disabled.
From Christian Ehrhardt
ok bluhm@ tobhe@
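
The four rules listed above can be sketched on a toy two-level table. This is an illustration under simplifying assumptions, not the boot loader's code: a real amd64 walk has four levels, and struct sim_pd, struct sim_pt and make_writable_sketch() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_V	0x001ULL	/* present */
#define PG_RW	0x002ULL	/* writable */
#define PG_PS	0x080ULL	/* large page: entry maps a page, not a table */

/* Simplified two-level model of a page directory and page tables. */
struct sim_pt { uint64_t pte[4]; };
struct sim_pd { uint64_t pde[4]; struct sim_pt *pt[4]; };

/* Returns the number of entries that had to be modified. */
static int
make_writable_sketch(struct sim_pd *pd, int paging_enabled,
    size_t di, size_t ti)
{
	int changed = 0;

	assert(di < 4 && ti < 4);
	if (!paging_enabled)
		return 0;		/* nothing to do without paging */
	if (!(pd->pde[di] & PG_V))
		return 0;
	if (!(pd->pde[di] & PG_RW)) {	/* un-protect the directory entry */
		pd->pde[di] |= PG_RW;
		changed++;
	}
	if (pd->pde[di] & PG_PS)
		return changed;		/* large page: no lower level */
	if (pd->pt[di] != NULL && (pd->pt[di]->pte[ti] & PG_V) &&
	    !(pd->pt[di]->pte[ti] & PG_RW)) {
		pd->pt[di]->pte[ti] |= PG_RW;	/* only set if not yet set */
		changed++;
	}
	return changed;
}
```

Calling it twice on the same entry modifies nothing the second time, which is the "only set the writeable bit if it isn't set yet" behaviour.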
|
|
OK deraadt@, mpi@
|
|
Regarding RDTSC, the Intel ISA reference says (Vol 2B. 4-545):
> The RDTSC instruction is not a serializing instruction.
>
> It does not necessarily wait until all previous instructions
> have been executed before reading the counter.
>
> Similarly, subsequent instructions may begin execution before the
> read operation is performed.
>
> If software requires RDTSC to be executed only after all previous
> instructions have completed locally, it can either use RDTSCP (if
> the processor supports that instruction) or execute the sequence
> LFENCE;RDTSC.
To mitigate this problem, Linux and DragonFly use LFENCE. FreeBSD and
NetBSD take a more complex route: they selectively use MFENCE, LFENCE,
or CPUID depending on whether the CPU is AMD, Intel, VIA or something
else.
Let's start with just LFENCE. We only use the TSC as a timecounter on
SSE2 systems so there is no need to conditionally compile the LFENCE.
We can explore conditionally using MFENCE later.
Microbenchmarking on my machine (Core i7-8650) suggests a penalty of
about 7-10% over a "naked" RDTSC. This is acceptable. It's a bit of
a moot point though: the alternative is a considerably weaker
monotonicity guarantee when comparing timestamps between threads,
which is not acceptable.
It's worth noting that kernel timecounting is not *exactly* like
userspace timecounting. However, they are similar enough that we can
use userspace benchmarks to make conjectures about possible impacts on
kernel performance.
Concerns about kernel performance, in particular the network stack,
were the blocking issue for this patch. Regarding networking
performance, claudio@ says a 10% slower nanotime(9) or nanouptime(9)
is acceptable and that shaving off "tens of cycles" is a
micro-optimization. There are bigger optimizations to chase down
before such a difference would matter.
There is additional work to be done here. We could experiment with
conditionally using MFENCE. Also, the userspace TSC timecounter
doesn't have access to the adjustment skews available to the kernel
timecounter. pirofti@ has suggested a scheme involving RDTSCP and an
array of skews mapped into user memory. deraadt@ has suggested a
scheme where the skew would be kept in the TCB. However it is done,
access to the skews will improve monotonicity, which remains a problem
with the TSC.
First proposed by kettenis@ and pirofti@. With input from pirofti@,
deraadt@, guenther@, naddy@, kettenis@, and claudio@. Based on
similar changes in Linux, FreeBSD, NetBSD, and DragonFlyBSD.
ok deraadt@ pirofti@ kettenis@ naddy@ claudio@
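
For illustration, the LFENCE;RDTSC sequence quoted above can be written with GCC/Clang inline assembly roughly as follows. rdtsc_lfence_sketch() is a hypothetical name, and the non-x86 branch is only a stand-in so the example builds elsewhere; it is not a TSC read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static inline uint64_t
rdtsc_lfence_sketch(void)
{
#if defined(__x86_64__) || defined(__i386__)
	uint32_t lo, hi;

	/*
	 * LFENCE makes RDTSC wait until all previous instructions
	 * have completed locally before the counter is read.
	 */
	__asm__ volatile("lfence; rdtsc"
	    : "=a" (lo), "=d" (hi) : : "memory");
	return ((uint64_t)hi << 32) | lo;
#else
	/* Stand-in for other architectures (assumption, not a TSC). */
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}
```

The fence only orders the read on the local CPU. It does not address skew between cores, which is the remaining monotonicity problem the message mentions.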
|
|
The "error" variable is used in one case only, so move it into scope under
#ifdef.
OK deraadt gnezdo
|
|
deraadt@: fine
|
|
ok kettenis@, visa@
|
|
from mortimer
|
|
the pciide subsystem a tiny bit at attach-time, but we don't have the
downstream cd(4) device to attach, so let's try without)
|
|
later processing. The use of a high interrupt probably predates the
suspend/resume efforts; we had to redesign acpi to be non-reentrant, obviously.
discussed with kettenis, in snaps for more than a week
|
|
OK deraadt@
|
|
* We don't need TC_LAST
* Make internal functions static to avoid namespace pollution in libc.a
* Use a switch statement to harmonize with architectures providing
multiple timecounters
ok deraadt@, pirofti@
|
|
functions in them and let rasops call them directly.
From John Carmack
ok kettenis
|
|
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, liberating processes from the need for a context
switch every time they want to count the passage of time.
If a timecounter clock can be exposed to userland then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.
The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.
Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).
Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
OK from at least kettenis@, cheloha@, naddy@, sthen@
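
A simplified sketch of the userland side, assuming the shared data carries a generation number, the counter value at the last kernel update, and a scale factor. The struct layout, field names and user_gettime_sketch() are hypothetical and do not reflect the real timekeep structure; a real implementation would also need memory barriers around the generation checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the shared timekeeping data. */
struct sim_timekeep {
	volatile uint32_t gen;	/* changed by the kernel on update; 0 = invalid */
	uint64_t base_ns;	/* time at the last kernel update */
	uint32_t offset_count;	/* counter value at the last kernel update */
	uint32_t counter_mask;
	uint64_t ns_per_tick;
};

/* Simulated hardware counter, standing in for tc_get_timecount(). */
static uint32_t sim_counter_value;

static uint32_t
sim_get_count(void)
{
	return sim_counter_value;
}

/* Returns 0 on success, -1 if the caller should use the system call. */
static int
user_gettime_sketch(const struct sim_timekeep *tk,
    uint32_t (*get_count)(void), uint64_t *out_ns)
{
	uint32_t gen, delta;
	uint64_t ns;

	do {
		gen = tk->gen;
		if (gen == 0)
			return -1;
		/* Ticks elapsed since the kernel last updated the data. */
		delta = (get_count() - tk->offset_count) & tk->counter_mask;
		ns = tk->base_ns + (uint64_t)delta * tk->ns_per_tick;
	} while (gen != tk->gen);	/* retry if updated mid-read */
	*out_ns = ns;
	return 0;
}
```

The fallback path matters: when the data is marked invalid, the sketch reports failure so the caller can take the ordinary system call route instead.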
|
|
"looks right" deraadt@
|
|
doing some sort of time measurement. This is necessary since RDTSC
is not a serializing instruction. We can use LFENCE as the serializing
instruction instead of CPUID since all amd64 machines have SSE.
This considerably reduces the jitter in TSC skew measurements.
ok deraadt@, cheloha@, phessler@
|
|
functionality is provided by <sys/stdarg.h> using compiler builtins.
Tested in a ports bulk build on amd64 by naddy@
OK naddy@ mpi@
|
|
First brought up by naddy@ in the usertc thread, OK kettenis@.
|
|
the cpu is specified by a struct cpu_info *, which should generally
come from an intrmap.
this is adapted from a diff that patrick@ sent round a few years
ago for a pci_intr_map_msix_cpuid, where you asked for an msi vector
on a specific cpu, and then called pci_intr_establish with the
handle you get. kettenis pointed out that it's hard on some archs
to carry cpu on a pci interrupt handle, so i tweaked it to turn it
into a pci_intr_establish_cpu instead.
jmatthew@ and i (but mostly jmatthew@ to be honest) have been
experimenting with this api on multiple archs and it is working out
well. i'm putting this diff in now on amd64 so people can kick the
tyres a bit.
tested with hacked up vmx(4), ix(4), and mcx(4)
|
|
intr_barrier passed NULL to sched_barrier before this, which ends
up being the primary cpu. that's been mostly right until this point,
but is set to change.
|
|
Do the same for rdseed.
ok deraadt@
|
|
and alternatively XOR'd against TSC. now always run both sequences, and
also support rdseed as a third procedure.
ok kettenis naddy
|
|
adds kernel support for
amdgpu: vega20, raven2, renoir, navi10, navi14
inteldrm: icelake, tigerlake
Thanks to the OpenBSD Foundation for sponsoring this work, kettenis@ for
helping, patrick@ for helping adapt rockchip drm and many developers for
testing.
|
|
ok kettenis
|
|
rnd.c uses nanotime to get access to some bits that change quickly
between events that it can mix into the entropy pool. it doesn't
use nanotime to get a monotonically increasing set of ordered and
accurate timestamps, it just wants something with bits that change.
there's been discussions for years about letting rnd use a clock
that's super fast to read, but not necessarily accurate, but it
wasn't until recently that i figured out it wasn't interested in
time at all, so things like keeping a fast clock coherent between
cpu cores or correct according to ntp is unnecessary. this means we
can just let rnd read the cycle counters on cpus and things will
be fine. cpus with cycle counters that vary in their speed and
aren't kept consistent between cores may even be desirable in this
context.
so this is the first step in converting rnd.c to reading cycle
counters. it copies the nanotime backend to each arch, and they can
replace it with something MD as a second step later on.
djm@ suggested rnd_messybytes, but we landed on cpu_rnd_messybits.
thanks to visa for his eyes.
ok deraadt@ visa@
deraadt@ says he will help handle any MD fallout that occurs.
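
A userland sketch of what a nanotime-backed "messy bits" source might look like: the timestamp is folded into one word because only the fast-changing bits matter, not the time itself. clock_gettime() stands in for nanotime(9) here, and the helper name and the exact mixing are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Fold a timestamp into one word; accuracy is irrelevant here. */
static unsigned int
messybits_from_timespec(const struct timespec *ts)
{
	return (unsigned int)((uint64_t)ts->tv_nsec ^
	    ((uint64_t)ts->tv_sec << 20));
}

static unsigned int
cpu_rnd_messybits_sketch(void)
{
	struct timespec ts;

	/* Userland stand-in for nanotime(9). */
	clock_gettime(CLOCK_REALTIME, &ts);
	return messybits_from_timespec(&ts);
}
```

An MD replacement would keep the same return type and simply read a cycle counter instead of a timestamp.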
|
|
conversion steps). it only contains kernel prototypes for 4 interfaces,
all of which legitimately belong in sys/systm.h, which is already included
by all enqueue_randomness() users.
|
|
EFIFB_HEIGHT and EFIFB_WIDTH instead of efifb_std_descr.n{rows,cols}.
Because the efifb resolution doesn't change, this ensures 'ri_emuwidth'
and 'ri_emuheight' will always get the same value when we remap and
later when we attach, so the text area is always displayed at the same
position.
This fixes display glitches happening on smaller screens or with larger
fonts, which caused the content previously displayed in the area that
was becoming margins when remapping to remain there.
OK jsg@
|
|
to select the VGA or the EFI framebuffer properly. Previously VGA was
initialized unconditionally, which caused serious problems such as
video distortion. As a downside of this commit, some early
panic or debug messages will not be displayed.
tested by Andrew Daugherity, jsg
ok jsg kettenis
|
|
This is the same change made in rev 1.21 to match the drm drivers.
It was reverted as Lucas Raab reported problems with inteldrm taking
over the fb with a 4k display. Lucas confirmed that this is no longer
an issue.
Prompted by a similar patch from John Carmack to raise the limits.
ok kettenis@
|
|
discussed with deraadt@
|
|