into FS_* values, similar to what gpt_get_fstype() does. The code is
clearer and better positioned for planned enhancements to
spoofing.
No intentional functional change.
|
|
firmware probably does this for us on ACPI systems with proper S3 support,
but this doesn't happen on systems where we park CPUs in a low-power idle
state ourselves.
ok deraadt@
|
|
counters where the event being counted occurs across all CPUs in the
system. Counter instances can be made per-cpu by calling evcount_percpu()
after the counter is attached, and this can occur before or after all system
CPUs are attached. Per-cpu counter instances should be incremented using
evcount_inc().
ok kettenis@ jca@ cheloha@
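A minimal sketch of the pattern described above, with a hypothetical
driver (the evcount(9) calls are from this change; "mydev" is invented):

#include <sys/evcount.h>

static struct evcount mydev_intrcnt;

void
mydev_attach(void)
{
	evcount_attach(&mydev_intrcnt, "mydev", NULL);
	/* May be called before or after all CPUs are attached. */
	evcount_percpu(&mydev_intrcnt);
}

int
mydev_intr(void *arg)
{
	/* Per-cpu counter instances are bumped with evcount_inc(). */
	evcount_inc(&mydev_intrcnt);
	return (1);
}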
|
|
|
|
OK mpi@
|
|
Also allow IPPROTO_TCP:TCP_NODELAY
It is a very small amount of kernel code, and it will allow some software to drop "inet".
requested by djm
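For illustration, a hedged userland sketch of the call this permits
(the helper is invented; fd must be an existing TCP socket):

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Disable Nagle's algorithm on an existing TCP socket. */
int
set_nodelay(int fd)
{
	int on = 1;

	return (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)));
}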
|
|
The code only needs to know if the vnode is exclusively locked, and this
can be determined on entry to the function.
OK mpi@
|
|
I never should have added the TIMEOUT_KCLOCK flag. It is redundant
and only serves to complicate the timeout(9) logic. In every place
where we check for the flag we can just use timeout.to_kclock.
So, remove the flag from <sys/timeout.h> and rewrite all affected
logic to use the value of timeout.to_kclock instead.
ok kn@
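A sketch of the resulting idiom, assuming the KCLOCK_NONE sentinel from
<sys/timeout.h> (the helper itself is invented):

#include <sys/timeout.h>

/* Where code once tested the TIMEOUT_KCLOCK flag, it can do this. */
int
timeout_is_kclock(const struct timeout *to)
{
	return (to->to_kclock != KCLOCK_NONE);
}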
|
|
|
|
parameter const.
|
|
All the fields accessed in this syscall are protected by the SCHED_LOCK()
so it isn't necessary to wait for another CPU to release the KERNEL_LOCK()
before that.
ok claudio@
|
|
ok deraadt@
|
|
The kernel is not quite ready for timeout_in_nsec(). Remove it and
kclock_nanotime(). Both are unused.
Prompted by jsg@.
ok kn@
|
|
During resume, it isn't necessarily a problem if the UTC time we get
from inittodr(9) lags behind the system UTC clock. In particular, if
the active timecounter's frequency is low enough, tc_delta() might not
overflow across a brief suspend.
Remove the misleading warning message. The code is behaving as
intended, just not in a way I anticipated when I added the warning
message a few years ago.
Discovered by kettenis@. Root cause isolated with kettenis@.
Link: https://marc.info/?l=openbsd-tech&m=166790845619897&w=2
ok mlarkin@ kettenis@
|
|
This is a mechanical diff without semantic changes, locking ioctls
individually inside ifioctl() rather than all of them around it.
This allows us to unlock ioctls one by one.
OK mpi
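The shape of the change, sketched under assumptions (the choice of
NET_LOCK() and the ioctl body here are illustrative, not the literal
diff):

int
ifioctl(struct socket *so, u_long cmd, caddr_t data, struct proc *p)
{
	int error = 0;

	switch (cmd) {
	case SIOCSIFFLAGS:
		NET_LOCK();	/* previously taken around all of ifioctl() */
		/* ... handle this ioctl ... */
		NET_UNLOCK();
		break;
	/* ... each remaining ioctl is locked individually ... */
	}
	return (error);
}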
|
|
|
|
Accesses to data structures used by these syscalls are serialized by the
VM map lock with the exception of file mappings which are still protected
by the KERNEL_LOCK().
Unlocking this set of syscalls improves most userland workloads.
Tested by many including robert@ (for 2 years), mlarkin@, kn@, sdk@,
jca@, aoyama@, naddy@, Scott Bennett and others. Thanks to all!
Joint work with kn@.
ok robert@, aja@, kettenis@, kn@, deraadt@, beck@
|
|
to monitor state changes of the kernel device tree
input from and ok dlg@, deraadt@
|
|
|
|
clockintr(9) is a machine-independent clock interrupt scheduler. It
emulates most of what the machine-dependent clock interrupt code is
doing on every platform. Every CPU has a work schedule based on the
system uptime clock. For now, every CPU has a hardclock(9) and a
statclock(). If schedhz is set, every CPU has a schedclock(), too.
This commit only contains the MI pieces. All code is conditionally
compiled with __HAVE_CLOCKINTR. This commit changes no behavior yet.
At a high level, clockintr(9) is configured and used as follows:
1. During boot, the primary CPU calls clockintr_init(9). Global state
is initialized.
2. Primary CPU calls clockintr_cpu_init(9). Local, per-CPU state is
initialized. An "intrclock" struct may be installed, too.
3. Secondary CPUs call clockintr_cpu_init(9) to initialize their
local state.
4. All CPUs repeatedly call clockintr_dispatch(9) from the MD clock
interrupt handler. The CPUs complete work and rearm their local
interrupt clock, if any, during the dispatch.
5. Repeat step (4) until the system shuts down, suspends, or hibernates.
6. During resume, the primary CPU calls inittodr(9) and advances the
system uptime.
7. Go to step (2). This time around, clockintr_cpu_init(9) also
advances the work schedule on the calling CPU to skip events that
expired during suspend. This prevents a "thundering herd" of
useless work during the first clock interrupt.
In the long term, we need an MI clock interrupt scheduler in order to
(1) provide control over the clock interrupt to MI subsystems like
timeout(9) and dt(4) to improve their accuracy, (2) provide drivers
like acpicpu(4) a means for slowing or stopping the clock interrupt on
idle CPUs to conserve power, and (3) reduce the amount of duplicated
code in the MD clock interrupt code.
Before we can do any of that, though, we need to switch every platform
over to using clockintr(9) and do some cleanup.
Prompted by "the vmm(4) time bug," among other problems, and a
discussion at a2k19 on the subject. Lots of design input from
kettenis@. Early versions reviewed by kettenis@ and mlarkin@.
Platform-specific help and testing from kettenis@, gkoehler@,
mlarkin@, miod@, aoyama@, visa@, and dv@. Babysitting and spiritual
guidance from mlarkin@ and kettenis@.
Link: https://marc.info/?l=openbsd-tech&m=166697497302283&w=2
ok kettenis@ mlarkin@
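A hedged sketch of the MD glue implied by steps 1-7 (the clockintr_*
calls are from this commit; the surrounding function names and the NULL
intrclock argument are assumptions):

void
cpu_startclock(void)
{
	if (CPU_IS_PRIMARY(curcpu()))
		clockintr_init(0);	/* step 1: global state, primary CPU */
	clockintr_cpu_init(NULL);	/* steps 2-3: local, per-CPU state */
}

int
cpu_clockintr(void *frame)
{
	clockintr_dispatch(frame);	/* step 4: run expired work, rearm */
	return (1);
}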
|
|
It needs to be all non-writeable segments, which really means rodata.
crt0 and ld.so will need to call mimmutable() later on these regions.
ok kettenis
|
|
to reflect that retval is just a single return value.
ok miod@
|
|
for lseek() and a single register_t value for all others.
ok miod@
|
|
fork/vfork/__tfork haven't cared about the second return register.
So, stop setting retval[1] in kern_fork.c and stop setting the
second return register in the MD child_return() routines.
With the above, we have no multi-register return values on LP64,
so stop touching that register in the trapframe on those archs.
testing miod@ and aoyama@
ok miod@
|
|
sys_execve() to return EJUSTRETURN.
setregs() is the MD routine used by sys_execve() to set up the
thread's trapframe and PCB such that, on 'return' to userspace, it
has the register values defined by the ABI and otherwise zero. It
had to set the syscall retval[] values previously because the normal
syscall return path overwrites a couple registers with the retval[]
values. By instead returning EJUSTRETURN, that and some complexity
with program-counter handling on m88k and sparc64 go away.
Also, give setregs() a 'struct ps_strings *arginfo' argument
so powerpc, powerpc64, and sh can directly get argc/argv/envp
values for registers instead of copyin()ing the one in userspace.
Improvements from miod@ and millert@
Testing assistance miod@, kettenis@, and aoyama@
ok miod@ kettenis@
|
|
Libraries are less of a concern, because ld.so can fix them in the right
order. So we must scan DYNAMIC for the TEXTREL marker, and not make
X LOADs immutable. ld.so will apply changes to the text segment. In an
upcoming diff, crt0 and ld.so will then apply immutability.
ok kettenis
|
|
is inspected narrowly for base address later.
ok kettenis
|
|
ok anton@, millert@
|
|
|
|
This includes a change of siginfo_r which is technically an ABI break but
this should have no real-world impact since the members involved are
never touched by the kernel.
ok millert@, deraadt@
|
|
This includes a change of siginfo_r which is technically an ABI break but
this should have no real-world impact since the members involved are
never touched by the kernel.
ok millert@, deraadt@
|
|
The DT_DEBUG word is inside an R LOAD that gets marked immutable, but ld.so
does an mprotect RW + adjustment + mprotect R. DT_DEBUG is specified as
being inside the DYNAMIC range, so let's do all the immutables and then,
on mips64 only, turn around and make DYNAMIC mutable. That gives us
time to see if we can move DT_DEBUG or change what ld.so is doing.
discussed at length with kettenis
|
|
because DT_DEBUG isn't in the right place
|
|
I juggled my trees incorrectly.
|
|
|
|
compromise...), but it means the stack can be marked immutable again.
ok kettenis
|
|
The large commented block in elf_load_psection explains the situation.
ok kettenis.
|
|
optional.
We have no interest in pru_abort()'s return value. We call it only from
soabort(), which is a dummy pru_abort() wrapper and has no return value.
Only connection-oriented sockets need to implement the (*pru_abort)()
handler. Such sockets are tcp(4) and unix(4) sockets, so remove the
existing code for all others; it is never called.
ok guenther@
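Roughly the wrapper relationship described above, as an assumed sketch
(not the literal code):

void
soabort(struct socket *so)
{
	soassertlocked(so);
	/* The return value is ignored and the handler is optional. */
	if (so->so_proto->pr_usrreqs->pru_abort != NULL)
		(*so->so_proto->pr_usrreqs->pru_abort)(so);
}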
|
|
go back to the old approach: using a new anon mapping because it removes
any potential gadgetry pre-placed in the region (by making it zero). But
also bring in a few more validation checks beyond contiguous mapping -- it
must not be a syscall region, and the protection must be precisely RW.
This does allow sigaltstack() to shoot zero'd MAP_STACK non-immutable regions
into the main stack area (which will soon be immutable). I am not sure we
can re-enforce immutability on the region after we make it stack (maybe
determine this while doing the validation entry walk?)
Sadly, continued support for sigaltstack() does require selecting the
best-guess compromise.
ok kettenis
|
|
problem because haphazard use could shoot holes in the address space
(changing permissions, providing opportunities for pivoting, etc). I
tried to write a diff to convert the address space correctly but did
not understand enough about map entries, so instead we mapped new
memory over top of the existing object. Placing a new mapping becomes
unfeasible with the upcoming mimmutable model, so here is code that
adds MAP_STACK to the region. It will only do so for a contiguously
mapped region that is non-syscall with permission RW, otherwise it
returns an error.
Food for thought: if we know the region isn't serviced by an object,
we should consider zero'ing the region, to block pre-pivot placement?
ok kettenis
|
|
to assign a quality to RTC implementation and pick the "best" RTC if a
system has multiple RTCs (or multiple interfaces to an RTC). This allows
us to prefer a battery-backed I2C RTC over an RTC that is part of the SoC
which is only running if the SoC is powered. It also allows us to
work around issues with firmware RTC interfaces that may lie to us or
even crash the system.
This change makes sure the todr_quality member of the struct is always
initialized. In most cases the quality will be set to zero; further
adjustments of the quality for specific subsystems/architectures will follow.
ok cheloha@, patrick@
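An illustrative driver fragment; the handler names, the quality value,
and the todr_attach() call are assumptions, todr_quality is the member
this change initializes:

#include <dev/clock_subr.h>

int	myrtc_gettime(struct todr_chip_handle *, struct timeval *);
int	myrtc_settime(struct todr_chip_handle *, struct timeval *);

struct todr_chip_handle myrtc_todr;

void
myrtc_attach_todr(void)
{
	myrtc_todr.todr_gettime = myrtc_gettime;
	myrtc_todr.todr_settime = myrtc_settime;
	/* Battery-backed I2C RTC: prefer over an SoC RTC. */
	myrtc_todr.todr_quality = 1000;
	todr_attach(&myrtc_todr);
}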
|
|
regions, so immutable stack isn't viable yet. There are configure programs
which create sigstacks upon their own stacks, and there is no simple fix for
the sigaltstack mechanism...
discovered by sthen and tb
|
|
device whose disklabel is being checked. Within checkdisklabel()
use this information to discover a device name iff (sic) the
label is an obsolete version. Use the name to generate a
meaningful warning message asking the user to rewrite the
disklabel and thus promote it to the current version.
Suggested by, feedback from and ok deraadt@
|
|
to try to change the permissions of it. We won't know who's trying that
until we enable it and see what breaks.
A tricky piece relating to setrlimit stack size changes was previously committed.
ok kettenis
|
|
execve() time
ok kettenis
|
|
|
|
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis
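A hedged userland sketch of the semantics (the helper is invented):

#include <sys/types.h>
#include <sys/mman.h>
#include <err.h>

void
seal_region(void *addr, size_t len)
{
	if (mimmutable(addr, len) == -1)
		err(1, "mimmutable");
	/* Any later permission change now fails with EPERM. */
	if (mprotect(addr, len, PROT_READ | PROT_WRITE) == -1)
		warn("mprotect");
}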
|
|
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During the socket(2) syscall
it is OK to wait; this logic was missing for internet PCBs. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during the TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@
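A sketch of the flag's intent under assumed names (the wrapper, the
pru_attach access path, and the M_WAIT/M_NOWAIT constants are
assumptions, not the literal diff):

int
example_attach(struct socket *so, int proto, int wait)
{
	/* wait is M_WAIT from socket(2), M_NOWAIT from sonewconn(). */
	return ((*so->so_proto->pr_usrreqs->pru_attach)(so, proto, wait));
}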
|
|
symmetric to counters_read().
OK jmatthew@
|
|
in the past, but those compat layers are gone. Remove support for the
"config file"
ok miod millert
|