Age | Commit message | Author |
|
OK mpi@
|
|
used during development (for visibility). There is speculation claudio will
immediately use these bits for something else.
|
|
Instead of the KERNEL_LOCK use the ps_mtx for most operations.
If the ps_klist is modified an additional global rwlock (kqueue_ps_list_lock)
is required. This includes the knotes with NOTE_FORK and NOTE_EXIT since
in either case a ps_klist is changed. In the NOTE_FORK | NOTE_TRACK case
the call to kqueue_register() can sleep, which is why a global rwlock is used.
Adjust the reaper() to call knote_processexit() without KERNEL_LOCK.
Double lock idea from visa@
OK mvs@
|
|
For procs (threads) the accounting happens now lockless by curproc using
a generation counter. Callers need to use tu_enter() and tu_leave() for this.
To read the proc p_tu struct tuagg_get_proc() should be used. It ensures
that the values read are consistent.
For processes only the time of exited threads is accumulated in ps_tu and
to get the proper process time usage tuagg_get_process() needs to be called.
tuagg_get_process() will sum up all procs p_tu plus the ps_tu.
This removes another SCHED_LOCK() dependency. Adjust the code in
exit1() and exit2() to correctly account for the full run time.
For this adjust sched_exit() to do the runtime accounting like it is done
in mi_switch().
OK jca@ dlg@
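A minimal userland sketch of the generation-counter idea described above,
assuming C11 atomics. The tu_model_* names and the struct layout are
hypothetical and are not the kernel's tu_enter()/tu_leave() code: the owner
makes the counter odd while updating, and a reader retries until it sees
the same even value before and after copying.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a per-thread usage record; not the kernel struct. */
struct tu_model {
	atomic_uint	 gen;		/* odd while the owner is mid-update */
	_Atomic uint64_t uticks;
	_Atomic uint64_t sticks;
};

struct tu_snapshot {
	uint64_t uticks;
	uint64_t sticks;
};

/* Owner side: mark the record as being modified (counter becomes odd). */
void
tu_model_enter(struct tu_model *tu)
{
	unsigned int g = atomic_load_explicit(&tu->gen, memory_order_relaxed);

	atomic_store_explicit(&tu->gen, g + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Owner side: publish the update (counter becomes even again). */
void
tu_model_leave(struct tu_model *tu)
{
	unsigned int g = atomic_load_explicit(&tu->gen, memory_order_relaxed);

	atomic_store_explicit(&tu->gen, g + 1, memory_order_release);
}

/* Owner side: add ticks inside an enter/leave pair. */
void
tu_model_account(struct tu_model *tu, uint64_t u, uint64_t s)
{
	tu_model_enter(tu);
	atomic_fetch_add_explicit(&tu->uticks, u, memory_order_relaxed);
	atomic_fetch_add_explicit(&tu->sticks, s, memory_order_relaxed);
	tu_model_leave(tu);
}

/* Reader side: retry until both counter reads match and are even. */
struct tu_snapshot
tu_model_read(struct tu_model *tu)
{
	struct tu_snapshot s;
	unsigned int g0, g1;

	do {
		g0 = atomic_load_explicit(&tu->gen, memory_order_acquire);
		s.uticks = atomic_load_explicit(&tu->uticks,
		    memory_order_relaxed);
		s.sticks = atomic_load_explicit(&tu->sticks,
		    memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		g1 = atomic_load_explicit(&tu->gen, memory_order_relaxed);
	} while (g0 != g1 || (g0 & 1));

	return s;
}
```

Only the owning thread writes, so the writer never takes a lock; readers
pay with a retry when they race with an update.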
|
|
replaced it with a more strict mechanism, which happens to be lockless O(1)
rather than micro-lock O(1)+O(log N). Also nop-out the sys_msyscall(2) guts,
but leave the syscall around for a bit longer so that people can build through
it, since ld.so(1) still wants to call it.
|
|
check earlier, the pinsyscall(SYS_execve) mechanism has become redundant.
It needs to be removed delicately since ld.so and static binaries use it.
As a first step, neuter the checking code in sys_execve(). Further steps
will follow slowly.
ok kettenis
|
|
the main program or ld.so, and accept a submission of that information
for libc.so from ld.so via pinsyscalls(2). At system call invocation,
the syscall number is matched to the specific address it must come from.
ok kettenis, gnezdo, testing of variations by many people
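A rough model of the per-syscall address match, as a sketch only. The
pin_table layout, MODEL_NSYSCALLS and the pin_* helpers are hypothetical;
the kernel's real table format and error handling are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NSYSCALLS	8	/* hypothetical table size */

/* One expected entry address per syscall number; 0 means "not pinned". */
struct pin_table {
	uintptr_t addr[MODEL_NSYSCALLS];
};

/* Record the single address a given syscall number may be issued from. */
int
pin_register(struct pin_table *t, unsigned int num, uintptr_t addr)
{
	if (num >= MODEL_NSYSCALLS || addr == 0)
		return -1;
	t->addr[num] = addr;
	return 0;
}

/* At invocation: the number must be pinned and the address must match. */
int
pin_check(const struct pin_table *t, unsigned int num, uintptr_t pc)
{
	if (num >= MODEL_NSYSCALLS || t->addr[num] == 0)
		return -1;
	return (t->addr[num] == pc) ? 0 : -1;
}
```

In this model a syscall issued from any other address, or a syscall number
that was never registered, is rejected.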
|
|
If single thread is already held by another thread just unwind to userret(),
wait there and retry the system call later (if at all).
OK mpi@
|
|
The mode can now be or-ed with SINGLE_DEEP or SINGLE_NOWAIT to alter
the behaviour of single_thread_set(). This allows explicit control
of the SINGLE_DEEP behaviour.
If SINGLE_DEEP is set the deep flag is passed to the initial check call
and by that the check will error out instead of suspending (SINGLE_UNWIND)
or exiting (SINGLE_EXIT). The SINGLE_DEEP flag is required in calls to
single_thread_set() outside of userret. E.g. at the start of sys_execve
because the proc is not allowed to call exit1() in that location.
SINGLE_NOWAIT skips the wait at the end of single_thread_set() and therefore
returns BEFORE all threads have been parked. Currently this is only used by
the ptrace code and should not be used anywhere else. Not waiting for all
threads to settle is asking for trouble.
This solves an issue by using SINGLE_UNWIND in the coredump case where
the code should actually exit in case another thread crashed moments earlier.
Also the SINGLE_UNWIND in pledge_fail() is now marked SINGLE_DEEP since
the call to pledge_fail() is for sure not at the kernel boundary.
OK mpi@
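A small sketch of how a mode or-ed with modifier flags can be split apart.
The MODEL_* values and the model_single_req struct are made up for
illustration and do not reflect the kernel's actual constants.

```c
/* Hypothetical values; the kernel's own definitions may differ. */
#define MODEL_SINGLE_UNWIND	0x01
#define MODEL_SINGLE_EXIT	0x02
#define MODEL_SINGLE_MASK	0x0f	/* low bits select the mode */
#define MODEL_SINGLE_DEEP	0x10	/* caller is not at the boundary */
#define MODEL_SINGLE_NOWAIT	0x20	/* do not wait for siblings */

struct model_single_req {
	int mode;	/* one of the MODEL_SINGLE_* modes */
	int deep;	/* passed on to the initial check */
	int wait;	/* wait until all siblings are parked */
};

/* Split the or-ed argument into the mode and its two modifiers. */
struct model_single_req
model_single_decode(int flags)
{
	struct model_single_req r;

	r.mode = flags & MODEL_SINGLE_MASK;
	r.deep = (flags & MODEL_SINGLE_DEEP) != 0;
	r.wait = (flags & MODEL_SINGLE_NOWAIT) == 0;
	return r;
}
```

A caller such as the start of sys_execve would, in this model, pass
MODEL_SINGLE_UNWIND | MODEL_SINGLE_DEEP and still wait for siblings.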
|
|
Control Flow Integrity has been disabled for the process. At
exec-time, set that flag iff EXEC_NOBTCFI is passed from the ELF
exec bits (which set it based on presence of a PT_OPENBSD_NOBTCFI
segment). This will be used by the amd64 code.
kern_exec.c part by kettenis@
ok guenther@ deraadt@
|
|
IBT/BTI, because many more things are about to work correctly
|
|
ok jmc@ guenther@ tb@
|
|
or not. The idea is that since /usr/local has wxallowed by default this
will enable enforcement for base while leaving ports alone for now. This
will help us transition to a state where ports are properly marked and
allow us to establish that base is really clean.
Also add an exception for chrome. Chrome already appears to be clean on
arm64 and this exception can be easily modified for testing other ports.
This will screw over people that deliberately disable wxallowed on
/usr/local or who don't have a separate partition for /usr/local. We
think that is an acceptable compromise for the next months.
ok robert@, deraadt@ (who came up with the idea)
|
|
a new AEXECVE bit to acct(4), and print it in lastcomm(8)
ok bluhm
|
|
telling the kernel with pinsyscall(2)
|
|
Make knote(9) lock the knote list internally, and add knote_locked(9)
for the typical situation where the list is already locked.
Remove the KNOTE(9) macro to simplify the API.
Manual page OK jmc@
OK mpi@ mvs@
|
|
into core dumps. As a result backtraces through signal handlers no
longer work in gdb and other debuggers.
Fix this by keeping a read-only mapping of the signal trampoline in the
kernel and writing it into the core dump at the virtual address where it
is mapped in the process.
ok deraadt@, tb@
|
|
exposed in a new field returned by sysctl(KERN_PROC). Update
pthread_{get,set}_name_np(3) to use the syscalls. Show them, when
set, in ps -H and top -H output.
libc and libpthread minor bumps
ok mpi@, mvs@, deraadt@
|
|
signal trampoline can now be PROT_EXEC (without PROT_READ) everywhere
ok kettenis
|
|
copy on userland stack which points at an illicit region.
ok kettenis, deraadt
|
|
the entries, so the check-sp-at-system-call check failed. Quite strange
it took this long to find this.
ok kettenis
|
|
sys_execve() to return EJUSTRETURN.
setregs() is the MD routine used by sys_execve() to set up the
thread's trapframe and PCB such that, on 'return' to userspace, it
has the register values defined by the ABI and otherwise zero. It
had to set the syscall retval[] values previously because the normal
syscall return path overwrites a couple registers with the retval[]
values. By instead returning EJUSTRETURN that and some complexity
with program-counter handling on m88k and sparc64 goes away.
Also, give setregs() a 'struct ps_strings *arginfo' argument
so powerpc, powerpc64, and sh can directly get argc/argv/envp
values for registers instead of copyin()ing the one in userspace.
Improvements from miod@ and millert@
Testing assistance miod@, kettenis@, and aoyama@
ok miod@ kettenis@
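A toy model of why EJUSTRETURN helps here, as a sketch only. The
model_frame registers, the MODEL_EJUSTRETURN value and the two sample
handlers are hypothetical: the normal return path copies retval[] into
registers, while EJUSTRETURN leaves the frame exactly as the handler set it.

```c
/* Hypothetical register frame and error value for illustration. */
#define MODEL_EJUSTRETURN	(-2)

struct model_frame {
	long r0;	/* first return register */
	long r1;	/* second return register */
	int  carry;	/* error indication */
};

typedef int (*model_syscall_fn)(struct model_frame *, long retval[2]);

/* Common syscall return path. */
void
model_syscall_return(struct model_frame *f, model_syscall_fn fn)
{
	long retval[2] = { 0, 0 };
	int error = fn(f, retval);

	switch (error) {
	case 0:
		/* Normal success: retval[] overwrites the registers. */
		f->r0 = retval[0];
		f->r1 = retval[1];
		f->carry = 0;
		break;
	case MODEL_EJUSTRETURN:
		/* Frame was fully set up by the handler; touch nothing. */
		break;
	default:
		f->r0 = error;
		f->carry = 1;
		break;
	}
}

/* Ordinary syscall: reports its result through retval[]. */
int
model_getpid(struct model_frame *f, long retval[2])
{
	(void)f;
	retval[0] = 1234;
	return 0;
}

/* exec-like syscall: builds a fresh frame, then asks to be left alone. */
int
model_execve(struct model_frame *f, long retval[2])
{
	(void)retval;
	f->r0 = 7;	/* e.g. a register value defined by the ABI */
	f->r1 = 0;
	f->carry = 0;
	return MODEL_EJUSTRETURN;
}
```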
|
|
compromise...), but it means the stack can be marked immutable again.
ok kettenis
|
|
regions, so immutable stack isn't viable yet. There are configure programs
which create sigstacks upon their own stacks, and there is no simple fix for
the sigaltstack mechanism...
discovered by sthen and tb
|
|
to try to change the permissions of it. We won't know who's trying that
until we enable it and see what breaks.
A tricky piece relating to setrlimit stack size changing was previously committed.
ok kettenis
|
|
execve() time
ok kettenis
|
|
memory mappings so they cannot be changed by a later mmap(), mprotect(),
or munmap(), which will error with EPERM instead.
ok kettenis
|
|
ok mpi@ miod@
|
|
including the NUL), in all internal interfaces, and expose this
in ktrace, core, or proc.h visibility.
ok millert
|
|
original thread's stack hasn't been used since 2015.
ok miod@ deraadt@
|
|
SYS_syscall as the nosys() function into the MD syscall entry
routines and the SYSCALL_DEBUG support. Adjust alpha's syscall
check to match the other archs. Also, make sysent const to get it
into .rodata.
With that, 'struct emul' is unused: delete it and all its references
ok millert@
|
|
exec_elf_fixup() and coredump_elf() in <sys/exec_elf.h> and call
them and the MD setregs() directly in kern_exec.c and kern_sig.c
Also delete e_name[] (only used by sysctl), e_errno (unused), and
e_syscallnames[] (only used by SYSCALL_DEBUG) and constipate
syscallnames to 'const char *const[]'
ok kettenis@
|
|
sigobject. Just use the existing globals for the former and use a
global for the latter.
ok jsg@ kettenis@
|
|
of the auxinfo is fixed: provide ELF_AUX_WORDS in <sys/exec_elf.h>
as a replacement for emul->e_arglen
ok millert@
|
|
copyargs() return 0/1 and merge elf_copyargs() into it. Rename
ep_emul_arg and ep_emul_argp to have clearer meaning and type and
eliminate ep_emul_argsize as no longer necessary. Make sure
ep_auxinfo (nee ep_emul_argp) is initialized as powerpc64 always
uses it in setregs().
ok semarie@ deraadt@ kettenis@
|
|
architecture.
from miod
|
|
single_thread_set() is modified to explicitly indicate when waiting until
sibling threads are parked is required. This is obviously not required if
a traced thread is switching away from a CPU after handling a STOP signal.
ok claudio@
|
|
Kill SINGLE_PTRACE and use SINGLE_SUSPEND which has almost the same semantics.
This diff did not properly kill SINGLE_PTRACE and broke RAMDISK kernels.
|
|
single_thread_set() is modified to explicitly indicate when waiting until
sibling threads are parked is required. This is obviously not required if
a traced thread is switching away from a CPU after handling a STOP signal.
ok claudio@
|
|
If we fold the for-loop iterating over each interval timer into the
helper function the result is slightly tidier than what we have now.
Rename the helper function "cancel_all_itimers".
Based on input from millert@ and kettenis@.
|
|
During _exit(2) and sometimes during execve(2) we need to cancel any
active per-process interval timers. We don't currently do this in an
MP-safe way. Both syscalls ignore the locking assumptions documented
in proc.h.
The easiest way to make them MP-safe is to use setitimer(), just like
the getitimer(2) and setitimer(2) syscalls do. To make things a bit
cleaner I have added a helper function, cancelitimer(), so the callers
don't need to fuss with an itimerval struct.
While we're here we can remove the splclock/splx dance from execve(2).
It is no longer necessary.
ok deraadt@
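A userland analogy of the cancel-everything helper, assuming only the
POSIX setitimer(2) interface. cancel_all_itimers_user() is a hypothetical
name; the in-kernel helper operates on the process's timers directly and
is not shown here.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Disarm every per-process interval timer by installing an all-zero
 * itimerval, so callers never have to build the struct themselves.
 * Returns 0 on success, -1 if any setitimer(2) call fails.
 */
int
cancel_all_itimers_user(void)
{
	static const int which[] = {
		ITIMER_REAL, ITIMER_VIRTUAL, ITIMER_PROF
	};
	const struct itimerval zero = { { 0, 0 }, { 0, 0 } };
	size_t i;

	for (i = 0; i < sizeof(which) / sizeof(which[0]); i++) {
		if (setitimer(which[i], &zero, NULL) == -1)
			return -1;
	}
	return 0;
}
```

Folding the loop into the helper keeps each caller down to one line.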
|
|
page it out and bad things will happen when we try to page it back in
from within the clock interrupt handler.
While there, make sure we set timekeep_object back to NULL if we fail
to map the timekeep page into kernel space.
ok deraadt@ (who had a very similar diff)
|
|
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, liberating processes from the need for a context
switch every time they want to count the passage of time.
If a timecounter clock can be exposed to userland then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.
The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.
Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).
Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
OK from at least kettenis@, cheloha@, naddy@, sthen@
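A simplified sketch of the userland adjustment step. The model_timekeep
fields and the plain nanoseconds-per-tick scaling are assumptions made for
illustration; the real shared page and its fixed-point arithmetic are more
involved. The point shown is that userland adds the counter delta since
the last kernel update, with the counter mask absorbing wraparound.

```c
#include <stdint.h>

/* Hypothetical subset of what a shared timekeep page might carry. */
struct model_timekeep {
	uint64_t base_ns;	/* time at the last kernel update */
	uint32_t offset_count;	/* counter value at that update */
	uint32_t counter_mask;	/* valid bits of the hardware counter */
	uint32_t ns_per_tick;	/* assumed integer scale factor */
};

/*
 * now_count stands in for the value the MD tc_get_timecount() would
 * read from the hardware counter.
 */
uint64_t
model_user_nanotime(const struct model_timekeep *tk, uint32_t now_count)
{
	/* Unsigned subtraction plus the mask handles counter wraparound. */
	uint32_t delta = (now_count - tk->offset_count) & tk->counter_mask;

	return tk->base_ns + (uint64_t)delta * tk->ns_per_tick;
}
```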
|
|
process.
ok bluhm@ claudio@ visa@
|
|
Convert those to a consolidated status when needed in wait4(), kevent(),
and sysctl()
Pass exit code and signal separately to exit1()
(This also serves as prep for adding waitid(2))
ok mpi@
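A sketch of consolidating a separately stored exit code and signal into a
wait(2)-style status. model_consolidate_status() is a hypothetical helper,
and the encoding (exit code in bits 8-15, signal in the low 7 bits) is the
conventional layout, assumed here rather than taken from the diff.

```c
#include <sys/wait.h>

/*
 * Build a wait status from an exit code and a terminating signal.
 * Pass signo == 0 for a normal exit.
 */
int
model_consolidate_status(int exitcode, int signo)
{
	return ((exitcode & 0xff) << 8) | (signo & 0x7f);
}
```

The result can then be taken apart with the usual WIFEXITED(),
WEXITSTATUS(), WIFSIGNALED() and WTERMSIG() macros.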
|
|
ok millert@ deraadt@
|
|
enforce a new policy: system calls must be in pre-registered regions.
We have discussed more strict checks than this, but none satisfy the
cost/benefit based upon our understanding of attack methods, anyways
let's see what the next iteration looks like.
This is intended to harden (translation: attackers must put extra
effort into attacking) against a mixture of W^X failures and JIT bugs
which allow syscall misinterpretation, especially in environments with
polymorphic-instruction/variable-sized instructions. It fits in a bit
with libc/libcrypto/ld.so random relink on boot and no-restart-at-crash
behaviour, particularly for remote problems. Less effective once on-host
since the libraries can then be read.
For static-executables the kernel registers the main program's
PIE-mapped exec section valid, as well as the randomly-placed sigtramp
page. For dynamic executables ELF ld.so's exec segment is also
labelled valid; ld.so then has enough information to register libc's
exec section as valid via call-once msyscall(2).
For dynamic binaries, we continue to permit the main program exec
segment because "go" (and potentially a few other applications) have
embedded system calls in the main program. Hopefully at least go gets
fixed soon.
We declare the concept of embedded syscalls a bad idea for numerous
reasons, as we notice the ecosystem has many
static-syscall-in-base-binary programs which are dynamically linked against
libraries which in turn use libc, which contains another set of
syscall stubs. We've been concerned about adding even one additional
syscall entry point... but go's approach tends to double the entry-point
attack surface.
This was started at a nano-hackathon in Bob Beck's basement 2 weeks
ago during a long discussion with mortimer trying to hide from the SSL
scream-conversations, and finished in more comfortable circumstances
next to a wood-stove at Elk Lakes cabin with UVM scream-conversations.
ok guenther kettenis mortimer, lots of feedback from others
conversations about go with jsing tb sthen
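A toy model of the pre-registered region policy, as a sketch only. The
model_proc structure, MODEL_MAXREGIONS and the helper names are
hypothetical: the kernel side adds regions at exec time, a call-once
registration adds one more (standing in for libc's exec section), and a
syscall is accepted only if its address lies inside a valid region.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAXREGIONS 4	/* hypothetical limit */

struct model_region {
	uintptr_t start;
	size_t	  len;
};

struct model_proc {
	struct model_region valid[MODEL_MAXREGIONS];
	size_t nvalid;
	int    msyscall_done;	/* call-once latch */
};

/* Kernel-side setup: main program, sigtramp page, ld.so exec segment. */
int
model_add_region(struct model_proc *p, uintptr_t start, size_t len)
{
	if (p->nvalid == MODEL_MAXREGIONS || len == 0)
		return -1;
	p->valid[p->nvalid].start = start;
	p->valid[p->nvalid].len = len;
	p->nvalid++;
	return 0;
}

/* Call-once registration; a second call fails. */
int
model_msyscall(struct model_proc *p, uintptr_t start, size_t len)
{
	if (p->msyscall_done)
		return -1;
	if (model_add_region(p, start, len) == -1)
		return -1;
	p->msyscall_done = 1;
	return 0;
}

/* Syscall entry check: is the calling address inside a valid region? */
int
model_syscall_pc_ok(const struct model_proc *p, uintptr_t pc)
{
	size_t i;

	for (i = 0; i < p->nvalid; i++) {
		const struct model_region *r = &p->valid[i];

		if (pc >= r->start && pc - r->start < r->len)
			return 1;
	}
	return 0;
}
```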
|