Age | Commit message (Collapse) | Author |
|
At boot, the powerpc64 kernel was calling
pmap_bootstrap -> pmap_kenter_pa -> mtx_enter(&pmap_hash_lock)
before it did
pmap_init -> mtx_init(&pmap_hash_lock, IPL_HIGH)
Change from mtx_init to MUTEX_INITIALIZER. This allows an option
WITNESS kernel to boot without warning of an uninitialized mutex.
Also change macppc's pmap_hash_lock from __ppc_lock_init to
PPC_LOCK_INITIALIZER, though WITNESS doesn't see this lock.
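The ordering problem above can be sketched in userland C. This is a minimal illustration, not the kernel's code: struct toy_mutex, TOY_MUTEX_INITIALIZER, TOY_IPL_HIGH and the toy_* functions are hypothetical stand-ins for struct mutex, MUTEX_INITIALIZER and the pmap callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mutex; not the real layout. */
struct toy_mutex {
	int initialized;	/* what a WITNESS-like checker would inspect */
	int ipl;		/* interrupt priority level to raise to */
	void *owner;		/* NULL when unlocked */
};

#define TOY_IPL_HIGH	15	/* arbitrary value for this sketch */

/* Compile-time initializer, in the spirit of MUTEX_INITIALIZER. */
#define TOY_MUTEX_INITIALIZER(ipl)	{ 1, (ipl), NULL }

/* Valid before any code runs: no init call has to come first. */
static struct toy_mutex toy_hash_lock = TOY_MUTEX_INITIALIZER(TOY_IPL_HIGH);

/* Returns 0 on success, -1 if the lock was never initialized. */
int
toy_mtx_enter(struct toy_mutex *m, void *self)
{
	if (!m->initialized)
		return -1;
	assert(m->owner == NULL);	/* single-threaded sketch: no contention */
	m->owner = self;
	return 0;
}

void
toy_mtx_leave(struct toy_mutex *m)
{
	m->owner = NULL;
}

/* Models an early-boot caller that locks before any init routine ran. */
int
toy_early_boot(void)
{
	int self, rv;

	rv = toy_mtx_enter(&toy_hash_lock, &self);
	if (rv == 0)
		toy_mtx_leave(&toy_hash_lock);
	return rv;
}
```

With the static initializer, toy_early_boot() succeeds even though no init function was ever called; a zero-filled lock that still waits for its runtime init would be rejected.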
ok mpi@
|
|
|
|
The code has outgrown the original name for this struct. Both the
external and internal APIs have used the "clockqueue" namespace for
some time when operating on it, and that name is eyeball-consistent
with "clockintr" and "clockrequest", so "clockqueue" it is.
|
|
Almost all db_read_bytes() callers cast the destination buffer
argument to char*, which suggests the API's prototype is incompatible
with how the API is actually used.
Change db_read_bytes() and db_write_bytes() to take a void* as the
destination/source buffer parameter so callers don't need to cast the
argument.
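A small sketch of why a void* parameter removes the casts. The names toy_read_bytes and toy_read_u32 are hypothetical, and the sketch copies from an ordinary pointer rather than from a kernel address as the real db_read_bytes() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical analogue of db_read_bytes(): destination is a void *. */
void
toy_read_bytes(const void *addr, size_t size, void *datap)
{
	assert(addr != NULL && datap != NULL);
	memcpy(datap, addr, size);
}

uint32_t
toy_read_u32(const void *addr)
{
	uint32_t v;

	/* No (char *)&v cast: any object pointer converts to void *. */
	toy_read_bytes(addr, sizeof(v), &v);
	return v;
}
```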
With input from bluhm@. Bugs caught by Clemens Gossnitzer (ASCII
approximation of name).
Thread: https://marc.info/?l=openbsd-tech&m=170740813021636&w=2
ok bluhm@
|
|
Currently, clockintr_establish() calls malloc(9) to allocate a
clockintr struct on behalf of the caller. mpi@ says this behavior is
incompatible with dt(4). In particular, calling malloc(9) during the
initialization of a PCB outside of dt_pcb_alloc() is (a) awkward and
(b) may conflict with future changes/optimizations to PCB allocation.
To side-step the problem, this patch changes the clockintr subsystem
to use caller-allocated clockintr structs instead of callee-allocated
structs.
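The caller-allocated pattern might look roughly like the following. This is a sketch under assumptions: struct toy_clockintr, toy_clockintr_bind() and struct toy_pcb are invented for illustration and do not reproduce the real clockintr or dt(4) definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle; the real struct clockintr has different fields. */
struct toy_clockintr {
	void (*func)(void *);
	void *arg;
	int bound;
};

/* The caller supplies the storage, so nothing is allocated here. */
void
toy_clockintr_bind(struct toy_clockintr *cl, void (*func)(void *), void *arg)
{
	assert(cl != NULL && func != NULL);
	cl->func = func;
	cl->arg = arg;
	cl->bound = 1;
}

/* A consumer embeds the handle in its own object, as a PCB could. */
struct toy_pcb {
	int nticks;
	struct toy_clockintr clock;
};

void
toy_pcb_tick(void *arg)
{
	struct toy_pcb *pcb = arg;

	pcb->nticks++;
}

/* One allocation (the PCB itself) covers the embedded handle too. */
void
toy_pcb_init(struct toy_pcb *pcb)
{
	pcb->nticks = 0;
	toy_clockintr_bind(&pcb->clock, toy_pcb_tick, pcb);
}
```

Because the handle lives inside the consumer's object, initializing the object never needs a second allocation that could fail or sleep.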
clockintr_establish() is named after softintr_establish(), which uses
malloc(9) internally to create softintr objects. The clockintr subsystem
is no longer using malloc(9), so the "establish" naming is no longer apt.
To avoid confusion, this patch also renames "clockintr_establish" to
"clockintr_bind".
Requested by mpi@. Tweaked by mpi@.
Thread: https://marc.info/?l=openbsd-tech&m=170597126103504&w=2
ok claudio@ mlarkin@ mpi@
|
|
off_t argument, there is no need to process more than 6 arguments on
64-bit platforms and 8 on 32-bit platforms.
Make the syscall argument gathering code simpler by removing never-used code
to fetch more arguments from the stack, and local argument arrays when pointing
to the trap frame does the job.
ok guenther@ jsing@
|
|
OK miod@
|
|
|
|
it is a dangerous alternative entry point for all system calls, and thus
incompatible with the precision system call entry point scheme we are
heading towards. This has been a 3-year mission:
First perl needed a code-generated wrapper to fake syscall(2) as a giant
switch table, then all the ports were cleaned with relatively minor fixes,
except for "go". "go" required two fixes -- 1) a framework issue with
old library versions, and 2) like perl, a fake syscall(2) wrapper to
handle ioctl(2) and sysctl(2) because "syscall(SYS_ioctl" occurs all over
the place in the "go" ecosystem because the "go developers" are plan9-loving
unix-hating folk who tried to build an ecosystem without allowing "ioctl".
ok kettenis, jsing, afresh1, sthen
|
|
descriptor (pted) pool in the arm64 pmap implementation. This
significantly reduces the side-effects of lock contention on the kernel
map lock that is (incorrectly) translated into excessive page daemon
wakeups. This is not a perfect solution but it does lead to significant
speedups on machines with many CPU cores.
This requires adding a new pmap_init_percpu() function that gets called
at the point where kernel is ready to set up the per-CPU pool caches.
Dummy implementations of this function are added for all non-arm64
architectures. Some other architectures can probably benefit from
providing an actual implementation that sets up per-CPU caches for
pmap pools as well.
ok phessler@, claudio@, miod@, patrick@
|
|
This patch isolates profil(2) and GPROF from statclock(). Currently,
statclock() implements both profil(2) and GPROF through a complex
mechanism involving both platform code (setstatclockrate) and the
scheduler (pscnt, psdiv, and psratio). We have a machine-independent
interface to the clock interrupt hardware now, so we no longer need to
do it this way.
- Move profil(2)-specific code from statclock() to a new clock
interrupt callback, profclock(), in subr_prof.c. Each
schedstate_percpu has its own profclock handle. The profclock is
enabled/disabled for a given CPU when it is needed by the running
thread during mi_switch() and sched_exit().
- Move GPROF-specific code from statclock() to a new clock interrupt
callback, gmonclock(), in subr_prof.c. Where available, each cpu_info
has its own gmonclock handle. The gmonclock is enabled/disabled for
a given CPU via sysctl(2) in prof_state_toggle().
- Both profclock() and gmonclock() have a fixed period, profclock_period,
that is initialized during initclocks().
- Export clockintr_advance(), clockintr_cancel(), clockintr_establish(),
and clockintr_stagger() via <sys/clockintr.h>. They have external
callers now.
- Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete
spc_pscnt and spc_psdiv. The statclock frequency is not dynamic
anymore so these variables are now useless.
- Delete code/state related to the dynamic statclock frequency from
kern_clockintr.c. The statclock frequency can still be pseudo-random,
so move the contents of clockintr_statvar_init() into clockintr_init().
With input from miod@, deraadt@, and claudio@. Early revisions
cleaned up by claudio@. Early revisions tested by claudio@. Tested by
cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v).
Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation
bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on
powerpc64 by gkoehler@.
|
|
Every platform made the clockintr switch at least six months ago.
The __HAVE_CLOCKINTR symbol is now redundant. Remove it.
Prompted by claudio@.
Link: https://marc.info/?l=openbsd-tech&m=168826181015032&w=2
"makes sense" mlarkin@
|
|
This adds stacktrace_save_utrace() to extract and save the userland stack
which is stubbed out on most archs. alpha and riscv64 do not even implement
dt(4) and stacktrace_save_at() so the stubs are excluded there.
Additionally add a new ioctl DTIOCGETAUXBASE which allows btrace to
fetch the AUX_BASE value from the AUX vector of a process.
OK mpi@ (some time ago) discussed with kettenis@
|
|
ever ran on, and it's unlikely to ever be implemented, so remove it.
ok jsg@
|
|
feedback and ok jmc@ miod, ok millert@
|
|
This fixes a possible freeze in execve(2). It sometimes froze when a
dual-cpu macppc started daemons during boot. There is a chance that
uvm_map.c uvmspace_exec sees ovm->vm_refcnt != 1 and switches curproc
to a new pmap. If this happened, then execve froze by trying to
copyout to the wrong pmap; curpcb->pcb_pm was old. Fix by setting
pointers when uvmspace_exec calls pmap_activate.
ok miod@
|
|
it are now unpadded
ok kettenis guenther
|
|
The code was reading pg->pg_flags, so clang assumed pg != NULL, then
optimized a later "if (pg != NULL)" to "if (1)", and allowed a call to
pmap_enter_pv(pted, NULL). Such a call can freeze bsd.mp by trying to
lock NULL's ((struct mutex *)0x3c). I froze bsd.mp this way by
starting Xorg on a macppc with nv(4) or r128(4) video, as it tried to
mmap the xf86(4) aperture.
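The hazard can be shown in portable C. The message does not say how the real fix was structured, so the following is only one way to avoid the problem; struct toy_page and toy_should_track() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_page. */
struct toy_page {
	unsigned int pg_flags;
};

/*
 * Returns 1 if the caller should track the mapping for this page,
 * 0 if there is no managed page (pg == NULL), e.g. a device aperture.
 */
int
toy_should_track(const struct toy_page *pg, unsigned int *flagsp)
{
	assert(flagsp != NULL);

	/*
	 * Test pg before the first dereference. Reading pg->pg_flags
	 * first would let the compiler assume pg != NULL and fold any
	 * later "if (pg != NULL)" into "if (1)".
	 */
	if (pg == NULL) {
		*flagsp = 0;
		return 0;
	}
	*flagsp = pg->pg_flags;
	return 1;
}
```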
ok miod@
|
|
x86 macros but have never been implemented and never been used either.
|
|
against classic BROP with a range-checking wrapper in front of copyin() and
copyinstr() which ensures the userland source doesn't overlap the main program
text, ld.so text, signal tramp text (its mapping is hard to distinguish
so it comes along for the ride), or libc.so text. ld.so tells the kernel
libc.so text range with msyscall(2). The range checking for 2-4 elements is
done without locking (because all 4 ranges are immutable!) and is inexpensive.
write(sock, &open, 400) now fails with EFAULT. No programs have been
discovered which require reading their own text segments with a system call.
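A lock-free overlap check of this kind could be sketched as below. This assumes half-open [start, end) ranges held in a small fixed array; struct toy_range and toy_check_src() are hypothetical names, not the kernel's wrapper.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical immutable text range, half-open: [start, end). */
struct toy_range {
	uintptr_t start;
	uintptr_t end;
};

/*
 * Returns EFAULT if the source buffer [src, src + len) overlaps any
 * protected range, else 0. The ranges never change after setup, so
 * the scan needs no lock.
 */
int
toy_check_src(const struct toy_range *r, size_t nr, uintptr_t src, size_t len)
{
	uintptr_t end = src + len;
	size_t i;

	assert(r != NULL || nr == 0);
	if (len == 0)
		return 0;		/* nothing would be read */
	if (end < src)
		return EFAULT;		/* address arithmetic wrapped */
	for (i = 0; i < nr; i++) {
		if (src < r[i].end && end > r[i].start)
			return EFAULT;
	}
	return 0;
}
```

A wrapper would call such a check first and only then hand the request to the real copy routine.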
On a machine without mmu enforcement, a test program reports the following:
                  userland      kernel
ld.so             readable      unreadable
mmap xz           unreadable    unreadable
mmap x            readable      readable
mmap nrx          readable      readable
mmap nwx          readable      readable
mmap xnwx         readable      readable
main              readable      unreadable
libc unmapped?    readable      unreadable
libc mapped       readable      unreadable
ok kettenis, additional help from miod
|
|
The G5 PowerPC 970 has a Data Address Compare mechanism that can trap
loads and stores to pages with PTE_AC_64, while allowing instruction
fetches. Use this for execute-only mappings, like we do on powerpc64.
Add a check to pte_spill_v for execute-only mappings. Without this,
we would forever retry reading an execute-only page.
In altivec_assist, copyin would fail to read the instruction from an
execute-only page. Add copyinsn to bypass x-only, like sparc64.
with help from abieber@ deraadt@ kettenis@
ok deraadt@
|
|
|
|
arguments to mmap) because it was using syscall(2) and that callpath
is invisible in ktrace. make it visible, it will now show "(via syscall)"
and such.
ok guenther
|
|
Each pmap sets a bit in usedsr to claim 16 unique VSIDs for its
segment registers. Use atomic_cas_uint to set this bit (checking that
the other cpu didn't steal it) and atomic_clearbits_int to clear it.
Stop using splvm.
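The claim/release pattern can be sketched with C11 atomics. This is an approximation: the kernel uses atomic_cas_uint and atomic_clearbits_int, while the sketch uses <stdatomic.h>, and it models the bitmap as a single hypothetical word, toy_usedsr.

```c
#include <assert.h>
#include <stdatomic.h>

#define TOY_NSLOTS	32	/* bits in the single-word bitmap of this sketch */

static _Atomic unsigned int toy_usedsr;

/* Claim the lowest free bit; returns its index, or -1 if all are taken. */
int
toy_slot_alloc(void)
{
	unsigned int old, bit;
	int i;

	for (i = 0; i < TOY_NSLOTS; i++) {
		bit = 1U << i;
		old = atomic_load(&toy_usedsr);
		/* Retry while the bit still looks free to us. */
		while ((old & bit) == 0) {
			/*
			 * On failure, old is refreshed with the current
			 * value; the loop condition then notices if
			 * another CPU took this bit first.
			 */
			if (atomic_compare_exchange_weak(&toy_usedsr, &old,
			    old | bit))
				return i;
		}
	}
	return -1;
}

/* Release a previously claimed bit. */
void
toy_slot_free(int i)
{
	assert(i >= 0 && i < TOY_NSLOTS);
	atomic_fetch_and(&toy_usedsr, ~(1U << i));
}
```

Neither function blocks interrupts, which is the point of dropping splvm in favor of atomic operations.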
ok miod@
|
|
|
|
|
|
include PROT_READ, otherwise faults on executable pages mapped only as
PROT_EXEC will not work.
ok deraadt@
|
|
is ELF" world. Eliminate use of them in amd64, arm64, armv7, i386,
macppc, mips64, and sparc64 code.
ok deraadt@ jca@ krw@
|
|
- Remove powerpc-specific clock interrupt scheduling bits from cpu_info.
- Remove macppc-specific randomized statclock() bits from macppc/clock.c.
- Remove the 'stat_count' evcount. All clock interrupts are now counted
via the 'clock_count' evcount.
- Wire up dec_intrclock.
Bringup help from gkoehler@. The patch has survived five or six
kernel-release-upgrade cycles on my dual-core PowerMac3,6.
Link: https://marc.info/?l=openbsd-tech&m=166776385003520&w=2
ok gkoehler@ mlarkin@
|
|
fork/vfork/__tfork haven't cared about the second return register.
So, stop setting retval[1] in kern_fork.c and stop setting the
second return register in the MD child_return() routines.
With the above, we have no multi-register return values on LP64,
so stop touching that register in the trapframe on those archs.
testing miod@ and aoyama@
ok miod@
|
|
used by cpu_fork()
ok miod@ kettenis@ mpi@ deraadt@
|
|
The old CPU in a macppc traps AltiVec instructions when they encounter
denormal or subnormal floats. Emulate most of them. They operate on
vectors of 4 single-precision floats. The emulations either use
scalar operations (so vmaddfp becomes 4 of fmadds) or a formula (like
vrsqrtefp's 1 / sqrt(b) = 1 / sqrt(b * 2**126) * 2**63).
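The vrsqrtefp identity quoted above follows from pulling the power of two out of the square root. The remark about why 2**126 works is my reading of the single-precision range, not something the message states.

```latex
\frac{1}{\sqrt{b \cdot 2^{126}}} \cdot 2^{63}
  = \frac{1}{\sqrt{b} \cdot 2^{63}} \cdot 2^{63}
  = \frac{1}{\sqrt{b}}
```

The smallest positive subnormal single is 2**-149, and 2**-149 * 2**126 = 2**-23, which is a normal number, so the scaled operand no longer hits the subnormal case.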
I am forgetting to emulate some instructions (at least vrfin, vrfiz,
vrfip, vrfim). If I don't emulate it, it will still cause SIGFPE.
Mac OS never emulated these instructions, but set AltiVec's "non-Java"
NJ bit (which changes all subnormal floats to zero). FreeBSD also
sets NJ; NetBSD does SIGFPE; Linux emulates them. The POWER9 running
OpenBSD/powerpc64 does them in hardware (without trapping).
ok kettenis@ miod@
|
|
The powerpc64 part is under #if 0, so this change affects only macppc.
Simplify powerpc64's __syncicache (which had size_t len) and copy it
to macppc's syncicache (which had int len).
macppc was looping while ((l -= CACHELINESIZE) > 0). The loop would
be infinite if l became an unsigned type like size_t. It is simpler
to set size_t i = 0, do i += by, and loop while (i < len). It helps
that dcbst and icbi can add 2 registers, from + i.
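The count-up loop can be sketched as follows. toy_count_lines() is a hypothetical helper that only counts iterations; the real routine issues dcbst and icbi on from + i instead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Counts how many cache lines a flush loop would touch when it walks
 * len bytes in steps of by. Counting up with "i < len" is safe for an
 * unsigned type; counting down with "(l -= by) > 0" is not, because an
 * unsigned l wraps around instead of going negative.
 */
size_t
toy_count_lines(size_t len, size_t by)
{
	size_t i, n = 0;

	assert(by != 0);	/* a zero stride would never advance */
	for (i = 0; i < len; i += by)
		n++;		/* real code: dcbst/icbi on from + i */
	return n;
}
```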
|
|
|
|
Use that define to shunt uvm_swapout_threads(), which is a noop when
pmap_collect() does nothing.
ok mpi@
|
|
ok miod@ mpi@ gnezdo@
|
|
ok miod@ guenther@
|
|
On PowerPC, by design, you cannot mask decrementer (DEC) interrupts
without also masking other interrupts that we want to leave unmasked
at or above IPL_CLOCK. So, currently, the DEC is left unmasked, even
when we're working at IPL_CLOCK or IPL_HIGH. If a DEC interrupt
arrives while we're at those priority levels, the current solution is
to postpone any clock interrupt work until the next hardclock(9) or
statclock tick.
This is a problem for a machine-independent clock interrupt subsystem
because the MD code, e.g. decr_intr(), ideally shouldn't need to know
anything about when the next event is scheduled to occur.
The most obvious solution to this problem that I can think of is to
instead postpone clock interrupt work until the next time our priority
level drops below IPL_CLOCK. This is something we can do from the MD
code without any knowledge of when the next clock interrupt event is
scheduled to occur.
So:
- Add a new boolean, ci_dec_deferred, to the PowerPC cpu_info struct.
- If we reach decr_intr() when the CPU's priority level is too high,
set ci_dec_deferred, clear the DEC exception, and return.
- If we reach decr_intr() and the CPU's priority level is low enough,
clear ci_dec_deferred and do any needed clock interrupt work.
- In splx(9) (there are three different versions we need to update),
check ci_dec_deferred. If it's set and our priority level is
dropping below IPL_CLOCK, raise a DEC exception.
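The steps above can be modelled as a small state machine. This is a simulation under assumptions, not the MD code: struct toy_cpu, the dec_pending flag standing in for a raised DEC exception, and the TOY_IPL_* values are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IPL_NONE	0
#define TOY_IPL_CLOCK	10	/* arbitrary value for this sketch */

struct toy_cpu {
	int ci_cpl;		/* current priority level */
	bool ci_dec_deferred;	/* clock work postponed? */
	bool dec_pending;	/* models a raised DEC exception */
	int clock_work;		/* how often clock work actually ran */
};

/* Models decr_intr(): defer if the priority level is too high. */
void
toy_decr_intr(struct toy_cpu *ci)
{
	assert(ci != NULL);
	ci->dec_pending = false;		/* clear the DEC exception */
	if (ci->ci_cpl >= TOY_IPL_CLOCK) {
		ci->ci_dec_deferred = true;
		return;
	}
	ci->ci_dec_deferred = false;
	ci->clock_work++;			/* do the clock interrupt work */
}

/* Models splx(9): re-raise the DEC when dropping below IPL_CLOCK. */
void
toy_splx(struct toy_cpu *ci, int newcpl)
{
	assert(ci != NULL);
	ci->ci_cpl = newcpl;
	if (ci->ci_dec_deferred && newcpl < TOY_IPL_CLOCK)
		ci->dec_pending = true;
}

/* Models the hardware delivering a pending DEC exception. */
void
toy_deliver(struct toy_cpu *ci)
{
	if (ci->dec_pending)
		toy_decr_intr(ci);
}
```

Note that neither function needs to know when the next clock event is due, which is the property the message is after.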
Tested by me on PowerMac7,3 (openpic). Tested by miod@ on PowerMac1,1
(macintr) (`make build` completes). Tested by gkoehler@ on an unknown
PowerMac (probably openpic).
With lots of help from kettenis@.
ok gkoehler@ miod@
|
|
Previously for __cpu_simple_lock parts. Now only hppa and m88k use
__cpu_simple_lock (and hppa uses atomic.h for it).
ok miod@ visa@
|
|
setting the binding to global (NB == "no binding"), as clang 13 is
now warning about changing the binding from global to weak. Use
them for bcopy, brk, and sbrk.
Add the '.L' prefix to internal labels in the bcopy implementation
to remove them from the symbol table.
Start using the MI DEFS.h: delete the #defines from powerpc/SYS.h
that the MI DEFS.h provides and switch from SYS.h to DEFS.h in files
that don't do syscalls. Use END_BUILTIN from the MI DEFS.h for ffs.
ok gkoehler@
|
|
exception addresses past EXC_LAST, making its definition wrong.
Replace it with EXC_END, which points to the end of hardware exception
addresses, and adjust logic accordingly.
ok kettenis@
|
|
ok jca@
|
|
Define a consistently named db_machine_command_table[] across all
archs that implement the MD "machine" command, and hook this into
the main command table instead of patching it at runtime.
ok mpi@ jca@
|
|
ok gkoehler@
|
|
These stubs don't work; they only pretend to suspend the machine.
SUSPEND + MULTIPROCESSOR doesn't build. zzz(8) stops giving an error
message, even in no-SUSPEND kernels.
Add intr_enable in <powerpc/cpu.h>, adapted from powerpc64, because
subr_suspend.c calls intr_enable().
|
|
In the powerpc pmap, hash collisions can spill page table entries.
Page faults can use pte_spill_v to reinsert a spilled pte. If the
fault is a write (DSISR_STORE), then pte_spill_v tries to check for a
read-only page. The existing check (pte_lo & PTE_RO_64) also matched
rw pages, because PTE_RO_64 is 3 and PTE_RW_64 is 2. This caused
pte_spill_v to deny writes to rw pages. Then uvm_fault might allow
the write; but uvm_fault can't handle some pages in the kernel. Such
faults caused, "panic: uvm_fault: fault on non-pageable map", or
"panic: trap type 300".
Change it to ((pte_lo & PTE_PP_64) == PTE_RO_64). This seems to fix
one reason why bsd.mp on a macppc dual G5 might panic.
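The difference between the two checks is easy to see with the values given above. The TOY_ prefix marks these as local copies; the values 2 and 3 come from the message, while the mask value 3 for PTE_PP_64 is my assumption (a mask covering the two protection bits).

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PTE_RW_64	2u	/* from the message */
#define TOY_PTE_RO_64	3u	/* from the message */
#define TOY_PTE_PP_64	3u	/* assumed: mask over the 2-bit field */

/* Old check: true whenever any bit of RO is set, which includes RW. */
int
toy_is_ro_old(uint64_t pte_lo)
{
	/* RO and RW share bit 1; that overlap is why this test misfires. */
	assert((TOY_PTE_RO_64 & TOY_PTE_RW_64) != 0);
	return (pte_lo & TOY_PTE_RO_64) != 0;
}

/* New check: extract the whole field, then compare for equality. */
int
toy_is_ro_new(uint64_t pte_lo)
{
	return (pte_lo & TOY_PTE_PP_64) == TOY_PTE_RO_64;
}
```

For a read-write pte, toy_is_ro_old() reports read-only and toy_is_ro_new() does not.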
ok kettenis@ miod@
|
|
If cpu0 sends PPC_IPI_DDB to cpu1, then cpu1 stops on its interrupt
stack. Teach ININTSTK to allow traces through all interrupt stacks,
not only cpu0's.
ININTSTK now works by looping for all cpus. It doesn't remember which
cpu owns the stack. A macppc has at most 4 cpus.
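A loop of this shape might look like the sketch below. The layout is assumed: struct toy_cpu_info, the ci_intstk field holding the lowest stack address, and the 8192-byte stack size are all hypothetical, and the real ININTSTK may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAXCPUS	4	/* the message says at most 4 cpus */
#define TOY_INTSTK	8192u	/* assumed interrupt stack size */

struct toy_cpu_info {
	uintptr_t ci_intstk;	/* assumed: lowest stack address, 0 if absent */
};

/*
 * Returns 1 if sp lies within any cpu's interrupt stack, else 0.
 * It does not report which cpu owns the stack.
 */
int
toy_inintstk(const struct toy_cpu_info *cpus, uintptr_t sp)
{
	int i;

	assert(cpus != NULL);
	for (i = 0; i < TOY_MAXCPUS; i++) {
		if (cpus[i].ci_intstk == 0)
			continue;	/* cpu not present */
		if (sp >= cpus[i].ci_intstk &&
		    sp - cpus[i].ci_intstk < TOY_INTSTK)
			return 1;
	}
	return 0;
}
```

With so few cpus, a linear scan on every check is cheap enough that remembering the owner is unnecessary.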
ok kettenis@ miod@
|
|
do not bother operating on its first 8 bytes, which will always be zero.
ok visa@
|
|
Edit db_regs[] in db_trace.c on both powerpc and powerpc64, so ddb can
access $r14, $r15, $r16, $dar, $dsisr.
Only for powerpc: change db_trap_glue to copy all registers to and
from ddb_regs (it was skipping some); change db_set_single_step and
db_clear_single_step to flip the correct bit of srr1; delete
FIXUP_PC_AFTER_BREAK, which was off by 1 instruction.
"ddb{1}> s" on my PowerMac7,3 (dual G5 at 2700 MHz) began to panic
like, "*cpu0: mutex 0xa7d0a0 not held in tc_update_timekeep". Add an
arbitrary delay(100) after sending PPC_IPI_DDB; I want cpu0 to get the
ipi before it can see db_active == 1 and skip acquiring a mutex.
ok kettenis@
|
|
of the kernel memory. Found with clang static analyzer.
Feedback and ok gkoehler@
ok bluhm@
|