Age | Commit message | Author |
|
|
|
be kept alive longer than expected, causing syzkaller to no longer be
able to enable remote coverage.
ok visa@
Reported-by: syzbot+ab2016d729cda7b0d003@syzkaller.appspotmail.com
|
|
Currently the unix(4) socket garbage collector always destroys any socket
for which "fp->f_count == unp->unp_msgcount" holds. This is wrong
because unix(4) sockets that are in flight within an SCM_RIGHTS message but
closed on the sender side also satisfy this equation. Such sockets are not
in a loop, and if the garbage collector kills them before they are received,
we get a kernel panic.
FreeBSD has already reworked its garbage collector to fix this issue [1]. The
logic is pretty simple, so import it into our garbage collector.
1. https://reviews.freebsd.org/D23142
ok bluhm@
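The distinction the message draws, between a socket that is merely in flight and one that is truly unreachable, can be illustrated with a small userland model. This is a textbook reachability sketch and not the logic imported from FreeBSD; `struct msock`, `MAXSOCK` and `mark_unreachable()` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXSOCK 8	/* hypothetical limit for this model */

/* Hypothetical model of a unix(4) socket, for illustration only. */
struct msock {
	int f_count;		/* references to the file */
	int msgcount;		/* references held by in-flight SCM_RIGHTS messages */
	int inflight[MAXSOCK];	/* indices of sockets queued in our receive buffer */
	int ninflight;
	bool live;		/* result of the mark pass */
};

/*
 * Mark every socket that has a reference from outside in-flight messages,
 * then everything that can still be received through a marked socket.
 * Returns the number of sockets left unmarked, i.e. unreachable.
 */
int
mark_unreachable(struct msock *s, int n)
{
	bool changed;
	int i, j, dead = 0;

	for (i = 0; i < n; i++)
		s[i].live = s[i].f_count > s[i].msgcount;

	do {
		changed = false;
		for (i = 0; i < n; i++) {
			if (!s[i].live)
				continue;
			for (j = 0; j < s[i].ninflight; j++) {
				int k = s[i].inflight[j];

				assert(k >= 0 && k < n);
				if (!s[k].live) {
					s[k].live = true;
					changed = true;
				}
			}
		}
	} while (changed);

	for (i = 0; i < n; i++)
		if (!s[i].live)
			dead++;
	return dead;
}
```

In this model a socket whose counts are equal but which sits in the receive buffer of an open socket stays marked live, while two closed sockets that only reference each other are both reported as unreachable.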
|
|
When a tty device is revoked, the associated knotes should be
invalidated. Otherwise the user processes can keep on receiving
events from the device.
It appears tricky to do the invalidation as part of revocation
in a way that does not allow unwanted event registration or clutter
the tty code. For now, make the knotes invalid lazily before delivery.
OK mpi@
|
|
OK mpi@
|
|
|
|
Switch libc and ld.so to the generic stubs for these calls.
WARNING: reboot to updated kernel before installing libc or ld.so!
Time for a story...
When gcc (back in 1.x days) first implemented long long, it didn't (always)
pass 64bit arguments in 'aligned' registers/stack slots, with the result that
argument offsets didn't match structure offsets. This affected the nine system
calls that pass off_t arguments:
ftruncate lseek mmap mquery pread preadv pwrite pwritev truncate
To avoid having to do custom ASM wrappers for those, BSD put an explicit pad
argument in so that the off_t argument would always start on an even slot and
thus be naturally aligned. Thus those odd wrappers in lib/libc/sys/ that use
__syscall() and pass an extra '0' argument.
The ABIs for different CPUs eventually settled how things should be passed on
each and gcc 2.x followed them. The only arch now where it helps is landisk,
which needs to skip the last argument register if it would be the first half of
a 64bit argument. So: add new syscalls without the pad argument and on landisk
do that skipping directly in the syscall handler in the kernel. Keep compat
support for the existing syscalls long enough for the transition.
ok deraadt@
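The offset mismatch in the story can be worked through with a little arithmetic. The sketch below assumes 4-byte argument slots and 8-byte natural alignment for off_t, which are assumptions for a generic 32-bit ABI rather than a description of any particular architecture; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_BYTES 4	/* assumed size of one argument slot */
#define OFFT_ALIGN 8	/* assumed natural alignment of a 64-bit off_t */

/* Byte offset of the off_t when arguments are packed slot after slot. */
size_t
packed_offset(size_t nslots_before)
{
	return nslots_before * SLOT_BYTES;
}

/* Byte offset of the off_t in a naturally aligned structure. */
size_t
aligned_offset(size_t nslots_before)
{
	size_t off = nslots_before * SLOT_BYTES;

	return (off + OFFT_ALIGN - 1) / OFFT_ALIGN * OFFT_ALIGN;
}

/* Number of explicit pad slots needed so that both layouts agree. */
size_t
pad_slots(size_t nslots_before)
{
	assert(aligned_offset(nslots_before) >= packed_offset(nslots_before));
	return (aligned_offset(nslots_before) -
	    packed_offset(nslots_before)) / SLOT_BYTES;
}
```

For a call shaped like lseek(fd, offset, whence), one slot precedes the off_t, so the packed offset is 4 while the aligned offset is 8, and pad_slots(1) yields the single pad slot the old wrappers passed as an extra '0'.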
|
|
OK anton@
|
|
allocated memory could be too short for the list of disks. Retry
allocating enough space until the required size no longer changes.
The disk list and duid memory are protected by kernel lock. Use
asserts to mark this explicitly.
Reported-by: syzbot+807423f6868bbfb836bc@syzkaller.appspotmail.com
OK anton@ mpi@
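The retry pattern described here can be sketched in userland as follows. This is a minimal illustration, not the committed kernel code: `disknames`, `sleeping_alloc()` and `snapshot_disknames()` are hypothetical, and the allocator deliberately changes the list once to simulate another context running while the allocation sleeps.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared list; another context may replace it. */
const char *disknames = "sd0:";

/* Hypothetical allocator that "sleeps": the list grows once while it runs. */
void *
sleeping_alloc(size_t len)
{
	static int grown;

	if (!grown) {
		grown = 1;
		disknames = "sd0:,sd1:";
	}
	return malloc(len);
}

/* Copy the list, retrying the allocation until the size stays the same. */
char *
snapshot_disknames(void *(*alloc)(size_t))
{
	char *buf = NULL;
	size_t len, newlen;

	assert(alloc != NULL);
	newlen = strlen(disknames) + 1;
	do {
		free(buf);
		len = newlen;
		buf = alloc(len);
		if (buf == NULL)
			return NULL;
		/* Re-check the size after the possible sleep. */
		newlen = strlen(disknames) + 1;
	} while (newlen != len);

	memcpy(buf, disknames, len);
	return buf;
}
```

The copy only happens once the size measured before and after the allocation agrees, so the buffer is never shorter than the list it receives.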
|
|
kern.shminfo.shmseg is set to something ridiculously large.
ok kettenis@ millert@
Reported-by: syzbot+9f1b201cdbc97b19c7f5@syzkaller.appspotmail.com
|
|
When a knote uses the dead event filter, the knote's file descriptor is
not supposed to point to an object with pending out-of-band data. Make
the knote inactive so that userspace will not receive a spurious event.
However, kqueue-based poll(2) should still receive HUP notifications.
This lets the system use dead_filtops with fewer strings attached
relative to the filter type.
|
|
Do not assume event status in the modify and process callbacks. Instead
always run the event filter so that it has a chance to set knote flags.
The filter can also indicate event inactivity.
|
|
* Implement EVFILT_EXCEPT for ttys for HUP condition detection.
This filter is used when pollfd.events has no read/write events.
* Add HUP condition detection to filt_ptcwrite() and filt_ttywrite()
to reflect ptcpoll() and ttpoll(). Only poll(2) and select(2) can
utilize the code; kevent(2) should behave as before with EVFILT_WRITE.
* Clear EV_EOF and __EV_HUP if the EOF/HUP condition ends.
OK mpi@
|
|
|
|
-1 is sufficient to indicate the process had no controlling tty, removing
one more sys/param.h include in our userland
ok millert
|
|
Restrict the circumstances where EVFILT_EXCEPT filters trigger:
* when out-of-band data is present and NOTE_OOB is requested.
* when the channel is fully closed and consumer is poll(2).
This should clarify the logic and suppress events that kqueue-based
poll(2) does not expect.
OK mpi@
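The two trigger conditions listed above amount to a small predicate. The sketch below only restates them; `X_NOTE_OOB`, `X_EV_POLL`, `struct chan_state` and `except_triggers()` are hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags for this sketch; not the kernel's definitions. */
#define X_NOTE_OOB	0x1	/* caller asked for out-of-band notification */
#define X_EV_POLL	0x2	/* knote was registered by kqueue-based poll(2) */

struct chan_state {
	bool oob_pending;	/* out-of-band data is waiting */
	bool fully_closed;	/* the channel is fully closed */
};

/* Return true if an exception filter should report an event. */
bool
except_triggers(const struct chan_state *cs, int flags)
{
	assert(cs != NULL);
	if (cs->oob_pending && (flags & X_NOTE_OOB))
		return true;
	if (cs->fully_closed && (flags & X_EV_POLL))
		return true;
	return false;
}
```

Under this model a fully closed channel produces no event for a plain kevent(2) consumer that asked for NOTE_OOB, which is the kind of event the restriction is meant to suppress.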
|
|
Currently, the only intended direct usage of the EVFILT_EXCEPT filter
is with NOTE_OOB to detect out-of-band data in ptys and sockets.
NOTE_OOB does not apply to FIFOs or pipes. Prevent the user from
registering the filter with these file types. The filter code is for
the kernel's internal use.
OK mpi@
|
|
Pass the device vnode as a parameter to VOP_STRATEGY() to allow calling
the correct vop_strategy callback. Now the vnode is also available
in the callback.
OK mpi@
|
|
Make __EV_POLL specific to kqueue-based poll(2), to remove overlap
with __EV_SELECT that only select(2) uses.
OK millert@ mpi@
|
|
first __tfork(2)"
The immediate issue is that a process linked with -znow will still
perform lazy relocation on objects loaded with dlopen(), but there
are possibly other dark corners to plumb to find a better invariant.
Problem reported by thfr@
|
|
SYS_syscall as the nosys() function into the MD syscall entry
routines and the SYSCALL_DEBUG support. Adjust alpha's syscall
check to match the other archs. Also, make sysent const to get it
into .rodata.
With that, 'struct emul' is unused: delete it and all its references
ok millert@
|
|
Prevent select(2) from indicating an exceptional condition when the
other end of a FIFO or pipe is closed.
Originally, select(2) returned an exceptfds event only with a pty or
socket that has out-of-band data pending. millert@ says that OpenBSD
diverged from this by accident when poll(2) and select(2) were changed
to use the same backend code in 2003.
OK millert@
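The case this change addresses can be probed from userland with a short test helper. The sketch below uses only standard POSIX calls; the function name `pipe_except_after_close()` is hypothetical. With the behaviour described above, select(2) should report no descriptor in exceptfds, so the helper would be expected to return 0.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <assert.h>
#include <unistd.h>

/*
 * Create a pipe, close the write end, and ask select(2) whether the read
 * end has an exceptional condition. Returns select(2)'s result, or -1 if
 * the pipe could not be created.
 */
int
pipe_except_after_close(void)
{
	struct timeval tv = { 0, 0 };	/* poll, do not block */
	fd_set efds;
	int p[2], n;

	if (pipe(p) == -1)
		return -1;
	assert(p[0] < FD_SETSIZE);
	close(p[1]);			/* the other end goes away */

	FD_ZERO(&efds);
	FD_SET(p[0], &efds);
	n = select(p[0] + 1, NULL, NULL, &efds, &tv);

	close(p[0]);
	return n;
}
```

A return value of 1 would indicate that the closed peer is being reported as an exceptional condition, which is the divergence the commit removes.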
|
|
exec_elf_fixup() and coredump_elf() in <sys/exec_elf.h> and call
them and the MD setregs() directly in kern_exec.c and kern_sig.c
Also delete e_name[] (only used by sysctl), e_errno (unused), and
e_syscallnames[] (only used by SYSCALL_DEBUG) and constipate
syscallnames to 'const char *const[]'
ok kettenis@
|
|
sigobject. Just use the existing globals for the former and use a
global for the latter.
ok jsg@ kettenis@
|
|
The kqueue-based select(2) needs the filter to replicate the old
exceptfds behaviour. The upcoming new poll(2) code will use the filter
for POLLHUP condition checking when the events bitmap is clear of
read/write events.
OK anton@
|
|
of the auxinfo is fixed: provide ELF_AUX_WORDS in <sys/exec_elf.h>
as a replacement for emul->e_arglen
ok millert@
|
|
rwlock(9).
This saves us from races caused by unlocked access to `f_count',
which could falsely mark a live socket as dead. We always modify `f_count'
and `unp_msgcount' together, so the `f_count' modification should also take
the `unp_gc_rwlock' before the `unp_msgcount' increment and after the
`unp_msgcount' decrement. The locked `unp_file' assignment avoids us from
drain unp_gc() run.
This moves unp_gc() locking back to when these variables were protected by
the same lock, which was held for the whole garbage collector run, but uses
a lock other than `unp_lock'.
ok kettenis@ bluhm@
|
|
copyargs() return 0/1 and merge elf_copyargs() into it. Rename
ep_emul_arg and ep_emul_argp to have clearer meaning and type and
eliminate ep_emul_argsize as no longer necessary. Make sure
ep_auxinfo (nee ep_emul_argp) is initialized as powerpc64 always
uses it in setregs().
ok semarie@ deraadt@ kettenis@
|
|
To unlock kbind(2) we need to protect ps_kbind_addr and
ps_kbind_cookie.
The simplest way to do this is to disallow kbind(2) initialization
after the first __tfork(2) call. If the first thread does not
initialize the kbind(2) variables before __tfork(2) then we disable
kbind(2) during that first __tfork(2) call.
This is guenther@'s patch, I'm just committing it.
Discussed with guenther@, deraadt@, kettenis@, and mpi@.
ok kettenis@, positive response from mpi@, "I am busy" guenther@
|
|
a pointer to a local variable to allow concurrent use if that ever
needs to happen in the future.
ok mpi kettenis
|
|
a decade, because all tty drivers preallocate.
ok kettenis
|
|
OK mpi@
|
|
|
|
follow the same code path.
ok bluhm@
|
|
When closing a file descriptor and converting the poll/select knotes
into badfd knotes, keep the knotes attached to the by-fd table. This
should prevent kqueue_purge() from returning before the kqueue has
become quiescent. This in turn should fix a
KASSERT(TAILQ_EMPTY(&kq->kq_head)) panic in KQRELE() that bluhm@ has
reported.
The badfd conversion is only needed when a poll/select scan is ongoing.
The system can skip the conversion if the knote is not part of the
active event set.
The code of this commit skips the conversion when the fd is closed by
the same thread that has done the fd polling. This can be improved but
should already cover typical fd usage patterns.
As badfd knotes now hold slots in the by-fd table, kqueue_register()
clears them. poll/select use kqueue_register() to set up a new scan;
any found fd close notification is a leftover from the previous scan.
The new badfd handling should be free of accidental knote accumulation.
This obsoletes kqpoll_dequeue() and lowers kqpoll_init() overhead.
Re-enable lazy removal of poll/select knotes because the panic should
no longer happen.
OK mpi@
|
|
The late 1990s reasons for avoiding __dead with exit1() should not apply
with the current compilers.
This fixes compiler warnings about uninitialized variables in trap.c
on mips64.
Discussed with guenther@ and miod@
|
|
|
|
|
|
OK guenther@
|
|
function. This will make unlocking cursig() & postsig() a bit easier.
OK mpi@
|
|
to get a better order of functions. Also reduce the size of sigprop
to NSIG from NSIG+1. NSIG is defined as 33 and so includes the extra
element for this array.
OK mpi@
|
|
goes on in SMR.
OK mpi@
|
|
The translation to and from kqueue still has major shortcomings.
Discussed with deraadt@
|
|
This makes the kqueue-based poll(2) behave more similarly to the old
code when a monitored file descriptor is closed by another thread.
OK mpi@
|
|
The KERNEL_LOCK() is no longer necessary with rwsleep() and PCATCH
because the sleep machinery now does the locking internally.
OK mpi@
|
|
ok mpi@
|
|
were connected could be closed by a concurrent thread. Check the connection
state and return ECONNREFUSED if the connection was lost.
ok bluhm@
|
|
When the fd set is empty, the code waits for a signal or timeout.
Wakeups from the kqueue are neither expected nor wanted.
OK cheloha@, millert@, anton@, mpi@
|
|
Except for `unp_ino', this leaves only per-socket data protected by
`unp_lock'. Protecting `unp_ino' is not a big deal and will be
done with mutex(9) in a future diff.
The garbage collector flags moved from `unp_flags' to `unp_gcflags'.
Two new locks are introduced to protect garbage collector data. The
`unp_gc_lock' rwlock(9) protects `unp_defer', `unp_gcing', `unp_gcflags'
and the `unp_link' list. The `unp_df_lock' protects the `ud_link' list.
We need to hold `unp_gc_lock' and `unp_lock' simultaneously. When we
perform unp_attach() or unp_detach() we link the PCB to the `unp_link' list
with `unp_lock' held. But when unp_gc() walks the `unp_link' list with
`unp_gc_lock' held, it has to lock the socket while it scans the
`so_rcv' buffer, so the lock order there is the opposite.
In a future diff `unp_lock' will be replaced by the per-socket `so_lock',
so it's better to enforce the `unp_gc_lock' -> `unp_lock' (solock()) lock
order and release `unp_lock' in the unp_attach() and unp_detach() paths.
The previously committed diffs made this safe.
The `unp_df_lock' is introduced because the state of `unp_lock' and
`unp_gc_lock' is unknown when unp_discard() is called. Since it touches only
the `ud_link' list, re-lock dances are unwanted in this path. This also
keeps the M_WAITOK allocation outside rwlock(9) when unp_discard() is called
from the unp_externalize() error path.
ok bluhm@
|
|
which checks PLEDGE_* bits more than once. Some functions are called without
locking, and this avoids misinterpreting bits which have some coupled behaviour.
ok cheloha kettenis
|