path: root/sys/kern
Age | Commit message | Author
2022-01-01 | copyright++; | Jonathan Gray
2021-12-29 | Do not allow send/receive of kcov descriptors as the file descriptor can | Anton Lindqvist
be kept alive longer than expected, causing syzkaller to no longer be able to enable remote coverage. ok visa@ Reported-by: syzbot+ab2016d729cda7b0d003@syzkaller.appspotmail.com
2021-12-26 | Rework garbage collector for unix(4) sockets. | Vitaliy Makkoveev
Currently the unix(4) socket garbage collector always destroys any socket for which "fp->f_count == unp->unp_msgcount" holds. This is wrong because unix(4) sockets that are inside an SCM_RIGHTS message but closed on the sender side also satisfy this equation. Such sockets are not in a loop, and if the garbage collector kills them before they are received, we get a kernel panic. FreeBSD already has its garbage collector reworked to fix this issue [1]. The logic is pretty simple, so import it into our garbage collector. 1. https://reviews.freebsd.org/D23142 ok bluhm@
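The failure mode described above can be modelled in userland. The following is a minimal sketch, not the algorithm imported from FreeBSD: it swaps in a simple reachability pass to show why "f_count == msgcount" alone is not enough. The struct, its fields and `demo_gc()` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAXSOCK 8	/* hypothetical table size for this model */

/* Hypothetical model of a unix(4) socket; not the kernel's struct unpcb. */
struct demo_unp {
	int  f_count;			/* all references to the file */
	int  msgcount;			/* references held by in-flight messages */
	int  rcvbuf[DEMO_MAXSOCK];	/* indices of sockets queued for us */
	int  nrcvbuf;			/* number of valid entries in rcvbuf */
	bool dead;			/* result of the collection pass */
};

static void
demo_gc(struct demo_unp *s, int n)
{
	bool changed;
	int i, j;

	assert(n <= DEMO_MAXSOCK);

	/* Pass 1: a socket referenced only by messages is a candidate. */
	for (i = 0; i < n; i++)
		s[i].dead = s[i].f_count > 0 && s[i].f_count == s[i].msgcount;

	/*
	 * Pass 2: a candidate queued in the receive buffer of a live socket
	 * can still be received, so it is alive. Repeat until nothing changes.
	 */
	do {
		changed = false;
		for (i = 0; i < n; i++) {
			if (s[i].dead)
				continue;
			for (j = 0; j < s[i].nrcvbuf; j++) {
				struct demo_unp *t = &s[s[i].rcvbuf[j]];

				if (t->dead) {
					t->dead = false;
					changed = true;
				}
			}
		}
	} while (changed);
}
```

In this model a socket that was sent and then closed by the sender survives as long as a live socket still holds it in its receive buffer, while two sockets that only reference each other stay marked dead.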
2021-12-25 | kqueue: Invalidate revoked vnodes' knotes on the fly | Visa Hankala
When a tty device is revoked, the associated knotes should be invalidated. Otherwise the user processes can keep on receiving events from the device. It appears tricky to do the invalidation as part of revocation in a way that does not allow unwanted event registration or clutter the tty code. For now, make the knotes invalid lazily before delivery. OK mpi@
2021-12-24 | Make poll/select version of filt_solisten() more similar to soo_poll(). | Visa Hankala
OK mpi@
2021-12-23 | sync | Philip Guenther
2021-12-23 | Roll the syscalls that have an off_t argument to remove the explicit padding. | Philip Guenther
Switch libc and ld.so to the generic stubs for these calls. WARNING: reboot to updated kernel before installing libc or ld.so! Time for a story... When gcc (back in 1.x days) first implemented long long, it didn't (always) pass 64bit arguments in 'aligned' registers/stack slots, with the result that argument offsets didn't match structure offsets. This affected the nine system calls that pass off_t arguments: ftruncate, lseek, mmap, mquery, pread, preadv, pwrite, pwritev, truncate. To avoid having to do custom ASM wrappers for those, BSD put an explicit pad argument in so that the off_t argument would always start on an even slot and thus be naturally aligned. Thus those odd wrappers in lib/libc/sys/ that use __syscall() and pass an extra '0' argument. The ABIs for different CPUs eventually settled how things should be passed on each and gcc 2.x followed them. The only arch now where it helps is landisk, which needs to skip the last argument register if it would be the first half of a 64bit argument. So: add new syscalls without the pad argument and on landisk do that skipping directly in the syscall handler in the kernel. Keep compat support for the existing syscalls long enough for the transition. ok deraadt@
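The "even slot" rule from the story above can be illustrated with a small model. This is a sketch under the assumption that arguments occupy consecutive 32-bit slots; `demo_slot_for_64bit()` and `struct demo_lseek_padded` are hypothetical names for this illustration, not kernel or libc definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * First 32-bit slot a 64-bit argument may occupy once `used` slots are
 * taken, if it has to start on an even slot (round up to even).
 */
static unsigned
demo_slot_for_64bit(unsigned used)
{
	return (used + 1u) & ~1u;
}

/* An lseek-like argument layout with the explicit pad (illustrative). */
struct demo_lseek_padded {
	int32_t fd;
	int32_t pad;		/* keeps `offset` on an even slot */
	int64_t offset;
	int32_t whence;
};

static void
demo_check_layout(void)
{
	/* Only fd precedes the offset: an aligning ABI skips to slot 2. */
	assert(demo_slot_for_64bit(1) == 2);
	/* With the pad, two slots are taken and nothing is skipped. */
	assert(demo_slot_for_64bit(2) == 2);
	/* The structure agrees: `offset` sits at byte 8, i.e. slot 2. */
	assert(offsetof(struct demo_lseek_padded, offset) ==
	    2 * sizeof(int32_t));
}
```

With the pad in place, the argument offset and the structure offset match whether or not the compiler aligns 64-bit arguments on its own, which is the property the old wrappers relied on.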
2021-12-23 | Use TAILQ_FOREACH to traverse the disk list in sysctl_diskinit(). | Alexander Bluhm
OK anton@
2021-12-22 | While malloc sleeps, the disk list could change during sysctl. Then | Alexander Bluhm
allocated memory could be too short for the list of disks. Retry the allocation until the list no longer changes in between. The disk list and duid memory are protected by kernel lock. Use asserts to mark this explicitly. Reported-by: syzbot+807423f6868bbfb836bc@syzkaller.appspotmail.com OK anton@ mpi@
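The retry pattern described above can be sketched in userland. This is a minimal model, not the kernel code: `demo_sleepy_alloc()` stands in for an allocation that may sleep, and it deliberately grows a hypothetical `demo_disk_count` to simulate disks attaching in the meantime.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_ENTRY_SIZE 16	/* hypothetical per-disk space requirement */

static int demo_disk_count = 3;		/* stands in for the disk list length */
static int demo_pending_changes = 2;	/* simulated attaches during sleep */

/* Stand-in for an allocation that may sleep while the list changes. */
static void *
demo_sleepy_alloc(size_t size)
{
	if (demo_pending_changes > 0) {
		demo_pending_changes--;
		demo_disk_count++;
	}
	return malloc(size);
}

/* Allocate space for all disks, retrying if the count moved meanwhile. */
static char *
demo_alloc_for_disks(int *countp)
{
	assert(countp != NULL);

	for (;;) {
		int seen = demo_disk_count;
		char *buf = demo_sleepy_alloc((size_t)seen * DEMO_ENTRY_SIZE);

		if (buf == NULL)
			return NULL;
		if (seen == demo_disk_count) {
			*countp = seen;	/* buffer matches the current list */
			return buf;
		}
		free(buf);		/* list changed: size is stale, retry */
	}
}
```

The buffer is only used once the count observed before the allocation still matches the count afterwards, so it can never be too short for the list.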
2021-12-21 | Let malloc return an error as opposed to panicking when sysctl | Anton Lindqvist
kern.shminfo.shmseg is set to something ridiculously large. ok kettenis@ millert@ Reported-by: syzbot+9f1b201cdbc97b19c7f5@syzkaller.appspotmail.com
2021-12-20 | Make filt_dead() selectively inactive with EVFILT_EXCEPT | Visa Hankala
When a knote uses the dead event filter, the knote's file descriptor is not supposed to point to an object with pending out-of-band data. Make the knote inactive so that userspace will not receive a spurious event. However, kqueue-based poll(2) should still receive HUP notifications. This lets the system use dead_filtops with fewer strings attached relative to the filter type.
2021-12-20 | Run seltrue/dead event filter in modify and process callbacks | Visa Hankala
Do not assume event status in the modify and process callbacks. Instead always run the event filter so that it has a chance to set knote flags. The filter can also indicate event inactivity.
2021-12-15 | Adjust pty and tty event filters | Visa Hankala
* Implement EVFILT_EXCEPT for ttys for HUP condition detection. This filter is used when pollfd.events has no read/write events.
* Add HUP condition detection to filt_ptcwrite() and filt_ttywrite() to reflect ptcpoll() and ttpoll(). Only poll(2) and select(2) can utilize the code; kevent(2) should behave as before with EVFILT_WRITE.
* Clear EV_EOF and __EV_HUP if the EOF/HUP condition ends.
OK mpi@
2021-12-14 | Cover all state checks and updates with spltty() in filt_ttyread(). | Visa Hankala
2021-12-13 | acct(4) ac_tty shouldn't need NODEV from sys/param.h (which is kernel API), | Theo de Raadt
-1 is sufficient to indicate the process had no controlling tty, removing one more sys/param.h include in our userland ok millert
2021-12-13 | Revise EVFILT_EXCEPT filters | Visa Hankala
Restrict the circumstances where EVFILT_EXCEPT filters trigger:
* when out-of-band data is present and NOTE_OOB is requested.
* when the channel is fully closed and consumer is poll(2).
This should clarify the logic and suppress events that kqueue-based poll(2) does not expect. OK mpi@
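The two trigger conditions listed above can be written as a single predicate. This is a sketch only: the function and its boolean parameters are hypothetical and do not correspond to the kernel's actual filter functions or flag names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate mirroring the two cases in the commit message:
 * fire on requested out-of-band data, or on a fully closed channel when
 * the consumer is poll(2).
 */
static bool
demo_except_fires(bool oob_pending, bool note_oob_requested,
    bool fully_closed, bool consumer_is_poll)
{
	if (oob_pending && note_oob_requested)
		return true;
	if (fully_closed && consumer_is_poll)
		return true;
	return false;
}

/* Usage: the cases the revision is meant to separate. */
static void
demo_except_usage(void)
{
	/* OOB data pending, but NOTE_OOB was not requested: no event. */
	assert(!demo_except_fires(true, false, false, false));
	/* Closed channel, but the consumer is not poll(2): no event. */
	assert(!demo_except_fires(false, false, true, false));
	/* Closed channel seen by poll(2): event. */
	assert(demo_except_fires(false, false, true, true));
}
```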
2021-12-13 | Prevent kevent(2) use of EVFILT_EXCEPT with FIFOs and pipes | Visa Hankala
Currently, the only intended direct usage of the EVFILT_EXCEPT filter is with NOTE_OOB to detect out-of-band data in ptys and sockets. NOTE_OOB does not apply to FIFOs or pipes. Prevent the user from registering the filter with these file types. The filter code is for the kernel's internal use. OK mpi@
2021-12-12 | Add vnode parameter to VOP_STRATEGY() | Visa Hankala
Pass the device vnode as a parameter to VOP_STRATEGY() to allow calling the correct vop_strategy callback. Now the vnode is also available in the callback. OK mpi@
2021-12-11 | Clarify usage of __EV_POLL and __EV_SELECT | Visa Hankala
Make __EV_POLL specific to kqueue-based poll(2), to remove overlap with __EV_SELECT that only select(2) uses. OK millert@ mpi@
2021-12-10 | Revert "kbind(2): disable system call if not initialized before | Philip Guenther
first __tfork(2)" The immediate issue is that a process linked with -znow will still perform lazy relocation on objects loaded with dlopen(), but there are possibly other dark corners to plumb to find a better invariant. Problem reported by thfr@
2021-12-09 | We only have one syscall table: inline sysent/SYS_MAXSYSCALL and | Philip Guenther
SYS_syscall as the nosys() function into the MD syscall entry routines and the SYSCALL_DEBUG support. Adjust alpha's syscall check to match the other archs. Also, make sysent const to get it into .rodata. With that, 'struct emul' is unused: delete it and all its references ok millert@
2021-12-08 | Fix select(2) exceptfds handling of FIFOs and pipes | Visa Hankala
Prevent select(2) from indicating an exceptional condition when the other end of a FIFO or pipe is closed. Originally, select(2) returned an exceptfds event only with a pty or socket that has out-of-band data pending. millert@ says that OpenBSD diverged from this by accident when poll(2) and select(2) were changed to use the same backend code in 2003. OK millert@
2021-12-07 | Delete the last emulation callbacks: we're Just ELF, so declare | Philip Guenther
exec_elf_fixup() and coredump_elf() in <sys/exec_elf.h> and call them and the MD setregs() directly in kern_exec.c and kern_sig.c Also delete e_name[] (only used by sysctl), e_errno (unused), and e_syscallnames[] (only used by SYSCALL_DEBUG) and constipate syscallnames to 'const char *const[]' ok kettenis@
2021-12-07 | Continue to delete emulation support: we only have one sigcode and | Philip Guenther
sigobject. Just use the existing globals for the former and use a global for the latter. ok jsg@ kettenis@
2021-12-07 | Add EVFILT_EXCEPT filter for pipes | Visa Hankala
The kqueue-based select(2) needs the filter to replicate the old exceptfds behaviour. The upcoming new poll(2) code will use the filter for POLLHUP condition checking when the events bitmap is clear of read/write events. OK anton@
2021-12-07 | Continue to delete emulation support: since we're Just ELF, the size | Philip Guenther
of the auxinfo is fixed: provide ELF_AUX_WORDS in <sys/exec_elf.h> as a replacement for emul->e_arglen ok millert@
2021-12-07 | Protect `unp_msgcount' and `unp_file' with `unp_gc_lock' | Vitaliy Makkoveev
rwlock(9). This saves us from races caused by unlocked access to `f_count', which lead to a live socket being falsely marked as dead. We always modify `f_count' and `unp_msgcount' together, so the `f_count' modification should also happen under `unp_gc_rwlock', before the `unp_msgcount' increment and after the `unp_msgcount' decrement. Assigning `unp_file' under the lock spares us from draining a running unp_gc(). This moves unp_gc() locking back to the time when these variables were protected by the same lock that was held for the whole garbage collector run, but it uses another lock, not `unp_lock'. ok kettenis@ bluhm@
2021-12-06 | Start to delete emulation support: since we're Just ELF, make | Philip Guenther
copyargs() return 0/1 and merge elf_copyargs() into it. Rename ep_emul_arg and ep_emul_argp to have clearer meaning and type and eliminate ep_emul_argsize as no longer necessary. Make sure ep_auxinfo (nee ep_emul_argp) is initialized as powerpc64 always uses it in setregs(). ok semarie@ deraadt@ kettenis@
2021-12-05 | kbind(2): disable system call if not initialized before first __tfork(2) | Scott Soule Cheloha
To unlock kbind(2) we need to protect ps_kbind_addr and ps_kbind_cookie. The simplest way to do this is to disallow kbind(2) initialization after the first __tfork(2) call. If the first thread does not initialize the kbind(2) variables before __tfork(2) then we disable kbind(2) during that first __tfork(2) call. This is guenther@'s patch, I'm just committing it. Discussed with guenther@, deraadt@, kettenis@, and mpi@. ok kettenis@, positive response from mpi@, "I am busy" guenther@
2021-12-02 | firstc() and nextc() use an int of global static storage. Make this | Theo de Raadt
a pointer to a local variable to allow concurrent use if that ever needs to happen in the future. ok mpi kettenis
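The idea of moving iteration state from a global into a caller-owned variable can be sketched as follows. This is not the clist code: `struct demo_queue`, `demo_first()` and `demo_next()` are hypothetical names, and the real firstc()/nextc() signatures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte queue; not the kernel's struct clist. */
struct demo_queue {
	const unsigned char *data;
	size_t len;
};

/* Start an iteration; the cursor lives in the caller's variable. */
static const unsigned char *
demo_first(const struct demo_queue *q, size_t *pos)
{
	assert(pos != NULL);
	*pos = 0;
	return q->len > 0 ? &q->data[0] : NULL;
}

/* Advance the caller's cursor; NULL once the end is reached. */
static const unsigned char *
demo_next(const struct demo_queue *q, size_t *pos)
{
	assert(pos != NULL);
	if (*pos + 1 >= q->len)
		return NULL;
	(*pos)++;
	return &q->data[*pos];
}

/* Two iterations over the same queue at once, each with its own cursor. */
static size_t
demo_count_pairs(const struct demo_queue *q)
{
	size_t outer, inner, pairs = 0;
	const unsigned char *a, *b;

	for (a = demo_first(q, &outer); a != NULL; a = demo_next(q, &outer))
		for (b = demo_first(q, &inner); b != NULL;
		    b = demo_next(q, &inner))
			pairs++;
	return pairs;
}
```

With a single static cursor the inner loop would clobber the outer loop's position; with caller-owned cursors the two walks are independent.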
2021-12-01 | late allocation of clist in putc() and b_to_q() hasn't been required in | Theo de Raadt
a decade, because all tty drivers preallocate. ok kettenis
2021-11-30 | Prevent select(2) from blocking if registering found pending events. | Visa Hankala
OK mpi@
2021-11-29 | regen | Vitaliy Makkoveev
2021-11-29 | Unlock accept(2) and accept4(2) syscalls. Unlock them both because they | Vitaliy Makkoveev
follow the same code path. ok bluhm@
2021-11-29 | kqueue: Revise badfd knote handling | Visa Hankala
When closing a file descriptor and converting the poll/select knotes into badfd knotes, keep the knotes attached to the by-fd table. This should prevent kqueue_purge() from returning before the kqueue has become quiescent. This in turn should fix a KASSERT(TAILQ_EMPTY(&kq->kq_head)) panic in KQRELE() that bluhm@ has reported. The badfd conversion is only needed when a poll/select scan is ongoing. The system can skip the conversion if the knote is not part of the active event set. The code of this commit skips the conversion when the fd is closed by the same thread that has done the fd polling. This can be improved but should already cover typical fd usage patterns. As badfd knotes now hold slots in the by-fd table, kqueue_register() clears them. poll/select use kqueue_register() to set up a new scan; any found fd close notification is a leftover from the previous scan. The new badfd handling should be free of accidental knote accumulation. This obsoletes kqpoll_dequeue() and lowers kqpoll_init() overhead. Re-enable lazy removal of poll/select knotes because the panic should no longer happen. OK mpi@
2021-11-26 | Mark exit1() and sigexit() as non-returning | Visa Hankala
The late 1990s reasons for avoiding __dead with exit1() should not apply with the current compilers. This fixes compiler warnings about uninitialized variables in trap.c on mips64. Discussed with guenther@ and miod@
2021-11-24 | Fix type of count. | Visa Hankala
2021-11-24 | Simplify arithmetics on the main path. | Visa Hankala
2021-11-24 | Remove unneeded <sys/stdarg.h>. | Visa Hankala
OK guenther@
2021-11-24 | Refactor postsig_done(). Pass the catchmask and signal reset flag to the | Claudio Jeker
function. This will make unlocking cursig() & postsig() a bit easier. OK mpi@
2021-11-24 | Minor code cleanup. Move a comment to the right place, move a function | Claudio Jeker
to get a better order of functions. Also reduce the size of sigprop to NSIG from NSIG+1. NSIG is defined as 33 and so includes the extra element for this array. OK mpi@
2021-11-24 | Add a few dt(4) TRACEPOINTS to SMR. Should help to better understand what | Claudio Jeker
goes on in SMR. OK mpi@
2021-11-22 | Revert poll(2) back to the original implementation | Visa Hankala
The translation to and from kqueue still has major shortcomings. Discussed with deraadt@
2021-11-22 | Translate POLLNVAL in ppollcollect() | Visa Hankala
This makes the kqueue-based poll(2) behave more similarly to the old code when a monitored file descriptor is closed by another thread. OK mpi@
2021-11-22 | Let futex_wait() run without kernel lock | Visa Hankala
The KERNEL_LOCK() is no longer necessary with rwsleep() and PCATCH because the sleep machinery now does the locking internally. OK mpi@
2021-11-19 | Make futexes work in shared anonymous memory. | Mark Kettenis
ok mpi@
2021-11-17 | When unp_connect() releases both solock() and vnode(9) locks, the socket we | Vitaliy Makkoveev
were connecting to could be closed by a concurrent thread. Check the connection state and return ECONNREFUSED if the connection was lost. ok bluhm@
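The "re-check after the locks were dropped" step can be modelled without real threads. This is a sketch under stated assumptions: `struct demo_sock`, `demo_connect_finish()` and the hook that simulates a concurrent close are all hypothetical, and the real locking calls are only indicated by comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical peer socket state; not the kernel's struct socket. */
struct demo_sock {
	bool connected;
};

typedef void (*demo_hook)(struct demo_sock *);

/* Simulates a concurrent thread closing the peer. */
static void
demo_close_peer(struct demo_sock *peer)
{
	peer->connected = false;
}

/*
 * Finish a connect attempt. `while_unlocked` runs in the window where
 * the locks would be released, standing in for other threads.
 */
static int
demo_connect_finish(struct demo_sock *peer, demo_hook while_unlocked)
{
	assert(peer != NULL);

	/* The locks would be released here. */
	if (while_unlocked != NULL)
		while_unlocked(peer);
	/* The locks would be re-acquired here. */

	/* State seen before the window may be stale: check it again. */
	if (!peer->connected)
		return ECONNREFUSED;
	return 0;
}
```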
2021-11-16 | Use nowake when poll/select has empty fd set | Visa Hankala
When the fd set is empty, the code waits for a signal or timeout. Wakeups from the kqueue are neither expected nor wanted. OK cheloha@, millert@, anton@, mpi@
2021-11-16 | Move UNIX domain sockets garbage collector out of `unp_lock'. | Vitaliy Makkoveev
Except `unp_ino', this leaves only per-socket data protected by `unp_lock'. The `unp_ino' protection is not a big deal and will be done with mutex(9) in a future diff. The garbage collector flags moved from `unp_flags' to `unp_gcflags'. Two new locks are introduced to protect garbage collector data. The `unp_gc_lock' rwlock(9) protects `unp_defer', `unp_gcing', `unp_gcflags' and the `unp_link' list. The `unp_df_lock' protects the `ud_link' list. We need to hold `unp_gc_lock' and `unp_lock' simultaneously. When we perform unp_attach() or unp_detach() we link the PCB to the `unp_link' list with `unp_lock' held. But when unp_gc() walks the `unp_link' list with `unp_gc_lock' held, it has to lock the socket while it scans the `so_rcv' buffer, so the lock order there is the opposite. In a future diff `unp_lock' will be replaced by per-socket `so_lock', so it's better to enforce the `unp_gc_lock' -> `unp_lock' (solock()) lock order and release `unp_lock' in the unp_attach() and unp_detach() paths. The previously committed diffs made this safe. The `unp_df_lock' is introduced because the state of `unp_lock' and `unp_gc_lock' is unknown when unp_discard() is called. Since it touches only the `ud_link' list, the re-lock dances are unwanted in this path. This also keeps the M_WAITOK allocation outside rwlock(9) when unp_discard() is called from the unp_externalize() error path. ok bluhm@
2021-11-15 | Copy p_p->ps_pledge into a local variable (called pledge) in every function | Theo de Raadt
which checks PLEDGE_* bits more than once. Some functions are called without locking, and this avoids misinterpreting bits which have some coupled behaviour. ok cheloha kettenis
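The snapshot pattern described above can be sketched in a few lines. This is an illustration only: the `DEMO_PLEDGE_*` values, `struct demo_process` and `demo_socket_allowed()` are hypothetical, and the `volatile` qualifier merely models a field that another thread may change; the kernel may rely on different guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the real PLEDGE_* constants are not used here. */
#define DEMO_PLEDGE_INET	0x1ULL
#define DEMO_PLEDGE_UNIX	0x2ULL
#define DEMO_PLEDGE_DNS		0x4ULL

/* Hypothetical process structure with a concurrently changing field. */
struct demo_process {
	volatile uint64_t ps_pledge;
};

/* Checks several bits, all against one snapshot of the field. */
static int
demo_socket_allowed(const struct demo_process *ps, int dns_only)
{
	uint64_t pledge;

	assert(ps != NULL);
	pledge = ps->ps_pledge;	/* single read; later checks use the copy */

	if ((pledge & (DEMO_PLEDGE_INET | DEMO_PLEDGE_UNIX)) != 0)
		return 1;
	if (dns_only && (pledge & DEMO_PLEDGE_DNS) != 0)
		return 1;
	return 0;
}
```

Because both checks look at the same local copy, a change to the shared field between the two tests cannot produce a combination of bits that never existed at any single point in time.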