path: root/sys/kern
Age  Author  Commit message
2020-04-29  Bob Beck
Ensure that if we are doing a delayed write with a NOCACHE buffer, we clear the NOCACHE flag, since if we are doing a delayed write the buffer must be cached or it is thrown away when the "write" is done. fixes vnd on mfs regress tests. ok kettenis@ deraadt@
2020-04-15  Mark Kettenis
Fix panic message. ok millert@, deraadt@
2020-04-12  anton
In sosplice(), temporarily release the socket lock before calling FRELE() as the last reference could be dropped which in turn will cause soclose() to be called where the socket lock is unconditionally acquired. Note that this is only a problem for sockets protected by the non-recursive NET_LOCK() right now. ok mpi@ visa@
Reported-by: syzbot+7c805a09545d997b924d@syzkaller.appspotmail.com
2020-04-11  Claudio Jeker
Add soassertlocked() checks to sbappend() and sbappendaddr(). This brings them in line with sbappendstream() and sbappendrecord(). Agreed by mpi@
2020-04-08  Martin Pieuchot
Make fifo_kqfilter() honor FREAD|FWRITE just like fifo_poll() does. Prevent generating events that do not correspond to how the fifo has been opened. ok visa@, millert@
2020-04-07  Visa Hankala
Abstract the head of knote lists. This allows extending the lists, for example, with locking assertions. OK mpi@, anton@
2020-04-07  Visa Hankala
Defer selwakeup() from kqueue_wakeup() to kqueue_task() to prevent deep recursion. This also helps make kqueue_wakeup() free of the kernel lock because the current implementation of selwakeup() requires the lock. OK mpi@
2020-04-06  Claudio Jeker
Fix single thread behaviour in sleep_setup_signal(). If a thread needs to suspend (SINGLE_SUSPEND or SINGLE_PTRACE) it needs to do this in sleep_setup_signal(). This way the case where single_thread_clear() is called before the sleep gets its wakeup call can be correctly handled and the thread is put back to sleep in sleep_finish(). If the wakeup happens before unsuspend then p_wchan is 0 and the thread will not go to sleep again. In case of an unwind an error is returned, causing the thread to return immediately with that error. With and OK mpi@ kettenis@
2020-04-06  cheloha
futex(2): FUTEX_WAIT: rwsleep_nsec(9) at least one nanosecond
mpi@ and I added a warning log to *sleep_nsec(9) last year to smoke out division-to-zero bugs when converting kernel code from *sleep(9) to the new interfaces. It whines if you tell it to sleep for zero nanoseconds. Now that rwsleep_nsec(9) is exposed to userspace via futex(2), though, it is possible to get a legitimate zero-nanosecond timeout from the user. This can cause a lot of logging, which apparently can cause hiccups and hangs in Mesa. As a quick fix we can round the timeout up to one nanosecond and silence the warning. No logs, no delays, no hiccups or hangs.
--
Aside: it is unclear what we are supposed to do in the FUTEX_WAIT zero-nanosecond timeout case: block for a tick or return ETIMEDOUT immediately. The Linux futex(2) manpage does not mention the case. It'd be nice to know what the proper behavior is.
--
Prompted by matthieu@. Input from kettenis@ and deraadt@. Tested by matthieu@, ajacoutot@. In snaps since Mar 27 2020. ok ajacoutot@, deraadt@, kettenis@.
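The rounding described in the futex(2) commit above can be sketched in plain userland C. This is only an illustration under stated assumptions: `round_up_zero_timeout` is a hypothetical name, and the real change lives in the kernel's futex code, which is not reproduced here.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: a user-supplied timeout
 * of zero nanoseconds is rounded up to one nanosecond so that a sleep
 * routine which warns about zero-length sleeps stays quiet.
 */
static uint64_t
round_up_zero_timeout(uint64_t nsecs)
{
	return nsecs == 0 ? 1 : nsecs;
}
```

Any non-zero timeout passes through unchanged, so only the degenerate zero case is affected.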
2020-04-05  Visa Hankala
Declare pledgenames[] as const. OK deraadt@
2020-04-03  Visa Hankala
Adjust SMR_ASSERT_CRITICAL() and SMR_ASSERT_NONCRITICAL() so that the panic message shows the actual code location of the assert. Do this by moving the assert logic inside the macros. Prompted by and OK claudio@ OK mpi@
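Why moving the check into the macro changes the reported location can be shown with a small userland sketch. The names `EX_ASSERT` and `ex_assert_failed` are hypothetical and are not the kernel's SMR macros. The point is only that `__FILE__` and `__LINE__` expand where the macro is used, whereas a check buried in a helper function would always report the helper's own location.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure hook: prints the location it is handed and aborts. */
static void
ex_assert_failed(const char *expr, const char *file, int line)
{
	fprintf(stderr, "assertion \"%s\" failed: file %s, line %d\n",
	    expr, file, line);
	abort();
}

/*
 * The test itself lives in the macro, so __FILE__ and __LINE__ are
 * expanded at the call site rather than inside a shared helper.
 */
#define EX_ASSERT(e) \
	((e) ? (void)0 : ex_assert_failed(#e, __FILE__, __LINE__))
```

A call such as `EX_ASSERT(depth == 0);` then reports the file and line of that statement when it fails.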
2020-04-03  Visa Hankala
Take an explicit write reference when associating a thread with a ktrace vnode. This lets other parts of the kernel see the vnode as active for writing. In particular, now quotaon_vnode() properly sets up quotas for ktrace vnodes. This fixes a crash that could happen if quotas were turned on while a process was ktraced. ktrace vnodes are opened for writing and an initial write reference is provided for them by vn_open(9). However, this reference is removed, too early, when sys_ktrace() calls vn_close(9). Crash reported and fix tested by Bryan Stenson. OK mpi@
2020-04-02  Martin Pieuchot
Introduce kqueue_sleep(), a wrapper around the tsleep(9) dance. While here check for the validity of the timeout at the beginning of the syscall. ok kettenis@, cheloha@
2020-03-31  Claudio Jeker
Move sleep_finish_all() down to where sleep_finish() and all other sleep_setup/finish related functions are. OK kettenis@
2020-03-31  Martin Pieuchot
Revert previous, syzkaller found a way to trigger the KASSERT(). Let's fix this before we put them back :o)
2020-03-30  Martin Pieuchot
Document that `a_p' is always curproc by using a KASSERT(). This will allow for future simplifications of the VFS interfaces. Tested in a bulk by naddy@ and visa@. ok visa@, anton@
2020-03-27  anton
Relax the lockcount assertion in vputonfreelist(). Back when I fixed several problems with the vnode exclusive lock implementation, I overlooked the fact that a vnode can be in a state where the usecount is zero while the holdcount is still positive. There could still be threads waiting on the vnode lock in uvn_io() as long as the holdcount is positive. "go ahead" mpi@
Reported-by: syzbot+767d6deb1a647850a0ca@syzkaller.appspotmail.com
2020-03-26  Claudio Jeker
Revert Rev 1.164. Setting sls_sig to 0 uncovered a bunch of issues when it comes to setting a process into single thread mode. It is still wrong but first the interaction with single_thread_set() must be corrected.
2020-03-23  Visa Hankala
Check the outcome of ktrstart() and skip tracing if the trace file header could not be written. OK anton@ mpi@
2020-03-23  Visa Hankala
Prevent tsleep(9) with PCATCH from returning immediately without error when called during execve(2). This was caused by initializing sls_sig with value 0 in r1.164 of kern_synch.c. Previously, tsleep(9) returned immediately with EINTR in similar circumstances. The immediate return without error can cause a system hang. For example, vwaitforio() could end up spinning if called during execve(2) because the thread did not enter sleep and other threads were not able to finish the I/O. Backtrace, innermost frame first: tsleep, vwaitforio, nfs_flush, nfs_close, VOP_CLOSE, vn_closefile, fdrop, closef, fdcloseexec, sys_execve. Fix the issue by checking (p->p_flag & P_SUSPSINGLE) instead of (p->p_p->ps_single != NULL) in sleep_setup_signal(). The former is more selective than the latter and allows the thread that invokes execve(2) to enter sleep normally. Bug report, change bisecting and testing help by Pavel Korovin. OK claudio@ mpi@
2020-03-22  anton
remove unused variable; ok beck@ mpi@
2020-03-21  Martin Pieuchot
Stop tracing if vget(9) fails. Make sure to release the last reference of the vnode after all other traced processes have given up on it. CID 1453020 Unchecked return value. Inputs from guenther@, ok visa@
2020-03-20  cheloha
futex(2): futex_wait(): ensure timeout is set when calling rwsleep_nsec(9)
rwsleep_nsec(9) will not set a timeout if the nsecs parameter is equal to INFSLP (UINT64_MAX). We need to limit the duration to MAXTSLP (UINT64_MAX - 1) to ensure a timeout is set.
2020-03-20  cheloha
__thrsleep(2): ensure timeout is set when calling tsleep_nsec(9)
tsleep_nsec(9) will not set a timeout if the nsecs parameter is equal to INFSLP (UINT64_MAX). We need to limit the duration to MAXTSLP (UINT64_MAX - 1) to ensure a timeout is set.
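The clamping described in the two commits above can be sketched as follows. The constant values are taken from the commit messages, but the definitions are local to this userland example, and `clamp_sleep_nsecs` is a hypothetical helper rather than the function the kernel uses.

```c
#include <stdint.h>

/* Local definitions using the values stated in the commit messages. */
#define INFSLP	UINT64_MAX		/* "sleep without a timeout" */
#define MAXTSLP	(UINT64_MAX - 1)	/* longest finite duration */

/*
 * Hypothetical helper: a finite duration must never be passed on as
 * INFSLP, otherwise the sleep would be set up without any timeout.
 */
static uint64_t
clamp_sleep_nsecs(uint64_t nsecs)
{
	return nsecs > MAXTSLP ? MAXTSLP : nsecs;
}
```

With this clamp, a computed duration that happens to equal UINT64_MAX still produces a sleep with a (very long) timeout instead of an indefinite one.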
2020-03-20  Claudio Jeker
Use atomic operations to update ps_singlecount. This makes single_thread_check() safe to be called without KERNEL_LOCK(). single_thread_wait() needs to use sleep_setup() and sleep_finish() instead of tsleep() to make sure no wakeup() is lost. Input kettenis@, with and OK visa@
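The idea of updating a shared counter with atomic operations, so that it can be touched without a global lock, can be sketched with C11 `<stdatomic.h>`. This is an assumption-laden illustration: the kernel uses its own atomic primitives, and `pending_threads`, `set_pending` and `park_thread` are hypothetical names that do not appear in the source.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter of threads that have not yet parked. */
static atomic_uint pending_threads;

static void
set_pending(unsigned int n)
{
	atomic_store(&pending_threads, n);
}

/*
 * Called by each thread as it parks.  Returns true only for the last
 * thread, which is the one that should wake up the waiter.
 */
static bool
park_thread(void)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	return atomic_fetch_sub(&pending_threads, 1) == 1;
}
```

Because the decrement and the test happen in one atomic step, exactly one thread observes the transition to zero, even when several threads park at the same time.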
2020-03-20  cheloha
__thrsleep(2): fix absolute timeout check
An absolute timeout T elapses when the clock has reached time T, i.e. when T is less than or equal to the clock's current time. But the current code thinks T elapses only when the clock is strictly greater than T. For example, if my absolute timeout is 1.00000000, the current code will not return EWOULDBLOCK until the clock reaches 1.00000001. This is wrong: my absolute timeout elapses a nanosecond prior to that point. So the timespeccmp(3) here should be timespeccmp(tsp, &now, <=) and not timespeccmp(tsp, &now, <) as it is currently.
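The difference between the two comparisons can be checked with a small standalone sketch. `ts_cmp` is a local stand-in written for this example so that it builds on systems without the BSD timespeccmp(3) macro, and `abs_timeout_elapsed` is a hypothetical helper, not the code in __thrsleep(2).

```c
#include <stdbool.h>
#include <time.h>

/* Local stand-in for timespeccmp(3): compare seconds, then nanoseconds. */
#define ts_cmp(a, b, op)				\
	(((a)->tv_sec == (b)->tv_sec) ?			\
	    ((a)->tv_nsec op (b)->tv_nsec) :		\
	    ((a)->tv_sec op (b)->tv_sec))

/* True once the clock has reached the absolute timeout. */
static bool
abs_timeout_elapsed(const struct timespec *timeout,
    const struct timespec *now)
{
	return ts_cmp(timeout, now, <=);
}
```

With `<=` the timeout counts as elapsed at the instant the clock equals it. With `<` the same call would report "not yet" until one nanosecond later.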
2020-03-20  cheloha
kevent(2): tsleep(9) -> tsleep_nsec(9)
With input from visa@. ok visa@
2020-03-20  cheloha
poll(2), ppoll(2), pselect(2), select(2): tsleep(9) -> tsleep_nsec(9)
With input from visa@. ok visa@
2020-03-20  cheloha
nanosleep(2): tsleep(9) -> tsleep_nsec(9)
While here, rename the wait channel so the tsleep_nsec(9) call will fit onto a single line. It isn't a global channel so the name is arbitrary anyway. With input from visa@. ok visa@
2020-03-19  anton
Separate variable declaration and assignment. No functional change. Requested by mpi@
2020-03-19  anton
Move unveil data structures away from the proc.h header into the implementation file. Pushing the assignment of ps_uvpcwd down to unveil_add() is required but it doesn't introduce any functional change. ok mpi@ semarie@
2020-03-18  anton
regen
2020-03-18  anton
Unlock flock(2). ok mpi@ visa@
2020-03-18  Visa Hankala
Restart child process scan in dowait4() if single_thread_wait() sleeps. This ensures that the conditions checked are still in force. The sleep breaks atomicity, allowing another thread to alter the state. single_thread_wait() should return immediately after sleep when called from dowait4() because there is no guarantee that the process pr still exists. When called from single_thread_set(), the process is that of the calling thread, which prevents process pr from disappearing. OK anton@, mpi@, claudio@
2020-03-16  Martin Pieuchot
Keep track of traced child under a list of orphans while they are being reparented to a debugger process. Also re-parent exiting traced processes to their original parent, if it is still alive, after the debugger has seen the exit status. Logic comes from FreeBSD pointed out by guenther@. While here rename proc_reparent() into process_reparent() and get rid of superfluous checks. ok visa@
2020-03-15  Visa Hankala
Fix memory corruption with kern.witness.locktrace. The allocation of lock stacks does not correctly handle the case where the system-wide free list becomes empty. Consequently, the returned stack buffer can still be on the CPU's free list. This patch fixes the bug by simplifying the code. Lock stack buffers are now allocated and freed one by one from the system-wide free list instead of using batching. The fix additionally addresses a buffer hoarding problem that can arise under workloads where some CPUs are net acquirers and some other CPUs net releasers of rwlocks. Panic reported by Hrvoje Popovski
2020-03-13  Claudio Jeker
Initialize sls_sig to 0 and not 1. sls_sig stores the signal number of a possible signal that was caught during sleep setup. It does not make sense to have a default of 1 (SIGHUP) for this. OK visa@ mpi@
2020-03-13  anton
In order to unlock flock(2), make writes to the f_iflags field of struct file atomic. This also gets rid of the last kernel lock protected field in the scope of struct file. ok mpi@ visa@
2020-03-13  Martin Pieuchot
Simplify logic, the "netboot" interface is always related to `bootdv'. Logic is hard, so keep only one of two logically equivalent statements. CID 271085 ok kettenis@, deraadt@, miod@
2020-03-13  Martin Pieuchot
Rename "sigacts" flag field to avoid conflict with the "process" one. This shows that atomic_* operations should not be necessary to write to this field, unlike with the process one. The advantage of using a somewhat-unique prefix for struct members is moot when multiple definitions use the same prefix :o) From Amit Kulkarni, ok claudio@
2020-03-12  Visa Hankala
Revert previous. Something in it causes unexpected slowdown.
2020-03-12  Visa Hankala
Enable caching when turning a synchronous write into a delayed write. Otherwise the write will be discarded, which would prevent use of vnd(4) on top of an async-mounted file system. OK beck@
2020-03-11  Alexandr Nedvedicky
Fix unlimited recursion caused by local outbound bcast/mcast packet sent via spliced socket.
Reported-by: syzbot+2f9616f39d3f3b281cfb@syzkaller.appspotmail.com
OK bluhm@
2020-03-11  Claudio Jeker
Move the sigprop definition and the other bits under SIGPROP into kern_sig.c where they are currently added by the include. While doing that mark the sigprop array as const. OK mpi@ anton@ millert@
2020-03-10  anton
regen
2020-03-10  anton
Unlock fcntl(2). ok visa@
2020-03-09  Todd C. Miller
Return EINVAL for KERN_PROC if the size parameter is 0. Prevents a panic due to a NULL dereference; Coverity CID 1452899. Based on a diff from mpi@, OK deraadt@ kettenis@
2020-03-05  Claudio Jeker
The 'lock spun out' db_printf needs a newline. All other MP_LOCKDEBUG messages do have the newline already. OK anton@ kettenis@
2020-03-04  anton
Make an assertion free from side effects. The intention was probably to assert that the wire count is equal to 1 and not unconditionally set it to 1. ok kettenis@ mpi@
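The class of bug described above, an assignment written where a comparison was meant, can be illustrated in userland C. `struct page`, `wire_count` and `check_wired_once` are hypothetical names chosen for this sketch and are not the kernel code that was fixed.

```c
#include <assert.h>

/* Hypothetical structure standing in for the object being checked. */
struct page {
	int	wire_count;
};

/*
 * The buggy form would read assert(pg->wire_count = 1): it stores 1
 * and then always passes.  The intended form compares instead.  Taking
 * a const pointer turns an accidental assignment into a compile error.
 */
static void
check_wired_once(const struct page *pg)
{
	assert(pg->wire_count == 1);
}
```

Keeping assertion arguments free of side effects also matters because assertions may be compiled out, in which case any work hidden inside them silently disappears.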
2020-03-04  anton
Grab a reference for the shared memory segment before calling uvm_map() as the same function could end up putting the thread to sleep, allowing another thread to free the shared memory segment, which in turn causes a use-after-free. With help from and ok millert@ visa@
Reported-by: syzbot+0fc1766671a9461de8a5@syzkaller.appspotmail.com
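The pattern of taking a reference before a call that may sleep can be sketched with a toy reference count. Everything here is hypothetical and single-threaded: `struct shm_seg`, `seg_ref`, `seg_rele` and `attach_segment` are illustrative names, the counter is not atomic, and a `destroyed` flag replaces the real free so the outcome can be inspected safely.

```c
#include <stddef.h>

struct shm_seg {
	int	refs;		/* reference count */
	int	destroyed;	/* set instead of freeing, for inspection */
};

static void
seg_ref(struct shm_seg *s)
{
	s->refs++;
}

static void
seg_rele(struct shm_seg *s)
{
	if (--s->refs == 0)
		s->destroyed = 1;
}

/* Simulates another thread dropping its reference while we sleep. */
static void
other_thread_releases(struct shm_seg *s)
{
	seg_rele(s);
}

/* may_sleep stands in for a blocking call such as uvm_map(). */
static int
attach_segment(struct shm_seg *s, void (*may_sleep)(struct shm_seg *))
{
	int alive;

	seg_ref(s);		/* pin the segment before the blocking call */
	may_sleep(s);		/* the other reference may go away here */
	alive = !s->destroyed;	/* still valid thanks to our reference */
	seg_rele(s);		/* now the segment may be torn down */
	return alive;
}
```

Without the `seg_ref()` call, the release inside `may_sleep` would drop the count to zero and the code after it would touch a destroyed segment.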