path: root/sys/kern
Age | Commit message | Author
2019-07-05 | Use timeout_add_msec(9) | kn
Although the period is specified in seconds, convert to milliseconds so uneven periods are not truncated after integer division by two. Input and OK cheloha
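The truncation this message describes shows up with any odd period. The sketch below halves a 5-second period both ways. The function names are hypothetical and do not come from the commit. In the kernel, the millisecond value would be passed to timeout_add_msec(9) instead of being returned.

```c
#include <assert.h>
#include <limits.h>

/* Halving in whole seconds drops the remainder: 5 s -> 2 s -> 2000 ms. */
static int
half_period_msec_truncating(int period_sec)
{
	return (period_sec / 2) * 1000;
}

/* Converting to milliseconds first keeps it: 5 s -> 5000 ms -> 2500 ms. */
static int
half_period_msec(int period_sec)
{
	/* Guard against int overflow in the multiplication. */
	assert(period_sec >= 0 && period_sec <= INT_MAX / 1000);
	return (period_sec * 1000) / 2;
}
```

For even periods both functions agree. They differ only when the period is odd, and that is exactly the case the change fixes.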
2019-07-04 | Remove a useless kernel lock from the TCP socket splicing path. | Alexander Bluhm
When send buffer space in the drain socket becomes available, a task is added to move data, and userland was also informed. The latter is not useful, as it would mix a kernel and a user stream, so programs do not wait for this event. Avoid calling sowakeup() from sowwakeup(); this also reduces grabbing the kernel lock. Instead, inform userland about the write event when the splicing is dissolved in sounsplice(). OK claudio@
2019-07-03 | Add tsleep_nsec(9), msleep_nsec(9), and rwsleep_nsec(9). | cheloha
Equivalent to their unsuffixed counterparts except that (a) they take a timeout in terms of nanoseconds, and (b) INFSLP, aka UINT64_MAX (not zero) indicates that a timeout should not be set. For now, zero nanoseconds is not a strictly valid invocation: we log a warning on DIAGNOSTIC kernels if we see such a call. We still sleep until the next tick in such a case, however. In the future this could become some sort of poll... TBD. To facilitate conversions to these interfaces: add inline conversion functions to sys/time.h for turning your timeout into nanoseconds. Also do a few easy conversions for warmup and to demonstrate how further conversions should be done. Lots of input from mpi@ and ratchov@. Additional input from tedu@, deraadt@, mortimer@, millert@, and claudio@. Partly inspired by FreeBSD r247787. positive feedback from deraadt@, ok mpi@
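The INFSLP convention can be illustrated with a hypothetical helper. It maps a nanosecond timeout onto the tick-based timeout of an unsuffixed tsleep()-style call, where 0 means "no timeout". The round-up and the one-tick minimum are assumptions chosen to match the description above, where zero nanoseconds still sleeps until the next tick. The kernel's exact rounding may differ. sec_to_nsec() is also hypothetical. It follows the idea of the sys/time.h conversion helpers the commit mentions but does not copy them.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* As described above: UINT64_MAX, not zero, means "no timeout". */
#define INFSLP	UINT64_MAX

/*
 * Hypothetical: convert nanoseconds to a tick count for a tsleep()-style
 * call in which 0 means "sleep without a timeout". tick_nsec is the
 * length of one clock tick in nanoseconds (1e9 / hz).
 */
static int
nsec_to_timo(uint64_t nsecs, uint64_t tick_nsec)
{
	uint64_t ticks;

	assert(tick_nsec > 0);
	if (nsecs == INFSLP)
		return 0;		/* no timeout at all */

	/* Round up so we never sleep for less than requested. */
	ticks = nsecs / tick_nsec + (nsecs % tick_nsec != 0);

	/* Zero nanoseconds still sleeps until the next tick. */
	if (ticks == 0)
		ticks = 1;

	/* Clamp to what an int tick count can hold. */
	if (ticks > INT_MAX)
		ticks = INT_MAX;
	return (int)ticks;
}

/* Overflow-safe conversion that saturates at INFSLP. */
static uint64_t
sec_to_nsec(uint64_t sec)
{
	if (sec > UINT64_MAX / 1000000000ULL)
		return INFSLP;
	return sec * 1000000000ULL;
}
```

With hz = 100 (tick_nsec = 10000000), a 15 ms timeout becomes 2 ticks and INFSLP becomes 0.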
2019-07-03 | Lock the kernel when removing file descriptors from the descriptor | Visa Hankala
table. This should prevent a race with kevent when unlocked code closes file descriptors that are fully set up. OK mpi@
2019-07-03 | add the kernel side of net.link.ifrxq.pressure_return and pressure_drop | David Gwynne
these values are used as the backpressure thresholds in the interface rx q processing code. they're being exposed as tunables to userland while we are figuring out what the best values for them are. ok visa@ deraadt@
2019-07-02 | R.I.P. timespecfix(); ok visa@ mpi@ | cheloha
2019-07-01 | kevent(2): remove 24hr timeout limit | cheloha
As with nanosleep(2), poll(2), and select(2), here we can chip away at the timespec until it's empty. This lets us support the full range of the timespec regardless of the kernel's HZ. Update the manpage accordingly. ok visa@
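The "chip away" approach can be sketched as a loop that never requests more than one bounded slice per sleep. The slice limit (max_slice_sec) is a hypothetical stand-in for whatever the tick conversion can represent, and the sleep itself is simulated. A real implementation would subtract the time that actually elapsed, because a sleep can end early.

```c
#include <assert.h>
#include <time.h>

/*
 * Count how many bounded sleeps it takes to consume `remaining`,
 * never requesting more than max_slice_sec seconds at a time.
 */
static int
chip_away(struct timespec remaining, time_t max_slice_sec)
{
	int sleeps = 0;

	assert(max_slice_sec > 0);
	while (remaining.tv_sec > 0 || remaining.tv_nsec > 0) {
		/* A real implementation would sleep for one slice here. */
		sleeps++;
		if (remaining.tv_sec >= max_slice_sec) {
			/* Full slice; leftover nanoseconds wait for a later pass. */
			remaining.tv_sec -= max_slice_sec;
		} else {
			/* What is left fits in a single slice. */
			remaining.tv_sec = 0;
			remaining.tv_nsec = 0;
		}
	}
	return sleeps;
}
```

Because the per-sleep limit only affects the iteration count, any timespec can be honored no matter how large it is.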
2019-06-28 | Skip VFS barrier lock during normal operation to reduce overhead. | Visa Hankala
This removes a system-wide serialization point, which might help find timing-related bugs. OK deraadt@ anton@
2019-06-26 | allow more video(4) ioctls for the video pledge (required by chromium) | Robert Nagy
ok deraadt@
2019-06-26 | Return EINVAL, not EBADF, for fcntl(fd, F_GETLK) of a non-vnode. | Todd C. Miller
Matches the recent F_SETLK change, POSIX and the man page.
2019-06-25 | Return EINVAL not EBADF when trying to lock a non-vnode. | Todd C. Miller
This behavior matches POSIX and our own fcntl(2) man page. OK anton@ deraadt@
2019-06-24 | regen | Visa Hankala
2019-06-24 | Unlock getrlimit(2) and setrlimit(2). | Visa Hankala
OK semarie@ mpi@ deraadt@ anton@
2019-06-24 | Guard uvm_map_protect() with kernel lock to prepare dosetrlimit() | Visa Hankala
for unlocking. OK semarie@ mpi@ deraadt@ anton@
2019-06-23 | Make taskq_barrier(9) work for multi-threaded task queues. | Mark Kettenis
ok visa@
2019-06-22 | push the KERNEL_LOCK deeper on read(2) and write(2) | Sebastien Marie
unlocks the read(2) and write(2) syscall families, and pushes the KERNEL_LOCK deeper in the code path. KERNEL_LOCK is managed per file type in fileops handlers (fo_read, fo_write, and fo_close). read(2) and write(2) on sockets are KERNEL_LOCK-free. initial work from mpi@ and ians@ ok mpi@ kettenis@ visa@ ians@
2019-06-21 | Make resource limit access MP-safe. So far, the copy-on-write sharing | Visa Hankala
of resource limit structs has been done between processes. By applying copy-on-write also between threads, threads can read rlimits in a nearly lock-free manner. Inspired by code in DragonFly BSD and FreeBSD. OK mpi@, agreement from jmatthew@ and anton@
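Below is a minimal, single-threaded sketch of copy-on-write sharing with a reference count. struct limits and the limits_* functions are hypothetical stand-ins for struct plimit. A kernel version would also need atomic reference counting and a safe way to publish the new pointer to other threads. Both are omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct plimit. */
struct limits {
	unsigned int	refcnt;		/* holders sharing this copy */
	long		cur[4];		/* soft limits, indexed by resource */
};

/* A reader takes a reference to the current shared copy. */
static struct limits *
limits_hold(struct limits *l)
{
	l->refcnt++;
	return l;
}

static void
limits_release(struct limits *l)
{
	assert(l->refcnt > 0);
	if (--l->refcnt == 0)
		free(l);
}

/*
 * A writer never modifies a copy that others hold: if the structure is
 * shared, make a private copy first and switch our pointer to it.
 */
static struct limits *
limits_modify(struct limits **lp, int res, long value)
{
	struct limits *l = *lp;

	if (l->refcnt > 1) {
		struct limits *copy = malloc(sizeof(*copy));

		if (copy == NULL)
			return NULL;
		memcpy(copy, l, sizeof(*copy));
		copy->refcnt = 1;
		limits_release(l);	/* drop our share of the old copy */
		*lp = l = copy;
	}
	l->cur[res] = value;
	return l;
}
```

Readers that took a reference before the write keep seeing the old values. Because no one ever writes to a shared copy, readers need almost no locking.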
2019-06-20 | Undefined behavior (UB) can potentially be present anywhere in the | anton
kernel. kubsan reports findings using printf() and assuming that calling printf() is safe in all contexts can be problematic. Instead, defer reporting of findings to the systq task queue. Storage for findings is allocated early in the boot process in order to catch potential UB during boot. The same findings are reported once the task queue subsystem has been initialized. Feedback from kettenis@ and ok mpi@
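The deferral pattern can be sketched with storage reserved up front. The recording path only copies data into a fixed array, and a later call from a safe context prints it. The capacity, names, and output format are hypothetical. The sketch also ignores the synchronization a kernel would need between recording and reporting.

```c
#include <assert.h>
#include <stdio.h>

#define NFINDINGS	8	/* hypothetical capacity, reserved at startup */

struct finding {
	const char	*what;
	const char	*file;
	int		 line;
};

/* Static storage: recording never allocates and never prints. */
static struct finding	findings[NFINDINGS];
static int		nfindings;
static int		ndropped;

/* May be called from contexts where printing is unsafe. */
static void
finding_record(const char *what, const char *file, int line)
{
	assert(what != NULL && file != NULL);
	if (nfindings == NFINDINGS) {
		ndropped++;	/* out of space: count it, but do not print */
		return;
	}
	findings[nfindings].what = what;
	findings[nfindings].file = file;
	findings[nfindings].line = line;
	nfindings++;
}

/* Called later from a safe context; returns how many were reported. */
static int
findings_report(FILE *out)
{
	int i, n = nfindings;

	for (i = 0; i < n; i++)
		fprintf(out, "%s at %s:%d\n", findings[i].what,
		    findings[i].file, findings[i].line);
	nfindings = 0;
	return n;
}
```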
2019-06-20 | Work around locking issues in logwakeup(). Instead of actually waking up | Visa Hankala
waiters, just set a flag in logwakeup(). The flag is later noted through periodic polling. This lets the wakeup code run with sufficient locking. logwakeup() is a very tricky place to take locks because the function can be called in many different contexts. Not requiring locks in the routine helps to keep printf(9) as usable as possible. OK mpi@
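The flag-and-poll pattern can be sketched with a C11 atomic. The function names are hypothetical. In the kernel, the poller would run periodically and perform the actual wakeup with the proper locks held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Setting this needs no locks, so it is safe from any context. */
static atomic_int log_wakeup_pending;

/* Hypothetical stand-in for logwakeup(): only note that a wakeup is due. */
static void
logwakeup_sketch(void)
{
	atomic_store(&log_wakeup_pending, 1);
}

/*
 * Run periodically where taking locks is safe. Consumes the flag and
 * reports whether waiters should now be woken.
 */
static bool
log_poll_sketch(void)
{
	return atomic_exchange(&log_wakeup_pending, 0) != 0;
}
```

Several calls between two polls collapse into a single wakeup. The trade-off is that waiters may be woken up to one polling interval late.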
2019-06-19 | the pledge STATLIE code is no longer needed, as discussed with beck. | Theo de Raadt
it actually isn't reached...
2019-06-18 | Ensure that timeout p_sleep_to is not left running when finishing sleep. | Visa Hankala
This is necessary when invoking sleep_finish_timeout() without the kernel lock. If not cancelled properly, an already running endtsleep() might cause a spurious wakeup on the thread if the thread re-enters a sleep queue very quickly before the handler completes. The flag P_TIMEOUT should stay cleared across the timeout cancellation. Add an assertion for that. OK mpi@
2019-06-17 | dosendsyslog() must only pass ktrgenio(9) userspace buffers that it can | Philip Guenther
use copyin() on. While here: just put the struct iovec for ktrace on the stack instead of mallocing and freeing it. problem debugged by patrick@ ok deraadt@ mpi@
2019-06-16 | SYS___realpath is legitimately PLEDGE_STDIO, because the other pledge | Theo de Raadt
feature bits are checked in namei()
2019-06-16 | In the previous commit I forgot a net unlock if the PCB of the socket | Alexander Bluhm
was already gone. OK mpi@
2019-06-15 | Have __realpath() do the pathname==NULL -> EINVAL check itself, eliminating | Theo de Raadt
the need to do this in libc. btw, it is unfortunate posix went this way, because converting a clearly illegal condition into a non-fatal error, which the caller potentially does not check, is sadly a large component of the runaway-train model that makes exploitation of software easy; illegal software should crash hard. ok beck
2019-06-13 | Use PWAIT instead of PUSER in exit1(). | Martin Pieuchot
When the main thread of an MT process dies, it doesn't matter at which priority it gets awakened to do the last cleanups. Not using PUSER makes it easier to understand the existing scheduler logic. ok visa@
2019-06-13 | When tcp_close() is running in parallel with fill_file(), the kernel | Alexander Bluhm
could crash due to missing inp_ppcb. This happened when fstat(1) was called often and TCP was aborted with a reset. Protect the sysctl path with the net lock. OK mpi@
2019-06-10 | add m_microtime for getting the wall clock time associated with a packet | David Gwynne
if the packet has the M_TIMESTAMP csum_flag, ph_timestamp is added to the boottime clock, otherwise it just uses microtime().
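Here is a userland sketch of the selection logic. It assumes ph_timestamp holds nanoseconds since boot, since the message does not state the unit. The struct, the flag value, and passing boottime and the current time in as arguments are all hypothetical simplifications.

```c
#include <stdint.h>
#include <sys/time.h>

#define M_TIMESTAMP_SKETCH	0x1	/* hypothetical flag value */

struct pkthdr_sketch {
	unsigned int	csum_flags;
	uint64_t	ph_timestamp;	/* assumed: nanoseconds since boot */
};

/* Wall clock time of a packet: boottime + timestamp, or "now" if unstamped. */
static void
pkt_microtime(const struct pkthdr_sketch *ph, const struct timeval *boottime,
    const struct timeval *now, struct timeval *tv)
{
	if (ph->csum_flags & M_TIMESTAMP_SKETCH) {
		time_t up_sec = ph->ph_timestamp / 1000000000ULL;
		long up_usec = (ph->ph_timestamp % 1000000000ULL) / 1000;

		tv->tv_sec = boottime->tv_sec + up_sec;
		tv->tv_usec = boottime->tv_usec + up_usec;
		if (tv->tv_usec >= 1000000) {	/* carry into seconds */
			tv->tv_sec++;
			tv->tv_usec -= 1000000;
		}
	} else
		*tv = *now;	/* stand-in for microtime() */
}
```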
2019-06-10 | Avoid changing resource limits in rucheck() by introducing a new state | Visa Hankala
variable that tracks when to send next SIGXCPU. This eases MP work and prevents accidental alteration of shared resource limit structs. OK mpi@ semarie@
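One way to track the next SIGXCPU without writing to the shared limit is a per-process threshold that belongs to the checker. The struct names are hypothetical. The hard-limit (SIGKILL) handling is omitted, and the 5-second re-signal interval is an assumption that does not come from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define XCPU_INTERVAL	5	/* assumed re-signal interval, in CPU seconds */

/* Shared limit: only read here, never modified. */
struct cpu_limit {
	time_t	soft;		/* soft RLIMIT_CPU, in seconds */
};

/* Per-process state owned by the checker. */
struct cpu_state {
	time_t	next_xcpu;	/* 0 until the first signal */
};

/* Returns true when SIGXCPU should be sent for the given CPU time. */
static bool
check_xcpu(const struct cpu_limit *lim, struct cpu_state *st,
    time_t cpu_seconds)
{
	time_t threshold;

	assert(cpu_seconds >= 0);
	threshold = st->next_xcpu > lim->soft ? st->next_xcpu : lim->soft;
	if (cpu_seconds < threshold)
		return false;
	st->next_xcpu = cpu_seconds + XCPU_INTERVAL;
	return true;
}
```

The limit struct is passed as const, so other threads and processes that share it never see it change behind their backs.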
2019-06-09 | Add a temporary workaround to make removal of giant files better | Bob Beck
mlarkin@ noticed we would freeze while removing enormous files because of the amount of work done to invalidate buffers on unlink. This adds a temporary workaround to ensure we give up the lock and yield while doing this. The longer term answer will be to move these buffers to another list and not do the work here. ok deraadt@
2019-06-06 | Restore missing newline. | Visa Hankala
2019-06-04 | Let SP kernel work with WITNESS. The necessary instrumentation was | Visa Hankala
missing from the SP variant of mtx_enter() and mtx_enter_try(). mtx_leave() was correct already. Prompted by and OK patrick@
2019-06-03 | sort struct declarations | anton
2019-06-03 | Switch from bintime_add() et al. to bintimeadd(9). | cheloha
Basically just make all the bintime routines look and behave more like the timeradd(3) macros. Switch to three-argument forms for structure math, introduce and use bintimecmp(9), and rename the structure conversion routines to resemble e.g. TIMEVAL_TO_TIMESPEC(3). Document all of this in a new bintimeadd.9 page. Code input from mpi@, manpage input from schwarze@. code ok mpi@, docs ok schwarze@, docs probably still ok jmc@
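The following is a userland re-creation of the three-argument style, for illustration only. A bintime holds whole seconds plus a fraction in units of 2^-64 s, so adding two fractions can carry into the seconds field. This is not the kernel's source. It is a sketch consistent with the timeradd(3)-like behavior the message describes.

```c
#include <stdint.h>
#include <time.h>

/* Whole seconds plus a binary fraction of a second (units of 2^-64 s). */
struct bintime {
	time_t		sec;
	uint64_t	frac;
};

/* Three-argument form: dt = bt + ct. dt may alias bt or ct. */
static void
bintimeadd(const struct bintime *bt, const struct bintime *ct,
    struct bintime *dt)
{
	uint64_t frac = bt->frac + ct->frac;	/* wraps modulo 2^64 */
	time_t sec = bt->sec + ct->sec;

	if (frac < bt->frac)	/* the fractions overflowed: carry a second */
		sec++;
	dt->sec = sec;
	dt->frac = frac;
}

/* Comparison in the style of timercmp(3): cmp is <, >, ==, etc. */
#define bintimecmp(btp, ctp, cmp)					\
	((btp)->sec == (ctp)->sec ?					\
	    (btp)->frac cmp (ctp)->frac :				\
	    (btp)->sec cmp (ctp)->sec)
```

For example, 0.5 s (frac 0x8000000000000000) plus 0.75 s (frac 0xC000000000000000) yields sec 1 and frac 0x4000000000000000, which is 1.25 s.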
2019-06-02 | Move initialization of limit0 into a dedicated function. This new | Visa Hankala
function is also a proper place for setting up the plimit pool. While here, raise the IPL of the plimit pool to IPL_MPFLOOR, needed in upcoming MP work. OK claudio@
2019-06-01 | Revert to using the SCHED_LOCK() to protect time accounting. | Martin Pieuchot
It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress. Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
2019-05-31 | Use a per-process mutex to protect time accounting instead of SCHED_LOCK(). | Martin Pieuchot
Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock. ok visa@, cheloha@
2019-05-31 | Rename struct plimit field p_refcnt to pl_refcnt to avoid confusion | Visa Hankala
with the fields of struct proc. Make pl_refcnt unsigned for upcoming atomic updating. OK deraadt@ guenther@
2019-05-30 | Fix the initialization of bp before calling vfs_getcwd_common | Bob Beck
It is bad style to make a pointer point outside the object so correct this to simply point to the last byte up front. ok deraadt@
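The general technique is to build a path right to left, with the cursor starting on the buffer's last byte so that it never points outside the object. build_path_backward() is hypothetical and does not reproduce vfs_getcwd_common()'s interface. Components are supplied leaf first, in the order a walk up the directory tree would find them.

```c
#include <stddef.h>
#include <string.h>

/*
 * Build "/a/b/c" right to left. bp starts at the last byte of buf (which
 * holds the terminating NUL) and only moves toward buf, so it always
 * points inside the object. Returns the start of the path, or NULL if
 * buf is too small.
 */
static char *
build_path_backward(char *buf, size_t len, const char *const comps[],
    size_t ncomps)
{
	char *bp;
	size_t i;

	if (len == 0)
		return NULL;
	bp = buf + len - 1;	/* last byte, inside the buffer */
	*bp = '\0';

	for (i = 0; i < ncomps; i++) {
		size_t clen = strlen(comps[i]);

		/* Need room for the component plus its leading '/'. */
		if ((size_t)(bp - buf) < clen + 1)
			return NULL;
		bp -= clen;
		memcpy(bp, comps[i], clen);
		*--bp = '/';
	}
	if (ncomps == 0) {
		if (bp == buf)
			return NULL;
		*--bp = '/';	/* the root directory itself */
	}
	return bp;
}
```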
2019-05-30 | namei() generates KTR_NAMEI records for input filenames, but getcwd(2) and | Theo de Raadt
realpath(2) have output filenames. Generate additional KTR_NAMEI records upon success. ok millert beck
2019-05-30 | use copyoutstr, instead of fragile range math; ok beck | Theo de Raadt
2019-05-30 | Correct call to vfs_getcwd_common from within __realpath | Bob Beck
I borrowed an example usage from __getcwd poorly to begin with and then there was some other strangeness in there. diagnosed with deraadt. ok deraadt@
2019-05-29 | The past is fuzzy, but it appears that during development of __getcwd, *retval | Theo de Raadt
was used to return the length of the path, when the actual return value is 0. This would cause confusing results in ktrace. Diagnosed with beck since __realpath() picked up the same odd behaviour
2019-05-25 | Do not account spinning time as running time when a thread crosses a | Martin Pieuchot
tick boundary of schedlock(). This reduces the contention on the SCHED_LOCK() when the current thread is already spinning. Prompted by deraadt@, ok visa@
2019-05-24 | rename struct for consistency | anton
2019-05-24 | fix incorrect order of arguments | anton
2019-05-24 | A source location in kubsan is an absolute path making reports quite | anton
long. Instead, use everything after the first /sys/ segment as the path.
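The path shortening can be sketched with strstr(3). short_src_path() is a hypothetical name. Returning paths without a /sys/ segment unchanged is an assumption about the fallback behavior.

```c
#include <assert.h>
#include <string.h>

/* Return the part of path after the first "/sys/", or path itself if absent. */
static const char *
short_src_path(const char *path)
{
	const char *p;

	assert(path != NULL);
	p = strstr(path, "/sys/");
	return p != NULL ? p + strlen("/sys/") : path;
}
```

Only the first occurrence counts, so "/a/sys/b/sys/c.c" shortens to "b/sys/c.c".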
2019-05-24 | The latest inteldrm update brought along code making use of | anton
__attribute__((nonnull)), which the undefined behavior sanitizer in clang is aware of. A new handler is therefore needed in order to compile a kernel with kubsan enabled. ok visa@
2019-05-24 | Prevent a kernel hang if an empty message is sent over a SOCK_SEQPACKET | Alexander Bluhm
socketpair. Do not wake up the receiver if there is no data available. OK claudio@ anton@