path: root/sys/kern
Age | Commit message | Author
2017-12-14  make sched_barrier use cond_wait/cond_signal.  (David Gwynne)
previously the code was using a percpu flag to manage the sleeps/wakeups, which means multiple threads waiting for a barrier on a cpu could race. moving to a cond struct on the stack fixes this. while here, get rid of the sbar taskq and just use systqmp instead. the barrier tasks are short, so there's no real downside. ok mpi@
2017-12-14  Don't bother using DETACH_FORCE for the softraid luns at reboot  (Theo de Raadt)
time; the aggressive mountpoint destruction seems to hit insane use-after-frees when we are already far on the way down.
2017-12-14  Give vflush_vnode() a hint about vnodes we don't need to account as "busy".  (Theo de Raadt)
Change mountpoint to RDONLY a little later. Seems to improve the rw->ro transition a bit.
2017-12-14  i forgot to convert timeout_proc_barrier to cond_signal  (David Gwynne)
2017-12-14  replace the bare sleep state handling in barriers with wait cond code  (David Gwynne)
2017-12-14  add code to provide simple wait condition handling.  (David Gwynne)
this will be used to replace the bare sleep_state handling in a bunch of places, starting with the barriers.
2017-12-12  sync  (Theo de Raadt)
2017-12-12  pledge()'s 2nd argument becomes char *execpromises, which becomes the  (Theo de Raadt)
pledge for a new execve image immediately upon start. Also introduces "error", which makes violations return -1 ENOSYS instead of killing the program. ("error" may not be handed to a setuid/setgid program, which may be missing/ignoring syscall return values and would continue with inconsistent state.) Discussion with many; florian has used this to improve the strictness of a daemon.
2017-12-11  Format the vnode lists of ddb show mount properly in columns.  (Alexander Bluhm)
OK krw@
2017-12-11  In uvm Chuck decided backing store would not be allocated proactively  (Theo de Raadt)
for blocks re-fetchable from the filesystem. However at reboot time, filesystems are unmounted, and since processes lack backing store they are killed. Since the scheduler is still running, in some cases init is killed... which drops us to ddb [noted by bluhm]. Solution is to convert filesystems to read-only [proposed by kettenis]. The tale follows: sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT() with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a copyin() late... so store the sizes in vfsconflist[] and move the copyin() to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill legacy struct nfs_args3. Next we learn ffs_mount()'s MNT_UPDATE code is sharp and rusty especially wrt softdep, so fix some bugs and add ~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help, so tie them to &dead_vnops. ffs_mount calling DIOCCACHESYNC is causing a bit of grief still, but this issue is separate and will be dealt with in time. couple hundred reboots by bluhm and myself, advice from guenther and others at the hut
2017-12-10  Move SB_SPLICE, SB_WAIT and SB_SEL to `sb_flags', serialized by solock().  (Martin Pieuchot)
SB_KNOTE remains the only bit set on `sb_flagsintr' as it is set/unset in contexts related to kqueue(2) where we'd like to avoid grabbing solock(). While here add some KERNEL_LOCK()/UNLOCK() dances around selwakeup() and csignal() to mark which remaining functions need to be addressed in the socket layer. ok visa@, bluhm@
2017-12-09  More precision in pledge sysctl report  (Theo de Raadt)
2017-12-04  Change __mp_lock_held() to work with an arbitrary CPU info structure and  (Martin Pieuchot)
extend ddb(4) "ps /o" output to print which CPU is currently holding the KERNEL_LOCK(). Tested by dhill@, ok visa@
2017-12-04  Use _kernel_lock_held() instead of __mp_lock_held(&kernel_lock).  (Martin Pieuchot)
ok visa@
2017-11-28  Raise the IPL of the sbar taskq to avoid lock order issues  (Visa Hankala)
with the kernel lock. Fixes a deadlock seen by Hrvoje Popovski and dhill@. OK mpi@, dhill@
2017-11-28  deadproc_mutex is only taken _before_ kernel_lock; exclude it from  (Philip Guenther)
WITNESS checking as (our) witness code isn't smart enough to let that by. ok visa@
2017-11-28  sync  (Philip Guenther)
2017-11-28  Delete fktrace(2). The consequences of it were not thought through  (Philip Guenther)
sufficiently and at least one horrific security hole was the result. ok deraadt@ beck@
2017-11-27  Fix comment typo  (Philip Guenther)
2017-11-24  add timeout_barrier, which is like intr_barrier and taskq_barrier.  (David Gwynne)
if you're trying to free something that a timeout is using, you have to wait for that timeout to finish running before doing the free. timeout_del can stop a timeout from running in the future, but it doesn't know if a timeout has finished being scheduled and is now running. previously you could know that timeouts are not running by simply masking softclock interrupts on the cpu running the kernel. however, code is now running outside the kernel lock, and timeouts can run in a thread instead of softclock. timeout_barrier solves the first problem by taking the kernel lock and then masking softclock interrupts. that is enough to ensure that any further timeout processing is waiting for those resources to run again. the second problem is solved by having timeout_barrier insert work into the thread. when that work runs, that means all previous work running in that thread has completed. fixes and ok visa@, who thinks this will be useful for his work too.
2017-11-23  Constify protocol tables and remove an assert now that ip_deliver() is  (Martin Pieuchot)
mp-safe. ok bluhm@, visa@
2017-11-23  We want `sb_flags' to be protected by the socket lock rather than the  (Martin Pieuchot)
KERNEL_LOCK(), so change asserts accordingly. This is now possible since sblock()/sbunlock() are always called with the socket lock held. ok bluhm@, visa@
2017-11-17  permit IPV6_V6ONLY in sockopt  (Aaron Bieber)
OK deraadt@
2017-11-14  Push the NET_LOCK into ifioctl() and use the NET_RLOCK in ifioctl_get().  (Theo Buehler)
In particular, this allows SIOCGIF* requests to run in parallel. lots of help & ok mpi, ok visa, sashan
2017-11-14  Fix the initial check of the checkorder and lock operations  (Visa Hankala)
so that statically initialized locks get properly enrolled to the validator. OK mpi@
2017-11-14  remove MALLOC_DEBUG  (David Gwynne)
the code has rotted, and obviously hasn't been used for ages. it is also hard to make mpsafe. if we need something like this again it would be better to do it from scratch. ok tedu@ visa@
2017-11-13  add taskq_barrier  (David Gwynne)
taskq_barrier guarantees that any task that was running on the taskq has finished by the time taskq_barrier returns. it is similar to intr_barrier. this is needed for use in ifq_barrier as part of an upcoming change.
2017-11-04  raw_init() is dead and <net/raw_cb.h> doesn't need to be included there.  (Martin Pieuchot)
2017-11-04  Make it possible for multiple threads to enter kqueue_scan() in parallel.  (Martin Pieuchot)
This is a requirement to use a sleeping lock inside kqueue filters. It is now possible, but not recommended, to sleep inside ``f_event''. Threads iterating over the list of pending events are now recognizing and skipping other threads' markers. knote_acquire() and knote_release() must be used to "own" a knote to make sure no other thread is sleeping with a reference on it. Acquire and marker logic taken from DragonFly but the KERNEL_LOCK() is still serializing the execution of the kqueue code. This also enables the NET_LOCK() in socket filters. Tested by abieber@ & juanfra@, run by naddy@ in a bulk, ok visa@, bluhm@
2017-11-02  Move PRU_DETACH out of pr_usrreq into per proto pr_detach  (Florian Obser)
functions to pave way for more fine grained locking. Suggested by, comments & OK mpi
2017-10-30  Let witness(4) differentiate between taskq mutexes to avoid  (Visa Hankala)
reporting an error in a scenario like the following: 1. mtx_enter(&tqa->tq_mtx); 2. IRQ 3. mtx_enter(&tqb->tq_mtx); Found by Hrvoje Popovski, OK mpi@
2017-10-29  Move NET_{,UN}LOCK into individual slowtimo functions.  (Florian Obser)
Direction suggested by mpi OK mpi, visa
2017-10-24  Use membar_enter_after_atomic(9) and membar_exit_before_atomic(9).  (Martin Pieuchot)
Micro-optimization useful to x86 archs where the cmpxchg{q,l} instruction used by rw_enter(9) and rw_exit(9) already include an implicit memory barrier. From Mateusz Guzik, ok visa@, mikeb@, kettenis@
2017-10-17  Add a machine-independent implementation for the mplock.  (Visa Hankala)
This reduces code duplication and makes it easier to instrument lock primitives. The MI mplock uses the ticket lock code that has been in use on amd64, i386 and sparc64. These are the architectures that now switch to the MI code. The lock_machdep.c files are unhooked from the build but not removed yet, in case something goes wrong. OK mpi@, kettenis@
2017-10-17  Print the pid of the most recent program that failed to send a log  (Martin Pieuchot)
via sendsyslog(2) along with the corresponding errno. Helps when troubleshooting which program is triggering an error, like an overflow. ok bluhm@
2017-10-14  Split sys_ptrace() by request type:  (Philip Guenther)
- control operations: trace_me, attach, detach, step, kill, continue. Manipulate process relation/state or send a signal - kernel-state get/set: thread list, event mask, trace state. About the process and don't require target to be stopped, need copyin/out - user-state get/set: memory, register, window cookie. Often thread-specific, require target to be stopped, need copyin/out sys_ptrace() changes to handle request checking, copyin/out to kernel buffers with size check and zeroing, and dispatching to the routines above for the real work. This simplifies the permission checks and copyin/out handling and will simplify lock handling in the future. Inspired in part by FreeBSD. ok mpi@ visa@
2017-10-12  Print the word pledge in the kernel log when there is a violation.  (Alexander Bluhm)
This should make it easier to figure out what is going on. Note that the pledgecode it shows is only a guess at which pledge(2) promise might help. OK deraadt@ semarie@
2017-10-12  Use a temporary variable in rw_status() to dereference the  (Martin Pieuchot)
volatile member of the struct only once. Not forcing a memory read on every access (3 in this function) might reduce cache traffic in some cases. Micro-optimization and diff provided by Mateusz Guzik. ok visa@
2017-10-12  Move sysctl_mq() where it can safely mess with mbuf queue internals.  (Martin Pieuchot)
ok visa@, bluhm@, deraadt@
2017-10-11  Move `kq_count' increase/decrease close to the corresponding TAILQ_*  (Martin Pieuchot)
insert/remove operation. No functional change for the moment. However this helps to make this code mp-safe. Note that markers are still not, and won't be, counted. ok visa@, jsing@, bluhm@
2017-10-11  Move kq_kev from struct kqueue to the stack.  (Martin Pieuchot)
This makes the set of events per-thread without having to lock anything. From Dragonfly 10f6680a4f6684751aaae0965abfe140f19e9231 ok kettenis@, visa@, bluhm@
2017-10-09  Reduces the scope of the NET_LOCK() in sysctl(2) path.  (Martin Pieuchot)
Exposes per-CPU counters to real parallelism. ok visa@, bluhm@, jca@
2017-10-09  Make _kernel_lock_held() always succeed after panic(9).  (Martin Pieuchot)
ok visa@
2017-10-07  In "tty", permitting TIOCSTART is fine  (Theo de Raadt)
2017-10-07  permit SYS___set_tcb, upcoming code will require this  (Theo de Raadt)
2017-09-29  New ddb(4) command: kill.  (Martin Pieuchot)
Send an uncatchable SIGABRT to the process specified by the pid argument. Useful in case of CPU exhaustion to kill the DoSing process and generate a core for later inspection. ok phessler@, visa@, kettenis@, miod@
2017-09-27  guenther sleep-committed the version without #ifdefs  (Theo de Raadt)
2017-09-27  amd64 needs FS.base values (the TCB pointer) to be validated, as noncanonical  (Philip Guenther)
addresses will cause a fault on load by the kernel. Problem observed by Maxime Villard ok kettenis@ deraadt@
2017-09-25  sendsyslog should take a const char * everywhere.  (Marc Espie)
okay bluhm@, deraadt@
2017-09-15  Coverity complains that top == NULL was checked and further down  (Alexander Bluhm)
top->m_pkthdr.len was accessed without check. See CID 1452933. In fact top cannot be NULL there and the condition was always false. m_getuio() never reserved space for the header. The correct check is m == top to find the first mbuf. OK visa@