path: root/sys/kern
Age  Commit message  Author
2018-02-10Move cleanup job control bits to their own function.Martin Pieuchot
Part of the larger 'proctreelk' diff from guenther@ No functional change, ok benno@, tedu@
2018-02-10mbufs and mbuf clusters are now backed by large pools. Because of thisClaudio Jeker
we can relax the oversubscribe limit of socketbuffers a fair bit. Instead of maxing out as sb_max * 1.125 or 2 * sb_hiwat the maximum is increased to 8 * sb_hiwat -- which seems to be a good compromise between memory waste and better socket buffer usage. OK deraadt@
2018-02-10Synchronize filesystems to disk when suspending. Each mountpoint's vnodesTheo de Raadt
are pushed to disk. Dangling vnodes (unlinked files still in use) and vnodes undergoing change by long-running syscalls are identified -- and such filesystems are marked dirty on-disk while we are suspended (in case power is lost, a fsck will be required). Filesystems without dangling or busy vnodes are marked clean, resulting in faster boots following "battery died" circumstances. Tested by numerous developers, thanks for the feedback.
2018-02-10Use sched_pause(yield) to decide when to yield when filling randomdata.mortimer
ok deraadt@
2018-02-09Call socreate() before falloc() in sys_socket().Martin Pieuchot
This is similar to what we do in sys_socketpair() and will allow us to grab the KERNEL_LOCK() only after having created a socket. ok tedu@
2018-02-08Remove CSRG copyright, there isn't any code left from Berkeley here.Martin Pieuchot
In 2016 natano@ removed the last two functions remaining from the CSRG time: lockinit() and lockstatus(). At that time they were already wrappers around recursive rwlocks functions from thib@ that tedu@ committed in 2013. ok deraadt@
2018-02-08Use a temporary chacha instance to fill large randomdata sections. Avoidsmortimer
grabbing the rnglock repeatedly. ok deraadt@ djm@
2018-02-06slightly randomize the order that new pages populate their item lists in.David Gwynne
ok tedu@ deraadt@
2018-02-06reduce scope of variable a bit to avoid shadowingTed Unangst
2018-01-25Move common mutex implementations to a MI place.Martin Pieuchot
Archs not yet converted can make the jump by defining __USE_MI_MUTEX. ok visa@
2018-01-18While booting it does not make sense to wait for memory, there isAlexander Bluhm
no other process which could free it. Better panic in malloc(9) or pool_get(9) instead of sleeping forever. Tested by visa@, patrick@, Jan Klemkow; suggested by kettenis@; OK deraadt@
2018-01-16garbage collect an unused variableSebastian Benoit
ok dlg@
2018-01-13introduce a filter called EVFILT_DEVICE that can be used to notifyRobert Nagy
listeners of device state changes. currently only supports NOTE_CHANGE that will be used by drm(4) ok kettenis@
2018-01-11Postpone secondary CPUs until after mounthook activities. This isPatrick Wildt
useful for loading CPU microcode from the disk before the CPUs are let go. Tested by visa@ on sgi, loongson and octeon "don't see immediate issues" kettenis@ ok deraadt@
2018-01-10Mark sosplice task mp safe, do not grab kernel lock for tcp output.Alexander Bluhm
OK mpi@
2018-01-09Change `so_state' and `so_error' to unsigned int such that they canMartin Pieuchot
be atomically read from any context. ok bluhm@, visa@
2018-01-08Do not pass a userland pointer to ktrabstimespec().Martin Pieuchot
Prevents an infinite pagefault/pmap_enter() loop when ktracing apps doing a lot of futex(2) calls like firefox & chrome.
2018-01-08Allow TIOCUCNTL issued on a pty(4) master in promise "tty".Martin Pieuchot
This will soon be used to emulate BREAK commands in vmd(8). ok nicm@, ccardenas@, deraadt@
2018-01-08Translate the TIOCSBRK & TIOCCBRK ioctl(2)s issued on a pty(4) slave toMartin Pieuchot
corresponding user mode ioctls. If the master part of the pseudo terminal previously enabled TIOCUCNTL, it will now receive the TIOCUCNTL_{S,C}BRK commands. This allows sending BREAK commands over a pty(4) independently of the serial terminal emulator used. Guidance and ok nicm@, ok ccardenas@, looks ok to deraadt@
2018-01-05Show uvm_fault and trace when typing show panic on a page fault'd kernelPaul Irofti
Currently there is only support for amd64. If this change settles I will add support for the rest of the architectures. OK kettenis@.
2018-01-04Unifdef snd/rcv.Martin Pieuchot
ok visa@, claudio@
2018-01-02Do not memset() the whole structure in sorflush() to keep `sb_flagsintr'Martin Pieuchot
untouched. ok bluhm@, visa@
2018-01-02Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined.Philip Guenther
ok millert@ sthen@
2018-01-02Fix an off-by-one in the free(9) "passed size was too small" check:Philip Guenther
if the size passed is exactly half the size of the bucket that the allocation was actually from, then it was incorrect. problem noted by florian@ ok florian@ visa@
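The off-by-one described above can be illustrated with a minimal sketch. It assumes power-of-two allocation buckets, so a request of exactly half a bucket's size would have been served from the next smaller bucket. The function names and the exact form of the comparison are hypothetical and are not taken from the kernel source; the sketch also ignores the smallest bucket, where any small size is legitimate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when n is a power of two. */
static bool
is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Assumed shape of the check before the fix: a passed size of exactly
 * bucket / 2 is not flagged, although such an allocation would have
 * come from the smaller bucket.
 */
static bool
too_small_before(size_t passed, size_t bucket)
{
	assert(is_pow2(bucket));
	return passed < bucket / 2;
}

/* Assumed shape after the fix: exactly bucket / 2 is also rejected. */
static bool
too_small_after(size_t passed, size_t bucket)
{
	assert(is_pow2(bucket));
	return passed <= bucket / 2;
}
```

With a 128-byte bucket, a passed size of 64 slips through the first check and is caught by the second, while 65 is accepted by both.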
2018-01-01free(9) sizes for sys_execve.Florian Obser
Convert the hand rolled loop to strlcpy which gives us the size for free(9). OK visa
2018-01-01We are either allocating 2 or three array members. Unroll while loopFlorian Obser
to be able to call free(9) with sizes. off-by-one pointed out by guenther OK visa
2018-01-01copyright++;Jonathan Gray
2017-12-30Don't pull in <sys/file.h> just to get fcntl.hPhilip Guenther
ok deraadt@ krw@
2017-12-30Delete unnecessary <sys/file.h> includesPhilip Guenther
ok millert@ krw@
2017-12-29Make sure that pf_mbuf_link_state_key() does not overwrite anAlexander Bluhm
existing statekey in the mbuf header. Reset the statekey in m_dup_pkthdr(). suggested by and OK sashan@
2017-12-29Make the functions which link the pf state keys to mbufs, inpcbs,Alexander Bluhm
or other states more consistent. OK visa@ sashan@ on a previous version
2017-12-19curproc access isn't necessarily as cheap as a local variable access,Theo de Raadt
so only get it once ok guenther
2017-12-19Remove unused ps_stackgap from process structStefan Kempf
Nothing uses this field since Linux compat was removed. ok mpi@ deraadt@ guenther@
2017-12-19Remove a 27 year old #ifdef notdef chunk involving SB_LOCK.Martin Pieuchot
ok bluhm@
2017-12-19Inline socket buffer related defines, no functional change.Martin Pieuchot
ok bluhm@
2017-12-19Remove unnecessary unlock/lock dance when following a goto.Martin Pieuchot
ok bluhm@
2017-12-18Revert support for multiple threads to enter kqueue_scan() in parallel.Martin Pieuchot
It is not clear if this change is responsible for the lockups experienced by dhill@ and jcs@ but since we're no longer grabbing the socket lock in kqueue(2) filters there's no need for this change.
2017-12-18Revert grabbing the socket lock in kqueue(2) filters.Martin Pieuchot
This change exposed or created a situation where a CPU became unresponsive while holding the KERNEL_LOCK(). This led to lockups, and even with MP_LOCKDEBUG it was not clear what happened to this CPU. These situations have been experienced by dhill@ with dcrwallet and jcs@ with syncthing. Both applications are written in Go and do kevent(2) & networking across multiple threads.
2017-12-18Make rw_exit() always succeed after a panic.Martin Pieuchot
Prevents a deadlock in if_downall() when rw_enter() succeeds without really grabbing the lock. Reported by and ok phessler@
2017-12-18Add the CLOCK_BOOTTIME clockid for use with clock_gettime(2)cheloha
and put it to use in userspace in lieu of the kern.boottime sysctl. Its absolute value is the time that has elapsed since the system booted, i.e., the system uptime. Use in top(1), w(1), and snmpd(8) eliminates a race with settimeofday(2), adjtime(2), etc. inherent to deriving the system uptime via the kern.boottime sysctl. Product of a great deal of discussion/revision with jca@, tb@, and guenther@. ok tb@ jca@ guenther@ dlg@ mlarkin@ tom@
2017-12-14make sched_barrier use cond_wait/cond_signal.David Gwynne
previously the code was using a percpu flag to manage the sleeps/wakeups, which means multiple threads waiting for a barrier on a cpu could race. moving to a cond struct on the stack fixes this. while here, get rid of the sbar taskq and just use systqmp instead. the barrier tasks are short, so there's no real downside. ok mpi@
2017-12-14Don't bother using DETACH_FORCE for the softraid luns at rebootTheo de Raadt
time; the aggressive mountpoint destruction seems to hit insane use-after-frees when we are already far on the way down.
2017-12-14Give vflush_vnode() a hint about vnodes we don't need to account as "busy".Theo de Raadt
Change mountpoint to RDONLY a little later. Seems to improve the rw->ro transition a bit.
2017-12-14i forgot to convert timeout_proc_barrier to cond_signalDavid Gwynne
2017-12-14replace the bare sleep state handling in barriers with wait cond codeDavid Gwynne
2017-12-14add code to provide simple wait condition handling.David Gwynne
this will be used to replace the bare sleep_state handling in a bunch of places, starting with the barriers.
2017-12-12syncTheo de Raadt
2017-12-12pledge()'s 2nd argument becomes char *execpromises, which becomes theTheo de Raadt
pledge for a new execve image immediately upon start. Also introduces "error" which makes violations return -1 ENOSYS instead of killing the program ("error" may not be handed to a setuid/setgid program, which may be missing/ignoring syscall return values and would continue with inconsistent state). Discussion with many. florian has used this to improve the strictness of a daemon.
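A minimal sketch of the two-argument form follows. The first string restricts the calling process and the second becomes the pledge of the image started by a later execve. The particular promise sets and the helper name `restrict_before_exec` are illustrative assumptions, and the call is guarded so that the sketch is a no-op on systems without pledge(2).

```c
#include <unistd.h>

/*
 * Hypothetical helper: restrict the current process, and pre-declare
 * the promises that apply to whatever image it later execs.
 * Returns 0 on success, -1 on failure (as pledge(2) does).
 */
int
restrict_before_exec(void)
{
#ifdef __OpenBSD__
	/* promises for this process, execpromises for the exec'd image */
	return pledge("stdio rpath proc exec", "stdio rpath");
#else
	/* pledge(2) is OpenBSD-specific; nothing to do elsewhere */
	return 0;
#endif
}
```

A parent would typically call this once its setup is complete and before it forks and execs a helper, so that the helper starts with the narrower second set already in force.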
2017-12-11Format the vnode lists of ddb show mount properly in columns.Alexander Bluhm
OK krw@
2017-12-11In uvm Chuck decided backing store would not be allocated proactivelyTheo de Raadt
for blocks re-fetchable from the filesystem. However at reboot time, filesystems are unmounted, and since processes lack backing store they are killed. Since the scheduler is still running, in some cases init is killed... which drops us to ddb [noted by bluhm]. Solution is to convert filesystems to read-only [proposed by kettenis]. The tale follows: sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT() with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a copyin() late... so store the sizes in vfsconflist[] and move the copyin() to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill legacy struct nfs_args3. Next we learn ffs_mount()'s MNT_UPDATE code is sharp and rusty especially wrt softdep, so fix some bugs and add ~MNT_SOFTDEP to the downgrade. Some vnodes need a little more help, so tie them to &dead_vnops. ffs_mount calling DIOCCACHESYNC is causing a bit of grief still but this issue is separate and will be dealt with in time. couple hundred reboots by bluhm and myself, advice from guenther and others at the hut