path: root/sys/kern/kern_sched.c
Age | Commit message | Author
2019-10-15 | Reduce the number of places where `p_priority' and `p_stat' are set. | Martin Pieuchot
This refactoring will help future scheduler locking, in particular to shrink the SCHED_LOCK(). No intended behavior change. ok visa@
2019-06-01 | Revert to using the SCHED_LOCK() to protect time accounting. | Martin Pieuchot
It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress. Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
2019-05-31 | Use a per-process mutex to protect time accounting instead of SCHED_LOCK(). | Martin Pieuchot
Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock. ok visa@, cheloha@
2019-03-26 | Make sure that each ci has its spc_deferred queue initialized. | Visa Hankala
Otherwise, the system can crash in smr_call_impl() if SMT is enabled later. Crash reported by jcs@
2019-02-26 | Introduce safe memory reclamation, a mechanism for reclaiming shared | Visa Hankala
objects that readers can access without locking. This provides a basis for read-copy-update operations.

Readers access SMR-protected shared objects inside an SMR read-side critical section where sleeping is not allowed. To reclaim an SMR-protected object, the writer has to ensure mutual exclusion of other writers, remove the object's shared reference and wait until read-side references cannot exist any longer. As an alternative to waiting, the writer can schedule a callback that gets invoked when reclamation is safe.

The mechanism relies on CPU quiescent states to determine when an SMR-protected object is ready for reclamation.

The <sys/smr.h> header additionally provides an implementation of singly- and doubly-linked lists that can be used together with SMR. These lists allow lockless read access with a concurrent writer.

Discussed with many. OK mpi@ sashan@
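As a rough illustration of the usage pattern described above, here is a hedged sketch of a reader and a writer around an SMR-protected singly-linked list. The smr_read_enter()/smr_read_leave(), smr_init()/smr_call() and SMR_SLIST_* names follow the <sys/smr.h> naming mentioned in the commit, but the exact signatures, the item type, and the writer-side list manipulation are assumptions for illustration only.

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <sys/smr.h>

    struct item {
        SMR_SLIST_ENTRY(item)   i_entry;
        int                     i_value;
        struct smr_entry        i_smr;      /* for deferred reclamation */
    };

    SMR_SLIST_HEAD(, item) items = SMR_SLIST_HEAD_INITIALIZER(items);

    /* Reader: lockless, but must not sleep inside the read-side section. */
    int
    item_lookup(int value)
    {
        struct item *it;
        int found = 0;

        smr_read_enter();
        SMR_SLIST_FOREACH(it, &items, i_entry) {
            if (it->i_value == value) {
                found = 1;
                break;
            }
        }
        smr_read_leave();
        return found;
    }

    static void
    item_reclaim(void *arg)
    {
        free(arg, M_TEMP, sizeof(struct item));
    }

    /*
     * Writer: unlink under the writer lock, then defer the free until no
     * read-side reference can exist any longer.
     */
    void
    item_remove(struct item *it)
    {
        /* ... unlink it from the list here, writer lock held ... */
        smr_init(&it->i_smr);
        smr_call(&it->i_smr, item_reclaim, it);
    }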
2018-11-17 | Add new KERN_CPUSTATS sysctl(2) so we can identify offline CPUs. | cheloha
Because of hw.smt we need a way to determine whether a given CPU is "online" or "offline" from userspace. KERN_CPTIME2 is an array, and so cannot be cleanly extended for this purpose, so add a new sysctl(2) KERN_CPUSTATS with an extensible struct. At the moment it's just KERN_CPTIME2 with a flags member, but it can grow as needed. KERN_CPUSTATS appears to have been defined by BSDi long ago, but there are few (if any) packages in the wild still using the symbol so breakage in ports should be near zero. No other system inherited the symbol from BSDi, either.

Then, use the new sysctl(2) in systat(1) and top(1):

- systat(1) draws placeholder marks ('-') instead of percentages for offline CPUs in the cpu view.
- systat(1) omits offline CPU ticks when drawing the "big bar" in the vmstat view. The upshot is that the bar isn't half idle when half your logical CPUs are disabled.
- top(1) does not draw lines for offline CPUs; if CPUs toggle on or offline in interactive mode we redraw the display to expand/reduce space for the new/missing CPUs. This is consistent with what some top(1) implementations do on Linux.
- top(1) omits offline CPUs from the totals when CPU totals are combined into a single line (the '-1' flag).

Originally prompted by deraadt@. Discussed endlessly with deraadt@, ketennis@, and sthen@. Tested by jmc@ and jca@. Earlier versions also discussed with jca@. Earlier versions tested by jmc@, tb@, and many others.

docs ok jmc@, kernel bits ok ketennis@, everything ok sthen@, "Is your stuff in yet?" deraadt@
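A minimal userland sketch of how a program might use the new sysctl to test whether a CPU is online. The struct cpustats layout and the CPUSTATS_ONLINE flag name are assumptions about what the extensible struct exposes; the commit only states that it is KERN_CPTIME2 plus a flags member.

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <string.h>

    /* Returns 1 if the CPU is online, 0 if offline, -1 on error. */
    int
    cpu_is_online(int cpu)
    {
        int mib[3] = { CTL_KERN, KERN_CPUSTATS, cpu };
        struct cpustats cs;
        size_t len = sizeof(cs);

        memset(&cs, 0, sizeof(cs));
        if (sysctl(mib, 3, &cs, &len, NULL, 0) == -1)
            return -1;
        return (cs.cs_flags & CPUSTATS_ONLINE) != 0;
    }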
2018-10-05 | Revert KERN_CPTIME2 ENODEV changes in kernel and userspace. | cheloha
ok kettenis deraadt
2018-09-26 | KERN_CPTIME2: set ENODEV if the CPU is offline. | cheloha
This lets userspace distinguish between idle CPUs and those that are not schedulable because hw.smt=0. A subsequent commit probably needs to add documentation for this to sysctl.2 (and perhaps elsewhere) after the dust settles.

Also included here are changes to systat(1) and top(1) that account for the ENODEV case and adjust behavior accordingly:

- systat(1)'s cpu view prints placeholder marks ('-') instead of percentages for each state if the given CPU is offline.
- systat(1)'s vmstat view checks for offline CPUs when computing the machine state total and excludes them, so the CPU usage graph only represents the states for online CPUs.
- top(1) does not draw CPU rows for offline CPUs when the view is redrawn. If CPUs "go offline", percentages for each state are replaced by placeholder marks ('-'); the view will need to be redrawn to remove these rows. If CPUs "go online" the view will need to be redrawn to show these new CPUs. In "combined CPU" mode, the count and the state totals only represent online CPUs.

Ports using KERN_CPTIME2 will need to be updated. The changes described above to make systat(1) and top(1) aware of the ENODEV case *and* gracefully handle a changing HW_NCPUONLINE while the application is running are not necessarily appropriate for each and every port. The changes described above are so extensive in part to demonstrate one way a program *might* be made robust to changing CPU availability. In particular, changing hw.smt after boot is an extremely rare event, and this needs to be weighed when updating ports.

The logic needed to account for the KERN_CPTIME2 ENODEV case is very roughly:

    if (sysctl(...) == -1) {
        if (errno != ENODEV) {
            /* Actual error occurred. */
        } else {
            /* CPU is offline. */
        }
    } else {
        /* CPU is online and CPU states were set by sysctl(2). */
    }

Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at length with kettenis@. Additional testing by tb@. No complaints from hackers@ after a week.

ok kettenis@, "I think you should commit [now]" deraadt@
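Fleshing out the snippet above into a complete call, assuming the usual three-element mib and a uint64_t cp_time buffer for KERN_CPTIME2 (neither is spelled out in the commit). Note that this ENODEV behaviour was later reverted by the 2018-10-05 entry above in favour of KERN_CPUSTATS, so this only illustrates the approach described here.

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <sys/sched.h>
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    print_cpu_states(int cpu)
    {
        int mib[3] = { CTL_KERN, KERN_CPTIME2, cpu };
        uint64_t cp_time[CPUSTATES];
        size_t len = sizeof(cp_time);

        if (sysctl(mib, 3, cp_time, &len, NULL, 0) == -1) {
            if (errno == ENODEV) {
                printf("cpu%d: offline (hw.smt=0)\n", cpu);
                return 0;
            }
            return -1;      /* actual error occurred */
        }
        printf("cpu%d: user %llu idle %llu\n", cpu,
            (unsigned long long)cp_time[CP_USER],
            (unsigned long long)cp_time[CP_IDLE]);
        return 0;
    }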
2018-07-12 | Add hw.ncpuonline to count the number of online CPUs. | cheloha
The introduction of hw.smt means that logical CPUs can be disabled after boot and prior to suspend/resume. If hw.smt=0 (the default), there needs to be a way to count the number of hardware threads available on the system at any given time. So, import HW_NCPUONLINE/hw.ncpuonline from NetBSD and document it. hw.ncpu becomes equal to the number of CPUs given to sched_init_cpu() during boot, while hw.ncpuonline is equal to the number of CPUs available to the scheduler in the cpuset "sched_all_cpus". Set _SC_NPROCESSORS_ONLN equal to this new sysctl and keep _SC_NPROCESSORS_CONF equal to hw.ncpu.

This is preferable to adding a new sysctl to count the number of configured CPUs and keeping hw.ncpu equal to the number of online CPUs because such a change would break software in the ecosystem that relies on HW_NCPU/hw.ncpu to measure CPU usage and the like. Such software in base includes top(1), systat(1), and snmpd(8), and perhaps others.

We don't need additional locking to count the cardinality of a cpuset in this case because the only interfaces that can modify said cardinality are sysctl(2) and ioctl(2), both of which are under the KERNEL_LOCK.

Software using HW_NCPU/hw.ncpu to determine optimal parallelism will need to be updated to use HW_NCPUONLINE/hw.ncpuonline. Until then, such software may perform suboptimally. However, most changes will be similar to the change included here for libcxx's std::thread::hardware_concurrency(): using HW_NCPUONLINE in lieu of HW_NCPU should be sufficient for determining optimal parallelism for most software if the change to _SC_NPROCESSORS_ONLN is insufficient.

Prompted by deraadt. Discussed at length with kettenis, deraadt, and sthen. Lots of patch tweaks from kettenis.

ok kettenis, "proceed" deraadt
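A short sketch of the kind of update the last paragraph describes: size a worker pool from the online CPU count rather than hw.ncpu. The fallback to HW_NCPU for older kernels is an extra assumption, not part of the commit.

    #include <sys/types.h>
    #include <sys/sysctl.h>

    int
    nworkers(void)
    {
        int mib[2] = { CTL_HW, HW_NCPUONLINE };
        int n;
        size_t len = sizeof(n);

        if (sysctl(mib, 2, &n, &len, NULL, 0) == -1) {
            mib[1] = HW_NCPU;   /* older kernel: fall back */
            if (sysctl(mib, 2, &n, &len, NULL, 0) == -1)
                return 1;
        }
        return n > 0 ? n : 1;
    }

After this change, sysconf(_SC_NPROCESSORS_ONLN) reports the same number, which is usually the simpler interface for portable code.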
2018-07-07 | Release the kernel lock fully on thread exit. This prevents a locking | Visa Hankala
error that would happen otherwise when a traced and stopped multithreaded process is forced to exit. The error shows up as a kernel panic when WITNESS is enabled. Without WITNESS, the error causes a system hang.

sched_exit() expected that a single KERNEL_UNLOCK() would release the lock completely. That assumption is wrong when an exit happens through the signal tracing logic:

    sched_exit
    exit1
    single_thread_check
    single_thread_set
    issignal        <-- KERNEL_LOCK()
    userret         <-- KERNEL_LOCK()
    syscall

The error is a regression of r1.216 of kern_sig.c.

Panic reported and fix tested by Laurence Tratt. OK mpi@
2018-06-30 | Don't steal processes from other CPUs if we're not scheduling processes on | Mark Kettenis
a CPU. ok deraadt@
2018-06-19 | SMT (Simultaneous Multi Threading) implementations typically share | Mark Kettenis
TLBs and L1 caches between threads. This can make cache timing attacks a lot easier and we strongly suspect that this will make several spectre-class bugs exploitable. Especially on Intel's SMT implementation, which is better known as Hyper-threading.

We really should not run different security domains on different processor threads of the same core. Unfortunately changing our scheduler to take this into account is far from trivial. Since many modern machines no longer provide the ability to disable Hyper-threading in the BIOS setup, provide a way to disable the use of additional processor threads in our scheduler. And since we suspect there are serious risks, we disable them by default. This can be controlled through a new hw.smt sysctl. For now this only works on Intel CPUs when running OpenBSD/amd64. But we're planning to extend this feature to CPUs from other vendors and other hardware architectures.

Note that SMT doesn't necessarily have a positive effect on performance; it highly depends on the workload. In all likelihood it will actually slow down most workloads if you have a CPU with more than two cores.

ok deraadt@
2017-12-14 | make sched_barrier use cond_wait/cond_signal. | David Gwynne
previously the code was using a percpu flag to manage the sleeps/wakeups, which means multiple threads waiting for a barrier on a cpu could race. moving to a cond struct on the stack fixes this. while here, get rid of the sbar taskq and just use systqmp instead. the barrier tasks are short, so there's no real downside. ok mpi@
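A hedged sketch of the pattern this commit moves to: a cond struct on the caller's stack, signalled by a task run off systqmp, so concurrent callers each wait on their own condition instead of racing on a per-CPU flag. The cond_init()/cond_wait()/cond_signal() and task_set()/task_add() names match cond(9) and task(9); the part of the real sched_barrier() that makes the task run on the target CPU is omitted here.

    #include <sys/systm.h>
    #include <sys/task.h>

    struct barrier_ctx {
        struct cond     bc_cond;
    };

    static void
    barrier_task(void *arg)
    {
        struct barrier_ctx *bc = arg;

        /* The real barrier also ensures this runs on the target CPU. */
        cond_signal(&bc->bc_cond);
    }

    void
    barrier_wait_sketch(void)
    {
        struct barrier_ctx bc;
        struct task t;

        cond_init(&bc.bc_cond);
        task_set(&t, barrier_task, &bc);
        task_add(systqmp, &t);
        cond_wait(&bc.bc_cond, "sbar");
    }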
2017-11-28 | Raise the IPL of the sbar taskq to avoid lock order issues | Visa Hankala
with the kernel lock. Fixes a deadlock seen by Hrvoje Popovski and dhill@. OK mpi@, dhill@
2017-02-12 | Split up fork1(): | Philip Guenther
- FORK_THREAD handling is a totally separate function, thread_fork(), that is only used by sys___tfork() and which loses the flags, func, arg, and newprocp parameters and gains a tcb parameter to guarantee the new thread's TCB is set before the creating thread returns
- fork1() loses its stack and tidptr parameters

Common bits factor out:
- struct proc allocation and initialization moves to thread_new()
- maxthread handling moves to fork_check_maxthread()
- setting the new thread running moves to fork_thread_start()

The MD cpu_fork() function swaps its unused stacksize parameter for a tcb parameter.

luna88k testing by aoyama@, alpha testing by dlg@ ok mpi@
2017-01-21 | p_comm is the process's command and isn't per thread, so move it from | Philip Guenther
struct proc to struct process. ok deraadt@ kettenis@
2016-06-03 | Allow pegged processes on secondary CPUs to continue to be scheduled when | Mark Kettenis
halting a CPU. Necessary for intr_barrier(9) to work when interrupts are targeted at secondary CPUs. ok mpi@, mikeb@ (a while back)
2016-03-17 | Replace curcpu_is_idle() by cpu_is_idle() and use it instead of rolling | Martin Pieuchot
our own. From Michal Mazurek, ok mmcc@
2015-12-23 | One "sbar" taskq is enough. | Mark Kettenis
ok visa@
2015-12-17 | Make the cost of moving a process to the primary cpu a bit higher. This is | Mark Kettenis
the CPU that handles most hardware interrupts but we don't account for that in any way in the scheduler. So processes (and kernel threads) that are unlucky enough to end up on this CPU will get fewer CPU cycles than those running on other CPUs. This is especially true for the softnet taskq: on that CPU, network interrupts will prevent the softnet taskq from running. This means that the more packets we receive, the fewer packets we can actually process and/or forward. This is why "unlocking" network drivers actually decreases the forwarding performance. This diff restores most of the lost performance by making it less likely that the softnet taskq ends up on the same CPU that handles network interrupts.

Tested by Hrvoje Popovski

ok mpi@, deraadt@
2015-10-16 | Make sched_barrier() use its own task queue to avoid deadlocks. | Martin Pieuchot
Prevent a deadlock from occurring when intr_barrier() is called from a non-primary CPU in the watchdog task, also enqueued on ``systq''. ok kettenis@
2015-09-20 | Short circuit if we're running on the CPU that we want to sync with. Fixes | Mark Kettenis
suspend on machines with em(4) now that it uses intr_barrier(9). ok krw@
2015-09-13 | Introduce sched_barrier(9), an interface that acts as a scheduler barrier in | Mark Kettenis
the sense that it guarantees that the specified CPU went through the scheduler. This also guarantees that interrupt handlers running on that CPU will have finished when sched_barrier() returns. ok miod@, guenther@
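A hedged sketch of the kind of use this guarantee enables, in the spirit of intr_barrier(9): after unhooking a per-CPU resource, wait until the CPU has passed through the scheduler before freeing it. The unhook_resource()/free_resource() helpers and struct my_resource are hypothetical; only sched_barrier(ci) comes from this commit.

    #include <sys/sched.h>

    struct my_resource;                             /* hypothetical */
    void unhook_resource(struct my_resource *);     /* hypothetical */
    void free_resource(struct my_resource *);       /* hypothetical */

    void
    detach_resource(struct cpu_info *ci, struct my_resource *res)
    {
        unhook_resource(res);   /* new users can no longer find it */

        /*
         * Any interrupt handler on ci that could still reference the
         * resource has finished once this returns.
         */
        sched_barrier(ci);

        free_resource(res);
    }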
2015-03-14 | Remove some includes include-what-you-use claims don't | Jonathan Gray
have any direct symbols used. Tested for indirect use by compiling amd64/i386/sparc64 kernels. ok tedu@ deraadt@
2014-09-24 | Keep under #ifdef MULTIPROCESSOR the code that deals with SPCF_SHOULDHALT | Martin Pieuchot
and SPCF_HALTED, these flags only make sense on secondary CPUs which are unlikely to be present on a SP kernel. ok kettenis@
2014-07-26 | If we're stopping a secondary cpu, don't let sched_choosecpu() short-circuit | Mark Kettenis
and return the current CPU, otherwise sched_stop_secondary_cpus() will spin forever trying to empty its run queues. Fixes hangs during suspend that many people reported over the last couple of days. ok bcook@, guenther@
2014-07-13 | Fix sched_stop_secondary_cpus() to properly drain CPUs | Matthew Dempsky
TAILQ_FOREACH() isn't safe to use in sched_chooseproc() to iterate over the run queues because within the loop body we remove the threads from their run queues and reinsert them elsewhere. As a result, we end up only draining the first thread of each run queue rather than all of them. ok kettenis
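A generic illustration of the bug class being fixed here, using sys/queue.h directly rather than the scheduler's real run-queue types: removing elements inside TAILQ_FOREACH() breaks the iteration, while TAILQ_FOREACH_SAFE() fetches the next element before the removal so every element is visited. The pentry type and requeue_elsewhere() helper are made up for the example.

    #include <sys/queue.h>

    struct pentry {
        TAILQ_ENTRY(pentry)     p_link;
    };
    TAILQ_HEAD(plist, pentry);

    void requeue_elsewhere(struct pentry *);        /* hypothetical */

    void
    drain(struct plist *q)
    {
        struct pentry *p, *tmp;

    #if 0
        /*
         * Broken: once p is removed (and reinserted elsewhere), its links
         * no longer lead through q, so only the first element of the
         * queue actually gets drained.
         */
        TAILQ_FOREACH(p, q, p_link) {
            TAILQ_REMOVE(q, p, p_link);
            requeue_elsewhere(p);
        }
    #endif

        /* Correct: tmp is saved before p is removed. */
        TAILQ_FOREACH_SAFE(p, q, p_link, tmp) {
            TAILQ_REMOVE(q, p, p_link);
            requeue_elsewhere(p);
        }
    }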
2014-05-04 | Add PS_SYSTEM, the process-level mirror of the thread-level P_SYSTEM, | Philip Guenther
and FORK_SYSTEM as a flag to set them. This eliminates needing to peek into other processes threads in various places. Inspired by NetBSD ok miod@ matthew@
2014-02-12 | Eliminate the exit sig handling, which was only invokable via the | Philip Guenther
Linux-compat clone() syscall when *not* using CLONE_THREAD. pirofti@ confirms Opera runs in compat without this, so out it goes; one less hair to choke on in kern_exit.c ok tedu@ pirofti@
2013-06-06 | Prevent idle thread from being stolen on startup. | Christiano F. Haesbaert
There is a race condition which might trigger a case where two cpus try to run the same idle thread. The problem arises when one cpu steals the idle proc of another cpu and this other cpu ends up running the idle thread via spc->spc_idleproc, resulting in two cpus trying to cpu_switchto(idleX).

On startup, idle procs are scattered around different runqueues, and the decision for scheduling is:
1. look at my runqueue.
2. if empty, look at other dudes' runqueues.
3. if empty, select idle proc via spc->spc_idleproc.

The problem is that cpu0's idle0 might be running on cpu1 due to step 1 or 2 and cpu0 hits step 3. So cpu0 will select idle0, while cpu1 is in fact running it already. The solution is to never place idle on a runqueue, therefore being only selectable through spc->spc_idleproc.

This race can be more easily triggered on a HT cpu on virtualized environments, where the guest more often than not doesn't have the cpu for itself, so timing gets shuffled.

ok tedu@ guenther@ go ahead after t2k13 deraadt@
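The selection order after the fix, as a hedged sketch: the idle proc is never placed on a run queue, so step 3 is the only way it can be chosen, and only by its own CPU. dequeue_own_runqueue() and steal_from_other_cpu() are hypothetical stand-ins for the real run-queue code; spc_idleproc is the field named in the commit.

    #include <sys/sched.h>

    struct proc *dequeue_own_runqueue(struct schedstate_percpu *);  /* hypothetical */
    struct proc *steal_from_other_cpu(void);                        /* hypothetical */

    struct proc *
    choose_next_sketch(struct schedstate_percpu *spc)
    {
        struct proc *p;

        if ((p = dequeue_own_runqueue(spc)) != NULL)    /* step 1 */
            return p;
        if ((p = steal_from_other_cpu()) != NULL)       /* step 2 */
            return p;
        return spc->spc_idleproc;       /* step 3: never on a runqueue */
    }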
2013-06-03 | Convert some internal APIs to use timespecs instead of timevals | Philip Guenther
ok matthew@ deraadt@
2013-04-19 | sprinkle ifdef MP to disable cpu migration code when not needed. | Ted Unangst
ok deraadt
2012-07-10 | Make sure that we don't schedule processes on CPUs that we have taken out of | Mark Kettenis
the scheduler. ok hasbaert@. deraadt@
2012-03-23 | Make rusage totals, itimers, and profile settings per-process instead | Philip Guenther
of per-rthread. Handling of per-thread tick and runtime counters inspired by how FreeBSD does it. ok kettenis@
2012-03-10 | Account for sched_noidle and document the scheduler variables. | Christiano F. Haesbaert
ok tedu@
2011-10-12 | Remove all MD diagnostics in cpu_switchto(), and move them to MI code if | Miod Vallat
they apply. ok oga@ deraadt@
2011-07-06 | Clean up after P_BIGLOCK removal. | Artur Grabowski
KERNEL_PROC_LOCK -> KERNEL_LOCK KERNEL_PROC_UNLOCK -> KERNEL_UNLOCK oga@ ok
2010-05-28 | Delete a fallback definition for CPU_INFO_UNIT that's both unnecessary | Philip Guenther
and incorrect. Kills an XXX comment. ok syuu, thib, art, kettenis, millert, deraadt
2010-05-25 | Actively remove processes from the runqueues of a CPU when we stop it. | Mark Kettenis
Also make sure not to take the scheduler lock once we have stopped a CPU such that we can safely take it away without having to worry about deadlock because it happened to own the scheduler lock. Fixes issues with suspend on SMP machines. ok mlarkin@, marco@, art@, deraadt@
2010-05-14 | Make sure we initialize sched_lock before we try to use it. | Mark Kettenis
ok miod@, thib@, oga@, jsing@
2010-04-23 | Merge the only relevant (for now) parts of simplelock.h into lock.h | Theo de Raadt
since it is time to start transitioning away from the no-op behaviour. ok oga kettenis
2010-04-06 | Implement functions to take away the secondary CPUs from the scheduler and | Mark Kettenis
give them back again, effectively stopping and starting these CPUs. Use the stop function in sys_reboot(). ok marco@, deraadt@
2010-01-09 | Add code to stop scheduling processes on CPUs, effectively halting that CPU. | Mark Kettenis
Use this to do a shutdown with only the boot processor running. This should avoid nasty races during shutdown. help from art@, ok deraadt@, miod@
2009-11-29 | Backout previous commit. There is a possible race which makes it possible | Mark Kettenis
for sys_reboot() to hang forever.
2009-11-25 | Add a mechanism to stop the scheduler from scheduling processes on a | Mark Kettenis
particular CPU such that it just sits and spins in the idle loop, effectively halting that CPU. ok deraadt@, miod@
2009-10-05 | Don't drop the big lock at the end of exit1(), but move it into the middle of | Theo de Raadt
sched_exit(). This means that cpu_exit() and whatever it does (for instance calling free()), as well as the deadproc p_hash handling, are now locked as well. This may have been one of the causes of the reaper panics, especially with rthread patches... which were terminating a lot of threads very quickly onto the deadproc p_hash list. ok kurt kettenis miod
2009-04-22 | When starting up idle, explicitly set p_cpu and the peg flag for the | Artur Grabowski
idle proc. p_cpu might be necessary in the future and pegging is just to be extra safe (although we'll be horribly broken if the idle proc ever ends up where that flag is checked).
2009-04-20 | Make pegging a proc work when there are idle cpus that are looking for | Artur Grabowski
something to do. Walk the highest priority queue looking for a proc to steal and skip those that are pegged. We could consider walking the other queues in the future too, but this should do for now. kettenis@ guenther@ ok
2009-04-14 | Some tweaks to the cpu affinity code. | Artur Grabowski
- Split up choosing of cpu between fork and "normal" cases. Fork is very different and should be treated as such.
- Instead of implicitly choosing a cpu in setrunqueue, do it outside where it actually makes sense.
- Just because a cpu is marked as idle doesn't mean it will be soon. There could be a thundering herd effect if we call wakeup from an interrupt handler, so subtract cpus with queued processes when deciding which cpu is actually idle.
- some simplifications allowed by the above.

kettenis@ ok (except one bugfix that was not in the initial diff)
2009-04-03 | sched_peg_curproc_to_cpu() - function to force a proc to stay on a cpu | Artur Grabowski
forever.
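A hedged sketch of what such a function is for: a kernel thread that must only ever run on one CPU pegs itself once at startup. The function name comes from this log entry; its struct cpu_info * argument is assumed, and the worker loop and do_percpu_work() are hypothetical.

    #include <sys/sched.h>

    void do_percpu_work(struct cpu_info *);         /* hypothetical */

    void
    percpu_worker(void *arg)
    {
        struct cpu_info *ci = arg;

        sched_peg_curproc_to_cpu(ci);   /* stay on ci forever */

        for (;;)
            do_percpu_work(ci);
    }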