author | Scott Soule Cheloha <cheloha@cvs.openbsd.org> | 2022-11-05 19:29:47 +0000
committer | Scott Soule Cheloha <cheloha@cvs.openbsd.org> | 2022-11-05 19:29:47 +0000
commit | 02dc962cf694b58ab04d3ec0483b539051ebe369
tree | bca3746ada93fd48e98b668683df00c57f875a63 /share/man/man9
parent | ad7e72bf7a6d51934c476167fdfaf12af2d75e30
clockintr(9): initial commit

clockintr(9) is a machine-independent clock interrupt scheduler. It
emulates most of what the machine-dependent clock interrupt code is
doing on every platform. Every CPU has a work schedule based on the
system uptime clock. For now, every CPU has a hardclock(9) and a
statclock(). If schedhz is set, every CPU has a schedclock(), too.

This commit only contains the MI pieces. All code is conditionally
compiled with __HAVE_CLOCKINTR. This commit changes no behavior yet.

At a high level, clockintr(9) is configured and used as follows:

1. During boot, the primary CPU calls clockintr_init(9). Global state
   is initialized.
2. Primary CPU calls clockintr_cpu_init(9). Local, per-CPU state is
   initialized. An "intrclock" struct may be installed, too.
3. Secondary CPUs call clockintr_cpu_init(9) to initialize their
   local state.
4. All CPUs repeatedly call clockintr_dispatch(9) from the MD clock
   interrupt handler. The CPUs complete work and rearm their local
   interrupt clock, if any, during the dispatch.
5. Repeat step (4) until the system shuts down, suspends, or hibernates.
6. During resume, the primary CPU calls inittodr(9) and advances the
   system uptime.
7. Go to step (2). This time around, clockintr_cpu_init(9) also
   advances the work schedule on the calling CPU to skip events that
   expired during suspend. This prevents a "thundering herd" of
   useless work during the first clock interrupt.

In the long term, we need an MI clock interrupt scheduler in order to
(1) provide control over the clock interrupt to MI subsystems like
timeout(9) and dt(4) to improve their accuracy, (2) provide drivers
like acpicpu(4) a means for slowing or stopping the clock interrupt on
idle CPUs to conserve power, and (3) reduce the amount of duplicated
code in the MD clock interrupt code.

Before we can do any of that, though, we need to switch every platform
over to using clockintr(9) and do some cleanup.

Prompted by "the vmm(4) time bug," among other problems, and a
discussion at a2k19 on the subject. Lots of design input from
kettenis@. Early versions reviewed by kettenis@ and mlarkin@.
Platform-specific help and testing from kettenis@, gkoehler@,
mlarkin@, miod@, aoyama@, visa@, and dv@. Babysitting and spiritual
guidance from mlarkin@ and kettenis@.

Link: https://marc.info/?l=openbsd-tech&m=166697497302283&w=2

ok kettenis@ mlarkin@

Diffstat (limited to 'share/man/man9')
-rw-r--r-- | share/man/man9/Makefile | 3
-rw-r--r-- | share/man/man9/clockintr.9 | 333
2 files changed, 335 insertions, 1 deletions
diff --git a/share/man/man9/Makefile b/share/man/man9/Makefile
index 4a749fe6719..f67508224a2 100644
--- a/share/man/man9/Makefile
+++ b/share/man/man9/Makefile
@@ -1,4 +1,4 @@
-# $OpenBSD: Makefile,v 1.307 2022/03/10 15:19:01 bluhm Exp $
+# $OpenBSD: Makefile,v 1.308 2022/11/05 19:29:45 cheloha Exp $
 # $NetBSD: Makefile,v 1.4 1996/01/09 03:23:01 thorpej Exp $
 
 # Makefile for section 9 (kernel function and variable) manual pages.
@@ -9,6 +9,7 @@ MAN= aml_evalnode.9 atomic_add_int.9 atomic_cas_uint.9 \
  audio.9 autoconf.9 \
  bemtoh32.9 bio_register.9 bintimeadd.9 boot.9 bpf_mtap.9 buffercache.9 \
  bufq_init.9 bus_dma.9 bus_space.9 \
+ clockintr.9 \
  copy.9 cond_init.9 config_attach.9 config_defer.9 counters_alloc.9 \
  cpumem_get.9 crypto.9 \
  delay.9 disk.9 disklabel.9 dma_alloc.9 dohooks.9 \
diff --git a/share/man/man9/clockintr.9 b/share/man/man9/clockintr.9
new file mode 100644
index 00000000000..e152a1f6093
--- /dev/null
+++ b/share/man/man9/clockintr.9
@@ -0,0 +1,333 @@
+.\" $OpenBSD: clockintr.9,v 1.1 2022/11/05 19:29:45 cheloha Exp $
+.\"
+.\" Copyright (c) 2020-2022 Scott Cheloha <cheloha@openbsd.org>
+.\"
+.\" Permission to use, copy, modify, and distribute this software for any
+.\" purpose with or without fee is hereby granted, provided that the above
+.\" copyright notice and this permission notice appear in all copies.
+.\"
+.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+.\"
+.Dd $Mdocdate: November 5 2022 $
+.Dt CLOCKINTR 9
+.Os
+.Sh NAME
+.Nm clockintr_cpu_init ,
+.Nm clockintr_dispatch ,
+.Nm clockintr_init ,
+.Nm clockintr_setstatclockrate ,
+.Nm clockintr_trigger
+.Nd clock interrupt scheduler
+.Sh SYNOPSIS
+.In sys/clockintr.h
+.Ft void
+.Fo clockintr_init
+.Fa "u_int flags"
+.Fc
+.Ft void
+.Fo clockintr_cpu_init
+.Fa "struct intrclock *ic"
+.Fc
+.Ft int
+.Fo clockintr_dispatch
+.Fa "void *frame"
+.Fc
+.Ft void
+.Fo clockintr_setstatclockrate
+.Fa "int freq"
+.Fc
+.Ft void
+.Fo clockintr_trigger
+.Fa "void"
+.Fc
+.In sys/kernel.h
+.Vt extern int hz;
+.Vt extern int stathz;
+.Vt extern int profhz;
+.In sys/sched.h
+.Vt extern int schedhz;
+.Sh DESCRIPTION
+The
+.Nm
+subsystem maintains a schedule of events,
+dispatches expired events,
+and rearms the local interrupt clock for each CPU in the system.
+.Pp
+The
+.Fn clockintr_init
+function initializes the subsystem as follows:
+.Bl -dash
+.It
+.Xr hardclock 9
+is configured to run
+.Xr hz 9
+times per second on each CPU.
+It is an error if
+.Vt hz
+is less than one or greater than one billion.
+.It
+.Fn statclock
+is configured to run
+.Vt stathz
+times per second on each CPU.
+It is an error if
+.Vt stathz
+is less than one or greater than one billion.
+.It
+When appropriate,
+.Fn statclock
+will be reconfigured to run
+.Vt profhz
+times per second on each CPU.
+.Vt profhz
+must be a non-zero integer multiple of
+.Vt stathz .
+It is an error if
+.Vt profhz
+is less than
+.Vt stathz
+or greater than one billion.
+.It
+If
+.Vt schedhz
+is non-zero,
+.Fn schedclock
+is configured to run
+.Vt schedhz
+times per second on each CPU.
+It is an error if
+.Vt schedhz
+is less than zero or greater than one billion.
+.El
+.Pp
+The event schedule has a resolution of one nanosecond and event periods are
+computed using integer divison.
+If
+.Vt hz ,
+.Vt stathz ,
+.Vt profhz ,
+or
+.Vt schedhz
+do not divide evenly into one billion,
+the corresponding event will not be dispatched at the specified frequency.
+.Pp
+The
+.Fn clockintr_init
+function accepts the bitwise OR of zero or more of the following
+.Fa flags :
+.Bl -tag -width CL_RNDSTAT
+.It Dv CL_RNDSTAT
+Randomize the
+.Fn statclock .
+Instead of using a fixed period,
+the subsystem will select pseudorandom intervals in a range such that
+the average
+.Fn statclock
+period is equal to the inverse of
+.Vt stathz .
+.El
+.Pp
+The
+.Fn clockintr_init
+function must be called exactly once and only by the primary CPU.
+It should be called after all timecounters are installed with
+.Xr tc_init 9 .
+.Pp
+The
+.Fn clockintr_cpu_init
+function prepares the calling CPU for
+.Fn clockintr_dispatch .
+The first time it is called on a given CPU,
+if
+.Fa ic
+is not
+.Dv NULL ,
+the caller is configured to use the given
+.Fa intrclock
+during
+.Fn clockintr_dispatch ;
+otherwise the caller is responsible for rearming its own interrupt
+clock after each
+.Fn clockintr_dispatch .
+Subsequent calls ignore
+.Fa ic :
+instead,
+the caller's event schedule is advanced past any expired events
+without dispatching those events.
+It is an error to call this function before the subsystem is initialized with
+.Fn clockintr_init .
+All CPUs should call
+.Fn clockintr_cpu_init
+during each system resume after the system time is updated with
+.Xr inittodr 9 ,
+otherwise they will needlessly dispatch every event that expired while
+the system was suspended.
+.Pp
+The
+.Fn clockintr_dispatch
+function executes all expired events on the caller's event schedule and,
+if configured,
+rearms the caller's interrupt clock to fire when the next event is scheduled
+to expire.
+The
+.Fa frame
+argument must point to the caller's
+.Dv clockframe
+struct.
+The
+.Fn clockintr_dispatch
+function should only be called from a clock interrupt handler at
+.Dv IPL_CLOCK
+.Pq see Xr spl 9 .
+It is an error to call this function on a given CPU before
+.Fn clockintr_cpu_init .
+.Pp
+The
+.Fn clockintr_setstatclockrate
+function changes the effective dispatch frequency for
+.Fn statclock
+to
+.Fa freq .
+It should be called from the machine-dependent
+.Fn setstatclockrate
+function after performing any needed hardware reconfiguration.
+It is an error if
+.Fa freq
+is not equal to
+.Vt stathz
+or
+.Vt profhz .
+It is an error to call this function before the subsystem is initialized with
+.Fn clockintr_init .
+.Pp
+The
+.Fn clockintr_trigger
+function causes the
+.Fn clockintr_dispatch
+function to run in the appropriate context as soon as possible if
+the caller was configured with an
+.Fa intrclock
+when
+.Fn clockintr_cpu_init
+was first called.
+If the caller was not configured with an
+.Fa intrclock ,
+the function does nothing.
+It is an error to call this function on a given CPU before
+.Fn clockintr_cpu_init .
+.Pp
+The
+.Fa ic
+argument to
+.Fn clockintr_cpu_init
+points to an
+.Fa intrclock
+structure:
+.Bd -literal -offset indent
+struct intrclock {
+	void *ic_cookie;
+	void (*ic_rearm)(void *cookie, uint64_t nsecs);
+	void (*ic_trigger)(void *cookie);
+};
+.Ed
+.Pp
+The
+.Fa intrclock
+structure provides the
+.Nm
+subsystem with a uniform interface for manipulating an interrupt clock.
+It has the following members:
+.Bl -tag -width XXXXXXXXXX
+.It Fa ic_cookie
+May point to any resources needed during
+.Fa ic_rearm
+or
+.Fa ic_trigger
+to arm the underlying interrupt clock
+.Pq see below .
+.It Fa ic_rearm
+Should cause
+.Fn clockintr_dispatch
+to run on the calling CPU in the appropriate context after at least
+.Fa nsecs
+nanoseconds have elapsed.
+The first argument,
+.Fa cookie ,
+is the
+.Fa ic_cookie
+member of the parent structure.
+The second argument,
+.Fa nsecs ,
+is a non-zero count of nanoseconds.
+.It Fa ic_trigger
+Should cause
+.Fn clockintr_dispatch
+to run on the calling CPU in the appropriate context as soon as possible.
+The first argument,
+.Fa cookie ,
+is the
+.Fa ic_cookie
+member of the parent structure.
+.El
+.Sh CONTEXT
+The
+.Fn clockintr_init ,
+.Fn clockintr_cpu_init ,
+and
+.Fn clockintr_trigger
+functions may be called during autoconf.
+.Pp
+The
+.Fn clockintr_dispatch
+function may be called from interrupt context at
+.Dv IPL_CLOCK .
+.Pp
+The
+.Fn clockintr_setstatclockrate
+function may be called during autoconf,
+from process context,
+or from interrupt context.
+.Sh RETURN VALUES
+The
+.Fn clockintr_dispatch
+function returns non-zero if at least one event was dispatched,
+otherwise it returns zero.
+.Sh CODE REFERENCES
+.Pa sys/kern/kern_clockintr.c
+.Sh SEE ALSO
+.Xr hardclock 9 ,
+.Xr hz 9 ,
+.Xr inittodr 9 ,
+.Xr nanouptime 9 ,
+.Xr spl 9 ,
+.Xr tc_init 9 ,
+.Xr timeout 9
+.Rs
+.%A Steven McCanne
+.%A Chris Torek
+.%T A Randomized Sampling Clock for CPU Utilization Estimation and Code Profiling
+.%B In Proc. Winter 1993 USENIX Conference
+.%D 1993
+.%P pp. 387\(en394
+.%I USENIX Association
+.Re
+.Rs
+.%A Richard McDougall
+.%A Jim Mauro
+.%B Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture
+.%I Prentice Hall
+.%I Sun Microsystems Press
+.%D 2nd Edition, 2007
+.%P pp. 912\(en925
+.Re
+.Sh HISTORY
+The
+.Nm
+subsystem first appeared in
+.Ox 7.3 .