summaryrefslogtreecommitdiff
path: root/sys/kern/kern_time.c
AgeCommit message (Collapse)Author
2024-10-03Fix the clock_gettime() handler for pthread_getcpuclockid() toClaudio Jeker
use the proper way to read tu_runtime. OK mpi@
2024-07-26Trace struct itimervalPhilip Guenther
ok deraadt@ claudio@
2024-07-08Rework per proc and per process time usage accountingClaudio Jeker
For procs (threads) the accounting happens now lockless by curproc using a generation counter. Callers need to use tu_enter() and tu_leave() for this. To read the proc p_tu struct tuagg_get_proc() should be used. It ensures that the values read is consistent. For processes only the time of exited threads is accumulated in ps_tu and to get the proper process time usage tuagg_get_process() needs to be called. tuagg_get_process() will sum up all procs p_tu plus the ps_tu. This removes another SCHED_LOCK() dependency. Adjust the code in exit1() and exit2() to correctly account for the full run time. For this adjust sched_exit() to do the runtime accounting like it is done in mi_switch(). OK jca@ dlg@
2023-10-17clockintr: move callback-specific API behaviors to "clockrequest" namespaceScott Soule Cheloha
The API's behavior when invoked from a callback function is impossible to document. Move the special behavior into a distinct namespace, "clockrequest". - Add a 'struct clockrequest'. Basically a stripped-down 'struct clockintr' for exclusive use during clockintr_dispatch(). - In clockintr_queue, replace the "cq_shadow" clockintr with a "cq_request" clockrequest. They serve the same purpose. - CLST_SHADOW_PENDING -> CR_RESCHEDULE; different namespace, same meaning. - CLST_IGNORE_SHADOW -> CLST_IGNORE_REQUEST; same meaning. - Move shadow branch in clockintr_advance() to clockrequest_advance(). - clockintr_request_random() becomes clockrequest_advance_random(). - Delete dead shadow branches in clockintr_cancel(), clockintr_schedule(). - Callback functions now get a clockrequest pointer instead of a special clockintr pointer: update all prototypes, callers. No functional change intended.
2023-09-10clockintr: support an arbitrary callback function argumentScott Soule Cheloha
Callers can now provide an argument pointer to clockintr_establish(). The pointer is kept in a new struct clockintr member, cl_arg. The pointer is passed as the third parameter to clockintr.cl_func when it is executed during clockintr_dispatch(). Like the callback function, the callback argument is immutable after the clockintr is established. At present, nothing uses this. All current clockintr_establish() callers pass a NULL arg pointer. However, I am confident that dt(4)'s profile provider will need this in the near future. Requested by dlg@ back in March.
2023-08-29Remove p_rtime from struct proc and replace it by passing the timespecClaudio Jeker
as argument to the tuagg_locked function. - Remove incorrect use of p_rtime in other parts of the tree. p_rtime was almost always 0 so including it in any sum did not alter the result. - In main() the update of time can be further simplified since at that time only the primary cpu is running. - Add missing nanouptime() call in cpu_hatch() for hppa - Rename tuagg_unlocked to tuagg_locked like it is done in the rest of the tree. OK cheloha@ dlg@
2023-08-05hardclock(9): move setitimer(2) code into itimer_update()Scott Soule Cheloha
- Move the setitimer(2) code responsible for updating the ITIMER_VIRTUAL and ITIMER_PROF timers from hardclock(9) into a new clock interrupt routine, itimer_update(). itimer_update() is periodic and runs at the same frequency as the hardclock. + Revise itimerdecr() to run within itimer_mtx instead of entering and leaving it. - Each schedstate_percpu has its own itimer_update() handle, spc_itimer. A new scheduler flag, SPCF_ITIMER, indicates whether spc_itimer was started during the last mi_switch() and needs to be stopped during the next mi_switch() or sched_exit(). - A new per-process flag, PS_ITIMER, indicates whether ITIMER_VIRTUAL and/or ITIMER_PROF are running. Checking the flag is easier than entering itimer_mtx to check process.ps_timer[]. The flag is set and cleared in a new helper function, process_reset_itimer_flag(). - In setitimer(), call need_resched() when the state of ITIMER_VIRTUAL or ITIMER_PROF is changed to force an mi_switch() and update spc_itimer. claudio@ notes that ITIMER_PROF could be implemented as a high-res timer using the thread's execution time as a guide for when to interrupt the process and assert SIGPROF. This would probably work really well in single-threaded processes. ITIMER_VIRTUAL would be more difficult to make high-res, though, as you need to exclude time spent in the kernel. Tested on powerpc64 by gkoehler@. With input from claudio@. Thread: https://marc.info/?l=openbsd-tech&m=169038818517101&w=2 ok claudio@
2023-02-15Use the nowake channel for sys_nanosleep() instead of handrolling the same.Claudio Jeker
OK cheloha@ jca@ mvs@
2023-02-04kernel: stathz is always non-zero after cpu_initclocks()Scott Soule Cheloha
Now that the clockintr switch is complete, cpu_initclocks() always initializes stathz to a non-zero value. We don't call statclock() from hardclock(9) anymore and, more broadly, we don't need to test whether stathz is non-zero before using it. With input from kettenis@. Link: https://marc.info/?l=openbsd-tech&m=167434223309668&w=2 ok kettenis@ miod@
2023-01-02Add tfind_user(), for getting a proc* given a user-space TID andPhilip Guenther
the process* that it should be part of. Use that in clock_get{time,res}(), thrkill(), and ptrace(). ok jca@ miod@ mpi@ mvs@
2022-12-31timeout: rename "timeout_at_ts" to "timeout_abs_ts"Scott Soule Cheloha
I think "abs" ("absolute timeout") is a better mnemonic than "at" ("at the given time"). The interface is undocumented and there are only two callers, so renaming it is not a big deal. probably ok kn@
2022-12-05zap a pile of dangling tabsTheo de Raadt
2022-10-12Extend struct todr_chip_handle with a todr_quality member. This allows usMark Kettenis
to assign a quality to RTC implementation and pick the "best" RTC if a system has multiple RTCs (or multiple interfaces to an RTC). This allows us to prefer a battery-backed I2C RTC over an RTC that is part of the SoC which is only running of the SoC is powered. It also allows us to work around issues with firmware RTC interfaces that may lie to us or even crash the system. This change makes sure the todr_quality member of the struct is always initialized. In most cases the quality will be set to zero; further adjustments of the quality for specific subsystems/architectures will follow. ok cheloha@, patrick@
2022-08-14remove unneeded includes in sys/kernJonathan Gray
ok mpi@ miod@
2022-05-05Using mutex initializer for static variable does not compile withAlexander Bluhm
witness. Make ratecheck mutex global. Reported-by: syzbot+9864ba1338526d0e8aca@syzkaller.appspotmail.com
2022-05-04Introduce mutex for ratecheck(9) and ppsratecheck(9). A globalAlexander Bluhm
mutex with spl high for all function calls is used for now. It protects the lasttime and curpps parameter. This solution is MP safe for the usual use case, allows progress, and can be optimized later. Remove a useless #if 1 while there. OK claudio@
2021-06-18setitimer(2): increase timer limit to UINT_MAX secondscheloha
Currently setitimer(2) rejects timers larger than 100 million seconds and sets EINVAL. With the change to kclock timeouts there is no longer any reason to use this arbitrary value. Kclock timeouts support the full range of a timespec, so we can increase the upper bound without practical risk of arithmetic overflow. If we push the limit to UINT_MAX we can support the full input range of alarm(3). We can then simplify the alarm.3 manpage in a separate patch. We can push the limit even higher in the future if we find software that doesn't like the UINT_MAX limit. Until then, UINT_MAX seconds (over 68 years) is plenty for all practical timers. ok claudio@
2021-06-11setitimer(2): don't round up it_valuecheloha
We can reduce latency for the first expiration of a timer if we don't round it_value up to the minimum interval (1 tick). While we're at it, we may as well consolidate all input validation and adjustment into a single itimerfix() call. There are no other callers in the kernel (nor should there be), so remove the prototype from sys/time.h. Discussion: https://marc.info/?l=openbsd-tech&m=162084338005502&w=2 Tested by weerd@ and claudio@. probably ok claudio@
2021-05-31Redefine ADJFREQ_MIN to avoid undefined behaviour (when not using -fwrapv)Visa Hankala
Change the definition of ADJFREQ_MIN so that it does not shift a negative value. Such shifting is undefined in standard C. This came up when cross-compiling the kernel using ports clang. The shifting becomes defined when compiling with option -fwrapv. Base clang enables this option by default. OK naddy@ cheloha@
2020-12-23nanosleep(2): shorten wmesg from "nanosleep" to "nanoslp"cheloha
We only see 8 characters of wmesg in e.g. top(1), so shorten the string to fit. Indirectly prompted by kettenis@.
2020-11-10setitimer(2): ITIMER_REAL: protect state with per-process mutex ps_mtxcheloha
To unlock getitimer(2) and setitimer(2) we need to protect the per-process ITIMER_REAL state with something other than the kernel lock. As the ITIMER_REAL timeout callback realitexpire() runs at IPL_SOFTCLOCK the per-process mutex ps_mtx is appropriate. In setitimer() we need to use ps_mtx instead of the global itimer_mtx if the given timer is ITIMER_REAL. Easy. The ITIMER_REAL timeout callback routine realitexpire() is trickier. When we enter ps_mtx during the callback we need to check if the timer was cancelled or rescheduled. A thread from the process can call setitimer(2) at the exact moment the callback is about to run from timeout_run() (see kern_timeout.c). Update the locking annotation in sys/proc.h accordingly. ok anton@
2020-10-25setitimer(2): ITIMER_REAL: use kclock timeoutscheloha
Reimplement the ITIMER_REAL interval timer with a kclock timeout. Couple things of note: - We need to use the high-res nanouptime(9) call, not the low-res getnanouptime(9). - The code is simpler now that we aren't working with ticks. Misc. thoughts: - Still unsure if "kclock" is the right name for these things. - MP-safely cancelling a periodic timeout is very difficult.
2020-10-15_exit(2), execve(2): tweak per-process interval timer cancellationcheloha
If we fold the for-loop iterating over each interval timer into the helper function the result is slightly tidier than what we have now. Rename the helper function "cancel_all_itimers". Based on input from millert@ and kettenis@.
2020-10-15_exit(2), execve(2): cancel per-process interval timers safelycheloha
During _exit(2) and sometimes during execve(2) we need to cancel any active per-process interval timers. We don't currently do this in an MP-safe way. Both syscalls ignore the locking assumptions documented in proc.h. The easiest way to make them MP-safe is to use setitimer(), just like the getitimer(2) and setitimer(2) syscalls do. To make things a bit cleaner I have added a helper function, cancelitimer(), so the callers don't need to fuss with an itimerval struct. While we're here we can remove the splclock/splx dance from execve(2). It is no longer necessary. ok deraadt@
2020-10-13setitimer(2): zero itv.it_interval if itv.it_value is zerocheloha
If itv.it_value is zero we cancel the timer. When we cancel the timer we don't care about itv.it_interval because the timer is not running: we don't use it, we don't look at it, etc. To be on the paranoid side, I think we should zero itv.it_interval when itv.it_value is zero. No need to write arbitrary values into the process struct if we aren't required to. The standard is ambiguous about what should happen in this case, i.e. the value of olditv after the following code executes is unspecified: struct itimerval newitv, olditv; newitv.it_value.tv_sec = newitv.it_value.tv_usec = 0; newitv.it_interval.tv_sec = newitv.it_interval.tv_usec = 1; setitimer(ITIMER_REAL, &newitv, NULL); getitimer(ITIMER_REAL, &olditv); This change should not break any real code.
2020-10-13setitimer(2): realitexpire(): call getnanouptime(9) oncecheloha
timespecadd(3) is fast. There is no need to call getnanouptime(9) repeatedly when searching for the next expiration point. Given that it_interval is at least 1/hz, we expect to run through the loop maybe hz times at most. Even at HZ=10000 that's pretty brief. While we're here, pull *all* of the other logic out of the loop. The only thing we need to do in the loop is timespecadd(3).
2020-10-07sys_getitimer(), sys_setitimer(): style(9), misc. cleanupcheloha
- Consolidate variable declarations. - Remove superfluous parentheses from return statements. - Prefer sizeof(variable) to sizeof(type) for copyin(9)/copyout(9). - Remove some intermediate pointers from sys_setitimer(). Using SCARG() directly here makes it more obvious to the reader what you're copying.
2020-10-07getitimer(2), setitimer(2): ITIMER_REAL: call getnanouptime(9) oncecheloha
Now that the critical sections are merged we should call getnanouptime(9) once. This makes an ITIMER_REAL timer swap atomic with respect to the clock: the time remaining on the old timer is computed with the same timestamp used to schedule the new timer.
2020-10-07getitimer(2), setitimer(2): merge critical sectionscheloha
Merge the common code from sys_getitimer() and sys_setitimer() into a new kernel subroutine, setitimer(). setitimer() performs all of the error-free work for both system calls within a single critical section. We need a single critical section to make the setitimer(2) timer swap operation atomic relative to realitexpire() and hardclock(9). The downside of the new atomicity is that the behavior of setitimer(2) must change. With a single critical section we can no longer copyout(9) the old timer before installing the new timer. So If SCARG(uap, oitv) points to invalid memory, setitimer(2) now fail with EFAULT but the new timer will be left running. You can see this in action with code like the following: struct itv, olditv; itv.it_value.tv_sec = 1; itv.it_value.tv_usec = 0; itv.it_interval = itv.it_value; /* This should EFAULT. 0x1 is probably an invalid address. */ if (setitimer(ITIMER_REAL, &itv, (void *)0x1) == -1) warn("setitimer"); /* The timer will be running anyway. */ getitimer(ITIMER_REAL, &olditv); printf("time left: %lld.%06ld\n", olditv.it_value.tv_sec, olditv.it_value.tv_usec); There is no easy way to work around this. Both FreeBSD's and Linux's setitimer(2) implementations have a single critical section and they too fail with EFAULT in this case and leave the new timer running. I imagine their developers decided that fixing this error case was a waste of effort. Without permitting copyout(9) from within a mutex I'm not sure it is even possible to avoid it on OpenBSD without sacrificing atomicity during a setitimer(2) timer swap. Given the rarity of this error case I would rather have an atomic swap. Behavior change discussed with deraadt@.
2020-10-02expose timeval/timespec from system calls into ktrace, before determiningTheo de Raadt
if they are out of range, making it easier to isolate reason for EINVAL ok cheloha
2020-08-12getitimer(2): delay TIMESPEC_TO_TIMEVAL(9) conversion until copyout(9)cheloha
setitimer(2) works with timespecs in its critical section. It will be easier to merge the two critical sections if getitimer(2) also works with timespecs. In particular, we currently read the uptime clock *twice* during a setitimer(2) swap: we call getmicrouptime(9) in sys_getitimer() and then call getnanouptime(9) in sys_setitimer(). This means that swapping one timer in for another is not atomic with respect to the uptime clock. It also means the two operations are working with different time structures and resolutions, which is potentially confusing. If both critical sections work with timespecs we can combine the two getnanouptime(9) calls into a single call at the start of the combined critical section in a future patch, making the swap atomic with respect to the clock. So, in preparation, move the TIMESPEC_TO_TIMEVAL conversions in getitimer(2) after the ITIMER_REAL conversion from absolute to relative time, just before copyout(9). The ITIMER_REAL conversion must then be done with timespec macros and getnanouptime(9), just like in setitimer(2).
2020-08-12setitimer(2): ITIMER_REAL: don't call timeout_del(9) before timeout_add(9)cheloha
If we're replacing the current ITIMER_REAL timer with a new one we don't need to call timeout_del(9) before calling timeout_add(9). timeout_add(9) does the work of timeout_del(9) implicitly if the timeout in question is already pending. This saves us an extra trip through the timeout_mutex.
2020-08-11setitimer(2): write new timer value in one placecheloha
Rearrange the critical section in setitimer(2) to match that of getitimer(2). This will make it easier to merge the two critical sections in a subsequent diff. In particular, we want to write the new timer value in *one* place in the code, regardless of which timer we're setting. ok millert@
2020-08-11setitimer(2): consolidate copyin(9), input validation, input conversioncheloha
For what are probably historical reasons, setitimer(2) does not validate its input (itv) immediately after copyin(9). Instead, it waits until after (possibly) performing a getitimer(2) to copy out the state of the timer. Consolidating copyin(9), input validation, and input conversion into a single block before the getitimer(2) operation makes setitimer(2) itself easier to read. It will also simplify merging the critical sections of setitimer(2) and getitimer(2) in a subsequent patch. This changes setitimer(2)'s behavior in the EINVAL case. Currently, if your input (itv) is invalid, we return EINVAL *after* modifying the output (olditv). With the patch we will now return EINVAL *before* modifying the output. However, any code dependent upon this behavior is broken: the contents of olditv are undefined in all setitimer(2) error cases. ok millert@
2020-08-11getitimer(2): don't enter itimer_mtx to read ITIMER_REAL itimerspeccheloha
The ITIMER_REAL per-process interval timer is protected by the kernel lock. The ITIMER_REAL timeout (ps_realit_to), setitimer(2), and getitimer(2) all run under the kernel lock. Entering itimer_mtx during getitimer(2) when reading the ITIMER_REAL ps_timer state is superfluous and misleading.
2020-08-09hardclock(9): fix race with setitimer(2) for ITIMER_VIRTUAL, ITIMER_PROFcheloha
The ITIMER_VIRTUAL and ITIMER_PROF per-process interval timers are updated from hardclock(9). If a timer for the parent process is enabled the hardclock(9) thread calls itimerdecr() to update and reload it as needed. However, in itimerdecr(), after entering itimer_mtx, the thread needs to double-check that the timer in question is still enabled. While the hardclock(9) thread is entering itimer_mtx a thread in setitimer(2) can take the mutex and disable the timer. If the timer is disabled, itimerdecr() should return 1 to indicate that the timer has not expired and that no action needs to be taken. ok kettenis@
2020-08-08adjtime(2): simplify input validation for new adjustmentcheloha
The current input validation for overflow is more complex than it needs to be. We can flatten the conditional hierarchy into a string of checks just one level deep. The result is easier to read.
2020-07-15settimeofday(2): securelevel 2: prevent root from freezing the UTC clockcheloha
At securelevel 2 we prevent root from rewinding the kernel UTC clock. The rationale given in the comment is that this prevents a compromised root from setting arbitrary timestamps on files. I can't really speak to the efficacy of this mitigation, or to the efficacy of the securelevel concept in general, but the implementation of this mitigation is wrong. We need to check: timespeccmp(ts, &now, <=) instead of timespeccmp(ts, &now, <) like we do now. Time is a continuous value that is always advancing. We must prevent root from setting the kernel UTC clock to its current value in addition to prior values. Setting the UTC clock to its current value amounts to rewinding it even if we cannot actually measure the difference with a timespec. With this change, at securelevel 2, root can no longer completely freeze the UTC clock.
2020-07-09adjfreq(2): limit adjustment to [-500000, +500000] ppmcheloha
When we recompute the scaling factor during tc_windup() there is an opportunity for arithmetic overflow if the active timecounter's adjfreq(2) adjustment is too large. If we limit the adjustment to [-500000, +500000] ppm the statement in question cannot overflow. In particular, we are concerned with the following bit of code: scale = (u_int64_t)1 << 63; scale += \ ((th->th_adjustment + th->th_counter->tc_freq_adj) / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; where scale is an int64_t. Overflow when we do: scale += (...) / 1024 * 2199; as th->th_counter->tc_freq_adj is currently unbounded. th->th_adjustment is limited to [-5000ppm, 5000ppm]. To see that overflow is prevented with the new bounds, consider the new edge case where th->th_counter->tc_freq_adj is 500000ppm and th->th_adjustment is 5000ppm. Both are of type int64_t. We have: int64_t th_adjustment = (5000 * 1000) << 32; /* 21474836480000000 */ int64_t tc_freq_adj = 500000000LL << 32; /* 2147483648000000000 */ scale = (u_int64_t)1 << 63; /* 9223372036854775808 */ scale += (th_adjustment + tc_freq_adj) / 1024 * 2199; /* scale += 2168958484480000000 / 1024 * 2199; */ /* scale += 4657753620480000000; */ 9223372036854775808 + 4657753620480000000 = 13881125657334775808, which less than 18446744073709551616, so we don't have overflow. On the opposite end, if th->th_counter->tc_freq_adj is -500000ppm and th->th_adjustment is -5000ppm we would have -4657753620480000000. 9223372036854775808 - 4657753620480000000 = 4565618416374775808. Again, no overflow. 500000ppm and -500000ppm are extreme adjustments. otto@ says ntpd(8) would never arrive at them naturally, so we are not at risk of breaking a working setup by imposing these restrictions. Documentation input from kettenis@. No complaints from otto@.
2020-06-22inittodr(9): introduce dedicated flag to enable writes from resettodr(9)cheloha
We don't want resettodr(9) to write the RTC until inittodr(9) has actually run. Until inittodr(9) calls tc_setclock() the system UTC clock will contain a meaningless value and there's no sense in overwriting a good value with a value we know is nonsense. This is not an uncommon problem if you're debugging a problem in early boot, e.g. a panic that occurs prior to inittodr(9). Currently we use the following logic in resettodr(9) to inhibit writes: if (time_second == 1) return; ... this is too magical. A better way to accomplish the same thing is to introduce a dedicated flag set from inittodr(9). Hence, "inittodr_done". Suggested by visa@. ok kettenis@
2020-05-20clock_gettime(2): use nanoruntime(9) to get value for CLOCK_UPTIMEcheloha
2020-05-17Add function for attaching RTC drivers, to reduce direct useVisa Hankala
of todr_handle. OK kettenis@
2020-05-16Make inittodr() and resettodr() MI.Mark Kettenis
ok deraadt@, mpi@, visa@ ok cheloha@ as well (would have preferred in new file for this code)
2020-03-20nanosleep(2): tsleep(9) -> tsleep_nsec(9)cheloha
While here, rename the wait channel so the tsleep_nsec(9) call will fit onto a single line. It isn't a global channel so the name is arbitrary anyway. With input from visa@. ok visa@
2019-11-07adjfreq(2): fix atomic swapcheloha
I broke adjfreq(2)'s atomic swap in kern_time.c,v1.112. By using the "f" variable to store both the new and old frequency adjustments, the new adjustment gets clobbered by the old adjustment if the caller asked for a swap. ok visa@ mpi@
2019-10-26clock_getres(2): actually return the resolution of the given clockcheloha
Currently we return (1000000000 / hz) from clock_getres(2) as the resolution for every clock. This is often untrue. For CPUTIME clocks, if we have a separate statclock interrupt the resolution is (1000000000 / stathz). Otherwise it is as we currently claim: (1000000000 / hz). For the REALTIME/MONOTONIC/UPTIME/BOOTTIME clocks the resolution is that of the active timecounter. During tc_init() we can compute the precision of a timecounter by examining its tc_counter_mask and store it for lookup later in a new member, tc_precision. The resolution of a clock backed by a timecounter "tc" is then tc.tc_precision * (2^64 / tc.tc_frequency) fractional seconds. While here we can clean up sys_clock_getres() a bit. Standards input from guenther@. Lots of input, feedback from kettenis@. ok kettenis@
2019-09-04gettimeofday, settimeofday(2): limit timezone supportcheloha
For gettimeofday(2), always copy out an empty timezone struct. For settimeofday(2), still copyin(9) the struct but ignore the contents. In gettimeofday(2)'s case we have not changed the original BSD semantics: the kernel only tracks UTC time without an offset for DST, so a zeroed timezone struct is the correct thing to return to the caller. Future work could move these out into libc as stubs for clock_gettime and clock_settime(2). But, definitely a "later" thing, given that we are in beta. Update the manpage to de-emphasize the timezone parameters for these syscalls. Discussed with tedu@, deraadt@, millert@, kettenis@, yasuoka@, jca@, and guenther@. Tested by job@. Ports input from jca@ and sthen@. Manpage input from jca@. ok jca@ deraadt@
2019-08-03R.I.P. itimerround(); ok mpi@cheloha
2019-08-02per-process itimers: itimerval -> itimerspeccheloha
Loongson runs at 128hz. 128 doesn't divide evenly into a million, but it does divide evenly into a billion. So if we do the per-process itimer bookkeeping with itimerspec structs we can have error-free virtual itimers on loongson just as we do on most other platforms. This change doesn't fix the virtual itimer error alpha, as 1024 does not divide evenly into a billion. But this doesn't make the situation any worse, either. ok deraadt@
2019-07-25itimerdecr(): simplify logic with timer*(9) macros; ok millert@cheloha