path: root/sys/kern/kern_sysctl.c
Age | Commit message | Author
2021-05-04 | Reorder the integer sysctl functions. Then the traditional 4.4BSD | Alexander Bluhm
comment 'As above...' makes sense again. Improve comments for sysctl_int_bounded() and sysctl_bounded_arr(). OK gnezdo@ mvs@
2021-05-04 | As the unbounded feature in sysctl_int_bounded() is no longer used, | Alexander Bluhm
remove it. This also fixes a defective check of the dynamic boundary in sysctl_sysvshm(). OK mvs@ gnezdo@
2021-05-01 | Update the remaining SYSCTL_INT_READONLY cases | gnezdo
OK mvs@
2021-04-30 | Rearrange the implementation of bounded sysctl. The primitive | Alexander Bluhm
functions are sysctl_int() and sysctl_rdint(). This brings us back the 4.4BSD implementation. Then sysctl_int_bounded() builds the magic for range checks on top. sysctl_bounded_arr() is a wrapper around it to support multiple variables. Introduce macros that describe the meaning of the magic boundary values. Use these macros in obvious places. input and OK gnezdo@ mvs@
2021-04-23 | Remove the sysctl kern.allowdt code from kernel if dt(4) is not | Alexander Bluhm
configured. This will result in a "value is not available" error from sysctl when trying to enable dt on a kernel without support. The variable allowdt should be in the device, not in sysctl source. We don't need #ifdef for extern and prototypes. OK mpi@
2021-02-08 | Revert the conversion of per-process thread into a SMR_TAILQ. | Martin Pieuchot
We did not reach a consensus about using SMR to unlock single_thread_set() so there's no point in keeping this change.
2021-01-17 | Cache parent's pid as `ps_ppid' and use it instead of `ps_pptr->ps_pid'. | mvs
This allows us to unlock getppid(2). ok mpi@
2021-01-09 | Split hierarchical calls into kern_sysctl_dirs | gnezdo
Removed a rash of +/-1 and made both functions shorter and more focused. OK millert@
2021-01-09 | Reduce case duplication in kern_sysctl | gnezdo
This changes amd64 GENERIC.MP .text size of kern_sysctl.o from 6440 to 6400. Surprisingly, RAMDISK grows from 1645 to 1678. OK millert@, mglocker@
2020-12-28 | Analogous to the kern.audio.record sysctl parameter for audio(4) | Marcus Glocker
devices, introduce kern.video.record for video(4) devices. By default kern.video.record will be set to zero, blanking all data delivered by device drivers which attach to video(4). The idea was initially proposed by Laurence Tratt <laurie AT tratt DOT net>. ok mpi@
2020-12-07 | Convert the per-process thread list into a SMR_TAILQ. | Martin Pieuchot
Currently all iterations are done under KERNEL_LOCK() and therefore use the *_LOCKED() variant. From and ok claudio@
2020-11-16 | Convert hw_sysctl to sysctl_bounded_args | gnezdo
This one is surprisingly a minor loss if one were to simply add bytes on amd64:
.text+.data+.bss+.rodata
before  0x64b0+0x40+0x14+0x338 = 0x683c
after   0x6440+0x48+0x14+0x3b8 = 0x6854
2020-11-16 | Convert kern_sysctl to sysctl_bounded_args | gnezdo
objdump -h changes in Size of kern_sysctl.o on amd64
          before  after
.text     7140    64b0
.data     24      40
.bss      10      14
.rodata   50      338
2020-11-07 | Convert ffs_sysctl to sysctl_bounded_args | gnezdo
Requires sysctl_bounded_arr branch to support sysctl_rdint. The read-only variables are marked by an empty range of [1, 0]. OK millert@
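The read-only convention mentioned in this message can be modelled in userland. What follows is a minimal sketch, not the kernel's sysctl_bounded_arr() interface: `struct bounded_var` and `bounded_update()` are hypothetical names, and the choice of EPERM and EINVAL as error codes is an assumption made for illustration.

```c
#include <errno.h>

/* Hypothetical table entry: a variable plus the range it accepts. */
struct bounded_var {
	int	*var;
	int	 minimum;
	int	 maximum;
};

/*
 * Store newval through a table entry.  An empty range
 * (minimum > maximum, e.g. [1, 0]) marks the variable read-only.
 * Returns 0 on success or an errno value; the specific error codes
 * are assumptions of this sketch.
 */
int
bounded_update(const struct bounded_var *bv, int newval)
{
	if (bv->minimum > bv->maximum)
		return (EPERM);		/* read-only variable */
	if (newval < bv->minimum || newval > bv->maximum)
		return (EINVAL);	/* outside the allowed range */
	*bv->var = newval;
	return (0);
}
```

Encoding read-only as an empty range keeps the table entry to three fields, so no separate flag is needed.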
2020-10-19 | Serialize accesses to "struct vmspace" and document its refcounting. | Martin Pieuchot
The underlying vm_space lock is used as a substitute to the KERNEL_LOCK() in uvm_grow() to make sure `vm_ssize' is not corrupted. ok anton@, kettenis@
2020-09-01 | Remove unused sysctl_int_arr(9) | gnezdo
2020-08-23 | Remove unused debug_syncprt, improve debug sysctl handling | kn
"syncprt" is unused since kern/vfs_syscalls.c r1.147 from 2008. Adding new debug sysctls is a bit opaque and looking at kern/kern_sysctl.c the only visible difference between used and stub ctldebug structs in the debugvars[] array is their extern keyword, indicating that it is defined elsewhere. sys/sysctl.h declares all debugN members as extern upfront, but these declarations are not needed. Remove the unused debug sysctl, rename the only remaining one to something meaningful and remove forward declarations from /sys/sysctl.h; this way, adding new debug sysctls is a matter of adding extern and coming up with a name, which is nicer to read on its own and better to grep for. OK mpi
2020-08-22 | Move sysctl(2) CTL_DEBUG from DEBUG to new DEBUG_SYSCTL | kn
Adding "debug.my-knob" sysctls is really helpful to select different code paths and/or log on demand during runtime without recompile, but as this code is under DEBUG, lots of other noise comes with it which is often undesired, at least when looking at specific subsystems only. Adding globals to the kernel and breaking into DDB to change them helps, but that does not work over SSH, hence the need for debug sysctls. Introduces DEBUG_SYSCTL to make use of the "debug" MIB without the rest of DEBUG; it's DEBUG_SYSCTL and not SYSCTL_DEBUG because it's not a general option for all of sysctl(2). OK gnezdo
2020-08-18 | Style fixups from hurried commits | gnezdo
Thanks kettenis@ for pointing out. ok kettenis@
2020-08-18 | Add sysctl_bounded_arr as a replacement for sysctl_int_arr | gnezdo
Design by deraadt@ ok deraadt@
2020-08-01 | Move range check inside sysctl_int_arr | gnezdo
Range violations are now consistently reported as EOPNOTSUPP. Previously they were mixed with ENOPROTOOPT. OK kn@
2020-06-22 | there's not going to be any whole kernel wide network livelocks soon. | David Gwynne
2020-05-29 | rndvar.h not needed here | Theo de Raadt
2020-03-09 | Return EINVAL for KERN_PROC if the size parameter is 0. | Todd C. Miller
Prevents a panic due to a NULL dereference; Coverity CID 1452899. Based on a diff from mpi@, OK deraadt@ kettenis@
2020-01-24 | New `kern.allowdt' button must be set to open(2) /dev/dt. | Martin Pieuchot
dt(4) exposes kernel internals, addresses and content of states to userland. As such its interface shouldn't be available without enabling it consciously. ok millert@, deraadt@
2020-01-02 | Exclude offline cpus in KERN_CPTIME calculation. Without this too high | Claudio Jeker
idle time is reported in tools like vmstat. OK visa@ benno@ krw@
2019-12-11 | Replace p_xstat with ps_xexit and ps_xsig | Philip Guenther
Convert those to a consolidated status when needed in wait4(), kevent(), and sysctl(). Pass exit code and signal separately to exit1(). (This also serves as prep for adding waitid(2).) ok mpi@
2019-10-22 | struct proc: change ps_start from utc time to uptime | cheloha
Allows us to determine how long a process has been running, even if the UTC clock jumps. With help from bluhm@ and millert@, who squashed several bugs. ok bluhm@ millert@
2019-08-21 | sysctl(2): add kern.utc_offset: successor to the DST/TIMEZONE options(4) | cheloha
The DST and TIMEZONE options(4) are incompatible with KARL, so we need some other way to compensate for an RTC running with a known offset. Enter kern.utc_offset, an offset in minutes East of UTC. TIMEZONE has always been minutes West, but this is inconsistent with how everyone else talks about timezones, hence the flip. TIMEZONE has the advantage of being compiled into the binary. Our new sysctl(2) has no such luck, so it needs to be set as early as possible in boot, from sysctl.conf(5), so we can correct the kernel clock from the RTC's local time to UTC before daemons like ntpd(8) and cron(8) start. To encourage this, kern.utc_offset is made immutable after the securelevel(7) is raised to 1. Prompted by yasuoka@. Discussed with deraadt@, kettenis@, yasuoka@. Additional testing by yasuoka@. ok deraadt@, yasuoka@
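The sign convention described in this message is easy to get backwards, so here is a small arithmetic sketch. The function names are hypothetical and the code is not taken from the kernel; it only assumes what the message states: the offset is in minutes East of UTC, and the RTC holds local time.

```c
#include <stdint.h>

/*
 * Convert an offset expressed in minutes West of UTC (the sense used by
 * the old TIMEZONE option) into minutes East of UTC.
 */
int
west_to_east(int minutes_west)
{
	return (-minutes_west);
}

/*
 * Local time is UTC plus the eastward offset, so recovering UTC from an
 * RTC that runs on local time means subtracting the offset.
 */
int64_t
rtc_local_to_utc(int64_t local_seconds, int minutes_east)
{
	return (local_seconds - (int64_t)minutes_east * 60);
}
```

For example, an RTC one hour East of UTC that reads 3600 seconds corresponds to 0 seconds UTC.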
2019-08-05 | Allow concurrent reads of the f_offset field of struct file by | anton
serializing both read/write operations using the existing file mutex. The vnode lock still grants exclusive write access to the offset; the mutex is only used to make the actual write atomic and prevent any concurrent reader from observing intermediate values. ok mpi@ visa@
2019-07-16 | Prevent integer overflow in kernel and userland when checking mbuf | Alexander Bluhm
limits. Convert kernel variables and calculations for mbuf memory into long to allow larger values on 64 bit machines. Put a range check into the kernel sysctl. For the interface itself int is still sufficient. In netstat -m cast all multiplications to unsigned long to hold the product of two unsigned int. input and OK visa@
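The cast described for netstat -m can be illustrated as follows. This is a sketch with hypothetical helper names, not code from netstat or the kernel. It assumes a platform where unsigned int is 32 bits and unsigned long is 64 bits, as on amd64.

```c
/* Hypothetical helpers; the names are not taken from the kernel or netstat. */

/* Multiplies in unsigned int, so a large product wraps around. */
unsigned int
pool_bytes_narrow(unsigned int count, unsigned int size)
{
	return (count * size);
}

/* Widens one operand first, so the multiplication is done in unsigned long. */
unsigned long
pool_bytes_wide(unsigned int count, unsigned int size)
{
	return ((unsigned long)count * size);
}
```

The cast has to be applied to an operand, not to the finished product. Casting the result of `count * size` would widen a value that has already wrapped.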
2019-07-12 | Revert anton@ changes about read/write unlocking | solene
https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2 ok anton@
2019-07-12 | sysctl(2): add KERN_TIMEOUT_STATS: timeout(9) status and statistics. | cheloha
With these totals one can track the throughput of the timeout(9) layer from userspace. With input from mpi@. ok mpi@
2019-07-10 | Make read/write of the f_offset field belonging to struct file MP-safe; | anton
as part of the effort to unlock the kernel. Instead of relying on the vnode lock, introduce a dedicated lock per file. Exclusive write access is granted using the new foffset_enter and foffset_leave API. A convenience function foffset_get is also available for threads that only need to read the current offset. The lock acquisition order in vn_write has been changed to match the one in vn_read in order to avoid a potential deadlock. This change also gets rid of a documented race in vn_read(). Inspired by the FreeBSD implementation. With help and ok mpi@ visa@
2019-06-16 | In previous commit I forgot a net unlock if the PCB of the socket | Alexander Bluhm
was already gone. OK mpi@
2019-06-13 | When tcp_close() is running in parallel with fill_file(), the kernel | Alexander Bluhm
could crash due to missing inp_ppcb. This happened when fstat(1) was called often and TCP was aborted with reset. Protect the sysctl path with the net lock. OK mpi@
2019-06-01 | Revert to using the SCHED_LOCK() to protect time accounting. | Martin Pieuchot
It currently creates a lock ordering problem because SCHED_LOCK() is taken by hardclock(). That means the "priorities" of a thread should be moved out of the SCHED_LOCK() first in order to make progress. Reported-by: syzbot+8e4863b3dde88eb706dc@syzkaller.appspotmail.com via anton@ as well as by kettenis@
2019-05-31 | Use a per-process mutex to protect time accounting instead of SCHED_LOCK(). | Martin Pieuchot
Note that hardclock(9) still increments p_{u,s,i}ticks without holding a lock. ok visa@, cheloha@
2019-05-22 | Read and assign the integer value only once. With this sysctl_int() will | Claudio Jeker
do word loads and stores and so partial updates should no longer be observed. With this accessing global variables set by sysctl_int() should be mostly MP safe. OK dlg@ mpi@
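The "read once, assign once" shape can be sketched in userland. This is a hypothetical model, not the kernel's sysctl_int(): `model_sysctl_int()` is an invented name, memcpy() stands in for the kernel's copy routines, and the error codes are assumptions. Portable C would also need atomics to guarantee an untorn access, so the sketch only shows the structure.

```c
#include <errno.h>
#include <string.h>

/*
 * Userland model of a sysctl-style integer handler.  The variable is
 * loaded into a local exactly once and stored back exactly once, so no
 * caller ever sees a value assembled from several partial copies.
 */
int
model_sysctl_int(void *oldp, size_t *oldlenp, const void *newp,
    size_t newlen, int *valp)
{
	int val = *valp;			/* the single load */

	if (newp != NULL && newlen != sizeof(int))
		return (EINVAL);
	if (oldp != NULL) {
		if (oldlenp == NULL || *oldlenp < sizeof(int))
			return (ENOMEM);
		memcpy(oldp, &val, sizeof(int));	/* report old value */
		*oldlenp = sizeof(int);
	}
	if (newp != NULL) {
		memcpy(&val, newp, sizeof(int));	/* fill the local */
		*valp = val;			/* the single store */
	}
	return (0);
}
```

Working on the local copy means the caller's buffers are never copied directly to or from the shared variable.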
2019-05-09 | Add a sysctl accessor to struct pf_status. The pf_status only holds the | Claudio Jeker
current status and statistics and can be exported without super-user rights via sysctl to make it easier for tools like systat to access those. OK deraadt@, sashan@
2019-01-29 | Add a dedicated sysctl(2) node for witness(4). | Visa Hankala
The new node contains the subsystem's main control variable, kern.witness.watch. It is aliased by the old name, kern.witnesswatch. The alias will be removed in the future. OK anton@ mpi@
2019-01-19 | Move boottime into the timehands. | cheloha
To protect the timehands we first need to protect the basis for all UTC time in the kernel: the boottime. Because the boottime can be changed at any time it needs to be versioned along with the other members of the timehands to enable safe lockless reads when using it for anything. So the global boottime timespec goes away and the static boottimebin becomes a member of the timehands. Instead of reading the global boottime you use one of two interfaces: binboottime(9) or microboottime(9). nanoboottime(9) can trivially be added later, though there are no consumers for it at the moment. This introduces one small change in behavior. We used to advance the reported boottime just before launching kernel threads from main(). This makes it look to userland like we "booted" moments before those threads were launched. Because there is no longer a boottime global we can no longer trivially do this from main(), so the boottime we report to userspace via e.g. kern.boottime will now reflect whatever the time was when we bootstrapped the timehands via inittodr(9). This is usually no more than a minute before the kernel threads are launched from main(). The prior behavior can be restored by adding a new interface to the timecounter layer in a future commit. Based on FreeBSD r303387. Discussed with mpi@ and visa@. ok visa@
2018-11-19 | delete the dns jackport experiment. it has no future. | Ted Unangst
2018-11-17 | Add new KERN_CPUSTATS sysctl(2) so we can identify offline CPUs. | cheloha
Because of hw.smt we need a way to determine whether a given CPU is "online" or "offline" from userspace. KERN_CPTIME2 is an array, and so cannot be cleanly extended for this purpose, so add a new sysctl(2) KERN_CPUSTATS with an extensible struct. At the moment it's just KERN_CPTIME2 with a flags member, but it can grow as needed.

KERN_CPUSTATS appears to have been defined by BSDi long ago, but there are few (if any) packages in the wild still using the symbol so breakage in ports should be near zero. No other system inherited the symbol from BSDi, either.

Then, use the new sysctl(2) in systat(1) and top(1):

- systat(1) draws placeholder marks ('-') instead of percentages for offline CPUs in the cpu view.
- systat(1) omits offline CPU ticks when drawing the "big bar" in the vmstat view. The upshot is that the bar isn't half idle when half your logical CPUs are disabled.
- top(1) does not draw lines for offline CPUs; if CPUs toggle on or offline in interactive mode we redraw the display to expand/reduce space for the new/missing CPUs. This is consistent with what some top(1) implementations do on Linux.
- top(1) omits offline CPUs from the totals when CPU totals are combined into a single line (the '-1' flag).

Originally prompted by deraadt@. Discussed endlessly with deraadt@, kettenis@, and sthen@. Tested by jmc@ and jca@. Earlier versions also discussed with jca@. Earlier versions tested by jmc@, tb@, and many others. docs ok jmc@, kernel bits ok kettenis@, everything ok sthen@, "Is your stuff in yet?" deraadt@
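The "omit offline CPU ticks" behaviour described for systat(1) and top(1) can be sketched as below. The struct, the flag and the state count are hypothetical stand-ins chosen for this example; they are not the definitions from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NSTATES	5	/* hypothetical number of CPU states */
#define SKETCH_ONLINE	0x1	/* hypothetical "CPU is online" flag */

/* Hypothetical per-CPU record: tick counters plus a flags member. */
struct sketch_cpustats {
	uint64_t	ticks[SKETCH_NSTATES];
	uint64_t	flags;
};

/*
 * Add the per-state ticks of every online CPU into total[] and return
 * the number of online CPUs.  Offline CPUs are skipped so that their
 * idle ticks do not skew the combined figures.
 */
size_t
sum_online(const struct sketch_cpustats *cpus, size_t ncpu,
    uint64_t total[SKETCH_NSTATES])
{
	size_t i, s, online = 0;

	for (s = 0; s < SKETCH_NSTATES; s++)
		total[s] = 0;
	for (i = 0; i < ncpu; i++) {
		if ((cpus[i].flags & SKETCH_ONLINE) == 0)
			continue;	/* skip offline CPUs */
		online++;
		for (s = 0; s < SKETCH_NSTATES; s++)
			total[s] += cpus[i].ticks[s];
	}
	return (online);
}
```

Carrying the state in a flags member next to the counters lets a consumer decide per CPU without a second query.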
2018-10-05 | Revert KERN_CPTIME2 ENODEV changes in kernel and userspace. | cheloha
ok kettenis deraadt
2018-10-04 | Revert the inpcb table mutex commit. It triggers a witness panic | Alexander Bluhm
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-26 | KERN_CPTIME2: set ENODEV if the CPU is offline. | cheloha
This lets userspace distinguish between idle CPUs and those that are not schedulable because hw.smt=0. A subsequent commit probably needs to add documentation for this to sysctl.2 (and perhaps elsewhere) after the dust settles.

Also included here are changes to systat(1) and top(1) that account for the ENODEV case and adjust behavior accordingly:

- systat(1)'s cpu view prints placeholder marks ('-') instead of percentages for each state if the given CPU is offline.
- systat(1)'s vmstat view checks for offline CPUs when computing the machine state total and excludes them, so the CPU usage graph only represents the states for online CPUs.
- top(1) does not draw CPU rows for offline CPUs when the view is redrawn. If CPUs "go offline", percentages for each state are replaced by placeholder marks ('-'); the view will need to be redrawn to remove these rows. If CPUs "go online" the view will need to be redrawn to show these new CPUs. In "combined CPU" mode, the count and the state totals only represent online CPUs.

Ports using KERN_CPTIME2 will need to be updated. The changes described above to make systat(1) and top(1) aware of the ENODEV case *and* gracefully handle a changing HW_NCPUONLINE while the application is running are not necessarily appropriate for each and every port. The changes described above are so extensive in part to demonstrate one way a program *might* be made robust to changing CPU availability. In particular, changing hw.smt after boot is an extremely rare event, and this needs to be weighed when updating ports.

The logic needed to account for the KERN_CPTIME2 ENODEV case is very roughly:

	if (sysctl(...) == -1) {
		if (errno != ENODEV) {
			/* Actual error occurred. */
		} else {
			/* CPU is offline. */
		}
	} else {
		/* CPU is online and CPU states were set by sysctl(2). */
	}

Prompted by deraadt@. Basic idea for ENODEV from kettenis@. Discussed at length with kettenis@. Additional testing by tb@. No complaints from hackers@ after a week.
ok kettenis@, "I think you should commit [now]" deraadt@
2018-09-20 | As a step towards per inpcb or socket locks, remove the net lock | Alexander Bluhm
for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
2018-07-12 | Add hw.ncpuonline to count the number of online CPUs. | cheloha
The introduction of hw.smt means that logical CPUs can be disabled after boot and prior to suspend/resume. If hw.smt=0 (the default), there needs to be a way to count the number of hardware threads available on the system at any given time.

So, import HW_NCPUONLINE/hw.ncpuonline from NetBSD and document it. hw.ncpu becomes equal to the number of CPUs given to sched_init_cpu() during boot, while hw.ncpuonline is equal to the number of CPUs available to the scheduler in the cpuset "sched_all_cpus". Set _SC_NPROCESSORS_ONLN equal to this new sysctl and keep _SC_NPROCESSORS_CONF equal to hw.ncpu.

This is preferable to adding a new sysctl to count the number of configured CPUs and keeping hw.ncpu equal to the number of online CPUs because such a change would break software in the ecosystem that relies on HW_NCPU/hw.ncpu to measure CPU usage and the like. Such software in base includes top(1), systat(1), and snmpd(8), and perhaps others.

We don't need additional locking to count the cardinality of a cpuset in this case because the only interfaces that can modify said cardinality are sysctl(2) and ioctl(2), both of which are under the KERNEL_LOCK.

Software using HW_NCPU/hw.ncpu to determine optimal parallelism will need to be updated to use HW_NCPUONLINE/hw.ncpuonline. Until then, such software may perform suboptimally. However, most changes will be similar to the change included here for libcxx's std::thread::hardware_concurrency(): using HW_NCPUONLINE in lieu of HW_NCPU should be sufficient for determining optimal parallelism for most software if the change to _SC_NPROCESSORS_ONLN is insufficient.

Prompted by deraadt. Discussed at length with kettenis, deraadt, and sthen. Lots of patch tweaks from kettenis. ok kettenis, "proceed" deraadt
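For software that sizes a thread pool, the portable route mentioned in this message is sysconf(3). The following is a minimal sketch with hypothetical wrapper names. It assumes the platform provides _SC_NPROCESSORS_ONLN and _SC_NPROCESSORS_CONF, which are widely available extensions and not part of POSIX.

```c
#include <unistd.h>

/*
 * Number of CPUs currently available for scheduling.  Falls back to 1
 * if the value cannot be determined.
 */
long
online_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return (n < 1 ? 1 : n);
}

/*
 * Number of CPUs configured in the system, online or not.  Falls back
 * to 1 if the value cannot be determined.
 */
long
configured_cpus(void)
{
	long n = sysconf(_SC_NPROCESSORS_CONF);

	return (n < 1 ? 1 : n);
}
```

A program choosing its degree of parallelism would normally use the online count. The configured count suits per-CPU bookkeeping, where slots are needed for CPUs that may come online later.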
2018-07-02 | Update the file reference count field `f_count' using atomic operations | Visa Hankala
instead of using a mutex for update serialization. Use a per-fdp mutex to manage updating of file instance pointers in the `fd_ofiles' array to let fd_getfile() acquire file references safely with concurrent file reference releases. OK mpi@