src - OpenBSD base system

Age	Commit message (Collapse)	Author
8 days	Cast atomic_load_int(9) to signed int when loading `securelevel'.	Vitaliy Makkoveev
	The return value of atomic_load_int(9) is unsigned so needs a cast, otherwise securelevel=-1 gets misrepresented. From Paul Fertser.
2024-11-08	Use PCB iterator for raw IPv6 input loop.	Alexander Bluhm
	Implement inpcb iterator in rip6_input(). Factor out the real work to rip6_sbappend(). Now UDP broadcast and multicast, raw IPv4 and IPv6 input work similar. While there, make rip_input() look more like rip6_input(). OK mvs@
2024-11-05	Use PCB iterator for raw IP input deliver loop.	Alexander Bluhm
	Inspired by mvs@ idea of the iterator in the UDP multicast loop, implement the same for raw IP input delivery. This removes an unneccesary rwlock and only uses table mutex. When comparing the inp routing table, address and port, the table lock must be held. So assume that in_pcb_iterator() already has the table mutex and hold it while traversing the list and doing the checks. Release the mutex during mbuf copy, socket buffer append and the upcalls. Adapt the logic for both rip_input() and udp_input(). In rip_input() move the actual work to rip_sbappend(). This can be called without mutex during list traversal and for the final element. OK mvs@
2024-11-05	Replace rwlock with iterator in UDP input multicast loop.	Alexander Bluhm
	The broadcast and multicast loop in udp_input() is protected by the table mutex. The relevant PCBs were collected in a separate list, which was processed while the table notify rwlock was held. When sending UDP multicast packets over vxlan(4) configured over UDP with multicast groups, this lock was taken recursively causing a kernel crash. By using an iterator, traversing the PCB list of the table does not require to hold the mutex all the time. Only while accessing the next element after the iterator, the mutex is taken for a short time. udp_sbappend() and the upcall to vxlan_input() is done with neither mutex nor rwlock. The PCB is reference counted while traversing the list. crash reported by Holger Glaess; iterator implemented by mvs@; tested and fixed by bluhm@; OK mvs@
2024-10-31	Unlock fs_sysctl(). It is the only `suid_clear' variable - atomically	Vitaliy Makkoveev
	accessed integer. ok bluhm
2024-10-28	Unlock KERN_ALLOWKMEM. The `allowkmem' is atomically accessed integer.	Vitaliy Makkoveev
	Also use atomic_load_int(9) to load `securelevel'. sysctl_securelevel() is mp-safe, but will be under kernel lock until all existing `securelevel' loading became mp-safe too. ok mpi
2024-10-25	Unlock timeout_sysctl(). `tostat' timeout(9) statistics is already	Vitaliy Makkoveev
	protected by `timeout_mtx' mutex(9). ok kettenis
2024-09-30	Use ps_ppid instead of ps_pptr->ps_pid in all places.	Claudio Jeker
	OK mpi@
2024-09-24	Fix sleeping race during malloc in sysctl hw.disknames.	Alexander Bluhm
	When mallocarray(9) sleeps, disk_count can change, and diskstatslen gets inconsistent. This caused free(9) to panic. Reported-by: syzbot+36e1f3b306f721f90c72@syzkaller.appspotmail.com OK deraadt@ mpi@
2024-08-29	Show expensive mbuf operations in netstat(1) statistics.	Alexander Bluhm
	If the memory layout is not optimal, m_defrag(), m_prepend(), m_pullup(), and m_pulldown() will allocate mbufs or copy memory. Count these operations to find possible optimizations. input dhill@; OK mvs@
2024-08-26	style(9) fix. No functional changes.	Vitaliy Makkoveev

2024-08-23	Fix KERN_AUDIO broken in rev 1.440.	Vitaliy Makkoveev

2024-08-22	Introduce sysctl_securelevel() to modify `securelevel' mp-safe. Keep	Vitaliy Makkoveev
	KERN_SECURELVL locked until existing `securelevel' checks became moved out of kernel lock. Make sysctl_securelevel_int() mp-safe by using atomic_load_int(9) to unlocked read-only access for `securelevel'. Unlock KERN_ALLOWDT. `allowdt' is the atomically accessed integer used only once in dtopen(). ok mpi
2024-08-20	Unlock KERN_MAXFILES.	Vitaliy Makkoveev
	`maxfiles' is atomically accessed integer which is lockless and read-only accessed in file descriptors layer. lim_startup() called during kernel bootstrap, no need to atomic_load_int() within. ok mpi
2024-08-20	Unlock KERN_MAXPROC and KERN_MAXTHREAD from `kern_vars'. Both	Vitaliy Makkoveev
	`maxprocess' and `maxthread' are atomically accessed integers. ok mpi
2024-08-20	Unlock sysctl_audio().	Vitaliy Makkoveev
	It is the only KERN_AUDIO_RECORD. `audio_record_enable' is atomically accessed integer. Reasonable from deraadt
2024-08-14	Push kernel lock down to net_sysctl().	Vitaliy Makkoveev
	All except PF_MPLS paths are mp-safe: - net_link_sysctl() and following net_ifiq_sysctl() only return EOPNOTSUPP; - uipc_sysctl() - mp-safe atomic access to integers; - bpf_sysctl() - mp-safe atomic access to integers; - pflow_sysctl() - returns statistics from per-CPU counters; - pipex_sysctl() - mp-safe atomic access to integer; Push kernel lock down to mpls_sysctl(). sysctl_int_bounded() do copying with local variable, so context switch is safe. No need to wire memory or take `sysctl_lock' rwlock(9). Keep protocols locked as they was include pages wiring. Copying will not sleep - no network slowdown while doing it with net lock held. ok bluhm
2024-08-14	Make sysctl_int() and sysctl_int_lower() mp-safe and unlock KERN_HOSTID.	Vitaliy Makkoveev
	The only difference between sysctl_int() and sysctl_int_bounded() is the range check, so sysctl_int() is just sysctl_int_bounded(..., INT_MIN, INT_MAX). sysctl_int() is not the fast path, so this useless check is not significant. Mp-safe sysctl_int() is meaningless for sysctl_int_lower(), so rework it in the sysctl_int_bounded() style. This time all affected paths are kernel locked, but this doesn't make sysctl_int_lower() worse. Change `hostid' type to the type of int. It only stored but never used within kernel, userland accesses it through sysctl_int(). Nothing changes, but variable becomes consistent with sysctl_int(). ok bluhm
2024-08-11	Make exit1() wait sysctl(2) `allprocess' loops.	Vitaliy Makkoveev
	Regardless on wired userland memory, KERN_FILE_BYPID and KERN_FILE_BYUID `allprocess' loops have netlock provided sleep points, so concurrent process exit(1) could crash kernel. The main exit1() problem is that process teardown begins while process is still linked to `allprocess' list, and current code doesn't allow to unlink it first. Wait for concurrent sysctl(2) `allprocess' loops between PS_EXITING bit setting and list unlinking. Both KERN_FILE_BYPID and KERN_FILE_BYUID loops do PS_EXITING check and won't deal with dying process. Concurrent exit1() thread will wait loops keeping process linked to `allprocess' list. Tested with i386 dpb(1) run. Stress tests and ok bluhm.
2024-08-08	In sysctl KERN_FILE_BYPID stop traversal after pid has been found.	Alexander Bluhm
	When searching for a specific process, there is no need to traverse the list of all processes to the end. Break after pid has been found and the file structure has been filled. Also check for arg >= 0 as this is consistent with the arg < -1 check before. This makes no functional difference as process 0 has PS_SYSTEM set and is skipped anyway. OK millert@ mvs@
2024-08-08	Unlock KERN_MSGBUFSIZE and KERN_CONSBUFSIZE.	Vitaliy Makkoveev
	`msgbufp' and `consbufp' are immutable, such as `msg_magic' and `msg_bufs'. initmsgbuf() and initconsbuf() which initialize this buffers are called during kernel bootstrap, when concurrent sysctl(2) is impossible, so they don't need to be reordered or use barriers. ok bluhm
2024-08-06	Unlock KERN_CLOCKRATE.	Vitaliy Makkoveev
	Read-only access to local `clkinfo' filled with immutable data. ok bluhm
2024-08-05	Unlock KERN_BOOTTIME.	Vitaliy Makkoveev
	microboottime() and following binboottime() are mp-safe and `mb' is local data. ok bluhm
2024-08-05	Unlock most of `kern_vars' variables.	Vitaliy Makkoveev
	Add corresponding cases to the kern_sysctl() switch and unlock read-only variables from `kern_vars'. Unlock KERN_SOMAXCONN and KERN_SOMINCONN which are atomically read-only accessed only from solisten(). ok kettenis
2024-08-05	Take `sysctl_lock' before kernel lock.	Vitaliy Makkoveev
	ok bluhm
2024-08-02	Push kernel lock down to sysctl(2).	Vitaliy Makkoveev
	Unlock few obvious immutable or read-only variables from "kern." and "hw." paths. Keep the rest variables locked as before, include pages wiring. Use new sysctl_vs{,un}lock() functions introduced for thar purpose. In kern.* path: - KERN_OSTYPE, KERN_OSRELEASE, KERN_OSVERSION, KERN_VERSION - immutable; - KERN_NUMVNODES - read-only access to integer; - KERN_MBSTAT - read-only access to per-CPU counters; In hw.* path: - HW_MACHINE, HW_MODEL, HW_NCPUONLINE, HW_PHYSMEM, HW_VENDOR, HW_PRODUCT, HW_VERSION, HW_SERIALNO, HW_UUID, HW_PHYSMEM64 - immutable; - HW_USERMEM and HW_USERMEM64 - `physmem' is immutable, uvmexp.wired is mutable but integer; read-only access to localy stored difference between `physmem' and uvmexp.wired; - `hw_vars' - read-only access to integers; some of them like HW_BYTEORDER and HW_PAGESIZE are immutable; ok bluhm kettenis
2024-07-11	Use atomic operations to access integers in sysctl(2).	Alexander Bluhm
	In sysctl_int_bounded() use atomic operations to load, store, or swap integer values. By using volatile pointers this will result in a single assembly instruction, no matter how over optimizing compilers will become. Note that this does not solve data dependency problems, nor MP problems in the kernel code using these integers. For full MP safety additional considerations, memory barriers, or locks will be needed where the values are used. But for simple integer in- and output volatile is enough. If new and old value pointers are given to sysctl, atomic swapping guarantees that userlands sees the same old value only once. There are more sysctl_int() functions that have to be adapted. OK deraadt@ kettenis@
2024-07-08	Rework per proc and per process time usage accounting	Claudio Jeker
	For procs (threads) the accounting happens now lockless by curproc using a generation counter. Callers need to use tu_enter() and tu_leave() for this. To read the proc p_tu struct tuagg_get_proc() should be used. It ensures that the values read is consistent. For processes only the time of exited threads is accumulated in ps_tu and to get the proper process time usage tuagg_get_process() needs to be called. tuagg_get_process() will sum up all procs p_tu plus the ps_tu. This removes another SCHED_LOCK() dependency. Adjust the code in exit1() and exit2() to correctly account for the full run time. For this adjust sched_exit() to do the runtime accounting like it is done in mi_switch(). OK jca@ dlg@
2024-04-12	Split single TCP inpcb table into IPv4 and IPv6 parts.	Alexander Bluhm
	With two separate TCP hash tables, each one becomes smaller. When we remove the exclusive net lock from TCP, contention on internet PCB table mutex will be reduced. UDP has been split earlier into IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with assertions. OK mvs@
2024-03-29	Remove one global variable duplicating uvmexp.pagesize.	Miod Vallat
	ok guenther@ deraadt@
2024-02-10	On kernels without ucom(4) support, 'sysctl hw.ucomnames' should return	Theo de Raadt
	the empty string, rather than error. ok krw
2024-01-19	Backout priterator() for walking allprocess list.	Alexander Bluhm
	This approach does not work as LIST_NEXT() of a removed element does not return NULL. I causes a crash in syzcaller and triggers kernel diagnostic assertion "vp->v_uvcount == 0" in sys/kern/kern_unveil.c line 845 during reboot. Unfortunately the backout brings back the race in fill_file() and fstat(1) may crash the kernel. Reported-by: syzbot+54fba1c004d7383d5e85@syzkaller.appspotmail.com
2024-01-18	Use solock() instead of netlock within fill_file(). This makes all	Vitaliy Makkoveev
	socket types protected. The netlock is still used while fill_file() called through *table.inpt_queue walkthroughs, but this is the inet sockets case. ok bluhm
2024-01-15	Introduce priterator(), the `ps_list' iterator. Some of `allprocess'	Vitaliy Makkoveev
	list walkthroughs have context switch within, so make exit1() wait until the last reference released. Reported-by: syzbot+0e9dda76c42c82c626d7@syzkaller.appspotmail.com ok bluhm claudio
2024-01-10	Split UDP PCB table into IPv4 and IPv6.	Alexander Bluhm
	Having two hash tables instead of a common one, reduces table size and contention on the per table lock. The address family is always known in advance. The lookups and loops are more specific. OK sashan@
2023-10-01	Add sysctl hw.ucomnames to list 'fixed' paths to USB serial	Kenneth R Westerback
	ports. Suggested by deraadt@, USB route idea from kettenis@. Feedback from anton@, man page improvements from deraadt@, jmc@, schwarze@. ok deraadt@ kettenis@
2023-09-16	Allow counters_read(9) to take an optional scratch buffer.	Martin Pieuchot
	Using a scratch buffer makes it possible to take a consistent snapshot of per-CPU counters without having to allocate memory. Makes ddb(4) show uvmexp command work in OOM situations. ok kn@, mvs@, cheloha@
2023-07-16	Make the mbstat preserve the same size which is actually used. Also	YASUOKA Masahiko
	revert the previous that the mbstat is located on the stack. ok claudio
2023-07-07	Expand the counters in struct mbstat from u_short to u_long. Use	Alexander Bluhm
	malloc(9) memory instead of kernel stack for sysctl kern.mbstat. from yasuoka@; chunk missed in previous commit; OK claudio@ tb@
2023-07-02	all platforms, kernel: remove __HAVE_CLOCKINTR symbol	Scott Soule Cheloha
	Every platform made the clockintr switch at least six months ago. The __HAVE_CLOCKINTR symbol is now redundant. Remove it. Prompted by claudio@. Link: https://marc.info/?l=openbsd-tech&m=168826181015032&w=2 "makes sense" mlarkin@
2023-05-21	In sysctl_hwchargestop() check that hw_battery_setchargestop is set	Claudio Jeker
	and not hw_battery_setchargestart. OK kettenis@
2023-05-18	Backout sysctl(2) unlocking. Lock order issue was triggered in UVM	Vitaliy Makkoveev
	layer.
2023-05-17	Implement battery management sysctl. This will provide a set of sysctls	Mark Kettenis
	to control the charging of laptop batteries: * hw.battery.chargemode (int) -1: force discharge 0: inhibit charge 1: auto In auto mode charging may be controlled by: * hw.battery.chargestop (int) Percentage (0-100) of last full capacity at which the battery should stop charging. * hw.battery.chargestart (int) Percentage (0-100) of last full capacity at which the battery should start charging. The idea is that with hw.battery.chargemode=1 hw.battery.chargestop=80 hw.battery.chargestart=75 the battery would be kept charged within the range between 75% and 80%. Allowable settings and some details of the behavior may differ between hardware implementations. Committing this early to easy testing of further diffs that implement this functionality in acpithinkpad(4) and aplsmc(4). ok kn@
2023-05-04	Push kernel lock deep down to sys_sysctl(). At least network subset of	Vitaliy Makkoveev
	sysctl(8) MIBs relies on netlock or another locks and doesn't require kernel lock, so unlock it. The protocols layer *_sysctl()s are left under kernel lock and will be sequentially unlocked later. ok bluhm@
2023-01-22	Move SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to `sb_state' of	Vitaliy Makkoveev
	receive buffer. As it was done for SS_CANTSENDMORE bit, the definition kept as is, but now these bits belongs to the `sb_state' of receive buffer. `sb_state' ored with `so_state' when socket data exporting to the userland. ok bluhm@
2023-01-21	Introduce per-sockbuf `sb_state' to use it with SS_CANTSENDMORE.	Vitaliy Makkoveev
	This time, socket's buffer lock requires solock() to be held. As a part of socket buffers standalone locking work, move socket state bits which represent its buffers state to per buffer state. Opposing the previous reverted diff, the SS_CANTSENDMORE definition left as is, but it used only with `sb_state'. `sb_state' ored with original `so_state' when socket's data exported to the userland, so the ABI kept as it was. Inputs from deraadt@. ok bluhm@
2023-01-14	sysctl(2): KERN_CPUSTATS: zero struct cpustats before copyout	Scott Soule Cheloha

2022-11-07	introduce a new kern.autoconf_serial sysctl that can be used by userland	Robert Nagy
	to monitor state changes of the kernel device tree input from dnd ok dlg@, deraadt@
2022-11-05	clockintr(9): initial commit	Scott Soule Cheloha
	clockintr(9) is a machine-independent clock interrupt scheduler. It emulates most of what the machine-dependent clock interrupt code is doing on every platform. Every CPU has a work schedule based on the system uptime clock. For now, every CPU has a hardclock(9) and a statclock(). If schedhz is set, every CPU has a schedclock(), too. This commit only contains the MI pieces. All code is conditionally compiled with __HAVE_CLOCKINTR. This commit changes no behavior yet. At a high level, clockintr(9) is configured and used as follows: 1. During boot, the primary CPU calls clockintr_init(9). Global state is initialized. 2. Primary CPU calls clockintr_cpu_init(9). Local, per-CPU state is initialized. An "intrclock" struct may be installed, too. 3. Secondary CPUs call clockintr_cpu_init(9) to initialize their local state. 4. All CPUs repeatedly call clockintr_dispatch(9) from the MD clock interrupt handler. The CPUs complete work and rearm their local interrupt clock, if any, during the dispatch. 5. Repeat step (4) until the system shuts down, suspends, or hibernates. 6. During resume, the primary CPU calls inittodr(9) and advances the system uptime. 7. Go to step (2). This time around, clockintr_cpu_init(9) also advances the work schedule on the calling CPU to skip events that expired during suspend. This prevents a "thundering herd" of useless work during the first clock interrupt. In the long term, we need an MI clock interrupt scheduler in order to (1) provide control over the clock interrupt to MI subsystems like timeout(9) and dt(4) to improve their accuracy, (2) provide drivers like acpicpu(4) a means for slowing or stopping the clock interrupt on idle CPUs to conserve power, and (3) reduce the amount of duplicated code in the MD clock interrupt code. Before we can do any of that, though, we need to switch every platform over to using clockintr(9) and do some cleanup. Prompted by "the vmm(4) time bug," among other problems, and a discussion at a2k19 on the subject. Lots of design input from kettenis@. Early versions reviewed by kettenis@ and mlarkin@. Platform-specific help and testing from kettenis@, gkoehler@, mlarkin@, miod@, aoyama@, visa@, and dv@. Babysitting and spiritual guidance from mlarkin@ and kettenis@. Link: https://marc.info/?l=openbsd-tech&m=166697497302283&w=2 ok kettenis@ mlarkin@
2022-08-16	Remove obsolete kern.nselcoll sysctl.	Visa Hankala
	OK millert@ deraadt@