Age | Commit message | Author |
|
OK mpi@
|
|
These sockets are not connection-oriented, they don't call pru_rcvd(),
but they have splicing ability and they set `so_error'.
Splicing ability is the biggest problem. However, we can hold `sb_mtx'
around `ssp_socket' modifications together with solock(). So holding
`sb_mtx' is enough for the isspliced() check in soreceive(). The
unlocked `so_sp' dereference is fine, because we set it only once for
the whole socket lifetime and we do this before the `ssp_socket'
assignment.
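A minimal sketch of this ordering (simplified; error paths omitted):

	/* splice side: `so_sp' was set once, before any `ssp_socket'
	 * assignment, so readers may dereference it unlocked */
	solock(so);
	mtx_enter(&so->so_rcv.sb_mtx);
	so->so_sp->ssp_socket = sosp;
	mtx_leave(&so->so_rcv.sb_mtx);
	sounlock(so);

	/* receive side: `sb_mtx' alone covers the check */
	mtx_enter(&so->so_rcv.sb_mtx);
	if (isspliced(so)) {
		/* leave the data to somove() */
	}
	mtx_leave(&so->so_rcv.sb_mtx);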
We also need to take sblock() before splicing sockets, so sosplice()
and soreceive() are serialized with each other. Since `sb_mtx' is also
required to unsplice sockets, it serializes somove() with soreceive()
regardless of the somove() caller.
sosplice() was reworked to accept a standalone sblock() for udp(4)
sockets.
soreceive() performs the `so_error' check and modification unlocked.
Previously, we had no way to predict which of the concurrent
soreceive() or sosend() threads would fail and clear `so_error'. With
this unlocked access, a sosend() and a soreceive() thread could fail
together.
`so_error' is stored in the local `error2' variable because `so_error'
could be overwritten by a concurrent sosend() thread.
Tested and ok bluhm
|
|
|
|
dosigsuspend() no longer needs it.
OK mvs@ mpi@
|
|
no functional change, found by smatch warnings
ok miod@ bluhm@
|
|
With two separate TCP hash tables, each one becomes smaller. When
we remove the exclusive net lock from TCP, contention on the internet
PCB table mutex will be reduced. UDP was split earlier into
IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with
assertions.
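For example, in an IPv6-only path the branch becomes an assertion,
roughly like this (an illustrative fragment, not the literal diff):

	/* table split: this function now only sees IPv6 PCBs */
	KASSERT(ISSET(inp->inp_flags, INP_IPV6));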
OK mvs@
|
|
For inet sockets solock() is the netlock wrapper, so soreceive() can
now run simultaneously with exclusively locked code paths.
These sockets are not connection-oriented, they don't call pru_rcvd(),
they can't be spliced and they don't set `so_error'. There is nothing
for solock() to protect in the soreceive() path.
The `so_rcv' buffer is protected by the `sb_mtx' mutex(9), but since
that is released while sleeping, sblock() is required to serialize
concurrent soreceive() and sorflush() threads. The current sblock() is
a kind of rwlock(9) implementation, so introduce an `sb_lock' rwlock(9)
and use it directly for that purpose.
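The resulting locking in the receive path, as a rough sketch:

	rw_enter_write(&sb->sb_lock);	/* serialize receivers */
	mtx_enter(&sb->sb_mtx);
	/* consume data from `so_rcv' */
	mtx_leave(&sb->sb_mtx);
	rw_exit_write(&sb->sb_lock);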
sorflush() and its callers were refactored to avoid solock() for raw
inet sockets. This avoids stopping packet processing.
Tested and ok bluhm.
|
|
Only unix(4) and tcp(4) sockets set the (*pru_sense)() handler. The
rest of soo_stat() is read-only access.
ok bluhm
|
|
uipc_attach() releases solock() because it should be taken after the
`unp_gc_lock' rwlock(9) which protects the `unp_link' list. For this
reason, the listening `head' socket should be unlocked too while
sonewconn() calls uipc_attach(). This could be reworked because the
`so_rcv' sockbuf now relies on the `sb_mtx' mutex(9).
The last `unp_link' foreach loop within unp_gc() discards sockets
previously marked as UNP_GCDEAD. These sockets are not accessed from
userland. The only exception is the sosend() threads of connected
sending peers, but they only sbappend*() mbuf(9)s to `so_rcv'. So it
is enough to unlink the mbuf(9) chain with `sb_mtx' held and discard
it locklessly.
Note that the existing SS_NEWCONN_WAIT logic was never used because
the listening unix(4) socket is protected from concurrent unp_detach()
by the vnode(9) lock; however, `head' was re-locked every time.
ok bluhm
|
|
Change p_sigmask from atomic back to non-atomic updates. All changes to
p_sigmask are only allowed by curproc (the owner). There is no need for
atomic instructions here.
p_sigmask is mostly accessed by curproc with the exception of ptsignal().
In ptsignal() p_sigmask is now only read once unless an SSLEEP proc
gets the signal. In that case p_sigmask is rechecked before the wakeup
to ensure that no unnecessary wakeup happens.
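A sketch of the SSLEEP case (simplified; `mask' is the signal's bit):

	/* ptsignal(): p_sigmask was read once for the common checks */
	if (p->p_stat == SSLEEP) {
		/* recheck before wakeup: curproc may have changed it */
		if ((p->p_sigmask & mask) != 0)
			return;		/* masked, no wakeup needed */
		setrunnable(p);
	}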
Add some KASSERT(p == curproc) to ensure this precondition.
sigabort() is special since it is also called by ddb but apart from that
only works for curproc.
With and OK mvs@ OK mpi@
|
|
|
|
|
|
|
|
Requested by robert@
OK mvs@ millert@ deraadt@
|
|
only udp(4) sockets set and check `so_error'.
No functional changes.
ok bluhm
|
|
replaced it with a more strict mechanism, which happens to be lockless O(1)
rather than micro-lock O(1)+O(log N). Also nop-out the sys_msyscall(2) guts,
but leave the syscall around for a bit longer so that people can build through
it, since ld.so(1) still wants to call it.
|
|
|
|
listen(2) man(1) page clearly prohibits sockets of other types.
Reported-by: syzbot+00450333592fcd38c6fe@syzkaller.appspotmail.com
ok bluhm
|
|
sbappend*() and soreceive() on SB_MTXLOCK-marked sockets use the
`sb_mtx' mutex(9) for protection, while the buffer usage check and the
corresponding sbwait() sleep are still serialized by solock(). Mark
udp(4) as SB_OWNLOCK to avoid solock() serialization and rely on the
`sb_mtx' mutex(9) instead. The `sb_state' and `sb_flags' modifications
must be protected by `sb_mtx' too.
ok bluhm
|
|
Tracepoints like "sched:enqueue" and "sched:unsleep" were called from
inside the loop iterating over sleeping threads as part of
wakeup_proc(). When such tracepoints were enabled, they could trigger
another wakeup(9), possibly corrupting the sleepqueue.
Rewrite wakeup(9) in two stages: first dequeue threads from the
sleepqueue, then call setrunnable() and any tracepoints for each of
them.
This requires moving unsleep() outside of setrunnable() because it messes with
the sleepqueue.
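As a rough sketch of the resulting shape (assuming the usual TAILQ
linkage and a hypothetical lookup helper; locking details elided):

	struct proc *p;
	TAILQ_HEAD(, proc) wakeq = TAILQ_HEAD_INITIALIZER(wakeq);

	/* stage 1: unlink every matching thread from the sleepqueue */
	while ((p = lookup_sleeper(ident)) != NULL) {	/* hypothetical */
		unsleep(p);
		TAILQ_INSERT_TAIL(&wakeq, p, p_runq);
	}
	/* stage 2: tracepoints may fire, the sleepqueue is consistent */
	while ((p = TAILQ_FIRST(&wakeq)) != NULL) {
		TAILQ_REMOVE(&wakeq, p, p_runq);
		setrunnable(p);
	}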
ok claudio@
|
|
ok guenther@ deraadt@
|
|
|
|
that it has been replaced with pinsyscalls(2) [which tells the kernel
the location of all system calls in libc.so]
floated to various people before release, but it was prudent to wait.
|
|
outside the socket lock.
The `sb_mtx' mutex(9) is used for this case, and it should not be
released between the `so_rcv' usage check and the corresponding
sbwait() sleep. Otherwise a wakeup() could sometimes be lost.
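As a sketch of the pattern (assuming sbwait() releases and retakes
`sb_mtx' atomically around the sleep):

	mtx_enter(&sb->sb_mtx);
	while (sb->sb_cc == 0) {	/* usage check under sb_mtx */
		/* no window between the check and the sleep, so a
		 * concurrent wakeup() cannot be lost */
		if ((error = sbwait(so, sb)) != 0)
			break;
	}
	mtx_leave(&sb->sb_mtx);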
ok bluhm
|
|
Instead of calling mtx_enter_try() in each spinning loop, do it
only if the result of a lockless read indicates that the mutex has
been released. This avoids some expensive atomic compare-and-swap
operations. Up to 5% reduction of spinning time during kernel build
can be seen on an 8-core amd64 machine. On other machines there was
no visible effect.
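The spin loop now looks roughly like this (a sketch of the idea, not
the verbatim diff):

	while (mtx_enter_try(mtx) == 0) {
		/* spin on a plain load; retry the atomic op only
		 * once the owner field reads as released */
		do {
			CPU_BUSY_CYCLE();
		} while (mtx->mtx_owner != NULL);
	}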
Testing on powerpc64 revealed a bug in the mtx_owner declaration: not
the variable was volatile, but the object it points to. Move the
volatile qualifier in struct mutex to avoid a hang when going to
multiuser.
from Mateusz Guzik; input kettenis@ jca@; OK mpi@
|
|
This makes re-locking unnecessary in the uipc_*send() paths, because
it's enough to lock one socket to prevent the peer from concurrent
disconnection. As a little bonus, one unix(4) socket can perform
simultaneous transmission and reception with one exception for
uipc_rcvd(), which still requires the re-lock for connection oriented
sockets.
The socket lock is not held while filt_soread() and filt_soexcept()
are called from uipc_*send() through sorwakeup(). However, the
unlocked access to `so_options', `so_state' and `so_error' is fine.
The receiving socket can't be, or become, a listening socket. It also
can't be disconnected concurrently. This makes the SO_ACCEPTCONN,
SS_ISDISCONNECTED and SS_ISCONNECTED bits immutable; they are clear,
clear and set respectively.
`so_error' is set on the peer sockets only by unp_detach(), which also
can't be called concurrently on the sending socket.
This is also true for filt_fiforead() and filt_fifoexcept(). For other
callers like kevent(2) or doaccept() the socket lock is still held.
ok bluhm
|
|
checks from all the filesystems that support hardlinks at all into
the VFS layer. Simplify the EPERM description in link(2).
ok miod@ mpi@
|
|
|
|
ok bluhm
|
|
dead unix(4) sockets.
The difference between direct unp_scan() and sorflush() is the mbuf(9)
chain: in the first case it is still linked to `so_rcv', in the second
it is not. This is required to make the `sb_mtx' mutex(9) the only
`so_rcv' sockbuf protection and to remove socket re-locking from most
of the uipc_*send() paths. The unlinked mbuf(9) chain doesn't require
any protection, which allows the sleeping unp_discard() to run
locklessly.
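A sketch of the unlink step (simplified; the real code also resets the
rest of the sockbuf bookkeeping):

	struct mbuf *m;

	mtx_enter(&so->so_rcv.sb_mtx);
	m = so->so_rcv.sb_mb;		/* grab the chain */
	so->so_rcv.sb_mb = NULL;	/* unlink it from `so_rcv' */
	mtx_leave(&so->so_rcv.sb_mtx);
	/* `m' is private now: scan it and FRELE() the contained
	 * file descriptors without holding any sockbuf lock */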
Also, the mbuf(9) chain of the discarded socket still contains the
addresses of file descriptors, and it is much safer to unlink it
before FRELE()ing them. This is the reason to commit this diff
standalone.
ok bluhm
|
|
ok deraadt, kn, phessler
|
|
return EINVAL if set. This prevents a concurrent solisten() thread
from making this socket a listening socket while it is unlocked.
Reported-by: syzbot+4acfcd73d15382a3e7cf@syzkaller.appspotmail.com
ok mpi
|
|
m_defrag() is intended as a last resort to make DMA transfers to the
hardware. Therefore page alignment is more important than IP header
alignment. The reason why the mbuf returned by m_defrag() was
switched to IP header alignment was that ether_extract_headers()
failed in the em(4) driver with TSO on sparc64. This has been fixed
by using memcpy().
The alignment change in m_defrag() is too late in the 7.5 release
process. It may affect several drivers on different architectures.
The bus dmamap for ixl(4) on sun4v expects page alignment. Such
alignment issues and TSO mbuf mapping for IOMMU need more thought.
OK deraadt@
|
|
Pool namei_pool is initialized with IPL_NONE as the filesystem always
runs with the kernel lock. So pool_get() also needs the kernel lock
in sys_ypconnect().
OK kn@ deraadt@
|
|
From Christian Ludwig, ok claudio@
|
|
The code has outgrown the original name for this struct. Both the
external and internal APIs have used the "clockqueue" namespace for
some time when operating on it, and that name is eyeball-consistent
with "clockintr" and "clockrequest", so "clockqueue" it is.
|
|
|
|
has occurred in the process.
ok various people
|
|
The function should be in the clockqueue_intrclock namespace. Also,
"reprogram" is a better word for what the function actually does.
|
|
OpenBSD starts the system uptime clock at 1.0 instead of 0.0. We
inherited this behavior from FreeBSD when we imported kern_tc.c.
patrick@ reports that this causes a problem in sdmmc(4) during boot:
the sdmmc_delay() call in sdmmc_init() doesn't block for the full
250ms. This happens because the system hardclock() starts at 0.0 and
executes about hz times, rapidly, to "catch up" to 1.0. This
instantly expires the first hz timeout ticks, hence the short sleep.
Starting the system uptime at 0.0 fixes the problem.
Prompted by patrick@. Tested by patrick@. In snaps since Feb 19 2023.
Thread: https://marc.info/?l=openbsd-tech&m=170830229732396&w=2
ok patrick@ deraadt@
|
|
In kern_timeout.c, the to_kclock checks are not strict enough to catch
all plausible programmer mistakes. Tighten them up:
- timeout_set_flags: KASSERT that kclock is valid
- timeout_abs_ts: KASSERT that to_kclock is KCLOCK_UPTIME
We can also add to_kclock validation to softclock() and
db_show_timeout(), which may help to debug memory corruption:
- softclock: panic if to_kclock is not KCLOCK_NONE or KCLOCK_UPTIME
- db_show_timeout: print warning if to_kclock is invalid
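A rough sketch of two of these checks (constants as in kern_timeout.c):

	/* timeout_set_flags(): only valid kclocks may be configured */
	KASSERT(kclock >= KCLOCK_NONE && kclock < KCLOCK_MAX);

	/* softclock(): a running timeout must have a sane to_kclock */
	if (to->to_kclock != KCLOCK_NONE && to->to_kclock != KCLOCK_UPTIME)
		panic("%s: invalid to_kclock: %d", __func__, to->to_kclock);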
Prompted by bluhm@ in response to a syzbot panic. Hopefully these
changes help to narrow down the root cause.
Link: https://syzkaller.appspot.com/bug?extid=49d3f7118413963f651a
Reported-by: syzbot+49d3f7118413963f651a@syzkaller.appspotmail.com
ok bluhm@
|
|
The recent TSO support in em(4) triggered an alignment error on the
TCP header. In em(4), m_defrag() is called before setting up the TSO
DMA bits, and with that the TCP header was suddenly no longer aligned.
Like other mbuf functions, preserve the data alignment in m_defrag()
to prevent such unaligned packets.
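As a hypothetical sketch of the idea (not the literal diff; the copy
keeps the payload offset the original data had within a word):

	/* in m_defrag(), after allocating the replacement mbuf m0:
	 * give the copy the same payload alignment as the original */
	m0->m_data += mtod(m, unsigned long) & (sizeof(long) - 1);
	m_copydata(m, 0, m->m_pkthdr.len, mtod(m0, caddr_t));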
With help and OK bluhm@ mglocker@
|
|
pmap_unmap_direct() has been fixed; also tested by aoyama@
|
|
`pr_type'. The corresponding domain is referenced as `pr_domain'.
Otherwise dp->dom_protosw->pr_type of inet sockets always points
to inetsw[0].
ok bluhm
|
|
There is no useful work left for secondary CPUs to do in hardclock().
Disable cq_hardclock on secondary CPUs and remove the now-unnecessary
early-return from hardclock().
This change reduces every system's normal clock interrupt rate by
(HZ - HZ/10) per secondary CPU. For example, an 8-core machine
with a HZ=100 kernel should see its clock interrupt rate drop from
~1600 to ~960.
Thread: https://marc.info/?l=openbsd-tech&m=170750140915898&w=2
ok kettenis@
|
|
ok bluhm
|
|
In soreceive(), we only touch the `so_rcv' socket buffer, which has
its own `sb_mtx' mutex(9) for protection. So, we can avoid solock() in
this path - it's enough to hold `sb_mtx' in soreceive() and around the
corresponding sbappend*(). But not right now :)
For now we use the shared netlock for some inet sockets in the
soreceive() path. To protect the `so_rcv' buffer we use the `inp_mtx'
mutex(9) and pru_lock() to acquire this mutex(9) in the socket layer.
But the `inp_mtx' mutex belongs to the PCB, the socket is initialized
before the PCB, and tcp(4) sockets can exist without a PCB, so use the
`sb_mtx' mutex(9) to protect the sockbuf data instead.
This diff mechanically replaces `inp_mtx' with `sb_mtx' in the receive
path, but only for sockets which already use `inp_mtx'. All other
sockets are left as is; they will be converted later.
Since `sb_mtx' is optional, a new SB_MTXLOCK flag is introduced. If
this flag is set in `sb_flags', the `sb_mtx' mutex(9) must be taken.
New sb_mtx_lock() and sb_mtx_unlock() functions were introduced to
hide this check. They are temporary and will be replaced by
mtx_enter() once this whole area has been converted to the `sb_mtx'
mutex(9).
Also, a new sbmtxassertlocked() function is introduced to provide the
corresponding assertion for SB_MTXLOCK-marked buffers. For now only
sbappendaddr() calls it. This function is also temporary and will be
replaced by MTX_ASSERT_LOCKED() later.
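The helpers are as thin as they sound; a sketch:

	void
	sb_mtx_lock(struct sockbuf *sb)
	{
		if (sb->sb_flags & SB_MTXLOCK)
			mtx_enter(&sb->sb_mtx);
	}

	void
	sb_mtx_unlock(struct sockbuf *sb)
	{
		if (sb->sb_flags & SB_MTXLOCK)
			mtx_leave(&sb->sb_mtx);
	}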
ok bluhm
|
|
the empty string, rather than an error.
ok krw
|
|
To improve the utility of dt(4)'s interval and profile probes we need
to move the probe entry points from the fixed-frequency hardclock() to
a dedicated clock interrupt callback so that the probes can fire at
arbitrary frequencies.
- Remove entry points for interval/profile probes from hardclock().
- Merge dt_prov_profile_enter(), dt_prov_interval_enter(), and
dt_prov_profile_fire() into one function, dt_clock(). This is
the now-unified callback for interval/profile probes. dt_clock()
will consume multiple events during a single execution if it is
delayed, but on platforms with high-quality interrupt clocks this
should be rare (see the sketch after this list).
- Each struct dt_pcb gets its own clockintr handle, dp_clockintr.
- In struct dt_pcb, replace dp_maxtick/dp_nticks with dp_nsecs,
the PCB's sampling period. Asynchronous probes must initialize
dp_nsecs to a non-zero value during dtpv_alloc().
- In struct dt_pcb, replace dp_cpuid with dp_cpu so that
dt_ioctl_record_start() knows where to bind the PCB's
dp_clockintr.
- dt_ioctl_record_start() binds, staggers, and starts all
interval/profile PCBs on the given dt_softc. Each dp_clockintr
is given a reference to its enclosing PCB so that dt_clock()
doesn't need to search for it. The staggering sort-of simulates
the current behavior under hardclock().
- dt_ioctl_record_stop() unbinds all interval/profile PCBs. The
CL_BARRIER ensures that dp_clockintr's PCB reference is not in
use by dt_clock() so that the PCB may be safely freed upon
return from dt_ioctl_record_stop(). Blocking while holding
dt_lock is not ideal, but in practice blocking in this spot is
rare and dt_clock() completes quickly on all but the oldest
hardware. An extremely unlucky thread could block for every
interval/profile PCB on the softc, but this is implausible.
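A sketch of the unified callback (clockrequest API as in the tree;
ring-buffer details elided):

	void
	dt_clock(struct clockrequest *cr, void *cf, void *arg)
	{
		struct dt_pcb *dp = arg;
		uint64_t count;

		count = clockrequest_advance(cr, dp->dp_nsecs);
		while (count-- > 0) {
			/* record one sample per elapsed period */
		}
	}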
DT_FA_PROFILE values are up-to-date for amd64, i386, and macppc.
Somebody with the right hardware needs to check-and-maybe-fix the
values on octeon, powerpc64, and sparc64.
Joint effort with mpi@.
Thread: https://marc.info/?l=openbsd-tech&m=170629371821879&w=2
ok mpi@
|
|
The clockintr_unbind() function cancels any pending execution of the
given clock interrupt object's callback and severs the binding between
the object and its host CPU. Upon return from clockintr_unbind(), the
clock interrupt object may be rebound with a call to clockintr_bind().
The optional CL_BARRIER flag tells clockintr_unbind() to block if the
clockintr's callback function is executing at the moment of the call.
This is useful when the clockintr's arg is a shared reference and the
caller needs to be certain the reference is inactive.
Now that clockintrs can be bound and unbound repeatedly, there is more
room for error. To help catch programmer errors, clockintr_unbind()
sets cl_queue to NULL. Calls to other API functions after a clockintr
is unbound will then fault on a NULL dereference. clockintr_bind()
also KASSERTs that cl_queue is NULL to ensure the clockintr is not
already bound. These checks are not perfect, but they do catch some
common errors.
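The intended lifecycle, as a sketch (argument lists abbreviated):

	/* bind a callback to a CPU and arm it */
	clockintr_bind(&cl, ci, dt_clock, dp);
	clockintr_schedule(&cl, deadline);

	/* cancel; CL_BARRIER waits for a running callback to return,
	 * so `dp' may be freed safely afterwards */
	clockintr_unbind(&cl, CL_BARRIER);

	/* the object may now be rebound, possibly to another CPU */
	clockintr_bind(&cl, other_ci, dt_clock, dp);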
With input from mpi@.
Thread: https://marc.info/?l=openbsd-tech&m=170629367121800&w=2
ok mpi@
|