|
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach().  During the socket(2) syscall
it is ok to wait; this logic was missing for internet PCBs.  Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@
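A minimal sketch of the idea, assuming a hypothetical attach handler
and PCB pool; only the pool_get(9) flags are the real identifiers,
everything named "example" is illustrative:

extern struct pool example_pcb_pool;    /* hypothetical PCB pool */

int
example_attach(struct socket *so, int proto, int wait)
{
        struct example_pcb *pcb;

        /* socket(2) may sleep; sonewconn() in the TCP handshake must not */
        pcb = pool_get(&example_pcb_pool, PR_ZERO |
            (wait ? PR_WAITOK : PR_NOWAIT));
        if (pcb == NULL)
                return (ENOBUFS);
        so->so_pcb = pcb;       /* attach the freshly allocated PCB */
        return (0);
}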
|
|
provide locking of the PCB. If that is possible, use shared instead
of exclusive netlock in soreceive().  The PCB mutex provides a
per-socket lock against multiple soreceive() calls running in parallel.
Release and regrab both locks in sosleep_nsec().
OK mvs@
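Roughly, the locking pattern described above looks like the sketch
below; the rwlock(9)/mutex(9) primitives are real, while the function
and the sleep placeholder only illustrate what the message describes
sosleep_nsec() doing around the sleep:

void
example_receive_block(struct rwlock *netlock, struct mutex *pcb_mtx)
{
        rw_enter_read(netlock);         /* shared netlock is sufficient */
        mtx_enter(pcb_mtx);             /* serializes parallel soreceive() */

        /* ... nothing to receive yet, so we have to sleep ... */

        mtx_leave(pcb_mtx);             /* release both locks before sleeping */
        rw_exit_read(netlock);
        /* sleep here, e.g. via tsleep_nsec(9) */
        rw_enter_read(netlock);         /* regrab both afterwards */
        mtx_enter(pcb_mtx);

        /* ... data may be available now ... */

        mtx_leave(pcb_mtx);
        rw_exit_read(netlock);
}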
|
|
This is helpful for the upcoming split of (*pr_usrreq)() into multiple
handlers, and right now it already makes the code more readable.
Also add an '#ifndef _SYS_SOCKETVAR_H_' guard to sys/socketvar.h.  This
prevents collisions when both sys/protosw.h and sys/socketvar.h are
included together.  Both the 'socket' and 'protosw' structures must be
defined before the pru_*() wrappers, so sys/socketvar.h has to be
included from sys/protosw.h.
ok bluhm@
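The guard itself is the standard pattern, sketched here for clarity:

/* sys/socketvar.h */
#ifndef _SYS_SOCKETVAR_H_
#define _SYS_SOCKETVAR_H_

/* struct socket, struct sockbuf, pru_*() wrappers, ... */

#endif /* _SYS_SOCKETVAR_H_ */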
|
|
Let's try this again now that the kernel locking issue in nfsrv_rcv()
has been fixed.
The previous attempt at this conversion triggered hangs on NFS servers.
This was probably caused by the removal of the kernel-locked section
just prior to the socket upcall. The section had masked a locking error
in NFS code.
|
|
`so_lock' rwlock(9) instead of global `unp_lock' which locks the whole
layer.
The PCBs of unix(4) sockets are linked to each other, and we need to
lock them both.  This introduces a lock ordering problem: while
thread (1) holds the lock on `so1' and tries to lock `so2', thread (2)
could hold the lock on `so2' and be trying to lock `so1'.  To solve
this we always lock sockets in a strict order.
For sockets that are already accessible from userland, we always lock
the socket with the smallest memory address first.  Sometimes we need
to unlock a socket before locking its peer and then lock it again.
We use reference counters to prevent destruction of the connected peer
during the relock.  We also handle the case where the peer socket was
replaced by another socket.
For newly connected sockets, which are not yet exported to userland
by accept(2), we always lock the listening socket `head' first.  This
allows us to avoid an unwanted relock within the accept(2) syscall.
ok claudio@
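A simplified sketch of the ordering rule for sockets already visible to
userland; solock()/sounlock() are shown in a simplified single-argument
form and the reference helpers are hypothetical names, not the actual
unp code:

void
example_lock_peer(struct socket *so, struct socket *so2)
{
        if (so < so2) {
                /* already in address order, just take the peer's lock */
                solock(so2);
                return;
        }
        /*
         * Wrong order: take a reference so the peer cannot be
         * destroyed, drop our lock, lock both in address order, then
         * the caller re-checks that so2 is still the connected peer
         * (it may have been replaced while we were unlocked).
         */
        example_ref(so2);
        sounlock(so);
        solock(so2);
        solock(so);
        example_unref(so2);
}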
|
|
OK mpi@
|
|
for the lock operation and to pass a value to the unlock operation.
sofree() still needs an extra flag to know whether sounlock() should be
called or not.  But sofree() is called less often and mostly without
holding the lock.
OK mpi@ mvs@
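The resulting calling convention reads roughly like this sketch; the
int return value is an assumption for illustration only:

void
example_touch_socket(struct socket *so)
{
        int s;

        s = solock(so);         /* the lock operation returns a value... */
        /* ... inspect or modify the socket ... */
        sounlock(so, s);        /* ...that is passed back to unlock */
}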
|
|
The commit caused hangs with NFS.
Reported by ajacoutot@ and naddy@
|
|
OK mpi@
|
|
Revert the pr_usrreqs move: syzkaller found a NULL pointer deref
and I won't be available to monitor for follow-up issues for a bit.
|
|
then be shared among protosw structures, following the same basic
direction as NetBSD and FreeBSD for this.
Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the
proper prototype to eliminate the previously necessary casts.
ok mvs@ bluhm@
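A hedged sketch of what such a dedicated handler slot looks like; the
exact argument list here is an assumption based on the usual
ioctl-style control path, not quoted from the tree:

struct pr_usrreqs_example {
        /* typed control handler instead of the generic mbuf-pointer API */
        int     (*pru_control)(struct socket *, u_long /* cmd */,
                    caddr_t /* data */, struct ifnet *);
        /* ... further pru_*() handlers ... */
};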
|
|
|
|
previously sbchecklowmem() (and sonewconn()) would look at the mbuf
and mbuf cluster pools to see if they were approaching their hard
limits. based on how many mbufs/clusters were allocated against the
limits, socket operations would start to fail with ENOBUFS until
utilisation went down.
mbufs and clusters have changed a lot since then though. there are
now many mbuf cluster pools, not just one for 2k clusters. because
of this the mbuf layer now limits the amount of memory all the mbuf
pools can allocate backend pages from rather than limit the individual
pools. this means sbchecklowmem() ends up looking at the default
pool hard limit, which is UINT_MAX, which in turn means
sbchecklowmem() probably never applies backpressure. this is made
worse on multiprocessor systems where per cpu caches of mbuf and
cluster pool items are enabled because the number of in use pool
items is distorted by the cpu caches.
this switches sbchecklowmem to looking at the page allocations made
by all the pools instead. the big benefit of this is that the page
allocations are much more representative of the overall mbuf memory
usage in the system. the downside is that the backend page
allocation accounting does not see idle memory held by pools. pools
cannot release partially free pages to the page backend (obviously),
and pools cache idle items to avoid thrashing on the backend page
allocator. this means the page allocation level is higher than the
memory used by actual in-flight mbufs.
however, this can also be a benefit. the backend page allocation is a
kind of smoothed out "trend" line. mbuf utilisation over short periods
can be extremely bursty because of things like rx ring dequeue and fill
cycles, or large socket sends. if you're trying to grow socket
buffers while these things are happening, luck becomes an important
factor in whether it will work or not. because pools cache idle items,
the backend page utilisation better represents the overall trend
of activity in the system and will give more consistent behaviour here.
this diff is deliberately simple. we're basically going from "no
limits" to "some sort of limit" for sockets again, so keeping the
code simple means it should be easy to understand and tweak in the
future.
ok djm@ visa@ claudio@
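a minimal sketch of the new style of check, with hypothetical counters
standing in for the real page accounting and an illustrative 3/4
threshold:

extern u_long example_mbuf_pages_used;         /* pages handed to mbuf pools */
extern u_long example_mbuf_pages_limit;        /* cap on those pages */

int
example_sbchecklowmem(void)
{
        /* start refusing socket buffer growth before the cap is hit */
        return (example_mbuf_pages_used >=
            example_mbuf_pages_limit * 3 / 4);
}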
|
|
This makes witness(4) use a single lock type for tracking so_lock.
Previously, so_lock was covered by two distinct lock types because there
were separate rw_init() initializers in socreate() and sonewconn().
OK kettenis@
|
|
ok sashan@
|
|
The filterops instances already provide f_modify and f_process
callbacks with proper internal locking. Locking of socket klists
has been the missing detail for MP-safety.
OK mpi@
|
|
per-socket locking.
No functional change.
|
|
ok mvs@
|
|
This socket flag was redundant with the socket buffer one.
ok mvs@
|
|
done because we have no cases where one thread should lock two sockets
simultaneously.
tested by yasuoka@
ok bluhm@ markus@
|
|
(PF_ROUTE) sockets. This can be done because we have no cases where one
thread should lock two sockets simultaneously.
Compared to the previous version, rtm_senddesync_timer() execution was
moved to process context.
Also, this time `so_lock' is used for routing sockets only, but in the
future it will be used for other socket types too.
tested by claudio@
ok claudio@ bluhm@
|
|
(PF_ROUTE) sockets. There is a locking issue with timeouts that needs
to be fixed.
Requested by deraadt@
|
|
(PF_ROUTE) sockets. This can be done because we have no cases where one
thread should lock two sockets simultaneously.
Also, this time `so_lock' is used for routing sockets only, but in the
future it will be used for other socket types too.
ok bluhm@
|
|
ok bluhm@
|
|
used as solock()'s backend to protect the whole layer.
With feedback from mpi@.
ok bluhm@ claudio@
|
|
them in line with sbappendstream() and sbappendrecord().
Agreed by mpi@
|
|
The 3 subsystems: signal, poll/select and kqueue can now be addressed
separately.
Note that bpf(4) and audio(4) currently delay the wakeups to a separate
context in order to respect the KERNEL_LOCK() requirement. Sockets (UDP,
TCP) and pipes spin to grab the lock for the same reasons.
ok anton@, visa@
|
|
Introduce and use TIMEVAL_TO_NSEC() to convert SO_RCVTIMEO/SO_SNDTIMEO
specified values into nanoseconds. As a side effect it is now possible
to specify a timeout larger than (USHRT_MAX / 100) seconds.
To keep code simple `so_linger' now represents a number of seconds with
0 meaning no timeout or 'infinity'.
Yes, the 0 -> INFSLP API change makes conversions complicated as many
timeout holders are still memset()'d.
Inputs from cheloha@ and bluhm@, ok bluhm@
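For illustration, the conversion amounts to the following arithmetic;
this is a standalone helper, not the actual TIMEVAL_TO_NSEC()
definition, and UINT64_MAX stands in for the kernel's INFSLP:

#include <sys/time.h>
#include <stdint.h>

uint64_t
example_timeval_to_nsec(const struct timeval *tv)
{
        /* 0 keeps its old meaning of "no timeout", i.e. sleep forever */
        if (tv->tv_sec == 0 && tv->tv_usec == 0)
                return (UINT64_MAX);
        return ((uint64_t)tv->tv_sec * 1000000000ULL +
            (uint64_t)tv->tv_usec * 1000ULL);
}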
|
|
condition in sbcompress().  Currently the actual cluster size might
be 9KB even if the MTU is 1500; in this case a lot of memory space is
wasted, since sbcompress() doesn't compress because of the previous
condition.
ok dlg claudio
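A quick worked example of the waste in question, using the numbers
cited above (constants are illustrative):

#include <stdio.h>

int
main(void)
{
        const int cluster = 9 * 1024;   /* large mbuf cluster */
        const int mtu = 1500;           /* typical ethernet payload */

        /* without compression each small packet pins a whole cluster */
        printf("wasted per packet: %d of %d bytes (%.0f%%)\n",
            cluster - mtu, cluster, 100.0 * (cluster - mtu) / cluster);
        return (0);
}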
|
|
this makes it easier to call since you don't have to cast to caddr_t
if it's a void *. this also changes a size argument from int to
size_t.
ok claudio@
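the shape of the change, sketched with a hypothetical function (the
names are illustrative):

#include <sys/types.h>

/* before: callers of a caddr_t/int interface needed casts */
int     example_copy_old(caddr_t buf, int len);

/* after: void * accepts any object pointer without a cast, and
 * size_t rules out negative lengths */
int     example_copy(void *buf, size_t len);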
|
|
OK mpi@
|
|
m_leadingspace() and m_trailingspace().  Convert all callers to call
the functions directly and remove the defines.
OK krw@, mpi@
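The caller-side conversion looks like this sketch; m_trailingspace()
is the real function, the surrounding caller is hypothetical:

int
example_append(struct mbuf *m, const void *data, int len)
{
        /* previously: if (M_TRAILINGSPACE(m) < len) ... */
        if (m_trailingspace(m) < len)
                return (0);
        memcpy(mtod(m, caddr_t) + m->m_len, data, len);
        m->m_len += len;
        return (len);
}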
|
|
back rev 1.90.
----
mbufs and mbuf clusters are now backed by large pools. Because of this
we can relax the oversubscribe limit of socketbuffers a fair bit.
Instead of maxing out at sb_max * 1.125 or 2 * sb_hiwat, the maximum is
increased to 8 * sb_hiwat -- which seems to be a good compromise between
memory waste and better socket buffer usage.
OK deraadt@
----
ok benno@
|
|
variables can be declared constant.
OK claudio@ mpi@
|
|
Instead introduce two flags to deal with global lock recursion.  This
is necessary until we get a per-socket lock.
Req. by and ok visa@
|
|
locking.
ok visa@, bluhm@
|
|
...and release it in sounlock().  This will allow us to progressively
remove the KERNEL_LOCK() in syscalls.
ok visa@ some time ago
|
|
AF_UNIX is both the historical _and_ standard name, so prefer and recommend
it in the headers, manpages, and kernel.
ok miller@ deraadt@ schwarze@
|
|
Requested by claudio@
|
|
we can relax the oversubscribe limit of socketbuffers a fair bit.
Instead of maxing out at sb_max * 1.125 or 2 * sb_hiwat, the maximum is
increased to 8 * sb_hiwat -- which seems to be a good compromise between
memory waste and better socket buffer usage.
OK deraadt@
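In effect the cap on mbuf memory charged to a socket buffer moves
along these lines; an illustrative assignment, not the exact
sbreserve() code:

void
example_reserve(struct sockbuf *sb, u_long hiwat)
{
        sb->sb_hiwat = hiwat;
        /* old cap: roughly min(sb_max * 1.125, 2 * hiwat) */
        sb->sb_mbmax = 8 * hiwat;
}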
|
|
ok millert@ krw@
|
|
SB_KNOTE remains the only bit set on `sb_flagsintr' as it is set/unset in
contexts related to kqueue(2) where we'd like to avoid grabbing solock().
While here add some KERNEL_LOCK()/UNLOCK() dances around selwakeup() and
csignal() to mark which remaining functions need to be addressed in the
socket layer.
ok visa@, bluhm@
|
|
KERNEL_LOCK(), so change asserts accordingly.
This is now possible since sblock()/sbunlock() are always called with
the socket lock held.
ok bluhm@, visa@
|
|
Tested by Hrvoje Popovski, ok bluhm@
|
|
selwakeup().
ok bluhm@
|
|
with the socket lock.
This change is safe because sbreserve() already asserts that the lock is
held, but it acts as implicit documentation and indicates that I looked
at the function.
|
|
Implicitly protects `so_state' with the socket lock in sosend().
ok visa@, bluhm@
|
|
ok bluhm@, visa@
|
|
ok bluhm@, visa@
|
|
While here, document an abuse of the parent socket's lock.
Problem reported by krw@, analysis and ok bluhm@
|