src - OpenBSD base system

Age	Commit message	Author
2024-02-13	Merge struct route and struct route_in6.	Alexander Bluhm
	Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embedded there. OK claudio@
2024-02-11	Use `sb_mtx' instead of `inp_mtx' in receive path for inet sockets.	Vitaliy Makkoveev
	In soreceive(), we only touch the `so_rcv' socket buffer, which has its own `sb_mtx' mutex(9) for protection. So, we can avoid solock() in this path - it's enough to hold `sb_mtx' in soreceive() and around the corresponding sbappend*(). But not right now :) This time we use shared netlock for some inet sockets in the soreceive() path. To protect the `so_rcv' buffer we use the `inp_mtx' mutex(9) and pru_lock() to acquire this mutex(9) in the socket layer. But the `inp_mtx' mutex belongs to the PCB. We initialize the socket before the PCB, and tcp(4) sockets could exist without a PCB, so use the `sb_mtx' mutex(9) to protect sockbuf stuff. This diff mechanically replaces `inp_mtx' by `sb_mtx' in the receive path, only for sockets which already use `inp_mtx'. All other sockets are left as is. They will be converted later. Since `sb_mtx' is optional, the new SB_MTXLOCK flag is introduced. If this flag is set in `sb_flags', the `sb_mtx' mutex(9) should be taken. New sb_mtx_lock() and sb_mtx_unlock() functions were introduced to hide this check. They are temporary and will be replaced by mtx_enter() when this whole area has been converted to the `sb_mtx' mutex(9). Also, the new sbmtxassertlocked() function is introduced to throw the corresponding assertion for SB_MTXLOCK marked buffers. This time only sbappendaddr() calls it. This function is also temporary and will be replaced by MTX_ASSERT_LOCKED() later. ok bluhm
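The optional-mutex check described above can be pictured with a small userland sketch. Nothing here is the kernel code: the struct layout, the flag value, the toy lock (a plain marker, not a real mutex(9)) and every name ending in _sketch are assumptions made for illustration only.

```c
/* Hypothetical flag value; the real SB_MTXLOCK value is not shown in the log. */
#define SB_MTXLOCK_SKETCH 0x01

/* Toy stand-in for mutex(9): it only records whether it is held. */
struct toy_mtx {
	int held;
};

struct sockbuf_sketch {
	int sb_flags;		/* SB_MTXLOCK_SKETCH set on converted buffers */
	struct toy_mtx sb_mtx;	/* taken only when the flag is set */
};

static void
toy_mtx_enter(struct toy_mtx *m)
{
	m->held = 1;
}

static void
toy_mtx_leave(struct toy_mtx *m)
{
	m->held = 0;
}

/* Wrapper that hides the flag check, so callers need not know
 * whether this buffer has been converted yet. */
static void
sb_mtx_lock_sketch(struct sockbuf_sketch *sb)
{
	if (sb->sb_flags & SB_MTXLOCK_SKETCH)
		toy_mtx_enter(&sb->sb_mtx);
}

static void
sb_mtx_unlock_sketch(struct sockbuf_sketch *sb)
{
	if (sb->sb_flags & SB_MTXLOCK_SKETCH)
		toy_mtx_leave(&sb->sb_mtx);
}
```

Once every buffer carries the flag, wrappers of this shape become plain lock calls, which matches the commit's note that they are temporary.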
2024-02-03	Rework socket buffers locking for shared netlock.	Vitaliy Makkoveev
	Shared netlock is not sufficient to call so{r,w}wakeup(). The following sowakeup() modifies `sb_flags' and knote(9) stuff. Unfortunately, we can't call so{r,w}wakeup() with `inp_mtx' mutex(9) because sowakeup() also calls pgsigio() which grabs kernel lock. However, `so_filtops' callbacks only perform read-only access to the socket stuff, so it is enough to hold shared netlock only, but the klist stuff needs to be protected. This diff introduces `sb_mtx' mutex(9) to protect sockbuf. This time `sb_mtx' is used to protect only `sb_flags' and `sb_klist'. Now we have soassertlocked_readonly() and soassertlocked(). The first one is happy if only shared netlock is held, meanwhile the second wants `so_lock' or pru_lock() to be held together with shared netlock. To keep soassertlocked() assertions soft, we need to know mutex(9) state, so the new mtx_owned() macro was introduced. Also, the new optional (*pru_locked)() handler brings the state of pru_lock(). Tests and ok from bluhm.
2024-01-21	Assert that inpcb table has correct address family.	Alexander Bluhm
	Since inpcb tables for UDP and Raw IP have been split into IPv4 and IPv6, assert that INP_IPV6 flag is correct instead of checking it. While there, give the table variable a nicer name. OK sashan@ mvs@
2023-12-15	Use inpcb table mutex to set addresses.	Alexander Bluhm
	Protect all remaining write access to inp_faddr and inp_laddr with inpcb table mutex. Document inpcb locking for foreign and local address and port and routing table id. Reading will be made MP safe by adding per socket rw-locks in a next step. OK sashan@ mvs@
2023-12-03	Rename all in6p local variables to inp.	Alexander Bluhm
	There exists no struct in6pcb in OpenBSD, this was an old kame idea. Calling the local variable in6p does not make sense, it is actually a struct inpcb. Also in6p is not used consistently in inet6 code. Having the same convention for IPv4 and IPv6 is less confusing. OK sashan@ mvs@
2023-12-01	Make internet PCB connect more consistent.	Alexander Bluhm
	The public interface is in_pcbconnect(). It dispatches to in6_pcbconnect() if necessary. Call the former from tcp_connect() and udp_connect(). In in6_pcbconnect() initialization in6a = NULL is not necessary. in6_pcbselsrc() sets the pointer, but does not read the value. Pass a constant in6_addr pointer to in6_pcbselsrc() and in6_selectsrc(). It returns a reference to the address of some internal data structure. We want to be sure that in6_addr is not modified this way. IPv4 in_pcbselsrc() solves this by passing a copy of the address. OK kn@ sashan@ mvs@
2023-11-28	Remove struct inpcb from in6_embedscope() parameters.	Alexander Bluhm
	rip6_output() did modify inp_outputopts6 temporarily to provide different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6 and inp_moptions6 as separate arguments to in6_embedscope(). Simplify the code that deals with these options in in6_embedscope(). Document inp_moptions and inp_moptions6 as protected by net lock. OK kn@
2023-11-26	Remove inp parameter from ip_output().	Alexander Bluhm
	ip_output() received inp as parameter. This is only used to lookup the IPsec level of the socket. Reasoning about MP locking is much easier if only relevant data is passed around. Convert ip_output() to receive constant inp_seclevel as argument and mark it as protected by net lock. OK mvs@
2023-09-16	Allow counters_read(9) to take an optional scratch buffer.	Martin Pieuchot
	Using a scratch buffer makes it possible to take a consistent snapshot of per-CPU counters without having to allocate memory. Makes ddb(4) show uvmexp command work in OOM situations. ok kn@, mvs@, cheloha@
2023-01-22	Move SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to `sb_state' of	Vitaliy Makkoveev
	receive buffer. As was done for the SS_CANTSENDMORE bit, the definitions are kept as is, but now these bits belong to the `sb_state' of the receive buffer. `sb_state' is ORed with `so_state' when socket data is exported to userland. ok bluhm@
2022-10-17	Change pru_abort() return type to the type of void and make pru_abort()	Vitaliy Makkoveev
	optional. We have no interest in the pru_abort() return value. We call it only from soabort(), which is a dummy pru_abort() wrapper and has no return value. Only the connection oriented sockets need to implement the (*pru_abort)() handler. Such sockets are tcp(4) and unix(4) sockets, so remove the existing code for all others, as it is never called. ok guenther@
2022-10-03	System calls should not fail due to temporary memory shortage in	Alexander Bluhm
	malloc(9) or pool_get(9). Pass down a wait flag to pru_attach(). During syscall socket(2) it is ok to wait, this logic was missing for internet pcb. Pfkey and route sockets were already waiting. sonewconn() must not wait when called during TCP 3-way handshake. This logic has been preserved. Unix domain stream socket connect(2) can wait until the other side has created the socket to accept. OK mvs@
2022-09-13	Do soreceive() with shared netlock for raw sockets.	Vitaliy Makkoveev
	ok bluhm@
2022-09-03	Move PRU_PEERADDR request to (*pru_peeraddr)().	Vitaliy Makkoveev
	Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets, except tcp(4) case. Also remove *_usrreq() handlers. ok bluhm@
2022-09-03	Move PRU_SOCKADDR request to (*pru_sockaddr)()	Vitaliy Makkoveev
	Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4) inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability. The key management and route domain sockets return EINVAL error for PRU_SOCKADDR request, so keep this behaviour for a while instead of making the pru_sockaddr handler optional and returning EOPNOTSUPP. ok bluhm@
2022-09-02	Move PRU_CONTROL request to (*pru_control)().	Vitaliy Makkoveev
	The 'proc *' arg is not used for PRU_CONTROL request, so remove it from pru_control() wrapper. Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for inet6 case. ok guenther@ bluhm@
2022-09-01	Move PRU_CONNECT2 request to (*pru_connect2)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-31	Move PRU_SENDOOB request to (*pru_sendoob)().	Vitaliy Makkoveev
	PRU_SENDOOB request always consumes passed `top' and `control' mbufs. To avoid dummy m_freem(9) handlers for all protocols release passed mbufs in the pru_sendoob() EOPNOTSUPP error path. Also fix `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path. ok bluhm@
2022-08-30	Refactor internet PCB lookup function. Rename in_pcbhashlookup()	Alexander Bluhm
	so the public API is in_pcblookup() and in_pcblookup_listen(). For internal use introduce in_pcbhash_insert() and in_pcbhash_lookup() to avoid code duplication. Routing domain is unsigned, change the type to u_int. OK mvs@
2022-08-29	Move PRU_RCVOOB request to (*pru_rcvoob)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-28	Move PRU_SENSE request to (*pru_sense)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-28	Move PRU_ABORT request to (*pru_abort)().	Vitaliy Makkoveev
	We abort only the sockets which are linked to `so_q' or `so_q0' queues of listening socket. Such sockets have no corresponding file descriptor and are not accessed from userland, so PRU_ABORT is used to destroy them on listening socket destruction. Currently all our sockets support the PRU_ABORT request, but actually it is required only for tcp(4) and unix(4) sockets, so it should be optional. However, they will be removed with a separate diff, and this time PRU_ABORT requests were converted as is. Also, the socket should be destroyed on PRU_ABORT request, but route and key management sockets leave it alive. This was also converted as is, because this wrong code is never called. ok bluhm@
2022-08-27	Move PRU_SEND request to (*pru_send)().	Vitaliy Makkoveev
	The former PRU_SEND error path of gre_usrreq() had a `control' mbuf(9) leak. It was fixed in the new gre_send(). The former pfkeyv2_send() was renamed to pfkeyv2_dosend(). ok bluhm@
2022-08-26	Move PRU_RCVD request to (*pru_rcvd)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Move PRU_SHUTDOWN request to (*pru_shutdown)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Move PRU_DISCONNECT request to (*pru_disconnect)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Use rwlock per inpcb table to protect notify list. The notify	Alexander Bluhm
	function may sleep, so holding a mutex is not possible. The same list entry and rwlock is used for UDP multicast and raw IP delivery. By adding a write lock, exclusive netlock is no longer necessary for PCB notify and UDP and raw IP input. OK mvs@
2022-08-22	Move PRU_ACCEPT request to (*pru_accept)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Move PRU_CONNECT request to (*pru_connect)() handler.	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Move PRU_LISTEN request to (*pru_listen)() handler.	Vitaliy Makkoveev
	ok bluhm@
2022-08-20	Move PRU_BIND request to (*pru_bind)() handler.	Vitaliy Makkoveev
	For the protocols which don't support the request, leave the handler NULL. Do the NULL check within the corresponding pru_() wrapper and return EOPNOTSUPP in such a case. This will be done for all upcoming user request handlers. ok bluhm@ guenther@
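The NULL-handler convention described above can be sketched in userland C as follows. This is an illustration only: the struct names, the handler signature and the demo handler are assumptions and do not reproduce the kernel's pr_usrreqs layout.

```c
#include <errno.h>
#include <stddef.h>

struct socket_sketch;

/* Hypothetical, cut-down handler table with a single request. */
struct pr_usrreqs_sketch {
	int (*pru_bind)(struct socket_sketch *, const void *);
};

struct socket_sketch {
	const struct pr_usrreqs_sketch *so_usrreqs;
	int bound;	/* set by the demo handler below */
};

/* Demo handler for a protocol that supports bind. */
static int
demo_bind_sketch(struct socket_sketch *so, const void *nam)
{
	(void)nam;	/* address is ignored in this sketch */
	so->bound = 1;
	return 0;
}

/* One protocol implements the request, the other leaves it NULL. */
static const struct pr_usrreqs_sketch with_bind_sketch = {
	demo_bind_sketch
};
static const struct pr_usrreqs_sketch without_bind_sketch = {
	NULL
};

/* Wrapper: the NULL check lives here, so protocols without
 * the handler need no dummy function. */
static int
pru_bind_sketch(struct socket_sketch *so, const void *nam)
{
	if (so->so_usrreqs->pru_bind == NULL)
		return EOPNOTSUPP;
	return (*so->so_usrreqs->pru_bind)(so, nam);
}
```

Keeping the check in the wrapper means callers never dereference the table directly, so a protocol opts out of a request simply by leaving the slot empty.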
2022-08-15	Introduce 'pr_usrreqs' structure and move existing user-protocol	Vitaliy Makkoveev
	handlers into it. We want to split existing (pr_usrreq)() to multiple short handlers for each PRU_ request as it was already done for PRU_ATTACH and PRU_DETACH. This is the preparation step, (pr_usrreq)() split will be done with the following diffs. Based on reverted diff from guenther@. ok bluhm@
2022-08-08	To make protocol input functions MP safe, internet PCB need protection.	Alexander Bluhm
	Use their reference counter in more places. The in_pcb lookup functions hold the PCBs in hash tables protected by table->inpt_mtx mutex. Whenever a result is returned, increment the ref count before releasing the mutex. Then the inp can be used as long as necessary. Unref it at the end of all functions that call in_pcb lookup. As a shortcut, pf may also hold a reference to the PCB. When pf_inp_lookup() returns it, it also increments the ref count and the caller can handle it like the inp from table lookup. OK sashan@
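The lookup rule above (take the reference while the table mutex is still held, drop it when the caller is done) might look roughly like this in a single-threaded userland sketch. The fixed-size table, the marker used in place of a real mutex, and all names ending in _sketch are assumptions for illustration.

```c
#include <stddef.h>

struct inpcb_sketch {
	unsigned short lport;	/* lookup key in this sketch */
	unsigned int refcnt;	/* the table itself holds one reference */
};

struct inpcbtable_sketch {
	int mtx_held;			/* marker standing in for inpt_mtx */
	struct inpcb_sketch *slots[4];	/* toy table, not a real hash */
};

static struct inpcb_sketch *
in_pcbref_sketch(struct inpcb_sketch *inp)
{
	if (inp != NULL)
		inp->refcnt++;
	return inp;
}

/* Returns 1 when the last reference was dropped, i.e. when
 * real code would free the PCB; 0 otherwise. */
static int
in_pcbunref_sketch(struct inpcb_sketch *inp)
{
	if (inp == NULL)
		return 0;
	return (--inp->refcnt == 0);
}

static struct inpcb_sketch *
in_pcblookup_sketch(struct inpcbtable_sketch *t, unsigned short lport)
{
	struct inpcb_sketch *found = NULL;
	size_t i;

	t->mtx_held = 1;			/* enter table mutex */
	for (i = 0; i < 4; i++) {
		if (t->slots[i] != NULL && t->slots[i]->lport == lport) {
			/* Reference is taken before the mutex is released. */
			found = in_pcbref_sketch(t->slots[i]);
			break;
		}
	}
	t->mtx_held = 0;			/* leave table mutex */
	return found;
}
```

A caller pairs every successful lookup with one unref at the end of the function, as the commit message describes.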
2022-08-06	Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and	Alexander Bluhm
	NET_RLOCK_IN_IOCTL, which have the same implementation. The R and W are hard to see, call the new macro NET_LOCK_SHARED. Rename the opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE. Update some outdated comments about net locking. OK mpi@ mvs@
2022-03-23	For raw IPv6 packets rip6_input() traverses the loop of all PCBs.	Alexander Bluhm
	From there it calls sbappendaddr() while holding the raw6 table mutex. This ends in sorwakeup() where we finally grab the kernel lock while holding a mutex. Witness detects this misuse. Use the same solution as for PCB notify. Collect the affected PCBs in a temporary list. The list is protected by exclusive net lock. Reported-by: syzbot+5b2679ee9be0895d26f9@syzkaller.appspotmail.com OK claudio@
2022-03-22	Extract the type from the ICMP6 header before looping over Raw IPv6	Alexander Bluhm
	PCBs. This makes mutex and error handling easier. OK claudio@
2022-03-21	Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex	Alexander Bluhm
	for PCB tables. It does not break userland build anymore. pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in inp_notify list and protect it with exclusive netlock. OK sashan@
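The "collect first, notify later" approach mentioned above can be illustrated with a toy userland sketch: matching PCBs are linked onto a temporary list while the table mutex is held, and the callback that may sleep runs only after the mutex is released. The data layout and all names are hypothetical, and the exclusive netlock that protects the real list is not modelled.

```c
#include <stddef.h>

struct pcb_sketch {
	int matches;			/* would this PCB be notified? */
	int notified;			/* set by the callback */
	struct pcb_sketch *next_notify;	/* temporary list linkage */
};

struct table_sketch {
	int mtx_held;			/* marker standing in for the table mutex */
	struct pcb_sketch *pcbs[4];
	int notify_saw_mutex;		/* records a callback run under the mutex */
};

/* Stand-in for a notify function that may sleep. */
static void
notify_sketch(struct table_sketch *t, struct pcb_sketch *p)
{
	if (t->mtx_held)
		t->notify_saw_mutex = 1;
	p->notified = 1;
}

/* Returns the number of PCBs notified. */
static int
table_notify_sketch(struct table_sketch *t)
{
	struct pcb_sketch *head = NULL, *p;
	size_t i;
	int n = 0;

	/* Phase 1: walk the table under the mutex, only linking entries. */
	t->mtx_held = 1;
	for (i = 0; i < 4; i++) {
		p = t->pcbs[i];
		if (p == NULL || !p->matches)
			continue;
		p->next_notify = head;
		head = p;
	}
	t->mtx_held = 0;

	/* Phase 2: run the callbacks with the mutex released. */
	for (p = head; p != NULL; p = p->next_notify) {
		notify_sketch(t, p);
		n++;
	}
	return n;
}
```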
2022-03-14	Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul	Theo Buehler
	This reverts the commit protecting the list and hashes in the PCB tables with a mutex since the build of sysctl(8) breaks, as found by kettenis. ok sthen
2022-03-14	pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To	Alexander Bluhm
	run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in inp_notify list and protect it with exclusive netlock. OK sashan@
2022-03-02	The return value of in6_pcbnotify() is never used. Make it a void	Alexander Bluhm
	function. OK gnezdo@ mvs@ florian@ sashan@
2022-02-25	Reported-by: syzbot+1b5b209ce506db4d411d@syzkaller.appspotmail.com	Philip Guenther
	Revert the pr_usrreqs move: syzkaller found a NULL pointer deref and I won't be available to monitor for followup issues for a bit
2022-02-25	Move pr_attach and pr_detach to a new structure pr_usrreqs that can	Philip Guenther
	then be shared among protosw structures, following the same basic direction as NetBSD and FreeBSD for this. Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the proper prototype to eliminate the previously necessary casts. ok mvs@ bluhm@
2022-02-21	futther -> further	Jonathan Gray

2020-08-02	Add missing rtable(4) check in rip6_input()	kn
	Copied over from sys/netinet/raw_ip.c:rip_input() where it appeared with initial support for multiple routing tables. This enforces separation between multiple raw sockets in different routing tables, i.e. one must not see packets from the other if the rtable differs. Observed with ping6(8)'s "-v" showing all ICMPv6 packets on its raw socket including those produced by another ping6 with "-V1". florian reported IPv6 route advertisements in one routing table appearing on raw sockets in other routing tables as well. OK claudio florian
2019-11-29	add __func__ to panic() and printf() calls in sys/netinet6/*	Nayden Markatchev
	ok benno@ mortimer@
2019-04-23	For raw IPv6 sockets userland may specify an offset where the	Alexander Bluhm
	checksum field is located. During rip6 input and output make sure that this field is within the packet. The offset may be -1 to disable the feature, otherwise it must be non-negative and aligned. Do a stricter check during setsockopt(2). from FreeBSD; OK claudio@
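A minimal sketch of the kind of check described above, under stated assumptions: the function name is hypothetical, "aligned" is taken to mean 2-byte alignment because the checksum is a 16-bit field, and the packet is represented only by its length. The actual kernel checks may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the checksum offset is acceptable for a packet of
 * pktlen bytes, 0 otherwise. An offset of -1 means the feature is
 * disabled and is always accepted. */
static int
cksum_offset_ok_sketch(int off, size_t pktlen)
{
	if (off == -1)
		return 1;	/* feature disabled */
	if (off < 0)
		return 0;	/* any other negative value is invalid */
	if (off % 2 != 0)
		return 0;	/* assumption: 16-bit alignment required */
	if ((size_t)off + sizeof(uint16_t) > pktlen)
		return 0;	/* checksum field must lie within the packet */
	return 1;
}
```

The negative and alignment checks depend only on the offset, so they could be done once at setsockopt(2) time, while the bounds check needs the length of each packet.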
2019-04-20	Statistics of "netstat -s -f inet6 -p rip6" did not work. In	Alexander Bluhm
	rip6_sysctl_rip6stat() copy out rip6counters, not ip6counters. OK deraadt@ claudio@
2019-02-04	Avoid an mbuf double free in the oob soreceive() path. In the	Alexander Bluhm
	usrreq functions move the mbuf m_freem() logic to the release block instead of distributing it over the switch statement. Then the goto release in the initial check, whether the pcb still exists, will not free the mbuf for the PRU_RCVD, PRU_RCVOOB, PRU_SENSE commands. OK claudio@ mpi@ visa@ Reported-by: syzbot+8e7997d4036ae523c79c@syzkaller.appspotmail.com
2018-11-09	Remove the last few XXX rdomain markers. Even those functions respect the	Claudio Jeker
	rdomain now and are therefore rdomain safe. OK mpi@