src - OpenBSD base system

Age	Commit message	Author
2024-08-16	Introduce PR_MPSYSCTL flag to mark mp-safe (*pr_sysctl)() handlers and	Vitaliy Makkoveev
	unlock both divert_sysctl() and divert6_sysctl(). Unlock them together, because they are identical and pretty simple: - DIVERTCTL_RECVSPACE and DIVERTCTL_SENDSPACE - atomically accessed integers; - DIVERTCTL_STATS - per-CPU counters; ok bluhm
2024-07-12	Remove internet PCB mutex.	Alexander Bluhm
	All inpcb locking has been converted to socket receive buffer mutex. Per PCB mutex inp_mtx is not needed anymore. Also delete PRU related locking functions. A flag PR_MPSOCKET indicates whether protocol functions support parallel access with per socket rw-lock. TCP is the only protocol that is not MP capable from the socket layer and needs exclusive netlock. OK mvs@
2024-03-05	Validate IPv4 packet options in divert output.	Alexander Bluhm
	When sending raw packets over divert socket, IP options were not validated. Fragment code tries to copy them and crashes. Raw IP output has a similar feature, but uses rip_chkhdr() to prevent invalid packets from userland. Call this function also from divert_output() for strict user input validation. Reported-by: syzbot+b1ba3a2a8ef13e5b4698@syzkaller.appspotmail.com OK dlg@ deraadt@ mvs@
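The kind of validation described above can be illustrated with a minimal userland sketch. It is not rip_chkhdr() and does not claim to reproduce its checks; `sketch_ip_options_ok()` is a hypothetical name. It assumes the standard IPv4 option layout (type 0 ends the list, type 1 is a one-byte no-op, every other option carries a length byte that counts the type and length bytes themselves).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: return 1 if the IPv4 header in
 * pkt[0..len) has a sane header length and well-formed options, else 0.
 */
int
sketch_ip_options_ok(const uint8_t *pkt, size_t len)
{
	size_t hlen, off;

	if (len < 20)				/* shorter than a bare header */
		return (0);
	if ((pkt[0] >> 4) != 4)			/* version nibble must be 4 */
		return (0);
	hlen = (size_t)(pkt[0] & 0x0f) * 4;	/* header length in bytes */
	if (hlen < 20 || hlen > len)
		return (0);

	for (off = 20; off < hlen; ) {
		uint8_t type = pkt[off];
		uint8_t optlen;

		if (type == 0)			/* end of option list */
			break;
		if (type == 1) {		/* no-op, single byte */
			off++;
			continue;
		}
		if (off + 1 >= hlen)		/* no room for the length byte */
			return (0);
		optlen = pkt[off + 1];
		if (optlen < 2 || off + optlen > hlen)
			return (0);		/* option overruns the header */
		off += optlen;
	}
	return (1);
}
```

Rejecting a packet before any code tries to copy its options is the point of the sketch: a bogus option length is caught while it is still only a number.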
2024-02-11	Use `sb_mtx' instead of `inp_mtx' in receive path for inet sockets.	Vitaliy Makkoveev
	In soreceive(), we only touch `so_rcv' socket buffer, which has its own `sb_mtx' mutex(9) for protection. So, we can avoid solock() in this path - it's enough to hold `sb_mtx' in soreceive() and around corresponding sbappend*(). But not right now :) This time we use shared netlock for some inet sockets in the soreceive() path. To protect `so_rcv' buffer we use `inp_mtx' mutex(9) and the pru_lock() to acquire this mutex(9) in socket layer. But the `inp_mtx' mutex belongs to the PCB. We initialize socket before PCB, tcp(4) sockets could exist without PCB, so use `sb_mtx' mutex(9) to protect sockbuf stuff. This diff mechanically replaces `inp_mtx' by `sb_mtx' in the receive path. Only for sockets which already use `inp_mtx'. All other sockets are left as is. They will be converted later. Since the `sb_mtx' is optional, the new SB_MTXLOCK flag is introduced. If this flag is set on `sb_flags', the `sb_mtx' mutex(9) should be taken. New sb_mtx_lock() and sb_mtx_unlock() were introduced to hide this check. They are temporary and will be replaced by mtx_enter() when all this area is converted to `sb_mtx' mutex(9). Also, the new sbmtxassertlocked() function is introduced to throw corresponding assertion for SB_MTXLOCK marked buffers. This time only sbappendaddr() calls it. This function is also temporary and will be replaced by MTX_ASSERT_LOCKED() later. ok bluhm
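The flag-guarded locking described above can be sketched in userland C. This is an illustration only, not the kernel's code: the `sketch_` names and the flag value are hypothetical, and a C11 `atomic_flag` spin lock stands in for mutex(9).

```c
#include <stdatomic.h>

#define SKETCH_SB_MTXLOCK	0x01	/* hypothetical flag value */

/* Simplified stand-in for a socket buffer; not the kernel's struct sockbuf. */
struct sketch_sockbuf {
	atomic_flag	sb_mtx;		/* stand-in for a mutex(9) */
	int		sb_flags;
};

/* Take the buffer lock only if this buffer opted in via the flag. */
void
sketch_sb_mtx_lock(struct sketch_sockbuf *sb)
{
	if (sb->sb_flags & SKETCH_SB_MTXLOCK) {
		while (atomic_flag_test_and_set_explicit(&sb->sb_mtx,
		    memory_order_acquire))
			;	/* spin until the lock is free */
	}
}

/* Release the buffer lock, again only for opted-in buffers. */
void
sketch_sb_mtx_unlock(struct sketch_sockbuf *sb)
{
	if (sb->sb_flags & SKETCH_SB_MTXLOCK)
		atomic_flag_clear_explicit(&sb->sb_mtx, memory_order_release);
}
```

Hiding the flag test in the two helpers keeps call sites identical for converted and unconverted buffers, which is what lets the helpers be swapped for a plain lock call once every buffer uses the mutex.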
2024-02-03	Rework socket buffers locking for shared netlock.	Vitaliy Makkoveev
	Shared netlock is not sufficient to call so{r,w}wakeup(). The following sowakeup() modifies `sb_flags' and knote(9) stuff. Unfortunately, we can't call so{r,w}wakeup() with `inp_mtx' mutex(9) because sowakeup() also calls pgsigio() which grabs kernel lock. However, `so_filtops' callbacks only perform read-only access to the socket stuff, so it is enough to hold shared netlock only, but the klist stuff needs to be protected. This diff introduces `sb_mtx' mutex(9) to protect sockbuf. This time `sb_mtx' is used to protect only `sb_flags' and `sb_klist'. Now we have soassertlocked_readonly() and soassertlocked(). The first one is happy if only shared netlock is held, meanwhile the second wants `so_lock' or pru_lock() to be held together with shared netlock. To keep soassertlocked() assertions soft, we need to know mutex(9) state, so new mtx_owned() macro was introduced. Also, the new optional (*pru_locked)() handler brings the state of pru_lock(). Tests and ok from bluhm.
2023-09-16	Allow counters_read(9) to take an optional scratch buffer.	Martin Pieuchot
	Using a scratch buffer makes it possible to take a consistent snapshot of per-CPU counters without having to allocate memory. Makes ddb(4) show uvmexp command work in OOM situations. ok kn@, mvs@, cheloha@
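The idea of an optional scratch buffer can be sketched as follows. This is a simplified userland illustration, not counters_read(9): the function name, the flat array layout and the return convention are assumptions, and the consistency protocol a real per-CPU reader needs is left out.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: sum n counters over ncpus rows of a flat array
 * (row-major, n counters per CPU) into out[]. Each row is first copied
 * to a temporary buffer. If the caller passes scratch (n elements), no
 * memory is allocated. Returns 0 on success, -1 if allocation fails.
 */
int
sketch_counters_read(const uint64_t *percpu, unsigned int ncpus,
    uint64_t *out, unsigned int n, uint64_t *scratch)
{
	uint64_t *temp = scratch;
	unsigned int cpu, i;

	if (temp == NULL) {
		temp = malloc(n * sizeof(*temp));
		if (temp == NULL)
			return (-1);
	}

	memset(out, 0, n * sizeof(*out));
	for (cpu = 0; cpu < ncpus; cpu++) {
		/* Snapshot one CPU's row, then accumulate it. */
		memcpy(temp, percpu + (size_t)cpu * n, n * sizeof(*temp));
		for (i = 0; i < n; i++)
			out[i] += temp[i];
	}

	if (scratch == NULL)
		free(temp);
	return (0);
}
```

A caller that must not allocate, such as a debugger command running while memory is exhausted, passes a stack or static buffer as `scratch`; other callers pass NULL and accept the allocation.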
2023-05-13	Instead of implementing IPv4 header checksum creation everywhere,	Alexander Bluhm
	introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out(). OK claudio@
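For readers unfamiliar with the IPv4 header checksum, here is a minimal sketch of the arithmetic. It is not in_hdr_cksum_out(), whose interface is not shown in this log; `sketch_ip_hdr_cksum()` is a hypothetical name that works on a raw byte array rather than an mbuf.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: one's-complement checksum over an IPv4 header
 * given as raw bytes in network order. To create a checksum, the
 * checksum field (bytes 10 and 11) must be zero on input. hlen is
 * expected to be even, which holds for every valid IPv4 header length.
 */
uint16_t
sketch_ip_hdr_cksum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;
	size_t i;

	/* Add the header up as big-endian 16-bit words. */
	for (i = 0; i + 1 < hlen; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return ((uint16_t)~sum);
}
```

The result is a host-order number; to store it, write the high byte to `hdr[10]` and the low byte to `hdr[11]`. Running the function over a header that already carries a correct checksum returns 0, which is how a receiver verifies it.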
2023-04-04	When sending IP packets to userland with divert-packet rules, the	Alexander Bluhm
	checksum may be wrong. Locally generated packets diverted by pf out rules may have no checksum due to hardware offloading. Calculate the checksum in that case. OK mvs@ sashan@
2022-10-17	Change pru_abort() return type to the type of void and make pru_abort()	Vitaliy Makkoveev
	optional. We have no interest in the pru_abort() return value. We call it only from soabort(), which is a dummy pru_abort() wrapper and has no return value. Only the connection oriented sockets need to implement the (*pru_abort)() handler. Such sockets are tcp(4) and unix(4) sockets, so remove existing code for all others, it is never called. ok guenther@
2022-10-03	System calls should not fail due to temporary memory shortage in	Alexander Bluhm
	malloc(9) or pool_get(9). Pass down a wait flag to pru_attach(). During syscall socket(2) it is ok to wait, this logic was missing for internet pcb. Pfkey and route sockets were already waiting. sonewconn() must not wait when called during TCP 3-way handshake. This logic has been preserved. Unix domain stream socket connect(2) can wait until the other side has created the socket to accept. OK mvs@
2022-09-05	Use shared netlock in soreceive(). The UDP and IP divert layer	Alexander Bluhm
	provide locking of the PCB. If that is possible, use shared instead of exclusive netlock in soreceive(). The PCB mutex provides a per socket lock against multiple soreceive() running in parallel. Release and regrab both locks in sosleep_nsec(). OK mvs@
2022-09-03	Move PRU_PEERADDR request to (*pru_peeraddr)().	Vitaliy Makkoveev
	Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets, except tcp(4) case. Also remove *_usrreq() handlers. ok bluhm@
2022-09-03	Move PRU_SOCKADDR request to (*pru_sockaddr)()	Vitaliy Makkoveev
	Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4) inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability. The key management and route domain sockets returns EINVAL error for PRU_SOCKADDR request, so keep this behaviour for a while instead of make pru_sockaddr handler optional and return EOPNOTSUPP. ok bluhm@
2022-09-02	Move PRU_CONTROL request to (*pru_control)().	Vitaliy Makkoveev
	The 'proc *' arg is not used for PRU_CONTROL request, so remove it from pru_control() wrapper. Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for inet6 case. ok guenther@ bluhm@
2022-09-01	Move PRU_CONNECT2 request to (*pru_connect2)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-31	Move PRU_SENDOOB request to (*pru_sendoob)().	Vitaliy Makkoveev
	PRU_SENDOOB request always consumes passed `top' and `control' mbufs. To avoid dummy m_freem(9) handlers for all protocols release passed mbufs in the pru_sendoob() EOPNOTSUPP error path. Also fix `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path. ok bluhm@
2022-08-29	Move PRU_RCVOOB request to (*pru_rcvoob)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-28	Move PRU_SENSE request to (*pru_sense)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-28	Move PRU_ABORT request to (*pru_abort)().	Vitaliy Makkoveev
	We abort only the sockets which are linked to `so_q' or `so_q0' queues of listening socket. Such sockets have no corresponding file descriptor and are not accessed from userland, so PRU_ABORT is used to destroy them on listening socket destruction. Currently all our sockets support PRU_ABORT request, but actually it is required only for tcp(4) and unix(4) sockets, so it should be optional. However, they will be removed with separate diff, and this time PRU_ABORT requests were converted as is. Also, the socket should be destroyed on PRU_ABORT request, but route and key management sockets leave it alive. This was also converted as is, because this wrong code is never called. ok bluhm@
2022-08-27	Move PRU_SEND request to (*pru_send)().	Vitaliy Makkoveev
	The former PRU_SEND error path of gre_usrreq() had `control' mbuf(9) leak. It was fixed in new gre_send(). The former pfkeyv2_send() was renamed to pfkeyv2_dosend(). ok bluhm@
2022-08-26	Move PRU_RCVD request to (*pru_rcvd)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Move PRU_SHUTDOWN request to (*pru_shutdown)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Move PRU_DISCONNECT request to (*pru_disconnect)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Move PRU_ACCEPT request to (*pru_accept)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Move PRU_CONNECT request to (*pru_connect)() handler.	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Move PRU_LISTEN request to (*pru_listen)() handler.	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Introduce a mutex per inpcb to serialize access to socket receive	Alexander Bluhm
	buffer. Later it may be used to protect more of the PCB or socket. In divert input replace the kernel lock with this mutex. OK mvs@
2022-08-20	Move PRU_BIND request to (*pru_bind)() handler.	Vitaliy Makkoveev
	For the protocols which don't support request, leave handler NULL. Do the NULL check within corresponding pru_() wrapper and return EOPNOTSUPP in such case. This will be done for all upcoming user request handlers. ok bluhm@ guenther@
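The optional-handler pattern described above can be sketched like this. The types and names are simplified, hypothetical stand-ins, not the kernel's `struct socket` or `struct pr_usrreqs`.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_socket;

/* Hypothetical per-protocol handler table. */
struct sketch_usrreqs {
	/* Optional: left NULL when the protocol does not support bind. */
	int	(*pru_bind)(struct sketch_socket *, const void *);
};

struct sketch_socket {
	const struct sketch_usrreqs	*so_usrreqs;
	int				 so_bound;
};

/* The wrapper does the NULL check, so protocols need no dummy handler. */
int
sketch_pru_bind(struct sketch_socket *so, const void *addr)
{
	if (so->so_usrreqs->pru_bind == NULL)
		return (EOPNOTSUPP);
	return (so->so_usrreqs->pru_bind(so, addr));
}

/* Example handler for a protocol that does support bind. */
int
sketch_proto_bind(struct sketch_socket *so, const void *addr)
{
	(void)addr;		/* address is ignored in this sketch */
	so->so_bound = 1;
	return (0);
}
```

Putting the check in one wrapper means an unsupported request costs a protocol nothing more than leaving a table slot NULL, and every protocol reports the same error for it.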
2022-08-15	Introduce 'pr_usrreqs' structure and move existing user-protocol	Vitaliy Makkoveev
	handlers into it. We want to split existing (pr_usrreq)() to multiple short handlers for each PRU_ request as it was already done for PRU_ATTACH and PRU_DETACH. This is the preparation step, (pr_usrreq)() split will be done with the following diffs. Based on reverted diff from guenther@. ok bluhm@
2022-05-09	Protect sbappendaddr() in divert_packet() with kernel lock. With	Alexander Bluhm
	divert-packet rules pf calls directly from IP layer to protocol layer. As the former has only shared net lock, additional protection against parallel access is needed. Kernel lock is a temporary workaround until the socket layer is MP safe. discussed with kettenis@ mvs@
2022-05-05	Clean up divert_packet(). Function does not return error, make it	Alexander Bluhm
	void. Introduce mutex and refcounting for inp like in the other PCB functions. OK sashan@
2022-02-25	Reported-by: syzbot+1b5b209ce506db4d411d@syzkaller.appspotmail.com	Philip Guenther
	Revert the pr_usrreqs move: syzkaller found a NULL pointer deref and I won't be available to monitor for followup issues for a bit
2022-02-25	Move pr_attach and pr_detach to a new structure pr_usrreqs that can	Philip Guenther
	then be shared among protosw structures, following the same basic direction as NetBSD and FreeBSD for this. Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the proper prototype to eliminate the previously necessary casts. ok mvs@ bluhm@
2020-11-16	Remove the cases folded into sysctl_bounded_args but left behind	gnezdo
	divert_sysctl and divert6_sysctl get a tiny bit slimmer.
2020-08-24	Convert divert*_sysctl to sysctl_bounded_args	gnezdo
	OK sashan
2020-08-01	Move range check inside sysctl_int_arr	gnezdo
	Range violations are now consistently reported as EOPNOTSUPP. Previously they were mixed with ENOPROTOOPT. OK kn@
2019-02-04	Avoid an mbuf double free in the oob soreceive() path. In the	Alexander Bluhm
	usrreq functions move the mbuf m_freem() logic to the release block instead of distributing it over the switch statement. Then the goto release in the initial check, whether the pcb still exists, will not free the mbuf for the PRU_RCVD, PRU_RCVOOB, PRU_SENSE command. OK claudio@ mpi@ visa@ Reported-by: syzbot+8e7997d4036ae523c79c@syzkaller.appspotmail.com
2018-11-10	Do not translate the EACCES error from pf(4) to EHOSTUNREACH anymore.	Alexander Bluhm
	It also translated a documented send(2) EACCES case erroneously. This was too much magic and always prone to errors. from Jan Klemkow; man page jmc@; OK claudio@
2018-10-04	Revert the inpcb table mutex commit. It triggers a witness panic	Alexander Bluhm
	in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-20	As a step towards per inpcb or socket locks, remove the net lock	Alexander Bluhm
	for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
2018-04-24	Push NET_LOCK down in the default ifioctl case.	Paul Irofti
	For the PRU_CONTROL bit the NET_LOCK surrounds in[6]_control() and on the ENOTSUPP case we guard the driver if_ioctl functions. OK mpi@
2017-11-02	Move PRU_DETACH out of pr_usrreq into per proto pr_detach	Florian Obser
	functions to pave way for more fine grained locking. Suggested by, comments & OK mpi
2017-10-09	Reduces the scope of the NET_LOCK() in sysctl(2) path.	Martin Pieuchot
	Exposes per-CPU counters to real parallelism. ok visa@, bluhm@, jca@
2017-10-06	Unfortunately I removed too much in my previous commit and broke	Alexander Bluhm
	divert-packet. Bring back the loop over the global list to find the divert socket.
2017-10-06	Kill the divert-packet socket option IP_DIVERTFL to filter packets.	Alexander Bluhm
	It used a loop over the global list divbtable that would be hard to make MP safe. The port net/dnsfilter does not work without this, it should be converted to divert-to. Neither other ports nor base use this filter feature. ports checked by sthen@; OK mpi@ benno@
2017-09-06	Replace the call to ifa_ifwithaddr() in divert6_output() with a	Alexander Bluhm
	route lookup to make it MP safe. Only set the mbuf header fields that are needed. Validate the name input. Also use the same variables in IPv4 and IPv6 functions and avoid unnecessary initialization. OK mpi@
2017-09-06	Replace the call to ifa_ifwithaddr() in divert_output() with a route	Alexander Bluhm
	lookup to make it MP safe. Only set the mbuf header fields that are needed. Validate the name input. OK mpi@
2017-09-05	Replace NET_ASSERT_LOCKED() by soassertlocked() in *_usrreq().	Martin Pieuchot
	Not all of them need the NET_LOCK(). ok bluhm@
2017-07-27	Grab the KERNEL_LOCK() before calling sorwakeup().	Martin Pieuchot
	In the forwarding path, pf_test() is executed w/o KERNEL_LOCK() and in case of divert ends up calling sowakeup(). However selwakeup() and csignal() are not yet ready to be executed w/o KERNEL_LOCK(). ok bluhm@
2017-06-26	Assert that the corresponding socket is locked when manipulating socket	Martin Pieuchot
	buffers. This is one step towards unlocking TCP input path. Note that all the functions asserting for the socket lock are not necessarily MP-safe. All the fields of 'struct socket' aren't protected. Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to tell when a filter needs to lock the underlying data structures. Logic and name taken from NetBSD. Tested by Hrvoje Popovski. ok claudio@, bluhm@, mikeb@