src - OpenBSD base system

Age	Commit message	Author
2022-08-29	Do not calculate the output protocol checksum in the IP input path.	Alexander Bluhm
	This logic was introduced in 2013 when pf checksum fixup was temporarily removed. After restoring the pf behavior in 2016, it should not be necessary anymore. OK claudio@
2022-08-29	Sendmsg could crash in tcp_output due to a missing check after the	Moritz Buhl
	introduction of tcp_send. OK mvs@, bluhm@, gnezdo@ Reported-by: syzbot+e859fd353c90eeac26f8@syzkaller.appspotmail.com
2022-08-29	Move PRU_RCVOOB request to (*pru_rcvoob)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-29	Use struct refcnt for interface address reference counting.	Alexander Bluhm
	There was a crash due to use after free of the ifa although it is ref counted. As ifa_refcnt was a simple integer increment, there may be a path where multiple CPUs access it concurrently. So change to struct refcnt which is MP safe and provides dt(4) leak debugging. Link level address for IPsec enc(4) and various MPLS interfaces is special. There ifa is part of struct sc. Use refcount anyway and add a panic to detect use after free. bug report stsp@; OK mvs@
2022-08-28	Move PRU_SENSE request to (*pru_sense)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-28	Move PRU_ABORT request to (*pru_abort)().	Vitaliy Makkoveev
	We abort only the sockets which are linked to `so_q' or `so_q0' queues of listening socket. Such sockets have no corresponding file descriptor and are not accessed from userland, so PRU_ABORT is used to destroy them on listening socket destruction. Currently all our sockets support the PRU_ABORT request, but actually it is required only for tcp(4) and unix(4) sockets, so it should be optional. However, they will be removed with a separate diff, and this time PRU_ABORT requests were converted as is. Also, the socket should be destroyed on PRU_ABORT request, but route and key management sockets leave it alive. This was also converted as is, because this wrong code is never called. ok bluhm@
2022-08-27	Move PRU_SEND request to (*pru_send)().	Vitaliy Makkoveev
	The former PRU_SEND error path of gre_usrreq() had `control' mbuf(9) leak. It was fixed in new gre_send(). The former pfkeyv2_send() was renamed to pfkeyv2_dosend(). ok bluhm@
2022-08-26	Move PRU_RCVD request to (*pru_rcvd)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Move PRU_SHUTDOWN request to (*pru_shutdown)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Document that igmp_timers_are_running and mld6_timers_are_running	Alexander Bluhm
	are protected by netlock. They are only used as shortcut in fast timer. Common prefix in mld6.c is mld6. OK mvs@
2022-08-22	Move PRU_DISCONNECT request to (*pru_disconnect).	Vitaliy Makkoveev
	ok bluhm@
2022-08-22	Use rwlock per inpcb table to protect notify list. The notify	Alexander Bluhm
	function may sleep, so holding a mutex is not possible. The same list entry and rwlock is used for UDP multicast and raw IP delivery. By adding a write lock, exclusive netlock is no longer necessary for PCB notify and UDP and raw IP input. OK mvs@
2022-08-22	Move PRU_ACCEPT request to (*pru_accept)().	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Only grab netlock in igmp and mld6 fast timer when necessary. There	Alexander Bluhm
	are status variables that can be used to avoid locking if timers are not running. This should reduce contention on exclusive netlock. OK kn@ mvs@
2022-08-21	Move PRU_CONNECT request to (*pru_connect)() handler.	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Move PRU_LISTEN request to (*pru_listen)() handler.	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Change soabort() return value to void. We are never interested in it.	Vitaliy Makkoveev
	ok bluhm@
2022-08-21	Remove ip_local() and ip6_local(). After moving the IPv4 fragment	Alexander Bluhm
	reassembly and IPv6 hop-by-hop header chain processing out of ip_local() and ip6_local(), they are almost empty stubs. The check for local delivery loop in ip_ours() and ip6_ours() is sufficient. Recover mbuf offset and next protocol directly in ipintr() and ip6intr(). OK mvs@
2022-08-21	Introduce a mutex per inpcb to serialize access to socket receive	Alexander Bluhm
	buffer. Later it may be used to protect more of the PCB or socket. In divert input replace the kernel lock with this mutex. OK mvs@
2022-08-20	Move PRU_BIND request to (*pru_bind)() handler.	Vitaliy Makkoveev
	For the protocols which don't support request, leave handler NULL. Do the NULL check within corresponding pru_() wrapper and return EOPNOTSUPP in such case. This will be done for all upcoming user request handlers. ok bluhm@ guenther@
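A minimal sketch of the wrapper pattern described above, assuming simplified stand-in types. The names `pr_usrreqs_sketch`, `socket_sketch` and `pru_bind_sketch` are hypothetical and are not the kernel's actual definitions; only the NULL check and the EOPNOTSUPP return come from the commit message.

```c
#include <errno.h>
#include <stddef.h>

struct socket_sketch;

/* Hypothetical, reduced handler table: one optional callback. */
struct pr_usrreqs_sketch {
	int (*pru_bind)(struct socket_sketch *, const void *);
};

/* Hypothetical socket that only carries its handler table. */
struct socket_sketch {
	const struct pr_usrreqs_sketch *so_usrreqs;
};

/*
 * Wrapper: protocols that do not support the request leave the
 * handler NULL, and the wrapper reports EOPNOTSUPP for them.
 */
static int
pru_bind_sketch(struct socket_sketch *so, const void *nam)
{
	if (so->so_usrreqs->pru_bind == NULL)
		return (EOPNOTSUPP);
	return ((*so->so_usrreqs->pru_bind)(so, nam));
}

/* Example handler for a protocol that does support the request. */
static int
dummy_bind(struct socket_sketch *so, const void *nam)
{
	(void)so;
	(void)nam;
	return (0);
}
```

Keeping the check in the wrapper means callers never test the pointer themselves, and a protocol opts out simply by leaving the field unset.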
2022-08-15	Run IPv6 hop-by-hop options processing in parallel. The ip6_hbhchcheck()	Alexander Bluhm
	code is MP safe and moves from ip6_local() to ip6_ours(). If there are any options, store the chain offset and next protocol in a mbuf tag. When dequeuing without tag, it is a regular IPv6 header. As mbuf tags degrade performance, use them only if a hop-by-hop header is present. Such packets are rare and pf drops them by default. OK mvs@
2022-08-15	Introduce tcp_sogetpcb() to assign `inp' and `tp' from passed socket.	Vitaliy Makkoveev
	This function will help to avoid code duplication when tcp_usrreq() will be divided to multiple handlers. ok bluhm@
2022-08-15	Introduce 'pr_usrreqs' structure and move existing user-protocol	Vitaliy Makkoveev
	handlers into it. We want to split existing (pr_usrreq)() to multiple short handlers for each PRU_ request as it was already done for PRU_ATTACH and PRU_DETACH. This is the preparation step, (pr_usrreq)() split will be done with the following diffs. Based on reverted diff from guenther@. ok bluhm@
2022-08-13	Remove needless include pledge.h accidentally added in previous commit.	Alexander Bluhm
	OK claudio@
2022-08-12	Remove differences between ip_fragment() and ip6_fragment(). They	Alexander Bluhm
	do nearly the same thing, so they should look similar. OK sashan@
2022-08-12	There are some places in ip and ip6 input where operations fail due	Alexander Bluhm
	to out of memory. Use a generic idropped counter for those. OK mvs@
2022-08-11	Add TCP_INFO support to getsockopt for tcp sessions.	Claudio Jeker
	TCP_INFO provides a lot of information about the TCP session of this socket. Many processes like to peek at the rtt of a connection but this also provides a lot more specialized info for use by e.g. tcpbench(1). While the basic minimal info is available all the time, the more specific data is only populated for privileged processes. This is done to not share data back to userland that may allow an attack on a session. TCP_INFO is available to pledge "inet" since pledged processes like chrome tend to use TCP_INFO when available. OK bluhm@
2022-08-08	To make protocol input functions MP safe, internet PCB need protection.	Alexander Bluhm
	Use their reference counter in more places. The in_pcb lookup functions hold the PCBs in hash tables protected by table->inpt_mtx mutex. Whenever a result is returned, increment the ref count before releasing the mutex. Then the inp can be used as long as necessary. Unref it at the end of all functions that call in_pcb lookup. As a shortcut, pf may also hold a reference to the PCB. When pf_inp_lookup() returns it, it also increments the ref count and the caller can handle it like the inp from table lookup. OK sashan@
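The lookup-then-reference pattern described above can be sketched in userland roughly as follows. This is an illustration under stated assumptions, not the kernel code: `pcb_sketch`, `table_sketch` and both functions are hypothetical, a pthread mutex stands in for the table mutex, and the counter is a plain integer guarded by that mutex rather than the kernel's refcnt type.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical PCB: a lookup key plus a reference counter. */
struct pcb_sketch {
	int		port;
	unsigned int	refcnt;
};

/* Hypothetical table: a flat array protected by one mutex. */
struct table_sketch {
	pthread_mutex_t		 mtx;
	struct pcb_sketch	*slots;
	size_t			 nslots;
};

/*
 * Look up a PCB and take a reference before the mutex is
 * released, so the result stays valid after unlocking.
 */
static struct pcb_sketch *
pcb_lookup_sketch(struct table_sketch *t, int port)
{
	struct pcb_sketch *found = NULL;
	size_t i;

	pthread_mutex_lock(&t->mtx);
	for (i = 0; i < t->nslots; i++) {
		if (t->slots[i].port == port) {
			found = &t->slots[i];
			found->refcnt++;
			break;
		}
	}
	pthread_mutex_unlock(&t->mtx);
	return (found);
}

/* Drop the reference taken by the lookup; NULL is accepted. */
static void
pcb_unref_sketch(struct table_sketch *t, struct pcb_sketch *p)
{
	if (p == NULL)
		return;
	pthread_mutex_lock(&t->mtx);
	assert(p->refcnt > 0);
	p->refcnt--;
	pthread_mutex_unlock(&t->mtx);
}
```

A caller pairs every successful lookup with exactly one unref at the end of the function, which mirrors the rule stated in the commit message.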
2022-08-06	Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and	Alexander Bluhm
	NET_RLOCK_IN_IOCTL, which have the same implementation. The R and W are hard to see, call the new macro NET_LOCK_SHARED. Rename the opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE. Update some outdated comments about net locking. OK mpi@ mvs@
2022-08-04	Use 16 bit variable to store more fragment flag. This avoids loss	Alexander Bluhm
	of significant bits on big endian machines. Bug has been introduced in previous commit by removing the != 0 check. OK mvs@
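One plausible reading of this bug, sketched below as an assumption rather than a copy of the kernel code: the more-fragments bit is masked out of a 16 bit field in network byte order, and the result is stored without a `!= 0` comparison. On a big endian machine the masked value is 0x2000, which does not fit in 8 bits. On a little endian machine the same mask reads as 0x0020, so the truncation goes unnoticed. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Store the masked flag value in an 8 bit variable, as a too
 * narrow type would. Bits above the low byte are discarded.
 */
static int
mf_seen_8bit(uint16_t masked)
{
	uint8_t morefrag = (uint8_t)masked;

	return (morefrag != 0);
}

/* Store the same value in a 16 bit variable; nothing is lost. */
static int
mf_seen_16bit(uint16_t masked)
{
	uint16_t morefrag = masked;

	return (morefrag != 0);
}
```

With 0x2000 as input the 8 bit variant reports that no flag is set while the 16 bit variant reports it correctly. With 0x0020 both variants agree.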
2022-07-28	Checking the fragment flags of an incoming IP packet does not need	Alexander Bluhm
	the mutex for the fragment list. Move this code before the critical section. Use ISSET() to make clear which flags are checked. OK mvs@
2022-07-25	The IPv4 reassembly code is MP safe, so we can run it in parallel.	Alexander Bluhm
	Note that ip_ours() runs with shared netlock, while ip_local() has exclusive netlock after queuing. Move the existing code into function ip_fragcheck() and call it from ip_ours(). OK mvs@
2022-07-24	Fix assertion for write netlock in rip6_input(). ip6_input() has	Alexander Bluhm
	shared net lock. ip_deliver() needs exclusive net lock. Instead of calling ip_deliver() directly, use ip6_ours() to queue the packet. Move the write lock assertion into ip_deliver() to catch such bugs earlier. The assertion was only triggered with IPv6 multicast forwarding or router alert hop by hop option. Found by regress test. OK kn@ mvs@
2022-07-16	To fix a KASSERT(la != NULL) panic in ARP, protect the rt_llinfo	Alexander Bluhm
	field of the route with a mutex. Keep rt_llinfo being non-NULL consistent with the RTF_LLINFO flag being set. Also do not put the mutex in the fast path. OK mpi@
2022-07-14	Use capital letters for global ipsec(4) locks description. Use 'D'	Vitaliy Makkoveev
	instead of 's' for `tdb_sadb_mtx' mutex(9) because this is 'D'atabase. No functional changes. ok bluhm@
2022-06-29	Nullify `ipsecflowinfo' when mbuf(9) has no ipsec flowinfo data.	Vitaliy Makkoveev
	Otherwise we use `ipsecflowinfo' obtained from previous packet. ok claudio@
2022-06-28	Use btrace(8) to debug reference counting. dt(4) provides a static	Alexander Bluhm
	tracepoint for each type of refcnt we have. As a start, add inpcb and tdb refcnt. When the counter changes, btrace may print the actual object, the current counter, the change value and optionally the stack trace. discussed with visa@; OK mpi@
2022-06-27	Push the kernel lock down into arpresolve(). We still need it to	Alexander Bluhm
	prevent concurrent access to rt_llinfo from rtrequest_delete(). But the common case, when the MAC address is already known, works without lock. tested by Hrvoje Popovski; OK mvs@
2022-06-27	Instead of calling getuptime() all the time in ARP code, do it only	Alexander Bluhm
	once per function. This gives a more consistent time value. OK claudio@ miod@ mvs@
2022-06-26	The "ifq_set_maxlen(..., 1);" hack we use to enforce pipex(4) related	Vitaliy Makkoveev
	(*if_qstart)() be always called with netlock held doesn't work anymore with PPPOE sessions. Introduce `pipex_list_mtx' mutex(9) and use it to protect global pipex(4) lists and radix trees. Protect pipex(4) `session' dereference with reference counters, because we could sleep when accessing pipex(4) from ioctl(2) path, and this is not possible with mutex(9) held. ok bluhm@
2022-06-17	The timeout for ipsec acquire does not decrement the reference	Alexander Bluhm
	counter to 0 properly. We have one reference count for the lists, and one for the timeout handler. When the timeout fires, it has to decrement the reference to itself. Then the ipa is removed from the lists and decremented again. from Stefan Butz; OK tobhe@ mvs@
2022-06-06	Simplify solock() and sounlock(). There is no reason to return a value	Claudio Jeker
	for the lock operation and to pass a value to the unlock operation. sofree() still needs an extra flag to know if sounlock() should be called or not. But sofree() is called less often and mostly without keeping the lock. OK mpi@ mvs@
2022-05-25	Call if_put(9) after we finish with `ia' within ip_getmoptions().	Vitaliy Makkoveev
	if_put(9) call means we finish work with `ifp' and it could be destroyed. `ia' is the pointer to 'in_ifaddr' data that belongs to `ifp', so we need to release the corresponding `ifp' after we finish dealing with `ia'. `if_addrlist' list destruction and ip_getmoptions() are serialized with kernel and net locks so this is not critical, but looks inconsistent. ok bluhm@
2022-05-15	have in_pcbselsrc copy the selected address to memory provided by the caller.	David Gwynne
	having it return a pointer to something that has a lifetime managed by a lock without accounting for it or taking a reference count or anything like that is asking for trouble. copying the address to caller provided memory while still inside the lock is a lot safer. discussed with visa@ ok bluhm@ claudio@
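The idea of copying to caller provided memory while the lock is held can be sketched as follows. This is a userland illustration only: `addr_sketch`, `selected`, `addr_mtx` and both functions are hypothetical, and a pthread mutex stands in for whatever lock protects the address in the kernel.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical address type, reduced to one 32 bit field. */
struct addr_sketch {
	uint32_t s_addr;
};

/* Shared state whose lifetime and contents the lock manages. */
static pthread_mutex_t addr_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct addr_sketch selected = { 0x0a000001 };

/*
 * Copy the selected address into memory owned by the caller
 * while still holding the lock. The caller never keeps a
 * pointer into the protected state.
 */
static void
selsrc_copy_sketch(struct addr_sketch *out)
{
	pthread_mutex_lock(&addr_mtx);
	*out = selected;
	pthread_mutex_unlock(&addr_mtx);
}

/* Change the protected address, as a concurrent writer would. */
static void
set_selected_sketch(uint32_t a)
{
	pthread_mutex_lock(&addr_mtx);
	selected.s_addr = a;
	pthread_mutex_unlock(&addr_mtx);
}
```

Because the caller works on its own copy, a later change to the shared address cannot alter or invalidate what the caller already holds.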
2022-05-09	Protect sbappendaddr() in divert_packet() with kernel lock. With	Alexander Bluhm
	divert-packet rules pf calls directly from IP layer to protocol layer. As the former has only shared net lock, additional protection against parallel access is needed. Kernel lock is a temporary workaround until the socket layer is MP safe. discussed with kettenis@ mvs@
2022-05-05	Clean up divert_packet(). Function does not return error, make it	Alexander Bluhm
	void. Introduce mutex and refcounting for inp like in the other PCB functions. OK sashan@
2022-05-05	Use static objects for struct rttimer_queue instead of dynamically	Claudio Jeker
	allocating them. Currently there are 6 rttimer_queues and not many more will follow. So change rt_timer_queue_create() to rt_timer_queue_init() which now takes a struct rttimer_queue * as argument which will be initialized. Since this changes the global vars from pointer to struct, adjust other callers as well. OK bluhm@
2022-05-05	No longer consider IN_EXPERIMENTAL aka 240/4 as not forwardable.	Claudio Jeker
	We already allow 240/4 in and out so lets allow it through as well. One of many steps to make 240/4 useable. Diff by Seth David Schoen (schoen at loyalty.org) OK bluhm@ djm@
2022-05-04	Move rttimer callback function from the rttimer itself to rttimer_queue.	Claudio Jeker
	All users use the same callback per queue so that makes sense. Also replace rt_timer_queue_destroy() with rt_timer_queue_flush(). OK bluhm@
2022-05-04	In ipsp_spd_lookup() rename the parameter tdbp to tdbin as it is	Alexander Bluhm
	always the incoming TDB that has to be checked. from markus@