Age | Commit message | Author |
|
Instead of passing around u_char[4], introduce struct ipsec_level
that contains 4 ipsec levels. This provides better type safety.
The embedding struct inpcb is globally visible for netstat(1), so
put struct ipsec_level outside of #ifdef _KERNEL.
OK deraadt@ mvs@
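A minimal sketch of such a struct, with field names chosen here for
illustration after the four existing IPsec levels (auth, ESP
transport, ESP network, IPcomp):

    struct ipsec_level {
            u_char sl_auth;         /* authentication level */
            u_char sl_esp_trans;    /* ESP transport level */
            u_char sl_esp_network;  /* ESP network level */
            u_char sl_ipcomp;       /* IP compression level */
    };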
|
|
With two separate TCP hash tables, each one becomes smaller. When
we remove the exclusive net lock from TCP, contention on internet
PCB table mutex will be reduced. UDP has been split earlier into
IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with
assertions.
OK mvs@
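Illustrative shape of such a replacement in an IPv6-only code path
(a sketch, not the exact diff):

    /* before: both families shared one table, branch at runtime */
    if (inp->inp_flags & INP_IPV6) { /* IPv6 handling */ }
    /* after: the IPv6 table only ever holds IPv6 PCBs */
    KASSERT(inp->inp_flags & INP_IPV6);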
|
|
Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.
OK claudio@
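A rough sketch of the shape described above (member names
illustrative):

    struct route {
            struct rtentry *ro_rt;          /* cached route entry */
            union {                         /* fits both families */
                    struct sockaddr         ro_dstsa;
                    struct sockaddr_in      ro_dstsin;
                    struct sockaddr_in6     ro_dstsin6;
            };
    };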
|
|
OK mvs@
|
|
in_pcbnotifyall() is an IPv4 only function. All callers check that
sockaddr dst is in fact a sockaddr_in. Pass the more specific type
and remove the runtime check at the beginning of in_pcbnotifyall().
Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6
in in6_pcbnotify() as dst parameter.
OK millert@
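The resulting prototype would then look roughly like this (a sketch,
not the exact declaration):

    void in_pcbnotifyall(struct inpcbtable *, const struct sockaddr_in *,
        u_int, int, void (*)(struct inpcb *, int));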
|
|
tcp6_ctlinput() cast a constant sockaddr_in6 to non-const sockaddr.
sa6_src may be &sa6_any which lives in read-only data section.
Better pass down the const addresses to syn_cache_lookup(). They
are needed for hash lookup and are not modified.
OK mvs@
|
|
|
|
Struct inpcb field inp_socket is initialized in in_pcballoc(). It
is not NULL and never changed.
OK mvs@
|
|
ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.
OK mvs@
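An illustrative before/after of a call site (argument list
abbreviated, not the exact diff):

    /* before: whole PCB passed down just for the IPsec level */
    error = ip_output(m, opts, &ro, flags, mopts, inp);
    /* after: only the relevant, net-lock protected data */
    error = ip_output(m, opts, &ro, flags, mopts,
        inp ? inp->inp_seclevel : NULL);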
|
|
After changing the tcp_now tick to milliseconds, 32 bits will wrap
around after 49 days of uptime. That may be a problem in some
places of our stack. Better use a 64 bit counter.
As the timestamp option is 32 bits in the TCP protocol, use the
lower 32 bits there. The existing casts to 32 bits should behave
correctly.
Start with a random 63 bit offset to avoid leaking uptime. 2^63
milliseconds result in 2.9*10^8 years of possible uptime.
OK yasuoka@
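A sketch of the idea (variable and helper names illustrative):

    /* initialization: a random 63 bit base hides the real uptime */
    arc4random_buf(&tcp_now_base, sizeof(tcp_now_base));
    tcp_now_base &= ~(1ULL << 63);

    /* the 32 bit timestamp option takes the low bits of the counter */
    uint32_t ts_val = (uint32_t)tcp_now();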
|
|
meant as a fallback if network hardware does not support TSO. Driver
support is still work in progress. TCP output generates large
packets. In IP output the packet is chopped to TCP maximum segment
size. This reduces the CPU cycles used by pf. The regular output
could be assisted by hardware later, but pf route-to and IPsec need
the software fallback in general.
For performance comparison or to workaround possible bugs, sysctl
net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows
TSO counter with chopped and generated packets.
based on work from jan@
tested by jmc@ jan@ Hrvoje Popovski
OK jan@ claudio@
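For example, to disable the feature and inspect the counters:

    # sysctl net.inet.tcp.tso=0
    # netstat -s -p tcp | grep -i tso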
|
|
(SRTT) instead of the timestamp option. The timestamp option is
disabled on some OSs (e.g. Windows) or dropped by some
firewalls/routers; in such cases the window size was fixed at 16KB,
which limited throughput severely on high latency networks.
Also change "tcp_now" from a 2HZ tick counter to binuptime in
milliseconds to calculate the SRTT more precisely.
tested by krw matthieu jmatthew dlg djm stu stsp
ok claudio
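A sketch of the millisecond conversion from binuptime(9) (helper
name illustrative):

    static inline uint64_t
    msuptime(void)
    {
            struct bintime bt;

            binuptime(&bt);
            /* bt.frac is a 64 bit binary fraction of a second */
            return bt.sec * 1000 +
                (((uint64_t)1000 * (uint32_t)(bt.frac >> 32)) >> 32);
    }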
|
|
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During the socket(2) syscall
it is OK to wait; this logic was missing for the internet PCB.
Pfkey and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@
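The pattern, sketched with illustrative flag values and call shape:

    /* socket(2) syscall: sleeping for memory is fine */
    error = pru_attach(so, protocol, M_WAIT);
    /* sonewconn() during the 3-way handshake: must not sleep */
    error = pru_attach(so, protocol, M_NOWAIT);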
|
|
found by Hrvoje Popovski with witness; OK mvs@
|
|
removes pressure from the exclusive netlock in tcp_slowtimo().
Reading is done atomically. Ensure that the tcp_now value is read
only once per function to provide consistent time.
OK yasuoka@
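The per-function pattern, sketched:

    uint64_t now = tcp_now();   /* read once for a consistent view */
    /* use `now' for every time comparison in this function */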
|
|
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@
|
|
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
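The lookup pattern, sketched (function names illustrative):

    mtx_enter(&table->inpt_mtx);
    inp = in_pcbhashlookup(table, faddr, fport, laddr, lport, rdomain);
    if (inp != NULL)
            in_pcbref(inp);     /* take the ref while the mutex is held */
    mtx_leave(&table->inpt_mtx);
    if (inp != NULL) {
            /* inp remains valid here, even without the mutex */
            in_pcbunref(inp);
    }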
|
|
function.
OK gnezdo@ mvs@ florian@ sashan@
|
|
ok jmc@ reads ok tb@
|
|
crypto task anymore, it is possible to return the next protocol.
Then ip_deliver() will walk the header chain in its loop.
IPsec bridge(4) tested by jan@
OK mvs@ tobhe@ jan@
|
|
the mbuf, the callers must be careful. Although there is no bug,
use the common pattern to handle this. Pass down an mbuf pointer
mp and let m_pullup() update the pointer in all callers.
It looks like the tcp signature functions should not be called.
Avoid an mbuf leak and return an error.
OK mvs@
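The common pattern referred to above, sketched (error code
illustrative):

    /* m_pullup() may replace the head mbuf; keep the caller's
     * pointer up to date; the chain is already freed on failure */
    if ((*mp = m_pullup(*mp, sizeof(struct tcphdr))) == NULL)
            return (ENOMEM);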
|
|
function. But it was never called via this pointer. It would have
crashed immediately, as mp is always NULL when called via .xf_output().
Do not set .xf_output to ipip_output. This allows passing only the
parameters that are actually needed, and the control flow is clearer.
OK mpi@
|
|
route and was not there before. This should prevent a recursion
in path MTU discovery with TCP over IPsec.
reported and tested by Matthias Schmidt; tested and OK tobhe@
|
|
and map data read only.
OK deraadt@ mvs@ mpi@
|
|
calling tcp_output() if the TCP maximum segment size changes. But
that did not work, as the new value was compared before tcp_mss()
had a chance to modify it. Move the comparison and change it from
not equal to greater than. It only makes sense to resend a packet
immediately if it becomes smaller and is more likely to fit.
OK sashan@ tobhe@
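The shape of the fix (an illustrative diff, variable name invented):

    -	if (tp->t_maxseg != oldmss)
    +	if (oldmss > tp->t_maxseg)
    		tcp_output(tp);	/* a smaller segment may fit now */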
|
|
the first cut of this diff was made with coccinelle using this spatch:
@rule@
type caddr_t;
expression m, off, len, cp;
@@
-m_copydata(m, off, len, (caddr_t)cp)
+m_copydata(m, off, len, cp)
i had to fix its opinionated idea of formatting by hand though, so
i'm not sure it was worth it.
ok deraadt@ bluhm@
|
|
Zero-tick timeouts rely on implicit behavior in the timeout layer that
inhibits optimizations in softclock().
bluhm@ says waiting a tick for the reaper shouldn't break anything.
ok bluhm@
|
|
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@
|
|
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCBs, set the
inp_socket field to NULL. This has to be protected by a per-PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@
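Sketch of the detach check inside a notify loop (mutex name
illustrative):

    mtx_enter(&inp->inp_mtx);
    if (inp->inp_socket == NULL) {
            /* PCB has been detached, skip it */
            mtx_leave(&inp->inp_mtx);
            continue;
    }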
|
|
ok bluhm
|
|
the delack timer had a different implementation. Use the same
mechanism for all TCP timers.
OK mpi@ visa@
|
|
OK millert@
|
|
if the tcpcb exits.
OK mpi@
|
|
it could run immediately and was not synchronized with the TCP
timeouts, although that was the intention when it was introduced
in revision 1.85. Convert the reaper to an ordinary TCP timeout
so it is scheduled on the same timeout thread after all timeouts
have finished. A net lock is not necessary as the process calling
tcp_close() will not access the tcpcb after arming the reaper
timeout.
OK mikeb@
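Arming the reaper then looks like any other TCP timer (a sketch in
the tcp_timer.h macro style):

    TCP_TIMER_ARM(tp, TCPT_REAPER, 0);  /* runs after other timeouts */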
|
|
The initialization of a secret SHA256 context for generating TCP
initial sequence numbers is moved out of tcp_set_iss_tsm used to
set up ISN for new connections and into tcp_init, sparing the
need for a global flag.
OK deraadt, visa, mpi
|
|
OK deraadt, mpi, visa, job
|
|
buffers.
This is one step towards unlocking the TCP input path. Note that
the functions asserting for the socket lock are not necessarily
MP-safe. Not all the fields of 'struct socket' are protected.
Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to
tell when a filter needs to lock the underlying data structures. Logic
and name taken from NetBSD.
Tested by Hrvoje Popovski.
ok claudio@, bluhm@, mikeb@
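Sketch of how a filter can use the hint (locking helpers
illustrative):

    int
    filt_soread(struct knote *kn, long hint)
    {
            struct socket *so = kn->kn_fp->f_data;
            int rv;

            if ((hint & NOTE_SUBMIT) == 0)
                    solock(so);     /* not called from the socket layer */
            rv = (kn->kn_data = so->so_rcv.sb_cc) > 0;
            if ((hint & NOTE_SUBMIT) == 0)
                    sounlock(so);
            return (rv);
    }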
|
|
<netinet/tcp_debug.h>.
The IPv6 variant was always included and the IPv4 version is not
present on all systems.
Most of the offending ports are already fixed, thanks to sthen@!
|
|
No binary change.
OK mpi@
|
|
inline function instead of casting it to sockaddr. While there,
use inline instead of __inline for all these conversions. Some
struct sockaddr casts can be avoided completely.
OK dhill@ mpi@
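For example, the sockaddr_in6 conversion becomes an inline function
instead of a bare cast (a sketch of the pattern):

    static inline struct sockaddr *
    sin6tosa(struct sockaddr_in6 *sin6)
    {
            return ((struct sockaddr *)(sin6));
    }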
|
|
No binary change.
OK mpi@
|
|
ok mpi@ bluhm@
|
|
IPv4 pr_ctlinput functions returned a void pointer that was always
NULL and never used. Make all functions void like in the IPv6 case.
OK mpi@
|
|
ok bluhm@, kettenis@
|
|
has IPL_SOFTNET as ipl.
ok mikeb@, kettenis@
|
|
|
|
the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.
most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.
the manpage and subr_pool.c bits i did myself.
ok tedu@ jmatthew@
@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);
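A call site after the conversion then reads, for example (pool and
wchan names illustrative):

    pool_init(&tcpcb_pool, sizeof(struct tcpcb), 0, IPL_SOFTNET, 0,
        "tcpcb", NULL);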
|
|
thank you to everyone who helped review these diffs
ok mpi@
|
|
the additional clusters in the socket buffer and not elsewhere.
OK claudio@
|
|
This is another little step towards deprecating 'struct route{,_in6}'.
ok florian@
|