summaryrefslogtreecommitdiff
path: root/sys/netinet/tcp_subr.c
AgeCommit message (Collapse)Author
2024-04-17Use struct ipsec_level within inpcb.Alexander Bluhm
Instead of passing around u_char[4], introduce struct ipsec_level that contains 4 ipsec levels. This provides better type safety. The embedding struct inpcb is globally visible for netstat(1), so put struct ipsec_level outside of #ifdef _KERNEL. OK deraadt@ mvs@
2024-04-12Split single TCP inpcb table into IPv4 and IPv6 parts.Alexander Bluhm
With two separate TCP hash tables, each one becomes smaller. When we remove the exclusive net lock from TCP, contention on internet PCB table mutex will be reduced. UDP has been split earlier into IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with assertions. OK mvs@
2024-02-13Merge struct route and struct route_in6.Alexander Bluhm
Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embeded there. OK claudio@
2024-02-11Remove include netinet6/ip6_var.h from netinet/in_pcb.h.Alexander Bluhm
OK mvs@
2024-01-28Use more specific sockaddr type for inpcb notify.Alexander Bluhm
in_pcbnotifyall() is an IPv4 only function. All callers check that sockaddr dst is in fact a sockaddr_in. Pass the more spcific type and remove the runtime check at beginning of in_pcbnotifyall(). Use const sockaddr_in in in_pcbnotifyall() and const sockaddr_in6 in6_pcbnotify() as dst parameter. OK millert@
2024-01-27Declare address parameter in TCP SYN cache const.Alexander Bluhm
tcp6_ctlinput() casted a constant sockaddr_sin6 to non-const sockaddr. sa6_src may be &sa6_any which lives in read-only data section. Better pass down the const addresses to syn_cache_lookup(). They are needed for hash lookup and are not modified. OK mvs@
2024-01-11Fix white spaces in TCP.Alexander Bluhm
2023-11-29Document inp_socket as immutable and remove NULL checks.Alexander Bluhm
Struct inpcb field inp_socket is initialized in in_pcballoc(). It is not NULL and never changed. OK mvs@
2023-11-26Remove inp parameter from ip_output().Alexander Bluhm
ip_output() received inp as parameter. This is only used to lookup the IPsec level of the socket. Reasoning about MP locking is much easier if only relevant data is passed around. Convert ip_output() to receive constant inp_seclevel as argument and mark it as protected by net lock. OK mvs@
2023-07-06Convert tcp_now() time counter to 64 bit.Alexander Bluhm
After changing tcp now tick to milliseconds, 32 bits will wrap around after 49 days of uptime. That may be a problem in some places of our stack. Better use a 64 bit counter. As timestamp option is 32 bit in TCP protocol, use the lower 32 bit there. There are casts to 32 bits that should behave correctly. Start with random 63 bit offset to avoid uptime leakage. 2^63 milliseconds result in 2.9*10^8 years of possible uptime. OK yasuoka@
2023-05-10Implement TCP send offloading, for now in software only. This isAlexander Bluhm
meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec needs the software fallback in general. For performance comparison or to workaround possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counter with chopped and generated packets. based on work from jan@ tested by jmc@ jan@ Hrvoje Popovski OK jan@ claudio@
2022-11-07Modify TCP receive buffer size auto scaling to use the smoothed RTTYASUOKA Masahiko
(SRTT) instead of the timestamp option. Since the timestamp option is disabled on some OSs (eg. Windows) or dropped by some firewalls/routers, in such a case the window size had been fixed at 16KB, this limits throughput at very low on high latency networks. Also replace "tcp_now" from 2HZ tick counter to binuptime in milliseconds to calculate the SRTT better. tested by krw matthieu jmatthew dlg djm stu stsp ok claudio
2022-10-03System calls should not fail due to temporary memory shortage inAlexander Bluhm
malloc(9) or pool_get(9). Pass down a wait flag to pru_attach(). During syscall socket(2) it is ok to wait, this logic was missing for internet pcb. Pfkey and route sockets were already waiting. sonewconn() must not wait when called during TCP 3-way handshake. This logic has been preserved. Unix domain stream socket connect(2) can wait until the other side has created the socket to accept. OK mvs@
2022-09-03Initialize TCP mutex forgotten in previous commit.Alexander Bluhm
found by Hrvoje Popovski with witness; OK mvs@
2022-09-03Use a mutex to update tcp_maxidle, tcp_iss, and tcp_now. ThisAlexander Bluhm
removes pressure from the exclusive netlock in tcp_slowtimo(). Reading is done atomically. Ensure that the tcp_now value is read only once per function to provide consistent time. OK yasuoka@
2022-08-30Refactor internet PCB lookup function. Rename in_pcbhashlookup()Alexander Bluhm
so the public API is in_pcblookup() and in_pcblookup_listen(). For internal use introduce in_pcbhash_insert() and in_pcbhash_lookup() to avoid code duplication. Routing domain is unsigned, change the type to u_int. OK mvs@
2022-08-08To make protocol input functions MP safe, internet PCB need protection.Alexander Bluhm
Use their reference counter in more places. The in_pcb lookup functions hold the PCBs in hash tables protected by table->inpt_mtx mutex. Whenever a result is returned, increment the ref count before releasing the mutex. Then the inp can be used as long as neccessary. Unref it at the end of all functions that call in_pcb lookup. As a shortcut, pf may also hold a reference to the PCB. When pf_inp_lookup() returns it, it also incements the ref count and the caller can handle it like the inp from table lookup. OK sashan@
2022-03-02The return value of in6_pcbnotify() is never used. Make it a voidAlexander Bluhm
function. OK gnezdo@ mvs@ florian@ sashan@
2022-01-02spellingJonathan Gray
ok jmc@ reads ok tb@
2021-11-11Do not call ip_deliver() recursively from IPsec. As there is noAlexander Bluhm
crypto task anymore, it is possible to return the next protocol. Then ip_deliver() will walk the header chain in its loop. IPsec bridge(4) tested by jan@ OK mvs@ tobhe@ jan@
2021-10-23There is an m_pullup() down in AH input. As it may free or changeAlexander Bluhm
the mbuf, the callers must be careful. Although there is no bug, use the common pattern to handle this. Pass down an mbuf pointer mp and let m_pullup() update the pointer in all callers. It looks like the tcp signature functions should not be called. Avoid an mbuf leak and return an error. OK mvs@
2021-10-13The function ipip_output() was registered as .xf_output() xformAlexander Bluhm
function. But was is never called via this pointer. It would have immediatley crashed as mp is always NULL when called via .xf_output(). Do not set .xf_output to ipip_output. This allows to pass only the parameters which are actually needed and the control flow is clearer. OK mpi@
2021-07-14Resend the TCP packet only if the MTU locked flag appears at theAlexander Bluhm
route and was not there before. This should prevent a recursion in path MTU discovery with TCP over IPsec. reported and tested Matthias Schmidt; tested and OK tobhe@
2021-07-08The xformsw array never changes. Declare struct xformsw constantAlexander Bluhm
and map data read only. OK deraadt@ mvs@ mpi@
2021-06-30For path MTU discovery tcp_mtudisc() should resend a TCP packet byAlexander Bluhm
calling tcp_output() if the TCP maximum segment size changes. But that did not work, as the new value was compared before tcp_mss() had a chance to modify it. Move the comparison and change it from not equal to greater than. It makes only sense to resend a packet immediately if it becomes smaller and is more likely to fit. OK sashan@ tobhe@
2021-02-25we don't have to cast to caddr_t when calling m_copydata anymore.David Gwynne
the first cut of this diff was made with coccinelle using this spatch: @rule@ type caddr_t; expression m, off, len, cp; @@ -m_copydata(m, off, len, (caddr_t)cp) +m_copydata(m, off, len, cp) i had fix it's opinionated idea of formatting by hand though, so i'm not sure it was worth it. ok deraadt@ bluhm@
2020-07-24netinet: tcp_close(): delay reaper timeout by one tickcheloha
Zero-tick timeouts rely on implicit behavior in the timeout layer that inhibits optimizations in softclock(). bluhm@ says waiting a tick for the reaper shouldn't break anything. ok bluhm@
2018-10-04Revert the inpcb table mutex commit. It triggers a witness panicAlexander Bluhm
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-20As a step towards per inpcb or socket locks, remove the net lockAlexander Bluhm
for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
2018-06-14Use mbuf (not cluster) always for t_template of tcpcb.YASUOKA Masahiko
ok bluhm
2018-05-08Historically there were slow and fast tcp timeouts. That is whyAlexander Bluhm
the delack timer had a different implementation. Use the same mechanism for all TCP timer. OK mpi@ visa@
2018-04-02Use memcpy on freshly allocated memory and add the free size.David Hill
OK millert@
2018-03-18Refactor tcp_mtudisc() like NetBSD did. Do the route lookup onlyAlexander Bluhm
if the tcpcb exits. OK mpi@
2018-01-23The TCP reaper timeout was still imlemented as soft timeout. SoAlexander Bluhm
it could run immediately and was not synchronized with the TCP timeouts, although that was the intension when it was introduced in revision 1.85. Convert the reaper to an ordinary TCP timeout so it is scheduled on the same timeout thread after all timeouts have finished. A net lock is not necessary as the process calling tcp_close() will not access the tcpcb after arming the reaper timeout. OK mikeb@
2017-12-07Initialize tcp_secret in tcp_initMike Belopuhov
The initialization of a secret SHA256 context for generating TCP initial sequence numbers is moved out of tcp_set_iss_tsm used to set up ISN for new connections and into tcp_init, sparing the need for a global flag. OK deraadt, visa, mpi
2017-10-22Unconditionally enable TCP selective acknowledgements (SACK)Mike Belopuhov
OK deraadt, mpi, visa, job
2017-06-26Assert that the corresponding socket is locked when manipulating socketMartin Pieuchot
buffers. This is one step towards unlocking TCP input path. Note that all the functions asserting for the socket lock are not necessarilly MP-safe. All the fields of 'struct socket' aren't protected. Introduce a new kernel-only kqueue hint, NOTE_SUBMIT, to be able to tell when a filter needs to lock the underlying data structures. Logic and name taken from NetBSD. Tested by Hrvoje Popovski. ok claudio@, bluhm@, mikeb@
2017-05-18Merge the content of <netinet/tcpip.h> and <netinet6/tcpipv6.h> inMartin Pieuchot
<netinet/tcp_debug.h>. The IPv6 variant was always included and the IPv4 version is not present on all systems. Most of the offending ports are already fixed, thanks to sthen@!
2017-05-09Convert diagnostic panic to compile time assert in tcp6_ctlinput().Alexander Bluhm
No binary change. OK mpi@
2017-05-04Introduce sstosa() for converting sockaddr_storage with a type safeAlexander Bluhm
inline function instead of casting it to sockaddr. While there, use inline instead of __inline for all these conversions. Some struct sockaddr casts can be avoided completely. OK dhill@ mpi@
2017-04-19Use the rt_rmx defines that hide the struct rt_kmetrics indirection.Alexander Bluhm
No binary change. OK mpi@
2017-02-09percpu counters for TCP statsJeremie Courreges-Anglas
ok mpi@ bluhm@
2017-01-26Reduce the difference between struct protosw and ip6protosw. TheAlexander Bluhm
IPv4 pr_ctlinput functions did return a void pointer that was always NULL and never used. Make all functions void like in the IPv6 case. OK mpi@
2017-01-10Remove NULL checks before m_free(9), it deals with it.Martin Pieuchot
ok bluhm@, kettenis@
2016-12-20No need for splsoftnet()/splx() dance around a pool_put() if the poolMartin Pieuchot
has IPL_SOFTNET as ipl. ok mikeb@, kettenis@
2016-09-24ANSIfy netinet/; from David HillChristian Weisgerber
2016-09-15all pools have their ipl set via pool_setipl, so fold it into pool_init.David Gwynne
the ioff argument to pool_init() is unused and has been for many years, so this replaces it with an ipl argument. because the ipl will be set on init we no longer need pool_setipl. most of these changes have been done with coccinelle using the spatch below. cocci sucks at formatting code though, so i fixed that by hand. the manpage and subr_pool.c bits i did myself. ok tedu@ jmatthew@ @ipl@ expression pp; expression ipl; expression s, a, o, f, m, p; @@ -pool_init(pp, s, a, o, f, m, p); -pool_setipl(pp, ipl); +pool_init(pp, s, a, ipl, f, m, p);
2016-09-06pool_setipl for various netinet and netinet6 bitsDavid Gwynne
thank you to everyone who helped reviewed these diffs ok mpi@
2016-09-03Reduce the factor of the limits derived form NMBCLUSTERS. We wantAlexander Bluhm
the additional clusters in the socket buffer and not elsewhere. OK claudio@
2016-08-31Use 'sc_route{4,6}' directly instead of casting them to 'struct route *'.Martin Pieuchot
This is another little step towards deprecating 'struct route{,_in6}'. ok florian@