path: root/sys/netinet
Age | Commit message | Author
2021-02-11 | Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport(). | Patrick Wildt
Technically the whole point of the stoeplitz API is that it's symmetric, meaning that the order of addresses and ports doesn't matter and will produce the same hash value. Coverity CID 1501717 ok dlg@
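The symmetry property mentioned above can be illustrated with a small model. This is a sketch, not the kernel's stoeplitz code: it uses a plain Toeplitz hash with a key built from a repeating 16-bit pattern, which is one known way to make the hash independent of address and port order. The key value `0x6d5a` and the helper names `toeplitz_hash` and `flow_hash` are assumptions chosen for illustration.

```python
import socket
import struct

# Hypothetical key: a 16-bit pattern repeated. Because the key has a period
# of 16 bits, fields that sit a multiple of 16 bits apart see identical key
# windows, so swapping them does not change the XOR result.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Plain Toeplitz hash: XOR the 32-bit key window at every set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        byte, bit = divmod(i, 8)
        if data[byte] & (0x80 >> bit):
            # 32-bit window of the key starting at bit offset i
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_hash(src: str, dst: str, sport: int, dport: int) -> int:
    """Hash an IPv4 4-tuple laid out as src addr, dst addr, src port, dst port."""
    data = (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack(">HH", sport, dport))
    return toeplitz_hash(SYMMETRIC_KEY, data)


forward = flow_hash("192.0.2.1", "198.51.100.7", 12345, 80)
reverse = flow_hash("198.51.100.7", "192.0.2.1", 80, 12345)
print(forward == reverse)
```

With such a key, passing the arguments in the "wrong" order yields the same value, which is why the swap in this commit is a correctness-of-intent fix rather than a behaviour change.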
2021-02-10 | If pf changes the routing table when sending packets, the kernel | Alexander Bluhm
could get stuck in an endless recursion during TCP path MTU discovery. Create a dynamic host route in ip_output() that can be used by tcp_mtudisc() to store the MTU. Reported by Peter Mueller and Sebastian Sturm OK claudio@
2021-02-08 | Remove maxburst feature from tcp_output | jan
OK bluhm@, claudio@, deraadt@
2021-02-08 | Start refcounting interface groups with 1. if_creategroup() returns | Alexander Bluhm
a new object that is already refcounted, so carp attach does not reach into internal structures. Add kasserts to detect counter overflow or underflow. OK mvs@
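The refcounting convention described above can be modelled as follows. This is a minimal sketch under assumptions: the class name `RefCounted`, the limit `REFCNT_MAX`, and the method names are hypothetical and do not mirror the kernel's structures. It only shows an object that is born with one reference and asserts on overflow and underflow.

```python
REFCNT_MAX = 2**31 - 1  # hypothetical upper bound for the counter


class RefCounted:
    """Object whose creator already holds the first reference."""

    def __init__(self):
        self.refcnt = 1  # start at 1: the caller of the constructor owns it

    def ref(self):
        assert self.refcnt > 0, "taking a reference on a dead object"
        assert self.refcnt < REFCNT_MAX, "reference counter overflow"
        self.refcnt += 1

    def unref(self) -> bool:
        """Drop one reference; return True when the object may be freed."""
        assert self.refcnt > 0, "reference counter underflow"
        self.refcnt -= 1
        return self.refcnt == 0


group = RefCounted()      # creator's reference
group.ref()               # a second user attaches
print(group.unref())      # second user detaches
print(group.unref())      # creator releases, object can be freed
```

Starting at 1 means a consumer never has to bump the counter of a freshly created object by reaching into its fields.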
2021-02-06 | Simplex interface sends packet back without hardware checksum | Alexander Bluhm
offloading. The checksum must be calculated in software. Use the same condition in ether_resolve() to send the broadcast packet back to the stack and in in_ifcap_cksum() to force software checksumming. This fixes regress/sys/kern/sosplice/loop. OK procter@
2021-02-03 | Turns off the direct ACK on every other segment | jan
The kernel uses a huge amount of processing time for sending ACKs to the sender on the receiving interface. After receiving a data segment, we send out two ACKs. The first one in tcp_input() directly after receiving. The second ACK is sent out after the userland or the sosplice task read some data out of the socket buffer. Thus, we save some processing time and improve network performance. Longer tested by sthen@ OK claudio@
2021-02-02 | If IP_MULTICAST_IF or IP_ADD_MEMBERSHIP pass an interface index to the | Claudio Jeker
kernel make sure that the rdomain of that interface is the same as the rdomain of the inpcb. Problem spotted and fix tested by semarie@ OK bluhm@ mvs@
2021-02-01 | Fix path MTU discovery for ESP tunneled in IPv6. We always want | Alexander Bluhm
short TCP segments or fragments encapsulated in ESP instead of fragmented ESP packets. Pass the don't fragment flag down along the stack so that dynamic routes with MTU are created eventually. with and OK markus@; OK tobhe@
2021-01-28 | Drop tcp_trace() from SMALL_KERNEL builds to make room on amd64 floppy | Visa Hankala
OK deraadt@
2021-01-25 | if stoeplitz is enabled, use it to provide a flowid for tcp packets. | David Gwynne
drivers that implement rss and multiple rings depend on the symmetric toeplitz code, and use it to generate a key that decides which rx ring a packet lands on. if the toeplitz code is enabled, this diff has the pcb and tcp layer use the toeplitz code to generate a flowid for packets they send, which in turn is used to pick a tx ring. because the nic and the stack use the same key, the tx and rx sides end up with the same hash/flowid. at the very least this means that the same rx and tx queue pair on a particular nic are used for both sides of the connection. as the stack becomes more parallel, it will also help keep both sides of the tcp connection processing in the one place.
2021-01-21 | carp(4): convert ifunit() to if_unit(9) | mvs
ok dlg@ bluhm@
2021-01-18 | add IPPROTO_SCTP, ok claudio@ | Stuart Henderson
2021-01-16 | Extend IP_MULTICAST_IF to take either an address (struct in_addr), a | Claudio Jeker
struct ip_mreq or a struct ip_mreqn. Using struct ip_mreqn allows to pass an interface index instead of specifying the multicast interface via its IP address. This is also the API implemented by Linux and FreeBSD and should help porting software. OK bluhm@ phessler@ robert@
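One way an option handler can accept all three argument types is to tell them apart by length. The sketch below models that idea in userland; it is an assumption about the approach, not the kernel code. The function name `parse_multicast_if`, the rule that a nonzero index takes precedence over the address, and the returned tuples are all hypothetical. The field layout follows the common definitions: `struct ip_mreq` holds a group and an interface address, `struct ip_mreqn` holds a group, an address, and an integer interface index.

```python
import socket
import struct

IN_ADDR_LEN = 4                              # struct in_addr
IP_MREQ_LEN = struct.calcsize("=4s4s")       # struct ip_mreq  (8 bytes)
IP_MREQN_LEN = struct.calcsize("=4s4si")     # struct ip_mreqn (12 bytes)


def parse_multicast_if(optval: bytes):
    """Classify an IP_MULTICAST_IF-style argument by its length."""
    if len(optval) == IN_ADDR_LEN:
        return ("address", socket.inet_ntoa(optval))
    if len(optval) == IP_MREQ_LEN:
        _group, iface = struct.unpack("=4s4s", optval)
        return ("address", socket.inet_ntoa(iface))
    if len(optval) == IP_MREQN_LEN:
        _group, addr, ifindex = struct.unpack("=4s4si", optval)
        # Assumption: a nonzero index wins over the address field.
        if ifindex != 0:
            return ("ifindex", ifindex)
        return ("address", socket.inet_ntoa(addr))
    raise ValueError("unsupported option length %d" % len(optval))


mreqn = struct.pack("=4s4si",
                    socket.inet_aton("224.0.0.251"),
                    socket.inet_aton("0.0.0.0"),
                    3)
print(parse_multicast_if(socket.inet_aton("192.0.2.1")))
print(parse_multicast_if(mreqn))
```

Selecting by index is what makes the option usable on interfaces that have no IPv4 address of their own.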
2021-01-15 | As documented in sysctl(2) net.inet.ip.forwarding can be 2. | Alexander Bluhm
Relax input validation and use integer comparison. OK kn@ mvs@ sthen@
2021-01-11 | Create a path MTU host route for IPsec over IPv6. Basically the | Alexander Bluhm
code is copied from IPv4 and adapted. Some things are changed in v4 to make it look similar.
- ip6_forward increases the noroute error counter, do that in ip_forward, too.
- Pass more specific sockaddr_in6 to icmp6_mtudisc_clone().
- IPv6 may also use reject routes for IPsec PMTU clones.
- To pass a route_in6 to ip6_output_ipsec_send() introduce one in ip6_forward(). That is the same what IPv4 does. Note that dst and sin6 switch roles.
- Copy comments from ip_output_ipsec_send() to ip6_output_ipsec_send() to make code similar.
- Implement dynamic IPv6 IPsec PMTU routes.
OK tobhe@
2021-01-09 | Enforce range with sysctl_int_bounded in ipip_sysctl | gnezdo
OK millert@
2021-01-09 | Enforce range with sysctl_int_bounded in tcp_sysctl | gnezdo
One case uses the explicit range from the code and the other was inferred from reading the usage. OK millert@
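The idea behind a bounded integer setter can be sketched as below. This is an illustrative model only: the function name `set_bounded_int`, its signature, and the choice of `EINVAL` for an out-of-range value are assumptions, not the kernel's sysctl_int_bounded interface. The 0..2 range in the usage lines is just an example of a three-valued knob.

```python
import errno


def set_bounded_int(current: int, new: int, minimum: int, maximum: int):
    """Return (error, value): accept `new` only if it lies in [minimum, maximum]."""
    if not (minimum <= new <= maximum):
        # Reject the write and keep the old value (error code is an assumption).
        return errno.EINVAL, current
    return 0, new


value = 0
err, value = set_bounded_int(value, 2, 0, 2)   # accepted
print(err, value)
err, value = set_bounded_int(value, 3, 0, 2)   # rejected, value unchanged
print(err, value)
```

Centralising the range check means each variable only has to declare its bounds instead of carrying its own validation code.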
2021-01-07 | Extend IP_ADD_MEMBERSHIP to also support struct ip_mreqn. | Claudio Jeker
struct ip_mreqn allows to use the interface index to select the interface for multicast packets which makes it possible to use this with unnumbered interfaces. OK dlg@ robert@
2021-01-04 | - fix use after free, when packet gets dropped. | Alexandr Nedvedicky
patch submitted by Ralf Horstmann from ackstorm.de OK dlg@
2020-12-20 | Accept reject and blackhole routes for IPsec PMTU discovery. | Alexander Bluhm
Since revision 1.87 of ip_icmp.c icmp_mtudisc_clone() ignored reject routes. Otherwise TCP would clone these routes for PMTU discovery. They will not work, even after dynamic routing has found a better route than the reject route. With IPsec the use case is different. First you need a route, but then the flow handles the packet without routing. Usually this route should be a reject route to avoid sending unencrypted traffic if the flow is missing. But IPsec needs this route for PMTU discovery, so use it for that. OK claudio@ tobhe@
2020-12-18 | Make sure the first packet of an SA has sequence number 1 (as described in | tobhe
RFC 4302 and RFC 4303). It seems this was changed by accident when support for 64 bit sequence numbers was added. ok bluhm@ patrick@
2020-12-16 | Use ESP sequence number as IV for AES-CTR, AES-GCM and Chacha20. | tobhe
This eliminates the risk for IV reuse because of random collisions and increases performance a little. ok patrick@ markus@
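Why a counter-derived IV cannot collide, and how the first packet ends up with sequence number 1, can be shown with a small model. This is a sketch under assumptions: the class `OutboundSA`, the 8-byte big-endian encoding, and the overflow handling are hypothetical and are not taken from the kernel's ESP code.

```python
import struct


class OutboundSA:
    """Toy outbound security association with a 64-bit sequence counter."""

    def __init__(self):
        self.seq = 0  # nothing sent yet

    def next_packet(self):
        """Return (sequence number, 8-byte IV) for the next packet."""
        if self.seq + 1 >= 1 << 64:
            raise OverflowError("sequence number space exhausted; rekey needed")
        self.seq += 1  # increment first, so the first packet carries 1
        # Assumed encoding: the IV is the sequence number as 8 big-endian bytes.
        iv = struct.pack(">Q", self.seq)
        return self.seq, iv


sa = OutboundSA()
first_seq, first_iv = sa.next_packet()
ivs = {first_iv} | {sa.next_packet()[1] for _ in range(999)}
print(first_seq, first_iv.hex(), len(ivs))
```

Since the counter never repeats within the lifetime of an SA, every IV is distinct by construction, whereas randomly drawn IVs can only be distinct with high probability.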
2020-11-16 | Replace sysctl_rdint with sysctl_bounded_args entries in net.inet* | gnezdo
2020-11-16 | Remove the cases folded into sysctl_bounded_args but left behind | gnezdo
divert_sysctl and divert6_sysctl get a tiny bit slimmer.
2020-11-07 | Rework source IP address setting. | denis
- Move most of the processing out of rtable.c (reasonable tb@, ok bluhm@)
- Remove memory allocation, store pointer to existing ifaddr
- Fix tunnel interface handling
looks fine mpi@
2020-11-05 | Enable support for ASN1_DN ipsec identifiers. | Peter Hessler
Tested with multiple Windows 10 Pro (ver 2004) clients, and OpenBSD+iked as the server. OK tobhe@ sthen@ kn@
2020-11-05 | Replace wrong cast with satosin. | denis
Advised by bluhm@
2020-11-02 | Move TCPCTL_ALWAYS_KEEPALIVE into tcpctl_vars | gnezdo
OK deraadt
2020-10-29 | Add feature to force the selection of source IP address | denis
Based/previous work on an idea from deraadt@ Input from claudio@, djm@, deraadt@, sthen@ OK deraadt@
2020-10-28 | When generating the ICMP6 response to an IPv6 packet, the kernel | Alexander Bluhm
could use mbuf memory after freeing it. If m_pullup() allocates a new mbuf, the caller uses the old pointer. found and reported by Maxime Villard, thanks OK claudio@ markus@ denis@
2020-09-22 | whitespace | tobhe
2020-09-01 | Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr | gnezdo
The best-guessed limits will be tested by trial.
2020-09-01 | Convert icmp6_sysctl to sysctl_bounded_args | gnezdo
The best-guessed limits will be tested by trial.
2020-08-24 | Convert divert*_sysctl to sysctl_bounded_args | gnezdo
OK sashan
2020-08-22 | Convert icmp_sysctl to sysctl_bounded_args | gnezdo
... these all look fine, deraadt@
2020-08-22 | Convert ip_sysctl to sysctl_bounded_args | gnezdo
2020-08-22 | Convert udp_sysctl to sysctl_bounded_args | gnezdo
2020-08-18 | Style fixups from hurried commits | gnezdo
Thanks kettenis@ for pointing out. ok kettenis@
2020-08-18 | Convert tcp_sysctl to sysctl_bounded_args | gnezdo
This introduces bounds checks for many net.inet.tcp sysctl variables. Folded some fitting cases into the framework: tcp_do_sack, tcp_do_ecn. ok deraadt@
2020-08-17 | Simplify igmp_sysctl to directly return error in default case | gnezdo
This replaces a piece of observationally identical code which was much more complicated. ok mpi@
2020-08-08 | No longer prevent TCP connections to IPv6 anycast addresses. | Florian Obser
RFC 4291 dropped this requirement from RFC 3513: o An anycast address must not be used as the source address of an IPv6 packet. And from that requirement draft-itojun-ipv6-tcp-to-anycast rightly concluded that TCP connections must be prevented. The draft also states: The proposed method MUST be removed when one of the following events happens in the future: o Restriction imposed on IPv6 anycast address is loosened, so that anycast address can be placed into source address field of the IPv6 header[...] OK jca
2020-08-05 | Don't compare pointers against zero. | Marcus Glocker
Reported by Peter J. Philipp. ok mvs@ deraadt@
2020-08-01 | Move range check inside sysctl_int_arr | gnezdo
Range violations are now consistently reported as EOPNOTSUPP. Previously they were mixed with ENOPROTOOPT. OK kn@
2020-07-28 | Don't treat it as an error if carppeer is a unicast address and the peer is down. | YASUOKA Masahiko
ok kn
2020-07-28 | After the previous commit, src/regress/sys/netinet/carp triggered | Alexander Bluhm
an uvm fault. Check that ifp0 is not NULL. OK sashan@ mvs@
2020-07-24 | netinet: tcp_close(): delay reaper timeout by one tick | cheloha
Zero-tick timeouts rely on implicit behavior in the timeout layer that inhibits optimizations in softclock(). bluhm@ says waiting a tick for the reaper shouldn't break anything. ok bluhm@
2020-07-24 | Use interface index instead of pointer to `ifnet' in carp(4). | mvs
ok sashan@
2020-07-22 | deprecate interface input handler lists, just use one input function. | David Gwynne
the interface input handler lists were originally set up to help us during the initial mpsafe network stack work. at the time not all the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc) were mpsafe, so we wanted a way to avoid them by default, and only take the kernel lock hit when they were specifically enabled on the interface. since then, they have been fixed up to be mpsafe.
i could leave the list in place, but it has some semantic problems. because virtual interfaces filter packets based on the order they were attached to the parent interface, you can get packets taken away in surprising ways, especially when you reboot and netstart does something different to what you did by hand. by hardcoding the order that things like vlan and bridge get to look at packets, we can document the behaviour and get consistency.
it also means we can get rid of a use of SRPs which were difficult to replace with SMRs. the interface input handler list is an SRPL, which we would like to deprecate. it turns out that you can sleep during stack processing, which you're not supposed to do with SRPs or SMRs, but SRPs are a lot more forgiving and it worked.
lastly, it turns out that this code is faster than the input list handling, so lots of winning all around.
special thanks to hrvoje popovski and aaron bieber for testing. this has been in snaps as part of a larger diff for over a week.
2020-07-22 | move carp_input into ether_input, instead of via an input handler. | David Gwynne
carp_input is only tried after vlan and bridge handling is done, and after the ethernet packet doesn't match the parent interface's mac address. this has been in snaps as part of a larger diff for over a week.
2020-07-22 | add code to coordinate how bridges attach to ethernet interfaces. | David Gwynne
this is the first step in refactoring how ethernet frames are demuxed by virtual interfaces, and also in deprecating interface input list handling. we now have drivers for three types of virtual bridges, bridge(4), switch(4), and tpmr(4), and it doesn't make sense for any of them to be enabled on the same "port" interfaces at the same time. currently you can add a port interface to multiple types of bridge, but which one gets to steal the packets depends on the order in which they were attached. this creates an ether_brport structure that holds an input function for the bridge, and optionally some per port state that the bridge can use. arpcom has a single pointer to one of these structs that will be used during normal ether_input processing to see if a packet should be passed to a bridge, and will be used instead of an if input handler. because it is a single pointer, it will make sure only one bridge of any type is attached to a port at any one time. this has been in snaps as part of a larger diff for over a week.
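The single-pointer rule described above can be modelled in a few lines. This is a sketch under assumptions, not the ether_brport code: the names `Port`, `brport_attach`, `brport_detach`, and the use of `EBUSY` for a port that is already claimed are hypothetical. It only shows how one slot per port prevents two bridges of any type from being attached at the same time.

```python
import errno


class Port:
    """Toy ethernet port with a single slot for a bridge input function."""

    def __init__(self, name: str):
        self.name = name
        self.brport = None  # at most one bridge may occupy this slot


def brport_attach(port: Port, input_fn) -> int:
    """Claim the port for a bridge; fail if another bridge already holds it."""
    if port.brport is not None:
        return errno.EBUSY  # error code is an assumption
    port.brport = input_fn
    return 0


def brport_detach(port: Port) -> None:
    port.brport = None


def bridge_input(frame: bytes) -> str:
    return "bridge"


def tpmr_input(frame: bytes) -> str:
    return "tpmr"


port = Port("em0")
print(brport_attach(port, bridge_input))   # first bridge wins the slot
print(brport_attach(port, tpmr_input))     # second attach is refused
```

Because the input path consults only this one slot, which bridge sees a frame no longer depends on attachment order.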