path: root/sys/netinet/ip_output.c
Age Commit message (Author)
2023-05-22 Fix TSO for traffic to a local address on a physical interface. (Alexander Bluhm)
When sending TCP packets with software TSO to the local address of a physical interface, the TCP checksum was miscalculated. As the small MSS is taken from the physical interface, but the large MTU of the loopback interface is used, large TSO packets are generated, but sent directly to the loopback interface. There we need the regular pseudo header checksum and not the modified one without the packet length. To avoid this confusion, use the same decision for checksum generation in in_proto_cksum_out() as for using hardware TSO in tcp_if_output_tso(). bug reported and tested by robert@ bket@ Hrvoje Popovski OK claudio@ jan@
2023-05-15 Implement the TCP/IP layer for hardware TCP segmentation offload. (Alexander Bluhm)
If the driver of a network interface claims to support TSO, do not chop the packet in software, but pass it down to the interface layer. Precalculate parts of the pseudo header checksum, but without the packet length. The length of all generated smaller packets is not known yet. Driver and hardware will use the mbuf packet header field ph_mss to calculate it and update the checksum. Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware might support only one protocol family. The old flag IFXF_TSO is only relevant for large receive offload. It is misnamed, but keep that for now. Note that drivers do not set TSO capabilities yet. Also the ifconfig flags and pseudo interfaces capabilities will be done separately. So this commit should not change behavior. heavily based on the work from jan@; OK sashan@
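The split described in the 2023-05-15 entry, where the address and protocol part of the pseudo header checksum is precalculated and the per-segment length is added later, can be sketched in userland C. This is only an illustration under stated assumptions: the helper names pseudo_sum_nolen(), pseudo_sum_addlen() and pseudo_sum_full() are hypothetical and do not come from ip_output.c, and all values are in host byte order for readability, whereas the kernel operates on network byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t
cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical helper: partial pseudo header sum over source address,
 * destination address and protocol, deliberately leaving out the length.
 */
static uint16_t
pseudo_sum_nolen(uint32_t src, uint32_t dst, uint8_t proto)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	return cksum_fold(sum);
}

/*
 * Hypothetical helper: add the TCP length of one generated segment to
 * the partial sum, as driver or hardware would do per segment.
 */
static uint16_t
pseudo_sum_addlen(uint16_t partial, uint16_t tcplen)
{
	assert(tcplen >= 20);	/* at least a minimal TCP header */
	return cksum_fold((uint32_t)partial + tcplen);
}

/* Regular pseudo header sum that includes the length from the start. */
static uint16_t
pseudo_sum_full(uint32_t src, uint32_t dst, uint8_t proto, uint16_t tcplen)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += tcplen;
	return cksum_fold(sum);
}
```

Because one's-complement addition is associative, adding the length to the folded partial sum gives the same result as the regular sum, which is why the length can be deferred until the segment size is known.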
2023-05-13 Instead of implementing IPv4 header checksum creation everywhere, (Alexander Bluhm)
introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out(). OK claudio@
2023-05-10 Implement TCP send offloading, for now in software only. This is (Alexander Bluhm)
meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec need the software fallback in general. For performance comparison or to work around possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counter with chopped and generated packets. based on work from jan@ tested by jmc@ jan@ Hrvoje Popovski OK jan@ claudio@
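The arithmetic behind chopping one large TCP packet to the maximum segment size, as the 2023-05-10 entry describes, can be sketched as follows. This is a simplified userland illustration and not the kernel code: struct tso_plan and tso_plan_chop() are hypothetical names, and the sketch only counts payload bytes, ignoring the per-segment IP and TCP headers and the sequence number updates that a real implementation has to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how a large payload is split up. */
struct tso_plan {
	size_t nsegs;	/* number of generated segments */
	size_t lastlen;	/* payload length of the final segment */
};

/*
 * Hypothetical helper: split paylen bytes of TCP payload into
 * segments carrying at most mss bytes each.
 */
static struct tso_plan
tso_plan_chop(size_t paylen, size_t mss)
{
	struct tso_plan p;

	assert(mss > 0);
	/* Round up so that a partial trailing segment is counted. */
	p.nsegs = (paylen + mss - 1) / mss;
	/* All segments but the last carry exactly mss bytes. */
	p.lastlen = (paylen == 0) ? 0 : paylen - (p.nsegs - 1) * mss;
	return p;
}
```

For example, a 4000 byte payload with an MSS of 1460 yields three segments, the last one carrying 1080 bytes.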
2023-05-08 The call to in_proto_cksum_out() is only needed before the packet (Alexander Bluhm)
is passed to ifp->if_output(). The fragment code has its own checksum calculation and the other paths end in goto bad. OK claudio@
2023-05-07 In preparation for TSO in software, clean up the fragment code. Use (Alexander Bluhm)
if_output_ml() to send mbuf lists to interfaces. This can be used for TSO, fragments, ARP and ND6. Rename variable fml to ml. In pf_route6() split the if else block. Put the safety check (hlen + firstlen < tlen) into ip_fragment(). It makes the code correct in case the packet is too short to be fragmented. This should not happen, but other functions also have this logic. No functional change. OK sashan@
2022-08-12 Remove differences between ip_fragment() and ip6_fragment(). They (Alexander Bluhm)
do nearly the same thing, so they should look similar. OK sashan@
2022-05-25 Call if_put(9) after we finish with `ia' within ip_getmoptions(). (Vitaliy Makkoveev)
if_put(9) call means we finish work with `ifp' and it could be destroyed. `ia' is the pointer to 'in_ifaddr' data belonging to `ifp', so we need to release the corresponding `ifp' after we finish dealing with `ia'. `if_addrlist' list destruction and ip_getmoptions() are serialized with kernel and net locks so this is not critical, but looks inconsistent. ok bluhm@
2022-01-04 Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and (YASUOKA Masahiko)
trees. ipsp_ids_lookup() returns `ids' with bumped reference counter. original diff from mvs ok mvs
2021-12-23 IPsec is not MP safe yet. To allow forwarding in parallel without (Alexander Bluhm)
dirty hacks, it is better to protect IPsec input and output with kernel lock. Not much is lost as crypto needs the kernel lock anyway. From here we can refine the lock later. Note that there is no kernel lock in the SPD lookup path. Goal is to keep that lock free to allow fast forwarding with non IPsec traffic. tested by Hrvoje Popovski; OK tobhe@
2021-12-20 Use per-CPU counters for tunnel descriptor block (TDB) statistics. (Vitaliy Makkoveev)
'tdb_data' struct became unused and was removed. Tested by Hrvoje Popovski. ok bluhm@
2021-12-03 Add TDB reference counting to ipsp_spd_lookup(). If an output (Alexander Bluhm)
pointer is passed to the function, it will return a refcounted TDB. The ref happens when ipsp_spd_inp() copies the pointer from ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after using it. tested by Hrvoje Popovski; OK mvs@ tobhe@
2021-12-01 Let ipsp_spd_lookup() return an error instead of a TDB. The TDB (Alexander Bluhm)
is not always needed, but the error value is necessary for the caller. As TDB should be refcounted, it makes no sense to always return it. Pass an output pointer for the TDB which can be NULL. OK mvs@ tobhe@
2021-11-24 When sending ICMP packets for IPsec path MTU discovery, the first (Alexander Bluhm)
ICMP packet could be wrong. The mtu was taken from the loopback interface as the tdb mtu was copied to the route too late. Without crypto task, ipsp_process_packet() returns the EMSGSIZE error earlier. Immediately update tdb and route mtu. IPv4 part from markus@; OK tobhe@
2021-07-27 Revert "Use per-CPU counters for tunnel descriptor block" diff. (mvs)
Panic reported by Hrvoje Popovski.
2021-07-26 Use per-CPU counters for tunnel descriptor block (tdb) statistics. (mvs)
'tdb_data' struct became unused and was removed. ok bluhm@
2021-07-08 Debug printfs in encdebug were inconsistent, some missing newlines (Alexander Bluhm)
produced ugly output. Move the function name and the newline into the DPRINTF macro. This simplifies the debug statements. OK tobhe@
2021-05-12 Use local copy of `ps_rtableid' in ip{,6}_ctloutput() and mark (mvs)
`ps_rtableid' as atomic. This allows us to unlock setrtable(2). ok claudio@ mpi@
2021-03-30 [ICMP] IP options lead to malformed reply (Alexandr Nedvedicky)
icmp_send() must update IP header length if IP options are appended. Such packet also has to be dispatched with IP_RAWOUTPUT flags. Bug reported and fix co-designed by Dominik Schreilechner _at_ siemens _dot_ com OK bluhm@
2021-03-20 use m_dup_pkthdr in ip_fragment to copy pkthdr info to fragments. (David Gwynne)
this ensures more stuff is copied, in particular the flowid information. this is also how v6 does it, which makes things more consistent. ok bluhm@
2021-03-01 Refactor ip_fragment() and ip6_fragment(). Use a mbuf list to (Alexander Bluhm)
simplify the handling of the fragment list. Now the functions ip_fragment() and ip6_fragment() always consume the mbuf. They free the mbuf and mbuf list in case of an error and take care about the counter. Adjust the code a bit to make v4 and v6 look similar. Fixes a potential mbuf leak when pf_route6() called pf_refragment6() and it failed. Now the mbuf is always freed by ip6_fragment(). OK dlg@ mvs@
2021-02-23 As ip_insertoptions() may prepend a mbuf, "goto bad" has to free (Alexander Bluhm)
the new chain. This fixes a potential memory leak in ip_output(). Also simplify a bunch of "goto done". OK kn@ mvs@
2021-02-23 Use NULL instead of 0 in `m_nextpkt' assignment. (mvs)
ok deraadt@ dlg@
2021-02-10 If pf changes the routing table when sending packets, the kernel (Alexander Bluhm)
could get stuck in an endless recursion during TCP path MTU discovery. Create a dynamic host route in ip_output() that can be used by tcp_mtudisc() to store the MTU. Reported by Peter Mueller and Sebastian Sturm OK claudio@
2021-02-06 Simplex interface sends packet back without hardware checksum (Alexander Bluhm)
offloading. The checksum must be calculated in software. Use the same condition in ether_resolve() to send the broadcast packet back to the stack and in in_ifcap_cksum() to force software checksumming. This fixes regress/sys/kern/sosplice/loop. OK procter@
2021-02-02 If IP_MULTICAST_IF or IP_ADD_MEMBERSHIP pass an interface index to the (Claudio Jeker)
kernel make sure that the rdomain of that interface is the same as the rdomain of the inpcb. Problem spotted and fix tested by semarie@ OK bluhm@ mvs@
2021-02-01 Fix path MTU discovery for ESP tunneled in IPv6. We always want (Alexander Bluhm)
short TCP segments or fragments encapsulated in ESP instead of fragmented ESP packets. Pass the don't fragment flag down along the stack so that dynamic routes with MTU are created eventually. with and OK markus@; OK tobhe@
2021-01-16 Extend IP_MULTICAST_IF to take either an address (struct in_addr), a (Claudio Jeker)
struct ip_mreq or a struct ip_mreqn. Using struct ip_mreqn allows passing an interface index instead of specifying the multicast interface via its IP address. This is also the API implemented by Linux and FreeBSD and should help porting software. OK bluhm@ phessler@ robert@
2021-01-11 Create a path MTU host route for IPsec over IPv6. Basically the (Alexander Bluhm)
code is copied from IPv4 and adapted. Some things are changed in v4 to make it look similar.
- ip6_forward increases the noroute error counter, do that in ip_forward, too.
- Pass more specific sockaddr_in6 to icmp6_mtudisc_clone().
- IPv6 may also use reject routes for IPsec PMTU clones.
- To pass a route_in6 to ip6_output_ipsec_send() introduce one in ip6_forward(). That is the same as what IPv4 does. Note that dst and sin6 switch roles.
- Copy comments from ip_output_ipsec_send() to ip6_output_ipsec_send() to make code similar.
- Implement dynamic IPv6 IPsec PMTU routes.
OK tobhe@
2021-01-07 Extend IP_ADD_MEMBERSHIP to also support struct ip_mreqn. (Claudio Jeker)
struct ip_mreqn allows using the interface index to select the interface for multicast packets which makes it possible to use this with unnumbered interfaces. OK dlg@ robert@
2020-12-20 Accept reject and blackhole routes for IPsec PMTU discovery. (Alexander Bluhm)
Since revision 1.87 of ip_icmp.c icmp_mtudisc_clone() ignored reject routes. Otherwise TCP would clone these routes for PMTU discovery. They will not work, even after dynamic routing has found a better route than the reject route. With IPsec the use case is different. First you need a route, but then the flow handles the packet without routing. Usually this route should be a reject route to avoid sending unencrypted traffic if the flow is missing. But IPsec needs this route for PMTU discovery, so use it for that. OK claudio@ tobhe@
2020-06-24 kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9) (cheloha)
time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
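The "lockless read loop" mentioned in the 2020-06-24 entry can be illustrated with a generation counter in userland C11. This is a sketch of the general technique only, assuming a single writer; struct timeslot, slot_write() and slot_read() are hypothetical names, and the actual timehands code in the kernel differs in its details. The 64-bit value is stored as two 32-bit halves to model a platform without atomic 64-bit reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot holding a 64-bit time value as two 32-bit halves. */
struct timeslot {
	atomic_uint gen;		/* 0 while an update is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Single writer: invalidate the slot, update both halves, publish. */
static void
slot_write(struct timeslot *ts, int64_t secs)
{
	unsigned int g;

	assert(ts != NULL);
	g = atomic_load_explicit(&ts->gen, memory_order_relaxed);
	atomic_store_explicit(&ts->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ts->lo, (uint32_t)secs, memory_order_relaxed);
	atomic_store_explicit(&ts->hi, (uint32_t)((uint64_t)secs >> 32),
	    memory_order_relaxed);
	if (++g == 0)			/* never publish generation 0 */
		g = 1;
	atomic_store_explicit(&ts->gen, g, memory_order_release);
}

/* Reader: retry until both halves were read under one generation. */
static int64_t
slot_read(struct timeslot *ts)
{
	unsigned int g;
	uint32_t lo, hi;

	do {
		g = atomic_load_explicit(&ts->gen, memory_order_acquire);
		lo = atomic_load_explicit(&ts->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&ts->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (g == 0 ||
	    g != atomic_load_explicit(&ts->gen, memory_order_relaxed));
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

A reader never sees a torn value, because a concurrent update changes the generation and forces a retry. The retry loop is the extra cost on 32-bit platforms that the commit message refers to. Note that slot_read() spins until slot_write() has been called at least once on a zero-initialized slot.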
2020-03-06 Fix uninitialized use of variable 'len'. (tobhe)
ok bluhm@
2019-06-10 Use mallocarray(9) & put some free(9) sizes for M_IPMOPTS allocations. (Martin Pieuchot)
ok semarie@, visa@
2019-04-28 Removes the KERNEL_LOCK() from bridge(4)'s output fast-path. (Martin Pieuchot)
This redefines the ifp <-> bridge relationship. No lock can be currently used across the multiple contexts where the bridge has tentacles to protect a pointer, use an interface index. Tested by various, ok dlg@, visa@
2019-01-18 Bring back the ip_pcbopts() refactor. Pad the option buffer and therefore (Claudio Jeker)
the mbuf to the next word length as it is required by the standard. Also use the correct offset from the input mbuf. OK visa@, input & OK bluhm@
2019-01-18 Revert Rev 1.351, the change is not quite right yet. (Claudio Jeker)
2019-01-06 Rewrite ip_pcbopts() to fill a fresh mbuf with the ip options instead (Claudio Jeker)
of fiddling with the user supplied mbuf and then copy it at the end. OK visa@
2019-01-03 Replace a funky 'else switch' construct with something that is equal but (Claudio Jeker)
a lot easier to read. The if can simply return the error and so the else branch is no longer needed. Input and OK dhill@
2018-12-20 Replace a wrong poor man's m_trailingspace() with the real thing. The mbuf (Claudio Jeker)
passed to ip_pcbopts could be a cluster and so the size check is all wrong. found by Greg Steuck; OK bluhm@ Reported-by: syzbot+c2543ae6b6692a5843e3@syzkaller.appspotmail.com
2018-08-28 Add per-TDB counters and a new SADB extension to export them to (Martin Pieuchot)
userland. Inputs from markus@, ok sthen@
2018-07-12 Introduce ipsec_output_cb() to merge duplicate code and account for (Martin Pieuchot)
dropped packets in the output path. While here fix a memory leak when compression is not needed w/ IPcomp. ok markus@
2018-03-21 In ip6_output() check that the interface of a route is valid. For (Alexander Bluhm)
IPv4 we do the same and there are races that trigger it. Increment the statistics counter for both. from markus@; OK mpi@
2018-02-19 Remove almost unused `flags' argument of suser(). (Martin Pieuchot)
The account flag `ASU' will no longer be set but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
2017-11-22 It does not make sense to call pcb lookup from pf during packet (Alexander Bluhm)
forwarding. It should never match and would cause MP locking problems. While there remove a useless ifp parameter from ip_output_ipsec_send(). from markus@; OK visa@ sashan@
2017-10-26 Stop grabbing the KERNEL_LOCK() in network tasks when `ipsec_in_use' (Martin Pieuchot)
is set. Accesses to IPsec global data structure are now serialized by the NET_LOCK(). Tested by many, ok visa@, bluhm@
2017-09-20 Use m_copym() instead of m_dup_pkt() to fix a kernel assert when (Visa Hankala)
setting IP options. Issue reported by Kapetanakis Giannis OK mpi@
2017-09-01 Change sosetopt() to no longer free the mbuf it receives and change (Martin Pieuchot)
all the callers to call m_freem(9). Support from deraadt@ and tedu@, ok visa@, bluhm@
2017-05-29 Per-interface lists of addresses, both multicast and unicast, are (Martin Pieuchot)
currently protected by the NET_LOCK(). They are not accessed in the hot path, so protecting them with a mutex could be an option. However since we're now going to run with a NET_LOCK() for some time, assert that it is held. IPsec is not yet ready to run without KERNEL_LOCK(), so assert it is held, even in the forwarding path. Tested by sthen@, ok visa@, claudio@, bluhm@
2017-04-19 Use the rt_rmx defines that hide the struct rt_kmetrics indirection. (Alexander Bluhm)
No binary change. OK mpi@