Age | Commit message (Collapse) | Author |
|
When sending TCP packets with software TSO to the local address of
a physical interface, the TCP checksum was miscalculated. As the
small MSS is taken from the physical interface, but the large MTU
of the loopback interface is used, large TSO packets are generated,
but sent directly to the loopback interface. There we need the
regular pseudo header checksum and not the modified without packet
length.
To avoid this confusion, use the same decision for checksum generation
in in_proto_cksum_out() as for using hardware TSO in tcp_if_output_tso().
bug reported and tested by robert@ bket@ Hrvoje Popovski
OK claudio@ jan@
|
|
If the driver of a network interface claims to support TSO, do not
chop the packet in software, but pass it down to the interface
layer.
Precalculate parts of the pseudo header checksum, but without the
packet length. The length of all generated smaller packets is not
known yet. Driver and hardware will use the mbuf packet header
field ph_mss to calculate it and update checksum.
Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware
might support ony one protocol family. The old flag IFXF_TSO is
only relevant for large receive offload. It is missnamed, but keep
that for now.
Note that drivers do not set TSO capabilites yet. Also the ifconfig
flags and pseudo interfaces capabilities will be done separately.
So this commit should not change behavior.
heavily based on the work from jan@; OK sashan@
|
|
introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out().
OK claudio@
|
|
meant as a fallback if network hardware does not support TSO. Driver
support is still work in progress. TCP output generates large
packets. In IP output the packet is chopped to TCP maximum segment
size. This reduces the CPU cycles used by pf. The regular output
could be assisted by hardware later, but pf route-to and IPsec needs
the software fallback in general.
For performance comparison or to workaround possible bugs, sysctl
net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows
TSO counter with chopped and generated packets.
based on work from jan@
tested by jmc@ jan@ Hrvoje Popovski
OK jan@ claudio@
|
|
is passed to ifp->if_output(). The fragment code has its own
checksum calculation and the other paths end in goto bad.
OK claudio@
|
|
if_output_ml() to send mbuf lists to interfaces. This can be used
for TSO, fragments, ARP and ND6. Rename variable fml to ml. In
pf_route6() split the if else block. Put the safety check (hlen +
firstlen < tlen) into ip_fragment(). It makes the code correct in
case the packet is too short to be fragmented. This should not
happen, but other functions also have this logic.
No functional change. OK sashan@
|
|
do nearly the same thing, so they should look similar.
OK sashan@
|
|
if_put(9) call means we finish work with `ifp' and it could be destroyed.
`ia' is the pointer to 'in_ifaddr' data belongs to `ifp', so we need to
release corresponding `ifp' after we finish deal with `ia'.
`if_addrlist' list destruction and ip_getmoptions() are serialized with
kernel and net locks so this is not critical, but looks inconsistent.
ok bluhm@
|
|
trees. ipsp_ids_lookup() returns `ids' with bumped reference
counter. original diff from mvs
ok mvs
|
|
dirty hacks, it is better to protect IPsec input and output with
kernel lock. Not much is lost as crypto needs the kernel lock
anyway. From here we can refine the lock later.
Note that there is no kernel lock in the SPD lockup path. Goal is
to keep that lock free to allow fast forwarding with non IPsec
traffic.
tested by Hrvoje Popovski; OK tobhe@
|
|
'tdb_data' struct became unused and was removed.
Tested by Hrvoje Popovski.
ok bluhm@
|
|
pointer is passed to the function, it will return a refcounted TDB.
The ref happens when ipsp_spd_inp() copies the pointer from
ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after
using it.
tested by Hrvoje Popovski; OK mvs@ tobhe@
|
|
is not always needed, but the error value is necessary for the
caller. As TDB should be refcounted, it makes not sense to always
return it. Pass an output pointer for the TDB which can be NULL.
OK mvs@ tobhe@
|
|
ICMP packet could be wrong. The mtu was taken from the loopback
interface as the tdb mtu was copied to the route too late. Without
crypto task, ipsp_process_packet() returns the EMSGSIZE error
earlier. Immediately update tdb and route mtu.
IPv4 part from markus@; OK tobhe@
|
|
Panic reported by Hrvoje Popovski.
|
|
'tdb_data' struct became unused and was removed.
ok bluhm@
|
|
produced ugly output. Move the function name and the newline into
the DPRINTF macro. This simplifies the debug statements.
OK tobhe@
|
|
`ps_rtableid' as atomic. This allows us to unlock setrtable(2).
ok claudio@ mpi@
|
|
icmp_send() must update IP header length if IP optaions are appended.
Such packet also has to be dispatched with IP_RAWOUTPUT flags.
Bug reported and fix co-designed by Dominik Schreilechner _at_ siemens _dot_ com
OK bluhm@
|
|
this ensures more stuff is copied, in particular the flowid
information. this is also how v6 does it, which makes things more
consistent.
ok bluhm@
|
|
simplify the handling of the fragment list. Now the functions
ip_fragment() and ip6_fragment() always consume the mbuf. They
free the mbuf and mbuf list in case of an error and take care about
the counter. Adjust the code a bit to make v4 and v6 look similar.
Fixes a potential mbuf leak when pf_route6() called pf_refragment6()
and it failed. Now the mbuf is always freed by ip6_fragment().
OK dlg@ mvs@
|
|
the new chain. This fixes a potential memory leak in ip_output().
Also simplify a bunch of "goto done".
OK kn@ mvs@
|
|
ok deraadt@ dlg@
|
|
could get stuck in an endless recursion during TCP path MTU discovery.
Create a dynamic host route in ip_output() that can be used by
tcp_mtudisc() to store the MTU.
Reported by Peter Mueller and Sebastian Sturm
OK claudio@
|
|
offloading. The checksum must be calculated in software. Use the
same condition in ether_resolve() to send the broadcast packet back
to the stack and in in_ifcap_cksum() to force software checksumming.
This fixes regress/sys/kern/sosplice/loop.
OK procter@
|
|
kernel make sure that the rdomain of that interface is the same as
the rdomain of the inpcb.
Problem spotted and fix tested by semarie@
OK bluhm@ mvs@
|
|
short TCP segments or fragments encapsulated in ESP instead of
fragmented ESP packets. Pass the don't fragment flag down along
the stack so that dynamic routes with MTU are created eventually.
with and OK markus@; OK tobhe@
|
|
struct ip_mreq or a struct ip_mreqn. Using struct ip_mreqn allows to
pass a interface index instead of specifying the multicast interface
via its IP address. This is also the API implemented by Linux and
FreeBSD and should help porting software.
OK bluhm@ phessler@ robert@
|
|
code is copied from IPv4 and adapted. Some things are changed in
v4 to make it look similar.
- ip6_forward increases the noroute error counter, do that in
ip_forward, too.
- Pass more specific sockaddr_in6 to icmp6_mtudisc_clone().
- IPv6 may also use reject routes for IPsec PMTU clones.
- To pass a route_in6 to ip6_output_ipsec_send() introduce one in
ip6_forward(). That is the same what IPv4 does. Note
that dst and sin6 switch roles.
- Copy comments from ip_output_ipsec_send() to ip6_output_ipsec_send()
to make code similar.
- Implement dynamic IPv6 IPsec PMTU routes.
OK tobhe@
|
|
struct ip_mreqn allows to use the interface index to select the
interface for multicast packets which makes it possible to use
this with unnumbered interfaces.
OK dlg@ robert@
|
|
Since revision 1.87 of ip_icmp.c icmp_mtudisc_clone() ignored reject
routes. Otherwise TCP would clone these routes for PMTU discovery.
They will not work, even after dynamic routing has found a better
route than the reject route.
With IPsec the use case is different. First you need a route, but
then the flow handles the packet without routing. Usually this
route should be a reject route to avoid sending unencrypted traffic
if the flow is missing. But IPsec needs this route for PMTU
discovery, so use it for that.
OK claudio@ tobhe@
|
|
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
|
|
ok bluhm@
|
|
ok semarie@, visa@
|
|
This redefines the ifp <-> bridge relationship. No lock can be
currently used across the multiples contexts where the bridge has
tentacles to protect a pointer, use an interface index.
Tested by various, ok dlg@, visa@
|
|
the mbuf to the next word length as it is required by the standard. Also use
the correct offset from the input mbuf.
OK visa@, input & OK bluhm@
|
|
|
|
of fiddling with the user supplied mbuf and then copy it at the end.
OK visa@
|
|
a lot easier to read. The if can simply return the error and so the else
branch is no longer needed.
Input and OK dhill@
|
|
passed to ip_pcbopts could be a cluster and so the size check is all wrong.
found by Greg Steuck; OK bluhm@
Reported-by: syzbot+c2543ae6b6692a5843e3@syzkaller.appspotmail.com
eVS: ----------------------------------------------------------------------
|
|
userland.
Inputs from markus@, ok sthen@
|
|
dropped packets in the output path.
While here fix a memory leak when compression is not needed w/ IPcomp.
ok markus@
|
|
IPv4 we do the same and there are races that triggers it. Increment
the statistics counter for both.
from markus@; OK mpi@
|
|
The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer mess with a per-process field.
No objection from millert@, ok tedu@, bluhm@
|
|
forwarding. It should never match and would cause MP locking
problems. While there remove an useless ifp parameter from
ip_output_ipsec_send().
from markus@; OK visa@ sashan@
|
|
is set.
Accesses to IPsec global data structure are now serialized by the
NET_LOCK().
Tested by many, ok visa@, bluhm@
|
|
setting IP options.
Issue reported by Kapetanakis Giannis
OK mpi@
|
|
all the callers to call m_freem(9).
Support from deraadt@ and tedu@, ok visa@, bluhm@
|
|
currently protected by the NET_LOCK().
They are not accessed in the hot path, so protecting them with a
mutex could be an option. However since we're now going to run
with a NET_LOCK() for some time, assert that it is held.
IPsec is not yet ready to run without KERNEL_LOCK(), so assert it
is held, even in the forwarding path.
Tested by sthen@, ok visa@, claudio@, bluhm@
|
|
No binary change.
OK mpi@
|