src - OpenBSD base system

Age	Commit message	Author
2023-05-22	Fix TSO for traffic to a local address on a physical interface.	Alexander Bluhm
	When sending TCP packets with software TSO to the local address of a physical interface, the TCP checksum was miscalculated. Because the small MSS is taken from the physical interface while the large MTU of the loopback interface is used, large TSO packets are generated but sent directly to the loopback interface. There the regular pseudo header checksum is needed, not the modified one without the packet length. To avoid this confusion, use the same decision for checksum generation in in_proto_cksum_out() as for using hardware TSO in tcp_if_output_tso(). bug reported and tested by robert@ bket@ Hrvoje Popovski OK claudio@ jan@
2023-05-15	Implement the TCP/IP layer for hardware TCP segmentation offload.	Alexander Bluhm
	If the driver of a network interface claims to support TSO, do not chop the packet in software, but pass it down to the interface layer. Precalculate parts of the pseudo header checksum, but without the packet length. The length of all generated smaller packets is not known yet. Driver and hardware will use the mbuf packet header field ph_mss to calculate it and update the checksum. Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware might support only one protocol family. The old flag IFXF_TSO is only relevant for large receive offload. It is misnamed, but keep that for now. Note that drivers do not set TSO capabilities yet. Also the ifconfig flags and pseudo interface capabilities will be done separately. So this commit should not change behavior. heavily based on the work from jan@; OK sashan@
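The arithmetic behind "precalculate parts of the pseudo header checksum, but without the packet length" can be sketched in C. Because the one's-complement sum is associative, a partial sum over the addresses and protocol can be completed later by adding each generated segment's own TCP length. This is presumably also why the loopback path in the 2023-05-22 fix needs the regular checksum: nothing further down adds the length. The function names are hypothetical, addresses are assumed to be in host byte order, and the final complement and byte-order handling of a real stack are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t
cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Hypothetical helper: one's-complement sum of the IPv4 pseudo header
 * (addresses in host byte order). len is the TCP length (header plus
 * payload); passing len == 0 gives the "without packet length" variant
 * that a TSO path can precompute before the segment sizes are known.
 */
static uint16_t
pseudo_hdr_sum(uint32_t src, uint32_t dst, uint8_t proto, uint16_t len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;	/* zero byte followed by the protocol number */
	sum += len;
	return cksum_fold(sum);
}

/* Hypothetical helper: complete the partial sum with one segment's length. */
static uint16_t
pseudo_hdr_add_len(uint16_t partial, uint16_t seglen)
{
	return cksum_fold((uint32_t)partial + seglen);
}
```

Completing the length-free sum with a segment length gives the same value as computing the full pseudo header sum for that segment directly, which is the property the driver or hardware relies on when it uses ph_mss.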
2023-05-13	Instead of implementing IPv4 header checksum creation everywhere,	Alexander Bluhm
	introduce in_hdr_cksum_out(). It is used like in_proto_cksum_out(). OK claudio@
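A minimal, software-only sketch of what creating an IPv4 header checksum involves: the RFC 1071 Internet checksum over the 16-bit words of the header. It works on a plain byte buffer; the name ipv4_hdr_cksum_out() is hypothetical and does not reproduce in_hdr_cksum_out(), which operates on mbufs and whose offload handling is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 checksum over an IPv4 header of hlen bytes. When generating,
 * the checksum field (bytes 10-11) must be zero; over a header that
 * already carries a valid checksum the result is 0.
 */
static uint16_t
ipv4_hdr_cksum(const uint8_t *hdr, size_t hlen)
{
	uint32_t sum = 0;
	size_t i;

	assert(hlen >= 20 && hlen % 4 == 0);	/* IHL counts 32-bit words */
	for (i = 0; i < hlen; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: store the checksum in network byte order. */
static void
ipv4_hdr_cksum_out(uint8_t *hdr, size_t hlen)
{
	uint16_t ck;

	hdr[10] = hdr[11] = 0;
	ck = ipv4_hdr_cksum(hdr, hlen);
	hdr[10] = ck >> 8;
	hdr[11] = ck & 0xff;
}
```

For the 20-byte header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7, the sketch stores b8 61 in bytes 10-11.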
2023-05-10	Implement TCP send offloading, for now in software only. This is	Alexander Bluhm
	meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec need the software fallback in general. For performance comparison or to work around possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counters for chopped and generated packets. based on work from jan@ tested by jmc@ jan@ Hrvoje Popovski OK jan@ claudio@
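Chopping a large TCP packet "to TCP maximum segment size" boils down to walking the payload in MSS-sized steps and advancing the sequence number for each generated segment. The sketch below only plans the segments; the struct and function names are hypothetical, and copying the IP and TCP headers into each segment and recomputing lengths and checksums are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one generated segment. */
struct tso_seg {
	uint32_t	seq;	/* TCP sequence number of the first byte */
	size_t		off;	/* offset into the large payload */
	size_t		len;	/* payload bytes, at most mss */
};

/*
 * Hypothetical helper: plan how a payload of paylen bytes starting at
 * sequence number seq0 is chopped into segments of at most mss bytes.
 * Returns the number of segments, or -1 if seg[] is too small.
 */
static int
tso_chop(uint32_t seq0, size_t paylen, size_t mss, struct tso_seg *seg,
    int maxsegs)
{
	size_t off;
	int n = 0;

	assert(mss > 0);
	for (off = 0; off < paylen; off += mss) {
		if (n == maxsegs)
			return -1;
		seg[n].seq = seq0 + (uint32_t)off;	/* wraps modulo 2^32 */
		seg[n].off = off;
		seg[n].len = paylen - off < mss ? paylen - off : mss;
		n++;
	}
	return n;
}
```

With a 10000-byte payload and an MSS of 1448, this yields six full segments and a final one of 1312 bytes; sequence numbers wrap naturally through unsigned 32-bit arithmetic.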
2023-05-08	The call to in_proto_cksum_out() is only needed before the packet	Alexander Bluhm
	is passed to ifp->if_output(). The fragment code has its own checksum calculation and the other paths end in goto bad. OK claudio@
2023-05-07	In preparation for TSO in software, clean up the fragment code. Use	Alexander Bluhm
	if_output_ml() to send mbuf lists to interfaces. This can be used for TSO, fragments, ARP and ND6. Rename variable fml to ml. In pf_route6() split the if else block. Put the safety check (hlen + firstlen < tlen) into ip_fragment(). It makes the code correct in case the packet is too short to be fragmented. This should not happen, but other functions also have this logic. No functional change. OK sashan@
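The size arithmetic of IPv4 fragmentation, including the case guarded by the check (hlen + firstlen < tlen), can be sketched as follows. Every fragment except the last carries a multiple of 8 payload bytes because the fragment offset field counts 8-byte units. This is a hypothetical sketch that fills an array of descriptors; it is not ip_fragment(), which builds an mbuf list, and for a packet too short to need fragmenting it simply returns one piece with the more-fragments flag clear.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical descriptor for one fragment. */
struct frag_piece {
	size_t	off;	/* payload offset in bytes (multiple of 8) */
	size_t	len;	/* payload bytes carried by this fragment */
	int	more;	/* more-fragments flag */
};

/*
 * Hypothetical helper: split the payload of a packet with total length
 * tlen and header length hlen for a link with the given mtu. Returns
 * the number of pieces, -EMSGSIZE if the mtu cannot carry 8 payload
 * bytes, or -ENOBUFS if fp[] is too small.
 */
static int
frag_plan(size_t tlen, size_t hlen, size_t mtu, struct frag_piece *fp,
    int max)
{
	size_t firstlen, off, paylen;
	int n = 0;

	assert(hlen <= tlen);
	if (mtu < hlen + 8)
		return -EMSGSIZE;
	firstlen = (mtu - hlen) & ~(size_t)7;
	paylen = tlen - hlen;
	/*
	 * If hlen + firstlen >= tlen the packet is too short to need
	 * fragmenting; the loop then yields one piece with more == 0.
	 */
	for (off = 0; off < paylen; off += firstlen) {
		if (n == max)
			return -ENOBUFS;
		fp[n].off = off;
		fp[n].len = paylen - off < firstlen ? paylen - off : firstlen;
		fp[n].more = off + fp[n].len < paylen;
		n++;
	}
	return n;
}
```

A 4000-byte packet with a 20-byte header over a 1500-byte MTU gives payloads of 1480, 1480 and 1020 bytes at offsets 0, 185 and 370 in 8-byte units.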
2022-08-12	Remove differences between ip_fragment() and ip6_fragment(). They	Alexander Bluhm
	do nearly the same thing, so they should look similar. OK sashan@
2022-05-25	Call if_put(9) after we finish with `ia' within ip_getmoptions().	Vitaliy Makkoveev
	The if_put(9) call means we are finished with `ifp' and it could be destroyed. `ia' points to `in_ifaddr' data that belongs to `ifp', so we must release the corresponding `ifp' only after we are done with `ia'. `if_addrlist' list destruction and ip_getmoptions() are serialized by the kernel and net locks, so this is not critical, but it looks inconsistent. ok bluhm@
2022-01-04	Add `ipsec_flows_mtx' mutex(9) to protect `ipsp_ids_*' list and	YASUOKA Masahiko
	trees. ipsp_ids_lookup() returns `ids' with bumped reference counter. original diff from mvs ok mvs
2021-12-23	IPsec is not MP safe yet. To allow forwarding in parallel without	Alexander Bluhm
	dirty hacks, it is better to protect IPsec input and output with the kernel lock. Not much is lost as crypto needs the kernel lock anyway. From here we can refine the lock later. Note that there is no kernel lock in the SPD lookup path. The goal is to keep that path lock-free to allow fast forwarding of non-IPsec traffic. tested by Hrvoje Popovski; OK tobhe@
2021-12-20	Use per-CPU counters for tunnel descriptor block (TDB) statistics.	Vitaliy Makkoveev
	'tdb_data' struct became unused and was removed. Tested by Hrvoje Popovski. ok bluhm@
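The idea behind per-CPU counters can be sketched with one row of counters per CPU: the hot path increments only its own row, and a reader sums all rows. The CPU count, the statistic names and the 64-byte cache line are assumptions for illustration. In this single-threaded sketch the caller passes its CPU number explicitly; a kernel also has to keep the thread on that CPU during the update and handle consistent reads, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SKETCH	4	/* hypothetical fixed number of CPUs */

/* Hypothetical statistics indices, not the kernel's names. */
enum tdb_stat {
	TDBS_IPACKETS,
	TDBS_OPACKETS,
	TDBS_NSTATS
};

/*
 * One row of counters per CPU, aligned to an assumed 64-byte cache
 * line so CPUs updating their own rows do not share a line.
 */
struct pcpu_counters {
	struct {
		_Alignas(64) uint64_t c[TDBS_NSTATS];
	} cpu[NCPU_SKETCH];
};

/* Hot path: a CPU only touches its own row, so no shared lock is taken. */
static void
pcpu_inc(struct pcpu_counters *pc, unsigned int cpu, enum tdb_stat s)
{
	assert(cpu < NCPU_SKETCH && s < TDBS_NSTATS);
	pc->cpu[cpu].c[s]++;
}

/* Slow path (e.g. exporting statistics): sum the rows of all CPUs. */
static uint64_t
pcpu_read(const struct pcpu_counters *pc, enum tdb_stat s)
{
	uint64_t total = 0;
	unsigned int i;

	for (i = 0; i < NCPU_SKETCH; i++)
		total += pc->cpu[i].c[s];
	return total;
}
```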
2021-12-03	Add TDB reference counting to ipsp_spd_lookup(). If an output	Alexander Bluhm
	pointer is passed to the function, it will return a refcounted TDB. The ref happens when ipsp_spd_inp() copies the pointer from ipo->ipo_tdb. The caller of ipsp_spd_lookup() has to unref after using it. tested by Hrvoje Popovski; OK mvs@ tobhe@
2021-12-01	Let ipsp_spd_lookup() return an error instead of a TDB. The TDB	Alexander Bluhm
	is not always needed, but the error value is necessary for the caller. As the TDB should be refcounted, it makes no sense to always return it. Pass an output pointer for the TDB, which can be NULL. OK mvs@ tobhe@
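The calling convention from these two commits (the function returns an error, and a refcounted TDB is handed out only through an optional output pointer) can be sketched as follows. Everything here is hypothetical: the struct stands in for a TDB, the cached pointer plays the role of ipo->ipo_tdb, and EHOSTUNREACH is an arbitrary stand-in error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a TDB; only the reference count matters. */
struct tdb_sketch {
	unsigned int refcnt;
};

static struct tdb_sketch *
tdb_ref(struct tdb_sketch *t)
{
	t->refcnt++;
	return t;
}

static void
tdb_unref(struct tdb_sketch *t)
{
	if (t == NULL)
		return;
	assert(t->refcnt > 0);
	t->refcnt--;	/* a real implementation would free at zero */
}

/*
 * Hypothetical lookup: the return value is the error the caller needs.
 * The TDB is only handed out if the caller passes tdbout, and then it
 * carries a reference that the caller must drop with tdb_unref().
 */
static int
spd_lookup_sketch(struct tdb_sketch *cached, int drop,
    struct tdb_sketch **tdbout)
{
	if (tdbout != NULL)
		*tdbout = NULL;
	if (drop)
		return EHOSTUNREACH;
	if (tdbout != NULL && cached != NULL)
		*tdbout = tdb_ref(cached);	/* ref taken when copying */
	return 0;
}
```

A caller that only needs the verdict passes NULL and never touches the reference count; a caller that passes an output pointer owns one reference until it calls tdb_unref().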
2021-11-24	When sending ICMP packets for IPsec path MTU discovery, the first	Alexander Bluhm
	ICMP packet could be wrong. The mtu was taken from the loopback interface as the tdb mtu was copied to the route too late. Without crypto task, ipsp_process_packet() returns the EMSGSIZE error earlier. Immediately update tdb and route mtu. IPv4 part from markus@; OK tobhe@
2021-07-27	Revert "Use per-CPU counters for tunnel descriptor block" diff.	mvs
	Panic reported by Hrvoje Popovski.
2021-07-26	Use per-CPU counters for tunnel descriptor block (tdb) statistics.	mvs
	'tdb_data' struct became unused and was removed. ok bluhm@
2021-07-08	Debug printfs in encdebug were inconsistent, some missing newlines	Alexander Bluhm
	produced ugly output. Move the function name and the newline into the DPRINTF macro. This simplifies the debug statements. OK tobhe@
2021-05-12	Use local copy of `ps_rtableid' in ip{,6}_ctloutput() and mark	mvs
	`ps_rtableid' as atomic. This allows us to unlock setrtable(2). ok claudio@ mpi@
2021-03-30	[ICMP] IP options lead to malformed reply	Alexandr Nedvedicky
	icmp_send() must update the IP header length if IP options are appended. Such a packet also has to be dispatched with the IP_RAWOUTPUT flag. Bug reported and fix co-designed by Dominik Schreilechner _at_ siemens _dot_ com OK bluhm@
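Updating the IP header length after appending options means rewriting the IHL field, the low nibble of the first header byte, which counts 32-bit words. The sketch below shows only that arithmetic on a raw byte buffer; the function and macro names are hypothetical, and it leaves other fields such as the total length untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MINHLEN	20	/* header without options */
#define IPV4_MAXOPTLEN	40	/* IHL is 4 bits: at most 60 bytes total */

/*
 * Hypothetical helper: after optlen bytes of options were appended
 * behind a 20-byte IPv4 header, set IHL to the new length in 32-bit
 * words. Returns -1 if the options do not fit or are not padded.
 */
static int
ipv4_set_hlen(uint8_t *hdr, size_t optlen)
{
	size_t hlen = IPV4_MINHLEN + optlen;

	if (optlen > IPV4_MAXOPTLEN || hlen % 4 != 0)
		return -1;
	hdr[0] = (uint8_t)((hdr[0] & 0xf0) | (hlen >> 2));	/* keep version */
	return 0;
}
```

With 8 bytes of options, the first byte changes from 0x45 to 0x47.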
2021-03-20	use m_dup_pkthdr in ip_fragment to copy pkthdr info to fragments.	David Gwynne
	this ensures more stuff is copied, in particular the flowid information. this is also how v6 does it, which makes things more consistent. ok bluhm@
2021-03-01	Refactor ip_fragment() and ip6_fragment(). Use a mbuf list to	Alexander Bluhm
	simplify the handling of the fragment list. Now the functions ip_fragment() and ip6_fragment() always consume the mbuf. They free the mbuf and mbuf list in case of an error and take care of the counter. Adjust the code a bit to make v4 and v6 look similar. Fixes a potential mbuf leak when pf_route6() called pf_refragment6() and it failed. Now the mbuf is always freed by ip6_fragment(). OK dlg@ mvs@
2021-02-23	As ip_insertoptions() may prepend a mbuf, "goto bad" has to free	Alexander Bluhm
	the new chain. This fixes a potential memory leak in ip_output(). Also simplify a bunch of "goto done". OK kn@ mvs@
2021-02-23	Use NULL instead of 0 in `m_nextpkt' assignment.	mvs
	ok deraadt@ dlg@
2021-02-10	If pf changes the routing table when sending packets, the kernel	Alexander Bluhm
	could get stuck in an endless recursion during TCP path MTU discovery. Create a dynamic host route in ip_output() that can be used by tcp_mtudisc() to store the MTU. Reported by Peter Mueller and Sebastian Sturm OK claudio@
2021-02-06	Simplex interface sends packet back without hardware checksum	Alexander Bluhm
	offloading. The checksum must be calculated in software. Use the same condition in ether_resolve() to send the broadcast packet back to the stack and in in_ifcap_cksum() to force software checksumming. This fixes regress/sys/kern/sosplice/loop. OK procter@
2021-02-02	If IP_MULTICAST_IF or IP_ADD_MEMBERSHIP pass an interface index to the	Claudio Jeker
	kernel, make sure that the rdomain of that interface is the same as the rdomain of the inpcb. Problem spotted and fix tested by semarie@ OK bluhm@ mvs@
2021-02-01	Fix path MTU discovery for ESP tunneled in IPv6. We always want	Alexander Bluhm
	short TCP segments or fragments encapsulated in ESP instead of fragmented ESP packets. Pass the don't fragment flag down along the stack so that dynamic routes with MTU are created eventually. with and OK markus@; OK tobhe@
2021-01-16	Extend IP_MULTICAST_IF to take either an address (struct in_addr), a	Claudio Jeker
	struct ip_mreq or a struct ip_mreqn. Using struct ip_mreqn allows passing an interface index instead of specifying the multicast interface via its IP address. This is also the API implemented by Linux and FreeBSD and should help porting software. OK bluhm@ phessler@ robert@
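From userland, selecting the multicast interface by index looks roughly like the sketch below, which also covers IP_ADD_MEMBERSHIP with struct ip_mreqn from the companion commit further down. It assumes the struct ip_mreqn layout shared with Linux (imr_multiaddr, imr_address, imr_ifindex); the helper names are hypothetical, and the _DEFAULT_SOURCE define is only there so glibc exposes the struct.

```c
#define _DEFAULT_SOURCE	1	/* glibc: expose struct ip_mreqn */
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: ip_mreqn selecting the interface by index only. */
static struct ip_mreqn
mreqn_by_index(struct in_addr group, int ifindex)
{
	struct ip_mreqn mreqn;

	memset(&mreqn, 0, sizeof(mreqn));
	mreqn.imr_multiaddr = group;
	mreqn.imr_address.s_addr = htonl(INADDR_ANY);	/* no address needed */
	mreqn.imr_ifindex = ifindex;
	return mreqn;
}

/* Send multicast from socket s out of the interface with this index. */
static int
set_mcast_if(int s, int ifindex)
{
	struct in_addr any = { .s_addr = htonl(INADDR_ANY) };
	struct ip_mreqn mreqn = mreqn_by_index(any, ifindex);

	return setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF, &mreqn,
	    sizeof(mreqn));
}

/* Join the group (dotted quad) on the interface with this index. */
static int
join_group(int s, const char *group, int ifindex)
{
	struct in_addr g;
	struct ip_mreqn mreqn;

	if (inet_pton(AF_INET, group, &g) != 1)
		return -1;
	mreqn = mreqn_by_index(g, ifindex);
	return setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreqn,
	    sizeof(mreqn));
}
```

The index would typically come from if_nametoindex(3), which is what makes this usable with unnumbered interfaces that have no address to name them by.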
2021-01-11	Create a path MTU host route for IPsec over IPv6. Basically the	Alexander Bluhm
	code is copied from IPv4 and adapted. Some things are changed in v4 to make it look similar. - ip6_forward increases the noroute error counter, do that in ip_forward, too. - Pass more specific sockaddr_in6 to icmp6_mtudisc_clone(). - IPv6 may also use reject routes for IPsec PMTU clones. - To pass a route_in6 to ip6_output_ipsec_send() introduce one in ip6_forward(). That is the same as what IPv4 does. Note that dst and sin6 switch roles. - Copy comments from ip_output_ipsec_send() to ip6_output_ipsec_send() to make code similar. - Implement dynamic IPv6 IPsec PMTU routes. OK tobhe@
2021-01-07	Extend IP_ADD_MEMBERSHIP to also support struct ip_mreqn.	Claudio Jeker
	struct ip_mreqn allows using the interface index to select the interface for multicast packets, which makes it possible to use this with unnumbered interfaces. OK dlg@ robert@
2020-12-20	Accept reject and blackhole routes for IPsec PMTU discovery.	Alexander Bluhm
	Since revision 1.87 of ip_icmp.c icmp_mtudisc_clone() ignored reject routes. Otherwise TCP would clone these routes for PMTU discovery. They will not work, even after dynamic routing has found a better route than the reject route. With IPsec the use case is different. First you need a route, but then the flow handles the packet without routing. Usually this route should be a reject route to avoid sending unencrypted traffic if the flow is missing. But IPsec needs this route for PMTU discovery, so use it for that. OK claudio@ tobhe@
2020-06-24	kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)	cheloha
	time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not too bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
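The "lockless read loop" that replaces a torn 64-bit read on 32-bit platforms can be illustrated with a generation counter: the writer marks an update in progress, stores both halves, then publishes a new generation; the reader retries until it sees the same nonzero generation before and after reading. This is a hypothetical sketch using C11 atomics; the struct and function names are assumptions, and the kernel's actual timehands code differs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical "timehands": a 64-bit seconds value kept as two 32-bit
 * words, as a 32-bit CPU would have to access it. gen == 0 marks an
 * update in progress.
 */
struct th_sketch {
	_Atomic uint32_t gen;
	_Atomic uint32_t sec_lo;
	_Atomic uint32_t sec_hi;
};

static void
th_init(struct th_sketch *th)
{
	atomic_init(&th->gen, 1);
	atomic_init(&th->sec_lo, 0);
	atomic_init(&th->sec_hi, 0);
}

/* Single writer assumed: invalidate, store both halves, publish. */
static void
th_update(struct th_sketch *th, int64_t sec)
{
	uint32_t next = atomic_load_explicit(&th->gen, memory_order_relaxed) + 1;

	if (next == 0)
		next = 1;	/* skip the "in progress" value on wrap */
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->sec_lo, (uint32_t)sec, memory_order_relaxed);
	atomic_store_explicit(&th->sec_hi, (uint32_t)((uint64_t)sec >> 32),
	    memory_order_relaxed);
	atomic_store_explicit(&th->gen, next, memory_order_release);
}

/* Lockless reader: retry until both halves come from one update. */
static int64_t
th_read(struct th_sketch *th)
{
	uint32_t gen, lo, hi;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		lo = atomic_load_explicit(&th->sec_lo, memory_order_relaxed);
		hi = atomic_load_explicit(&th->sec_hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

On an LP64 platform a single aligned 64-bit load cannot tear, which matches the commit's note that 64-bit platforms pay no extra cost.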
2020-03-06	Fix uninitialized use of variable 'len'.	tobhe
	ok bluhm@
2019-06-10	Use mallocarray(9) & put some free(9) sizes for M_IPMOPTS allocations.	Martin Pieuchot
	ok semarie@, visa@
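The point of mallocarray(9) over a plain multiplication is the overflow check on nmemb * size. The userland sketch below shows that check in the style of reallocarray(3); the name mallocarray_sketch() is hypothetical and this is not the kernel allocator, which takes a malloc type and flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userland analogue: refuse nmemb * size if the product
 * would wrap around, instead of silently allocating a short buffer.
 */
void *
mallocarray_sketch(size_t nmemb, size_t size)
{
	if (nmemb != 0 && size > SIZE_MAX / nmemb) {
		errno = ENOMEM;
		return NULL;
	}
	return malloc(nmemb * size);
}
```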
2019-04-28	Removes the KERNEL_LOCK() from bridge(4)'s output fast-path.	Martin Pieuchot
	This redefines the ifp <-> bridge relationship. No lock can currently be used to protect a pointer across the multiple contexts where the bridge has tentacles, so use an interface index instead. Tested by various, ok dlg@, visa@
2019-01-18	Bring back the ip_pcbopts() refactor. Pad the option buffer and therefore	Claudio Jeker
	the mbuf to the next word length as it is required by the standard. Also use the correct offset from the input mbuf. OK visa@, input & OK bluhm@
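Padding the option buffer "to the next word length" means rounding the option length up to a multiple of 4 bytes, as RFC 791 requires the header to end on a 32-bit boundary with zero padding (zero is also the End of Option List type). The helpers and the macro name below are hypothetical and copy into a plain buffer rather than an mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPOPT_EOL_SKETCH	0	/* End of Option List, RFC 791 */

/* Round an option length up to the next multiple of 4 bytes. */
static size_t
ipopt_padded_len(size_t optlen)
{
	return (optlen + 3) & ~(size_t)3;
}

/*
 * Hypothetical helper: copy optlen option bytes into buf (capacity
 * bufsz) and pad with zero bytes to a 32-bit boundary. Returns the
 * padded length, or 0 if it does not fit.
 */
static size_t
ipopt_copy_padded(uint8_t *buf, size_t bufsz, const uint8_t *opt,
    size_t optlen)
{
	size_t padded = ipopt_padded_len(optlen);

	if (padded > bufsz)
		return 0;
	memcpy(buf, opt, optlen);
	memset(buf + optlen, IPOPT_EOL_SKETCH, padded - optlen);
	return padded;
}
```

Three option bytes become a 4-byte block whose last byte is zero; 41 bytes no longer fit into the 40 bytes an IPv4 header allows for options.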
2019-01-18	Revert Rev 1.351, the change is not quite right yet.	Claudio Jeker

2019-01-06	Rewrite ip_pcbopts() to fill a fresh mbuf with the ip options instead	Claudio Jeker
	of fiddling with the user supplied mbuf and then copy it at the end. OK visa@
2019-01-03	Replace a funky 'else switch' construct with something that is equal but	Claudio Jeker
	a lot easier to read. The if can simply return the error and so the else branch is no longer needed. Input and OK dhill@
2018-12-20	Replace a wrong poor man's m_trailingspace() with the real thing. The mbuf	Claudio Jeker
	passed to ip_pcbopts() could be a cluster, so the size check is all wrong. found by Greg Steuck; OK bluhm@ Reported-by: syzbot+c2543ae6b6692a5843e3@syzkaller.appspotmail.com
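Why a hand-rolled trailing-space check goes wrong with clusters can be shown with a hypothetical buffer descriptor loosely modeled on an mbuf: the free space after the data must be computed from the actual storage the data lives in, which may be a small internal buffer or a large external cluster, never from a fixed constant. The struct and function are illustrative only and are not m_trailingspace(9).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: storage may be internal or a cluster. */
struct pktbuf {
	unsigned char	*buf;	/* start of storage */
	size_t		 size;	/* storage size, taken from the descriptor */
	unsigned char	*data;	/* start of valid data */
	size_t		 len;	/* bytes of valid data */
};

/* Bytes that can still be appended after the valid data. */
static size_t
pktbuf_trailingspace(const struct pktbuf *pb)
{
	assert(pb->data >= pb->buf &&
	    pb->data + pb->len <= pb->buf + pb->size);
	return (size_t)((pb->buf + pb->size) - (pb->data + pb->len));
}
```

For a 2048-byte cluster holding 200 bytes of data at offset 100, 1748 bytes remain; a check that assumed a small internal buffer would report a very different number.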
2018-08-28	Add per-TDB counters and a new SADB extension to export them to	Martin Pieuchot
	userland. Inputs from markus@, ok sthen@
2018-07-12	Introduce ipsec_output_cb() to merge duplicate code and account for	Martin Pieuchot
	dropped packets in the output path. While here fix a memory leak when compression is not needed w/ IPcomp. ok markus@
2018-03-21	In ip6_output() check that the interface of a route is valid. For	Alexander Bluhm
	IPv4 we do the same and there are races that trigger it. Increment the statistics counter for both. from markus@; OK mpi@
2018-02-19	Remove almost unused `flags' argument of suser().	Martin Pieuchot
	The account flag `ASU' will no longer be set, but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
2017-11-22	It does not make sense to call pcb lookup from pf during packet	Alexander Bluhm
	forwarding. It should never match and would cause MP locking problems. While there, remove a useless ifp parameter from ip_output_ipsec_send(). from markus@; OK visa@ sashan@
2017-10-26	Stop grabbing the KERNEL_LOCK() in network tasks when `ipsec_in_use'	Martin Pieuchot
	is set. Accesses to IPsec global data structure are now serialized by the NET_LOCK(). Tested by many, ok visa@, bluhm@
2017-09-20	Use m_copym() instead of m_dup_pkt() to fix a kernel assert when	Visa Hankala
	setting IP options. Issue reported by Kapetanakis Giannis OK mpi@
2017-09-01	Change sosetopt() to no longer free the mbuf it receives and change	Martin Pieuchot
	all the callers to call m_freem(9). Support from deraadt@ and tedu@, ok visa@, bluhm@
2017-05-29	Per-interface list of addresses, both multicast and unicast, are	Martin Pieuchot
	currently protected by the NET_LOCK(). They are not accessed in the hot path, so protecting them with a mutex could be an option. However since we're now going to run with a NET_LOCK() for some time, assert that it is held. IPsec is not yet ready to run without KERNEL_LOCK(), so assert it is held, even in the forwarding path. Tested by sthen@, ok visa@, claudio@, bluhm@
2017-04-19	Use the rt_rmx defines that hide the struct rt_kmetrics indirection.	Alexander Bluhm
	No binary change. OK mpi@