src - OpenBSD base system

Age	Commit message	Author
2023-06-24	Calculate inet PCB SIP hash without table mutex.	Alexander Bluhm
	Goal is to run UDP input in parallel. Btrace kstack analysis shows that the SIP hash for PCB lookup is quite expensive. When running in parallel, there is also lock contention on the PCB table mutex. Calculating the hash value before taking the mutex results in better performance. The hash secret has to be constant, as the hash calculation must not depend on values protected by the table mutex. Do not reseed anymore when the hash table gets resized. Analysis also shows that asserting a rw_lock while holding a mutex is a bit expensive. Just remove the netlock assert. OK dlg@ mvs@
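The ordering described above (hash first, lock second) can be illustrated with a small user-space table. This is a minimal sketch, not the kernel's in_pcb code: the table, the `key_hash()` function and its multiplicative mixing are hypothetical stand-ins (the kernel uses SipHash), and a pthread mutex stands in for the PCB table mutex. The point it shows is that a secret fixed at init time lets the expensive hash run outside the critical section, and that a resize only changes the bucket mask, not the hash values.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64	/* power of two, assumed for this sketch */

struct entry {
	uint32_t	 laddr, faddr;
	uint16_t	 lport, fport;
	struct entry	*next;
};

struct table {
	pthread_mutex_t	 mtx;
	uint64_t	 secret;	/* constant after init: not protected by mtx */
	struct entry	*buckets[NBUCKETS];
};

/* Simple multiplicative mixing step; a stand-in, NOT SipHash. */
static uint64_t
mix(uint64_t h, uint64_t v)
{
	h ^= v;
	h *= 1099511628211ULL;
	return h;
}

static uint64_t
key_hash(uint64_t secret, uint32_t laddr, uint32_t faddr,
    uint16_t lport, uint16_t fport)
{
	uint64_t h = 14695981039346656037ULL ^ secret;

	h = mix(h, laddr);
	h = mix(h, faddr);
	h = mix(h, ((uint64_t)lport << 16) | fport);
	return h;
}

static void
table_insert(struct table *t, struct entry *e)
{
	/* hash computed before the mutex is taken */
	uint64_t h = key_hash(t->secret, e->laddr, e->faddr, e->lport,
	    e->fport);

	pthread_mutex_lock(&t->mtx);
	e->next = t->buckets[h & (NBUCKETS - 1)];
	t->buckets[h & (NBUCKETS - 1)] = e;
	pthread_mutex_unlock(&t->mtx);
}

static struct entry *
table_lookup(struct table *t, uint32_t laddr, uint32_t faddr,
    uint16_t lport, uint16_t fport)
{
	/* expensive part first, outside the critical section */
	uint64_t h = key_hash(t->secret, laddr, faddr, lport, fport);
	struct entry *e;

	pthread_mutex_lock(&t->mtx);
	for (e = t->buckets[h & (NBUCKETS - 1)]; e != NULL; e = e->next)
		if (e->laddr == laddr && e->faddr == faddr &&
		    e->lport == lport && e->fport == fport)
			break;
	pthread_mutex_unlock(&t->mtx);
	return e;	/* real code would take a reference before unlocking */
}
```

If the secret were reseeded on resize, a hash computed before the lock could be stale by the time the mutex is held, which is why the secret has to stay constant.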
2023-06-16	If TSO is enabled, fix the IPv6 forward counters and icmp6 redirect.	Alexander Bluhm
	First try to send with TSO. The goto senderr handles icmp6 redirect and other errors. If TSO is not necessary and the packet fits the interface MTU, just send it. Again, goto senderr handles icmp6. Finally, handle the icmp6 packet too big case. tested and OK jan@
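The decision order in the forwarding path can be sketched as a pure function. This is a simplification under stated assumptions: `forward_decide()` and the `FWD_*` outcomes are hypothetical names, and the real code also handles redirects, hardware capability checks and errors. It encodes the rule that TSO is only usable when the segment size (ph_mss), not the checksum flags, fits the MTU, as the 2023-06-13 ip6_output() fix states.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes named for this sketch only. */
enum fwd_action {
	FWD_TSO,	/* hand the large packet to hardware or software TSO */
	FWD_SEND,	/* packet fits the MTU, send as is */
	FWD_TOO_BIG	/* answer with ICMPv6 packet too big */
};

static enum fwd_action
forward_decide(bool tso_flag, unsigned int mss, unsigned int len,
    unsigned int mtu)
{
	/* 1. TSO only if needed and the generated segments fit the link */
	if (tso_flag && len > mtu && mss <= mtu)
		return FWD_TSO;
	/* 2. plain send if the packet itself fits */
	if (len <= mtu)
		return FWD_SEND;
	/* 3. otherwise the sender has to reduce its packet size */
	return FWD_TOO_BIG;
}
```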
2023-06-14	Add missing kernel lock around (*if_ioctl)().	Vitaliy Makkoveev
	ok bluhm
2023-06-13	Fix a typo with TSO logic in ip6_output(). Of course compare ph_mss	Alexander Bluhm
	with if_mtu and not the packet checksum flags. ph_mss contains the size of the chopped packets. OK jan@
2023-06-01	Enable forwarding of ix(4) LRO Packets via TSO	Jan Klemkow
	Also fix ip6_forwarding of TSO packets with tcp_if_output_tso(). With a lot of testing from Hrvoje Popovski and a lot of tweaks from bluhm@ ok bluhm@
2023-05-22	Fix TSO for traffic to a local address on a physical interface.	Alexander Bluhm
	When sending TCP packets with software TSO to the local address of a physical interface, the TCP checksum was miscalculated. As the small MSS is taken from the physical interface, but the large MTU of the loopback interface is used, large TSO packets are generated, but sent directly to the loopback interface. There we need the regular pseudo header checksum and not the modified one without the packet length. To avoid this confusion, use the same decision for checksum generation in in_proto_cksum_out() as for using hardware TSO in tcp_if_output_tso(). bug reported and tested by robert@ bket@ Hrvoje Popovski OK claudio@ jan@
2023-05-15	Implement the TCP/IP layer for hardware TCP segmentation offload.	Alexander Bluhm
	If the driver of a network interface claims to support TSO, do not chop the packet in software, but pass it down to the interface layer. Precalculate parts of the pseudo header checksum, but without the packet length. The length of all generated smaller packets is not known yet. Driver and hardware will use the mbuf packet header field ph_mss to calculate it and update the checksum. Introduce separate flags IFCAP_TSOv4 and IFCAP_TSOv6 as hardware might support only one protocol family. The old flag IFXF_TSO is only relevant for large receive offload. It is misnamed, but keep that for now. Note that drivers do not set TSO capabilities yet. Also the ifconfig flags and pseudo interface capabilities will be done separately. So this commit should not change behavior. heavily based on the work from jan@; OK sashan@
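Precalculating the pseudo header sum without the length works because ones' complement addition is associative and commutative: the per-segment length can be added later. The sketch below shows this for the IPv4 TCP pseudo header (source, destination, zero byte plus protocol, TCP length) using host-order values; the helper names are hypothetical and this is arithmetic only, not how a driver stores the result in the TCP header.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t
fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Pseudo header sum WITHOUT the TCP length, computed once per large packet. */
static uint16_t
pseudo_partial(uint32_t src, uint32_t dst, uint8_t proto)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;		/* zero byte + protocol form one 16-bit word */
	return fold(sum);
}

/* Per generated segment: add that segment's TCP length. */
static uint16_t
pseudo_add_len(uint16_t partial, uint16_t tcplen)
{
	return fold((uint32_t)partial + tcplen);
}

/* Reference: the regular pseudo header sum including the length. */
static uint16_t
pseudo_full(uint32_t src, uint32_t dst, uint8_t proto, uint16_t tcplen)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += tcplen;
	return fold(sum);
}
```

Mixing the two forms up is exactly the loopback bug from 2023-05-22: a packet that never reaches TSO needs the full sum, not the partial one.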
2023-05-13	Finally remove the kernel lock from IPv6 neighbor discovery. ND6	Alexander Bluhm
	entries in rt_llinfo are protected either by exclusive netlock or the ND6 mutex. The performance critical lookup path in nd6_resolve() uses shared netlock, but is not lockless. In contrast to ARP it grabs the mutex also in the common case. tested by Hrvoje Popovski; with and OK kn@
2023-05-12	Make access to rt_llinfo consistent and remove needless initialisation.	Alexander Bluhm
	OK mvs@
2023-05-10	Implement TCP send offloading, for now in software only. This is	Alexander Bluhm
	meant as a fallback if network hardware does not support TSO. Driver support is still work in progress. TCP output generates large packets. In IP output the packet is chopped to TCP maximum segment size. This reduces the CPU cycles used by pf. The regular output could be assisted by hardware later, but pf route-to and IPsec need the software fallback in general. For performance comparison or to work around possible bugs, sysctl net.inet.tcp.tso=0 disables the feature. netstat -s -p tcp shows TSO counters for chopped and generated packets. based on work from jan@ tested by jmc@ jan@ Hrvoje Popovski OK jan@ claudio@
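The core of software chopping is cutting one large TCP payload into MSS-sized pieces. The sketch below, with the hypothetical helper `chop_sizes()`, only computes segment sizes; copying headers, advancing sequence numbers and fixing checksums, which the kernel's chopper has to do, are left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Cut a payload of `len' bytes into segments of at most `mss' bytes.
 * Writes the sizes to sizes[] (capacity cap) and returns the count,
 * or 0 if cap is too small (or len is 0).
 */
static size_t
chop_sizes(size_t len, size_t mss, size_t *sizes, size_t cap)
{
	size_t n = 0, off;

	assert(mss > 0);
	for (off = 0; off < len; off += mss) {
		if (n == cap)
			return 0;
		/* every segment is full sized except possibly the last */
		sizes[n++] = (len - off < mss) ? len - off : mss;
	}
	return n;
}
```

For example, a 10000 byte payload with an MSS of 1448 yields six full segments and a final one of 1312 bytes; pf only had to inspect the single large packet.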
2023-05-08	The call to in_proto_cksum_out() is only needed before the packet	Alexander Bluhm
	is passed to ifp->if_output(). The fragment code has its own checksum calculation and the other paths end in goto bad. OK claudio@
2023-05-08	To make ND6 mp-safe, the life time of struct llinfo_nd6 *ln =	Alexander Bluhm
	rt->rt_llinfo has to be guaranteed. Replace the complicated logic in nd6_rtrequest() case RTM_ADD with what we have in ARP. This avoids accessing ln here. Digging through history shows a lot of refactoring that made rt_expire handling in RTM_ADD obsolete. Just initialize it to 0. Cloning and local routes should never expire. If RTF_LLINFO is set, ln should not be NULL. So nd6_llinfo_settimer() was not reached in this case. While there, remove obsolete comments and #if 0 code that never worked. OK kn@ claudio@
2023-05-08	As the nd6 mutex protects the lifetime of struct llinfo_nd6 ln,	Alexander Bluhm
	nd6_mtx must be held longer in nd6_rtrequest() case RTM_RESOLVE. OK kn@
2023-05-07	In preparation for TSO in software, clean up the fragment code. Use	Alexander Bluhm
	if_output_ml() to send mbuf lists to interfaces. This can be used for TSO, fragments, ARP and ND6. Rename variable fml to ml. In pf_route6() split the if else block. Put the safety check (hlen + firstlen < tlen) into ip_fragment(). It makes the code correct in case the packet is too short to be fragmented. This should not happen, but other functions also have this logic. No functional change. OK sashan@
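How the safety check `hlen + firstlen < tlen` guards fragmentation can be seen in a sizing-only sketch. `frag_sizes()` is hypothetical, assumes every fragment carries the same `hlen`-byte header, and ignores option copying; it reflects the IPv4 rule that every fragment payload except the last is a multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * tlen: total packet length, hlen: IP header length, mtu: link MTU.
 * Returns the number of fragment payload sizes written to payload[],
 * 1 if the packet is too short to need fragmenting, or -1 if the MTU
 * is too small or cap is exceeded.
 */
static int
frag_sizes(size_t tlen, size_t hlen, size_t mtu, size_t *payload, int cap)
{
	size_t firstlen, off;
	int n = 0;

	assert(cap >= 1);
	if (mtu < hlen + 8)
		return -1;
	firstlen = (mtu - hlen) & ~(size_t)7;	/* multiple of 8 bytes */
	/* safety check: nothing left after the first piece, no fragments */
	if (hlen + firstlen >= tlen) {
		payload[0] = tlen - hlen;
		return 1;
	}
	for (off = hlen; off < tlen; off += firstlen) {
		if (n == cap)
			return -1;
		payload[n++] = (tlen - off < firstlen) ? tlen - off : firstlen;
	}
	return n;
}
```

A 4000 byte packet with a 20 byte header over a 1500 byte MTU gives payloads of 1480, 1480 and 1020 bytes.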
2023-05-04	Introduce a neighbor discovery mutex like ARP uses it. For now it	Alexander Bluhm
	only protects nd6_list. It does not unlock ND6 from kernel lock yet. OK kn@
2023-05-03	Some checks in nd6_resolve() do not require kernel lock. The analog	Alexander Bluhm
	code for ARP has been unlocked a while ago. OK kn@
2023-05-02	Call nd6_ns_output() without kernel lock from nd6_resolve().	Alexander Bluhm
	OK kn@
2023-04-28	Inbound portion of RFC9131. Routers can create new neighbor cache entries	Peter Hessler
	when receiving a valid Neighbor Advertisement. OK florian@ kn@
2023-04-25	When configuring a new address on an interface, an upstream router	Peter Hessler
	doesn't know where to send traffic. This will send an unsolicited neighbor advertisement, as described in RFC9131, to the all-routers multicast address so all routers on the same link will learn the path back to the address. This is intended to speed up the first return packet on an IPv6 interface. OK florian@
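The packet described above is an ICMPv6 Neighbor Advertisement (type 136) for the new address, sent to the all-routers link-local multicast group ff02::2. The sketch fills in the header with the `struct nd_neighbor_advert` definitions from `<netinet/icmp6.h>`; `build_unsolicited_na()` is a hypothetical helper. It leaves the checksum at 0 (computed over the IPv6 pseudo header when sending), omits the target link-layer address option and the actual send, and clears the Solicited flag because nobody asked. The Override flag is also left clear here as an assumption; consult RFC 9131 for the exact flag rules rather than this sketch.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>

static int
build_unsolicited_na(const char *addr, struct nd_neighbor_advert *na,
    struct in6_addr *dst)
{
	memset(na, 0, sizeof(*na));
	na->nd_na_type = ND_NEIGHBOR_ADVERT;	/* ICMPv6 type 136 */
	na->nd_na_code = 0;
	na->nd_na_flags_reserved = 0;		/* S clear: unsolicited */
	if (inet_pton(AF_INET6, addr, &na->nd_na_target) != 1)
		return -1;			/* not an IPv6 address */
	/* all-routers link-local multicast group */
	if (inet_pton(AF_INET6, "ff02::2", dst) != 1)
		return -1;
	return 0;
}
```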
2023-04-21	Drop error variable and return directly; OK mvs tb	Klemens Nanni

2023-04-19	move kernel lock into multicast ioctl handlers; OK mvs	Klemens Nanni

2023-04-05	Push kernel lock into nd6_resolve()	Klemens Nanni
	Tested as part of bigger unlock diffs, commit now as tiny first step. OK bluhm
2023-04-05	ARP has a sysctl to show the number of packets waiting for an arp	Alexander Bluhm
	response. Implement analog sysctl net.inet6.icmp6.nd6_queued for ND6 to reduce places where mbufs can hide within the kernel. Atomic operations operate on unsigned int. Make the type of total hold queue length consistent. Use atomic load to read the value for the sysctl. This clarifies why no lock around sysctl_rdint() is needed. OK mvs@ kn@
2023-04-05	ARP has a queue of packets that should be sent after name resolution.	Alexander Bluhm
	ND6 held only a single packet. Unify the logic and add an mbuf hold queue to struct llinfo_nd6. This is MP safe and queue limits are tracked with atomic operations. New function if_mqoutput() has common code for ARP and ND6. ln_saddr6 holds the source address of the requesting packet. That is easier than fiddling with the mbuf queue in nd6_ns_output(). OK kn@
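The two commits above (hold queue with atomic limits, nd6_queued sysctl read via an atomic load) can be sketched with C11 atomics standing in for the kernel's atomic operations. The limits and all names here are assumptions for illustration, not the kernel's values; only the per-entry length and the global total are modelled, without actual mbufs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define HOLD_MAX_PER_ENTRY	10	/* assumed limits for this sketch */
#define HOLD_MAX_TOTAL		100

/* Packets queued across all neighbor entries. */
static atomic_uint hold_total;

struct hold_queue {
	unsigned int	len;	/* protected by a per-entry lock in real code */
};

/* Returns true if the packet may be queued, false if it is dropped. */
static bool
hold_enqueue(struct hold_queue *q)
{
	if (q->len >= HOLD_MAX_PER_ENTRY)
		return false;
	/* reserve a global slot first, undo it if over the limit */
	if (atomic_fetch_add(&hold_total, 1) >= HOLD_MAX_TOTAL) {
		atomic_fetch_sub(&hold_total, 1);
		return false;
	}
	q->len++;
	return true;
}

/* Release everything held by one entry, e.g. after resolution. */
static void
hold_dequeue_all(struct hold_queue *q)
{
	atomic_fetch_sub(&hold_total, q->len);
	q->len = 0;
}

/* sysctl style read: one atomic load, no lock needed */
static unsigned int
hold_queued(void)
{
	return atomic_load(&hold_total);
}
```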
2023-04-05	Call getuptime(9) once for consistency; OK bluhm	Klemens Nanni

2023-04-05	Call getuptime(9) once for consistency, sync with ARP	Klemens Nanni
	Feedback OK bluhm
2023-04-04	When sending IP packets to userland with divert-packet rules, the	Alexander Bluhm
	checksum may be wrong. Locally generated packets diverted by pf out rules may have no checksum due to hardware offloading. Calculate the checksum in that case. OK mvs@ sashan@
2023-03-31	Fix white space.	Alexander Bluhm

2023-03-25	sync nd6_resolve() EINVAL handling with arpresolve()	Klemens Nanni
	Less diff between them; merging three returns into one also reduces upcoming unlock diffs. OK bluhm
2023-03-25	sync nd6_resolve() uptime handling with arpresolve()	Klemens Nanni
	makes the two familiar functions look more alike; OK bluhm
2023-01-24	Refactor nd6_options() a bit more. Rewrite the loop to be a proper loop	Claudio Jeker
	and not some endless loop with some gotos. OK kn@
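A plain loop over Neighbor Discovery options follows directly from the RFC 4861 option format: a type byte, a length byte counted in units of 8 octets (header included), then data; a length of 0 is invalid. The sketch below, with the hypothetical `parse_nd_opts()` and `struct opts`, only records the source and target link-layer address options, matching the kernel's remaining interest, and is not the kernel's nd6_options().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_SRC_LLADDR	1	/* RFC 4861 option types */
#define OPT_TGT_LLADDR	2

struct opts {
	const uint8_t	*src_lladdr;
	const uint8_t	*tgt_lladdr;
};

/* Returns 0 on success, -1 on a malformed option. */
static int
parse_nd_opts(const uint8_t *buf, size_t len, struct opts *o)
{
	size_t off = 0, olen;

	o->src_lladdr = o->tgt_lladdr = NULL;
	while (off < len) {
		if (len - off < 2)
			return -1;		/* truncated option header */
		olen = (size_t)buf[off + 1] * 8;
		if (olen == 0 || olen > len - off)
			return -1;		/* zero length or overrun */
		switch (buf[off]) {
		case OPT_SRC_LLADDR:
			o->src_lladdr = buf + off;
			break;
		case OPT_TGT_LLADDR:
			o->tgt_lladdr = buf + off;
			break;
		default:
			break;			/* other options are skipped */
		}
		off += olen;
	}
	return 0;
}
```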
2023-01-22	Move SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to `sb_state' of	Vitaliy Makkoveev
	receive buffer. As was done for the SS_CANTSENDMORE bit, the definitions are kept as is, but these bits now belong to the `sb_state' of the receive buffer. `sb_state' is ORed with `so_state' when socket data is exported to userland. ok bluhm@
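The export step above can be pictured as combining the socket and buffer state words. The structs and bit values below are illustrative only (the kernel's definitions differ); they show why userland still sees one state word after the bits moved into the buffers.

```c
#include <assert.h>

/* Illustrative bit values, not the kernel's definitions. */
#define SS_ISCONNECTED	0x002
#define SS_CANTSENDMORE	0x010
#define SS_CANTRCVMORE	0x020
#define SS_RCVATMARK	0x040

struct sockbuf_ex {
	int		 sb_state;
};

struct socket_ex {
	int		 so_state;	/* connection state bits */
	struct sockbuf_ex so_rcv;	/* owns CANTRCVMORE and RCVATMARK */
	struct sockbuf_ex so_snd;	/* owns CANTSENDMORE */
};

/* What is exported to userland: one combined state word. */
static int
export_state(const struct socket_ex *so)
{
	return so->so_state | so->so_rcv.sb_state | so->so_snd.sb_state;
}
```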
2023-01-06	Clean up struct nd_opts, use nd6_options() function local variables	Klemens Nanni
	nd_opts_search is really the next option, so call it next_opt. nd_opts_done == 1 means next_opt == NULL, i.e. no more options to handle, so zap the former and use the latter to stop. Finally drop the useless struct members, all under _KERNEL. OK claudio
2023-01-06	Inline nd6_option() helper, remove indirections	Klemens Nanni
	Move the function body into the while loop, merge identical variables, pull the `invalid' label out of the loop and straighten `skip' into the `skip1' label. Merging nd6_option() into nd6_options() is now much clearer after the previous clean up. nd_opts_{search,last,done} are now clearly "private" to nd6_options() and can be cleaned up from struct nd_opts next. OK claudio
2023-01-06	Clarify nd6_option() return semantics	Klemens Nanni
	nd_opts_last is set only once in nd6_options() during struct init and guaranteed non-NULL as it is set to the function's argument opt which is passed in as (struct_ptr + 1) in both callers. nd6_option(), the internal helper, returns a pointer to the next option or NULL, which means either "no option, ok" or "invalid option, fail". Failure is signaled through nd_opts_last being NULL after nd6_option() returned, which only happens if nd6_option() zeroed the whole ndopts. Move the two cases under mnemonic labels and zap the now obviously redundant bzero() call in nd6_options(). OK claudio
2023-01-06	Simplify nd6_options() initialise logic	Klemens Nanni
	nd_opts_{search,last,done} are exclusively used in the internal option handling machinery; the only two nd6_options() callers only use nd_opts_{src,tgt}_lladdr. nd6_options() always zeroes and initialises the caller's struct nd_opts. If icmp6len is zero, i.e. if there are no ICMP6 header options left, everything inside *ndopts is zero, except nd_opts_done=1 which is not used by the callers. Set the internal nd_opts_{search,last,done} members only when needed. OK claudio
2023-01-06	Merge common code into new nd6_dad_destroy()	Klemens Nanni
	The current code wrt. stopping DAD for and removing a particular IP from the list is flawed. Introduce a single nd6_dad_destroy() to do the cleanup, so that there's only one place to fix. This is just a mechanical deduplication without significant behaviour change; in case a duplicated address was found, RTM_CHGADDRATTR now goes out before cleanup, which should be no problem. The nd6_dad_create() counterpart could be done as well, but the end of nd6_dad_start() is currently the only place where a new IP/DAD entry is set up, so there is little gain besides function name symmetry. OK claudio
2022-12-10	Remove unused experimental ICMP6 redirect low water bits	Klemens Nanni
	Dead since introduction in 2001 with icmp6.c r1.31: implement upper limit to icmp6 redirects (experimental, turned off) negative value to {mtudisc,redirect}_{hi,lo}wat will turn off the limitation. sync with kame. icmp6_redirect_lowat was always -1 and never hit the empty conditional. icmp6_redirect_hiwat never existed. icmp6_mtudisc_{hi,lo}wat are exposed as net.inet6.icmp6.mtudisc_{hi,lo}wat sysctl(2)s, so don't touch those for now. OK mvs
2022-12-10	Reuse off variable from previous line; no object change	Klemens Nanni

2022-12-10	zap 68 trailing spaces from a single line	Klemens Nanni

2022-12-10	`dp' was just allocated with M_ZERO flag, so the following bzero(3) is not	Vitaliy Makkoveev
	required. ok kn@
2022-12-10	Merge nd6_option_init() into nd6_options()	Klemens Nanni
	All call-sites call nd6_options() directly after nd6_option_init(). Fold them to simplify the logic and do less pointing around. Feedback OK bluhm florian
2022-12-09	Switch nd_opts from a union to just a struct.	Claudio Jeker
	The ND6 option handling in the kernel got a lot simpler since only the tgt and src lladdr option are inspected by the kernel. The magic of assigning options via one side of the union and accessing them via the other is total overkill and actually quite error prone. OK florian@
2022-12-07	Do not store unused ICMPv6 Option PREFIX_INFORMATION	Klemens Nanni
	Dead since 2017 sys/netinet6/nd6_rtr.c r1.163 Remove sending of router solicitations and processing of router advertisements from the kernel. It's handled by slaacd(8) these days. sysctl(2) net.inet6.icmp6.nd6_debug does not warn about it like it does for, e.g., duplicate MTU options, so don't do anything with this option. Remove access macros for other unused options while here. Eventually, union nd_opts should be removed completely. All under _KERNEL. tcpdump(8)/rad(8)/slaacd(8) keep showing/sending/receiving this option when running this diff on both router and client. OK claudio
2022-12-06	Add missing kernel lock around (*if_ioctl)() call within	Vitaliy Makkoveev
	in{,6}_addmulti(). Since the kernel lock is no longer taken along the setsockopt() path, it should be taken in this place. The corresponding in{,6}_delmulti() already acquire the kernel lock around (*if_ioctl)(). Problem reported and diff tested by weerd@ ok kn@ bluhm@
2022-12-02	Remove constant basereachable and retrans members from struct nd_ifinfo	Klemens Nanni
	Both are initialised with compile-time constants and never written to. They are part of the Neighbour Discovery machinery and only surface through the single-user SIOCGIFINFO_IN6: $ ndp -i lo0 basereachable=30s0ms, reachable=39s, retrans=1s0ms These values are read-only since 2017 sys/netinet6/nd6.c r1.217 usr.sbin/ndp/ndp.c r1.85 Remove knob and always do neighbor unreachable detection Inline the macros (to keep meaningful names), shrink the per-interface allocated struct nd_ifinfo to what is actually needed and inline nd6_dad_starttimer()'s constant `msec' argument. Nothing else in base, incl. regress, uses SIOCGIFINFO_IN6 or `ndp -i'. OK bluhm
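With basereachable fixed at 30 seconds, the reported `reachable' value is derived from it by randomisation. RFC 4861 section 6.3.2 defines ReachableTime as a uniformly distributed value between 0.5 and 1.5 times BaseReachableTime, so the 39s above falls inside the 15s to 45s window. This sketch uses rand() as a stand-in for the kernel's random source and ignores modulo bias; `calc_reachable_ms()` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

#define REACHABLE_TIME_MS	30000	/* RFC 4861 default BaseReachableTime */

/* Uniform in [0.5, 1.5] * base_ms, per RFC 4861 section 6.3.2. */
static unsigned int
calc_reachable_ms(unsigned int base_ms)
{
	unsigned int min = base_ms / 2;	/* MIN_RANDOM_FACTOR 0.5 */
	unsigned int span = base_ms;	/* (1.5 - 0.5) * base_ms */

	return min + (unsigned int)rand() % (span + 1);
}
```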
2022-12-02	Remove useless variable, simplify code	Klemens Nanni
	Using a local `duplicate' variable to defer the actual checks by a few lines, interleaved with comments (saying the same thing but negated), is harder to follow than necessary. Fold the logic and merge comments (remove the last obvious one missing a negation) to save 20 LOC. OK bluhm
2022-12-02	Unlock in6_ioctl_get() aka. SIOCGIF{DSTADDR,NETMASK,AFLAG,ALIFETIME}_IN6	Klemens Nanni
	First the right address is picked from the net lock protected if_addrlist. Then all ioctls just copy out the address, nothing requires the kernel lock. SIOCGIFDSTADDR_IN6 checks the net lock protected if_flags, SIOCGIFALIFETIME_IN6 computes lifetimes which only need the address. This removes the last kernel lock from IPv6 read ioctls (multicast being the untouched exception here). Users of these ioctl(2)s are route6d(8), rad(8), slaacd(8), isakmpd(8) and of course ifconfig(8). OK mvs
2022-11-30	Unlock nd6_ioctl(), push kernel lock into in6_ioctl_{get,change_ifaddr}()	Klemens Nanni
	Neighbour Discovery information is protected by the net lock, as documented in nd6.h struct nd_ifinfo. ndp(8) is the only SIOCGIFINFO_IN6 and SIOCGNBRINFO_IN6 user in base. nd6_lookup(), also used in ICMP6 input and IPv6 forwarding, only needs the net lock. OK mvs
2022-11-28	Document struct nd_ifinfo protection, remove obsolete .initialized member	Klemens Nanni
	All access to struct ifnet's member if_nd is read-only, with the one write exception being nd6_slowtimo() updating ND information. IPv6 Neighbour Discovery information is fully protected by the net lock. --- nd6_ifattach() allocates and unconditionally initialises struct ifnet's if_nd member, so early in if_attachsetup() that there is no way to query uninitialised Neighbour Unreachable Detection bits. Only SIOCGIFINFO_IN6 through ndp(8) used the .initialized member: Added/set since 2002 sys/netinet6/nd6.c r1.42 attach nd_ifinfo structure to if_afdata. split IPv6 MTU (advertised by RA) from real link MTU. sync with kame Read since 2002 usr.sbin/ndp/ndp.c r1.16 use new SIOCGIFINFO_IN6. random other cleanups. sync w/kame. Obsolete since 2017 sys/netinet6/nd6.c r1.217 usr.sbin/ndp/ndp.c r1.85 Remove knob and always do neighbor unreachable detection. Feedback OK bluhm