src - OpenBSD base system

Age	Commit message	Author
2024-03-26	Avoid NULL pointer dereference in routing table an_match().	Alexander Bluhm
	OK mvs@
2024-03-19	count if_enqueue/ifq_enqueue errors as oqdrops.	David Gwynne
	this helps narrow down where some "output failures" on sec interfaces occur. based on discussion with jason tubnor
2024-03-18	expose per port information via kstats.	David Gwynne
	the most interesting information exposed here is the number of times a port changes state according to the lacp state machine. if a port is flapping, it's hard to see if you only look at the current state. getting a count of changes over time makes problems a lot more visible and therefore fixable. this also exposes counters around the lacp protocol packets. all of these can be useful when trying to line up behaviors with another system (eg, a switch). ok jmatthew@
2024-03-18	use high bits from the mbuf flowid to pick a port to transmit on.	David Gwynne
	a port here is a physical interface used by an aggr. this leaves the low bits for a physical interface to use to pick a tx ring. without this, aggr used low bits for port selection, which takes bits away from the ring selection, which can lead to uneven distribution of packets over tx rings. i've been running this in production for well over a year now.
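The bit split described above can be sketched as follows. This is only an illustration: `pick_port()` and `pick_ring()` are hypothetical helpers, and the 16/16 split of a 32-bit flow id is an assumption made for the example, not the layout aggr(4) actually uses.

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose an aggr port from the HIGH bits of the
 * flow id. nports must be non-zero.
 */
static unsigned int
pick_port(uint32_t flowid, unsigned int nports)
{
	return ((flowid >> 16) % nports);
}

/*
 * Hypothetical helper: a physical interface chooses a tx ring from the
 * LOW bits of the same flow id. nrings must be non-zero.
 */
static unsigned int
pick_ring(uint32_t flowid, unsigned int nrings)
{
	return ((flowid & 0xffff) % nrings);
}
```

Because the two selections read disjoint bits, flows that land on the same port can still spread over all of that port's tx rings.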
2024-03-05	Convert `t_lock', `r_keypair_lock' and `c_lock' rwlock(9)s to	Vitaliy Makkoveev
	corresponding mutex(9)es. ifq_start() and following wg_qstart() could be called from software interrupt context if bandwidth control is enabled in pf.conf(5). Remove sleep points provided by rwlock(9)s from wg(4) output start routine. looks ok claudio
2024-03-04	white space fixes. no functional change	David Gwynne

2024-02-29	revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()"	Christian Weisgerber
	It breaks NFS. ok claudio@
2024-02-28	Enable IPv6 AF for ppp(4)	Denis Fondras
	OK claudio@
2024-02-27	Combine route_cache() and rtalloc_mpath() in new route_mpath().	Alexander Bluhm
	Fill and check the cache and call rtalloc_mpath() together. Then the caller of route_mpath() does not have to care about the uint32_t *src pointer and just pass struct in_addr. All the conversions are done inside the functions. ro->ro_rt is either valid or NULL. Note that some places have a stricter rtisvalid() now compared to the previous NULL check. OK claudio@
2024-02-22	Make the route cache aware of multipath routing.	Alexander Bluhm
	Pass source address to route_cache() and store it in struct route. Cached multipath routes are only valid if source address matches. If sysctl multipath changes, increase route generation number. OK claudio@
2024-02-14	Check IP length in ether_extract_headers().	Alexander Bluhm
	For LRO with ix(4) it is necessary to detect ethernet padding. Extract ip_len and ip6_plen from the mbuf and provide it to the drivers. Add extended sanity checks, like IP packet is shorter than TCP header. This prevents offloading to network hardware with bogus packets. Also iphlen of extracted headers contains header length for IPv4 and IPv6, to make code in drivers simpler. OK mglocker@
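A rough sketch of how a known IP length lets a driver detect ethernet padding and reject inconsistent packets. `ether_padding_sketch()` and its parameters are hypothetical and do not mirror the real ether_extract_headers() interface.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_SKETCH	14	/* ethernet header without VLAN tag */

/*
 * Hypothetical helper: return the number of trailing padding bytes in
 * a frame, or -1 if the lengths are inconsistent and the packet should
 * not be offloaded to hardware.
 */
static long
ether_padding_sketch(size_t frame_len, uint16_t ip_len, size_t iphlen,
    size_t tcphlen)
{
	/* The IP packet must fit into the received frame. */
	if (frame_len < ETHER_HDR_SKETCH + (size_t)ip_len)
		return (-1);
	/* The IP packet must not be shorter than its own headers. */
	if ((size_t)ip_len < iphlen + tcphlen)
		return (-1);
	return ((long)(frame_len - ETHER_HDR_SKETCH - ip_len));
}
```

For example, a bare TCP ACK with 20 byte IP and TCP headers has ip_len 40; in a minimum size 60 byte frame that leaves 6 bytes of padding, which must not be treated as TCP payload.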
2024-02-13	Analyse header layout in ether_extract_headers().	Alexander Bluhm
	Several drivers need IPv4 header length and TCP offset for checksum offload, TSO and LRO. Accessing these fields directly caused crashes on sparc64 due to misaligned access. It cannot be guaranteed that IP and TCP header is 4 byte aligned at driver level. Also gcc 4.2.1 assumes that bit fields can be accessed with 32 bit load instructions. Use memcpy() in ether_extract_headers() to get the bits from IPv4 and TCP header and store the header length in struct ether_extracted. From there network drivers can easily use it without caring about alignment and bit shift. Do some sanity checks with the length values to prevent invalid values from evil packets from being stored in hardware registers. If a check fails, clear the pointer to the header to hide it from the driver. Add debug prints that help to figure out the reason for bad packets and provide information when debugging drivers. OK mglocker@
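A minimal sketch of the memcpy() approach, assuming a raw byte buffer instead of an mbuf. `ipv4_hdr_len_sketch()` is a hypothetical helper, not the code in ether_extract_headers(); it copies the version/header-length byte out of the packet rather than reading it through a bit field of a possibly misaligned struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the IPv4 header length in bytes, or 0 if
 * the value is implausible and must be hidden from the driver.
 */
static size_t
ipv4_hdr_len_sketch(const unsigned char *pkt, size_t len)
{
	uint8_t vhl;
	size_t hlen;

	if (len < 20)
		return (0);
	/* Copy the byte out; no struct bit field, no alignment assumption. */
	memcpy(&vhl, pkt, sizeof(vhl));
	if ((vhl >> 4) != 4)
		return (0);
	hlen = (size_t)(vhl & 0x0f) << 2;
	/* Sanity checks: minimum header size, and header fits the data. */
	if (hlen < 20 || hlen > len)
		return (0);
	return (hlen);
}
```

Returning 0 plays the role of clearing the header pointer: the caller sees no usable header and skips the offload path.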
2024-02-13	Merge struct route and struct route_in6.	Alexander Bluhm
	Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embedded there. OK claudio@
2024-02-09	Route cache function returns hit or miss.	Alexander Bluhm
	The route_cache() function can easily return whether it was a cache hit or miss. Then the logic to perform a route lookup gets a bit simpler. Some more complicated if (ro->ro_rt == NULL) checks still exist elsewhere. Also use route cache in in_pcbselsrc() instead of filling struct route manually. OK claudio@
2024-02-07	Add missing #ifdef INET6 to fix ramdisk build.	Alexander Bluhm

2024-02-07	Use the route generation number also for IPv6.	Alexander Bluhm
	Implement route6_cache() to check whether the cached route is still valid and otherwise fill caching parameter of struct route_in6. Also count cache hits and misses in netstat. in_pcbrtentry() uses route cache now. OK claudio@
2024-02-06	Invert broken check of panic string in if_linkstate().	Alexander Bluhm
	original bug report from syzkaller Reported-by: syzbot+d19060a65721eb432a72@syzkaller.appspotmail.com broken fix found by Hrvoje Popovski hint to the problem and OK deraadt@
2024-02-05	Add netstat counter for route cache.	Alexander Bluhm
	To optimize route caching, count cache hits and misses. This is shown in netstat -s for both inet and inet6. Reuse the old IPv6 forward cache counter. Sort ip6s_wrongif consistently. For now only IPv4 cache counter has been implemented. OK mvs@
2024-02-05	Don't send route messages while rebooting after panic. Syzkaller exposed	Vitaliy Makkoveev
	[1] that if_downall() tries to send route messages and triggers panic again but in knote(9) layer. 1. https://syzkaller.appspot.com/bug?extid=d19060a65721eb432a72 ok bluhm
2024-02-05	Move route_cache() declaration from net/route.h to netinet/in.h.	Kenji Aoyama
	This prevents gcc3's 'parameter has incomplete type' warning that causes kernel build failure. Suggested by claudio@, ok bluhm@
2024-01-31	Add route generation number to route cache.	Alexander Bluhm
	The outgoing route is cached at the inpcb. This cache was only invalidated when the socket closes or if the route gets invalid. More specific routes were not detected. Especially with dynamic routing protocols, sockets must be closed and reopened to use the correct route. Running ping during a route change shows the problem. To solve this, add a route generation number that is updated whenever the routing table changes. The lookup in struct route is put into the route_cache() function. If the generation number is too old, the cached route gets discarded. Implement route_cache() for ip_output() and ip_forward() first. IPv6 and more places will follow. OK claudio@
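The generation check can be sketched like this. All names carrying a `_sketch` suffix are hypothetical stand-ins, and the hit/miss return convention is the sketch's own, not necessarily that of the real route_cache().

```c
#include <stddef.h>

struct rtentry_sketch {
	int		 rt_valid;	/* stand-in for rtisvalid() */
};

struct route_sketch {
	struct rtentry_sketch	*ro_rt;		/* cached route or NULL */
	unsigned long		 ro_generation;	/* generation at fill time */
};

/* Bumped on every routing table change. */
static unsigned long route_generation_sketch;

static void
rtable_changed_sketch(void)
{
	route_generation_sketch++;
}

/*
 * Return 1 if the cached route may be used. Otherwise discard it,
 * remember the current generation and return 0; the caller then does
 * a fresh lookup and stores the result in ro->ro_rt.
 */
static int
route_cache_sketch(struct route_sketch *ro)
{
	if (ro->ro_rt != NULL && ro->ro_rt->rt_valid &&
	    ro->ro_generation == route_generation_sketch)
		return (1);
	ro->ro_rt = NULL;
	ro->ro_generation = route_generation_sketch;
	return (0);
}
```

A single global counter is coarse: any table change invalidates every cached route, but it also catches a newly added more specific route, which a per-route validity check cannot detect.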
2024-01-26	Put checksum flags in bpf_hdr to use them in userland dhcpleased.	Jan Klemkow
	Thus, dhcpleased accepts non-calculated checksums which were verified by hardware/hypervisor. With tweaks from dlg@ ok bluhm@ mkay tobhe@
2024-01-24	tag packets going out a sec interface to prevent route/encap loops.	David Gwynne
	sec(4) was already looking for this mbuf tag so it could drop packets that had already been sent out on the same interface, but i forgot the code that adds the tag. this was reported by jason tubnor who experienced spins/lockups when using sec and a physical interface was disconnected. rather than being a locking problem like we initially assumed, it turned out that unplugging a physical interface caused a route for ipsec encapsulated traffic to go out over sec(4), causing the packet to loop in the stack. the fix was also tested and verified by jason. sorry for taking so long to look at it.
2024-01-23	Introduce pipex_iterator(), the special thing to perform	Vitaliy Makkoveev
	`pipex_session_list' foreach walkthrough with `pipex_list_mtx' mutex(9) relocking. It inserts special item after acquired `session' and keeps it linked until `session' release. Only owner can unlink its own item, so the LIST_NEXT(session) is always valid even if the `session' was unlinked. The iterator skips special items at the `session' acquisition time, as all other foreach loops where `pipex_list_mtx' mutex(9) is not relocked. ok yasuoka
2024-01-23	Remove `pipex_rd_head6' and `ps6_rn[2]'. They are not used.	Vitaliy Makkoveev
	ok yasuoka
2024-01-18	Use `nowake' as tsleep_nsec(9) ident. It has no corresponding wakeup(9).	Vitaliy Makkoveev
	ok bluhm
2024-01-11	Use domain name for socket lock.	Alexander Bluhm
	Syzkaller with witness complains about lock ordering of pf lock with socket lock. Socket lock for inet is taken before pf lock. Pf lock is taken before socket lock for route. This is a false positive as route and inet socket locks are distinct. Witness does not know this. Name the socket lock like the domain of the socket, then rwlock name is used in witness lo_name subtype. Make domain names more consistent for locking, they were not used anyway. Regardless of witness problem, unique lock name for each socket type makes sense. Reported-by: syzbot+34d22dcbf20d76629c5a@syzkaller.appspotmail.com Reported-by: syzbot+fde8d07ba74b69d0adfe@syzkaller.appspotmail.com OK mvs@
2024-01-10	Split UDP PCB table into IPv4 and IPv6.	Alexander Bluhm
	Having two hash tables instead of a common one, reduces table size and contention on the per table lock. The address family is always known in advance. The lookups and loops are more specific. OK sashan@
2024-01-06	Do not count packets through multicast loopback and simplex interfaces.	Alexander Bluhm
	Counting multicast packets sent to local stack or packets that are reflected by simplex interfaces does not make much sense. They are neither received nor output by any ethernet device. Counting these packets at lo0 or the loopback interface of the routing domain would be possible, but is not worth the effort. Make if_input_local() MP safe by deleting the if_opackets++ code. OK mvs@
2024-01-06	Take net lock before kernel lock.	Alexander Bluhm
	Doing KERNEL_LOCK() just before NET_LOCK() does not make sense. Net lock is a rwlock that releases kernel lock during sleep. To avoid an unnecessary release and take kernel lock cycle, move KERNEL_LOCK() after NET_LOCK(). There is no lock order reversal deadlock issue. Both locks are used in any order throughout the kernel. As NET_LOCK() releases the kernel lock when it cannot take the lock immediately and has to sleep, we always end in the order kernel lock before net lock after sleeping. OK sashan@
2024-01-01	Protect link between pf and inp with mutex.	Alexander Bluhm
	Introduce global mutex to protect the pointers between pf state key and internet PCB. Then in_pcbdisconnect() and in_pcbdetach() do not need exclusive netlock anymore. Use a bunch of read once unlocked access to reduce performance impact. OK sashan@
2024-01-01	Call if_counters_alloc() before if_attach().	Vitaliy Makkoveev
	ok bluhm sashan
2024-01-01	Fix white space in pf.c.	Alexander Bluhm

2023-12-29	Make loopback interface counters MP safe.	Alexander Bluhm
	Create and use the MP safe version of the interface counters for lo(4). Input packets were counted twice. As interface input queue is already counting, remove input count in if_input_local(). Multicast and simplex packets are counted at the ethernet interface. Add a comment that this is not MP safe. OK mvs@
2023-12-28	use RB_FOREACH_SAFE for pf_purge_expired_src_nodes	aisha
	OK bluhm@
2023-12-23	Backout always allocate per-CPU statistics counters for network	Alexander Bluhm
	interface descriptor. It panics during attach of em(4) device at boot.
2023-12-22	Always allocate per-CPU statistics counters for network interface	Vitaliy Makkoveev
	descriptor. We have a mess in network interface statistics. Only pseudo drivers do per-CPU counters allocation, all other network devices use the old `if_data'. The network stack partially uses per-CPU counters and partially uses `if_data', but the protection is inconsistent: sometimes counters are accessed with exclusive netlock, sometimes with shared netlock, sometimes with kernel lock but without netlock, sometimes with other locks. To make network interface statistics more consistent, always allocate per-CPU counters at interface attachment time and use them instead of `if_data'. At this step only move counters allocation to the if_attach() internals. The `if_data' removal will be performed with the following diffs to make review and tests easier. ok bluhm
2023-12-19	Initialize `sc_outputtask' before interface attachment. if_alloc_sadl()	Vitaliy Makkoveev
	has sleep point, so the uninitialized `sc_outputtask` could be accessed through ioctl(2) interface. ok sashan bluhm
2023-12-16	Rework pflowioctl() lock dances.	Vitaliy Makkoveev
	Release netlock and take `sc_lock' rwlock(9) just in the beginning of pflowioctl() and do corresponding operations in the end. Use `sc_lock' to protect `sc_dying'. We need to release netlock not only to keep locks order with `sc_lock' rwlock(9), but also because pflowioctl() calls some operations like socreate() or soclose() on udp(4) socket. Current implementation has many relocking places which break atomicity, so merge them into one. The `sc_lock' rwlock(9) is taken during the whole pflowioctl() call, so `sc_dying' atomicity is not broken. Not the ideal solution, but better than what we have now. Tested by Hrvoje Popovski. Discussed with and ok from sashan
2023-12-12	style(9) fix. No functional changes.	Vitaliy Makkoveev

2023-12-12	Turn `pflowstats' statistics counters into per-CPU counters to make them	Vitaliy Makkoveev
	mpsafe. The weird interactions around `pflow_flows' and `sc_gcounter' are replaced by a simple `pflow_flows' increment. Since the flow sequence is a 32 bit integer, the `sc_gcounter' type is changed to uint32_t. ok bluhm sashan
2023-12-11	Turn `pflow_softc' list into SMR list.	Vitaliy Makkoveev
	Since the revision 1.1182 of net/pf.c netlock is not taken while export_pflow() is called from pf_purge_states(). Current locks order requires netlock to be taken before PF_LOCK(), so there is no reason to turn it back into this path only for optional export_pflow() call. The `pflowif_list' foreach loop has no context switch within, so SMR list is better than mutex(9). Tested by Hrvoje Popovski. ok sashan bluhm
2023-12-08	Add spaces around '='. style(9) fix, no functional changes.	Vitaliy Makkoveev

2023-12-08	Introduce `sc_mtx' mutex(9) to protect the most of pflow_softc	Vitaliy Makkoveev
	structure. Protect the `send_nam', `sc_flowsrc' and `sc_flowdst' pflow_softc members by existing `sc_lock' rwlock(9). This partially fixes locking inconsistency of pflow_softc. The following work will be done with separate diffs. Also, pass `sc' instead of NULL to pflow_get_mbuf() while calling from pflow_sendout_ipfix_tmpl(). This fixes the NULL dereference. ok bluhm@
2023-12-03	Make rtm_senddesync_timer() timeout(9) handler mpsafe. solock() protects	Vitaliy Makkoveev
	the socket and the socket's PCB data. ok bluhm
2023-12-01	pipex(4) layer is completely mp-safe, move the pipex_timer() timeout(9)	Vitaliy Makkoveev
	handler out of kernel lock. ok bluhm
2023-12-01	Prevent race between pf_test() and pf_purge_expired_states().	Alexandr Nedvedicky
	Packets (callers to pf_test()) must alter pf_state::timeout under protection of pf_state::mtx. We also have to make sure the packet does not update pf_state::timeout when ::timeout reaches PFTM_UNLINKED. The first report came from Johan Huldtgren, but he is not the single user who has noticed "st->timeout == PFTM_UNLINKED" assert violation. OK bluhm@
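The rule above can be sketched in userland with a pthread mutex standing in for pf_state::mtx. `struct state_sketch`, `state_set_timeout_sketch()` and `TM_UNLINKED_SKETCH` are hypothetical names; the value 255 is an arbitrary placeholder, not the real PFTM_UNLINKED.

```c
#include <pthread.h>

#define TM_UNLINKED_SKETCH	255	/* placeholder for PFTM_UNLINKED */

struct state_sketch {
	pthread_mutex_t	mtx;		/* protects timeout */
	int		timeout;
};

/*
 * Packet path: update the timeout only while holding the state mutex,
 * and never resurrect a state that the purge path already marked as
 * unlinked. Returns 1 if the timeout was updated, 0 otherwise.
 */
static int
state_set_timeout_sketch(struct state_sketch *st, int timeout)
{
	int updated = 0;

	pthread_mutex_lock(&st->mtx);
	if (st->timeout != TM_UNLINKED_SKETCH) {
		st->timeout = timeout;
		updated = 1;
	}
	pthread_mutex_unlock(&st->mtx);
	return (updated);
}
```

Checking and writing under the same mutex closes the window in which the purge path marks the state unlinked between the packet path's test and its store.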
2023-11-29	remove unused VXLANMTU definition	Denis Fondras
	OK claudio, miod
2023-11-28	Remove struct inpcb from in6_embedscope() parameters.	Alexander Bluhm
	rip6_output() did modify inp_outputopts6 temporarily to provide different ip6_pktopts to in6_embedscope(). Better pass inp_outputopts6 and inp_moptions6 as separate arguments to in6_embedscope(). Simplify the code that deals with these options in in6_embedscope(). Document inp_moptions and inp_moptions6 as protected by net lock. OK kn@
2023-11-23	avoid passing weird mbuf chains to pf when pushing out a veb.	David Gwynne
	pf expects the ip header to be in the first mbuf of the chain we pass to pf_test, but in some situations the ethernet header is the only data in the first mbuf. after we remove the ethernet header, the first mbuf had no data in it which confused pf. fix this by passing all packets to ip_check on output as well as input. ip input handlers do all the necessary m_pullups. found by Mark Patruck.