src - OpenBSD base system

Age	Commit message	Author
2021-08-10	Remove unused `ipa_pcb' from 'ipsec_acquire' structure.	mvs
	ok gnezdo@
2021-08-09	During unidirectional data transmission, a TCP connection may stall.	Alexander Bluhm
	The sending machine is doing zero window probes, but is not sending any more data although the other machine announced that it has space again. The header prediction code did not update snd_wl2. If there was a sequence number wrap, the send window update block is not reached. Update snd_wl2 when receiving predicted ACKs and update snd_wl1 and rcv_up for predicted pure data. from FreeBSD; OK sashan@ claudio@
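The stall described in the 2021-08-09 entry can be illustrated with wrap-aware sequence comparison. The following is a minimal sketch, not the kernel code: `seq_lt` mirrors the usual BSD-style signed-difference comparison, and `sketch_window_update_ok` is a hypothetical helper that applies the RFC 793 window update condition (SND.WL1 < SEG.SEQ, or SND.WL1 = SEG.SEQ and SND.WL2 =< SEG.ACK) while ignoring the window size itself.

```c
#include <stdint.h>

/*
 * Wrap-aware "a < b" for 32-bit TCP sequence numbers.
 * Relies on the common two's complement conversion to int32_t.
 */
static int
seq_lt(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Hypothetical helper: RFC 793 style test whether a segment may
 * update the send window.  wl1 and wl2 stand in for snd_wl1 and
 * snd_wl2, seq and ack for the segment's sequence and ACK numbers.
 */
static int
sketch_window_update_ok(uint32_t wl1, uint32_t wl2, uint32_t seq,
    uint32_t ack)
{
	return seq_lt(wl1, seq) || (wl1 == seq && !seq_lt(ack, wl2));
}
```

With unidirectional traffic the peer's sequence number never moves, so the check depends on wl2 alone. If wl2 is left stale while the ACK number advances by more than 2^31, the new ACK compares as older than wl2 and the window update is skipped, which matches the stall the commit describes.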
2021-08-09	Fix white spaces.	Alexander Bluhm

2021-07-27	Revert "Use per-CPU counters for tunnel descriptor block" diff.	mvs
	Panic reported by Hrvoje Popovski.
2021-07-26	Use per-CPU counters for tunnel descriptor block (tdb) statistics.	mvs
	'tdb_data' struct became unused and was removed. ok bluhm@
2021-07-26	Do not queue crypto operations for IPsec. The packet entries in	Alexander Bluhm
	task queues were unlimited and could overflow during heavy traffic. Even if we still use hardware drivers that sleep, softnet task instead of soft interrupt can handle this now. Without queues net lock is inherited and kernel lock is only needed once per packet. This results in less lock contention and faster IPsec. Also protect tdb drop counters with net lock and avoid a leak in crypto dispatch error handling. intense testing Hrvoje Popovski; OK mpi@
2021-07-26	The mbuf header cleanup in revision 1.173 of ip_icmp.c was too	Alexander Bluhm
	strict. ICMP error packets generated by pf were not passed immediately, but could be blocked. Preserve PF_TAG_GENERATED flag in icmp_reflect() and icmp6_reflect(). reported by sf@; OK patrick@ kn@
2021-07-21	Also count crypto errors in ipsec_input_cb() like IPsec output in	Alexander Bluhm
	previous commit.
2021-07-21	Propagate errors from crypto_invoke() and count them in IPsec. They	Alexander Bluhm
	should not happen, but always check error conditions. tq is never NULL, remove the check. tdb->tdb_odrops++ is not MP safe, but will be addressed separately in ipsec_output_cb(). OK mvs@
2021-07-19	Remove `ids' from `ipsec_ids_tree' while following ipsp_ids_insert()	mvs
	error path. This fixes a use-after-free issue. Also fix a debug message typo in the error path, pointed out by bluhm@. ok millert@ bluhm@
2021-07-18	Introduce and use garbage collector for 'ipsec_ids' struct entities	mvs
	destruction instead of using per-entity timeout. This fixes the races between ipsp_ids_insert(), ipsp_ids_free() and ipsp_ids_timeout(). ipsp_ids_insert() can't stop the ipsp_ids_timeout() timeout handler which is already running and awaiting netlock to be released, so the reused `ids' will be silently removed in this case. ipsp_ids_free() can't determine whether the ipsp_ids_timeout() timeout handler is running because timeout_del(9) called by ipsp_ids_insert() clears its triggered state. So ipsp_ids_timeout() could be scheduled to run twice in this case. Also hrvoje@ reported that ipsec(4) throughput increased with this diff, so it seems we caught a significant number of ipsp_ids_insert() races. tests and feedback by hrvoje@ ok bluhm@
2021-07-18	The IPsec authentication before decryption used a different replay	Alexander Bluhm
	counter than after decryption. This could result in "esp_input_cb: authentication failed for packet in SA" errors. As we run crypto operations async, thousands of packets are stored in the crypto task. During the queueing the replay counter of the tdb can change. Then the higher 32 bits may increment although the lower 32 bits did not wrap. checkreplaywindow() must be called twice per packet with the same replay counter. Store the value in struct tdb_crypto while dangling in the task queue and doing crypto operations. tested by Hrvoje Popovski; joint work with tobhe@
2021-07-16	Improve comments in IPsec replay window calculation.	Alexander Bluhm
	OK tobhe@
2021-07-14	Resend the TCP packet only if the MTU locked flag appears at the	Alexander Bluhm
	route and was not there before. This should prevent a recursion in path MTU discovery with TCP over IPsec. reported and tested Matthias Schmidt; tested and OK tobhe@
2021-07-13	Remove unused `PolicyHead' from 'sockaddr_encap' structure.	mvs
	ok tobhe@
2021-07-08	The xformsw array never changes. Declare struct xformsw constant	Alexander Bluhm
	and map data read only. OK deraadt@ mvs@ mpi@
2021-07-08	Initialize `ipsec_acquire_pool' pool(9) within pfkey_init() instead of	mvs
	doing that in runtime within ipsp_acquire_sa(). ok bluhm@
2021-07-08	Debug printfs in encdebug were inconsistent, some missing newlines	Alexander Bluhm
	produced ugly output. Move the function name and the newline into the DPRINTF macro. This simplifies the debug statements. OK tobhe@
2021-07-08	The properties of the crypto algorithms never change. Declare them	Alexander Bluhm
	constant. Then they are mapped as read only. OK deraadt@ dlg@
2021-07-07	tell ether_input() to call pf_test() outside of smr_read sections,	Alexandr Nedvedicky
	because smr_read sections don't play well with sleeping locks in pf(4). OK bluhm@
2021-07-07	Fix whitespaces in IPsec code.	Alexander Bluhm

2021-06-30	For path MTU discovery tcp_mtudisc() should resend a TCP packet by	Alexander Bluhm
	calling tcp_output() if the TCP maximum segment size changes. But that did not work, as the new value was compared before tcp_mss() had a chance to modify it. Move the comparison and change it from not equal to greater than. It only makes sense to resend a packet immediately if it becomes smaller and is more likely to fit. OK sashan@ tobhe@
2021-06-21	Fix uninitialized variables introduced in rev 1.361	Jeremie Courreges-Anglas
	Thankfully clang elided the code in an almost harmless way (at least on amd64 GENERIC.MP). Spotted by chance when building kernels with -Wno-error=uninitialized. ok dlg@ sashan@ bluhm@
2021-06-18	The crypto(9) framework used by IPsec runs on a kernel task that	Alexander Bluhm
	is protected by kernel lock. There were crashes in swcr_authenc() when it was accessing swcr_sessions. As a quick fix, protect all calls from network stack to crypto with kernel lock. This also covers the rekeying case that is called from pfkey via tdb_init(). OK mvs@
2021-06-03	remember if the ipv4 header checksum is ok.	David Gwynne
	if a bridge checks the ip header before the network stack, then we can remember it was ok when the bridge checks it so the ip stack doesnt have to. ok claudio@ mvs@
2021-06-02	factor out the code that does basic sanity checks on ipv4 headers.	David Gwynne
	this will allow these checks to be reused by bridge (where they're currently duplicated), veb, and tpmr. ok bluhm@ sashan@
2021-05-25	As network features are not added dynamically, the domain structures	Alexander Bluhm
	are constant. Having more const makes MP review easier. More pointers are mapped read-only in the kernel image. OK deraadt@ mvs@
2021-05-15	Fix IPsec NAT-T to work with pipex(4). Introduce a new packet tag	YASUOKA Masahiko
	PACKET_TAG_IPSEC_FLOWINFO to specify the IPsec flow. ok mvs
2021-05-12	Use local copy of `ps_rtableid' in ip{,6}_ctloutput() and mark	mvs
	`ps_rtableid' as atomic. This allows us to unlock setrtable(2). ok claudio@ mpi@
2021-05-04	Initialize `ipsec_policy_pool' within pfkey_init() instead of doing that	mvs
	in runtime within pfkeyv2_send(). Also set its interrupt protection level to IPL_SOFTNET. ok bluhm@ mpi@
2021-04-30	Rearrange the implementation of bounded sysctl. The primitive	Alexander Bluhm
	functions are sysctl_int() and sysctl_rdint(). This brings us back the 4.4BSD implementation. Then sysctl_int_bounded() builds the magic for range checks on top. sysctl_bounded_arr() is a wrapper around it to support multiple variables. Introduce macros that describe the meaning of the magic boundary values. Use these macros in obvious places. input and OK gnezdo@ mvs@
2021-04-28	Use mq_delist() to fetch the ARP mbuf hold queue once and feed the	Alexander Bluhm
	mbuf list to if_output(). OK sashan@ mvs@
2021-04-28	Document the locking mechanism of the global variables in ARP code.	Alexander Bluhm
	The global list of ARP llinfo is protected by net lock. This is not sufficient when we switch to shared netlock. Add a mutex for insertion and removal when net lock is not exclusive. This is needed if we want to run IP output on multiple CPUs. Put an assertion for shared net lock into arp_rtrequest. input mvs@; OK sashan@
2021-04-26	Convert the ARP packet hold queue from mbuf list to mbuf queue which	Alexander Bluhm
	contains a mutex. Update la_hold_total with atomic operations. OK sashan@
2021-04-23	Setting variable arpinit_done is not MP safe if we want to execute	Alexander Bluhm
	arp_rtrequest() in parallel. Move initialization to arpinit() function. OK kettenis@ mvs@
2021-04-23	The variable la_hold_total contains the number of packets currently	Alexander Bluhm
	in the arp queue. So the sysctl net.inet.ip.arpqueued must be read only. In if_ether.c include the header with the declaration of la_hold_total to ensure that the definition matches. OK mvs@
2021-04-16	Turn on the direct ACK on every other segment.	Alexander Bluhm
	This is a backout of rev 1.366 which turned this feature off. Although sending fewer ACKs makes TCP faster if the CPU is busy with processing packets, there are corner cases where TCP gets slower. Especially OpenBSD 6.8 and older has a maxburst limitation that scales badly if the other side sends too few ACKs. Also regress test relayd run-args-http-slow-consumer.pl uses strange socket buffer sizes that trigger slow performance with the new algorithm. For OpenBSD 6.9 release switch back to 6.8 delayed ACK behavior. discussed with deraadt@ benno@ claudio@ jan@
2021-03-30	[ICMP] IP options lead to malformed reply	Alexandr Nedvedicky
	icmp_send() must update IP header length if IP options are appended. Such packet also has to be dispatched with IP_RAWOUTPUT flags. Bug reported and fix co-designed by Dominik Schreilechner _at_ siemens _dot_ com OK bluhm@
2021-03-20	use m_dup_pkthdr in ip_fragment to copy pkthdr info to fragments.	David Gwynne
	this ensures more stuff is copied, in particular the flowid information. this is also how v6 does it, which makes things more consistent. ok bluhm@
2021-03-10	spelling	Jonathan Gray
	ok gnezdo@ semarie@ mpi@
2021-03-07	use uint64_t ethernet addresses for compares in carp.	David Gwynne
	pass the uint64_t that ether_input has already converted from a real ethernet address into carp_input so it can use it without having to do its own conversion. tested by hrvoje popovski; tested by me on amd64 and sparc64. ok patrick@ jmatthew@
2021-03-05	pass the uint64_t dst ethernet address from ether_input to bridges.	David Gwynne
	tested on amd64 and sparc64.
2021-03-01	Refactor ip_fragment() and ip6_fragment(). Use a mbuf list to	Alexander Bluhm
	simplify the handling of the fragment list. Now the functions ip_fragment() and ip6_fragment() always consume the mbuf. They free the mbuf and mbuf list in case of an error and take care about the counter. Adjust the code a bit to make v4 and v6 look similar. Fixes a potential mbuf leak when pf_route6() called pf_refragment6() and it failed. Now the mbuf is always freed by ip6_fragment(). OK dlg@ mvs@
2021-02-26	add some helpers for working with ethernet addresses as uint64_t	David Gwynne
	the main bits are ether_addr_to_e64 and ether_e64_to_addr for loading an ethernet address into a uint64_t and vice versa. there's also some macros for testing if an address in a uint64_t is multicast, broadcast, anyaddr, or if it's an 802.1q reserved multicast group address. the reason for this functionality is once you have an ethernet address as a uint64_t, operations like compares, bit tests, and so on are fast and easy. tested on amd64 and sparc64
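The idea behind the 2021-02-26 entry can be sketched in a few lines of C. This is an illustration only: the `sketch_` names and signatures are hypothetical and are not the kernel's ether_addr_to_e64 helpers or its macros. The sketch assumes the six address octets are packed big-endian into the low 48 bits of a uint64_t, so the first octet ends up in bits 47..40.

```c
#include <stdint.h>

#define SKETCH_ETHER_ADDR_LEN	6

/* Pack six octets into the low 48 bits, first octet most significant. */
static uint64_t
sketch_addr_to_e64(const uint8_t addr[SKETCH_ETHER_ADDR_LEN])
{
	uint64_t e64 = 0;
	int i;

	for (i = 0; i < SKETCH_ETHER_ADDR_LEN; i++)
		e64 = (e64 << 8) | addr[i];
	return e64;
}

/* Unpack the low 48 bits back into six octets. */
static void
sketch_e64_to_addr(uint64_t e64, uint8_t addr[SKETCH_ETHER_ADDR_LEN])
{
	int i;

	for (i = SKETCH_ETHER_ADDR_LEN - 1; i >= 0; i--) {
		addr[i] = e64 & 0xff;
		e64 >>= 8;
	}
}

/* Broadcast is all ones in the 48 address bits. */
static int
sketch_e64_is_broadcast(uint64_t e64)
{
	return e64 == 0xffffffffffffULL;
}

/* The group bit is the least significant bit of the first octet. */
static int
sketch_e64_is_multicast(uint64_t e64)
{
	return (e64 & 0x010000000000ULL) != 0;
}
```

Once the address is a single integer, an equality test or a bit test replaces a six-byte memory compare, which is the speed and simplicity argument the commit message makes.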
2021-02-25	we don't have to cast to caddr_t when calling m_copydata anymore.	David Gwynne
	the first cut of this diff was made with coccinelle using this spatch: @rule@ type caddr_t; expression m, off, len, cp; @@ -m_copydata(m, off, len, (caddr_t)cp) +m_copydata(m, off, len, cp) i had to fix its opinionated idea of formatting by hand though, so i'm not sure it was worth it. ok deraadt@ bluhm@
2021-02-23	Use pool to allocate tdbs.	tobhe
	ok patrick@ bluhm@
2021-02-23	As ip_insertoptions() may prepend a mbuf, "goto bad" has to free	Alexander Bluhm
	the new chain. This fixes a potential memory leak in ip_output(). Also simplify a bunch of "goto done". OK kn@ mvs@
2021-02-23	Use NULL instead of 0 in `m_nextpkt' assignment.	mvs
	ok deraadt@ dlg@
2021-02-11	Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport().	Patrick Wildt
	Technically the whole point of the stoeplitz API is that it's symmetric, meaning that the order of addresses and ports doesn't matter and will produce the same hash value. Coverity CID 1501717 ok dlg@
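The symmetry claim in the 2021-02-11 entry can be checked with a small model. The sketch below is not the kernel's stoeplitz code: `sketch_stoeplitz` is a hypothetical bitwise Toeplitz hash with a 16-bit result whose key is assumed to be one 16-bit pattern repeated indefinitely. Under that assumption every 16-bit aligned word of the input is hashed with the same key window, so swapping the foreign and local addresses, or the two ports, cannot change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 16-bit value left by n bits (n taken modulo 16). */
static uint16_t
sketch_rotl16(uint16_t v, unsigned int n)
{
	n &= 15;
	return (uint16_t)((v << n) | (v >> ((16 - n) & 15)));
}

/*
 * Hypothetical bitwise Toeplitz hash.  Because the key repeats every
 * 16 bits, the key window at input bit i is the pattern rotated by
 * i mod 16, so the hash is the XOR of the same linear function
 * applied to each 16-bit aligned word of the input.
 */
static uint16_t
sketch_stoeplitz(uint16_t key, const uint8_t *data, size_t len)
{
	uint16_t hash = 0;
	size_t i;

	for (i = 0; i < len * 8; i++) {
		if (data[i / 8] & (0x80 >> (i % 8)))
			hash ^= sketch_rotl16(key, (unsigned int)(i % 16));
	}
	return hash;
}
```

Hashing the byte sequence faddr, laddr, fport, lport and the sequence laddr, faddr, lport, fport with the same key gives the same value, which is why the argument swap in the commit is a correctness cleanup rather than a behavior change.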
2021-02-10	If pf changes the routing table when sending packets, the kernel	Alexander Bluhm
	could get stuck in an endless recursion during TCP path MTU discovery. Create a dynamic host route in ip_output() that can be used by tcp_mtudisc() to store the MTU. Reported by Peter Mueller and Sebastian Sturm OK claudio@