src - OpenBSD base system

Age	Commit message	Author
2019-08-21	Remove support for semantically opaque interface identifiers (RFC 7217)	Florian Obser
	for IPv6 link local addresses. Some hosting and VM providers route customer IPv6 prefixes to link local addresses derived from ethernet MAC addresses (RFC 2464). This leads to hard to debug IPv6 connectivity problems and is probably not worth the effort. RFC 7721 lists 4 weaknesses: 3.1. Correlation of Activities over Time & 3.2. Location Tracking These are still possible with RFC 7217 addresses for an adversary connected to the same layer 2 network (think conference wifi). Since the link local prefix stays the same (fe80::/64) the link local addresses do not change between different networks. An adversary on the same layer 2 network can probably track ethernet MAC addresses via different means, too. 3.3. Address Scanning & 3.4. Device-Specific Vulnerability Exploitation These now become possible, however, as noted above a layer 2 adversary was probably able to do this via different means. People concerned with these weaknesses are advised to use ifconfig lladdr random. OK benno input & OK kn
2019-08-16	ifq_hdatalen should keep the mbuf it's looking at, not leak it.	David Gwynne
	ie, use ifq_deq_rollback after looking at the head mbuf instead of ifq_deq_commit. this is used in tun/tap, where it had the effect that you'd get the datalen for the packet, and then when you try to read that many bytes it had gone. cool and normal. this was found by a student who was trying to do just that. i've always just read the packet into a large buffer.
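A minimal sketch of the difference described above, assuming a toy singly linked FIFO. The toy_ names are hypothetical and are not the kernel's struct mbuf, struct ifqueue, or the ifq_deq_* API. It only illustrates why a length query that consumes the head packet makes the following read come up empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; not the kernel's struct mbuf or struct ifqueue. */
struct toy_pkt {
	struct toy_pkt	*next;
	size_t		 len;
};

struct toy_queue {
	struct toy_pkt	*head;
	struct toy_pkt	*tail;
};

/* Append a packet to the tail of the queue. */
void
toy_enqueue(struct toy_queue *q, struct toy_pkt *p)
{
	assert(q != NULL && p != NULL);
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
}

/* Remove and return the head packet, or NULL if the queue is empty. */
struct toy_pkt *
toy_dequeue(struct toy_queue *q)
{
	struct toy_pkt *p;

	assert(q != NULL);
	p = q->head;
	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		p->next = NULL;
	}
	return (p);
}

/* Buggy shape: reports the head length but consumes the packet to do so. */
size_t
toy_hdatalen_consuming(struct toy_queue *q)
{
	struct toy_pkt *p = toy_dequeue(q);

	return (p == NULL ? 0 : p->len);
}

/* Fixed shape: reports the head length and leaves the packet queued. */
size_t
toy_hdatalen_peek(const struct toy_queue *q)
{
	assert(q != NULL);
	return (q->head == NULL ? 0 : q->head->len);
}
```

With the peeking variant, a caller can ask for the length and then still dequeue the same packet. With the consuming variant, the dequeue that follows finds nothing.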
2019-08-06	When we needed the kernel lock for local IP packet delivery, mpi@	Alexander Bluhm
	introduced a queue to grab the lock for multiple packets. Now we have only netlock for both IP and protocol input. So the queue is not necessary anymore. It just switches CPU and decreases performance. So remove the inet and inet6 ip queue for local packets. To get TCP running on loopback, we have to queue once between TCP input and output of the two sockets. So use the loopback queue in looutput() unconditionally. OK visa@
2019-08-05	try to be more compliant with the spec by implementing marker responses.	David Gwynne
	i hope, i didn't test this that hard.
2019-08-05	run pf against ip packets coming in and out of the two ports.	David Gwynne
	the idea and a good chunk of the implementation is copied from bridge(4). note that IP packets inside "service delimited" traffic, ie, vlan, svlan, or bpe encapsulated traffic, are not considered IP and will therefore not be given to pf to look at. if you want to filter that you'll need to configure vlan/svlan/bpe interfaces to get past their headers, and then configure them with their own tpmrs. hopefully the interface input handlers were established in the right order.
2019-08-05	pay some lip service to TPMR compliance according to 802.1Q-2018	David Gwynne
	the spec says we should filter packets destined to a list of ethernet addresses. im currently interpreting "filter" as meaning dropping, which this diff does. however, one of the addresses to filter is the one lacp uses by default and not a lot of lacp implementations (read switches) support the configuration of a different address. i still need lacp to go over tpmr, and because i can't change the address, this diff also has a way to configure tpmr to still allow the packets through.
2019-08-01	add tpmr(4), a quick and dirty 802.1Q Two-Port MAC Relay implementation	David Gwynne
	a TPMR is a simplified bridge (as supported by bridge(4)). it only supports two ports, and unconditionally forwards frames between them. this is unlike a real bridge which can support an arbitrary number of ports and implements a learning algorithm. i needed this to tunnel LACP between switches in a couple of data centers separated by an IP network. because bridge(4) implements an actual 802.1Q bridge, it eats packets that are supposed to be sent between bridges, such as spanning tree and LACP. TPMR according to the spec does a lot less of this, and is in fact documented in the spec as being able to support transport of LACP frames. tpmr(4) is actually a lot dumber and currently does no filtering (except what you can do with BPF). because the forwarding path in tpmr(4) is so short and simple, it is relatively fast and can be used to isolate and help improve the relative performance of some parts of the system. i also have plans to use this for monitoring traffic without processing it. tpmr(4) implements the trunk(4) ioctls for managing configuration. the ifconfig output for trunk interfaces is a bit shorter and needs a lot less stuff faked to be useful. inside the kernel it appears as an IFT_BRIDGE interface (like bridge(4)). it generally just drops stuff unless it's between the ports it's managing. this has been in production at my work for a few days between some physical nics and etherip(4), and so far it has been really solid. hrvoje popovski has kicked the tyres too, but more from a performance point of view. ok claudio@ deraadt@
2019-07-29	The IPv6 duplicate address detection may send a packet before the	Alexander Bluhm
	gre tunnel is set up. This could cause a panic. In gre(4) reject outgoing packets during that time window. While there, count interface errors and use generic unhandled_af(). bug reported by andreas at nullbyte dot se; OK dlg@
2019-07-25	AF_INET comes before AF_INET6. Shorten line to <80 chars.	Kenneth R Westerback
	pointed out by claudio@
2019-07-25	Add IFXF_AUTOCONF4 to if_xflags to match IFXF_AUTOCONF6. Let	Kenneth R Westerback
	ifconfig set/unset it. ok deraadt@ kmos@
2019-07-20	When multiple ports share the same MAC, pick the physical one for delivery.	Martin Pieuchot
	Fix an issue reported by Eygene Ryabinkin where packets were dropped by pf(4) because a vlan(4) interface was picked instead of its underlying em(4). While here do some refactoring to avoid code duplication. Based on a submission from Eygene Ryabinkin <rea at codelabs dot ru>. ok bluhm@, kn@
2019-07-20	generate the actor info per port to send to userland.	David Gwynne
	useful for debugging.
2019-07-20	just use LINK_STATE_IS_UP to see if a port has link.	David Gwynne
	excluding HALF_DUPLEX just seems mean.
2019-07-19	try to notify the partner when the port is going away or down.	David Gwynne
	by notify i mean we send an lacp packet with our collecting and distributing flags cleared, which should tell the remote system that it should no longer handle packets on their port as part of their aggregation. this is implemented by "unselecting" a port. if an active port is going away, ie, being removed from an aggr via "ifconfig aggr0 -trunkport port0", all that happens is software state on our side changes and we stop considering the interface as part of the aggr interface. the partner system is otherwise oblivious and can continue to send us packets until its expiry timeout fires because it doesn't know any better. we already intercept a ports ioctl handling, so if someone goes "ifconfig portX down" while it is attached to an aggr, we can catch that before the underlying driver actually tears the rings down, and we still have a chance to try and send a packet to the peer. this is useful because our drivers generally do not drop the physical link, so again, the partner system is oblivious to the change on our side until its expiry timer fires. expiry timeouts can be up to 90 seconds away, which is a lot of traffic to blackhole. sending the notification to the partner means they withdraw this link at the same time the local system is pulling the port out of the aggregation. hopefully. it is possible the packet is lost, but this is a good start. the only caveat to this is my implementation ignores the transmit state machine from the lacp spec, and may cause more than 3 lacp packets per second to be transmitted to the partner system. oh well. i should look at the marker protocol too.
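The notification described above amounts to sending an actor state octet with the collecting and distributing bits cleared. A rough sketch follows, assuming the bit layout of the LACPDU state octet from 802.1AX. The TOY_ macro and function names are hypothetical and are not taken from the aggr(4) source.

```c
#include <assert.h>
#include <stdint.h>

/* State octet bits of a LACPDU (802.1AX); the TOY_ names are hypothetical. */
#define TOY_LACP_STATE_ACTIVITY		0x01
#define TOY_LACP_STATE_TIMEOUT		0x02
#define TOY_LACP_STATE_AGGREGATION	0x04
#define TOY_LACP_STATE_SYNC		0x08
#define TOY_LACP_STATE_COLLECTING	0x10
#define TOY_LACP_STATE_DISTRIBUTING	0x20
#define TOY_LACP_STATE_DEFAULTED	0x40
#define TOY_LACP_STATE_EXPIRED		0x80

static_assert((TOY_LACP_STATE_COLLECTING & TOY_LACP_STATE_DISTRIBUTING) == 0,
    "collecting and distributing must be distinct bits");

/*
 * Return the actor state to advertise once a port is unselected:
 * the same octet with collecting and distributing cleared.
 */
uint8_t
toy_lacp_state_unselected(uint8_t state)
{
	return (state & (uint8_t)~(TOY_LACP_STATE_COLLECTING |
	    TOY_LACP_STATE_DISTRIBUTING));
}

/* Nonzero if the octet says the sender both collects and distributes. */
int
toy_lacp_state_is_forwarding(uint8_t state)
{
	uint8_t both = TOY_LACP_STATE_COLLECTING | TOY_LACP_STATE_DISTRIBUTING;

	return ((state & both) == both);
}
```

A partner that receives the cleared octet no longer sees the port as collecting or distributing, so it can withdraw the link without waiting for its expiry timer.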
2019-07-19	default (ie, reset) the partner info when a ports link goes down.	David Gwynne
	this doesnt seem to be mentioned in the spec, but is a sensible thing to do if you think about it. all the switches i've tried also do this, so there's some consensus about it being sensible. this is done in the link state handler rather than being added to one of the state machines. the idea is to keep the state machines as close to what's in the spec as possible.
2019-07-19	export all the partner info to userland, not just what ifconfig prints.	David Gwynne

2019-07-19	ttysleep(): drop unused timeout parameter	cheloha
	All callers sleep indefinitely. With help from visa@. ok visa@, ratchov@, kn@
2019-07-18	follow up to 'once rule' expiration	Alexandr Nedvedicky
	ok lteo@
2019-07-18	make the UCT in the rxm generate debug output	David Gwynne
	without this it looks like debug output loses info because of how the uct was shortcutted. no functional change, just prettier printfs.
2019-07-18	run the selection logic from the rxm current state if the port is unselected	David Gwynne
	previously it would only run the selection logic if the peer information changed, but it is possible to be in the current state with stale partner info. that can happen if the port becomes disabled/disconnected, which unwinds the mux machine, but doesnt clear the partner info. when the link is enabled again we re-enter the current state, but because the partner info is the same we didn't run the selection logic, which in turn didn't let the mux machine move forward again.
2019-07-18	bulk up the debug output around selection logic	David Gwynne
	lacp didnt come up again after i replaced some optics with dacs, and it has to be because of a problem around the selection logic. this will let me narrow it down.
2019-07-18	replace ether_{cmp,is_eq,is_zero} with the new ones in netinet/if_ether.h	David Gwynne
	ether_cmp goes away, ether_is_eq becomes ETHER_IS_EQ, ether_is_zero becomes ETHER_IS_ANYADDR. ether_is_slow is kept locally, but renamed to ETHER_IS_SLOWADDR to better match what comes from if_ether.h.
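For readers unfamiliar with these predicates, here is a sketch of what checks of this kind typically do, written as plain functions. The toy_ names are hypothetical and these are not the definitions from netinet/if_ether.h. The slow-protocols check assumes the standard 01:80:c2:00:00:02 multicast address used by LACP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETHER_ADDR_LEN	6

/* True if both 6-byte addresses are identical. */
bool
toy_ether_is_eq(const uint8_t *a, const uint8_t *b)
{
	assert(a != NULL && b != NULL);
	return (memcmp(a, b, TOY_ETHER_ADDR_LEN) == 0);
}

/* True if the address is all zeros (the "any" address). */
bool
toy_ether_is_anyaddr(const uint8_t *a)
{
	static const uint8_t zero[TOY_ETHER_ADDR_LEN] = { 0 };

	return (toy_ether_is_eq(a, zero));
}

/* True if the address is ff:ff:ff:ff:ff:ff. */
bool
toy_ether_is_broadcast(const uint8_t *a)
{
	static const uint8_t bcast[TOY_ETHER_ADDR_LEN] =
	    { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	return (toy_ether_is_eq(a, bcast));
}

/* True if the address is the slow protocols multicast address. */
bool
toy_ether_is_slowaddr(const uint8_t *a)
{
	static const uint8_t slow[TOY_ETHER_ADDR_LEN] =
	    { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x02 };

	return (toy_ether_is_eq(a, slow));
}
```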
2019-07-18	This commit fixes two bugs involving PF once rules:	Lawrence Teo
	1. If a packet happens to match an expired once rule before the rule is removed by the purge thread, the rule will be added to the pf_rule_gcl list again, eventually causing a kernel crash when the purge thread tries to remove the expired rule multiple times; and 2. A packet that matches an expired once rule will still cause a state to be created, so a once rule is not truly a one shot rule while it is in that expired-but-not-purged time window. To fix both bugs, add a check in pf_test_rule() to prevent expired once rules from being added to pf_rule_gcl. The check is added "early" in pf_test_rule() to prevent any new connections from creating state if they match the expired once rule. This commit also includes a tweak by sashan@ to ensure that only one PF task will mark a once rule as expired. Here is sashan@'s commentary: "As soon as there will be more PF tasks running in parallel, we would be able to hit similar crash you are fixing now. The rules are covered by read lock, so with more PF tasks there might be two packets racing to expire at once rule at the same time. Using atomic_cas() is sufficient measure to serialize competing packets." tested by abieber@ who reported the kernel crash on bugs@ ok sashan@
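The serialization sashan@ describes can be sketched with a compare-and-swap on an expiry flag. This is a simplified illustration using C11 atomics rather than the kernel's atomic_cas() routines. The toy_ types and functions are hypothetical and fold the early check in pf_test_rule() and the expiry marking into a single step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a once rule; not struct pf_rule. */
struct toy_rule {
	atomic_uint	expired;	/* 0 = live, 1 = expired */
	unsigned int	gc_enqueued;	/* times queued for the purge thread */
};

void
toy_rule_init(struct toy_rule *r)
{
	assert(r != NULL);
	atomic_init(&r->expired, 0);
	r->gc_enqueued = 0;
}

/* Only the single caller that flips 0 -> 1 gets true back. */
bool
toy_rule_try_expire(struct toy_rule *r)
{
	unsigned int expected = 0;

	return (atomic_compare_exchange_strong(&r->expired, &expected, 1));
}

/*
 * Returns true if this packet may use the once rule. A packet that
 * loses the race, or arrives after expiry, neither matches nor
 * queues the rule for garbage collection a second time.
 */
bool
toy_test_once_rule(struct toy_rule *r)
{
	if (!toy_rule_try_expire(r))
		return (false);
	r->gc_enqueued++;	/* only the winner queues the rule */
	return (true);
}
```

Because only the winner of the compare-and-swap reaches the enqueue step, the rule is queued for purging at most once, however many packets race for it.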
2019-07-17	Convert struct rtpcb malloc(9) to pool_get(9). PCB for routing	Alexander Bluhm
	socket is only used in process context, so pass PR_WAITOK to pool_init(9). The possible sleep in pool_put(9) should not hurt as route_detach() is only called by soclose(9). As both pr_attach() and pr_detach() are always called with kernel lock, PR_RWLOCK is not needed. OK mpi@
2019-07-17	Convert struct pkpcb malloc(9) to pool_get(9). PCB for pfkey is	Alexander Bluhm
	only used in process context, so pass PR_WAITOK to pool_init(9). The possible sleep in pool_put(9) should not hurt as pfkeyv2_detach() is only called by soclose(9). As both pr_attach() and pr_detach() are always called with kernel lock, PR_RWLOCK is not needed. OK mpi@
2019-07-17	Introduce ETHER_IS_BROADCAST/ANYADDR/EQ() and use them where appropriate.	Martin Pieuchot
	ok dlg@, sthen@, millert@
2019-07-14	newlen was a dead store, but what we could use is oldlen to	Florian Obser
	simplify the code. Pointed out by daniel@ with the help of their friend gcc9 OK kn
2019-07-11	fix NULL pointer dereference, reported and fix tested by sthen	Alexandr Nedvedicky
	ok yasuoka
2019-07-09	Add missing mtx_leave() in error path.	Martin Pieuchot
	Reported by kn@, ok visa@
2019-07-09	Fix previous commit which made src-node have a reference for the kif.	YASUOKA Masahiko
	Src-node should use the reference counter since it might live longer than its table entry, rule or the associated states. OK sashan
2019-07-08	free(9) sizes for M_RTABLE.	Martin Pieuchot
	ok kn@
2019-07-05	pretend to handle setting trunkproto, but only support setting it to lacp	David Gwynne

2019-07-05	fix the $OpenBSD$ tag	David Gwynne

2019-07-05	initialise sc_lacp_timeout to AGGR_LACP_TIMEOUT_SLOW, not 0;	David Gwynne
	it's the same, but there was a misleading comment on the same line which this cleans up too.
2019-07-05	iterate over distributing ports when populating the tx map, not all ports	David Gwynne
	this probably explains why ive seen a box decide not to use a distributing port, even though the state machine and all the lacp state flags say it's fine. it may also explain why jmatthew@ has seen a port still transmitting after it's been removed from an aggr(4).
2019-07-05	init the log of tx times to somewhere in the past when adding a port.	David Gwynne

2019-07-05	move a declaration before a statement.	David Gwynne

2019-07-05	report a port as active to userland if it is muxed	David Gwynne

2019-07-05	tweak mtu handling and propagate mtu setting to trunkports	David Gwynne
	make setting a trunkports mtu to its current mtu a nop. set a trunkports mtu to the aggr mtu when the port is getting added. set the mtu on all trunkports when the aggr mtu is set so things look consistent. restore a trunkports mtu when it is removed from an aggr. this is mostly cosmetic since the mtu on trunkports isn't really used anywhere.
2019-07-05	add aggr(4), a dedicated driver that implements 802.1AX link aggregation	David Gwynne
	802.1AX (formerly known as 802.3ad) describes the Link Aggregation Control Protocol (LACP) and how to use it in a bunch of different state machines to control when to bundle interfaces into an aggregation. technically the trunk(4) driver already implements support for 802.1AX, but it had a couple of problems i struggled to deal with as part of that driver. firstly, i couldnt easily make the output path in trunk mpsafe without getting bogged down, and the state machine handling had a few hard to diagnose edge cases that i couldnt figure out. the new driver has an mpsafe output path, and implements ifq bypass like vlan(4) does. this means output with aggr(4) is up to twice as fast as trunk(4). the implementation of the state machines as per the standard means the driver behaves more correctly in edge cases like when a physical link looks like it is up, but is logically unidirectional. the code has been good enough for me to use in production, but it does need more work. that can happen in tree now instead of carrying a large diff around. some testing by ccardenas@, hrvoje popovski, and jmatthew@ ok deraadt@ ccardenas@ jmatthew@
2019-07-05	record when trunk takes over an interface by setting ac_trunkport	David Gwynne
	this will be used to prevent trunk and the upcoming aggr driver from taking ownership of an Ethernet interface at the same time.
2019-07-03	add the kernel side of net.link.ifrxq.pressure_return and pressure_drop	David Gwynne
	these values are used as the backpressure thresholds in the interface rx q processing code. theyre being exposed as tunables to userland while we are figuring out what the best values for them are. ok visa@ deraadt@
2019-07-02	When source address tracking record is used for "route-to", the next	YASUOKA Masahiko
	hop interface configured with "route-to" was not used. Keep the interface within the pf_src_node and use it when the record is used. OK sashan
2019-07-01	Link the state and the source track to keep the source track while	YASUOKA Masahiko
	there are states which refer it. OK sashan
2019-07-01	reintroduce ifiq_input counting backpressure	David Gwynne
	instead of counting the number of packets on an ifiq, count the number of times a nic has tried to queue packets before the stack processes them. this new semantic interacted badly with virtual interfaces like vlan and trunk, but these interfaces have been tweaked to call if_vinput instead of if_input so their packets are processed directly because theyre already running inside the stack. im putting this in so we can see what the effect is. if it goes badly i'll back it out again. ok cheloha@ proctor@ visa@
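One way to picture the counting scheme above is sketched below. It is an assumption-laden illustration, not the ifiq_input() code. The toy_ names are hypothetical, and the exact comparisons against the two thresholds (named after the pressure_return and pressure_drop tunables) are guesses at the semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input queue state; not struct ifiqueue. */
struct toy_ifiq {
	unsigned int	pressure;	/* input calls since the stack last ran */
	unsigned int	queued;		/* packets waiting for the stack */
	unsigned int	dropped;	/* packets dropped under pressure */
};

/*
 * Called by the "driver" with a batch of npkts packets. Counts the
 * call, not the packets. Returns nonzero when the driver should
 * back off.
 */
int
toy_ifiq_input(struct toy_ifiq *q, unsigned int npkts,
    unsigned int pressure_return, unsigned int pressure_drop)
{
	assert(q != NULL);
	q->pressure++;
	if (q->pressure > pressure_drop)
		q->dropped += npkts;
	else
		q->queued += npkts;
	return (q->pressure > pressure_return);
}

/* Called when the "stack" runs: drains the queue and resets pressure. */
unsigned int
toy_ifiq_process(struct toy_ifiq *q)
{
	unsigned int n;

	assert(q != NULL);
	n = q->queued;
	q->queued = 0;
	q->pressure = 0;
	return (n);
}
```

In this model a large batch from one call adds no more pressure than a small one. Pressure only grows when the driver keeps queueing before the stack gets a chance to run.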
2019-06-30	if_vinput should pass BPF_DIRECTION_IN to bpf_mtap, not OUT	David Gwynne

2019-06-26	Create IF_WWAN_DEFAULT_PRIORITY which is lower than	Claudio Jeker
	IF_WIRELESS_DEFAULT_PRIORITY and use it in umb(4) as default prio. OK kettenis@, sthen@
2019-06-26	The MPLS edge devices get the packets from the MPLS stack which never	Claudio Jeker
	passed through pf_test(). So there is no need to try to call pf_pkt_addr_changed(), instead just check that the PF statekey is NULL. Initial problem of not including pf.h found by jsg@ OK jsg@ sashan@
2019-06-24	Since the recent recursion fix in rtable_walk(), deleting an interface	Alexander Bluhm
	address could trigger the "rt->rt_ifidx == ifp->if_index" assertion. In rtflushclone() the ifp that is passed to rtdeletemsg() has been changed from the route interface to the ifa interface. Restore the old behavior and get the route ifp. found by regress/sys/netinet/carp; OK mpi@
2019-06-24	Use timeout_add_sec(9)	kn
	Re-challenge timeouts are made up of single scalar factors which are multiplied with the time unit lcp.timeout to compute the timeout period. Simply reduce that unit of 1 * hz [ticks] to 1 [s] and use the appropriate API. OK mpi