path: root/sys/net
Age  Commit message  Author
2019-08-21 Remove support for semantically opaque interface identifiers (RFC 7217)  Florian Obser
for IPv6 link local addresses. Some hosting and VM providers route customer IPv6 prefixes to link local addresses derived from ethernet MAC addresses (RFC 2464). This leads to hard to debug IPv6 connectivity problems and is probably not worth the effort. RFC 7721 lists 4 weaknesses: 3.1. Correlation of Activities over Time & 3.2. Location Tracking These are still possible with RFC 7217 addresses for an adversary connected to the same layer 2 network (think conference wifi). Since the link local prefix stays the same (fe80::/64) the link local addresses do not change between different networks. An adversary on the same layer 2 network can probably track ethernet MAC addresses via different means, too. 3.3. Address Scanning & 3.4. Device-Specific Vulnerability Exploitation These now become possible, however, as noted above a layer 2 adversary was probably able to do this via different means. People concerned with these weaknesses are advised to use ifconfig lladdr random. OK benno input & OK kn
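For reference, the MAC-derived link local address mentioned above can be sketched as follows. This is a minimal illustration of the RFC 2464 mapping, not code from the tree; lladdr_from_mac() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Build the fe80::/64 link local address that RFC 2464 derives from a
 * 48-bit Ethernet MAC address (modified EUI-64 interface identifier).
 * lladdr_from_mac() is a hypothetical helper, not a kernel function.
 */
static void
lladdr_from_mac(const uint8_t mac[6], uint8_t addr[16])
{
	assert(mac != NULL && addr != NULL);

	memset(addr, 0, 16);
	addr[0] = 0xfe;			/* link local prefix fe80::/64 */
	addr[1] = 0x80;

	addr[8] = mac[0] ^ 0x02;	/* invert the universal/local bit */
	addr[9] = mac[1];
	addr[10] = mac[2];
	addr[11] = 0xff;		/* ff:fe is inserted in the middle */
	addr[12] = 0xfe;
	addr[13] = mac[3];
	addr[14] = mac[4];
	addr[15] = mac[5];
}
```

With the MAC 00:11:22:33:44:55 this yields fe80::211:22ff:fe33:4455, which is the kind of address a provider can compute in advance from the MAC alone.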
2019-08-16 ifq_hdatalen should keep the mbuf it's looking at, not leak it.  David Gwynne
ie, use ifq_deq_rollback after looking at the head mbuf instead of ifq_deq_commit. this is used in tun/tap, where it had the effect that you'd get the datalen for the packet, and then when you try to read that many bytes it had gone. cool and normal. this was found by a student who was trying to do just that. i've always just read the packet into a large buffer.
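The difference between committing and rolling back a dequeue can be sketched with a toy queue. Everything here (struct pkt, struct pktq, the pktq_* functions) is hypothetical and only models the idea; it is not the ifq API or its locking.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: hypothetical types, not mbufs or ifqueues. */
struct pkt {
	struct pkt	*next;
	size_t		 len;
};

struct pktq {
	struct pkt	*head;
};

/* Begin a dequeue: hand out the head packet but leave it linked. */
static struct pkt *
pktq_deq_begin(struct pktq *q)
{
	return (q->head);
}

/* Commit: the caller now owns p, so unlink it from the queue. */
static void
pktq_deq_commit(struct pktq *q, struct pkt *p)
{
	assert(q->head == p);
	q->head = p->next;
	p->next = NULL;
}

/* Rollback: the caller only looked at p, so the queue is left untouched. */
static void
pktq_deq_rollback(struct pktq *q, struct pkt *p)
{
	assert(q->head == p);
}

/* Length of the next packet, without consuming it. */
static size_t
pktq_hdatalen(struct pktq *q)
{
	struct pkt *p = pktq_deq_begin(q);
	size_t len;

	if (p == NULL)
		return (0);
	len = p->len;
	pktq_deq_rollback(q, p);	/* a commit here would lose the packet */
	return (len);
}
```

Calling pktq_hdatalen() twice in a row returns the same length, and a later read still finds the packet at the head of the queue.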
2019-08-06 When we needed the kernel lock for local IP packet delivery, mpi@  Alexander Bluhm
introduced a queue to grab the lock for multiple packets. Now we have only netlock for both IP and protocol input. So the queue is not necessary anymore. It just switches CPU and decreases performance. So remove the inet and inet6 ip queue for local packets. To get TCP running on loopback, we have to queue once between TCP input and output of the two sockets. So use the loopback queue in looutput() unconditionally. OK visa@
2019-08-05 try to be more compliant with the spec by implementing marker responses.  David Gwynne
i hope, i didn't test this that hard.
2019-08-05 run pf against ip packets coming in and out of the two ports.  David Gwynne
the idea and a good chunk of the implementation is copied from bridge(4). note that IP packets inside "service delimited" traffic, ie, vlan, svlan, or bpe encapsulated traffic, are not considered IP and will therefore not be given to pf to look at. if you want to filter that you'll need to configure vlan/svlan/bpe interfaces to get past their headers, and then configure them with their own tpmrs. hopefully the interface input handlers were established in the right order.
2019-08-05 pay some lip service to TPMR compliance according to 802.1Q-2018  David Gwynne
the spec says we should filter packets destined to a list of ethernet addresses. im currently interpreting "filter" as meaning dropping, which this diff does. however, one of the addresses to filter is the one lacp uses by default and not a lot of lacp implementations (read switches) support the configuration of a different address. i still need lacp to go over tpmr, and because i can't change the address, this diff also has a way to configure tpmr to still allow the packets through.
2019-08-01 add tpmr(4), a quick and dirty 802.1Q Two-Port MAC Relay implementation  David Gwynne
a TPMR is a simplified bridge (as supported by bridge(4)). it only supports two ports, and unconditionally forwards frames between them. this is unlike a real bridge which can support an arbitrary number of ports and implements a learning algorithm. i needed this to tunnel LACP between switches in a couple of data centers separated by an IP network. because bridge(4) implements an actual 802.1Q bridge, it eats packets that are supposed to be sent between bridges, such as spanning tree and LACP. TPMR according to the spec does a lot less of this, and is in fact documented in the spec as being able to support transport of LACP frames. tpmr(4) is actually a lot dumber and currently does no filtering (except what you can do with BPF). because the forwarding path in tpmr(4) is so short and simple, it is relatively fast and can be used to isolate and help improve the relative performance of some parts of the system. i also have plans to use this for monitoring traffic without processing it. tpmr(4) implements the trunk(4) ioctls for managing configuration. the ifconfig output for trunk interfaces is a bit shorter and needs a lot less stuff faked to be useful. inside the kernel it appears as an IFT_BRIDGE interface (like bridge(4)). it generally just drops stuff unless it's between the ports it's managing. this has been in production at my work for a few days between some physical nics and etherip(4), and so far it has been really solid. hrvoje popovski has kicked the tyres too, but more from a performance point of view. ok claudio@ deraadt@
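The forwarding decision described above, with no learning and only two ports, can be sketched like this. struct relay and relay_output_port() are hypothetical names for illustration and do not reflect the tpmr(4) data structures.

```c
#include <assert.h>
#include <stddef.h>

#define RELAY_NPORTS 2

/* Hypothetical model of a two-port relay; not the tpmr(4) structures. */
struct relay {
	int	ports[RELAY_NPORTS];	/* interface indexes, -1 if empty */
};

/*
 * Return the interface a frame received on rx_ifidx should be sent out
 * of, or -1 to drop it. There is no address learning: the answer is
 * always "the other port".
 */
static int
relay_output_port(const struct relay *r, int rx_ifidx)
{
	assert(r != NULL);

	if (r->ports[0] == -1 || r->ports[1] == -1)
		return (-1);		/* need both ports to forward */
	if (rx_ifidx == r->ports[0])
		return (r->ports[1]);
	if (rx_ifidx == r->ports[1])
		return (r->ports[0]);
	return (-1);			/* not one of our ports */
}
```

Because the decision never consults a destination address table, the per-frame cost stays constant, which fits the note about the forwarding path being short and simple.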
2019-07-29 The IPv6 duplicate address detection may send a packet before the  Alexander Bluhm
gre tunnel is set up. This could cause a panic. In gre(4) reject outgoing packets during that time window. While there, count interface errors and use generic unhandled_af(). bug reported by andreas at nullbyte dot se; OK dlg@
2019-07-25 AF_INET comes before AF_INET6. Shorten line to <80 chars.  Kenneth R Westerback
pointed out by claudio@
2019-07-25 Add IFXF_AUTOCONF4 to if_xflags to match IFXF_AUTOCONF6. Let  Kenneth R Westerback
ifconfig set/unset it. ok deraadt@ kmos@
2019-07-20 When multiple ports share the same MAC, pick the physical one for delivery.  Martin Pieuchot
Fix an issue reported by Eygene Ryabinkin where packets were dropped by pf(4) because a vlan(4) interface was picked instead of its underlying em(4). While here do some refactoring to avoid code duplication. Based on a submission from Eygene Ryabinkin <rea at codelabs dot ru>. ok bluhm@, kn@
2019-07-20 generate the actor info per port to send to userland.  David Gwynne
useful for debugging.
2019-07-20 just use LINK_STATE_IS_UP to see if a port has link.  David Gwynne
excluding HALF_DUPLEX just seems mean.
2019-07-19 try to notify the partner when the port is going away or down.  David Gwynne
by notify i mean we send an lacp packet with our collecting and distributing flags cleared, which should tell the remote system that it should no longer handle packets on their port as part of their aggregation. this is implemented by "unselecting" a port. if an active port is going away, ie, being removed from an aggr via "ifconfig aggr0 -trunkport port0", all that happens is software state on our side changes and we stop considering the interface as part of the aggr interface. the partner system is otherwise oblivious and can continue to send us packets until its expiry timeout fires because it doesn't know any better. we already intercept a port's ioctl handling, so if someone goes "ifconfig portX down" while it is attached to an aggr, we can catch that before the underlying driver actually tears the rings down, and we still have a chance to try and send a packet to the peer. this is useful because our drivers generally do not drop the physical link, so again, the partner system is oblivious to the change on our side until its expiry timer fires. expiry timeouts can be up to 90 seconds away, which is a lot of traffic to blackhole. sending the notification to the partner means they withdraw this link at the same time the local system is pulling the port out of the aggregation. hopefully. it is possible the packet is lost, but this is a good start. the only caveat to this is my implementation ignores the transmit state machine from the lacp spec, and may cause more than 3 lacp packets per second to be transmitted to the partner system. oh well. i should look at the marker protocol too.
2019-07-19 default (ie, reset) the partner info when a ports link goes down.  David Gwynne
this doesnt seem to be mentioned in the spec, but is a sensible thing to do if you think about it. all the switches i've tried also do this, so there's some consensus about it being sensible. this is done in the link state handler rather than being added to one of the state machines. the idea is to keep the state machines as close to what's in the spec as possible.
2019-07-19 export all the partner info to userland, not just what ifconfig prints.  David Gwynne
2019-07-19 ttysleep(): drop unused timeout parameter  cheloha
All callers sleep indefinitely. With help from visa@. ok visa@, ratchov@, kn@
2019-07-18 follow up to 'once rule' expiration  Alexandr Nedvedicky
ok lteo@
2019-07-18 make the UCT in the rxm generate debug output  David Gwynne
without this it looks like debug output loses info because of how the uct was shortcutted. no functional change, just prettier printfs.
2019-07-18 run the selection logic from the rxm current state if the port is unselected  David Gwynne
previously it would only run the selection logic if the peer information changed, but it is possible to be in the current state with stale partner info. that can happen if the port becomes disabled/disconnected, which unwinds the mux machine, but doesnt clear the partner info. when the link is enabled again we re-enter the current state, but because the partner info is the same we didn't run the selection logic, which in turn didn't let the mux machine move forward again.
2019-07-18 bulk up the debug output around selection logic  David Gwynne
lacp didnt come up again after i replaced some optics with dacs, and it has to be because of a problem around the selection logic. this will let me narrow it down.
2019-07-18 replace ether_{cmp,is_eq,is_zero} with the new ones in netinet/if_ether.h  David Gwynne
ether_cmp goes away, ether_is_eq becomes ETHER_IS_EQ, ether_is_zero becomes ETHER_IS_ANYADDR. ether_is_slow is kept locally, but renamed to ETHER_IS_SLOWADDR to better match what comes from if_ether.h.
2019-07-18 This commit fixes two bugs involving PF once rules:  Lawrence Teo
1. If a packet happens to match an expired once rule before the rule is removed by the purge thread, the rule will be added to the pf_rule_gcl list again, eventually causing a kernel crash when the purge thread tries to remove the expired rule multiple times; and 2. A packet that matches an expired once rule will still cause a state to be created, so a once rule is not truly a one shot rule while it is in that expired-but-not-purged time window. To fix both bugs, add a check in pf_test_rule() to prevent expired once rules from being added to pf_rule_gcl. The check is added "early" in pf_test_rule() to prevent any new connections from creating state if they match the expired once rule. This commit also includes a tweak by sashan@ to ensure that only one PF task will mark a once rule as expired. Here is sashan@'s commentary: "As soon as there will be more PF tasks running in parallel, we would be able to hit similar crash you are fixing now. The rules are covered by read lock, so with more PF tasks there might be two packets racing to expire at once rule at the same time. Using atomic_cas() is sufficient measure to serialize competing packets." tested by abieber@ who reported the kernel crash on bugs@ ok sashan@
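The "only one task marks the rule as expired" idea can be sketched in portable C11 with a compare-and-swap. This is a simplified model under stated assumptions: struct once_rule and once_rule_match() are hypothetical, the early expiry check and the marking step are merged into one function, and C11 atomics stand in for the kernel's atomic_cas() family.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rule with an expiry marker; not the real struct pf_rule. */
struct once_rule {
	atomic_int	expired;	/* 0 = live, 1 = expired */
	int		gc_inserts;	/* times queued for garbage collection */
};

/*
 * Returns true if the packet may use the rule. Only the caller that wins
 * the compare-and-swap marks the rule expired and queues it for purging;
 * every later (or losing) caller sees it as expired and skips it.
 */
static bool
once_rule_match(struct once_rule *r)
{
	int live = 0;

	assert(r != NULL);

	if (!atomic_compare_exchange_strong(&r->expired, &live, 1))
		return (false);		/* already expired: no state, no requeue */

	r->gc_inserts++;		/* stands in for adding to the purge list */
	return (true);
}
```

However many packets race on the same rule, exactly one sees the 0 to 1 transition, so the rule is queued for purging once and no later packet creates state from it.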
2019-07-17 Convert struct rtpcb malloc(9) to pool_get(9). PCB for routing  Alexander Bluhm
socket is only used in process context, so pass PR_WAITOK to pool_init(9). The possible sleep in pool_put(9) should not hurt as route_detach() is only called by soclose(9). As both pr_attach() and pr_detach() are always called with kernel lock, PR_RWLOCK is not needed. OK mpi@
2019-07-17 Convert struct pkpcb malloc(9) to pool_get(9). PCB for pfkey is  Alexander Bluhm
only used in process context, so pass PR_WAITOK to pool_init(9). The possible sleep in pool_put(9) should not hurt as pfkeyv2_detach() is only called by soclose(9). As both pr_attach() and pr_detach() are always called with kernel lock, PR_RWLOCK is not needed. OK mpi@
2019-07-17 Introduce ETHER_IS_BROADCAST/ANYADDR/EQ() and use them where appropriate.  Martin Pieuchot
ok dlg@, sthen@, millert@
2019-07-14 newlen was a dead store, but what we could use is oldlen to  Florian Obser
simplify the code. Pointed out by daniel@ with the help of their friend gcc9 OK kn
2019-07-11 fix NULL pointer dereference, reported and fix tested by sthen  Alexandr Nedvedicky
ok yasuoka
2019-07-09 Add missing mtx_leave() in error path.  Martin Pieuchot
Reported by kn@, ok visa@
2019-07-09 Fix previous commit which made src-node have a reference for the kif.  YASUOKA Masahiko
Src-node should use the reference counter since it might live longer than its table entry, rule or the associated states. OK sashan
2019-07-08 free(9) sizes for M_RTABLE.  Martin Pieuchot
ok kn@
2019-07-05 pretend to handle setting trunkproto, but only support setting it to lacp  David Gwynne
2019-07-05 fix the $OpenBSD$ tag  David Gwynne
2019-07-05 initialise sc_lacp_timeout to AGGR_LACP_TIMEOUT_SLOW, not 0;  David Gwynne
it's the same, but there was a misleading comment on the same line which this cleans up too.
2019-07-05 iterate over distributing ports when populating the tx map, not all ports  David Gwynne
this probably explains why ive seen a box decide not to use a distributing port, even though the state machine and all the lacp state flags say it's fine. it may also explain why jmatthew@ has seen a port still transmitting after it's been removed from an aggr(4).
2019-07-05 init the log of tx times to somewhere in the past when adding a port.  David Gwynne
2019-07-05 move a declaration before a statement.  David Gwynne
2019-07-05 report a port as active to userland if it is muxed  David Gwynne
2019-07-05 tweak mtu handling and propagate mtu setting to trunkports  David Gwynne
make setting a trunkports mtu to its current mtu a nop. set a trunkports mtu to the aggr mtu when the port is getting added. set the mtu on all trunkports when the aggr mtu is set so things look consistent. restore a trunkports mtu when it is removed from an aggr. this is mostly cosmetic since the mtu on trunkports isn't really used anywhere.
2019-07-05 add aggr(4), a dedicated driver that implements 802.1AX link aggregation  David Gwynne
802.1AX (formerly known as 802.3ad) describes the Link Aggregation Control Protocol (LACP) and how to use it in a bunch of different state machines to control when to bundle interfaces into an aggregation. technically the trunk(4) driver already implements support for 802.1AX, but it had a couple of problems i struggled to deal with as part of that driver. firstly, i couldnt easily make the output path in trunk mpsafe without getting bogged down, and the state machine handling had a few hard to diagnose edge cases that i couldnt figure out. the new driver has an mpsafe output path, and implements ifq bypass like vlan(4) does. this means output with aggr(4) is up to twice as fast as trunk(4). the implementation of the state machines as per the standard means the driver behaves more correctly in edge cases like when a physical link looks like it is up, but is logically unidirectional. the code has been good enough for me to use in production, but it does need more work. that can happen in tree now instead of carrying a large diff around. some testing by ccardenas@, hrvoje popovski, and jmatthew@ ok deraadt@ ccardenas@ jmatthew@
2019-07-05 record when trunk takes over an interface by setting ac_trunkport  David Gwynne
this will be used to prevent trunk and the upcoming aggr driver from taking ownership of an Ethernet interface at the same time.
2019-07-03 add the kernel side of net.link.ifrxq.pressure_return and pressure_drop  David Gwynne
these values are used as the backpressure thresholds in the interface rx q processing code. theyre being exposed as tunables to userland while we are figuring out what the best values for them are. ok visa@ deraadt@
2019-07-02 When source address tracking record is used for "route-to", the next  YASUOKA Masahiko
hop interface configured with "route-to" was not used. Keep the interface within the pf_src_node and use it when the record is used. OK sashan
2019-07-01 Link the state and the source track to keep the source track while  YASUOKA Masahiko
there are states which refer it. OK sashan
2019-07-01 reintroduce ifiq_input counting backpressure  David Gwynne
instead of counting the number of packets on an ifiq, count the number of times a nic has tried to queue packets before the stack processes them. this new semantic interacted badly with virtual interfaces like vlan and trunk, but these interfaces have been tweaked to call if_vinput instead of if_input so their packets are processed directly because theyre already running inside the stack. im putting this in so we can see what the effect is. if it goes badly i'll back it out again. ok cheloha@ proctor@ visa@
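The counting scheme described above might look roughly like the sketch below. It is a guess at the shape of the logic, not the ifiq code: struct ex_ifiq, the ex_ifiq_* functions and the two threshold values are all made up for illustration, and the meaning given to the "return" and "drop" thresholds is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary example thresholds; the real values are tunables. */
#define EX_PRESSURE_RETURN	4
#define EX_PRESSURE_DROP	8

/* Hypothetical input queue; not the real struct ifiqueue. */
struct ex_ifiq {
	unsigned int	pressure;	/* enqueue attempts since last processing */
	unsigned int	queued;		/* batches waiting for the stack */
	unsigned int	dropped;	/* batches discarded under pressure */
};

/*
 * Called by the NIC with a batch of packets. Counts the attempt rather
 * than the packets, and returns nonzero when the driver should slow down.
 */
static int
ex_ifiq_input(struct ex_ifiq *q)
{
	assert(q != NULL);

	q->pressure++;
	if (q->pressure > EX_PRESSURE_DROP)
		q->dropped++;
	else
		q->queued++;
	return (q->pressure > EX_PRESSURE_RETURN);
}

/* Called when the stack processes the queue: pressure is relieved. */
static void
ex_ifiq_process(struct ex_ifiq *q)
{
	q->queued = 0;
	q->pressure = 0;
}
```

The point of the sketch is that pressure measures how far the stack has fallen behind the NIC, independent of how many packets each batch carries.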
2019-06-30 if_vinput should pass BPF_DIRECTION_IN to bpf_mtap, not OUT  David Gwynne
2019-06-26 Create IF_WWAN_DEFAULT_PRIORITY which is lower than  Claudio Jeker
IF_WIRELESS_DEFAULT_PRIORITY and use it in umb(4) as default prio. OK kettenis@, sthen@
2019-06-26 The MPLS edge devices get the packets from the MPLS stack which never  Claudio Jeker
passed through pf_test(). So there is no need to try to call pf_pkt_addr_changed(); instead just check that the PF statekey is NULL. Initial problem of not including pf.h found by jsg@ OK jsg@ sashan@
2019-06-24 Since the recent recursion fix in rtable_walk(), deleting an interface  Alexander Bluhm
address could trigger the "rt->rt_ifidx == ifp->if_index" assertion. In rtflushclone() the ifp that is passed to rtdeletemsg() has been changed from the route interface to the ifa interface. Restore the old behavior and get the route ifp. found by regress/sys/netinet/carp; OK mpi@
2019-06-24 Use timeout_add_sec(9)  kn
Re-challenge timeouts are made up of single scalar factors which are multiplied with the time unit lcp.timeout to compute the timeout period. Simply reduce that unit of 1 * hz [ticks] to 1 [s] and use the appropriate API. OK mpi