Age | Commit message | Author |
|
for IPv6 link local addresses.
Some hosting and VM providers route customer IPv6 prefixes to link
local addresses derived from ethernet MAC addresses (RFC 2464). This
leads to hard to debug IPv6 connectivity problems and is probably not
worth the effort.
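For reference, the MAC-derived link local address that such providers route to can be sketched as below. This is a minimal illustration of the RFC 2464 mapping, not code from the tree; the function name mac_to_linklocal is made up for the example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative only: build the fe80::/64 link local address that
 * RFC 2464 derives from a 48-bit ethernet MAC address.
 * mac_to_linklocal() is a hypothetical name, not a kernel interface.
 */
static void
mac_to_linklocal(const uint8_t mac[6], uint8_t addr[16])
{
	memset(addr, 0, 16);
	addr[0] = 0xfe;			/* fe80::/64 prefix */
	addr[1] = 0x80;
	addr[8] = mac[0] ^ 0x02;	/* flip the universal/local bit */
	addr[9] = mac[1];
	addr[10] = mac[2];
	addr[11] = 0xff;		/* ff:fe is inserted in the middle */
	addr[12] = 0xfe;
	addr[13] = mac[3];
	addr[14] = mac[4];
	addr[15] = mac[5];
}
```

With this mapping the MAC address 00:11:22:33:44:55 corresponds to fe80::211:22ff:fe33:4455, whichever network the interface is attached to.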
RFC 7721 lists 4 weaknesses:
3.1. Correlation of Activities over Time & 3.2. Location Tracking
These are still possible with RFC 7217 addresses for an adversary
connected to the same layer 2 network (think conference wifi). Since
the link local prefix stays the same (fe80::/64) the link local
addresses do not change between different networks.
An adversary on the same layer 2 network can probably track ethernet
MAC addresses via different means, too.
3.3. Address Scanning & 3.4. Device-Specific Vulnerability Exploitation
These now become possible; however, as noted above, a layer 2 adversary
was probably able to do this via different means.
People concerned with these weaknesses are advised to use
ifconfig lladdr random.
OK benno
input & OK kn
|
|
ie, use ifq_deq_rollback after looking at the head mbuf instead of
ifq_deq_commit.
this is used in tun/tap, where it had the effect that you'd get the
datalen for the packet, and then when you try to read that many
bytes it had gone. cool and normal.
this was found by a student who was trying to do just that. i've
always just read the packet into a large buffer.
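the difference between the two calls can be sketched with a toy queue. everything below (struct toyq, the toyq_* functions) is hypothetical and only mirrors the begin/commit/rollback shape described above; it is not the ifq code.

```c
#include <stddef.h>

/* Toy single-consumer queue; names are hypothetical, not the ifq API. */
struct pkt {
	struct pkt *next;
	size_t len;
};

struct toyq {
	struct pkt *head;
};

/* Hand out the head packet without removing it yet. */
static struct pkt *
toyq_deq_begin(struct toyq *q)
{
	return (q->head);
}

/* Really remove the packet handed out by toyq_deq_begin(). */
static void
toyq_deq_commit(struct toyq *q, struct pkt *p)
{
	q->head = p->next;
}

/* Leave the packet at the head so a later read still finds it. */
static void
toyq_deq_rollback(struct toyq *q, struct pkt *p)
{
	(void)q;
	(void)p;
}

/* Report the length of the next packet without consuming it. */
static size_t
toyq_peek_len(struct toyq *q)
{
	struct pkt *p = toyq_deq_begin(q);
	size_t len = 0;

	if (p != NULL) {
		len = p->len;
		toyq_deq_rollback(q, p);	/* not commit: keep the packet */
	}
	return (len);
}
```

if toyq_peek_len() called toyq_deq_commit() instead, the length would be reported correctly but the packet would be gone by the time the caller tried to read it.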
|
|
introduced a queue to grab the lock for multiple packets. Now we
have only netlock for both IP and protocol input. So the queue is
not necessary anymore. It just switches CPU and decreases performance.
So remove the inet and inet6 ip queue for local packets.
To get TCP running on loopback, we have to queue once between TCP
input and output of the two sockets. So use the loopback queue in
looutput() unconditionally.
OK visa@
|
|
i hope, i didn't test this that hard.
|
|
the idea and a good chunk of the implementation is copied from
bridge(4).
note that IP packets inside "service delimited" traffic, ie, vlan,
svlan, or bpe encapsulated traffic, are not considered IP and will
therefore not be given to pf to look at. if you want to filter that
you'll need to configure vlan/svlan/bpe interfaces to get past their
headers, and then configure them with their own tpmrs. hopefully
the interface input handlers were established in the right order.
|
|
the spec says we should filter packets destined to a list of ethernet
addresses. im currently interpreting "filter" as meaning dropping,
which this diff does.
however, one of the addresses to filter is the one lacp uses by
default and not a lot of lacp implementations (read switches) support
the configuration of a different address. i still need lacp to go
over tpmr, and because i can't change the address, this diff also
has a way to configure tpmr to still allow the packets through.
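a rough sketch of the check, under assumptions: the list below holds only the slow protocols multicast address (01:80:c2:00:00:02, the default LACP destination) as a stand-in for the list in the spec, and the passthru argument stands in for whatever knob the driver exposes. all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ADDR_LEN	6

/* slow protocols multicast address, the default LACP destination */
static const uint8_t toy_slow_addr[TOY_ADDR_LEN] =
    { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x02 };

/* placeholder for the list in the spec; only one entry is filled in */
static const uint8_t *toy_filtered[] = { toy_slow_addr };

/*
 * Return 1 if a frame to dst should be dropped, 0 if it may be
 * forwarded. passthru is a hypothetical configuration switch that
 * lets filtered addresses through anyway.
 */
static int
toy_tpmr_drop(const uint8_t dst[TOY_ADDR_LEN], int passthru)
{
	size_t i;

	if (passthru)
		return (0);

	for (i = 0; i < sizeof(toy_filtered) / sizeof(toy_filtered[0]); i++) {
		if (memcmp(dst, toy_filtered[i], TOY_ADDR_LEN) == 0)
			return (1);
	}
	return (0);
}
```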
|
|
a TPMR is a simplified bridge (as supported by bridge(4)). it only
supports two ports, and unconditionally forwards frames between
them. this is unlike a real bridge which can support an arbitrary
number of ports and implements a learning algorithm.
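the forwarding decision this describes is small enough to sketch. the structures and the function name below are hypothetical, and the choice to drop when a port is missing is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real interface structures. */
struct toy_port {
	int id;
};

struct toy_tpmr {
	struct toy_port *ports[2];	/* a TPMR has exactly two ports */
};

/*
 * Return the port a frame should leave by, or NULL to drop it.
 * There is no address learning: a frame that arrives on one port
 * always goes out the other.
 */
static struct toy_port *
toy_tpmr_output_port(const struct toy_tpmr *t, const struct toy_port *in)
{
	if (t->ports[0] == NULL || t->ports[1] == NULL)
		return (NULL);		/* assumed: need both ports */
	if (in == t->ports[0])
		return (t->ports[1]);
	if (in == t->ports[1])
		return (t->ports[0]);
	return (NULL);			/* not one of our ports */
}
```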
i needed this to tunnel LACP between switches in a couple of data
centers separated by an IP network. because bridge(4) implements
an actual 802.1Q bridge, it eats packets that are supposed to be
sent between bridges, such as spanning tree and LACP. TPMR according
to the spec does a lot less of this, and is in fact documented in
the spec as being able to support transport of LACP frames. tpmr(4)
is actually a lot dumber and currently does no filtering (except what
you can do with BPF).
because the forwarding path in tpmr(4) is so short and simple, it
is relatively fast and can be used to isolate and help improve the
relative performance of some parts of the system. i also have plans
to use this for monitoring traffic without processing it.
tpmr(4) implements the trunk(4) ioctls for managing configuration.
the ifconfig output for trunk interfaces is a bit shorter and needs
a lot less stuff faked to be useful. inside the kernel it appears
as an IFT_BRIDGE interface (like bridge(4)). it generally just drops
stuff unless it's between the ports it's managing.
this has been in production at my work for a few days between some
physical nics and etherip(4), and so far it has been really solid.
hrvoje popovski has kicked the tyres too, but more from a performance
point of view.
ok claudio@ deraadt@
|
|
gre tunnel is set up. This could cause a panic. In gre(4) reject
outgoing packets during that time window. While there, count
interface errors and use generic unhandled_af().
bug reported by andreas at nullbyte dot se; OK dlg@
|
|
pointed out by claudio@
|
|
ifconfig set/unset it.
ok deraadt@ kmos@
|
|
Fix an issue reported by Eygene Ryabinkin where packets were dropped by
pf(4) because a vlan(4) interface was picked instead of its underlying
em(4).
While here do some refactoring to avoid code duplication.
Based on a submission from Eygene Ryabinkin <rea at codelabs dot ru>.
ok bluhm@, kn@
|
|
useful for debugging.
|
|
excluding HALF_DUPLEX just seems mean.
|
|
by notify i mean we send an lacp packet with our collecting and
distributing flags cleared, which should tell the remote system
that it should no longer handle packets on their port as part of
their aggregation. this is implemented by "unselecting" a port.
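as a sketch of what "flags cleared" means: the state octet in an LACPDU carries one bit per flag, so the notification amounts to masking two of them off before transmitting. the TOY_ prefixed names and the function are made up for the example; the bit positions follow the layout of the state octet in the standard.

```c
#include <stdint.h>

/* Bits of the actor/partner state octet in an LACPDU. */
#define TOY_LACP_STATE_ACTIVITY		0x01
#define TOY_LACP_STATE_TIMEOUT		0x02
#define TOY_LACP_STATE_AGGREGATION	0x04
#define TOY_LACP_STATE_SYNC		0x08
#define TOY_LACP_STATE_COLLECTING	0x10
#define TOY_LACP_STATE_DISTRIBUTING	0x20
#define TOY_LACP_STATE_DEFAULTED	0x40
#define TOY_LACP_STATE_EXPIRED		0x80

/*
 * Hypothetical helper: given the actor state we would normally
 * advertise, return the state to put in the notification, with
 * the collecting and distributing flags cleared.
 */
static uint8_t
toy_lacp_notify_state(uint8_t state)
{
	return ((uint8_t)(state &
	    ~(TOY_LACP_STATE_COLLECTING | TOY_LACP_STATE_DISTRIBUTING)));
}
```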
if an active port is going away, ie, being removed from an aggr via
"ifconfig aggr0 -trunkport port0", all that happens is software
state on our side changes and we stop considering the interface as
part of the aggr interface. the partner system is otherwise oblivious
and can continue to send us packets until its expiry timeout fires
because it doesn't know any better.
we already intercept a port's ioctl handling, so if someone goes
"ifconfig portX down" while it is attached to an aggr, we can catch
that before the underlying driver actually tears the rings down, and
we still have a chance to try and send a packet to the peer. this
is useful because our drivers generally do not drop the physical
link, so again, the partner system is oblivious to the change on
our side until its expiry timer fires.
expiry timeouts can be up to 90 seconds away, which is a lot of
traffic to blackhole. sending the notification to the partner means
they withdraw this link at the same time the local system is pulling
the port out of the aggregation. hopefully. it is possible the
packet is lost, but this is a good start.
the only caveat to this is my implementation ignores the transmit
state machine from the lacp spec, and may cause more than 3 lacp
packets per second to be transmitted to the partner system. oh
well.
i should look at the marker protocol too.
|
|
this doesnt seem to be mentioned in the spec, but is a sensible
thing to do if you think about it. all the switches i've tried also
do this, so there's some consensus about it being sensible.
this is done in the link state handler rather than being added to
one of the state machines. the idea is to keep the state machines
as close to what's in the spec as possible.
|
|
All callers sleep indefinitely.
With help from visa@.
ok visa@, ratchov@, kn@
|
|
ok lteo@
|
|
without this it looks like debug output loses info because of how
the uct was shortcutted.
no functional change, just prettier printfs.
|
|
previously it would only run the selection logic if the peer
information changed, but it is possible to be in the current state
with stale partner info. that can happen if the port becomes
disabled/disconnected, which unwinds the mux machine, but doesnt
clear the partner info. when the link is enabled again we re-enter
the current state, but because the partner info is the same we
didn't run the selection logic, which in turn didn't let the mux
machine move forward again.
|
|
lacp didnt come up again after i replaced some optics with dacs, and it
has to be because of a problem around the selection logic. this will let
me narrow it down.
|
|
ether_cmp goes away, ether_is_eq becomes ETHER_IS_EQ, ether_is_zero
becomes ETHER_IS_ANYADDR.
ether_is_slow is kept locally, but renamed to ETHER_IS_SLOWADDR to
better match what comes from if_ether.h.
|
|
1. If a packet happens to match an expired once rule before the rule is removed
by the purge thread, the rule will be added to the pf_rule_gcl list again,
eventually causing a kernel crash when the purge thread tries to remove the
expired rule multiple times; and
2. A packet that matches an expired once rule will still cause a state to be
created, so a once rule is not truly a one shot rule while it is in that
expired-but-not-purged time window.
To fix both bugs, add a check in pf_test_rule() to prevent expired once rules
from being added to pf_rule_gcl. The check is added "early" in pf_test_rule()
to prevent any new connections from creating state if they match the expired
once rule.
This commit also includes a tweak by sashan@ to ensure that only one PF task
will mark a once rule as expired. Here is sashan@'s commentary:
"As soon as there will be more PF tasks running in parallel, we would be
able to hit similar crash you are fixing now. The rules are covered by
read lock, so with more PF tasks there might be two packets racing
to expire at once rule at the same time. Using atomic_cas() is sufficient
measure to serialize competing packets."
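The serialization described above can be sketched with C11 atomics standing in for the kernel's atomic_cas(). The structure and function names are hypothetical; the point is only that a compare-and-swap from "live" to "expired" succeeds for exactly one caller, and that the liveness check can run before any state is created.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a once rule. */
struct toy_rule {
	atomic_int expired;	/* 0 = live, 1 = expired */
};

/*
 * Try to mark the rule expired. Returns true for exactly one
 * caller, which is then the only one allowed to queue the rule
 * for garbage collection.
 */
static bool
toy_rule_expire_once(struct toy_rule *r)
{
	int live = 0;

	return (atomic_compare_exchange_strong(&r->expired, &live, 1));
}

/* Early check: an expired rule must not match or create state. */
static bool
toy_rule_usable(struct toy_rule *r)
{
	return (atomic_load(&r->expired) == 0);
}
```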
tested by abieber@ who reported the kernel crash on bugs@
ok sashan@
|
|
socket is only used in process context, so pass PR_WAITOK to
pool_init(9). The possible sleep in pool_put(9) should not hurt
as route_detach() is only called by soclose(9). As both pr_attach()
and pr_detach() are always called with kernel lock, PR_RWLOCK is
not needed.
OK mpi@
|
|
only used in process context, so pass PR_WAITOK to pool_init(9).
The possible sleep in pool_put(9) should not hurt as pfkeyv2_detach()
is only called by soclose(9). As both pr_attach() and pr_detach()
are always called with kernel lock, PR_RWLOCK is not needed.
OK mpi@
|
|
ok dlg@, sthen@, millert@
|
|
simplify the code.
Pointed out by daniel@ with the help of their friend gcc9
OK kn
|
|
ok yasuoka
|
|
Reported by kn@, ok visa@
|
|
Src-node should use the reference counter since it might live longer
than its table entry, rule or the associated states.
OK sashan
|
|
ok kn@
|
|
it's the same, but there was a misleading comment on the same line
which this cleans up too.
|
|
this probably explains why ive seen a box decide not to use a
distributing port, even though the state machine and all the lacp
state flags say it's fine. it may also explain why jmatthew@ has
seen a port still transmitting after it's been removed from an
aggr(4).
|
|
make setting a trunkport's mtu to its current mtu a nop. set a
trunkport's mtu to the aggr mtu when the port is getting added. set
the mtu on all trunkports when the aggr mtu is set so things look
consistent. restore a trunkport's mtu when it is removed from an
aggr.
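a sketch of that bookkeeping for a single port, with made-up names. it assumes the port remembers the mtu it had before joining so it can be put back later; setting the mtu on all ports when the aggr mtu changes would just loop over the ports calling the same helper.

```c
/* Hypothetical per-port state; not the driver's real structure. */
struct toy_mtu_port {
	unsigned int mtu;	/* mtu currently set on the port */
	unsigned int omtu;	/* mtu the port had before joining */
};

/* Returns 1 if the mtu changed, 0 if the request was a nop. */
static int
toy_port_setmtu(struct toy_mtu_port *p, unsigned int mtu)
{
	if (p->mtu == mtu)
		return (0);
	p->mtu = mtu;
	return (1);
}

/* Port is being added: remember its mtu, then take the aggr mtu. */
static void
toy_port_add(struct toy_mtu_port *p, unsigned int aggr_mtu)
{
	p->omtu = p->mtu;
	toy_port_setmtu(p, aggr_mtu);
}

/* Port is being removed: restore the mtu it had before joining. */
static void
toy_port_remove(struct toy_mtu_port *p)
{
	toy_port_setmtu(p, p->omtu);
}
```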
this is mostly cosmetic since the mtu on trunkports isn't really
used anywhere.
|
|
802.1AX (formerly known as 802.3ad) describes the Link Aggregation
Control Protocol (LACP) and how to use it in a bunch of different
state machines to control when to bundle interfaces into an
aggregation.
technically the trunk(4) driver already implements support for
802.1AX, but it had a couple of problems i struggled to deal with
as part of that driver. firstly, i couldnt easily make the output
path in trunk mpsafe without getting bogged down, and the state
machine handling had a few hard to diagnose edge cases that i couldnt
figure out.
the new driver has an mpsafe output path, and implements ifq bypass
like vlan(4) does. this means output with aggr(4) is up to twice
as fast as trunk(4). the implementation of the state machines as
per the standard means the driver behaves more correctly in edge
cases like when a physical link looks like it is up, but is logically
unidirectional.
the code has been good enough for me to use in production, but it
does need more work. that can happen in tree now instead of carrying
a large diff around.
some testing by ccardenas@, hrvoje popovski, and jmatthew@
ok deraadt@ ccardenas@ jmatthew@
|
|
this will be used to prevent trunk and the upcoming aggr driver
from taking ownership of an Ethernet interface at the same time.
|
|
these values are used as the backpressure thresholds in the interface
rx q processing code. theyre being exposed as tunables to userland
while we are figuring out what the best values for them are.
ok visa@ deraadt@
|
|
hop interface configured with "route-to" was not used. Keep the
interface within the pf_src_node and use it when the record is used.
OK sashan
|
|
there are states which refer it.
OK sashan
|
|
instead of counting the number of packets on an ifiq, count the
number of times a nic has tried to queue packets before the stack
processes them.
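a toy model of the new counting, to make the semantic concrete. the names, the threshold value, and the decision to drop a whole batch once the threshold is passed are all assumptions for the sketch, not the ifiq code.

```c
/* Hypothetical input queue; only the counters matter here. */
struct toy_ifiq {
	unsigned int pressure;	/* enqueue attempts since the stack ran */
	unsigned int npkts;	/* packets waiting for the stack */
};

#define TOY_PRESSURE_DROP	8	/* made-up threshold */

/*
 * nic side: queue a batch of packets. Each call counts once, no
 * matter how many packets are in the batch. Returns 1 if the batch
 * was dropped because the stack has not kept up, 0 otherwise.
 */
static int
toy_ifiq_input(struct toy_ifiq *q, unsigned int batch)
{
	if (++q->pressure > TOY_PRESSURE_DROP)
		return (1);
	q->npkts += batch;
	return (0);
}

/* stack side: take everything queued and reset the pressure. */
static unsigned int
toy_ifiq_process(struct toy_ifiq *q)
{
	unsigned int n = q->npkts;

	q->npkts = 0;
	q->pressure = 0;
	return (n);
}
```

the counter tracks how many times the nic got ahead of the stack, so a single large batch does not trip it by itself.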
this new semantic interacted badly with virtual interfaces like
vlan and trunk, but these interfaces have been tweaked to call
if_vinput instead of if_input so their packets are processed directly
because theyre already running inside the stack.
im putting this in so we can see what the effect is. if it goes
badly i'll back it out again.
ok cheloha@ proctor@ visa@
|
|
IF_WIRELESS_DEFAULT_PRIORITY and use it in umb(4) as default prio.
OK kettenis@, sthen@
|
|
passed through pf_test(). So there is no need to try to call
pf_pkt_addr_changed(); instead, just check that the PF statekey is NULL.
Initial problem of not including pf.h found by jsg@
OK jsg@ sashan@
|
|
address could trigger the "rt->rt_ifidx == ifp->if_index" assertion.
In rtflushclone() the ifp that is passed to rtdeletemsg() has been
changed from the route interface to the ifa interface. Restore the
old behavior and get the route ifp.
found by regress/sys/netinet/carp; OK mpi@
|
|
Re-challenge timeouts are made up of single scalar factors which are
multiplied with the time unit lcp.timeout to compute the timeout period.
Simply reduce that unit of 1 * hz [ticks] to 1 [s] and use the appropriate
API.
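The arithmetic can be sketched as follows. The function names are hypothetical and hz is passed in as a plain parameter for the example; the sketch only shows that scaling the unit from ticks to seconds leaves the resulting period unchanged.

```c
/* Old scheme: the unit is 1 * hz ticks, so the period is in ticks. */
static int
toy_period_ticks(int factor, int hz)
{
	return (factor * (1 * hz));
}

/* New scheme: the unit is 1 second, so the period is in seconds. */
static int
toy_period_secs(int factor)
{
	return (factor * 1);
}
```

For any factor, toy_period_ticks(factor, hz) equals toy_period_secs(factor) * hz, so a seconds-based timeout call can take the second value directly.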
OK mpi
|