Age | Commit message | Author |
|
a new object that is already refcounted, so carp attach does not
reach into internal structures. Add kasserts to detect counter
overflow or underflow.
OK mvs@
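
The overflow and underflow assertions mentioned above can be pictured with a small userland sketch. This is not the kernel code: `struct counted`, `counted_take()` and `counted_rele()` are hypothetical names, and assert(3) stands in for KASSERT.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical counted object; not the actual kernel structure. */
struct counted {
	unsigned int refs;
};

/* Take a reference; the assertion catches counter overflow. */
static void
counted_take(struct counted *c)
{
	assert(c->refs < UINT_MAX);
	c->refs++;
}

/* Drop a reference; returns 1 when the last reference is gone. */
static int
counted_rele(struct counted *c)
{
	assert(c->refs > 0);	/* catches counter underflow */
	return (--c->refs == 0);
}
```

A caller would pair every counted_take() with exactly one counted_rele() and free the object only when the latter returns 1.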
|
|
offloading. The checksum must be calculated in software. Use the
same condition in ether_resolve() to send the broadcast packet back
to the stack and in in_ifcap_cksum() to force software checksumming.
This fixes regress/sys/kern/sosplice/loop.
OK procter@
|
|
|
|
The code delivered in this change is currently disabled. Brave souls
may enable the code by adding -DWITH_PF_LOCK when building a customized
kernel. Big thanks go to Hrvoje@ for providing test equipment and
testing.
As soon as we enter the next release cycle, WITH_PF_LOCK will be
defined as the default option for MP kernels.
OK dlg@
|
|
before this change pf_route operated on the semantic that pf runs
when packets go over an interface, so when pf_route changed which
interface the packet was on it would run pf_test again. this change
changes (restores) the semantic that pf is only supposed to run
when packets go in or out of the network stack, even if route-to
is responsible for short circuiting past the network stack.
just to be clear, for normal packets (ie, those not touched by
route-to/reply-to/dup-to), there isn't a difference between running
pf when packets enter or leave the stack, or having pf run when a
packet goes over an interface.
the main reason for this change is that running the same packet
through pf multiple times creates confusion for the state table.
by default, pf states are floating, meaning that packets are matched
to states regardless of which interface they're going over. if a
packet leaving on em0 is rerouted out em1, both traversals will end
up using the same state, which at best will make the accounting
look weird, or at worst fail some checks in the state and get
dropped.
another reason for this commit is to make handling of the changes
that route-to makes consistent with other changes that are made to
the packet. eg, when nat is applied to a packet, we don't run pf_test
again with the new addresses.
the main caveat with this diff is you can't have one rule that
pushes a packet out a different interface, and then have a rule on
that second interface that NATs the packet. i'm not convinced this
ever worked reliably or was used much anyway, so we don't think
it's a big concern.
discussed with many, with special thanks to bluhm@, sashan@ and
sthen@ for weathering most of that pain.
ok claudio@ sashan@ jmatthew@
|
|
Otherwise this `pxi' can be killed by a concurrent thread after the
context switch caused by the following netlock.
ok yasuoka@
|
|
OpenBSD 6.7 npppd(8) can't work over tun(4).
ok yasuoka@
|
|
ok bluhm@ dlg@
|
|
this is a significant (and breaking) reworking of the policy based
routing that pf can do. the intention is to make it as easy as
nat/rdr to use, and more robust when it's operating.
the main reasons for this change are:
- route-to, reply-to, and dup-to do not work with pfsync
this is because the information about where to route-to is stored in
rules, and it is hard to have a ruleset synced between firewalls,
and impossible to have them synced 100% of the time.
- i can make my boxes panic in certain situations using route-to
yeah...
- the configuration and syntax for route-to rules are confusing.
the argument to route-to and co is an interface name with an optional
ip address. there are several problems with this. one is that people
tend to think about routing as sending packets to peers by their
address, not by the interface they're reachable on. another is that
we currently have no way to synchronise interface topology information
between firewalls, so using an interface to say where packets go
means we can't do failover of these states with pfsync. another
is that a change in routing topology means a host may become
reachable over a different interface. tying routing policy to
interfaces gets in the way of failover and load balancing.
this change does the following:
- stores the route info in the state instead of the pf rule
this allows route-to to keep working when the ruleset changes, and
allows route-to info to be sent over pfsync. there's enough spare bits
in pfsync messages that the protocol doesn't break.
the caveat is that route-to becomes tied to pass rules that create
state, like rdr-to and nat-to.
- the argument to route-to etc is a destination ip address
it's not limited to a next-hop address (though a next-hop can be a
destination address). this allows for the failover and load balancing
referred to above.
- deprecates the address@interface host syntax in pfctl
because routing is done entirely by IPs, the interface is derived from
the route lookup, not pf. any attempt to use the @interface syntax
will fail now in all contexts.
there's enthusiasm from procter@ jmatthew@ and others
ok sashan@ bluhm@
|
|
ok bluhm@ sashan@
|
|
ok bluhm@
|
|
pfsync may want to defer the transmission of a packet. it does this so
it can try and get a state over to a peer firewall before a host may
send a reply to the peer, which would get dropped cos there's no
matching state.
i think the once rule processing should happen before that. the state
is created from the rule, whether the packet the state is for goes out
immediately or not shouldn't matter.
ok sashan@
|
|
of course this is limited to the !dup-to case.
ok sashan@ bluhm@
|
|
pf_route and pf_route6 are called to take over delivery of the
packet with route-to and reply-to instead of letting it get processed
normally. for the dup-to handling, it copies the mbuf but leaves
the original mbuf in place. pf_route takes over the packet by
clearing the mbuf pointer in the pf_pdesc struct. this diff moves
the clearing of that pointer to the start of the function, rather
than checking for dup-to again on the way out of the function.
i think this is better because it means that it's more robust in
the face of future code changes. even if that's not true, it's still
shorter code in a forwarding path.
ok sashan@ jmatthew@
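
A minimal sketch of the control flow described above, using mock types rather than the real mbuf and pf_pdesc structures. `struct pkt`, `struct pdesc`, `enum rt_kind` and `route_sketch()` are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for the kernel's mbuf and pf_pdesc. */
struct pkt {
	int id;
};
struct pdesc {
	struct pkt *m;
};
enum rt_kind { RT_ROUTE_TO, RT_REPLY_TO, RT_DUP_TO };

/*
 * Decide ownership once, at the start: dup-to works on a copy and
 * leaves the original in place, everything else takes the packet
 * over by clearing the pointer in the descriptor.
 */
static struct pkt *
route_sketch(struct pdesc *pd, enum rt_kind rt, struct pkt *copybuf)
{
	struct pkt *m0;

	assert(pd->m != NULL);
	if (rt == RT_DUP_TO) {
		*copybuf = *pd->m;	/* duplicate, original untouched */
		m0 = copybuf;
	} else {
		m0 = pd->m;
		pd->m = NULL;		/* caller no longer owns it */
	}
	/* ... delivery of m0 would happen here ... */
	return (m0);
}
```

With the ownership decision made up front, later exit paths do not need to test for dup-to again.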
|
|
dup-to is kind of like what you do with a span port, but is a bit
more fine grained. it copies packets in a connection out an interface
so that connection can be monitored. it doesn't make sense for pf
to see the copied packets and try to match or create new states for
them either. at best it needs config to stop pf seeing the copies
(eg, set skip on $dup_to_tgt_if). at worst it breaks the connections
you're monitoring because the states in pf get confused.
found while discussing larger route-to changes on tech@.
ok bluhm@ sashan@
|
|
ifs = ifunit(req->ifbr_ifsname);
if (ifs == NULL) {
error = ENOENT;
break;
}
if (ifs->if_bridgeidx != ifp->if_index) {
error = ESRCH;
break;
}
bif = bridge_getbif(ifs);
This sequence repeats 8 times. Also, we don't check the value returned by
bridge_getbig() before use. The newly introduced bridge_getbig() function
replaces this sequence. This not only reduces duplicated code but also
makes dereferencing `bif' safe.
ok bluhm@
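
The consolidation described above could look roughly like the following userland sketch. It uses mock types and a mock name lookup, `member_lookup()` and `mock_ifunit()` are hypothetical names, and the error returned for a missing `bif' is an assumption rather than what the kernel helper does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Mock stand-ins for struct bridge_iflist and struct ifnet. */
struct bif {
	int flags;
};
struct ifnet {
	const char *name;
	unsigned int if_bridgeidx;
	struct bif *bif;
};

/* Mock of the name lookup; the kernel uses ifunit(). */
static struct ifnet *
mock_ifunit(struct ifnet *tab, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return (&tab[i]);
	return (NULL);
}

/* One helper replaces the repeated sequence and checks the result. */
static int
member_lookup(struct ifnet *tab, size_t n, unsigned int bridge_index,
    const char *name, struct bif **bifp)
{
	struct ifnet *ifs;

	assert(bifp != NULL);
	*bifp = NULL;
	if ((ifs = mock_ifunit(tab, n, name)) == NULL)
		return (ENOENT);
	if (ifs->if_bridgeidx != bridge_index)
		return (ESRCH);
	if (ifs->bif == NULL)	/* the check the call sites lacked */
		return (ESRCH);	/* assumed error code */
	*bifp = ifs->bif;
	return (0);
}
```

Each call site then reduces to one call and one error check, and `bif' is only used when the helper returned 0.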
|
|
Diff from Yuichiro NAITO.
ok procter
|
|
ok dlg@ kn@
|
|
ok claudio@ mvs@
|
|
Add a NULL check to prevent crash in pflog(4) introduced in previous
commit.
Reported-by: syzbot+c6d2f2ad34b822bce98a@syzkaller.appspotmail.com
|
|
rdr-to, nat-to, af-to rules. The kernel uses the information from
the packet description and fills it into the fields in the pflog
header. While doing this, it is trivial to figure out whether the
packet has been rewritten.
OK sashan@
|
|
and af-to addresses and ports applied. Therefore it created a mbuf
chain on the stack with a partial copy. This is too complicated
for IP options, extension header, NAT46 af-to, and fragmented mbuf
chains. It even caused a crash in syzkaller. Usually the length
checks in pf_setup_pdesc() rejected the faked mbuf and the goto
copy logged the packet unmodified. Remove the pflog_mtap() function
and call bpf_mtap_hdr() directly. As the old buggy code was bypassed
in most cases, tcpdump(8) output of pflog does not change.
Unconditionally log the unmodified packet.
Reported-by: syzbot+947e89e06ac3fec187d0@syzkaller.appspotmail.com
OK sashan@
|
|
ok dlg@
|
|
ok dlg@
|
|
ok dlg@ kn@
|
|
ok dlg@
|
|
ok dlg@
|
|
ok dlg@
|
|
ok dlg@
|
|
ok dlg@
|
|
ok sashan@
|
|
interface descriptor corresponding to the unique name. This descriptor
is guaranteed to be valid until if_put(9) is called on the returned
pointer. if_unit(9) should replace the existing ifunit(), which
returns a descriptor that is not safe to dereference after a context switch.
This allows us to avoid some use-after-free issues in the ioctl(2) path.
Also this unifies interface descriptor usage.
ok claudio@ sashan@
|
|
the vlan tag we're injecting into the mbuf chain is either straight
off the wire and therefore already has the vlan priority encoded,
or is straight after it's been set up by vlan(4), which also has
the prio already encoded.
ok kn@ visa@ mvs@
|
|
ip_input() passes the packet to ip_forward(). But with an af-to
rule, pf(4) calls ip_forward() directly. Check the forwarding
sysctl also in pf to get consistent behavior. This requires setting
both ip and ip6 forwarding to get packet flow in both directions
over af-to rules.
OK kn@
|
|
when NAT was implemented differently. Now it does not seem to make
sense anymore. sashan@ has identified cases where it does harm.
dlg@ wants to remove it to simplify route-to code.
from dlg@; OK sashan@
|
|
ok kn mvs
|
|
`pflogifs' array. This was done to prevent panics caused by internal
malloc(9) limit.
Also we avoid the case where a single pflog(4) interface with a high index
allocates an array for all indices below it and eats up kernel memory.
Since there are very few pflog(4) interfaces, a linear search
has no performance impact.
ok bluhm@ claudio@ kn@
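
The trade-off above can be sketched in userland, assuming the interfaces are kept on a simple linked list. `struct logif`, `logif_insert()` and `logif_find()` are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry; memory use no longer depends on the unit. */
struct logif {
	unsigned int unit;
	struct logif *next;
};

/* Linear search by unit number; cheap when there are few entries. */
static struct logif *
logif_find(struct logif *head, unsigned int unit)
{
	struct logif *l;

	for (l = head; l != NULL; l = l->next)
		if (l->unit == unit)
			return (l);
	return (NULL);
}

/* Insert at the head; unit numbers are expected to be unique. */
static void
logif_insert(struct logif **head, struct logif *l)
{
	assert(logif_find(*head, l->unit) == NULL);
	l->next = *head;
	*head = l;
}
```

An interface with unit 4000000 costs one list entry here, whereas an array indexed by unit would need slots for every index below it.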
|
|
The output path can run without kernel lock just fine as is.
Looking at CVS log, it seems this was not done during import
because IFXF_MPSAFE only became a thing afterwards.
OK mvs
|
|
the rule did not specify it. Check the option again for the log
rule in case another rule has triggered a socket lookup. Remove
logopt group, it is not documented and cannot work as struct pfloghdr
does not contain a gid. Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER
to express what it does. The lookup involved is only an implementation
detail.
OK kn@ sashan@ mvs@
|
|
pflog(4) does not send or generate packets by design.
OK mvs sashan
|
|
OK millert@
|
|
OK millert@
|
|
packets were resent through simplex broadcast delivery and socket
splicing. Although there is an M_LOOP check in somove(9), it did
not take effect. if_input_local() cleared the M_BCAST and M_MCAST
flags with m_resethdr().
As if_input_local() is used for broadcast and multicast delivery,
it was a mistake to delete them. Keep the M_BCAST and M_MCAST mbuf
flags when packets are reinjected into the network stack.
Reported-by: syzbot+a43ace363f1b663238f8@syzkaller.appspotmail.com
OK anton@; discussed with claudio@
|
|
bridge(4) drops packets coming from somewhere else that have a
source MAC address that's owned by one of the interfaces that's a
member of the bridge. because this check was done with bridge_ourether,
it included the addresses of active carp interfaces hanging off
these member interfaces. this meant if the local machine is the
carp master while another machine is trying to preempt it by sending
hellos, the packets from the other machine were dropped because the
local one is already the master.
carp roles are supposed to move around a l2 network, so another
host sending a packet with a carp mac address is actually normal
and necessary.
found by and fix tested by stsp@
ok stsp@ claudio@
|
|
|
|
Less scheduling, lock contention and queues.
Previously, if_netisr() handled the net lock around those calls, now
if_input_process() does it before calling ether_input(), so no need to add
or remove NET_*LOCK() anywhere.
OK mvs claudio
|
|
"struct pppoe_softc" documents no member being protected by the kernel lock
(alone); further review of the code paths starting from pppoeintr() shows
no sleeping points which must be avoided in the softnet thread.
Everything is fine as is to run without the big lock, so remove it.
Tests sthen
Feedback mpi mvs
OK mvs claudio
|
|
longer memcopied but assigned. Alignment should not be an issue
as it is __packed.
Part of a larger diff from dlg@; OK dlg@ sashan@
|
|
ok ok@ yasuoka@
|
|
if_detach() will do this.
ok kn@
|