path: root/sys/net
Age | Commit message | Author
2021-02-08 | Start refcounting interface groups with 1. if_creategroup() returns | Alexander Bluhm
a new object that is already refcounted, so carp attach does not reach into internal structures. Add kasserts to detect counter overflow or underflow. OK mvs@
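The idea of handing out an object that already carries one reference, with assertions guarding the counter, can be sketched in userland C. This is only an illustration: `struct group`, `group_create()`, `group_ref()` and `group_unref()` are hypothetical names, not the kernel's interface group code, and `assert()` stands in for the kernel's KASSERT.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interface group; not the kernel struct. */
struct group {
	unsigned int refcnt;
};

/* Create a group that is already referenced once on behalf of the caller. */
struct group *
group_create(void)
{
	struct group *g = malloc(sizeof(*g));

	if (g == NULL)
		return NULL;
	g->refcnt = 1;		/* start at 1, not 0 */
	return g;
}

/* Take an additional reference; assert against overflow. */
void
group_ref(struct group *g)
{
	assert(g->refcnt > 0);
	assert(g->refcnt < UINT_MAX);
	g->refcnt++;
}

/* Drop a reference; returns 1 if this was the last one and the group was freed. */
int
group_unref(struct group *g)
{
	assert(g->refcnt > 0);	/* detects underflow */
	if (--g->refcnt == 0) {
		free(g);
		return 1;
	}
	return 0;
}
```

With this shape a caller never has to bump the counter by reaching into the struct after creation; it only pairs each `group_ref()` with a `group_unref()`.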
2021-02-06 | Simplex interface sends packet back without hardware checksum | Alexander Bluhm
offloading. The checksum must be calculated in software. Use the same condition in ether_resolve() to send the broadcast packet back to the stack and in in_ifcap_cksum() to force software checksumming. This fixes regress/sys/kern/sosplice/loop. OK procter@
2021-02-05 | Fix whitespace. | Alexander Bluhm
2021-02-04 | make if_pfsync.c a better friend with PF_LOCK | Alexandr Nedvedicky
The code delivered in this change is currently disabled. Brave souls may enable the code by adding -DWITH_PF_LOCK when building a customized kernel. Big thanks go to Hrvoje@ for providing test equipment and testing. As soon as we enter the next release cycle, WITH_PF_LOCK will be defined as the default option for MP kernels. OK dlg@
2021-02-03 | change pf_route so pf only runs when packets enter and leave the stack. | David Gwynne
before this change pf_route operated on the semantic that pf runs when packets go over an interface, so when pf_route changed which interface the packet was on it would run pf_test again. this change changes (restores) the semantic that pf is only supposed to run when packets go in or out of the network stack, even if route-to is responsible for short circuiting past the network stack.
just to be clear, for normal packets (ie, those not touched by route-to/reply-to/dup-to), there isn't a difference between running pf when packets enter or leave the stack, or having pf run when a packet goes over an interface.
the main reason for this change is that running the same packet through pf multiple times creates confusion for the state table. by default, pf states are floating, meaning that packets are matched to states regardless of which interface they're going over. if a packet leaving on em0 is rerouted out em1, both traversals will end up using the same state, which at best will make the accounting look weird, or at worst fail some checks in the state and get dropped.
another reason for this commit is to make handling of the changes that route-to makes consistent with other changes that are made to packets. eg, when nat is applied to a packet, we don't run pf_test again with the new addresses.
the main caveat with this diff is you can't have one rule that pushes a packet out a different interface, and then have a rule on that second interface that NATs the packet. i'm not convinced this ever worked reliably or was used much anyway, so we don't think it's a big concern.
discussed with many, with special thanks to bluhm@, sashan@ and sthen@ for weathering most of that pain. ok claudio@ sashan@ jmatthew@
2021-02-01 | Netlock should be grabbed before pppx_if_find() call in pppxwrite(). | mvs
Otherwise this `pxi' can be killed by a concurrent thread after the context switch caused by the following netlock. ok yasuoka@
2021-02-01 | Remove dummy TUNSIFMODE ioctl(2) call from pppac(4) and npppd(8). Since | mvs
OpenBSD 6.7 npppd(8) can't work over tun(4). ok yasuoka@
2021-02-01 | ifunit() was fully replaced by if_unit(9) and should go away. | mvs
ok bluhm@ dlg@
2021-02-01 | change route-to so it sends packets to IPs instead of interfaces. | David Gwynne
this is a significant (and breaking) reworking of the policy based routing that pf can do. the intention is to make it as easy as nat/rdr to use, and more robust when it's operating.
the main reasons for this change are:
- route-to, reply-to, and dup-to do not work with pfsync
  this is because the information about where to route-to is stored in rules, and it is hard to have a ruleset synced between firewalls, and impossible to have them synced 100% of the time.
- i can make my boxes panic in certain situations using route-to
  yeah...
- the configuration and syntax for route-to rules are confusing.
  the argument to route-to and co is an interface name with an optional ip address. there are several problems with this. one is that people tend to think about routing as sending packets to peers by their address, not by the interface they're reachable on. another is that we currently have no way to synchronise interface topology information between firewalls, so using an interface to say where packets go means we can't do failover of these states with pfsync. another is that a change in routing topology means a host may become reachable over a different interface. tying routing policy to interfaces gets in the way of failover and load balancing.
this change does the following:
- stores the route info in the state instead of the pf rule
  this allows route-to to keep working when the ruleset changes, and allows route-to info to be sent over pfsync. there's enough spare bits in pfsync messages that the protocol doesn't break. the caveat is that route-to becomes tied to pass rules that create state, like rdr-to and nat-to.
- the argument to route-to etc is a destination ip address
  it's not limited to a next-hop address (though a next-hop can be a destination address). this allows for the failover and load balancing referred to above.
- deprecates the address@interface host syntax in pfctl
  because routing is done entirely by IPs, the interface is derived from the route lookup, not pf. any attempt to use the @interface syntax will fail now in all contexts.
there's enthusiasm from proctor@ jmatthew@ and others
ok sashan@ bluhm@
2021-01-28 | bridge(4): convert ifunit() to if_unit(9) | mvs
ok bluhm@ sashan@
2021-01-28 | trunk(4): convert ifunit to if_unit(9) | mvs
ok bluhm@
2021-01-28 | handle "once" rules before letting pfsync defer tx of a packet. | David Gwynne
pfsync may want to defer the transmission of a packet. it does this so it can try and get a state over to a peer firewall before a host may send a reply to the peer, which would get dropped cos there's no matching state. i think the once rule processing should happen before that. the state is created from the rule, whether the packet the state is for goes out immediately or not shouldn't matter. ok sashan@
2021-01-27 | if the route resolved in pf_route is invalid, generate an icmp error. | David Gwynne
of course this is limited to the !dup-to case. ok sashan@ bluhm@
2021-01-27 | have pf_route{,6} clear the pf_pdesc mbuf ref early for route-to/reply-to. | David Gwynne
pf_route and pf_route6 are called to take over delivery of the packet with route-to and reply-to instead of letting it get processed normally. for the dup-to handling, it copies the mbuf but leaves the original mbuf in place. pf_route takes over the packet by clearing the mbuf pointer in the pf_pdesc struct. this diff moves the clearing of that pointer to the start of the function, rather than checking for dup-to again on the way out of the function. i think this is better because it means that it's more robust in the face of future code changes. even if that's not true, it's still shorter code in a forwarding path. ok sashan@ jmatthew@
2021-01-27 | don't run copies of packets made by dup-to through pf_test. | David Gwynne
dup-to is kind of like what you do with a span port, but is a bit more fine grained. it copies packets in a connection out an interface so that connection can be monitored. it doesn't make sense for pf to see the copied packets and try to match or create new states for them either. at best it needs config to stop pf seeing the copies (eg, set skip on $dup_to_tgt_if). at worst it breaks the connections you're monitoring because the states in pf get confused. found while discussing larger route-to changes on tech@. ok bluhm@ sashan@
2021-01-25 | We have this sequence in bridge(4) ioctl(2) path: | mvs
	ifs = ifunit(req->ifbr_ifsname);
	if (ifs == NULL) {
		error = ENOENT;
		break;
	}
	if (ifs->if_bridgeidx != ifp->if_index) {
		error = ESRCH;
		break;
	}
	bif = bridge_getbif(ifs);
This sequence repeats 8 times. Also we don't check the value returned by bridge_getbif() before use. The newly introduced bridge_getbig() function replaces this sequence. This not only reduces duplicated code but also makes the `bif' dereference safe. ok bluhm@
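Folding a repeated lookup-and-check sequence into one helper can be sketched as follows. This is a userland model under stated assumptions: `struct port`, the `ports` table, `port_by_name()` and `bridge_port_lookup()` are hypothetical names invented for illustration and do not reproduce the kernel function this commit added. Only the error codes mirror the sequence quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical member port: a name plus the index of the bridge it belongs to. */
struct port {
	const char	*name;
	unsigned int	 bridgeidx;
};

static struct port ports[] = {
	{ "em0", 3 },
	{ "em1", 7 },
};

/* Find a port by name, or NULL if there is none. */
static struct port *
port_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(ports) / sizeof(ports[0]); i++) {
		if (strcmp(ports[i].name, name) == 0)
			return &ports[i];
	}
	return NULL;
}

/*
 * One helper instead of the same checks at every call site.
 * On success returns 0 and stores a non-NULL port in *out;
 * on failure returns an errno value and stores NULL.
 */
int
bridge_port_lookup(unsigned int bridge_index, const char *name,
    struct port **out)
{
	struct port *p;

	assert(out != NULL);
	*out = NULL;

	p = port_by_name(name);
	if (p == NULL)
		return ENOENT;		/* no such interface */
	if (p->bridgeidx != bridge_index)
		return ESRCH;		/* not a member of this bridge */

	*out = p;
	return 0;
}
```

Because the helper only hands back a pointer when it returns 0, callers that check the return value cannot dereference a missing entry.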
2021-01-25 | Fix wg(4) ioctl to be able to handle multiple wgpeers. | YASUOKA Masahiko
Diff from Yuichiro NAITO. ok procter
2021-01-21 | vlan(4): convert ifunit() to if_unit(9) | mvs
ok dlg@ kn@
2021-01-21 | let vfs keep track of nonblocking state for us. | David Gwynne
ok claudio@ mvs@
2021-01-20 | An invalid packet may not have set src and dst in packet descriptor. | Alexander Bluhm
Add a NULL check to prevent crash in pflog(4) introduced in previous commit. Reported-by: syzbot+c6d2f2ad34b822bce98a@syzkaller.appspotmail.com
2021-01-20 | Print rewritten addresses in tcpdump(8) logged with pflog(4) for | Alexander Bluhm
rdr-to, nat-to, af-to rules. The kernel uses the information from the packet description and fills it into the fields in the pflog header. While doing this, it is trivial to figure out whether the packet has been rewritten. OK sashan@
2021-01-19 | pflog(4) tried to log the translated packet with rdr-to, nat-to, | Alexander Bluhm
and af-to addresses and ports applied. Therefore it created a mbuf chain on the stack with a partial copy. This is too complicated for IP options, extension header, NAT46 af-to, and fragmented mbuf chains. It even caused a crash in syzkaller. Usually the length checks in pf_setup_pdesc() rejected the faked mbuf and the goto copy logged the packet unmodified. Remove the pflog_mtap() function and call bpf_mtap_hdr() directly. As the old buggy code was bypassed in most cases, tcpdump(8) output of pflog does not change. Unconditionally log the unmodified packet. Reported-by: syzbot+947e89e06ac3fec187d0@syzkaller.appspotmail.com OK sashan@
2021-01-19 | pipex(4): convert ifunit() to if_unit(9) | mvs
ok dlg@
2021-01-19 | switch(4): convert ifunit to if_unit(9) | mvs
ok dlg@
2021-01-19 | pppoe(4): convert ifunit() to if_unit(9) | mvs
ok dlg@ kn@
2021-01-19 | pipex(4): convert ifunit() to if_unit(9) | mvs
ok dlg@
2021-01-19 | gre(4): convert ifunit() to if_unit(9) | mvs
ok dlg@
2021-01-19 | tpmr(4): convert ifunit() to if_unit(9) | mvs
ok dlg@
2021-01-19 | bpe(4): convert ifunit() to if_unit(9) | mvs
ok dlg@
2021-01-19 | aggr(4): convert ifunit() to if_unit(9) | mvs
ok dlg@
2021-01-18 | Convert ifunit() to if_unit(9). | mvs
ok sashan@
2021-01-18 | Introduce new function if_unit(9). This function returns a pointer to the | mvs
interface descriptor corresponding to the unique name. This descriptor is guaranteed to be valid until if_put(9) is called on the returned pointer. if_unit(9) should replace the existing ifunit(), which returns a descriptor that is not safe to dereference after a context switch. This allows us to avoid some use-after-free issues in the ioctl(2) path. It also unifies interface descriptor usage. ok claudio@ sashan@
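The lookup-then-release discipline described here can be modelled in userland C. This sketch does not use the kernel API: `struct ifx`, `ifx_table`, `ifx_unit()` and `ifx_put()` are hypothetical stand-ins that only imitate the idea of a lookup that returns a referenced descriptor which the caller must release.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface descriptor with a reference counter. */
struct ifx {
	const char	*name;
	unsigned int	 refs;	/* the table itself holds one reference */
};

static struct ifx ifx_table[] = {
	{ "em0", 1 },
	{ "em1", 1 },
};

/* Look up by unique name and return a referenced descriptor, or NULL. */
struct ifx *
ifx_unit(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(ifx_table) / sizeof(ifx_table[0]); i++) {
		if (strcmp(ifx_table[i].name, name) == 0) {
			ifx_table[i].refs++;
			return &ifx_table[i];
		}
	}
	return NULL;
}

/* Release a reference obtained from ifx_unit(). */
void
ifx_put(struct ifx *ifp)
{
	if (ifp == NULL)
		return;
	assert(ifp->refs > 1);	/* never drop the table's own reference */
	ifp->refs--;
}
```

A caller would bracket every use of the descriptor between `ifx_unit()` and `ifx_put()`, so that the descriptor cannot be torn down while the caller still holds it.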
2021-01-17 | don't encode the mbuf prio as part of the vlan tag in bpf_mtap_ether. | David Gwynne
the vlan tag we're injecting into the mbuf chain is either straight off the wire and therefore already has the vlan priority encoded, or is straight after it's been set up by vlan(4), which also has the prio already encoded. ok kn@ visa@ mvs@
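Why encoding a priority into a tag that already carries one is harmful can be seen from the 802.1Q tag layout (3-bit priority, 1-bit DEI, 12-bit VLAN ID). The helpers below are hypothetical illustrations, not the code in bpf_mtap_ether.

```c
#include <assert.h>
#include <stdint.h>

/* 802.1Q tag control information: 3-bit priority, 1-bit DEI, 12-bit VLAN ID. */
#define TCI_VID_MASK	0x0fffu
#define TCI_PRIO_SHIFT	13

/* Build a tag from a VLAN ID and a priority in the range 0..7. */
uint16_t
tci_make(uint16_t vid, uint8_t prio)
{
	assert(prio < 8);
	return (uint16_t)((prio << TCI_PRIO_SHIFT) | (vid & TCI_VID_MASK));
}

/* Extract the priority bits from a tag. */
uint8_t
tci_prio(uint16_t tci)
{
	return (uint8_t)(tci >> TCI_PRIO_SHIFT);
}

/* Extract the VLAN ID from a tag. */
uint16_t
tci_vid(uint16_t tci)
{
	return (uint16_t)(tci & TCI_VID_MASK);
}
```

If a tag built with priority 2 has a second priority of 5 ORed into it, the bits combine (binary 010 | 101) and the tag reports priority 7, which matches neither value. Passing an already-encoded tag through unchanged avoids that.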
2021-01-16 | The sysctl variable net.inet.ip.forwarding is checked before | Alexander Bluhm
ip_input() passes the packet to ip_forward(). But with an af-to rule, pf(4) calls ip_forward() directly. Check the forwarding sysctl also in pf to get consistent behavior. This requires setting both ip and ip6 forwarding to get packet flow in both directions over af-to rules. OK kn@
2021-01-15 | Remove a check that bypasses pf state tests. It dates back to 2003 | Alexander Bluhm
when NAT was implemented differently. Now it does not seem to make sense anymore. sashan@ has identified cases where it does harm. dlg@ wants to remove it to simplify route-to code. from dlg@; OK sashan@
2021-01-14 | Fix build without carp: ifp0 is only used within #if NCARP > 0. | Theo Buehler
ok kn mvs
2021-01-13 | Link pflog(4) instances to `pflog_ifs' list instead of allocating | mvs
`pflogifs' array. This was done to prevent panics caused by internal malloc(9) limit. We also avoid the case where a single pflog(4) interface with a high index allocates an array for all indices below it and eats up kernel memory. Since there are very few pflog(4) interfaces, linear search has no performance impact. ok bluhm@ claudio@ kn@
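The trade-off between an array indexed by unit number and a linked list can be sketched as follows. `struct logif`, `logif_add()` and `logif_find()` are hypothetical names for a userland model, not the pflog(4) code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-interface record, linked into a singly linked list. */
struct logif {
	int		 unit;
	struct logif	*next;
};

static struct logif *logif_list;

/* Add an interface; memory use depends on the count, not on the unit number. */
int
logif_add(int unit)
{
	struct logif *l;

	assert(unit >= 0);
	l = malloc(sizeof(*l));
	if (l == NULL)
		return -1;
	l->unit = unit;
	l->next = logif_list;
	logif_list = l;
	return 0;
}

/* Linear search; cheap as long as only a handful of entries exist. */
struct logif *
logif_find(int unit)
{
	struct logif *l;

	for (l = logif_list; l != NULL; l = l->next) {
		if (l->unit == unit)
			return l;
	}
	return NULL;
}
```

Adding units 0 and 100000 allocates two records here, whereas an array indexed by unit would need 100001 slots for the same two interfaces.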
2021-01-13 | Send without kernel lock | kn
The output path can run without kernel lock just fine as is. Looking at CVS log, it seems this was not done during import because IFXF_MPSAFE only became a thing afterwards. OK mvs
2021-01-12 | Sometimes a user ID was logged in pflog(4) although the logopt of | Alexander Bluhm
the rule did not specify it. Check the option again for the log rule in case another rule has triggered a socket lookup. Remove logopt group, it is not documented and cannot work as struct pfloghdr does not contain a gid. Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER to express what it does. The lookup involved is only an implementation detail. OK kn@ sashan@ mvs@
2021-01-11 | Remove unused start routine | kn
pflog(4) does not send or generate packets by design. OK mvs sashan
2021-01-09 | Enforce range with sysctl_int_bounded in etherip_sysctl | gnezdo
OK millert@
2021-01-09 | Enforce range with sysctl_int_bounded in pipex_sysctl | gnezdo
OK millert@
2021-01-09 | Syzkaller has found a stack overflow in socket splicing. Broadcast | Alexander Bluhm
packets were resent through simplex broadcast delivery and socket splicing. Although there is an M_LOOP check in somove(9), it did not take effect. if_input_local() cleared the M_BCAST and M_MCAST flags with m_resethdr(). As if_input_local() is used for broadcast and multicast delivery, it was a mistake to delete them. Keep the M_BCAST and M_MCAST mbuf flags when packets are reinjected into the network stack. Reported-by: syzbot+a43ace363f1b663238f8@syzkaller.appspotmail.com OK anton@; discussed with claudio@
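Keeping selected flags across a reset amounts to masking rather than clearing. The flag names and values below are hypothetical and do not match the kernel's M_* constants, and `pkt_reset_flags()` is an illustrative stand-in, not the behaviour of m_resethdr().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define PKT_BCAST	0x01u	/* delivered as link-level broadcast */
#define PKT_MCAST	0x02u	/* delivered as link-level multicast */
#define PKT_LOOP	0x04u
#define PKT_OTHER	0x08u
#define PKT_ALL		(PKT_BCAST | PKT_MCAST | PKT_LOOP | PKT_OTHER)

/*
 * Reset per-packet flags before reinjecting a packet, but keep the
 * broadcast and multicast bits so later checks can still see them.
 */
uint16_t
pkt_reset_flags(uint16_t flags)
{
	assert((flags & ~PKT_ALL) == 0);	/* only known bits expected */
	return (uint16_t)(flags & (PKT_BCAST | PKT_MCAST));
}
```

A later stage that refuses to resend broadcast packets can only work if the broadcast bit survives the reset, which is the point of preserving it here.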
2021-01-08 | don't check local carp addresses as part of the antispoof checks. | David Gwynne
bridge(4) drops packets coming from somewhere else that have a source MAC address that's owned by one of the interfaces that's a member of the bridge. because this check was done with bridge_ourether, it included the addresses of active carp interfaces hanging off these member interfaces. this meant if the local machine is the carp master while another machine is trying to preempt it by sending hellos, the packets from the other machine were dropped because the local one is already the master. carp roles are supposed to move around a l2 network, so another host sending a packet with a carp mac address is actually normal and necessary. found by and fix tested by stsp@ ok stsp@ claudio@
2021-01-05 | pppoeintr() is no more | kn
2021-01-04 | Process pppoe(4) packets directly, do not queue through netis | kn
Less scheduling, lock contention and queues. Previously, if_netisr() handled the net lock around those calls, now if_input_process() does it before calling ether_input(), so no need to add or remove NET_*LOCK() anywhere. OK mvs claudio
2021-01-04 | Remove kernel lock from pppoe(4) input path | kn
"struct pppoe_softc" documents no member being protected by the kernel lock (alone); further review of the code paths starting from pppoeintr() shows no sleeping points which must be avoided in the softnet thread. Everything is fine as is to run without the big lock, so remove it. Tests sthen Feedback mpi mvs OK mvs claudio
2021-01-04 | Minor refactoring in pf(4). Note that struct pfsync_state is no | Alexander Bluhm
longer memcopied but assigned. Alignment should not be an issue as it is __packed. Part of a larger diff from dlg@; OK dlg@ sashan@
2021-01-04 | Remove unused `pipex_iface_context' struct. | mvs
ok ok@ yasuoka@
2021-01-02 | Don't call if_deactivate() in switch_clone_destroy(). Following | mvs
if_detach() will do this. ok kn@