path: root/sys/net
Age  Commit message  Author
2019-03-20States in pf(4) let ICMP and ICMP6 packets pass if they have aAlexander Bluhm
packet in their payload that matches an existing connection. It was not checked whether the outer ICMP packet has the same destination IP as the source IP of the inner protocol packet. Enforce that these addresses match, to prevent ICMP packets that do not make sense. Issue found by Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv.com OK sashan@
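The address check described in this entry can be sketched roughly as below. This is only an illustration, not the pf(4) code: the struct, its fields and the function name are hypothetical, and it covers IPv4-sized addresses only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the addresses involved (not pf's types). */
struct icmp_err_addrs {
	uint32_t outer_src;	/* sender of the ICMP error */
	uint32_t outer_dst;	/* receiver of the ICMP error */
	uint32_t inner_src;	/* source of the packet quoted in the payload */
	uint32_t inner_dst;	/* destination of the quoted packet */
};

/*
 * An ICMP error goes back to whoever originated the quoted packet,
 * so the outer destination is expected to equal the inner source.
 */
static bool
icmp_err_addrs_ok(const struct icmp_err_addrs *a)
{
	return a->outer_dst == a->inner_src;
}
```

An ICMP error whose outer destination differs from the quoted packet's source would be rejected by such a check even if the quoted packet matches a state.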
2019-03-18tweak SIOCGETLABEL and add SIOCDELLABELDavid Gwynne
makes mpe consistent with mpw and mpip
2019-03-18make SIOCGETLABEL fail with EADDRNOTAVAIL if the label is not set.David Gwynne
this makes ifconfig print "(unset)" to show the label isn't set yet.
2019-03-18extend BIOCSFILDROP so it can be configured to not capture packets.David Gwynne
BIOCSFILDROP was already able to be used as a quick and dirty firewall, which is especially useful when you want to filter non-ip things. however, capturing the packets you're dropping is a lot of overhead when you just want to drop stuff. this extends fildrop so you can tell bpf not to capture the packets it drops. ok sthen@ mikeb@ claudio@ visa@
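As a rough model of the behaviour this entry describes, the sketch below distinguishes three cases for a packet that matches the filter: captured and passed on, captured and dropped, or dropped without capture. The enum and function names are invented for this sketch and are not the macros added to the tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode names for this sketch; not the macros from the tree. */
enum fildrop_mode {
	MODEL_FILDROP_PASS,	/* capture matches, packet continues */
	MODEL_FILDROP_CAPTURE,	/* capture matches, then drop them */
	MODEL_FILDROP_DROP	/* drop matches without capturing them */
};

struct fildrop_result {
	bool captured;	/* a copy was queued for the bpf reader */
	bool dropped;	/* the packet was kept from the stack */
};

static struct fildrop_result
fildrop_apply(enum fildrop_mode mode, bool filter_matched)
{
	struct fildrop_result r = { false, false };

	/* Packets the filter does not match are left alone in every mode. */
	if (!filter_matched)
		return r;

	switch (mode) {
	case MODEL_FILDROP_PASS:
		r.captured = true;
		break;
	case MODEL_FILDROP_CAPTURE:
		r.captured = true;
		r.dropped = true;
		break;
	case MODEL_FILDROP_DROP:
		r.dropped = true;
		break;
	}
	return r;
}
```

The third mode is the one that avoids the capture overhead when the descriptor is only used to drop traffic.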
2019-03-17extend BIOCSFILDROP so it can be configured to not capture packets.David Gwynne
this just provides the macros for the different values for BIOCGFILDROP and BIOCSFILDROP, the implementation behind them is coming. ok sthen@ mikeb@ claudio@ visa@
2019-03-12Merge copy/pasted code to export STP states via ioctl into a function.Martin Pieuchot
2019-03-08Do not grab a `bif' pointer again, we already have it.Martin Pieuchot
ok visa@
2019-03-08Move the tag mechanism outside of net/if_bridge.c.Martin Pieuchot
This will help for future (un)locking. ok visa@
2019-03-05Make sure pointer is within bounds before dereferencing it.anton
ok claudio@ deraadt@ Reported-by: syzbot+8e29400e09a351f17884@syzkaller.appspotmail.com
2019-03-04move back to ifiq_input counting packets instead of queue operations.David Gwynne
the backpressure seems to have kicked in too early, introducing a lot of packet loss where there wasn't any before. secondly, counting operations interacted extremely badly with pseudo-interfaces. for example, if you have a physical interface that rxes 100 vlan encapsulated packets, it will call ifiq_input once for all 100 packets. when the network stack is running vlan_input against these packets, vlan_input will take the packet and call ifiq_input against each of them. because the stack is running packets on the parent interface, it can't run the packets on the vlan interface, so you end up with ifiq_input being called 100 times, and we dropped packets after 16 calls to ifiq_input without a matching run of the stack. chris cappuccio hit some weird stuff too. discussed with claudio@
2019-03-04don't need to initialise qdrops twice when setting up ifqs and ifiqs.David Gwynne
2019-03-04allow IPv6 to flow through pppx(4)denis
OK phessler@ deraadt@
2019-03-04Add padding to struct sadb_x_counter to make it comply withStefan Sperling
alignment constraints documented in RFC 2367 section 2.2. Fixes 'ipsecctl -ss' segfault observed on i386. with and ok deraadt@ visa@ mikeb@
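To illustrate the kind of alignment constraint involved (PF_KEY extensions are measured in 64-bit units), here is a minimal sketch with made-up structs. It does not reproduce the real layout of struct sadb_x_counter; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension that is 12 bytes long: not a multiple of 8. */
struct model_ext_unpadded {
	uint16_t len;		/* extension length in 64-bit units */
	uint16_t type;
	uint32_t a;
	uint32_t b;
};

/* Same fields plus explicit padding, giving 16 bytes. */
struct model_ext_padded {
	uint16_t len;
	uint16_t type;
	uint32_t a;
	uint32_t b;
	uint32_t reserved;	/* padding up to a 64-bit boundary */
};

/* Catch a layout regression at compile time. */
_Static_assert(sizeof(struct model_ext_padded) % 8 == 0,
    "extension must be a multiple of 64 bits");

/* True if a byte count can be expressed exactly in 64-bit units. */
static int
model_ext_len_ok(size_t bytes)
{
	return bytes % 8 == 0;
}

/* Length field value for an extension of the given size. */
static uint16_t
model_ext_len_units(size_t bytes)
{
	return (uint16_t)(bytes / 8);
}
```

A compile-time assertion of this kind is one way to keep such a struct from drifting out of alignment again.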
2019-03-03Found some historical code. Don't cast the pointer for bzero to a differentTheo de Raadt
type, and definitely don't do this to the length: (unsigned)(cplim2 - cp2) ok claudio
2019-03-01rework how ifiq_input decides the stack is busy and whether it should dropDavid Gwynne
previously ifiq_input used the traditional backpressure or defense mechanism and counted packets to decide when to shed load by dropping. currently it ends up waiting for 10240 packets to get queued on the stack before it would decide to drop packets. this may be ok for some machines, but for a lot this was too much. this diff reworks how ifiqs measure how busy the stack is by introducing an ifiq_pressure counter that is incremented when ifiq_input is called, and cleared when ifiq_process calls the network stack to process the queue. if ifiq_input is called multiple times before ifiq_process in a net taskq runs, ifiq_pressure goes up, and ifiq_input uses a high value to decide the stack is busy and it should drop. i was hoping there would be no performance impact from this change, but hrvoje popovski notes a slight bump in forwarding performance. my own testing shows that the ifiq input list length grows to a fraction of the 10240 it used to get to, which means the maximum burst of packets through the stack is smoothed out a bit. instead of big lists of packets followed by big periods of drops, we get relatively small bursts of packets with smaller gaps where we drop. the follow-on from this is to make drivers implementing rx ring moderation use the return value of ifiq_input to scale the ring allocation down, allowing the hardware to drop packets so software doesn't have to.
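The pressure counter described in this entry can be modelled in a few lines. The sketch below is not the kernel code: the struct, function names and the single-threaded model are assumptions, and the drop threshold of 16 is taken from the 2019-03-04 entry's remark that packets were dropped after 16 calls without a matching run of the stack.

```c
#include <assert.h>

#define MODEL_PRESSURE_DROP	16	/* assumed threshold, see lead-in */

/* Hypothetical stand-in for an input queue; not struct ifiqueue. */
struct ifiq_model {
	unsigned int pressure;	/* input calls since the stack last ran */
	unsigned int qlen;	/* packets waiting for the stack */
	unsigned int qdrops;	/* packets shed because the stack was busy */
};

/*
 * Model of the input side: every call raises the pressure, however many
 * packets it carries. Returns 1 if the batch was dropped, 0 if queued.
 */
static int
model_input(struct ifiq_model *q, unsigned int npkts)
{
	if (++q->pressure > MODEL_PRESSURE_DROP) {
		q->qdrops += npkts;
		return 1;
	}
	q->qlen += npkts;
	return 0;
}

/* Model of the processing side: the stack ran, so pressure is cleared. */
static void
model_process(struct ifiq_model *q)
{
	q->pressure = 0;
	q->qlen = 0;
}
```

Because the counter tracks calls rather than packets, one call carrying 100 packets adds the same pressure as a call carrying a single packet.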
2019-02-28IPv6 fragments with malformed extension headers could be erroneouslyAlexander Bluhm
passed by pf or cause a panic in pf. fix from sashan@; OK bluhm@ claudio@ bug found by Corentin Bayet, Nicolas Collignon, Luca Moro at Synacktiv
2019-02-28Add mpip(4), an IP tunnel interface for "IP Layer 2" over MPLS pseudowiresDavid Gwynne
This is basically mpw(4), but it carries IP directly instead of Ethernet. On the wire it can look the same as what IP over MPLS looks like, but because it is a pseudowire you can configure a control word or the FAT label to improve load balancing. It can be used to quickly set up an IP tunnel over an MPLS fabric without the need to configure bgpd and mpe(4) interfaces. Because it implements the same pwe3 ioctls that mpw(4) uses, ifconfig already supports configuration of mpip(4) interfaces. ldpd will grow support for this in the near future. This is not hooked up to the build yet. discussed with claudio@ at a2k19 ok claudio@
2019-02-26don't check the pseudowire type in tx and rx paths.David Gwynne
whether the mpw interface is advertising "ethernet" or "ethernet-tagged" is something the ends of the wire agree on (ie, ldpd is configured a certain way), it is not something that affects ethernet encap or decap. the MPW ioctls can still configure it and read it, but it has no bearing on how the driver operates on packets.
2019-02-26use NET_LOCK to coordinate destroying a cloned interface.David Gwynne
2019-02-26add support for the new pwe3 ioctls.David Gwynne
the existing mpw ioctl is still available for ldpd to use for a (short) while. discussed with claudio@ at a2k19 ok mpi@
2019-02-26check for root on mpls and pwe3 ioctlsDavid Gwynne
part of a larger diff ok mpi@
2019-02-20Protect the hash table with a mutex.Martin Pieuchot
inputs & ok visa@
2019-02-20add support for rfc 6391: flow-aware transport of pseudowires.David Gwynne
this basically adds a dummy mpls tag to the stack for pseudowires and uses a flow as the label on that dummy tag. this allows intermediate systems that hash packets onto multiple links to use the extra tag as input to the hash, providing more entropy and therefore better load balancing. it's a pity there's no way to turn it on yet...
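A minimal sketch of what the extra tag looks like in a label stack follows. It uses the standard MPLS shim layout (20-bit label, 3-bit traffic class, bottom-of-stack bit, 8-bit TTL) and assumes the flow label sits below the pseudowire label, as RFC 6391 describes. The function names are hypothetical, the values are kept in host byte order, and the TTL handling is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SHIM_LABEL_SHIFT	12		/* label is the top 20 bits */
#define MODEL_SHIM_LABEL_MASK	0xfffffu
#define MODEL_SHIM_BOS		0x00000100u	/* bottom-of-stack bit */

/* Encode one shim header; a real tx path would also htonl() it. */
static uint32_t
model_shim_encode(uint32_t label, int bos, uint8_t ttl)
{
	uint32_t shim;

	shim = (label & MODEL_SHIM_LABEL_MASK) << MODEL_SHIM_LABEL_SHIFT;
	if (bos)
		shim |= MODEL_SHIM_BOS;
	shim |= ttl;
	return shim;
}

/*
 * Build the pseudowire part of the stack into out[] (room for 2 entries).
 * With fat enabled, the flow label is appended and carries the BOS bit;
 * otherwise the pseudowire label is the bottom of the stack.
 * Returns the number of entries written.
 */
static size_t
model_pw_stack(uint32_t *out, uint32_t pw_label, int fat, uint32_t flow,
    uint8_t ttl)
{
	size_t n = 0;

	out[n++] = model_shim_encode(pw_label, !fat, ttl);
	if (fat)
		out[n++] = model_shim_encode(flow, 1, ttl);
	return n;
}
```

Setting the BOS bit when the stack is built, instead of storing it with the remote label, is what lets the same pseudowire label be used with or without the extra flow entry.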
2019-02-20don't store the BOS flag as part of the remote label, add it at tx time.David Gwynne
this is to prepare for flow aware transport for FAT from RFC 6391
2019-02-20replace sc_flags with sc_cwordDavid Gwynne
the only flag used with sc_flags was the one to turn the control word on and off. this is in preparation for split ioctls for controlling pseudowire behaviour. sc_cword can be set atomically and independently as a separate variable.
2019-02-20add the locking for coordinating between ioctls and a clone destroy.David Gwynne
i wrote this in mpe before porting and committing it in mpw, but forgot to commit the mpe version.
2019-02-20sigh, more whitespace fixesDavid Gwynne
no functional change
2019-02-20oops, whitespace tweakDavid Gwynne
no functional change
2019-02-20add support for SIOCGETLABELDavid Gwynne
this is a first step in breaking up the monolithic and redundant SIOCSETMPWCFG ioctls discussed with claudio@
2019-02-20make ether_output with AF_MPLS use a route's gateway address if availableDavid Gwynne
sending an MPLS frame is weird compared to other address families. other families figure out and pass the address on the local link for ether_output to use for resolution, but AF_MPLS basically passes a dummy sockaddr so ether_output can get the ethernet protocol field right. ether_output then has to pull the route apart to figure out which address and family to use for address resolution on the local net. eg, MPLS tagged routes via ip addresses need to pull the route apart and get at the AF_INET sockaddr to pass to arpresolve. that code currently uses the destination address of the route, but if that destination is not on the local network, we'd end up using it for arp requests that don't work. this change uses the rt_gateway sockaddr if RTF_GATEWAY is set. this solves the problem in my testing and doesn't seem to break other use cases i've tried. reported by adrian close via bugs@ ok deraadt@ claudio@
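The selection rule in this entry (resolve the gateway when the route has one, otherwise the destination) can be sketched as below. The struct, the flag value and the function name are stand-ins invented for the sketch, not the kernel's struct rtentry or its RTF_GATEWAY definition.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_RTF_GATEWAY	0x2	/* hypothetical flag value for this sketch */

/* Hypothetical, flattened route with IPv4-sized addresses. */
struct model_route {
	int	 flags;
	uint32_t dst;		/* route destination */
	uint32_t gateway;	/* next hop, valid if MODEL_RTF_GATEWAY is set */
};

/* Pick the address that link-layer resolution should be asked about. */
static uint32_t
model_resolve_target(const struct model_route *rt)
{
	if (rt->flags & MODEL_RTF_GATEWAY)
		return rt->gateway;
	return rt->dst;
}
```

For a destination that is not on the local network, resolving the gateway rather than the destination avoids ARP requests that can never be answered.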
2019-02-18Change ps_len of struct pfioc_states and psn_len of structAlexander Bluhm
pfioc_src_nodes to size_t. This avoids integer truncation by casts to unsigned. As the types of DIOCGETSTATES and DIOCGETSRCNODES ioctl(2) arguments change, pfctl(8) and systat(1) should be updated together with the kernel. Calculate number of pf(4) states as size_t in userland. OK sashan@ deraadt@
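The truncation this entry guards against can be shown with fixed-width types. The sketch below assumes a 64-bit size_t and a 32-bit unsigned, modelled here as uint64_t and uint32_t. The function names and the entry size are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries that fit when the length goes through a 32-bit cast. */
static uint64_t
model_entries_truncated(uint64_t len, uint64_t entry_size)
{
	uint32_t short_len = (uint32_t)len;	/* upper 32 bits are lost */

	return short_len / entry_size;
}

/* Number of entries that fit when the full-width length is kept. */
static uint64_t
model_entries_full(uint64_t len, uint64_t entry_size)
{
	return len / entry_size;
}
```

With a buffer just over 4 GB, the truncated calculation reports only the handful of entries that fit in the leftover low bits.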
2019-02-18get rid of some trailing whitespace.David Gwynne
no functional change
2019-02-17Make bridge_rtupdate() return an error value instead of a pointer.Martin Pieuchot
2019-02-15Remove KASSERT() for now. It triggers when destroying lo(4) of a rdomainClaudio Jeker
because the rtable_l2 is modified before calling rt_ifa_del. Triggered by regress test and reported by Moritz Buhl mbuhl at mbuhl dot me
2019-02-15Use `ifidx' when storing an interface index.Martin Pieuchot
ok dlg@
2019-02-15coordinate configuration of local mpls labels with destroying an interfaceDavid Gwynne
this adds an rwlock like mpe has which means only one thing can be adding or removing a local mpls label, and those things check if the interface is dying before doing their thing.
2019-02-15allow configuration of the rdomain that mpls operates inDavid Gwynne
this is based on the changes to mpe i made yesterday. unfortunately mpw has a monster ioctl that configures all the things, which makes the kernel side complicated. hopefully i can split them up.
2019-02-14Use timeout_barrier() when bringing the bridge(4) down and only executeMartin Pieuchot
the timeout handler if the interface is running. ok claudio@
2019-02-14mpw.h is no longer needed.Martin Pieuchot
2019-02-14Remove mpw(4) hacks now that all the world is Ethernet.Martin Pieuchot
2019-02-14the rdomain for the mpls stuff should still be hardcoded to 0 in mpw.David Gwynne
it was using ifp->if_rdomain for the rtalloc of the mpls encapsulated tunnel in mpw_start.
2019-02-14use the configured route domain for the mpls tunnel when sending packets.David Gwynne
2019-02-14consistently use the same flags for rt_ifa_add and _del.David Gwynne
experience with mpe shows you need RTF_LOCAL everywhere for del to work.
2019-02-14allow configuration of the rdomain the mpls encap operates inDavid Gwynne
this borrows the SIOCSLIFPHYRTABLE and SIOCGLIFPHYRTABLE that tunnel interfaces implement to set the rdomain mpls operates in. ifconfig tunneldomain X lets you set it, and you can see the effect with netstat -nr -f mpls -TX, but ifconfig currently doesn't show the tunneldomain. yet.
2019-02-13don't confuse the interface rdomain with the one the local label is in.David Gwynne
SIOCSIFRDOMAIN is about the routes on top of an mpe interface. the rdomain mpls operates in is independent of that, and currently restricted to rdomain 0.
2019-02-13change rt_ifa_add and rt_ifa_del so they take an rdomain argument.David Gwynne
this allows mpls interfaces (mpe, mpw) to pass the rdomain they wish the local label to be in, rather than have it implicitly forced to 0 by these functions. right now they'll pass 0, but it will soon be possible to have them rx packets in other rdomains. previously the functions used ifp->if_rdomain for the rdomain. everything other than mpls still passes ifp->if_rdomain. ok mpi@
2019-02-11add M_CANFAIL to malloc, and return ENOMEM if allocating an interfaceDavid Gwynne
fails.
2019-02-10assign the m_prepend result to the right variable.David Gwynne
2019-02-10whitespace tweak, no functional changeDavid Gwynne
2019-02-10get rid of the global list of mpe interfaces, it's not needed anymoreDavid Gwynne
mpe would try to detect label collisions itself, but wasn't coordinating with mpw or other labels, making its solution incomplete. this also means i won't need extra locking if i try to make the ioctl paths mpsafe.