path: root/sys/net/if.c
AgeCommit message (Collapse)Author
2016-05-18rework the srp api so it takes an srp_ref struct that the caller provides.David Gwynne
the srp_ref struct is used to track the location of the callers hazard pointer so later calls to srp_follow and srp_enter already know what to clear. this in turn means most of the caveats around using srps go away. specifically, you can now:
- switch cpus while holding an srp ref
- ie, you can sleep while holding an srp ref
- you can take and release srp refs in any order
the original intent was to simplify use of the api when dealing with complicated data structures. the caller now no longer has to track the location of the srp a value was fetched from, the srp_ref effectively does that for you. srp lists have been refactored to use srp_refs instead of srpl_iter structs. this is in preparation of using srps inside the ART code. ART is a complicated data structure, and lookups require overlapping holds of srp references. ok mpi@ jmatthew@
2016-05-10make bpf_mtap callers set the M_FILDROP flag if they care about it.David Gwynne
ok mpi@
2016-05-08Do not export the IFXF_MPSAFE flag to userland, it is a kernel-onlyMartin Pieuchot
hint. ok kettenis@, deraadt@
2016-05-03Stop using a soft-interrupt context to process incoming network packets.Martin Pieuchot
Use a new task that runs holding the KERNEL_LOCK to execute mp-unsafe code. Our current goal is to progressively move input functions to the unlocked task. This gives a small performance boost confirmed by Hrvoje Popovski's IPv4 forwarding measurement:

before:               after:
send      receive     send      receive
400kpps   400kpps     400kpps   400kpps
500kpps   500kpps     500kpps   500kpps
600kpps   600kpps     600kpps   600kpps
650kpps   650kpps     650kpps   640kpps
700kpps   700kpps     700kpps   700kpps
720kpps   640kpps     720kpps   710kpps
800kpps   640kpps     800kpps   650kpps
1.4Mpps   570kpps     1.4Mpps   590kpps
14Mpps    570kpps     14Mpps    590kpps

ok kettenis@, bluhm@, dlg@
2016-03-16if ticks diverge from ifq_congestion too far the diff will go negativeDavid Gwynne
detect this and bump ifq_congestion forward rather than claim the system is congested for a long period of time. ok mpi@ henning@ jmatthew@
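The check described in this commit can be sketched in userland as follows. This is only an illustration under stated assumptions: the struct, the function names, the unsigned tick type, and the HZ value are all hypothetical, and the thresholds are not taken from the kernel source.

```c
#include <assert.h>

#define HZ 100u			/* assumed tick rate, ticks per second */

/* Hypothetical stand-in for the global congestion timestamp. */
struct congestion {
	unsigned int mark;	/* tick value recorded at the last congestion event */
};

/* Returns 1 if congestion was flagged within the last HZ/100 ticks. */
static int
congested(struct congestion *c, unsigned int now)
{
	/* Wrapping subtraction, then reinterpret as signed (two's complement assumed). */
	int diff = (int)(now - c->mark);

	if (diff < 0) {
		/* The counters diverged too far: bump the mark forward. */
		c->mark = now - HZ;
		return 0;
	}
	return diff <= (int)(HZ / 100);
}

/* Small walk-through; returns 0 when every step behaves as described. */
static int
congestion_demo(void)
{
	struct congestion c = { 1000u };
	unsigned int far = 0x80000005u;	/* more than INT_MAX ticks after mark 0 */

	if (!congested(&c, 1000u))	/* flagged on this very tick */
		return 1;
	if (congested(&c, 1005u))	/* 5 ticks later: no longer congested */
		return 2;
	c.mark = 0;			/* mark left far in the past */
	if (congested(&c, far) || c.mark != far - HZ)
		return 3;
	return 0;
}
```

Without the `diff < 0` branch, the last step would keep reporting a nonsensical state until the tick counter caught up with the stale mark.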
2016-03-07Sync no-argument function declaration and definition by adding (void).Christian Weisgerber
ok mpi@ millert@
2016-03-02provide generic ioctls for managing an interfaces parentDavid Gwynne
in the future this will subsume the individual vlandev, carpdev, pppoedev, foodev options for things like vlan, carp, pppoe, etc. inspired by vnetid ok mpi@ jmatthew@
2016-02-28Support for running Linux binaries under emulation is going away.Christian Weisgerber
Remove "option COMPAT_LINUX" and everything directly tied to it from the kernel and the corresponding man page documentation. ok visa@ guenther@
2015-12-09rework the if_start mpsafe serialisation so it can serialise arbitrary workDavid Gwynne
work is represented by struct task. the start routine is now wrapped by a task which is serialised by the infrastructure. if_start_barrier has been renamed to ifq_barrier and is now implemented as a task that gets serialised with the start routine. this also adds an ifq_restart() function. it serialises a call to ifq_clr_oactive and calls the start routine again. it exists to avoid a race that kettenis@ identified between when a start routine discovers there's no space left on a ring, and when it calls ifq_set_oactive. if the txeof side of the driver empties the ring and calls ifq_clr_oactive in between the above calls in start, the queue will be marked oactive and the stack will never call the start routine again. by serialising the ifq_set_oactive call in the start routine and ifq_clr_oactive calls we avoid that race. tested on various nics ok mpi@
2015-12-08Kill unused iftxlist.Martin Pieuchot
ok dlg@
2015-12-08split the interface send queue (struct ifqueue) implementation out.David Gwynne
the intention is to make it more clear what belongs to a transmit queue and what belongs to an interface. suggested by and ok mpi@
2015-12-05remove old lint annotationsTed Unangst
2015-12-04Grab the KERNEL_LOCK() around bridge_output().Martin Pieuchot
It is now safe to call if_enqueue() without holding the KERNEL_LOCK() even on an interface part of a bridge(4). ok dlg@, henning@, kettenis@
2015-12-03Use SRPL_HEAD() and SRPL_ENTRY() to be consistent with and allow toMartin Pieuchot
fallback to a SLIST. ok dlg@, jasper@
2015-12-03Remove broadcast matching from ifa_ifwithaddr(), use in_broadcast() whereVincent Gross
required. ok bluhm@ mpi@.
2015-12-03rework if_start to allow nics to provide an mpsafe start routine.David Gwynne
existing start routines will still be called under the kernel lock and at IPL_NET. mpsafe start routines will be serialised so only one instance of each interfaces function will be running in the kernel at any point in time. this guarantees packets will be dequeued in order, and the start routines dont have to lock against themselves because if_start does it for them. the code to do that is based on the scsi runqueue code. this also provides an if_start_barrier() function that should wait until any currently running instances of if_start have finished. a driver can opt in to the mpsafe if_start call by doing the following:
1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)
to simplify the implementation the tx mitigation code has been removed. tested by several ok mpi@ jmatthew@
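The "only one instance running at a time" guarantee can be approximated in userland with a counter of pending run requests. The sketch below is not the kernel code and not the scsi runqueue code the commit refers to; it uses C11 atomics, and every name in it (`struct serialiser`, `serialised_run`, `demo_work`) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

struct serialiser {
	atomic_uint pending;			/* outstanding run requests */
	void (*work)(struct serialiser *);	/* stands in for a start routine */
	int runs;				/* how many times work() ran */
};

static void
serialised_run(struct serialiser *s)
{
	/* Someone is already running work(); they will rerun it for us. */
	if (atomic_fetch_add(&s->pending, 1) != 0)
		return;

	do {
		/* Collapse all requests seen so far into a single pass. */
		atomic_store(&s->pending, 1);
		s->work(s);
	} while (atomic_fetch_sub(&s->pending, 1) != 1);
}

static void
demo_work(struct serialiser *s)
{
	/* On the first pass, simulate another CPU asking for a run. */
	if (s->runs++ == 0)
		serialised_run(s);
}

/* Returns the number of passes made by the single running instance. */
static int
serialiser_demo(void)
{
	struct serialiser s;

	atomic_init(&s.pending, 0);
	s.work = demo_work;
	s.runs = 0;
	serialised_run(&s);
	return s.runs;
}
```

The nested request does not run `work()` itself; it only bumps the counter, and the instance that is already running notices it and makes a second pass. That is why the work function never has to lock against itself.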
2015-12-02When destroying an interface, we have to wait until all referencesAlexander Bluhm
are not used anymore. This has to be done before any interface fields become invalid. As the route delete request cannot call if_get() anymore, pass down the interface. Split rtrequest_delete() into a separate function that may take an existing interface. OK mpi@
2015-12-02Rework the MPLS handling. Remove the lookup loops since nothing is usingClaudio Jeker
them and they make everything so much harder with no gain. Remove the ifp argument from mpls_input since it is not needed. On the input side the lookup side is modified a bit when it comes to BOS handling. Tested in a L3VPN setup with ldpd and bgpd. Committing now so we can move on with cleaning up rt_ifp usage. If this breaks L2VPN I will fix it once reported. OK mpi@
2015-12-01Iterating on &ifnet should only be done with the KERNEL_LOCK held.Vincent Gross
With input and ok mpi@.
2015-11-27Protect the growth of the routing table arrays used by rtable_get()Martin Pieuchot
with SRPs. This is a simplified version of the dynamically sizeable array of pointers used by if_get() because routing table heads are never freed. ok dlg@
2015-11-25replace IFF_OACTIVE manipulation with mpsafe operations.David Gwynne
there are two things shared between the network stack and drivers in the send path: the send queue and the IFF_OACTIVE flag. the send queue is now protected by a mutex. this diff makes the oactive functionality mpsafe too. IFF_OACTIVE is part of if_flags. there are two problems with that. firstly, if_flags is a short and we dont have any MI atomic operations to manipulate a short. secondly, while we could make the IFF_OACTIVE operations mpsafe, all changes to other flags would have to be made safe at the same time, otherwise a read-modify-write cycle on their updates could clobber the oactive change. instead, this moves the oactive mark into struct ifqueue and provides an API for changing it. there's ifq_set_oactive, ifq_clr_oactive, and ifq_is_oactive. these are modelled on ifsq_set_oactive, ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd. this diff includes changes to all the drivers manipulating IFF_OACTIVE to now use the ifq_{set,clr,is}_oactive API too. ok kettenis@ mpi@ jmatthew@ deraadt@
2015-11-21simplify ifq_deq_rollback by only having it unlock.David Gwynne
hfsc needed a rollback ifqop to requeue the mbuf because it used ml_dequeue in the begin op. now it uses MBUF_LIST_FIRST to get a ref to the first mbuf in deq_begin. now the disciplines dont need a rollback op, so ifq_deq_rollback can be simplified to just releasing the mutex. based on a discussion with kenjiro cho
2015-11-20Keep if_ref() private, if_get() is what you want to use before if_put().Martin Pieuchot
The thread detaching an interface will sleep until all references to this interface have been released. So we decided to only keep references for a short period of time. Keeping if_ref() private will hopefully help preserve this goal as long as it makes sense. Calling if_get()/if_put() in the same function also allows us to make use of static analysis tools (thanks jsg@!) to catch our errors. ok dlg@
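The short-lived reference pattern described here, where a get and its matching put sit in the same function, might look like the following userland mock. The table, the struct, and the `mock_` functions are assumptions made for illustration only; they do not reproduce the kernel's if_get()/if_put() implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_NIFS 4

/* Hypothetical interface descriptor with a plain reference counter. */
struct mock_ifnet {
	int attached;
	unsigned int refcnt;
	int mtu;
};

static struct mock_ifnet mock_ifs[MOCK_NIFS];

/* Look an interface up by index and take a reference on it. */
static struct mock_ifnet *
mock_if_get(unsigned int index)
{
	if (index >= MOCK_NIFS || !mock_ifs[index].attached)
		return NULL;
	mock_ifs[index].refcnt++;
	return &mock_ifs[index];
}

/* Release a reference taken by mock_if_get(). */
static void
mock_if_put(struct mock_ifnet *ifp)
{
	if (ifp == NULL)
		return;
	assert(ifp->refcnt > 0);
	ifp->refcnt--;
}

/* Get and put in the same function: the reference never outlives the call. */
static int
mock_if_mtu(unsigned int index)
{
	struct mock_ifnet *ifp = mock_if_get(index);
	int mtu;

	if (ifp == NULL)
		return -1;
	mtu = ifp->mtu;
	mock_if_put(ifp);
	return mtu;
}

/* Returns 0 when lookups succeed and no reference is leaked. */
static int
ifref_demo(void)
{
	mock_ifs[1].attached = 1;
	mock_ifs[1].mtu = 1500;
	if (mock_if_mtu(1) != 1500 || mock_ifs[1].refcnt != 0)
		return 1;
	if (mock_if_mtu(2) != -1)	/* not attached */
		return 2;
	return 0;
}
```

Keeping both calls in one function is also what makes the pairing easy for a static analyser to verify.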
2015-11-20i made a mistake. rename ifq_enq and ifq_deq to ifq_enqueue and ifq_dequeueDavid Gwynne
fixing it now before i regret it more.
2015-11-20fix prio KASSERT, it should be <= not <. ok dlg@Stuart Henderson
2015-11-20shuffle struct ifqueue so in flight mbufs are protected by a mutex.David Gwynne
the code is refactored so the IFQ macros call newly implemented ifq functions. the ifq code is split so each discipline (priq and hfsc in our case) is an opaque set of operations that the common ifq code can call. the common code does the locking, accounting (ifq_len manipulation), and freeing of the mbuf if the disciplines enqueue function rejects it. theyre kind of like bufqs in the block layer with their fifo and nscan disciplines. the new api also supports atomic switching of disciplines at runtime. the hfsc setup in pf_ioctl.c has been tweaked to build a complete hfsc_if structure which it attaches to the send queue in a single operation, rather than attaching to the interface up front and building up a list of queues. the send queue is now mutexed, which raises the expectation that packets can be enqueued or purged on one cpu while another cpu is dequeueing them in a driver for transmission. a lot of drivers use IFQ_POLL to peek at an mbuf and attempt to fit it on the ring before committing to it with a later IFQ_DEQUEUE operation. if the mbuf gets freed in between the POLL and DEQUEUE operations, fireworks will ensue. to avoid this, the ifq api introduces ifq_deq_begin, ifq_deq_rollback, and ifq_deq_commit. ifq_deq_begin allows a driver to take the ifq mutex and get a reference to the mbuf they wish to try and tx. if there's space, they can ifq_deq_commit it to remove the mbuf and release the mutex. if there's no space, ifq_deq_rollback simply releases the mutex. this api was developed to make updating the drivers using IFQ_POLL easy, instead of having to do significant semantic changes to avoid POLL that we cannot test on all the hardware. the common code has been tested pretty hard, and all the driver modifications are straightforward except for de(4). if that breaks it can be dealt with later. ok mpi@ jmatthew@
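The begin/commit/rollback pattern described above can be illustrated with a small userland mock. Everything here is hypothetical: the `mock_` names, the packet struct, and the `locked` flag that stands in for the ifq mutex. It only shows the shape of a driver start loop that peeks at a packet and decides whether it fits.

```c
#include <assert.h>
#include <stddef.h>

struct mock_pkt {
	struct mock_pkt *next;
	int len;			/* ring slots this packet needs */
};

struct mock_ifq {
	struct mock_pkt *head, **tail;
	unsigned int len;
	int locked;			/* stands in for the ifq mutex */
};

static void
mock_ifq_init(struct mock_ifq *q)
{
	q->head = NULL;
	q->tail = &q->head;
	q->len = 0;
	q->locked = 0;
}

static void
mock_ifq_enqueue(struct mock_ifq *q, struct mock_pkt *p)
{
	assert(!q->locked);
	p->next = NULL;
	*q->tail = p;
	q->tail = &p->next;
	q->len++;
}

/* Take the lock and return the first packet; the lock is dropped if empty. */
static struct mock_pkt *
mock_deq_begin(struct mock_ifq *q)
{
	assert(!q->locked);
	if (q->head == NULL)
		return NULL;
	q->locked = 1;
	return q->head;
}

/* The packet fits: remove it from the queue and release the lock. */
static void
mock_deq_commit(struct mock_ifq *q, struct mock_pkt *p)
{
	assert(q->locked && q->head == p);
	q->head = p->next;
	if (q->head == NULL)
		q->tail = &q->head;
	q->len--;
	q->locked = 0;
}

/* No space: leave the packet where it is and just release the lock. */
static void
mock_deq_rollback(struct mock_ifq *q, struct mock_pkt *p)
{
	assert(q->locked && q->head == p);
	q->locked = 0;
}

/* Driver-style start loop; returns the number of packets put on the ring. */
static int
mock_start(struct mock_ifq *q, int ring_space)
{
	struct mock_pkt *p;
	int sent = 0;

	while ((p = mock_deq_begin(q)) != NULL) {
		if (p->len > ring_space) {
			mock_deq_rollback(q, p);
			break;
		}
		mock_deq_commit(q, p);
		ring_space -= p->len;
		sent++;
	}
	return sent;
}

/* Two packets, room for one: encodes (sent * 10 + packets left queued). */
static int
deq_demo(void)
{
	struct mock_ifq q;
	struct mock_pkt a = { NULL, 3 }, b = { NULL, 5 };

	mock_ifq_init(&q);
	mock_ifq_enqueue(&q, &a);
	mock_ifq_enqueue(&q, &b);
	return mock_start(&q, 4) * 10 + (int)q.len;
}
```

Because the packet stays on the queue while the lock is held, nothing can free it between the peek and the decision, which is the hazard the commit describes for POLL followed by DEQUEUE.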
2015-11-18Factorize the bits to check if a L2 route is connected, whether it isMartin Pieuchot
attached to a carp(4) or bridge(4) member, to not dereference rt_ifp directly. ok visa@
2015-11-13Store the index of the interface used for revarp instead of a pointer toMartin Pieuchot
its descriptor. Get rid of a if_ref(). ok dlg@
2015-11-11Store the index of the lo0 interface instead of a pointer to itsMartin Pieuchot
descriptor. Allow to get rid of two if_ref() in the output paths. ok dlg@
2015-11-07Use input handlers for bridge(4).Martin Pieuchot
This allows more flexible configurations with vlan(4) and bridge(4) on top of the same physical interface. In particular it allows to not feed VLAN tagged packets into a bridge(4). Fix regression reported by Armin Wolfermann on bugs@, ok dlg@
2015-11-06Rename rt_mpath_next() into rtable_mpath_next() and provide anMartin Pieuchot
implementation for ART based on the singly-linked list of route entries.
2015-11-03Do not clear M_PROTO1 flag before calling if_start() because pseudo-Martin Pieuchot
drivers, like vlan(4), call if_enqueue() in their *start function. Prevent an infinite recursion reported by Armin Wolfermann on bugs@.
2015-11-02Merge rtable_mpath_match() into rtable_lookup().Martin Pieuchot
ok bluhm@
2015-10-28Remove linkmtu and maxmtu from struct nd_ifinfo. IN6_LINKMTU can nowFlorian Obser
die and ifp->if_mtu is the one true mtu. Suggested by and OK mpi@
2015-10-27Use rt_ifidx rather than rt_ifp.Martin Pieuchot
ok bluhm@
2015-10-25unbreak tree for ramdisks without INET6Theo de Raadt
2015-10-25Do not overwrite if_rtrequest() if the driver specified it *before*Martin Pieuchot
calling if_attach().
2015-10-25arp_ifinit() is no longer required.Martin Pieuchot
2015-10-25Introduce if_rtrequest() the successor of ifa_rtrequest().Martin Pieuchot
L2 resolution depends on the protocol (encoded in the route entry) and an ``ifp''. Not having to care about an ``ifa'' makes our life easier in our MP effort. Fewer dependencies between data structures implies fewer headaches. Discussed with bluhm@, ok claudio@
2015-10-24Add pair(4), a vether-based virtual Ethernet driver to interconnectReyk Floeter
rdomains and bridges on the local system. This can be used to route through local rdomains, to create L2 devices (like trunks) between them, and many other things. Discussed with many, with input from mpi@ OK sthen@ phessler@ yasuoka@ mikeb@
2015-10-22Kill link_rtrequest(), introduced in 1990 to "fix" the resultMartin Pieuchot
of rt_getifa() when adding link level route from outside the kernel. ok claudio@
2015-10-22Make sure that the address matching the key (destination) of a routeMartin Pieuchot
entry is attached to this entry. ok phessler@, bluhm@
2015-10-22Inspired by satosin(), use inline functions to convert sockaddr dl.Alexander Bluhm
Instead of casts they check whether the incoming object has the expected type. So introduce satosdl() and sdltosa() in the kernel. OK mpi@
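The benefit of inline conversion functions over bare casts is that the compiler checks the argument type. A minimal sketch, assuming simplified mock structures rather than the real struct sockaddr and struct sockaddr_dl definitions (all `mock_` names and field layouts are hypothetical):

```c
#include <assert.h>

/* Simplified stand-ins for the generic and link-level socket addresses. */
struct mock_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
};

struct mock_sockaddr_dl {
	unsigned char sdl_len;
	unsigned char sdl_family;
	unsigned short sdl_index;
};

/* Only accepts a struct mock_sockaddr *; other pointer types are diagnosed. */
static inline struct mock_sockaddr_dl *
mock_satosdl(struct mock_sockaddr *sa)
{
	return (struct mock_sockaddr_dl *)sa;
}

/* Only accepts a struct mock_sockaddr_dl *. */
static inline struct mock_sockaddr *
mock_sdltosa(struct mock_sockaddr_dl *sdl)
{
	return (struct mock_sockaddr *)sdl;
}

/* Round trip through both helpers; returns the interface index stored. */
static int
conv_demo(void)
{
	/* 18 is an arbitrary family value chosen for this sketch. */
	struct mock_sockaddr_dl sdl = { sizeof(sdl), 18, 3 };
	struct mock_sockaddr *sa = mock_sdltosa(&sdl);

	/* mock_satosdl(&sdl) would draw a compiler diagnostic here. */
	return mock_satosdl(sa)->sdl_index;
}
```

A macro that casts would silently accept any pointer, so passing the wrong object would go unnoticed until runtime.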
2015-10-22Do not dereference ``ifa_ifp'' when we already have an ``ifp'' pointer.Martin Pieuchot
2015-10-12the pattr argument to IFQ_ENQUEUE is unused, so let's get rid of it.David Gwynne
also remove the comment above IFQ_ENQUEUE that says the pattr argument is unused. ok mpi@
2015-10-12Unify link state change notification.Martin Pieuchot
ok mikeb@
2015-10-12protect SIOCSLIFPHYTTL, SIOCSVNETID so only root can call them, andDavid Gwynne
return EOPNOTSUPP for SIOCGLIFPHYTTL and SIOCGVNETID. all so drivers dont have to do these checks themselves. ok mikeb@ mpi@
2015-10-08Unlock the softnet task.Martin Pieuchot
ok dlg@, kettenis@
2015-10-05Revert if_oqdrops accounting changes done in kernel, per request from mpi@.Masao Uebayashi
(Especially adding IF_DROP() after IFQ_ENQUEUE() was completely wrong because IFQ_ENQUEUE() already does it. Oops.) After this revert, the situation becomes:
- if_snd.ifq_drops is incremented in either IFQ_ENQUEUE() or IF_DROP(), but it is not shown to userland, and
- if_data.ifi_oqdrops is shown to userland, but it is not incremented by anyone.
2015-10-05Count IFQ_ENQUEUE() failure as output drop.Masao Uebayashi
mpi@ prefers checking IFQ_ENQUEUE() error, and this matches that. OK dlg@