path: root/sys/net/if_var.h
Age  Commit message  (Author)
2021-02-20  add p2p_input, like ether_input but for l3 tunnel interfaces.  (David Gwynne)
the l3 protocol input to push the packet is based on a value in m->m_pkthdr.ph_family, which tunnel drivers should set before calling if_vinput. add p2p_bpf_mtap to call bpf_mtap_af also using m->m_pkthdr.ph_family.
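The dispatch described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the kernel code: `struct pkthdr_sketch`, `enum l3_input` and `p2p_input_sketch()` are hypothetical stand-ins, and the only idea taken from the commit message is that the layer 3 input is chosen from `ph_family`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for the layer 3 input paths. */
enum l3_input { L3_DROP, L3_IPV4, L3_IPV6 };

/* Hypothetical, cut-down packet header: only the family field. */
struct pkthdr_sketch {
	int ph_family;	/* set by the tunnel driver before input */
};

/* Pick the layer 3 input from the family the driver recorded. */
static enum l3_input
p2p_input_sketch(const struct pkthdr_sketch *ph)
{
	assert(ph != NULL);

	switch (ph->ph_family) {
	case AF_INET:
		return L3_IPV4;
	case AF_INET6:
		return L3_IPV6;
	default:
		/* unknown family: nothing sensible to hand the packet to */
		return L3_DROP;
	}
}
```

A tunnel driver in this sketch would fill in `ph_family` when it decapsulates a packet, then hand the packet to the common input path.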
2021-02-20  give interfaces an if_bpf_mtap handler.  (David Gwynne)
the network stack is now responsible for calling bpf for packets that the interface receives, and we so far got away with using bpf_mtap_ether for everything. this doesn't work if layer 3 input goes through the same functions, so letting drivers specify the appropriate bpf mtap function means they will be able to cope.
2020-07-29  Interface index is unsigned integer.  (mvs)
Fix the places where it is referenced as signed. u_int used within pipex(4) for consistency with other code. ok dlg@ mpi@
2020-07-24  Use interface index instead of pointer to `ifnet' in carp(4).  (mvs)
ok sashan@
2020-07-22  deprecate interface input handler lists, just use one input function.  (David Gwynne)
the interface input handler lists were originally set up to help us during the initial mpsafe network stack work. at the time not all the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc) were mpsafe, so we wanted a way to avoid them by default, and only take the kernel lock hit when they were specifically enabled on the interface. since then, they have been fixed up to be mpsafe. i could leave the list in place, but it has some semantic problems. because virtual interfaces filter packets based on the order they were attached to the parent interface, you can get packets taken away in surprising ways, especially when you reboot and netstart does something different to what you did by hand. by hardcoding the order that things like vlan and bridge get to look at packets, we can document the behaviour and get consistency. it also means we can get rid of a use of SRPs which were difficult to replace with SMRs. the interface input handler list is an SRPL, which we would like to deprecate. it turns out that you can sleep during stack processing, which you're not supposed to do with SRPs or SMRs, but SRPs are a lot more forgiving and it worked. lastly, it turns out that this code is faster than the input list handling, so lots of winning all around. special thanks to hrvoje popovski and aaron bieber for testing. this has been in snaps as part of a larger diff for over a week.
2020-07-10  Change users of IFQ_SET_MAXLEN() and IFQ_IS_EMPTY() to use the "new" API.  (Patrick Wildt)
ok dlg@ tobhe@
2020-07-10  Change users of IFQ_PURGE() to use the "new" API.  (Patrick Wildt)
ok dlg@ tobhe@
2020-07-10  Change users of IFQ_DEQUEUE(), IFQ_ENQUEUE() and IFQ_LEN() to use the "new" API.  (Patrick Wildt)
ok dlg@ tobhe@
2020-07-04  It's been agreed upon that global locks should be expressed using capital letters in locking annotations.  (anton)
Therefore harmonize the existing annotations. Also, if multiple locks are required they should be delimited using commas. ok mpi@
2020-05-12  Set timeout(9) to refill the receive ring descriptors if the number of descriptors runs below the low watermark.  (jan)
The em(4) firmware seems not to work properly with just a few descriptors in the receive ring. Thus, we use the low watermark as an indicator instead of zero descriptors, which causes deadlocks. ok kettenis@
2020-04-12  say if_pcount needs NET_LOCK instead of the kernel lock.  (David Gwynne)
if_pcount is only touched in ifpromisc(), and ifpromisc() needs NET_LOCK anyway because it also modifies if_flags. suggested by mpi@ ok visa@
2019-11-08  convert interface address change hooks to tasks and a task_list.  (David Gwynne)
this follows what's been done for detach and link state hooks, and makes handling of hooks generally more robust. address hooks are a bit different to detach/link state hooks in that there's only a few things that register hooks (carp, pf, vxlan), but a lot of places to run the hooks (lots of ipv4 and ipv6 address configuration). an address hook cookie was in struct pfi_kif, which is part of the pf abi. rather than break pfctl -sI, this maintains the void * used for the cookie and uses it to store a task, which is then used as intended with the new api.
2019-11-07  turn the linkstate hooks into a task list, like the detach hooks.  (David Gwynne)
this is largely mechanical, except for carp. this moves the addition of the carp link state hook after we're committed to using the new interface as a carpdev. because the add can't fail, we avoid a complicated unwind dance. also, this tweaks the carp linkstate hook so it only updates the relevant carp interface, not all of the carpdevs on the parent. hrvoje popovski has tested an early version of this diff and it's generally ok, but there's some splasserts that this diff fires that i'll fix in an upcoming diff. ok claudio@
2019-11-06  replace the hooks used with if_detachhooks with a task list.  (David Gwynne)
the main semantic change is that things registering detach hooks have to allocate and set a task structure that then gets added to the list. this means if the task is allocated up front (eg, as part of carp's softc or bridge's port structure), it avoids the possibility that adding a hook can fail. a lot of drivers weren't checking for failure, and unwinding state in the event of failure in other parts was error prone. while doing this i discovered that the list operations have to be in a particular order, but drivers weren't doing that consistently either. this diff wraps the list ops up so you have to seriously go out of your way to screw them up. i've also sprinkled some NET_ASSERT_LOCKED around the list operations so we can make sure there's no potential for the list to be corrupted, especially while it's being run. hrvoje popovski has tested this a bit, and some issues he discovered have been fixed. ok sashan@
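The reason a preallocated task makes hook registration infallible can be shown with a small userland sketch. Everything here is hypothetical (`struct hook_task`, `hook_set()`, `hook_add()`, `hook_del()`, `hooks_run()`, `count_hook()`); it uses the `TAILQ` macros from `<sys/queue.h>` and does not reproduce the kernel's task API, its list ordering, or its locking assertions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>	/* TAILQ_* list macros */

/* Hypothetical hook task: the caller owns the storage. */
struct hook_task {
	TAILQ_ENTRY(hook_task) t_entry;
	void (*t_func)(void *);
	void *t_arg;
};

TAILQ_HEAD(hook_list, hook_task);

/* Fill in a task that lives inside the caller's own structure. */
static void
hook_set(struct hook_task *t, void (*func)(void *), void *arg)
{
	t->t_func = func;
	t->t_arg = arg;
}

/* Adding only links the caller's task in; no allocation, so no failure. */
static void
hook_add(struct hook_list *list, struct hook_task *t)
{
	assert(t->t_func != NULL);
	TAILQ_INSERT_TAIL(list, t, t_entry);
}

static void
hook_del(struct hook_list *list, struct hook_task *t)
{
	TAILQ_REMOVE(list, t, t_entry);
}

/* Run every registered hook in list order. */
static void
hooks_run(struct hook_list *list)
{
	struct hook_task *t;

	TAILQ_FOREACH(t, list, t_entry)
		t->t_func(t->t_arg);
}

/* Example hook: bump the counter passed as the argument. */
static void
count_hook(void *arg)
{
	++*(int *)arg;
}
```

Because `hook_add()` returns nothing, a caller in this sketch has no error path to unwind, which is the property the commit message is after.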
2019-06-26  Create IF_WWAN_DEFAULT_PRIORITY which is lower than IF_WIRELESS_DEFAULT_PRIORITY and use it in umb(4) as default prio.  (Claudio Jeker)
OK kettenis@, sthen@
2019-04-28  Removes the KERNEL_LOCK() from bridge(4)'s output fast-path.  (Martin Pieuchot)
This redefines the ifp <-> bridge relationship. No lock can currently be used across the multiple contexts where the bridge has tentacles to protect a pointer, so use an interface index instead. Tested by various, ok dlg@, visa@
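Why an index is safer to hold than a pointer can be illustrated with a toy lookup table. This is a sketch under loud assumptions: `if_table`, `if_attach_sketch()`, `if_detach_sketch()` and `if_lookup_sketch()` are hypothetical, the table size is arbitrary, and the reference counting that a real lookup would need is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define IF_TABLE_SIZE 8	/* arbitrary size for the sketch */

struct ifnet_sketch {
	unsigned int if_index;
};

/* Hypothetical global index -> interface map. */
static struct ifnet_sketch *if_table[IF_TABLE_SIZE];

static void
if_attach_sketch(struct ifnet_sketch *ifp, unsigned int idx)
{
	assert(idx < IF_TABLE_SIZE);
	ifp->if_index = idx;
	if_table[idx] = ifp;
}

static void
if_detach_sketch(struct ifnet_sketch *ifp)
{
	if_table[ifp->if_index] = NULL;
}

/*
 * Resolve a stored index at the point of use. A detached interface
 * yields NULL instead of a dangling pointer.
 */
static struct ifnet_sketch *
if_lookup_sketch(unsigned int idx)
{
	return (idx < IF_TABLE_SIZE) ? if_table[idx] : NULL;
}
```

A consumer that stored only the index sees `NULL` after the interface goes away and can drop the packet, whereas a cached pointer would need a lock held across every context that might use it.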
2019-04-22  add if_vinput so pseudo (ethernet) interfaces can bypass ifiqs  (David Gwynne)
if_vinput assumes that the interface that it's called against uses per cpu counters so it can count input packets, but basically does all the things that if_input and ifiq_input do. the main difference is it assumes the network stack is already running and runs the interface input handlers directly. this is instead of queuing the packets for a nettq to run. ifiqs aren't free, especially when they only run per packet like they do on pseudo interfaces. this allows that overhead to be bypassed.
2019-04-19  provide factored out txhprio and rxhprio checks  (David Gwynne)
l2 and l3 drivers do the same thing all the time, so reduce the chance of error by doing the checks once and making them available for drivers to call instead of rolling their own again.
2019-04-16  have another go at tx mitigation  (David Gwynne)
the idea is to call the hardware transmit routine less since in a lot of cases posting a producer ring update to the chip is (very) expensive. it's better to do it for several packets instead of each packet, hence calling this tx mitigation. this diff defers the call to the transmit routine to a network taskq, or until a backlog of packets has built up. dragonflybsd uses 16 as the size of its backlog, so i'm copying them for now. i've tried this before, but previous versions caused deadlocks. i discovered that the deadlocks in the previous version were from ifq_barrier calling taskq_barrier against the nettq. interfaces generally hold NET_LOCK while calling ifq_barrier, but the tq might already be waiting for the lock we hold. this version just doesn't have ifq_barrier call taskq_barrier. it instead relies on the IFF_RUNNING flag and normal ifq serialiser barrier to guarantee the start routine won't be called when an interface is going down. the taskq_barrier is only used during interface destruction to make sure the task struct won't get used in the future, which is already done without the NET_LOCK being held. tx mitigation provides a nice performance bump in some setups. up to 25% in some cases. tested by tb@ and hrvoje popovski (who's running this in production). ok visa@
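The deferral policy described above can be modelled in userland as follows. This is a rough sketch, not the ifq code: `struct txq_sketch` and the `txq_*` functions are hypothetical, the "task" is just a flag that a caller fires by hand, and only the backlog value of 16 is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define TX_BACKLOG 16	/* backlog size named in the commit message */

/* Hypothetical transmit queue state. */
struct txq_sketch {
	unsigned int len;	/* packets waiting to be posted */
	unsigned int starts;	/* times the start routine ran */
	int task_pending;	/* a deferred start has been scheduled */
};

/* Stand-in for the expensive hardware start routine: drain everything. */
static void
txq_start(struct txq_sketch *q)
{
	q->starts++;
	q->len = 0;
	q->task_pending = 0;
}

/* Queue one packet; start now only if the backlog has built up. */
static void
txq_enqueue(struct txq_sketch *q)
{
	assert(q != NULL);

	q->len++;
	if (q->len >= TX_BACKLOG)
		txq_start(q);
	else
		q->task_pending = 1;	/* leave it to the deferred task */
}

/* Stand-in for the deferred task running later on a taskq. */
static void
txq_task(struct txq_sketch *q)
{
	if (q->task_pending)
		txq_start(q);
}
```

In this model sixteen back-to-back packets cost one start call instead of sixteen, and a short burst is still flushed when the deferred task runs.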
2019-03-31  Document that it is safe to dereference `if_softc' when the caller has a valid reference to the corresponding `ifp'.  (Martin Pieuchot)
ok visa@
2019-01-09  split if_enqueue up so drivers can replace ifq handling if needed  (David Gwynne)
if_enqueue() still makes sure packets get handled by pf on the way out, and seen by bridge if needed. however instead of falling through to ifq mapping and output, it now calls a function pointer in the ifnet struct. that pointer defaults to the ifq handling, but drivers can override it to bypass ifq processing. the most obvious users of the function pointer will be virtual interfaces, eg, vlan(4). ifqs are good if you need to serialise access to the thing that transmits packets (like hardware rings on nics), or mitigate the number of times you do ring processing, but neither of those things are desirable on vlan interfaces. ideally vlan could transmit on any cpu without having packets serialised by its own ifq before being pushed down to an arbitrary number of rings on the parent interface. bypassing ifqs means the driver can push the vlan tag on concurrently and push down to the parent from any cpu. ok mpi@ no objection from claudio@
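The default-plus-override arrangement can be sketched with a plain function pointer. The names below (`struct ifnet_sketch`, `enqueue_ifq()`, `enqueue_direct()`, `if_attach_sketch()`, `if_enqueue_sketch()`) are hypothetical and a packet is reduced to an `int`; the pf and bridge steps the commit message mentions are only marked by a comment.

```c
#include <assert.h>
#include <stddef.h>

struct ifnet_sketch;
typedef int (*enqueue_fn)(struct ifnet_sketch *, int);

/* Hypothetical, cut-down interface structure. */
struct ifnet_sketch {
	enqueue_fn if_enqueue;	/* NULL until attach picks a default */
	unsigned int ifq_len;	/* packets handed to the queue path */
	unsigned int direct;	/* packets sent without queueing */
};

/* Default path: serialise through the interface's queue. */
static int
enqueue_ifq(struct ifnet_sketch *ifp, int pkt)
{
	(void)pkt;
	ifp->ifq_len++;
	return 0;
}

/* Override a virtual interface might install: no queue involved. */
static int
enqueue_direct(struct ifnet_sketch *ifp, int pkt)
{
	(void)pkt;
	ifp->direct++;
	return 0;
}

/* Attach keeps a driver-supplied handler, otherwise uses the default. */
static void
if_attach_sketch(struct ifnet_sketch *ifp)
{
	if (ifp->if_enqueue == NULL)
		ifp->if_enqueue = enqueue_ifq;
}

static int
if_enqueue_sketch(struct ifnet_sketch *ifp, int pkt)
{
	assert(ifp->if_enqueue != NULL);
	/* common work (pf, bridge) would happen here for every interface */
	return ifp->if_enqueue(ifp, pkt);
}
```

The common entry point stays the same for every caller; only the last step differs per interface.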
2018-12-20  Make this not hz dependent by using timeout_add_sec().  (Claudio Jeker)
Also rename the define to IFNET_SLOWTIMO since it is no longer a hz divisor. OK visa@ bluhm@ kn@
2018-12-19  get rid of a prototype for if_enqueue_try()  (David Gwynne)
it isn't implemented, and is never called.
2018-12-11  add optional per-cpu counters for interface stats.  (David Gwynne)
these exist so interfaces that want to do mpsafe work outside the ifq machinery have a place to allocate and update stats in. the generic ioctl handling for getting stats to userland knows how to roll the new per cpu stats into the rest before export. ok visa@
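Rolling per-cpu counters into one set of totals at export time might look like the following. It is a sketch only: the struct, the fixed cpu count and both function names are hypothetical, and a real implementation would also need a way to read each cpu's counters consistently while they are being updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH 4	/* arbitrary cpu count for the sketch */

/* Hypothetical per-cpu slice of the interface statistics. */
struct if_counters_sketch {
	uint64_t ipackets;
	uint64_t ibytes;
};

/* Each cpu updates only its own slot, so updates never contend. */
static void
counters_input(struct if_counters_sketch *percpu, unsigned int cpu,
    uint64_t bytes)
{
	assert(cpu < NCPU_SKETCH);
	percpu[cpu].ipackets++;
	percpu[cpu].ibytes += bytes;
}

/* Sum every cpu's slot into the totals handed to userland. */
static struct if_counters_sketch
counters_read(const struct if_counters_sketch *percpu)
{
	struct if_counters_sketch total = { 0, 0 };
	unsigned int cpu;

	for (cpu = 0; cpu < NCPU_SKETCH; cpu++) {
		total.ipackets += percpu[cpu].ipackets;
		total.ibytes += percpu[cpu].ibytes;
	}
	return total;
}
```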
2018-09-10  - if_cloners list is populated at boot time only and then becomes immutable, so we can let go of if_cloners_lock.  (Alexandr Nedvedicky)
OK tb@, claudio@, bluhm@, kn@, henning@
2018-01-10  get rid of struct carp_if by moving the srpl into struct ifnet if_carp.  (David Gwynne)
currently carp uses a struct carp_if to hold an srp list head, which is accessed by both if_carp in struct ifnet, and via the if input handlers list. this gets rid of some indirection by making if_carp itself the list head, rather than a pointer to the list head via a struct carp_if. it also makes accessing the list consistent by only using if_carp to get to it. ok mpi@
2018-01-08  Convert IF_CLONE_INITIALIZER() into C99 initializer.  (Alexander Bluhm)
OK mpi@
2018-01-04  Include timeout & tasks in 'struct ifnet' instead of always allocating them as M_TEMP.  (Martin Pieuchot)
ok visa@
2018-01-02  Move the NET_LOCK() inside the switch and start documenting which field is protected by which lock.  (Martin Pieuchot)
ok bluhm@, visa@
2017-12-15  add ifiqueues for mp safety and nics with multiple rx rings.  (David Gwynne)
currently there is a single mbuf_queue per interface, which all rings on a nic shove packets onto. while the list inside this queue is protected by a mutex, the counters around it (ie, ipackets, ibytes, idrops) are not. this means updates can be lost, and reading the statistics is also inconsistent. having a single queue means that busy rx rings can dominate and then starve the others. ifiqueue structs are like ifqueue structs. they provide per ring queues, and independent counters for each ring. when ifdata is read for userland, these counters are aggregated. having a queue per ring now allows for per ring backpressure to be applied. MCLGETI will have its day again. right now we assume every interface wants an input queue and unconditionally provide one. individual interfaces can opt into more. i'm not completely happy about the shape of this atm, but shuffling it around more makes the diff bigger. ok visa@
2017-11-17  add if_rxr_livelocked so rxr users can request backpressure themselves.  (David Gwynne)
right now the rx ring moderation code makes a decision globally that a machine is livelocked, and uses that to apply backpressure on all the rx rings. we're moving toward having the network stack run on multiple cpus, and fed from multiple rx rings. if_rxr_livelocked lets a driver apply backpressure explicitly if something tells it that whatever is consuming previous packets cannot keep up. while here expose the current ring watermark with if_rxr_cwm. tweaks and ok visa@
2017-10-31  - add one more softnet taskq  (Alexandr Nedvedicky)
NOTE: code still runs with a single softnet task. change the definition of SOFTNET_TASKS in net/if.c if you want to have more than one softnet task. OK mpi@, OK phessler@
2017-10-12  Move sysctl_mq() where it can safely mess with mbuf queue internals.  (Martin Pieuchot)
ok visa@, bluhm@, deraadt@
2017-05-08  Added initial IPv6 multicast routing support for multiple rdomains:  (Rafael Zalamena)
* don't share mifs (multicast interface) between rdomains
* allow multiple routing sockets connected at the same time if they are in different rdomains.
ok bluhm@
2017-01-24  add support for multiple transmit ifqueues per network interface.  (David Gwynne)
an ifq to transmit a packet is picked by the current traffic conditioner (ie, priq or hfsc) by providing an index into an array of ifqs. by default interfaces get a single ifq but can ask for more using if_attach_queues(). the vast majority of our drivers still think there's a 1:1 mapping between interfaces and transmit queues, so their if_start routines take an ifnet pointer instead of a pointer to the ifqueue struct. instead of changing all the drivers in the tree, drivers can opt into using an if_qstart routine and setting the IFXF_MPSAFE flag. the stack provides a compatibility wrapper from the new if_qstart handler to the previous if_start handlers if IFXF_MPSAFE isn't set. enabling hfsc on an interface configures it to transmit everything through the first ifq. any other ifqs are left configured as priq, but unused, when hfsc is enabled. getting this in now so everyone can kick the tyres. ok mpi@ visa@ (who provided some tweaks for cnmac).
2017-01-21  Make the if_flags member unsigned.  (Patrick Wildt)
This was prompted by clang complaining that assigning the MULTICAST flag, which sets the uppermost bit, would invert the meaning of the MULTICAST flag's numeric value. ok claudio@ deraadt@ tom@ visa@
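The problem with a top-bit flag in a signed field can be shown with a short sketch. The value `0x8000` and the 16-bit width are assumptions made for illustration (the commit message only says the flag sets the uppermost bit), and `IFF_MULTICAST_SKETCH`, `fits_signed_short()`, `set_flag()` and `has_multicast()` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* Assumed for the sketch: the flag is the top bit of a 16-bit field. */
#define IFF_MULTICAST_SKETCH 0x8000u

/* Nonzero if the value is representable in a signed short. */
static int
fits_signed_short(unsigned int v)
{
	return v <= (unsigned int)SHRT_MAX;
}

/* Set a flag in an unsigned field, where every bit is a plain bit. */
static void
set_flag(unsigned short *flags, unsigned int flag)
{
	assert(flag <= USHRT_MAX);
	*flags |= (unsigned short)flag;
}

static int
has_multicast(unsigned short flags)
{
	return (flags & IFF_MULTICAST_SKETCH) != 0;
}
```

On platforms where `short` is 16 bits wide, `fits_signed_short(IFF_MULTICAST_SKETCH)` is false, so storing the flag in a signed field changes its numeric value; the unsigned field keeps it as written.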
2017-01-06  Remove the global viftable vector that holds the virtual interfaces configuration and instead use ifnet to store the configuration and counters.  (Rafael Zalamena)
With this we can safely use multicast routing daemons on multiple domains without vif id collisions. ok mpi@
2016-11-14  Automatically create a default lo(4) interface per rdomain.  (Martin Pieuchot)
In order to stop abusing lo0 for all rdomains, a new loopback interface will be created every time a rdomain is created. The unit number will be the same as the rdomain, i.e. lo1 will be attached to rdomain 1. If this loopback interface is already in use it won't be possible to create the corresponding rdomain. In order to know which lo(4) interface is attached to a rdomain, its index is stored in the rtable/rdomain map. This is long overdue since the introduction of rtable/rdomain. It also fixes a recent regression due to resetting the rdomain of an incoming packet reported by semarie@, Andreas Bartelt and Nils Frohberg. ok claudio@
2016-11-08  RIP ifa_ifwithnet()  (Martin Pieuchot)
ok vgross@
2016-09-04  When auto-creating an interface when opening a /dev/{tun,tap,switch} device, inherit the rdomain from the calling process.  (Reyk Floeter)
This adds an rdomain argument to if_clone_create(). OK mpi@ henning@
2016-09-03  Use per-ifp tasks to process incoming packets.  (Martin Pieuchot)
Reduce the number of if_get/if_put from one per packet to one per ring since we now know that all the packets are coming from the same interface. Improve forwarding performance by 10Kpps in Hrvoje Popovski's test setup. ok bluhm@, henning@, dlg@
2016-09-01  Import switch(4), an in-kernel OpenFlow switch which can work alone.  (Kazuya Goda)
switch(4) currently supports OpenFlow 1.3.5. Currently, it's disabled by the kernel config. With help from yasuoka@ reyk@ jsg@. ok deraadt@ yasuoka@ reyk@ henning@
2016-06-10  Add the "llprio" field to struct ifnet, and the corresponding keyword to ifconfig.  (Vincent Gross)
"llprio" allows one to set the priority of packets that do not go through pf(4), as is the case for arp(4) or bpf(4). ok sthen@ mikeb@
2016-04-15  remove ml_filter, mq_filter, niq_filter.  (David Gwynne)
they're currently unused, so no functional change.
2016-04-13  We're always ready! So send IFQ_SET_READY() to the bitbucket.  (Martin Pieuchot)
2015-12-18  Remove leftover prototype.  (Visa Hankala)
ok mpi@
2015-12-09  Keep all ether prototypes in one place.  (Martin Pieuchot)
2015-12-09  rework the if_start mpsafe serialisation so it can serialise arbitrary work  (David Gwynne)
work is represented by struct task. the start routine is now wrapped by a task which is serialised by the infrastructure. if_start_barrier has been renamed to ifq_barrier and is now implemented as a task that gets serialised with the start routine. this also adds an ifq_restart() function. it serialises a call to ifq_clr_oactive and calls the start routine again. it exists to avoid a race that kettenis@ identified in between when a start routine discovers there's no space left on a ring, and when it calls ifq_set_oactive. if the txeof side of the driver empties the ring and calls ifq_clr_oactive in between the above calls in start, the queue will be marked oactive and the stack will never call the start routine again. by serialising the ifq_set_oactive call in the start routine and ifq_clr_oactive calls we avoid that race. tested on various nics. ok mpi@
2015-12-08  if_stop is unused, so kill it.  (David Gwynne)
ok mpi@
2015-12-08  split the interface send queue (struct ifqueue) implementation out.  (David Gwynne)
the intention is to make it more clear what belongs to a transmit queue and what belongs to an interface. suggested by and ok mpi@