Age | Commit message (Collapse) | Author |
|
currently carp uses a struct carp_if to hold an srp list head, which
is accessed by both if_carp in struct ifnet, and via the if input
handlers list.
this gets rid of some indirection by making if_carp itself the list
head, rather than a pointer to the list head via a struct carp_if.
it also makes accessing the list consistent by only using if_carp
to get to it.
ok mpi@
|
|
OK mpi@
|
|
them as M_TEMP.
ok visa@
|
|
is protected by which lock.
ok bluhm@, visa@
|
|
currently there is a single mbuf_queue per interface, which all
rings on a nic shove packets onto. while the list inside this queue
is protected by a mutex, the counters around it (ie, ipackets,
ibytes, idrops) are not. this means updates can be lost, and reading
the statistics is also inconsistent. having a single queue means
that busy rx rings can dominate and then starve the others.
ifiqueue structs are like ifqueue structs. they provide per ring
queues, and independent counters for each ring. when ifdata is read
for userland, these counters are aggregated. having a queue per
ring now allows for per ring backpressure to be applied. MCLGETI
will have it's day again.
right now we assume every interface wants an input queue and
unconditionally provide one. individual interfaces can opt into
more.
im not completely happy about the shape of this atm, but shuffling
it around more makes the diff bigger.
ok visa@
|
|
right now the rx ring moderation code makes a decision globally
that a machine is livelocked, and uses that to apply backpressure
on all the rx rings. we're moving toward having the network stack
run on multiple cpus, and fed from multiple rx rings. if_rxr_livelocked
lets a driver apply backpressure explicitely if something tells it
that whatever is consuming previous packets cannot keep up.
while here expose the current ring watermark with if_rxr_cwm.
tweaks and ok visa@
|
|
NOTE: code still runs with single softnet task. change definition of
SOFTNET_TASKS in net/if.c, if you want to have more than one softnet task
OK mpi@, OK phessler@
|
|
ok visa@, bluhm@, deraadt@
|
|
* don't share mifs (multicast interface) between rdomains
* allow multiple routing sockets connected at the same time if they are
in different rdomains.
ok bluhm@
|
|
an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().
the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatability wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.
enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.
getting this in now so everyone can kick the tyres.
ok mpi@ visa@ (who provided some tweaks for cnmac).
|
|
complaining that assigning the MULTICAST flag, which sets the
uppermost bit, would invert the meaning of MULTICAST flag's
numeric value.
ok claudio@ deraadt@ tom@ visa@
|
|
configuration and instead use ifnet to store the configuration and
counters. With this we can safely use multicast routing daemons on
multiple domains without vif id colisions.
ok mpi@
|
|
In order to stop abusing lo0 for all rdomains, a new loopback interface
will be created every time a rdomain is created. The unit number will
be the same as the rdomain, i.e. lo1 will be attached to rdomain 1.
If this loopback interface is already in use it wont be possible to create
the corresponding rdomain.
In order to know which lo(4) interface is attached to a rdomain, its index
is stored in the rtable/rdomain map.
This is a long overdue since the introduction of rtable/rdomain. It also
fixes a recent regression due to resetting the rdomain of an incoming
packet reported by semarie@, Andreas Bartelt and Nils Frohberg.
ok claudio@
|
|
ok vgross@
|
|
device, inherit the rdomain from the calling process. This adds an
rdomain argument to if_clone_create().
OK mpi@ henning@
|
|
Reduce the number of if_get/if_put from one per packet to one per ring
since we now know that all the packets are coming from the same interface.
Improve forwarding performances by 10Kpps in Hrvoje Popovski's test setup.
ok bluhm@, henning@, dlg@
|
|
switch(4) currently supports OpenFlow 1.3.5.
Currently, it's disabled by the kernel config.
With help from yasuoka@ reyk@ jsg@.
ok deraadt@ yasuoka@ reyk@ henning@
|
|
to ifconfig.
"llprio" allows one to set the priority of packets that do not go through
pf(4), as the case is for arp(4) or bpf(4).
ok sthen@ mikeb@
|
|
theyre currently unused, so no functional change.
|
|
|
|
ok mpi@
|
|
|
|
work is represented by struct task.
the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.
this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.
by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.
tested on various nics
ok mpi@
|
|
ok mpi@
|
|
the intention is to make it more clear what belongs to a transmit
queue and what belongs to an interface.
suggested by and ok mpi@
|
|
<net/if_var.h> because some other operating systems have defines in
there.
ok jasper@
|
|
this avoids current recursion to pf_test() function. the change also
switches icmp_error()/icmp6_error() to use ip_send()/ip6_send() so
they are safe for PF.
The idea comes from Markus Friedl. bluhm, mikeb and mpi helped me
a lot to get it into shape.
OK bluhm@, mpi@
|
|
fallback to a SLIST.
ok dlg@, jasper@
|
|
existing start routines will still be called under the kernel lock
and at IPL_NET.
mpsafe start routines will be serialised so only one instance of
each interfaces function will be running in the kernel at any point
in time. this guarantees packets will be dequeued in order, and the
start routines dont have to lock against themselves because if_start
does it for them.
the code to do that is based on the scsi runqueue code.
this also provides an if_start_barrier() function that should wait
until any currently running instances of if_start have finished.
a driver can opt in to the mpsafe if_start call by doing the following:
1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)
to simplify the implementation the tx mitigation code has been removed.
tested by several
ok mpi@ jmatthew@
|
|
changing.
|
|
|
|
there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.
IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operates mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.
instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.
this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifsq_{set,clr_is}_oactive API too.
ok kettenis@ mpi@ jmatthew@ deraadt@
|
|
|
|
hfsc needed a rollback ifqop to requeue the mbuf because it used
ml_dequeue in the begin op. now it uses MBUF_LIST_FIRST to get a
ref to the first mbuf in deq_begin.
now the disciplines dont need a rollback op, so ifq_deq_rollback
can be simplified to just releasing the mutex.
based on a discussion with kenjiro cho
|
|
fixing it now before i regret it more.
|
|
the code is refactored so the IFQ macros call newly implemented ifq
functions. the ifq code is split so each discipline (priq and hfsc
in our case) is an opaque set of operations that the common ifq
code can call. the common code does the locking, accounting (ifq_len
manipulation), and freeing of the mbuf if the disciplines enqueue
function rejects it. theyre kind of like bufqs in the block layer
with their fifo and nscan disciplines.
the new api also supports atomic switching of disciplines at runtime.
the hfsc setup in pf_ioctl.c has been tweaked to build a complete
hfsc_if structure which it attaches to the send queue in a single
operation, rather than attaching to the interface up front and
building up a list of queues.
the send queue is now mutexed, which raises the expectation that
packets can be enqueued or purged on one cpu while another cpu is
dequeueing them in a driver for transmission. a lot of drivers use
IFQ_POLL to peek at an mbuf and attempt to fit it on the ring before
committing to it with a later IFQ_DEQUEUE operation. if the mbuf
gets freed in between the POLL and DEQUEUE operations, fireworks
will ensue.
to avoid this, the ifq api introduces ifq_deq_begin, ifq_deq_rollback,
and ifq_deq_commit. ifq_deq_begin allows a driver to take the ifq
mutex and get a reference to the mbuf they wish to try and tx. if
there's space, they can ifq_deq_commit it to remove the mbuf and
release the mutex. if there's no space, ifq_deq_rollback simply
releases the mutex. this api was developed to make updating the
drivers using IFQ_POLL easy, instead of having to do significant
semantic changes to avoid POLL that we cannot test on all the
hardware.
the common code has been tested pretty hard, and all the driver
modifications are straightforward except for de(4). if that breaks
it can be dealt with later.
ok mpi@ jmatthew@
|
|
attached to a carp(4) or bridge(4) member, to not dereference rt_ifp
directly.
ok visa@
|
|
descriptor.
Allow to get rid of two if_ref() in the output paths.
ok dlg@
|
|
L2 resolution depends on the protocol (encoded in the route entry) and
an ``ifp''. Not having to care about an ``ifa'' makes our life easier
in our MP effort. Fewer dependencies between data structures implies
fewer headaches.
Discussed with bluhm@, ok claudio@
|
|
rdomains and bridges on the local system. This can be used to route
through local rdomains, to create L2 devices (like trunks) between
them, and many other things.
Discussed with many, with input from mpi@
OK sthen@ phessler@ yasuoka@ mikeb@
|
|
of rt_getifa() when adding link level route from outside the
kernel.
ok claudio@
|
|
also the comment above IFQ_ENQUEUE that says the pattr argument is unused.
ok mpi@
|
|
Necessary bumps in Ports will be handled by sthen@.
OK mpi@ dlg@
|
|
this is done by moving to the refcnt api and using refcnt_finalize.
tested by Hrjove Popovski
ok mpi@
|
|
Instead of violating a layer of abstraction by keeping per pseudo-driver
informations in "struct ifnet", the port trunk is now passed as a cookie
to the interface input handler (ifih).
The time of per pseudo-driver hack in the network stack is over!
ok mikeb@
|
|
the mbuf in both the hfsc and priq error paths.
ok mikeb@ mpi@ claudio@ henning@
|
|
needs to see lo0 in the output path.
ok claudio@
|
|
context.
ok mpi@, claudio@
|
|
Use instead the RTF_LOCAL flag to loop local traffic back to the
corresponding protocol queue.
With this change rt_ifp is now always the same as rt_ifa->ifa_ifp.
ok claudio@
|
|
the protocol queues.
It basically does what looutput() was doing but having a generic
function will allow us to get rid of the loopback hack overwwritting
the rt_ifp field of RTF_LOCAL routes.
ok mikeb@, dlg@, claudio@
|