|
the srp_ref struct is used to track the location of the caller's
hazard pointer so later calls to srp_follow and srp_leave already
know what to clear. this in turn means most of the caveats around
using srps go away. specifically, you can now:
- switch cpus while holding an srp ref
- ie, you can sleep while holding an srp ref
- you can take and release srp refs in any order
the original intent was to simplify use of the api when dealing
with complicated data structures. the caller now no longer has to
track the location of the srp a value was fetched from, the srp_ref
effectively does that for you.
srp lists have been refactored to use srp_refs instead of srpl_iter
structs.
this is in preparation for using srps inside the ART code. ART is a
complicated data structure, and lookups require overlapping holds
of srp references.
ok mpi@ jmatthew@
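
A minimal single-threaded userland model of the idea described above, not the kernel implementation. Every toy_* name and TOY_NHAZARDS is hypothetical. The point it illustrates is that once the reference remembers which hazard slot it published into, references can be released in any order without the caller tracking which srp each value came from.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NHAZARDS 4	/* hypothetical number of hazard slots */

struct toy_srp { void *value; };		/* shared pointer being protected */
struct toy_hazard { void *value; };		/* one published hazard pointer */
struct toy_srp_ref { struct toy_hazard *hz; };	/* remembers the slot used */

static struct toy_hazard toy_hazards[TOY_NHAZARDS];

/*
 * Publish the srp's value in a free hazard slot and record that slot
 * in the ref. Toy simplification: a NULL value marks a free slot.
 */
static void *
toy_srp_enter(struct toy_srp_ref *sr, struct toy_srp *srp)
{
	for (size_t i = 0; i < TOY_NHAZARDS; i++) {
		if (toy_hazards[i].value == NULL) {
			toy_hazards[i].value = srp->value;
			sr->hz = &toy_hazards[i];
			return srp->value;
		}
	}
	sr->hz = NULL;
	return NULL;	/* toy model: out of slots */
}

/* The ref already knows which slot to clear; no srp argument needed. */
static void
toy_srp_leave(struct toy_srp_ref *sr)
{
	if (sr->hz != NULL) {
		sr->hz->value = NULL;
		sr->hz = NULL;
	}
}

/* Count the hazard slots that are currently published. */
static int
toy_hazards_in_use(void)
{
	int n = 0;

	for (size_t i = 0; i < TOY_NHAZARDS; i++)
		if (toy_hazards[i].value != NULL)
			n++;
	return n;
}
```

Taking two references and leaving them in the same order they were taken (rather than in reverse) works in this model because each toy_srp_leave() only touches the slot stored in its own ref.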
|
|
ok mpi@
|
|
hint.
ok kettenis@, deraadt@
|
|
Use a new task that runs holding the KERNEL_LOCK to execute mp-unsafe
code. Our current goal is to progressively move input functions to the
unlocked task.
This gives a small performance boost confirmed by Hrvoje Popovski's
IPv4 forwarding measurement:
    before:               after:
    send      receive     send      receive
    400kpps   400kpps     400kpps   400kpps
    500kpps   500kpps     500kpps   500kpps
    600kpps   600kpps     600kpps   600kpps
    650kpps   650kpps     650kpps   640kpps
    700kpps   700kpps     700kpps   700kpps
    720kpps   640kpps     720kpps   710kpps
    800kpps   640kpps     800kpps   650kpps
    1.4Mpps   570kpps     1.4Mpps   590kpps
    14Mpps    570kpps     14Mpps    590kpps
ok kettenis@, bluhm@, dlg@
|
|
detect this and bump ifq_congestion forward rather than claim the
system is congested for a long period of time.
ok mpi@ henning@ jmatthew@
|
|
ok mpi@ millert@
|
|
in the future this will subsume the individual vlandev, carpdev,
pppoedev, foodev options for things like vlan, carp, pppoe, etc.
inspired by vnetid
ok mpi@ jmatthew@
|
|
Remove "option COMPAT_LINUX" and everything directly tied to it from the
kernel and the corresponding man page documentation.
ok visa@ guenther@
|
|
work is represented by struct task.
the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.
this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified between when a start
routine discovers there's no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.
by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.
tested on various nics
ok mpi@
|
|
ok dlg@
|
|
the intention is to make it more clear what belongs to a transmit
queue and what belongs to an interface.
suggested by and ok mpi@
|
|
|
|
It is now safe to call if_enqueue() without holding the KERNEL_LOCK()
even on an interface that is part of a bridge(4).
ok dlg@, henning@, kettenis@
|
|
fallback to a SLIST.
ok dlg@, jasper@
|
|
required.
ok bluhm@ mpi@.
|
|
existing start routines will still be called under the kernel lock
and at IPL_NET.
mpsafe start routines will be serialised so only one instance of
each interface's start function will be running in the kernel at any
point in time. this guarantees packets will be dequeued in order, and
the start routines don't have to lock against themselves because
if_start does it for them.
the code to do that is based on the scsi runqueue code.
this also provides an if_start_barrier() function that should wait
until any currently running instances of if_start have finished.
a driver can opt in to the mpsafe if_start call by doing the following:
1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)
to simplify the implementation the tx mitigation code has been removed.
tested by several
ok mpi@ jmatthew@
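
One way to get the "only one instance at a time" guarantee described above, sketched in userland C11. This is an assumption-laden model and not the scsi runqueue-derived kernel code: struct toy_ifnet, toy_if_init(), toy_if_start() and toy_demo_start() are all hypothetical names. A counter records requests; whoever moves it from 0 becomes the runner, and everyone else just leaves a request behind for the runner to pick up.

```c
#include <assert.h>
#include <stdatomic.h>

struct toy_ifnet {
	atomic_uint serializer;		/* 0 = idle, >0 = running (+ pending) */
	void (*start)(struct toy_ifnet *);
	int runs;			/* how many times start ran */
	int depth, max_depth;		/* detects nested start calls */
};

static void
toy_if_init(struct toy_ifnet *ifp, void (*start)(struct toy_ifnet *))
{
	atomic_init(&ifp->serializer, 0);
	ifp->start = start;
	ifp->runs = ifp->depth = ifp->max_depth = 0;
}

/* Serialised entry point: callers never run start concurrently. */
static void
toy_if_start(struct toy_ifnet *ifp)
{
	/* Not the first requester: the current runner will rerun for us. */
	if (atomic_fetch_add(&ifp->serializer, 1) != 0)
		return;

	do {
		/* Collapse all requests seen so far into this one run. */
		atomic_store(&ifp->serializer, 1);
		ifp->start(ifp);
		/* Loop again if a request arrived while start was running. */
	} while (atomic_fetch_sub(&ifp->serializer, 1) != 1);
}

/* Demo start routine that requests another run while it is running. */
static void
toy_demo_start(struct toy_ifnet *ifp)
{
	if (++ifp->depth > ifp->max_depth)
		ifp->max_depth = ifp->depth;
	if (ifp->runs++ == 0)
		toy_if_start(ifp);	/* models a request from another cpu */
	ifp->depth--;
}
```

In this model the nested toy_if_start() call returns immediately and the outer loop performs the second run, so the start routine is never entered recursively.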
|
|
are not used anymore. This has to be done before any interface
fields become invalid.
As the route delete request cannot call if_get() anymore, pass down
the interface. Split rtrequest_delete() into a separate function
that may take an existing interface.
OK mpi@
|
|
them and they make everything so much harder with no gain. Remove the
ifp argument from mpls_input since it is not needed. On the input side
the lookup side is modified a bit when it comes to BOS handling.
Tested in a L3VPN setup with ldpd and bgpd. Committing now so we can move
on with cleaning up rt_ifp usage. If this breaks L2VPN I will fix it once
reported. OK mpi@
|
|
With input and ok mpi@.
|
|
with SRPs.
This is a simplified version of the dynamically sizeable array of
pointers used by if_get() because routing table heads are never
freed.
ok dlg@
|
|
there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.
IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we don't have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.
instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.
this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.
ok kettenis@ mpi@ jmatthew@ deraadt@
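
A rough userland sketch of why moving the mark out of if_flags helps: with its own word, the flag can be updated without a read-modify-write on bits shared with other flags. The toy_* names and the use of C11 atomics are assumptions for illustration; the kernel's struct ifqueue and its ifq_*_oactive functions may be implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's struct ifqueue. */
struct toy_ifqueue {
	atomic_uint ifq_oactive;	/* its own word, not a bit in if_flags */
};

/* Mark the transmit ring as full. */
static void
toy_ifq_set_oactive(struct toy_ifqueue *ifq)
{
	atomic_store(&ifq->ifq_oactive, 1);
}

/* Mark the transmit ring as having space again. */
static void
toy_ifq_clr_oactive(struct toy_ifqueue *ifq)
{
	atomic_store(&ifq->ifq_oactive, 0);
}

/* Report whether the ring is currently marked full. */
static int
toy_ifq_is_oactive(struct toy_ifqueue *ifq)
{
	return atomic_load(&ifq->ifq_oactive) != 0;
}
```

Because no other state lives in the same word, a concurrent change to an unrelated interface flag cannot clobber the oactive mark in this model.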
|
|
hfsc needed a rollback ifqop to requeue the mbuf because it used
ml_dequeue in the begin op. now it uses MBUF_LIST_FIRST to get a
ref to the first mbuf in deq_begin.
now the disciplines dont need a rollback op, so ifq_deq_rollback
can be simplified to just releasing the mutex.
based on a discussion with kenjiro cho
|
|
The thread detaching an interface will sleep until all references to this
interface have been released. So we decided to only keep references for
a short period of time.
Keeping if_ref() private will hopefully help preserve this goal as long
as it makes sense.
Calling if_get()/if_put() in the same function also allows us to make
use of static analysis tools (thanks jsg@!) to catch our errors.
ok dlg@
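
A small sketch of the get/put pairing described above, using a toy reference count in userland. The table, the toy_* functions and the mtu lookup are hypothetical; only the shape (take the reference, use the interface briefly, drop it in the same function) is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NIF 4	/* hypothetical size of the index table */

struct toy_ifnet {
	unsigned int if_index;
	int if_mtu;
	int refcnt;	/* references currently held by callers */
};

static struct toy_ifnet *toy_if_table[TOY_NIF];

/* Look up an interface by index and take a reference on it. */
static struct toy_ifnet *
toy_if_get(unsigned int idx)
{
	if (idx >= TOY_NIF || toy_if_table[idx] == NULL)
		return NULL;
	toy_if_table[idx]->refcnt++;
	return toy_if_table[idx];
}

/* Drop a reference; accepting NULL keeps error paths simple. */
static void
toy_if_put(struct toy_ifnet *ifp)
{
	if (ifp != NULL)
		ifp->refcnt--;
}

/* get and put live in the same function, so the hold is short. */
static int
toy_if_mtu(unsigned int idx)
{
	struct toy_ifnet *ifp;
	int mtu = -1;

	ifp = toy_if_get(idx);
	if (ifp != NULL)
		mtu = ifp->if_mtu;
	toy_if_put(ifp);

	return mtu;
}
```

Keeping both calls in one function is also what makes an unbalanced path easy to spot, whether by eye or by a static checker.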
|
|
fixing it now before i regret it more.
|
|
|
|
the code is refactored so the IFQ macros call newly implemented ifq
functions. the ifq code is split so each discipline (priq and hfsc
in our case) is an opaque set of operations that the common ifq
code can call. the common code does the locking, accounting (ifq_len
manipulation), and freeing of the mbuf if the discipline's enqueue
function rejects it. they're kind of like bufqs in the block layer
with their fifo and nscan disciplines.
the new api also supports atomic switching of disciplines at runtime.
the hfsc setup in pf_ioctl.c has been tweaked to build a complete
hfsc_if structure which it attaches to the send queue in a single
operation, rather than attaching to the interface up front and
building up a list of queues.
the send queue is now mutexed, which raises the expectation that
packets can be enqueued or purged on one cpu while another cpu is
dequeueing them in a driver for transmission. a lot of drivers use
IFQ_POLL to peek at an mbuf and attempt to fit it on the ring before
committing to it with a later IFQ_DEQUEUE operation. if the mbuf
gets freed in between the POLL and DEQUEUE operations, fireworks
will ensue.
to avoid this, the ifq api introduces ifq_deq_begin, ifq_deq_rollback,
and ifq_deq_commit. ifq_deq_begin allows a driver to take the ifq
mutex and get a reference to the mbuf they wish to try and tx. if
there's space, they can ifq_deq_commit it to remove the mbuf and
release the mutex. if there's no space, ifq_deq_rollback simply
releases the mutex. this api was developed to make updating the
drivers using IFQ_POLL easy, instead of having to do significant
semantic changes to avoid POLL that we cannot test on all the
hardware.
the common code has been tested pretty hard, and all the driver
modifications are straightforward except for de(4). if that breaks
it can be dealt with later.
ok mpi@ jmatthew@
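
The begin/commit/rollback pattern above can be modelled in a few lines of userland C. This is a single-threaded sketch under stated assumptions: the toy_* names are hypothetical, and a plain "locked" flag stands in for the ifq mutex so the locking discipline can be checked with assertions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mbuf {
	struct toy_mbuf *next;
	int len;
};

struct toy_ifq {
	struct toy_mbuf *head;
	unsigned int len;
	int locked;	/* stands in for the ifq mutex */
};

/* Take the "mutex" and peek at the first packet without removing it. */
static struct toy_mbuf *
toy_deq_begin(struct toy_ifq *ifq)
{
	assert(!ifq->locked);
	ifq->locked = 1;
	if (ifq->head == NULL)
		ifq->locked = 0;	/* nothing to send, nothing to hold */
	return ifq->head;
}

/* The packet fits: remove it from the queue and release the "mutex". */
static void
toy_deq_commit(struct toy_ifq *ifq, struct toy_mbuf *m)
{
	assert(ifq->locked && ifq->head == m);
	ifq->head = m->next;
	m->next = NULL;
	ifq->len--;
	ifq->locked = 0;
}

/* The packet does not fit: leave it queued and release the "mutex". */
static void
toy_deq_rollback(struct toy_ifq *ifq, struct toy_mbuf *m)
{
	assert(ifq->locked && ifq->head == m);
	ifq->locked = 0;
}

/* Driver-style start loop; returns how many packets reached the ring. */
static int
toy_start(struct toy_ifq *ifq, int *ring_free)
{
	struct toy_mbuf *m;
	int sent = 0;

	while ((m = toy_deq_begin(ifq)) != NULL) {
		if (*ring_free == 0) {
			toy_deq_rollback(ifq, m);
			break;
		}
		toy_deq_commit(ifq, m);
		(*ring_free)--;
		sent++;
	}
	return sent;
}
```

Between toy_deq_begin() and the matching commit or rollback the queue stays locked, which is what prevents another cpu from freeing the packet the driver is inspecting.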
|
|
attached to a carp(4) or bridge(4) member, to not dereference rt_ifp
directly.
ok visa@
|
|
its descriptor. Get rid of an if_ref().
ok dlg@
|
|
descriptor.
Allows getting rid of two if_ref() calls in the output paths.
ok dlg@
|
|
This allows more flexible configurations with vlan(4) and bridge(4) on
top of the same physical interface. In particular it allows not feeding
VLAN tagged packets into a bridge(4).
Fix regression reported by Armin Wolfermann on bugs@, ok dlg@
|
|
implementation for ART based on the singly-linked list of route
entries.
|
|
drivers, like vlan(4), call if_enqueue() in their *start function.
Prevent an infinite recursion reported by Armin Wolfermann on bugs@.
|
|
ok bluhm@
|
|
die and ifp->if_mtu is the one true mtu.
Suggested by and OK mpi@
|
|
ok bluhm@
|
|
|
|
calling if_attach().
|
|
|
|
L2 resolution depends on the protocol (encoded in the route entry) and
an ``ifp''. Not having to care about an ``ifa'' makes our life easier
in our MP effort. Fewer dependencies between data structures implies
fewer headaches.
Discussed with bluhm@, ok claudio@
|
|
rdomains and bridges on the local system. This can be used to route
through local rdomains, to create L2 devices (like trunks) between
them, and many other things.
Discussed with many, with input from mpi@
OK sthen@ phessler@ yasuoka@ mikeb@
|
|
of rt_getifa() when adding link level route from outside the
kernel.
ok claudio@
|
|
entry is attached to this entry.
ok phessler@, bluhm@
|
|
Instead of casts they check whether the incoming object has the
expected type. So introduce satosdl() and sdltosa() in the kernel.
OK mpi@
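
A sketch of what such type-checked converters might look like, assuming the "check" is the compiler's argument type check on an inline function. The toy_* structs are simplified stand-ins, not the real struct sockaddr and struct sockaddr_dl layouts, and toy_satosdl()/toy_sdltosa() are hypothetical names modelled on the ones in the commit message.

```c
#include <assert.h>

/* Simplified stand-ins for the real socket address structures. */
struct toy_sockaddr {
	unsigned char sa_len;
	unsigned char sa_family;
	char sa_data[14];
};

struct toy_sockaddr_dl {
	unsigned char sdl_len;
	unsigned char sdl_family;
	char sdl_data[14];
};

/*
 * Unlike a macro that casts whatever it is given, this only accepts a
 * struct toy_sockaddr *; passing another pointer type draws a
 * compiler diagnostic.
 */
static inline struct toy_sockaddr_dl *
toy_satosdl(struct toy_sockaddr *sa)
{
	return (struct toy_sockaddr_dl *)sa;
}

/* The reverse conversion, with the same compile-time argument check. */
static inline struct toy_sockaddr *
toy_sdltosa(struct toy_sockaddr_dl *sdl)
{
	return (struct toy_sockaddr *)sdl;
}
```

The conversions are still plain pointer casts at run time; what changes is that a mistaken argument type is reported at compile time instead of being silently cast away.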
|
|
|
|
also the comment above IFQ_ENQUEUE that says the pattr argument is unused.
ok mpi@
|
|
ok mikeb@
|
|
return EOPNOTSUPP for SIOCGLIFPHYTTL and SIOCGVNETID. all so drivers
don't have to do these checks themselves.
ok mikeb@ mpi@
|
|
ok dlg@, kettenis@
|
|
(Especially adding IF_DROP() after IFQ_ENQUEUE() was completely wrong because
IFQ_ENQUEUE() already does it. Oops.)
After this revert, the situation becomes:
- if_snd.ifq_drops is incremented in either IFQ_ENQUEUE() or IF_DROP(), but
it is not shown to userland, and
- if_data.ifi_oqdrops is shown to userland, but it is not incremented by
anyone.
|
|
mpi@ prefers checking IFQ_ENQUEUE() error, and this matches that.
OK dlg@
|