Age | Commit message | Author |
|
the SCHED_LOCK().
Putting a thread on a sleep queue is reduced to the following:
sleep_setup();
/* check condition or release lock */
sleep_finish();
Previous version ok cheloha@, jmatthew@, ok claudio@
|
|
this raises performance of tcpbench on an m3000 from ~3kpps and
~8MB/s to ~70kpps and ~191MB/s when transmitting, and ~10kpps and
~15MB/s to ~120kpps and ~174MB/s when receiving.
i also tested this on a v245 and an m4000 a while back.
|
|
OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
|
|
ok semarie@
|
|
|
|
myx is unusually minimal, so there's not a lot of information that
the chip provides. the most interesting is the number of packets
the chip drops cos of a lack of space on the rx rings.
|
|
ok dlg@ tobhe@
|
|
|
|
reading all 256 at a time was a nice idea, but meant page 0xa2 wasnt
appearing like it should. this follows what freebsd does more
closely too.
|
|
some modules seem to need more time when waiting for bytes, so
allow for that while here.
hrvoje popovski hit the endian issue
|
|
myx doesn't allow i2c writes, so you can only read whatever page
the firmware is already pointing at on device 0xa0. if you try to
read another page it will return ENXIO.
tested on a 10G-PCIE-8A-R with an xfp module.
|
|
the list of commands is going to grow, but the thought of keeping
the list in debug code up to date with it just makes me feel tired.
this prints the command id number instead in the same format we
represent it in the header.
|
|
pool_cache_init cannot be called during autoconf because we cant
be confident about the number of cpus in the machine until the first
run of attaches.
mountroot is after autoconf, and myx already has code that runs
there for the firmware loading.
discussed with deraadt@
|
|
this replaces individual calls to pool_init, pool_set_constraints, and
pool_sethardlimit with calls to m_pool_init. m_pool_init inits the
mbuf pools with the mbuf pool allocator, and because of that doesnt
set per pool limits.
ok bluhm@ as part of a larger diff
|
|
an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().
the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.
enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.
getting this in now so everyone can kick the tyres.
ok mpi@ visa@ (who provided some tweaks for cnmac).
|
|
this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.
ok mpi@ deraadt@
|
|
raise the mtu to 9380 bytes so we can take advantage of the extra space.
i need to revisit the macro names at some point.
|
|
my early revision board doesnt like it at all
|
|
now it asks the mbuf layer for the 9k from its pools.
a question from chris@ made me go look at the chip doco again and i
realised that the chip only requires 4 byte alignment for rx buffers,
no 4k alignment for jumbo buffers.
i also found that the chip is supposed to be able to rx up to 9400
bytes instead of 9000. ill fix that later though.
|
|
the ioff argument to pool_init() is unused and has been for many
years, so this replaces it with an ipl argument. because the ipl
will be set on init we no longer need pool_setipl.
most of these changes have been done with coccinelle using the spatch
below. cocci sucks at formatting code though, so i fixed that by hand.
the manpage and subr_pool.c bits i did myself.
ok tedu@ jmatthew@
@ipl@
expression pp;
expression ipl;
expression s, a, o, f, m, p;
@@
-pool_init(pp, s, a, o, f, m, p);
-pool_setipl(pp, ipl);
+pool_init(pp, s, a, ipl, f, m, p);
|
|
via unions, and we don't want to make it easy to control the target.
instead an integer index into an array of acceptable functions is used.
drivers using custom functions must register them to receive an index.
ok deraadt
|
|
|
|
|
|
similar to config_defer(9).
ok mikeb@, deraadt@
|
|
work is represented by struct task.
the start routine is now wrapped by a task which is serialised by the
infrastructure. if_start_barrier has been renamed to ifq_barrier and
is now implemented as a task that gets serialised with the start
routine.
this also adds an ifq_restart() function. it serialises a call to
ifq_clr_oactive and calls the start routine again. it exists to
avoid a race that kettenis@ identified in between when a start
routine discovers theres no space left on a ring, and when it calls
ifq_set_oactive. if the txeof side of the driver empties the ring
and calls ifq_clr_oactive in between the above calls in start, the
queue will be marked oactive and the stack will never call the start
routine again.
by serialising the ifq_set_oactive call in the start routine and
ifq_clr_oactive calls we avoid that race.
tested on various nics
ok mpi@
|
|
as per the stack commit, the driver changes are:
1. setting ifp->if_xflags = IFXF_MPSAFE
2. only calling if_start() instead of its own start routine
3. clearing IFF_RUNNING before calling if_start_barrier() on its way down
4. only using IFQ_DEQUEUE (not ifq_deq_begin/commit/rollback)
|
|
|
|
there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.
IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
operations mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.
instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.
this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.
ok kettenis@ mpi@ jmatthew@ deraadt@
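the api amounts to a word-sized flag hung off the ifqueue so plain
loads and stores are atomic on their own. a userland sketch of the
idea (the struct layout here is a stand-in, not the real struct
ifqueue):

```c
#include <assert.h>

/* stand-in for the relevant part of struct ifqueue */
struct ifqueue {
	unsigned int	ifq_oactive;	/* word-sized, unlike short if_flags */
};

static inline void
ifq_set_oactive(struct ifqueue *ifq)
{
	ifq->ifq_oactive = 1;
}

static inline void
ifq_clr_oactive(struct ifqueue *ifq)
{
	ifq->ifq_oactive = 0;
}

static inline int
ifq_is_oactive(struct ifqueue *ifq)
{
	return (ifq->ifq_oactive);
}
```

because the flag has a word to itself, setting or clearing it cant
clobber unrelated flag bits the way a read-modify-write on if_flags
could.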
|
|
turns out i was calculating the number of packets (not descriptors)
on the tx ring, and then using that as the free space for descriptors.
|
|
myx_start calculates the free space by reading the consumer index
and doing some maths, which lets us avoid the interlocked cpu ops.
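roughly, the maths is the usual power-of-two ring arithmetic. a
userland sketch (NDESC, the helper name, and the keep-one-slot-empty
convention are illustrative, not myx's actual code):

```c
#include <assert.h>

#define NDESC 512	/* hypothetical ring size; must be a power of two */

/*
 * Free descriptors between a producer and a consumer index on a
 * power-of-two ring. One slot is kept empty so prod == cons
 * unambiguously means "ring empty".
 */
static inline unsigned int
ring_free(unsigned int prod, unsigned int cons)
{
	return ((cons - prod - 1) & (NDESC - 1));
}
```

the unsigned subtraction does the right thing across index wrap, so
no branches and no interlocked ops are needed to work out the space.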
|
|
|
|
myx is unusual in that it has an explicit command to shut down the
chip that gets an interrupt when it's done. so myx_down sends the
command and has to sleep until it gets that interrupt. this moves
to using a single int to represent that state (so loads and stores
are atomic), and sleep_setup/sleep_finish in myx_down to wait for
it to change.
this has been running in production at work for a few months now
tested by chris@
|
|
|
|
instead of one per packet.
seems to let me send packets a little faster.
|
|
this removes the myx_buf structure and uses myx_slot instead. theyre
the same except slots dont have list entry structures, so theyre
smaller.
this cuts out four mutex ops per packet out of the tx handling.
just have to get rid of the atomic op per packet in myx_start now.
|
|
this lets me get rid of the locking around the refilling of the rx ring.
the timeout only runs refill if the rx ring is empty. we know rxeof
wont try and refill it in that situation because there's no packets
on the ring so we wont get interrupts for it. therefore we dont
need to lock between the timeout and rxeof cos they cant run at the
same time.
|
|
originally there were two mutex protected lists for rx packets, a
list of free packets, and a list of packets that were on the ring.
filling the ring popped packets off the free list, attached an mbuf
and dmamapped it, and pushed it onto the list of active packets.
the hw fills packets in order, so on rx completion we'd pop packets
off the active list, unmap the mbuf and shove it up the stack before
putting the packet on the free list.
the problem with the lists is that every rx ring operation resulted
in two mutex ops. so 4 mutex ops per packet after you do both fill
and rxeof.
this replaces the mutexed lists with rings that shadow the hardware
rings. filling the rx ring pushes a producer index along, while
rxeof chases it with a consumer. because we know only one thing can
do either of those tasks at a time, we can get away with not using
atomic ops for them.
there's more to be done, but this is a good first step.
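the shape of the shadow ring is a plain single-producer
single-consumer ring. a userland sketch (RX_RING_SIZE and the
function names are made up; the real slots hold dmamapped mbufs):

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SIZE 8	/* hypothetical; real rings are much bigger */

/* shadow ring: slots indexed the same way the hardware ring is */
struct rx_ring {
	void		*slots[RX_RING_SIZE];
	unsigned int	 prod;	/* advanced only by the fill path */
	unsigned int	 cons;	/* advanced only by the rxeof path */
};

/* fill path: push a buffer at the producer index */
static int
rx_fill(struct rx_ring *r, void *buf)
{
	if (r->prod - r->cons == RX_RING_SIZE)
		return (0);			/* ring is full */
	r->slots[r->prod % RX_RING_SIZE] = buf;
	r->prod++;
	return (1);
}

/* completion path: pop the oldest buffer at the consumer index */
static void *
rx_complete(struct rx_ring *r)
{
	void *buf;

	if (r->cons == r->prod)
		return (NULL);			/* ring is empty */
	buf = r->slots[r->cons % RX_RING_SIZE];
	r->cons++;
	return (buf);
}
```

because fill only writes prod and rxeof only writes cons, and only
one context runs each side at a time, neither index needs atomic
ops or a mutex.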
|
|
Note that pseudo-drivers not using if_input() are not affected by this
conversion.
ok mikeb@, kettenis@, claudio@, dlg@
|
|
appropriate locking around bpf now.
ok dlg@
|
|
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.
ok tedu@ deraadt@
|
|
ok dlg@
|
|
told me that if you're going to rx into buffers greater than 4k in
size, they have to be aligned to a 4k boundary.
the mru of this chip is 9k, but ive been using the 12k mcl pool to
provide the alignment. however, if we move to putting 8 items on a
pool page there'll be enough slack space in the mcl12k pool pages
to allow item colouring, which in turn will break the chip requirement
above. in practice the chips i have seem to work fine with unaligned
buffers, but i dont want to risk breaking early revision chips.
this moves myx to using a private pool for allocating clusters for
the big rx ring. the item size is 9k, but we specify a 4k alignment
so every item we get out of it will be correct for the chip.
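the same size/alignment pairing can be illustrated in userland with
posix_memalign standing in for the pool allocator (the names and
constants here are illustrative, not the driver's):

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RX_BUF_SIZE	(9 * 1024)	/* 9k cluster for the big rx ring */
#define RX_BUF_ALIGN	4096		/* chip wants 4k alignment for >4k rx */

/* allocate one rx cluster with the alignment the chip requires */
static void *
rx_buf_alloc(void)
{
	void *p;

	if (posix_memalign(&p, RX_BUF_ALIGN, RX_BUF_SIZE) != 0)
		return (NULL);
	return (p);
}
```

with a 4k item alignment, pool page colouring can shuffle items
around without ever producing a buffer the chip would reject.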
|
|
size up to 4k.
found while reading someone elses driver.
|
|
|
|
be put on the ring couldnt be allocated.
this pulls the code that puts the mbufs on the ring out of myx_rx_fill
so it can return early if firstmb cant be allocated, which puts it
in the right place to return unused slots to the if_rxring.
this means myx rx wont lock up if you're DoSsed to the point where
you exhaust your mbuf pools and cant allocate mbufs for the ring.
ok jmatthew@
|
|
the number of contexts that are refilling the rx rings with atomic
ops.
this is borrowed from code i wrote for the scsi midlayer but cant
put in yet because i havent got atomic.h up to scratch on all archs
yet. the archs myx runs on do have enough atomic.h to be fine though.
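the counting trick looks roughly like this in C11 atomics (a
userland sketch; the names and the work function are made up). the
first context to bump the counter does the refill, and anyone who
arrives while it runs just leaves the counter raised so the runner
loops and picks their work up too:

```c
#include <assert.h>
#include <stdatomic.h>

/* count of contexts that want the rx ring refilled */
static atomic_uint refilling;
static unsigned int refill_runs;	/* stands in for ring work done */

static void
rx_refill_work(void)
{
	refill_runs++;	/* here the driver would put buffers on the ring */
}

static void
rx_refill(void)
{
	if (atomic_fetch_add(&refilling, 1) != 0)
		return;		/* someone else is already refilling */
	do {
		rx_refill_work();
	} while (atomic_fetch_sub(&refilling, 1) != 1);
}
```

only one context is ever inside rx_refill_work, so the refill itself
needs no lock; the atomic counter is the whole synchronisation.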
|
|
|
|
and ether_input, queue all the mbufs onto an mbuf_list on the stack
and then take the biglock once outside the loop.
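the batching structure is a simple intrusive singly-linked list
built on the stack. a userland sketch modelled on the mbuf_list idea
(the struct layouts here are stand-ins, not the real struct mbuf):

```c
#include <assert.h>
#include <stddef.h>

/* stand-ins for struct mbuf and struct mbuf_list */
struct mbuf {
	struct mbuf	*m_nextpkt;
};

struct mbuf_list {
	struct mbuf	*ml_head;
	struct mbuf	*ml_tail;
	int		 ml_len;
};

#define MBUF_LIST_INITIALIZER()	{ NULL, NULL, 0 }

/* append a packet to the tail of the list */
static void
ml_enqueue(struct mbuf_list *ml, struct mbuf *m)
{
	m->m_nextpkt = NULL;
	if (ml->ml_tail == NULL)
		ml->ml_head = m;
	else
		ml->ml_tail->m_nextpkt = m;
	ml->ml_tail = m;
	ml->ml_len++;
}

/* take a packet off the head of the list */
static struct mbuf *
ml_dequeue(struct mbuf_list *ml)
{
	struct mbuf *m = ml->ml_head;

	if (m != NULL) {
		ml->ml_head = m->m_nextpkt;
		if (ml->ml_head == NULL)
			ml->ml_tail = NULL;
		ml->ml_len--;
	}
	return (m);
}
```

rxeof enqueues every completed packet onto a list like this, then
the biglock is taken once and the whole list is drained into
ether_input, instead of paying a lock round trip per packet.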
|
|
to kettenis.
move the if_ipacket and if_opacket increments out of biglock too.
theyre only updated from the interrupt handler, which is only run
on a single cpu so there's no chance of the update racing. everywhere
else only reads them.
|
|
|