Age | Commit message | Author |
|
|
|
along with the interrupt and ethernet address details.
ok dlg@
|
|
|
|
found by smatch warning about uninitialised var use
ok jmatthew@
|
|
any different to earlier revisions.
from Brad
|
|
If none of the regular ethernet capabilities are present, check the extended
capabilities. Since we only report that the link is active if there's a
detected media type, this isn't just a cosmetic change.
Joerg Streckfuss reported that a gigabit SFP didn't work in a ConnectX-6 Lx,
and tested that this change makes it work.
ok dlg@
|
|
Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither
uses WRITE_ONCE() nor a mutex and is called before the ifq mutex
is initialized. The new name expresses that it should be used only
during interface attach when there is no concurrency.
Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with
READ_ONCE(). They can be used without a lock as they only read a
single integer.
OK dlg@
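A minimal userland sketch of the lockless read described above. The
READ_ONCE() definition, struct ifq_sketch and the sketch_ function names
are stand-ins invented for illustration; they are not the kernel's
definitions.

```c
#include <assert.h>

/*
 * Assumed stand-in for the kernel macro: force a single load through a
 * volatile-qualified lvalue (uses the GCC/Clang __typeof__ extension).
 */
#define READ_ONCE(x)	(*(volatile __typeof__(x) *)&(x))

/* Hypothetical, cut-down queue; the real structure has more fields. */
struct ifq_sketch {
	unsigned int	ifq_len;	/* packets currently queued */
	unsigned int	ifq_maxlen;	/* limit, set once at attach */
};

/* Attach-time only: plain store, no lock, no concurrency expected. */
static void
sketch_ifq_init_maxlen(struct ifq_sketch *ifq, unsigned int maxlen)
{
	assert(maxlen > 0);
	ifq->ifq_maxlen = maxlen;
}

/* Lockless reader: a single integer load, so no mutex is taken. */
static unsigned int
sketch_ifq_len(struct ifq_sketch *ifq)
{
	return (READ_ONCE(ifq->ifq_len));
}

static int
sketch_ifq_empty(struct ifq_sketch *ifq)
{
	return (sketch_ifq_len(ifq) == 0);
}
```

The value read this way is only a snapshot; a caller that needs the
length to stay stable would still have to take the queue's lock.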
|
|
This isn't listed in the public PRM but it can be found in the Linux driver.
from Olivier Croquin
|
|
from and tested by Olivier Croquin
ok dlg@
|
|
|
|
ok jmatthew@
|
|
to run commands where we can sleep while waiting. Rather than actually
using it as a queue, just allocate the slots to particular uses.
The first slot is used for polled commands (anything run while cold),
then there's one for general ioctls, one for kstat reads, and one for
link operations. Since we can sleep while waiting now, we need to serialize
access to the command slots. This is done with rwlocks for the ioctl and
kstat slots, and the link slot is only used from a single-instance task.
This also means we don't need to hold the kernel lock while doing kstat
reads.
Using interrupt based command completion drops the time taken to read all
the kstats off mcx interfaces from tens of milliseconds to almost nothing,
which is a pretty big win when you're reading them every few seconds on
busy firewalls.
ok dlg@
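The slot assignment described above could be sketched roughly as follows.
The enum names and the cmdq_slot_for() helper are hypothetical and chosen
for illustration; the locking itself is only noted in comments.

```c
#include <assert.h>

/* Hypothetical slot numbering; one fixed slot per kind of caller. */
enum cmdq_slot {
	CMDQ_SLOT_POLL = 0,	/* polled commands, anything run while cold */
	CMDQ_SLOT_IOCTL,	/* general ioctls, serialized by an rwlock */
	CMDQ_SLOT_KSTAT,	/* kstat reads, serialized by an rwlock */
	CMDQ_SLOT_LINK,		/* link operations, single-instance task */
	CMDQ_SLOT_COUNT
};

/* Hypothetical classification of callers that may sleep. */
enum cmdq_user {
	CMDQ_USER_IOCTL,
	CMDQ_USER_KSTAT,
	CMDQ_USER_LINK
};

/* Pick the slot for a command; while cold everything is polled. */
static enum cmdq_slot
cmdq_slot_for(int cold, enum cmdq_user user)
{
	assert(user <= CMDQ_USER_LINK);

	if (cold)
		return (CMDQ_SLOT_POLL);

	switch (user) {
	case CMDQ_USER_IOCTL:
		return (CMDQ_SLOT_IOCTL);
	case CMDQ_USER_KSTAT:
		return (CMDQ_SLOT_KSTAT);
	case CMDQ_USER_LINK:
	default:
		return (CMDQ_SLOT_LINK);
	}
}
```

Because each kind of caller owns its slot, a sleeping kstat read never
has to wait behind an ioctl or a link operation.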
|
|
errors, ensuring the IPL is correctly restored.
from Christian Ludwig
|
|
|
|
feedback and ok tb@ jmc@ ok ratchov@
|
|
vectors use to decide whether to use MSI-X, so make it return 0 if MSI
is not enabled for the device.
fixes problems with ix(4) on older amd64 hardware and current riscv64
ok kettenis@ dlg@
|
|
mcx(4) told us has arrived. The DMA map's mapsize on RX packets
is the length of the allocated buffer. For mcx(4), this can be
more than 9000 bytes, as each buffer will be at least as
big as the maximum supported MTU. There's no need to sync the
whole buffer if it's only a small packet.
ok dlg@ jmatthew@
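The length choice can be sketched as below. rx_sync_len() is a
hypothetical helper, not a function in the driver; in a real driver the
result would be the size argument handed to bus_dmamap_sync(9) in place
of the map's full mapsize.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes of an rx buffer worth syncing.
 * mapsize is the allocated buffer length, pktlen is the length the
 * hardware reported for the received packet.
 */
static size_t
rx_sync_len(size_t mapsize, size_t pktlen)
{
	assert(mapsize > 0);

	/* never sync past the end of the mapped buffer */
	return (pktlen < mapsize ? pktlen : mapsize);
}
```

For a 64 byte packet in a 9216 byte buffer this syncs 64 bytes instead
of 9216.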
|
|
the first cut of this diff was made with coccinelle using this spatch:
@rule@
type caddr_t;
expression m, off, len, cp;
@@
-m_copydata(m, off, len, (caddr_t)cp)
+m_copydata(m, off, len, cp)
i had to fix its opinionated idea of formatting by hand though, so
i'm not sure it was worth it.
ok deraadt@ bluhm@
|
|
this is the only real diff we have left outstanding on a box that
experienced rx lockups. since adding this change it's been happy
for the last 4 weeks and counting so far.
ok jmatthew@
|
|
ok jmatthew@
|
|
jmatthew@ has tried this before, but hrvoje popovski experienced
breakage so it wasn't enabled. we've tightened the code up since
then so it's time to try again.
this diff has been tested by hrvoje popovski and myself
ok jmatthew@
|
|
found while poking around with hrvoje popovski
yes jmatthew@
|
|
mcx is still hardcoded/limited to 1 queue for now, but this lets
different mcx devices use different cpus for handling packets.
looks good jmatthew@
|
|
avoiding an unhelpful error message if the card's firmware doesn't expose
the sensor registers.
tested by chris@, who saw the unhelpful error message
ok dlg@
|
|
|
|
ok jmatthew@
|
|
it used a pointer in an argument to communicate that back to the
caller, while being a void function. this seems more natural and
brings it in line with how the rx completion function returns free
slots to its caller too.
|
|
ok jmatthew@
|
|
hrvoje popovski reports the current code faults on some boxes. i'm
working on it, but the code isn't being used right now.
|
|
ok jmatthew@
|
|
doing the maths again feels error prone.
|
|
ok jmatthew@
|
|
ok jmatthew@
|
|
ok jmatthew@
|
|
ok jmatthew@
|
|
ok jmatthew@
|
|
this is mostly to help me better understand where i accumulate error
when trying to sync the chip to the kernel clocks. ie, if i'm using
mcx as the kernel clock source and my attempts to sync to it still
produce errors, then my code is very wrong instead of slightly
wrong.
it's also fun and a tiny amount of code.
|
|
there's a comment that explains how it works now, but the result is
that i get much tighter and more consistent synchronisation between
the kernel clock and the values derived from the mcx timestamps
now.
however, i only just worked out that there is still an unresolved
problem where the kernel clock changes how fast it ticks. this
happens when ntpd uses adjtime(2) or adjfreq(2) to try and make the
kernel tick at the same rate as the rest of the universe (well, the
small bit of it that it can observe). these adjustments to the
kernel clock immediately skew the timestamps that mcx calculates,
but then they also throw off the mcx calibration calculations that
run every 30 seconds. the offsets calculated in the next calibration
period are often (very) negative.
eg, when things are synced up nicely and you do a read of the mcx
timer and immediately follow it with a nanouptime(9) call, on this
box it calculates that the time in between those two events is about
2600ns. in the calibration period after ntpd did a very small adjtime
call, it now thinks the time between those two events is -700000ns.
this is a pretty obvious problem in hindsight. i can't think of a
simple solution to it at the moment though so i'm going to leave
mcx timestamping disabled for now.
|
|
fun fact, my ConnectX-4 Lx boards seem to run at 156MHz. less fun
fact, mcx_calibrate() seems to work that out pretty well anyway,
but the maths is still a bit too wonky to make it usable for mbuf
timestamps.
|
|
the idea is to avoid some other work, like a hardware interrupt,
running in between the reads of the kernel and chip clocks and
therefore skewing the interval calculations. this tightens up a
lot of the slop seen when using the cqe timestamp for an mbuf
timestamp, but there's still something not quite right.
|
|
we see too many ntp replies (appear to) arrive before the request
for them was sent, which ntpd handles by disabling the peer for an
hour.
this was a lot easier to narrow down after fixing up bpf and
timestamps, cos it let me see this:
16:50:36.051696 802.1Q vid 871 pri 3 192.0.2.55.47079 > 162.159.200.123.123: v4 client strat 0 poll 0 prec 0 [tos 0x10]
16:50:36.047201 802.1Q vid 871 pri 3 162.159.200.123.123 > 192.0.2.55.47079: v4 server strat 3 poll 0 prec -25 (DF)
i'm going to borrow the link0 flag for a bit to allow turning
timestamping on.
|
|
OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
|
|
tested by Nilson Lopes
|
|
on in the attach process with a useless error message.
tested on a ConnectX-6 card in infiniband mode by Nilson Lopes
ok dlg@
|
|
operations during attach fail on some amd64 systems using the TSC delay
function, seemingly as there aren't enough memory operations happening to
get the doorbell write out to the device otherwise. The lapic delay
function didn't expose this problem.
suggested by kettenis@
ok dlg@
|
|
counters for send, receive, completion and event queues, as well as
the queue states. There are still some bugs in queue handling that
we're trying to track down and these should help. No change in object
size without kstat enabled.
ok dlg@
|
|
we don't wait for the event to be posted to the queue, we just look at the
command itself, which means we can build up a backlog of events to be posted.
Newer firmware for ConnectX-4 seems to get upset if the backlog grows
beyond some fraction of the event queue size, causing an interrupt storm.
This was reported by patrick@ and Hrvoje Popovski (at least) while testing
support for multiple tx/rx queues. With the new event queue size, we can
safely create 8 queues.
|
|
so we can attach to them too.
ok dlg@
|
|
the link is up, rather than the operational status (PAOS).
ok dlg@
|
|
to intr_barrier(9). Fixes mysterious panics seen while working
on intr_barrier(9) for arm64.
ok jmatthew@
|