path: root/sys/dev/pci/if_mcx.c
Age Commit message Author
2024-10-23remove duplicate MCX_CAP_DEVICE_DRAIN_SIGERR defineJonathan Gray
2024-10-04As with other multiqueue drivers, print the number of queues we set upJonathan Matthew
along with the interrupt and ethernet address details. ok dlg@
2024-05-24remove unneeded includes; ok miod@Jonathan Gray
2024-04-12fix non-auto setting of extended media type bitsJonathan Gray
found by smatch warning about uninitialised var use ok jmatthew@
2024-04-11Match on ConnectX-6 virtual functions too, since they don't seem to beJonathan Matthew
any different to earlier revisions. from Brad
2024-04-11Add support for media types from the extended ethernet capabilities fields.Jonathan Matthew
If none of the regular ethernet capabilities are present, check the extended capabilities. Since we only report that the link is active if there's a detected media type, this isn't just a cosmetic change. Joerg Streckfuss reported that a gigabit SFP didn't work in a ConnectX-6 Lx, and tested that this change makes it work. ok dlg@
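The fallback described in this commit can be sketched as a two-stage table lookup. This is a user-space illustration only: the capability bits, media codes, and function names below are hypothetical and are not taken from the driver, which maps real PTYS capability bits to IFM_* media types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical media codes standing in for the IFM_* values. */
enum sketch_media { SKETCH_MEDIA_NONE = 0, SKETCH_MEDIA_10G_SR, SKETCH_MEDIA_1000_SX };

struct sketch_cap_map {
	uint32_t		bit;	/* capability bit (made up for this sketch) */
	enum sketch_media	media;
};

static const struct sketch_cap_map sketch_regular[] = {
	{ 1U << 0, SKETCH_MEDIA_10G_SR },
};
static const struct sketch_cap_map sketch_extended[] = {
	{ 1U << 1, SKETCH_MEDIA_1000_SX },
};

static enum sketch_media
sketch_lookup(const struct sketch_cap_map *map, size_t n, uint32_t oper)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (oper & map[i].bit)
			return map[i].media;
	}
	return SKETCH_MEDIA_NONE;
}

/* Try the regular capabilities first, then fall back to the extended ones. */
enum sketch_media
sketch_detect_media(uint32_t eth_oper, uint32_t ext_eth_oper)
{
	enum sketch_media m;

	m = sketch_lookup(sketch_regular, 1, eth_oper);
	if (m == SKETCH_MEDIA_NONE)
		m = sketch_lookup(sketch_extended, 1, ext_eth_oper);
	return m;
}

/* The link only counts as active when a media type was detected. */
int
sketch_link_active(uint32_t eth_oper, uint32_t ext_eth_oper)
{
	assert(sketch_detect_media(0, 0) == SKETCH_MEDIA_NONE);
	return sketch_detect_media(eth_oper, ext_eth_oper) != SKETCH_MEDIA_NONE;
}
```

In this model, a port that only reports an extended capability bit is now seen as active, which is why the commit message calls the change more than cosmetic.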
2023-11-10Make ifq and ifiq interface MP safe.Alexander Bluhm
Rename ifq_set_maxlen() to ifq_init_maxlen(). This function neither uses WRITE_ONCE() nor a mutex and is called before the ifq mutex is initialized. The new name expresses that it should be used only during interface attach when there is no concurrency. Protect ifq_len(), ifq_empty(), ifiq_len(), and ifiq_empty() with READ_ONCE(). They can be used without lock as they only read a single integer. OK dlg@
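As a rough illustration of the lockless readers mentioned here, the sketch below reads a single integer through a volatile-qualified pointer so the compiler performs exactly one load. The struct and function names are hypothetical stand-ins, and the volatile cast only approximates what the kernel's READ_ONCE() does; it is not the kernel macro itself.

```c
#include <assert.h>

/* Hypothetical, cut-down queue: only the length field matters here. */
struct sketch_ifq {
	unsigned int len;
};

/* Read the length once, without taking a lock. */
unsigned int
sketch_ifq_len(const struct sketch_ifq *q)
{
	assert(q != 0);
	return *(const volatile unsigned int *)&q->len;
}

int
sketch_ifq_empty(const struct sketch_ifq *q)
{
	return sketch_ifq_len(q) == 0;
}
```

This is only safe because each reader looks at one integer; anything that needs a consistent view of several fields would still have to take the mutex.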
2023-09-18Add 100GB LR4 Ethernet capability and map it to IFM_100G_LR4.Jonathan Matthew
This isn't listed in the public PRM but it can be found in the Linux driver. from Olivier Croquin
2023-09-07match on Mellanox ConnectX-6 LxJonathan Gray
from and tested by Olivier Croquin ok dlg@
2023-08-15Replace a bunch of (1 << 31) with (1U << 31)Miod Vallat
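A minimal sketch of why the suffix matters, assuming a 32-bit int: (1 << 31) shifts a signed 1 into the sign bit, which is undefined behaviour in C, while (1U << 31) is a well-defined unsigned value. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned constant: well defined, value 0x80000000. */
#define SKETCH_FLAG_TOP	(1U << 31)

/* Test the top bit of a 32-bit register value. */
int
sketch_top_bit_set(uint32_t reg)
{
	return (reg & SKETCH_FLAG_TOP) != 0;
}
```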
2023-06-06don't need mcx_uptime() now that we have nsecuptime()David Gwynne
ok jmatthew@
2022-11-22Allocate additional command queue slots and use command completion eventsJonathan Matthew
to run commands where we can sleep while waiting. Rather than actually using it as a queue, just allocate the slots to particular uses. The first slot is used for polled commands (anything run while cold), then there's one for general ioctls, one for kstat reads, and one for link operations. Since we can sleep while waiting now, we need to serialize access to the command slots. This is done with rwlocks for the ioctl and kstat slots, and link slot is only used from a single instance task. This also means we don't need to hold the kernel lock while doing kstat reads. Using interrupt based command completion drops the time taken to read all the kstats off mcx interfaces from tens of milliseconds to almost nothing, which is a pretty big win when you're reading them every few seconds on busy firewalls. ok dlg@
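The slot assignment described above can be modelled roughly as follows. All names are hypothetical; the sketch only captures which caller gets which slot and which slots the message says are serialized with rwlocks, and does not model the locks or the completion events.

```c
#include <assert.h>

/* Hypothetical names: one command slot per kind of caller. */
enum sketch_slot {
	SKETCH_SLOT_POLL = 0,	/* polled commands, anything run while cold */
	SKETCH_SLOT_IOCTL,	/* general ioctls */
	SKETCH_SLOT_KSTAT,	/* kstat reads */
	SKETCH_SLOT_LINK,	/* link operations */
	SKETCH_SLOT_COUNT
};

enum sketch_caller {
	SKETCH_CALLER_IOCTL,
	SKETCH_CALLER_KSTAT,
	SKETCH_CALLER_LINK
};

/* While cold, everything is polled on the first slot. */
enum sketch_slot
sketch_pick_slot(int cold, enum sketch_caller who)
{
	if (cold)
		return SKETCH_SLOT_POLL;

	switch (who) {
	case SKETCH_CALLER_IOCTL:
		return SKETCH_SLOT_IOCTL;
	case SKETCH_CALLER_KSTAT:
		return SKETCH_SLOT_KSTAT;
	case SKETCH_CALLER_LINK:
		return SKETCH_SLOT_LINK;
	}
	return SKETCH_SLOT_POLL;
}

/*
 * The ioctl and kstat slots can have several sleeping callers, so they
 * need a lock; the link slot is only used from a single task.
 */
int
sketch_slot_needs_lock(enum sketch_slot slot)
{
	assert(slot < SKETCH_SLOT_COUNT);
	return slot == SKETCH_SLOT_IOCTL || slot == SKETCH_SLOT_KSTAT;
}
```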
2022-06-26Break out of the switch statement rather than returning early on ioctlJonathan Matthew
errors, ensuring the IPL is correctly restored. from Christian Ludwig
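The pattern behind this fix can be sketched in user space with a fake interrupt priority level. The sketch_splnet() and sketch_splx() helpers, the level value 7, and the command numbers are all stand-ins invented for the example; only the control flow (break on error so the restore always runs) reflects the commit.

```c
#include <assert.h>
#include <errno.h>

static int sketch_ipl;	/* pretend interrupt priority level */

/* Raise the fake IPL and return the previous level. */
static int
sketch_splnet(void)
{
	int old = sketch_ipl;

	sketch_ipl = 7;
	return old;
}

/* Restore a previously saved level. */
static void
sketch_splx(int s)
{
	sketch_ipl = s;
}

int
sketch_current_ipl(void)
{
	return sketch_ipl;
}

int
sketch_ioctl(int cmd)
{
	int s, error = 0;

	s = sketch_splnet();
	switch (cmd) {
	case 1:
		/* success path */
		break;
	case 2:
		/* an early "return EINVAL" here would skip sketch_splx() */
		error = EINVAL;
		break;
	default:
		error = ENOTTY;
		break;
	}
	sketch_splx(s);		/* runs on every path */

	assert(sketch_ipl == s);
	return error;
}
```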
2022-03-11Constify struct cfattach.Martin Pieuchot
2022-01-09spellingJonathan Gray
feedback and ok tb@ jmc@ ok ratchov@
2021-07-23pci_intr_msix_count() is the function that drivers using multiple MSI-XJonathan Matthew
vectors use to decide whether to use MSI-X, so make it return 0 if MSI is not enabled for the device. fixes problems with ix(4) on older amd64 hardware and current riscv64 ok kettenis@ dlg@
2021-06-02When processing a received packet, only sync the amount of bytesPatrick Wildt
mcx(4) told us has arrived. The DMA map's mapsize on RX packets is the length of the allocated buffer. For mcx(4), this can be more than around 9000 bytes, as each buffer will be at least as big as the maximum supported MTU. There's no need to sync the whole buffer, if it's only a small packet. ok dlg@ jmatthew@
2021-02-25we don't have to cast to caddr_t when calling m_copydata anymore.David Gwynne
the first cut of this diff was made with coccinelle using this spatch:

@rule@
type caddr_t;
expression m, off, len, cp;
@@
-m_copydata(m, off, len, (caddr_t)cp)
+m_copydata(m, off, len, cp)

i had to fix its opinionated idea of formatting by hand though, so i'm not sure it was worth it. ok deraadt@ bluhm@
2021-02-15move the rearming of the cq after the refill of the rq.David Gwynne
this is the only real diff we have left outstanding on a box that experienced rx lockups. since adding this change it's been happy for the last 4 weeks and counting so far. ok jmatthew@
2021-01-27do better accounting of how many msix interrupts we want to use.David Gwynne
ok jmatthew@
2021-01-25raise the max number of queues/interrupts to 16, up from 1.David Gwynne
jmatthew@ has tried this before, but hrvoje popovski experienced breakage so it wasn't enabled. we've tightened the code up since then so it's time to try again. this diff has been tested by hrvoje popovski and myself ok jmatthew@
2021-01-25don't lose the M_FLOWID flag if the ipv4 cksum is ok.David Gwynne
found while poking around with hrvoje popovski yes jmatthew@
2021-01-25use an intrmap when establishing interrupts for queues.David Gwynne
mcx is still hardcoded/limited to 1 queue for now, but this lets different mcx devices use different cpus for handling packets. looks good jmatthew@
2021-01-20Check management capabilities before trying to attach temperature sensors,Jonathan Matthew
avoiding an unhelpful error message if the card's firmware doesn't expose the sensor registers. tested by chris@, who saw the unhelpful error message ok dlg@
2021-01-04the tx doorbell is next to the rx doorbell, not on top of it.David Gwynne
2021-01-04use bus_dmamap_sync around updates to the doorbells.David Gwynne
ok jmatthew@
2020-12-27have mcx_process_txeof return the number of slots it processed.David Gwynne
it used a pointer in an argument to communicate that back to the caller, while being a void function. this seems more natural and brings it in line with how the rx completion function returns free slots to its caller too.
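A small sketch of the calling convention this commit moves to, where the completion handler returns the number of slots it processed and the caller accumulates the total. The ring structure, the function names, and the idea that a completion carries a slot count are assumptions made for the example, not the driver's actual signatures.

```c
#include <assert.h>

/* Hypothetical, cut-down tx ring state. */
struct sketch_txr {
	unsigned int cons;	/* consumer counter */
	unsigned int free;	/* slots handed back to the stack */
};

/* Handle one completion and return how many ring slots it covered. */
static int
sketch_process_txeof(struct sketch_txr *tx, unsigned int slots)
{
	tx->cons += slots;
	return (int)slots;
}

/* The caller sums the return values instead of passing a pointer down. */
int
sketch_process_cq(struct sketch_txr *tx, const unsigned int *cqes, int ncqe)
{
	int i, freed = 0;

	assert(ncqe >= 0);
	for (i = 0; i < ncqe; i++)
		freed += sketch_process_txeof(tx, cqes[i]);
	tx->free += (unsigned int)freed;
	return freed;
}
```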
2020-12-27do a bus space barrier after arming the eq.David Gwynne
ok jmatthew@
2020-12-27disable timestamping a little bit harder to avoid divide by 0.David Gwynne
hrvoje popovski reports the current code faults on some boxes. i'm working on it, but the code isn't being used right now.
2020-12-27shuffle filling the rx ring so the sw prod is updated before the hw.David Gwynne
ok jmatthew@
2020-12-26reuse the calculated vector as the argument to pci_intr_map_msix.David Gwynne
doing the maths again feels error prone.
2020-12-26add bus_dmamap_sync ops around the eq.David Gwynne
ok jmatthew@
2020-12-26add some bus_dmamap_syncs around the rq.David Gwynne
ok jmatthew@
2020-12-26sprinkle some bus_dmamap_syncs around the cq handling.David Gwynne
ok jmatthew@
2020-12-26sprinkle some bus_dmamap_syncs around the sq.David Gwynne
ok jmatthew@
2020-12-26better manage the lifetime of the dmamem used for various rings.David Gwynne
ok jmatthew@
2020-12-25expose the mcx timer as a timecounter.David Gwynne
this is mostly to help me better understand where i accumulate error when trying to sync the chip to the kernel clocks. ie, if im using mcx as the kernel clock source and my attempts to sync to it still produce errors, then my code is very wrong instead of slightly wrong. it's also fun and a tiny amount of code.
2020-12-17rework the maths used to set mbuf timestamps.David Gwynne
there's a comment that explains how it works now, but the result is that i get much tighter and more consistent synchronisation between the kernel clock and the values derived from the mcx timestamps now. however, i only just worked out that there is still an unresolved problem where the kernel clock changes how fast it ticks. this happens when ntpd uses adjtime(2) or adjfreq(2) to try and make the kernel tick at the same rate as the rest of the universe (well, the small bit of it that it can observe). these adjustments to the kernel clock immediately skew the timestamps that mcx calculates, but then it also throws off the mcx calibration calculations that run every 30 seconds. the offsets calculated in the next calibration period are often (very) negative. eg, when things are synced up nicely and you do a read of the mcx timer and immediately follow it with a nanouptime(9) call, on this box it calculates that the time in between those two events is about 2600ns. in the calibration period after ntpd did a very small adjtime call, it now thinks the time between those two events is -700000ns. this is a pretty obvious problem in hindsight. i can't think of a simple solution to it at the moment though so i'm going to leave mcx timestamping disabled for now.
2020-12-15fill in more of mcx_cap_device so i can get to the device frequencies.David Gwynne
fun fact, my Connect-x 4 Lx boards seem to run at 156MHz. less fun fact, mcx_calibrate() seems to work that out pretty well anyway, but the maths is still a bit too wonky to make it usable for mbuf timestamps.
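For a sense of scale, a timer running at 156MHz advances about 6.4ns per tick. The helper below is a hypothetical sketch of converting a tick count to nanoseconds for a given frequency in Hz; it is not the driver's calibration code, and the unit in which the device reports its frequency is not stated in this log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert timer ticks to nanoseconds. Whole seconds and the remainder
 * are handled separately so the multiplication does not overflow for
 * frequencies in the MHz range.
 */
uint64_t
sketch_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
	uint64_t secs, rem;

	assert(freq_hz != 0);	/* a zero frequency would divide by 0 */

	secs = ticks / freq_hz;
	rem = ticks % freq_hz;
	return secs * 1000000000ULL + (rem * 1000000000ULL) / freq_hz;
}
```

With freq_hz set to 156000000, 156 ticks come out as 1000ns and a single tick truncates to 6ns.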
2020-12-15go to splhigh around the kernel clock and hardware timer reads.David Gwynne
the idea is to avoid some other work, like a hardware interrupt, running in between the reads of the kernel and chip clocks and therefore skewing the interval calculations. this tightens up a lot of the slop seen when using the cqe timestamp for an mbuf timestamp, but there's still something not quite right.
2020-12-15turn hardware rx mbuf timestamping off by default.David Gwynne
we see too many ntp replies (appear to) arrive before the request for them was sent, which ntpd handles by disabling the peer for an hour. this was a lot easier to narrow down after fixing up bpf and timestamps, cos it let me see this:

16:50:36.051696 802.1Q vid 871 pri 3 192.0.2.55.47079 > 162.159.200.123.123: v4 client strat 0 poll 0 prec 0 [tos 0x10]
16:50:36.047201 802.1Q vid 871 pri 3 162.159.200.123.123 > 192.0.2.55.47079: v4 server strat 3 poll 0 prec -25 (DF)

im going to borrow the link0 flag for a bit to allow turning timestamping on.
2020-12-12Rename the macro MCLGETI to MCLGETL and remove the dead parameter ifp.jan
OK dlg@, bluhm@ No Opinion mpi@ Not against it claudio@
2020-11-06Match on ConnectX-6 (non-Dx) cards too.Jonathan Matthew
tested by Nilson Lopes
2020-11-06Bail out early if the port type is not Ethernet, rather than failing laterJonathan Matthew
on in the attach process with a useless error message. tested on a ConnectX-6 card in infiniband mode by Nilson Lopes ok dlg@
2020-10-28Add missing bus_space_barrier() in mcx_cmdq_post() - without this, cmdqJonathan Matthew
operations during attach fail on some amd64 systems using the TSC delay function, seemingly as there aren't enough memory operations happening to get the doorbell write out to the device otherwise. The lapic delay function didn't expose this problem. suggested by kettenis@ ok dlg@
2020-08-21Add kstats reporting the software and hardware producer and consumerJonathan Matthew
counters for send, receive, completion and event queues, as well as the queue states. There are still some bugs in queue handling that we're trying to track down and these should help. No change in object size without kstat enabled. ok dlg@
2020-07-23Increase the event queue size. When polling for admin command completion,Jonathan Matthew
we don't wait for the event to be posted to the queue, we just look at the command itself, which means we can build up a backlog of events to be posted. Newer firmware for ConnectX-4 seems to get upset if the backlog grows beyond some fraction of the event queue size, causing an interrupt storm. This was reported by patrick@ and Hrvoje Popovski (at least) while testing support for multiple tx/rx queues. With the new event queue size, we can safely create 8 queues.
2020-07-17Virtual functions are effectively identical to full physical functions,Jonathan Matthew
so we can attach to them too. ok dlg@
2020-07-17Consistently use the port type and speed register (PTYS) to determine ifJonathan Matthew
the link is up, rather than the operational status (PAOS). ok dlg@
2020-07-16Pass the interrupt handler cookie instead of the pointer to itPatrick Wildt
to intr_barrier(9). Fixes mysterious panics seen while working on intr_barrier(9) for arm64. ok jmatthew@