Age | Commit message | Author |
|
the counters happen to be a series of uint64_t values in memory,
so we treat them as arrays that get mapped to a series of kstat_kv
structs that are set up as 64 bit counters with either packet or
byte counters as appropriate. this helps keep the code size down.
while we export the counters as separate kstats per rx and tx ring,
you request an update from the hypervisor at the controller level.
this code ratelimits these requests to 1 per second per interface
to try and debounce this a bit so each kstat read doesn't cause a
vmexit.
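a minimal sketch of the once-per-second debounce described above, assuming the caller has a monotonic timestamp in nanoseconds. the names sketch_ratelimit and sketch_should_refresh are made up for illustration and are not taken from the driver; in the kernel something like ratecheck(9) could do the same job.

```c
#include <stdint.h>

/* hypothetical per-interface state; not the driver's real structure */
struct sketch_ratelimit {
	uint64_t last_ns;	/* when the hypervisor was last asked */
	int valid;		/* 0 until the first request has been made */
};

/*
 * returns 1 if the caller should ask the hypervisor for fresh counters,
 * or 0 if the previously fetched values are recent enough to reuse.
 */
int
sketch_should_refresh(struct sketch_ratelimit *rl, uint64_t now_ns,
    uint64_t interval_ns)
{
	if (rl->valid && now_ns - rl->last_ns < interval_ns)
		return 0;

	rl->last_ns = now_ns;
	rl->valid = 1;
	return 1;
}
```

every per-ring kstat read on an interface would go through the same state, so reading several rings in quick succession costs at most one request.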
here's an example of the stats. note that we get to see how many
packets that rx ring moderation drops for the first time. see the
"no buffers" stat.
vmx0:0:rxq:5
packets: 2372483 packets
bytes: 3591909057 bytes
qdrops: 0 packets
errors: 0 packets
qlen: 0 packets
...
vmx0:0:txq:5
packets: 1316856 packets
bytes: 86961577 bytes
qdrops: 0 packets
errors: 0 packets
qlen: 1 packets
maxqlen: 512 packets
oactive: false
...
vmx0:0:vmx-rxstats:5
LRO packets: 0 packets
LRO bytes: 0 bytes
ucast packets: 2372483 packets
ucast bytes: 3591909053 bytes
mcast packets: 0 packets
mcast bytes: 0 bytes
bcast packets: 0 packets
bcast bytes: 0 bytes
no buffers: 696 packets
errors: 0 packets
...
vmx0:0:vmx-txstats:5
TSO packets: 0 packets
TSO bytes: 0 bytes
ucast packets: 1316839 packets
ucast bytes: 86960455 bytes
mcast packets: 0 packets
mcast bytes: 0 bytes
bcast packets: 0 packets
bcast bytes: 0 bytes
errors: 0 packets
discards: 0 packets
|
|
this means you can observe what the network stack is trying to do
when it's working with a nic driver that supports multiple rings.
a nic with only one set of rings still gets queues though, and this
still exports their stats.
here is a small example of what kstat(8) currently outputs for these
stats:
em0:0:rxq:0
packets: 2292 packets
bytes: 229846 bytes
qdrops: 0 packets
errors: 0 packets
qlen: 0 packets
em0:0:txq:0
packets: 1297 packets
bytes: 193413 bytes
qdrops: 0 packets
errors: 0 packets
qlen: 0 packets
maxqlen: 511 packets
oactive: false
|
|
page it out and bad things will happen when we try to page it back in
from within the clock interrupt handler.
While there, make sure we set timekeep_object back to NULL if we fail
to map the timekeep page into kernel space.
ok deraadt@ (who had a very similar diff)
|
|
simultaneously protected by KERNEL_LOCK() and NET_LOCK() and now we have
a single lock for it. This step reduces the locking mess in this layer.
ok mpi@
|
|
pipex_destroy_session() instead of pool_put(9) to prevent a memory leak.
ok mpi@
|
|
|
|
|
|
bit to achieve this with a single #ifdef/#endif pair.
|
|
length of up to 31 characters. This limit is also present in the
flattened device tree specification. Unfortunately this limit isn't enforced
by the tooling and there are systems in the wild that use longer strings.
This includes the device trees used on POWER9 systems and has been seen
on some ARM systems as well.
So bump the buffer size from 32 bytes (31 + terminating NUL) to 64 bytes.
Centrally define OFMAXPARAM to this value (in <dev/ofw/openfirm.h>)
replacing the various OPROMMAXPARAM definitions scattered around the tree
to make sure the FDT implementation of OF_nextprop() uses the same
buffer size as its consumers.
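A minimal sketch of why the shared buffer size matters, assuming a bounded copy of a property name into a caller-supplied buffer. SKETCH_OFMAXPARAM and sketch_copy_propname() are hypothetical stand-ins for illustration, not the definitions from <dev/ofw/openfirm.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the 64-byte limit described above. */
#define SKETCH_OFMAXPARAM	64

/*
 * Copy a property name, including its terminating NUL, into dst.
 * Returns 0 on success or -1 if the name does not fit, so a name that
 * is too long is rejected rather than silently truncated.
 */
int
sketch_copy_propname(char *dst, size_t dstlen, const char *src)
{
	size_t n = strlen(src);

	if (n >= dstlen)
		return -1;
	memcpy(dst, src, n + 1);
	return 0;
}
```

With a 32-byte buffer a 44-character name is rejected, while a SKETCH_OFMAXPARAM-sized buffer holds it. If producer and consumer disagree on the size, one of them can overrun or truncate.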
Eliminate the static buffer in various openprom(4) implementations on
FDT systems.
Makes it possible to dump the full device tree on POWER9 systems using
eeprom -p.
ok deraadt@, visa@
|
|
Fixes a regression from rev 1.24 which led to a page fault reported by
Martin Ziemer. ok stsp@
|
|
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to
userland via libc, liberating processes from the need for a context
switch every time they want to count the passage of time.
If a timecounter clock can be exposed to userland then it needs to set
its tc_user member to a non-zero value. Tested with one or multiple
counters per architecture.
The timing data is shared through a pointer found in the new ELF
auxiliary vector AUX_openbsd_timekeep containing timehands information
that is frequently updated by the kernel.
Timing differences between the last kernel update and the current time
are adjusted in userland by the tc_get_timecount() function inside the
MD usertc.c file.
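A rough sketch of the userland adjustment described above, assuming a much simplified shared snapshot. The structure layout, its field names and sketch_gettime_ns() are all hypothetical; the real timehands data is richer, and the count argument stands in for whatever the MD tc_get_timecount() returns.

```c
#include <stdint.h>

/* Hypothetical, simplified view of what the kernel would publish. */
struct sketch_timekeep {
	volatile uint32_t gen;	/* assumed to be 0 while being updated */
	uint64_t base_ns;	/* time at the last kernel update */
	uint64_t base_count;	/* counter value at that update */
	uint64_t freq_hz;	/* counter frequency */
	uint64_t mask;		/* valid bits of the counter */
};

/*
 * Compute the current time from the snapshot plus the ticks elapsed
 * since the last kernel update. Returns 0 on success, or -1 if the
 * snapshot was unusable or changed underneath us, in which case the
 * caller would retry or fall back to the system call.
 * Memory barriers are left out to keep the sketch short, and the
 * multiplication assumes the delta stays small enough not to overflow.
 */
int
sketch_gettime_ns(const struct sketch_timekeep *tk, uint64_t count,
    uint64_t *ns)
{
	uint32_t gen = tk->gen;
	uint64_t delta, result;

	if (gen == 0)
		return -1;

	delta = (count - tk->base_count) & tk->mask;
	result = tk->base_ns + delta * 1000000000ULL / tk->freq_hz;

	if (gen != tk->gen)
		return -1;

	*ns = result;
	return 0;
}
```

No context switch is needed on the successful path; the process only reads shared memory and the hardware counter.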
This permits a much more responsive environment, quite visible in
browsers, office programs and gaming (apparently one is able to fly
in Minecraft now).
Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others!
OK from at least kettenis@, cheloha@, naddy@, sthen@
|
|
Client mode was subtly broken after support for CCMP offload was added.
In client mode we should be using the first key table slot for our CCMP
pairwise key, not an arbitrary slot based on our association ID (as is
done in hostap mode).
When the interface came up again after being reset the CCMP hardware engine
was left in a non-working state. Apparently the key table was messed up or
contained stale entries. Fix a potential timing issue in the code path which
attempts to clear the key table on device power-up. For good measure, also
clear the key table before the device is powered down.
While here, fix off-by-ones in key table slot range checks.
Problems reported by Tim Chase, Kevin Chadwick, Austin Hook, Stefan Kapfhammer.
Fix tested by me on AR9280 (PCI) and AR9271 (USB) and Kevin Chadwick on AR9280
|
|
from Miguel Landaeta
|
|
|
|
from Miguel Landaeta
|
|
do so.
|
|
"looks right" deraadt@
|
|
|
|
it's like ksyms, but different
|
|
a kstat is an arbitrary chunk of data that a part of the kernel
wants to expose to userland. data could mean just a chunk of raw
bytes, but generally a kernel subsystem will provide a series of
kstat key/value chunks.
this code is loosely modelled on kstat in solaris, but with a bunch
of simplifications (we don't want to provide write support for
example). the named or key/value structure is significantly richer
in this version too. eg, solaris kstat named data supports integer
types, but this version offers differentiation between counters
(like the number of packets transmitted on an interface) and gauges
(like how long the transmit queue is) and lets kernel providers say
what the units are (eg, packets vs bytes vs cycles).
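a small sketch of what a key/value entry with a type and a unit could look like, and how a reader might print it. everything here (struct sketch_kv, the enums, sketch_kv_format) is made up for illustration and is not the kstat API itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* hypothetical: counters only go up, gauges report a current level */
enum sketch_kv_type { SKETCH_KV_COUNTER, SKETCH_KV_GAUGE };

/* hypothetical units a provider can attach to a value */
enum sketch_kv_unit {
	SKETCH_KV_U_PACKETS,
	SKETCH_KV_U_BYTES,
	SKETCH_KV_U_CYCLES
};

struct sketch_kv {
	const char *key;
	enum sketch_kv_type type;
	enum sketch_kv_unit unit;
	uint64_t value;
};

static const char *
sketch_unit_name(enum sketch_kv_unit unit)
{
	switch (unit) {
	case SKETCH_KV_U_PACKETS:
		return "packets";
	case SKETCH_KV_U_BYTES:
		return "bytes";
	case SKETCH_KV_U_CYCLES:
		return "cycles";
	}
	return "";
}

/* only counters make sense as a rate; gauges are read as they are */
int
sketch_kv_is_rate(const struct sketch_kv *kv)
{
	return kv->type == SKETCH_KV_COUNTER;
}

/* format one entry as "key: value unit"; returns what snprintf returns */
int
sketch_kv_format(const struct sketch_kv *kv, char *buf, size_t len)
{
	return snprintf(buf, len, "%s: %llu %s", kv->key,
	    (unsigned long long)kv->value, sketch_unit_name(kv->unit));
}
```

carrying the type and unit with each value is what would let a generic tool print "packets" or "bytes" without knowing anything about the subsystem that provided the data.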
the main motivation for this is to improve the visibility of what
the kernel is doing while it's running. i wrote this as part of the
recent work we've been doing on multiqueue and rss/toeplitz so i
could verify that network load is actually spread across multiple
rings on a single nic. without this we would be wasting memory and
interrupt vectors on multiple rings and still just using the 1st
one, and no one would know cos there's no way to see what rings are
being used.
another thing that can become visible is the different counters
that various network cards provide. i'm particularly interested in
seeing if packets get dropped because the rings aren't filled fully,
which is an effect we've never really observed directly.
a small part of wanting this is cos i spend an annoying amount of
time instrumenting the kernel when hacking code in it. if most of
the scaffolding for the instrumentation is already there, i can
avoid repeatedly writing that code and save time.
iterated a few times with claudio@ and deraadt@
|
|
off. Sigh.
|
|
|
|
(ahc(4) and qlw(4)) can just compare the values of the "bus" member
directly.
A slightly different path to the same result that matthew@ traversed
in his work culminating in scsiconf.h r1.146.
|
|
dev_parent twiddling. Improves chances of progress on eliminating
various bus related fields from struct scsi_link.
Tested by kmos@
"might actually be an improvement" kettenis@.
|
|
|
|
|
|
since this makes it easier to reason about the accounting.
|
|
|
|
like done in utvfu(4)
Fixes webcam detection in firefox 78, where code was added to check for
V4L2_CAP_VIDEO_CAPTURE capability on 'device_caps', whereas we only set it
in the 'capabilities' field.
According to
https://www.kernel.org/doc/html/v4.14/media/uapi/v4l/vidioc-querycap.html#description
those distinct fields are here for drivers that provide several devices,
but firefox decided to check for 'device_caps' field instead of
'capabilities' (cf
https://hg.mozilla.org/integration/autoland/rev/33facf191f23) - so fill
the field for compatibility reasons, while
https://bugzilla.mozilla.org/show_bug.cgi?id=1650572 discusses with
upstream what's the right way.
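A minimal sketch of filling both fields, assuming a simplified stand-in for the V4L2 capability structure. The struct, the SKETCH_* flag values and sketch_fill_caps() are illustrative assumptions rather than the driver's code; whether the driver also sets a "device caps present" flag is not stated in this message.

```c
#include <stdint.h>

/* Assumed flag values, chosen to mirror the V4L2 flags. */
#define SKETCH_CAP_VIDEO_CAPTURE	0x00000001u
#define SKETCH_CAP_STREAMING		0x04000000u
#define SKETCH_CAP_DEVICE_CAPS		0x80000000u

/* Hypothetical cut-down version of the querycap reply. */
struct sketch_capability {
	uint32_t capabilities;	/* what the physical device as a whole can do */
	uint32_t device_caps;	/* what this particular device node can do */
};

/*
 * Report the same capture capabilities in both fields, so a consumer
 * that only inspects device_caps still sees the capture bit.
 */
void
sketch_fill_caps(struct sketch_capability *cap)
{
	cap->device_caps = SKETCH_CAP_VIDEO_CAPTURE | SKETCH_CAP_STREAMING;
	cap->capabilities = cap->device_caps | SKETCH_CAP_DEVICE_CAPS;
}
```

For a driver that exposes a single device node the two fields carry the same capability bits, which is why filling both is a cheap compatibility measure.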
ok mglocker@
|
|
completely fix the case where the FPU is used in a signal handler
but it is part of the solution and makes sure the processor mode check
in sys_sigreturn() passes if the process was using the FPU when the signal
happened.
|
|
zero bytes.
|
|
|
|
Problem reported and fix tested by Bastian Wessling on bugs@
ok jmatthew@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
through into the data storage interrupt when handling a data segment
interrupt.
|
|
value. Makes things work again on the rpi3.
ok jsg@
|
|
devices which has been re-introduced by pms.c revision 1.92.
ok tb@
|
|
Simplify the logic by always exporting the return value and errno in the
syscall profiler.
|
|
|
|
This leaves knote_remove() for kqueue's internal use. As a result,
knote_remove() is used to drop knotes from the knlist of a single
kqueue instance. klist_invalidate() clears knotes from a klist that
can contain entries from different kqueue instances.
Use FILTEROP_ISFD to control how klist_invalidate() treats knotes,
to preserve the current behaviour of knote_processexit(). All the
existing callers of klist_invalidate() are fd-based. The existing code
rewires and activates knotes to give userspace a clear indication that
the state of the fd has changed. In knote_processexit(), any remaining
knotes in ps_klist are non-fd-based (EVFILT_SIGNAL). Those are dropped
without notifying userspace.
OK mpi@
|
|
capital letters in locking annotations. Therefore harmonize the existing
annotations.
Also, if multiple locks are required they should be delimited using
commas.
ok mpi@
|
|
provides stronger integrity checks, it needn't cover the end-to-end transport
path. And it is in any case a layer violation for one layer to disable the
checks of another. Skipping the network check saved ~2.4% +/- ~0.2% of cp_time
(sys+intr) on the forwarding path of a 1GHz AMD G-T40N (apu1). Other checksum
speedups exist which do not skip the check.
ok claudio@ kn@ stsp@
|
|
doing some sort of time measurement. This is necessary since RDTSC
is not a serializing instruction. We can use LFENCE as the serializing
instruction instead of CPUID since all amd64 machines have SSE.
This considerably reduces the jitter in TSC skew measurements.
ok deraadt@, cheloha@, phessler@
|
|
This is the name the other BSDs use for this, there is no reason to
be different, the IPv6 RFCs call these addresses temporary, and some
software in ports wants to use this as well.
Most recently pointed out for firefox by landry.
OK claudio, sthen
|
|
|