path: root/sys
Age Commit message Author
2020-07-07add kstat support for reading the "hardware" counters for each ring.David Gwynne
the counters happen to be a series of uint64_t values in memory, so we treat them as arrays that get mapped to a series of kstat_kv structs that are set up as 64 bit counters with either packet or byte counters as appropriate. this helps keep the code size down.

while we export the counters as separate kstats per rx and tx ring, you request an update from the hypervisor at the controller level. this code ratelimits these requests to 1 per second per interface to try and debounce this a bit so each kstat read doesn't cause a vmexit.

here's an example of the stats. note that we get to see how many packets that rx ring moderation drops for the first time. see the "no buffers" stat.

vmx0:0:rxq:5
    packets: 2372483 packets
    bytes: 3591909057 bytes
    qdrops: 0 packets
    errors: 0 packets
    qlen: 0 packets
...
vmx0:0:txq:5
    packets: 1316856 packets
    bytes: 86961577 bytes
    qdrops: 0 packets
    errors: 0 packets
    qlen: 1 packets
    maxqlen: 512 packets
    oactive: false
...
vmx0:0:vmx-rxstats:5
    LRO packets: 0 packets
    LRO bytes: 0 bytes
    ucast packets: 2372483 packets
    ucast bytes: 3591909053 bytes
    mcast packets: 0 packets
    mcast bytes: 0 bytes
    bcast packets: 0 packets
    bcast bytes: 0 bytes
    no buffers: 696 packets
    errors: 0 packets
...
vmx0:0:vmx-txstats:5
    TSO packets: 0 packets
    TSO bytes: 0 bytes
    ucast packets: 1316839 packets
    ucast bytes: 86960455 bytes
    mcast packets: 0 packets
    mcast bytes: 0 bytes
    bcast packets: 0 packets
    bcast bytes: 0 bytes
    errors: 0 packets
    discards: 0 packets
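The two ideas in this commit message (walking a flat array of uint64_t counters into labelled key/value slots, and limiting refresh requests to one per second per interface) can be illustrated with a small userland sketch. This is not the driver code: `struct demo_kv`, `demo_rx_tmpl`, `demo_map_counters()`, `struct demo_ratelimit` and `demo_may_refresh()` are hypothetical names, and the counter order in the template is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one key/value slot; not the real struct kstat_kv. */
struct demo_kv {
	const char *name;	/* label shown to userland */
	const char *unit;	/* "packets" or "bytes" */
	uint64_t value;		/* 64 bit counter */
};

/* Labels in the same order as the raw uint64_t counters (order assumed). */
static const struct {
	const char *name;
	const char *unit;
} demo_rx_tmpl[] = {
	{ "LRO packets",   "packets" },
	{ "LRO bytes",     "bytes" },
	{ "ucast packets", "packets" },
	{ "ucast bytes",   "bytes" },
};

#define DEMO_NRX (sizeof(demo_rx_tmpl) / sizeof(demo_rx_tmpl[0]))

/* Treat the counters as a plain array and fill one slot per entry. */
static void
demo_map_counters(struct demo_kv *kvs, const uint64_t *hw)
{
	size_t i;

	for (i = 0; i < DEMO_NRX; i++) {
		kvs[i].name = demo_rx_tmpl[i].name;
		kvs[i].unit = demo_rx_tmpl[i].unit;
		kvs[i].value = hw[i];
	}
}

/* Per-interface state for limiting refresh requests. */
struct demo_ratelimit {
	int valid;		/* has a refresh happened yet? */
	int64_t last_sec;	/* second of the last refresh */
};

/* Return 1 if a refresh may be requested now, 0 if it is too soon. */
static int
demo_may_refresh(struct demo_ratelimit *rl, int64_t now_sec)
{
	if (rl->valid && now_sec - rl->last_sec < 1)
		return 0;
	rl->valid = 1;
	rl->last_sec = now_sec;
	return 1;
}
```

A reader of any ring's stats would call `demo_may_refresh()` on the shared per-interface state first and only ask for fresh counters when it returns 1, so several reads within the same second reuse the last snapshot.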
2020-07-07add kstats for rx queues (ifiqs) and transmit queues (ifqs).David Gwynne
this means you can observe what the network stack is trying to do when it's working with a nic driver that supports multiple rings. a nic with only one set of rings still gets queues though, and this still exports their stats.

here is a small example of what kstat(8) currently outputs for these stats:

em0:0:rxq:0
    packets: 2292 packets
    bytes: 229846 bytes
    qdrops: 0 packets
    errors: 0 packets
    qlen: 0 packets
em0:0:txq:0
    packets: 1297 packets
    bytes: 193413 bytes
    qdrops: 0 packets
    errors: 0 packets
    qlen: 0 packets
    maxqlen: 511 packets
    oactive: false
2020-07-06Wire down the timekeep page. If we don't do this, the pagedaemon mayMark Kettenis
page it out and bad things will happen when we try to page it back in from within the clock interrupt handler. While there, make sure we set timekeep_object back to NULL if we fail to make the timekeep page into kernel space. ok deraadt@ (who had a very similar diff)
2020-07-06Protect the whole pipex(4) layer by NET_LOCK(). pipex(4) wasmvs
simultaneously protected by KERNEL_LOCK() and NET_LOCK() and now we have the only lock for it. This step reduces locking mess in this layer. ok mpi@
2020-07-06pipex_rele_session() frees memory pointed by `old_session_keys'. Use it inmvs
pipex_destroy_session() instead of pool_put(9) to prevent memory leak. ok mpi@
2020-07-06fix spellingTheo de Raadt
2020-07-06Save and restore FPU around signal handlers.Mark Kettenis
2020-07-06Hide most of the contents behind #ifdef _KERNEL. Reorganize the file aMark Kettenis
bit to achieve this with a single #ifdef/#endif pair.
2020-07-06IEEE1275 (Open Firmware) defines that parameter name strings can have aMark Kettenis
length of up to 31 characters. This limit is also present in the flattened device tree specification. Unfortunately this limit isn't enforced by the tooling and there are systems in the wild that use longer strings. This includes the device trees used on POWER9 systems and has been seen on some ARM systems as well. So bump the buffer size from 32 bytes (31 + terminating NUL) to 64 bytes. Centrally define OFMAXPARAM to this value (in <dev/ofw/openfirm.h>) replacing the various OPROMMAXPARAM definitions scattered around the tree to make sure the FDT implementation of OF_nextprop() uses the same buffer size as its consumers. Eliminate the static buffer in various openprom(4) implementations on FDT systems. Makes it possible to dump the full device tree on POWER9 systems using eeprom -p. ok deraadt@, visa@
2020-07-06defer access of fb_info pointer in drm_fb_helper_hotplug_event()Jonathan Gray
Fixes a regression from rev 1.24 which led to a page fault reported by Martin Ziemer. ok stsp@
2020-07-06Add support for timecounting in userland.Paul Irofti
This diff exposes parts of clock_gettime(2) and gettimeofday(2) to userland via libc, liberating processes from the need for a context switch every time they want to count the passage of time. If a timecounter clock can be exposed to userland then it needs to set its tc_user member to a non-zero value. Tested with one or multiple counters per architecture. The timing data is shared through a pointer found in the new ELF auxiliary vector AUX_openbsd_timekeep containing timehands information that is frequently updated by the kernel. Timing differences between the last kernel update and the current time are adjusted in userland by the tc_get_timecount() function inside the MD usertc.c file. This permits a much more responsive environment, quite visible in browsers, office programs and gaming (apparently one is able to fly in Minecraft now). Tested by robert@, sthen@, naddy@, kmos@, phessler@, and many others! OK from at least kettenis@, cheloha@, naddy@, sthen@
2020-07-06Repair athn(4) in client mode against WPA2 access points.Stefan Sperling
Client mode was subtly broken after support for CCMP offload was added. In client mode we should be using the first key table slot for our CCMP pairwise key, not an arbitrary slot based on our association ID (as is done in hostap mode). When the interface came up again after being reset the CCMP hardware engine was left in a non-working state. Apparently the key table was messed up or contained stale entries. Fix a potential timing issue in the code path which attempts to clear the key table on device power-up. For good measure, also clear the key table before the device is powered down. While here, fix off-by-ones in key table slot range checks. Problems reported by Tim Chase, Kevin Chadwick, Austin Hook, Stefan Kapfhammer. Fix tested by me on AR9280 (PCI) and AR9271 (USB) and Kevin Chadwick on AR9280
2020-07-06match on D-Link DWA-121 rev B1Jonathan Gray
from Miguel Landaeta
2020-07-06regenJonathan Gray
2020-07-06add D-Link DWA-121 rev B1Jonathan Gray
from Miguel Landaeta
2020-07-06Fix pmap_pted_ro() such that it actually takes away PROT_EXEC when asked toMark Kettenis
do so.
2020-07-06wire up kstat(4)David Gwynne
"looks right" deraadt@
2020-07-06kstat does open, close, and ioctl.David Gwynne
2020-07-06tell the kernel how to build kstatDavid Gwynne
it's like ksyms, but different
2020-07-06add kstat(4), a subsystem to let the kernel expose statistics to userland.David Gwynne
a kstat is an arbitrary chunk of data that a part of the kernel wants to expose to userland. data could mean just a chunk of raw bytes, but generally a kernel subsystem will provide a series of kstat key/value chunks.

this code is loosely modelled on kstat in solaris, but with a bunch of simplifications (we don't want to provide write support for example). the named or key/value structure is significantly richer in this version too. eg, solaris kstat named data supports integer types, but this version offers differentiation between counters (like the number of packets transmitted on an interface) and gauges (like how long the transmit queue is) and lets kernel providers say what the units are (eg, packets vs bytes vs cycles).

the main motivation for this is to improve the visibility of what the kernel is doing while it's running. i wrote this as part of the recent work we've been doing on multiqueue and rss/toeplitz so i could verify that network load is actually spread across multiple rings on a single nic. without this we would be wasting memory and interrupt vectors on multiple rings and still just using the 1st one, and no one would know cos there's no way to see what rings are being used.

another thing that can become visible is the different counters that various network cards provide. i'm particularly interested in seeing if packets get dropped because the rings aren't filled fully, which is an effect we've never really observed directly.

a small part of wanting this is cos i spend an annoying amount of time instrumenting the kernel when hacking code in it. if most of the scaffolding for the instrumentation is already there, i can avoid repeatedly writing that code and save time.

iterated a few times with claudio@ and deraadt@
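The counter/gauge/unit distinction described above can be pictured with a small userland sketch. The types and functions here (`struct demo_stat`, `enum demo_kind`, `enum demo_unit`, `demo_format()`, `demo_interval_value()`) are hypothetical and are not the kstat(4) definitions; they only show why tagging a value with its kind and unit is useful to a consumer.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tags; the real kstat key/value types are not shown here. */
enum demo_kind { DEMO_COUNTER, DEMO_GAUGE };
enum demo_unit { DEMO_UNIT_NONE, DEMO_UNIT_PACKETS, DEMO_UNIT_BYTES,
    DEMO_UNIT_CYCLES };

struct demo_stat {
	const char *key;
	enum demo_kind kind;	/* counter: only grows; gauge: current level */
	enum demo_unit unit;
	uint64_t value;
};

static const char *
demo_unit_name(enum demo_unit unit)
{
	switch (unit) {
	case DEMO_UNIT_PACKETS:	return "packets";
	case DEMO_UNIT_BYTES:	return "bytes";
	case DEMO_UNIT_CYCLES:	return "cycles";
	default:		return "";
	}
}

/* Render "key: value unit", omitting the unit when there is none. */
static int
demo_format(const struct demo_stat *s, char *buf, size_t len)
{
	const char *unit = demo_unit_name(s->unit);

	return snprintf(buf, len, "%s: %" PRIu64 "%s%s", s->key, s->value,
	    *unit != '\0' ? " " : "", unit);
}

/*
 * What a tool would report for the interval between two samples:
 * the difference for a counter, the latest reading for a gauge.
 */
static uint64_t
demo_interval_value(const struct demo_stat *prev, const struct demo_stat *cur)
{
	if (cur->kind == DEMO_COUNTER)
		return cur->value - prev->value;
	return cur->value;
}
```

Because the kind travels with the value, a generic viewer can decide on its own whether subtracting two samples makes sense, without knowing anything about the subsystem that exported them.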
2020-07-05Double checking you committed the correct diff sometimes paysKenneth R Westerback
off. Sigh.
2020-07-05use the intended operator in cpu_rnd_messybits(); ok kettenis@Christian Weisgerber
2020-07-05Nuke struct scsi_link's "scsibus" member. The two drivers using itKenneth R Westerback
(ahc(4) and qlw(4)) can just compare the values of the "bus" member directly. A slightly different path to the same result that matthew@ traversed in his work culminating in scsiconf.h r1.146.
2020-07-05Use scsi_link's 'bus' field rather than slightly more obscureKenneth R Westerback
dev_parent twiddling. Improves chances of progress on eliminating various bus related fields from struct scsi_link. Tested by kmos@ "might actually be an improvement" kettenis@.
2020-07-05Enable xhci(4) and a (deliberately) tiny set of USB devices.Mark Kettenis
2020-07-05We need to set the bypass bit for "raw" DMA memory as well.Mark Kettenis
2020-07-05Count traps and interrupts. And count system calls in the same placeMark Kettenis
since this makes it easier to reason about the accounting.
2020-07-05Don't forget to schedule an AST in need_resched().Mark Kettenis
2020-07-05uvideo_querycap(): Set the 'device_caps' field of struct v4l2_capabilityLandry Breuil
like done in utvfu(4) Fixes webcam detection in firefox 78, where code was added to check for V4L2_CAP_VIDEO_CAPTURE capability on 'device_caps', whereas we only set it in the 'capabilities' field. According to https://www.kernel.org/doc/html/v4.14/media/uapi/v4l/vidioc-querycap.html#description those distinct fields are here for drivers that provide several devices, but firefox decided to check for 'device_caps' field instead of 'capability' (cf https://hg.mozilla.org/integration/autoland/rev/33facf191f23) - so fill the field for compatibility reasons, while https://bugzilla.mozilla.org/show_bug.cgi?id=1650572 discusses with upstream what's the right way. ok mglocker@
2020-07-05Save FPU state to PCB before running a signal handler. This doesn'tMark Kettenis
completely fix the case where the FPU is used in a signal handler but it is part of the solution and makes sure the processor mode check in sys_sigreturn() passes if the process was using the FPU when the signal happened.
2020-07-05Make sure we return ENAMETOOLONG when copying a string into a buffer ofMark Kettenis
zero bytes.
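As an illustration of the behaviour this commit describes, here is a minimal userland sketch of a bounded string copy that reports ENAMETOOLONG when the destination has no room, including the zero-length case. `demo_copystr()` is a hypothetical helper written for this example; it is not the kernel's implementation.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Copy a NUL-terminated string into a buffer of len bytes.
 * Returns 0 on success, ENAMETOOLONG if the terminating NUL did not fit.
 * If done is not NULL it receives the number of bytes copied.
 */
static int
demo_copystr(const char *src, char *dst, size_t len, size_t *done)
{
	size_t i;

	for (i = 0; i < len; i++) {
		dst[i] = src[i];
		if (src[i] == '\0') {
			if (done != NULL)
				*done = i + 1;	/* count includes the NUL */
			return 0;
		}
	}

	/* Also reached when len == 0: nothing fits, so report the error. */
	if (done != NULL)
		*done = i;
	return ENAMETOOLONG;
}
```

With this loop structure the zero-byte buffer needs no special case: the loop body never runs and the function falls through to the error return.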
2020-07-05We're self-hosted now.Mark Kettenis
2020-07-05Fix mbuf leak in urtwn(4) with frames that are CCMP-crypted in hardware.Stefan Sperling
Problem reported and fix tested by Bastian Wessling on bugs@ ok jmatthew@
2020-07-05match on "ti,am335-sdhci" used since linux 5.8-rc3Jonathan Gray
2020-07-04Bump the size of the ramdisk.Mark Kettenis
2020-07-04Use block device numbers instead of character device numbers.Mark Kettenis
2020-07-04Nestle all sc_link initialization near config_found() invocation.Kenneth R Westerback
2020-07-04Fill in nam2blk array.Mark Kettenis
2020-07-04Nestle all sc_c.sc_link initialization near config_found() invocation.Kenneth R Westerback
2020-07-04Set dsisr member of the trapframe struct to a defined value before fallingMark Kettenis
through into the data storage interrupt when handling a data segment interrupt.
2020-07-04OF_finddevice() returns -1 upon failure. Fix various checks of the returnMark Kettenis
value. Makes things work again on the rpi3. ok jsg@
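The kind of check being fixed can be sketched as follows. `demo_finddevice()` is a hypothetical stub that follows the convention stated in the commit message (a node handle on success, -1 on failure); it does not consult a real device tree, and the path and handle value are made up.

```c
#include <string.h>

/* Hypothetical stub: returns a handle, or -1 on failure. */
static int
demo_finddevice(const char *path)
{
	if (strcmp(path, "/chosen") == 0)
		return 0x1234;	/* made-up handle value */
	return -1;
}

/* Return 1 if the node exists, 0 otherwise. */
static int
demo_node_exists(const char *path)
{
	int node = demo_finddevice(path);

	/* Compare against -1; a test such as "node == 0" misses the failure. */
	return node != -1;
}
```

A caller that tested the result against 0 would treat -1 as a valid handle and carry on with a node that does not exist.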
2020-07-04Fix intermittent failing device initialization seen on some SynapticsMarcus Glocker
devices which has been re-introduced by pms.c revision 1.92. ok tb@
2020-07-04Do the same as libc, store "-1" in the return value when a syscall failed.Martin Pieuchot
Simplify the logic by always exporting the return value and errno in the syscall profiler.
2020-07-04Remove no-op cn30xxgmx_reset_board().Visa Hankala
2020-07-04Use klist_invalidate() in knote_processexit()Visa Hankala
This leaves knote_remove() for kqueue's internal use. As a result, knote_remove() is used to drop knotes from the knlist of a single kqueue instance. klist_invalidate() clears knotes from a klist that can contain entries from different kqueue instances. Use FILTEROP_ISFD to control how klist_invalidate() treats knotes, to preserve the current behaviour of knote_processexit(). All the existing callers of klist_invalidate() are fd-based. The existing code rewires and activates knotes to give userspace a clear indication that the state of the fd has changed. In knote_processexit(), any remaining knotes in ps_klist are non-fd-based (EVFILT_SIGNAL). Those are dropped without notifying userspace. OK mpi@
2020-07-04It's been agreed upon that global locks should be expressed usinganton
capital letters in locking annotations. Therefore harmonize the existing annotations. Also, if multiple locks are required they should be delimited using commas. ok mpi@
2020-07-04Permit the stack to check transport and network checksums. Although the linkRichard Procter
provides stronger integrity checks, it needn't cover the end-to-end transport path. And it is in any case a layer violation for one layer to disable the checks of another. Skipping the network check saved ~2.4% +/- ~0.2% of cp_time (sys+intr) on the forwarding path of a 1Ghz AMD G-T40N (apu1). Other checksum speedups exist which do not skip the check. ok claudio@ kn@ stsp@
2020-07-03Use an LFENCE instruction everywhere where we use RDTSC when we areMark Kettenis
doing some sort of time measurement. This is necessary since RDTSC is not a serializing instruction. We can use LFENCE as the serializing instruction instead of CPUID since all amd64 machines have SSE. This considerably reduces the jitter in TSC skew measurements. ok deraadt@, cheloha@, phessler@
2020-07-03Rename IN6_IFF_PRIVACY to IN6_IFF_TEMPORARY.Florian Obser
This is the name the other BSDs use for this, there is no reason to be different, the IPv6 RFCs call these addresses temporary, and some software in ports wants to use this as well. Most recently pointed out for firefox by landry. OK claudio, sthen
2020-07-03We need a RAMDISK kernel config as well of course.Mark Kettenis