Age | Commit message | Author |
|
When finishing a wseventvar in wsevent_fini(), clear the klist.
Otherwise knotes can be left dangling, which can crash the kernel.
In general, klist_invalidate() should happen after vdevgone() in order
to avoid a race with kevent registration. However, the current wscons
drivers clear the wsevent pointer (sc->sc_base.me_evp) before calling
wsevent_fini(). This prevents the drivers from registering new kevents.
Prompted by a report by Peter J. Philipp on bugs@
OK mvs@ miod@
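A teardown sketch of the ordering concern, using a stand-in structure rather than the real struct wseventvar (its member names are not shown here):

    #include <sys/param.h>
    #include <sys/event.h>

    struct my_eventvar {                /* stand-in for struct wseventvar */
        struct klist    ev_klist;       /* knotes registered via kqueue */
        /* ... event queue storage ... */
    };

    void
    my_event_fini(struct my_eventvar *ev)
    {
        /* detach remaining knotes before the queue goes away */
        klist_invalidate(&ev->ev_klist);
        klist_free(&ev->ev_klist);
    }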
|
|
ok kettenis@
|
|
After changing the TCP now tick to milliseconds, a 32-bit counter
will wrap around after 49 days of uptime. That may be a problem in
some places of our stack, so better use a 64-bit counter.
As the timestamp option is 32 bits wide in the TCP protocol, use the
lower 32 bits there. The existing casts to 32 bits should behave
correctly.
Start with a random 63-bit offset to avoid leaking the uptime. 2^63
milliseconds allow for 2.9*10^8 years of possible uptime.
OK yasuoka@
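As an illustration of the scheme (identifiers below are made up, not the ones used in the tree): a 64-bit millisecond counter starts at a random 63-bit offset, and only its low 32 bits go into the timestamp option.

    #include <stdint.h>

    static uint64_t ts_offset_ms;       /* random 63-bit start value */

    void
    ts_init(uint64_t random64)
    {
        /* keep 63 bits so the counter stays far from wrapping */
        ts_offset_ms = random64 & 0x7fffffffffffffffULL;
    }

    uint64_t
    ts_now_ms(uint64_t uptime_ms)
    {
        /* 64-bit milliseconds: no 49-day wrap, uptime not exposed */
        return ts_offset_ms + uptime_ms;
    }

    uint32_t
    ts_tcp_option(uint64_t now_ms)
    {
        /* the TCP timestamp option carries only the low 32 bits */
        return (uint32_t)now_ms;
    }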
|
|
ok kettenis@
|
|
IBT/BTI, because many more things are about to work correctly
|
|
for ramdisks. noticed by anton.
this must be reconsidered.
|
|
moving pf forward has been a real struggle, and pfsync has been a
constant source of pain. we have been papering over the problems
for a while now, but it reached the point that it needed a fundamental
restructure, which is what this diff is.
the big headliner changes in this diff are:
- pfsync specific locks
this is the whole reason for this diff.
rather than rely on NET_LOCK or KERNEL_LOCK or whatever, pfsync now
has its own locks to protect its internal data structures. this
is important because pfsync runs a bunch of timeouts and tasks to
push pfsync packets out on the wire, or when it's handling requests
generated by incoming pfsync packets, both of which happen outside
the execution of pf itself. having pfsync specific locks around pfsync data
structures makes the mutations of these data structures a lot more
explicit and auditable.
- partitioning
to enable future parallelisation of the network stack, this rewrite
includes support for pfsync to partition states into different "slices".
these slices run independently, ie, the states collected by one slice
are serialised into a packet separate from the states collected and
serialised by another slice.
states are mapped to pfsync slices based on the pf state hash, which
is the same hash that the rest of the network stack and multiq
hardware uses (see the sketch at the end of this message).
- no more pfsync called from netisr
pfsync used to be called from netisr to try and bundle packets, but now
that there are multiple pfsync slices this doesn't make sense. instead it
uses tasks in softnet tqs.
- improved bulk transfer handling
there's shiny new state machines around both the bulk transmit and
receive handling. pfsync used to do horrible things to carp demotion
counters, but now it is very predictable and returns the counters back
where they started.
- better tdb handling
the tdb handling was pretty hairy, but hrvoje has kicked this around
a lot with ipsec and sasyncd and we've found and fixed a bunch of
issues as a result of that testing.
- mpsafe pf state purges
this was committed previously, but because the locks pfsync relied on
weren't clear this just caused a ton of bugs. as part of this diff it's
now reliable, and moves a big chunk of work out from under KERNEL_LOCK,
which in turn improves the responsiveness and throughput of a firewall
even if you're not using pfsync.
there's a bunch of other little changes along the way, but the above are
the big ones.
hrvoje has done performance testing with this diff and notes a big
improvement when pfsync is not in use. performance when pfsync is
enabled is about the same, but i'm hoping the slices mean we can scale
along with pf as it improves.
lots (months) of testing by me and hrvoje on pfsync boxes
tests and ok sashan@
deraadt@ says this is a good time to put it in
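A sketch of the partitioning idea only; the slice count, structure and function names below are placeholders, not the driver's actual identifiers:

    #include <stdint.h>

    #define EX_NSLICES      4           /* placeholder slice count */

    struct ex_slice {
        /* per-slice lock, packet queue, timeouts, softnet task, ... */
        int     s_id;
    };

    static struct ex_slice ex_slices[EX_NSLICES];

    struct ex_slice *
    ex_slice_for_state(uint32_t state_hash)
    {
        /*
         * The pf state hash (shared with the stack and multiqueue
         * hardware) picks the slice, so all pfsync work for a state
         * stays on one slice and is serialised independently.
         */
        return &ex_slices[state_hash % EX_NSLICES];
    }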
|
|
From Rodrigo Siqueira
c50065a3927932cd9baf3d5c94c91b58c31200d5 in linux-6.1.y/6.1.38
2820433be2a33beb44b13b367e155cf221f29610 in mainline linux
|
|
From Bas Nieuwenhuizen
9d0b2afadfd71e9bedd593358bd7ac4701e46477 in linux-6.1.y/6.1.38
a2b308044dcaca8d3e580959a4f867a1d5c37fac in mainline linux
|
|
From Aric Cyr
a905b0b318ad7d37c3041573454129923e0a0723 in linux-6.1.y/6.1.38
32953485c558cecf08f33fbfa251e80e44cef981 in mainline linux
|
|
From Alvin Lee
dd6d6f9d47aebf50713fb857f91402a1c6c3131c in linux-6.1.y/6.1.38
3442f4e0e55555d14b099c17382453fdfd2508d5 in mainline linux
|
|
ok kettenis@ deraadt@
|
|
If fixed-link is present, populate the interface baudrate, link status
(full duplex or half duplex) and media type, and then call the statchg
handler to apply that config to the MAC. If fixed-link is specified
then do not attach a phy.
Note that phy lookup and reset still occur in case the device tree
still uses the deprecated snps,reset-gpio properties.
Based on if_dwqe_fdt.c v1.11 and dwqe.c v1.8.
Tested on a Banana Pi R1 (aka Lamobo R1), which has its dwge interface
connected directly to an ethernet switch.
ok kettenis@
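Roughly, the fixed-link handling looks like the sketch below; the helper name is made up and error handling is omitted, so treat it as an outline rather than the committed code:

    #include <sys/param.h>
    #include <net/if.h>
    #include <dev/ofw/openfirm.h>

    int
    ex_fixed_link(int node, uint64_t *baudrate, int *full_duplex)
    {
        int fl = OF_getnodebyname(node, "fixed-link");

        if (fl == 0)
            return 0;       /* no fixed-link: attach a PHY as usual */

        /* take speed and duplex straight from the device tree */
        *baudrate = IF_Mbps(OF_getpropint(fl, "speed", 1000));
        *full_duplex = (OF_getproplen(fl, "full-duplex") == 0);
        return 1;           /* caller applies this via the statchg handler */
    }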
|
|
compatible due to lack of endbr64. Replace the indirect call with a new
hv_hypercall_trampoline() routine which jumps to the hypercall page without any
indirection.
Allows me to boot OpenBSD using Hyper-V on Windows 11 again.
ok guenther@
|
|
establish one more interrupt than would be needed for per-VQ IRQs. This
meant that even though there were enough MSI-X vectors available, this path
could fail, roll back previously established interrupts, and switch to
shared IRQs as a fallback.
ok dv@
|
|
Softdep is a significant impediment to progressing in the vfs layer
so we plan to get it out of the way. It is too clever for us to
continue maintaining as it is.
ok kettenis@ kn@ tobhe@ and most of the g2k23 room except bluhm@
|
|
The corresponding task_add and task_del calls were operating on different
queues by mistake. Background scan tasks should now get cancelled properly
during driver state transitions.
ok mvs@
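In terms of the generic task API (queue and task names below are placeholders), the invariant is simply that task_del() must name the same taskq as the matching task_add():

    #include <sys/param.h>
    #include <sys/task.h>

    struct taskq *scan_tq;      /* the single queue used for bgscan work */
    struct task scan_task;

    void
    scan_sched(void)
    {
        task_add(scan_tq, &scan_task);
    }

    void
    scan_cancel(void)
    {
        /* same queue as task_add(), or the pending task is not removed */
        task_del(scan_tq, &scan_task);
    }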
|
|
holding a spinlock, eg. malloc's malloc_mutex in "Data modified on freelist ..."
triggers "acquiring blockable sleep lock with spinlock or critical section held"
since kpageflttrap() grabs the kernel lock before fault() to serialise multiple
threads/faults and avoid interleaved console text.
But fault() immediately sets the per-CPU panic string, so the kernel lock does
not really help here.
Use 'show panic' to recover from garbled console text if need be, as usual.
The i386 equivalent does not use the kernel lock, either.
OK bluhm kettenis
|
|
the VisionFive 2 from OpenBSD.
ok jsing@
|
|
From Min Li
1af1cd7be370b08694d8752c97325fe51fdab6aa in linux-6.1.y/6.1.36
982b173a6c6d9472730c3116051977e05d17c8c5 in mainline linux
|
|
From Tom Chung
9bcac453890bf2c0ab5a7cefb407c0a9d6cbc4cb in linux-6.1.y/6.1.36
ea2062dd1f0384ae1b136d333ee4ced15bedae38 in mainline linux
|
|
From Rodrigo Siqueira
e538342002cbe468224f71b7ae116586e55c1134 in linux-6.1.y/6.1.36
81f743a08f3b214638aa389e252ae5e6c3592e7c in mainline linux
|
|
From Rodrigo Siqueira
8d855bc67630fa2b17855d85de61b9cd4300e3ad in linux-6.1.y/6.1.36
f7511289821ffccc07579406d6ab520aa11049f5 in mainline linux
|
|
still crippled as the SD/MMC controllers only do 32-bit DMA.
ok jsing@
|
|
ok kettenis@
|
|
ok miod@
|
|
This refactoring is another step towards standalone socket buffer
locking. sblock() uses the M_WAITOK and M_NOWAIT flags passed as the
third argument, together with the SB_NOINTR flag on `sb_flags', to
control sleep behaviour. To perform an uninterruptible acquisition,
the SB_NOINTR flag has to be set before the sblock() call. Modifying
`sb_flags' requires holding solock() around sblock()/sbunlock(), which
makes a standalone call impossible.
Also, modifying `sb_flags' outside sblock()/sbunlock() makes the
uninterruptible acquisition code unwieldy. Right now only sorflush()
does this (and forgets to restore the SB_NOINTR flag, so a
shutdown(SHUT_RDWR) call permanently modifies the socket's locking
behaviour), which is not a big problem yet. But with standalone socket
buffer locking there will be many such places, so this construction is
unwanted.
Introduce a new SBL_NOINTR flag passed as the third sblock() argument.
The sblock() acquisition becomes uninterruptible when either the
existing SB_NOINTR flag is set on `sb_flags' or SBL_NOINTR was passed.
The M_WAITOK and M_NOWAIT flags belong to malloc(9), which has no
M_NOINTR flag and no reason to gain one. So for consistency introduce
a new SBL_WAIT flag and use SBL_WAIT and SBL_NOINTR instead of
M_WAITOK and the nonexistent M_NOINTR, respectively.
ok bluhm
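A usage sketch, assuming the post-change prototype sblock(struct socket *, struct sockbuf *, int); see sys/sys/socketvar.h for the authoritative interface:

    #include <sys/param.h>
    #include <sys/socket.h>
    #include <sys/socketvar.h>

    int
    ex_lock_rcv_nointr(struct socket *so)
    {
        /*
         * Sleeping, uninterruptible acquisition of the receive buffer,
         * without setting SB_NOINTR on `sb_flags' under solock() first.
         */
        return sblock(so, &so->so_rcv, SBL_WAIT | SBL_NOINTR);
    }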
|
|
ci_mds_tmp needs to be 32-byte aligned, otherwise we trip a CTASSERT
in amd64/cpu.c and break kernel compilation. However, ci_mds_tmp's
32-byte alignment is at risk: the size of schedstate_percpu is about
to change.
Move ci_curproc and ci_schedstate up just after ci_mds_buf. This puts
ci_mds_tmp at page offset 64 with no structs ahead of it in cpu_info.
With this arrangement it should remain 32-byte aligned without much
effort.
With input from guenther@.
ok guenther@
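Illustrative form of the compile-time check involved (the actual assertion in amd64/cpu.c may be written differently):

    #include <sys/param.h>
    #include <sys/systm.h>      /* CTASSERT() */
    #include <machine/cpu.h>    /* struct cpu_info */

    /* ci_mds_tmp must stay 32-byte aligned within struct cpu_info */
    CTASSERT(__builtin_offsetof(struct cpu_info, ci_mds_tmp) % 32 == 0);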
|
|
handler, allocate it ourselves. The firmware doesn't seem to mind
that it's us doing the allocation. This fixes a splassert, because
the code that retrieves the item reaches through the shared memory
driver, which acquires the hardware mutex, which in turn mallocs.
ok kettenis@
|
|
pf_open_trans() can issue for each clone of /dev/pf
to 512. pf_open_trans() is currently used by the
DIOCGETRULES ioctl(2). The limit prevents processes
from consuming all kernel memory by asking DIOCGETRULES
for more tickets. If DIOCGETRULES hits the limit, the
application will see an EBUSY error.
This diff was fine tuned with feedback from claudio@,
deraadt@ and kn@.
OK kn@
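A sketch of the limit check only; the constant and member names are placeholders, not pf(4)'s actual identifiers:

    #include <sys/param.h>
    #include <sys/errno.h>

    #define EX_TRANS_MAX    512         /* per-clone transaction limit */

    struct ex_pf_clone {
        unsigned int    c_ntrans;       /* open transactions on this clone */
    };

    int
    ex_open_trans(struct ex_pf_clone *c)
    {
        if (c->c_ntrans >= EX_TRANS_MAX)
            return EBUSY;               /* DIOCGETRULES caller sees EBUSY */
        c->c_ntrans++;
        return 0;
    }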
|
|
Pointed out by bluhm.
ok bluhm@
|
|
|
|
Based on an initial diff from jsing@
ok jsing@, patrick@
|
|
ok jsing@, patrick@
|
|
|
|
longer identify function boundaries and as such no kprobes were found anymore.
Adjust the parser accordingly.
ok mpi@
|
|
periodically read rules from pf(4) to consume all kernel
memory. The bug was discovered and root caused by florian@.
In this particular case it was snmpd(8) that ate all kernel
memory.
This commit introduces DIOCXEND to pf(4) so that applications such
as snmpd(8) and systat(1) can close the ticket/transaction when
they are done fetching the rules. This change also
updates snmpd(8) and systat(1) to use the newly introduced
DIOCXEND ioctl(2).
OK claudio@, deraadt@, kn@
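A userland usage sketch, assuming DIOCXEND takes the ticket returned by DIOCGETRULES (check pf(4) for the authoritative interface):

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <netinet/in.h>
    #include <net/pfvar.h>

    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    void
    ex_poll_rules(void)
    {
        struct pfioc_rule pr;
        int dev;

        if ((dev = open("/dev/pf", O_RDONLY)) == -1)
            err(1, "open /dev/pf");

        memset(&pr, 0, sizeof(pr));
        if (ioctl(dev, DIOCGETRULES, &pr) == -1)
            err(1, "DIOCGETRULES");

        /* ... fetch individual rules with DIOCGETRULE here ... */

        /* close the ticket/transaction instead of leaking it */
        if (ioctl(dev, DIOCXEND, &pr.ticket) == -1)
            err(1, "DIOCXEND");

        close(dev);
    }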
|
|
|
|
OK jmatthew@
|
|
Based on an initial diff from jsing@
ok jsing@
|
|
kstats. Some of the hardware counters are already used in
bge_stats_update_regs() to update interface counters and work around
hardware bugs, and all counters reset on read, so to keep things simple
the work is split between that function and bge_kstat_read(), which
reads the rest of the counters.
tested by bluhm@ on BCM5720 (with counters) and BCM5704 (without), and
by me on BCM5720 A0 (with counters and hardware bugs)
ok bluhm@ dlg@
|
|
ok kn@ bluhm@
|
|
stolen from jsing@
|
|
|
|
ok sashan@
|
|
ok bru@
|
|
|
|
ok stsp@
|
|
|
|
that the former's text comes from the latter's comments. Rationalize
capitalization, whitespace, and plural-vs-singular. Mark things
for automation in the future.
Prompted by loss of sync from the addition of M_IFGROUP and M_PF.
Previously worked up in discussion with schwarze@ and jmc@
ok deraadt@, miod@, jmc@
|