Age | Commit message | Author |
|
if_detach() will do this.
ok kn@
|
|
and were kept only for backward compatibility reasons.
ok mpi@ yasuoka@
|
|
bpf_catchpacket had a chunk to deal with reader timeouts, but that
has largely been moved to bpfread. the vestigial code that was left
still tried to wake up a reader when a buffer got full, but there
already is a chunk of code that wakes up readers when the buffer
gets full.
bpf_wakeup now checks for readers before calling wakeup directly,
rather than pushing the wakeup to a task and calling it unconditionally.
the task_add is now only done when the bpfdesc actually has something
that needs it.
ok visa@
|
|
Change bd_rtout to a uint64_t of nanoseconds. Update the code in
bpfioctl() and bpfread() accordingly.
Add a local copy of nsecuptime() to make the diff smaller. This will
need to move to kern_tc.c if/when we have another user elsewhere in
the kernel.
Prompted by mpi@. With input from dlg@.
ok dlg@ mpi@ visa@
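
A minimal userland sketch of the unit change described above, assuming the timeout arrives as a struct timeval and is kept as a uint64_t of nanoseconds. The helper names tv_to_nsec() and nsec_remaining() are hypothetical and are not taken from the kernel sources; overflow for very large tv_sec values is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: convert a non-negative timeval to nanoseconds. */
static uint64_t
tv_to_nsec(const struct timeval *tv)
{
	assert(tv->tv_sec >= 0);
	assert(tv->tv_usec >= 0 && tv->tv_usec < 1000000);
	return (uint64_t)tv->tv_sec * 1000000000ULL +
	    (uint64_t)tv->tv_usec * 1000ULL;
}

/*
 * Hypothetical helper: how much of a timeout is left, given the uptime
 * (in nanoseconds) when the wait started and the current uptime.
 * Returns 0 once the timeout has expired.
 */
static uint64_t
nsec_remaining(uint64_t start, uint64_t now, uint64_t timeout)
{
	uint64_t elapsed = now - start;

	return elapsed >= timeout ? 0 : timeout - elapsed;
}
```

With a single scalar in one unit, the "time left" computation is plain unsigned arithmetic and 0 can mean "expired" without ambiguity.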
|
|
ok claudio@ kn@
|
|
local variable. This argument was always passed as 0.
ok kn@
|
|
enc(4) does not use the ifqueue API at all; IPsec packets are directly
transformed in the IP input/output routines.
enc_start() is never called (by design) so remove it for clarity.
OK mpi
|
|
bd_rdStart is strange. It nominally represents the start of a read(2)
on a given bpf(4) descriptor, but there are several problems with it:
1. If there are multiple readers, the bd_rdStart is not set by subsequent
readers, so their timeout is screwed up. The read timeout should really
be tracked on a per-thread basis in bpfread().
2. We set bd_rdStart for poll(2), select(2), and kevent(2), even though
that makes no sense. We should not be setting bd_rdStart in bpfpoll()
or bpfkqfilter().
3. bd_rdStart is buggy. If ticks is 0 when the read starts then
bpf_catchpacket() won't wake up the reader. This is a problem
inherent to the design of bd_rdStart: it serves as both a boolean
and a scalar value, even though 0 is a valid value in the scalar
range.
So let's replace it with a better struct member. "bd_nreaders" is a
count of threads sleeping in bpfread(). It is incremented before a
thread goes to sleep in bpfread() and decremented when a thread wakes
up. If bd_nreaders is greater than zero when we reach bpf_catchpacket()
and fbuf is non-NULL we wake up all readers.
The read timeout, if any, is now tracked locally by the thread in
bpfread().
Unlike bd_rdStart, bpfpoll() and bpfkqfilter() don't touch
bd_nreaders.
Prompted by mpi@. Basic idea from dlg@. Lots of input from dlg@.
Tested by dlg@ with tcpdump(8) (blocking read) and flow-collector
(https://github.com/eait-itig/flow-collector, non-blocking read).
ok dlg@
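
The counting scheme described above can be modelled in a few lines. This is a simplified, single-threaded sketch: struct toy_desc and the toy_* functions are hypothetical stand-ins, not the kernel's bpf code, and the locking a real descriptor would need is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant parts of a bpf descriptor. */
struct toy_desc {
	int	 nreaders;	/* threads currently sleeping in read */
	void	*fbuf;		/* free buffer, NULL if there is none */
};

/* Called just before a reader goes to sleep. */
static void
toy_reader_enter(struct toy_desc *d)
{
	d->nreaders++;
}

/* Called when a reader wakes up, for whatever reason. */
static void
toy_reader_leave(struct toy_desc *d)
{
	assert(d->nreaders > 0);
	d->nreaders--;
}

/* Decision on the packet-capture path: wake the sleepers or not. */
static int
toy_should_wake(const struct toy_desc *d)
{
	return d->nreaders > 0 && d->fbuf != NULL;
}

/*
 * Simplified model of the old scheme: a start time that doubles as a
 * "reader present" flag. A start time of 0 cannot be told apart from
 * "no reader".
 */
static int
toy_old_reader_present(int rdstart)
{
	return rdstart != 0;
}
```

The counter stays correct with several readers, and it never confuses a legitimate value with "nobody is waiting", which is the problem listed as item 3.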
|
|
Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
These functions assume that the caller has locked the klist. The current
state of locking remains intact because the kernel lock is still used
with all klists.
Add new functions klist_insert() and klist_remove() that lock the klist
internally. This allows some code simplification.
OK mpi@
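
The split between "_locked" functions and self-locking wrappers can be sketched as follows. Everything here is a toy model: toy_lock is a flag that only records whether the lock is held, and the toy_klist_* names and struct layouts are assumptions made for illustration, not the kernel's klist implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy lock: just tracks whether it is held, for single-threaded demos. */
struct toy_lock { int held; };

static void
toy_lock_enter(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void
toy_lock_leave(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct toy_note { struct toy_note *next; };
struct toy_klist { struct toy_lock lock; struct toy_note *head; };

/* The caller must already hold kl->lock. */
static void
toy_klist_insert_locked(struct toy_klist *kl, struct toy_note *n)
{
	assert(kl->lock.held);
	n->next = kl->head;
	kl->head = n;
}

/* The caller must already hold kl->lock. */
static void
toy_klist_remove_locked(struct toy_klist *kl, struct toy_note *n)
{
	struct toy_note **pp;

	assert(kl->lock.held);
	for (pp = &kl->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			return;
		}
	}
}

/* Wrappers that take and release the lock themselves. */
static void
toy_klist_insert(struct toy_klist *kl, struct toy_note *n)
{
	toy_lock_enter(&kl->lock);
	toy_klist_insert_locked(kl, n);
	toy_lock_leave(&kl->lock);
}

static void
toy_klist_remove(struct toy_klist *kl, struct toy_note *n)
{
	toy_lock_enter(&kl->lock);
	toy_klist_remove_locked(kl, n);
	toy_lock_leave(&kl->lock);
}
```

Callers that do nothing else under the lock can use the wrappers, while callers that need to group several operations keep the lock and call the "_locked" variants.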
|
|
Ranges where the left boundary is bigger than the right one are always bogus
as they work like `port any' (`port 34<>12' means "all ports") or in a way
that inverts the rule's action (`pass ... port 34:12' means "pass no port at
all").
Add checks for all ranges and invalidate those that yield no or all ports.
For this to work on redirections, make pfctl(8) pass the range's type,
otherwise boundary-including ranges are not detected as such; that is to
say, `struct pf_pool's `port_op' member was unused in the kernel so far.
`rdr-to' rules with invalid ranges could panic the kernel when hit.
Reported-by: syzbot+9c309db201f06e39a8ba@syzkaller.appspotmail.com
OK sashan
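
To see why reversed ranges are bogus, the three range operators can be modelled and counted by brute force. This is a sketch under assumptions: the enum and the toy_* functions are hypothetical, the operator semantics follow the examples in the message above, and the actual check added to pfctl(8) and the kernel is not shown here and may differ in detail.

```c
#include <assert.h>

/* Hypothetical names for the `><', `<>' and `:' range operators. */
enum toy_range_op { TOY_INSIDE, TOY_OUTSIDE, TOY_INCLUSIVE };

/* Does `port' match the range lo..hi under the given operator? */
static int
toy_range_match(enum toy_range_op op, unsigned lo, unsigned hi,
    unsigned port)
{
	assert(port <= 65535);
	switch (op) {
	case TOY_INSIDE:	/* lo >< hi, boundaries excluded */
		return port > lo && port < hi;
	case TOY_OUTSIDE:	/* lo <> hi, boundaries excluded */
		return port < lo || port > hi;
	case TOY_INCLUSIVE:	/* lo : hi, boundaries included */
		return port >= lo && port <= hi;
	}
	return 0;
}

/* Count how many of the 65536 possible ports a range matches. */
static unsigned
toy_count_matches(enum toy_range_op op, unsigned lo, unsigned hi)
{
	unsigned port, n = 0;

	for (port = 0; port <= 65535; port++)
		n += toy_range_match(op, lo, hi, port);
	return n;
}

/* Simplest sanity check: the left boundary must not exceed the right. */
static int
toy_range_sane(unsigned lo, unsigned hi)
{
	return lo <= hi;
}
```

In this model `34<>12' matches all 65536 ports and `34:12' matches none, so rejecting ranges whose left boundary is bigger than the right one removes both degenerate cases.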
|
|
OK mpi@
|
|
this is to avoid a timestamp being used on the way out of the stack
(eg, in bpf), or if it reenters the stack (eg, if it goes between
rdomains with pair(4)).
|
|
destination address and their netmasks match, otherwise return EINVAL.
ok bluhm@ patrick@
|
|
OK bluhm@
|
|
OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
|
|
the latter is too clever, and nothing else does it.
|
|
this is so _bpf_mtap can look at the mbuf with packet headers on
it so it can fill in more stuff in the bpf_hdr struct.
i've been running this in production for most of a month now and
it's working well.
|
|
|
|
Reading and writing bd_rtout is not an atomic operation, so it needs
to be done under the per-descriptor mutex.
While here, start annotating locking in bpfdesc.h. There's lots more
to do on this front, but you have to start somewhere.
Tweaked by mpi@.
ok mpi@
|
|
putting packets into random buckets means packets in a flow/connection
will be reordered. pf assigns a flowid if it's enabled, and you need
pf to configure codel, so it's reasonable to assume that most packets
will have a flowid. using bucket 0 like this is what we do in most
other places that bin packets with the flowid.
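
A minimal sketch of the bucket choice described above, assuming a packet carries an optional 32-bit flow id. struct toy_pkt and toy_pick_bucket() are hypothetical names, and the modulo mapping is an assumption rather than the exact hash used by the queueing code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet metadata: a flow id that may or may not be set. */
struct toy_pkt {
	uint32_t	flowid;
	int		has_flowid;
};

/*
 * Pick a bucket for a packet. Packets of one flow always land in the
 * same bucket, so they cannot overtake each other. Packets without a
 * flow id all go to bucket 0 instead of a random bucket.
 */
static unsigned
toy_pick_bucket(const struct toy_pkt *p, unsigned nbuckets)
{
	assert(nbuckets > 0);
	if (!p->has_flowid)
		return 0;
	return p->flowid % nbuckets;
}
```

The important property is determinism: the same input always maps to the same bucket, which keeps the packets of a flow in order.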
|
|
this "fixes" TCP going over an interface with fq codel enabled. the
way the codel code classifies a packet without a flowid set is to
randomly assign it to a bucket. this in turn means that packets
will get reordered, and tcp hates that.
sthen was able to find a test case and narrow down when the problem
appeared, which helped greatly.
tested by sthen@ and millert@
ok sashan@ jmatthew@
|
|
Fixed up a reference to gre_wccp where a fixed value from the WCCP
standard was intended.
ok gkoehler@
|
|
|
|
issue noticed by sthen@. fix discussed with bluhm@ and procter@
OK bluhm@, kn@, procter@
|
|
This was triggering a WITNESS detection
witness: lock_object uninitialized: 0xffff800000bcf0d8
Starting stack trace...
witness_checkorder(ffff800000bcf0d8,9,0) at witness_checkorder+0xab
rw_enter_write(ffff800000bcf0c8) at rw_enter_write+0x43
noise_remote_decrypt(ffff800000bcea48,c4992785,0,fffffd80073c89bc,10) at noise_remote_decrypt+0x135
wg_decap(ffff80000054a000,fffffd805f53ac00) at wg_decap+0xda
wg_decap_worker(ffff80000054a000) at wg_decap_worker+0x7a
taskq_thread(ffff80000012d900) at taskq_thread+0x9f
alternating between two lock objects. From Matt Dunwoodie, thanks semarie@
for explanations about witness and looking at the code.
|
|
ok denis@, jmatthew@
|
|
- Move most of the processing out of rtable.c (reasonable tb@, ok bluhm@)
- Remove memory allocation, store pointer to existing ifaddr
- Fix tunnel interface handling
looks fine mpi@
|
|
Tested with multiple Windows 10 Pro (ver 2004) clients, and OpenBSD+iked
as the server.
OK tobhe@ sthen@ kn@
|
|
Advised by bluhm@
|
|
Unlike the other cases of sysctl_bounded_arr this one uses a dynamic limit.
OK millert@
|
|
this helps nvgre follow things like carp masters changing on the
inside of the virtual network.
"makes sense" jmatthew@
|
|
fixes a "noise_keypair: lock not held" panic observed by Caspar Schutijser
from Jason A. Donenfeld
|
|
Based/previous work on an idea from deraadt@
Input from claudio@, djm@, deraadt@, sthen@
OK deraadt@
|
|
Reported-by: syzbot+b9af9c29ed1a6dabda25@syzkaller.appspotmail.com
OK anton@
|
|
file as part of tcpdump(8). Unbreaks the tree.
ok deraadt@
|
|
OK mpi
|
|
Used a different variable to not shadow `entry' allocated before grabbing
the lock.
|
|
outside of NET_LOCK()/PF_LOCK() scope in easy spots.
OK kn@
|
|
have to pull in <sys/param.h>
ok kn@ sashan@ deraadt@
|
|
|
|
if_clone_{create,destroy}(). This fixes the races described below.
if_clone_{create,destroy}() are kernel locked, but since they pass through
various sleep points introduced by rwlocks and M_WAITOK allocations,
without serialization they can interleave and race with each other.
The avoided races are:
1. While one thread performs if_clone_create(), a concurrent thread performing
if_clone_create() can attach an `ifp' with the same `if_xname', leaving
`if_list', where all attached interfaces are linked, inconsistent.
2. While one thread performs if_clone_create(), a concurrent thread performing
if_clone_destroy() can kill this incomplete `ifp'.
3. While one thread performs if_clone_destroy(), a concurrent thread performing
if_clone_destroy() can kill this dying `ifp'.
ok claudio@ kn@ mpi@ sashan@
|
|
unused by the rule. So skip the rest of the check in that case.
Fixes ruleset loading for semarie@
OK semarie@
|
|
Unlike "... rtable N", pf.conf(5)'s "on rdomain N" does not alter packet
state and will always work no matter if rdomain N currently exists or not,
i.e. the rule "pass on rdomain 42" will simply match (and pass) packets if
rdomain 42 exists, and it will simply not match (neither pass nor block)
packets if 42 does not exist.
There's no need to reload the ruleset whenever routing domains are created
or deleted, which can already be observed now by creating an rdomain,
loading rules referencing it and deleting the same rdomain immediately
afterwards: pf will continue to work as expected.
Relax both pfctl(8)'s parser check as well as pf(4)'s copyin routine to
accept any valid routing domain ID without expecting it to exist at the time
of ruleset creation - this lifts the requirement to create rdomains before
referencing them in pf.conf while keeping pf behaviour unchanged.
Prompted by yasuoka's recent pfctl parse.y r1.702 commit requiring an rtable
to exist upon ruleset creation.
Discussed with claudio and bluhm at k2k20.
Feedback sashan
OK sashan yasuoka claudio
|
|
|
|
ok deraadt@ claudio@
|
|
"Correct" by deraadt@
|
|
ok mpi@
|
|
pppx_if_qstart() and pppac_qstart() with netlock held. Otherwise we can't
be sure about netlock status while performing these handlers.
Problem reported by Glen Faustino.
ok yasuoka@
|
|
Pretty much all members are under the net lock, some are protected by
both net and kernel lock, e.g. the start routine is called with
KERNEL_LOCK().
OK mpi
|
|
There is no reason to change flags on member interfaces when removing
them, aggr(4) does not pull its members down either.
OK florian bluhm
|