Age | Commit message (Collapse) | Author |
|
ifq_deq_set_oactive is a variation on ifq_set_oactive that can be
called inside an ifq_deq_begin "transaction".
afresh@ found de(4) was calling ifq_set_oactive while holding the
ifq mutex via ifq_deq_begin, which led to a panic because ifq_set_oactive
also tries to take the ifq mutex. ifq_deq_set_oactive assumes the
caller is already holding the mutex.
de(4) is confusing, so it seemed simpler to add a small tweak to
ifqs than try and do major surgery on such a hairy driver.
tested by afresh@
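
The locking problem described above can be modelled in userland. This is only a sketch: the toy_ names are hypothetical stand-ins rather than the kernel's ifq code, and a recursive mutex entry is reported as a false return where the real kernel would panic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive mutex. */
struct toy_mtx {
	bool held;
};

/* Returns false where a real kernel would panic on recursive entry. */
static bool
toy_mtx_enter(struct toy_mtx *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

static void
toy_mtx_leave(struct toy_mtx *m)
{
	m->held = false;
}

struct toy_ifq {
	struct toy_mtx mtx;
	bool oactive;
};

/* Models ifq_set_oactive: takes the mutex itself. */
static bool
toy_set_oactive(struct toy_ifq *q)
{
	if (!toy_mtx_enter(&q->mtx))
		return false;	/* the panic case: mutex already held */
	q->oactive = true;
	toy_mtx_leave(&q->mtx);
	return true;
}

/* Models ifq_deq_set_oactive: assumes the caller holds the mutex. */
static void
toy_deq_set_oactive(struct toy_ifq *q)
{
	q->oactive = true;
}
```

Inside a dequeue "transaction" the mutex is already held, so only the second variant is safe to call there.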
|
|
this replaces a hand rolled list that's been here since 1.1.
ok claudio@ kn@ tb@
|
|
after the last rewrite i was showing bpf ip packets, not the pfsync
payload like the PFSYNC DLT expected.
this also lets bpf see packets being processed by pfsync input
handling, so if you want to see only what's being sent you'll need
to filter by direction.
reported by Marc Boisis
|
|
there's no reason to limit tun/tap to small packets.
ok claudio@
|
|
|
|
when bpfsdetach is called by an interface being destroyed, it
iterates over the bpf descriptors using the interface and calls
vdevgone and klist_invalidate against them. however, i'm not sure
the reference the interface holds against the bpf_d is accounted
for properly, so vdevgone might drop it to 0 and free it, which
makes the klist_invalidate a use after free.
avoid this by taking a bpf_d ref before calling vdevgone and
klist_invalidate so the memory can't be freed out from under the
feet of bpfsdetach.
Reported-by: syzbot+b3927f8ad162452a2f39@syzkaller.appspotmail.com
i wasn't able to reproduce whatever syzkaller did. it's possible
this is a double free, but we'll wait and see if it pops up again.
ok mpi@
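
The fix follows the usual "hold a reference across calls that may drop one" pattern. A minimal userland sketch, assuming a plain integer reference count; the toy_ names are hypothetical and "freed" is only a flag, so the unsafe ordering can be observed without real undefined behaviour:

```c
#include <stdbool.h>

struct toy_desc {
	unsigned int refs;
	bool freed;
	bool invalidated;
};

static void
toy_get(struct toy_desc *d)
{
	d->refs++;
}

static void
toy_put(struct toy_desc *d)
{
	if (--d->refs == 0)
		d->freed = true;	/* stands in for free() */
}

/* Models vdevgone: the forced close drops the reference held by the open. */
static void
toy_close(struct toy_desc *d)
{
	toy_put(d);
}

/* Models klist_invalidate: returns false if it touched freed memory. */
static bool
toy_invalidate(struct toy_desc *d)
{
	if (d->freed)
		return false;
	d->invalidated = true;
	return true;
}

/* Models the fixed detach path: take a reference before either call. */
static bool
toy_detach(struct toy_desc *d)
{
	bool ok;

	toy_get(d);
	toy_close(d);
	ok = toy_invalidate(d);
	toy_put(d);		/* the last reference goes away here */
	return ok;
}
```

Without the extra toy_get(), toy_close() drops the count to zero and toy_invalidate() runs against an object that is already gone.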
|
|
userland can request that network packets that are read from or
written to the device special file get prepended with a "tun_hdr"
struct. this struct contains bits which say what offloads are
requested for the packet, including things like ip/tcp/udp/icmp
checksums, tcp segmentation offloads, or ethernet vlan tags.
userland can write a packet with any of these offloads requested
into the kernel at any time, but has to request which ones it's
able to handle coming from the kernel. enabling the tun_hdr struct
and which offloads userland can handle is done with a new TUNSCAP
ioctl.
this is based on the virtio_net_hdr in linux, which jan@ actually
implemented and had working with vmd. however, claudio@ and i
were strongly opposed to what feels like a layer violation by pulling
virtio structures into the tun driver, and then trying to emulate
virtio/linux semantics in our network stack, and playing catch up
when the "upstream" projects decide to change the shape or meaning
of these bits. tun_hdr is specific to the openbsd network stack and
its semantics, which simplifies our kernel implementation. jan has
been pretty gracious about the extra work on the vmd side of things.
tested by and ok jan@
ok claudio@
sthen@ backed this out cos of confusion with the ioctl numbers i
picked for controlling this feature. i've picked new numbers that
don't conflict this time.
|
|
tb@ agrees
|
|
userland can request that network packets that are read from or
written to the device special file get prepended with a "tun_hdr"
struct. this struct contains bits which say what offloads are
requested for the packet, including things like ip/tcp/udp/icmp
checksums, tcp segmentation offloads, or ethernet vlan tags.
userland can write a packet with any of these offloads requested
into the kernel at any time, but has to request which ones it's
able to handle coming from the kernel. enabling the tun_hdr struct
and which offloads userland can handle is done with a new TUNSCAP
ioctl.
this is based on the virtio_net_hdr in linux, which jan@ actually
implemented and had working with vmd. however, claudio@ and i
were strongly opposed to what feels like a layer violation by pulling
virtio structures into the tun driver, and then trying to emulate
virtio/linux semantics in our network stack, and playing catch up
when the "upstream" projects decide to change the shape or meaning
of these bits. tun_hdr is specific to the openbsd network stack and
its semantics, which simplifies our kernel implementation. jan has
been pretty gracious about the extra work on the vmd side of things.
tested by and ok jan@
ok claudio@
|
|
this should let people specify interface and queue bandwidths greater
than ~4Gbit.
this changes the pf ioctls used to specify queues, so if you want
to try this you'll need a new kernel, new headers, and a new pfctl
(and systat). or upgrade using a snapshot. the effort and benefit
of providing compat isn't worth it.
putting it in now so people can kick it around.
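
A ceiling of about 4Gbit is what a bits-per-second rate stored in an unsigned 32-bit integer gives: 2^32 is roughly 4.29 billion. The sketch below assumes that is where the old limit came from, which the message does not state, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: a bits-per-second rate squeezed into a 32-bit field. */
static uint32_t
rate_to_u32(uint64_t bps)
{
	return (uint32_t)bps;	/* silently wraps above UINT32_MAX */
}

/* Returns 1 if the rate survives the narrowing unchanged, 0 otherwise. */
static int
rate_fits_u32(uint64_t bps)
{
	return bps <= UINT32_MAX;
}
```

For example, 10Gbit (10000000000) does not fit and wraps to 1410065408 when narrowed, while 1Gbit passes through unchanged.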
|
|
missed when the prototype was removed in ifq.h rev 1.25
ok dlg@
|
|
with TTL field to zero. To fix it function pf_test_state_icmp()
must initialize ttl field in pf_pdesc structure for inner packet.
feedback from bluhm@
OK bluhm@
|
|
|
|
|
|
. Use m_align() to ensure that mbufs are packed towards the end so that
additional headers don't require costly m_prepends.
. Stop using m_copyback(), the way it was used there was actually wrong,
instead just use memcpy since this is just a single mbuf.
. Kill all usage of m_calchdrlen(), again this is not needed or can simply
be m->m_pkthdr.len = m->m_len since all this code uses a single buffer.
. In wg_encap() remove the min() with t->t_mtu when calculating plaintext_len
and out_len. The code does not correctly cope with this min() at all with
severe consequences.
Initial diff by dhill@ who found the m_prepend() issue.
Tested by various people.
OK dhill@ mvs@ bluhm@ sthen@
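
The first point above, packing data towards the end of the buffer so later headers fit in front of it, can be sketched with a flat buffer. This is a userland model with hypothetical toy_ names, not the mbuf API; the real m_align() also takes alignment into account, which is left out here.

```c
#include <stddef.h>
#include <string.h>

#define TOY_BUFSZ 64

struct toy_buf {
	unsigned char storage[TOY_BUFSZ];
	size_t off;	/* offset of the first valid byte */
	size_t len;	/* number of valid bytes */
};

/*
 * Like m_align() in spirit: reserve len bytes at the very end of the
 * storage. The caller must ensure len <= TOY_BUFSZ.
 */
static void
toy_align(struct toy_buf *b, size_t len)
{
	b->off = TOY_BUFSZ - len;
	b->len = len;
}

/*
 * Prepend a header in place if there is leading space. Returns 0 on
 * success, or -1 where a real stack would have to allocate and chain
 * another buffer.
 */
static int
toy_prepend(struct toy_buf *b, const void *hdr, size_t hlen)
{
	if (hlen > b->off)
		return -1;
	b->off -= hlen;
	b->len += hlen;
	memcpy(b->storage + b->off, hdr, hlen);
	return 0;
}
```

A buffer whose data starts at offset 0 has no leading space, so every prepend on it takes the expensive path.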
|
|
from macro copy-paste. No functional changes.
ok mpi dlg
|
|
before it was using 256000000 things per second, so this isn't a
huge change, but it can use nsecuptime() to get the time.
kjc and cheloa like it
ok claudio@
|
|
ok claudio@
|
|
|
|
|
|
tun_init turns interface/stack config into a set of flags that
tun(4) keeps in tun_softc sc_flags, but never uses.
ok miod@ kn@
|
|
netintro says it's deprecated, and most of our other drivers are
doing fine without it.
ok miod@ kn@ patrick@
|
|
unused since "rewrite to merge arp and routing tables"
in CSRG if_ether.c 7.14 (Berkeley) 06/25/91
used by SIOCSARP, SIOCGARP, SIOCDARP, OSIOCGARP ioctls in Net/2
which were removed before 4.4BSD-Lite
ok sthen@ who tested this with a ports build
|
|
|
|
|
|
|
|
|
|
historically there was just tun(4) that supported both layer 3 p2p
and ethernet modes, but had to be reconfigured at runtime by userland
to properly change the interface type and interface flags. this is
obviously not a great idea, mostly because a lot of stack behaviour
around address management makes assumptions based on these parameters,
and changing them at runtime confuses things.
splitting tun so ethernet was handled by a specific tap(4) driver
was a first step at locking this down. this takes a further step
by restricting userland's ability to reconfigure the interface flags,
specifically IFF_BROADCAST, IFF_MULTICAST, and IFF_POINTOPOINT.
this change lets userland pass those values via the ioctls, but
only if they match the current set of flags on the interface. these
flags are set appropriately for the type of interface when it's
created, but should not be changed afterwards.
nothing in base uses these ioctls, so the only fall out will be
from ports doing weird things.
ok claudio@ kn@
|
|
ok mvs
|
|
|
|
net/if_pppx.c is the only place where `if_description' is accessed outside
the ifioctl() path and there is no reason to take netlock here. The
SIOCSIFDESCR case of ifioctl() modifies `if_description' with only the
kernel lock held.
ok bluhm
|
|
ok sashan@
|
|
Input and ok jmc@, jsg@
|
|
|
|
you can basically plug rdomains together and route between them
over rport interfaces. people keep asking me if this is so you can
leak routes between rdomains, and the answer is yes.
this is like pair(4) but cheaper because it avoids all the mucking
around with putting an ethernet header on the mbuf just to take it
off again later, and is more efficient with address space because
it's a p2p ip interface.
it has a small tweak from mvs@
ok denis@ claudio@
|
|
ok sashan
|
|
- ETHERIPCTL_ALLOW - atomically accessed integer;
- ETHERIPCTL_STATS - per-CPU counters
ok bluhm
|
|
in the same routing domain
Input and OK claudio@
|
|
from Matthew Luckie <mjl@luckie.org.nz> via tech@
deraadt@ likes it.
|
|
Both NET_BPF_MAXBUFSIZE and NET_BPF_BUFSIZE (`bpf_maxbufsize' and
`bpf_bufsize' respectively) are atomically accessed integers. No locks
required to modify them.
ok bluhm
|
|
ip_directedbcast is read once in either ip_input() or pf_test()
during packet processing. So writing the variable does not need
net lock.
OK mvs@
|
|
this avoids passing excessively large values to timeout_add_nsec.
Reported-by: syzbot+f650785d4f2b3fe28284@syzkaller.appspotmail.com
|
|
Sending IPv6 fragments over a bridge with pf did not work. During
input pf reassembles the packet, and at bridge output it should be
refragmented. This is only done for PF_FWD direction, but bridge(4)
and veb(4) called pf_test() with PF_OUT argument.
OK sashan@
|
|
ok mvs
|
|
ok mvs
|
|
of updating it blindly.
ok mvs
|
|
incoming SADB_ADD and SADB_UPDATE message. Since we send them as part of
the SADB_GET reply we must also accept them on SADB_ADD/UPDATE as sasyncd
will forward payloads previously received in SADB_GET. Fixes a bug where
sasyncd can't restore SAs because pfkey returns EINVAL.
From Rafał Ramocki
ok bluhm@
|
|
from markus@
|
|
Use atomic operations to read ip6_forwarding while processing packets
in the network stack.
To make clear where the router property is actually needed, use the
i_am_router variable based on ip6_forwarding. It already existed
in nd6_nbr. Move i_am_router setting up the call stack until all
users are independent.
The forwarding decisions in pf_test, pf_refragment6 and ip6_input
do not interfere either.
Use a new array ipv6ctl_vars_unlocked to make transition of all the
integer sysctls easier. Adapt IPv4 to the new style.
OK mvs@
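
The idea of reading the sysctl once and passing the result down the call stack can be sketched with C11 atomics. This is a userland illustration under stated assumptions: the toy_ names are hypothetical, and treating any non-zero value as "router" is a simplification made here, not something the message specifies.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the ip6_forwarding sysctl variable. */
static atomic_int toy_forwarding;

/* Callees take the decision as an argument instead of re-reading it. */
static int
toy_should_forward(int i_am_router)
{
	return i_am_router ? 1 : 0;
}

static int
toy_nbr_is_router(int i_am_router)
{
	return i_am_router ? 1 : 0;
}

/*
 * Read the variable exactly once per packet, so a concurrent sysctl
 * write cannot make the two callees disagree.
 */
static int
toy_input(void)
{
	int i_am_router = (atomic_load(&toy_forwarding) != 0);

	return toy_should_forward(i_am_router) +
	    toy_nbr_is_router(i_am_router);
}
```

Because each packet sees a single snapshot, the writer does not need to hold a lock shared with the packet-processing path.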
|
|
udp_send() and following udp{,6}_output() do not append packets to
`so_snd' socket buffer. This means the sosend() and sosplice() sending
paths are dummy pru_send() and there are no problems running them
simultaneously on the same socket.
Push shared solock() deep down to sosend() and take it only around
pru_send(), but keep somove() running under exclusive solock(). Since
sosend() doesn't modify `so_snd' the unlocked `so_snd' space checks
within somove() are safe. Corresponding `sb_state' and `sb_flags'
modifications are protected by `sb_mtx' mutex(9).
Tested and OK bluhm.
|