path: root/sys/net
Age | Commit message | Author
4 days | provide ifq_deq_set_oactive. | David Gwynne
ifq_deq_set_oactive is a variation on ifq_set_oactive that can be called inside an ifq_deq_begin "transaction". afresh@ found de(4) was calling ifq_set_oactive while holding the ifq mutex via ifq_deq_begin, which led to a panic because ifq_set_oactive also tries to take the ifq mutex. ifq_deq_set_oactive assumes the caller is already holding the mutex. de(4) is confusing, so it seemed simpler to add a small tweak to ifqs than try and do major surgery on such a hairy driver. tested by afresh@
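The locking problem described above can be modelled in userland. This is a minimal sketch, not the kernel code: `struct toy_ifq` and all `toy_*` functions are hypothetical stand-ins, and a pthread mutex takes the place of the ifq mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an interface queue; not the real struct ifqueue. */
struct toy_ifq {
	pthread_mutex_t	mtx;
	bool		locked;		/* tracks ownership for the assertion below */
	bool		oactive;
};

void
toy_ifq_init(struct toy_ifq *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	q->locked = false;
	q->oactive = false;
}

/* Begin a dequeue "transaction": returns with the mutex held. */
void
toy_deq_begin(struct toy_ifq *q)
{
	pthread_mutex_lock(&q->mtx);
	q->locked = true;
}

/* End the transaction and release the mutex. */
void
toy_deq_end(struct toy_ifq *q)
{
	q->locked = false;
	pthread_mutex_unlock(&q->mtx);
}

/* Locking variant: takes the mutex itself, so it must not run inside a transaction. */
void
toy_set_oactive(struct toy_ifq *q)
{
	pthread_mutex_lock(&q->mtx);
	q->oactive = true;
	pthread_mutex_unlock(&q->mtx);
}

/* Transaction variant: assumes the caller already holds the mutex. */
void
toy_deq_set_oactive(struct toy_ifq *q)
{
	assert(q->locked);
	q->oactive = true;
}
```

Calling `toy_set_oactive()` between `toy_deq_begin()` and `toy_deq_end()` would try to take the same non-recursive mutex twice, which is the userland analogue of the panic afresh@ hit.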
4 days | use a tailq for the global list of bpf_if structs. | David Gwynne
this replaces a hand rolled list that's been here since 1.1. ok claudio@ kn@ tb@
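For readers unfamiliar with the queue(3) macros, here is a short sketch of a TAILQ-based list. The element type `struct toy_bpf_if` and the helper `toy_iflist_count()` are hypothetical; only the `TAILQ_*` macros come from `<sys/queue.h>`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical element type; the real struct bpf_if has other members. */
struct toy_bpf_if {
	int				unit;
	TAILQ_ENTRY(toy_bpf_if)		entry;
};

TAILQ_HEAD(toy_bpf_iflist, toy_bpf_if);

/* Count the entries by walking the list from head to tail. */
size_t
toy_iflist_count(struct toy_bpf_iflist *list)
{
	struct toy_bpf_if *bp;
	size_t n = 0;

	TAILQ_FOREACH(bp, list, entry)
		n++;
	return n;
}
```

Insertion and removal then become `TAILQ_INSERT_TAIL(&list, bp, entry)` and `TAILQ_REMOVE(&list, bp, entry)`, with no hand-written pointer juggling.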
5 days | fix tcpdump on pfsync interfaces. | David Gwynne
after the last rewrite i was showing bpf ip packets, not the pfsync payload like the PFSYNC DLT expected. this also lets bpf see packets being processed by pfsync input handling, so if you want to see only what's being sent you'll need to filter by direction. reported by Marc Boisis
6 days | bump the "mru" up to MAXMCLBYTES. | David Gwynne
there's no reason to limit tun/tap to small packets. ok claudio@
6 days | include tun_hdr in the length reported by FIONREAD and kq if it's enabled. | David Gwynne
6 days | make sure bpfsdetach is holding a bpf_d ref when invalidating stuff. | David Gwynne
when bpfsdetach is called by an interface being destroyed, it iterates over the bpf descriptors using the interface and calls vdevgone and klist_invalidate against them. however, i'm not sure the reference the interface holds against the bpf_d is accounted for properly, so vdevgone might drop it to 0 and free it, which makes the klist_invalidate a use after free. avoid this by taking a bpf_d ref before calling vdevgone and klist_invalidate so the memory can't be freed out from under the feet of bpfsdetach. Reported-by: syzbot+b3927f8ad162452a2f39@syzkaller.appspotmail.com i wasn't able to reproduce whatever syzkaller did. it's possible this is a double free, but we'll wait and see if it pops up again. ok mpi@
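The "take a reference before the teardown steps" pattern can be sketched as follows. Everything here is hypothetical (`struct toy_desc`, `toy_ref`, `toy_rele`, `toy_revoke`, `toy_detach`); the sketch only models the reference counting, with a flag in place of an actual free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted descriptor; not the real struct bpf_d. */
struct toy_desc {
	unsigned int	refs;
	bool		freed;
};

void
toy_ref(struct toy_desc *d)
{
	d->refs++;
}

/* Drop a reference; the last one "frees" the object. */
void
toy_rele(struct toy_desc *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->freed = true;	/* a real kernel would free the memory here */
}

/* Stand-in for a teardown step that drops a reference as a side effect. */
void
toy_revoke(struct toy_desc *d)
{
	toy_rele(d);
}

/* Returns true if the descriptor was still alive at the second step. */
bool
toy_detach(struct toy_desc *d)
{
	bool alive;

	toy_ref(d);		/* pin the descriptor across both steps */
	toy_revoke(d);		/* may drop what used to be the last ref */
	alive = !d->freed;	/* the second step would touch d here */
	toy_rele(d);		/* now it is safe to let it go */
	return alive;
}
```

Without the initial `toy_ref()`, the revoke step would drop the count to zero and the second step would operate on freed memory.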
7 days | provide network offloads between the kernel and userland again | David Gwynne
userland can request that network packets that are read from or written to the device special file get prepended with a "tun_hdr" struct. this struct contains bits which say what offloads are requested for the packet, including things like ip/tcp/udp/icmp checksums, tcp segmentation offloads, or ethernet vlan tags. userland can write a packet with any of these offloads requested into the kernel at any time, but has to request which ones it's able to handle coming from the kernel. enabling the tun_hdr struct and which offloads userland can handle is done with a new TUNSCAP ioctl. this is based on the virtio_net_hdr in linux, which jan@ actually implemented and had working with vmd. however, claudio@ and i strongly opposed what feels like a layer violation by pulling virtio structures into the tun driver, and then trying to emulate virtio/linux semantics in our network stack, and playing catch up when the "upstream" projects decide to change the shape or meaning of these bits. tun_hdr is specific to the openbsd network stack and its semantics, which simplifies our kernel implementation. jan has been pretty gracious about the extra work on the vmd side of things. tested by and ok jan@ ok claudio@ sthen@ backed this out cos of confusion with the ioctl numbers i picked for controlling this feature. i've picked new numbers that don't conflict this time.
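The asymmetry in the negotiation (userland may send any offload, but only receives the ones it declared) can be illustrated with a bitmask sketch. The flag names and values below are hypothetical and do not reflect the real tun_hdr layout, and the idea that the kernel resolves the remaining offloads itself is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offload bits; the real tun_hdr flag names and values differ. */
#define TOY_OFF_IPCSUM	0x01u
#define TOY_OFF_TCPCSUM	0x02u
#define TOY_OFF_TSO	0x04u
#define TOY_OFF_VLAN	0x08u

/* Offloads that may be left pending on a packet handed to userland. */
uint32_t
toy_pass_through(uint32_t pkt_offloads, uint32_t user_caps)
{
	return pkt_offloads & user_caps;
}

/* Offloads the kernel would have to resolve itself before the read completes. */
uint32_t
toy_resolve_in_kernel(uint32_t pkt_offloads, uint32_t user_caps)
{
	return pkt_offloads & ~user_caps;
}
```

With capabilities `TOY_OFF_IPCSUM | TOY_OFF_TCPCSUM`, a packet that wants a TCP checksum and segmentation would keep only the checksum request in its header.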
9 days | revert tun(4) changes for now, breaks in kdump build (TUNSCAP/TIOCEXT clash) | Stuart Henderson
tb@ agrees
10 days | provide a way to negotiate network offloads between the kernel and userland. | David Gwynne
userland can request that network packets that are read from or written to the device special file get prepended with a "tun_hdr" struct. this struct contains bits which say what offloads are requested for the packet, including things like ip/tcp/udp/icmp checksums, tcp segmentation offloads, or ethernet vlan tags. userland can write a packet with any of these offloads requested into the kernel at any time, but has to request which ones it's able to handle coming from the kernel. enabling the tun_hdr struct and which offloads userland can handle is done with a new TUNSCAP ioctl. this is based on the virtio_net_hdr in linux, which jan@ actually implemented and had working with vmd. however, claudio@ and i strongly opposed what feels like a layer violation by pulling virtio structures into the tun driver, and then trying to emulate virtio/linux semantics in our network stack, and playing catch up when the "upstream" projects decide to change the shape or meaning of these bits. tun_hdr is specific to the openbsd network stack and its semantics, which simplifies our kernel implementation. jan has been pretty gracious about the extra work on the vmd side of things. tested by and ok jan@ ok claudio@
12 days | bump the type used to specify traffic queue bandwidth to 64bit. | David Gwynne
this should let people specify interface and queue bandwidths greater than ~4Gbit. this changes the pf ioctls used to specify queues, so if you want to try this you'll need a new kernel, new headers, and a new pfctl (and systat). or upgrade using a snapshot. the effort and benefit of providing compat isn't worth it. putting it in now so people can kick it around.
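The ~4Gbit ceiling follows from the range of a 32-bit unsigned integer (2^32 - 1 = 4294967295). A small sketch, assuming the old field stored bits per second in a `uint32_t`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow a bits-per-second value the way a 32-bit field would store it. */
uint32_t
toy_bw32(uint64_t bps)
{
	return (uint32_t)bps;	/* keeps only the low 32 bits */
}

/* True if the value survives a round trip through a 32-bit field. */
int
toy_fits32(uint64_t bps)
{
	return toy_bw32(bps) == bps;
}
```

A 4 Gbit/s value still fits, but 10 Gbit/s (10000000000) wraps to 1410065408, which is why a wider type is needed for faster links.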
2024-11-09 | remove unused ifq_is_serialized() | Jonathan Gray
missed when the prototype was removed in ifq.h rev 1.25 ok dlg@
2024-11-08 | pf(4) when doing af-to translation for ICMP protocol sends packets with the TTL field set to zero. | Alexandr Nedvedicky
To fix it, pf_test_state_icmp() must initialize the ttl field in the pf_pdesc structure for the inner packet. feedback from bluhm@ OK bluhm@
2024-11-04 | remove unused inline function; ok dlg@ | Jonathan Gray
2024-11-01 | remove unused local variable | Jonathan Gray
2024-10-31 | Rewrite mbuf handling in wg(4). | Claudio Jeker
- Use m_align() to ensure that mbufs are packed towards the end so that additional headers don't require costly m_prepends.
- Stop using m_copyback(); the way it was used there was actually wrong. Instead just use memcpy since this is just a single mbuf.
- Kill all usage of m_calchdrlen(); again, this is not needed or can simply be m->m_pkthdr.len = m->m_len since all this code uses a single buffer.
- In wg_encap() remove the min() with t->t_mtu when calculating plaintext_len and out_len. The code does not correctly cope with this min() at all, with severe consequences.
Initial diff by dhill@ who found the m_prepend() issue. Tested by various people. OK dhill@ mvs@ bluhm@ sthen@
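The idea behind packing data towards the end of a buffer can be shown with a flat array. This is a simplified sketch with hypothetical names (`struct toy_buf`, `toy_align`, `toy_prepend`); it ignores the alignment rounding and chaining that real mbufs involve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSZ	256

/* Hypothetical single-buffer packet; not a real struct mbuf. */
struct toy_buf {
	unsigned char	storage[TOY_BUFSZ];
	unsigned char	*data;		/* start of valid bytes */
	size_t		len;		/* number of valid bytes */
};

/* Place a payload of len bytes flush with the end of the storage. */
void
toy_align(struct toy_buf *b, size_t len)
{
	assert(len <= TOY_BUFSZ);
	b->data = b->storage + TOY_BUFSZ - len;
	b->len = len;
}

/* Free space in front of the data, available for headers. */
size_t
toy_leading(const struct toy_buf *b)
{
	return (size_t)(b->data - b->storage);
}

/* Prepend a header in place; returns 0 on success, -1 if there is no room. */
int
toy_prepend(struct toy_buf *b, const void *hdr, size_t hlen)
{
	if (hlen > toy_leading(b))
		return -1;
	b->data -= hlen;
	b->len += hlen;
	memcpy(b->data, hdr, hlen);
	return 0;
}
```

Because the payload starts at the tail, each header is added by moving the data pointer backwards rather than allocating and linking another buffer.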
2024-10-31 | Drop forgotten backslashes within vxlan_input(). | Vitaliy Makkoveev
They seem to be left over from a macro copy-paste. No functional changes. ok mpi dlg
2024-10-29 | move hfsc to using nanoseconds for keeping times. | David Gwynne
before it was using 256000000 things per second, so this isn't a huge change, but it can use nsecuptime() to get the time. kjc and cheloa like it ok claudio@
2024-10-29 | use nsecuptime instead of using nanouptime and doing a bunch of maths. | David Gwynne
ok claudio@
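The "bunch of maths" that a nanosecond uptime helper hides is the folding of a seconds/nanoseconds pair into one 64-bit count. A userland sketch, with `toy_nsecuptime()` as a hypothetical stand-in built on `clock_gettime(CLOCK_MONOTONIC)`:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Fold a struct timespec into a single nanosecond count. */
uint64_t
toy_ts_to_nsec(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

/* Userland stand-in for a kernel "uptime in nanoseconds" helper. */
uint64_t
toy_nsecuptime(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return toy_ts_to_nsec(&ts);
}
```

Callers that only need to compare or subtract times can then work with plain `uint64_t` values.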
2024-10-22 | correct argument to klist_free(); ok visa@ mvs@ | Jonathan Gray
2024-10-17 | remove unneeded if_wg.h and pfsync.h includes | Jonathan Gray
2024-10-16 | cut tun_init() out, it does pointless work. | David Gwynne
tun_init turns interface/stack config into a set of flags that tun(4) keeps in tun_softc sc_flags, but never uses. ok miod@ kn@
2024-10-16 | remove SIOCSIFDSTADDR from the network ioctls. | David Gwynne
netintro says it's deprecated, and most of our other drivers are doing fine without it. ok miod@ kn@ patrick@
2024-10-15 | remove struct arpreq from net/if_arp.h | Jonathan Gray
unused since "rewrite to merge arp and routing tables" in CSRG if_ether.c 7.14 (Berkeley) 06/25/91 used by SIOCSARP, SIOCGARP, SIOCDARP, OSIOCGARP ioctls in Net/2 which were removed before 4.4BSD-Lite ok sthen@ who tested this with a ports build
2024-10-13 | remove unneeded limits.h and errno.h includes | Jonathan Gray
2024-10-12 | remove unneeded rwlock.h include | Jonathan Gray
2024-10-12 | remove unneeded time.h include | Jonathan Gray
2024-10-12 | remove unneeded percpu.h include | Jonathan Gray
2024-10-10 | neuter the tun/tap ioctls that try and modify interface flags. | David Gwynne
historically there was just tun(4) that supported both layer 3 p2p and ethernet modes, but had to be reconfigured at runtime by userland to properly change the interface type and interface flags. this is obviously not a great idea, mostly because a lot of stack behaviour around address management makes assumptions based on these parameters, and changing them at runtime confuses things. splitting tun so ethernet was handled by a specific tap(4) driver was a first step at locking this down. this takes a further step by restricting userland's ability to reconfigure the interface flags, specifically IFF_BROADCAST, IFF_MULTICAST, and IFF_POINTOPOINT. this change lets userland pass those values via the ioctls, but only if they match the current set of flags on the interface. these flags are set appropriately for the type of interface when it's created, but should not be changed afterwards. nothing in base uses these ioctls, so the only fallout will be from ports doing weird things. ok claudio@ kn@
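The "accept only if it matches" rule amounts to comparing the locked bits of the request with the locked bits already on the interface. A sketch with hypothetical `TOY_IFF_*` names; the error code returned by the real driver is not stated in the commit, so `EINVAL` here is an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits; the real IFF_* definitions live in <net/if.h>. */
#define TOY_IFF_BROADCAST	0x0002
#define TOY_IFF_POINTOPOINT	0x0010
#define TOY_IFF_MULTICAST	0x8000
#define TOY_IFF_LOCKED \
	(TOY_IFF_BROADCAST | TOY_IFF_POINTOPOINT | TOY_IFF_MULTICAST)

/*
 * Accept a flags request only if the locked bits match what the
 * interface already has; return 0 on success or EINVAL otherwise.
 */
int
toy_check_flags(unsigned int current, unsigned int requested)
{
	if ((requested & TOY_IFF_LOCKED) != (current & TOY_IFF_LOCKED))
		return EINVAL;
	return 0;
}
```

An existing program that passes the flags the interface already has keeps working, while one that tries to flip a point-to-point interface to broadcast is refused.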
2024-09-27 | Previous pipex.c,v 1.155 was broken if the client was not behind a NAT. | YASUOKA Masahiko
ok mvs
2024-09-20 | remove unneeded semicolons; checked by millert@ | Jonathan Gray
2024-09-09 | Don't take netlock while setting `if_description'. | Vitaliy Makkoveev
net/if_pppx.c is the only place where `if_description' is accessed outside the ifioctl() path and there is no reason to take netlock here. The SIOCSIFDESCR case of ifioctl() modifies `if_description' with only the kernel lock held. ok bluhm
2024-09-07 | fix RBT_ENTRY in pf_state and pf_state_key | aisha
ok sashan@
2024-09-04 | Fix some spelling. | Marcus Glocker
Input and ok jmc@, jsg@
2024-09-01 | spelling; checked by jmc@, ok miod@ mglocker@ krw@ | Jonathan Gray
2024-08-31 | add rport(4) for p2p l3 connectivity between route domains. | David Gwynne
you can basically plug rdomains together and route between them over rport interfaces. people keep asking me if this is so you can leak routes between rdomains, and the answer is yes. this is like pair(4) but cheaper because it avoids all the mucking around with putting an ethernet header on the mbuf just to take it off again later, and is more efficient with address space because it's a p2p ip interface. it has a small tweak from mvs@ ok denis@ claudio@
2024-08-27 | remove some dead code that wasn't cleaned up | aisha
ok sashan
2024-08-20 | Unlock etherip_sysctl(). | Vitaliy Makkoveev
- ETHERIPCTL_ALLOW - atomically accessed integer;
- ETHERIPCTL_STATS - per-CPU counters
ok bluhm
2024-08-17 | Allow PPP interface to run in an rdomain and get a default route installed in the same routing domain | Denis Fondras
Input and OK claudio@
2024-08-15 | add BIOCSETFNR, which is like BIOCSETF but doesn't reset the buffer or stats. | David Gwynne
from Matthew Luckie <mjl@luckie.org.nz> via tech@ deraadt@ likes it.
2024-08-12 | Prepare bpf_sysctl() for upcoming net_sysctl() unlocking. | Vitaliy Makkoveev
Both NET_BPF_MAXBUFSIZE and NET_BPF_BUFSIZE (`bpf_maxbufsize' and `bpf_bufsize' respectively) are atomically accessed integers. No locks required to modify them. ok bluhm
2024-08-06 | Unlock sysctl net.inet.ip.directed-broadcast. | Alexander Bluhm
ip_directedbcast is read once in either ip_input() or pf_test() during packet processing. So writing the variable does not need net lock. OK mvs@
2024-08-05 | restrict the maximum wait time you can set via BIOCSWTIMEOUT to 5 minutes. | David Gwynne
this avoids passing excessively large values to timeout_add_nsec. Reported-by: syzbot+f650785d4f2b3fe28284@syzkaller.appspotmail.com
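A bound check of this kind could look like the sketch below. The commit does not say whether an oversized value is rejected or clamped, so rejecting with `EINVAL` is an assumption, and `toy_wtimeout()` and `TOY_MAX_WAIT_SEC` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/time.h>

#define TOY_MAX_WAIT_SEC	(5 * 60)	/* five minutes, per the commit */

/*
 * Validate a timeval and convert it to nanoseconds. Returns 0 and
 * stores the result in *nsecs, or EINVAL if the value is out of range.
 */
int
toy_wtimeout(const struct timeval *tv, uint64_t *nsecs)
{
	if (tv->tv_sec < 0 || tv->tv_usec < 0 || tv->tv_usec >= 1000000)
		return EINVAL;
	if (tv->tv_sec > TOY_MAX_WAIT_SEC ||
	    (tv->tv_sec == TOY_MAX_WAIT_SEC && tv->tv_usec > 0))
		return EINVAL;
	*nsecs = (uint64_t)tv->tv_sec * 1000000000ULL +
	    (uint64_t)tv->tv_usec * 1000ULL;
	return 0;
}
```

Checking the range before the multiplication keeps the nanosecond value small enough that it cannot overflow or produce an absurd timeout.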
2024-08-05 | Fix bridging IPv6 fragments with pf reassembly. | Alexander Bluhm
Sending IPv6 fragments over a bridge with pf did not work. During input pf reassembles the packet, and at bridge output it should be refragmented. This is only done for PF_FWD direction, but bridge(4) and veb(4) called pf_test() with PF_OUT argument. OK sashan@
2024-07-30 | Exports the statistics when PIPEXDSESSION. Found by ymatsui at iij. | YASUOKA Masahiko
ok mvs
2024-07-26 | Mark ipsecflowinfo immutable. | YASUOKA Masahiko
ok mvs
2024-07-26 | In pipex_l2tp_input(), check if ipsecflowinfo is not changed instead of updating it blindly. | YASUOKA Masahiko
ok mvs
2024-07-23 | Accept and ignore SADB_X_EXT_REPLAY and SADB_X_EXT_COUNTER payloads for incoming SADB_ADD and SADB_UPDATE messages. | Tobias Heider
Since we send them as part of the SADB_GET reply we must also accept them on SADB_ADD/UPDATE as sasyncd will forward payloads previously received in SADB_GET. Fixes a bug where sasyncd can't restore SAs because pfkey returns EINVAL. From Rafał Ramocki ok bluhm@
2024-07-18 | In pfattach() pass malloc type instead of flags to cpumem_malloc(). | Alexander Bluhm
from markus@
2024-07-14 | Unlock IPv6 sysctl net.inet6.ip6.forwarding from net lock. | Alexander Bluhm
Use atomic operations to read ip6_forwarding while processing packets in the network stack. To make clear where actually the router property is needed, use the i_am_router variable based on ip6_forwarding. It already existed in nd6_nbr. Move i_am_router setting up the call stack until all users are independent. The forwarding decisions in pf_test, pf_refragment6, ip6_input do also not interfere. Use a new array ipv6ctl_vars_unlocked to make transition of all the integer sysctls easier. Adapt IPv4 to the new style. OK mvs@
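The read-once pattern described above can be sketched with C11 atomics. The kernel uses its own atomic primitives, so `<stdatomic.h>` and the `toy_*` names here are stand-ins for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a sysctl-backed integer such as ip6_forwarding. */
static atomic_int toy_forwarding;

/* Writer side: the sysctl handler stores the new value without a lock. */
void
toy_sysctl_set(int v)
{
	atomic_store(&toy_forwarding, v);
}

/*
 * Reader side: load the value once and derive the decision from the
 * local copy, so one packet sees a consistent setting throughout.
 */
int
toy_input(void)
{
	int i_am_router = (atomic_load(&toy_forwarding) != 0);

	return i_am_router;
}
```

Passing the local `i_am_router` value down the call stack, instead of re-reading the global in each function, is what keeps the decision stable even if the sysctl changes mid-packet.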
2024-07-12 | Switch `so_snd' of udp(4) sockets to the new locking scheme. | Vitaliy Makkoveev
udp_send() and following udp{,6}_output() do not append packets to `so_snd' socket buffer. This means the sosend() and sosplice() sending paths are dummy pru_send() and there are no problems to simultaneously run them on the same socket. Push shared solock() deep down to sosend() and take it only around pru_send(), but keep somove() running under exclusive solock(). Since sosend() doesn't modify `so_snd' the unlocked `so_snd' space checks within somove() are safe. Corresponding `sb_state' and `sb_flags' modifications are protected by `sb_mtx' mutex(9). Tested and OK bluhm.