path: root/sys/net
Age | Commit message | Author
2024-07-30 | Exports the statistics when PIPEXDSESSION. Found by ymatsui at iij. | YASUOKA Masahiko
ok mvs
2024-07-26 | Mark ipsecflowinfo immutable. | YASUOKA Masahiko
ok mvs
2024-07-26 | In pipex_l2tp_input(), check if ipsecflowinfo is not changed instead of updating it blindly. | YASUOKA Masahiko
ok mvs
2024-07-23 | Accept and ignore SADB_X_EXT_REPLAY and SADB_X_EXT_COUNTER payloads for incoming SADB_ADD and SADB_UPDATE messages. | Tobias Heider
Since we send them as part of the SADB_GET reply we must also accept them on SADB_ADD/UPDATE as sasyncd will forward payloads previously received in SADB_GET. Fixes a bug where sasyncd can't restore SAs because pfkey returns EINVAL. From Rafał Ramocki ok bluhm@
2024-07-18 | In pfattach() pass malloc type instead of flags to cpumem_malloc(). | Alexander Bluhm
from markus@
2024-07-14 | Unlock IPv6 sysctl net.inet6.ip6.forwarding from net lock. | Alexander Bluhm
Use atomic operations to read ip6_forwarding while processing packets in the network stack. To make clear where actually the router property is needed, use the i_am_router variable based on ip6_forwarding. It already existed in nd6_nbr. Move i_am_router setting up the call stack until all users are independent. The forwarding decisions in pf_test, pf_refragment6, ip6_input do also not interfere. Use a new array ipv6ctl_vars_unlocked to make transition of all the integer sysctls easier. Adapt IPv4 to the new style. OK mvs@
2024-07-12 | Switch `so_snd' of udp(4) sockets to the new locking scheme. | Vitaliy Makkoveev
udp_send() and following udp{,6}_output() do not append packets to `so_snd' socket buffer. This means the sosend() and sosplice() sending paths are dummy pru_send() and there is no problem running them simultaneously on the same socket. Push shared solock() deep down to sosend() and take it only around pru_send(), but keep somove() running under exclusive solock(). Since sosend() doesn't modify `so_snd' the unlocked `so_snd' space checks within somove() are safe. Corresponding `sb_state' and `sb_flags' modifications are protected by `sb_mtx' mutex(9). Tested and OK bluhm.
2024-07-12 | Run sysctl net.inet.ip.forwarding without net lock. | Alexander Bluhm
The places in packet processing where ip_forwarding is evaluated have been consolidated. The remaining pieces in pf test, ip input, and icmp input do not need consistent information. If the integer value is changed by another CPU, it is harmless. The sysctl syscall sets the value atomically, so add atomic read in network processing and remove the net lock in sysctl IPCTL_FORWARDING. OK claudio@ mvs@
2024-07-04 | Implement IPv6 forwarding IPsec only. | Alexander Bluhm
IPsec gateways set the forwarding sysctl to 2. While this has worked for IPv4 for a long time, adapt this feature for IPv6 now. Set sysctl net.inet6.ip6.forwarding=2 to forward only packets that have been processed by IPsec. Set IPV6_FORWARDING_IPSEC in ip6_input() and pass the flag down the call stack. This provides a consistent view of the global variable ip6_forwarding. In ip6_output() or ip6_forward() drop packets that do not match the policy. OK denis@
2024-07-02 | Read IPsec forwarding information once. | Alexander Bluhm
Fix MP race between reading ip_forwarding in ip_input() and checking ip_forwarding == 2 in ip_output(). In theory ip_forwarding could be 2 during ip_input() and later 0 in ip_output(). Then a packet would be forwarded that was never allowed. Currently exclusive netlock in sysctl(2) prevents all races. Introduce IP_FORWARDING_IPSEC and pass it with the flags parameter that was introduced for IP_FORWARDING. Instead of calling m_tag_find(), traversing the list, and comparing with NULL, just check the PACKET_TAG_IPSEC_IN_DONE bit. Reading ipsec_in_use in ip_output() is a performance hack that is not necessary. New code only checks three bits. OK mvs@
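The read-once pattern described in this commit can be sketched in userland C roughly as follows. This is only an illustration, not the kernel code: all `sk_` names and flag values are hypothetical, and C11 atomics stand in for the kernel's atomic read. The input side snapshots the global into flags once, and the output side decides from those flags alone, so a concurrent sysctl change cannot produce a mixed decision.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag names for this sketch, not the kernel's values. */
#define SK_FORWARDING		0x1	/* forwarding enabled */
#define SK_FORWARDING_IPSEC	0x2	/* forward only IPsec-processed packets */

/* Stands in for the global ip_forwarding: 0, 1 or 2. */
static atomic_int sk_forwarding;

/* What a sysctl handler could do: a single atomic store, no lock. */
void
sk_set_forwarding(int value)
{
	assert(value >= 0 && value <= 2);
	atomic_store(&sk_forwarding, value);
}

/* Input side: read the global exactly once and encode it as flags. */
int
sk_input_flags(void)
{
	int fwd = atomic_load(&sk_forwarding);
	int flags = 0;

	if (fwd != 0)
		flags |= SK_FORWARDING;
	if (fwd == 2)
		flags |= SK_FORWARDING_IPSEC;
	return flags;
}

/*
 * Output side: decide from the flags taken at input time, never from
 * the global.  Returns 1 if the packet may be forwarded, 0 otherwise.
 */
int
sk_output_allowed(int flags, int ipsec_in_done)
{
	if ((flags & SK_FORWARDING) == 0)
		return 0;
	if ((flags & SK_FORWARDING_IPSEC) && !ipsec_in_done)
		return 0;
	return 1;
}
```

If the value changes from 2 to 0 between the two calls, a packet that was not processed by IPsec is still rejected, because the decision uses the snapshot taken at input time.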
2024-06-26 | return type on a dedicated line when declaring functions | Jonathan Gray
ok mglocker@
2024-06-22 | remove space between function names and argument list | Jonathan Gray
2024-06-21 | My earlier commit [1.1169 of pf.c (2023/01/05)] makes pf(4) report the wrong rule and anchor number when a packet matches a rule found at anchor depth 2 or more. | Alexandr Nedvedicky
The issue has been noticed and reported by Giannis Kapetanakis (billias _at_ edu.physics.uoc.gr), who also co-developed and tested the final fix presented in this commit. To fix the issue pf(4) must also remember the anchor where the matching rule belongs while rules are traversed to find a match for a given packet. The information on the anchor is now kept in the anchor stack frame. OK sthen@
2024-06-20 | Read IPv6 forwarding value only once while processing a packet. | Alexander Bluhm
IPv4 uses IP_FORWARDING to pass down a consistent value of net.inet.ip.forwarding down the stack. This is needed for unlocking sysctl. Do the same for IPv6. Read ip6_forwarding once in ip6_input_if() and pass down IPV6_FORWARDING as flags to ip6_ours(), ip6_hbhchcheck(), ip6_forward(). Replace the srcrt value with IPV6_REDIRECT flag for consistency with IPv4. To have common syntax with IPv4, use ip6_forwarding == 0 checks instead of !ip6_forwarding. This will also make it easier to implement net.inet6.ip6.forwarding=2 for IPsec only forwarding later. In nd6_ns_input() and nd6_na_input() read ip6_forwarding once and store it in i_am_router. The variable name has been chosen to avoid confusion with is_router, which indicates router flag of the packet. Reading of ip6_forwarding is done independently from ip6_input_if(), consistency does not really matter. One is for ND router behavior the other for forwarding. Again use the ip6_forwarding != 0 check, so when ip6_forwarding IPsec only value 2 gets implemented, it will behave like a router. OK deraadt@ sashan@ florian@ claudio@
2024-06-14 | Switch AF_ROUTE sockets to the new locking scheme. | Vitaliy Makkoveev
At sockets layer only mark buffers as SB_MTXLOCK. At PCB layer only protect `so_rcv' with corresponding `sb_mtx' mutex(9). SS_ISCONNECTED and SS_CANTRCVMORE bits are redundant for AF_ROUTE sockets. Since SS_CANTRCVMORE modifications are performed with both solock() and `sb_mtx' held, the 'unlocked' SS_CANTRCVMORE check in rtm_senddesync() is safe. ok bluhm
2024-06-09 | Introduce IFCAP_VLAN_HWOFFLOAD for vio(4). | Jan Klemkow
Add IFCAP_VLAN_HWOFFLOAD to signal hardware like vio(4) can handle checksum or TSO offloading with inline VLAN tags. tested by Mark Patruck, sf@ and bluhm@ ok sf@ and bluhm@
2024-06-07 | Read IP forwarding variables only once. | Alexander Bluhm
Do not assume that ip_forwarding and ip_directedbcast cannot change while processing one packet. Read it once and pass down its value with a flag. This is necessary for unlocking the sysctl path. There are a few places where a consistent value does not really matter, they are unchanged. Use a proper ip_ prefix for the global variable. OK claudio@
2024-06-07 | remove ph_ppp_proto define, unused since rev 1.123 | Jonathan Gray
2024-05-29 | remove prototypes with no matching function | Jonathan Gray
2024-05-24 | pfsync must let state progress for destination peer | Alexandr Nedvedicky
The issue has been noticed by matthieu@ when he was chasing the cause of excessive pfsync traffic between firewall boxes. When comparing the content of state tables between primary and backup firewall, the backup firewall showed many states as follows:
    ESTABLISHED:SYN_SENT
    FIN_WAIT_2:SYN_SENT
    *:SYN_SENT
This is caused by pfsync_upd_tcp() which fails to update the TCP state for the destination connection peer, so it remains stuck in SYN_SENT. matthieu@ confirms diff helps with 'stuck-state'. It also seems to help with excessive pfsync traffic. ok dlg@
2024-05-17 | Switch AF_KEY sockets to the new locking scheme. | Vitaliy Makkoveev
The simplest case. Nothing to change in sockets layer, only set SB_MTXLOCK on socket buffers. ok bluhm
2024-05-17 | Fix uninitialized memory access in pfkeyv2_sysctl(). | Vitaliy Makkoveev
pfkeyv2_sysctl() reads the SA type from uninitialized memory if it is not provided by the caller of sysctl(2) because of a missing length check. From Carsten Beckmann. ok bluhm
2024-05-14 | remove prototypes with no matching function | Jonathan Gray
2024-05-13 | remove prototypes with no matching function | Jonathan Gray
ok mpi@
2024-05-12 | sync_ifp and ticket_pabuf don't exist, remove externs | Jonathan Gray
2024-05-10 | make pf_match_rule() prototype match the function | Jonathan Gray
2024-04-22 | Show pf fragment reassembly counters. | Alexander Bluhm
Fragment count and statistics are stored in struct pf_status. From there pfctl(8) and systat(1) collect and show them. Note that pfctl -s info needs the -v switch to show fragments. As fragment reassembly has its own mutex, also grab this in pf ioctl(2) and sysctl(2) code. input claudio@; OK henning@
2024-04-14 | Run raw IP input in parallel. | Alexander Bluhm
Running raw IPv4 input with shared net lock in parallel is less complex than UDP. Especially there is no socket splicing. New ip_deliver() may run with shared or exclusive net lock. The last parameter indicates the mode. If it is running with shared netlock and encounters a protocol that needs exclusive lock, the packet is queued. Old ip_ours() always queued the packet. Now it calls ip_deliver() with shared net lock, and if that cannot handle the packet completely, the packet is queued and later processed with exclusive net lock. In case of an IPv6 header chain that switches from shared to exclusive processing, the next protocol and mbuf offset are stored in a mbuf tag. OK mvs@
2024-04-13 | correct indentation | Jonathan Gray
no functional change, found by smatch warnings ok miod@ bluhm@
2024-04-12 | Split single TCP inpcb table into IPv4 and IPv6 parts. | Alexander Bluhm
With two separate TCP hash tables, each one becomes smaller. When we remove the exclusive net lock from TCP, contention on internet PCB table mutex will be reduced. UDP has been split earlier into IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with assertions. OK mvs@
2024-04-11 | Prevent changing interface loopback flag from userland. | Alexander Bluhm
IFF_LOOPBACK is telling userland the behaviour of a specific driver, it is supposed to be static and permanent. Clearing the loopback flag on lo0 could lead to a kernel crash due to inconsistent multicast igmp group. Reported-by: syzbot+2f24ed6c8ddb2d6bb22c@syzkaller.appspotmail.com OK claudio@ deraadt@
2024-04-09 | Don't include net/art.h in net/rtable.h; instead let the two users include the file themselves. | Claudio Jeker
OK bluhm@ mpi@
2024-03-31 | Combine route_cache() and rtalloc_mpath() in new route_mpath(). | Alexander Bluhm
Fill and check the cache and call rtalloc_mpath() together. Then the caller of route_mpath() does not have to care about the uint32_t *src pointer and just pass struct in_addr. All the conversions are done inside the functions. A previous version of this diff was backed out. There was an additional rtisvalid() in rtalloc_mpath() that prevented packet output via interfaces that were not up. Now the route in the cache has to be valid, but after new lookup, rtalloc_mpath() may return invalid routes. This generates fewer errors in userland and preserves existing behavior. OK sashan@
2024-03-26 | Avoid NULL pointer dereference in routing table an_match(). | Alexander Bluhm
OK mvs@
2024-03-19 | count if_enqueue/ifq_enqueue errors as oqdrops. | David Gwynne
this helps narrow down where some "output failures" on sec interfaces occur. based on discussion with jason tubnor
2024-03-18 | expose per port information via kstats. | David Gwynne
the most interesting information exposed here is the number of times a port changes state according to the lacp state machine. if a port is flapping, it's hard to see if you only look at the current state. getting a count of changes over time makes problems a lot more visible and therefore fixable. this also exposes counters around the lacp protocol packets. all of these can be useful when trying to line up behaviors with another system (eg, a switch). ok jmatthew@
2024-03-18 | use high bits from the mbuf flowid to pick a port to transmit on. | David Gwynne
a port here is a physical interface used by an aggr. this leaves the low bits for a physical interface to use to pick a tx ring. without this, aggr used low bits for port selection, which takes bits away from the ring selection, which can lead to uneven distribution of packets over tx rings. i've been running this in production for well over a year now.
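A rough sketch of the bit split described above, in plain C. The 16-bit flow id width, the 8/8 split point and the `sk_` function names are assumptions made for illustration; the commit message does not say how many bits each layer uses.

```c
#include <assert.h>
#include <stdint.h>

#define SK_FLOWID_BITS	16	/* assumed flow id width for this sketch */

/* Aggregation layer: pick a member port from the high half of the flow id. */
unsigned int
sk_pick_port(uint16_t flowid, unsigned int nports)
{
	assert(nports > 0);
	return (unsigned int)(flowid >> (SK_FLOWID_BITS / 2)) % nports;
}

/* Driver: pick a transmit ring from the low half of the flow id. */
unsigned int
sk_pick_ring(uint16_t flowid, unsigned int nrings)
{
	assert(nrings > 0);
	return (unsigned int)(flowid & 0xff) % nrings;
}
```

Because the two selections look at disjoint bits, flows that land on the same port can still spread over all of that port's rings.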
2024-03-05 | Convert `t_lock', `r_keypair_lock' and `c_lock' rwlock(9)s to corresponding mutex(9)es. | Vitaliy Makkoveev
ifq_start() and following wg_qstart() could be called from software interrupt context if bandwidth control is enabled in pf.conf(5). Remove sleep points provided by rwlock(9)s from wg(4) output start routine. looks ok claudio
2024-03-04 | white space fixes. no functional change | David Gwynne
2024-02-29 | revert "Combine route_cache() and rtalloc_mpath() in new route_mpath()" | Christian Weisgerber
It breaks NFS. ok claudio@
2024-02-28 | Enable IPv6 AF for ppp(4) | Denis Fondras
OK claudio@
2024-02-27 | Combine route_cache() and rtalloc_mpath() in new route_mpath(). | Alexander Bluhm
Fill and check the cache and call rtalloc_mpath() together. Then the caller of route_mpath() does not have to care about the uint32_t *src pointer and just pass struct in_addr. All the conversions are done inside the functions. ro->ro_rt is either valid or NULL. Note that some places have a stricter rtisvalid() now compared to the previous NULL check. OK claudio@
2024-02-22 | Make the route cache aware of multipath routing. | Alexander Bluhm
Pass source address to route_cache() and store it in struct route. Cached multipath routes are only valid if source address matches. If sysctl multipath changes, increase route generation number. OK claudio@
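The cache validity rule can be illustrated with a small userland sketch. Everything here is hypothetical: the struct layout, the `sk_` names and the hit/miss return convention are not taken from the kernel, and for simplicity every cached entry is treated as a multipath route, so the source address always takes part in the comparison.

```c
#include <stdint.h>

/* Hypothetical, simplified cache entry; not the kernel's struct route. */
struct sk_route {
	int		valid;	/* an entry has been cached */
	uint32_t	dst;	/* cached destination address */
	uint32_t	src;	/* source address the entry was chosen for */
	unsigned long	gen;	/* generation number at fill time */
};

/* Global generation number; bumping it invalidates every cache. */
static unsigned long sk_route_generation;

/* Called when routing state changes, e.g. when the multipath sysctl flips. */
void
sk_route_invalidate_all(void)
{
	sk_route_generation++;
}

/*
 * Returns 1 on a cache hit and 0 on a miss.  On a miss the cache
 * parameters are refilled so the caller can do a fresh lookup.
 */
int
sk_route_cache(struct sk_route *ro, uint32_t dst, uint32_t src)
{
	if (ro->valid && ro->gen == sk_route_generation &&
	    ro->dst == dst && ro->src == src)
		return 1;

	ro->valid = 1;
	ro->dst = dst;
	ro->src = src;
	ro->gen = sk_route_generation;
	return 0;
}
```

Comparing a generation number avoids walking all caches on a change: stale entries simply fail the check the next time they are used.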
2024-02-14 | Check IP length in ether_extract_headers(). | Alexander Bluhm
For LRO with ix(4) it is necessary to detect ethernet padding. Extract ip_len and ip6_plen from the mbuf and provide it to the drivers. Add extended sanity checks, like IP packet is shorter than TCP header. This prevents offloading to network hardware with bogus packets. Also iphlen of extracted headers contains header length for IPv4 and IPv6, to make code in drivers simpler. OK mglocker@
2024-02-13 | Analyse header layout in ether_extract_headers(). | Alexander Bluhm
Several drivers need IPv4 header length and TCP offset for checksum offload, TSO and LRO. Accessing these fields directly caused crashes on sparc64 due to misaligned access. It cannot be guaranteed that IP and TCP header is 4 byte aligned in driver level. Also gcc 4.2.1 assumes that bit fields can be accessed with 32 bit load instructions. Use memcpy() in ether_extract_headers() to get the bits from IPv4 and TCP header and store the header length in struct ether_extracted. From there network drivers can easily use it without caring about alignment and bit shift. Do some sanity checks with the length values to prevent invalid values from evil packets from being stored into hardware registers. If check fails, clear the pointer to the header to hide it from the driver. Add debug prints that help to figure out the reason for bad packets and provide information when debugging drivers. OK mglocker@
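The alignment-safe extraction can be sketched in userland C as below. This is a simplified illustration, not the code of ether_extract_headers(): `struct sk_extracted` and `sk_extract_ipv4()` are hypothetical names, only the IPv4 header is handled, and the checks shown are a small subset of what a real implementation would need.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result struct; lengths are in bytes, 0 means "not valid". */
struct sk_extracted {
	size_t	iphlen;	/* IPv4 header length */
	size_t	iplen;	/* total length from the IPv4 header */
};

/*
 * Extract lengths from an IPv4 header that may start at any address.
 * Returns 0 on success, -1 if the header fails the sanity checks.
 */
int
sk_extract_ipv4(const unsigned char *buf, size_t buflen,
    struct sk_extracted *ext)
{
	unsigned char hdr[20];	/* fixed part of the IPv4 header */
	size_t hlen, iplen;

	memset(ext, 0, sizeof(*ext));
	if (buflen < sizeof(hdr))
		return -1;

	/* A byte copy works at any alignment, unlike a bit field access. */
	memcpy(hdr, buf, sizeof(hdr));

	hlen = (size_t)(hdr[0] & 0x0f) << 2;		/* IHL in 32 bit words */
	iplen = ((size_t)hdr[2] << 8) | hdr[3];		/* network byte order */

	/* Reject wrong version, short or truncated header, bogus length. */
	if ((hdr[0] >> 4) != 4 || hlen < sizeof(hdr) || hlen > buflen ||
	    iplen < hlen)
		return -1;

	ext->iphlen = hlen;
	ext->iplen = iplen;
	return 0;
}
```

On failure the result struct stays zeroed, which mirrors the idea of hiding a bad header from the driver instead of handing it unchecked values.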
2024-02-13 | Merge struct route and struct route_in6. | Alexander Bluhm
Use a common struct route for both inet and inet6. Unfortunately struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has to be exposed from net/route.h. Struct route has to be bsd visible for userland as netstat kvm code inspects inp_route. Internet PCB and TCP SYN cache can use a plain struct route now. All specific sockaddr types for inet and inet6 are embedded there. OK claudio@
2024-02-09 | Route cache function returns hit or miss. | Alexander Bluhm
The route_cache() function can easily return whether it was a cache hit or miss. Then the logic to perform a route lookup gets a bit simpler. Some more complicated if (ro->ro_rt == NULL) checks still exist elsewhere. Also use route cache in in_pcbselsrc() instead of filling struct route manually. OK claudio@
2024-02-07 | Add missing #ifdef INET6 to fix ramdisk build. | Alexander Bluhm
2024-02-07 | Use the route generation number also for IPv6. | Alexander Bluhm
Implement route6_cache() to check whether the cached route is still valid and otherwise fill caching parameter of struct route_in6. Also count cache hits and misses in netstat. in_pcbrtentry() uses route cache now. OK claudio@
2024-02-06 | Invert broken check of panic string in if_linkstate(). | Alexander Bluhm
original bug report from syzkaller
Reported-by: syzbot+d19060a65721eb432a72@syzkaller.appspotmail.com
broken fix found by Hrvoje Popovski
hint to the problem and OK deraadt@