Age | Commit message | Author |
|
ok mvs
|
|
ok mvs
|
|
of updating it blindly.
ok mvs
|
|
incoming SADB_ADD and SADB_UPDATE message. Since we send them as part of
the SADB_GET reply we must also accept them on SADB_ADD/UPDATE as sasyncd
will forward payloads previously received in SADB_GET. Fixes a bug where
sasyncd can't restore SAs because pfkey returns EINVAL.
From Rafał Ramocki
ok bluhm@
|
|
from markus@
|
|
Use atomic operations to read ip6_forwarding while processing packets
in the network stack.
To make clear where the router property is actually needed, use the
i_am_router variable based on ip6_forwarding. It already existed
in nd6_nbr. Move i_am_router setting up the call stack until all
users are independent.
The forwarding decisions in pf_test, pf_refragment6, ip6_input also
do not interfere.
Use a new array ipv6ctl_vars_unlocked to make transition of all the
integer sysctls easier. Adapt IPv4 to the new style.
OK mvs@
|
|
udp_send() and following udp{,6}_output() do not append packets to
`so_snd' socket buffer. This means the sosend() and sosplice() sending
paths are dummy pru_send() and there is no problem running them
simultaneously on the same socket.
Push shared solock() deep down to sosend() and take it only around
pru_send(), but keep somove() running under exclusive solock(). Since
sosend() doesn't modify `so_snd' the unlocked `so_snd' space checks
within somove() are safe. Corresponding `sb_state' and `sb_flags'
modifications are protected by `sb_mtx' mutex(9).
Tested and OK bluhm.
|
|
The places in packet processing where ip_forwarding is evaluated
have been consolidated. The remaining pieces in pf test, ip input,
and icmp input do not need consistent information. If the integer
value is changed by another CPU, it is harmless.
The sysctl syscall sets the value atomically, so add atomic read
in network processing and remove the net lock in sysctl IPCTL_FORWARDING.
OK claudio@ mvs@
|
|
IPsec gateways set the forwarding sysctl to 2. While this has worked
for IPv4 for a long time, adapt this feature for IPv6 now. Set
sysctl net.inet6.ip6.forwarding=2 to forward only packets that have
been processed by IPsec.
Set IPV6_FORWARDING_IPSEC in ip6_input() and pass the flag down to
the call stack. This provides a consistent view of the global variable
ip6_forwarding. In ip6_output() or ip6_forward() drop packets that
do not match the policy.
OK denis@
|
|
Fix MP race between reading ip_forwarding in ip_input() and checking
ip_forwarding == 2 in ip_output(). In theory ip_forwarding could
be 2 during ip_input() and later 0 in ip_output(). Then a packet
would be forwarded that was never allowed. Currently exclusive
netlock in sysctl(2) prevents all races.
Introduce IP_FORWARDING_IPSEC and pass it with the flags parameter
that was introduced for IP_FORWARDING.
Instead of calling m_tag_find(), traversing the list, and comparing
with NULL, just check the PACKET_TAG_IPSEC_IN_DONE bit. Reading
ipsec_in_use in ip_output() is a performance hack that is not
necessary. New code only checks three bits.
OK mvs@
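
A minimal userland sketch of the read-once idea described above, not the kernel code itself. All `ex_` names and flag values are hypothetical stand-ins; the assumption is that the forwarding integer is loaded atomically one time on input and only the derived flags travel down the stack, so a concurrent sysctl change cannot produce two different answers for the same packet.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag values; the kernel's real definitions may differ. */
#define EX_FORWARDING		0x1	/* forwarding is enabled */
#define EX_FORWARDING_IPSEC	0x2	/* forward IPsec-processed packets only */

/* Plays the role of the sysctl-backed forwarding integer. */
static _Atomic int ex_forwarding;

/* Read the global exactly once and convert it into per-packet flags. */
static int
ex_forwarding_flags(void)
{
	int flags = 0;

	switch (atomic_load_explicit(&ex_forwarding, memory_order_relaxed)) {
	case 0:
		break;
	case 2:
		flags |= EX_FORWARDING | EX_FORWARDING_IPSEC;
		break;
	default:
		flags |= EX_FORWARDING;
		break;
	}
	return flags;
}

/* The output side decides from the flags, never from the global. */
static int
ex_may_forward(int flags, int ipsec_in_done)
{
	assert((flags & ~(EX_FORWARDING | EX_FORWARDING_IPSEC)) == 0);

	if ((flags & EX_FORWARDING) == 0)
		return 0;
	if ((flags & EX_FORWARDING_IPSEC) && !ipsec_in_done)
		return 0;
	return 1;
}
```

In this sketch a packet that entered while the value was 2 keeps the IPsec-only policy even if the global is set to 0 before the output check runs.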
|
|
ok mglocker@
|
|
|
|
rule and anchor number when packet matches rule found and anchor depth 2
and more. The issue has been noticed and reported by Giannis Kapetanakis
(billias _at_ edu.physics.uoc.gr), who also co-developed and tested
the final fix presented in this commit.
To fix the issue pf(4) must also remember the anchor where matching rule
belongs while rules are traversed to find a match for given packet.
The information on anchor is now kept in anchor stack frame.
OK sthen@
|
|
IPv4 uses IP_FORWARDING to pass down a consistent value of
net.inet.ip.forwarding down the stack. This is needed for unlocking
sysctl. Do the same for IPv6.
Read ip6_forwarding once in ip6_input_if() and pass down IPV6_FORWARDING
as flags to ip6_ours(), ip6_hbhchcheck(), ip6_forward(). Replace
the srcrt value with IPV6_REDIRECT flag for consistency with IPv4.
To have common syntax with IPv4, use ip6_forwarding == 0 checks
instead of !ip6_forwarding. This will also make it easier to
implement net.inet6.ip6.forwarding=2 for IPsec only forwarding
later.
In nd6_ns_input() and nd6_na_input() read ip6_forwarding once and
store it in i_am_router. The variable name has been chosen to avoid
confusion with is_router, which indicates router flag of the packet.
Reading of ip6_forwarding is done independently from ip6_input_if(),
consistency does not really matter. One is for ND router behavior
the other for forwarding. Again use the ip6_forwarding != 0 check,
so when ip6_forwarding IPsec only value 2 gets implemented, it will
behave like a router.
OK deraadt@ sashan@ florian@ claudio@
|
|
At sockets layer only mark buffers as SB_MTXLOCK. At PCB layer only
protect `so_rcv' with corresponding `sb_mtx' mutex(9).
SS_ISCONNECTED and SS_CANTRCVMORE bits are redundant for AF_ROUTE
sockets. Since SS_CANTRCVMORE modifications are performed with both solock()
and `sb_mtx' held, the 'unlocked' SS_CANTRCVMORE check in
rtm_senddesync() is safe.
ok bluhm
|
|
Add IFCAP_VLAN_HWOFFLOAD to signal hardware like vio(4) can handle
checksum or TSO offloading with inline VLAN tags.
tested by Mark Patruck, sf@ and bluhm@
ok sf@ and bluhm@
|
|
Do not assume that ip_forwarding and ip_directedbcast cannot change
while processing one packet. Read it once and pass down its value
with a flag. This is necessary for unlocking the sysctl path.
There are a few places where a consistent value does not really
matter, they are unchanged. Use a proper ip_ prefix for the global
variable.
OK claudio@
|
|
|
|
|
|
The issue has been noticed by matthieu@ when he was chasing
the cause of excessive pfsync traffic between firewall boxes.
When comparing content of state tables between primary
and backup firewall the backup firewall showed many
states as follows:
ESTABLISHED:SYN_SENT
FIN_WAIT_2:SYN_SENT
* :SYN_SENT
this is caused by pfsync_upd_tcp() which fails to update
TCP-state for destination connection peer, so it remains
stuck in SYN_SENT.
matthieu@ confirms diff helps with 'stuck-state'. It also
seems to help with excessive pfsync traffic.
ok dlg@
|
|
The simplest case. Nothing to change in sockets layer, only set
SB_MTXLOCK on socket buffers.
ok bluhm
|
|
pfkeyv2_sysctl() reads the SA type from uninitialized memory if it is
not provided by the caller of sysctl(2) because of a missing length
check.
From Carsten Beckmann.
ok bluhm
|
|
|
|
ok mpi@
|
|
|
|
|
|
Fragment count and statistics are stored in struct pf_status. From
there pfctl(8) and systat(1) collect and show them. Note that pfctl
-s info needs the -v switch to show fragments. As fragment reassembly
has its own mutex, also grab this in pf ioctl(2) and sysctl(2) code.
input claudio@; OK henning@
|
|
Running raw IPv4 input with shared net lock in parallel is less
complex than UDP. Especially there is no socket splicing.
New ip_deliver() may run with shared or exclusive net lock. The
last parameter indicates the mode. If it is running with shared
netlock and encounters a protocol that needs exclusive lock, the
packet is queued. Old ip_ours() always queued the packet. Now it
calls ip_deliver() with shared net lock, and if that cannot handle
the packet completely, the packet is queued and later processed
with exclusive net lock.
In case of an IPv6 header chain, that switches from shared to
exclusive processing, the next protocol and mbuf offset are stored
in a mbuf tag.
OK mvs@
|
|
no functional change, found by smatch warnings
ok miod@ bluhm@
|
|
With two separate TCP hash tables, each one becomes smaller. When
we remove the exclusive net lock from TCP, contention on internet
PCB table mutex will be reduced. UDP has been split earlier into
IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with
assertions.
OK mvs@
|
|
IFF_LOOPBACK is telling userland the behaviour of a specific driver,
it is supposed to be static and permanent. Clearing the loopback
flag on lo0 could lead to a kernel crash due to inconsistent multicast
igmp group.
Reported-by: syzbot+2f24ed6c8ddb2d6bb22c@syzkaller.appspotmail.com
OK claudio@ deraadt@
|
|
include the file themselves.
OK bluhm@ mpi@
|
|
Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions.
A previous version of this diff was backed out. There was an
additional rtisvalid() in rtalloc_mpath() that prevented packet
output via interfaces that were not up. Now the route in the cache
has to be valid, but after new lookup, rtalloc_mpath() may return
invalid routes. This generates fewer errors in userland and preserves
existing behavior.
OK sashan@
|
|
OK mvs@
|
|
this helps narrow down where some "output failures" on sec interfaces
occur.
based on discussion with jason tubnor
|
|
the most interesting information exposed here is the number of times
a port changes state according to the lacp state machine. if a port
is flapping, it's hard to see if you only look at the current state.
getting a count of changes over time makes problems a lot more
visible and therefore fixable.
this also exposes counters around how the lacp protocol packets.
all of these can be useful when trying to line up behaviors with
another system (eg, a switch).
ok jmatthew@
|
|
a port here is a physical interface used by an aggr.
this leaves the low bits for a physical interface to use to pick a
tx ring. without this, aggr used low bits for port selection, which
takes bits away from the ring selection, which can lead to uneven
distribution of packets over tx rings.
i've been running this in production for well over a year now.
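
a rough sketch of the bit split described above, under stated assumptions: the 16 bit flow id, the 8 bit boundary, and the `ex_` names are all hypothetical and not taken from aggr(4). the point is only that port selection consumes the high bits, so flows that land on the same port still differ in the low bits used for ring selection.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split: low 8 bits for the tx ring, the rest for the port. */
#define EX_RING_BITS	8

/* Pick a physical port from the high bits of the flow id. */
static unsigned int
ex_pick_port(uint16_t flowid, unsigned int nports)
{
	assert(nports > 0);
	return (unsigned int)(flowid >> EX_RING_BITS) % nports;
}

/* Pick a tx ring on that port from the untouched low bits. */
static unsigned int
ex_pick_ring(uint16_t flowid, unsigned int nrings)
{
	assert(nrings > 0);
	return (unsigned int)(flowid & ((1u << EX_RING_BITS) - 1)) % nrings;
}
```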
|
|
corresponding mutex(9)es.
ifq_start() and following wg_qstart() could be called from software
interrupt context if bandwidth control is enabled in pf.conf(5). Remove
sleep points provided by rwlock(9)s from wg(4) output start routine.
looks ok claudio
|
|
|
|
It breaks NFS.
ok claudio@
|
|
OK claudio@
|
|
Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.
OK claudio@
|
|
Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.
OK claudio@
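
A simplified sketch of the cache check described above, assuming IPv4 addresses held as plain integers. The struct layout, the `ex_` names, and the return convention (0 for a hit, 1 for a miss) are hypothetical and not the kernel's route_cache() interface. A cached entry only counts as a hit when the generation number, the destination, and the source all still match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bumped whenever cached routes may be stale, e.g. on a multipath change. */
static unsigned long ex_route_generation;

/* Hypothetical cache record, loosely modelled on the idea of struct route. */
struct ex_route {
	void		*rt;		/* cached route entry, NULL if none */
	unsigned long	 generation;	/* generation the entry was cached at */
	uint32_t	 dst;		/* destination the entry was cached for */
	uint32_t	 src;		/* source the entry was cached for */
};

/*
 * Return 0 on a cache hit.  On a miss, drop the stale entry, record the
 * new lookup parameters, and return 1 so the caller performs a lookup.
 */
static int
ex_route_cache(struct ex_route *ro, uint32_t dst, uint32_t src)
{
	assert(ro != NULL);

	if (ro->rt != NULL && ro->generation == ex_route_generation &&
	    ro->dst == dst && ro->src == src)
		return 0;

	ro->rt = NULL;
	ro->generation = ex_route_generation;
	ro->dst = dst;
	ro->src = src;
	return 1;
}
```

After a miss the caller would do the real lookup and store the result in `rt`; that part is left out here.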
|
|
For LRO with ix(4) it is necessary to detect ethernet padding.
Extract ip_len and ip6_plen from the mbuf and provide it to the
drivers.
Add extended sanity checks, such as an IP packet being shorter than the
TCP header. This prevents offloading to network hardware with bogus
packets.
Also iphlen of extracted headers contains header length for IPv4
and IPv6, to make code in drivers simpler.
OK mglocker@
|
|
Several drivers need IPv4 header length and TCP offset for checksum
offload, TSO and LRO. Accessing these fields directly caused crashes
on sparc64 due to misaligned access. It cannot be guaranteed that
IP and TCP headers are 4 byte aligned at driver level. Also gcc 4.2.1
assumes that bit fields can be accessed with 32 bit load instructions.
Use memcpy() in ether_extract_headers() to get the bits from IPv4
and TCP header and store the header length in struct ether_extracted.
From there network drivers can easily use it without caring about
alignment and bit shift. Do some sanity checks with the length
values to prevent invalid values from evil packets from being stored
into hardware registers. If check fails, clear the pointer to the
header to hide it from the driver. Add debug prints that help to
figure out the reason for bad packets and provide information when
debugging drivers.
OK mglocker@
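
The alignment-safe extraction described above can be illustrated with a small userland sketch. It is not the kernel's ether_extract_headers(); the `ex_` struct and function are hypothetical, and only the IPv4 header length and total length are handled. Bytes are copied out with memcpy() instead of dereferencing a header struct pointer, and the pointer is cleared when a length check fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type, not the kernel's struct ether_extracted. */
struct ex_extracted {
	const uint8_t	*ip4;		/* NULL when a sanity check failed */
	size_t		 iphlen;	/* IPv4 header length in bytes */
	size_t		 iplen;		/* IPv4 total length in bytes */
};

/* Extract IPv4 lengths from a buffer that may start at any alignment. */
static void
ex_extract_ip4(const uint8_t *buf, size_t len, struct ex_extracted *ext)
{
	uint8_t vhl, raw[2];
	size_t hlen, iplen;

	assert(buf != NULL && ext != NULL);
	ext->ip4 = NULL;
	ext->iphlen = 0;
	ext->iplen = 0;

	if (len < 20)
		return;

	/* Copy the fields out; never load them through a struct pointer. */
	memcpy(&vhl, buf, sizeof(vhl));
	memcpy(raw, buf + 2, sizeof(raw));

	if ((vhl >> 4) != 4)
		return;
	hlen = (size_t)(vhl & 0x0f) << 2;
	iplen = ((size_t)raw[0] << 8) | raw[1];	/* network byte order */

	/* Hide the header from the caller if the lengths are implausible. */
	if (hlen < 20 || hlen > len || iplen < hlen)
		return;

	ext->ip4 = buf;
	ext->iphlen = hlen;
	ext->iplen = iplen;
}
```

A driver-like caller would test `ext.ip4 != NULL` before using the lengths, so a malformed packet never reaches the offload path in this sketch.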
|
|
Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.
OK claudio@
|
|
The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.
OK claudio@
|
|
|
|
Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.
OK claudio@
|
|
original bug report from syzkaller
Reported-by: syzbot+d19060a65721eb432a72@syzkaller.appspotmail.com
broken fix found by Hrvoje Popovski
hint to the problem and OK deraadt@
|