Age | Commit message | Author |
|
Fix MP race between reading ip_forwarding in ip_input() and checking
ip_forwarding == 2 in ip_output(). In theory ip_forwarding could
be 2 during ip_input() and later 0 in ip_output(). Then a packet
would be forwarded that was never allowed. Currently exclusive
netlock in sysctl(2) prevents all races.
Introduce IP_FORWARDING_IPSEC and pass it with the flags parameter
that was introduced for IP_FORWARDING.
Instead of calling m_tag_find(), traversing the list, and comparing
with NULL, just check the PACKET_TAG_IPSEC_IN_DONE bit. Reading
ipsec_in_use in ip_output() is a performance hack that is not
necessary. New code only checks three bits.
OK mvs@
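The read-once pattern described above can be sketched as follows. This is a minimal illustration, not the kernel code: the `sk_` names and the flag values are hypothetical, and a plain global stands in for the net.inet.ip.forwarding sysctl. The input path reads the variable a single time and encodes it in flags; the output path decides from the flags alone, so a later change of the global cannot affect a packet already in flight.

```c
/* Hypothetical flag values; the real kernel constants may differ. */
#define SK_FORWARDING		0x1
#define SK_FORWARDING_IPSEC	0x2

/* Stand-in for the forwarding sysctl: 0 = off, 1 = on, 2 = IPsec only. */
int sk_ip_forwarding;

/* Input path: read the global exactly once and encode it as flags. */
int
sk_input_flags(void)
{
	int fwd = sk_ip_forwarding;	/* the only read of the global */
	int flags = 0;

	if (fwd != 0)
		flags |= SK_FORWARDING;
	if (fwd == 2)
		flags |= SK_FORWARDING_IPSEC;
	return flags;
}

/* Output path: decide from the flags only, never from the global. */
int
sk_may_forward(int flags, int ipsec_in_done)
{
	if ((flags & SK_FORWARDING) == 0)
		return 0;
	/* IPsec-only mode: the packet must have passed IPsec input. */
	if ((flags & SK_FORWARDING_IPSEC) && !ipsec_in_done)
		return 0;
	return 1;
}
```

If the global changes from 2 to 0 between the two calls, a packet without the IPsec-done bit is still rejected, because the decision was fixed when the flags were computed.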
|
|
ok mglocker@
|
|
|
|
rule and anchor number when packet matches rule found and anchor depth 2
and more. The issue has been noticed and reported by Giannis Kapetanakis
(billias _at_ edu.physics.uoc.gr), who also co-developed and tested
the final fix presented in this commit.
To fix the issue, pf(4) must also remember the anchor to which the
matching rule belongs while rules are traversed to find a match for a
given packet. The information on the anchor is now kept in the anchor
stack frame.
OK sthen@
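The idea of remembering the anchor together with the matching rule can be sketched as below. This is a simplified illustration under several assumptions: all `sk_` types and names are hypothetical, recursion stands in for pf's explicit anchor stack, and "last matching rule wins" is the only evaluation policy modeled.

```c
#include <stddef.h>

struct sk_ruleset;

struct sk_rule {
	int nr;					/* rule number inside its ruleset */
	int matches;				/* 1 if the packet matches */
	const struct sk_ruleset *anchor;	/* non-NULL: descend into anchor */
};

struct sk_ruleset {
	const char *name;
	const struct sk_rule *rules;
	int nrules;
};

/* Result of the traversal: the rule and the ruleset it belongs to. */
struct sk_match {
	const struct sk_rule *rule;
	const struct sk_ruleset *anchor;
};

/* Walk the rules; each recursion level plays the role of a stack frame. */
void
sk_eval(const struct sk_ruleset *rs, struct sk_match *m)
{
	int i;

	for (i = 0; i < rs->nrules; i++) {
		const struct sk_rule *r = &rs->rules[i];

		if (r->anchor != NULL) {
			sk_eval(r->anchor, m);
		} else if (r->matches) {
			/* Record the rule and the anchor it was found in. */
			m->rule = r;
			m->anchor = rs;
		}
	}
}
```

Because the ruleset is stored at the moment of the match, the reported anchor stays correct even when the match happens two or more levels below the main ruleset.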
|
|
IPv4 uses IP_FORWARDING to pass down a consistent value of
net.inet.ip.forwarding down the stack. This is needed for unlocking
sysctl. Do the same for IPv6.
Read ip6_forwarding once in ip6_input_if() and pass down IPV6_FORWARDING
as flags to ip6_ours(), ip6_hbhchcheck(), ip6_forward(). Replace
the srcrt value with IPV6_REDIRECT flag for consistency with IPv4.
To have common syntax with IPv4, use ip6_forwarding == 0 checks
instead of !ip6_forwarding. This will also make it easier to
implement net.inet6.ip6.forwarding=2 for IPsec only forwarding
later.
In nd6_ns_input() and nd6_na_input() read ip6_forwarding once and
store it in i_am_router. The variable name has been chosen to avoid
confusion with is_router, which indicates router flag of the packet.
Reading of ip6_forwarding is done independently from ip6_input_if();
consistency does not really matter. One is for ND router behavior,
the other for forwarding. Again use the ip6_forwarding != 0 check,
so when ip6_forwarding IPsec only value 2 gets implemented, it will
behave like a router.
OK deraadt@ sashan@ florian@ claudio@
|
|
At sockets layer only mark buffers as SB_MTXLOCK. At PCB layer only
protect `so_rcv' with corresponding `sb_mtx' mutex(9).
SS_ISCONNECTED and SS_CANTRCVMORE bits are redundant for AF_ROUTE
sockets. Since SS_CANTRCVMORE modifications performed with both solock()
and `sb_mtx' held, the 'unlocked' SS_CANTRCVMORE check in
rtm_senddesync() is safe.
ok bluhm
|
|
Add IFCAP_VLAN_HWOFFLOAD to signal hardware like vio(4) can handle
checksum or TSO offloading with inline VLAN tags.
tested by Mark Patruck, sf@ and bluhm@
ok sf@ and bluhm@
|
|
Do not assume that ip_forwarding and ip_directedbcast cannot change
while processing one packet. Read it once and pass down its value
with a flag. This is necessary for unlocking the sysctl path.
There are a few places where a consistent value does not really
matter, they are unchanged. Use a proper ip_ prefix for the global
variable.
OK claudio@
|
|
|
|
|
|
The issue has been noticed by matthieu@ when he was chasing
cause of excessive pfsync traffic between firewall boxes.
When comparing content of state tables between primary
and backup firewall the backup firewall showed many
states as follows:
ESTABLISHED:SYN_SENT
FIN_WAIT_2:SYN_SENT
* :SYN_SENT
this is caused by pfsync_upd_tcp() which fails to update
TCP-state for destination connection peer, so it remains
stuck in SYN_SENT.
matthieu@ confirms diff helps with 'stuck-state'. It also
seems to help with excessive pfsync traffic.
ok dlg@
|
|
The simplest case. Nothing to change in sockets layer, only set
SB_MTXLOCK on socket buffers.
ok bluhm
|
|
pfkeyv2_sysctl() reads the SA type from uninitialized memory if it is
not provided by the caller of sysctl(2) because of a missing length
check.
From Carsten Beckmann.
ok bluhm
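A length check of the kind described above might look like the following sketch. The function name, its parameters, and the choice of EINVAL as the error are assumptions made for illustration; they are not taken from pfkeyv2_sysctl().

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy the SA type out of a sysctl name vector.
 * The element is read only after the length has been verified.
 */
int
sk_read_satype(const int *name, unsigned int namelen, int *satype)
{
	if (name == NULL || satype == NULL || namelen < 1)
		return EINVAL;	/* caller did not provide the SA type */

	*satype = name[0];
	return 0;
}
```

With the check in place, a caller that passes an empty name vector gets an error instead of a value read from memory it never supplied.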
|
|
|
|
ok mpi@
|
|
|
|
|
|
Fragment count and statistics are stored in struct pf_status. From
there pfctl(8) and systat(1) collect and show them. Note that pfctl
-s info needs the -v switch to show fragments. As fragment reassembly
has its own mutex, also grab this in pf ioctl(2) and sysctl(2) code.
input claudio@; OK henning@
|
|
Running raw IPv4 input with shared net lock in parallel is less
complex than UDP. Especially there is no socket splicing.
New ip_deliver() may run with shared or exclusive net lock. The
last parameter indicates the mode. If it is running with shared
netlock and encounters a protocol that needs exclusive lock, the
packet is queued. Old ip_ours() always queued the packet. Now it
calls ip_deliver() with shared net lock, and if that cannot handle
the packet completely, the packet is queued and later processed
with exclusive net lock.
In case of an IPv6 header chain that switches from shared to
exclusive processing, the next protocol and mbuf offset are stored
in a mbuf tag.
OK mvs@
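The shared/exclusive hand-off described above can be sketched roughly like this. All `sk_` names are hypothetical, the queue is reduced to a counter, and no real locking is performed; the sketch only shows the decision of when a packet is delivered directly and when it is deferred.

```c
enum sk_result { SK_DELIVERED, SK_QUEUED };

/* Hypothetical protocol descriptor. */
struct sk_proto {
	int needs_exclusive;	/* 1 if input requires the exclusive lock */
};

/*
 * Deliver a packet. 'shared' tells in which lock mode the caller runs.
 * In shared mode, protocols that need the exclusive lock are not
 * processed here; the packet is queued for a later exclusive pass.
 */
enum sk_result
sk_deliver(const struct sk_proto *p, int shared, int *queue_len)
{
	if (shared && p->needs_exclusive) {
		(*queue_len)++;		/* defer to the exclusive pass */
		return SK_QUEUED;
	}
	/* Protocol input would run here in the current lock mode. */
	return SK_DELIVERED;
}
```

A caller running with the shared lock tries `sk_deliver(p, 1, &qlen)` first; only packets reported as `SK_QUEUED` need the second pass with `shared` set to 0.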
|
|
no functional change, found by smatch warnings
ok miod@ bluhm@
|
|
With two separate TCP hash tables, each one becomes smaller. When
we remove the exclusive net lock from TCP, contention on internet
PCB table mutex will be reduced. UDP has been split earlier into
IPv4 and IPv6. Replace branch conditions based on INP_IPV6 with
assertions.
OK mvs@
|
|
IFF_LOOPBACK is telling userland the behaviour of a specific driver,
it is supposed to be static and permanent. Clearing the loopback
flag on lo0 could lead to a kernel crash due to inconsistent multicast
igmp group.
Reported-by: syzbot+2f24ed6c8ddb2d6bb22c@syzkaller.appspotmail.com
OK claudio@ deraadt@
|
|
include the file themselves.
OK bluhm@ mpi@
|
|
Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions.
A previous version of this diff was backed out. There was an
additional rtisvalid() in rtalloc_mpath() that prevented packet
output via interfaces that were not up. Now the route in the cache
has to be valid, but after new lookup, rtalloc_mpath() may return
invalid routes. This generates fewer errors in userland and preserves
existing behavior.
OK sashan@
|
|
OK mvs@
|
|
this helps narrow down where some "output failures" on sec interfaces
occur.
based on discussion with jason tubnor
|
|
the most interesting information exposed here is the number of times
a port changes state according to the lacp state machine. if a port
is flapping, it's hard to see if you only look at the current state.
getting a count of changes over time makes problems a lot more
visible and therefore fixable.
this also exposes counters for the lacp protocol packets.
all of these can be useful when trying to line up behaviors with
another system (eg, a switch).
ok jmatthew@
|
|
a port here is a physical interface used by an aggr.
this leaves the low bits for a physical interface to use to pick a
tx ring. without this, aggr used low bits for port selection, which
takes bits away from the ring selection, which can lead to uneven
distribution of packets over tx rings.
i've been running this in production for well over a year now.
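a rough sketch of the bit split, assuming a 16 bit flow id where the top 8 bits pick the port and the bottom 8 bits are left for ring selection. the function names and the 8/8 split are assumptions for illustration, not what aggr(4) necessarily does.

```c
#include <stdint.h>

/* aggr side: pick a port from the high bits of the flow id. */
unsigned int
sk_pick_port(uint16_t flowid, unsigned int nports)
{
	if (nports == 0)
		return 0;
	return (unsigned int)(flowid >> 8) % nports;
}

/* physical interface side: pick a tx ring from the low bits. */
unsigned int
sk_pick_ring(uint16_t flowid, unsigned int nrings)
{
	if (nrings == 0)
		return 0;
	return (unsigned int)(flowid & 0xff) % nrings;
}
```

with this split, flows that land on the same port can still differ in their low bits, so they spread over all tx rings of that port. if both choices used the low bits, every flow sent to a given port would share the same low-bit pattern and hit only a subset of its rings.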
|
|
corresponding mutex(9)es.
ifq_start() and following wg_qstart() could be called from software
interrupt context if bandwidth control is enabled in pf.conf(5). Remove
sleep points provided by rwlock(9)s from wg(4) output start routine.
looks ok claudio
|
|
|
|
It breaks NFS.
ok claudio@
|
|
OK claudio@
|
|
Fill and check the cache and call rtalloc_mpath() together. Then
the caller of route_mpath() does not have to care about the uint32_t
*src pointer and just pass struct in_addr. All the conversions are
done inside the functions. ro->ro_rt is either valid or NULL. Note
that some places have a stricter rtisvalid() now compared to the
previous NULL check.
OK claudio@
|
|
Pass source address to route_cache() and store it in struct route.
Cached multipath routes are only valid if source address matches.
If sysctl multipath changes, increase route generation number.
OK claudio@
|
|
For LRO with ix(4) it is necessary to detect ethernet padding.
Extract ip_len and ip6_plen from the mbuf and provide it to the
drivers.
Add extended sanity checks, like IP packet is shorter than TCP
header. This prevents offloading to network hardware with bogus
packets.
Also iphlen of extracted headers contains header length for IPv4
and IPv6, to make code in drivers simpler.
OK mglocker@
|
|
Several drivers need IPv4 header length and TCP offset for checksum
offload, TSO and LRO. Accessing these fields directly caused crashes
on sparc64 due to misaligned access. It cannot be guaranteed that
IP and TCP headers are 4 byte aligned at driver level. Also gcc 4.2.1
assumes that bit fields can be accessed with 32 bit load instructions.
Use memcpy() in ether_extract_headers() to get the bits from IPv4
and TCP header and store the header length in struct ether_extracted.
From there network drivers can easily use it without caring about
alignment and bit shift. Do some sanity checks with the length
values to prevent invalid values from evil packets from being stored
into hardware registers. If a check fails, clear the pointer to the
header to hide it from the driver. Add debug prints that help to
figure out the reason for bad packets and provide information when
debugging drivers.
OK mglocker@
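The alignment-safe extraction described above can be sketched as follows. This is not ether_extract_headers(); the struct and function names are hypothetical, only two IPv4 fields are handled, and zeroing the result stands in for clearing the header pointer when a check fails.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result struct holding already decoded lengths. */
struct sk_extracted {
	unsigned int iphlen;	/* IPv4 header length in bytes */
	unsigned int iplen;	/* total length from the IPv4 header */
};

/*
 * Decode header length and total length from a possibly misaligned
 * IPv4 header. Fields are copied out with memcpy() instead of being
 * accessed through a struct pointer or bit field.
 */
int
sk_extract_ipv4(const unsigned char *buf, size_t len, struct sk_extracted *ext)
{
	uint8_t vhl;
	uint16_t tot;

	ext->iphlen = ext->iplen = 0;
	if (len < 20)
		return -1;

	memcpy(&vhl, buf, sizeof(vhl));		/* version + header length */
	memcpy(&tot, buf + 2, sizeof(tot));	/* total length, network order */

	ext->iphlen = (unsigned int)(vhl & 0x0f) << 2;
	ext->iplen = ntohs(tot);

	/* Sanity checks: reject values that must not reach hardware. */
	if ((vhl >> 4) != 4 || ext->iphlen < 20 || ext->iphlen > len ||
	    ext->iplen < ext->iphlen) {
		ext->iphlen = ext->iplen = 0;
		return -1;
	}
	return 0;
}
```

A driver using such a helper would read `ext->iphlen` directly and never touch the raw header, so neither alignment nor bit shifting is its concern.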
|
|
Use a common struct route for both inet and inet6. Unfortunately
struct sockaddr is shorter than sockaddr_in6, so netinet/in.h has
to be exposed from net/route.h. Struct route has to be bsd visible
for userland as netstat kvm code inspects inp_route. Internet PCB
and TCP SYN cache can use a plain struct route now. All specific
sockaddr types for inet and inet6 are embedded there.
OK claudio@
|
|
The route_cache() function can easily return whether it was a cache
hit or miss. Then the logic to perform a route lookup gets a bit
simpler. Some more complicated if (ro->ro_rt == NULL) checks still
exist elsewhere.
Also use route cache in in_pcbselsrc() instead of filling struct
route manually.
OK claudio@
|
|
|
|
Implement route6_cache() to check whether the cached route is still
valid and otherwise fill caching parameter of struct route_in6.
Also count cache hits and misses in netstat. in_pcbrtentry() uses
route cache now.
OK claudio@
|
|
original bug report from syzkaller
Reported-by: syzbot+d19060a65721eb432a72@syzkaller.appspotmail.com
broken fix found by Hrvoje Popovski
hint to the problem and OK deraadt@
|
|
To optimize route caching, count cache hits and misses. This is
shown in netstat -s for both inet and inet6. Reuse the old IPv6
forward cache counter. Sort ip6s_wrongif consistently. For now
only IPv4 cache counter has been implemented.
OK mvs@
|
|
[1] that if_downall() tries to send route messages and triggers panic
again but in knote(9) layer.
1. https://syzkaller.appspot.com/bug?extid=d19060a65721eb432a72
ok bluhm
|
|
This prevents gcc3's 'parameter has incomplete type' warning that
causes kernel build failure.
Suggested by claudio@, ok bluhm@
|
|
The outgoing route is cached at the inpcb. This cache was only
invalidated when the socket closes or if the route becomes invalid.
More specific routes were not detected. Especially with dynamic
routing protocols, sockets must be closed and reopened to use the
correct route. Running ping during a route change shows the problem.
To solve this, add a route generation number that is updated whenever
the routing table changes. The lookup in struct route is put into
the route_cache() function. If the generation number is too old,
the cached route gets discarded.
Implement route_cache() for ip_output() and ip_forward() first.
IPv6 and more places will follow.
OK claudio@
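The generation number check can be sketched as below. This is a simplified model, not the kernel's route_cache(): the struct layout, the `sk_` names, the integer destination, and the 0/1 return convention are all assumptions made for illustration.

```c
/* Hypothetical cached route, reduced to the fields the check needs. */
struct sk_route {
	int		rt_valid;	/* 1 if a looked-up route is cached */
	unsigned long	gen;		/* generation when it was cached */
	unsigned int	dst;		/* destination the cache is for */
};

/* Incremented whenever the routing table changes. */
unsigned long sk_route_generation;

/*
 * Return 0 on cache hit. On a miss, discard the stale entry, prepare
 * the cache for the new destination, and return 1 so that the caller
 * performs a fresh lookup.
 */
int
sk_route_cache(struct sk_route *ro, unsigned int dst)
{
	if (ro->rt_valid && ro->gen == sk_route_generation && ro->dst == dst)
		return 0;

	ro->rt_valid = 0;		/* drop the outdated route */
	ro->gen = sk_route_generation;
	ro->dst = dst;
	return 1;
}
```

After any routing table change the counter differs from the stored value, so the next call misses and the socket picks up a more specific route without being closed and reopened.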
|
|
Thus, dhcpleased accepts non-calculated checksums which were verified by
hardware/hypervisor.
With tweaks from dlg@
ok bluhm@
mkay tobhe@
|
|
sec(4) was already looking for this mbuf tag so it could drop packets
that had already been sent out on the same interface, but i forgot
the code that adds the tag.
this was reported by jason tubnor who experienced spins/lockups
when using sec and a physical interface was disconnected. rather
than being a locking problem like we initially assumed, it turned
out that unplugging a physical interface caused a route for ipsec
encapsulated traffic to go out over sec(4), causing the packet to
loop in the stack.
the fix was also tested and verified by jason. sorry for taking so
long to look at it.
|
|
`pipex_session_list' foreach walkthrough with `pipex_list_mtx' mutex(9)
relocking. It inserts special item after acquired `session' and keeps it
linked until `session' release. Only owner can unlink its own item, so
the LIST_NEXT(session) is always valid even if the `session' was unlinked.
The iterator skips special items at the `session' acquisition time, as
all other foreach loops where `pipex_list_mtx' mutex(9) is not relocked.
ok yasuoka
|
|
ok yasuoka
|
|
ok bluhm
|