Age | Commit message (Collapse) | Author |
|
The best-guessed limits will be tested by trial.
|
|
The best-guessed limits will be tested by trial.
|
|
OK sashan
|
|
... these all look fine, derradt@
|
|
|
|
|
|
Thanks kettenis@ for pointing out.
ok kettenis@
|
|
This introduces bounds checks for many net.inet.tcp sysctl variables.
Folded some fitting cases into the framework: tcp_do_sack, tcp_do_ecn.
ok derradt@
|
|
This replaces a piece of observationally identical code which was much
more complicated.
ok mpi@
|
|
RFC 4291 dropped this requirement from RFC 3513:
o An anycast address must not be used as the source address of an
IPv6 packet.
And from that requirement draft-itojun-ipv6-tcp-to-anycast rightly
concluded that TCP connections must be prevented.
The draft also states:
The proposed method MUST be removed when one of the following events
happens in the future:
o Restriction imposed on IPv6 anycast address is loosened, so that
anycast address can be placed into source address field of the IPv6
header[...]
OK jca
|
|
Reported by Peter J. Philipp.
ok mvs@ deraadt@
|
|
Range violations are now consistently reported as EOPNOTSUPP.
Previously they were mixed with ENOPROTOOPT.
OK kn@
|
|
ok kn
|
|
an uvm fault. Check that ifp0 is not NULL.
OK sashan@ mvs@
|
|
Zero-tick timeouts rely on implicit behavior in the timeout layer that
inhibits optimizations in softclock().
bluhm@ says waiting a tick for the reaper shouldn't break anything.
ok bluhm@
|
|
ok sashan@
|
|
the interface input handler lists were originally set up to help
us during the intial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.
i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.
it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.
lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.
special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.
|
|
carp_input is only tried after vlan and bridge handling is done,
and after the ethernet packet doesnt match the parent interfaces
mac address.
this has been in snaps as part of a larger diff for over a week.
|
|
this is the first step in refactoring how ethernet frames are demuxed
by virtual interfaces, and also in deprecating interface input list
handling.
we now have drivers for three types of virtual bridges, bridge(4),
switch(4), and tpmr(4), and it doesn't make sense for any of them
to be enabled on the same "port" interfaces at the same time.
currently you can add a port interface to multiple types of bridge,
but which one gets to steal the packets depends on the order in
which they were attached.
this creates an ether_brport structure that holds an input function
for the bridge, and optionally some per port state that the bridge
can use. arpcom has a single pointer to one of these structs that
will be used during normal ether_input processing to see if a packet
should be passed to a bridge, and will be used instead of an if
input handler. because it is a single pointer, it will make sure
only one bridge of any type is attached to a port at any one time.
this has been in snaps as part of a larger diff for over a week.
|
|
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
|
|
|
|
i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.
|
|
|
|
this is so protocols (eg, udp) can let things (eg, kernel support
for wireguard or vxlan or geneve) look at and possibly steal packets
before they get added to a socket buffer.
i wrote the original version of this, but it was tweaked by Matt
Dunwoodie and Jason A. Donenfeld for use with wireguard.
|
|
avoidance. The problem and fix is noted in RFC5681 section 3.1, page 7.
Report, diff and testing from Brian Brombacher, thanks!
Testing and a cosmetic tweak by myself.
ok claudio
|
|
Prevent a panic in syn_cache_insert() found by syzbot.
Reported-by: syzbot+aee24ad9b7bf5665912d@syzkaller.appspotmail.com
ok sashan@, anton@, millert@
|
|
address. In that case, the linking to the pf state must be dissolved
as the latter still contains the old address. If it is a divert
state, also remove the state as any divert state must be associated
with a matching socket. Call pf_remove_divert_state() and
pf_inp_unlink() from in_pcbconnect().
reported by Tim Kuijsten; OK sashan@ claudio@
|
|
Since our last concurrency mistake only ioctl(2) ans sysctl(2) code path
take the reader lock. This is mostly for documentation purpose as long as
the softnet thread is converted back to use a read lock.
dlg@ said that comments should be good enough.
ok sashan@
|
|
these packets have generally already been counted on the interface
because that's where they were sent or received from. the protocol
handling side of things already counts things like packets, which
you see with netstat -sp carp.
|
|
this is modelled on vlan_transmit, and basically enqueues the packet
directly on the parent interface.
even though carp is generally not used to transmit packets, we run
dhcp relays on it at work and hit a situation where we unecessarily
dropped packets because it's ifq maxlen was 1. i've been running
this for a month in production.
ok jmatthew@
|
|
|
|
encryption or decryption. This allows us to keep plaintext and encrypted
network traffic seperated and reduces the attack surface for network
sidechannel attacks.
The only way to reach the inner rdomain from outside is by successful
decryption and integrity verification through the responsible Security
Association (SA).
The only way for internal traffic to get out is getting encrypted and
moved through the outgoing SA.
Multiple plaintext rdomains can share the same encrypted rdomain while
the unencrypted packets are still kept seperate.
The encrypted and unencrypted rdomains can have different default routes.
The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey
extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'.
If this differs from 'tdb_rdomain' then the packet is moved to
'tdb_rdomain_post' afer IPsec processing.
Flows and outgoing IPsec SAs are installed in the plaintext rdomain,
incoming IPsec SAs are installed in the encrypted rdomain.
IPCOMP SAs are always installed in the plaintext rdomain.
They can be viewed with 'route -T X exec ipsecctl -sa' where X is the
rdomain ID.
As the kernel does not create encX devices automatically when creating
rdomains they have to be added by hand with ifconfig for IPsec to work
in non-default rdomains.
discussed with chris@ and kn@
ok markus@, patrick@
|
|
Prevent concurrency in the socket layer which is not ready for that.
Two recent data corruptions in pfsync(4) and the socket layer pointed
out that, at least, tun(4) was incorrectly using NET_RUNLOCK(). Until
we find a way in software to avoid future mistakes and to make sure that
only the softnet thread and some ioctls are safe to use a read version
of the lock, put everything back to the exclusive version.
ok stsp@, visa@
|
|
made from socket close path. Most device drivers are not MP-safe yet,
and the closing of AF_INET and AF_INET6 sockets is no longer under the
kernel lock.
This fixes a panic seen by jcs@.
OK mpi@
|
|
ok bluhm@
|
|
in RFC8622; ok job@
|
|
IP forwarding is disabled. Issue reported by Daniel Jakots (danj@)
OK bluhm@
|
|
We only install flows for IPcomp. When processing an incoming ESP SA,
look for a bundled IPcomp SA and use that in the policy check.
ok bluhm@
|
|
|
|
where such packet is bound to. This check is enforced if and only
IP forwarding is disabled.
Change discussed with bluhm@, claudio@, deraadt@, markus@, tobhe@
OK bluhm@, claudio@, tobhe@
|
|
ok bluhm@
|
|
Same fix as for the IPv6 case. Fixes a regression in ports/net/openvpn
spotted by landry@, ok bluhm@
|
|
isakmpd and iked to REQUIRE. Filter policy violations earlier.
ok sashan@ bluhm@
|
|
netmask in the kernel.
OK visa@
|
|
unfiltered in the future, so this prevents rresvport_af(3) from randomly
exposing a service intended for local visibility only.
ok florian
|
|
tp->snd_wnd. This can happen, for example, when the remote side
responds to a window probe by ACKing the one byte it contains.
from FreeBSD; via markus@; OK sashan@ tobhe@
|
|
ifpromisc() already refcounts, so carp doesn't have to do it
implicitly with the carpdev list. there's no functional change, the
code just gets a bit simpler.
|
|
this follows what's been done for detach and link state hooks, and
makes handling of hooks generally more robust.
address hooks are a bit different to detach/link state hooks in
that there's only a few things that register hooks (carp, pf, vxlan),
but a lot of places to run the hooks (lots of ipv4 and ipv6 address
configuration).
an address hook cookie was in struct pfi_kif, which is part of the
pf abi. rather than break pfctl -sI, this maintains the void * used
for the cookie and uses it to store a task, which is then used as
intended with the new api.
|
|
SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFDSTADDR, SIOCGIFBRDADDR,
SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFDSTADDR, and SIOCSIFBRDADDR.
Name in_ioctl_set_ifaddr() consistently. Use in_sa2sin() to validate
inet address. Combine if_addrlist loops and add comment. Although
netmask is not a inet address, length must be valid.
Reported-by: syzbot+5fc6da002fc4e8d994be@syzkaller.appspotmail.com
OK visa@
|
|
making RTM_INVALIDATE code path perform same check as RTM_DELETE does.
ok mpi@
|