path: root/sys/netinet
Age | Commit message | Author
2020-09-01 | Convert *_sysctl in ipsec_input.c to sysctl_bounded_arr | gnezdo
The best-guessed limits will be tested by trial.
2020-09-01 | Convert icmp6_sysctl to sysctl_bounded_args | gnezdo
The best-guessed limits will be tested by trial.
2020-08-24 | Convert divert*_sysctl to sysctl_bounded_args | gnezdo
OK sashan
2020-08-22 | Convert icmp_sysctl to sysctl_bounded_args | gnezdo
... these all look fine, deraadt@
2020-08-22 | Convert ip_sysctl to sysctl_bounded_args | gnezdo
2020-08-22 | Convert udp_sysctl to sysctl_bounded_args | gnezdo
2020-08-18 | Style fixups from hurried commits | gnezdo
Thanks kettenis@ for pointing out. ok kettenis@
2020-08-18 | Convert tcp_sysctl to sysctl_bounded_args | gnezdo
This introduces bounds checks for many net.inet.tcp sysctl variables. Folded some fitting cases into the framework: tcp_do_sack, tcp_do_ecn. ok deraadt@
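A minimal sketch of the table-driven bounds checking idea described in this commit. It is not the kernel's sysctl_bounded_args interface: the struct, the function name and the error codes are hypothetical and chosen only for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table entry: one integer variable and its allowed range. */
struct bounded_var {
	int  id;       /* identifier of the variable (illustrative) */
	int *var;      /* storage being controlled */
	int  minimum;  /* smallest accepted value */
	int  maximum;  /* largest accepted value */
};

/*
 * Look up id in the table and store newval only if it is in range.
 * Returns 0 on success, EINVAL for an out-of-range value and ENOENT
 * for an unknown id (error codes are an assumption of this sketch).
 */
int
bounded_set(const struct bounded_var *tab, size_t n, int id, int newval)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].id != id)
			continue;
		if (newval < tab[i].minimum || newval > tab[i].maximum)
			return EINVAL;
		*tab[i].var = newval;
		return 0;
	}
	return ENOENT;
}
```

Boolean-like switches such as tcp_do_sack fit this shape with a range of 0 to 1, which is presumably what "folded into the framework" refers to.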
2020-08-17 | Simplify igmp_sysctl to directly return error in default case | gnezdo
This replaces a piece of observationally identical code which was much more complicated. ok mpi@
2020-08-08 | No longer prevent TCP connections to IPv6 anycast addresses. | Florian Obser
RFC 4291 dropped this requirement from RFC 3513: o An anycast address must not be used as the source address of an IPv6 packet. And from that requirement draft-itojun-ipv6-tcp-to-anycast rightly concluded that TCP connections must be prevented. The draft also states: The proposed method MUST be removed when one of the following events happens in the future: o Restriction imposed on IPv6 anycast address is loosened, so that anycast address can be placed into source address field of the IPv6 header[...] OK jca
2020-08-05 | Don't compare pointers against zero. | Marcus Glocker
Reported by Peter J. Philipp. ok mvs@ deraadt@
2020-08-01 | Move range check inside sysctl_int_arr | gnezdo
Range violations are now consistently reported as EOPNOTSUPP. Previously they were mixed with ENOPROTOOPT. OK kn@
2020-07-28 | Don't treat it as an error if carppeer is a unicast address and the peer is down. | YASUOKA Masahiko
ok kn
2020-07-28 | After the previous commit, src/regress/sys/netinet/carp triggered | Alexander Bluhm
a uvm fault. Check that ifp0 is not NULL. OK sashan@ mvs@
2020-07-24 | netinet: tcp_close(): delay reaper timeout by one tick | cheloha
Zero-tick timeouts rely on implicit behavior in the timeout layer that inhibits optimizations in softclock(). bluhm@ says waiting a tick for the reaper shouldn't break anything. ok bluhm@
2020-07-24 | Use interface index instead of pointer to `ifnet' in carp(4). | mvs
ok sashan@
2020-07-22 | deprecate interface input handler lists, just use one input function. | David Gwynne
the interface input handler lists were originally set up to help us during the initial mpsafe network stack work. at the time not all the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc) were mpsafe, so we wanted a way to avoid them by default, and only take the kernel lock hit when they were specifically enabled on the interface. since then, they have been fixed up to be mpsafe. i could leave the list in place, but it has some semantic problems. because virtual interfaces filter packets based on the order they were attached to the parent interface, you can get packets taken away in surprising ways, especially when you reboot and netstart does something different to what you did by hand. by hardcoding the order that things like vlan and bridge get to look at packets, we can document the behaviour and get consistency. it also means we can get rid of a use of SRPs which were difficult to replace with SMRs. the interface input handler list is an SRPL, which we would like to deprecate. it turns out that you can sleep during stack processing, which you're not supposed to do with SRPs or SMRs, but SRPs are a lot more forgiving and it worked. lastly, it turns out that this code is faster than the input list handling, so lots of winning all around. special thanks to hrvoje popovski and aaron bieber for testing. this has been in snaps as part of a larger diff for over a week.
2020-07-22 | move carp_input into ether_input, instead of via an input handler. | David Gwynne
carp_input is only tried after vlan and bridge handling is done, and after the ethernet packet doesn't match the parent interface's mac address. this has been in snaps as part of a larger diff for over a week.
2020-07-22 | add code to coordinate how bridges attach to ethernet interfaces. | David Gwynne
this is the first step in refactoring how ethernet frames are demuxed by virtual interfaces, and also in deprecating interface input list handling. we now have drivers for three types of virtual bridges, bridge(4), switch(4), and tpmr(4), and it doesn't make sense for any of them to be enabled on the same "port" interfaces at the same time. currently you can add a port interface to multiple types of bridge, but which one gets to steal the packets depends on the order in which they were attached. this creates an ether_brport structure that holds an input function for the bridge, and optionally some per port state that the bridge can use. arpcom has a single pointer to one of these structs that will be used during normal ether_input processing to see if a packet should be passed to a bridge, and will be used instead of an if input handler. because it is a single pointer, it will make sure only one bridge of any type is attached to a port at any one time. this has been in snaps as part of a larger diff for over a week.
2020-06-24 | kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9) | cheloha
time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
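To illustrate the split-read problem and the lockless read loop mentioned above, here is a minimal sketch using C11 atomics. It is not the timehands code: the struct, field and function names are hypothetical, and the 64-bit value is deliberately stored as two 32-bit halves to mimic a 32-bit platform.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one timehands slot on a 32-bit machine. */
struct timeslot {
	atomic_uint      gen;  /* 0 while the writer is updating */
	_Atomic uint32_t lo;   /* low 32 bits of the 64-bit value */
	_Atomic uint32_t hi;   /* high 32 bits of the 64-bit value */
};

/* Writer: mark the slot as in flux, update both halves, then publish. */
void
slot_write(struct timeslot *ts, int64_t now, unsigned int newgen)
{
	atomic_store(&ts->gen, 0);
	atomic_store(&ts->lo, (uint32_t)now);
	atomic_store(&ts->hi, (uint32_t)((uint64_t)now >> 32));
	atomic_store(&ts->gen, newgen);	/* newgen must be nonzero */
}

/*
 * Reader: retry until the generation is nonzero and unchanged across
 * both loads, so the two halves are known to belong together.
 */
int64_t
slot_read(struct timeslot *ts)
{
	unsigned int gen;
	uint32_t lo, hi;

	do {
		gen = atomic_load(&ts->gen);
		lo = atomic_load(&ts->lo);
		hi = atomic_load(&ts->hi);
	} while (gen == 0 || gen != atomic_load(&ts->gen));

	return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

The retry loop is the extra cost the commit message refers to: a plain two-register read is cheaper but can combine halves from two different updates.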
2020-06-21 | wrap a long line. no functional change. | David Gwynne
2020-06-21 | if an inp_upcall is set, let it look at and maybe steal the udp packet. | David Gwynne
i wrote the original version of this, but it was tweaked by Matt Dunwoodie and Jason A. Donenfeld for use with wireguard.
2020-06-21 | knf: the inp_upcall line was too long. | David Gwynne
2020-06-21 | add an inp_upcall function pointer and inp_upcall_arg to struct in_pcb. | David Gwynne
this is so protocols (eg, udp) can let things (eg, kernel support for wireguard or vxlan or geneve) look at and possibly steal packets before they get added to a socket buffer. i wrote the original version of this, but it was tweaked by Matt Dunwoodie and Jason A. Donenfeld for use with wireguard.
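The upcall pattern described here can be sketched as follows. This is a simplified model, not the in_pcb code: `struct packet`, `struct pcb`, `pcb_deliver` and the example hook are hypothetical names, and the real callback signature is not shown in this log.

```c
#include <stddef.h>

struct packet { int port; };	/* hypothetical stand-in for an mbuf */

/* Returns the packet to continue normal delivery, or NULL if consumed. */
typedef struct packet *(*upcall_fn)(void *arg, struct packet *pkt);

struct pcb {
	upcall_fn  upcall;	/* optional hook, NULL when unused */
	void      *upcall_arg;	/* opaque argument passed to the hook */
	int        queued;	/* packets appended to the socket buffer */
};

/* Give the hook first look at the packet before queueing it. */
void
pcb_deliver(struct pcb *pcb, struct packet *pkt)
{
	if (pcb->upcall != NULL) {
		pkt = pcb->upcall(pcb->upcall_arg, pkt);
		if (pkt == NULL)
			return;	/* stolen by the hook */
	}
	pcb->queued++;
}

/* Example hook: consume packets for one port and count them in *arg. */
struct packet *
steal_port_51820(void *arg, struct packet *pkt)
{
	if (pkt->port != 51820)
		return pkt;
	(*(int *)arg)++;
	return NULL;
}
```

Keeping the hook as a pointer plus an opaque argument lets a tunnel driver attach its own state without the protocol code knowing its type.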
2020-06-19 | Break a glass ceiling on cwnd due to integer division during congestion | Richard Procter
avoidance. The problem and fix is noted in RFC5681 section 3.1, page 7. Report, diff and testing from Brian Brombacher, thanks! Testing and a cosmetic tweak by myself. ok claudio
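For reference, RFC 5681 section 3.1 increases cwnd by SMSS*SMSS/cwnd per ACK during congestion avoidance and notes that with integer arithmetic this yields 0 once cwnd exceeds SMSS*SMSS, in which case the result should be rounded up to 1 byte. A minimal sketch of that rule follows; the function name is hypothetical and this is not the committed diff.

```c
#include <stdint.h>

/*
 * Per-ACK congestion avoidance increment, in bytes.
 * cwnd must be nonzero. When the integer division truncates to 0,
 * round up to 1 byte so the window can still grow.
 */
uint32_t
cwnd_ca_increment(uint32_t cwnd, uint32_t smss)
{
	uint64_t incr = (uint64_t)smss * smss / cwnd;

	if (incr == 0)
		incr = 1;
	return (uint32_t)incr;
}
```

With an SMSS of 1460 bytes the unrounded increment drops to 0 once cwnd passes 2131600 bytes, which is the "glass ceiling" in the subject line.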
2020-06-18 | Refuse to set 0 or a negative value for net.inet.tcp.synbucketlimit. | Martin Pieuchot
Prevent a panic in syn_cache_insert() found by syzbot. Reported-by: syzbot+aee24ad9b7bf5665912d@syzkaller.appspotmail.com ok sashan@, anton@, millert@
2020-05-27 | Connectionless sockets like UDP can be re-connected to a different | Alexander Bluhm
address. In that case, the linking to the pf state must be dissolved as the latter still contains the old address. If it is a divert state, also remove the state as any divert state must be associated with a matching socket. Call pf_remove_divert_state() and pf_inp_unlink() from in_pcbconnect(). reported by Tim Kuijsten; OK sashan@ claudio@
2020-05-27 | Document the various flavors of NET_LOCK() and rename the reader version. | Martin Pieuchot
Since our last concurrency mistake only ioctl(2) and sysctl(2) code paths take the reader lock. This is mostly for documentation purpose as long as the softnet thread is converted back to use a read lock. dlg@ said that comments should be good enough. ok sashan@
2020-05-21 | don't count packets in the carp protocol handling against an interface. | David Gwynne
these packets have generally already been counted on the interface because that's where they were sent or received from. the protocol handling side of things already counts things like packets, which you see with netstat -sp carp.
2020-05-21 | implement a carp_transmit that bypasses the ifq on output. | David Gwynne
this is modelled on vlan_transmit, and basically enqueues the packet directly on the parent interface. even though carp is generally not used to transmit packets, we run dhcp relays on it at work and hit a situation where we unnecessarily dropped packets because its ifq maxlen was 1. i've been running this for a month in production. ok jmatthew@
2020-04-29 | remove some trailing whitespace. no functional change. | David Gwynne
2020-04-23 | Add support for automatically moving traffic between rdomains on ipsec(4) | tobhe
encryption or decryption. This allows us to keep plaintext and encrypted network traffic separated and reduces the attack surface for network sidechannel attacks. The only way to reach the inner rdomain from outside is by successful decryption and integrity verification through the responsible Security Association (SA). The only way for internal traffic to get out is getting encrypted and moved through the outgoing SA. Multiple plaintext rdomains can share the same encrypted rdomain while the unencrypted packets are still kept separate. The encrypted and unencrypted rdomains can have different default routes. The rdomains can be configured with the new SADB_X_EXT_RDOMAIN pfkey extension. Each SA (tdb) gets a new attribute 'tdb_rdomain_post'. If this differs from 'tdb_rdomain' then the packet is moved to 'tdb_rdomain_post' after IPsec processing. Flows and outgoing IPsec SAs are installed in the plaintext rdomain, incoming IPsec SAs are installed in the encrypted rdomain. IPCOMP SAs are always installed in the plaintext rdomain. They can be viewed with 'route -T X exec ipsecctl -sa' where X is the rdomain ID. As the kernel does not create encX devices automatically when creating rdomains they have to be added by hand with ifconfig for IPsec to work in non-default rdomains. discussed with chris@ and kn@ ok markus@, patrick@
2020-04-12 | Stop processing packets under non-exclusive (read) netlock. | Martin Pieuchot
Prevent concurrency in the socket layer which is not ready for that. Two recent data corruptions in pfsync(4) and the socket layer pointed out that, at least, tun(4) was incorrectly using NET_RUNLOCK(). Until we find a way in software to avoid future mistakes and to make sure that only the softnet thread and some ioctls are safe to use a read version of the lock, put everything back to the exclusive version. ok stsp@, visa@
2020-03-15 | Guard SIOCDELMULTI if_ioctl calls with KERNEL_LOCK() where the call is | Visa Hankala
made from socket close path. Most device drivers are not MP-safe yet, and the closing of AF_INET and AF_INET6 sockets is no longer under the kernel lock. This fixes a panic seen by jcs@. OK mpi@
2020-03-06 | Fix uninitialized use of variable 'len'. | tobhe
ok bluhm@
2020-01-26 | add define for IPTOS_DSCP_LE; "low effort" DSCP codepoint standardised | Damien Miller
in RFC8622; ok job@
2019-12-23 | rdr-to with loopback destination should work even though | Alexandr Nedvedicky
IP forwarding is disabled. Issue reported by Daniel Jakots (danj@) OK bluhm@
2019-12-10 | Make bundled IPcomp/ESP policies work with IPSEC_LEVEL_REQUIRE. | tobhe
We only install flows for IPcomp. When processing an incoming ESP SA, look for a bundled IPcomp SA and use that in the policy check. ok bluhm@
2019-12-09 | always pull in if_types.h, to unbreak ramdisks | Theo de Raadt
2019-12-08 | Make sure packet destination address matches interface address, | Alexandr Nedvedicky
where such packet is bound to. This check is enforced if and only if IP forwarding is disabled. Change discussed with bluhm@, claudio@, deraadt@, markus@, tobhe@ OK bluhm@, claudio@, tobhe@
2019-12-06 | Checking the IPsec policy is expensive. Check only when IPsec is used. | tobhe
ok bluhm@
2019-12-01 | Don't require a valid sa_len for a bunch of IPv4 "get" ioctls | Jeremie Courreges-Anglas
Same fix as for the IPv6 case. Fixes a regression in ports/net/openvpn spotted by landry@, ok bluhm@
2019-11-29 | Change the default security level for incoming IPsec flows from | tobhe
isakmpd and iked to REQUIRE. Filter policy violations earlier. ok sashan@ bluhm@
2019-11-28 | Although ifconfig(8) checks it already, enforce contiguous inet | Alexander Bluhm
netmask in the kernel. OK visa@
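One common way to test that an IPv4 netmask is contiguous (all one bits followed by all zero bits) is shown below. This is an illustrative sketch, not the check that was committed; the function name is hypothetical and the mask is assumed to be in host byte order already.

```c
#include <stdint.h>

/*
 * Return 1 if mask (host byte order) is a run of ones followed by a
 * run of zeros, else 0. The inverted mask of a contiguous netmask has
 * the form 0...01...1, so adding 1 gives a power of two (or wraps to
 * 0) that shares no bits with it.
 */
int
netmask_is_contiguous(uint32_t mask)
{
	uint32_t inv = ~mask;

	return (inv & (uint32_t)(inv + 1)) == 0;
}
```

A mask taken from a sockaddr_in is in network byte order and would need converting with ntohl() before being passed to a helper like this.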
2019-11-13 | Add DoT 853 to DEFBADDYNAMICPORTS_TCP. This port will be increasingly | Theo de Raadt
unfiltered in the future, so this prevents rresvport_af(3) from randomly exposing a service intended for local visibility only. ok florian
2019-11-11 | Prevent underflows in tp->snd_wnd if the remote side ACKs more than | Alexander Bluhm
tp->snd_wnd. This can happen, for example, when the remote side responds to a window probe by ACKing the one byte it contains. from FreeBSD; via markus@; OK sashan@ tobhe@
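The underflow arises because the send window is unsigned: subtracting more acknowledged bytes than the window holds wraps around to a huge value. A minimal sketch of a guarded subtraction follows; the helper name is hypothetical and the actual fix may be structured differently.

```c
#include <stdint.h>

/*
 * Shrink an unsigned send window by the number of bytes ACKed,
 * clamping at 0 instead of wrapping when acked exceeds the window.
 */
uint32_t
snd_wnd_after_ack(uint32_t snd_wnd, uint32_t acked)
{
	if (acked > snd_wnd)
		return 0;
	return snd_wnd - acked;
}
```

In the window probe case from the commit message, snd_wnd is 0 and one byte is ACKed; an unguarded subtraction would produce 0xffffffff.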
2019-11-08 | avoid being too clever about setting/clearing ifpromisc on the parent. | David Gwynne
ifpromisc() already refcounts, so carp doesn't have to do it implicitly with the carpdev list. there's no functional change, the code just gets a bit simpler.
2019-11-08 | convert interface address change hooks to tasks and a task_list. | David Gwynne
this follows what's been done for detach and link state hooks, and makes handling of hooks generally more robust. address hooks are a bit different to detach/link state hooks in that there's only a few things that register hooks (carp, pf, vxlan), but a lot of places to run the hooks (lots of ipv4 and ipv6 address configuration). an address hook cookie was in struct pfi_kif, which is part of the pf abi. rather than break pfctl -sI, this maintains the void * used for the cookie and uses it to store a task, which is then used as intended with the new api.
2019-11-07 | Do proper kernel input validation for in_control() ioctl(2) | Alexander Bluhm
SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFDSTADDR, SIOCGIFBRDADDR, SIOCSIFADDR, SIOCSIFNETMASK, SIOCSIFDSTADDR, and SIOCSIFBRDADDR. Name in_ioctl_set_ifaddr() consistently. Use in_sa2sin() to validate inet address. Combine if_addrlist loops and add comment. Although netmask is not an inet address, length must be valid. Reported-by: syzbot+5fc6da002fc4e8d994be@syzkaller.appspotmail.com OK visa@
2019-11-07 | Avoid NULL dereference in arpinvalidate() and nd6_invalidate() by | Kenneth R Westerback
making RTM_INVALIDATE code path perform same check as RTM_DELETE does. ok mpi@