summaryrefslogtreecommitdiff
path: root/sys/netinet
AgeCommit message (Collapse)Author
2023-04-08Do not reload `inp' in gre_send(). The pointer to PCB of raw socket isVitaliy Makkoveev
immutable, we don't need to reload it again. ok bluhm@
2023-04-07Remove kernel locks from the ARP input path. Caller if_netisr()Alexander Bluhm
grabs the exclusive netlock and that is sufficent for in_arpinput() and arpcache(). with kn@; OK mvs@; tested by Hrvoje Popovski
2023-04-05ARP has a sysctl to show the number of packets waiting for an arpAlexander Bluhm
response. Implement analog sysctl net.inet6.icmp6.nd6_queued for ND6 to reduce places where mbufs can hide within the kernel. Atomic operations operate on unsigned int. Make the type of total hold queue length consistent. Use atomic load to read the value for the sysctl. This clarifies why no lock around sysctl_rdint() is needed. OK mvs@ kn@
2023-04-05ARP has a queue of packets that should be sent after name resolution.Alexander Bluhm
ND6 did only hold a single packet. Unify the logic and add a mbuf hold queue to struct llinfo_nd6. This is MP safe and queue limits are tracked with atomic operations. New function if_mqoutput() has common code for ARP and ND6. ln_saddr6 holds the source address of the requesting packet. That is easier than fiddling with mbuf queue in nd6_ns_output(). OK kn@
2023-04-04When sending IP packets to userland with divert-packet rules, theAlexander Bluhm
checksum may be wrong. Locally generated packets diverted by pf out rules may have no checksum due to to hardware offloading. Calculate the checksum in that case. OK mvs@ sashan@
2023-03-14To avoid misunderstanding, keep variables for tcp keepalive inYASUOKA Masahiko
milliseconds, which is the same unit of tcp_now(). However, keep the unit of sysctl variables in seconds and convert their unit in tcp_sysctl(). Additionally revert TCPTV_SRTTDFLT back to 3 seconds, which was mistakenly changed to 1.5 seconds by tcp_timer.h 1.19. ok claudio
2023-03-08An invalid source routing IP option could overwrite kernel memoryAlexander Bluhm
by using a bad option length. This bug is only reachable if both pf IP option check is disabled and IP source routing is enabled. reported by @fuzzingrf Erg Noor OK claudio@ deraadt@
2023-03-08Delete obsolete /* ARGSUSED */ lint comments.Philip Guenther
ok miod@ millert@
2023-03-04properly initialise LIST headKlemens Nanni
This worked because the global head variable is zero-initialised, but one must not rely on that. OK mvs claudio
2023-02-07Remove needless #ifdef INET6 from struct ether_extracted field inAlexander Bluhm
public header file. Makes debugging with special kernels easier.
2023-02-06consolidate mbuf header parsing on device driver layerJan Klemkow
with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@ "go for it" deraadt@ ok naddy@, mvs@
2023-01-31Remove the last ones route lock references from comments.Vitaliy Makkoveev
No functional change.
2023-01-31Route lock was reverted, adjust forgotten commentary.Vitaliy Makkoveev
No functional changes.
2023-01-28Revert the `rt_lock' rwlock(9) diff to fix the recursiveVitaliy Makkoveev
rwlock(9) acquisition. Reported-by: syzbot+fbe3acb4886adeef31e0@syzkaller.appspotmail.com
2023-01-26backing "consolidate mbuf header parsing on device driver layer"Theo de Raadt
easily repeatable ASSERT happens seconds after starting compiles over nfs.
2023-01-24consolidate mbuf header parsing on device driver layerJan Klemkow
with tweaks from mvs@, mpi@ and dlg@ ok mvs@, dlg@
2023-01-22Move SS_CANTRCVMORE and SS_RCVATMARK bits from `so_state' to `sb_state' ofVitaliy Makkoveev
receive buffer. As it was done for SS_CANTSENDMORE bit, the definition kept as is, but now these bits belongs to the `sb_state' of receive buffer. `sb_state' ored with `so_state' when socket data exporting to the userland. ok bluhm@
2023-01-21Introduce `rt_lock' rwlock(9) and use it instead of kernel lock toVitaliy Makkoveev
serialize arpcache() and arpresolve(). In fact, net stack already has sleep points, so the rwlock(9) is better here because we avoid intersection with the rest of kernel locked paths. Also this new lock assumed to use to route layer protection instead of netlock. Hrvoje Popovski had tested this diff and found no visible performance impact. ok bluhm@
2023-01-21Introduce per-sockbuf `sb_state' to use it with SS_CANTSENDMORE.Vitaliy Makkoveev
This time, socket's buffer lock requires solock() to be held. As a part of socket buffers standalone locking work, move socket state bits which represent its buffers state to per buffer state. Opposing the previous reverted diff, the SS_CANTSENDMORE definition left as is, but it used only with `sb_state'. `sb_state' ored with original `so_state' when socket's data exported to the userland, so the ABI kept as it was. Inputs from deraadt@. ok bluhm@
2023-01-12Binding the accept socket in TCP input relies on the fact that theAlexander Bluhm
listen port is not bound to port 0. With a matching pf divert-to rule this assumption is no longer true and could crash the kernel with kassert. In both pf and stack drop TCP packets with destination port 0 before they can do harm. OK sashan@ claudio@
2022-12-27Fix array bounds mismatch with clang 15Patrick Wildt
New warning -Warray-parameter is a bit overzealous. ok millert@ tb@
2022-12-13In tcp_now() switch from getnsecuptime() to getnsecruntime()Claudio Jeker
The tcp timer is not supposed to run during suspend but getnsecuptime() does and because of this sessions with TCP_KEEPALIVE on reset after a few hours of sleep. Problem noticed by mlarkin@, investigation by yasuoka@ additional testing jca@ OK yasuoka@ jca@ cheloha@
2022-12-12Revert sb_state changes to unbreak tree.Theo Buehler
2022-12-11This time, socket's buffer lock requires solock() to be held. As a part ofVitaliy Makkoveev
socket buffers standalone locking work, move socket state bits which represent its buffers state to per buffer state. Introduce `sb_state' and turn SS_CANTSENDMORE to SBS_CANTSENDMORE. This bit will be processed on `so_snd' buffer only. Move SS_CANTRCVMORE and SS_RCVATMARK bits with separate diff to make review easier and exclude possible so_rcv/so_snd mistypes. Also, don't adjust the remaining SS_* bits right now. ok millert@
2022-12-09Some TCP timer units have changed from slowhz to msec and theirAlexander Bluhm
type from short to int. Also switch local variables holding temporary timer values from short to int. OK yasuoka
2022-12-08Convert tcptv_keep_init in milliseconds before comparing other valuesYASUOKA Masahiko
of tcp time. This fixes the retransmit timer of syn_cache which was broken. reported by naddy, input dlg, test jca ok jca
2022-12-07Fix comment by splitting sentance in two. From kn@Claudio Jeker
2022-12-06Add missing kernel lock around (*if_ioctl)() call withinVitaliy Makkoveev
in{,6}_addmulti(). Since kernel lock is no more taken while following setsockopt() path, it should be taken in this place. Corresponding in{,6}_delmulti() already acquire kernel lock around (*if_ioctl)(). Problem reported and diff tested by weerd@ ok kn@ bluhm@
2022-11-19Push kernel lock into pru_control() aka. in6_control() / in_control()Klemens Nanni
so->so_state is already read without kernel lock inside soo_ioctl() which calls pru_control() aka in6_control() and in_control(). OK mvs
2022-11-17style(9) fix. No functional change.Vitaliy Makkoveev
2022-11-11timeout(9): remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK()Scott Soule Cheloha
We have too many timeout(9) initialization functions and macros. Let's slim it down and combine some interfaces. - Remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK(). - Expand timeout_set_flags(), TIMEOUT_INITIALIZER_FLAGS() to accept an additional "kclock" parameter. - Reimplement timeout_set(), timeout_set_proc() with timeout_set_flags(). - Reimplement TIMEOUT_INITIALIZER() with TIMEOUT_INITIALIZER_FLAGS(). - Update the sole timeout_set_flags() user to pass a kclock parameter. - Update the sole timeout_set_kclock() user to call timeout_set_flags(). - Update the sole TIMEOUT_INITIALIZER_FLAGS() user to provide a kclock parameter. The timeout(9) code is now a bit out of sync with the manpage. This will be corrected in a subsequent commit. ok kn@
2022-11-09Add missin 'e' in comment.Claudio Jeker
OK dlg@
2022-11-07Modify TCP receive buffer size auto scaling to use the smoothed RTTYASUOKA Masahiko
(SRTT) instead of the timestamp option. Since the timestamp option is disabled on some OSs (eg. Windows) or dropped by some firewalls/routers, in such a case the window size had been fixed at 16KB, this limits throughput at very low on high latency networks. Also replace "tcp_now" from 2HZ tick counter to binuptime in milliseconds to calculate the SRTT better. tested by krw matthieu jmatthew dlg djm stu stsp ok claudio
2022-11-05Fix kernel build without IPSEC option.Jan Klemkow
ok deraadt@
2022-10-17Change pru_abort() return type to the type of void and make pru_abort()Vitaliy Makkoveev
optional. We have no interest on pru_abort() return value. We call it only from soabort() which is dummy pru_abort() wrapper and has no return value. Only the connection oriented sockets need to implement (*pru_abort)() handler. Such sockets are tcp(4) and unix(4) sockets, so remove existing code for all others, it doesn't called. ok guenther@
2022-10-03System calls should not fail due to temporary memory shortage inAlexander Bluhm
malloc(9) or pool_get(9). Pass down a wait flag to pru_attach(). During syscall socket(2) it is ok to wait, this logic was missing for internet pcb. Pfkey and route sockets were already waiting. sonewconn() must not wait when called during TCP 3-way handshake. This logic has been preserved. Unix domain stream socket connect(2) can wait until the other side has created the socket to accept. OK mvs@
2022-09-13Change pru_rcvd() return type to the type of void. We have no interestVitaliy Makkoveev
on pru_rcvd() return value. Drop "pru_rcvd != NULL" check within pru_rcvd() wrapper. We only call it if the socket's protocol have PR_WANTRCVD flag set. Such sockets are route domain, tcp(4) and unix(4) sockets. ok guenther@ bluhm@
2022-09-13Do soreceive() with shared netlock for raw sockets.Vitaliy Makkoveev
ok bluhm@
2022-09-08Rename global ifnet TAILQKlemens Nanni
Naming the list like the struct itself makes for awful grepping. Call the global variable "ifnetlist" from now on. There used to be kvm(3) consumers in base picking up this symbol, but those have long been converted to other interfaces. A few potential ports users remain, same deal as sys/net/if_var.h r1.116 "Remove struct ifnet's unused if_switchport member": they get bumped. Previous users pointed out by deraadt OK bluhm
2022-09-05Use shared netlock in soreceive(). The UDP and IP divert layerAlexander Bluhm
provide locking of the PCB. If that is possible, use shared instead of exclusive netlock in soreceive(). The PCB mutex provides a per socket lock against multiple soreceive() running in parallel. Release and regrab both locks in sosleep_nsec(). OK mvs@
2022-09-04spellingJonathan Gray
2022-09-03Move PRU_PEERADDR request to (*pru_peeraddr)().Vitaliy Makkoveev
Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets, except tcp(4) case. Also remove *_usrreq() handlers. ok bluhm@
2022-09-03Initialize TCP mutex forgotten in previous commit.Alexander Bluhm
found by Hrvoje Popovski with witness; OK mvs@
2022-09-03Use a mutex to update tcp_maxidle, tcp_iss, and tcp_now. ThisAlexander Bluhm
removes pressure from the exclusive netlock in tcp_slowtimo(). Reading is done atomically. Ensure that the tcp_now value is read only once per function to provide consistent time. OK yasuoka@
2022-09-03Move PRU_SOCKADDR request to (*pru_sockaddr)()Vitaliy Makkoveev
Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4) inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability. The key management and route domain sockets returns EINVAL error for PRU_SOCKADDR request, so keep this behaviour for a while instead of make pru_sockaddr handler optional and return EOPNOTSUPP. ok bluhm@
2022-09-02Move PRU_CONTROL request to (*pru_control)().Vitaliy Makkoveev
The 'proc *' arg is not used for PRU_CONTROL request, so remove it from pru_control() wrapper. Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for inet6 case. ok guenther@ bluhm@
2022-09-01Move PRU_CONNECT2 request to (*pru_connect2)().Vitaliy Makkoveev
ok bluhm@
2022-08-31Move PRU_SENDOOB request to (*pru_sendoob)().Vitaliy Makkoveev
PRU_SENDOOB request always consumes passed `top' and `control' mbufs. To avoid dummy m_freem(9) handlers for all protocols release passed mbufs in the pru_sendoob() EOPNOTSUPP error path. Also fix `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path. ok bluhm@
2022-08-30Refactor internet PCB lookup function. Rename in_pcbhashlookup()Alexander Bluhm
so the public API is in_pcblookup() and in_pcblookup_listen(). For internal use introduce in_pcbhash_insert() and in_pcbhash_lookup() to avoid code duplication. Routing domain is unsigned, change the type to u_int. OK mvs@
2022-08-30Protect the receive socket buffer in UDP input with per PCB mutex.Alexander Bluhm
OK claudio@ mvs@