Age | Commit message | Author |
|
immutable, we don't need to reload it again.
ok bluhm@
|
|
grabs the exclusive netlock and that is sufficient for in_arpinput()
and arpcache().
with kn@; OK mvs@; tested by Hrvoje Popovski
|
|
response. Implement the analogous sysctl net.inet6.icmp6.nd6_queued for
ND6 to reduce places where mbufs can hide within the kernel.
Atomic operations operate on unsigned int. Make the type of total
hold queue length consistent.
Use atomic load to read the value for the sysctl. This clarifies
why no lock around sysctl_rdint() is needed.
OK mvs@ kn@
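
A minimal userland sketch of the counting scheme described above, using
C11 atomics rather than the kernel's own atomic primitives. All names
(ex_hold_total, EX_HOLD_MAX, ex_hold_reserve() and friends) and the
limit value are hypothetical and chosen for illustration only; this is
not the committed code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global limit on packets parked in hold queues. */
#define EX_HOLD_MAX 4u

/* Total hold queue length; unsigned int to match the atomic operations. */
static atomic_uint ex_hold_total;

/*
 * Try to account for one more held packet.  The counter is bumped first
 * and backed out again if the limit was already reached, so no lock is
 * needed.  A concurrent reader may briefly see a value above the limit.
 */
bool
ex_hold_reserve(void)
{
	unsigned int prev = atomic_fetch_add(&ex_hold_total, 1u);

	if (prev >= EX_HOLD_MAX) {
		atomic_fetch_sub(&ex_hold_total, 1u);
		return false;
	}
	return true;
}

/* Account for one packet leaving a hold queue. */
void
ex_hold_release(void)
{
	atomic_fetch_sub(&ex_hold_total, 1u);
}

/* Read side, as a read-only sysctl handler might use it: one atomic load. */
unsigned int
ex_hold_read(void)
{
	return atomic_load(&ex_hold_total);
}
```

Because the reader only needs a single atomic load, it is visible at the
call site why no lock has to be held around the read.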
|
|
ND6 held only a single packet. Unify the logic and add an mbuf
hold queue to struct llinfo_nd6. This is MP safe and queue limits
are tracked with atomic operations. New function if_mqoutput() has
common code for ARP and ND6. ln_saddr6 holds the source address
of the requesting packet. That is easier than fiddling with the mbuf
queue in nd6_ns_output().
OK kn@
|
|
checksum may be wrong. Locally generated packets diverted by pf
out rules may have no checksum due to hardware offloading.
Calculate the checksum in that case.
OK mvs@ sashan@
|
|
milliseconds, which is the same unit as tcp_now(). However, keep the
unit of sysctl variables in seconds and convert their unit in
tcp_sysctl(). Additionally revert TCPTV_SRTTDFLT back to 3 seconds,
which was mistakenly changed to 1.5 seconds by tcp_timer.h 1.19.
ok claudio
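
The unit conversion described above can be sketched as follows. This is
an illustration under assumptions, not the tcp_sysctl() code: the
variable ex_keepidle_msec, its default, and the two accessor functions
are hypothetical names.

```c
#include <limits.h>

#define EX_MSEC_PER_SEC	1000

/* Hypothetical timer value, stored internally in milliseconds. */
int ex_keepidle_msec = 7200 * EX_MSEC_PER_SEC;

/* Export the value in seconds, the unit userland keeps seeing. */
int
ex_sysctl_get_sec(void)
{
	return ex_keepidle_msec / EX_MSEC_PER_SEC;
}

/*
 * Accept a value in seconds and store it in milliseconds.  Reject
 * negative input and anything that would overflow an int after the
 * multiplication.  Returns 0 on success, -1 on invalid input.
 */
int
ex_sysctl_set_sec(int sec)
{
	if (sec < 0 || sec > INT_MAX / EX_MSEC_PER_SEC)
		return -1;
	ex_keepidle_msec = sec * EX_MSEC_PER_SEC;
	return 0;
}
```

Converting only at the sysctl boundary keeps the user-visible unit
stable while all internal comparisons use one unit.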
|
|
by using a bad option length. This bug is only reachable if both
pf IP option check is disabled and IP source routing is enabled.
reported by @fuzzingrf Erg Noor
OK claudio@ deraadt@
|
|
ok miod@ millert@
|
|
This worked because the global head variable is zero-initialised,
but one must not rely on that.
OK mvs claudio
|
|
public header file. Makes debugging with special kernels easier.
|
|
with tweaks from mvs@, mpi@, dlg@, naddy@ and bluhm@
"go for it" deraadt@
ok naddy@, mvs@
|
|
No functional change.
|
|
No functional changes.
|
|
rwlock(9) acquisition.
Reported-by: syzbot+fbe3acb4886adeef31e0@syzkaller.appspotmail.com
|
|
easily repeatable ASSERT happens seconds after starting compiles over nfs.
|
|
with tweaks from mvs@, mpi@ and dlg@
ok mvs@, dlg@
|
|
receive buffer. As was done for the SS_CANTSENDMORE bit, the definitions
are kept as is, but these bits now belong to the `sb_state' of the receive
buffer. `sb_state' is ORed with `so_state' when socket data is exported to
userland.
ok bluhm@
|
|
serialize arpcache() and arpresolve(). The network stack already has
sleep points, so an rwlock(9) is the better choice here because it avoids
intersecting with the remaining kernel-locked paths. This new lock is
also intended to protect the route layer in place of the netlock.
Hrvoje Popovski tested this diff and found no visible performance
impact.
ok bluhm@
|
|
This time, the socket buffer lock requires solock() to be held. As part of
the socket buffers standalone locking work, move the socket state bits that
represent buffer state to per-buffer state.
Unlike the previous, reverted diff, the SS_CANTSENDMORE definition is left
as is, but it is used only with `sb_state'. `sb_state' is ORed with the
original `so_state' when socket data is exported to userland, so the ABI
is kept as it was.
Inputs from deraadt@.
ok bluhm@
|
|
listen port is not bound to port 0. With a matching pf divert-to
rule this assumption is no longer true and could crash the kernel
with kassert. In both pf and stack drop TCP packets with destination
port 0 before they can do harm.
OK sashan@ claudio@
|
|
New warning -Warray-parameter is a bit overzealous.
ok millert@ tb@
|
|
The tcp timer is not supposed to run during suspend, but getnsecuptime()
does. Because of this, sessions with TCP_KEEPALIVE enabled were reset after
a few hours of sleep.
Problem noticed by mlarkin@, investigation by yasuoka@, additional testing jca@
OK yasuoka@ jca@ cheloha@
|
|
|
|
socket buffers standalone locking work, move the socket state bits that
represent buffer state to per-buffer state. Introduce `sb_state' and
turn SS_CANTSENDMORE into SBS_CANTSENDMORE. This bit will be processed on
the `so_snd' buffer only.
Move the SS_CANTRCVMORE and SS_RCVATMARK bits in a separate diff to make
review easier and to rule out possible so_rcv/so_snd mistypes.
Also, don't adjust the remaining SS_* bits right now.
ok millert@
|
|
type from short to int. Also switch local variables holding temporary
timer values from short to int.
OK yasuoka
|
|
of tcp time. This fixes the retransmit timer of syn_cache which was
broken. reported by naddy, input dlg, test jca
ok jca
|
|
|
|
in{,6}_addmulti(). Since the kernel lock is no longer taken along the
setsockopt() path, it should be taken in this place. The corresponding
in{,6}_delmulti() already acquires the kernel lock around (*if_ioctl)().
Problem reported and diff tested by weerd@
ok kn@ bluhm@
|
|
so->so_state is already read without kernel lock inside soo_ioctl()
which calls pru_control() aka in6_control() and in_control().
OK mvs
|
|
|
|
We have too many timeout(9) initialization functions and macros.
Let's slim it down and combine some interfaces.
- Remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK().
- Expand timeout_set_flags(), TIMEOUT_INITIALIZER_FLAGS() to accept
an additional "kclock" parameter.
- Reimplement timeout_set(), timeout_set_proc() with timeout_set_flags().
- Reimplement TIMEOUT_INITIALIZER() with TIMEOUT_INITIALIZER_FLAGS().
- Update the sole timeout_set_flags() user to pass a kclock parameter.
- Update the sole timeout_set_kclock() user to call timeout_set_flags().
- Update the sole TIMEOUT_INITIALIZER_FLAGS() user to provide a kclock
parameter.
The timeout(9) code is now a bit out of sync with the manpage. This
will be corrected in a subsequent commit.
ok kn@
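
The consolidation listed above can be sketched in userland C. This is
only an illustration: struct ex_timeout, the EX_* constants and the ex_*
functions are hypothetical stand-ins, not the kernel's timeout(9)
definitions, and the field layout is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timeout. */
struct ex_timeout {
	void	(*to_func)(void *);
	void	 *to_arg;
	int	  to_kclock;
	int	  to_flags;
};

/* Hypothetical clock identifiers and flag bit. */
enum { EX_KCLOCK_NONE = 0, EX_KCLOCK_UPTIME = 1 };
#define EX_TIMEOUT_PROC	0x01

/* The single full-featured initializer: takes kclock and flags. */
void
ex_timeout_set_flags(struct ex_timeout *to, void (*fn)(void *), void *arg,
    int kclock, int flags)
{
	to->to_func = fn;
	to->to_arg = arg;
	to->to_kclock = kclock;
	to->to_flags = flags;
}

/* The convenience variants become thin wrappers around it. */
void
ex_timeout_set(struct ex_timeout *to, void (*fn)(void *), void *arg)
{
	ex_timeout_set_flags(to, fn, arg, EX_KCLOCK_NONE, 0);
}

void
ex_timeout_set_proc(struct ex_timeout *to, void (*fn)(void *), void *arg)
{
	ex_timeout_set_flags(to, fn, arg, EX_KCLOCK_NONE, EX_TIMEOUT_PROC);
}

/* The static initializers follow the same pattern. */
#define EX_TIMEOUT_INITIALIZER_FLAGS(fn, arg, kclock, flags) \
	{ (fn), (arg), (kclock), (flags) }
#define EX_TIMEOUT_INITIALIZER(fn, arg) \
	EX_TIMEOUT_INITIALIZER_FLAGS((fn), (arg), EX_KCLOCK_NONE, 0)

/* Trivial callback so the initializers have something to point at. */
void
ex_noop(void *arg)
{
	(void)arg;
}
```

With one function carrying every parameter, a former caller of a
kclock-specific initializer passes the clock as an argument instead.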
|
|
OK dlg@
|
|
(SRTT) instead of the timestamp option. The timestamp option is
disabled on some OSs (e.g. Windows) or dropped by some
firewalls/routers; in such cases the window size was fixed at
16KB, which limits throughput to a very low rate on high-latency networks.
Also change "tcp_now" from a 2HZ tick counter to binuptime in
milliseconds to calculate the SRTT better.
tested by krw matthieu jmatthew dlg djm stu stsp
ok claudio
|
|
ok deraadt@
|
|
optional.
We have no interest in the pru_abort() return value. We call it only from
soabort(), which is a dummy pru_abort() wrapper and has no return value.
Only connection-oriented sockets need to implement the (*pru_abort)()
handler. Such sockets are tcp(4) and unix(4) sockets, so remove the existing
code for all others; it is never called.
ok guenther@
|
|
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@
|
|
on pru_rcvd() return value.
Drop the "pru_rcvd != NULL" check within the pru_rcvd() wrapper. We only
call it if the socket's protocol has the PR_WANTRCVD flag set. Such sockets
are route domain, tcp(4) and unix(4) sockets.
ok guenther@ bluhm@
|
|
ok bluhm@
|
|
Naming the list like the struct itself makes for awful grepping.
Call the global variable "ifnetlist" from now on.
There used to be kvm(3) consumers in base picking up this symbol, but those
have long been converted to other interfaces.
A few potential ports users remain, same deal as sys/net/if_var.h r1.116
"Remove struct ifnet's unused if_switchport member": they get bumped.
Previous users pointed out by deraadt
OK bluhm
|
|
provide locking of the PCB. If that is possible, use shared instead
of exclusive netlock in soreceive(). The PCB mutex provides a per
socket lock against multiple soreceive() running in parallel.
Release and regrab both locks in sosleep_nsec().
OK mvs@
|
|
|
|
Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except in the tcp(4) case.
Also remove *_usrreq() handlers.
ok bluhm@
|
|
found by Hrvoje Popovski with witness; OK mvs@
|
|
removes pressure from the exclusive netlock in tcp_slowtimo().
Reading is done atomically. Ensure that the tcp_now value is read
only once per function to provide consistent time.
OK yasuoka@
|
|
Introduce in{,6}_sockaddr() functions, and use them for all inet sockets
except tcp(4). For tcp(4) sockets use tcp_sockaddr() to keep debug ability.
The key management and route domain sockets return an EINVAL error for the
PRU_SOCKADDR request, so keep this behaviour for a while instead of making
the pru_sockaddr handler optional and returning EOPNOTSUPP.
ok bluhm@
|
|
The 'proc *' arg is not used for the PRU_CONTROL request, so remove it from
the pru_control() wrapper.
Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for
inet6 case.
ok guenther@ bluhm@
|
|
ok bluhm@
|
|
The PRU_SENDOOB request always consumes the passed `top' and `control' mbufs.
To avoid dummy m_freem(9) handlers for all protocols, release the passed mbufs
in the pru_sendoob() EOPNOTSUPP error path.
Also fix a `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path.
ok bluhm@
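
The ownership rule described above can be sketched in userland C. The
types and names here (struct ex_buf, struct ex_proto, ex_buf_free(),
ex_sendoob(), the ex_freed counter) are hypothetical stand-ins for the
mbuf and protocol structures, and free(3) stands in for m_freem(9).

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf chain. */
struct ex_buf {
	int	len;
};

/* Counts released buffers; only here to make the sketch observable. */
int ex_freed;

/* Release a buffer; NULL is accepted and ignored. */
void
ex_buf_free(struct ex_buf *b)
{
	if (b != NULL) {
		free(b);
		ex_freed++;
	}
}

/* Hypothetical protocol switch with an optional out-of-band handler. */
struct ex_proto {
	int	(*sendoob)(struct ex_buf *top, struct ex_buf *control);
};

/*
 * Wrapper: the request always consumes both buffers.  If the protocol
 * has no handler, the wrapper releases them itself, so protocols do
 * not need a dummy handler whose only job is to free its arguments.
 */
int
ex_sendoob(const struct ex_proto *p, struct ex_buf *top,
    struct ex_buf *control)
{
	if (p->sendoob == NULL) {
		ex_buf_free(top);
		ex_buf_free(control);
		return EOPNOTSUPP;
	}
	/* A real handler takes over ownership of both buffers. */
	return p->sendoob(top, control);
}
```

The caller never touches the buffers again after the call, regardless
of the return value.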
|
|
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@
|
|
OK claudio@ mvs@
|