Age | Commit message (Collapse) | Author |
|
We have too many timeout(9) initialization functions and macros.
Let's slim it down and combine some interfaces.
- Remove timeout_set_kclock(), TIMEOUT_INITIALIZER_KCLOCK().
- Expand timeout_set_flags(), TIMEOUT_INITIALIZER_FLAGS() to accept
an additional "kclock" parameter.
- Reimplement timeout_set(), timeout_set_proc() with timeout_set_flags().
- Reimplement TIMEOUT_INITIALIZER() with TIMEOUT_INITIALIZER_FLAGS().
- Update the sole timeout_set_flags() user to pass a kclock parameter.
- Update the sole timeout_set_kclock() user to call timeout_set_flags().
- Update the sole TIMEOUT_INITIALIZER_FLAGS() user to provide a kclock
parameter.
The timeout(9) code is now a bit out of sync with the manpage. This
will be corrected in a subsequent commit.
ok kn@
|
|
OK dlg@
|
|
(SRTT) instead of the timestamp option. Since the timestamp option is
disabled on some OSs (eg. Windows) or dropped by some
firewalls/routers, in such a case the window size had been fixed at
16KB, this limits throughput at very low on high latency networks.
Also replace "tcp_now" from 2HZ tick counter to binuptime in
milliseconds to calculate the SRTT better.
tested by krw matthieu jmatthew dlg djm stu stsp
ok claudio
|
|
ok deraadt@
|
|
optional.
We have no interest on pru_abort() return value. We call it only from
soabort() which is dummy pru_abort() wrapper and has no return value.
Only the connection oriented sockets need to implement (*pru_abort)()
handler. Such sockets are tcp(4) and unix(4) sockets, so remove existing
code for all others, it doesn't called.
ok guenther@
|
|
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@
|
|
on pru_rcvd() return value.
Drop "pru_rcvd != NULL" check within pru_rcvd() wrapper. We only call it
if the socket's protocol have PR_WANTRCVD flag set. Such sockets are
route domain, tcp(4) and unix(4) sockets.
ok guenther@ bluhm@
|
|
ok bluhm@
|
|
Naming the list like the struct itself makes for awful grepping.
Call the global variable "ifnetlist" from now on.
There used to be kvm(3) consumers in base picking up this symbol, but those
have long been converted to other interfaces.
A few potential ports users remain, same deal as sys/net/if_var.h r1.116
"Remove struct ifnet's unused if_switchport member": they get bumped.
Previous users pointed out by deraadt
OK bluhm
|
|
provide locking of the PCB. If that is possible, use shared instead
of exclusive netlock in soreceive(). The PCB mutex provides a per
socket lock against multiple soreceive() running in parallel.
Release and regrab both locks in sosleep_nsec().
OK mvs@
|
|
|
|
Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.
Also remove *_usrreq() handlers.
ok bluhm@
|
|
found by Hrvoje Popovski with witness; OK mvs@
|
|
removes pressure from the exclusive netlock in tcp_slowtimo().
Reading is done atomically. Ensure that the tcp_now value is read
only once per function to provide consistent time.
OK yasuoka@
|
|
Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.
The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.
ok bluhm@
|
|
The 'proc *' arg is not used for PRU_CONTROL request, so remove it from
pru_control() wrapper.
Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for
inet6 case.
ok guenther@ bluhm@
|
|
ok bluhm@
|
|
PRU_SENDOOB request always consumes passed `top' and `control' mbufs. To
avoid dummy m_freem(9) handlers for all protocols release passed mbufs
in the pru_sendoob() EOPNOTSUPP error path.
Also fix `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path.
ok bluhm@
|
|
so the public API is in_pcblookup() and in_pcblookup_listen(). For
internal use introduce in_pcbhash_insert() and in_pcbhash_lookup()
to avoid code duplication. Routing domain is unsigned, change the
type to u_int.
OK mvs@
|
|
OK claudio@ mvs@
|
|
This logic was introduced in 2013 when pf checksum fixup was
temporarily removed. After restoring the pf bahavior in 2016, it
should not be necessary anymore.
OK claudio@
|
|
introduction of tcp_send.
OK mvs@, bluhm@, gnezdo@
Reported-by: syzbot+e859fd353c90eeac26f8@syzkaller.appspotmail.com
|
|
ok bluhm@
|
|
There was a crash due to use after free of the ifa although it is
ref counted. As ifa_refcnt was a simple integer increment, there
may be a path where multiple CPUs access it concurrently. So change
to struct refcnt which is MP safe and provides dt(4) leak debugging.
Link level address for IPsec enc(4) and various MPLS interfaces is
special. There ifa is part of struct sc. Use refcount anyway and
add a panic to detect use after free.
bug report stsp@; OK mvs@
|
|
ok bluhm@
|
|
We abort only the sockets which are linked to `so_q' or `so_q0' queues of
listening socket. Such sockets have no corresponding file descriptor and
are not accessed from userland, so PRU_ABORT used to destroy them on
listening socket destruction.
Currently all our sockets support PRU_ABORT request, but actually it
required only for tcp(4) and unix(4) sockets, so i should be optional.
However, they will be removed with separate diff, and this time PRU_ABORT
requests were converted as is.
Also, the socket should be destroyed on PRU_ABORT request, but route and
key management sockets leave it alive. This was also converted as is,
because this wrong code never called.
ok bluhm@
|
|
The former PRU_SEND error path of gre_usrreq() had `control' mbuf(9)
leak. It was fixed in new gre_send().
The former pfkeyv2_send() was renamed to pfkeyv2_dosend().
ok bluhm@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
are protected by netlock. They are only used as shortcut in fast
timer.
Common prefix in mld6.c is mld6.
OK mvs@
|
|
ok bluhm@
|
|
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@
|
|
ok bluhm@
|
|
are status variables that can be used to avoid locking if timers
are not running. This should reduce contention on exclusive netlock.
OK kn@ mvs@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
reassembly and IPv6 hob-by-hob header chain processing out of
ip_local() and ip6_local(), they are almost empty stubs. The check
for local deliver loop in ip_ours() and ip6_ours() is sufficient.
Recover mbuf offset and next protocol directly in ipintr() and
ip6intr().
OK mvs@
|
|
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@
|
|
For the protocols which don't support request, leave handler NULL. Do the
NULL check within corresponding pru_() wrapper and return EOPNOTSUPP in
such case. This will be done for all upcoming user request handlers.
ok bluhm@ guenther@
|
|
code is MP safe and moves from ip6_local() to ip6_ours(). If there
are any options, store the chain offset and next protocol in a mbuf
tag. When dequeuing without tag, it is a regular IPv6 header. As
mbuf tags degrade performance, use them only if a hop-by-hop header
is present. Such packets are rare and pf drops them by default.
OK mvs@
|
|
This function will help to avoid code duplication when tcp_usrreq() will
be divided to multiple handlers.
ok bluhm@
|
|
handlers into it. We want to split existing (*pr_usrreq)() to multiple
short handlers for each PRU_ request as it was already done for
PRU_ATTACH and PRU_DETACH. This is the preparation step, (*pr_usrreq)()
split will be done with the following diffs.
Based on reverted diff from guenther@.
ok bluhm@
|
|
OK claudio@
|
|
do nearly the same thing, so they should look similar.
OK sashan@
|
|
to out of memory. Use a generic idropped counter for those.
OK mvs@
|
|
TCP_INFO provides a lot of information about the TCP session of this socket.
Many processes like to peek at the rtt of a connection but this also provides
a lot of more special info for use by e.g. tcpbench(1).
While the basic minimal info is available all the time the more specific
data is only populated for privileged processes. This is done to not share
data back to userland that may allow to attack a session.
TCP_INFO is available to pledge "inet" since pledged processes like chrome
tend to use TCP_INFO when available.
OK bluhm@
|
|
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as neccessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also incements the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
|
|
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@
|
|
of significant bits on big endian machines. Bug has been introduced
in previous commit by removing the =! 0 check.
OK mvs@
|