Age | Commit message (Collapse) | Author |
|
Shared netlock is not sufficient to call so{r,w}wakeup(). The following
sowakeup() modifies `sb_flags' and knote(9) stuff. Unfortunately, we
can't call so{r,w}wakeup() with `inp_mtx' mutex(9) because sowakeup()
also calls pgsigio() which grabs kernel lock.
However, `so*_filtops' callbacks only perform read-only access to the
socket stuff, so it is enough to hold shared netlock only, but the klist
stuff needs to be protected.
This diff introduces `sb_mtx' mutex(9) to protect sockbuf. This time
`sb_mtx' used to protect only `sb_flags' and `sb_klist'.
Now we have soassertlocked_readonly() and soassertlocked(). The first
one is happy if only shared netlock is held, meanwhile the second wants
`so_lock' or pru_lock() be held together with shared netlock.
To keep soassertlocked*() assertions soft, we need to know mutex(9)
state, so new mtx_owned() macro was introduces. Also, the new optional
(*pru_locked)() handler brings the state of pru_lock().
Tests and ok from bluhm.
|
|
Since inpcb tables for UDP and Raw IP have been split into IPv4 and
IPv6, assert that INP_IPV6 flag is correct instead of checking it.
While there, give the table variable a nicer name.
OK sashan@ mvs@
|
|
Protect all remaining write access to inp_faddr and inp_laddr with
inpcb table mutex. Document inpcb locking for foreign and local
address and port and routing table id. Reading will be made MP
safe by adding per socket rw-locks in a next step.
OK sashan@ mvs@
|
|
ip_output() received inp as parameter. This is only used to lookup
the IPsec level of the socket. Reasoning about MP locking is much
easier if only relevant data is passed around. Convert ip_output()
to receive constant inp_seclevel as argument and mark it as protected
by net lock.
OK mvs@
|
|
receive buffer. As it was done for SS_CANTSENDMORE bit, the definition
kept as is, but now these bits belongs to the `sb_state' of receive
buffer. `sb_state' ored with `so_state' when socket data exporting to the
userland.
ok bluhm@
|
|
optional.
We have no interest on pru_abort() return value. We call it only from
soabort() which is dummy pru_abort() wrapper and has no return value.
Only the connection oriented sockets need to implement (*pru_abort)()
handler. Such sockets are tcp(4) and unix(4) sockets, so remove existing
code for all others, it doesn't called.
ok guenther@
|
|
malloc(9) or pool_get(9).
Pass down a wait flag to pru_attach(). During syscall socket(2)
it is ok to wait, this logic was missing for internet pcb. Pfkey
and route sockets were already waiting.
sonewconn() must not wait when called during TCP 3-way handshake.
This logic has been preserved. Unix domain stream socket connect(2)
can wait until the other side has created the socket to accept.
OK mvs@
|
|
ok bluhm@
|
|
Introduce in{,6}_peeraddr() and use them for inet and inet6 sockets,
except tcp(4) case.
Also remove *_usrreq() handlers.
ok bluhm@
|
|
Introduce in{,6}_sockaddr() functions, and use them for all except tcp(4)
inet sockets. For tcp(4) sockets use tcp_sockaddr() to keep debug ability.
The key management and route domain sockets returns EINVAL error for
PRU_SOCKADDR request, so keep this behaviour for a while instead of make
pru_sockaddr handler optional and return EOPNOTSUPP.
ok bluhm@
|
|
The 'proc *' arg is not used for PRU_CONTROL request, so remove it from
pru_control() wrapper.
Split out {tcp,udp}6_usrreqs from {tcp,udp}_usrreqs and use them for
inet6 case.
ok guenther@ bluhm@
|
|
ok bluhm@
|
|
PRU_SENDOOB request always consumes passed `top' and `control' mbufs. To
avoid dummy m_freem(9) handlers for all protocols release passed mbufs
in the pru_sendoob() EOPNOTSUPP error path.
Also fix `control' mbuf(9) leak in the tcp(4) PRU_SENDOOB error path.
ok bluhm@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
We abort only the sockets which are linked to `so_q' or `so_q0' queues of
listening socket. Such sockets have no corresponding file descriptor and
are not accessed from userland, so PRU_ABORT used to destroy them on
listening socket destruction.
Currently all our sockets support PRU_ABORT request, but actually it
required only for tcp(4) and unix(4) sockets, so i should be optional.
However, they will be removed with separate diff, and this time PRU_ABORT
requests were converted as is.
Also, the socket should be destroyed on PRU_ABORT request, but route and
key management sockets leave it alive. This was also converted as is,
because this wrong code never called.
ok bluhm@
|
|
The former PRU_SEND error path of gre_usrreq() had `control' mbuf(9)
leak. It was fixed in new gre_send().
The former pfkeyv2_send() was renamed to pfkeyv2_dosend().
ok bluhm@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock is used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
ok bluhm@
|
|
For the protocols which don't support request, leave handler NULL. Do the
NULL check within corresponding pru_() wrapper and return EOPNOTSUPP in
such case. This will be done for all upcoming user request handlers.
ok bluhm@ guenther@
|
|
handlers into it. We want to split existing (*pr_usrreq)() to multiple
short handlers for each PRU_ request as it was already done for
PRU_ATTACH and PRU_DETACH. This is the preparation step, (*pr_usrreq)()
split will be done with the following diffs.
Based on reverted diff from guenther@.
ok bluhm@
|
|
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@
|
|
having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provded memory while still inside the lock is a lot safer.
discussed with visa@
ok bluhm@ claudio@
|
|
rip_input().
from dhill@
|
|
there it calls sbappendaddr() while holding the raw table mutex.
This ends in sorwakeup() where we finally grab the kernel lock while
holding a mutex. Witness detects this misuse.
Use the same solution as for PCB notify. Collect the affected PCBs
in a temporary list. The list is protected by exclusive net lock.
syzbot+ebe3f03a472fecf5e42e@syzkaller.appspotmail.com
OK claudio@
|
|
for PCB tables. It does not break userland build anymore.
pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@
|
|
previously things that used sendto or similar with raw sockets would
ignore any configured sourceaddr. this made it inconsistent with
other traffic, which in turn makes things confusing to debug if
you're using ping or traceroute (which use raw sockets) to figure
out what's happening to other packets.
the ipv6 equiv already does this too.
ok sthen@ claudio@
|
|
This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.
ok sthen
|
|
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@
|
|
Revert the pr_usrreqs move: syzkaller found a NULL pointer deref
and I won't be available to monitor for followup issues for a bit
|
|
then be shared among protosw structures, following the same basic
direction as NetBSD and FreeBSD for this.
Split PRU_CONTROL out of pr_usrreq into pru_control, giving it the
proper prototype to eliminate the previously necessary casts.
ok mvs@ bluhm@
|
|
usrreq functions move the mbuf m_freem() logic to the release block
instead of distributing it over the switch statement. Then the
goto release in the initial check, whether the pcb still exists,
will not free the mbuf for the PRU_RCVD, PRU_RVCOOB, PRU_SENSE
command.
OK claudio@ mpi@ visa@
Reported-by: syzbot+8e7997d4036ae523c79c@syzkaller.appspotmail.com
|
|
bigger than the IP header len to be valid. With this I can traceroute again.
|
|
with INP_HDRINCL. There is no reason to allow badly constructed packets through
our network stack. Especially since they may trigger diagnostic checks further
down the stack. Now EINVAL is returned instead which was already used for some
checks that happened before.
OK florian@
Reported-by: syzbot+0361ed02deed123667cb@syzkaller.appspotmail.com
|
|
the inpcb apart from the disconnect. Just call soisdisconnected() and
clear the inp->inp_faddr since the socket is still valid after a disconnect.
Problem found by syzkaller via Greg Steuck
OK visa@
Fixes:
Reported-by: syzbot+2cd350dfe5c96f6469f2@syzkaller.appspotmail.com
Reported-by: syzbot+139ac2d7d3d60162334b@syzkaller.appspotmail.com
Reported-by: syzbot+02168317bd0156c13b69@syzkaller.appspotmail.com
Reported-by: syzbot+de8d2459ecf4cdc576a1@syzkaller.appspotmail.com
|
|
It also translated a documented send(2) EACCES case erroneously.
This was too much magic and always prone to errors.
from Jan Klemkow; man page jmc@; OK claudio@
|
|
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@
|
|
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@
|
|
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
|
|
with sendmsg(2) and MSG_OOB. Sync the code in udp, rip, and
rip6_usrreq. Add an inp NULL check in rip6_usrreq for consistency.
OK benno@ mpi@
|
|
rip{6,}_usrreq() since soreceive() will free it.
ok bluhm@
|
|
For the PRU_CONTROL bit the NET_LOCK surrounds in[6]_control() and
on the ENOTSUPP case we guard the driver if_ioctl functions.
OK mpi@
|
|
is set, pf_find_divert() cannot fail so put an assert there.
Explicitly check all possible divert types, panic in the default
case. For raw sockets call pf_find_divert() before of the socket
loop. Divert reply should not match on TCP or UDP listen sockets.
OK sashan@ visa@
|
|
divert-to or divert-reply was active. If the address was also set,
it meant divert-to. Divert packet used a separate structure. This
is confusing and makes it hard to add new features. It is better
to have a divert type that explicitly says what is configured.
Adapt the pf rule struct in kernel and pfctl, no functional change.
Note that kernel and pfctl have to be updated together.
OK sashan@
|
|
pr_input handlers without KERNEL_LOCK().
ok visa@
|