Age | Commit message | Author |
|
function may sleep, so holding a mutex is not possible. The same
list entry and rwlock are used for UDP multicast and raw IP delivery.
By adding a write lock, exclusive netlock is no longer necessary
for PCB notify and UDP and raw IP input.
OK mvs@
|
|
buffer. Later it may be used to protect more of the PCB or socket.
In divert input replace the kernel lock with this mutex.
OK mvs@
|
|
Use their reference counter in more places.
The in_pcb lookup functions hold the PCBs in hash tables protected
by table->inpt_mtx mutex. Whenever a result is returned, increment
the ref count before releasing the mutex. Then the inp can be used
as long as necessary. Unref it at the end of all functions that
call in_pcb lookup.
As a shortcut, pf may also hold a reference to the PCB. When
pf_inp_lookup() returns it, it also increments the ref count and the
caller can handle it like the inp from table lookup.
OK sashan@
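
The lookup-then-reference pattern described above can be sketched in
userland C. This is only an illustration under stated assumptions: struct
pcb, struct pcb_table, pcb_lookup() and pcb_unref() are hypothetical
stand-ins, a pthread mutex plays the role of table->inpt_mtx, and the
counter is a plain integer guarded by that mutex rather than the kernel's
reference counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's inpcb and PCB table. */
struct pcb {
	int		 port;		/* lookup key */
	unsigned int	 refcnt;	/* guarded by the table mutex here */
	struct pcb	*next;
};

struct pcb_table {
	pthread_mutex_t	 mtx;		/* plays the role of inpt_mtx */
	struct pcb	*head;
};

/*
 * Find a PCB by port. The reference is taken before the mutex is
 * released, so the result stays valid after the lookup returns.
 */
struct pcb *
pcb_lookup(struct pcb_table *t, int port)
{
	struct pcb *p;

	pthread_mutex_lock(&t->mtx);
	for (p = t->head; p != NULL; p = p->next) {
		if (p->port == port) {
			p->refcnt++;
			break;
		}
	}
	pthread_mutex_unlock(&t->mtx);
	return p;
}

/* Drop a reference taken by pcb_lookup(); returns the remaining count. */
unsigned int
pcb_unref(struct pcb_table *t, struct pcb *p)
{
	unsigned int left;

	pthread_mutex_lock(&t->mtx);
	assert(p->refcnt > 0);
	left = --p->refcnt;
	pthread_mutex_unlock(&t->mtx);
	return left;
}
```

A caller pairs every successful pcb_lookup() with one pcb_unref() at the
end of the function, which mirrors the rule stated in the commit message.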
|
|
NET_RLOCK_IN_IOCTL, which have the same implementation. The R and
W are hard to see, call the new macro NET_LOCK_SHARED. Rename the
opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE.
Update some outdated comments about net locking.
OK mpi@ mvs@
|
|
tracepoint for each type of refcnt we have. As a start, add inpcb
and tdb refcnt. When the counter changes, btrace may print the
actual object, the current counter, the change value and optionally
the stack trace.
discussed with visa@; OK mpi@
|
|
for the lock operation and to pass a value to the unlock operation.
sofree() still needs an extra flag to know if sounlock() should be called
or not. But sofree() is called less often and mostly without keeping the lock.
OK mpi@ mvs@
|
|
having it return a pointer to something that has a lifetime managed
by a lock without accounting for it or taking a reference count or
anything like that is asking for trouble. copying the address to
caller provided memory while still inside the lock is a lot safer.
discussed with visa@
ok bluhm@ claudio@
|
|
While it makes sense to limit bind(2) of unicast addresses that overlap
each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53)
it makes little sense for multicast. Multicast is delivered to all sockets
that match so there is no risk of someone stealing traffic from someone
else. This should hopefully help with mDNS as reported by robert@
OK deraadt@ bluhm@
|
|
|
|
for PCB tables. It does not break userland build anymore.
pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@
|
|
this allows the IP_MULTICAST_IF sockopt to specify which address
you want to send a limited broadcast (255.255.255.255) packet out
of.
requested by and ok claudio@
|
|
This reverts the commit protecting the list and hashes in the PCB tables
with a mutex since the build of sysctl(8) breaks, as found by kettenis.
ok sthen
|
|
run pf in parallel, make parts of the stack MP safe. Protect the
list and hashes in the PCB tables with a mutex.
Note that the protocol notify functions may call pf via tcp_output().
As the pf lock is a sleeping rw_lock, we must not hold a mutex. To
solve this for now, collect these PCBs in inp_notify list and protect
it with exclusive netlock.
OK sashan@
|
|
but would panic instead of waiting. Remove needless error handling.
OK mvs@
|
|
|
|
supported anymore.
|
|
set the error output parameter or return a tdb. Both are ignored
in in_pcbconnect(). Remove the code that does nothing.
OK tobhe@ jca@ mvs@
|
|
ok gnezdo@ semarie@ mpi@
|
|
Technically the whole point of the stoeplitz API is that it's symmetric,
meaning that the order of addresses and ports doesn't matter and will produce
the same hash value.
Coverity CID 1501717
ok dlg@
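
The symmetry property mentioned above can be illustrated with a toy hash.
This sketch is not the stoeplitz implementation: mix32() and flow_hash()
are hypothetical names, and the mixer is an arbitrary integer hash. It only
shows why folding both endpoints together with XOR makes the argument order
irrelevant.

```c
#include <stdint.h>

/* Toy 32-bit mixer; NOT the Toeplitz function used by stoeplitz. */
static uint32_t
mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x45d9f3bU;
	x ^= x >> 16;
	return x;
}

/*
 * Hypothetical flow hash. XOR is commutative, so swapping the source
 * and destination (addresses and ports together) yields the same input
 * to the mixer and therefore the same hash value.
 */
uint32_t
flow_hash(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
	uint32_t addrs = saddr ^ daddr;
	uint32_t ports = (uint32_t)(sport ^ dport);

	return mix32(addrs ^ (ports << 16 | ports));
}
```

With such a function, a packet and its reply hash to the same value, which
is the property the rx and tx ring selection relies on.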
|
|
drivers that implement rss and multiple rings depend on the symmetric
toeplitz code, and use it to generate a key that decides which rx
ring a packet lands on. if the toeplitz code is enabled, this diff
has the pcb and tcp layer use the toeplitz code to generate a flowid
for packets they send, which in turn is used to pick a tx ring.
because the nic and the stack use the same key, the tx and rx sides
end up with the same hash/flowid. at the very least this means that
the same rx and tx queue pair on a particular nic are used for both
sides of the connection. as the stack becomes more parallel, it
will also help keep both sides of the tcp connection processing in
the one place.
|
|
- Move most of the processing out of rtable.c (reasonable tb@, ok bluhm@)
- Remove memory allocation, store pointer to existing ifaddr
- Fix tunnel interface handling
looks fine mpi@
|
|
Advised by bluhm@
|
|
Based/previous work on an idea from deraadt@
Input from claudio@, djm@, deraadt@, sthen@
OK deraadt@
|
|
address. In that case, the linking to the pf state must be dissolved
as the latter still contains the old address. If it is a divert
state, also remove the state as any divert state must be associated
with a matching socket. Call pf_remove_divert_state() and
pf_inp_unlink() from in_pcbconnect().
reported by Tim Kuijsten; OK sashan@ claudio@
|
|
Removes a global variable and avoids MP problems.
OK mpi@ visa@
|
|
in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx
is held and sorwakeup() is called within the loop. As sowakeup()
grabs the kernel lock, we have a lock ordering problem.
found by Hrvoje Popovski; OK deraadt@ mpi@
|
|
for netstat -a. Introduce a global mutex that protects the tables
and hashes for the internet PCBs. To detect detached PCB, set its
inp_socket field to NULL. This has to be protected by a per PCB
mutex. The protocol pointer has to be protected by the mutex as
netstat uses it.
Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify()
before the table mutex to avoid lock ordering problems in the notify
functions.
OK visa@
|
|
The inet PCB uses one hash with local and foreign addresses, and
one with local port numbers. Give both hashes separate keys. Also
document the struct fields.
OK visa@
|
|
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
|
|
in6_pcb.c consistent, to ease comparing the code. Move all inet6
functions to in6_. Bring functions in both source files in same
order. Cleanup the include section. Now in_pcb.c is a superset
of in6_pcb.c. The latter contains all the special implementations.
Just moving around, no code change intended.
OK mpi@
|
|
OK stsp@
|
|
input and OK claudio@
|
|
route socket is flooded with those messages. Instead make sure that the
removal of the dynamic route that can happen is actually also sent to
the routing socket.
OK mpi@ henning@
|
|
the global inpcb queue and hashes.
OK visa@ mpi@ as part of a larger diff
|
|
in_pcbconnect() to avoid the address family maze in syn_cache_get().
input claudio@; OK mpi@
|
|
Instead introduce two flags to deal with global lock recursion. This
is necessary until we get per-socket lock.
Req. by and ok visa@
|
|
ok visa@, tb@
|
|
OK tb@ visa@
|
|
locking.
ok visa@, bluhm@
|
|
functions.
discussed with and OK mpi@ visa@
|
|
the hashmask. For the resize calculations it is clearer to use the
field inpt_size.
OK visa@ mpi@
|
|
rdomain. Move the printf to the end of the pcb lookup functions.
OK tb@ mpi@ visa@
|
|
So in in_pcbresize() the variant without _SAFE of the TAILQ_FOREACH
macro is sufficient.
OK tb@ mpi@ visa@
|
|
OK visa@
|
|
The account flag `ASU' will no longer be set but that makes suser()
mpsafe since it no longer messes with a per-process field.
No objection from millert@, ok tedu@, bluhm@
|
|
is set, pf_find_divert() cannot fail so put an assert there.
Explicitly check all possible divert types, panic in the default
case. For raw sockets call pf_find_divert() before the socket
loop. Divert reply should not match on TCP or UDP listen sockets.
OK sashan@ visa@
|
|
|
|
security check prevents the user from accidentally configuring a
redirect where a divert-to would be appropriate. Instead of spreading
the logic into tcp and udp input, check the flag during PCB listen
lookup. This also reduces parameters of in_pcblookup_listen().
OK visa@
|
|
in common checks for unix, inet, inet6 instead of partial checks
here and there. Some checks are already done at a higher layer,
but better be paranoid with user input.
OK claudio@ millert@
|
|
of src/dst ip/port is unique for TCP. But if the socket is not
bound, the automatic bind by connect happens after the check. If
the socket has the SO_REUSEADDR flag, in_pcbbind() may select an
existing local port. Then we had two colliding TCP PCBs. This
resulted in a packet storm of ACK packets on loopback. The softnet
task was constantly holding the netlock and has a high priority,
so the system hung.
Do the in_pcbhashlookup() again after in_pcbbind(). This creates
sporadic "connect: Address already in use" errors instead of a hang.
bug report and testing Olivier Antoine; OK mpi@
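
The fix described above, repeating the uniqueness check once the automatic
bind has chosen a local port, can be sketched with a toy connection table.
Everything here is hypothetical: struct conn, tuple_in_use() and
toy_connect() are illustrative names, addresses are omitted, and the
autoport parameter stands in for whatever port in_pcbbind() would select.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection entry; addresses are left out for brevity. */
struct conn {
	unsigned int lport;	/* local port */
	unsigned int fport;	/* foreign port */
};

/* Return true if the local/foreign port pair already exists. */
static bool
tuple_in_use(const struct conn *tab, size_t n, unsigned int lport,
    unsigned int fport)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tab[i].lport == lport && tab[i].fport == fport)
			return true;
	return false;
}

/*
 * Toy connect. *lport == 0 means the socket is not bound yet. The
 * second lookup after the automatic bind is the point of the sketch:
 * without it a reused local port could create a colliding entry.
 */
int
toy_connect(const struct conn *tab, size_t n, unsigned int *lport,
    unsigned int fport, unsigned int autoport)
{
	if (*lport != 0 && tuple_in_use(tab, n, *lport, fport))
		return EADDRINUSE;
	if (*lport == 0) {
		*lport = autoport;	/* stands in for in_pcbbind() */
		if (tuple_in_use(tab, n, *lport, fport))
			return EADDRINUSE;	/* check again after bind */
	}
	return 0;
}
```

In this sketch a collision after the automatic bind surfaces as EADDRINUSE,
matching the sporadic "Address already in use" errors the commit accepts in
place of a hang.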
|