path: root/sys/netinet/in_pcb.c
2022-08-22  Alexander Bluhm
Use rwlock per inpcb table to protect notify list. The notify function may sleep, so holding a mutex is not possible. The same list entry and rwlock is used for UDP multicast and raw IP delivery. By adding a write lock, exclusive netlock is no longer necessary for PCB notify and UDP and raw IP input. OK mvs@
2022-08-21  Alexander Bluhm
Introduce a mutex per inpcb to serialize access to socket receive buffer. Later it may be used to protect more of the PCB or socket. In divert input replace the kernel lock with this mutex. OK mvs@
2022-08-08  Alexander Bluhm
To make protocol input functions MP safe, internet PCBs need protection. Use their reference counter in more places. The in_pcb lookup functions hold the PCBs in hash tables protected by table->inpt_mtx mutex. Whenever a result is returned, increment the ref count before releasing the mutex. Then the inp can be used as long as necessary. Unref it at the end of all functions that call in_pcb lookup. As a shortcut, pf may also hold a reference to the PCB. When pf_inp_lookup() returns it, it also increments the ref count and the caller can handle it like the inp from table lookup. OK sashan@
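The lookup rule described above (take the reference while the table mutex is still held, drop it when the caller is done) can be sketched in userland C. This is a minimal sketch under stated assumptions, not the kernel code: `struct pcb`, `pcb_lookup()` and `pcb_unref()` are hypothetical names, a C11 atomic spinlock stands in for `inpt_mtx`, and a small array stands in for the hash tables.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 8

/* Hypothetical stand-ins: a PCB with a reference count and a table whose
 * lock is a spinlock (instead of the kernel mutex inpt_mtx). */
struct pcb {
    uint16_t lport;
    atomic_uint refcnt;          /* the table itself owns one reference */
};

struct pcb_table {
    atomic_bool locked;
    struct pcb *slots[NSLOTS];
};

static void pcb_init(struct pcb *p, uint16_t lport)
{
    p->lport = lport;
    atomic_init(&p->refcnt, 1);
}

static void table_init(struct pcb_table *t)
{
    atomic_init(&t->locked, false);
    for (size_t i = 0; i < NSLOTS; i++)
        t->slots[i] = NULL;
}

static void table_lock(struct pcb_table *t)
{
    while (atomic_exchange_explicit(&t->locked, true, memory_order_acquire))
        ;                        /* spin until the holder releases it */
}

static void table_unlock(struct pcb_table *t)
{
    atomic_store_explicit(&t->locked, false, memory_order_release);
}

/* Drop one reference; returns true when the last one is gone, i.e. when
 * a real implementation would free the PCB. */
static bool pcb_unref(struct pcb *p)
{
    return atomic_fetch_sub(&p->refcnt, 1) == 1;
}

/* Find a PCB by local port. The reference is taken while the lock is still
 * held, so the PCB cannot be freed between the unlock and its use. */
static struct pcb *pcb_lookup(struct pcb_table *t, uint16_t lport)
{
    struct pcb *found = NULL;

    table_lock(t);
    for (size_t i = 0; i < NSLOTS; i++) {
        if (t->slots[i] != NULL && t->slots[i]->lport == lport) {
            found = t->slots[i];
            atomic_fetch_add(&found->refcnt, 1);
            break;
        }
    }
    table_unlock(t);
    return found;                /* caller must pcb_unref() when done */
}
```

Because the increment happens before the unlock, a thread that removes the PCB from the table afterwards can only drop the table's own reference; the object stays valid until the caller's `pcb_unref()`.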
2022-08-06  Alexander Bluhm
Clean up the netlock macros. Merge NET_RLOCK_IN_SOFTNET and NET_RLOCK_IN_IOCTL, which have the same implementation. The R and W are hard to see, call the new macro NET_LOCK_SHARED. Rename the opposite assertion from NET_ASSERT_WLOCKED to NET_ASSERT_LOCKED_EXCLUSIVE. Update some outdated comments about net locking. OK mpi@ mvs@
2022-06-28  Alexander Bluhm
Use btrace(8) to debug reference counting. dt(4) provides a static tracepoint for each type of refcnt we have. As a start, add inpcb and tdb refcnt. When the counter changes, btrace may print the actual object, the current counter, the change value and optionally the stack trace. discussed with visa@; OK mpi@
2022-06-06  Claudio Jeker
Simplify solock() and sounlock(). There is no reason to return a value for the lock operation and to pass a value to the unlock operation. sofree() still needs an extra flag to know if sounlock() should be called or not. But sofree() is called less often and mostly without keeping the lock. OK mpi@ mvs@
2022-05-15  David Gwynne
have in_pcbselsrc copy the selected address to memory provided by the caller. having it return a pointer to something that has a lifetime managed by a lock without accounting for it or taking a reference count or anything like that is asking for trouble. copying the address to caller provided memory while still inside the lock is a lot safer. discussed with visa@ ok bluhm@ claudio@
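The copy-out pattern described in the 2022-05-15 entry can be illustrated with a small userland sketch. It does not reproduce `in_pcbselsrc()`'s real signature: `struct src_cache` and `select_src()` are hypothetical, and an atomic spinlock stands in for whatever lock guards the cached state.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical stand-in for state whose lifetime is managed by a lock,
 * such as the address of a cached route's interface. */
struct src_cache {
    atomic_bool locked;
    bool valid;
    struct in_addr ifaddr;
};

static void src_cache_init(struct src_cache *c, in_addr_t addr_be)
{
    atomic_init(&c->locked, false);
    c->valid = true;
    c->ifaddr.s_addr = addr_be;
}

/* Copy the selected source address into caller-provided storage while the
 * lock is still held; no pointer into the cache escapes the function. */
static int select_src(struct src_cache *c, struct in_addr *out)
{
    int error = 0;

    while (atomic_exchange_explicit(&c->locked, true, memory_order_acquire))
        ;
    if (c->valid)
        *out = c->ifaddr;        /* struct copy, not a pointer */
    else
        error = EADDRNOTAVAIL;
    atomic_store_explicit(&c->locked, false, memory_order_release);
    return error;
}
```

Since the caller owns `*out`, later changes to the cache cannot alter an address that has already been returned, which is exactly the hazard of handing out a pointer.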
2022-04-14  Claudio Jeker
Relax address availability check for multicast binds. While it makes sense to limit bind(2) of unicast addresses that overlap each other to be all from the same UID (like 0.0.0.0:53 and 127.0.0.1:53) it makes little sense for multicast. Multicast is delivered to all sockets that match so there is no risk of someone stealing traffic from someone else. This should hopefully help with mDNS as reported by robert@ OK deraadt@ bluhm@
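The policy in the 2022-04-14 entry reduces to a short predicate. This is a simplified, hypothetical sketch (`overlapping_bind_ok()` is a made-up name); a real bind path also has to consider options such as SO_REUSEADDR/SO_REUSEPORT and privileged users, which are ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* May `requester` bind an address that overlaps a socket already bound
 * by `owner`?  Simplified: only the UID rule is modelled. */
static bool overlapping_bind_ok(struct in_addr laddr, uid_t owner,
    uid_t requester)
{
    /* Multicast is delivered to every matching socket, so a second user
     * binding the group cannot steal traffic from the first. */
    if (IN_MULTICAST(ntohl(laddr.s_addr)))
        return true;
    /* Overlapping unicast binds (e.g. 0.0.0.0:53 and 127.0.0.1:53)
     * stay restricted to the same UID. */
    return owner == requester;
}
```

For example, two users may both bind the mDNS group 224.0.0.251, while a second user binding 127.0.0.1 on a port already covered by another user's wildcard bind is refused.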
2022-03-22  Alexander Bluhm
Fix whitespace.
2022-03-21  Alexander Bluhm
Header netinet/in_pcb.h includes sys/mutex.h now. Recommit mutex for PCB tables. It does not break userland build anymore. pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in inp_notify list and protect it with exclusive netlock. OK sashan@
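The collect-then-notify approach from the 2022-03-21 entry separates work done under the non-sleeping lock from calls that may sleep. The sketch below is hypothetical and deviates from the commit in two labelled ways: it pins the collected PCBs with reference counts (the mechanism of the later 2022-08-08 entry) instead of the exclusive netlock, and it gathers them in a local array instead of an `inp_notify` list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical stand-ins for a PCB and a table guarded by a spinning lock. */
struct pcb {
    int faddr;                   /* simplified foreign address */
    atomic_uint refcnt;
    int notified;
};

struct pcb_table {
    atomic_bool locked;
    struct pcb *slots[NSLOTS];
};

static bool lock_held;           /* lets the demo assert where notify runs */

static void pcb_init(struct pcb *p, int faddr)
{
    p->faddr = faddr;
    p->notified = 0;
    atomic_init(&p->refcnt, 1);
}

static void table_init(struct pcb_table *t)
{
    atomic_init(&t->locked, false);
    for (size_t i = 0; i < NSLOTS; i++)
        t->slots[i] = NULL;
}

/* In a kernel this could sleep (e.g. tcp_output() into pf's rwlock),
 * so it must never be called with the non-sleeping lock held. */
static void notify(struct pcb *p)
{
    assert(!lock_held);
    p->notified++;
}

static void notify_all(struct pcb_table *t, int faddr)
{
    struct pcb *list[NSLOTS];
    size_t n = 0;

    /* Phase 1: under the lock, only collect matches and pin them. */
    while (atomic_exchange_explicit(&t->locked, true, memory_order_acquire))
        ;
    lock_held = true;
    for (size_t i = 0; i < NSLOTS; i++) {
        struct pcb *p = t->slots[i];
        if (p != NULL && p->faddr == faddr) {
            atomic_fetch_add(&p->refcnt, 1);
            list[n++] = p;
        }
    }
    lock_held = false;
    atomic_store_explicit(&t->locked, false, memory_order_release);

    /* Phase 2: lock released, so a sleeping notify function is allowed. */
    for (size_t i = 0; i < n; i++) {
        notify(list[i]);
        atomic_fetch_sub(&list[i]->refcnt, 1);
    }
}
```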
2022-03-21  David Gwynne
treat 255.255.255.255 like an mcast address in in_pcbselsrc. this allows the IP_MULTICAST_IF sockopt to specify which address you want to send a limited broadcast (255.255.255.255) packet out of. requested by and ok claudio@
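The rule in the 2022-03-21 entry amounts to widening the "is this a multicast destination?" test that decides whether IP_MULTICAST_IF applies. `use_multicast_if()` below is a hypothetical helper, a sketch of that test only, not the code of `in_pcbselsrc()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Should the interface chosen with IP_MULTICAST_IF decide the source
 * address for this destination?  Multicast groups qualify, and so does
 * the limited broadcast address 255.255.255.255. */
static bool use_multicast_if(struct in_addr dst)
{
    return IN_MULTICAST(ntohl(dst.s_addr)) ||
        dst.s_addr == htonl(INADDR_BROADCAST);
}
```

A directed broadcast such as 192.168.1.255 is not covered by this test; only the all-ones limited broadcast is treated like a multicast group.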
2022-03-14  Theo Buehler
Unbreak the tree, revert commitid aZ8fm4iaUnTCc0ul. This reverts the commit protecting the list and hashes in the PCB tables with a mutex since the build of sysctl(8) breaks, as found by kettenis. ok sthen
2022-03-14  Alexander Bluhm
pf_socket_lookup() calls in_pcbhashlookup() in the PCB layer. To run pf in parallel, make parts of the stack MP safe. Protect the list and hashes in the PCB tables with a mutex. Note that the protocol notify functions may call pf via tcp_output(). As the pf lock is a sleeping rw_lock, we must not hold a mutex. To solve this for now, collect these PCBs in inp_notify list and protect it with exclusive netlock. OK sashan@
2022-03-04  Alexander Bluhm
in_pcbinit() is called during boot. There malloc(9) cannot fail, but would panic instead of waiting. Remove needless error handling. OK mvs@
2022-03-02  Alexander Bluhm
Use NULL instead of 0 for pointer.
2022-03-01  Alexander Bluhm
Remove outdated comment about v4-mapped v6 addresses. They are not supported anymore.
2021-10-25  Alexander Bluhm
The implementation of ipsp_spd_inp() is side effect free. It may set the error output parameter or return a tdb. Both are ignored in in_pcbconnect(). Remove the code that does nothing. OK tobhe@ jca@ mvs@
2021-03-10  Jonathan Gray
spelling
ok gnezdo@ semarie@ mpi@
2021-02-11  Patrick Wildt
Swap faddr/laddr and fport/lport arguments in call to stoeplitz_ipXport(). Technically the whole point of the stoeplitz API is that it's symmetric, meaning that the order of addresses and ports doesn't matter and will produce the same hash value. Coverity CID 1501717 ok dlg@
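Why argument order does not matter can be shown with a plain bitwise Toeplitz hash. This is an illustrative sketch, not OpenBSD's stoeplitz code (which uses precomputed tables): the key construction (a 16-bit seed repeated over the whole key) is an assumption about what makes the hash symmetric, `flow_hash()` is a hypothetical helper, and 0x6d5a is just an example seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte k of a Toeplitz key made by repeating a 16-bit seed. */
static uint8_t key_byte(uint16_t seed, size_t k)
{
    return (k % 2 == 0) ? (uint8_t)(seed >> 8) : (uint8_t)(seed & 0xff);
}

/* Bitwise Toeplitz hash: for every set input bit (MSB first), XOR in the
 * 32-bit key window that starts at that bit position. */
static uint32_t toeplitz(uint16_t seed, const uint8_t *data, size_t len)
{
    uint32_t hash = 0, window = 0;

    for (size_t k = 0; k < 4; k++)
        window = (window << 8) | key_byte(seed, k);
    for (size_t i = 0; i < len; i++) {
        uint8_t next = key_byte(seed, i + 4);
        for (int b = 0; b < 8; b++) {
            if (data[i] & (0x80u >> b))
                hash ^= window;
            /* slide the window one key bit further */
            window = (window << 1) | ((next >> (7 - b)) & 1u);
        }
    }
    return hash;
}

/* Store v big-endian in nbytes bytes. */
static void put_be(uint8_t *p, uint32_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--, v >>= 8)
        p[i] = (uint8_t)(v & 0xff);
}

/* Hash a 4-tuple laid out as saddr, daddr, sport, dport. */
static uint32_t flow_hash(uint16_t seed, uint32_t saddr, uint32_t daddr,
    uint16_t sport, uint16_t dport)
{
    uint8_t buf[12];

    put_be(buf, saddr, 4);
    put_be(buf + 4, daddr, 4);
    put_be(buf + 8, sport, 2);
    put_be(buf + 10, dport, 2);
    return toeplitz(seed, buf, sizeof(buf));
}
```

With a key that repeats every 16 bits, the key window used for an input bit depends only on that bit's position modulo 16. Every address and port is made of 16-bit words at 16-bit-aligned offsets, so each word contributes the same value wherever it sits, and swapping source and destination only reorders terms of an XOR.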
2021-01-25  David Gwynne
if stoeplitz is enabled, use it to provide a flowid for tcp packets. drivers that implement rss and multiple rings depend on the symmetric toeplitz code, and use it to generate a key that decides which rx ring a packet lands on. if the toeplitz code is enabled, this diff has the pcb and tcp layer use the toeplitz code to generate a flowid for packets they send, which in turn is used to pick a tx ring. because the nic and the stack use the same key, the tx and rx sides end up with the same hash/flowid. at the very least this means that the same rx and tx queue pair on a particular nic are used for both sides of the connection. as the stack becomes more parallel, it will also help keep both sides of the tcp connection processing in the one place.
2020-11-07  denis
Rework source IP address setting.
- Move most of the processing out of rtable.c (reasonable tb@, ok bluhm@)
- Remove memory allocation, store pointer to existing ifaddr
- Fix tunnel interface handling
looks fine mpi@
2020-11-05  denis
Replace wrong cast with satosin. Advised by bluhm@
2020-10-29  denis
Add feature to force the selection of source IP address. Based/previous work on an idea from deraadt@ Input from claudio@, djm@, deraadt@, sthen@ OK deraadt@
2020-05-27  Alexander Bluhm
Connectionless sockets like UDP can be re-connected to a different address. In that case, the linking to the pf state must be dissolved as the latter still contains the old address. If it is a divert state, also remove the state as any divert state must be associated with a matching socket. Call pf_remove_divert_state() and pf_inp_unlink() from in_pcbconnect(). reported by Tim Kuijsten; OK sashan@ claudio@
2019-07-15  Alexander Bluhm
Initialize struct inpcb pool not on demand, but during initialization. Removes a global variable and avoids MP problems. OK mpi@ visa@
2018-10-04  Alexander Bluhm
Revert the inpcb table mutex commit. It triggers a witness panic in raw IP delivery and UDP broadcast loops. There inpcbtable_mtx is held and sorwakeup() is called within the loop. As sowakeup() grabs the kernel lock, we have a lock ordering problem. found by Hrvoje Popovski; OK deraadt@ mpi@
2018-09-20  Alexander Bluhm
As a step towards per inpcb or socket locks, remove the net lock for netstat -a. Introduce a global mutex that protects the tables and hashes for the internet PCBs. To detect detached PCB, set its inp_socket field to NULL. This has to be protected by a per PCB mutex. The protocol pointer has to be protected by the mutex as netstat uses it. Always take the kernel lock in in_pcbnotifyall() and in6_pcbnotify() before the table mutex to avoid lock ordering problems in the notify functions. OK visa@
2018-09-14  Alexander Bluhm
In general it is a bad idea to use one random secret for two things. The inet PCB uses one hash with local and foreign addresses, and one with local port numbers. Give both hashes separate keys. Also document the struct fields. OK visa@
2018-09-13  Alexander Bluhm
Add reference counting for inet pcb, this will be needed when we start locking the socket. An inp can be referenced by the PCB queue and hashes, by a pf mbuf header, or by a pf state key. OK visa@
2018-09-11  Alexander Bluhm
Make the distribution of in_ and in6_ functions in in_pcb.c and in6_pcb.c consistent, to ease comparing the code. Move all inet6 functions to in6_. Bring functions in both source files in same order. Cleanup the include section. Now in_pcb.c is a superset of in6_pcb.c. The latter contains all the special implementations. Just moving around, no code change intended. OK mpi@
2018-09-10  Alexander Bluhm
Remove useless INPCBHASH() macros. Just expand them. OK stsp@
2018-09-07  Alexander Bluhm
Explain the special case for redirect to localhost in a comment. input and OK claudio@
2018-07-11  Claudio Jeker
Retire RTM_LOSING, it no longer makes sense and on busy servers the route socket is flooded with those messages. Instead make sure that the removal of the dynamic route that can happen is actually also sent to the routing socket. OK mpi@ henning@
2018-06-14  Alexander Bluhm
In in_pcballoc() finish the inp initialization before adding it to the global inpcb queue and hashes. OK visa@ mpi@ as part of a larger diff
2018-06-14  Alexander Bluhm
Assert that the INP_IPV6 in in6_pcbconnect() is correct. Just call in_pcbconnect() to avoid the address family maze in syn_cache_get(). input claudio@; OK mpi@
2018-06-11  Martin Pieuchot
Do not unlock the KERNEL_LOCK() unconditionally in sounlock(). Instead introduce two flags to deal with global lock recursion. This is necessary until we get per-socket lock. Req. by and ok visa@
2018-06-11  Martin Pieuchot
Push the KERNEL_LOCK() inside route_input(). ok visa@, tb@
2018-06-07  Alexander Bluhm
The global zero addresses must not change, mark them constant. OK tb@ visa@
2018-06-06  Martin Pieuchot
Pass the socket to sounlock(), this prepares the terrain for per-socket locking. ok visa@, bluhm@
2018-06-03  Alexander Bluhm
Use variable names for rtable and rdomain consistently in the in_pcb functions. discussed with and OK mpi@ visa@
2018-06-03  Alexander Bluhm
Rename the inpcb table field inpt_hash to inpt_mask as it contains the hashmask. For the resize calculations it is clearer to use the field inpt_size. OK visa@ mpi@
2018-06-02  Alexander Bluhm
Cleanup the in_pcbnotifymiss diagnostic printfs. Always print the rdomain. Move the printf to the end of the pcb lookup functions. OK tb@ mpi@ visa@
2018-06-02  Alexander Bluhm
The function in_pcbrehash() does not modify the pcb table queue. So in in_pcbresize() the variant without _SAFE of the TAILQ_FOREACH macro is sufficient. OK tb@ mpi@ visa@
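The reasoning in the 2018-06-02 entry can be checked with a toy resize built on the `<sys/queue.h>` macros: rehashing moves each PCB between bucket lists, but never unlinks it from the queue being walked, so `TAILQ_FOREACH` stays valid. All names here (`struct pcbtable`, `pcb_rehash()`, `pcb_resize()`, and the `mask`/`size` fields modelled on `inpt_mask`/`inpt_size`) are hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, simplified PCB: on one global queue and one hash bucket. */
struct pcb {
    TAILQ_ENTRY(pcb) queue;   /* global list; rehashing never touches it */
    LIST_ENTRY(pcb) hash;     /* bucket list; rehashing moves it */
    uint32_t key;             /* stand-in for the hashed address/port tuple */
};

TAILQ_HEAD(pcbqueue, pcb);
LIST_HEAD(pcbhead, pcb);

struct pcbtable {
    struct pcbqueue queue;
    struct pcbhead *hashtbl;
    unsigned long mask;       /* bucket index = key & mask */
    unsigned long size;       /* number of buckets, mask + 1 */
};

/* Allocate `size` empty buckets; size must be a power of two. */
static struct pcbhead *buckets_alloc(unsigned long size)
{
    struct pcbhead *b = calloc(size, sizeof(*b));

    if (b != NULL) {
        for (unsigned long i = 0; i < size; i++)
            LIST_INIT(&b[i]);
    }
    return b;
}

static int pcbtable_init(struct pcbtable *t, unsigned long size)
{
    TAILQ_INIT(&t->queue);
    t->hashtbl = buckets_alloc(size);
    t->size = size;
    t->mask = size - 1;
    return t->hashtbl == NULL ? -1 : 0;
}

static void pcb_insert(struct pcbtable *t, struct pcb *p)
{
    TAILQ_INSERT_TAIL(&t->queue, p, queue);
    LIST_INSERT_HEAD(&t->hashtbl[p->key & t->mask], p, hash);
}

/* Move one PCB to the bucket selected by the current mask. */
static void pcb_rehash(struct pcbtable *t, struct pcb *p)
{
    LIST_REMOVE(p, hash);
    LIST_INSERT_HEAD(&t->hashtbl[p->key & t->mask], p, hash);
}

static int pcb_resize(struct pcbtable *t, unsigned long newsize)
{
    struct pcbhead *oldtbl = t->hashtbl;
    struct pcbhead *newtbl = buckets_alloc(newsize);
    struct pcb *p;

    if (newtbl == NULL)
        return -1;
    t->hashtbl = newtbl;
    t->size = newsize;
    t->mask = newsize - 1;
    /* pcb_rehash() changes only the bucket lists, never t->queue, so the
     * plain TAILQ_FOREACH is enough; no _SAFE variant is needed. */
    TAILQ_FOREACH(p, &t->queue, queue)
        pcb_rehash(t, p);
    free(oldtbl);             /* every PCB has left the old buckets by now */
    return 0;
}
```

The `_SAFE` variants are only needed when the loop body unlinks the current element from the list being iterated, which is not the case here.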
2018-03-30  David Hill
Store the allocation size in inpcbhead for free(). OK visa@
2018-02-19  Martin Pieuchot
Remove almost unused `flags' argument of suser(). The account flag `ASU' will no longer be set but that makes suser() mpsafe since it no longer messes with a per-process field. No objection from millert@, ok tedu@, bluhm@
2017-12-04  Alexander Bluhm
Make divert lookup similar for all socket types. If PF_TAG_DIVERTED is set, pf_find_divert() cannot fail so put an assert there. Explicitly check all possible divert types, panic in the default case. For raw sockets call pf_find_divert() before the socket loop. Divert reply should not match on TCP or UDP listen sockets. OK sashan@ visa@
2017-12-01  Alexander Bluhm
Fix white spaces and shorten long line.
2017-12-01  Alexander Bluhm
Simplify the reverse PCB lookup logic. The PF_TAG_TRANSLATE_LOCALHOST security check prevents the user from accidentally configuring redirect where a divert-to would be appropriate. Instead of spreading the logic into tcp and udp input, check the flag during PCB listen lookup. This also reduces parameters of in_pcblookup_listen(). OK visa@
2017-08-11  Alexander Bluhm
Validate sockaddr from userland in central functions. This results in common checks for unix, inet, inet6 instead of partial checks here and there. Some checks are already done at a higher layer, but better be paranoid with user input. OK claudio@ millert@
2017-08-04  Alexander Bluhm
The in_pcbhashlookup() in in_pcbconnect() enforces that the 4-tuple of src/dst ip/port is unique for TCP. But if the socket is not bound, the automatic bind by connect happens after the check. If the socket has the SO_REUSEADDR flag, in_pcbbind() may select an existing local port. Then we had two colliding TCP PCBs. This resulted in a packet storm of ACK packets on loopback. The softnet task was constantly holding the netlock and has a high priority, so the system hung. Do the in_pcbhashlookup() again after in_pcbbind(). This creates sporadic "connect: Address already in use" errors instead of a hang. bug report and testing Olivier Antoine; OK mpi@
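The ordering problem in the 2017-08-04 entry can be modelled with a toy table of 4-tuples. This is a hypothetical sketch, not `in_pcbconnect()`: `struct tuple`, `auto_bind()` and `pcb_connect()` are made-up names, and the implicit bind is reduced to "lowest free port from 1024, or any port when reuseaddr is set". It shows why a lookup done only before the implicit bind misses the collision.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAXCONN 16

/* Hypothetical, simplified TCP PCB table: just a list of 4-tuples. */
struct tuple {
    uint32_t laddr, faddr;
    uint16_t lport, fport;
};

struct table {
    struct tuple conn[MAXCONN];
    size_t n;
};

static bool tuple_in_use(const struct table *t, const struct tuple *q)
{
    for (size_t i = 0; i < t->n; i++) {
        const struct tuple *c = &t->conn[i];
        if (c->laddr == q->laddr && c->faddr == q->faddr &&
            c->lport == q->lport && c->fport == q->fport)
            return true;
    }
    return false;
}

/* Toy implicit bind: the lowest free port from 1024 up, but with reuseaddr
 * a port already used by another connection may be chosen. */
static uint16_t auto_bind(const struct table *t, bool reuseaddr)
{
    for (uint16_t port = 1024; ; port++) {
        bool used = false;
        for (size_t i = 0; i < t->n; i++)
            if (t->conn[i].lport == port)
                used = true;
        if (!used || reuseaddr)
            return port;
    }
}

static int pcb_connect(struct table *t, struct tuple *s, uint32_t faddr,
    uint16_t fport, bool reuseaddr)
{
    s->faddr = faddr;
    s->fport = fport;
    /* Lookup before binding: with lport still 0 it cannot match any
     * stored connection. */
    if (tuple_in_use(t, s))
        return EADDRINUSE;
    if (s->lport == 0) {
        s->lport = auto_bind(t, reuseaddr);
        /* The fix: repeat the lookup once the local port is known. */
        if (tuple_in_use(t, s))
            return EADDRINUSE;
    }
    if (t->n == MAXCONN)
        return ENOBUFS;
    t->conn[t->n++] = *s;
    return 0;
}
```

Without the second `tuple_in_use()` call, two reuseaddr sockets connecting to the same destination would both be stored with identical 4-tuples, which is the collision the entry describes.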